Press Release


Data Integration Solutions Largely Ineffective, Unprofitable Expenditures for Life Sciences Companies Due to Insufficient Front-End Planning, According to 3rd Millennium

CAMBRIDGE, MA - May 6, 2002 -- In a recently released White Paper assessing the promise and reality of data integration for biopharmaceutical R&D, 3rd Millennium stressed that many life sciences companies have not benefited from their data integration solutions because of poor planning and flawed selection and application of technology. 3rd Millennium is a private bioinformatics consulting firm that helps clients implement bioinformatics systems supporting early-stage biological research and discovery activities.

Many recent studies have emphasized how the explosion in biological data resulting from genomics technologies holds the promise of rapidly improving drug discovery and development. At the same time, there are relatively few examples of how this influx of data has significantly improved the drug discovery process. One of the critical challenges of the post-genomic era is integrating these new sources of biological data. Data integration, which involves unifying data that are scientifically related, but originate from unrelated sources, has become a complex challenge due to the wealth, complexity and lack of standardization of new data types and sources.

"Through the integration of a vast range of information now available, researchers and program directors can discover relationships that enable them to make better and faster decisions about targets and drug candidates. However, even though powerful data integration technologies exist, the challenge of determining the best solution has doomed many investments. Companies are often undermined by how much more difficult it is to match their objectives with the right data integration solution than for other software investments. And if a company selects incorrectly, it might end up with a solution that only partially addresses its requirements," said Roland Carel, Ph.D., Senior Systems Architect for 3rd Millennium.

In this White Paper, 3rd Millennium provided an overview of the four main approaches to solving data integration problems: database federation, data warehouses and datamarts, specialized databases, and point solutions. 3rd Millennium also detailed the organizational contexts in which each approach is most beneficial, with an additional review of data integration products and technologies such as SRS, DiscoveryLink, the K1 integration middleware, Genomax, and XML. Finally, 3rd Millennium detailed the key considerations companies must address when framing their data integration needs.

Key issues cited by 3rd Millennium in selecting a data-integration solution include:

> Determining the precise scientific questions being asked.
Choosing the right technology to solve a data integration problem depends in large part on the nature of the research questions being asked. However, because scientists may refine their research objectives during the implementation in light of new, related knowledge, it is crucial to anticipate other questions that might not be initially evident but that are likely to emerge as the project progresses. A data integration solution that accounts for current research requirements and also plans for contingencies is more likely to retain its value over time.

> Managing the disparity among data sources.
Different sources of data are complex, subject to interpretation, and contain many non-obvious discrepancies. These characteristics tend to obscure the true relationships among the data and make it difficult to ascertain precisely which research questions the data can help answer. Requirements for the integrated data often call for a wide range of queries that can be accommodated only through detailed knowledge and extensive processing of the raw data. Responding to these difficulties by making simplifying assumptions about the data is a trap: it can result in a solution that accommodates a much narrower range of research questions than originally envisioned. The end result is a system that is of only moderate value to researchers and goes unused.

> Determining the best technology solution.
Some research questions require access to a large number of rapidly changing data sources. Other, highly specific questions require data cleaning and normalization across sources, which might require creating a data warehouse or a datamart. And in many cases, data integration issues can only be resolved through the creation of a specialized, dedicated database. The right solution often involves more than one technology as well as a mix of existing and custom interfaces. The choice of technology must be made in the context of the organization's legacy information infrastructure, its realistic technical capabilities, and the research questions targeted by the integration.

3rd Millennium predicts that data integration will become even more important as organizations increasingly focus on bridging the gap between biology and chemistry in drug discovery and as highly data-driven approaches such as systems biology become more prevalent.

"While superior data integration capability does not guarantee results, making the wrong data integration decision is detrimental to R&D programs. We believe implementing the right data integration technologies might prove critical to ultimate success," Carel added.

About 3rd Millennium, Inc.
3rd Millennium, Inc., located in Cambridge, MA, is a leading bioinformatics consultancy that works with pharmaceutical, biotech, and academic clients on a project basis to design and develop software systems critical to early-stage biological research and discovery. 3rd Millennium's expertise is in custom biological database applications, microarray informatics and analytics, and data and analysis integration. The company has extensive experience developing systems for laboratory information management, microarray-based gene expression profiling, gene analysis and annotation, genomic and proteomic data integration and versioning, and biological pathway modeling. Additional information can be found at http://www.3rdmill.com.