Sample Projects

The following is a sample of some recent custom systems we developed for our clients:
  1. A knowledge base of post-translational modifications
  2. Product development platform for a biological reagent company
  3. Array Repository Data Analysis System for the Department of Defense
  4. Knowledge management system for a drug discovery company
  5. LIMS for a microarray facility
  6. Sequencing pipeline for a genomics company
  7. Public gene expression database
  8. LIMS for a MS proteomics facility
  9. LIMS for biological sample provider
  10. Curation of annotations for genes

A knowledge base of post-translational modifications

a) Requirements: A company had developed a curated resource for protein modification events and their effect on cellular processes. The system suffered from several ills, including poor performance, inadequate curation interfaces, a limited security model, weak search capabilities, and the inability to display all the data stored in the database. In addition to curated data, high-throughput data from a novel mass spectrometry-based method for characterizing protein modifications needed to be accommodated in the system. Although most of the current data was for protein phosphorylation, large amounts of data for other types of protein modifications were expected to be loaded into the system at an ever-increasing rate.

b) Our solution: The database for the system was restructured significantly. Important concepts missing from the initial version of the system were introduced, and full support for multiple types of protein modifications was added. The restructuring included many features that resulted in a several-fold improvement in query performance. The software for the system was completely rewritten using current-generation enterprise technologies. In the reengineered system, new user interfaces enable users to formulate sophisticated queries based on treatments, diseases, cell lines, and tissues, and to retrieve bulk data sets. These bulk data sets can be downloaded for analysis and model building. There are provisions for bulk queries that allow high-throughput labs to determine whether sites from their modification site discovery programs are novel. Interactive viewers for protein structures and modification sites were incorporated into the system. An export of the database as a graph of biological objects is also available for analysis of the data in a pathway tool.
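
The bulk novelty check mentioned above can be illustrated with a minimal sketch. It assumes that a modification site is identified by a protein identifier, a residue, and a position, and that the known sites are available as an in-memory set; the names Site and split_novel and the placeholder identifiers are hypothetical and do not come from the system itself.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Site(NamedTuple):
    """Hypothetical key for a modification site (an assumption, not the real schema)."""
    protein: str   # placeholder protein identifier
    residue: str   # one-letter amino acid code, e.g. "S", "T", "Y"
    position: int  # 1-based position in the protein sequence


def split_novel(candidates: Iterable[Site], known: Set[Site]) -> Tuple[List[Site], List[Site]]:
    """Split candidate sites into (novel, already_known), preserving input order."""
    novel: List[Site] = []
    already_known: List[Site] = []
    for site in candidates:
        if site in known:
            already_known.append(site)
        else:
            novel.append(site)
    return novel, already_known


# Placeholder data for illustration only.
known_sites = {Site("PROT_A", "S", 15), Site("PROT_A", "Y", 204)}
submitted = [Site("PROT_A", "S", 15), Site("PROT_B", "T", 88)]

novel_sites, known_hits = split_novel(submitted, known_sites)
```

In a production setting the lookup would more plausibly run as a single set-based database query than in application memory; the sketch only shows the shape of the comparison.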

Protein modification (Website)

Protein modification (Curation)

Product development platform for a biological reagent company

a) Requirements: A leading developer and producer of biological reagents had reached the limit of FileMaker’s data management capabilities. They needed a comprehensive data and business process management solution that would support them through their rapid growth. Due to the diversity of their portfolio of reagents, including but not limited to polyclonal and monoclonal antibodies, siRNA, purified protein kinases, small molecules, and peptides, they requested a flexible system capable of recording highly diverse data. Product developers, product managers, laboratory scientists, marketing and graphics staff, receiving and shipping staff, and senior management all required means to record and organize mission-critical information and to search and mine this information in flexible and secure ways. In addition, the system had to facilitate the transfer and tracking of samples to and from contractors, the transfer of product data to the company’s corporate web site, and the publication of high-quality product data sheets.

b) Our solution: The Product Development Platform is an enterprise system that supports all aspects of reagent development and production. It assists product development through a comprehensive set of Laboratory Information Management System interfaces and functions. Specialized groups, such as those performing immunohistochemistry or flow cytometry assays, are supported through dedicated modules. The system supports receiving barcoded samples from suppliers and aids the management of freezer inventory. It organizes products in a single repository that brings all product types under a common paradigm. It tracks lots of products and the assays that characterize the lots. It manages the process of releasing a product for sale and is integrated with the financial and e-commerce systems. Finally, it is tightly integrated with Adobe InDesign for automated production of product data sheets.

Product Development Platform

Array Repository Data Analysis System for the Department of Defense

a) Requirements: Biological researchers within the Department of Defense leverage functional genomics technologies for a wide range of studies, such as the identification of novel targets and therapeutics for malaria, the dissection of host-response pathways for HIV-malaria co-infection, the development of an HIV vaccine, and the identification of biomarkers and the development of novel therapeutics to counter the effects of chemical and biological toxins. The data collected from Affymetrix and De Novo spotted arrays through large-scale experiments needs to be tracked, shared with authorized collaborators, and analyzed to extract important biological knowledge. By identifying causal relationships between stimuli and differential expression and incorporating prior knowledge from public domain and proprietary sources of structured and unstructured information, researchers can build association models of biology for their systems of interest.

b) Our solution: ARDAS is an information system for building models of gene expression by assisting in the management of both experimental and analytical microarray data. ARDAS comprises three modules: the Warehouse, the LIMS, and the AIMS. The Warehouse stores all the information derived throughout the microarray experimental and analytical workflows. The Laboratory Information Management System (LIMS) module records information associated with the array printing and hybridization workflows. This information can then be passed to the Warehouse and transferred to the Analysis Information Management System (AIMS) to determine which genes are differentially regulated for biological reasons. These lists of genes and expression values can then be stored back in the Warehouse and queried to build biological models of expression.

Array Repository Data Analysis System

Knowledge management system for a drug discovery company

a) Requirements: A drug discovery company needed a Knowledge Management System (KMS) to capture the information generated by cross-functional research groups. The company was shifting its focus from a pure service model to an integrated drug discovery infrastructure and needed to leverage its research and discovery capability more effectively. The principal aim of the project was to capture and centralize the knowledge generated by the scientists in the several divisions, and to organize that knowledge such that it can be easily mined, browsed, and navigated. By providing a common platform to all scientists in the organization, KMS enables serendipity by providing comprehensive views of biological and chemical entities, fosters collaboration between scientists in different functional groups by defining a common framework for all groups, supports partnerships with other pharmaceutical companies by providing data delivery mechanisms and clear separation of intellectual property, and facilitates decision making through detailed tracking of projects and programs.

b) Our solution: We specified, designed, and developed an enterprise-wide Knowledge Management System that captures and organizes the results of experimental work in a unified framework. This information is cross-referenced to an extensible model for biological and chemical entities. Rich annotations and relationships are loaded for these entities either automatically from the public domain or through curation by scientists. Researchers are able to search this knowledge base through sophisticated data mining functions. The system can also generate new information by automatically running computational biology tools and recording and processing their results. Users can review the generated data, track the status of the work done across multiple projects and programs, annotate any object in the system, and associate supporting materials as attachments. The system can also automatically create links to many outside sources such as GenBank or MEDLINE.

Knowledge Management System

LIMS for a microarray facility

a) Requirements: A large biotechnology company needed a Laboratory Information Management System (LIMS) for tracking the operation of its central microarray facility. The facility used the Affymetrix platform and also designed, spotted, and hybridized its own spotted chips. They wanted the LIMS to record all the steps in the microarray workflow (RNA labeling, hybridization, and scanning) as well as the construction of spotted chips. The ability to constrain the work in the laboratory based on the quality-control assessment of samples and chips was required. The LIMS was also required to capture raw and normalized expression data calculated from the images of the hybridized chips, provide a sophisticated search interface, and include extensive reporting capability.

b) Our solution: We created a LIMS with a common database model and a unified set of user interfaces for the Affymetrix and spotted array platforms. Through the LIMS, users can organize their data in a project hierarchy and track all of the operations in the laboratory. To expedite data entry, user interfaces support a batch mode where data from multiple procedures, e.g., the hybridization of several chips, can be entered in a single form. The LIMS incorporates a datamart that stores the expression data and provides high-performance search functionality.

Microarray LIMS

Sequencing pipeline for a genomics company

a) Requirements: A genomics company had exceeded the capacity of its sequence-processing pipeline. They required a new system that was scalable, automated, flexible, and fault-tolerant. The system had to seamlessly process the chromatograms generated by the sequencing laboratory through a set of bioinformatics tools with user-specified parameters, and deliver the resulting data to the end users. The system had to allow for user-defined processing and delivery, automatically compute and store read statistics, store all chromatograms and FASTA files in a secure central repository, and automatically notify end users of relevant events.

b) Our solution: We designed and developed an automated, highly scalable, and fault-tolerant sequencing pipeline that validates and processes, in real time, the chromatograms produced by the sequencers. The system computes statistics based on user preferences and stores all results in a centralized and secure database. The input files and the files created during processing are stored in a file repository. The files and the statistics for the reads are distributed based on registered user instructions. A highly flexible reporting tool and a comprehensive security model enable users to search the database for data they are authorized to access and to view the associated sequence files. Reports can be scheduled to run periodically and are returned to users as text, HTML, or XML.

Sequencing Pipeline

Public gene expression database

a) Requirements: A consortium of academic researchers from ten different research organizations around the world needed a secure, easy-to-use system for sharing, integrating, and extracting data to facilitate their academic collaboration. These scientists were conducting a series of coordinated microarray gene expression experiments on several different model organisms to study polyglutamine-expansion neurodegenerative disorders. Specific requirements included support for coordination of experimental design, the ability to record the biological context of samples, extensibility to manage tens of gigabytes of raw data and thousands of files, and support for sophisticated ad hoc queries involving biological context, gene annotation, and expression values.

b) Our solution: We developed a centralized, web-accessible data repository with data loading, data extraction, and sophisticated search capabilities, all available through an intuitive user interface. The system enables researchers to compare and analyze expression data generated from different disease models. It also supports complex querying based on the biological context of the samples and gene expression criteria. The system parses and extracts files uploaded from multiple sites into data series from the coordinated experimental designs (the system currently contains over 15,000 data files generated from microarray experiments). A robust data security and administration capability provides flexible, secure data sharing and data access to many classes of users. It is described by one of the lead users as, “A very robust system that meets our complex research needs exceptionally well.”

Gene Expression
Gene Expression Database

LIMS for a MS proteomics facility

a) Requirements: A large-scale MS proteomics facility needed a LIMS to track the operation of its laboratories. The laboratories operated as an industrial facility where many samples were simultaneously characterized through several workflows and a large battery of instruments operated continuously. In addition to tracking samples, containers, reagents, consumables, and the execution of tasks, the LIMS was required to include components for inventory management, equipment part tracking, equipment maintenance tracking, workstation configuration, and supplier contact information.

b) Our solution: We implemented a flexible workflow system that allows users to define their own workflows and to execute these workflows in the laboratory. Although the underlying database and the associated software were highly sophisticated, the user interfaces presented to the end users were straightforward and intuitive. Of the many tasks that might be active simultaneously in the laboratory, technicians would see only those assigned to them. Laboratory managers could access information on all of the tasks and gauge the productivity of the laboratory in real time. Extensive support for quality control information provided metrics for the quality of the work for the laboratory, individual technicians, or workstations.

Proteomics LIMS

LIMS for biological sample provider

a) Requirements: A clinical genomics company needed a Laboratory Information Management System (LIMS) for tracking the verification of pathology reports for diseased tissue samples. They wanted the LIMS to manage all aspects of the laboratory operation, including container and sample locations and barcodes, standard operating procedures, experimental parameters for all tasks, and results produced by instruments, e.g., images of the stained samples or pathology reports.

b) Our solution: We created a web-enabled LIMS that tracks the complete histology and pathology verification processes. The workflow is initiated when tissue slides are generated from paraffin-embedded or frozen samples. The slides are tracked at each workstation in the histology laboratory, where they are prepared for the verification step. The LIMS gives laboratory managers the statistical and throughput reports they need to manage and optimize operations. After the sample slides pass the histology laboratory quality control checks, they are provided to the pathologists. As part of the pathology verification process, the microscopic features and slide images are recorded in the LIMS. The workflow completes when samples are sent back to storage. Even after a sample has been fully processed, the location of each slide generated from it remains available in the LIMS.

Tissue LIMS

Curation of annotations for genes

a) Requirements: A company wanted to extend the functionality of an existing software product to enable the tracking, storing, and versioning of biological objects such as sequences, annotations, or sequence clusters. They also needed a search engine that would operate at the level of object versions. Scientists would then be able to pursue work on different versions of the same objects and keep track of their preferred versions in a project context, independently of other users. The underlying database was required to store query snapshots and be able to execute queries in different project contexts and across several preferred versions of the same object.

b) Our solution: We created a model for representing a hierarchy of projects that allows for the versioning and tracking of biological objects and of their relationships. This structure permits scientists to work on multiple versions of the same objects simultaneously. It also allows an organization to reach a consensus on the preferred versions of objects without affecting the work of individual researchers. The system also facilitates the construction of contextual queries that are dynamically maintained for optimum performance.
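
One way to picture preferred versions in a project hierarchy is the minimal sketch below. It assumes that each project may record its own preferred version of an object and otherwise inherits the choice of its parent project; the class Project, its methods, and the identifiers are hypothetical illustrations, not the actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Project:
    """A node in a hypothetical project hierarchy (illustrative assumption)."""
    name: str
    parent: Optional["Project"] = None
    # Maps an object identifier to the version preferred within this project.
    preferred: Dict[str, int] = field(default_factory=dict)

    def prefer(self, object_id: str, version: int) -> None:
        """Record this project's preferred version of an object."""
        self.preferred[object_id] = version

    def resolve(self, object_id: str) -> Optional[int]:
        """Return the preferred version, falling back to ancestor projects."""
        project: Optional[Project] = self
        while project is not None:
            if object_id in project.preferred:
                return project.preferred[object_id]
            project = project.parent
        return None


organization = Project("organization")
alice = Project("alice-workspace", parent=organization)
bob = Project("bob-workspace", parent=organization)

organization.prefer("seq-001", 2)  # organization-wide consensus version
alice.prefer("seq-001", 3)         # Alice keeps working on a newer version

alice_version = alice.resolve("seq-001")  # Alice's own choice applies
bob_version = bob.resolve("seq-001")      # Bob inherits the consensus
```

Under this assumption, changing the consensus at the organization level leaves Alice's working version untouched, which matches the independence between individual work and organizational consensus described above.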

Versioning of Sequences & Annotations