Image analysis is critical to understanding the results of large-scale cell-based screening assays used in drug discovery. While advances in machine vision allow tens of thousands of compounds to be investigated automatically, high-content screening (HCS) methods present challenges for extracting, analyzing, integrating, and sharing assay results. HCS produces massive data sets, and the images involved can be highly complex, spanning the multitude of file formats and next-generation imaging technologies now in use. Furthermore, cell image data must also be integrated with other sources of chemical and biological knowledge—knowledge that may be scattered across any number of “information silos” existing both inside and outside of the research enterprise.

All too often, effectively leveraging the research generated through HCS requires painstaking image analysis, manual reformatting and conversion of various file formats (text and numeric data, as well as images), and hand-coded point-to-point IT connections to move information between various scientific systems and applications. With the pressure to quickly identify promising compounds increasing—along with the volume and complexity of all types of scientific data—ad hoc attempts at integration are no longer viable. They are simply too time-consuming and too expensive. The question then becomes: How can drug discovery organizations make use of the data associated with HCS as quickly and cost-effectively as possible?

Path to faster discovery
Pharmaceutical companies need an integrated and automated way to manage the research generated through images—as well as through other techniques such as statistical analysis, modeling, simulation, etc.—so that critical information can quickly be found, used, and shared throughout the organization. This requires an end-to-end, enterprise-level approach to scientific informatics that facilitates fast data access, integrated analysis and reporting, and cross-disciplinary collaboration.

Necessary features may include:
• The ability to capture, annotate, analyze, model, and share relevant image information from HCS screens regardless of the format or source system.
• Integration of images, text files, and numerical data into analysis protocols and reports that can be used by researchers and decision-makers throughout the organization.
• The creation of interactive reports that dynamically link images to data points within charts, tables, graphs, or scatter plots.

Emerging IT solutions that use Web services to create a centralized data management platform can meet these needs. Research processes can be split into parts that function independently of their source system or application and support “plug and play” integration of multiple data types and formats—including images—without requiring customized IT intervention. For example, cell images can be automatically linked with chemical compound data or previous biological assay results, enabling researchers to combine information into interactive reports, models, and statistical analyses.
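The component style of integration described above can be sketched in a few lines of code. The following is a minimal illustration, assuming image analysis has already produced per-well measurements; all function names and data are hypothetical and do not represent the Pipeline Pilot API.

```python
# Hypothetical component-based pipeline: each step is an independent function
# that can be swapped out without rewriting point-to-point integration code.

def load_image_features(assay_id):
    """Stand-in for an image-analysis component reading HCS output files."""
    return [
        {"compound_id": "C-001", "well": "A01", "signal": 0.92},
        {"compound_id": "C-002", "well": "A02", "signal": 0.15},
    ]

def load_compound_data():
    """Stand-in for a chemical-registry lookup component."""
    return {"C-001": {"smiles": "CCO"}, "C-002": {"smiles": "c1ccccc1"}}

def link_images_to_compounds(features, compounds):
    """Integration component: joins image-derived data to chemistry records."""
    return [dict(f, **compounds.get(f["compound_id"], {})) for f in features]

# "Plug and play": swapping the image-analysis step does not require changing
# the integration or reporting steps downstream.
report = link_images_to_compounds(load_image_features("HCS-42"), load_compound_data())
print(report[0]["smiles"])  # each image record now carries its compound's structure
```

Because each component only depends on the shape of the records it receives, any one of them can be replaced—say, a new image-analysis tool or a different compound database—without touching the others.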

From 30,000+ molecules to hundreds of hits
A 2010 report in Molecular BioSystems details the effectiveness of an integrated and automated approach to the extraction, analysis, and annotation of information from image-based cell screening of chemical libraries.1 A cancer research center based in Madrid, Spain, wanted to speed and simplify the multi-step process required to leverage the large volumes of image-based phenotypic information generated through HCS. This process involved a number of time-consuming independent tasks, ranging from image retrieval and analysis to data formatting and customized scripting to convert and integrate data files from different systems and sources. Pipeline Pilot, an enterprise scientific informatics platform developed by Accelrys, was deployed. It enabled what the authors described as an “integrated one-step approach” to image analysis and hit assessment.

In the HCS assay example described in the report, more than 30,000 compounds were screened to identify inhibitors of a signaling pathway activated in more than 50% of human cancers. Using Pipeline Pilot, researchers were able to automate the extraction and analysis of the cell images used in the assay, perform statistical analysis and create models using integrated phenotypic and chemical data, perform hit assessment, and archive all information in an annotated database—without requiring any manual intervention or custom point-to-point data or system integration.

This approach not only enabled the research enterprise to speed the discovery of inhibitors of the signaling pathway being studied, but also facilitated the collection and archiving of a wide array of valuable phenotypic information generated by the HCS assay—information that could be used to query new profiles later without the need to re-analyze the original cell images. Compared to traditional methods, the one-step approach had an overall accuracy rate of 96.1%, in addition to identifying hits that were missed during the manual, multi-step process.

A flexible, services-based platform for image informatics can allow drug researchers to use data generated by sophisticated HCS technologies, legacy systems, external databases, and more, and overcome the integration challenges these systems present. Manual tasks like image retrieval, formatting, processing, and reporting can be automated, freeing up IT resources and speeding research efforts. Chemists, biologists, toxicologists, and other specialists can increase collaborative efforts, and organizations can optimize the insights gained from images to drive faster, better, and more innovative discoveries.

1. Rabal O, Link W, Serelde BG, Bischoff JR, Oyarzabal J. An integrated one-step system to extract, analyze and annotate all relevant information from image-based cell screening of chemical libraries. Mol. BioSyst. 2010;6:711-720.

About the Author
Tim Moran has more than 12 years of experience in scientific imaging at Beckman Coulter, Zeiss, and Cellomics (Thermo Fisher Scientific) and holds undergraduate and graduate degrees in microbiology and molecular genetics from the University of California, Los Angeles, and the University of California, Irvine, respectively.