Figure 1: Example secondary screening data displayed in E-Notebook featuring a heat map overlaid on the plate layout and calculated IC50 from a dose response curve fit. (All figures: PerkinElmer)
Sources define research as “creative work undertaken systematically to increase the stock of knowledge… and the use of this stock of knowledge to devise new applications.” Although creativity demands flexibility, most software solutions that are available to drug research scientists today either:
• Limit the ability to directly compare experiments because the data is variable; or
• Constrain data analysis to processes defined prior to experimentation in order to store structured data
Drug research scientists often face the challenge of balancing their own research and analytical needs with the needs of the larger institution. Research is variable, yet standardization often implies conforming to a process. Given these challenges, science needs a new approach that empowers scientists in drug discovery while also managing data to enable analysis across experiments.
Figure 2: Example compound summary showing results for selected assays in a single comprehensive form.
Capturing your data
Although experimentation is typically tightly controlled to ensure scientifically relevant results, the data required to be captured and analyzed differs from experiment to experiment. Historical software solutions to this problem involved specialized data-capture mechanisms for every new procedure. While this type of approach gives the scientist freedom in the research process, it has revealed a number of disadvantages. These include:
• Difficulty comparing results across experiments
• A lack of consistency in the interpretation of results
• Significant obstacles with sharing data
With user-definable data capture in a structured environment, scientists gain flexibility without sacrificing the organization's needs. Figure 1 shows an example of the free-form tables and analysis that can follow. Data appear in the interface as linked tables or spreadsheets but are stored in a normalized table in Oracle.
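As a rough illustration of how free-form tables can sit on top of fixed, normalized storage, the sketch below flattens a spreadsheet-like table into long key-value records and rebuilds the wide view from them. The source does not describe the actual Oracle schema, so the layout here is an assumption, and the names `normalize`, `denormalize`, `KEY_COLUMNS` and all sample values are hypothetical.

```python
# Hypothetical free-form table as a scientist might enter it: one row per well.
# Column names and values are made up for illustration.
wide_rows = [
    {"well": "A1", "compound_id": "CMP-001", "conc_uM": 10.0, "signal": 1520.0},
    {"well": "A2", "compound_id": "CMP-001", "conc_uM": 1.0, "signal": 8410.0},
]

# Columns that identify a row rather than describe it (an assumption).
KEY_COLUMNS = ("well",)


def normalize(rows, experiment_id, key_columns=KEY_COLUMNS):
    """Flatten wide rows into (experiment, row key, attribute, value) records."""
    records = []
    for row in rows:
        row_key = "|".join(str(row[k]) for k in key_columns)
        for column, value in row.items():
            if column in key_columns:
                continue
            records.append(
                {
                    "experiment_id": experiment_id,
                    "row_key": row_key,
                    "attribute": column,
                    "value": value,
                }
            )
    return records


def denormalize(records):
    """Rebuild the wide, spreadsheet-like view from the normalized records."""
    wide = {}
    for rec in records:
        wide.setdefault(rec["row_key"], {})[rec["attribute"]] = rec["value"]
    return wide


records = normalize(wide_rows, experiment_id="EXP-42")
view = denormalize(records)
```

Because every user-defined column becomes a row in the same fixed structure, a new experiment design adds data rather than requiring a schema change, while the interface can still present the familiar table.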
For the drug researcher, the process is simple: decide what information is coming in and what information should come out. Understanding this defines the tables and columns needed to capture data. It also keeps researchers from forcing unintended constraints on the experiment design.
Automated calculations, plots and curve fits can also be defined to accelerate data analysis. Scientists maintain control over the data through QC validation of results and ad hoc plotting.
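To make the idea of an automated curve fit concrete, here is a minimal sketch that estimates an IC50 from dose response data. It is not the fit the software performs: it assumes responses are already normalized to percent activity with the top fixed at 100 and the bottom at 0, and it uses a simple logit linearization rather than a full four-parameter nonlinear fit. The function names and the synthetic data are hypothetical.

```python
import math


def hill_response(conc, ic50, hill_slope):
    """Percent activity remaining at a concentration (top=100, bottom=0 assumed)."""
    return 100.0 / (1.0 + (conc / ic50) ** hill_slope)


def fit_ic50(concs, responses):
    """Estimate IC50 and Hill slope by linear regression on logit-transformed data.

    Uses ln(y / (100 - y)) = -h * ln(c) + h * ln(IC50), which is linear in ln(c).
    Points at or beyond 0% or 100% cannot be transformed and are skipped.
    """
    xs, ys = [], []
    for c, y in zip(concs, responses):
        if c > 0 and 0.0 < y < 100.0:
            xs.append(math.log(c))
            ys.append(math.log(y / (100.0 - y)))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    hill_slope = -slope
    ic50 = math.exp(intercept / hill_slope)
    return ic50, hill_slope


# Synthetic, noise-free data generated from the model itself (illustration only).
concs_uM = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [hill_response(c, ic50=2.5, hill_slope=1.2) for c in concs_uM]
ic50, hill_slope = fit_ic50(concs_uM, responses)
```

With real, noisy plate data a nonlinear least-squares fit that also estimates the top and bottom plateaus is generally preferable, and the QC step described above is where a scientist would exclude outlying wells before refitting.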
Figure 3: Traditional SAR table example showing compound data on the left with biological activity in columns to the right.
Making sense of your data
Allowing scientists to define their data universe offers clear benefits but also introduces technical challenges regarding data sharing and collation. Furthermore, experimental data alone is not enough to interpret results and make informed decisions. For example, sample or compound information is often stored in a data structure separate from the results of the individual experiment but interpretation of those results typically requires merging these pieces of information together.
Many scientists spend considerable time manually combining data from different sources and transferring that data from one software platform to another. Not only does this delay reporting of experimental results, but every manual copy step is also an opportunity for transcription errors.
Transforming and joining captured data can be facilitated through tools for adding metadata around the different types of information relevant to your research. Users indicate where key identifiers are in their experimental data (e.g., sample ID) in order to join the data with otherwise separate data sources. For example, an in vivo biologist may refer to a treatment with a compound ID. In this particular example, details for the substance being used to dose the animal exist in a separate data structure managed by a separate data entry mechanism. Having the compound ID allows the biologist to find and combine related information programmatically without the need for spreadsheet exports or complex SQL queries.
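The join described above can be sketched in a few lines. This is only an illustration of the idea, assuming the key column has been declared as `compound_id`; the record layouts, the `join_on_key` helper and all values are hypothetical and do not reflect any particular product's data model.

```python
# Hypothetical in vivo results: each record refers to a treatment by compound ID.
study_results = [
    {"animal_id": "M-01", "compound_id": "CMP-001", "dose_mg_per_kg": 10},
    {"animal_id": "M-02", "compound_id": "CMP-002", "dose_mg_per_kg": 10},
    {"animal_id": "M-03", "compound_id": "CMP-999", "dose_mg_per_kg": 30},
]

# Hypothetical compound registry, maintained through a separate entry mechanism.
compound_registry = {
    "CMP-001": {"batch": "B-17", "molecular_weight": 412.5},
    "CMP-002": {"batch": "B-03", "molecular_weight": 388.4},
}


def join_on_key(rows, lookup, key):
    """Left-join rows with a lookup table on the declared key column.

    Rows without a match are kept and flagged, so that unregistered
    identifiers stay visible instead of silently disappearing.
    """
    joined = []
    for row in rows:
        match = lookup.get(row[key])
        merged = dict(row)
        merged.update(match or {})
        merged["matched"] = match is not None
        joined.append(merged)
    return joined


joined = join_on_key(study_results, compound_registry, key="compound_id")
```

Keeping unmatched rows is a deliberate choice in this sketch: an ID that fails to resolve usually signals a typo or an unregistered sample, which is exactly the kind of error manual merging tends to hide.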
In a simpler case, where the scientist simply wants to view and aggregate data across experiments, both their own and those entered by colleagues, the data may be stored in the same place but not be accessible in the format required. In this situation, the data from singleton experiments can be merged into a comprehensive dataset that allows a more global view. Figure 2 depicts one way to combine data across experiments into a single dataset without any need for writing code.
Figure 4: An example of what a SAR analysis could become through the use of TIBCO Spotfire combined with Lead Discovery.
Discovering something new
Pre-defined reporting and charting enable scientists to quickly communicate information in a standardized format. Toxicology results for a potential drug candidate, results of an assay run, and inventory usage reports are all relevant examples of defined results combined with a defined format to communicate accurate information quickly. Figure 3 shows a structure-activity relationship (SAR) table, which is a traditional way of communicating biological results to the medicinal chemist.
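The layout of a SAR table, one row per compound with one activity column per assay, amounts to a pivot of individual results. The sketch below shows that reshaping under stated assumptions: replicate IC50 values are summarized with a geometric mean (a common convention, not something the source specifies), and the assay names, compound IDs, values and the `build_sar_table` helper are hypothetical. A real SAR table would also carry the chemical structure, which is omitted here.

```python
import math
from collections import defaultdict

# Hypothetical long-format results, one record per measurement.
results = [
    {"compound_id": "CMP-001", "assay": "Kinase A", "ic50_uM": 0.1},
    {"compound_id": "CMP-001", "assay": "Kinase A", "ic50_uM": 0.4},
    {"compound_id": "CMP-001", "assay": "Kinase B", "ic50_uM": 12.0},
    {"compound_id": "CMP-002", "assay": "Kinase A", "ic50_uM": 3.0},
]


def geometric_mean(values):
    """Geometric mean, assuming all values are strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def build_sar_table(rows):
    """Pivot long results into one row per compound and one column per assay."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["compound_id"], row["assay"])].append(row["ic50_uM"])

    assays = sorted({assay for _, assay in grouped})
    compounds = sorted({cid for cid, _ in grouped})

    table = []
    for cid in compounds:
        line = {"compound_id": cid}
        for assay in assays:
            values = grouped.get((cid, assay))
            # None marks a compound that was never tested in this assay.
            line[assay] = geometric_mean(values) if values else None
        table.append(line)
    return assays, table


assays, sar_table = build_sar_table(results)
```

Untested combinations are left as `None` rather than zero so that a missing measurement is never mistaken for an inactive compound.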
A problem emerges when reports are the only means for data analysis. Analysis of data cannot be constrained by the methods we have used in the past. Discovering new insights hidden in results requires a level of hands-on interactivity only achievable through a dynamic data analytics platform.
Figure 4 highlights the type of chemically intelligent data portal that can be achieved when the scientist is able to define how data is visualized and interpreted. Instantaneous filtering and interactive visualizations, combined with powerful statistics, create a fast-paced gateway to decision making.
Empowering scientists through technology leads to better science and more efficient processes. This self-service solution, from data capture through to analysis, reduces historical bottlenecks and enables better research that brings much-needed therapies to market faster.