Public Data Spurs Profiling Advantage
Optimizing a lead series in pharmaceutical drug discovery represents a critical step in transforming initial screening hits into potent, selective, and bioavailable agents suitable for progression to preclinical development and eventual consideration as candidate drugs. Success often depends on timely access to critical profile data offering insight into potential efficacy, side effects, or drug safety issues—all well-recognized factors responsible for many compounds failing in clinical trials.
One of the most challenging aspects of this process is that it is inherently a multi-objective optimization. Scientists must not only achieve biochemical potency but simultaneously optimize the other characteristics of a drug-like compound, including reasonable in vivo pharmacokinetics. For example, researchers cannot focus solely on reducing the propensity for undesirable off-target activities; they must also consider ways to maintain, or indeed improve, drug absorption and first-pass metabolism.
Multi-objective, parallel assessments such as these are often hampered by an increasingly decentralized and fragmented pharmaceutical R&D environment in which critical information is scattered across departmental and geographic silos. In response to this situation, the industry is increasingly adopting enterprise approaches to capturing and storing varied data types. Managing critical scientific data and making it available to scientists in a usable, structured format enables organizations to achieve a higher level of innovation efficiency and drive top-line growth.
Bringing disparate data together
A comprehensive picture of the collective knowledge available to researchers is not complete without consideration of the wealth of valuable data available in the public domain. PubChem,1 ChEMBL,2 DrugBank,3 and the PDB4 are just a few examples of publicly accessible databases containing useful biological information, such as assay results and target, pathway, and even receptor-ligand binding information. Care must always be exercised in assessing the quality and accuracy of data derived from any public repository,5 but effectively leveraging these resources—in combination with in-house enterprise informatics services—can significantly improve the quality of project-design decisions. Indeed, many computational groups have developed “chemogenomic” methods to mine this vast matrix of experimental data for insights into predicting novel interactions.6,7,8,9
Within pharmaceutical R&D, predictive sciences are now regularly deployed alongside experimental research teams to support the design and decision-making process. Indeed, in silico prediction models are widely used for key endpoints such as solubility, blood-brain barrier penetration, and “drug-likeness.” In contrast to high-throughput screening, virtual target profiling accelerates lead optimization both in the chemical space—enabling large numbers of compounds to be tested—and in the biological space—enabling predictions for many targets for each molecule.
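As an illustration of the ligand-based side of virtual target profiling, the following minimal Python sketch ranks candidate targets for a query molecule by its Tanimoto similarity to each target’s known ligands. The fingerprints, target names, and scoring scheme are invented for illustration only; a real pipeline would compute structural fingerprints with a cheminformatics toolkit (e.g., RDKit) and draw ligand sets from databases such as ChEMBL.

```python
# Illustrative sketch of ligand-based virtual target profiling.
# Fingerprints are modeled as plain sets of "on" bit positions.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient between two bit-set fingerprints."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def profile_targets(query_fp, target_ligands):
    """Rank targets by the best similarity of the query to any known ligand.

    target_ligands maps target name -> list of ligand fingerprints.
    """
    scores = {
        target: max(tanimoto(query_fp, fp) for fp in fps)
        for target, fps in target_ligands.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy ligand database: hypothetical targets and invented fingerprints.
ligand_db = {
    "kinase_A":   [{1, 2, 3, 8}, {2, 3, 9}],
    "protease_B": [{10, 11, 12}, {11, 13}],
    "gpcr_C":     [{1, 5, 6}, {5, 6, 7}],
}
query = {1, 2, 3, 9}
ranking = profile_targets(query, ligand_db)  # highest-scoring target first
```

The same loop scales naturally in both dimensions the article describes: many query compounds (chemical space) against many targets (biological space).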
It is clear that prediction quality depends heavily both on the quality of the data used to generate the models and on how closely the predicted compound’s structure matches the chemistry space of the compounds in the model set. To address this dependency, some organizations are now attempting to automate the process of model building and validation, such that any prediction model available to a research scientist delivers the best possible predictive assessment while also drawing on the most recent data.10
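In miniature, such automated model building and validation might look like the sketch below: retrain a simple nearest-neighbour QSAR model on the latest data and accept it only if its leave-one-out cross-validation error stays under a threshold. The model type, toy data, and acceptance threshold are illustrative assumptions, not the published protocol of reference 10.

```python
# Hypothetical sketch of automated QSAR model (re)building with validation.
# A 1-nearest-neighbour model over bit-set fingerprints stands in for a
# production modeling method; activities are pIC50-like values.

def tanimoto(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def knn_predict(query_fp, training):
    """Predict the activity of the most similar training compound."""
    return max(training, key=lambda rec: tanimoto(query_fp, rec[0]))[1]

def leave_one_out_error(training):
    """Mean absolute error under leave-one-out cross-validation."""
    errors = []
    for i, (fp, activity) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        errors.append(abs(knn_predict(fp, rest) - activity))
    return sum(errors) / len(errors)

def rebuild_model(training, max_error=1.0):
    """Accept the refreshed model only if validation error is acceptable."""
    err = leave_one_out_error(training)
    return (knn_predict, err) if err <= max_error else (None, err)

# Toy training set: (fingerprint, activity) pairs, invented for illustration.
data = [({1, 2, 3}, 6.0), ({1, 2, 4}, 6.2), ({7, 8, 9}, 4.0), ({7, 8}, 4.1)]
model, err = rebuild_model(data)
```

Rerunning `rebuild_model` whenever new assay data arrives captures the idea that the deployed model should always reflect both a validation check and the most recent experimental results.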
Leveraging public data
In silico ligand profiling is another important lead optimization tool, because it can help scientists anticipate potential adverse drug reactions and side effects, or perhaps even suggest new targets for an existing drug. Many public data resources contain either biological assay results or structural information on how a small molecule interacts with a protein active site, allowing multiple ligand profiling strategies to be applied in parallel.
Researchers recently compared four in silico screening approaches to ligand profiling: 2D and 3D ligand similarity searching, pharmacophore screening, and receptor-ligand docking.11 The study implemented a fully automated method to generate 3D pharmacophore queries from protein−ligand X-ray structures, with an estimation of pharmacophore selectivity based on the number of anticipated drug-like hits. The protocol was applied to the sc-PDB data set of protein-ligand complexes to generate a database of 68,056 pharmacophores (PharmaDB) describing 2,556 unique targets. This provided the first opportunity to compare ligand-based and structure-based methods in profiling a set of 157 diverse ligands against a panel of targets. In the majority of cases, when sufficient ligand data were available, 2D similarity methods significantly outperformed the structure-based methods in ranking the true targets among the top 1% of scoring entries. This is to some extent expected, but it nonetheless underlines the critical value of experimental data in helping to predict and guide compound design. Another finding of the study was that, for some ligands, only a single method could be applied successfully. Notably, the authors proposed that when insufficient information was available for 2D similarity methods, receptor-ligand pharmacophore models were a fast and reliable alternative to docking. Overall, the study suggested that the best strategy is a workflow that selects the profiling method according to the protein−ligand context, and the authors presented concrete guidelines for choosing the optimal computational method based on simple ligand and binding-site properties.
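Such a context-dependent workflow can be pictured as a simple dispatch on ligand and binding-site properties. The thresholds, property names, and rules below are hypothetical placeholders for illustration, not the study’s published guidelines:

```python
# Hypothetical method-selection dispatch, loosely modeled on the idea of
# choosing a profiling approach from the protein-ligand context.
# All thresholds and criteria are invented for illustration.

def choose_profiling_method(n_known_ligands: int,
                            has_xray_complex: bool,
                            binding_site_buried: bool) -> str:
    """Return one of the four profiling approaches compared in the study."""
    if n_known_ligands >= 10:
        # Ample experimental ligand data: 2D similarity ranked best overall.
        return "2D ligand similarity"
    if has_xray_complex:
        # A receptor-ligand pharmacophore is a fast alternative to docking.
        return "receptor-ligand pharmacophore"
    if binding_site_buried:
        # Well-defined cavity with no other data: fall back to docking.
        return "docking"
    return "3D ligand similarity"
```

In practice such a dispatcher would sit in front of the individual profiling engines, routing each protein−ligand pair to the method most likely to succeed for it.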
Studies such as these clearly illustrate that public data repositories are invaluable resources that can significantly benefit many aspects of the lead optimization process. By coupling those databases with QSAR approaches,10 an opportunity arises to leverage the maximum knowledge and insight available to accelerate lead optimization and improve innovation efficiency. Arguably, organizations that do not leverage these resources place themselves at an elevated risk of failing to innovate effectively.
About the author
Dr. Goupil-Lamy joined Accelrys as a support scientist in 1998. She obtained her PhD in molecular biophysics from Pierre and Marie Curie University. Dr. Stevens has more than 12 years’ experience in the practical application of computational chemistry. He received his PhD in computational chemistry from the University of Portsmouth.
1. Wang Y, et al. PubChem’s BioAssay Database. Nucleic Acids Res. 2012;40(D1):D400-D412.
2. Gaulton A, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(D1):D1100-D1107.
3. Knox C, et al. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035-D1041.
4. Berman HM, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235-242.
5. Kramer C, et al. The Experimental Uncertainty of Heterogeneous Public Ki Data. J Med Chem. 2012;55(11):5165-5173.
6. Jacoby E. Computational Chemogenomics. WIREs Comput Mol Sci. 2011;1(1):57-67.
7. Bajorath J. Computational analysis of ligand relationships within target families. Curr Opin Chem Biol. 2008;12(3):352-358.
8. Rognan D. Chemogenomic approaches to rational drug design. Br J Pharmacol. 2007;152(1):38-52.
9. Harris CJ, Stevens AP. Chemogenomics: structuring the drug discovery process to gene families. Drug Discov Today. 2006;11(19-20):880-888.
10. Rodgers SL, et al. Predictivity of Simulated ADME AutoQSAR Models over Time. Mol Inform. 2011;30(2-3):256-266.
11. Meslamani J, et al. Protein-Ligand-Based Pharmacophores: Generation and Utility Assessment in Computational Ligand Profiling. J Chem Inf Model. 2012;52(4):943-955.