As researchers strive for greater efficiency during the drug development process, it is essential that they exploit all available information to guide them. It is increasingly common to see drug discovery programs supported by the use of structural biology to enhance the quality of molecular design. Structure-based drug design (SBDD) takes advantage of the ability to determine and visualize the 3D structure of drug target molecules. The crystal structure of a lead molecule bound to the target of interest can be invaluable in guiding the design of modifications to improve the compound’s affinity for the target, or its selectivity versus other targets. Such structural information has been used successfully in the design of several approved drugs. SBDD can be employed throughout the drug discovery process and is of particular utility during hit identification and the lead-optimization process.
Identification of high-quality hits can be a challenge for any program, but especially for novel target classes such as those being investigated in the emerging area of epigenetics. Structural knowledge of the target allows the use of virtual screening to identify hits for proteins that have no known inhibitors, or to access novel binding-sites on a known target. Virtual screening compares the 3D conformation of a potential inhibitor with the target binding-site, in order to assess the level of spatial and electrostatic complementarity.
Only those compounds that are predicted to bind well, the so-called “virtual hits”, need to be progressed into biological screens. The strength of applying structural knowledge is the elimination of large numbers of unsuitable compounds, thereby enriching the screening library and enhancing the hit-rate.1 While hits may be generated by random screening of compound libraries, this approach is much less efficient. For example, to find hits for the lysine methyltransferase targets G9a and SMYD2, two groups of researchers have reported screening hundreds of thousands of compounds but obtaining only a few validated hits.2,3
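The effect on hit-rate can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a screening protocol: the compound names, docking scores, and assay outcomes are invented, and the names Compound, select_virtual_hits, and enrichment_factor are hypothetical. It assumes that a lower score means a better predicted fit, and it compares the hit-rate of the top-scoring subset with that of the whole library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    docking_score: float  # hypothetical score; lower = better predicted fit
    is_active: bool       # outcome of a hypothetical biological assay


def hit_rate(compounds):
    """Fraction of compounds that are active; 0.0 for an empty list."""
    if not compounds:
        return 0.0
    return sum(c.is_active for c in compounds) / len(compounds)


def select_virtual_hits(library, top_fraction):
    """Keep the best-scoring fraction of the library (at least one compound)."""
    n_keep = max(1, round(len(library) * top_fraction))
    return sorted(library, key=lambda c: c.docking_score)[:n_keep]


def enrichment_factor(library, top_fraction):
    """Hit-rate of the virtual hits divided by the hit-rate of the full library."""
    base = hit_rate(library)
    if base == 0.0:
        raise ValueError("library contains no actives")
    return hit_rate(select_virtual_hits(library, top_fraction)) / base


# Invented toy library: 2 actives and 18 inactives.
library = [
    Compound("active-1", -9.5, True),
    Compound("active-2", -8.7, True),
] + [Compound(f"inactive-{i}", -9.0 + 0.5 * i, False) for i in range(18)]

virtual_hits = select_virtual_hits(library, top_fraction=0.2)
print("library hit-rate:     ", hit_rate(library))
print("virtual-hit hit-rate: ", hit_rate(virtual_hits))
print("enrichment factor:    ", enrichment_factor(library, top_fraction=0.2))
```

In this toy library, both actives fall among the four best-scoring compounds, so screening only that subset raises the hit-rate from 2 in 20 to 2 in 4. Real scoring functions are far less clean, and the enrichment achieved in practice depends on the target and the method.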
Once a hit series has been identified, structural information enables the binding mode of the inhibitor or ligand to be verified, the rationalization of structure-activity relationships (SAR), and the direction of design effort towards a more optimal interaction with the active site. Examples of SBDD from the recent literature disclosing clinical candidates include identifying key interactions and locking the bioactive conformation to gain potency via identifying drug-like bioisosteres, as in linsitinib/OSI-906, an oral inhibitor of IGF-1R and IR currently in a Phase 3 clinical trial in adrenocortical carcinoma;4 directing drug design into unexploited regions of the target, as in PF-489791, a PDE5 inhibitor currently in Phase 2 for Raynaud’s disease;5 and optimizing the scaffold while retaining key interactions, as in Xalkori/crizotinib, approved in 2011 to treat certain patients with late-stage lung cancer.6
Structural information is also used to improve selectivity over related targets, an area which has recently been comprehensively reviewed.7 Structural information has become so valuable to drug design that if a crystal structure of the target is not available, a homology model will often be built and used as a surrogate, such as for EZH2, an epigenetic target of great interest.8
Challenges in protein expression
Ideally, one would like a crystal structure of the target at the outset of a project, preferably containing a bound ligand, but frequently this information only arrives later in the discovery program. A common reason for this bottleneck in generating high-resolution crystal structures is the need for large quantities of pure, soluble protein that will readily form crystals. Full-length proteins are often too large and complex to produce in recombinant cell expression systems and can be challenging, or even impossible, to crystallize. An alternative to using the full-length protein is to focus only on the region of particular interest, such as the relevant catalytic or binding domain. The challenge then becomes how to identify a fragment of the protein that is easier to manipulate and express, but contains the appropriate amino-acid sequence that allows the protein to be folded correctly and, in some cases, to have functional activity.
The most widely used method to identify these domains is bioinformatic analysis of the gene and the protein it encodes.9 In silico analysis and prediction of suitable expression constructs can be complemented by experimental methods such as partial proteolysis.10 However, domain boundaries identified through bioinformatics are not necessarily optimal for expression; a few amino acid residues can significantly change the solubility, stability, and the level of expression of the desired protein. Bioinformatic approaches cannot predict domain boundaries with sufficient precision to give consistently reliable results. Hence the traditional approach is an iterative, trial-and-error process of designing and redesigning constructs until a sufficiently good construct is found, or the search has to be abandoned.
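Because a shift of a few residues at either terminus can matter, one round of the iterative approach often amounts to scanning a small grid of constructs around the predicted boundaries. The following is a minimal sketch of that bookkeeping only. The sequence is an invented placeholder rather than a real protein, the boundary positions and offsets are arbitrary, and the function name boundary_scan is hypothetical.

```python
from itertools import product


def boundary_scan(sequence, n_term, c_term, offsets):
    """Return {(start, end): subsequence} for each pair of terminal offsets.

    Positions are 0-based with an exclusive end, as in Python slicing.
    Offset pairs that would fall outside the sequence are skipped.
    """
    constructs = {}
    for d_n, d_c in product(offsets, repeat=2):
        start, end = n_term + d_n, c_term + d_c
        if 0 <= start < end <= len(sequence):
            constructs[(start, end)] = sequence[start:end]
    return constructs


# Invented placeholder sequence (61 residues); not a real protein.
toy_sequence = "M" + "ACDEFGHIKLMNPQRSTVWY" * 3

# Assumed domain prediction: residues 10 to 50, scanned at -2, 0, and +2.
constructs = boundary_scan(toy_sequence, n_term=10, c_term=50, offsets=(-2, 0, 2))
for (start, end), subsequence in sorted(constructs.items()):
    print(start, end, len(subsequence))
```

Three offsets per terminus already give nine constructs to clone and test, which is why widening the scan by hand quickly becomes laborious.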
An alternative approach is to rapidly generate a large number of randomly fragmented constructs from the gene of interest, and to express and select those constructs that encode highly-expressed, stably-folded soluble proteins.11,12 When properly executed, this unbiased approach of random fragmentation and screening is rapid, gives a clear-cut result, and has the advantage that it can identify expressible regions within a target gene that are unexpected or unpredicted. It is also suitable for tackling proteins with unknown or poorly understood domain architecture. Library-based approaches test orders of magnitude more fragments than the conventional iterative approach. The expression space can therefore be sampled comprehensively, and such fine sampling may be especially useful for targets whose expression is sensitive to the precise sequence at the N- and C-termini.
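The sampling argument can be illustrated with a toy simulation. This is not the laboratory protocol described in the cited papers: it simply draws fragment start and end points uniformly at random along a sequence of assumed length, then counts how many distinct fragments span an assumed domain with only a short overhang at each terminus. The sequence length, domain position, library size, and all function names are hypothetical.

```python
import random


def random_fragments(seq_len, n_fragments, min_len, max_len, seed=0):
    """Draw (start, end) residue ranges at random; 0-based, end exclusive."""
    if not 0 < min_len <= max_len <= seq_len:
        raise ValueError("need 0 < min_len <= max_len <= seq_len")
    rng = random.Random(seed)  # seeded so that the simulation is repeatable
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, seq_len - length)
        fragments.append((start, start + length))
    return fragments


def covers_domain(fragment, domain, max_overhang):
    """True if the fragment contains the domain with at most
    max_overhang extra residues at each terminus."""
    start, end = fragment
    d_start, d_end = domain
    return 0 <= d_start - start <= max_overhang and 0 <= end - d_end <= max_overhang


def distinct_boundaries(fragments, domain, max_overhang):
    """Set of distinct fragments that span the domain closely."""
    return {f for f in fragments if covers_domain(f, domain, max_overhang)}


# Assumed values: a 400-residue protein with a domain at residues 60 to 360.
fragments = random_fragments(seq_len=400, n_fragments=50_000,
                             min_len=250, max_len=400, seed=1)
near_domain = distinct_boundaries(fragments, domain=(60, 360), max_overhang=15)
print(len(near_domain), "distinct fragments span the assumed domain closely")
```

With an overhang of up to 15 residues at each end there are 16 x 16 = 256 possible boundary pairs around the assumed domain, so the count printed here indicates how much of that local space a library of this size reaches.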
The Combinatorial Domain Hunting approach to protein expression
In a recent case study, Brussels-based pharmaceutical company UCB and Domainex used a random-fragmentation technique known as Combinatorial Domain Hunting (CDH) to identify a protein construct suitable for structure-based drug design, for a target that had previously proved difficult to express and crystallize.13 Figure 2 demonstrates how, once a library of DNA fragments has been generated using the CDH random-fragmentation approach, the clones that code for stably-folded, soluble domains are identified and selected for the next stage of drug discovery.14
The protein, mitogen-activated protein kinase kinase (MAPKK, also known as MEK), is of major therapeutic interest. Although one crystal structure of this kinase had been reported at the outset of the program, attempts to improve the expression yield had met with only modest success, and efforts to produce structures of MAPKK-inhibitor complexes were unsuccessful. The CDH random-fragmentation approach identified a construct that covers the kinase domain and crystallizes significantly better than the bioinformatically designed expression constructs. The construct identified by CDH as giving the highest expression levels of soluble protein contained an unexpected N-terminal extension in addition to the core domain. The protein domain identified using the CDH approach was subsequently used to solve the structure of MAPKK and a number of MAPKK-inhibitor co-structures. The structural information assisted UCB scientists in designing a novel class of MEK-1 inhibitors.15
Protein-protein interactions are another particularly challenging area for drug discovery, and many researchers believe that these targets will often only be tractable through access to structural information. To that end, a variant of the CDH technology called CDH2, which allows the identification of heterodimeric domain complexes, has been developed.16 These technologies will allow drug researchers using SBDD to remain at the forefront of scientific discovery.
About the author
Philip Fallon has over 15 years of industrial experience in small-molecule discovery research and is a named inventor on several patents. His research experience includes oncology, osteoporosis, natural products, and agrochemicals. He earned a doctorate from the University of Nottingham.
1. Jenkins JL, et al. Proteins: Structure, Function, and Bioinformatics. 2003;50(1):81–93.
2. Kubicek S, et al. Reversal of H3K9me2 by a small-molecule inhibitor for the G9a histone methyltransferase. Mol Cell. 2007;25(3):473–481.
3. Ferguson AD, et al. Structural basis of substrate methylation and inhibition of SMYD2. Structure. 2011;19(9):1262-73.
4. Mulvihill and Buck. The discovery of OSI-906, a small-molecule inhibitor of the insulin-like growth factor-1 and insulin receptors. Accounts in Drug Discovery: Case Studies in Medicinal Chemistry. 2011;71-102.
5. Bell and Palmer. The discovery of the long-acting PDE5 inhibitor PF-489791 for the treatment of pulmonary hypertension. Accounts in Drug Discovery: Case Studies in Medicinal Chemistry. 2011;166-182.
6. Cui JJ, et al. Structure based drug design of crizotinib, a potent and selective dual inhibitor of mesenchymal-epithelial transition factor kinase and anaplastic lymphoma kinase. J Med Chem. 2011;54(18):6342–6363.
7. Huggins DJ, et al. Rational approaches to improving selectivity in drug design. J Med Chem. 2012;55(4):1424-44.
8. Yap DB, et al. Somatic mutations at EZH2 Y641 act dominantly through a mechanism of selectively altered PRC2 catalytic activity, to increase H3K27 trimethylation. Blood. 2011;117(8):2451-2459.
9. Mooij WT, et al. ProteinCCD: enabling the design of protein truncation constructs for expression and crystallization experiments. Nucleic Acids Research. 2009;37:W402–W405.
10. Gao X, et al. High-throughput limited proteolysis/mass spectrometry for protein domain elucidation. J Struct Funct Genomics. 2005;6(2-3):129-34.
11. Savva R, et al. DNA fragmentation based combinatorial approaches to soluble protein expression Part II: library expression, screening and scale-up. Drug Discov Today. 2007;12(21-22):939-47.
12. Savva R, et al. DNA fragmentation-based combinatorial approaches to soluble protein expression Part I. Generating DNA fragment libraries. Drug Discov Today. 2007;12(21-22):931-38.
13. Meier C, et al. Engineering human MEK-1 for structural studies: A case study of combinatorial domain hunting. J Struct Biol. 2012;177(2):329-34.
14. Reich S, et al. Combinatorial Domain Hunting: An effective approach for the identification of soluble protein domains adaptable to high-throughput applications. Protein Sci. 2006;15(10):2356–2365.
15. Laing VE, et al. Fused thiophene derivatives as MEK inhibitors. Bioorg Med Chem Lett. 2012;22(1):472-5.
16. Maclagan K, et al. A combinatorial method to enable detailed investigation of protein-protein interactions. Future Med Chem. 2011;3(3):271-82.