Analysis

Objective assessment of cancer genes for drug discovery

Nature Reviews Drug Discovery volume 12, pages 35–50 (2013)


Selecting the best targets is a key challenge for drug discovery, and achieving this effectively, efficiently and systematically is particularly important for prioritizing candidates from the sizeable lists of potential therapeutic targets that are now emerging from large-scale multi-omics initiatives, such as those in oncology. Here, we describe an objective, systematic, multifaceted computational assessment of biological and chemical space that can be applied to any human gene set to prioritize targets for therapeutic exploration. We use this approach to evaluate an exemplar set of 479 cancer-associated genes, reveal the tension between biological relevance and chemical tractability, and describe major gaps in available knowledge that could be addressed to aid objective decision-making. We also propose drug repurposing opportunities and identify potentially druggable cancer-associated proteins that have been poorly explored with regard to the discovery of small-molecule modulators, despite their biological relevance.

Key points

  • Identifying and validating disease-causing genes that are viable as drug targets is a key challenge in drug discovery.

  • Large-scale multi-omics initiatives are deepening our understanding of cancer and providing an unbiased view of possible molecular mechanisms of the disease. Such studies usually result in sizeable lists — often hundreds — of potential cancer drug targets, most of which are not members of well-understood cancer pathways.

  • The selection of a small number of genes for in-depth biological validation is thus often done in an ad hoc manner, thereby running the risk of bias or of neglecting potentially druggable and therapeutically important novel targets.

  • We describe an objective, systematic, multifaceted computational approach to assessing biological and chemical space that draws simultaneously on unprecedented volumes of multidisciplinary data to assess large gene lists.

  • We utilize our new approach to evaluate 479 cancer genes from the Cancer Gene Census as an exemplar list and demonstrate the power of such an unbiased approach in rapidly unveiling potential therapeutic opportunities.

  • This analysis reveals the tension between biological relevance and chemical tractability and highlights major gaps in available knowledge that can be addressed to aid objective decision-making.

  • We hypothesize drug repurposing opportunities and identify potentially druggable cancer proteins that are as yet poorly explored in the chemical space — despite their biological relevance — and we propose these proteins for in-depth chemical and biological studies.

  • We also illustrate how the mapping of biological and chemical data distillations onto cellular networks can provide deeper insights and potentially guide rational drug combination experiments.

  • We provide a live web-based portal to allow simultaneous annotation of up to 500 genes that can be applied to any human gene list. We propose that by using our approach alongside a researcher's own biological knowledge, stronger, more rational and unbiased decisions about target selection can be made that could lead to the discovery of a new generation of novel and chemically tractable therapeutic targets.


Cancer is the second leading cause of death worldwide and the commonest in developed countries1. Opportunities to help reduce this huge adverse impact of cancer through the discovery of new drugs are benefiting from an increasing biological understanding of the disease, driven by technological advances including next-generation sequencing2 and other genomic technologies, genome-wide association studies (GWAS)3, proteomics analyses4, RNA interference studies5 and chemical biology6,7. These are fuelling efforts to identify vulnerabilities in cancer cells and hence new drug targets8. Nevertheless, cancer incidence continues to rise (see the cancer fact sheet on the World Health Organization (WHO) website), drug resistance to both cytotoxic and molecularly targeted drugs continues to emerge9, and the genetic complexity and heterogeneity10 associated with many cancers are becoming increasingly apparent. Consequently, the discovery and application of innovative targeted therapeutics that can have a genuine clinical impact is becoming a more difficult task11,12.

Although several novel first-in-class molecularly targeted drugs for cancer were approved in 2011–2012, including abiraterone13, crizotinib14 and vemurafenib15, the rate of success in cancer drug development is among the lowest of all therapeutic areas16,17. Not only can the causative genetics of cancer be complex, but targets that are biologically compelling may fall outside the 'druggable' or 'ligandable'18 parameters that are readily accessible by conventional medicinal chemistry approaches. Advances in drugging target classes that have traditionally been considered to be intractable, such as protein–protein interactions — exemplified by the discovery of the B cell lymphoma 2 (BCL-2) and BCL-XL inhibitor navitoclax19,20 — may well expand the target space that is accessible to small-molecule drug discovery. Nonetheless, reliance on tried and tested targets persists in drug discovery pipelines (as highlighted in an article on the Forbes website).

Large-scale genomics initiatives such as The Cancer Genome Atlas (TCGA), the Cancer Genome Project and the International Cancer Genome Consortium21 are providing a growing list of genes that are causally involved in cancer. Other complementary large-scale approaches such as synthetic lethality screens22 and GWAS3 are also contributing to the generation of lists of candidate genes that have a biologically and pathologically compelling role in cancer11. But which of these genes will lead to the next cancer drug?

Although systematic approaches to discover cancer genes are now commonplace, assessing and prioritizing them for biological validation and therapeutic exploitation remains a largely ad hoc exercise that does not necessarily make use of the full breadth and depth of data available from the abundant large-scale initiatives. In the field of tropical disease research, the TDR Targets resource23 introduced the importance of estimating a protein's potential therapeutic attractiveness through the combination of high-level druggability, orthology and predicted lethality to the infectious organism to prioritize targets for further investigation. Since then, the data landscape has changed substantially with the availability of large-scale, open-access resources24,25, making it possible for the first time to address target assessment comprehensively, objectively and in depth.

In order to therapeutically exploit a gene, it is crucial to establish the biological role of the gene and/or protein in disease causation and the cognate biochemical pathway. This requires extensive biological validation, which can be a challenging, long and expensive process26,27 and is not scalable to the size of the gene lists generated by large-scale efforts. Therefore, there is a clear need for a systematic, objective, data-driven assessment of such gene lists, based on information integrated from different disciplines and sources, with the goal of prioritizing genes for further detailed biological validation studies. Such an approach must provide sufficient detail to allow quantitative and thorough examination of the data supporting a target's suitability for therapeutic development, but it must also be applicable at a scale that can meet the demands of the large gene lists emerging from multi-omics efforts. When combined with the researchers' own biological and disease-specific knowledge, this assessment can form a powerful data-driven guide to selecting targets for further experiments before making the major resource-intensive commitment to a drug discovery project. Here, we demonstrate such a systematic, unbiased and objective computational approach.

The workflow with the annotation scheme and assessment criteria is summarized in Fig. 1. Further information on the data sets and analysis is provided in Box 1, with full details in Supplementary information S1 (notes). We have applied our approach to assess the Cancer Gene Census28 list, which is a manually curated set of genes that have mutations or other genomic abnormalities associated with cancer and are likely to be causative, as identified from genetic studies. This data set exemplifies large gene lists that have been generated from initial experimental exploration and that have varying degrees of biological validation. It highlights the roles of the genes in specific cancers, although many of the genes may well have roles in other malignancies, especially in view of the number of potential drug targets suggested by ongoing large-scale efforts.

Figure 1: Workflow with annotation scheme and assessment criteria.

The list of biologically relevant genes is annotated using homology to targets of approved pharmaceuticals; the molecular and cellular properties of any existing published active small molecules; three-dimensional (3D) structure, druggability or ligandability; and functional class and subcellular localization. Through the incorporation of additional disease information — for example, from The Cancer Genome Atlas (TCGA) or International Cancer Genome Consortium (ICGC)50 — therapeutic hypotheses can be derived. The data are then combined to rank potential targets based on available supporting evidence for their chemical tractability. The ranking and importance of different features will depend on the ultimate requirement of the analysis: for example, whether the aim is to propose a drug repurposing hypothesis or to identify a novel target. Similarly, certain strands of information — such as the existence of cell-penetrant chemical tools — will only be important for certain targets, in this case intracellular proteins. Mapping the information above onto pathways or cellular interaction networks provides an additional informative view of therapeutic intervention points on the pathway: for example, for combination studies.
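The ranking step described in this caption can be illustrated with a minimal sketch. The evidence flags, the weights and the `rank_targets` helper below are hypothetical and are not taken from the canSAR implementation; they only show how the relative importance of each strand of evidence might shift with the aim of the analysis (drug repurposing versus novel target identification), and how cell-penetrant tools might count only for intracellular proteins.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    gene: str
    drug_target_homology: bool   # homologous to a target of an approved drug
    has_active_compounds: bool   # published active small molecules exist
    structure_druggable: bool    # a 3D structure has a predicted druggable cavity
    intracellular: bool
    cell_penetrant_tool: bool    # only counted for intracellular proteins


# Hypothetical weights (assumption): the importance of each feature
# depends on the ultimate requirement of the analysis.
WEIGHTS = {
    "repurposing": {
        "drug_target_homology": 3,
        "has_active_compounds": 2,
        "structure_druggable": 1,
    },
    "novel_target": {
        "drug_target_homology": 0,
        "has_active_compounds": -1,  # penalize already well-explored targets
        "structure_druggable": 3,
    },
}


def score(target: TargetEvidence, aim: str) -> int:
    """Sum the weights of every evidence flag that is set for this target."""
    weights = WEIGHTS[aim]
    total = sum(w for name, w in weights.items() if getattr(target, name))
    # Cell-penetrant chemical tools matter only for intracellular proteins.
    if target.intracellular and target.cell_penetrant_tool:
        total += 1
    return total


def rank_targets(targets, aim):
    """Return targets ordered from highest to lowest score for the given aim."""
    return sorted(targets, key=lambda t: score(t, aim), reverse=True)


# Placeholder genes, not real Census entries.
gene_a = TargetEvidence("GENE_A", True, True, True, False, False)
gene_b = TargetEvidence("GENE_B", False, False, True, True, False)

print([t.gene for t in rank_targets([gene_a, gene_b], "repurposing")])
print([t.gene for t in rank_targets([gene_a, gene_b], "novel_target")])
```

With these illustrative weights the same two genes swap places depending on the aim, which is the behaviour the caption describes.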

Box 1 | Data sets and analysis

A comprehensive description of the data, analysis approaches and additional notes is provided in Supplementary information S1 (notes). Some key aspects are highlighted here.

The data set analysed here was downloaded from the homepage of the Cancer Gene Census, which is an ongoing project that manually curates genes that have mutations or other genomic abnormalities associated with cancer. A total of 479 genes were selected for the analysis. Each Census protein was manually classified into a single class (for example, enzyme, transcription factor, transcription regulator, and so on); for simplicity, all receptor tyrosine kinases have been classified as enzymes only despite also being receptors.

Most of the analyses described here used our integrated cancer database, canSAR25, which brings together biological, chemical and pharmacological data. All the information on individual genes is presented on the canSAR 'Target Synopsis' pages; for details, see Supplementary information S2 (table). Additionally, a multidisciplinary target annotation tool has been developed within canSAR that allows the annotation of up to 500 targets at a time using the types of information described within this analysis. The set of US Food and Drug Administration (FDA)-approved drugs and their manually curated molecular targets is updated and maintained within the canSAR database as a joint effort between the canSAR25 and ChEMBL database24 teams. The full list of updated drugs can be obtained from the canSAR website in the 'Browse Compounds' section. The canSAR database contains chemical bioactivity data from various resources. The bulk of the bioactivity data are from the ChEMBL database24 but some data were also obtained from the Binding Database (BindingDB)79 and the US National Cancer Institute (NCI) Developmental Therapeutics Program80, among others. The original data were obtained from primary scientific literature and curated resources, and so compounds identified are either available to purchase or details of their synthesis have been published.

Three-dimensional structural characterization of the Census proteins was obtained using protein accession cross-references in the Protein Data Bank (PDB75; July 2012) and by carrying out sequence homology searches to the most current version of the PDB using the BLASTP program81 (version 2.2.19). The ligand-bound complexes of Census protein structures were obtained from canSAR3D, a component of canSAR that includes weekly updated structural data and compound–ligand annotations.

Druggability predictions were initially obtained from the ChEMBL DrugEBIlity Database, which is part of the ChEMBL resource24, utilizing a decision-tree algorithm known as Strudle. All cavities within each structural domain were identified, and then geometric and physicochemical descriptors such as volume, accessible and buried surface area as well as the number of hydrogen-bond acceptors and donors were determined for each cavity. These properties were used by the ChEMBL Strudle algorithm, which classifies the cavities as druggable or undruggable using strict and relaxed criteria. A cavity that is classified as druggable using the strict model has properties that are consistent with binding to strictly 'rule of five' (RO5)53-compliant drugs. Cavities with physicochemical properties that are consistent with binding to a small molecule that is not necessarily RO5-compliant are classified as druggable (or tractable) according to the relaxed model. For each assessment type, the algorithm24 was trained using three-dimensional structures that bind to compounds with appropriate physicochemical properties (see Supplementary information S1 (notes) for property values). Once the algorithm was trained, the assessment could be carried out on any three-dimensional pocket regardless of whether a ligand was bound. Having used the initial predictions from the database, we manually inspected them. For targets of particular interest (such as those listed in Table 1) we also calculated the druggability using the entire PDB chains and full PDB files, as well as the alternative method SiteMap76 for confirmation.
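For orientation, the 'rule of five' that anchors the strict model can be written down directly. The sketch below applies the commonly cited RO5 thresholds (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors) to a ligand. It is not the Strudle algorithm, which assesses cavities rather than ligands and whose property values are given only in the supplementary notes; the function names and the example property values are illustrative assumptions.

```python
def ro5_violations(mol_weight: float, logp: float, h_donors: int, h_acceptors: int) -> int:
    """Count how many of the four rule-of-five criteria a ligand breaks."""
    return sum([
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # calculated logP above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ])


def is_strictly_ro5_compliant(mol_weight, logp, h_donors, h_acceptors) -> bool:
    """Strict reading (assumption): no violations at all."""
    return ro5_violations(mol_weight, logp, h_donors, h_acceptors) == 0


# Made-up property values for two hypothetical ligands.
print(is_strictly_ro5_compliant(350.0, 3.2, 2, 5))    # small, drug-like ligand
print(is_strictly_ro5_compliant(720.0, 6.1, 3, 12))   # larger, more lipophilic ligand
```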

Table 1: Druggable Cancer Gene Census proteins with no or few published active compounds

By carrying out unbiased computational analyses of disparate data, including chemoinformatics, gene expression, mutations and three-dimensional structure, we have been able to prioritize — for further experimental work — the most chemically tractable targets within the gene list. In addition, we suggest potential alternative repurposing indications for known drugs and chemical tools. Here, we have focused on the chemical tractability of targets with small-molecule drugs rather than biologics, which would require a different prioritization strategy.

Disease-relevant versus drugged classes

At the time of our analysis (July 2012), the Cancer Gene Census list contained a total of 488 genes or loci. After the removal of duplicates and genes with no curated protein sequences, a total of 479 genes from the Census were selected for the subsequent analysis. We classified the 479 genes into major functional classes (Fig. 2a), as detailed in section 3 in Supplementary information S1 (notes). For simplicity, we assigned every gene product to a single functional class, although occasionally more than one class was applicable (for example, receptor tyrosine kinases can fit into both 'enzyme' and 'receptor' classes, but here they are placed in the enzyme class). The main functional classes represented in the Census are as follows: transcription factors and regulators (29%), enzymes (25%) and enzyme regulators (7%). Transcription factors in this set comprise primarily low-structural-complexity proteins such as C2H2 zinc finger domain-, basic leucine zipper domain- and helix–loop–helix domain-containing proteins, but also include three nuclear hormone receptors (peroxisome proliferator-activated receptor-γ (PPARγ), retinoid receptor-α and nuclear receptor subfamily 4 group A3). In general, most of the Census proteins are localized in either the nucleus (49%) or the cytoplasm (25%), whereas 10% are cell membrane-associated or extracellular.

Figure 2: Functional classes of proteins from the Cancer Gene Census.

a | Distribution of the Census protein classes. Each layer of the schematic diagram represents sublevels of the functional class hierarchy. Enzymes (predominantly kinases) and transcription factors (as well as transcription regulators) constitute over half the Census. A total of 18% of proteins, here labelled 'Other', fall into 15 classes including cell adhesion proteins, cytokines, splicing factors, ubiquitin proteins and growth factors. b | Tension between biological relevance versus tractability for drug discovery. Enrichment data are calculated from the fraction of targets of approved small-molecule drugs within a particular functional class versus the fraction of Census proteins within the same class. The analysis was performed using all small-molecule drugs approved by the US Food and Drug Administration (FDA) as well as oncology-specific drugs. Positive values indicate enrichment in approved drug targets, whereas negative values represent enrichment in the Census proteins. Transmembrane proteins are over-represented in approved drug targets whereas transcription factors are under-represented. 7TM1, seven-transmembrane class 1 G protein-coupled receptors; cAMP, cyclic AMP; ETS, transcription factor ETS; GATA, transcription factor GATA; NHR, nuclear hormone receptor; PI3K, phosphoinositide 3-kinase; TCL1, T cell leukaemia/lymphoma protein 1; YEATS, YEATS domain-containing protein.

We compared the functional class distribution (Fig. 2a) of the Census proteins to that of the targets of small-molecule pharmaceuticals approved by the US Food and Drug Administration (FDA), across all therapeutic areas as well as for oncology alone29. Figure 2b highlights the functional class enrichments between the data sets. There is a greater than twofold enrichment of enzymes in the current target set29 compared to the Census proteins (Census proteins: 25%; drug targets (all therapeutic areas): 55%; drug targets (oncology): 64%). The success of enzymes as targets of launched drugs is probably because enzymes typically possess a well-defined catalytic site, and because of the relative ease of setting up chemical screening assays. This enrichment points to the need for a concerted effort, using systems-based approaches, to identify enzymes that affect oncoproteins regardless of their own oncogenic role.

Conversely, transmembrane proteins such as G protein-coupled receptors (GPCRs) and ion channels are less frequently targeted in oncology, although they have been the focus of many drug discovery projects in other therapeutic areas owing to their importance as molecular gateways to cells and their suitability for binding to small molecules. They are also substantially under-represented in the list of Census proteins (Census proteins: 5%; drug targets (all therapeutic areas): 48%; drug targets (oncology): 7%).

The enrichment of transcription factors in the Census list (Census proteins: 17%; current drug targets (all therapeutic areas): 5.6%; drug targets (oncology): 9%) supports their importance in cancer30,31, as exemplified by nuclear factor-κB (NF-κB)32, signal transducers and activators of transcription (STATs)33 and the transcription factor ETS34, but also reveals the tension between their importance in the molecular pathology of cancer versus their chemical tractability. In Fig. 2b we illustrate the enrichment of transcription factors and regulators as a group, and highlight the enrichment of nuclear hormone receptors (a subgroup of ligand-activated transcription factors) separately. Historically, transcription factors have been largely inaccessible to drug discovery efforts owing to the need to target more challenging protein–DNA or protein–protein interfaces and a general absence of an enclosed hydrophobic pocket, with the exception of nuclear hormone receptors, which have a small-molecule ligand-binding domain and thus can be more readily targeted with low-molecular-weight drugs, as exemplified by tamoxifen (an oestrogen receptor (ER) antagonist) and flutamide (an androgen receptor (AR) antagonist). Nonetheless, recent progress has been made with stabilized peptides that have the potential to target protein–protein interfaces in transcription factor complexes; for example, in the Notch pathway31,35. Alternatively, transcription factors could be indirectly targeted in cancer using systems biology approaches to analyse up- and/or downstream members of pathways involving transcription factors to identify proteins that are more chemically tractable36.
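The class-level comparison in this section can be reproduced from the percentages quoted in the text. The exact enrichment formula behind Fig. 2b is not stated here, so the sketch below assumes a log2 ratio of the drug-target fraction to the Census fraction, which gives positive values for classes enriched among approved drug targets and negative values for classes enriched in the Census. The function name and the dictionary layout are illustrative.

```python
import math

# Percentages quoted in the text:
# (Census proteins, approved drug targets in all therapeutic areas, oncology drug targets)
CLASS_PERCENTAGES = {
    "enzymes": (25.0, 55.0, 64.0),
    "transmembrane proteins": (5.0, 48.0, 7.0),
    "transcription factors": (17.0, 5.6, 9.0),
}


def log2_enrichment(drug_target_pct: float, census_pct: float) -> float:
    """Assumed metric: log2 of the drug-target fraction over the Census fraction."""
    return math.log2(drug_target_pct / census_pct)


for name, (census, all_areas, oncology) in CLASS_PERCENTAGES.items():
    print(
        f"{name}: all areas {log2_enrichment(all_areas, census):+.2f}, "
        f"oncology {log2_enrichment(oncology, census):+.2f}"
    )
```

Under this assumption a value above +1 corresponds to the 'greater than twofold' enrichment reported for enzymes, and transcription factors come out negative for both drug-target sets.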

Opportunities for drug repurposing

One of the advantages of integrative and unbiased large-scale analyses is the ability to identify links in existing knowledge that may not always be readily apparent, such as potential opportunities to repurpose existing drugs or chemical tools. Examples of such studies have been reported for infectious37,38 and inflammatory39 diseases.

Using precedence or homology to targets of approved drugs29, we have identified proteins from the Cancer Gene Census that are themselves targets of approved drugs or that are members of the same protein family as an existing drug target and thus can be hypothesized to be druggable. These proteins, examples of drugs and associated therapeutic indications are detailed in section 4 in Supplementary information S1 (notes). We have found that a total of 28 Census proteins have >50% homology to a known drug target. Twenty-five of these proteins are themselves targets of launched drugs, and of these 25 proteins, three (PPARγ, DNA methyltransferase 3A and aldehyde dehydrogenase) do not currently have small-molecule drugs indicated for cancer therapy. A fourth protein, thyroid-stimulating hormone receptor (TSHR), is the target of a biologic that has been indicated as a diagnostic and adjuvant therapy to avoid hypothyroidism after thyroid ablation in patients with thyroid cancer. Antagonists of this receptor have not been indicated for the treatment of cancer.
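The precedence-based filter described above can be sketched as two simple selections. The record layout and function names are hypothetical; the >50% homology threshold comes from the text, and the two named entries reflect what the text states about PPARγ (a drug target with no small-molecule drug indicated for cancer) and SMO (the target of vismodegib, approved for basal cell carcinoma). The homology percentage of the third, placeholder entry is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrecedenceRecord:
    protein: str
    is_drug_target: bool           # itself a target of a launched drug
    homology_pct: float            # homology to the nearest approved-drug target
    has_oncology_small_molecule: bool


def druggable_by_precedence(records, min_homology=50.0):
    """Proteins that are drug targets or exceed the homology threshold."""
    return [r for r in records if r.is_drug_target or r.homology_pct > min_homology]


def repurposing_candidates(records):
    """Existing drug targets with no small-molecule drug indicated for cancer."""
    return [
        r.protein
        for r in records
        if r.is_drug_target and not r.has_oncology_small_molecule
    ]


records = [
    PrecedenceRecord("PPARG", True, 100.0, False),
    PrecedenceRecord("SMO", True, 100.0, True),
    PrecedenceRecord("PROTEIN_X", False, 35.0, False),  # placeholder, invented value
]

print([r.protein for r in druggable_by_precedence(records)])
print(repurposing_candidates(records))
```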

PPARγ is a type II nuclear hormone receptor that has been reported in the Census because of the observation of a dominant translocation in patients with follicular thyroid cancer28; its 'insufficiency' through reduced protein levels or activity drives oncogenesis40. It is also the target for the thiazolidinedione class of antidiabetic drugs, which are currently used for the treatment of type 2 diabetes. Studies have demonstrated the antitumour activity of PPARγ agonists, which is mediated through the transactivation of genes that regulate cell proliferation, apoptosis and differentiation41. It has been suggested that treatment with a PPARγ agonist delays the progression of thyroid carcinogenesis40. There are various clinical trials currently underway involving PPARγ agonists in cancers including malignant liposarcoma42 and non-small-cell lung cancer (ClinicalTrials.gov identifiers: NCT01199068; NCT01199055). Based on our analysis, we hypothesize a potential new application for PPARγ agonists in follicular thyroid cancer.

TSHR is a class 1 GPCR and is the target of the biologic recombinant human TSHα. The Census reports a dominant missense mutation in TSHR that is associated with thyroid adenoma, and studies have shown that constitutive activation of TSHR by somatic mutations can lead to toxic thyroid adenoma43,44,45. TSHR mRNA has been studied as a potential biomarker of circulating thyroid carcinoma cells46. In several studies, small-molecule antagonists and inverse agonists of TSHR have been shown to block stimulating antibodies for the potential therapy of hyperthyroidism47,48. These combined findings suggest that TSHR antagonists may have therapeutic potential in thyroid cancers.

Smoothened (SMO) is the target of vismodegib, which has recently been approved for the treatment of basal cell carcinoma49. SMO is listed in the Census because a dominant missense mutation in the SMO gene has been associated with basal cell carcinoma. Interestingly, by exploring the TCGA database of copy number variation and gene expression data from 599 patients with glioblastoma multiforme50, we have identified that the SMO gene exhibits copy number variation ranging from three to seven copies in 35% of patients. Additionally, according to the same database, SMO is overexpressed in 93% of the patients (at least twofold overexpression in comparison with matched normal tissue). If these findings are validated and found to be clinically significant, they would suggest another application for vismodegib in the treatment of SMO-amplified glioblastoma multiforme.
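The two patient-level summaries quoted for SMO (the share of patients with three or more gene copies, and the share with at least twofold overexpression relative to matched normal tissue) amount to simple threshold counts. The sketch below shows the calculation on small made-up lists; it does not use TCGA data, and it assumes linear-scale expression values and hypothetical function names.

```python
def fraction_amplified(copy_numbers, min_copies=3):
    """Fraction of patients whose gene copy number is at least `min_copies`."""
    return sum(c >= min_copies for c in copy_numbers) / len(copy_numbers)


def fraction_overexpressed(tumour_expr, normal_expr, min_fold=2.0):
    """Fraction of patients with tumour/normal expression ratio >= `min_fold`.

    Assumes linear-scale expression and one matched normal value per tumour.
    """
    folds = [t / n for t, n in zip(tumour_expr, normal_expr)]
    return sum(f >= min_fold for f in folds) / len(folds)


# Toy values for ten hypothetical patients (not real data).
toy_copy_numbers = [2, 2, 3, 5, 2, 7, 2, 2, 4, 2]
# Toy matched tumour/normal expression for four hypothetical patients.
toy_tumour = [10.0, 4.0, 9.0, 2.0]
toy_normal = [4.0, 2.0, 3.0, 2.0]

print(fraction_amplified(toy_copy_numbers))
print(fraction_overexpressed(toy_tumour, toy_normal))
```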

Additional applications can be envisaged for drugs that are in development or even earlier chemical tool compounds. For example, fostamatinib — an inhibitor of spleen tyrosine kinase (SYK) that is currently in development for the treatment of rheumatoid arthritis — has shown clinical activity in non-Hodgkin's lymphoma and chronic lymphocytic leukaemia51, and was found to induce tumour cell death in retinoblastoma models52. SYK is causally implicated in all of these cancer types and was found to have a role in head and neck squamous cell carcinoma77. Based on the oncogenic mutations reported in the Census, we propose that SYK inhibitors such as fostamatinib may also be useful in the treatment of myelodysplastic syndromes and peripheral T cell lymphoma.

Scale and utility of structural data

The availability of three-dimensional structures for a protein greatly empowers small-molecule drug discovery, as structural information can be used in hit generation and lead optimization as well as in understanding mechanisms of drug binding and resistance. Where available, multiple structural snapshots of the same protein provide an enhanced level of information to support drug discovery, as they allow the exploration of alternative functionally relevant conformations. In the absence of appropriate experimental structures, homology models have proved to be useful tools for this purpose. However, the ability to generate informative models decreases with lower homology and with fewer available relevant structural templates (see the notes on homology in Supplementary information S1 (notes)).

Our analysis shows that out of the 479 Census proteins, 257 (54%) are structurally characterized (Fig. 3a). By contrast, only approximately one-quarter of the human proteome has been structurally characterized. The higher degree of structural characterization of Census proteins is probably because many of these proteins have a high biological importance and hence have been more extensively studied. However, a detailed inspection reveals a grossly unequal representation of structural data. The distribution of the number of structures per protein (see section 5 in Supplementary information S1 (notes)) shows that out of the 257 structurally characterized proteins, one-third have only a single structure determined, whereas 14 better-known cancer proteins (such as RAS, heat shock protein 90 (HSP90) and tumour suppressor p53) each have more than 30 structural snapshots determined. It is important to note that many proteins have only been structurally characterized to a limited degree, and for some (for example, fibroblast growth factor receptor 3) the structure of the catalytic domain — which is important for drug-binding — has not yet been determined. Indeed, only 20% of the amino acid content of Census proteins has been structurally determined, highlighting the degree of partial structural characterization.

Figure 3: Structural characterization of proteins from the Cancer Gene Census.

a | More than half (257) of the Census proteins are structurally characterized, 119 have homology to a three-dimensional structure, whereas 103 cannot be mapped to a three-dimensional structure at significant homology. The shading indicates an increasing level of sequence homology as a percentage between Census proteins and the nearest structure in the Protein Data Bank (PDB), with the grey portion representing those proteins that have no significant homology to any PDB structures, rendering structure-based druggability predictions impossible. b | Structural druggability by target class. Each pie chart represents the proportion of druggable targets for a target class. The size of each individual pie represents the number of structurally characterized proteins in that target class, the largest being enzymes — with 98 structurally characterized enzymes — and the smallest being histone-binding proteins (with three structures). Proteins that are druggable and 'druggable by homology' (with ≥50% homology to a druggable structure in the PDB) are shown in green and orange respectively, whereas proteins for which structures are available but that are not predicted to be druggable are represented in red.

For about one-third of the structurally characterized Census proteins (79 of 257), structures have been determined in complex with small-molecule ligands; the majority of these proteins are enzymes. For 11 proteins, structural complexes with 10 or more different ligands have been independently determined. However, the majority of the structurally characterized proteins (178 out of 257) have not been solved in complex with any ligand (see section 5 in Supplementary information S1 (notes)). Some of these proteins may not naturally bind to small-molecule ligands; however, 36 of these proteins are enzymes, representing a set of cancer-associated proteins for which ligand-bound structures would be greatly beneficial.

Structural annotation can be expanded by identifying structurally characterized homologues, thus allowing the indirect analysis of otherwise uncharacterized proteins. A further 119 Census proteins can be structurally annotated in this way (Fig. 3a); of these 119 proteins, 69 have 50–89% homology to the nearest determined structure, which makes them suitable candidates for potentially useful homology modelling. Conversely, 103 Census proteins cannot currently be structurally annotated (see Supplementary information S1 (notes) for structural characterization criteria). These Census proteins may be structurally uncharacterized owing to technical difficulties or simply because of a relative lack of scientific interest in them so far. Nonetheless, for the structurally uncharacterized fraction of the Census proteins, the future availability of three-dimensional structures would help considerably in understanding their functional roles in cancer and would probably aid drug discovery efforts. Furthermore, efforts to provide multiple structural snapshots of Census proteins — determined under different biologically relevant conditions — would be both informative and valuable for objectively assessing the suitability of these proteins to bind small-molecule drugs.
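The counts reported in this section partition the Census cleanly, which can be checked with a few lines of arithmetic. All numbers below are taken from the text; the variable names and the `pct` helper are illustrative.

```python
TOTAL_CENSUS = 479

structurally_characterized = 257   # experimental structure available
homology_only = 119                # annotated through a structurally characterized homologue
unmapped = 103                     # no significant homology to any PDB structure

liganded = 79                      # structures solved with a small-molecule ligand
unliganded = 178                   # structurally characterized, but no ligand-bound structure
modelling_candidates = 69          # 50-89% homology to the nearest determined structure

# The three structural categories should account for every Census protein.
assert structurally_characterized + homology_only + unmapped == TOTAL_CENSUS
# Liganded and unliganded proteins should partition the characterized set.
assert liganded + unliganded == structurally_characterized


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print(pct(structurally_characterized, TOTAL_CENSUS))   # share with an experimental structure
print(pct(liganded, structurally_characterized))       # share of those with a bound ligand
print(pct(modelling_candidates, homology_only))        # share of homology-only proteins at 50-89%
```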

Chemical tractability using structure

Although the precedence-based druggability assessment described above provides a useful indication of druggability and also identifies drug repurposing opportunities, this approach has its limitations. First, it highlights protein families that have been drugged successfully in the past; therefore, it is historically biased as it ignores the possibility of a druggable but as yet undrugged protein family. Second, it assumes that all members of a given protein family are equally druggable. However, using the structural data available, the application of structure-based druggability predictions can extend the druggability boundaries and help to address some of these caveats.

Briefly, the structure-based druggability assessment identifies all cavities on a given three-dimensional structure and assesses their likely druggability based on physicochemical parameters that are independent of the homology of the protein to known drug targets. Two models of druggability predictions are used here: strict druggability (in which the cavity is compatible with binding to a 'rule of five'53 (RO5)-compliant drug-like molecule) and the more relaxed chemical tractability (see Box 1 and Supplementary information S1 (notes)). Interestingly, we have found that a total of 103 proteins from the Cancer Gene Census are predicted to be druggable using the strict model, which can be extended to 211 proteins using the relaxed chemical tractability model.
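To make the strict criterion more concrete, the following minimal sketch checks a ligand against the rule of five as it is commonly stated (molecular weight of at most 500 Da, calculated logP of at most 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors). It is not the cavity-based predictor used in this analysis, which scores pockets rather than ligands. The `Descriptors` container, the function names, the default of zero permitted violations and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed ligand descriptors (hypothetical container, not from the paper)."""
    mol_weight: float       # daltons
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits the ligand exceeds."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def is_ro5_compliant(d: Descriptors, max_violations: int = 0) -> bool:
    """Treat a ligand as compliant if it stays within the allowed number of violations.

    The default of 0 is an assumption; some conventions tolerate one violation.
    """
    return ro5_violations(d) <= max_violations


# Illustrative values only, not taken from the analysis.
small_ligand = Descriptors(mol_weight=350.4, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large_ligand = Descriptors(mol_weight=812.0, logp=6.3, h_bond_donors=4, h_bond_acceptors=12)
```

Keeping the violation count separate from the pass/fail decision makes it easy to compare a strict threshold with a more permissive one, in the same spirit as the two models described above.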

Examining the druggability results by functional class (Fig. 3b) reveals that enzymes are the most druggable among the Census proteins: 63% of enzymes with known structures are predicted to be druggable using the strict model. Conversely, 94% of transcription factors with a known structure are predicted to be undruggable using the strict model.

The strength of the structure-based algorithm is that it allows the identification of novel and potentially druggable proteins that lie outside its historical training set, such as histone-binding proteins (Supplementary information S1 (notes)). However, this approach is limited by incomplete structural characterization of the proteins. Out of the 27 enzymes with known structures that fail the druggability prediction assessment, we have found that structures of the possible druggable domains are not available for 15 enzymes (for example, the catalytic domain of fibroblast growth factor receptor 3) but these enzymes can be annotated as being druggable because of their homology to druggable structures. Five additional enzymes are predicted to be chemically tractable using the relaxed model. These include the oncoprotein MDM2, which has two druggable domains according to this model, both of which are targets of inhibitors that are currently in Phase I trials54.

A further potential limitation of de novo structure-based druggability assessment is that some proteins can be predicted to be undruggable because the available structures are in an undruggable conformation. For example, isocitrate dehydrogenase 1 (IDH1) has multiple structures in three major conformations, only one of which is druggable using the strict model. This transient site phenomenon, which has also been identified in other proteins55, can be partially addressed by using as many structural snapshots of the same protein as possible and by including homologues to aid the understanding of possible variations in the three-dimensional protein structure.
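The effect of using several structural snapshots can be sketched as a simple aggregation: a protein is flagged if any of its available structures contains a predicted druggable cavity. The record layout, the function name and the placeholder identifiers below are assumptions made for illustration; the per-structure calls themselves would come from the druggability predictor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (protein, structure_id, has_druggable_cavity): placeholder records, not real predictions
StructureCall = Tuple[str, str, bool]


def summarize_by_protein(calls: Iterable[StructureCall]) -> Dict[str, dict]:
    """Collapse per-structure druggability calls into one summary per protein."""
    per_protein: Dict[str, List[bool]] = defaultdict(list)
    for protein, _structure_id, druggable in calls:
        per_protein[protein].append(druggable)
    return {
        protein: {
            "n_structures": len(flags),
            "n_druggable": sum(flags),
            "druggable_in_any": any(flags),   # at least one conformation is druggable
            "druggable_in_all": all(flags),   # every available conformation is druggable
        }
        for protein, flags in per_protein.items()
    }


calls = [
    ("PROT_A", "conf_1", False),
    ("PROT_A", "conf_2", False),
    ("PROT_A", "conf_3", True),    # only one conformation exposes the cavity
    ("PROT_B", "conf_1", False),
]
summary = summarize_by_protein(calls)
```

Reporting both `druggable_in_any` and `druggable_in_all` distinguishes a transient site, such as the one described for IDH1, from a pocket that is present in every available structure.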

Chemical landscape for Census proteins

Small-molecule chemical tools56 can substantially enhance biological validation and mechanistic evaluation, and in particular aid the understanding of the specific functional role of a protein in cancer. Additionally, the presence of published chemical screens for a target or a very close homologue indicates the existence of viable binding or biochemical assays — a key component needed for expediting the therapeutic exploration of a potential target. To identify which proteins from the Cancer Gene Census have active compounds in the literature that can be potentially utilized as tools, we used canSAR25 — an integrated database that brings together biological, chemical and pharmacological data on all human proteins — to report the number of active compounds for each Census protein and its homologues. In addition to identifying possible tool compounds, this analysis helps in predicting the chemical tractability of the potential targets.

Submicromolar active compounds identified in binding or biochemical assays have been reported in the literature for 86 of the 479 Census proteins (see section 7 in Supplementary information S1 (notes)). Of these, submicromolar drug-like compounds have been reported for 73 proteins. Generally, we can hypothesize that chemical hits can be mapped based on target homology. An additional four Census proteins have at least 50% sequence homology to a target with active compounds. Assay protocols have been published for all of these putative targets; furthermore, 69 of these targets have compounds that are active in cellular assays and so they are likely to be cell-penetrant chemical tools. An analysis of compounds that are active against Census proteins is provided in Supplementary information S1 (notes).

Multifaceted assessment of Census proteins

As seen above, pathogenic importance in cancer does not always correlate with chemical tractability. We have ranked the members of the Cancer Gene Census list based on evidence of chemical tractability: namely, family precedence, availability of submicromolar compounds and structure-based druggability (see Supplementary information S2 (table)). Figure 4 shows the overlap among these three pieces of evidence. We have found that a total of 132 Census proteins have at least one piece of evidence for chemical tractability (Fig. 4) and that 173 can be designated as tractable based on ≥50% homology to a tractable protein.

Figure 4: Multifaceted approach to identify suitable targets for drug discovery from the Cancer Gene Census.

This diagram illustrates the integration of evidence for the chemical tractability of Census proteins. There is evidence of chemical tractability for a total of 132 Census targets. Twenty-five are targets of approved drugs (represented by the pink region). For all but one of these targets (thyroid-stimulating hormone receptor (TSHR), the target of a biotherapeutic drug) active small-molecule compounds have been published. For a total of 86 Census proteins, reports on active compounds with submicromolar activity in biochemical or binding assays have been published (represented by the blue region). We have identified 103 Census proteins that have structures with predicted druggable cavities (represented by the green region) using the strict druggability criteria (see Box 1 and Supplementary information S1 (notes)). Of particular interest, there are 46 druggable Census proteins for which there are few or no published compounds in the literature. When combined with disease-specific information and knowledge of targets and/or pathways, these alternative pieces of evidence produce objective criteria for selecting the best targets for further validation. The novel potentially druggable targets could indicate new biological areas for chemical exploration but they also present a higher potential drug development risk, which can be reduced by the identification and use of chemical tools. IP, intellectual property.

Interestingly, we have identified 27 Census proteins for which there are active compounds in the literature that have submicromolar potency, despite the fact that these proteins were not predicted to be druggable by either structure- or precedence-based methods (Fig. 4). No three-dimensional structures have been determined for 11 of these proteins; the remainder (for example, the serine/threonine kinase BUB1B) are only partially structurally determined and the druggable region is missing, further highlighting the importance and limitations of utilizing three-dimensional structural data.

The combination of these orthogonal assessments — that is, structure-based, ligand-based and precedence-based assessments — can more comprehensively identify possible tractable targets for future drug discovery (Fig. 4). Importantly, using structure-based assessments, we have identified 46 Census proteins that are considered to be druggable, but for which few or no active compounds have yet been published. Table 1 details these 46 targets grouped into functional classes. Of these, 16 are enzymes — predominantly methyltransferases, helicases and ligases. Other targets encompass a diverse range of functions, and include histone-binding proteins and regulatory subunits of enzymes.
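The way these three lines of evidence combine can be illustrated with plain set operations. The sketch below uses placeholder gene identifiers rather than the real lists (which are given in Supplementary information S2 (table)), and it simplifies 'few or no published compounds' to 'no compounds'; all variable names are assumptions.

```python
from typing import Dict, Set

# Placeholder identifiers; the real sets come from Supplementary information S2.
evidence: Dict[str, Set[str]] = {
    "precedence": {"GENE_A", "GENE_B"},
    "compounds": {"GENE_A", "GENE_B", "GENE_C", "GENE_F"},
    "structure": {"GENE_A", "GENE_C", "GENE_D", "GENE_E"},
}

# Proteins with at least one piece of evidence for chemical tractability.
any_evidence = set().union(*evidence.values())

# Proteins supported by all three assessments.
all_three = set.intersection(*evidence.values())

# Predicted druggable by structure, but with no published active compounds.
druggable_unexplored = evidence["structure"] - evidence["compounds"]

# Active compounds exist, yet neither structure nor precedence predicted druggability.
active_but_unpredicted = evidence["compounds"] - evidence["structure"] - evidence["precedence"]


def evidence_count(gene: str) -> int:
    """Number of independent assessments that support a gene."""
    return sum(gene in members for members in evidence.values())


# Rank by amount of supporting evidence, breaking ties alphabetically.
ranked = sorted(any_evidence, key=lambda g: (-evidence_count(g), g))
```

Counting the supporting assessments per gene is only one possible ranking scheme; a weighted score would be a natural alternative if some types of evidence were considered more reliable than others.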

Table 1: Druggable Cancer Gene Census proteins with no or few published active compounds

Some of the Census genes are oncogenes and others are tumour suppressors. Chemical intervention strategies for addressing these two classes will be different. It is easier, for example, to inhibit an abnormally activated enzyme but much more difficult to correct a loss-of-function phenotype. Out of the 46 targets, 26 are annotated by the Census as oncogenes in the reported cancer type, 19 are annotated as tumour suppressors28, and one — CBL, encoding an E3 ubiquitin protein ligase — is annotated as having both functions. Indeed, it is becoming increasingly common to identify genes that can act both as oncogenes and as tumour suppressors, depending on the specific genetic context57,58,59.

When there is no small-molecule chemical matter or crystal structure available for a protein, knowledge of close homologues provides a starting point in the initial search for chemical tools or potential druggable proteins. By mapping the annotation from homologous targets (with ≥50% homology), we have found that the number of potentially druggable proteins without chemical screening data increases from 46 to 83. Furthermore, the number of proteins with active compounds increases to 90, providing more chemical options for exploring cancer biology.
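A minimal sketch of this homology-based mapping is shown below, assuming that pairwise homology percentages are already available. The 50% cutoff follows the text; the data layout, the annotation labels and the function name are hypothetical, and inherited annotations are kept separate from direct evidence so that the two are never confused.

```python
from typing import Dict, List, Set, Tuple

HOMOLOGY_CUTOFF = 50.0  # percent, as used in the text

# Annotations observed directly for each protein (placeholder data).
direct: Dict[str, Set[str]] = {
    "PROT_A": {"active_compounds"},
    "PROT_B": {"druggable_structure"},
    "PROT_C": set(),
    "PROT_D": set(),
}

# (query, homologue, percent homology): placeholder values.
homologues: List[Tuple[str, str, float]] = [
    ("PROT_C", "PROT_A", 72.0),
    ("PROT_C", "PROT_B", 41.0),   # below the cutoff, ignored
    ("PROT_D", "PROT_B", 50.0),   # exactly at the cutoff, accepted
]


def transfer_annotations(
    direct: Dict[str, Set[str]],
    homologues: List[Tuple[str, str, float]],
    cutoff: float = HOMOLOGY_CUTOFF,
) -> Dict[str, Set[str]]:
    """Return annotations inherited from homologues at or above the cutoff."""
    inferred: Dict[str, Set[str]] = {protein: set() for protein in direct}
    for query, homologue, percent in homologues:
        if percent >= cutoff and query in inferred:
            # Only direct evidence propagates (no chaining through inferred annotations),
            # and a protein inherits only what it does not already have.
            inferred[query] |= direct.get(homologue, set()) - direct[query]
    return inferred


inferred = transfer_annotations(direct, homologues)
```

Restricting the transfer to direct evidence prevents an annotation from spreading along a chain of progressively more distant homologues.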

Potential novel druggable targets

Of particular interest are the 46 cancer-associated targets (Table 1) that are predicted to be druggable and are structurally characterized but, as far as can be gleaned from the literature, appear to have been subject to little or no small-molecule screening. These potentially druggable but chemically unexplored proteins are promising potential drug targets. The existence of three-dimensional structures for these proteins makes them excellent candidates for structure-based hit identification, for example using fragment-based approaches. Some of these targets have only one druggable structure available, and further structural characterization may support or contradict the evidence for the existence of a chemically tractable pocket. Nonetheless, the list contains some interesting proteins that merit further investigation. It is worth noting that this list is based on the strict definition of druggability and can be expanded to 143 targets when considering a less stringent definition (see Supplementary information S2 (table)). Of considerable interest, 27 of the 46 targets (the 26 annotated as oncogenes together with CBL, which is annotated with both roles) are labelled in the Cancer Gene Census as having dominant genetics, and thus small molecules that could block their activity would be especially desirable as chemical tools. We describe some examples of these targets below (Fig. 5).

Figure 5: Examples of predicted druggable cancer targets from the Cancer Gene Census.

Three-dimensional structures are shown of four selected examples of targets that are predicted to be druggable but for which active small-molecule inhibitors have not yet been published. The druggable cavity is displayed as a surface. The Protein Data Bank (PDB) code of the displayed structure is given after the name. a | Guanine nucleotide-binding protein Gs α-subunit isoform (GNAS); a somatic, dominant activating mutation in this gene is implicated in pituitary adenoma. b | Isocitrate dehydrogenase 1 (IDH1); the predicted druggable cavity is the α-ketoglutarate binding site rather than the cofactor NADP binding site; a somatic, dominant change-of-function mutation in this gene is implicated in type II glioma, glioblastoma and leukaemia. The druggable cavity is only fully formed in some conformations of this protein. c | Soluble calcium-activated nucleotidase 1 (CANT1); CANT1 is androgen-regulated, and is overexpressed in prostate cancer; reduction of CANT1 expression reduces prostate cell proliferation. d | Transcriptional activator BRG1 (also known as SMARCA4); SMARCA4 has a potential dual role as both a tumour suppressor and an oncogene in multiple cancers, including gastric and ovarian cancer.

The guanine nucleotide-binding protein Gs α-subunit isoform GNAS (Fig. 5a) is identified in the Census as having a dominant activating mutation in pituitary adenoma28. It is involved in the hormonal regulation of adenylyl cyclase, thus stimulating the synthesis of cyclic AMP. Further activating mutations of the GNAS gene have been identified in kidney, thyroid, adrenocortical, colorectal and Leydig cell tumours60. Small-molecule inhibitors of this enzyme would therefore serve as tools for validating the role of GNAS in the biology of these cancers and may have potential therapeutic applications.

The enzyme IDH1 (Fig. 5b) is a cytosolic dehydrogenase that is involved in the third step of the citric acid cycle. Mutations in the IDH1 gene are found in 80% of grade II–III gliomas and secondary glioblastomas in humans, and have more recently been linked to various types of leukaemia, including acute myeloid leukaemia61. IDH1 mutations result in loss of the enzyme's ability to catalyse the conversion of isocitrate to α-ketoglutarate; instead, mutant proteins gain a neomorphic enzymatic activity, catalysing the reduction of α-ketoglutarate to the 'oncometabolite' 2-hydroxyglutarate, which plays a part in the development and progression of malignant brain tumours62. Despite the growing interest in the role of this enzyme, particularly in gliomas63, at the time of our analysis no chemical tool compounds had been published to help understand its biology. However, after the completion of our analysis, the results of a high-throughput screen of more than 3,000 compounds against IDH1 were deposited into the PubChem database64, heralding the beginnings of efforts towards the chemical modulation of this important target.

Calcium-activated nucleotidase 1 (CANT1) (Fig. 5c), a member of the apyrase family, is a soluble nucleotidase that preferentially hydrolyses di- and triphosphates in a calcium-dependent manner65. Various CANT1–ETV4 (ETS variant 4) fusion transcripts have been identified in prostate cancer28,66. In addition, CANT1 is overexpressed in prostate tumours, and a reduction in CANT1 expression in prostate cancer cell lines results in decreased proliferation and migration67. The catalytic site of CANT1 is predicted to be druggable based on our analysis, and target validation studies would benefit from the availability of a small-molecule chemical inhibitor of CANT1.

A family of related enzymes, namely DEAD box helicases (DEAD box helicase 5 (DDX5), DDX6, DDX10, DDX17 and eukaryotic translation initiation factor 4A (EIF4A)), are listed in the Census because activating mutations in these enzymes have been implicated in several solid cancers and leukaemias28. According to our analysis, the catalytic domain of this family is highly flexible and thus the formation of a druggable cavity is dependent on conformational rearrangement. The paucity of chemical probe data for members of this family may be partly due to this transient binding site or to difficulties in assay development. Nonetheless, the fact that this protein family is frequently associated with cancer, often with activating mutations, probably warrants investment into biological target validation, assay development and compound screening to identify inhibitors of these enzymes.

A total of 19 of the predicted druggable proteins in Table 1 are products of tumour suppressor genes. Therapeutic targeting of these proteins using small molecules will be more complex as it will require agonists or activators, or the modulation of an alternative protein that regulates their function. BRG1 (also known as SMARCA4) (Fig. 5d) is a transcriptional activator, and reported mutations have indicated its role as a tumour suppressor in various cancers68. It has two druggable domains: the AAA ATPase domain and a bromodomain. Developing small-molecule chemical tools to examine the specific roles of these domains would enhance our ability to understand the role of SMARCA4 in cancer. Moreover, it has been shown that rather than acting solely as a tumour suppressor, SMARCA4 is involved in senescence and in the activation of p53 (Ref. 69). SMARCA4 has also been shown to be overexpressed in various cancers70,71. This suggests that small-molecule modulators of SMARCA4, either activators or inhibitors, would be useful as chemical tools and could have therapeutic potential.

A network view of evidence-based assessment

Pathogenic driver genes in cancer function within complex protein interaction networks12. Mapping the information discussed in this present study onto a protein interaction network has the potential to allow the identification of key druggable intervention points and also to help pinpoint potential alternative targets that in turn regulate undruggable disease drivers. Figure 6 shows an interaction network of the proteins from the Cancer Gene Census; the interaction network is annotated according to biological roles (Fig. 6a), the precedence- and chemistry-based approaches (Fig. 6b), the structure-based assessment approaches (Fig. 6c) and, finally, a combination of all these annotations (Fig. 6d).

Figure 6: A network view of evidence-based assessment of proteins from the Cancer Gene Census.

A protein interaction network of the Census proteins has been generated by querying large public protein-interaction databases (see Supplementary information S1 (notes) for details). Most Census proteins are connected within a broad interaction network. However, 97 are not directly connected; these are shown as singletons at the bottom of the figures. The size of the node reflects the number of its first-degree interactions. a | The network is annotated according to biological role (dominant or recessive genetics) as defined in the Census report. b | The network is annotated based on precedence (targets of approved drugs) and published chemistry data. c | The network is annotated according to the structure-based druggability assessment. d | The diagram depicts the combined annotation of all of the above networks.

The resulting global interaction network reflects biological processes that are important to multiple cancers, from which some interesting patterns emerge with respect to therapeutic opportunities. With regard to biological role, oncoproteins and tumour suppressors tend to occupy geographically distinct regions of the network: some subnetworks are predominantly oncogenic, whereas others are predominantly tumour-suppressive (Fig. 6a). Targets of approved pharmaceuticals tend to be well connected, but interestingly are not major hubs of the network, and they primarily cluster within the same oncogenic subnetwork (Fig. 6b). This partially reflects a historical bias, as molecularly targeted oncology drugs are typically inhibitors or antagonists that are aimed at blocking the function of oncoproteins, which are clustered into a single major oncogenic subnetwork. However, Fig. 6b also shows that this historical activity appears to have neglected, to a large degree, other oncoproteins in smaller subnetworks. This finding poses the question of whether the previous focus of drug discovery efforts within limited cancer subnetworks may help to explain the high rates of emerging resistant clones that may exploit alternative routes through the global network9,10. Despite the fact that historical drug discovery activity has focused on limited subnetworks, we have found that targets that are predicted to be druggable fall within multiple subnetworks — both oncogenic and tumour-suppressive (Fig. 6c).

Finally, combining all of the above annotations reveals druggable oncogenic subnetworks outside the regions of historical bias (Fig. 6d). This observation could provide support for approaches that explicitly aim to target these alternative, neglected subnetworks in future drug discovery programmes. Furthermore, applying the above analysis to a network derived from data relating to a specific cancer type may similarly highlight key druggable subnetworks in that particular cancer, and this could prove to be useful in identifying targets for rational drug combinations12.
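The node-level part of this network annotation can be sketched with a plain adjacency map: node size corresponds to the number of first-degree interactions, unconnected proteins are reported as singletons, and the annotations are then combined to list oncoproteins that are predicted to be druggable but are not targets of approved drugs. The toy network, the annotation dictionaries and the function name are placeholders; the sketch does not attempt to delineate subnetworks.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, keeping unconnected nodes as empty entries."""
    adjacency: Dict[str, Set[str]] = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Placeholder network and annotations, not taken from the analysis.
nodes = ["P1", "P2", "P3", "P4", "P5"]
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
role = {"P1": "oncogene", "P2": "tumour suppressor", "P3": "oncogene",
        "P4": "oncogene", "P5": "oncogene"}
drug_target = {"P1"}                    # targets of approved drugs
predicted_druggable = {"P1", "P3", "P5"}

adjacency = build_adjacency(nodes, edges)

# Number of first-degree interactions per protein (the node size in the figure).
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}

# Proteins with no interactions inside the network.
singletons = sorted(n for n, d in degree.items() if d == 0)

# Oncoproteins predicted to be druggable that are not targets of approved drugs.
neglected = sorted(
    n for n in adjacency
    if role.get(n) == "oncogene" and n in predicted_druggable and n not in drug_target
)
```

Grouping the resulting nodes into subnetworks would require an additional step, such as computing connected components or applying a community-detection method, which is outside the scope of this sketch.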


The global, systematic, objective and multidisciplinary computational analysis presented here allows for the effective, unbiased and data-driven assessment and prioritization of large, biologically compelling gene lists for the purpose of drug discovery. This approach, applied here to the Cancer Gene Census as an exemplar gene list, can be adapted to any gene set of interest that may emerge from large-scale omics studies, functional screening initiatives or GWAS39,72 from any therapeutic area. The full data set of 479 Census genes, with all the annotations from the analysis presented here, is available in Supplementary information S2 (table).

We emphasize that the large-scale approach described here is not a substitute for a detailed biological understanding of individual pathogenic targets and pathways; rather, it can be used alongside, and indeed inform, such classical in-depth studies. Currently, the selection of targets for validation tends to be an ad hoc process that does not take advantage of the volume, breadth and depth of multidisciplinary data now available from large-scale initiatives worldwide.

As detailed biological validation of each individual target is a challenging, long and expensive process26,27, objective prioritization to a more manageable gene list is essential. We have shown that, by integrating information from different large-scale research initiatives (comprising information on protein functional class, homology to targets of approved drugs, three-dimensional structure and the existence of published active small molecules), we can effectively annotate a biologically and pathogenically important gene list containing potential drug targets, exemplified here by the Cancer Gene Census. We have also detected potential 'quick wins' through the identification of targets for repurposing known drugs or active chemical tool compounds that can be tested for activity in cancer models. Such small-molecule probe compounds have the potential to facilitate biological research and target validation and act as pathfinders for drug discovery7. The ability to identify published active compounds rapidly and systematically makes hypothesis testing using small-molecule chemical tools achievable without needing initial medicinal chemistry investment.

Of interest, our analysis has revealed a tension between targets that are chemically tractable with current medicinal chemistry capabilities and those that are biologically important in cancer. Many oncoproteins are intractable with conventional medicinal chemistry approaches. These targets can potentially be addressed with small-molecule compounds through two strategies: by investing in medicinal chemistry approaches to expand the boundaries of druggability20; or via the identification of alternative druggable targets within the pathway or subnetwork in question12,36.

Although the Cancer Gene Census has the advantage of being a richly annotated and well-curated data set, the protocol and analysis demonstrated here for cancer can be applied to any human gene set from any disease area. In addition, the data in canSAR relate to all human proteins. Whereas certain biological data such as gene mutations and copy number variations focus on cancer, the three-dimensional structure data, druggability assessments and chemical bioactivity data are comprehensive and relate to the entire proteome and multiple model organisms. This makes the analysis and underlying tools applicable to most human diseases.

When combined with disease-specific biological information, together with an understanding of the target and cognate pathway involved in disease causation, the wealth of integrated multidisciplinary data can help provide a powerful, unbiased rationale for target prioritization and selection. This analysis utilizes large-scale multidisciplinary knowledge, including peer-reviewed and curated public chemoinformatic data from the medicinal chemistry literature24,25. In the future, this information could be further enriched by mining the patent literature. Well-curated patent resources do not currently exist in the public domain, and commercial databases are limited for large-scale analyses. Also, patents generally lack detailed structure–activity relationship data and precise individual compound potencies. In addition to the well-curated and peer-reviewed chemical data literature sources currently included in our approach, it should be possible to apply our methodology to other chemical databases in the public domain such as the PubChem database64, which will increase the breadth of the chemical annotation, albeit at the expense of including data that are not curated or peer-reviewed.

By mapping the multidisciplinary information discussed in the present study onto a protein interaction network, we reveal that targeted cancer drugs have historically focused on limited oncogenic subnetworks, neglecting other potentially druggable oncogenic and tumour-suppressive subnetworks that may be crucial for disease pathology and drug resistance. Mapping the chemogenomic and druggability data onto a disease-derived protein interaction network and combining them with biological data and existing knowledge of the disease will help identify key druggable nodes and suggest alternative approaches to modulate less druggable targets. In addition, the identification of chemically neglected subnetworks may reveal alternative therapeutic approaches and synergistic combinations that could inform target selection either for drug discovery or for drug repurposing, with the aim of identifying rational drug combinations that have the potential to overcome the major challenge of drug resistance9,10,12.

Perhaps surprisingly, our analysis shows that the horizon of cancer drug discovery may not be as dominated by intractable targets as is frequently feared73. Although efforts to extend the target space and drug the 'undruggable'73 continue to be necessary, our analysis shows that there are many protein classes with active sites or binding pockets that are compatible with conventional small-molecule drug intervention, but they have yet to be explored chemically. Some of these classes are beginning to be targeted (for example, bromodomains)74,78 but, on the basis of available evidence, others do not appear to be receiving extensive medicinal chemistry investments.

Of particular interest for new drug discovery, we have identified 46 proteins in the Cancer Gene Census (as highlighted in Table 1, with specific examples discussed above) that need to be examined carefully as they may provide the next wave of tractable targets for cancer. These proteins, which are predicted to be druggable but for which there is a lack of chemical compounds to modulate them, represent potential novel biological targets for chemical exploitation. This novelty carries with it an increased development risk compared to tried and tested targets, but this risk could be reduced by the discovery and use of small-molecule chemical tools7,56. Additionally, and despite the widely accepted importance of structural biology in modern drug discovery, we have noted that only 26% of human proteins are structurally characterized (either themselves or via their very close orthologues). Moreover, as we show, many of these proteins are only partially characterized. Indeed, only 12% of the amino acid content of the proteome is structurally characterized, highlighting the urgent need for further investment to expand the number and diversity of available protein structures82.

To empower the research community to carry out its own objective computational assessments, we provide a live web-based tool to allow the rapid annotation of human gene lists with the type of integrative information used in our own analysis (see the canSAR website). This tool has the advantage of allowing researchers to refresh their assessments regularly in response to the fast-changing information landscape, which is continually updated on the canSAR website25. We strongly recommend that the information outputs that researchers obtain from their use of our approach and computational tools are considered together with their own internal experimental data or in-house analyses — for example, of proprietary patent databases.

In conclusion, recent surveys26,27 have emphasized the importance of very careful and thorough target validation to reduce the risk of failure later in the drug development pipeline. However, in-depth 'wet biology' assessments to achieve this goal are time-consuming and expensive, which precludes conducting such studies on large numbers of targets to match the scale of omics initiatives. In addition to having biological and pathogenic importance as well as acceptable druggability or chemical tractability, targets need to have available biological assays for identifying small-molecule chemical probes and progressing them into drug discovery programmes. The application of systematic, multidisciplinary and unbiased computational assessments, such as the one we describe and exemplify here, can help to prioritize those targets and target classes that will benefit most from further investment in target validation, assay development, chemical biology or medicinal chemistry to discover new small-molecule probes and drugs. This will have a major impact on opening up new target space for drug discovery in cancer and, by extrapolation, other disease areas.


  1. 1.

    et al. Global cancer statistics. CA Cancer J. Clin. 61, 69–90 (2011).

  2. 2.

    , & Advances in understanding cancer genomes through second-generation sequencing. Nature Rev. Genet. 11, 685–696 (2010).

  3. 3.

    et al. Principles for the post-GWAS functional characterization of cancer risk loci. Nature Genet. 43, 513–518 (2011).

  4. 4.

    & The grand challenge to decipher the cancer proteome. Nature Rev. Cancer 10, 652–660 (2010).

  5. 5.

    et al. Functional viability profiles of breast cancer. Cancer Discov. 1, 260–273 (2011).

  6. 6.

    et al. An integrated platform of genomic assays reveals small-molecule bioactivities. Nature Chem. Biol. 4, 498–506 (2008).

  7. 7.

    & Probing the probes: fitness factors for small molecule tools. Chem. Biol. 17, 561–577 (2010).

  8. 8.

    , , & Making sense of cancer genomic data. Genes Dev. 25, 534–555 (2011).

  9. 9.

    & Circumventing cancer drug resistance in the era of personalized medicine. Cancer Discov. 2, 214–226 (2012).

  10. 10.

    & Clonal evolution in cancer. Nature 481, 306–313 (2012).

  11. 11.

    Exploring the genomes of cancer cells: progress and promise. Science 331, 1553–1558 (2011).

  12. 12.

    , & Combinatorial drug therapy for cancer in the post-genomic era. Nature Biotech. 30, 679–692 (2012).

  13. 13.

    et al. Abiraterone and increased survival in metastatic prostate cancer. N. Engl. J. Med. 364, 1995–2005 (2011).

  14. 14.

    et al. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N. Engl. J. Med. 363, 1693–1703 (2010).

  15. 15.

    et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 364, 2507–2516 (2011).

  16. 16.

    , & The productivity crisis in pharmaceutical R&D. Nature Rev. Drug Discov. 10, 428–438 (2011).

  17. 17.

    & Exploiting the cancer genome: strategies for the discovery and clinical development of targeted molecular therapeutics. Annu. Rev. Pharmacol. Toxicol. 52, 549–573 (2012).

  18. 18.

    , & Fragment screening to predict druggability (ligandability) and lead discovery success. Drug Discov. Today 16, 284–287 (2011).

  19. 19.

    et al. An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 435, 677–681 (2005).

  20. 20.

    & The challenge of drugging undruggable targets in cancer: lessons learned from targeting BCL-2 family members. Clin. Cancer Res. 13, 7264–7270 (2007).

  21. 21.

    et al. International network of cancer genome projects. Nature 464, 993–998 (2010).

  22. 22.

    , , & Utilizing RNA interference to enhance cancer drug discovery. Nature Rev. Drug Discov. 6, 556–568 (2007).

  23. 23.

    et al. Genomic-scale prioritization of drug targets: the TDR Targets database. Nature Rev. Drug Discov. 7, 900–907 (2008).

  24. 24.

    et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).

  25. 25.

    , , & canSAR: an integrated cancer public translational research and drug discovery resource. Nucleic Acids Res. 40, D947–D956 (2012).

  26. 26.

    & Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).

  27. 27.

    , & Believe it or not: how much can we rely on published data on potential drug targets? Nature Rev. Drug Discov. 10, 712 (2011).

  28. 28.

    et al. A census of human cancer genes. Nature Rev. Cancer 4, 177–183 (2004).

  29. 29.

    , & How many drug targets are there? Nature Rev. Drug Discov. 5, 993–996 (2006).

  30. 30.

    Transcription factors as targets for cancer therapy. Nature Rev. Cancer 2, 740–749 (2002).

  31. 31.

    et al. Direct inhibition of the NOTCH transcription factor complex. Nature 462, 182–188 (2009).

  32. 32.

    , & TNF- and cancer therapy-induced apoptosis: potentiation by inhibition of NF-κB. Science 274, 784–787 (1996).

  33. 33.

    & The STATs of cancer — new molecular targets come of age. Nature Rev. Cancer 4, 97–105 (2004).

  34. 34.

    & ETS transcription factors and their emerging roles in human cancer. Eur. J. Cancer 41, 2462–2478 (2005).

  35. 35.

    Outsmarting a mastermind. Dev. Cell 17, 750–752 (2009).

  36. 36.

    et al. The aurora kinase inhibitor CCT137690 downregulates MYCN and sensitizes MYCN-amplified neuroblastoma in vivo. Mol. Cancer Ther. 10, 2115–2123 (2011).

  37. 37.

    et al. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352–358 (2009).

  38. 38.

    et al. Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci. Transl. Med. 3, 96ra76 (2011).

  39. 39.

    et al. Use of genome-wide association studies for drug repositioning. Nature Biotech. 30, 317–320 (2012).

  40. 40.

    et al. PPARγ insufficiency promotes follicular thyroid carcinogenesis via activation of the nuclear factor-κB signaling pathway. Oncogene 25, 2736–2747 (2005).

  41. 41.

    et al. Novel high-affinity PPARγ agonist alone and in combination with paclitaxel inhibits human anaplastic thyroid carcinoma tumor growth via p21WAF1/CIP1. Oncogene 25, 2304–2317 (2005).

  42. 42.

    et al. Induction of solid tumor differentiation by the peroxisome proliferator-activated receptor-γ ligand troglitazone in patients with liposarcoma. Proc. Natl Acad. Sci. USA 96, 3951–3956 (1999).

  43. 43.

    et al. Thyrotropin receptor gene alterations in thyroid hyperfunctioning adenomas. J. Clin. Endocrinol. Metab. 81, 1548–1551 (1996).

  44. 44.

    et al. Somatic mutations in the thyrotropin receptor gene cause hyperfunctioning thyroid adenomas. Nature 365, 649–651 (1993).

  45. 45.

    Hyperfunctioning thyroid adenoma and activating mutations in the TSH receptor gene. Arch. Med. Res. 30, 510–513 (1999).

  46. 46.

    et al. Effectiveness of peripheral thyrotropin receptor mRNA in follow-up of differentiated thyroid cancer. Ann. Surg. Oncol. 16, 473–480 (2009).

  47. 47.

    et al. A low-molecular-weight antagonist for the human thyrotropin receptor with therapeutic potential for hyperthyroidism. Endocrinology 149, 5945–5950 (2008).

  48. 48.

    et al. A small molecule inverse agonist for the human thyroid-stimulating hormone receptor. Endocrinology 151, 3454–3459 (2010).

  49. 49.

    et al. Efficacy and safety of vismodegib in advanced basal-cell carcinoma. N. Engl. J. Med. 366, 2171–2179 (2012).

  50. 50.

    The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).

  51. 51.

    et al. Inhibition of Syk with fostamatinib disodium has significant clinical activity in non-Hodgkin lymphoma and chronic lymphocytic leukemia. Blood 115, 2578–2585 (2010).

  52. 52.

    et al. A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 481, 329–334 (2012).

  53. 53.

    , , & Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).

  54. 54.

    , , & Novel targeted therapeutics: inhibitors of MDM2, ALK and PARP. J. Hematol. Oncol. 4, 16 (2011).

  55. 55.

    & Structural biology and drug discovery of difficult targets: the limits of ligandability. Chem. Biol. 19, 42–50 (2012).

  56. 56.

    & New approaches to molecular cancer therapeutics. Nature Chem. Biol. 2, 689–700 (2006).

  57. 57.

    et al. The tumour suppressor C/EBPδ inhibits FBXW7 expression and promotes mammary tumour metastasis. EMBO J. 29, 4106–4117 (2010).

  58. 58.

    , , & A tumor suppressor and oncogene: the WT1 story. Leukemia 21, 868–876 (2007).

  59. 59.

    , & E1A: tumor suppressor or oncogene? Preclinical and clinical investigations of E1A gene therapy. Breast Cancer 8, 285–293 (2001).

  60. 60.

    & Crystal structure of the CSL-Notch-mastermind ternary complex bound to DNA. Cell 124, 985–996 (2006).

  61. 61.

    et al. The common feature of leukemia-associated IDH1 and IDH2 mutations is a neomorphic enzyme activity converting α-ketoglutarate to 2-hydroxyglutarate. Cancer Cell 17, 225–234 (2010).

  62. 62.

    et al. Cancer-associated IDH1 mutations produce 2-hydroxyglutarate. Nature 462, 739–744 (2009).

  63. 63.

    & Virchow 2011 or how to ID(H) human glioblastoma. J. Clin. Oncol. 29, 4473–4474 (2011).

  64. 64.

    , , & PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem. 4, 217–241 (2008).

  65. 65.

    , , & Cloning, expression, and characterization of a soluble calcium-activated nucleotidase, a human enzyme belonging to a new family of extracellular nucleotidases. Arch. Biochem. Biophys. 406, 105–115 (2002).

  66. 66.

    et al. Two unique novel prostate-specific and androgen-regulated fusion partners of ETV4 in prostate cancer. Cancer Res. 68, 3094–3098 (2008).

  67. 67.

    et al. The androgen-regulated calcium-activated nucleotidase 1 (CANT1) is commonly overexpressed in prostate cancer and is tumor-biologically relevant in vitro. Am. J. Pathol. 178, 1847–1860 (2011).

  68. 68.

    et al. Frequent BRG1/SMARCA4-inactivating mutations in human lung cancer cell lines. Hum. Mutat. 29, 617–622 (2008).

  69. 69.

    et al. The BRG1 ATPase of chromatin remodeling complexes is involved in modulation of mesenchymal stem cell senescence through RB-P53 pathways. Oncogene 29, 5452–5463 (2010).

  70. 70.

    et al. Increased expression but not genetic alteration of BRG1, a component of the SWI/SNF complex, is associated with the advanced stage of human gastric carcinomas. Pathobiology 69, 315–320 (2001).

  71. 71.

    et al. Comparison of expression profiles in ovarian epithelium in vivo and ovarian cancer identifies novel candidate genes involved in disease pathogenesis. PLoS ONE 6, e17617 (2011).

  72. 72.

    et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  73. 73.

    in The Harvey Lectures: Series 102, 2006–2007 1–16 (Wiley-Blackwell; 2010).

  74. 74.

    et al. Selective inhibition of BET bromodomains. Nature 468, 1067–1073 (2010).

  75. 75.

    et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  76. 76.

    New method for fast and accurate binding-site identification and analysis. Chem. Biol. Drug Des. 69, 146–148 (2007).

  77. 77.

    et al. Syk tyrosine kinase is linked to cell motility and progression in squamous cell carcinomas of the head and neck. Cancer Res. 67, 7907–7916 (2007).

  78. 78.

    , , & Druggability analysis and structural classification of bromodomain acetyl-lysine binding sites. J. Med. Chem. 55, 7346–7359 (2012).

  79. 79.

    , , & The Binding Database: data management and interface design. Bioinformatics 18, 130–139 (2002).

  80. 80.

    The NCI60 human tumour cell line anticancer drug screen. Nature Rev. Cancer 6, 813–823 (2006).

  81. 81.

    , , , & Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  82. 82.

    et al. The scientific impact of the Structural Genomics Consortium: a protein family and ligand-centered approach to medically-relevant human proteins. J. Struct. Funct. Genomics 8, 107–119 (2007).


This work was supported by Cancer Research UK (grant numbers C309/A8274 and C309/A11566). P.W. is a Cancer Research UK Life Fellow. The authors acknowledge additional funding from Cancer Research UK to the Cancer Research UK Cancer Centre and from the UK National Health Service (NHS) to the National Institute for Health Research (NIHR) Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Hospital, UK. The authors thank K. Bulusu for technical help, and thank J. Blagg, M. Garnett and U. McDermott for valuable discussions and comments.

Author contributions: B.A.L. conceived the project and designed the analysis; M.P., M.H.B. and B.A.L. performed the data analysis and informatics and wrote the paper; P.W. provided biological analysis and insights and wrote the paper; J.T. developed the target annotation tool.

Author information


  1. Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London SM2 5NG, UK.

    • Mishal N. Patel
    • Mark D. Halling-Brown
    • Joseph E. Tym
    • Paul Workman
    • Bissan Al-Lazikani


Competing interests

M.P., J.T., P.W. and B.A.L. are employees of The Institute of Cancer Research (ICR), which has a commercial interest in inhibitors of cytochrome P450-C17 (CYP17), heat shock protein 90 (HSP90), phosphoinositide 3-kinase (PI3K), protein kinase B (PKB), histone deacetylase and other targets, and operates a 'Rewards to Inventors' scheme. P.W. and colleagues at ICR have received research funding from Cougar Biotechnology, Johnson & Johnson, Vernalis, Yamanouchi, Piramed Pharma (acquired by Roche), Astex Pharmaceuticals, AstraZeneca, Sareum, Merck Serono and Chroma Therapeutics. P.W. is a consultant and/or a member of the scientific advisory board for Novartis, Piramed Pharma, Astex Pharmaceuticals, Chroma Therapeutics, Kudos Pharmaceuticals, Wilex and Nextech Invest.

Corresponding authors

Correspondence to Paul Workman or Bissan Al-Lazikani.

Supplementary information

PDF files

  1. 1.

    Supplementary information S1

    Dataset, methodology and additional notes

Excel files

  1. 1.

    Supplementary information S2

    Descriptions of Supplementary Table 2
