Analysis

The support of human genetic evidence for approved drug indications

Nature Genetics volume 47, pages 856–860 (2015)


Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.



References

  1. Trends in risks associated with new drug development: success rates for investigational drugs. Clin. Pharmacol. Ther. 87, 272–277 (2010).
  2. Trial watch: phase II and phase III attrition rates 2011–2012. Nat. Rev. Drug Discov. 12, 569 (2013).
  3. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat. Rev. Drug Discov. 13, 419–431 (2014).
  4. Can the flow of medicines be improved? Fundamental pharmacokinetic and pharmacological principles toward improving Phase II survival. Drug Discov. Today 17, 419–424 (2012).
  5. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
  6. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).
  7. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet. 41, 56–65 (2009).
  8. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
  9. Use of genome-wide association studies for drug repositioning. Nat. Biotechnol. 30, 317–320 (2012).
  10. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 40, D1047–D1054 (2012).
  11. Rational drug repositioning by medical genetics. Nat. Biotechnol. 31, 1080–1082 (2013).
  12. The druggable genome. Nat. Rev. Drug Discov. 1, 727–730 (2002).
  13. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
  14. UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. AMIA Annu. Symp. Proc. 2009, 431–435 (2009).
  15. Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Ann. Neurol. 70, 897–912 (2011).
  16. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  17. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 6, e107 (2008).
  18. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).
  19. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  20. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 41, D1104–D1114 (2013).
  21. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999).
  22. in Proc. Int. Conf. Machine Learning 296–304 (Morgan Kaufmann Publishers, 1998).
  23. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
  24. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).


We would like to thank N. Srivastava for much of the manual mapping of GWASdb traits and Pharmaprojects indications to MeSH terms and P. Agarwal for many helpful conversations. GWASdb-related work was supported by the Research Grants Council, Hong Kong SAR, China (781511M, 17121414M) and the National Natural Science Foundation of China (91229105).

Author information


  1. Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, North Carolina, USA.

    • Matthew R Nelson
    • Jeffery L Painter
    • Judong Shen
  2. Quantitative Sciences, GlaxoSmithKline, Stevenage, UK.

    • Hannah Tipney
    • John C Whittaker
    • Philippe Sanseau
  3. Department of Systems Biology, Columbia University, New York, New York, USA.

    • Paola Nicoletti
    • Yufeng Shen
    • Aris Floratos
  4. Department of Biomedical Informatics, Columbia University, New York, New York, USA.

    • Yufeng Shen
    • Aris Floratos
  5. State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China.

    • Pak Chung Sham
  6. Centre for Genomics Sciences, Li Ka Shing (LKS) Faculty of Medicine, University of Hong Kong, Hong Kong SAR, China.

    • Pak Chung Sham
    • Mulin Jun Li
    • Junwen Wang
  7. Department of Biochemistry, LKS Faculty of Medicine, University of Hong Kong, Hong Kong SAR, China.

    • Mulin Jun Li
    • Junwen Wang
  8. Alternative Discovery and Development, GlaxoSmithKline, Upper Merion, Pennsylvania, USA.

    • Lon R Cardon



This work was conceived by M.R.N., H.T., L.R.C., J.C.W. and P.S. The primary analyses were designed and conducted by M.R.N. Supporting analyses were provided by J.L.P. and J.S. The mapping of variants to genes was conducted by M.R.N., P.N., Y.S. and A.F. The GWASdb data were created and provided by P.C.S., M.J.L. and J.W. The manuscript was written by M.R.N. with contributions from H.T., J.C.W. and P.S.

Competing interests

M.R.N., H.T., J.L.P., J.S., L.R.C., J.C.W. and P.S. are employees of GlaxoSmithKline, a global healthcare company that may conceivably benefit financially through this publication.

Corresponding author

Correspondence to Matthew R Nelson.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–10 and Supplementary Note.

Excel files

  1. Supplementary Table 1: Count of publications, associations and genes corresponding to each MeSH term.

    MeSH: unique MeSH terms mapped to GWASdb traits. Publications: the number of publications or unique data sources reporting associations with the MeSH term. Associations: the number of unique SNPs reported to be associated with the MeSH term. Genes: the number of unique genes to which the SNPs associated with the MeSH term are mapped.

  2. Supplementary Table 2: Count of drugs and genes mapping to each MeSH term.

    MeSH: unique MeSH terms mapped to indications. Drugs: the number of drugs in Pharmaprojects with the MeSH term as an indication. Genes: the number of genes reported to be targets for the drugs with the MeSH term as an indication.

  3. Supplementary Table 3: Count of drugs and MeSH terms corresponding to each drug target gene.

    Gene: unique drug target genes. Drugs: the number of drugs reported to target the gene product. MeSH: the number of MeSH terms for which the drugs targeting the gene product are indicated.

  4. Supplementary Table 4: Count of target genes and MeSH terms corresponding to each drug.

    Drug: unique drugs. Genes: the number of genes reported to be targets for the drug. MeSH: the number of MeSH terms indicated for the drug.

  5. Supplementary Table 5: All indications for drugs with reported human drug targets with the number of genes associated with them.

    MSH.Ind: drug indication (mapped to the best MeSH term). N.Traits: the number of indications or traits with a relative similarity ≥0.7 to the indication. N.Assns: the number of independent associations reported for the indication or a similar trait. Traits: list of traits with a relative similarity ≥0.7 to the indication; may be other indications without a corresponding genetic trait. lApprovedUS.EU: logical indication of whether the indication is for a drug approved in the United States or European Union.

  6. Supplementary Table 6: Drug targets with a genetic association (GWASdb or OMIM) mapped to the same gene for a trait with relative similarity ≥0.7 to the corresponding indication.

    Gene: drug target gene. MSH.Ind: drug indication (mapped to the best MeSH term). Category: indication disease category. MSH.Trt: MeSH term for the trait with a genetic association to the drug target gene. pvalue: the P value for the genetic association. If multiple associations mapped the same gene to the same trait, the smallest P value with the largest gene score was selected. P values of zero indicate that the association is from OMIM. eCat: extension of the RegulomeDB category. If the variant mapping the association to the listed gene was not due to a DHS-correlated enhancer, a value of "0" was assigned to amino acid–changing variants, "2" was assigned for eQTLs and "9" was assigned otherwise. Associations from OMIM were not assigned a value. Rank: the rank of the mapping of the given gene for the reported association. RelSim: relative similarity between the indication and trait. LatestPhase: the latest phase that each unique gene-indication pair achieved in Pharmaprojects.

  7. Supplementary Table 7: Counts of target-indication pairs included in the analyses presented in Figure 3b and used to estimate the enrichment of genetic associations.

    Phase.Latest: latest phase in the development pipeline to which each target-indication combination has progressed. Num.Targets: total number of target-indication combinations progressing to that phase. N.Overlap: number of target-indication combinations that overlap with a gene-trait association. Percent: percent of target-indication combinations that overlap with a gene-trait association. Genetic.Evidence: the source of the genetic evidence.

  8. Supplementary Table 8: Manually scored MeSH term similarities to values of 0.9, 0.75 and 0.5 reflecting a subjective measure of similarity not captured via the ontological relationships.

    MSH1: MeSH term 1. MSH2: MeSH term 2. ManSim: manually assigned similarity. RelSim: relative similarity averaging Resnik and Lin relative similarity measures. RelSim.Res: Resnik relative similarity. RelSim.Lin: Lin relative similarity.

  9. Supplementary Table 9: Manual assignment of traits or drug indications (MeSH terms) to disease categories.

    MSH: MeSH term. MSH.Top: MeSH term for the top level of the MeSH hierarchy. Disease: the original disease trait. Indication: the original indication. Category: manually assigned trait or indication category.
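
Two pieces of arithmetic recur in the table descriptions above: RelSim is described as the average of the Resnik and Lin relative similarity measures and is compared against a cutoff of 0.7, and Percent is the share of target-indication combinations that overlap with a gene-trait association. The Python sketch below illustrates only that arithmetic. The function names are hypothetical, the Resnik and Lin scores are assumed to be already computed, and nothing here is taken from the authors' own code.

```python
def rel_sim(rel_sim_res: float, rel_sim_lin: float) -> float:
    """Average the Resnik and Lin relative similarities (RelSim.Res, RelSim.Lin)."""
    return (rel_sim_res + rel_sim_lin) / 2.0


def is_similar(rel_sim_res: float, rel_sim_lin: float, threshold: float = 0.7) -> bool:
    """Return True when the averaged similarity meets the >= 0.7 cutoff."""
    return rel_sim(rel_sim_res, rel_sim_lin) >= threshold


def percent_overlap(n_overlap: int, num_targets: int) -> float:
    """Percent of target-indication combinations with a gene-trait overlap.

    Mirrors the N.Overlap, Num.Targets and Percent columns of
    Supplementary Table 7.
    """
    if num_targets <= 0:
        raise ValueError("num_targets must be positive")
    return 100.0 * n_overlap / num_targets
```

With made-up inputs, `rel_sim(1.0, 0.5)` gives 0.75, which passes the 0.7 cutoff, and `percent_overlap(5, 200)` gives 2.5.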

Text files

  1. Supplementary Data Set 1: GWASdb entries with MeSH terms mapped for each trait and genes annotated as described in the Online Methods.

    Disease: the name of the trait for the corresponding genetic association as provided by GWASdb, which is generally taken directly from the GWAS catalog or from whichever source the association was derived. snp_id: the identifier (generally a dbSNP rs ID) of the SNP reported to be associated with disease. Link: the reference for the association. Most references are a PubMed ID for the published paper. pvalue: the P value reported for the association of snp_id with disease. Source: the origin of the association information. It may be one of the following: GWAS:A/B: results listed in the NHGRI GWAS Catalog; the associations of GWAS:B are drawn from published tables and supplementary information. Omim: from Online Mendelian Inheritance in Man; all P values are zero. GWASCentral: from GWAS Central. dbGaP: associations from dbGaP. SNP.Trait.Cnt: the number of associations of the same snp_id with the same disease in the original data set. These have been reduced to a single row in this data set, and the minimum P value was selected. MSH: Medical Subject Heading for disease. Manually mapped by Computational Biology. MSH.Top: the MeSH term for the top level of the branch to which the trait is mapped. In most instances, there are many branches to which a single MSH may be mapped. When this occurs, the most common top-level term in GWASdb is selected. snp.ld: a SNP in linkage disequilibrium (LD) with snp_id that provides a plausible connection to a gene. Gene: a gene to which snp.ld is connected: snp.ld lies within 5 kb of the gene, is an eQTL for it, sits in a DNase I hypersensitivity site that is correlated with it or lies within its transcription start site. r2: LD between snp_id and snp.ld. eqtl: indicates whether snp.ld is an eQTL for a gene. The eQTL data are drawn from [...]. rdb: indicates whether the snp.ld-gene mapping is the result of a DHS correlation (from Maurano et al. (2012), provided by J. Stamatoyannopoulos). Cat.rdb: RegulomeDB category of the SNP (if rdb is "yes"). Lower values indicate more lines of converging functional evidence. eCat: a derivative of Cat.rdb, filling in values where rdb is "no." If eqtl is "yes" but rdb is "no," then it gets a value of 2. If snp.ld is a missense variant (amino acid change), the value is 0. The value is 9 otherwise. AAEffect: amino acid effect of snp.ld. AAScore: Condel score from VEP for nonsynonymous variants. GeneScore: an overall assessment of the evidence that the associated variant has a causal effect on the gene in question, ranging from values of zero to eight. Higher scores imply higher weight of causal evidence. The contributions to GeneScore are summarized on a separate GeneScore Wiki page. Rank: the rank for the given gene for its strength of connection to snp_id. This takes LD and functional evidence into account.

  2. Supplementary Data Set 2: Reduction of the genetic association data to a single row per gene and trait as used for most analyses described.

    Variable names are as given for Supplementary Data Set 1.

  3. Supplementary Data Set 3: Data set of unique target-indication combinations in Pharmaprojects.

    Gene: drug target gene. MSH: MeSH term for drug indication. MSH.Top: top-level MeSH term. Phase.Latest: latest phase to which the target-indication pair progressed through the development pipeline. lApprovedUS.EU: indicator of whether a drug for the target-indication pair has been approved in the United States or European Union.

  4. Supplementary Data Set 4: Relative similarity matrix of MeSH terms.

    Row and column names correspond to each MeSH term for which a relative similarity could be computed.
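
The eCat rules given for Supplementary Data Set 1 can be read as a small decision procedure. The Python sketch below is one possible reading, not the authors' implementation. The function and parameter names are hypothetical. It assumes that eCat equals Cat.rdb when rdb is "yes" and that the missense rule takes precedence over the eQTL rule when both apply; the description does not state either point explicitly. Associations from OMIM, which are not assigned a value, are left out.

```python
from typing import Optional, Union

# Cat.rdb is passed through unchanged, so either a numeric or a string
# representation of the RegulomeDB category is accepted (an assumption).
Category = Union[int, str]


def derive_ecat(rdb: str, cat_rdb: Optional[Category],
                eqtl: str, is_missense: bool) -> Category:
    """Derive eCat from the rdb, Cat.rdb and eqtl fields plus a missense flag."""
    if rdb == "yes":
        # DHS-correlated mapping: assumed to carry the RegulomeDB category over.
        if cat_rdb is None:
            raise ValueError("Cat.rdb is required when rdb is 'yes'")
        return cat_rdb
    if is_missense:
        # Amino acid-changing variant (assumed to outrank the eQTL rule).
        return 0
    if eqtl == "yes":
        # eQTL for the gene without a DHS correlation.
        return 2
    # No missense, eQTL or DHS-correlation evidence.
    return 9
```

For example, under these assumptions a SNP with rdb "no" and eqtl "yes" that is not missense receives 2, and a SNP with none of the three kinds of evidence receives 9.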
