The support of human genetic evidence for approved drug indications

Journal name: Nature Genetics
Volume: 47
Pages: 856–860
DOI: doi:10.1038/ng.3314

Abstract

Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.

Figures

  1. Figure 1: Summary of data resources and mappings between them.

    (a) Summary of each data resource and the key filtering and processing steps applied to create the final set of gene-trait and drug target–indication combinations investigated in this study. GWASdb sources correspond to unique PubMed IDs or other unique data sources given for each association. GAD, Genetic Association Database. (b) An example of the approach to map genetically associated variants to genes, illustrated with the bone mineral density GWAS association with rs9533090 (depicted in red). Of five SNPs in strong LD with rs9533090 (r² ≥ 0.8), one falls within a DNase I–hypersensitive site (DHS) that was found to have a sensitivity signal correlated with the DHS of the TNFSF11 gene transcription start site (TSS).

  2. Figure 2: Enrichment of target genes for drugs approved in the United States or the European Union.

    Associations are shown for all 22,012 coding genes (top half) and for 2,555 classically druggable genes (bottom half). Target enrichment is estimated for genes in OMIM, genes with any connection with a GWASdb association, only the top gene for each GWASdb association, genes in OMIM or the top GWASdb gene, and genes in the lower quartile of the RVIS distribution. Odds ratios with exact 95% confidence intervals were estimated from 2 × 2 tables of the status of each gene as a drug target versus status in the above categories (Online Methods).

  3. Figure 3: Overlap between drug targets and their indications with genetic associations for similar traits.

    (a) The percentage of target-indication pairs for drugs approved in the United States or the European Union overlapping with gene-trait combinations from GWASdb or OMIM. Indications and traits were considered similar if the relative similarity was at least 0.7. The percent overlap is shown for all 820 pairs (“Overall”; 158 unique indications and 265 targets) and for pairs stratified by the indication therapy category. Indications are limited to those with at least five genes in GWASdb or OMIM associated with a similar trait. Exact 95% confidence intervals are shown, along with the number of target-indication pairs, for each row. The enrichment was further assessed via permutation testing and was found to be highly statistically significant (P < 1 × 10⁻⁴; Supplementary Fig. 10). (b) Percent overlap for all target-indication pairs at the latest stage reached in the development pipeline. Overlap is given for GWASdb, for OMIM, and for GWASdb and OMIM combined. All other features are as in a.

  4. Supplementary Fig. 1: Summary of genetic association data and their traits and gene mappings.

    Distribution of the (a) number of publications or sources and (b) reported associations for each unique MeSH term. (c) Distribution of the number of genes mapped for each MeSH term. (d) Distribution of the number of genes mapped to each SNP (excluding SNPs with no genes mapped; n = 5,272). (e) Distribution of P values for all unique associations. These summaries are limited to publications and sources with at least one association with a P value ≤1 × 10⁻⁸. Panels b and c were truncated at 50, panel d was truncated at 30 and panel e was truncated at 100. All values over those thresholds are shown at the maximum value. (f) Distribution of the number of genes for each unique MeSH term in OMIM.

  5. Supplementary Fig. 2: Summary of the drug data and their target gene and indications.

    (a) Distribution of the number of drugs observed for each target gene in the analysis data set (truncated at 50). (b) Distribution of the number of target genes for each drug (i.e., multiple drug targets or combinations of therapeutic agents). (c) Distribution of the number of MeSH terms (i.e., unique indications) for each drug (truncated at 15). (d) Distribution of the number of drugs listed for each MeSH term (truncated at 100). (e) Distribution of the number of target genes for each MeSH term (truncated at 100). (f) Distribution of the number of MeSH terms for each target gene (truncated at 50).

  6. Supplementary Fig. 3: Illustrated use of the MeSH ontology to estimate relative similarity.

    The methods used in this study (lin and resnik, implemented in UMLS::Similarity) combined both path length and information content. See the Online Methods for additional details.

  7. Supplementary Fig. 4: Overlap of drug targets with genetic associations by disease category and latest development phase.

    (a) Overlap between drug targets and their indications with genetic associations for similar traits. The percentage of target-indication pairs overlapping with gene-trait combinations from GWASdb or OMIM for the latest development phase each pair achieved as recorded in Pharmaprojects. The number of unique target-indication pairs for each category at each phase is shown to the right of each plot. Exact 95% confidence intervals are shown. (b) Distribution of the number of target-indication pairs at each phase by category.

  8. Supplementary Fig. 5: Overlap between drug targets and their indications with genetic associations for similar traits with genetic associations restricted to GWASdb only.

    Overlap for (a) drugs approved in the United States or European Union and (b) the furthest development phase to which each target-indication pair progressed. Exact 95% confidence intervals are shown.

  9. Supplementary Fig. 6: Overlap between drug targets and their indications with genetic associations for similar traits with genetic associations restricted to OMIM only.

    Overlap for (a) drugs approved in the United States or European Union and (b) the furthest development phase to which each target-indication pair progressed. Exact 95% confidence intervals are shown.

  10. Supplementary Fig. 7: Tradeoff between the number of indications studied and overall genetic support.

    The tradeoff between the number of indications studied and overall genetic support when setting a lower bound on the number of independent genes associated with a trait related to each indication (relative similarity ≥ 0.7), restricted to drugs approved in the United States or European Union. The percentage of target-indication pairs with genetic support increases as indications are restricted to those with the most genetic information available, although at the cost of considering far fewer indications. The analyses reported in Figure 3 and Supplementary Figures 4–6 selected five as the threshold, where the first enrichment plateau is observed.

  11. Supplementary Fig. 8: Distribution of the number of genes associated with traits similar (≥0.7) to the indications included in the analysis of overlap with genetic associations.

    The few indications with very large numbers of associated genes were truncated at 50. (The full range is available in Supplementary Table 5.) The box corresponds to the interquartile range, the center line corresponds to the median, the whiskers correspond to the maximum or 1.5 times the interquartile range (whichever is smaller) and the points identify further outliers. The numbers given on the y axis are the numbers of unique indications observed in each phase. There is no statistically significant variability among phases (P = 0.18; analysis of the rank of the number of associations with phase as an ordered variable) or linear trend (P = 0.37; analysis of the rank of the number of associations with phase as numeric, with 1 = preclinical and 5 = approved in the United States or European Union).

  12. Supplementary Fig. 9: System for scoring the strength of evidence tying a variant with a phenotypic association to a gene.

    For variant function, “DHS Rdb 3” indicates that the variant has a RegulomeDB score of 3 and falls within a proximal or distal DHS site, “eQTL or DHS (2)” indicates that the variant was either identified as an eQTL in the University of Chicago eQTL database or had a RegulomeDB score of 2 and “eQTL & DHS” indicates that the variant was both identified as an eQTL and fell within a DHS site with a RegulomeDB score of 2 or less. LD is in the form of r².

  13. Supplementary Fig. 10: Permutation test of overlap between approved drug target–indications and genetic evidence (GWASdb or OMIM).

    (a) The permutation scheme to simulate the null distribution. (b) The distribution of the percent of gene-trait and target-indication pairs that overlap over 10,000 permutations and the overlap observed in the original data (red downward arrow). (c) The overlap observed in the original data overall and by disease category (red points) and the median percent overlap over 10,000 permutations (red ×).
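The overlap percentages in Figure 3 and the permutation test in Supplementary Fig. 10 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the null distribution is generated by shuffling indications across targets, which may differ from the scheme shown in Supplementary Fig. 10a, and every name in it (`has_support`, `percent_overlap`, `permutation_test`, `toy_similarity`, the placeholder genes and traits) is hypothetical. Only the 0.7 similarity threshold and the 10,000 permutations come from the captions above.

```python
import random


def has_support(target, indication, gene_traits, similarity, threshold=0.7):
    """Return True if the target gene has an association with a similar trait."""
    return any(
        similarity(indication, trait) >= threshold
        for trait in gene_traits.get(target, ())
    )


def percent_overlap(pairs, gene_traits, similarity, threshold=0.7):
    """Percentage of target-indication pairs with genetic support."""
    hits = sum(
        has_support(t, i, gene_traits, similarity, threshold) for t, i in pairs
    )
    return 100.0 * hits / len(pairs)


def permutation_test(pairs, gene_traits, similarity, n_perm=10_000, seed=0):
    """Compare the observed overlap with overlaps from shuffled pairings."""
    rng = random.Random(seed)
    observed = percent_overlap(pairs, gene_traits, similarity)
    targets = [t for t, _ in pairs]
    indications = [i for _, i in pairs]
    exceed = 0
    for _ in range(n_perm):
        # Assumed null model: break the link between targets and indications.
        rng.shuffle(indications)
        shuffled = list(zip(targets, indications))
        if percent_overlap(shuffled, gene_traits, similarity) >= observed:
            exceed += 1
    # Add-one estimate, so the P value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy inputs: placeholder names and an identity similarity.
def toy_similarity(a, b):
    return 1.0 if a == b else 0.0


toy_gene_traits = {"GENE_A": {"trait 1"}, "GENE_B": {"trait 2"}}
toy_pairs = [
    ("GENE_A", "trait 1"),
    ("GENE_B", "trait 2"),
    ("GENE_C", "trait 3"),
    ("GENE_D", "trait 4"),
]

observed, p_value = permutation_test(
    toy_pairs, toy_gene_traits, toy_similarity, n_perm=1000
)
print(f"observed overlap: {observed:.1f}%, permutation P: {p_value:.4f}")
```

With this estimator the smallest attainable P value is 1/(n_perm + 1), so the number of permutations limits how small a reported P value can be.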

References

  1. DiMasi, J.A., Feldman, L., Seckler, A. & Wilson, A. Trends in risks associated with new drug development: success rates for investigational drugs. Clin. Pharmacol. Ther. 87, 272–277 (2010).
  2. Arrowsmith, J. & Miller, P. Trial watch: phase II and phase III attrition rates 2011–2012. Nat. Rev. Drug Discov. 12, 569 (2013).
  3. Cook, D. et al. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat. Rev. Drug Discov. 13, 419–431 (2014).
  4. Morgan, P. et al. Can the flow of medicines be improved? Fundamental pharmacokinetic and pharmacological principles toward improving Phase II survival. Drug Discov. Today 17, 419–424 (2012).
  5. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
  6. Plenge, R.M., Scolnick, E.M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).
  7. Kathiresan, S. et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet. 41, 56–65 (2009).
  8. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
  9. Sanseau, P. et al. Use of genome-wide association studies for drug repositioning. Nat. Biotechnol. 30, 317–320 (2012).
  10. Li, M.J. et al. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 40, D1047–D1054 (2012).
  11. Wang, Z.Y. & Zhang, H.Y. Rational drug repositioning by medical genetics. Nat. Biotechnol. 31, 1080–1082 (2013).
  12. Hopkins, A.L. & Groom, C.R. The druggable genome. Nat. Rev. Drug Discov. 1, 727–730 (2002).
  13. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
  14. McInnes, B.T., Pedersen, T. & Pakhomov, S.V. UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. AMIA Annu. Symp. Proc. 2009, 431–435 (2009).
  15. Patsopoulos, N.A. et al. Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Ann. Neurol. 70, 897–912 (2011).
  16. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  17. Schadt, E.E. et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 6, e107 (2008).
  18. Boyle, A.P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).
  19. Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  20. Davis, A.P. et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 41, D1104–D1114 (2013).
  21. Resnik, P. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999).
  22. Lin, D. in Proc. Int. Conf. Machine Learning 296–304 (Morgan Kaufmann Publishers, 1998).
  23. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
  24. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).

Author information

Affiliations

  1. Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, North Carolina, USA.

    • Matthew R Nelson,
    • Jeffery L Painter &
    • Judong Shen
  2. Quantitative Sciences, GlaxoSmithKline, Stevenage, UK.

    • Hannah Tipney,
    • John C Whittaker &
    • Philippe Sanseau
  3. Department of Systems Biology, Columbia University, New York, New York, USA.

    • Paola Nicoletti,
    • Yufeng Shen &
    • Aris Floratos
  4. Department of Biomedical Informatics, Columbia University, New York, New York, USA.

    • Yufeng Shen &
    • Aris Floratos
  5. State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China.

    • Pak Chung Sham
  6. Centre for Genomic Sciences, Li Ka Shing (LKS) Faculty of Medicine, University of Hong Kong, Hong Kong SAR, China.

    • Pak Chung Sham,
    • Mulin Jun Li &
    • Junwen Wang
  7. Department of Biochemistry, LKS Faculty of Medicine, University of Hong Kong, Hong Kong SAR, China.

    • Mulin Jun Li &
    • Junwen Wang
  8. Alternative Discovery and Development, GlaxoSmithKline, Upper Merion, Pennsylvania, USA.

    • Lon R Cardon

Contributions

This work was conceived by M.R.N., H.T., L.R.C., J.C.W. and P.S. The primary analyses were designed and conducted by M.R.N. Supporting analyses were provided by J.L.P. and J.S. The mapping of variants to genes was conducted by M.R.N., P.N., Y.S. and A.F. The GWASdb data were created and provided by P.C.S., M.J.L. and J.W. The manuscript was written by M.R.N. with contributions from H.T., J.C.W. and P.S.

Competing financial interests

M.R.N., H.T., J.L.P., J.S., L.R.C., J.C.W. and P.S. are employees of GlaxoSmithKline, a global healthcare company that may conceivably benefit financially through this publication.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,168 KB)

    Supplementary Figures 1–10 and Supplementary Note.

Excel files

  1. Supplementary Table 1: Count of publications, associations and genes corresponding to each MeSH term. (27,500 KB)

    MeSH: unique MeSH terms mapped to GWASdb traits. Publications: the number of publications or unique data sources reporting associations with the MeSH term. Associations: the number of unique SNPs reported to be associated with the MeSH term. Genes: the number of unique genes to which the SNPs associated with the MeSH term are mapped.

  2. Supplementary Table 2: Count of drugs and genes mapping to each MeSH term. (31,608 KB)

    MeSH: unique MeSH terms mapped to indications. Drugs: the number of drugs in Pharmaprojects with the MeSH term as an indication. Genes: the number of genes reported to be targets for the drugs with the MeSH term as an indication.

  3. Supplementary Table 3: Count of drugs and MeSH terms corresponding to each drug target gene. (53,163 KB)

    Gene: unique drug target genes. Drugs: the number of drugs reported to target the gene product. MeSH: the number of MeSH terms for which the drugs targeting the gene product are indicated.

  4. Supplementary Table 4: Count of target genes and MeSH terms corresponding to each drug. (614 KB)

    Drug: unique drugs. Genes: the number of genes reported to be targets for the drug. MeSH: the number of MeSH terms indicated for the drug.

  5. Supplementary Table 5: All indications for drugs with reported human drug targets with the number of genes associated with them. (46,723 KB)

    MSH.Ind: drug indication (mapped to the best MeSH term). N.Traits: the number of indications or traits with a relative similarity ≥0.7 to the indication. N.Assns: the number of independent associations reported for the indication or a similar trait. Traits: list of traits with a relative similarity ≥0.7 to the indication; may be other indications without a corresponding genetic trait. lApprovedUS.EU: logical indication of whether the indication is for a drug approved in the United States or European Union.

  6. Supplementary Table 6: Drug targets with a genetic association (GWASdb or OMIM) mapped to the same gene for a trait with relative similarity ≥0.7 to the corresponding indication. (2,649 KB)

    Gene: drug target gene. MSH.Ind: drug indication (mapped to the best MeSH term). Category: indication disease category. MSH.Trt: MeSH term for the trait with a genetic association to the drug target gene. pvalue: the P value for the genetic association. If multiple associations mapped the same gene to the same trait, the smallest P value with the largest gene score was selected. P values of zero indicate that the association is from OMIM. eCat: extension of the RegulomeDB category. If the variant mapping the association to the listed gene was not due to a DHS-correlated enhancer, a value of "0" was assigned to amino acid–changing variants, "2" was assigned for eQTLs and "9" was assigned otherwise. Associations from OMIM were not assigned a value. Rank: the rank of the mapping of the given gene for the reported association. RelSim: relative similarity between the indication and trait. LatestPhase: the latest phase that each unique gene-indication pair achieved in Pharmaprojects.

  7. Supplementary Table 7: Counts of target-indication pairs included in the analyses presented in Figure 3b and used to estimate the enrichment of genetic associations. (11,292 KB)

    Phase.Latest: latest phase in the development pipeline to which each target-indication combination has progressed. Num.Targets: total number of target-indication combinations progressing to that phase. N.Overlap: number of target-indication combinations that overlap with a gene-trait association. Percent: percent of target-indication combinations that overlap with a gene-trait association. Genetic.Evidence: the source of the genetic evidence.

  8. Supplementary Table 8: Manually scored MeSH term similarities to values of 0.9, 0.75 and 0.5 reflecting a subjective measure of similarity not captured via the ontological relationships. (28,701 KB)

    MSH1: MeSH term 1. MSH2: MeSH term 2. ManSim: manually assigned similarity. RelSim: relative similarity averaging Resnik and Lin relative similarity measures. RelSim.Res: Resnik relative similarity. RelSim.Lin: Lin relative similarity.

  9. Supplementary Table 9: Manual assignment of traits or drug indications (MeSH terms) to disease categories. (155 KB)

    MSH: MeSH term. MSH.Top: MeSH term for the top level of the MeSH hierarchy. Disease: the original disease trait. Indication: the original indication. Category: manually assigned trait or indication category.
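The Percent column of Supplementary Table 7 and the exact 95% confidence intervals quoted in the figure captions can be illustrated with a short Python sketch. It assumes that "exact" refers to the Clopper-Pearson binomial interval, which the text above does not state, and it uses hypothetical counts in place of real table rows. The function names (`binom_pmf`, `binom_cdf`, `find_root`, `exact_ci`) are illustrative only.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed on the log scale."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1.0 - p)
    )
    return exp(log_pmf)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def find_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for a monotone function whose sign differs at lo and hi."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_ci(k, n, conf=0.95):
    """Clopper-Pearson interval for k successes out of n trials."""
    alpha = 1.0 - conf
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else find_root(
        lambda p: 1.0 - binom_cdf(k - 1, n, p) - alpha / 2.0
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else find_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2.0
    )
    return lower, upper


# Hypothetical counts standing in for Num.Targets and N.Overlap.
num_targets, n_overlap = 200, 16
percent = 100.0 * n_overlap / num_targets
lower, upper = exact_ci(n_overlap, num_targets)
print(f"{percent:.1f}% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

The bisection avoids any dependency beyond the standard library; a statistics package with a beta quantile function would give the same bounds more directly.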

Text files

  1. Supplementary Data Set 1: GWASdb entries with MeSH terms mapped for each trait and genes annotated as described in the Online Methods. (12,332 KB)

    Disease: the name of the trait for the corresponding genetic association as provided by GWASdb, generally taken directly from the GWAS Catalog or from whichever source the association was derived.
    snp_id: the identifier (generally a dbSNP rs ID) of the SNP reported to be associated with disease.
    Link: the reference for the association. Most references are a PubMed ID for the published paper.
    pvalue: the P value reported for the association of snp_id with disease.
    Source: the origin of the association information. It may be one of the following. GWAS:A/B: results listed in the NHGRI GWAS Catalog; the associations of GWAS:B are drawn from published tables and supplementary information. Omim: from Online Mendelian Inheritance in Man; all P values are zero. GWASCentral: from GWAS Central. dbGaP: associations from dbGaP.
    SNP.Trait.Cnt: the number of associations of the same snp_id with the same disease in the original data set. These have been reduced to a single row in this data set, and the minimum P value was selected.
    MSH: Medical Subject Heading for disease, manually mapped by Computational Biology.
    MSH.Top: the MeSH term for the top level of the branch to which the trait is mapped. In most instances, a single MSH may be mapped to many branches. When this occurs, the most common top-level term in GWASdb is selected.
    snp.ld: a SNP in linkage disequilibrium (LD) with snp_id that provides a plausible connection to a gene.
    Gene: a gene for which snp.ld lies within 5 kb, is an eQTL, sits in a DNase I hypersensitivity site correlated with the gene, or lies within the transcription start site.
    r2: LD between snp_id and snp.ld.
    eqtl: indicates whether snp.ld is an eQTL for a gene. The eQTL data are drawn from eqtl.uchicago.edu.
    rdb: indicates whether the snp.ld-gene mapping is the result of a DHS correlation (from Maurano et al. (2012), provided by J. Stamatoyannopoulos).
    Cat.rdb: RegulomeDB category of the SNP (if rdb is "yes"). Lower values indicate more lines of converging functional evidence.
    eCat: a derivative of Cat.rdb, filling in values where rdb is "no." If eqtl is "yes" but rdb is "no," then it gets a value of 2. If snp.ld is a missense variant (amino acid change), the value is 0. The value is 9 otherwise.
    AAEffect: amino acid effect of snp.ld.
    AAScore: Condel score from VEP for nonsynonymous variants.
    GeneScore: an overall assessment of the evidence that the associated variant has a causal effect on the gene in question, ranging from zero to eight. Higher scores imply higher weight of causal evidence. The contributions to GeneScore are summarized on a separate GeneScore Wiki page.
    Rank: the rank of the given gene for its strength of connection to snp_id. This takes LD and functional evidence into account.

  2. Supplementary Data Set 2: Reduction of the genetic association data to a single row per gene and trait as used for most analyses described. (2,973 KB)

    Variable names are as given for Supplementary Data Set 1.

  3. Supplementary Data Set 3: Data set of unique target-indication combinations in Pharmaprojects. (1,521 KB)

    Gene: drug target gene. MSH: MeSH term for drug indication. MSH.Top: top-level MeSH term. Phase.Latest: latest phase to which the target-indication pair progressed through the development pipeline. lApprovedUS.EU: indicator of whether a drug for the target-indication pair has been approved in the United States or European Union.

  4. Supplementary Data Set 4: Relative similarity matrix of MeSH terms. (60,162 KB)

    Row and column names correspond to each MeSH term for which a relative similarity could be computed.
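The eCat rule described for Supplementary Data Set 1 and the reduction to one row per gene and trait in Supplementary Data Set 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It assumes a non-empty AAEffect marks an amino acid change, that an amino acid change takes precedence over an eQTL when both apply, and that ties on P value are broken by the larger GeneScore (following the description of Supplementary Table 6). The helper `toy_row` and all row values are hypothetical.

```python
def assign_ecat(row):
    """Derive eCat from the rdb, Cat.rdb, AAEffect and eqtl fields."""
    if row["rdb"] == "yes":
        return row["Cat.rdb"]  # DHS-correlated mapping: keep the RegulomeDB category
    if row["AAEffect"]:  # assumption: non-empty means an amino acid change
        return 0
    if row["eqtl"] == "yes":
        return 2
    return 9


def reduce_to_gene_trait(rows):
    """Keep one row per (Gene, MSH): smallest P value, then largest GeneScore."""
    best = {}
    for row in rows:
        key = (row["Gene"], row["MSH"])
        order = (row["pvalue"], -row["GeneScore"])
        if key not in best or order < best[key][0]:
            best[key] = (order, row)
    return [row for _, row in best.values()]


def toy_row(gene, msh, pvalue, score, rdb="no", cat=None, eqtl="no", aa=""):
    """Build a hypothetical row using the Data Set 1 column names."""
    return {
        "Gene": gene, "MSH": msh, "pvalue": pvalue, "GeneScore": score,
        "rdb": rdb, "Cat.rdb": cat, "eqtl": eqtl, "AAEffect": aa,
    }


rows = [
    toy_row("GENE_A", "trait 1", 1e-9, 3, eqtl="yes"),
    toy_row("GENE_A", "trait 1", 1e-12, 5, rdb="yes", cat=1),
    toy_row("GENE_B", "trait 1", 1e-8, 1),
    toy_row("GENE_B", "trait 2", 1e-10, 6, aa="missense"),
]

ecats = [assign_ecat(r) for r in rows]
reduced = reduce_to_gene_trait(rows)
print(ecats)
print([(r["Gene"], r["MSH"], r["pvalue"]) for r in reduced])
```

Sorting on the tuple (P value, negative GeneScore) keeps the selection rule in one place, so a different tie-break only requires changing that tuple.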
