A genetics-led approach defines the drug target landscape of 30 immune-related traits


Most candidate drugs currently fail later-stage clinical trials, largely due to poor prediction of efficacy on early target selection [1]. Drug targets with genetic support are more likely to be therapeutically valid [2,3], but the translational use of genome-scale data such as from genome-wide association studies for drug target discovery in complex diseases remains challenging [4,5,6]. Here, we show that integration of functional genomic and immune-related annotations, together with knowledge of network connectivity, maximizes the informativeness of genetics for target validation, defining the target prioritization landscape for 30 immune traits at the gene and pathway level. We demonstrate how our genetics-led drug target prioritization approach (the priority index) successfully identifies current therapeutics, predicts activity in high-throughput cellular screens (including L1000, CRISPR, mutagenesis and patient-derived cell assays), enables prioritization of under-explored targets and allows for determination of target-level trait relationships. The priority index is an open-access, scalable system accelerating early-stage drug target selection for immune-mediated disease.

Fig. 1: Overview of Pi applied to rheumatoid arthritis.
Fig. 2: Validating Pi target prioritization for rheumatoid arthritis.
Fig. 3: Cross-trait application of Pi informing utility of approach and predictors.
Fig. 4: Landscape of prioritized target genes across immune traits.
Fig. 5: Landscape of prioritized target pathways across immune traits.
Fig. 6: Multitrait comparisons.

Data availability

The data that support the findings of this study are available within the paper and its Supplementary Information files. The Pi relational database has been deposited into figshare and is also available from the Pi web server.

Code availability

Software code, together with the user and reference manual, has been packaged and deposited into Bioconductor, including the code for the showcase in this manuscript, in support of reproducible research.


References

1. Hay, M., Thomas, D. W., Craighead, J. L., Economides, C. & Rosenthal, J. Clinical development success rates for investigational drugs. Nat. Biotechnol. 32, 40–51 (2014).
2. Plenge, R. M., Scolnick, E. M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).
3. Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
4. Finan, C. et al. The druggable genome and support for target identification and validation in drug development. Sci. Transl. Med. 9, eaag1166 (2017).
5. Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).
6. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
7. Albert, F. W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).
8. Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).
9. Giambartolomei, C., Vukcevic, D., Schadt, E. E., Franke, L. & Hingorani, A. D. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
10. Spalinger, M. R. et al. PTPN2 regulates inflammasome activation and controls onset of intestinal inflammation and colon cancer. Cell Rep. 22, 1835–1848 (2018).
11. Svensson, M. N. D. et al. Reduced expression of phosphatase PTPN2 promotes pathogenic conversion of Tregs in autoimmunity. J. Clin. Invest. 129, 1193–1210 (2019).
12. Manguso, R. T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413–418 (2017).
13. Guo, Y. et al. CD40L-dependent pathway is active at various stages of rheumatoid arthritis disease progression. J. Immunol. 198, 4490–4501 (2017).
14. Schwabe, C. et al. Safety, pharmacokinetics, and pharmacodynamics of multiple rising doses of BI 655064, an antagonistic anti-CD40 antibody, in healthy subjects: a potential novel treatment for autoimmune diseases. J. Clin. Pharmacol. 58, 1566–1577 (2018).
15. Marigorta, U. M. et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn’s disease. Nat. Genet. 49, 1517–1521 (2017).
16. Jonkers, I. H. & Wijmenga, C. Context-specific effects of genetic variants associated with autoimmune disease. Hum. Mol. Genet. 26, 185–192 (2017).
17. Atsumi, T. et al. A point mutation of Tyr-759 in interleukin 6 family cytokine receptor subunit gp130 causes autoimmune arthritis. J. Exp. Med. 196, 979–990 (2002).
18. Sakaguchi, N. et al. Altered thymic T-cell selection due to a mutation of the ZAP-70 gene causes autoimmune arthritis in mice. Nature 426, 454–460 (2003).
19. Meng, X. et al. Hypoxia-inducible factor-1α is a critical transcription factor for IL-10-producing B cells in autoimmune disease. Nat. Commun. 9, 251 (2018).
20. Vermeire, K. et al. Accelerated collagen-induced arthritis in IFN-gamma receptor-deficient mice. J. Immunol. 158, 5507–5513 (1997).
21. Boe, A., Baiocchi, M., Carbonatto, M., Papoian, R. & Serlupi-crescenzi, O. Interleukin 6 knock-out mice are resistant to antigen-induced experimental arthritis. Cytokine 11, 1057–1064 (1999).
22. Tada, B. Y., Ho, A., Matsuyama, T. & Mak, T. W. Reduced incidence and severity of antigen-induced autoimmune diseases in mice lacking interferon regulatory factor-1. J. Exp. Med. 185, 231–238 (1997).
23. Lacey, C. A., Mitchell, W. J., Brown, C. R. & Skyberg, A. Temporal role for MyD88 in a model of Brucella-induced arthritis and musculoskeletal inflammation. Infect. Immun. 85, e00961–16 (2017).
24. Wong, P. K. K. et al. SOCS-3 negatively regulates innate and adaptive immune mechanisms in acute IL-1-dependent inflammatory arthritis. J. Clin. Invest. 116, 1571–1581 (2006).
25. Pierer, M., Wagner, U., Rossol, M. & Ibrahim, S. Toll-like receptor 4 is involved in inflammatory and joint destructive pathways in collagen-induced arthritis in DBA1J mice. PLoS ONE 6, e23539 (2011).
26. De Wolf, H. et al. High-throughput gene expression profiles to define drug similarity and predict compound activity. Assay Drug Dev. Technol. 16, 162–176 (2018).
27. Fang, H. & Gough, J. supraHex: an R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map. Biochem. Biophys. Res. Commun. 443, 285–289 (2014).
28. Dargahi, N. et al. Multiple sclerosis: immunopathology and treatment update. Brain Sci. 7, 78 (2017).
29. Brockmann, M. et al. Genetic wiring maps of single-cell protein states reveal an off-switch for GPCR signalling. Nature 546, 307–311 (2017).
30. Mujtaba, M. G. et al. Treatment of mice with the suppressor of cytokine signaling-1 mimetic peptide, tyrosine kinase inhibitor peptide, prevents development of the acute form of experimental allergic encephalomyelitis and induces stable remission in the chronic relapsing/remit. J. Immunol. 175, 5077–5086 (2005).
31. Todd, J. A. et al. Regulatory T cell responses in participants with type 1 diabetes after a single dose of interleukin-2: a non-randomised, open label, adaptive dose-finding trial. PLoS Med. 13, e1002139 (2016).
32. Danese, S. et al. Tofacitinib as induction and maintenance therapy for ulcerative colitis. N. Engl. J. Med. 377, 1723–1736 (2017).
33. Panés, J. et al. Tofacitinib for induction and maintenance therapy of Crohn’s disease: results of two phase IIb randomised placebo-controlled trials. Gut 66, 1049–1059 (2017).
34. Tulunay, A. et al. Activation of the JAK/STAT pathway in Behcet’s disease. Genes Immun. 16, 170–175 (2015).
35. Beeh, K., Kanniess, F., Wagner, F., Schilder, C. & Naudts, I. The novel TLR-9 agonist QbG10 shows clinical efficacy in persistent allergic asthma. J. Allergy Clin. Immunol. 131, 866–874 (2013).
36. Parnas, O. et al. A genome-wide CRISPR screen in primary immune cells to dissect regulatory networks. Cell 162, 675–686 (2015).
37. Hedrich, C. M. Epigenetics in SLE. Curr. Rheumatol. Rep. 19, 58 (2017).
38. Singh, N. et al. Alterations in nuclear structure promote lupus autoimmunity in a mouse model. Dis. Model Mech. 9, 885–897 (2016).
39. Banerjee, S., Biehl, A., Gadina, M., Hasni, S. & Schwartz, D. M. JAK–STAT signaling as a target for inflammatory and autoimmune diseases: current and future prospects. Drugs 77, 521–546 (2017).
40. Lee, W. H. Open access target validation is a more efficient way to accelerate drug discovery. PLoS Biol. 13, e1002164 (2015).
41. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2016).
42. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
43. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
44. Fairfax, B. P. et al. Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510 (2012).
45. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
46. Naranbhai, V. et al. Genomic modulators of gene expression in human neutrophils. Nat. Commun. 6, 7545 (2015).
47. Kasela, S. et al. Pathogenic implications for autoimmune mechanisms derived by comparative eQTL analysis of CD4+ versus CD8+ T cells. PLoS Genet. 13, e1006643 (2017).
48. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
49. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
50. Hamosh, A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2004).
51. Kibbe, W. A. et al. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 43, D1071–D1078 (2015).
52. Köhler, S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 45, D865–D876 (2016).
53. Smith, C. L. & Eppig, J. T. The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med. 1, 390–399 (2009).
54. Grady, L. Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1768–1783 (2006).
55. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 39, 561–568 (2016).
56. Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
57. Loughin, T. M. A systematic comparison of methods for combining p-values from independent tests. Comput. Stat. Data Anal. 47, 467–485 (2004).
58. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
59. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 44, D481–D487 (2016).
60. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
61. Fang, H. & Gough, J. The ‘dnet’ approach promotes emerging research on cancer patient survival. Genome Med. 6, 64 (2014).
62. Wang, Z. et al. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nat. Commun. 7, 12846 (2016).
63. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
64. Schmidtke, P. & Barril, X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites. J. Med. Chem. 53, 5858–5867 (2010).
65. Mungall, C. J. et al. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 45, D712–D722 (2017).
66. Fang, H. et al. XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits. Genome Med. 8, 129 (2016).


Acknowledgements

We thank A. Edwards for comments on the manuscript. This project was supported by: the European Research Council (FP7/2007-2013), through an EU/EFPIA Innovative Medicines Initiative Joint Undertaking (ULTRA-DD 115766 and 281824 to J.C.K.); Arthritis Research UK (20773 to J.C.K.); the Wellcome Trust Investigator Award (204969/Z/16/Z to J.C.K.); Wellcome Trust grants 090532/Z/09/Z and 203141/Z/16/Z (to the Wellcome Centre for Human Genetics core facility) and 201488/Z/16/Z (to B.P.F.); NIHR Oxford Biomedical Research Centre; Estonian Research Council (PRG184 to L.M.); Alzheimer’s Research UK (ARUK-2018DDI-OX to P.E.B.); and Structural Genomics Consortium (charity number 1097737), which receives funds from AbbVie, Bayer Pharma, Boehringer Ingelheim, the Canada Foundation for Innovation, the Eshelman Institute for Innovation, Genome Canada, the Innovative Medicines Initiative (EU/EFPIA) (ULTRA-DD grant number 115766), Janssen, Merck (Darmstadt, Germany), MSD, Novartis Pharma, the Ontario Ministry of Economic Development and Innovation, Pfizer, the São Paulo Research Foundation, Takeda and the Wellcome Trust (106169/ZZ14/Z). For computation, we used the Oxford Biomedical Research Computing facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute, supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, NIHR or Department of Health. Listed in the ULTRA-DD Consortium are Target Prioritization Network (TPN) members (alphabetical order).

Author information

Contributions

H.F., J.C.K., M.S., C.B., P.B., B.P.F., C.A.O. and P.J.P. conceived of the study. H.F. and J.C.K. developed the methodology. H.F. developed the software and curated the database. H.F., H.D.W., B.K., K.L.B., J.O., S.K. and J.K.W. performed the analyses. H.D.W., F.E.M., L.C., T.S., Y.S. and L.B. performed the investigation. P.J.P., H.W.G., B.P.F., J.C.K., L.M., B.D.M., D.D., S.D.C. and P.E.B. provided resources. H.F. curated the data. H.F. and J.C.K. wrote the original draft. H.F., J.C.K., K.L.B., B.K., L.H., J.O., H.D.W., M.S., C.A.O., A.L.L. and F.E.M. reviewed and edited the manuscript. H.F., J.C.K., H.D.W., K.L.B., S.D.C. and B.K. revised the manuscript. H.F., J.C.K., A.S., A.L.L. and K.L.B. designed the visualization. J.C.K. supervised the study. J.C.K. and M.S. acquired the funding.

Corresponding author

Correspondence to Julian C. Knight.

Ethics declarations

Competing interests

The Structural Genomics Consortium receives funds from AbbVie, Bayer Pharma, Boehringer Ingelheim, the Canada Foundation for Innovation, the Eshelman Institute for Innovation, Genome Canada, Janssen, Merck (Darmstadt, Germany), MSD, Novartis Pharma, the Ontario Ministry of Economic Development and Innovation, Pfizer, the São Paulo Research Foundation, Takeda and the Wellcome Trust (authors B.D.M., D.D., C.B., Y.S., L.B. and M.S.). These funders had no direct role in study conceptualization, design, data collection, analysis, decision to publish or preparation of the manuscript, except for Janssen (authors H.D.W., J.K.W., H.W.G. and P.J.P.), which generated the L1000 data in house for the compound screen presented in the paper.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Pilot analysis supporting principles used by Pi.

a, Schematic illustration of scoring for nearby genes (nGene) from GWAS summary data, accounting for linkage disequilibrium (LD) structure and genomic organization. Genomic proximity is captured by a distance window around GWAS SNPs that takes account of LD structure; genomic organization is captured by constraining nearby genes and SNPs to the same topologically associated domain (TAD). b, Simultaneous optimization of parameters regarding genomic influential range (distance and decay for nearby gene scoring) and network influential range (the restarting probability controlling the degree of network connectivity being exploited by random walk with restart), in terms of performance measured by area under the curve (AUC) separating gold standard positives (clinical proof-of-concept immune drug targets) and gold standard negatives (simulated genes unlikely to be drug targets). Clinical proof-of-concept immune drug targets are sourced from the ChEMBL database, defined as a collection of target genes of drugs with phase 2 concluded, moving into phase 3 and above in Pi immune traits. c, Enrichment analysis of approved immune drug targets (left), phase 3 and above immune drug targets (middle), and clinical proof-of-concept immune targets (right) in terms of chromatin conformation genes (cGene) by physical interaction and eQTL genes (eGene) by expression. Enrichment analysis is based on one-sided Fisher’s exact test, with the vertical line in grey indicating the false discovery rate (FDR) threshold at 0.05. d, cGenes are scored using the empirical cumulative distribution function (eCDF) of the significance/strength level linking an SNP to a gene. e, Methodological overview of incorporating colocalization analysis at the GWAS-eQTL integration step in the Pi pipeline, and of how directionality and magnitude of effect are estimated.
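The role of the restarting probability in panel b can be illustrated with a minimal random walk with restart on a toy network. This is only a sketch under stated assumptions, not the Pi implementation: the function name, the four-node network, the seed scores and the default restart value of 0.7 are all hypothetical, and the network is assumed to be undirected with no isolated nodes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state affinity scores for every node.

    adjacency: dict mapping node -> set of neighbour nodes (undirected, no isolated nodes).
    seeds: dict mapping seed node -> positive seed score (hypothetical genomic evidence).
    restart: probability of jumping back to the seed distribution at each step.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Normalised seed distribution; non-seed nodes start at zero.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m spreads its probability evenly over its own edges.
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy chain network A - B - C - D with a single seed gene A (all names hypothetical).
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_scores = random_walk_with_restart(toy_network, {"A": 1.0})
```

A higher restart value keeps the scores concentrated on the seed genes, whereas a lower value lets evidence diffuse further into the network, which is the trade-off the parameter optimization explores.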

Supplementary Figure 2 Network analysis supporting principles used by Pi.

a, Enrichment analysis of immune drug targets (of different phases) in terms of GWAS reported genes (left) and GWAS genes plus their interacting neighbors (right). Inserted below is per-trait enrichment analysis focusing on clinical proof-of-concept immune drug targets. Fisher’s exact test (two-sided) used to calculate odds ratio (OR) with 95% confidence interval (CI; represented by lines). GWAS reported genes are sourced from GWAS Catalog (P < 5 × 10−8), and interaction neighbors of GWAS genes are sourced from the STRING database, defined as interactions with high confidence score. b, Schematic overview showing an approach for identifying pathway crosstalk and assessing the significance level.
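The enrichment tests used throughout these figures can be sketched as a hypergeometric tail probability. The example below is an illustrative one-sided (over-representation) Fisher's exact test written with the Python standard library; the function name and the counts in the usage line are hypothetical, and it reports the simple sample odds ratio rather than the estimate with a 95% confidence interval shown in the figure.

```python
from math import comb


def fisher_enrichment(overlap, n_targets, n_genes, n_background):
    """One-sided Fisher's exact test for over-representation.

    overlap: number of genes that are both drug targets and in the gene set.
    n_targets: number of drug targets in the background.
    n_genes: size of the gene set being tested (for example, GWAS genes).
    n_background: total number of genes considered.
    Returns (p_value, sample_odds_ratio).
    """
    upper = min(n_targets, n_genes)
    denominator = comb(n_background, n_genes)
    # P(X >= overlap) when n_genes are drawn without replacement from the background.
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_genes - k)
        for k in range(overlap, upper + 1)
    )
    p_value = tail / denominator
    # 2x2 table cells: a = both, b = gene set only, c = target only, d = neither.
    a = overlap
    b = n_genes - overlap
    c = n_targets - overlap
    d = n_background - n_genes - n_targets + overlap
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return p_value, odds_ratio


# Hypothetical counts: 4 of 5 gene-set members are targets, in a background of 10 genes.
p_value, odds_ratio = fisher_enrichment(4, 5, 5, 10)
```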

Supplementary Figure 3 Illustration of data and links for prioritization, evidence, druggability and effect output in rheumatoid arthritis (RA).

a, The front page with a navigation tab listing main features and an interface “Pi Reveal” searching for traits and targets. b, The trait-specific page for prioritized target genes. c, A tabular display of the top 5 genes in target pathway crosstalk together with an overview of evidence, druggability and effect. d, Illustration of data and links for one prioritized gene, CD40, showing evidence used in Pi for this gene and thus accelerating decision-making on target selection. This includes but is not limited to: (i) target priority info, linked to details on genomic predictors (for example, cell types/states in which eQTLs and eGenes are identified together with magnitude and directionality of effect), annotation predictors and interaction/network plot; and (ii) target druggable info, where available, including DGIdb druggable gene categories and PDB druggable pockets (linked to details and 3D View of the PDB protein structure embedded with druggable pockets shown in blue).

Supplementary Figure 4 Diagram of schema used in the Pi relational database.

The database is available together with schema documentation giving instructions on how to install and use it. In particular, the table “pi_priority” provides trait-specific target priority information together with an overview of associated information. The linked tables provide details on data used for predictor preparation, current therapeutic drugs with phase of development, and different measurements on druggability.

Supplementary Figure 5 Further information regarding target prioritizations and validation for rheumatoid arthritis (RA).

a, Enrichment analysis of Pi top 1% prioritized target genes for RA, in terms of targets of approved drugs in RA, juvenile idiopathic arthritis (JIA) or 28 other immune traits. Fisher’s exact test (one-sided) used. b, Gold standard positives (GSPs) were defined from approved drug targets; genes unlikely to be drug targets (gold standard negatives, GSNs) were defined from the gene druggable space in which GSPs and their direct interaction neighbors had been removed. c, Benchmarking Pi, comparing performance of a naïve method (how often a gene is targeted by drugs), and two other genetics-based methods to separate approved drug targets (GSPs) from GSNs. d, Comparing performance of Pi versus Mendelian disease genetics plus network neighbors (identified via random walk). e, TSEA of Pi prioritized target genes for RA, in terms of disease-specific gene expression signatures in RA (compared to unaffected controls). Normalized enrichment score (NES) indicates likelihood of Pi prioritized target genes to be modulated in disease (over- or under-expressed). Disease-specific gene expression signatures are sourced from CREEDS, with cell types or tissues of origin and the primary data source (GSE accession number) indicated. f, Venn diagram illustrating the enrichment of genes with druggable pockets in RA novel targets (defined as top 1% prioritized but excluding targets of current therapeutics in RA). The significance level (P), odds ratio (OR) and 95% confidence interval (CI) calculated according to Fisher’s exact test (one-sided). g, Identification of drug perturbation gene signatures enriched in RA novel targets. The significance level (FDR) calculated according to Fisher’s exact test (one-sided). Novel RA targets in these enriched signatures shown on the left, with targets ordered according to Pi rating. PBMCs, peripheral blood mononuclear cells; LPS, lipopolysaccharide; CML, chronic myeloid leukemia. h, Novel RA targets identified by Pi that are targets of approved drugs in other disease indications, indicating potential repurposing opportunities.

Supplementary Figure 6 Analysis of Pi rating for drugs and target genes using L1000 data and disease-specific gene signatures in RA.

a, Schematic overview showing approach to validate Pi rating at the drug and target level. b, Correlation of Pi ratings with disease-relevant activity of a compound (transcriptional similarity between an RA disease gene expression signature and the compound transcriptional profile in PBMCs quantified using Zhang’s connection score), shown at the drug (left) and target (right) level. Spearman rank correlation calculated, with the significance level estimated empirically (randomly sampling 20,000 times). Sensitivity (estimated by removing different percentages of drugs, repeated 100 times) and specificity (estimated by calculating the correlations based on Pi rating in 29 other immune traits listed in Fig. 3a) shown below. Error bars represent standard deviation with the mean centered. c, L1000 data identifying compounds targeting novel targets in RA. Zhang’s connection score quantifies the disease-relevant expression activity of a compound as the transcriptional similarity between the RA disease gene expression signature and the compound transcriptional profile in PBMCs. The significance level (P = 5%) for the connection score estimated by randomized test using the methodology.
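The correlation analysis in panel b can be sketched as a Spearman rank correlation with an empirically estimated significance level. The sketch below assumes that the random sampling is a permutation of one variable against the other, which the caption does not specify; the function names are hypothetical, and inputs are assumed to be non-constant lists of equal length.

```python
import random


def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def empirical_p_value(x, y, n_samples=20000, seed=0):
    """Two-sided empirical P value from random permutations of y (an assumption)."""
    rng = random.Random(seed)
    rank_x, rank_y = ranks(x), ranks(y)
    observed = pearson(rank_x, rank_y)
    shuffled = list(rank_y)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(pearson(rank_x, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)
```

With 20,000 samples, the smallest P value this estimator can report is 1/20,001, which sets the resolution of the empirical significance level.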

Supplementary Figure 7 Pi predictors, network effect, and negative control.

a, Performance comparisons for individual predictors across traits (within Pi and direct use). Measured by area under the curve (AUC) separating gold standard positives (GSPs) and gold standard negatives (simulated). Analyses use GSPs based on either targets of drugs at phase 2 and above (left), targets of drugs at phase 3 and above (middle) or approved drug targets (right). Direct use of immune annotations without knowledge of genomic seed genes is much less predictive of drug targets. Notably, this analysis is restricted to traits with >10 targets, that is, 16 traits based on targets of phase 2 and above (left), 11 traits based on phase 3 and above (middle), and 5 traits based on approved drug targets (right). b, Optimizing components of Pi predictors. Comparing individual predictors (x-axis) constructed using both seed genes and non-seed genes (left) and using only seed genes (without incorporating network connectivity, right). GSPs are based on target genes of drugs at phase 2 and above. c, Scatter plot showing relationship between network degree and priority rank (all 30 traits with a total of n = 1,200 dots). Correlation based on Spearman’s rank test (two-sided). d, Negative control for enrichment of immune drug targets. Left: schematic illustration of use of Experimental Factor Ontology (EFO) tree to select immune mediated GWAS disease traits and non-immune GWAS disease traits. Right: target set enrichment analysis (TSEA) for approved immune drug targets in the Pi prioritized gene list taking as inputs GWAS SNPs from immune mediated diseases (in blue) or from exclusively non-immune mediated diseases (in yellow), with sensitivity assessed by removing different percentages of GWAS SNPs. The horizontal line in grey indicates the Bonferroni-adjusted P-value threshold at 0.05.
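The AUC used to compare predictors has a direct probabilistic reading: it is the chance that a randomly chosen gold standard positive is scored above a randomly chosen gold standard negative. A minimal sketch of that calculation follows; the function name and the scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve from scores of positives (GSPs) and negatives (GSNs).

    Counts the fraction of positive-negative pairs in which the positive
    has the higher score; ties contribute one half.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical priority scores for three GSPs and four GSNs.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])
```

An AUC of 0.5 corresponds to a predictor that cannot separate the two groups, and 1.0 to perfect separation.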

Supplementary Figure 8 Machine learning and predictor informativeness.

a, Performance comparisons between machine learning algorithms. Area under the curve (AUC) shown with 95% confidence intervals based on 10-repeated 3-fold cross validation per algorithm with optimized tuning parameters. Per fold, two thirds of GSPs and GSNs used for training, one third for performance evaluation. GSPs are based on target genes of drugs at phase 2 and above (that is, clinical proof-of-concept targets). Random forest consistently outperforms or is competitive with the top performer of other algorithms (followed by state-of-the-art boosting algorithms, classical ones, and generalized linear algorithms). b, Relative importance of predictors. Measured by decrease in accuracy (disabling that predictor) scaled relative to maximum decrease, estimated by random forest. Annotation predictors with knowledge of genomic seed genes are in general more informative than genomic predictors based on actual ‘usage’ of predictors by random forest; this is consistent with performance directly measured for individual predictors (Fig. 3b).

Supplementary Figure 9 Target set enrichment analysis for immune traits.

Analysis is restricted to 11 traits with >10 phase 3 and above targets (left), and to 5 traits with >10 approved drug targets (right). Bar plot shows the proportion of such targets at “leading edge” of prioritized rankings. Coverage (total number within the leading edge / total number of targets for that trait) indicated, together with FDR.
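The leading edge and coverage described here can be illustrated with a simplified running-sum enrichment score in the style of gene set enrichment analysis. This is a sketch under stated assumptions rather than the Pi procedure: the function name and the toy genes are hypothetical, hits are weighted by their absolute score, only positive enrichment is considered, and the permutation step needed for a normalized enrichment score or FDR is omitted.

```python
def target_set_enrichment(ranked_genes, scores, target_set):
    """Running-sum enrichment of a target set in a prioritized gene list.

    ranked_genes: genes ordered from highest to lowest priority.
    scores: dict mapping gene -> priority score (used to weight hits).
    target_set: set of known drug targets; must hit some but not all ranked genes.
    Returns (enrichment_score, leading_edge, coverage).
    """
    is_hit = [gene in target_set for gene in ranked_genes]
    n_hits = sum(is_hit)
    n_misses = len(ranked_genes) - n_hits
    hit_weight = sum(abs(scores[g]) for g, hit in zip(ranked_genes, is_hit) if hit)
    running, best, peak = 0.0, 0.0, -1
    for index, (gene, hit) in enumerate(zip(ranked_genes, is_hit)):
        if hit:
            running += abs(scores[gene]) / hit_weight  # step up at a target
        else:
            running -= 1.0 / n_misses  # step down at a non-target
        if running > best:
            best, peak = running, index
    # Leading edge: targets ranked at or before the peak of the running sum.
    leading_edge = [g for g, hit in zip(ranked_genes[: peak + 1], is_hit) if hit]
    coverage = len(leading_edge) / n_hits
    return best, leading_edge, coverage


# Toy ranking of eight hypothetical genes with scores 8 down to 1.
toy_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
toy_scores = {gene: float(8 - i) for i, gene in enumerate(toy_genes)}
es, edge, cov = target_set_enrichment(toy_genes, toy_scores, {"g1", "g2", "g5"})
```

In this toy ranking two of the three targets fall before the peak, so the coverage is 2/3, matching the definition of coverage given in the caption.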

Supplementary Figure 10 Sensitivity of enrichments for immune drug targets to immune-related annotation predictors.

Scatter plot shows target set enrichment analysis results including normalized enrichment score (NES), coverage and FDR (the horizontal line in blue indicating the FDR threshold at 0.01) for the prioritized gene list after removing one or more annotation predictors.

Supplementary Figure 11 Cluster analysis of top 1% prioritized genes across 16 immune traits.

a, A supra-hexagonal map labeled with the built-in index and the number of target genes per hexagon. Six target clusters were identified using the region-growing partition of the trained map. This partition allows for topology-preserving identification of target clusters, each of which is continuous over the map. b, Schematic flowchart illustrating multilayer data comparison. Target-trait matrix containing Pi rating is used as input data for training the map, while druggable pocket data (a binary vector) is used as additional data for overlaying onto the trained map. The resulting druggable map indicates the probability of each hexagon containing druggable genes, with the percentage (%) of druggable genes for each cluster shown (pie chart).

Supplementary Figure 12 Target cluster enrichment analysis.

a, Forest plot of approved drug targets (grouped by drug indication terms) enriched in cluster C6. The significance level (FDR), odds ratio and 95% CI are calculated according to Fisher’s exact test (one-sided). Gene lists are also shown for each enriched drug indication term. Right: illustration of approved drug targets for immune system diseases found in the cluster, together with diseases and drugs. b, Forest plot of KEGG pathways enriched in target clusters. Lines represent 95% CI (one-sided Fisher’s exact test).
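As an illustration of the one-sided Fisher’s exact test behind these enrichments, the sketch below computes the upper-tail hypergeometric probability for a 2×2 table. The function name `fisher_one_sided` is hypothetical, and the odds ratio returned is the plain sample odds ratio (ad/bc), which is an assumption of this sketch: statistical packages may report a conditional maximum-likelihood estimate instead, and the confidence interval and FDR correction are not covered.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the table [[a, b], [c, d]].

    a: cluster genes annotated to the term    b: cluster genes not annotated
    c: background genes annotated (outside)   d: background genes not annotated
    Returns (p_value, sample_odds_ratio), where p_value = P(X >= a).
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for overlaps at least as large as observed.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    p_value = tail / comb(n, col1)
    odds = (a * d) / (b * c) if b * c else float("inf")
    return p_value, odds
```

For the table [[3, 1], [1, 3]] this gives p = 17/70 (about 0.243) and a sample odds ratio of 9.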

Supplementary Figure 13 Heatmap illustrating prioritized pathways for 12 traits.

Based on KEGG pathway map: immune system pathways (left) and signal transduction pathways (right).

Supplementary Figure 14 Target validation for systemic lupus erythematosus (SLE).

a, Illustration of protein targets (y-axis) targeted by chemical probes (x-axis). Probes are ordered by the number of targets they have. Multiple targets per probe are colour-coded by probe-to-target affinity measured by IC50. b, Epigenetic probe assays tested using cytokine-stimulated PBMCs from SLE patients (n = 5 patients recruited on a random basis, without risk of self-selection bias). c, Physical interaction of the region harboring SLE-associated SNP rs558702 with EHMT2 in macrophages M0 (promoter capture Hi-C score = 13.99), M1 (13.28) and M2 (10.58). d, Interaction plot for EHMT2 illustrating the top prioritized neighbors of this gene in SLE.

Supplementary Figure 15 Further analysis for 50 crosstalk network genes.

a, Individual pathways significantly enriched in the crosstalk. The significance level (FDR), odds ratio (OR) and 95% confidence interval (CI) are calculated according to Fisher’s exact test (one-sided), using all multitrait rated genes as the test background. b, Venn diagram illustrating the enrichment of mouse immune-mediated disease phenotypes, calculated using one-sided Fisher’s exact test with all multitrait rated genes as the test background. Genes annotated to mouse immune-mediated disease phenotypes are sourced from the Monarch Initiative. c, Identification of drug perturbation gene signatures enriched in crosstalk network genes. The significance level (FDR) is calculated according to Fisher’s exact test (one-sided). Crosstalk network genes in these enriched signatures are shown on the left. PBMCs, peripheral blood mononuclear cells; LPS, lipopolysaccharide; AML, acute myeloid leukemia; CML, chronic myeloid leukemia. d, Heatmap illustrating the potential nodal points/genes and their known PDB protein structures with predicted druggable pockets, color-coded by the number of druggable pockets found in the structure. e, Multitrait rating versus novelty score, with 41 highly rated but under-explored genes highlighted in the box. Also labeled are genes involved in interferon and IL2/6/20 family signaling pathways. f, Illustration of immune system pathways enriched per trait. Trait-specific enrichment is based on under-explored genes (identified in e) also rated in the top 1% for that specific trait, calculated using one-sided Fisher’s exact test. See Fig. 5a for the layout and coloring.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15 and Supplementary Note

Reporting Summary

Supplementary Dataset 1

List of the top 150 prioritized target genes, together with evidence in 30 immune traits.

Supplementary Dataset 2

Gold-standard drug targets.

Supplementary Dataset 3

List of 116 RA novel target genes, together with the utilities explored.

Supplementary Dataset 4

Correlation with disease-relevant activity of compounds in RA.

Supplementary Dataset 5

A supra-hexagonal map of 878 highly prioritized genes across 16 immune traits.

Supplementary Dataset 6

List of 668 genes highly rated in at least one of 12 immune traits with high G2CT potential.

Supplementary Dataset 7

List of 50 pathway network genes, together with the utilities explored.

Cite this article

Fang, H., The ULTRA-DD Consortium., De Wolf, H. et al. A genetics-led approach defines the drug target landscape of 30 immune-related traits. Nat Genet 51, 1082–1091 (2019).
