A genetics-led approach defines the drug target landscape of 30 immune-related traits

Most candidate drugs currently fail later-stage clinical trials, largely due to poor prediction of efficacy on early target selection [1]. Drug targets with genetic support are more likely to be therapeutically valid [2,3], but the translational use of genome-scale data such as from genome-wide association studies for drug target discovery in complex diseases remains challenging [4,5,6]. Here, we show that integration of functional genomic and immune-related annotations, together with knowledge of network connectivity, maximizes the informativeness of genetics for target validation, defining the target prioritization landscape for 30 immune traits at the gene and pathway level. We demonstrate how our genetics-led drug target prioritization approach (the priority index) successfully identifies current therapeutics, predicts activity in high-throughput cellular screens (including L1000, CRISPR, mutagenesis and patient-derived cell assays), enables prioritization of under-explored targets and allows for determination of target-level trait relationships. The priority index is an open-access, scalable system accelerating early-stage drug target selection for immune-mediated disease.

Fig. 1: Overview of Pi applied to rheumatoid arthritis.
Fig. 2: Validating Pi target prioritization for rheumatoid arthritis.
Fig. 3: Cross-trait application of Pi informing utility of approach and predictors.
Fig. 4: Landscape of prioritized target genes across immune traits.
Fig. 5: Landscape of prioritized target pathways across immune traits.
Fig. 6: Multitrait comparisons.

Data availability

The data that support the findings of this study are available within the paper and its Supplementary Information files. The Pi relational database has been deposited into figshare (https://doi.org/10.6084/m9.figshare.6972746) and is also available from the Pi web server (http://pi.well.ox.ac.uk).

Code availability

Software codes, together with the user and reference manual, have been packaged and deposited into Bioconductor (available at http://bioconductor.org/packages/Pi), including codes for the showcase in this manuscript supporting reproducible research.


References

1. Hay, M., Thomas, D. W., Craighead, J. L., Economides, C. & Rosenthal, J. Clinical development success rates for investigational drugs. Nat. Biotechnol. 32, 40–51 (2014).
2. Plenge, R. M., Scolnick, E. M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).
3. Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
4. Finan, C. et al. The druggable genome and support for target identification and validation in drug development. Sci. Transl. Med. 9, eaag1166 (2017).
5. Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).
6. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
7. Albert, F. W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).
8. Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).
9. Giambartolomei, C., Vukcevic, D., Schadt, E. E., Franke, L. & Hingorani, A. D. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
10. Spalinger, M. R. et al. PTPN2 regulates inflammasome activation and controls onset of intestinal inflammation and colon cancer. Cell Rep. 22, 1835–1848 (2018).
11. Svensson, M. N. D. et al. Reduced expression of phosphatase PTPN2 promotes pathogenic conversion of Tregs in autoimmunity. J. Clin. Invest. 129, 1193–1210 (2019).
12. Manguso, R. T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413–418 (2017).
13. Guo, Y. et al. CD40L-dependent pathway is active at various stages of rheumatoid arthritis disease progression. J. Immunol. 198, 4490–4501 (2017).
14. Schwabe, C. et al. Safety, pharmacokinetics, and pharmacodynamics of multiple rising doses of BI 655064, an antagonistic anti-CD40 antibody, in healthy subjects: a potential novel treatment for autoimmune diseases. J. Clin. Pharmacol. 58, 1566–1577 (2018).
15. Marigorta, U. M. et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn’s disease. Nat. Genet. 49, 1517–1521 (2017).
16. Jonkers, I. H. & Wijmenga, C. Context-specific effects of genetic variants associated with autoimmune disease. Hum. Mol. Genet. 26, 185–192 (2017).
17. Atsumi, T. et al. A point mutation of Tyr-759 in interleukin 6 family cytokine receptor subunit gp130 causes autoimmune arthritis. J. Exp. Med. 196, 979–990 (2002).
18. Sakaguchi, N. et al. Altered thymic T-cell selection due to a mutation of the ZAP-70 gene causes autoimmune arthritis in mice. Nature 426, 454–460 (2003).
19. Meng, X. et al. Hypoxia-inducible factor-1α is a critical transcription factor for IL-10-producing B cells in autoimmune disease. Nat. Commun. 9, 251 (2018).
20. Vermeire, K. et al. Accelerated collagen-induced arthritis in IFN-gamma receptor-deficient mice. J. Immunol. 158, 5507–5513 (1997).
21. Boe, A., Baiocchi, M., Carbonatto, M., Papoian, R. & Serlupi-crescenzi, O. Interleukin 6 knock-out mice are resistant to antigen-induced experimental arthritis. Cytokine 11, 1057–1064 (1999).
22. Tada, B. Y., Ho, A., Matsuyama, T. & Mak, T. W. Reduced incidence and severity of antigen-induced autoimmune diseases in mice lacking interferon regulatory factor-1. J. Exp. Med. 185, 231–238 (1997).
23. Lacey, C. A., Mitchell, W. J., Brown, C. R. & Skyberg, A. Temporal role for MyD88 in a model of Brucella-induced arthritis and musculoskeletal inflammation. Infect. Immun. 85, e00961–16 (2017).
24. Wong, P. K. K. et al. SOCS-3 negatively regulates innate and adaptive immune mechanisms in acute IL-1-dependent inflammatory arthritis. J. Clin. Invest. 116, 1571–1581 (2006).
25. Pierer, M., Wagner, U., Rossol, M. & Ibrahim, S. Toll-like receptor 4 is involved in inflammatory and joint destructive pathways in collagen-induced arthritis in DBA1J mice. PLoS ONE 6, e23539 (2011).
26. De Wolf, H. et al. High-throughput gene expression profiles to define drug similarity and predict compound activity. Assay Drug Dev. Technol. 16, 162–176 (2018).
27. Fang, H. & Gough, J. supraHex: an R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map. Biochem. Biophys. Res. Commun. 443, 285–289 (2014).
28. Dargahi, N. et al. Multiple sclerosis: immunopathology and treatment update. Brain Sci. 7, 78 (2017).
29. Brockmann, M. et al. Genetic wiring maps of single-cell protein states reveal an off-switch for GPCR signalling. Nature 546, 307–311 (2017).
30. Mujtaba, M. G. et al. Treatment of mice with the suppressor of cytokine signaling-1 mimetic peptide, tyrosine kinase inhibitor peptide, prevents development of the acute form of experimental allergic encephalomyelitis and induces stable remission in the chronic relapsing/remit. J. Immunol. 175, 5077–5086 (2005).
31. Todd, J. A. et al. Regulatory T cell responses in participants with type 1 diabetes after a single dose of interleukin-2: a non-randomised, open label, adaptive dose-finding trial. PLoS Med. 13, e1002139 (2016).
32. Danese, S. et al. Tofacitinib as induction and maintenance therapy for ulcerative colitis. N. Engl. J. Med. 377, 1723–1736 (2017).
33. Panés, J. et al. Tofacitinib for induction and maintenance therapy of Crohn’s disease: results of two phase IIb randomised placebo-controlled trials. Gut 66, 1049–1059 (2017).
34. Tulunay, A. et al. Activation of the JAK/STAT pathway in Behcet’s disease. Genes Immun. 16, 170–175 (2015).
35. Beeh, K., Kanniess, F., Wagner, F., Schilder, C. & Naudts, I. The novel TLR-9 agonist QbG10 shows clinical efficacy in persistent allergic asthma. J. Allergy Clin. Immunol. 131, 866–874 (2013).
36. Parnas, O. et al. A genome-wide CRISPR screen in primary immune cells to dissect regulatory networks. Cell 162, 675–686 (2015).
37. Hedrich, C. M. Epigenetics in SLE. Curr. Rheumatol. Rep. 19, 58 (2017).
38. Singh, N. et al. Alterations in nuclear structure promote lupus autoimmunity in a mouse model. Dis. Model Mech. 9, 885–897 (2016).
39. Banerjee, S., Biehl, A., Gadina, M., Hasni, S. & Schwartz, D. M. JAK–STAT signaling as a target for inflammatory and autoimmune diseases: current and future prospects. Drugs 77, 521–546 (2017).
40. Lee, W. H. Open access target validation is a more efficient way to accelerate drug discovery. PLoS Biol. 13, e1002164 (2015).
41. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2016).
42. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
43. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384.e19 (2016).
44. Fairfax, B. P. et al. Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510 (2012).
45. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
46. Naranbhai, V. et al. Genomic modulators of gene expression in human neutrophils. Nat. Commun. 6, 7545 (2015).
47. Kasela, S. et al. Pathogenic implications for autoimmune mechanisms derived by comparative eQTL analysis of CD4+ versus CD8+ T cells. PLoS Genet. 13, e1006643 (2017).
48. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
49. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
50. Hamosh, A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2004).
51. Kibbe, W. A. et al. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 43, D1071–D1078 (2015).
52. Köhler, S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 45, D865–D876 (2016).
53. Smith, C. L. & Eppig, J. T. The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med. 1, 390–399 (2009).
54. Grady, L. Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28, 1768–1783 (2006).
55. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 39, 561–568 (2016).
56. Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
57. Loughin, T. M. A systematic comparison of methods for combining p-values from independent tests. Comput. Stat. Data Anal. 47, 467–485 (2004).
58. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
59. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 44, D481–D487 (2016).
60. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
61. Fang, H. & Gough, J. The ‘dnet’ approach promotes emerging research on cancer patient survival. Genome Med. 6, 64 (2014).
62. Wang, Z. et al. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nat. Commun. 7, 12846 (2016).
63. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
64. Schmidtke, P. & Barril, X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites. J. Med. Chem. 53, 5858–5867 (2010).
65. Mungall, C. J. et al. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 45, D712–D722 (2017).
66. Fang, H. et al. XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits. Genome Med. 8, 129 (2016).

Acknowledgements

We thank A. Edwards for comments on the manuscript. This project was supported by: the European Research Council (FP7/2007-2013), through an EU/EFPIA Innovative Medicines Initiative Joint Undertaking (ULTRA-DD 115766 and 281824 to J.C.K.); Arthritis Research UK (20773 to J.C.K.); the Wellcome Trust Investigator Award (204969/Z/16/Z to J.C.K.); Wellcome Trust grants 090532/Z/09/Z and 203141/Z/16/Z (to the Wellcome Centre for Human Genetics core facility) and 201488/Z/16/Z (to B.P.F.); NIHR Oxford Biomedical Research Centre; Estonian Research Council (PRG184 to L.M.); Alzheimer’s Research UK (ARUK-2018DDI-OX to P.E.B.); and Structural Genomics Consortium (charity number 1097737), which receives funds from AbbVie, Bayer Pharma, Boehringer Ingelheim, the Canada Foundation for Innovation, the Eshelman Institute for Innovation, Genome Canada, the Innovative Medicines Initiative (EU/EFPIA) (ULTRA-DD grant number 115766), Janssen, Merck (Darmstadt, Germany), MSD, Novartis Pharma, the Ontario Ministry of Economic Development and Innovation, Pfizer, the São Paulo Research Foundation, Takeda and the Wellcome Trust (106169/ZZ14/Z). For computation, we used the Oxford Biomedical Research Computing facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute, supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, NIHR or Department of Health. Listed in the ULTRA-DD Consortium are Target Prioritization Network (TPN) members (alphabetical order).

Author information

H.F., J.C.K., M.S., C.B., P.B., B.P.F., C.A.O. and P.J.P. conceived of the study. H.F. and J.C.K. developed the methodology. H.F. developed the software and curated the database. H.F., H.D.W., B.K., K.L.B., J.O., S.K. and J.K.W. performed the analyses. H.D.W., F.E.M., L.C., T.S., Y.S. and L.B. performed the investigation. P.J.P., H.W.G., B.P.F., J.C.K., L.M., B.D.M., D.D., S.D.C. and P.E.B. provided resources. H.F. curated the data. H.F. and J.C.K. wrote the original draft. H.F., J.C.K., K.L.B., B.K., L.H., J.O., H.D.W., M.S., C.A.O., A.L.L. and F.E.M. reviewed and edited the manuscript. H.F., J.C.K., H.D.W., K.L.B., S.D.C. and B.K. revised the manuscript. H.F., J.C.K., A.S., A.L.L. and K.L.B. designed the visualization. J.C.K. supervised the study. J.C.K. and M.S. acquired the funding.

Correspondence to Julian C. Knight.

Ethics declarations

Competing interests

The Structural Genomics Consortium receives funds from AbbVie, Bayer Pharma, Boehringer Ingelheim, the Canada Foundation for Innovation, the Eshelman Institute for Innovation, Genome Canada, Janssen, Merck (Darmstadt, Germany), MSD, Novartis Pharma, the Ontario Ministry of Economic Development and Innovation, Pfizer, the São Paulo Research Foundation, Takeda and the Wellcome Trust (authors B.D.M., D.D., C.B., Y.S., L.B. and M.S.). These funders had no direct role in study conceptualization, design, data collection, analysis, decision to publish or preparation of the manuscript, except for Janssen (authors H.D.W., J.K.W., H.W.G. and P.J.P.), which generated the L1000 data in house for the compound screen presented in the paper.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Pilot analysis supporting principles used by Pi.

a, Schematic illustration of scoring for nearby genes (nGene) from GWAS summary data accounting for linkage disequilibrium (LD) structure and genomic organization. Considering genomic proximity, that is, a distance window for GWAS SNPs taking account of LD structure, and also considering genomic organization, that is, nearby genes and SNPs constrained to the same topologically associated domain (TAD). b, Simultaneous optimization of parameters regarding genomic influential range (distance and decay for nearby gene scoring) and network influential range (the restarting probability controlling the degree of network connectivity being exploited by random walk with restart), in terms of performance measured by area under the curve (AUC) separating gold standard positives (clinical proof-of-concept immune drug targets) and gold standard negatives (simulated genes unlikely to be drug targets). Clinical proof-of-concept immune drug targets are sourced from the ChEMBL database, defined as a collection of target genes of drugs with phase 2 concluded, moving into phase 3 and above in Pi immune traits. c, Enrichment analysis of approved immune drug targets (left), phase 3 and above immune drug targets (middle), and clinical proof-of-concept immune targets (right) in terms of chromatin conformation genes (cGene) by physical interaction and eQTL genes (eGene) by expression. Enrichment analysis is based on one-sided Fisher’s exact test, with the vertical line in grey indicating the false discovery rate (FDR) threshold at 0.05. d, The cGene scored considering the empirical cumulative distribution function (eCDF) of the significance/strength level linking an SNP to a gene. e, The methodological overview of incorporating colocalization analysis at the GWAS-eQTL integration step in the Pi pipeline, and how to estimate directionality and magnitude of effect.
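
Panel b refers to a restarting probability that controls how far seed gene scores spread over the network by random walk with restart. The following is a minimal sketch of that general technique, not the Pi implementation: the toy network, the gene names `G1` to `G4` and the restart value of 0.7 are all hypothetical, and the walk assumes an unweighted, undirected network in which every node has at least one neighbour.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state affinity scores for every node.

    adjacency: dict mapping node -> set of neighbours (undirected, unweighted;
               every node is assumed to have at least one neighbour).
    seeds:     dict mapping node -> non-negative seed score.
    restart:   probability of jumping back to the seed distribution per step.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Normalise seed scores into a starting probability distribution.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component: a fraction of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk component: the rest is split equally among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-gene chain with a single seed gene.
network = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
scores = random_walk_with_restart(network, {"G1": 1.0}, restart=0.7)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

A higher restart probability keeps more of the score on the seed genes, while a lower one lets non-seed neighbours accumulate more of it, which is the trade-off the optimization in panel b explores.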

Supplementary Figure 2 Network analysis supporting principles used by Pi.

a, Enrichment analysis of immune drug targets (of different phases) in terms of GWAS reported genes (left) and GWAS genes plus their interacting neighbors (right). Inserted below is per-trait enrichment analysis focusing on clinical proof-of-concept immune drug targets. Fisher’s exact test (two-sided) used to calculate odds ratio (OR) with 95% confidence interval (CI; represented by lines). GWAS reported genes are sourced from GWAS Catalog (P < 5 × 10−8), and interaction neighbors of GWAS genes are sourced from the STRING database, defined as interactions with high confidence score. b, Schematic overview showing an approach for identifying pathway crosstalk and assessing the significance level.
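
The enrichment analyses in these legends rest on Fisher's exact test applied to a 2x2 table of drug targets versus gene-set membership. As an illustration only, the sketch below computes a one-sided (enrichment) P value from the hypergeometric tail and a sample odds ratio. The confidence interval uses Woolf's log-odds approximation, which is an assumption made here for brevity and may differ from the interval reported in the figure; the counts in the usage lines are hypothetical.

```python
from math import comb, exp, log, sqrt


def fisher_enrichment(a, b, c, d, z=1.96):
    """One-sided Fisher's exact test for enrichment on the table [[a, b], [c, d]].

    a: targets inside the gene set      b: targets outside the gene set
    c: non-targets inside the gene set  d: non-targets outside the gene set
    Returns (p_value, odds_ratio, ci_low, ci_high). All cells must be positive
    because the Woolf interval used here (an approximation) divides by them.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this sketch")
    n_targets, n_in_set, n = a + b, a + c, a + b + c + d
    # P(X >= a), where X is the hypergeometric count of targets in the set.
    upper = min(n_targets, n_in_set)
    tail = sum(
        comb(n_targets, k) * comb(n - n_targets, n_in_set - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(n, n_in_set)
    odds_ratio = (a * d) / (b * c)
    # Woolf's approximate 95% interval on the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(odds_ratio) - z * se)
    ci_high = exp(log(odds_ratio) + z * se)
    return p_value, odds_ratio, ci_low, ci_high


# Hypothetical counts: 3 of 4 targets fall inside a gene set of size 4 out of 8 genes.
p_value, odds_ratio, ci_low, ci_high = fisher_enrichment(3, 1, 1, 3)
print(p_value, odds_ratio, ci_low, ci_high)
```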

Supplementary Figure 3 Illustration of data and links for prioritization, evidence, druggability and effect output in rheumatoid arthritis (RA).

a, The front page with a navigation tab listing main features and an interface “Pi Reveal” searching for traits and targets. b, The trait-specific page for prioritized target genes. c, A tabular display of the top 5 genes in target pathway crosstalk together with an overview of evidence, druggability and effect. d, Illustration of data and links for one prioritized gene, CD40, showing evidence used in Pi for this gene and thus accelerating decision-making on target selection. This includes but is not limited to: (i) target priority info, linked to details on genomic predictors (for example, cell types/states in which eQTLs and eGenes are identified together with magnitude and directionality of effect), annotation predictors and interaction/network plot; and (ii) target druggable info, where available, including DGIdb druggable gene categories and PDB druggable pockets (linked to details and 3D View of the PDB protein structure embedded with druggable pockets shown in blue).

Supplementary Figure 4 Diagram of schema used in the Pi relational database.

Available at https://doi.org/10.6084/m9.figshare.6972746, together with schema documentation giving instructions on how to install and use the database. In particular the table “pi_priority” provides trait-specific target priority information together with an overview of associated information. The linked tables provide details on data used for predictor preparation, current therapeutic drugs with phase of development, and different measurements on druggability.

Supplementary Figure 5 Further information regarding target prioritizations and validation for rheumatoid arthritis (RA).

a, Enrichment analysis of Pi top 1% prioritized target genes for RA, in terms of targets of approved drugs in RA, juvenile idiopathic arthritis (JIA) or 28 other immune traits. Fisher’s exact test (one-sided) used. b, Gold standard positives (GSPs) were defined from approved drug targets; genes unlikely to be drug targets (gold standard negatives, GSNs) were defined from the gene druggable space in which GSPs and their direct interaction neighbors had been removed. c, Benchmarking Pi, comparing performance of a naïve method (how often a gene is targeted by drugs), and two other genetics-based methods to separate approved drug targets (GSPs) from GSNs. d, Comparing performance of Pi versus Mendelian disease genetics plus network neighbors (identified via random walk). e, TSEA of Pi prioritized target genes for RA, in terms of disease-specific gene expression signatures in RA (compared to unaffected controls). Normalized enrichment score (NES) indicates likelihood of Pi prioritized target genes to be modulated in disease (over- or under-expressed). Disease-specific gene expression signatures are sourced from CREEDS, with cell types or tissues of origin and the primary data source (GSE accession number) indicated. f, Venn diagram illustrating the enrichment of genes with druggable pockets in RA novel targets (defined as top 1% prioritized but excluding targets of current therapeutics in RA). The significance level (P), odds ratio (OR) and 95% confidence interval (CI) calculated according to Fisher’s exact test (one-sided). g, Identification of drug perturbation gene signatures enriched in RA novel targets. The significance level (FDR) calculated according to Fisher’s exact test (one-sided). Novel RA targets in these enriched signatures shown on the left, with targets ordered according to Pi rating. PBMCs, peripheral blood mononuclear cells; LPS, lipopolysaccharide; CML, chronic myeloid leukemia. 
h, Novel RA targets identified by Pi that are targets of approved drugs in other disease indications, indicating potential repurposing opportunities.

Supplementary Figure 6 Analysis of Pi rating for drugs and target genes using L1000 data and disease-specific gene signatures in RA.

a, Schematic overview showing approach to validate Pi rating at the drug and target level. b, Correlation of Pi ratings with disease-relevant activity of a compound (transcriptional similarity between an RA disease gene expression signature and the compound transcriptional profile in PBMCs quantified using Zhang’s connection score), shown at the drug (left) and target (right) level. Spearman rank correlation calculated, with the significance level estimated empirically (randomly sampling 20,000 times). Sensitivity (estimated by removing different percentages of drugs, repeated 100 times) and specificity (estimated by calculating the correlations based on Pi rating in 29 other immune traits listed in Fig. 3a) shown below. Error bars represent standard deviation with the mean centered. c, L1000 data identifying compounds targeting novel targets in RA. Zhang’s connection score quantifies the disease-relevant expression activity of a compound as the transcriptional similarity between the RA disease gene expression signature and the compound transcriptional profile in PBMCs. The significance level (P = 5%) for the connection score estimated by randomized test using the methodology.
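
Panel b reports a Spearman rank correlation whose significance is estimated empirically from 20,000 random samples. The legend does not say what is resampled, so the sketch below assumes a label permutation scheme: the second variable is shuffled and the fraction of shuffles with an absolute correlation at least as large as the observed one is reported. The input values are hypothetical.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def empirical_p(x, y, n_perm=20000, seed=0):
    """Two-sided permutation P value for the Spearman correlation (assumed scheme)."""
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical Pi ratings and connection scores for eight compounds.
pi_rating = [1, 2, 3, 4, 5, 6, 7, 8]
connection = [0.1, 0.2, 0.25, 0.4, 0.5, 0.7, 0.8, 0.95]
rho, p_emp = empirical_p(pi_rating, connection)
print(rho, p_emp)
```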

Supplementary Figure 7 Pi predictors, network effect, and negative control.

a, Performance comparisons for individual predictors across traits (within Pi and direct use), measured by area under the curve (AUC) separating gold standard positives (GSPs) and gold standard negatives (simulated). Analyses use GSPs based on either targets of drugs at phase 2 and above (left), targets of drugs at phase 3 and above (middle) or approved drug targets (right). Direct use of immune annotations without knowledge of genomic seed genes is much less predictive of drug targets. Notably, this analysis is restricted to traits with >10 targets, that is, 16 traits based on targets of phase 2 and above (left), 11 traits based on phase 3 and above (middle), and 5 traits based on approved drug targets (right). b, Optimizing components of Pi predictors. Comparing individual predictors (x-axis) constructed using both seed genes and non-seed genes (left) and using only seed genes (without incorporating network connectivity, right). GSPs are based on target genes of drugs at phase 2 and above. c, Scatter plot showing relationship between network degree and priority rank (all 30 traits with a total of n = 1,200 dots). Correlation based on Spearman’s rank test (two-sided). d, Negative control for enrichment of immune drug targets. Left: schematic illustration of use of Experimental Factor Ontology (EFO) tree to select immune mediated GWAS disease traits and non-immune GWAS disease traits. Right: target set enrichment analysis (TSEA) for approved immune drug targets in the Pi prioritized gene list taking as inputs GWAS SNPs from immune mediated diseases (in blue) or from exclusively non-immune mediated diseases (in yellow), with sensitivity assessed by removing different percentages of GWAS SNPs. The horizontal line in grey indicates the Bonferroni-adjusted P-value threshold at 0.05.
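
Performance throughout these legends is summarized as the AUC separating gold standard positives from gold standard negatives. One way to read that number is as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it directly from that definition; the scores are hypothetical and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical priority scores for two GSPs and two GSNs.
gsp_scores = [0.9, 0.4]
gsn_scores = [0.5, 0.1]
print(auc(gsp_scores, gsn_scores))
```

An AUC of 0.5 corresponds to a predictor that ranks positives no better than chance, which is the baseline against which the individual predictors are compared.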

Supplementary Figure 8 Machine learning and predictor informativeness.

a, Performance comparisons between machine learning algorithms. Area under the curve (AUC) shown with 95% confidence intervals based on 10-repeated 3-fold cross validation per algorithm with optimized tuning parameters. Per fold, two thirds of GSPs and GSNs used for training, one third for performance evaluation. GSPs are based on target genes of drugs at phase 2 and above (that is, clinical proof-of-concept targets). Random forest consistently outperforms or is competitive with the top performer among the other algorithms (followed by state-of-the-art boosting algorithms, classical ones, and generalized linear algorithms). b, Relative importance of predictors. Measured by decrease in accuracy (disabling that predictor) scaled relative to maximum decrease, estimated by random forest. Annotation predictors with knowledge of genomic seed genes are in general more informative than genomic predictors based on actual ‘usage’ of predictors by random forest; this is consistent with performance directly measured for individual predictors (Fig. 3b).
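
The resampling scheme in panel a (10 repeats of 3-fold cross validation, with two thirds of GSPs and GSNs used for training per fold) can be sketched as a generator of splits. This is an illustration of the splitting step only, under the assumption that folds are stratified by class; the gene identifiers are hypothetical and no classifier is fitted here, since a random forest would need a third-party library.

```python
import random


def repeated_stratified_folds(positives, negatives, k=3, repeats=10, seed=0):
    """Yield (repeat, fold, train, test) splits that preserve the class balance.

    Positives and negatives are shuffled separately in each repeat, then every
    k-th item of each class forms the held-out third for a given fold.
    """
    rng = random.Random(seed)
    for repeat in range(repeats):
        pos, neg = list(positives), list(negatives)
        rng.shuffle(pos)
        rng.shuffle(neg)
        for fold in range(k):
            test = pos[fold::k] + neg[fold::k]
            held_out = set(test)
            train = [gene for gene in pos + neg if gene not in held_out]
            yield repeat, fold, train, test


# Hypothetical gold standards: 9 positives and 12 negatives.
gsps = [f"GSP{i}" for i in range(9)]
gsns = [f"GSN{i}" for i in range(12)]
splits = list(repeated_stratified_folds(gsps, gsns))
print(len(splits), "train/test splits")
```

Each split would then be used to train a model on `train` and compute an AUC on `test`, with the 30 resulting AUC values giving the mean and confidence interval per algorithm.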

Supplementary Figure 9 Target set enrichment analysis for immune traits.

Such analysis restricted to 11 traits with >10 phase 3 and above targets (left), and to 5 traits with > 10 approved drug targets (right). Bar plot shows the proportion of such targets at “leading edge” of prioritized rankings. Coverage (total number within the leading edge / total number of targets for that trait) indicated, together with FDR.
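
The "leading edge" and coverage quantities can be illustrated with a running-sum enrichment statistic in the style of gene set enrichment analysis. The sketch below uses the unweighted form of the running sum, which is an assumption and may differ from the weighting used in Pi; the ranked list and target set are hypothetical. Coverage is computed as the number of targets in the leading edge divided by the number of targets present in the ranking.

```python
def leading_edge(ranked_genes, targets):
    """Unweighted running-sum enrichment over a ranked gene list.

    Returns (enrichment_score, leading_edge_genes, coverage), where the leading
    edge is the set of targets ranked at or before the running-sum maximum.
    """
    target_set = set(targets) & set(ranked_genes)
    n_hit = len(target_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one target and one non-target in the ranking")
    running, best, peak = 0.0, 0.0, -1
    for index, gene in enumerate(ranked_genes):
        # Step up on a target, step down on a non-target.
        running += 1.0 / n_hit if gene in target_set else -1.0 / n_miss
        if running > best:
            best, peak = running, index
    edge = [gene for gene in ranked_genes[: peak + 1] if gene in target_set]
    return best, edge, len(edge) / n_hit


# Hypothetical ranking of ten genes, four of which are drug targets.
ranking = list("ABCDEFGHIJ")
drug_targets = {"A", "B", "D", "J"}
score, edge_genes, coverage = leading_edge(ranking, drug_targets)
print(score, edge_genes, coverage)
```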

Supplementary Figure 10 Sensitivity of enrichments for immune drug targets to immune-related annotation predictors.

Scatter plot shows target set enrichment analysis results including normalized enrichment score (NES), coverage and FDR (the horizontal line in blue indicating the FDR threshold at 0.01) for the prioritized gene list after removing one or more annotation predictors.

Supplementary Figure 11 Cluster analysis of top 1% prioritized genes across 16 immune traits.

a, A supra-hexagonal map labeled with the built-in index and the number of target genes per hexagon. Six target clusters were identified using the region-growing partition of the trained map. This partition allows for topology-preserving identification of target clusters, each of which is continuous over the map. b, Schematic flowchart illustrating multilayer data comparison. Target-trait matrix containing Pi rating is used as input data for training the map, while druggable pocket data (a binary vector) is used as additional data for overlaying onto the trained map. The resulting druggable map indicates the probability of each hexagon containing druggable genes, with the percentage (%) of druggable genes for each cluster shown (pie chart).

Supplementary Figure 12 Target cluster enrichment analysis.

a, Forest plot of approved drug targets (grouped by drug indication terms) enriched in cluster C6. The significance level (FDR), odds ratio and 95% CI calculated according to Fisher’s exact test (one-sided). Gene lists for each enriched drug indication term are also illustrated. Right: illustration of approved drug targets for immune system diseases found in the cluster, together with diseases and drugs. b, Forest plot of KEGG pathways enriched in target clusters. Lines represent 95% CI (one-sided Fisher’s exact test).

Supplementary Figure 13 Heatmap illustrating prioritized pathways for 12 traits.

Based on KEGG pathway map: immune system pathways (left) and signal transduction pathways (right).

Supplementary Figure 14 Target validation for systemic lupus erythematosus (SLE).

a, Illustration of protein targets (y-axis) targeted by chemical probes (x-axis). Probes are ordered by the number of targets they have. Multiple targets per probe are colour-coded by probe-to-target affinity measured by IC50. b, Epigenetic probe assays tested using cytokine stimulated PBMCs from SLE patients (n = 5 patients recruited on a random basis, without risk of self-selection bias). c, Physical interaction of the region harboring SLE-associated SNP rs558702 with EHMT2 in Macrophages M0 (promoter capture Hi-C score = 13.99), M1 (13.28) and M2 (10.58). d, Interaction plot for EHMT2 illustrating the top prioritized neighbors of this gene in SLE.

Supplementary Figure 15 Further analysis for 50 crosstalk network genes.

a, Individual pathways significantly enriched in the crosstalk. The significance level (FDR), odds ratio (OR) and 95% confidence interval (CI) calculated according to Fisher’s exact test (one-sided), using all multitrait rated genes as the test background. b, Venn diagram illustrating the enrichment of mouse immune-mediated disease phenotypes, using all multitrait rated genes as the test background. Calculated using one-sided Fisher’s exact test. Genes annotated to mouse immune-mediated disease phenotypes sourced from the Monarch Initiative. c, Identification of drug perturbation gene signatures enriched in crosstalk network genes. The significance level (FDR) calculated according to Fisher’s exact test (one-sided). Crosstalk network genes in these enriched signatures shown on the left. PBMCs, peripheral blood mononuclear cells; LPS, lipopolysaccharide; AML, acute myeloid leukemia; CML, chronic myeloid leukemia. d, Heatmap illustrating the potential nodal points/genes and their PDB known protein structures with predicted druggable pockets. Color-coded by the number of druggable pockets found in the structure. e, Multitrait rating versus novelty score, with 41 highly rated but under-explored genes highlighted in the box. Also labeled are genes involved in interferon and IL2/6/20 family signaling pathways. f, Illustration of immune system pathways enriched per trait. Trait-specific enrichment based on under-explored genes (identified in e) also rated in top 1% for that specific trait, calculated using one-sided Fisher’s exact test. See Fig. 5a for the layout and coloring.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15 and Supplementary Note

Reporting Summary

Supplementary Dataset 1

List of the top 150 prioritized target genes, together with evidence in 30 immune traits.

Supplementary Dataset 2

Gold-standard drug targets.

Supplementary Dataset 3

List of 116 RA novel target genes, together with the utilities explored.

Supplementary Dataset 4

Correlation with disease-relevant activity of compounds in RA.

Supplementary Dataset 5

A supra-hexagonal map of 878 highly prioritized genes across 16 immune traits.

Supplementary Dataset 6

List of 668 genes highly rated in at least one of 12 immune traits with high G2CT potential.

Supplementary Dataset 7

List of 50 pathway network genes, together with the utilities explored.
