An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci

Mountjoy, Edward; Schmidt, Ellen M.; Carmona, Miguel; Schwartzentruber, Jeremy; Peat, Gareth; Miranda, Alfredo; Fumis, Luca; Hayhurst, James; Buniello, Annalisa; Karim, Mohd Anisul; Wright, Daniel; Hercules, Andrew; Papa, Eliseo; Fauman, Eric B.; Barrett, Jeffrey C.; Todd, John A.; Ochoa, David; Dunham, Ian; Ghoussaini, Maya

doi:10.1038/s41588-021-00945-5

Article
Published: 28 October 2021

Nature Genetics volume 53, pages 1527–1533 (2021)


Abstract

Genome-wide association studies (GWASs) have identified many variants associated with complex traits, but identifying the causal gene(s) remains a major challenge. Here we present an open resource that provides systematic fine mapping and gene prioritization across 133,441 published human GWAS loci. We integrate genetics (GWAS Catalog and UK Biobank) with transcriptomic, proteomic and epigenomic data, including systematic disease–disease and disease–molecular trait colocalization results across 92 cell types and tissues. We identify 729 loci fine mapped to a single coding causal variant and colocalized with a single gene. We trained a machine-learning model using the fine-mapped genetics and functional genomics data and 445 gold-standard curated GWAS loci to distinguish causal genes from neighboring genes, outperforming a naive distance-based model. Our prioritized genes were enriched for known approved drug targets (odds ratio = 8.1, 95% confidence interval = 5.7–11.5). These results are publicly available through a web portal (http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.

**Fig. 1: Open Targets Genetics pipeline schematic.**

**Fig. 2: Performance of the L2G model.**

Data availability

Our results are freely available through a web portal (genetics.opentargets.org), a GraphQL API or bulk download. The GWAS gold-standard genes are available at github.com/opentargets/genetics-gold-standards.

Code availability

All analysis code is available open source (Apache license) in the following repositories:

https://github.com/opentargets/genetics-sumstat-data

https://github.com/opentargets/genetics-finemapping

https://github.com/opentargets/genetics-colocalisation

https://github.com/opentargets/genetics-v2d-data

https://github.com/opentargets/genetics-v2g-data

https://github.com/opentargets/genetics-l2g-scoring

https://github.com/opentargets/genetics-gold-standards

https://github.com/opentargets/genetics-variant-annotation

Acknowledgements

We thank E. McDonagh, J. Maranville and D. Hulcoop for their useful feedback to improve the paper, and H. Parkinson, J. MacArthur, D. Zerbino and K. Alasoo for their support with the GWAS Catalog and eQTL Catalogue data. This research has been conducted using the UK Biobank Resource. This work was funded by Open Targets. E.M. was funded by JDRF (4-SRA-2017-473-A-N) to the Diabetes and Inflammation Laboratory, University of Oxford. This research was funded in part by a Wellcome Trust grant (no. 206194). For the purpose of Open Access, the authors have applied a CC-BY public copyright license to any author-accepted manuscript version arising from this submission.

Author information

Authors and Affiliations

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Edward Mountjoy, Ellen M. Schmidt, Jeremy Schwartzentruber, Mohd Anisul Karim, Daniel Wright, Jeffrey C. Barrett, Ian Dunham & Maya Ghoussaini
Open Targets, Wellcome Genome Campus, Hinxton, UK
Edward Mountjoy, Ellen M. Schmidt, Miguel Carmona, Jeremy Schwartzentruber, Gareth Peat, Alfredo Miranda, Luca Fumis, James Hayhurst, Annalisa Buniello, Mohd Anisul Karim, Daniel Wright, Andrew Hercules, Jeffrey C. Barrett, David Ochoa, Ian Dunham & Maya Ghoussaini
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
Miguel Carmona, Jeremy Schwartzentruber, Gareth Peat, Alfredo Miranda, Luca Fumis, James Hayhurst, Annalisa Buniello, Andrew Hercules, David Ochoa & Ian Dunham
Systems Biology, Biogen, Cambridge, MA, USA
Eliseo Papa
Integrative Biology, Internal Medicine Research Unit, Pfizer Worldwide Research, Development and Medical, Cambridge, MA, USA
Eric B. Fauman
Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK
John A. Todd

Contributions

M.G., J.S., E.M. and I.D. wrote the manuscript. E.M. conducted the analysis and designed and built the machine-learning model. E.M., E.M.S. and M.G. prioritized GWASs for curation from the GWAS Catalog. E.M., M.C., A.B., J.H. and E.P. curated and processed the GWASs and functional genomics data. E.B.F., E.M. and M.G. curated the gold standards. G.P., A.M., L.F., A.H., E.P. and M.C. designed and implemented visualizations for analysis. J.S. conducted fine-mapping comparisons and Mendelian disease enrichments. D.O. performed additional analysis. I.D., M.G., J.A.T. and J.C.B. conceived and supervised the study. M.A.K. generated Fig. 1. M.G., E.M., E.M.S., D.W. and E.P. worked on the biological questions and the underlying visualizations in the portal.

Corresponding author

Correspondence to Maya Ghoussaini.

Ethics declarations

Competing interests

J.A.T. is a member of the GSK Human Genetics Advisory Board. E.B.F. is a full-time employee of and shareholder in Pfizer, Inc. E.P. was an employee of Biogen at the time of the work. E.P. is now an employee of AstraZeneca. The remaining authors do not have any competing interests.

Additional information

Peer review information: Nature Genetics thanks Guillaume Lettre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Difference between fine-mapping methods.

a, Histogram of the absolute difference in the number of variants in the 95% credible set across all loci. The median absolute difference was 7 variants. b, Histogram of the posterior probability at a given locus that is contained in variants shared between the 95% credible sets of the two methods, determined for all loci. The median shared variant probability was 0.70.
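The two quantities in this figure can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names, the greedy construction of the credible set, and the choice to sum the first method's posterior probabilities over the shared variants are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> Set[str]:
    """Greedily add variants, highest posterior first, until `coverage` is reached."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen: Set[str] = set()
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.add(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


def compare_methods(
    pp_a: Dict[str, float], pp_b: Dict[str, float], coverage: float = 0.95
) -> Tuple[int, float]:
    """Return (absolute difference in set size, posterior mass on shared variants).

    Assumption: the shared probability is measured with method A's posteriors.
    """
    set_a = credible_set(pp_a, coverage)
    set_b = credible_set(pp_b, coverage)
    shared = set_a & set_b
    size_difference = abs(len(set_a) - len(set_b))
    shared_probability = sum(pp_a[v] for v in shared)
    return size_difference, shared_probability


# Hypothetical posterior probabilities for one locus under two methods.
method_a = {"v1": 0.60, "v2": 0.30, "v3": 0.06, "v4": 0.04}
method_b = {"v1": 0.97, "v2": 0.02, "v3": 0.01}
print(compare_methods(method_a, method_b))
```

Repeating this per locus and plotting the two returned values would give histograms analogous to panels a and b.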

Extended Data Fig. 2 Minor allele frequency of the lead variant in 1000 Genomes European samples, stratified by the number of variants in the 95% credible set.

Association signals with smaller credible sets tend to be from lower frequency haplotypes. This effect is especially pronounced for the lowest credible set size bins (1 and 2-5).

Extended Data Fig. 3 Schematic showing how variant to gene distance features are calculated.

Distance to the transcription start site (TSS) is the number of bases from the variant to the TSS of the canonical transcript of the gene as defined by Ensembl. Distance to the gene footprint is the smallest number of base pairs between the variant and any position between the TSS and transcription end site of the canonical transcript. (Calculations, left) Some L2G distance features are an average across variants, weighted by each variant’s posterior probability from fine mapping. (Calculations, right) ‘Neighborhood’ features are defined on a log scale relative to the gene with the best score in that category (here, smallest distance). The negative log is used so that genes with better feature values have higher neighborhood scores.
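As a rough illustration of the distance features described in this caption, the sketch below computes the two distances, a posterior-weighted average, and a neighborhood score. The function names, the normalization by total posterior probability, and the pseudocount of 1 bp inside the logarithm (to avoid taking the log of zero) are assumptions for this example and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def distance_to_tss(variant_pos: int, tss: int) -> int:
    """Number of bases between the variant and the transcription start site."""
    return abs(variant_pos - tss)


def distance_to_footprint(variant_pos: int, tss: int, tes: int) -> int:
    """Smallest distance from the variant to any position between TSS and TES."""
    start, end = min(tss, tes), max(tss, tes)  # handles reverse-strand genes
    if start <= variant_pos <= end:
        return 0
    return min(abs(variant_pos - start), abs(variant_pos - end))


def weighted_mean_distance(variants: List[Tuple[int, float]], tss: int) -> float:
    """Average TSS distance over (position, posterior probability) pairs,
    weighted by posterior probability (assumed normalized by the total)."""
    total_pp = sum(pp for _, pp in variants)
    return sum(pp * distance_to_tss(pos, tss) for pos, pp in variants) / total_pp


def neighborhood_scores(distances: Dict[str, float]) -> Dict[str, float]:
    """Negative log of each gene's distance relative to the best (smallest) one.

    The best gene scores 0; more distant genes get increasingly negative scores.
    """
    best = min(distances.values())
    return {gene: -math.log((d + 1) / (best + 1)) for gene, d in distances.items()}


# Hypothetical locus: two fine-mapped variants and two candidate genes.
variants = [(100, 0.75), (200, 0.25)]
per_gene = {
    "GENE_A": weighted_mean_distance(variants, tss=100),
    "GENE_B": weighted_mean_distance(variants, tss=5000),
}
print(neighborhood_scores(per_gene))
```

With this sign convention the gene with the best feature value always receives the highest neighborhood score, which matches the caption's description of the negative log.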

Extended Data Fig. 4 Missingness.

The fraction of variants with missing values (no annotation in that category) is shown for representative input features of the L2G model.

Extended Data Fig. 5 Histogram of the number of prioritized genes (having L2G score ≥ 0.5) at each locus.

Very few loci have more than one gene prioritized.
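The histogram behind this figure amounts to counting, for each locus, how many genes reach the L2G threshold of 0.5. The sketch below shows one way to do that; the input layout (locus, gene, score tuples) and all identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def prioritized_per_locus(
    rows: List[Tuple[str, str, float]], threshold: float = 0.5
) -> Dict[str, int]:
    """Count genes with score >= threshold at each locus (loci with none count 0)."""
    counts: Dict[str, int] = {}
    for locus, _gene, score in rows:
        counts.setdefault(locus, 0)
        if score >= threshold:
            counts[locus] += 1
    return counts


def histogram(counts: Dict[str, int]) -> Counter:
    """Map 'number of prioritized genes' to 'number of loci with that many'."""
    return Counter(counts.values())


# Hypothetical L2G scores for three loci.
rows = [
    ("locus1", "GENE_A", 0.82),
    ("locus1", "GENE_B", 0.10),
    ("locus2", "GENE_C", 0.55),
    ("locus2", "GENE_D", 0.51),
    ("locus3", "GENE_E", 0.30),
]
print(histogram(prioritized_per_locus(rows)))
```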

Extended Data Fig. 6 Mendelian disease genes with matching phenotypes at GWAS loci are enriched for having high L2G scores.

For nine GWAS traits, we show the distribution of L2G scores for all Mendelian disease genes within 100 kb of a GWAS signal, stratified by whether or not the Mendelian disease has a matching phenotype term with the GWAS trait.

Extended Data Fig. 7 Gene enrichment.

Enrichment of genes with model score ≥ 0.5 for the distance-only models (top 2) or the full L2G model (bottom), stratified by whether the gene is a known drug target in ChEMBL phase ≥ 2, ≥ 3, or ≥ 4. Error bars show the 95% confidence interval from a two-sided Fisher’s exact test. The total numbers of positive gene-indication pairs used were 329, 216 and 159 for phases ≥2, ≥3 and ≥4 respectively. The overall sample size, including positive and negative gene-indication pairs, was 104,934.
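For readers who want to reproduce this kind of enrichment on their own data, the sketch below computes an odds ratio from a 2×2 table of gene-indication pairs. Note the substitution: the caption reports intervals from a two-sided Fisher's exact test, whereas this example uses the simpler Woolf (log odds ratio) normal approximation so that it runs with the standard library alone. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    a: int, b: int, c: int, d: int, z: float = 1.959964
) -> Tuple[float, float, float]:
    """Odds ratio and approximate 95% CI (Woolf method) for the 2x2 table

                      drug target   not drug target
    prioritized            a               b
    not prioritized        c               d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval needs all four cells to be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the paper.
estimate, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

For small or sparse tables an exact method is preferable, since the normal approximation can be poor when any cell count is low.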

Extended Data Fig. 8 Feature distributions.

Each plot shows the mean value of a given predictor across different gold-standard datasets (y-axis) for either gold standard positive genes (GSP, green) or gold standard negative genes (GSN, yellow). GSP genes are more easily distinguished from GSN genes by distance in the manually curated datasets (especially Progem, Fauman_twitter, and T2D).

Supplementary information

Reporting Summary

Supplementary Tables 1–15

Peer Review.

About this article

Cite this article

Mountjoy, E., Schmidt, E.M., Carmona, M. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat Genet 53, 1527–1533 (2021). https://doi.org/10.1038/s41588-021-00945-5

Received: 21 August 2020
Accepted: 20 August 2021
Published: 28 October 2021
Issue Date: November 2021
DOI: https://doi.org/10.1038/s41588-021-00945-5
