Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases

Weeks, Elle M.; Ulirsch, Jacob C.; Cheng, Nathan Y.; Trippe, Brian L.; Fine, Rebecca S.; Miao, Jenkai; Patwardhan, Tejal A.; Kanai, Masahiro; Nasser, Joseph; Fulco, Charles P.; Tashman, Katherine C.; Aguet, Francois; Li, Taibo; Ordovas-Montanes, Jose; Smillie, Christopher S.; Biton, Moshe; Shalek, Alex K.; Ananthakrishnan, Ashwin N.; Xavier, Ramnik J.; Regev, Aviv; Gupta, Rajat M.; Lage, Kasper; Ardlie, Kristin G.; Hirschhorn, Joel N.; Lander, Eric S.; Engreitz, Jesse M.; Finucane, Hilary K.

doi:10.1038/s41588-023-01443-6

Article
Published: 13 July 2023


Elle M. Weeks ORCID: orcid.org/0000-0002-4317-4444^1^na1,
Jacob C. Ulirsch ORCID: orcid.org/0000-0002-7947-0827^1,2^na1^nAff32,
Nathan Y. Cheng^1,
Brian L. Trippe^3,4,
Rebecca S. Fine^1,5,6^nAff33,
Jenkai Miao^1,5,
Tejal A. Patwardhan^1,7,
Masahiro Kanai ORCID: orcid.org/0000-0001-5165-4408^1,8,9,10,
Joseph Nasser^1,
Charles P. Fulco^1^nAff34,
Katherine C. Tashman^1,
Francois Aguet ORCID: orcid.org/0000-0001-9414-300X^1,
Taibo Li ORCID: orcid.org/0000-0002-6624-9293^1,11,
Jose Ordovas-Montanes^1,12,13,14,
Christopher S. Smillie^1,3,
Moshe Biton^1,15^nAff35,
Alex K. Shalek^1,16,17,18,19,
Ashwin N. Ananthakrishnan^20,
Ramnik J. Xavier^1,15,20,21,
Aviv Regev ORCID: orcid.org/0000-0003-3293-3158^1,18,22^nAff36,
Rajat M. Gupta^1,23,24,
Kasper Lage^1,25,
Kristin G. Ardlie^1,26,
Joel N. Hirschhorn^1,5,6,27,
Eric S. Lander^1,28,29,
Jesse M. Engreitz ORCID: orcid.org/0000-0002-5754-1719^1,30,31 &
Hilary K. Finucane ORCID: orcid.org/0000-0003-3864-9828^1,8

Nature Genetics volume 55, pages 1267–1276 (2023)


Abstract

Genome-wide association studies (GWASs) are a valuable tool for understanding the biology of complex human traits and diseases, but associated variants rarely point directly to causal genes. In the present study, we introduce a new method, polygenic priority score (PoPS), that learns trait-relevant gene features, such as cell-type-specific expression, to prioritize genes at GWAS loci. Using a large evaluation set of genes with fine-mapped coding variants, we show that PoPS and the closest gene individually outperform other gene prioritization methods, but we observe the best overall performance by combining PoPS with orthogonal methods. Using this combined approach, we prioritize 10,642 unique gene–trait pairs across 113 complex traits and diseases with high precision, finding not only well-established gene–trait relationships but also nominating new genes at unresolved loci, such as LGR4 for estimated glomerular filtration rate and CCR7 for deep vein thrombosis. Overall, we demonstrate that PoPS provides a powerful addition to the gene prioritization toolbox.


**Fig. 2: Evaluation of PoPS and comparison to other similarity-based methods.**

**Fig. 3: Most informative gene features used by PoPS.**

**Fig. 4: Comparing and combining PoPS with locus-based methods.**

**Fig. 5: High-confidence genes for selected traits.**

**Fig. 6: Known and new biological examples.**


Data availability

A repository of processed gene features, visualizations of top derived features and code to reproduce these analyses are available on GitHub at https://github.com/FinucaneLab/gene_features. Complete PoPS results for 95 complex traits in the UK Biobank and 18 additional disease traits, as well as results for PoPS and locus-based methods in genome-wide significant loci, are available at https://www.finucanelab.org/data.

Code availability

PoPS is available as an open-source Python package at https://github.com/FinucaneLab/pops. A static version of the PoPS method used in the present study is available at https://doi.org/10.5281/zenodo.8002379.


Acknowledgements

We thank K. Aragam, A. Butterworth, M. Daly, N. Artomov, Y. Reshef and all members of the Finucane lab for helpful discussions. This research was conducted using the UK Biobank Resource under project 31063. H.K.F. was funded by a National Institutes of Health (NIH) grant (no. DP5 OD024582) and by Eric and Wendy Schmidt. J.M.E. was supported by a Pathway to Independence Award (grant nos. K99HG00917 and R00HG009917), the Harvard Society of Fellows and the Base Research Initiative at Stanford University. J.M. and J.N.H. were supported by an NIH grant (no. R01DK075787). R.S.F. was supported by National Human Genome Research Institute, NIH (grant no. F31HG009850). J.O.-M. was supported by the Richard and Susan Smith Family Foundation, the HHMI Damon Runyon Cancer Research Foundation Fellowship (no. DRG-2274-16), the AGA Research Foundation’s AGA-Takeda Pharmaceuticals Research Scholar Award in Inflammatory Bowel Disease (grant no. AGA2020-13-01), the HDDC Pilot and Feasibility (grant no. P30 DK034854) and the Food Allergy Science Initiative.

Author information

Jacob C. Ulirsch
Present address: Artificial Intelligence Laboratory, Illumina, Inc., San Diego, CA, USA
Rebecca S. Fine
Present address: Vertex Pharmaceuticals Incorporated, Boston, MA, USA
Charles P. Fulco
Present address: Bristol Myers Squibb, Cambridge, MA, USA
Moshe Biton
Present address: Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
Aviv Regev
Present address: Genentech, San Francisco, CA, USA
These authors contributed equally: Elle M. Weeks, Jacob C. Ulirsch.

Authors and Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, USA
Elle M. Weeks, Jacob C. Ulirsch, Nathan Y. Cheng, Rebecca S. Fine, Jenkai Miao, Tejal A. Patwardhan, Masahiro Kanai, Joseph Nasser, Charles P. Fulco, Katherine C. Tashman, Francois Aguet, Taibo Li, Jose Ordovas-Montanes, Christopher S. Smillie, Moshe Biton, Alex K. Shalek, Ramnik J. Xavier, Aviv Regev, Rajat M. Gupta, Kasper Lage, Kristin G. Ardlie, Joel N. Hirschhorn, Eric S. Lander, Jesse M. Engreitz & Hilary K. Finucane
Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
Jacob C. Ulirsch
Program in Computational & Systems Biology, MIT, Cambridge, MA, USA
Brian L. Trippe & Christopher S. Smillie
Computer Science & Artificial Intelligence Lab, MIT, Cambridge, MA, USA
Brian L. Trippe
Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, MA, USA
Rebecca S. Fine, Jenkai Miao & Joel N. Hirschhorn
Department of Genetics, Harvard Medical School, Boston, MA, USA
Rebecca S. Fine & Joel N. Hirschhorn
Department of Statistics, Harvard University, Cambridge, MA, USA
Tejal A. Patwardhan
Analytic and Translational Genetics Unit, MGH, Boston, MA, USA
Masahiro Kanai & Hilary K. Finucane
Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
Masahiro Kanai
Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Masahiro Kanai
MD-PhD Program, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Taibo Li
Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital, Boston, MA, USA
Jose Ordovas-Montanes
Program in Immunology, Harvard Medical School, Boston, MA, USA
Jose Ordovas-Montanes
Harvard Stem Cell Institute, Cambridge, MA, USA
Jose Ordovas-Montanes
Department of Molecular Biology, MGH, Boston, MA, USA
Moshe Biton & Ramnik J. Xavier
Institute for Medical Engineering and Science, MIT, Cambridge, MA, USA
Alex K. Shalek
Department of Chemistry, MIT, Cambridge, MA, USA
Alex K. Shalek
Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
Alex K. Shalek & Aviv Regev
Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
Alex K. Shalek
Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, MGH, Boston, MA, USA
Ashwin N. Ananthakrishnan & Ramnik J. Xavier
Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Ramnik J. Xavier
Howard Hughes Medical Institute, MIT, Cambridge, MA, USA
Aviv Regev
Division of Cardiovascular Medicine and Genetics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Rajat M. Gupta
Harvard Medical School, Boston, MA, USA
Rajat M. Gupta
Department of Surgery, MGH, Boston, MA, USA
Kasper Lage
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Kristin G. Ardlie
Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Joel N. Hirschhorn
Department of Biology, MIT, Cambridge, MA, USA
Eric S. Lander
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Eric S. Lander
Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Jesse M. Engreitz
BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA, USA
Jesse M. Engreitz


Contributions

E.M.W. and H.K.F. conceived of the study. E.M.W., J.C.U., N.Y.C. and H.K.F. designed the research, performed the experiments, analyzed the data and interpreted the results. B.L.T. and R.S.F. designed and performed the enrichment-based validations. J.M., T.A.P., M.K., J.N., C.P.F., K.C.T., F.A., T.L., J.O.-M., C.S.S., M.B., A.K.S., A.N.A., R.J.X., A.R., R.M.G., K.L., K.G.A., J.N.H. and J.M.E. provided data or analysis tools used by PoPS or other gene prioritization methods. E.S.L. helped advise the project. E.M.W., J.C.U. and H.K.F. wrote the manuscript with input from all authors. H.K.F. supervised the project.

Corresponding authors

Correspondence to Elle M. Weeks or Hilary K. Finucane.

Ethics declarations

Competing interests

J.C.U. reports compensation from consulting services with Goldfinch Bio and is an employee of Illumina. R.S.F. is an employee of Vertex Pharmaceuticals Incorporated. C.P.F. is an employee of Bristol Myers Squibb. J.O.-M. reports compensation for consulting services with Cellarity. A.R. is a cofounder and equity holder of Celsius Therapeutics and an equity holder in Immunitas, and was an SAB member of Thermo Fisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov until 31 July 2020. From 1 August 2020, A.R. is an employee of Genentech. J.N.H. served on the Scientific Advisory Board of and consults for Camp4 Therapeutics. E.S.L. serves on the Board of Directors for Codiak BioSciences and Neon Therapeutics, and serves on the Scientific Advisory Board of F-Prime Capital Partners and Third Rock Ventures; he is also affiliated with several nonprofit organizations including serving on the Board of Directors of the Innocence Project, Count Me In and Biden Cancer Initiative, and the Board of Trustees for the Parker Institute for Cancer Immunotherapy. He has served and continues to serve on various federal advisory committees. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PoPS model parameter choices and feature selection.

a–c, Results using Benchmarker to compare different parameter choices for fitting the PoPS model, meta-analyzed across independent traits (n = 46). Error bars represent 95% confidence intervals around the meta-analyzed point estimate. a, Feature selection: GLS with an L₁ penalty on the full set of features performs less well than GLS after marginal selection using a P value < 0.05 threshold from the two-sided Wald test. b, Error model: ordinary least squares (OLS) performs less well than generalized least squares (GLS) using marginal selection from a. c, Joint model regularization: GLS after marginal feature selection with an L₂ penalty performs better than similar models with an L₁ penalty or no penalty. d, Number of features selected (marginal P value < 0.05 from the two-sided Wald test) and included in the joint predictive model for PoPS for each trait. A legend for trait domain colors is provided in Fig. 2.
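Panels a–c describe the fitting choices that performed best: marginal feature selection with a two-sided Wald test at P < 0.05, followed by a joint generalized least squares (GLS) fit with an L₂ (ridge) penalty on the selected features. The sketch below illustrates that two-stage recipe under explicit assumptions: the gene-level association statistics y, the gene feature matrix F and the error covariance sigma are supplied by the caller; GLS is carried out by Cholesky whitening; Wald P values use a normal approximation; and the ridge penalty is fixed rather than tuned. It is not the API of the PoPS package, and fit_pops_sketch is a hypothetical name.

```python
import math
import numpy as np

def fit_pops_sketch(F, y, sigma, p_threshold=0.05, ridge_lambda=1.0):
    """Hypothetical sketch: marginal Wald-test selection, then ridge-penalized GLS.

    F: (n_genes, n_features) gene feature matrix (assumed input)
    y: (n_genes,) gene-level association statistics (assumed input)
    sigma: (n_genes, n_genes) positive-definite error covariance (assumed known)
    Returns selected feature indices, marginal P values and fitted gene scores.
    """
    n, m = F.shape
    # GLS via whitening: with sigma = L L^T, OLS on (L^-1 X, L^-1 y) is GLS on (X, y).
    L = np.linalg.cholesky(sigma)
    Fw = np.linalg.solve(L, F)
    yw = np.linalg.solve(L, y)
    onew = np.linalg.solve(L, np.ones(n))  # whitened intercept column

    # Step 1: one GLS regression per feature (intercept + feature), two-sided Wald test.
    pvals = np.empty(m)
    for j in range(m):
        Xw = np.column_stack([onew, Fw[:, j]])
        xtx_inv = np.linalg.inv(Xw.T @ Xw)
        beta = xtx_inv @ (Xw.T @ yw)
        resid = yw - Xw @ beta
        s2 = resid @ resid / (n - 2)
        z = beta[1] / math.sqrt(s2 * xtx_inv[1, 1])
        pvals[j] = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    selected = np.flatnonzero(pvals < p_threshold)

    # Step 2: joint GLS on the selected features with an L2 penalty (intercept unpenalized).
    Xw = np.column_stack([onew, Fw[:, selected]])
    penalty = ridge_lambda * np.eye(Xw.shape[1])
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(Xw.T @ Xw + penalty, Xw.T @ yw)

    # Fitted scores use the original (unwhitened) design matrix.
    X = np.column_stack([np.ones(n), F[:, selected]])
    return selected, pvals, X @ coef
```

With sigma = np.eye(n_genes) the whitening step does nothing and both stages reduce to ordinary least squares, which is the error model that panel b compares against.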

Extended Data Fig. 2 Additional comparisons using closest gene metric.

a, Results using closest gene enrichment to compare similarity-based gene prioritization methods, meta-analyzed within each trait domain across independent traits (n = 46). Error bars represent 95% confidence intervals around the meta-analyzed point estimate. b, Results using closest gene enrichment to compare PoPS results using different feature sets, meta-analyzed within each trait domain across independent traits (n = 46). Error bars represent 95% confidence intervals around the meta-analyzed point estimate.
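These panels report estimates meta-analyzed across 46 independent traits with 95% confidence intervals, but the captions do not name the pooling model. A fixed-effect, inverse-variance-weighted meta-analysis is one standard choice; the sketch below assumes it, and the function name is hypothetical. Per-trait estimates and their standard errors (for example, Benchmarker or closest gene enrichment values) are taken as given.

```python
import math

def inverse_variance_meta(estimates, standard_errors, z_crit=1.959964):
    """Hypothetical sketch of a fixed-effect inverse-variance meta-analysis.

    Returns the pooled estimate and its 95% confidence interval (assumed model).
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need matching, non-empty lists of estimates and standard errors")
    weights = [1.0 / se**2 for se in standard_errors]  # precision weights
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - z_crit * pooled_se, pooled + z_crit * pooled_se)
```

Traits with smaller standard errors pull the pooled estimate toward their values, and the pooled interval narrows as more traits are added.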

Extended Data Fig. 3 Comparison of gene expression features derived from bulk and single-cell RNA-seq datasets.

a, Results using Benchmarker to compare PoPS results using different feature sets, meta-analyzed within each trait domain across independent traits (n = 46). Error bars represent 95% confidence intervals around the meta-analyzed point estimate. b, Results using closest gene enrichment to compare PoPS results using different feature sets, meta-analyzed within each trait domain across independent traits (n = 46). Error bars represent 95% confidence intervals around the meta-analyzed point estimate.

Extended Data Fig. 4 Comparison of similarity-based methods using precision and recall.

Precision-recall plot showing performance of similarity-based methods.

Extended Data Fig. 5 Comparing prioritization criteria.

Precision-recall plots for each method with varying prioritization criteria. Each point shows the precision and recall for a set of prioritized genes selected using prioritization criteria based on absolute thresholds and/or relative rank in a locus. For all methods, the star represents the final chosen criterion. a, Circles: PoP scores ranked ≤ 2–5 in the locus. Star: highest PoPS score in the locus. b, Plus: significant TWAS P value after Bonferroni correction (P < 0.05/235,584). Circles: TWAS P values ranked ≤ 2–5 in the locus. Star: significant TWAS P value after Bonferroni correction (P < 0.05/235,584) and the most significant in the locus. c, Pluses: CLPP > 0.01, 0.1, 0.5, 0.9, and 0.99. Circles: CLPP > 0.01, 0.1, 0.5, 0.9, and 0.99 and also the highest CLPP in the locus. Star: CLPP > 0.1 and also the highest CLPP in the locus. d, Plus: any predicted connection from ABC. Circles: ABC connection strength ranked ≤ 2–5 in the locus. Star: highest ABC connection strength in the locus. e, Pluses: any predicted connection from PCHi-C for individual datasets. Triangle: any predicted connection from PCHi-C in any dataset. Circles: highest connection strength in the locus for individual datasets. Star: highest connection strength in the locus in any dataset. f, Pluses: any predicted connection from E-P correlation for individual datasets. Triangle: any predicted connection from E-P correlation in any dataset. Circles: highest connection strength in the locus for individual datasets. Star: highest connection strength in the locus in any dataset. g, Circle: closest gene by distance to the transcription start site. Star: closest gene by distance to the gene body. h, Circles: MAGMA z-scores ranked ≤ 2–5 in the locus. Star: highest MAGMA score in the locus. i, Plus: significant SMR P value after Bonferroni correction (P < 0.05/18,383). Circles: SMR P values ranked ≤ 2–5 in the locus. Star: significant SMR P value after Bonferroni correction (P < 0.05/18,383) and the most significant in the locus.
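The final (star) criteria in panels a, b, c, d, h and i reduce to simple per-locus rules: take the top-ranked gene, in some cases requiring it to also pass an absolute threshold. A minimal sketch, assuming each locus is given as a dictionary from gene name to score or P value; the Bonferroni thresholds are the ones quoted in the caption, while the function names are hypothetical and ties are broken by dictionary order.

```python
from typing import Dict, Optional

# Bonferroni thresholds quoted in the caption (panels b and i).
TWAS_ALPHA = 0.05 / 235_584
SMR_ALPHA = 0.05 / 18_383

def top_gene(scores: Dict[str, float]) -> Optional[str]:
    """Highest-scoring gene in the locus (PoPS, ABC and MAGMA star criteria)."""
    return max(scores, key=scores.get) if scores else None

def most_significant_if_bonferroni(pvalues: Dict[str, float], alpha: float) -> Optional[str]:
    """Most significant gene in the locus, kept only if it passes the threshold (TWAS, SMR)."""
    if not pvalues:
        return None
    gene = min(pvalues, key=pvalues.get)
    return gene if pvalues[gene] < alpha else None

def top_clpp(clpp: Dict[str, float], min_clpp: float = 0.1) -> Optional[str]:
    """Gene with the highest CLPP in the locus, kept only if its CLPP exceeds min_clpp."""
    gene = top_gene(clpp)
    return gene if gene is not None and clpp[gene] > min_clpp else None
```

Returning None for a locus means the method prioritizes no gene there, which is how threshold-based criteria trade recall for precision.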

Extended Data Fig. 6 Performance of PoPS and locus-based gene prioritization methods by trait.

Precision-recall plots for each method. Each point represents a single trait colored by trait domain. Only traits for which the method prioritized at least five genes in the validation loci were included. A legend for trait domain colors is provided in Fig. 2.

Extended Data Fig. 7 Additional performance metrics using evaluation gene set in 1,348 non-coding loci containing genes that harbor fine-mapped protein coding variants.

a, Sensitivity-specificity plot showing performance of locus-based methods, PoPS, intersections of pairs of locus-based methods, and intersections of PoPS with locus-based methods on the evaluation gene set of 589 genes with fine-mapped protein-coding variants. b, Heatmap showing performance, measured by F-score, of locus-based methods, PoPS, intersections of pairs of locus-based methods, and intersections of PoPS with locus-based methods.
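Panel a summarizes performance by sensitivity and specificity and panel b by F-score, including for intersections of two methods. The sketch below computes these from per-locus predictions, assuming that in each evaluation locus the gene carrying the fine-mapped protein-coding variant is the only positive and every other gene in the locus is a negative; the paper's exact counting may differ, and the function names and locus labels are hypothetical. An intersection keeps only the genes that both methods prioritize in the same locus.

```python
from typing import Dict, Set

def evaluate(prioritized: Dict[str, Set[str]], truth: Dict[str, str],
             candidates: Dict[str, Set[str]]) -> Dict[str, float]:
    """Precision, recall (sensitivity), specificity and F-score over evaluation loci.

    prioritized: locus -> genes a method prioritized in that locus
    truth:       locus -> the evaluation gene in that locus (assumed single positive)
    candidates:  locus -> all genes in that locus
    """
    tp = fp = fn = tn = 0
    for locus, genes in candidates.items():
        picked = prioritized.get(locus, set())
        for gene in genes:
            is_gold = gene == truth[locus]
            is_picked = gene in picked
            tp += int(is_gold and is_picked)
            fn += int(is_gold and not is_picked)
            fp += int(not is_gold and is_picked)
            tn += int(not is_gold and not is_picked)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    f_score = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f_score": f_score}

def intersect(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Genes prioritized by both methods in the same locus."""
    return {locus: a[locus] & b[locus] for locus in a.keys() & b.keys()}
```

Intersecting two methods can only remove prioritized genes, so it typically raises precision and specificity at the cost of recall.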

Extended Data Fig. 8 Number of prioritized genes for non-UK Biobank traits.

Number of unique gene-trait pairs prioritized by PoPS, locus-based gene prioritization methods, and their intersections, sorted by estimated precision. The full height of each bar represents the total number of genes prioritized. The opaque portion of each bar represents the expected number of true causal genes prioritized. Methods to the left of the dashed line achieve precision greater than 75%.
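If a method has estimated precision p and prioritizes N gene–trait pairs, the expected number of true causal genes among them is p × N; reading the opaque portion of each bar this way is an assumption, since the caption does not state the formula. For example, 200 pairs at 80% precision correspond to about 160 expected true causal genes. The sketch sorts methods by precision and flags those above the 75% cutoff marked by the dashed line; the function name and the method names in the test are hypothetical.

```python
from typing import Dict, List, Tuple

def summarize_methods(methods: Dict[str, Tuple[int, float]],
                      min_precision: float = 0.75) -> List[dict]:
    """Sort methods by estimated precision and add expected true causal genes.

    methods maps a method name to (number of gene-trait pairs prioritized, estimated precision).
    """
    rows = [
        {
            "method": name,
            "n_prioritized": n,
            "expected_true": n * precision,  # opaque part of the bar (assumed definition)
            "precision": precision,
            "above_cutoff": precision > min_precision,  # left of the dashed line
        }
        for name, (n, precision) in methods.items()
    ]
    return sorted(rows, key=lambda row: row["precision"], reverse=True)
```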

Extended Data Fig. 9 Known example RBM38.

Top: summary statistics colored by LD to the lead variant and fine-mapping results for variants in the locus colored by credible set. Bottom: results from PoPS and locus-based methods for all genes in the locus. Genes are colored by strength of prediction for each method with a star denoting the prioritized gene. Variant rs737092, RBM38 for mean corpuscular hemoglobin (MCH).

Extended Data Fig. 10 Sensitivity of precision and recall estimates to locus definition.

a, Loci defined as ±100 kb around the lead variant. b, Loci defined as ±1 Mb around the lead variant. c, Results restricted to loci in fine-mapped regions with three or fewer independent credible sets. d, Results restricted to loci in fine-mapped regions with five or fewer independent credible sets.
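Panels a and b vary the window around the lead variant that defines a locus, and Extended Data Fig. 5g compares two definitions of the closest gene (distance to the transcription start site versus distance to the gene body). The sketch below illustrates both under stated assumptions: a gene counts as in the locus if its body overlaps the window, coordinates are 1-based and inclusive, and the TSS is the start of a plus-strand gene or the end of a minus-strand gene. The Gene class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # gene body start (1-based, inclusive; assumed convention)
    end: int     # gene body end (inclusive)
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Transcription start site depends on strand.
        return self.start if self.strand == "+" else self.end

def genes_in_locus(genes: List[Gene], chrom: str, lead_pos: int, half_width: int) -> List[Gene]:
    """Genes whose body overlaps lead_pos ± half_width (e.g. 100_000 or 1_000_000)."""
    lo, hi = lead_pos - half_width, lead_pos + half_width
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]

def distance_to_body(gene: Gene, pos: int) -> int:
    """0 if pos lies inside the gene body, otherwise distance to the nearer end."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))

def closest_gene(genes: List[Gene], pos: int, by: str = "body") -> Optional[Gene]:
    """Closest gene to pos, by gene body ("body") or by transcription start site ("tss")."""
    if by == "body":
        dist = lambda g: distance_to_body(g, pos)
    elif by == "tss":
        dist = lambda g: abs(pos - g.tss)
    else:
        raise ValueError("by must be 'body' or 'tss'")
    return min(genes, key=dist) if genes else None
```

The two definitions can disagree: a variant just upstream of a long minus-strand gene is close to its body but far from its TSS.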

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–12.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Weeks, E.M., Ulirsch, J.C., Cheng, N.Y. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat Genet 55, 1267–1276 (2023). https://doi.org/10.1038/s41588-023-01443-6


Received: 14 August 2020
Accepted: 09 June 2023
Published: 13 July 2023
Issue Date: August 2023
DOI: https://doi.org/10.1038/s41588-023-01443-6
