Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores

Weissbrod, Omer; Kanai, Masahiro; Shi, Huwenbo; Gazal, Steven; Peyrot, Wouter J.; Khera, Amit V.; Okada, Yukinori; Martin, Alicia R.; Finucane, Hilary K.; Price, Alkes L.

doi:10.1038/s41588-022-01036-9

Article
Published: 07 April 2022

Nature Genetics volume 54, pages 450–458 (2022)

Abstract

Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred⁺, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred⁺ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred⁺ attained similar improvements.
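
The "combining two predictors" step can be illustrated with a minimal sketch. It assumes that each individual in a small target-population training sample already has two scores (one from the fine-mapping-based predictor, one from BOLT-LMM) and fits mixing weights by ordinary least squares. The function names, the simulated data and the choice of unconstrained least squares are assumptions made for illustration; this is not the PolyFun implementation.

```python
import random
from statistics import fmean


def fit_mixing_weights(prs_a, prs_b, pheno):
    """Least-squares fit of pheno ~ intercept + w_a * prs_a + w_b * prs_b."""
    ma, mb, my = fmean(prs_a), fmean(prs_b), fmean(pheno)
    a = [x - ma for x in prs_a]
    b = [x - mb for x in prs_b]
    y = [v - my for v in pheno]
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * q for p, q in zip(a, y))
    sby = sum(p * q for p, q in zip(b, y))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("predictors are collinear; weights are not identifiable")
    # Closed-form solution of the 2x2 normal equations on centered data.
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    intercept = my - w_a * ma - w_b * mb
    return w_a, w_b, intercept


def combine(prs_a, prs_b, weights):
    """Apply fitted mixing weights to two per-individual score vectors."""
    w_a, w_b, intercept = weights
    return [intercept + w_a * p + w_b * q for p, q in zip(prs_a, prs_b)]


def r_squared(pred, pheno):
    """Squared Pearson correlation between a predictor and the phenotype."""
    mp, my = fmean(pred), fmean(pheno)
    spy = sum((p - mp) * (v - my) for p, v in zip(pred, pheno))
    spp = sum((p - mp) ** 2 for p in pred)
    syy = sum((v - my) ** 2 for v in pheno)
    return spy * spy / (spp * syy)


# Hypothetical simulated data: two noisy scores tracking the same genetic value.
rng = random.Random(0)
n_train, n_test = 500, 2000
g = [rng.gauss(0, 1) for _ in range(n_train + n_test)]
prs_finemap = [x + rng.gauss(0, 1) for x in g]
prs_bolt = [x + rng.gauss(0, 1) for x in g]
pheno = [0.5 * x + rng.gauss(0, 1) for x in g]

# Fit weights in the small training sample, evaluate in held-out individuals.
weights = fit_mixing_weights(prs_finemap[:n_train], prs_bolt[:n_train], pheno[:n_train])
combined = combine(prs_finemap[n_train:], prs_bolt[n_train:], weights)
r2_bolt = r_squared(prs_bolt[n_train:], pheno[n_train:])
r2_combined = r_squared(combined, pheno[n_train:])
print("mixing weights:", weights)
print("relative improvement vs. single score:", r2_combined / r2_bolt - 1)
```

The last line expresses accuracy as a relative improvement, R² of the combined score divided by R² of the single score, minus one, which is the form in which the percentages above are quoted.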

**Fig. 1: Overview of PolyPred and PolyPred⁺.**

**Fig. 2: Recommendations for the application of PolyPred, PolyPred⁺ and related methods.**

**Fig. 3: Cross-population PRS results for simulated UK Biobank traits using in-sample LD.**

**Fig. 4: Cross-population PRS results for real UK Biobank traits.**

**Fig. 5: Cross-population PRS results for Biobank Japan and Uganda-APCDR traits.**

**Fig. 6: Cross-population PRS results for UK Biobank east Asians when incorporating both European and non-European training data.**

Data availability

Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk). PRS coefficients generated in the present study are available for public download at http://data.broadinstitute.org/alkesgroup/polypred_results. Summary LD information of n = 337,000 British-ancestry UK Biobank individuals for 2,763 overlapping 3-Mb loci is available at https://data.broadinstitute.org/alkesgroup/UKBB_LD. Summary LD information of n = 50,000 UK Biobank individuals for SBayesR is available at https://zenodo.org/record/3350914. Summary LD information used by PRS-CS is available at https://github.com/getian107/PRScs. Baseline-LF v.2.2.UKB annotations and LD scores for UK Biobank SNPs are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF_v2.2.UKB.tar.gz. Source data are provided with this paper.

Code availability

PolyPred and PolyPred⁺ are provided as part of the open-source software package PolyFun, which is freely available at https://doi.org/10.5281/zenodo.6139679 (ref. ⁸⁹) and https://github.com/omerwe/polyfun. BOLT-LMM is available at https://data.broadinstitute.org/alkesgroup/BOLT-LMM. SBayesR is available at https://cnsgenomics.com/software/gctb. PRS-CS is available at https://github.com/getian107/PRScs.

References

Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596 (2019).
Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
Li, R., Chen, Y., Ritchie, M. D. & Moore, J. H. Electronic health records and polygenic risk scores for predicting disease risk. Nat. Rev. Genet. 21, 493–502 (2020).
Márquez-Luna, C., Loh, P.-R. & South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium & Price, A. L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 41, 811–823 (2017).
Grinde, K. E. et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet. Epidemiol. 43, 50–62 (2019).
Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Gurdasani, D., Barroso, I., Zeggini, E. & Sandhu, M. S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 20, 520–535 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).
Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).
Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1628 (2020).
Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 10, 4027–4036 (2020).
Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213 (2020).
Mahajan, A. et al. Trans-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.09.22.20198937v1 (2020).
Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. Hum. Genet. Genom. Adv. 2, 100017 (2021).
Mills, M. C. & Rahal, C. The GWAS diversity monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).
Lehmann, B. C., Mackintosh, M., McVean, G. & Holmes, C. C. High trait variability in optimal polygenic prediction strategy within multiple-ancestry cohorts. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2021.01.15.426781v2 (2021).
Ji, Y. et al. Incorporating European GWAS findings improve polygenic risk prediction accuracy of breast cancer among East Asians. Genet. Epidemiol. https://doi.org/10.1002/gepi.22382 (2021).
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.12.27.20248738v2 (2020).
Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2021.03.002 (2021).
Huang, Q. Q. et al. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistanis and Bangladeshis. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.12.27.20248738v2 (2021).
Durvasula, A. & Lohmueller, K. E. Negative selection on complex traits limits phenotype prediction accuracy between populations. Am. J. Hum. Genet. 108, 620–631 (2021).
Coram, M. A., Fang, H., Candille, S. I., Assimes, T. L. & Tang, H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am. J. Hum. Genet. 101, 218–226 (2017).
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).
Kuchenbaecker, K. et al. The transferability of lipid loci across African, Asian and European cohorts. Nat. Commun. 10, 4330 (2019).
Mostafavi, H. et al. Variable prediction accuracy of polygenic scores within an ancestry group. eLife 9, e48376 (2020).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 5086 (2019).
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Nagai, A. et al. Overview of the BioBank Japan project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Asiki, G. et al. The general population cohort in rural south-western Uganda: a platform for communicable and non-communicable disease studies. Int. J. Epidemiol. 42, 129–141 (2013).
Heckerman, D. et al. Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc. Natl Acad. Sci. USA 113, 7377–7382 (2016).
Duan, S., Zhang, W., Cox, N. J. & Dolan, M. E. FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3. Bioinformation 3, 139–141 (2008).
Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Nievergelt, C. M. et al. International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci. Nat. Commun. 10, 4558 (2019).
Sakaue, S. et al. Trans-biobank analysis with 676,000 individuals elucidates the association of polygenic risk scores of complex traits with human lifespan. Nat. Med. 26, 542–548 (2020).
Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231.e11 (2020).
Guo, J. et al. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat. Commun. 9, 1865 (2018).
Sved, J. A., McRae, A. F. & Visscher, P. M. Divergence between human populations estimated from linkage disequilibrium. Am. J. Hum. Genet. 83, 737–743 (2008).
Budin-Ljøsne, I. et al. Data sharing in large research consortia: experiences and recommendations from ENGAGE. Eur. J. Hum. Genet. 22, 317–321 (2014).
Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).
Horikoshi, M. et al. Discovery and fine-mapping of glycaemic and obesity-related trait loci using high-density imputation. PLoS Genet. 11, e1005230 (2015).
Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).
Chung, W. et al. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat. Commun. 10, 569 (2019).
Chun, S. et al. Non-parametric polygenic risk prediction via partitioned GWAS summary statistics. Am. J. Hum. Genet. 107, 46–59 (2020).
Im, C. et al. Generalizability of ‘GWAS hits’ in clinical populations: lessons from childhood cancer survivors. Am. J. Hum. Genet. 107, 636–653 (2020).
Daetwyler, H. D., Villanueva, B. & Woolliams, J. A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 3, e3395 (2008).
Visscher, P. M. & Hill, W. G. The limits of individual identification from sample allele frequencies: theory and statistical analysis. PLoS Genet. 5, e1000628 (2009).
Galinsky, K. J. et al. Estimating cross-population genetic correlations of causal effect sizes. Genet. Epidemiol. 43, 180–188 (2019).
Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 12, 4192 (2021).
Hu, Y. et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 13, e1005589 (2017).
Márquez-Luna, C. et al. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 12, 6052 (2021).
Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).
Yang, S. & Zhou, X. Accurate and scalable construction of polygenic scores in large biobank data sets. Am. J. Hum. Genet. 106, 679–693 (2020).
Qian, J. et al. A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. PLoS Genet. 16, e1009141 (2020).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Akiyama, M. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 10, 4393 (2019).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Gurdasani, D. et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell 179, 984–1002 (2019).
Lam, M. et al. RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics 36, 930–933 (2020).
Lloyd-Jones, L. GCTB SBayesR shrunk sparse linkage disequilibrium matrices for HM3 variants, summary statistics and predictors generated from ‘Improved polygenic prediction by Bayesian multiple regression on summary statistics’ by Lloyd-Jones, Zeng et al. 2019. Zenodo https://doi.org/10.5281/ZENODO.3350914 (2019).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Gazal, S., Márquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Purcell, S. & Chang, C. PLINK v2.00a3LM www.cog-genomics.org/plink/2.0/
The UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Weissbrod, O. Source code for PolyFun. Zenodo https://doi.org/10.5281/zenodo.6139679 (2022).

Acknowledgements

We thank A. Schoech and C. Márquez-Luna for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and was funded by the National Institutes of Health (NIH; grant nos. U01 HG009379, U01 HG012009, R37 MH107649, R01 MH101244 and R01 HG006399). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. W.J.P. was supported by an NWO Veni grant (no. 91619152). A.R.M. was supported by the National Institute of Mental Health (grant no. K99/R00MH117229). H.K.F. was supported by E. and W. Schmidt. A.V.K. was supported by grants (nos. 1K08HG010155 and 1U01HG011719) from the National Human Genome Research Institute and a sponsored research agreement from IBM Research. Y.O. was supported by JSPS KAKENHI (grant nos. 19H01021 and 20K21834), AMED (grant nos. JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006 and JP21km0405217) and JST Moonshot R&D (grant nos. JPMJMS2021 and JPMJMS2024). Computational analyses were performed on the O2 High-Performance Compute Cluster at Harvard Medical School.

Author information

These authors contributed equally: Omer Weissbrod, Masahiro Kanai, Huwenbo Shi.

Authors and Affiliations

Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
Omer Weissbrod, Huwenbo Shi, Steven Gazal, Wouter J. Peyrot & Alkes L. Price
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Masahiro Kanai, Amit V. Khera, Alicia R. Martin, Hilary K. Finucane & Alkes L. Price
Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Masahiro Kanai & Yukinori Okada
OMNI Bioinformatics, San Francisco, CA, USA
Huwenbo Shi
Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Steven Gazal
Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Steven Gazal
Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
Wouter J. Peyrot
Verve Therapeutics, Cambridge, MA, USA
Amit V. Khera
Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Yukinori Okada
Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Hilary K. Finucane
Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Koichi Matsuda
Laboratory of Clinical Genome Sequencing, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
Koichi Matsuda & Yoichiro Kamatani
Division of Genetics, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Yuji Yamanashi
Division of Clinical Genome Research, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Yoichi Furukawa
Division of Molecular Pathology, IMSUT Hospital Department of Internal Medicine, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Takayuki Morisaki
Department of Cancer Biology, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Yoshinori Murakami
Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
Yoichiro Kamatani
Department of Public Policy, Institute of Medical Science, University of Tokyo, Tokyo, Japan
Kaori Muto & Akiko Nagai
Department of Urology, Iwate Medical University, Iwate, Japan
Wataru Obara
Department of Internal Medicine and Rheumatology, Juntendo University Graduate School of Medicine, Tokyo, Japan
Ken Yamaji
Department of Respiratory Medicine, Juntendo University Graduate School of Medicine, Tokyo, Japan
Kazuhisa Takahashi
Division of Pharmacology, Department of Biomedical Science, Nihon University School of Medicine, Tokyo, Japan
Satoshi Asai
Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
Satoshi Asai & Yasuo Takahashi
Tokushukai Group, Tokyo, Japan
Takao Suzuki & Nobuaki Sinozaki
Department of Hematology, Nippon Medical School, Tokyo, Japan
Hiroki Yamaguchi
Department of Bioregulation, Nippon Medical School, Kawasaki, Japan
Shiro Minami
Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan
Shigeo Murayama
Fukujuji Hospital, Japan Anti-Tuberculosis Association, Tokyo, Japan
Kozo Yoshimori
Cancer Institute Hospital of the Japanese Foundation for Cancer Research, Tokyo, Japan
Satoshi Nagayama
Center for Clinical Research and Advanced Medicine, Shiga University of Medical Science, Shiga, Japan
Daisuke Obata
Department of General Thoracic Surgery, Osaka International Cancer Institute, Osaka, Japan
Masahiko Higashiyama
Iizuka Hospital, Fukuoka, Japan
Akihide Masumoto
National Hospital Organization, Osaka National Hospital, Osaka, Japan
Yukihiro Koretsune

Authors

Omer Weissbrod
Masahiro Kanai
Huwenbo Shi
Steven Gazal
Wouter J. Peyrot
Amit V. Khera
Yukinori Okada
Alicia R. Martin
Hilary K. Finucane
Alkes L. Price

Consortia

The Biobank Japan Project

Koichi Matsuda, Yuji Yamanashi, Yoichi Furukawa, Takayuki Morisaki, Yoshinori Murakami, Yoichiro Kamatani, Kaori Muto, Akiko Nagai, Wataru Obara, Ken Yamaji, Kazuhisa Takahashi, Satoshi Asai, Yasuo Takahashi, Takao Suzuki, Nobuaki Sinozaki, Hiroki Yamaguchi, Shiro Minami, Shigeo Murayama, Kozo Yoshimori, Satoshi Nagayama, Daisuke Obata, Masahiko Higashiyama, Akihide Masumoto & Yukihiro Koretsune

Contributions

O.W., M.K., H.S. and A.L.P. designed the study. O.W., M.K., H.S. and S.G. analyzed the data. O.W., M.K., H.S. and A.L.P. wrote the manuscript with assistance from S.G., W.J.P., A.V.K., Y.O., A.R.M. and H.K.F.

Corresponding authors

Correspondence to Omer Weissbrod or Alkes L. Price.

Ethics declarations

Competing interests

O.W. is an employee and holds equity in Eleven Therapeutics. H.S. is an employee of Genentech and holds stock in Roche. A.V.K. is an employee and holds equity in Verve Therapeutics, and has served as a scientific advisor to Sanofi, Amgen, Maze Therapeutics, Navitor Pharmaceuticals, Sarepta Therapeutics, Novartis, Silence Therapeutics, Korro Bio, Veritas International, Color Health, Third Rock Ventures, Foresite Labs and Columbia University (NIH); A.V.K. received speaking fees from Illumina, MedGenome, Amgen and the Novartis Institute for Biomedical Research, and also received a sponsored research agreement from IBM Research. All other authors declare no competing interests.

Peer review

Peer review file

Nature Genetics thanks Marylyn Ritchie and Vincent Plagnol for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cross-population PRS results for real UK Biobank traits, using summary statistics from a meta-analysis of many cohorts.

We report average prediction accuracy (relative-R², but computed with respect to PRS-CS instead of BOLT-LMM; see main text), meta-analyzed across 4 well-powered, approximately independent traits, for PRS trained in European Network for Genetic and Genomic Epidemiology (ENGAGE) samples (average N = 61,365) and applied to four UK Biobank populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistics-based analogs used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. PRS-CS, with red asterisks denoting a disadvantage (*P < 0.05; **P < 0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Error bars denote standard errors. Numerical results, results for all 4 traits analyzed, absolute prediction accuracies (R²), and P-values of relative improvements vs. PRS-CS are reported in Supplementary Table 5 and Supplementary Table 8.
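
For readers unfamiliar with the test named in the caption, the following is a generic sketch of a two-sided Wald test under a normal approximation, together with the asterisk thresholds stated above. The function names and the example numbers are hypothetical; how the differences in accuracy and their standard errors were actually estimated is described in the main text and is not reproduced here.

```python
import math


def wald_two_sided_p(estimate, std_err):
    """Two-sided P value for H0: estimate == 0 under a normal approximation."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def significance_stars(p):
    """Map a P value to the caption's convention (*P < 0.05; **P < 0.001)."""
    if p < 0.001:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical difference in accuracy vs. a reference method and its standard error.
p_value = wald_two_sided_p(estimate=0.12, std_err=0.04)
print(p_value, significance_stars(p_value))
```

Because the caption states that P-values were not adjusted for multiple comparisons, the sketch applies the thresholds to each P value as-is.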

Source data

Supplementary information

Supplementary Information

Supplementary Tables 1–11 and Note.

Reporting Summary.

Peer Review File.

Supplementary Tables

Supplementary Tables 1–11.

Source data

Source Data Fig. 3
Source Data Fig. 4
Source Data Fig. 5
Source Data Fig. 6
Source Data Extended Data Fig. 1

About this article

Cite this article

Weissbrod, O., Kanai, M., Shi, H. et al. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 54, 450–458 (2022). https://doi.org/10.1038/s41588-022-01036-9

Received: 10 January 2021
Accepted: 25 February 2022
Published: 07 April 2022
Issue Date: April 2022
DOI: https://doi.org/10.1038/s41588-022-01036-9
