Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores

Abstract

Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
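The combination step described above can be illustrated with a minimal sketch. It assumes that the two predictors are already available as per-individual scores and that they are combined linearly, with mixing weights fitted by ordinary least squares in a small training sample from the target population. The function names (fit_mixing_weights, combine_prs) and the simulated data are hypothetical and are not taken from the PolyFun package, which may estimate the weights differently.

```python
import numpy as np


def fit_mixing_weights(prs_matrix, phenotype):
    """Fit an intercept and one mixing weight per PRS column by least squares.

    prs_matrix: (n_individuals, n_predictors) array of per-individual scores.
    phenotype:  (n_individuals,) array of observed trait values.
    """
    design = np.column_stack([np.ones(len(phenotype)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef[0], coef[1:]


def combine_prs(prs_matrix, intercept, weights):
    """Apply fitted mixing weights to obtain the combined predictor."""
    return intercept + prs_matrix @ weights


# Simulated illustration only (hypothetical data, not results from the paper).
rng = np.random.default_rng(0)
n_train = 500
prs_finemap = rng.normal(size=n_train)                    # fine-mapping-based score
prs_bolt = 0.5 * prs_finemap + rng.normal(size=n_train)   # tagging-based score
trait = 0.3 * prs_finemap + 0.2 * prs_bolt + rng.normal(size=n_train)

scores = np.column_stack([prs_finemap, prs_bolt])
intercept, weights = fit_mixing_weights(scores, trait)
combined = combine_prs(scores, intercept, weights)

# Squared correlation between the combined score and the trait (in-sample).
r2 = np.corrcoef(combined, trait)[0, 1] ** 2
print("mixing weights:", weights, "in-sample R2:", r2)
```

In practice the weights would be fitted in one set of target-population individuals and the combined score evaluated in a disjoint set, so that the reported accuracy is not inflated by in-sample fitting.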

Fig. 1: Overview of PolyPred and PolyPred+.
Fig. 2: Recommendations for the application of PolyPred, PolyPred+ and related methods.
Fig. 3: Cross-population PRS results for simulated UK Biobank traits using in-sample LD.
Fig. 4: Cross-population PRS results for real UK Biobank traits.
Fig. 5: Cross-population PRS results for Biobank Japan and Uganda-APCDR traits.
Fig. 6: Cross-population PRS results for UK Biobank east Asians when incorporating both European and non-European training data.

Data availability

Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk). PRS coefficients generated in the present study are available for public download at http://data.broadinstitute.org/alkesgroup/polypred_results. Summary LD information of n = 337,000 British-ancestry UK Biobank individuals for 2,763 overlapping 3-Mb loci is available at https://data.broadinstitute.org/alkesgroup/UKBB_LD. Summary LD information of n = 50,000 UK Biobank individuals for SBayesR is available at https://zenodo.org/record/3350914. Summary LD information used by PRS-CS is available at https://github.com/getian107/PRScs. Baseline-LF v.2.2.UKB annotations and LD scores for UK Biobank SNPs are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF_v2.2.UKB.tar.gz. Source data are provided with this paper.

Code availability

PolyPred and PolyPred+ are provided as part of the open-source software package PolyFun, which is freely available at https://doi.org/10.5281/zenodo.6139679 (ref. 89) and https://github.com/omerwe/polyfun. BOLT-LMM is available at https://data.broadinstitute.org/alkesgroup/BOLT-LMM. SBayesR is available at https://cnsgenomics.com/software/gctb. PRS-CS is available at https://github.com/getian107/PRScs.

References

  1. Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

  2. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  3. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).

  4. Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596 (2019).

  5. Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).

  6. Li, R., Chen, Y., Ritchie, M. D. & Moore, J. H. Electronic health records and polygenic risk scores for predicting disease risk. Nat. Rev. Genet. 21, 493–502 (2020).

  7. Márquez-Luna, C., Loh, P.-R., South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium & Price, A. L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 41, 811–823 (2017).

  8. Grinde, K. E. et al. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet. Epidemiol. 43, 50–62 (2019).

  9. Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).

  10. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).

  11. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).

  12. Gurdasani, D., Barroso, I., Zeggini, E. & Sandhu, M. S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 20, 520–535 (2019).

  13. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  14. Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).

  15. Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).

  16. Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1628 (2020).

  17. Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 10, 4027–4036 (2020).

  18. Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213 (2020).

  19. Mahajan, A. et al. Trans-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.09.22.20198937v1 (2020).

  20. Cavazos, T. B. & Witte, J. S. Inclusion of variants discovered from diverse populations improves polygenic risk score transferability. Hum. Genet. Genom. Adv. 2, 100017 (2021).

  21. Mills, M. C. & Rahal, C. The GWAS diversity monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).

  22. Lehmann, B. C., Mackintosh, M., McVean, G. & Holmes, C. C. High trait variability in optimal polygenic prediction strategy within multiple-ancestry cohorts. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2021.01.15.426781v2 (2021).

  23. Ji, Y. et al. Incorporating European GWAS findings improve polygenic risk prediction accuracy of breast cancer among East Asians. Genet. Epidemiol. https://doi.org/10.1002/gepi.22382 (2021).

  24. Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.12.27.20248738v2 (2020).

  25. Cai, M. et al. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2021.03.002 (2021).

  26. Huang, Q. Q. et al. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistanis and Bangladeshis. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.12.27.20248738v2 (2021).

  27. Durvasula, A. & Lohmueller, K. E. Negative selection on complex traits limits phenotype prediction accuracy between populations. Am. J. Hum. Genet. 108, 620–631 (2021).

  28. Coram, M. A., Fang, H., Candille, S. I., Assimes, T. L. & Tang, H. Leveraging multi-ethnic evidence for risk assessment of quantitative traits in minority populations. Am. J. Hum. Genet. 101, 218–226 (2017).

  29. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

  30. Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).

  31. Kuchenbaecker, K. et al. The transferability of lipid loci across African, Asian and European cohorts. Nat. Commun. 10, 4330 (2019).

  32. Mostafavi, H. et al. Variable prediction accuracy of polygenic scores within an ancestry group. eLife 9, e48376 (2020).

  33. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  34. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

  35. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  36. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  37. Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).

  38. Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 5086 (2019).

  39. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

  40. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  41. Nagai, A. et al. Overview of the BioBank Japan project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).

  42. Asiki, G. et al. The general population cohort in rural south-western Uganda: a platform for communicable and non-communicable disease studies. Int. J. Epidemiol. 42, 129–141 (2013).

  43. Heckerman, D. et al. Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc. Natl Acad. Sci. USA 113, 7377–7382 (2016).

  44. Duan, S., Zhang, W., Cox, N. J. & Dolan, M. E. FstSNP-HapMap3: a database of SNPs with high population differentiation for HapMap3. Bioinformation 3, 139–141 (2008).

  45. Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).

  46. Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).

  47. Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).

  48. Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).

  49. Nievergelt, C. M. et al. International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci. Nat. Commun. 10, 4558 (2019).

  50. Sakaue, S. et al. Trans-biobank analysis with 676,000 individuals elucidates the association of polygenic risk scores of complex traits with human lifespan. Nat. Med. 26, 542–548 (2020).

  51. Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231.e11 (2020).

  52. Guo, J. et al. Global genetic differentiation of complex traits shaped by natural selection in humans. Nat. Commun. 9, 1865 (2018).

  53. Sved, J. A., McRae, A. F. & Visscher, P. M. Divergence between human populations estimated from linkage disequilibrium. Am. J. Hum. Genet. 83, 737–743 (2008).

  54. Budin-Ljøsne, I. et al. Data sharing in large research consortia: experiences and recommendations from ENGAGE. Eur. J. Hum. Genet. 22, 317–321 (2014).

  55. Surakka, I. et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 47, 589–597 (2015).

  56. Horikoshi, M. et al. Discovery and fine-mapping of glycaemic and obesity-related trait loci using high-density imputation. PLoS Genet. 11, e1005230 (2015).

  57. Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).

  58. Chung, W. et al. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat. Commun. 10, 569 (2019).

  59. Chun, S. et al. Non-parametric polygenic risk prediction via partitioned GWAS summary statistics. Am. J. Hum. Genet. 107, 46–59 (2020).

  60. Im, C. et al. Generalizability of ‘GWAS hits’ in clinical populations: lessons from childhood cancer survivors. Am. J. Hum. Genet. 107, 636–653 (2020).

  61. Daetwyler, H. D., Villanueva, B. & Woolliams, J. A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 3, e3395 (2008).

  62. Visscher, P. M. & Hill, W. G. The limits of individual identification from sample allele frequencies: theory and statistical analysis. PLoS Genet. 5, e1000628 (2009).

  63. Galinsky, K. J. et al. Estimating cross-population genetic correlations of causal effect sizes. Genet. Epidemiol. 43, 180–188 (2019).

  64. Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).

  65. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  66. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

  67. Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).

  68. Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 12, 4192 (2021).

  69. Hu, Y. et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 13, e1005589 (2017).

  70. Márquez-Luna, C. et al. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 12, 6052 (2021).

  71. Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).

  72. Yang, S. & Zhou, X. Accurate and scalable construction of polygenic scores in large biobank data sets. Am. J. Hum. Genet. 106, 679–693 (2020).

  73. Qian, J. et al. A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. PLoS Genet. 16, e1009141 (2020).

  74. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  75. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  76. Akiyama, M. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 10, 4393 (2019).

  77. Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).

  78. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).

  79. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  80. Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).

  81. Gurdasani, D. et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell 179, 984–1002 (2019).

  82. Lam, M. et al. RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics 36, 930–933 (2020).

  83. Lloyd-Jones, L. GCTB SBayesR shrunk sparse linkage disequilibrium matrices for HM3 variants, summary statistics and predictors generated from ‘Improved polygenic prediction by Bayesian multiple regression on summary statistics’ by Lloyd-Jones, Zeng et al. 2019. Zenodo https://doi.org/10.5281/ZENODO.3350914 (2019).

  84. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  85. Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).

  86. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).

  87. Purcell, S. & Chang, C. PLINK v2.00a3LM www.cog-genomics.org/plink/2.0/

  88. The UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).

  89. Weissbrod, O. Source code for PolyFun. Zenodo https://doi.org/10.5281/zenodo.6139679 (2022).

Acknowledgements

We thank A. Schoech and C. Márquez-Luna for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and was funded by the National Institutes of Health (NIH; grant nos. U01 HG009379, U01 HG012009, R37 MH107649, R01 MH101244 and R01 HG006399). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. W.J.P. was supported by an NWO Veni grant (no. 91619152). A.R.M. was supported by the National Institute of Mental Health (grant no. K99/R00MH117229). H.K.F. was supported by E. and W. Schmidt. A.V.K. was supported by grants (nos. 1K08HG010155 and 1U01HG011719) from the National Human Genome Research Institute and a sponsored research agreement from IBM Research. Y.O. was supported by JSPS KAKENHI (grant nos. 19H01021 and 20K21834) and AMED (grant nos. JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006 and P21km0405217) and JST Moonshot R&D (grant nos. JPMJMS2021 and JPMJMS2024). Computational analyses were performed on the O2 High-Performance Compute Cluster at Harvard Medical School.

Author information

Contributions

O.W., M.K., H.S. and A.L.P. designed the study. O.W., M.K., H.S. and S.G. analyzed the data. O.W., M.K., H.S. and A.L.P. wrote the manuscript with assistance from S.G., W.J.P., A.V.K., Y.O., A.R.M. and H.K.F.

Corresponding authors

Correspondence to Omer Weissbrod or Alkes L. Price.

Ethics declarations

Competing interests

O.W. is an employee of and holds equity in Eleven Therapeutics. H.S. is an employee of Genentech and holds stock in Roche. A.V.K. is an employee of and holds equity in Verve Therapeutics, and has served as a scientific advisor to Sanofi, Amgen, Maze Therapeutics, Navitor Pharmaceuticals, Sarepta Therapeutics, Novartis, Silence Therapeutics, Korro Bio, Veritas International, Color Health, Third Rock Ventures, Foresite Labs and Columbia University (NIH); A.V.K. received speaking fees from Illumina, MedGenome, Amgen and the Novartis Institute for Biomedical Research, and also received a sponsored research agreement from IBM Research. All other authors declare no competing interests.

Peer review

Peer review file

Nature Genetics thanks Marylyn Ritchie and Vincent Plagnol for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cross-population PRS results for real UK Biobank traits, using summary statistics from a meta-analysis of many cohorts.

We report average prediction accuracy (relative-R², but computed with respect to PRS-CS instead of BOLT-LMM; see main text), meta-analyzed across 4 well-powered, approximately independent traits, for PRS trained in European Network for Genetic and Genomic Epidemiology (ENGAGE) samples (average N = 61,365) and applied to four UK Biobank populations. Target population sample sizes are indicated in parentheses; PolyPred and its summary statistics-based analogs used 500 additional training samples from each target population to estimate mixing weights. Asterisks above each bar denote statistical significance of the difference vs. PRS-CS, with red asterisks denoting a disadvantage (*P < 0.05; **P < 0.001). P-values were computed using a two-sided Wald test and were not adjusted for multiple comparisons. Error bars denote standard errors. Numerical results, results for all 4 traits analyzed, absolute prediction accuracies (R²), and P-values of relative improvements vs. PRS-CS are reported in Supplementary Table 5 and Supplementary Table 8.
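As a worked illustration of the significance annotations in this legend, the following minimal sketch computes a two-sided Wald P-value from an estimated difference and its standard error, and maps it to the asterisk thresholds stated above. It assumes the standard normal-approximation form of the Wald test; the function names and the input values are hypothetical and are not taken from the paper or its supplementary tables.

```python
import math


def two_sided_wald_p(estimate, standard_error):
    """Two-sided Wald P-value under a normal approximation.

    The statistic is z = estimate / standard_error, and the P-value is
    2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def significance_stars(p_value):
    """Map a P-value to the legend's thresholds (*P < 0.05; **P < 0.001)."""
    if p_value < 0.001:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Hypothetical difference in relative accuracy vs. the reference method.
difference, se = 0.12, 0.05
p = two_sided_wald_p(difference, se)
print(f"P = {p:.4g} {significance_stars(p)}")
```

Because the legend states that P-values were not adjusted for multiple comparisons, the sketch likewise applies no correction.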

Supplementary information

Supplementary Information

Supplementary Tables 1–11 and Note.

Reporting Summary.

Peer Review File.

Supplementary Tables

Supplementary Tables 1–11.

Source data

Source Data Fig. 3

Source data for Fig. 3.

Source Data Fig. 4

Source data for Fig. 4.

Source Data Fig. 5

Source data for Fig. 5.

Source Data Fig. 6

Source data for Fig. 6.

Source Data Extended Data Fig. 1

Source data for Extended Data Fig. 1.

About this article

Cite this article

Weissbrod, O., Kanai, M., Shi, H. et al. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 54, 450–458 (2022). https://doi.org/10.1038/s41588-022-01036-9

  • DOI: https://doi.org/10.1038/s41588-022-01036-9
