  • Technical Report

Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores

Abstract

Genome-wide association studies (GWASs) have been mostly conducted in populations of European ancestry, which currently limits the transferability of their findings to other populations. Here, we show, through theory, simulations and applications to real data, that adjustment of GWAS analyses for polygenic scores (PGSs) increases the statistical power for discovery across all ancestries. We applied this method to analyze seven traits available in three large biobanks with participants of East Asian ancestry (n = 340,000 in total) and report 139 additional associations across traits. We also present a two-stage meta-analysis strategy whereby, in contributing cohorts, a PGS-adjusted GWAS is rerun using PGSs derived from a first round of a standard meta-analysis. On average, across traits, this approach yields a 1.26-fold increase in the number of detected associations (range 1.07- to 1.76-fold increase). Altogether, our study demonstrates the value of using PGSs to increase the power of GWASs in underrepresented populations and promotes such an analytical strategy for future GWAS meta-analyses.
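The intuition behind PGS adjustment can be illustrated with a small simulation. The sketch below is not the authors' code (their shell and R scripts are on Zenodo); it is a minimal Python illustration under stated assumptions: a single tested variant, a standardized PGS that is independent of that variant (for example, one built without the variant's region), and a quantitative trait with unit variance. The function names `snp_test` and `simulate` and all parameter values are hypothetical. Under these assumptions, adjusting for a PGS that explains a fraction R² of trait variance shrinks the residual variance to roughly 1 − R², so the squared standard error of the variant effect falls by about the same factor and the expected test statistic rises by roughly 1/(1 − R²).

```python
import random
import statistics


def _slope_and_residuals(y, x):
    """Least-squares fit of y on x with an intercept.

    Returns (slope, residuals, sxx), where sxx is the centered sum of squares of x.
    """
    mean_y = statistics.fmean(y)
    mean_x = statistics.fmean(x)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    residuals = [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]
    return slope, residuals, sxx


def snp_test(phenotype, genotype, n_adjusted=0):
    """Return (beta, se, chisq) for a single-variant linear regression.

    n_adjusted counts covariates already regressed out of both inputs,
    so that the residual degrees of freedom remain correct.
    """
    beta, residuals, sxx = _slope_and_residuals(phenotype, genotype)
    dof = len(phenotype) - 2 - n_adjusted
    residual_variance = sum(r * r for r in residuals) / dof
    se = (residual_variance / sxx) ** 0.5
    return beta, se, (beta / se) ** 2


def simulate(n=20000, maf=0.3, beta=0.1, pgs_r2=0.3, seed=2023):
    """Simulate one variant plus a PGS and test the variant with and without adjustment.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Additive genotype coded 0/1/2 from two independent allele draws.
    genotype = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Standardized PGS, drawn independently of the tested variant.
    pgs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_sd = (1.0 - pgs_r2) ** 0.5
    phenotype = [
        beta * g + pgs_r2 ** 0.5 * s + rng.gauss(0.0, noise_sd)
        for g, s in zip(genotype, pgs)
    ]
    unadjusted = snp_test(phenotype, genotype)
    # Regressing the PGS out of both phenotype and genotype is equivalent to
    # including it as a covariate (Frisch-Waugh-Lovell theorem).
    _, phenotype_resid, _ = _slope_and_residuals(phenotype, pgs)
    _, genotype_resid, _ = _slope_and_residuals(genotype, pgs)
    adjusted = snp_test(phenotype_resid, genotype_resid, n_adjusted=1)
    return unadjusted, adjusted


if __name__ == "__main__":
    unadjusted, adjusted = simulate()
    print("unadjusted (beta, se, chisq):", unadjusted)
    print("PGS-adjusted (beta, se, chisq):", adjusted)
    print("ratio of squared SEs:", (adjusted[1] / unadjusted[1]) ** 2)
```

With the default `pgs_r2=0.3`, the ratio of squared standard errors should land close to 0.7. The chi-square statistic from a single simulated variant is noisier, so any comparison of power would need to average over many replicates.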

Fig. 1: PGS adjustment increases the GWAS power in simulated and real data.
Fig. 2: PGS-adjusted X-chromosome and exome-wide association studies of height in the UKB.
Fig. 3: Adjustment for multiple sources of PGSs further increases the power of GWASs in EAS-ancestry biobanks.

Data availability

Data derived from the Taiwan Biobank are restricted. Access to data in general (including genome-wide association study (GWAS) summary statistics) requires transfer agreements and other requirements. Specific inquiries regarding how to access these resources should be sent to Taiwan Biobank researchers. Phenotypes and genotypes from participants of the Health and Retirement Study (HRS) can be accessed from dbGaP under accession no. phs000428.v2.p2. Summary statistics of GWAS meta-analyses for the seven traits measured across BioBank Japan, the Taiwan Biobank and the Korean Genome and Epidemiology Study, as well as GWASs of X-chromosome variants conducted in the UK Biobank, are publicly available at https://zenodo.org/record/8213134 (version 3). Source data are provided with this paper.

Code availability

Source code (shell and R scripts) used to run simulations and polygenic score-adjusted genome-wide association study analyses can be publicly downloaded at https://zenodo.org/record/8213134 (version 3).

Acknowledgements

L.Y. was supported by the Australian Research Council (DE200100425, FT220100069). P.M.V. was supported by the Australian Research Council (FL180100072). Y.-F.L. was supported by the National Health Research Institutes (NP-109-PP-09, NP-110-PP-09) and the National Science and Technology Council (109-2314-B-400-017, 110-2314-B-400-028-MY3) of Taiwan. Y.-C.A.F. acknowledges support from the National Taiwan University (NTU-112L7404), the Yushan Young Fellow Program provided by the Ministry of Education (MOE; NTU-112V1020-2), the National Science and Technology Council (NSTC 112-2314-B-002-200-MY3) and the Population Health Research Center from Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the MOE in Taiwan (NTU-112L9004). Y.O. was supported by the Japan Society for the Promotion of Science KAKENHI (Grants-in-Aid for Scientific Research; 22H00476), Japan Agency for Medical Research and Development (JP21gm4010006, JP22km0405211, JP22ek0410075, JP22km0405217, JP22ek0109594), Japan Science and Technology Agency’s Moonshot R&D Program (JPMJMS2021, JPMJMS2024), Takeda Science Foundation and Bioinformatics Initiative of Osaka University Graduate School of Medicine. S.N. was supported by the Takeda Science Foundation. The Korean Genome and Epidemiology Study (KoGES) was supported by the Brain Pool Plus (BP+, Brain Pool+) Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (2020H1D3A2A03100666). This study includes data from the KoGES (4851-302), National Research Institute of Health, Centers for Disease Control and Prevention, Ministry for Health and Welfare, Republic of Korea. This research was conducted using the Taiwan Biobank resource. We thank all participants and investigators of the Taiwan Biobank. We thank the National Center for Genome Medicine of Taiwan for the technical support in genotyping.
We thank the National Core Facility for Biopharmaceuticals (MOST 106-2319-B-492-002) and the National Center for High-Performance Computing of the National Applied Research Laboratories of Taiwan for providing computational and storage resources. The Health and Retirement Study (HRS) was supported by the National Institute on Aging (U01AG009740). HRS genotyping received additional support from the National Institute on Aging (RC2 AG036495, RC4 AG039029). HRS data were obtained from dbGaP (database of Genotypes and Phenotypes, accession no. phs000428.v2.p2). We thank D.J. Benjamin, P. Turley and M.E. Goddard for helpful and constructive discussions.

Author information

Contributions

L.Y. designed the study, derived the theory and ran simulations. A.I.C. ran simulations, performed simulated and real data analysis and drafted the manuscript. S.N., S.-C.L., K.N., J.S., H.W., Y.K., L.-H.W., S.L., Y.-F.L., Y.-C.A.F., Y.O. and P.M.V. curated the data, performed quality control, performed statistical analyses and interpreted the results. All authors contributed to the writing and revision of the manuscript.

Corresponding authors

Correspondence to Adrian I. Campos or Loic Yengo.

Ethics declarations

Competing interests

A.I.C. is currently an employee of the Regeneron Genetics Center, a wholly owned subsidiary of Regeneron Pharmaceuticals, Inc., and may own stocks or stock options. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Cassandra Spracklen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Note, Figs. 1–16 and Tables 1–10.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Campos, A.I., Namba, S., Lin, SC. et al. Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores. Nat Genet 55, 1769–1776 (2023). https://doi.org/10.1038/s41588-023-01500-0

  • DOI: https://doi.org/10.1038/s41588-023-01500-0
