  • Brief Communication

Adjusting for population stratification in polygenic risk score analyses: a guide for model specifications in the UK Biobank

Abstract

This study provides general guidance for model specification in polygenic risk score (PRS) analyses of the UK Biobank, namely which covariates to adjust for (i.e., age, sex, recruitment center, and genetic batch) and how many principal components (PCs) to include. To cover behavioral, physical, and mental health outcomes, we evaluated three continuous outcomes (BMI, smoking, drinking) and two binary outcomes (major depressive disorder and educational attainment). We fitted 3280 models (656 per phenotype), each with a different set of covariates. We evaluated these model specifications by comparing regression parameters such as R², coefficients, and P values, as well as by ANOVA tests. The findings suggest that up to three PCs appear to be sufficient for controlling population stratification for most outcomes, whereas including other covariates (particularly age and sex) appears to matter more for model performance.
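To make the size of the model grid concrete, the following is a minimal sketch of how such a set of specifications could be enumerated. It assumes that every subset of the four named covariates is crossed with 0 to 40 PCs; this design is not stated in the abstract, although its arithmetic (16 × 41 = 656 per phenotype) matches the reported counts. The column names (`prs`, `pc1`, `bmi`, and so on) are hypothetical.

```python
from itertools import combinations

# Covariates named in the abstract; the column names are hypothetical.
COVARIATES = ["age", "sex", "recruitment_center", "genetic_batch"]
# Assumption: PCs 1..40 are available (UK Biobank distributes 40 genetic PCs).
MAX_PCS = 40
# Hypothetical column names for the five outcomes listed in the abstract.
OUTCOMES = ["bmi", "smoking", "drinking", "mdd", "education"]


def covariate_subsets(covariates):
    """Yield every subset of the covariates, from the empty set to all of them."""
    for size in range(len(covariates) + 1):
        yield from combinations(covariates, size)


def build_formulas(outcome, prs="prs"):
    """Return one R-style formula string per (covariate subset, PC count) pair."""
    formulas = []
    for subset in covariate_subsets(COVARIATES):
        for n_pcs in range(MAX_PCS + 1):  # 0 PCs up to MAX_PCS PCs
            pcs = [f"pc{i}" for i in range(1, n_pcs + 1)]
            terms = [prs, *subset, *pcs]
            formulas.append(f"{outcome} ~ " + " + ".join(terms))
    return formulas


all_formulas = {outcome: build_formulas(outcome) for outcome in OUTCOMES}
per_outcome = len(all_formulas["bmi"])
total = sum(len(formulas) for formulas in all_formulas.values())
print(per_outcome, total)  # 656 3280
```

Each formula string could then be passed to a regression routine (linear for the continuous outcomes, logistic for the binary ones), and nested specifications could be compared on R² or with an ANOVA test, as the abstract describes.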

Fig. 1

Data availability

The data that support the findings of this study are available from UK Biobank (https://www.ukbiobank.ac.uk/), but restrictions apply to their availability: the data were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of UK Biobank.

Funding

L-KP is supported by the Kootstra Talent Fellowship of Maastricht University. BPFR was funded by a VIDI award number 91718336 from the Netherlands Scientific Organisation. SG and JvO are supported by the Ophelia research project, ZonMw grant number: 636340001. SG, BPFR, and JvO are supported by the YOUTH-GEMs project, funded by the European Union’s Horizon Europe program under the grant agreement number: 101057182.

Author information

Contributions

BDL and SG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Corresponding author

Correspondence to Sinan Guloksuz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

All participants provided written consent, and ethical approval was given by the National Research Ethics Service Committee North West Multi-Center Haydock (committee reference: 11/NW/0382).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, B.D., Pries, LK., van Os, J. et al. Adjusting for population stratification in polygenic risk score analyses: a guide for model specifications in the UK Biobank. J Hum Genet 68, 653–656 (2023). https://doi.org/10.1038/s10038-023-01161-1

  • DOI: https://doi.org/10.1038/s10038-023-01161-1
