Abstract
The current study was conducted to provide general guidance for model specifications in polygenic risk score (PRS) analyses of the UK Biobank, such as which covariates to adjust for (i.e. age, sex, recruitment centers, and genetic batch) and how many principal components (PCs) to include. To cover behavioral, physical and mental health outcomes, we evaluated three continuous outcomes (BMI, smoking, drinking) and two binary outcomes (Major Depressive Disorder and educational attainment). We fitted 3280 models (656 per phenotype), each including a different set of covariates. We evaluated these model specifications by comparing regression parameters such as R², coefficients, and P values, as well as ANOVA tests. Findings suggest that up to three PCs appear to be sufficient for controlling population stratification for most outcomes, whereas including other covariates (particularly age and sex) appears to be more essential for model performance.
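The kind of comparison described above can be illustrated with a minimal sketch on simulated data. This is not the authors' code and uses no UK Biobank data: the sample size, effect sizes, variable names, and helper functions (`ols_r2`, `simulate`, `incremental_r2`) are all assumptions made for illustration, and plain least-squares regression on a continuous outcome stands in for whatever software the study used. The sketch fits nested models with and without the PRS while varying the number of PCs, and reports the incremental R² attributable to the PRS.

```python
import random


def ols_r2(X, y):
    """Fit y ~ X (intercept added) by ordinary least squares and return R^2."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def simulate(n=2000, n_pcs=10, seed=0):
    """Simulate a hypothetical cohort; all effect sizes are arbitrary assumptions."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        pcs = [rng.gauss(0, 1) for _ in range(n_pcs)]
        age = rng.uniform(40, 70)
        sex = rng.choice([0, 1])
        # The PRS is correlated with PC1, mimicking population stratification.
        prs = 0.5 * pcs[0] + rng.gauss(0, 1)
        # Only the first three PCs affect the outcome in this toy setup.
        y = (0.3 * prs + 0.05 * age + 0.5 * sex
             + 0.4 * pcs[0] + 0.2 * pcs[1] + 0.1 * pcs[2] + rng.gauss(0, 1))
        people.append({"pcs": pcs, "age": age, "sex": sex, "prs": prs, "y": y})
    return people


def incremental_r2(people, k_pcs, adjust_age_sex=True):
    """R^2 of (covariates + PRS) minus R^2 of (covariates only), using k_pcs PCs."""
    y = [p["y"] for p in people]
    base = []
    for p in people:
        row = list(p["pcs"][:k_pcs])
        if adjust_age_sex:
            row += [p["age"], p["sex"]]
        base.append(row)
    full = [row + [p["prs"]] for row, p in zip(base, people)]
    return ols_r2(full, y) - ols_r2(base, y)


if __name__ == "__main__":
    cohort = simulate()
    for k in (0, 1, 3, 10):
        print(f"PCs={k:2d}  incremental R2 of PRS = {incremental_r2(cohort, k):.4f}")
```

In this toy setup, the PRS contribution is inflated when no PCs are included and changes little once the PCs that actually carry stratification are in the model. That pattern is built into the simulation and should not be read as a reproduction of the study's results.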
Data availability
The data that support the findings of this study are available from UK Biobank (https://www.ukbiobank.ac.uk/), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with the permission of UK Biobank.
Funding
L-KP is supported by the Kootstra Talent Fellowship of Maastricht University. BPFR was funded by a VIDI award number 91718336 from the Netherlands Scientific Organisation. SG and JvO are supported by the Ophelia research project, ZonMw grant number: 636340001. SG, BPFR, and JvO are supported by the YOUTH-GEMs project, funded by the European Union’s Horizon Europe program under the grant agreement number: 101057182.
Author information
Contributions
BDL and SG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
All participants provided written consent and ethical approval was given by the National Research Ethics Service Committee North West Multi-Center Haydock, Committee reference: 11/NW/0382.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lin, B.D., Pries, LK., van Os, J. et al. Adjusting for population stratification in polygenic risk score analyses: a guide for model specifications in the UK Biobank. J Hum Genet 68, 653–656 (2023). https://doi.org/10.1038/s10038-023-01161-1
DOI: https://doi.org/10.1038/s10038-023-01161-1