Abstract
Next-generation sequencing technology has greatly facilitated genetic association studies, and many effective analysis methods have been proposed. However, population stratification remains a major issue in these studies. Many existing methods remove the bias due to population stratification in common variant association studies, but they may not be effective for rare variants, which leads to reduced power. In this paper, we therefore develop a principal component analysis strategy based on the local linear regression method (called PC-LLR) to eliminate the population stratification effect in both rare variant and common variant association studies. Simulation results indicate that the new PC-LLR method eliminates the population stratification effect well: it has correct type I error rates in all cases and higher power in most cases, whereas most existing methods have inflated type I error rates in at least some cases. We also demonstrate that PC-LLR eliminates the population stratification effect more effectively by applying it to the whole-exome sequencing data set from Genetic Analysis Workshop 19 (GAW19).
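To make the general idea concrete, the sketch below adjusts both a trait and a genotype summary for a principal component using local linear regression, then tests the residuals for association. It is an illustration under stated assumptions, not the PC-LLR procedure itself: the single principal component, the Gaussian kernel, the rule-of-thumb bandwidth, the burden-style genotype score, and all function names (`top_pcs`, `local_linear_fit`, `residual_association`) are hypothetical choices made here.

```python
import numpy as np


def top_pcs(G, k=1):
    """Scores of the first k principal components of an n x m genotype matrix."""
    Gc = G - G.mean(axis=0)                      # center each variant
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]


def local_linear_fit(x, y, h):
    """Local linear regression of y on a 1-D covariate x, evaluated at each x[i].

    Assumption: Gaussian kernel with a fixed bandwidth h.
    """
    n = len(x)
    fitted = np.empty(n)
    for i in range(n):
        d = x - x[i]                             # covariate centered at x[i]
        w = np.exp(-0.5 * (d / h) ** 2)          # kernel weights
        X = np.column_stack([np.ones(n), d])     # local intercept and slope
        XtW = X.T * w
        beta = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
        fitted[i] = beta[0]                      # local intercept = fit at x[i]
    return fitted


def residual_association(G, score, y, h=None):
    """Association statistic between trait y and a genotype score after
    removing the smooth effect of the first PC from both.

    Assumption: n * r^2 is treated as roughly chi-square with 1 df under
    the null; this ignores the degrees of freedom used by the smoother.
    """
    pc = top_pcs(G, 1)[:, 0]
    if h is None:
        h = 1.06 * pc.std() * len(pc) ** (-1 / 5)  # rule-of-thumb bandwidth
    ry = y - local_linear_fit(pc, y, h)
    rs = score - local_linear_fit(pc, score, h)
    r = np.corrcoef(ry, rs)[0, 1]
    return len(y) * r ** 2
```

A call might pass the genotype matrix of common variants as `G`, the per-individual count of rare alleles in a gene as `score`, and the trait as `y`. In practice the number of components and the bandwidth would need to be chosen from the data rather than fixed as they are here.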
Acknowledgements
The authors would like to thank the joint editor and reviewers for helpful comments that greatly improved the presentation of the paper. This research was supported by the Natural Science Foundation of Heilongjiang Province of China (LH2019A020). The Genetic Analysis Workshops (GAW) are supported by GAW grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org). The GAW19 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW19 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
He, J., Ma, W. & Zhou, Y. Gene association detection via local linear regression method. J Hum Genet 65, 115–123 (2020). https://doi.org/10.1038/s10038-019-0676-3
DOI: https://doi.org/10.1038/s10038-019-0676-3