Gene association detection via local linear regression method

Abstract

The development of next-generation sequencing technology has greatly facilitated genetic association studies, and many effective analysis methods have been proposed. However, population stratification remains a major issue in current genetic association studies. Many existing methods remove the bias due to population stratification in common variant association studies, but such methods may not be effective for rare variants, which leads to reduced power. In this paper, we therefore develop a principal component analysis strategy based on the local linear regression method (called PC-LLR) to eliminate the population stratification effect in both rare variant and common variant association studies. Simulation results indicate that the new PC-LLR method eliminates the population stratification effect well: it has correct type I error rates in all cases and higher power in most cases, whereas most existing methods have inflated type I error rates in at least some cases. We also demonstrate that PC-LLR is more effective at eliminating the population stratification effect by applying it to the whole-exome sequencing data set from Genetic Analysis Workshop 19 (GAW19).
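
The abstract names the ingredients of PC-LLR (principal components plus local linear regression) but does not spell out the algorithm, so the following Python sketch illustrates only the general idea and should not be read as the authors' procedure. It assumes that ancestry is summarized by the top principal components of the genotype matrix, that the trait and a genotype score are each smoothed on those components by local linear regression, and that association is then tested on the residuals. The function names, the Gaussian kernel, the single shared bandwidth `h`, and the `n * r**2` statistic are all assumptions made for illustration.

```python
import numpy as np

def top_pcs(G, k=2):
    """Top-k principal component scores of an n x m genotype matrix,
    rescaled so each score column has roughly unit variance."""
    Gc = G - G.mean(axis=0)                      # center each variant
    U, _, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * np.sqrt(G.shape[0])

def local_linear_fit(X, y, h):
    """Local linear regression estimate of E[y | X] at every sample point.
    Gaussian kernel with one bandwidth h for all columns (an assumption)."""
    n = X.shape[0]
    fitted = np.empty(n)
    for i in range(n):
        D = X - X[i]                             # offsets from the target point
        w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))
        Z = np.column_stack([np.ones(n), D])     # intercept + linear terms
        WZ = Z * w[:, None]
        # Solve the weighted normal equations (Z'WZ) beta = Z'Wy.
        beta = np.linalg.lstsq(Z.T @ WZ, WZ.T @ y, rcond=None)[0]
        fitted[i] = beta[0]                      # intercept = fit at X[i]
    return fitted

def residual_association_stat(y, g, pcs, h):
    """Hypothetical test statistic: n times the squared correlation between
    trait residuals and genotype-score residuals after smoothing on the PCs."""
    ry = y - local_linear_fit(pcs, y, h)
    rg = g - local_linear_fit(pcs, g, h)
    r = np.corrcoef(ry, rg)[0, 1]
    return len(y) * r ** 2
```

In this sketch `g` could be a single common variant or a burden-type score summed over the rare variants in a region. How the bandwidth is chosen and how the statistic is calibrated are left open here, and the paper should be consulted for the actual choices.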

Acknowledgements

The authors would like to thank the joint editor and reviewers for helpful comments that greatly improved the presentation of the paper. This research was supported by the Natural Science Foundation of Heilongjiang Province of China (LH2019A020). The Genetic Analysis Workshops (GAW) are supported by GAW grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org). The GAW19 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW19 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889.

Author information

Corresponding author

Correspondence to Ying Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, J., Ma, W. & Zhou, Y. Gene association detection via local linear regression method. J Hum Genet 65, 115–123 (2020). https://doi.org/10.1038/s10038-019-0676-3

  • DOI: https://doi.org/10.1038/s10038-019-0676-3
