Pitfalls of predicting complex traits from SNPs

Wray, Naomi R.; Yang, Jian; Hayes, Ben J.; Price, Alkes L.; Goddard, Michael E.; Visscher, Peter M.

doi:10.1038/nrg3457

Opinion
Published: 18 June 2013

Pitfalls of predicting complex traits from SNPs

Naomi R. Wray¹,
Jian Yang^1,2,
Ben J. Hayes^3,4,5,
Alkes L. Price^6,7,
Michael E. Goddard^3,8 &
…
Peter M. Visscher^1,2

Nature Reviews Genetics volume 14, pages 507–515 (2013)Cite this article

27k Accesses
453 Citations
61 Altmetric
Metrics details

Subjects

Abstract

The success of genome-wide association studies (GWASs) has led to increasing interest in making predictions of complex trait phenotypes, including disease, from genotype data. Rigorous assessment of the value of predictors is crucial before implementation. Here we discuss some of the limitations and pitfalls of prediction analysis and show how naive implementations can lead to severe bias and misinterpretation of results.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Figure 1: Flowchart of SNP-based prediction analysis.**

**Figure 2: The overlap pitfall: non-independence of discovery and validation samples.**

Controlling for background genetic effects using polygenic scores improves the power of genome-wide association studies

Article Open access 01 October 2021

Improved genetic prediction of complex traits from individual-level data or summary statistics

Article Open access 07 July 2021

Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries

Article Open access 30 April 2024

References

de los Campos, G., Gianola, D. & Allison, D. B. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nature Rev. Genet. 11, 880–886 (2010).
CAS PubMed Google Scholar
Gonzalez-Camacho, J. M. et al. Genome-enabled prediction of genetic values using radial basis function neural networks. Theor. Appl. Genet. 125, 759–771 (2012).
CAS PubMed PubMed Central Google Scholar
Crossa, J. et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186, 713–724 (2010).
CAS PubMed PubMed Central Google Scholar
Wei, Z. et al. From disease association to risk assessment: an optimistic view from genome-wide association studies on type 1 diabetes. PLoS Genet. 5, e1000678 (2009).
PubMed PubMed Central Google Scholar
de los Campos, G., Hickey, J. M., Pong-Wong, R., Daetwyler, H. D. & Calus, M. P. L. Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193, 1255–1268 (2012).
Google Scholar
Heffner, E. L., Sorrells, M. E. & Jannink, J. L. Genomic selection for crop improvement. Crop Sci. 49, 1–12 (2009).
CAS Google Scholar
Riedelsheimer, C. et al. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nature Genet. 44, 217–220 (2012).
CAS PubMed Google Scholar
Becker, F. et al. Genetic testing and common disorders in a public health framework: how to assess relevance and possibilities. Eur. J. Hum. Genet. 19, S6–S44 (2011).
PubMed PubMed Central Google Scholar
Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era—concepts and misconceptions. Nature Rev. Genet. 9, 255–266 (2008).
CAS PubMed Google Scholar
Janssens, A. C. et al. Predictive testing for complex diseases using multiple genes: fact or fiction? Genet. Med. 8, 395–400 (2006).
PubMed Google Scholar
Wray, N. R., Yang, J., Goddard, M. E. & Visscher, P. M. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 6, e1000864 (2010).
PubMed PubMed Central Google Scholar
Burga, A., Casanueva, M. O. & Lehner, B. Predicting mutation outcome from early stochastic variation in genetic interaction partners. Nature 480, 250–253 (2011).
CAS PubMed Google Scholar
Seddon, J. M. et al. Prediction model for prevalence and incidence of advanced age-related macular degeneration based on genetic, demographic, and environmental variables. Invest. Ophthalmol. Vis. Sci. 50, 2044–2053 (2009).
PubMed Google Scholar
Polychronakos, C. & Li, Q. Understanding type 1 diabetes through genetics: advances and prospects. Nature Rev. Genet. 12, 781–792 (2011).
CAS PubMed Google Scholar
So, H. C., Kwan, J. S., Cherny, S. S. & Sham, P. C. Risk prediction of complex diseases from family history and known susceptibility loci, with applications for cancer screening. Am. J. Hum. Genet. 88, 548–565 (2011).
CAS PubMed PubMed Central Google Scholar
Pharoah, P. D., Antoniou, A. C., Easton, D. F. & Ponder, B. A. Polygenes, risk prediction, and targeted prevention of breast cancer. N. Engl. J. Med. 358, 2796–2803 (2008).
CAS PubMed Google Scholar
Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature Genet. 45, 400–405 (2013).
CAS PubMed Google Scholar
Tenesa, A. & Haley, C. S. The heritability of human disease: estimation, uses and abuses. Nature Rev. Genet. 14, 139–149 (2013).
CAS PubMed Google Scholar
Ayodo, G. et al. Combining evidence of natural selection with association analysis increases power to detect malaria-resistance variants. Am. J. Hum. Genet. 81, 234–242 (2007).
CAS PubMed PubMed Central Google Scholar
Raj, T. et al. Alzheimer disease susceptibility loci: evidence for a protein network under natural selection. Am. J. Hum. Genet. 90, 720–726 (2012).
CAS PubMed PubMed Central Google Scholar
Jostins, L. et al. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
CAS PubMed PubMed Central Google Scholar
Barreiro, L. B., Laval, G., Quach, H., Patin, E. & Quintana-Murci, L. Natural selection has driven population differentiation in modern humans. Nature Genet. 40, 340–345 (2008).
CAS PubMed Google Scholar
Crow, J. F. Maintaining evolvability. J. Genet. 87, 349–353 (2008).
PubMed Google Scholar
Vissers, L. E. et al. A de novo paradigm for mental retardation. Nature Genet. 42, 1109–1112 (2010).
CAS PubMed Google Scholar
de Brouwer, A. P. et al. Mutation frequencies of X-linked mental retardation genes in families from the EuroMRX consortium. Hum. Mutat. 28, 207–208 (2007).
PubMed Google Scholar
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010).
CAS PubMed Google Scholar
Visscher, P. M. et al. A commentary on 'Common SNPs explain a large proportion of the heritability for human height' by Yang et al. (2010). Twin. Res. Hum. Genet. 13, 517–524 (2010).
PubMed Google Scholar
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
CAS PubMed PubMed Central Google Scholar
Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
CAS PubMed Google Scholar
Lee, S. H. et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nature Genet. 44, 247–250 (2012).
CAS PubMed Google Scholar
Haile-Mariam, M., Nieuwhof, G. J., Beard, K. T., Konstatinov, K. V. & Hayes, B. J. Comparison of heritabilities of dairy traits in Australian Holstein-Friesian cattle from genomic and pedigree data and implications for genomic evaluations. J. Anim. Breed. Genet. 130, 20–31 (2013).
CAS PubMed Google Scholar
Jensen, J., Su, G. & Madsen, P. Partitioning additive genetic variance into genomic and remaining polygenic components for complex traits in dairy cattle. BMC Genet. 13, 44 (2012).
CAS PubMed PubMed Central Google Scholar
Kemper, K. E., Daetwyler, H. D., Visscher, P. M. & Goddard, M. E. Comparing linkage and association analyses in sheep points to a better way of doing GWAS. Genet. Res. 94, 191–203 (2012).
CAS Google Scholar
Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).
CAS PubMed PubMed Central Google Scholar
Bacanu, S. A., Nelson, M. R. & Whittaker, J. C. Comparison of statistical tests for association between rare variants and binary traits. PLoS ONE 7, e42530 (2012).
CAS PubMed PubMed Central Google Scholar
Lindor, N. M. et al. A review of a multifactorial probability-based model for classification of BRCA1 and BRCA2 variants of uncertain significance (VUS). Hum. Mutat. 33, 8–21 (2012).
CAS PubMed Google Scholar
Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature Genet. 44, 483–489 (2012).
CAS PubMed Google Scholar
Goddard, M. E. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257 (2009).
PubMed Google Scholar
Hayes, B. J., Bowman, P. J., Chamberlain, A. J. & Goddard, M. E. Invited review: Genomic selection in dairy cattle: progress and challenges. J. Dairy Sci. 92, 433–443 (2009).
CAS PubMed Google Scholar
Daetwyler, H. D., Villanueva, B. & Woolliams, J. A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 3, e3395 (2008).
PubMed PubMed Central Google Scholar
de los Campos, G. et al. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182, 375–385 (2009).
CAS PubMed PubMed Central Google Scholar
Goddard, M. E., Wray, N. R., Verbyla, K. L. & Visscher, P. M. Estimating effects and making predictions from genome-wide marker data. Statist. Sci. 24, 517–529 (2009).
Google Scholar
Stephens, M. & Balding, D. J. Bayesian statistical methods for genetic association studies. Nature Rev. Genet. 10, 681–690 (2009).
CAS PubMed Google Scholar
Guan, Y. T. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Statist. 5, 1780–1815 (2011).
Google Scholar
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).
CAS PubMed PubMed Central Google Scholar
Erbe, M. et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 95, 4114–4129 (2012).
CAS PubMed Google Scholar
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genet. 44, 369–375 (2012).
CAS PubMed Google Scholar
Meigs, J. B. et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N. Engl. J. Med. 359, 2208–2219 (2008).
CAS PubMed PubMed Central Google Scholar
Kraft, P. & Hunter, D. J. Genetic risk prediction—are we there yet? N. Engl. J. Med. 360, 1701–1703 (2009).
CAS PubMed Google Scholar
Paynter, N. P. et al. Association between a literature-based genetic risk score and cardiovascular events in women. JAMA 303, 631–637 (2010).
CAS PubMed PubMed Central Google Scholar
Wacholder, S. et al. Performance of common genetic variants in breast-cancer risk models. N. Engl. J. Med. 362, 986–993 (2010).
CAS PubMed PubMed Central Google Scholar
Meuwissen, T. H., Hayes, B. J. & Goddard, M. E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
CAS PubMed PubMed Central Google Scholar
Ober, U. et al. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genet. 8, e1002685 (2012).
CAS PubMed PubMed Central Google Scholar
Abraham, G., Kowalczyk, A., Zobel, J. & Inouye, M. SparSNP: fast and memory-efficient analysis of all SNPs for phenotype prediction. BMC Bioinformatics 13, 88 (2012).
PubMed PubMed Central Google Scholar
Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
PubMed PubMed Central Google Scholar
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
PubMed PubMed Central Google Scholar
Derringer, J. et al. Predicting sensation seeking from dopamine genes. A candidate-system approach. Psychol. Sci. 21, 1282–1290 (2010).
PubMed Google Scholar
Mackay, T. F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173–178 (2012).
CAS PubMed PubMed Central Google Scholar
Powell, J. E. & Zietsch, B. P. Predicting sensation seeking from dopamine genes: use and misuse of genetic prediction. Psychol. Sci. 22, 413–415 (2011).
PubMed Google Scholar
Skafidas, E. et al. Predicting the diagnosis of autism spectrum disorder using gene pathway analysis. Mol. Psychiatry 11 Sep 2012 (10.1038/mp.2012.126).
Ambroise, C. & McLachlan, G. J. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA 99, 6562–6566 (2002).
CAS PubMed PubMed Central Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
CAS PubMed PubMed Central Google Scholar
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
CAS PubMed PubMed Central Google Scholar
Makowsky, R. et al. Beyond missing heritability: prediction of complex traits. PLoS Genet. 7, e1002051 (2011).
CAS PubMed PubMed Central Google Scholar
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
CAS PubMed PubMed Central Google Scholar
Daetwyler, H. D., Calus, M. P. L., Pong-Wong, R., de los Campos, G. & Hickey, J. M. Genomic prediction in animals and plants: simulation of data, validation, reporting and benchmarking. Genetics 193, 347–365 (2012).
PubMed Google Scholar
Price, A. L. et al. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 4, e236 (2008).
PubMed PubMed Central Google Scholar
Belgard, T. G., Jankovic, I., Lowe, J. K. & Geschwind, D. H. Population structure confounds autism genetic classifier. Mol. Psychiatry 2 Apr 2013 (10.1038/mp.2013.34).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
PubMed PubMed Central Google Scholar
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).
CAS PubMed Google Scholar
Thornton, T. et al. Estimating kinship in admixed populations. Am. J. Hum. Genet. 91, 122–138 (2012).
CAS PubMed PubMed Central Google Scholar
Lubke, G. H. et al. Estimating the genetic variance of major depressive disorder due to all single nucleotide polymorphisms. Biol. Psychiatry 72, 707–709 (2012).
CAS PubMed PubMed Central Google Scholar
Machiela, M. J. et al. Evaluation of polygenic risk scores for predicting breast and prostate cancer risk. Genet. Epidemiol. 35, 506–514 (2011).
PubMed PubMed Central Google Scholar
Evans, D. M., Visscher, P. M. & Wray, N. R. Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum. Mol. Genet. 18, 3525–3531 (2009).
CAS PubMed Google Scholar
Peterson, R. E. et al. Genetic risk sum score comprised of common polygenic variation is associated with body mass index. Hum. Genet. 129, 221–230 (2011).
PubMed Google Scholar
Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).
CAS PubMed PubMed Central Google Scholar
Campbell, C. D. et al. Demonstrating stratification in a European American population. Nature Genet. 37, 868–872 (2005).
CAS PubMed Google Scholar
Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genet. 44, 1015–1019 (2012).
CAS PubMed Google Scholar
Psaty, B. M. et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ. Cardiovasc. Genet. 2, 73–80 (2009).
PubMed PubMed Central Google Scholar
Qi, L. et al. Genetic variants at 2q24 are associated with susceptibility to type 2 diabetes. Hum. Mol. Genet. 19, 2706–2715 (2010).
CAS PubMed PubMed Central Google Scholar
Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genet. 43, 519–525 (2011).
CAS PubMed Google Scholar

Download references

Acknowledgements

The authors acknowledge funding from the Australian National Health and Medical Research Council (1047956, 1011506, 613601, 613602, 1048853, 1052684, 1050218), Australian Research Council (FT0991360, DP130102666) and the US National Institutes of Health (NIH; R01 HG006399, R01 GM 075091, P01 GM 099568, R01 MH100141). The authors thank J. Witte for helpful comments. Funding support for the genome-wide association study (GWAS) of gene and environment initiatives in type 2 diabetes is provided through the NIH Genes, Environment and Health Initiative (GEI; U01HG004399). The human subjects participating in the GWAS are derived from the Nurses' Health Study (NHS) and Health Professionals' Follow-up Study (HPFS), and these studies are supported by NIH grants CA87969, CA55075 and DK58845. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the gene–environment association studies, GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the US National Center for Biotechnology Information (NCBI). Funding support for genotyping, which was carried out at the Broad Institute of MIT and Harvard, was provided by the NIH GEI (U01HG004424). The data sets used for the analyses described in this manuscript were obtained from dbGap accession number phs000091. The Atherosclerosis Risk in Communities Study (ARIC) was carried out as a collaborative study supported by US National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C), R01HL087641, R01HL59367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and NIH contract HHSN268200625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by grant number UL1RR025005, a component of the NIH and NIH Roadmap for Medical Research. The Framingham Heart Study (FHS) is conducted and supported by the NHLBI in collaboration with Boston University (Contract No. N01-HC-25195). This article was not prepared in collaboration with investigators of the FHS and does not necessarily reflect the opinions or views of the FHS, Boston University or the NHLBI. Funding for SHARe Affymetrix genotyping was provided by NHLBI Contract N02-HL-64278. SHARe Illumina genotyping was provided under an agreement between Illumina and Boston University. The authors are grateful to S. Pollack, C. Palmer and J. Hirschhorn for assistance with FHS data.

Author information

Authors and Affiliations

Naomi R. Wray, Jian Yang and Peter M. Visscher are at The Queensland Brain Institute, The University of Queensland, QBI Building, St Lucia, Queensland 4071, Australia.,
Naomi R. Wray, Jian Yang & Peter M. Visscher
Jian Yang and Peter M. Visscher are at The University of Queensland Diamantina Institute, Level 7, 37 Kent Street, Translational Research Institute, Woolloongabba, Queensland 4102, Australia.,
Jian Yang & Peter M. Visscher
Ben J. Hayes and Michael E. Goddard are at the Biosciences Research Division, Department of Primary Industries, GPO Box 4440, Melbourne, Victoria 3001, Australia.,
Ben J. Hayes & Michael E. Goddard
Ben J. Hayes is at the Dairy Futures Cooperative Research Centre, AgriBio, Centre for AgriBioscience, 5 Ring Road, La Trobe University, Bundoora, Victoria 3083, Australia,
Ben J. Hayes
and La Trobe University, Bundoora, Victoria 3086, Australia.,
Ben J. Hayes
Alkes L. Price is at the Department of Epidemiology, USA; the Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, Harvard School of Public Health, 655 Huntington Avenue, SPH2, 4th Floor, Boston, Massachusetts 02115, USA; the Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; and the Program in Molecular and Genetic Epidemiology, Harvard School of Public Health, 655 Huntington Avenue, Building 2, 2nd Floor, Boston, Massachusetts 02115, USA.,
Alkes L. Price
the Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, SPH2, 4th Floor, Boston, Massachusetts 02115, USA,
Alkes L. Price
Michael E. Goddard is at the Faculty of Land and Food Resources, University of Melbourne, Melbourne, Victoria 3010, Australia.,
Michael E. Goddard

Authors

Naomi R. Wray
View author publications
You can also search for this author in PubMed Google Scholar
Jian Yang
View author publications
You can also search for this author in PubMed Google Scholar
Ben J. Hayes
View author publications
You can also search for this author in PubMed Google Scholar
Alkes L. Price
View author publications
You can also search for this author in PubMed Google Scholar
Michael E. Goddard
View author publications
You can also search for this author in PubMed Google Scholar
Peter M. Visscher
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Peter M. Visscher.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

PowerPoint slides

PowerPoint slide for Fig. 1

PowerPoint slide for Fig. 2

Glossary

Ancestry principal components: Principal components derived from the genome relationship matrix that account for the genetic substructure of the data. In case–control studies, these principal components can reflect genotyping artefacts, such as plate, batch and genotyping centre, that could be confounded with case–control status.
Conventionally unrelated: Individuals that are not closely related: for example, more distantly related than third cousins.
Cross-validated: Cross-validation involves testing the validity of a prediction in the absence of an independent external validation sample. This is done by dividing the sample into k independent subsets (balanced with respect to case–control status in disease data). Each of the k subsets is used in turn as a validation sample for a predictor derived from the remaining k– 1 subsets.
Cryptic relatedness: When a sample is thought to comprise unrelated individuals on the basis of recorded pedigree relationships but in fact includes close relatives: for example, second cousin or closer.
Effective population size: The number of individuals in an idealized population with random mating and no selection that would lead to the same rate of inbreeding as observed in the real population.
Estimated breeding values: Estimates of the additive genetic value for a particular trait that an individual will pass on to descendants.
Heritability: The proportion of phenotypic variance attributable to additive genetic variation.
Independent sample: In the context of risk prediction, this is a sample from the same population but excluding individuals that are closely related. It is necessary for the individuals in different samples from the same population to share common ancestors, and indeed this distant sharing underpins the efficacy of a risk predictor.
Independent SNPs: Uncorrelated single-nucleotide polymorphisms (SNPs) in linkage equilibrium.
Linkage disequilibrium: (LD). The nonrandom association of alleles at different loci.
Polygenic prediction analysis: Any analysis method that predicts genetic risk or breeding values on the basis of the combined contribution of many loci.
Profile scoring: A polygenic prediction method for prediction of genetic value or risk for each individual (a 'profile') in a validation sample generated from the sum of the alleles they carry weighted by the association effect size estimated in a discovery sample.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wray, N., Yang, J., Hayes, B. et al. Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 14, 507–515 (2013). https://doi.org/10.1038/nrg3457

Download citation

Published: 18 June 2013
Issue Date: July 2013
DOI: https://doi.org/10.1038/nrg3457

This article is cited by

Genomic and population characterization of a diversity panel of dwarf and tall coconut accessions from the International Coconut Genebank for Latin America and Caribbean
- Allison Vieira da Silva
- Emiliano Fernandes Nassau Costa
- Roberto Fritsche-Neto
Genetic Resources and Crop Evolution (2024)
The trade-off between density marker panels size and predictive ability of genomic prediction for agronomic traits in Coffea canephora
- Ithalo Coelho de Sousa
- Cynthia Aparecida Valiati Barreto
- Moysés Nascimento
Euphytica (2024)
Genomic analysis and prediction of genomic values for distichiasis in Staffordshire bull terriers
- Dina Jørgensen
- Ernst-Otto Ropstad
- Frode Lingaas
Canine Medicine and Genetics (2023)
A comprehensive investigation into the genetic relationship between music engagement and mental health
- Laura W. Wesseldijk
- Yi Lu
- Miriam A. Mosing
Translational Psychiatry (2023)
Dissecting the causal association between social or physical inactivity and depression: a bidirectional two-sample Mendelian Randomization study
- Guorui Zhao
- Zhe Lu
- Weihua Yue
Translational Psychiatry (2023)