Predicting genetic predisposition in humans: the promise of whole-genome markers

de los Campos, Gustavo; Gianola, Daniel; Allison, David B.

doi:10.1038/nrg2898

Perspectives
Published: 03 November 2010

Nature Reviews Genetics volume 11, pages 880–886 (2010)

Abstract

Although genome-wide association studies have identified markers that are associated with various human traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. We propose that alternative approaches, which are largely borrowed from animal breeding, provide potential for advances. We review selected methods and discuss the challenges and opportunities ahead.

References

1. Guttmacher, A. E. & Collins, F. S. Genomic medicine — a primer. N. Engl. J. Med. 347, 1512–1520 (2002).
2. Dominiczak, A. F. & McBride, M. W. Genetics of common polygenic stroke. Nature Genet. 35, 116–117 (2003).
3. Maher, B. Personal genomes: the case of the missing heritability. Nature 456, 18–21 (2008).
4. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
5. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
6. Lander, E. S. & Schork, N. J. Genetic dissection of complex traits. Science 265, 2037–2048 (1994).
7. Goddard, M. E. & Hayes, B. J. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Rev. Genet. 10, 381–391 (2009).
8. Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics 4th edn (Longman, Harlow, UK, 1996).
9. Hill, W. G. Understanding and using quantitative genetic variation. Philos. Trans. R. Soc. Lond. B 365, 73–85 (2010).
10. Fisher, R. The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. Earth Sci. 52, 399–433 (1918).
11. Wright, S. Systems of mating. Parts I.–V. Genetics 6, 111–178 (1921).
12. Henderson, C. R. Estimation of genetic parameters. Ann. Math. Stat. 21, 309–310 (1950).
13. Henderson, C. R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447 (1975).
14. Meuwissen, T. H., Hayes, B. J. & Goddard, M. E. Prediction of total genetic values using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
15. Habier, D., Fernando, R. L. & Dekkers, J. C. M. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177, 2389–2397 (2007).
16. González-Recio, O. et al. Non-parametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178, 2305–2313 (2008).
17. VanRaden, P. M. et al. Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92, 16–24 (2009).
18. Hayes, B. J., Bowman, P. J., Chamberlain, A. J. & Goddard, M. E. Genomic selection in dairy cattle: progress and challenges. J. Dairy Sci. 92, 433–443 (2009).
19. de los Campos, G. et al. Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182, 375–385 (2009).
20. Weigel, K. A. et al. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J. Dairy Sci. 92, 5248–5257 (2009).
21. Vazquez, A. et al. Predictive ability of subsets of SNP with and without parent average in US Holsteins. J. Dairy Sci. 2010 (doi:10.3168/jds.2010-3335).
22. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12, 55–67 (1970).
23. Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Series B 58, 267–288 (1996).
24. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B 67, 301–320 (2005).
25. Park, T. & Casella, G. The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008).
26. Wahba, G. Spline Models for Observational Data (Society for Industrial and Applied Mathematics, Philadelphia, 1990).
27. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn (Springer-Verlag, New York, 2009).
28. Gianola, D., Fernando, R. L. & Stella, A. Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173, 1761–1776 (2006).
29. Gianola, D. & van Kaam, J. B. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178, 2289–2303 (2008).
30. Kimeldorf, G. S. & Wahba, G. A correspondence between Bayesian estimation on stochastic process and smoothing by splines. Ann. Math. Stat. 41, 495–502 (1970).
31. de los Campos, G., Gianola, D. & Rosa, G. J. M. Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J. Anim. Sci. 87, 1883–1887 (2009).
32. de los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. & Crossa, J. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces regressions. Genetics Res. 92, 295–308 (2010).
33. Shawe-Taylor, J. & Cristianini, N. Kernel Methods for Pattern Analysis (Cambridge Univ. Press, UK, 2004).
34. Schaid, D. J. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum. Hered. 70, 109–131 (2010).
35. Garrick, D. J. The nature, scope and impact of some whole-genome analyses in beef cattle in 9th World Congress on Genetics Applied to Livestock (Leipzig, Germany, 2010).
36. Long, N. et al. Radial basis function regression methods for predicting quantitative traits using SNP markers. Genetics Res. 92, 209–225 (2010).
37. Crossa, J. et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 2 Sep 2010 (doi:10.1534/genetics.110.118521).
38. Piepho, H. P. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49, 1165–1176 (2009).
39. Legarra, A., Robert-Granié, C., Manfredi, E. & Elsen, J. M. Performance of genomic selection in mice. Genetics 180, 611–618 (2008).
40. Jannink, J. L., Lorenz, A. J. & Iwata, H. Genomic selection in plant breeding: from theory to practice. Brief. Funct. Genomics 9, 166–177 (2010).
41. Goddard, M. E. Genomic selection: prediction of accuracy and maximization of long term response. Genetica 136, 245–257 (2009).
42. Zhong, S., Dekkers, J. C., Fernando, R. L. & Jannink, J. L. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 182, 355–364 (2009).
43. Gianola, D. Theory and analysis of threshold characters. J. Anim. Sci. 54, 1079–1096 (1982).
44. Holzapfel, C. et al. Genes and lifestyle factors in obesity: results from 12462 subjects from MONICA/KORA. Int. J. Obes. 1–8 (2010).
45. Seshadri, S. et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA 303, 1832–1840 (2010).
46. Valenzuela, R. K. et al. Predicting phenotype from genotype: normal pigmentation. J. Forensic Sci. 55, 315–322 (2010).
47. Willer, C. J. et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nature Genet. 41, 25–34 (2009).
48. Zhao, J. et al. The role of obesity-associated loci identified in genome-wide association studies in the determination of pediatric BMI. Obesity 17, 2254–2257 (2009).
49. van Hoek, M. et al. Predicting type 2 diabetes based on polymorphisms from genome-wide association studies: a population-based study. Diabetes 57, 3122–3128 (2008).
50. Wray, N. R., Goddard, M. E. & Visscher, P. M. Prediction of individual genetic risk to diseases from genome-wide association studies. Genome Res. 17, 1520–1528 (2007).
51. Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
52. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010).
53. Witten, D. M. & Tibshirani, R. Survival analysis with high-dimensional covariates. Stat. Methods Med. Res. 19, 29–51 (2010).
54. Box, G. E. P. & Draper, N. R. Empirical Model-Building and Response Surfaces (Wiley, New York, 1987).
55. Cockerham, C. C. An extension of the concept of partitioning hereditary variance for analysis of covariance among relatives when epistasis is present. Genetics 39, 859–882 (1954).
56. Kempthorne, O. The correlation between relatives in a random mating population. Proc. R. Soc. Lond. B 143, 103–113 (1954).
57. Lynch, M. & Ritland, K. Estimation of pairwise relatedness with molecular markers. Genetics 152, 1753–1766 (1999).
58. Eding, J. H. & Meuwissen, T. H. Marker based estimates of between and within population kinships for the conservation of genetic diversity. J. Anim. Breed. Genet. 118, 141–159 (2001).
59. Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).
60. Hayes, B. J. & Goddard, M. E. Prediction of breeding values using marker-derived relationship matrices. J. Anim. Sci. 86, 2089–2092 (2008).
61. Feng, R., McClure, L. A., Tiwari, H. K. & Howard, G. A new estimate of family disease history providing improved prediction of disease risks. Stat. Med. 28, 1269–1283 (2009).

Acknowledgements

We are grateful to K. Grimes, A. Vazquez, Y. Klimentidis and S. Cofield for their helpful comments on this paper.

Author information

Authors and Affiliations

Gustavo de los Campos is at the Section on Statistical Genetics, Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, Alabama 35294, USA. gcampos@uab.edu
Daniel Gianola is at the Departments of Animal Sciences, Dairy Science and Biostatistics and Medical Informatics, University of Wisconsin-Madison, 1675 Observatory Dr., Wisconsin 53706, USA. gianola@ansci.wisc.edu
David B. Allison is at the Section on Statistical Genetics, Biostatistics, University of Alabama at Birmingham, 1665 University Boulevard, Alabama 35294, USA. DAllison@ms.soph.uab.edu

Ethics declarations

Competing interests

Gustavo de los Campos has served as a consultant to CIMMYT and Aviagen; both organizations work with genomic-enabled prediction of genetic values for plant and poultry breeding, respectively. Daniel Gianola serves on the International Scientific Advisory Board of Aviagen. David Allison has received numerous grants, consulting fees and donations from non-profit and for-profit entities, some of which may have interests in the genomic prediction of phenotypes.

Supplementary information

Supplementary information S1 (box)

Online Box: Probit Model

Glossary

Bayesian estimation: Bayesian inferences are based on the posterior distribution of the unknowns given the data. Following Bayes' rule, this distribution is proportional to the product of the distribution of the data given the unknowns (the likelihood) and the prior distribution of the unknowns.
Basis function: In regression analysis, basis functions are functions of the predictors used to construct the regression function. Polynomial, exponential and logarithmic functions are examples of basis functions commonly used in parametric regression.
Censored phenotype: Censoring occurs when, for some individuals, the phenotypic information consists only of bounds and the actual phenotypic value is unknown. This is common in longevity studies, in which some patients may still be alive at the time of analysis, so only a lower bound on their survival time is known.
Genomic medicine: The use of genome information in the prevention, diagnosis and treatment of disorders.
Goodness of fit: A measure of how well a model fits the data in a training sample. The log likelihood and R-squared statistic are commonly used measures of goodness of fit. The residual sum of squares is a commonly used measure of lack of fit.
LASSO: The Least Absolute Shrinkage and Selection Operator²³ is a penalized estimation method commonly used in regression. The penalty function in LASSO is the sum of the absolute values of the regression coefficients. Because this penalty can set some coefficients exactly to zero, LASSO performs variable selection and shrinkage simultaneously.
Objective function: The function whose value is minimized or maximized in an optimization problem.
Ordinary least squares (OLS): The OLS estimates of the parameters in a regression model are obtained by minimizing the residual sum of squares of the regression.
Over-fitting: A term used to describe the situation in which a model fits the training data well but fails to perform well when used to predict outcomes of a collection of subjects (testing data) that was not used to fit the model.
Parametric regression model: A regression model in which the regression function is set to have a known functional form (for example, a polynomial).
Penalized estimation: Penalized estimates are commonly used in situations in which the number of unknowns is large with respect to the number of records. Penalized estimates are obtained by solving an optimization problem whose objective function embeds a compromise between a goodness-of-fit measure and a measure of model complexity or penalty function.
Quantitative genetic theory: Genetic, mathematical and statistical models used to study traits that are affected by a large number of genes.
Regression model: A statistical model used to describe relationships (for example, a conditional mean) between a response variable and a set of predictors through a regression function involving some parameter(s) to be estimated from data.
Semi-parametric regression model: A regression model in which the regression function is not assumed to be a member of a parametric family.
Shrinkage: In standard estimation methods (for example, maximum likelihood or OLS), estimates are obtained by optimizing a goodness-of-fit or lack-of-fit measure. Relative to these estimates, Bayesian and penalized estimates are shrunk towards some value (typically zero). This prevents over-fitting and, under certain conditions, may reduce the mean-squared error of estimates and predictions.
Training data: The data set used to fit a model.
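Several of the glossary terms above (training data, over-fitting, ordinary least squares, penalized estimation, shrinkage and LASSO) can be seen together in one small experiment. The sketch below is an illustration under stated assumptions, not the authors' analysis: it uses NumPy on simulated data, and the sample sizes, allele frequencies, number of causal markers, penalty values and helper names (`simulate`, `ridge_dual`, `lasso_cd`, `r_squared`) are all hypothetical choices. With more markers than records, a least-squares fit reproduces the training phenotypes exactly, whereas ridge regression (ref. 22) and LASSO (ref. 23) shrink the marker effects, and LASSO also sets many of them exactly to zero.

```python
import numpy as np


def simulate(n, p, n_causal, rng):
    """Hypothetical data: genotypes coded 0/1/2 and a phenotype with heritability ~0.5."""
    freqs = rng.uniform(0.1, 0.5, size=p)              # assumed allele frequencies
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)
    beta = np.zeros(p)
    causal = rng.choice(p, size=n_causal, replace=False)
    beta[causal] = rng.normal(0.0, 1.0, size=n_causal)
    g = X @ beta                                        # simulated genetic values
    y = g + rng.normal(0.0, g.std(), size=n)            # residual SD equal to genetic SD
    return X, y


def ridge_dual(X, y, lam):
    """Ridge estimate (X'X + lam*I)^-1 X'y, computed as X'(XX' + lam*I)^-1 y (cheaper when p > n)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1 using soft-thresholding."""
    p = X.shape[1]
    b = np.zeros(p)
    r = y.copy()                                        # current residual y - Xb
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:                        # monomorphic marker: keep at zero
                continue
            rho = X[:, j] @ r + col_ss[j] * b[j]        # x_j' times residual excluding marker j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r += X[:, j] * (b[j] - new)                 # keep residual consistent with b
            b[j] = new
    return b


def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2010)
X, y = simulate(n=300, p=1000, n_causal=20, rng=rng)
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]   # training / testing split
x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
Xc, yc = X_tr - x_mean, y_tr - y_mean                          # centre using training data only

estimates = {
    # With p > n the OLS solution is not unique; lstsq returns the minimum-norm one.
    "min-norm OLS": np.linalg.lstsq(Xc, yc, rcond=None)[0],
    "ridge": ridge_dual(Xc, yc, lam=1000.0),                  # hypothetical penalty value
    "LASSO": lasso_cd(Xc, yc, lam=0.1 * np.max(np.abs(Xc.T @ yc))),
}

results = {}
for name, b in estimates.items():
    train_r2 = r_squared(y_tr, y_mean + Xc @ b)
    test_cor = np.corrcoef(y_te, y_mean + (X_te - x_mean) @ b)[0, 1]
    results[name] = {
        "train_r2": train_r2,
        "test_cor": test_cor,
        "norm": np.linalg.norm(b),
        "nonzero": int(np.count_nonzero(b)),
    }
    print(f"{name:>12}: training R2 = {train_r2:.3f}, testing r = {test_cor:.3f}, "
          f"||b|| = {results[name]['norm']:.2f}, nonzero effects = {results[name]['nonzero']}")
```

The training R² of the least-squares fit is essentially 1 regardless of how well it predicts the testing individuals, which is the over-fitting the glossary describes; the penalized fits give up some goodness of fit in exchange for smaller coefficients. The ridge estimate also equals the posterior mean of marker effects under independent Gaussian priors (with `lam` equal to the ratio of residual to marker-effect variance), and the LASSO estimate equals the posterior mode under double-exponential priors, which is the link to Bayesian estimation and to the Bayesian LASSO (ref. 25). In practice the penalty would typically be chosen by cross-validation rather than fixed as in this sketch.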

About this article

Cite this article

de los Campos, G., Gianola, D. & Allison, D. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat Rev Genet 11, 880–886 (2010). https://doi.org/10.1038/nrg2898

Issue Date: December 2010
DOI: https://doi.org/10.1038/nrg2898
