Abstract

Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets have great potential as a way to move beyond simple genome-wide association studies (GWAS), but the move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.

Acknowledgements

J.M. acknowledges support from the European Research Council (ERC; grant 617306). A.D. acknowledges support from Wellcome Trust grant 099680/Z/12/Z. This work was supported by Wellcome Trust grant 090532/Z/09/Z. A.K. acknowledges support from the Royal Society under the Industry Fellowship scheme.

Author information

Author notes

    • Andrew Dahl
    • Valentina Iotchkova

    These authors contributed equally to this work.

Affiliations

  1. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    • Andrew Dahl
    • Richard Mott
    • Jonathan Marchini

  2. Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    • Valentina Iotchkova
    • Nicole Soranzo

  3. European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK.

    • Valentina Iotchkova
    • Amelie Baud

  4. Department of Immunology, Genetics and Pathology, Science for Life Laboratory Uppsala, Uppsala University, Uppsala, Sweden.

    • Åsa Johansson
    • Ulf Gyllensten

  5. Aviagen, Ltd., Newbridge, UK.

    • Andreas Kranis

  6. Roslin Institute, University of Edinburgh, Midlothian, UK.

    • Andreas Kranis

  7. Department of Statistics, University of Oxford, Oxford, UK.

    • Jonathan Marchini

Authors

  1. Andrew Dahl

  2. Valentina Iotchkova

  3. Amelie Baud

  4. Åsa Johansson

  5. Ulf Gyllensten

  6. Nicole Soranzo

  7. Richard Mott

  8. Andreas Kranis

  9. Jonathan Marchini

Contributions

A.D., V.I. and J.M. developed the method. A.D. carried out all analysis. J.M. and A.D. wrote the manuscript. A.B. and R.M. provided extensive advice on analysis of the rat GWAS data set. Å.J. and U.G. provided the NSPHS data set. N.S. provided the UKNBS data set. A.K. provided the chicken data set and advice on analysis. All authors critiqued the manuscript.

Competing interests

A.K. is an employee of Aviagen, Ltd., a poultry breeding company that supplies broiler breeding stock worldwide. A.K. also holds an Industry Fellowship from the Royal Society and is based part time in the Roslin Institute.

Corresponding author

Correspondence to Jonathan Marchini.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–12 and Supplementary Note.

About this article

DOI

https://doi.org/10.1038/ng.3513
