Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits

Abstract

Genetic correlations estimated from genome-wide association studies (GWASs) reveal pervasive pleiotropy across a wide variety of phenotypes. We introduce genomic structural equation modelling (genomic SEM): a multivariate method for analysing the joint genetic architecture of complex traits. Genomic SEM synthesizes genetic correlations and single-nucleotide polymorphism heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to model multivariate genetic associations among phenotypes, identify variants with effects on general dimensions of cross-trait liability, calculate more predictive polygenic scores and identify loci that cause divergence between traits. We demonstrate several applications of genomic SEM, including a joint analysis of summary statistics from five psychiatric traits. We identify 27 independent single-nucleotide polymorphisms not previously identified in the contributing univariate GWASs. Polygenic scores from genomic SEM consistently outperform those from univariate GWASs. Genomic SEM is flexible and open ended, and allows for continuous innovation in multivariate genetic analysis.
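As a rough illustration of what modelling "general dimensions of cross-trait liability" can mean, the following is a minimal sketch that fits a single common factor to a genetic correlation matrix. It is not the GenomicSEM implementation: the function name `fit_common_factor`, the five-trait matrix and its loadings are all hypothetical toy values, and the fit uses plain unweighted least squares (iterated principal-axis factoring), whereas the actual method also accounts for the sampling covariance of the estimates.

```python
import numpy as np


def fit_common_factor(S, n_iter=500, tol=1e-12):
    """Fit a one-factor model to a symmetric matrix S.

    Illustrative only: unweighted least squares via iterated
    principal-axis factoring. Returns (loadings, residual_variances).
    """
    S = np.asarray(S, dtype=float)
    off_diag = S - np.diag(np.diag(S))
    # Starting communalities: largest absolute off-diagonal entry per row.
    h = np.max(np.abs(off_diag), axis=1)
    for _ in range(n_iter):
        # Replace the diagonal with the current communalities.
        reduced = off_diag + np.diag(h)
        vals, vecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        # Loadings from the leading eigenpair of the reduced matrix.
        lam = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]
        h_new = lam ** 2
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    # Eigenvector sign is arbitrary; orient the factor positively.
    if lam.sum() < 0:
        lam = -lam
    residuals = np.diag(S) - lam ** 2
    return lam, residuals


# Hypothetical toy input: a correlation matrix generated exactly by one factor.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
S = np.outer(true_loadings, true_loadings) + np.diag(1.0 - true_loadings ** 2)

loadings, residuals = fit_common_factor(S)
implied = np.outer(loadings, loadings) + np.diag(residuals)
print(np.round(loadings, 3))
```

Because the toy matrix was built from a single factor, the model-implied matrix `implied` reproduces `S`. With real estimates the two would differ, and that discrepancy is what model fit statistics summarize.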

Fig. 1: Genomic SEM solutions for p- and neuroticism-factor models with SNP effect.
Fig. 2: Manhattan plots of unique, independent hits from genomic SEM.
Fig. 3: Out-of-sample prediction using genomic SEM- and univariate-based PGSs for psychiatric traits.

Data availability

The data that support the findings of this study are all publicly available. Links to the location of summary statistics, linkage disequilibrium scores, reference panel data and the code used to produce the current results can all be found at https://github.com/MichelNivard/GenomicSEM/wiki.

Code availability

GenomicSEM software is an R package that is available from GitHub at https://github.com/MichelNivard/GenomicSEM. The GenomicSEM R package can be installed directly at https://github.com/MichelNivard/GenomicSEM/wiki. Example GenomicSEM code, including code used to produce the results, is provided for each set of analyses at https://github.com/MichelNivard/GenomicSEM/wiki.

Acknowledgements

E.M.T.-D., K.P.H. and A.D.G. were supported by NIH grant R01HD083613. E.M.T.-D., S.J.R. and I.J.D. were supported by NIH grant R01AG054628. E.M.T.-D. and K.P.H. were each supported by Jacobs Foundation research fellowships. E.M.T.-D. and K.P.H. are members of the Population Research Center at the University of Texas at Austin, which is supported by NIH grant P2CHD042849. M.G.N. is supported by a Royal Netherlands Academy of Science Professor Award to D. I. Boomsma (PAH/6635), ZonMw grant: ‘Genetics as a research tool: a natural experiment to elucidate the causal effects of social mobility on health’ (pnr: 531003014) and ZonMw project: ‘Can sex- and gender-specific gene expression and epigenetics explain sex-differences in disease prevalence and etiology?’ (pnr: 849200011). H.F.I. is supported by the ‘Aggression in children: unraveling gene–environment interplay to inform treatment and intervention strategies’ (ACTION) project. ACTION receives funding from the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement number 602768. P.D.K. and R.d.V. were supported by ERC Consolidator Grant 647648 EdGe. I.J.D., A.M.M., S.J.R., R.E.M. and W.D.H. are members of the University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, which is part of the cross-council Lifelong Health and Wellbeing Initiative (MR/K026992/1). W.D.H. is supported by a grant from Age UK (Disconnected Mind Project). PGS analyses for the p factor were conducted under UKB dataset resource–application number 4844. PGS analyses for neuroticism were conducted using data from Generation Scotland. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006).
Genotyping of the Generation Scotland: Scottish Family Health Study samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland, and was funded by the Medical Research Council UK and Wellcome Trust (Wellcome Trust Strategic Award ‘STratifying Resilience and Depression Longitudinally’ (STRADL) reference 104036/Z/14/Z). Ethical approval for the Generation Scotland: Scottish Family Health Study was obtained from the Tayside Committee on Medical Research Ethics (on behalf of the National Health Service). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

A.D.G., M.R., H.F.I., M.G.N. and E.M.T.-D. developed the software. A.D.G., M.G.N. and E.M.T.-D. developed the theory underlying genomic SEM. A.D.G., M.R., R.d.V., M.G.N. and E.M.T.-D. developed the techniques and mathematical derivations. A.D.G., T.T.M., M.G.N. and E.M.T.-D. performed the simulation studies. S.J.R., R.E.M. and E.M.T.-D. performed the polygenic prediction analyses. A.D.G., M.G.N. and E.M.T.-D. wrote the manuscript. M.R., S.J.R., T.T.M., W.D.H., A.M.M., I.J.D., R.E.M., P.D.K. and K.P.H. provided feedback and edited the manuscript.

Correspondence to Andrew D. Grotzinger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Results, and Supplementary Figures 1–27.

Reporting Summary

Supplementary Dataset

Study raw data presented in Supplementary Tables 1–21.
