Abstract
In this study, we used insurance claims for over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals). We then used these data to (i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases. The majority (52 of 65) of our study's heritability estimates matched earlier reports, and 84 of our estimates appear to have been obtained for the first time. We used correlation matrices to compute environmental and genetic disease classifications and corresponding reliability measures. Among unexpected observations, we found that migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases.
Acknowledgements
We thank E. Gannon, R. Melamed, R. Mork, and M. Rzhetsky for numerous comments on earlier versions of the manuscript. This work was funded by the DARPA Big Mechanism program under ARO contract W911NF1410333, by National Institutes of Health grants R01HL122712, 1P50MH094267, and U01HL108634-01, and by a gift from Liz and Kent Dauten.
Author information
Contributions
All authors contributed extensively to the work presented in this paper. K.W. and A.R. designed experiments, analyzed data, and wrote the manuscript; K.W., H.G., and H.P. performed computational experiments; and N.J.C., H.G., and H.P. contributed to iterative improvement of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Environmental effects estimates.
(a) Common couple environment effects. (b) Common sibling environment effects. (c) Unique environment effects. Bar color in the bar plots indicates biological systems associated with each disease, consistent throughout all figures.
Supplementary Figure 2 Testing dependence of heritability estimates on age of onset; heritability distributions, sorted by biological system.
(a) Histograms and density plots of heritability estimates by biological system. (b) Heritability estimate versus disease age of onset for biological systems with more than three diseases, with linear fits indicated by solid lines.
Supplementary Figure 4 Classification trees: ICD-9 versus phenotypic correlations.
(a) A classification of diseases that corresponds to a subset of ICD-9 taxonomy. (b) Disease classification constructed from phenotypic correlations between diseases; distances between diseases were calculated as 1 – correlation.
Supplementary Figure 5 Neighbor-joining classifications showing the nosologies of the 29 conditions, inferred from genetic correlations (left tree) and environmental correlations (right tree).
For both classifications, we defined the distance between diseases as 1 – correlation. Because we estimated a posterior distribution for each correlation estimate, we were able to sample 10,000 distance sets using posterior distributions for pairwise correlations. For each of these samples, we estimated a classification and computed reliability measures for individual classification topology partitions (each integer number on the tree indicates the percentage of trees out of 10,000 in which this particular partition was present). Disease labels are colored according to associated biological systems, consistent with other figures. Note that, while the genetic and environmental trees are significantly different, both are stable, as the bootstrap-like numbers indicate.
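The procedure in this caption (distance = 1 – correlation, repeated sampling of distance sets, and counting how often each partition recurs) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the normal approximation to each posterior, and the toy five-disease correlation matrix are all assumptions made for illustration; the study sampled from the estimated posterior distributions themselves.

```python
import numpy as np

def nj_splits(dist):
    """Non-trivial bipartitions of the neighbor-joining tree built from `dist`.

    Each split is returned as the frozenset of leaves on the side that does
    not contain leaf 0, so identical splits compare equal across trees.
    """
    n = dist.shape[0]
    d = np.array(dist, dtype=float)
    clusters = [frozenset([i]) for i in range(n)]
    all_leaves = frozenset(range(n))
    splits = set()
    while len(clusters) > 3:
        r = len(clusters)
        totals = d.sum(axis=1)
        # Standard neighbor-joining Q criterion; the minimum picks the pair to join.
        q = (r - 2) * d - totals[:, None] - totals[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        merged = clusters[i] | clusters[j]
        # Distances from the new internal node to every current node.
        new_row = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(r) if k not in (i, j)]
        d_next = np.zeros((r - 1, r - 1))
        d_next[:-1, :-1] = d[np.ix_(keep, keep)]
        d_next[-1, :-1] = new_row[keep]
        d_next[:-1, -1] = new_row[keep]
        d = d_next
        clusters = [clusters[k] for k in keep] + [merged]
        splits.add(merged if 0 not in merged else all_leaves - merged)
    return splits

def split_support(mean_corr, sd_corr, n_samples=10_000, seed=0):
    """Percentage of sampled trees that contain each partition.

    ASSUMPTION: each pairwise correlation is drawn independently from a
    normal distribution (mean, sd) and clipped to [-1, 1]. This stands in
    for sampling from the real posterior of each correlation.
    """
    rng = np.random.default_rng(seed)
    n = mean_corr.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    counts = {}
    for _ in range(n_samples):
        draws = np.clip(rng.normal(mean_corr[rows, cols], sd_corr[rows, cols]), -1.0, 1.0)
        corr = np.eye(n)
        corr[rows, cols] = draws
        corr[cols, rows] = draws
        for split in nj_splits(1.0 - corr):  # distance = 1 - correlation
            counts[split] = counts.get(split, 0) + 1
    return {split: 100.0 * c / n_samples for split, c in counts.items()}

# Hypothetical toy input: diseases 0 and 1 are strongly correlated, as are 3 and 4.
mean_corr = np.full((5, 5), 0.1)
mean_corr[0, 1] = mean_corr[1, 0] = 0.9
mean_corr[3, 4] = mean_corr[4, 3] = 0.9
np.fill_diagonal(mean_corr, 1.0)
sd_corr = np.full((5, 5), 0.01)

support = split_support(mean_corr, sd_corr, n_samples=200)
for split, pct in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(split), f"{pct:.0f}%")
```

The percentages returned by `split_support` play the role of the integers printed on the trees: a partition that survives in nearly every sampled tree is well supported by the posterior, while one that appears rarely depends on correlations that are poorly determined.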
Supplementary Figure 6 Estimates of age-related increase in disease liability for seven late-onset conditions (aneurysm, atherosclerosis, benign colon neoplasm, cataract, cerebrovascular disease, keratosis, and osteoarthritis).
Error bars show 1 s.d., and LOcally WEighted Scatter-plot Smoother (LOWESS) curve fits are shown with solid lines.
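For readers unfamiliar with LOWESS, the following is a simplified sketch of the smoother: a tricube-weighted local linear fit at each point, without the robustness iterations that full implementations usually add. The function name `lowess_sketch`, the `frac` default, and the example data are hypothetical and are not taken from the study.

```python
import numpy as np

def lowess_sketch(x, y, frac=0.5):
    """Simplified LOWESS: tricube-weighted local linear regression.

    `frac` is the fraction of points used in each local fit. Robustness
    iterations (down-weighting outliers) are omitted in this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))  # neighbors per local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero).
        h = max(np.sort(dist)[min(k, n) - 1], np.finfo(float).tiny)
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        # Weighted least squares for a line centered at x[i];
        # the intercept is then the smoothed value at x[i].
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

# Hypothetical example: noisy liability values that rise with age.
rng = np.random.default_rng(1)
age = np.linspace(20, 80, 30)
liability = 0.02 * (age - 20) + rng.normal(0.0, 0.1, size=age.size)
smooth = lowess_sketch(age, liability, frac=0.5)
```

Because each local fit is linear, the smoother reproduces exactly linear data without distortion and bends only where the data do.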
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–6 and Supplementary Tables 3 and 5–8 (PDF 1883 kb)
Supplementary Table 1
Acronyms, biological systems, prevalence percentages and standard errors for 149 studied diseases. (XLSX 74 kb)
Supplementary Table 2
Heritability and preventability estimates and standard deviations for 149 studied diseases. (XLSX 83 kb)
Supplementary Table 4
Pairwise estimates and standard deviations of genetic, environmental and phenotypic correlations for 29 diseases. (XLSX 64 kb)
About this article
Cite this article
Wang, K., Gaitsch, H., Poon, H. et al. Classification of common human diseases derived from shared genetic and environmental determinants. Nat Genet 49, 1319–1325 (2017). https://doi.org/10.1038/ng.3931