In this study, we used insurance claims for over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals). We then used these data to (i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases. The majority (52 of 65) of our study's heritability estimates matched earlier reports, and 84 of our estimates appear to have been obtained for the first time. We used correlation matrices to compute environmental and genetic disease classifications and corresponding reliability measures. Among unexpected observations, we found that migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases.
We thank E. Gannon, R. Melamed, R. Mork, and M. Rzhetsky for numerous comments on earlier versions of the manuscript. This work was funded by the DARPA Big Mechanism program under ARO contract W911NF1410333, by National Institutes of Health grants R01HL122712, 1P50MH094267, and U01HL108634-01, and by a gift from Liz and Kent Dauten.
The authors declare no competing financial interests.
Integrated supplementary information
(a) Common couple environment effects. (b) Common sibling environment effects. (c) Unique environment effects. Bar color in the bar plots indicates biological systems associated with each disease, consistent throughout all figures.
Supplementary Figure 2 Testing dependence of heritability estimates on age of onset; heritability distributions, sorted by biological system.
(a) Histograms and density plots of heritability estimates by biological system. (b) Heritability estimate versus disease age of onset for biological systems with more than three diseases, with linear fits indicated by solid lines.
Supplementary Figure 3 Positive correlations between phenotypic and genetic correlations and between phenotypic and environmental correlations.
(a) A classification of diseases that corresponds to a subset of ICD-9 taxonomy. (b) Disease classification constructed from phenotypic correlations between diseases; distances between diseases were calculated as 1 – correlation.
Supplementary Figure 5 Neighbor-joining classifications showing the 29 conditions’ nosologies inferred from genetic and environmental correlations presented on the left and the right trees, respectively.
For both classifications, we defined the distance between diseases as 1 – correlation. Because we estimated a posterior distribution for each correlation estimate, we were able to sample 10,000 distance sets using posterior distributions for pairwise correlations. For each of these samples, we estimated a classification and computed reliability measures for individual classification topology partitions (each integer number on the tree indicates the percentage of trees out of 10,000 in which this particular partition was present). Disease labels are colored according to associated biological systems, consistent with other figures. Note that, while the genetic and environmental trees are significantly different, both are stable, as the bootstrap-like numbers indicate.
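The resampling procedure in this caption can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes each pairwise correlation posterior can be approximated by a normal distribution with a given mean and standard deviation (the study sampled from the estimated posteriors themselves). The function names `nj_splits` and `split_support` and the toy five-disease labels are hypothetical.

```python
import random
from collections import Counter


def nj_splits(labels, D):
    """Non-trivial bipartitions of the neighbor-joining tree for distance matrix D.

    Each split is reported as the frozenset of labels on the side that does
    NOT contain labels[0], so identical splits compare equal across trees.
    """
    n = len(labels)
    everything = frozenset(labels)
    clusters = {i: frozenset([labels[i]]) for i in range(n)}
    d = {(i, j): D[i][j] for i in range(n) for j in range(n) if i != j}
    splits = set()
    next_id = n
    while len(clusters) > 3:
        ids = sorted(clusters)
        m = len(ids)
        r = {i: sum(d[i, k] for k in ids if k != i) for i in ids}
        # Saitou-Nei criterion: join the pair that minimises Q(i, j).
        _, i, j = min(
            ((m - 2) * d[i, j] - r[i] - r[j], i, j)
            for a, i in enumerate(ids) for j in ids[a + 1:]
        )
        # Distances from the new internal node to every remaining node.
        for k in ids:
            if k not in (i, j):
                d[next_id, k] = d[k, next_id] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        merged = clusters.pop(i) | clusters.pop(j)
        clusters[next_id] = merged
        splits.add(everything - merged if labels[0] in merged else merged)
        next_id += 1
    return splits


def split_support(labels, mean_corr, sd_corr, n_samples=1000, seed=0):
    """Percentage of sampled trees that contain each bipartition.

    ASSUMPTION: posteriors are approximated as independent normals.
    """
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_samples):
        D = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                c = rng.gauss(mean_corr[i][j], sd_corr[i][j])
                c = max(-1.0, min(1.0, c))      # keep a valid correlation
                D[i][j] = D[j][i] = 1.0 - c     # distance = 1 - correlation
        counts.update(nj_splits(labels, D))
    return {s: 100.0 * k / n_samples for s, k in counts.items()}


if __name__ == "__main__":
    # Toy values only; not taken from the study.
    labels = ["A", "B", "C", "D", "E"]
    mean = [[1.0, 0.8, 0.5, 0.2, 0.2],
            [0.8, 1.0, 0.5, 0.2, 0.2],
            [0.5, 0.5, 1.0, 0.5, 0.5],
            [0.2, 0.2, 0.5, 1.0, 0.8],
            [0.2, 0.2, 0.5, 0.8, 1.0]]
    sd = [[0.05] * 5 for _ in range(5)]
    for split, pct in split_support(labels, mean, sd).items():
        print(sorted(split), pct)
```

Each percentage plays the role of the integer shown on a tree partition: the share of sampled trees in which that grouping of diseases appears.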
Supplementary Figure 6 Estimates of age-related increase in disease liability for seven late-onset conditions (aneurysm, atherosclerosis, benign colon neoplasm, cataract, cerebrovascular disease, keratosis, and osteoarthritis).
Error bars show 1 s.d., and LOcally WEighted Scatter-plot Smoother (LOWESS) curve fits are shown with solid lines.
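For readers unfamiliar with LOWESS, the sketch below shows the core idea: a weighted straight-line fit around each point, using tricube weights over the nearest neighbours. It is a simplified illustration that omits the robustness iterations of the full method. The function name `lowess`, the `frac` parameter, and the toy age and liability values are hypothetical and not taken from the study.

```python
def lowess(x, y, frac=0.5):
    """Locally weighted linear fit evaluated at each x[i] (no robustness passes)."""
    n = len(x)
    k = max(2, int(round(frac * n)))            # neighbours used per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights fall smoothly to zero at the bandwidth.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


if __name__ == "__main__":
    ages = [40, 45, 50, 55, 60, 65, 70, 75, 80]                       # toy values
    liability = [0.02, 0.03, 0.05, 0.06, 0.10, 0.13, 0.19, 0.24, 0.33]
    print([round(v, 3) for v in lowess(ages, liability, frac=0.6)])
```

A larger `frac` gives a smoother curve because each local fit draws on more neighbouring points.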
Supplementary Figures 1–6 and Supplementary Tables 3 and 5–8 (PDF 1883 kb)
Acronyms, biological systems, prevalence percentages and standard errors for 149 studied diseases. (XLSX 74 kb)
Heritability and preventability estimates and standard deviations for 149 studied diseases. (XLSX 83 kb)
Pairwise estimates and standard deviations of genetic, environmental and phenotypic correlations for 29 diseases. (XLSX 64 kb)
About this article
Cite this article
Wang, K., Gaitsch, H., Poon, H. et al. Classification of common human diseases derived from shared genetic and environmental determinants. Nat Genet 49, 1319–1325 (2017). https://doi.org/10.1038/ng.3931