We analysed a large health insurance dataset to assess the genetic and environmental contributions to 560 disease-related phenotypes in 56,396 twin pairs and 724,513 sibling pairs among 44,859,462 individuals living in the United States. We estimated the contribution of environmental risk factors (socioeconomic status (SES), air pollution and climate) to each phenotype. Mean heritability (h2 = 0.311) and shared environmental variance (c2 = 0.088) were higher than the variance attributed to specific environmental factors such as zip-code-level SES (varSES = 0.002), daily air quality (varAQI = 0.0004), and average temperature (vartemp = 0.001), both overall and for individual phenotypes. We found significant heritability and shared environment for a number of comorbidities (h2 = 0.433, c2 = 0.241) and average monthly cost (h2 = 0.290, c2 = 0.302). All results are available through our Claims Analysis of Twin Correlation and Heritability (CaTCH) web application.
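As a rough illustration of how twin correlations can be turned into the kind of h2 and c2 values reported above, the classical Falconer decomposition is sketched below. This is a textbook approximation, not necessarily the estimation procedure used in the study, which is not described in this summary and may differ. The function name `falconer_ace` and the example correlations are hypothetical and chosen only for illustration.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Classical Falconer decomposition of trait variance from twin correlations.

    r_mz: trait correlation between monozygotic twins
    r_dz: trait correlation between dizygotic twins
    Assumes the standard ACE model: additive genetic (h2), shared
    environment (c2) and unique environment (e2) components sum to 1.
    """
    h2 = 2.0 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c2 = 2.0 * r_dz - r_mz    # shared environmental variance
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


if __name__ == "__main__":
    # Hypothetical correlations, not taken from the study.
    estimates = falconer_ace(r_mz=0.40, r_dz=0.25)
    for name, value in estimates.items():
        print(f"{name} = {value:.3f}")
```

The three components sum to one by construction, so a negative c2 or an h2 above 1 signals that the simple ACE assumptions do not hold for the correlations supplied.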
The data that support the findings of this study are available from Aetna Insurance, but restrictions apply to the availability of these data, which were used under licence for the current study, and so are not publicly available. Please contact N. Palmer (email@example.com) for inquiries about the Aetna dataset. Summary data are, however, available from the authors upon reasonable request and with permission of Aetna Insurance. Code for analysis, generation of figures and figure files is available at https://github.com/cmlakhan/twinInsurance.
We thank K. Fox of Aetna, Inc., N. Palmer of Harvard Medical School, and I. Kohane of Harvard Medical School for support and providing access to the Aetna Insurance Claims Data. We are grateful to L. O’Connor and A. Price for helpful discussion. This research was supported by the Australian National Health and Medical Research Council (1078037 and 1113400), National Institutes of Health NIEHS (R00ES23504 and R21ES205052), the National Science Foundation (1636870), and the Sylvia & Charles Viertel Charitable Foundation.
The authors declare no competing interests.
Cite this article
Lakhani, C.M., Tierney, B.T., Manrai, A.K. et al. Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes. Nat Genet 51, 327–334 (2019). https://doi.org/10.1038/s41588-018-0313-7