
Changing Polygenic Penetrance on Phenotypes in the 20th Century Among Adults in the US Population

  • Scientific Reports 6, Article number: 30348 (2016)
  • doi:10.1038/srep30348


This study evaluates changes in genetic penetrance (defined as the association between an additive polygenic score and its associated phenotype) across birth cohorts. Situating our analysis within recent historical trends in the U.S., we show that, while height and BMI show increasing genotypic penetrance over the course of the 20th century, education and heart disease show declining genotypic effects. Meanwhile, we find genotypic penetrance to be historically stable with respect to depression. Our findings help inform our understanding of how the genetic and environmental landscape of American society has changed over the past century, and have implications for research that models gene-environment (GxE) interactions, as well as for polygenic score calculations in consortia studies that include multiple birth cohorts.


This study evaluates changes in polygenic penetrance—defined as the association between a polygenic score (PGS) and its associated phenotype—across recent birth cohorts in the United States. The answer to this question informs our understanding of how the genetic and environmental landscape of American society has changed over the past century, and offers suggestive evidence for the selective influence of environment on genetic expression. Our findings also have important implications for PGS calculations in consortia studies that include multiple birth cohorts. This inquiry would not have been possible even a decade ago, before the development of PGS techniques to predict complex phenotypes1. The approach is not without its limitations; however, the scalar variables provided by PGS construction are unique in that they allow researchers to ask a number of questions that were not possible with latent heritability models. This allows for fresh opportunities to explore a range of issues, from polygenicity of traits to gene-environment (GxE) interactions.

In the present paper, we exploit this opportunity by asking whether the associations between PGS and several phenotypes have changed over the course of the 20th century in the U.S. Because the economic, social, and physical environments underwent dramatic changes during this period, it is likely that the association between a PGS and its related phenotype has also evolved as a consequence2. We examine five important phenotypes—height, body mass index (BMI), education, depression, and heart disease—chosen due to their key associations with health and mortality, the different age ranges at which they are salient3,4,5, and the fact that GWAS results (for all SNPs and not just top hits) are available for all five6,7,8,9,10. We find that while height and BMI show increasing PGS penetrance over the course of the 20th century birth cohorts, education and heart disease exhibit the opposite trend. In contrast, the association between depression and its underlying genetic architecture remained stable over the same period.

Additive heritability (for which PGS penetrance is a proxy), independently of how it is measured, is contingent on the social structure. Indeed, heritability is not a fixed parameter across time and place but is always a ‘local perturbation analysis’11. Supposing a phenotype to be the product of a complex process involving genetics, environment, and perhaps their interactions (that is, $y_i = f(G_i, E_i) + \varepsilon_i$), a complete analysis would require that we first know the partial derivatives of the unknown function $f(G, E)$. Absent a specified model of $f(G, E)$, many scholars, particularly in the social sciences, have attempted to inductively model gene-environment correlations (rGE) and interactions (GxE). Starting with the seminal paper in this area of scholarship12, most of these studies rely on endogenous measures of environment and/or fail to adequately control for population structure, thereby producing under-identified results that may reflect rGE, GxE, ExE or GxG13.
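To make the "local perturbation" point concrete, consider a hypothetical functional form with a linear GxE interaction. This form and its coefficients $\alpha, \beta, \gamma, \delta$ are assumptions for illustration only, not a model estimated in this paper:

```latex
y_i = \alpha + \beta G_i + \gamma E_i + \delta G_i E_i + \varepsilon_i ,
\qquad
\frac{\partial y_i}{\partial G_i} = \beta + \delta E_i .
```

Under this form the marginal effect of genotype depends on the environment whenever $\delta \neq 0$, so an estimate obtained in one environment need not carry over to another.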

A few exceptions to this trend include studies that deploy nationally-representative, genome-wide data with controls for principal components in order to address population stratification on the genetic side while econometrically exploiting natural experiments on the environmental side to ensure exogeneity of environment14. A promising avenue in this regard has been scholarship that takes advantage of data spanning a wide range of birth cohorts to assess how heritability may be changing over the shifting (if unmeasured) environment across decades. For instance, recent research has shown that a PGS for physiological predisposition to tobacco use has exhibited more robust correlations over time with phenotypic measures of smoking in the U.S. population15. Studies that employ sibling and twin comparisons and candidate gene studies show the same pattern of increasing genetic penetrance with respect to tobacco use among recent cohorts16,17. These results suggest that as the dangers of tobacco use were publicized in the latter half of the 20th century, the underlying genotype signifying a greater propensity to smoke exerted a more pronounced influence on behavior.

Other research shows a similar historical shift in genomic influence on physical characteristics, with increasing associations between genetic architecture and BMI in recent decades for US adults18,19. Likewise, twin-based models of the heritability of education appear to show an increasing effect of genotype over a similar time period20. We expand on this literature by focusing on a wider breadth of phenotypes and by employing polygenic scores based on millions of SNPs, rather than individual markers, to identify historical shifts in genetic expression.

Some have argued that these changes reflect the relative increase of genetic over social factors as determinants of complex behavioral traits like smoking, rather than a true increase in the causal association between genetic polymorphisms and phenotypes. This distinction is important because it emphasizes genetic penetrance rather than expression, per se. That is, the social and historical context can, at times, mask small genetic associations because the environment may be ‘pushing’ the phenotype, which limits our ability to observe penetrance16. The social environment can also serve as a trigger (or, alternatively, as a controlling influence): differential rates of expression (or methylation) in response to specific environmental signals denote a biological mechanism through which the environment causes genes to function in a particular manner21.


We used data from the Health and Retirement Study (HRS). Details about inclusion in the sample and selective attrition can be found in the Supplementary Information notes. Our data are from the 2012 wave of the HRS, and allowed us to observe the consistency of PGS-phenotype correlations across birth cohorts in the mid-20th century among U.S. adults. Respondents were born between 1919 and 1955 and, on average, went on to complete over 13 years of education. Nearly 40% of the respondents self-reported heart disease. Baseline associations between the five traits and their respective polygenic scores (Supplemental Table S2) are significant at conventional alpha levels. The polygenic score for BMI is the best predictor of its associated outcome, followed by education and height.

We interacted the PGS for each trait with birth year to predict the corresponding phenotypes in Fig. 1 (the model also included main effects for both birth year and PGS; see Equation 2 in Methods). We find that, while there is a tendency for those in later birth cohorts to accrue more education, the predictive power of genotype for education is declining over time. This finding is contrary to some twin-based evidence that the genetic penetrance for education has risen20; the discrepancy could be due to a number of dynamics, including the inherent differences between twin methods and the PGS approach, differences in the birth cohorts studied, or changing gender dynamics. (We discuss potential differences and explanations in depth in the SI on pages 10–11.) Similarly, declines in heart disease are matched by declines in the predictive power of the heart disease PGS. Meanwhile, the predictive power of the height and BMI polygenic scores has increased significantly, while that of the depression score appears flat. Our results showing an increased PGS penetrance of BMI in particular among more recent cohorts of Americans are broadly consistent with recent research based on a more limited polygenic score and other forms of genetic analysis18,19.
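The interaction model described in this paragraph can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the function names, the centering of birth year, and the simulated effect sizes are all assumptions, and the sketch returns point estimates only (no standard errors or p-values).

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def fit_interaction(pgs, birth_year, phenotype):
    """Regress phenotype on PGS, centered birth year, and their product.

    Returns [intercept, b_pgs, b_year, b_interaction]. Centering birth year
    is an assumption made here for numerical convenience.
    """
    mean_year = sum(birth_year) / len(birth_year)
    X = [[1.0, g, t - mean_year, g * (t - mean_year)]
         for g, t in zip(pgs, birth_year)]
    return ols(X, phenotype)

# Simulated data (NOT HRS data): the PGS slope grows by 0.01 per birth year.
rng = random.Random(0)
pgs = [rng.gauss(0, 1) for _ in range(2000)]
year = [rng.randint(1919, 1955) for _ in range(2000)]
y = [0.2 * g + 0.01 * g * (t - 1937) + rng.gauss(0, 1)
     for g, t in zip(pgs, year)]
b0, b_pgs, b_year, b_inter = fit_interaction(pgs, year, y)
```

A positive interaction coefficient corresponds to a PGS that becomes more predictive in later birth cohorts (as reported for height and BMI); a negative one corresponds to declining penetrance (as reported for education and heart disease).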

Figure 1: Predicted standardized values of selected phenotypes by polygenic score (+1 or −1 standard deviations), across birth cohorts among genotyped respondents in the Health and Retirement Study (N = 8,865).

Height (p < 0.05) and BMI (p < 0.001) polygenic scores become more predictive in later birth cohorts while education (p < 0.05) and heart disease (p < 0.05) PGSs become less predictive. Depression does not show a significant trend. The lines show fitted values for those at 1 SD above (gray) and below (black) the mean. Points are based on binned means for two groups of respondents (standardized value below 0, black; standardized value above 0, dark gray). For each group, the distribution of birth years is divided into 20 subgroups with approximately equal numbers. Plotted points are the mean birth year and response for these subgroups.
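The binning used for the plotted points can be sketched as below. This is a hedged illustration: `binned_means` is a hypothetical helper, and the way near-equal groups are cut (by sorted position) is an assumption, since the legend only states that each group is divided into 20 subgroups with approximately equal numbers.

```python
def binned_means(birth_year, response, n_bins=20):
    """Cut respondents, ordered by birth year, into n_bins near-equal groups.

    Returns one (mean birth year, mean response) point per non-empty group.
    """
    pairs = sorted(zip(birth_year, response), key=lambda p: p[0])
    n = len(pairs)
    points = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if chunk:
            mean_year = sum(p[0] for p in chunk) / len(chunk)
            mean_resp = sum(p[1] for p in chunk) / len(chunk)
            points.append((mean_year, mean_resp))
    return points

# Toy example: 40 respondents, response equal to twice the "birth year".
toy_points = binned_means(list(range(40)), [2 * x for x in range(40)])
```

In the figure, this step would be applied separately to the two groups of respondents (standardized value below 0 and above 0).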

One potential explanation for these trends in PGS penetrance is a change in the genetic variation in the population, which could result from differential fertility and/or genetic assortative mating22,23,24. To assess this possibility, we calculated the variance of each of the five PGSs across birth cohorts (reported in Supplemental Information Fig. S5, Panel B). For all the scores, variances are unchanged across birth cohorts, which supports the interpretation that changes in PGS predictive power reflect GxE effects that result from a shifting environmental landscape. That is, if the variance component for G is unchanged, any change in additive heritability or SNP-based PGS prediction is likely due to a shift in the variance component for the environmental portion. We also performed other sensitivity checks related to mortality and sample ascertainment (presented in detail in the Supplementary Information) and found that our results broadly reflect a changing influence of environmental conditions and do not appear to be driven by biases introduced by the data (see SI, pages 5–10). Likewise, our results become stronger when measurement error in each PGS is taken into account through SIMEX analysis (Table S3) and are robust to Huber-White adjustments for clustering by household (Table S4). That said, our power to detect the interaction term is limited for some phenotypes (particularly depression; see Table S2 and SI notes for discussion), so replication of our results will be important.
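The variance check described above can be sketched as follows. The grouping into fixed-width birth-year bands (here five years) is an assumption for illustration; the text does not state how cohorts were grouped, and `variance_by_cohort` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import variance

def variance_by_cohort(birth_year, pgs, width=5):
    """Sample variance of a PGS within birth-year bands of the given width.

    Bands are labeled by their first year; bands with fewer than two
    respondents are skipped because a sample variance is undefined there.
    """
    groups = defaultdict(list)
    for t, g in zip(birth_year, pgs):
        groups[(t // width) * width].append(g)
    return {start: variance(vals)
            for start, vals in sorted(groups.items()) if len(vals) > 1}

# Toy example with two bands (1920-1924 and 1925-1929).
toy = variance_by_cohort([1920, 1921, 1925, 1926], [1.0, 3.0, 2.0, 6.0])
```

Roughly constant values across bands would be consistent with the stable PGS variances reported in the text.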


The twentieth century witnessed massive shifts in the social and nutritional environment of the United States. The change from an agrarian society to an industrial and post-industrial one has well-documented effects on population health25 and is also associated with the expansion of schooling26, medical improvements27, increased longevity28, and caloric abundance29. Any or all of these changes may influence not only relationships between important phenotypes but between those phenotypes and their underlying genotypes as well. Under this multi-dimensionally shifting environmental regime, the genotypic effects of height and BMI PGSs evince trends of increasing predictive power, while the education PGS shows a declining association with years of schooling, perhaps due to policy and structural changes in society that have reduced variation in the phenotype (see Panel B of SI Fig. S4).

As nutritional deprivation receded as a restraining force on genetic expression, height and weight could more “accurately” reflect underlying genetic potential as measured by common SNPs. Meanwhile, educational “abundance” had the opposite effect: with the steady expansion of schooling, rather than constraints on the full extent of ability being lifted to reveal increasing genetic penetrance, we observe declining genetic prediction among more recent cohorts. During this time, secondary schooling became nearly universal and post-secondary education more common, yet the genetic signal weakened. Thus, in some cases (like height and BMI) environmental barriers can act to suppress genetic effects, while in others (such as education) such obstacles can act to accentuate genetic associations. This may be a useful dichotomous classification scheme to apply to cohort analyses of genetic influence on other phenotypes going forward.

Materials and Methods

Phenotypes were computed based on RAND Fat Files, version N (which covers data collection up until 2012). We examined:

  • Education: Total years of educational attainment.

  • BMI: Mean BMI over all available waves.

  • Height: Max height over all available waves.

  • Heart Disease: Whether a respondent ever reports heart problems (rXheart).

  • Depression: Mean CESD score over all available waves. This variable had a skewed distribution, so it was transformed via the logarithm (after adding one to everyone’s mean).

Sample descriptives are shown in Table S1.
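The phenotype definitions listed above can be sketched for a single respondent as follows. The field names (`bmi`, `height`, `cesd`, `heart`) are hypothetical stand-ins rather than RAND variable names, and the natural logarithm is an assumption, since the text says only "the logarithm". Education is a single reported value and is omitted here.

```python
import math

def build_phenotypes(waves):
    """Collapse per-wave records (dicts; missing values are None) into phenotypes."""
    def values(key):
        return [w[key] for w in waves if w.get(key) is not None]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    bmi, height = values("bmi"), values("height")
    cesd, heart = values("cesd"), values("heart")
    mean_cesd = mean(cesd)
    return {
        # Mean BMI over all available waves.
        "bmi": mean(bmi),
        # Maximum height over all available waves.
        "height": max(height) if height else None,
        # 1 if heart problems were ever reported, else 0.
        "heart_disease": (1 if any(heart) else 0) if heart else None,
        # log(1 + mean CESD) to reduce skew.
        "depression": math.log(mean_cesd + 1) if mean_cesd is not None else None,
    }

example = build_phenotypes([
    {"bmi": 24.0, "height": 1.70, "cesd": 1, "heart": 0},
    {"bmi": 26.0, "height": 1.72, "cesd": 3, "heart": 1},
    {"bmi": None, "height": 1.71, "cesd": None, "heart": 0},
])
```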


Polygenic scores (PGSs) were first suggested in 2007 as flexible tools for quantifying the genetic contribution to a phenotype30. Polygenic scores have several attractive features. First, unlike candidate genes, they are “hypothesis-free” measures, i.e. ex ante knowledge about the biological processes involved is not needed to estimate a score for a particular phenotype. Rather, a polygenic score casts a wide net across an individual’s entire genome to yield a single quantitative measure of genetic risk, or genetic risk score (GRS)31,32,33,34, allowing researchers to explore how genes operate within environments where the biological mechanisms are not yet fully understood35.

PGSs were constructed based on publicly available data from recent GWAS (additional details on the genetic data and the construction of polygenic scores are available in the SI)6,7,8,9,10. The same approach was applied to each set of GWAS results. Briefly, SNPs in the HRS genetic database were matched to SNPs with reported results in a GWAS. Since the risk allele is not always readily identifiable, we removed all ambiguous SNPs. For each of the remaining SNPs, a loading was calculated as the number of phenotypically associated reference alleles multiplied by the effect size estimated in the original GWAS, as shown in Equation 1. Thus, the polygenic score for individual $i$ ($PGS_i$) is a weighted sum across the $n$ SNPs of the number of reference alleles $x_{ij}$ (0, 1 or 2) at SNP $j$ multiplied by the score for that SNP ($\beta_j$):

$$PGS_i = \sum_{j=1}^{n} \beta_j x_{ij} \qquad (1)$$

SNPs with relatively large p-values tend to have small estimated effects (and are thus down-weighted in the composite), so we do not impose a p-value threshold. Loadings were summed across the SNP set to calculate the polygenic score. The score was then standardized to have a mean of 0 and an SD of 1 for ease of interpretation (though analysis of raw scores does not change results). Genetic analyses were done using the second-generation PLINK software36. Finally, scores were residualized on the top 10 principal components computed from the non-Hispanic whites in HRS to ensure that none of the reported results are due to population stratification (though results without residualization on PCs do not change; see Fig. S3 of SI). To examine changes in PGS penetrance, we estimated Equation 2:

$$y_i = b_0 + b_1 PGS_i + b_2 \mathrm{BirthYear}_i + b_3 (PGS_i \times \mathrm{BirthYear}_i) + \varepsilon_i \qquad (2)$$

where $y_i$ is the phenotype of respondent $i$ and $b_3$, the coefficient on the interaction term, captures the change in PGS penetrance across birth cohorts.

Huber-White correction for the non-independence of spousal pairs does not change results (see Supplementary Information Table S4).
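The scoring steps described in this section (matching SNPs, dropping ambiguous ones, summing effect-weighted allele counts, and standardizing) can be sketched as below. This is an illustration under stated assumptions, not the authors' PLINK pipeline: "ambiguous" is interpreted here as strand-ambiguous A/T and C/G SNPs, the data structures and function names are hypothetical, and residualization on principal components is omitted.

```python
from statistics import mean, pstdev

# Assumption: "ambiguous" means strand-ambiguous allele pairs (A/T, C/G).
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}

def is_ambiguous(allele_1, allele_2):
    """True when the two alleles are complementary bases."""
    return frozenset((allele_1, allele_2)) in AMBIGUOUS

def raw_score(dosages, gwas):
    """Sum of effect size times allele count over usable SNPs.

    dosages: {snp_id: count of the reference allele (0, 1 or 2)}
    gwas:    {snp_id: (ref_allele, other_allele, effect_size)}
    SNPs missing from the GWAS results or flagged ambiguous are skipped.
    """
    total = 0.0
    for snp, count in dosages.items():
        if snp not in gwas:
            continue
        ref, other, beta = gwas[snp]
        if is_ambiguous(ref, other):
            continue
        total += beta * count
    return total

def standardize(scores):
    """Rescale scores to mean 0 and (population) SD 1."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

# Toy example: rs2 is A/T (dropped); rs9 has no GWAS result (dropped).
toy_gwas = {"rs1": ("A", "G", 0.5), "rs2": ("A", "T", 1.0), "rs3": ("C", "T", -0.2)}
toy_person = {"rs1": 2, "rs2": 2, "rs3": 1, "rs9": 2}
toy_score = raw_score(toy_person, toy_gwas)
```

No p-value threshold is applied in the sketch, mirroring the choice described in the text.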

Additional Information

How to cite this article: Conley, D. et al. Changing Polygenic Penetrance on Phenotypes in the 20th Century Among Adults in the US Population. Sci. Rep. 6, 30348; doi: 10.1038/srep30348 (2016).


  1. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).

  2. Societal development and the shifting influence of the genome on status attainment. Res. Soc. Stratif. Mobil. 26, 235–255 (2008).

  3. Increased educational attainment and its effect on child mortality in 175 countries between 1970 and 2009: a systematic analysis. Lancet 376, 959–974 (2010).

  4. Trends in cardiovascular health metrics and associations with all-cause and CVD mortality among US adults. JAMA 307, 1273–1283 (2012).

  5. Associations of mortality rates with height using son’s height as an instrumental variable. J. Epidemiol. Community Health 65, A26 (2011).

  6. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

  7. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).

  8. A mega-analysis of genome-wide association studies for major depressive disorder. Mol. Psychiatry 18, 497–511 (2013).

  9. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).

  10. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).

  11. The heritability hang-up. Science 190, 1163–1168 (1975).

  12. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301, 386–389 (2003).

  13. The challenge of causal inference in gene-environment interaction research: leveraging research designs from the social sciences. Am. J. Public Health 103, S42–S45 (2013).

  14. The long-term consequences of Vietnam-era conscription and genotype on smoking behavior and health. Beh. Genet. 46, 43–58 (2016).

  15. Cohort effects in the genetic influence on smoking. Beh. Genet. 46, 31–42 (2016).

  16. Population composition, public policy, and the genetics of smoking. Demography 48, 1517–1533 (2011).

  17. Why have tobacco control policies stalled? Using genetic moderation to examine policy impacts. PLoS One 7, e50576 (2012).

  18. The genome-wide influence on human BMI depends on physical activity, life course, and historical period. Demography 52, 1651–1670 (2015).

  19. Lifetime socioeconomic status, historical context, and genetic inheritance in shaping body mass in middle and late adulthood. Am. Soc. Rev. 80, 705–737 (2015).

  20. Variation in the heritability of educational attainment: An international meta-analysis. Soc. Forces 92, 109–140 (2013).

  21. Epigenome-wide association studies for common human diseases. Nat. Rev. Genet. 12, 529–541 (2011).

  22. Genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns. Proc. Natl. Acad. Sci. USA 112, 13621–13626 (2015).

  23. Genomic assortative mating in marriages in the United States. PLoS One 9, e112322 (2014).

  24. Genetic and educational assortative mating among US adults. Proc. Natl. Acad. Sci. USA 111, 7996–8000 (2014).

  25. The great agricultural transition: crisis, change, and social consequences of twentieth century US farming. Ann. Rev. Soc. 27, 103–124 (2001).

  26. The worldwide expansion of higher education in the twentieth century. Am. Soc. Rev. 70, 898–920 (2005).

  27. The role of public health improvements in health advances: The twentieth-century United States. Demography 42, 1–22 (2005).

  28. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc. Natl. Acad. Sci. USA 112, 15078–15083 (2015).

  29. The determinants of mortality. J. Econ. Perspect. 20, 97–120 (2006).

  30. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 17, 1520–1528 (2007).

  31. Polygenic risk, rapid childhood growth, and the development of obesity: Evidence from a 4-decade longitudinal study. JAMA Pediatr. 166, 515–521 (2012).

  32. Development and evaluation of a genetic risk score for obesity. Biodemography Soc. Biol. 59, 85–100 (2013).

  33. Polygenic risk and the developmental progression to heavy, persistent smoking and nicotine dependence: Evidence from a 4-decade longitudinal study. JAMA Psychiatry 70, 534–542 (2013).

  34. Polygenic influence on educational attainment. AERA Open 1, 1–13 (2015).

  35. Integrating genetics and social science: Genetic risk scores. Biodemography Soc. Biol. 60, 137–155 (2014).

  36. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).



This work was supported by the Russell Sage Foundation (grant: “GxE and Health Inequality over the Life Course”). This research uses data from the HRS, which is sponsored by the National Institute on Aging (Grants NIA U01AG009740, RC2AG036495, and RC4AG039029) and conducted by the University of Michigan. Research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health (NIH) under Award R21HD078031. The authors also acknowledge co-funding from the NICHD and the Office of Behavioral and Social Sciences Research (1R21HD071884). Further support was provided by the NIH/NICHD-funded University of Colorado Population Center (R24HD066613).

Author information


  1. Department of Sociology, Princeton University, Princeton, NJ 08644, USA

    • Dalton Conley
  2. Department of Sociology, New York University, New York, NY 10012, USA

    • Thomas M. Laidley
  3. Institute of Behavioral Science and Department of Sociology, University of Colorado, Boulder, CO 80309, USA

    • Jason D. Boardman
  4. Graduate School of Education, Stanford University, Stanford, CA 94305, USA

    • Benjamin W. Domingue




D.C., B.D. and J.D.B. designed and conceived the research. D.C. and B.D. analyzed data. D.C., B.D. and T.M.L. wrote the paper. All authors discussed and reviewed the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Dalton Conley or Benjamin W. Domingue.


This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit