Estimates of shared environmental influence on educational attainment (EA) using the Classical Twin Design (CTD) have been enlisted as genetically sensitive measures of unequal opportunity. However, key assumptions of the CTD appear violated for EA. In this study we compared CTD estimates of shared environmental influence on EA with estimates from a Nuclear Twin and Family Design (NTFD) in the same 982 German families. Our CTD model estimated shared environmental influence at 43%. After accounting for assortative mating, our best-fitting NTFD model estimated shared environmental influence at 26%, disaggregating this into twin-specific shared environments (16%) and environmental influences shared by all siblings (10%). Only the sibling shared environment captures environmental influences that reliably differ between families, suggesting the CTD substantially overestimates between-family differences in educational opportunity. Moreover, parental education was found to have no environmental effect on offspring education once genetic influences were accounted for.
Educational attainment (i.e., ultimate years of education completed) is a key variable in the behavioural sciences because of its effectiveness in predicting a wide variety of important life outcomes. Despite being a measure that can be calculated from a single questionnaire item (e.g., “what is the highest qualification you’ve obtained?”) educational attainment (EA) is one of the best predictors of occupational status and income1, longevity and health outcomes2, and the risk of receiving a criminal conviction3. The qualities needed to advance through the modern secondary and tertiary education system appear to be useful for navigating a wide variety of challenges that life throws at individuals in advanced industrial economies.
One of the most established findings in the social sciences is that EA tends to run in families, a result which has widely been interpreted as evidence of persistent inequality in environmental opportunity and the “social reproduction” of socioeconomic advantages4,5,6,7,8. However, as noted by Jencks and Tach “the size of the correlation between the economic status of parents and their children is not a good indicator of how close a society has come to equalising opportunity… In particular, we must separate the contributions of genes” (p.2-3)9. From the 1970s, twin studies began to show evidence that the variation in EA had a substantial genetic component10,11. Two studies published in the last decade have sought to summarise the results of the international twin literature that has accumulated since then: a meta-analysis by Branigan et al.12 and a mega-analysis by Silventoinen et al.13 (see Supplementary Note 1). Both studies converged on similar results, estimating mean heritability at 40%–43% and mean shared environmental influence at 31%–36%. These heritability estimates are low relative to other highly correlated cognitive outcomes such as adult general cognitive ability (60%–80%)14,15,16 or adolescent school grades (~60% at age 16)17. However, the estimates of shared environmental influence are especially conspicuous, being among the highest for any behavioural trait investigated in adults.
That such high estimates have been reported for a socioeconomic outcome that bears on many important life chances has compelled some researchers to draw far-reaching conclusions about what this says about equality of opportunity in contemporary society. For example, after reporting high shared environmental estimates in their U.S. sample, Nielsen and Roos18 argued this “indicates a high level of inequality of opportunity for educational attainment in American Society at the turn of the twenty-first century” (p.535). However, a review paper by Freese and Jao19 cautioned against prematurely leaping to moralised conclusions about high estimates of shared environmental influence for EA when these might have innocuous explanations.
One possibility is that these are methodological artefacts. The mean international estimates of genetic and environmental influence on EA described above were calculated using variations on the Classical Twin Design (CTD). In CTD studies the variance in the target outcome is typically partitioned into additive genetic influence (A), shared environmental influence (C), and nonshared environmental influence (E) by comparing the resemblance of monozygotic (MZ) twins reared together with the resemblance of dizygotic (DZ) twins reared together. But just as estimates of the family environment’s influence on EA are confounded by unmodelled genetic influences in studies using parent-child or non-twin sibling correlations12, ACE estimates in the CTD are confounded by other unmodelled parameters that can potentially bias them up or down or affect their interpretation20,21,22. Two unmodelled parameters of particular interest in the present study are assortative mating and twin-specific shared environments.
One of the potential explanations for high C estimates of EA suggested by Freese & Jao (2017) was the presence of unmodelled assortative mating19. The CTD ACE model assumes random mating between spouses, attributing any additional resemblance shared by MZ twins relative to DZ twins to the additional 50% of their genes they are assumed to share [following Falconer’s formula A = 2(rMZ-rDZ)]23. Any residual resemblance between MZ twins after accounting for genetic influences is attributed to the shared environment (i.e. C = rMZ-A)23. However, under conditions of positive phenotypic assortment where spouses actively match on a heritable trait, this will induce a genetic correlation between spouses for that trait which also leads to higher genetic resemblance between their DZ twin offspring than the 50% kinship coefficient assumed under the CTD. This will cause the CTD to underestimate heritability and overestimate shared environmental influence.
EA exhibits some of the highest spousal correlations for any trait investigated, averaging r = 0.53 (ref. 24). However, phenotypic assortment is not the only possible explanation for this resemblance. Alternative explanations that do not imply increased genetic correlations between DZ twins are spousal convergence, in which partners become more similar over time due to their environmental influence upon each other; and social homogamy, in which the community from which individuals draw their partners resembles them for purely environmental reasons25. Nonetheless, a large Australian study found spousal convergence played a negligible role in partner similarity for EA26, and recent molecular genetic studies have found strong evidence for phenotypic assortment on EA and associated traits27,28,29,30,31,32. A recent Norwegian study estimated the genetic correlation between spouses for EA at 0.37 and the genetic correlation between siblings at 0.67 (ref. 31), a value much larger than the expected correlation of 0.5. This suggests that CTD studies of EA have been doubling the difference between MZ and DZ twin correlations to estimate heritability when tripling the difference might be more appropriate. Martin33 developed a method to correct CTD ACE estimates for bias due to phenotypic assortment when data on spousal correlations for parents are available. The authors of the Silventoinen et al.13 mega-analysis of 193,518 twins applied this adjustment to a subsample of 23,705 families with parent data (cross-parental correlations of 0.57). When they did so, the C estimate was driven to zero and all the C variance was re-allocated to the A estimate. The unadjusted ACE estimates for this subsample were not published in the paper but were almost identical to those for the full sample (A = 43%, C = 30%, and E = 27% vs. A = 43%, C = 31%, and E = 26%; private correspondence with the authors).
To the extent that the spousal correlations for the wider sample are similar and phenotypic assortment explains that correlation, this potentially implies the mean C estimate in the main results for the mega-analysis should be entirely re-allocated to the A estimate, i.e.: A = 74%, E = 26%.
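The "doubling versus tripling" remark above can be illustrated with a minimal sketch. It assumes the simplest possible expectations, rMZ = A + C and rDZ = g·A + C, where g is the additive genetic correlation between DZ twins; the function name and this simplified parameterisation are illustrative assumptions and this is not the correction developed by Martin33.

```python
def falconer_multiplier(r_g_dz: float) -> float:
    """Factor applied to (rMZ - rDZ) to recover A.

    Assumes rMZ = A + C and rDZ = r_g_dz * A + C, so that
    rMZ - rDZ = (1 - r_g_dz) * A. Illustrative simplification only.
    """
    if not 0.0 <= r_g_dz < 1.0:
        raise ValueError("r_g_dz must lie in [0, 1)")
    return 1.0 / (1.0 - r_g_dz)


# Random mating: DZ twins share half of their additive genetic variance.
print(round(falconer_multiplier(0.50), 2))  # 2.0
# Sibling genetic correlation reported in the Norwegian study cited above.
print(round(falconer_multiplier(0.67), 2))  # 3.03
```

Under these assumptions a sibling genetic correlation of 0.67 implies a multiplier of roughly three rather than two, which is the sense in which the conventional formula would understate heritability.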
Of the 34 subgroups included in the Branigan et al.12 meta-analysis, 13 were from studies that published spousal correlations for either the twins or their parents; however, the potential bias assortative mating introduced to ACE estimates in these studies was not explored. In Table 1 we recalculated the ACE estimates for each of these subgroups and adjusted them for assortative mating. We then replicated the fixed effects meta-analysis performed by Branigan et al.12 for this subsample, obtaining grand mean estimates for both the adjusted and the unadjusted ACE estimates (full details of this analysis are provided in Supplementary Note 2 and Supplementary Tables 1 and 8). The difference between our grand mean estimates in the adjusted vs. the unadjusted sample suggests that, on average, A is biased downwards and C biased upwards by 16–17 percentage points in these CTD studies. Our grand mean ACE estimates for the unadjusted subsample are very similar to the headline results from the full sample in Branigan et al. (2013), suggesting the headline estimates may be biased to a similar extent (A = 38%, C = 39%, E = 22% in the subsample vs. A = 40%, C = 36%, E = 25% in the full sample).
Twin-specific shared environmental influence is another unmodelled parameter in CTD studies that has important implications for how CTD estimates of shared environmental influence are interpreted. In this study, twin-specific shared environments refers to environmental influences held in common by twins which are experienced as nonshared environmental influences by siblings growing up at different times. These will include the effects of, e.g.: birth order relative to other siblings; “birthday effects” of being born earlier or later in the year; and the cohort effects of being born in a particular political, economic, or cultural epoch. Kendler et al. (2019)34 invoked twin-specific shared environments alongside assortative mating as a potential explanation for why CTD estimates of C were 11–12 percentage points higher for EA than estimates from half- or step-sibling study designs using the same Swedish register data.
A longstanding convention in CTD studies is to interpret C estimates as a measure of “between-family” environments which “make members of a family…similar to one another and different from members of other families” (p.18)35. For EA, this convention leads to C being interpreted as a measure of inequality of environmental opportunity between families, e.g., Nielsen and Roos (2015)18 write: “The shared environment component … has a direct policy interpretation: it reflects the potential effect on educational attainment of raising the quality of the most disadvantaged family environments to the level of the most advantaged ones” (p.539). But to the extent that C captures twin-specific shared environments, it will also capture environmental effects that make siblings in the same family different from each other, making it an inflated estimate of between-family environmental differences. Moreover, while twin-specific shared environments will capture real inequalities of opportunity between siblings, these within-family differences in opportunity are not the kind that ordinarily preoccupy policymakers or advocacy groups, who tend to be more concerned about between-family differences in, e.g., parental income, education, or occupational status36.
The presence of twin-specific shared environments (T) can be detected by incorporating data from DZ twins and their non-twin siblings in the same study, with T indicated when DZ twins resemble each other more closely than non-twin siblings. These effects have previously been reported for a US twin and sibling study of EA18 which found that C was 11.3% higher (and E correspondingly 11.3% lower) for twins than for non-twin siblings when accounting for T. The supplements of the first Genome-Wide Association Study (GWAS) of EA also included a twin and sibling analysis of the Swedish Multigenerational Registry, which reported that T accounted for 6.2% of the variance37. Furthermore, when we compared sibling correlations for EA from a recent international study8 with DZ twin correlations from studies in the same countries with similar birth cohorts, the DZ twin correlations were invariably higher, suggesting twin-specific environments might be a general phenomenon for this outcome (see Table 2).
In this study, we used a Nuclear Twin and Family Design (NTFD) to account for both assortative mating and twin-specific shared environments using data on twins, their parents, and their siblings in the German TwinLife sample. Moreover, unlike the CTD, NTFD models can estimate non-additive genetic influences (N) and shared environmental influences simultaneously. Unmodelled N can bias estimates of heritability upwards in the CTD ACE model and bias estimates of shared environmental influence downwards. The degree of bias introduced depends on whether these unmodelled non-additive genetic influences consist of interactions between alleles at a single locus (“dominance”) or gene-gene interactions across multiple loci (“epistasis”).
Furthermore, NTFD models can disaggregate phenotypic transmission (P)—here the environmental effects of parental education on offspring education—from other twin or sibling shared environments. NTFD models can likewise disaggregate the variance explained by passive gene-environment correlation (rGE), which is captured under the C-component in the CTD ACE model. The contribution of passive rGE to EA is a subject of growing scientific interest as molecular genetic studies have indicated it might explain around half of the phenotypic variation captured by current EA polygenic scores (PGSs)29,38,39,40,41. We compare the results from NTFD and CTD models run on EA data from the same families in order to assess the size and direction of bias in our CTD parameter estimates.
A previous TwinLife study by Eifler and Riemann (2021)42 used an NTFD phenotypic assortment model to decompose the variance in school leaving certificates. Here we extend that work to decompose ultimate years of education completed as imputed from both completed qualifications and enrolled post-secondary education courses. We further build on that analysis by contrasting NTFD results with CTD results, by fitting social homogamy models in addition to phenotypic assortment models, and by modelling both dominance and epistasis as potential sources of non-additive genetic influence. By exploring a wider range of boundary conditions in which different assumptions are made and different parameters are estimated, we have attempted to map out the plausible parameter space defined by NTFD models of these data22.
Correlations between different relatives
Correlations for EA between different relatives in our sample are presented in Table 3. MZ twins were highly correlated (r = 0.77) suggesting substantial familial (i.e., genetic and/or environmental) influences on the trait. DZ twins were somewhat less correlated than MZ twins (r = 0.6) suggesting that some of the familial influence is genetic, but most is due to shared environmental influence. However, mothers and fathers were also highly correlated with each other (r = 0.6), suggesting that assortative mating of some kind is present. This could imply that genetic influence is higher, and shared environmental influence lower, than CTD ACE estimates would normally imply. Additionally, DZ twin correlations (r = 0.6) were substantially higher than the correlations between twins and their non-twin siblings (r = 0.45), suggesting that twin-specific shared environmental influences might play an important role.
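As a rough back-of-envelope check on the last comparison above, the gap between the DZ twin correlation and the twin-sibling correlation can be read as a crude indicator of twin-specific shared environment. This sketch assumes that DZ twin pairs and twin-sibling pairs have the same expected genetic and sibling-shared environmental resemblance, so that any excess DZ resemblance is twin-specific; the function name is illustrative and the calculation is not a substitute for the fitted NTFD models.

```python
def twin_specific_gap(r_dz: float, r_sib: float) -> float:
    """Excess resemblance of DZ twins over twin-sibling pairs.

    Interpreted as twin-specific shared environment only under the
    assumption that both pair types share the same expected genetic
    and sibling-shared environmental covariance.
    """
    return r_dz - r_sib


# Correlations reported in the text above (Table 3).
gap = twin_specific_gap(r_dz=0.60, r_sib=0.45)
print(round(gap, 2))  # 0.15
```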
Model fitting results
Based on the twin correlations above, in which the MZ twin correlations were less than twice as large as the DZ twin or sibling correlations, we proceeded to fit a CTD ACE model to our twin-only data rather than an ANE model that estimates non-additive genetic influences instead of shared environmental influences. This produced estimates of: A = 34% (95% CI: 23–47%), C = 43% (31–53%), and E = 23% (20–26%).
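For orientation, the CTD point estimates can be approximated directly from the twin correlations using Falconer-style arithmetic. The sketch below is a simplification under the usual CTD assumptions (random mating, equal environments, standardised variances); it is not the structural equation model that was fitted, and the function name is illustrative.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Approximate ACE components from MZ and DZ twin correlations."""
    a = 2.0 * (r_mz - r_dz)  # additive genetic influence
    c = r_mz - a             # shared environment
    e = 1.0 - r_mz           # nonshared environment (plus measurement error)
    return a, c, e


# MZ and DZ correlations reported for this sample (Table 3).
a, c, e = falconer_ace(r_mz=0.77, r_dz=0.60)
print(f"A={a:.2f} C={c:.2f} E={e:.2f}")  # A=0.34 C=0.43 E=0.23
```

With the correlations reported above, this arithmetic lands on the same rounded values as the fitted ACE model, which is expected when the model assumptions match those behind the formula.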
We then fit NTFD models to our full twin, parent, and sibling data. Three phenotypic assortment (PA) models and three social homogamy (SH) models were compared against a saturated model, respectively fixing non-additive genetic effects (N), sibling shared environments (S), and phenotypic transmission (P) to zero, as only two of these three parameters can be estimated simultaneously20. None of these six baseline models fit the data significantly worse than the saturated model. We proceeded to drop all non-significant paths from each of the baseline models to see if doing so produced a significant reduction in fit. It did not. Model fitting results are presented in Table 4.
Our PA models returned mean estimates of additive genetic influence ranging from 51% to 56%, non-additive genetic influences of 0–1%, parental influence of 0–1%, passive rGE of −2% to 0%, sibling shared environments of 0–10%, twin-specific shared environments of 16–25%, and nonshared environments of 23%. Non-additive genetic influences, parental influences, and passive rGE could be dropped from all three PA baseline models without producing a significant decline in fit.
Our SH models returned mean estimates of additive genetic influence of 36–70%, zero non-additive genetic influence, phenotypic transmission of 0–4%, passive rGE of 0–10%, sibling shared environments of 0–11%, twin-specific shared environments of 0–25%, and nonshared environments of 21–23%. Phenotypic transmission was statistically significant in the two SH baseline models in which it was freely estimated and was therefore retained in the corresponding submodels. The SH baseline model that fixed phenotypic transmission to zero was our worst-fitting model and also yielded unusual results, e.g., producing heritability estimates even higher than our PA models (69%). Without this model or its nested submodels, the SH heritability estimates range from 36% to 39%, substantially lower than our PA estimates and closer to our CTD estimate of 34%.
Under all PA models the parent-offspring correlation for EA was entirely genetically mediated. Under our SH models 43–46% of the parent-offspring correlation was genetically mediated except in our worst-fitting models where phenotypic transmission was fixed to zero (see Supplementary Table 4).
In general, SH models fit the data slightly worse than our PA models; however, our SH baseline model which assumed no non-additive genetic influence fit the data marginally better than PA models which assumed no sibling effects (see AIC and BIC values in Table 4). Our best-fitting model overall, reporting both the lowest Akaike’s Information Criterion and lowest Bayesian Information Criterion43,44, was the PA ASTE model where A = 51% (46–56%), S = 10% (0.1–18%), T = 16% (8–26%), and E = 23% (21–26%).
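The ranking of models by information criteria can be sketched as follows. The code uses the textbook definitions AIC = −2lnL + 2k and BIC = −2lnL + k·ln(n), which may differ from the variant reported by a given software package; the model names and fit statistics are hypothetical placeholders and are not values from Table 4, and using the number of families as n is an assumption.

```python
import math


def aic(minus2ll: float, k: int) -> float:
    """Akaike's Information Criterion from -2 log-likelihood and k free parameters."""
    return minus2ll + 2 * k


def bic(minus2ll: float, k: int, n: int) -> float:
    """Bayesian Information Criterion; n is the assumed sample size."""
    return minus2ll + k * math.log(n)


# Hypothetical fit statistics: (minus2ll, number of free parameters).
models = {"baseline": (10000.0, 6), "reduced": (10001.5, 4)}
n_families = 982  # assumption: families treated as the unit of analysis

for name, (m2ll, k) in models.items():
    print(name, round(aic(m2ll, k), 1), round(bic(m2ll, k, n_families), 1))

# Lower values indicate a better trade-off between fit and parsimony.
best_by_aic = min(models, key=lambda m: aic(*models[m]))
best_by_bic = min(models, key=lambda m: bic(*models[m], n_families))
```

In this hypothetical case the reduced model is preferred on both criteria because the small loss of fit is outweighed by the two parameters saved.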
Our best-fitting NTFD model is contrasted against the CTD ACE model in Fig. 1. Our six NTFD baseline models are compared in Fig. 2. Variance components and confidence intervals for all NTFD models are presented in Table 5. Finally, the path estimates (with standard errors) for all NTFD models are available in Supplementary Table 2.
In addition to the results displayed here, which assume that non-additive genetic influence is characterised by dominance, we also ran an alternative set of epistatic models which assumed it was characterised by multi-locus gene–gene interactions that only MZ twins share in common (see Supplementary Tables 3 and 5). This scenario isn’t considered biologically plausible but ensures that non-additive genetic effects shared by DZ twins aren’t over-estimated22. There were negligible differences between the dominance and epistatic model results.
We set out to explore how NTFD estimates of genetic and environmental influence for EA differed from conventional CTD estimates when the inclusion of more relative classes allowed additional parameters to be estimated. When phenotypic assortment (PA) was assumed, broad heritability estimates ranged from 51% to 56% in our NTFD models. Our best-fitting model estimated heritability at 51%, up 17 percentage points from our CTD estimate of 34%. This difference aligns with the 17-point assortative mating adjustment to heritability that we calculated in our re-analysis of studies in Branigan et al.12 (see Table 1). Together these results indicate that the 40% and 43% mean heritability estimates for EA reported in Branigan et al.12 and Silventoinen et al.13 might underestimate the true international average heritability for the relevant populations.
If the mean heritability of EA is ~17 percentage points higher than previously believed, this could also indicate that the ceiling on polygenic prediction for EA is higher than previously assumed. While the variance explained by PGSs (12–16% depending on cohort)29 is already approaching the current SNP heritability for EA (averaging ~15% globally)45, as whole genome sequencing of large samples becomes widespread, and rarer variants associated with EA are identified, it’s expected that both the SNP heritability and the variance explained by future EA PGSs will increase46,47,48. Pedigree-based estimates of heritability therefore provide an optimistic upper bound for the strength of the polygenic prediction that might ultimately be achieved.
Our NTFD results also suggest that CTD estimates of shared environmental influence (C) for EA might be overestimated. Total shared environmental influence (including passive rGE) was 26% in our best-fitting model, down 17 points from our CTD estimate of 43% after accounting for phenotypic assortment. Again, this aligns closely with our assortative mating adjustment to the Branigan et al. (2013) ACE estimates. Once we consider the growing evidence for genetic correlations between spouses for EA27,28,29,30,31,32 and the high spousal correlations for the studies included in Branigan et al. (2013)12 and Silventoinen et al. (2020)13, it suggests the 31–36% mean international estimates of shared environmental influence for EA in those studies might be substantially inflated.
However, our results also indicate that CTD estimates of shared environmental influence on EA cannot be safely interpreted as “between-family” differences in environmental opportunity irrespective of whether unmodelled assortative mating is an issue. Our NTFD models were able to decompose the shared environment into variance components that are shared by non-twin siblings (i.e., S, P, and rGE) and twin-specific shared environments (T) that are not. T accounted for 16–25% of the variance in our PA models and contributed a similar range in our SH models (when our worst fitting SH models that set parental effects to zero were excluded). Taking both assortative mating and twin-specific shared environments into account, our best-fitting model indicated just 10% of the variance in EA could be attributed to between-family environmental differences. This is 33 points lower than our C estimate of 43% under the CTD. Our survey of DZ twin versus non-twin sibling correlations for EA in Table 2 indicates that twin-specific shared environments are relevant for many of the populations in which CTD studies of EA have previously been conducted. These results suggest that researchers should refrain from drawing strong conclusions about the differences in educational opportunity between families based on CTD estimates for EA12,13,49.
Additionally, the decomposition of the shared environment under our NTFD PA models (which include our best-fitting model) implied negligible environmental influence of parental education on offspring education. Under these models, the observed parent-offspring correlation was entirely genetically mediated (see Supplementary Table 4) inverting the traditional sociological interpretation that this correlation captures environmental inequalities4,5,6,7,8. However, this does not imply that parents have no effect on offspring EA. Parental attributes other than EA could be driving some of the phenotypic similarity between siblings and between twins that is captured under S and T and those parental attributes could potentially include alternative socioeconomic indicators such as parental income.
Full genetic mediation of the parent-offspring correlation for EA was also found in a recent Norwegian study using a Multiple Children of Twins design50. However, that study speculated that this was the result of Norway’s egalitarian social policies and specifically predicted that the more stratified German education system would produce different results50. Instead, our results indicate that genetic mediation of the parent-offspring correlation might be a more general phenomenon. That would suggest the intergenerational mobility literature exaggerates the environmental transmission of advantage and the differences in opportunity between families even more than CTD studies have previously indicated12.
For over 60 years it has been common practice in the social sciences to treat the correlation for EA between first-degree relatives as a direct measure of inequality of environmental opportunity, painting a picture of society that is deeply and persistently unmeritocratic4,5,6,7,8. By demonstrating that a substantial fraction of the familial correlation is genetic, CTD studies have shown that environmental differences between families play a much smaller role in the intergenerational persistence of EA than has sometimes been suggested12. Nevertheless, conspicuously high CTD estimates of shared environmental influence for EA have continued to cause concern about high levels of unequal opportunity for this outcome13,18,51. The results presented here suggest that shared environmental influence might account for even less of the variation in educational attainment than conventional twin studies have indicated and that environmental opportunities might therefore be more equal than these studies have implied. Moreover, a large fraction of the remaining shared environmental variation for EA appears to consist of twin-specific shared environments that capture within-family differences in opportunity that carry a different moral and political connotation to between-family differences (even if they remain potential targets for political intervention). A promising avenue for future research would be to identify specific environmental variables which account for these within- and between-family differences in educational opportunity52,53.
That noted, we stress that equality of environmental opportunity—while a widely endorsed social goal—is not an uncontested one. Some have argued for a more radical egalitarian agenda that seeks to reduce the influence of both environmental and genetic accidents of birth on socially valued outcomes54,55,56. Others have argued that promoting conditions that maximise general welfare and personal freedom should take precedence over attempts to reduce environmental differences between people57,58,59. These important philosophical debates are, however, beyond the scope of this paper.
Our study has the following limitations. By assuming that subjects who are enrolled in ongoing post-secondary studies go on to complete those courses, we potentially introduce bias by failing to capture dropouts. However, if we make the stricter assumption and only use the level of education already completed, this severely reduces the variance in years of education (because of the youth of our sample). The stricter approach also implies an unrealistic educational trajectory for subjects enrolled in post-secondary education, given low German drop-out rates and a tendency for students and trainees to transfer horizontally into an alternative vocational or tertiary qualification rather than making a vertical change between categories60. Follow-up studies when the cohort is older will be able to address this limitation.
In addition, the negligible effect of parental EA on offspring EA under our PA models contradicts the evidence from studies which find a significant association between the EA of adoptive parents and adoptive children61,62. Here we stress that, while our best-fitting model was a PA model, our SH models also fit the data. It’s possible that a mixed homogamy scenario, in which phenotypic assortment and social homogamy both play a role, might explain the data better than the PA and SH models compared in this study. If so, that would suggest that the true contribution of genetic and environmental influences to the parent-offspring correlation and to the variance in EA lies somewhere between the PA and SH estimates presented here. This might also explain why our best-fitting model indicates no passive rGE in contrast to molecular genetic literature that suggests that EA polygenic scores partly capture passive rGE28,29,38,39,40,41; however, we also note that phenotypic assortment is expected to produce some of the molecular genetic effects that have been interpreted as passive rGE or “genetic nurture”28,38,63.
We also stress that the biases in CTD parameter estimates that we have reported for EA will not necessarily generalise to other traits. The size and direction of these biases can vary considerably across different traits depending on the extent to which different assumptions in the CTD model are violated.
In summary, by comparing the estimates of genetic and environmental influence on Educational Attainment (EA) from a Nuclear Twin and Family Design with the results from a conventional twin-only study in the same German families, we were able to account for some potential confounds in the Classical Twin Design (CTD). Our results indicate that unmodelled assortative mating may be introducing substantial downwards bias into CTD estimates of heritability for EA while correspondingly biasing estimates of shared environmental influence upwards. Our results also indicate that twin-specific shared environments might account for a substantial portion of the shared environmental estimate in CTD studies of EA, suggesting that such estimates cannot be safely interpreted as between-family differences in environmental opportunity. Our survey of previous CTD studies of EA suggests both issues are likely to generalise beyond our TwinLife sample, as we find high spousal correlations in those studies and high DZ twin correlations relative to non-twin sibling correlations in comparable samples. Together these findings suggest the differences in educational opportunity between families are substantially lower than CTD estimates of shared environmental influence on EA have indicated. In addition, we found that the relatively high parent-offspring correlation for EA in our German sample was fully explained by genetic transmission under our best-fitting model, suggesting parental education might not be the engine of social reproduction of advantage that many sociological studies have implied.
All analyses were performed on data from TwinLife: a cross-sequential panel study of German twins and their immediate relatives (parents, spouses, and the nearest sibling by age). TwinLife is broadly representative of twin and multiple-birth households in Germany64. The full sample consists of 4,097 twin pairs spanning four birth cohorts (born 1990–1993, 1997–1998, 2003–2004 and 2009–2010). Since its inception in 2014, data on participating twins and their relatives have been collected every year, with face-to-face interviews and telephone interviews taking place on an alternating biennial basis. For this study, we used data from the oldest 1990–1993 cohort of twins (and relatives) only. We only used data on siblings who were born less than five years apart from the twins in any given family to ensure our results were not primarily driven by outliers with large sibship-age differences. Data on educational attainment were available for 1,020 MZ twins (498 complete pairs), 896 DZ twins (439 complete pairs), 215 siblings, 906 mothers, and 536 fathers. Descriptive statistics are provided in Table 6.
The TwinLife study received ethical approval from the German Psychological Association (protocol numbers: RR 11.2009 and RR 09.2013). Respondents provided written informed consent for their data to be used for research purposes65.
Educational attainment was operationalised as a continuous variable by mapping the highest educational qualification obtained to a corresponding number of years of education (see Supplementary Table 6). Where twins or siblings were partway through a tertiary or professional qualification, we assigned years of education as though that qualification had been completed. In doing so we follow Baier and Lang60, who note that German young adults who do not complete their enrolled course generally achieve an alternative qualification of a similar type (e.g. tertiary or vocational) rather than dropping out. Means and standard deviations for the different types of family members are displayed in Table 6.
After calculating means and variances for each relative class, we calculated correlations between each type of family member (as shown in Table 3). We then corrected educational attainment for age and gender66 and z-standardised the residuals before fitting CTD or NTFD structural equation models. Twin modelling was performed using the OpenMx67 package in R68.
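The age and gender correction described above can be sketched as a linear regression followed by z-standardisation of the residuals. This is a minimal illustration rather than the authors' code: the function names, the linear form of the adjustment, and the toy data are assumptions introduced here, and none of the values are TwinLife values.

```python
import statistics

def solve_linear(a, b):
    """Solve a small linear system a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residualise_and_standardise(years, age, female):
    """Regress years of education on age and sex (assumed linear), then z-standardise the residuals."""
    mean_age = statistics.fmean(age)
    mean_sex = statistics.fmean(female)
    # Centring the predictors leaves the residuals unchanged but keeps the system well conditioned.
    design = [[1.0, a - mean_age, s - mean_sex] for a, s in zip(age, female)]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, years)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(design, years)]
    mean = statistics.fmean(resid)
    sd = statistics.pstdev(resid)
    return [(r - mean) / sd for r in resid]

# Hypothetical toy data (illustrative only)
years = [10.0, 13.0, 18.0, 12.0, 16.0, 13.0, 11.0, 17.0]
age = [23, 24, 26, 23, 25, 27, 24, 26]
female = [0, 1, 1, 0, 0, 1, 1, 0]
z = residualise_and_standardise(years, age, female)
```

The resulting scores have mean zero and unit variance and are uncorrelated with age and sex in the sample, which is the property the structural equation models rely on.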
The Classical Twin Design (CTD)
The CTD is one of the most commonly used study designs in behavioural genetics. The CTD compares the resemblance of reared-together MZ twins for a given trait with the resemblance of reared-together DZ twins. The CTD assumes random mating on the trait in question, under which DZ twins are expected to share half of their trait-relevant genes on average, compared to MZ twins, who share all of their genes. The CTD also assumes that rearing conditions are equal between both kinds of twins (the Equal Environments Assumption); any additional resemblance shown between MZ twin pairs compared to DZ twin pairs is therefore attributed to additive genetic influence (A). Any residual similarity between MZ twins that is not explained by genetic influences is attributed to the shared environment (C). If MZ twins are more than twice as similar as DZ twins, genetic dominance is typically assumed to explain this, and it is modelled instead of C. Finally, the variance that cannot be accounted for by MZ twin resemblance is attributed to the nonshared environment (E). The methodology for fitting CTD structural equation models to twin data has been described in detail elsewhere69.
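The logic of the CTD can be illustrated with a moment-based sketch that solves the two expected twin correlations for A, C and E. This is not the structural equation approach used in this study. The function name, the `dz_genetic_corr` parameter, and the input correlations are hypothetical and chosen only to show how the random-mating assumption enters the calculation.

```python
def twin_decomposition(r_mz, r_dz, dz_genetic_corr=0.5):
    """Solve r_MZ = A + C and r_DZ = g*A + C for A, C and E.

    g (dz_genetic_corr) is the assumed additive genetic correlation between
    DZ twins: 0.5 under random mating, higher under assortative mating.
    """
    a = (r_mz - r_dz) / (1.0 - dz_genetic_corr)
    c = r_mz - a
    e = 1.0 - r_mz
    return {"A": a, "C": c, "E": e}

# Hypothetical correlations (not estimates from this study)
random_mating = twin_decomposition(0.80, 0.60)
# Illustrative value only: the true DZ genetic correlation under assortative
# mating depends on heritability and the spousal correlation.
assortative = twin_decomposition(0.80, 0.60, dz_genetic_corr=0.6)
```

With these made-up inputs, raising the assumed DZ genetic correlation from 0.5 to 0.6 moves A from roughly 0.40 to roughly 0.50 and C from roughly 0.40 to roughly 0.30, which is the direction of bias discussed in the text.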
The Nuclear Twin and Family Design (NTFD)
Including additional relative classes in the NTFD enables several of the assumptions in the CTD to be relaxed and more parameters to be estimated. Non-additive genetic influences (N) and shared environmental influences can be estimated simultaneously, and the shared environment can be further decomposed into the shared sibling environment (S), the environmental effects of parental education on offspring education (F), and, if non-twin siblings are also included in the model, the twin-specific shared environment (T). Passive gene-environment correlation (rGE) can also be disaggregated from shared environmental influences. As means and variances in EA were similar for both twins and non-twin siblings (see Table 6), T was modelled as a variance component for all relative classes rather than an additional variance component experienced exclusively by twins18,70.
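The role of non-twin siblings in identifying T can be seen in a simplified sketch. Assume, as a simplification, that DZ twins and non-twin full siblings have the same expected genetic resemblance and the same sibling-shared environment. Write $g$ (a symbol introduced here for illustration) for the genetic and family-level covariance terms common to both kinds of pair. The standardised expected correlations are then approximately:

```latex
\begin{aligned}
r_{\mathrm{DZ}}  &= g + S + T,\\
r_{\mathrm{sib}} &= g + S,\\
T &= r_{\mathrm{DZ}} - r_{\mathrm{sib}}.
\end{aligned}
```

Under these assumptions, any excess of the DZ twin correlation over the non-twin sibling correlation is attributed to the twin-specific shared environment. The full expectations used in the fitted models are those referenced in the supplementary algebra, not this reduced form.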
Incorporating data from parents also allows the NTFD model to directly account for assortative mating. We explored two boundary conditions: a phenotypic assortment model in which the correlation for EA between parents was assumed to be the result of active mate selection on education (inducing a genetic correlation between spouses), and a social homogamy model in which the correlation between spouses was assumed to be environmentally driven. We modelled social homogamy by extending the traditional NTFD model using innovations from the “Cascade” model developed by Keller et al. (2009)20. A latent phenotype (M’ and F’) is introduced between the observed parental phenotype (M and F) and the assortative mating copath (µ) linking each parent in the standard NTFD model.
Under the phenotypic assortment model, the variance of the latent parental phenotype is defined by the same variance components as the parental phenotype (and the variance of the parental phenotype is the same as its covariance with the latent phenotype making the algebra identical with that of the standard NTFD model). By contrast, under the social homogamy model, the genetic (a and n) paths leading to the latent phenotype are set to zero, obliging the covariance between the parental phenotype and the latent phenotype to be mediated by non-genetic factors.
As phenotypic transmission, non-additive genetic effects, and sibling-shared environmental influences could not all be estimated simultaneously20, we ran three baseline models (ANSTE, AFSTE, ANFTE) in which each of these three effects was respectively fixed to zero. Each was run under both a phenotypic assortment and a social homogamy assumption. These six baseline models were then compared against a saturated model (describing the means, variances and covariances of the different relative classes) using a chi-squared test to see if any produced a significantly worse fit to the data. For each baseline model that did not show a significant reduction in fit, we iteratively dropped all paths with 95% confidence intervals crossing zero and used further chi-squared tests to see if this produced a significant reduction in fit. Parameter estimates were reported for all baseline models that did not fit significantly worse than the saturated model, and likewise for all submodels that did not fit significantly worse than these baseline models. From these retained models, the overall best-fitting model was determined on the basis of the lowest Akaike’s Information Criterion71. Finally, we ran a set of six additional baseline models to test whether results were substantially affected if non-additive genetic effects were characterised as multi-local epistatic effects rather than as dominance or bi-local gene-gene interactions22.
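The first stage of this selection procedure (likelihood-ratio tests against the saturated model, then choosing the lowest AIC among the retained models) can be sketched as follows. This is an illustrative outline, not the OpenMx workflow used in the study. The fit statistics and parameter counts are invented, the helper names are hypothetical, and AIC is computed in its parameter-penalty form (-2 log-likelihood plus twice the number of free parameters), which may differ by a constant from the form reported by a given software package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate, via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def select_model(saturated, candidates, alpha=0.05):
    """Keep candidates that do not fit significantly worse than the saturated model;
    return the lowest-AIC survivor and the AIC of every retained model."""
    retained = {}
    for name, (m2ll, n_par) in candidates.items():
        stat = m2ll - saturated[0]   # difference in -2 log-likelihood
        df = saturated[1] - n_par    # difference in free parameters
        if chi2_sf(stat, df) >= alpha:
            retained[name] = m2ll + 2 * n_par  # AIC
    best = min(retained, key=retained.get) if retained else None
    return best, retained

# Hypothetical fit statistics: (-2 log-likelihood, number of free parameters)
saturated = (10000.0, 30)
candidates = {
    "ANSTE": (10012.0, 12),
    "AFSTE": (10013.5, 12),
    "ANFTE": (10045.0, 12),
}
best, retained = select_model(saturated, candidates)
```

In this made-up example the third model is rejected by the chi-squared test, and the first is preferred among the two retained models because its AIC is lower. The same comparison would then be repeated for submodels against their baseline model.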
A path diagram of our NTFD phenotypic assortment model is provided in Fig. 3. The algebra assumed to underlie our CTD, NTFD-PA, and NTFD-SH models is provided in Supplementary Table 7. The methodology for fitting NTFD structural equation models to twin and family data has been described in detail elsewhere20.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The TwinLife dataset that supports the main results of this study is available free of charge to researchers via GESIS (https://doi.org/10.4232/1.13987) subject to the completion of a Data Use Agreement. The data used for adjusting grand mean ACE estimates in Branigan et al.12 for assortative mating are included in this published article (and its supplementary information files).
Code will be made available on request.
Geyer, S., Hemström, Ö., Peter, R. & Vågerö, D. Education, income, and occupational class cannot be used interchangeably in social epidemiology. Empirical evidence against a common practice. J. Epidemiol. Community Health 60, 804–810 (2006).
Lleras-Muney, A. The relationship between education and adult mortality in the United States. Rev. Econ. Stud. 72, 189–221 (2005).
Lochner, L. & Moretti, E. The effect of education on crime: evidence from prison inmates, arrests, and self-reports. Am. Econ. Rev. 94, 155–189 (2004).
Breen, R. & Jonsson, J. O. Inequality of opportunity in comparative perspective: recent research on educational attainment and social mobility. Annu. Rev. Sociol. 31, 223–243 (2005).
Hertz, T. et al. The inheritance of educational inequality: international comparisons and fifty-year trends. BE J. Econ. Anal. Policy 7, 10 (2008).
van der Weide, R., Lakner, C., Mahler, D. G., Narayan, A. & Ramasubbaiah, R. Intergenerational Mobility Around the World. https://papers.ssrn.com/abstract=3981372. https://doi.org/10.2139/ssrn.3981372 (2021).
Shavit, Y. & Blossfeld, H. Persistent Inequality: Changing Educational Attainment In Thirteen Countries. (Avalon Publishing, 1993).
Grätz, M. et al. Sibling similarity in education across and within societies. Demography 58, 1011–1037 (2021).
Taubman, P. The determinants of earnings: genetics, family, and other environments: a study of white male twins. Am. Econ. Rev. 66, 858–870 (1976).
Taubman, P. Kinometrics: Determinants of Socioeconomic Success Within and Between Families. (North-Holland Publishing Company, 1977).
Branigan, A. R., McCallum, K. J. & Freese, J. Variation in the heritability of educational attainment: an international meta-analysis. Soc. Forces 92, 109–140 (2013).
Silventoinen, K. et al. Genetic and environmental variation in educational attainment: an individual-based analysis of 28 twin cohorts. Sci. Rep. 10, 12681 (2020).
Briley, D. A. & Tucker-Drob, E. M. Comparing the developmental genetics of cognition and personality over the Lifespan. J. Pers. 85, 51–64 (2017).
Malanchini, M. et al. Pathfinder: A gamified measure to integrate general cognitive ability into the biological, medical and behavioural sciences. bioRxiv 2021.02.10.430571 https://doi.org/10.1101/2021.02.10.430571 (2021).
Plomin, R. & Deary, I. J. Genetics and intelligence differences: five special findings. Mol. Psychiatry 20, 98–108 (2015).
Rimfeld, K. et al. The stability of educational achievement across school years is largely explained by genetic factors. Npj Sci. Learn. 3, 16 (2018).
Nielsen, F. & Roos, J. M. Genetics of educational attainment and the persistence of privilege at the turn of the 21st Century. Soc. Forces 94, 535–561 (2015).
Freese, J. & Jao, Y.-H. Shared environment estimates for educational attainment: a puzzle and possible solutions. J. Pers. 85, 79–89 (2017).
Keller, M. C. et al. Modeling extended twin family data I: Description of the cascade model. Twin Res. Hum. Genet. 12, 8–18 (2009).
Keller, M. C., Medland, S. E. & Duncan, L. E. Are extended twin family designs worth the trouble? A comparison of the bias, precision, and accuracy of parameters estimated in four twin family models. Behav. Genet. 40, 377–393 (2010).
Keller, M. C. & Coventry, W. L. Quantifying and Addressing Parameter Indeterminacy in the Classical Twin Design. Twin Res. Hum. Genet. 8, 201–213 (2005).
Falconer, D. & MacKay, T. Introduction to quantitative genetics. (Longman, 1996).
Horwitz, T. B. & Keller, M. C. A comprehensive meta-analysis of human assortative mating in 22 complex traits. 2022.03.19.484997. Preprint at https://doi.org/10.1101/2022.03.19.484997 (2022).
Heath, A. C. & Eaves, L. J. Resolving the effects of phenotype and social background on mate selection. Behav. Genet. 15, 15–30 (1985).
Zietsch, B. P., Verweij, K. J. H., Heath, A. C. & Martin, N. G. Variation in human mate choice: simultaneously investigating heritability, parental influence, sexual imprinting, and assortative mating. Am. Nat. 177, 605–616 (2011).
Yengo, L. et al. Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018).
Nivard, M. et al. Neither nature nor nurture: Using extended pedigree data to elucidate the origins of indirect genetic effects on offspring educational outcomes. Preprint at https://doi.org/10.31234/osf.io/bhpm5 (2022).
Okbay, A. et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 54, 437–449 (2022).
Hugh-Jones, D., Verweij, K. J. H., St. Pourcain, B. & Abdellaoui, A. Assortative mating on educational attainment leads to genetic spousal resemblance for polygenic scores. Intelligence 59, 103–108 (2016).
Torvik, F. A. et al. Modeling assortative mating and genetic similarities between partners, siblings, and in-laws. Nat. Commun. 13, 1108 (2022).
Robinson, M. R. et al. Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 1, 1–13 (2017).
Martin, N. Genetics of sexual and social attitudes in twins. in Twin Research: Psychology and Methodology, Alan R 13–23 (1978).
Kendler, K. S., Ohlsson, H., Lichtenstein, P., Sundquist, J. & Sundquist, K. The Nature of the Shared Environment. Behav. Genet. 49, 1–10 (2019).
Plomin, R. & DeFries, J. C. Genetics and intelligence: Recent data. Intelligence 4, 15–24 (1980).
Lehti, H. The role of kin in educational and status attainment. (2020).
Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
Kong, A. et al. The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018).
Cheesman, R. et al. Comparison of Adopted and Nonadopted Individuals Reveals Gene–Environment Interplay for Education in the UK Biobank. Psychol. Sci. 0956797620904450 https://doi.org/10.1177/0956797620904450 (2020).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Bates, T. C. et al. The nature of nurture: using a virtual-parent design to test parenting effects on children’s educational attainment in genotyped families. Twin Res. Hum. Genet. 21, 73–83 (2018).
Eifler, E. F. & Riemann, R. The aetiology of educational attainment: A nuclear twin family study into the genetic and environmental influences on school leaving certificates. Br. J. Educ. Psychol. 92, 881–897 (2022).
Akaike, H. Factor Analysis and AIC. in Selected Papers of Hirotugu Akaike (eds. Parzen, E., Tanabe, K. & Kitagawa, G.) 371–386 (Springer New York, 1998). https://doi.org/10.1007/978-1-4612-1694-0_29.
Raftery, A. E. Bayesian model selection in social research. Sociol. Methodol. 25, 111–163 (1995).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112 (2018).
Young, A. I. Discovering missing heritability in whole-genome sequencing data. Nat. Genet. 54, 224–226 (2022).
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Engzell, P. & Tropf, F. C. Heritability of education rises with intergenerational mobility. Proc. Natl Acad. Sci. 116, 25386–25388 (2019).
Baier, T., Eilertsen, E. M., Ystrom, E., Zambrana, I. M. & Lyngstad, T. H. An Anatomy of the Intergenerational Correlation of Educational Attainment -Learning from the Educational Attainments of Norwegian Twins and their Children. Res. Soc. Stratif. Mobil. 100691 https://doi.org/10.1016/j.rssm.2022.100691 (2022).
Harden, K. P. “Reports of my death were greatly exaggerated”: behavior genetics in the postgenomic era. Annu. Rev. Psychol. 72, 37–60 (2021).
Turkheimer, E., D’Onofrio, B. M., Maes, H. H. & Eaves, L. J. Analysis and interpretation of twin studies including measures of the shared environment. Child Dev. 76, 1217–1233 (2005).
Engelhardt, L. E., Church, J. A., Harden, K. P. & Tucker‐Drob, E. M. Accounting for the shared environment in cognitive abilities and academic achievement with measured socioecological contexts. Dev. Sci. 22, e12699 (2019).
Rawls, J. A Theory of Justice. (Harvard University Press, 2009).
Harden, K. P. The Genetic Lottery: Why DNA Matters for Social Equality. (Princeton University Press, 2021).
deBoer, F. The Cult of Smart: How Our Broken Education System Perpetuates Social Injustice. (St. Martin’s Publishing Group, 2020).
Hayek, F. A. von. The Mirage of Social Justice. (University of Chicago Press, 1978).
Hayek, F. A. The Constitution of Liberty: The Definitive Edition. (Routledge, 2020).
Morris, D. The Culture War is Coming for Your Genes. Quillette https://quillette.com/2021/09/30/the-culture-war-is-coming-for-your-genes/ (2021).
Baier, T. & Lang, V. The social stratification of environmental and genetic influences on education: new evidence using a register-based twin sample. Sociol. Sci. 6, 143–171 (2019).
Björklund, A. & Salvanes, K. G. Chapter 3 - Education and Family Background: Mechanisms and Policies. in Handbook of the Economics of Education (eds. Hanushek, E. A., Machin, S. & Woessmann, L.) vol. 3 201–247 (Elsevier, 2011).
Holmlund, H., Lindahl, M. & Plug, E. The causal effect of parents’ schooling on children’s schooling: a comparison of estimation methods. J. Econ. Lit. 49, 615–651 (2011).
Young, A. I., Benonisdottir, S., Przeworski, M. & Kong, A. Deconstructing the sources of genotype-phenotype associations in humans. Science 365, 1396–1400 (2019).
Lang, V. & Kottwitz, A. The sampling design and socio-demographic structure of the first wave of the TwinLife panel study: a comparison with the Microcensus. vol. 03 https://pub.uni-bielefeld.de/record/2913250 (2017).
Lang, V. et al. An introduction to the German twin family panel (TwinLife). Jahrb. für Natl Stat. 240, 837–847 (2020).
McGue, M. & Bouchard, T. J. Adjustment of twin data for the effects of age and sex. Behav. Genet. 14, 325–343 (1984).
Boker, S. et al. OpenMx: An open source extended structural equation modeling framework. Psychometrika 76, 306–317 (2011).
R Core Team. R: A language and environment for statistical computing. (2020).
Neale, M. & Cardon, L. R. Methodology for Genetic Studies of Twins and Families. (Springer Science & Business Media, 2013).
Koeppen-Schomerus, G., Spinath, F. M. & Plomin, R. Twins and non-twin siblings: different estimates of shared environmental influence in early childhood. Twin Res. Hum. Genet. 6, 97–105 (2003).
Wagenmakers, E.-J. & Farrell, S. AIC model selection using Akaike weights. Psychon. Bull. Rev. 11, 192–196 (2004).
Heath, A. C. et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).
Lykken, D. T., Bouchard, T. J., McGue, M. & Tellegen, A. The Minnesota twin family registry: some initial findings. Acta Genet. Medicae Gemellol. Twin Res. 39, 35–70 (1990).
Baker, L. A., Treloar, S. A., Reynolds, C. A., Heath, A. C. & Martin, N. G. Genetics of educational attainment in Australian twins: Sex differences and secular changes. Behav. Genet. 26, 89–102 (1996).
Bingley, P., Christensen, K. & Walker, I. Twin-based Estimates of the Returns to Education: Evidence from the Population of Danish Twins. 30
Silventoinen, K., Sarlio-Lähteenkorva, S., Koskenvuo, M., Lahelma, E. & Kaprio, J. Effect of environmental and genetic factors on education-associated disparities in weight and weight gain: a study of Finnish adult twins. Am. J. Clin. Nutr. 80, 815–822 (2004).
Silventoinen, K., Kaprio, J. & Lahelma, E. Genetic and environmental contributions to the association between body height and educational attainment: a study of adult Finnish twins. Behav. Genet. 30, 477–485 (2000).
Ørstavik, R. E. et al. Sex differences in genetic and environmental influences on educational attainment and income. Twin Res. Hum. Genet. 17, 516–525 (2014).
Lyngstad, T. H., Ystrøm, E. & Zambrana, I. M. An Anatomy of Intergenerational Transmission: Learning from the educational attainments of Norwegian twins and their parents. Preprint at https://doi.org/10.31235/osf.io/fby2t (2017).
Isacsson, G. Estimates of the return to schooling in Sweden from a large sample of twins. Labour Econ. 6, 471–489 (1999).
We’d like to thank all the families who participated in TwinLife for making this study possible, thank Dr Karri Silventoinen for providing unpublished results from Silventoinen et al. (2020)13, thank Dr Martin Diewald, Dr Felix Tropf, and Dr Stuart J Ritchie for their feedback on early drafts of the manuscript, and thank our anonymous reviewers for their helpful comments.
Open Access funding enabled and organized by Projekt DEAL.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wolfram, T., Morris, D. Conventional twin studies overestimate the environmental differences between families relevant to educational attainment. npj Sci. Learn. 8, 24 (2023). https://doi.org/10.1038/s41539-023-00173-y