Genetic and environmental influences on human height from infancy through adulthood at different levels of parental education

Genetic factors explain a major proportion of human height variation, but differences in mean stature have also been found between socio-economic categories suggesting a possible effect of environment. By utilizing a classical twin design which allows decomposing the variation of height into genetic and environmental components, we tested the hypothesis that environmental variation in height is greater in offspring of lower educated parents. Twin data from 29 cohorts including 65,978 complete twin pairs with information on height at ages 1 to 69 years and on parental education were pooled allowing the analyses at different ages and in three geographic-cultural regions (Europe, North America and Australia, and East Asia). Parental education mostly showed a positive association with offspring height, with significant associations in mid-childhood and from adolescence onwards. In variance decomposition modeling, the genetic and environmental variance components of height did not show a consistent relation to parental education. A random-effects meta-regression analysis of the aggregate-level data showed a trend towards greater shared environmental variation of height in low parental education families. In conclusion, in our very large dataset from twin cohorts around the globe, these results provide only weak evidence for the study hypothesis.

non-siblings. On the other hand, in families with a higher socio-economic position, the environment is likely to be more uniformly good with fewer environmental factors restricting growth and thus leading to taller offspring and less environmental variation.
According to the bioecological model, at-risk environments will mask genetic differences between individuals, while enriched environments will amplify genetic differences 22,23 . This leads to the hypothesis that the heritability of height should increase with higher parental socioeconomic position. To our knowledge, no previous studies have tested this hypothesis, and thus there is no direct evidence on whether the heritability of height differs according to family social background and parental education. Further, such a modifying effect of socio-economic characteristics might change over birth cohorts or could differ between males and females if some cultures encourage scarce resources to be shared primarily with male offspring.
To examine the modification of genetic and environmental variance components by parental education, large datasets collected across a range of strata within society or across different countries are needed. The power to detect such an effect was explored by Boomsma and Martin 24 , who concluded that detecting heritability differences between groups of 0.3 or smaller requires large samples. Such information from large datasets was available from 29 twin cohorts participating in the CODATwins (COllaborative project of Development of Anthropometrical measures in Twins) project representing 15 countries from different parts of the world 25 . We utilized this database (i) to test whether parental education modifies the genetic and environmental variation of height in males and females from infancy through adulthood and (ii) to assess whether the possible modification effects vary between different geographic-cultural regions (Europe, North America and Australia, and East Asia).

Results
Descriptive statistics of height and parental education by age and sex for the pooled data (all cohorts together) are presented in Table 1 (the corresponding statistics by cultural-geographic region are presented in Supplementary table 1). Mean height showed the expected age pattern, and the difference between consecutive age groups was very similar in boys and girls during childhood. The exception was the slight decrease observed at 18 (males) and 20-69 (females) years, which reflects differences in the distribution of different cohorts within each age group. Mean height was generally tallest in Europe, somewhat shorter in North America and Australia and shortest in East Asia in both www.nature.com/scientificreports www.nature.com/scientificreports/ males and females. Paternal and maternal education generally decreased with age, which reflects the increasing education over birth cohorts since parents of younger twins were, on average, born later as compared to parents of older twins. Parental education was virtually identical for male and female twins during childhood and slightly greater in females from late adolescence. Parental education was generally lowest in Europe, reflecting that European twin cohorts were older than North American and Australian and East Asian cohorts (Supplementary table 1).
The associations between parental education (i.e., combined maternal and paternal) and offspring height, i.e. height difference in cm by one year difference of parental education, are presented in Fig. 1. From around age 5 years, parental education showed a generally positive association with offspring height; the pattern was similar in males and females, with significant associations in mid-childhood and from adolescence onwards. Regarding the geographic-cultural regions, which approximate ethnicity in the present study, the pattern in Europe was similar to that observed for the whole data set because Europe represents a large fraction of the total sample. In North America and Australia, the associations between parental education and offspring height were stronger than in Europe in some age groups, particularly in mid-childhood. In East Asia, the associations generally varied around zero and were not statistically significant. In North America and Australia and East Asia, the 95% confidence intervals (CIs) were, however, much broader than in Europe because of the smaller sample sizes.
The total variance of height decomposed into additive genetic, shared environmental and unique environmental components in the three categories of parental education is shown in Fig. 2 (the estimates with 95% CIs are available in Supplementary Table 2). The total height variation was slightly greater in the lower than in the higher parental education level in some age-by-sex groups, but no consistent relation emerged by educational categories over ages. From age 13 years onwards, the total height variance was generally greater in males than in females.
As indicated by overlapping CIs, genetic and environmental variances did not show any distinct relation across parental education categories from infancy through adulthood; the relative proportion of genetic and environmental variances did not show any relation either (Supplementary Table 3). Next, univariate variance decomposition modeling for height was carried out separately in the three geographic-cultural regions ( Fig. 3 and Supplementary Tables 4 and 5). The total variance of height was greatest in North America and Australia and lowest in East Asia, but no distinct relation in the variance components (both total estimates and relative proportion) across the parental education levels emerged (seen as overlapping CIs). In East Asia, possibly due to the smaller sample sizes, the magnitude of the variance components between the educational categories varied more than in the other two geographic-cultural regions.
Finally, we ran a random-effects meta-regression analysis of raw variance components of height (pooling all age groups and geographic-cultural regions together). Judged by the confidence intervals, the results showed some significant differences between the middle and low parental education categories (Table 2). In comparison with low parental education, the shared environmental (c²) component of height for middle education was significantly smaller in males and in both sexes together. The point estimates for the other sex and variance component groups followed the same direction but were not significant. Given the number of comparisons, these findings should be interpreted substantively with great caution. Standardized variance components models gave very similar results (Supplementary Table 6).

Discussion
Questions about the modification of genetic and environmental variance components require very large and genetically informative data sets. Our large twin study pooling data for 65,978 complete twin pairs from 29 cohorts from 15 countries established that for human height there is a high and consistent heritability across parental education levels. The same result, i.e. similar genetic and environmental variances of height across parental education levels, was found in different geographic-cultural regions having different mean stature.
The meta-regression analysis also failed to provide substantial evidence for the study hypothesis that shared environmental variation of height tends to be greater in low parental education families; the evidence is weak considering the size of the dataset when pooling all data together. In a previous study from the CODATwins database, we found that there was no decrease in the environmental variance of adult height over the birth cohorts from the late 19th century to the late 20th century, nor any clear secular changes in the heritability 18 . Therefore, using two very different approaches (i.e., indirect information on the increasing standard of living over 100 years and direct measures of the socio-economic position of the childhood family), we established that there is no or very little evidence of greater shared environmental variation in height in disadvantageous environments.
The offspring of better educated parents were generally taller, particularly in mid-childhood and from adolescence onwards, than those whose parents had lower education. Our findings for average height are in agreement with several population based studies showing a positive association between parents' education and offspring height 15,17,26,27 . In a Chinese study, childhood height was also related to grandparents' education, suggesting that socioeconomic conditions of current and previous generations may affect height 28 . In some societies, children from families with lower socioeconomic status (SES) may still have, on average, poorer diets and be more severely affected by infections than those from families with higher SES 13-15 . Comparison between geographic-cultural regions showed that parental education was more strongly related to height in North America and Australia than in Europe, which may reflect larger social inequalities in the former.
In the families of lower SES, environmental effects (e.g. malnutrition) on height may restrict individuals from reaching their genetic potential, leading to shorter stature. It is likely that there are differences in these environmental factors between low SES families; in high SES families, in contrast, the environment securing optimal growth is likely to be more homogeneous. These environmental influences would result in more between than within family variation in lower SES families, which according to the bioecological model is expected to increase shared environmental variation leading to lower heritability of height in lower as compared with higher SES families.
It is thus interesting that even when we found the expected differences in mean height between families of high and low parental education, only very weak differences in genetic or environmental variances or in the heritability estimates of height were observed. It is theoretically possible that environmental factors affecting growth are so uniformly distributed in lower SES families that there is no variance of height explained by these environmental factors and thus the influence is not seen as shared environmental variation. However, we do not find this very likely since it would mean that families with high and low parental education form two distinctive but internally very homogeneous groups. Further, this should be the case in all three cultural-geographic regions.
Finally, it is possible that the differences in height between the families of high and low parental education are not because of a causal effect of poorer living conditions on height but reflect genetic height differences. A study of children born in the 1990s found that higher education mothers had taller sons and daughters and that these differences in offspring height were fully explained by parental height 26 . This can be explained also by inheritance of socio-economic factors and not only genetic factors affecting height. However, there is also direct evidence on a modest genetic correlation (r = 0.13) between education and height based on linkage disequilibrium score regression analyses 29 . Thus, a not unreasonable hypothesis is that genetic variance of height can also differ by parental educational level. Such hypotheses will be testable in future studies, with the increasing availability of large genotyped cohorts (e.g. 30 ).
The present study has several strengths. First of all, our large multinational database of twin cohorts, with data on parental education and height over childhood and adulthood, allows comprehensive research on the genetic and environmental influences on individual height differences across parental education categories over the lifespan in different cultural-geographic regions. We had sufficient statistical power to address these questions. The individual-based data, in comparison to literature based meta-analyses, provide important advantages such as better opportunities for statistical modeling and lack of publication bias. However, our study also has limitations. Ethnic-cultural groups are differently represented and the greatest proportion of the database is formed by Caucasian populations following Westernized lifestyles. In addition, most of the height measures were self-reported 31 , which increases measurement error and thus may bias our results toward greater estimates of unique environmental effects. However, this is not likely to explain the main result, i.e., relatively similar genetic and environmental variances of height across the categories of parental educational attainment. Also, when pooling the estimates of variance components from different ages, we could not adjust the SEs for multiple observations at different ages, and thus the 95% CIs are likely to be too narrow. Therefore, the main emphasis should be on the age-specific results, where only one observation from each individual is used.
In conclusion, there is no solid evidence that lower parental education is related to greater environmental variation in offspring height from infancy through adulthood. Thus, our findings indicate that the heritability estimates of height are quite uniform across parental education levels in spite of differences in mean height.

Materials and methods
Sample. This study is performed with data from the CODATwins project, which was planned to pool information on height and weight data from all twin projects in the world 31 . Additional information on paternal and maternal education was available for 29 twin cohorts from 15 countries. The participating twin cohorts are listed in Table 1 (footnote) and were described in detail elsewhere 25,31 .
In the original database, there were 137,867 twin individuals with a total of 311,087 height measurements at ages 1-69 years. Age was classified into single-year age groups from age 1 to 19 years (e.g. age 1 includes the 0.5-1.5 years range) and one adult age group (20-69 years); height measures at ages ≥70 years were excluded because individuals in old age are more prone to develop osteoporosis leading to shorter height 32 . Outliers and implausible values were checked by visual inspection for each age and sex group and removed (0.1% of the measurements) to obtain an approximately normal distribution, resulting in 310,736 measurements. To ensure that all analyses are based on independent observations, we selected one height measure per individual in each age group by keeping the measurement at the youngest age (removing <10% of the measurements), resulting in 282,176 height measurements from 137,574 twin individuals. After excluding twins without data on their co-twins, we had 264,610 height measurements (132,305 paired height measurements; 38% monozygotic (MZ), 34% same-sex dizygotic (SSDZ) and 28% opposite-sex dizygotic (OSDZ) twin pairs) from 65,978 complete twin pairs (the number of observations by age and twin cohort is available on request). The different educational classifications used in the surveys were transformed into educational years by using the mean level of educational years in each category as described in detail elsewhere 25 .
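As an illustration of the age grouping and the one-measurement-per-age-group rule described above, the following minimal Python sketch may be helpful. It is not the code used in the study. The function names, the tuple layout and the treatment of boundary ages (for instance, assigning exactly 1.5 years to age group 2 and ages from 19.5 to just under 20 years to the adult group) are assumptions made for the example.

```python
def age_group(age_years):
    """Map an exact age to an age-group label; None means the measure is excluded.

    Assumed boundaries: group k (1..19) covers k-0.5 <= age < k+0.5,
    the adult group covers 19.5 <= age < 70, and ages >= 70 are dropped.
    """
    if age_years < 0.5 or age_years >= 70:
        return None
    if age_years < 19.5:
        return str(int(age_years + 0.5))  # e.g. 0.5 <= age < 1.5 -> "1"
    return "20-69"


def keep_youngest(measurements):
    """Keep one height per individual and age group: the one at the youngest age.

    measurements: iterable of (twin_id, age_years, height_cm) tuples
    (a hypothetical layout). Returns {(twin_id, group): (age_years, height_cm)}.
    """
    kept = {}
    for twin_id, age, height in measurements:
        group = age_group(age)
        if group is None:
            continue  # outside the analysed age range
        key = (twin_id, group)
        if key not in kept or age < kept[key][0]:
            kept[key] = (age, height)
    return kept
```

In this sketch an adult measured at ages 25 and 40 contributes only the measurement at age 25 to the 20-69 group, so each individual appears at most once per age group.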
In order to analyze possible differences in the genetic and environmental contribution on height across geographical-cultural regions, the cohorts were grouped in three regions: Europe (10 cohorts), North America and Australia (12 cohorts) and East Asia (5 cohorts) with 88,632, 34,087 and 8,873 paired height measurements, respectively. Two cohorts (Israel and Turkey) were not included in these sub-analyses by geographic-cultural region because the populations in these countries differ genetically from European populations 33 , and the data were too sparse to study these cohorts separately. The same classification was used also in our previous studies on the genetics of height in childhood 19 and adulthood 18 based on the CODATwins database.
All participants were volunteers and they or their parents/legal guardians gave informed consent when participating in their original study. Only a limited set of observational variables and anonymized data were delivered to the data management center at University of Helsinki. The pooled analysis was approved by the ethical committee of Department of Public Health, University of Helsinki, and the methods were carried out in accordance with the approved guidelines.
Statistical analyses. Statistical analyses were conducted using Stata statistical software (version 14.0; StataCorp, College Station, Texas, USA). First, all height measurements were adjusted for exact age and twin cohort within each age and sex group using linear regression model (height was used as the dependent variable and exact age and twin cohort as independent variables) and the resulting residuals were used as the outcome variable in the further statistical modeling. Twin cohorts were numbered as a nominal level variable in the regression analyses (i.e., a separate dummy variable was created for each twin cohort). Since paternal and maternal education (ranging from 0 to 30 years) may be differently associated with offspring birth year, we adjusted maternal and paternal education separately for twin cohort and birth year of their twin children (used as a proxy indicator for the birth years of parents) by fitting a regression model (maternal or paternal education was used as the dependent variable and twin cohort and birth year of their twin children as independent variables). Thus, the residuals indicate how much shorter or longer the parental education duration is as compared with that of the average person having a certain birth year in each twin cohort. These regression residuals were then summed up to get combined parental education and divided into three SD-based categories (<−0.5, −0.5 to +0.5, > +0.5), indicating low, intermediate and high parental education (31%, 40% and 29% of the observations, respectively).
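The adjustment and categorization steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Stata code used in the study. The function names are hypothetical, the regression uses a single continuous covariate plus one dummy per cohort, and the use of the sample SD for the cut-offs is an assumption. The residuals are obtained by demeaning within cohort and fitting one pooled slope, which gives the same residuals as ordinary least squares with cohort dummies (the Frisch-Waugh-Lovell result).

```python
from collections import defaultdict
from statistics import mean, stdev


def residualize(y, x, cohort):
    """Residuals of y from an OLS fit on x plus one dummy variable per cohort."""
    members = defaultdict(list)
    for i, c in enumerate(cohort):
        members[c].append(i)
    y_dm, x_dm = [0.0] * len(y), [0.0] * len(x)
    for idx in members.values():
        my = mean(y[i] for i in idx)
        mx = mean(x[i] for i in idx)
        for i in idx:
            y_dm[i], x_dm[i] = y[i] - my, x[i] - mx  # remove cohort means
    sxx = sum(v * v for v in x_dm)
    slope = sum(a * b for a, b in zip(x_dm, y_dm)) / sxx if sxx else 0.0
    return [b - slope * a for a, b in zip(x_dm, y_dm)]


def education_categories(maternal_resid, paternal_resid):
    """Sum the two residuals and cut the sum at -0.5 and +0.5 SD."""
    combined = [m + p for m, p in zip(maternal_resid, paternal_resid)]
    centre, spread = mean(combined), stdev(combined)  # sample SD (assumption)
    labels = []
    for value in combined:
        z = (value - centre) / spread
        labels.append("low" if z < -0.5 else "high" if z > 0.5 else "intermediate")
    return labels
```

Here `residualize` would be called once for height (with exact age as `x`) and once each for maternal and paternal education (with the twins' birth year as `x`), and the two education residuals are then passed to `education_categories`.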
We first studied the association between height and parental education separately for each age and sex group in all cohorts together as well as by the geographic-cultural regions. Linear regression models were used with parental education as the explanatory variable and height residuals as the outcome. The associations were adjusted for zygosity because of slight differences in height 34 and parental education between MZ and DZ twins 25 . The non-independence within twin pairs was taken into account by using the cluster-option available in Stata 35 . This option takes into account that twin pairs rather than independent individuals are sampled and accordingly corrects the standard errors to be larger because of the less informative sample design.
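To make the effect of clustering on the standard errors concrete, the sketch below computes the slope of a simple linear regression together with a cluster-robust (sandwich) standard error, treating each twin pair as one cluster. It is an illustrative simplification rather than a reimplementation of the Stata analysis: the function name is hypothetical, the zygosity adjustment is left out, and the finite-sample correction that Stata applies to clustered variances is omitted.

```python
from collections import defaultdict
from math import sqrt


def slope_with_cluster_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust SE.

    The score x_centered * residual is summed within each cluster before
    squaring, so correlated observations from one pair count as one unit.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    sxx = sum(v * v for v in xc)
    slope = sum(a * b for a, b in zip(xc, yc)) / sxx
    resid = [b - slope * a for a, b in zip(xc, yc)]
    score = defaultdict(float)
    for a, e, g in zip(xc, resid, cluster):
        score[g] += a * e  # within-cluster sum of scores
    variance = sum(s * s for s in score.values()) / sxx ** 2
    return slope, sqrt(variance)
```

If both twins of every pair had identical values, this standard error would be larger by a factor of the square root of two than the one obtained by treating all individuals as independent, which reflects that such pairs carry no more information than single individuals.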
To estimate genetic and environmental influences on the variation of height, we employed classic twin modeling based on linear structural equations 36 . MZ twins share the same genomic sequence, whereas DZ twins share, on average, 50% of their genes identical-by-descent. On this basis, it is possible to decompose the total variance of height into variance due to additive genetic effects (A: correlated 1.0 for MZ and 0.5 for DZ pairs), dominance genetic effects (D: 1.0 for MZ and 0.25 for DZ pairs), common (shared) environmental effects (C: by definition, correlated 1.0 for MZ and DZ pairs) and unique (non-shared) environmental effects (E: by definition, uncorrelated in MZ and DZ pairs). As in our previous studies in children 18 and adults 17 , we found evidence of shared environmental variation but no evidence of dominance genetic variation in height. Thus, we used the additive genetic/shared environment/unique environment (ACE) model in the analyses. Models were fitted separately for each parental education category by age and sex groups. A clear sex-specific genetic effect for height was found in childhood 19 and adulthood 18 , and thus it was included in all models by allowing the opposite-sex DZ genetic correlation to be lower than 0.5. Because DZ twins were slightly taller than MZ twins from infancy to adulthood 34 , different means for MZ and DZ twins were allowed. All genetic models were fitted by the OpenMx package (version 2.0.1) in the R statistical platform 31 using the maximum likelihood method.
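The logic of the decomposition can be illustrated with a short Python sketch. It does not reproduce the maximum likelihood fitting in OpenMx; instead it uses a simple method-of-moments shortcut that follows from the correlations stated above, namely an expected within-pair covariance of A + C for MZ pairs and 0.5A + C for same-sex DZ pairs, with total variance A + C + E. The function names and the numerical values in the usage note are hypothetical.

```python
def expected_twin_moments(a, c, e, genetic_corr):
    """Expected total variance and within-pair covariance under the ACE model.

    genetic_corr is 1.0 for MZ pairs, 0.5 for same-sex DZ pairs, and may be
    set below 0.5 for opposite-sex DZ pairs to mimic a sex-specific effect.
    """
    variance = a + c + e
    covariance = genetic_corr * a + c
    return variance, covariance


def ace_from_moments(variance, cov_mz, cov_dz):
    """Solve the MZ and same-sex DZ moment equations for raw A, C and E."""
    a = 2.0 * (cov_mz - cov_dz)   # (A + C) - (0.5A + C) = 0.5A
    c = 2.0 * cov_dz - cov_mz     # 2(0.5A + C) - (A + C) = C
    e = variance - cov_mz         # (A + C + E) - (A + C) = E
    return a, c, e
```

For example, with hypothetical components A = 30, C = 5 and E = 5 (in cm²), the expected MZ and DZ covariances are 35 and 20, and `ace_from_moments(40.0, 35.0, 20.0)` recovers the three components. Unlike maximum likelihood, this shortcut provides no confidence intervals and can return negative estimates in small samples.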
In order to test whether variance components of height were significantly different between parental education categories, we ran a random-effects meta-regression analysis of the aggregate-level data of raw variance components. Adjustments were carried out for geographic-cultural regions and age categories, and models were run separately by sex and for both sexes together. However, it should be noted that in these analyses the SEs are not corrected for multiple observations and consequently the 95% CI are likely to be somewhat too narrow, possibly leading to a spurious support of the original hypothesis.
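For readers less familiar with the approach, the following Python sketch shows a random-effects meta-regression in its simplest form, with one binary moderator (middle versus low parental education) and a method-of-moments estimate of the between-estimate variance. It is a simplified illustration and not the model fitted here: the adjustments for geographic-cultural region and age are left out, the function name is hypothetical, and the choice of the method-of-moments estimator is an assumption.

```python
from math import sqrt


def re_meta_regression_binary(effects, variances, is_middle):
    """Difference (middle minus low) in pooled estimates with random effects.

    effects: aggregate-level estimates (e.g. raw variance components)
    variances: their squared standard errors
    is_middle: True for middle education, False for low education
    Returns (difference, standard error, tau squared).
    """
    groups = {False: [], True: []}
    for y, v, m in zip(effects, variances, is_middle):
        groups[bool(m)].append((y, v))

    def pooled(rows, tau2):
        weights = [1.0 / (v + tau2) for _, v in rows]
        total = sum(weights)
        centre = sum(w * y for w, (y, _) in zip(weights, rows)) / total
        return centre, total, weights

    # Residual heterogeneity around the fixed-effect group means.
    q_resid, c_total = 0.0, 0.0
    for rows in groups.values():
        centre, total, weights = pooled(rows, 0.0)
        q_resid += sum(w * (y - centre) ** 2 for w, (y, _) in zip(weights, rows))
        c_total += total - sum(w * w for w in weights) / total
    df = len(effects) - 2  # two regression parameters
    tau2 = max(0.0, (q_resid - df) / c_total) if c_total > 0 else 0.0

    low, w_low, _ = pooled(groups[False], tau2)
    mid, w_mid, _ = pooled(groups[True], tau2)
    return mid - low, sqrt(1.0 / w_low + 1.0 / w_mid), tau2
```

Because the input estimates are treated as independent, repeated observations of the same individuals at different ages would make the resulting standard error too small, which is the caveat noted above.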

Data availability
The data used in this study are owned by third parties (the individual twin cohorts) and were made available to us on the condition that they be used only in this meta-analysis.
For this reason, we do not have legal rights to re-deliver the data or to provide them to other third parties without permission from the data owners. In order to replicate the results, researchers need to request the data from the owners of each individual twin cohort and harmonize the data into a metafile.