Nature Genetics | Letter
Population genetic differentiation of height and body mass index across Europe
- Journal name: Nature Genetics
- Volume: 47
- Pages: 1357–1362
- DOI: doi:10.1038/ng.3401
Across-nation differences in the mean values for complex traits are common (refs. 1–8), but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10⁻⁸; BMI, P < 5.95 × 10⁻⁴), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
Figures
-
Figure 1: Observed divergence and predicted genetic divergence in height and BMI across 14 European nations. (a–d) The predicted genetic means (a,c) and observed means (b,d) for height and BMI for 14 European nations are shown across Europe. From recently published data, we estimated national differences in mean height and BMI for 14 European countries, accounting for trends over time, with a European average height of 171.1 cm (95% CI = 169.6, 172.8 cm) and an average BMI of 25.0 (95% CI = 24.7, 25.3) across nations for males between 2000 and 2010.
-
Figure 2: Predicted genetic differentiation compared to the expectation under genetic drift for height and BMI across 14 European nations. (a,b) The mean predicted genetic differentiation (blue) and differentiation under the null model representing genetic drift (gray) are shown for 14 European nations, with 95% credible intervals, for height (a) and BMI (b). ISO 2 country codes indicate each nation. The average P value of differentiation from the null expectation is shown in the figure. (c) The pattern of population-level co-differentiation for height and BMI across 14 European nations. The negative population genetic co-differentiation for these traits of −0.80 (95% CI = −0.95, −0.60) is represented by the blue ellipse.
-
Figure 3: Association between observed population means and predicted genetic population means for height and BMI across 14 European nations. (a,b) Predicted population genetic means are plotted against observed population means for height (a) and BMI (b). P values give the significance of the multivariate Pearson product-moment correlation between the predicted population genetic means and the observed population means for both traits. For height, the correlation (r = 0.51, 95% CI = 0.39, 0.61) was greater than that expected under the null model (r = 0.03, 95% CI = −0.21, 0.17). For BMI, the correlation (r = −0.10, 95% CI = −0.19, 0.01) was not significantly different from the null expectation (r = −0.08, 95% CI = −0.24, 0.15).
-
Supplementary Fig. 1: Overview of the study design.
-
Supplementary Fig. 2: Projection of the prediction samples onto the first two HapMap principal components. For the non-ascertained set of independent (LD r² < 0.1), common, HapMap 3 loci that we used for prediction, we projected our prediction samples onto the first two principal components from HapMap.
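The projection step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that per-SNP loadings and allele frequencies from the reference panel are already available, and it uses the common convention of standardizing allele counts by the reference frequency. The function name and arguments are hypothetical.

```python
import math

def project_onto_pc(genotypes, ref_freqs, loadings):
    """Project one individual's allele counts onto a reference principal component.

    genotypes: allele counts (0, 1 or 2) per SNP
    ref_freqs: reference-panel allele frequency per SNP (strictly between 0 and 1)
    loadings:  per-SNP loadings of the reference principal component
    All three inputs are assumed to list the same SNPs in the same order.
    """
    score = 0.0
    for g, p, w in zip(genotypes, ref_freqs, loadings):
        # Standardize with the reference mean (2p) and binomial s.d. sqrt(2p(1-p)).
        z = (g - 2.0 * p) / math.sqrt(2.0 * p * (1.0 - p))
        score += z * w
    return score
```

Because the mean and scale come from the reference panel rather than from the projected sample, the new individuals do not influence the axes they are placed on.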
-
Supplementary Fig. 3: Simulation study comparing population and within-family association testing. Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with effects sampled from a normal distribution. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS; yellow), in a GWAS that controlled for the first 20 principal components (green) and in a sibling pair within-family design (blue). The effect sizes were then used to predict the phenotype in an independent set of sibling pair data, and a recently derived approach was used to test for population stratification bias in the effect size estimates (Online Methods and ref. 33). Variance attributable to a Cg or a Ce term is indicative of population stratification bias, which can be observed in the GWAS scenario that did not appropriately control for population stratification.
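The contrast between a population-level estimate and a sibling-pair estimate can be illustrated with a toy simulation. The sketch below is far simpler than the study described in the caption: it assumes a single SNP, two subpopulations with different allele frequencies, and a purely environmental mean shift between them. All parameter values and function names are illustrative assumptions.

```python
import random

def transmit(parent, rng):
    """Pass on one of the parent's two alleles at random."""
    return parent[rng.randrange(2)]

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n_families=4000, beta=0.1, seed=1):
    """Return (population estimate, within-family estimate) of the SNP effect."""
    rng = random.Random(seed)
    freqs = (0.2, 0.8)   # assumed allele frequency in each subpopulation
    env = (0.0, 1.0)     # assumed environmental mean shift (stratification)
    g1, g2, y1, y2 = [], [], [], []
    for _ in range(n_families):
        pop = rng.randrange(2)
        p = freqs[pop]
        mother = [int(rng.random() < p) for _ in range(2)]
        father = [int(rng.random() < p) for _ in range(2)]
        sibs = []
        for _ in range(2):
            g = transmit(mother, rng) + transmit(father, rng)
            y = beta * g + env[pop] + rng.gauss(0.0, 1.0)
            sibs.append((g, y))
        g1.append(sibs[0][0]); y1.append(sibs[0][1])
        g2.append(sibs[1][0]); y2.append(sibs[1][1])
    # Population design: one sibling per family, no stratification control.
    pop_est = slope(g1, y1)
    # Within-family design: sibling differences cancel the shared environment.
    dg = [a - b for a, b in zip(g1, g2)]
    dy = [a - b for a, b in zip(y1, y2)]
    fam_est = slope(dg, dy)
    return pop_est, fam_est
```

Under these assumptions the population estimate absorbs the environmental difference between subpopulations, whereas the sibling-difference estimate stays close to the simulated effect, because siblings share the same subpopulation.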
-
Supplementary Fig. 4: Simulation study comparing population and within-family association. Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and then two groups were selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown for four cases: effect sizes estimated in a GWAS that did not control for population stratification, effect sizes estimated in a within-family design, the true simulated effect sizes, and within-family effect sizes randomly allocated to loci under our null model. A predictor created from within-family estimates of effect size yields similar estimates to the true simulated values and the null model, and in no simulation were our predictions significantly different from our null model. This result held irrespective of whether loci were ascertained from a discovery GWAS containing population stratification bias: when we selected the top 100 or top 500 loci from the GWAS, re-estimated the effects in a within-family analysis and created a predictor, we found no evidence of differentiation from our null model. A predictor created from GWAS estimates of effect size that are biased by population stratification yields variable estimates of the prediction difference between the two groups, and in 10 of the 50 simulations our predictions differed significantly from our null model.
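The null model mentioned in this caption, in which effect sizes are randomly allocated to loci, can be sketched as a permutation test on the difference in mean predictor between the two groups. This is a hedged illustration only: it assumes the group mean predictor is the sum of effect size times twice the group allele frequency, and the function names, the two-sided comparison and the number of permutations are assumptions rather than the authors' exact procedure.

```python
import random

def predictor_difference(effects, freq_upper, freq_lower):
    """Difference in mean predictor between two groups.

    The mean predictor of a group is taken as sum_j effect_j * 2 * freq_j,
    so the difference depends only on the allele frequency differences.
    """
    return sum(b * 2.0 * (pu - pl)
               for b, pu, pl in zip(effects, freq_upper, freq_lower))

def permutation_p_value(effects, freq_upper, freq_lower, n_perm=999, seed=0):
    """Empirical two-sided P value under random allocation of effects to loci."""
    rng = random.Random(seed)
    observed = abs(predictor_difference(effects, freq_upper, freq_lower))
    shuffled = list(effects)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between effect and locus
        null_diff = abs(predictor_difference(shuffled, freq_upper, freq_lower))
        if null_diff >= observed:
            exceed += 1
    # Add one to numerator and denominator so the P value is never zero.
    return (exceed + 1) / (n_perm + 1)
```

Shuffling keeps the distribution of effect sizes and the allele frequency differences intact, so a small P value indicates only that large effects sit at loci whose frequencies differ in a consistent direction.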
-
Supplementary Fig. 5: Simulation study comparing population and within-family association testing showing potential ascertainment bias. Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Genotype-environment correlation was induced through a phenotypic mean difference of 0.5 s.d. along the first principal component of the discovery sample. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and two groups were then selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown for four cases: effect sizes estimated in a GWAS that did not control for population stratification, effect sizes estimated in a within-family design, the true simulated effect sizes, and within-family effect sizes randomly allocated to loci under our null model. Ascertainment bias is evident here: if the top 100 or 500 loci from a GWAS containing population stratification are selected to create a predictor, a significant deviation from the null model is observed, irrespective of whether biased SNP effect estimates from the GWAS or unbiased SNP effect estimates from the within-family analysis are used. This is because the loci selected from the GWAS were those where the genotype-environment correlation was the strongest. In this simulation scenario, there is no selection on the phenotype and, thus, when there is no ascertainment of loci (when genome-wide SNPs are used to create the predictor), no prediction differences from the true values or the null model were evident.
-
Supplementary Fig. 6: Proportion of population-level variance in a genetic predictor comprised of different sets of SNPs for height and BMI. Genetic predictors were created from independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci selected at different significance thresholds (P < 5 × 10⁻⁸, P < 5 × 10⁻⁶, P < 5 × 10⁻⁴, P < 0.005, P < 0.05, P < 0.1) from large-scale meta-analyses. All refers to genome-wide independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci that were either selected preferentially on the basis of their within-family association with either trait or selected at random.
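Building a predictor at each significance threshold can be sketched as below. This is a minimal illustration that assumes the loci have already been pruned for independence and that the predictor is a simple sum of effect size times allele count; the dictionaries, SNP labels and function names are hypothetical.

```python
# Significance thresholds listed in the caption.
THRESHOLDS = (5e-8, 5e-6, 5e-4, 0.005, 0.05, 0.1)

def select_snps(meta_p, threshold):
    """Return SNPs whose meta-analysis P value is below the threshold."""
    return [snp for snp, p in meta_p.items() if p < threshold]

def profile_score(genotype, effects, snps):
    """Sum of effect size times allele count over the selected SNPs."""
    return sum(effects[s] * genotype[s] for s in snps)

def scores_by_threshold(genotype, effects, meta_p, thresholds=THRESHOLDS):
    """Profile score of one individual at each significance threshold."""
    return {t: profile_score(genotype, effects, select_snps(meta_p, t))
            for t in thresholds}
```

Each more permissive threshold adds loci to the predictor, so the scores at successive thresholds are nested sums over growing SNP sets.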
-
Supplementary Fig. 7: Increased prediction accuracy results in increased population-level variance. We selected SNPs from large-scale meta-analyses, re-estimated their effects in a within-family design that is unbiased by population stratification and then assessed prediction accuracy in an independent sample for (a) height and (b) body mass index. We find that more population-level variance is captured when the predictor explains a greater amount of phenotypic variation.
-
Supplementary Fig. 8: Genome-wide pattern of population genetic differentiation for height and body mass index.
-
Supplementary Fig. 9: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for height. Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ² value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.
-
Supplementary Fig. 10: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for body mass index. Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ² value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.
-
Supplementary Fig. 11: Simulation study of the general approach. (a) Simulated distribution of allele frequency differentiation, θ, at 10,000 loci plotted against allele frequency. (b) The simulated association between θ and the additive genetic variance contributed by each locus, which assumes that the most differentiated loci are those that contribute most to the additive genetic variance. (c) Profile scores were calculated from different sets of simulated loci, where each set explained differing amounts of the total variance. Population-level variance is shown for each set of loci, demonstrating that, even when the loci cumulatively explain only 20% of the total variance, 50% of the population-level genetic variance is captured. (d) Error variance for each locus was simulated from a normal distribution with variance equal to a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. Population-level variance is shown at different error variances, demonstrating that including a large number of false positives decreases the population-level effects.
-
Supplementary Fig. 12: Simulation study of the error variance induced by the addition of null SNPs. The error variance for each locus was simulated from a normal distribution as a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. The 95% confidence intervals are shown for the population mean profile score of 11 simulated populations across increasing amounts of incorporated error variance: (a) 0%, (b) 1%, (c) 2.5%, (d) 5%, (e) 7.5% and (f) 10%. This pattern reflects the fact that including a large number of false positives decreases the amount of population-level variance estimated.
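The effect of added error variance on the population-level share of variance can be illustrated with a small simulation. This sketch makes several assumptions not stated in the captions: equal-sized populations, unit within-population variance, and a naive variance share computed as the variance of population means divided by the total variance. Function names and defaults are hypothetical.

```python
import random
import statistics

def simulate_scores(pop_means, n_per_pop=200, seed=0):
    """Profile scores for each population: its mean plus unit-variance noise."""
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, 1.0) for _ in range(n_per_pop)]
            for m in pop_means]

def add_error(scores, error_fraction, seed=1):
    """Add normal error with variance equal to a fraction of the total variance."""
    rng = random.Random(seed)
    total_var = statistics.pvariance([s for pop in scores for s in pop])
    sd = (error_fraction * total_var) ** 0.5
    return [[s + rng.gauss(0.0, sd) for s in pop] for pop in scores]

def population_variance_share(scores):
    """Naive share of variance among populations (equal group sizes assumed)."""
    pooled = [s for pop in scores for s in pop]
    means = [statistics.fmean(pop) for pop in scores]
    return statistics.pvariance(means) / statistics.pvariance(pooled)
```

Calling add_error with fractions of 0, 0.01, 0.025, 0.05, 0.075 and 0.10, the values named in the caption, and then population_variance_share on each result shows how the share shrinks as error accumulates. The added error inflates the total variance while leaving the population means almost unchanged.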
-
Supplementary Fig. 13: Overlap in the annotation of differentiated loci to genes for height and BMI. Five hundred SNP loci were selected that are expected to contribute most to the pattern of population genetic differentiation for height and body mass index. These SNP loci were annotated to genes, and the overlap was estimated. The annotation was then repeated by randomly selecting 500 loci from the top 10,000 SNP loci 100 times. The 95% confidence interval of the percentage of overlapping genes across the 100 sampling steps was 8.4–19.4%. In the expected top 500 loci contributing to population genetic variation of each trait, the percentage overlap was 19.8%.
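The resampling comparison in this caption can be sketched as follows. The caption does not define how the percentage overlap was calculated, so this sketch assumes shared genes divided by the union of both gene sets; the locus-to-gene mapping `gene_of` and all function names are hypothetical.

```python
import random
import statistics

def overlap_percent(genes_a, genes_b):
    """Shared genes as a percentage of the union of both gene sets (assumed definition)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0

def resampled_overlap_interval(loci_a, loci_b, gene_of,
                               n_draw=500, n_rep=100, seed=0):
    """Approximate 95% interval of overlap for randomly drawn loci.

    loci_a, loci_b: lists of candidate loci per trait (e.g. the top 10,000)
    gene_of:        dict mapping each locus to its annotated gene
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        genes_a = {gene_of[l] for l in rng.sample(loci_a, n_draw)}
        genes_b = {gene_of[l] for l in rng.sample(loci_b, n_draw)}
        values.append(overlap_percent(genes_a, genes_b))
    # Cut points at 2.5% steps; the first and last bound the central 95%.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]
```

The observed overlap for the top 500 loci of each trait can then be compared against the returned interval to judge whether it exceeds what random draws produce.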
-
Supplementary Fig. 14: Population genetic differentiation across six Italian villages. Predicted population genetic differentiation for BMI and height on a small scale across six northern Italian villages. There was no significant differentiation from the null model for height (χ² = 0.23, P = 0.985) and for BMI (χ² = 6.27, P = 0.817) at any set of SNPs, and we present the results across the genome. The villages are Sauris (sa), Resia (re), Illegio (il), Erto (er), Clausetto (cl) and San Martino del Corso (sm) in the Friuli-Venezia Giulia region in northeastern Italy. On this small scale, we find no evidence for population differentiation for either phenotype.
-
Supplementary Fig. 15: Population genetic differentiation in the Human Genetic Diversity Panel for independent genome-wide SNPs ascertained on within-family effect sizes. (a) Height and (b) BMI. P values give the deviation of the predicted means from the null expectation. The proportion of population-level variance was 17.5% (95% CI = 9.6%, 27.9%) for height and 13.1% (95% CI = 7.2%, 21.3%) for BMI. Worldwide, there was no evidence of any population-level genetic correlation (r = 0.007, 95% CI = −0.051, 0.064).