Population genetic differentiation of height and body mass index across Europe

Journal name: Nature Genetics
Volume: 47
Pages: 1357–1362
DOI: 10.1038/ng.3401

Across-nation differences in the mean values for complex traits are common (refs. 1–8), but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10^−8; BMI, P < 5.95 × 10^−4), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
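
The quantity reported above, the share of a genetic predictor's variance that lies between populations, can be illustrated with a simple moment-based decomposition. The sketch below is not the study's model (the credible intervals suggest a Bayesian analysis). It is a minimal stand-in in which every name (`profile_score`, `among_population_share`, `simulate_scores`), every effect size and every allele frequency is a hypothetical toy value.

```python
import random
from statistics import mean, pvariance


def profile_score(genotype, effects):
    """Genetic predictor: allele counts (0, 1 or 2) weighted by effect sizes."""
    return sum(g * b for g, b in zip(genotype, effects))


def among_population_share(scores_by_pop):
    """Share of the total predictor variance that lies between population means."""
    all_scores = [s for scores in scores_by_pop.values() for s in scores]
    grand_mean = mean(all_scores)
    # Size-weighted squared deviations of population means from the grand mean.
    between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_pop.values()
    ) / len(all_scores)
    return between / pvariance(all_scores)


def simulate_scores(allele_freqs, effects, n_individuals, rng):
    """Draw toy genotypes for one population and score each individual."""
    scores = []
    for _ in range(n_individuals):
        # Two independent allele draws per locus give a count of 0, 1 or 2.
        genotype = [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        scores.append(profile_score(genotype, effects))
    return scores


rng = random.Random(1)
n_loci = 50
effects = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]  # hypothetical effect sizes
base = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]   # hypothetical base frequencies
# Hypothetical populations: allele frequencies vary slightly around a shared base.
toy_freqs = {
    pop: [min(0.95, max(0.05, p + rng.gauss(0.0, 0.05))) for p in base]
    for pop in ("pop_a", "pop_b", "pop_c")
}
toy_scores = {
    pop: simulate_scores(freqs, effects, 200, rng)
    for pop, freqs in toy_freqs.items()
}
share = among_population_share(toy_scores)
print(f"among-population share of predictor variance: {share:.3f}")
```

By the law of total variance, the share is 0 when all population means coincide and 1 when individuals within each population are identical. Real data would also need the sampling error in the effect sizes to be taken into account.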

Figures

  1. Figure 1: Observed divergence and predicted genetic divergence in height and BMI across 14 European nations.

    (a–d) The predicted genetic means (a,c) and observed means (b,d) for height and BMI for 14 European nations are shown across Europe. From recently published data, we estimated national differences in mean height and BMI for 14 European countries, accounting for trends over time, with a European average height of 171.1 cm (95% CI = 169.6, 172.8 cm) and an average BMI of 25.0 (95% CI = 24.7, 25.3) across nations for males between 2000 and 2010.

  2. Figure 2: Predicted genetic differentiation compared to the expectation under genetic drift for height and BMI across 14 European nations.

    (a,b) The mean predicted genetic differentiation (blue) and differentiation under the null model representing genetic drift (gray) are shown for 14 European nations, with 95% credible intervals, for height (a) and BMI (b). ISO 2 country codes indicate each nation. The average P value of differentiation from the null expectation is shown in the figure. (c) The pattern of population-level co-differentiation for height and BMI across 14 European nations. The negative population genetic co-differentiation for these traits of −0.80 (95% CI = −0.95, −0.60) is represented by the blue ellipse.

  3. Figure 3: Association between observed population means and predicted genetic population means for height and BMI across 14 European nations.

    (a,b) Predicted population genetic means are plotted against observed population means for height (a) and BMI (b). P values give the significance of the multivariate Pearson product-moment correlation between the predicted population genetic means and the observed population means for both traits. For height, the correlation (r = 0.51, 95% CI = 0.39, 0.61) was greater than that expected under the null model (r = 0.03, 95% CI = −0.21, 0.17). For BMI, the correlation (r = −0.10, 95% CI = −0.19, 0.01) was not significantly different from the null expectation (r = −0.08, 95% CI = −0.24, 0.15).

  4. Supplementary Fig. 1: Overview of the study design.
  5. Supplementary Fig. 2: Projection of the prediction samples onto the first two HapMap principal components.

    For the non-ascertained set of independent (LD r² < 0.1), common, HapMap 3 loci that we used for prediction, we projected our prediction samples onto the first two principal components from HapMap.

  6. Supplementary Fig. 3: Simulation study comparing population and within-family association testing.

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with effects sampled from a normal distribution. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS; yellow), in a GWAS that controlled for the first 20 principal components (green) and in a sibling pair within-family design (blue). The effect sizes were then used to predict the phenotype in an independent set of sibling pair data, and a recently derived approach was used to test for population stratification bias in the effect size estimates (Online Methods and ref. 33). Variance attributable to a Cg or a Ce term is indicative of population stratification bias, which can be observed in the GWAS scenario that did not appropriately control for population stratification.

  7. Supplementary Fig. 4: Simulation study comparing population and within-family association.

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and then two groups were selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown when the predictor was created using effect sizes estimated in a GWAS that did not control for population stratification, using effect sizes estimated in a within-family design, using the true simulated effect sizes and when the within-family effect sizes were randomly allocated to loci under our null model. A predictor created from within-family estimates of effect size yields similar estimates to the true simulated values and the null model, and in no simulation were our predictions significantly different from our null model. This result was irrespective of whether there was ascertainment of loci from a discovery GWAS containing population stratification bias, as when we selected the top 100 or top 500 loci from the GWAS, re-estimated the effects in a within-family analysis and created a predictor we found no evidence of differentiation from our null model. A predictor created from GWAS estimates of effect size that are biased by population stratification yields variable estimates of the prediction difference between the two groups, and in 10 of the 50 simulations our predictions differed significantly from our null model.

  8. Supplementary Fig. 5: Simulation study comparing population and within-family association testing showing potential ascertainment bias.

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Genotype-environment correlation was induced through a phenotypic mean difference of 0.5 s.d. along the first principal component of the discovery sample. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and two groups were then selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown when the predictor was created using effect sizes estimated in a GWAS that did not control for population stratification, using effect sizes estimated in a within-family design, using the true simulated effect sizes and when the within-family effect sizes were randomly allocated to loci under our null model. Ascertainment bias is evident here: if the top 100 or 500 loci from a GWAS containing population stratification were selected to create a predictor, a significant deviation from the null model is observed, irrespective of whether biased SNP effect estimates from the GWAS or unbiased SNP effect estimates from the within-family analysis were used. This is because the loci selected from the GWAS were those where the genotype-environment correlation was the strongest. In this simulation scenario, there is no selection on the phenotype and, thus, when there is no ascertainment of loci (when genome-wide SNPs are used to create the predictor), no prediction differences from the true values or the null model were evident.

  9. Supplementary Fig. 6: Proportion of population-level variance in a genetic predictor comprised of different sets of SNPs for height and BMI.

    Genetic predictors were created from independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci selected at different significance thresholds (P < 5 × 10^−8, P < 5 × 10^−6, P < 5 × 10^−4, P < 0.005, P < 0.05, P < 0.1) from large-scale meta-analyses. "All" refers to genome-wide independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci that were either selected preferentially on the basis of their within-family association with either trait or selected at random.

  10. Supplementary Fig. 7: Increased prediction accuracy results in increased population-level variance.

    We selected SNPs from large-scale meta-analyses, re-estimated their effects in a within-family design that is unbiased of population stratification and then assessed prediction accuracy in an independent sample for (a) height and (b) body mass index. We find that the population-level variance is better captured when a greater amount of phenotypic variation is explained by the predictor.

  11. Supplementary Fig. 8: Genome-wide pattern of population genetic differentiation for height and body mass index.
  12. Supplementary Fig. 9: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for height.

    Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ² value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.

  13. Supplementary Fig. 10: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for body mass index.

    Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ² value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.

  14. Supplementary Fig. 11: Simulation study of the general approach.

    (a) Simulated distribution of allele frequency differentiation, θ, at 10,000 loci plotted against allele frequency. (b) The simulated association between θ and the additive genetic variance contributed by each locus, which assumes that the most differentiated loci are those that contribute most to the additive genetic variance. (c) Profile scores were calculated from different sets of simulated loci, where each set explained differing amounts of the total variance. Population-level variance is shown for each set of loci, demonstrating that, even when the loci cumulatively explain only 20% of the total variance, 50% of the population-level genetic variance is captured. (d) Error variance for each locus was simulated from a normal distribution with variance equal to a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. Population-level variance is shown at different error variances, demonstrating that including a large number of false positives decreases the population-level effects.

  15. Supplementary Fig. 12: Simulation study of the error variance induced by the addition of null SNPs.

    The error variance for each locus was simulated from a normal distribution as a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. The 95% confidence intervals are shown for the population mean profile score of 11 simulated populations across increasing amounts of incorporated error variance: (a) 0%, (b) 1%, (c) 2.5%, (d) 5%, (e) 7.5% and (f) 10%. This pattern reflects the fact that including a large number of false positives decreases the amount of population-level variance estimated.

  16. Supplementary Fig. 13: Overlap in the annotation of differentiated loci to genes for height and BMI.

    Five hundred SNP loci were selected that are expected to contribute most to the pattern of population genetic differentiation for height and body mass index. These SNP loci were annotated to genes, and the overlap was estimated. The annotation was then repeated by randomly selecting 500 loci from the top 10,000 SNP loci 100 times. The 95% confidence interval of the percentage of overlapping genes across the 100 sampling steps was 8.4–19.4%. In the expected top 500 loci contributing to population genetic variation of each trait, the percentage overlap was 19.8%.

  17. Supplementary Fig. 14: Population genetic differentiation across six Italian villages.

    Predicted population genetic differentiation for BMI and height on a small scale across six northern Italian villages. There was no significant differentiation from the null model for height (χ² = 0.23, P = 0.985) and for BMI (χ² = 6.27, P = 0.817) at any set of SNPs, and we present the results across the genome. The villages are Sauris (sa), Resia (re), Illegio (il), Erto (er), Clausetto (cl) and San Martino del Corso (sm) in the Friuli-Venezia Giulia region in northeastern Italy. On this small scale, we find no evidence for population differentiation for either phenotype.

  18. Supplementary Fig. 15: Population genetic differentiation in the Human Genetic Diversity Panel for independent genome-wide SNPs ascertained on within-family effect sizes.

    (a) Height and (b) BMI. P values give the deviation of the predicted means from the null expectation. The proportion of population-level variance was 17.5% (95% CI = 9.6, 27.9) for height and 13.1% (95% CI = 7.2, 21.3) for BMI. Worldwide, there was no evidence of any population-level genetic correlation (0.007, 95% CI = –0.051, 0.064).
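
Several captions above compare predicted genetic means with a null model in which within-family effect sizes are randomly allocated to loci. A minimal permutation sketch of that idea follows. It assumes the test statistic is simply the variance among predicted population means, which is a simplification chosen for illustration and may differ from the statistic and null model actually used in the study. The function names, allele frequencies and effect sizes are all hypothetical.

```python
import random
from statistics import pvariance


def population_means(freqs_by_pop, effects):
    """Expected predictor mean per population: sum over loci of 2 * p * beta."""
    return {
        pop: sum(2 * p * b for p, b in zip(freqs, effects))
        for pop, freqs in freqs_by_pop.items()
    }


def divergence(freqs_by_pop, effects):
    """Variance among the predicted population means (assumed test statistic)."""
    return pvariance(list(population_means(freqs_by_pop, effects).values()))


def permutation_p_value(freqs_by_pop, effects, n_perm=999, seed=0):
    """Share of random effect-to-locus allocations at least as divergent as observed."""
    rng = random.Random(seed)
    observed = divergence(freqs_by_pop, effects)
    shuffled = list(effects)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # null: effect sizes carry no information about loci
        if divergence(freqs_by_pop, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 5 differentiated loci out of 40, and only those
# 5 loci have non-zero effects on the trait.
n_loci, n_diff = 40, 5
toy_freqs = {
    "north": [0.7] * n_diff + [0.5] * (n_loci - n_diff),
    "mid": [0.5] * n_loci,
    "south": [0.3] * n_diff + [0.5] * (n_loci - n_diff),
}
toy_effects = [1.0] * n_diff + [0.0] * (n_loci - n_diff)
p_value = permutation_p_value(toy_freqs, toy_effects)
print(f"permutation P value: {p_value:.4f}")
```

Shuffling preserves the distribution of effect sizes while breaking any link between a locus's effect and its allele frequency differences. Divergence larger than most shuffles therefore indicates that trait-increasing alleles are systematically more frequent in some populations.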

References

  1. Moussavi, S. et al. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet 370, 851–858 (2007).
  2. Wild, S., Roglic, G., Green, A., Sicree, R. & King, H. Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care 27, 1047–1053 (2004).
  3. Dye, C. Global burden of tuberculosis: estimated incidence, prevalence, and mortality by country. J. Am. Med. Assoc. 282, 677–686 (1999).
  4. Lopez, A.D., Mathers, C.D., Ezzati, M., Jamison, D.T. & Murray, C.J.L. Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet 367, 1747–1757 (2006).
  5. Wang, H. et al. Age-specific and sex-specific mortality in 187 countries, 1970–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet 380, 2071–2094 (2012).
  6. Jemal, A., Center, M.M., DeSantis, C. & Ward, E.M. Global patterns of cancer incidence and mortality rates and trends. Cancer Epidemiol. Biomarkers Prev. 19, 1893–1907 (2010).
  7. Kim, A.S. & Johnston, S.C. Global variation in the relative burden of stroke and ischemic heart disease. Circulation 124, 314–323 (2011).
  8. Johnston, S.C., Mendis, S. & Mathers, C.D. Global variation in stroke burden and mortality: estimates from monitoring, surveillance, and modelling. Lancet Neurol. 8, 345–354 (2009).
  9. Yang, J., Visscher, P.M. & Wray, N.R. Sporadic cases are the norm for complex disease. Eur. J. Hum. Genet. 18, 1039–1043 (2010).
  10. Hill, W.G., Goddard, M.E. & Visscher, P.M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
  11. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
  12. Morris, A.P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
  13. Lee, S.H. et al. Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis. Hum. Mol. Genet. 22, 832–841 (2013).
  14. Yang, J. et al. Ubiquitous polygenicity of human complex traits: genome-wide analysis of 49 traits in Koreans. PLoS Genet. 9, e1003355 (2013).
  15. Robinson, M.R., Wray, N.R. & Visscher, P.M. Explaining additional genetic variation in complex traits. Trends Genet. 30, 124–132 (2014).
  16. Abegunde, D.O., Mathers, C.D., Adam, T., Ortegon, M. & Strong, K. The burden and costs of chronic diseases in low-income and middle-income countries. Lancet 370, 1929–1938 (2007).
  17. Kim, A.S. & Johnston, S.C. Temporal and geographic trends in the global stroke epidemic. Stroke 44, S123–S125 (2013).
  18. Ezzati, M. & Riboli, E. Can noncommunicable diseases be prevented? Lessons from studies of populations and individuals. Science 337, 1482–1487 (2012).
  19. Hartl, D.L. & Clark, A.G. Principles of Population Genetics (Sinauer Associates, 1997).
  20. Leinonen, T., McCairns, R.J.S., O'Hara, R.B. & Merilä, J. QST-FST comparisons: evolutionary and ecological insights from genomic heterogeneity. Nat. Rev. Genet. 14, 179–190 (2013).
  21. James, P.T., Rigby, N. & Leach, R. The obesity epidemic, metabolic syndrome and future prevention strategies. Eur. J. Cardiovasc. Prev. Rehabil. 11, 3–8 (2004).
  22. Popkin, B.M. Global nutrition dynamics: the world is shifting rapidly toward a diet linked with noncommunicable diseases. Am. J. Clin. Nutr. 84, 289–298 (2006).
  23. Wang, Y.C., McPherson, K., Marsh, T., Gortmaker, S.L. & Brown, M. Health and economic burden of the projected obesity trends in the USA and the UK. Lancet 378, 815–825 (2011).
  24. Ng, M. et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 384, 766–781 (2014).
  25. Finucane, M.M. et al. National, regional, and global trends in body-mass index since 1980: systematic analysis of health examination surveys and epidemiological studies with 960 country-years and 9·1 million participants. Lancet 377, 557–567 (2011).
  26. Turchin, M.C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).
  27. Speliotes, E.K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
  28. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
  29. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
  30. Yang, J. et al. FTO genotype is associated with phenotypic variability of body mass index. Nature 490, 267–272 (2012).
  31. Amato, R., Miele, G., Monticelli, A. & Cocozza, S. Signs of selective pressure on genetic variants affecting human height. PLoS ONE 6, e27588 (2011).
  32. Berg, J.J. & Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).
  33. Wood, A.R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
  34. Locke, A.E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
  35. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  36. Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).
  37. Baten, J. & Blum, M. Growing tall but unequal: new findings and new background evidence on anthropometric welfare in 156 countries, 1810–1989. Econ. Hist. Dev. Reg. 27, S66–S85 (2012).
  38. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
  39. Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. & Clark, A.G. Recent and ongoing selection in the human genome. Nat. Rev. Genet. 8, 857–868 (2007).
  40. Bustamante, C.D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005).
  41. Blekhman, R. et al. Natural selection on genes that underlie human disease susceptibility. Curr. Biol. 18, 883–889 (2008).
  42. Barreiro, L.B., Laval, G., Quach, H., Patin, E. & Quintana-Murci, L. Natural selection has driven population differentiation in modern humans. Nat. Genet. 40, 340–345 (2008).
  43. Akey, J.M. et al. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2, e286 (2004).
  44. Barreiro, L.B. & Quintana-Murci, L. From evolutionary genetics to human immunology: how selection shapes host defence genes. Nat. Rev. Genet. 11, 17–30 (2010).
  45. Vasseur, E. & Quintana-Murci, L. The impact of natural selection on health and disease: uses of the population genetics approach in humans. Evol. Appl. 6, 596–607 (2013).
  46. Chiaroni, J., Underhill, P.A. & Cavalli-Sforza, L.L. Y chromosome diversity, human expansion, drift, and cultural evolution. Proc. Natl. Acad. Sci. USA 106, 20174–20179 (2009).
  47. Ovaskainen, O., Karhunen, M., Zheng, C., Arias, J.M.C. & Merilä, J. A new method to uncover signatures of divergent and stabilizing selection in quantitative traits. Genetics 189, 621–632 (2011).
  48. Diverse Populations Collaborative Group. Weight-height relationships and body mass index: some observations from the Diverse Populations Collaboration. Am. J. Phys. Anthropol. 128, 220–229 (2005).
  49. Lande, R. Genetic variation and phenotypic evolution during allopatric speciation. Am. Nat. 116, 463–479 (1980).
  50. Esko, T. et al. Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity. Eur. J. Hum. Genet. 21, 659–665 (2013).
  51. Weir, B.S. & Hill, W.G. Estimating F-statistics. Annu. Rev. Genet. 36, 721–750 (2002).
  52. Weir, B.S. & Cockerham, C.C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
  53. Cockerham, C.C. & Weir, B.S. Correlations, descent measures: drift with migration and mutation. Proc. Natl. Acad. Sci. USA 84, 8512–8514 (1987).
  54. Williams, A.L., Patterson, N., Glessner, J., Hakonarson, H. & Reich, D. Phasing of many thousands of genotyped samples. Am. J. Hum. Genet. 91, 238–251 (2012).
  55. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 (Bethesda) 1, 457–470 (2011).
  56. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  57. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012).
  58. Hadfield, J.D. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J. Stat. Softw. 33(2), 1–22 (2010).

Author information

Affiliations

  1. Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia.

    • Matthew R Robinson,
    • Gibran Hemani,
    • Konstantin Shakhbazov,
    • Joseph E Powell,
    • Anna Vinkhuyzen,
    • Jian Yang &
    • Peter M Visscher
  2. Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands.

    • Carolina Medina-Gomez &
    • Fernando Rivadeneira
  3. Institute for Maternal and Child Health, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) 'Burlo Garofolo', Trieste, Italy.

    • Massimo Mezzavilla &
    • Paolo Gasparini
  4. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    • Massimo Mezzavilla &
    • Paolo Gasparini
  5. Estonian Genome Center, University of Tartu, Tartu, Estonia.

    • Tonu Esko,
    • Andres Metspalu &
    • Joel N Hirschhorn
  6. Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, Massachusetts, USA.

    • Tonu Esko,
    • Tune H Pers,
    • Sailaja Vedantam &
    • Joel N Hirschhorn
  7. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Tonu Esko,
    • Tune H Pers,
    • Sailaja Vedantam &
    • Joel N Hirschhorn
  8. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

    • Tonu Esko,
    • Tune H Pers,
    • Daniel I Chasman &
    • Joel N Hirschhorn
  9. University of Queensland Diamantina Institute, University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia.

    • Joseph E Powell,
    • Jian Yang &
    • Peter M Visscher
  10. Division of Cancer Epidemiology and Genetics, National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA.

    • Sonja I Berndt
  11. Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

    • Stefan Gustafsson &
    • Erik Ingelsson
  12. Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

    • Anne E Justice
  13. Division of Gastroenterology, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA.

    • Bratati Kahali &
    • Kari E North
  14. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.

    • Bratati Kahali &
    • Kari E North
  15. Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.

    • Adam E Locke,
    • Goncalo R Abecasis &
    • Elizabeth K Speliotes
  16. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.

    • Tune H Pers
  17. Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK.

    • Andrew R Wood &
    • Timothy M Frayling
  18. Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands.

    • Wouter van Rheenen,
    • Leonard H van den Berg &
    • Jan H Veldink
  19. Norwegian Centre for Mental Disorders Research (NORMENT), KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway.

    • Ole A Andreassen
  20. Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark.

    • Thomas M Werge
  21. Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

    • Thomas M Werge
  22. Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus, Denmark.

    • Thomas M Werge
  23. Neuroscience Campus Amsterdam, VU University Medical Center, Amsterdam, the Netherlands.

    • Dorret I Boomsma,
    • Eco J C de Geus &
    • Jouke Jan Hottenga
  24. EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands.

    • Dorret I Boomsma,
    • Eco J C de Geus &
    • Jouke Jan Hottenga
  25. Department of Biological Psychology, VU University Amsterdam, Amsterdam, the Netherlands.

    • Dorret I Boomsma,
    • Eco J C de Geus &
    • Jouke Jan Hottenga
  26. Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA.

    • Daniel I Chasman
  27. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.

    • Erik Ingelsson
  28. Medical Research Council (MRC) Epidemiology Unit, University of Cambridge, Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge, UK.

    • Ruth J F Loos
  29. Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • Ruth J F Loos
  30. Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • Ruth J F Loos
  31. Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • Ruth J F Loos
  32. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

    • Patrik K E Magnusson &
    • Nancy L Pedersen
  33. QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.

    • Nicholas G Martin &
    • Grant W Montgomery
  34. Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

    • Kari E North
  35. Department of Twin Research and Genetic Epidemiology, King's College London, St. Thomas' Hospital, London, UK.

    • Timothy D Spector
  36. Biosciences Research Division, Department of Primary Industries, Melbourne, Victoria, Australia.

    • Michael E Goddard
  37. Department of Food and Agricultural Systems, University of Melbourne, Melbourne, Victoria, Australia.

    • Michael E Goddard

Contributions

Conception and design of the study: M.R.R., M.E.G., J.Y. and P.M.V. Data analysis: M.R.R., with additional contributions from G.H., C.M.-G., M.M., K.S., T.E., J.E.P., A.V., S.I.B., S.G., A.E.J., B.K., A.E.L., T.H.P., S.V., A.R.W. and W.v.R. Study oversight, sample collection and management: J.H.V., L.H.v.d.B., O.A.A., P.G., A.M., F.R., T.M.W., G.R.A., D.I.B., D.I.C., E.J.C.d.G., T.M.F., J.N.H., J.J.H., E.I., R.J.F.L., P.K.E.M., N.G.M., G.W.M., K.E.N., N.L.P., T.D.S. and E.K.S. Manuscript writing: M.R.R. and P.M.V., with contributions from all authors on the final version.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Overview of the study design. (431 KB)
  2. Supplementary Figure 2: Projection of the prediction samples onto the first two HapMap principal components. (306 KB)

    For the non-ascertained set of independent (LD r2 < 0.1), common, HapMap 3 loci that we used for prediction, we projected our prediction samples onto the first two principal components from HapMap.

  3. Supplementary Figure 3: Simulation study comparing population and within-family association testing. (164 KB)

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with effects sampled from a normal distribution. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS; yellow), in a GWAS that controlled for the first 20 principal components (green) and in a sibling pair within-family design (blue). The effect sizes were then used to predict the phenotype in an independent set of sibling pair data, and a recently derived approach was used to test for population stratification bias in the effect size estimates (Online Methods and ref. 33). Variance attributable to a Cg or a Ce term is indicative of population stratification bias, which can be observed in the GWAS scenario that did not appropriately control for population stratification.
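
    The contrast between the population-level and within-family designs can be illustrated with a minimal sketch. It is not the simulation described above, which used real genotype data and 5,000 causal loci. It assumes a single null SNP (true effect of zero), two hypothetical subpopulations that differ in allele frequency (0.2 versus 0.8) and in environmental mean (1 s.d.), and unit residual variance. All function names and parameter values are illustrative assumptions.

```python
import random


def simulate_sib_pairs(n_pairs, allele_freq, env_shift, beta, rng):
    """Simulate sibling pairs at one biallelic locus within one subpopulation.

    Returns a list of (g1, g2, y1, y2): genotypes (0/1/2) and phenotypes.
    """
    pairs = []
    for _ in range(n_pairs):
        # Each parent carries two alleles drawn at the subpopulation frequency.
        mother = [int(rng.random() < allele_freq) for _ in range(2)]
        father = [int(rng.random() < allele_freq) for _ in range(2)]
        sibs = []
        for _ in range(2):
            # Mendelian transmission: one random allele from each parent.
            g = rng.choice(mother) + rng.choice(father)
            y = beta * g + env_shift + rng.gauss(0.0, 1.0)
            sibs.append((g, y))
        pairs.append((sibs[0][0], sibs[1][0], sibs[0][1], sibs[1][1]))
    return pairs


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
true_beta = 0.0  # assumption: the SNP has no causal effect
pairs = (simulate_sib_pairs(5000, 0.2, 0.0, true_beta, rng)
         + simulate_sib_pairs(5000, 0.8, 1.0, true_beta, rng))

# Population-level estimate: one sibling per family, no stratification control.
pop_beta = ols_slope([p[0] for p in pairs], [p[2] for p in pairs])

# Within-family estimate: regress sibling phenotype differences on genotype
# differences, which cancels anything shared by siblings (here, env_shift).
wf_beta = ols_slope([p[0] - p[1] for p in pairs], [p[2] - p[3] for p in pairs])

print(f"population-level estimate: {pop_beta:.3f}")
print(f"within-family estimate:    {wf_beta:.3f}")
```

    Under these assumptions, the population-level estimate is pulled away from zero by the correlation between allele frequency and environment, whereas the sibling-difference estimate stays close to the true value of zero.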

  4. Supplementary Figure 4: Simulation study comparing population and within-family association. (163 KB)

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and then two groups were selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown when the predictor was created using effect sizes estimated in a GWAS that did not control for population stratification, using effect sizes estimated in a within-family design, using the true simulated effect sizes and when the within-family effect sizes were randomly allocated to loci under our null model. A predictor created from within-family estimates of effect size yields estimates similar to the true simulated values and to the null model, and in no simulation were our predictions significantly different from our null model. This result held irrespective of whether loci were ascertained from a discovery GWAS containing population stratification bias: when we selected the top 100 or top 500 loci from the GWAS, re-estimated their effects in a within-family analysis and created a predictor, we found no evidence of differentiation from our null model. By contrast, a predictor created from GWAS estimates of effect size that are biased by population stratification yields variable estimates of the prediction difference between the two groups, and in 10 of the 50 simulations our predictions differed significantly from our null model.
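
    The comparison of a predictor difference against a null in which effect sizes are randomly allocated to loci can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: it works with hypothetical group allele frequencies instead of individual genotypes, draws the null by shuffling effect sizes across loci, and all names and parameter values (500 loci, drift s.d. of 0.03, 999 null draws) are assumptions.

```python
import random


def predictor_difference(freq_a, freq_b, betas):
    """Expected difference in a genetic predictor between groups A and B.

    The expected predictor in a group is sum_j 2 * p_j * beta_j, so the
    difference depends only on the allele frequency differences.
    """
    return sum(2.0 * (pa - pb) * b for pa, pb, b in zip(freq_a, freq_b, betas))


def null_differences(freq_a, freq_b, betas, n_draws, rng):
    """Null draws: effect sizes are randomly reallocated to loci."""
    shuffled = list(betas)
    draws = []
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        draws.append(predictor_difference(freq_a, freq_b, shuffled))
    return draws


def empirical_p(observed, null_draws):
    """Two-sided empirical P value with the usual +1 correction."""
    extreme = sum(1 for d in null_draws if abs(d) >= abs(observed))
    return (extreme + 1) / (len(null_draws) + 1)


rng = random.Random(7)
n_loci = 500
freq_a = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
# Group B differs from group A by small random (drift-like) frequency shifts.
freq_b = [min(0.95, max(0.05, p + rng.gauss(0.0, 0.03))) for p in freq_a]

# Scenario 1: effect sizes unrelated to the frequency differences.
neutral_betas = [rng.gauss(0.0, 0.05) for _ in range(n_loci)]
# Scenario 2: effect signs aligned with the frequency differences, mimicking
# a systematic shift of trait-increasing alleles in group A.
aligned_betas = [abs(b) if pa > pb else -abs(b)
                 for b, pa, pb in zip(neutral_betas, freq_a, freq_b)]

for label, betas in (("neutral", neutral_betas), ("aligned", aligned_betas)):
    observed = predictor_difference(freq_a, freq_b, betas)
    null = null_differences(freq_a, freq_b, betas, 999, rng)
    print(label, round(observed, 3), empirical_p(observed, null))

p_aligned = empirical_p(
    predictor_difference(freq_a, freq_b, aligned_betas),
    null_differences(freq_a, freq_b, aligned_betas, 999, rng),
)
```

    In the aligned scenario, the observed difference lies far outside the shuffled-effect null, which is the pattern a test of this kind is designed to detect. In the neutral scenario, the observed difference is a draw from the same distribution as the null.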

  5. Supplementary Figure 5: Simulation study comparing population and within-family association testing showing potential ascertainment bias. (144 KB)

    Using real genotype data, causal variants were allocated to 5,000 independent loci at random across the genome, with their effects sampled from a normal distribution. Fifty simulation replicates were conducted. Genotype-environment correlation was induced through a phenotypic mean difference of 0.5 s.d. along the first principal component of the discovery sample. Effect sizes were estimated in a genome-wide association study without controlling for population stratification (GWAS) and in a sibling pair within-family design. The effect sizes from the top 100 independent SNPs identified by the GWAS, the top 500 independent SNPs identified by the GWAS and genome-wide independent loci were then used for prediction in an independent sample. Individuals from the prediction sample were projected onto the first principal component of the discovery sample, and two groups were then selected based on the upper and lower quartiles of the distribution of the projected principal component. The mean difference in the predictor is shown when the predictor was created using effect sizes estimated in a GWAS that did not control for population stratification, using effect sizes estimated in a within-family design, using the true simulated effect sizes and when the within-family effect sizes were randomly allocated to loci under our null model. Ascertainment bias is evident here: if the top 100 or 500 loci from a GWAS containing population stratification are selected to create a predictor, a significant deviation from the null model is observed, irrespective of whether biased SNP effect estimates from the GWAS or unbiased SNP effect estimates from the within-family analysis are used. This is because the loci selected from the GWAS were those where the genotype-environment correlation was the strongest. In this simulation scenario, there is no selection on the phenotype and, thus, when there is no ascertainment of loci (when genome-wide SNPs are used to create the predictor), no prediction differences from the true values or the null model were evident.

  6. Supplementary Figure 6: Proportion of population-level variance in a genetic predictor comprised of different sets of SNPs for height and BMI. (144 KB)

    Genetic predictors were created from independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci selected at different significance thresholds (P < 5 × 10−8, P < 5 × 10−6, P < 5 × 10−4, P < 0.005, P < 0.05, P < 0.1) from large-scale meta-analyses. All refers to genome-wide independent (pairwise LD correlation < 0.1, >1 Mb apart), common HapMap 3 loci that were either selected preferentially on the basis of their within-family association with either trait or selected at random.

  7. Supplementary Figure 7: Increased prediction accuracy results in increased population-level variance. (118 KB)

    We selected SNPs from large-scale meta-analyses, re-estimated their effects in a within-family design that is not biased by population stratification and then assessed prediction accuracy in an independent sample for (a) height and (b) body mass index. We find that the population-level variance is better captured when a greater amount of phenotypic variation is explained by the predictor.

  8. Supplementary Figure 8: Genome-wide pattern of population genetic differentiation for height and body mass index. (178 KB)
  9. Supplementary Figure 9: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for height. (116 KB)

    Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ2 value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.

  10. Supplementary Figure 10: The characteristics of a SNP and its contribution to the genome-wide pattern of population differentiation for body mass index. (110 KB)

    Relationship between the contribution of a SNP to the genome-wide pattern of population differentiation (χ2 value) and (a) allele frequency, (b) phenotypic variance explained and (c) meta-analysis P value.

  11. Supplementary Figure 11: Simulation study of the general approach. (351 KB)

    (a) Simulated distribution of allele frequency differentiation, θ, at 10,000 loci plotted against allele frequency. (b) The simulated association between θ and the additive genetic variance contributed by each locus, which assumes that the most differentiated loci are those that contribute most to the additive genetic variance. (c) Profile scores were calculated from different sets of simulated loci, where each set explained differing amounts of the total variance. Population-level variance is shown for each set of loci, demonstrating that, even when the loci cumulatively explain only 20% of the total variance, 50% of the population-level genetic variance is captured. (d) Error variance for each locus was simulated from a normal distribution with variance equal to a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. Population-level variance is shown at different error variances, demonstrating that including a large number of false positives decreases the population-level effects.

  12. Supplementary Figure 12: Simulation study of the error variance induced by the addition of null SNPs. (196 KB)

    The error variance for each locus was simulated from a normal distribution as a percentage of the total variance in profile score. Increasing amounts of error variance were added to the profile score to approximate the effects of adding an increasing number of false positive loci. The 95% confidence intervals are shown for the population mean profile score of 11 simulated populations across increasing amounts of incorporated error variance: (a) 0%, (b) 1%, (c) 2.5%, (d) 5%, (e) 7.5% and (f) 10%. This pattern reflects the fact that including a large number of false positives decreases the amount of population-level variance estimated.
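
    The dilution of population-level variance by added error variance can be sketched in a few lines. This is an illustrative toy and not the simulation behind the figure. It assumes 11 populations of equal size, a hypothetical among-population s.d. of 0.5 against a within-population s.d. of 1, and independent normal noise whose variance is set to the stated percentage of the total profile score variance. All names and values are assumptions.

```python
import random


def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def population_variance_fraction(scores_by_pop):
    """Share of total variance that lies among population means.

    With equal population sizes, total variance splits exactly into the
    variance of the population means plus the mean within-population variance.
    """
    pop_means = [sum(s) / len(s) for s in scores_by_pop]
    all_scores = [x for s in scores_by_pop for x in s]
    return variance(pop_means) / variance(all_scores)


rng = random.Random(11)
n_pops, n_per_pop = 11, 2000

# Error-free profile scores: a population mean plus individual deviation.
pop_effects = [rng.gauss(0.0, 0.5) for _ in range(n_pops)]
scores = [[mu + rng.gauss(0.0, 1.0) for _ in range(n_per_pop)]
          for mu in pop_effects]
total_var = variance([x for s in scores for x in s])

# Error variance as a fraction of total profile score variance.
error_fractions = [0.0, 0.01, 0.025, 0.05, 0.075, 0.10]
results = {}
for frac in error_fractions:
    noise_sd = (frac * total_var) ** 0.5
    noisy = [[x + rng.gauss(0.0, noise_sd) for x in s] for s in scores]
    results[frac] = population_variance_fraction(noisy)

for frac, share in results.items():
    print(f"error variance {frac:6.1%}: population-level share {share:.3f}")
```

    Because the added noise is independent of population membership, it inflates the total variance while leaving the variance among population means almost unchanged, so the estimated population-level share shrinks as more error variance is incorporated.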

  13. Supplementary Figure 13: Overlap in the annotation of differentiated loci to genes for height and BMI. (80 KB)

    Five hundred SNP loci were selected that are expected to contribute most to the pattern of population genetic differentiation for height and body mass index. These SNP loci were annotated to genes, and the overlap was estimated. The annotation was then repeated by randomly selecting 500 loci from the top 10,000 SNP loci 100 times. The 95% confidence interval of the percentage of overlapping genes across the 100 sampling steps was 8.4–19.4%. In the expected top 500 loci contributing to population genetic variation of each trait, the percentage overlap was 19.8%.

  14. Supplementary Figure 14: Population genetic differentiation across six Italian villages. (79 KB)

    Predicted population genetic differentiation for BMI and height on a small scale across six northern Italian villages. There was no significant differentiation from the null model for height (χ2 = 0.23, P = 0.985) or for BMI (χ2 = 6.27, P = 0.817) at any set of SNPs, and we present the results across the genome. The villages are Sauris (sa), Resia (re), Illegio (il), Erto (er), Clausetto (cl) and San Martino del Corso (sm) in the Friuli-Venezia Giulia region in northeastern Italy. On this small scale, we find no evidence for population differentiation for either phenotype.

  15. Supplementary Figure 15: Population genetic differentiation in the Human Genetic Diversity Panel for independent genome-wide SNPs ascertained on within-family effect sizes. (292 KB)

    (a) Height and (b) BMI. P values give the deviation of the predicted means from the null expectation. The proportion of population-level variance was 17.5% (95% CI = 9.6, 27.9) for height and 13.1% (95% CI = 7.2, 21.3) for BMI. Worldwide, there was no evidence of any population-level genetic correlation (0.007, 95% CI = −0.051, 0.064).

PDF files

  1. Supplementary Text and Figures (7,037 KB)

    Supplementary Figures 1–15, Supplementary Tables 1 and 2, and Supplementary Note.

Additional data