Genetic correlates of social stratification in Great Britain

Abstract

Human DNA polymorphisms vary across geographic regions, with the most commonly observed variation reflecting distant ancestry differences. Here we investigate the geographic clustering of common genetic variants that influence complex traits in a sample of ~450,000 individuals from Great Britain. Of 33 traits analysed, 21 showed significant geographic clustering at the genetic level after controlling for ancestry, probably reflecting migration driven by socioeconomic status (SES). Alleles associated with educational attainment (EA) showed the most clustering, with EA-decreasing alleles clustering in lower SES areas such as coal mining areas. Individuals who leave coal mining areas carry more EA-increasing alleles on average than those in the rest of Great Britain. The level of geographic clustering is correlated with genetic associations between complex traits and regional measures of SES, health and cultural outcomes. Our results are consistent with the hypothesis that social stratification leaves visible marks in geographic arrangements of common allele frequencies and gene–environment correlations.

Fig. 1: Geographic distributions (birthplace) of the first five PCs, Moran’s I and empirical P values for Moran’s I.
Fig. 2: Moran’s I of five phenotypes and 33 SBLUP polygenic scores computed using the average polygenic score per region in 378 local authority regions (n = 320,940 unrelated individuals).
Fig. 3: Geographic distribution (birthplace) of EA polygenic scores, after regressing out 100 PCs (n = 320,940 unrelated individuals), and Townsend indices from 1971 and 2011.
Fig. 4: Geographically clustered polygenic scores (n = 16; ordered by Moran’s I) for the four migration groups.
Fig. 5: Polygenic scores and EA outcomes over time.
Fig. 6: Comparisons between the results of the RGWASs on EA from census data and from an individual-level EA GWAS that excluded British participants.
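
Several of the captions above refer to polygenic scores "after regressing out 100 PCs". As a rough illustration of that residualisation step, the sketch below fits an ordinary least-squares regression of a score on a few covariates and returns the residuals. It is not the authors' code. The function names `solve_linear` and `residualize` and the toy inputs are assumptions made for this example, and a real analysis with 100 PCs and hundreds of thousands of individuals would use a numerical library rather than pure Python.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of y after OLS regression on the covariates plus an intercept.

    Hypothetical helper: each row of `covariates` holds one individual's
    covariate values (for example, ancestry PCs).
    """
    n = len(y)
    design = [[1.0] + list(row) for row in covariates]  # leading 1.0 = intercept
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(design[k][i] * design[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return [y[k] - sum(b * x for b, x in zip(beta, design[k])) for k in range(n)]


# Toy usage: a score that is an exact linear function of one PC leaves no residual.
toy_residuals = residualize([2.0, 5.0, 8.0, 11.0], [[0.0], [1.0], [2.0], [3.0]])
```

By construction the residuals are uncorrelated with every covariate that was fitted, so any geographic clustering that remains cannot be attributed to a linear effect of those PCs.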

Data availability

This research was conducted using data from the UK Biobank resource (application number 12514) and dbGaP (accession number: phs000674). UK Biobank data can be accessed on request once a research project has been submitted and approved by the UK Biobank committee. dbGaP data can also be accessed on request once a research project has been submitted and approved by dbGaP. The regional measures that have been analysed are publicly available and can be downloaded using the links provided in the Methods.

Code availability

Custom R code used for statistical analyses (for example, the computation of Moran’s I) is available from the corresponding authors on request.
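
For orientation, Moran's I for a regional variable $x$ with spatial weights $w_{ij}$ is $I = \frac{n}{\sum_{ij} w_{ij}} \cdot \frac{\sum_{ij} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$. The Python sketch below is not the authors' R code. It is a minimal illustration that assumes a user-supplied weight matrix (the weighting scheme used in the paper is not stated in this section) and obtains an empirical P value by permuting regional values, which is one common approach and an assumption here. The names `morans_i` and `empirical_p` are hypothetical.

```python
import random
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Moran's I for one value per region and a spatial weight matrix.

    weights[i][j] > 0 when regions i and j are treated as neighbours;
    the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def empirical_p(values: Sequence[float], weights: Sequence[Sequence[float]],
                n_perm: int = 999, seed: int = 0) -> float:
    """One-sided permutation P value: share of shuffles with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled: List[float] = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy usage: four regions in a row, each adjacent to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient_i = morans_i([1.0, 2.0, 3.0, 4.0], chain)      # approximately 0.333
alternating_i = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # approximately -1.0
```

Positive values indicate that neighbouring regions tend to have similar values, and negative values indicate that neighbours tend to differ.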

Acknowledgements

This research was supported by the Australian National Health and Medical Research Council (1107258, 1078901, 1078037, 1056929, 1048853 and 1113400) and the Sylvia and Charles Viertel Charitable Foundation (Senior Medical Research Fellowship). A.A. and K.J.H.V. are supported by the Foundation Volksbond Rotterdam. A.A. and M.G.N. are supported by ZonMw grants 849200011 and 531003014 from The Netherlands Organisation for Health Research and Development. B.P.Z. received funding from the Australian Research Council (FT160100298). The research was conducted using data from the UK Biobank Resource (application number: 12514) and dbGaP (accession number: phs000674). The Genetic Epidemiology Research on Adult Health and Aging study was supported by grant RC2 AG036607 from the National Institutes of Health, as well as grants from the Robert Wood Johnson Foundation, Ellison Medical Foundation, Wayne and Gladys Valley Foundation and Kaiser Permanente. The authors thank the Kaiser Permanente Medical Care Plan, Northern California Region members who participated in the Kaiser Permanente Research Program on Genes, Environment and Health. UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and Northwest Regional Development Agency. It also received funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.A., D.H.-J. and P.M.V. conceived and designed the study. A.A., D.H.-J., L.Y. and K.E.K. analysed the data. A.A. wrote the manuscript and produced the figures. D.H.-J., L.Y., K.E.K., M.G.N., L.V., Y.H., B.P.Z., T.M.F., N.R.W., J.Y., K.J.H.V. and P.M.V. provided significant feedback on the analyses and the manuscript. P.M.V. supervised the project.

Corresponding authors

Correspondence to Abdel Abdellaoui or Peter M. Visscher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Stavroula Kousta.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Variation explained by regional differences of uncorrected polygenic scores.

Linear mixed model results, with phenotype or polygenic score (without regressing out 100 PCs) as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: MSOA (~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.

Extended Data Fig. 2 Variation explained by regional differences of ancestry-corrected polygenic scores.

Linear mixed model results, with phenotype or polygenic score (after regressing out 100 PCs) as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: MSOA (~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.

Extended Data Fig. 3 Variation explained by regional differences of ancestry-informative PCs.

Linear mixed model results, with PCs as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: MSOA (~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.
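
The "variation explained by regional differences" in the three captions above comes from linear mixed models with region as a random effect. As a simplified stand-in (an assumption, not the authors' model), the sketch below uses the one-way ANOVA method-of-moments estimator of the between-region share of variance, which targets the same ratio when there are no covariates. The function name `region_variance_share` and the toy data are hypothetical.

```python
from typing import Dict, List


def region_variance_share(scores_by_region: Dict[str, List[float]]) -> float:
    """Between-region variance / total variance (one-way ANOVA estimator).

    Needs at least two non-empty regions and more individuals than regions.
    """
    groups = [g for g in scores_by_region.values() if g]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sums of squares between region means and within regions.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective region size, which allows for unequal numbers per region.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    # Negative moment estimates are truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Toy usage: two regions whose means differ far more than individuals within them.
share = region_variance_share({"a": [0.0, 2.0], "b": [10.0, 12.0]})
```

A likelihood-based mixed model would normally be preferred for unbalanced data of this size; the moment estimator is shown only because it makes the between-region and within-region components explicit.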

Extended Data Fig. 4 Associations between polygenic scores and regional measures of socio-economic outcomes.

The standardized effect size estimates of robust linear regressions of polygenic scores on regional measures of socio-economic outcomes in unrelated UK Biobank participants of European descent (N ~320k). The polygenic scores are all standardized residuals after regressing out 100 PCs. Every individual was given the value of their region. Significant effects are colored, whereby the significance threshold is based on FDR correction across all tests shown in all four panels. All SEs were ≤ .002.

Extended Data Fig. 5 Associations between polygenic scores and regional measures of nutrition and health.

The standardized effect size estimates of robust linear regressions of polygenic scores on regional measures of nutrition and health outcomes in unrelated UK Biobank participants of European descent (N ~320k). The polygenic scores are all standardized residuals after regressing out 100 PCs. Every individual was given the value of their region. Significant effects are colored, whereby the significance threshold is based on FDR correction across all tests shown in all four panels. All SEs were ≤ .002.

Extended Data Fig. 6 Associations between polygenic scores and regional measures of religiosity and political preference.

The standardized effect size estimates of robust linear regressions of polygenic scores on regional measures of religiosity and election outcomes in unrelated UK Biobank participants of European descent (N ~320k). The polygenic scores are all standardized residuals after regressing out 100 PCs. Every individual was given the value of their region. Significant effects are colored, whereby the significance threshold is based on FDR correction across all tests shown in all four panels. All SEs were ≤ .002.

Extended Data Fig. 7 Associations between polygenic scores and individual-level phenotypes.

The standardized effect size estimates of robust linear regressions of polygenic scores on individual-level phenotypes in unrelated UK Biobank participants of European descent (N ~320k). The polygenic scores are all standardized residuals after regressing out 100 PCs. Significant effects are colored, whereby the significance threshold is based on FDR correction across all tests shown in all four panels. All SEs were ≤ .002.

Extended Data Fig. 8 Genetic correlations between regional measures of socio-economic outcomes and a range of complex traits and diseases.

Genetic correlations (above) and their SEs (below) based on LD score regression for the RGWASs on SES-related traits. Colored values are significant after FDR correction. The green numbers in the left part of the figure, below the diagonal of 1s, are the phenotypic correlations between the regional outcomes. The blue stars next to the trait names indicate that UK Biobank was part of the GWAS of the trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.

Extended Data Fig. 9 Genetic correlations between regional measures of health and nutrition and a range of complex traits and diseases.

Genetic correlations (above) and their SEs (below) based on LD score regression for the RGWASs on health- and nutrition-related traits. Colored values are significant after FDR correction. The green numbers in the left part of the figure, below the diagonal of 1s, are the phenotypic correlations between the regional outcomes. The blue stars next to the trait names indicate that UK Biobank was part of the GWAS of the trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.

Extended Data Fig. 10 Genetic correlations between regional measures of religiosity and political preference and a range of complex traits and diseases.

Genetic correlations (above) and their SEs (below) based on LD score regression for the RGWASs on ideology-related traits (religion and political preference). Colored values are significant after FDR correction. The green numbers in the left part of the figure, below the diagonal of 1s, are the phenotypic correlations between the regional outcomes. The blue stars next to the trait names indicate that UK Biobank was part of the GWAS of the trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.
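
The extended data captions repeatedly mark results as significant "after FDR correction" without naming the procedure. Assuming the widely used Benjamini-Hochberg step-up method (an assumption, since the section does not specify it), the correction could be sketched as follows. The function name `benjamini_hochberg` and the example P values are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag which tests are significant at false discovery rate alpha.

    Step-up rule: find the largest rank k (1-based, P values sorted ascending)
    with p_(k) <= (k / m) * alpha, then reject every test ranked 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Toy usage with four made-up P values; only the smallest passes its threshold.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

Because the threshold depends on the total number of tests, pooling tests across all panels of a figure, as the captions describe, yields a different cut-off than correcting each panel separately.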

Supplementary information

Supplementary Information

General summary and frequently asked questions, Supplementary Notes, Supplementary References, Supplementary Tables 1–3 and Supplementary Figs. 1–24.

Reporting Summary

About this article

Cite this article

Abdellaoui, A., Hugh-Jones, D., Yengo, L. et al. Genetic correlates of social stratification in Great Britain. Nat Hum Behav 3, 1332–1342 (2019). https://doi.org/10.1038/s41562-019-0757-5

  • DOI: https://doi.org/10.1038/s41562-019-0757-5
