Abstract
Human DNA polymorphisms vary across geographic regions, with the most commonly observed variation reflecting distant ancestry differences. Here we investigate the geographic clustering of common genetic variants that influence complex traits in a sample of ~450,000 individuals from Great Britain. Of 33 traits analysed, 21 showed significant geographic clustering at the genetic level after controlling for ancestry, probably reflecting migration driven by socioeconomic status (SES). Alleles associated with educational attainment (EA) showed the most clustering, with EA-decreasing alleles clustering in lower SES areas such as coal mining areas. Individuals who leave coal mining areas carry more EA-increasing alleles on average than those in the rest of Great Britain. The level of geographic clustering is correlated with genetic associations between complex traits and regional measures of SES, health and cultural outcomes. Our results are consistent with the hypothesis that social stratification leaves visible marks in geographic arrangements of common allele frequencies and gene–environment correlations.
Data availability
This research was conducted using data from the UK Biobank resource (application number 12514) and dbGaP (accession number: phs000674). UK Biobank data can be accessed on request once a research project has been submitted and approved by the UK Biobank committee. dbGaP data can also be accessed on request once a research project has been submitted and approved by dbGaP. The regional measures that have been analysed are publicly available and can be downloaded using the links provided in the Methods.
Code availability
Custom R code used for statistical analyses (for example, the computation of Moran’s I) is available from the corresponding authors on request.
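The Code availability note names Moran's I (Moran, 1950, in the references) as one of the statistics computed in R. The block below is a minimal Python sketch of the textbook statistic, not the authors' code. For a variable x observed at n locations with spatial weights w_ij, it computes I = (n / W) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all weights. Positive values mean that nearby locations have similar values, which is the sense of "geographic clustering" used in the abstract. The inverse-distance weighting helper is a hypothetical choice, because the weight matrix used in the paper is not described in this section.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for values observed at n locations.

    values  : length-n array (e.g. mean polygenic score per region)
    weights : n x n spatial weight matrix with a zero diagonal
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()          # deviations from the overall mean
    num = z @ w @ z           # sum_ij w_ij * z_i * z_j
    den = z @ z               # sum_i z_i^2
    return (n / w.sum()) * num / den


def inverse_distance_weights(coords):
    """Hypothetical weighting scheme: w_ij = 1 / distance(i, j), w_ii = 0."""
    c = np.asarray(coords, dtype=float)
    d = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1))
    w = np.zeros_like(d)
    off = d > 0               # leave the diagonal (and coincident points) at 0
    w[off] = 1.0 / d[off]
    return w


# Usage: four locations on a line, each adjacent only to its neighbours.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
print(morans_i([1, 2, 3, 4], adjacency))    # ≈ 0.333 (neighbours are similar)
print(morans_i([1, -1, 1, -1], adjacency))  # -1.0 (neighbours are opposite)
```

In a regional analysis, `values` would hold one summary per region (for example a mean ancestry-corrected polygenic score), and the weights would encode how close the regions are.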
References
Tobler, W. R. A computer movie simulating urban growth in the Detroit region. Econ. Geog. 46, 234–240 (1970).
Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98–101 (2008).
Abdellaoui, A. et al. Population structure, migration, and diversifying selection in the Netherlands. Eur. J. Hum. Genet. 21, 1277–1285 (2013).
Kerminen, S. et al. Fine-scale genetic structure in Finland. G3 (Bethesda) 7, 3459–3468 (2017).
Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).
Zhang, G., Muglia, L. J., Chakraborty, R., Akey, J. M. & Williams, S. M. Signatures of natural selection on genetic variants affecting complex human traits. Appl. Transl. Genom. 2, 78–94 (2013).
Berg, J. J. & Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).
Turkheimer, E. Three laws of behavior genetics and what they mean. Curr. Dir. Psychol. Sci. 9, 160–164 (2000).
Coulter, R. & Scott, J. What motivates residential mobility? Re‐examining self‐reported reasons for desiring and making residential moves. Popul. Space Place 21, 354–371 (2015).
Long, J. Rural–urban migration and socioeconomic mobility in Victorian Britain. J. Econ. Hist. 65, 1–35 (2005).
Park, C. Sacred Worlds: An Introduction to Geography and Religion (Routledge, 2002).
Rodden, J. The geographic distribution of political preferences. Annu. Rev. Polit. Sci. 13, 321–340 (2010).
Boyle, P. Population geography: migration and inequalities in mortality and morbidity. Prog. Hum. Geog. 28, 767–776 (2004).
Lewis, G. & Booth, M. Regional differences in mental health in Great Britain. J. Epidemiol. Community Health 46, 608–611 (1992).
Tyrrell, J. et al. Height, body mass index, and socioeconomic status: Mendelian randomisation study in UK Biobank. Br. Med. J. 352, i582 (2016).
Marmot, M. The health gap: the challenge of an unequal world. Lancet 386, 2442–2444 (2015).
Beard, E. et al. Healthier central England or North–South divide? Analysis of national survey data on smoking and high-risk drinking. BMJ Open 7, e014210 (2017).
Brimblecombe, N., Dorling, D. & Shaw, M. Migration and geographical inequalities in health in Britain. Soc. Sci. Med. 50, 861–878 (2000).
Davey Smith, G. & Ebrahim, S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).
Richards, J. B. & Evans, D. M. Back to school to protect against coronary heart disease? Br. Med. J. https://www.bmj.com/content/358/bmj.j3849 (2017).
Verweij, K. J., Mosing, M. A., Zietsch, B. P. & Medland, S. E. in Statistical Human Genetics 151–170 (Springer, 2012).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).
Moran, P. A. Notes on continuous stochastic phenomena. Biometrika 37, 17–23 (1950).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
Haworth, S. et al. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nat. Commun. 10, 333 (2019).
Niedomysl, T. How migration motives change over migration distance: evidence on variation across socio-economic and demographic groups. Reg. Stud. 45, 843–855 (2011).
Foden, M., Fothergill, S. & Gore, T. The State of the Coalfields: Economic and Social Conditions in the Former Mining Communities of England, Scotland and Wales (Centre for Regional Economic and Social Research, Sheffield Hallam Univ., 2014).
Beatty, C., Fothergill, S. & Powell, R. Twenty years on: has the economy of the UK coalfields recovered? Environ. Plan. A 39, 1654–1675 (2007).
Townsend, P., Phillimore, P. & Beattie, A. Health and Deprivation: Inequality and the North (Routledge, 1988).
Kong, A. et al. Selection against variants in the genome associated with educational attainment. Proc. Natl Acad. Sci. USA 114, E727–E732 (2017).
Hill, W. D. et al. Molecular genetic contributions to social deprivation and household income in UK Biobank. Curr. Biol. 26, 3083–3089 (2016).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Cetateanu, A. & Jones, A. Understanding the relationship between food environments, deprivation and childhood overweight and obesity: evidence from a cross sectional England-wide study. Health Place 27, 68–76 (2014).
Silventoinen, K. et al. Parental education and genetics of BMI from infancy to old age: a pooled analysis of 29 twin cohorts. Obesity 27, 855–865 (2019).
Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978).
Berg, J. J. et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725 (2019).
Sohail, M. et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702 (2019).
Abdellaoui, A. et al. Educational attainment influences levels of homozygosity through migration and assortative mating. PLoS One 10, e0118935 (2015).
Domingue, B. W., Rehkopf, D. H., Conley, D. & Boardman, J. D. Geographic clustering of polygenic scores at different stages of the life course. RSF 4, 137–149 (2018).
Cummins, S. C., McKay, L. & MacIntyre, S. McDonald’s restaurants and neighborhood deprivation in Scotland and England. Am. J. Prev. Med. 29, 308–310 (2005).
Alford, J. R., Funk, C. L. & Hibbing, J. R. Are political orientations genetically transmitted? Am. Polit. Sci. Rev. 99, 153–167 (2005).
Benjamin, D. J. et al. The genetic architecture of economic and political preferences. Proc. Natl Acad. Sci. USA 109, 8026–8031 (2012).
Hatemi, P. K. & McDermott, R. The genetics of politics: discovery, challenges, and progress. Trends Genet. 28, 525–533 (2012).
Hatemi, P. K., Medland, S. E., Morley, K. I., Heath, A. C. & Martin, N. G. The genetics of voting: an Australian twin study. Behav. Genet. 37, 435–448 (2007).
Smith, K. et al. Biology, ideology, and epistemology: how do we know political attitudes are inherited and why should we care? Am. J. Polit. Sci. 56, 17–33 (2012).
Koenig, L. B., McGue, M., Krueger, R. F. & Bouchard, T. J. Genetic and environmental influences on religiousness: findings for retrospective and current religiousness ratings. J. Pers. 73, 471–488 (2005).
Alabrese, E., Becker, S. O., Fetzer, T. & Novy, D. Who voted for Brexit? Individual and regional data combined. Eur. J. Polit. Econ. 56, 132–150 (2019).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Selzam, S. et al. Comparing within-and between-family polygenic score prediction. Am. J. Hum. Genet. 105, 351–363 (2019).
Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).
Llobera, J. R. An Invitation to Anthropology: the Structure, Evolution and Cultural Identity of Human Societies (Berghahn Books, 2003).
Robinson, M. R. et al. Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 1, 0016 (2017).
Hugh-Jones, D., Verweij, K. J., Pourcain, B. S. & Abdellaoui, A. Assortative mating on educational attainment leads to genetic spousal resemblance for polygenic scores. Intelligence 59, 103–108 (2016).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency. 2011 Census aggregate data. UK Data Service https://doi.org/10.5257/census/aggregate-2011-2 (Edition: February 2017).
Altman, D. G. & Bland, J. M. Statistics notes: the normal distribution. Br. Med. J. 310, 298 (1995).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Acknowledgements
This research was supported by the Australian National Health and Medical Research Council (1107258, 1078901, 1078037, 1056929, 1048853 and 1113400) and the Sylvia and Charles Viertel Charitable Foundation (Senior Medical Research Fellowship). A.A. and K.J.H.V. are supported by the Foundation Volksbond Rotterdam. A.A. and M.G.N. are supported by ZonMw grants 849200011 and 531003014 from The Netherlands Organisation for Health Research and Development. B.P.Z. received funding from the Australian Research Council (FT160100298). The research was conducted using data from the UK Biobank Resource (application number: 12514) and dbGaP (accession number: phs000674). The Genetic Epidemiology Research on Adult Health and Aging study was supported by grant RC2 AG036607 from the National Institutes of Health, as well as grants from the Robert Wood Johnson Foundation, Ellison Medical Foundation, Wayne and Gladys Valley Foundation and Kaiser Permanente. The authors thank the Kaiser Permanente Medical Care Plan, Northern California Region members who participated in the Kaiser Permanente Research Program on Genes, Environment and Health. This study was conducted using UK Biobank resources under application number 12514. UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and Northwest Regional Development Agency. It also received funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
A.A., D.H.-J. and P.M.V. conceived and designed the study. A.A., D.H.-J., L.Y. and K.E.K. analysed the data. A.A. wrote the manuscript and produced the figures. D.H.-J., L.Y., K.E.K., M.G.N., L.V., Y.H., B.P.Z., T.M.F., N.R.W., J.Y., K.J.H.V. and P.M.V. provided significant feedback on the analyses and the manuscript. P.M.V. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary Handling Editor: Stavroula Kousta.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Variation explained by regional differences of uncorrected polygenic scores.
Linear mixed model results with the phenotype or polygenic score (without regressing out 100 PCs) as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: middle layer super output areas (MSOAs; ~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.
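The polygenic scores analysed in these figures are, in the generic construction, weighted sums of an individual's effect-allele counts, with weights taken from an external GWAS. The block below is a minimal sketch of that construction. SNP selection and weighting are not described in this section, so the inputs here are hypothetical, and this should not be read as the authors' scoring pipeline.

```python
import numpy as np


def polygenic_score(dosages, betas, standardize=True):
    """Weighted allele count per individual.

    dosages : (n_individuals, n_snps) effect-allele counts in {0, 1, 2}
    betas   : (n_snps,) per-allele effect sizes from an external GWAS
              (hypothetical input for illustration)
    """
    g = np.asarray(dosages, dtype=float)
    b = np.asarray(betas, dtype=float)
    score = g @ b                        # sum_j beta_j * g_ij
    if standardize:                      # mean 0, SD 1 across the sample
        score = (score - score.mean()) / score.std()
    return score


# Usage: three individuals, two SNPs.
scores = polygenic_score([[0, 1], [2, 0], [1, 1]], [0.5, -1.0])
```

Standardizing the scores puts every trait's score on the same scale, so variance fractions and regression effects can be compared across traits.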
Extended Data Fig. 2 Variation explained by regional differences of ancestry-corrected polygenic scores.
Linear mixed model results with the phenotype or polygenic score (after regressing out 100 PCs) as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: middle layer super output areas (MSOAs; ~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.
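The two steps in this caption can be sketched roughly as follows. First, regress the score on the principal components and keep the residuals. Second, estimate the fraction of the remaining variance attributable to a random region effect. The captions describe a linear mixed model but name no software. As a swapped-in technique, the sketch below uses the classic one-way ANOVA (method-of-moments) estimator of the region variance component instead of a REML fit; with unequal region sizes it is only an approximation. The function names are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def variance_explained_by_region(y, region):
    """ANOVA estimate of var(region) / (var(region) + var(residual)).

    Requires at least two regions. Negative variance estimates are truncated at 0.
    """
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(region, return_inverse=True)
    k, n = len(labels), len(y)
    counts = np.bincount(idx)
    means = np.bincount(idx, weights=y) / counts          # per-region means
    ss_between = np.sum(counts * (means - y.mean()) ** 2)
    ss_within = np.sum((y - means[idx]) ** 2)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    n0 = (n - np.sum(counts ** 2) / n) / (k - 1)          # effective region size
    var_region = max((ms_between - ms_within) / n0, 0.0)
    return var_region / (var_region + ms_within)


# Usage with simulated data (illustrative only): 200 regions of 50 people,
# region effect and individual noise both with variance 1.
rng = np.random.default_rng(0)
region = np.repeat(np.arange(200), 50)
pcs = rng.normal(size=(region.size, 3))
score = pcs @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)[region] + rng.normal(size=region.size)
share = variance_explained_by_region(residualize(score, pcs), region)
```

Residualizing on the PCs first is what distinguishes Extended Data Fig. 2 from Extended Data Fig. 1. Any regional signal that remains is not explained by the ancestry axes captured by the PCs.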
Extended Data Fig. 3 Variation explained by regional differences of ancestry-informative PCs.
Linear mixed model results with PCs as the dependent variable and region as a random effect (N = 320,940 unrelated individuals). Left: local authorities (~380 regions); middle: middle layer super output areas (MSOAs; ~5,300 regions); right: coal mining regions (fitted as a binary variable). Red: birthplace; green: current address; yellow: significant after FDR correction.
Extended Data Fig. 4 Associations between polygenic scores and regional measures of socio-economic outcomes.
Standardized effect size estimates from robust linear regressions of polygenic scores on regional measures of socioeconomic outcomes in unrelated UK Biobank participants of European descent (N ≈ 320,000). All polygenic scores are standardized residuals after regressing out 100 PCs. Each individual was assigned the value of their region. Significant effects are colored; the significance threshold is based on FDR correction across all tests shown in the four panels. All SEs were ≤0.002.
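These captions apply an FDR correction jointly across all tests in the four panels but do not name the procedure. Assuming it is the widely used Benjamini–Hochberg step-up procedure, the sketch below shows how a set of P values is turned into a significant/non-significant mask at FDR level q. The function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of tests declared significant at FDR level q (BH step-up)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                           # ranks 1..m by ascending P
    thresholds = q * np.arange(1, m + 1) / m        # q * rank / m
    below = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()              # largest rank meeting its threshold
        significant[order[: k + 1]] = True          # reject all hypotheses up to that rank
    return significant


# Usage: 0.03 misses its own threshold (0.025) but is still rejected,
# because a larger P value (0.04) meets its threshold (0.05).
mask = benjamini_hochberg([0.001, 0.03, 0.035, 0.04])
```

Pooling all tests from the four panels into one call gives a single shared threshold, which matches what the caption describes.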
Extended Data Fig. 5 Associations between polygenic scores and regional measures of nutrition and health.
Standardized effect size estimates from robust linear regressions of polygenic scores on regional measures of nutrition and health outcomes in unrelated UK Biobank participants of European descent (N ≈ 320,000). All polygenic scores are standardized residuals after regressing out 100 PCs. Each individual was assigned the value of their region. Significant effects are colored; the significance threshold is based on FDR correction across all tests shown in the four panels. All SEs were ≤0.002.
Extended Data Fig. 6 Associations between polygenic scores and regional measures of religiosity and political preference.
Standardized effect size estimates from robust linear regressions of polygenic scores on regional measures of religiosity and election outcomes in unrelated UK Biobank participants of European descent (N ≈ 320,000). All polygenic scores are standardized residuals after regressing out 100 PCs. Each individual was assigned the value of their region. Significant effects are colored; the significance threshold is based on FDR correction across all tests shown in the four panels. All SEs were ≤0.002.
Extended Data Fig. 7 Associations between polygenic scores and individual-level phenotypes.
Standardized effect size estimates from robust linear regressions of polygenic scores on individual-level phenotypes in unrelated UK Biobank participants of European descent (N ≈ 320,000). All polygenic scores are standardized residuals after regressing out 100 PCs. Significant effects are colored; the significance threshold is based on FDR correction across all tests shown in the four panels. All SEs were ≤0.002.
Extended Data Fig. 8 Genetic correlations between regional measures of socio-economic outcomes and a range of complex traits and diseases.
Genetic correlations (above) and their SEs (below), estimated with LD score regression, for the GWASs of regional SES-related outcomes (RGWASs). Colored cells are significant after FDR correction. The green numbers below the diagonal of 1s in the left part of the figure are the phenotypic correlations between the regional outcomes. Blue stars next to trait names indicate that UK Biobank was part of the GWAS of that trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.
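For orientation, the block below gives the regression equations behind LD score regression as published for this method family. The first is the single-trait model of Bulik-Sullivan et al. (listed in the references); the second and third are its cross-trait extension. They are a reference sketch, not a description of the exact settings used here. In these equations, z_{1j} and z_{2j} are the association Z-scores of SNP j in the two GWASs, χ²_j is the squared Z-score of SNP j in a single GWAS, ℓ_j is the LD score of SNP j, M is the number of SNPs, N, N_1 and N_2 are GWAS sample sizes, and h² denotes SNP heritability. In the single-trait model, a captures confounding such as residual stratification. In the cross-trait model, N_s is the number of individuals shared by both GWASs and ρ is their phenotypic correlation.

```latex
E\left[\chi^2_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1

E\left[z_{1j} z_{2j}\right] = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}

r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

Sample overlap enters only the intercept term of the cross-trait equation, not the slope that carries the genetic covariance ρ_g. This is why cross-trait LD score regression can still be applied to the traits flagged with blue stars, where UK Biobank contributed to both GWASs.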
Extended Data Fig. 9 Genetic correlations between regional measures of health and nutrition and a range of complex traits and diseases.
Genetic correlations (above) and their SEs (below), estimated with LD score regression, for the GWASs of regional health- and nutrition-related outcomes (RGWASs). Colored cells are significant after FDR correction. The green numbers below the diagonal of 1s in the left part of the figure are the phenotypic correlations between the regional outcomes. Blue stars next to trait names indicate that UK Biobank was part of the GWAS of that trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.
Extended Data Fig. 10 Genetic correlations between regional measures of religiosity and political preference and a range of complex traits and diseases.
Genetic correlations (above) and their SEs (below), estimated with LD score regression, for the GWASs of regional ideology-related outcomes (religion and political preference; RGWASs). Colored cells are significant after FDR correction. The green numbers below the diagonal of 1s in the left part of the figure are the phenotypic correlations between the regional outcomes. Blue stars next to trait names indicate that UK Biobank was part of the GWAS of that trait. See Supplementary Table 3 for the list of GWASs from which the summary statistics of the complex traits were derived.
Supplementary information
Supplementary Information
General summary and frequently asked questions, Supplementary Notes, Supplementary References, Supplementary Tables 1–3 and Supplementary Figs. 1–24.
About this article
Cite this article
Abdellaoui, A., Hugh-Jones, D., Yengo, L. et al. Genetic correlates of social stratification in Great Britain. Nat Hum Behav 3, 1332–1342 (2019). https://doi.org/10.1038/s41562-019-0757-5
DOI: https://doi.org/10.1038/s41562-019-0757-5
This article is cited by
-
Partner choice, confounding and trait convergence all contribute to phenotypic partner similarity
Nature Human Behaviour (2023)
-
Genetic predictors of cultural values variation between societies
Scientific Reports (2023)
-
A meta-analysis of genetic effects associated with neurodevelopmental disorders and co-occurring conditions
Nature Human Behaviour (2023)
-
A framework for research into continental ancestry groups of the UK Biobank
Human Genomics (2022)
-
Mendelian imputation of parental genotypes improves estimates of direct genetic effects
Nature Genetics (2022)