Article

The fine-scale genetic structure of the British population

Abstract

Fine-scale genetic variation between human populations is interesting as a signature of historical demographic events and because of its potential for confounding disease studies. We use haplotype-based statistical methods to analyse genome-wide single nucleotide polymorphism (SNP) data from a carefully chosen geographically diverse sample of 2,039 individuals from the United Kingdom. This reveals a rich and detailed pattern of genetic differentiation with remarkable concordance between genetic clusters and geography. The regional genetic differentiation and differing patterns of shared ancestry with 6,209 individuals from across Europe carry clear signals of historical demographic events. We estimate the genetic contribution to southeastern England from Anglo-Saxon migrations to be under half, and identify the regions not carrying genetic material from these migrations. We suggest significant pre-Roman but post-Mesolithic movement into southeastern England from continental Europe, and show that in non-Saxon parts of the United Kingdom, there exist genetically differentiated subgroups rather than a general ‘Celtic’ population.

Accessions

Data deposits

Genotype data, as well as location information at county level (or aggregated across counties where only a small number of samples is associated with a particular county), will be made available by the WTCCC access process, via the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/) under accession numbers EGAS00001000672 and EGAD00010000632.

Acknowledgements

We thank J. Cheshire for his advice. We thank the UK Office for National Statistics, the National Records of Scotland, and the Northern Ireland Statistics and Research Agency for providing the boundaries used for the UK maps. We note that census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland. We further acknowledge the provision of maps from Eurostat, which are copyright EuroGeographics for the administrative boundaries. We acknowledge support from the Wellcome Trust (072974/Z/03/Z, 088262/Z/09/Z, 075491/Z/04/Z, 075491/Z/04/A, 075491/Z/04/B, 090532/Z/09/Z, 084818/Z/08/Z, 095552/Z/11/Z, 085475/Z/08/Z, 098387/Z/12/Z, 098386/Z/12/Z), the Academy of Finland (257654) and the Australian National Health and Medical Research Council (APP1053756). P.D. was supported in part by a Wolfson-Royal Society Merit Award.

Author information

Author notes

    • Stephen Leslie, Bruce Winney & Garrett Hellenthal

    These authors contributed equally to this work.

    • Peter Donnelly & Walter Bodmer

    These authors jointly supervised this work.

Affiliations

  1. Murdoch Childrens Research Institute, Royal Children’s Hospital, Flemington Road, Parkville, Victoria 3052, Australia

    • Stephen Leslie
  2. University of Melbourne, Department of Mathematics and Statistics, Parkville, Victoria 3010, Australia

    • Stephen Leslie
  3. University of Oxford, Department of Oncology, Old Road Campus Research Building, Roosevelt Drive, Oxford OX3 7DQ, UK

    • Stephen Leslie, Bruce Winney, Abdelhamid Boumertit, Tammy Day, Katarzyna Hutnik, Ellen C. Royrvik & Walter Bodmer
  4. University College London Genetics Institute, Darwin Building, Gower Street, London WC1E 6BT, UK

    • Garrett Hellenthal
  5. Counsyl, 180 Kimball Way, South San Francisco, California 94080, USA

    • Dan Davison
  6. University of Oxford, Institute of Archaeology, 36 Beaumont Street, Oxford OX1 2PG, UK

    • Barry Cunliffe
  7. University of Bristol, Department of Mathematics, University Walk, Bristol BS8 1TW, UK

    • Daniel J. Lawson
  8. College of Medicine, Swansea University, Singleton Park, Swansea SA2 8PP, UK

    • Daniel Falush
  9. The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK

    • Colin Freeman & Peter Donnelly
  10. University of Helsinki, P.O. Box 20, Helsinki, FI-00014, Finland

    • Matti Pirinen
  11. University of Oxford, Department of Statistics, 1 South Parks Road, Oxford OX1 3TG, UK

    • Simon Myers & Peter Donnelly
  12. University of Oxford, University Museum of Natural History, Parks Road, Oxford OX1 3PW, UK

    • Mark Robinson
  13. Information about participants appears in the Supplementary Information.

Consortia

  1. Wellcome Trust Case Control Consortium 2

  2. International Multiple Sclerosis Genetics Consortium

Authors

  1. Stephen Leslie

  2. Bruce Winney

  3. Garrett Hellenthal

  4. Dan Davison

  5. Abdelhamid Boumertit

  6. Tammy Day

  7. Katarzyna Hutnik

  8. Ellen C. Royrvik

  9. Barry Cunliffe

  10. Daniel J. Lawson

  11. Daniel Falush

  12. Colin Freeman

  13. Matti Pirinen

  14. Simon Myers

  15. Mark Robinson

  16. Peter Donnelly

  17. Walter Bodmer

Contributions

W.B. conceived and directed the PoBI project. P.D. directed the analysis and sample genotyping. B.W., A.B., T.D., K.H., E.C.R. and W.B. collected the UK (PoBI) samples and extracted DNA. IMSGC provided the European samples’ genotypes and geographical information. Sample genotyping and quality control were performed by WTCCC2 for both the UK and European genotype data. S.L., G.H., S.M. and P.D. performed the major analyses with contributions from B.W., D.D., D.J.L., D.F., C.F., M.R., M.P. and W.B. M.R. and B.C. provided historical and archaeological information and context. G.H. made Extended Data Fig. 2. S.L. produced all the other figures. P.D., S.L., B.W., G.H., S.M., M.R. and W.B. wrote the manuscript. All authors reviewed the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Peter Donnelly.

Supplementary information

PDF files

  1.

    Supplementary Information

    This file contains Supplementary Notes, Supplementary References and Supplementary Figure 1.

Excel files

  1.

    Supplementary Table 1

    This table contains pairwise F_ST values for the UK sample collection districts. For each pair of the 30 UK sample collection districts the table gives the pairwise F_ST value. The standard errors on these estimates (not shown for clarity of exposition) have a mean of 0.0001 and a maximum of 0.0003. The labels of the sample collection districts are interpreted as follows: CUM = Cumbria; LIN = Lincolnshire; NEA = North East England; OXF = Oxfordshire; YOR = Yorkshire; CHE = Cheshire; NTH = Northamptonshire; NOT = Nottinghamshire; DOR = Dorset; SUS = Sussex; NOR = Norfolk; WOR = Worcestershire; DEV = Devon; SPE = South Pembrokeshire; COR = Cornwall; NWA = North Wales; ARG = Argyll and Bute; NPE = North Pembrokeshire; BAN = Banff and Buchan; NIR = Northern Ireland; ORK = Orkney; SUF = Suffolk; LEI = Leicestershire; FOD = Forest of Dean; HER = Herefordshire; HAM = Hampshire; DER = Derbyshire; LAN = Lancashire; KEN = Kent; GLO = Gloucestershire.

  2.

    Supplementary Table 2

    This table contains pairwise F_ST values for the UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the pairwise F_ST value. The standard errors on these estimates (not shown for clarity of exposition) have a mean of 0.0001 and a maximum of 0.0003.

  3.

    Supplementary Table 3

    This table shows robustness of the inferred UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the total variation distance between the copying vectors (TVD_CV) associated with the pair (see Methods for details). The TVD_CV statistic is interpreted as a measure of the differentiation of the pair of clusters, based on genetic ancestry. Using the TVD_CV statistic, one can calculate the p-value from a permutation test of the null hypothesis that, given the cluster sizes, the individuals in the two clusters are assigned randomly to each cluster. Based on 1,000 permutations for each pair, all the pairwise comparisons of clusters give p-values below 0.001, confirming that the actual clusters are capturing real ancestry differences.

  4.

    Supplementary Table 4

    This table contains European ancestry profiles of the UK clusters. For each of the 17 UK clusters used in the main analysis (rows, labelled approximately from north to south) the table gives the ancestry profile point estimates (with 95% confidence intervals derived by bootstrapping shown in brackets) for 20 of the 51 groups obtained in the European clustering analysis (columns, labelled by European group number), namely those that contribute at least 1% to the ancestry profile of at least one UK cluster.

  5.

    Supplementary Table 5

    This table contains differences between the ancestry profiles of the UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the total variation distance between the ancestry profiles (TVD_AP) associated with the pair (see Methods for details). The TVD_AP statistic is interpreted as a measure of the differentiation of the pair of clusters, based on genetic ancestry. Using the TVD_AP statistic, one can calculate the p-value from a permutation test of the null hypothesis that, given the cluster sizes, the individuals in the two clusters are assigned randomly to each cluster. The calculated p-values, based on 1,000 permutations for each pair, are shown in brackets.

  6.

    Supplementary Table 6

    This table shows robustness of the ancestry profiles. The table gives the inferred ancestry profiles for 18 clusters, simulated under various demographic scenarios and using two different simulation approaches (here labelled ‘Real Data’ and ‘Forwards’, see Methods for details). Each simulation assumes the cluster is the result of a single admixture of two populations (samples from which are derived from the clusters we used in our main analyses), in the proportions given (50:50; 25:75; 10:90; labelled 50, 25, 10 respectively). For each of the simulated clusters the table gives the ancestry profile point estimates (with 95% confidence intervals derived by bootstrapping shown in brackets) for the 51 groups obtained in the European clustering analysis (columns, labelled by European group number). See Methods and Supplementary Note for more details.

  7.

    Supplementary Table 7

    This table contains correlations between European groups’ contributions to the UK ancestry profiles. Displayed are pairwise correlations (Pearson’s r) of each European group’s contributions to the ancestry profiles of each of the 17 UK clusters used in our main analysis. Here only values for European groups that contribute at least 1% to the ancestry profile of at least one UK cluster are shown. a, Ordered by European group numbers. b, Grouped into clusters according to similar patterns of correlation coefficients. Note that there are various scenarios which can give rise to these correlations, so that strong correlations between contributions from two European groups do not necessarily imply that the two groups contributed ancestry through the same migration event(s) (see Methods and Supplementary Note for examples of this).
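
The permutation test described for Supplementary Tables 3 and 5 can be illustrated with a minimal sketch. It is not the authors' implementation (the exact procedure is in the Methods) and rests on assumptions made here for illustration: each individual is represented by a vector of non-negative proportions summing to 1, a cluster's vector is the mean over its members, total variation distance is taken in its standard form as half the sum of absolute differences, and the p-value is the plain fraction of label permutations whose distance is at least the observed one. The function names and the use of NumPy are likewise hypothetical choices.

```python
import numpy as np


def tvd(p, q):
    """Standard total variation distance between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()


def cluster_tvd(vectors, labels):
    """TVD between the mean vectors of two clusters (labels 0 and 1).

    Assumption: a cluster is summarised by averaging its members' vectors.
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    mean_a = vectors[labels == 0].mean(axis=0)
    mean_b = vectors[labels == 1].mean(axis=0)
    return tvd(mean_a, mean_b)


def permutation_p_value(vectors, labels, n_perm=1000, seed=0):
    """Permutation test of random cluster assignment, keeping cluster sizes fixed.

    Returns the observed TVD and the fraction of permutations whose TVD is
    at least as large as the observed value.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = cluster_tvd(vectors, labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels preserves the two cluster sizes.
        shuffled = rng.permutation(labels)
        if cluster_tvd(vectors, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Under this convention, a pair of clusters for which none of 1,000 permutations reaches the observed distance yields an estimate of 0, which is what a reported p-value "below 0.001" would correspond to.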
