The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of complex disease traits and, based on ancient DNA studies of mainland Europe, Sardinia is hypothesized to be a unique refuge for early Neolithic ancestry. To provide new insights on the genetic history of this flagship population, we analyzed 3,514 whole-genome sequenced individuals from Sardinia. Sardinian samples show elevated levels of shared ancestry with Basque individuals, especially samples from the more historically isolated regions of Sardinia. Our analysis also uniquely illuminates how levels of genetic similarity with mainland ancient DNA samples vary subtly across the island. Together, our results indicate that within-island substructure and sex-biased processes have substantially impacted the genetic history of Sardinia. These results give new insight into the demography of ancestral Sardinians and help further the understanding of sharing of disease risk alleles between Sardinia and mainland populations.

Data availability

Allele frequency summary data analyzed in the study have been deposited with the European Genome-phenome Archive under accession number EGAS00001002212. The disaggregated individual-level sequence data for 2,105 samples (adult volunteers of the SardiNIA cohort longitudinal study) analyzed in this study are from Sidore et al. (ref. 2) and are available from the database of Genotypes and Phenotypes under project identifier phs000313.v4.p2. The remaining individual-level sequence data are from a case-control study of autoimmunity from across Sardinia, for which consent and local institutional review board approval were obtained. These data are available for sharing and collaboration only by request from the project leader, Francesco Cucca, Consiglio Nazionale delle Ricerche, Italy.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Lettre, G. & Hirschhorn, J. N. Small island, big genetic discoveries. Nat. Genet. 47, 1224–1225 (2015).
  2. Sidore, C. et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat. Genet. 47, 1272–1281 (2015).
  3. Naitza, S. et al. A genome-wide association scan on the levels of markers of inflammation in Sardinians reveals associations that underpin its complex regulation. PLoS Genet. 8, e1002480 (2012).
  4. Zoledziewska, M. et al. Height-reducing variants and selection for short stature in Sardinia. Nat. Genet. 47, 1352–1356 (2015).
  5. Steri, M. et al. Overexpression of the cytokine BAFF and autoimmunity risk. N. Engl. J. Med. 376, 1615–1626 (2017).
  6. Cucca, F. et al. The distribution of DR4 haplotypes in Sardinia suggests a primary association of type I diabetes with DRB1 and DQB1 loci. Hum. Immunol. 43, 301–308 (1995).
  7. Marrosu, M. G. et al. The co-inheritance of type 1 diabetes and multiple sclerosis in Sardinia cannot be explained by genotype variation in the HLA region alone. Hum. Mol. Genet. 13, 2919–2924 (2004).
  8. Pugliatti, M. et al. The epidemiology of multiple sclerosis in Europe. Eur. J. Neurol. 13, 700–722 (2006).
  9. Cao, A. & Galanello, R. Beta-thalassemia. Genet. Med. 12, 61–76 (2010).
  10. Dyson, S. L. & Rowland, R. J. Archaeology and History in Sardinia from the Stone Age to the Middle Ages: Shepherds, Sailors, & Conquerors (University of Pennsylvania Museum of Archaeology and Anthropology: Philadelphia, PA, USA, 2007).
  11. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes (Princeton University Press: Princeton, NJ, USA, 1994).
  12. Eaves, I. A. et al. The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes. Nat. Genet. 25, 320–323 (2000).
  13. Calò, C. M., Melis, A., Vona, G. & Piras, I. S. Sardinian population (Italy): a genetic review. Int. J. Mod. Anthropol. 1, 39–64 (2008).
  14. Cavalli-Sforza, L. L. & Piazza, A. Human genomic diversity in Europe: a summary of recent research and prospects for the future. Eur. J. Hum. Genet. 1, 3–18 (1993).
  15. Barbujani, G. & Sokal, R. R. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc. Natl Acad. Sci. USA 87, 1816–1819 (1990).
  16. Zavattari, P. et al. Major factors influencing linkage disequilibrium by analysis of different chromosome regions in distinct populations: demography, chromosome recombination frequency and selection. Hum. Mol. Genet. 9, 2947–2957 (2000).
  17. Elhaik, E. et al. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nat. Commun. 5, 3513 (2014).
  18. Cann, H. M. Human genome diversity. C. R. Acad. Sci. III, Sci. Vie 321, 443–446 (1998).
  19. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
  20. Keller, A. et al. New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nat. Commun. 3, 698 (2012).
  21. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014).
  22. Li, J. Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).
  23. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012).
  24. Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
  25. Hofmanová, Z. et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl Acad. Sci. USA 113, 6886–6891 (2016).
  26. Mathieson, I. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015).
  27. Sikora, M. et al. Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe. PLoS Genet. 10, e1004353 (2014).
  28. Ghirotto, S. et al. Inferring genealogical processes from patterns of Bronze-Age and modern DNA variation in Sardinia. Mol. Biol. Evol. 27, 875–886 (2010).
  29. Fraumene, C., Petretto, E., Angius, A. & Pirastu, M. Striking differentiation of sub-populations within a genetically homogeneous isolate (Ogliastra) in Sardinia as revealed by mtDNA analysis. Hum. Genet. 114, 1–10 (2003).
  30. Morelli, L. et al. Frequency distribution of mitochondrial DNA haplogroups in Corsica and Sardinia. Hum. Biol. 72, 585–595 (2000).
  31. Pala, M. et al. Mitochondrial haplogroup U5b3: a distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians. Am. J. Hum. Genet. 84, 814–821 (2009).
  32. Olivieri, A. et al. Mitogenome diversity in Sardinians: a genetic window onto an island’s past. Mol. Biol. Evol. 34, 1230–1239 (2017).
  33. Francalacci, P. et al. Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny. Science 341, 565–569 (2013).
  34. Caramelli, D. et al. Genetic variation in prehistoric Sardinia. Hum. Genet. 122, 327–336 (2007).
  35. Vona, G. The peopling of Sardinia (Italy): history and effects. Int. J. Anthropol. 12, 71–87 (1997).
  36. Contu, D. et al. Y-chromosome based evidence for pre-neolithic origin of the genetically homogeneous but diverse Sardinian population: inference for association scans. PLoS One 3, e1430 (2008).
  37. Morelli, L. et al. A comparison of Y-chromosome variation in Sardinia and Anatolia is more consistent with cultural rather than demic diffusion of agriculture. PLoS One 5, e10419 (2010).
  38. Semino, O. et al. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290, 1155–1159 (2000).
  39. Rootsi, S. et al. Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. Am. J. Hum. Genet. 75, 128–137 (2004).
  40. Chikhi, L., Nichols, R. A., Barbujani, G. & Beaumont, M. A. Y genetic data support the Neolithic demic diffusion model. Proc. Natl Acad. Sci. USA 99, 11008–11013 (2002).
  41. Passarino, G. et al. Y chromosome binary markers to study the high prevalence of males in Sardinian centenarians and the genetic structure of the Sardinian population. Hum. Hered. 52, 136–139 (2001).
  42. Olalde, I. et al. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature 555, 190–196 (2018).
  43. Kivisild, T. The study of human Y chromosome variation through ancient DNA. Hum. Genet. 136, 529–546 (2017).
  44. Skoglund, P. et al. Genomic insights into the peopling of the Southwest Pacific. Nature 538, 510–513 (2016).
  45. Goldberg, A., Günther, T., Rosenberg, N. A. & Jakobsson, M. Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations. Proc. Natl Acad. Sci. USA 114, 2657–2662 (2017).
  46. Günther, T. et al. Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proc. Natl Acad. Sci. USA 112, 11917–11922 (2015).
  47. Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343, 747–751 (2014).
  48. Loh, P. R. et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254 (2013).
  49. Moorjani, P. et al. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7, e1001373 (2011).
  50. Barbujani, G., Bertorelle, G., Capitani, G. & Scozzari, R. Geographical structuring in the mtDNA of Italians. Proc. Natl Acad. Sci. USA 92, 9171–9175 (1995).
  51. Pistis, G. et al. High differentiation among eight villages in a secluded area of Sardinia revealed by genome-wide high density SNPs analysis. PLoS One 4, e4654 (2009).
  52. Sanna, S. et al. Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat. Genet. 42, 495–497 (2010).
  53. Zoledziewska, M. et al. Variation within the CLEC16A gene shows consistent disease association with both multiple sclerosis and type 1 diabetes in Sardinia. Genes Immun. 10, 15–17 (2009).
  54. Pilia, G. et al. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet. 2, e132 (2006).
  55. Petkova, D., Novembre, J. & Stephens, M. Visualizing spatial population structure with estimated effective migration surfaces. Nat. Genet. 48, 94–100 (2016).
  56. Novembre, J. & Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nat. Genet. 40, 646–649 (2008).
  57. Botigué, L. R. et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc. Natl Acad. Sci. USA 110, 11791–11796 (2013).
  58. Henn, B. M. et al. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 8, e1002397 (2012).
  59. Paschou, P. et al. Maritime route of colonization of Europe. Proc. Natl Acad. Sci. USA 111, 9211–9216 (2014).
  60. Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).
  61. Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
  62. Raghavan, M. et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).
  63. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
  64. Blasco Ferrer, E. Paleosardo: Le Radici Linguistiche Della Sardegna Neolitica (De Gruyter, Berlin and New York, 2010).
  65. Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl Acad. Sci. USA 111, 2632–2637 (2014).
  66. Heyer, E., Chaix, R., Pavard, S. & Austerlitz, F. Sex-specific demographic behaviours that shape human genomic variation. Mol. Ecol. 21, 597–612 (2012).
  67. Lazaridis, I. & Reich, D. Failure to replicate a genetic signal for sex bias in the steppe migration into central Europe. Proc. Natl Acad. Sci. USA 114, E3873–E3874 (2017).
  68. Goldberg, A., Günther, T., Rosenberg, N. A. & Jakobsson, M. Reply to Lazaridis and Reich: robust model-based inference of male-biased admixture during Bronze Age migration from the Pontic-Caspian Steppe. Proc. Natl Acad. Sci. USA 114, E3875–E3877 (2017).
  69. Wilkins, J. F. & Marlowe, F. W. Sex-biased migration in humans: what should we expect from genetic data? Bioessays 28, 290–300 (2006).
  70. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
  71. Joshi, P. K. et al. Directional dominance on stature and cognition in diverse human populations. Nature 523, 459–462 (2015).
  72. Lohmueller, K. E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 10, e1004379 (2014).
  73. Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).
  74. Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).
  75. Browning, B. L. & Browning, S. R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).
  76. Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  77. Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 135–139 (2008).
  78. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).
  79. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  80. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).
  81. Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104 (2012).
  82. Shringarpure, S. S., Bustamante, C. D., Lange, K. & Alexander, D. H. Efficient analysis of large datasets and sex bias with ADMIXTURE. BMC Bioinformatics 17, 218 (2016).
  83. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
  84. Woerner, A. E., Veeramah, K. R., Watkins, J. C., Hammer, M. F. & Novembre, J. The role of phylogenetically conserved elements in shaping patterns of human genomic diversity. Mol. Biol. Evol. 35, 2284–2295 (2018).


Acknowledgements

The authors would like to thank Iosif Lazaridis, Pontus Skoglund, Nick Patterson, Sohini Ramachandran, Alan Rogers, and Robert Brown for discussion and technical assistance, as well as members of the Novembre and Lohmueller labs for constructive comments regarding this research. This study was funded in part by the National Institutes of Health (NIH), including support via National Human Genome Research Institute grants HG005581, HG005552, HG006513, HG007022 to G.R.A., and HG007089 to J.N.; via National Heart, Lung, and Blood Institute grant HL117626 to G.R.A.; via National Institute of General Medical Sciences grants GM108805 to J.N., F32GM106656 to C.W.K.C., and T32GM007197 to J.H.M. and A.B.; via National Institute of Neurological Disorders and Stroke grant T32NS048004 to C.W.K.C.; by the Intramural Research Program of the NIH, National Institute on Aging, with contracts N01-AG-1-2109 and HHSN271201100005C to the Italian National Research Council (Consiglio Nazionale delle Ricerche); and by National Science Foundation fellowship DGE-1746045 to J.H.M. and H.A. This research was also supported by Sardinian Autonomous Region (L.R. no. 7/2009) grant cRP3-154, PB05 InterOmics MIUR Flagship Project, and grant FaReBio2011 (Farmaci e Reti Biotecnologiche di Qualità) to F.C.

Author information


  1. Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

    • Charleston W. K. Chiang
  2. Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Behavior, University of California, Los Angeles, Los Angeles, CA, USA

    • Charleston W. K. Chiang
  3. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, USA

    • Charleston W. K. Chiang & Kirk E. Lohmueller
  4. Department of Human Genetics, University of Chicago, Chicago, IL, USA

    • Joseph H. Marcus, Arjun Biddanda & John Novembre
  5. Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Monserrato, Cagliari, Italy

    • Carlo Sidore, Magdalena Zoledziewska, Maristella Pitzalis, Fabio Busonero, Andrea Maschio, Giorgio Pistis, Maristella Steri, Andrea Angius & Francesco Cucca
  6. Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA

    • Carlo Sidore, Fabio Busonero, Giorgio Pistis & Goncalo R. Abecasis
  7. Committee on Evolutionary Biology, University of Chicago, Chicago, IL, USA

    • Hussein Al-Asadi
  8. Laboratory of Genetics, National Institute on Aging, US National Institutes of Health, Baltimore, MD, USA

    • David Schlessinger
  9. Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy

    • Francesco Cucca


Contributions

F.C., G.R.A., D.S., and J.N. conceived of the study. C.W.K.C., C.S., D.S., F.C., G.R.A., and J.N. designed the study. C.W.K.C., J.H.M., C.S., H.A., and A.B. performed the analyses. C.W.K.C., J.H.M., C.S., A.B., K.E.L., G.R.A., D.S., F.C., and J.N. interpreted the data. C.S., M.Z., M.P., F.B., A.M., G.P., M.S., A.A., G.R.A., D.S., and F.C. contributed to data collection and the initial preparation for genetic analysis. C.W.K.C. and J.N. wrote the paper with input from all coauthors.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Charleston W. K. Chiang or John Novembre.

Integrated supplementary information

  1. Supplementary Figure 1 Fst matrix within Sardinia, including the HGDP Sardinians.

    The HGDP Sardinians are recorded as being collected from the Gennargentu region (A. Piazza, personal communication). Consistent with this record, the HGDP Sardinians show close affinity to individuals from Ogliastra, although roughly half of the samples are more similar to the broader sample outside of Ogliastra. Thus, we labeled the two subgroups of HGDP as ‘SarHGa’ and ‘SarHGb’.

  2. Supplementary Figure 2 Relationship between HGDP Sardinians and newly sequenced Sardinians.

    a, PCA results of merged Sardinian whole-genome sequences and the HGDP Sardinians. b, Admixture results of the merged dataset. The HGDP Sardinians are recorded as being collected from the Gennargentu region (A. Piazza, personal communication). Consistent with this record, the HGDP Sardinians show close affinity to individuals from Ogliastra, although roughly half of the samples are more similar to the broader sample outside of Ogliastra. Thus, we labeled the two subgroups of HGDP as ‘HGa’ and ‘HGb’.

  3. Supplementary Figure 3 Admixture results using the merged dataset of Sardinia and Human Origins Array data, for K = 2 to K = 15.

    For K = 4 to K = 15, Sardinians (HGDP Sardinians, Arzana, and Cagliari, grouped by the black box) all cluster together with nearly 100% of a unique component of ancestry that is not found at high levels outside of Sardinia, consistent with their relative isolation and drift. Visualization across all K values was generated using Pong (v. 1.4.5; Bioinformatics 32, 2817–2823, 2016).

  4. Supplementary Figure 4 Coalescent-based inference of demographic history using MSMC.

    a, Inferred relative cross-coalescence rate between pairs of populations through time, based on four haplotypes each from Arzana, CEU, and TSI. A cross-coalescence rate of 0.5 is arbitrarily defined as the divergence time, which differs in definition from divergence time estimated by SMC++. Thus, one should be cautious with direct numerical comparisons of divergence time between the two methods. b, Population size history inference based on eight haplotypes of high-coverage individuals from each of Lanusei (LAN), Arzana (ARZ), 1000 Genomes CEU and TSI. The shaded box denotes approximately the Neolithic period, around 4,500 to 8,000 years ago, converted to units of generations assuming 30 years per generation.

  5. Supplementary Figure 5 Inference of demographic history using SMC++ for each Sardinian subpopulation.

    a,b, Population size history. For each population, we used 2 high-coverage individuals and 40 low-coverage individuals for analysis. The parameters used for SMC++ were t1 = 150 and knots = 10. We found that all populations from Ogliastra (ARZANA, LANUSEI, and ILBONO) displayed lower effective population sizes in the recent past than mainland European populations (CEU, TSI) (a) or the population from Cagliari (b). Uncertainty in the estimates is reflected through ten bootstrap samples, shown in a lighter shade of the same color as the trajectory estimated using the entire dataset. The shaded box denotes approximately the Neolithic period, around 4,500 to 8,000 years ago, converted to units of generations assuming 30 years per generation. c, Population divergence time estimates. Each black cross denotes the point estimate output by SMC++; uncertainty and mean point estimates from ten bootstrap samples are shown by the violin plot and the black dot, respectively.

  6. Supplementary Figure 6 Relationship between Sardinia and mainland populations.

    a, Outgroup f3 results of the form f3(Mbuti; Arzana, X), where X is a mainland population from Human Origins Array data. Outside of Sardinia, Sardinians show the highest amount of shared drift with the Basque. b, Similar display of results for f3(Mbuti; Cagliari, X). c, To formally test for excess sharing between Sardinians and Basque, we computed D statistics of the form D(Mbuti, Arzana; Tuscan or Bergamo, X), where X is a mainland European population. Relative to Arzana–Tuscan or Arzana–Bergamo sharing, excess of sharing between Arzana and X would result in significant positive values of this D statistic; dearth of sharing between Arzana and X would result in significant negative values. Mainland populations with significant results (| Z | > 4) in this analysis are bolded on the y axis.

  7. Supplementary Figure 7 Analysis of admixture signal using f3 statistics.

    For each (target) Sardinian population with sample size greater than eight and for selected European populations from Human Origins Array data, we computed f3 of the form f3(target; source 1, source 2), where source 1 and source 2 are all possible pairs of populations in Human Origins Array data. Then, for each target population, we display the pair of source populations that produced the lowest (not necessarily negative) f3 value. Significantly negative f3 statistics, bolded and marked with an asterisk, provide evidence for admixture, which we observed for the mainland European populations of English, French, Tuscan, Sicilian, and Spanish from most regions of Spain (Z range from –4.5 to –11.1). Error bars represent the standard error of the estimated f3 based on the block jackknife procedure. None of the Sardinian or Basque populations showed evidence of admixture by this test.

  8. Supplementary Figure 8 Relationship of ancient pre-Neolithic hunter-gatherers across Europe.

    The map visualizes outgroup f3 statistics of the form f3(Mbuti; Loschbour, X), where X is a population across the merged dataset of Sardinia and Human Origins Array data. Population abbreviations are the same as in Fig. 4.

  9. Supplementary Figure 9 Mixture proportions of the three-component ancestries among Sardinian populations.

    Using a method first presented in Haak et al. (Nature 522, 207–211, 2015), we computed unbiased estimates of mixture proportions without a parameterized model of relationships between the test populations and the outgroup populations based on f4 statistics. The three-component ancestries were represented by early Neolithic individuals from the LBK culture (LBK_EN), pre-Neolithic hunter-gatherers (Loschbour), and Bronze Age steppe pastoralists (Yamnaya). See Supplementary Table 5 for standard error estimates computed using a block jackknife.

  10. Supplementary Figure 10 Admixture analysis contrasting chromosome X and the autosome based on 1,577 unrelated Sardinians and TSI individuals at K = 3.

    a, Results using the autosomal data. b, Results using chromosome X data. c, Correlation of the ‘red’ ancestral component from a for each individual between chromosome X (x axis) and the autosome (y axis). The solid black line is the y = x line; the dashed black line is the regression trend line. Overall, there is enrichment of the red ancestry component (found most prominently among Sardinians from Ogliastra) on the X chromosome as compared to the autosomes. We observed qualitatively similar results whether we used only the females in the dataset, removed the Arzana individuals, or compared chromosome X estimates only to those on chromosome 7, which is the most closely matched autosome in terms of sequenced base pairs and gene density in the human reference genome.

  11. Supplementary Figure 11 Ratio of heterozygosity estimates between chromosome X and the autosome.

    For each of the 21 deeply sequenced female Europeans found in SGDP, we computed the ratio of heterozygosity on the X chromosome versus the autosome. The analysis is restricted to the 3,606 and 787 10-kb windows believed to be neutrally evolving (Woerner, A. E., Veeramah, K. R., Watkins, J. C., Hammer, M. F. & Novembre, J. The role of phylogenetically conserved elements in shaping patterns of human genomic diversity. Mol. Biol. Evol. 35, 2284–2295 (2018)) on the autosome and X chromosome, respectively. Error bars represent standard errors estimated from block jackknife procedures in 10-Mb blocks.
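The f3 and D statistics described in Supplementary Figures 6 and 7, and the block jackknife standard errors quoted there and in Supplementary Figure 11, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes population allele frequencies are known exactly (so it omits the finite-sample correction that an f3 admixture test on real data needs), it uses equal-sized blocks of consecutive SNPs in place of blocks defined by genomic distance, and the function names `f3`, `d_stat`, and `block_jackknife` are hypothetical. The D statistic follows the numerator and denominator convention of Patterson et al. (ref. 63).

```python
import math
import random


def f3(target, src1, src2):
    """f3(target; src1, src2): mean over SNPs of (c - a)(c - b).

    Inputs are per-SNP allele frequencies. Assumption: frequencies are exact,
    so no correction for finite sample size in the target is applied.
    """
    return sum((c - a) * (c - b) for c, a, b in zip(target, src1, src2)) / len(target)


def d_stat(w, x, y, z):
    """D(W, X; Y, Z) as a ratio of sums over SNPs (Patterson et al. convention)."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den


def block_jackknife(stat, pops, n_blocks):
    """Delete-one-block jackknife for stat(*pops).

    pops is a tuple of equal-length frequency lists in genome order.
    Assumption: blocks are runs of consecutive SNPs of (nearly) equal size.
    Returns (full-data estimate, jackknife standard error).
    """
    n = len(pops[0])
    if not 2 <= n_blocks <= n:
        raise ValueError("n_blocks must be between 2 and the number of SNPs")
    size = n // n_blocks
    leave_one_out = []
    for b in range(n_blocks):
        lo = b * size
        hi = n if b == n_blocks - 1 else lo + size  # last block takes the remainder
        kept = [p[:lo] + p[hi:] for p in pops]
        leave_one_out.append(stat(*kept))
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((t - mean) ** 2 for t in leave_one_out)
    return stat(*pops), math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(0)

    def drift(freqs, sd=0.05):
        # Gaussian noise as a crude stand-in for genetic drift, clipped to [0, 1].
        return [min(1.0, max(0.0, f + rng.gauss(0.0, sd))) for f in freqs]

    root = [rng.uniform(0.1, 0.9) for _ in range(2000)]
    outgroup, y = drift(root), drift(root)
    shared = drift(root)                 # ancestor shared only by x and z
    x, z = drift(shared), drift(shared)
    d, se = block_jackknife(d_stat, (outgroup, x, y, z), n_blocks=20)
    # Excess sharing between x and z should push D(outgroup, x; y, z) above zero.
    print(f"D = {d:.4f}, SE = {se:.4f}, Z = {d / se:.1f}")
```

In this toy setup, `outgroup`, `x`, `y`, and `z` play the roles of Mbuti, Arzana, Tuscan, and the mainland population X in D(Mbuti, Arzana; Tuscan, X). The same `block_jackknife` helper accepts `f3` in place of `d_stat`, and the Z score is the estimate divided by its jackknife standard error.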

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–11 and Supplementary Tables 1–6

  2. Reporting Summary

  3. Supplementary Table 7

About this article

Publication history