From around 4,000 to 2,000 BC the forest-steppe north-western Pontic region was occupied by people who shared a nomadic lifestyle, pastoral economy and barrow burial rituals. It has been shown that these groups, especially those associated with the Yamnaya culture, played an important role in shaping the gene pool of Bronze Age Europeans, which extends into present-day patterns of genetic variation in Europe. Although the genetic impact of these migrations from the forest-steppe Pontic region into central Europe has previously been addressed in several studies, the contribution of mitochondrial lineages to the people associated with the Corded Ware culture in the eastern part of the North European Plain remains contentious. In this study, we present mitochondrial genomes from 23 Late Eneolithic and Bronze Age individuals, including representatives of the north-western Pontic region and the Corded Ware culture from the eastern part of the North European Plain. We identified, for the first time in ancient populations, the rare mitochondrial haplogroup X4 in two Bronze Age Catacomb culture-associated individuals. Genetic similarity analyses show close maternal genetic affinities between populations associated with both eastern and Baltic Corded Ware culture, and the Yamnaya horizon, in contrast to larger genetic differentiation between populations associated with western Corded Ware culture and the Yamnaya horizon. This indicates that females with steppe ancestry contributed to the formation of populations associated with the eastern Corded Ware culture while more local people, likely of Neolithic farmer ancestry, contributed to the formation of populations associated with western Corded Ware culture.
The forest-steppe north-western Pontic region of the middle Dniester and Prut interfluve was a place of contact and exchange routes between human populations inhabiting the drainages of the Black and Baltic Seas from around 4,000 to 2,000 BC1. During this time, the region was occupied by forest-steppe populations attributed to the Eneolithic (3350–3200 BC)1 and the succeeding Bronze Age groups associated with the Yamnaya - Pit Grave (dated to 3,100/3,050–2,800 BC)2, the Catacomb (2,600–2,200 BC), the Babyno (2,200–1,700/1,600 BC) and the Noua (1,600–1,200/1,100 BC) cultures3,4. Although there were cultural differences between these populations, they all shared a similar nomadic lifestyle, pastoral economy and barrow burial rituals5. Some of the rounded burial mounds founded by Eneolithic people were reused by the succeeding cultural entities of the Early Bronze Age1, while other kurgans shared a mix of characteristics from both the Late Eneolithic and the Early Bronze Age funeral rites1,4.
According to some researchers6,7,8,9, the Yamnaya culture originated in the Volga-Ural interfluve and spread across the Pontic-Caspian steppe between 3,300–2,800 BC. This cultural expansion led to the development of a less homogeneous group of cultural entities belonging to the so-called Yamnaya Cultural-Historical Area/Unity10,11, hereafter referred to as ‘the Yamnaya horizon’12. People associated with the eastern Yamnaya culture spread across the steppe to the east of the Don River. With no settlements identified in this area, they were thought to be more mobile because of their supposedly nomadic economy, stimulated by the environmental conditions of the Kuban – North Caspian steppes13. On the other hand, Yamnaya settlements were found more frequently in the forest-steppe Pontic regions, to the west of the Don River, probably due to favorable environmental conditions12.
One of the most widely debated issues, which emerged in connection with studies on the Yamnaya horizon, was the relationship between the people associated with the Yamnaya and the Central European final Neolithic cultures, in particular the Corded Ware culture (dated to 2800–2300 BC)14. Archaeological records point to some similarities between the Corded Ware culture and the steppe, including shared practices such as the barrow structures and burial rituals2. Adoption of a herding economy based on mobility through the use of wagons and horses was also proposed as a common trait associated with both the Yamnaya and Corded Ware cultures12. These observations led some researchers to suggest a possible Yamnaya migration toward the Baltic drainage basin15 or a massive westward expansion of the steppe pastoralist people, representing the “barrow culture”, into the North European Plain12,16. However, specific burial customs of the Yamnaya people, such as the scarcity of grave goods, the presence of ochre, and the building of specific wooden roof or floor structures, generated opposing arguments emphasizing the significant differences between the Corded Ware and steppe cultures2.
Recent ancient DNA (aDNA) studies suggest that the large-scale migration of steppe populations associated with the Yamnaya horizon contributed to the formation of the final Neolithic central European populations14,17,18. Moreover, people associated with the Yamnaya horizon have been shown to be an admixed population with ancestry from Eastern hunter-gatherers and Caucasus hunter-gatherers14,17,19,20. Ancient DNA data indicate that the Neolithic populations from Central Europe already had the ‘Caucasus’ genetic component from the eastern steppes around 2,500 BC. The presence of this genetic component was used as an argument for the expansion of people from the Pontic-Caspian region into central Europe14,17. X chromosome sequence data suggest that it was primarily males who participated in these migrations21,22 and contributed to the formation of the people associated with the Corded Ware culture14,17. Based on the X chromosome data obtained mainly from western Corded Ware-associated individuals, it was estimated that, for every female, ~4–15 males migrated from the steppe21. Subsequently, the Yamnaya genetic component spread across Bronze Age Europe and West Asia14.
Although questions concerning the migrations of nomadic people have been addressed by a number of studies19,23,24, the contribution of mitochondrial lineages associated with the Yamnaya horizon to the formation of people associated with the Corded Ware culture from the eastern part of the North European Plain, especially from the region of modern Poland, remains contentious. To investigate the maternal relationship between these two groups, we generated complete mitochondrial genomes from the representatives of Late Eneolithic and Early Bronze Age populations from the north-western Pontic region, including Yamnaya groups and individuals associated with the Corded Ware culture from the eastern part of the North European Plain.
Materials and Methods
Ancient DNA was extracted from the Corded Ware culture individuals excavated in south-eastern Poland (N = 12) and Moravia (N = 3). Late Eneolithic (N = 5) and Bronze Age human remains (N = 25) originated from western Ukraine and came from the Yampil barrow cemetery complex located in the north–western region of the Black Sea. Bronze Age individuals were associated with different archaeological cultures, including Yamnaya (N = 14), Catacomb (N = 2), Babyno (N = 4) and Noua (N = 5). The sampling localities are shown in Fig. 1. Detailed information about sampled individuals can be found in Table 1, Supplementary Information Text (Materials) and Supplementary Table S1.
DNA extraction, genomic DNA library preparation and Illumina sequencing
Ancient DNA was extracted from teeth or petrous bones in a laboratory dedicated to aDNA analyses at the Department of Human Evolutionary Biology, Adam Mickiewicz University in Poznan (AMU), Poland. Molecular methods used for aDNA extraction and construction of Illumina sequencing genome libraries have been previously described25. One extraction and one PCR blank each were set up as negative controls during amplification of the DNA libraries. The libraries were sequenced on Illumina HiSeq2500 (125 bp, paired-end, each library on 1/10th of a lane) or Illumina HiSeq X Ten (150 bp, paired-end, each library on 1/20th of a lane) at the SNP & SEQ technology platform in Uppsala, Sweden.
Mitochondrial DNA capture enrichment and Ion Torrent PGM sequencing
The RNA bait library for complete mitochondrial genomes (mtDNA) enrichment was prepared from two present day individuals of known haplotypes following26 with minor modifications (see Supplementary Information Text for details). Two rounds of mtDNA enrichment were carried out on 12 libraries that yielded low levels of endogenous mtDNA reads in the initial Illumina shotgun screening (Supplementary Table S1). Enriched mtDNA Illumina libraries were converted into Ion Torrent sequencing libraries by PCR with indexed fusion primers (Supplementary Table S6). PCR-amplified Ion Torrent libraries were purified, prepared into equimolar pools of up to five libraries per pool and sequenced on the Ion 318 chip using the Ion Torrent PGM system (Thermo Fisher Scientific) at Molecular Biology Techniques Laboratory, Faculty of Biology, AMU.
Preliminary pipeline computation was undertaken using resources provided by the Swedish National Infrastructure for Computing (SNIC) through the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX)27. Illumina sequencing data were processed using a custom analytical pipeline28.
First, we trimmed residual adapters and merged read pairs according to29. BWA software package version 0.7.830 was used to map merged reads as single-end reads against the revised Cambridge Reference Sequence (rCRS)31,32 (GenBank: NC_012920). The ratio of reads mapping to the Y and X chromosomes (Ry) (with mapping quality greater than 30) was calculated to assign molecular sex to the individuals sequenced on the Illumina platform33.
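The Ry-based sex assignment can be illustrated with a minimal Python sketch. It assumes Ry is the fraction of Y-mapped reads among all X- and Y-mapped reads, with a normal-approximation 95% confidence interval. The cut-offs of 0.075 (male) and 0.016 (female) are the values commonly quoted for this approach and are an assumption here, as they are not stated in the text; the function name `assign_sex` is hypothetical.

```python
import math


def assign_sex(n_y, n_x, male_min=0.075, female_max=0.016):
    """Return (Ry, ci_low, ci_high, call) from counts of reads mapped to chrY and chrX.

    The thresholds are assumptions for illustration, not values taken from the text.
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads mapped to X or Y")
    ry = n_y / total
    # Normal-approximation standard error of a proportion.
    se = math.sqrt(ry * (1.0 - ry) / total)
    ci_low, ci_high = ry - 1.96 * se, ry + 1.96 * se
    if ci_low > male_min:
        call = "XY"
    elif ci_high < female_max:
        call = "XX"
    else:
        call = "undetermined"
    return ry, ci_low, ci_high, call
```

Counts are expected to be taken after the mapping-quality filter (greater than 30) has been applied. Individuals with very few sex-chromosome reads will tend to fall into the "undetermined" class because their confidence interval is wide.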
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) was used to demultiplex sequences generated by the PGM Ion Torrent. Cutadapt v.1.8.134 was then used to remove long (−M 110), short (−m 35), and low-quality sequences (−q 20). The filtered reads were analyzed with FastQC v 0.11.335 using the options described previously36. The sequences were mapped against the rCRS using TMAP v3.4.137. To collapse duplicate sequence reads with identical start and end coordinates (for both PGM and Illumina sequence data) we used FilterUniqueSAMCons.py38. Misincorporation patterns were assessed using mapDamage v2.0.539. For each individual, contamination levels were estimated with the use of schmutzi40 as previously in36. Consensus sequences were built using ANGSD v0.91041. We accepted only reads with a minimum mapping score of 30, a minimum base quality of 20, and a minimum coverage of 3, as in36. Where necessary, comparative published mt genomes were reconstructed from the bam files with the use of the same methods as described above. Mitochondrial haplogroups (mtDNA hgs) were assigned for each individual with the use of HAPLOFIND42, the PhyloTree phylogenetic tree build 1743 and Mitomaster44.
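The consensus filters described above (mapping score of at least 30, base quality of at least 20, coverage of at least 3) can be sketched in plain Python. This is not ANGSD's implementation. It is a simplified majority-base caller that assumes gap-free alignments supplied as hypothetical `(start, seq, quals, mapq)` tuples, and it writes `N` wherever fewer than three bases pass the filters.

```python
from collections import Counter


def call_consensus(reads, ref_len, min_mapq=30, min_baseq=20, min_depth=3):
    """Majority-base consensus over gap-free alignments.

    reads: iterable of (start, seq, quals, mapq), with a 0-based start position.
    Positions covered by fewer than min_depth passing bases are reported as 'N'.
    """
    piles = [Counter() for _ in range(ref_len)]
    for start, seq, quals, mapq in reads:
        if mapq < min_mapq:
            continue  # discard poorly mapped reads entirely
        for offset, (base, qual) in enumerate(zip(seq, quals)):
            pos = start + offset
            if qual >= min_baseq and 0 <= pos < ref_len:
                piles[pos][base] += 1
    consensus = []
    for pile in piles:
        depth = sum(pile.values())
        if depth < min_depth:
            consensus.append("N")
        else:
            # Ties are broken by first-seen order, a simplification.
            consensus.append(pile.most_common(1)[0][0])
    return "".join(consensus)
```

In practice the reads would come from a BAM file after duplicate collapsing, and indels and the circular mitochondrial reference would need additional handling that this sketch omits.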
For comparative studies we used ancient mtDNA data obtained from the literature, the European Nucleotide Archive (www.ebi.ac.uk/ena) and NCBI GenBank (www.ncbi.nlm.nih.gov) web databases. All comparative populations used for principal component analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), pairwise genetic distances (FST), and AMOVA analysis are described in detail in Supplementary Tables S2–S5. We divided the population associated with the Corded Ware culture into a western group encompassing comparative German samples, and an eastern group comprising individuals linked with the Corded Ware culture from this study. Similarly, individuals associated with the Yamnaya horizon were divided into western and eastern groups according to their geographic localization. Due to overlapping dating of the samples and their common origin, the western Yamnaya horizon group encompassed individuals associated with Yamnaya culture and Late Eneolithic from this study, and additional comparative samples from present day Ukraine and Bulgaria24,45,46. Because of potential maternal kinship between two Catacomb culture-associated individuals from this study, only one of them was used in the PCA and in a comparative Catacomb group. The eastern Yamnaya horizon group consisted of Yamnaya samples from the Samara region in Russia14,17,46. The map with archaeological sites from which the studied individuals originated was generated using QGIS 2.12.247 (Fig. 1).
PCA for frequencies of mtDNA hgs was calculated using Python 3.5 and Scikit-learn v. 0.18.1 package48. We utilized Matplotlib 1.5.1 Python package49 for plotting the PCA results and mtDNA hgs loadings.
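Before a PCA on haplogroup frequencies can be run, per-individual haplogroup calls have to be turned into a population-by-haplogroup frequency matrix. The sketch below shows one way to build that input; the function name and the toy labels in the usage note are hypothetical, and the level at which sub-haplogroups are collapsed is left to the caller.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Build a frequency table from per-individual haplogroup calls.

    assignments: dict mapping a population label to a list of haplogroup labels.
    Returns (haplogroups, matrix), where matrix[pop] is a list of frequencies
    ordered like haplogroups; each row sums to 1.
    """
    haplogroups = sorted({hg for hgs in assignments.values() for hg in hgs})
    matrix = {}
    for pop, hgs in assignments.items():
        if not hgs:
            raise ValueError("population %r has no individuals" % pop)
        counts = Counter(hgs)
        matrix[pop] = [counts[hg] / len(hgs) for hg in haplogroups]
    return haplogroups, matrix
```

For example, `haplogroup_frequencies({"popA": ["H", "H", "U5", "W"], "popB": ["U5", "U4"]})` yields the columns `["H", "U4", "U5", "W"]` with one frequency row per population. The rows can then be stacked into an array and passed to a PCA implementation such as `sklearn.decomposition.PCA`.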
We have used a centroid-based clustering approach to examine the PCA results and search for logical clusters within our data. We applied the k-means method (as implemented in Scikit-learn v. 0.18.1 Python package)48 to the first 5 principal components from the PCA analysis (for details see Supplementary Information Text). All k-means variants can be found in the Supplementary Material Text and Supplementary Fig. S27.
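Choosing the number of clusters by the average silhouette can be made concrete with a short standard-library sketch. It assumes Euclidean distances between points given as coordinate tuples (for instance the first principal components) and follows the usual definition: for each point, `a` is the mean distance to its own cluster, `b` is the lowest mean distance to any other cluster, and the score is `(b - a) / max(a, b)`. The function name is hypothetical.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette coefficient for a given cluster assignment.

    points: sequence of equal-length coordinate tuples.
    labels: cluster label for each point, in the same order.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Running k-means for a range of k values and keeping the k with the highest average silhouette reproduces the selection logic described above. Values near 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones.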
To further explore the relatedness of populations using the mtDNA hg frequencies, we ran the t-Distributed Stochastic Neighbor Embedding (t-SNE) analysis50 as implemented in the Scikit-learn v. 0.18.1 Python package.
FST values were computed in Arlequin 3.551, using Nei’s average number of pairwise differences52 and 10,000 permutations to estimate the p-values. In total, 18 populations, consisting only of ancient individuals with complete mtDNA genomes (Supplementary Table S3), were used in the FST and subsequent AMOVA analyses. To visualize FST values we employed multidimensional scaling (MDS) analysis with the use of Python Scikit-learn 0.18.1 package48.
We used the traditional method of analysis of molecular variance (AMOVA)53 to assess population differentiation using complete mt genomes. We ran AMOVA using the Arlequin 3.5 software package (for details see Supplementary Information Text).
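The idea behind a pairwise-difference FST with a permutation test can be sketched as follows. This is not Arlequin's procedure, which derives FST from AMOVA variance components. It is a simplified estimator, FST = (π_between − π_within) / π_between, computed on aligned sequences of equal length, with individuals shuffled between the two populations to obtain a p-value. All function names are hypothetical, and each population needs at least two sequences.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Average number of pairwise differences inside one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def pairwise_fst(pop1, pop2):
    """Simplified FST from within- and between-population pairwise differences."""
    within = (mean_within(pop1) + mean_within(pop2)) / 2.0
    between = sum(hamming(a, b) for a in pop1 for b in pop2) / (len(pop1) * len(pop2))
    if between == 0:
        return 0.0
    return (between - within) / between


def permutation_p(pop1, pop2, n_perm=1000, seed=0):
    """Observed FST and the share of label permutations with FST >= observed."""
    rng = random.Random(seed)
    observed = pairwise_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pairwise_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With small samples the estimator can return slightly negative values, which are usually read as no detectable differentiation. The resulting matrix of pairwise values is what an MDS plot would take as its dissimilarity input.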
Availability of data and material
Mitochondrial genome sequences were deposited in GenBank under accession numbers MH176332, MH176333, MH17635 - MH176355.
Ancient mitochondrial genomes
Out of the 45 analyzed samples, we successfully obtained 23 mtDNA genomes, belonging to individuals associated with the Corded Ware culture (N = 11), and with the Late Eneolithic (N = 3) and Bronze Age (N = 9) from the western Pontic region (Table 1 and Supplementary Table S1). Eleven of the mtDNA genomes were retrieved from the Illumina shotgun screening data with the depth-of-coverage (DoC) ranging from ca. 5.4× to 64×. The remaining twelve mtDNA genomes were retrieved from the hybridization capture enrichment followed by PGM Ion Torrent sequencing, and yielded DoC ranging from 11× to 194×. Nucleotide misincorporation patterns assessed using mapDamage showed characteristic aDNA damage involving C-T and G-A transitions at the 5′ and 3′ ends of DNA fragments, respectively (Supplementary Fig. S26). Schmutzi estimations conducted for each individual showed low levels of contamination (1–3%) (Supplementary Table S1). Additionally, we found no contamination in the extraction blanks and PCR negative controls. The mitochondrial DNA data are deposited in GenBank under accession numbers MH176332, MH176333, MH17635-MH176355.
In general, the individuals associated with the Corded Ware culture and the Yamnaya horizon were assigned to mtDNA lineages common among modern-day west Eurasian groups (hgs H, I, J, T, U2, U4, U5, W, X). Individuals associated with the Eneolithic and Yamnaya cultures were assigned to hgs U2e1a1, U5a2b, H2a1 and U5a1i1, U4c1, W3a1, W3a1a, respectively (Table 1 and Supplementary Table S1). Other Bronze Age individuals from the western Pontic region belonged to hg X4 (two individuals associated with Catacomb culture) and J2b1a, J1c2m, H1e (three individuals associated with Babyno culture). Individuals associated with the Corded Ware culture were assigned to hgs H (H6a, H15a1, H2a2, H1e), U4 (U4b1a1a, U4a2f), W5b, U5a1b, I4a and T2e (Table 1 and Supplementary Table S1).
Genetic distances between ancient populations
The PCA results described 50.62% of the variability and were combined with the k-means clustering (with the k value of 5 as the best representation of the data, with an average silhouette of 0.2608) (Figs 2 and S27). Based on these results individuals associated with the western and eastern Yamnaya horizon (YAW and YAE in Fig. 2) were grouped within a cluster consisting of populations from central Eurasia and Europe (blue cluster) including people associated with eastern Corded Ware culture (CWPlM) and Baltic Corded Ware culture (CWBal). This cluster did not contain any populations linked with early Neolithic farmers (red), or hunter-gatherers (green and yellow). On the other hand, k-means clustering linked the western Corded Ware culture-associated population (CWW) with Near East and Neolithic farmer ancestry groups from western and central Europe.
The k-means clustering on t-SNE results was consistent with the PCA results, although the t-SNE algorithm takes a completely different approach to dimension reduction from that of the PCA. The scatterplot of populations colored according to the k-means k = 7 (average silhouette 0.5158) (Fig. 3) represented the main components of European genetic ancestry. Individuals associated with western Corded Ware culture (CWW in Fig. 3) clustered with the early Neolithic Farmer ancestry group (dark green), while people associated with CWPlM from this study and CWBal showed greater affinity to the eastern European cluster (dark blue) which included mostly steppe populations associated with the Yamnaya horizon, Srubnaya, and western Scythians. Another clearly defined cluster was the central-western Asia group (red) with Andronovo, Catacomb and Siberian populations. The Iron Age central Asia cluster (light green) consisted mostly of Altai and Russian Scythians and populations from Siberia and Kazakhstan. The strong hunter-gatherer ancestry cluster (light blue) included the hunter-gatherers and Neolithic populations with a major hunter-gatherer component associated with the Neolithic Ukraine and the Scandinavian Pitted Ware culture. The last two clusters comprised populations linked with the post-Linear Pottery culture from central Europe and other Middle and Late Neolithic groups from Europe (yellow and purple).
Pairwise mtDNA-based FST54 values (Supplementary Table S4), visualized on MDS using the raw non-linearized FST (stress value = 0.099) (Fig. 4), also supported the PCA results and indicated that western and eastern Yamnaya horizon groups (YAW and YAE) were closer to people associated with the eastern Corded Ware culture (CWPlM) (FST = 0.00; FST = 0.01, respectively; both p > 0.05) and Baltic Corded Ware culture (CWBal) (FST = 0.00; FST = 0.00, respectively; both p > 0.05), than to populations associated with the western Corded Ware culture (CWW) (FST = 0.047 and FST = 0.059, respectively; both statistically significant, p < 0.05). Western and eastern Yamnaya horizon groups also showed close genetic affinity to the Iron Age western Scythians (SCU) (FST = 0.0022 and FST = 0.006, respectively; both p > 0.05). The most distant populations to the Yamnaya horizon groups were western hunter-gatherers (HGW) (FST = 0.23 and FST = 0.15, p < 0.001; see Supplementary Table S4).
The FST-based MDS reflected the general European population history in the post-LGM period as the three highest FST scores were detected between western hunter-gatherers (HGW) and people associated with Linear Pottery culture (LBK) (FST = 0.33, p < 0.001), between eastern hunter-gatherers (HGE) and Baltic hunter-gatherers (HGBal) (FST = 0.35, p < 0.05), and between western (HGW) and eastern hunter-gatherers (HGE) (FST = 0.36, p < 0.05) (Fig. 4 and Supplementary Table S4). The Yamnaya horizon groups (YAE and YAW) were placed centrally between northern hunter-gatherers (HGN) and Neolithic farmers (LDN), in direct proximity to the Bronze and Iron Age populations from Eastern Europe (SCU, BARu, SRU) and close to individuals associated with eastern and Baltic Corded Ware culture (Fig. 4).
We investigated the within- and between-group variability using an AMOVA analysis. Concentrating on the eastern and western Corded Ware groups, we found the best variability distribution when the individuals associated with the western Corded Ware culture (CWW in Supplementary Table S5) were grouped together with the Middle Neolithic/Bronze Age Central Europe groups, while individuals associated with the eastern and Baltic Corded Ware culture (CWPlM, CWBal), and Yamnaya horizon groups (YAW and YAE) clustered together with the eastern Europe populations (from the Middle Neolithic-Bronze Age) (4.68% of variability among groups, 3.04% among populations within groups).
By analyzing ancient mitochondrial genomes, we show that people from the eastern and western Corded Ware culture were genetically differentiated. Individuals associated with the eastern Corded Ware culture (from present day Poland and the Czech Republic) shared close maternal genetic affinity with individuals associated with the Yamnaya horizon while the genetic differentiation between individuals associated with the western Corded Ware culture (from present-day Germany) and the Yamnaya horizon was more extensive. This decreasing cline of steppe-related ancestry from east to west likely reflects the direction of the steppe migration. It also indicates that more people with steppe-related ancestry, likely both females and males, contributed to the formation of the population associated with the eastern Corded Ware culture. Similarly, closer genetic affinity to populations associated with the Yamnaya horizon can be observed in Baltic Corded Ware groups, which confirms earlier indications of direct migrations from the steppe not only to the west but also to the north, into the eastern Baltic region18,19,55. The mitochondrial data further suggest that with increased distance from the source populations of the steppe, the contribution of local people increases, which is seen as an increase of maternal lineages of Neolithic farmer ancestry in individuals associated with the western Corded Ware culture.
Among the analyzed samples, we identified two Catacomb culture-associated individuals (poz220 and poz221) belonging to hg X4. They are the first ancient individuals assigned to this particular lineage. Haplogroup X4 is rare among present day populations and has been found only in one individual each from Central Europe, the Balkans, Anatolia and Armenia56,57. Moreover, we have reported mtDNA haplotypes that might be associated with the migration from the steppe and point to genetic continuity in the north Pontic region from the Bronze Age until the Iron Age. These haplotypes were assigned to hgs U5, U4, U2 and W3. MtDNA hgs U5a and U4, identified in this study among Yamnaya, Late Eneolithic and Corded Ware culture-associated individuals, have previously been found in high frequencies among northern and eastern hunter-gatherers19,23,28,55,58,59. Moreover, they appeared in the north Pontic region in populations associated with Mesolithic (hg U5a)45, Eneolithic (Post-Stog) (hg U4)24, Yamnaya (hgs U5, U5a)24, Catacomb (hgs U5 and U5a)24 and Iron Age Scythians (hg U5a)60, suggesting genetic continuity of these particular mtDNA lineages in the Pontic region from, at least, the Bronze Age. Hgs U5a and U4-carrying populations were also present in the eastern steppe, along with individuals from the Yamnaya culture from the Samara region14,17, the Srubnaya23 and the Andronovo from Russia14. Interestingly, hg U4c1 found in the Yamnaya individual (poz224) has so far been found only in two Bell Beaker-associated individuals61 and one Late Bronze Age individual from Armenia14, which might suggest a steppe origin for hg U4c1.
A steppe origin can possibly also be assigned to hg U4a2f, found in one individual (poz282) but not reported in any other ancient populations to date, and to U5a1, the ancestral lineage of U5a1b, reported for individual poz232, which was identified not only in Corded Ware culture-associated populations from central and eastern Europe55,61 but also in representatives of Catacomb culture from the north Pontic region24, Yamnaya from Bulgaria and Russia17,46, Srubnaya23 and Andronovo62-associated groups. Hg U2e, reported for a Late Eneolithic individual (poz090), was also identified in a western Corded Ware culture-associated individual23 and in succeeding Sintashta14, Potapovka and Andronovo23 groups, suggesting possible genetic continuity of U2e1 in the western part of the north Pontic region.
Hgs W3a1 and W3a1a, found in two Yamnaya individuals from this study (poz208 and poz222), were also identified in Yamnaya-associated individuals from the Samara region in Russia17 and later in Únětice and Bell Beaker groups from Germany61,63, supporting the idea of an eastern European steppe origin of these haplotypes and their contribution to the Yamnaya migration toward central Europe. The W3a1 lineage was not identified in Neolithic times and, thus, we assume that it appeared in the steppe region for the first time during the Bronze Age. Notably, hgs W1 and W5, which predate the Bronze Age in Europe, were found only in individuals associated with the early Neolithic farmers from Starčevo in Hungary (hg W5)64, early Neolithic farmers from Anatolia (hg W1-T119C)23, and from the Schöningen group (hg W1c)61 and Globular Amphora culture from Poland (hg W5)45.
This study is the first to present mitochondrial genome data from the population associated with Corded Ware culture from the south-eastern part of present-day Poland. As this area is geographically close to the steppe region, it provides us with a better picture of the early steppe migration between 3,000 and 2,500 BC. Although our results indicate a contribution of females as well as males to the formation of populations associated with eastern Corded Ware culture, more detailed studies of X chromosome data are needed to clearly resolve female and male migrations, especially between the western Pontic steppe and the eastern part of the North European Plain.
Ancient mitochondrial genome data from the western Pontic region and, for the first time, from the south-eastern part of present day Poland, show close genetic affinities between populations associated with the eastern Corded Ware culture and the Yamnaya horizon. This indicates that females also participated in the migration from the steppe. Furthermore, greater mtDNA differentiation between populations associated with the western Corded Ware culture and the Yamnaya horizon points to an increased contribution of individuals with maternal Neolithic farmer ancestry with increasing geographic distance from the steppe region, forming the population associated with the western Corded Ware culture. Among the analyzed samples, we identified, for the first time in ancient populations, two Catacomb culture-associated individuals belonging to the now-rare mtDNA hg X4.
We are grateful to Maanasa Raghavan from Cambridge University for helpful comments on the text. We also thank Sylwia Łukasik from Adam Mickiewicz University in Poznan for help in collecting bone samples. The project was supported by the National Programme for the Development of Humanities [NPRH 0108/NPH3/H12/82/2014]. E.E. was supported by ELIXIR CZ research infrastructure project (MEYS Grant No: LM2015047) including access to computing and storage facilities, and M.J. and H.M. were supported by Knut and Alice Wallenberg Foundation.