Diverse origin of mitochondrial lineages in Iron Age Black Sea Scythians

Scythians were nomadic and semi-nomadic people that ruled the Eurasian steppe during much of the first millennium BCE. While having been extensively studied by archaeology, very little is known about their genetic identity. To fill this gap, we analyzed ancient mitochondrial DNA (mtDNA) from Scythians of the North Pontic Region (NPR) and successfully retrieved 19 whole mtDNA genomes. We have identified three potential mtDNA lineage ancestries of the NPR Scythians tracing back to hunter-gatherer and nomadic populations of east and west Eurasia as well as the Neolithic farming expansion into Europe. One third of all mt lineages in our dataset belonged to subdivisions of mt haplogroup U5. A comparison of NPR Scythian mtDNA linages with other contemporaneous Scythian groups, the Saka and the Pazyryks, reveals a common mtDNA package comprised of haplogroups H/H5, U5a, A, D/D4, and F1/F2. Of these, west Eurasian lineages show a downward cline in the west-east direction while east Eurasian haplogroups display the opposite trajectory. An overall similarity in mtDNA lineages of the NPR Scythians was found with the late Bronze Age Srubnaya population of the Northern Black Sea region which supports the archaeological hypothesis suggesting Srubnaya people as ancestors of the NPR Scythians.

of Siberia 2,8 . The forest-steppe zone of the core Scythian territory in the NPR was settled by populations with agro-pastoral economy, while the nomadic and semi-nomadic Scythian tribes occupied the steppe regions adjacent to the northern Black Sea. Archaeological studies of the Scythian and pre-Scythian period sites in the forest-steppe zone of the NPR pointed towards autochthonous origins of local agro-pastoral Scythian populations. Those groups which first emerged between the middle 7 th -3 rd century BCE, may have been formed on the foundation of pre-existing groups belonging to such cultures as Srubnaya (Timber Grave) and Thracian Hallstatt 9 . The origins of nomadic Scythians in the steppes of Central Asia 10 were supported by recent archaeological findings of elements of developed Scythian material culture in a series of kurgans in the western Sayan Mountains in southern Siberia dated to 9 th -7 th century BCE 11 .
The relationships between the groups of the Scythian horizon are not fully understood. Some researchers regard the NPR Scythians and Scytho-Siberians as one society on account of similar animal motifs on the products of their material culture, while others consider them to be different populations each having a distinct origin and geographic specificity yet sharing cultural traditions 12 . Moreover, the ancestral relationships between the NPR Scythians and local predecessor populations in the Ponto-Caspian region are neither fully resolved.
Most of the current knowledge about the genetic relations among the populations belonging to the Scythian cultural horizon is based on studies of non-coding mitochondrial DNA (mtDNA) fragments from Scythian remains from the lower Don and southern Ural regions in Russia 13 as well as Pazyryks from Altai and Inner Mongolia 7,14-18 . To date, only one complete mt genome of a Scythian individual from Russia has been published 19 . Mitochondrial lineages in the studied populations consist of an overlapping mix of haplogroups of east and west Eurasian descent, which does not clarify their ultimate origins. Here we aim to identify the maternal origin of the NPR Scythians and their genetic affinities to other contemporary Scythian groups through the analyses of complete mtDNA genomes. The investigated human remains were excavated from kurgans, crypts and ground burials in the main area of Scythian distribution, including territories in the lower Dniester, lower and middle Dnieper and Crimea.

Materials and Methods
Materials. We extracted ancient DNA (aDNA) from 29 Iron Age Scythian individuals excavated in present-day Moldova (n = 21) and Ukraine (n = 8). Moldavian samples were obtained from the Archaeological Laboratory collection at the Taras Shevchenko University in Tiraspol. Ukrainian Scythians came from the Archeology Archives of the Institute of Archaeology, National Academy of Sciences of Ukraine in Kyiv and Odessa, and the Department of Culture and Tourism of the Cherkasy Regional State Administration. Detailed information about the ancient individuals can be found in Supplementary Information Text (Materials and Methods) and Supplementary Table S1.
From each Scythian individual from Moldova two teeth were collected and transferred to the Department of Human Evolutionary Biology at the Adam Mickiewicz University in Poznań (AMU) in Poland. Ukrainian samples consisted of both bone fragments (n = 5) and teeth (n = 3), which were analyzed at the Archaeological Research Laboratory (AFL), Department of Archaeology and Classical Studies, Stockholm University (SU) in Sweden and the Archeogenetics Laboratory at Grand Valley State University (GVSU) in Allendale, Michigan, USA.  DNA extraction, library construction and Illumina sequencing. All procedures leading to Next Generation Sequencing (NGS) were performed in the ancient DNA laboratories located at AMU and the AFL. Mitochondrial hypervariable region I (HVRI) and haplogroup-diagnostic coding regions of mtDNA of the Ukrainian samples were analyzed at GVSU using the methods described in the Supplementary Information Text (Materials and Methods). Osteological samples underwent decontamination procedures including NaOCl and UV treatments (Supplementary Information Text (Materials and Methods)), followed by DNA extraction as described by refs 20 and 21. Illumina-compatible blunt-end libraries were prepared following 22 and screened on Illumina HiSeq 2500 High Output v4 (2 × 125 bp). For more details, see Supplementary Information Text (Materials and Methods).
Illumina sequencing was performed at the National Genomics Infrastructure (NGI) in Stockholm, Sweden. Sequence data were merged and mapped to human genome as previously published 23 . All primary pipeline computations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under projects b2013240 and b2015307 24 . Mitochondrial DNA capture enrichment and Ion Torrent PGM sequencing. We used commercially biotinylated probes (MYbaits ® ) for human mtDNA provided by MYcroarray ® (Ann Arbor, MI, USA; www. mycroarray.com) for mtDNA enrichment by hybridization capture on the Illumina sequencing libraries. The procedure was undertaken on 25 libraries, which did not yield sufficient mtDNA genome coverage (less than 5x), after initial Illumina screening (Supplementary Table S1). We performed two rounds of target enrichment following the manufacturer's protocol version 2.3.1 (http://www.mycroarray.com/pdf/MYbaits-manual-v2.pdf). More details about mitochondrial genome capture can be found in Supplementary  Bioinformatic analysis. Customizable analytical pipeline was used to process Illumina sequencing data as described in ref. 23. Read pairs were merged, and adapters were trimmed according to ref. 22. Merged reads were mapped as single-end reads against the revised Cambridge Reference Sequence (rCRS) 26,27 (GenBank: NC_012920) with the use of BWA software package version 0.7.8 28 .
FilterUniqueSAMCons.py was used to collapse clonal reads with identical start and end coordinates, for both PGM and Illumina sequence data as in ref. 22. Misincorporation patterns were determined with the use of map-Damage v2.0.5 32 . Final sequences were visualized using Biomatters IGV software v2.3.66 33 . ANGSD v0.910 34 was applied to build consensus sequence accepting only reads with mapping score of 30, a minimum base quality of 20, and a minimum coverage of 3.
Mitochondrial haplotypes were determined for each sample with the use of HAPLOFIND 35 , and the PhyloTree phylogenetic tree build 17 36 . The mutations reported as unexpected or missing were visually inspected in the original binary alignment map (BAM) files in IGV 33 .
The ratio of reads mapping to Y and X chromosomes (R y ) was calculated to determine molecular sex of individuals sequenced on the Illumina platform 37 . Individuals with the R y ratio ≤ 0.016 were identified as females, while those with R Y ≥ 0.077 were determined as males (Supplementary Information Text Table S7). The molecular sex calculation was restricted only to the DNA sequence reads with mapping qualities of at least 30.  Table S2). PCA was computed using RapidMiner Studio 7 (RapidMiner Inc., Boston, MA, USA) and plotted using Matplotlib 1.5.1 Python package 38 .
We have performed top-down clustering analysis in the form of k-means algorithm to explore the unbiased population relations in the PCA based on haplogropup frequencies sample-set. K-means analysis was performed in RapidMiner Studio 7 (RapidMiner Inc., Boston, MA, USA).
Pairwise genetic distances (F ST ) were applied only to samples with complete ancient mt genomes (Supplementary Table S3). F ST values and Nei's average number of pairwise differences 39 were computed in Arlequin 3.5 40 . We tested the hierarchical partitioning of the genetic variance in different setups by analysis of molecular variance (AMOVA) 41 as implemented in Arlequin. Populations used in AMOVA were the same as for the F ST analysis and the list is provided in Supplementary Table S3. We have tested several configurations of population groupings to evaluate the best matching position of Scythian population from the NPR and its relation to other populations. Multidimensional scaling (MDS) of F ST values was computed using Python scikit-learn 0.17 package 42 . The maps were created using QGIS 2.12.2 43 .
The median network was calculated and computed for the haplogroup U5 using the Networks 4.614 software (fluxus-engineering.com) with the most common mutations weighted reversely to their frequency.

Results
We successfully retrieved complete mt genomes for 19 out of the 29 Scythian individuals (Table 1 and  Supplementary Table S1). Shotgun screening of genomic DNA libraries generated mt genomes for four individuals with the coverage ranging from 6.5x to 70x. For the remaining samples, the mt capture enrichment followed by Ion Torrent sequencing, provided additional 15 complete mt genomes with the depth of coverage from 7x to 103.9x. Mitochondrial DNA genome data were deposited in GenBank under accession numbers KX977302-KX977320.
The analysis of damage patterns revealed the presence of damages typical for aDNA, i.e. C-T and G-A transitions accumulated at 5′ and 3′ ends, respectively ( Supplementary Fig. S11).
The results of the low-resolution mtDNA screening at GVSU were consistent with whole mt capture results from AMU (see Supplementary Table S1). Polymorphism patterns uncovered in the coding and HVRI regions in specimens SCY009 and SCY011 identified their lineages to belong to haplogroups J and A respectively. Specimens SCY006 and SCY010 were assigned to the M* and N* clades respectfully.
To trace genetic affinities between Scythians from present day Moldova and Ukraine (SCU) and other European and Asian ancient populations, their mt haplogroup frequencies were visualized in the space of principle components. The PCA plot of the first two components accounted for 43.4% of the total variance (Fig. 2). The SCU group was located in direct proximity to the central European Neolithic Corded Ware culture (CWC). It also grouped near a number of Bronze Age populations from eastern and central Europe (Srubnaya (SRU), Yamnaya (YAM) and Unetice (UNC)), as well as from central Asia (Bronze Age Kazakhstan (BAK)). Finally, k-means clustering (k value = 5), grouped SCU within a cluster further encompassing Scythians from Russia (SCR) and Tagar culture from southern Siberia (TAG). The Pazyryks from Mongolia (SCM) and Altai (SCA) were separated from SCU and grouped within the k-mean cluster consisting of Central and East Asian populations.
Slatkin's linearized pairwise F ST values calculated on complete mt genomes and visualized using MDS (Fig. 3) and heatmap (Fig. 4) (Fig. 3). Analysis of molecular variance (AMOVA) summarized and tested the distribution of genetic variability within and between subpopulations. We tested several other combinations of grouping Scythians with two or more populations. The results suggested the combination with the highest intragroup and lowest intergroup genetic variability to be the Scythians-Srubnaya sample-set (4.93% of variability among groups, − 0.73% among populations within groups). Second best result was observed in combination of Scythians with Unetice sample-set (3.05% among groups, 1.14% among populations). The AMOVA results are summarized in Supplementary Table S6.
Considering the high number of Scythian individuals identified as belonging to haplogroup U5 (31.6%), we conducted median network analysis of published ancient haplotypes belonging to U5a and U5b subdivisions and ranging from the Mesolithic (excluding hunter-gatherers predating last glacial maximum) to the Iron Age (Fig. 5). Haplotypes used in the median network analysis are described in detail in Supplementary Table S4. The Scythians in the U5a cluster were located centrally in the median network, near the U5a ancestral node. The U5a Scythian samples (SCU) grouped together with Mesolithic hunter-gatherers from Sweden (haplotypes 3, 4 and 5), Germany (8), Russia (10) and France (45 and 46). Non-Mesolithic samples in those clusters were less numerous and included Srubnaya (SRU) (57) and Yamnaya (YAM) (13 and 14) from Samara region and one representative of Karasuk culture from the Altai territory (central-south Siberia) (24). A single U5b Scythian individual (31) was distant from the East European populations and was placed basally to a broader cluster composed of west hunter-gatherers (HGW) from France (41) and Germany (47) as well as chronologically younger samples from the Middle Neolithic (MNE) and Unetice (UNC) (both from Germany) (15 and 53, respectively).

Discussion
On the basis of published data concerning the phylogeography of mt lineages distribution in ancient populations of Europe and Asia, the 19 complete mt genomes of the NPR Iron Age Scythians produced in this study fall into three main groups of different ancestry. The first group of mt lineages is represented by U5 haplotypes that are considered to be a European Hunter-Gatherer genetic component 44,45 . The second group comprises haplotypes belonging to H, J, T, W and N1b, ultimately connected to the genetic package of the early Neolithic farmers 44,46,47 , and the third group includes A, D, M10 and F mt lineages considered to be of East Eurasian origin [48][49][50][51] .
Representatives of the U5 haplogroup account for one third of the mt haplotypes identified in analyzed Scythians. These ultimately relate to West Eurasian hunter-gatherers, whose descendants extended throughout the European subcontinent and into East Eurasia from the Mesolithic to the Bronze Age. We have combined the known diversity of the prehistoric U5 lineage from different sources 19,45-47,52-56 into a median network, which allowed for identification of candidate source populations contributing to the U5a diversity in NPR Scythians. Although the network approach is still limited by the small number of available samples, our results indicate that the source populations with the most closely related haplotypes are the Bronze Age Srubnaya (SRU) (1900-1200 BCE) and the earlier Yamnaya (YAM) (3300-2700 BCE) from the Ponto-Caspian region and Bronze Age populations from the Altai Mountains, such as Karasuk (1500-800 BCE). Diverse U5a-carrying populations of the steppe such as Yamnaya, Srubnaya and Scythians shared the nomadic lifestyle, with the economic foundations transforming from wild game exploitation in the Mesolithic to pastoralic animal husbandry in the Bronze Age. Furthermore, both Yamnaya and Srubnaya were part of the Kurgan culture phenomenon 57 with the Scythian cultural horizon being the most representative for the kurgan building tradition.
The second group of identified mt genomes in the NPR Scythians is comprised of lineages that suggest associations of the NPR Scythians with the Neolithic European farming groups. Although the mt lineage composition of analyzed Scythians significantly differs from that seen in NEN and LBK groups (F ST = 0.06, F ST = 0.04, respectively, p > 0.05), particular lineages such as J1c2, T2b, H/H5, H, W and N1b1a ultimately go back to the earliest European farmers. Lineages of J1c, H5 and T2b belong to the Neolithic farming package of mtDNA haplogroups which have been found in most Neolithic and Bronze Age European populations 19,44,47,55,58 . Noteworthy, individuals belonging to the N1b1a have been found in the Neolithic Anatolia 19,59 , but the lineage has not appeared in any other Eurasian Neolithic, Bronze or Iron Age populations to date. Therefore, the presence of N1b1a in Scythians could be attributed to population migrations along the southern boundaries of the Ponto-Caspian region 60 .
Mitochondrial lineages T2b, W3a and J2b1a6 have been identified in representatives of the Bronze Age groups such as Srubnaya (T2b, J2b1a, H5) 19 , Sintashta (J2b1a, J1c) 55 , Yamnaya (W3a and T2b) 47,61 and Mezhovskaya (J2b1a) 55 along with East European Eneolithic Trypillian culture (T2b) 62 , suggesting genetic continuity of these lineages from at least the Bronze Age or even Neolithic times in NPR region. This is further supported by published data 19 which showed that almost one fifth of the genetic makeup of the Late Bronze Age Srubnaya people of the Ponto-Caspian region is of the Early European Farmer or Anatolian Neolithic ancestry possibly resulting from the admixture of populations related to Early European Farmers and Yamnaya. Thus, if the NPR Scythians are the descendants of populations related to Srubnaya, the origin of the identified farming lineages would likely be within the steppe/forest-steppe region between the NPR and southern Ural. According to previous genomic studies 47,55 the CWC people are likely to have arisen from Yamnaya background. Thus, genetic affinities of the NPR Scythians to the Yamnaya people might also explain their close genetic similarity to the CWC reflected in PCA and F ST results. Close genetic relations of the NPR Scythians and Srubnaya are supported by the F ST analyses revealing no significant differences between these two populations (Figs 3 and 4, Supplementary Table S5). AMOVA further corroborated these results showing that the combination of Scythians and Srubnaya results in the lowest genetic variability among populations in a group and the highest variability between groups (Supplementary Table S6). Our results support the archaeological interpretations concerning the origin of the Scythians. One of the versions of this theory suggests that the Srubnaya people have migrated in several waves from the Volga-Ural region to the NPR during the second half of the second millennium BCE where their descendants gave rise to the Scythians around the 7 th century BCE 63 .
The third group of lineages identified in the NPR Scythians is derived from East Eurasian ancestry. Since, to our knowledge, there is no evidence of agricultural subsistence in East Eurasian Scythians, these lineages should be considered to be genetic components associated with nomadic populations. Mitochondrial haplogroups such as A, D and F have already been found in samples from the Mesolithic, Neolithic, Bronze and Iron Age southern Siberia and the Altai 7, [14][15][16]18,55,[64][65][66][67] . Notably, haplogroup M10 found among Scythians from Glinoe, is present not only in far East Asia but also in modern populations of the Altai 68 . Among ancient populations, the M10 lineages have been found in Chinese specimens from southern Xinjiang 48 (8 th -1 st century BCE) and Xiongnu 69 (3 rd century BCE). Furthermore, haplogroup M have been identified in Pleistocene individuals from western Europe but it is thought that these lineages disappeared due to bottleneck effect in the Last Glacial Maximum 56 .  Haplogroup H8c, identified in the NPR Scythians likely belongs to East Eurasian lineages as well. Sequence analysis of 830 modern Eurasian mt genomes suggested a distinct phylogeographc history for H8, with a clustering of Near Eastern and Central Asian haplotypes of H8 and a pronounced presence of carriers of H8c in the Altai region 70 . It was further hypothesized that the distribution of this lineage could have been facilitated by nomadic migrations along the NPR coast 70 . Although our PCA analysis showed Altai Pazyryk (SCA) to be distant from the NPR Scythians (SCU) (Fig. 2), it must be emphasized that the presence of haplogroups A and H8c in the analyzed population connects NPR Scythians to the Altai and identifies this region as a possible source of these East Eurasian mt lineages. The only Scythian mt genome from southern Urals published thus far also belonged to an East Eurasian lineage, G2a 19 . The Y chromosomal lineage (R1a1a1b2a2a) reconstructed for this individual was supposedly characteristic for members of the Srubnaya culture 19 which additionally supports our conclusions concerning close genetic links between Scythians and people related to the late Bronze Age Srubnaya.
Previous analyses of mtDNA HVRI sequence data from Scythians inhabiting Rostov-on-Don region in eastern NPR also resulted in the identification of East Eurasian lineages, such as D 13 . Moreover, mt lineage D4j2 identified in the individual SCY006 was shown to have been dominant in the Pazyryk culture from Altai and Inner Mongolia (4 th -2 nd century BCE) 7,14,[16][17][18] . The presence of East Eurasian mt lineages supports those archaeological theories that acknowledge the influence of an East Eurasian element in the formation of the Scythian horizon 71,72 .
Scythians from Rostov-on-Don as well as Pazyryks from Altai and Inner Mongolia were carriers of mixed east and west Eurasian lineages, with the dominant presence of the latter at 62.5% and 53.3%, respectively 7,13-18 . Mitochondrial haplogroup analyses of the NPR Scythians from this study and those from Rostov-on-Don and Pazyryks from Altai and Inner Mongolia, reveal that, for the most part, the same lineages are found in all three groups and are often singularly represented in each group. Noteworthy, comparing the frequencies of east and west Eurasian haplogroups in all three groups of the Scythian horizon, an east-west mtDNA lineage cline is visible, for east Eurasian lineages going west-east is from 26.3% (in present study) through 37.5% (in Scythians from Rostov-on-Don) to 46.7% (in Pazyryks) with the opposite trend for west Eurasian lineages. Otherwise, mt lineage composition is comparable in all three groups of the Scythian horizon which supports their shared maternal genetic roots founded on the common east and west Eurasian substrate with an input from neighboring populations. The genetic influx of East Eurasian haplotypes might be the result of establishing relationships between migrants with European ancestry and women of east Eurasian origin as was previously proposed by 66 in case of Iron Age south Siberian populations. However, more detailed studies of autosomal DNA are needed to clearly resolve this issue.

Conclusions
Sequence data from whole mt genomes indicate three potential mtDNA lineage ancestries of the NPR Scythians. The first component traces back to west Eurasian hunter-gatherers and is represented by the lineages belonging to subdivisions of haplogroup U5. The second component is composed of mt lineages connected with Neolithic farming expansion into Europe (H, J, T, W and N1b). The last ancestral mt lineage component is comprised of east Eurasian haplotypes belonging to D, A, F1, H8 and M10 which point to association of the NPR Scythians with east Eurasian populations, in particular from the Altai region. A comparison of NPR Scythian mtDNA lineages with other ancient groups suggests close genetic affinities with representatives of the Bronze Age Srubnaya population, which is in agreement with the archaeological hypothesis suggesting Srubnaya people as the ancestors of the NPR Scythians. However, to provide additional genetic support for this hypothesis data from nuclear genomes are needed.