Introduction

Recent studies of genome-wide autosomal single nucleotide polymorphisms (SNPs) showed that Siberian populations carry a significant amount of West Eurasian admixture.1 This is consistent with the results of mitochondrial DNA (mtDNA) and Y chromosome studies showing that North Asian populations have a closer genetic relationship with Central Asian and West Eurasian populations.2, 3, 4, 5, 6, 7 The presence of West Eurasian genetic components in North and East Asia can be interpreted either as evidence of ancient migration via the northern route8 or simply as a reflection of recent population admixture.9, 10 However, the results of mtDNA analysis in northern Eurasian populations do not support a northern Asian migration route out of Africa, but they do predict that there were at least two migrations into South Siberia, one from East Asia and one from West Eurasia.6 West Eurasian mtDNA haplogroups found in the gene pools of South Siberians (for example, N1e, I4, J1b2, N1a, U4 and X2e) demonstrate an obvious link between the populations of Siberia and those of West Asia, the Caucasus and East Europe. Notably, coalescence times based on complete mtDNA genomes suggest that haplogroups X2e, J1b2 and N1a arrived from the west in postglacial times (about 12–15 Ka). Similarly, geographic distribution and Y chromosome short tandem repeat (Y-STR) diversity indicate that haplogroups Q1a3*-M346 (the sister haplogroup of the Native American-specific haplogroup Q1a3a-M3), R1b-M343 and R1a1*-M17 probably migrated into North Asia and northern East Asia from the west.7 Age estimates based on Y-STR variation within these haplogroups suggest that such migrations occurred in postglacial times (about 15–18 Ka).
In addition, ancient DNA studies demonstrate that West Eurasian admixture was present in South Siberia and Northwest China as early as the Early Bronze Age. This indicates that West Eurasian genetic input began earlier than the relatively recent population admixture that has been proposed.11, 12, 13, 14

Haplogroup P-92R7, which consists of subclades Q-M242 and R-M207, is thought to have originated in Central Asia about 40 Ka.15 Haplogroup Q-M242 is present across Eurasia but is most frequent in North Asia.7, 16, 17, 18, 19 It has also been suggested that Q-M242 carriers migrated through the Altai/Baikal region of Siberia into the Americas.16, 17, 20 Haplogroup R-M207, which consists of two currently defined subclades, R1-M173 and R2-M124, occurs at high frequencies in some regions of Eurasia. Haplogroup R1-M173 is estimated to have arisen during the height of the Last Glacial Maximum, most likely in Southwest Asia.8, 15 It includes two main subclades, R1a-M420 and R1b-M343, which have distinct distributions across Eurasia.18, 21, 22, 23 R1a-M420 is most frequently observed in East Europe, the Altai region of Siberia and Southwest Asia.2, 24, 25 R1b-M343, which originated in West Asia, has two most frequent subclades, R1b1b1-M73 and R1b1b2-M269.8, 18, 23, 26 Haplogroup R1b1b1-M73 is observed mainly in Asia, whereas R1b1b2-M269 is frequent in Europe, especially in the west, but is present in West Asia as well.7, 23 Moreover, a recent study by Balaresque et al.27 suggests a West Asian origin of R1b1b2-M269 and its expansion in the Neolithic. Haplogroup R2-M124 is most often observed in Asia, especially in South and Central Asia.7, 8, 15, 25

Although the presence of haplogroups R-M207 and Q-M242 in South Siberia and neighboring regions has been reported earlier,16, 17, 20, 28, 29 their substructure in Siberians has not yet been examined in detail, even though it might provide important new insights into the early migrations into the Americas. Therefore, in this study, we performed a phylogeographic analysis of subclades within haplogroups R-M207 and Q-M242 in different populations of Siberia and northern East Asia, based on high-resolution genotyping of Y chromosomes using both SNP- and STR-based approaches.

Materials and methods

Subjects and DNA typing

A total of 885 samples (whole blood and hair-root specimens) from unrelated males were collected in populations of South Siberia (Altaians, Teleuts, Khakassians, Shors, Tuvinians, Todjins, Tofalars, Sojots, Buryats), Central and East Siberia (Evenks, Evens, Yakuts and Koryaks), East Asia (Mongols and Koreans) and East Europe (Kalmyks) (Table 1). The Mongolic-speaking Kalmyks appear to be descendants of western Mongolians (Oirats) who migrated from Central Asia to the Caspian region in the 17th century. Both Y chromosome and mtDNA variability data also show considerable genetic similarity between Kalmyks, Buryats and Mongols.6, 30 For these reasons, the Kalmyks are treated here as ‘North Asians’. All samples studied were collected with appropriate ethics approval and informed consent.

Table 1 Distribution of haplogroups Q, R2 and R1b in the populations studied (number of individuals, with percentages in parentheses)

Haplogroup P markers 92R7 (for the whole haplogroup P), M242 (for Q), MEH2 (for Q1a), M120 (for Q1a1), M25 (for Q1a2), M346 (for Q1a3), M3 (for Q1a3a), P48 (for Q1a4), P89 (for Q1a5), M323 (for Q1a6), M173 (for R1), M343 (for R1b), M73 (for R1b1b1), M269 (for R1b1b2) and M124 (for R2) were assayed using the PCR primers summarized in Karafet et al.31 All polymorphisms were typed by DNA sequencing on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The Y-SNP haplogroup nomenclature used here follows the recommendations of the Y Chromosome Consortium.31
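The marker assignments above imply a nested tree. A sample's haplogroup is the most derived node that can be reached through derived alleles, and an asterisk marks a paragroup. For example, a sample is Q1a*-MEH2 when MEH2 is derived but none of the typed Q1a subclade markers are. The Python sketch below illustrates this logic under several assumptions. The tree contains only the markers listed here, so R-M207 (not typed) is omitted and R1 and R2 hang directly under P. Markers missing from the input are treated as ancestral. `TREE` and `assign_haplogroup` are hypothetical names, not part of any published tool.

```python
# Hypothetical encoding of the typed markers as (label, defining_marker, children).
# R-M207 was not typed, so in this sketch R1 and R2 sit directly under P.
TREE = ("P", "92R7", [
    ("Q", "M242", [
        ("Q1a", "MEH2", [
            ("Q1a1", "M120", []),
            ("Q1a2", "M25", []),
            ("Q1a3", "M346", [("Q1a3a", "M3", [])]),
            ("Q1a4", "P48", []),
            ("Q1a5", "P89", []),
            ("Q1a6", "M323", []),
        ]),
    ]),
    ("R1", "M173", [
        ("R1b", "M343", [
            ("R1b1b1", "M73", []),
            ("R1b1b2", "M269", []),
        ]),
    ]),
    ("R2", "M124", []),
])


def assign_haplogroup(derived, node=TREE):
    """Return the most derived 'label-marker' reachable via derived markers.

    `derived` is the set of markers carrying the derived allele; any marker
    not in the set is treated as ancestral (an assumption for untyped ones).
    A '*' marks a paragroup: derived at this node but at none of its typed
    subclades. Returns None if the root marker itself is not derived.
    """
    label, marker, children = node
    if marker not in derived:
        return None
    for child in children:
        result = assign_haplogroup(derived, child)
        if result is not None:
            return result
    star = "*" if children else ""
    return f"{label}{star}-{marker}"
```

For example, a chromosome derived at 92R7, M242, MEH2 and M346 but ancestral at M3 is labelled `Q1a3*-M346`, matching the notation used in the Results.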

A total of 121 samples belonging to haplogroups Q (88 samples), R1b (23 samples) and R2 (10 samples) were analyzed at 12 STR loci (DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439) using the PowerPlex Y System (Promega, Madison, WI, USA). Electrophoresis results were analyzed with Genscan v. 3.7 and Genotyper v. 3.7 software (Applied Biosystems).

Data analysis

Median-joining networks of STR haplotypes were constructed with the Network 4.6 program (http://fluxus-engineering.com). For network construction, STR loci were weighted, with a weight assigned to each range of variance values, following the distribution of the number of mutations per character.32 Loci DYS385a and DYS385b were excluded from the median-joining network analysis because their alleles cannot be assigned unambiguously to the two loci without separate typing. For both network construction and age calculation, DYS389II allele sizes were obtained by subtracting the DYS389I allele size.
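The sketch below shows the haplotype preprocessing described above. It assumes that each haplotype is stored as a Python dict keyed by locus name, which is a hypothetical representation. The function drops DYS385a/b and replaces DYS389II by DYS389II minus DYS389I. The result is the 10-locus vector listed in the Figure 1 caption. The variance-based weighting applied in Network 4.6 is not reproduced here.

```python
# The 10 loci used for the networks (DYS385a/b excluded), in caption order.
NETWORK_LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391",
                "DYS392", "DYS393", "DYS437", "DYS438", "DYS439"]


def network_vector(haplotype):
    """Convert a 12-locus haplotype dict into the network input vector.

    DYS385a/b are simply not selected (allele-to-locus assignment is
    ambiguous), and DYS389II is replaced by DYS389II - DYS389I so that the
    DYS389I repeat block, which lies inside the DYS389II amplicon, is not
    counted twice.
    """
    h = dict(haplotype)  # copy so the caller's dict is not modified
    h["DYS389II"] = h["DYS389II"] - h["DYS389I"]
    return [h[locus] for locus in NETWORK_LOCI]
```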

The age of STR variation within haplogroups was estimated as the average squared difference in the number of repeats between all present-day chromosomes and the founder haplotype (formed by the median repeat score at each STR locus within the haplogroup), averaged over STR loci and divided by the mutation rate.25, 33 We used the evolutionary effective mutation rate of 6.9 × 10−4 per 25 years, which is based on STR variation within Y chromosome haplogroups in populations with documented short-term histories.33 The upper bound for the divergence time of two groups of haplotypes was calculated as the divergence time estimate obtained by assuming that the STR variance in repeat number at the beginning of population subdivision (V0) was zero.34 Only tri- and tetranucleotide markers were used in the age calculations. Therefore, in addition to the ambiguous loci DYS385a and DYS385b, the pentanucleotide locus DYS438 was excluded from these calculations.
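The age estimate described above can be written as t = (25 / w) · (1/L) Σ_l (1/n) Σ_i (x_il − m_l)². Here x_il is the repeat count of chromosome i at locus l, m_l is the median (founder) allele, L is the number of loci, n is the number of chromosomes and w = 6.9 × 10−4 per 25 years. The Python sketch below implements that formula for the nine tri- and tetranucleotide loci, under three assumptions. Haplotypes are dicts with DYS389II already reduced by DYS389I. The median of an even number of values is taken as the mean of the two middle values, which is only one possible reading of "median". The standard errors quoted in the text are not computed. The function names are hypothetical.

```python
from statistics import median

# Tri- and tetranucleotide loci only: DYS385a/b and pentanucleotide DYS438 excluded.
AGE_LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391",
            "DYS392", "DYS393", "DYS437", "DYS439"]

W_PER_25_YEARS = 6.9e-4  # effective mutation rate per locus per 25 years
YEARS_PER_RATE_UNIT = 25


def asd_from_founder(haplotypes, loci=AGE_LOCI):
    """Average over loci of the mean squared repeat difference from the founder.

    The founder allele at each locus is the median repeat count
    (statistics.median averages the two middle values when n is even).
    """
    per_locus = []
    for locus in loci:
        repeats = [h[locus] for h in haplotypes]
        founder = median(repeats)
        per_locus.append(sum((r - founder) ** 2 for r in repeats) / len(repeats))
    return sum(per_locus) / len(per_locus)


def str_age_years(haplotypes, loci=AGE_LOCI, w=W_PER_25_YEARS):
    """Age in years: ASD divided by the rate w (per 25 years), times 25."""
    return asd_from_founder(haplotypes, loci) / w * YEARS_PER_RATE_UNIT
```

As a rough check of scale, the South Siberian Q1a3*-M346 age of about 4.03 Ka corresponds under this formula to an average squared difference of about 4030/25 × 6.9 × 10−4 ≈ 0.111.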

Results

SNP analysis of 885 Y chromosomes from 16 ethnic groups representing populations of northern East Asia shows that haplogroup Q is frequent in some of the Siberian populations studied (Table 1). In our sample, this haplogroup is represented by four subgroups: Q1a*-MEH2, Q1a2-M25, Q1a3*-M346 and Q1a3a-M3. Haplogroup Q1a3*-M346 was found in Turkic-speaking Tuvinians (38%), Todjins (38.5%), Altaians (25.8%), Sojots (7.1%) and Khakassians (6.3%), and only once in Mongolic-speaking Kalmyks (1.1%). The rare haplogroup Q1a2-M25, previously detected mostly in Iranians, Turks, Uygurs, Uzbeks and Han,7, 18, 28, 35 was also found in Kalmyks (1.1%). The Amerindian-specific haplogroup Q1a3a-M3 was present only in Tungusic-speaking Evens inhabiting the Sea of Okhotsk coast, at a frequency of 3.2%. Four Koryak individuals (10.3%) from the same region belong to paragroup Q1a* (xQ1a1, Q1a2, Q1a3, Q1a4, Q1a5, Q1a6). They all have similar Y-STR profiles (Supplementary Table S1) and probably belong to a distinct subclade that has not yet been defined by an SNP marker.

Typing of the Y chromosome marker M173 allowed us to define the structure of haplogroup R1 in the region under investigation. We previously reported the phylogeographic pattern of haplogroup R1a1-M17 in Siberian populations. Shors and Teleuts had the highest frequency of this haplogroup (more than 50%), and R1a1-M17 was generally more frequent in populations of the Altai and eastern Sayan region than in adjacent areas.24 Subsequent SNP analysis shows that two R1b haplogroups, R1b1b1-M73 and R1b1b2-M269, are present in some Siberian populations (Table 1). R1b1b2-M269, which is frequent in Europe, occurs at low frequency in a diverse set of Siberian populations: Evenks (2.4%), Buryats (0.7%), Mongols (4.3%) and Tofalars (6.7%). More interesting is the presence of haplogroup R1b1b1-M73 in a whole series of Turkic-speaking populations, namely Shors (13.2%), Teleuts (11.4%), Khakassians (3.2%), Tuvinians (1.9%) and Altaians (1.1%), as well as in Mongolic-speaking Kalmyks (2.2%). In contrast, the remaining R haplogroup, R2-M124, is present only in Mongolic-speaking Buryats (2.7%) and Kalmyks (6.6%) (Table 1). Haplogroup R2-M124 is predominantly distributed in South Asia (in India and Pakistan),25 but it is also found in China (in Uygurs, Han and Hui)7 and Central Asia (in Tajiks and Kyrgyz).28 Its presence in some Siberian populations is therefore likely to result from male-mediated gene flow from Central/East Asia.

To obtain better resolution of the phylogenetic relationships between Y chromosomes, we analyzed haplotypes at 12 STR loci (Supplementary Table S1). The median-joining network of haplogroup Q1a shows that STR haplotypes fall into three clusters matching their SNP-defined haplogroups: Q1a3*-M346, Q1a3a-M3 and Q1a*-MEH2 (Figure 1). The coalescence age of South Siberian Q1a3*-M346, based on the average squared difference in the number of tri- and tetranucleotide repeats, is about 4.03±1.25 Ka, whereas the age of the Koryak Q1a*-MEH2 cluster is only 1.0±1.0 Ka.

Figure 1

Median-joining network of haplogroups Q1a*-MEH2, Q1a3*-M346 and Q1a3a-M3 based on STR loci DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439 in populations of North Asia. Each circle represents a haplotype, defined by a combination of STR alleles. Circle size is proportional to haplotype frequency, and the smallest circle represents a haplotype observed once. Lines between circles represent mutational distance, the shortest distance being a single mutational step. Median vectors are indicated by black points. Haplotype labels: Alt, Altaians; Ev, Evens; Kh, Khakassians; Krk, Koryaks; St, Sojots; Td, Todjins; Tv, Tuvinians.

The median-joining network of haplogroup R1b1b1-M73 shows two subclusters of haplotypes in the Siberian populations studied. One of them (designated A in Figure 2) is defined by the median haplotype 14-13-16-13-17-22-11-13-13-15-10-13 (for all loci studied: DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439), and the other (B) by the haplotype 14-13-13-14-16-19-11-13-13-15-10-13 (Figure 2). The coalescence age of R1b1b1-M73 in South Siberia, based on tri- and tetranucleotide repeats, is estimated at about 18.2±10.5 Ka. The ages of subclusters A and B are 4.4±1.5 and 5.6±4.0 Ka, respectively. Published data show that these two subclusters occur together not only in Siberia but also in different ethnic populations of China7 and the Caucasus.23, 36 In the Caucasus populations, a third subcluster of R1b1b1-M73, designated here as ‘C’, may also be recognized (Figure 2). Inferring the migration history of haplogroup R1b1b1-M73 will require a further search for subcluster-defining SNPs. More Y-STR data for R1b1b1-M73 are also needed to clarify the origin of this haplogroup in Siberians.
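The sketch below makes the difference between the two subcluster-defining median haplotypes concrete. It computes an unweighted stepwise distance over the 10 network loci, defined as the sum of absolute repeat differences, and skips DYS385a/b as in the Methods. It then assigns a haplotype to the nearer median. The sketch assumes that DYS389II is given after subtraction of DYS389I, which the quoted medians appear to be. The names are hypothetical. This is only an illustration: the subclusters in Figure 2 were identified from the weighted median-joining network, not by nearest-median assignment.

```python
# Locus order of the median haplotypes quoted in the text.
LOCI = ["DYS19", "DYS385a", "DYS385b", "DYS389I", "DYS389II", "DYS390",
        "DYS391", "DYS392", "DYS393", "DYS437", "DYS438", "DYS439"]

# Median haplotypes of R1b1b1-M73 subclusters A and B, as given above.
MEDIANS = {
    "A": [14, 13, 16, 13, 17, 22, 11, 13, 13, 15, 10, 13],
    "B": [14, 13, 13, 14, 16, 19, 11, 13, 13, 15, 10, 13],
}

EXCLUDED = {"DYS385a", "DYS385b"}  # ambiguous without separate typing


def step_distance(h1, h2):
    """Unweighted stepwise distance: summed absolute repeat differences."""
    return sum(abs(a - b) for locus, a, b in zip(LOCI, h1, h2)
               if locus not in EXCLUDED)


def nearest_subcluster(haplotype):
    """Label of the median closest to `haplotype` (ties go to 'A', listed first)."""
    return min(MEDIANS, key=lambda name: step_distance(haplotype, MEDIANS[name]))
```

On these loci the two medians are 5 steps apart: 1 at DYS389I, 1 at DYS389II and 3 at DYS390.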

Figure 2

Median-joining network of haplogroup R1b1b1-M73 based on STR loci DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439 in populations of northern Eurasia. Designations are as in Figure 1. Haplotype labels: Alt, Altaians; Kh, Khakassians; Km, Kalmyks; Sh, Shors; Tel, Teleuts; Tv, Tuvinians. The network also includes haplotypes reported elsewhere:36 Bal, Balkars; Bash, Bashkirs; Kab, Kabardinians; Mar, Mari; Meg, Megrels; Rus, Russians; Tat, Tatars; Tur, Turks. STR-based subclusters are designated by the capital letters A, B and C.

Discussion

Previous studies have shown that haplogroup R1a1-M17 has been present in South Siberia since the Holocene period (11.3±3.2 Ka).24 The oldest age estimates for this haplogroup, dating back to Mesolithic times (approximately 18 Ka), were obtained in South India and South Pakistan.37 A similarly high age estimate (22.9±9.3 Ka) was obtained for haplogroup R1b1b1-M73 from the variation of tri- and tetranucleotide repeats in populations of the Caucasus, South Ural and East Europe.36 The age of this haplogroup in South Siberia is also high, about 18.2±10.5 Ka. However, a much younger age was found for haplogroup Q1a3*-M346 in South Siberia (4.0±1.25 Ka). Note that the age of Q1a3*-M346 calculated from haplotypes detected in populations of China and North Pakistan is much higher: 17.8±4.1 Ka according to Zhong et al.7

To compare the STR diversity of haplogroups Q1a3*-M346 and R1b1b1-M73 across geographic regions, we compiled a set of SNP-typed haplotypes restricted to seven STR markers (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393) and retrieved from different population studies (Table 2). Haplogroup R1b1b1-M73 shows high intra-haplogroup variability and appears to be very ancient, dating to about 40 Ka. Note, however, that this high calculated age is probably driven by the very high repeat variance at locus DYS390 (4.31 versus 0.25–0.93 at the remaining six loci). The coalescence time estimate for this haplogroup in South Siberia is about 20 Ka, although its two phylogenetic STR clusters (A and B) appear to be young (<4 Ka). This suggests either that these lineages entered South Siberia in relatively recent historical times or that their evolution in South Siberia was associated with recent population bottlenecks. Appropriate ancient DNA samples should be studied to resolve this question.
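The age formula in the Methods divides the mean per-locus squared deviation by w and multiplies by 25 years. Under that formula, each of the L = 7 loci contributes (25/w)·V_l/L to the total age, where V_l is the squared deviation at locus l. The worked arithmetic below makes the influence of DYS390 explicit. It rests on two assumptions: that the Table 2 estimate uses the same rate w = 6.9 × 10−4 per 25 years, and that each locus's repeat variance approximates its contribution to the average squared difference. The squared difference measured from the median is at least as large as the variance about the mean, so this approximation slightly understates each contribution.

```latex
\begin{aligned}
t_{\mathrm{DYS390}} &\approx \frac{25}{w}\cdot\frac{4.31}{7}
  = \frac{25 \times 0.6157}{6.9\times 10^{-4}} \approx 22.3\ \text{Ka},\\
t_{\text{other six loci}} &\in \left[\frac{25}{w}\cdot\frac{6\times 0.25}{7},\ \frac{25}{w}\cdot\frac{6\times 0.93}{7}\right] \approx [7.8,\ 28.9]\ \text{Ka}.
\end{aligned}
```

Summing the two parts gives roughly 30 to 51 Ka, which brackets the ~40 Ka estimate. DYS390 alone accounts for about 22 Ka of that total.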

Table 2 Coalescence age estimates and median haplotypes of Y chromosome haplogroups

A similar problem exists with haplogroup Q1a3*-M346, which is dated in South Siberia at 4.5 Ka. This value is much lower than those obtained in previous studies. However, those estimates were based solely on Q*(xQ1a3a-M3) STR diversity, because the Y chromosome data available at the time lacked sufficient resolution. For instance, Q* has been dated to 17.7 Ka in South Siberia2 and to 15.4 Ka in Mongolia.16 According to the study by Zegura et al.,20 the divergence time estimate for the Altaian and North Asian Q* haplotypes versus the Native American Q1a3a-M3 haplotypes was 17.2±4.6 Ka. We also calculated the divergence time estimate between Q1a3*-M346 haplotypes found in South Siberians (n=80) and Q1a3a haplotypes found in northern Native Americans (n=184; retrieved from Malhi et al.38). Despite the low current diversity of haplogroup Q1a3*-M346 in South Siberia, the divergence time between Q1a3*-M346 and Q1a3a-M3 is 13.81±3.88 Ka, pointing to a relatively recent date of entry into the Americas. A similar divergence time, 13.9±3.2 Ka, was estimated for the other Amerindian founder haplogroup, C3b-P39.20 In addition, complete mtDNA genome studies have shown that all pan-American mtDNA haplogroups have entry times of 15–18 Ka, suggesting a concomitant post-Last Glacial Maximum arrival from Beringia with early Amerindians.39, 40, 41, 42 As for whether Q1a3*-M346 was present in South Siberia after the Last Glacial Maximum, the divergence time estimate for South Siberian versus Chinese/North Pakistan Q1a3*-M346 haplotypes is 15.29±5.49 Ka. This supports the presence of Q1a3*-M346 in South Siberia at that time.

The presence of the Amerindian-specific haplogroup Q1a3a-M3 in Evens is consistent with previous studies of Northeast Siberian populations, in which haplogroup Q1a3a was found at low frequency in Siberian Eskimos, Chukchi and Evens.43, 44 Karafet et al.43 hypothesized that this haplogroup may have originated in the New World/eastern Beringia and that its infrequent presence in Northeast Siberian populations can be explained by back-migration from Alaska to Siberia. We searched the YHRD 3.0 database (http://www.yhrd.org; release 36, built on 15 April 2011; 91 493 haplotypes from 686 world populations) for the Q1a3a-M3 STR haplotypes found in Evens. This search showed that matching haplotypes (for loci DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) are present, albeit rarely, in modern Native American populations. This favors the back-migration hypothesis for the presence of Q1a3a-M3 haplotypes in Northeast Siberia.

An interesting cluster of Q1a*-MEH2 haplotypes was detected in Koryaks inhabiting the Sea of Okhotsk coast. These haplotypes are relatively frequent in Koryaks (10.3%) and were found in no other population studied, yet they have a low age of about 1.0 Ka. Nevertheless, this Q1a*-MEH2 cluster is probably much older, given that a Q1a* haplotype (xQ1a1, Q1a3, Q1a4, Q1a5, Q1a6) has been detected in an extinct Palaeo-Eskimo individual belonging to the Saqqaq culture (dated approximately 4.75–2.5 14C ky).1 Notably, according to the results of SNP genotyping, the populations closest to the Saqqaq individual are Koryaks and Chukchi.1 That study suggested that the ancestral Saqqaq people separated from their Old World relatives about 5.5 Ka. The Q1a*-MEH2 lineage detected in Koryaks may therefore represent an ancient genetic component that in the past united the peoples of Northeast Asia, North America and Greenland. To determine the geographic range of the Q1a*-MEH2 haplotypes detected in Koryaks, we searched the YHRD 3.0 database for similar STR haplotypes. We found no identical haplotypes among the 91 493 haplotypes from 686 world populations. The only similar 9-marker haplotype (loci DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) was found in Yukaghirs from Northeast Siberia (Yakutia). According to Pakendorf et al.,29 this haplotype (13-11-19-15-16-25-11-15-13, whose differences from the Koryak Q1a*-MEH2 haplotypes are all single-step) was detected in 4 of the 13 Yukaghirs studied and belongs to haplogroup Q-P36. Therefore, the range of Q1a*-MEH2 probably spans about 1000 km between the coasts of the East Siberian Sea and the Sea of Okhotsk. The coalescence age of the Koryak/Yukaghir Q1a*-MEH2 haplotypes is about 3.5±1.5 Ka, which falls within the bounds of the Saqqaq culture dating.
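The database comparison described above distinguishes identical from similar 9-marker haplotypes. It can be mimicked offline with a stepwise distance. The Python sketch below is an illustration under stated assumptions, not a YHRD interface, and it calls no YHRD API. `database` is a hypothetical in-memory list of (label, haplotype) pairs. Haplotypes use the hyphenated notation above, with loci in the order DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393. DYS385a/b are assumed to be reported in the same (for example, sorted) order for every entry. The `max_steps` threshold for "similar" is a hypothetical parameter.

```python
def parse_haplotype(text):
    """Parse hyphen-separated repeat counts, e.g. '13-11-19-15-16-25-11-15-13'."""
    return [int(part) for part in text.split("-")]


def step_distance(a, b):
    """Sum of absolute repeat differences between two haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


def search_haplotypes(query, database, max_steps=1):
    """Split database entries into identical and similar haplotypes.

    Returns (exact, similar): labels at distance 0, and (label, distance)
    pairs for entries within `max_steps` repeat steps of the query.
    """
    exact, similar = [], []
    for label, haplotype in database:
        d = step_distance(query, haplotype)
        if d == 0:
            exact.append(label)
        elif d <= max_steps:
            similar.append((label, d))
    return exact, similar
```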

To sum up, our study demonstrates ancient genetic links between modern Siberians and Native Americans and supports a single-migration model, with a common genetic source for Native Americans in the Altai-Sayan region of South Siberia.20 In addition, we found Y chromosome evidence of a close relationship between the extinct Palaeo-Eskimo individual of the Saqqaq culture and modern Koryaks from Northeast Siberia, indicating that this region was a starting point for additional migrations across the Bering Strait.