Introduction

Anatomically modern humans presumably populated the Siberian plain during the Upper Paleolithic,1, 2 but the geographical origin of this population settlement is more controversial.3, 4 After the initial colonization, recurrent migrations from various directions have shaped the genetic composition of indigenous populations living in Siberia.5 For the current Siberian mtDNA and Y chromosome variation, both Central Asian6, 7, 8, 9 and East Asian origins have been proposed.9, 10, 11 In addition, South Siberia may have had an important role in the peopling of the more northern regions of Siberia.12, 13, 14, 15

One of the geographically remote and genetically less studied areas is Northwest Siberia, a region located north of the Central Asian Steppes, east of the Ural mountain range and bounded by the Ob and Yenisei river valleys. In this study, we focus on the Northwest Siberian Ugric-speaking Khanty and Mansi populations. The Khanty and Mansi are known to have crossed the Urals, the northern geographical boundary between West and East Eurasia. They migrated from the Pechora and Vychegda river regions west of the Urals to the Ob river valley during the first millennium AD as one ancestral Ob-Ugric population, which split into the ethnic groups of Khanty and Mansi soon after this resettlement.5 The specific aim of the present study is to analyze mtDNA and Y chromosome variation in Khanty and Mansi populations, and to situate them in the North Eurasian genetic landscape. Here North Eurasia is taken to include regions north of 51° northern latitude, from Northeast Europe to Northeast Siberia (Figure 1). Furthermore, we try to disentangle ancient migration processes from recent genetic amalgamation of West and East Eurasian gene pools in Northwest Siberia.

Figure 1

Map of the (a) 42 mtDNA and (b) 33 Y chromosome population samples analyzed, with their approximate sampling locations. Circles are proportional to sample sizes, and the frequencies of geographically associated West Eurasian (mtDNA: HV, N1, N2, JT, UK, I, X; Y chromosome: F( × G, H, I, J, K), G, H, I, J, R1), East Eurasian (mtDNA: A, B, D, G, M7, M8-CZ, M9-E, M10, N9-Y, F; Y chromosome: C, K( × L, N, O, Q, R), L, N, Q, R( × R1)) and South Asian (mtDNA: M*, R*; Y chromosome: O) haplogroup clusters are shown. Some subhaplogroups with a distinct distribution in northern Eurasia are shown with individual colors (mtDNA U4, U7; Y chromosome N2, N3). Population abbreviations for mtDNA and Y chromosome and references therein can be found in Supplementary Tables 3 and 4, respectively.

Materials and methods

Khanty and Mansi samples

Blood samples were collected with informed consent from unrelated Khanty (n=106) and Mansi (n=63) individuals along the lower Ob river valley. DNA was extracted following a standard phenol–chloroform protocol.

Mitochondrial HVS-I and II regions were amplified using primer pairs L15997-H16391 and L048-H408, respectively.16 Positions 16024–16383 and 72–340 were compared to the revised Cambridge Reference Sequence17 and variable sites were considered (data in Supplementary Table 1). Each mtDNA sequence was assigned to a haplogroup within the mtDNA phylogeny.12, 18, 19, 20, 21, 22 When coding region information was necessary to confirm haplogroup classification, seven mtDNA positions (7028, 10400, 10873, 11151, 11719, 12308 and 12705)23 were analyzed (a subset of 41 samples; Table 1).

Table 1 MtDNA and Y chromosome haplogroup frequencies and diversity indices of the KHA and MAN populations

A set of 17 Y chromosome binary polymorphisms (SNPs) was hierarchically analyzed following the phylogeny and nomenclature of the Y chromosome24 in 28 Khanty and 25 Mansi samples. These 53 male samples were genotyped with markers M89, M172, M69, M201, M170, M9, 12f2, M145, M45, M173, SRY10831, M17, P25, M467, M1788, P437 and P367 as described in Bosch et al23 (additional primers in Supplementary Table 2) and their haplogroups were determined (Table 1). In addition, 11 Y chromosome short tandem repeat (STR) loci (DYS19, DYS390, DYS391, DYS392, DYS393, DYS385, DYS437, DYS438, DYS439, DYS389 I and II) were genotyped using the PowerplexY-kit (Promega, USA). In this data set, the actual repeat number at locus DYS389II was determined by subtracting the length of DYS389I from DYS389II (STR haplotypes are available in Supplementary Table 2).
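The DYS389II correction described above can be illustrated with a minimal sketch. The function name and the example allele values are hypothetical and are not taken from the study data; the only step drawn from the text is the subtraction of the DYS389I length from the DYS389II length.

```python
def dys389ii_repeats(dys389i_allele: int, dys389ii_allele: int) -> int:
    """Return the repeat number specific to DYS389II.

    The DYS389II amplicon contains the DYS389I repeat stretch, so the
    DYS389I allele is subtracted from the observed DYS389II allele.
    """
    if dys389ii_allele < dys389i_allele:
        # The DYS389II amplicon should never be shorter than DYS389I.
        raise ValueError("DYS389II allele is smaller than DYS389I allele")
    return dys389ii_allele - dys389i_allele


# Hypothetical genotype: DYS389I = 13 repeats, DYS389II = 29 repeats.
corrected = dys389ii_repeats(13, 29)
print(corrected)
```

Applying the correction before any distance calculation avoids counting the variation at DYS389I twice.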

Reference data

A total of 3522 mitochondrial HVS-I sequences with relevant coding SNP/RFLP information from 29 different Eurasian populations were retrieved from the literature (Supplementary Table 3). Haplogroup classification was obtained from the original sources and refined according to the criteria used for the Khanty and Mansi samples. Thus, the S sequences15 were reclassified as M8a19 and the Saami HVS-I sequences25 were reclassified according to Bandelt et al.26 For the analysis, some sequences were clustered into major haplogroups: N1a and b into N1; W and N2a into N2; H and HV1 into HV; and V into haplogroup HV0. Furthermore, sequences classified neither as M7, M8, M9, M10 or M13, nor as the major N or R haplogroups, were considered to represent haplogroup M*. Two sequences were excluded: an L2 sequence in the Norwegian sample27 and one sequence with unclear haplogroup definition in the Karelian sample.25 The approximate locations of all 42 mtDNA population samples used in this study are shown in Figure 1a.

Y chromosome SNP reference data for 2175 individuals from 27 different Eurasian populations were retrieved from the literature (Supplementary Table 4). All the analyzed individuals, typed with differing marker sets, were reclassified into 27 possible Y chromosome haplo/paragroups to allow population comparisons. The four Finnish individuals28 with no clear haplogroup classification were omitted and the Saami data8 was refined based on Tambets et al.29 The approximate locations of all the 33 Y chromosome population samples used in this study are shown in Figure 1b. In addition, nine-locus Y-STR minimal haplotype data of 1734 individuals from 10 different Eurasian populations were included in the STR comparisons (Supplementary Table 4).

Population samples were grouped geographically into Northeast Europe, Northwest Siberia, Northeast Siberia, Central Siberia, South Siberia, Southeast Siberia, Central Asia and East Asia based on Karafet et al7 (Figure 1a and b). Linguistic grouping of samples was performed according to Greenberg:30 Indo-European, Finnic, Ugric, Samoyedic, Turkic, Mongolic, Tunguso, Chukotko-Kamchatka, Eskimo-Aleut, Sino-Tibetan and language isolates of Yeniseian, Nivkhi and Yukagir (Supplementary Tables 3 and 4).

Data analysis

Population diversity indices for mtDNA and Y chromosomal data were estimated using Arlequin 3.01.31 In addition, the weighted intralineage mean pairwise difference (WIMP) was calculated.23, 32 The exact test for population differentiation and the analysis of molecular variance (AMOVA) were also performed.33 To define and test the geographic structure without a priori grouping, spatial analysis of molecular variance (SAMOVA)34 and autocorrelation indices for DNA analysis (AIDA)35 were performed.
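As an illustration of the kind of diversity index reported here, the following sketch computes haplotype (gene) diversity with Nei's unbiased estimator, H = n/(n − 1) × (1 − Σp_i²). It is assumed, not stated in the text, that this is the estimator corresponding to the Arlequin output; the function name and the toy sample are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (expected homozygosity).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of six sequences carrying three distinct haplotypes.
sample = ["h1", "h1", "h1", "h2", "h2", "h3"]
print(round(haplotype_diversity(sample), 3))
```

The index ranges from 0 (all individuals share one haplotype) to 1 (every sampled individual carries a different haplotype).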

To visualize population relationships, correspondence analyses (CA) were performed for both mtDNA and Y chromosome haplogroup frequencies using Statistica 6.0 (StatSoft Inc, US). Population pairwise FST (mtDNA) and RST (Y-STRs) distances were estimated and visualized by multidimensional scaling (MDS), computed with Statistica 6.0 (StatSoft Inc, US). The correlation between distance matrices of mtDNA and Y chromosome haplogroups was estimated using the Mantel test.31
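The logic of a Mantel test can be sketched as a permutation test on two distance matrices. This is a simplified, one-sided version written for illustration only; it is not the Arlequin implementation, and the function names, the number of permutations and the toy matrix are assumptions.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-sided P) for two symmetric distance matrices."""
    identity = list(range(len(d1)))
    x = _upper(d1, identity)
    observed = _pearson(x, _upper(d2, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the second matrix
        if _pearson(x, _upper(d2, perm)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy example: distances between four hypothetical populations on a line.
points = [0, 1, 3, 7]
dist = [[abs(a - b) for b in points] for a in points]
r, p = mantel_test(dist, dist)
print(r, p)
```

Rows and columns are permuted together because the pairwise distances within a matrix are not independent observations.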

Haplogroup-specific median joining networks36 for mitochondrial and Y chromosome data were constructed using the program NETWORK 4.1 (www.fluxus-technology.com). Characters were weighted according to their variance within the haplogroup. For mitochondrial HVS-I positions, the weights given in Bandelt et al37 were used, with 99 assumed as the weight for positions with no transitional information. For the Y-STR loci, the weights were calculated as 10 × (Vm/Vi), where Vm is the mean variance of all STR loci and Vi is the variance of the STR locus in question.23 To estimate the age of expansion within the mtDNA haplogroups, the ρ-statistic implemented in Network 4.1 was used.38, 39
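The Y-STR weighting rule, 10 × (Vm/Vi), can be illustrated as follows. The function name and the repeat counts are hypothetical, and the use of the sample variance is an assumption; when all loci are typed in the same individuals, the ratio Vm/Vi is the same for sample and population variance.

```python
from statistics import variance


def str_locus_weights(repeats_by_locus):
    """Weight each STR locus as 10 * (mean variance of all loci / locus variance)."""
    variances = {locus: variance(values) for locus, values in repeats_by_locus.items()}
    if any(v == 0 for v in variances.values()):
        # A monomorphic locus has no variance, so its weight is undefined.
        raise ValueError("every locus must show variation within the haplogroup")
    mean_variance = sum(variances.values()) / len(variances)
    return {locus: 10 * mean_variance / v for locus, v in variances.items()}


# Hypothetical repeat counts for two loci within one haplogroup.
data = {
    "DYS19": [14, 15, 16, 15],
    "DYS390": [23, 25, 23, 25],
}
print(str_locus_weights(data))
```

Loci with below-average variance receive weights above 10, so that changes at slowly varying loci count for more in the network construction.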

Results

Khanty and Mansi mtDNA and Y chromosome lineages

In the Khanty and Mansi population samples, 39 and 27 haplotypes were observed, which could be clustered into 19 and 13 mtDNA haplogroups, respectively (Table 1). These haplogroups represent both West and East Eurasian-associated mtDNA lineages (Figure 1a). Out of the 50 mtDNA haplotypes observed among the Khanty and Mansi, 16 haplotypes were observed in both populations (Supplementary Table 1). On the basis of mtDNA haplogroup frequencies, the Khanty and Mansi samples are not significantly different from each other, but they differ significantly (P<0.05) from the Mansi described by Derbeneva et al.40 However, based on mtDNA haplotype frequencies, the Khanty sample and the two Mansi population samples in this study are significantly different from each other.

Y chromosome binary polymorphisms divided the Khanty into four and the Mansi into six paternal haplogroups (Table 1). Haplogroups R1a1, N2 and N3a were present in both populations, and comprised 96.4 and 84.0% of the Khanty and Mansi Y chromosomes, respectively. These Y chromosome haplogroups are typically found in West Eurasia (R1a1)41 or across northern Eurasia (N2, N3a).11 On the basis of the Y chromosome haplogroup frequencies, the Khanty and Mansi are significantly different (P<0.05) from each other and also from the Khanty described by Karafet et al.7 The Khanty and Mansi Y-STR minimal haplotypes were compared to the YHRD database of 37 133 European (ie West Eurasia) and Asian (ie East Eurasia) haplotypes. This showed that 53.6% of the Khanty and 32.0% of the Mansi individuals carried haplotypes commonly found in West Eurasia, whereas 21.4% of the Khanty and 44.0% of the Mansi represent East Eurasian haplotype matches. When the binary SNPs were combined with the 11 Y-STR loci, the Khanty showed 14 and the Mansi 13 distinct haplotypes, with one common East Eurasian N211 haplotype (found in 17.9% of Khanty and 36.0% of Mansi) shared between these samples (Supplementary Table 2). However, the Khanty and Mansi are significantly different (P<0.05) from each other based on haplotype frequencies (Supplementary Table 2).

North Eurasian mtDNA and Y chromosome landscape

To provide a wider view of the North Eurasian genetic landscape, mtDNA and Y chromosome datasets of 42 and 33 Eurasian population samples, respectively, were compiled (Supplementary Tables 3 and 4). Uniparental lineages were geographically classified into West Eurasian, East Eurasian and South Asian-associated haplogroups (Figure 1 legend).9, 11, 41, 42, 43 In this large dataset, both the mtDNA and Y chromosome data revealed similar patterns (Figure 1a and b). East Eurasian haplogroups are widespread across the whole of Siberia, but show a clear decrease in frequency toward West Eurasia. In contrast, West Eurasian lineages show the opposite frequency trend, decreasing toward the East. AIDA showed a significant clinal distribution of all mtDNA and Y chromosome haplogroups up to a distance of 4200 km (Figure 2). However, the Y chromosome haplogroup distribution showed a significant increase of autocorrelation at a distance of 4900 km, probably due to the high prevalence of the N3 haplogroup across the whole of northern Eurasia (Figure 1b). A Mantel test between the population pairwise distances of mtDNA and Y chromosome haplogroups among 15 Eurasian populations (see Supplementary Tables 3 and 4) showed a nonsignificant correlation (P=0.41).

Figure 2

Autocorrelation indices for DNA analysis (AIDA) of mtDNA and Y chromosome haplogroup data. All autocorrelation values were statistically significant (P<0.05). x axis, lower limit of geographical distance class defined with 700 km interval and y axis, autocorrelation index.

Genetic relationships among the North Eurasian populations

The MDS plot based on mtDNA haplotype FST distances (Figure 3a) places the Khanty and Mansi samples in an intermediate position between Northeast Europe and South Siberia/Central Asia. The Y chromosome MDS plot (Figure 3b) appears less structured; here the Khanty and Mansi group with the Finns only in the first dimension. The remaining populations are loosely associated, with the exception of the distinct Buryat sample. CA of the mtDNA haplogroups shows a clinal west–east pattern of populations (Figure 4a), where the Khanty and Mansi are again located in an intermediate position between the clusters of Northeast Europe and Central Asia/South Siberia. Compared to the mtDNA, the Y chromosome CA shows a more scattered pattern (Figure 4b), where the Northwest Siberians form their own cluster, with the exception of the distinct Ket and Selkup samples. The results of the SAMOVA analysis (not shown) are congruent with the MDS and CA plots (Figures 3a, b and 4a, b). The Khanty and Mansi group with the main cluster including populations from Northeast Europe, Central Asia and Central/South Siberia, but form their own cluster when the number of specified groups is increased.

Figure 3

Multidimensional scaling (MDS) based on (a) FST distances from HVS-I sequences (stress=0.115) and (b) RST distances from nine Y chromosome STR loci (stress=0.076) between 42 and 12 Eurasian population samples, respectively. Population abbreviations and references can be found in Supplementary Tables 3 and 4.

Figure 4

Correspondence analysis based on (a) mtDNA and (b) Y chromosome haplogroups. Haplogroups are depicted, and populations for mtDNA and Y chromosome are marked with numbers corresponding to the population samples and references in Supplementary Tables 3 and 4, respectively.

The results of the AMOVA analyses showed a higher heterogeneity for the Y chromosome than for the mtDNA (Table 2). When the populations are grouped according to geography or linguistics, the mtDNA and Y chromosome show a similar distribution. However, the genetic structure of Y chromosomal haplogroups and haplotypes is better defined by geographical grouping (FCT; 10.61/14.61%) than by linguistic grouping (FCT; 7.35/13.80%), respectively.

Table 2 Variance apportionment (%) of the AMOVA results

mtDNA and Y chromosome diversity in Northwest Siberia

Out of all 383 Northwest Siberian mtDNA sequences in the present dataset (Supplementary Table 3), the most frequent West Eurasian haplogroups are U (27.2%), H (11.8%) and J (9.7%). Moreover, the frequency of the U4 mtDNA haplogroup among the Northwest Siberians is high, ranging from 8.5% among the Khanty to 28.9% among the Ket. The Northwest Siberian populations showed a high U4 haplotype diversity (0.833±0.024) compared to South Siberian (0.471±0.063) or Central Asian populations (0.400±0.237; Figure 5a). A similar pattern is detected within the U5a haplogroup, with the highest diversity in Northeast Europe (0.930±0.015), followed by Northwest Siberia (0.713±0.083) and South Siberia (0.524±0.209). By contrast, the Northeast European U5b subhaplogroup, including the U5b1b1 ‘Saami motif’,25, 29 was not observed among Northwest Siberians except in the Samoyedic-speaking Nganasan (1.9%). The U7 subhaplogroup was found only among the Northwest Siberian Khanty (14.2%) and Mansi (3.2–5.1%), Central Asian Uighur (4.2%) and Northeast European Finns (0.5%). The most frequent East Eurasian haplogroups in Northwest Siberia are C (19.1%) and D (15.7%), with C* and D* (up to 17.5 and 19.0% among the Mansi, respectively) the most frequent subhaplogroups. The C* subhaplogroup shows the highest diversity in Central Asia (0.956±0.031), followed by South Siberia (0.893±0.011) and Northwest Siberia (0.869±0.021), as does haplogroup D* (data not shown).

Figure 5

Median-joining networks for Eurasian (a) mtDNA U4 sequences (the U4 network uses the following colors: gray (original Khanty and Mansi data with additional reference populations40, 44, 45); white (Northeast European populations16, 25, 46, 47, 48) and black (South Siberian populations12, 15, 49 grouped with Central Asian populations50, 51 and Central Siberian Yakuts52)) and Y chromosome (b) N3 (the N3 network uses the following colors: gray (original Khanty and Mansi data); white (Northeast European populations28, 53) and black (Central Siberian Yakuts54)) and (c) N2 Y-short tandem repeat (STR) haplotypes (nine loci) (the N2 network uses the following colors: gray (original Khanty and Mansi data); white (Northeast European populations11, 28) and black (South Siberian populations11 grouped with Northeast Siberian Eskimos11)).

Among the 587 Northwest Siberian Y chromosomes in the present dataset (see Supplementary Table 4), the most frequent haplogroups are N2 (33.4%), Q (23.7%) and N3 (22.3%). N2 and Q are at high frequency mainly within populations of Northwest Siberia (up to 92.1 and 93.8%, respectively), but N3 is frequent across the whole of northern Eurasia (Figure 1b). Within haplogroup N3, all Khanty and Mansi STR haplotypes (Supplementary Table 2) cluster into the Northeast European clade as opposed to the Central Siberian Yakut clade (Figure 5b). In contrast, within haplogroup N2, the Khanty and Mansi STR haplotypes (Supplementary Table 2) are divided between two previously defined11 West and East Eurasian clades (Figure 5c).

Discussion

The Khanty and Mansi possess a combination of east- and west-associated uniparental lineages, including features distinct from those observed among other Northwest Siberian populations. The distribution of the West and East Eurasian lineages shows clear east–west clines across North Eurasia, which is in line with the current understanding of the Eurasian genetic landscape.8, 9, 11, 21, 22, 41 In this context, Northwest Siberia appears as a ‘contact zone’ between the East and West Eurasian gene pools. Among the Northwest Siberians, the diversity within several uniparental sublineages (such as U7, J1b, J1c, J2, G2 and C5 for mtDNA; and N3 and R1a1 for Y chromosome) appears limited. However, c. 20–25% of the mtDNA haplotypes belonging to western (U4, U5a) and eastern (C*, D*) haplogroups show moderate haplotype diversities among the Khanty, Mansi, Ket and Nganasan. This suggests that Northwest Siberia was initially colonized by humans carrying both West and East Eurasian Upper Paleolithic lineages. This is congruent with the concept of a genetic continuum of the early Upper Paleolithic populations expanding from the Near East/Southeast Europe to Northwest Siberia.40, 44 Indeed, similar ρ-estimates were obtained for mtDNA U4 haplotypes in Northeast European (15.2–29.6 Kyr), Khanty and Mansi (11.9–29.7 Kyr) and all Northwest Siberian populations (12.8–29.6 Kyr), which also agree with the estimates presented for European U4 haplotypes (16–24 Kyr).18 In parallel, similar late Upper Paleolithic coalescent time estimates were observed for C* mtDNAs in Khanty and Mansi (14.5–31.7 Kyr BP) and all Northwest Siberian populations (14.4–32.0 Kyr BP).

However, the mtDNA U7 and Y chromosome N2 haplogroups among the Khanty and Mansi are probably of more recent origin. The U7 haplogroup is nearly absent across Northern Eurasia and mainly found in the Near East (10–12% in Iran and India).9 All the U7 sequences found in Khanty and Mansi are identical, and the same sequence is found in the Uighur of Central Asia. This suggests a recent founder effect of the U7 haplogroup in Khanty and Mansi with a probable Central Asian origin. Similarly, the Southeast Asian-derived Y chromosome N2 lineage,11 allegedly specific to Uralic speakers, is also found among the Indo-European and Altaic speakers, but it is clearly more frequent in Northwest Siberia (6.9–92.1%).7 Recently, two subclusters within the N2 lineage were described: the East (N2-A) and the West Eurasian (N2-E) clusters.11 Both East (N2-A) and West Eurasian (N2-E)-associated N2 subclusters are found among the Khanty and Mansi (Supplementary Table 2), whereas the Northeast European or South Siberian populations possess only one of the subclusters (Figure 5c). The coalescent ages of these subclusters are considered relatively young (N2-E, 3.9–9.7; N2-A, 4.2–8.2 Kyr BP),11 although the diversity observed here within the N2-E subcluster (0.933±0.054) is significantly higher than in N2-A (0.500±0.121). This unique combination of N2 subclusters in the Khanty and Mansi suggests a recent amalgamation of western and eastern lineages in Northwest Siberia.

The haplogroup and haplotype compositions of the Khanty and Mansi samples differ significantly from those previously analyzed.7, 40 This, however, probably stems from limited sampling of different geographical subgroups, as is probably the case with the two Khanty population samples used in this study (T Karafet, personal communication). In addition, the histories of male and female populations appear to differ: Khanty and Mansi mtDNA exhibit between 58.7 and 68.9% of the West Eurasian gene pool, whereas the other Northwest Siberians present 20.4–47.4% of western lineages. By contrast, the Khanty and Mansi Y chromosome shows between 76.6 and 89.3% of the East Eurasian gene pool, values similar to those estimated for the other Northwest Siberian populations.

The amalgamation of east- and west-associated uniparental lineages is also observed in Central Asia,50 Southwest Asia42 and South Siberia.14 However, the admixture among the Khanty and Mansi differs slightly from that of the surrounding populations. This is also supported by multilocus autosomal data, which clearly place the Khanty as an intermediate population between Northeast Europe and Central Siberia/East Asia.55

The initial admixture of uniparental lineages among the Khanty, Mansi and other Northwest Siberians could be explained by a northward migration of human groups already carrying both West and East Eurasian Upper Paleolithic lineages originating from Central Asia and South Siberia. Later, when the Ob-Ugric Khanty and Mansi migrated from the western side of the Ural Mountains to Northwest Siberia, the unique amalgamation of N2-E and N2-A was formed. Similarly, the mtDNA U7 and Y chromosome N3 haplotypes in Northwest Siberia suggest gene flow from Central Asia and Northeast Europe, respectively. However, Northwest Siberians are a heterogeneous group of populations showing lower haplotype diversity among and within uniparental haplogroups when compared to the more southern populations. This emphasizes the complex background of Northwest Siberian genetic diversity, shaped by recurrent founder effects, admixture and drift in these indigenous populations.