Introduction

During the last glacial maximum (28–18 thousand years ago, kya), the Siberian Arctic and Subarctic were either completely abandoned1 or sparsely populated.2 A significant environmental shift was driven by the warming, which occurred in the High Arctic ~17–15 kya.3 It appears that some Siberian groups may have reached western Alaska ~14.5 kya, whereas others were stuck in the lowlands of central Beringia (the Bering Isthmus proper) until the last flooding of the Bering Strait, ~11.5 kya.4 The northern rim of the Pacific, offering a wealth of resources, may have led coastal migrants deep into the interior from marine and estuarine habitats to riverine, lake, and marsh ecosystems all the way down the Pacific Coast of North America.5

Initial settlement of the New World was followed by a period of extensive gene flow between Asian and American populations inhabiting the Arctic landscapes. Recently, an exhaustive nuclear DNA study of samples from 17 Siberian and 52 Native American groups showed that the initial settlement of the Western Hemisphere was followed by at least two separate population movements from northeastern Eurasia, with the first arrival accounting for the majority of total Paleo-Indian ancestry and the subsequent movements making decreasing contributions that mainly affected Na-Dene and Eskimo-Aleut groups.6 However, this model has not been fully evaluated, in part because the researchers were unable to date these arrivals, which could instead be inferred from entire mitochondrial genomes.7

The last inhabitants of former Beringia, reflected genetically in the Yukaghir, Chukchi, Eskimo-Aleuts, Na-Dene, and Northwestern North American Indians, have been a focus of extensive comparisons between complete mtDNA sequences from Siberia and from North America. As a result, the Siberian affinities of major Native American mtDNA haploclusters (A–D), initially detected by comparing a few single-nucleotide polymorphisms of similar haplotypes from Eurasia and the Americas,7, 8, 9, 10, 11, 12, 13, 14 have been revised and extended. It has become evident that the range of distinct lineages confined to the entire Eskimo Arctic is a subset of haplogroups A2a, A2b, D2a, and D4b1a2a1, though each of their evolutionary histories is far from being explicitly clear. The high variance of molecular dates, owing to insufficient sampling from living native groups with small effective population sizes, could have considerably influenced coalescence estimates for trans-Beringian mtDNA lineages.10, 13, 15

In the present study, newly obtained entire mtDNA sequences were integrated with those previously published, and updated genealogies and age estimates were generated to shed additional light on source populations from which ancestors of modern Chukchi, Eskimos, Aleuts, and even some Na-Dene groups originated.

Materials and methods

Populations

Siberian Eskimos

By the turn of the 20th century, the territory inhabited by Siberian (Yupik) Eskimos had been reduced to small enclaves on the northeastern and southern coasts of the Chukchi Peninsula: Naukan, Ungaziq (Chaplin), Plover (Provideniya) Bay, and Sireniki.16 The Naukan community of ~330 members, including part of the Eskimo group from Big Diomede Island (Ostrov Ratmanova), went through a process of displacement of native villages beginning in the early 1950s. In recent centuries, Inupiaq replaced Yupik on the Seward Peninsula, thus breaking the American-Siberian Yupik chain and creating a major language barrier between Big Diomede and Naukan. The Sireniki language (now extinct), a relic of probably the earliest wave of Yupik in Chukotka, occupied most of the south shore of the Chukchi Peninsula two centuries ago.17 From the present-day genetic standpoint, Sireniki, Plover, and Chaplin Eskimos appear to be an amalgamation of previously highly endogamous distinct populations.10, 18

Chukchi

The Chukchi, especially Maritime Chukchi, are thought to have emerged as a small group of fishing and sea-mammal hunting populations during the period of the Neolithic marine-hunter culture or even earlier. Whereas present-day Chukchi, both coastal and inland, speak a dialect of the Chukchi-Kamchatkan language family, a North Alaskan Inupiaq language was spoken along the Arctic Coast of Chukotka from Uelen to as far as Shelagski Cape since at least 1791 and into the 19th century.17, 19 Fusing the Yupik coastal people and reindeer Chukchi camps into one village was a common practice during the administrative relocations and resettlements of indigenous communities in Russia’s Northeast in the 1940s–1950s.18

Commander Aleuts

Modern archeology in the Aleut region (Alaska) has generated a series of radiocarbon dates from across the Aleutians to confirm and refine the widely held view that the initial peopling of the archipelago took place from east to west; the earliest human occupation in the eastern Aleutians is dated to ~9.0 kya (reviewed in Veltre D and Smith M, 200820). In contrast to all of the larger Aleutian Islands, which had considerably longer histories of permanent inhabitation, the Commander Islands were initially peopled only in 1825–1826. Subsequently, new settlers from the eastern Aleutians and Alaska were brought by the Russian-American Company in 1840.9, 21, 22 As a result, Aleut communities extensively admixed with Russians were established on the Commander Islands: the village of Nikolskoye on Bering Island and the village of Preobrazhenskoye on Copper (Medniy) Island, the latter of which no longer exists.

Samples

The Chukchi and Siberian Eskimo samples (n=47) come from individuals who were largely born in or derived from coastal villages scattered around the perimeter of the Chukchi Peninsula (Figure 1). With respect to the Commander Aleut samples, 31 of 36 mtDNAs were completely sequenced previously, whereas the remaining five individuals of presumably Tlingit maternal origin were assessed only at the SNP-level.10 To discern their ancestral status in full, these five mtDNA samples were entirely sequenced, and a phylogenetic tree encompassing all 36 complete mtDNA genomes was built.

Figure 1

Map of the northeastern end of Siberia adjoining Alaska and the Aleutian Islands, showing mtDNA sampling locations.

mtDNA analysis

The complete sequencing procedure entailed PCR amplification of 22 overlapping mtDNA templates, which were sequenced in both directions with BigDye terminator chemistry (PE Applied Biosystems, Foster City, CA, USA) on the ABI Prism 3130 DNA Analyzer. Trace files were analyzed with Sequencher software (version 4.5; Gene Codes Corporation). We used the facilities of the ‘Genomics’ Sequencing Center, SBRAS, Novosibirsk, Russian Federation. Variants were scored relative to the revised Cambridge Reference Sequence, rCRS (GenBank accession number NC_012920.1).23 Entire mtDNA sequences revealing shared lineages or sublineages within and among related Siberian/American sources were assembled into phylogenetic trees by using the median-joining algorithm of Network 4.5.1.024 and mtPhyl v4.015 software.

For the phylogeny reconstruction, preliminary-reduced median network analyses led to a suggested branching order for the trees, which we then constructed most parsimoniously by hand.

Estimates of coalescence and expansion times

The coalescence dates were estimated with the ρ statistic.25 Standard error (σ) was calculated according to Saillard et al.26 Mutational distances were converted into years using the substitution rate for the entire molecule of 2.67 × 10⁻⁸ substitutions per site per year.27 BEAST 1.8.1 was used to calculate maximum likelihood (ML) estimates and to obtain Bayesian skyline plots (BSPs) for complete A2a, A2b, D2a, and D4b1a2 sequences with a log-normal relaxed molecular clock (uncorrelated). The general time-reversible sequence substitution model with a fixed fraction of invariable sites and gamma-distributed rates (GTR+I+G) was used because this model has the best fit to our data according to jModelTest 2.1.4.28 These calculations were performed on entire mtDNA sequences (excluding the mutations at np m.16182A>C, m.16183A>C, m.16194A>C, and m.16519T>N because of the high instability of these nucleotide positions). The KC417443.1 sequence from an ancient modern human securely dated at 39.5 kya was used as an outgroup and provided a consistent internal calibration point in our analyses.27 Using the MCMC approach, 40 000 000 iterations were performed, with samples taken every 1000 steps. The initial 10% of iterations were considered burn-in and discarded. Inspection of posterior samples ensured sufficient sampling and was used as a check for convergence to the stationary distribution. Tracer v1.6 was used to visualize BSPs. In order to generate a tree structure that would be directly comparable between analyses, larger subhaplogroups were forced to be monophyletic.
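The ρ-based dating and the MCMC bookkeeping described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function names, the toy tree, and the choice to count all 16,569 rCRS positions when converting mutations to years are assumptions made for illustration (the study excluded a few unstable positions). The σ formula is the commonly cited form attributed to Saillard et al., in which each branch contributes its mutation count weighted by the square of the number of samples descending from it.

```python
from math import sqrt

# Substitution rate quoted in the text (substitutions per site per year).
RATE_PER_SITE_PER_YEAR = 2.67e-8
# Assumption: all 16,569 rCRS positions are counted. The study excluded a few
# unstable positions, which this sketch ignores.
GENOME_LENGTH_BP = 16569


def rho_and_sigma(branches, n_samples):
    """Return (rho, sigma) for a rooted tree.

    branches: iterable of (mutations_on_branch, samples_descending_from_branch).
    rho is the mean number of mutations separating the samples from the root;
    sigma uses the form sqrt(sum(n_j^2 * l_j)) / n.
    """
    branches = list(branches)
    rho = sum(l * n for l, n in branches) / n_samples
    sigma = sqrt(sum(l * n * n for l, n in branches)) / n_samples
    return rho, sigma


def mutations_to_years(mutations):
    """Convert a mutational distance into years under the stated rate."""
    return mutations / (RATE_PER_SITE_PER_YEAR * GENOME_LENGTH_BP)


def retained_mcmc_samples(iterations, sample_every, burnin_percent):
    """Number of logged states kept after discarding the burn-in fraction."""
    logged = iterations // sample_every
    discarded = logged * burnin_percent // 100
    return logged - discarded


if __name__ == "__main__":
    # Hypothetical toy tree with 5 samples: one sample on the root, a 1-mutation
    # branch leading to 3 samples (one of which carries 1 further mutation),
    # and a 2-mutation branch leading to 1 sample.
    toy_branches = [(1, 3), (1, 1), (2, 1)]
    rho, sigma = rho_and_sigma(toy_branches, n_samples=5)
    print(f"rho = {rho:.2f}, sigma = {sigma:.2f}")
    print(f"age = {mutations_to_years(rho):.0f} +/- {mutations_to_years(sigma):.0f} years")
    print("retained MCMC samples:", retained_mcmc_samples(40_000_000, 1000, 10))
```

Under these assumptions one mutation corresponds to roughly 2,260 years, and the chain settings quoted in the text leave 36,000 posterior samples after burn-in (BEAST also logs the initial state, which this count ignores).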

Results

The 52 samples (25 Chukchi, 22 Siberian Eskimos, and 5 Aleut) were chosen from a much larger collection of our previously collected samples9, 10, 18 not yet examined at the entire mtDNA genome level, and were completely sequenced. These were merged with 66 complete mitogenomes that we had generated previously for the same and related Siberian groups, and then combined with 83 publicly available complete sequences. The total data set (n=201) was assigned mainly to lineages A2a, A2b, D2a, and D4b1a2. To focus phylogenetic analyses, we removed C5a2, C4b2, and D3 sequences (n=8), which likely reflect admixture between expanding Chukchi and adjoining Yukaghir and/or Even.10, 13 The complete mtDNA genomes from Alaska, NW Canada, and Greenland were downloaded for comparative purposes from GenBank (Supplementary Table S1). The coalescence times and variances computed from the roots of the A2a, A2b, D2a, and D4b1a2 lineages (haplogroups) and their younger nodes are given in Table 1.

Table 1 Age estimates (maximum likelihood and ρ statistics) for haplogroups A2a, A2b, D2a, and D4b1a2 and their major subhaplogroups

Haplogroups A2a and A2b

Of 201 mtDNA genomes scattered on both sides of the Bering Strait, 108 samples (53.7%) fall into different sublineages of the A2a and A2b haplogroups (Supplementary Table S1). The phylogeny of A2a is structured into five major subbranches, assigned as A2a1, A2a2, A2a3, A2a4, and A2a5, in accord with the latest release of PhyloTree-Build29 (Figures 2 and 3, Supplementary Figure S1). In addition, there is a large number of independent basal branches and haplotypes. In contrast to A2a1, A2a2, and A2a3, which are confined to the Chukchi and Eskimos (Yupik and Inuit-Inupiaq), A2a4 and A2a5 are associated with various Na-Dene-speaking tribes, as well as Apache and Navajo in areas where Athapaskan populations expanded (eg, US Southwest). The estimated age of the A2a cluster encompassing 66 sequences is ~3.9 kya. The A2b root appears to have emerged somewhere within the Bering Sea/Bering Strait region relatively recently (ML: 2.6 kya; rho: 3.1 kya), shortly before the split and subsequent spread of Inupiaq-speakers into northern Alaska and the Canadian Arctic.

Figure 2

Schematic phylogeny of mtDNA hg A2a complete sequences from the Chukchi-Bering Sea area and beyond. Ancient samples are in bold.

Figure 3

Schematic phylogeny of mtDNA hg D2a complete sequences revealed in the Commanders, Greenland, and Chukotka. Ancient samples are in bold.

Haplogroup D2a

Among the 63 haplogroup D2a complete mtDNA sequences generated and publicly available to date, 36 are from the Commanders and 22 from Chukotka, with the remaining 5 sequences derived from the Canadian Arctic and Greenland (Figure 3, Supplementary Table S1). D2a1, defined by m.9667A>G, is particularly widespread. It is well separated from the major branch (D2a) and encompasses Paleo-Eskimo (Saqqaq), dated to ~3600–4170 14C years,30 Aleut D2a1a, Sireniki D2a1b, and Middle Dorset D2a1-m.11176G>A (Supplementary Figure S3).

Interestingly, D2a2, defined by m.4991G>A, is associated mostly with mtDNA samples from the Plover and Chaplin Eskimos, as well as the Chukchi from the arctic coast. The occurrence of D2a2 on the East Siberian and Chukchi Sea coasts is intriguing. One possible explanation is the relocation of ~50 Yupik from the Plover (Provideniya) Bay area in 1926 by the Soviet Government, which aimed to extend its sovereignty over Wrangel Island. A tiny colony in the village of Ushakovskoye persisted until the 1970s, when descendants of the original settlers started being repatriated to the mainland, largely to Ayon Island and Cape Schmidt, the homeland of Maritime Chukchi.10, 16 The age of the entire D2a cluster is 4.4/4.3 kya. Remarkably, the age of the D2a1a subcluster, calculated on the basis of 31 mitogenomes uniquely marked by the m.8910C>A transversion, is only 1.4/1.0 kya, implying shallow historical depth for the Aleut-specific D2a1a.

Haplogroup D4b1a2

A portion of the eastern Asian D4b1 phylogeny is given in Supplementary Figure S3. Apart from a relic haplotype (EU482305.1) stemming directly from the basic D4b1a2, there are two major subbranches (D4b1a2a1 and D4b1a2a2) that share a prominent node, m.13720C>T. The geographic distribution of the complete D4b1a2 mtDNAs and their time-depth (8.7/10.1 kya) may be attributed to those Siberians who underwent pronounced differentiation in the Altai-Sayan refugium, followed by far-reaching dispersals toward the northeastern edge of Siberia and subsequent isolation between ancestral and descendant groups. This conjecture is supported by the overlap of the Yukaghir, Tubalar, Altai-kizhi, Buryat, Tuvan, Chukchi, Naukan, and Canadian Eskimo (Inuit) mtDNA sequences within the D4b1a2 phylogeny.10, 13, 31

The coalescence date of the D4b1a2a1-m.16093T>C subhaplogroup, distinguished by m.14305G>A, m.15448C>A, m.16172T>C, and m.16215A>G, unequivocally links a portion of the Tubalar maternal gene pool from northeastern Altai to that harbored by the Neo-Eskimos. An updated time estimate of D4b1a2a1-m.11383T>C-m.14122A>C-m.16093T>C, based on 11 complete sequences, is 1.9/2.8 kya, much younger than that of D4b1a2a1-m.16093T>C (4.4/6.2 kya). Because the founding haplotype marked by m.11383T>C-m.14122A>C appears to derive from Arctic Canada, it is likely that D4b1a2a1-m.11383T>C-m.14122A>C-m.16093T>C (D3a2 in Volodko et al10) defines original Inuit territory as the geographic origin of its range expansion.

Discussion

New findings and interpretation

The above results appear to reflect the origins and expansion history of Beringian-specific mtDNA lineages. To reconstruct their migratory trajectories during population expansion, we used BSPs for major haplogroups. Thus, the BSPs for A2a and A2b point to demographic growth events dated at ~2.0 and 1.5 kya, respectively, whereas the steepest D2a expansion took place less than 1 kya (Figure 4). The phylogeographic profiles support the scenario that central Beringia was the ancestral homeland for A2 as a whole, originating in situ around 15.0 kya, but only much later were its particular derivatives (A2a and A2b) involved in the Chukotka-Alaska coastal expansion. Unlike basal A2a and A2b sequences, the A2 root has scarcely ever been encountered, with only three mitogenomes reported to date.10, 29 Lack of the A2 root on both sides of the Bering Strait may indicate that a basal population, restricted to an isolated glacial refugium, went extinct during the postglacial submergence of the Bering Isthmus. Surprisingly, the first evidence for human presence in Alaska (Swan Point), the most likely region for entry into the New World, has a similar date of ~14.5 kya,4 thus eliminating discordance between inferences based on paleoecological, genetic, and archeological records regarding the timing of settlement in central Beringia.

Figure 4

Bayesian Skyline Plots for A2a, A2b, D2a mtDNAs. The y axis indicates the effective number of females. The solid line is the median estimate and the colored shading shows the 95% highest posterior density limits.

Of particular note, A2 mtDNAs are virtually absent from the Siberian interior.10, 13, 31, 32, 33 In this connection, we challenge the assertion by Tamm et al34: ‘A novel demographic scenario of relatively recent gene flow from Beringia to deep into western Siberia (Samoyed-speaking Selkups) is the most likely explanation for the phylogeography of haplogroup A2a, which is nested within an otherwise exclusively Native American A2 phylogeny’. The single Selkup mitogenome, attributed to A2a by Tamm et al,34 indeed belongs to A2a1 (Supplementary Figure S1). Aside from an almost identical A2a1 complete sequence harbored by an Inuit gleaned from GenBank (Supplementary Table S1), the HVS-I database also indicates a major presence of A2a1 among the Chukchi10 and in the Aleut.35 Hence, our reevaluation of the claimed Selkup record of A2 casts doubt on its validity and makes it clear that western-central Siberia is essentially bereft of the A2 lineage.

Interaction between Tlingit and Aleut

Analysis of mtDNA variation in Commander Islands inhabitants has disclosed five D2a mtDNA genomes (13.9%) of non-Aleut origin, lacking the m.8910C>A variant. Their Native American – rather than Aleut – identity is based on self-reported family ancestry and genealogies (field notes compiled by Rem Sukernik in 2001 and 2007, unpublished). The mtDNA data appear consistent with historical records indicating that some Tlingit women were among the Alaskan natives and Russian sea otter hunters relocated by the Russian-American Company in 1840 from New Archangel, the Russian outpost (present-day Sitka) in southeastern Alaska.21 This finding is congruent with the identification of D2a HVS-I markers (m.16092T>C, m.16129G>A, and m.16271T>C) in a few of the mtDNA samples retrieved from pre-Columbian remains on or near the Alaska Peninsula.35 It is likely that the D2a root defined by m.11959A>G and its derivative (D2a-m.16092T>C) trace their origin to populations initially established in southeastern Alaska. Likewise, one modern Aleut in the sample of Rubicz et al36 and one Apache (Na-Dene) in the sample of Torroni et al37 exhibit D2a-m.16092T>C. The fact that D2a is virtually absent in the present-day Tlingit38 may imply that a severe bottleneck played a critical role in Russian Alaska and Tlingit demographic history.39 Notably, the Tlingit are most closely related to Alaskan Athapaskan as revealed by HLA-DRB1 allele frequencies.22 Taken together, the archeological, linguistic, and genetic evidence supports a single origin for Athapaskan Indians and Northwest Coastal inhabitants, the Tlingit included, prior to their spread to the interior of Alaska.40

Neo-Eskimos versus Paleo-Eskimos

The first human colonization of the North American Arctic was accomplished by Paleo-Eskimos, who ~5000 years ago lived in small groups, widely distributed at very low density across the vast Arctic coastline separating the eastern tip of Siberia from the southern tip of Greenland. The far-reaching Paleo-Eskimo migrations occurred during a particularly warm postglacial period, and may have been partly motivated by the movements of their primary food source: herds of caribou and muskox. By around 1000 years ago, almost all of the Paleo-Eskimo groups had been replaced throughout their territories by the Alaskan Thule culture (sometimes called Neo-Eskimos), the direct ancestors of the Inuit. Settlement patterns changed to large coastal communities, reflecting an increased reliance on sea-mammal hunting, especially whaling, for subsistence. Although Neo-Eskimos are found to be mainly of in situ origin from the Arctic Coast, it is not explicitly clear whether the Inuit people and the Paleo-Eskimo cultures that preceded them were genetically the same people or independent groups.41, 42, 43, 44

Previous studies of mtDNA variation in Neo-Eskimos demonstrated a close similarity between mtDNA pools of the Bering Sea Eskimos and Canadian and Greenlandic Inuit.10, 18, 26, 45 Most recently, the genetic prehistory of the North American Arctic was the focus of considerable research that led to a number of new insights into human migration patterns in the New World Arctic.46 The study was conducted using nuclear and mitochondrial DNA data. The authors concluded that despite cultural differences, Paleo-Eskimos display a substantial level of genetic continuity. However, sampling is an important issue: of a large data set comprising 169 ancient samples from sites in the High Arctic, mostly in Canada, Greenland, and Alaska, only seven ancient mtDNAs were successfully retrieved and entirely sequenced.46 To obtain a richer picture of mitochondrial diversity in Beringia, we incorporated these particular mitogenomes into the trees, allowing for inferences pertaining to the genetic origins and relationships of the various cultures to each other (Supplementary Figures S1–S3). Of the three Siberian Eskimo tribes – Sireniki, Chaplin, and Naukan – only Naukan is genetically similar to Canadian and Greenlandic Inuit, primarily because they share subhaplogroup D4b1a2a1-m.16093T>C harboring the m.11383T>C-m.14122A>C motif, whereas Neo-Eskimos, the Naukan included, do not exhibit D2a10, 15, 46 (see also this study). In contrast, the occurrence of D2a1b in the Sireniki mtDNA gene pool implies that traces of the Paleo-Eskimo cultures have not been fully erased by the subsequent spread of Neo-Eskimos. This conjecture is congruent with the fundamental genetic position of the Sireniki language that led Krauss to conclude that Siberian Eskimoan was never a dialect continuum, but rather a result of distinctive coastal dispersals.17

Conclusion

The initial split between eastern and western Beringian haplogroups is likely located in the southern extent of Siberia and the Russian Far East, the territory of the presumptive origin of founding haplotypes for major Native American mtDNA haplogroups at or near the last glacial maximum. The current analysis, based on the largest and most diverse set of complete mtDNA sequences generated to date on both sides of the Bering Strait, has revealed a palimpsest of different migrations. The direct ancestors of Paleo-Eskimos are primarily drawn from Chukotka, whereas Neo-Eskimos are found to be mainly of northern Alaska/Arctic Canada origin but also to harbor Altai-Sayan-related ancestry. Co-representation of A2a and D2a, as well as our phylogeographical and BSP analyses of these haplogroups, points to a common origin of Paleo-Eskimos, Aleuts, and Tlingit (and probably other Na-Dene groups), whose direct ancestors must have lived along the southern coastline of the former Bering Land Bridge in the early Holocene.