Introduction

The Sami are an indigenous people inhabiting the Northern Shield, an area that includes the northernmost parts of Finland, Norway and Sweden and the Kola Peninsula of Russia. The traditional Sami lifestyle is nomadic, based on reindeer herding, fishing and hunting. The Sami are the last European population to lead a subsistence lifestyle, and their traditional diet contains large amounts of animal products, particularly from reindeer.1, 2 With increased contact with surrounding populations, the Sami lifestyle has become increasingly ‘Westernised’, and many Sami now live in towns and have occupations similar to those of other local populations. The evolutionary origin of the Sami population has been an enigma. Archaeological sites in central and northern Sweden reveal the presence of people in this area as early as 9000 years before present (YBP). The Atlantic coast of Scandinavia was among the first areas to become accessible after the last glaciation, and humans may have advanced along the Norwegian coastline and then moved inland towards Sweden and Finland. Humans may also have reached the Northern Shield through Finland from Western Eurasia or Continental Europe. Recent excavations in middle Sweden indicate that reindeer herding has existed there for at least 1000 years and that the Sami population in this area is post-Medieval.3, 4 Although humans may have arrived in the Northern Shield soon after the last glaciation, the relationship between these early settlers and the Sami people is not known.

A total of 11 Sami dialects are recognised, but only six of these are in common use; the remaining five are either extinct or spoken by fewer than 50 people. The Sami languages are members of the Finnic group, within the Finno-Ugric subfamily of the Uralic languages. Apart from Sami, the Finnic group comprises languages from Finland and Estonia as well as the languages of several indigenous peoples of western Russia, including Udmurt, Mari and Komi, centred around the junction of the Volga and Kama rivers to the west of the Ural Mountains.

Early studies showed that the frequencies of blood group and protein polymorphisms differ significantly between the Sami and the general Swedish population,5 and for some loci the data have been interpreted as indicative of an Asian influence.6 On the basis of classical markers, the Sami cluster with other Caucasian populations but as an outlier to Continental European populations.7, 8 Studies of multiple DNA markers have confirmed the overall similarity of the Sami to other European populations,9 but some genetic markers yield results consistent with a genetic contribution from Asian populations,10 distinguishing the Sami from the peoples of Southern and Western Europe. Y-chromosome haplotypes have pointed to multiple founding lineages in both Finns and Sami, and the Asian component in males has been estimated at around 50%.

Mitochondrial DNA (mtDNA) is frequently used in population studies. These studies benefit from the fact that mtDNA is, for all practical purposes, clonally inherited and exhibits a high degree of polymorphism. In addition, recent database resources enable comparisons of the pattern of genetic variation across the entire mtDNA genome in large population samples.11 Two mtDNA haplogroups (denoted V and U5b)12, 13 account for the majority of mtDNA diversity in Norwegian, Finnish and northern Swedish Sami, with a few other haplogroups (H, Z and D5) occurring at much lower frequencies.14 Haplogroup V is found across Europe and at low frequencies in Eastern European populations,14 and its frequency is highest in Swedish Sami (68%) compared with Finnish (37%) and Norwegian Sami (33%). Haplogroup U5b is present at low frequencies across Europe15 and shows the opposite trend, with 26% in Swedish, 41% in Finnish and 57% in Norwegian Sami.14 The vast majority of Sami U5b sequences carry HVR-I (hypervariable region 1) substitutions at 16144, 16189 and 16270, a combination that has been referred to as the ‘Sami-specific motif’8 or as haplogroup U5b1.15 Coding-region variation has further narrowed this designation to U5b1b (ntps 7385 and 10927), and with the addition of the transition at 16144, the ‘Sami-specific’ subclade has been denoted U5b1b1.14
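To make the nested motif definitions above concrete, the following is a minimal Python sketch, not a tool used in this study. It assumes a haplotype is represented as the set of positions (relative to the CRS) at which a sequence carries a substitution, checks only the positions named in the preceding paragraph, does not model allele states, and assumes the sequence is already known to belong to U5b. The function name `classify_u5b1b` and its output labels are hypothetical.

```python
# Hypothetical helper: positions (relative to the CRS) named in the text.
# Only the presence of a substitution is checked; allele states are not modelled.
SAMI_HVR1_MOTIF = frozenset({16144, 16189, 16270})  # 'Sami-specific motif'
U5B1B_CODING = frozenset({7385, 10927})             # coding-region sites for U5b1b
U5B1B1_EXTRA = frozenset({16144})                   # transition added for U5b1b1


def classify_u5b1b(positions: set[int], coding_typed: bool = True) -> str:
    """Label a haplotype, given as its set of substituted positions,
    assuming it is already known to belong to haplogroup U5b."""
    if not coding_typed:
        # HVR-I data alone can show the motif but cannot resolve the subclade.
        return "Sami HVR-I motif" if SAMI_HVR1_MOTIF <= positions else "unassigned"
    if U5B1B_CODING <= positions:
        # U5b1b1 is U5b1b plus the transition at 16144.
        return "U5b1b1" if U5B1B1_EXTRA <= positions else "U5b1b"
    return "not U5b1b"


print(classify_u5b1b({7385, 10927, 16144, 16189, 16270}))         # U5b1b1
print(classify_u5b1b({16144, 16189, 16270}, coding_typed=False))  # Sami HVR-I motif
```

With HVR-I data alone, a sequence can only be matched to the motif; the U5b1b and U5b1b1 labels require the coding-region positions.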

Haplogroup H is found at low frequency in the Sami relative to other Northern or Continental European populations.14, 16 Haplogroup H is the most common haplogroup in European populations; it is present at low frequencies in Volga-Finnic populations and is rare in central Asian populations.14, 17 The probable European origin of H and its higher frequency among Norwegian Sami suggest that haplogroup H entered Fennoscandia via a migration along the Atlantic coast of Norway, or that it is present in the Sami because of more recent admixture with European populations.14 The remaining haplogroups found at appreciable frequencies in the Sami are D5 and Z.14 Although D5 and Z are present at low frequencies in some Asian populations, and D5 is relatively common in China,18 both are virtually absent from Europe, implying an Asian origin. Haplogroup Z is most frequent in Northeastern Asia19 and is present in Siberian populations as well as in the Volga-Ural region.14 While subhaplogroup Z1 (refs 20, 21) has been observed in the Koryak and Itelmen populations,19 it has also been noted to account for all Z lineages in Western Asia and Northern Europe.14

The pattern of mtDNA haplogroup frequencies in the Sami indicates that the population may have been influenced by several migrations from different source populations. Previous studies of the mitochondrial DNA of the Sami have focused on haplogroup frequencies and estimates of genetic diversity based on HVR sequences. Here, we present an analysis of the complete mtDNA genome from the northern and southern Swedish Sami groups, with the purpose of studying the genetic structure of the populations and addressing the origin of the Sami people.

Materials and methods

Population samples

Mitochondrial haplogroup frequencies were determined in the following population samples: Swedish Sami from the County of Norrbotten, denoted northern Swedish Sami (n=152), and Swedish Sami from Västerbotten, denoted southern Swedish Sami (n=138). Haplogroup frequencies for other populations were obtained from the Human Mitochondrial Genome Database (mtDB)11 and from published data.14 The complete mtDNA genome was determined for 18 individuals: northern Swedish Sami (n=7), southern Swedish Sami (n=7) and individuals from the Volga-Ural region (n=4) previously shown to carry subhaplogroup Z1. The 18 mitochondrial sequences produced for this study (Accession nos: DQ902694 to DQ902711) and the 107 published sequences used for comparison (Accession nos: AF346988, AF347006, AP008305, AP008381, AP008419, AP008426, AP008553, AP008756, AP008829, AP008841, AY195750, AY195761, AY195781, AY255155, AY339433 – AY339459, AY339514 – AY339522, AY339530 – AY339544, AY495109, AY495118, AY495306 – AY495315, AY495317 – AY495330, AY519493, AY713979, AY738946, AY738947, AY882400 – AY882406, AY882409 – AY882411, AY882413 – AY882415) are available at GenBank.

SNP typing and DNA sequencing

The mtDNA haplogroups of the southern and northern Swedish Sami were determined by sequencing selected sites that define the major Sami haplogroups U5b1b1, V, Z, D5 and H. The sites studied include nps 4580, 7028, 7385, 10397, 12618, 12930, 16144, 16189, 16224, 16260, 16270, 16298 and 16362. Additional sites were examined when necessary to resolve other haplogroups. Both this targeted sequencing and complete-genome sequencing were performed using primers described by Rieder et al22 and BigDye chemistry (Applied Biosystems, CA, USA). Sequencing was performed on an ABI3700 automated fragment analysis machine, and the data were analysed with DNA Sequencing Analysis software (Applied Biosystems, CA, USA). Sequence alignments were generated using Sequencher software (Gene Codes Corporation, Ann Arbor, MI, USA).

Genetic analyses

All nucleotide positions are given relative to the Cambridge Reference Sequence (CRS).23 Nucleotide diversity among lineages from different regions was estimated using the DnaSP software.24 The ages of haplogroups were estimated by building Neighbour-Joining25 trees with PAUP,26 using the Kimura 2-parameter model of nucleotide substitution,27 and calculating the mean branch length to the shared node. In these calculations we assumed a constant substitution rate of 1.71 × 10^−8 substitutions per site per year.28 Median-Joining networks were calculated using the software Network 4.1.1.2 (http://www.fluxus-engineering.com). The admixture analyses were carried out with the LEA (Likelihood-based Estimation of Admixture) software, using the method described by Chikhi et al,29 available at http://www.cnrs-gif.fr/pge/bioinfo/lea.
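The Kimura 2-parameter model used for tree building distinguishes transitions (purine to purine, or pyrimidine to pyrimidine) from transversions. As an illustration only, the pairwise distance under this model can be computed as d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of compared sites differing by a transition and a transversion, respectively. The sketch below assumes pre-aligned sequences and simply skips gaps and ambiguous bases; it is not how PAUP computes distances internally, and the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance (substitutions per site) between two aligned
    sequences; sites where either base is not A, C, G or T are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q); math.log raises ValueError
    # when an argument is <= 0 (sequences too divergent for the model).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition in ten sites: P = 0.1, Q = 0, so d = -0.5 ln(0.8)
print(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"))
```

For closely related mtDNA sequences, P and Q are small and the distance is close to the raw proportion of differing sites; the correction matters more as divergence grows.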

Results

mtDNA haplogroup frequencies

Haplogroups V (58.6%) and U5b (35.5%) predominate in the northern Swedish Sami, with other haplogroups (H, Z) occurring at low frequency (Table 1). This distribution is very similar to that reported previously for a population sample from this area, as well as to the distributions in Finnish and Norwegian Sami (Table 1).14 Haplogroups V and U5b are also present at high frequency in the southern Swedish Sami, along with haplogroups H, Z and a range of other haplogroups (Table 1). While the presence of V and U5b at appreciable frequencies in both the northern and southern Swedish Sami indicates that these two populations share the same genetic origin, the haplogroup distribution of the southern Swedish Sami differs from that of the northern Sami in several respects. The frequencies of haplogroups V and U5b are lower, and that of haplogroup H (34.8%) much higher, in the southern Swedish Sami. The southern Swedish Sami also carry a number of haplogroups not found in other Sami populations (I, J, K) but characteristic of Continental European populations (Table 1). The difference in haplogroup frequency distribution between the southern Swedish Sami and the other Sami populations could be due to recent admixture with Swedish or other Continental populations, or to a different genetic origin. To distinguish between these alternatives, we stratified the southern Swedish Sami sample into those with traditional occupations (ie reindeer herding) and those with nontraditional occupations, on the premise that those with traditional occupations are more likely to have exclusively Sami ancestors. The reindeer herders have a haplogroup distribution similar to that of the northern Swedish Sami, with a lower frequency of haplogroup H and a higher frequency of V and U5b1b1 (Table 1). The southern Sami with nontraditional occupations have a haplogroup distribution similar to that of the Continental European population, with H and other ‘non-Sami’ haplogroups together constituting more than 70%.
The difference between these two groups of southern Swedish Sami could be due to admixture. Using the LEA software, with the haplogroup frequencies of the northern Sami and Continental Europeans as the two source populations, we estimated the Continental European contribution at 48% in the combined southern Swedish Sami sample, 16% among those with traditional occupations and 67% among those with nontraditional occupations.
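LEA estimates admixture proportions in a likelihood framework that also models drift in the source and admixed populations after admixture. For intuition only, a much simpler moment-based estimator, swapped in here and not the method used in this study, fits the admixed population's haplogroup frequencies as a mixture m·p1 + (1 − m)·p2 of the two source populations by least squares. It ignores drift and sampling error, so it should not be expected to reproduce the LEA estimates above. The function name and the example frequencies are invented for illustration.

```python
def ls_admixture(hybrid: list[float], parent1: list[float],
                 parent2: list[float]) -> float:
    """Least-squares estimate of m in hybrid ~ m*parent1 + (1 - m)*parent2.

    Each argument lists haplogroup frequencies in the same order. The return
    value is the estimated contribution of parent1, clipped to [0, 1]. Drift
    and sampling error are ignored (unlike the likelihood model in LEA).
    """
    if not len(hybrid) == len(parent1) == len(parent2):
        raise ValueError("frequency vectors must have equal length")
    # Minimising sum((h - p2 - m*(p1 - p2))**2) over m gives this ratio.
    num = sum((h - p2) * (p1 - p2) for h, p1, p2 in zip(hybrid, parent1, parent2))
    den = sum((p1 - p2) ** 2 for p1, p2 in zip(parent1, parent2))
    if den == 0:
        raise ValueError("source populations have identical frequencies")
    return min(1.0, max(0.0, num / den))


# Invented frequencies for three haplogroups; the hybrid is an exact 40:60 mix.
source_a = [0.5, 0.3, 0.2]
source_b = [0.1, 0.1, 0.8]
hybrid = [0.26, 0.18, 0.56]
print(round(ls_admixture(hybrid, source_a, source_b), 3))  # 0.4
```

Because it treats observed frequencies as exact, this estimator gives no measure of uncertainty, which is one reason a likelihood-based method such as LEA is preferable for real samples.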

Table 1 Haplogroup frequencies in different populations

Phylogeny of European and East Asian mtDNA lineages

To study the relationships and genetic diversity within some of the mtDNA haplogroups in Sami populations, we sequenced complete Sami mitochondrial genomes from each of haplogroups V, U5b1b1 and Z, and supplemented our dataset with published Sami sequences and complete sequences from other populations. Median-Joining networks were constructed for each of these haplogroups to study the relationship between the Sami mtDNA sequences and sequences from other populations. In the network for haplogroup V, Sami mtDNA sequences are scattered and mainly group with sequences from Finland (Figure 1a). Three of the Sami have identical sequences, but there is no indication of monophyletic groups of Sami sequences. The network for haplogroup U5b contains representatives of the subhaplogroups U5b1b, U5b1b1, U5b1a, U5b1, U5b2 and U5b (Figure 1b). All Sami sequences fall in the U5b1b1 clade together with some sequences from Finland. The close relationship with sequences from Finland may be due to admixture.30 The nucleotide diversity of Sami sequences within haplogroups U5b1b1 and V is very low (π=1 × 10^−4 and π=1.8 × 10^−4, respectively). Calculated from the mean branch length to their shared node, the time to the most recent common ancestor of Sami haplogroup V sequences is 7600 YBP; for U5b1b1 it is 5500 YBP among the Sami and 6600 YBP among Sami and Finns combined. These estimated ages of the U5b1b1 clades are in general agreement with an estimated age of 8600 YBP for their ancestral haplogroup, U5b1b.31
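Nucleotide diversity (π) is the average number of nucleotide differences per site between two sequences drawn from the sample. A minimal sketch of that definition follows, averaging over all pairs of sequences and assuming equal-length, gap-free alignments. It illustrates the statistic only; it is not a re-implementation of DnaSP, and the toy sequences are invented.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of differences per site over all pairs of sequences.

    Assumes equal-length, gap-free alignments; real data need rules for gaps
    and missing bases.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / len(pairs) / length


# Pairwise differences 1, 2 and 1 over 3 pairs and 4 sites: pi = 1/3
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

Values such as π=1 × 10^−4 therefore mean that two randomly chosen sequences differ, on average, at roughly one site in 10 000.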

Figure 1

Median-Joining networks, with relevant substitution positions, based on the complete mtDNA genome sequences of individuals with haplogroups (a) V, (b) U5b and (c) Z. The networks were constructed as described in the text. The colour coding is as follows: green – Sami; blue – Finland; red – Volga-Ural; orange – Continental Europe; pink – Japan; yellow – NE Asia; light blue – China; light green – India.

Except for one Yakut sequence belonging to haplogroup U5b, the only Asian sequences closely related to Sami sequences are members of haplogroup Z, which comprises East Asian and Eurasian lineages. The network for haplogroup Z shows a clear separation between the Z1 sequences from Finns, Sami, the Volga-Ural region and one Koryak, on the one hand, and the East Asian Z sequences on the other (Figure 1c). The genetic divergence between the Koryak sequence and the Sami, Finnish and Volga-Ural sequences (fixed coding-region substitutions at ntps 740, 9494 and 12930) warrants designating the latter as a separate subgroup, denoted Z1a. The coding-region nucleotide diversity within the Z1a group is remarkably low (π=7.1 × 10^−5), indicating that these 18 sequences from three populations last shared a common ancestor very recently. Calculated from the mean branch length to their common node, the most recent common ancestor of the Z1a group is estimated at 2700 YBP. By contrast, the genetic link from Z1a to Z1 in Northeast Asia (Koryak) extends back to 13 000 YBP.
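The age estimates above convert a mean branch length, in substitutions per site, into years by dividing by the assumed constant rate of 1.71 × 10^−8 substitutions per site per year. Working backwards, an age of 2700 YBP corresponds to a mean branch length of about 2700 × 1.71 × 10^−8 ≈ 4.6 × 10^−5 substitutions per site. A minimal sketch of the conversion follows; the function name is hypothetical, and the branch lengths in the example are illustrative values chosen to match the Z1a estimate, not measured values.

```python
RATE = 1.71e-8  # substitutions per site per year, the constant rate assumed in the text


def tmrca_years(branch_lengths: list[float], rate: float = RATE) -> float:
    """Node age in years: mean tip-to-node branch length (substitutions per
    site) divided by a constant per-lineage substitution rate."""
    if not branch_lengths:
        raise ValueError("need at least one branch length")
    if rate <= 0:
        raise ValueError("rate must be positive")
    mean_length = sum(branch_lengths) / len(branch_lengths)
    return mean_length / rate


# Illustrative (not measured) branch lengths with mean 4.617e-5 substitutions/site
print(tmrca_years([4.0e-5, 5.234e-5]))  # approximately 2700
```

This point estimate inherits every assumption of a strict molecular clock; it carries no confidence interval and does not account for rate variation among lineages.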

Discussion

The northern Swedish Sami have two dominant mtDNA haplogroups, like other Sami populations. The presence of these two haplogroups in all Sami populations, albeit at different frequencies, points to a common origin for all Sami populations in the Northern Shield area. Among the Sami, the southern Swedish Sami are outliers in their distribution of mtDNA haplogroups. The high frequency of Continental European haplogroups among southern Swedish Sami with nontraditional occupations indirectly supports admixture with the (European) Swedish population. The admixture analysis confirms this observation and lends no support to the southern Swedish Sami having a different genetic origin from the northern Sami.

The contemporary Swedish Sami population is estimated at about 50 000 people,32 but the population size is likely to have been considerably smaller in historic times. The near-complete dominance of only two haplogroups in the northern Swedish, Finnish and Norwegian Sami, together with the small population size, indicates that the Sami could have been subject to strong genetic drift. A limited population size is also supported by the high linkage disequilibrium (LD) between microsatellite and SNP markers in Swedish Sami relative to the general population in Finland and Sweden.33, 34, 35, 36
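The link between small population size and strong drift can be illustrated with a toy haploid Wright-Fisher model, appropriate for maternally inherited mtDNA, in which each generation's N lineages are drawn at random from the previous generation. In this model the expected gene diversity, 1 − Σp², decays by a factor of (1 − 1/N) per generation, so rare haplogroups are lost quickly when N is small. The sketch below is purely illustrative and is not an analysis performed in this study; the function name, the four haplogroup labels and their starting frequencies are hypothetical.

```python
import random


def wright_fisher_haploid(freqs: dict[str, float], n: int, generations: int,
                          seed: int = 0) -> dict[str, float]:
    """Toy haploid Wright-Fisher model for haplogroup frequencies.

    Each generation, n lineages are drawn with replacement from the previous
    generation's frequencies. A haplogroup that drops to 0.0 is lost for good,
    since this model has no mutation or migration.
    """
    if n < 1 or generations < 0:
        raise ValueError("n must be >= 1 and generations >= 0")
    rng = random.Random(seed)
    labels = list(freqs)
    current = [freqs[label] for label in labels]
    for _ in range(generations):
        draws = rng.choices(range(len(labels)), weights=current, k=n)
        counts = [0] * len(labels)
        for index in draws:
            counts[index] += 1
        current = [count / n for count in counts]
    return dict(zip(labels, current))


# Hypothetical starting frequencies; with n = 20, rare haplogroups are often lost
start = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
print(wright_fisher_haploid(start, n=20, generations=100, seed=1))
```

Rerunning with a larger n shows frequencies wandering much more slowly, which is the contrast the argument about historical population size relies on.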

The distribution of Sami lineages within the European haplogroup V indicates that the Sami have been affected by a migration of Continental European tribes, either moving directly north through Sweden or by way of the Atlantic coast, or alternatively via the Volga-Ural region of Russia, where V has been found at appreciable frequencies.14 Haplogroup U5b is widely dispersed in Europe and therefore provides few clues to putative migrations. However, U5b1b1 has a restricted geographic distribution centred on Northern and Eastern Europe, and it has also been identified in the Volga-Ural region.14 The presence of haplogroup Z implies a contribution, albeit limited, from Asia to the Sami gene pool. The close relationship of the Z1a lineages of Finns and Sami with those of the Volga-Ural region again implicates that region as a probable source of Sami mitochondrial diversity. There is, however, a difference in the apparent ages of the different Sami haplogroups. The nucleotide diversity among Sami sequences for the three haplogroups studied here is very low. The ages of the variation in U5b1b1 and V among Swedish Sami are similar (5500 and 7600 YBP, respectively) but considerably older than that for Z (2700 YBP). The surprisingly close link between haplogroup Z1a among the Sami and the Volga-Ural sequences suggests that this haplogroup was brought in within the last 2000–3000 years. Our data support a migration from Eastern Europe, in the vicinity of the Volga-Ural region, as the likely source of much of the Sami mtDNA diversity,14 but indicate multiple migrations, the first 6000–7000 YBP and at least one additional migration 2000–3000 YBP. Considering the similarity observed between Sami and Finnish mitochondrial lineages, this evidence for multiple migration events would also support previous population genetic studies that have indicated dual origins of the Finnish people.37