Introduction

The repeated glacial fluctuations of the Quaternary had a major impact on the geographic distribution and the genetic structure of temperate and cold-adapted species1,2. Climatic and glacial oscillations forced species to move, adapt or be driven to extinction2. Thus species ranges repeatedly expanded and contracted as they tracked their preferred habitat1,2,3,4. During glacial/interglacial periods, species persisted in refugia5,6. The main glacial refugia in Europe were located on the Iberian and Italian peninsulas, as well as the Balkans7. In North America and Siberia, the main refugium was Beringia, which encompassed the Bering land bridge, parts of Yukon, Alaska and eastern Siberia8,9. Responding to the climatic oscillations in the opposite way than temperate species, cold-adapted species had larger distributions during glaciations and were confined to refugia during interglacials5,6. At the end of the last glacial maximum (LGM; ca. 18,000–20,000 years before present (BP)), cold-adapted species such as the Arctic fox (Alopex lagopus), ptarmigan (Lagopus muta), collared lemmings (Dicrostonyx torquatus), and reindeer/caribou (Rangifer tarandus) re-colonised the formerly glaciated areas of the High Arctic, including its remote islands, leading to their current circumpolar distributions in the Holarctic region2,10,11,12.

Svalbard and Franz Josef Land, each composed of hundreds of islands, are the most remote archipelagos of the Holarctic region. As both archipelagos were almost entirely glaciated during the LGM, most current plant and animal species must have re-colonised the archipelagos after the retreat of the ice sheets (i.e. 10,000–15,000 years BP), thus making them a great study system for the genetic and evolutionary effects of recent colonisation and isolation13,14,15.

Rangifer tarandus, referred to as reindeer in Eurasia and caribou in North America, is an ideal species with which to study the impact of glacial cycles on species distributions16,17,18. Its current distribution spreads from the High Arctic (including Svalbard) to the tundra and taiga regions of northern Europe, Siberia, North America and Greenland2,18,19,20,21. The species is well adapted to various environments and shows four major ecotypes: the boreal forest form, the barren-ground or tundra form, the mountain form and the arctic form16,17,18,22.

Previous studies indicate that ca. 70,000 years BP an ancestral lineage of Rangifer tarandus split into two distinct mitochondrial lineages that represent two different refugia: the Euro-Beringian lineage (BEL) and the North American lineage (NAL)6,23,24,25,26,27. The BEL lineage is the most diverse and considered as the ancestral lineage for Rangifer tarandus25. The NAL lineage likely originated south of the North American ice sheet, where reindeer survived before expanding northward26. Moreover, some BEL sublineages expanded south across North America from Beringia when the ice retreated and secondary contact occurred in Canada ca. 8000 years BP, corresponding with the final deglaciation of North America20,23,24,25,26,27,28. Consequently, the two lineages interbred and now occur as genetically admixed populations in Western North America. Reindeer are extinct on Franz Josef Land, but ancient remains suggest that they occupied the archipelago ca. 6400–1300 years BP21.

There is still uncertainty about the colonisation route taken by the Arctic island reindeer that now inhabit Svalbard (i.e., Svalbard reindeer, R. t. platyrhynchus). Two main hypotheses have been proposed. The first hypothesis proposes a North American origin (i.e., western route), stating that reindeer from the Arctic islands originate from a site in the Canadian Arctic archipelagos and spread eastward to Greenland and subsequently to Svalbard29 (Fig. 1). This hypothesis is supported by similarities in cranial measurements between individuals from Svalbard, specimens from Ellesmere Island and Axel Heiberg Island as well as extinct caribou from East Greenland (R. t. eogroenlandicus)29. Furthermore, previous studies identified a nuclear transferrin (TF) allele shared between Svalbard reindeer and Peary caribou (R. t. pearyi), as well as some other North American subspecies, which is entirely absent in Eurasian reindeer30,31,32.

Figure 1
figure 1

Map of the Holarctic illustrating the two main hypotheses of reindeer colonisation routes to Svalbard. The red arrows indicate the western route hypothesis, and the blue arrows indicate the eastern route hypothesis. Franz Josef Land is labelled as ‘FJL’ and Novaya Zemlya as ‘NVZ’. Additionally, the map includes close-ups of the three Eurasian Arctic archipelagos Novaya Zemlya, Franz Josef Land and Svalbard. Present-day glaciation is shown in the close-ups. Arrows point out the biggest islands or structures of each archipelago. Black and orange dots point out the approximate known sampling locations.

The second hypothesis supports a Eurasian origin (i.e., eastern route). It states that the Svalbard population of R. t. platyrhynchus originates from the R. t. tarandus population of the Novaya Zemlya archipelago, possibly in a stepping-stone dispersal pattern via Franz Josef Land (Fig. 1)4,33,34,35,36. Previous phylogenetic4,37 and osteological36 studies support an eastern route based on shared haplotypes between Svalbard and Novaya Zemlya as well as the Taimyr peninsula. Congruent with these studies, the Svalbard rock ptarmigan (Lagopus muta) also shares a haplotype with its Taymyr populations, indicating an eastern route of colonisation and thus that other species may have used this dispersal route11. While there is partial support for both routes of colonisation in reindeer, the use of short mtDNA fragments in previous studies lacks resolution to properly infer the origin of Arctic island reindeer, and therefore uncertainty remains.

Colonisation of the new remote High Arctic habitat suggests that the species has a number of morphological, behavioural, physiological and metabolic adaptations, which are encoded by a number of genes17,20,38. Indeed, several nuclear genes (e.g. UCP-1; TRPA1) have been identified and associated with cold-adaptation in Arctic species12,39. As mitochondria play an essential role in cell processes such as energy production and thermoregulation, variation within the mitochondrial DNA (mtDNA) can also be involved in cold-adaptation40. Previous studies on mtDNA genes of other species indicate that there is a trade-off between energy or heat production and that the latter is favoured in cold-adapted species40,41.

Since reindeer/caribou have become extinct in parts of their former range, genetic diversity has likely been lost21. Therefore, modern samples alone may not be sufficient to disentangle the colonisation route to the High Arctic. Advances in ancient DNA (aDNA) extraction and sequencing now enable the use of the subfossil remains of organisms that lived thousands of years ago42,43. Hence, aDNA has proven to be a powerful tool to study genetic variation that existed in the past42,43. Examining ancient genomes thus allows the direct observation of recent evolutionary change, enabling a better understanding of changes in biodiversity and adaptation patterns43,44. However, aDNA sequences are sprone to contamination and DNA damage45. Since mtDNA is more abundant than nuclear DNA, it is of great utility for evolutionary studies, especially in cases where endogenous content is too low for sequencing the entire nuclear genome. Moreover, mtDNA evolves approximately 4-10 times faster than nuclear DNA, which enables the estimation of divergence times among very closely related lineages and within a species46. To date, the mitogenomes of many extinct temperate and Arctic species (e.g. quagga (Equus quagga), cave bear (Ursus spelaeus), mammoths (Mammuthus sp.), moose (Alces alces), wattlebirds (Callaeidae)) have provided a wealth of knowledge on Pleistocene evolutionary biology and ecology47,48,49,50,51.

Here, we sequenced complete mitogenomes from ancient (i.e. > 500 years BP), historical (i.e. ~ 70–500 years BP), and modern reindeer specimens and analysed them together with previously published whole mitogenomes representing most of the Rangifer tarandus Holarctic distribution. We reconstructed the species’ range-wide phylogeny and inferred the timing and route of colonisation of the Arctic archipelagos Svalbard, Franz Josef Land and Novaya Zemlya. Furthermore, we leveraged this large dataset and analysed mtDNA protein coding genes of Svalbard reindeer to identify possible signatures of positive selection associated with adaptation to the High Arctic environment.

Results

Mitogenome sequencing

To reconstruct the phylogeny of reindeer across the whole Holarctic region, investigate the origin and route of colonisation of the Arctic archipelagos by Rangifer tarandus and examine the mitogenomic basis of its adaptation to the arctic environment, we analysed 123 published and 51 newly-sequenced ancient and historical (i.e., 4752–100 years BP) as well as modern whole mitogenomes. The newly-sequenced ancient and historical samples (n = 31) had a mean sequencing depth ranging from 6× to 2131×, and an endogenous content ranging from 3 to 89% (Table S1). Some mitogenomes were excluded due to low quality (see “Methods”), thus a reduced mitochondrial alignment of 166 mitogenomes with a length of 16,349 bp was used to construct a haplotype network. Furthermore, a subset of 83 mitogenomes with a length of 16,351 bp was used to construct a tip-dated Bayesian phylogeny. The data showed that the extinct East Greenland caribou, Svalbard reindeer, and Novaya Zemlya reindeer have the lowest nucleotide diversity, at least an order of magnitude lower than in any other population previously analysed (Table 1).

Table 1 Overview of the nucleotide diversity, haplotype diversity, number of segregating sites and total number of haplotypes in the different geographic populations.

Haplotype network

The haplotype network analysis revealed three distinct clusters (i.e., I, II and III; Fig. 2). Among the 166 reindeer mitogenomes, a total of 73 distinct mitochondrial haplotypes were identified (Fig. 2). The highest haplotype diversity is found in Canada, which includes several lineages and ecotypes (Table S1).

Figure 2
figure 2

Median-joining haplotype network inferred from 166 whole mitogenome sequences of reindeer. The three main haplotype clusters (I–III) are circled in red, yellow and green, respectively. Cluster I and III = BEL and Cluster II = NAL. Each circle within the clusters represents a unique haplotype and its size reflects the number of individuals with that haplotype. Note that the branch length is not scaled to the number of substitutions and that the labels correspond to the first name of the individual with this haplotype appearing in the input file. Each small black circle indicates one nucleotide difference between haplotypes. Colour codes listed in the legend denote the geographic origin of the population. Fennoscandia combines the samples from Sweden, Finland and Norway. Geographic location abbreviation in sample names, SB: Svalbard; FJL: Franz Josef Land; R: Russia; NVZ: Novaya Zemlya; C: Canada; WG: Western Greenland; S: Sweden; N: Norway, F: Finland. Asterisks indicate ancient and historical samples.

Cluster I shows a star-like structure and comprises 27 haplotypes represented by 114 Svalbard individuals. Nested within this cluster are two of the three individuals from Franz Josef Land, each represented by its own unique haplotype, and the five individuals from Novaya Zemlya represented by a single haplotype. The haplotype from one ancient Svalbard sample (SB_MLM61; ca. 1875 years BP) is at the centre of cluster I. All other cluster I haplotypes stem from this one in a star-like pattern, indicating that it may be ancestral. Closest to the centre of the cluster I are mostly haplotypes found in single ancient individuals. Haplotypes found in modern individuals from Svalbard are located on cluster I’s outer branches. Two of the three Franz Josef Land haplotypes and the Novaya Zemlya haplotype group on an outer branch of cluster I with one of the two oldest 14C-dated samples (SB_MLM82; 3127 years BP), slightly outside of the centre of the network.

Cluster II is the most divergent among the three clusters, showing the highest number of substitutions (i.e. 64) between branches. It comprises seven individuals from Canada, each represented by its own unique haplotype, distributed in two smaller branches. Finally, cluster III comprises 32 haplotypes from most sampled populations (i.e., Canada, Finland, Russia, Sweden, Norway, Greenland and Franz Josef Land) across four smaller branches with no obvious geographical structuring.

Bayesian phylogeny and timing of divergence

The Bayesian phylogeny revealed seven clades and 10 major nodes (A–H; Fig. 3) with strong statistical support (PP > 0.92) except for nodes D and E (PP = 0.35–0.36). The earliest branching clade 1 comprises individuals from Canada that belong to the NAL. There is little geographical structure within clades 2–6, and Eurasian and Canadian samples are distributed across the branches. All these clades belong to the BEL. In clade 7, all of the modern, historical and ancient Svalbard individuals form a group that is sister to a clade containing two of the three samples from Franz Josef Land and the five samples from Novaya Zemlya. Clade 3 comprises all four modern Norwegian individuals. The estimated mean ages for the historical samples from Eastern Greenland range from 1968 to 1870 years BP (Table S2) and have 95% highest posterior density (HPD) values ranging from 0 years BP to 4701 years BP.

Figure 3
figure 3

Bayesian phylogenetic tree based on 83 whole mitogenome sequences from modern, historical and ancient reindeer. To facilitate the visualisation of the phylogeny, the number of Svalbard sequences was randomly reduced to a subset of 22 mitogenomes (i.e.; 11 historical and ancient and 11 modern ones). Asterisks indicate historical and ancient samples. Clades 1–7 are indicated with curly brackets. Clade 1 exclusively includes the individuals of the NAL lineage as indicated by a blue curly bracket and clades 2–7 include the individuals of the BEL lineage as indicated by a red curly bracket. Samples are coloured by geographical location as listed in the legend and labels include the subspecies. Major nodes are labelled alphabetically and their divergence times are listed as mean node age and 95% highest posterior density (HPD) given in thousands of years (ka) in the upper left corner. Posterior probability (PP) node support over 0.72 is indicated by black node circles. PP of the major nodes is > 0.9 except for nodes D and E (0.36 and 0.35 respectively).

The earliest split in the phylogenetic tree separates the NAL and the BEL at 69,577 years BP (95% HPD: 50,871–87,202 years BP; node A, Fig. 3). The most recent split between parts of the BEL and the Svalbard, Franz Josef Land and Novaya Zemlya archipelagos is estimated at 20,065 years BP (95% HPD: 14,487–25,666 years BP, node G), and the divergence between Svalbard and Franz Josef Land/Novaya Zemlya is estimated at 8792 years BP (95% HPD: 6526–11,256 years BP, node H). The split between the two Franz Josef Land samples is estimated at 6710 years BP (95% HPD: 5235–8362 years BP, node I), and the first divergence among Svalbard individuals at 5925 years BP (95% HPD: 4415–7585 years BP, node J). The estimated posterior substitution rate for the mitogenomes based on the inferred 14C dates of ancient samples and of historical and modern samples with a known year of death was 9.4148 × 10–8 substitutions/site/year (95% HPD Interval: 5.84 × 10–8, 1.31 × 10–7).

Positive selection and adaptation to cold environments

We tested for evidence of positive selection in modern cold-adapted Svalbard reindeer using the random effects and Bayesian approaches implemented in MEME and FUBAR, respectively, which estimate the ratio of synonymous and non-synonymous mutations (i.e., ω = dN/dS) at a given site. Using MEME, we found no indication for positive selection in any mitochondrial genes. In contrast, FUBAR showed evidence for positive selection in only a single codon (i.e., position 205; AA Lysine) within gene ND4 of Complex I, with a posterior probability of 0.922 and an ω value of 7.325 (synonymous α = 4.950, non-synonymous β = 36.260).

Discussion

Here, we used 174 ancient, historical and modern mitogenomes to resolve the timing, origin and route of colonisation of the Arctic islands by reindeer and to investigate the genetic basis of adaptation to the Svalbard environment. First, in agreement with previous studies, we identified two distinct mtDNA reindeer lineages: NAL and BEL23,24,25,26,27. Second, our dataset revealed Franz Josef Land and Novaya Zemlya as closest sister clades to Svalbard, thereby indicating a recent shared ancestry for these Arctic islands. Furthermore, we found some evidence for positive selection within the mtDNA of the cold-adapted Svalbard reindeer.

Consistent with Taylor et al.27 our phylogenetic tree identifies the same two mtDNA lineages. Seven Canadian samples that diverged ca. 70,000 years BP in our phylogeny most likely belong to the NAL, whereas all other samples belong to the BEL lineage. Furthermore, mitogenomes from different subspecies are found throughout the tree thus showing inconsistencies with the current reindeer taxonomy. Our estimate of the divergence time between the NAL and BEL lineages based on a tip-dated phylogeny of whole mitochondrial genomes (ca. 50,000–87,000 years BP) overlaps with the estimate of Klütsch et al.23 (i.e., 44,600–135,800 years BP) based on microsatellite data and mtDNA control region haplotypes, as well as that of Polfus et al.24 (i.e., 68,600–173,600 years BP). However, our estimate is older than that of McDevitt et al.26 (28,100–46,700 years BP) and in stark contrast with that of Yannic et al.25 (184,000–430,000 years BP). The inconsistency between our estimate and those in these two studies25,26 is likely due to the fact that they used control region and mtDNA cytochrome b sequences, respectively, whereas we have used whole mitogenome sequences as well as ancient radiocarbon dated samples. In comparison to fossil-based approaches25, which use the most recent common ancestor of the Odocoleini for calibration, tip-dating incorporates multiple radiocarbon-dated samples, thus improving accuracy of divergence times and preventing underestimation in cases where clades are older than their oldest known fossil52,53. Future approaches using a broader dataset of ancient genomes and dated specimens, especially North American individuals (NAL), may be able to provide more precise information on the timing of divergence of the two lineages.

Moreover, in accordance with the results of Taylor et al.27, two individuals from Western Greenland formed a sister group to one of the two Peary caribou individuals (C_28), thus supporting a close relationship between modern Peary caribou and Greenland caribou. In contrast, the second Peary caribou individual (C_27) forms a sister group to the three ancient and extinct Eastern Greenland caribou (R. t. eogroenlandicus) samples, possibly as a result of local adaptation as suggested by27. Furthermore, earlier studies found a close relationship between Peary caribou and the extinct Eastern Greenland caribou based on morphological evidence as well as shared haplotypes37,54. The three Eastern Greenlandic samples were not radiocarbon-dated but the mean age estimates based on the phylogeny range from 1968 to 1870 years BP, suggesting that they likely are part of the extinct subspecies. This is further supported since they do not branch with the Western Greenlandic samples, which have been identified as belonging to the extant Greenland caribou subspecies (R. t. groenlandicus). However, the molecular age ranges for the three samples from Eastern Greenland overlap with 0 years BP, suggesting a possibility of being modern and belonging to the extant Greenland caribou subspecies. To ultimately confirm the age of these samples, it would be necessary to radiocarbon date them. Nonetheless, simply relying on their phylogenetic placement, it seems likely that they are part of the extinct subspecies. The remaining unresolved clades 2–6 of the phylogeny show little geographic structure, most likely due to admixture or introgression between lineages55.

The dated phylogeny estimated the divergence time between most of Eurasia and the Arctic islands at ca. 20,000 years BP. This estimate coincides with the end of the LGM ca. 20,000–18,000 BP and the beginning of a warmer period that initiated glacial retreat20,37. During the Weichselian glaciation (ca. 115,000–12,000 BP) reindeer established large populations covering most of northern Eurasia. These populations gradually began to retreat northward at the end of the Holocene, reaching as far as the High Arctic archipelagos20. However, since the deglaciation of these Arctic islands began slightly later ca. 13,000–10,000 years BP, their environmental conditions at the end of the LGM were not yet suitable for reindeer due to insufficient vegetation cover37,56,57,58,59. The colonisation of the archipelagos therefore must have occurred later in the early Holocene (from 10,000 years BP onwards) when the climate became milder37,57. Receding glaciers, warmer temperatures and reduced snow-cover during the early- to mid-Holocene (9000–6000 years BP) facilitated growth of various plants such as mosses, dwarf shrubs, lichens and grasses on the High Arctic archipelagos, thus providing favourable and extended vegetation sources for reindeer21,57,58,59. Consistent with these studies on the past climate and vegetation of the archipelagos, our Bayesian approach estimates the divergence of reindeer of the Eurasian Arctic islands at ca. 8000 years BP. The earliest published evidence for reindeer on Svalbard is based on reindeer faeces that were found in peat deposits radiocarbon-dated to 3800–5000 years BP59 whereas ancient antlers were recently dated up to 7,080 years BP (M. Le Moullec, B. B. Hansen & M. Martin, unpubl. data,60). Furthermore, ancient antlers from Franz Josef Land were radiocarbon dated at ca. 6400 years BP21. Consistent with these dates, our phylogeny supports a divergence between the two ancient samples from Franz Josef Land at ca. 6700 years BP and a divergence between the ancient and modern Svalbard samples slightly more recently at ca. 5900 years BP. For Novaya Zemlya, there are no radiocarbon dated estimates available and our phylogeny only includes two modern individuals. However, given geomorphologic studies on the Eurasian ice-sheet and the close relationship to the other archipelagos revealed by genetic studies, a colonisation at that time would seem plausible4,37,61. The close relationship between the contemporary and extinct reindeer populations of Svalbard, Franz Josef Land and Novaya Zemlya are thus consistent with a scenario where Svalbard was colonised from the east.

Surprisingly, one sample from Franz Josef Land is placed in a different clade than the other ones. FJL1b is radiocarbon dated to 3896 years BP and clusters in clade 4 with two samples from Russia. However, this radiocarbon date cannot be further corroborated by damage patterns as the sample was USER-treated. A possible explanation for this clustering may be that FJL1b represents a secondary colonisation, since it is still unclear whether Franz Josef Land had a suitable habitat to sustain a continuous population over time or rather if multiple extinction-colonisation events occurred21,61,62. Therefore, despite clustering with Russian samples, FJL1b most likely comes from Franz Josef Land.

Overall, our results are concordant with a previous study4 that found a shared haplotype among the three High Arctic islands and the Russian mainland, likely supporting an eastern route of colonisation. Additional contemporary evidence of such a dispersal route includes an observation of a live reindeer in Svalbard in 1912 bearing antler markers by the people of Novaya Zemlya, possibly also indicating contact to the mainland and confirming that movement across the ice was indeed possible33,35.

In contrast to earlier studies based on transferrin variation, we did not find a close relationship between Peary caribou, other North American caribou and Svalbard reindeer30,31,32. A more recent study based on sequences of the mtDNA control region6 found a shared haplotype between Svalbard reindeer and caribou from Quebec, suggesting a North American colonisation route. However, this haplotype is the same one previously found on the Taimyr Peninsula37, thus equally supporting an eastern route of colonisation.

It is worth noting that the absence of ancient mitogenomes from Canada and Siberia in our dataset potentially limits our understanding of the colonisation history of the Arctic islands. For instance, our results show that the clade closest to that of the Eurasian Arctic islands reindeer comprises individuals from Canada and Fennoscandia rather than Russia, thus challenging a unidirectional eastern origin hypothesis. Furthermore, the dispersal history of caribou is complex, with strong evidence for admixture between the NAL and BEL lineages23,24,27,28. Nevertheless, evidence based on nuclear genomes60,63 shows a closer affinity between Svalbard and Russian reindeer and thus supports an eastern route of colonisation, consistent with the interpretation based on our mtDNA data. Moreover, in the considered time frame (since the end of the LGM, ca. 10–12 ka BP), a direct colonisation from east or west is a more likely hypothesis than circumpolar dispersal from Taimyr across Northern Siberia, Alaska, Canada and Greenland.

Colonisation of its new High Arctic habitat must have exposed reindeer to a number of selective pressures, due to different environmental and thermal conditions compared to the mainland38. To produce and retain heat, reindeer require a number of physiological adaptations for an enhanced thermoregulation. Mitochondria are responsible for energy and heat production and both processes rely on a proton gradient that is generated during the process of oxidative phosphorylation (OXPHOS)41. Because caloric intake needs to be allocated to either energy production (i.e. coupled OXPHOS) or heat generation (i.e. uncoupled OXPHOS), there is a trade-off between those processes41,64. In an arctic environment, mutations in the mtDNA selecting the generation of heat rather than energy can occur and seem likely38,41,64,65.

The FUBAR method found evidence for positive selection in the gene ND4 of Complex I in the OXPHOS pathway. Complex I is the first and largest of the five complexes that make up the OXPHOS pathway64. It is responsible for the provision of electrons enabling the translocation of protons across the inner mitochondrial membrane creating the pH gradient and the subunit ND4 acts as a direct proton pump41,64. Hence, a change in sequence in this gene could potentially be of adaptive value, especially in regards to environmental pressures such as cold-stress, hypoxia or nutrient availability41. While thermoregulation is a biological function likely under selection in Svalbard reindeer, there are other environmental factors such as nutrient availability that may affect the OXPHOS trade-off41. In contrast to cold-adaptation, starvation or a shortage of nutrients may act as a force to generate ATP more efficiently, resulting in the exact opposite effect of enhancing the coupling of energy production in the OXPHOS pathway41. Importantly, signatures of positive selection in the mtDNA involving gene ND4 and support for the trade-off hypothesis have been found in several species adapted to different climates and habitats (e.g., fish41,66, birds67, insects65, arachnids40). In light of the environmental conditions of reindeer, it seems possible that the site under positive selection induces an elevated efficiency of OXPHOS uncoupling, resulting in an enhanced heat production. In addition to uncoupling as a mechanism for cold-adaptation, mitochondrial proteins may also impact the production of energy and heat through changes in mtDNA copy number, transcription and translation, messenger RNA stability as well as protein stability38,40. Camus et al.65 found that a sequence variation within the mtDNA, such as the present mutation on the positively selected site, may affect the copy number of mitogenomes and gene expression, therefore possibly enabling an adaptation to a cold climate. Further research on gene expression and the determination of mtDNA copy numbers are necessary to prove or rule out a possible impact on energy production through these factors.

In contrast to FUBAR68, there was no evidence for positive selection using MEME even though it is considered to be more sensitive because it considers both pervasive and episodic selection69. While MEME70 should show results that are consistent with methods only able to detect pervasive selection such as FUBAR, it is generally more powerful in detecting episodic selection. FUBAR is instead considered more robust than random effect models such as MEME, as it uses a more flexible and less restrictive setting and is thus less sensitive to model specifications66. It should be noted that it might be beneficial to study the nuclear genome in context of the uncoupled OXPHOS pathway and cold-adaptation, since Complex I also harbours subunits that are encoded by the nuclear genome and studies suggest that, despite the mtDNA evolving roughly 4-10 times faster, a coevolution of mtDNA and nuclear DNA is to be expected71.

Using aDNA, we show how species colonise remote islands in the Arctic and adapt to the environmental conditions of Svalbard. Furthermore, we refine previous divergence dates connected to glacial refugia and range shifts. Whole ancient genomes of reindeer as well as other terrestrial species would be essential to examine in more detail the history of colonisation of the Arctic archipelagos. Finally, additional data from Novaya Zemlya and Franz Josef Land would help to study patterns of adaptation to the conditions similar to Svalbard and to better resolve evolutionary relationships among the three islands.

Methods

Sample acquisition

The 31 historical and ancient samples (i.e., antlers, subfossil bones and dental calculus) were collected during several field trips on various sites of Svalbard (n = 17), Eastern Greenland (n = 3), Franz Josef Land (n = 3), Sweden (n = 7) and Norway (n = 1). Raw sequences for most ancient Svalbard samples (n = 16) were previously published by Kellner et al.72 and are available under the BioProject Accession no. PRJEB60484. 14C dates were calibrated using the IntCal20 calibration curve73. The whole mitogenomes from modern individuals comprise previously published samples from Svalbard (n = 96; ENA BioProject PRJEB57293 and PRJEB61721) and Russia (n = 1; ENA BioProject PRJEB57293 and PRJEB61721) as well as newly sequenced genomes from Northern Finland (n = 3), Norway (n = 4) and several locations in Russia (n = 13). Raw sequences for the modern Canadian samples (n = 24) and the modern Western Greenland samples (n = 2) were downloaded from GenBank (NCBI) under the BioProject Accession No. PRJNA63490827.

DNA extraction and sequencing

Historical and ancient samples were processed using various protocols in dedicated, positively-pressurised ancient DNA laboratory facilities at the Centre for Palaeogenetics (Stockholm), the NTNU University Museum, and Uppsala University. All procedures were performed using sterilised equipment and working surfaces to minimise the risk of exogenous and cross-sample contamination. For the three samples from Eastern Greenland (HOC#1,2,5) and one sample from Franz Josef Land (FJL1b), DNA was extracted from ca. 50 mg of bone or antler powder. To prevent exogenous contamination of the bone powder, bones were wrapped in aluminium foil except for the small drilling area and the drilling bit was changed after removing the outer part of the bone. DNA extraction was carried out following the control protocol described in Dehasque et al.44 and included two negative controls (extraction blanks), and double-stranded DNA (dsDNA) libraries were built following the protocol of Meyer and Kircher74, including two additional negative controls (library blanks) per 10 samples, with an additional incubation step removing deaminated cytosine sites (considered damage patterns) through uracil-DNA-glycosylase and endonuclease VIII (USER) treatment75. Lastly, libraries were sequenced on either 2 × 50 bp chemistry on an Illumina SPrime or 2 × 100 bp chemistry on the Illumina NovaSeq S4 at the National Genomics Infrastructure (NGI, Stockholm, Sweden). Four extraction blanks processed alongside the first 20 reindeer samples extracted with this method were sequenced and ran through a metagenomics pipeline, which showed that they contained< 0.01% reads assigned to mammals, indicating no reindeer cross contamination into the blanks.

The seven historical samples from Sweden, one historical sample from Norway and three modern samples from Russia (Table S1) were collected as dental calculus scraped from the individuals’ teeth. Surface decontamination consisted of 10 min of UV light exposure and 30 s of EDTA wash, as has been previously described in Brealey et al.76. Next, DNA was extracted from ca. 5–20 mg of dental calculus following a silica-column based extraction method77. An extraction blank was carried out alongside the samples for every extraction batch. dsDNA libraries for calculus samples, extraction blanks and library blanks (no-template controls) were prepared following van der Valk et al.78 without USER treatment. Libraries were sequenced using an Illumina HiSeq 2500 platform and using 2 × 150 bp chemistry and an additional two lanes on the same platform using 2 × 100 bp chemistry (SciLifeLab, Uppsala, Sweden).

For the ancient samples B16, B20a, M72, MLM12, MLM16, MLM51, MLM80, and MLM82 from Svalbard, ca. 50 mg of bone/antler powder was subjected to a DNA extraction protocol based on a previous study79. Briefly, bone powder was agitated for 48 h at 55 °C in 1 mL lysis buffer80 consisting of 0.1 M urea, 0.45 M EDTA, 0.15 mg/mL proteinase K, and molecular biology H2O. 10 µL of 20 mg/mL proteinase K was added prior to the final 20 h of lysis. An extraction blank was carried out alongside the samples for every extraction batch. Then the protocol of Dabney and Meyer79 was used with bleach-sterilised Zymo-Spin V column extenders. Dual-indexed dsDNA libraries were constructed on the eluted DNA, extraction blanks and a single library blank (no-template control) following the protocol of Meyer and Kircher74 without USER treatment. These libraries were subjected to target enrichment with Rangifer tarandus mitochondrial DNA baits provided by Arbor Biosciences (Product Code 303048) using the MYbaits protocol v.3.02. In-solution hybridization was performed at 55 °C for 28 h. Thereafter, the pre- and post-capture libraries were sequenced on an Illumina MiniSeq instrument with 1 × 75 bp or 2 × 75 bp chemistry (NTNU University Museum) and an Illumina NextSeq 550 instrument with 1 × 75 bp chemistry (NTNU Genomics Core Facility).

All other historical and ancient samples were extracted from ca. 48–360 mg of bone/antler powder which was drilled after surface decontamination by wiping with bleach, using a silica-pellet based extraction protocol based on Rohland and Hofreiter81. Briefly, a digestion buffer consisting of 1.25% (v/v) proteinase K (20 mg/mL), 90% (v/v) EDTA (0.5 M) and 8.75% (v/v) molecular-grade water was used for digestion. Samples were pre-digested in 1 mL digestion buffer for 10 min at 37 °C on a rotor. The samples were spun down, the pre-digest was removed, and 4 mL digestion buffer was added to the samples. Samples were left for digestion on a rotor for 18 h at 37 °C. The lysis buffer was mixed 1:10 with Qiagen PB buffer modified by adding 9 mL sodium acetate (5 M) and 2 mL NaCl (5 M) to 500 mL of Qiagen PB buffer. pH was adjusted to 4.0 using concentrated (37%/12 M) HCl. 50 µL in-solution silica beads were added, and samples were left on a rotor for 1 h at ambient temperature to allow binding of the DNA. Silica pellets with bound DNA were purified using Qiagen MinElute purification kit following the manufacturer's instructions and eluted in 65 µL Qiagen EB buffer. An extraction blank was carried out alongside the samples for every extraction batch. Then dual-indexed dsDNA libraries were constructed on the eluted DNA samples, extraction blanks, and a single library blank (no-template control) following the BEST v.1.1 protocol82. Thereafter, libraries were sequenced on an Illumina MiniSeq (NTNU University Museum), an Illumina NextSeq 550 (NTNU Genomics Core Facility), or an Illumina NovaSeq 6000 platform using 2 × 150 bp chemistry (Norwegian National Sequencing Centre, Novogene UK).

For the modern samples, DNA was extracted from ca. 20 mg of skin or muscle tissue alongside an extraction blank using a DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. For bone and antler samples minor modifications were applied as previously described in Peeters et al.83. Libraries were then either prepared in-house or at Novogene UK. For in-house library preparation, DNA was subsequently sheared to a fragment length of 400 bp using a Covaris ME220 focused ultrasonicator. Next, libraries were prepared from the sheared DNA using a blunt-end, single-tube protocol for double-stranded DNA (BEST v.1.1)82. For libraries prepared at Novogene UK, DNA was sheared to a fragment length of 350 bp and subsequently end-repaired, A-tailed and ligated with Illumina adapters. Libraries were then either sequenced on an Illumina HiSeq 4000 platform using a 2 × 150 bp set up at the NTNU Genomics Core Facility or on an Illumina NovaSeq 6000 platform using 2 × 150 bp set up at Novogene, UK.

Bioinformatic procedures

Raw sequence reads of the ancient samples were trimmed of adapters and internal barcodes, reads shorter than a length of 30 bp were discarded and paired-end reads were merged using fastp v.0.23.284. Subsequently, the merged reads were mapped to a reference mitogenome of Rangifer tarandus (GenBank Accession: NC_007703.1) using the BWA aln algorithm v.0.7.1785.

Next, the alignments were converted from SAM to BAM format and coordinate sorted and indexed using SAMtools v.1.986. PCR duplicates were then removed using the samremovedup.py python script (https://github.com/pontussk/samremovedup). For samples that were sequenced with multiple runs, reads were merged using SAMtools merge v.1.9.

Mapped reads were then imported into Geneious Prime v.2022.2.2 (https://www.geneious.com)87 and consensus sequences were called for each sample using the majority call rule with a minimum depth of 5 × per position. Any ambiguous positions remaining as well as positions with no coverage at all were called as “N”. Variants and sites where two bases remained in a 50:50 ratio, were visually inspected and called as “N” in the consensus sequences to avoid incorporation of erroneous SNPs.

Endogenous DNA content (defined as the number of reads mapping to the reference mitogenome divided by the total number of reads prior to duplicate removal) for the historical and ancient samples was estimated from the BAM files before duplicate removal using SAMtools flagstat v.1.9. The sequences were then aligned using the MUSCLE v.3.8.42588 plug-in in Geneious Prime v.2022.2.2 with a maximum number of iterations of 8. Afterwards, remaining indels in the alignment were visually detected and removed.

Modern samples were processed using the GenErode bioinformatics pipeline89. Briefly, raw sequences were trimmed of adapters using fastp v.0.22.0 and then mapped to the Rangifer tarandus mitogenome reference using the BWA mem algorithm v.0.7.17. Next, reads were sorted and PCR duplicates were removed using SAMtools v.1.12. The resulting BAM files were then subsampled to an average depth of 93–100 × using SAMtools view v.1.9 and subsequently imported into Geneious Prime v.2022.2.2, where consensus sequences were called as described above.

In this study we generated and collated data for a total of 174 mitogenomes. Within this dataset, we constructed a haplotype network based on 166 of these mitogenomes whereas a subset of 83 mitogenomes was used for the construction of a phylogenetic tree (see details below).

Beast phylogeny and haplotype network

To quantify the haplotype diversity and examine the relationships among populations and islands, we built a haplotype network for 166 of the total 174 mitogenomes from all over the Holarctic range. The alignment was then exported in nexus format with a traits block containing a division into the nine sampling regions into PopART v.1.790. Since PopART masks any columns in the alignment with ambiguous or missing sites (e.g., ‘N’), several samples (n = 8; four historical Swedish, three ancient Eastern Greenland and one historical Russian) with a high proportion (i.e., \({\overline{\text{x}}}\) = 27.8%) of such sites were removed to avoid underestimation of haplotype number and an overall loss of resolution. The final dataset was then used to build a haplotype network using the median-joining option in PopART90,91.

Secondly, we built a phylogeny for modern, historical and ancient samples using BEAST v.1.10.492. To facilitate the visualisation of the phylogeny, the number of Svalbard sequences was randomly reduced to a subset of 22 mitogenomes (i.e.; 11 historical and ancient and 11 modern ones), leading to a total of 83 mitogenomes used for the phylogeny. To estimate the molecular age of the undated samples, an ‘undated’ taxa partition was created using the ‘sampling with individual priors’ option. The median probability values of the 14C dated samples (cal BP) were used for the calibrated tip dates. Additionally, ten modern and historical samples from animals put-down either in 1994 or 1910 were also used as tip dates. The evolutionary model used for the mitogenome dataset was determined in jmodeltest2 v.2.1.993,94 to be HKY with gamma distribution and invariant sites. A constant size model with a strict molecular clock and a clock rate with a uniform distribution and a mean value of 9.4148 × 10–8 substitutions/site/year (95% HPD Interval: 5.8434 × 10–8, 1.3088 × 10–7) was used for the construction of the phylogeny. The substitution rate was estimated in a preliminary run, using only the modern samples and the dated samples with their median 14C ages as tip dates. The ages of the undated samples were estimated using a uniform prior ranging from 0 to 15,000 years BP. Three independent trees were then run for 100 million generations with sampling every 1000 generations. All BEAST runs were checked visually for convergence (i.e., aiming for a minimum effective sample size for parameters above 200) in Tracer v.1.7.295. Subsequently, the three individual trees and log files were combined after removing 10% burn-in (10 million generations) of each file using LogCombiner v.1.10.4. Next, TreeAnnotator v.1.10.4 was used to summarise the trees from the combined trees file into a single target tree. Lastly, the Bayesian tree was visualised and built in FigTree v.1.4.496. Summary statistics on nucleotide diversity (π), haplotype diversity (Hd) and total number of segregating sites were calculated using DnaSP6 v.6.12.0397.

Tests of positive selection

To test for evidence of positive selection in Svalbard reindeer, the reference and annotation for the Rangifer tarandus mitochondrial reference sequence (GenBank Accession: NC_007703.1) were downloaded and imported into Geneious. The annotated reference was then aligned to the modern Svalbard sample sequences (n = 96) using the MUSCLE v.3.8.425 plug-in in Geneious with a maximum number of iterations of 8. Next, non-coding regions and regions coding for tRNA and rRNA were removed manually.

To identify stop codons and remove the three corresponding nucleotides without causing frameshifts, the consensus nucleotide sequence was translated to an amino acid (AA) sequence using the translation tool for vertebrate mitochondrial DNA in Geneious. Due to the overlap in the annotation, protein coding regions of the mitogenomes were then separated into three different alignments. A first alignment ‘Reindeer 1’ containing genes ND1, ND2, COX1, COX2, ATP8, COX3, ND3, ND4L, ND5 and CYTB; a second alignment ‘Reindeer 2’ containing genes ATP6 and ND4 and a third alignment ‘Reindeer 3’ containing gene ND6. Finally, the alignment files were exported from Geneious in fasta format for further downstream analyses.

We identified synonymous and non-synonymous substitutions using the Mixed Effects Model of Evolution (MEME) method70 and the Fast Unconstrained Bayesian AppRoximation (FUBAR) method67 available at datamonkey.org98,99. For each method the alignment files were uploaded on datamonkey.org and the genetic code was set to vertebrate mitochondrial DNA code. The p-value threshold for MEME was set to 0.05 and the cut-off for the posterior probability for FUBAR was set to 0.9. The resulting FUBAR values for dS and dN were used to manually calculate ω. If ω = 1 the site is considered to evolve without selective pressure (i.e., neutrally), if ω < 1 the site is considered to be under negative selection (i.e., purifying selection) and if ω > 1 the site is under positive selection100,101,102.

Maps

All maps included in this study were created using R v.4.42 (https://www.R-project.org/)103 in RStudio v2022.12.0.353 (http://www.rstudio.com/)104. The circular maps of the Holarctic were created using the ‘maps’ package (https://www.rdocumentation.org/packages/maps/versions/3.4.1) and the close-up maps of the archipelagos using the ‘ggplot2’ and ‘ggOceanMaps’ packages (https://mikkovihtakari.github.io/ggOceanMaps/)105.