Introduction

The North Atlantic region was dominated primarily by Norwegian Vikings for several centuries starting in the late 700s AD. Scandinavian Vikings travelled the coastal regions stretching from Norway, Sweden, and Denmark to the Shetlands, Orkney, Scotland, the Hebrides, and Ireland, where they settled and intermarried with the original populations.1, 2, 3 The Faroese saga4 dates the first settlement of the archipelago to approximately 825–875 AD. Irish monks may, however, have lived on the islands as early as 650 AD and later deserted them due to the appearance of the Vikings.5, 6, 7 Historical and archaeological records date the discovery of Iceland slightly later, that is, just before 870 AD. The oldest archaeological findings on the Faroe Islands are of Norse origin and date back to 1100 or later, but items of Celtic origin from this period have also been found.8, 9 The term ‘Celtic’ in this context refers to Scotland and Ireland. The Faroese language has its closest affinity to Icelandic and old Norwegian languages, but a Celtic influence has also been identified.5, 10 Throughout the remaining text, Scottish/Irish ancestry will be referred to as British Isles ancestry.

Due to the relatively isolated geographic position of the archipelago, migration to the Faroe Islands has presumably been sparse after the initial settlement period. After the reign of Norwegian kings, the Faroe Islands became part of the Danish kingdom in 1380 AD. In the centuries after, Danish priests and officials settled on the islands, and sailors and traders from other European countries are very likely to have left descendants on the islands.6 The Faroese population was presumably started by only a few founders and experienced early slow growth to a population size of only 4000 for the entire Faroese archipelago in the late 1300s, increasing to approx. 5000 around 1800 AD and then growing rapidly during recent years to the present 48 000 inhabitants.

In recent years, haploid, non-recombining and uniparentally inherited parts of the human genome (the Y chromosome and the mitochondrial genome) have become standard tools for genetic analyses of human populations.11, 12, 13 Specific markers or regions of these units have been successfully used in conjunction with archaeological, historical and linguistic sources to trace the origin of human populations in a number of studies.14, 15 Genetic markers of the Y chromosome reveal patterns of population history through the paternal line, whereas the mitochondrial genome reveals population history through the maternal line.

Several populations of the North Atlantic region (Figure 3) have recently been studied with respect to male and female ancestry.16, 17, 18, 19, 20 These studies suggest that areas remote from Scandinavia have excess Scandinavian ancestry among males and excess British Isles ancestry among females, whereas areas close to Scandinavia have more symmetric ancestry proportions among females and males. In this study we add an important piece of missing information to the population history of the North Atlantic region by presenting an analysis of HVR I sequences from the mitochondrial control region in the Faroe Islanders. The two main focuses of this paper are (i) to assess the effect of genetic drift on shaping genetic diversity within the Faroese population, and (ii) to evaluate the genetic impact of Scandinavian versus British Isles female ancestors on the Faroese population and compare it with previous estimates of male ancestry based on Faroese Y-chromosomal data20 applying identical methods.19

Figure 3

A map of the North Atlantic region showing the geographical position of the eight populations. The pie charts represent female admixture proportions for the Faroe Islands estimated using the mtDNA data set (left pie) in relation to male admixture proportions21, 39 for the Y-chromosomal data set (right pie).

Methods

DNA samples were provided by 122 unrelated males whose grandmothers originated from various islands, thereby representing all major regions of the archipelago. These samples were originally collected for a study of male ancestry proportions using Y-chromosomal markers on the Faroe Islands.20 Sixty-one unrelated Danish males were added to an existing data set of 54 Danish males to increase the Danish contribution to the Scandinavian data set. All individuals had voluntarily consented to participate in the genetic survey. In addition to the mitochondrial haplotypes sequenced by the authors, the following previously published HVR I sequences were obtained for 12 European populations: 394 (ref. 17), 551 (ref. 21), 14 (ref. 22) and 39 (ref. 23) Icelanders; 78 sequences from the Orkney Islands (ref. 18); 49 from the Isle of Skye (ref. 18); 181 sequences from the Western Scottish isles (ref. 18); 219 sequences from the Scottish North West coast (ref. 18); 502 sequences from the Shetlands (ref. 19); 672 sequences from Inland Scotland (ref. 18); 100 (ref. 24) and 200 (ref. 15) Irish sequences; 19 sequences from the Faroe Islands (ref. 25); 323 (ref. 18), 215 (ref. 26), 74 (ref. 11) and 16 (ref. 24) sequences from Norway; 28 (ref. 27) and 32 (ref. 28) Swedish sequences; and 33 (ref. 22), 5 (ref. 24) and 16 (ref. 25) Danish sequences. The past Scandinavian source population is represented in our study by contemporary samples from Norway, Sweden and Denmark. These populations were therefore grouped in the subsequent analyses. The past source population of the Northern and Western parts of the British Isles is represented by contemporary samples from Ireland and inland Scotland,19 which were grouped for analytical purposes. Sequences from the Western Scottish isles and the neighbouring Isle of Skye were likewise grouped, and a total of eight populations were used in the subsequent analyses (Figure 3).

DNA was prepared from blood samples, using a standard sucrose/Triton-lysis protocol with sodium chloride/isopropanol precipitation. All mtDNA site numbers referred to are in accordance with the reference sequence by Anderson et al.29

Fragments of 540 bp of the mitochondrial HVR I region were obtained by standard PCR amplification using the primer pair L15 999/H16 498.17 Amplification reactions were performed on 18–36 ng of DNA in a 6-μl volume using Taq polymerase (Roche). The typical thermal cycling profile was 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 50°C for 30 s and 72°C for 60 s, and finally 72°C for 7 min. After amplification, the double-stranded DNA was purified using the MicroClean PCR purification kit (Microzone Ltd) prior to direct sequencing. Cycle sequencing reactions (8 μl) were performed using ABI Prism™ Big Dye v.3.0 terminator cycle sequencing kits. Both strands of the PCR product were sequenced using fluorescence-labeled primers on a 3100 Genetic Analyzer (Applied Biosystems, CA, USA). Approximately 500 bp of forward and reverse sequences were aligned and manually checked and edited using SeqMan™II version 5.0 (DNASTAR Inc.). To allow comparison with previously published data, only the sequence starting with position 16 090 and ending with position 16 366 was considered in the subsequent comparative analyses.

Several summary statistics were calculated as implemented in the Arlequin 2001 software package30 (Table 1). Gene diversity measures were estimated using the formula by Nei,31 defined as the probability that two randomly chosen haplotypes in a population are different. Three methods were applied to estimate the population parameter θ as implemented in Arlequin: (i) θk was estimated from the infinite-allele-mutation model equilibrium relationship between the expected number of haplotypes (k), the sample size (n) and θ using Ewens' sampling formula.32 (ii) Watterson's estimator33 of θ, θS is based on the infinite-site mutation model relationship between the number of segregating sites (S), the sample size (n) and θ. (iii) Tajima's estimator34 of θ, θπ is also based on the infinite-site mutation model, but on the relationship between the mean number of pairwise differences (π) and θ. Tajima's D is the normalised difference between θS and θπ.35 D was estimated for each population, and its significance was tested by generating random samples under the hypothesis of selective neutrality and population equilibrium.
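The estimators described above can be illustrated with a minimal Python sketch. It is not the Arlequin implementation: it assumes aligned, equal-length sequences without gaps, the toy sequences are hypothetical, and θk (which requires solving Ewens' sampling formula numerically) is left out. Tajima's D follows the standard formula, that is, the difference θπ − θS divided by its estimated standard deviation.

```python
import math
from collections import Counter
from itertools import combinations


def summary_stats(seqs):
    """Gene diversity, theta_S, theta_pi and Tajima's D for aligned sequences."""
    n = len(seqs)

    # Nei's gene diversity: probability that two randomly chosen haplotypes differ.
    counts = Counter(seqs)
    gene_diversity = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Number of segregating sites S (alignment columns with more than one state).
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Watterson's estimator: theta_S = S / a1, with a1 = sum_{i=1}^{n-1} 1/i.
    a1 = sum(1 / i for i in range(1, n))
    theta_S = S / a1

    # Tajima's estimator: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Tajima's D: (theta_pi - theta_S) scaled by its estimated standard deviation.
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    D = (theta_pi - theta_S) / math.sqrt(e1 * S + e2 * S * (S - 1))

    return {"H": gene_diversity, "S": S, "theta_S": theta_S,
            "theta_pi": theta_pi, "D": D}


# Hypothetical toy alignment (not data from this study).
toy = ["ACGT", "ACGT", "ACGA", "ATGA"]
stats = summary_stats(toy)
print(stats)
```

In this sketch a sample with few rare variants gives θπ > θS and hence a positive D, whereas an excess of rare variants, as expected after a population expansion, gives a negative D.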

Table 1 Summary statistics

The following measures of population differentiation were obtained using Arlequin.30 Analysis of molecular variance (AMOVA) was applied and an empirical P-value was obtained by performing 3000 permutations of haplotypes among populations. In addition, exact tests of population differentiation were performed and pairwise significances were evaluated using 10 000 Markov chain steps. Pairwise genetic distances between populations were calculated in the form of Slatkin's linearised Fst36 and presented in two-dimensional space by multidimensional scaling using the ALSCAL procedure implemented in SPSS 13.0. Haplotypes were permuted 3000 times between populations to obtain the null distribution of Fst values and evaluate significance.
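As a small illustration of the distance measure, Slatkin's linearised Fst is the transformation Fst/(1 − Fst). The sketch below applies it to a set of pairwise Fst values and assembles the symmetric distance matrix that a multidimensional scaling routine would take as input. The population labels and Fst values are hypothetical and are not taken from this study.

```python
def slatkin_linearised(fst):
    """Slatkin's linearised distance: Fst / (1 - Fst)."""
    if not 0 <= fst < 1:
        raise ValueError("Fst must lie in the interval [0, 1)")
    return fst / (1.0 - fst)


# Hypothetical pairwise Fst values between three populations.
pops = ["A", "B", "C"]
pairwise_fst = {("A", "B"): 0.010, ("A", "C"): 0.050, ("B", "C"): 0.040}

# Symmetric distance matrix with zeros on the diagonal.
dist = [[0.0] * len(pops) for _ in pops]
for (p, q), value in pairwise_fst.items():
    i, j = pops.index(p), pops.index(q)
    dist[i][j] = dist[j][i] = slatkin_linearised(value)

for name, row in zip(pops, dist):
    print(name, [round(d, 4) for d in row])
```

For small Fst the transformed value is close to Fst itself, so the transformation mainly matters for strongly differentiated pairs.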

The phylogenetic relationship of the Faroese mtDNA haplotypes was analysed using a median-joining network approach as implemented in Network 4.1.1.1.37 All sites and substitutions were weighted equally, assuming mutation rate homogeneity.

We examined the female contribution of the Scandinavian and the British Isles source populations to the putative admixed population of the Faroe Islands by applying the admixture approach of refs 18 and 19 to maternally inherited mitochondrial DNA sequences. The best-fitting admixture model was found by searching through all possible admixture proportions from ηN=0 to 1 in intervals of 0.0001, where ηN represents the proportion of either Scandinavian or British Isles founders. Randomly selected Faroese haplotypes were assigned a probability18, 19 of Scandinavian or British Isles ancestry, determined by their relative frequency in the two source populations. For specific ‘private haplotypes’ not present in the source populations, this probability was derived from the relative frequency of the presumed founder haplotype, that is, the phylogenetically closest haplotypes in terms of mutational differences.18, 19 An average proportion of haplotypes assigned a Scandinavian or British Isles origin (N) was obtained by performing 10 000 Monte Carlo simulations for each admixture model.
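The haplotype assignment step can be sketched as follows. This is a deliberately simplified illustration and not the published method: it covers only the probabilistic assignment of haplotypes by their relative frequency in the two source populations and the Monte Carlo averaging. The search over admixture models and the founder-haplotype rule for private haplotypes are omitted (private haplotypes are simply skipped here). All haplotype labels, frequencies and function names are hypothetical.

```python
import random


def scandinavian_probability(hap, scand_freq, brit_freq):
    """Probability of Scandinavian ancestry from relative source frequencies.

    Returns None for a private haplotype (absent from both sources).
    """
    fs = scand_freq.get(hap, 0.0)
    fb = brit_freq.get(hap, 0.0)
    if fs + fb == 0:
        return None
    return fs / (fs + fb)


def mean_scandinavian_share(admixed, scand_freq, brit_freq, n_sim=10000, seed=1):
    """Average share of randomly drawn haplotypes assigned a Scandinavian origin."""
    rng = random.Random(seed)
    # Simplification: drop private haplotypes instead of tracing founder haplotypes.
    usable = [h for h in admixed
              if scandinavian_probability(h, scand_freq, brit_freq) is not None]
    hits = 0
    for _ in range(n_sim):
        hap = rng.choice(usable)
        if rng.random() < scandinavian_probability(hap, scand_freq, brit_freq):
            hits += 1
    return hits / n_sim


# Hypothetical relative haplotype frequencies in the two source populations.
scand_freq = {"H1": 0.30, "H2": 0.10}
brit_freq = {"H1": 0.10, "H2": 0.30, "H3": 0.05}
# Hypothetical admixed sample: 2 x H1, 6 x H2, 2 x H3.
admixed = ["H1"] * 2 + ["H2"] * 6 + ["H3"] * 2

share = mean_scandinavian_share(admixed, scand_freq, brit_freq)
print(f"Scandinavian share: {share:.3f}")
```

With these toy inputs the expected share is (2 × 0.75 + 6 × 0.25 + 2 × 0)/10 = 0.30, and the simulated average fluctuates around that value.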

Results

An analysis of molecular variance (AMOVA) approach based on the mutational differences between haplotypes38 indicated that 99.2% of the genetic variation was due to variance within populations, whereas only 0.80% was due to differences among populations. This small AMOVA Fst value (0.00802) was, however, significantly different from 0 (P<0.00001), suggesting that the distribution of haplotypes among populations deviates from random expectations. The exact test was based solely on haplotype frequencies39 and showed a significant difference (P<0.0001) between the haplotype distributions of all pairs of populations except Scotland/Ireland vs Orkney (P=0.07), Scotland/Ireland vs Scottish NW coast (P=0.27), and Scandinavia vs Orkney (P=0.073). In terms of haplotype frequencies, the Faroe Islands thus appeared to be significantly differentiated from all of the remaining seven neighbouring populations in the North Atlantic region. Furthermore, the Faroese population appeared as an outlier with regard to pairwise genetic distance (Figure 1), indicating its genetic isolation. All pairwise measures of Slatkin's Fst were significantly different from 0 except that between the British Isles sample and the Scottish NW-coast.

Figure 1

Multidimensional scaling of the genetic distance between populations, measured as Slatkin's linearized Fst, calculated on the basis of coalescence times for mitochondrial haplotypes within and between populations.

As opposed to the AMOVA approach and Slatkin's Fst, the exact tests do not take mutational differences between haplotypes into account when testing for population differentiation. The significance of almost all pairwise comparisons of the exact test therefore suggests the detected population structure to be due to the differential distribution of individual haplotypes rather than to differences in the distribution of deep-rooted phylogenetic clades among populations.19

The diversity statistics in Table 1 indicate that smaller, geographically isolated island populations (Faroe Islands and Iceland) have fewer haplotypes per number of individuals sampled than mainland populations, which are likely to have a larger effective population size. Small island populations (Western Isles and Skye, Orkney, Shetlands) relatively close to the mainland have a higher number of haplotypes per number of individuals, probably attributable to a higher level of gene flow from the mainland compared to the more remote island populations (Iceland and the Faroe Islands). Values of Tajima's D close to zero are generally interpreted to reflect selective neutrality and constant population size, whereas negative values are interpreted to indicate either a selective sweep or a past population expansion affecting a single locus under the first scenario or all loci under the latter scenario.35, 40 In the context of European HVR I sequences, the low values of the population parameter θ (θk, θS or θπ) and the relatively high values of Tajima's D indicate that the Faroe Islands and Iceland have the smallest effective female population size.21 The Faroe Islands also appeared to have a substantially lower level of gene diversity than the other populations (Table 1).

The correlation between θk and the proportion of private haplotypes19 was r=0.749 (P=0.032), with Iceland above and the Scottish NW-coast below the predicted 95% confidence limit, as the only two populations found outside the 95% confidence region. This indicates that the Icelandic population has a relative excess of private haplotypes, and the Scottish NW-coast has a deficit of private haplotypes. The Faroe Islands and the Shetlands were placed above the regression line at the 85% confidence limit, suggesting that the Faroe Islands and the Shetlands have a modest excess of private haplotypes. Leaving out Iceland produces a correlation of r=0.816 (P=0.025) and places the Faroese population at the upper 92% confidence limit, suggesting that the inclusion of the larger Icelandic gene pool makes it appear less isolated. Genetically isolated populations are expected to have an excess of private haplotypes,19 which is consistent with Iceland, the Faroe Islands, and the Shetlands being the most geographically remote populations.
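The reported P-values can be checked against a two-sided t-test on the Pearson correlation coefficient with n − 2 degrees of freedom. The test actually used is not stated in the text, so the sketch below rests on two assumptions: that this is the test, and that each population contributes one data point (n=8, or n=7 without Iceland). The t distribution is integrated numerically with Simpson's rule so that only the standard library is needed; the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=10000):
    """Two-sided P-value for a Pearson correlation r observed on n points."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule for the integral of the density over [0, t] (steps is even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return t, 1 - 2 * central


t_all, p_all = correlation_p_value(0.749, 8)                # all eight populations
t_no_iceland, p_no_iceland = correlation_p_value(0.816, 7)  # Iceland left out
print(f"n=8: t={t_all:.2f}, P={p_all:.3f}")
print(f"n=7: t={t_no_iceland:.2f}, P={p_no_iceland:.3f}")
```

Under these assumptions the computed values round to the reported P=0.032 and P=0.025.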

Median-joining network analysis was performed twice: first using all 141 Faroese mtDNA sequences based on positions 16 090–16 365, which correspond to the positions used in the comparative analysis (28 distinct haplotypes; network not shown), and second using only the 122 sequences obtained in this study based on positions 16 026–16 485 (29 distinct haplotypes). The overall pattern did not differ between the two networks, but the latter provided a slightly better phylogenetic resolution (Figure 2).

Figure 2

Median-joining haplotype network representing the phylogenetic relationship between the 29 observed Faroese haplotypes. This analysis was based on the 122 sequences obtained in the current study and positions 16026 to 16485. In order to obtain more phylogenetic resolution and make use of the full length of the sequences obtained in the current study, the 19 Faroese sequences by Miller25 were excluded. Numbers on branches refer to the positions undergoing mutational changes from one distinct haplotype to another; positions are numbered according to Anderson et al.29 Numbers in italics refer to positions outside the range 16090 to 16356, which were not used in the haplotype sharing analysis in order to allow comparison with previously published data. The haplotype sharing is thus based on a collapsed network (not shown) with 28 distinct haplotypes based on positions 16090 to 16356 and all 141 Faroese sequences. Each circle represents a single distinct haplotype, and its size is proportional to the frequency of that particular haplotype in the Faroese sample. According to the legend to the left in the figure, different shading of circles indicates with which populations the Faroese population shares that particular haplotype. Light grey circles, denoted mv (median vectors), represent ancestral nodes not present in the sample.

Figure 3 shows a map of NW Europe together with the matrilineal and patrilineal admixture proportions estimated from mtDNA (current study) and Y-chromosomal data (reanalysed by Goodacre et al19), whereas Table 2 presents the admixture proportions of the Faroe Islands in relation to previously estimated admixture proportions for the remaining North Atlantic populations.19 The proportion of Scandinavian matrilineal ancestry of the Faroese population was only 16.7% (CI0.95: 0.114–0.227), whereas the previous estimate of Scandinavian patrilineal ancestry was 87%19 (CI0.95: 0.809–0.921, Helgason personal communication). British Isles matrilineal ancestry of the Faroese population was 83.3% (CI0.95: 0.773–0.886).

Table 2 Admixture proportions

Discussion

The current study suggests that only about 17% of the female settlers of the Faroe Islands were of Scandinavian descent, whereas a much larger 83% had British Isles ancestry. Previous studies19, 20 suggested that 87% of the male settlers were of Scandinavian descent, with only 13% having British Isles ancestry. These results may, however, have been affected by the high level of genetic drift occurring in the Faroese population. The applied admixture approach should, however, take into account uncertainty in estimates of haplotype frequencies in the admixed and the ancestral populations due to the effects of genetic drift subsequent to admixture.18 The results of the admixture analysis are inevitably influenced by the choice of source populations. The British Isles sample included 169 (21%) mtDNA sequences more than the Scandinavian sample, and this may have biased the estimate towards increased British Isles ancestry. For the Y-chromosomal data set, 279 (90%) more individuals were used for the British Isles data set than for the Scandinavian data set,19 suggesting that the observed 87% Scandinavian ancestry may represent an underestimation.
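The sample-size imbalance for the mtDNA data set can be reproduced from the sample sizes listed in the Methods. The short sketch below assumes that the pooled Scandinavian sample consists of all Norwegian, Swedish and Danish sequences (including the 61 Danish males added in this study) and that the pooled British Isles sample consists of the inland Scottish and Irish sequences.

```python
# Sample sizes per source data set, as listed in the Methods.
scandinavia = {
    "Norway": [323, 215, 74, 16],
    "Sweden": [28, 32],
    "Denmark": [61, 33, 5, 16],  # 61 new samples plus 54 previously published
}
british_isles = {
    "Inland Scotland": [672],
    "Ireland": [100, 200],
}

n_scand = sum(sum(sizes) for sizes in scandinavia.values())
n_brit = sum(sum(sizes) for sizes in british_isles.values())
excess = n_brit - n_scand
excess_pct = 100 * excess / n_scand

print(f"Scandinavia: {n_scand}, British Isles: {n_brit}")
print(f"Excess: {excess} sequences ({excess_pct:.0f}% of the Scandinavian sample)")
```

Under these assumptions the totals are 803 and 972 sequences, which gives the 169 additional sequences (21%) quoted above.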

The original study of the Faroese male admixture proportions20 suggested that 50% of the Faroese Y chromosomes had Scandinavian ancestry, whereas 23% of haplotypes were assigned to the British Isles and 27% to the Icelandic population. That study used a different approach for assigning Y-chromosomal haplotypes, but the data set was reanalysed by Goodacre et al19 using the admixture approach applied here, and is therefore directly comparable with the present study on Faroese mtDNA sequences. The method used by Jorgensen et al does not rely on the assumption of linkage equilibrium, but the exact performance of this method using completely linked markers remains unexplored.41 The admixture approach applied here16, 18 takes into account the nonrecombining nature of the Y chromosome. Both analyses leave little doubt that Scandinavian ancestry predominated among the male settlers of the Faroe Islands. The study by Jorgensen et al20 considers potential ancestral populations separately instead of pooling contemporary samples likely to represent the same ancestral source population.19 In the current study we have adopted the latter approach for two main reasons. (i) The exact constitution of the original source populations remains unknown and the original source populations are therefore best represented by the combined samples of their descendants. These contemporary populations are closely related, which justifies the pooling into a Scandinavian and a British Isles source population. (ii) The admixture approach,18 which seems superior to other methods in the present context,19 is currently not extended to deal with more than two ancestral populations. Standard frequency-based methods for estimating admixture proportions do not take into account private haplotypes of the admixed population42 and are therefore unlikely to be informative for the current data set as the Faroese sample has a high proportion of private haplotypes (Table 1). Admixture estimates (mY) based on the mean coalescence times of haplotype pairs do take private haplotypes into account, but estimates are only accurate if the ancestral populations are substantially differentiated,43 which is not the case for the current data set (Figure 1).

Leaving out Iceland, Shetland, Orkney, the Western Isles & Skye and the Scottish NW-coast as potential source populations is reasonable as they have probably been heavily affected by the same migration processes that gave rise to the Faroese population, according to historical and genetic evidence. Of the 28 distinct Faroese haplotypes, 10% were shared only with the Icelandic population and an additional nine haplotypes (32%) with the Icelandic and other populations. This high degree of similarity with Iceland suggests that the female settlers originated from the same regions18, 19 and settled the archipelago and Iceland at about the same time. Alternatively, the high degree of similarity could be explained by an extensive immigration from the much larger population of Iceland to the Faroe Islands during later periods. However, no support for the latter scenario exists in the historical records.

The present study supports the historical, archaeological and linguistic record by suggesting that a considerable number of the first female settlers in the archipelago originated from the British Isles whereas the majority of the males originated from Scandinavia.20 Goodacre et al19 conclude that Scandinavian settlement of the North Atlantic region during the Viking Age was primarily family-based in the closer and more secure areas (Orkney, Shetland, Scottish NW coast), whereas pronounced male-biased settlement occurred at the ‘frontier’ (the Western Isles, Skye, Iceland). Lone Viking males, who later established families with British Isles females, were thus more prominent in the remote and less secure areas. The asymmetry in female and male ancestry proportions observed in the Faroe Islands thus fits well into the pattern of male and female admixture proportions of the North Atlantic region. The population of the Faroe Islands appears to have even more discrepant admixture proportions among male and female settlers than the putative ‘sister’ population in Iceland.16, 18, 19 It thereby exhibits the greatest discrepancy in male vs female ancestry proportions of all populations in the North Atlantic region (Table 2).

Traditional statistics, such as gene diversity and mean pairwise mutational differences, have recently been suggested to be unreliable indicators of the degree of homogeneity and varying effective population size among closely related, recently diverged populations.21 This is considered a problem, especially for loci with many haplotypes (such as mtDNA), where drift primarily acts to reduce the number of rare haplotypes.21 Statistics based on the number of distinct haplotypes (θk) or the number of segregating sites (θS, Tajima's D) are, however, suggested to be much more sensitive to such differences in the effect of events (bottlenecks and founder events) in recent population histories, and therefore more reliable as indicators of differences in effective population size.21 The Faroese population had a small number of founders, slow population growth over centuries and putative occasional reductions in population size due to epidemics, followed by recent population expansion.44 Such a demographic history makes genetic drift likely to have played a major role in shaping the genetic diversity of the population. Although no severe bottleneck (Ne=50–100) could be detected using autosomal markers,45 the above-mentioned factors are still likely to affect the genetic diversity and distribution of Faroese mtDNA haplotypes. The putative effects of increased genetic drift during founder events and bottlenecks of the Faroese population are reflected by the low number of haplotypes per sampled mtDNA sequence, the low level of gene diversity, and the small effective population size (Table 1).21

We conclude that the relative Faroese homogeneity provides strong evidence that genetic drift has played a more important role in shaping Faroese genetic diversity than in most other European populations. This will have implications for attempts to locate genes of complex disorders. In populations with increased genetic homogeneity, individuals sharing a phenotype are more likely to do so because they also share genetic material than would be the case in more heterogeneous populations. This may facilitate the use of linkage and association mapping methods by reducing the problem of allelic and locus heterogeneity. If genetic drift has played a major role in shaping the patterns of genetic diversity, it will also have affected the patterns of linkage disequilibrium.21 Randomly selected chromosomes from the Faroe Islands appeared to contain larger shared segments (ie segments that have not been reshuffled by recombination) than chromosomes from more heterogeneous European populations.45 Consequently, certain aspects of association mapping are expected to be easier on the Faroe Islands, whereas others are expected to be more difficult.46

Both main focuses of this paper, (i) assessing the effects of genetic drift on shaping genetic diversity and (ii) inferring the population history of the Faroe Islands, will be of importance when interpreting results of gene mapping studies and in choosing populations for replication. The study also adds important missing information to the otherwise well-examined population history of the North Atlantic region.