Introduction

The first human settlers to arrive in the Canary Islands do not seem to pre-date the 1st millennium BC.1 Since the incorporation of the Canary Islands into the European world in the 15th century, the origin and survival of these aboriginal inhabitants have been a debated topic. Population genetic studies of the present day inhabitants, mainly those based on uniparental markers, support a Northwest African Berber origin as the most probable one for the ‘Guanches’, as the native Canarians are generally known. Mitochondrial DNA (mtDNA) lineages such as U6 and Y-chromosome markers such as M81, both of Berber origin,2,3 are significantly more frequent in the Canary Islands than among Iberians, the main colonisers of the Islands.4,5

Admixture analysis taking Iberian, Northwest African and West sub-Saharan African populations as parental sources of the present day Canarian population gave estimates of around 33% for the maternal4 and 6% for the paternal5 Guanche lineages. This strong sexual asymmetry was explained as the result of a strong bias favouring matings between European males and aboriginal females, and of the high aboriginal male mortality during the Conquest.6 However, these results, although congruent with history, are open to criticism. First, as the Berber markers are also present in the Iberian Peninsula,7,8,9 drift effects after the Spanish colonisation could account for their higher frequency in the Canary Islands without invoking aboriginal heritage. Furthermore, after the Conquest, the need for labour led to the introduction of slaves from the Northwest African coast. With time, these slaves were freed and integrated into the island population. This could account for the presence, in the current Canarian gene pool, of a higher proportion of Berber markers than in the Iberian Peninsula. However, the geographic distribution of the U6 subclades in Africa and the Canary Islands weakens this argument. In Northwest Africa, the predominant subgroup is U6a, which is scarce in the Archipelago.2,4 On the other hand, subgroup U6b is very rare in North Africa, but its sublineage U6b1 is the most prevalent U6 subgroup in the Canarian population4 and has not yet been detected in North Africa.2,10,11,12 Certainly, the most direct way to confirm the aboriginal contribution to the current mtDNA gene pool of the Archipelago would be to check for the presence of this Canarian U6b1 subclade directly in the aboriginal remains of the Islands. Fortunately, advances in molecular biology have made the retrieval of ancient DNA (aDNA) from archaeological specimens a tenable goal, especially when the remains are probably less than 1000 years old.

In this paper, we present the results obtained from a sample of 129 Canarian aborigines, analysed for mtDNA polymorphisms using hypervariable region I (HVRI) sequences and restriction fragment length polymorphisms (RFLP).

Materials and methods

Samples

This survey includes 131 teeth, corresponding to 129 different individuals, from 15 archaeological sites sampled on four of the seven Canary Islands and dated to around 1000 years ago (Figure 1). Care was taken to choose teeth without fractures. To avoid sampling the same individual twice, whenever possible only one type of tooth was chosen per site, preferentially left canines. In only two cases were teeth from the same mandible available. In these cases, replication was done in the Department of Genetics of Las Palmas de Gran Canaria.

Figure 1 Geographical distribution of the archaeological sites sampled in this study.

Extraction

Prior to the extraction, the surface of the tooth was thoroughly washed with 15% HCl, rinsed with UV-treated ddH2O and dried under a UV lamp for 5 min on each side. After this, each tooth was placed between two sterilised metal plates and crushed with a hammer. The pieces were then introduced into 15 ml tubes (Costar), and DNA was extracted according to a modified silica-based protocol.13 Briefly, 1–2 ml of an undiluted commercial guanidine thiocyanate solution (DNAzol®) was added to each tube and incubated, at room temperature, for 3–4 days. After this, the supernatant was passed through commercial silica columns (QIAquick®, Qiagen).

Amplification

For mtDNA HVRI analyses, seven primer pairs were designed to amplify 400 bp (positions 16 000 to 16 400) in overlapping fragments ranging in size from 82 to 124 bp (Table 1). HVRI fragments were PCR-amplified in 50 μl reactions using 7–9 μl of DNA extract. Four additional primer pairs were designed to enable restriction fragment length polymorphism (RFLP) analysis of the four most common African14 and European15 haplogroup-specific sites (Table 1). RFLP fragments were PCR-amplified in 10 μl reactions using 2 μl of DNA extract. Both types of PCR were run for 35 amplification cycles, each consisting of 10 s steps of denaturation at 94°C, annealing at the corresponding temperature (Table 1) and extension at 72°C. Positive amplifications were purified using 7.5 M ammonium acetate or directly digested in 15 μl reaction volumes.

Table 1 List of primers used in this study, annealing temperature and product size

Cloning

In those cases in which only one of the three negative PCR controls showed contamination, and this was of much lower intensity than the sample amplification, all products were cloned into pGEM-T vectors (Promega). Several clones were sequenced for each fragment until an unambiguous sequence was obtained. Cloned sequences from the sample were used only when: (a) the variants detected differed between the contaminated controls and the aboriginal sample, and (b) the mutations observed for that segment were phylogenetically congruent with the haplotype obtained from the rest of the fragments. Systematic sequencing of the contaminated PCR controls revealed two predominant exogenous sequences: for fragment four, 183C 189 217, which is part of the HVRI motif of B4, and for fragment six, 298 325 327, which belongs to haplogroup C. Neither of these contaminating sequences belongs to the people working in the lab or to those known to have been involved in the archaeological manipulation.

Sequencing

PCR fragments were directly sequenced using the same primer pairs as for the amplification. Clones were sequenced using M13 universal primers. All primers were labelled with [γ-32P]ATP, and the fmol® DNA Cycle Sequencing System (Promega) was used for sequencing.

Prevention of contamination

All extractions were performed in a dedicated laboratory physically separated from the main genetics department, constantly irradiated with UV lamps and frequently cleaned with bleach. All sample manipulations were performed in a laminar flow cabinet, with dedicated pipettes and sterile filter tips (Tip One, Star Lab). Solutions were commercially acquired when possible; otherwise, they were autoclaved and UV-treated. Lab coats, face shields, hats and sterile gloves were used at all times. All metallic material was sterilised in an oven at 200°C for at least 2 h.

In the first stages of the study, the effectiveness of the decontamination process before DNA extraction was verified in the following way: a tooth was immersed in a solution containing chicken DNA and then subjected to the decontamination protocol. Another tooth was processed in the same way, but without decontaminating the surface. Extracts from both teeth were subjected to PCR amplification using primers specific for chicken mtDNA cyt b. In the first case no amplification products were obtained, while in the second a 400 bp product was observed after amplification, demonstrating the effectiveness of our protocol.

To monitor contamination during extraction, an extraction blank was processed together with each tooth. PCR contamination was monitored using three negative controls per reaction.

RFLP typing of present Canarian population

The published HVRI sequences for the present day Canary Islanders4 were additionally tested by restriction analysis to unambiguously classify them into haplogroups.16

Statistical analysis

Sequences were sorted into haplogroups.16 Gene diversity was calculated as $\frac{n}{n-1}\left(1-\sum_i p_i^2\right)$, where $n$ is the sample size and $p_i$ is the frequency of each detected haplotype.17 Relationships between populations were estimated using two methods: haplogroup frequency-based linearised FST,18 computed by means of the Arlequin 2000 program,19 and distances based on shared haplotypes. For the latter, matches were calculated as $\sum_i x_i y_i$, $x_i$ and $y_i$ being the frequencies of haplotype $i$ (taking into account the positions between 16 069 and 16 365) in the two compared populations, and the distance, $D$, was estimated simply as $D = 1-\sum_i x_i y_i$. Both data sets were used to obtain multidimensional scaling (MDS)20 plots using the SPSS ver 9 package. Admixture estimates were also calculated by two methods. The first, mL,21 is based on haplogroup frequencies and was used considering each haplogroup as an allele of the same locus. The second estimator is based on the analysis of shared lineages (LS) between populations. Briefly, the number of shared haplotypes ($h_{iC}$) between each parental population ($i$) and the Canarian population ($C$) was counted, and corrected for differences in sample size by dividing by the number of different haplotypes present in each parental population ($H_i$). These values were normalised in order to obtain the relative contribution of each parental population. Thus, the contribution of population A can be calculated as:

$$\mathrm{LS}_A = \frac{h_{AC}/H_A}{\sum_i h_{iC}/H_i}$$

where the sum runs over all parental populations $i$.

For both admixture estimates, the Canarian sequences4 have been compared with published and unpublished sequences from the Iberian Peninsula,7,8,9,12,16,22,24 Northwest sub-Saharan Africa2,6,25,26,27 and the aboriginal sequences obtained in this work, as the three most probable parental populations.
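
The shared-haplotype distance D and the shared-lineage (LS) estimator described above can be illustrated with a minimal Python sketch. It assumes that each population is given as a plain list of haplotype labels, one per individual; the function names and the toy labels are hypothetical and are not taken from the study, and the sketch is not the software actually used for the analyses.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Map each haplotype label to its relative frequency in one sample."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: count / total for hap, count in counts.items()}


def shared_haplotype_distance(pop_x, pop_y):
    """D = 1 - sum_i(x_i * y_i); only haplotypes found in both samples contribute."""
    freq_x = haplotype_frequencies(pop_x)
    freq_y = haplotype_frequencies(pop_y)
    matches = sum(freq_x[h] * freq_y[h] for h in freq_x.keys() & freq_y.keys())
    return 1.0 - matches


def ls_admixture(parentals, admixed):
    """Normalised share of each parental population, following the LS description."""
    admixed_haps = set(admixed)
    ratios = {}
    for name, sample in parentals.items():
        distinct = set(sample)            # H_i: different haplotypes in parental i
        shared = distinct & admixed_haps  # h_iC: haplotypes shared with the admixed sample
        ratios[name] = len(shared) / len(distinct)
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("no haplotypes shared with any parental population")
    return {name: ratio / total for name, ratio in ratios.items()}


# Toy labels only (assumption); these are not sequences from the study.
toy_parentals = {
    "A": ["h1", "h1", "h2", "h3"],
    "B": ["h4", "h5", "h5", "h6", "h7"],
}
toy_admixed = ["h1", "h2", "h4", "h8"]

toy_distance = shared_haplotype_distance(toy_parentals["A"], toy_admixed)
toy_shares = ls_admixture(toy_parentals, toy_admixed)
print(toy_distance, toy_shares)
```

In the toy data, parental A shares two of its three distinct haplotypes with the admixed sample and parental B one of its four, so the normalised shares are 8/11 and 3/11.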

Results

Informative mtDNA sequences were obtained from a total of 71 individuals, an efficiency of 55%. The two replications performed in the Las Palmas de Gran Canaria laboratory gave sequences identical to those obtained in our lab. A total of 31 different haplotypes were found among these individuals, giving a gene diversity of 0.93±0.02, slightly lower than, but not significantly different from, that found in the present day Canarian population (0.97±0.01), the Iberian Peninsula (0.96±0.00) or Berbers (0.95±0.01).
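
As an illustration of how a gene diversity value of this kind is obtained, the formula given under Statistical analysis can be sketched in Python. The function name and the toy haplotype counts are hypothetical; the per-haplotype counts of the aboriginal sample are in Table 2 and are not reproduced here.

```python
def gene_diversity(haplotype_counts):
    """Gene diversity (n / (n - 1)) * (1 - sum(p_i ** 2)) from per-haplotype counts."""
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two sampled individuals are required")
    sum_squared = sum((count / n) ** 2 for count in haplotype_counts)
    return (n / (n - 1)) * (1.0 - sum_squared)


# Toy counts (assumption): four haplotypes observed 4, 3, 2 and 1 times.
toy_diversity = gene_diversity([4, 3, 2, 1])
print(round(toy_diversity, 4))
```

When every individual carries a different haplotype the value reaches 1, and it falls towards 0 as a single haplotype comes to dominate the sample.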

In a previous study4 on present day Canarians, eight sequences found in at least four of the seven Islands were thought to have been already present in the aboriginal colonisers and were defined as ‘founder haplotypes’.4 Five of these sequences have been found in the aboriginal sample (Table 2). CRS sequences are the most abundant, accounting for 21.12% of the sample. However, not all of them could be assigned to specific haplogroups by RFLP. The Canarian-specific U6b1 sequences are also found at high frequency (8.45%), corroborating that these lineages were already present in the aboriginal population. Three additional founder haplotypes4 were also detected (260, 069 126 and 126 292 294), all of them at frequencies equal to or higher than those in the present day Canarian population. In addition, six private haplotypes have been detected. Two of them (145 213 and 126 224 292 294) belong to Caucasian haplogroups, and the other four to the African macrohaplogroup L (Table 2).

Table 2 Haplotypes found in the aboriginal sample indicating the number of individuals, the RFLP analysed and the geographical distribution of the lineages

Table 3 compares haplogroup frequencies between the aborigines and the present day Canarians. Haplogroup H/HV/U*/R(-CRS) is by far the most abundant, encompassing more than 30% of the sample. Haplotypes belonging to L3 are also more frequent (9.86%) than in the present day Canarian sample4 (χ2=6.55; df=1; P<0.05), but within the range of North African populations (5–26%).2,10,11,12
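
A frequency comparison of this kind, with one degree of freedom, corresponds to a chi-square test on a 2×2 table (haplogroup present or absent, in two samples). The Python sketch below assumes the plain Pearson statistic without continuity correction; whether a correction was applied in the study is not stated. The function name and the counts are hypothetical toy values, not the counts behind Table 3.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Toy counts (assumption): carriers and non-carriers in two samples of 100.
toy_chi2 = chi_square_2x2(10, 90, 20, 80)

# 3.841 is the standard critical value for 1 df at the 0.05 level.
print(toy_chi2, toy_chi2 > 3.841)
```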

Table 3 Haplogroup frequencies in the sample of Aborigines (present study) and the revised present day population of the Canary Islands4

FST and D values were calculated between the Guanche sample, the Canarians, the Iberian Peninsula and the different Northwest African populations in order to establish their genetic relationships (Table 4). For FST, the aborigines are not significantly different from present day Canarians or from Iberians, and the most closely related North African populations are Moroccans and Moroccan Berbers. The D values give similar results, Moroccan Berbers being the most closely related North African population. MDS plots based on FST and D values (Figure 2) show that the first dimension clearly separates Saharan and Sahelian populations from the others. However, whereas FST separates the Guanches, Iberians and Canarians from the Maghreb group, D values cluster the Guanches with Moroccan Berbers and Iberians, pointing to the Berbers as the Northwest African population most closely related to the aborigines. The close relationship between aborigines and Iberians is puzzling, but it is mainly due to the high frequency of the shared CRS haplotype. When this haplotype is omitted from the calculations (results not shown), FST shows similar relationships, but D clearly separates Iberians from the Guanche–Canarian–Berber cluster.

Table 4 Linearised FST (below the diagonal) and D (above the diagonal) distances between the populations from the Canary Islands, Iberian Peninsula and Northwest Africa

Figure 2 MDS plots based on (a) FST and (b) D distances.

For the first time, admixture estimates for the current Canarian population can be calculated using the aborigines as a parental population. The mL estimator points to the Iberian Peninsula as the main contributor to the Canarian maternal gene pool with 55%, followed by an aboriginal contribution of 42% and a minimal sub-Saharan African input of around 3% (Table 5). In contrast, for the LS estimator, the aborigines account for the highest input with 73%, the Iberian Peninsula for 21.5% and sub-Saharan Africans for the remaining 5.5%. The differences between the two estimators arise mainly because they measure different aspects: on the one hand, differences or similarities in haplogroup frequencies, and on the other, the relative amount of haplotype sharing. As similar haplogroup frequencies do not imply similar haplotype sharing, we are more confident in the LS values.

Table 5 Admixture estimates (%) obtained for the present day Canarian population4 using two different estimators (mL and LS)

Discussion

The high diversity found in the Guanches, comparable to that of the present day Canarian and continental (Northwest African) populations, argues against the idea that the aboriginal settlement involved strong founder effects. It is congruent with a great ethnic diversity among the Guanches,1,28 and/or with the existence of several differentiated migratory waves to the Canary Islands before the European Conquest.1,5

The detection in the Guanches of the most abundant haplotype of the U6b1 branch, also found in present day islanders,4 points to a significant continuity of the aboriginal maternal gene pool. However, in contrast to the overall diversity detected, the North African U6 representatives in the Guanches are of only two types: 172 189 219 278 and 163 172 219 311. It is striking that the Canarian branch of U6 has not been detected in any of the North African populations analysed to date.2,10,11,12 One explanation could be that the 16 163 mutation arose in the Islands. However, this hypothesis faces several objections. The first is that the estimated age of the subgroup is around 6000 years,29 which predates the arrival of the first human settlers in the Islands.1 The second is that this hypothesis implies substantial human movement between islands, as this sublineage is currently found on all the Islands. However, this has not been corroborated by archaeological, anthropological and linguistic data, which point to considerable human heterogeneity even within the same Island.1,28 The third is that U6b, the phylogenetically closest ancestor of U6b1, is present in Africa but absent from the Canarian Archipelago. For all these reasons, the U6b1 clade most probably emerged in Africa and migrated to the Canary Islands. It is possible that the U6b1 subgroup is still present in Northwest Africa. However, given a sample size of 524, and assuming a U6b1 frequency for North Africa similar to that in the Canary Islands, the probability that this clade would have gone undetected in North Africa is negligible. This could indicate either that the relevant region has not yet been sampled, or that the actual U6b1 frequency in Africa is lower than that in the Canary Islands.
Today, U6b lineages have only been sporadically found in two Moroccans,2,11 a Wolof,2 a Fulbe27 and in the Iberian Peninsula.7,8,9,23 This wide distribution could be compatible with the idea that U6b lineages were present, in the past, throughout this Western area, but that subsequent demographic movements reshaped its genetic landscape. The fact that four of the six private aboriginal haplotypes belong to the African L cluster reinforces this idea. These facts complicate the search for an exact geographic origin of the Canarian aborigines. However, molecular relationships point to the Moroccan Berbers as the African population most closely related to the Guanches, confirming, at a genetic level, the previous general supposition of strong cultural and anthropological affinities between the Guanches and the westernmost African Berbers.1,28
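
The non-detection argument for U6b1 made above can be checked with a short calculation: if a lineage has frequency f in a population, the chance of missing it in n independently sampled individuals is (1 − f)^n. The Python sketch below assumes independent sampling. As a stand-in for the Canarian frequency it uses the 8.45% found in the aboriginal sample, which is an assumption made here for illustration, together with the sample size of 524 quoted in the text. The function names are hypothetical.

```python
def prob_not_detected(frequency, sample_size):
    """Probability of observing no carriers among `sample_size` independent individuals."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    return (1.0 - frequency) ** sample_size


def max_frequency_still_missed(sample_size, miss_probability=0.05):
    """Highest frequency that would still go undetected with the given probability."""
    return 1.0 - miss_probability ** (1.0 / sample_size)


# Stand-in frequency (assumption): 8.45%, the U6b1 frequency in the aboriginal sample.
p_missed = prob_not_detected(0.0845, 524)

# Frequency at which a lineage is still missed 5% of the time with 524 individuals.
f_upper = max_frequency_still_missed(524)
print(p_missed, f_upper)
```

Under these assumptions the first value is vanishingly small, in line with the statement that non-detection would be negligible at a Canarian-like frequency, while the second gives a rough upper bound on the frequency that a sample of this size could plausibly have missed.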

Quantitative admixture approaches, using the aboriginal sample as a parental contributor, showed that Guanche lineages constitute 42–73% of the present day Canarian maternal gene pool. These data confirm previous estimates obtained using Northwest African populations as a parental contributor (33–43%).4,6 Both results support, from a maternal perspective, the supposition that, since the end of the 16th century, at least two-thirds of the Canarian population has had an indigenous substrate, as was previously inferred from historical and anthropological data.30