Introduction

The Balearic archipelago and the islands of Sardinia, Corsica and Sicily are enclosed in the westernmost part of the Mediterranean basin by the Iberian Peninsula and the Strait of Gibraltar to the west, North Africa to the south and the Strait of Messina and the Italian Peninsula to the east. This region covers an area of about 0.85 million km² and embraces a set of populations closely related not only by geography but also by history, probably since their initial peopling in Middle and Upper Paleolithic times. Their common historical background includes numerous and almost continuous waves of settlement and conquest by several mainland civilisations. This coming and going of populations in a confined geographical area has constituted a challenge for archaeologists, historians, ethnologists and anthropologists alike. The latter have been particularly interested in determining the degree of genetic relationship among a set of populations that, despite sharing centuries of invasions, still preserve some remarkable differences.

In Majorca, the largest island of the Balearic archipelago, archaeological data suggest peopling since the Paleolithic period. The island was occupied by the Carthaginians before passing to the Romans, who ushered in a long period of prosperity. From 707, the island was increasingly attacked by Muslim raiders from North Africa. Two centuries later, the Caliphate of Cordoba conquered Majorca, bringing a new period of prosperity to the island. In the thirteenth century, the Catalano–Aragonese launched an invasion with 15,000 men and 1,500 horses, annexing the island to their kingdom. In the archipelago, the mother tongue is the Balearic variety of Catalan, a Romance language spoken in a large part of the former territories of this kingdom. From a genetic point of view, recent data on mtDNA haplotype variability (Picornell et al. 2005) suggest a high similarity among Majorca, the other Balearic islands and Spanish populations historically related to the Catalano–Aragonese kingdom. This affinity points to an important gene flow from the mainland, without significant bottlenecks involved in the colonisation of the island.

Corsica and Sardinia formed a single land mass in early Paleolithic times. They are now separated by a strait about 12 km wide. As a result of this proximity, the two islands share a common background, even though for many centuries contact with different Mediterranean invaders was apparently limited to the coastal flatlands of the islands. Carthaginians and Romans pushed the indigenous people into the central region of the islands, which explains why, in the case of Sardinia, the centre is the most conservative region linguistically and genetically (Piazza et al. 1988; Cappello et al. 1996). With regard to the vernacular languages, the inhabitants of both islands speak Romance languages, but in the case of Corsica, the language shows a high affinity with the Tuscan dialect, although with some internal differentiation, whereas Sardinian is a clearly distinct Romance language that preserves traces of the indigenous pre-Roman languages of the island to this day.

These two islands also share some demographic features, likely a product of their rugged geography, that have shaped their genetic structure mainly through isolation and genetic drift. Until the eighteenth century, Sardinia had a population that rarely exceeded 400,000 inhabitants. It was even lower in Corsica: 100,000 inhabitants until the end of the eighteenth century (Day 1987; Gatti 1995; Simi 1997). Genetic studies conducted in Corsica and Sardinia, although numerous, have yielded inconsistent results. Some studies based on classical markers indicate genetic similarity (Memmi et al. 1998; Vona et al. 2003), whereas others emphasise their genetic heterogeneity (Calafell et al. 1996). More recently, DNA studies have added more data without conclusive results. Francalacci et al. (2003) conducted a survey of Y-chromosome haplotypes in several samples from Corsica, Sicily and central Sardinia. Their main findings underline the differentiation of Sardinians and Sicilians from other Mediterraneans, whereas Corsica remains more similar to continental Italian and French samples, excluding the possibility of significant gene flow from central Sardinia to north-central Corsica. Data from mtDNA (Morelli et al. 2000) have also demonstrated a remarkable discontinuity between central Sardinians and both north Sardinians and Corsicans. On the other hand, a different mtDNA study (Falchi et al. 2006) found genetic similarities among Iberian, Corsican and Sardinian populations. This study confirms that most mtDNA haplogroups in these samples coalesced in Paleolithic dates. Information from autosomal markers also gives conflicting results; the maternal genetic similarities among Iberian, Corsican and Sardinian populations seem to be reflected in the high frequencies of the β⁰39 thalassemia mutation (Falchi et al. 2005), whereas a multilocus analysis of autosomal microsatellites (Tofanelli et al. 2001) suggests a remarkable genetic differentiation between Sardinia and Corsica.

The position of Sicily in the centre of the Mediterranean has made it a natural passage for peoples from virtually all of the Mediterranean and beyond. Before the Roman conquest, Sicily was occupied by remnants of the autochthonous populations of Sicani, Elymi, and Siculi (Indo-European populations that arrived between the second and first millennium BC), as well as by Phoenicians (tenth to eighth century BC) and Greeks (eighth century BC). The Sicilian language has inherited vocabulary and grammatical forms from these earliest settlers of the island as well as from the later colonists and conquerors. In view of this heterogeneous background, the subject of genetic relationships between populations on the island of Sicily is controversial. Some studies based on classical polymorphisms, and later on autosomal DNA markers (Calò et al. 2003; Ghiani et al. 2002; Piazza et al. 1988; Romano et al. 2003), indicated that Sicily is genetically heterogeneous, with a considerable East–West gradient compatible with population settlements occurring at different times. Other authors (Rickards et al. 1998) found no clear geographic clustering within Sicily, rejecting an East–West differentiation.

Although the genetic information summarised here is extensive and covers everything from classical polymorphisms to uniparental and autosomal DNA, as far as we know, none of these studies has jointly tested the four main islands with samples from different geographical areas inside each one. This is the context in which we present our work. We analysed a set of eight autosomal Alu polymorphisms and three short tandem repeats (STRs) closely linked to the CD4, F13B and DM Alu markers in seven regions of Majorca, Sardinia, Corsica, and Sicily. We selected these particular markers for two main reasons: first, the well-established informativeness of Alu insertions for the study of human populations (Watkins et al. 2001), owing to their stability, low mutation rate and known ancestral state; and second, the remarkable degree of information provided by Alu markers linked with STRs. STRs are very effective for estimating divergence between populations, although their mutation rate involves a certain degree of homoplasy that can mask the true genetic relationships. Information on haplotype frequencies, together with STR variation on the ancestral Alu allele background compared with STR variation on the derived Alu alleles, has been used to estimate fine genetic relationships between human populations, not only on a large geographical scale (Tishkoff et al. 1996; Ramakrishnan and Mountain 2004), but also at a microgeographical level (Flores et al. 2000; Esteban et al. 2004).

The main objectives of this work are: (1) exploration of the degree of internal variability of Corsica, Sardinia and Sicily for comparison of the results with previous studies that suggested different heterogeneity levels inside these islands, (2) analysis of the genetic relationships among the four islands by maximum usage of different genetic markers such as Alu and STRs and (3) use of the qualitative information provided by the Alu/STR haplotypes to determine the amount of external gene flow received in the islands as a result of their historical background.

Material and methods

A total of 360 unrelated and healthy autochthonous individuals from seven well-defined rural areas of Majorca, central Sardinia, west-coast Sardinia, central Corsica, west-coast Corsica, east Sicily and west Sicily were analysed. Samples were obtained with the informed consent of the participants, whose four grandparents were born in the same region. The geographical position of the samples is detailed in Fig. 1.

Fig. 1 Geographical position of the seven insular samples and other west Mediterranean groups used in comparisons

Eight human-specific Alu insertion polymorphisms (DM, HS2.43, B65, PV92, D1, F13B, A25 and TPA25) were typed using the primers and polymerase chain reaction (PCR) amplification conditions previously described in Stoneking et al. (1997) and Edward and Gibbs (1992), with minor modifications. As for STRs, CD4 consists of a pentanucleotide (TTTTC)n repeat amplified according to Tishkoff et al. (1996), with minor modifications. This STR maps approximately 9 kb from the Alu marker. The F13B STR is a tetranucleotide repeat (TTTA)n located 4 kb from the Alu marker. Amplification conditions were as described in Nishimura and Murray (1992), with slight modifications. The DM (CTG)n repeat was amplified according to Brook et al. (1992). In this case, the Alu polymorphism is located 5 kb telomeric to the repeat. After amplification with fluorescently labelled primers, PCR products were pooled and electrophoresed on an ABI PRISM 3700 DNA sequencer (Applied Biosystems, Foster City, CA, USA). Genescan and Genemapper 3.0 programs (ABI PRISM, Applied Biosystems) were used to generate fragment sizes and genotypes. Selected individuals were sequenced for each STR to confirm fragment sizes and assign the correct repeat number for comparisons with data generated by other authors.

Allele frequencies were computed by direct counting, and Hardy–Weinberg equilibrium was tested by an exact test (Guo and Thomson 1992). Standard gene diversity indices by population and locus were estimated according to Nei (1987). Locus frequency distributions were compared by an exact test for population differentiation. The program PHASE was used to estimate haplotype frequencies by means of a Bayesian statistical method. Version 2.1 of the program implements extensions of the original methods described in Stephens et al. (2001) and Stephens and Donnelly (2003). Linkage disequilibrium between each STR and its respective Alu was quantified using the adaptation of the Black and Krafsur (1985) algorithms contained in the computer program GENETIX 4.05 (Belkhir et al. 1996–2004). The apportionment of genetic variance was checked by analysis of molecular variance (AMOVA) through the ARLEQUIN computer package (Excoffier et al. 2005). Locus-by-locus fixation indices (FST, FSC and FCT) were averaged to obtain a global value. The statistical significance of these averages was checked by combining probabilities (Sokal and Rohlf 1997).
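As an illustration of the first two steps, the following minimal Python sketch computes allele frequencies by direct counting and an unbiased gene diversity of the form n/(n − 1) × (1 − Σp²), where n is the number of gene copies sampled. The function names and the four genotypes are hypothetical and serve only as an example; this is not the software used in the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Direct counting: every diploid genotype contributes two gene copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(genotypes):
    """Unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    freqs = allele_frequencies(genotypes)
    n = 2 * len(genotypes)  # number of gene copies in the sample
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))


# Hypothetical Alu genotypes: "+" = insertion present, "-" = insertion absent
sample = [("+", "+"), ("+", "-"), ("-", "-"), ("+", "-")]
freqs = allele_frequencies(sample)
h = gene_diversity(sample)
print(freqs, round(h, 4))
```

With real data, the same two functions would be applied locus by locus and sample by sample.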

Apart from the seven samples included in this study, data from other European and North African samples, mainly from the Mediterranean basin, were collected from the literature. For our seven samples, we assembled a database of 18 Alu polymorphisms by linking this work to a previous one conducted by our research team (Caló et al. 2005), but the available literature allowed us to create a database of 20 Mediterranean groups only for the following Alu loci: APO, B65, PV92, D1, F13B, A25, TPA25 and ACE. These samples (see Fig. 1 for geographical location) come from the works of Stoneking et al. (1997), Comas et al. (2000) and Garcia-Obregon et al. (2006, 2007). Population data for the CD4, F13B and DM STRs were obtained from the ALFRED database (Rajeevan et al. 2005).

Population relationships were assessed by means of FST-related genetic distance analyses (Reynolds et al. 1983) using the PHYLIP 3.6 package (Felsenstein 1989) and depicted through multidimensional scaling from the distance matrix. Genetic distances (δμ)² for STR data according to Goldstein et al. (1995) were calculated by the computer program Microsat 2 (written by E. Minch and available from: http://www.hpgl.stanford.edu). Population divergence times were estimated according to Goldstein et al. (1995), who proposed an equation to calculate the divergence time between two samples by dividing the estimated (δμ)² distance by twice the product of the mutation rate (β) and the constant size variance (ω) of mutational jumps, considering a generation time of 25 years. For divergence time calculations, we assumed that ω is constant with a value of 0.04 (1/25) and a β value of 2.8 × 10⁻⁴ (Chakraborty et al. 1997). The time obtained is expressed in years before present (YBP).
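The divergence time calculation described above can be sketched in a few lines of Python. The function names and the mean repeat numbers are hypothetical and chosen only for illustration; the default parameter values are those stated in the text, and the distance is taken here as the across-loci average of the squared difference in mean repeat number between two samples, which is our reading of the Goldstein et al. (1995) definition.

```python
def delta_mu_squared(mean_sizes_a, mean_sizes_b):
    """Average over loci of the squared difference in mean repeat number."""
    if len(mean_sizes_a) != len(mean_sizes_b):
        raise ValueError("both samples need one mean allele size per locus")
    diffs = [(a - b) ** 2 for a, b in zip(mean_sizes_a, mean_sizes_b)]
    return sum(diffs) / len(diffs)


def divergence_time_ybp(distance, beta=2.8e-4, omega=0.04):
    """Distance divided by twice the product of beta and omega (years)."""
    return distance / (2 * beta * omega)


# Hypothetical mean repeat numbers at three STR loci in two samples
sample_a = [10.0, 8.5, 12.0]
sample_b = [10.5, 8.0, 12.5]

d = delta_mu_squared(sample_a, sample_b)
t = divergence_time_ybp(d)
print(d, round(t))
```

Because ω is fixed at 1/25, dividing by 2βω is equivalent to computing the time in generations, (δμ)²/2β, and multiplying by a 25-year generation time.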

Results

Variability in west Mediterranean Islands

Alu polymorphisms

Allele frequencies for the eight Alu loci are shown in Table 1. In general, all loci were in Hardy–Weinberg equilibrium after Bonferroni correction (except D1 in west Sicily) and showed significant gene diversity differences (Kruskal–Wallis test, p < 0.001). With regard to FST values, only the PV92 locus showed moderate genetic differentiation (FST of 6.7%). Although the samples from Sicily and Sardinia occupied extreme positions in the population variation ranges (see Table 1) for some loci, when average heterozygosities were compared, the Kruskal–Wallis test indicated no remarkable differences (p = 0.873) among our samples.

Table 1 Alu insertion frequencies in west-Mediterranean islands

Pairwise population comparisons across the eight loci revealed a remarkable degree of heterogeneity (significant p values for 17 comparisons out of 21). Only four population comparisons failed to show significant differences: west-coast Corsica with both central Sardinia and Majorca, and central Corsica with both east Sicily and Majorca. The locus with the highest number of significant population comparisons (11 out of 21) was PV92, consistent with the genetic differentiation revealed by its FST value. A nonhierarchical AMOVA yielded an average FST in west-Mediterranean islands of 2.2% (p < 0.001). Inside Sardinia, the level of population genetic variance (FST of 5.5%, p < 0.001) was even higher than that observed for the islands as a whole. Both the low number of samples and the extreme allele frequencies shown by the two Sardinian samples in almost all loci probably accounted for this FST value. However, the same pattern was observed when we recalculated FST using information from 18 Alu polymorphisms: average FST inside Sardinia (3.7%, p < 0.001) was nearly triple that observed among islands (1.3%, p < 0.01).
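To make the meaning of these FST values concrete, the sketch below computes Wright's classical FST for a biallelic locus as the variance of the insertion frequency among populations divided by p̄(1 − p̄), and then averages it over loci. This is a deliberately simplified stand-in, not the AMOVA estimator implemented in ARLEQUIN: it weights all populations equally and ignores sampling error. The function names and frequencies are hypothetical.

```python
def fst_biallelic(pop_freqs):
    """Wright's F_ST = Var(p) / (p_bar * (1 - p_bar)) for one biallelic locus."""
    k = len(pop_freqs)
    p_bar = sum(pop_freqs) / k
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic across all populations")
    variance = sum((p - p_bar) ** 2 for p in pop_freqs) / k
    return variance / (p_bar * (1 - p_bar))


def average_fst(loci):
    """Unweighted mean of the locus-by-locus F_ST values."""
    return sum(fst_biallelic(freqs) for freqs in loci) / len(loci)


# Hypothetical Alu insertion frequencies in two populations at two loci
locus_a = [0.10, 0.30]  # differentiated locus
locus_b = [0.50, 0.50]  # identical frequencies, so F_ST = 0

fst_a = fst_biallelic(locus_a)
fst_mean = average_fst([locus_a, locus_b])
print(round(fst_a, 4), round(fst_mean, 5))
```

In this toy example, a 20-point difference in insertion frequency at a single locus gives an FST of about 6%, which is of the same order as the value reported for PV92.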

STR gene diversity and Alu/STR compound systems

Allele size frequencies of the CD4, F13B and DM microsatellites are available as supplementary material from the Web site of the journal. Overall, the three distributions were in Hardy–Weinberg equilibrium after Bonferroni correction, the only exception being the DM STR in central Sardinia. Allele diversity values and some statistical parameters of allele size distributions, including STR variation on derived chromosomes, are reported in Table 2. Heterozygosity values in the three STRs showed similarly notable levels of within-population variation, but no significant population differences were detected in either diversity values or allele size frequencies. STR variation in the derived CD4 and DM chromosomes (those Alu-) was markedly lower in all cases, in agreement with previous knowledge about the distribution of these compound systems in modern humans (Tishkoff et al. 1996, 1998). In contrast, STR variation in the derived F13B chromosomes (those carrying the Alu insertion) was high and very similar to that described for the general variation of this STR.

Table 2 Variation of CD4, F13B and DM microsatellites in west-Mediterranean islands

Alu/STR haplotype frequencies are reported in Tables 3, 4 and 5 for the CD4, F13B and DM markers, respectively. In the three compound systems, Alu and STR alleles were in linkage disequilibrium. In agreement with the STR distributions, our samples showed high levels of within-population diversity but weak population differences.

Table 3 CD4 Alu/short tandem repeat (STR) haplotype frequencies and global gene diversity
Table 4 F13B Alu/short tandem repeat (STR) haplotype frequencies and global gene diversities
Table 5 DM Alu/short tandem repeat (STR) haplotype frequencies and global gene diversities

The number of different CD4 haplotypes in Sardinia and Sicily (seven and nine, respectively) exceeded that found in Corsica and Majorca due to the presence of some characteristically African combinations (Alu- with the five- and eight-repeat alleles) in the former populations. In the particular case of east Sicily, these haplotypes accounted for a frequency of 5.6%. Another haplotype (Alu-/10-repeat allele) characteristic of Berber groups (Flores et al. 2000; Esteban et al. 2004) was also found in central Sardinia (1.2%), east Sicily (2.5%) and, at a higher frequency, in west Sicily (4%). The pattern of F13B haplotype frequencies was very similar among islands, excluding the comparison between west-coast Sardinia and west-coast Corsica (p = 0.04). Some haplotypes were common to all groups, whereas others were found scattered in certain samples; however, none of the latter haplotypes was detected in Majorca. DM haplotypes also showed a similar pattern of high within-population diversity combined with great population homogeneity.

Leaving aside some occasional differences, the remarkable genetic heterogeneity within and among islands detected for the set of Alu markers did not match up with the global homogeneity detected for STR variation and Alu/STR haplotype frequencies. Furthermore, in west-Mediterranean islands, the FST values deduced from STR variation in the three loci as a whole (FST = 0.01%) or from the three Alu/STR compound systems (FST = 0.02%) were not significantly different from zero.

Genetic relationships in the west-Mediterranean basin

Global relationships in our samples were assessed through FST-related genetic distance matrices for 18 Alu polymorphisms and three Alu/STR combinations and through (δμ)² distances for the three STRs. In all matrices, distance values were significantly different from zero in more than 90% of cases. Alu and Alu/STR distance matrices were positively correlated (Mantel test, r = 0.763, p = 0.041) and underlined the genetic differentiation of west-coast Sardinia and west Sicily [see Fig. 2a for the multidimensional scaling (MDS) plot based on Alu data]. The first dimension of the MDS plot based on (δμ)² distances (Fig. 2b) clearly distinguished two population clusters, with west-coast Sardinia and west Sicily as the most differentiated samples within each group. Figure 2b also contains estimates of divergence times among samples; the two main population clusters showed a time separation of around 25,000 YBP, whereas the divergence inside each group was considerably lower.

Fig. 2 a Plot of multidimensional scaling (MDS) (stress = 0.008) applied to the FST genetic distance matrix based on 18 Alu markers. b Plot of MDS (stress < 0.001) applied to the (δμ)² genetic distance matrix based on three short tandem repeats (STRs). Years before present (YBP) estimated through the distance values are indicated for the main groups; for averaged YBP, standard deviations (SDs) are indicated in parentheses

Heterogeneity within west-Mediterranean islands was examined in a wider context (Fig. 3a) to determine its true significance. FST-related genetic distances among our samples and a set of related populations for eight Alu polymorphisms ranged from the lowest value of 0.0018 between two Spanish samples (northeast Spain and Navarre) to the highest value of 0.1379 (between west-coast Sardinia and west Sicily). Average genetic distances inside west-Mediterranean islands (average distance dm = 0.044) were considerably higher than the average distances among southwestern Europeans (dm = 0.010) and among North Africans (dm = 0.013). On average, west-Mediterranean islands showed the highest between-group distance with North Africans (0.045). The fraction of genetic variance resulting from differences between these two groups, measured through the across-loci average FCT value, was 1.79% (p < 0.001).

Fig. 3 a Plot of multidimensional scaling (MDS) (stress = 0.091) applied to the FST genetic distance matrix based on eight Alu markers. b Position of west-Mediterranean islands in the heterozygosity vs. distance from the centroid plot based on Alu polymorphisms

When population relationships were depicted through an MDS plot (Fig. 3a), west-coast Sardinia, central Sardinia and west Sicily occupied a peripheral position in the upper part of the plot, whereas the other samples were closely related to the Spanish and French samples. The Sardinian differentiation may be explained by the fact that, in comparison with the overall population correlation between distance from the centroid and heterozygosity (Fig. 3b), the Sardinian samples showed less heterozygosity than expected under the Harpending and Ward (1982) model, suggesting either a greater influence of genetic isolation or a smaller effective population size.
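The comparison with the Harpending and Ward (1982) model can be sketched as follows. Under that model, the expected heterozygosity of population i is H_T(1 − r_i), where H_T is the heterozygosity of the pooled population and r_i is the distance of population i from the centroid, computed for a biallelic locus as (p_i − p̄)²/(p̄(1 − p̄)) and averaged over loci. The Python code below is a simplified illustration that assumes an unweighted centroid and applies no sample-size correction; the function names and the three populations are hypothetical.

```python
def centroid(pops):
    """Unweighted mean insertion frequency per locus across populations."""
    n_loci = len(next(iter(pops.values())))
    return [sum(f[l] for f in pops.values()) / len(pops) for l in range(n_loci)]


def harpending_ward(pops):
    """Map each population to (r_i, observed H, expected H, residual)."""
    p_bar = centroid(pops)
    h_total = sum(2 * p * (1 - p) for p in p_bar) / len(p_bar)
    results = {}
    for name, freqs in pops.items():
        r = sum((p - c) ** 2 / (c * (1 - c))
                for p, c in zip(freqs, p_bar)) / len(p_bar)
        observed = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
        expected = h_total * (1 - r)
        results[name] = (r, observed, expected, observed - expected)
    return results


# Hypothetical insertion frequencies at two Alu loci in three populations
pops = {"A": [0.5, 0.4], "B": [0.5, 0.6], "C": [0.2, 0.8]}
results = harpending_ward(pops)
for name, (r, obs, exp, res) in results.items():
    print(name, round(r, 3), round(obs, 3), round(exp, 3), round(res, 3))
```

A negative residual, as for the hypothetical population C, corresponds to the situation described for the Sardinian samples: less heterozygosity than the model predicts for that distance from the centroid, as expected under isolation or small effective size.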

Discussion

For west-Mediterranean islands, the autosomal Alu and STR data reported here are the first to be described and jointly discussed in order to shed light on some of the most controversial issues of west-Mediterranean population relationships, namely, the internal degree of heterogeneity within islands, the particular affinities and/or differences among islands, the amount of external gene flow received and finally, the divergence times among these regions.

Concerning Alu markers, the seven west-Mediterranean samples show noticeable levels of genetic diversity, with the only exceptions being east Sicily and west-coast Sardinia, which have the lowest average heterozygosities. Genetic differentiation inside and among islands is extremely high, as can be deduced from both the results of pairwise population comparisons (17 out of 21 cross-loci population comparisons are statistically significant) and global FST values. A general trend towards low gene diversity in Sardinia (Fig. 3b), together with discrepant patterns of Alu allele frequencies among samples, could be consistent with such differentiation. The global degree of differentiation among islands (2.2%, p < 0.001) is even slightly higher than that reported in Europeans (1.9%) or North Africans (1.5%, Comas et al. 2000; 2.3%, Gonzalez-Perez et al. 2003) for a comparable set of Alu markers and samples.

West-coast Sardinia and west Sicily are clearly differentiated from all other samples (Figs. 2a, 2b, 3a). The action of genetic drift in relatively small population groups could have contributed to their differentiation. However, we cannot ignore that historical, linguistic and some genetic evidence (Piazza 1988) in Sardinia points to differences in population settlements between central and coastal areas due to the confinement of the original population, the Nuragici, to the internal regions as a result of the Carthaginian and Roman invasions. In the case of Sicily, historical records also indicate an important retreat of the original Sicanian population due to the arrival of the continental Italian Sicels. Our results concur with evidence based on Y-chromosome haplotypes (Francalacci et al. 2003) and mtDNA (Morelli et al. 2000) that points to Sardinia and Sicily as the most differentiated populations in the west-Mediterranean basin.

The relative heterogeneity among the remaining insular samples revealed by the plot based on 18 Alu markers (Fig. 2a) is less evident when other Mediterranean groups are added to the MDS analysis. The proximity of Corsica, Majorca and east Sicily to continental samples indicated by the set of eight Alu markers (Fig. 3a) has also been suggested by data from mtDNA (Falchi et al. 2006; Picornell et al. 2005).

Genetic differentiation in west-Mediterranean islands is not evident from STR variation. Although all samples show notable levels of within-population diversity, neither significant population differences nor remarkable levels of genetic variance have been detected in any of the three analysed STRs. The discrepancies observed between the results indicated by Alu markers and by STR variation may derive from the different nature of these two types of polymorphism. Alu insertions are unique events, largely free from the random fluctuations caused by mutation, and probably reflect the ancestral origin of populations. A population split from this ancestral group, with enough time to accumulate STR variation due to the high microsatellite mutation rates, together with the homogenising effect of gene flow, could explain the observed discrepancies in genetic heterogeneity and FST values between these two types of genetic marker.

Gene flow among west-Mediterranean islands and beyond seems to have been substantial. STR variation at the three loci coincides in showing high heterozygosity values in all samples; in most cases, STR variation parameters are higher than those reported for mainland Europeans (Tishkoff et al. 1996, 1998; Esteban et al. 2004). Insularity has not acted as a strong barrier to gene flow, at least between west-Mediterranean islands and mainland southern Europe, in accordance with the shared historical background of these samples. However, historical records also point to North African influences. We have not detected any remarkable affinity between west-Mediterranean islands and North Africans, but this does not exclude some particular examples of African gene flow. Traces of African contributions to the gene pool of some islands can be deduced from the frequency of several CD4 Alu/STR haplotypes. The relatively high contribution of African-characteristic haplotypes in Sicily (8.16% in the east sample and 5.16% in the west sample) in comparison with the other islands (less than 2.5%) agrees with the strategic geographic position and the historical background of this island. Majorca, however, which was under Islamic rule for more than three centuries, does not exhibit any trace of African haplotypes. This agrees with other genetic data (Picornell et al. 2005), reinforcing the historical evidence that documents an important repopulation of the island by Spaniards after the Catalano–Aragonese conquest.

We conclude with some estimates of divergence times among samples, bearing in mind that these estimates represent maximum values: they are based on the assumption that the measured STR variation has developed locally, and gene-flow processes in the west Mediterranean could have biased the time calculations. Time estimates (Fig. 2b) split our samples into two groups separated by around 24,259 ± 6,211 YBP: central Sardinia, central Corsica, west-coast Corsica and west Sicily vs. west-coast Sardinia, east Sicily and Majorca. An average date of 5,973 ± 2,815 YBP separates west Sicily and west-coast Corsica from the remaining populations inside their respective groups. These dates are compatible with the population heterogeneity revealed by Alu data, suggesting that some differences among our samples could be traced back to the first settlement of the islands, likely reflecting genetic drift and/or genetic isolation processes. On the other hand, the high within-population diversities and the remarkable STR and Alu/STR homogeneity among islands suggest that, at least since Neolithic times, gene flow has been active in the west-Mediterranean basin. Genetic drift in west-coast Sardinia and gene flow in west Sicily have probably accentuated their general genetic differentiation.