Main

Human population genetics in the Western Mediterranean has a long tradition, ranging from classical studies1 to whole-genome genetic analyses.2 Many of these studies have addressed the question of whether a genetic barrier exists in the Strait of Gibraltar between Iberian and North African populations.

With only 14.3 km of sea separating Europe and Africa at its narrowest point, some geneticists see the Strait as a bridge for cultural and genetic diffusion, a ‘melting pot’,3 whereas others see it as a barrier to gene flow. According to the latter view, obstructed navigation between the two continents or linguistic and cultural differences may have restricted gene flow in the past.4

There are plenty of enzyme- and DNA-based studies arguing in favor of either the ‘melting pot’5, 6, 7, 8 or the ‘genetic barrier’ model.9, 10, 11, 12 These studies have mostly used genetic distances, principal component analysis (PCA) and/or spatial autocorrelation analysis.

Here we search for genetic patterns in the Western Mediterranean using spatial PCA (sPCA), a recently developed, spatially explicit multivariate method that reduces the multidimensionality of geo-referenced genetic data like ours to a few synthetic variables.13 Unlike ordinary PCA, which we also use here, sPCA maximizes not genetic variance alone, but the product of genetic variance and spatial autocorrelation (measured by Moran’s I).14 This way, global structures (for example, clines) are disentangled from local structures (that is, strong genetic differences between neighbors) and from random noise. Highly positive eigenvalues (that is, the product of a high genetic variance explained and a high positive autocorrelation) correspond to global patterns, whereas highly negative eigenvalues (that is, the product of a high genetic variance explained and a high negative autocorrelation) indicate local patterns.
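The quantity that sPCA maximizes can be illustrated with a minimal Python sketch. It is not the adegenet implementation: the function names, the row-standardised neighbour weights, the 1/n variance and the six-population chain are all assumptions made for illustration. It shows only that a cline yields a positive product of variance and Moran's I, whereas an alternating pattern among neighbors yields a negative one.

```python
def morans_i(scores, weights):
    """Moran's I of `scores`, using row-standardised neighbour weights.

    Assumes every population has at least one neighbour (no all-zero row).
    """
    n = len(scores)
    mean = sum(scores) / n
    z = [s - mean for s in scores]
    # Spatial lag: weighted mean of the neighbours' centred scores.
    lag = [
        sum(weights[i][j] * z[j] for j in range(n)) / sum(weights[i])
        for i in range(n)
    ]
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)


def spca_criterion(scores, weights):
    """Variance of the scores multiplied by their Moran's I."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    return variance * morans_i(scores, weights)


# Hypothetical layout: six populations in a row, each linked to its
# immediate neighbours only.
N_POPS = 6
chain = [
    [1.0 if abs(i - j) == 1 else 0.0 for j in range(N_POPS)]
    for i in range(N_POPS)
]

cline = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]        # global structure
alternating = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # local structure

print("cline:      ", morans_i(cline, chain), spca_criterion(cline, chain))
print("alternating:", morans_i(alternating, chain), spca_criterion(alternating, chain))
```

In this toy setting the sign of the product is driven entirely by Moran's I, since variance cannot be negative; the variance term only scales how extreme the value is.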

For this study, we considered three different classes of autosomal markers: polymorphic Alu insertions (PAIs), single-nucleotide polymorphisms (SNPs) and short tandem repeats (STRs; Supplementary Table 1). The PAI data as well as part of the SNP and STR data have been reported elsewhere.8, 12, 15 All genotypic data are available in the Supplementary Excel File.

The analysis was carried out on 15 geographically well-defined Western Mediterranean populations from continental Spain, South France, Morocco, Algeria and Tunisia (Figure 1; Supplementary Table 2). Participants were healthy, unrelated individuals of either sex who had their four grandparents born in the same region. Mean sample size was 96 for PAIs (N=1151), 41 for SNPs (N=531) and 42 for STRs (N=467). All participants gave their informed consent and the study was performed in compliance with the guidelines of the Ethical Committee of the University of Barcelona.

Figure 1

Approximate geographic location of the studied population samples. AS: Oviedo, Asturias, North Spain; PA: Valles Pasiegos, Cantabria, North Spain; BA: Guipuzcoa, Basque Country, North Spain; OL: Olot, Catalonia, Northeast Spain; GD: Sierra de Gredos, Avila, Central Spain; AL: La Alpujarra, Andalusia, South Spain; TO: Toulouse, South France; DO: Doukkala, Morocco; OU: Ouarzazate, Morocco; AM: Amizmiz, High Atlas, Morocco; AN: Asni, High Atlas, Morocco; KH: Khenifra, High Atlas, Morocco; BO: Bouhria, Northeast Atlas, Morocco; MZ: M’zab, Algeria; TN: Monastir, Tunisia (see also Supplementary Table 2).

The analyses were performed on four different data sets: (i) 16 PAIs in 12 populations; (ii) 35 SNPs in 13 populations; (iii) 13 STRs in 11 populations; and (iv) all 64 markers in 9 populations (Supplementary Table 2). We carried out preliminary Mantel tests16 for correlation between genetic and geographic distances (10 000 permutations) using the ade4 v1.4-17 statistical package17 (http://pbil.univ-lyon1.fr/ADE-4/) in R v2.15.1 (http://www.r-project.org/). All correlations were modest but significant (r ranging from 0.317 to 0.564; P<0.05), indicating the presence of spatial structure in the samples. We further employed two dimensionality reduction methods on the data. First, we carried out classical PCA using ade4 to gain insight into the genetic relationships among our samples. Then, we carried out sPCA with the adegenet v1.3-5 statistical package18 (http://adegenet.r-forge.r-project.org/) in R to specifically investigate the spatial pattern of genetic variation in the studied populations.

Figure 2 shows the PCA plots of the first vs second PC for the three marker sets together and separately. Linkage disequilibrium among some SNPs and STRs did not affect the results, as confirmed by repeating the analysis after omitting the correlated markers (data not shown). In all plots, two clusters were visually distinguishable along the first PC, corresponding to Southwestern Europe and North Africa. Several smaller clusters were also identifiable, especially on the African side of our geographical setting. In all marker systems, the first PC explained a considerably higher percentage of genetic variance than any of the remaining PCs. This structure may date back to prehistoric times, likely to the Upper Palaeolithic, as suggested by previous age estimates for two bi-locus haplotypes (each comprising one PAI and one STR) from the same samples,8 and in agreement with the generally accepted migration pattern in the Mediterranean.19
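How a dominant first PC arises from a two-cluster data set can be shown with a small Python sketch. It is an assumption-laden illustration rather than the ade4 analysis: it uses an unscaled covariance PCA, extracts only the leading eigenvalue by power iteration, and runs on invented allele frequencies for four hypothetical populations.

```python
from math import sqrt


def first_pc_share(freqs, n_iter=500):
    """Share of total variance carried by the first PC (covariance PCA).

    `freqs` has one row per population and one column per marker.
    """
    n, p = len(freqs), len(freqs[0])
    means = [sum(row[k] for row in freqs) / n for k in range(p)]
    centred = [[row[k] - means[k] for k in range(p)] for row in freqs]
    cov = [
        [sum(r[a] * r[b] for r in centred) / n for b in range(p)]
        for a in range(p)
    ]
    # Power iteration; the uneven start vector is an arbitrary choice meant
    # to avoid starting orthogonal to the leading eigenvector.
    v = [1.0 + 0.1 * k for k in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    leading = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total = sum(cov[a][a] for a in range(p))
    return leading / total


# Invented allele frequencies: two populations per cluster, three markers.
toy_freqs = [
    [0.10, 0.20, 0.30],
    [0.12, 0.18, 0.31],
    [0.60, 0.70, 0.80],
    [0.62, 0.69, 0.79],
]

print(f"first PC share: {first_pc_share(toy_freqs):.3f}")
```

When two groups differ consistently across markers while differences within each group are small, almost all the variance lies along the single axis separating the groups, which is the pattern described for the first PC above.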

Figure 2

PCA plots of the studied Western Mediterranean populations based on PAIs, SNPs and STRs together and separately. Circles correspond to South European and triangles to North African populations. Figures in brackets correspond to the percentage of variance explained by each PC. Note that the plots are not on the same scale.

Despite the apparent South vs North genetic differentiation in the Western Mediterranean, PCA alone does not provide enough evidence for the existence of local structures in our sample. Such questions are better answered by methods that are based on geo-referenced data. In this light, Figure 3 shows the eigenvalues of the sPCA for the three marker sets together and separately. Because real structures tend to produce extreme positive or negative eigenvalues, sPCA indicated a single global structure for each of the marker sets and no local structures (for example, a genetic barrier between neighboring North African and South European populations). These results are more compatible with a clinal distribution of allele frequencies than with abrupt changes, suggesting that isolation by distance is a more likely mechanism of genetic differentiation in the Western Mediterranean. An alternative or complementary explanation is a progressive introgression from North African to Southwestern European populations, possibly reflecting the Muslim conquest of Hispania in the 8th century AD.

Figure 3

Bar plots of the eigenvalues of the sPCA (ordered from highest to lowest) based on PAIs, SNPs and STRs together and separately. Positive eigenvalues (on the left) correspond to global structures, whereas negative eigenvalues (on the right) indicate local patterns. Actual structures should result in more extreme (positive or negative) eigenvalues. Moran’s I for PAIs: (0.617, 0.227, 0.144, 0.017, −0.082, −0.187, −0.131, −0.262, −0.400, −0.329, −0.372); SNPs: (0.569, 0.337, 0.174, 0.048, −0.038, −0.167, −0.108, −0.315, −0.275, −0.379, −0.269, −0.378); STRs: (0.427, 0.356, 0.213, −0.106, −0.106, −0.286, −0.350, −0.366, −0.272, −0.366); all markers: (0.486, 0.166, 0.073, −0.201, −0.270, −0.298, −0.342, −0.495).

Our observations contradict previous results that argued in favor of a major genetic barrier between the Iberian Peninsula and the Western Maghreb. Those studies either traced patterns of maximal genetic differentiation (typically pairwise FST values) on a specific spatial distribution provided by the geographic coordinates of the studied populations10 or tested for a single specific case of local structure: the one corresponding to a genetic barrier imposed by the Mediterranean Sea.12 Compared with those methods, sPCA is a more sensitive approach because (i) it takes into consideration an unbiased estimate of geographical structure (Moran’s I) and (ii) it is hypothesis free.

It is worth noting that all three data sets provided very similar results, despite the differences in the underlying mutation models. In accord with our observation, a previous study using PAIs, SNPs and STRs found high and statistically significant correlations between pairwise population genetic distances for the three marker types and for 27 worldwide populations from Africa, Asia and Europe.20

In conclusion, our analysis suggests that population structure in the Western Mediterranean is most probably the result of an isolation-by-distance mechanism between South European and North African populations. No strong local structures seem to affect the genetic landscape in the studied region. Future applications of sPCA should include larger data sets, as well as mtDNA and Y-chromosome data, to provide information on sex-biased dispersal in the Western Mediterranean.