Introduction

It has often been suggested that genetically isolated populations would offer increased statistical power to detect association because of the impact on their genomic structure of their particular demography.1 Linkage disequilibrium (LD) would be higher than in other populations because of the reduced effective population size, which limits the number of haplotypes that can be present in the population. Moreover, such populations are expected to be genetically more homogeneous, which may translate into being less diverse in the genetic architecture of susceptibility to a particular complex disease.

The Basques are an apparent candidate for a genetically isolated population suitable for association studies. They live in a small area straddling the westernmost section of the French–Spanish border, and speak an isolated, non-Indo-European language. Basques have been described as the most differentiated population in continental western Europe based on classical polymorphisms (ie, blood groups, enzyme polymorphism, and HLA).2 Their mtDNA sequences and Y-chromosome polymorphisms do not show such intense differences, and locate them at the end of European-wide gradients in the frequencies of the most common European lineages, namely mtDNA haplogroup H (refs 3, 4) and Y-chromosome haplogroup R1b-M269 (refs 5, 6, 7). As for other DNA autosomal markers, a survey of 11 Alu insertion polymorphisms showed that their allele frequencies were not significantly different between Basques and other Iberians.8 A genome scan of 650 000 SNPs seemed to slightly differentiate French Basques from other western European populations (see Figure 2a in Li et al9); however, given the samples analyzed, such a data set cannot be used to place Basques directly in an Iberian context. In summary, the previous assumption that Basques are strong genetic outliers has been challenged by a number of studies.

In the present study, we seek to verify whether Basques indeed show the genetic features that would make them more appropriate than other populations for genetic association studies. With this aim, we genotyped 123 SNPs in a 1-Mb region of chromosome 22 in two Basque populations, as well as in three other nonisolated Spanish populations, as controls. In addition, this data set can be used to explore the genetic heterogeneity of Spanish populations; because a major source of heterogeneity might be the different contribution of the North African populations7 (a varying proportion of the Iberian Peninsula was ruled by North African Muslims from 711 to 1492 AD), we also genotyped three North African populations.

Material and methods

Samples

We genotyped a total of 541 individuals from eight populations: Basques from Iparralde (the part of the Basque country lying in France) (N=85), Basques from Gipuzkoa (a province of Spain) (N=61), Catalans (in northeastern Iberia) (N=75), individuals from Extremadura in southwestern Spain (N=36), Andalusians in southern Spain (N=60), north Moroccans (N=84) and south Moroccans (N=84), both mostly Berber-speakers, and Saharauis (N=56), that is, inhabitants of the western Sahara. All individuals had all four grandparents born in the same region. Informed consent was obtained from all donors. Some of the samples were collected within the LD Europe European project, led by H Cann, CEPH, Paris.

SNP selection and genotyping

SNPs were selected and genotyped as in González-Neira et al.10 Briefly, SNPs were selected from dbSNP build 115 in a region spanning 987 872 bp in chromosome 22, from 32 600 114 to 33 587 986 bp (NCBI Build 34). The 1-Mb region begins at the 3′-end of the glycosyltransferase-like protein LARGE. Three transcripts and nine Gnomon-model predicted genes (www.ncbi.nlm.nih.gov/projects/mapview) map in this region. After we had selected and genotyped this region, newer genome releases showed that it is actually closer to the LARGE gene: in build 36.3, it overlaps by 50 kb with the last intron and exon of the 664-kb LARGE gene. We investigated whether selection may have acted to distort interpopulation differentiation patterns in this region within the geographic area analyzed. LD patterns, FST values in Europe, and the expected neutral FST distribution (see details of this analysis in Supplementary materials) did not reveal any trace of selection driven by LARGE or by any unknown functional element in the region.

The final data set consisted of 123 SNPs, with an average distance between contiguous SNPs of 8097 bp; 95% of the distances fell between 931 and 21 781 bp.
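As a quick consistency check on the figures quoted above, the following minimal sketch recomputes the span of the region and the mean spacing between contiguous SNPs. It assumes, for illustration only, that the first and last SNPs sit at the region boundaries; the function name `mean_spacing` is hypothetical.

```python
# Coordinates and marker count as given in the text (NCBI Build 34).
REGION_START = 32_600_114
REGION_END = 33_587_986
N_SNPS = 123


def mean_spacing(start: int, end: int, n_snps: int) -> float:
    """Mean distance between contiguous markers.

    Assumption of this sketch: the outermost SNPs lie at the region
    boundaries, so n_snps markers define n_snps - 1 intervals.
    """
    return (end - start) / (n_snps - 1)


span = REGION_END - REGION_START
spacing = mean_spacing(REGION_START, REGION_END, N_SNPS)
print(span)            # 987872
print(round(spacing))  # 8097
```

Under that assumption, both values agree with the span and the average spacing reported in the text.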

Statistical analysis

Genotypic data were managed with the SNPator platform (www.snpator.org).11 Average heterozygosities and the average number of alleles were calculated with the Arlequin package version 2.00 (http://anthropologie.unige.ch/arlequin/).12 Arlequin was also used to compute FST distances among populations and to perform AMOVA. The FST distance matrix was visualized by means of multidimensional scaling (MDS) computed with Statistica version 6.0. Negative FST values were set to zero.
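For readers unfamiliar with these statistics, the sketch below illustrates expected heterozygosity and a simple FST for a single biallelic SNP. It is not the Arlequin implementation: it uses the textbook Wright-style ratio (HT − HS)/HT with equal population weights, and the function names are hypothetical. The clamp to zero mirrors the convention stated above; with this simple estimator it only absorbs rounding error, whereas sample-size-corrected estimators can be genuinely negative.

```python
from typing import Sequence


def expected_heterozygosity(allele_count: int, n_chromosomes: int) -> float:
    """Unbiased expected heterozygosity for a biallelic SNP.

    allele_count: copies of one allele among n_chromosomes sampled chromosomes.
    """
    p = allele_count / n_chromosomes
    return n_chromosomes / (n_chromosomes - 1) * 2 * p * (1 - p)


def simple_fst(freqs: Sequence[float]) -> float:
    """Wright-style FST = (HT - HS) / HT for one biallelic SNP.

    freqs: frequency of the same allele in each population (equal weights,
    an assumption of this sketch). Values below zero are set to zero.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)          # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0                          # monomorphic everywhere: no differentiation
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return max(0.0, (h_t - h_s) / h_t)


print(simple_fst([0.2, 0.8]))  # strongly differentiated pair of populations
print(simple_fst([0.3, 0.3]))  # identical frequencies
```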

LD was described by computing r², and by counting the proportion of SNP pairs that had r² values ≥0.8 for different bins of physical distance. LD parameters were computed with Haploview, which was also used to find pairwise tagSNPs within each population by means of the tagger algorithm.13 Haploview infers haplotypes by means of an expectation–maximization algorithm.
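The r² statistic can be illustrated with a minimal sketch. Unlike Haploview, which estimates haplotype frequencies from unphased genotypes by expectation–maximization, this example assumes that phased two-SNP haplotypes are already available; the function name and the 0/1 allele coding are hypothetical.

```python
from typing import List, Tuple


def r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """Squared correlation between alleles at two biallelic SNPs.

    haplotypes: one (a, b) tuple per chromosome, alleles coded 0/1.
    Assumes phase is known (an assumption of this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                       # freq of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n                       # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # freq of haplotype 1-1
    d = p_ab - p_a * p_b                                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom


print(r_squared([(0, 0), (1, 1)] * 5))                # complete LD
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))    # linkage equilibrium
```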

Results

The median and interquartile range for heterozygosity are plotted for each population in Figure 1. Average heterozygosity was actually larger in the two Basque samples than in other Iberian populations; this difference reached statistical significance with Extremadura (P<10⁻⁵ both with French and Spanish Basques, Wilcoxon's nonparametric test), probably owing to the smaller sample size in Extremadura and the larger number of monomorphic SNPs in this sample.

Figure 1

Median and interquartile range for heterozygosity of 123 SNPs in several Iberian and Northwestern African population samples.

FST genetic distances were computed either with the whole 123-SNP data set or with the set of the 56 SNPs that were tagSNPs in each of the eight populations, and were plotted with MDS. Both data sets produced very similar results, and only the latter is presented in Figure 2. Stress was 0.0005 (P<0.01, Sturrock and Rocha14). The MDS plot shows that genetic variation can be described with a single component, namely, Iberian vs North African populations (note the huge difference in scale between components 1 and 2), and Basques are not differentiated from other Iberian populations.

Figure 2

Multidimensional scaling plot based on FST genetic distances computed from the allele frequencies of 56 tagSNPs. BAF, French Basques; BAS, Spanish Basques; CAT, Catalans; EXT, Extremadura; AND, Andalusians; NMO, North Moroccans; SMO, South Moroccans; SAH, Saharans.

Next, we tested for the significance of the genetic differences apparent in the FST matrix by means of AMOVA. As above, we used the whole data set or the set of tagSNPs, with very similar results. FST among all populations was 1.03% (P<0.001), and FCT between Iberians and North Africans was 2.05% (P=0.02). FCT was much lower between Basques and Iberian non-Basques (−0.02%, which has to be taken as 0). That is, allele frequencies for 123 SNPs show no statistically significant difference between Basques and non-Basques.

Finally, we analyzed LD. Average r² between adjacent SNPs was lower in French Basques than in any other Iberian sample (P<0.014, Wilcoxon's test), whereas it was similar (and not significantly different) in Spanish Basques. In Figure 3, the proportion of SNP pairs in each physical distance bin with r²>0.8 is plotted, for SNPs with minor allele frequencies (MAF) above 0.05. We tested the hypothesis that the Basques may have a slower LD decay. It is apparent from Figure 3 that this is not the case; actually, French Basques show significantly less LD than Catalans at the 0–15 kb bin (P=0.024 for the whole SNP data set and P=0.012 for MAF>0.05), and Spanish Basques, like other Iberian populations, show less LD than Southern Moroccans at the 0–15 kb and 30–45 kb bins (P=0.038 and P=0.006 for the whole set; P=0.042 and P=0.004, respectively, for the MAF>0.05 subset). A similar pattern was found when analyzing average r² within each distance bin (data not shown).

Figure 3

Proportion of SNP pairs with r²>0.8 as a function of physical distance. Only SNPs with a minor allele frequency above 0.05 were used. Abbreviations as in Figure 2.
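The quantity plotted in Figure 3 can be sketched as follows. The 15-kb bin width is taken from the bins quoted in the text (0–15 kb, 30–45 kb); the function name, the input format, and the toy values are hypothetical and are not data from this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def high_ld_proportion_by_bin(
    pairs: Iterable[Tuple[int, float]],
    bin_width: int = 15_000,
    threshold: float = 0.8,
) -> Dict[int, float]:
    """Proportion of SNP pairs with r2 above `threshold` per distance bin.

    pairs: (distance_bp, r2) for each SNP pair.
    Returns a dict keyed by the lower bound (in bp) of each non-empty bin.
    """
    totals: Dict[int, int] = defaultdict(int)
    high: Dict[int, int] = defaultdict(int)
    for distance_bp, r2 in pairs:
        bin_start = (distance_bp // bin_width) * bin_width
        totals[bin_start] += 1
        if r2 > threshold:
            high[bin_start] += 1
    return {b: high[b] / totals[b] for b in sorted(totals)}


# Toy input, made up for illustration only.
toy_pairs = [(1_000, 0.9), (14_000, 0.5), (16_000, 0.85), (31_000, 0.1)]
print(high_ld_proportion_by_bin(toy_pairs))  # {0: 0.5, 15000: 1.0, 30000: 0.0}
```

Comparing these per-bin proportions across populations is one way to visualize how quickly LD decays with physical distance.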

Discussion

We have found that Basques have similar or larger heterozygosity, and similar or lower LD, than other Iberians, from whom they are not differentiated in their allele frequencies. Thus, we must conclude that, at least as gathered from this genomic region, Basques do not exhibit the genomic properties that would make them a particularly attractive population for association studies.

Our finding that Basques are not genetically differentiated contrasts sharply with previous reports that showed them to be outliers in the genetic variation of western Europe, and appears more in line with the observations from mtDNA sequences and Y-chromosome polymorphisms3, 4, 5, 6, 7 and from whole-genome scans of SNP variation9 (in another recent whole-genome scan,15 French Basques were the only western European population analyzed, and thus, their status as outlier in Europe could not be properly assessed). The loci with the largest allele frequency differences in previous studies2 were ABO, RH, and other blood groups, that is, glycoproteins of the red cell membrane, as well as the HLA histocompatibility system. These genes share a common trait: they are involved in pathogen–human interaction. It is tempting to speculate, then, that such differences were not exclusively due to founder effects and subsequent reduced gene flow, but that microgeographical natural selection, possibly linked to pathogens, also played a role. A whole-genome scan of 3000 British subjects16 found that, within Britain, HLA was also one of the genomic regions showing the strongest differentiation, and natural selection was also invoked in that case for this geographical differentiation.

These results have strong implications in two partly overlapping fields. On the one hand, they join a number of studies3, 4, 5, 6, 7, 8, 9 in contradicting the previous extensive literature2, 17, 18, 19, 20, 21 on the genetic distinctiveness of the Basques as a result of population processes in strong interaction with cultural isolation. On the other hand, they bear on the putative use of the Basques (and of other European isolates yet to be characterized) as a population of choice for understanding the genetic basis of complex traits and diseases.