Introduction

Recent reports have shown that in isolated and conservative populations (isolates) the level of linkage disequilibrium (LD), the nonrandom association of alleles at closely linked loci, is particularly high.1, 2, 3, 4, 5, 6, 7

The design and feasibility of genome-wide association studies are critically dependent on the extent of LD. The greater the extent of LD, the fewer polymorphic markers have to be examined in order to find an association with a particular disease. Additionally, rare mutations that are diluted in ‘open’ populations can reach high frequencies in these isolates owing to endogamy and founder effect.

This study was designed to examine the extent of LD in a region of the X chromosome (Xq13), as a measure of the background LD, in some populations of the island of Corsica. Even though the extent of LD in the Xq13 region is not representative of the average level of LD across the entire genome,8 the analysis of this region has been widely used as a measure of general LD in a given population and to compare the levels of LD between populations.1, 2, 3, 4, 5, 6, 7, 9

Interest in the isolates of the island of Corsica stems from its closeness to the island of Sardinia, which is the object of several studies, carried out by different groups, aimed at mapping disease-related genes. Recent data have shown a genetic closeness between the two islands. The genetic proximity of the populations of Sardinia and Corsica has been inferred from the gene frequencies of serum proteins and isozymes, as well as from mitochondrial control region sequence variation. All the data show that Sardinians and Corsicans are genetically much closer to each other than to any other Mediterranean population.10, 11, 12, 13, 14, 15

Corsica is the third largest island of the Western Mediterranean sea. It is very mountainous, with peaks reaching 2710 m in altitude. Until the Pleistocene, it was physically linked to Sardinia in a single geological block. About 11 000 years ago, the two islands split and were separated by the strait called ‘Bocche di Bonifacio’.

According to mitochondrial DNA sequence variations,15 the Sardinia–Corsica block was peopled in a period between 14 000 and 78 000 years ago (Paleolithic period), via the Tuscan islands during the last glaciation (Würm), when the sea level was lower. After the physical separation (during the Pleistocene period), the populations of the two islands diverged, even though a reduced but constant gene flow remains between northern Sardinia and southern Corsica. For both islands, genetic drift, isolation and low population numbers have played a strong part in their genetic shaping. Sardinia and Corsica were invaded several times, often by the same populations. In the great majority of cases, these invasions were limited to the coast and left only slight marks on the gene pool of the native populations. Strong evidence also suggests an internal microgeographic diversity inside Sardinia and Corsica, with the most conserved populations located in the mountainous regions at the center of the two islands.4, 16, 17, 18, 19, 20 The conserved inland populations of Sardinia and Corsica are also genetically closer to each other across the two islands.19, 20

Materials and method

The three populations under investigation in the present study belong to an area of northern central Corsica of 14 000 inhabitants (Figure 1). Corte is a small city with about 6000 inhabitants. Bozio and Niolo are geographic areas situated East and West of Corte, respectively. Each consists of several small villages. The total populations of Niolo and Bozio are 2400 and 1400 inhabitants, respectively. DNA samples of unrelated male individuals were collected in the regions of Niolo (n=49), Bozio (n=51) and Corte (n=50). Samples were analyzed using seven dinucleotide microsatellite markers on chromosome Xq13.3: DXS983, DXS986, DXS8092, DXS8082, DXS1225, DXS8037 and DXS995. Microsatellites were analyzed using an ABI Prism 377 DNA analyzer.1 Genotypes were processed with Genescan v3.1 and Genotyper v2.5 software.

Figure 1. Corsica isolates and West Mediterranean Basin.

The nonrandom allelic association between pairs of microsatellite loci has been tested by an extension of Fisher's exact test on contingency tables.21 P-values have been corrected by the step-down Holm–Sidak procedure22 with the formula $P_{\text{corrected}} = 1 - (1 - P)^{n}$, where n is the number of P-values smaller than or equal to that being corrected. With this procedure, correction results in a probability distributed between 0 and 1, unlike the Bonferroni correction, commonly used for multiple testing, in which the corrected P-value can exceed 1. In the present paper, we have re-applied the step-down Holm–Sidak procedure22 to the published P-values of Gavoi,2 Talana4 and Urzulei.5 We were prompted to do so by the paper of Katoh et al,7 2002, where the procedure was re-applied to the data of Zavattari et al,2 suggesting that the Holm–Sidak method may have been applied inaccurately. Indeed, the number of P-values remaining lower than 0.05 (bold in Table 1) is smaller than that reported.2, 4, 5
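The correction formula can be illustrated with a minimal Python sketch. It assumes the textbook step-down form of the Holm–Sidak procedure, in which the exponent for the i-th smallest of m P-values is m − i + 1 and adjusted values are kept monotone; whether this counting rule matches exactly the one used for Table 1 is an assumption. The function name `holm_sidak` and the input values are hypothetical and are not taken from the data.

```python
def holm_sidak(p_values):
    """Step-down Holm-Sidak adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Indices of the raw P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Assumed exponent: number of hypotheses still in play at this step.
        n = m - rank
        corrected = 1.0 - (1.0 - p_values[idx]) ** n
        # Keep adjusted values monotone: a larger raw P never gets a
        # smaller corrected P than a smaller raw P.
        running_max = max(running_max, corrected)
        adjusted[idx] = running_max
    return adjusted


# Made-up P-values for illustration only (not from Table 1).
raw = [0.001, 0.01, 0.04, 0.20]
for p, q in zip(raw, holm_sidak(raw)):
    print(f"raw P = {p:.3f}   corrected P = {q:.4f}")
```

Because each corrected value has the form 1 − (1 − P)^n, it always stays between 0 and 1, which is the property contrasted with the Bonferroni correction in the text.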

Table 1 Pairwise LD between markers for Bozio, Niolo and Corte

Each P-value has been considered significant when <0.05 and suggestive when 0.05<P<0.10. LD has been measured by the normalized disequilibrium coefficient D′23 for each pair of marker loci. As a small sample size can lead to overestimation of D′, we also report D′ estimates corrected by a bootstrap procedure.24 Disequilibrium across each locus was plotted using GOLD software25 (http://www.sph.umich.edu/csg/abecasis/GOLD/). In order to test the null hypothesis that the allelic distribution is identical across the populations under investigation, we have carried out a genetic differentiation test using the Genepop 3.3 software (ftp://ftp.cefe.cnrs-mop.fr/pub/pc/msdos/genepop).26 For each locus, the test is carried out on a contingency table. The method, based on the distribution of alleles in the various samples, is described by Raymond and Rousset.27 In the present work, the test is performed automatically for all pairs of populations for all loci.
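As an illustration of the two D′ quantities mentioned above, the following Python sketch computes a multiallelic D′ from directly observed haplotypes (as is the case for X-linked markers typed in males) and applies a simple bootstrap bias correction. Two assumptions are made that the text does not spell out: D′ is taken as the frequency-weighted sum of |D_ij/D_max| over allele pairs, and the bootstrap correction is the standard one (estimate minus estimated bias, clamped to the interval 0 to 1). Function names and the example haplotypes are hypothetical.

```python
import random
from collections import Counter


def multiallelic_d_prime(haplotypes):
    """Weighted multiallelic D' from a list of (allele_A, allele_B) tuples."""
    n = len(haplotypes)
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}
    total = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            d = hap_freq.get((a, b), 0.0) - pa * qb
            # The maximum attainable |D| depends on the sign of D.
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:  # skipped when a locus is monomorphic
                total += pa * qb * abs(d) / d_max
    return total


def bootstrap_corrected_d_prime(haplotypes, n_boot=1000, seed=0):
    """Bias-corrected D': estimate minus (bootstrap mean minus estimate)."""
    rng = random.Random(seed)
    estimate = multiallelic_d_prime(haplotypes)
    n = len(haplotypes)
    replicates = [
        multiallelic_d_prime([rng.choice(haplotypes) for _ in range(n)])
        for _ in range(n_boot)
    ]
    bias = sum(replicates) / n_boot - estimate
    # Clamp, since subtracting the bias can leave the [0, 1] range.
    return min(1.0, max(0.0, estimate - bias))


# Made-up haplotypes (allele sizes at two loci), for illustration only.
sample = [(180, 210)] * 6 + [(182, 212)] * 5 + [(180, 212)] * 2 + [(184, 210)] * 3
print("D' estimate:        ", round(multiallelic_d_prime(sample), 3))
print("bootstrap-corrected:", round(bootstrap_corrected_d_prime(sample), 3))
```

With small samples, rare haplotypes are often missing by chance, which inflates D′; the bootstrap replicates estimate that inflation so that it can be subtracted.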

Results

Table 1 reports the significance of nonrandom allelic association between pairs of microsatellite loci by the pairwise LD based on Fisher's exact test. The data in bold are those remaining lower than 0.05 after correction using the Holm–Sidak method.22 The data are compared to those of some Sardinian genetic isolates reported in literature.2, 4, 5 The population of Bozio shows 10 out of 21 pairs with significant LD (P<0.05) and two suggestive (0.05<P<0.10). The largest distance between markers with a significant P is 9.5 megabases (Mb) (DXS983–DXS1225). After correction, using the Holm–Sidak method, Bozio shows five pairs with significant LD, with the largest distance between markers with a significant P at 2.5 Mb (DXS8082–DXS986). The population of Niolo shows six pairs with significant LD, and two pairs after correction. The largest distance between markers with a significant P is 3.5 Mb (DXS8082–DXS995).

The population of Corte shows two pairs with significant LD and three suggestive. The largest distance between markers with a significant P is 2 Mb (DXS1225–DXS986). After correction, only one pair remains significant at a distance of 0.5 Mb (DXS8082–DXS1225).

Data for Bozio and Niolo are similar to those reported by others for some Sardinian genetic isolates such as Talana and Urzulei (Table 1).4, 5 The other Sardinian isolate examined to date, Gavoi, shows instead a greater number of significant pairwise comparisons (Table 1).2

Figure 2 shows the extent of LD graphically, between pairs of microsatellite markers inside the genomic region under investigation, in the populations of Niolo, Corte and Bozio. Red indicates D′=1; dark blue D′=0. As shown in Figure 2, the population of Niolo shows the highest D′ value (0.81; Table 2); however, the D′ values decrease with distance. The population of Bozio shows a maximum D′ value (0.78; Table 2) that is slightly lower than that of Niolo. Nevertheless, the D′ values decline at a much lower rate with distance and remain at intermediate values, higher than 0.5, for a large part of the Xq13 region. In contrast, the maximum D′ value for the population of Corte (0.65; Table 2) is lower than those of Bozio and Niolo, and D′ decreases to below 0.5 over most of the interval.

Figure 2. Distribution of LD for all the studied markers. Distances (Mb) are those reported in Table 1. The figure was drawn by the GOLD program (v. 1.1): colors reflect D′ values from red (D′=1) to deep blue (D′=0).

Table 2 Original D′ values and D′ derived by use of a bootstrap correction for all the isolates studied

In order to visualize and compare the strength of LD over distance among Sardinian and Corsican isolates, Figure 3 shows the LD trend along 10 Mb of the X chromosome region under investigation for the three populations examined and for the Sardinian isolates of Talana (Figure 3a) and Gavoi (Figure 3b). A D′ value of 1 indicates complete LD; 0 indicates no LD. The degree of LD needed for effective mapping depends on several factors; nevertheless, D′ values higher than 0.5 are considered useful.28

Figure 3. Comparison of the LD extension evaluated as average multiallelic D′ values versus stratified physical distances for the chromosome Xq13 region in Corsica and Sardinia isolates. Each point represents the average of pairwise comparisons of six loci. (a) Average D′ values have been calculated without marker DXS995 to allow comparison with the Sardinian isolate of Talana as reported by Angius et al.4 (b) Average D′ values were calculated including marker DXS995 to allow comparison with the Sardinian isolate of Gavoi as reported by Zavattari et al.2

In Figure 3a, the average D′ values over distance have been calculated excluding the pairwise D′ values involving marker DXS995, which was not analyzed by Angius et al.4

In Figure 3b, the average D′ values over distance have been calculated including marker DXS995 to allow comparison with the genetic isolate of Gavoi as reported by Zavattari et al.2 The average D′ values for the populations of Niolo, Bozio and Corte are slightly lower than those of Figure 3a, obtained excluding marker DXS995.

We could not compare the Sardinian isolate of Urzulei to our samples since D′ data are not published.5

In summary, Figure 3a and b show that average D′ values along 10 Mb for the central isolated populations of Corsica and those of central Sardinia, reported by others, are similar, with the exception of Corte (see below). In particular, the populations of Bozio and Gavoi show average D′ values close to or higher than 0.5 over very long distances (Figure 3b), while those of Niolo and Talana show high values of average D′ over relatively short distances (1 Mb), with a decrease over distance (Figure 3a).
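The averaging behind Figure 3 can be sketched as follows: pairwise D′ values are grouped into physical-distance strata and averaged within each stratum, optionally leaving out every pair that involves a given marker (as done for DXS995 in Figure 3a). This is a minimal Python sketch; the 1 Mb bin width, the function name and the marker labels and values in the example are assumptions made for illustration and are not taken from Table 2.

```python
from collections import defaultdict


def mean_d_prime_by_distance(pairs, bin_width_mb=1.0, exclude=()):
    """Average D' per distance stratum.

    pairs: iterable of (marker_a, marker_b, distance_mb, d_prime).
    exclude: markers whose pairs are left out of the averages.
    Returns {(bin_start_mb, bin_end_mb): mean D'} for non-empty bins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for marker_a, marker_b, distance_mb, d_prime in pairs:
        if marker_a in exclude or marker_b in exclude:
            continue
        # Stratum index: 0 for [0, width), 1 for [width, 2*width), ...
        k = int(distance_mb // bin_width_mb)
        sums[k] += d_prime
        counts[k] += 1
    return {
        (k * bin_width_mb, (k + 1) * bin_width_mb): sums[k] / counts[k]
        for k in sorted(sums)
    }


# Hypothetical markers M1..M4 with invented distances and D' values.
example = [
    ("M1", "M2", 0.5, 0.8),
    ("M2", "M3", 0.7, 0.6),
    ("M1", "M3", 1.2, 0.4),
    ("M1", "M4", 3.0, 0.2),
]
print(mean_d_prime_by_distance(example))
print(mean_d_prime_by_distance(example, exclude=("M4",)))
```

Excluding a marker removes whole pairs rather than individual values, so the strata at longer distances can lose most of their points, which is one reason the two panels of the figure are not directly interchangeable.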

It is noteworthy that the population of Gavoi shows a greater number of significant P-values and lower average D′ values than other populations such as Bozio. This difference is most likely due to the larger sample size used in the work of Zavattari et al.2 Indeed, both the power of Fisher's test and D′ values are affected by sample size.29

In order to evaluate the bias of D′ estimates due to small sample size, we corrected the estimated D′ value by use of the bootstrap procedure. As shown in Table 2, there is a general decrease in D′ values after correction. Nevertheless, most of the D′ values greater than 0.5 are still higher than 0.5 after correction. Only the D′ values that were very close to 0.5 before correction go below this value afterwards. These data confirm the high degree of LD in the populations of Niolo and Bozio.

Finally, we also carried out a genetic differentiation test with Genepop 3.3, based on the distribution of alleles in the three samples. Results (Table 3) show no differentiation between the populations of Niolo and Bozio (P=0.2), whereas both populations show significant P-values against Corte (P=$3 \times 10^{-4}$ and $2 \times 10^{-5}$ for Niolo and Bozio, respectively).
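The logic of a single-locus differentiation test between two populations can be sketched in Python as below. This is not Genepop's algorithm: as a stand-in, the sketch approximates an exact test on the population-by-allele contingency table by permuting population labels and using the log-likelihood ratio (G) as the test statistic. Function names, the number of permutations and the allele data are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(sample_a, sample_b):
    """Log-likelihood ratio (G) for the 2 x k table of allele counts."""
    n_a, n_b = len(sample_a), len(sample_b)
    total = n_a + n_b
    counts_a, counts_b = Counter(sample_a), Counter(sample_b)
    g = 0.0
    for allele in set(counts_a) | set(counts_b):
        allele_total = counts_a[allele] + counts_b[allele]
        for observed, n_pop in ((counts_a[allele], n_a), (counts_b[allele], n_b)):
            if observed > 0:  # empty cells contribute nothing to G
                expected = allele_total * n_pop / total
                g += 2.0 * observed * math.log(observed / expected)
    return g


def differentiation_p_value(sample_a, sample_b, n_perm=2000, seed=0):
    """Permutation P-value for H0: identical allelic distributions."""
    rng = random.Random(seed)
    observed = g_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign alleles to populations at random
        if g_statistic(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    # Add-one estimate so that the P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented allele sizes at one locus in two samples, for illustration only.
pop_1 = [180] * 12 + [182] * 6 + [184] * 2
pop_2 = [180] * 4 + [182] * 7 + [184] * 9
print("P =", differentiation_p_value(pop_1, pop_2))
```

A multi-locus result such as the one in Table 3 requires combining the single-locus tests; that step is not shown here.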

Table 3 Genetic differentiation for each population pair

Discussion

Our results show a high degree of LD for the populations of Bozio and Niolo, and a lower degree for Corte. The high degree of LD has most likely been created by genetic drift and maintained by isolation and slow population growth. The lower degree of LD in Corte is most likely explained by the fact that Corte, being the historical capital of Corsica, has been relatively more open to gene flow than the Niolo and Bozio regions. Very little is known about the biodemographic history of the populations under investigation, since data based on ecclesiastical records are not available.

The Niolo region is located in a mountainous area in the central northwest of Corsica and is composed of several villages; its population size has been relatively constant, with a very slow growth rate in the last decades. Its economy is mainly based on the presence of snow resorts. The Bozio region is located in a mountainous area in the central northeast of Corsica and is composed of several villages, very close to each other, which are being depopulated by migration toward the coast and outside the island owing to the poor economy of the area. The central area of the island, on the whole, has been the stronghold of the Corsican indigenous population against the different invaders.

The genetic differentiation test shows no differentiation between the populations of Niolo and Bozio, which are both different from Corte (Table 3). We believe that there has not been significant gene flow among the populations of Bozio, Niolo and Corte, since the populations are separated by strong natural barriers. The genetic proximity between Bozio and Niolo, pointed out by the genetic differentiation test, is most likely explained by the fact that the three populations derive from a common ancestral gene pool, which has been lost in the more open population of Corte and has probably been preserved in the populations of Bozio and Niolo.

Isolates have been suggested to be useful in the identification of genetic regions involved in common diseases.30, 31, 32 In this context, it is worth noting that the general population of the island of Sardinia is considered suitable for association studies aimed at the identification of genes involved in the pathogenesis of complex diseases, given the well-documented genetic isolation18 and the high frequency of some genetic diseases.33, 34, 35, 36 Nevertheless, Eaves et al9 have shown that the extent of background LD in the general population of Sardinia is not very high and is comparable to that of the general population of the United Kingdom. The reason for this apparent contradiction is to be found in the earlier-mentioned peculiar genetic structure of the Sardinian, as well as the Corsican, population, which is characterized by an extraordinary degree of microdifferentiation.12, 17, 20 The following factors have contributed to the observed internal variability: (1) the internal geographical barriers; (2) the strict isolation and the accompanying high level of endogamy and inbreeding; and (3) the endemic presence of malaria and other diseases, as well as famines, which exerted a strong selective pressure. Nevertheless, it is possible to delimit, within the two islands, homogeneous areas with a high level of LD, as reported here and by several authors.12, 17

In approaching these studies, we need to bear in mind that the identification of a genomic region associated with a disease must be confirmed in more than one subisolate for at least two reasons: (1) different loci may be responsible for the same disease phenotype in different populations and (2) a chance genetic identity must be distinguished from a true identity by descent.37, 38, 39 To this end, the internal founding populations of Corsica and Sardinia may represent an interesting system to validate association study results.29 The two populations derive from a common founding gene pool, and similar evolutionary forces, such as isolation, consanguinity and bottlenecks due to famines and epidemics, have shaped them over time. In addition, the two populations share a similar dietary regime and climate (Mediterranean). All these considerations suggest that they may have ‘selected’ the same kinds of alleles associated with particular common diseases.

Populations with high LD extension are well suited for a rough LD mapping of extended genomic regions, whereas they are probably unsuited for fine mapping since markers found far away from the disease-associated locus may also show a significant association with the disease genes. Therefore, we suggest a multistep procedure as proposed by Kaessmann et al6 in attempting to use isolates of the Sardinia and Corsica Islands in association studies:

  • Identification of a large genomic region containing the disease-associated locus in a small subisolated population of the central region of Corsica and replication in a small subisolated population of central Sardinia.

  • The disease-associated region could be more finely mapped in a recently expanded population.

  • Fine mapping could be carried out in open populations in which the extent of LD is low, such as the general population of Sardinia4, 5 and possibly Corsica (the LD extension on Xq13.3 in the general population of Corsica is under investigation).

  • Open populations in which the level of background LD is very low, such as African populations, could be used for the final mapping, in agreement with the common disease/common variant hypothesis and the ‘out of Africa’ theory of human evolution.40

In this study, we have used microsatellite markers to assess background LD. In LD mapping, microsatellite markers are being superseded by single nucleotide polymorphisms. Nevertheless, microsatellite markers remain a valuable tool for the first screening of background LD in a given population, as shown by this study and by others,1, 2, 3, 4, 5, 6, 7 and can be useful for studies of complex traits in isolated populations where the extent of LD is particularly high.29

In conclusion, our data indicate that the population of the central region of Corsica could be well suited for the mapping of genes involved in the pathogenesis of complex diseases. We also believe that Sardinia and Corsica and their isolates could represent an interesting system for association studies, given their common genetic derivation and the similarity of the selective pressures acting on them.