Abstract
Genetic isolates with a history of a small founder population, long-lasting isolation and population bottlenecks represent exceptional resources in the identification of genes involved in the pathogenesis of multifactorial diseases. In these populations, the disease allele shows linkage disequilibrium (LD) with markers over significant genetic intervals, thereby facilitating disease locus identification. This study was designed to examine the extent of background LD in some subpopulations of Corsica. Our interest in the island of Corsica is due to its geographical and genetic proximity to the other Mediterranean island of Sardinia. Sardinian isolates in which the extent of background LD is particularly high have recently been identified and are now the object of studies aimed at mapping genes involved in complex diseases. Recent evidence has highlighted that the genetic proximity between the populations of Corsica and Sardinia is particularly marked for the internal conservative populations. Given these considerations, Sardinia and Corsica may represent a unique system in which to carry out parallel association studies whose results could be validated by comparison. In the present study, we have analyzed the LD extension on the Xq13 genomic region in three subpopulations of Corsica: Corte, Niolo and Bozio, all located in the mountainous north-center of the island. Our results show a strong degree of LD over long distances for the population of Bozio and, to a lesser extent, for the population of Niolo. Their LD extent is comparable to or higher than that reported for other isolates.
Introduction
Recent reports have shown that in isolated and conservative populations (isolates) the level of linkage disequilibrium (LD), the nonrandom association of alleles at closely linked loci, is particularly high.1, 2, 3, 4, 5, 6, 7
The design and feasibility of genome-wide association studies are critically dependent on the extent of LD. The greater the extent of LD, the smaller the number of polymorphic markers that have to be examined in order to find an association with a particular disease. Additionally, rare mutations that are diluted in ‘open’ populations can reach high frequencies in these isolates owing to endogamy and founder effect.
This study has been designed to examine the extension of the LD on a region of the X chromosome (Xq13), as a measure of the background LD, in some populations of the island of Corsica. Even though the LD extension on the region Xq13 is not representative of the average level of LD of the entire genome,8 the analysis of this region has been widely used as a measure of a general LD in a given population and to compare the levels of LD between populations.1, 2, 3, 4, 5, 6, 7, 9
Our interest in the isolates of the island of Corsica is based on its closeness to the island of Sardinia, which is the object of several studies, carried out by different groups, aimed at the mapping of disease-related genes. Recent data have shown a genetic closeness between the two islands. The genetic proximity of the populations of Sardinia and Corsica has been inferred on the basis of the gene frequencies of serum proteins and isozymes, as well as mitochondrial control region sequence variations. All the data show that Sardinians and Corsicans are genetically much closer to each other than to any other Mediterranean population.10, 11, 12, 13, 14, 15 Corsica is the third largest island of the Western Mediterranean sea. It is very mountainous, with peaks reaching 2710 m in altitude. Until the Pleistocene, it was physically linked to Sardinia in a single geological block. About 11 000 years ago, the two islands separated and are now divided by the strait called ‘Bocche di Bonifacio’.
According to mitochondrial DNA sequence variations,15 the Sardinian-Corsican block was peopled in a period between 14 000 and 78 000 years ago (Paleolithic period), through the Tuscan islands during the last glaciation (Wurm), when the sea level was lower. After the physical separation (during the Pleistocene period), the populations of the two islands diverged, even though a reduced but constant gene flow remains between northern Sardinia and southern Corsica. For both islands, genetic drift, isolation and low population numbers have played a strong part in their genetic shaping. Sardinia and Corsica were invaded several times, often by the same populations. In the great majority of cases, these invasions were limited to the coast and left only slight marks on the gene pool of the native populations. Strong evidence also suggests an internal microgeographic diversity inside Sardinia and Corsica, with the most conserved populations located in the mountainous regions at the center of the two islands.4, 16, 17, 18, 19, 20 The internal conserved populations of Sardinia and Corsica are also the genetically closest between the two islands.19, 20
Materials and method
The three populations under investigation in the present study belong to an area of northern central Corsica of 14 000 inhabitants (Figure 1). Corte is a small city with about 6000 inhabitants. Bozio and Niolo are geographic areas situated East and West of Corte, respectively. Bozio and Niolo are constituted of several small villages. The total population of Niolo and Bozio is 2400 and 1400 inhabitants, respectively. DNA samples of unrelated male individuals were collected in the region of Niolo (n=49), Bozio (n=51) and Corte (n=50). Samples have been analyzed using seven dinucleotide microsatellite markers on chromosome Xq13.3: DXS983, DXS986, DXS8092, DXS8082, DXS1225, DXS8037 and DXS995. Microsatellites were analyzed by using an ABI Prism 377 DNA analyzer.1 Genotypes were processed by Genescan v3.1 and Genotyper v2.5 software.
The nonrandom allelic association between pairs of microsatellite loci has been tested by an extension of Fisher's exact test on contingency tables.21 P-values have been corrected by the step-down Holm–Sidak procedure22 with the formula P_corrected = 1 − (1 − P)^n, where n is the number of P-values smaller than or equal to that being corrected. With this procedure, the corrected value is a probability ranging from 0 to 1, unlike the Bonferroni correction, commonly used for multiple testing, in which the corrected P-value can exceed 1. In the present paper, we have re-applied the step-down Holm–Sidak procedure22 to the published P-values of Gavoi,2 Talana4 and Urzulei.5 We were prompted to do so by Katoh et al,7 who re-applied the procedure to the data of Zavattari et al2 and showed that the Holm–Sidak method may have been applied inaccurately. Indeed, the number of P-values remaining lower than 0.05 (bold in Table 1) is smaller than that reported.2, 4, 5
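The correction formula given above can be illustrated with a short sketch. It follows the wording of the text literally (n is the count of P-values smaller than or equal to the one being corrected); note that textbook step-down formulations instead use the number of hypotheses not yet rejected as the exponent. The function name and the example P-values are hypothetical and are not taken from Table 1.

```python
def holm_sidak_as_described(pvalues):
    """Apply P_corrected = 1 - (1 - P)**n to every P-value.

    Assumption: n is taken, as worded in the text, as the number of
    P-values smaller than or equal to the one being corrected.
    """
    corrected = []
    for p in pvalues:
        n = sum(1 for q in pvalues if q <= p)
        corrected.append(1.0 - (1.0 - p) ** n)
    return corrected


# Hypothetical P-values, for illustration only.
example = [0.001, 0.01, 0.04, 0.2]
corrected = holm_sidak_as_described(example)

# Every corrected value stays within [0, 1] by construction.
for raw, adj in zip(example, corrected):
    print(f"P = {raw:.3f} -> corrected = {adj:.4f}")
```

Because the corrected value is one minus a product of probabilities, it can never exceed 1, which is the property contrasted with the Bonferroni correction in the text.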
Each P-value has been considered significant when <0.05 and suggestive when 0.05<P<0.10. LD has been measured by the normalized disequilibrium coefficient D′23 for each pair of marker loci. As a small sample size can lead to overestimation of D′, we also report D′ estimates corrected by a bootstrap procedure.24 Disequilibrium across each locus was plotted using GOLD software25 (http://www.sph.umich.edu/csg/abecasis/GOLD/). In order to test the null hypothesis that the allelic distribution is identical across the populations under investigation, we have carried out a genetic differentiation test using the Genepop 3.3 software (ftp://ftp.cefe.cnrs-mop.fr/pub/pc/msdos/genepop).26 For each locus, the test is carried out on a contingency table. The method, based on the distribution of alleles in the various samples, is described by Raymond and Rousset.27 In the present work, the test is performed automatically for all pairs of populations for all loci.
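The D′ measure and its bootstrap correction can be sketched as follows. This is a minimal illustration and not the authors' pipeline. It assumes that haplotypes are observed directly (as for X-linked markers typed in males), it extends D′ to multi-allelic markers by a frequency-weighted average of the absolute pairwise values (an assumption, since the text only cites the normalized coefficient), and it uses the standard bootstrap bias estimate. All function names and data are hypothetical.

```python
import random
from collections import Counter


def multiallelic_d_prime(haplotypes):
    """Frequency-weighted |D'| over all allele pairs of two loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) tuples.
    """
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    a_counts = Counter(a for a, _ in haplotypes)
    b_counts = Counter(b for _, b in haplotypes)
    total = 0.0
    for a, na in a_counts.items():
        p = na / n
        for b, nb in b_counts.items():
            q = nb / n
            d = pair_counts.get((a, b), 0) / n - p * q
            # Largest |D| attainable given the allele frequencies.
            if d > 0:
                d_max = min(p * (1 - q), (1 - p) * q)
            else:
                d_max = min(p * q, (1 - p) * (1 - q))
            if d_max > 0:
                total += p * q * abs(d) / d_max
    return total


def bootstrap_corrected_d_prime(haplotypes, n_boot=1000, seed=0):
    """Bias-corrected D': 2 * observed - mean of bootstrap replicates."""
    rng = random.Random(seed)
    n = len(haplotypes)
    observed = multiallelic_d_prime(haplotypes)
    boot_mean = sum(
        multiallelic_d_prime([rng.choice(haplotypes) for _ in range(n)])
        for _ in range(n_boot)
    ) / n_boot
    # Clip to the valid range of D'.
    return min(1.0, max(0.0, 2 * observed - boot_mean))


# Hypothetical haplotypes: complete association and no association.
linked = [(1, "x")] * 5 + [(2, "y")] * 5
unlinked = [(1, "x"), (1, "y"), (2, "x"), (2, "y")]
d_linked = multiallelic_d_prime(linked)
d_unlinked = multiallelic_d_prime(unlinked)
d_linked_corrected = bootstrap_corrected_d_prime(linked, n_boot=200)
d_unlinked_corrected = bootstrap_corrected_d_prime(unlinked, n_boot=200)
```

Resampling individuals with replacement tends to inflate D′ in small samples, so subtracting the estimated bias usually lowers the reported value, in line with the general decrease after correction described for Table 2.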
Results
Table 1 reports the significance of nonrandom allelic association between pairs of microsatellite loci by the pairwise LD based on Fisher's exact test. The data in bold are those remaining lower than 0.05 after correction using the Holm–Sidak method.22 The data are compared to those of some Sardinian genetic isolates reported in literature.2, 4, 5 The population of Bozio shows 10 out of 21 pairs with significant LD (P<0.05) and two suggestive (0.05<P<0.10). The largest distance between markers with a significant P is 9.5 megabases (Mb) (DXS983–DXS1225). After correction, using the Holm–Sidak method, Bozio shows five pairs with significant LD, with the largest distance between markers with a significant P at 2.5 Mb (DXS8082–DXS986). The population of Niolo shows six pairs with significant LD, and two pairs after correction. The largest distance between markers with a significant P is 3.5 Mb (DXS8082–DXS995).
The population of Corte shows two pairs with significant LD and three suggestive. The largest distance between markers with a significant P is 2 Mb (DXS1225–DXS986). After correction, only one pair remains significant at a distance of 0.5 Mb (DXS8082–DXS1225).
Data for Bozio and Niolo are similar to those reported by others for some Sardinian genetic isolates such as Talana and Urzulei (Table 1).4, 5 The other Sardinian isolate examined to date, Gavoi, shows instead a greater number of significant pairwise comparisons (Table 1).2
Figure 2 shows the extent of LD graphically, between pairs of microsatellite markers inside the genomic region under investigation, in the populations of Niolo, Corte and Bozio. Red indicates D′=1; dark blue D′=0. As shown in Figure 2, the population of Niolo shows the highest D′ value (0.81; Table 2); however, the D′ values decrease with distance. The population of Bozio shows a maximum D′ value (0.78; Table 2) that is slightly lower than that of Niolo. Nevertheless, its D′ values decline at a much lower rate with distance and remain at intermediate values, higher than 0.5, for a large part of the Xq13 region. In contrast, the maximum D′ value for the population of Corte (0.65; Table 2) is lower than those of Bozio and Niolo, and D′ decreases to under 0.5 for most of the interval.
In order to visualize and compare the strength of LD over distance among Sardinian and Corsican isolates, Figure 3 shows the LD trend along 10 Mb of the X chromosome region under investigation for the three populations examined and in the Sardinian isolates of Talana (Figure 3a) and Gavoi (Figure 3b). D′ value of 1 indicates complete LD, 0 indicates no LD. The degree of LD needed for effective mapping depends on several factors, nevertheless D′ values higher than 0.5 are considered useful.28
In Figure 3a, the average D′ values over the distance have been calculated excluding the pairwise D′ values of the marker DXS995, which was not analyzed by Angius et al.5
In Figure 3b, the average D′ values over the distance have been calculated including marker DXS995 to allow comparison with the genetic isolate of Gavoi as reported by Zavattari et al.2 The average D′ values for the populations of Niolo, Bozio and Corte are slightly lower than those of Figure 3a, obtained excluding marker DXS995.
We could not compare the Sardinian isolate of Urzulei to our samples since D′ data are not published.5
In summary, Figure 3a and b show that the average D′ values along 10 Mb for the central isolated populations of Corsica and those of central Sardinia, reported by others, are similar, with the exception of Corte (see below). In particular, the populations of Bozio and Gavoi show average D′ values close to or higher than 0.5 over very long distances (Figure 3b), while those of Niolo and Talana show high values of average D′ for relatively short distances (1 Mb) with a decrease over distance (Figure 3a).
It is noteworthy that the population of Gavoi shows a greater number of significant P-values and lower average D′ values than other populations such as Bozio. This difference is most likely due to the larger sample size used in the work of Zavattari et al.2 Indeed, both the power of Fisher's test and D′ values are affected by sample size.29
In order to evaluate the bias of D′ estimates due to small sample size, we corrected the estimated D′ value by use of the bootstrap procedure. As shown in Table 2, there is a general decrease in D′ values after correction. Nevertheless, most of the D′ values greater than 0.5 are still higher than 0.5 after correction. Only the D′ values that were very close to 0.5 before correction go below this value afterwards. These data confirm the high degree of LD in the populations of Niolo and Bozio.
Finally, we have also carried out a genetic differentiation test with Genepop 3.3, based on the distribution of alleles in the three samples. Results (Table 3) show no differentiation between the populations of Niolo and Bozio (P=0.2), whereas both populations differ significantly from Corte (P=3 × 10^−4 and 2 × 10^−5 for Niolo and Bozio, respectively).
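The logic of a differentiation test on an allele-by-population contingency table can be sketched as below. This is not the Genepop implementation, which estimates exact P-values with a Markov chain; the sketch swaps in a simple permutation test on the G statistic, which tests the same null hypothesis of identical allelic distributions. Function names and the allele data are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(samples):
    """G statistic of the allele-by-population contingency table.

    samples: one list of observed alleles per population.
    """
    pooled_counts = Counter()
    for sample in samples:
        pooled_counts.update(sample)
    n = sum(pooled_counts.values())
    g = 0.0
    for sample in samples:
        for allele, observed in Counter(sample).items():
            expected = len(sample) * pooled_counts[allele] / n
            g += 2 * observed * math.log(observed / expected)
    return g


def permutation_differentiation_p(samples, n_perm=2000, seed=0):
    """P-value for the null of identical allelic distributions."""
    rng = random.Random(seed)
    observed = g_statistic(samples)
    pooled = [allele for sample in samples for allele in sample]
    sizes = [len(sample) for sample in samples]
    hits = 0
    for _ in range(n_perm):
        # Reassign alleles to populations at random, keeping sample sizes.
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if g_statistic(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical single-locus data for two populations.
p_same = permutation_differentiation_p([[1, 2] * 10, [1, 2] * 10])
p_diff = permutation_differentiation_p([[1] * 20, [2] * 20])
```

A per-locus P-value obtained this way would still need to be combined across loci, a step that this sketch does not attempt.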
Discussion
Our results show a high degree of LD for the population of Bozio and Niolo, and a lower degree for Corte. This high degree of LD has most likely been created by genetic drift and has been maintained by isolation and slow growth of the populations. Corte shows a lower degree of LD. This result is most likely explained by the fact that Corte, being the historical capital of Corsica, has been relatively more open to genetic flow compared to Niolo and Bozio regions. Very little is known about the biodemographic history of the populations under investigation, since data based on the ecclesiastical records are not available.
The Niolo region is located in a mountainous area in the central northwest of Corsica and is composed of several villages. Its population size has been relatively constant, with a very slow growth rate in the last decades. Its economy is mainly based on the presence of snow resorts. The Bozio region is located in a mountainous region in the central northeast of Corsica and is composed of several villages, very close to each other, which are being depopulated by migration toward the coast and outside the island, owing to the poor economy of the area. The central area of the island, on the whole, has been the stronghold of the Corsican indigenous population against the different invaders.
The genetic differentiation test shows no differentiation between the populations of Niolo and Bozio, which are both different from Corte (Table 3). We believe that there has not been significant gene flow among the populations of Bozio, Niolo and Corte, since the populations are separated by strong natural barriers. The genetic proximity between Bozio and Niolo, pointed out by the genetic differentiation test, is most likely explained by the fact that the three populations derive from a common ancestral gene pool, which has been lost in the more open population of Corte and has probably been preserved in the populations of Bozio and Niolo.
Isolates have been suggested to be useful in the identification of genetic regions involved in common diseases.30, 31, 32 In this context, it is worth noting that the general population of the island of Sardinia is considered suitable for association studies aimed at the identification of genes involved in the pathogenesis of complex diseases, given the well-documented genetic isolation18 and the high frequency of some genetic diseases.33, 34, 35, 36 Nevertheless, Eaves et al9 have shown that the extent of background LD in the general population of Sardinia is not very high and is comparable to that of the general population of the United Kingdom. The reason for this apparent contradiction is to be found in the earlier-mentioned peculiar genetic structure of the Sardinian, as well as the Corsican, population, which is characterized by an extraordinary degree of microdifferentiation.12, 17, 20 The following factors have contributed to the observed internal variability: (1) the internal geographical barriers; (2) the strict isolation and the accompanying high level of endogamy and inbreeding; and (3) the endemic presence of malaria and other diseases as well as famines, which exerted a strong selective pressure. Nevertheless, it is possible to delimit, inside the two islands, homogeneous areas with a high level of LD, as reported here and by several authors.12, 17
In approaching these studies, we need to bear in mind that the identification of a genomic region associated with a disease must be confirmed in more than one subisolate for at least two reasons: (1) different loci may be responsible for the same disease phenotype in different populations and (2) a casual genetic identity must be distinguished from a true identity by descent.37, 38, 39 To this end, the internal founding populations of Corsica and Sardinia may represent an interesting system to validate association study results.29 The two populations derive from a common founding gene pool, and similar evolutionary forces, such as isolation, consanguinity and bottlenecks due to famines and epidemics, have shaped them over time. In addition, the two populations share a similar dietary regime and climate (Mediterranean). All these considerations suggest that they may have ‘selected’ the same kinds of alleles associated with particular common diseases.
Populations with high LD extension are well suited for a rough LD mapping of extended genomic regions, whereas they are probably unsuited for fine mapping since markers found far away from the disease-associated locus may also show a significant association with the disease genes. Therefore, we suggest a multistep procedure as proposed by Kaessmann et al6 in attempting to use isolates of the Sardinia and Corsica Islands in association studies:
- Identification of a large genomic region containing the disease-associated locus in a small subisolated population of the central region of Corsica and replication in a small subisolated population of central Sardinia.
- The disease-associated region could be more finely mapped in a recently expanded population.
- Fine mapping could be carried out in open populations in which the extent of LD is low, such as the general population of Sardinia4, 5 and possibly Corsica (the LD extension on Xq13.3 in the general population of Corsica is under investigation).
- Open population in which the level of background LD is very low, such as the African population, could be used for the final mapping, in agreement with the common diseases/common variant hypothesis and the ‘out of Africa’ theory of human evolution.40
In this study, we have used microsatellite markers to assess background LD. The use of microsatellite markers in LD mapping is being substituted by single nucleotide polymorphisms. Nevertheless, microsatellite markers remain a valuable tool for the first screening of background LD in a given population, as shown by this study and by others,1, 2, 3, 4, 5, 6, 7 and can be useful for studies of complex traits in isolated populations where the extent of LD is particularly high.29
In conclusion, our data indicate that the population of the central region of Corsica could be well suited for the mapping of genes involved in the pathogenesis of complex diseases. We also believe that Sardinia and Corsica and their isolates could represent an interesting system in which to carry out association studies, given their common genetic derivation and the similarity of selective pressures.
References
Laan M, Paabo S : Demographic history and LD in human population. Nat Genet 1997; 17: 435–438.
Zavattari P, Deidda E, Whalen M et al: Major factors influencing LD by analysis of different chromosome regions in distinct populations: demography, chromosome recombination frequency and selection. Hum Mol Genet 2000; 9: 2947–2957.
Varilo T, Laan M, Hovatta I, Wiebe V, Terwilliger JD, Peltonen L : Linkage disequilibrium in isolated populations: Finland and young sub-population of Kuusamo. Eur J Hum Genet 2000; 8: 604–612.
Angius A, Melis MP, Morelli L et al: Archival, demographic and genetic studies define a Sardinian sub-isolate as a suitable model for mapping complex traits. Hum Genet 2001; 109: 198–209.
Angius A, Bebbere D, Petretto E et al: Not all isolates are equal: linkage disequilibrium analysis on Xq13.3 reveals different patterns in Sardinian sub-populations. Hum Genet 2002; 111: 9–15.
Kaessmann H, Zollner S, Gustaffson A et al: Extensive LD in small human populations in Eurasia. Am J Hum Genet 2002; 70: 673–685.
Katoh T, Mono S, Ikuta T et al: Genetic isolates in East Asia: a study of linkage disequilibrium in the X chromosome. Am J Hum Genet 2002; 71: 395–400.
Taillon-Miller P, Bauer-Sardina I, Saccone NL et al: Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat Genet 2000; 25: 324–328.
Eaves IA, Merriman T, Barber RA et al: The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes. Nat Genet 2000; 25: 320–323.
Vona G, Memmì M, Varesi L, Mameli GE, Succa V : A study of several genetic markers in the Corsican population (France). Anthropol Anz 1995; 53: 125–132.
Varesi L, Vona G, Memmì M, Marongiu F, Ristaldi MS : β-Thalassemia mutations in Corsica. Hemoglobin 2000; 24: 239–244.
Vona G, Memmì M, Calò CM et al: Genetic structure of the Corsican population (France): a review; in: Research Signpost (eds): Recent Research Developments in Human Genetics: 2002, Vol 1, pp 147–164.
Grimaldi MC, Crouau-Roy B, Contu L, Amoros JP : Molecular variation of HLA class I genes in the corsican population: approach to its origin. Eur J Immunogenet 2002; 29: 101–107.
Vona G, Moral P, Memmì M, Ghiani ME, Varesi L : Genetic structure and affinities of the Corsican population (France): classical genetic markers analysis. Am J Hum Biol 2003; 15: 151–163.
Varesi L, Memmì M, Cristofari MC, Mameli GE, Calò MC, Vona G : Mitochondrial control-region sequence in the corsican population (France). Am J Hum Biol 2000; 12: 339–351.
Ranque J, Vicoli R, Silicati V et al: Le peuplement humain in Corse. Varietè des populations insulaires. Etude sèro-antropologique. Bull Soc Corse Biol Hum 1961; 1: 115–168.
Cappello N, Rendine S, Griffo R et al: Genetic analysis of Sardinia I data on 12 polymorphisms in 21 linguistic domains. Ann Hum Genet 1996; 60: 125–141.
Vona G : The peopling of Sardinia (Italy): history and effects. Int J Anthropol 1997; 12: 71–87.
Calò CM, Vacca L, Vona G : Analysis of selected microsatellites in 4 regions of Corsica (Francia). Antropol Contemp 1999; 87–98, (Monografia).
Latini V, Vona G, Ristaldi MS et al: β-Globin cluster haplotypes in Corsica and Sardinia populations. Hum Biol 2003; 75: 855–871.
Slatkin M : Linkage disequilibrium in growing and stable populations. Genetics 1994; 137: 331–336.
Lautenberger JA, Stephens JC, O'Brien SJ, Smith MW : Significant admixture linkage disequilibrium across the FY locus in African Americans. Am J Hum Genet 2000; 66: 969–978.
Lewontin RC : The interaction of selection and linkage. General considerations; heterotic models. Genetics 1964; 49: 49–67.
Efron B, Tibshirani RJ : An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.
Abecasis GR, Cookson WO : Gold – graphical overview of linkage disequilibrium. Bioinformatics 2000; 16: 182–183.
Raymond M, Rousset F : GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J Hered 1995; 86: 248–249.
Raymond M, Rousset F : An exact test for population differentiation. Evolution 1995; 49: 1280–1283.
Reich DE, Cargill M, Bolk S et al: Linkage disequilibrium in the human genome. Nature 2001; 411: 199–204.
Mohlke KL, Lange EM, Valle TT et al: Linkage disequilibrium between microsatellite markers extends beyond 1 cM on chromosome 20 in Finns. Choice in mapping genes for complex diseases. Nat Genet 1999; 23: 394–404.
Ardlie KG, Kruglyak L, Seielstad M : Patterns of LD in the human genome. Nat Genet 2002; 3: 299–309.
Marrosu MG, Murru R, Murru MR et al: Dissection of the HLA association with multiple sclerosis in the founder isolate population of Sardinia. Hum Mol Genet 2001; 11: 1221–1226.
Sheffield VC, Stone EM, Carmi R : Use of isolated inbred human populations for identification of disease genes. TIG Rev 1998; 14: 391–396.
Wright AF, Carothers AD, Pirastu M : Population multiple sclerosis in the founder isolated population of Sardinia. Hum Mol Genet 2001; 10: 2907–2916.
Contu D, Morelli L, Zavattari P et al: Sex-related bias and exclusion mapping of the non recombinant portion of chromosome Y in human type 1 diabetes in the isolated founder population of Sardinia. Diabetes 2002; 51: 3573–3576.
Angius A, Petretto E, Maestrale GB et al: A new essential hypertension susceptibility locus on chromosome 2p24–p25, detected by genomewide search. Am J Hum Genet 2002; 71: 893–905.
Gianfrancesco F, Esposito T, Ombra MN et al: Identification of a novel gene and a common variant associated with uric acid nephrolithiasis in a Sardinian genetic isolate. Am J Hum Genet 2003; 72: 1479–1491.
Risch NJ : Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
Cardon LR, Bell JI : Association study designs for complex diseases. Nat Rev 2001; 2: 91–99.
Heath S, Robledo R, Beggs W : A novel approach to search for identity by descent in small samples of patients and controls from the same mendelian breeding unit: a pilot study on myopia. Hum Hered 2001; 52: 183–190.
Tishkoff SA, Williams SM : Genetic analysis of African populations: human evolution and complex disease. Nat Rev 2002; 3: 611–621.
Acknowledgements
We thank Traci Primm for reviewing the English version of this manuscript. This work has been supported by Ministero della Sanità, Italy (Progetti speciali art. 12 bis comma 6, decreto legislativo D. lgs 229/99) and by Diagnosi, prevenzione e terapia di malattie monogeniche e multifattoriali nella popolazione Sarda, Regione Sardegna.
Latini, V., Sole, G., Doratiotto, S. et al. Genetic isolates in Corsica (France): linkage disequilibrium extension analysis on the Xq13 region. Eur J Hum Genet 12, 613–619 (2004). https://doi.org/10.1038/sj.ejhg.5201205
DOI: https://doi.org/10.1038/sj.ejhg.5201205