Haplotype block structure study of the CFTR gene. Most variants are associated with the M470 allele in several European populations


An average of about 1700 CFTR (cystic fibrosis transmembrane conductance regulator) alleles from normal individuals from different European populations were extensively screened for DNA sequence variation. A total of 80 variants were observed: 61 coding SNSs (results already published), 13 noncoding SNSs, three STRs, two short deletions, and one nucleotide insertion. Eight DNA variants were classified as non-CF-causing because of their high frequency of occurrence. Through this survey the CFTR has become the most exhaustively studied gene for its coding sequence variability and, to a lesser extent, for its noncoding sequence variability as well. Interestingly, most variation was associated with the M470 allele, while the V470 allele showed an ‘extended haplotype homozygosity’ (EHH). These findings suggest a role for selection acting either on M470V itself or through a hitchhiking mechanism involving a second site. The possible ancient origin of the V allele in an ‘out of Africa’ time frame is discussed.


The CFTR (cystic fibrosis transmembrane conductance regulator) gene has been extensively characterized for its pathologic variability. However, the study of its overall random variability, within which disease-causing mutations should be framed, started only recently.1

For many of the several hundred CFTR variants reported, it is not known whether or not they are CF-causing, which may create difficulties for genetic counselling. Besides the obvious criteria that identify CF-causing mutations with certainty (eg frameshift and termination mutations), a purely statistical approach to identify alleles that are not fully penetrant CF-causing mutations has been proposed by Bombieri et al.2 It is based on the consideration that every CFTR variant with a frequency certainly higher than the cumulative frequency of the not unambiguously identified CF-causing alleles cannot be a fully penetrant CF-causing allele. It was applied to a random sample of 191 Europeans (=382 genes), a population where the cumulative frequency of the not unambiguously identified CF-causing alleles is 0.004 (the difference between 0.02, the total frequency of the CF-causing alleles, and 0.016, the total frequency of the well identified CF-causing alleles: WHO Report3). In that study, 10 alleles were classified as certainly non-CF-causing.2
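The frequency criterion can be illustrated with a short sketch. The exact test used in the original approach is not stated here, so a one-sided exact binomial test at a significance level of 0.05 is assumed; the function names and the `alpha` value are illustrative choices, not taken from the source.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_count_exceeding(n: int, p0: float, alpha: float = 0.05) -> int:
    """Smallest number of observed copies whose frequency is significantly
    higher than p0 (one-sided exact binomial test; alpha is an assumption)."""
    k = 0
    while binomial_upper_tail(k, n, p0) >= alpha:
        k += 1
    return k


N_GENES = 382   # 191 individuals, as in the text
P0 = 0.004      # 0.02 - 0.016, residual frequency of unidentified CF alleles

threshold = min_count_exceeding(N_GENES, P0)
print(f"A variant seen at least {threshold} times in {N_GENES} genes "
      f"is unlikely to be a fully penetrant CF-causing allele")
```

Under these assumptions, a variant observed at or above the returned count has a frequency that is significantly higher than 0.004, and so would be classified as not fully penetrant CF-causing.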

The present paper reports eight further alleles classified, through this purely statistical approach, as certainly not fully penetrant CF-causing alleles. However, the most interesting finding concerns the very different patterns of variability found in the CFTR genes carrying the M470 or the V470 allele.

Materials and methods

The sample

A large part of the present sample was the same as previously described.1 It consists of 1337 healthy, unrelated individuals (selected on the basis of the birth place of the four grandparents) from six geographical areas: Northern Italy, Verona (n=300); Central Italy, Rome (n=300); Southern France, Montpellier (n=300); Northern France, Brest (n=278); Czech Republic, Prague (n=118); and Spain, Barcelona (n=41). All individuals gave their informed consent. Since not all individuals have been studied for all the 27 exons of the CFTR gene, an average sample size has been computed. It amounts to about 1700 haploid genomes. For a detailed list of the sample size studied for each exon see Modiano et al.1

Mutation analysis

Genomic DNA was extracted from blood samples, amplified in vitro by PCR and analysed by DGGE2 or DHPLC.4 Every mutant discovered by these methods was sequenced with the ABI PRISM 377 or 310 Sequence Analyser. Some variants were studied, on a fraction of the total sample, with a restriction-enzyme-specific method: the following cSNSs, numbered as in Table A1, nos. 1, 12, 20, 24–26, 28, 29, 37, 45, 48, 56, 59, and 60; and the following intronic variants: 3041–71g/c, 1001+11c/t, and 2752–15c/g (methods available on request from cristina.bombieri@medgen.univr.it).

Maximum likelihood (ML)

Estimates of haplotype frequencies, of linkage disequilibria and of their statistical significance were calculated by ARLEQUIN, ver. 2.000.5

Degree of heterozygosity (H)

The degree of heterozygosity (2pq for diallelic sites) was calculated for each variable site separately within the M and within the V CFTR genes, using the allele frequencies observed at that site within the corresponding subsample (see Table 1).
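As a minimal sketch of this calculation, the function below returns 1 − Σpᵢ², which reduces to 2pq for a diallelic site and also covers multiallelic markers such as STRs. The allele frequencies in the example are hypothetical and are not taken from Table 1.

```python
def heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2); equals 2pq for two alleles."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in allele_freqs)


# Hypothetical frequencies of the same diallelic site within the two subsamples
h_within_m = heterozygosity([0.70, 0.30])   # site variable on M genes
h_within_v = heterozygosity([0.99, 0.01])   # site nearly monomorphic on V genes
print(h_within_m, h_within_v)
```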

Table 1 Frequencies of the CFTR variants within the M or the V alleles

Web resources

Information about the CFTR gene sequence and mutations is available at the Cystic Fibrosis Genetic Analysis Consortium Web Site: http://genet.sickkids.on.ca/cftr


A total of 4443 coding and 2367 noncoding bp (2184 bp intronic plus 183 bp of the UTR regions) were screened by DGGE (denaturing gradient gel electrophoresis) or DHPLC (denaturing high performance liquid chromatography). Table A1 updates the table already published in Modiano et al1 and reports the absolute and relative frequencies of the 61 cSNSs (single-nucleotide substitutions in a coding sequence) found in a Czech sample larger than that already published, together with the updated European frequencies.

A detailed analysis of the cSNS variability has been presented elsewhere.1 Among the 61 cSNSs (45 nonsynonymous, and 16 synonymous) observed in the entire length of the gene, three (ref. nos. 16, 32, 60) were frankly polymorphic (q>0.05) and eight only slightly polymorphic (0.005<q<0.05); all the other cSNSs showed very low frequencies (34 of them were singletons).

Table A2 reports the frequencies of the 19 non-cSNS variant sites detected in this study: 16 intronic (12 SNSs, three STRs, and one nucleotide insertion) and three exonic (one SNS in the 5′UTR and two trinucleotide deletions in the coding sequence).

The density of polymorphic SNSs did not differ significantly between the coding and the noncoding regions (12/4443=1/370 and 4/2367=1/592 bp, respectively; P≈0.4); in contrast, the density of rare SNSs was about threefold higher in the coding region (49/4443=1/91 bp in the exons and 9/2367=1/263 bp in the introns; P≈0.002).
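The two density comparisons can be sketched as 2×2 contingency tests. The test actually used is not stated in the text; a Pearson chi-square with one degree of freedom and no continuity correction is assumed here, and the function name is illustrative.

```python
from math import erfc, sqrt


def chi2_two_densities(hits_a, length_a, hits_b, length_b):
    """Pearson chi-square (1 df, no continuity correction) comparing
    hits_a/length_a with hits_b/length_b; returns (chi2, p_value)."""
    table = [[hits_a, length_a - hits_a], [hits_b, length_b - hits_b]]
    rows = [length_a, length_b]
    total = length_a + length_b
    cols = [hits_a + hits_b, total - hits_a - hits_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom
    return chi2, erfc(sqrt(chi2 / 2.0))


CODING_BP, NONCODING_BP = 4443, 2367
_, p_polymorphic = chi2_two_densities(12, CODING_BP, 4, NONCODING_BP)
_, p_rare = chi2_two_densities(49, CODING_BP, 9, NONCODING_BP)
print(f"polymorphic SNSs: P = {p_polymorphic:.2f}; rare SNSs: P = {p_rare:.4f}")
```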

On the basis of their frequencies, it has been possible to classify as not fully penetrant CF-causing alleles six cSNSs (ref. nos. 6, 20, 26, 29, 37, and 59, black arrows in Table A1), in addition to the four already classified in the previous investigation2 (ref. nos. 16, 32, 54, and 60, white arrows in Table A1), and two noncoding variants (ref. letters E and O in Table A2).

The availability of a large number of mutants collected from a random sample of individuals made it possible to compare the indirectly estimated relative mutation rates of the 12 possible types of substitution (Figure 1). The expected numbers of cSNSs were computed assuming that μ is the same for all of them and that all mutational events had the same probability of being detected; therefore, the four nonsynonymous cSNSs that may not have been neutral (ref. nos. 6, 16, 26 and 29) were excluded, and the total number of cSNSs considered was 57 instead of 61. The T↔A rate was much lower than expected (obs.=1; exp.=11.1; P≈0.002; ≈0.02 with the Bonferroni correction). The combined rate of the complementary C → T and G → A SNSs was about threefold higher than expected (obs.=23; exp.=7.9; P≈10⁻⁷), confirming already known notions.6, 7, 8 It is commonly accepted that the strong excess of these two SNSs is due to a particularly high probability that a C nucleotide (in either the sense or the antisense DNA strand) mutates to T when it is followed by G (see, for example, Cooper and Krawczak9 and Cooper et al10). This is strongly supported by the present data. Since the 4443 coding bp of the CFTR gene contain 873 C, the gene has 873 CpN dinucleotides. Only 57 of them (6.5%) are CpG, whereas six of the eight (75%) C → T mutations of the present study were in a CpG dinucleotide (P≈0). Similarly, the total number of NpG dinucleotides is 972, of which only 57 (5.9%) are CpG, whereas five of the 15 (33.3%) G → A mutations of the present study were in a CpG dinucleotide (P≈10⁻⁴).
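The CpG enrichment described above can be explored with a simple sketch. The test behind the quoted P values is not stated, so an exact binomial upper tail is assumed here; the resulting probabilities need not coincide with the quoted ones, although they point in the same direction.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


CPG_SITES = 57   # CpG dinucleotides in the 4443 coding bp
C_SITES = 873    # CpN dinucleotides (one per C)
G_SITES = 972    # NpG dinucleotides (one per G)

# Background probability that a randomly chosen C (or G) lies in a CpG
p_c_in_cpg = CPG_SITES / C_SITES
p_g_in_cpg = CPG_SITES / G_SITES

# Observed: 6 of 8 C->T and 5 of 15 G->A substitutions fall in a CpG
tail_c_to_t = binomial_upper_tail(6, 8, p_c_in_cpg)
tail_g_to_a = binomial_upper_tail(5, 15, p_g_in_cpg)
print(f"C->T: background {p_c_in_cpg:.3f}, tail probability {tail_c_to_t:.2e}")
print(f"G->A: background {p_g_in_cpg:.3f}, tail probability {tail_g_to_a:.2e}")
```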

Figure 1

Indirect relative mutation rate estimates of the 12 types of cSNSs. The expected number for each of the 12 cSNS types, say X → Y, has been obtained by multiplying the proportion of X among the 4443 coding CFTR bp (T=1236; C=873; A=1362 and G=972) by 57 (the number of cSNSs) and dividing this figure by 3 (each nucleotide can mutate to any of the other three). Complementary cSNSs have been combined because the two DNA strands exhibited compatible mutational behaviours (expressed by the ratio obs/exp).
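The expected numbers described in this legend can be reproduced with a few lines. The base counts and the total of 57 cSNSs come from the text; the function names are illustrative.

```python
BASE_COUNTS = {"T": 1236, "C": 873, "A": 1362, "G": 972}  # 4443 coding bp
N_CSNS = 57                                               # cSNSs retained
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def expected_single(base_from: str) -> float:
    """Expected number of X -> Y substitutions for one target base Y."""
    total_bp = sum(BASE_COUNTS.values())
    return BASE_COUNTS[base_from] / total_bp * N_CSNS / 3


def expected_combined(base_from: str) -> float:
    """Expected number for X -> Y pooled with its complementary substitution."""
    return expected_single(base_from) + expected_single(COMPLEMENT[base_from])


exp_ta = expected_combined("T")   # T -> A pooled with A -> T
exp_ct = expected_combined("C")   # C -> T pooled with G -> A
print(f"T<->A expected: {exp_ta:.1f} (observed 1)")
print(f"C->T / G->A expected: {exp_ct:.1f} (observed 23, ratio {23 / exp_ct:.1f})")
```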

An almost complete linkage disequilibrium (LD) has been observed between M470V and the two other highly polymorphic cSNS sites of the gene (ref. nos. 32 and 60). These LDs (D′=0.91 and 0.90, respectively) are shown in Figure 2; they are not due to blocks lacking recombination.11, 12, 13 In fact, sites 32 and 60 are not in strong disequilibrium with each other within the CFTR genes carrying the M470 allele, which suggests that they are very ancient. This situation is strongly reminiscent of that of Rh, the genetic system in which the LD phenomenon was first discovered.14, 15 That system, too, consists of one polymorphic locus, D/d, and of two additional loci, C/c and E/e, which are highly polymorphic within the chromosomes carrying the D allele and barely polymorphic within the chromosomes carrying the d allele.

Figure 2

Haploid assortments for the three highly polymorphic cSNSs of the CFTR gene, showing the lack of haplotype variability associated with the V470 allele compared with M470. V and M indicate the M470V alleles. Three-letter words indicate haplotypes; V or M in the first position: M470V (site 16); t or g in the second position: 2694 t/g (site 32); a or g in the third position: 4521 a/g (site 60). The areas indicate haplotype frequencies. D and D′=absolute and relative linkage disequilibrium values, respectively. Reference numbers are as in Table A1. Accession numbers for these three highly polymorphic cSNSs in the dbSNP public database (http://www.ncbi.nlm.nih.gov/SNP/) are: rs213950 for M470V, rs1042077 for 2694t/g, and rs2800136 for 4521g/a.
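For readers unfamiliar with the two measures, the sketch below computes D and D′ for a pair of diallelic sites using the standard definitions (D = p_AB − p_A·p_B, normalized by its maximum attainable magnitude). The haplotype and allele frequencies are hypothetical; the values in the figure were estimated by maximum likelihood with ARLEQUIN, not with this code.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, D') for alleles A and B at two diallelic sites.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies, for illustration only
d, d_prime = linkage_disequilibrium(p_ab=0.40, p_a=0.60, p_b=0.45)
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```

D′ equals 1 (or −1) when at least one of the four possible haplotypes is absent, which is why values close to 0.9 indicate an almost complete disequilibrium.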

The large number of CFTR genes studied allowed us to subdivide the total sample into two subsamples consisting of genes carrying the M470 or the V470 allele, respectively. Table 1 compares the degree of variability of the CFTR genes in these two subsamples. The CFTR genes carrying the M allele are clearly much more variable than those carrying the V allele for most of the markers suitable for such a comparison (ie those for which a variant allele was found in at least one MM or VV homozygote, respectively), both for the ‘slow’ (mutation rate in the order of 10⁻⁸ to 10⁻⁷; n=39) and for the ‘fast’ (mutation rate 10⁻⁴ to 10⁻²; n=3; ie STRs) evolution markers.16 Thus, the estimate of the overall variability of the CFTR gene is the weighted mean of two very different patterns of variability: that of the CFTR genes carrying the M allele and that of the CFTR genes carrying the V allele, plus, obviously, that due to the M470V site itself. These findings show the existence of an ‘extended haplotype homozygosity’ (EHH) region,17 namely an almost ‘allele-restricted’ monomorphic region involving only the CFTR genes with the V allele. This strong preferential concentration of variability within the M CFTR genes turned out to be correlated with the distance from the 470 site, being stronger in the DNA sequence within about ±50 kb of it (Table 2).

Table 2 The intensity of the M-restricted variability depends on the distance from the M470V site


Sabeti et al17 suggested that an EHH implies recent positive selection, and verified this hypothesis in two genes (G6PD and TNFSF5) known to have been recently subjected to positive selection. Thus, the present finding of an EHH region encompassing the M470V site strongly suggests that the CFTR gene recently underwent selection. This suggestion is in accordance with previous findings of extended homogeneous haplotypes associated with specific CF mutations.18, 19
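As an illustration of the EHH concept, the statistic is commonly defined as the probability that two chromosomes drawn at random from those carrying a given core allele are identical over the extended region. The sketch below implements that definition; the haplotype strings are hypothetical, and the comparison in the present study rests on the heterozygosities of Table 1 rather than on this statistic.

```python
from collections import Counter
from math import comb


def extended_haplotype_homozygosity(haplotypes):
    """Probability that two randomly drawn chromosomes (without replacement)
    carry the same extended haplotype."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical extended haplotypes (sites 32 and 60) on V and M chromosomes
v_chromosomes = ["ta"] * 9 + ["ga"]
m_chromosomes = ["ta", "ga", "tg", "gg", "ga", "tg"]
print(extended_haplotype_homozygosity(v_chromosomes))
print(extended_haplotype_homozygosity(m_chromosomes))
```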

Some features of the CFTR gene suggest a possible scenario for the selection process:

  1) M470 is the ancestral allele: it is the allele found in all the other species studied so far;20, 21, 22

  2) M470 is almost fixed among sub-Saharan Africans: the combined V frequency in the three sub-Saharan African populations we have studied (Mossì, Burkina Faso, n=146 individuals; Ewondo, Ghana, n=10; Pygmies, CAR, n=10; unpublished data) was 0.02±0.01. This high prevalence of M470 presumably applies to all sub-Saharan populations;

  3) the V470 allele is very frequent outside Africa, where it is even more common than the M allele (eg among Europeans1 and Asians23);

  4) the bulk of CFTR gene variability is restricted to the haplotypes carrying the M470 allele (Table 1).
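The uncertainty attached to the African V frequency in the list above can be checked with a short calculation. It is an assumption here that the ± value is a binomial standard error computed on the number of genes (two per individual); the text does not say how it was obtained.

```python
from math import sqrt


def allele_frequency_se(freq: float, n_individuals: int) -> float:
    """Binomial standard error of an allele frequency estimated
    from 2 * n_individuals genes (diploid sample)."""
    n_genes = 2 * n_individuals
    return sqrt(freq * (1 - freq) / n_genes)


N_AFRICANS = 146 + 10 + 10   # the three sub-Saharan samples in the text
V_FREQ = 0.02

se = allele_frequency_se(V_FREQ, N_AFRICANS)
print(f"V frequency = {V_FREQ} ± {se:.3f}")
```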

The time elapsed since the radiation of H. sapiens from Africa to the rest of the Old World (only two to four thousand generations24) has been far too short for genetic drift alone25 to account for such a large increase in the V allele frequency. Therefore, a selective process seems more likely. As for the time of onset of the selection process, the great extension of the region of almost complete LD encompassing the M470V site suggests that it is recent (see also Slatkin and Bertorelle26). We suggest the involvement of a selectively advantageous mutation X that would have caused, in relatively few generations, the increase of the V allele frequency outside Africa.

The V allele frequency could have increased through one of three possible mechanisms; the first relates to the V mutation itself, while the other two involve the hitchhiking phenomenon:27

  1) X is V. The V allele is very common outside Africa because it has been advantageous only there. There are indications that the V allele might produce a less functional protein. The V470 allele has, in fact, been reported to have a 1.7 times lower intrinsic chloride channel activity,28 as confirmed by different studies.23, 29, 30 It might have conferred a selective advantage in particular environments, as suggested for CF heterozygotes and tuberculosis,31 or lung infections,32 or diarrhea caused by enterotoxic bacteria;33, 34

  2) X is not V, and was already present in Africa, in CFTR genes with the V allele, before the migration of H. sapiens, but it did not confer any selective advantage. Its frequency increased dramatically in Europe following human exposure to different environmental conditions that made it advantageous;

  3) X is not V, and arose in Europe, in one CFTR gene with the V allele, before the migration of H. sapiens towards Asia.

The first two possibilities require the additional hypothesis that the African V was carried by only one haplotype, while the third possibility is independent of the number of different African haplotypes carrying the V allele.

A choice among these three possibilities would require, at least, ascertaining the variability, if any, of the African haplotypes carrying the V allele.


  1. Modiano G, Bombieri C, Ciminelli BM et al: A large-scale study of the random variability of a coding sequence: a study on the CFTR gene. Eur J Hum Genet 2005; 13: 184–192.
  2. Bombieri C, Giorgi S, Carles S et al: A new approach for identifying non-pathogenic mutations. An analysis of the cystic fibrosis transmembrane regulator gene in normal individuals. Hum Genet 2000; 106: 172–178.
  3. WHO Report: The molecular genetic epidemiology of cystic fibrosis, 2004, www.who.int/genomics/publications/en/.
  4. Le Marechal C, Audrezet MP, Quere I, Raguenes O, Langonne S, Ferec C: Complete and rapid scanning of the cystic fibrosis transmembrane conductance regulator (CFTR) gene by denaturing high-performance liquid chromatography (D-HPLC): major implications for genetic counselling. Hum Genet 2001; 108: 290–298.
  5. Schneider S, Roessli D, Excoffier L: Arlequin ver. 2.000: a software for population genetics data analysis. Switzerland: Genetics and Biometry Laboratory, University of Geneva.
  6. Modiano G, Battistuzzi G, Motulsky AG: Nonrandom patterns of codon usage and of nucleotide substitutions in human alpha- and beta-globin genes: an evolutionary strategy reducing the rate of mutations with drastic effects? Proc Natl Acad Sci 1981; 78: 1110–1114.
  7. Stephens JC, Schneider JA, Tanguay DA et al: Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001; 293: 489–493.
  8. Strachan T, Read AP: Human Molecular Genetics, 3rd edn. London and New York: Garland Science, 2004.
  9. Cooper DN, Krawczak M: Human gene mutation. Oxford: BIOS Scientific Publishers, 1993.
  10. Cooper DN, Krawczak M, Antonarakis SE: The nature and mechanisms of human gene mutation; in Scriver C, Beaudet AL, Sly WS, Valle D (eds): Metabolic and molecular bases of inherited disease. New York: McGraw Hill, 1995, pp 259–291.
  11. Reich DE, Cargill M, Bolk S et al: Linkage disequilibrium in the human genome. Nature 2001; 411: 199–204.
  12. Goldstein DB: Islands of linkage disequilibrium. Nat Genet 2001; 29: 109–111.
  13. Gabriel SB, Schaffner SF, Nguyen H et al: The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
  14. Race RR: The Rh genotypes and Fisher's theory. Blood 1948; 3 (suppl 2): 27–42.
  15. Race RR, Sanger R: Blood groups in man. Oxford: Blackwell Scientific Publication, 1958.
  16. Jobling MA, Hurles ME, Tyler-Smith C: Human evolutionary genetics: origins, peoples and disease. New York: Garland Science, Taylor & Francis Group, 2004.
  17. Sabeti PC, Reich DE, Higgins JM et al: Detecting recent positive selection in the human genome from haplotype structure. Nature 2002; 419: 832–837.
  18. Dork T, Neumann T, Wulbrand U et al: Intra- and extragenic marker haplotypes of CFTR mutations in cystic fibrosis families. Hum Genet 1992; 88: 417–425.
  19. Sereth H, Shoshani T, Bashan N, Kerem BS: Extended haplotype analysis of cystic fibrosis mutations and its implications for the selective advantage hypothesis. Hum Genet 1993; 92: 289–295.
  20. Tucker SJ, Tannahill D, Higgins CF: Identification and developmental expression of the Xenopus laevis cystic fibrosis transmembrane conductance regulator gene. Hum Mol Genet 1992; 1: 77–82.
  21. Vuillaumier S, Kaltenboeck B, Lecointre G, Lehn P, Denamur E: Phylogenetic analysis of cystic fibrosis transmembrane conductance regulator gene in mammalian species argues for the development of a rabbit model for cystic fibrosis. Mol Biol Evol 1997; 14: 372–380.
  22. Wine JJ, Glavac D, Hurlock G et al: Genomic DNA sequence of Rhesus (M. mulatta) cystic fibrosis (CFTR) gene. Mamm Genome 1998; 9: 301–305.
  23. Lee JH, Choi JH, Namkung W et al: A haplotype-based molecular analysis of CFTR mutations associated with respiratory and pancreatic diseases. Hum Mol Genet 2003; 12: 2321–2332.
  24. Klein RG: The human career. Human biological and cultural origins. Chicago: The University of Chicago Press, 1989.
  25. Neuhauser C: Mathematical models in population genetics; in Balding DJ et al (eds): Handbook of statistical genetics. New York: John Wiley & Sons, 2001, pp 153–177.
  26. Slatkin M, Bertorelle G: The use of intraallelic variability for testing neutrality and estimating population growth rate. Genetics 2001; 158: 865–874.
  27. Wagener DK, Cavalli-Sforza LL: Ethnic variation in genetic disease: possible roles of hitchhiking and epistasis. Am J Hum Genet 1975; 27: 348–364.
  28. Cuppens H, Lin W, Jaspers M et al: Polyvariant mutant cystic fibrosis transmembrane conductance regulator genes. The polymorphic (Tg)m locus explains the partial penetrance of the T5 polymorphism as a disease mutation. J Clin Invest 1998; 101: 487–496.
  29. Noone PG, Pue CA, Zhou Z et al: Lung disease associated with the IVS8 5T allele of the CFTR gene. Am J Respir Crit Care Med 2000; 162: 1919–1924.
  30. Wei L, Vankeerberghen A, Jaspers M, Cassiman J, Nilius B, Cuppens H: Suppressive interactions between mutations located in the two nucleotide binding domains of CFTR. FEBS Lett 2000; 473: 149–153.
  31. Meindl RS: Hypothesis: a selective advantage for cystic fibrosis heterozygotes. Am J Phys Anthropol 1987; 74: 39–45.
  32. Pier GB, Grout M, Zaidi TS et al: Role of mutant CFTR in hypersusceptibility of cystic fibrosis patients to lung infections. Science 1996; 271: 64–67.
  33. Gabriel SE, Brigman KN, Koller BH, Boucher RC, Stutts MJ: Cystic fibrosis heterozygote resistance to cholera toxin in the cystic fibrosis mouse model. Science 1994; 266: 107–109.
  34. Quinton PM: Human genetics. What is good about cystic fibrosis? Curr Biol 1994; 4: 742–743.



This work was funded by the Italian Ministry of University and Research; the Italian Ministry of Health, ‘National Project for Standardization and Quality Assurance of Genetic Tests’ (D.lg 505/92); the Italian Cystic Fibrosis Research Foundation; MZCR IGA 1A/8236-3, 00000064203 and EC-CF Chip and CRMGEN to MM.

Author information

Correspondence to Pier Franco Pignatti.




Table A1 Frequencies of the 61 CFTR cSNSs
Table A2 Frequencies of the 19 CFTR gene variants other than cSNSs


Keywords


  • CFTR
  • random variability
  • allele-restricted haplotype variability
