Cystic fibrosis (CF) is the most common lethal genetic disease in Europeans and is transmitted as an autosomal recessive disorder caused by loss of function of the Cystic Fibrosis Transmembrane conductance Regulator (CFTR) gene. It has long been supposed that its rather high frequency (1/2500, implying an allele frequency of 0.02) is due to a Europe-restricted selective advantage of heterozygous CF/+ individuals. Nevertheless, this hypothesis has never been directly demonstrated, nor has the hypothetical selective factor been identified. Here, we provide compelling evidence that the CF/+ advantage hypothesis is right. As to the identification of the selective factor, because the CFTR protein is a Cl− channel (and hence involved in water excretion), diarrhea appears to be a reasonable candidate for such a role. Cholera has, in fact, been suggested, but its area of diffusion was not limited to Europe. Considering that initially all humans were lactose intolerant in post-weaning life and that Europeans became cattle breeders and adopted a dairy-milk diet, we propose that the hypothetical Europe-restricted CF/+ advantage consisted of a resistance to lactose-caused diarrhea in populations that adopted such a habit while they were still lactose intolerant. This initial emergency adaptation, which carried a segregational load, would have been followed by a not costly one consisting of the present European quasi-fixed post-weaning lactase persistence (due to the dominant P allele of the lactase, LCT, gene), resulting in lactose tolerance. This hypothesis is supported by the positive correlation between the proportion of the F508del allele among the CF alleles and the frequency of the P allele (recently identified with the T allele 14 kb upstream of the LCT gene), and it provides a possible explanation for the well-known north–south cline of the F508del allele frequency.

Both adaptations would be examples of genetic responses to a culture-rooted adverse factor.

CF is the most common (1/2500 newborns) severe autosomal recessive disease in Europe. The global frequency of the CF-causing alleles of the CFTR gene, estimated as the square root of the prevalence of the disease, turned out to be about 0.02 in Europe, 0.01 in Africa and 0.002 in Far-Eastern Asia.1
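As a minimal sketch of the estimate just described, the following Python snippet takes the square root of the disease prevalence to obtain the allele frequency, assuming Hardy-Weinberg proportions. The function names and the carrier-frequency calculation (2pq) are illustrative additions, not part of the original analysis.

```python
import math


def allele_frequency(prevalence: float) -> float:
    """Recessive-allele frequency q, assuming prevalence = q**2 (Hardy-Weinberg)."""
    return math.sqrt(prevalence)


def carrier_frequency(q: float) -> float:
    """Expected frequency of CF/+ heterozygotes, 2pq, under the same assumption."""
    return 2.0 * q * (1.0 - q)


q_europe = allele_frequency(1 / 2500)
print(round(q_europe, 4))                     # 0.02
print(round(carrier_frequency(q_europe), 4))  # 0.0392
```

Under these assumptions, roughly 1 in 25 Europeans would be a CF/+ carrier.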

The high CF allele frequency in Europe could only be accounted for by a ‘Europe-restricted’ high mutation rate or by a ‘Europe-restricted’ heterozygous CF/+ advantage. The most plausible explanation seems to be the latter, mainly because a ‘Europe-restricted’ high mutation rate was judged extremely unlikely, although it was never directly disproved. In addition, the biological basis of the hypothetical ‘Europe-restricted’ CF/+ heterozygous advantage is far from being identified because of the lack of a convincing candidate adverse factor: bacteria-driven diarrheas – proposed on the basis of the functional properties of the CFTR protein – do not seem to have had ‘Europe-restricted’ histories, a fundamental prerequisite for a good candidate.

In this paper, first we definitively reject the ‘Europe-restricted’ high mutation rate (μ+ → CF) hypothesis; then we propose a ‘Europe-restricted’ biological factor that could account for the CF/+ heterozygous advantage.

Discarding the ‘Europe-restricted’ high μ+ → CF hypothesis

The mutation rates CFTR$^{+}$ → CFTR$^{CF}$ to be postulated in order to account, in terms of mutation only, for the present CF gene frequencies are $4 \times 10^{-4}$ for Europe, $10^{-4}$ for Africa and $4 \times 10^{-6}$ for Far-Eastern Asia.
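The three rates quoted above coincide with what the standard mutation-selection balance for a fully recessive lethal allele would give, namely μ ≈ q². That this relation underlies the quoted figures is our assumption. A minimal sketch, with a hypothetical helper name, follows.

```python
def required_mutation_rate(q: float, s: float = 1.0) -> float:
    """Mutation rate needed to hold a recessive deleterious allele at frequency q.

    Assumes the textbook mutation-selection balance q = sqrt(mu / s),
    with s = 1 for a lethal allele. This relation is an assumption here.
    """
    return s * q ** 2


# Allele frequencies quoted in the text for the three regions.
frequencies = {"Europe": 0.02, "Africa": 0.01, "Far-Eastern Asia": 0.002}

for region, q in frequencies.items():
    print(f"{region}: mu ~ {required_mutation_rate(q):.0e}")
```

The large spread between the Far-Eastern and European rates is what makes the mutation-only explanation appear implausible.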

A by-product of the ‘Europe-restricted’ high μ+ → CF hypothesis would be the continuous and rapid substitution of the pre-existing ‘old’ CF alleles with an equal number of ‘new’ CF alleles; this process would have taken place according to the function

$$\left(\frac{F_{old}}{F_{total}}\right)_{t} = \left(\frac{F_{old}}{F_{total}}\right)_{t=0} e^{-0.02t},$$

where $F_{old}$ is the proportion of the ‘old’ CF alleles among all the CF alleles (the frequency of which is $F_{total}$); $t$ is the time, expressed in terms of generations, elapsed from an arbitrary moment ($t=0$) when all the CF alleles were ‘old’ (thus $F_{old}/F_{total}$ was 1); and 0.02 is the rate of substitution of the ‘old’ CF alleles by an equal number of ‘new’ CF alleles, per generation.
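To give a feeling for how fast this replacement would proceed, the decay function can be evaluated numerically. This is a minimal sketch in which the function names and the chosen time point (100 generations) are illustrative; only the rate of 0.02 per generation comes from the text.

```python
import math

SUBSTITUTION_RATE = 0.02  # per generation, as stated in the text


def old_allele_fraction(t_generations: float,
                        initial_fraction: float = 1.0,
                        rate: float = SUBSTITUTION_RATE) -> float:
    """Fraction of CF alleles that are still 'old' after t generations."""
    return initial_fraction * math.exp(-rate * t_generations)


def half_life_generations(rate: float = SUBSTITUTION_RATE) -> float:
    """Number of generations after which half of the 'old' alleles are replaced."""
    return math.log(2) / rate


print(round(half_life_generations(), 1))    # 34.7
print(round(old_allele_fraction(100), 3))   # 0.135
```

Under the high-mutation hypothesis, half of the original CF alleles would thus be replaced in about 35 generations, and fewer than 14% would survive after 100 generations.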

The ‘Europe-restricted’ heterozygous advantage would instead result in a substantially complete maintenance of the ‘old’ CF alleles, namely of their original spectrum and haplotype relationships.

Fundamental – and testable – implications are that the high μ+ → CF hypothesis expects that the present spectrum of the CF alleles, besides being drastically different from the original one, consists of a very large number of different CF alleles, each extremely rare, and distributed, on the whole, at random with respect to the alleles of the polymorphic sites of the CFTR gene. On the contrary, the heterozygous advantage hypothesis expects that the present CF genes spectrum reflects ‘faithfully’ the original one in terms of both composition and haplotypic relationships. In other words, the few ancient CF alleles should be still common and largely located in their original haplotype.

To choose between these two sets of expectations, we have analyzed the data reported in Ciminelli et al.,2 where 358 non-F508del CF alleles had been studied in order to ascertain their association with the alleles of the highly polymorphic M470V site.

Two aspects of the data have been considered:

  a) the percentage of the recent (mainly the singletons) and of the ancient CF alleles (all the others) and

  b) the extent of their preferential association with the M allele.

Table 1 indicates that at least 90% of the CF alleles are ancient and that they are associated with the M allele 14 times (21.2/1.5) more frequently than the recent CF mutations. Both sets of results are incompatible with the ‘Europe-restricted’ high μ+ → CF hypothesis.

Table 1 Estimate of the relative proportions of RECENT and ANCIENT CF alleles and of the extent of the preferential association with the M470 allele

Therefore, these data conclusively disprove the ‘Europe-restricted’ high mutation rate hypothesis, thus proving, by exclusion, that the high CF gene frequency is due to a CF/+ heterozygous advantage.

Nature of the CFTR+/CFTRCF heterozygote advantage

The CFTR protein is the Cl− ion channel responsible for the displacement of that ion, and consequently of water, from the mucosa epithelial cells into the respiratory and intestinal lumen. It is therefore reasonable to believe that heterozygosity for a CFTR functionally silent allele (=a CF allele) makes the intestinal epithelial cells less prone to secrete Cl− ions and water, as shown in the mouse model,5 although not confirmed in humans.6 Chloride transport in vivo in CF heterozygotes is often below 50%, and at times normal,7 corresponding to the severity of the known mutations.8

Thus, considering that one of the events required to develop any type of diarrhea is the accumulation of water in the intestinal lumen and that the most severe diarrheas have been caused by cholera, it has been suggested that:

  1) the CF heterozygotes are more resistant to the diarrhea-causing factors, and this resistance would have conferred on them a selective advantage sufficient to maintain the CF alleles at frequencies very high for a lethal allele;9, 10, 11, 12

  2) the main diarrhea-causing factor has been tentatively identified with cholera epidemics.13

Here, we propose to accept only the first of the two hypotheses, namely that the cause (hereafter named adaptogen) of the CF/+ balanced polymorphism has been a diarrhea-causing factor. As for the second hypothesis, as there is no stringent indication that Europeans have been preferentially exposed to cholera (and/or allied diseases), it appears reasonable to assign the role of major candidate adaptogen not to cholera but to any Europe-restricted, intense and long-lasting diarrhea-causing factor. We hypothesize that the role of major adaptogen has been played by the dairy-milk diet adopted by Europeans when they were still lactose intolerant. In fact, the vast majority of human populations, like all other mammals, are lactose intolerant in post-weaning life; lactose tolerance is a feature restricted to the European populations, especially to those of North-Western Europe, where it is near fixation.14, 15 According to our hypothesis, when the cattle-breeding populations of Europe began to make use of milk in post-weaning life, they were lactose intolerant, and the balanced polymorphism for the CF alleles represented an ‘emergency’ adaptation (with a high cost/benefit ratio due to the high segregational load) to this dietary change. Clearly, the truly effective (etiological), and ultimately successful, adaptation to the milk diet has been a dramatic increase, even reaching fixation in North-Western Europe, of lactase persistence, that is, lactose tolerance, as suggested previously.16, 17, 18, 19

Table 2 lists the main properties required to make a factor a good candidate for having played the role of adaptogen and compares cholera and lactose intolerance for these properties. Cholera and allied enterobacteria-caused diarrheas should be discarded mainly because of the lack of a ‘Europe-restricted’ distribution. Dairy milk plus lactose intolerance, instead, appears to be a better candidate.

Table 2 Comparison between two diarrhea-causing factors, cholera and dairy milk plus lactose intolerance, for their compliance to the prerequisites of an ideal adaptogen candidate in the CF/+ heterozygous advantage

Obviously, the selective ‘usefulness’ of such an ‘emergency’ adaptation should have progressively diminished, eventually vanishing, as the frequency of lactose tolerance increased.

The question then arises: ‘Why was the onset of the lactase-persistence adaptation so much delayed with respect to the CF ‘emergency’ adaptation?’

A reasonable answer is that CF alleles, being ‘loss of function’ alleles, were certainly present when the selective pressure due to the combined effect of lactose intolerance and dairy milk started, whereas lactose tolerance, a sophisticated phenotypic modification (a single gene ontogenetically regulated in all mammals to be expressed only until weaning had to be ‘convinced’ to continue to work through post-weaning life), was not so easily available or was even absent in the exposed populations.

In summary, the most economical scenario to explain two ‘Europe-restricted’ features, the extremely high frequency of lactose tolerance and the high frequency of the CF alleles, consists in considering them as two sides of the same coin: two profoundly different adaptations to the same stringent, culture-rooted adaptogen, the abundant intake of lactose during post-weaning life. Both adaptations would avoid severe diarrhea attacks following lactose ingestion. Lactase persistence acts by preventing the arrival of lactose in the colon and the consequent hyperfermentation owing to its utilization by the intestinal bacteria; CF heterozygosity would instead act by reducing the diarrheagenic effect of the hyperfermentation. The time course of events might have been the following:

  a) Initially all humans were lactose intolerant and the cumulative CF gene frequency was maintained at a value of about 0.002 by a mutation rate μ+ → CF ≈ $4 \times 10^{-6}$ (as in Japan).

  b) 5000–10 000 years ago cattle breeding and a post-weaning dairy-milk diet started,19, 20 resulting in a stringent selective pressure.

  c) An ‘emergency’ adaptive response set in, consisting of an immediate expansion of the already available CF alleles (the antiquity of the F508del allele has been estimated to be 52,000 years21). They would have attained and maintained an unknown frequency, possibly higher than the present one, resulting from a balance between the advantage of being more resistant to the adverse consequences of being unable to digest lactose (hence being able to benefit from an important food such as milk) and a heavy segregational load.

  d) The not costly lactase-persistence adaptation set in and rapidly attained a very high frequency (up to fixation in some European areas).14, 15

  e) The CF gene frequency progressively decreased down to the present (still high) values, which can be considered a relic of the past ‘emergency’ adaptation.
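The balance invoked in step c) can be illustrated with the textbook one-locus model of heterozygote advantage. This is only a minimal sketch under assumptions that are not in the text: relative fitnesses of 1 − s for +/+, 1 for CF/+ and 0 for CF/CF, random mating, and no mutation. The selection coefficient s is hypothetical, and no value of s is estimated in this paper.

```python
def next_q(q: float, s: float) -> float:
    """One generation of selection on the CF allele frequency q.

    Assumed relative fitnesses: +/+ = 1 - s, CF/+ = 1, CF/CF = 0 (lethal).
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q  # CF/CF contributes nothing
    return p * q / mean_fitness


def equilibrium_q(s: float) -> float:
    """Equilibrium CF allele frequency under the assumed fitness scheme."""
    return s / (1.0 + s)


def simulate(q0: float, s: float, generations: int) -> float:
    """Iterate next_q from a starting frequency q0 for a number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s)
    return q


# Heterozygote advantage that this toy model would need to hold q at 0.02.
s_needed = 0.02 / (1.0 - 0.02)
print(round(s_needed, 4))                 # 0.0204
print(round(equilibrium_q(s_needed), 4))  # 0.02
```

In this toy model, an advantage of about 2% for CF/+ over +/+ would be enough to hold the CF alleles at 0.02 despite the lethality of the homozygotes. According to steps c) to e), the past frequency may have been higher and the present one may still be declining, so the present value need not be an equilibrium.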

The fact that the global CF gene frequency in Africa, where some ancient pastoralist cultures do occur, is believed to be about 0.01 – a value intermediate between those of Europe and Japan – would be in favor of the present hypothesis if a positive correlation between pastoralism and CF gene frequency distribution in Africa were found. However, owing to the lack of information about the spectrum of CF alleles in African populations and the low prevalence of the disease, a search in this direction could only be based on extensive epidemiological surveys, not to mention diagnostic problems.

The molecular basis of the lactase-persistence phenotype (consisting of a T/C polymorphism 14 kb upstream of the LCT gene, with the dominant T allele conferring lactase persistence) has been identified recently,22 so that directly testing the hypothesis by comparing CFTR+/CFTR+ with CFTR+/CF subjects, all with the C/C lactose-intolerant genotype, would be a feasible, but not an easy, task.

Geographic aspects

Both LCT*T and F508del alleles are monophyletic.23, 24 This is equivalent to saying that each arose in a single geographic area where, being selectively advantageous, it attained a relatively high frequency and was then exported to other areas. The spectrum of the potentially adaptive alleles of the LCT gene is likely to be very limited, so the fact that the T allele was the one selected was a priori a very likely event; on the contrary, as the potential spectrum of pathologic (CF) alleles is extremely large, it appears likely that the F508del allele was the successful one by pure chance.

The question then arises: ‘Why are the various European populations all adapted to a dairy-milk diet mainly with the same CF allele (F508del) instead of each being adapted with its own CF allele(s)?’

The simplest and most economical explanation is that a dairy-milk diet became established in a single area and remained restricted to that area for a period of time sufficient to allow the T and the F508del alleles to attain high values. Then, in a second phase, the population of that area exported to the rest of Europe its dairy-milk diet culture together with the two adaptive genes, that is, the adaptogen and the two genetic adaptations to it. These two alleles would have then been amplified in the recipient populations because of their adaptive value owing to the co-imported dairy milk diet.

As to the identification of the ‘donor area’, there is no doubt that by far the best candidate is Northern Europe, where these alleles have their highest frequencies. This is perfectly in line with the proposal of Beja-Pereira et al.25 (based on data on six milk protein genes of cows), according to which dairy cow breeding was first developed in Northern Europe.

This interpretation is consistent with the highly significant (P<0.001) positive correlation between the LCT*T frequency and the proportion, among the CF alleles, of the F508del allele, as shown in Figure 1, and would account for the well-known F508del allele frequency north–south cline.

Figure 1 Positive correlation between F508del and LCT*P. Data for different European populations from Swallow and Hollox14 and Bobadilla et al.26 LCT*P indicates the dominant lactase-persistence allele identified at the phenotypic level (ability to digest lactose); at a first approximation it can be considered to coincide with the LCT*T allele identified at the DNA level.15

It may be worth noting that the rest of Europe has not reached the level of the ‘donor area’ for either the T allele or the F508del allele frequency, but it has done so for the combined CF allele frequency (about 0.02 in the whole of Europe) through the complementary (to 0.02) accumulation of CF alleles other than F508del. This can be explained by considering that, whereas it would have been difficult to mount an adaptation at the LCT level (in addition to that provided through importation), there were a number of opportunities for adaptation through CF alleles other than F508del.