INTRODUCTION

MUTYH-associated polyposis (MAP) (OMIM no. 608456; http://www.ncbi.nlm.nih.gov/omim/) is an autosomal recessive colorectal polyposis syndrome caused by mutations of the base excision repair gene MUTYH.1, 2

MUTYH is located on chromosome 1p34.3–p32.1 and consists of 16 exons, with a coding sequence of 1.65 kb (OMIM no. 604933; reference sequence NM_001128425.1). In recent years, its mutation spectrum has been described in series of MAP patients from different countries and with varying phenotypes (www.lovd.nl/MUTYH). In populations of European origin, the missense mutations c.536A>G;p.Tyr179Cys and c.1187G>A;p.Gly396Asp are by far the most common disease-causing variants, accounting for 50–82% of the MUTYH mutations identified in MAP patients.3, 4 p.Tyr179Cys and p.Gly396Asp have been identified in Sephardi Jews but not in Ashkenazi Jews, Finns or Far Eastern Asians. Moreover, other variants play a larger role in other populations, pointing to ethnic and geographic differences in the pattern of MUTYH mutations.5 Screening of large series has shown that the combined p.Tyr179Cys and p.Gly396Asp allele frequency in healthy controls of European origin ranges from 0.2 to 1.14% (Table 1). Similar frequencies are found in unselected series of colorectal cancer (CRC) patients (Table 1).

Table 1 p.Tyr179Cys and p.Gly396Asp allele frequencies in MAP and CRC patients and in controls from different populations

These findings raise questions concerning the origin and spread of the p.Tyr179Cys and p.Gly396Asp mutations among European populations. In the present study, we have addressed this by performing haplotype analysis in 80 MAP families from Italy and Germany segregating one or both common MUTYH mutations. The estimated ages of the two mutations are consistent with the existence of ancestral haplotypes on which p.Tyr179Cys and p.Gly396Asp arose several thousand years ago, suggesting that separate founder effects account for the current genetic epidemiology of the two most common MUTYH mutations in individuals of European ancestry.

MATERIALS AND METHODS

A total of 144 genomic leukocyte DNA samples, derived from 31 families with p.Tyr179Cys, 29 with p.Gly396Asp and 20 with both mutations, were genotyped. One to 11 relatives were genotyped in 11 families segregating p.Tyr179Cys, in five families with p.Gly396Asp and in all families segregating both mutations. In the remaining families only the index case was available for genotyping. All families were of European ancestry and originated from different regions of Italy (n=42) or Germany (n=38).

Two hundred and eighty-eight control chromosomes (164 from Italy and 124 from Germany) carrying neither of the two recurrent MUTYH mutations were examined to estimate allele and haplotype frequencies in the general population. Controls included 62 unrelated individuals and 64 parents from 32 father-mother-child trios with no history of CRC, as well as 18 members of MAP families who carried neither recurrent MUTYH mutation.

Six short tandem repeat (STR) markers (D1S421, D1S451, D1S2677, D1S3175, D1S2874 and D1S2824), spanning a region of 3.5 Mb including the MUTYH locus on chromosome 1, were used for haplotype analysis. Primer sequences and conditions are available on request.

Allele frequency distributions in normal and mutated chromosomes were compared by Fisher's exact tests using PASW Statistics 18.03 software (IBM, NY, USA). Bonferroni correction for multiple comparisons was applied by multiplying the nominal P value by the number of alleles tested at each locus. P<0.05 was considered statistically significant. Haplotypes including the six microsatellites were constructed manually so as to minimize the number of recombinations.
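The per-allele comparison described above can be sketched as follows. This is a minimal pure-Python illustration, not the PASW procedure used in the study (in practice a library routine such as SciPy's scipy.stats.fisher_exact would serve). It assumes that each allele is tested as a 2×2 table (allele present vs absent, on MB vs control chromosomes); the function names are illustrative, and the counts in the usage lines are hypothetical, not data from Table 2.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals a, given fixed 2x2 margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni adjustment: nominal P times the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical counts for one allele at one marker (NOT study data):
    # row 1 = 60 mutation-bearing chromosomes, row 2 = 288 control chromosomes;
    # column 1 = carries the allele, column 2 = carries any other allele.
    p_nominal = fisher_exact_two_sided(40, 20, 50, 238)
    # Hypothetical: 6 alleles tested at this locus.
    p_adjusted = bonferroni(p_nominal, 6)
    print(f"nominal P = {p_nominal:.3g}, Bonferroni-adjusted P = {p_adjusted:.3g}")
```

Exact integer binomial coefficients keep the hypergeometric probabilities accurate even for the few hundred chromosomes involved here.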

Mutation ages were estimated with DMLE+2.2 software (www.dmle.org)54 using the following parameters: haplotypes of microsatellite loci in mutation-bearing (MB) and control chromosomes; and map distances between markers and the mutation site, inferred from the physical distances (in Mb) given in the UCSC Genome Browser (GRCh37/hg19 assembly) assuming 1 Mb=1 cM. The European population growth rate (genr) was estimated with the formula genr=ln(Pt/P0)/g, in which genr is the population growth rate per generation, Pt is the estimated present population size, P0 is the estimated population size at a reference time and g is the number of generations between these two time points. The total European population currently comprises 502 489 100 people (http://europa.eu/index_it.htm). Historical and demographic data indicate that about 19 000 000 people lived in this area in 400 BC.55 Accordingly, assuming 25 years/generation,56 the average genr of this population from 400 BC to the present was estimated to be 0.034. The calculations were repeated using the genr value estimated for the last 20 000 years, 0.0075.57 The proportion of MB chromosomes sampled (fc) was first calculated separately for Italy and Germany from the following parameters: current Italian (60 742 397 inhabitants, http://demo.istat.it/) and German population sizes (82 438 000 inhabitants, www.destatis.de/); p.Tyr179Cys and p.Gly396Asp allele frequencies in Europeans of 0.3% and 0.7%, respectively;35, 42 and the number of chromosomes investigated in the study. Mutation age was calculated both separately for the two countries and on combined Italian and German data, using the average fc value estimated for the two populations (0.00014 for p.Tyr179Cys and 0.000043 for p.Gly396Asp).
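As a worked check of the growth-rate formula above, the sketch below recomputes genr from the two European population sizes given in the text. The reference year for the present-day population is not stated; 2010 is assumed here, which gives g=(400+2010)/25=96.4 generations. The result rounds to 0.034 for any present year between the mid-1970s and the early 2040s, so the assumption does not affect the rounded value. The function name is illustrative.

```python
import math


def growth_rate_per_generation(p_now: float, p_then: float, generations: float) -> float:
    """genr = ln(Pt / P0) / g, i.e. continuous exponential growth per generation."""
    return math.log(p_now / p_then) / generations


YEARS_PER_GENERATION = 25   # generation time used in the text
P_NOW = 502_489_100         # current European population (from the text)
P_400BC = 19_000_000        # estimated population in 400 BC (from the text)
PRESENT_YEAR = 2010         # ASSUMPTION: reference year is not given in the text

# Generations from 400 BC to the present (approximate; ignores the missing year zero).
g = (400 + PRESENT_YEAR) / YEARS_PER_GENERATION
genr = growth_rate_per_generation(P_NOW, P_400BC, g)

if __name__ == "__main__":
    print(f"g = {g:.1f} generations, genr = {genr:.4f}")
    # Sanity check: growing P_400BC at rate genr for g generations recovers P_NOW.
    print(f"back-projected population = {P_400BC * math.exp(genr * g):,.0f}")
```

The natural logarithm is what reproduces the reported value; a base-10 logarithm would give roughly 0.015 instead.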

RESULTS

Allele and haplotype analysis of polymorphic markers flanking the MUTYH locus was performed on the 288 control chromosomes, on MAP probands and, when possible, on their relatives. For both mutations, some proband chromosomes were excluded because their phase could not be determined. Overall, 29 families (17 from Italy and 12 from Germany) with probands homozygous for p.Tyr179Cys or p.Gly396Asp, 19 (9 from Italy and 10 from Germany) with probands heterozygous for either mutation and 13 (4 from Italy and 9 from Germany) with compound heterozygous probands for both mutations were used for allele frequency estimation and haplotype analysis of chromosomes carrying either MUTYH variant.

Statistically significant differences in allele frequencies between wild-type chromosomes and p.Tyr179Cys or p.Gly396Asp chromosomes were observed for the markers closest to MUTYH. In particular, D1S421, D1S451, D1S2677 and D1S3175 showed highly significant differences for p.Tyr179Cys chromosomes, and D1S2677 and D1S3175 for p.Gly396Asp chromosomes (Table 2).

Table 2 Frequencies of the most common microsatellite alleles of the MUTYH region in p.Tyr179Cys and p.Gly396Asp cases and in controls

A common haplotype (148-252-167-157) spanning 0.76 Mb was found in 46 (76%) p.Tyr179Cys chromosomes (Figure 1). In addition, the genotypes observed in three of the six remaining p.Tyr179Cys heterozygotes whose phase at D1S451 was undefined were compatible with this shared haplotype. Seven additional haplotypes (148-252-169-157, 144-252-165-157, 148-250-167-157, 146-252-167-157, 148-256-167-157, 148-252-167-161 and 148-252-167-165, the last observed in four chromosomes) segregated with p.Tyr179Cys on the remaining chromosomes; each differed from the common haplotype at no more than two markers.
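The statement that each minor p.Tyr179Cys haplotype differs from the common one at no more than two markers can be checked by counting allele mismatches position by position. The sketch below takes the haplotypes exactly as listed above, with alleles in the order given in the text, and assumes they are fully phased; the helper name is illustrative.

```python
def marker_differences(h1: str, h2: str) -> int:
    """Number of STR markers at which two dash-separated haplotypes differ."""
    a1, a2 = h1.split("-"), h2.split("-")
    if len(a1) != len(a2):
        raise ValueError("haplotypes must cover the same markers")
    return sum(x != y for x, y in zip(a1, a2))


COMMON = "148-252-167-157"  # shared p.Tyr179Cys haplotype
MINOR = [
    "148-252-169-157", "144-252-165-157", "148-250-167-157",
    "146-252-167-157", "148-256-167-157", "148-252-167-161",
    "148-252-167-165",
]

distances = {h: marker_differences(COMMON, h) for h in MINOR}

if __name__ == "__main__":
    for hap, d in distances.items():
        print(f"{hap}: differs at {d} marker(s)")
    print("maximum:", max(distances.values()))
```

Only 144-252-165-157 differs at two markers; the other six minor haplotypes differ from the common one at a single marker.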

Figure 1

Haplotype branching trees of p.Tyr179Cys (right) and p.Gly396Asp (left) chromosomes in Italian (top panels) and German (bottom panels) families. Markers are shown according to their position on the physical map (UCSC Genome Browser, Feb. 2009, GRCh37/hg19). Boxed numbers indicate STR alleles belonging to the ancestral haplotypes; for p.Gly396Asp, the two most common haplotypes (A and B) are indicated by solid and dashed lines, respectively; non-boxed numbers correspond to additional alleles detected in the study families. Family codes are shown above or below the alleles in which they were identified; for each family, the number of chromosomes carrying the corresponding allele is indicated as ×1 (one copy) or ×2 (two copies).

Two distinct haplotypes (haplotypes A and B) covering a region of 0.22 Mb predominated on p.Gly396Asp chromosomes (Figure 1). Haplotype A (175-157) was found on 21 (42%) and haplotype B (177-157) on 19 (38%) p.Gly396Asp chromosomes. Five other haplotypes (179-157, 181-157, 175-161, 167-157 and 169-157), each differing from haplotype A or B at a single marker, were observed on the remaining p.Gly396Asp chromosomes; all except 175-161 carry the 157 allele.
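To see how the five minor p.Gly396Asp haplotypes relate to haplotypes A and B, the sketch below finds, for each, the closer core haplotype and the number of mismatched markers, and reports whether it carries the 157 allele. It assumes, as the Discussion implies, that the second allele of each two-marker haplotype is at D1S3175; when a haplotype is equidistant from A and B, the sketch simply reports A. Names are illustrative.

```python
# Two-marker p.Gly396Asp haplotypes as listed in the text (second position
# assumed to be D1S3175).
CORE = {"A": ("175", "157"), "B": ("177", "157")}
MINOR = ["179-157", "181-157", "175-161", "167-157", "169-157"]


def closest_core(haplotype: str) -> tuple[str, int]:
    """Return the nearest core haplotype and its number of mismatched markers."""
    alleles = tuple(haplotype.split("-"))
    return min(
        ((name, sum(a != b for a, b in zip(alleles, core))) for name, core in CORE.items()),
        key=lambda item: item[1],  # ties resolve to the first core listed (A)
    )


summary = {hap: (*closest_core(hap), hap.split("-")[1] == "157") for hap in MINOR}

if __name__ == "__main__":
    for hap, (core, mismatches, has_157) in summary.items():
        print(f"{hap}: {mismatches} mismatch(es) vs haplotype {core}; carries 157: {has_157}")
```

Every minor haplotype lies one marker away from a core haplotype, and 175-161 is the only one lacking the 157 allele.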

Neither haplotype associated with p.Gly396Asp was observed in controls, and the p.Tyr179Cys-associated haplotype was present in only 2 out of 288 control chromosomes.

p.Tyr179Cys age was estimated as 278 generations (g) (95% credible set (CS): 234–360), corresponding to 6950 years (95% CS: 5850–9000), in the Italian population, and 304 g (95% CS: 261–399), corresponding to 7600 years (95% CS: 6525–9975), in the German population with genr 0.034. Combined data from both populations gave an age estimate of 305 g (95% CS: 271–418), corresponding to 7625 years (95% CS: 6775–10 450) (Figure 2a–c).

Figure 2

Posterior probability distribution plots of p.Tyr179Cys and p.Gly396Asp ages (in generations), as estimated by DMLE+2.2 software using a population growth rate (genr) of 0.034. (a) Age estimate in the Italian population for p.Tyr179Cys (fc=0.00016); (b) age estimate in the German population for p.Tyr179Cys (fc=0.00012); (c) age estimate for p.Tyr179Cys using combined data from both populations (fc=0.00014); (d) age estimate in the Italian population for p.Gly396Asp (fc=0.00004); (e) age estimate in the German population for p.Gly396Asp (fc=0.000047); (f) age estimate for p.Gly396Asp using combined data from both populations (fc=0.000043). The vertical broken lines indicate the 95% credible set values.

Using the same genr value, p.Gly396Asp age estimates were 300 g (95% CS: 270–397), corresponding to 7500 years (95% CS: 6750–9925) and 347 g (95% CS: 306–456), corresponding to 8675 years (95% CS: 7650–11 400), for the Italian and German populations, respectively. Combined data for the two populations gave an estimate of 350 g (95% CS: 313–435), corresponding to 8750 years (95% CS: 7825–10 875) (Figure 2d–f).

The following estimates were obtained for the combined German and Italian populations when the genr was set at 0.0075: 1245 g (95% CS: 1035–1486) for p.Tyr179Cys and 1309 g (95% CS: 1172–1579) for p.Gly396Asp (data not shown); these correspond to 31 125 years (95% CS: 25 875–37 150 years) and 32 725 years (95% CS: 29 300–39 475 years), respectively.
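All age conversions in this section assume 25 years per generation. As a simple consistency check, the sketch below converts each reported generation estimate and its 95% credible set into years; the dictionary layout and names are illustrative.

```python
YEARS_PER_GENERATION = 25

# (point estimate, lower 95% CS, upper 95% CS) in generations, as reported above,
# keyed by (mutation, sample, genr).
ESTIMATES_G = {
    ("p.Tyr179Cys", "Italy", 0.034): (278, 234, 360),
    ("p.Tyr179Cys", "Germany", 0.034): (304, 261, 399),
    ("p.Tyr179Cys", "combined", 0.034): (305, 271, 418),
    ("p.Gly396Asp", "Italy", 0.034): (300, 270, 397),
    ("p.Gly396Asp", "Germany", 0.034): (347, 306, 456),
    ("p.Gly396Asp", "combined", 0.034): (350, 313, 435),
    ("p.Tyr179Cys", "combined", 0.0075): (1245, 1035, 1486),
    ("p.Gly396Asp", "combined", 0.0075): (1309, 1172, 1579),
}


def to_years(generations: tuple[int, ...]) -> tuple[int, ...]:
    """Convert generation counts to years at a fixed generation time."""
    return tuple(g * YEARS_PER_GENERATION for g in generations)


years = {key: to_years(g) for key, g in ESTIMATES_G.items()}

if __name__ == "__main__":
    for (mutation, sample, genr), (point, lo, hi) in years.items():
        print(f"{mutation} {sample} (genr={genr}): {point} years (95% CS {lo}-{hi})")
```

Every converted value matches the year figures reported in the text.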

DISCUSSION

Biallelic MUTYH germline mutations account for a considerable proportion of attenuated forms of colorectal adenomatous polyposis. To date, around 300 different MUTYH mutations have been identified in MAP patients worldwide (www.lovd.nl/MUTYH). Regional and ethnic differences in the mutation spectrum reported in MUTYH screening projects since 2002, together with the higher frequency of specific mutations in some populations, suggest the existence of multiple founder effects.28, 34, 58, 59, 60, 61

p.Tyr179Cys and p.Gly396Asp are the most common MAP-associated MUTYH mutations in populations of European origin. The finding of highly conserved core haplotypes for both mutations in German and Italian MAP patients suggests that the p.Tyr179Cys and p.Gly396Asp chromosomes sampled derive from two ancestral chromosomes on which the mutations arose independently. Whereas a single common haplotype was associated with p.Tyr179Cys, two distinct haplotypes sharing the D1S3175 allele were observed at similar frequencies on p.Gly396Asp chromosomes. The latter finding could be due to an early mutation at marker D1S2677 or to a recombination event. Higher haplotype diversity for p.Gly396Asp is not surprising, as this mutation is more frequent and the age estimates indicate that it is probably older than p.Tyr179Cys.

To investigate the natural history of the two common MUTYH mutations, we estimated their apparent ages using DMLE+2.2 software,54 which is considered to provide the best estimates of gene genealogies and mutation age among available methods.62, 63 Assuming a generation time of 25 years and 1960 as the average year of birth of the subjects enrolled in the present study, our combined results date the p.Tyr179Cys and p.Gly396Asp mutations back 6775–10 450 years (approximately 4815–8490 BC) and 7825–10 875 years (approximately 5865–8915 BC), respectively. This period corresponds to the Neolithic revolution, just before or concurrent with the development of agriculture and the consequent migrations in Europe. An ancient origin of the two mutations is consistent with their even distribution and subpolymorphic frequencies across most European populations.

On the other hand, these results must be interpreted with caution, as they are sensitive to the parameters used, namely the generation time and genr. For instance, a lower genr value, such as 0.0075, the average genr estimated for Europe over the last 20 000 years,57 would date the mutations back to around 30 000 years ago, toward the end of the last Pleistocene glaciation in the Paleolithic. However, this result is less compatible with the absence or very low frequency of these mutations in non-European populations, as other genetic variants that likely date back to that period, such as factor V Leiden, are widespread in other ethnic groups.64

The relationship between the age of a mutation and its frequency in the population from which the identical-by-descent MB chromosomes were sampled is influenced by both historical population growth and natural selection. The allelic coalescence models used to estimate mutation age generally incorporate population growth and selection through a single parameter equal to the sum of the two effects.65 Thus, for any mutation, the effect of selection, if any, cannot be separated from that of demography in shaping the genealogy of MB chromosomes. The frequency of p.Gly396Asp is usually higher than that of p.Tyr179Cys in the general population of European countries; in the largest series investigated so far, involving 5064 control individuals, the p.Tyr179Cys and p.Gly396Asp allele frequencies were 0.3% and 0.7%, respectively.35 Genetic drift and/or selection might account for the differences in frequency and estimated age between the two mutations.

Given the generally late onset of both adenomas and CRC in MAP patients,4, 6, 8, 11, 23 negative selection would be expected to be limited. Nevertheless, a robust correlation between certain MUTYH genotypes and the colorectal phenotype has recently been established.66 Both p.Gly396Asp homozygotes and p.Gly396Asp/p.Tyr179Cys compound heterozygotes present later and have a significantly lower CRC hazard than p.Tyr179Cys homozygotes (P<0.001). Interestingly, whereas the allele frequency in large samples of healthy controls is clearly skewed towards p.Gly396Asp, p.Tyr179Cys predominates in MAP patients (Table 1). In addition, the frequencies of these two mutations in population-based series of CRC patients are similar to those observed in controls of corresponding ethnicity; this is likely due to the milder phenotype of MAP patients not selected for the presence of polyposis.44

These observations indicate that the two mutations have unequal effects on the clinical phenotype and suggest that they may be associated with slightly different selection coefficients. Caution is therefore required when cumulative allele frequencies (from combined samples of the general population and patients) are used to investigate the natural history of a deleterious mutation.

In conclusion, our findings and the analysis of data from the literature support the hypothesis that the high frequency of the p.Tyr179Cys and p.Gly396Asp mutations in MAP patients from the present German and Italian populations is attributable to founder events that occurred early in European history. The possibility that selection had a major role in shaping the allelic genealogy of the two mutations and in determining their present-day frequencies cannot be excluded without further evidence. Comparative haplotype studies of chromosomes bearing the same MUTYH mutations in different populations will refine our knowledge of their origin and spread across the European continent. In addition, the different distribution of MUTYH mutations among populations has implications for mutation detection strategies: a screening approach restricted to p.Tyr179Cys and p.Gly396Asp does not appear appropriate for patients of non-European ethnic background.