To identify variant alleles in MLH1, MSH2 or MSH6 with high sensitivity and specificity, we designed a new high-density oligonucleotide array (HNPCC Chip) that uses improved bioinformatic variant-detection algorithms. The HNPCC Chip covers 14 kb of MLH1-MSH2-MSH6 coding and flanking sequence. The HNPCC Chip effectively detected variants in these three mismatch repair (MMR) genes (Fig. 1), accurately detecting 60 of 60 unique missense or base substitution mutations (data not shown).

Figure 1: Variant allele detection by the HNPCC Chip resequencing array.

(a) Graph of HNPCC Chip. Loss of hybridization analysis for full MLH1 coding sequence and splice junctions. Data are the average of signals obtained for one sense and one antisense chip hybridization with the same individual's genomic DNA. Nucleotide substitutions in individuals with CRC are graphically shown as a signal peak above baseline (ratio of signal for wild-type DNA to DNA from affected individual; WT/AI). Downward spikes below graph were inserted post hoc to facilitate identification of exon boundaries. This DNA was heterozygous with respect to a C→T base substitution in MLH1 exon 14. The boundaries of all 19 MLH1 coding exons are labeled on the ordinate axis. (b) Higher magnification of HNPCC Chip signal analysis for MLH1 exon 14 (108 bp). WT/AI, normalized ratio of chip hybridization signal for fluorescein-labeled wild-type reference DNA divided by the hybridization signal for streptavidin-biotin-phycoerythrin–labeled DNA from an affected individual. The exact position of the C→T substitution is indicated by the arrow in both panels.

We used the HNPCC Chip to look for variants in 35 unrelated Israeli individuals with CRC who had one first-degree relative with CRC. We identified 36 unique variants (Supplementary Table 1 online) but only 1 classic MMR mutation. Of these 36 variants, 20 (55%) are represented in the SNPdb, Environmental Genome Project, Celera, SNP500 or HGBase SNP databases. Another six (17%) MMR variants are in mutation databases (ICG-HNPCC, Cardiff Mutation Database or Myriad Genetics Database and Medline), leaving ten (28%) new MMR variants in these intensively resequenced genes.

We tested association of MMR variant alleles with CRC in a set of 455 cases and 455 matched controls from the Molecular Epidemiology of Colorectal Cancer (MECC) study (Supplementary Table 2 online). Three variants showed allelic association with CRC. MLH1 415C (odds ratio (OR) = 5.0, 95% confidence interval (c.i.) = 0.6–42.8) and MSH6 1613C (OR = 1.87, 95% c.i. = 0.90–3.8) showed trends towards association with CRC (Table 1). In contrast, MSH6 2633C was overrepresented in controls (OR = 0.35, 95% c.i. = 0.14–0.90). To minimize false positive associations, we validated these associations in a set of 844 individuals with CRC and 940 controls that did not overlap with the first set (Table 1). MSH6 1613C was equally frequent in cases and controls (OR = 1.0, 95% c.i. = 0.6–2.0; combined data set OR = 1.4, 95% c.i. = 0.9–2.2). For MSH6 2633C, the validation set showed no CRC risk (OR = 0.80, 95% c.i. = 0.4–1.4; combined data set OR = 0.60, 95% c.i. = 0.4–0.99). MSH6 2633T→C was originally identified in probands with familial and metachronous CRC1,2. It is annotated as a pathogenic mutation in the Celera, HGMD and ICG-HNPCC databases but also occurs in Ashkenazi control subjects3. We cannot exclude the possibility that this variant has a subtle effect, but our large data set suggests that it is probably not associated with CRC risk. Similar results have been observed with putative rare ATM polymorphisms that, when tested in larger sets, are not associated with cancer susceptibility4.

Table 1 Association between the MLH1 415C, MSH6 1613C and MSH6 2633C variant alleles and risk of CRC

In the validation set, the MLH1 415C allele was more frequent in CRC cases (OR = 4.6, 95% c.i. = 1.5–13.9; combined data set OR = 4.6, 95% c.i. = 1.7–12.3). These results suggest that the MLH1 415C allele accounts for 1.3% of CRC in Israel. We identified MLH1 415C in Ashkenazi Jews, Muslim and Christian Arabs, Sephardic Jews, Druze Christians and Bedouins (self-described ethnicities; Fig. 2 and Table 2). Analysis of flanking and intragenic microsatellite markers identified no conserved MLH1 415C haplotype and suggested that Jewish carriers of the MLH1 415C allele are unrelated. The average age of CRC onset in carriers of MLH1 415C is 70.1 years, compared with 20–40 years in individuals with HNPCC. Only 6 of 21 carriers of MLH1 415C had a first-degree relative affected with HNPCC-spectrum tumors (Fig. 2). Multiple primary tumors are also rare. Only one proband had a second primary malignancy (papillary thyroid cancer), and none had metachronous CRC (Fig. 2 and Supplementary Table 3 online). Classic inactivating mutations in MLH1 that underlie HNPCC cause MSI. Notably, in the 11 CRCs from carriers of MLH1 415C available for MSI testing, 9 (82%) did not show MSI (Supplementary Table 4 online). Analogous to the attenuated APC allele encoding the amino acid substitution I1307K5 and the high-penetrance syndrome familial adenomatous polyposis, the clinical characteristics of the 21 carriers of the MLH1 415C allele with CRC in this study more closely resemble those of individuals with sporadic CRC than those of individuals with typical HNPCC (Fig. 2 and Table 2).

Figure 2: Pedigrees of selected carriers of MLH1 415C.

Probands are indicated by the arrows. The self-reported ethnicity of each family is indicated. Current age, or age at death and cause of death (generally cancer of the indicated organ), is indicated. MI, myocardial infarction.

Table 2 Characteristics of population-based validation set and carriers of MLH1 415C with CRC

Mutations in MMR genes are thought to cause CRC by increasing rates of single-base substitutions and frameshift mutations (MSI) or preventing the initiation of apoptosis6. Structure-function and comparative sequence analyses defined essential, evolutionarily conserved MMR domains. In Escherichia coli, a MutS dimer binds mismatched DNA and then forms an ATP-binding-dependent ternary complex with dimeric MutL7. In mammals, MutS homologs (MSH proteins) bind to mismatched DNA. MutL homologs (MLH and PMS proteins) interact with MSH proteins and help to catalyze different MMR functions in an ATPase-dependent manner.

Sequence comparisons of MLH1 from different species show that Asp132 is highly conserved (Fig. 3). The MLH1 N terminus has an ATPase function8. To determine the functional effects of the MLH1 D132H substitution, we expressed the N terminus of MLH1 (amino acids 1–344), containing the ATPase and ATP binding domains, with and without this sequence variant in bacteria as a glutathione S-transferase (GST) fusion protein. Consistent with previous studies, GST alone had essentially no ATPase activity, whereas the wild-type MLH1 N terminus fused to GST had measurable activity8,9,10. The MLH1 D132H substitution attenuated, but did not eliminate, ATPase activity (Fig. 3).

Figure 3: Structural and functional analyses of MLH1 D132H.

(a) Comparison of ATPase activity in MLH1 D132H and wild-type MLH1 in vitro using MLH1 amino acids 1–344 expressed as a GST fusion protein. (b) ClustalW sequence alignment of MLH1 N terminus in different species. MLH1 Asp132 is highlighted in red. (c) Modeling of the MLH1 D132H substitution based on the crystal structures of E. coli MutL and human PMS2. The dimeric structure of the ATPase domain of E. coli MutL in the presence of a nonhydrolyzable ATP analog, ADPnP (PDB: 1B63). The two MutL subunits represented by ribbon diagrams are shown in green and yellow. The ADPnP molecules are represented by ball-and-sticks in red. The secondary structures potentially disturbed by the D132H mutation in MLH1 are highlighted in blue and labeled with three residues (His112, Asp132 and Tyr157) that interact to stabilize the protein structure. (d) Asp132 and its interacting partners, His119 and Tyr164, in human MLH1 are modeled onto the human PMS2 structure (PDB: 1H7U). Protein backbone is represented by Cα traces in blue, the three interacting side chains are shown in green ball-and-sticks, oxygen atoms are shown in red and nitrogen atoms are shown in blue. Asp139 forms a hydrogen bond with Tyr164 and charged interactions with His119. Replacing Asp139 with histidine (shown in gray) would result in a clash of His132 with Tyr164 and perturb the interaction with His119. In turn, these perturbations could affect ATP binding, ATPase and dimer formation as indicated.

To understand more precisely the mechanisms by which MLH1 D132H disrupts protein function, we mapped the substitution onto the crystal structures of MLH1 homologs E. coli MutL and human PMS2. Asp132 is situated in an evolutionarily conserved β-hairpin structure that is part of the ATP binding and hydrolysis domain and buttresses the 'ATP lid'. In the PMS2 structure, a pair of charged residues His139 (equivalent to MLH1 Asp132) and Asp119 (equivalent to MLH1 His112 at the end of the ATP lid) interact to stabilize the ATP lid. In MLH1, these charged residues are swapped, and MLH1 Tyr157, which replaces the sterically smaller serine residue found in PMS2, bridges the interaction between MLH1 Asp132 and His112. Replacing MLH1 Asp132 with histidine interrupts the interactions between Asp132 and His112 (Fig. 3), which in turn destabilizes the ATP lid and reduces the ATPase activity. The importance of the β-hairpin is shown by the MLH1 A128P mutation associated with HNPCC, which disrupts the β-hairpin formation completely rather than destabilizing a specific interaction. The effects of mutations that impair a protein function are often inversely proportional to the spatial distance from the center of action. MLH1 D132H is more distant from the ATPase active site (16 Å) than MLH1 A128P and other mutations that cause HNPCC, such as MLH1 M35R and MLH1 S44F (7–9 Å). Structural analysis therefore predicts that MLH1 D132H will attenuate, but not eliminate, MLH1 ATPase activity.

Perhaps the most notable feature of MLH1 D132H is that it does not cause a strong MSI phenotype like classic MLH1 mutations. Previously, MLH1 K618A and MLH1 E578G were described in probands with HNPCC and CRCs without MSI11,12. But these mutations occur in a region of the protein that is not conserved across multiple species, and thus their interpretation has been controversial13,14,15. An analogous situation is seen with MSH6 alleles. Classic MSH6 mutations cause CRC with MSI16, but a variant in the MSH6 ATP binding site was identified that confers CRC without MSI17.

Our studies provide strong molecular epidemiologic, structural, bioinformatic and functional evidence that MLH1 D132H underlies a new phenotype relevant to 'sporadic' CRC without MSI and suggest that such alleles may account for a higher proportion of population-based susceptibility to CRC than previously realized. Similarly, promoter polymorphisms that cause a partial reduction in MMR gene transcription may contribute to sporadic CRC18.

In individuals carrying MLH1 415G→C mutations, loss of heterozygosity rarely occurs (Supplementary Table 4 online). One explanation is that, as in sporadic CRC, MLH1 promoter hypermethylation of the wild-type allele19 occurs instead of loss of heterozygosity. Alternatively, MLH1 D132H ATP-binding activity is notably increased relative to that of wild-type MLH1 at the same time that ATPase activity is decreased (data not shown). Together, these results suggest that MLH1 D132H bound to ATP may be 'trapped' in a transition state and unable to catalyze ATP hydrolysis. This trapped MLH1 D132H protein complexed to MSH2, MSH6 and PMS2-MLH3 may increase susceptibility to CRC through a dominant-negative mechanism. Future mechanistic studies will be necessary to define more precisely the mechanism of MLH1 D132H action.

Future studies analyzing knock-in mice expressing Mlh1 D132H will be important for understanding precisely the different MMR-mediated mechanisms by which the attenuated allele causes cancer susceptibility. Furthermore, comparing susceptibility to gastrointestinal tumors in mice expressing Mlh1 D132H with that in Mlh1-null mouse models will help to quantify more precisely the cancer risks attributable to inactivation of the non-MSI mismatch repair functions of MLH1.

In the Ashkenazim, founder mutations in BLM, BRCA1, BRCA2, FANCC, MSH2 and APC that are associated with susceptibility to cancer have been described4,6,20,21,22. Together with the APC allele encoding I1307K (10% carrier frequency in Ashkenazim with CRC) and BLMAsh (1.3% carrier frequency), which are associated with susceptibility to CRC without MSI, and the MSH2 allele encoding A636P (0.6% carrier frequency), which is associated with susceptibility to CRC with MSI, the identification of MLH1 415G→C (1.3% carrier frequency) should help identify Ashkenazim at greater risk for CRC. As MLH1 415G→C occurs in various ethnic groups, future population-based studies in other populations will be required to estimate more precisely the risk of CRC associated with this allele.


Israeli individuals with CRC and matched controls.

The MECC study is a population-based, matched case-control study that includes more than 3,000 incident CRC cases and controls. The MECC study participants were previously described22,23. The sampling frame is defined by all men, women and children at risk of CRC who live in the geographic region of northern Israel that includes the Northern and Haifa Districts. Eligible cases include any person newly diagnosed with CRC between 1 January 1999 and 31 March 2004 in this geographic region. Eligible cases were invited to participate and interviewed within 6 months of diagnosis on average. Potential controls are identified from the same sampling frame by generating a list of individuals from the Clalit Health Services database, matched with respect to exact year of birth, sex and clinic code. Ethnicity is evaluated in detail in the interview for better control of this source of variation in the analyses. Individuals previously diagnosed with CRC are not eligible to participate. The study was approved by the Institutional Review Boards at the University of Michigan and Carmel Medical Center in Haifa. Written, informed consent was required for eligibility.

High-density oligonucleotide array-based identification of sequence variants.

We designed a pair of oligonucleotide microarrays (Affymetrix), each containing 250,000 25-nucleotide probes, to interrogate sense and antisense sequences for all possible MLH1, MSH2 and MSH6 sequence variations in the 45 coding exons and >2 kb of total periexonic flanking sequences. These included probes complementary to all possible MLH1, MSH2 and MSH6 single nucleotide substitutions, one-, two-, three- and four-base-pair deletions and one- and two-base-pair insertions. A series of perfect match probes complementary to every 25-nucleotide segment of the ATM coding region are present in quadruplicate for both sense and antisense sequences across the arrays. We prepared genomic DNA from study participants and hybridized the arrays as previously described24. We wrote custom sequence analysis software to analyze the HNPCC Chip using DNA chip sequence normalization algorithms originally tested, and then improved, for BRCA1, BRCA2 and ATM DNA chips. This analysis software is available on request. We set the signals present after normalization to data generated from separate two-color cohybridization experiments (to correct for reproducible fluctuations in the ratio of test and reference perfect match probe hybridization) to a value of one for the exon containing a probe and to zero for all other exons. Two additional quantities reflect multiplicative intra- and intermolecular probe structure normalization scores. These attempt to normalize for inter- and intramolecular array probe hybridization and secondary structures that may also contribute to fluctuations in perfect match probe signal ratios. Similarly, we incorporated an algorithm to predict the potential for duplex formation between adjacent probes (intermolecular structure) within a feature in the array (intramolecular structure). 
The inter- and intramolecular probe structure predictions differ in that complementary nucleotides less than five positions apart are considered only in intermolecular probe structure analysis. This reflects the need for a loop of at least four residues to be present in a stable intramolecular hairpin structure. Finally, we averaged corrected ratios of test and reference perfect match probe hybridization signals against those generated from individual comparisons with data sets generated from multiple HNPCC Chip hybridization experiments. This provides a final set of multiplicative correction factors to minimize systematic fluctuations in hybridization signal ratios. We then plotted the ratios of reference and test perfect match probe signals against their respective nucleotide positions. We scored the loss-of-hybridization peaks according to peak height and width. We confirmed the sequences of all variants identified in study participants by fluorescent dideoxy DNA sequencing (Applera) using PolyPhred sequence analysis software.
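The final scoring step can be illustrated with a minimal sketch. It assumes that normalized reference/test ratios are already available per nucleotide position for the sense and antisense hybridizations, averages the two strands, and reports contiguous runs above a cutoff together with their height and width. The function names, the cutoff of 1.5 and the minimum width of 3 positions are hypothetical choices for illustration only; they are not the parameters of the custom analysis software described above.

```python
from typing import Dict, List, Sequence


def average_strands(sense: Sequence[float], antisense: Sequence[float]) -> List[float]:
    """Average sense and antisense reference/test ratios position by position."""
    if len(sense) != len(antisense):
        raise ValueError("sense and antisense tracks must have the same length")
    return [(s + a) / 2.0 for s, a in zip(sense, antisense)]


def score_peaks(ratios: Sequence[float],
                threshold: float = 1.5,   # hypothetical cutoff
                min_width: int = 3        # hypothetical minimum width
                ) -> List[Dict[str, float]]:
    """Return runs of consecutive positions whose ratio exceeds the threshold."""
    runs = []
    start = None
    for i, r in enumerate(ratios):
        if r > threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:          # run extends to the last position
        runs.append((start, len(ratios) - 1))

    peaks = []
    for s, e in runs:
        width = e - s + 1
        if width >= min_width:
            peaks.append({"start": s, "end": e, "width": width,
                          "height": max(ratios[s:e + 1])})
    return peaks


# Toy data: flat baseline with one elevated stretch (values are invented).
sense = [1.0] * 10 + [1.8, 2.4, 2.0] + [1.0] * 10
antisense = [1.0] * 10 + [1.6, 2.2, 1.8] + [1.0] * 10
peaks = score_peaks(average_strands(sense, antisense))
```

Requiring a minimum width reflects the fact that a single base is covered by several overlapping 25-nucleotide probes, so a genuine substitution would be expected to raise the ratio over a window of positions rather than at an isolated point.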

SNP genotyping.

We genotyped SNPs using Masscode Technology (QIAGEN GmbH), which uses highly multiplexed detection of low molecular weight photocleavable Mass Tags in a single quadrupole mass spectrometer. The assay uses a two-step PCR process for signal amplification and SNP discrimination. Allele-specific incorporation of the Mass Tags occurs during the second amplification step, as previously described25. We carried out SNP assays using 4 ng of genomic DNA. We included positive and negative (no DNA) controls on every 96-well plate.

Statistical genetic analyses of variant allele frequencies.

We carried out matched and unmatched univariate analyses on each individual SNP using conditional and unconditional logistic regression as implemented in SAS version 8.2 (SAS Institute). We analyzed each SNP as presence versus absence of the minor allele.
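As an illustration of the unmatched analysis, the following minimal sketch computes an odds ratio and a Wald 95% confidence interval from a 2×2 table of carriers and noncarriers. With a single binary predictor (presence versus absence of the minor allele), unconditional logistic regression yields the same point estimate and Wald interval. The function name and the counts are hypothetical and are not taken from the study, and the matched (conditional) analysis is not reproduced here.

```python
import math
from typing import Tuple


def odds_ratio_wald(case_carriers: int, case_noncarriers: int,
                    control_carriers: int, control_noncarriers: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for an unmatched 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table (Woolf's formula).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 10 carriers among 400 cases, 5 carriers among 400 controls.
or_hat, ci_low, ci_high = odds_ratio_wald(10, 390, 5, 395)
```

With rare alleles the interval is wide because the standard error is dominated by the small carrier counts, which is consistent with the broad confidence intervals reported for low-frequency variants.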

MSI assays.

We analyzed MSI as previously described23. Tumors from 11 case subjects who carried the 415G→C polymorphism were available for MSI analysis. We extracted normal and tumor DNA from microdissected tissue and analyzed it for the consensus panel of five markers26. We labeled forward primers for Bat 25, Bat 26, D2S123 and D5S346 and reverse primers for D17S250 with [γ32P]ATP and included them in a 20-μl PCR reaction that included 1 μl of microdissected DNA. We separated PCR products on 6% polyacrylamide gels for 3 h at 65 watts and exposed them to film at −80 °C for 12–20 h. We double-scored the films and marked them as stable, unstable or indicating loss of heterozygosity.
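How per-marker scores might be combined into a tumor-level call can be sketched as follows, assuming the widely used convention for the five-marker consensus panel: instability at two or more markers is scored as high-level MSI, at one marker as low-level MSI, and at none as microsatellite stable. The text above does not state the thresholds applied in this study, so the cutoffs, the score labels and the function name are assumptions for illustration.

```python
from typing import Sequence

# Hypothetical labels for the three per-marker outcomes recorded from the films.
VALID_SCORES = {"stable", "unstable", "loss of heterozygosity"}


def classify_msi(marker_scores: Sequence[str]) -> str:
    """Combine per-marker scores into MSI-H, MSI-L or MSS."""
    for score in marker_scores:
        if score not in VALID_SCORES:
            raise ValueError(f"unrecognized marker score: {score!r}")
    # Only instability counts toward the MSI call; loss of heterozygosity
    # is recorded separately and does not count as instability here.
    n_unstable = sum(1 for score in marker_scores if score == "unstable")
    if n_unstable >= 2:
        return "MSI-H"
    if n_unstable == 1:
        return "MSI-L"
    return "MSS"


# Invented example: one of five markers unstable, one showing loss of heterozygosity.
call = classify_msi(["stable", "stable", "unstable",
                     "loss of heterozygosity", "stable"])
```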

Microsatellite genotyping for conserved haplotype analysis and loss of heterozygosity studies.

We genotyped four microsatellite markers (5′–3′: D3S1298, D3S1277, D3S1611 and D3S3527) in and around MLH1 in 24 carriers and 21 noncarriers of the 415C allele to determine whether this allele was part of a conserved haplotype. Genotyping for D3S1277 was done in the University of Michigan Sequencing Core. Genotyping for D3S1298, D3S1611 and D3S3527 was optimized in the laboratory using [γ32P]ATP-labeled forward primers with the annealing temperatures 54 °C (D3S1298), 60 °C (D3S1611) and 56 °C (D3S3527). We separated the PCR products on a 6% polyacrylamide gel for 3–4 h at 65 watts and exposed them at −80 °C for 12–20 h. We analyzed individuals from the Centre d'Etude du Polymorphisme Humain (CEPH) family 1347 with known genotypes alongside the study participants and used their results to double-score the genotypes.

Mapping of MLH1 415G→C onto MutL and PMS2 N-terminal crystal structures.

First, we superimposed the structures of E. coli MutL and human PMS2 ATPase domains, which showed that the ATPase active site, including the β-hairpin that contains MLH1 Asp132, is conserved. PMS2 His139, equivalent to MLH1 Asp132, forms a hydrogen bond with Asp119, which is equivalent to MLH1 His112. Therefore, we replaced His139 of PMS2 (using the ONO graphic program27 on a Silicon Graphics workstation) with an aspartate (Asp132) and replaced Asp119 of PMS2 with a histidine (His112). We used the amino acid rotamer library to optimize the side chain conformations. We noticed that Tyr157 of MLH1 would be in the vicinity of Asp132 and His112 based on the E. coli MutL structure, which contains a tyrosine at the equivalent position rather than a serine as found in PMS2. The resulting model of MLH1 structure (Fig. 3d) is a composite of human PMS2 (all the backbone atoms and side chains of Asp132 and His112) and E. coli MutL (Tyr157 position).

ATPase biochemical studies of protein sequences of wild-type and D132H MLH1.

We amplified the sequences encoding amino acids 1–344 of wild-type and D132H MLH1 using Pfx Polymerase (Invitrogen) and ligated them in-frame to the C terminus of GST (pGEX KT; Amersham Pharmacia), with a stop codon immediately following the MLH1 coding sequences. We purified GST fusion proteins to complete homogeneity, as assessed by Coomassie-stained acrylamide gels, as previously described28. We carried out comparative ATPase activity assays using equal amounts of protein for the wild-type and mutated allele and quantified and analyzed them as previously described29,30.

Note: Supplementary information is available on the Nature Genetics website.