Introduction

Lynch syndrome (LS) is the most common inherited cause of colorectal cancer (CRC). The estimated population frequency is 1:370 to 1:2,000 in Western populations1,2. LS is also associated with increased lifetime risk of several other cancer types including endometrial and ovarian cancer3,4. LS prevalence has never been determined across an entire nation.

Microsatellite instability (MSI) and mismatch repair deficiency (dMMR) are hallmarks of LS-related CRC. About 15% of CRC exhibit dMMR5 with 2–3% caused by germline mutations in the MLH1, MSH2, MSH6, PMS2 or EPCAM genes6 while 12% of CRC cases have somatic inactivation of MLH1 via promoter hypermethylation (MLH1-hm)7. Furthermore, double somatic mismatch repair (MMR) mutations may explain up to 67% of dMMR CRC cases without LS or MLH1-hm8.

Several groups in Europe and the United States have recommended universal screening of CRC9,10,11,12 using MSI testing or immunohistochemistry for the MMR proteins to identify potential LS cases. MLH1-hm can be assessed directly or inferred by the presence of a somatic BRAF V600E mutation13.

Due to the low frequency of individual LS mutations and heterogeneity in phenotypic expression, it has proven difficult to accurately establish population-based prevalences and to assess the cancer penetrance of LS gene mutations. Furthermore, numerous reported variants of uncertain clinical significance (VUS) complicate genetic counseling. Recently, a collaborative effort was undertaken to reclassify MMR variants in the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database (http://insight-group.org/variants/database/)14.

The Icelandic population offers several advantages for studying the genetic epidemiology and associated cancer risks of LS15. These include (i) a nationwide cancer registry dating back to 1955; (ii) universal tumour banking since 1935; (iii) documentation of the entire population’s genealogy over centuries; and (iv) the isolation and relative homogeneity of the population which enhances the potential to discover founder mutations. To date, the DNA of over 150,000 Icelanders has undergone whole-genome analysis on microarray platforms. Additionally, 8,453 Icelanders have undergone whole-genome sequencing (WGS). Using long-range phasing, these sequence variants, down to a frequency of <0.01%, have been imputed into the genomes of those genotyped and into un-genotyped close relatives identified via genealogic databases (familial imputation). This familial imputation allows the inclusion of genotypes for cancer cases that were diagnosed decades ago, greatly enhancing the power of this study.

The objective of this study was to investigate the prevalence of LS in the Icelandic population and to establish the different etiologies for dMMR in a population-based CRC cohort. Tumours from all CRC cases diagnosed from 2000–2009 were screened for dMMR using immunohistochemistry and methylation assays and patients were genotyped for MMR variants extracted from the deCODE database. Unexplained dMMR cases underwent germline WGS and, if still unexplained, tumour ColoSeq. Furthermore, variants extracted from the deCODE database were merged with the cancer registry to estimate the cancer penetrance of LS gene mutations and cancer association of VUS’s found in the Icelandic population. We find that MSH6 and PMS2 mutations prevail in the population with an LS prevalence of 1 in 226, the highest reported so far.

Results

LS founder mutations in the Icelandic population

Association studies, using WGS data in the Icelandic population and information on all cancer types, revealed three LS mutations that showed significant association with CRC and endometrial cancer (Table 1). All three mutations are present at allelic frequencies >0.05%, presumably reflecting a founder effect in the population. One mutation in MSH6 (p.Leu585Pro, class 3 per InSiGHT) and another in PMS2 (p.Pro246Cysfs*3, class 5 per InSiGHT) were found in 9 and 12 patients with dMMR CRC, respectively (see below). A further mutation in PMS2 (p.Met1?; pathogenic mutation in National Center for Biotechnology Information (NCBI) ClinVar16) was identified in the population but not in any CRC patients diagnosed during 2000–2009. These mutations were imputed and the prevalence of LS in the Icelandic population determined to be 0.442% or one in 226 individuals. The imputation quality was tested by direct genotyping of the three variants. The concordance between imputed and directly measured genotypes was 1.00 for PMS2 p.Pro246Cysfs*3 and PMS2 p.Met1? and 0.99 for MSH6 p.Leu585Pro (Supplementary Table 1).

Table 1 Lynch syndrome mutations in the Icelandic population.

Patient characteristics

A total of 1,208 patients were diagnosed with colorectal carcinoma in Iceland from 2000–2009, with 1,182 (97.8%) included in this study. Patient characteristics are shown in Table 2. All patients with abnormal immunohistochemistry and 78.2% of patients with normal immunohistochemistry had germline DNA available for genotyping (80.6% of the cohort).

Table 2 Patient and tumour characteristics in the colorectal cancer cohort.

LS and dMMR in the CRC cohort

MMR immunohistochemistry was abnormal in 132 patients (11.2%; Table 3). MLH1-hm was found in 90 cases (7.6% of the cohort). Twenty-one patients with abnormal immunohistochemistry and six patients with normal immunohistochemistry had LS mutations (Table 1 and Supplementary Table 2). Eight different LS mutations were identified, including the MSH6 p.Leu585Pro and PMS2 p.Pro246Cysfs*3 mutations described above, four private mutations (found in a single family each) and two mutations in PMS2, not found in other family members or the population (de novo or very recent). The prevalence of LS in the CRC cohort was therefore 2.3% (27/1182). The median age at CRC diagnosis of patients with the MSH6 p.Leu585Pro mutation was 60 (Q1 51, Q3 73; range 41–75) and the median age of patients with the PMS2 p.Pro246Cysfs*3 mutation was 60.5 (Q1 50.5, Q3 70; range 31–86). One patient with PMS2 tumour loss had a class 3 PMS2 variant (p.Glu705Lys). Tumour ColoSeq identified the same variant as well as likely loss of heterozygosity (LOH) in the tumour (Table 4). We, therefore, believe this variant is pathogenic and classified this case as having LS. Tumour ColoSeq was performed on 5 of 6 tumour samples from patients with LS who had normal MMR IHC (Table 4). Second hits were found in 3 cases with an MSH6 p.Leu585Pro mutation but not in 2 cases with a PMS2 p.Pro246Cysfs*3 mutation. Twenty-one patients with dMMR tumours had neither an LS mutation nor MLH1-hm. Sixteen patients were found to have double somatic MMR mutations by ColoSeq tumour testing (1.4% of the cohort; Table 4). Five dMMR cases remained unexplained: three cases had one somatic MSH2 mutation and tested negative for MSH2 methylation, suggesting an unidentifiable MSH2 mutation in germline or tumour DNA, and two cases failed tumour testing due to low tumour DNA content. None of these patients had a convincing family history of cancer.
By using the 953 individuals (80.6% of the cohort) that had germline genotyping as a denominator, the sensitivity and specificity of IHC in detecting LS was 77.8% and 97.7%, respectively. The positive and negative predictive value of abnormal IHC predicting LS was 50.0% and 99.3%, respectively.
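The four performance figures can be reproduced from a 2×2 table over the 953 genotyped patients. The sketch below is one reconstruction that matches the reported values; it assumes (an inference, not stated in the text) that tumours whose abnormal IHC was explained by MLH1-hm were counted as screen-negative, leaving 132 − 90 = 42 screen-positive cases. The function and variable names are illustrative only.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }

genotyped = 953              # patients with germline genotyping
ls_total = 27                # LS cases in the CRC cohort
tp = 21                      # LS cases with abnormal IHC
fn = ls_total - tp           # LS cases with normal IHC (6)

# Assumption: MLH1-hm cases are treated as screen-negative.
screen_positive = 132 - 90   # abnormal IHC without MLH1-hm (42)
fp = screen_positive - tp    # screen-positive, no LS mutation (21)
tn = genotyped - tp - fn - fp

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Under this assumption the table is TP = 21, FN = 6, FP = 21 and TN = 905, which sums to the 953 genotyped patients.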

Table 3 Colorectal cancer cases with abnormal mismatch repair protein immunohistochemistry.
Table 4 Somatic mismatch repair gene mutations as tested by ColoSeq.

Origin of LS mutations

The PMS2 p.Pro246Cysfs*3 mutation carriers in Iceland share a short haplotype with individuals from Sweden, Britain and the US (Supplementary Table 3), indicating that this mutation arose from a common ancestor, previously dated back 1,625 years17. Three private LS mutations in the CRC cohort and one private mutation outside of the CRC cohort were identified (Table 1 and Supplementary Fig. 1). The MSH6 mutations (p.Val282Thrfs*10, p.Phe1088Leufs*5, p.Arg1172Lysfs*5) and the MSH2 mutation (p.Tyr815*) were traced back to ancestors in the 1700–1800s by genotyping individuals who shared haplotypes around the mutation. Interestingly, a novel balanced translocation, T(3p22;5q31), with breakpoints in MLH1 intron 3 and ZCCHC10, was identified by WGS and confirmed by karyotyping and fluorescence in situ hybridization (FISH) in a family with high incidence of cancer (Fig. 1). This translocation was found in a parent and was inherited by several offspring with nearly 100% cancer penetrance (colorectal, endometrial, gastric, ovarian and renal cell carcinoma; three offspring developed two cancers) but was not found in the 8,453 Icelanders who have undergone WGS.

Figure 1: Novel MLH1 translocation.

(a) The top half of the figure displays enlarged GRCh38 reference sequences from within MLH1 on chromosome 3 (blue) and from the reverse complement of chromosome 5 within ZCCHC10 (red) in a patient with CRC with MLH1/PMS2 absent on IHC. The bottom shows the translocation breakpoints observed in the sequencing data. The translocation connects an intron of MLH1 with an intron of ZCCHC10 in sense of the genes’ orientations. At the breakpoint, a 3-bp duplication is present on both translocation haplotypes (dark blue), a 5-bp deletion is missing from both translocation haplotypes (dark red), and a 10-bp motif from nearby got inserted in reverse complemented orientation into one of the translocation haplotypes (yellow). (b) FISH, chromosome 3 (red CY3), chromosome 5 (green fitc). Chromosomes 3 and 5 are marked with white arrows. The scale shown is 5 μm. (c) Family tree with cancers and age at diagnosis. Proband is indicated with an arrow.

LS and associated cancer risks

The three imputed mutations (MSH6 p.Leu585Pro, PMS2 p.Pro246Cysfs*3 and PMS2 p.Met1?) vary in risk for different LS-associated cancer types as shown in Table 5. The MSH6 p.Leu585Pro mutation is associated with a nearly 50% lifetime risk of endometrial cancer and a 36% and 25% risk of CRC in males and females as well as a 12% lifetime risk of brain cancer (glioma). The two PMS2 mutations are associated with a significantly increased risk of endometrial, colorectal and ovarian cancer, although the overall cancer risks are lower than for the MSH6 mutation. Risks for other cancers were not significantly increased and are shown in Supplementary Table 4.

Table 5 Icelandic Lynch syndrome mutations and odds ratios for different cancers.

Variants of uncertain significance

We identified 13 MMR gene variants previously described as VUS’s and ten novel variants, in the Icelandic population. These were imputed and odds ratios for CRC estimated. None of these variants were associated with cancer risk or unexplained dMMR suggesting they are benign variants. These variants are listed in Table 6.

Table 6 Mismatch repair gene variants of uncertain significance and associated colorectal cancer risk in the Icelandic population.

Discussion

In this comprehensive nationwide study, WGS and imputation were combined with dMMR screening of CRC to accurately determine the prevalence of LS in the Icelandic population. LS causes 2.3% of all CRC in Iceland which is similar to the prevalence in two CRC population-based studies from Finland19 and the US6 but it is higher than LS prevalence in Southern Europe20,21,22,23. The distribution of mutations among the MMR genes in Iceland is unique. LS is predominantly caused by mutations in MSH6 and PMS2 which are responsible for 96% of all LS cases in the CRC cohort while mutations in these two genes cause 28% of LS-associated CRC in the US6. Only a single mutation in MSH2 and a single novel MLH1 translocation were found in the population.

The low rates of MLH1 and MSH2 mutations constitute a negative founder effect, that is, the founders may not have introduced MLH1 and MSH2 mutations into the population. Genetic drift, which is more common in smaller populations, may also have influenced low rates of MLH1 and MSH2 mutations. Furthermore, the higher cancer penetrance and earlier onset associated with MLH1 and MSH2 mutations may have impacted reproductive fitness. Over 50 mutations showing founder effects have been described across the world and in some populations they have a large effect on the LS gene distribution24. In Finland, two MLH1 mutations cause >50% of all LS cases19,25 and in the Netherlands26 and Sweden27, MSH6 mutations are unusually highly prevalent. The PMS2 p.Pro246Cysfs*3 mutation in Iceland shares a haplotype with cohorts from Sweden, US and Britain17. This mutation was previously dated back to around 1,625 years ago17, a time before Iceland was settled and it is likely to have entered the gene pool via one of the original settlers. The MSH6 p.Leu585Pro and PMS2 p.Met1? mutations have been described in InSiGHT and NCBI ClinVar but are not known to be founder mutations in any population. The MSH6 p.Leu585Pro mutation is clearly pathogenic in this study with a strong cancer risk association. The PMS2 p.Met1? mutation has a cancer risk association indicating that it is a pathogenic mutation. In addition, four private LS mutations were found and traced back to a single common ancestor in each family who lived in the 1700–1800s.

The MLH1 germline translocation described is to our knowledge the first case of a translocation causing LS. The translocation is passed to offspring and affected family members had close to 100% cancer penetrance. Screening for translocations where LS is suspected and germline mutations cannot be found should be considered.

The population-based prevalence of LS has been estimated in CRC and endometrial cancer studies in several countries but no country has determined the prevalence empirically. It has been postulated that LS prevalence is higher than estimated in most studies1 and here we show the highest prevalence so far of 0.442% or 1:226 individuals. Of note, the prevalence is higher than elsewhere even though the proportion of LS in CRC is just 2.3%. This is due to the markedly lower cancer penetrance of the MSH6 and PMS2 mutations in Iceland as compared with the penetrance of MLH1 and MSH2 mutations, which dominate in most populations.

One of the strengths of this study is the ability to determine cancer risk for each of the imputable mutations. This will help tailor genetic counseling in Iceland, specific to the individual mutation. The MSH6 p.Leu585Pro mutation is associated with a high (50%) lifetime risk of endometrial cancer while the PMS2 mutations (p.Pro246Cysfs*3 and p.Met1?) have a lower lifetime risk of cancer. One of the limitations of this study is the small size of the population, which makes it difficult to estimate the risk of rare cancers such as brain cancer (confidence intervals around odds ratios are wide).

Our two-pronged approach, of linking population-based MMR variants to a dMMR CRC phenotype, imputing these variants into the population and performing an association analysis with the cancer registry, strengthens the assumption of pathogenicity (or lack thereof). The MSH6 p.Leu585Pro mutation is described as a class 3 VUS in InSiGHT but, based on our data, should be reclassified as a pathogenic mutation. Thirteen other VUS’s were not associated with a dMMR phenotype or increased cancer risks. Some of these variants can be reclassified as benign variants based on our data (variants with high population frequency and low odds ratios with narrow 95% confidence intervals), while others may require additional data before reclassification.

The incidence of dMMR via immunohistochemistry (11.2%) was lower than reported in many studies. In this study, an etiology was found for nearly all dMMR cases, leaving very few unexplained dMMR cases (only 0.4% of all cases remained unexplained). As 80% of all cases were genotyped, sensitivity and specificity could be calculated for the immunohistochemistry method. In three cases with the MSH6 p.Leu585Pro germline mutation, the stains were weak but present in >1% of cells (Supplementary Table 2) and a second tumour mutation was found on ColoSeq (Table 4) indicating that these tumours developed as a result of LS. In two PMS2 p.Pro246Cysfs*3 cases with strong stains ColoSeq did not detect a second tumour mutation so these patients developed a sporadic CRC unrelated to their germline mutation. Sporadic tumour development is likely more common with the lower penetrance mutations in MSH6 and PMS2.

The population-based incidence of double somatic MMR mutations in CRC is described in this study for the first time. In total, 1.4% of all CRC had double somatic MMR mutations. These cases followed the more traditional pattern of MMR mutations with 81.3% of the mutations occurring in MLH1 and MSH2. The cancer etiology in three cases with just one somatic hit in MSH2 remains unclear. The family history was unconvincing so it is unlikely that an LS mutation was missed. Nevertheless, such cases present a challenge for genetic counseling. The tumour location in LS patients was left-sided or rectal in 50% of cases while it was predominantly right-sided in the MLH1-hm and double somatic MMR cases (>80%). It is possible that left-sidedness is more common with MSH6 and PMS2 mutations as compared with MLH1 and MSH2 mutations. Less than 10% of patients with dMMR CRC presented with metastatic disease, similar to other studies on dMMR CRC. A male predominance was seen in the LS group and a female predominance in the MLH1-hm group, while the double hit sporadic tumours had a more even gender distribution.

In conclusion, this study has mapped the prevalence of LS in the Icelandic population and its associated cancer risks, and established that one in 226 Icelanders carries this syndrome. Furthermore, the contribution of different etiologies to dMMR CRC has been described. It is likely that among the population of 320,000 Icelanders, >1,000 individuals carry mutations causing LS. Identifying these individuals and establishing a cancer screening program are imperative next steps. Thirteen class 3 variants were not associated with an increased cancer risk and one class 3 variant was determined to be pathogenic; these results could guide counseling of patients with these variants. Finally, we have shown that a combined approach of phenotypic screening and population-based WGS can be used to extensively map out an inherited syndrome with a well-recognized phenotype.
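The arithmetic behind the "one in 226" and ">1,000 individuals" statements can be checked directly from the reported carrier prevalence of 0.442% and the population size of 320,000. The following is a minimal sketch; the variable names are illustrative and the carrier count is a point estimate only, with no confidence interval attempted.

```python
prevalence_percent = 0.442      # reported LS prevalence in the population
population_size = 320_000       # population size quoted in the text

prevalence = prevalence_percent / 100

# Express the prevalence as "one in N" individuals.
one_in_n = 1 / prevalence

# Expected number of carriers in the whole population.
expected_carriers = population_size * prevalence

print(f"1 in {one_in_n:.0f}")
print(f"expected carriers: {expected_carriers:.0f}")
```

The point estimate of roughly 1,400 carriers is consistent with the more conservative ">1,000" wording used above.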

Methods

CRC patient population

All patients diagnosed with CRC in Iceland from 1 January 2000 to 31 December 2009 as identified through the Icelandic Cancer Registry (http://www.krabbameinsskra.is) were included in the study. Figure 2 depicts the study design and reasons for exclusion. The Icelandic National Bioethics Committee (VSNb2013010033/03.15), the Icelandic Data Protection Authority (2013010109TS), and the Ohio State University (OSU) Institutional Review Board (2013C0144) approved this study. Descriptive statistics (median with quartiles for age and frequency for categorical variables) were provided to summarize the patient population. Cases where the origin of the primary tumour was regarded to be non-colonic (appendix, ileum and so on) or unknown were excluded. Charts were accessed and clinical information obtained. Foreigners and patients without available tumour material were excluded. In cases of synchronous CRC, only the tumour with the most advanced stage or stage subdivision was analysed. In cases of metachronous CRC, the tumour diagnosed in the defined period was used (the first one if there were two).

Figure 2: Study design.

CRC, colorectal cancer; IHC, immunohistochemistry; LS, Lynch syndrome; WGS, whole-genome sequencing.

Immunohistochemistry and MLH1 hypermethylation testing

Haematoxylin and eosin stained tumour slides (obtained from Landspitali University Hospital, Akureyri Hospital and Histopathology Institute, Sudurlandsbraut) were reviewed and formalin-fixed paraffin embedded tissue blocks selected for further analysis. Tissue microarrays were built at the OSU Pathology Core Facility with two 1.0 mm cores from each case. Immunoperoxidase staining was performed using primary antibodies for MLH1 (Novocastra, Buffalo Grove, IL; NCL-L-MLH-1; Clone:ESO5; diluted 1:500), MSH2 (Calbiochem, [Merck Biosciences AG], Basel-Land, Switzerland; NA-27; Clone:FE11; diluted 1:3,000), MSH6 (Epitomics Inc, Burlingame, CA; AC-0047; Clone:EP49; diluted 1:800) and PMS2 (BD Pharmingen, San Jose, CA; 556415; clone:A16-4; diluted 1:300). Stains were scored as present with convincing nuclear staining in tumour cells with a positive internal control. In cases where only biopsies were available or rectal cancer cases where pre-radiation therapy biopsy samples were chosen, biopsies were stained for MSH6 and PMS2. If either was lost, stains were performed for MLH1 (if PMS2 lost) and MSH2 (if MSH6 lost). In cases scored as absent where no MLH1-hm or germline mutation was found, stains were repeated on a whole section (if previously done on tissue microarrays). In cases found to have LS with normal immunohistochemistry, stains were repeated on a whole section.
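The two-stain strategy used for biopsy material can be summarized as a simple decision rule. The sketch below only illustrates the reflex-staining logic described above; the function name and boolean inputs are hypothetical and not part of the study protocol.

```python
def reflex_stains(msh6_present: bool, pms2_present: bool) -> list:
    """Return the follow-up stains for a biopsy first stained for MSH6 and PMS2.

    PMS2 loss triggers an MLH1 stain; MSH6 loss triggers an MSH2 stain.
    """
    follow_up = []
    if not pms2_present:
        follow_up.append("MLH1")
    if not msh6_present:
        follow_up.append("MSH2")
    return follow_up


# Example: a biopsy with PMS2 loss and retained MSH6 staining.
print(reflex_stains(msh6_present=True, pms2_present=False))
```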

Tumours with MLH1 loss were tested for MLH1-hm by pyrosequencing using the Pyromark Q96 CpG MLH1 kit (QIAGEN, Hilden, Germany). Up to five 1.0 mm cores were used for tumour DNA and germline DNA extraction by standard methods. Methylation levels of ≥15% were classified as positive for hypermethylation. The Pyromark Q96 CpG MLH1 kit (QIAGEN, Hilden, Germany) tests hypermethylation in four sites of the promoter region of MLH1, positions −209 to −188. Polymerase chain reaction is used to amplify the region and the degree of methylation of four CpG sites is analysed in a single pyrosequencing reaction by taking the average of four sites. In cases where only biopsy tissue was available and MLH1-hm testing failed, BRAF immunohistochemistry for the V600E mutation was performed as previously described using the VE1 antibody (1:700, incubation 15 min; Spring Bioscience, Pleasanton, CA) with an automated staining system (Bond Autostainer; Leica Microsystems, Buffalo Grove, IL)28. BRAF V600E immunohistochemistry was graded as positive for the mutation when diffuse cytoplasmic staining in tumour cells was seen.
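The hypermethylation call described above (average of four CpG sites, positive at ≥15%) can be written as a short rule. This is a minimal sketch: the function name, the input format (a list of four per-site percentages) and the example values are assumptions for illustration, not output of the PyroMark software.

```python
from statistics import mean

HYPERMETHYLATION_THRESHOLD = 15.0  # percent, as stated in the text
N_CPG_SITES = 4                    # CpG sites assayed by the kit


def is_mlh1_hypermethylated(cpg_percent):
    """Classify a tumour from per-site methylation percentages.

    The four site values are averaged and compared with the 15% cut-off.
    """
    if len(cpg_percent) != N_CPG_SITES:
        raise ValueError("expected methylation values for four CpG sites")
    return mean(cpg_percent) >= HYPERMETHYLATION_THRESHOLD


# Hypothetical readings: mean 42.5% is positive, mean 4.0% is negative.
print(is_mlh1_hypermethylated([40.0, 45.0, 38.0, 47.0]))
print(is_mlh1_hypermethylated([3.0, 5.0, 4.0, 4.0]))
```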

Detection of LS mutations

The whole genomes of 8,453 Icelanders, irrespective of cancer status, were sequenced using Illumina technology to a mean depth of at least 10×, unveiling 31.6 million single-nucleotide polymorphisms (SNP) and short insertions/deletions that meet stringent quality criteria. These variants were imputed into 150,656 Icelanders whose DNA had been genotyped with various Illumina SNP chips and phased using long-range phasing29,30.

All patients with dMMR tumours had germline DNA genotyped for MMR variants found by WGS of the 8,453 Icelanders. If no LS mutations were identified, WGS was undertaken on blood samples with Illumina technology or, in cases where blood DNA was not available, DNA from archived formalin-fixed paraffin embedded normal tissue was subjected to Sanger sequencing of the MMR genes. Single variant genotyping was carried out by Sanger sequencing. The PMS2 p.Pro246Cysfs*3 variant was also genotyped using PCR and size fractionation. The region around chr7:5997387 was amplified from genomic DNA from blood using conventional PCR and size fractionated on a 3730 DNA Analyser (Applied Biosystems Inc.). The PMS2 p.Met1? variant was also genotyped using the Centaurus (Nanogen) platform. All primer sequences are listed in Supplementary Table 5. To assess the quality of the imputation, a set of imputed carriers and non-carriers of the variants were genotyped using single marker assays, comparing imputed genotypes to genotypes obtained by direct genotyping (see Supplementary Table 1).

Whole-genome genetic analyses of the Icelandic population

WGS and genotyping, imputation and association analysis used in the Icelandic population were as described15. The whole genomes of 8,453 Icelanders were sequenced using Illumina technology to a mean depth of at least 10× (median 32×). SNPs and indels were identified and their genotypes called for all samples simultaneously using the Genome Analysis Toolkit (GATK version 2.2–13)31. Genotype calls were improved by using information about haplotype sharing, taking advantage of the fact that all the sequenced individuals had also been chip-typed and long range phased. A total of 31.6 million SNPs and short indels that met stringent quality criteria were identified in the 8,453 sequenced Icelanders. These variants were then imputed into 150,656 Icelanders who had been genotyped with various Illumina SNP chips and their genotypes phased using long-range phasing29,30. Genealogical deduction of carrier status of 294,212 un-typed relatives of chip-typed individuals further increased the sample size for association analysis and increased the power to detect associations. Individuals who have any form of cancer and controls were derived from the chip-typed individuals and un-typed relatives. Association testing for case–control analysis was performed using logistic regression.

To account for inflation in test statistics due to cryptic relatedness and stratification, we applied the method of LD score regression32,33. With a set of 1.1 M variants, we regressed the χ2 statistics from our genome-wide association study scan against LD score and used the intercept as a correction factor. The LD scores were downloaded from an LD score database (see URL). For the traits reported here, the estimated correction factors were 1.15 for CRC, 1.12 for colon cancer, 1.04 for rectal cancer, 1.05 for endometrial cancer, 1.04 for ovarian cancer, 1.03 for brain cancer, 1.25 for breast cancer, 1.07 for bladder cancer, 1.02 for esophageal cancer, 1.01 for gallbladder/bile duct cancer, 1.13 for gastric cancer, 1.02 for liver cancer, 1.00 for myelodysplastic syndrome, 1.09 for pancreatic cancer, 1.22 for prostate cancer, 1.01 for renal/ureteral cancer, 1.03 for testicular cancer.
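The intercept-based correction can be illustrated with a toy calculation. The sketch below is a simplification under stated assumptions: it fits an unweighted ordinary least-squares line (published LD score regression uses a weighted fit), it uses four made-up variants rather than 1.1 M, and it assumes the correction is applied by dividing each χ2 statistic by the intercept. All names and values are hypothetical.

```python
def ols_intercept(x, y):
    """Intercept of an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def corrected_chi2(chi2, correction_factor):
    """Deflate a chi-square statistic by the correction factor (assumed usage)."""
    return chi2 / correction_factor


# Toy data: LD scores and chi-square statistics for four variants.
ld_scores = [10.0, 20.0, 30.0, 40.0]
chi2_stats = [1.25, 1.35, 1.45, 1.55]

intercept = ols_intercept(ld_scores, chi2_stats)
print(f"correction factor: {intercept:.2f}")
print(f"corrected statistic: {corrected_chi2(11.5, intercept):.2f}")
```

In this toy example the intercept is 1.15, the same magnitude as the factor reported for CRC, so an observed statistic of 11.5 would be deflated to 10.0.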

Information on cancer in the study population

Individuals affected with all forms of cancer were identified through the Icelandic Cancer Registry. The Icelandic Cancer Registry contains all cancer diagnoses in Iceland from 1 January 1955. Over 90% of diagnoses are histologically confirmed. The Icelandic Cancer Registry contains records of 4,434 Icelandic CRC patients (52% males) diagnosed from 1 January 1955 until 31 December 2013. Recruitment of cancer cases of all types was initiated in 2001 and included all prevalent cases as well as newly diagnosed cases from that time. Of the 1,685 CRC cases diagnosed from 1 January 2001–31 December 2013, 1,354 (80%) participated in our study. Patients are recruited by trained nurses on behalf of the patients’ treating physicians, through special recruitment clinics. Participants in the study sign an informed consent form, donate a blood sample and answer a lifestyle questionnaire.

The median age at diagnosis for all consenting cases was 72 years, the same as that for all CRC patients in the Icelandic Cancer Registry. In addition to the chip-genotyped cases, we used information on 2,480 CRC cases without chip information whose genotype probabilities were imputed using methods of familial imputation15. The 262,425 controls (combined chip-typed and familially imputed) used in this study consisted of individuals from other ongoing genome-wide association studies at deCODE. No individual disease group is represented by more than 10% of the total control group. Samples from other cancer cases used in the cross-risk analysis come from other ongoing projects at deCODE Genetics. All subjects were of European ancestry.

All sample identifiers were encrypted in accordance with the regulations of the Icelandic Data Protection Authority. Approval for the study was granted by the Icelandic National Bioethics Committee (ref. VSNb201410008/03.12) and the Icelandic Data Protection Authority (ref. 2014101449).

The Icelandic genealogy

The Icelandic genealogical database contains 819,410 individuals back to 740 AD. Of the 471,284 Icelanders recorded to have been born in the 20th century, 91.1% had a recorded father and 93.7% had a recorded mother in the database. Similarly, of the 183,896 Icelanders recorded to have been born in the 19th century, 97.5% had a recorded father and 97.8% had a recorded mother.

Haplotype analysis for the PMS2 frameshift mutation

Seven SNPs spanning exons 5–9 in the PMS2 gene were analysed in patients carrying the PMS2 p.Pro246Cysfs*3 founder mutation and compared to the haplotype found in carriers of this founder mutation in US, Swedish and British populations17. These are described in Supplementary Table 3.

MLH1 translocation, karyotyping and FISH

To scan for structural variation, all reads from the WGS data that align to the respective genes including 100 kbp flanking regions to both sides of the genes were extracted. All reads that have a mapping quality of 0 or an average PHRED-scaled base calling quality of 25 or below were excluded. The remaining sets of reads were analysed for read pair discordance: a read was considered to be part of a discordant read pair if (1) its mate is unaligned or aligns to a different chromosome or (2) the read and its mate align in unexpected orientation to each other or (3) the distance between the alignment positions of the read and its mate differs by more than three standard deviations from the mean insert size of the sequencing library. The discordant pairs that passed these filters were analysed manually for clusters where the reads align to similar positions. Suspicious clusters were further examined for alternative alignment locations of the two read ends and discarded if the location in the reference genome was repetitive. Further, clusters were discarded as common variants if present in many whole-genome sequenced controls. The only cluster that passed all filters indicates the translocation between chromosome 3 (MLH1) and chromosome 5 (ZCCHC10). The exact breakpoint positions and target-site events were inferred from 36 soft-clipped reads from the two locations on chromosome 3 and 5 in the WGS data of one patient and confirmed by 31 soft-clipped reads in a relative.
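The read filter and the three discordance criteria can be sketched as follows. This is an illustration only, not the pipeline used in the study: the `Read` record and its field names are hypothetical, the expected orientation is assumed to be the usual inward-facing forward/reverse pair, and the distance between alignment start positions is used as a stand-in for insert size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    chrom: str
    pos: int
    is_reverse: bool
    mapq: int
    mean_base_quality: float
    mate_chrom: Optional[str] = None      # None if the mate is unaligned
    mate_pos: Optional[int] = None
    mate_is_reverse: Optional[bool] = None


def passes_quality(read: Read) -> bool:
    """Exclude reads with mapping quality 0 or mean base quality of 25 or below."""
    return read.mapq > 0 and read.mean_base_quality > 25


def is_discordant(read: Read, mean_insert: float, sd_insert: float) -> bool:
    # (1) Mate unaligned or aligned to a different chromosome.
    if read.mate_chrom is None or read.mate_chrom != read.chrom:
        return True
    # (2) Unexpected orientation: the leftmost read should be forward and
    #     the rightmost read reverse (assumed library layout).
    if read.pos <= read.mate_pos:
        left_reverse, right_reverse = read.is_reverse, read.mate_is_reverse
    else:
        left_reverse, right_reverse = read.mate_is_reverse, read.is_reverse
    if left_reverse or not right_reverse:
        return True
    # (3) Distance deviates by more than three SDs from the mean insert size.
    distance = abs(read.mate_pos - read.pos)
    return abs(distance - mean_insert) > 3 * sd_insert


# Hypothetical reads: a proper pair and a pair spanning chromosomes 3 and 5.
proper = Read("chr3", 1000, False, 60, 35.0, "chr3", 1300, True)
spanning = Read("chr3", 1000, False, 60, 35.0, "chr5", 5000, True)
print(is_discordant(proper, 300, 50), is_discordant(spanning, 300, 50))
```

Reads flagged in this way would then be grouped into clusters by position for the manual review described above.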

Metaphase chromosomes were harvested from phytohaemagglutinin (PHA)-stimulated lymphocytes cultured in McCoy’s 5A medium with 20% fetal calf serum applying standard methods. Chromosomes were G-banded and karyotyped34. Whole chromosome FISH was performed with directly labelled StarFISH paint probes (Cambio Ltd., Cambridge, UK) for chromosomes 3 (Cy3-labelled) and 5 (FITC-labelled), following the manufacturer’s protocols. Chromosomes were counterstained with DAPI (4′,6-diamidino-2-phenylindole). Image analysis was done using Leica CW4000 FISH software and a Leica DMRA2 microscope with appropriate filters for FITC and Cy3.

Somatic tumour mutation and MSH2 methylation analysis

Three micrograms of tumour DNA (1 μg in two biopsy cases) were used to perform ColoSeq tumour next-generation sequencing in dMMR cases that remained unexplained after germline mutation and MLH1-hm analysis and selected LS cases35. ColoSeq tumour is a clinical diagnostic assay that detects single nucleotide, indel and deletion/duplication mutations in MLH1, MSH2, MSH6, PMS2, and EPCAM as well as the BRAF p.V600E mutation and phenotypic MSI. The assay uses paired-end sequencing on the Illumina HiSeq 2500 instrument to sequence all exons, introns, and flanking sequences at >300× average coverage. LOH was determined by haplotype analysis of the variant allele fraction. Cases were considered solved if: (1) Two pathogenic or likely pathogenic mutations were identified (mutations identified as class 4 or 5 or predicted to result in protein truncation); or (2) One pathogenic or likely pathogenic mutation was identified with associated LOH. Cases were considered possibly solved if only one pathogenic or likely pathogenic mutation was identified with possible LOH. Phenotypic MSI was assessed with ColoSeq tumour next-generation sequencing data using mSINGS (MSI by NGS)36.
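The criteria for calling a case solved or possibly solved can be expressed as a small decision function. The sketch below is illustrative: the function name, the three LOH labels and the "unsolved" fallback category are assumptions introduced here, and the mutation count is taken to include only pathogenic or likely pathogenic somatic mutations.

```python
def classify_case(n_pathogenic_mutations: int, loh: str) -> str:
    """Classify a dMMR tumour from its somatic sequencing result.

    loh is one of "present", "possible" or "none" (labels assumed here).
    """
    if loh not in {"present", "possible", "none"}:
        raise ValueError("loh must be 'present', 'possible' or 'none'")
    # (1) Two pathogenic or likely pathogenic mutations.
    if n_pathogenic_mutations >= 2:
        return "solved"
    # (2) One such mutation with associated LOH.
    if n_pathogenic_mutations == 1 and loh == "present":
        return "solved"
    # One such mutation with only possible LOH.
    if n_pathogenic_mutations == 1 and loh == "possible":
        return "possibly solved"
    return "unsolved"  # fallback label, not a term used in the text


print(classify_case(2, "none"))
print(classify_case(1, "possible"))
```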

One microgram tumour DNA was treated with sodium bisulphite using the EZ Methylation Gold kit (Zymo Research, Irvine, California), according to the manufacturer’s instructions. Approximately 100 ng bisulphite-converted template was tested for the presence of methylation by two methods, quantitative CpG pyrosequencing and real-time methylation-specific PCR (qMSP) and temperature denaturation analysis, alongside the Universal Methylated Human DNA Standard (fully methylated) and Human WGA Non-Methylated DNA (unmethylated) control samples (Zymo Research, Irvine, California). CpG pyrosequencing was conducted as previously described37 with some modifications. PCR amplification was performed with 1 μM of each primer 5′-Biotin-TTTGGAAGTTGATTGGGTGTGGT-3′ and 5′-CYACTTCTCCYACATACCCTAAAAAAAAC-3′ and 3 μM MgCl2 with cycling conditions of 95 °C for 5 min, followed by 45 cycles of 94 °C for 30 s, annealing at 66 °C for 30 s and extensions at 72 °C for 30 s, then a final extension step at 72 °C for 10 min. PCR products were prepared for pyrosequencing according to the standard protocol. Pyrosequencing was performed using internal primer 5′-CCACACCCACTAAACTATT-3′ with nucleotide dispensation order 5′-CTA CGA CTC CTC ATC GAT CCA GAT CAG ATC GAT ACA GAC ATC AGA TCA GAC AC-3′ on the PyroMark Q96-ID system (Qiagen). Methylation levels were measured using the PyroMark CpG Methylation software. The methylation levels were quantified at six consecutive CpG sites. Semi-quantitative methylation-specific PCR of the MSH2 promoter was performed using primers 5′-GTAGTAGTTAAAGTTATTAGCGTGCGCG-3′ and 5′-TCCTTCGACTACACCGCCATATCG-3′ with cycling conditions of 95 °C for 6 min, followed by 42 cycles of 94 °C for 30 s, annealing at 59 °C for 30 s, and extension at 72 °C for 30 s, followed by a melt curve from 65 °C to 94 °C at 0.5 °C increments for 5 s each. A MyoD control reaction for DNA input and normalization of MSH2 methylation levels was included, as we have previously described38. 
Real-time qMSP was performed on the CFX-96 Real-time System (BioRad).

Data availability

The variants found in the MMR genes (pathogenic, likely pathogenic and variants of unknown significance) that support the findings of this study have been deposited in InSiGHT (https://www.insight-group.org/variants/databases/) with the accession codes (http://insight-database.org/variants/0000004813 to http://insight-database.org/variants/0000004866). LD score database (accessed 23 June 2015): ftp://atguftp.mgh.harvard.edu/brendan/1k_eur_r2_hm3snps_se_weights.RDS. All other remaining data are available within the Article and Supplementary Files, or available from the authors upon request.

Additional information

How to cite this article: Haraldsdottir, S. et al. Comprehensive population-wide analysis of Lynch syndrome in Iceland reveals founder mutations in MSH6 and PMS2. Nat. Commun. 8, 14755 doi: 10.1038/ncomms14755 (2017).