Main

Recent developments in high-throughput sequence capture methods and next-generation sequencing technologies have made exome sequencing a viable approach to the identification of pathological mutations, both from a technical standpoint and in terms of being cost-effective.1, 2, 3, 4 The advent of exome sequencing has already contributed significantly toward the identification of new causal mutations (and genes) for a number of previously unresolved Mendelian disorders such as Kabuki syndrome, Miller syndrome, Sensenbrenner syndrome, and Fowler syndrome to name just a few. Further, exome sequencing has proven to be an effective tool to interrogate the genetic basis of Mendelian disorders in samples derived from both families and unrelated individuals.5, 6, 7, 8 Since the inception of the idea of using exome sequencing as both a discovery9 and a diagnostic tool10 for Mendelian disorders, this field has advanced very considerably.11 Accompanied and aided by other technical advances such as the development of computational and statistical approaches to interrogate the myriad variants identified by exome sequencing,12, 13 including algorithms to detect copy number variants using exome sequencing data,14 and the idea (and practical demonstration) of using single-nucleotide polymorphism genotypes extracted from exome sequencing data to perform accurate genetic linkage mapping to reduce the ‘search space’ for genetic variants,15 exome sequencing has emerged as a mature analytical approach.

Although major progress has been made in understanding the genetic basis of Mendelian disorders over the past 3 years using exome sequencing, so far only a limited number of studies have interrogated familial forms of cancer, ie, familial pancreatic cancer16 and hereditary pheochromocytoma (a rare neural crest cell tumor).17 By harnessing the latest technological advances, Jones et al16 identified a germline truncating mutation in PALB2 through exome sequencing of a single patient with familial pancreatic cancer. That this patient might have a familial form of pancreatic cancer was suggested by the fact that his sister had also developed the disease. In a similar manner, mutations in MAX, the MYC-associated factor X gene, were also identified through sequencing the exomes of three unrelated individuals with hereditary pheochromocytoma.17

Since 2005, >100 genome-wide association studies have been performed to interrogate the genetic basis of various sporadic or polygenic forms of cancer (such as colorectal, prostate, breast, and lung) for which numerous statistically robust and novel single-nucleotide polymorphisms or genetic loci have been identified.18, 19 In addition to their polygenic nature, these cancers are multifactorial, involving a complex interaction of multiple genetic and environmental factors. By contrast, little progress has so far been achieved in the context of ‘familial’ cancers (ie, cancers displaying a very evident family history with clustering of multiple affected family members). More specifically, familial forms of cancer typically occur in more individuals in a given family than would be expected by chance alone. Familial cancers are often characterized by their occurrence at a comparatively early age, thereby indicating the potential presence of a gene mutation that increases the risk of cancer. However, familial clustering of cases may also be a sign of a shared environment or lifestyle, or alternatively chance alone. By contrast, sporadic cancers lack any obvious family history of the disease.

The slow progress of research into familial cancer has been illustrated, for example, in hereditary diffuse gastric cancer. CDH1 was the first causal gene identified for this cancer in 1998,20 and it remains the only known gene underlying hereditary diffuse gastric cancer. However, germline mutations in this gene account for only a proportion of hereditary diffuse gastric cancer cases,21 suggesting that an as-yet-to-be identified gene(s) is likely to be responsible for the remaining cases unexplained by CDH1. Similarly, BRCA1 and BRCA2 are the only high-penetrance genes for familial breast cancer, although numerous novel single-nucleotide polymorphisms and genetic loci conferring low-to-moderate risk or effect size (odds ratio <1.5) have been identified by genome-wide association studies of polygenic breast cancer.22, 23 Some of these common alleles have been reported to modify risk in BRCA1 and BRCA2 mutation carriers.24 However, so far the results from genome-wide association studies have limited value for individual risk prediction,25 as compared with the high-penetrance inherited mutations in causal genes for familial breast cancer, which can prompt drastic clinical intervention such as mastectomy. An analysis to evaluate the potential for individualized disease risk stratification based on common single-nucleotide polymorphisms identified by genome-wide association studies in breast cancer came to the conclusion that the clinical utility of single, common, low-penetrance genes for breast cancer risk prediction is currently quite limited.26

In the context of familial colorectal cancer, the genetic causes of familial adenomatous polyposis and Lynch syndrome have been well documented; in most instances, they are accounted for by germline mutations in the APC gene and DNA mismatch-repair genes (ie, MSH2, MLH1, MSH6, and PMS2), respectively. For example, ∼90% of familial adenomatous polyposis cases are caused by germline mutations in the APC gene. The majority of these mutations introduce a premature stop codon resulting in a truncated protein. Similarly, the MSH2 and MLH1 genes harbor >90% of the germline mutations found in Lynch syndrome patients.27, 28 By contrast, the genetic etiology of familial colorectal cancer type X remains largely unknown.29 It is widely anticipated that new insights generated from studies on familial colorectal cancer type X will lead to the molecular characterization of a novel form of familial colorectal cancer which will necessitate the reclassification of subsets of families with a strong history of colorectal cancer.

Common approaches to interrogate the genetic basis of disease

Several genetic approaches are available to interrogate the genetic basis of disease. Hence, choosing the study design to best fit the research questions to be answered is critical. This is in turn influenced by the nature of the disease, or the postulated model of the disease. The nature of the disease refers to whether it is likely to be monogenic or polygenic at both extremes of a ‘genetic spectrum’. At one extreme, monogenic disease is caused by the action of a high-penetrance mutation, whereas at the opposite extreme, polygenic disease is likely to result from the complex interaction of multiple genetic variants with low penetrance (measured as an odds ratio in association studies, ie, odds ratio <1.5).30, 31, 32, 33, 34, 35, 36, 37 The penetrance estimates the proportion of individuals carrying a particular mutation or variant that also express an associated phenotype or disease. Thus, most carriers of high-penetrance mutations will eventually develop the clinical phenotype in question.
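The two quantities used above, penetrance and the odds ratio, can be made concrete with a small calculation. The following is a minimal sketch only: the function names and all counts are hypothetical, chosen for illustration, and are not taken from any of the studies cited here.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Proportion of carriers of a variant who express the phenotype."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    return affected_carriers / total_carriers


def odds_ratio(carrier_cases: int, noncarrier_cases: int,
               carrier_controls: int, noncarrier_controls: int) -> float:
    """Odds ratio from a 2x2 table of carrier status by case/control status."""
    if noncarrier_cases == 0 or carrier_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# Hypothetical high-penetrance mutation: 45 of 50 carriers are affected.
print(penetrance(45, 50))  # 0.9

# Hypothetical common variant in an association study (invented counts).
print(f"{odds_ratio(300, 700, 250, 750):.2f}")  # 1.29, ie, below the 1.5 threshold
```

Under these invented counts, the first variant would sit toward the monogenic end of the spectrum, whereas the second shows the small effect size typical of variants found by association studies.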

Some familial cancers such as familial adenomatous polyposis, Lynch syndrome, and hereditary diffuse gastric cancer are clear examples of monogenic diseases. However, the distinction between monogenic and polygenic diseases is frequently blurred, particularly in the context of cancer predisposition, and the nature of some disease states could lie somewhere on a continuum. Although monogenic disease is caused by a single complete penetrance or high-penetrance mutation, modifier mutations are frequently found that affect the clinical manifestations of the disease, impacting on, eg, disease severity or phenotypic heterogeneity.38, 39, 40 Thus, variants on 8q23.3 and 11q23.1 have been found to modify colorectal cancer risk in Lynch syndrome.41, 42 On the other hand, some polygenic diseases such as age-related macular degeneration have been attributed to several genetic variants with considerably large effect sizes (eg, the variants in CFH; odds ratio=4.6), which taken together, account for 50% of the heritability of the disease. By contrast, 32 genetic variants identified as being associated with Crohn's disease are able to explain only 20% of the heritability.43

Genome-wide linkage studies followed by positional cloning have been very successful in identifying causal mutations for monogenic disorders because the mutations segregate perfectly with the disorders according to classical Mendelian inheritance patterns (ie, autosomal dominant, autosomal recessive, and X-linked). The perfect segregation patterns rely upon complete (or at least high) penetrance of the causal mutation.36 For example, APC was identified back in 1991 through linkage analysis and positional cloning involving chromosome 5q21.44, 45 Homozygosity mapping, on the other hand, is a more powerful and effective approach to study recessive Mendelian disorders in consanguineous families. Such traditional approaches that narrow the search down to a specific genomic locus have now been coupled with next-generation sequencing to identify the underlying gene and causal mutation(s). However, extremely rare disorders or conditions characterized by de-novo mutations are not amenable to analysis using this type of study design. The advent of exome sequencing has however succeeded in identifying new causal mutations and genes for disorders previously unresolved by traditional approaches.7, 8

Exome sequencing requires several sequence enrichment steps followed by massively parallel sequencing. The development of commercial whole-exome enrichment kits by Agilent, Illumina, and Nimblegen has been largely responsible for making exome sequencing feasible in practical terms for the ordinary laboratory.1, 2, 3, 4 During the enrichment steps, the genomic regions of interest (ie, all exons) are captured, whereas the unwanted DNA sequences (ie, the non-coding regions) are removed before sequencing, leading to a significant reduction in the proportion of the genome that needs to be sequenced. As a consequence, a greater sequencing depth (ie, a larger number of sequence reads covering a given region) of the targeted regions can be achieved. This is particularly important in the context of diagnostic applications, whether to allow the accurate detection of inherited disease mutations or to confirm that an apparently negative result is correct.46 Figure 1 displays a schematic illustration of library preparation using array-based and in-solution hybrid capture.

Figure 1

Schematic illustration of library preparation using array-based and in-solution hybrid capture. The workflow involves multiple steps and is similar for the capture of both exome and custom or targeted regions. However, the difference between the two contexts lies in the different sets of probes required for the exome and the targeted analysis of specific genes. The difference between on-solid and in-solution capture methods is that the oligonucleotide probes are tethered on microarrays or are suspended in-solution (oligonucleotide probes attached to beads), respectively. The capture of the adapter-ligated DNA fragments is based on their complementarity with the oligonucleotide probe sequences. The coupling of these enrichment methods with NGS technologies has made exome sequencing more feasible technically as well as more cost-effective.

Exome sequencing is considered to be a more cost-effective tool, and analytically less challenging, than whole-genome sequencing. Despite the higher throughput and steadily decreasing cost of next-generation sequencing, performing whole-genome sequencing on hundreds of samples is still prohibitively expensive for most laboratories. This is particularly pertinent in the context of studying polygenic diseases where a large sample set is often required to achieve adequate statistical power to identify disease-associated variants that subsequently turn out to display low penetrance in genome-wide association studies. Although a commercial whole-genome sequencing service costs approximately US$5000, the expense incurred for data storage and computational analysis is still substantial.47, 48 Further, because exome sequencing focuses on 1–2% of the human genome, greater sequencing depth of the coding regions is obtained. As a result, exome sequencing has become a more popular approach than whole-genome sequencing.
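The relationship between target size and sequencing depth can be illustrated with simple arithmetic. This is a rough sketch under stated assumptions: the genome size is rounded to 3 billion base pairs, the per-sample yield and the on-target fraction are hypothetical, and mean depth is approximated as on-target bases divided by target size.

```python
APPROX_GENOME_BP = 3_000_000_000  # assumed round figure for the human genome


def mean_depth(sequenced_bases: float, target_bp: int,
               on_target_fraction: float = 1.0) -> float:
    """Approximate mean depth: on-target bases divided by target size."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must lie between 0 and 1")
    return sequenced_bases * on_target_fraction / target_bp


yield_bp = 6_000_000_000               # hypothetical sequencing yield per sample
exome_1pct = APPROX_GENOME_BP // 100   # exome taken as 1% of the genome
exome_2pct = APPROX_GENOME_BP // 50    # exome taken as 2% of the genome

print(mean_depth(yield_bp, APPROX_GENOME_BP))  # 2.0 (whole genome)
print(mean_depth(yield_bp, exome_1pct, 0.5))   # 100.0 (half the reads on target)
print(mean_depth(yield_bp, exome_2pct, 0.5))   # 50.0
```

Even if only half of the reads fall on target in this toy setting, the same yield gives a depth many times greater over the exome than over the whole genome.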

On the other hand, the genome-wide association studies approach, deployed as commercial genome-wide single-nucleotide polymorphism genotyping arrays, is being widely applied to interrogate polygenic diseases. The genotyping array is designed to investigate common single-nucleotide polymorphisms (with a minor allele frequency of >5%) in both coding and non-coding regions throughout the human genome.49 The design of genome-wide association studies and single-nucleotide polymorphism selection in genotyping arrays has been largely driven by the common-disease common-variant hypothesis.50 As a result, most of the risk alleles that have been identified by genome-wide association studies are common (allele frequency >5%) conferring small effect sizes (odds ratio<1.5). Owing to these small effect sizes, the identified single-nucleotide polymorphisms collectively only explain a small portion of the heritability of the diseases investigated by genome-wide association studies, leaving most of the heritability unexplained; this is commonly known as the ‘missing heritability’.43 Thus, the validity of the common-disease common-variant model has now been challenged by the fact of this missing heritability and research efforts are already being directed toward the investigation of rare variants by sequencing the entire loci implicated by genome-wide association studies.51, 52, 53, 54, 55 Results to date suggest that the two competing hypotheses (common-disease common variant and multiple rare variant) are not mutually exclusive, as studies have revealed the contributions of both common and rare variants to complex disease.56 Existing data have already suggested that both the multiple rare variant and common-disease common-variant models can pertain at the same loci, eg, rare variants were identified in the loci implicated by genome-wide association studies through DNA sequencing, and these loci have been termed ‘pleomorphic risk loci’.57 The development of newer genotyping arrays, such as the Illumina Omni arrays that leverage data from the 1000 Genomes Project, has increased the coverage of ‘less common’ single-nucleotide polymorphisms (minor allele frequency 1–5%). Exome sequencing has now further enhanced our ability to investigate rare single-nucleotide variants (<1%) within gene coding regions.52
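The three frequency classes mentioned above (common, >5%; less common, 1–5%; rare, <1%) can be written as a simple binning rule. This is an illustrative sketch only: the function name is hypothetical, and the handling of values falling exactly on a boundary is an assumption, since the text does not specify it.

```python
def frequency_class(maf: float) -> str:
    """Bin a minor allele frequency using the thresholds quoted in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"        # typical content of genotyping arrays
    if maf >= 0.01:
        return "less common"   # assumed to include exactly 1% and exactly 5%
    return "rare"              # the range targeted by exome sequencing


for maf in (0.20, 0.03, 0.002):  # hypothetical frequencies
    print(maf, frequency_class(maf))
```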

Exome sequencing of familial cancers

Exome sequencing has increasingly been applied to the identification of somatic mutations in various sporadic cancers,58, 59, 60, 61, 62 but so far few studies have identified deleterious germline mutations in familial cancers.16, 17 Despite the limited number of studies performed to date, the advent of exome sequencing has provided new opportunities to identify the germline mutations ultimately responsible for familial cancers. Thus, a germline truncating mutation in the PALB2 gene was identified through exome sequencing of a patient with suspected familial pancreatic cancer.16 A total of 15 461 germline variants were identified in the patient's exome that comprised synonymous, missense, non-sense, and splice site mutations plus microdeletions and microinsertions. On the basis that the gene underlying a cancer with a hereditary predisposition may be expected to harbor no normal alleles (ie, one allele is inherited in mutant form, often truncating, whereas the other (wild-type) allele is inactivated by somatic mutation), the ‘search space’ was narrowed down to three genes (SERPINB12, RAGE, and PALB2). Of these, PALB2 appeared to be the most promising causal gene candidate for the pancreatic cancer in question because the inherited stop codon mutations in the other two genes are relatively common in healthy individuals. This hypothesis was confirmed by finding that the patient harbored a germline frameshift deletion (4 bp) and had somatically acquired a second mutation at the canonical exon 10 splice site of PALB2.
Further investigation to determine whether PALB2 mutations occur in additional putatively familial pancreatic cancers then identified truncating mutations in 3 of 96 patients screened.16 In addition to familial pancreatic cancer, germline mutations in PALB2 have previously been associated with familial breast cancer and Fanconi anemia.63, 64 Recent studies have also found evidence of inherited mutations in the PALB2 gene underlying familial breast cancer.65, 66, 67 For example, in a large-scale analysis of 1144 familial breast cancer patients, ∼3.4% of patients were heterozygous for a non-sense, frameshift, or frameshift-associated splice mutation in PALB2, reinforcing the apparent general relevance of this gene in familial cancers.66

In a similar vein, germline mutations in MAX were identified in three unrelated individuals with hereditary pheochromocytoma by exome sequencing.17 This study focused on heterozygous single-nucleotide variants and small indels, because it was postulated that it would be very unlikely that homozygous variants could act as founder mutations. Several common filtering criteria were applied, as with other studies of Mendelian disorders (Figure 2), ie, selecting only those variants within coding regions which were predicted to yield amino-acid changes and which affected the same gene in all three samples. This resulted in the identification of a total of five single-nucleotide variants, located within two genes (MAX and ADCY6). The ADCY6 gene was then excluded as the causal gene because one of the variants did not segregate with the disease through the family pedigree. By contrast, segregation of two variants in MAX with hereditary pheochromocytoma was observed in the two families from whom DNA from affected relatives was available. Additional evidence to support the causative role of the variants in MAX came from their absence (or non-detection) in >750 population-matched control chromosomes. Finally, additional screening for MAX mutations in 59 cases lacking germline mutations in known genes for hereditary pheochromocytoma identified two additional truncating mutations and three missense variants.17 This remarkable discovery highlights the potential of exome sequencing to discover the genetic causes of those familial cancers that remain to be characterized.
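The filtering logic described above (retain heterozygous coding variants predicted to change an amino acid, then keep only genes affected in every sequenced individual) can be sketched as follows. This is a simplified illustration, not the pipeline used in the cited study: the `Variant` record, its field names, the function name, and the toy variant calls with placeholder gene and sample names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Variant:
    """Hypothetical, minimal record for one variant call in one sample."""
    sample: str
    gene: str
    in_coding_region: bool
    changes_amino_acid: bool
    heterozygous: bool


def shared_candidate_genes(variants: Iterable[Variant],
                           samples: Iterable[str]) -> Set[str]:
    """Genes carrying at least one retained variant in every listed sample."""
    kept = [v for v in variants
            if v.in_coding_region and v.changes_amino_acid and v.heterozygous]
    per_sample = [{v.gene for v in kept if v.sample == s} for s in samples]
    if not per_sample:
        return set()
    return set.intersection(*per_sample)


# Toy calls for three unrelated individuals (placeholder names throughout).
calls = [
    Variant("S1", "GENE_A", True, True, True),
    Variant("S2", "GENE_A", True, True, True),
    Variant("S3", "GENE_A", True, True, True),
    Variant("S1", "GENE_B", True, True, True),
    Variant("S2", "GENE_B", True, False, True),   # no amino-acid change: dropped
    Variant("S3", "GENE_B", True, True, True),
    Variant("S1", "GENE_C", False, False, True),  # non-coding: dropped
]
print(shared_candidate_genes(calls, ["S1", "S2", "S3"]))  # {'GENE_A'}
```

Candidates surviving such an intersection would still need the follow-up steps described in the text, namely segregation analysis within families and screening of control chromosomes.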

Figure 2

Commonly adopted approaches to identifying causal mutations. The three ‘universal’ criteria used to filter the ‘less likely’ causal variants are removing common variants, focusing on deleterious variants, and predicting and retaining variants with functional effects (ie, criteria 1–3). The other criteria are dependent upon the study design, for example, whether unrelated or family samples are sequenced (criteria 4–6). The variant filtering or analysis also depends on whether linkage and homozygosity data from the families are available, since such data significantly reduce the search space for potentially causal variants (criterion 7). Finally, the mode of inheritance of the Mendelian disorder will determine whether the focus should be placed upon homozygous, compound heterozygous or heterozygous variants (criterion 8). Additional criteria may be needed, for example, restricting cases that are highly similar in terms of their phenotypic manifestations to minimize the clinical/phenotypic heterogeneity that may mask the identification of causal mutations. However, these criteria are likely to vary from study to study.

Taken together, these studies have demonstrated the feasibility of exome sequencing as a powerful discovery tool for germline mutations in familial cancer. Thus, the number of successful examples of exome sequencing is expected to increase over the coming years until such a time as whole-genome sequencing becomes less challenging in terms of its ‘real’ cost (ie, not just the cost of the sequencing) and data analysis. Exome sequencing in familial cancers is a burgeoning field. By contrast, >100 genome-wide association studies have investigated the associations of germline variants (common single-nucleotide polymorphisms) with various polygenic cancers. These approaches could also be applied to familial colorectal cancer type X. However, the choice of which approach to adopt depends upon both the question posed and the hypothesis to be investigated. Although exome sequencing has yet to be applied to familial colorectal cancer, >10 genome-wide association studies have so far been conducted in the context of polygenic colorectal cancer. The total number of polygenic colorectal cancer loci (identified by, eg, genome-wide association studies) has now exceeded 20 (A Catalog of Published Genome-Wide Association Studies, http://www.genome.gov/gwastudies/). As with other cancers, all the risk alleles so far identified have an odds ratio of <1.3, indicating that most of the heritability of polygenic colorectal cancer remains unaccounted for. However, it is currently unclear whether the results from these genome-wide association studies are also applicable to familial colorectal cancer type X.

Familial colorectal cancer type X is distinct from Lynch syndrome

Hereditary non-polyposis colorectal cancer is the most common familial form of colorectal cancer. The term hereditary non-polyposis colorectal cancer has been applied to heterogeneous groups of families that fulfill the Amsterdam Criteria.68 The Amsterdam Criteria are summarized in Table 1. The clinical hallmarks of hereditary non-polyposis colorectal cancer syndrome resulted in a classification scheme designated as Amsterdam Criteria I, later modified as Amsterdam Criteria II to take into account the extra-colonic cancers. Hereditary non-polyposis colorectal cancer is commonly used interchangeably with Lynch syndrome; however, only a subset of hereditary non-polyposis colorectal cancer cases have Lynch syndrome (characterized by microsatellite instability because of the deficiency in mismatch repair).

Table 1 Amsterdam Criteria

It has now become apparent, however, that not all Amsterdam Criteria-positive families have Lynch syndrome.69 The recognition that microsatellite instability is caused by loss of mismatch-repair activity led to the original discovery of the genes that underlie Lynch syndrome. It is noteworthy that hereditary non-polyposis colorectal cancer is a clinical diagnosis; by contrast, Lynch syndrome is defined by a molecular or genetic diagnosis of germline mutations in DNA mismatch-repair genes. A total of four mismatch-repair genes have been identified (MSH2, MLH1, MSH6, and PMS2), the first two of which (MSH2 and MLH1) account for >90% of the germline mutations.28 In addition to germline mutations, epimutations of the mismatch-repair genes have also been identified as causing Lynch syndrome.70, 71, 72 EPCAM (also known as TACSTD1) deletion leading to MSH2 promoter methylation is another common genetic aberration causing Lynch syndrome.73, 74, 75 Therefore, it is evident that only a proportion of hereditary non-polyposis colorectal cancer cases which fulfill the Amsterdam Criteria have germline mutations in mismatch-repair genes, which by ‘definition’ constitutes Lynch syndrome. As a result, families with a strong family history of colorectal cancer that do not have Lynch syndrome have been grouped as ‘familial colorectal cancer-type X’; alternatively, some studies have referred to them as mismatch-repair gene mutation-positive and -negative families for Lynch syndrome and familial colorectal cancer type X, respectively.29 Similarly, patients with a family history of gastric cancer but lacking the CDH1 mutation may constitute a distinct condition termed the familial gastric cancer syndrome,76, 77 which may be regarded as being conceptually similar to familial colorectal cancer type X.

Familial colorectal cancer type X constitutes a heterogeneous group of cases with clear clinical differences as compared with Lynch syndrome. The term familial colorectal cancer type X was introduced in 2005 by Lindor et al29 who studied 161 familial clusters of colorectal cancer, all of which met the Amsterdam Criteria. The investigators divided the families into those in whom microsatellite instability was present in the colorectal cancer cases and those in whom microsatellite instability was absent. It was held to be reasonable to suppose that these two groups might be characterized by different disease entities, and it was shown that this was indeed the case. The families with microsatellite instability had all the classic clinical features of Lynch syndrome, and the average age to develop cancer was 48.7 years. There was a six-fold increase in risk for colorectal cancer and significantly increased risks for extra-colonic cancers such as cancers of the uterus, stomach, kidney, ovary, small intestine, and ureter. By contrast, families in the other group (microsatellite instability absent) who met the Amsterdam Criteria but did not harbor mutations in their mismatch-repair genes had a significant two-fold increase in risk for colorectal cancer but not for other cancers. Further, the average age for developing colorectal cancer in this group was 60.7 years, significantly older than the group with microsatellite instability. Thus, by both genetic (germline mutations in mismatch-repair genes) and clinical criteria, this study suggested that Lynch syndrome and familial colorectal cancer type X are two distinct disease entities.29

This new definition and classification has important clinical implications. Before the concept of ‘two disease entities’, it was unclear whether families with the clinical hereditary non-polyposis colorectal cancer syndrome but lacking microsatellite instability and germline mutation in one of the mismatch-repair genes should be managed in the same way as those families with both of these genetic characteristics. Following the demonstration of the genetic and clinical differences between Lynch syndrome and familial colorectal cancer type X, it was suggested that the surveillance protocol of frequent colonoscopy and endometrial surveillance usually advised for Lynch syndrome families would not be recommended for the familial colorectal cancer type X families. Instead, a less stringent protocol including colonoscopy every 5 years and starting at a more advanced age was proposed.29 Thus, the identification of the causal mutations and genes underlying familial colorectal cancer type X is also expected to help in the molecular genetic diagnosis of the disease, as has been the case in other types of familial colorectal cancer such as Lynch syndrome, familial adenomatous polyposis, and MUTYH-associated polyposis, and in developing guidelines to ensure appropriate screening for any high-risk group and the clinical management of patients.

In similar vein, significant clinical differences between these two different hereditary non-polyposis colorectal cancer groups have also been found in multiple studies.78, 79, 80, 81 In a study that investigated 25 families with truncating mutations in MLH1 or MSH2, and 16 families that fulfilled the Amsterdam Criteria but lacked mutations in these genes, major clinical differences in age of onset, tumor spectrum, tumor localization, and tumor progression were found.81 These differences included an earlier age of onset for colorectal cancer and other tumors, and more synchronous and metachronous colorectal and extra-colonic tumors in the group with mutations in MLH1 or MSH2. Another striking difference between the two groups was the localization of colorectal cancer. Within the group with mutations in MLH1 or MSH2, 68% of the cancers were found proximal to the splenic flexure, representing a distribution over the entire colorectum, whereas in the other group, 79% of cancers were located distal to the splenic flexure, all in the sigmoid colon and rectum. Another important difference was in the number of adenomas observed between the two groups. A higher colorectal adenoma/carcinoma ratio and a tendency toward more synchronous or metachronous adenomas were observed in the group without mutations, indicating a slower progression of adenomas to carcinomas in the absence of a mismatch-repair gene deficiency.

Chromosomal instability, microsatellite instability, and CpG island methylation phenotype are three major pathways that account for the majority of colorectal cancers. The autosomal dominant familial adenomatous polyposis (caused by mutations in the APC gene) and the autosomal recessive MUTYH-associated polyposis (caused by mutations in MUTYH) exerted their influence through the chromosomal instability pathway. On the other hand, the autosomal dominant Lynch syndrome (caused by mutations in the mismatch-repair genes) involved the microsatellite instability pathway. Finally, CpG island methylation phenotype is associated with hyperplastic polyposis and the development of colorectal cancer from serrated polyps, with the presence of BRAF somatic mutations and MLH1 promoter methylation in the tumor.82, 83, 84 The CpG island methylation phenotype is no longer grouped into positive (CpG island methylation phenotype high) vs negative, because an additional category known as CpG island methylation phenotype low has been identified which has some unique features including associations with KRAS mutations,85, 86, 87 MGMT promoter methylation,88 and low-level methylation at specific loci.89 These unique features are distinct from the CpG island methylation phenotype high and CpG island methylation phenotype-negative categories which suggests that there are at least three different CpG island methylation phenotypic categories in human colorectal cancer. Therefore, it would be interesting to explore whether germline mutations in these known genes (MLH1, MSH2, MSH6, PMS2, APC, and MUTYH) and pathways could be involved in familial colorectal cancer type X. However, it has been shown that the majority of families fulfilling the Amsterdam Criteria did not have an identifiable mutation in these six genes and none had evidence of methylator pathway dysfunction. 
This suggests that high-penetrance mutations in as-yet-to-be identified genes may cause familial colorectal cancer type X.90 Taken together, a variety of genetic and clinical differences between Lynch syndrome and familial colorectal cancer type X have emerged.

Various molecular features (such as epigenetics) of familial colorectal cancer type X have been further characterized and these have helped to distinguish it from other types of colorectal cancer.91, 92 More specifically, colorectal cancers that were microsatellite stable (ie, lacking mismatch-repair deficiency) and Amsterdam Criteria-positive were found to display a significantly lower degree of long interspersed element-1 methylation as compared with other patient groups, ie, Lynch syndrome with identified mismatch-repair mutations and sporadic colorectal cancers with or without microsatellite instability. Long interspersed element-1 comprises a substantial portion of the human genome and its methylation correlates with global methylation status. Genome-wide DNA hypomethylation has an important role in genomic instability and carcinogenesis.91 Additionally, it was suggested that cancers characterized by long interspersed element-1 extreme hypomethylation constitute a previously unrecognized and distinct subtype of colorectal cancer.92 In addition to young onset (consistent with familial predisposition), colorectal cancers characterized by long interspersed element-1 hypomethylation consistently exhibited a poor prognosis.93, 94 Taken together, these findings further suggest molecular heterogeneity of familial colorectal cancer type X, with several possible molecular subtypes.

Monogenic vs polygenic models for familial cancers

The fulfillment of Amsterdam Criteria has reinforced our view that the genetics of familial colorectal cancer type X has a monogenic component, ie, that the disease is likely to be caused by high-penetrance mutations, similar to the germline mutations in mismatch-repair genes responsible for Lynch syndrome. However, clinical characteristics such as the late onset of disease and the lower risk of colorectal cancer suggest that familial colorectal cancer type X might also have a polygenic component (ie, involving the interaction of multiple low-penetrance genetic variants), similar to sporadic colorectal cancer cases, which have been investigated by genome-wide association studies. Elucidating the nature of the disease is important as it will influence the approaches adopted to interrogate the genetic basis of familial colorectal cancer type X.

A similar scenario has been noted in the context of another cancer, nasopharyngeal carcinoma, in which familial cases are defined by the presence of multiple affected members within a family. However, there is little evidence to suggest that a familial form of nasopharyngeal carcinoma exists with characteristics markedly distinct from sporadic cases.95 This conclusion has been supported by a more recent analysis that investigated the fit of single gene, polygenic and multifactorial models to the observed pattern of transmission of nasopharyngeal carcinoma in a hospital-based family history study, which also suggested a multifactorial mode of inheritance for nasopharyngeal carcinoma.96 So far, no high-penetrance causal mutation or gene has been identified for familial nasopharyngeal carcinoma, although several susceptibility loci have been tentatively identified by linkage studies.97, 98, 99 Similarly, familial testicular germ cell tumors are defined as testicular germ cell tumors that have been diagnosed in at least two blood relatives. Nevertheless, previous linkage studies of multiple familial testicular germ cell tumor families did not reveal any high-penetrance genes and it has therefore been concluded that the combined effects of multiple common genetic variants, each conferring a modest risk, might underlie the disease.100 By contrast, familial adenomatous polyposis, Lynch syndrome, and hereditary diffuse gastric cancer represent clear examples of monogenic cancers where high-penetrance causal mutations and genes have been identified. Similarly, hereditary pheochromocytoma is known to be caused by germline mutations in one of nine genes, namely RET, VHL, SDHA, SDHB, SDHC, SDHD, SDHAF2, NF1, and TMEM127.17 Although it is often assumed that monogenic models underlie the genetic etiology of familial cancers whereas polygenic models account for sporadic cancers, this is not always the case.

How to interrogate the genetics of familial colorectal cancer type X?

The nature of the disease determines the study design required to unravel the causal mutations or risk-predisposing variants for familial colorectal cancer type X. However, there is little evidence to show whether familial colorectal cancer type X is a monogenic or polygenic disease or whether it is somewhere in between. The evidence suggesting that familial colorectal cancer type X is a monogenic disease comes mainly from the fulfillment of Amsterdam Criteria. The Amsterdam Criteria state that at least three relatives must have colorectal cancer. However, familial aggregation, with multiple affected members in one family, could also be due to shared non-genetic factors, which would not necessarily be compatible with the monogenic model. Such environmental factors would be expected to interact with multiple genetic risk factors to cause colorectal cancer, as in the multifactorial disease model proposed for polygenic disease. This raises the question as to whether the Amsterdam Criteria are sufficient to support a monogenic basis for familial colorectal cancer type X. Furthermore, some of the clinical features of familial colorectal cancer type X imply that it could have a polygenic basis. This uncertainty about the nature of familial colorectal cancer type X presents substantial challenges in terms of deciding upon an optimal approach to interrogate its genetic basis.

The targeted sequencing of genes known to cause other familial cancers, such as CDH1 (hereditary diffuse gastric cancer), BRCA1 and BRCA2 (familial breast cancer), and the genes underlying hereditary pheochromocytoma, appears to be a worthwhile approach to identify deleterious germline mutations in familial colorectal cancer type X. The rationale is that germline mutations in these genes could underlie different familial cancers, as for example in the case of the PALB2 germline mutations that have been found in both familial pancreatic and breast cancers.16, 63 Another notable example is provided by the germline mutations in the BRCA2 gene that increase the risk not only of breast and ovarian cancer, but also of pancreatic cancer.101 This targeted sequencing approach has been greatly aided by high-throughput enrichment methods, which selectively enrich for regions of interest, and by next-generation sequencing technologies. By leveraging these technological advances, hundreds of genes can be sequenced far more efficiently than with traditional PCR-based Sanger sequencing. The efficiency of this approach has been exemplified in a targeted sequencing study of germline mutations in 21 tumor suppressor genes for 360 women with inherited ovarian, peritoneal, or fallopian tube carcinoma.102 This study harnessed the power of the Sure-Select enrichment system and the Illumina sequencing platform to sequence these genes; 24% of the patients were found to carry germline loss-of-function mutations in 12 genes, six of which had not previously been implicated in inherited ovarian carcinoma. Although this targeted approach has limited discovery value, as these genes have already been implicated in causing familial cancers, it could still have some novelty value by identifying germline mutations in known genes in cancers that have not yet been linked to these genes.

This targeted approach can be expanded to include the entire set of exons in all genes in the human genome. Exome sequencing on its own or coupled with linkage analysis has already unravelled multiple new causal mutations and genes for Mendelian disorders.7, 8 Furthermore, in most of the studies reported, these discoveries were made by exome sequencing fewer than 10 patient samples. As such, it is widely anticipated that exome sequencing will represent a powerful tool to reveal the genetic causes of familial colorectal cancer type X by identifying rare and deleterious or high-penetrance mutations within gene coding regions. However, the appropriate selection of cases will have a key role in determining the success or otherwise of exome sequencing in this context. In addition to requiring fulfillment of the Amsterdam Criteria and excluding germline mutations in mismatch-repair genes, selecting cases with a very early onset of disease, severe clinico-pathological manifestations or other ‘extreme’ familial colorectal cancer type X phenotypes is expected to enrich for the ‘monogenic’ component and hence enhance our chances of identifying high-penetrance mutations. Recurrent mutations (similar mutations in different samples) or genes harboring several different deleterious mutations (which include single-nucleotide variants and small indels) across multiple samples can then be prioritized for further studies using a larger sample of cases.
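As a hedged illustration of the prioritization step described above, the following Python sketch ranks genes by the number of cases that carry a rare, putatively deleterious variant in them. The function name `prioritize_genes`, the input layout, the `min_samples` cut-off, and the placeholder gene and variant labels are all assumptions made for this example; they are not taken from any of the studies cited here, and upstream filtering for rarity and predicted deleteriousness is assumed to have been done already.

```python
from collections import defaultdict


def prioritize_genes(variants, min_samples=2):
    """Rank genes by how many distinct cases carry a candidate variant.

    `variants` is an iterable of (sample_id, gene, variant_id) tuples that
    are assumed to be pre-filtered to rare, putatively deleterious calls.
    Genes seen in fewer than `min_samples` cases are dropped.
    """
    samples_by_gene = defaultdict(set)
    variants_by_gene = defaultdict(set)
    for sample_id, gene, variant_id in variants:
        samples_by_gene[gene].add(sample_id)
        variants_by_gene[gene].add(variant_id)

    ranked = [
        {
            "gene": gene,
            "n_samples": len(samples),
            "n_distinct_variants": len(variants_by_gene[gene]),
        }
        for gene, samples in samples_by_gene.items()
        if len(samples) >= min_samples
    ]
    # Most widely shared genes first; ties broken by variant count, then name.
    ranked.sort(
        key=lambda r: (-r["n_samples"], -r["n_distinct_variants"], r["gene"])
    )
    return ranked


# Hypothetical calls from three unrelated cases (placeholder labels only).
calls = [
    ("case1", "GENE_A", "c.100C>T"),   # recurrent mutation ...
    ("case2", "GENE_A", "c.100C>T"),   # ... seen in a second case
    ("case3", "GENE_A", "c.250delG"),  # different mutation, same gene
    ("case1", "GENE_B", "c.77G>A"),    # private to one case
    ("case2", "GENE_C", "c.5A>G"),
    ("case3", "GENE_C", "c.9dupT"),
]
shortlist = prioritize_genes(calls)
for row in shortlist:
    print(row)
```

In this toy input, the gene carrying both a recurrent mutation and a second distinct mutation ranks first, whereas a gene mutated in only one case is excluded. A real analysis would also need to account for gene length and background mutation rate, which this sketch ignores.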

On the other hand, if we assume that familial colorectal cancer type X has a polygenic component, then genome-wide association studies would represent the ideal approach to identify common single-nucleotide polymorphisms associated with this disease. Further, whole-genome genotyping arrays would also allow copy number variants to be investigated, to a certain extent, for their associations with familial colorectal cancer type X within a single genome-wide association study. High-density genotyping arrays have been used to identify copy number variants in a cohort of 41 colorectal cancer patients who were below 40 years of age at diagnosis and/or who exhibited an overt family history.103 Multiple copy number variants, encompassing genes such as CDH18, GREM1, and BCR, were identified in six patients, as well as two deletions encompassing two microRNA genes, hsa-mir-491/KIAA1797 and hsa-mir-646/AK309218. Interestingly, these copy number variants had not previously been reported in relation to colorectal cancer predisposition, nor had they been encountered in large control cohorts. This illustrates the potential power of copy number variant investigation to identify novel causal or susceptibility genes or genetic loci for both familial and sporadic colorectal cancers. In another interesting observation, multiple genomic aberrations including copy number gains and losses in different chromosomes were detected in 30 mismatch repair-proficient familial colorectal cancers. In particular, the frequency of 20q gain was markedly increased when compared with sporadic colorectal cancer, suggesting that the 20q gain is involved in the genetic etiology of these mismatch repair-proficient familial colorectal cancers.104 The finding that most of these genomic aberrations were also observed in sporadic colorectal cancer further suggests that familial and sporadic colorectal cancers could share genetic predisposition to a certain extent.

It is, however, noteworthy that genome-wide association studies represent an indirect association study design, based on linkage disequilibrium, to detect the disease-causing variants, as compared with direct sequencing. To achieve the required statistical power and significance threshold to detect common single-nucleotide polymorphisms conferring small effect sizes (odds ratio <1.5), several thousands of cases and controls are required for the initial genome-wide genotyping and subsequent replication studies.105 Although the cost of genotyping arrays is steadily falling, a hefty investment is still required to analyze thousands of samples. In addition to this cost, collecting an adequate number of patients to embark on a genome-wide association study is a considerable challenge if this is to be achieved without an international consortium (because of the rarity of familial colorectal cancer type X as compared with sporadic colorectal cancer cases). Moreover, the polygenic basis of familial colorectal cancer type X is still speculative. Bearing in mind this uncertainty, an alternative is to leverage the results from genome-wide association studies of colorectal cancer by genotyping the robust single-nucleotide polymorphism associations in a familial colorectal cancer type X cohort. This approach might be more feasible in terms of cost-effectiveness and sample size, because testing only a limited number of previously identified single-nucleotide polymorphisms avoids the stringent significance threshold needed to account for several hundred thousand single-nucleotide polymorphisms. The heavy multiple-testing penalty imposed in genome-wide association studies therefore adds to the attractiveness of this approach for familial colorectal cancer type X. One may speculate that if familial colorectal cancer type X has a polygenic component, some of these polymorphisms should also be associated with familial colorectal cancer type X, which would then warrant a comprehensive genome-wide association study for familial colorectal cancer type X in the future. This speculation appears reasonable because common shared single-nucleotide polymorphisms or genetic loci have been found in several different cancers. There have been several examples of the practical utility of genome-wide association study results in the context of familial cancers. These studies have provided evidence to suggest that low-penetrance variants may explain the increased cancer risk in familial colorectal cancer106, 107, 108 and in familial testicular germ cell tumors.100
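The effect of the multiple-testing penalty discussed above can be made concrete with a simple Bonferroni calculation. The sketch below is illustrative only: the Bonferroni correction is one common (and conservative) choice rather than necessarily the method used in the cited studies, and the figures of 500 000 genome-wide markers and 20 candidate polymorphisms are assumptions chosen to match the 'several hundred thousand' order of magnitude mentioned in the text.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed numbers, for illustration only.
N_GENOME_WIDE_SNPS = 500_000   # order of magnitude of a genotyping array
N_CANDIDATE_SNPS = 20          # hypothetical set of previously reported SNPs

genome_wide = bonferroni_threshold(N_GENOME_WIDE_SNPS)
candidate = bonferroni_threshold(N_CANDIDATE_SNPS)

print(f"genome-wide per-SNP threshold: {genome_wide:.1e}")
print(f"candidate-set per-SNP threshold: {candidate:.1e}")
print(f"threshold relaxed by a factor of {candidate / genome_wide:,.0f}")
```

Under these assumptions, the per-test threshold for the candidate set is several orders of magnitude less stringent than the genome-wide one, which is why a smaller cohort may suffice to test previously reported associations.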

Finally, the genes or genetic loci implicated in colorectal cancer by genome-wide association studies can be captured and sequenced. This targeted sequencing approach is very cost-effective, as up to 96 samples can be multiplexed through barcoding for massively parallel sequencing, and it interrogates both rare variants and common single-nucleotide polymorphisms in the loci identified by genome-wide association studies. The promise of this approach in unravelling rare variants in loci implicated by genome-wide association studies has already been demonstrated.51, 53, 54, 55 For example, deep resequencing of such loci has identified independent rare variants associated with inflammatory bowel disease.55

Perspectives and conclusions

The genetic and clinical differences between Lynch syndrome and familial colorectal cancer type X have been well documented. However, the genetic etiologies of familial colorectal cancer type X remain to be determined. There is also a paucity of evidence to indicate one way or the other whether familial colorectal cancer type X is a monogenic or a polygenic disease. On the other hand, the genetics of sporadic/polygenic colorectal cancer has been comprehensively investigated by >10 genome-wide association studies over the past few years. One striking observation is the sharing of common single-nucleotide polymorphisms or genetic loci across different cancers. It is therefore reasonable to speculate that if familial colorectal cancer type X has a polygenic basis, some of the single-nucleotide polymorphisms identified by genome-wide association studies as conferring risk of colorectal cancer might be expected to show associations with familial colorectal cancer type X as well. Given the expense and logistic challenges involved in collecting a large number of familial colorectal cancer type X cases to embark on a genome-wide association study, together with the uncertainty of the disease model, we believe that the genotyping of genome-wide association study-identified single-nucleotide polymorphisms in familial colorectal cancer type X would be a more feasible first approach to explore the genetic etiology of this disease. However, given the low incidence of familial colorectal cancer type X (ie, only ∼2–3% of colorectal cancer families meet Amsterdam Criteria and about half of these are Lynch syndrome cases), collecting an adequately large sample is challenging, especially for studying the association of single-nucleotide polymorphisms with modest effect sizes. Thus, national or international consortia involving many centers are likely to be needed to recruit large numbers of patients. Alternatively, the genes or loci identified by genome-wide association studies could be investigated using a targeted sequencing approach to unravel rare variants of larger effect size.
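The recruitment challenge follows directly from the proportions quoted above. As a rough, back-of-the-envelope sketch (the function name `families_to_screen` and the target of 1000 families are hypothetical, chosen only for illustration), the number of colorectal cancer families that would need to be ascertained to obtain a given number of familial colorectal cancer type X families can be estimated as follows.

```python
import math


def families_to_screen(target_type_x, amsterdam_fraction, lynch_share=0.5):
    """Estimate how many colorectal cancer families must be ascertained.

    amsterdam_fraction: fraction of families meeting the Amsterdam Criteria
                        (the text quotes roughly 0.02 to 0.03).
    lynch_share:        fraction of those families explained by Lynch
                        syndrome (the text quotes about one half).
    """
    if not 0 < amsterdam_fraction <= 1 or not 0 <= lynch_share < 1:
        raise ValueError("fractions must lie in a sensible range")
    type_x_fraction = amsterdam_fraction * (1 - lynch_share)
    return math.ceil(target_type_x / type_x_fraction)


# Hypothetical target: 1000 type X families for an association study.
for fraction in (0.02, 0.03):
    needed = families_to_screen(1000, fraction)
    print(f"Amsterdam-positive fraction {fraction:.0%}: "
          f"about {needed:,} families to screen")
```

With these proportions, type X families make up only about 1–1.5% of all colorectal cancer families, so tens of thousands of families would have to be screened to assemble a cohort of this hypothetical size, which is consistent with the need for multi-center consortia.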

One of the limitations of genome-wide association studies is that they are based upon an indirect association study design, which is reliant on linkage disequilibrium to identify the functional disease variants. As a result, the surrogate markers (ie, the associated single-nucleotide polymorphisms) identified by genome-wide association studies generally lack functional significance. Furthermore, to enhance statistical power, genome-wide association studies have tended to lump all colorectal cancers into a single disease group, even though it is well recognized that colorectal cancers are inherently heterogeneous. These challenges have led to the conceptualization of ‘molecular pathological epidemiology’, which is a relatively new field of epidemiology based upon the molecular classification of cancer.109 It is a multidisciplinary field involving the investigation of the interrelationship between exogenous and endogenous (eg, genetic) factors, tumoral molecular signatures, and tumor progression. Further, integrating genome-wide association studies with molecular pathological epidemiology allows examination of the relationship between susceptibility alleles identified by genome-wide association studies and specific molecular alterations/subtypes, which can help to elucidate the function of these alleles and provide insights into whether the detected susceptibility alleles are truly causal. Although there are challenges, molecular pathological epidemiology has unique strengths, and can provide insights into the pathogenic process.

In addition, exome sequencing of multiple ‘well-selected’ cases could be performed, assuming a monogenic basis in which high-penetrance mutations are predicted to underlie the genetic etiology of familial colorectal cancer type X. Exome sequencing of families with multiple affected individuals also represents a promising study design. This family-based design has the advantage that it allows for the genetically heterogeneous nature of familial colorectal cancer type X. By contrast, comparing unrelated individuals or probands from different families to identify ‘common/shared’ putative pathological variants, or genes harboring putative pathological variants, might not be a successful strategy for genetically heterogeneous diseases. However, this still depends on the degree of genetic heterogeneity (ie, allelic heterogeneity vs locus heterogeneity) characterizing the disease, which remains unknown. Although the family design (comparing affected and unaffected members in a family) is robust with respect to genetic heterogeneity, one must recognize that it could also be problematic because the penetrance of disease mutations for familial colorectal cancer type X is likely to be lower than that for Lynch syndrome.

Moving forward, it is arguable that whole-genome sequencing should be considered instead of exome sequencing, as the cost differential between the two approaches (given a small patient sample size) would not be substantial, and because the former approach will generate genetic data for the entire genome rather than just 1–2% as for exome sequencing. However, one should select the study design that best fits the hypothesis that rare deleterious mutations in coding regions underlie the genetic etiology of a Mendelian disorder or familial cancer. So far, all the discoveries made by whole-genome sequencing for Mendelian disorders could also have been achieved using exome sequencing.110, 111 Furthermore, the genetic variants in most of the non-coding regions revealed by whole-genome sequencing remain biologically ‘uninterpretable’. From a practical (rather than theoretical) point of view, whole-genome sequencing still presents a very substantial technical challenge as well as a challenge in terms of analyzing and interpreting the sequence data generated.

The disease models underpinning multiple familial cancers such as familial nasopharyngeal carcinoma,112 familial testicular germ cell tumor,113 familial chronic lymphocytic leukemia,114 and familial colorectal cancer (familial colorectal cancer type X)29 remain contentious as the high-penetrance mutations are yet to be identified. By contrast, multiple low-penetrance variants that confer an effect size of odds ratio <1.5 have been revealed through genome-wide association studies for the sporadic cases of these cancers; interestingly, some of these single-nucleotide polymorphisms have also been found to be associated with the familial cases (nasopharyngeal carcinoma,115 testicular germ cell tumor,116, 117 chronic lymphocytic leukemia,118, 119 and colorectal cancer120). In the context of familial colorectal cancer type X, we believe that the disease model and its genetic basis are likely to become more apparent when the approaches that we have outlined and discussed are applied in practice. This should facilitate the iterative interrogation of the genetics of familial colorectal cancer type X and other familial cancers of a similar nature before embarking on either a comprehensive genome-wide association study or a whole-genome sequencing approach.