The colorectal adenoma–carcinoma sequence has provided a paradigmatic framework for understanding the successive somatic genetic changes and consequent clonal expansions that lead to cancer (ref. 1). However, our understanding of the earliest phases of colorectal neoplastic changes, which may occur in morphologically normal tissue, is comparatively limited, as for most cancer types. Here we use whole-genome sequencing to analyse hundreds of normal crypts from 42 individuals. Signatures of multiple mutational processes were revealed; some of these were ubiquitous and continuous, whereas others were only found in some individuals, in some crypts or during certain periods of life. Probable driver mutations were present in around 1% of normal colorectal crypts in middle-aged individuals, indicating that adenomas and carcinomas are rare outcomes of a pervasive process of neoplastic change across morphologically normal colorectal epithelium. Colorectal cancers exhibit substantially increased mutational burdens relative to normal cells. Sequencing normal colorectal cells provides quantitative insights into the genomic and clonal evolution of cancer.
Whole-genome and targeted sequencing data are deposited in the European Genome-phenome Archive (EGA) with accession codes EGAD00001004192 and EGAD00001004193. Images of microdissections and the physical distances between crypts are available on Mendeley Data (https://data.mendeley.com/datasets/zv6xrjxftw/1) by searching for the title of this article. All other data are available from the authors on request.
Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Preprint at https://www.biorxiv.org/content/10.1101/322859v2 (2019).
Sabarinathan, R. et al. The whole genome panorama of cancer drivers. Preprint at https://www.biorxiv.org/content/10.1101/190330v2 (2017).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Roerink, S. F. et al. Intra-tumour diversification in colorectal cancer at the single-cell level. Nature 556, 457–462 (2018).
Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).
Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Suda, K. et al. Clonal expansion and diversification of cancer-associated mutations in endometriosis and normal endometrium. Cell Rep. 24, 1777–1789 (2018).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Hoang, M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl Acad. Sci. USA 113, 9846–9851 (2016).
Nicholson, A. M. et al. Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22, 909–918 (2018).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Preprint at https://www.biorxiv.org/content/10.1101/505685v1 (2018).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
McKerrell, T. et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep. 10, 1239–1245 (2015).
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Potten, C. S., Kellett, M., Roberts, S. A., Rew, D. A. & Wilson, G. D. Measurement of in vivo proliferation in human colorectal mucosa using bromodeoxyuridine. Gut 33, 71–78 (1992).
Cheng, H. & Leblond, C. P. Origin, differentiation and renewal of the four main epithelial cell types in the mouse small intestine. V. Unitarian Theory of the origin of the four epithelial cell types. Am. J. Anat. 141, 537–561 (1974).
Lopez-Garcia, C., Klein, A. M., Simons, B. D. & Winton, D. J. Intestinal stem cell replacement follows a pattern of neutral drift. Science 330, 822–825 (2010).
Snippert, H. J. et al. Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell 143, 134–144 (2010).
Griffiths, D. F., Davies, S. J., Williams, D., Williams, G. T. & Williams, E. D. Demonstration of somatic mutation and colonic crypt clonality by X-linked enzyme histochemistry. Nature 333, 461–463 (1988).
Winton, D. J. & Ponder, B. A. J. Stem-cell organization in mouse small intestine. Proc. R. Soc. B 241, 13–18 (1990).
Kozar, S. et al. Continuous clonal labeling reveals small numbers of functional stem cells in intestinal crypts and adenomas. Cell Stem Cell 13, 626–633 (2013).
Barker, N. et al. Crypt stem cells as the cells-of-origin of intestinal cancer. Nature 457, 608–611 (2009).
Rouhani, F. J. et al. Mutational history of a human cell lineage from somatic to induced pluripotent stem cells. PLoS Genet. 12, e1005932 (2016).
Viel, A. et al. A specific mutational signature associated with DNA 8-Oxoguanine persistence in MUTYH-defective colorectal cancer. EBioMedicine 20, 39–49 (2017).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).
Boot, A. et al. Identification of novel mutational signatures in Asian oral squamous cell carcinomas associated with bacterial infections. Preprint at https://www.biorxiv.org/content/10.1101/368753v3 (2019).
Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
Wolf, J. et al. Peripheral blood mononuclear cells of a patient with advanced Hodgkin’s lymphoma give rise to permanently growing Hodgkin–Reed Sternberg cells. Blood 87, 3418–3428 (1996).
Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
Bomme, L. et al. Cytogenetic analysis of colorectal adenomas: karyotypic comparisons of synchronous tumors. Cancer Genet. Cytogenet. 106, 66–71 (1998).
Andersen, C. L. et al. Frequent occurrence of uniparental disomy in colorectal cancer. Carcinogenesis 28, 38–48 (2007).
Corley, D. A. et al. Variation of adenoma prevalence by age, sex, race, and colon location in a large population: implications for screening and quality programs. Clin. Gastroenterol. Hepatol. 11, 172–180 (2013).
Cancer Research UK. Bowel Cancer Incidence Statistics https://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/bowel-cancer/incidence#heading-Seven (accessed August 2018).
Stamp, C. et al. Predominant asymmetrical stem cell fate outcome limits the rate of niche succession in human colonic crypts. EBioMedicine 31, 166–173 (2018).
Li, Y. et al. Patterns of structural variation in human cancer. Preprint at https://www.biorxiv.org/content/10.1101/181339v1 (2017).
Lugli, N. et al. Enhanced rate of acquisition of point mutations in mouse intestinal adenomas compared to normal tissue. Cell Rep. 19, 2185–2192 (2017).
Travis, L. B. Therapy-associated solid tumors. Acta Oncol. 41, 323–333 (2002).
Jones, D. et al. cgpCaVEManWrapper: Simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Raine, K. M. et al. cgpPindel: Identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Scheinin, I. et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 24, 2022–2032 (2014).
Buels, R. et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 17, 66 (2016).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Felsenstein, J. PHYLIP - Phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).
Roberts, N. Patterns of somatic genome rearrangement in human cancer. PhD thesis, Univ. Cambridge (Wellcome Trust Sanger Institute, 2018).
Farmery, J. H. R., Smith, M. L., Bioresource, N., Diseases, R. & Lynch, A. G. Telomerecat: a ploidy-agnostic method for estimating telomere length from whole genome sequencing data. Sci. Rep. 8, 1300 (2018).
Nersisyan, L. & Arakelyan, A. Computel: computation of mean telomere length from whole-genome next-generation sequencing data. PLoS One 10, e0125201 (2015).
Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res. 42, e75 (2014).
Feuerbach, L. et al. TelomereHunter: telomere content estimation and characterization from whole genome sequencing data. Preprint at https://www.biorxiv.org/content/10.1101/065532v1 (2016).
This work was supported by the Wellcome Trust. We thank P. Scott, J. Fowler, D. Fernandez-Antoran and Y. Hooks for their advice with histology and laser-capture microdissection; M. Gerstung for his advice on statistics; the Sanger Institute Research and Development Facility for their help with sequencing microbiopsies; the staff of WTSI Sample Logistics, Genotyping, Pulldown, Sequencing and Informatics facilities for their contribution; K. Mahbubani, R. ten Hoopen, C. Scarpini and the Phoenix study team of N. Grehan, I. Debiram-Beecham, J. Crawte, T. Nukcheddy Grant, P. Lao-Sirieix and A. Hindmarsh for their help with sample collection; the Human Research Tissue Bank, which is supported by the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre from Addenbrooke’s Hospital; and all the individuals who contributed samples to this study. Access to samples of transplants from organ donors was provided by the Cambridge Biorepository for Translational Medicine. A.N. was funded through an MRC Clinical Research Fellowship; the autopsy cohort was funded through this, an MRC core grant (RG84369) and an NIHR Research Professorship (RG67258) to R.C.F., and additional infrastructure support was provided from the CRUK-funded Experimental Cancer Medicine Centre in Cambridge.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature thanks Jacco van Rheenen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Representative image of a section of colonic tissue. The magnified inset shows the section before and after dissection of a crypt. b, c, Coverage of crypts that underwent whole-genome (b) and targeted (c) sequencing. d, e, VAFs (that is, half of the clonal fraction) for crypts that underwent whole-genome (d) and targeted (e) sequencing. f, g, Substitutions (f) and indels (g) that were removed by filtering steps and their mutational spectra, arranged as in Fig. 1.
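The relation between VAF and clonal fraction stated in this caption can be illustrated with a short sketch. It assumes a heterozygous mutation at a diploid locus with no copy-number change; the function name and the read counts in the usage note are hypothetical.

```python
def clonal_fraction(alt_reads: int, total_reads: int) -> float:
    """Estimate the fraction of cells carrying a heterozygous mutation.

    Assumption: diploid locus, one mutated allele per mutant cell,
    so the variant allele fraction (VAF) is half of the clonal fraction.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    vaf = alt_reads / total_reads
    # Sampling noise can push the VAF above 0.5, so cap the estimate at 1.0.
    return min(1.0, 2.0 * vaf)
```

For example, 15 variant reads out of 30 give a VAF of 0.5 and therefore a clonal fraction of 1.0, which is what a fully clonal crypt would be expected to show under these assumptions.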
Results of signature extraction using an HDP with pre-conditioning on signatures that are known to be active in colorectal cancer. For each signature, the extracted signature and the profile of a sample to which that signature contributes strongly are shown. Signatures are presented as in Fig. 2. The extraction of signatures using an HDP was followed by deconvolution by expectation maximization (Methods, Extended Data Fig. 3) to produce the versions of signatures that are presented in the main text.
Three signatures were decomposed (SBS1, DBSA and IDC). For each example, the original HDP version is shown on the top left, the PCAWG signatures that are deemed to contribute at least 10% of mutations to it on the right and the reconstituted signature that was built by combining the PCAWG signatures on the bottom left. The cosine similarity of the reconstituted signature to the original is shown.
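The cosine similarity used to compare a reconstituted signature with the original can be sketched as follows. This is a minimal illustration, assuming each signature is a list of non-negative channel weights of equal length; the helper names `reconstitute` and `cosine_similarity` are hypothetical and are not taken from the authors' code.

```python
import math


def reconstitute(components, weights):
    """Combine reference signatures into one profile as a weighted sum."""
    if len(components) != len(weights):
        raise ValueError("need one weight per component signature")
    n_channels = len(components[0])
    return [
        sum(w * comp[i] for w, comp in zip(weights, components))
        for i in range(n_channels)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two mutational profiles (1.0 = same shape)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_a * norm_b)
```

Because cosine similarity depends only on the shape of the profile, scaling a signature by a constant does not change the score.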
a–c, Other methods of signature extraction were run to test the robustness of signature decomposition. a, HDP without pre-conditioning on PCAWG. b, In-house NMF without pre-conditioning on PCAWG. c, NMF implemented by the MutationalPatterns package in R (Methods).
For signatures that appeared to show a linear accumulation with age, the mutation rate per site was determined using mixed models, in which age and site were used as fixed effects and individual as a random effect. Confidence intervals were determined by bootstrapping. n = 445 crypts from 42 individuals. Solid lines represent the mean slope of the regression and shaded areas its 95% confidence intervals (CI95).
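The bootstrapped confidence interval for a per-year mutation rate can be illustrated roughly as below. This sketch resamples individuals with replacement and refits a slope each time. It deliberately swaps the mixed model for an ordinary least-squares fit of burden on age, ignores site, and assumes a hypothetical data layout (`individual -> (age, [burden per crypt])`), so it should not be read as the authors' analysis.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope, resampling whole individuals.

    data: dict mapping individual id -> (age, list of per-crypt burdens).
    Resampling individuals (not crypts) keeps crypts from one person together.
    """
    if len({age for age, _ in data.values()}) < 2:
        raise ValueError("need at least two distinct ages")
    rng = random.Random(seed)
    ids = list(data)
    slopes = []
    while len(slopes) < n_boot:
        xs, ys = [], []
        for ind in (rng.choice(ids) for _ in ids):
            age, burdens = data[ind]
            xs.extend([age] * len(burdens))
            ys.extend(burdens)
        if len(set(xs)) < 2:
            continue  # skip resamples in which every crypt has the same age
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lower = slopes[int((alpha / 2) * (n_boot - 1))]
    upper = slopes[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

Resampling at the level of the individual is one simple way to respect the grouping of crypts within people, which the mixed model described in the caption handles through a random effect.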
a–ap, For each individual, the phylogeny of crypts is shown three times: at the top, with branch lengths proportional to the number of SBSs; in the middle, with branch lengths proportional to the number of DBSs; and on the bottom, with branch lengths proportional to the number of small indels. Scale bars are shown on the right. A stacked bar plot of the mutational signatures that contribute to each branch is overlaid over every branch. ‘X0’ indicates mutations that could not confidently be assigned to any signature. Note that the ordering of signatures along a given branch is just for visualization purposes; we cannot distinguish the timing of different signatures along a branch. aq, The cumulative burden of SBSA (top) and SBSB (bottom) is plotted relative to the cumulative burden of SBS1 to time these mutational processes throughout life. Informative clades are shown (from patients labelled as in the rest of the figure), with every node and tip of the clade plotted in the space of the cumulative number of mutations that are due to a given signature that have occurred up until that node in the tree. Lines represent the branching structure of the tree.
a–d, A total of 449 crypts had sufficient coverage to be evaluated. a, Whole-chromosome amplifications in five crypts. The copy-number state (y axis) for each chromosome is shown, with one allele coloured red and the other green. Chromosomes are labelled along the top of the graph. b, Timing of copy-number changes throughout life. Vertical bars represent 95% confidence intervals, which were determined by bootstrapping. Horizontal bars represent the most likely time of the copy-number change, as defined by mutationTimeR (see Supplementary Information). c, Crypts with loss of heterozygosity (LOH). For each chromosome with a LOH event, the copy number across the whole chromosome is shown at the top, with the total copy number in black and the copy number for the minor allele in blue. The images at the bottom show example SNPs that support the LOH. In each case, reads from the crypt in question are shown above, and reads from its matched normal below. Thus, in the first image, the wild-type state (below) is heterozygous for a T SNP (red), whereas in the crypt in question (above), this polymorphism has now become homozygous. Small deviations from a fully homozygous state are probably a result of stromal contamination. d, Reads supporting structural variants in normal colon. Patients are labelled as in Extended Data Fig. 6.
Putative driver missense mutations in oncogene hotspots. The number of substitutions catalogued in COSMIC (ref. 53) is shown on the y axis at each position along the gene, with the mutations that were observed in our cohort indicated with arrows.
For all crypts that were whole-genome sequenced to sufficient depth, and for crypts that underwent targeted sequencing and in which driver mutations were found, the signatures and driver mutations are shown. Each vertical column represents a crypt. The individual to whom each crypt belongs is indicated by the alternating colours in the top bar (labelling as in Extended Data Fig. 6). The site to which each crypt belongs is shown underneath. The matrix is coloured by the contribution of each signature to each crypt, normalized for each signature: the crypt with the largest contribution of a given signature is purple and the crypt with the smallest contribution is white. Crypts in which the signatures could not be assessed, either because they underwent targeted sequencing or because the coverage was poor, are grey. Driver mutations, including heterozygous mutations in tumour suppressor genes, are indicated by a black bar.
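The per-signature normalization that drives the colour scale can be sketched as a min-max rescaling of each signature across crypts. This is an assumption about how such a scale might be computed, with 0.0 standing for white and 1.0 for purple; the function name and data layout are hypothetical.

```python
def normalise_per_signature(contributions):
    """Rescale each signature's per-crypt contributions to the range 0..1.

    contributions: dict mapping signature name -> list of per-crypt values.
    Within each signature, the smallest value maps to 0.0 and the largest to 1.0.
    """
    scaled = {}
    for signature, values in contributions.items():
        lowest, highest = min(values), max(values)
        span = highest - lowest
        # If every crypt has the same contribution there is nothing to scale.
        scaled[signature] = [
            0.0 if span == 0 else (v - lowest) / span for v in values
        ]
    return scaled
```

Normalizing within each signature means that colours are comparable across crypts for one signature, but not between signatures.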
a, Number of stem cells and replacement rate of stem cells in normal human colonic crypts, as estimated by approximate Bayesian computation. Each point represents a simulation. Points are coloured according to their similarity to the observed data: the most similar 0.1% are coloured dark red, and so on, until the least similar simulations are blue. b, Approximate Bayesian computation of the rate of crypt fission (fissions per crypt per year) in the human colon. The prior distribution of the crypt fission rate (which was used to simulate many biopsies of the colon) is shown above, and the posterior distribution of the crypt fission rate (estimated by neural network regression on the simulations) is shown below. c, d, Evidence of crypt fusion in human colon. In each case, a phylogeny is shown at the top that depicts the genetic relationships between selected crypts. Dashed blue lines show mutations with a low allele fraction that are shared between crypts in a manner incompatible with the phylogeny dictated by the clonal mutations. Below each crypt in the phylogeny is an image that depicts its position in the section. Sections are labelled according to their z-stacked order. The allele fractions of mutations on each branch of the phylogeny in each crypt are shown at the bottom. The trinucleotide context of the mutations that occurred on each branch is shown on the right. See also Supplementary Information.
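The idea behind approximate Bayesian computation in panels a and b can be sketched as a rejection scheme: draw parameters from the prior, simulate data, and keep the draws whose simulations are most similar to the observation. This is a generic illustration under stated assumptions, not the authors' pipeline; in particular it uses plain rejection rather than the neural network regression mentioned for panel b, and the callables `simulate`, `sample_prior` and `distance` are hypothetical placeholders supplied by the caller.

```python
import random


def abc_rejection(observed, simulate, sample_prior, distance,
                  n_sims=10000, keep_fraction=0.001, seed=0):
    """Return the prior draws whose simulations lie closest to the observed data.

    simulate(theta, rng)  -> simulated summary statistic
    sample_prior(rng)     -> one parameter draw from the prior
    distance(sim, obs)    -> non-negative dissimilarity
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        scored.append((distance(simulate(theta, rng), observed), theta))
    # Rank simulations by similarity to the observation, most similar first.
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, round(n_sims * keep_fraction))
    return [theta for _, theta in scored[:n_keep]]
```

With the default `keep_fraction=0.001`, the accepted draws correspond to the most similar 0.1% of simulations, and their spread gives a rough picture of the posterior distribution.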
Extended Data Fig. 11 Comparison of the mutational signatures and driver landscape of normal crypts and colorectal adenocarcinomas.
a, Comparison of the burden of mutations for every mutational signature. For each signature, the y axis shows the mutational burden + 1 of every sample on a logarithmic scale. Normal colon and cancer samples are ordered within their groups. The signature attributions and mutational burden for colorectal adenocarcinoma are from a previous study (ref. 2). A total of 60 cancers are compared with 472 normal crypts. b, The proportion of driver mutations in each gene in normal colon (left) and colorectal cancer (right). The frequency of driver mutations in cancer was derived using data from the TCGA research network (ref. 43) (Supplementary Methods).
Supplementary Information R code and working for statistical analyses presented in the main text. Supplementary Table 2 contains the input data for these analyses.
Supplementary Information Additional working and results to support three analyses.
Supplementary Table Information on all the patients in this study, including their age, cohort, and cancer status.
Supplementary Table The contribution of each mutational signature to each crypt. This file contains the input data used in Supplementary Results 1.
Supplementary Table This table contains all coding mutations detected in our crypts.
Supplementary Table This table contains a list of the candidate colorectal cancer driver genes included in our bait-set.
Supplementary Table This table contains candidate driver mutations annotated as to their characteristics and our evaluation of whether they were likely to be true driver mutations.
Supplementary Table Mutations from the TCGA analysis of colorectal cancer, evaluated using our criteria (Supplementary Methods) for whether they were likely driver mutations. This allows comparison with our data.
Supplementary Table Branch lengths of phylogenies were adjusted according to the probability of calling mutations on each branch, which depends on the number of descendants of a given branch and their sequencing coverage and clonality. This table contains the adjustment factors for each branch in each phylogeny.
About this article
Cite this article
Lee-Six, H., Olafsson, S., Ellis, P. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019). https://doi.org/10.1038/s41586-019-1672-7