Various species of the intestinal microbiota have been associated with the development of colorectal cancer [1,2], but it has not been demonstrated that bacteria have a direct role in the occurrence of oncogenic mutations. Escherichia coli can carry the pathogenicity island pks, which encodes a set of enzymes that synthesize colibactin [3]. This compound is believed to alkylate DNA on adenine residues [4,5] and induces double-strand breaks in cultured cells [3]. Here we expose human intestinal organoids to genotoxic pks+ E. coli by repeated luminal injection over five months. Whole-genome sequencing of clonal organoids before and after this exposure revealed a distinct mutational signature that was absent from organoids injected with isogenic pks-mutant bacteria. The same mutational signature was detected in a subset of 5,876 human cancer genomes from two independent cohorts, predominantly in colorectal cancer. Our study describes a distinct mutational signature in colorectal cancer and implies that the underlying mutational process results directly from past exposure to bacteria carrying the colibactin-producing pks pathogenicity island.
Whole-genome sequence data have been deposited in the European Genome–Phenome Archive (https://ega-archive.org); accession number EGAS00001003934. The data used from the Hartwig Medical Foundation and Genomics England databases consist of patient-level somatic variant data (annotated variant call data) and are considered privacy sensitive and available through access-controlled mechanisms.
Patient-level somatic variant and clinical data were obtained from the Hartwig Medical Foundation under data request number DR-084. Somatic variant and clinical data are freely available for academic use from the Hartwig Medical Foundation through standardized procedures. Privacy and publication policies, including co-authorship policies, can be retrieved from: https://www.hartwigmedicalfoundation.nl/en/data-policy/. Data request forms can be downloaded from https://www.hartwigmedicalfoundation.nl/en/applying-for-data/. To gain access to the data, this data request form should be emailed to email@example.com, upon which it will be evaluated within six weeks by the HMF Scientific Council and an independent Data Access Board. When access is granted, the requested data become available through a download link provided by HMF.
Somatic variant data from the Genomics England data set were analysed within the Genomics England Research Environment secure data portal, under Research Registry project code RR87, and exported from the Research Environment following data transfer request 1000000003652 on 3 December 2019. The Genomics England data set can be accessed by joining the community of academic and clinical scientists via the Genomics England Clinical Interpretation Partnership (GeCIP), https://www.genomicsengland.co.uk/about-gecip/. To join a GeCIP domain, the following steps have to be taken:
1. Your institution has to sign the GeCIP Participation Agreement, which outlines the key principles that members of each institution must adhere to, including our Intellectual Property and Publication Policy.
2. Submit your application using the relevant form found at the bottom of the page (https://www.genomicsengland.co.uk/join-a-gecip-domain/).
3. The domain lead will review your application, and your institution will verify your identity for Genomics England and communicate confirmation directly to Genomics England.
4. Your user account will be created.
5. You will be sent an email containing a link to complete Information Governance training and sign the GeCIP rules (https://www.genomicsengland.co.uk/wp-content/uploads/2019/07/GeCIP-Rules_29-08-2018.pdf). Completing the training and signing the GeCIP Rules are requirements for you to access the data. After you have completed the training and signed the rules, you will need to wait for your access to the Research Environment to be granted. This will generally take up to one working day.
6. You will then receive an email letting you know your account has been given access to the environment, and instructions for logging in (for more detail, see: https://www.genomicsengland.co.uk/join-a-gecip-domain/).
Details of the data access agreement can be retrieved from https://figshare.com/articles/GenomicEnglandProtocol_pdf/4530893/5. All requests will be evaluated by the Genomics England Access Review Committee taking into consideration patient data protection, compliance with legal and regulatory requirements, resource availability and facilitation of high-quality research. All analysis of the data must take place within the Genomics England Research Environment secure data portal, https://www.genomicsengland.co.uk/understanding-genomics/data/, and exported following approval of a data transfer request.
Regarding co-authorship, all publications using data generated as part of the Genomics England 100,000 Genomes Project must include the Genomics England Research Consortium as co-authors. The full publication policy is available at https://www.genomicsengland.co.uk/about-gecip/publications/. All other data supporting the findings of this study are available from the corresponding author upon request.
All analysis scripts are available at https://github.com/ToolsVanBox/GenotoxicEcoli.
Allen, J. & Sears, C. L. Impact of the gut microbiome on the genome and epigenome of colon epithelial cells: contributions to colorectal cancer development. Genome Med. 11, 11 (2019).
Gagnaire, A., Nadel, B., Raoult, D., Neefjes, J. & Gorvel, J.-P. Collateral damage: insights into bacterial mechanisms that predispose host cells to cancer. Nat. Rev. Microbiol. 15, 109–128 (2017).
Nougayrède, J.-P. et al. Escherichia coli induces DNA double-strand breaks in eukaryotic cells. Science 313, 848–851 (2006).
Wilson, M. R. et al. The human gut bacterial genotoxin colibactin alkylates DNA. Science 363, eaar7785 (2019).
Xue, M. et al. Structure elucidation of colibactin and its DNA cross-links. Science 365, eaax2685 (2019).
Dejea, C. M. et al. Patients with familial adenomatous polyposis harbor colonic biofilms containing tumorigenic bacteria. Science 359, 592–597 (2018).
Bullman, S. et al. Analysis of Fusobacterium persistence and antibiotic response in colorectal cancer. Science 358, 1443–1448 (2017).
Kostic, A. D. et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14, 207–215 (2013).
Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 25, 679–689 (2019).
Buc, E. et al. High prevalence of mucosa-associated E. coli producing cyclomodulin and genotoxin in colon cancer. PLoS One 8, e56964 (2013).
Arthur, J. C. et al. Intestinal inflammation targets cancer-inducing activity of the microbiota. Science 338, 120–123 (2012).
Bossuet-Greif, N. et al. The colibactin genotoxin generates DNA interstrand cross-links in infected cells. mBio 9, e02393-17 (2018).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).
Sato, T. et al. Long-term expansion of epithelial organoids from human colon, adenoma, adenocarcinoma, and Barrett’s epithelium. Gastroenterology 141, 1762–1772 (2011).
Tuveson, D. & Clevers, H. Cancer modeling meets human organoid technology. Science 364, 952–955 (2019).
Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836.e16 (2019).
Jager, M. et al. Measuring mutation accumulation in single human adult stem cells by whole-genome sequencing of organoid cultures. Nat. Protocols 13, 59–78 (2018).
Cougnoux, A. et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut 63, 1932–1942 (2014).
Bartfeld, S. et al. In vitro expansion of human gastric epithelial stem cells and their responses to bacterial infection. Gastroenterology 148, 126–136.e6 (2015).
Li, Z.-R. et al. Divergent biosynthesis yields a cytotoxic aminomalonate-containing precolibactin. Nat. Chem. Biol. 12, 773–775 (2016).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Gonzalez-Perez, A. et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat. Methods 10, 1081–1082 (2013).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
McLellan, L. K. & Hunstad, D. A. Urinary tract infection: pathogenesis and outlook. Trends Mol. Med. 22, 946–957 (2016).
Zawadzki, P. J. et al. Identification of infectious microbiota from oral cavity environment of various population group patients as a preventive approach to human health risk factors. Ann. Agric. Environ. Med. 23, 566–569 (2016).
Banerjee, S. et al. Microbial signatures associated with oropharyngeal and oral squamous cell carcinomas. Sci. Rep. 7, 4036 (2017).
Boot, A. et al. Identification of novel mutational signatures in Asian oral squamous cell carcinomas associated with bacterial infections. Preprint at https://doi.org/10.1101/368753 (2019).
Payros, D. et al. Maternally acquired genotoxic Escherichia coli alters offspring’s intestinal homeostasis. Gut Microbes 5, 313–325 (2014).
Olier, M. et al. Genotoxicity of Escherichia coli Nissle 1917 strain cannot be dissociated from its probiotic activity. Gut Microbes 3, 501–509 (2012).
Jacobi, C. A. & Malfertheiner, P. Escherichia coli Nissle 1917 (Mutaflor): new insights into an old probiotic bacterium. Dig. Dis. 29, 600–607 (2011).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Heo, I. et al. Modelling Cryptosporidium infection in human small intestinal and lung organoids. Nat. Microbiol. 3, 814–823 (2018).
Pace, P. et al. FANCE: the link between Fanconi anaemia complex assembly and activity. EMBO J. 21, 3414–3423 (2002).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).
Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
Cunningham, F. et al. Ensembl 2015. Nucleic Acids Res. 43, D662–D669 (2015).
Cameron, D. L. et al. GRIDSS, PURPLE, LINX: unscrambling the tumor genome via integrated analysis of structural variation and copy number. Preprint at https://doi.org/10.1101/781013 (2019).
Genomics England. The National Genomics Research and Healthcare Knowledgebase. https://www.genomicsengland.co.uk/the-national-genomics-research-and-healthcare-knowledgebase/ (2017).
Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).
We thank J. H. J. Hoeijmakers, P. Knipscheer and J. I. Garaycoechea for discussions on DNA damage, and P. Robinson, K. Vervier, T. Lawley, and M. Stratton for explorative analysis and discussions. This publication and the underlying study have been made possible partly on the basis of the data that Hartwig Medical Foundation and the Center of Personalised Cancer Treatment (CPCT) have made available to the study. This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. This work was supported by CRUK grant OPTIMISTICC (C10674/A27140), the Gravitation projects CancerGenomiCs.nl and the Netherlands Organ-on-Chip Initiative (024.003.001) from the Netherlands Organisation for Scientific Research (NWO) funded by the Ministry of Education, Culture and Science of the government of the Netherlands (C.P.-M., J.P.), the Oncode Institute (partly financed by the Dutch Cancer Society), the European Research Council under ERC Advanced Grant Agreement no. 67013 (J.P., T.M., H.C.), a VIDI grant from the NWO (no. 016.Vidi.171.023) to R.v.B. that supports A.R.H. and NWO building blocks of life project: Cell dynamics within lung and intestinal organoids (737.016.009) (M.H.G.). Financial support was also provided by ITMO Cancer AVIESAN (Alliance Nationale pour les Sciences de la Vie et de la Santé, National Alliance for Life Sciences & Health) within the framework of the Cancer Plan (HTE201601) (G.D., R.B.), as well as by the Howard Hughes Medical Institute, Mathers Foundation, and NIH-1R01DK115728-01A1 (Y.M., K.C.G.).
H.C. is an inventor on several patents related to organoid technology; his full disclosure is given at https://www.uu.nl/staff/JCClevers/. M.M. is scientific advisory board chair and a consultant for OrigiMed, receives research support from Bayer, Janssen, and Ono, and receives royalty payments from Labcorp. H.C. and K.C.G. are co-founders of Surrozen.
Peer review information Nature thanks Bogdan Fedeles, Christian Jobin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Co-culture with genotoxic pks+ E. coli induces DNA interstrand crosslinks in healthy human intestinal organoids.
a, Representative images (out of n = 5 organoids per group) of DNA interstrand crosslink formation after 1 day of co-culture, measured by FANCD2 immunofluorescence (green). Nuclei were stained with DAPI (blue). Yellow boxes represent inset area. Scale bars, 50 μm (main image); 10 μm (inset). Experiment was repeated independently twice with similar results. b, Gating strategy to select epithelial cells (left) and to quantify viable cells (right). c, Mean ± s.d. viability of intestinal organoid cells after 1, 3 or 5 days of co-culture (n = 3 technical replicates) (bacteria eliminated after 3 days of co-culture). Points are independent replicates.
Extended Data Fig. 2 Genotoxic pks+ E. coli induce SBS-pks and ID-pks mutational signatures after long-term co-culture with wild-type intestinal organoids.
a, Ninety-six-trinucleotide mutational spectra of SBSs in each of the three individual clones sequenced per condition. Top three, dye; middle three, pksΔclbQ E. coli; bottom three, pks+ E. coli. b, Total 96-trinucleotide mutational spectra of organoids injected with pks+ E. coli or pksΔclbQ E. coli from which SBSs in dye-injected organoids have been subtracted. c, Heatmap depicting cosine similarity between 96-trinucleotide mutational profiles of organoids injected with dye, pks+ E. coli or pksΔclbQ E. coli. d, Indel mutational spectra plots from each of the three individual clones sequenced per condition. Top three, dye; middle three, pksΔclbQ E. coli; bottom three, pks+ E. coli. e, Total indel mutational spectra of organoids injected with pks+ E. coli and pksΔclbQ E. coli from which indels in dye-injected organoids have been subtracted. f, Heatmap depicting cosine similarity between indel mutational profiles of organoids injected with dye, pks+ E. coli or pksΔclbQ E. coli.
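The background subtraction and cosine-similarity comparison described in panels b, c, e and f can be illustrated with a minimal sketch. It assumes each mutational profile is a plain list of counts per mutation class (96 classes for SBSs). The function names, the clipping of negative differences to zero and the toy four-channel vectors are illustrative assumptions and are not taken from the authors' analysis scripts.

```python
import math


def subtract_background(exposed, background):
    """Subtract background counts channel by channel.

    Negative differences are clipped to zero (an assumption of this sketch).
    """
    if len(exposed) != len(background):
        raise ValueError("profiles must have the same number of channels")
    return [max(e - b, 0) for e, b in zip(exposed, background)]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is all zero)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 4-channel profiles; real SBS profiles have 96 channels.
dye = [5, 5, 5, 5]
pks_positive = [6, 25, 5, 15]
pks_mutant = [5, 6, 5, 5]

pks_signal = subtract_background(pks_positive, dye)
mutant_signal = subtract_background(pks_mutant, dye)
similarity = cosine_similarity(pks_signal, mutant_signal)
print(pks_signal, mutant_signal, round(similarity, 3))
```

Cosine similarity depends only on the shape of a profile, not on the total number of mutations, which is why it suits comparisons between conditions with different mutation loads.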
Extended Data Fig. 3 Genotoxic pks+ E. coli and isogenic strain reconstituted with pksΔclbQ:clbQ induce SBS-pks and ID-pks mutational signatures after co-culture.
a, Ninety-six-trinucleotide mutational spectra of SBSs in three individual clones from the independent human healthy intestinal organoid line ASC-6a co-cultured for three rounds with pks+ or pksΔclbQ E. coli. b, Top, total 96-trinucleotide mutational spectra from the three clones co-cultured with pks+ or pksΔclbQ E. coli shown in a. Bottom, resulting 96-trinucleotide mutational spectrum from ASC-6a organoids co-cultured with pks+ E. coli after the subtraction of background mutations from three parallel pksΔclbQ E. coli co-cultures (cosine similarity to SBS-pks = 0.77). c, Indel mutational spectra from the three independent ASC-6a clones co-cultured for three rounds with pks+ or pksΔclbQ E. coli. d, Top, total indel mutational spectra from the three clones co-cultured with pks+ or pksΔclbQ E. coli shown in c. Bottom, resulting indel mutational spectrum from the independent ASC-6a organoids co-cultured with pks+ E. coli after the subtraction of background mutations from three parallel pksΔclbQ E. coli co-cultures (cosine similarity to ID-pks = 0.93). e, Ninety-six-trinucleotide mutational spectra from three individual clones of the ASC-5a line co-cultured for three rounds with the isogenic recomplemented E. coli strain pksΔclbQ:clbQ. f, Top, total 96-trinucleotide mutational spectrum from the three clones co-cultured with pksΔclbQ:clbQ E. coli shown in e. Bottom, resulting mutational spectrum after subtracting pksΔclbQ background (cosine similarity to SBS-pks = 0.95). g, Indel mutational spectra from three individual clones of the ASC-5a line co-cultured for three rounds with the isogenic recomplemented E. coli strain pksΔclbQ:clbQ. h, Top, total indel mutational spectrum from the three clones co-cultured with pksΔclbQ:clbQ E. coli shown in g. Bottom, resulting mutational spectrum after subtracting pksΔclbQ background (cosine similarity to ID-pks = 0.95).
a, Ten-base up- and downstream profile shows an upstream homopolymer of adenosines that favours induction of T deletions. The length of the adenosine stretch decreases with increasing T homopolymer length (1–8, top left to bottom right).
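As a rough illustration of the sequence context summarized above, the following sketch measures, for one deletion site, the length of the T homopolymer and the length of the uninterrupted adenine stretch immediately upstream of it. It assumes the reference sequence is given 5′ to 3′ on the strand carrying the T homopolymer and that the homopolymer start is a 0-based index; the function names and the example sequence are hypothetical.

```python
def run_length(sequence, base):
    """Count how many times `base` repeats from the start of `sequence`."""
    count = 0
    for nucleotide in sequence:
        if nucleotide != base:
            break
        count += 1
    return count


def deletion_context(reference, homopolymer_start, window=10):
    """Return (T homopolymer length, upstream A stretch length) for one site.

    Only the `window` bases upstream of the homopolymer are inspected,
    mirroring the ten-base profile described in the caption.
    """
    reference = reference.upper()
    t_length = run_length(reference[homopolymer_start:], "T")
    upstream = reference[max(0, homopolymer_start - window):homopolymer_start]
    # Walk backwards from the base immediately 5' of the homopolymer.
    a_length = run_length(reversed(upstream), "A")
    return t_length, a_length


# Hypothetical reference: four adenines followed by a three-base T homopolymer.
reference = "GCCGAAAATTTGCA"
context = deletion_context(reference, homopolymer_start=8)
print(context)
```

Tabulating these two lengths across all T deletions would show whether the adenine stretch shortens as the T homopolymer lengthens, as the caption reports.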
a, De novo NMF-SBS-pks signature extracted by NMF on all 496 CRC metastases in the HMF data set. b, Cosine similarity scores between the de novo extracted SBS signature in a and COSMIC SigProfiler signatures, including our experimentally defined SBS-pks signature (left). c, Relative contribution of SBS-pks to clonal (corrected variant allele frequency >0.4, blue) and subclonal fractions (corrected variant allele frequency <0.2, red) of mutations in the 31 SBS/ID-pks high CRC metastases from the HMF cohort. Box, upper and lower quartiles; centre line, mean; whiskers, largest value no more than 1.5 times the interquartile range extending from the box; points, individual CRC metastases.
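The clonal and subclonal split in panel c can be sketched as a simple threshold rule on the corrected variant allele frequency (VAF), using the cut-offs stated in the caption (>0.4 clonal, <0.2 subclonal). How the VAF is corrected is not shown here. The function names, the "intermediate" label for mutations between the two thresholds and the example values are assumptions for illustration only.

```python
def classify_clonality(corrected_vaf, clonal_min=0.4, subclonal_max=0.2):
    """Label a mutation by its corrected variant allele frequency."""
    if corrected_vaf > clonal_min:
        return "clonal"
    if corrected_vaf < subclonal_max:
        return "subclonal"
    # Mutations between the two thresholds belong to neither fraction.
    return "intermediate"


def split_by_clonality(mutations):
    """Group (mutation_id, corrected_vaf) pairs by clonality label."""
    groups = {"clonal": [], "subclonal": [], "intermediate": []}
    for mutation_id, corrected_vaf in mutations:
        groups[classify_clonality(corrected_vaf)].append(mutation_id)
    return groups


# Hypothetical mutations with corrected VAFs.
example = [("m1", 0.55), ("m2", 0.10), ("m3", 0.30), ("m4", 0.45)]
groups = split_by_clonality(example)
print(groups)
```

The signature contribution would then be estimated separately within the clonal and subclonal groups.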
Supplementary Table 1 Mutations matching pks motifs in driver genes in colorectal cancer. List of the number of mutations matching the SBS-pks or ID-pks motifs and total number of mutations within the top 50 driver genes present in colorectal cancer. Dataset obtained from the IntOGen cancer mutation database [25].
Supplementary Table 2 Protein coding sequence mutations matching the SBS/ID-pks motif. List of all mutations from all SBS/ID-pks high CRC samples matching the SBS/ID-pks extended motif and leading to changes in protein coding regions of the genome.
About this article
Cite this article
Pleguezuelos-Manzano, C., Puschhof, J., Rosendahl Huber, A. et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature 580, 269–273 (2020). https://doi.org/10.1038/s41586-020-2080-8