Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli


Various species of the intestinal microbiota have been associated with the development of colorectal cancer [1,2], but it has not been demonstrated that bacteria have a direct role in the occurrence of oncogenic mutations. Escherichia coli can carry the pathogenicity island pks, which encodes a set of enzymes that synthesize colibactin [3]. This compound is believed to alkylate DNA on adenine residues [4,5] and induces double-strand breaks in cultured cells [3]. Here we expose human intestinal organoids to genotoxic pks+ E. coli by repeated luminal injection over five months. Whole-genome sequencing of clonal organoids before and after this exposure revealed a distinct mutational signature that was absent from organoids injected with isogenic pks-mutant bacteria. The same mutational signature was detected in a subset of 5,876 human cancer genomes from two independent cohorts, predominantly in colorectal cancer. Our study describes a distinct mutational signature in colorectal cancer and implies that the underlying mutational process results directly from past exposure to bacteria carrying the colibactin-producing pks pathogenicity island.

Fig. 1: Co-culture of healthy human intestinal organoids with genotoxic E. coli induces DNA damage.
Fig. 2: Long-term co-culture with pks+ E. coli induces SBS-pks and ID-pks mutational signatures.
Fig. 3: Consensus motifs and extended features of SBS-pks and ID-pks mutational signatures.
Fig. 4: SBS-pks and ID-pks mutational signatures are present in a subset of CRC samples from two independent cohorts.

Data availability

Whole-genome sequence data have been deposited in the European Genome–Phenome Archive under accession number EGAS00001003934. The data used from the Hartwig Medical Foundation (HMF) and Genomics England databases consist of patient-level somatic variant data (annotated variant call data); they are considered privacy sensitive and are available through access-controlled mechanisms.

Patient-level somatic variant and clinical data were obtained from the Hartwig Medical Foundation under data request number DR-084. Somatic variant and clinical data are freely available for academic use from the Hartwig Medical Foundation through standardized procedures. Privacy and publication policies, including co-authorship policies, and data request forms are provided by the Hartwig Medical Foundation. To gain access to the data, the data request form should be submitted by email, after which it will be evaluated within six weeks by the HMF Scientific Council and an independent Data Access Board. When access is granted, the requested data become available through a download link provided by HMF.

Somatic variant data from the Genomics England data set were analysed within the Genomics England Research Environment secure data portal, under Research Registry project code RR87, and exported from the Research Environment following data transfer request 1000000003652 on 3 December 2019. The Genomics England data set can be accessed by joining the community of academic and clinical scientists via the Genomics England Clinical Interpretation Partnership (GeCIP). To join a GeCIP domain, the following steps have to be taken:

  1. Your institution has to sign the GeCIP Participation Agreement, which outlines the key principles that members of each institution must adhere to, including our Intellectual Property and Publication Policy.
  2. Submit your application using the relevant form found at the bottom of the page.
  3. The domain lead will review your application, and your institution will verify your identity for Genomics England and communicate confirmation directly to Genomics England.
  4. Your user account will be created.
  5. You will be sent an email containing a link to complete Information Governance training and sign the GeCIP Rules. Completing the training and signing the GeCIP Rules are requirements for you to access the data. After you have completed the training and signed the rules, you will need to wait for your access to the Research Environment to be granted; this will generally take up to one working day.
  6. You will then receive an email letting you know your account has been given access to the environment, with instructions for logging in.

Details of the data access agreement are available from Genomics England. All requests will be evaluated by the Genomics England Access Review Committee, taking into consideration patient data protection, compliance with legal and regulatory requirements, resource availability and facilitation of high-quality research. All analysis of the data must take place within the Genomics England Research Environment secure data portal, and results may be exported only following approval of a data transfer request. Regarding co-authorship, all publications using data generated as part of the Genomics England 100,000 Genomes Project must include the Genomics England Research Consortium as co-authors. The full publication policy is available from Genomics England.

All other data supporting the findings of this study are available from the corresponding author upon request.

Code availability

All analysis scripts are available at


  1. Allen, J. & Sears, C. L. Impact of the gut microbiome on the genome and epigenome of colon epithelial cells: contributions to colorectal cancer development. Genome Med. 11, 11 (2019).

  2. Gagnaire, A., Nadel, B., Raoult, D., Neefjes, J. & Gorvel, J.-P. Collateral damage: insights into bacterial mechanisms that predispose host cells to cancer. Nat. Rev. Microbiol. 15, 109–128 (2017).

  3. Nougayrède, J.-P. et al. Escherichia coli induces DNA double-strand breaks in eukaryotic cells. Science 313, 848–851 (2006).

  4. Wilson, M. R. et al. The human gut bacterial genotoxin colibactin alkylates DNA. Science 363, eaar7785 (2019).

  5. Xue, M. et al. Structure elucidation of colibactin and its DNA cross-links. Science 365, eaax2685 (2019).

  6. Dejea, C. M. et al. Patients with familial adenomatous polyposis harbor colonic biofilms containing tumorigenic bacteria. Science 359, 592–597 (2018).

  7. Bullman, S. et al. Analysis of Fusobacterium persistence and antibiotic response in colorectal cancer. Science 358, 1443–1448 (2017).

  8. Kostic, A. D. et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14, 207–215 (2013).

  9. Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 25, 679–689 (2019).

  10. Buc, E. et al. High prevalence of mucosa-associated E. coli producing cyclomodulin and genotoxin in colon cancer. PLoS One 8, e56964 (2013).

  11. Arthur, J. C. et al. Intestinal inflammation targets cancer-inducing activity of the microbiota. Science 338, 120–123 (2012).

  12. Bossuet-Greif, N. et al. The colibactin genotoxin generates DNA interstrand cross-links in infected cells. mBio 9, e02393-17 (2018).

  13. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  14. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  15. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  16. Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).

  17. Sato, T. et al. Long-term expansion of epithelial organoids from human colon, adenoma, adenocarcinoma, and Barrett’s epithelium. Gastroenterology 141, 1762–1772 (2011).

  18. Tuveson, D. & Clevers, H. Cancer modeling meets human organoid technology. Science 364, 952–955 (2019).

  19. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836.e16 (2019).

  20. Jager, M. et al. Measuring mutation accumulation in single human adult stem cells by whole-genome sequencing of organoid cultures. Nat. Protocols 13, 59–78 (2018).

  21. Cougnoux, A. et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut 63, 1932–1942 (2014).

  22. Bartfeld, S. et al. In vitro expansion of human gastric epithelial stem cells and their responses to bacterial infection. Gastroenterology 148, 126–136.e6 (2015).

  23. Li, Z.-R. et al. Divergent biosynthesis yields a cytotoxic aminomalonate-containing precolibactin. Nat. Chem. Biol. 12, 773–775 (2016).

  24. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).

  25. Gonzalez-Perez, A. et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat. Methods 10, 1081–1082 (2013).

  26. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  27. McLellan, L. K. & Hunstad, D. A. Urinary tract infection: pathogenesis and outlook. Trends Mol. Med. 22, 946–957 (2016).

  28. Zawadzki, P. J. et al. Identification of infectious microbiota from oral cavity environment of various population group patients as a preventive approach to human health risk factors. Ann. Agric. Environ. Med. 23, 566–569 (2016).

  29. Banerjee, S. et al. Microbial signatures associated with oropharyngeal and oral squamous cell carcinomas. Sci. Rep. 7, 4036 (2017).

  30. Boot, A. et al. Identification of novel mutational signatures in Asian oral squamous cell carcinomas associated with bacterial infections. Preprint at (2019).

  31. Payros, D. et al. Maternally acquired genotoxic Escherichia coli alters offspring’s intestinal homeostasis. Gut Microbes 5, 313–325 (2014).

  32. Olier, M. et al. Genotoxicity of Escherichia coli Nissle 1917 strain cannot be dissociated from its probiotic activity. Gut Microbes 3, 501–509 (2012).

  33. Jacobi, C. A. & Malfertheiner, P. Escherichia coli Nissle 1917 (Mutaflor): new insights into an old probiotic bacterium. Dig. Dis. 29, 600–607 (2011).

  34. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).

  35. Heo, I. et al. Modelling Cryptosporidium infection in human small intestinal and lung organoids. Nat. Microbiol. 3, 814–823 (2018).

  36. Pace, P. et al. FANCE: the link between Fanconi anaemia complex assembly and activity. EMBO J. 21, 3414–3423 (2002).

  37. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

  38. Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).

  39. Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).

  40. Cunningham, F. et al. Ensembl 2015. Nucleic Acids Res. 43, D662–D669 (2015).

  41. Cameron, D. L. et al. GRIDSS, PURPLE, LINX: unscrambling the tumor genome via integrated analysis of structural variation and copy number. Preprint at (2019).

  42. Genomics England. The National Genomics Research and Healthcare Knowledgebase (2017).

  43. Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).

  44. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).



We thank J. H. J. Hoeijmakers, P. Knipscheer and J. I. Garaycoechea for discussions on DNA damage, and P. Robinson, K. Vervier, T. Lawley, and M. Stratton for explorative analysis and discussions. This publication and the underlying study have been made possible partly on the basis of the data that Hartwig Medical Foundation and the Center of Personalised Cancer Treatment (CPCT) have made available to the study. This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. This work was supported by CRUK grant OPTIMISTICC (C10674/A27140), the Gravitation projects and the Netherlands Organ-on-Chip Initiative (024.003.001) from the Netherlands Organisation for Scientific Research (NWO) funded by the Ministry of Education, Culture and Science of the government of the Netherlands (C.P.-M., J.P.), the Oncode Institute (partly financed by the Dutch Cancer Society), the European Research Council under ERC Advanced Grant Agreement no. 67013 (J.P., T.M., H.C.), a VIDI grant from the NWO (no. 016.Vidi.171.023) to R.v.B. that supports A.R.H. and NWO building blocks of life project: Cell dynamics within lung and intestinal organoids (737.016.009) (M.H.G.). This work also received financial support from ITMO Cancer AVIESAN (Alliance Nationale pour les Sciences de la Vie et de la Santé, National Alliance for Life Sciences & Health) within the framework of the Cancer Plan (HTE201601) (G.D., R.B.), as well as from the Howard Hughes Medical Institute, the Mathers Foundation, and NIH-1R01DK115728-01A1 (Y.M., K.C.G.).

Author information

C.P.-M., J.P., A.R.H. and H.C. conceived the study; C.P.-M., J.P., A.R.H., R.v.B. and H.C. wrote the manuscript; A.R.H., H.M.W., F.M. and R.v.B. performed signature analysis; A.R.H., A.v.H., H.M.W., J.N., C.G., P.Q., M.G., M.M. and E.C. provided access to and analysed patient WGS data; G.D. and R.B. isolated bacterial strains and generated knockouts; C.P.-M., J.P., T.M., R.v.d.L., M.H.G. and S.v.E. established and performed organoid cloning experiments; C.P.-M., J.P. and J.B. performed organoid co-culture experiments; P.B.S., F.L.P., J.T. and R.J.L.W. performed bacteria validation and assays. Y.M. and K.C.G. provided and advised on the use of the Wnt surrogate-Fc fusion reagent.

Corresponding authors

Correspondence to Ruben van Boxtel or Hans Clevers.

Ethics declarations

Competing interests

H.C. is inventor on several patents related to organoid technology; his full disclosure is available online. M.M. is scientific advisory board chair and a consultant for OrigiMed, receives research support from Bayer, Janssen, and Ono, and receives royalty payments from Labcorp. H.C. and K.C.G. are co-founders of Surrozen.

Additional information

Peer review information: Nature thanks Bogdan Fedeles, Christian Jobin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Co-culture with genotoxic pks+ E. coli induces DNA interstrand crosslinks in healthy human intestinal organoids.

a, Representative images (out of n = 5 organoids per group) of DNA interstrand crosslink formation after 1 day of co-culture, measured by FANCD2 immunofluorescence (green). Nuclei were stained with DAPI (blue). Yellow boxes represent inset area. Scale bars, 50 μm (main image); 10 μm (inset). Experiment was repeated independently twice with similar results. b, Gating strategy to select epithelial cells (left) and to quantify viable cells (right). c, Mean ± s.d. viability of intestinal organoid cells after 1, 3 or 5 days of co-culture (n = 3 technical replicates) (bacteria eliminated after 3 days of co-culture). Points are independent replicates.

Extended Data Fig. 2 Genotoxic pks+ E. coli induce SBS-pks and ID-pks mutational signatures after long-term co-culture with wild-type intestinal organoids.

a, Ninety-six-trinucleotide mutational spectra of SBSs in each of the three individual clones sequenced per condition. Top three, dye; middle three, pksΔclbQ E. coli; bottom three, pks+ E. coli. b, Total 96-trinucleotide mutational spectra of organoids injected with pks+ E. coli or pksΔclbQ E. coli from which SBSs in dye-injected organoids have been subtracted. c, Heatmap depicting cosine similarity between 96-trinucleotide mutational profiles of organoids injected with dye, pks+ E. coli or pksΔclbQ E. coli. d, Indel mutational spectra plots from each of the three individual clones sequenced per condition. Top three, dye; middle three, pksΔclbQ E. coli; bottom three, pks+ E. coli. e, Total indel mutational spectra of organoids injected with pks+ E. coli and pksΔclbQ E. coli from which indels in dye-injected organoids have been subtracted. f, Heatmap depicting cosine similarity between indel mutational profiles of organoids injected with dye, pks+ E. coli or pksΔclbQ E. coli.
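Panels c and f compare mutational profiles by cosine similarity. As a minimal sketch of that comparison, assuming each profile is a plain vector of per-channel mutation counts (96 channels for SBSs), the pairwise scores behind such a heatmap could be computed as follows. The function names and the toy counts are illustrative assumptions, not the authors' analysis scripts.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two per-channel mutation count vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an empty profile is treated as having similarity 0.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(
    profiles: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """All pairwise cosine similarities, keyed by (row name, column name)."""
    return {
        (p, q): cosine_similarity(profiles[p], profiles[q])
        for p in profiles
        for q in profiles
    }


# Toy 4-channel profiles with invented counts; real SBS profiles have 96 channels.
toy_profiles = {
    "dye": [10, 2, 1, 0],
    "pks_delta_clbQ": [9, 3, 1, 0],
    "pks_plus": [10, 2, 1, 25],
}
matrix = similarity_matrix(toy_profiles)
```

In this toy data the dye and pksΔclbQ profiles score close to 1 against each other, whereas the pks+ profile, dominated by one extra channel, scores much lower against both.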

Extended Data Fig. 3 Genotoxic pks+ E. coli and isogenic strain reconstituted with pksΔclbQ:clbQ induce SBS-pks and ID-pks mutational signatures after co-culture.

a, Ninety-six-trinucleotide mutational spectra of SBSs in three individual clones from the independent human healthy intestinal organoid line ASC-6a co-cultured for three rounds with pks+ or pksΔclbQ E. coli. b, Top, total 96-trinucleotide mutational spectra from the three clones co-cultured with pks+ or pksΔclbQ E. coli shown in a. Bottom, resulting 96-trinucleotide mutational spectrum from ASC-6a organoids co-cultured with pks+ E. coli after the subtraction of background mutations from three parallel pksΔclbQ E. coli co-cultures (cosine similarity to SBS-pks = 0.77). c, Indel mutational spectra from the three independent ASC-6a clones co-cultured for three rounds with pks+ or pksΔclbQ E. coli. d, Top, total indel mutational spectra from the three clones co-cultured with pks+ or pksΔclbQ E. coli shown in c. Bottom, resulting indel mutational spectrum from the independent ASC-6a organoids co-cultured with pks+ E. coli after the subtraction of background mutations from three parallel pksΔclbQ E. coli co-cultures (cosine similarity to ID-pks = 0.93). e, Ninety-six-trinucleotide mutational spectra from three individual clones of the ASC-5a line co-cultured for three rounds with the isogenic recomplemented E. coli strain pksΔclbQ:clbQ. f, Top, total 96-trinucleotide mutational spectrum from the three clones co-cultured with pksΔclbQ:clbQ E. coli shown in e. Bottom, resulting mutational spectrum after subtracting pksΔclbQ background (cosine similarity to SBS-pks = 0.95). g, Indel mutational spectra from three individual clones of the ASC-5a line co-cultured for three rounds with the isogenic recomplemented E. coli strain pksΔclbQ:clbQ. h, Top, total indel mutational spectrum from the three clones co-cultured with pksΔclbQ:clbQ E. coli shown in g. Bottom, resulting mutational spectrum after subtracting pksΔclbQ background (cosine similarity to ID-pks = 0.95).
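The background subtraction described in b, d, f and h can be illustrated with a short sketch. It assumes that spectra are summed per channel over the clones of each condition and that channels in which the background exceeds the exposed count are clipped to zero. The caption does not state how such channels, or differences in total mutation load, were handled, so both choices are assumptions, as are the function names and the invented toy counts.

```python
from typing import List, Sequence


def total_spectrum(clones: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-channel mutation counts over all clones of one condition."""
    return [sum(channel) for channel in zip(*clones)]


def subtract_background(
    exposed: Sequence[float], background: Sequence[float]
) -> List[float]:
    """Per-channel difference, clipped at zero (clipping is an assumption)."""
    if len(exposed) != len(background):
        raise ValueError("spectra must have the same number of channels")
    return [max(e - b, 0.0) for e, b in zip(exposed, background)]


def to_fractions(spectrum: Sequence[float]) -> List[float]:
    """Rescale a spectrum so that its channels sum to 1."""
    total = sum(spectrum)
    if total == 0:
        return [0.0 for _ in spectrum]
    return [value / total for value in spectrum]


# Toy 3-channel counts for three clones per condition (invented numbers).
pks_plus_clones = [[5, 1, 12], [4, 2, 10], [6, 0, 14]]
pks_mutant_clones = [[5, 2, 1], [4, 1, 0], [5, 1, 1]]

exposed_total = total_spectrum(pks_plus_clones)        # [15, 3, 36]
background_total = total_spectrum(pks_mutant_clones)   # [14, 4, 2]
residual = subtract_background(exposed_total, background_total)
residual_fractions = to_fractions(residual)
```

The residual spectrum is what would then be compared with a reference signature, for example by cosine similarity as reported in the caption.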

Extended Data Fig. 4 Detailed sequence context for ID-pks and longer deletions by length.

a, Ten-base up- and downstream profile shows an upstream homopolymer of adenosines that favours induction of T deletions. The length of the adenosine stretch decreases with increasing T homopolymer length (1–8, top left to bottom right).
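The relationship between the T homopolymer and the upstream adenosine stretch can be made concrete with a small sketch. It assumes a reference sequence given as a string read 5' to 3' on the strand carrying the deleted T, and a 0-based index of one T in the affected homopolymer; the function name and the example sequences are hypothetical, and the authors' exact motif definition is not given in this caption.

```python
from typing import Tuple


def homopolymer_context(seq: str, pos: int) -> Tuple[int, int]:
    """Return (T homopolymer length, length of the A run directly upstream).

    `pos` is a 0-based index of any T inside the homopolymer of interest.
    """
    seq = seq.upper()
    if seq[pos] != "T":
        raise ValueError("position does not point at a T")

    # Extend left and right to find the full T homopolymer.
    start = pos
    while start > 0 and seq[start - 1] == "T":
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == "T":
        end += 1
    t_length = end - start + 1

    # Count consecutive A bases immediately upstream of the homopolymer.
    a_length = 0
    i = start - 1
    while i >= 0 and seq[i] == "A":
        a_length += 1
        i -= 1
    return t_length, a_length


# Hypothetical context: four As followed by a three-base T homopolymer.
example = homopolymer_context("GCAAAATTTCG", 7)
```

Tabulating the second value against the first over many deletions would give the kind of trend the panel describes, under the assumptions stated above.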

Extended Data Fig. 5 Signature extraction and clonal contribution of SBS-pks in CRC metastases.

a, De novo NMF-SBS-pks signature extracted by NMF on all 496 CRC metastases in the HMF data set. b, Cosine similarity scores between the de novo extracted SBS signature in a and COSMIC SigProfiler signatures, including our experimentally defined SBS-pks signature (left). c, Relative contribution of SBS-pks to clonal (corrected variant allele frequency >0.4, blue) and subclonal fractions (corrected variant allele frequency <0.2, red) of mutations in the 31 SBS/ID-pks high CRC metastases from the HMF cohort. Box, upper and lower quartiles; centre line, mean; whiskers, largest value no more than 1.5 times the interquartile range extending from the box; points, individual CRC metastases.
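The clonal and subclonal fractions in c are defined by thresholds on the corrected variant allele frequency (VAF). A minimal sketch of that partition is shown below, assuming corrected VAFs are already available as floats; the name of the middle group and the treatment of values exactly at a threshold follow from the strict inequalities in the caption, while the function and constant names are illustrative assumptions.

```python
from typing import Dict, Iterable, List

CLONAL_MIN_VAF = 0.4     # clonal: corrected VAF > 0.4
SUBCLONAL_MAX_VAF = 0.2  # subclonal: corrected VAF < 0.2


def split_by_clonality(corrected_vafs: Iterable[float]) -> Dict[str, List[float]]:
    """Partition mutations into clonal, subclonal and intermediate groups."""
    groups: Dict[str, List[float]] = {
        "clonal": [],
        "subclonal": [],
        "intermediate": [],  # between the thresholds; used in neither fraction
    }
    for vaf in corrected_vafs:
        if vaf > CLONAL_MIN_VAF:
            groups["clonal"].append(vaf)
        elif vaf < SUBCLONAL_MAX_VAF:
            groups["subclonal"].append(vaf)
        else:
            groups["intermediate"].append(vaf)
    return groups


# Invented corrected VAFs for illustration.
groups = split_by_clonality([0.5, 0.1, 0.3, 0.4, 0.2])
```

The relative contribution of a signature would then be estimated separately on the clonal and subclonal groups; that refitting step is not sketched here.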

Extended Data Table 1 SBS-pks and ID-pks levels across tissue types

Supplementary information

Reporting Summary


Supplementary Table 1 Mutations matching pks motifs in driver genes in colorectal cancer. List of the number of mutations matching the SBS-pks or ID-pks motifs and total number of mutations within the top 50 driver genes present in colorectal cancer. Dataset obtained from the IntOGen cancer mutation database [25].


Supplementary Table 2 Protein coding sequence mutations matching the SBS/ID-pks motif. List of all mutations from all SBS/ID-pks high CRC samples matching the SBS/ID-pks extended motif and leading to changes in protein coding regions of the genome.

Cite this article

Pleguezuelos-Manzano, C., Puschhof, J., Rosendahl Huber, A. et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature 580, 269–273 (2020).
