Colibactin DNA-damage signature indicates mutational impact in colorectal cancer

Dziubańska-Kusibab, Paulina J.; Berger, Hilmar; Battistini, Federica; Bouwman, Britta A. M.; Iftekhar, Amina; Katainen, Riku; Cajuso, Tatiana; Crosetto, Nicola; Orozco, Modesto; Aaltonen, Lauri A.; Meyer, Thomas F.

doi:10.1038/s41591-020-0908-2

Letter
Published: 01 June 2020

Colibactin DNA-damage signature indicates mutational impact in colorectal cancer

Paulina J. Dziubańska-Kusibab¹^na1,
Hilmar Berger ORCID: orcid.org/0000-0002-6304-4946¹^na1,
Federica Battistini²,
Britta A. M. Bouwman ORCID: orcid.org/0000-0002-9827-9497³,
Amina Iftekhar¹,
Riku Katainen⁴,
Tatiana Cajuso⁴,
Nicola Crosetto³,
Modesto Orozco^2,5,
Lauri A. Aaltonen⁴ &
…
Thomas F. Meyer ORCID: orcid.org/0000-0002-6120-8679¹

Nature Medicine volume 26, pages 1063–1069 (2020)Cite this article

10k Accesses
141 Citations
59 Altmetric
Metrics details

Subjects

Abstract

The mucosal epithelium is a common target of damage by chronic bacterial infections and the accompanying toxins, and most cancers originate from this tissue. We investigated whether colibactin, a potent genotoxin¹ associated with certain strains of Escherichia coli², creates a specific DNA-damage signature in infected human colorectal cells. Notably, the genomic contexts of colibactin-induced DNA double-strand breaks were enriched for an AT-rich hexameric sequence motif, associated with distinct DNA-shape characteristics. A survey of somatic mutations at colibactin target sites of several thousand cancer genomes revealed notable enrichment of this motif in colorectal cancers. Moreover, the exact double-strand-break loci corresponded with mutational hot spots in cancer genomes, reminiscent of a trinucleotide signature previously identified in healthy colorectal epithelial cells³. The present study provides evidence for the etiological role of colibactin in human cancer.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Colibactin preferentially damages DNA in specific AT-rich motifs.**

**Fig. 2: Colibactin-specific binding motif corresponds to extreme DNA-shape parameter values and electrostatic potential.**

**Fig. 3: Several cancers show enrichment of mutations in CDMs.**

**Fig. 4: Analysis of colibactin-induced DSB break ends and mutational patterns in CDMs indicate damaged positions.**

A distinct Fusobacterium nucleatum clade dominates the colorectal cancer niche

Article Open access 20 March 2024

Martha Zepeda-Rivera, Samuel S. Minot, … Christopher D. Johnston

A single-cell atlas enables mapping of homeostatic cellular shifts in the adult human breast

Article Open access 28 March 2024

Austin D. Reed, Sara Pensa, … Walid T. Khaled

Targeting DCAF5 suppresses SMARCB1-mutant cancer by stabilizing SWI/SNF

Article 27 March 2024

Sandi Radko-Juettner, Hong Yue, … Charles W. M. Roberts

Data availability

Input FASTA files have been submitted to the Gene Expression Omnibus under accession no. GSE145594 and analysis scripts are available from https://github.com/MPIIB-Department-TFMeyer/Dziubanska-Kusibab_et_al._Colibactin. All other data are available in the main text or supplementary materials.

References

Nougayrede, J. P. et al. Escherichia coli induces DNA double-strand breaks in eukaryotic cells. Science 313, 848–851 (2006).
Article CAS PubMed Google Scholar
Putze, J. et al. Genetic structure and distribution of the colibactin genomic island among members of the family Enterobacteriaceae. Infect. Immun. 77, 4696–4703 (2009).
Article CAS PubMed PubMed Central Google Scholar
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Article CAS PubMed Google Scholar
Watanabe, T., Tada, M., Nagai, H., Sasaki, S. & Nakao, M. Helicobacter pylori infection induces gastric cancer in Mongolian gerbils. Gastroenterology 115, 642–648 (1998).
Article CAS PubMed Google Scholar
Castellarin, M. et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 22, 299–306 (2012).
Article CAS PubMed PubMed Central Google Scholar
Arthur, J. C. et al. Intestinal inflammation targets cancer-inducing activity of the microbiota. Science 338, 120–123 (2012).
Article CAS PubMed PubMed Central Google Scholar
Cougnoux, A. et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut 63, 1932–1942 (2014).
Article CAS PubMed Google Scholar
Bleich, R. M. & Arthur, J. C. Revealing a microbial carcinogen. Science 363, 689–690 (2019).
Article CAS PubMed Google Scholar
Hibbing, M. E., Fuqua, C., Parsek, M. R. & Peterson, S. B. Bacterial competition: surviving and thriving in the microbial jungle. Nat. Rev. Microbiol. 8, 15–25 (2010).
Article CAS PubMed PubMed Central Google Scholar
Takahashi, I. et al. Duocarmycin A, a new antitumor antibiotic from Streptomyces. J. Antibiot. (Tokyo) 41, 1915–1917 (1988).
Article CAS Google Scholar
Igarashi, Y. et al. Yatakemycin, a novel antifungal antibiotic produced by Streptomyces sp. TP-A0356. J. Antibiot. (Tokyo) 56, 107–113 (2003).
Article CAS Google Scholar
Arcamone, F., Penco, S., Orezzi, P., Nicolella, V. & Pirelli, A. Structure and synthesis of distamycin A. Nature 203, 1064–1065 (1964).
Article CAS PubMed Google Scholar
Finlay, A., Hochstein, F., Sobin, B. & Murphy, F. Netropsin, a new antibiotic produced by a Streptomyces. J. Am. Chem. Soc. 73, 341–343 (1951).
Article CAS Google Scholar
Boger, D. L. & Johnson, D. S. CC-1065 and the duocarmycins: unraveling the keys to a new class of naturally derived DNA alkylating agents. Proc. Natl Acad. Sci. USA 92, 3642–3649 (1995).
Article CAS PubMed PubMed Central Google Scholar
Zha, L. et al. Colibactin assembly line enzymes use S-adenosylmethionine to build a cyclopropane ring. Nat. Chem. Biol. 13, 1063–1065 (2017).
Article CAS PubMed PubMed Central Google Scholar
Healy, A. R., Vizcaino, M. I., Crawford, J. M. & Herzon, S. B. Convergent and modular synthesis of candidate precolibactins. Structural revision of precolibactin A. J. Am. Chem. Soc. 138, 5426–5432 (2016).
Article CAS PubMed PubMed Central Google Scholar
Zha, L., Wilson, M. R., Brotherton, C. A. & Balskus, E. P. Characterization of polyketide synthase machinery from the pks island facilitates isolation of a candidate precolibactin. ACS Chem. Biol. 11, 1287–1295 (2016).
Article CAS PubMed Google Scholar
Vizcaino, M. I. & Crawford, J. M. The colibactin warhead crosslinks DNA. Nat. Chem. 7, 411–417 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wilson, M. R. et al. The human gut bacterial genotoxin colibactin alkylates DNA. Science 363, eaar7785 (2019).
Xue, M. et al. Structure elucidation of colibactin and its DNA cross-links. Science 365, eaax2685 (2019).
Xue, M., Wernke, K. & Herzon, S. B. Depurination of colibactin-derived interstrand cross-links. Biochemistry 59, 892–900 (2020).
Article CAS PubMed Google Scholar
Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 15058 (2017).
Article CAS PubMed PubMed Central Google Scholar
Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521.e518 (2017).
Article CAS PubMed PubMed Central Google Scholar
Yang, F., Kemp, C. J. & Henikoff, S. Anthracyclines induce double-strand DNA breaks at active gene promoters. Mutat. Res. 773, 9–15 (2015).
Article CAS PubMed PubMed Central Google Scholar
Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587–589 (2016).
Article CAS PubMed Google Scholar
Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Article CAS PubMed PubMed Central Google Scholar
Tubbs, A. et al. Dual roles of poly(dA:dT) tracts in replication initiation and fork collapse. Cell 174, 1127–1142.e1119 (2018).
Article CAS PubMed PubMed Central Google Scholar
Nelson, H. C., Finch, J. T., Luisi, B. F. & Klug, A. The structure of an oligo(dA):oligo(dT) tract and its biological implications. Nature 330, 221–226 (1987).
Article CAS PubMed Google Scholar
Yuan, G. C. et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005).
Article CAS PubMed Google Scholar
Tse, W. C. & Boger, D. L. Sequence-selective DNA recognition: natural products and nature’s lessons. Chem. Biol. 11, 1607–1617 (2004).
Article CAS PubMed Google Scholar
Drsata, T. et al. Mechanical properties of symmetric and asymmetric DNA A-tracts: implications for looping and nucleosome positioning. Nucleic Acids Res. 42, 7383–7394 (2014).
Article CAS PubMed PubMed Central Google Scholar
Buc, E. et al. High prevalence of mucosa-associated E. coli producing cyclomodulin and genotoxin in colon cancer. PLoS ONE 8, e56964 (2013).
Article CAS PubMed PubMed Central Google Scholar
Giannakis, M. et al. Genomic correlates of immune-cell Infiltrates in colorectal carcinoma. Cell Rep, 15, 857–865 (2016).
Article CAS Google Scholar
Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821 (2015).
Article CAS PubMed Google Scholar
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Article CAS PubMed PubMed Central Google Scholar
Iannelli, F. et al. A damaged genome’s transcriptional landscape through multilayered expression profiling around in situ-mapped DNA double-strand breaks. Nat. Commun. 8, 15656 (2017).
Article CAS PubMed PubMed Central Google Scholar
Iyama, T. & Wilson, D. M. 3rd DNA repair mechanisms in dividing and non-dividing cells. DNA Repair 12, 620–636 (2013).
Article CAS PubMed PubMed Central Google Scholar
Choi, J. Y., Lim, S., Kim, E. J., Jo, A. & Guengerich, F. P. Translesion synthesis across abasic lesions by human B-family and Y-family DNA polymerases alpha, delta, eta, iota, kappa, and REV1. J. Mol. Biol. 404, 34–44 (2010).
Article CAS PubMed PubMed Central Google Scholar
Martin, L. P., Hamilton, T. C. & Schilder, R. J. Platinum resistance: the role of DNA repair pathways. Clin, Cancer Res. 14, 1291–1295 (2008).
Article CAS Google Scholar
Morin, P. J. et al. Activation of beta-catenin-Tcf signaling in colon cancer by mutations in beta-catenin or APC. Science 275, 1787–1790 (1997).
Article CAS PubMed Google Scholar
Pleguezuelos-Manzano, C. et al. Mutational signature in colorectal cancer caused by genotoxic pks ⁺ E. coli. Nature 580, 269–273 (2020).
Article CAS PubMed PubMed Central Google Scholar
Dziubańska-Kusibab, P. J. et al. Colibactin DNA damage signature indicates causative role in colorectal cancer. Preprint at https://www.biorxiv.org/content/10.1101/819854v1 (2019).
Gothe, H. J. et al. Spatial chromosome folding and active transcription drive DNA fragility and formation of oncogenic MLL translocations. Mol. Cell 75, 267–283.e212 (2019).
Article CAS PubMed Google Scholar
Zhang, F. et al. Breaks labeling in situ and sequencing (BLISS). Protocol Exchange https://doi.org/10.1038/protex.2017.018 (2017).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central CAS Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
CAS PubMed PubMed Central Google Scholar
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Speir, M. L. et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 44, D717–D725 (2016).
Article CAS PubMed Google Scholar
Schneider, T. D. & Stephens, R. M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990).
Article CAS PubMed PubMed Central Google Scholar
Bembom, O. seqLogo: sequence logos for DNA sequence alignments. R package version 1.44.0 (2017).
Chiu, T. P. et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32, 1211–1213 (2016).
Article CAS PubMed Google Scholar
Gelpi, J. L. et al. Classical molecular interaction potentials: improved setup procedure in molecular dynamics simulations of proteins. Proteins 45, 428–437 (2001).
Article CAS PubMed Google Scholar
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
Article CAS PubMed Google Scholar
Sousa da Silva, A. W. & Vranken, W. F. ACPYPE: AnteChamber PYthon Parser interfacE. BMC Res. Notes 5, 367 (2012).
Article PubMed PubMed Central Google Scholar
van Zundert, G. C. P. et al. The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. J. Mol. Biol. 428, 720–725 (2016).
Article CAS PubMed Google Scholar
Dans, P. D. et al. Long-timescale dynamics of the Drew–Dickerson dodecamer. Nucleic Acids Res. 44, 4052–4066 (2016).
Article CAS PubMed PubMed Central Google Scholar
Perez, A., Luque, F. J. & Orozco, M. Dynamics of B-DNA on the microsecond time scale. J. Am. Chem. Soc. 129, 14739–14745 (2007).
Article CAS PubMed Google Scholar
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Article CAS Google Scholar
Berendsen, H. J., Postma, J. V., van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Article CAS Google Scholar
Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).
Article CAS Google Scholar
Ivani, I. et al. Parmbsc1: a refined force field for DNA simulations. Nat. Methods 13, 55–58 (2016).
Article CAS PubMed Google Scholar
Case, D. et al. AMBER 18 (University of California, 2018).
Roe, D. R. & Cheatham, T. E. 3rd PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 9, 3084–3095 (2013).
Article CAS PubMed Google Scholar
Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996). 27-38.
Article CAS PubMed Google Scholar
Ellrott, K. et al. Scalable open dcience approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e277 (2018).
Article CAS PubMed PubMed Central Google Scholar
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300 (1995).
Google Scholar
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Article CAS PubMed PubMed Central Google Scholar
Katainen, R. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat. Protoc. 13, 2580–2600 (2018).
Article CAS PubMed Google Scholar
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
Article PubMed PubMed Central CAS Google Scholar
Tamborero, D. et al. Cancer genome interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 10, 25 (2018).
Article PubMed PubMed Central CAS Google Scholar
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 e318 (2018).
Article CAS PubMed PubMed Central Google Scholar
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations In cancer. Nucleic Acids Res. 47, D941–d947 (2019).
Article CAS PubMed Google Scholar
Gaffney, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).
Article CAS PubMed PubMed Central Google Scholar
R Core Team. A Language and Environment for Statistical Computing https://www.R-project.org (R Foundation for Statistical Computing, 2018).

Download references

Acknowledgements

The results shown here are in whole or part based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga. We thank S. Garnerone from N.C.’s laboratory for processing raw sBLISS data, P. D. Dans from the Biophysical Chemistry Lab, Department of Biological Science (CENUR Litoral Norte), UdelaR, Universidad de la República, for constructive discussions about the theoretical model of colibactin, U. Dobrindt from University of Münster for providing E. coli strains, and K. Lapid and R. Zietlow for editing the manuscript. We also acknowledge the computational resources provided by the CSC–IT Center for Science, Finland. P.J.D.-K. was supported by the IMPRS-IDI graduate school. B.A.M.B. was supported by a Rubicon fellowship from the Netherlands Organisation for Scientific Research. M.O. is an academic researcher of Institució Catalana de Recerca i Estudis Avancats. The present study was supported by the German Research Foundation (grant no. ME705/18-1) to T.F.M., by the Spanish Ministry of Science (grant nos. BIO2015-64802-R, BFU2014-61670-EXP and BFU2014-52864-R), the Catalan Government (grant no. 2014-SGR), Instituto de Salud Carlos III: Instituto Nacional de Bioinformática (PT 13/000/0030), the Biomolecular and Bioinformatics Resources Platform and the EU Horizon 2020 research and innovation program (grant nos. Elixir-Excelerate 676559 and BioExcel2:823830), and the MINECO Severo Ochoa Award of Excellence to the IRB Barcelona, and by the Ragnar Söderberg Foundation, the Swedish Foundation for Strategic Research (BD15-0095) and the Strategic Research Programme in Cancer (StratCan) at Karolinska Institutet to N.C. L.A.A. was supported by grants from the Academy of Finland (Finnish Center of Excellence no. 312041, Academy Professor grant nos. 319083 and 320149, iCAN Flagship 320185), Jane and Aatos Erkko Foundation, Sigrid Juselius Foundation, Helsinki Institute of Life Science and the Cancer Society of Finland.

Author information

These authors contributed equally: Paulina J. Dziubańska-Kusibab, Hilmar Berger.

Authors and Affiliations

Department of Molecular Biology, Max Planck Institute for Infection Biology, Berlin, Germany
Paulina J. Dziubańska-Kusibab, Hilmar Berger, Amina Iftekhar & Thomas F. Meyer
Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain
Federica Battistini & Modesto Orozco
Science for Life Laboratory, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
Britta A. M. Bouwman & Nicola Crosetto
Applied Tumor Genomics Research Program and Department of Medical and Clinical Genetics, Biomedicum, University of Helsinki, Helsinki, Finland
Riku Katainen, Tatiana Cajuso & Lauri A. Aaltonen
Department of Biochemistry and Biomedicine, University of Barcelona, Barcelona, Spain
Modesto Orozco

Authors

Paulina J. Dziubańska-Kusibab
View author publications
You can also search for this author in PubMed Google Scholar
Hilmar Berger
View author publications
You can also search for this author in PubMed Google Scholar
Federica Battistini
View author publications
You can also search for this author in PubMed Google Scholar
Britta A. M. Bouwman
View author publications
You can also search for this author in PubMed Google Scholar
Amina Iftekhar
View author publications
You can also search for this author in PubMed Google Scholar
Riku Katainen
View author publications
You can also search for this author in PubMed Google Scholar
Tatiana Cajuso
View author publications
You can also search for this author in PubMed Google Scholar
Nicola Crosetto
View author publications
You can also search for this author in PubMed Google Scholar
Modesto Orozco
View author publications
You can also search for this author in PubMed Google Scholar
Lauri A. Aaltonen
View author publications
You can also search for this author in PubMed Google Scholar
Thomas F. Meyer
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

F.B., B.A.M.B., A.I. and R.K. are co-second authors. P.J.D.-K., H.B. and T.F.M. conceived the project and designed experiments. N.C. and B.A.M.B. designed the sBLISS experiments. P.J.D.-K. and B.A.M.B. performed the sBLISS experiments. A.I. performed the E. coli infection and immunostaining experiments. P.J.D.-K. and A.I. prepared the samples for sBLISS. P.J.D.-K. and H.B. performed the bioinformatics analysis. F.B. and M.O. built the theoretical model of colibactin. R.K., T.C. and L.A.A. provided and analyzed the WGS CRC data. P.J.D.-K., H.B. and T.F.M. wrote the manuscript.

Corresponding author

Correspondence to Thomas F. Meyer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Alison Farrell was the primary editor on this article, and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the analyses provided in this study.

a, The overall results generated in this work derive from four independent investigations of the colibactin-induced double-strand breaks (DSBs), using sBLISS technology. The resulting data sets were first used to identify the colibactin damage motif (CDM), which turned out to be best described by the consensus hexameric sequence AAWWTT. b, Topological analyses and the assignment of shape characteristics led to a virtual model of colibactin–DNA interaction, which is compatible with a colibactin-induced inter-strand cross-link formation between two adenines at positions 2(+) and 5(-) of the minor groove. c, The detection of the CDM served to identify associated single nucleotide substitutions in large human cancer genome data sets - both exome (TCGA) and whole genome (Finish cohort). The comparison of different cancer entities indicated notable enrichment of CDM-associated mutations in some colorectal cancer. The comparison of colon cancer types indicated an enrichment at distal colon locations. The classification of CDM-associated mutations revealed a strong correlation with known triplet signatures, above all with the signature SBSA, which was previously identified in normal colon epithelia. d, An independent analysis of the exact DSB endpoints led to the identification of putative processing sites following colibactin-induced DNA damage. Accordingly, the processing sites were located downstream (in a 5’>3’ direction) of the putatively-alkylated adenines on opposite strands, generating 2 nt long 5’ overhangs at the broken DNA. Alignment of these break positions with the location of colibactin-specific cancer mutations revealed a striking local correspondence, indicating the occurrence of mutations at distinct sites on the CDM. e, Overall, the sBLISS-derived analyses provide extensive insight into the mutational function of colibactin and reveal strong links to human cancer.

Extended Data Fig. 2 Validation of sBLISS for the detection of colibactin-associated DSB break points.

a, In contrast to infection with pks^- E. coli, Caco-2 cells infected with colibactin-producing pks⁺ E. coli displayed γH2AX expression, megalocytosis and multinucleation. Representative image of two independent experiments. b, sBLISS signal of etoposide-induced DSBs showed increased counts at TSS as compared to untreated control conditions. c, This heatmap indicates the log2 odds ratio of break enrichment in genomic region collections (FDR < 5%) as compared to the rest of the genome: untreated, etoposide-treated, pks⁺ E. coli-infected and pks^- E. coli-infected Caco-2 cells. Rows represent known genomic regions from different genome annotation collections. Etoposide treatment increased DSB enrichment in TSS-associated features, while DSBs in pks⁺ E. coli-infected cells show a rather weaker enrichment compared to untreated and pks^- E. coli-infected cells, indicating a more uniform distribution across the genome.

Extended Data Fig. 3 Enrichment of motifs in the context of DSBs upon infection with pks+ E. coli or etoposide treatment.

a, Using the DREME analysis, we determined sequence motifs in the context of the DSBs in pks⁺ E. coli- vs. pks^- E. coli-infected cells, n=4. The top three motifs identified for each biological replicate are presented. b, Using the DREME analysis, we determined sequence motifs in the context of the DSBs in etoposide treated vs. untreated control cells, n=4. The top ten motifs identified for each biological replicate are presented.

Extended Data Fig. 4 Enrichment of CDMs compared to other AT-rich sequences and in nucleosome-free regions.

a, The enrichment of hexanucleotide sequences in close proximity to DSB positions (±7 nt) upon different treatments. Bars represent the enrichment of DSB-associated hexanucleotide motifs (mean log2 ratios of DSB at each motif comparing two conditions) in pks⁺ E. coli-infected cells vs. non-infected (NT) cells, n=4 independent experiments. Error bars denote the 95% confidence interval (CI) around the mean log2 ratios. b, The enrichment of pentanucleotide sequences in close proximity to DSB positions (±7 nt) upon different treatments. Bars represent the enrichment of DSB-associated pentanucleotide motifs (mean log2 ratios of DSB at each motif comparing two conditions) in pks⁺ E. coli-infected vs. pks^- E. coli-infected cells (left panel) or in pks^- E. coli-infected vs. non-infected (NT) cells (right panel), n=4 independent experiments. Error bars denote the 95% CI around the mean log2 ratios. c, Scaled mean Log2 enrichment values (scaled to 1 for maximum enrichment per replicate) of all nonamers in the context of DSBs detected by sBLISS, n=4 independent experiments. The x-axis represents the number of A/T in a nonamer. Red box plots represent AAWWTT motifs in the nonamer sequence, while turquoise box plots represent all other motifs. Each data point corresponds to one nonamer. d, The distributions of hexanucleotide patterns containing only A/T around centers of nucleosome dyads in the human genome as determined by 80. The values are smoothed proportions of hexanucleotides within ±100 bp next to the nucleosome dyad centers. Dashed vertical lines represent ± 73 bp that indicate the size of the nucleosome-covered DNA. Grey curves represent mean ± 2*SD of all A/T hexanucleotides, except for the AAWWTT motif, AAAAAA, ATATAT, TATATA and their reverse complements, n=33 hexanucleotide motifs. Full curves represent the AAWWTT motifs. Dashed curved represent the outlier motifs AAAAAA, ATATAT, TATATA, showing clearly distinct profiles from all other hexanucleotide profiles.

Extended Data Fig. 5 Topological features of hexameric sequences and the CDMs.

a, The relationship between different hexanucleotide sequences (log2 ratio standardized to 1; mean of 4 independent experiments) and the values of predicted DNA shape parameters. CDMs are noted, demonstrating notable enrichment. For propeller twist (ProT), the mean of the values are computed for the 3rd and 4th nucleotide of the motif; for roll and helical twists (HelT), the mean of the values are computed for base-pair steps 2 to 4 in each hexamer. Error bars denote the 95% CI around mean scaled log2 ratios for each hexanucleotide sequence, n=4. b, The distance between the cyclopropanes (Å) of colibactin by running a molecular dynamics simulation of free colibactin in water, n=1. The horizontal red line identifies the average values of this distance (12.8 Å). c, The correlations of hexanucleotide motif enrichment (mean log2 ratio) with DNA shape parameters (left - 3 columns) and central dinucleotides (rightmost column), depending on the presence and distance of adenines on opposite strands, n=4 independent experiments. The presence and distance of adenines are classified in the following groups (from top to bottom, all sequences in 5’-3’ direction): A-T at 3 nt distance, A-T at 4 nt distance, A-T at 3 and 4 nt distance, T-A at 3 nt distance, T-A at 4 nt distance, T-A at 3 and 4 nt distance, no A-T/T-A at 3 or 4 nt distance. Lines in the left three columns are mean log2 ratios smoothed by local regression, method LOESS (locally estimated scatterplot smoothing).

Extended Data Fig. 6 Colibactin-related mutational characteristics in human cancers (I).

a, Enrichment of SBS mutations at AAATTT/AAAATT CDMs in exome sequences obtained from the TCGA project. Top row: cancer entities that showed enrichment across all subcohorts. Second to fourth row: cancer entities that showed enrichment only for POLE-mutated cases or no enrichment at all. COAD - colon adenocarcinoma, READ - rectal adenocarcinoma, STAD - stomach adenocarcinoma, UCEC - uterine corpus endometroid cancer, BLCA - bladder cancer, CESC - cervix squamous cell carcinoma, HNSC - head and neck squamous cell cancer, LUAD - lung adenocarcinoma, LUSC - lung squamous cell carcinoma, DLBC – diffuse large B-cell lymphoma, LAML – acute myeloid leukemia, SARC - sarcoma. Asterisks - significant difference (two-sided Mann-Whitney-U test, p < 0.05 and FDR < 20%) between CDMs and all WWWWW motifs. Numbers under asterisks – number of mutations overlapping AAWWTT / all mutations, [number of samples per cohort]. Error bars = 0 ± 2MAD intervals of mutation rates (mutations/base-pair) of WWWWWW sequences, excluding CDMs, after subtracting their mean. Dots - mutation rates for AAATTT/AAAATT after mean subtraction. Crosses - means of the CDMs. Q1-Q4 cohorts defined by the quantiles of total SNV numbers. Outliers - samples with SNV numbers significantly larger than the 95% CI in each cancer entity. POLE - polymerase epsilon mutated cases.

Extended Data Fig. 7 Cross-contamination of cancer mutation enrichments at AAWWTT sequences.

a, A heat map of SBS pentanucleotide change patterns in colorectal cancer (CRC) genomes based on the PCAWG project, n=60. The results indicate that the established hypermutator-associated SBS signature SBS28 strongly affects one AAWWTT-associated pentanucleotide sequence (red arrow). SBS28 possibly biases AAWWTT mutation estimates (leftmost annotation column for AAWWTT-derived pentanucleotide signature) in POLE/POLD1-mutated CRCs. b, A heat map of SBS pentanucleotide change patterns in stomach adenocarcinoma (STAD) genomes based on the PCAWG project, n=75. The results indicate that the established SBS signature SBS17b strongly affects two AAWWTT-associated pentanucleotide sequences (red arrows). SBS17b possibly biases AAWWTT mutation estimates (leftmost annotation column for AAWWTT-derived pentanucleotide signature) in STADs. The heatmaps show relative proportions of pentanucleotide changes (n=133, rows) associated with AAWWTT-related sequences or SBS28/SBS17b trinucleotide changes. The rows are ordered by the strength of expected contribution from SBS28 (CRC, A) or SBS17b (STAD, B) signatures, followed by AAWWTT contributions. Expected contributions from each signature to pentanucleotide changes are shown as color code on the top of the heatmap for each row. The left part of the heatmap represents pentanucleotide changes with contributions greater than zero in SBS28 or SBS17b (A or B, respectively), whereas the right part represents pentanucleotide changes with expected contributions of zero. Estimated contributions of SBS28, SBS17b and SBSA per tumor sample that are greater than zero are shown as a color code on the left of the heatmap. Black marks on the left mark cases with protein-coding gene-changing mutations in POLE or POLD1. The weighted sum of AAWWTT-related mutation proportions are shown as a color code in the leftmost annotation column. Red arrows below the heatmap denote AWWTT-associated pentanucleotide changes with high contributions from SBS28 or SBS17b.

Extended Data Fig. 8 Colibactin-related mutational characteristics in human cancers (II).

a, The analysis of SBS mutations at colibactin-specific hexanucleotide motifs, AAAATT and AAATTT, in whole genome somatic mutation data obtained from (n=200 CRC) 70. Top - differences in log2(mutations/bp covered by motif) between colibactin-specific and all other WWWWW motifs. Middle - total mutation count at CDMs. Bottom – proportions of total mutations that overlap with CDMs in mutated cases of MSS, MSI, and POLE. Asterisks denote significant difference (two-sided Mann-Whitney-U test, p < 0.05 and FDR < 20%) between CDMs and all other motifs with the same A:T content and length (AAAATTT/AAATTT vs. WWWWWW motifs). Error bars describe the ± 2MAD intervals for mutation rate (mutations/bp covered by motif) of WWWWWW motifs, excluding CDMs after subtracting their mean. Dots represent the mutation rates for the two colibactin-specific hexanucleotides after subtracting the mean of the WWWWWW motifs. Crosses represent the mean of the CDMs. POLE -polymerase epsilon mutated cases. b, Left panel: the proportions of cancer samples with protein-changing mutations among entity-specific or pan-cancer driver genes across several cancer entities in the TCGA data set. Right panel: the number of unique mutations (that is, unique site and nucleotide change) per cancer site divided by the total sample number of mutations in the entity. For both panels, error bars denote 95% CI around this proportion per cancer entity or cancer site. Numbers on the top of the bars indicate total samples per entity or cancer site. Blue bars – the proportion of mutations at positions 2 or 5 of AAWWTT, green bars – the proportion of mutations at other positions in AAWWTT, red bars – the proportion of all other protein-coding gene-changing mutations overlapping the less stringent pentanucleotide motifs, AAATT/AATTT or AAAAT/ATTTT.

Extended Data Fig. 9 Schematic representation of sBLISS-derived double-strand end configurations.

a, Blunting of original break ends by T4 polymerase preserves the original 5’ ends, which can then be determined by sequencing using the sBLISS protocol. Mapping of sBLISS reads to the genome allows us to derive strand-specific break configurations across all genomic sites, scanning the same motif. Frequencies of break ends depend on the break end configuration and differ for blunt ends or breaks with 3’ and 5’ overhangs. b, Validation of the break end analysis in a cell line with an inducible AsiSI restriction enzyme 48. The identified break positions correspond to the known break points of this restriction enzyme 82.

Supplementary information

Supplementary Information

Abbreviations

Reporting Summary

Supplementary Tables 1 and 2.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dziubańska-Kusibab, P.J., Berger, H., Battistini, F. et al. Colibactin DNA-damage signature indicates mutational impact in colorectal cancer. Nat Med 26, 1063–1069 (2020). https://doi.org/10.1038/s41591-020-0908-2

Download citation

Received: 06 February 2020
Accepted: 23 April 2020
Published: 01 June 2020
Issue Date: July 2020
DOI: https://doi.org/10.1038/s41591-020-0908-2

This article is cited by

Multi-omic profiling reveals associations between the gut microbiome, host genome and transcriptome in patients with colorectal cancer
- Shaomin Zou
- Chao Yang
- Lekun Fang
Journal of Translational Medicine (2024)
The microbial landscape of colorectal cancer
- Maxwell T. White
- Cynthia L. Sears
Nature Reviews Microbiology (2024)
Location and condition based reconstruction of colon cancer microbiome from human RNA sequencing data
- Gaia Sambruni
- Angeli D. Macandog
- Martin H. Schaefer
Genome Medicine (2023)
Enabling programmable dynamic DNA chemistry using small-molecule DNA binders
- Junpeng Xu
- Guan Alex Wang
- Feng Li
Nature Communications (2023)
Mutational signatures reveal mutual exclusivity of homologous recombination and mismatch repair deficiencies in colorectal and stomach tumors
- Amir Farmanbar
- Robert Kneller
- Sanaz Firouzi
Scientific Data (2023)