The mucosal epithelium is a common target of damage by chronic bacterial infections and the accompanying toxins, and most cancers originate from this tissue. We investigated whether colibactin, a potent genotoxin1 associated with certain strains of Escherichia coli2, creates a specific DNA-damage signature in infected human colorectal cells. Notably, the genomic contexts of colibactin-induced DNA double-strand breaks were enriched for an AT-rich hexameric sequence motif, associated with distinct DNA-shape characteristics. A survey of somatic mutations at colibactin target sites of several thousand cancer genomes revealed notable enrichment of this motif in colorectal cancers. Moreover, the exact double-strand-break loci corresponded with mutational hot spots in cancer genomes, reminiscent of a trinucleotide signature previously identified in healthy colorectal epithelial cells3. The present study provides evidence for the etiological role of colibactin in human cancer.
Subscribe to Journal
Get full journal access for 1 year
only $18.75 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Input FASTA files have been submitted to the Gene Expression Omnibus under accession no. GSE145594 and analysis scripts are available from https://github.com/MPIIB-Department-TFMeyer/Dziubanska-Kusibab_et_al._Colibactin. All other data are available in the main text or supplementary materials.
Nougayrede, J. P. et al. Escherichia coli induces DNA double-strand breaks in eukaryotic cells. Science 313, 848–851 (2006).
Putze, J. et al. Genetic structure and distribution of the colibactin genomic island among members of the family Enterobacteriaceae. Infect. Immun. 77, 4696–4703 (2009).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Watanabe, T., Tada, M., Nagai, H., Sasaki, S. & Nakao, M. Helicobacter pylori infection induces gastric cancer in Mongolian gerbils. Gastroenterology 115, 642–648 (1998).
Castellarin, M. et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 22, 299–306 (2012).
Arthur, J. C. et al. Intestinal inflammation targets cancer-inducing activity of the microbiota. Science 338, 120–123 (2012).
Cougnoux, A. et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut 63, 1932–1942 (2014).
Bleich, R. M. & Arthur, J. C. Revealing a microbial carcinogen. Science 363, 689–690 (2019).
Hibbing, M. E., Fuqua, C., Parsek, M. R. & Peterson, S. B. Bacterial competition: surviving and thriving in the microbial jungle. Nat. Rev. Microbiol. 8, 15–25 (2010).
Takahashi, I. et al. Duocarmycin A, a new antitumor antibiotic from Streptomyces. J. Antibiot. (Tokyo) 41, 1915–1917 (1988).
Igarashi, Y. et al. Yatakemycin, a novel antifungal antibiotic produced by Streptomyces sp. TP-A0356. J. Antibiot. (Tokyo) 56, 107–113 (2003).
Arcamone, F., Penco, S., Orezzi, P., Nicolella, V. & Pirelli, A. Structure and synthesis of distamycin A. Nature 203, 1064–1065 (1964).
Finlay, A., Hochstein, F., Sobin, B. & Murphy, F. Netropsin, a new antibiotic produced by a Streptomyces. J. Am. Chem. Soc. 73, 341–343 (1951).
Boger, D. L. & Johnson, D. S. CC-1065 and the duocarmycins: unraveling the keys to a new class of naturally derived DNA alkylating agents. Proc. Natl Acad. Sci. USA 92, 3642–3649 (1995).
Zha, L. et al. Colibactin assembly line enzymes use S-adenosylmethionine to build a cyclopropane ring. Nat. Chem. Biol. 13, 1063–1065 (2017).
Healy, A. R., Vizcaino, M. I., Crawford, J. M. & Herzon, S. B. Convergent and modular synthesis of candidate precolibactins. Structural revision of precolibactin A. J. Am. Chem. Soc. 138, 5426–5432 (2016).
Zha, L., Wilson, M. R., Brotherton, C. A. & Balskus, E. P. Characterization of polyketide synthase machinery from the pks island facilitates isolation of a candidate precolibactin. ACS Chem. Biol. 11, 1287–1295 (2016).
Vizcaino, M. I. & Crawford, J. M. The colibactin warhead crosslinks DNA. Nat. Chem. 7, 411–417 (2015).
Wilson, M. R. et al. The human gut bacterial genotoxin colibactin alkylates DNA. Science 363, eaar7785 (2019).
Xue, M. et al. Structure elucidation of colibactin and its DNA cross-links. Science 365, eaax2685 (2019).
Xue, M., Wernke, K. & Herzon, S. B. Depurination of colibactin-derived interstrand cross-links. Biochemistry 59, 892–900 (2020).
Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 15058 (2017).
Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521.e518 (2017).
Yang, F., Kemp, C. J. & Henikoff, S. Anthracyclines induce double-strand DNA breaks at active gene promoters. Mutat. Res. 773, 9–15 (2015).
Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587–589 (2016).
Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Tubbs, A. et al. Dual roles of poly(dA:dT) tracts in replication initiation and fork collapse. Cell 174, 1127–1142.e1119 (2018).
Nelson, H. C., Finch, J. T., Luisi, B. F. & Klug, A. The structure of an oligo(dA):oligo(dT) tract and its biological implications. Nature 330, 221–226 (1987).
Yuan, G. C. et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005).
Tse, W. C. & Boger, D. L. Sequence-selective DNA recognition: natural products and nature’s lessons. Chem. Biol. 11, 1607–1617 (2004).
Drsata, T. et al. Mechanical properties of symmetric and asymmetric DNA A-tracts: implications for looping and nucleosome positioning. Nucleic Acids Res. 42, 7383–7394 (2014).
Buc, E. et al. High prevalence of mucosa-associated E. coli producing cyclomodulin and genotoxin in colon cancer. PLoS ONE 8, e56964 (2013).
Giannakis, M. et al. Genomic correlates of immune-cell Infiltrates in colorectal carcinoma. Cell Rep, 15, 857–865 (2016).
Katainen, R. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat. Genet. 47, 818–821 (2015).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Iannelli, F. et al. A damaged genome’s transcriptional landscape through multilayered expression profiling around in situ-mapped DNA double-strand breaks. Nat. Commun. 8, 15656 (2017).
Iyama, T. & Wilson, D. M. 3rd DNA repair mechanisms in dividing and non-dividing cells. DNA Repair 12, 620–636 (2013).
Choi, J. Y., Lim, S., Kim, E. J., Jo, A. & Guengerich, F. P. Translesion synthesis across abasic lesions by human B-family and Y-family DNA polymerases alpha, delta, eta, iota, kappa, and REV1. J. Mol. Biol. 404, 34–44 (2010).
Martin, L. P., Hamilton, T. C. & Schilder, R. J. Platinum resistance: the role of DNA repair pathways. Clin, Cancer Res. 14, 1291–1295 (2008).
Morin, P. J. et al. Activation of beta-catenin-Tcf signaling in colon cancer by mutations in beta-catenin or APC. Science 275, 1787–1790 (1997).
Pleguezuelos-Manzano, C. et al. Mutational signature in colorectal cancer caused by genotoxic pks + E. coli. Nature 580, 269–273 (2020).
Dziubańska-Kusibab, P. J. et al. Colibactin DNA damage signature indicates causative role in colorectal cancer. Preprint at https://www.biorxiv.org/content/10.1101/819854v1 (2019).
Gothe, H. J. et al. Spatial chromosome folding and active transcription drive DNA fragility and formation of oncogenic MLL translocations. Mol. Cell 75, 267–283.e212 (2019).
Zhang, F. et al. Breaks labeling in situ and sequencing (BLISS). Protocol Exchange https://doi.org/10.1038/protex.2017.018 (2017).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Speir, M. L. et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 44, D717–D725 (2016).
Schneider, T. D. & Stephens, R. M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990).
Bembom, O. seqLogo: sequence logos for DNA sequence alignments. R package version 1.44.0 (2017).
Chiu, T. P. et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32, 1211–1213 (2016).
Gelpi, J. L. et al. Classical molecular interaction potentials: improved setup procedure in molecular dynamics simulations of proteins. Proteins 45, 428–437 (2001).
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
Sousa da Silva, A. W. & Vranken, W. F. ACPYPE: AnteChamber PYthon Parser interfacE. BMC Res. Notes 5, 367 (2012).
van Zundert, G. C. P. et al. The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes. J. Mol. Biol. 428, 720–725 (2016).
Dans, P. D. et al. Long-timescale dynamics of the Drew–Dickerson dodecamer. Nucleic Acids Res. 44, 4052–4066 (2016).
Perez, A., Luque, F. J. & Orozco, M. Dynamics of B-DNA on the microsecond time scale. J. Am. Chem. Soc. 129, 14739–14745 (2007).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Berendsen, H. J., Postma, J. V., van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).
Ivani, I. et al. Parmbsc1: a refined force field for DNA simulations. Nat. Methods 13, 55–58 (2016).
Case, D. et al. AMBER 18 (University of California, 2018).
Roe, D. R. & Cheatham, T. E. 3rd PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 9, 3084–3095 (2013).
Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996). 27-38.
Ellrott, K. et al. Scalable open dcience approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e277 (2018).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300 (1995).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Katainen, R. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat. Protoc. 13, 2580–2600 (2018).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
Tamborero, D. et al. Cancer genome interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 10, 25 (2018).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 e318 (2018).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations In cancer. Nucleic Acids Res. 47, D941–d947 (2019).
Gaffney, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS Genet. 8, e1003036 (2012).
R Core Team. A Language and Environment for Statistical Computing https://www.R-project.org (R Foundation for Statistical Computing, 2018).
The results shown here are in whole or part based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga. We thank S. Garnerone from N.C.’s laboratory for processing raw sBLISS data, P. D. Dans from the Biophysical Chemistry Lab, Department of Biological Science (CENUR Litoral Norte), UdelaR, Universidad de la República, for constructive discussions about the theoretical model of colibactin, U. Dobrindt from University of Münster for providing E. coli strains, and K. Lapid and R. Zietlow for editing the manuscript. We also acknowledge the computational resources provided by the CSC–IT Center for Science, Finland. P.J.D.-K. was supported by the IMPRS-IDI graduate school. B.A.M.B. was supported by a Rubicon fellowship from the Netherlands Organisation for Scientific Research. M.O. is an academic researcher of Institució Catalana de Recerca i Estudis Avancats. The present study was supported by the German Research Foundation (grant no. ME705/18-1) to T.F.M., by the Spanish Ministry of Science (grant nos. BIO2015-64802-R, BFU2014-61670-EXP and BFU2014-52864-R), the Catalan Government (grant no. 2014-SGR), Instituto de Salud Carlos III: Instituto Nacional de Bioinformática (PT 13/000/0030), the Biomolecular and Bioinformatics Resources Platform and the EU Horizon 2020 research and innovation program (grant nos. Elixir-Excelerate 676559 and BioExcel2:823830), and the MINECO Severo Ochoa Award of Excellence to the IRB Barcelona, and by the Ragnar Söderberg Foundation, the Swedish Foundation for Strategic Research (BD15-0095) and the Strategic Research Programme in Cancer (StratCan) at Karolinska Institutet to N.C. L.A.A. was supported by grants from the Academy of Finland (Finnish Center of Excellence no. 312041, Academy Professor grant nos. 319083 and 320149, iCAN Flagship 320185), Jane and Aatos Erkko Foundation, Sigrid Juselius Foundation, Helsinki Institute of Life Science and the Cancer Society of Finland.
The authors declare no competing interests.
Peer review information Alison Farrell was the primary editor on this article, and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, The overall results generated in this work derive from four independent investigations of the colibactin-induced double-strand breaks (DSBs), using sBLISS technology. The resulting data sets were first used to identify the colibactin damage motif (CDM), which turned out to be best described by the consensus hexameric sequence AAWWTT. b, Topological analyses and the assignment of shape characteristics led to a virtual model of colibactin–DNA interaction, which is compatible with a colibactin-induced inter-strand cross-link formation between two adenines at positions 2(+) and 5(-) of the minor groove. c, The detection of the CDM served to identify associated single nucleotide substitutions in large human cancer genome data sets - both exome (TCGA) and whole genome (Finish cohort). The comparison of different cancer entities indicated notable enrichment of CDM-associated mutations in some colorectal cancer. The comparison of colon cancer types indicated an enrichment at distal colon locations. The classification of CDM-associated mutations revealed a strong correlation with known triplet signatures, above all with the signature SBSA, which was previously identified in normal colon epithelia. d, An independent analysis of the exact DSB endpoints led to the identification of putative processing sites following colibactin-induced DNA damage. Accordingly, the processing sites were located downstream (in a 5’>3’ direction) of the putatively-alkylated adenines on opposite strands, generating 2 nt long 5’ overhangs at the broken DNA. Alignment of these break positions with the location of colibactin-specific cancer mutations revealed a striking local correspondence, indicating the occurrence of mutations at distinct sites on the CDM. e, Overall, the sBLISS-derived analyses provide extensive insight into the mutational function of colibactin and reveal strong links to human cancer.
Extended Data Fig. 2 Validation of sBLISS for the detection of colibactin-associated DSB break points.
a, In contrast to infection with pks- E. coli, Caco-2 cells infected with colibactin-producing pks+ E. coli displayed γH2AX expression, megalocytosis and multinucleation. Representative image of two independent experiments. b, sBLISS signal of etoposide-induced DSBs showed increased counts at TSS as compared to untreated control conditions. c, This heatmap indicates the log2 odds ratio of break enrichment in genomic region collections (FDR < 5%) as compared to the rest of the genome: untreated, etoposide-treated, pks+ E. coli-infected and pks- E. coli-infected Caco-2 cells. Rows represent known genomic regions from different genome annotation collections. Etoposide treatment increased DSB enrichment in TSS-associated features, while DSBs in pks+ E. coli-infected cells show a rather weaker enrichment compared to untreated and pks- E. coli-infected cells, indicating a more uniform distribution across the genome.
Extended Data Fig. 3 Enrichment of motifs in the context of DSBs upon infection with pks+ E. coli or etoposide treatment.
a, Using the DREME analysis, we determined sequence motifs in the context of the DSBs in pks+ E. coli- vs. pks- E. coli-infected cells, n=4. The top three motifs identified for each biological replicate are presented. b, Using the DREME analysis, we determined sequence motifs in the context of the DSBs in etoposide treated vs. untreated control cells, n=4. The top ten motifs identified for each biological replicate are presented.
Extended Data Fig. 4 Enrichment of CDMs compared to other AT-rich sequences and in nucleosome-free regions.
a, The enrichment of hexanucleotide sequences in close proximity to DSB positions (±7 nt) upon different treatments. Bars represent the enrichment of DSB-associated hexanucleotide motifs (mean log2 ratios of DSB at each motif comparing two conditions) in pks+ E. coli-infected cells vs. non-infected (NT) cells, n=4 independent experiments. Error bars denote the 95% confidence interval (CI) around the mean log2 ratios. b, The enrichment of pentanucleotide sequences in close proximity to DSB positions (±7 nt) upon different treatments. Bars represent the enrichment of DSB-associated pentanucleotide motifs (mean log2 ratios of DSB at each motif comparing two conditions) in pks+ E. coli-infected vs. pks- E. coli-infected cells (left panel) or in pks- E. coli-infected vs. non-infected (NT) cells (right panel), n=4 independent experiments. Error bars denote the 95% CI around the mean log2 ratios. c, Scaled mean Log2 enrichment values (scaled to 1 for maximum enrichment per replicate) of all nonamers in the context of DSBs detected by sBLISS, n=4 independent experiments. The x-axis represents the number of A/T in a nonamer. Red box plots represent AAWWTT motifs in the nonamer sequence, while turquoise box plots represent all other motifs. Each data point corresponds to one nonamer. d, The distributions of hexanucleotide patterns containing only A/T around centers of nucleosome dyads in the human genome as determined by 80. The values are smoothed proportions of hexanucleotides within ±100 bp next to the nucleosome dyad centers. Dashed vertical lines represent ± 73 bp that indicate the size of the nucleosome-covered DNA. Grey curves represent mean ± 2*SD of all A/T hexanucleotides, except for the AAWWTT motif, AAAAAA, ATATAT, TATATA and their reverse complements, n=33 hexanucleotide motifs. Full curves represent the AAWWTT motifs. Dashed curved represent the outlier motifs AAAAAA, ATATAT, TATATA, showing clearly distinct profiles from all other hexanucleotide profiles.
a, The relationship between different hexanucleotide sequences (log2 ratio standardized to 1; mean of 4 independent experiments) and the values of predicted DNA shape parameters. CDMs are noted, demonstrating notable enrichment. For propeller twist (ProT), the mean of the values are computed for the 3rd and 4th nucleotide of the motif; for roll and helical twists (HelT), the mean of the values are computed for base-pair steps 2 to 4 in each hexamer. Error bars denote the 95% CI around mean scaled log2 ratios for each hexanucleotide sequence, n=4. b, The distance between the cyclopropanes (Å) of colibactin by running a molecular dynamics simulation of free colibactin in water, n=1. The horizontal red line identifies the average values of this distance (12.8 Å). c, The correlations of hexanucleotide motif enrichment (mean log2 ratio) with DNA shape parameters (left - 3 columns) and central dinucleotides (rightmost column), depending on the presence and distance of adenines on opposite strands, n=4 independent experiments. The presence and distance of adenines are classified in the following groups (from top to bottom, all sequences in 5’-3’ direction): A-T at 3 nt distance, A-T at 4 nt distance, A-T at 3 and 4 nt distance, T-A at 3 nt distance, T-A at 4 nt distance, T-A at 3 and 4 nt distance, no A-T/T-A at 3 or 4 nt distance. Lines in the left three columns are mean log2 ratios smoothed by local regression, method LOESS (locally estimated scatterplot smoothing).
a, Enrichment of SBS mutations at AAATTT/AAAATT CDMs in exome sequences obtained from the TCGA project. Top row: cancer entities that showed enrichment across all subcohorts. Second to fourth row: cancer entities that showed enrichment only for POLE-mutated cases or no enrichment at all. COAD - colon adenocarcinoma, READ - rectal adenocarcinoma, STAD - stomach adenocarcinoma, UCEC - uterine corpus endometroid cancer, BLCA - bladder cancer, CESC - cervix squamous cell carcinoma, HNSC - head and neck squamous cell cancer, LUAD - lung adenocarcinoma, LUSC - lung squamous cell carcinoma, DLBC – diffuse large B-cell lymphoma, LAML – acute myeloid leukemia, SARC - sarcoma. Asterisks - significant difference (two-sided Mann-Whitney-U test, p < 0.05 and FDR < 20%) between CDMs and all WWWWW motifs. Numbers under asterisks – number of mutations overlapping AAWWTT / all mutations, [number of samples per cohort]. Error bars = 0 ± 2MAD intervals of mutation rates (mutations/base-pair) of WWWWWW sequences, excluding CDMs, after subtracting their mean. Dots - mutation rates for AAATTT/AAAATT after mean subtraction. Crosses - means of the CDMs. Q1-Q4 cohorts defined by the quantiles of total SNV numbers. Outliers - samples with SNV numbers significantly larger than the 95% CI in each cancer entity. POLE - polymerase epsilon mutated cases.
a, A heat map of SBS pentanucleotide change patterns in colorectal cancer (CRC) genomes based on the PCAWG project, n=60. The results indicate that the established hypermutator-associated SBS signature SBS28 strongly affects one AAWWTT-associated pentanucleotide sequence (red arrow). SBS28 possibly biases AAWWTT mutation estimates (leftmost annotation column for AAWWTT-derived pentanucleotide signature) in POLE/POLD1-mutated CRCs. b, A heat map of SBS pentanucleotide change patterns in stomach adenocarcinoma (STAD) genomes based on the PCAWG project, n=75. The results indicate that the established SBS signature SBS17b strongly affects two AAWWTT-associated pentanucleotide sequences (red arrows). SBS17b possibly biases AAWWTT mutation estimates (leftmost annotation column for AAWWTT-derived pentanucleotide signature) in STADs. The heatmaps show relative proportions of pentanucleotide changes (n=133, rows) associated with AAWWTT-related sequences or SBS28/SBS17b trinucleotide changes. The rows are ordered by the strength of expected contribution from SBS28 (CRC, A) or SBS17b (STAD, B) signatures, followed by AAWWTT contributions. Expected contributions from each signature to pentanucleotide changes are shown as color code on the top of the heatmap for each row. The left part of the heatmap represents pentanucleotide changes with contributions greater than zero in SBS28 or SBS17b (A or B, respectively), whereas the right part represents pentanucleotide changes with expected contributions of zero. Estimated contributions of SBS28, SBS17b and SBSA per tumor sample that are greater than zero are shown as a color code on the left of the heatmap. Black marks on the left mark cases with protein-coding gene-changing mutations in POLE or POLD1. The weighted sum of AAWWTT-related mutation proportions are shown as a color code in the leftmost annotation column. Red arrows below the heatmap denote AWWTT-associated pentanucleotide changes with high contributions from SBS28 or SBS17b.
a, The analysis of SBS mutations at colibactin-specific hexanucleotide motifs, AAAATT and AAATTT, in whole genome somatic mutation data obtained from (n=200 CRC) 70. Top - differences in log2(mutations/bp covered by motif) between colibactin-specific and all other WWWWW motifs. Middle - total mutation count at CDMs. Bottom – proportions of total mutations that overlap with CDMs in mutated cases of MSS, MSI, and POLE. Asterisks denote significant difference (two-sided Mann-Whitney-U test, p < 0.05 and FDR < 20%) between CDMs and all other motifs with the same A:T content and length (AAAATTT/AAATTT vs. WWWWWW motifs). Error bars describe the ± 2MAD intervals for mutation rate (mutations/bp covered by motif) of WWWWWW motifs, excluding CDMs after subtracting their mean. Dots represent the mutation rates for the two colibactin-specific hexanucleotides after subtracting the mean of the WWWWWW motifs. Crosses represent the mean of the CDMs. POLE -polymerase epsilon mutated cases. b, Left panel: the proportions of cancer samples with protein-changing mutations among entity-specific or pan-cancer driver genes across several cancer entities in the TCGA data set. Right panel: the number of unique mutations (that is, unique site and nucleotide change) per cancer site divided by the total sample number of mutations in the entity. For both panels, error bars denote 95% CI around this proportion per cancer entity or cancer site. Numbers on the top of the bars indicate total samples per entity or cancer site. Blue bars – the proportion of mutations at positions 2 or 5 of AAWWTT, green bars – the proportion of mutations at other positions in AAWWTT, red bars – the proportion of all other protein-coding gene-changing mutations overlapping the less stringent pentanucleotide motifs, AAATT/AATTT or AAAAT/ATTTT.
a, Blunting of original break ends by T4 polymerase preserves the original 5’ ends, which can then be determined by sequencing using the sBLISS protocol. Mapping of sBLISS reads to the genome allows us to derive strand-specific break configurations across all genomic sites, scanning the same motif. Frequencies of break ends depend on the break end configuration and differ for blunt ends or breaks with 3’ and 5’ overhangs. b, Validation of the break end analysis in a cell line with an inducible AsiSI restriction enzyme 48. The identified break positions correspond to the known break points of this restriction enzyme 82.
About this article
Cite this article
Dziubańska-Kusibab, P.J., Berger, H., Battistini, F. et al. Colibactin DNA-damage signature indicates mutational impact in colorectal cancer. Nat Med (2020). https://doi.org/10.1038/s41591-020-0908-2