Colorectal cancer (CRC) was the third most prevalent cancer and the second leading cause of cancer-related death worldwide for both sexes combined in 2020 [1]. In Australia, it is the second most commonly diagnosed cancer, with 1 in 21 (4.8%) males and 1 in 30 (3.3%) females developing CRC by 75 years of age [2]. CRC is a heterogeneous disease with regard to its molecular drivers and pathways of tumorigenesis. In 2007, Jass et al. [3] described the molecular heterogeneity of CRC based on the presence or absence of chromosomal instability (CIN), microsatellite instability/mismatch repair deficiency (MSI/MMRd), the CpG island methylator phenotype (CIMP), the BRAF c.1799T>A p.Val600Glu (BRAF p.V600E) somatic mutation, KRAS somatic mutations in codons 12 and 13, and germline pathogenic variants in the DNA MMR genes (Lynch syndrome). Over time, this heterogeneity has been shown to be associated with differing risk factors, survival and treatment response. For example, serrated polyps, the precursors of the serrated pathway of tumorigenesis (characterised by BRAF p.V600E, CIMP-high and proximal location, with or without high levels of MSI), show stronger associations with tobacco smoking, alcohol intake and high body mass index (BMI) than conventional adenomas [4].

The etiology of most CRCs is multifactorial, involving interplay between genetic, epigenetic and environmental/lifestyle factors [5], whilst only ~5% of CRCs are caused by germline pathogenic variants in known cancer-predisposition genes [6]. The gut microbiome is increasingly recognised as playing an important role in CRC development [7, 8], and the presence of certain genotoxic gut bacteria is associated with CRC development [9]. Previous studies have shown that pks+ Escherichia coli (pks+ E. coli+), enterotoxigenic Bacteroides fragilis (ETBF) and Fusobacterium nucleatum (F. nucleatum) are enriched in the colonic mucosa of patients with CRC compared with healthy individuals [10,11,12].

E. coli strains from the B2 phylogroup [13] frequently harbour the 54 kb polyketide synthase (pks) island, which encodes the enzymes for colibactin biosynthesis [14,15,16]. Colibactin is a genotoxin that induces DNA damage, including interstrand cross-links [17] and double-strand breaks [14, 18]. Colibactin-induced DNA damage was recently found to produce specific patterns of single base substitution mutations (T > N), frequently in ATN and TTT sequence contexts [19]. This led to the identification of tumour mutational signatures associated with colibactin, in particular the single base substitution signature SBS88 and the small insertion and deletion (indel, ID) signature ID18 [19]. The somatic APC splice mutation c.835-8 A > G has been proposed as a biomarker of colibactin-induced DNA damage because of its specific ATT > C sequence context, and it has been observed in multiple adenomas from patients with unexplained colorectal polyposis, although the presence of pks+ E. coli+ was not measured [20].
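
The sequence-context rule above can be made concrete with a small helper. This is a hypothetical sketch, not code from [19] or [20]: it writes a single base substitution on the pyrimidine (C or T) reference strand, the usual convention for SBS signatures, and flags T > N changes in ATN or TTT contexts. It does not assign SBS88 itself, which requires fitting the full 96-channel mutational spectrum.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def normalise_sbs(context: str, alt: str) -> tuple[str, str]:
    """Return (trinucleotide, alt) with the mutated central base written as C or T.

    context: 5'-XRY-3' reference trinucleotide centred on the mutated base.
    alt: the substituted base at the centre position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT") or context[1] == alt:
        raise ValueError(f"invalid substitution {context}>{alt}")
    if context[1] in "AG":  # purine reference: describe it on the opposite strand
        return reverse_complement(context), alt.translate(COMPLEMENT)
    return context, alt


def in_colibactin_context(context: str, alt: str) -> bool:
    """True for T > N substitutions in an ATN or TTT context (pyrimidine strand)."""
    ctx, _ = normalise_sbs(context, alt)
    return ctx[1] == "T" and (ctx[0] == "A" or ctx == "TTT")


# Hypothetical example: a coding-strand A > G in 5'-AAT-3' is a T > C in ATT.
print(normalise_sbs("AAT", "G"), in_colibactin_context("AAT", "G"))  # ('ATT', 'C') True
```

The example context is illustrative only; it is not the actual flanking sequence of APC c.835-8.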

ETBF secretes Bacteroides fragilis toxin and can cause symptoms such as inflammatory diarrhoea in humans [21]. In murine models, ETBF triggers colitis, induces T helper 17 (TH17) cell infiltration and promotes colonic tumorigenesis [22], although the mechanism underlying ETBF-related colorectal tumorigenesis is currently unknown.

F. nucleatum produces F. nucleatum adhesin A (FadA), which alters the β-catenin/Wnt signalling pathway and promotes CRC tumour growth [23]. F. nucleatum also inhibits T-cell-mediated immune responses against tumour cells [24] and creates a pro-inflammatory microenvironment that favours colorectal neoplasia progression [25]. F. nucleatum is associated with CRCs that are MSI-high/MMRd, carry the BRAF p.V600E somatic mutation and are CIMP-high, which are features of the serrated pathway of tumorigenesis [12].

The aim of this study was to determine the intratumoral presence of genotoxic gut bacterial species, namely pks island-carrying E. coli (pks+ E. coli+), pks island-carrying bacteria other than E. coli (pks+ E. coli−), ETBF and F. nucleatum, in population-based CRC tumour samples from the Australasian Colorectal Cancer Family Registry (ACCFR) and the Melbourne Collaborative Cohort Study (MCCS), and in clinic-based CRC tumour samples from the Applying Novel Genomic approaches to Early-onset and suspected Lynch Syndrome colorectal and endometrial cancers (ANGELS) study. Associations between the intratumoral bacteria and specific clinicopathological and molecular features, including the APC c.835-8 A > G somatic mutation, were examined.


Australasian Colorectal Cancer Family Registry (ACCFR)

The ACCFR is the Australasian arm of the International Colon Cancer Family Registry, with >42,000 recruited participants [26, 27]. Tumours tested in this study were from the population-based recruitment arm of the ACCFR: participants were recruited from the Victorian Cancer Registry, independent of family cancer history, and were diagnosed with invasive carcinoma of the colon or rectum between 1997 and 2007 during two recruitment phases [26]. Phase I recruitment (1997–2001) included all CRC patients diagnosed between 18 and 44 years of age and 50% of CRC patients diagnosed between 45 and 59 years of age. Phase II recruitment (2001–2006) included all CRC patients diagnosed between 18 and 49 years of age [26]. A total of 823 primary adenocarcinomas of the colon or rectum from the two recruitment phases, with tumour tissue collected in the Jeremy Jass Memorial Tissue Bank, were available to this study [28]. Cancers were verified using available pathology reports, cancer registry reports, medical records and/or death certificates [26, 27].

Melbourne Collaborative Cohort Study (MCCS)

The MCCS is a prospective cohort study of 41,513 participants (17,044 males and 24,469 females) recruited between 1990 and 1994, designed to investigate the role of diet and lifestyle in cancer risk, including CRC risk [29, 30]. Tumour tissue was collected and molecularly characterised for a total of 858 CRCs, with age at diagnosis ranging from 41 to 86 years [28].

Applying Novel Genomic approaches to Early-onset and suspected Lynch Syndrome colorectal and endometrial cancers (ANGELS)

The ANGELS study recruited patients referred from family cancer clinics across Australia who were either: (1) people affected by CRC or endometrial cancer with an MMRd and/or microsatellite instability-high (MSI-H) tumour and a diagnosis of suspected Lynch syndrome (as previously defined [31]); or (2) people affected by CRC or endometrial cancer with an MMR-proficient (MMRp) and/or microsatellite stable cancer diagnosed before 45 years of age. The ANGELS study collected CRC tumour tissue from 229 participants diagnosed between 2014 and 2021 [32, 33].

Written informed consent was obtained from all study participants for the collection of blood and tumour tissue. The study protocols were approved by the Human Research Ethics Committees at the University of Melbourne (ACCFR and ANGELS) and Cancer Council Victoria (MCCS). Given their rarity across the three study groups, CRCs from carriers of germline biallelic MUTYH pathogenic variants (n = 4), carriers of constitutional MLH1 epimutations (n = 6) and germline carriers of a variant of uncertain significance (VUS) in the MMR genes (n = 9) were excluded from this study. Details of the samples included in this study are shown in Supplementary Fig. 1. A total of 29 participants had synchronous or metachronous CRCs, and each CRC was treated independently in the statistical analysis.

Analyses of clinicopathological features and molecular characteristics

CRC tumour tissue was available for 813 and 816 probands from the ACCFR and MCCS, respectively, as previously described [28], and for 221 probands from the ANGELS study. A standardised pathological review was performed by an anatomical pathologist (CR) for all three studies [34]. Tumours from the ileo-cecal junction, cecum, ascending colon, hepatic flexure and transverse colon were classified as proximal, whereas tumours from the splenic flexure, descending colon, sigmoid colon and recto-sigmoid junction were classified as distal. Tumours from the rectum were classified as rectal. Tumour-infiltrating lymphocytes (TILs) were scored as present when there were ≥ 5 intra-epithelial lymphocytes in at least one high-power field (40×) [35].
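
The location grouping and TIL rule described above amount to simple lookups and thresholds. The sketch below encodes them in Python; the lowercase site labels and the function names are hypothetical, since the pathology coding actually used by the studies is not given here.

```python
PROXIMAL = {"ileo-cecal junction", "cecum", "ascending colon",
            "hepatic flexure", "transverse colon"}
DISTAL = {"splenic flexure", "descending colon", "sigmoid colon",
          "recto-sigmoid junction"}
RECTAL = {"rectum"}


def location_group(site: str) -> str:
    """Map a (hypothetical) pathology site label to proximal, distal or rectal."""
    s = site.strip().lower()
    for group, sites in (("proximal", PROXIMAL), ("distal", DISTAL), ("rectal", RECTAL)):
        if s in sites:
            return group
    raise ValueError(f"unrecognised tumour site: {site!r}")


def tils_present(counts_per_hpf: list[int], threshold: int = 5) -> bool:
    """TILs present if >= threshold intra-epithelial lymphocytes in at least one 40x field."""
    return any(count >= threshold for count in counts_per_hpf)
```

Raising an error for unrecognised labels, rather than silently assigning a group, keeps miscoded sites from leaking into the location analysis.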

Molecular characterisation of each tumour was performed using consistent methodology for the ACCFR and MCCS studies [28]. Tumour MMR status was determined for all CRCs by immunohistochemical (IHC) staining for MLH1, MSH2, MSH6 and PMS2 protein expression, and for a subset, MSI status was determined as previously described [36,37,38]. For the ACCFR and MCCS studies, the primary antibodies used were MLH1 (G168-15, BD PharMingen), MSH2 (G219-1129, BD PharMingen), MSH6 (44, BD Transduction Labs) and PMS2 (A16-4, BD PharMingen). For the ANGELS study, tumour MMR status was determined from MMR IHC testing by a clinical diagnostic laboratory or by MMR IHC testing performed internally as previously described [39], and was confirmed by whole exome or targeted tumour sequencing using the additive feature combination approach [33]. For the ANGELS study, the primary antibodies used were MLH1 (M1), MSH2 (G219-1129), MSH6 (SP93) and PMS2 (A16-4), supplied by Roche Diagnostics (Basel, Switzerland). Across the three studies, tumours were categorised as: (1) MMRp if they showed retained, normal expression of all four MMR proteins by IHC and were microsatellite stable according to the additive feature combination approach or MSI-PCR method where tested; or (2) MMRd if they showed loss of expression of one or more MMR proteins by IHC and/or were MSI-high according to the additive feature combination approach or MSI-PCR method where tested [28, 33].
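
The two-category MMR rule above can be written as a short decision function. This is a sketch under stated assumptions: IHC is summarised as a per-protein "retained" flag, MSI as the strings "MSS" or "MSI-H" (or None when untested), and the "unclassified" fallback for other MSI results (for example MSI-low) is an addition of this sketch, because the rule as written does not cover them.

```python
from typing import Mapping, Optional

MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def mmr_status(ihc_retained: Mapping[str, bool], msi: Optional[str] = None) -> str:
    """Classify a tumour as MMRp or MMRd from IHC and optional MSI results.

    ihc_retained: protein name -> True if expression is retained and normal.
    msi: "MSS", "MSI-H", another string, or None if MSI was not tested.
    """
    missing = [p for p in MMR_PROTEINS if p not in ihc_retained]
    if missing:
        raise ValueError(f"IHC result missing for: {missing}")
    # Loss of any protein and/or MSI-high is sufficient for MMRd.
    if not all(ihc_retained[p] for p in MMR_PROTEINS) or msi == "MSI-H":
        return "MMRd"
    # MMRp needs all four proteins retained and MSS where MSI was tested.
    if msi in (None, "MSS"):
        return "MMRp"
    return "unclassified"
```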

MMRd CRCs were further divided into three subgroups: (1) Lynch syndrome, where a germline pathogenic variant in one of the DNA MMR genes (MLH1, MSH2, MSH6 and PMS2) or in EPCAM was identified as previously described [26, 38, 40, 41]; (2) MLH1 methylated CRCs, defined as tumours positive for MLH1 gene promoter hypermethylation (using the MethyLight assay as previously described for ACCFR and MCCS [28, 40, 42], and the MethyLight and MS-HRM assays as previously described for ANGELS [39]) with concomitant loss of MLH1 and PMS2 expression by IHC; and (3) somatic biallelic MMR gene inactivation resulting from two somatic MMR gene mutations (double somatic MMR mutations), determined as previously described from either targeted tumour sequencing [39, 43] or tumour whole exome sequencing [32, 33].
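
The subgroup definitions above can be sketched as a decision function over per-tumour flags. The flag names are hypothetical, and the order of the checks (germline first, then methylation, then double somatic mutations) is an assumption of this sketch; the text defines each subgroup but does not state a precedence. MMRd CRCs meeting none of the definitions are labelled "unexplained" here.

```python
def mmrd_subgroup(germline_mmr_or_epcam_pv: bool, mlh1_methylated: bool,
                  mlh1_pms2_loss: bool, double_somatic: bool) -> str:
    """Assign an MMRd CRC to an etiological subgroup (check order is assumed)."""
    if germline_mmr_or_epcam_pv:
        return "Lynch syndrome"
    # MLH1 promoter methylation counts only with concomitant MLH1/PMS2 loss by IHC.
    if mlh1_methylated and mlh1_pms2_loss:
        return "MLH1 methylated"
    if double_somatic:
        return "double somatic MMR mutations"
    return "unexplained"
```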

For the ACCFR and MCCS, KRAS codon 12 and 13 somatic mutations were tested using real-time quantitative PCR (qPCR) with high-resolution melting analysis in the presence of the SYTO9 fluorescent intercalating dye, followed by direct Sanger sequencing of cases with differential melting profiles, as previously described [34, 44]. The BRAF p.V600E somatic mutation was tested using a fluorescent allele-specific PCR assay as previously described [45]. For the ANGELS study, KRAS codon 12 and 13 and BRAF p.V600E somatic mutations were derived from custom-designed panel sequencing or tested using Sanger sequencing or allele-specific PCR as for the ACCFR/MCCS tumours [32, 33, 39]. For all three studies, CIMP-high tumours were defined by hypermethylation, using MethyLight [46], at three or more of the promoter regions of five marker genes: CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1.
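
The CIMP-high definition above is a count threshold over the five-marker panel. The sketch assumes each marker has already been called methylated or unmethylated; how MethyLight values are converted into those calls is not described here, and the function name is hypothetical.

```python
from typing import Mapping

CIMP_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def is_cimp_high(methylated: Mapping[str, bool], min_positive: int = 3) -> bool:
    """CIMP-high if at least min_positive of the five panel markers are methylated."""
    missing = [g for g in CIMP_PANEL if g not in methylated]
    if missing:
        raise ValueError(f"no MethyLight result for: {missing}")
    return sum(bool(methylated[g]) for g in CIMP_PANEL) >= min_positive
```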

DNA extraction from tumour samples and qPCR assays for detecting pks+ E. coli, ETBF and F. nucleatum

Tumour-rich regions of formalin-fixed paraffin-embedded (FFPE) CRC tumour tissue were macrodissected as previously described for the ACCFR and MCCS studies [28, 43]. For the ANGELS study, genomic DNA was extracted from FFPE CRC tumour tissue using the QIAamp DNA FFPE Tissue Kit (QIAGEN, Hilden, Germany), and DNA concentration was assessed using the Qubit fluorometer (Thermo Fisher Scientific, California, USA). The intratumoral presence of pks+ E. coli+, pks+ E. coli−, ETBF and F. nucleatum was assessed by qPCR, as detailed in the Supplementary methods. The pks+ E. coli strain (34351) was kindly provided by Drs Danielle Ingle and Norelle Sherry from the Peter Doherty Institute for Infection and Immunity; it was collected as part of the Controlling Superbugs study [47] and classified as extraintestinal pathogenic Escherichia coli (ExPEC) [48]. The pks− E. coli strain (MG1655) was provided by Dr Dianna M. Hocking (Department of Microbiology and Immunology, The University of Melbourne). Genomic DNA from the pks+ and pks− E. coli strains was used as internal controls for qPCR.
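
The categories used throughout (pks+ E. coli+, pks+ E. coli−, pks− E. coli+) follow from combining two qPCR targets: clbB for the pks island and uidA for E. coli, as stated in the Discussion. The sketch below shows that combination. The Ct cutoff of 35 and the function names are hypothetical placeholders, not the study's positivity criteria, which are given in its Supplementary methods.

```python
from typing import Optional

CT_CUTOFF = 35.0  # hypothetical positivity threshold, not the study's value


def detected(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """A target counts as detected if it amplified (ct is not None) at or below the cutoff."""
    return ct is not None and ct <= cutoff


def pks_category(clbb_ct: Optional[float], uida_ct: Optional[float],
                 cutoff: float = CT_CUTOFF) -> str:
    """Combine clbB (pks island) and uidA (E. coli) qPCR results into one category."""
    pks, ecoli = detected(clbb_ct, cutoff), detected(uida_ct, cutoff)
    if pks and ecoli:
        return "pks+ E. coli+"
    if pks:
        return "pks+ E. coli-"
    if ecoli:
        return "pks- E. coli+"
    return "pks- E. coli-"
```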

Genotyping assay for the APC c.835-8 A > G somatic mutation

A custom-designed TaqMan genotyping assay (Thermo Fisher Scientific, California, USA; Cat# ANGZYCC) was used to detect the APC: c.835-8 A > G mutation in tumour DNA. Reactions were set up using TaqMan Genotyping Master Mix (Thermo Fisher Scientific) and run on a QuantStudio 7 instrument (Thermo Fisher Scientific). The presence of the APC: c.835-8 A > G mutation was determined using the QuantStudio Real-Time PCR System software (v1.7.2, Thermo Fisher Scientific).
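
TaqMan genotyping assays use two allele-specific probes labelled with different reporter dyes, and the instrument software assigns calls from the resulting allelic discrimination plot. As a rough illustration only, the sketch below makes a threshold-based call from the end-point signals of the mutant and wild-type probes. It does not reproduce the QuantStudio software's calling algorithm, and both thresholds are hypothetical.

```python
def call_apc_genotype(mutant_rn: float, wildtype_rn: float,
                      min_total: float = 0.5, min_mutant_fraction: float = 0.1) -> str:
    """Crude call from end-point normalised reporter signals of two allele-specific probes.

    Thresholds are hypothetical placeholders, not values from the study or the vendor software.
    """
    total = mutant_rn + wildtype_rn
    if total < min_total:
        return "undetermined"  # too little signal to call either allele
    if mutant_rn / total >= min_mutant_fraction:
        return "mutation detected"
    return "wild type"
```

A fractional threshold, rather than requiring a heterozygous-like cluster, reflects that a somatic mutation may be present in only part of the tumour DNA.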

Statistical analysis

All statistical analyses were performed using R (v4.2.1). Logistic regression was used to assess associations between the intratumoral presence of bacteria and clinicopathological and tumour features. Unless indicated otherwise, all models were adjusted for sex, age at CRC diagnosis and study. P values < 0.05 were considered statistically significant.
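
The analyses were run in R. To show how the reported quantities (OR, 95% CI and P) arise from an adjusted logistic model, the sketch below fits a logistic regression by Newton-Raphson in plain Python. It assumes bacterial presence is the binary outcome and a tumour feature is the predictor, with sex, age and two study indicators (ACCFR as reference) as covariates. The `design_row` coding is hypothetical, and the sketch uses Wald intervals, which can differ slightly from other interval methods.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression; each row of X starts with 1.0 (intercept)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information matrix
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)  # Newton step: information^-1 @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = [math.sqrt(_solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se


def odds_ratio(beta, se, idx, z=1.959963984540054):
    """OR, Wald 95% CI and two-sided Wald P value for coefficient idx."""
    b, s = beta[idx], se[idx]
    p_value = math.erfc(abs(b / s) / math.sqrt(2.0))
    return math.exp(b), math.exp(b - z * s), math.exp(b + z * s), p_value


def design_row(feature_present, male, age_years, study):
    """Hypothetical coding: intercept, feature, sex, age, study dummies (ACCFR = reference)."""
    return [1.0, float(feature_present), float(male), float(age_years),
            float(study == "MCCS"), float(study == "ANGELS")]
```

With rows built by `design_row` and y set to bacterial presence, `odds_ratio(beta, se, 1)` gives the adjusted OR for the feature. When sex itself is the feature of interest, the separate sex column would be dropped.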


Participant and tumour characteristics associated with the intratumoral bacterial presence

In total, 1697 CRC tumours from 1666 individuals from the ACCFR (44.2%), MCCS (43.5%) and ANGELS (12.3%) studies had intratumoral bacteria testing results. The participants and their tumour characteristics are described in Table 1. Of the participants, 29 (1.7%) had a synchronous or metachronous CRC; 26% of CRCs were diagnosed before age 45 years (early-onset CRC or EOCRC) and 15.6% were MMRd. Of the MMRd CRCs with a known etiology, 20.6% were related to Lynch syndrome and the remaining 79.4% to somatic MMR inactivation (tumour MLH1 methylation, 47.4%; double somatic MMR mutations, 32.0%).

Table 1 CRC-affected participants and their tumour characteristics from each of the three studies.

The prevalence of each bacterium and its association with participant clinicopathological and tumour characteristics are shown in Table 2. The intratumoral prevalence of pks+ E. coli+, pks+ E. coli−, ETBF and F. nucleatum was 10.3%, 10.4%, 6.1% and 8.9%, respectively. Intratumoral pks+ E. coli+ was associated with male sex compared with female sex (P < 0.01, odds ratio (OR) = 1.54, 95% confidence interval (CI) = 1.11–2.13). Intratumoral pks+ E. coli− was associated with older age at CRC diagnosis (P < 0.01) and with low-grade compared with high-grade tumours (P = 0.02, OR for high grade = 0.57, 95% CI = 0.36–0.89).

Table 2 Intratumoral presence of pks+ E. coli+, pks+ E. coli−, ETBF and F. nucleatum and their association with tumour characteristics.

CRCs with intratumoral ETBF were more likely to be MMRd (P < 0.01, OR = 2.16, 95% CI = 1.30–3.49) and to have KRAS codon 12 and 13 somatic mutations (P = 0.02, OR = 1.67, 95% CI = 1.09–2.53) compared with MMRp CRCs and CRCs without KRAS codon 12 and 13 somatic mutations, respectively. Intratumoral F. nucleatum was associated with proximal tumour location compared with distal (P < 0.01, OR for distal = 0.42, 95% CI = 0.26–0.64) and rectal locations (P < 0.01, OR for rectal = 0.36, 95% CI = 0.23–0.55). CRCs with intratumoral F. nucleatum were also associated with high histological grade (P < 0.01, OR = 2.14, 95% CI = 1.48–3.08) and mucinous adenocarcinoma histological type (P < 0.01, OR = 2.51, 95% CI = 1.54–3.97). For molecular features, intratumoral F. nucleatum was associated with MMRd (P < 0.01, OR = 3.90, 95% CI = 2.63–5.75), BRAF p.V600E somatic mutation (P < 0.01, OR = 2.13, 95% CI = 1.35–3.28) and CIMP-high (P < 0.01, OR = 2.62, 95% CI = 1.64–4.10) compared with CRCs without these features.

An analysis of the 442 EOCRCs found no evidence that clinicopathological or tumour characteristics were associated with the presence of pks+ E. coli+ (Supplementary Table 1). In EOCRCs, both ETBF (P < 0.01, OR = 4.17, 95% CI = 1.77–9.67) and F. nucleatum (P < 0.01, OR = 3.36, 95% CI = 1.67–6.65) were associated with MMRd compared with MMRp CRCs. F. nucleatum was also associated with proximal tumour location compared with distal (P = 0.01, OR for distal = 0.36, 95% CI = 0.16–7.70) and rectal tumour locations (P < 0.01, OR for rectal = 0.33, 95% CI = 0.15–7.00) (Supplementary Table 1). ETBF was associated with proximal tumour location compared with distal location (P = 0.04; OR for proximal = 2.78, 95% CI = 0.13–6.25; equivalently, OR for distal = 0.36, 95% CI = 0.16–7.70), but not compared with rectal location (P = 0.20; OR for proximal = 3.03, 95% CI = 0.14–6.67; equivalently, OR for rectal = 0.33, 95% CI = 0.15–7.00).
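
The ETBF location comparisons above are reported in both directions. Taking the reciprocal of an OR swaps the reference category; the confidence limits are inverted as well and change places. The hypothetical helper below reproduces each reported pair from the printed values.

```python
def flip_reference(odds_ratio: float, ci_low: float, ci_high: float) -> tuple[float, float, float]:
    """Re-express the OR for B vs A as the OR for A vs B.

    The estimate is inverted, and the inverted upper limit becomes the new lower limit.
    """
    return 1.0 / odds_ratio, 1.0 / ci_high, 1.0 / ci_low


print([round(v, 2) for v in flip_reference(0.36, 0.16, 7.70)])  # [2.78, 0.13, 6.25]
```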

The presence of any two or all three of the bacteria (pks+ E. coli+, ETBF and F. nucleatum) in the same CRC was uncommon, with only 54 (3.2%) tumours harbouring more than one bacterium (Supplementary Fig. 2). Only 6 (0.4%) tumours had all three bacteria detected, and these CRCs were not associated with any specific clinicopathological characteristics (Supplementary Table 2). CRCs with both ETBF and F. nucleatum detected were associated with MMRd compared with CRCs without both bacteria detected (P < 0.01, OR = 4.56, 95% CI = 1.38–13.56) (Supplementary Table 2).
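
The co-occurrence tallies above come from counting how many of the three bacteria were detected in each tumour. The sketch below assumes one set of detected-bacterium labels per tumour; the labels and the function name are illustrative.

```python
from collections import Counter

# The three bacteria considered in the co-occurrence analysis (illustrative labels).
BACTERIA = frozenset({"pks+ E. coli+", "ETBF", "F. nucleatum"})


def co_occurrence_summary(calls):
    """Summarise how many of the three bacteria were detected per tumour.

    calls: iterable with one collection of detected bacterium labels per tumour.
    """
    n_detected = Counter(len(BACTERIA & set(c)) for c in calls)
    total = sum(n_detected.values())
    multi = sum(v for k, v in n_detected.items() if k > 1)
    return {
        "tumours": total,
        "more_than_one": multi,
        "all_three": n_detected[len(BACTERIA)],
        "pct_more_than_one": 100.0 * multi / total if total else 0.0,
    }
```

Labels outside the three bacteria, such as pks− E. coli+, are ignored by the intersection. Applied to the study counts, 54 of 1697 tumours corresponds to the reported 3.2%.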

The colibactin-associated APC: c.835-8 A > G somatic mutation is associated with pks+ E. coli+

We tested the association between the APC: c.835-8 A > G somatic mutation and the intratumoral presence of the pks island, with or without E. coli (pks+ E. coli+ and pks+ E. coli−). Because the genotyping assay required less DNA than the intratumoral bacterial screening, 62 additional samples were included in the APC: c.835-8 A > G testing (total n = 1759). Across all CRCs, 3.3% had the APC: c.835-8 A > G somatic mutation, with a similar frequency in both EOCRCs and late-onset CRCs (LOCRCs) (Table 3). The APC: c.835-8 A > G mutation was associated with intratumoral pks+ E. coli+ (P = 0.025, OR = 2.20, 95% CI = 1.05–4.25) but not with other bacteria carrying the pks island (pks+ E. coli−; P = 0.36, OR = 0.61, 95% CI = 0.18–1.54) or with E. coli not carrying the pks island (pks− E. coli+; P = 0.16, OR = 1.99, 95% CI = 0.67–4.72) (Table 3). The association with pks+ E. coli+ was also observed in EOCRCs (P = 0.022, OR = 4.26, 95% CI = 1.10–14.02), but was not significant in LOCRCs (P = 0.18, OR = 1.78, 95% CI = 0.70–3.94) (Table 3). The APC: c.835-8 A > G mutation was not associated with intratumoral ETBF or F. nucleatum (data not shown). The participant and tumour characteristics associated with the APC: c.835-8 A > G somatic mutation are shown in Supplementary Table 3.

Table 3 The prevalence of the APC: c.835-8 A > G somatic mutation and its association with the intratumoral presence of pks+ E. coli+, pks+ E. coli−, all pks+ bacteria and pks− E. coli+.

F. nucleatum is associated with both inherited and sporadic subtypes of MMRd CRC

ETBF and F. nucleatum were each associated with MMRd CRCs (Table 2). The etiology of MMRd was known for 228 of these CRCs, comprising 47 (20.6%) CRCs from people with Lynch syndrome and 181 (79.4%) with sporadic causes, namely MLH1 promoter methylation (n = 108; 47.4%) and double somatic MMR mutations (n = 73; 32.0%). We further investigated the association between these bacteria and specific MMRd subgroups. The presence of F. nucleatum was associated with all three MMRd subgroups (Table 4), with the strongest association for the MLH1 methylated subgroup (P < 0.01, OR = 4.91, 95% CI = 2.84–8.36). No associations were observed between specific MMRd subgroups and intratumoral pks+ E. coli or ETBF, although ETBF was associated with MMRd status overall (Table 4).

Table 4 Intratumoral presence of pks+ E. coli+, pks+ E. coli−, ETBF and F. nucleatum and their association with CRC subgroup.

Tumour-infiltrating lymphocytes (TILs) are associated with F. nucleatum but not with pks+ E. coli or ETBF

We tested the association between the intratumoral presence of each of pks+ E. coli+, pks+ E. coli−, ETBF and F. nucleatum and mild or marked levels of TILs (combined as TILs present) within the tumour microenvironment. Across all CRCs, the presence of TILs was associated with F. nucleatum (P < 0.01, OR = 1.97, 95% CI = 1.35–2.85) but not with pks+ E. coli or ETBF (Table 5). Because MMRd status was associated with both F. nucleatum (P < 0.01; Table 2) and TILs (P < 0.01; Table 5), we performed a stratified analysis to test whether the association between TILs and F. nucleatum was independent of MMR status. When CRCs were stratified by tumour MMR status, the association between TILs and F. nucleatum was no longer present in either MMRp or MMRd CRCs (Table 5). These findings were consistent in both EOCRCs and LOCRCs (Table 5). The APC: c.835-8 A > G somatic mutation was inversely associated with the presence of TILs across all CRCs (P < 0.01, OR = 0.19, 95% CI = 0.05–0.53); however, this association was no longer significant when only MMRp CRCs were analysed (P = 0.056, OR = 0.32, 95% CI = 0.08–0.87) (Supplementary Table 4).
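
The logic of the stratified analysis can be illustrated with toy counts (not the study's data). An association can arise only because both F. nucleatum and TILs are more common in MMRd CRCs; computing the OR separately within each MMR stratum then removes it. For brevity, this sketch uses unadjusted 2×2 odds ratios with Woolf (log-based) 95% CIs and a Haldane-Anscombe correction for zero cells, rather than the sex-, age- and study-adjusted logistic regression used in the study.

```python
import math


def woolf_or(a, b, c, d, z=1.959963984540054):
    """OR and Woolf 95% CI for a 2x2 table.

    a: exposed with outcome, b: exposed without, c: unexposed with, d: unexposed without.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))  # Haldane-Anscombe correction
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def stratified_ors(records):
    """records: iterable of (stratum, exposed, outcome); returns {stratum: (OR, low, high)}."""
    tables = {}
    for stratum, exposed, outcome in records:
        table = tables.setdefault(stratum, [0, 0, 0, 0])
        idx = (0 if outcome else 1) if exposed else (2 if outcome else 3)
        table[idx] += 1
    return {s: woolf_or(*t) for s, t in tables.items()}
```

In the toy data used to check the sketch, the crude OR is about 2.4, whereas the OR within each MMR stratum is exactly 1.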

Table 5 The association between tumour-infiltrating lymphocytes (TILs) and the intratumoral presence of pks+ E. coli, ETBF and F. nucleatum.


In this study of 1697 CRCs from three Australian-based cohorts, the intratumoral prevalence of the genotoxic gut bacteria pks+ E. coli+, ETBF and F. nucleatum was 10%, 6% and 9%, respectively. The prevalence of non-E. coli bacteria harbouring the pks island (pks+ E. coli−) was 10%. The APC: c.835-8 A > G somatic mutation was associated with the intratumoral presence of pks+ E. coli+, but not with other bacteria harbouring the colibactin-producing pks island, highlighting a specific relationship between this hotspot mutation and pks+ E. coli+. ETBF and F. nucleatum were each associated with MMRd, as was the presence of both bacteria in the same CRC. The co-occurrence of all three bacteria in the same CRC was uncommon (0.4% of all CRCs), suggesting minimal interplay between these bacteria at the time of CRC diagnosis.

Pks+ E. coli

Colibactin-producing pks+ E. coli+ promotes CRC development by inducing DNA double-strand breaks [49, 50] and characteristic mutational signatures, namely SBS88 and ID18 [19]. An association between the APC: c.835-8 A > G hotspot mutation and SBS88 has been identified in people with unexplained adenomatous polyposis [20], providing a mechanistic link and a potential biomarker for colibactin-induced tumorigenesis. The present study identified a significant association between the APC: c.835-8 A > G mutation and intratumoral pks+ E. coli+, but not with other pks-harbouring bacteria (pks+ E. coli−), indicating that this mutation is specifically associated with pks+ E. coli+.

Among pks+ E. coli+ CRCs, the APC: c.835-8 A > G mutation was more prevalent in EOCRCs than in LOCRCs (9.5% versus 5.3%), and the association was statistically significant only in EOCRCs. The reason for this is currently unknown and, if validated, raises interesting questions about the mechanism in EOCRC versus LOCRC. It has been hypothesised that early-life exposure to pks+ E. coli+ may influence early-onset tumorigenesis. Pks+ E. coli+ is a common gut bacterium, found in ~31% of healthy infants by 1 month after birth [51]. In addition, Lee-Six et al. detected the colibactin-related mutational signature in normal colonic crypts from healthy young individuals and found that it was most active in childhood, before 10 years of age [52]. Therefore, the association between APC: c.835-8 A > G and pks+ E. coli+ in EOCRCs could be related to early-life exposure to the bacterium while the gut microbiome is still undergoing developmental changes [53], which may represent a period of particular sensitivity to extrinsic influences. Boot et al. argued that later in life, once microbiome homoeostasis is established, individuals may be less susceptible to colibactin-related mutagenesis [54]. Studies aimed at preventing colibactin-related EOCRC may therefore need to focus on detecting and eradicating pks+ E. coli+ in children.

In this study, the APC: c.835-8 A > G mutation was detected in 3.3% of CRCs, similar to the frequency (3.2%) in a previous report [55]. Although APC: c.835-8 A > G was significantly associated with pks+ E. coli+, only a small subset (6.3%) of CRCs with pks+ E. coli+ had this mutation. As discussed above, there may be a specific window during which colibactin-induced damage is likely to occur, which would explain why not all CRCs exposed to the bacterium carry the mutation. Alternatively, prolonged exposure to this bacterium may be necessary to cause DNA damage. Although pks+ E. coli+ is common in infancy [51], no longitudinal study has yet investigated how long pks+ E. coli+ persists into adult life and potentially induces CRC. Our study assessed intratumoral pks+ E. coli+ at CRC diagnosis/resection and does not exclude prior pks+ E. coli+ infection in CRCs in which the bacterium was not detected. It is plausible that the association between APC: c.835-8 A > G and pks+ E. coli+ depends on the duration of exposure. Terlouw et al. [20] identified the APC: c.835-8 A > G mutation in premalignant adenomas from people with unexplained adenomatous polyposis, supporting this mutation and colibactin-induced DNA damage as early events in tumorigenesis, although further studies are needed to elucidate the driver role of this bacterium during CRC development.

A recent study by Arima et al. [56] investigated intratumoral pks+ E. coli in 1175 CRCs collected as part of two large prospective cohort studies and found that intratumoral pks+ E. coli was associated with a high Western diet score, highlighting interplay between gut bacterial pathogens and diet in CRC development. Consistent with the results of Arima et al. [56], the current study found no association between intratumoral pks+ E. coli+ and age at CRC diagnosis, tumour location, BRAF p.V600E mutation or CIMP-high. However, our study identified a significant association between pks+ E. coli+ and male sex, which was not detected by Arima et al. Our study measured pks+ E. coli+ using two target genes, clbB for the pks island and uidA for E. coli, thereby capturing only pks+ E. coli+, whereas Arima et al. [56] targeted only the pks island (clbB) and therefore could not differentiate between pks+ E. coli and other pks+ bacteria. This difference may explain the association with male sex, which was not present in our pks-only analysis. In our study, there was no evidence of an association between pks+ E. coli+ and the presence of TILs, suggesting that pks+ E. coli+ does not promote a highly immunogenic tumour microenvironment (TME), at least at the time of CRC diagnosis/resection.

ETBF

ETBF was significantly enriched in MMRd CRCs compared with MMRp CRCs but was not associated with any specific MMRd subgroup of either hereditary or sporadic etiology. This suggests that ETBF may be associated with the tumour microenvironment related to MMRd, rather than playing a causative role in CRC development. ETBF is an infectious bacterium that causes acute inflammation of the colon and has been shown to be a risk factor for colitis [57]. ETBF is present in the colonic mucosa of people with familial adenomatous polyposis [58] and promotes oncogenic processes in a tumour-prone mouse model (ApcMinΔ/+) [59].

Allen et al. reported that ETBF promotes loss of heterozygosity of Apc in Apc-mutant mice; however, organoids exposed to ETBF showed mutational profiles nearly identical to those of unexposed controls, suggesting that, unlike pks+ E. coli with its SBS88 and ID18 signatures, ETBF does not cause a unique mutational signature [60]. The carcinogenic mechanism of ETBF may therefore not involve characteristic genomic aberrations, pointing to alternative mechanisms such as altered DNA methylation. Maiuri et al. examined genome-wide DNA methylation in mice exposed to ETBF and reported that normal epithelium exposed to ETBF undergoes inflammation-driven tumorigenesis and acquires a unique DNA methylation signature [61]. Interestingly, these methylation aberrations were abrogated in mice with dysfunctional Msh2, yet ETBF still promoted tumorigenesis. This suggests a multifaceted role for ETBF and warrants further investigation of ETBF both as a tumorigenic instigator and as a risk mediator acting through modification of the epigenome.

F. nucleatum

F. nucleatum was associated with tumour characteristics related to MMRd and to CRCs of the serrated pathway (e.g., proximal location, BRAF p.V600E and CIMP-high), consistent with previous reports [62, 63]. In addition, our study found that F. nucleatum was associated with MMRd related to Lynch syndrome as well as with sporadic MMRd CRCs related to MLH1 promoter methylation and double somatic MMR mutations. These findings suggest that F. nucleatum colonisation of the tumour is not specific to a particular MMRd etiology, supporting the notion that F. nucleatum causes opportunistic infections [64] that exploit the specific tumour microenvironment of MMRd CRCs.

Although F. nucleatum is highly abundant in CRC tumours [64,65,66], its oncogenic mechanism remains to be elucidated; studies investigating its effects on host genetics and epigenetics are particularly lacking. Our study identified an association between TILs and F. nucleatum, which may reflect a tumour microenvironment favourable to F. nucleatum. Because this association was no longer significant after stratification by MMR status, it may be driven by the overrepresentation of TILs in MMRd CRCs rather than reflecting a causative role of F. nucleatum.

Strengths and weaknesses of this study

Our study has several strengths, including a large sample size from three CRC cohorts with extensive molecular characterisation. Our study was the first not only to screen for the intratumoral presence of these three CRC-associated bacteria but also to investigate the association between pks+ E. coli and the APC: c.835-8 A > G somatic mutation, which is mechanistically linked to colibactin-related DNA damage. We also performed a stratified analysis of clinically relevant MMRd subgroups of both hereditary and sporadic etiology, providing further insight into the nature of the association between MMRd and these bacteria. Further, by including a large number of EOCRCs, an emerging global health problem [67], our study was able to analyse EOCRCs separately. A further strength was the use of assays targeting both the pks island and E. coli, which allowed pks+ E. coli+ to be distinguished from other pks-harbouring bacteria. This distinction highlighted the specific association between pks+ E. coli+ and the APC: c.835-8 A > G somatic mutation.

Limitations of this study include its cross-sectional design, which meant that prior infection with these bacteria could not be examined in our participants. Ex vivo or in vitro studies, such as organoid co-culture experiments [19, 60], may further elucidate the direct effect of these bacteria on the colonic mucosa. Additionally, bacterial screening of premalignant polyps may help establish whether these bacteria play an early role in driving colorectal neoplasia.

In this study, we used FFPE specimens to detect intratumoral bacteria. Although this biospecimen type is commonly used in such studies [12, 56], a recent meta-analysis indicated that biospecimen type can influence detection efficacy and affect results [68]. Further studies using other biospecimen types (e.g., fresh frozen tissue) may be needed to validate our findings. Another limitation is the lack of TNM stage information.


This study provides novel findings on the specific molecular features and pathways of tumorigenesis associated with each genotoxic gut bacterium. The association between the intratumoral presence of pks+ E. coli+ and the APC: c.835-8 A > G somatic mutation is shown for the first time. This has important clinical implications, as the APC: c.835-8 A > G somatic mutation may represent a biomarker of colibactin-induced DNA damage caused by pks+ E. coli+ in CRC tumours. This finding provides new opportunities for future studies on the prevention and treatment of bacteria-driven CRCs.