Introduction

Colorectal cancer (CRC) is the third most common malignant tumour and the second most common cause of cancer-related mortality in the world1. According to temporal profiles and demographic predictions, approximately 2.2 million people will develop CRC, and 1.1 million people will die of the disease, by 20302. The incidence and mortality of CRC tend to be higher in high-income countries, suggesting that a Western lifestyle could be a contributing factor2,3,4. Therefore, to reduce the number of CRC patients in the coming decades, it is essential to promote a lifestyle that mitigates the CRC risk factors associated with the Western lifestyle typical of high-income countries.

Colibactin is a genotoxic secondary metabolite produced by organisms harbouring the polyketide synthase (pks) genomic island, including certain gut Escherichia coli strains (pks+ E. coli)5,6,7. Colibactin alkylates host cell DNA, which leads to genomic instability by generating DNA inter-strand cross-links that are potentially involved in the development of CRC8. Transient infection of mammalian cells with pks+ E. coli leads to cell cycle arrest5,7 and DNA double-strand breaks5,6,9. Previous studies have reported that the prevalence of pks+ E. coli isolated from the colonic epithelium is higher in CRC patients10, patients with familial adenomatous polyposis11, and patients with inflammatory bowel disease10 than in healthy individuals. Therefore, pks+ E. coli may be a gut microbiome-related CRC risk factor.

Recently, reports have shown that changes in the faecal microbiome may occur in the very early stages of CRC12, leading to an increased risk of disease progression13,14. Moreover, among all cancer types, diet reportedly has the highest relative contribution to CRC incidence15. In fact, the effect of diet on the onset of CRC is greater than that of smoking or obesity16. Poor diet has also recently been associated with potentially adverse gut microbiome profiles in the faecal matter17 and colonic mucosa18 of healthy individuals. Although a causal relationship between dietary intake and CRC risk through the gut microbiome has not been established, reducing CRC risk through dietary intervention may be the most reasonable and cost-effective approach. Moreover, the relationship between dietary intake and the prevalence of pks+ E. coli, which is, to our knowledge, a risk factor for CRC, has not been elucidated. It is, therefore, essential to clarify this relationship to inform the design of CRC preventative measures. Hence, the purpose of this exploratory study was to examine the relationship between dietary intake and the prevalence of pks+ E. coli isolated from faecal matter in healthy Japanese individuals.

Results

Participant characteristics

The characteristics of participants with or without pks+ E. coli in the analysis cohort are presented in Table 1. Of the 223 participants included in the study, 60 were assigned to the pks+ E. coli group (26.9%). The prevalence of pks+ E. coli was significantly higher in males than in females. Age, serum triglyceride levels, and the proportion of participants with a family history of cancer were slightly lower in the pks+ E. coli group than in the pks- E. coli group; however, these differences were not significant. Meanwhile, slightly more participants in the pks+ E. coli group smoked and were employed compared to the pks- E. coli group; however, these differences were also not significant.

Table 1 Baseline characteristics of participants with or without pks+ E. coli.

Intake of food and beverages, and the pks+ E. coli status

A comparison of food and beverage intake between participants with and without pks+ E. coli is shown in Table 2. The median green tea consumption was significantly lower in the pks+ E. coli group than in the pks- E. coli group (107.1 g, interquartile range [IQR] 21.4–324.6 vs 150.0 g, IQR 53.6–375.0, p = 0.019). In addition, after adjusting for energy intake using the density method, the median green tea consumption was significantly lower in the pks+ E. coli group than in the pks- E. coli group (53.6 g/1,000 kcal, IQR 12.4–148.2 vs 86.7 g/1,000 kcal, IQR 29.8–257.7, p = 0.010). The same trend was observed in the total amount of non-alcoholic beverages consumed, which included green tea. The median egg intake was significantly lower in the pks+ E. coli group than in the pks- E. coli group (18.3 g/1,000 kcal, IQR 11.1–29.3 vs 22.8 g/1,000 kcal, IQR 14.0–33.2, p = 0.043). No difference was observed between the two groups in the intake of other foods, beverages, or food groups.

Table 2 Comparison of crude or energy-adjusted food and beverage intake between participants with and without pks+ E. coli.

Nutrient intake and the prevalence of pks+ E. coli

A comparison of the energy and nutrient intake between participants with or without pks+ E. coli is shown in Table 3. All nutrient values are derived from food, not supplements. There was no significant difference in the crude mean intake of any nutrient between the two groups. However, after adjusting for energy intake using the density method, the average intake of riboflavin (mean 0.80 mg/1,000 kcal, standard deviation [SD] 0.17 vs. mean 0.86 mg/1,000 kcal, SD 0.19, p = 0.026), folate (mean 210 µg/1,000 kcal, SD 69 vs. mean 233 µg/1,000 kcal, SD 75, p = 0.035), iron (mean 4.7 mg/1,000 kcal, SD 1.1 vs. mean 5.0 mg/1,000 kcal, SD 1.2, p = 0.044), and manganese (mean 1.60 mg/1,000 kcal, SD 0.46 vs. mean 1.83 mg/1,000 kcal, SD 0.58, p = 0.003) was significantly lower in the pks+ E. coli group than in the pks- E. coli group. There were no significant differences in the intake of other nutrients between the two groups.

Table 3 Comparison of the crude or energy-adjusted energy and nutrient intake between participants with and without pks+ E. coli.

Multivariate analysis of dietary intake and the prevalence of pks+ E. coli

Participant characteristics and dietary intake variables that were significantly associated with the prevalence of pks+ E. coli were entered into a multivariate analysis (Table 4). The multivariate analysis models, even after adjusting for covariates, showed that: (1) the prevalence of pks+ E. coli was significantly higher in males than in females (odds ratio [OR] 2.27, 95% confidence interval [CI] 1.05–4.91, p = 0.038); and (2) the OR of pks+ E. coli for a 100 g/1,000 kcal increment in green tea consumption per day was 0.59 (95% CI 0.30–0.88, p = 0.003). Regarding nutrients, the OR of pks+ E. coli for a 1 mg/1,000 kcal increment in manganese intake per day was 0.43 (95% CI 0.22–0.85, p = 0.012).

Table 4 Odds ratios of pks+ E. coli for the intake of energy-adjusted food, beverage, and nutrients calculated by multivariate logistic regression analysis.

Dietary intake variables that showed a significant association with the prevalence of pks+ E. coli in the multivariate analysis were used in a restricted cubic spline model to evaluate dose-dependent responses to pks+ E. coli (Fig. 1). The analysis of the curves for the dietary intake variables and the prevalence OR of pks+ E. coli showed that: (1) the prevalence OR of pks+ E. coli was significantly lower when green tea consumption exceeded approximately 430 g/day, or 260 g/1,000 kcal/day (reference: those without green tea consumption), and (2) the prevalence OR of pks+ E. coli was significantly lower when the intake of manganese exceeded approximately 2.86 mg/day (reference: 1.05 mg/day), or 2.65 mg/1,000 kcal/day (reference: 0.70 mg/1,000 kcal/day).

Figure 1

Association of the intake of green tea (a,b) and manganese (c,d) with the prevalence odds ratio (OR) of colibactin-producing E. coli in a restricted cubic spline logistic regression model. Adjusted factors included age (continuous), sex (female or male), BMI (continuous), smoking status (never smoker, past smoker, or current smoker), family history of cancer (yes or no), energy intake (continuous), and step counts (continuous). The solid line represents the OR. The broken lines show the 95% Confidence Intervals (CIs). If the 95% CI for the OR did not include 1.00, the p value was estimated to be < 0.05. If the 95% CI included 1.00, the p value was estimated to be ≥ 0.05.

Discussion

The purpose of this cross-sectional study was to examine the relationship between dietary intake and the prevalence of pks+ E. coli in healthy Japanese individuals. After adjusting for confounding factors, our results showed a significant negative association between the prevalence of pks+ E. coli and the intake of green tea and manganese. In addition, we showed a significantly higher prevalence of pks+ E. coli in males than in females. To our knowledge, this is the first study to show a significant association between dietary intake and the prevalence of pks+ E. coli. Furthermore, the results of this study support a hypothesis suggesting an association between dietary intake and CRC risk.

Unlike other cancers, such as lung cancer, no single risk factor accounts for most cases of CRC. Apart from age and male sex, the following risk factors for CRC incidence have been identified and established in previous epidemiological studies: family history of cancer, inflammatory bowel disease, smoking, obesity, diabetes, excessive alcohol consumption, and high consumption of red and processed meat3. These established CRC risk factors have been associated with potentially adverse gut microbiome profiles13,14, indicating the importance of evaluating the prevalence of tumourigenic bacteria in the gut microbiota. In this study, the prevalence of pks+ E. coli isolated from faecal matter was 26.9%. According to previous studies, the prevalence of pks+ E. coli isolated from the colonic epithelium was 20.8% in healthy UK individuals10 and 22.0% in healthy US individuals11. Previous studies evaluated the prevalence of pks+ E. coli in tissue samples, whereas the current study used faecal matter. Although the evaluation methods differ, the results are relatively similar. Moreover, our finding that the prevalence of pks+ E. coli was significantly higher in males than in females is consistent with previous studies reporting a higher risk of CRC in males than in females3,19. In other words, our results on the prevalence of pks+ E. coli are similar to those from previous studies in high-income countries, suggesting that the participant characteristics in our study were not markedly biased.

Taken together, the key findings of the current study are as follows: (1) green tea consumption and manganese intake were each negatively associated with the prevalence of pks+ E. coli; and (2) a significantly lower OR of pks+ E. coli was observed in individuals who consumed approximately 430 g/day or more of green tea compared to those who consumed none. Notably, green tea accounted for 75.6% of total manganese intake in this cohort, and most of the inter-individual variance in manganese intake was attributed to green tea consumption. Hence, the negative association observed between manganese intake and the prevalence of pks+ E. coli may be attributable to green tea consumption, as the association lost significance following adjustment for green tea consumption. Green tea intake estimated by the brief-type self-administered diet history questionnaire (BDHQ) showed a moderate correlation with green tea intake estimated by dietary records (male, r = 0.68; female, r = 0.64). However, the BDHQ has been reported to overestimate median green tea intake by 20–23%20. Therefore, the minimum effective amount of green tea, which was significantly associated with reduced prevalence of pks+ E. coli in a dose–response relationship, may be 20% lower than approximately 430 g/day (i.e. ~340 g/day [2–3 cups/day]).
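The adjustment in the preceding paragraph is simple arithmetic. As a minimal sketch (the function name is illustrative, and the reduction is applied multiplicatively at the lower bound of the reported 20–23% overestimation, as the text does):

```python
def corrected_threshold(observed_g_per_day: float, overestimation: float) -> float:
    """Lower a questionnaire-based intake threshold by a fractional overestimation."""
    if not 0.0 <= overestimation < 1.0:
        raise ValueError("overestimation must be a fraction in [0, 1)")
    return observed_g_per_day * (1.0 - overestimation)


# Threshold of ~430 g/day from the spline analysis, lowered by 20%.
threshold = corrected_threshold(430.0, 0.20)
print(round(threshold))  # 344, i.e. roughly 340 g/day
```

Applying the upper bound of the range (23%) instead would give a somewhat lower value, so the ~340 g/day figure is best read as an approximation.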

To our knowledge, two systematic reviews21,22, five prospective cohort studies23,24,25,26,27, and four case–control studies28,29,30,31 have evaluated the relationship between green tea consumption and CRC risk; however, these studies have reached no consensus regarding the anti-CRC properties of green tea. The two systematic reviews concluded that the evidence for the anti-CRC effects of green tea is inadequate and contradictory21,22. Only studies targeting East Asians have reported anti-CRC effects of green tea26,27,29,30,31. Meanwhile, low green tea consumption in non-Asian countries may contribute to the non-significant results of studies conducted there28. Green tea consumption for 7 days had beneficial effects on lymphocytic DNA damage in middle-aged healthy non-smokers32. In addition, green tea catechins have anti-inflammatory properties33,34, which help to mitigate oxidative tissue injury34. Subsequent changes in the gut microbiota and reduced intestinal inflammation may then be related to the anti-inflammatory properties of green tea and green tea polyphenols35. Although no studies have described the relationship between green tea consumption and the prevalence of pks+ E. coli, one study has described the effects of green tea consumption on the gut microbiome. Yuan et al. performed an intervention study in healthy Chinese adults and found a significant increase in the Firmicutes to Bacteroidetes ratio in faecal matter after 2 weeks of green tea consumption (400 mL/day)36. Firmicutes and Bacteroidetes are the two major bacterial phyla that dominate the human gut microbiota. The Firmicutes to Bacteroidetes ratio increases from birth to adulthood and decreases at advanced ages37. Interestingly, higher frequencies of colibactin-producing E. coli and enterotoxigenic Bacteroides fragilis in the colonic epithelium were observed in patients with familial adenomatous polyposis compared to healthy individuals11.
That study also reported that mice with guts co-colonised with colibactin-producing E. coli and B. fragilis had a higher tumour growth rate due to increased levels of interleukin-17 in the colon, as well as more DNA damage in the colonic epithelium, compared to mice colonised with either bacterial strain alone11. These results suggest that green tea consumption may reduce the prevalence of pks+ E. coli by suppressing the growth of certain microorganisms in the gut microbiome. The detailed mechanisms and causal relationships must be clarified in further intervention and fundamental studies.

The strength of this study is that it examined the association between the prevalence of pks+ E. coli and dietary intake estimated using a validated dietary assessment tool. Using this approach, the study generated a new hypothesis regarding the association between diet and the prevalence of pks+ E. coli, a tumourigenic bacterium. However, this study has certain methodological limitations. First, a temporal or direct causal relationship between dietary intake and the prevalence of pks+ E. coli cannot be inferred because this is a cross-sectional study. Second, although our results show that green tea and manganese intake are negatively associated with the prevalence of pks+ E. coli, the food and nutrient intakes estimated by the BDHQ have not been fully validated against objective biomarkers. The results may have been affected by systematic errors related to body mass index and gender38. In addition, when we examined the association between diet and the prevalence of pks+ E. coli, we focused exclusively on diet estimated from a validated BDHQ. We have previously reported that yoghurt consumption increases stool frequency39, which suggests that yoghurt consumption may affect the gut microbiome. Therefore, it is necessary to verify our results using a validated dietary assessment tool other than the BDHQ. Further, we conducted an exploratory investigation of the association between the intake of specific foods, beverages, and nutrients and the prevalence of pks+ E. coli; however, the multiple testing problem may arise because many p values were calculated. Finally, there is the possibility of selection bias because the participants in this study may have had higher health awareness than the general population. Of the 750 participants in the Nutrition and Exercise Intervention Study (NEXIS) cohort, 259 adults agreed to participate. As the participation rate was relatively low, volunteer bias may have occurred.
In addition, the participants were all from the Tokyo metropolitan area, with an average age of approximately 58 years. These limitations may prevent the generalisation of the results. Therefore, future studies with larger, randomly selected samples should further investigate the association between diet and the prevalence of pks+ E. coli. In addition, the effects of green tea consumption and manganese intake on the risk of pks+ E. coli should be examined in prospective cohort studies and randomised intervention studies.

Given the rapid Westernisation of diet around the world, there is an urgent need to highlight the importance of diet in the prevention of CRC. Furthermore, the difference in dietary intake between groups may explain the large global differences in cancer burden40. Therefore, our study data provide useful insights for developing sustainable, comprehensive, and effective public health programmes and preventative intervention strategies for CRC.

Methods

Participants and study procedure

Of the 750 participants in the NEXIS cohort (ethical approval number: kenei102; clinical trial registration number: NCT00926744), which has been operated and managed by the National Institute of Health and Nutrition, NIBIOHN since 2012, 259 adult males and females from the Tokyo metropolitan area, Japan, agreed to participate in this study (ethical approval number: kenei 3-04; clinical trial registration number: UMIN000023270). The survey of health-related variables (e.g. smoking status, family history of disease, drug use, and dietary intake) and faecal sampling were conducted between September 2015 and December 2017.

The questionnaire for the lifestyle survey and a kit for faecal collection and storage were mailed to the participants. We used a triaxial accelerometer (Actimarker; EW4800; Panasonic Co., Ltd, Japan) to measure daily step counts as an objective index of physical activity. The participants were instructed to complete the questionnaire for the lifestyle survey (e.g. dietary survey) and collect lumps of faeces of ~ 2 cm diameter (approximately 3 g) at home. The collected faeces were immediately placed in a sealed container and stored in a − 20 °C freezer. The participants were instructed to bring the questionnaire and faecal samples to the National Institute of Health and Nutrition, NIBIOHN within 5 days after faecal collection, at which point they received health examinations, such as anthropometric and blood tests. The investigators, qualified as registered dieticians or nurses, checked the questionnaires and interviewed those with unanswered questions or unclear responses to confirm their answers. A portion of the frozen faeces was transported to the University of Shizuoka by a refrigerated truck and tested for the presence of pks+ E. coli. Blood samples were used to measure conventional risk factors for lifestyle-related diseases, such as haemoglobin A1c, triglyceride, and low-density lipoprotein-cholesterol. The study protocol was reviewed and approved by the Research Ethical Review Committee of NIBIOHN (approval number: kenei102 and kenei 3-04). Study procedures, as well as the risks associated with participation, were explained and written informed consent was obtained from all participants. Moreover, all study procedures were performed in accordance with relevant standard international guidelines/regulations.

Of the participants included in the baseline analysis (n = 259) who provided informed consent, those with a history of cancer (n = 12), gastrointestinal disease (n = 3), diabetes mellitus (n = 13), renal failure (n = 1), and cardiovascular disease (n = 6) were excluded from subsequent analyses. In addition, a participant with extremely low/high mean energy intake estimated by the BDHQ (n = 1) was excluded from the analysis as an outlier (< 600 or > 4,000 kcal/day)41. As a result, 223 participants were ultimately included in this study.
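The participant flow described above can be checked with a short calculation. This is a sketch only: it assumes the exclusion categories do not overlap (which the reported total implies), and the helper name `is_plausible_energy` is illustrative rather than part of the study's analysis code.

```python
BASELINE_PARTICIPANTS = 259

# Exclusion counts as reported in the text (assumed mutually exclusive).
EXCLUSIONS = {
    "history of cancer": 12,
    "gastrointestinal disease": 3,
    "diabetes mellitus": 13,
    "renal failure": 1,
    "cardiovascular disease": 6,
    "implausible energy intake": 1,
}


def is_plausible_energy(kcal_per_day: float) -> bool:
    """Outlier rule from the text: exclude < 600 or > 4,000 kcal/day."""
    return 600.0 <= kcal_per_day <= 4000.0


analysed = BASELINE_PARTICIPANTS - sum(EXCLUSIONS.values())
print(analysed)  # 223
```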

Confirmation of pks+ E. coli using polymerase chain reaction (PCR)

To confirm that the E. coli was a pks+ strain, PCR was performed to amplify the genes from the clb cluster using the bacterial genomic DNA as a template. The details have been reported elsewhere42. In brief, the following 16 primer sets were used to amplify each of the genes in the cluster: clbA-F/clbA-R for clbA, clbB-F/clbB-R for clbB, clbC-F/clbC-R for clbC, clbD-F/clbD-R for clbD, clbF-F/clbF-R for clbF, clbG-F/clbG-R for clbG, clbH-F/clbH-R for clbH, clbI-F/clbI-R for clbI, clbJ-F/clbJ-R for clbJ, clbK-F/clbK-R for clbK, clbL-F/clbL-R for clbL, clbM-F/clbM-R for clbM, clbN-F/clbN-R for clbN, clbO-F/clbO-R for clbO, clbP-F/clbP-R for clbP, and clbQ-F/clbQ-R for clbQ (see Supplementary Table S1 for primer sequence information). Participants in whose faecal samples clbB, clbJ, and clbQ were detected were defined as having pks+ E. coli.
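The classification rule can be expressed as a simple set check. The sketch below assumes that all three marker genes must be detected for a participant to be classified as pks+ (our reading of the definition above); the function and constant names are hypothetical.

```python
# Marker genes named in the text; requiring all three is an assumption.
MARKER_GENES = frozenset({"clbB", "clbJ", "clbQ"})


def is_pks_positive(detected_genes) -> bool:
    """Return True when every marker gene was detected in the faecal sample."""
    return MARKER_GENES.issubset(set(detected_genes))


print(is_pks_positive(["clbA", "clbB", "clbJ", "clbQ"]))  # True
print(is_pks_positive(["clbB", "clbJ"]))                  # False
```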

Dietary assessment

Dietary intake was evaluated using the BDHQ of Sasaki et al., consisting of 58 food and beverage items validated against 16-day dietary records (4 days per season)20,43. Food and beverage items listed in the BDHQ consist of food and beverages commonly consumed in Japan, according to the national health and nutrition survey44. This study examined only the frequency of intake of the 58 food and beverage items in the past month. Participants with unanswered questions or unclear responses were asked to confirm their responses during a face-to-face interview. Dietary and nutrient intakes were calculated from the weight of food intake (i.e. calculated according to the portion size and frequency of dietary intake) and nutritional information listed in the Standard Tables of Food Composition in Japan45.

Statistical analysis

All data were compared between participants with or without pks+ E. coli. Categorical variables were expressed as numbers and percentages. The chi-square test was used to compare categorical variables between the two groups. Descriptive statistics for continuous variables were expressed as mean and SD or median and IQR, and differences in continuous variables between the two groups were evaluated using the unpaired t test or the Mann–Whitney U test. Forty-two nutrient items43 and 28 items of food, beverages, and food groups20 calculated from the BDHQ were used in this study. The details of the food groups are described elsewhere20. In accordance with the BDHQ validation studies, food and beverage consumption was expressed as median and IQR20, whereas nutrient intake was expressed as mean and SD43. To adjust for energy intake, food and nutrient intakes per 1,000 kcal were calculated using the density method46. Crude values and energy-adjusted values calculated using the density method for all food and nutrient intake variables were compared between participants with and without pks+ E. coli.
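The density method expresses each intake per 1,000 kcal of energy intake. A minimal sketch follows; the function name and the example values are illustrative and are not taken from the study data.

```python
def energy_density(intake: float, energy_kcal: float) -> float:
    """Energy-adjusted intake per 1,000 kcal (density method)."""
    if energy_kcal <= 0:
        raise ValueError("energy intake must be positive")
    return intake * 1000.0 / energy_kcal


# Hypothetical participant: 150 g/day of green tea, 1,800 kcal/day.
print(round(energy_density(150.0, 1800.0), 1))  # 83.3 g/1,000 kcal
```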

Multivariate logistic regression analysis was used to adjust for potential confounding factors related to food and nutrient intake and the prevalence of pks+ E. coli. We adjusted for age (continuous), sex (female or male), body mass index (continuous), smoking status (never smoker, past smoker, or current smoker), family history of cancer (yes or no), energy intake (continuous), and step counts (continuous). These variables were decided with reference to covariates used in previous studies that examined the association between CRC and green tea consumption23,24,25,26,27,28,29,30,31. Participant characteristics, as well as food and nutrient intake that were significantly associated with the prevalence of pks+ E. coli (p < 0.05) were used in the analysis. The results of the analysis were expressed as OR and 95% CI. The OR and 95% CI were calculated for food and nutrient intake per unit increment. Food and nutrient intake variables that showed a significant association with pks+ E. coli prevalence in the multivariate analysis were used in a restricted cubic spline model, with three knots placed at the 5th, 50th, and 95th percentiles to evaluate dose-dependent responses to pks+ E. coli47. If the 95% CI for the OR did not include 1.00, the p value was estimated to be < 0.05. If the 95% CI included 1.00, the p value was estimated to be ≥ 0.05.
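For readers unfamiliar with restricted cubic splines, the sketch below builds the basis for three knots using the standard truncated-power formulation, scaled by the squared distance between the outer knots (a common normalisation). It is not the authors' Stata code: the knot values and regression coefficients are hypothetical placeholders, and covariate adjustment is omitted.

```python
import math


def _pos_cubed(u: float) -> float:
    """Truncated cubic: u**3 when u > 0, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis_3knots(x: float, knots):
    """Return (linear term, nonlinear term) of a 3-knot restricted cubic spline."""
    t1, t2, t3 = knots
    if not t1 < t2 < t3:
        raise ValueError("knots must be strictly increasing")
    nonlinear = (
        _pos_cubed(x - t1)
        - _pos_cubed(x - t2) * (t3 - t1) / (t3 - t2)
        + _pos_cubed(x - t3) * (t2 - t1) / (t3 - t2)
    ) / (t3 - t1) ** 2
    return x, nonlinear


def odds_ratio(x, reference, beta_linear, beta_nonlinear, knots):
    """OR at intake x relative to a reference intake, from spline coefficients."""
    _, s_x = rcs_basis_3knots(x, knots)
    _, s_ref = rcs_basis_3knots(reference, knots)
    log_or = beta_linear * (x - reference) + beta_nonlinear * (s_x - s_ref)
    return math.exp(log_or)


# Hypothetical knots (5th, 50th, 95th percentiles of intake, g/day) and coefficients.
example_knots = (0.0, 150.0, 600.0)
print(odds_ratio(430.0, 0.0, -0.002, 0.001, example_knots))
```

The basis is zero below the first knot and linear beyond the last knot, which keeps the fitted curve from behaving erratically in the sparse tails of the intake distribution.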

p < 0.05 (two-tailed) was considered statistically significant. All analyses were performed using Stata/MP 15.0 Statistical Software (StataCorp, College Station, TX, USA).