In most cases of sporadic colorectal cancers, tumorigenesis is a multistep process, involving genomic alterations in parallel with morphologic changes. In addition, accumulating evidence suggests that the human gut microbiome is linked to the development of colorectal cancer. Here we performed fecal metagenomic and metabolomic studies on samples from a large cohort of 616 participants who underwent colonoscopy to assess taxonomic and functional characteristics of gut microbiota and metabolites. Microbiome and metabolome shifts were apparent in cases of multiple polypoid adenomas and intramucosal carcinomas, in addition to more advanced lesions. We found two distinct patterns of microbiome elevations. First, the relative abundance of Fusobacterium nucleatum spp. was significantly (P < 0.005) elevated continuously from intramucosal carcinoma to more advanced stages. Second, Atopobium parvulum and Actinomyces odontolyticus, which co-occurred in intramucosal carcinomas, were significantly (P < 0.005) increased only in multiple polypoid adenomas and/or intramucosal carcinomas. Metabolome analyses showed that branched-chain amino acids and phenylalanine were significantly (P < 0.005) increased in intramucosal carcinomas and bile acids, including deoxycholate, were significantly (P < 0.005) elevated in multiple polypoid adenomas and/or intramucosal carcinomas. We identified metagenomic and metabolomic markers to discriminate cases of intramucosal carcinoma from the healthy controls. Our large-cohort multi-omics data indicate that shifts in the microbiome and metabolome occur from the very early stages of the development of colorectal cancer, which is of possible etiological and diagnostic importance.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $18.75 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Brenner, H., Kloor, M. & Pox, C. P. Colorectal cancer. Lancet 383, 1490–1502 (2014).
Fearon, E. R. & Vogelstein, B. A genetic model for colorectal tumorigenesis. Cell 61, 759–767 (1990).
Jones, S. et al. Comparative lesion sequencing provides insights into tumor evolution. Proc. Natl Acad. Sci. USA 105, 4283–4288 (2008).
Ashktorab, H., Kupfer, S. S., Brim, H. & Carethers, J. M. Racial disparity in gastrointestinal cancer risk. Gastroenterology 153, 910–923 (2017).
Zeller, G. et al. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).
Castellarin, M. et al. Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 22, 299–306 (2012).
Kostic, A. D. et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 22, 292–298 (2012).
Yang, Y. et al. Fusobacterium nucleatum increases proliferation of colorectal cancer cells and tumor development in mice by activating Toll-like receptor 4 signaling to nuclear factor-kB, and up-regulating expression of microRNA-21. Gastroenterology 152, 851–866 (2017).
Louis, P., Hold, G. L. & Flint, H. J. The gut microbiota, bacterial metabolites and colorectal cancer. Nat. Rev. Microbiol. 12, 661–672 (2014).
Hirayama, A. et al. Quantitative metabolome profiling of colon and stomach cancer microenvironment by capillary electrophoresis time-of-flight mass spectrometry. Cancer Res. 69, 4918–4925 (2009).
Liao, M. et al. Comparative analyses of fecal microbiota in Chinese isolated Yao population, minority Zhuang and rural Han by 16sRNA sequencing. Sci. Rep. 8, 1142 (2018).
Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).
Ding, T. & Schloss, P. D. Dynamics and associations of microbial community types across the human body. Nature 509, 357–360 (2014).
Wong, J. M., de Souza, R., Kendall, C. W., Emam, A. & Jenkins, D. J. Colonic health: fermentation and short chain fatty acids. J. Clin. Gastroenterol. 40, 235–243 (2006).
Emiola, A. & Oh, J. High throughput in situ metagenomic measurement of bacterial replication at ultra-low sequencing coverage. Nat. Commun. 9, 4956 (2018).
Brestoff, J. R. & Artis, D. Commensal bacteria at the interface of host metabolism and the immune system. Nat. Immunol. 14, 676–684 (2013).
Zarling, E. J. & Ruchim, M. A. Protein origin of the volatile fatty acids isobutyrate and isovalerate in human stool. J. Lab. Clin. Med. 109, 566–570 (1987).
Devkota, S. et al. Dietary-fat-induced taurocholic acid promotes pathobiont expansion and colitis in Il10 −/− mice. Nature 487, 104–108 (2012).
Mottawea, W. et al. Altered intestinal microbiota–host mitochondria crosstalk in new onset Crohn’s disease. Nat. Commun. 7, 13419 (2016).
Xu, H. et al. Isoleucine biosynthesis in Leptospira interrogans serotype lai strain 56601 proceeds via a threonine-independent pathway. J. Bacteriol. 186, 5400–5409 (2004).
Bui, T. P. et al. Production of butyrate from lysine and the Amadori product fructoselysine by a human gut commensal. Nat. Commun. 6, 10062 (2015).
Prieto, M. A., Díaz, E. & García, J. L. Molecular characterization of the 4-hydroxyphenylacetate catabolic pathway of Escherichia coli W: engineering a mobile aromatic degradative cluster. J. Bacteriol. 178, 111–120 (1996).
Teufel, R. et al. Bacterial phenylalanine and phenylacetate catabolic pathway revealed. Proc. Natl Acad. Sci. USA 107, 14390–14395 (2010).
Russell, W. R. et al. High-protein, reduced-carbohydrate weight-loss diets promote metabolite profiles likely to be detrimental to colonic health. Am. J. Clin. Nutr. 93, 1062–1072 (2011).
Russell, W. R. et al. Major phenylpropanoid-derived metabolites in the human gut can arise from microbial fermentation of protein. Mol. Nutr. Food Res. 57, 523–535 (2013).
Windey, K., De Preter, V. & Verbeke, K. Relevance of protein fermentation to gut health. Mol. Nutr. Food Res. 56, 184–196 (2012).
Attene-Ramos, M. S., Wagner, E. D., Plewa, M. J. & Gaskins, H. R. Evidence that hydrogen sulfide is a genotoxic agent. Mol. Cancer Res. 4, 9–14 (2006).
Loubinoux, J., Bisson-Boutelliez, C., Miller, N. & Le Faou, A. E. Isolation of the provisionally named Desulfovibrio fairfieldensis from human periodontal pockets. Oral Microbiol. Immunol. 17, 321–323 (2002).
Feng, Q. et al. Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6, 6528 (2015).
Yu, J. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 66, 70–78 (2017).
Bullman, S. et al. Analysis of Fusobacterium persistence and antibiotic response in colorectal cancer. Science 358, 1443–1448 (2017).
Carbonero, F., Benefiel, A. C., Alizadeh-Ghamsari, A. H. & Gaskins, H. R. Microbial pathways in colonic sulfur metabolism and links with health and disease. Front. Physiol. 3, 448 (2012).
Könönen, E. & Wade, W. G. Actinomyces and related organisms in human infections. Clin. Microbiol. Rev. 28, 419–442 (2015).
Kasai, C. et al. Comparison of human gut microbiota in control subjects and patients with colorectal carcinoma in adenoma: terminal restriction fragment length polymorphism and next-generation sequencing analyses. Oncol. Rep. 35, 325–333 (2016).
Bernstein, H., Bernstein, C., Payne, C. M. & Dvorak, K. Bile acids as endogenous etiologic agents in gastrointestinal cancer. World J. Gastroenterol. 15, 3329–3340 (2009).
Suzuki, K. & Bruce, W. R. Increase by deoxycholic acid of the colonic nuclear damage induced by known carcinogens in C57BL/6J mice. J. Natl Cancer Inst. 76, 1129–1132 (1986).
Vandeputte, D. et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511 (2017).
Tahara, T. et al. Fusobacterium in colonic flora and molecular features of colorectal carcinoma. Cancer Res. 74, 1311–1318 (2014).
Ito, M. et al. Association of Fusobacterium nucleatum with clinical and molecular features in colorectal serrated pathway. Int J. Cancer 137, 1258–1268 (2015).
Mima, K. et al. Fusobacterium nucleatum in colorectal carcinoma tissue and patient prognosis. Gut 65, 1973–1980 (2016).
Nishimoto, Y. et al. High stability of faecal microbiome composition in guanidine thiocyanate solution at room temperature and robustness during colonoscopy. Gut 65, 1574–1575 (2016).
Tsugane, S. & Sawada, N. The JPHC study: design and some findings on the typical Japanese diet. Jpn J. Clin. Oncol. 44, 777–782 (2014).
Furet, J. P. et al. Comparative assessment of human and farm animal faecal microbiota using real-time quantitative PCR. FEMS Microbiol. Ecol. 68, 351–362 (2009).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
Mori, H., Maruyama, T., Yano, M., Yamada, T. & Kurokawa, K. VITCOMIC2: visualization tool for the phylogenetic composition of microbial communities based on 16S rRNA gene amplicons and metagenomic shotgun sequencing. BMC Syst. Biol. 12, 30 (2018).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Yarza, P. et al. Update of the All-Species Living Tree Project based on 16S and 23S rRNA sequence analyses. Syst. Appl. Microbiol. 33, 291–299 (2010).
Yarza, P. et al. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat. Rev. Microbiol. 12, 635–645 (2014).
Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat. Methods 10, 1196–1199 (2013).
Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).
Holmes, I., Harris, K. & Quince, C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE 7, e30126 (2012).
Asnicar, F., Weingart, G., Tickle, T. L., Huttenhower, C. & Segata, N. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3, e1029 (2015).
Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
Besemer, J. & Borodovsky, M. Heuristic approach to deriving models for gene finding. Nucleic Acids Res. 27, 3911–3920 (1999).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Soga, T. et al. Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. J. Proteome Res. 2, 488–494 (2003).
Mishima, E. et al. Evaluation of the impact of gut microbiota on uremic solute accumulation by a CE-TOFMS-based metabolomics approach. Kidney Int. 92, 634–645 (2017).
Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
We thank all patients and their families who participated in this study, S. Goto for technical advice and A. Kaya, C. Shima, K. Igarashi, R. Usui, K. Murakami, I. Take, M. Sezawa, M. Iwahara, M. Komori, Z. Nakagawa, Y. Ohara and K. Kamezaki for expert technical assistance. Computations were partially performed on the NIG supercomputer at the ROIS National Institute of Genetics. This work was supported by grants from the National Cancer Center Research and Development Fund (25-A-4 and 28-A-4 to S.Y., S.F and T. Yamada and 29-A-6 to T. Yamada and T. Shibata); Practical Research Project for Rare/Intractable Diseases from the Japan Agency for Medical Research and Development (AMED) (JP18ek0109187 to S.Y., S.M., Y.S., S.F. and T. Yamada; JP18jk0210009 to S.Y. and T. Shibata); AMED-CREST (JP18gm0710003 to S.Y. and T. Soga.); JST (Japan Science and Technology Agency)-PRESTO (JPMJPR1537 to S.F. and JPMJPR1507 to T. Yamada); JSPS (Japan Society for the Promotion of Science) KAKENHI (16H04901, 17H05654 and 18H04805 to S.F.; 16J10135, 142558 and 221S0002 to T. Yamada); Joint Research Project of the Institute of Medical Science, the University of Tokyo (2017–2107 to S.Y. and T. Shibata); Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University (to S.Y.); the Takeda Science Foundation (to S.Y.) and Suzuken Memorial Foundation (H25–2–11 to S.Y.).
a, Overview of the study. Fecal samples from 616 subjects were used to collect whole-genome shotgun sequencing data, from which functional and taxonomic profiles were generated. Fecal samples from 406 subjects were used to perform CE-TOFMS analysis to generate metabolite profiles. Samples from 347 subjects were available for both the sequencing and CE-TOFMS data analyses. KO, KEGG orthology. b, Flow chart of the pipeline used for the metagenomics analysis. Our metagenomics pipeline consists of three parts, quality control, functional profiling and taxonomic profiling, in which raw reads first undergo a quality control check and are then used to run several analytical steps to finally generate the KEGG orthology gene-based functional and taxonomic profiles.
a,b, Distribution of age, gender, BMI, Brinkman index and alcohol consumption in 616 subjects with metagenome data (a) and 406 subjects with metabolome data (b). The boxes represent 25th–75th percentiles, black lines indicate the median and whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.
a, Change in the fraction of human genome reads out of the total number of raw reads along with colorectal cancer progression. Significant increases (P < 0.005; one-sided Mann–Whitney U test) in the human genome ratio (percentage reads mapped to the human genome) in feces in the MP (n = 67), S0 (n = 73), SI/II (n = 111) and SIII/IV (n = 74) groups are evident compared to the healthy group (H, n = 251). The boxes represent 25–75th percentiles, black lines indicate the median, vertical lines show maximum values within 1.5× the interquartile range and dots indicate outliers beyond the 1.5× interquartile range. b,c, PCAs of genera (n = 251) (b) and metabolites (n = 149) (c) in healthy controls. d, Fitting to the Dirichlet multinomial mixture model indicated optimal classification of fecal metagenomes (n = 616) into four community types. e, Composition of top 30 genera in each of the four community types. f–i, Distribution of stages (healthy, MP, S0, SI/II, SIII/IV and HS) (f), tumor locations (right colon, left colon, rectum and double or triple cancers) (g), gender (h) and age (i) in each of the four community types. Patient distribution of the community types are as follows: community type 1, n = 191; community type 2, n = 172; community type 3, n = 129; community type 4, n = 124. The boxes in i represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.
Extended Data Fig. 4 Distributions of tumor locations and gender in the overall structures of metagenomes and metabolomes.
a,b, PCA plots of genus profiles (n = 616) grouped by tumor location (a) and gender (b). c,d, PCA plots of metabolite profiles (n = 406) grouped by tumor location (c) and gender (d).
a,b, Abundance of A. parvulum estimated with whole-genome shotgun metagenomic sequence data (a) and by qPCR using the 16S rRNA gene copy number (b) in samples from 73 patients with S0 CRCs (green) and 73 healthy controls (red). c, Spearman’s correlation coefficient of A. parvulum abundances between the two methods was calculated with P-value computation using asymptotic approximation. d, qPCR demonstrated statistically significant differences in the gene number of A. parvulum between the healthy controls and patients with S0 CRCs (one-sided Mann–Whitney U test). The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.
Replication rates were plotted for the 20 species shown in Fig. 2. The y axis (GRiD) is defined as the ratio between coverage at the peak (ori) and trough (ter) for the reference bacterial genome. Samples that had sufficient coverage are plotted (coverage > 0.2) for mapping against the reference genomes. Sample numbers varied with dependent species and are indicated in parentheses. P values were calculated using one-sided Mann–Whitney U tests for each of the stages (MP, S0, SI/II, SIII/IV) and were compared to the healthy controls. Significant elevation or depletion are denoted as follows: +++, elevation with P < 0.005; ++, elevation with P < 0.01; +, elevation with P < 0.05; −−−, depletion with P < 0.005; −−, depletion with P < 0.01; −, depletion at P < 0.05. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.
Sixty-five metabolites with statistically significant (P < 0.005; one-sided Mann–Whitney U test) differences for any of the stages (MP, n = 45; S0, n = 30; SI/II, n = 80; SIII/IV, n = 68 compared to the healthy controls (n = 149) on capillary electrophoresis time of flight mass spectrometer (CE-TOFMS) analysis. Significant changes (elevation and depletion) are denoted as follows: +++, elevation with P < 0.005; ++, elevation with P < 0.01; +, elevation with P < 0.05; −−−, depletion with P< 0.005; −−, depletion with P < 0.01; −, depletion at P < 0.05. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers.
Extended Data Fig. 8 Metabolomic changes in the tricarboxylic acid (TCA) pathways and metagenomic changes in amino acid metabolism and other representative pathways.
a, Quantified levels of metabolites involved in the tricarboxylic acid (TCA) pathway. The levels of the three TCA metabolites, succinate, fumarate and malate, were significantly higher P < 0.005; one-sized Mann–Whitney U test) in S0 samples (and SIII/IV samples for fumarate) (+++, P < 0.005; ++, P < 0.01; +, P < 0.05) compared to healthy control samples. It is uncertain what is the cause of accumulation of succinate, fumarate and malate in the feces of patients with early colorectal cancer, despite extremely low concentrations of other TCA intermediates such as 2-oxoglutarate. It is known that some bacteria synthesize ATP using a reverse reaction of succinate dehydrogenase and produce succinate as a byproduct, as part of fumarate respiration, in which fumarate rather than molecular oxygen is used as electron acceptor. The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers. The concentration is shown on the y axis (nmol g−1). Healthy (n = 127), MP (n = 45), S0 (n = 30), SI/II (n = 80), SIII/IV (n = 68). N.D., not detected and/or not determined. b, Pathway modules for metabolism types omitted from Fig. 3b. The pathway modules are modified from KEGG pathway maps ‘Alanine, aspartate and glutamate metabolism’, ‘Cysteine and methionine metabolism’, ‘Methane metabolism’ and ‘Arginine and proline metabolism’. ‘Leucine degradation’ is constructed based on leucine metabolism of Clostridium difficile, as the bacterial map is not available in KEGG. For each KO gene, bar plots show KO gene abundances averaged over samples within each of the five groups, healthy (n = 251), MP (n = 67), S0 (n = 73), SI/II (n = 111) and SIII/IV (n = 74) in order of left to right, and colored according to the order of the values. Each KO gene is composed of organism genes represented by circles. The sizes and colors of the circles are proportional to the relative abundances of the organism genes. Organism genes are grouped into one row and indicated by the organism name. The three most abundant organisms in the healthy controls are shown using three letter codes (for example, ova for Oscillibacter valericigenes, kpe for Klebsiella pneumoniae 342). Abbreviations for other organism names can be found in Supplementary Table 4. Gene numbers linked to each of the genes are listed in Supplementary Table 5. Dots in each pathway represent intermediate metabolites. The colors of the boxes of pathway components are marked in red for significant elevation (P < 0.005; one-sided Mann–Whitney U test) for any of the stages (MP, S0, SI/II and SIII/IV) compared to the healthy controls.
Extended Data Fig. 9 Metagenomic and metabolomic potentials as diagnostic markers for early (S0) and advanced (SIII/IV) CRCs.
a, ROC curves as performance evaluation for LASSO logistic regression and random-forest models and to distinguish samples from patients with S0 (left two panels) and SIII/IV (right two panels) CRCs from samples of healthy controls. Models were designed based on species (red), KO genes (blue) or metabolites (black) individually, or a combination (green) of the three features. For the S0 classification, the species-based model used 29 species, the KO gene-based model used 16 KO genes and the metabolite-based model used 24 metabolites. For SIII/IV classification, the species-based model used 55 species, the KO gene-based model used 5 KO genes and the metabolite-based model used 62 metabolites. In the combination models, species, KO gene and metabolite features were selected from the individual models. Classification accuracy was evaluated on the AUC using 10 randomized 10-fold cross-validation testing. For the LASSO logistic regression models, all features that satisfied the abundance thresholds were used to construct both the individual models and the combination model. The discriminant features among all are shown. b, Discriminant features identified from LASSO logistic regression classifiers and random-forest classifiers for distinguishing S0 (n = 27) and SIII/IV (n = 54) cases from the healthy controls (n = 127). The colors of box plots represent significant increases (red) and decreases (light blue) in each group, compared to the healthy controls (P < 0.005; one-sided Mann–Whitney U test). Dark gray boxes indicate features without statistical significance. The x axis shows the percentage contribution of the features to the model in each test (see Methods). The boxes represent 25th–75th percentiles, black lines indicate the median, whiskers extend to the maximum and minimum values within 1.5× the interquartile range and dots indicate outliers. c, Analysis of potential confounding factors that might affect metagenomic and metabolomic classifiers. We analyzed AUCs for factors such as age, gender, BMI, smoking and alcohol exposure. Smoking and alcohol values are indicated as the Brinkman index and the amount of alcohol consumption, respectively. Whereas the gender of the patient and the Brinkman index significantly differed between groups (Supplementary Table 1), neither the random-forest nor the logistic regression model achieved high accuracy.
Fusobacterium spp., Akkermansia muciniphila and sulfidogenic bacteria (B. wadsworthia and Pyramidobacter piscolens), which were previously reported to exhibit relationships with dietary intake, were examined for Spearman’s correlation coefficients with dietary fiber (water-soluble, insoluble and total dietary fiber), dietary protein intake (protein and meat), dietary fat (lipid), dietary calcium (calcium), serving of dairy products (milk) and energy intake (energy). +++, correlation with P < 0.005; +, correlation with P < 0.05. Samples that lack dietary data are omitted from the computation of correlation coefficients. Healthy, n = 242; MP, n = 67; S0, n = 72; SI/II, n = 109; SIII/IV, n = 71.