Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation

Thomas, Andrew Maltez; Manghi, Paolo; Asnicar, Francesco; Pasolli, Edoardo; Armanini, Federica; Zolfo, Moreno; Beghini, Francesco; Manara, Serena; Karcher, Nicolai; Pozzi, Chiara; Gandini, Sara; Serrano, Davide; Tarallo, Sonia; Francavilla, Antonio; Gallo, Gaetano; Trompetto, Mario; Ferrero, Giulio; Mizutani, Sayaka; Shiroma, Hirotsugu; Shiba, Satoshi; Shibata, Tatsuhiro; Yachida, Shinichi; Yamada, Takuji; Wirbel, Jakob; Schrotz-King, Petra; Ulrich, Cornelia M.; Brenner, Hermann; Arumugam, Manimozhiyan; Bork, Peer; Zeller, Georg; Cordero, Francesca; Dias-Neto, Emmanuel; Setubal, João Carlos; Tett, Adrian; Pardini, Barbara; Rescigno, Maria; Waldron, Levi; Naccarati, Alessio; Segata, Nicola

doi:10.1038/s41591-019-0405-7

Article
Published: 01 April 2019

Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation

Nature Medicine volume 25, pages 667–678 (2019)Cite this article

39k Accesses
481 Citations
362 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 29 October 2019

This article has been updated

Abstract

Several studies have investigated links between the gut microbiome and colorectal cancer (CRC), but questions remain about the replicability of biomarkers across cohorts and populations. We performed a meta-analysis of five publicly available datasets and two new cohorts and validated the findings on two additional cohorts, considering in total 969 fecal metagenomes. Unlike microbiome shifts associated with gastrointestinal syndromes, the gut microbiome in CRC showed reproducibly higher richness than controls (P < 0.01), partially due to expansions of species typically derived from the oral cavity. Meta-analysis of the microbiome functional potential identified gluconeogenesis and the putrefaction and fermentation pathways as being associated with CRC, whereas the stachyose and starch degradation pathways were associated with controls. Predictive microbiome signatures for CRC trained on multiple datasets showed consistently high accuracy in datasets not considered for model training and independent validation cohorts (average area under the curve, 0.84). Pooled analysis of raw metagenomes showed that the choline trimethylamine-lyase gene was overabundant in CRC (P = 0.001), identifying a relationship between microbiome choline metabolism and CRC. The combined analysis of heterogeneous CRC cohorts thus identified reproducible microbiome biomarkers and accurate disease-predictive models that can form the basis for clinical prognostic tests and hypothesis-driven mechanistic studies.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Reproducible taxonomic and functional microbial biomarkers across datasets when comparing carcinoma to healthy controls (no adenoma samples considered).**

**Fig. 2: Assessment of prediction performances of the gut microbiome for CRC detection within and across cohorts.**

**Fig. 3: Ranking relevance of each species in the predictive models for each dataset and identification of a minimal microbial signature for CRC detection.**

**Fig. 4: Choline TMA-lyase gene *cutC* and its genetic variants are strong biomarkers for CRC-associated stool samples.**

**Fig. 5: Clinical potential and validation of the predictive biomarkers.**

Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer

Article 06 June 2019

Multi-kingdom microbiota analyses identify bacterial–fungal interactions and biomarkers of colorectal cancer across cohorts

Article Open access 27 January 2022

Identification of microbial markers across populations in early detection of colorectal cancer

Article Open access 24 May 2021

Data availability

Nucleotide sequences for the two new Italian cohorts are available in the Sequence Read Archive under accession No. SRP136711. MetaPhlAn2 and HUMANn2 profiles for the new cohorts were also added to the curatedMetagenomicData R package²⁷ along with their corresponding metadata. Validation Cohort1 is available in the European Nucleotide Archive under the study identifier PRJEB27928; Validation Cohort2 is available in the DNA data bank of Japan databases under the accession No. DRA006684.

Change history

29 October 2019
An amendment to this paper has been published and can be accessed via a link at the top of the paper.

References

Ferlay, J. et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 136, E359–E386 (2015).
Article CAS PubMed Google Scholar
Siegel, R., Desantis, C. & Jemal, A. Colorectal cancer statistics, 2014. CA Cancer J. Clin. 64, 104–117 (2014).
Article PubMed Google Scholar
Frank, C., Sundquist, J., Yu, H., Hemminki, A. & Hemminki, K. Concordant and discordant familial cancer: familial risks, proportions and population impact. Int. J. Cancer 140, 1510–1516 (2017).
Article CAS PubMed Google Scholar
Foulkes, W. D. Inherited susceptibility to common cancers. N. Engl. J. Med. 359, 2143–2153 (2008).
Article CAS PubMed Google Scholar
Johnson, C. M. et al. Meta-analyses of colorectal cancer risk factors. Cancer Causes Control 24, 1207–1222 (2013).
Article PubMed PubMed Central Google Scholar
Huxley, R. R. et al. The impact of dietary and lifestyle risk factors on risk of colorectal cancer: a quantitative overview of the epidemiological evidence. Int. J. Cancer 125, 171–180 (2009).
Article CAS PubMed Google Scholar
Schmidt, T. S. B., Raes, J. & Bork, P. The human gut microbiome: from association to modulation. Cell 172, 1198–1215 (2018).
Article CAS PubMed Google Scholar
Thomas, R. M. & Jobin, C. The microbiome and cancer: is the ‘oncobiome’ mirage real? Trends Cancer Res. 1, 24–35 (2015).
Article Google Scholar
Jie, Z. et al. The gut microbiome in atherosclerotic cardiovascular disease. Nat. Commun. 8, 845 (2017).
Article PubMed PubMed Central CAS Google Scholar
Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).
Article PubMed PubMed Central CAS Google Scholar
Cougnoux, A. et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut 63, 1932–1942 (2014).
Article CAS PubMed Google Scholar
Wu, S. et al. A human colonic commensal promotes colon tumorigenesis via activation of T helper type 17 T cell responses. Nat. Med. 15, 1016–1022 (2009).
Article CAS PubMed PubMed Central Google Scholar
Chung, L. et al. Bacteroides fragilis toxin coordinates a pro-carcinogenic inflammatory cascade via targeting of colonic epithelial cells. Cell Host Microbe 23, 203–214.e5 (2018).
Article CAS PubMed PubMed Central Google Scholar
Kostic, A. D. et al. Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 22, 292–298 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kostic, A. D. et al. Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14, 207–215 (2013).
Article CAS PubMed PubMed Central Google Scholar
Rubinstein, M. R. et al. Fusobacterium nucleatum promotes colorectal carcinogenesis by modulating E-cadherin/β-catenin signaling via its FadA adhesin. Cell Host Microbe 14, 195–206 (2013).
Article CAS PubMed PubMed Central Google Scholar
Yu, J. et al. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 66, 70–78 (2017).
Article CAS PubMed Google Scholar
Feng, Q. et al. Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6, 6528 (2015).
Article CAS PubMed Google Scholar
Zeller, G. et al. Potential of fecal microbiota for early‐stage detection of colorectal cancer. Mol. Syst. Biol. 10, 766 (2014).
Article PubMed PubMed Central CAS Google Scholar
Vogtmann, E. et al. Colorectal cancer and the human gut microbiome: reproducibility with whole-genome shotgun sequencing. PLoS One 11, e0155362 (2016).
Article PubMed PubMed Central CAS Google Scholar
Baxter, N. T., Ruffin, M. T., Rogers, M. A. M. & Schloss, P. D. Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. Genome Med. 8, 37 (2016).
Article PubMed PubMed Central CAS Google Scholar
Zackular, J. P., Rogers, M. A. M., Ruffin, M. T. 4th & Schloss, P. D. The human gut microbiome as a screening tool for colorectal cancer. Cancer Prev. Res. 7, 1112–1121 (2014).
Article CAS Google Scholar
Drewes, J. L. et al. High-resolution bacterial 16S rRNA gene profile meta-analysis and biofilm status reveal common colorectal cancer consortia. NPJ Biofilms Microbiomes 3, 34 (2017).
Article PubMed PubMed Central CAS Google Scholar
Pollock, J., Glendinning, L., Wisedchanwet, T. & Watson, M. The madness of microbiome: attempting to find consensus ‘best practice’ for 16S microbiome studies. Appl. Environ. Microbiol. 84, e02627–17 (2018).
Article PubMed PubMed Central Google Scholar
Segata, N. On the road to strain-resolved comparative metagenomics. mSystems 3, e00190–17 (2018).
Article PubMed PubMed Central Google Scholar
Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638 (2017).
Article CAS PubMed PubMed Central Google Scholar
Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023 (2017).
Article CAS PubMed PubMed Central Google Scholar
Dai, Z. et al. Multi-cohort analysis of colorectal cancer metagenome identified altered bacteria across populations and universal bacterial markers. Microbiome 6, 70 (2018).
Article PubMed PubMed Central Google Scholar
Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J. & Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844 (2017).
Article CAS PubMed Google Scholar
Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. https://doi.org/10.1038/s41591-019-0406-6 (2019).
Hannigan, G. D., Duhaime, M. B., Ruffin, M. T. 4th, Koumpouras, C. C. & Schloss, P. D. Diagnostic potential and interactive dynamics of the colorectal cancer virome. MBio 9, e02248–18 (2018).
Article PubMed PubMed Central Google Scholar
Thomas, A. M. et al. Tissue-associated bacterial alterations in rectal carcinoma patients revealed by 16S rRNA community profiling. Front. Cell. Infect. Microbiol. 6, 179 (2016).
Article PubMed PubMed Central CAS Google Scholar
Gao, Z., Guo, B., Gao, R., Zhu, Q. & Qin, H. Microbiota disbiosis is associated with colorectal cancer. Front. Microbiol. 6, 20 (2015).
PubMed PubMed Central Google Scholar
Ahn, J. et al. Human gut microbiome and risk for colorectal cancer. J. Natl Cancer Inst. 105, 1907–1911 (2013).
Article CAS PubMed PubMed Central Google Scholar
Flemer, B. et al. The oral microbiota in colorectal cancer is distinctive and predictive. Gut 67, 1454–1463 (2017).
Article PubMed CAS Google Scholar
Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016).
Article CAS PubMed PubMed Central Google Scholar
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Article CAS Google Scholar
Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407 (2016).
Article CAS PubMed Google Scholar
Segata, N. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).
Article PubMed PubMed Central Google Scholar
Xie, Y.-H. et al. Fecal Clostridium symbiosum for noninvasive detection of early and advanced colorectal cancer: test and validation studies. EBioMedicine 25, 32–40 (2017).
Article PubMed PubMed Central Google Scholar
Boleij, A., van Gelder, M. M. H. J., Swinkels, D. W. & Tjalsma, H. Clinical Importance of Streptococcus gallolyticus infection among colorectal cancer patients: systematic review and meta-analysis. Clin. Infect. Dis. 53, 870–878 (2011).
Article CAS PubMed Google Scholar
Fijan, S. Microorganisms with claimed probiotic properties: an overview of recent literature. Int. J. Environ. Res. Public Health 11, 4745–4767 (2014).
Article PubMed PubMed Central Google Scholar
Apweiler, R. et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004).
Article CAS PubMed PubMed Central Google Scholar
Gerner, E. W. & Meyskens, F. L. Jr Polyamines and cancer: old molecules, new understanding. Nat. Rev. Cancer 4, 781–792 (2004).
Article CAS PubMed Google Scholar
Costea, P. I. et al. Towards standards for human fecal sample processing in metagenomic studies. Nat. Biotechnol. 35, 1069–1076 (2017).
Article CAS PubMed Google Scholar
Riester, M. et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J. Natl Cancer Inst. 106, dju048 (2014).
Article PubMed PubMed Central Google Scholar
Kummen, M. et al. Elevated trimethylamine-N-oxide (TMAO) is associated with poor prognosis in primary sclerosing cholangitis patients with normal liver function. United European Gastroenterol. J. 5, 532–541 (2017).
Article CAS PubMed Google Scholar
Oellgaard, J., Winther, S. A., Hansen, T. S., Rossing, P. & von Scholten, B. J. Trimethylamine N-oxide (TMAO) as a new potential therapeutic target for insulin resistance and cancer. Curr. Pharm. Des. 23, 3699–3712 (2017).
Article CAS PubMed Google Scholar
Kalnins, G. et al. Structure and function of CutC choline lyase from human microbiota bacterium Klebsiella pneumoniae. J. Biol. Chem. 290, 21732–21740 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rath, S., Heidrich, B., Pieper, D. H. & Vital, M. Uncovering the trimethylamine-producing bacteria of the human gut microbiota. Microbiome 5, 54 (2017).
Article PubMed PubMed Central Google Scholar
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 1–14 (2019).
Article CAS Google Scholar
Hothorn, T., Hornik, K., van de Wiel, M. A. & Zeileis, A. A lego system for conditional inference. Am. Stat. 60, 257–263 (2006).
Article Google Scholar
Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
Article CAS PubMed Google Scholar
Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
Article CAS PubMed Google Scholar
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
Article CAS PubMed Google Scholar
Integrative HMP (iHMP) Research Network Consortium. The integrative human microbiome project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe 16, 276–289 (2014).
Article CAS Google Scholar
Dejea, C. M. et al. Patients with familial adenomatous polyposis harbor colonic biofilms containing tumorigenic bacteria. Science 359, 592–597 (2018).
Article CAS PubMed PubMed Central Google Scholar
Manichanh, C. et al. Reduced diversity of faecal microbiota in Crohn’s disease revealed by a metagenomic approach. Gut 55, 205–211 (2006).
Article CAS PubMed PubMed Central Google Scholar
Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).
Article PubMed CAS Google Scholar
Bae, S. et al. Plasma choline metabolites and colorectal cancer risk in the Women’s Health Initiative Observational Study. Cancer Res. 74, 7442–7452 (2014).
Article CAS PubMed PubMed Central Google Scholar
Xu, R., Wang, Q. & Li, L. A genome-wide systems analysis reveals strong link between colorectal cancer and trimethylamine N-oxide (TMAO), a gut microbial metabolite of dietary meat and fat. BMC Genomics 16(Suppl 7), S4 (2015).
Article PubMed PubMed Central CAS Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012).
Article CAS PubMed PubMed Central Google Scholar
Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).
Article CAS PubMed Google Scholar
Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).
Article CAS PubMed PubMed Central Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Google Scholar
Hastie, T, Tibshirani, R. & Friedman, J. The Elements of Statistical Learning Vol. 1 (Springer, 2009).
Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 26, 27663 (2015).
PubMed Google Scholar
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Article PubMed PubMed Central CAS Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Article CAS PubMed Google Scholar
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Article CAS PubMed Google Scholar
Segata, N., Börnigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).
Article PubMed CAS Google Scholar
Kaminski, J. et al. High-specificity targeted functional profiling in microbial communities with ShortBRED. PLoS Comput. Biol. 11, e1004557 (2015).
Article PubMed PubMed Central CAS Google Scholar
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
Article CAS PubMed Google Scholar
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central CAS Google Scholar
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Article CAS PubMed PubMed Central Google Scholar
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Article CAS PubMed PubMed Central Google Scholar
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Article PubMed PubMed Central CAS Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Asnicar, F., Weingart, G., Tickle, T. L., Huttenhower, C. & Segata, N. Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3, e1029 (2015).
Article PubMed PubMed Central CAS Google Scholar
Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^–ΔΔC(T) Method. Methods 25, 402–408 (2001).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank the members of the Segata, Naccarati and Waldron groups for insightful discussions, all the volunteers enrolled in the study, the NGS facility at the University of Trento for performing the metagenomic sequencing, and the HPC facility at the University of Trento for supporting the computational experiments. This work was primarily supported by Lega Italiana per La Lotta contro i Tumori to N.S., F.C. and A.N., and by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant No. 16/23527-2) to A.M.T. This work was also partially supported by the Conselho Nacional de Pesquisa e Desenvolvimento (CNPq, Brazil) to J.C.S. and E.D.-N.; by FAPESP (grant No. 14/26897-0); by Associação Beneficente Alzira Denise Hertzog Silva (ABADHS, Brazil) and PRONON/SIPAR to E.D.-N.; by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brazil (CAPES) (Finance Code No. 001 to J.C.S.; by the Italian Institute for Genomic Medicine (IIGM) and Compagnia di San Paolo Torino to A.N, A.F., B.P. and S.T.; by Fondazione Umberto Veronesi ‘Post-doctoral Fellowship Years 2014, 2015, 2016, 2017 and 2018’ to B.P. and S.T.; by the Grant Agency of the Czech Republic (grant No. 17-16857S) to A.N.; by Fondazione Umberto Veronesi (grant No. FUV-14-SG-GANDINI) to S.G.; by the European Union H2020 Marie Curie grant (No. 707345) to E.P.; by the European Research Council (ERC-STG project MetaPG) to N.S.; by MIUR ‘Futuro in Ricerca’ (grant No. RBFR13EWWI_001) to N.S.; by the People Programme (Marie Curie Actions) of the European Union FP7 and H2020 to N.S.; and by the National Cancer Institute (grant No. U24CA180996) and National Institute of Allergy and Infectious Diseases (grant No. 1R21AI121784-01) of the National Institutes of Health to L.W. B.P. is the recipient of a Fulbright Research Scholarship (year 2018). We acknowledge funding from EMBL, DKFZ, the Huntsman Cancer Foundation, the Intramural Research Program of the National Cancer Institute, ETH Zürich and the following external sources: the European Research Council (CancerBiome, grant No. ERC-2010-AdG_20100317) to P.B.; Microbios (No. ERC-AdG-669830) to P.B.; the Novo Nordisk Foundation (grant No. NNF10CC1016515) to M.A.; the Danish Diabetes Academy supported by the Novo Nordisk Foundation and TARGET research initiative (Danish Strategic Research Council, grant No. 0603-00484B) to M.A.; the Matthias-Lackas Foundation (to C.M.U.); the National Cancer Institute (grant Nos. R01 CA189184, R01 CA207371, U01 CA206110 and P30 CA042014 ll to C.M.U.); the BMBF (the de.NBI network, grant No. 031A537B) to P.B.; the ERA-NET TRANSCAN project (No. 01KT1503) to C.M.U.; and the Helmut Horten Foundation (to S.S.). For Validation Cohort2, funding was provided by grants from the National Cancer Center Research and Development Fund (grant Nos. 25-A-4, 28-A-4 and 29-A-6); Practical Research Project for Rare/Intractable Diseases from the Japan Agency for Medical Research and Development (AMED, grant No. JP18ek0109187); JST (Japan Science and Technology Agency)-PRESTO (grant No. JPMJPR1507); Japan Society for the Promotion of Science KAKENHI (grant Nos. 16J10135, 142558 and 221S0002); Joint Research Project of the Institute of Medical Science; the University of Tokyo; the Takeda Science Foundation; and the Suzuken Memorial Foundation.

Author information

These authors contributed equally: Andrew Maltez Thomas, Paolo Manghi.
These authors jointly supervised this work: Levi Waldron, Alessio Naccarati, Nicola Segata.

Authors and Affiliations

Department CIBIO, University of Trento, Trento, Italy
Andrew Maltez Thomas, Paolo Manghi, Francesco Asnicar, Edoardo Pasolli, Federica Armanini, Moreno Zolfo, Francesco Beghini, Serena Manara, Nicolai Karcher, Adrian Tett & Nicola Segata
Biochemistry Department, Chemistry Institute, University of São Paulo, São Paulo, Brazil
Andrew Maltez Thomas & João Carlos Setubal
Medical Genomics Laboratory, CIPE/A.C. Camargo Cancer Center, São Paulo, Brazil
Andrew Maltez Thomas & Emmanuel Dias-Neto
IEO, European Institute of Oncology IRCCS, Milan, Italy
Chiara Pozzi, Sara Gandini & Davide Serrano
Italian Institute for Genomic Medicine, Turin, Italy
Sonia Tarallo, Antonio Francavilla, Barbara Pardini & Alessio Naccarati
Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
Gaetano Gallo
Department of Colorectal Surgery, Clinica S. Rita, Vercelli, Italy
Gaetano Gallo & Mario Trompetto
Department of Computer Science, University of Turin, Turin, Italy
Giulio Ferrero & Francesca Cordero
School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
Sayaka Mizutani, Hirotsugu Shiroma & Takuji Yamada
Research Fellow of Japan Society for the Promotion of Science, Tokyo, Japan
Sayaka Mizutani
Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
Satoshi Shiba, Tatsuhiro Shibata & Shinichi Yachida
Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Tatsuhiro Shibata
Department of Cancer Genome Informatics, Osaka University, Osaka, Japan
Shinichi Yachida
PRESTO, Japan Science and Technology Agency, Saitama, Japan
Takuji Yamada
Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Jakob Wirbel, Peer Bork & Georg Zeller
Division of Preventive Oncology, National Center for Tumor Diseases and German Cancer Research Center, Heidelberg, Germany
Petra Schrotz-King & Hermann Brenner
Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
Cornelia M. Ulrich
Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany
Hermann Brenner
German Cancer Consortium, German Cancer Research Center, Heidelberg, Germany
Hermann Brenner
Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Manimozhiyan Arumugam
Faculty of Healthy Sciences, University of Southern Denmark, Odense, Denmark
Manimozhiyan Arumugam
Molecular Medicine Partnership Unit, Heidelberg, Germany
Peer Bork
Max Delbrück Centre for Molecular Medicine, Berlin, Germany
Peer Bork
Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
Peer Bork
Laboratory of Neurosciences, Institute of Psychiatry, University of São Paulo, São Paulo, Brazil
Emmanuel Dias-Neto
Biocomplexity Institute of Virginia Tech, Blacksburg, VA, USA
João Carlos Setubal
Department of Medical Sciences, University of Turin, Turin, Italy
Barbara Pardini
Mucosal Immunology and Microbiota Unit, Humanitas Research Hospital, Milan, Italy
Maria Rescigno
Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
Levi Waldron
Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
Levi Waldron
Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Prague, Czech Republic
Alessio Naccarati

Authors

Andrew Maltez Thomas
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Manghi
View author publications
You can also search for this author in PubMed Google Scholar
Francesco Asnicar
View author publications
You can also search for this author in PubMed Google Scholar
Edoardo Pasolli
View author publications
You can also search for this author in PubMed Google Scholar
Federica Armanini
View author publications
You can also search for this author in PubMed Google Scholar
Moreno Zolfo
View author publications
You can also search for this author in PubMed Google Scholar
Francesco Beghini
View author publications
You can also search for this author in PubMed Google Scholar
Serena Manara
View author publications
You can also search for this author in PubMed Google Scholar
Nicolai Karcher
View author publications
You can also search for this author in PubMed Google Scholar
Chiara Pozzi
View author publications
You can also search for this author in PubMed Google Scholar
Sara Gandini
View author publications
You can also search for this author in PubMed Google Scholar
Davide Serrano
View author publications
You can also search for this author in PubMed Google Scholar
Sonia Tarallo
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Francavilla
View author publications
You can also search for this author in PubMed Google Scholar
Gaetano Gallo
View author publications
You can also search for this author in PubMed Google Scholar
Mario Trompetto
View author publications
You can also search for this author in PubMed Google Scholar
Giulio Ferrero
View author publications
You can also search for this author in PubMed Google Scholar
Sayaka Mizutani
View author publications
You can also search for this author in PubMed Google Scholar
Hirotsugu Shiroma
View author publications
You can also search for this author in PubMed Google Scholar
Satoshi Shiba
View author publications
You can also search for this author in PubMed Google Scholar
Tatsuhiro Shibata
View author publications
You can also search for this author in PubMed Google Scholar
Shinichi Yachida
View author publications
You can also search for this author in PubMed Google Scholar
Takuji Yamada
View author publications
You can also search for this author in PubMed Google Scholar
Jakob Wirbel
View author publications
You can also search for this author in PubMed Google Scholar
Petra Schrotz-King
View author publications
You can also search for this author in PubMed Google Scholar
Cornelia M. Ulrich
View author publications
You can also search for this author in PubMed Google Scholar
Hermann Brenner
View author publications
You can also search for this author in PubMed Google Scholar
Manimozhiyan Arumugam
View author publications
You can also search for this author in PubMed Google Scholar
Peer Bork
View author publications
You can also search for this author in PubMed Google Scholar
Georg Zeller
View author publications
You can also search for this author in PubMed Google Scholar
Francesca Cordero
View author publications
You can also search for this author in PubMed Google Scholar
Emmanuel Dias-Neto
View author publications
You can also search for this author in PubMed Google Scholar
João Carlos Setubal
View author publications
You can also search for this author in PubMed Google Scholar
Adrian Tett
View author publications
You can also search for this author in PubMed Google Scholar
Barbara Pardini
View author publications
You can also search for this author in PubMed Google Scholar
Maria Rescigno
View author publications
You can also search for this author in PubMed Google Scholar
Levi Waldron
View author publications
You can also search for this author in PubMed Google Scholar
Alessio Naccarati
View author publications
You can also search for this author in PubMed Google Scholar
Nicola Segata
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

N.S., A.M.T., L.W. and A.N conceived the study. N.S. supervised the study. C.P., S.G., D.S., S.T., A.F., G.G., M.T., B.P, M.R. and A.N. organized the clinical study, recruited patients and collected samples. F. Armanini generated metagenomic data. A.M.T., P.M., F. Asnicar, E.P., M.Z., F.B., N.K. and G.F. collected and analyzed the metagenomic data. A.M.T., P.M., F. Asnicar, E.P., M.Z., G.F., J.W., G.Z. and L.W. performed machine learning and statistical analyses. F. Armanini, S.T., S. Manara, A.T., B.P. and A.N. performed validation experiments. S. Mizutani., H.S., S. Shiba, T.S., S.Y., T.Y., J.W., P.S.-K, C.M.U., H.B., M.A., P.B. and G.Z. provided additional validation data. A.M.T., P.M., L.W. and N.S. designed and produced the figures. A.M.T., P.M. and N.S. wrote the manuscript with contributions from S. Manara, F.C., E.D.-N., J.C.S., M.R., L.W. and A.N. All authors discussed and approved the manuscript.

Corresponding author

Correspondence to Nicola Segata.

Ethics declarations

Competing interests

P.B., G.Z., A.Y.V. and S.S. are named inventors on a patent (EP2955232A1: Method for diagnosing colorectal cancer based on analyzing the gut microbiome). All other authors declare no competing interests as defined by Nature Research, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sequencing depths and species richness across CRC datasets

a, Boxplots reporting the total number of reads in each dataset. P values between the carcinoma and control groups were calculated by two-tailed Wilcoxon rank-sum tests. b, Boxplots showing the total number of microbial species per dataset. P values were calculated by two-tailed Wilcoxon rank-sum tests. c, Boxplots showing the total number of microbial species per dataset calculated on metagenomes subsampled in each dataset to the number of reads of the tenth percentile. P values were calculated by two-tailed Wilcoxon rank-sum tests. d, Multivariate analysis of species richness using crude and age-, sex- and BMI-adjusted coefficients obtained from linear models. e, Meta-analysis of crude and adjusted multivariate richness coefficients using a random effects model. Bold lines represent the 95% confidence interval for the random effects model estimate.

Extended Data Fig. 2 Meta-analysis of species diversity and oral species richness in CRC datasets.

a, Boxplots reporting the Shannon species diversity in each dataset. P values between the carcinoma and control groups were calculated by two-tailed Wilcoxon rank-sum tests. b, Boxplots reporting the Shannon species diversity calculated on metagenomes subsampled in each dataset to the number of reads of the tenth percentile. P values were calculated by two-tailed Wilcoxon rank-sum tests. c, Multivariate analysis of species diversity using crude and age-, sex- and BMI-adjusted coefficients obtained from linear models. d, Meta-analysis of crude and adjusted multivariate Shannon diversity coefficients using a random effects model. Bold lines represent the 95% confidence interval for the random effects model estimate. e, Boxplots reporting the total number of oral microbial species per dataset. P values were calculated by two-tailed Wilcoxon rank-sum tests comparing values between controls and carcinomas for each dataset. f, Multivariate analysis of putative oral species richness using crude and age-, sex- and BMI-adjusted coefficients obtained from linear models. g, Meta-analysis of crude and adjusted multivariate putative oral species richness coefficients using a random effects model. Bold lines represent the 95% confidence interval for the random effects model estimate.

Extended Data Fig. 3 Two metagenomic cohorts identify clear but only partially overlapping microbiome signatures associated with CRC.

a,b, Relative abundances (log scale) and effect sizes (estimated using the linear discriminant analysis score in LEfSe) for the significantly different microbial species in CRC samples compared to control samples for Cohort1 (significance assessed by the non-parametric test in LEfSe) (a) and Cohort2 (b). c, Alpha-diversities measured as the total number of species and total number of UniProt90 gene families in each sample for the two cohorts. d, Beta-diversities estimated with the Bray–Curtis dissimilarity metric for intra- and inter-condition comparisons in the two cohorts.

Extended Data Fig. 4 Analysis of F. nucleatum markers and taxonomic meta-analysis of CRC datasets.

a, Percentages of F. nucleatum clade-specific markers (200 in total) in each dataset. P values were obtained by two-tailed Wilcoxon rank-sum tests comparing values between controls and carcinomas for each dataset. b, Multivariate analysis of meta-analysis species-level abundance biomarkers. Crude and age-, sex- and BMI-adjusted coefficients for species associated with disease status in the meta-analysis of standardized mean differences. c, Meta-analysis of CRC datasets using species-level MetaPhlAn2 profiles. Bold lines represent the 95% confidence interval for the random effects model estimate.

Extended Data Fig. 5 Analysis of putative oral species abundances in CRC datasets and gene-family richness across CRC datasets.

a, Effect sizes of the abundances of significant putative oral species identified using a meta-analysis of standardized mean differences and a random effects model. Bold lines represent the 95% confidence interval for the random effects model estimate. b, Total abundance of putative oral species in each gut metagenomic dataset. P values were obtained by two-tailed Wilcoxon rank-sum tests comparing values between controls and carcinomas for each dataset. c, The total number of reads in each sample of each dataset correlates with the total number of gene families identified using HUMANn2. Ellipses represent the 95% confidence level assuming a multivariate t-distribution. d, Distribution of the total number of gene families identified in the samples of each dataset. P values were obtained by two-tailed Wilcoxon rank-sum tests comparing values between controls and carcinomas for each dataset. e, Distribution of the percentages of unmapped reads across datasets for UniProt90 gene families.

Extended Data Fig. 6 Cross-validation, cross-cohort and LODO predictions using pathway abundances, species abundances and species-specific markers.

a, Prediction matrix reporting prediction performances as AUC values obtained using a random forest model on pathway relative abundances. Values on the diagonal refer to 20 times repeated tenfold stratified cross-validations. Off-diagonal values refer to the AUC values obtained by training the classifier on the dataset of the corresponding row and applying it to the dataset of the corresponding column. The LODO row refers to the performances obtained by training the model on pathway abundances using all but the dataset of the corresponding column and applying it to the dataset of the corresponding column. b, Prediction matrix as in a but using MetaPhlAn2 marker presence and absence information. c, Prediction of samples-to-cohort assignments using species-level relative abundances. Only control samples from each dataset are considered. d, Principal coordinate analysis of Bray–Curtis distances computed on MetaPhlAn2 species-level abundances across datasets. Ellipses represent the 95% confidence level assuming a multivariate t-distribution. e, Cross-prediction matrix for the performances of random forest models in predicting adenomas versus CRC conditions. f, Cross-prediction matrix as described in e but on the distinction of adenomas versus controls.

Extended Data Fig. 7 Prediction performances with increasing numbers of external datasets considered in the training model.

a, Prediction performances computed based on MetaPhlAn2 species abundances. The dark yellow line interpolates the median AUC at each number of training datasets considered. b, Prediction performances computed based on HUMANn2 gene-family abundances.

Extended Data Fig. 8 Identification of a minimal number of microbial gene families for CRC detection.

Prediction performances in the LODO settings at increasing numbers of gene families. Each ranking is obtained excluding the testing dataset to avoid overfitting.

Extended Data Fig. 9 Metagenomic analysis of genes involved in the TMA synthesis pathway.

a, ShortBRED analysis of yeaW and caiT gene abundances. Points represent the log of RPKM for each sample and crosses represent average values per group/dataset. b, ShortBRED analysis of cutD gene abundances. Boxplots report the RKPM abundances obtained using ShortBRED for the gene of the activating TMA-lyase enzyme cutD. P values were calculated by two-tailed Wilcoxon rank-sum tests comparing values between controls and carcinomas for each dataset. c, Forest plot showing effect sizes calculated using a meta-analysis of standardized mean differences and a random effects model on cutD RPKM abundances between carcinomas and controls. d, Breadth of coverage of cutC gene sequence clusters across CRC datasets. e, Depth of coverage of cutC gene sequence clusters across CRC datasets.

Extended Data Fig. 10 Cluster analysis of representative cutC sequence variants of samples.

a, Prediction strengths at differing numbers of clusters showing optimum numbers at two and four clusters. b, Tables showing the number of samples for carcinomas, adenomas and controls with breadth of coverage >80% at two different cluster thresholds. P values were calculated using a Fisher t-test, and taxonomy was assigned by BLASTn and the cutC sequence database (criteria of 80% coverage, >97% identity and minimum 2,000 nt alignment length).

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5

Rights and permissions

Reprints and permissions

About this article

Cite this article

Thomas, A.M., Manghi, P., Asnicar, F. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat Med 25, 667–678 (2019). https://doi.org/10.1038/s41591-019-0405-7

Download citation

Received: 29 May 2018
Accepted: 20 February 2019
Published: 01 April 2019
Issue Date: April 2019
DOI: https://doi.org/10.1038/s41591-019-0405-7

This article is cited by

QNetDiff: a quantitative measurement of network rewiring
- Shota Nose
- Hirotsugu Shiroma
- Yushi Uno
BMC Bioinformatics (2024)
Utilization of the microbiome in personalized medicine
- Karina Ratiner
- Dragos Ciocan
- Eran Elinav
Nature Reviews Microbiology (2024)
Comparison of the effectiveness of different normalization methods for metagenomic cross-study phenotype prediction under heterogeneity
- Beibei Wang
- Fengzhu Sun
- Yihui Luan
Scientific Reports (2024)
Consistent signatures in the human gut microbiome of old- and young-onset colorectal cancer
- Youwen Qin
- Xin Tong
- Pei-Rong Ding
Nature Communications (2024)
Exploring the gut DNA virome in fecal immunochemical test stool samples reveals associations with lifestyle in a large population-based study
- Paula Istvan
- Einar Birkeland
- Trine B. Rounge
Nature Communications (2024)