Genome content predicts the carbon catabolic preferences of heterotrophic bacteria

Gralka, Matti; Pollak, Shaul; Cordero, Otto X.

doi:10.1038/s41564-023-01458-z

Article
Published: 31 August 2023

Nature Microbiology volume 8, pages 1799–1808 (2023)

Abstract

Heterotrophic bacteria—bacteria that utilize organic carbon sources—are taxonomically and functionally diverse across environments. It is challenging to map metabolic interactions and niches within microbial communities due to the large number of metabolites that could serve as potential carbon and energy sources for heterotrophs. Whether their metabolic niches can be understood using general principles, such as a small number of simplified metabolic categories, is unclear. Here we perform high-throughput metabolic profiling of 186 marine heterotrophic bacterial strains cultured in media containing one of 135 carbon substrates to determine growth rates, lag times and yields. We show that, despite high variability at all levels of taxonomy, the catabolic niches of heterotrophic bacteria can be understood in terms of their preference for either glycolytic (sugars) or gluconeogenic (amino and organic acids) carbon sources. This preference is encoded by the total number of genes found in pathways that feed into the two modes of carbon utilization and can be predicted using a simple linear model based on gene counts. This allows for coarse-grained descriptions of microbial communities in terms of prevalent modes of carbon catabolism. The sugar–acid preference is also associated with genomic GC content and thus with the carbon–nitrogen requirements of their encoded proteome. Our work reveals how the evolution of bacterial genomes is structured by fundamental constraints rooted in metabolism.
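
The "simple linear model based on gene counts" mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (total gene counts in sugar- and acid-associated pathways), the toy strains, the noise-free preference scores and the function names `fit_linear_model` and `predict` are hypothetical. The authors' actual features and fitting procedure are not described in this preview.

```python
# Illustrative sketch only: toy data and hypothetical names, not the authors' model.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_model(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    """Intercept plus weighted sum of gene counts."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Toy strains: (sugar-pathway gene count, acid-pathway gene count); made-up numbers.
gene_counts = [(40, 10), (35, 15), (20, 30), (10, 45), (25, 25), (30, 12)]
# Toy preference scores generated from 0.02 * sugar - 0.02 * acid, without noise.
preference = [0.02 * s - 0.02 * a for s, a in gene_counts]

coefs = fit_linear_model(gene_counts, preference)  # [intercept, sugar, acid]
```

With real data the fit would not be exact, and one would evaluate it on held-out genomes rather than on the strains used for fitting.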

**Fig. 1: Phenotyping experiment uncovers metabolic preferences across 186 diverse marine bacterial strains.**

**Fig. 2: Metabolic preferences can be predicted from genomes.**

**Fig. 3: Metabolic preference correlates with genomic GC content.**

**Fig. 4: Use of SAP to functionally annotate bacterial communities.**

**Fig. 5: Schematic overview of central metabolic fluxes when the primary substrate is either glycolytic or gluconeogenic.**

Data availability

All growth and genomic data are available at https://doi.org/10.17632/xfh8t8568g.1. All isolates are available from either M.G. (Europe) or O.X.C. (USA) on request. All genome assemblies are available under BioProjects PRJNA319196 and PRJNA478695, with the exception of strains 1A06 (PRJNA318805), 12B01 (PRJNA13568), 13B01 (PRJNA318805), DSS-3 (BioSample SAMN02604003) as well as AS40, AS56, AS88 and AS94 (PRJNA996876). Source data are provided with this paper.

Code availability

All code needed to reproduce the figures is available at https://doi.org/10.17632/xfh8t8568g.1.

References

Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).
Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
Pontrelli, S. et al. Metabolic cross-feeding structures the assembly of polysaccharide degrading communities. Sci. Adv. 8, eabk3076 (2022).
Gralka, M., Szabo, R., Stocker, R. & Cordero, O. X. Trophic interactions and the drivers of microbial community assembly. Curr. Biol. 30, R1176–R1188 (2020).
Pollak, S. et al. Public good exploitation in natural bacterioplankton communities. Sci. Adv. 7, eabi4717 (2021).
Moran, M. A. The global ocean microbiome. Science 350, aac8455 (2015).
Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. Nat. Commun. 7, 11965 (2016).
Enke, T. N. et al. Modular assembly of polysaccharide-degrading marine microbial communities. Curr. Biol. 29, 1528–1535 (2019).
Fahimipour, A. K. & Gross, T. Mapping the bacterial metabolic niche space. Nat. Commun. 11, 4887 (2020).
Kehe, J. et al. Positive interactions are common among culturable bacteria. Sci. Adv. 7, eabi7159 (2021).
Kirchman, D. L. The ecology of Cytophaga–Flavobacteria in aquatic environments. FEMS Microbiol. Ecol. 39, 91–100 (2002).
Buchan, A., LeCleir, G. R., Gulvik, C. A. & González, J. M. Master recyclers: features and functions of bacteria associated with phytoplankton blooms. Nat. Rev. Microbiol. 12, 686–698 (2014).
Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).
Barberán, A., Caceres Velazquez, H., Jones, S. & Fierer, N. Hiding in plain sight: mining bacterial species records for phenotypic trait information. mSphere 2, e00237-17 (2017).
Mende, D. R. et al. ProGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Res. 48, D621–D625 (2020).
Sueoka, N. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. Proc. Natl Acad. Sci. USA 47, 1141–1149 (1961).
Hellweger, F. L., Huang, Y. & Luo, H. Carbon limitation drives GC content evolution of a marine bacterium in an individual-based genome-scale model. ISME J. 12, 1180–1187 (2018).
Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).
Mende, D. R. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. Nat. Microbiol. 2, 1367–1373 (2017).
Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun. 347, 1–3 (2006).
Estrela, S. et al. Functional attractors in microbial community assembly. Cell Syst. 13, 29–42 (2022).
Amarnath, K. et al. Stress-induced metabolic exchanges between complementary bacterial types underly a dynamic mechanism of inter-species stress resistance. Nat. Commun. 14, 3165 (2023).
Estrela, S., Diaz-Colunga, J., Vila, J. C., Sanchez-Gorostiaga, A., & Sanchez, A. Diversity begets diversity under microbial niche construction. Preprint at bioRxiv https://doi.org/10.1101/2022.02.13.480281 (2022).
Schink, S. J. et al. Glycolysis/gluconeogenesis specialization in microbes is driven by biochemical constraints of flux sensing. Mol. Syst. Biol. 18, e10704 (2022).
Basan, M. et al. A universal trade-off between growth and lag in fluctuating environments. Nature 584, 470–474 (2020).
Plucain, J. et al. Epistasis and allele specificity in the emergence of a stable polymorphism in Escherichia coli. Science 343, 160–164 (2014).
Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl Acad. Sci. USA 105, 7899–7906 (2008).
Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 109, 9487–9492 (2012).
Hershberg, R. & Petrov, D. A. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6, e1001115 (2010).
Ely, B. Genomic GC content drifts downward in most bacterial genomes. PLoS ONE 16, e0244163 (2021).
Maddamsetti, R. & Grant, N. A. Divergent evolution of mutation rates and biases in the long-term evolution experiment with Escherichia coli. Genome Biol. Evol. 12, 1591–1603 (2020).
Yakovchuk, P., Protozanova, E. & Frank-Kamenetskii, M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 34, 564–574 (2006).
Lassalle, F. et al. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11, e1004941 (2015).
Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).
Smriga, S., Ciccarese, D. & Babbin, A. R. Denitrifying bacteria respond to and shape microscale gradients within particulate matrices. Commun. Biol. 4, 570 (2021).
Gowda, K., Ping, D., Mani, M. & Kuehn, S. Genomic structure predicts metabolite dynamics in microbial communities. Cell 185, 530–546 (2022).
Moran, M. A. et al. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432, 910–913 (2004).
Ben-Haim, Y. et al. Vibrio coralliilyticus sp. nov., a temperature-dependent pathogen of the coral Pocillopora damicornis. Int. J. Syst. Evol. Microbiol. 53, 309–315 (2003).
Hehemann, J. H. et al. Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. Nat. Commun. 7, 12860 (2016).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).
Huerta-Cepas, J. et al. EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
Zhang, H. et al. DbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927 (2020).
Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).
Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-based reconstruction and analysis for Python. BMC Syst. Biol. 7, 74 (2013).
Wolfram Mathematica v. 13.2 (Wolfram, 2022).
R: A Language and Environment for Statistical Computing (R Core Team, 2022).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. Ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Paradis, E. & Schliep, K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Schliep, K. P. phangorn: Phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01628-0 (2023).
Heinken, A., Magnúsdóttir, S., Fleming, R. M. T. & Thiele, I. DEMETER: efficient simultaneous curation of genome-scale reconstructions guided by experimental data and refined gene annotations. Bioinformatics 37, 3974–3975 (2021).
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Hubert, B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. Sci. Data 9, 92 (2022).
Lagadec, E., Småge, S. B., Trösse, C. & Nylund, A. Phylogenetic analyses of Norwegian Tenacibaculum strains confirm high bacterial diversity and suggest circulation of ubiquitous virulent strains. PLoS One 16, e0259215 (2021).
Ekborg, N. A. et al. Saccharophagus degradans gen. nov., sp. nov., a versatile marine degrader of complex polysaccharides. Int. J. Syst. Evol. Microbiol. 55, 1545–1549 (2005).

Acknowledgements

We thank S. Estrela (Yale University and Stanford University) for providing community composition data from their enrichment experiments (Fig. 4d); A. Sichert for assembling genomes; and M. d. Bello, X. Shan, T. Hwa as well as all members of the Cordero laboratory and Simons PriME collaboration for their enriching discussions. We acknowledge funding from the Simons Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.) and Simons Foundation Postdoctoral Fellowship Award number 599207 (M.G.).

Author information

Matti Gralka
Present address: Systems Biology Group, Amsterdam Institute for Life and Environment (A-LIFE) and Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Shaul Pollak
Present address: Division of Microbial Ecology, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria

Authors and Affiliations

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Matti Gralka, Shaul Pollak & Otto X. Cordero

Authors

Matti Gralka
Shaul Pollak
Otto X. Cordero

Contributions

M.G. designed the study, performed all experiments, analysed all data and wrote the initial manuscript. S.P. analysed the genomic data from the proGenomes database. M.G., S.P. and O.X.C. discussed the results. O.X.C. directed the project and edited the manuscript.

Corresponding authors

Correspondence to Matti Gralka or Otto X. Cordero.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Sara Mitri, Seppe Kuehn and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Phylogenetic tree of all strains used in this study.

The tree and taxonomy were created using the GTDB-tk classify workflow using standard parameters from an alignment of 120 marker genes. The legend corresponds to expected substitutions per site. A full list of all strains is provided in Supplementary Table 1.

Extended Data Fig. 2 Overview of growth characterization results.

a, Number of carbon sources supporting growth per strain. b, Fraction of all strains that were able to use a given substrate as their sole carbon and energy source. c, There was a lack of strong correlation between the number of carbon sources that support growth, growth rate and yield. Average yield (blue dots) and rate (red squares) binned by the number of carbon sources that supported growth, shown as the mean ± s.d. (for a total of n = 182 strains showing growth on at least one substrate). Lines and P values are derived from linear regressions. More generalist species (more carbon sources consumed) achieve slightly higher average yield but the effect size is likely not practically relevant. d, For each condition (substrates × strain), we plotted the growth rate and yield, which are very slightly positively correlated (linear regression P = 2 × 10⁻⁶, R² = 0.005). Points on the far right correspond to the maximal detectable growth rate given our spacing of experimental time points. e, Linear slopes for the per strain regression of yield with growth rate; only 3/186 strains exhibited a statistically significant correlation (linear regression) between rate and yield. The vertical line corresponds to the slope of the regression over all conditions.

Extended Data Fig. 3 Correlation between phenotype distance and different genomic distances.

a–c, Phenotype distance, defined as the cosine distance between consumption vectors, as a function of genomic distance between pairs of strains, where the genomic distance is the GTDB-tk distance (a), the Bray–Curtis distance between gene content (b; based on copy numbers of KEGG KO) or module content (c; based on abundance of KEGG modules). Points are the mean ± s.d. of logarithmic bins; n = 16,471 total comparisons.
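
The two distance measures named in this caption can be written out as a short sketch. The formulas are the standard definitions (cosine distance as one minus cosine similarity, Bray–Curtis on non-negative counts); the function names and the toy vectors are hypothetical, and the exact form of the consumption vectors (binary growth calls or growth rates) is an assumption not settled by this excerpt.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two consumption vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        raise ValueError("Bray-Curtis is undefined when both vectors are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Toy example: growth (1) or no growth (0) on four substrates for two strains.
strain_a = [1, 1, 0, 0]
strain_b = [1, 0, 1, 0]
phenotype_distance = cosine_distance(strain_a, strain_b)

# Toy example: copy numbers of three KEGG orthologues in the same two strains.
ko_counts_a = [2, 0, 1]
ko_counts_b = [1, 1, 1]
gene_content_distance = bray_curtis(ko_counts_a, ko_counts_b)
```

Strains without growth on any substrate have an all-zero vector, for which the cosine distance is undefined; the sketch raises an error in that case rather than choosing a convention.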

Extended Data Fig. 4 Detailed principal component analysis of the growth characterization results.

a, Principal component analysis of the full growth rate matrix, reproduced from Fig. 1 in the main text. b, Averaged loadings of fine-grained categories of substrates normalized to unit length. Detailed loadings of all substrates in the principal component analysis in a. The full principal component analysis shows a clear separation of preferences for organic acids (including alcohols and aromatics) and amino acids. c,d, Individual loadings per substrate for each principal component (PC; left). Note that all acids have negative loadings on PC1 but all but two organic acids switch sign on PC2 relative to amino acids. Scatter plots of the first principal component (based on the full growth rate matrix) versus the SAP as defined in the main text, and the second principal component versus the amino acid–organic acid preference defined analogously (right). Each point is a different isolate, coloured by taxonomic order (as in Fig. 1). P values are derived from linear regressions.

Extended Data Fig. 5 Comparison with external datasets.

a, Re-analysis of data from Kehe and colleagues¹¹. The heat map corresponds to their extended data fig. 2 (final optical density in each condition) except with rows and columns sorted by cosine similarity. b, Principal component analysis of this matrix shows the clustering of the two taxonomic orders and their alignment with the average loadings of acids and sugars. c, Phylogenetic tree based on GTDB-tk of species contained in the IJSEM and DEMETER trait databases as well as proGenomes (by species name). Note that two large phyla, Actinobacteriota and Firmicutes, are not at all represented in our strain library.

Extended Data Fig. 6 Reproducibility between experiments.

a, Smooth histograms of the pairwise correlation coefficients between the growth vectors of strains across all three experiments (V1, V2, V3; V3 is the experiment primarily discussed in the main text). b, Scatter plots of the SAP measured for each strain between all three replicate experiments. P values are derived from linear regressions.

Extended Data Fig. 7 Three measures of pathway abundance and their interrelations.

Completeness, coverage, and duplication are defined in detail in Methods. a, Predicting coverage from completeness (linear model) generally yields higher quality fits than predicting coverage from duplication. b, After correcting for completeness, duplication tends to explain more of the residuals than completeness does after correcting for duplication. c, Neither duplication nor coverage of any individual pathway correlated very strongly with SAP, and whether duplication or coverage of a given pathway was more predictive of SAP depended on the pathway. d, Illustrating the concept of functional duplication on the example of the galactose degradation pathway (KEGG pathway ko00052). Shown is the central part of the pathway that converts lactose and other oligosaccharides first to β-d-galactose, which is transformed through multiple steps to α-d-glucose-6-phosphate, which then enters glycolysis. For some reactions, we found multiple orthologues in the same strain (for example, up to six orthologues of K01785 (galM, aldose 1-epimerase, EC:5.1.3.3)). These orthologues are not exact duplicates, as illustrated by the tree on the right. The tree is based on a multiple sequence alignment of all sequences annotated K01785 across all strains. We have highlighted the six copies found in the Zobellia strain A2M03, which are spread around the tree and often grouped with orthologues found in distantly related species. In fact, across all highly duplicated orthologues (maximum number of orthologues per strain of at least six), the pairwise distance (computed from the multiple sequence alignments for each KEGG orthologue using the dist.ml function of the phangorn package in R) was about equally likely to be greater between orthologues in the same strain relative to orthologues in different strains as it was to be smaller. Thus, ‘duplicated’ orthologues in a strain probably represent functional variants of different evolutionary origin.
e,f, Average distances between KEGG orthologues within and between strains for genes associated with sugar and acid catabolism. The KEGG orthologues in black have a more than 10% difference between the two distances. Points represent the mean ± s.e.m.; the number of comparisons differs for each gene, from n = 496 to n = 179,101. g, Comparison between measured and predicted growth on individual substrates. Predicted growth was derived from FBA simulations of genome-scale metabolic models created using CarveMe using standard parameters (no gapfilling). This procedure yielded 58% correct predictions (vertical line), which was within the range of correct predictions achieved when the comparison was performed with shuffled labels (distribution, obtained by shuffling labels 1,000 times, each time measuring the proportion of correct predictions).
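
The shuffled-label comparison described for panel g can be sketched as follows. This is a minimal illustration with made-up binary growth calls; the function names are hypothetical. How the authors shuffled labels (for example, across all strain and substrate pairs or within strains) is not stated in this excerpt, so the version below simply permutes the measured labels across all conditions.

```python
import random


def fraction_correct(predicted, measured):
    """Proportion of conditions where predicted and measured growth agree."""
    return sum(p == m for p, m in zip(predicted, measured)) / len(measured)


def shuffled_null(predicted, measured, n_shuffles=1000, seed=0):
    """Fractions of correct predictions after permuting the measured labels."""
    rng = random.Random(seed)
    labels = list(measured)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(fraction_correct(predicted, labels))
    return null


# Toy data: 1 = growth, 0 = no growth, one entry per strain/substrate condition.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
measured = [1, 1, 0, 1, 0, 1, 1, 0]

observed = fraction_correct(predicted, measured)
null = shuffled_null(predicted, measured)
# Share of shuffles that do at least as well as the real labelling.
share_at_least_observed = sum(x >= observed for x in null) / len(null)
```

If the observed fraction falls well inside the shuffled distribution, as the caption reports for the 58% figure, the predictions carry little information beyond the overall frequency of growth.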

Extended Data Fig. 8 The number of polysaccharide-degrading enzymes correlates with SAP.

a–d, Number of CAZymes (a,b, glycosyl hydrolases; and c,d, polysaccharide lyases) and their correlation with SAPs (b,d). b,d, The insets show −log₁₀P per order, the negative log₁₀ of the P value obtained from linear regressions of CAZyme number with SAP within each order; −log₁₀P > 2 (vertical line) corresponds to a significant correlation at the 5% level, Bonferroni corrected for multiple testing. b, The square symbols correspond to the squares in Fig. 1d. These are exceptions to the median metabolic preference per order, such as the acid-specialist Tenacibaculum genus in the Flavobacteriales, which includes fish pathogens⁶⁰. Conversely, the orders Pseudomonadales and Rhodobacterales (commonly thought to specialize in simple substrates¹³) tended to prefer acids (SAP < 0), but we also found the sugar-specialist Pseudomonadales genus Saccharophagus, which are known sugar degraders⁶¹. The Flavobacteriales and Pseudomonadales strains with atypical phenotypes for their taxonomy tended to have fewer/more CAZymes than their close relatives, respectively. Small points correspond to individual isolates, large points with error bars indicate the mean ± s.d. for each order (a,c, n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales), 32 (Flavobacteriales)) or SAP bin (b,d, total number of strains n = 182).
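
The significance threshold quoted in this caption can be checked with a line of arithmetic. The sketch assumes that the number of tests equals the five orders listed in the caption, which is an inference from the text rather than something the caption states explicitly.

```python
import math

alpha = 0.05   # family-wise significance level (5%)
n_tests = 5    # assumed: one regression per order, five orders listed

per_test_threshold = alpha / n_tests             # Bonferroni-corrected P threshold
neg_log10_threshold = -math.log10(per_test_threshold)
```

Under that assumption the corrected threshold is P < 0.01, which corresponds to the −log₁₀P > 2 line drawn in the insets.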

Extended Data Fig. 9 Genomic GC content and consequences for nutrient requirements.

a, The GC content (measured across all predicted coding regions) is relatively conserved at the order level across our strain library (n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales) and 32 (Flavobacteriales)). b, The GC content predicts the carbon and nitrogen requirements per coded amino acid. All protein sequences were manually scored according to the number of carbon and nitrogen atoms of each amino acid. c, Same data as Fig. 3b without binning: SAP is correlated with genomic GC content across the whole set of strains but not within orders, possibly because GC content evolves very slowly and is thus relatively conserved below the order level. Notably, this correlation was much stronger than the correlation between GC content and other basic characteristics of the genomes, such as the number of coding regions (linear model fit, P = 0.2), and there was no practically significant difference between the GC content of genes in sugar- and acid-catabolic pathways (e). d, Because of the correlation between GC content and both nutrient requirements and SAP, SAP is positively/negatively correlated with the number of carbon/nitrogen atoms per coded amino acid. Small points correspond to individual strains, large points with error bars indicate the mean ± s.d. for the five main orders. Lines and P values are derived from linear regressions. e, The average GC content of sugar- and acid-catabolic genes is very similar. Scatter plot of the GC content of all genes annotated as sugar/acid genes (Supplementary Table 5), extracted from the genomes and averaged per strain. The line corresponds to equal GC content in sugar/acid genes. f, Residuals of the linear fit in a, showing a weak but statistically significant (P = 6 × 10⁻¹⁶) trend for high GC genomes to have a slightly higher GC content in sugar genes than acid genes. 
g, Example for the correlation and linear regression of pathway abundance with GC content in more than n = 11,000 diverse reference genomes (proGenomes). h, Extracting the linear regression coefficients (slopes) for each pathway, all of which were highly significant, yields a picture similar to Fig. 2b, that is, sugar pathways tended to decrease and acid pathways tended to increase in abundance as a function of GC content. The slopes for sugar (n = 7) and acid (n = 26) pathways are significantly different from each other (t-test, dof = 31, T = −4.26, P = 0.00017).
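
The scoring of protein sequences by carbon and nitrogen atoms (panel b) can be sketched as below. The atom counts are those of the free amino acids' molecular formulas; whether the authors counted whole amino acids or side chains only is not stated in this excerpt, so that choice is an assumption, as are the function name and the toy sequences.

```python
# (carbon atoms, nitrogen atoms) per free amino acid, keyed by one-letter code.
ATOMS_PER_AMINO_ACID = {
    "A": (3, 1), "R": (6, 4), "N": (4, 2), "D": (4, 1), "C": (3, 1),
    "Q": (5, 2), "E": (5, 1), "G": (2, 1), "H": (6, 3), "I": (6, 1),
    "L": (6, 1), "K": (6, 2), "M": (5, 1), "F": (9, 1), "P": (5, 1),
    "S": (3, 1), "T": (4, 1), "W": (11, 2), "Y": (9, 1), "V": (5, 1),
}


def mean_atoms_per_residue(proteins):
    """Average C and N atoms per coded amino acid over a set of proteins.

    Characters outside the 20 standard amino acids (for example X or *)
    are skipped.
    """
    carbon = nitrogen = residues = 0
    for sequence in proteins:
        for aa in sequence.upper():
            if aa in ATOMS_PER_AMINO_ACID:
                c, n = ATOMS_PER_AMINO_ACID[aa]
                carbon += c
                nitrogen += n
                residues += 1
    if residues == 0:
        raise ValueError("no standard amino acids found")
    return carbon / residues, nitrogen / residues


# Toy proteome with two short, made-up sequences.
toy_proteome = ["MKRA", "GGAL*"]
mean_c, mean_n = mean_atoms_per_residue(toy_proteome)
```

Applied to every predicted protein of a genome, the two averages give the per-strain values that the caption relates to GC content and SAP.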

Extended Data Fig. 10 Details of enrichments and synthetic community experiments.

a, Taxonomic distribution and distribution of SAPs in the synthetic communities, coloured by order (Fla, Flavobacteriales; Vib, Vibrionales; Alt, Alteromonadales; Pse, Pseudomonadales; Rho, Rhodobacterales; Cyt, Cytophagales). b, Richness over time in synthetic communities growing on one of four carbon sources (Fig. 4a). Points with error bars indicate the mean ± s.d. across six replicates. c, Abundance-weighted average GC content of communities enriched on acids or sugars. Genome-average GC for individual OTUs was estimated using SkewDB (Methods). The distributions are statistically significantly different (two-sided Welch’s t-test, T = 6.95, dof = 13.8, P = 7.5 × 10⁻⁶). d, Final richness in synthetic communities growing on four different concentrations of GlcNAc. The communities consisted of a complex mixture of strains, of which only about half were capable of consuming GlcNAc in monoculture (consumers). The remaining species (crossfeeders) therefore must have been crossfeeding on metabolites excreted by the consumers. e,f, Average number of C or N atoms per coded amino acid in the communities, weighted by the abundance of each strain. Shown is the average over the last five time points. Asterisks indicate significant differences between conditions (P = {2, 0.2, 5.8, 6.2} × 10⁻⁶ from top to bottom in e and P = {0.01, 3.0, 3.8, 1.4} × 10⁻⁵ from top to bottom in f) in a two-tailed Mann–Whitney test (using Bonferroni correction for multiple testing). d–f,h, Small points correspond to replicates (including different dilution factors, n = 12 points per condition), large points with error bars indicate the mean ± s.d. g, Functional composition of synthetic communities growing on four different concentrations of GlcNAc as the sole carbon (but not nitrogen) source. Final species compositions are shown as bar charts, where each species is coloured according to its SAP. 
At low GlcNAc concentrations, more acid-specialist species (negative SAP, green tones) dominated. This trend was driven not by a change in the relative abundance of consumers (which was roughly constant across conditions) but by both consumers and crossfeeders with lower SAP dominating at lower carbon concentrations. h, This pattern remained when perturbing the communities. All four replicate communities at the intermediate dilution factor (grown for six cycles at the highest and lowest concentration (20 and 0.02 mM GlcNAc, respectively)) were transferred into all of the other concentrations, in parallel to the unperturbed communities. Consistent with the unperturbed observation, an increase/decrease in GlcNAc concentration led to an increase/decrease in cSAP, respectively. This effect was overall stronger for more severe perturbations; for example, compare the 20 mM to 2 mM switched communities (yellow) to the 20 mM to 0.02 mM switched communities (red).
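
The abundance-weighted community averages (panels c, e and f) and the Welch's t-test used to compare conditions can be sketched as below. The function names and all numbers are made up for illustration; the Welch statistic and the Welch–Satterthwaite degrees of freedom follow the standard textbook formulas, and converting the statistic to a P value (which needs the t distribution) is left out.

```python
import math
import statistics


def weighted_mean(values, weights):
    """Abundance-weighted average of a per-taxon trait (e.g. GC content)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance divided by sample size, for each group.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Toy community: two taxa with GC contents 0.40 and 0.60 at abundances 3:1.
community_gc = weighted_mean([0.40, 0.60], [3, 1])

# Toy comparison of community-level GC between two enrichment conditions.
t_stat, dof = welch_t([0.52, 0.55, 0.58], [0.44, 0.46, 0.45])
```

The same `weighted_mean` applies to cSAP-like quantities whenever a per-strain trait is averaged with relative abundances as weights.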

Supplementary information

Reporting Summary

Supplementary Tables 1–8

Supplementary Table 1. List of strains. Supplementary Table 2. List of substrates. Supplementary Table 3. Full dataset of growth rates. Supplementary Table 4. KEGG pathways used for SAP predictions. Supplementary Table 5. KOs used for SAP predictions. Supplementary Table 6. List of sugar/acid KOs in our strains. Supplementary Table 7. Predicted SAP for reference genomes. Supplementary Table 8. OTUs for synthetic communities on four carbon sources.

Source data

Source Data Figs. 1–4

Source data for Figs. 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gralka, M., Pollak, S. & Cordero, O.X. Genome content predicts the carbon catabolic preferences of heterotrophic bacteria. Nat Microbiol 8, 1799–1808 (2023). https://doi.org/10.1038/s41564-023-01458-z

Received: 08 February 2023
Accepted: 24 July 2023
Published: 31 August 2023
Issue Date: October 2023
DOI: https://doi.org/10.1038/s41564-023-01458-z
