Genome content predicts the carbon catabolic preferences of heterotrophic bacteria


Heterotrophic bacteria—bacteria that utilize organic carbon sources—are taxonomically and functionally diverse across environments. It is challenging to map metabolic interactions and niches within microbial communities due to the large number of metabolites that could serve as potential carbon and energy sources for heterotrophs. Whether their metabolic niches can be understood using general principles, such as a small number of simplified metabolic categories, is unclear. Here we perform high-throughput metabolic profiling of 186 marine heterotrophic bacterial strains cultured in media containing one of 135 carbon substrates to determine growth rates, lag times and yields. We show that, despite high variability at all levels of taxonomy, the catabolic niches of heterotrophic bacteria can be understood in terms of their preference for either glycolytic (sugars) or gluconeogenic (amino and organic acids) carbon sources. This preference is encoded by the total number of genes found in pathways that feed into the two modes of carbon utilization and can be predicted using a simple linear model based on gene counts. This allows for coarse-grained descriptions of microbial communities in terms of prevalent modes of carbon catabolism. The sugar–acid preference is also associated with genomic GC content and thus with the carbon–nitrogen requirements of their encoded proteome. Our work reveals how the evolution of bacterial genomes is structured by fundamental constraints rooted in metabolism.

Fig. 1: Phenotyping experiment uncovers metabolic preferences across 186 diverse marine bacterial strains.
Fig. 2: Metabolic preferences can be predicted from genomes.
Fig. 3: Metabolic preference correlates with genomic GC content.
Fig. 4: Use of SAP to functionally annotate bacterial communities.
Fig. 5: Schematic overview of central metabolic fluxes when the primary substrate is either glycolytic or gluconeogenic.

Data availability

All growth and genomic data are available at All isolates are available from either M.G. (Europe) or O.X.C. (USA) on request. All genome assemblies are available under BioProjects PRJNA319196 and PRJNA478695, with the exception of strains 1A06 (PRJNA318805), 12B01 (PRJNA13568), 13B01 (PRJNA318805), DSS-3 (BioSample SAMN02604003) as well as AS40, AS56, AS88 and AS94 (PRJNA996876). Source data are provided with this paper.

Code availability

All code needed to reproduce the figures is available at

References

  1. Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

  2. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).

  3. Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).

  4. Pontrelli, S. et al. Metabolic cross-feeding structures the assembly of polysaccharide degrading communities. Sci. Adv. 8, eabk3076 (2022).

  5. Gralka, M., Szabo, R., Stocker, R. & Cordero, O. X. Trophic interactions and the drivers of microbial community assembly. Curr. Biol. 30, R1176–R1188 (2020).

  6. Pollak, S. et al. Public good exploitation in natural bacterioplankton communities. Sci. Adv. 7, eabi4717 (2021).

  7. Moran, M. A. The global ocean microbiome. Science 350, aac8455 (2015).

  8. Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. Nat. Commun. 7, 11965 (2016).

  9. Enke, T. N. et al. Modular assembly of polysaccharide-degrading marine microbial communities. Curr. Biol. 29, 1528–1535 (2019).

  10. Fahimipour, A. K. & Gross, T. Mapping the bacterial metabolic niche space. Nat. Commun. 11, 4887 (2020).

  11. Kehe, J. et al. Positive interactions are common among culturable bacteria. Sci. Adv. 7, eabi7159 (2021).

  12. Kirchman, D. L. The ecology of Cytophaga–Flavobacteria in aquatic environments. FEMS Microbiol. Ecol. 39, 91–100 (2002).

  13. Buchan, A., LeCleir, G. R., Gulvik, C. A. & González, J. M. Master recyclers: features and functions of bacteria associated with phytoplankton blooms. Nat. Rev. Microbiol. 12, 686–698 (2014).

  14. Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).

  15. Barberán, A., Caceres Velazquez, H., Jones, S. & Fierer, N. Hiding in plain sight: mining bacterial species records for phenotypic trait information. mSphere 2, e00237-17 (2017).

  16. Mende, D. R. et al. ProGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Res. 48, D621–D625 (2020).

  17. Sueoka, N. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. Proc. Natl Acad. Sci. USA 47, 1141–1149 (1961).

  18. Hellweger, F. L., Huang, Y. & Luo, H. Carbon limitation drives GC content evolution of a marine bacterium in an individual-based genome-scale model. ISME J. 12, 1180–1187 (2018).

  19. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).

  20. Mende, D. R. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. Nat. Microbiol. 2, 1367–1373 (2017).

  21. Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun. 347, 1–3 (2006).

  22. Estrela, S. et al. Functional attractors in microbial community assembly. Cell Syst. 13, 29–42 (2022).

  23. Amarnath, K. et al. Stress-induced metabolic exchanges between complementary bacterial types underly a dynamic mechanism of inter-species stress resistance. Nat. Commun. 14, 3165 (2023).

  24. Estrela, S., Diaz-Colunga, J., Vila, J. C., Sanchez-Gorostiaga, A., & Sanchez, A. Diversity begets diversity under microbial niche construction. Preprint at bioRxiv (2022).

  25. Schink, S. J. et al. Glycolysis/gluconeogenesis specialization in microbes is driven by biochemical constraints of flux sensing. Mol. Syst. Biol. 18, e10704 (2022).

  26. Basan, M. et al. A universal trade-off between growth and lag in fluctuating environments. Nature 584, 470–474 (2020).

  27. Plucain, J. et al. Epistasis and allele specificity in the emergence of a stable polymorphism in Escherichia coli. Science 343, 160–164 (2014).

  28. Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl Acad. Sci. USA 105, 7899–7906 (2008).

  29. Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 109, 9487–9492 (2012).

  30. Hershberg, R. & Petrov, D. A. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6, e1001115 (2010).

  31. Ely, B. Genomic GC content drifts downward in most bacterial genomes. PLoS ONE 16, e0244163 (2021).

  32. Maddamsetti, R. & Grant, N. A. Divergent evolution of mutation rates and biases in the long-term evolution experiment with Escherichia coli. Genome Biol. Evol. 12, 1591–1603 (2020).

  33. Yakovchuk, P., Protozanova, E. & Frank-Kamenetskii, M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 34, 564–574 (2006).

  34. Lassalle, F. et al. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11, e1004941 (2015).

  35. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).

  36. Smriga, S., Ciccarese, D. & Babbin, A. R. Denitrifying bacteria respond to and shape microscale gradients within particulate matrices. Commun. Biol. 4, 570 (2021).

  37. Gowda, K., Ping, D., Mani, M. & Kuehn, S. Genomic structure predicts metabolite dynamics in microbial communities. Cell 185, 530–546 (2022).

  38. Moran, M. A. et al. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432, 910–913 (2004).

  39. Ben-Haim, Y. et al. Vibrio coralliilyticus sp. nov., a temperature-dependent pathogen of the coral Pocillopora damicornis. Int. J. Syst. Evol. Microbiol. 53, 309–315 (2003).

  40. Hehemann, J. H. et al. Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. Nat. Commun. 7, 12860 (2016).

  41. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

  42. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  43. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).

  44. Huerta-Cepas, J. et al. EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).

  45. Zhang, H. et al. DbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).

  46. Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927 (2020).

  47. Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).

  48. Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-based reconstruction and analysis for Python. BMC Syst. Biol. 7, 74 (2013).

  49. Wolfram Mathematica v. 13.2 (Wolfram, 2022).

  50. R: A Language and Environment for Statistical Computing (R Core Team, 2022).

  51. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. Ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).

  52. Paradis, E. & Schliep, K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).

  53. Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).

  54. Schliep, K. P. phangorn: Phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).

  55. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

  56. Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nat. Biotechnol. (2023).

  57. Heinken, A., Magnúsdóttir, S., Fleming, R. M. T. & Thiele, I. DEMETER: efficient simultaneous curation of genome-scale reconstructions guided by experimental data and refined gene annotations. Bioinformatics 37, 3974–3975 (2021).

  58. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

  59. Hubert, B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. Sci. Data 9, 92 (2022).

  60. Lagadec, E., Småge, S. B., Trösse, C. & Nylund, A. Phylogenetic analyses of Norwegian Tenacibaculum strains confirm high bacterial diversity and suggest circulation of ubiquitous virulent strains. PLoS One 16, e0259215 (2021).

  61. Ekborg, N. A. et al. Saccharophagus degradans gen. nov., sp. nov., a versatile marine degrader of complex polysaccharides. Int. J. Syst. Evol. Microbiol. 55, 1545–1549 (2005).


Acknowledgements

We thank S. Estrela (Yale University and Stanford University) for providing community composition data from their enrichment experiments (Fig. 4d); A. Sichert for assembling genomes; and M. d. Bello, X. Shan, T. Hwa as well as all members of the Cordero laboratory and Simons PriME collaboration for their enriching discussions. We acknowledge funding from the Simons Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.) and Simons Foundation Postdoctoral Fellowship Award number 599207 (M.G.).

Author information

Contributions

M.G. designed the study, performed all experiments, analysed all data and wrote the initial manuscript. S.P. analysed the genomic data from the proGenomes database. M.G., S.P. and O.X.C. discussed the results. O.X.C. directed the project and edited the manuscript.

Corresponding authors

Correspondence to Matti Gralka or Otto X. Cordero.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Sara Mitri, Seppe Kuehn and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Phylogenetic tree of all strains used in this study.

The tree and taxonomy were generated with the GTDB-Tk classify workflow, run with standard parameters on an alignment of 120 marker genes. The legend corresponds to expected substitutions per site. A full list of all strains is provided in Supplementary Table 1.

Extended Data Fig. 2 Overview of growth characterization results.

a, Number of carbon sources supporting growth per strain. b, Fraction of all strains that were able to use a given substrate as their sole carbon and energy source. c, The number of carbon sources that support growth correlated only weakly with growth rate and yield. Average yield (blue dots) and rate (red squares) binned by the number of carbon sources that supported growth, shown as the mean ± s.d. (for a total of n = 182 strains showing growth on at least one substrate). Lines and P values are derived from linear regressions. More generalist species (more carbon sources consumed) achieve slightly higher average yield, but the effect size is probably too small to be practically relevant. d, For each condition (substrate × strain), we plotted the growth rate and yield, which are very slightly positively correlated (linear regression P = 2 × 10⁻⁶, R² = 0.005). Points on the far right correspond to the maximal detectable growth rate given our spacing of experimental time points. e, Linear slopes for the per-strain regression of yield against growth rate; only 3 of 186 strains exhibited a statistically significant correlation (linear regression) between rate and yield. The vertical line corresponds to the slope of the regression over all conditions.

Extended Data Fig. 3 Correlation between phenotype distance and different genomic distances.

a–c, Phenotype distance, defined as the cosine distance between consumption vectors, as a function of genomic distance between pairs of strains, where the genomic distance is the GTDB-Tk distance (a), the Bray–Curtis distance between gene content (b; based on copy numbers of KEGG KOs) or module content (c; based on abundance of KEGG modules). Points are the mean ± s.d. of logarithmic bins; n = 16,471 total comparisons.
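
The two distance measures named in this caption can be sketched as follows. This is a minimal illustration using the standard textbook definitions of cosine distance and Bray–Curtis dissimilarity; the function names and the example vectors are hypothetical and are not taken from the authors' code or data.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def bray_curtis(u, v):
    """Return sum(|u_i - v_i|) / sum(u_i + v_i) for non-negative counts."""
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        raise ValueError("Bray-Curtis is undefined when both vectors are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical consumption vectors (one entry per substrate) for two strains.
growth_a = [0.5, 0.0, 0.2, 0.0]
growth_b = [0.4, 0.1, 0.0, 0.0]

# Hypothetical KO copy numbers (one entry per KEGG orthologue) for the same strains.
ko_counts_a = [2, 0, 1, 3]
ko_counts_b = [1, 1, 1, 0]

print("phenotype distance:", cosine_distance(growth_a, growth_b))
print("gene content distance:", bray_curtis(ko_counts_a, ko_counts_b))
```

Cosine distance ignores the overall magnitude of the vectors, so two strains that grow on the same substrates at proportionally different rates are treated as phenotypically identical; Bray–Curtis, by contrast, is sensitive to absolute copy numbers.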

Extended Data Fig. 4 Detailed principal component analysis of the growth characterization results.

a, Principal component analysis of the full growth rate matrix, reproduced from Fig. 1 in the main text. b, Averaged loadings of fine-grained categories of substrates normalized to unit length. Detailed loadings of all substrates in the principal component analysis in a. The full principal component analysis shows a clear separation of preferences for organic (including alcohols and aromatics) and amino acids. c,d, Individual loadings per substrate for each principal component (PC; left). Note that all acids have negative loadings on PC1, but all but two organic acids switch sign on PC2 relative to amino acids. Scatter plots of the first principal component (based on the full growth rate matrix) versus the SAP as defined in the main text, and the second principal component versus the amino acid–organic acid preference defined analogously (right). Each point is a different isolate, coloured by taxonomic order (as in Fig. 1). P values are derived from linear regressions.

Extended Data Fig. 5 Comparison with external datasets.

a, Re-analysis of data from Kehe and colleagues (ref. 11). The heat map corresponds to their Extended Data Fig. 2 (final optical density in each condition), except with rows and columns sorted by cosine similarity. b, Principal component analysis of this matrix shows the clustering of the two taxonomic orders and their alignment with the average loadings of acids and sugars. c, Phylogenetic tree based on GTDB-Tk of species contained in the IJSEM and DEMETER trait databases as well as proGenomes (by species name). Note that two large phyla, Actinobacteriota and Firmicutes, are not represented at all in our strain library.

Extended Data Fig. 6 Reproducibility between experiments.

a, Smooth histograms of the pairwise correlation coefficients between the growth vectors of strains across all three experiments (V1, V2, V3; V3 is the experiment primarily discussed in the main text). b, Scatter plots of the SAP measured for each strain between all three replicate experiments. P values are derived from linear regressions.

Extended Data Fig. 7 Three measures of pathway abundance and their interrelations.

Completeness, coverage and duplication are defined in detail in Methods. a, Predicting coverage from completeness (linear model) generally yields higher-quality fits than predicting coverage from duplication. b, After correcting for completeness, duplication tends to explain more of the residuals than completeness does after correcting for duplication. c, Neither duplication nor coverage of any individual pathway correlated very strongly with SAP, and whether duplication or coverage of a given pathway was more predictive of SAP depended on the pathway. d, Illustration of the concept of functional duplication, using the galactose degradation pathway (KEGG pathway ko00052) as an example. Shown is the central part of the pathway that converts lactose and other oligosaccharides first to β-D-galactose, which is transformed through multiple steps to α-D-glucose-6-phosphate, which then enters glycolysis. For some reactions, we found multiple orthologues in the same strain (for example, up to six orthologues of K01785 (galM, aldose 1-epimerase, EC: …)). These orthologues are not exact duplicates, as illustrated by the tree on the right. The tree is based on a multiple sequence alignment of all sequences annotated K01785 across all strains. We have highlighted the six copies found in the Zobellia strain A2M03, which are spread around the tree and often grouped with orthologues found in distantly related species. In fact, across all highly duplicated orthologues (maximum number of orthologues per strain of at least six), the pairwise distance (computed from the multiple sequence alignments for each KEGG orthologue using the phangorn package in R) was about equally likely to be greater between orthologues in the same strain relative to orthologues in different strains as it was to be smaller. Thus, ‘duplicated’ orthologues in a strain probably represent functional variants of different evolutionary origin. 
e,f, Average distances between KEGG orthologues within and between strains for genes associated with sugar and acid catabolism. The KEGG orthologues in black have a more than 10% difference between the two distances. Points represent the mean ± s.e.m.; the number of comparisons differs for each gene, from n = 496 to n = 179,101. g, Comparison between measured and predicted growth on individual substrates. Predicted growth was derived from flux balance analysis (FBA) simulations of genome-scale metabolic models built with CarveMe using standard parameters (no gap-filling). This procedure yielded 58% correct predictions (vertical line), which was within the range of correct predictions achieved when the comparison was performed with shuffled labels (distribution, obtained by shuffling labels 1,000 times, each time measuring the proportion of correct predictions).
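
The shuffled-label comparison described in panel g can be sketched as a simple permutation test. This is an illustrative sketch only: the caption does not say which labels were shuffled or how the comparison was summarized, so shuffling the predicted growth calls and reporting an empirical one-sided P value are assumptions, and all names and data below are hypothetical.

```python
import random


def fraction_correct(measured, predicted):
    """Proportion of conditions where predicted growth matches measured growth."""
    return sum(m == p for m, p in zip(measured, predicted)) / len(measured)


def shuffled_label_null(measured, predicted, n_shuffles=1000, seed=0):
    """Null distribution of accuracy obtained by permuting the predicted labels."""
    rng = random.Random(seed)
    shuffled = list(predicted)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # in-place permutation; label frequencies are preserved
        null.append(fraction_correct(measured, shuffled))
    return null


# Hypothetical growth calls (True = growth) for eight strain-substrate pairs.
measured = [True, True, False, False, True, False, True, False]
predicted = [True, False, False, True, True, False, False, False]

observed = fraction_correct(measured, predicted)
null = shuffled_label_null(measured, predicted)

# Empirical one-sided P value with the usual +1 correction (an assumption here).
p_value = (1 + sum(x >= observed for x in null)) / (1 + len(null))
print(f"observed accuracy = {observed:.3f}, empirical P = {p_value:.3f}")
```

An observed accuracy that falls inside the bulk of the null distribution, as reported in the caption, indicates that the model predictions carry little information beyond the overall frequency of growth.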

Extended Data Fig. 8 The number of polysaccharide-degrading enzymes correlates with SAP.

a–d, Number of CAZymes (a,b, glycosyl hydrolases; c,d, polysaccharide lyases) and their correlation with SAP (b,d). b,d, The insets show −log₁₀P per order, the negative log₁₀ of the P value obtained from linear regressions of CAZyme number with SAP within each order; −log₁₀P > 2 (vertical line) corresponds to a significant correlation at the 5% level, Bonferroni corrected for multiple testing. b, The square symbols correspond to the squares in Fig. 1d. These are exceptions to the median metabolic preference per order, such as the acid-specialist Tenacibaculum genus in the Flavobacteriales, which includes fish pathogens (ref. 60). Conversely, the orders Pseudomonadales and Rhodobacterales (commonly thought to specialize in simple substrates; ref. 13) tended to prefer acids (SAP < 0), but we also found the sugar-specialist Pseudomonadales genus Saccharophagus, whose members are known sugar degraders (ref. 61). The Flavobacteriales and Pseudomonadales strains with atypical phenotypes for their taxonomy tended to have fewer and more CAZymes than their close relatives, respectively. Small points correspond to individual isolates, large points with error bars indicate the mean ± s.d. for each order (a,c, n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales), 32 (Flavobacteriales)) or SAP bin (b,d, total number of strains n = 182).
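
The significance threshold quoted in this caption is consistent with a Bonferroni correction over five tests. The number of tests is not stated explicitly; taking m = 5, one regression per order listed in the caption, is an assumption made here for illustration.

```latex
\[
\alpha_{\mathrm{Bonferroni}} = \frac{\alpha}{m} = \frac{0.05}{5} = 0.01,
\qquad
P < 0.01 \;\Longleftrightarrow\; -\log_{10} P > -\log_{10}(0.01) = 2 .
\]
```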

Extended Data Fig. 9 Genomic GC content and consequences for nutrient requirements.

a, The GC content (measured across all predicted coding regions) is relatively conserved at the order level across our strain library (n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales) and 32 (Flavobacteriales)). b, The GC content predicts the carbon and nitrogen requirements per coded amino acid. All protein sequences were manually scored according to the number of carbon and nitrogen atoms of each amino acid. c, Same data as Fig. 3b without binning: SAP is correlated with genomic GC content across the whole set of strains but not within orders, possibly because GC content evolves very slowly and is thus relatively conserved below the order level. Notably, this correlation was much stronger than the correlation between GC content and other basic characteristics of the genomes, such as the number of coding regions (linear model fit, P = 0.2), and there was no practically significant difference between the GC content of genes in sugar- and acid-catabolic pathways (e). d, Because of the correlation between GC content and both nutrient requirements and SAP, SAP is positively correlated with the number of carbon atoms and negatively correlated with the number of nitrogen atoms per coded amino acid. Small points correspond to individual strains, large points with error bars indicate the mean ± s.d. for the five main orders. Lines and P values are derived from linear regressions. e, The average GC content of sugar- and acid-catabolic genes is very similar. Scatter plot of the GC content of all genes annotated as sugar/acid genes (Supplementary Table 5), extracted from the genomes and averaged per strain. The line corresponds to equal GC content in sugar/acid genes. f, Residuals of the linear fit in a, showing a weak but statistically significant (P = 6 × 10⁻¹⁶) trend for high-GC genomes to have a slightly higher GC content in sugar genes than acid genes. 
g, Example for the correlation and linear regression of pathway abundance with GC content in more than n = 11,000 diverse reference genomes (proGenomes). h, Extracting the linear regression coefficients (slopes) for each pathway, all of which were highly significant, yields a picture similar to Fig. 2b, that is, sugar pathways tended to decrease and acid pathways tended to increase in abundance as a function of GC content. The slopes for sugar (n = 7) and acid (n = 26) pathways are significantly different from each other (t-test, dof = 31, T = −4.26, P = 0.00017).
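
The scoring described in panel b, counting carbon and nitrogen atoms per coded amino acid, can be sketched as below together with a basic GC-content calculation. The atom counts are those of the free standard amino acids; whether the authors counted whole amino acids or side chains only is not stated, so that choice is an assumption, and the function names and example sequences are hypothetical.

```python
# (carbon, nitrogen) atoms in each free standard amino acid, by one-letter code.
ATOMS = {
    "A": (3, 1), "R": (6, 4), "N": (4, 2), "D": (4, 1), "C": (3, 1),
    "E": (5, 1), "Q": (5, 2), "G": (2, 1), "H": (6, 3), "I": (6, 1),
    "L": (6, 1), "K": (6, 2), "M": (5, 1), "F": (9, 1), "P": (5, 1),
    "S": (3, 1), "T": (4, 1), "W": (11, 2), "Y": (9, 1), "V": (5, 1),
}


def atoms_per_residue(proteins):
    """Mean (C, N) atoms per residue over all protein sequences of a genome."""
    c_total = n_total = residues = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in ATOMS:  # skip stop symbols and ambiguous residues such as X
                c, n = ATOMS[aa]
                c_total += c
                n_total += n
                residues += 1
    if residues == 0:
        raise ValueError("no standard amino acids found")
    return c_total / residues, n_total / residues


def gc_content(dna):
    """Fraction of G and C among unambiguous bases of a nucleotide sequence."""
    dna = dna.upper()
    acgt = sum(dna.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return (dna.count("G") + dna.count("C")) / acgt


# Hypothetical toy inputs: two short proteins and one short coding sequence.
carbon, nitrogen = atoms_per_residue(["MKRA", "GAVL"])
print(f"C per residue = {carbon:.2f}, N per residue = {nitrogen:.2f}")
print(f"GC content = {gc_content('ATGGCGCGTAAA'):.2f}")
```

Codons rich in G and C tend to encode amino acids such as arginine, alanine and glycine, which is one reason why a genome-wide GC content can shift the average carbon and nitrogen cost per residue.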

Extended Data Fig. 10 Details of enrichments and synthetic community experiments.

a, Taxonomic distribution and distribution of SAPs in the synthetic communities, coloured by order (Fla, Flavobacteriales; Vib, Vibrionales; Alt, Alteromonadales; Pse, Pseudomonadales; Rho, Rhodobacterales; Cyt, Cytophagales). b, Richness over time in synthetic communities growing on one of four carbon sources (Fig. 4a). Points with error bars indicate the mean ± s.d. across six replicates. c, Abundance-weighted average GC content of communities enriched on acids or sugars. Genome-average GC for individual OTUs was estimated using SkewDB (Methods). The distributions are statistically significantly different (two-sided Welch’s t-test, T = 6.95, dof = 13.8, P = 7.5 × 10⁻⁶). d, Final richness in synthetic communities growing on four different concentrations of GlcNAc. The communities consisted of a complex mixture of strains, of which only about half were capable of consuming GlcNAc in monoculture (consumers). The remaining species (crossfeeders) therefore must have been crossfeeding on metabolites excreted by the consumers. e,f, Average number of C or N atoms per coded amino acid in the communities, weighted by the abundance of each strain. Shown is the average over the last five time points. Asterisks indicate significant differences between conditions (P = {2, 0.2, 5.8, 6.2} × 10⁻⁶ from top to bottom in e and P = {0.01, 3.0, 3.8, 1.4} × 10⁻⁵ from top to bottom in f) in a two-tailed Mann–Whitney test (using Bonferroni correction for multiple testing). d–f,h, Small points correspond to replicates (including different dilution factors, n = 12 points per condition), large points with error bars indicate the mean ± s.d. g, Functional composition of synthetic communities growing on four different concentrations of GlcNAc as the sole carbon (but not nitrogen) source. Final species compositions are shown as bar charts, where each species is coloured according to its SAP. 
At low GlcNAc concentrations, more acid-specialist species (negative SAP, green tones) dominated. This trend was driven not by a change in the relative abundance of consumers (which was roughly constant across conditions) but by both consumers and crossfeeders with lower SAP dominating at lower carbon concentrations. h, This pattern remained when the communities were perturbed. All four replicate communities at the intermediate dilution factor (grown for six cycles at the highest and lowest concentrations (20 and 0.02 mM GlcNAc, respectively)) were transferred into all of the other concentrations, in parallel to the unperturbed communities. Consistent with the unperturbed observation, an increase or decrease in GlcNAc concentration led to an increase or decrease in cSAP, respectively. This effect was overall stronger for more severe perturbations; for example, compare the 20 mM to 2 mM switched communities (yellow) with the 20 mM to 0.02 mM switched communities (red).
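
The abundance-weighted community averages used in panels c, e and f can be sketched as a weighted mean of a per-strain trait. This is a minimal sketch under stated assumptions: the function name, the strain identifiers and the numbers are hypothetical, and restricting the average to strains with a known trait value is a design choice made here, not one taken from the paper.

```python
def abundance_weighted_mean(abundances, trait_values):
    """Weighted mean of a per-strain trait, weighted by relative abundance.

    Only strains present in both mappings contribute, and the weights are
    renormalized over those strains.
    """
    shared = [strain for strain in abundances if strain in trait_values]
    total = sum(abundances[strain] for strain in shared)
    if total <= 0:
        raise ValueError("no abundance mass on strains with a known trait value")
    weighted = sum(abundances[strain] * trait_values[strain] for strain in shared)
    return weighted / total


# Hypothetical community composition and per-strain genomic GC content.
abundances = {"strain_1": 0.5, "strain_2": 0.3, "strain_3": 0.2}
gc_by_strain = {"strain_1": 0.60, "strain_2": 0.40, "strain_3": 0.35}

community_gc = abundance_weighted_mean(abundances, gc_by_strain)
print(f"abundance-weighted GC content = {community_gc:.3f}")
```

The same function applies unchanged to any per-strain quantity, for example carbon or nitrogen atoms per coded amino acid, or a per-strain SAP if a community-level preference is defined as an abundance-weighted mean.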

Supplementary information

Reporting Summary

Supplementary Tables 1–8

Supplementary Table 1. List of strains. Supplementary Table 2. List of substrates. Supplementary Table 3. Full dataset of growth rates. Supplementary Table 4. KEGG pathways used for SAP predictions. Supplementary Table 5. KOs used for SAP predictions. Supplementary Table 6. List of sugar/acid KOs in our strains. Supplementary Table 7. Predicted SAP for reference genomes. Supplementary Table 8. OTUs for synthetic communities on four carbon sources.

Source data

Source Data Figs. 1–4

Source data for Figs. 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gralka, M., Pollak, S. & Cordero, O.X. Genome content predicts the carbon catabolic preferences of heterotrophic bacteria. Nat Microbiol 8, 1799–1808 (2023).
