Long-term phenotypic evolution of bacteria


For many decades, comparative analyses of protein sequences and structures have been used to investigate fundamental principles of molecular evolution (refs 1, 2). In contrast, relatively little is known about the long-term evolution of species' phenotypic and genetic properties. This represents an important gap in our understanding of evolution, as exactly these properties play key roles in natural selection and adaptation to diverse environments. Here we perform a comparative analysis of bacterial growth and gene deletion phenotypes using hundreds of genome-scale metabolic models. Overall, bacterial phenotypic evolution can be described by a two-stage process with a rapid initial phenotypic diversification followed by a slow long-term exponential divergence. The observed average divergence trend, with approximately similar fractions of phenotypic properties changing per unit time, continues for billions of years. We experimentally confirm the predicted divergence trend using the phenotypic profiles of 40 diverse bacterial species across more than 60 growth conditions. Our analysis suggests that, at long evolutionary distances, gene essentiality is significantly more conserved than the ability to utilize different nutrients, while synthetic lethality is significantly less conserved. We also find that although rapid phenotypic evolution is sometimes observed within the same species, a transition from high to low phenotypic similarity occurs primarily at the genus level.

Figure 1: Diversity of considered bacterial families.
Figure 2: Evolution of bacterial metabolic growth phenotypes.
Figure 3: Distribution of phenotypic similarity at different levels of bacterial taxonomic classification.
Figure 4: Evolution of bacterial genetic phenotypes.


References

1. Zuckerkandl, E. & Pauling, L. in Evolving Genes and Proteins (eds Bryson, V. & Vogel, H.) 97–166 (Academic, 1965)
2. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968)
3. Darwin, C. The Origin of Species (Barnes & Noble Classics, 2008)
4. Orth, J. D., Thiele, I. & Palsson, B. O. What is flux balance analysis? Nature Biotechnol. 28, 245–248 (2010)
5. Oberhardt, M. A., Palsson, B. O. & Papin, J. A. Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 5, 320 (2009)
6. Kim, T. Y., Sohn, S. B., Kim, Y. B., Kim, W. J. & Lee, S. Y. Recent advances in reconstruction and applications of genome-scale metabolic models. Curr. Opin. Biotechnol. 24, 617–623 (2011)
7. Henry, C. S. et al. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nature Biotechnol. 28, 977–982 (2010)
8. Bochner, B. R. Global phenotypic characterization of bacteria. FEMS Microbiol. Rev. 33, 191–205 (2009)
9. Schuetz, R., Zamboni, N., Zampieri, M., Heinemann, M. & Sauer, U. Multidimensional optimality of microbial metabolism. Science 336, 601–604 (2012)
10. Peregrin-Alvarez, J. M., Sanford, C. & Parkinson, J. The conservation and evolutionary modularity of metabolism. Genome Biol. 10, R63 (2009)
11. Moran, N. A., Munson, M. A., Baumann, P. & Ishikawa, H. A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proc. R. Soc. Lond. B 253, 167–171 (1993)
12. Sabarly, V. et al. The decoupling between genetic structure and metabolic phenotypes in Escherichia coli leads to continuous phenotypic diversity. J. Evol. Biol. 24, 1559–1571 (2011)
13. Gripp, E. et al. Closely related Campylobacter jejuni strains from different sources reveal a generalist rather than a specialist lifestyle. BMC Genomics 12, 584 (2011)
14. Monk, J. M. et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl Acad. Sci. USA 110, 20338–20343 (2013)
15. de Hoon, M. J., Eichenberger, P. & Vitkup, D. Hierarchical evolution of the bacterial sporulation network. Curr. Biol. 20, R735–R745 (2010)
16. Kirschner, M. W. & Gerhart, J. C. The Plausibility of Life: Resolving Darwin’s Dilemma (Yale Univ. Press, 2005)
17. Dixon, S. J. et al. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc. Natl Acad. Sci. USA 105, 16653–16658 (2008)
18. Tischler, J., Lehner, B. & Fraser, A. G. Evolutionary plasticity of genetic interaction networks. Nature Genet. 40, 390–391 (2008)
19. Szappanos, B. et al. An integrated approach to characterize genetic interaction networks in yeast metabolism. Nature Genet. 43, 656–662 (2011)
20. Barve, A. & Wagner, A. A latent capacity for evolutionary innovation through exaptation in metabolic systems. Nature 500, 203–206 (2013)
21. Plata, G., Fuhrer, T., Hsiao, T. L., Sauer, U. & Vitkup, D. Global probabilistic annotation of metabolic networks enables enzyme discovery. Nature Chem. Biol. 8, 848–854 (2012)
22. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 37, D26–D31 (2009)
23. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011)
24. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013)
25. Felsenstein, J. PHYLIP (Phylogeny Inference Package) version 3.6. Cladistics 5, 164–166 (1989)
26. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 37, D5–D15 (2009)
27. Becker, S. A. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nature Protocols 2, 727–738 (2007)
28. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
29. Glass, J. I. et al. Essential genes of a minimal bacterium. Proc. Natl Acad. Sci. USA 103, 425–430 (2006)
30. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000)


We thank B. Bochner and Biolog for providing the experimental phenotypic growth data. We also thank members of the Vitkup laboratory for discussions. This work was supported in part by the National Institute of General Medical Sciences GM079759 grant to D.V. and the U54CA121852 grant to Columbia University. The work by C.S.H. was supported by the Department of Energy contract DE-AC02-06CH11357, as part of the SB Knowledgebase.

Author information




G.P. and D.V. conceived the study and performed the research and data analysis. C.S.H. built the metabolic models. D.V. directed the research. G.P. and D.V. wrote the manuscript. All authors read and edited the manuscript.

Corresponding author

Correspondence to Dennis Vitkup.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The 322 models used in this study are available at

Extended data figures and tables

Extended Data Figure 1 The evolution of phenotypic similarity in the usage of carbon sources for ATP production.

Genetic distances are based on 16S bacterial rRNA sequences. The colours represent the point density at a given genetic distance for all pairwise comparisons between metabolic models (n = 20,910). The black line shows a three-parameter exponential fit to the computational predictions; the red line shows a moving average of the predictions.
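The three-parameter exponential fit mentioned in this caption can be sketched as follows. This is only an illustration and not the authors' fitting procedure: the functional form s(d) = a * exp(-k * d) + c, the function name fit_exponential, and the scan range for k are all assumptions, since the caption states only that the fit has three parameters. For each candidate decay rate k the model is linear in a and c, so those two are obtained by ordinary least squares.

```python
import math


def fit_exponential(distances, similarities, k_min=0.01, k_max=50.0, steps=2000):
    """Fit s(d) = a * exp(-k * d) + c to (distance, similarity) pairs.

    Assumed form: the source only says the fit has three parameters.
    Strategy: scan log-spaced values of k; for each k, solve for a and c
    by simple linear regression of similarity on exp(-k * d).
    Returns (a, k, c) for the k with the smallest squared error.
    """
    n = len(distances)
    mean_y = sum(similarities) / n
    best = None
    for i in range(steps + 1):
        k = k_min * (k_max / k_min) ** (i / steps)
        x = [math.exp(-k * d) for d in distances]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # all distances identical: slope is undefined
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, similarities))
        a = sxy / sxx
        c = mean_y - a * mean_x
        sse = sum((a * xi + c - yi) ** 2 for xi, yi in zip(x, similarities))
        if best is None or sse < best[0]:
            best = (sse, a, k, c)
    if best is None:
        raise ValueError("need at least two distinct distances")
    _, a, k, c = best
    return a, k, c
```

In this parameterization c is the similarity approached at large genetic distance, a + c is the similarity at zero distance, and k sets how quickly similarity decays. A gradient-based optimizer could replace the scan over k; the scan is used here only to keep the sketch dependency-free.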

Extended Data Figure 2 The evolution of phenotypic similarity for different sets of carbon sources.

Genetic distances are based on 16S bacterial rRNA sequences. Phenotypic similarities of biomass synthesis are shown for 62, 149, and 446 carbon sources. Solid lines represent moving averages (using a 0.05 genetic distance window) of computational predictions; dashed lines represent exponential fits to the data.
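A moving average over a 0.05 genetic distance window, as described in this caption, might be computed along the following lines. This is a sketch under stated assumptions: the window is taken to be centred on each evaluation point, and the spacing between evaluation points (the step parameter) is hypothetical, because the caption specifies only the window width.

```python
def moving_average(distances, similarities, window=0.05, step=0.01):
    """Average similarity of all pairs whose distance lies within a window.

    window: full width of the genetic distance window (0.05 in the caption).
    step:   spacing of the window centres (an assumption, not from the source).
    Returns a list of (centre, mean_similarity), skipping empty windows.
    """
    points = list(zip(distances, similarities))
    if not points:
        return []
    half = window / 2.0
    max_distance = max(d for d, _ in points)
    n_centers = int(max_distance / step) + 1
    result = []
    for i in range(n_centers + 1):
        center = i * step
        values = [s for d, s in points if center - half <= d <= center + half]
        if values:
            result.append((center, sum(values) / len(values)))
    return result
```

Each input point is one pairwise comparison between two models, so a single window typically averages over many species pairs at similar genetic distance.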

Extended Data Figure 3 The effect of species selection on observed patterns of phenotypic divergence.

The black lines in all panels (marked ‘Original fit’) represent the exponential fit of the phenotypic similarity (carbon source utilization) as a function of genetic distance for all pairs of considered models, that is, models with fewer than 20% auto-completion reactions and more than five predicted carbon sources for growth. The observed trends of phenotypic evolution remain similar when (a) only models with a smaller fraction of auto-completion reactions are considered, (b) models from individual families that include more than ten modelled species are excluded from the analysis, (c) only a maximum number of species per family is considered, and (d) a subset of species is chosen at random from the pool of all considered models. In d, the average values at different genetic distance bins are shown for 1,000 random samples of a given number of species; error bars represent the s.e.m. obtained on the basis of the 1,000 replicates.

Extended Data Figure 4 Evolutionary relationship between 40 species for which experimental phenotype microarray data were considered.

The cladogram is based on 16S bacterial rRNA sequences. Different colours indicate different bacterial classes. The tree was rooted in the M. barkeri rRNA sequence.

Extended Data Figure 5 The effect of cutoff selection on the computational and experimental phenotypic similarity trends.

a, The exponential fits of phenotypic similarity as a function of genetic distance for metabolic models that predict growth on more than a given number of carbon sources. b, The frequency distribution of normalized Biolog scores for 40 species across 62 experimental growth conditions. The figure shows a bimodal pattern (scores of 0 are not plotted). The dashed red line shows the cutoff score of 10 used in the main analysis. c, The effect of different cutoffs used to define positive growth in the Biolog data. Different lines represent exponential fits to the experimental values of phenotypic similarity based on different values of the Biolog cutoff score.
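The role of the cutoff in panels b and c can be illustrated with a short sketch that converts normalized scores into binary growth calls and then compares two species. Several details here are assumptions rather than statements from the source: growth is called when a score is strictly above the cutoff, phenotypic similarity is computed as a Jaccard index over the sets of conditions supporting growth, and the condition names and scores in the usage note are invented for illustration.

```python
def growth_profile(scores, cutoff=10):
    """Return the set of conditions whose normalized score exceeds the cutoff.

    scores: mapping from condition name to normalized score.
    cutoff: 10 is the value quoted in the caption; using a strict '>' is an assumption.
    """
    return {condition for condition, score in scores.items() if score > cutoff}


def phenotypic_similarity(profile_a, profile_b):
    """Jaccard index of two growth profiles (an assumed similarity measure).

    Returns None when neither species grows on any condition, since the
    ratio is undefined in that case.
    """
    union = profile_a | profile_b
    if not union:
        return None
    return len(profile_a & profile_b) / len(union)
```

With made-up scores {"glucose": 85, "citrate": 4, "lactose": 40, "acetate": 12} for one species and {"glucose": 90, "citrate": 30, "lactose": 0, "acetate": 11} for another, both grow on glucose and acetate out of four conditions used by at least one of them, giving a similarity of 0.5. Raising or lowering the cutoff changes which conditions count as growth, which is the sensitivity that panel c examines.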

Extended Data Figure 6 Effect of growth media on the predicted phenotypic similarity trends.

The black line shows an exponential fit to the predicted phenotypic similarity in the usage of carbon sources calculated on the basis of the in silico minimal media used in the study of growth phenotypes (see Methods and Supplementary Data 2) (n = 26,106). The dashed red line shows an exponential fit to the predicted phenotypic similarity using the in silico medium in which all carbon sources that could be imported by the models were made available, with the total combined uptake of carbon constrained to a maximum value of 10 mM g−1 dry weight (n = 27,261).

Extended Data Table 1 Parameters of the exponential divergence models, describing the evolution of growth and genetic phenotypes
Extended Data Table 2 Model comparisons for predicted and experimentally determined phenotypic similarity as a function of genetic distance
Extended Data Table 3 The predicted frequency of carbon and nitrogen source usage across metabolic models
Extended Data Table 4 Bacteria with experimental genome-wide data used to analyse the conservation of gene essentiality

Supplementary information

Supplementary Data 1

Biolog phenotype microarray data for 40 species across 62 carbon sources. Phenotypic data for all species were obtained from the Biolog GEN III database (Biolog, Inc., Hayward, CA, USA). (XLSX 21 kb)

Supplementary Data 2

Compounds included in the simulated media in order to test for biomass/ATP production on different carbon and nitrogen sources. (XLSX 12 kb)

Cite this article

Plata, G., Henry, C. & Vitkup, D. Long-term phenotypic evolution of bacteria. Nature 517, 369–372 (2015).
