Abstract
Training artificial intelligence (AI) systems to perform autonomous experiments would vastly increase the throughput of microbiology; however, few microbes have large enough datasets for training such a system. In the present study, we introduce BacterAI, an automated science platform that maps microbial metabolism but requires no prior knowledge. BacterAI learns by converting scientific questions into simple games that it plays with laboratory robots. The agent then distils its findings into logical rules that can be interpreted by human scientists. We use BacterAI to learn the amino acid requirements for two oral streptococci: Streptococcus gordonii and Streptococcus sanguinis. We then show how transfer learning can accelerate BacterAI when investigating new environments or larger media with up to 39 ingredients. Scientific gameplay and BacterAI enable the unbiased, autonomous study of organisms for which no training data exist.
Data availability
All data are available at http://github.com/jensenlab/BacterAI and the authors’ website: http://jensenlab.net/tools.
Code availability
All code is available at http://github.com/jensenlab/BacterAI and the authors’ website: http://jensenlab.net/tools.
Acknowledgements
This research was supported by the National Institutes of Health (grant nos. EB027396 and GM138210 to P.J.). The Titan V used for this research was donated by the NVIDIA Corporation. We thank K. Janes for his comments on the manuscript. Figures 1 and 3 were created with BioRender.com.
Author information
Contributions
A.D. and P.J. conceived the study, implemented BacterAI, designed experiments, analysed data and wrote the manuscript. A.D., K.K., D.L., A.L., N.S. and K.J. optimized laboratory workflows, executed experiments and performed quality control on the data.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Nathan Price and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 BacterAI learns the amino acid requirements of S. gordonii.
BacterAI learns to predict the growth of S. gordonii in media containing combinations of amino acids. Over 13 days, the agent trains a neural network to guide the search for new experiments along the growth front. Each day, the neural network is retrained using the data from all previous days (train set). The accuracy of the model is measured using the 336 experiments selected each day (test set).
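The train/test scheme in this caption can be sketched as a rolling evaluation: each day's newly selected batch is scored by a model fitted only to earlier days, and then joins the training set. The sketch below is illustrative and is not the BacterAI implementation. The names `daily_accuracy`, `fit_mean` and `predict_mean` are hypothetical, a trivial mean-fitness predictor stands in for the neural network, and scoring accuracy as grow/no-grow agreement at a 0.25 threshold is an assumption borrowed from Extended Data Fig. 6.

```python
def daily_accuracy(batches, fit, predict, threshold=0.25):
    """Score each day's batch with a model trained on all earlier days.

    batches: list of daily lists of (media, fitness) pairs.
    Returns one accuracy per day, starting with the second day.
    """
    train = []
    accuracies = []
    for batch in batches:
        if train and batch:
            model = fit(train)  # retrain on all previous days
            hits = sum(
                (predict(model, media) >= threshold) == (fitness >= threshold)
                for media, fitness in batch
            )
            accuracies.append(hits / len(batch))
        train.extend(batch)  # today's test set becomes training data
    return accuracies


def fit_mean(train):
    """Stand-in model (assumption): the mean fitness seen so far."""
    return sum(fitness for _, fitness in train) / len(train)


def predict_mean(model, media):
    """Stand-in prediction: ignores the media and returns the stored mean."""
    return model


# Hypothetical data: media are presence/absence tuples, fitness is in [0, 1].
batches = [
    [((1, 1), 0.9), ((0, 1), 0.1)],
    [((1, 0), 0.8), ((0, 0), 0.0)],
    [((1, 1), 0.7)],
]
accuracies = daily_accuracy(batches, fit_mean, predict_mean)
print(accuracies)
```

Because every batch is scored before the model has seen it, each day's accuracy is an out-of-sample estimate, even though the same experiments are later used for training.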
Extended Data Fig. 2 Media selected by BacterAI are not random.
The fitness of S. gordonii grown in randomly selected media clusters near no growth (0) and full growth (1). By contrast, experiments selected by BacterAI are more uniformly distributed.
Extended Data Fig. 3 BacterAI learns the amino acid requirements of S. sanguinis.
BacterAI selects experiments to learn the amino acid requirements of the bacterium Streptococcus sanguinis. Although S. sanguinis is genetically similar to S. gordonii, the bacteria have different amino acid auxotrophies. BacterAI began its investigation of S. sanguinis from a blank slate and did not carry over any knowledge from the previous experiments with S. gordonii.
Extended Data Fig. 4 BacterAI learns amino acid requirements of S. sanguinis using transfer learning.
BacterAI learns a growth model for S. sanguinis using transfer learning. A growth model for S. gordonii was used to select the initial experiments and was retrained with new growth data from S. sanguinis.
Extended Data Fig. 5 BacterAI learns amino acid requirements of S. sanguinis in anaerobic conditions using transfer learning.
BacterAI learns an anaerobic growth model for S. sanguinis using transfer learning. A growth model for S. sanguinis in an aerobic (5% CO2) environment was used to select the initial experiments and was retrained with new growth data from anaerobic experiments.
Extended Data Fig. 6 Replicate growth assays show little variation.
Replicates from fourteen days of experiments with S. sanguinis show little variation in growth. Using a grow/no grow threshold of 0.25 gives 97.37% (1 vs. 2), 97.11% (1 vs. 3), and 97.39% (2 vs. 3) agreement between the replicates.
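One way to read the agreement figures above: each replicate's fitness value is converted to a grow/no-grow call at the 0.25 threshold, and agreement is the fraction of experiments in which two replicates receive the same call. The sketch below assumes that reading. The function name `replicate_agreement` is hypothetical, the fitness values are made up for illustration, and treating a value exactly at the threshold as growth is an assumption.

```python
def replicate_agreement(rep_a, rep_b, threshold=0.25):
    """Fraction of paired experiments with the same grow/no-grow call."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must have the same number of experiments")
    if not rep_a:
        raise ValueError("replicates must not be empty")
    same = sum(
        (a >= threshold) == (b >= threshold)  # True when both calls match
        for a, b in zip(rep_a, rep_b)
    )
    return same / len(rep_a)


# Hypothetical fitness values, not data from the study.
rep1 = [0.02, 0.90, 0.30, 0.10, 0.75]
rep2 = [0.05, 0.85, 0.20, 0.12, 0.80]
agreement = replicate_agreement(rep1, rep2)
print(f"{agreement:.2%}")
```

In this toy example only the third experiment straddles the threshold (0.30 versus 0.20), so four of the five calls agree.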
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dama, A.C., Kim, K.S., Leyva, D.M. et al. BacterAI maps microbial metabolism without prior knowledge. Nat Microbiol 8, 1018–1025 (2023). https://doi.org/10.1038/s41564-023-01376-0
DOI: https://doi.org/10.1038/s41564-023-01376-0