Abstract
Systematically identifying functional peptides is difficult owing to the vast combinatorial space of peptide sequences. Here we report a machine-learning pipeline that mines the hundreds of billions of sequences in the entire virtual library of peptides made of 6–9 amino acids to identify potent antimicrobial peptides. The pipeline consists of trainable machine-learning modules (for performing empirical selection, classification, ranking and regression tasks) assembled sequentially following a coarse-to-fine design principle to gradually narrow down the search space. The leading three antimicrobial hexapeptides identified by the pipeline showed strong activities against a wide range of clinical isolates of multidrug-resistant pathogens. In mice with bacterial pneumonia, aerosolized formulations of the identified peptides showed therapeutic efficacy comparable to penicillin, negligible toxicity and a low propensity to induce drug resistance. The machine-learning pipeline may accelerate the discovery of new functional peptides.
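The size of the search space and the coarse-to-fine idea can be illustrated with a minimal sketch. It assumes the 20 canonical amino acids, which gives roughly 5.4 × 10^11 sequences of 6–9 residues, consistent with the "hundreds of billions" stated above. The four stage functions are hypothetical placeholders written for illustration only; they are not the trained modules of the reported pipeline, and the demonstration runs on tripeptides purely to keep the enumeration small.

```python
import heapq
from itertools import product
from typing import Iterable, Iterator, List, Tuple

# Assumption: the 20 canonical amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(min_len: int = 6, max_len: int = 9) -> int:
    """Count all sequences with min_len..max_len residues (inclusive)."""
    return sum(len(AMINO_ACIDS) ** n for n in range(min_len, max_len + 1))


def enumerate_peptides(length: int) -> Iterator[str]:
    """Lazily yield every sequence of the given length."""
    for residues in product(AMINO_ACIDS, repeat=length):
        yield "".join(residues)


# The four functions below are hypothetical placeholders, not trained models.
def passes_empirical_rule(seq: str) -> bool:
    """Stage 1 (coarsest): a cheap hand-written rule."""
    has_cationic = any(r in "KR" for r in seq)
    has_hydrophobic = any(r in "FILVW" for r in seq)
    return has_cationic and has_hydrophobic


def toy_classifier_probability(seq: str) -> float:
    """Stage 2: stand-in for a classifier's probability of activity."""
    return sum(r in "KRW" for r in seq) / len(seq)


def toy_rank_score(seq: str) -> float:
    """Stage 3: stand-in for a learned ranking score."""
    return 2.0 * seq.count("W") + seq.count("R") + 0.5 * seq.count("K")


def toy_activity_estimate(seq: str) -> float:
    """Stage 4 (finest): stand-in for a regression model's output."""
    return toy_rank_score(seq) / len(seq)


def coarse_to_fine(
    candidates: Iterable[str], threshold: float = 0.5, top_k: int = 5
) -> List[Tuple[str, float]]:
    """Apply the stages in sequence, each one seeing fewer candidates."""
    selected = (s for s in candidates if passes_empirical_rule(s))
    classified = (
        s for s in selected if toy_classifier_probability(s) >= threshold
    )
    # Only the top_k ranked survivors reach the final, most detailed stage.
    ranked = heapq.nlargest(top_k, classified, key=toy_rank_score)
    scored = [(s, toy_activity_estimate(s)) for s in ranked]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Size of the full 6-9 residue library under the 20-residue assumption.
print(library_size())

# Toy run on tripeptides (8,000 sequences) to keep the example fast.
for seq, estimate in coarse_to_fine(enumerate_peptides(3)):
    print(seq, round(estimate, 3))
```

The first two stages are written as generators so that candidates stream through without being held in memory, which is the practical reason for placing the cheapest filter first when the input is very large.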
Data availability
The main data supporting the findings of this study are available within the Article and its Supplementary Information. Negative AMPs were collected from the UniProt database (http://www.uniprot.org). Positive AMPs were collected from an external dataset dubbed Grampa and from an internal dataset consisting of peptides that were internally synthesized and experimentally validated. Source data for the figures are provided with this paper.
Code availability
Code for the machine-learning models and for the generation of peptide features is provided in the Supplementary Information.
Acknowledgements
We thank R. Liu from East China University of Science and Technology for assistance in testing antimicrobial activity against clinically isolated strains. This work was supported by the National Natural Science Foundation of China (51933009, to J.J.), the National Key Research and Development Program of China (2020YFE0204400, to P.Z.), the Zhejiang Provincial Ten Thousand Talents Program (2018R52001, to J.J.), the Fundamental Research Funds for the Central Universities (226-2022-00146, to P.Z.), the International Research Center for X Polymers (to J.H.) and the startup package from Zhejiang University (to P.Z.).
Author information
Contributions
P.Z., J.Z. and J.J. conceived and supervised the project. J.H., Yanchao Xu and Y. Xue managed the project. Y. Xue encoded the peptide sequences. Yanchao Xu and J.Z. designed and implemented the algorithm. J.H., P.Z. and J.J. designed the wet-laboratory experiments. J.H. conducted the wet-laboratory experiments with the help of Y.H., X.C. and X.L. Yao Xu and D.Z. provided conceptual advice and technical support. J.H., Yanchao Xu, P.Z. and J.Z. prepared the manuscript. All of the authors discussed the results and assisted in the preparation of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Cesar de la Fuente, Kim Lewis, Xiangrong Liu and Fangping Wan for their contribution to the peer review of this work. Peer reviewer reports are available.
Supplementary information
Supplementary Information
Supplementary methods, figures, tables and references.
Supplementary File
Source data for the Supplementary figures.
Supplementary File
Codes for the machine-learning models and for the generation of peptide features.
Source data
Source data are provided for Figs. 2, 3, 4, 5, 6 and 7.
About this article
Cite this article
Huang, J., Xu, Y., Xue, Y. et al. Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences. Nat. Biomed. Eng (2023). https://doi.org/10.1038/s41551-022-00991-2
DOI: https://doi.org/10.1038/s41551-022-00991-2