

Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences

Abstract

Systematically identifying functional peptides is difficult owing to the vast combinatorial space of peptide sequences. Here we report a machine-learning pipeline that mines the hundreds of billions of sequences in the entire virtual library of peptides made of 6–9 amino acids to identify potent antimicrobial peptides. The pipeline consists of trainable machine-learning modules (for performing empirical selection, classification, ranking and regression tasks) assembled sequentially following a coarse-to-fine design principle to gradually narrow down the search space. The three leading antimicrobial hexapeptides identified by the pipeline showed strong activity against a wide range of clinical isolates of multidrug-resistant pathogens. In mice with bacterial pneumonia, aerosolized formulations of the identified peptides showed therapeutic efficacy comparable to that of penicillin, negligible toxicity and a low propensity to induce drug resistance. The machine-learning pipeline may accelerate the discovery of new functional peptides.
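
The size of this library follows from simple arithmetic: with 20 canonical amino acids there are 20^n sequences of length n, so lengths 6–9 give 20^6 + 20^7 + 20^8 + 20^9 = 538,944,000,000 (about 5.4 × 10^11) candidates. The Python sketch below illustrates, under stated assumptions, how a space of that size can be streamed through a coarse-to-fine cascade without ever being held in memory. The filters `is_cationic` and `has_balanced_hydrophobicity`, their thresholds, the hydrophobic residue set and the final `top_k` ranking by net charge are hypothetical heuristic stand-ins for the trained selection, classification, ranking and regression modules; they are not the authors' models.

```python
import heapq
from itertools import product
from typing import Callable, Iterable, Iterator, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues
HYDROPHOBIC = set("AILMFVW")  # assumed hydrophobic set, for illustration only


def library_size(min_len: int = 6, max_len: int = 9, alphabet_size: int = 20) -> int:
    """Count every linear sequence with length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def enumerate_library(lengths: Iterable[int], alphabet: str = AMINO_ACIDS) -> Iterator[str]:
    """Lazily yield all sequences of the given lengths (never materialized)."""
    for n in lengths:
        for residues in product(alphabet, repeat=n):
            yield "".join(residues)


def net_charge(seq: str) -> int:
    """Crude charge at neutral pH: +1 per K/R, -1 per D/E (His ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that belong to the assumed hydrophobic set."""
    return sum(c in HYDROPHOBIC for c in seq) / len(seq)


# Hypothetical stand-ins for the trained modules, ordered cheap -> expensive.
def is_cationic(seq: str) -> bool:
    return net_charge(seq) >= 2


def has_balanced_hydrophobicity(seq: str) -> bool:
    return 0.3 <= hydrophobic_fraction(seq) <= 0.6


def cascade(candidates: Iterable[str],
            stages: Sequence[Callable[[str], bool]]) -> Iterator[str]:
    """Keep a sequence only if it passes every stage; all() stops at the first failure."""
    for seq in candidates:
        if all(stage(seq) for stage in stages):
            yield seq


def top_k(candidates: Iterable[str], score: Callable[[str], float], k: int) -> List[str]:
    """Final ranking step: keep the k highest-scoring survivors using O(k) memory."""
    return heapq.nlargest(k, candidates, key=score)


if __name__ == "__main__":
    print(f"{library_size():,} sequences of length 6-9")
    # Toy run on a 5-letter alphabet (5**6 = 15,625 sequences) so it finishes quickly.
    survivors = cascade(enumerate_library([6], "KRWLE"),
                        [is_cationic, has_balanced_hydrophobicity])
    print(top_k(survivors, score=net_charge, k=3))
```

Ordering matters in this design: `all()` evaluates the stages left to right and stops at the first rejection, so placing cheap filters before expensive model calls means the costly stages only see the small fraction of sequences that survive the early ones.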


Fig. 1: Overview of the AMP screening approach.
Fig. 2: Model selection for the pipeline.
Fig. 3: Wet-lab validation of the antimicrobial activity of the predicted peptides against S. aureus.
Fig. 4: Screening in the expanded peptide libraries.
Fig. 5: Biological properties of CRRI hexapeptides in vitro.
Fig. 6: Therapeutic efficacy in treating acute pneumonia in vivo.
Fig. 7: Therapeutic efficacy in treating chronic pneumonia in vivo.


Data availability

The main data supporting the findings of this study are available within the Article and its Supplementary Information. Negative samples (non-AMP sequences) were collected from the UniProt database (http://www.uniprot.org). Positive AMPs were collected from an external dataset dubbed Grampa and from an internal dataset consisting of peptides that were synthesized and experimentally validated in-house. Source data for the figures are provided with this paper.
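
The statement above describes where the positive and negative examples come from but not how they were merged. As a hedged sketch, one common way to turn two such sources into a binary training set is shown below; the cleaning rules (uppercasing, rejecting non-canonical residues and discarding any sequence that appears in both sets) are assumptions for illustration, not the documented procedure of this study, and `clean` and `build_labeled_set` are hypothetical helpers.

```python
from typing import Iterable, List, Set, Tuple

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # 20 canonical amino acids


def clean(seqs: Iterable[str]) -> Set[str]:
    """Uppercase, strip whitespace, and keep only sequences made of canonical residues."""
    out: Set[str] = set()
    for s in seqs:
        s = s.strip().upper()
        if s and set(s) <= CANONICAL:
            out.add(s)
    return out


def build_labeled_set(positives: Iterable[str],
                      negatives: Iterable[str]) -> List[Tuple[str, int]]:
    """Label AMPs 1 and non-AMPs 0, dropping duplicates and any sequence found in both sets."""
    pos, neg = clean(positives), clean(negatives)
    conflicts = pos & neg  # ambiguous labels are discarded rather than guessed
    data = [(s, 1) for s in sorted(pos - conflicts)]
    data += [(s, 0) for s in sorted(neg - conflicts)]
    return data


if __name__ == "__main__":
    print(build_labeled_set(["KWKLFKK", "kwklfkk ", "GIGKFLX"], ["MASTQ", "KWKLFKK"]))
```

Dropping conflicting sequences, rather than letting one source override the other, is a conservative choice: a peptide listed both as active and as a negative would otherwise inject label noise into the classifier.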

Code availability

Code for the machine-learning models and for the generation of peptide features is provided in the Supplementary Information.
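
The authors' feature code is provided in the Supplementary Information and is not reproduced here. To illustrate what fixed-size peptide features can look like for 6–9-residue sequences, the Python sketch below assumes two generic encodings: a one-hot matrix zero-padded to 9 positions and a 20-dimensional amino-acid composition vector. Neither is claimed to be the feature set used in this work, and `one_hot` and `composition` are hypothetical helper names.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX: Dict[str, int] = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str, max_len: int = 9) -> List[List[int]]:
    """Encode seq as a max_len x 20 binary matrix; rows past len(seq) stay all-zero (padding)."""
    if len(seq) > max_len:
        raise ValueError(f"sequence longer than {max_len}: {seq!r}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq):
        matrix[pos][INDEX[aa]] = 1  # raises KeyError for non-canonical residues
    return matrix


def composition(seq: str) -> List[float]:
    """Order-free, length-independent 20-dim feature: fraction of each residue."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


if __name__ == "__main__":
    for row in one_hot("KWKLFK"):
        print("".join(map(str, row)))
    print([round(x, 2) for x in composition("KWKLFK")])
```

Padding to the longest length in the library (9) gives every sequence the same 9 × 20 shape, which fixed-input models such as convolutional networks expect; the composition vector discards residue order but stays 20-dimensional for any length.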


Acknowledgements

We thank R. Liu from East China University of Science and Technology for assistance in testing antimicrobial activity against clinically isolated strains. This work was supported by the National Natural Science Foundation of China (51933009, to J.J.), the National Key Research and Development Program of China (2020YFE0204400, to P.Z.), the Zhejiang Provincial Ten Thousand Talents Program (2018R52001, to J.J.), the Fundamental Research Funds for the Central Universities (226-2022-00146, to P.Z.), the International Research Center for X Polymers (to J.H.) and the startup package from Zhejiang University (to P.Z.).

Author information


Contributions

P.Z., J.Z. and J.J. conceived and supervised the project. J.H., Yanchao Xu and Y. Xue managed the project. Y. Xue encoded the peptide sequences. Yanchao Xu and J.Z. designed and implemented the algorithm. J.H., P.Z. and J.J. designed the wet-laboratory experiments. J.H. conducted the wet-laboratory experiments with the help of Y.H., X.C. and X.L. Yao Xu and D.Z. provided conceptual advice and technical support. J.H., Yanchao Xu, P.Z. and J.Z. prepared the manuscript. All of the authors discussed the results and assisted in the preparation of the manuscript.

Corresponding authors

Correspondence to Peng Zhang, Junbo Zhao or Jian Ji.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Cesar de la Fuente, Kim Lewis, Xiangrong Liu and Fangping Wan for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information: supplementary methods, figures, tables and references.

Reporting Summary

Peer Review File

Supplementary File: source data for the Supplementary figures.

Supplementary File: code for the machine-learning models and for the generation of peptide features.

Source data

Source data for Figs. 2, 3, 4, 5, 6 and 7 (one file per figure).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, J., Xu, Y., Xue, Y. et al. Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences. Nat. Biomed. Eng. 7, 797–810 (2023). https://doi.org/10.1038/s41551-022-00991-2



  • DOI: https://doi.org/10.1038/s41551-022-00991-2
