Identification of antimicrobial peptides from the human gut microbiome using deep learning

Abstract

The human gut microbiome encodes a large variety of antimicrobial peptides (AMPs), but the short lengths of AMPs pose a challenge for computational prediction. Here we combined multiple natural language processing neural network models, including LSTM, Attention and BERT, to form a unified pipeline for candidate AMP identification from human gut microbiome data. Of 2,349 sequences identified as candidate AMPs, 216 were chemically synthesized, with 181 showing antimicrobial activity (a positive rate of >83%). Most of these peptides have less than 40% sequence homology to AMPs in the training set. Further characterization of the 11 most potent AMPs showed high efficacy against antibiotic-resistant, Gram-negative pathogens and demonstrated significant efficacy, lowering bacterial load by more than tenfold in a mouse model of bacterial lung infection. Our study showcases the potential of machine learning approaches for mining functional peptides from metagenome data and accelerating the discovery of promising AMP candidate molecules for in-depth investigations.
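The positive rate quoted in the abstract can be checked directly from the reported counts. The short Python sketch below only redoes that arithmetic; the variable names are illustrative and not taken from the authors' code.

```python
# Counts as reported in the abstract.
candidates = 2349   # sequences identified as candidate AMPs
synthesized = 216   # candidates that were chemically synthesized
active = 181        # synthesized peptides showing antimicrobial activity

# Fraction of synthesized peptides that were active (the "positive rate").
positive_rate = active / synthesized

# Fraction of all candidates that were carried forward to synthesis.
synthesized_fraction = synthesized / candidates

print(f"positive rate: {positive_rate:.1%}")
print(f"fraction of candidates synthesized: {synthesized_fraction:.1%}")
```

The rate applies to the 216 synthesized peptides only; the abstract does not report activity for the remaining candidates.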

Fig. 1: Schematic representation of study workflow.
Fig. 2: Establishing AMP prediction pipeline combining NLP models.
Fig. 3: Mining candidate AMPs from metagenomic data.
Fig. 4: Experimental validation and potency assays of predicted AMPs.
Fig. 5: c_AMP treatment of a mouse model of bacterial infection and mechanistic assays for c_AMP1043.

Data availability

Our study uses only publicly available AMP, non-AMP, metagenome and metaproteome data. AMP data were mainly collected from four public AMP datasets (ADAM: http://bioinformatics.cs.ntou.edu.tw/adam/, APD: http://aps.unmc.edu, CAMP: http://www.camp.bicnirrh.res.in/ and LAMP: http://biotechlab.fudan.edu.cn/database/lamp/), which cover most AMP sequences from different sources (downloaded as of 2 October 2018). The non-AMP dataset was downloaded from UniProt (https://www.uniprot.org) by setting the ‘subcellular location’ filter to cytoplasm and removing any entry that matches any of the following keywords: antimicrobial, antibiotic, antiviral, antifungal, effector or excreted (downloaded as of 20 November 2018). For the validation datasets, the non-AMP part is available under ENA project ID PRJEB19640 and the AMP part was downloaded from http://bagel4.molgenrug.nl/index.php. The representative genomes dataset was derived from species-level genome bins: https://opendata.lifebit.ai/table/SGB. The metaproteome datasets were collected from PRIDE (https://www.ebi.ac.uk/pride) under project IDs PXD005780, PXD008870, PXD003907 and PXD000114. The 15 independent, large-scale metagenomic cohorts are available under BioProject IDs PRJNA422434, PRJEB4336, PRJEB1220, PRJEB6337, PRJEB6456, PRJEB10878, PRJEB11532, PRJNA319574, PRJEB9584, PRJNA290380, PRJEB6337, PRJEB15371 and PRJNA356102, and at https://github.com/MetaSUB/MetaSUB-metadata. Source data are provided with this paper.
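The non-AMP selection rule described above (cytoplasmic location, then exclusion by keyword) can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the record layout, the field names `subcellular_location` and `annotation`, the toy entries and the case-insensitive substring matching are all assumptions made for the example.

```python
import re

# Keywords listed in the data availability statement.
EXCLUDED_KEYWORDS = (
    "antimicrobial", "antibiotic", "antiviral",
    "antifungal", "effector", "excreted",
)

# Assumption: a case-insensitive substring match anywhere in the annotation.
_EXCLUDED = re.compile("|".join(EXCLUDED_KEYWORDS), re.IGNORECASE)


def is_candidate_non_amp(record):
    """Return True if a record passes both filters.

    `record` is a dict with the hypothetical keys
    'subcellular_location' and 'annotation'.
    """
    in_cytoplasm = "cytoplasm" in record["subcellular_location"].lower()
    has_excluded_keyword = _EXCLUDED.search(record["annotation"]) is not None
    return in_cytoplasm and not has_excluded_keyword


# Toy records, invented purely for illustration.
records = [
    {"id": "toy1", "subcellular_location": "Cytoplasm",
     "annotation": "Elongation factor"},
    {"id": "toy2", "subcellular_location": "Cytoplasm",
     "annotation": "Antibiotic resistance protein"},
    {"id": "toy3", "subcellular_location": "Secreted",
     "annotation": "Hypothetical protein"},
]

kept = [r["id"] for r in records if is_candidate_non_amp(r)]
print(kept)
```

In this toy set only `toy1` survives: `toy2` is dropped by the keyword filter and `toy3` by the location filter.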

Code availability

The c_AMP prediction code is available at https://github.com/mayuefine/c_AMPs-prediction.

Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDB29020000); the National Key Research and Development Program of China (grant nos. 2018YFC2000500 and 2018YFA0901900); the National Natural Science Foundation of China (grant nos. 32025002, 91857101 and 31771481); the Biological Resources Programme of the Chinese Academy of Sciences (grant no. KFJ-BRP-009); and the Beijing Nova Program (202077/202120).

Author information

Contributions

J.W. and Y.C. conceptualized and managed this study. Y.M. developed the bioinformatics pipeline and screening. Y.M., X.L., B.X., Z.G., Y.Z., Y.Y., N.T., X.T. and M.W. carried out the experiments. Y.M., Z.G., B.X., X.L., X.Y., J.F., Y.C. and J.W. analyzed the data. Y.M., X.L., Y.C. and J.W. drafted the manuscript. J.F., Y.C. and J.W. edited the manuscript.

Corresponding authors

Correspondence to Yihua Chen or Jun Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Luis Pedro Coelho and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Length distribution of datasets and model convergence in the training stage.

a, Length distributions of sequences in the three training sets (Train AMPs: training set of AMP sequences; Train Non-AMPs: training set of non-AMP sequences, similar in number to Train AMPs; Train 10Non-AMPs: training set of non-AMP sequences, ten times the number in Train AMPs). The length distributions of the training data are matched. The colored squares indicate the different length distributions. This panel was plotted using http://www.bioinformatics.com.cn. b, Loss during training of the different models. The Attention and LSTM models converged within 100–200 epochs, whereas BERT required more epochs to converge. c, Length distribution of the 2,349 candidate AMPs from the metagenomic cohorts in our study.

Extended Data Fig. 2 Spectra and level of bacterial inhibition of all c_AMPs against the four strains of bacteria used for the initial screening.

Green color indicates that a c_AMP significantly decreased the OD of at least one of the tested species. ‘*’ denotes 0.01 < p ≤ 0.05, ‘#’ denotes 0.001 < p ≤ 0.01 and ‘+’ denotes p ≤ 0.001, all in Dunnett’s test (two-sided).
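The symbol scheme in this legend is a simple threshold mapping on the p value. A minimal Python sketch, assuming p values have already been computed elsewhere (for example by Dunnett’s test); the function name is illustrative:

```python
def significance_symbol(p):
    """Map a p value to the symbol used in the legend.

    '+' : p <= 0.001
    '#' : 0.001 < p <= 0.01
    '*' : 0.01 < p <= 0.05
    ''  : p > 0.05 (not significant; no symbol)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p <= 0.001:
        return "+"
    if p <= 0.01:
        return "#"
    if p <= 0.05:
        return "*"
    return ""


for p in (0.0005, 0.005, 0.03, 0.2):
    print(p, repr(significance_symbol(p)))
```

Checking the thresholds from the smallest upward keeps the boundaries inclusive on the upper side, matching the ‘≤’ signs in the legend.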

Extended Data Fig. 3 Identities between AMPs and non-AMPs in our training set/discovered AMPs, and amino acid composition.

a, Identity distributions based on multiple sequence alignment between c_AMPs and the training sets of AMPs/10Non-AMPs (see Methods). The grey line indicates the median of the identity values; Med stands for median identity. b, Sequence identity distributions in the training set. Identities among AMPs were significantly higher than those between AMPs and non-AMPs (both balanced and unbalanced sets). A one-sided Wilcoxon test was performed for each comparison. c, Amino acid composition of the c_AMPs discovered in our study and of known AMPs/non-AMPs in the training sets (balanced dataset, pink; unbalanced dataset, light yellow).
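Amino acid composition of the kind shown in panel c can be computed as the relative frequency of each residue across a set of sequences. The Python sketch below is one plausible way to do this, assuming residues are pooled over all sequences (rather than averaged per sequence) and restricted to the 20 standard amino acids; the toy peptides are invented and are not c_AMPs from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Return the pooled relative frequency of each standard amino acid."""
    counts = Counter()
    for seq in sequences:
        # Ignore any non-standard or ambiguous characters.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequences, invented for illustration only.
toy_peptides = ["KKLL", "GKAL"]
freqs = composition(toy_peptides)
print({aa: f for aa, f in freqs.items() if f > 0})
```

For the two toy peptides, K and L each account for 3 of the 8 residues and G and A for 1 each.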

Extended Data Fig. 4 c_AMPs mechanism of action and resistance development.

a, Transmission electron microscopy (TEM) examination of E. coli DH5α cells treated with c_AMP1043 at 10× MIC, showing cell content leakage and cell wall/membrane disruption. Experiments were performed in triplicate with similar results and one representative image is shown. b, Section images of E. coli DH5α cells treated with c_AMP1043 (at MIC and 10× MIC) or with HEPES as control; three microscope images at different magnifications were selected for each treatment. Experiments were performed in triplicate with similar results and one representative image is shown. c, Mechanistic assays against E. coli DH5α for the ten other c_AMPs in the selected list. ALP, PI, NPN and DISC3(5) assays were used to examine the potential mechanism of action of the c_AMPs, in particular disruption of the membrane of the Gram-negative bacterium E. coli (see Methods and Results). Colored lines indicate dosage-dependent increases of signal. N = 3 independent experiments. Data are presented as mean ± SEM. d, Resistance development experiment for c_AMP1043 by serial passage against E. coli DH5α. The y-axis indicates the MIC (μM) measured directly from the tubes during the serial passages and the x-axis indicates the number of passages. No resistance was observed over 30 passages, as the MIC remained <10 μM. N = 3 independent experiments.

Extended Data Fig. 5 Determination of peptide structures using circular dichroism (CD) spectra.

a, CD results for the 11 most potent peptides and b, the corresponding proportions of secondary structures calculated from the CD data using CDNN. Purple: in water phase; dark blue: peptide mixed with 20 times the amount of DMPE/DMPG lipid mixture (see Methods). c,d, Further CD results and predicted structures of 11 randomly selected peptides with AMP activity. e, Positive control Magainin 2, with CD results (left), predicted proportions of each secondary structure (middle) and the known structure in PDB (right, accession no. 2LSA).

Supplementary information

Supplementary Information

Supplementary Figs. 1–5, Methods and Tables 1–17.

Reporting Summary

Source data

Statistical source data are provided for Figs. 2–5 and Extended Data Figs. 1–5.

About this article

Cite this article

Ma, Y., Guo, Z., Xia, B. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat Biotechnol 40, 921–931 (2022). https://doi.org/10.1038/s41587-022-01226-0

  • DOI: https://doi.org/10.1038/s41587-022-01226-0
