Novel antimicrobial peptides against Cutibacterium acnes designed by deep learning

Dong, Qichang; Wang, Shaohua; Miao, Ying; Luo, Heng; Weng, Zuquan; Yu, Lun

doi:10.1038/s41598-024-55205-3

Published: 24 February 2024

Scientific Reports volume 14, Article number: 4529 (2024)

Abstract

The increasing prevalence of antibiotic resistance in Cutibacterium acnes (C. acnes) calls for alternative therapeutic strategies. Antimicrobial peptides (AMPs) offer a promising avenue for the development of new treatments targeting C. acnes. In this study, to design peptides with specific inhibitory activity against C. acnes, we employed a deep learning pipeline with generators and classifiers, using transfer learning and pretrained protein embeddings, trained on publicly available data. To enhance the training data specific to C. acnes inhibition, we constructed a phylogenetic tree. A panel of 42 novel generated linear peptides was then synthesized and experimentally evaluated for antimicrobial selectivity and activity. Five of them demonstrated high potency and selectivity against C. acnes, with MICs of 2–4 µg/mL. Our findings highlight the potential of these designed peptides as candidates for anti-acne therapeutics and demonstrate the power of computational approaches for the rational design of targeted antimicrobial peptides.

Introduction

Natural AMPs are a set of small proteins synthesized by microorganisms, plants, and animals as part of their host innate immune response to infection. They often show good activities against multi-drug resistant bacteria, thereby offering an opportunity to address this global public health threat^1,2. Most reported AMPs are cationic and amphiphilic in nature and possess properties that are thought to be crucial for insertion into and disruption of the bacterial membrane^3,4.

In this paper, we are concerned with C. acnes, formerly known as Propionibacterium acnes⁵, a gram-positive (GP) bacterium that colonizes human skin. This lipophilic anaerobic bacterium resides mainly in the sebum-rich pilosebaceous units but is also detected in non-sebaceous areas⁶. C. acnes plays an important role in the pathophysiology of acne vulgaris⁷, a chronic inflammatory skin disorder affecting more than 80% of all adolescents and young adults worldwide⁸. Topical antibiotics, such as clindamycin, are effective acne treatments, but their widespread and often permissive use has led to the emergence of resistant strains⁹. Owing to their bioactivities and low tendency to induce resistance, new antimicrobial agents, specifically targeting biofilm-forming C. acnes, may represent potential treatments to modulate the skin microbiota in acne¹⁰.

Previous designs of peptides against C. acnes were mostly template-based, relying on natural peptide screening, derivation, and sequence modifications^7,11,12. The success of these template-based designs is highly dependent on prior knowledge and predefined rules discovered from existing AMPs. However, identifying and experimentally validating these rules can be challenging, time-consuming and costly¹³.

With the progress of artificial intelligence, model-based methods have been applied to AMP design. Model-based de novo peptide design uses two kinds of models: (1) generative models that generate novel peptide sequences; (2) predictive models that take peptide sequences as input and predict their bioactivities and properties. Various model-based methods have been used to design new AMPs with high antimicrobial activity, resistance to proteolysis, and low toxicity^{14,15,16,17,18,19,20,21,22,23,24}, but only a limited number of them have been validated experimentally.

Here, we present a pipeline based on a series of deep learning (DL) models for designing AMPs that selectively target C. acnes while aiming for non-hemolysis and novelty. Our training dataset was mostly constructed from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP)²⁵. To augment the data for targeted C. acnes inhibition, we constructed a phylogenetic tree to select bacterial species closely related to C. acnes. We first trained a basic generative model on all active AMPs sequences and then fine-tuned it on the augmented C. acnes-specific data. We further trained two classifiers, one predicting antimicrobial activity and the other predicting hemolysis. We then applied length filtering and clustering to prioritize peptides for in vitro experiments. Finally, we synthesized 42 peptides and tested them in vitro for antimicrobial potency, hemolysis, and cytotoxicity.

In brief, our contributions are threefold:

We proposed a data curation process that uses phylogenetic analysis to construct a C. acnes-associated AMPs dataset.
We designed a series of AI models, including two generative models and two classification models, to generate and classify unique and potent peptides efficiently.
We selected 42 designed peptides for synthesis and conducted in vitro tests to assess their antimicrobial potency, hemolysis, and cytotoxicity.

Results

Our design pipeline is shown in Fig. 1 and summarized as follows:

1. Basic generator training: we trained a basic initial DL generator with all known active AMPs sequences.
2. Fine-tuning for C. acnes: we fine-tuned the basic DL generator with C. acnes-related sequences, resulting in an anti-C. acnes generator model.
3. Activity and hemolysis classifiers training: we separately trained the activity and hemolysis DL classifiers.
4. Novel sequence generation: the anti-C. acnes generator was used to generate 660,000 novel sequences.
5. Classifier filter: the activity and hemolysis classifiers were used to filter the generated novel sequences, yielding 24,579 sequences.
6. Length filtering and clustering: we applied a length filter and sequence clustering algorithm to select 42 peptides for in vitro experiments.
7. In vitro validation: five of the 42 tested peptides demonstrated high selectivity and potency with minimum inhibitory concentration (MIC) values as low as 2–4 µg/mL against C. acnes.

Datasets

We constructed our training set based on DBAASP²⁶, which is a manually-curated database that contains over 19,000 peptides annotated with antimicrobial activity values and hemolytic values²⁵. We selected peptides with lengths from 4 to 50 amino acid residues inclusive and with specified target species, leading to the creation of multiple datasets. The data curation methods are described in the Supplementary Materials.

Antimicrobial activity classification dataset

This dataset encompasses 8884 active and 4009 inactive peptides from DBAASP, supplemented with 4875 additional pseudo-inactive sequences, for a total of 17,768 sequences.

Hemolysis classification dataset

We curated 2217 hemolytic and 2013 non-hemolytic peptides from DBAASP.

Peptide generation dataset

To train the initial generator, we utilized all active AMPs sequences, irrespective of the target species. To fine-tune the initial generator with C. acnes-related AMPs, we constructed a phylogenetic tree and identified 28 species related to C. acnes.

Figure 2a,b shows the length distributions of AMPs and non-AMPs. Most AMPs and non-AMPs fall within the 9–21 amino acid range, and the most prevalent AMP lengths are 12, 13, and 14 amino acids. This prevalence might be attributed to cost constraints, given that over 80% of peptides in DBAASP are synthetically produced²⁶.

Generators

We first trained a basic generative model, a recurrent neural network (RNN), on the known active peptides from DBAASP. Subsequently, we employed transfer learning, fine-tuning this model on C. acnes-related sequences to obtain the specialized anti-C. acnes AMPs generator (Fig. 1). This refined generator produced 660,000 sequences with a uniqueness ratio of 99.16% and a novelty ratio of 99.97%. Here, novelty is defined as the absence of a sequence from the curated known active and inactive training sequences in DBAASP, and it serves as a check on generator performance.
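
The two ratios can be illustrated with a minimal Python sketch. The exact denominators are not spelled out in the text, so this sketch assumes uniqueness is the share of distinct sequences among all generated ones and novelty is the share of distinct sequences absent from the known set. The function names and the toy sequences are hypothetical.

```python
def uniqueness_ratio(generated):
    """Fraction of generated sequences that are distinct (assumed definition)."""
    return len(set(generated)) / len(generated)


def novelty_ratio(generated, known):
    """Fraction of distinct generated sequences not found in the known set
    (assumed definition; `known` stands in for the curated DBAASP sequences)."""
    unique = set(generated)
    return len(unique - set(known)) / len(unique)


# Toy, made-up sequences purely for illustration.
generated = ["KKLLKK", "KKLLKK", "GIGKFL", "RWRWRW"]
known = {"GIGKFL"}

print(f"uniqueness = {uniqueness_ratio(generated):.2%}")
print(f"novelty    = {novelty_ratio(generated, known):.2%}")
```

With exact string matching, novelty in this sense only rules out verbatim copies of training sequences; near-duplicates are handled later by the edit-distance filter.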

Comparing key physicochemical attributes of both known and generated AMPs offers insights into our generative model’s efficacy. In the comparisons below, we defined three groups of data: “Known general AMPs” representing the 8884 known active AMPs in DBAASP with documented antimicrobial activity against at least one bacterial species; “Known anti-C. acnes AMPs”, a subset of “Known general AMPs”, representing the 653 active and non-hemolytic AMPs against C. acnes or related species; “Generated anti-C. acnes AMPs” representing the 660,000 sequences generated in this study.

Visual inspection of amino acid composition (AAC) distributions (Fig. 3a) indicates a closer resemblance between “Generated anti-C. acnes AMPs” and “Known anti-C. acnes AMPs” than between “Generated anti-C. acnes AMPs” and “Known general AMPs”. Statistical testing supports this observation. At a significance level of 0.05, none of the 20 amino acid proportions differs significantly between “Generated anti-C. acnes AMPs” and “Known anti-C. acnes AMPs”, whereas 13 of the 20 amino acids differ significantly between “Generated anti-C. acnes AMPs” and “Known general AMPs”. The AAC statistical analysis data are in Supplementary Table 2. These results support the closer AAC resemblance between the generated and known anti-C. acnes AMPs and indicate that our transfer learning strategy is effective.
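
As a rough illustration of what is being compared, the sketch below computes per-peptide AAC and averages it over a set of peptides. It is an assumption-laden sketch, not the authors' code: the function names are hypothetical, the example sequences are made up, and the statistical test itself (reported in Supplementary Table 2) is not reproduced because its type is not stated here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition of one peptide: fraction of each standard residue."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_aac(sequences):
    """Average per-peptide composition over a set of peptides."""
    profiles = [aac(seq) for seq in sequences]
    return {aa: sum(p[aa] for p in profiles) / len(profiles) for aa in AMINO_ACIDS}


# Made-up peptides for illustration only.
profile = mean_aac(["KKLL", "KKKK"])
print({aa: round(f, 2) for aa, f in profile.items() if f > 0})
```

Per-residue proportions computed this way for two peptide sets could then be fed to whichever two-sample test is appropriate.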

Although charge distribution across the datasets was largely consistent, distinct variations appeared at specific charge levels (Fig. 3b). Kernel density estimates (KDE) of the predicted α-helix amino acid fraction²⁷ showed consistent dual peaks at 0.00 and 0.85 across the datasets (Fig. 3c). This pattern suggests that the generated anti-C. acnes AMPs predominantly carry positive charges and tend to adopt an amphipathic α-helix structure, aligning more closely with known anti-C. acnes AMPs than with known general AMPs. The distributions of the Boman index²⁸ and hydrophobic moment²⁹ for generated anti-C. acnes AMPs aligned closely with those of both known anti-C. acnes and known general AMPs (Supplementary Fig. 1a,b). These observations highlight the physicochemical profiles of the known general, known anti-C. acnes, and generated anti-C. acnes AMPs, which may inform targeted therapeutic design. We used modlAMP³⁰ to calculate the physicochemical attributes of peptides.

Classifiers

In our assessment of gated recurrent unit (GRU)³¹ and long short-term memory (LSTM)³² architectures, we also benchmarked various pre-trained protein language model embeddings. The results are shown in Fig. 4a,b. With random initial embeddings, LSTM marginally outperformed GRU. We then compared random initial embeddings against two popular pre-trained protein language model embeddings, Evolutionary Scale Modeling (ESM)³³ and ProtTrans³⁴, both combined with LSTM. Among the configurations tested, LSTM paired with ProtTrans performed best for activity classification across all computed metrics (ROC AUC = 0.872, AUPRC = 0.854, accuracy = 0.792, precision = 0.785, recall = 0.803, F1 score = 0.794). As shown in Fig. 4a, ROC AUC was used as the comparison metric for the different classifiers. In contrast, for hemolysis classification, LSTM with random initial embeddings outperformed the other configurations across most computed metrics (ROC AUC = 0.888, AUPRC = 0.879, accuracy = 0.814, precision = 0.807, recall = 0.805, F1 score = 0.806), as shown in Fig. 4b. Interestingly, the activity classifiers performed slightly worse than the hemolysis classifiers, suggesting higher sequence diversity and complexity in activity than in hemolysis.
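
As a quick consistency check on the reported numbers, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two. The sketch below does this in plain Python; the helper names are hypothetical, and `classification_metrics` is included only to show how such metrics derive from confusion-matrix counts, which are not given in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_score(precision, recall)}


# (precision, recall, reported F1) as stated in the text above.
reported = {
    "activity, LSTM + ProtTrans": (0.785, 0.803, 0.794),
    "hemolysis, LSTM + random init": (0.807, 0.805, 0.806),
}
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {f1}")
```

Both recomputed values round to the reported F1 scores.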

In the larger activity dataset of 17,768 sequences, both ESM and ProtTrans pre-trained protein embeddings improved classifier performance by roughly 2%. Conversely, for the hemolysis dataset of 4230 sequences, these pre-trained embeddings decreased performance. We speculate that pre-trained protein embeddings, which are predominantly trained on longer protein sequences, may not consistently excel on AMPs datasets, because AMPs sequences are short (4 to 50 amino acids here), in stark contrast to typical protein sequences. These results suggest that the benefit of pre-trained protein embeddings in AMPs models may depend on the size and characteristics of the specific AMPs dataset.

To ensure a minimized false positive rate, we set threshold values above 0.99 to transform the final two classifiers’ probabilistic outputs to binary classifications. Consequently, the activity and hemolysis classifiers demonstrated precisions of 0.906 and 0.917, respectively. Sequentially applying both yielded a collective precision of 0.828, filtering down to 24,579 sequences.
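
A minimal sketch of this sequential filter follows. It assumes each classifier outputs a probability for the desirable class (active, and non-hemolytic), which the text does not state explicitly. The candidate names, sequences, and scores are made up.

```python
THRESHOLD = 0.99  # cut-off stated in the text


def passes_filters(p_active, p_non_hemolytic, threshold=THRESHOLD):
    """Keep a peptide only if both classifiers are confident.

    Assumption: both scores are probabilities of the desirable class.
    """
    return p_active > threshold and p_non_hemolytic > threshold


# Hypothetical candidates: (sequence, P(active), P(non-hemolytic)).
candidates = [
    ("KLLKKLLKKLLK", 0.995, 0.998),
    ("GWRWRWGWRWRW", 0.999, 0.700),
    ("AKKVAKKVAKKV", 0.900, 0.999),
]
kept = [seq for seq, pa, pn in candidates if passes_filters(pa, pn)]
print(kept)

# If the two classifiers' errors were independent, the joint precision
# would be roughly the product of the individual precisions.
independent_estimate = 0.906 * 0.917
print(f"product of precisions = {independent_estimate:.3f}")
```

The product of the two precisions is close to, but not identical with, the reported collective precision of 0.828; how the reported value was computed is not described in this section, so the product should be read only as a back-of-the-envelope estimate.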

We emphasize that the goal of this paper is not to provide AMP prediction models that outperform existing ones. Rather, the goal is to build a pipeline with comparable accuracies and selection strategies to design novel and potent peptides against specific strains, such as C. acnes. We believe the performance of our models is sufficient for the intended task. The models are also able to identify potential AMPs that might be overlooked by conventional methods (as detailed in Table 1 of the Supplementary Materials).

Length and novelty filters

We chose peptide sequences with lengths ranging from 10 to 15. To ensure novelty, each peptide was required to differ from known AMPs in the DBAASP database by at least five mutations, which avoids trivial analogs of known AMPs. For added diversity, we clustered the peptides based on their Levenshtein sequence distances. This process yielded 42 peptides for in vitro experiments; their sequences and physicochemical properties are in Supplementary Table 3. The length and novelty filter flowchart is shown in Supplementary Fig. 2.
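
The length and novelty filter can be sketched in plain Python as below. Two assumptions are made: "five mutations" is read as a Levenshtein edit distance of at least 5 to every known AMP, and the unspecified clustering step is replaced by a simple greedy pick of mutually distant peptides. Function names and sequences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def length_and_novelty_filter(candidates, known, min_len=10, max_len=15, min_edits=5):
    """Keep peptides of 10-15 residues at least `min_edits` edits from every known AMP."""
    return [s for s in candidates
            if min_len <= len(s) <= max_len
            and all(levenshtein(s, k) >= min_edits for k in known)]


def pick_diverse(candidates, min_distance):
    """Greedy stand-in for clustering: keep peptides pairwise >= min_distance apart."""
    chosen = []
    for s in candidates:
        if all(levenshtein(s, c) >= min_distance for c in chosen):
            chosen.append(s)
    return chosen


known = ["KKLLKKLLKKLL"]                                 # made-up "known AMP"
candidates = ["KKLLKKLLKKLA", "GWRWRWGWRWRW", "RWRW"]    # made-up candidates
print(length_and_novelty_filter(candidates, known))
```

In the toy example, the first candidate is one substitution away from the known peptide and the last is too short, so only the second survives.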

In vitro experiments

In vitro antimicrobial activity of designed peptides

We synthesized a set of 42 peptides and evaluated their in vitro inhibitory activity against C. acnes at concentrations of 100/50/25/12.5 µg/mL. A peptide was deemed to have antimicrobial activity if C. acnes growth was inhibited by more than 50% at a given concentration. The results yielded 14 high-activity, 16 medium-activity, 4 low-activity, and 8 inactive peptides (Table 1). Counting only medium- and high-activity peptides, the success rate of our design was 71.4% (30 of 42).
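
The success rate follows directly from the category counts reported above, as this short Python check shows. The concentration cut-offs separating high, medium, and low activity are defined in Table 1 and are not reproduced here.

```python
# Category counts as reported in the text (Table 1).
counts = {"high": 14, "medium": 16, "low": 4, "inactive": 8}

total = sum(counts.values())
effective = counts["high"] + counts["medium"]  # only medium and high count as successes
success_rate = effective / total

print(f"{effective} of {total} peptides = {success_rate:.1%}")
```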

Table 1 Summary and classification of antimicrobial activity of tested peptides against C. acnes.

To pinpoint the MIC of the most potent AMPs, we assessed their inhibitory effects against C. acnes at lower concentrations: 8/4/2/1 µg/mL. As depicted in Fig. 5, AMP-12 and AMP-29 displayed the lowest MIC, 2 µg/mL. They were closely followed by AMP-25, AMP-31, AMP-33, and the positive controls (HPA3NT3 and FK13), all with MICs of 4 µg/mL. AMP-5, AMP-9, and AMP-38 had MICs of 8 µg/mL (Table 2). These low MICs highlight the strong inhibitory activity of the designed AMPs against C. acnes.

Table 2 MIC and toxicity of tested peptides. MIC is determined as the lowest test concentration of peptides for microbial viability less than 10%. EC90 is the highest test concentration of peptides for rabbit red blood cell or HaCaT cell viability higher than 90%. Index is the ratio of EC90 to MIC of C. acnes.
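
The three definitions in the Table 2 caption can be expressed as a small Python sketch. The function names are hypothetical and the dose-response values below are invented for illustration; they are not measurements from this study.

```python
def mic(viability_by_conc, cutoff=10.0):
    """Lowest tested concentration with microbial viability below `cutoff` percent."""
    hits = [conc for conc, viability in viability_by_conc.items() if viability < cutoff]
    return min(hits) if hits else None


def ec90(viability_by_conc, cutoff=90.0):
    """Highest tested concentration with host-cell viability above `cutoff` percent."""
    hits = [conc for conc, viability in viability_by_conc.items() if viability > cutoff]
    return max(hits) if hits else None


def index(ec90_value, mic_value):
    """Ratio of EC90 to MIC; larger values mean a wider safety margin."""
    return ec90_value / mic_value


# Invented example data: concentration (µg/mL) -> viability (%).
microbial = {8: 2.0, 4: 5.0, 2: 35.0, 1: 80.0}
host_cells = {320: 40.0, 160: 70.0, 80: 95.0, 40: 99.0}

print(mic(microbial), ec90(host_cells), index(ec90(host_cells), mic(microbial)))
```

Both helpers return `None` when no tested concentration meets the criterion, which corresponds to an MIC above, or an EC90 below, the tested range.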

To ascertain the broad-spectrum efficacy of these peptides, we evaluated their activity against E. coli and S. aureus, representative of Gram-negative and Gram-positive bacteria, respectively. As illustrated in Fig. 6, every peptide we assessed effectively inhibited both E. coli and S. aureus, confirming their broad-spectrum antimicrobial activity, in line with previously studied AMPs^7,36,37,38. It’s noteworthy that the MIC values for E. coli and S. aureus were higher than those for C. acnes (Table 2). This aligns with our design, as these peptides were specifically engineered to target C. acnes.

Furthermore, we assessed the antimicrobial efficacy of the AMPs against C. albicans, a typical fungus. As demonstrated in Fig. 7, all evaluated peptides effectively inhibited C. albicans, indicating their capability to combat both bacterial and fungal pathogens. However, it’s worth noting that the MIC values for C. albicans were also higher than those for C. acnes as detailed in Table 2. In essence, these comprehensive antimicrobial tests underscore the potent and broad-spectrum activity of our designed AMPs.

Hemolytic activities of designed peptides

Hemolytic activity, a pivotal metric in therapeutic AMPs development, gauges potential toxicity to erythrocytes. Figure 8a shows the hemolytic behavior of the examined peptides relative to the 0.01% Triton X-100 control, which yields 100% erythrocyte lysis. The hemolytic concentrations (EC90) of all tested peptides are markedly higher than their respective MICs against C. acnes, indicating a wide margin between antimicrobial and hemolytic concentrations. AMP-5, AMP-9, and AMP-25 showed the greatest erythrocyte tolerance, at concentrations exceeding 240 µg/mL, followed closely by AMP-27, AMP-38, and HPA3NT3. However, the most potent antimicrobial peptides, namely AMP-12, AMP-29, AMP-31, AMP-33, and FK13, exhibited lower tolerance, possibly due to their distinct structural properties. Overall, these findings indicate that our designed AMPs present minimal erythrocyte toxicity.

In vitro cytotoxic activities of designed peptides

For anti-acne applications, it is vital to assess peptide cytotoxicity on human skin cells. Hence, we evaluated our designed peptides against human keratinocytes (HaCaT) using the CCK-8 assay. Untreated cells served as the baseline, representing 100% viability. As depicted in Fig. 8b, peptides tended to enhance HaCaT cell viability at lower concentrations but inhibited cellular activity at elevated concentrations. AMP-27 stood out, demonstrating the highest cell tolerance, up to 320 µg/mL. It was followed by AMP-9 at 160 µg/mL and a group of peptides, including AMP-5, AMP-14, AMP-21, AMP-25, AMP-33, and AMP-38, all showing tolerance up to 80 µg/mL. When all the data from the cytotoxic, antimicrobial, and hemolytic assays are combined (Table 2), a pattern seems to emerge: peptides with higher antimicrobial potency often exhibited increased cytotoxicity.

Conclusion

In this study, we introduce a pipeline tailored for the design of AMPs that specifically target C. acnes, while ensuring their novelty and minimizing hemolytic activity. Our primary data, consisting of active and inactive peptides as well as hemolytic and non-hemolytic peptides, were derived from DBAASP. Using a constructed phylogenetic tree, we identified species closely related to C. acnes, which allowed us to curate a dataset of C. acnes-associated AMPs. Leveraging this dataset, we built models for AMPs generation, activity classification, and hemolysis prediction. Notably, our exploration of pretrained protein embeddings revealed differential performance: ProtTrans performed best on the larger activity dataset, whereas random initial embeddings performed best on the smaller hemolysis dataset.

From our C. acnes-focused generator, we obtained 660,000 sequences. Subsequent stringent filtering through the classifiers, together with peptide length and novelty criteria, yielded 42 shortlisted peptides for synthesis, predominantly 14–15 residues long. Laboratory validation was positive: 30 unique peptides inhibited C. acnes growth, a 71% success rate. Standout peptides such as AMP-12/29/31/33 showed low MIC values of 2 or 4 µg/mL, a 10% success rate. Several designed AMPs, including AMP-5/9/12/25/27/29/31/33/38, also inhibited E. coli, S. aureus, and C. albicans, albeit at higher MICs. This underlines the broad-spectrum yet selective antimicrobial activity of our designed AMPs. Further supporting their therapeutic potential, all evaluated AMPs showed minimal hemolytic and cytotoxic activities. As C. acnes is the most important cause of acne vulgaris, which affects more than 80% of all adolescents and young adults worldwide, these AMPs could provide promising anti-C. acnes therapies in both pharmaceutical and cosmetic fields.

The observed selectivity of the designed AMPs aligns well with our design intention: robust inhibitory activity against C. acnes with diminished effects on other microbes. We attribute this selectivity mostly to transfer learning with C. acnes-centric AMPs sequences.

In summary, the AI-driven pipeline presented here is efficient and scalable, and can be tuned not only for anti-C. acnes peptides but also for generating and filtering other unique antimicrobial functional peptides for diverse applications from healthcare to cosmetics.