Introduction

With over one million described species, insects represent the most diverse as well as the largest class of organisms in the world, due to their ability to adapt to recurrent changes and to their resistance against a wide spectrum of pathogens1. Their immune system, exclusively based on the innate, well-developed immune response, allows a general and rapid response to various invading organisms2, 3. The humoral immune response includes the enzymatic cascade that regulates the activation of coagulation and melanization of the hemolymph, the production of reactive oxygen (ROS) and nitrogen (RNS) species, and the production of antimicrobial peptides (AMPs)4.

Today, the problem of antibiotic resistance represents one of the greatest threats in the medical field4. The constant need to find alternative solutions has increased the interest in AMPs over time. AMPs are small molecules, consisting of 10–100 amino acids, that have been identified in many organisms such as bacteria, fungi, plants, vertebrates and invertebrates, including insects5. They are cationic molecules that exhibit activities against bacteria, fungi, viruses, and parasites5. In addition to these known activities, many peptides also exert a cytotoxic effect against cancer cells6.

The first insect-derived AMP was discovered in the 1980s, when Boman et al.7 identified and isolated the first cecropin from the lepidopteran Hyalophora cecropia. Since then, many other AMPs have been discovered. Due to their high biodiversity, insects are considered to be among the richest and most innovative sources of these molecules. Insect AMPs can be classified into four families: α-helical peptides (e.g. cecropins), cysteine-rich peptides (e.g. defensins), proline-rich peptides, and glycine-rich peptides8. Despite their diversity, AMPs share two common features: the tendency to adopt an amphipathic conformation and the presence of a large number of basic residues, which determine the net positive charge at neutral pH9. The electrostatic forces established between the positively charged amino acid residues of a peptide and the negative charges exposed on microbial cell surfaces allow the peptide to interact with bacterial membranes. Moreover, the cationic nature of these peptides allows interaction with the negatively charged molecules exposed on cancer cell surfaces, such as the phospholipid phosphatidylserine (PS), O-glycosylated mucins, sialylated gangliosides, and heparan sulfate, in contrast to the typically zwitterionic nature of normal mammalian membranes6,10,11. According to their mechanism of action, AMPs can be grouped into two categories12: (1) membranolytic peptides, whose action is described by three putative models, the “carpet”, “toroidal” and “barrel-stave” models13, and (2) non-membranolytic peptides, characterised by direct interaction with intracellular targets such as DNA, RNA and proteins14,15,16.

To date, more than 3000 AMPs have been discovered and reported to the Antimicrobial Peptide Database (APD, https://aps.unmc.edu/AP/), which contains 3104 AMPs from six kingdoms: 343 from bacteria, 5 from archaea, 8 from protists, 20 from fungi, 349 from plants, and 2301 from animals. The number of AMPs in insects varies according to the species: for example, more than 50 AMPs have been found in the invasive ladybird Harmonia axyridis17, whereas none was identified in the pea aphid Acyrthosiphon pisum18. The non-pest insect Hermetia illucens (Diptera: Stratiomyidae), also known as the Black Soldier Fly (BSF), is among the most promising sources of AMPs, being able to live in hostile environments rich in microbial colonies19. In this study, we analysed the larval and the combined adult male and female H. illucens transcriptomes in order to identify AMPs, which were then analysed with the CAMP (Collection of Antimicrobial Peptides) database (https://www.camp.bicnirrh.res.in/)20,21,22,23. Moreover, the iACP online tool (https://lin.uestc.edu.cn/server/iACP) was used to predict the anticancer activity of the identified peptides, the AVPpred server (https://crdd.osdd.net/servers/avppred) to predict their antiviral activity, and the Antifp server (https://webs.iiitd.edu.in/raghava/antifp) to predict their antifungal activity. Their physicochemical properties were evaluated with the Antimicrobial Peptide Database Calculator and Predictor (APD3).

Results

De novo transcriptome assembly and gene identification

Next-generation sequencing (RNAseq) of the RNA isolated from H. illucens larvae and from combined adult males and females was performed for an unambiguous identification of the peptide candidates. Sequencing and de novo assembly of the transcriptomes led to the identification of 25,197 unique nucleotide sequences (contigs) in the larval transcriptome, and 78,763 contigs in the combined adult transcriptome. These contigs were functionally annotated using the Blast2GO software (https://www.blast2go.org). A total of 68 genes encoding putative AMPs were finally identified in the H. illucens transcriptomes.

Antimicrobial, anticancer, antiviral and antifungal activity prediction

All 68 identified sequences encoding putative AMPs were analysed in silico with the four machine-learning algorithms available on the free online CAMP database, namely Support Vector Machine (SVM), Discriminant Analysis (DA), Artificial Neural Network (ANN), and Random Forest (RF), in order to predict their antimicrobial activity. The results are shown in Table 1. Table 2 reports the anticancer and non-anticancer scores obtained using the iACP tool. Table 3 shows the results obtained with the AVPpred server to predict the antiviral activity and with the Antifp server to predict the antifungal activity. These analyses allowed the identification of 57 putatively active peptides: 13 sequences were predicted to be only antimicrobial, while the others showed different combinations of antimicrobial, antiviral, anticancer or antifungal activity. In particular, 22 were putatively both antimicrobial and anticancer; eight were both antimicrobial and antiviral; two were both antimicrobial and antifungal; seven were antimicrobial, anticancer and antiviral; one was antimicrobial, antifungal and antiviral; two were antimicrobial, anticancer and antifungal; and two potentially cover the complete range of analysed biological activities (antimicrobial, anticancer, antifungal and antiviral). The remaining 11 did not show any activity according to the in silico investigation. All the predicted activities are listed in Supplementary Table S1.
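The breakdown of predicted activity combinations can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the paragraph above; the dictionary layout and the helper name `count_with` are illustrative choices, not part of the original analysis.

```python
# Counts of peptides per predicted activity combination, as reported in the text.
activity_counts = {
    ("antimicrobial",): 13,
    ("antimicrobial", "anticancer"): 22,
    ("antimicrobial", "antiviral"): 8,
    ("antimicrobial", "antifungal"): 2,
    ("antimicrobial", "anticancer", "antiviral"): 7,
    ("antimicrobial", "antifungal", "antiviral"): 1,
    ("antimicrobial", "anticancer", "antifungal"): 2,
    ("antimicrobial", "anticancer", "antifungal", "antiviral"): 2,
}
INACTIVE = 11  # sequences with no predicted activity

# Total number of putatively active peptides and of analysed sequences.
active_total = sum(activity_counts.values())
all_sequences = active_total + INACTIVE


def count_with(activity):
    """Number of peptides whose predicted combination includes `activity`."""
    return sum(n for combo, n in activity_counts.items() if activity in combo)


if __name__ == "__main__":
    print("putatively active:", active_total)
    print("all sequences:", all_sequences)
    for name in ("antimicrobial", "anticancer", "antiviral", "antifungal"):
        print(name, count_with(name))
```

The per-combination counts add up to the 57 active peptides, and together with the 11 inactive sequences they account for all 68 candidates.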

Table 1 Prediction of the antimicrobial activity through the CAMP database.
Table 2 Prediction of the anticancer activity through the iACP tool.
Table 3 Results obtained with the AVPpred server for the antiviral activity prediction and with Antifp server for the antifungal activity prediction.

Physicochemical properties of the identified peptides

The 57 identified, putatively active peptides belong to different classes of AMPs, including defensins, cecropins, attacins and lysozyme (Fig. 1). Although attacins and lysozyme are proteins rather than peptides because of their high molecular weight, they are classified as AMPs because of their antibacterial activity. The physicochemical properties of these peptides were evaluated with the Antimicrobial Peptide Database Calculator and Predictor APD3 (Table 4). Figure 2 shows the graphical representation of the calculated physicochemical properties of the 57 identified peptides, whereas Table 5 reports their amino acid composition and amino acid frequency, compared to the amino acid composition of the patent AMPs available in the APD database. The highest amino acid content in all the analysed AMPs was found for Gly, Ala, Arg, Asn, Cys, Leu and Ser residues, whereas the lowest content was found for His, Met, Trp and Tyr residues (Table 5). A graphical representation of the amino acid composition of each identified peptide is shown in Supplementary Fig. 1. The molecular mass of the identified peptides ranges from 3000 Da for the smallest peptide, Hill_BB_C7985, to 19,000 Da for the largest, Hill_BB_C9237, with an average of approximately 7000 Da. The peptide length varied from a minimum of 31 residues to a maximum of 186 residues, with an average of approximately 66 residues. The total hydrophobic ratio ranged from the lowest value of 26 for the peptide NHill_AD_C53857 to the highest of 60 for the peptide Hill_BB_C390, with an average value of approximately 40. The total net charge of the identified peptides ranged from −6 for the Hill_BB_C390 peptide to +9 for the Hill_BB_C14202 peptide, with an average value of +3, while the isoelectric point (pI) varied from 3.34 for the Hill_BB_C390 peptide to 11.83 for the NHill_AD_C12928 peptide, with an average value of 8.79.

Figure 1

Graphic representation of the identified AMP classes from larvae and adult transcriptomes. The pie chart shows that the largest number of identified peptides belongs to the class of defensins.

Table 4 Prediction of physicochemical properties using the Antimicrobial Peptide Database Calculator and Predictor (APD3) and the Compute pI/Mw tool—Expasy.
Figure 2

Graphical representation of the physicochemical properties of the 57 peptides with putative activity: (a) total hydrophobic ratio; (b) total net charge; (c) isoelectric point; (d) molecular weight; (e) peptide length; (f) Boman Index.

Table 5 Amino acid frequency and amino acid composition of the identified peptides.

Bacterial cell growth and viability

Four putative antimicrobial peptides, namely Hill_BB_C6571, Hill_BB_C16634, Hill_BB_C46948 and Hill_BB_C7985, which showed high antimicrobial score values with all prediction tools, were selected and chemically synthesised. The antimicrobial activity of these peptides was verified by monitoring the growth of E. coli cells in the presence of different concentrations of each peptide in comparison with untreated cells. Supplementary Fig. 2 shows the growth curves of E. coli cells in the presence of 3 µM (A) or 12 µM (B) of each peptide. Growth was clearly reduced at both concentrations compared with untreated cells (blue line); at 12 µM the bacteria failed to reach the exponential phase because of rapid death. The reduction in cell viability became more pronounced with increasing concentration of each peptide in comparison with untreated cells.

Next, the viability of E. coli cells was evaluated after treatment with 3 µM of each peptide (Supplementary Fig. 2C), confirming a decrease of about 50% in cell viability after 100 min of treatment with all four peptides analysed.

Discussion

AMPs are promising candidates as alternatives to conventional antibiotics, thanks to their low toxicity to eukaryotic cells and their broad spectrum of action against bacteria, mycobacteria, fungi, viruses and cancer cells24. AMPs can kill bacteria through different mechanisms including membrane disruption, targeting intracellular components, or interfering with the bacterial metabolism25,26,27. Furthermore, most AMPs are cationic, with the positive net charge promoting the electrostatic interaction with negatively charged bacterial membranes28.

All living organisms produce AMPs, with insects being among the richest sources due to their high biodiversity and their extremely varied living environments. The immune system of H. illucens is highly developed, as this species feeds on decaying substrates and manure, which are extremely rich in pathogenic microorganisms. A similar situation can be observed in other species, such as Eristalis tenax: twenty-two AMPs were identified in this dipteran, which has adapted to different aquatic habitats (sewage tanks and manure pits) with a heavy microbial load29. AMPs, which are synthesized by the fat body and hemocytes and then secreted into the hemolymph, are an essential part of the immune defense30, 31. In this study, we focused on the gene level in order to identify all putative genes encoding AMPs (Fig. 3).

Figure 3

Strategies carried out in order to identify peptides from Hermetia illucens insect.

The transcriptomes of H. illucens larvae and of the combined male and female adults were assembled, and all the obtained contigs were functionally annotated with the Blast2GO software, resulting in the identification of 68 putative peptides of interest. These sequences were analysed in silico through the CAMP database and the iACP online tool in order to evaluate their antimicrobial and anticancer activity, respectively. Additionally, the AVPpred and the Antifp servers were used to predict the antiviral and the antifungal activity, respectively, of the identified peptides. Our results led to the identification of 57 peptides: 13 were predicted to have only antimicrobial activity, 22 antimicrobial and anticancer activity, eight antimicrobial and antiviral activity, two antimicrobial and antifungal activity, and seven antimicrobial, anticancer and antiviral activity (Supplementary Table S1). Only one peptide was predicted to have antimicrobial, antiviral and antifungal activity, whereas two peptides were predicted to have antimicrobial, anticancer and antifungal activity (Supplementary Table S1). Surprisingly, two peptides, corresponding to the Hill_LB_C16634 and NHill_AD_C69719 contigs, were positive in all activity predictions (Supplementary Table S1). Most of the identified peptides belong to the defensin and cecropin families, whose members range from 34 to 51 amino acids in length32, 33. Insect defensins have a pattern of six cysteines, which form three disulphide bonds (Cys1–Cys4, Cys2–Cys5 and Cys3–Cys6)34. Insect defensins are active against Gram-negative bacteria such as Escherichia coli, but mainly against Gram-positive bacteria, such as Staphylococcus aureus, Micrococcus luteus, Bacillus subtilis, Bacillus thuringiensis, Aerococcus viridans and Bacillus megaterium. Moreover, some insect defensins are also active against fungi35,36,37,38,39. For example, the royalisin peptide, isolated from the royal jelly of Apis mellifera, consists of 51 amino acids, including six cysteine residues that form three disulphide bonds, and is active against Gram-positive bacteria and fungi40. Defensin targets have not been identified yet, and studies of the structure–activity relationship could be useful to understand the molecular mechanism underlying their bioactivity41.

Cecropins were first purified from the moth H. cecropia and represent the most abundant family of linear α-helical AMPs in insects, active against both Gram-negative and Gram-positive bacteria42. Insect cecropins, mainly derived from lepidopteran and dipteran species, are the cecropins A, B and D. These consist of 35–37 amino acids with no cysteine residues and are able to lyse the bacterial membrane and to reduce proline uptake. For example, cecropin B, a linear cationic peptide consisting of 35 amino acids, reduces the lethality of E. coli load and plasma endotoxin levels, and also shows antifungal activity against Candida albicans42,43. Moreover, a cecropin-like peptide isolated from the salivary glands of the female mosquito Aedes aegypti showed antiviral activity against the Dengue virus. Glycine is the most abundant residue among the peptides that we identified and is particularly associated with attacin proteins44,45. Although the mechanism of action of the different AMPs has not yet been fully elucidated, it appears that AMPs, unlike antibiotics, are less likely to induce microbial resistance, and most of them do not destroy normal cells of higher animals46. Recently, it has been demonstrated that the clavaspirin peptide from the tunicate Styela clava can kill drug-resistant pathogens, such as S. aureus, without detectable resistance47. Moreover, it was demonstrated that two proline-rich peptides (Lser-PRP2 and Lser-PRP3) do not interfere with protein synthesis, but both bind the bacterial chaperone DnaK and are therefore able to inhibit protein folding48. The characteristics of AMPs make them excellent candidates for the development of new drugs.

The bioinformatic approach represents a powerful tool to predict the physicochemical properties and the putative function of amino acid sequences. However, we aimed to go beyond simple functional annotation, which typically relies exclusively on sequence similarity to peptides deposited in public databases. Indeed, the approach we report is based on the use of several software tools, previously employed to perform similar analyses49,50,51, that exploit different algorithms to compute a score predicting the biological activity of unknown peptides. We demonstrated that such an approach can provide reliable indications about the potential biological activities of candidate AMPs, as confirmed by our preliminary tests on the antimicrobial activity of four identified AMPs (Supplementary Fig. 2). However, validation studies were beyond the scope of this study, which was essentially aimed at identifying a set of candidate peptides that could serve as a starting point for subsequent functional characterization of H. illucens AMPs by our group, as well as by other researchers in the field. Following the in silico analysis, the largest peptides could be produced by recombinant methodologies, while chemical synthesis could be used for the smaller ones. Structural analysis could be performed through mass spectrometry and circular dichroism (CD), and the produced peptides could be tested in vitro to validate their activity against different bacterial strains, both Gram-negative and Gram-positive, as well as cancer cell lines and fungi. Moreover, the peptides showing interesting biological activities could be produced in fusion with suitable tags to investigate their mechanism of action through functional proteomics experiments and advanced mass spectrometry methodologies, in order to characterise their interaction(s) with target proteins (mainly components of the biological membranes), thus identifying the possible protein targets.

Materials and methods

Rearing of Hermetia illucens and RNA isolation

Hermetia illucens larvae were reared on different diets in order to minimize the possible effect of a specific substrate on the expression of peptides, according to the protocol adopted by Vogel et al.52. The adults were reared in an environmental chamber under controlled conditions: temperature 27 ± 1.0 °C, humidity 70% ± 5%, and a photoperiod of 12:12 h [L:D]. Since it is not clear whether all AMPs are expressed in a similar fashion across different larval instars, RNA was obtained from two different instars, in order to identify the maximum number of expressed AMPs. Thus, using the TRI Reagent following the manufacturer’s instructions (Sigma, St. Louis, Missouri, USA), RNA was extracted from adults’ total body and from two larval stages: 2nd and 5th instar larvae whose isolated RNA was subsequently pooled in a 1:1 ratio for RNAseq. A DNase (Turbo DNase, Ambion Austin, Texas, USA) treatment was carried out to eliminate any contaminating DNA. After the DNase enzyme removal, the RNA was further purified using the RNeasy MinElute Clean up Kit (Qiagen, Venlo, Netherlands) following the manufacturer’s protocol, and eluted in 20 μL of RNA Storage Solution (Ambion Austin, Texas, USA). The RNA integrity was verified on an Agilent 2100 Bioanalyzer using the RNA Nano chips (Agilent Technologies, Palo Alto, CA), and the RNA quantity was determined by a Nanodrop ND1000 spectrophotometer.

RNA-Seq, de novo larvae and combined adult male and female transcriptomes assembly and gene identification

The transcriptome sequencing of all RNA samples was performed with poly(A)+ enriched mRNA fragmented to an average of 150 nucleotides. The sequencing was carried out by the Max Planck Genome Center (https://mpgc.mpipz.mpg.de/home/) using standard TruSeq procedures on an Illumina HiSeq2500 sequencer. The de novo transcriptome assembly was carried out using CLC Genomics Workbench v7.1 (https://www.clcbio.com), which is designed to assemble large transcriptomes using sequences from short-read sequencing platforms. All obtained sequences (contigs) were used as queries for a BLASTX search53 in the ‘National Center for Biotechnology Information’ (NCBI) non-redundant (nr) database, considering all hits with an E-value cut-off of 10⁻⁵. The transcriptomes were annotated using BLAST, Gene Ontology, and InterProScan searches using Blast2GO PRO v2.6.1 (https://www.blast2go.de)54. To optimize the annotation of the obtained data, GO slim was used, a subset of GO terms that provides a higher level of annotations and allows a more global view of the result. Candidate AMP genes were identified through an established reference set of insect-derived AMPs and lysozymes, with additional filtering steps to avoid interpreting incomplete genes or allelic variants as further AMP genes52.
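As an illustration of the E-value cut-off described above, the following minimal sketch filters BLAST hits at 10⁻⁵. It assumes the hits were exported in the standard 12-column tabular BLAST format, in which the E-value is the 11th column; the source does not state the output format used, and the function name, the inclusive comparison and the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits with E-value <= cutoff.

    Assumes 12-column tab-separated BLAST output (E-value in column 11).
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= cutoff:
            kept.append((row[0], row[1], evalue))
    return kept


# Two made-up hits: the first passes the cut-off, the second does not.
example = (
    "contig_1\tsubject_A\t85.0\t60\t9\t0\t1\t180\t1\t60\t2e-30\t120\n"
    "contig_2\tsubject_B\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.5\t25.0\n"
)

if __name__ == "__main__":
    for query, subject, evalue in filter_hits(example):
        print(query, subject, evalue)
```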

In silico analysis for the antimicrobial, anticancer, antiviral and antifungal activity prediction

The sequences functionally annotated as antimicrobial peptides by the Blast2GO software were analysed with the ProP 1.0 server55 and the SignalP 4.0 server56 in order to identify the signal peptide and the pro-peptide region. The mature and active peptide regions were analysed in silico by four machine-learning algorithms available on the CAMP database57: Support Vector Machine (SVM), Discriminant Analysis (DA), Artificial Neural Network (ANN), and Random Forest (RF), in order to predict their antimicrobial activity. A score of 0.5 is the minimum threshold for a sequence to be considered antimicrobial67,68,69. When the sequences were analysed with the algorithms, those with a score higher than 0.5 were automatically considered putative antimicrobials by the software. This threshold is intrinsically set by the software and cannot be modified by the user. It applies to the SVM, RF and DA algorithms, which report the result in numerical form (score), whereas the ANN algorithm provides the result as a category, namely either AMP (antimicrobial) or NAMP (not antimicrobial). All sequences that gave a positive result with all four methods were considered antimicrobial. The iACP tool58,59,60,61,62 was used to predict the anticancer activity of the same sequences, providing the results in numerical form. The prediction of the antiviral activity was performed in silico with the online server AVPpred, which exploits four different models: (1) the AVP motif model, which returns the result as YES or NO; (2) the Alignment model, which gives the result as AVP or Non-AVP; (3) the Composition model and (4) the Physico-chemical model, which return their results in numerical form (percentage). The overall result is expressed as YES if the peptide has a putative antiviral activity and as NO otherwise63. The Antifp server was used to predict the antifungal activity and provides the result as a numerical score64. For this analysis, a threshold of 0.5 was used.
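The consensus rule for the antimicrobial prediction can be sketched as follows. This is a minimal illustration under the criteria stated above (SVM, RF and DA scores higher than 0.5, and an ANN label of AMP); the function name and the example values are hypothetical and do not reproduce the CAMP implementation.

```python
THRESHOLD = 0.5  # fixed threshold applied to the SVM, RF and DA scores


def is_putative_amp(svm, rf, da, ann_label):
    """Return True only if all four CAMP algorithms give a positive result.

    svm, rf, da: numerical scores; ann_label: "AMP" or "NAMP".
    """
    numeric_positive = all(score > THRESHOLD for score in (svm, rf, da))
    return numeric_positive and ann_label == "AMP"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(is_putative_amp(0.91, 0.88, 0.97, "AMP"))   # positive with all four methods
    print(is_putative_amp(0.91, 0.42, 0.97, "AMP"))   # RF score below the threshold
    print(is_putative_amp(0.91, 0.88, 0.97, "NAMP"))  # rejected by the ANN
```

Requiring agreement among all four methods is a conservative choice: a single negative call is enough to exclude a sequence.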

Evaluation of the physicochemical properties

The physicochemical properties of the peptides identified as putatively active by the in silico analysis, namely peptide length, molecular weight, total hydrophobic ratio, total net charge, isoelectric point, and the Boman Index, were determined with the Antimicrobial Peptide Database Calculator and Predictor (APD3)65,66,67 and the Expasy Compute pI/Mw tool68, 69.
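To make some of these quantities concrete, the sketch below computes peptide length, an approximate net charge and a hydrophobic ratio from a sequence. It is not the APD3 algorithm: the net charge is simply the number of Lys and Arg residues minus the number of Asp and Glu residues, and the set of residues counted as hydrophobic (A, C, F, I, L, M, V, W) is an assumption made for illustration. The example sequence is made up.

```python
# Assumption: residues treated as hydrophobic for this illustration.
HYDROPHOBIC = set("ACFILMVW")


def peptide_properties(sequence):
    """Return length, approximate net charge and hydrophobic ratio (%)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(seq.count(aa) for aa in "KR")  # basic residues
    negative = sum(seq.count(aa) for aa in "DE")  # acidic residues
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "net_charge": positive - negative,
        "hydrophobic_ratio": round(100 * hydrophobic / len(seq)),
    }


if __name__ == "__main__":
    # Hypothetical 10-residue cationic peptide.
    print(peptide_properties("KWKLFKKIEK"))
```

Molecular weight, isoelectric point and the Boman Index require per-residue mass, pKa and solubility tables and are left to the dedicated tools.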

Bacterial cell growth and viability

Four putative antimicrobial peptides, namely Hill_BB_C6571, Hill_BB_C16634, Hill_BB_C46948 and Hill_BB_C7985, which showed high antimicrobial score values with all prediction tools, were selected and chemically synthesised (Bio-Fab Research, Rome, Italy). E. coli cells were incubated overnight in LB medium at 37 °C. The culture was then diluted to a concentration of 0.08 OD600/mL in fresh medium and grown at 37 °C for 90 min. At an OD600/mL value of 0.5, the antimicrobial peptides were added to the culture at a final concentration of 3 or 12 µM. Growth of the culture was evaluated every 20 min for a total of 120 min by measuring absorbance at 600 nm.

Cell viability was evaluated by enumerating Colony Forming Units (CFU) after 16 h of incubation with 3 µM of each peptide. Serial dilutions of the bacterial cultures, up to 10⁻⁶, were prepared for both treated and untreated samples. Finally, 100 µL of each sample was plated on LB agar every 20 min for a total of 100 min. Plates were incubated for 16 h at 37 °C and the CFUs on each plate were then counted. Experiments were performed in triplicate.
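The colony counts can be converted to viable cell concentrations with the usual plate-count relation, CFU/mL = colonies / (dilution × plated volume). The sketch below assumes the 100 µL plated volume stated above; the function names and the colony counts are hypothetical and are not data from this study.

```python
def cfu_per_ml(colonies, dilution, plated_volume_ml=0.1):
    """Viable cell concentration of the undiluted culture.

    colonies: colonies counted on the plate
    dilution: dilution of the plated sample (e.g. 1e-6)
    plated_volume_ml: volume spread on the plate (100 µL = 0.1 mL)
    """
    return colonies / (dilution * plated_volume_ml)


def percent_viability(treated_cfu_ml, untreated_cfu_ml):
    """Viability of treated cells relative to the untreated control (%)."""
    return 100.0 * treated_cfu_ml / untreated_cfu_ml


if __name__ == "__main__":
    # Made-up colony counts for one time point.
    untreated = cfu_per_ml(colonies=50, dilution=1e-6)
    treated = cfu_per_ml(colonies=25, dilution=1e-6)
    print(percent_viability(treated, untreated))
```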