A bioinformatic study of antimicrobial peptides identified in the Black Soldier Fly (BSF) Hermetia illucens (Diptera: Stratiomyidae)

Antimicrobial peptides (AMPs) play a key role in innate immunity, the first line of defense against bacteria, fungi, and viruses. AMPs are small molecules, ranging from 10 to 100 amino acid residues, produced by all living organisms. Because of their wide biodiversity, insects are among the richest and most innovative sources of AMPs. In particular, the insect Hermetia illucens (Diptera: Stratiomyidae) shows an extraordinary ability to live in hostile environments, as it feeds on decaying substrates, which are rich in microbial colonies, and is one of the most promising sources of AMPs. The larval and the combined adult male and female H. illucens transcriptomes were examined, and all the sequences putatively encoding AMPs were analysed with different machine-learning algorithms available on the CAMP database, namely the Support Vector Machine, Discriminant Analysis, Artificial Neural Network, and Random Forest, in order to predict their antimicrobial activity. Moreover, the iACP tool, the AVPpred server, and the Antifp server were used to predict the anticancer, the antiviral, and the antifungal activities, respectively. The related physicochemical properties were evaluated with the Antimicrobial Peptide Database Calculator and Predictor. These analyses allowed the identification of 57 putatively active peptides suitable for subsequent experimental validation studies.


De novo transcriptome assembly and gene identification. Next-generation sequencing (RNAseq) of the RNA isolated from H. illucens larvae and from combined adult males and females was performed for an unambiguous identification of the peptide candidates. Sequencing and de novo assembly of the transcriptomes led to the identification of 25,197 unique nucleotide sequences (contigs) in the larvae transcriptome, and 78,763 contigs in the combined adults. These contigs were functionally annotated using Blast2GO software (https://www.blast2go.org). A total of 68 genes, encoding putative AMPs in the H. illucens transcriptomes, were finally identified.
Antimicrobial, anticancer, antiviral and antifungal activity prediction. All 68 identified sequences encoding putative AMPs were analysed in silico by four machine-learning algorithms, namely Support Vector Machine (SVM), Discriminant Analysis (DA), Artificial Neural Network (ANN), and Random Forest (RF), available on the free online CAMP database, in order to predict their antimicrobial activity. The results are shown in Table 1. Table 2 reports the anticancer and non-anticancer scores obtained using the iACP tool. Table 3 shows the results obtained with the AVPpred server to predict the antiviral activity and with the Antifp server to predict the antifungal activity. These analyses allowed the identification of 57 putatively active peptides: 13 sequences were predicted to be only antimicrobial, while the others showed different combinations of antimicrobial, antiviral, anticancer or antifungal activity. In particular, 22 were both putative antimicrobial and anticancer; eight were both putative antimicrobial and antiviral; two were both putative antimicrobial and antifungal; seven were putative antimicrobial, anticancer and antiviral; one was putative antimicrobial, antifungal and antiviral; two were putative antimicrobial, anticancer and antifungal; and two potentially cover the complete range of analyzed biological activities (antimicrobial, anticancer, antifungal and antiviral). The remaining 11 did not show any activity according to the in silico investigation. All the predicted activities are listed in Supplementary Table S1.
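The breakdown above can be checked with a short tally. The following Python sketch is only an illustration: the counts are those reported in the text, whereas the data structure and the helper name `count_with` are hypothetical and are not part of the analysis pipeline.

```python
# Counts of predicted activity combinations, as reported in the text.
# The dictionary layout and helper below are illustrative assumptions.
combination_counts = {
    frozenset({"antimicrobial"}): 13,
    frozenset({"antimicrobial", "anticancer"}): 22,
    frozenset({"antimicrobial", "antiviral"}): 8,
    frozenset({"antimicrobial", "antifungal"}): 2,
    frozenset({"antimicrobial", "anticancer", "antiviral"}): 7,
    frozenset({"antimicrobial", "antifungal", "antiviral"}): 1,
    frozenset({"antimicrobial", "anticancer", "antifungal"}): 2,
    frozenset({"antimicrobial", "anticancer", "antifungal", "antiviral"}): 2,
}

TOTAL_CANDIDATES = 68  # putative AMP sequences identified in the transcriptomes

# Peptides with at least one predicted activity, and those with none.
active = sum(combination_counts.values())
inactive = TOTAL_CANDIDATES - active


def count_with(activity):
    """Number of peptides whose predicted combination includes `activity`."""
    return sum(n for combo, n in combination_counts.items() if activity in combo)


if __name__ == "__main__":
    print("putatively active:", active)
    print("no predicted activity:", inactive)
    for name in ("anticancer", "antiviral", "antifungal"):
        print(name, count_with(name))
```

The per-activity totals printed by the loop are derived by summation from the reported combinations and are not figures stated separately in the text.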
Physicochemical properties of the identified peptides. The 57 identified, putatively active peptides belong to different classes of AMPs, including defensins, cecropins, attacins and lysozyme (Fig. 1). Although attacins and lysozyme are, owing to their high molecular weight, proteins rather than peptides, they are classed as AMPs because of their antibacterial activity. The physicochemical properties of these peptides were evaluated with the Antimicrobial Peptide Database Calculator and Predictor APD3 (Table 4). Figure 2 shows the graphical representation of the calculated physicochemical properties of the 57 identified peptides, whereas Table 5 reports their amino acid composition and the amino acid frequency, compared to the amino acid composition of the patent AMPs available in the APD database. The highest amino acid content in all the analysed AMPs was found for Gly, Ala, Arg, Asn, Cys, Leu and Ser residues, whereas the lowest content was found for His, Met, Trp and Tyr residues (Table 5). A graphical representation of the amino acid composition of each identified peptide is shown in Supplementary Fig. 1. The molecular mass of the identified peptides ranges from 3000 Da for the smallest peptide, Hill_BB_C7985, to 19,000 Da for the largest peptide, Hill_BB_C9237, with an average of approximately 7000 Da. The amino acid sequences varied in length from a minimum of 31 residues to a maximum of 186 residues.
Bacterial cell growth and viability. Four putative antimicrobial peptides, namely Hill_BB_C6571, Hill_BB_C16634, Hill_BB_C46948 and Hill_BB_C7985, which showed high antimicrobial score values with all prediction tools, were selected and chemically synthesised. The antimicrobial activity of these peptides was verified by monitoring E. coli cell growth in the presence of different concentrations of each peptide in comparison with untreated cells. Supplementary Fig. 2 shows the growth curves of E. coli cells in the presence of 3 µM (A) or 12 µM (B) of each peptide.
A clear decrease in the growth curves was observed at both concentrations compared to untreated cells (blue line), with bacteria unable to reach the exponential phase at 12 µM owing to rapid death. Cell viability decreased with increasing concentration of each peptide in comparison with untreated cells. Next, the viability of E. coli cells was evaluated after treatment with 3 µM of each peptide (Supplementary Fig. 2C), confirming a decrease of about 50% in cell viability after 100 min of treatment with all four peptides analysed.

Discussion
AMPs are promising candidates as alternatives to conventional antibiotics, thanks to their low toxicity to eukaryotic cells and their broad spectrum of action against bacteria, mycobacteria, fungi, viruses and cancer cells 24 . AMPs can kill bacteria through different mechanisms including membrane disruption, targeting intracellular components, or interfering with the bacterial metabolism [25][26][27] . Furthermore, most AMPs are cationic, with the positive net charge promoting the electrostatic interaction with negatively charged bacterial membranes 28 .
All living organisms produce AMPs, with insects being among the richest sources owing to their high biodiversity and their extremely varied living environments. The immune system of H. illucens is highly developed, as this species feeds on decaying substrates and manure, which are extremely rich in pathogenic microorganisms. A similar adaptation can be observed in other species, such as Eristalis tenax: twenty-two AMPs were identified in this dipteran, which has adapted to different aquatic habitats (sewage tanks and manure pits) with a heavy microbial load 29 . AMPs, which are synthesized by the fat body and hemocytes and then secreted into the hemolymph, are an essential part of the immune defense 30,31 . In this study, we focused on the gene level in order to identify all putative genes encoding AMPs (Fig. 3).
The transcriptomes of H. illucens larvae and of the combined male and female adults were assembled, and all the obtained contigs were functionally annotated with the Blast2GO software, resulting in the identification of 68 putative peptides of interest. These sequences were analyzed in silico through the CAMP database and the iACP online tool in order to evaluate their antimicrobial and anticancer activity, respectively. Additionally, the AVPpred and the Antifp servers were used to predict the antiviral and the antifungal activity, respectively, of the identified peptides. Our results led to the identification of 57 peptides: 13 were predicted to have antimicrobial activity only, 22 antimicrobial and anticancer activity, eight antimicrobial and antiviral activity, two antimicrobial and antifungal activity, and seven antimicrobial, anticancer and antiviral activity (Supplementary Table S1). Only one peptide was predicted to have antimicrobial, antiviral and antifungal activity, whereas two peptides were predicted to have antimicrobial, anticancer and antifungal activity (Supplementary Table S1). Surprisingly, two peptides, corresponding to the Hill_LB_C16634 and NHill_AD_C69719 contigs, were positive in all activity predictions (Supplementary Table S1). Most of the identified peptides belong to the defensin and cecropin families, whose length ranges from 34 to 51 amino acids 32,33 . Insect defensins have a pattern of six cysteines, which are involved in the formation of three disulphide bonds, Cys1-Cys4, Cys2-Cys5 and Cys3-Cys6 34 . Insect defensins are active against Gram-negative bacteria such as Escherichia coli, but mainly against Gram-positive bacteria, such as Staphylococcus aureus, Micrococcus luteus, Bacillus subtilis, Bacillus thuringiensis, Aerococcus viridans and Bacillus megaterium. Moreover, some insect defensins are also active against fungi [35][36][37][38][39] .
For example, the royalisin peptide, isolated from the royal jelly of Apis mellifera, consists of 51 amino acids; its six cysteine residues are involved in the formation of three disulphide bonds, and the peptide is active against Gram-positive bacteria and fungi 40 . Defensin targets have not been identified yet, and studies of the structure-activity relationship could be useful to understand the molecular mechanism underlying their bioactivity 41 .
Cecropins were first purified from the moth Hyalophora cecropia and represent the most abundant family of linear α-helical AMPs in insects, active against both Gram-negative and Gram-positive bacteria 42 . The insect cecropins, mainly derived from lepidopteran and dipteran species, are cecropins A, B and D. These consist of 35-37 amino acids with no cysteine residues and are able to lyse the bacterial membrane and to reduce proline uptake. For example, cecropin B, a linear cationic peptide consisting of 35 amino acids, reduces the lethality of an E. coli load as well as plasma endotoxin levels, and also shows antifungal activity against Candida albicans 42,43 . Moreover, a cecropin-like peptide showing antiviral activity against the Dengue virus was isolated from the salivary glands of the female mosquito Aedes aegypti. Glycine is the most abundant residue among the peptides that we identified and is particularly associated with attacin proteins 44,45 . Although the mechanism of action of the different AMPs has not yet been fully elucidated, it appears that AMPs, unlike antibiotics, are less likely to induce microbial resistance, and most of them do not destroy normal cells of higher animals 46 . Recently, it has been demonstrated that the clavaspirin peptide from the tunicate Styela clava is able to kill drug-resistant pathogens, such as S. aureus, without detectable resistance 47 . Moreover, it was demonstrated that two proline-rich peptides (Lser-PRP2 and Lser-PRP3) do not interfere with protein synthesis, but both are able to bind the bacterial chaperone DnaK and therefore to inhibit protein folding 48 . The characteristics of AMPs make them excellent candidates for the development of new drugs. The bioinformatic approach represents a powerful tool to predict the physicochemical properties and the putative function of amino acid sequences.
However, we aimed to go beyond simple functional annotation, which typically relies exclusively on sequence similarity to peptides deposited in public databases. Indeed, the approach we reported is based on the use of several software tools, previously employed to perform similar analyses [49][50][51] , that exploit different algorithms to determine a score predicting the biological activity of unknown peptides. We demonstrated that such an approach can provide reliable indications about the potential biological activities of candidate AMPs, as confirmed by our preliminary tests on the antimicrobial activity of four identified AMPs (Supplementary Fig. 2). However, validation studies were beyond the scope of this study, which essentially aimed to identify a set of candidate peptides that could serve as a starting point for subsequent functional characterization of H. illucens AMPs by our group, as well as by other researchers in the field. Indeed, following the in silico analysis, the largest peptides could be produced by recombinant methodologies, while chemical synthesis could be used for the smaller ones. Structural analysis could be performed through mass spectrometry and circular dichroism (CD), and the biological activity could be evaluated by in vitro tests.
Table 3. Results obtained with the AVPpred server for the antiviral activity prediction and with the Antifp server for the antifungal activity prediction. From left to right are shown in order: peptide contig, AVP motif model results, alignment model results, composition model results, physico-chemical model results, the overall results for the antiviral prediction, antifungal score and prediction result for the antifungal activity.
The produced peptides could, in fact, be tested in vitro to validate their activity against different bacterial strains, both Gram-negative and Gram-positive, as well as cancer cell lines and fungi. Moreover, the peptides showing interesting biological activities could be produced in fusion with suitable tags to investigate their mechanism of action through functional proteomics experiments and advanced mass spectrometry methodologies, in order to characterise their interaction(s) with target proteins (mainly components of the biological membranes), thus identifying the possible protein targets.
Table 4. Prediction of physicochemical properties using the Antimicrobial Peptide Database Calculator and Predictor (APD3) and the Compute pI/Mw tool-Expasy. From left to right are shown in order: peptide contig, the peptide length, the molecular weight, the total hydrophobic ratio, the total net charge, the isoelectric point (pI) and the Boman index.

Materials and methods
Rearing of Hermetia illucens and RNA isolation. Hermetia illucens larvae were reared on different diets in order to minimize the possible effect of a specific substrate on the expression of peptides, according to the protocol adopted by Vogel et al. 52 . The adults were reared in an environmental chamber under controlled conditions: temperature 27 ± 1.0 °C, humidity 70% ± 5%, and a photoperiod of 12:12 h [L:D]. Since it is not clear whether all AMPs are expressed in a similar fashion across different larval instars, RNA was obtained from two different instars, in order to identify the maximum number of expressed AMPs. Thus, using the TRI Reagent following the manufacturer's instructions (Sigma, St. Louis, Missouri, USA), RNA was extracted from the adults' total body and from two larval stages, 2nd and 5th instar larvae, whose isolated RNA was subsequently pooled in a 1:1 ratio for RNAseq. A DNase (Turbo DNase, Ambion Austin, Texas, USA) treatment was carried out to eliminate any contaminating DNA. After removal of the DNase enzyme, the RNA was further purified using the RNeasy MinElute Clean up Kit (Qiagen, Venlo, Netherlands) following the manufacturer's protocol, and eluted in 20 μL of RNA Storage Solution (Ambion Austin, Texas, USA). The RNA integrity was verified on an Agilent 2100 Bioanalyzer using the RNA Nano chips (Agilent Technologies, Palo Alto, CA), and the RNA quantity was determined with a Nanodrop ND1000 spectrophotometer.
Table 5. Amino acid frequency and amino acid composition of the identified peptides. As shown, the Gly, Ala, Arg, Asn, Cys, Leu and Ser residues are the most abundant, whereas the lowest content is associated with the His, Met, Trp and Tyr residues.

Amino acid composition of peptides identified in Hermetia illucens
RNA-Seq, de novo assembly of the larval and combined adult male and female transcriptomes, and gene identification. The transcriptome sequencing of all RNA samples was performed with poly(A)+ enriched mRNA fragmented to an average of 150 nucleotides. The sequencing was carried out by the Max Planck Genome Center (https://mpgc.mpipz.mpg.de/home/) using standard TruSeq procedures on an Illumina HiSeq2500 sequencer. The de novo transcriptome assembly was carried out using CLC Genomics Workbench v7.1 (https://www.clcbio.com), which is designed to assemble large transcriptomes using sequences from short-read sequencing platforms. All obtained sequences (contigs) were used as queries for a BLASTX search 53 in the 'National Center for Biotechnology Information' (NCBI) non-redundant (nr) database, considering all hits with an E-value cut-off of 10⁻⁵. The transcriptomes were annotated using BLAST, Gene Ontology, and InterProScan searches using Blast2GO PRO v2.6.1 (https://www.blast2go.de) 54 . To optimize the annotation of the obtained data, we used GO slim, a subset of GO terms that provides a higher level of annotation and allows a more global view of the result. Candidate AMP genes were identified through an established reference set of insect-derived AMPs and lysozymes, with additional filtering steps to avoid interpreting incomplete genes or allelic variants as further AMP genes 52 .
In silico analysis for the antimicrobial, anticancer, antiviral and antifungal activity prediction. The sequences functionally annotated as antimicrobial peptides by the Blast2GO software were analysed with the ProP 1.0 55 and SignalP 4.0 56 servers in order to identify the signal peptide and the pro-peptide region.
The mature and active peptide regions were analysed in silico by four machine-learning algorithms available on the CAMP database 57 , namely Support Vector Machine (SVM), Discriminant Analysis (DA), Artificial Neural Network (ANN), and Random Forest (RF), in order to predict their antimicrobial activity. The minimum score for a sequence to be considered antimicrobial is 0.5 [67][68][69] : when the sequences were analyzed with the algorithms, those with a score higher than 0.5 were automatically considered putative antimicrobials by the software. We would like to point out that the threshold is intrinsically set by the software and cannot be modified by the user. This holds for the SVM, RF and DA algorithms, which report the result in numerical form (score), whereas the ANN algorithm provides the result as a category, namely either AMP (antimicrobial) or NAMP (not antimicrobial). All sequences that showed a positive result with all four statistical methods were considered antimicrobial. The iACP tool [58][59][60][61][62] was used to predict the anticancer activity of the same sequences, providing the results in numerical form. The prediction of the antiviral activity was performed in silico with the online server AVPpred. It exploits four different models: (1) the AVP motif model, which returns the result as YES or NO; (2) the Alignment model, which gives the result in the form AVP or Non-AVP; (3) the Composition model and (4) the Physico-chemical model, both of which return their results in numerical form (percentage). The overall result is expressed as YES if the peptide is predicted to have antiviral activity, and as NO otherwise 63 . The Antifp server was used to predict the antifungal activity, and provides the result as a numerical score 64 . For this analysis, a threshold of 0.5 was used.
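The consensus criterion applied to the CAMP predictions can be summarised in a few lines of Python. This is a minimal sketch of the decision rule as described above, not code from the CAMP server: the class `CampResult`, the function `is_putative_amp` and the example scores are hypothetical, and only the 0.5 threshold and the AMP/NAMP categories come from the text.

```python
from dataclasses import dataclass

# Threshold stated in the text for the numerical SVM, RF and DA scores.
CAMP_THRESHOLD = 0.5


@dataclass
class CampResult:
    """Hypothetical container for the four CAMP outputs of one peptide."""
    svm: float  # Support Vector Machine score
    rf: float   # Random Forest score
    da: float   # Discriminant Analysis score
    ann: str    # Artificial Neural Network category: "AMP" or "NAMP"


def is_putative_amp(result, threshold=CAMP_THRESHOLD):
    """Return True only if all four algorithms give a positive result.

    The three numerical scores must be higher than the threshold and the
    ANN category must be "AMP".
    """
    numeric_ok = all(score > threshold
                     for score in (result.svm, result.rf, result.da))
    return numeric_ok and result.ann == "AMP"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    examples = {
        "peptide_1": CampResult(svm=0.93, rf=0.81, da=0.77, ann="AMP"),
        "peptide_2": CampResult(svm=0.93, rf=0.42, da=0.77, ann="AMP"),
        "peptide_3": CampResult(svm=0.93, rf=0.81, da=0.77, ann="NAMP"),
    }
    for name, result in examples.items():
        print(name, is_putative_amp(result))
```

Requiring agreement among all four algorithms is a conservative choice: a single negative prediction is enough to exclude a sequence from the set of putative antimicrobials.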
Evaluation of the physicochemical properties. The physicochemical properties of the peptides identified as putatively active by the in silico analysis, namely peptide length, molecular weight, total hydrophobic ratio, total net charge, isoelectric point, and the Boman index, were determined with the Antimicrobial Peptide Database Calculator and Predictor (APD3) [65][66][67] and the Compute pI/Mw tool-Expasy 68,69 .
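To illustrate the simplest of these descriptors, the Python sketch below computes the peptide length, an approximate net charge, a hydrophobic ratio and the amino acid composition of a sequence. It is not the APD3 implementation: the set of residues treated as hydrophobic, the charge rule (Lys and Arg counted as +1, Asp and Glu as -1, His and termini ignored) and the example sequence are assumptions made for illustration, and the molecular weight, isoelectric point and Boman index are deliberately left out.

```python
from collections import Counter

# Assumption: residues treated as hydrophobic in this sketch.
HYDROPHOBIC = set("AILMFVWC")
# Assumption: simple charge rule, ignoring His and the peptide termini.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence):
    """Approximate net charge: (Lys + Arg) minus (Asp + Glu)."""
    seq = sequence.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_ratio(sequence):
    """Percentage of residues belonging to the assumed hydrophobic set."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def composition(sequence):
    """Amino acid composition as a dict of residue -> percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


if __name__ == "__main__":
    peptide = "GKRLACKS"  # made-up sequence, not one of the identified peptides
    print("length:", len(peptide))
    print("net charge:", net_charge(peptide))
    print("hydrophobic ratio (%):", hydrophobic_ratio(peptide))
    print("composition (%):", composition(peptide))
```

Values obtained in this way should be regarded as rough estimates and may differ from those returned by APD3 or Expasy, which apply their own definitions.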
Bacterial cell growth and viability. Four putative antimicrobial peptides, namely Hill_BB_C6571, Hill_BB_C16634, Hill_BB_C46948 and Hill_BB_C7985, which showed high antimicrobial score values with all prediction tools, were selected and chemically synthesised (Bio-Fab Research, Rome, Italy). E. coli cells were incubated overnight in LB medium at 37 °C. The culture was then diluted to a concentration of 0.08 OD600/mL in fresh medium and grown at 37 °C for 90 min. At an OD/mL value of 0.5, the antimicrobial peptides were added to the culture at a final concentration of 3 or 12 µM. Growth of the culture was evaluated every 20 min for a total of 120 min by measuring the absorbance at 600 nm. Cell viability was evaluated by enumerating Colony Forming Units (CFU) after 16 h of incubation with 3 µM of each peptide. Serial dilutions of the bacterial cultures, up to a dilution of 10⁻⁶, were prepared for both treated and untreated samples. Finally, 100 µL of each sample was plated on LB agar every 20 min for a total of 100 min. Plates were incubated for 16 h at 37 °C and the CFUs occurring on each plate were then counted. Experiments were performed in triplicate.
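The conversion of colony counts into viable cell concentrations follows the usual plate-count arithmetic, sketched below in Python. The 100 µL plated volume and the triplicate design come from the protocol above; the function names, the interpretation of 10⁻⁶ as the dilution of the plated sample and the colony counts in the example are assumptions for illustration and are not data from this study.

```python
from statistics import mean

PLATED_VOLUME_ML = 0.1  # 100 µL plated per sample, as in the protocol


def cfu_per_ml(colonies, dilution, plated_volume_ml=PLATED_VOLUME_ML):
    """CFU/mL of the undiluted culture from one plate count.

    `dilution` is the dilution of the plated sample (e.g. 1e-6).
    """
    if colonies < 0 or dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("colonies must be >= 0; dilution and volume must be > 0")
    return colonies / (dilution * plated_volume_ml)


def mean_cfu_per_ml(replicate_counts, dilution,
                    plated_volume_ml=PLATED_VOLUME_ML):
    """Average CFU/mL over replicate plates (triplicates in the protocol)."""
    return mean(cfu_per_ml(c, dilution, plated_volume_ml)
                for c in replicate_counts)


def percent_viability(treated_cfu_per_ml, untreated_cfu_per_ml):
    """Viability of treated cells relative to untreated cells, in percent."""
    if untreated_cfu_per_ml <= 0:
        raise ValueError("untreated CFU/mL must be > 0")
    return 100.0 * treated_cfu_per_ml / untreated_cfu_per_ml


if __name__ == "__main__":
    # Made-up colony counts for one time point, plated at a 1e-6 dilution.
    untreated = mean_cfu_per_ml([48, 50, 52], dilution=1e-6)
    treated = mean_cfu_per_ml([24, 25, 26], dilution=1e-6)
    print("untreated CFU/mL:", untreated)
    print("treated CFU/mL:", treated)
    print("viability (%):", percent_viability(treated, untreated))
```

Expressing viability relative to the untreated culture at the same time point keeps the comparison independent of the absolute cell density reached by the culture.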