Building HMM and molecular docking analysis for the sensitive detection of anti-viral pneumonia antimicrobial peptides (AMPs)

Pneumonia is the main cause of mortality among children under five years of age, causing 1.6 million deaths every year, and recent research has shown that mortality is increasing in the elderly. Several biomarkers used for its diagnosis lack specificity and precision because they are also associated with other infections, for example, pulmonary tuberculosis and Human Immunodeficiency Virus. New biomarkers are therefore being sought worldwide to overcome these limitations. Antimicrobial peptides (AMPs) are promising diagnostic agents against infection. This work used AMPs as biomarkers to detect viral pneumonia pathogens, namely Respiratory syncytial virus and Influenza A and B viruses, using in silico technologies such as profile Hidden Markov Models (HMMER). HMMER was used to identify putative anti-viral pneumonia AMPs against the known receptor proteins of Respiratory syncytial virus and Influenza A and B viruses. The physicochemical parameters of these putative AMPs were analyzed, and their 3-D structures were predicted using I-TASSER. Molecular docking of these AMPs against the identified viral pneumonia proteins was carried out using the PATCHDOCK and HDock servers. The results yielded 27 anti-viral AMPs, ranked by E-value, whose physicochemical parameters were similar to those of known experimentally validated AMPs. The AMPs also had a high predicted binding potential to the pneumonia receptors of these pathogens. The tendency of the putative anti-viral AMPs to bind pneumonia proteins suggests that they are promising candidate biomarkers for identifying these viruses in point-of-care (POC) pneumonia diagnostics. The high precision observed for the AMPs supports the use of HMMs in the discovery of diagnostic agents.


Materials and methods
Data retrieval (literature mining). Experimentally validated anti-pneumonia AMPs for the viral pathogens (Respiratory syncytial virus and Influenza A and B viruses) were retrieved from antimicrobial peptide databases such as the Antimicrobial Peptide Database (APD3) 19,20, the Collection of Antimicrobial Peptides (CAMP) 21, and the Anti-viral peptides database (AVPDB) 22. Curation through literature mining established whether each retrieved AMP was experimentally validated or only predicted. Duplicate experimentally validated AMPs were then removed from the list using the Cluster Database at High Identity with Tolerance (CD-HIT) 23.
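CD-HIT clusters sequences at a user-chosen identity threshold. As a much simpler stand-in, the sketch below removes only exact duplicates (100% identity), the case that arises when the same peptide is listed in more than one of APD3, CAMP and AVPDB. It is not the CD-HIT algorithm, and the function names and record IDs are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (FASTA header, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, upper-case sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def remove_exact_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each distinct sequence (100% identity only)."""
    seen = set()
    unique: List[Record] = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Hypothetical IDs: the same peptide (magainin 2) retrieved from two databases.
example = ">db1_x\nGIGKFLHSAKKFGKAFVGEIMNS\n>db2_y\ngigkflhsakkfgkafvgeimns\n"
print(remove_exact_duplicates(read_fasta(example)))
```

Clustering at lower identity thresholds, as CD-HIT does, would additionally merge near-identical variants; that step is not reproduced here.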
Training and testing datasets (data mining). The final list of experimentally validated AMPs was sorted by target pathogen: INFA, anti-Influenza A; INFB, anti-Influenza B; and RSV, anti-Respiratory Syncytial Virus [24][25][26]. Each strain-specific dataset was randomly split into two subsets: 75% of each dataset was used as the training set (to build the profile), and the remaining 25% as the testing set.
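The 75/25 split can be sketched as follows, assuming each class is shuffled with a fixed random seed (an illustrative choice; the seed and the sequence IDs are hypothetical). The class sizes are those reported in the Results.

```python
import random
from typing import Dict, List, Tuple


def split_dataset(seqs: List[str], train_frac: float = 0.75,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Randomly split one strain-specific dataset into training and testing subsets."""
    shuffled = list(seqs)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


# Class sizes reported in the Results (RSV 112, INFA 52, INFB 12); IDs are placeholders.
class_sizes: Dict[str, int] = {"RSV": 112, "INFA": 52, "INFB": 12}
for name, size in class_sizes.items():
    train, test = split_dataset([f"{name}_{i}" for i in range(size)])
    print(name, len(train), len(test))  # RSV 84 28, INFA 39 13, INFB 9 3
```

Splitting each class separately keeps every profile's testing set drawn from the same pathogen as its training set.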

Construction of AMP profiles (text mining).
The HMMER algorithm version 2.3.2 27 was used to build pathogen-specific profiles from the training datasets. All HMM profiles were constructed on the Ubuntu 12.04 LTS operating system. The work was carried out in a terminal, and the steps used to build each profile were as follows. In the first step, the training dataset of each target class was aligned with the ClustalW alignment tool 28, using the command line:
clustalw -align -output=gcg -case=upper -seqnos=off -outorder=aligned -infile=targetclass.fasta
This command aligns the sequences in the FASTA input file "targetclass.fasta" with ClustalW as the multiple alignment tool and writes the result in GCG (MSF) format, in upper case. The output is an aligned sequence file, "targetclass.msf", which was used as input for the next stage.
Identification of novel putative anti-Pneumonia AMPs from proteome sequences. The profiles were used to query lists of proteome sequences (in FASTA format) retrieved from the Ensembl database (http://www.ensembl.org/index.html) and the UniProt database (http://www.uniprot.org/). An E-value cut-off of 0.05 was set for retrieving putative anti-pneumonia AMPs. This was done with the "hmmsearch" module of HMMER, where targetclass.hmm is one of the three profiles, targetclassquery.txt contains the sequences of the species queried against the profile, and resultfile.txt is the output obtained from querying that species against a given pathogen profile.
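Assuming the hmmsearch hits have already been parsed into (sequence ID, E-value) pairs (parsing the HMMER report is not shown, and the IDs below are hypothetical), the cut-off and ranking steps reduce to a filter and a sort. The sketch treats the 0.05 cut-off as inclusive, which is an assumption.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (sequence identifier, E-value reported by hmmsearch)
E_VALUE_CUTOFF = 0.05    # cut-off stated in the text; treated here as inclusive


def select_putative_amps(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep hits at or below the E-value cut-off, most significant (smallest E) first."""
    kept = [(seq_id, evalue) for seq_id, evalue in hits if evalue <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


hits = [("protA", 0.2), ("protB", 3e-6), ("protC", 0.04), ("protD", 0.05)]
print(select_putative_amps(hits))
# [('protB', 3e-06), ('protC', 0.04), ('protD', 0.05)]
```

Sorting by E-value puts the most probable putative AMPs at the top of the list, matching the ranking described in the Results.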
Identification of receptors. Viral receptors, such as cell surface receptors and nucleoproteins, were identified for the viral pathogens involved in pneumonia (Respiratory syncytial virus and Influenza A and B) to serve as targets for the identified AMPs, using several in silico approaches. Viral pneumonia proteins were collected through literature mining from protein databases such as those of the National Center for Biotechnology Information (NCBI), UniProt, and Ensembl, and from Google Scholar. Curation was performed to determine whether each retrieved viral pneumonia protein was complete or partial; partial proteins were removed, and complete proteins were retained for further analysis. BLAST searches were run through the UniProt interface to further confirm specificity, that is, that the retained viral pneumonia proteins were absent from other microorganisms and viruses.
Physicochemical properties of the putative anti-Pneumonia AMPs and the pneumonia proteins. Physicochemical properties of the putative anti-pneumonia AMPs and the pneumonia receptor proteins were calculated with the calculation interfaces of BACTIBASE (http://bactibase.pfba-lab-tun.org/physicochem) 28,29 and APD3 (https://wangapd3.com/main.php) 18,19, using the amino acid sequences as input.
De novo structure predictions of the putative anti-Pneumonia AMPs and Pneumonia proteins (receptors) using I-TASSER. 3

Results
Retrieval of anti-viral AMPs (VAP-AMPs) and profile creation using HMM. Literature mining retrieved a total of 176 experimentally validated anti-viral pneumonia antimicrobial peptides (VAP-AMPs) from the CAMP, APD3, and AVPDB databases against Respiratory Syncytial Virus, Influenza A, and Influenza B: 112, 52, and 12, respectively. The first step in the profile construction pipeline was the random division of each class into ¾ and ¼ of its experimentally validated AMPs (Table 1). The ¾ portion is the training dataset, used to train the HMM software and to test whether the functionally significant amino acid consensus is captured. Multiple alignments were then produced with ClustalW. Three AMP profiles were built from the training datasets (Table 1), one for each class: anti-Respiratory syncytial virus (RSVM), anti-Influenza A (INFA), and anti-Influenza B (INFB). Because experimentally validated AMPs were used, the profiles are expected to recognize other sequences with the same activity and to reject sequences from the same microorganism that have no anti-pneumonia activity. The profiles were then tested against a negative control dataset consisting of random fragments of 17,236 neuropeptides with no recorded anti-pneumonia activity. This independent test with the negative (neuropeptide) dataset checked whether the trained profiles would correctly reject non-anti-pneumonia peptides.
The independent testing of the profiles was evaluated in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). An E-value cut-off of 0.05 was applied in the HMM tool to strengthen each profile's ability to separate true anti-pneumonia AMPs from false ones. TP is the number of correctly predicted positive sequences (anti-pneumonia AMPs); TN is the number of correctly predicted negative sequences (non-anti-pneumonia AMPs); FP is the number of non-anti-pneumonia AMPs wrongly predicted as anti-pneumonia AMPs (AP-AMPs); and FN is the number of anti-pneumonia AMPs wrongly predicted as non-anti-pneumonia AMPs. Because the total number of input sequences was known, the number of TP AMPs could be determined and the FP number derived from the results shown in Table 2, which reflect each profile's ability to distinguish true from false anti-pneumonia AMPs. In Table 2, all INFB testing sequences were TP, whereas 22 of the 28 RSVM testing sequences were TP. However, only 6 of the 13 INFA testing sequences were TP, possibly because of overlapping homologous relationships among the AMPs used to build the profiles.
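The four counts can be sketched as set operations, assuming the profile's hits at the chosen cut-off are available as a set of sequence IDs. All IDs below are hypothetical; the sizes are shaped like the INFA test described above (13 positives, 6 recognized) with a negative set of 17,236 sequences and no negative hits.

```python
from typing import Dict, Set


def confusion_counts(hits: Set[str], positives: Set[str], negatives: Set[str]) -> Dict[str, int]:
    """Count TP/FN over the positive test set and FP/TN over the negative set.

    hits: IDs reported by the profile at the chosen E-value cut-off.
    """
    return {
        "TP": len(positives & hits),  # AMPs correctly recognised
        "FN": len(positives - hits),  # AMPs the profile missed
        "FP": len(negatives & hits),  # negatives wrongly recognised as AMPs
        "TN": len(negatives - hits),  # negatives correctly rejected
    }


positives = {f"infa_test_{i}" for i in range(13)}
negatives = {f"neuropeptide_{i}" for i in range(17236)}
hits = {f"infa_test_{i}" for i in range(6)}
print(confusion_counts(hits, positives, negatives))
# {'TP': 6, 'FN': 7, 'FP': 0, 'TN': 17236}
```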
Performance measurement of the target-specific profiles. After the profiles had been tested, the performance of each profile was quantified using specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC), introduced by Brian W. Matthews in 1975 33. The specificity, sensitivity, accuracy, and MCC values are reported in Table 3.
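The four measures follow the standard definitions from the confusion-matrix counts; the sketch below computes them. Setting MCC to 0 when a marginal count is zero is a common convention, assumed here.

```python
import math
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true AMPs recovered
    specificity = tn / (tn + fp)  # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC uses all four counts; it is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


print(performance(tp=3, fp=0, tn=10, fn=0))  # a perfect classifier gives MCC = 1.0
```

MCC ranges from −1 (total disagreement) through 0 (random prediction) to +1 (perfect prediction), which is why it is read alongside the other three measures.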
From the results in Table 3, sensitivity was high for the anti-Influenza B virus (INFB) and anti-Respiratory Syncytial Virus (RSVM) profiles, indicating correct prediction. The moderate sensitivity of INFA could be ascribed to the substantial overlap in the conserved regions of the AMPs used to build its profile 17. Specificity was 100% for all profiles, indicating that no negative sequence was misclassified. The accuracy values, which account for misclassified AMPs in both the positive and negative datasets, also indicated correct prediction. MCC values were high for all profiles, the lowest being that of the anti-Influenza A virus (INFA) profile (0.50). MCC values from 0.5 to 1 indicate good prediction (+1 being perfect), whereas 0 corresponds to random prediction. Hence all profiles predicted correctly (INFB > RSVM > INFA). MCC is considered the best single estimate of model performance because it combines all four confusion-matrix counts and therefore reflects sensitivity, specificity, and accuracy together 33.
Proteome sequence databases query and discovery of putative anti-pneumonia AMPs. The discovery stage (Table 4) searched for novel anti-viral pneumonia AMPs against the pneumonia pathogens (Influenza A and B as well as Respiratory Syncytial Virus) in order to recognize peptides with signatures/motifs and properties similar to those of the input sequences used to build the RSVM, INFA, and INFB profiles. Matches of each profile to the proteome sequences with E-values of up to 0.05 (Table 4) were retained as putative AMPs. The final list of anti-viral AMPs was ranked by E-value, with those having the smallest E-values considered the most probable putative anti-viral pneumonia AMPs.
www.nature.com/scientificreports/
Physicochemical properties of the AMPs. The physicochemical parameters of the putative AMPs were determined using APD3 and BACTIBASE to check that the AMP sequences conform to other known AMPs. Parameters such as molecular weight, amino acid composition, hydrophobicity, Boman index, net charge, isoelectric point, and half-life were used to assess the anti-viral AMPs (Table 5). The amino acid composition of the AMPs determines their molecular weight, since AMPs are made up of amino acids, and it can be a distinctive feature for discriminating between two classes of proteins/peptides 34.
Retrieval of protein receptors of pneumonia pathogens. This stage assessed the diagnostic potential of some immunogenic proteins of viral pneumonia pathogens to serve as targets for the putative antimicrobial peptides in detecting these microbes. Several pneumonia proteins, including cell surface receptors and nucleoproteins, were analyzed for Influenza A virus, Influenza B virus, and Respiratory Syncytial Virus. These retrieved protein receptors are expected to be applicable in the diagnosis of viral pneumonia associated with these viruses.
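The net charge and hydrophobicity values in Table 5 come from the APD3 and BACTIBASE calculators. The sketch below illustrates these two parameters under simplifying assumptions: net charge is counted as (K + R) − (D + E) with histidine treated as neutral, and the hydrophobic ratio is the percentage of residues in an assumed hydrophobic set (A, I, L, M, F, V, W, C) that may differ from APD3's exact definition. Magainin 2, an α-helical AMP, serves as input.

```python
# Residues counted as hydrophobic: an assumed set, not necessarily APD3's exact definition.
HYDROPHOBIC = set("AILMFVWC")


def net_charge_simple(seq: str) -> int:
    """Approximate net charge at neutral pH: (K + R) - (D + E); His counted as neutral."""
    s = seq.upper()
    return s.count("K") + s.count("R") - s.count("D") - s.count("E")


def hydrophobic_ratio(seq: str) -> float:
    """Percentage of residues that belong to the HYDROPHOBIC set."""
    s = seq.upper()
    return 100.0 * sum(aa in HYDROPHOBIC for aa in s) / len(s)


magainin2 = "GIGKFLHSAKKFGKAFVGEIMNS"
print(net_charge_simple(magainin2), round(hydrophobic_ratio(magainin2), 1))
# 3 43.5
```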
Respiratory syncytial virus has immunogenic receptors with potential diagnostic relevance, such as the membrane fusion core protein chains 38 (Table 6). The instability index, molecular weight, and half-life reflect how stable a protein is, and a protein with an instability index of 40 or lower is considered stable; hydrophobicity enhances protein binding to ligands; and the net charge determines the behavior of a protein in acidic or alkaline solution, with every protein carrying zero net charge at its isoelectric point 39.
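The link between net charge, pH and the isoelectric point can be sketched with the Henderson–Hasselbalch equation: each ionizable group contributes a fractional charge, and the isoelectric point is the pH at which the sum is zero. The pKa values below are the EMBOSS set, used here as an assumption; other tools, including APD3 and BACTIBASE, may use slightly different values.

```python
# pKa values: the EMBOSS set, assumed here; other calculators differ slightly.
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    s = seq.upper()
    counts = {aa: s.count(aa) for aa in "KRHDECY"}
    counts["N_TERM"] = counts["C_TERM"] = 1  # one free terminus at each end
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH: net charge falls monotonically with pH, so the zero crossing is unique."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(isoelectric_point("GIGKFLHSAKKFGKAFVGEIMNS"), 1))  # magainin 2
```

With these pKa values, magainin 2 comes out at roughly pH 10.8: positively charged in neutral solution, negative only in strongly alkaline conditions.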

Structure prediction of the putative anti-pneumonia AMPs and Pneumonia protein receptors.
Representative 3-D structures predicted by the I-TASSER server for the anti-pneumonia AMPs (ligands) and the protein receptors are shown in Fig. 1. The predicted AMPs adopt different secondary structures, including α-helices, parallel β-sheets, anti-parallel β-sheets, extended, and loop conformations. The I-TASSER predictions of the putative AMP and pneumonia receptor 3-D structures were assessed with the confidence score (C-score), template modeling score (TM-score), and root mean square deviation (RMSD) (Table 7). The C-scores of all predicted 3-D structures for the anti-viral pneumonia AMPs and the pneumonia receptor proteins lay between −5 and 2 (Table 7), indicating that I-TASSER found templates for their structure prediction 40. The C-score of BOPAM-RSV11 was lower than those of the other AMPs, which may indicate that no suitable template was available to I-TASSER for this molecule, although the prediction was not random 29. TM-score was proposed more recently for measuring the structural similarity between two structures 41. A TM-score > 0.5 indicates a model with correct topology, whereas a TM-score < 0.17 indicates random similarity. The TM-scores of the predicted AMP and receptor structures were all above the 0.5 cut-off, indicating correct topology and structural similarity to the templates used to predict them 29,41. Although there is no defined RMSD threshold for 3-D structure prediction, an RMSD of 2 to 4 Å is considered good and an RMSD ≤ 1 Å ideal.
Thus, all anti-viral pneumonia AMPs and receptor proteins, having RMSD values within the accepted range (Table 7), showed small atomic deviations from the templates used for their 3-D structure prediction 42,43.
RMSD is sensitive to local errors because it averages the distances over all residue pairs of the two structures; this sensitivity is the motivation for TM-score. For example, a misoriented segment increases the RMSD even when the global topology of the structure is correct. TM-score weights small deviations more strongly than large ones, so a misoriented region contributes little to it; this makes it insensitive to local modelling errors and hence a more reliable measure of global topology.
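The contrast between RMSD and TM-score can be illustrated numerically. The sketch assumes the model and reference are already superposed and residue-matched (the full TM-score also searches for the optimal superposition), uses the standard length-dependent scale d0 = 1.24∛(L − 15) − 1.8, and applies a 0.5 Å floor for short chains as an assumed convention. The coordinates are a toy example, not data from this study.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: List[Coord], ref: List[Coord]) -> float:
    """Root mean square deviation over equivalent residues (structures already superposed)."""
    sq = [math.dist(a, b) ** 2 for a, b in zip(model, ref)]
    return math.sqrt(sum(sq) / len(sq))


def tm_score_fixed(model: List[Coord], ref: List[Coord]) -> float:
    """TM-score for one fixed superposition (no optimisation over superpositions)."""
    length = len(ref)
    # d0 scales with chain length; the 0.5 floor for short chains is an assumed convention.
    d0 = max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8) if length > 15 else 0.5
    return sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
               for a, b in zip(model, ref)) / length


# Toy 50-residue chain of C-alpha atoms 3.8 Å apart; the model misplaces one residue by 20 Å.
ref = [(3.8 * i, 0.0, 0.0) for i in range(50)]
model = list(ref)
model[25] = (3.8 * 25, 20.0, 0.0)
print(round(rmsd(model, ref), 2), round(tm_score_fixed(model, ref), 2))
# 2.83 0.98
```

A single misplaced residue moves RMSD from 0 to about 2.8 Å, while the TM-score barely drops, which is the local-error insensitivity described above.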

Docking interaction analysis of the putative anti-pneumonia AMPs with viral pneumonia receptors.
The output from the PATCHDOCK and HDock servers for the predicted docking interactions between the anti-pneumonia AMPs (ligands) and the protein receptors was analyzed (Fig. 2). The spatial docking analysis indicated that all the AMPs bound firmly to their target proteins. A computational analysis was also performed to confirm which AMPs had the highest binding potential, which amino acid residues took part in complex formation, and towards which terminus of the protein binding occurred. Among the anti-Influenza A AMPs, only BOPAM-INFA1 bound the nucleoprotein receptor in a different orientation. Likewise, BOPAM-INFB4 bound the Influenza B nucleoprotein differently from the other anti-Influenza B AMPs. All anti-Respiratory syncytial virus AMPs bound the chain A fusion protein in the same orientation, except BOPAM-RSV2, 6, and 9. The BOPAM-RSVs bound more firmly to the chain A protein, with the highest binding geometry score observed for BOPAM-RSV4. Similarly, the BOPAM-INFAs bound more firmly to the nucleoprotein, with the highest binding geometry score observed for BOPAM-INFA4. As shown in Table 8, the putative anti-influenza A AMPs displayed high HDock docking energy scores, with BOPAM-INFA8 showing the highest, −199 kJ/mol. Similarly, all anti-influenza B AMPs displayed high binding energies to their receptors, with BOPAM-INFB2 having the highest docking energy score. The anti-respiratory syncytial virus AMPs also showed high docking energy scores, with BOPAM-RSV4 and BOPAM-RSV3 scoring highest against the receptor protein. HDock also reports root-mean-square values, given in Table 9 together with the hotspot interacting residues of the anti-viral pneumonia AMPs and their respective receptor proteins. The HDock results are consistent with those from PatchDock.
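Selecting the best candidate per pathogen from docking scores can be sketched as a group-wise minimum, treating the most negative HDock score as the most favourable (an assumption about how the scores are read). Grouping by stripping the trailing number from the BOPAM name is a hypothetical convention, and the input holds only the three anti-Influenza A scores quoted in this article, not the full tables.

```python
import re
from typing import Dict


def best_per_target(scores: Dict[str, float]) -> Dict[str, str]:
    """Return, for each target group, the AMP with the most negative docking score."""
    best: Dict[str, str] = {}
    for name, score in scores.items():
        group = re.sub(r"\d+$", "", name)  # "BOPAM-INFA8" -> "BOPAM-INFA"
        if group not in best or score < scores[best[group]]:
            best[group] = name
    return best


# HDock scores quoted in this article for three anti-Influenza A AMPs.
hdock_scores = {"BOPAM-INFA8": -199.0, "BOPAM-INFA3": -178.81, "BOPAM-INFA12": -165.06}
print(best_per_target(hdock_scores))
# {'BOPAM-INFA': 'BOPAM-INFA8'}
```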

Discussion
Experimentally validated AMPs were used for model construction in this research because their activity against the target pneumonia viruses has been established, with the minimum inhibitory concentration (MIC), determined by agar dilution or broth micro-dilution, as the indicator, as reported in the databases 45. The anti-pneumonia AMPs were kept in their separate pathogen-target groups, as retrieved from the different databases, to allow species-/microbe-specific profile creation. The profiles built from the training datasets were then used to assess the discriminatory capacity and quality of the HMMER profiles with both positive (test) and negative (neuropeptide) datasets. Testing against positive and random negative datasets is a commonly used method; it rests on the assumption that the probability of random sequences having discriminative features is very low 29. The profiles showed high specificity, accuracy, and sensitivity, with excellent MCC values 43,46, so there was a very low probability that peptides were wrongly predicted as anti-pneumonia AMPs. In addition, some protein chains, such as fusion protein core A, an integral protein of Respiratory syncytial virus, mediate entry into the host cell through its transmembrane glycoproteins and can elicit apoptosis 38. They also play a pivotal role in virus assembly and interact with the RNA complex and the viral membrane. When detected in body fluids, these proteins show only slight, non-progressive antigenic variation, which is a significant factor favoring their use in detecting the virus 47. Influenza A and B nucleoproteins play significant structural and functional roles that could be exploited for diagnostics.
They are bifunctional membrane/RNA-binding proteins that participate in encapsulating the RNA-nucleoprotein core within the membrane envelope [56]. These nucleoproteins have been used in the diagnosis of pneumonia [56]. The use of candidate receptor proteins such as Respiratory syncytial virus fusion protein chain A 48, Influenza A virus nucleoprotein 49,50, and Influenza B virus nucleoprotein 51 in the diagnosis of pneumonia is justified because they are produced at relatively high concentrations in body fluids across all strains and subtypes of these microorganisms, do not change over time, are abundantly available as cell surface receptors, and are moderately stable under gentle in vitro handling.
Moreover, the presence of charged, polar, and non-polar amino acids in the putative anti-viral AMPs and the viral receptors confers charge, improved hydrophobicity, and increased binding potential on them. A hydrophobicity below 30% is not an ideal physicochemical parameter for the AMPs because … (53), where a re-assessment of the physicochemical properties of antimicrobial peptides produced a characteristic thermal transition profile in model vesicles, which was used to rank novel molecules of unknown biological activity. The structure predictions for the AMPs and the receptors agree with the different structural conformations displayed by known AMPs and proteins. Examples of known AMPs and their structures are tachyplesin from horseshoe crabs and bovine lactoferricin, which adopt β-sheet conformations, and magainin and melittin, which adopt α-helical conformations. The C-score from I-TASSER measures confidence in the modeling templates used for the prediction and estimates the quality of the structure, that is, the distance between the predicted model and the native structure 41.
Table 8. Quality assessment scores of the docking analysis for the anti-pneumonia putative AMPs and the pneumonia receptors. S/N serial number, ACE atomic contact energy.
HDock docking results, partial rows (AMP; docking score; RMSD; receptor hotspot residues; AMP hotspot residues):
(AMP name missing); –; –; A22 I25 R26 V29 S297 L298 V299 G300 D302 I388 R389 T390 R391 S392 G394 N395 S457 F458 Q459 G460 R461 F464 A471; T1 P2 F4 I5 D6 G7 Q8 V9 P10 I11 P12 K13 Q14
BOPAM-INFA12; −165.06; 77.48; R74 Y78 E80 E81 R150 R152 A153 R156 T157 S170 R174 M191 E192 R195 M196 R195 D203 F101 N211 T216; C1 P2 I4 L5 D6 A8 I9 Q10 L12 P13 K14
BOPAM-INFA3; −178.81; 72.93; R65 Y78 E81 H82 P83 S84 A85 G86 K87 G93 P95 Y97 R106 L108 I109 L110 N144 Y148 Q149 R150 T151 R152 A153 T171 R317 Q364 I365 A366 S367 E369; T1 P2 T3 F4 I5 D6 G7 Q8 V9 T10 V11 P12 D13
Both TM-score and RMSD are standard measures of the structural similarity between two structures and of the accuracy of a structural model when the native structure is known 30. The peptide structures were predicted, and the results showed that these peptides conform to known AMPs. Nevertheless, the AMPs remain putative anti-pneumonia peptides because no wet-laboratory experiments have yet been performed on these molecules. This outcome agrees with the work of Tincho, Gabere (12), where binding geometry scoring was used as the criterion for selecting candidate AMPs for HIV diagnostics. These observations were further confirmed with an in-house lateral flow device, in which the putative AMPs were used to detect HIV in patient samples 13.
The anti-viral AMPs also displayed high binding energy scores with the viral pneumonia receptors in both the PatchDock and HDock servers. Both servers use scoring functions to evaluate ligand conformations on protein receptors. The HDock server uses a classical force-field-based scoring function to estimate the nonbonded interactions (electrostatic and van der Waals). The docking analysis showed that all AMPs bound their respective viral receptors with high binding capacity, with BOPAM-INFA8, BOPAM-INFB2, and BOPAM-RSV4 having the highest binding potential and area and the lowest atomic contact energy (Tables 8 and 9). Comparing these results with the physicochemical results in Table 5, BOPAM-INFA1, 2, and 7, BOPAM-INFB3 and 6, and BOPAM-RSV3, 7, 12, and 13, which had zero net charge, gave the lowest binding affinities (Boman index values) with the pneumonia receptor proteins. The results of this research identify BOPAM-INFA8, BOPAM-INFB4, and BOPAM-RSV4 as the best candidate agents for detecting the respective viral pneumonia pathogens. Binding affinity and other parameters, such as area and atomic contact energy 2, are significant in selecting novel anti-viral AMPs for potential use in pneumonia diagnosis through the development of a lateral flow device (LFD).
Designing and modeling novel AMPs for diagnostics is an active area of research, aimed at reducing the misuse of conventional antibiotics and at overcoming the non-specificity of current diagnostic and prognostic biomarkers. One limitation of HMMER is that correlations between the amino acid residues of AMPs are hard to capture, because an HMM profile models a sequence linearly, position by position. Examples of such correlations include the spatial distances between residues brought together or separated by protein folding and unfolding, and the electrostatic and chemical interactions between them. Another limitation is HMMER's reduced sensitivity when profiles are built from small datasets, which is unavoidable given the limited number of AMPs with specific targets in the databases. In addition, AMPs composed of L-amino acids are susceptible to proteolytic degradation, which makes them unsuitable where such degradation is possible 54. All these limitations were considered in the design of this work to ensure that the sensitive detection of viral pneumonia with anti-viral AMPs was not compromised. The putative AMPs predicted with HMMER in this analysis could substantially benefit the diagnosis of viral pneumonia. A strength of this work is that it offers insight into the modular architecture of AMPs through in silico technologies for potential pneumonia diagnosis. It also offers promising prospects for patients living with these conditions, who could adapt their lifestyles following sensitive detection of the viral pneumonia pathogens, and it would help medical practitioners arrive at correct treatment plans.
Table 9. Quality assessment scores of the docking analysis from HDock for the anti-pneumonia putative AMPs and the pneumonia receptors with the hotspot interacting residues.

Conclusion
This research identified novel AMPs for the potential detection of viral pneumonia using the HMMER in silico technology, yielding 27 putative anti-viral peptides. The putative anti-pneumonia AMPs conformed to other known AMPs in their physicochemical properties as calculated by APD3 and BACTIBASE. The main goal of this diagnostic framework is to facilitate the search for specific biomarkers for the early detection of viral pneumonia, and the AMPs show strong potential to overcome the drawbacks of current diagnostic approaches. This research should be followed by molecular validation, testing the binding of each AMP to its viral protein with an "on/off" binding assay in an LFD setting to build a prototype with these AMPs.

Future work
Future work will include site-directed mutagenesis of the putative AMPs to turn them into more potent candidate diagnostic molecules. This will be followed by an in vitro investigation of the anti-pneumonia activity of the modified peptides. The EC50 and selectivity index of the optimized AMPs will also be evaluated. The anti-pneumonia potential of these AMPs will be tested on various pseudotypes of the pneumonia pathogens to determine their diagnostic potential. Finally, the complexes formed between the pathogen receptors and the putative AMPs will be resolved by structural biology methods to validate the observations from the in silico binding analysis.