In-silico identification of the vaccine candidate epitopes against the Lassa virus hemorrhagic fever

Lassa virus (LASV), a member of the Arenaviridae, is an ambisense RNA virus that causes severe hemorrhagic fever with a high fatality rate in humans in West and Central Africa. Currently, no FDA approved drugs or vaccines are available for the treatment of LASV fever. The LASV glycoprotein complex (GP) is a promising target for vaccine or drug development. It is situated on the virion envelope and plays key roles in LASV growth, cell tropism, host range, and pathogenicity. In an effort to discover new LASV vaccines, we employ several sequence-based computational prediction tools to identify LASV GP major histocompatibility complex (MHC) class I and II T-cell epitopes. In addition, many sequence- and structure-based computational prediction tools were used to identify LASV GP B-cell epitopes. The predicted T- and B-cell epitopes were further filtered based on the consensus approach that resulted in the identification of thirty new epitopes that have not been previously tested experimentally. Epitope-allele complexes were obtained for selected strongly binding alleles to the MHC-I T-cell epitopes using molecular docking and the complexes were relaxed with molecular dynamics simulations to investigate the interaction and dynamics of the epitope-allele complexes. These predictions provide guidance to the experimental investigations and validation of the epitopes with the potential for stimulating T-cell responses and B-cell antibodies against LASV and allow the design and development of LASV vaccines.

to 408 in the T-loop (residues 365-384) and HR2 (residues 400-412) regions of GP2. Site B contains residues 269 to 275 of the fusion peptide and residues 324 to 325 of HR1 (residues 311-355) of GP2 4,14 . Although the antibody predominantly binds to GP2, GP1 is required to maintain the proper prefusion conformation of GP2 for antibody binding 4 .
Identification of epitopes is an essential step for understanding disease etiology, immunotherapy, immunodiagnostics, and the discovery and development of epitope based-vaccines. An epitope-based vaccine has fewer side effects compared to conventional vaccines. Experimental identification of a promiscuous epitope involves many expensive and time-consuming steps, including the production of antibodies to map antigenic regions on a target protein, animal models, and determination of the crystal structure of antigen-antibody complexes using X-Ray crystallography. Computational identification of epitopes is often employed as a powerful and fast approach to facilitate the identification of potential epitope candidates that can decrease the number of validation experiments and time 15,16 . Multi-epitope based vaccine development has already proven effective against several viral infections and cancer 17,18 . In this study, we have identified and characterized T and B-cell epitopes for the LASV GP using different sequence and structure-based computational epitope prediction methods. We then selected potential B and T-cell epitopes for the LASV GP based on a consensus approach, and the novelty of the epitopes was examined with the Immune Epitope Database (IEDB) tools. Subsequently, we identified strongly binding alleles to the MHC-I T-cell epitopes and modeled the allele structures and performed docking to understand the interaction between alleles and epitopes. We further investigated the stability and dynamics of the epitope-allele complexes using molecular dynamics simulations. Analyses of root-mean square deviations, hydrogen bond, interaction energy, and solvent accessibility showed that epitope-allele complexes are stable, indicating that the epitopes strongly bind to the alleles. The identified B and T-cell epitopes of LASV GP in the study can be useful for the development of effective vaccines against Lassa hemorrhagic fever.

Materials and methods
Selection of LASV GP sequence and 3D structure. The sequence of GP for different LASV strains was obtained from the NIAID Virus Pathogen Database and Analysis Resource (ViPR) 19 . Subsequently, multiple sequence alignments were performed between the sequences using Clustal Omega 20 to select a conserved LASV GP for sequence-based epitope prediction. The corresponding X-Ray crystal structure of the Mouse/Sierra Leone/Josiah/1976 LASV GP was obtained from the Protein Data Bank (PDB ID: 5VK2) 4,21 for structure-based B-cell epitope prediction. The missing residues were modeled using the Charmm-Gui 22-24 . Prediction of B-cell epitopes. Sequence-based B-cell epitope prediction was performed with the use of BepiPred2.0 25 , BCPREDS 26 and BcePred 27 servers separately. These servers predict epitopes based on physico-chemical properties of amino acids, and these servers accept the primary sequence of LASV GP as an input. . Each GP has a GP1 subunit and a GP2 subunit (zoomed view). Each monomer is colored differently in the GP trimer. In the zoomed view, the GP2 subunit is lightly shaded to differentiate from the GP1 subunit, and some of the antibody binding sites (Site A, Site B) are highlighted (figure generated from the crystal structure of the LASV GP in the Protein Data Bank 21 , PDB ID: 5VK2 4 ).
Structure-based B-cell epitope prediction for the LASV GP (PDB ID: 5VK2) was carried out using three different programs separately: ElliPro 28 , Epitopia 29 and DiscoTope 30 . These servers predict epitopes regions based on the geometrical and solvent surface-accessibility of a protein structure, and these servers accept the 3D structure of a protein as input. The consensus epitopes from both sequence and structure-based predictions were selected as potential epitopes for further analysis.

Prediction of T-cell epitopes.
Sequence-based MHC-I T-cell epitope predictions for LASV GP were carried out by using three different servers, ProPred-I 31 , CTLPred 32 and NetCTL1.2 33 . To predict their alleles, the consensus epitopes among these three prediction methods were analyzed using IEDB 34 . The epitopes that strongly bind to the alleles (lowest IC 50 ) were selected for further analysis.
Sequence-based MHC-II T-cell epitope predictions for LASV GP were performed with the use of three different servers: ProPred 35 , NetMHCII2.3 36 40 ] were obtained from the PDB. The experimental structure for the HLA-A*32:01 (A4) allele is not available, and thus, the sequence of this allele was obtained from the UniProt database (UniProtKB ID: P01892), and subsequently its structure was modeled using Swiss-Model [41][42][43] . The selected consensus MHC-1 epitopes were extracted from the crystal structure of LASV GP (PDB: 5VK2). The epitopes and the alleles were prepared for docking using Autodock Tool version 1.5.6 44 . Autodock Vina 1.1.2 45 was used for peptide docking with a grid space that covered the entire allele. The best peptide-allele complexes were selected for further investigation based upon visual inspection of peptide-allele interactions and the Autodock Vina criteria. The stability and dynamics of the selected peptide-allele complexes were further studied using molecular dynamics simulations.

Molecular dynamics simulations.
All-atom, explicit-solvent molecular dynamics (MD) simulations were performed to investigate the stability and dynamics of the MHC-1 T-cell epitope-allele complexes using the CHARMM36m force field 46 with the NAMD 2.12 software package 47 . The systems were minimized for 10,000 steps followed by 200 ps of equilibration. This was followed by MD production runs for 200 ns at a temperature of 300 K using a 2 fs time-step. The long-range ionic interactions were calculated using the particle mesh Ewald (PME) method 48 while the covalent hydrogen atoms were constrained by using a SHAKE algorithm 49 . The temperature was controlled by using the Langevin temperature coupling with a friction coefficient of 1 ps −1 and pressure was controlled using the Nose-Hoover Langevin-Piston method 50 . Visualization, and rendering of trajectories and pictures were performed using VMD 51 .

Results and Discussion
The multiple sequence alignment of the 84 LASV GP sequences resulted in the LASV GP Mouse/Sierra Leone/ Josiah/1976) [UniprotKB ID: P08669] as a highly conserved strain, and we thus selected this strain for the sequence-based MHC-I and MHC-II T-cell epitope predictions and for both structure and sequence-based B-cell epitope predictions. In addition, a search of this strain with the experimentally determined structure available in the PDB displayed the 3.2 Å resolution crystal structure of the prefusion GP trimer of LASV in complex with the human neutralizing antibody 37.7 H. [PDB ID: 5VK2] as shown in Fig. 1. This structure was used for the structure-based B-cell epitope prediction. A schematic representation of the epitope prediction cascade is shown in Fig. 2. We have adopted multiple methods to predict and rank the epitopes as they use different criteria for their predictions. Some approaches may incorporate some properties that are similar such as solvent accessible surface area, but the predicted epitopes are different. Previous studies 52,53 have suggested that the consensus approach would improve the specificity and accuracy of the epitope prediction as it can reduce the false positives. Therefore, we employ a consensus approach; for example, an epitope can be considered if it overlaps with even a single residue by at least two prediction methods. Our consensus approach selected several nanomer epitopes for MHC-I (Table S1). Although the predicted epitopes for MHC-II T-cell vary in length, the consensus core region between predicted MHC-II epitopes is a nanomer (Table S2) which is considered 54 an optimal length for the HLA.
Prediction of T-cell Epitopes. MHC-I T-cell epitope prediction with the LASV GP sequence was performed using three different methods separately: ProPred-1, CTLPred, and NetCTL1.2, and the results are shown in Supplementary Table S1. The epitopes listed by at least two of the methods are listed in Table 1 along with their binding affinity (IC 50 ), antigenicity, and allele. Among these four consensus epitopes, the nanomer E1 epitope FATCGLVGL shows the lowest average IC 50 value of 34 nM against the A1 allele as predicted by the IEDB, and it has also a reasonable antigenicity score of 1.65. This was followed by the E3 epitope FSRPSPIGY, which has an average IC 50 value of 88 nM against the A3 allele, and also has a better antigenicity score of 2.50 compared to the FATCGLVGL epitope. Interestingly, the E4 epitope RRGTFTWTL is predicted by all three methods though its IC 50 and antigenicity scores are not as good as the other epitopes (Table 1). All four of these consensus epitopes were docked to the alleles and we performed the MD simulations to investigate the stability and dynamics of the allele-epitope complex as discussed later.
MHC-II T-cell epitope prediction with the LASV GP sequence was performed using three different methods separately: ProPred, NetMHCII 2.3, and EpiTOP 3.0, and the results are shown in Supplementary Table S2. ProPred uses a quantitative matrix 35 approach and NetMHCII2.3 uses ANN 36 , while EpiTOP 3.0 uses Quantitative Structure-Activity Relationship models (QSAR) 37 to predict the MHC-II T-cell epitopes. The epitopes that were predicted by at least two methods are listed in Table 2. Among these consensus MHC-II T-cell epitope predictions, the E9 and E13 epitopes were predicted by all three methods and have a reasonable antigenicity score of 0.7, indicating that these two epitopes can be potential candidates for the design of MHC-II T-cell based vaccines. ProPred and EpiTOP 3.0 predict most epitopes as nanomers whereas NetMHCII 2.3 predicts varying lengths of epitopes (Table 2). Interestingly, the 15-mer epitopes predicted by NetMHCII have the consensus core nanomer epitopes, suggesting that the core region is responsible for strong binding of the epitope into the MHC-II binding site 55-57 . Prediction of B-cell epitopes. In addition to the T-cell epitope predictions, we also predicted the linear B-cell epitopes for the LASV GP using sequence-based methods BepiPred 2.0 25 , BCPREDS 26 , and BcePred 27 . The BepiPred predicts the epitopes based on a random forest algorithm trained on epitopes annotated from antibody-antigen structures. BCPREDS predicts epitopes by using SVM combined with a different kernel method, including string kernels, radial basis kernels, and subsequence kernels. The BcePred locates B-cell epitopes using four physicochemical properties like hydrophilicity, polarity, exposed surface and beta-turns 27 . The epitope E30 containing 10 residues was predicted by all three of these sequence methods (Table 3) but with a negative antigenicity score.
We also performed structure-based B-cell epitope prediction using three representative structural and geometrical properties-based methods: ElliPro, Epitopia and DiscoTope. For this, the experimental 3D structure LASV GP (PDB ID: 5VK2) with the modeled missing residues was used. ElliPro predicts linear and conformational epitopes by incorporating the antigenicity, solvent accessibility, and flexibility of protein structures 28 . Epitopia uses a machine learning algorithm to analyze the antigenic features on protein structure and predicts the probable conformational epitope regions 29 . DiscoTope uses amino acid statistics, spatial information, and surface accessibility on the protein 3D structure to predict residue-by-residue conformational epitopes 30 . The E24, E29, E32 and E33 structure-based epitopes in Table 3 are especially interesting as potential candidates as they were predicted by all three methods. In Table 3, we also ranked each epitope based upon how many of the sequence and structure-based methods predicted each epitope, which do not always correlate with the highest antigenicity scores of E24, E26, E28, E29 and E31.
Robinson et al. 14 have recently reported the cloning of many human monoclonal antibodies derived from memory B cells of Lassa fever survivors in West Africa. These antibodies specifically bind to both GP1 and GP2 epitopes of LASV. The comparison of our predicted B-cell epitopes with those epitopes shows that there are five consensus epitopes ( Table 3) that share similarity with Robinson et al. (Table S3), and another five epitopes that do not share similarity, indicating that our consensus epitope prediction strategy has identified new epitopes.
Epitope surface mapping. For efficacy of vaccines, the epitopes should be located on an accessible region of the protein so that the epitope will be able to bind with antibodies 53 . This is especially important for the six epitopes that we list in the Tables above that do not share any part of their sequence with known epitopes: E1, www.nature.com/scientificreports www.nature.com/scientificreports/ E4, E18, E22, E27, E29. In Fig. 3, we highlight the positions of these epitopes on LASV GP. We also highlight the positions of E2 and E3 because the four MHC-I T-cell epitopes have IC 50 information readily available. Figure 3 shows that the E1, E2, E3, E4, E18, E22 and E27 epitopes are well located on the exposed regions and thus can interact well with the alleles. www.nature.com/scientificreports www.nature.com/scientificreports/

MHC-I T-cell Allele and epitope modeling and docking. Swiss-Model identified the 1.61 Å resolution
crystal structure of the HLA class I antigen (PDB ID: 6EI2) as the best template for constructing models. The sequence identity between A4 and the template was 92%. The best model was then selected based on multiple validation methods, including GMQE (Global Model Quality Estimation) and QMEAN. The GMQE and QMEAN values 41,58 of the model are 0.75, and 0.6, respectively. In addition to these analyses, Ramachandran plots and ERRAT were also used for the model validation. Analysis of Ramachandran plot 59 of the model shows 99.6% of residues are either in favored or in allowed regions ( Supplementary Fig. S1), indicating that backbone torsion angles of these models are acceptable. The ERRAT overall quality factor 60 score was computed as 99, which is greater than the normally accepted score range for a high quality model of 50. These analyses show that the model is within a high quality range and can be used for further analysis.
Docking of the four consensus MHC-I epitopes (Table 1) was performed using Autodock Vina, which enabled the docking of epitopes obtained from the sequence-based MHC-1 T-cell prediction into the promising allele structures. The Autodock Vina docking protocol has been previously demonstrated to successfully dock epitopes into allele structures 45 . However, we validated the capability of the docking protocol before docking the epitopes by redocking the epitopes into the allele crystal structure (PDB ID:3OX8) to see whether the crystal bound conformation of the peptide could be reproduced or not. The docked allele-epitope complex showed the same residue-epitope interactions observed in the epitope bound crystal structure, indicating that the Autodock Vina docking protocol was capable of reproducing the experimentally observed binding mode of the epitope. We applied Autodock Vina to each of the four MHC-I allele-epitope complexes. Autodock Vina found that the highest ranked docking structure had the following binding affinities: −5.5 kcal/mol for A1::E1 −5.0 kcal/mol  Table 3. Prediction of the B-cell epitopes. The epitopes predicted by either all three sequence-or structurebased methods are highlighted by boldface. Conformational epitopes chosen by all three structure-based methods are indicated in italics. www.nature.com/scientificreports www.nature.com/scientificreports/ for A2::E2, −6.8 kcal/mol for A3::E3, and −6.0 for A4::E4. These epitopes-alleles docking complexes are shown in Fig. 4.

Dynamics of the allele-epitope complex.
In order to investigate the dynamics and stability of the four MHC-I allele-epitope complexes, we performed 200 ns all-atom, explicit solvent MD simulations. To quantitatively understand the stability of the allele-epitope complex, we calculated the root mean square deviations (RMSD) of the backbone atoms of the allele-epitope complexes as a function of simulation time as shown in www.nature.com/scientificreports www.nature.com/scientificreports/  Figure 5 also includes curves of the RMSD of the backbone atoms of just the allele, and separately, just the backbone atoms of the epitope. All alleles have an RMSD compared to their initial structures of approximately 2 Å, whereas the allele-epitope complexes have a bit higher RMSD of approximately 2.5 Å, indicating that the epitopes make the complexes more flexible. Interestingly, in the case of A3::E3, the allele and the complex show almost the same RMSD, suggesting that the complex is especially stable. To pinpoint why the complexes show a higher RMSD, we further computed the RMSD of only the backbone atoms of the epitope in each the complex. Figure 4 shows that the initial configuration of epitopes E1 and E4 is compact, and that both of these epitopes rearrange their configuration in the binding site and elongate during the 200 ns MD simulation. This elongated configuration is consistent with the investigations of Antunes et al. 61 on MHC-I epitopes.
Since the interactions between protein and epitope peptide are mostly influenced by non-covalent interactions, we computed the number of hydrogen bonds and the interaction energy between the allele and epitope as a function of the MD simulation time. The hydrogen bond was calculated between the protein interface atoms with a distance cut-off of 3.5 Å and angle cut-off of 30o between the donor and acceptor heavy atoms. As shown in Fig. 6, the number of H-bonds fluctuates during the MD simulations for all the complexes. The A3 complex has the largest number of H-bonds. Table 4 shows that during the last 50 ns of the MD simulation trajectory, the A3 complex averages 2.5 H-bonds. Additional analysis of the hydrogen bonding between allele and epitope are listed in Supplementary Table S4. Figure 6b shows the interaction energy (electrostatic interaction + van der Waals contacts) throughout the entire MD simulation and Table 4 lists the average over the last 50 ns. The A3::E3 and A4::E4 display relatively stronger interaction energies than the A1:E1 and A2::E2 complexes. The comparison of RMSD, hydrogen bond, and interaction energy information indicates that the E3 epitope is an especially promising epitope candidate. Novelty analysis. The novelty of the four MHC I T-cell epitopes in Table 1, the nineteen MHC II T-cell epitopes in Table 2, and the ten B-cell epitopes in Table 3 identified in this study were analyzed using IEDB 34 . The IEDB database contains the epitopes that are annotated based on scientific literature. The IEDB showed that the E1, E4, E18, E22, E27, E29 epitopes, which bind to solvent exposed regions on the protein (Fig. 3), have not been previously reported as LASV epitopes or vaccine candidates. In addition, this analysis further indicates that 24 other epitopes (E2, E3, E5, E6, E7, E8, E10, E11, E12, E14, E15, E16, E17, E19, E20, E23, E24, E25, E26, E28, E30, E31, E32, E33) have partial segments of their sequence reported as subsets of other epitopes, whereas E9, E13, E21 are exact match to previously reported sequences. For these epitopes, a comparison showing the overlap between the predicted epitopes in this study and previously known epitopes documented in IEDB is given in Table S5. In addition to the epitopes in the IEDB, we compared our consensus predicted epitopes with the previously reported predictions [62][63][64][65][66][67] in Table S6. This comparison shows a varying degree of overlap in the predicted sequences. The www.nature.com/scientificreports www.nature.com/scientificreports/ novelty results confirm that thirty epitopes have not been previously tested experimentally as LASV epitopes, suggesting that their therapeutic potentials in designing vaccines against LASV can be further explored.

Conclusion
LASV hemorrhagic fever is endemic in West Africa, and no approved effective therapeutics are currently available. Therefore, there is an urgent need for the discovery and development of potential antiviral therapeutics. The LASV GP spike has emerged as a promising selective target for the development of novel vaccines as it plays an essential role in the virus-host interaction. Several in-silico studies [62][63][64][65][66][67] were performed to predict LASV GP epitopes with the use of a single prediction tool for each type of epitope. We have identified new T and B-cell epitopes using a variety of computational approaches, including twelve epitope prediction methods, protein-peptide docking, and MD simulations. The MHC I and II T-cell epitopes were separately predicted with the LASV GP sequence using well-known prediction methods. The predicted MHC I T-cell epitopes then were prioritized based on the consensus score, binding affinity, and antigenicity, while MHC II T and B-cell epitopes were prioritized based on the consensus score. Novelty analysis of the consensus-selected 33 epitopes showed that thirty of these predicted epitopes have either no overlap or only a partial overlap to previously reported sequences. Within this list of new epitopes, six sequences have no overlap with any known experimentally tested epitopes in the IEDB. In addition, docking and MD simulations were performed to further validate the MHC I T-cell epitopes. The simulation results show that the allele-MHC-I epitopes are stable, with favorable hydrogen-bond and interaction energy. Of these, Epitope E3 ( 233 FSRPSPIGY 241 ) segment was found to be especially stable. This study demonstrates that the adopted consensus epitope prediction strategy is valuable for in-silico investigations of known epitopes and the identification of new epitopes. Experimental validation of these epitopes may lead to the design and development of effective LASV vaccines.  Table 4. Allele-epitope interaction parameters calculated by averaging over the last 50 ns of the MD simulation trajectory.