Immunoinformatic and systems biology approaches to predict and validate peptide vaccines against Epstein–Barr virus (EBV)

Epstein–Barr virus (EBV), also known as human herpesvirus 4 (HHV-4), is a member of the Herpesviridae family and causes infectious mononucleosis, Burkitt’s lymphoma, and nasopharyngeal carcinoma. Even in the United States of America, the situation is alarming, as EBV affects 95% of the young population between 35 and 40 years of age. In this study, both linear and conformational B-cell epitopes as well as cytotoxic T-lymphocyte (CTL) epitopes were predicted by using the ElliPro and NetCTL.1.2 webservers for EBV proteins (GH, GL, GB, GN, GM, GP42 and GP350). Molecular modelling tools were used to predict the 3D coordinates of peptides, and these peptides were then docked against the MHC molecules to obtain peptide-MHC complexes. Studies of their post-docking interactions helped to select potential candidates for the development of peptide vaccines. Our results predicted a total of 58 T-cell epitopes of EBV; where the most potential were selected based on their TAP, MHC binding and C-terminal Cleavage score. The top most peptides were subjected to MD simulation and stability analysis. Validation of our predicted epitopes using a 0.45 µM concentration was carried out by using a systems biology approach. Our results suggest a panel of epitopes that could be used to immunize populations to protect against multiple diseases caused by EBV.

SCiEnTifiC REPORTS | (2019) 9:720 | DOI: 10.1038/s41598-018-37070-z EBV glycoprotein B is important for viral fusion events with B cells 14 . The human immune system precisely targets EBV glycoprotein 350 (gp350), which is an example of a lytically expressed gene 15 . The attachment of GP350 to the MHC-II molecules in the cell is aided by the already attached GP42 protein of EBV virus 16 . The fusion of the B-cell membrane and the outer viral envelope of the EBV virion requires functional spicule glycoproteins such as GH, GL, and gp42 17 . Previous studies suggest that glycoproteins such as GB complement membrane fusion 18 .
Vaccination is a significant approach to improve the standard of public health and provide an effective way to control the growing infections. In nature, plants act as bioreactors, which have been used to express efficient vaccine antigens against viral, bacterial and protozoan infections. Besides, we know that antibody epitope prediction using computational tools, one of the crucial steps of vaccine design 19 . Recent advancement in vaccine design has aimed for the expansion of conventional assays designed to quantify T-cell responses against various vaccine candidates 20 . Immunoinformatic approaches have made great contributions to predicting B-cell and T-cell epitopes in the development of subunit vaccines 21 . For subunit vaccine development, identification of continuous B-cell or nonlinear also known as non-continuous and cytotoxic T-lymphocyte (CTL) epitopes are essential. Among B-cell epitopes, >90% are noncontinuous 22,23 . The use of computational tools contribute greatly in biology designing in silico vaccine, prediction of T-cell epitope is crucial which does not only reduce the cost but also the necessity for experimental results 24 . Epitopic vaccine against HIV, malaria and tuberculosis resulted in promising outputs and maintained the defensive and beneficial potential therapeutic uses of the developed vaccines candidates 25 . Immunoinformatics plays an upright role in antibody and immunodiagnostic agents development, and vaccine design. The early phases of vaccine and other therapeutics agents developments were based on solely immunological experimentations. These early developmental techniques were tedious and costly too. The use of bioinformatics such as computational techniques greatly reduces the time and cost of developing such agents for therapeutic purposes. Khan et al. 26 , used multiple bioinformatics tools to predict vaccine against multiple HPV viruses 26 . Thus the development of modern therapeutic medicines and vaccines greatly rely on such tools 27,28 .
Systems medicine emphases significantly on the components of pathway kinetics to probe different conditions. Systems medicine could be also utilized to investigate interaction mechanisms between microbes. Metagenomics data could be utilized for such analysis. Rather than whole cell interaction, an insight onto proteins interaction could be also comprehended through systems biology approaches (Singh, P. K. et al. 29 . These latest techniques widely expand the circle of new drugs development. Overall, it is known that multidisciplinary aspects of the production of therapeutic proteins that has gained much more attention (Dangi, A. K. et al. 30 . Structural modelling using computational resources led to success in the development of computer assisted drugs. Likewise molecular docking algorithms scoring functions of different conformers to design new drug candidates. In the present study, potential B-cell and T-cell epitopes (as effective vaccine candidates) were identified with the help of immunoinformatic approaches. T-cell immunogenicity is associated with epitope binding strength to the MHC molecule 31 . Molecular modelling tools were applied to peptide-MHC complexes to investigate their post-docking interaction in order to select potential candidates for the development of peptide vaccines.

Materials and Methods
Epstein-Barr Virus Sequence. The sequences of EBV proteins, including GH, GL, GB, GN, GM, GP42 and GP350, were retrieved from the Universal Protein knowledgebase (Uniprot) database (http://www.uniprot. org/). The sequence retrieval accession numbers along with other information are provided in the ( Table 1). The 3D coordinates of all the selected proteins were predicted by using online webserver phyre2 (http://www.sbg.bio. ic.ac.uk/phyre2/html/page.cgi?id = index) 32 . The overall workflow of the work is shown in the Fig. 1.

Prediction of Linear B-Cell Epitopes.
After interacting with antigens (such as B-cell epitopes), B-lymphocyte cells differentiate into memory cells and antibody secreting plasma cells 33 . B-cell epitopes have a hydrophilic nature and are accessible for flexible regions 34 . IEDB (http://www.iedb.org/) online analysis resources were used to obtain the Parker hydrophilicity prediction values 35 , Emini prediction values of surface accessibility 36 , Kolaskar and Tongaonkar's antigenicity scale values 37 , and Karplus and Schulz Flexibility Prediction values. B-cell epitopes were predicted using ElliPro (http://tools.immuneepitope.org/toolsElliPro/) using both protein sequences and structural information 38 . ElliPro utilizes the Protrusion Index (PI) of residues, protein shape approximation, and the final neighbouring residues clustering, which rely on PI.  28 . NetCTL accepts the FASTA sequence format to perform different analyses, such as prediction of MHC class I binding affinity, TAP transport efficiency and C-terminal cleavage. The artificial neural network and weight matrix were used for the prediction of MHC-I binding and proteasome-dependent C-terminal cleavage.
Peptide Library Construction and Molecular Docking. All of the predicted epitopes were modelled using the online webserver PEP-FOLD3 using 200 simulation runs to sample the conformations 39 and sOPEP energy function 40 . Subsequently, we have docked the best ranked peptide models to the selected class I MHC molecules HLA-A (PDB ID: 2GIT) using the PatchDock docking server 41 . The algorithm of the PatchDock server uses structural geometry to find docking transformations with good molecular shape complementarity 42 . The resulting complexes were refined through the FireDock server 43,44 . High energy complexes were subjected to interaction analysis and molecular dynamics simulations 45 .

Molecular Dynamics Simulations.
The accepted complexes were subjected to Molecular Dynamic simulations using the AMBER 14 molecular dynamics package 46 . The system was neutralized using Na + ions using tleap. Each system was solvated in a rectangular box with buffer distance of 8.0 A° using TIP3P water molecules. A two-stage energy minimization of the complexes, using the SANDER module of AMBER 14, was performed to relieve the atomic clashes. An initial minimization of 6,000 steps, followed by another round of minimization (6,000 steps), were used to restrain the positions of all atoms in the systems, except those from the water molecules in the first minimization. The pmemd.cuda 47 software was used to simulate the minimized complexes. The SHAKE algorithm and the Particle-Mesh Ewald (PME) method were used to include the long-range interactions, and a non-bonded interaction cutoff radius of 10 A° was considered. For equilibration, 10,000 ps time was applied, followed by a 50 ns simulation carried out at 310 K using the Langevin temperature coupling scheme at constant pressure (1 atm) with isotropic molecule-based scaling. Sampling of the MD trajectories was carried out every 2.0 ps. RMSD and hydrogen bonding analysis were carried out using the integrated CPPTRAJ and PYTRAJ 48 modules in AMBER 14 and were visualized using the online server PDBePISA 49 , UCSF Chimera 50 and PyMOL 51 .
Antigenic and Allergenic behaviour the predicted Epitopes. To confirm the allergenic and non-allergenic properties of all the designed epitopes, B-cell and T-cell epitopes, AlgPred 52 (http://crdd.osdd. net/raghava/algpred/), which is an online web tool was employed with accuracy of 85%. Primary amino acid sequences were used of all the selected proteins for this purpose. The antigenicity of all the epitopes was predicted by using ANTIGENpro 53 (http://scratch.proteomics.ics.uci.edu/) using different machine learning algorithms to process the amino acids sequences.
PKPD Modeling. The design and execution of the EBV signalling cascade was performed based on a literature survey within a virtual cell. Proposed peptides were chosen for kinetic analysis of EBV, where a concentration of 0.45 µm was assigned based on previous literature. Modelling of chemical reaction networks was applied for the analysis of this pathway. This nonlinear kinetics scheme follows the Michaelis-Menten equation as below: This equation can be transformed to, Here, C = steady state concentration; C max = theoretical maximum for C; A max = theoretical maximum for A; D = dose.

Results
Sequence retrieval and analysis. Uniprot was used to retrieve the primary amino acid sequences of the selected proteins (glycoprotein B, glycoprotein L, glycoprotein N, glycoprotein H, glycoprotein M, glycoprotein 42 and glycoprotein 350) of EBV as shown in Fig. 2. Information about the protein source, accession number, number of active residues, and other information is given in Table 1.
Allergenicity and antigenicity prediction. The allergic and nonallergenic behaviours of EBV species were predicted using AlgPred (http://www.imtech.res.in/raghava/algpred/). Allergenicity prediction of known protein sequences is based on similarity. We checked if the epitopes are antigenic or not using the online server AntigenPro (http://www.scratch.proteomics.ics.uci.edu/) 35 . All of the proteins were found to be nonallergenic, while they possess antigenic properties (Table 2).
B-cell epitope prediction. The BCPred server predicted 58 B-cell epitopes: five epitopes were for GP 42, eight for GP H, nineteen for GP B, one for GP L and GP N, five for GP M, and nineteen for GP 350. However, epitopes with scores above 0.99 were selected as the most potentially antigenic epitopes. Therefore, only one epitope each from GP42, GL, GM, GN and GH; four epitopes from GB; and fifteen epitopes from GP350 were found to meet the threshold value. B-cell epitopes, along with their scores, are tabulated in Table 3.
Surface accessibility of EBV. Threshold values >1 were set to predict the surface probability values. Amino acids with higher surface probability values (>1) have greater probability to be present on protein surfaces 36   Parker Hydrophilicity Prediction for EBV. Hydrophilicity of the predicted epitopes was calculated using the Parker hydrophilicity approach 35 . The graphical illustration of the predicted Parker hydrophilicity of EBV is shown in Fig. S3 (Supplementary Materials). From all of the predicted EBV peptides, the maximum hydrophilicity calculated was 6.843 for Glycoprotein 350 at the amino acid positions DNGTESK 496-502 . These regions were predicted to act as active T-cell epitopes. The minimum hydrophilicity score calculated was −7.857 for Glycoprotein M at the amino acid positions FLWWVVF [193][194][195][196][197][198][199] . Moreover, for all of the other proteins which have maximum and minimum hydrophilicity scores are shown in Table S3 (Supplementary Materials).

T-cell epitope identification.
Epitope predictions for all seven proteins were conducted on NetCTL, an online epitope prediction server. MHC-I binding prediction using the SMM method resulted in many potential epitopes against one allele, HLA-A*24:02. The weight matrix and artificial neural network was used for the prediction of MHC-I binding and proteasome dependent C-terminal cleavage. The MHC binding affinity, the     48 (kcal/mol), respectively. The vdW interaction energies for these complexes ranged from −29.71(kcal/mol) to −18.80 (kcal/mol). Residues Lys66, Arg97, Tyr99, Gln155, His114, and Thr163 from these complexes were uniformly involved in hydrogen bonding interactions. The Chimera interaction analysis tool predicted the interactions between peptide and MHC-I molecules within 3Ǻ. Overall, the stability was supported by the variable amounts of hydrogen bonding. The molecular interaction patterns are depicted in Fig. 3, while the interacting atoms are shown in Table 5.

PKPD modelling-based validation.
In a time-course simulation with the shortlisted peptides (obtained from the screening), the initial concentration of the top-listed peptides was set at 0.45 µm 54-56 , and after 20 seconds, all of the interactions of the EBV mechanism in the cell signalling cascade were stabilized given in Fig. 4 and 5. EBV is a Baltimore Class I virus of the Herpesviridae family and plays a crucial role in lymphoproliferative disease. During tropic EBV infection, EBV binds to the HLA class II molecule, and B cells inhibit epithelial cell fusion, while the GH receptor protein interacts with GP42, and GL is transported to the cell surface where it is essential for the correct folding of GH. EBV GB is important for viral fusion events with B cells. Glycoprotein 350 binding is supported by the binding of EBV gp42 to B-cell MHC-II, while fusion of the B-cell membrane and the outer viral envelope of EBV virion must have functional spicule glycoproteins, such as GH, GL and gp42. Vaccines development requires the identification of extremely competent B-cell linear or nonlinear and CTL epitopes, where T cells act as mediators.
A systems biology approach is useful to investigate T-cell epitopes in peptide sequences, and earlier reports have accelerated research leading to the development of immune biology.
Root Mean Square Deviation (RMSD). Molecular dynamics simulation was conducted to confirm the post-docking stability of the complexes. Trajectories were obtained after 50 ns and subjected to backbone stability using RMSD. RMSD of the selected complexes after 50 ns revealed that all of the complexes were stable and that the peptides had occupied the binding grooves of MHC-I molecules. RMSDs of all the complexes were calculated

Discussion
Vaccination is one of the best options for providing immunity against different pathogenic organisms, thus delivering protection against different diseases. Different types of vaccines, such as peptide, conjugated, subunit, and DNA vaccines, can be used to provoke the immune response. However, in this postgenomic and proteomic era, researchers prefer peptide or subunit vaccines over the whole pathogenic agent due to the availability and accessibility of huge data sets for different pathogens. These data can be systematically analyzed through the use of computational  tools. Presentation of MHC-Antigen on the surface of T-cells provide a way to kill the infected cell. This presentation of MHC-antigen complex direct apoptosis or self-eating process of the infected cell. The antigenic element of the pathogen provoke the immune response by activating a signalling process. The peptide fragment bound to MHC molecule is primarily presented T-cell which significantly rely on different factors including proteasome cleavage and transport with the aid of ER. TAP which transporting channels or proteins help in the transport to the surface of the cell. Therefore, considering the c-terminal cleavage activity and TAP efficiency greatly help in the selection of effective vaccine candidates 31,57-59 . Immunoinformatic approaches have contributed greatly to the development of vaccines. Therefore, we employed these tools to design peptide vaccines against EBV to provide a means to protect humanity from the multiple diseases caused by EBV, such as infectious mononucleosis, Burkitt's lymphoma 60 , Hodgkin's lymphoma 61 , stomach cancer, laryngeal carcinoma 62 , multiple sclerosis 63,64 and lymphomatoid granulomatosis 65 . Additional diseases that have been linked to EBV include Giannotti-Crosstie syndrome, erythema multiform, acute genital ulcers, and oral hairy leukoplakia 66 ; furthermore, hypersensitivity to mosquito bites has been associated with EBV infection 67 . EBV has been implicated in disorders related to alpha-synuclein aggregation (e.g., Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy) 68 . The proteome of EBV has many important functional proteins involved not only in its pathogenesis but also in the maintenance   ) as the final T-cell epitopes that could provoke the immune response in the host cell. The systems biology approach with pharmacokinetic/ dynamics modelling validated that these epitopes significantly activates the immune response pathway and, thus, provide a strong basis for the testing of these epitopes under in vivo conditions. Structural stability analysis of the peptide-MHC complexes also revealed the stability of these immunogenic complexes. Antigenic and allergenic profiles also confirmed that these epitopes are strong candidates. Furthermore, B-cell epitopes were reported as the primary choice for the development of a B-cell immune response. This study provides a means for the development of peptide-based vaccines against EBV infection that could prevent many important diseases. To date, no such computational meta-analysis integrated with dynamics has been reported for the purpose of developing a peptide vaccine for EBV. This multiple step process has noticeably increased the scope and precision of this study. Our results will facilitate efficient subsequent experimental efforts, as the specified regions from Glycoprotein B (QMDTIYQCY 134-142 ), Glycoprotein L (MTAASYARY 256-264 ), Glycoprotein N (TTDSEEEIF 396-404 ), Glycoprotein M (LTEAQDQFY [43][44][45][46][47][48][49][50][51][52], Glycoprotein 42 (CAELYPCTY 132-140 ), and Glycoprotein 350 (PTNTTDITY 316-324 ) could be used for the development of candidate CTL epitopes. This study will aid in the progress of peptide vaccines against EBV.

Conclusion
It is well known that EBV causes many human diseases, including cancer. This study integrated multiple approaches to elucidate possible effective peptide vaccines that could provide protection against multiple infections. This study provides insight into the disease-causing factors of EBV virus and, thus, finalized potential B-cell and T-cell epitopes that will aid in the development of effective vaccines. Despite the prediction and validation of such peptides computationally, testing of our predicted epitopes should be carried out in animal models.