Immunoinformatics approaches to explore the Helicobacter pylori proteome (virulence factors) to design a B and T cell multi-epitope subunit vaccine

Helicobacter pylori is a known causal agent of gastric malignancies and peptic ulcers. The extremophile nature of this bacterium makes it difficult to design a potent drug against it. Computational approaches to design an antigenic, stable and safe vaccine against this pathogen could therefore help to control the infections associated with it. In this study, we used multiple immunoinformatics approaches along with other computational approaches to design a multi-epitope subunit vaccine against H. pylori. A total of 7 CTL and 12 HTL antigenic epitopes, selected on the basis of C-terminal cleavage and MHC binding scores, were predicted from the four selected proteins (CagA, OipA, GroEL and VacA). The predicted epitopes were joined by AAY and GPGPG linkers. A β-defensin adjuvant was added to the N-terminus of the vaccine. For validation, immunogenicity, allergenicity and physicochemical analyses were conducted. The designed vaccine is likely antigenic in nature and produced robust and substantial interactions with Toll-like receptors (TLR-2, 4, 5, and 9). The vaccine was also subjected to in silico cloning and an immune response prediction model, which supported its efficiency of expression and its capacity to provoke an immune response. These analyses indicate that the suggested vaccine may produce specific immune responses against H. pylori, but laboratory validation is needed to verify the safety and immunogenicity of the suggested vaccine design.

B-cell epitopes prediction. B lymphocytes are a key component of the immune system: they secrete antibodies, which in turn provide long-term immunity. For the prediction of linear B-cell epitopes, we employed the online server BCPred (http://ailab.ist.psu.edu/bcpred/) 32 . The server uses a support vector machine (SVM) with a kernel method to predict 20-mer linear B-cell epitopes.
For the prediction of discontinuous B-cell epitopes, we used the web server ElliPro (http://tools.iedb.org/ellipro/). ElliPro combines a residue clustering algorithm with Thornton's method to predict conformational B-cell epitopes. The server uses MODELLER v9.20 to create 3D coordinates for the predicted epitopes and assigns a PI (protrusion index) value to each predicted epitope 33 .
Construction of vaccine sequence. A set of CTL and HTL epitopes was selected on the basis of high binding scores and non-allergenic nature. To build the final multi-epitope vaccine construct, the selected CTL epitopes were linked together by AAY linkers, whereas GPGPG linkers were used for the HTL epitopes. Linkers are significant for two reasons: (1) they effectively prevent the formation of neo-epitopes (junctional epitopes), and (2) they improve epitope separation and presentation [34][35][36][37] . Furthermore, to improve immunogenicity, human β-defensin was added as an adjuvant at the N-terminus of the vaccine using an EAAAK linker.
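The assembly of the construct can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name `build_construct`, the placeholder peptide strings, and the use of a GPGPG linker at the junction between the CTL block and the HTL block are assumptions. The text specifies only the EAAAK, AAY and GPGPG linkers and the N-terminal position of the adjuvant.

```python
# Linkers named in the text.
ADJUVANT_LINKER = "EAAAK"
CTL_LINKER = "AAY"
HTL_LINKER = "GPGPG"


def build_construct(adjuvant, ctl_epitopes, htl_epitopes):
    """Join adjuvant, CTL epitopes and HTL epitopes into one sequence.

    Assumption: the CTL block and the HTL block are bridged by a GPGPG
    linker; the source text does not state which linker sits at this junction.
    """
    ctl_block = CTL_LINKER.join(ctl_epitopes)
    htl_block = HTL_LINKER.join(htl_epitopes)
    return adjuvant + ADJUVANT_LINKER + ctl_block + HTL_LINKER + htl_block


# Hypothetical placeholder sequences, not the epitopes of this study.
example = build_construct(
    adjuvant="MRV",
    ctl_epitopes=["KLMNPQRST", "VWYACDEFG"],
    htl_epitopes=["HIKLMNPQRSTVWYA"],
)
print(example)
```

Keeping the linkers as named constants makes it easy to test alternative linker choices without touching the joining logic.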
Vaccine antigenicity profiling. Antigenicity evaluation is an important step in vaccine design. In this study, we used two servers for antigenicity prediction, ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) 39 and VaxiJen 2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html). The former is a pathogen-independent, sequence-based predictor trained on microarray data. The latter predicts antigenicity from the physicochemical properties of the query sequence, with a reported accuracy of 70 to 89% depending on the organism.
Physicochemical properties evaluation. Several physicochemical properties of the vaccine sequence were assessed with the freely accessible web server ProtParam (http://web.expasy.org/protparam/) 40 . The server calculates amino acid composition, molecular weight, aliphatic index, in vitro and in vivo half-life, instability index, theoretical pI, and GRAVY (grand average of hydropathicity).
Prediction of secondary structure. The secondary structure of the vaccine sequence was computed with the web server PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) 41 , which is reported to give highly accurate results. Proteins homologous to the vaccine construct are identified via PSI-BLAST, and these sequences are used to create a position-specific scoring matrix (PSSM). The server then processes the PSSM with a feed-forward neural network to predict the secondary structure elements.
Vaccine 3D structure prediction. To predict the 3D structure of the vaccine sequence, we used the widely used, freely available web server RaptorX (http://raptorx.uchicago.edu/) 42 . Along with the 3D structure, the server can predict disordered regions, solvent accessibility, secondary structure, binding sites and contacts. RaptorX also reports absolute and relative global quality estimates for the model, together with quality estimates for the individual residues of the query sequence. The 3D structure was inspected with PyMOL.
Vaccine 3D structure refinement. The 3D structure was submitted to the online server GalaxyRefine (http://galaxy.seoklab.org/) 43 for further improvement. The server employs a refinement method tested in CASP10: protein side chains are rebuilt and repacked, and the 3D structure is then relaxed by simulation. GalaxyRefine improved the structural and global quality of the 3D structure. YASARA 44 software was used for energy minimization and correction of the structure.
Figure 1. Workflow of this study. The complete methodology is shown in four steps: selection of proteins suitable for vaccine design; prediction of T and B cell epitopes; construction of the vaccine by joining the epitopes; and finally molecular docking with TLRs, MD simulations to assess the stability of the complexes, and in silico expression.
www.nature.com/scientificreports
Vaccine 3D structure validation. Three freely available web tools, ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) 45 , ERRAT (http://services.mbi.ucla.edu/ERRAT/) 46 and RAMPAGE (http://mordred.bioc.cam.ac.uk/rapper/~rampage.php) 47 , were used to validate the refined 3D structure. ProSA-web calculates and plots an overall quality score for the query 3D structure and highlights potential errors. ERRAT analyses the non-bonded interactions within the structure. RAMPAGE produces the Ramachandran plot, following PROCHECK principles for validation, and draws separate plots for glycine and proline residues.
Interaction analysis of the vaccine with TLR receptors. To evaluate the interaction between the vaccine and human Toll-like receptors 2, 4, 5 and 9, we employed PatchDock (http://bioinfo3d.cs.tau.ac.il/PatchDock/) 48 . PatchDock performs docking in three stages: molecular shape representation, surface patch matching, and filtering and scoring. The server divides the surfaces of both the vaccine and the TLRs into small patches according to surface shape; these patches correspond to the shape features by which puzzle pieces would be matched visually. Once the patches are identified, a second algorithm superimposes them. The docked complexes were then refined and re-scored with another server, FireDock 49 , to obtain the best structure. FireDock ranks the refined complexes on several terms, such as atomic contact energy, partial electrostatics and van der Waals (vdW) interactions.
Molecular dynamics simulation. MD simulations of all the complexes were performed with Amber 14 50 . Each system was neutralized by adding sodium ions with "tleap" and solvated in a TIP3P water box. Energy minimization was carried out in Amber 14 to remove clashes and strain in the system. The minimized system was then used for MD simulation with PMEMD.cuda. A cutoff radius of 10 Å was used for non-bonded interactions, and the SHAKE and Particle-Mesh Ewald (PME) algorithms were applied 51 . The ten-nanosecond trajectories were analysed with CPPTRAJ and PTRAJ 52 . Finally, the RMSDs and RMSFs of all the systems were calculated.
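For readers unfamiliar with the two stability measures, the following Python sketch shows how RMSD and RMSF are defined. It is an illustration under stated assumptions, not the CPPTRAJ implementation: the function names are hypothetical, coordinates are plain (x, y, z) tuples, and every frame is assumed to be already superimposed on the reference, so no fitting step is performed here.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between one frame and a reference.

    Both arguments are lists of (x, y, z) tuples of equal length.
    Assumption: the frame is already aligned to the reference.
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the mean position.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations
```

RMSD yields one value per frame and tracks the drift of the whole complex over time, whereas RMSF yields one value per atom (or residue) and shows which parts of the structure are flexible.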
Codon optimization and in silico vaccine expression. JCat was used for reverse translation and codon optimization to achieve maximum expression in the E. coli cellular machinery. JCat also calculated the GC content and the Codon Adaptation Index (CAI) of the query sequence to assess its expression potential. The options to avoid rho-independent transcription terminators, prokaryotic ribosome binding sites and restriction enzyme cleavage sites were selected. XhoI and NdeI restriction sites were added to the reverse-translated sequence. The final vaccine sequence was then cloned into the pET-28a(+) plasmid using SnapGene software.
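The two quantities reported for the optimized sequence can be illustrated with a short Python sketch. It assumes the standard definition of the CAI as the geometric mean of the relative adaptiveness values of the codons in the sequence; the function names and the toy weight table are hypothetical and are not taken from JCat or from a real E. coli codon usage table.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def codon_adaptation_index(dna, weights):
    """Geometric mean of the relative adaptiveness of each codon.

    weights maps a codon to its relative adaptiveness (0 < w <= 1), i.e. its
    frequency divided by that of the most used synonymous codon in the host.
    Codons missing from the table are skipped.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence is present in the weight table")
    return math.exp(sum(logs) / len(logs))


# Toy weights for illustration only (hypothetical values).
TOY_WEIGHTS = {"CTG": 1.0, "CTA": 0.25, "AAA": 1.0, "AAG": 0.25}
print(gc_content("ATGC"))
print(codon_adaptation_index("CTGCTA", TOY_WEIGHTS))
```

A CAI close to 1 means that the sequence uses mostly the codons preferred by the host, which is why the index is used as a proxy for expression level.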
Immune simulation. C-ImmSim 53 is an online server that uses an agent-based modelling approach to estimate the effect of a foreign particle (antigen) on the immune system. The server models the immune response against the antigen using a PSSM-based method. The production of antibodies, cytokines and interferon upon injection of the vaccine is calculated, and Th1 and Th2 responses are also forecast. Default parameters were used, and the Simpson index, D (a measure of diversity), was plotted.
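As a reminder of what the diversity measure expresses, the sketch below computes the classic Simpson index from clone counts. It is an assumption that this form, D = Σ p_i², applies here: several variants exist (for example 1 − D or 1/D), and the text does not state which one C-ImmSim plots. The function name and the example counts are hypothetical.

```python
def simpson_index(counts):
    """Classic Simpson index D = sum(p_i ** 2) for a list of category counts.

    D is 1.0 when a single category dominates completely and approaches 0
    as the population is spread evenly over many categories.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((n / total) ** 2 for n in counts)


# Hypothetical clone sizes: an even and a skewed population.
print(simpson_index([25, 25, 25, 25]))
print(simpson_index([97, 1, 1, 1]))
```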

Results
Helicobacter pylori protein sequences retrieval. The amino acid sequences of the selected H. pylori proteins GroEL, OipA, CagA and VacA were retrieved from GenBank (GI: 446963037, GI: 446632395, GI: 2498230 and GI: 15645505) to design a multi-epitope subunit vaccine that elicits an immune response against H. pylori infection. These proteins were selected for their efficacy and reliability as targets. The immunogenicity of the vaccine was enhanced by fusing human β-defensin 4A (UniProt ID: O15263) as an adjuvant, which regulates the immune response by coupling with the vaccine protein.
Cytotoxic and HTL epitopes prediction. A total of 72 CTL epitopes were predicted for all of the proteins by NetCTL 1.2. Of these 72 epitopes, only seven met the defined criteria (MHC binding score and non-allergenic nature) and were selected. The CTL epitopes are given in Supplementary Table S1. Similarly, HTL epitopes were predicted for the selected proteins using the IEDB MHC-II server, for a set of seven HLAs (HLA-DRB1*03:01, HLA-DRB4*01:01, HLA-DRB1*07:01, HLA-DRB3*02:02, HLA-DRB1*15:01, HLA-DRB3*01:01 and HLA-DRB5*01:01). The HTL epitopes are given in Table 1. The server predicted a variable number of HTL epitopes: three each from GroEL, CagA and OipA at different positions, and four from VacA.
Vaccine construction. Twelve HTL and seven CTL epitopes, selected on the basis of their binding scores, antigenic nature and non-allergenic property (Table 2), were joined together using GPGPG and AAY linkers to construct the final multi-epitope vaccine. The adjuvant was attached to the N-terminus of the vaccine for protection from degradation. An EAAAK linker joined the adjuvant to the CTL epitopes, while AAY and GPGPG linkers joined the CTL and HTL epitopes, respectively (Fig. 2). The predicted epitopes were evaluated by BLASTp to avoid epitopes homologous to human proteins. Only 16% similarity was reported, which is due to the human β-defensin sequence attached at the N-terminus.
Table 2. A list of the HTL and CTL epitopes used to model the final vaccine structure, based on their respective scores. These HTL and CTL epitope sequences were used in the final structure.
B cell epitopes prediction. Twelve linear B-cell epitopes predicted by BCPred, each 20 amino acids in length and with a score ≥0.90, were selected. Likewise, ElliPro predicted the discontinuous B-cell epitopes. In total, 51 amino acids were classified as conformational B-cell epitope residues, with a score of 0.819. The server default parameters were applied to predict these epitopes. The predicted conformational and linear epitopes are given in Fig. 3A,B.
Physicochemical properties calculations. Several essential properties such as allergenicity, molecular weight, half-life and instability were calculated. The server predicted the final vaccine construct to be non-allergenic, with a score of −0.42 against the default threshold of −0.4. The theoretical isoelectric point (pI) of 9.64 confirmed the basic nature of the vaccine, and the molecular weight (MW) was calculated as 41.25 kDa. The estimated half-life was >30 hours in mammalian reticulocytes (in vitro), >20 hours in yeast (in vivo) and >10 hours in E. coli (in vivo). An instability index of 17.28, below the ProtParam stability threshold of 40, indicated that the vaccine would be stable in an experimental setup. GRAVY was calculated as −0.261, while the aliphatic index was 68.28. In addition, the online servers ANTIGENpro and VaxiJen calculated antigenicity scores of 0.68 and 0.93 for the vaccine design, suggesting that the vaccine is immunogenic and can trigger an adequate immune response.
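Two of the descriptors above have simple closed-form definitions that can be reproduced directly. The Python sketch below is an illustration, not ProtParam's code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percentage of a residue. The function names and the test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def is_predicted_stable(instability_index, threshold=40.0):
    """A protein is classed as stable when its instability index is below 40."""
    return instability_index < threshold


print(is_predicted_stable(17.28))
```

A negative GRAVY value, such as the −0.261 reported here, indicates a protein that is hydrophilic on average.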
Secondary structure elements prediction. The PSIPRED web server was used to predict the secondary structure of the vaccine protein (Fig. 4). The predicted secondary structure elements comprise 35.8% α-helix, 12.9% β-sheet and 51.3% coil.
3D structure modelling and refinement. The 3-dimensional structure of the final multi-epitope vaccine was modelled with the RaptorX web tool. The final structure obtained from this server is shown in Fig. 5. Multi-template modelling predicted the 3D structure in three domains, with different templates for each domain: 1FD3 and 2LXO for domain 1; 1IOK, 5CDI, 3RTK, 1GRL and 1WE3 for domain 2; and 5MZ5, 2GOK, 4E57, 3B3J and 4WA0 for domain 3. Template 3RTK attained a p-value of 7.38e-05, a score of 83 and 18% identity. The GalaxyRefine tool was then used to refine the selected model 1, which was evaluated by RMSD (0.442), MolProbity (2.001), Ramachandran plot (93.3), GDT-HA (0.9365), poor rotamers (0.3) and clash score (11.2). The YASARA energy minimization tool was used to further improve the quality of the vaccine structure.
3D structure validation. The refined 3-dimensional model of the multi-epitope vaccine protein was validated with RAMPAGE, which showed 82.5% of the residues in the favoured region, 13.9% in the allowed region and only 3.6% in the outlier region. ProSA-web further assessed the quality of the model and reported a Z-score of −3.05, while the ERRAT server reported a quality factor of 86.3%. The plots obtained from the RAMPAGE server and ProSA-web are given in Fig. 6.
Interaction analysis of the vaccine with TLR receptors. Multiple TLRs (TLR-2, 4, 5 and 9) were used as receptors and are shown in Fig. 7. The PatchDock server reported the top ten interaction models, ranked by the geometry of the protein surface and electrostatic complementarity. The top complexes were refined and re-scored with the FireDock web tool, which reported the best model with electrostatic interactions (6.12), van der Waals interactions (−15.09), atomic contact energy (1.05) and binding free energy (−11.97).
RMSD was evaluated to quantify the stability of each system, while RMSF was evaluated to measure residual fluctuation, as shown in Fig. 8. Ten-nanosecond simulations were run for all the systems, which reported RMSDs of 0.7 nm (TLR-2), 0.7 nm (TLR-5), 0.5-0.6 nm (TLR-4) and 0.7 nm (TLR-9). Residual fluctuation (RMSF) remained within the allowed range, with the exception of a few residues that fluctuated more strongly.
Codon optimization and in silico vaccine expression. JCat was used to quantify the expression level of the multi-epitope vaccine in Escherichia coli (K12 strain). A total of 1170 nucleotides were used as input. Codon optimization was performed, and the Codon Adaptation Index (CAI) was found to be 0.95 with a GC content of 55%. These results indicate good expression of the final multi-epitope vaccine in E. coli (K12 strain); a GC content of 35% to 70% has been reported as required for good expression. Restriction sites (NdeI and XhoI) were added at the 5′ and 3′ ends, and the optimized nucleotide sequence was cloned into the pET28a(+) vector. The overall construct of the vaccine, along with the vector and restriction sites, is given in Fig. 9.
Immune simulation. The results from the C-ImmSim server are compatible with the immune responses reported in prior experimental studies of H. pylori infections. Figure 10 shows the plots obtained from the C-ImmSim server. Figure 10(A,C) shows that the primary response triggers the production of IgG and IgM antibodies, while the secondary response shows enhanced levels of IgG1 + IgG2, IgM and IgG + IgM antibodies (B-cell populations). Earlier screening of H. pylori patients reported IgM and IgG antibodies 55 . Furthermore, IFN-γ, Th1 and Th2 responses were also tested. Previous studies revealed that H. pylori infections are characterized by Th1, Th2 and IFN-γ responses and that both Th1 and Th2 responses are required for protection 56 . Here, the IFN-γ concentration and TH cell population are reported to be high (Fig. 10(B,C)). Thus, these findings suggest that our vaccine design could effectively trigger the immune response and provide the basis for immunity against H. pylori-associated infections.

Discussion
To control infectious diseases, the use of antibiotics is not only expensive but also continuously drives the emergence of resistant microbes. Alternatively, vaccination can be applied effectively to a vast population to prevent infections. In contrast to the classical approaches to vaccination, current research has made possible the generation of "subunit vaccines", which consist of particular pathogenic protein sequences of the microbes and are fully capable of stimulating an effective immune response. The availability of comprehensive information about microbial genomes and proteomes has made the design of novel and efficient subunit vaccines more practical. Conventional vaccine development approaches are now almost outdated because of their low effectiveness and high costs in budget and time, whereas vaccine design via immunoinformatics is comparatively stable, safe, inexpensive, specific and more effective.
Using an immunoinformatics approach, the proteome of H. pylori was exploited to identify target sequences for a subunit vaccine against H. pylori. As previously documented, the most recognizable virulence factors in H. pylori infections are cag PAI, VacA, cag+, dupA, OipA and BabA. To assess the immunogenic potency of these proteins, a comprehensive analysis was carried out using online servers, and online tools were applied to generate a potent subunit vaccine design. Additionally, B-lymphocyte and T-lymphocyte epitopes were determined from the chosen protein sequences. Generally, T cell receptors (TCRs) on T lymphocytes generate the immune response after activation by antigens bound to one of two molecules, MHC-I or MHC-II, on antigen-presenting cells (APCs). The immunoinformatics analysis of the H. pylori proteome shows that the proposed vaccine protein covers a large number of high-affinity MHC class I, MHC class II and linear B-cell epitopes, selected on the basis of physicochemical properties and structural features. The generation of plasma and memory B cells provides future protection from specific antigens or pathogen-associated antigens. Our final proposed vaccine design is composed of significant MHC-I and MHC-II binding epitopes, assisted by an adjuvant that improves host immunity. To counter an allergic response to the designed vaccine, the key parameters involved were carefully assessed. PSIPRED v3.3 and RaptorX were used for the secondary structure (SS) and 3D structure, respectively. The GalaxyRefine server was used to further refine the 3D structure, and the refined 3D model was additionally validated for quality using multiple servers. To assess the interactions between the designed vaccine and TLR-2, TLR-4, TLR-5 and TLR-9, docking was performed, and MD simulations were applied to assess the stability of the docked complexes.
For codon optimization and maximized expression, JCat was applied with the K12 strain of E. coli as host. The GC content and CAI calculated by JCat indicated high-level expression, and software validation indicated high solubility of the proposed vaccine with the required expression in E. coli. Disulfide bonds were used to provide stability to the vaccine.
The immunoinformatics methods used in this study supported the stability, effectiveness and high-level expression of the vaccine protein in an E. coli host. Furthermore, the agent-based model predicted the immune response upon injection of the vaccine sequence. The vaccine is predicted to be specific, non-allergenic and antigenic and may effectively control H. pylori infections; further experimental work and clinical trials are required to check its efficacy.

Conclusion
One of the major causes of gastric disorders is Helicobacter pylori. This study used immunoinformatics to design a novel and potent multi-epitope vaccine against H. pylori that could provide both types of immunity. Molecular docking, thermodynamic stability profiling, in silico expression and an agent-based modelling tool were used to assess the stability, expression and immune response provoked by the final vaccine. These results strengthen the case for experimental validation and can be helpful in designing an effective vaccine against H. pylori infection.