In silico analysis of epitope-based vaccine candidate against tuberculosis using reverse vaccinology

Tuberculosis (TB) kills more individuals in the world than any other disease, and a threat made direr by the coverage of drug-resistant strains of Mycobacterium tuberculosis (Mtb). Bacillus Calmette–Guérin (BCG) is the single TB vaccine licensed for use in human beings and effectively protects infants and children against severe military and meningeal TB. We applied advanced computational techniques to develop a universal TB vaccine. In the current study, we select the very conserved, experimentally confirmed Mtb antigens, including Rv2608, Rv2684, Rv3804c (Ag85A), and Rv0125 (Mtb32A) to design a novel multi-epitope subunit vaccine. By using the Immune Epitopes Database (IEDB), we predicted different B-cell and T-cell epitopes. An adjuvant (Griselimycin) was also added to vaccine construct to improve its immunogenicity. Bioinformatics tools were used to predict, refined, and validate the 3D structure and then docked with toll-like-receptor (TLR-3) using different servers. The constructed vaccine was used for further processing based on allergenicity, antigenicity, solubility, different physiochemical properties, and molecular docking scores. The in silico immune simulation results showed significant response for immune cells. For successful expression of the vaccine in E. coli, in-silico cloning and codon optimization were performed. This research also sets out a good signal for the design of a peptide-based tuberculosis vaccine. In conclusion, our findings show that the known multi-epitope vaccine may activate humoral and cellular immune responses and maybe a possible tuberculosis vaccine candidate. Therefore, more experimental validations should be exposed to it.

www.nature.com/scientificreports/ live-attenuated vaccine, BCG has a low safety because of the risks related with its use in immunocompromised individuals and the chance of returning the pathogen to its virulent state 12 . There are currently 16 TB vaccines in phase I, II, and III clinical trials, and some of them are a live attenuated form of Mtb 13 . Viral vector-based vaccines such as MVA85A and Crucell-Ad35/AERAS-402 will reduce the vaccine's efficacy before exposure to the vector 14 . Live prophylactic vaccines such as recombinant BCG VPM1002 and MTBVAC are also produced 7,8 , and can return to a pathogenic type. Inoculating patients with vaccines dependent on subunits decreases the risk of virulence reversal. The subunit vaccines (M72 and H4) consist of various Mtb antigens. However, lack of immunogenicity and are seldom capable of inducing immunity to long-term illness, requiring several vaccinations with adjuvants' addition. M72/AS01 E that was primarily considered clinically safe in both healthy and TB-diagnosed adults. However, several volunteers encountered local reactions at injection spots during phase II, which prematurely terminated the research 15 . Given the success of peptide vaccines such as H4/IC31, peptide vaccines are regarded as stable and potentially potent TB vaccines. H4/IC31 had clinically safe in phase I studies that induced a robust immune response in healthy adults, and BCG vaccinated infants 16 . Peptide vaccines usually have higher safety profiles due to epitopes without reactogenic responses 17 and low manufacturing charges 18 .
Different methods have been implemented to develop more effective novel TB vaccines, and among these methods, subunit vaccines have shown great promise 19 i.e., the variety of epitopes in the Mtb protein antigens constituting the subunit vaccine. Besides computational studies, some studies had endeavored to evaluate the human T cell immune responses to multiple Mtb subunit vaccines empirically 20,21 and using data from clinical trials, respectively [22][23][24] . Rodo et al. recommended that the identification of vaccines with distinctive immune response features may increase the probabilities of finding a safe vaccine 21 and provided valuable information on the potential production of the Mtb vaccine.
This study aimed to develop a TB epitope ensemble vaccine for new access in reverse vaccinology. The method has recently started to be demonstrated by the discovery of the epitope ensemble vaccine against SARS-CoV-2 [25][26][27] , Ebola virus 28 , malaria 29 Acinetobacter baumannii 30 , and Staphylococcus aureus 31 . Similarly, various immuno-informatics methods were used to develop a probable vaccine coding in the Mtb H37Rv genome for multiple B and T cell epitopes. They could potentially activate both humoral and cellular immunity 32 . Our finding proposed that the selected epitopes from the four Mtb antigens [Rv2608, Rv2684, Rv3804c (Ag85A), and Rv0125 (Mtb32A)] could be used effectively as potential candidates for vaccines and will be applied for future experimental research to eradicate TB. We selected Mtb epitopes with demonstrated immunogenicity combined to form an effective, widely available epitope ensemble vaccine, establishing a universal vaccine.

Materials and methods
Selection of Mtb strain, antigens and retrieval of protein sequences. There are 7 phylogenetic branches of Mtb, with lineages 2, 3, and 4 being responsible for the majority of worldwide spread. Most preclinical work has used lineage 4 derived vaccines due to the common lab strain H37Rv being of lineage 4. However, there was little direct evidence to support this selection 33 . The Immune Epitope Database and Analysis Resource (IEDB; http://www.iedb.org/) was used to collect M. tuberculosis-specific epitopes. We selected the antigens, based on their antigenicity, immunogenicity, conservancy, MHC binding affinity and IFN-γ stimulation. Antigens have chosen for this study included; Rv2608 (Accession No.: P9WHZ5), Rv2684 (Accession No.: P9WPD9), Rv3804c (Accession No.: P9WQP3) and Rv0125 (Accession No.: O07175). The Mtb H37Rv protein amino acid sequences were obtained from the mycobrowser database (https ://mycob rowse r.epfl.ch/). For all species of Mycobacteria, Mycobrowser is the complete genomic and proteomic database 34 including M. tuberculosis, M. leprae, M. marinum, and M. smegmatis 35 . Further analysis was done with the assistance of these protein sequences.
Prediction of antigenicity of proteins sequences. It is a significant feature of vaccine developing that designated vaccine candidates must have antigenic property. ANTIGENpro and VaxiJen v2.0 both were used to measure the antigenicity of the vaccine candidates. ANTIGENpro (http://scrat ch.Prote omics .ics.uci.edu/), which uses micro-array data to calculate protein antigenicity. The server's accuracy with the combined dataset was calculated to be 76% based on cross-validation experiments 41 . While the antigenic evaluation of the selected genes was performed via a freely accessible online VaxiJen 2.0 server (http://www.ddg-pharm fac.net/Vaxij en/ VaxiJ en/VaxiJ en.html) with a threshold value of ≥ 0.4 42 , only probable antigen epitopes were selected for the vaccine construction. VaxiJen 2.0 server is based on auto and cross-covariance (ACC) transformation of protein sequences into uniform vectors of major amino acid properties was used to evaluate the antigenicity of the vaccine. The VaxiJen algorithm is mainly based on the method of sequence alignment and analyzes protein physiochemical properties to identify them as antigenic 43 . Prediction of allergenicity and toxicity of proteins sequences. Allergen identification is a vital factor in the development of the vaccine. AllerTOP v.2.0 and AllergenFP servers measured the allergenic properties of the proteins. AllerTOP v2.0 an online server (http://www.ddg-pharm fac.net/Aller TOP) develops the k nearest neighbors (kNN), auto-and cross-covariance (ACC) transformation, and amino acid E-descriptors machine learning techniques for the classification of allergens by exploring the physiochemical properties of proteins. The accuracy of this approach was stated as 85.3% at fivefold cross-validation 44 . On the other hand, AllergenFP (http://ddg-pharm fac.net/Aller genFP /) is an alignment-free, descriptor-based fingerprint method for the detection of allergens and non-allergens. This method is based on a four-step algorithm. Initially, the protein sequences are defined in terms of their properties, including size, hydrophobicity, relative abundance, α helix and β-strand forming propensities. The generated strings that vary in length are transformed into vectors of equal size by ACC. The vectors were translated into binary fingerprints and measured according to the Tanimoto coefficient. The method was applied to known non-allergens and allergens and correctly recognized 88% of them with a Matthews correlation coefficient of 0.759 45 . Moreover, it is also a comprehensive and accurate method for the allergen prediction and also used by the researchers to predict allergens in the process of vaccine construction 46 . For further analysis, the protein sequences which are non-allergenic in the properties were selected.
And finally, all the epitopes were checked for toxicity using the ToxinPred server (https ://webs.iiitd .edu.in/ ragha va/toxin pred/multi _submi t.php) 47 , and non-toxic epitopes were chosen. The overall construct of the vaccine has also been verified for these characteristics.
Construction of multi-epitope vaccine candidate sequence. The highly antigenic, immunogenic, non-toxic, and non-allergenic epitopes were selected for the final vaccine construct. Selected CTLs, HTLs epitopes, and B-cell epitopes predicted by using NetCTL 1.2, IEDB MHC II server and ABCpred server respectively, were used to construct multi-epitope vaccine sequence. The linear B-cell and HTL epitopes were linked with GPGPG linker and CTL epitopes by AAY linker. In addition, a griselimycin (APD ID: AP02688) 48 was selected as an adjuvant to increase the immunogenicity of the vaccine, and linked via EAAAK linker. The sequence of the griselimycin was retrieved from the Antimicrobial Peptide Database (http://aps.unmc.edu/AP/ main.php).
Physiochemical properties and solubility prediction. Expasy Protparam (https ://web.expas y.org/ protp aram/) was used to predict various physicochemical properties like theoretical isoelectric point (pI), amino acid composition, in vitro and in vivo half-life, instability and aliphatic index, molecular weight (MW), and grand average of hydropathicity (GRAVY) of the vaccine constructs 49 . The multi-epitope vaccine solubility was predicted using the Protein-Sol server (http://prote in-sol.manch ester .ac.uk). The scaled solubility value (Que-rySol) is the predicted solubility. The population average for the experimental dataset (PopAvrSol) is 0.45, and thus any scaled solubility value greater than 0.45 is predicted to have a higher solubility than the average soluble E. coli protein from the experimental solubility dataset 50 . The protein with a lower scaled solubility value is predicted to be less soluble.
Secondary structure prediction. The secondary structures of the vaccine constructs were generated using the online tool PSIPRED and RaptorX Property servers. PRISPRED (http://bioin f.cs.ucl.ac.uk/psipr ed/), is an online server secondary structure generating tool that also predicts the transmembrane topology, transmembrane helix, fold and domain recognition etc efficiently 51  www.nature.com/scientificreports/ (Deep CNF) to continuously calculate secondary structure (SS), disorder regions (DISO), and solvent accessibility (ACC) 52 .
Tertiary structure prediction. The tertiary or three-dimensional (3D) model of the multi-epitope vaccine was prepared using the homology modeling tool I-TASSER (Iterative Threading ASSEmbly Refinement) server (https ://zhang lab.ccmb.med.Umich .Edu/I-TASSE R/). The I-TASSER server is an integrated platform for computerized protein structure and function prediction based on the sequence-to-structure-to-function paradigm and identifies similar structure patterns from the Protein Data Bank (PDB) 53 . I-TASSER initial produces 3D atomic models from several threading alignments and iterative structural assembly simulations starting from an amino acid sequence. A template modeling (TM)-value > 0.5 shows a model of accurate topology, and a TMvalue < 0.17 indicates a random similarity. These cutoff value does not depend on the length of the protein 54 . In the previous five community wide CASP (Critical Assessment of techniques for Structure Prediction) experiments, I-TASSER was ranked finest server for protein 3D structure prediction 55 .
Refinement of the tertiary structure. GalaxyRefine web server (http://galax y.seokl ab.org/cgi-bin/ submi t.cgi?type=REFIN E) has refined the 3D model obtained for the multi-epitope vaccine peptide. The Gal-axyRefine server is based on a refinement approach that was effectively verified in CASP10 based refinement experiments. and achieves the repacking and molecular dynamics simulation to relax the structure. This method can improve the quality of both global and local structures when used to improve the models produced by state of the art protein structure prediction servers 56 .
Validation of tertiary structure. Tertiary structure validation is a severe stage of the model construction method because it identifies possible mistakes in 3D models predicted 57 . ProSA-web server (https ://prosa .servi ces.came.sbg.ac.at/prosa .php) was initially used for protein 3D structure validation, which estimates a total quality score exact input structure, which is shown in the form of Z score. If the Z scores are outside the range of the properties for native proteins, it specifies that the structure likely contains errors 58 . To investigate nonbonded atom-atom interactions associated with the ERRAT web-server (http://servi ces.mbi.ucla.edu/ERRAT /) was also used to predict high-resolution crystallography structures. A Ramachandran plot was retrieved via RAMPAGE web-server (http://mordr ed.bioc.cam.ac.uk/~rappe r/rampa ge.php) and describe the quality of the modeled structure by displaying the percentage of residues in disallowed and allowed regions 59 .
Prediction of discontinuous B-cell epitopes. More than 90% of B cell epitopes were determined to be discontinuous. ElliPro, an online server (http://tools .iedb.org/ellip ro/), has been used to predict the validated 3D structure of discontinuous (conformational) B-cell epitopes. ElliPro implements three algorithms based on their protrusion index (PI) values to estimate the protein shape as an ellipsoid, measure the residue PI, and adjacent cluster residues. ElliPro offers a score for each output epitope termed as an average PI value over each epitope's residue. The ellipsoid with a PI value of 0.9 contains (90%) protein residues included while the (10%) residues are outside ellipsoids. The PI value for each epitope residue was determined based on the center of residue mass residing outside the largest ellipsoid possible. Compared to other structure based approaches used to predict epitopes, ElliPro achieved the top and provided an AUC value of (0.732) as the best calculation for any protein.

Molecular docking of the final vaccine with immune receptor. It is based on the interface between
an antigenic molecule and a particular immune receptor to produce an effective immune response. TLR3 (PDB ID: 1ZIW) was downloaded from Protein Databank (PDB) (https ://www.rcsb.org). Online servers ClusPro 2.0 (https ://clusp ro.bu.edu/login .php), HADDOCK server (https ://haddo ck.scien ce.uu.nl/), PatchDock server (https ://bioin fo3d.cs.tau.ac.il/Patch Dock/php.php), and FireDock server (http://bioin fo3d.cs.tau.ac.il/FireD ock/php.php) were used for molecular docking and docking refinement, respectively 60 . Again, the docking was performed for the third time using the HawkDock server (http://cadd.zju.edu.cn/hawkd ock/), and subsequently, the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) score was also measured using the same server that predicts the result in the affinity score and the lowest prediction score is considered the better score 61 .

Molecular dynamics simulation.
The molecular dynamics simulation study was conducted for the vaccine construct that showed the best molecular docking study results. The iMODS web-server (http://imods .Chaco nlab.org/) was used for the molecular dynamics simulation study, a fast, free-accessible and molecular dynamics simulation server for defining and calculating the protein flexibility 62 .
Codon optimization and in silico cloning. A codon optimization approach was used to improve recombinant protein expression. Codon optimization is essential because the genetic code's degeneracy permits most of the amino acids to be encoded by multiple codons. Java Codon Adaptation Tool (JCat) server (http://www. prodo ric.de/JCat) was used in the codon system of E. coli (strain K12) to obtain the codon adaptation index (CAI) values and GC contents to determine the levels of protein expression. The best CAI value is 1.0, while > 0.8 is regard a good score, and the GC content range from 30 to 70%. There are unfavorable effects on translation and transcriptional efficiencies beyond this range 63 . The multi-epitope vaccine's optimized gene sequence was cloned in E. coli plasmid vector pET-30a (+), NdeI and HindIII restriction sites were added to the N and C-terminals of the sequence, respectively. Finally, the optimized sequence of the final vaccine construct (with restriction sites) was inserted into the plasmid vector pET-30a (+) using the SnapGene software (https ://www. snapg ene.com/free-trial /) to confirm the expression of the vaccine.

Protein sequences and PDB structures. The amino acid sequence of Mycobacterium tuberculosis
H37Rv was saved from the mycobrowser database as a FASTA format. Hence, then the functional sequences for the four proteins were subjected to linear B-cell and T-cell epitope prediction and develop a novel subunit vaccine against tuberculosis.

Prediction of cytotoxic T lymphocytes epitope.
For the four nominated proteins, 34 CTL (9-mer) epitopes were predicted using the NetCTL 1.2 web-server fixed at the threshold value for epitope documentation. Out of all these predicted CTL epitopes, only ten epitopes were chosen to construct the vaccine based on their high scores binding affinity towards MHC-I, antigenicity, non-allergenicity, and non-toxicity, as illustrated in (Table 1).

Prediction of Helper T lymphocytes epitope.
High-binding MHC-II epitopes for human alleles HLA-DR, predicted with the IEDB MHC-II web server were defined as HTL epitopes. A total of four HTL epitopes were nominated for the final vaccine on the basis of binding affinity, antigenicity, non-allergenicity, and non-toxicity, as illustrated in (Table 2) Prediction of interferon-gamma inducing epitopes. The IFN-gamma plays a significant role in intracellular pathogen evasion and majorly acts as cytokines for cytotoxic T lymphocytes and natural killer cells. The IFN-γ inducing epitopes predicted by the IFN-γ epitope server, using the Support Vector Machine (SVM) method. Four HTL epitopes with IFN-γ positive scores were chosen for vaccine construction. www.nature.com/scientificreports/ Prediction of linear B-cell epitopes. ABCpred server was used to predict the B cell epitopes. All predicted B-cell epitopes (16-mer) with a cut-off binding score > 0.9, high antigenic, non-allergenic and non-toxic, a total of four B-cell epitopes finally chosen for vaccine construction, are listed in Table 3.
Construction of multi-epitope subunit vaccine. Four B cell epitopes, four HTL epitopes, and ten CTL epitopes were nominated to design a novel vaccine, fulfilling the criteria of binding affinity, antigenicity, nontoxicity, and non-allergenicity. In addition to these epitopes, to improve immunogenicity, an adjuvant griselimycin with APD ID: AP02688 also applied to both the N and C terminals of the vaccine. EAAAK linkers link adjuvant to the epitopes, GPGPG linkers were used to link B-cell and HTL epitopes and AAY linkers to link CTL epitopes. The constructed vaccine sequence was again tested for antigenicity, non-allergenicity, non-toxicity, solubility and fulfilling all the criteria. The schematic presentation of the final multi-epitope vaccine peptide of the current study is defined in Fig. 1.  www.nature.com/scientificreports/ which shows a hydrophilic nature of the vaccine constructs. The solubility score of protein was 0.557, which indicates the protein is highly soluble upon expression ( Fig. 2A).

Prediction of antigenicity
Prediction of secondary structure. The overall vaccine sequence was estimated to have 20% α-helix, 21% β-strand, and 58% coil (Fig. 2B). Furthermore, 38% of amino-acid residues were expected to be exposed, 22% medium exposed, and 38% buried in support of solvent accessibility.
Tertiary structure modeling. Five tertiary 3D structures of the designed vaccine were predicted by the I-TASSER web-server based on ten threading templates, with Z score values (1.10-2.78) and confidence score (C-score) values (− 3.96 to − 1.31). Usually, the C score series is from − 5 to 2, with high scores representing high sureness. The best structure with the C value − 2.01 from the modelling chosen for additional analysis. (Fig. 3A). This structure had a probable TM-score of 0.47 ± 0.15, with an expected root-mean-square deviation (RMSD) score of 11.2 ± 4.6 Å. The TM-value has been recommended as a calculating scale for the structural resemblance among the structures. The TM-value was suggested to address the issue of RMSD, that is delicate to native mistake.
Tertiary structure refinement. The GalaxyRefine web-server was used to enhance the consistency of the modeled protein. The loop refinement and energy minimization were carried out to obtain the high quality of the predicted structure. The refinement of the initial "crude" vaccine model on the GalaxyRefine web-server produced five model structures. Based on structure quality for all developed structures, model 4 was the most significant based on several factors, i.e., GDT-HA (0.9079), RMSD (0.518), and MolProbity (2.510). The clashvalue was 22.4, the low rotamers-value was 0.4, and Rama favored value was 84.50. This model was selected for additional study (Fig. 3B).
Tertiary structure validation. The refined structure was exposed to the Ramachandran plot analysis using the RAMPAGE web-server. The VADAR web tool plot exposed 85.9% of residues in favored regions, and 8.9% is allowed regions 5.2% of residues in the outlier region (Fig. 3C). Both ProSA-web and ERRAT verified the quality and potential errors in the crude 3D model. The chosen model after refinement had an overall quality factor of 87.9% with ERRAT. The Z score for the input vaccine was estimated to be − 1.39 by the ProSA-web- www.nature.com/scientificreports/ server (Fig. 3D). The overall results from RAMPAGE, ERRAT, VADAR web tool, and ProSa-web have validated the 3D modeled protein's outstanding quality.

Prediction of conformational B-cell epitopes.
One hundred ninety-two residues were estimated to be situated in four discontinuous B-cell epitopes, with values from 0.69 to 0.785. The conformation epitopes ranged in size from 20 to 65 residues. For discontinuous peptides that predict using Ellipro, the score value of 0.69 or more was selected (Fig. 4A-D) and (  ) were predicted. The individual score of each of the discontinuous epitopes has been shown in (Fig. 5A).

Molecular docking.
The protein-protein docking of the vaccine was carried out by numerous online tools for improving the accurateness of the prediction: i.e., ClusPro 2.0, PatchDock and HawkDock server. The docked complexes that were created by ClusPro 2.0 and PatchDock tools were further investigated by PRODIGY tool of HADDOCK web-server and FireDock server, respectively (Fig. 5B). The PRODIGY tool estimated the binding affinity score (kcal/mol) whereas FireDock predicted the global energy of the docked complexes. However, HawkDock produces ranking scores along with the binding free energy (kcal/mol). The binding free energy was deliberate after the MM-GBSA score in the HawkDock web-server. The vaccine in the docking experiment carried out by the ClusPro 2.0 and PRODIGY servers had the lowest binding affinity when docked with TLR-3 (− 32.82 kcal/mol). The vaccine with TLR-3 complex has good global binding energy score (− 35.88 kcal/mol) www.nature.com/scientificreports/ acquired from PatchDock server. However, the vaccine also indicated the best presentations with the nominated TLR-3 by the HawkDock server and also when studied in the MM-GBSA study with a relative binding free energy − 42.82 (kcal/mol) (Fig. 5C).

Molecular dynamics simulation.
The results of molecular dynamics simulation and normal mode analysis (NMA) of vaccine construct and TLR-3 docked complex is illustrated in (Fig. 6A). The simulation study was conducted to determine the movements of molecules and atoms in the vaccine construct. The deformability graph of the complex illustrates the peaks in the graphs which represent the regions of the protein with deformability (Fig. 6B). The eigenvalue of the complex is 1.726468e−09 as shown in (Fig. 6C). The variance graph displays the cumulative variance by green colored and individual variance by red colored (Fig. 6D). The B-factor graph gives a clear visualization of the relation of the docked complex between the NMA and the PDB sector (Fig. 6E). The co-variance map of the complex where the correlated motion between a pair of residues is   www.nature.com/scientificreports/ indicated by red color, uncorrelated by white color and anti-correlated by blue color (Fig. 6F). The complex 's elastic map shows the relation between the atoms and darker gray regions, indicating stiffer regions (Fig. 6G).

Codon optimization and in-silico cloning. For optimizing the vaccine construct's codon usage, Java
Codon Adaptation Tool (JCat) was used for maximal protein expression in E. coli (strain K12). The optimized codon sequence had a length of 927 nucleotides. The codon optimization index (CAI) value was predicted 1.0, and the average GC content of the adapted sequence was 59.2%, which indicates high expression in the E. coli host. Finally, the recombinant plasmid sequence was constructed by introducing the adapted codon sequences into the plasmid vector pET30a (+) using SnapGene software (Fig. 7).
Immune simulation. C-ImmSim studies the successive and effective immune responses of the state of the cell and the memory of immune cells by a mechanism that increases their half-life. The effect of the approach is that few cells substantially increase their half-life and live longer than other cells. ImmSim server immune simulation outcomes confirmed consistency with real immune reactions. The primary response was illustrated by high IgM levels. In addition, an increase in the B-cell population was characterized as an increase in immunoglobulin expression (IgG1+IgG2, IgM, and IgG+IgM), resulting in a decrease in antigen concentration (Fig. 8A,B). There is also a clear increase in the population of Th (helper) and T C (cytotoxic) cells with memory growth (Fig. 9A,B). IFN-γ production was also identified to have been stimulated after immunization (Fig. 9C). The T cell population results were significantly approachable as the memory developed, and all other immune cell populations were exposed to be consistent.

Discussion
Tuberculosis (TB) is a life threating disease and the TB vaccines used globally, the BCG vaccine, offers limited protection against TB for children, and adults, which accounts for most of the TB cases worldwide 66 . Therefore, new candidate vaccine against TB are extremely needed and some of them are under evaluation in clinical trials. Development in reverse vaccinology and the existence of genomics and proteomics information help in vaccine designing. Moreover, successful implementation of the bio-informatics tools is beneficial compared to traditional vaccine design 67 . Identification of immunogenic antigens is an essential step in vaccine designing as it can be potentially used for in-silico epitope prediction 68 . Immunogenic antigens have a unique attribute to attach and respond with immune cells and perform as immuno-dominant 69 . Using computational methods, we conducted an organized and complete valuation of the four Mtb antigens (Rv2608, Rv2684, Rv3804c, and Rv0125) constituting Mtb subunit vaccines undergoing clinical trials. We predicted linear B and T-cell epitopes www.nature.com/scientificreports/ using bioinformatics tools that could possibly stimulate both cellular and humoral immunity 32 . The prediction of linear B and T-cell epitopes from Mtb was selected to predict the vaccines as they play an important role in the cellular and biological developments. These epitopes of B-cells and T-cells may theoretically be used to produce vaccines targeting Mtb and may be reliable for stimulating both humoral and cell-mediated immunity. Prediction of B-cell epitopes is a significant characteristic for the designing of vaccines 70 , and to provide the site for the antigen-antibody interactions [71][72][73] . In the present research, 16-mer B-cell epitopes were predicted via the ABCpred server. T-cell epitopes are essential for adaptive immune stimulation and are sufficient to cooperate with MHC molecules 74 . Therefore, the selection of epitopes fixes with MHC is an essential aspect in predicting potent T-cell epitopes 75 . Also, CD4 + and CD8 + T-cells' recognition is critical while developing multi-epitope-based vaccines 76,77 . We predicted B and T-cell epitopes from the nominated antigens and joined them using AAY and GPGPG linkers in order to construct epitope-based vaccine 78 . To generate sequences with reduced junctional immunogenicity, the previously described GPGPG and AAY linkers 57,63 were integrated between the selected epitopes, enabling the rational design of a multi-epitope vaccine 78 . The EAAAK linker 79 was also fused between the adjuvant and the epitopes sequences for a best expression and bioactivity improvement of vaccine. Immunoinformatics evaluation of the constructed vaccine specified many MHC Class I, MHC Class II, IFN-γ, and linear B-cell epitopes. However, previous studies recommends that IFN-γ also promotes general protection against TB in the mice lungs 80 . The lack of the protein vaccine's allergenic function has nearer enhanced its efficacy as a candidate vaccine. The constructed multi-epitope vaccine showed higher scores of antigenicity both on ANTIGENpro and Vaxijen v2.0 server. Multi-epitopic vaccines have less immunogenicity and need adjuvants 78 .
The final protein's molecular weight (MW) was predicted to be 31.9 kDa and has been estimated to be highly soluble upon expression, along with its virtual immunogenicity. The solubility of recombinant protein overexpressed in the E. coli host is critical for numerous biochemical and functional studies 57 . The estimated theoretical pI is 4.28, suggesting that the constructed vaccine is acidic. The predicted score of the instability index was 25.51, which suggests that the protein would be extremely stable upon expression, thus further firming its probability. The aliphatic index shows that there are aliphatic side chains in the protein, representing possible hydrophobicity. Knowledge about the target protein's secondary and tertiary structures is vital in vaccine development 78 . Analyses of the secondary structure showed that the protein contained mainly of 58% coils, with 13% of residues disordered. Natively unfolded protein regions and α-helical coiled-coils peptides have been identified as significant "structural antigens" types. When examined in synthetic peptides, these two structural types can fold into their native structure and thus be recognized by antibodies naturally induced in response to infection 81 . The vaccine candidate's 3D structure improved considerably after the refinement and displayed appropriate characteristics based on the Ramachandran plot's results. The result of the Ramachandran plot indicates that 85.9% of the residues are initiate in the favored regions, and 8.9% are allowed regions with less (5.2%) residues in the outlier region; this suggests that the quality of the whole model is acceptable. One of the primary feature in validating a candidate vaccine is to screen for immunoreactivity over serological study 82 .
Further, to examine the capability of the constructed vaccine to bind with TLR on immune cells, the TLR-3 was docked with the vaccine. The results revealed that the constructed vaccine had a high binding affinity towards TLR-3. This interface of vaccine with TLR-3 was signifying that vaccine have the probability to produce both innate and adaptive immune response. For exploring the stability and dynamics performance of the TLR-3-vaccine docked complex, MD simulation was implemented, and the RMSD plot representing the steady binding of the complex.
Immune simulation showed results consistent with typical immune responses. Following repeated exposure to the antigen, there was a general increase in the generated immune responses. The development of memory www.nature.com/scientificreports/ B-cells and T-cells was evident, with memory in B-cells lasting several months. Helper T cells were particularly stimulated. Another interesting observation was that levels of IFN-γ and IL-2 rose after the first injection and remained at peak levels following repeated exposures to the antigen. This indicates high levels of T H cells and consequently efficient Ig production, supporting a humoral response. The Simpson index, D for investigation of clonal specificity suggests a possible diverse immune response. This needs that recombinant protein be expressed in a suitable host. The preferred alternative for the expression of recombinant proteins is the E. coli expression systems 83,84 . Codon optimization was carried out to get high-level expression of the recombinant vaccine in E. coli system (K12 strain). For high-level protein expression in bacteria, both the GC content (59.2%) and the CAI score (1.0) were favorable. The next step currently being planned is to express this peptide in a bacterial system and carry out the numerous immunological analyses required to confirm the results achieved through immuno-informatics analysis.

Conclusion
Computational approaches presented in the present work may produce new knowledge about Mtb vaccine antigens and new vaccine candidates that cannot simply be acquired from pre-clinical, in-vitro and animal studies. This study used an immuno-informatics tool to define tuberculosis novel multi-epitope subunit vaccine, which is highly immunogenic and has appropriate properties to be a carrier vaccine. Epitope-prediction tools were used to analyze multiple B-cell and T-cell epitopes, which were fused using suitable linkers and adjuvant to enhance the vaccine's immunogenicity. Antigenicity, allergenicity, solubility, as well as physiochemical properties and tertiary structure analysis, of vaccine were found to be very satisfactory. Molecular docking and molecular dynamics simulation analysis of TLR-3 and vaccine were completed, allowing estimation of the binding affinity www.nature.com/scientificreports/ and stability of the complex. The in silico immune simulation confirmed immune cell response against antigen clearance rate. The codon optimization also provided an optimistic CAI value, which will help in-vivo expression studies soon. In this research, we made use of various immune-informatics tools for investigating different properties of vaccine. Additionally, the predicted epitopes-based subunit vaccine's assessment is hugely acceptable to prove them as an immunogenic and potential vaccine candidate against tuberculosis. www.nature.com/scientificreports/