Introduction

The World Health Organization (WHO) announced SARS-CoV-2 as a pandemic in January 2020. Globally, as of date, there have been over 159 million confirmed cases of COVID-19 with the second COVID-19 wave kicking in around the 1st week of March 2021 and registering over 3.3 million deaths. According to a report, preventive and treatment options is one of the top two scenarios, the other being digitalization drive, which needs global adoption by the post-COVID-19 world to bounce back and get on to the revival path1. Since December 2019 the global COVID-19 pandemic has reached unprecedented deaths and death rates, currently continuing its more dreaded second wave. The world has called for preparedness for COVID-19 third wave as well. As of date, 475 COVID19 vaccine candidates are currently undergoing trials belonging to at least one of the fourteen different vaccine platforms2,3,4. Among them, 166 (34.9%) candidates are undergoing clinical trials, 251 (52.8%) candidates are undergoing the preclinical trials, 42 (8.8%) are in the Discovery phase and the remaining 16 (3.4%) have been discontinued. Currently, 34.9% (166) vaccine candidates belong to the protein subunit platform which emphasizes the importance and future potential of peptide-based vaccine candidates. There are 94 protein subunit-based vaccine candidates (~ 56.6%) that are under preclinical trials indicating there is a huge opportunity for peptide-based vaccine candidates4. As can be observed from the current vaccine approval statistics, protein subunit vaccine candidates have the highest approvals. Currently, out of 36 vaccines approved by at least one national regulatory authority(NRA) in the world, 12 belong to protein subunit platform as compared to the approval status eight months ago, where only one protein subunit vaccine: EpiVacCorona (Russia NRA), had the approval which is a peptide-antigen-based vaccine5. There are ten vaccines currently approved and listed under the WHO Emergency Use Listing (EUL) and among them two of the vaccines (COVOVAX™ and NUVAXOVID™) listed belong to the protein subunit platform where the same list did not have any Protein subunit platform based vaccines before the last revised list6,7. One of the first efforts of in silico vaccine design was published in 1999, formally launching immunoinformatics8,9,10. One of the first attempts to use multi-epitope-based polypeptides (MEBPs) for HIV-1 vaccine development dates back to 199911. In recent times MEBPVCs have shown extraordinarily high humoral and cellular responses to immunization. One of the first computational methods for designing MEBP vaccine constructs was published in 201012,13. Since then numerous articles related to in silico design of MEBP vaccine development have been published addressing vaccines against cancer, Chagas disease, filarial diseases, multi-drug resistance, Malaria, TB, and others14,15,16,17,18,19,20,21. MEBPVCs are gaining preference as they provide better control over the immunogenic components of the pathogen responsible for causing diseases, better reproducibility, and experimental control22. A recent article compiles all the interacting regions of the spike proteins from different pathogenic coronaviruses with human immune sensors like Toll-Like Receptors (TLRs), derives the epitopes specific to the interacting regions and designs MEBP. This methodology is claimed to provide a more comprehensive way to fight COVID-19 infection23.

Recently, many novel multi-epitope-based peptide (MEBP) vaccine constructs (MEBPVCs) against Spike Protein and or multi-targets of SARS-CoV-2 that causes COVID-19 disease have been designed using in silico approaches. Rahmani et. al., 2021, proposed a trivalent (multi-target) MEBPVC that contains new components such as an intracellular delivery agent (TAT) and synthetic epitope (PADRE) in addition to conventional components such as adjuvants (β-defensin 2), the predicted epitopes, and linkers to boost the immune response24. The novelty in Saha et. al., 2021 publication is dual-purpose epitopes i.e., each epitope predicted is a B-Cell derived T-cell epitope with a fixed size of ten amino acids. In other words, each predicted epitope triggers a response from both B-Cells and T-Cells simultaneously. This approach keeps the size of MEBPVC small yet efficient, keeping the titers high from both B-cells and T-cells25. Similarly, Khairkhah et. al., 2020, in their design, have not used any adjuvants26. Table 1 gives a detailed summary of the most recent in silico approaches in MEBPVCs against spike protein or multi-targets of SARS-CoV-2.

Table 1 The common components, i.e., adjuvants, linkers and predicted epitopes specific to various targets of SARS-CoV-2 are used in a MEBPV construction.

It is a fundamental and proven fact in molecular biology that a change (modification/mutation) in the composition and or order of an individual or a group of amino/nucleic acids in the protein/DNA/RNA sequence could bring substantial change to the fold/3D structure and hence alter the function. In the context of peptide subunit vaccine platforms, this would mean that any change in the order or composition of amino acids of the peptide-based immunogen (especially large sized immunogens), may undergo changes in the fold and 3D structure. This in turn is expected to alter the biophysical and immunological properties of the MEBPVC such as surface accessibility of the epitopes in the MEBP. Any change in the epitope accessibility to the host immune system influences the immunogenicity/antigenicity of the MEBPVCs.

With increasing adoption, strong interdependence of structural biology and immunoinformatics, and with a trend for designing bigger and larger vaccines with sizes ranging from 10 to 100 kDa, the emphasis should be to utilize the protein structural biology knowledge for effective vaccine design. The opportunity for structure-based vaccine design leads to next-generation immunogen development30,31. In this article, we present a novel methodology to computationally design and validate MEBPVCs. This study indeed provided interesting insights that will help in developing novel MEBP specific design tools and also speedup and improve the current vaccine design methodologies and protocols which is not only the need of the hour but also a need for global preparedness for the future pandemics supporting novel platforms32.

Results

The purpose of a vaccine (MEBPVC in the current context) is to elicit a strong response from the host immune system against a disease (COVID-19) releasing various neutralizing antibodies which continue to stay in the body to protect the host from any repeat infection thenceforth. Toll-like receptors (TLRs) are the common receptors that interact with the immunogen (MEBPVC) and trigger the downstream response and release of neutralizing antibodies. From the informatics point of view, to correlate the properties, to derive relationship with immunogenicity and changed epitope positions of the MEBPVC, the necessary data was generated at four levels: (a) sequence level, (b) 3D structure level, (c) receptor-ligand interaction level and (d) dosage versus immune response level. The data generated thus, are analysed to understand if the changed epitope positions influenced the various properties and eventually the immunogenicity. Prior to the analysis, to establish the fact that the change in the order/positions of the epitopes in a MEBPVC changes the immunogenicity, one has to first assess the diversity among the MEBP variants. To assess the same, pairwise alignment, multiple sequence alignment (MSA), and structure alignment were performed on the ten variants. The results are discussed below. Table 2 provides the percent sequence identity (Lower triangle) and root mean square deviation (RMSD) (upper triangle) as a ready reference.

Table 2 For ready comparison between the MEBP variants, the data is presented as percent sequence identity (lower triangle: below 100.00 diagonal) and root mean square deviation (RMSD) (upper triangle: above 100.00 diagonal).

Sequence & structural similarities between the MEBP variants

The average sequence identity between the variants is around 49% which is understandable considering the common adjuvants (45AA), linkers(AAY:3 × 5(copies) = 15AA, GPGPG:5 × 5(copies) = 25AA), and HIS tags(6AA) which make up to 49% (91AA) of the 183 AA long MEBPVC. Interestingly, the MEBP variants with IDs, SPVC_214 and SPVC_383, have 73.51% (highest) sequence identity (Fig. 1a).

Figure 1
figure 1

Structure alignment: (a) SPVC_214 (yellow) and SPVC_383 (green), 73.51% (SeqId), (b) SPVC_214 (yellow) and SPVC_387 (green), 39.13%, (c) REF_SEQ (yellow) and SPVC_214 (green), 17.3Å, (d) SPVC_32 (yellow) and SPVC_446 (green) 3.38 Å.

When the epitope positions between SPVC_214 and SPVC_383 were studied, the same epitopes were present at the 2nd, 6th, 8th, 9th and 10th positions (Table 3).

Table 3 Comparative epitope positions in the ten MEBP variants with % identity and RMSD with reference to REF_SEQ MEBPVC.

Further, at the 7th position, almost identical epitopes, VLSFELLHA, VVVLSFELL were present in SPVC_214 and SPVC_383 respectively which clearly justifies the 73.51% sequence identity (Table 2). However, their RMSD is one among the highest, i.e. 16.60Å (Table 2) indicating very dissimilar structures which is contrary to the notion in homology modeling that is, high sequence identity (> 30%) indicates high structural similarity and same function.

The pair with the least sequence identity (39.13%) is SPVC_214 and SPVC_387 (Fig. 1b). Their RMSD is 4.92 Å indicating reasonable structural similarity though sequences are not very similar. The pair with the highest RMSD i.e.17.3 Å are REF_SEQ and SPVC_214. With such high RMSD, it is commonly expected to have very little sequence similarity between the two sequences. However, they have a sequence identity of around 65%. Further analysis revealed that the same epitopes were seen at 2nd, 3rd, and 4th positions in the two MEBP constructs (REF_SEQ and SPVC_214) (Table 2) justifying the above-average sequence identity. Similarly, the pair with the lowest RMSD i.e. 3.38 Å are SPVC_32 and SPVC_446 with their sequence identity of only 46.24% only. Figure 1c,d show structure alignments of the MEBP variant-pair with lowest (3.38Å, SPVC_32 and SPVC_446) and highest (17.3Å, REF_SEQ and SPVC_214) RMSDs. These typical cases are clearly indicating that there are changes in the 3D structures of the MEBP variants on changing the positions/order of the epitopes in the MEBPVCs.

Immunological and biophysical properties of the MEBP dataset

Immunological properties

Antigenicity, allergenicity

The antigenicity scores from Vaxijen 2.0 server indicate that all the MEBPVCs are probable antigens with a range between 0.62 to 0.78 (Table 4). The sequence, SPVC_446 has the highest antigenicity(0.78) and SPVC_214 (0.62) has the lowest antigenicity. All the allergenicity scores predicted using AllerTop v2.0 and AllergenFP v1.0 servers indicate that all the variants are non-allergens hence the allergen column was omitted from (Table 4).

Table 4 The raw data of the Immuno- and Biophysical properties of the ten MEBP variants for ready reference.

Biophysical properties

Stability

The stability scores of MEBPVC variants fall between 73.17 to 76.97 which is the first parameter used as a filter. As a rule, all the MEBPVC variants must be predicted as stable which is found to be true for all the variants in the dataset. SPVC_214 is predicted to have the lowest and SPVC_537, to have the highest stability scores. The variations in the stability scores, though not very dispersed, indicate that change in the order of epitopes influenced the stability of the vaccine construct.

Solubility

The next biophysical property considered is solubility. Less soluble proteins are a major concern since the proteins synthesized may not fold to the right structure and hence lose the activity and function and are observed to precipitate out or form inclusion bodies leading to various disease states33. The solubility scores range from 0 to 1.0 where > 0.5 score indicates soluble and < 0.5 indicates insoluble peptide. In our case, all the MEBP variants had solubility scores ranging from 0.73 to 0.83 indicating all are soluble. The SPVC_446 variant has the lowest solubility and the SPVC_32 variant has the highest solubility. These solubility scores also indicate that the order of epitopes in the MEBPVC is important and crucial in the design of a good vaccine candidate.

Accessibility

Solvent accessibility is an important feature which, in the current context, has direct implications in eliciting the immune response in the host. The higher the epitope accessibility the more immunogenic the vaccine candidate. For the MEBP variants, the percent epitope accessibility (PEA) ranged between 35 to 37. The variant SPVC_387 has the lowest accessibility and SPVC_383 has the highest accessibility.

Disorder

In our MEBP variants disorder ranges between 0.15 to 0.16 and hence all variants are considered ordered. The MEBPVC sequence SPVC_357 has high disorder and SPVC_387 has low disorder among the variants. Low disorder is considered favorable for better vaccine design.

Aggregation

The predicted aggregation propensities ranged from 2.7 to 3.3 with lower values considered favorable. The sequences REF_SEQ and SPVC_387 have the highest aggregation propensity and SPVC_32, SPVC_565, SPVC_537 have the lowest propensity. Table 4 furnishes further details.

Hydrophobicity

Higher hydrophobicity shows better globularity, better accessible surface residues, and rigid 3D structure. The predicted hydrophobicity values ranged from 0.194 to 0.239, where higher hydrophobicity values are considered more favorable. The sequence SPVC_446 has the highest hydrophobicity and SPVC_214 has the lowest hydrophobicity.

Positive/negative influencers

Protein Stability Index, Surface Accessibility, Solubility, Sequence Identity & Sequence Alignments, Hydrophobicity, Docking (Z Rank) Score and Binding Affinity were categorized as positive influencers. Intrinsic Disorder, Protein Aggregation Propensity, RMSD and root mean square fluctuations (RMSF) were categorized as negative influencers.

Comparative docking analysis of MEBP variants

The properties compared in the previous sections were sequence-based. The analysis proved that the order of the epitopes indeed influenced the stability, solubility, accessibility, disorder, and aggregation properties. To make the analysis more complete and comprehensive, the following sections explore the docking and MD simulation studies using the ab initio modeled 3D structures of the MEBP variants. In the previous section, the MEBP models (variants) and their RMS deviations were discussed.

Among the family of Toll-Like Receptors (TLRs), the innate immune sensors, TLR8 and TLR4 are the most common receptors interacting with antigens/immunogens triggering an immune response from the host system to fight the immunogen. TLR8 plays an important role in the generation of effective immune responses in humans. TLR8 also senses the single-stranded RNA of viruses in the endosome and is predominantly expressed in the lungs. TLR4, plays an important role in the regulation of myocardial function, fibroblast activation, and acute inflammation by immune cells. Both the receptors are implicated in COVID-19. TLR4, is one of the ‘fate-deciding’ regulators of immunity and COVID-19 immunopathogenesis34. Table 5 shows the docking scores, binding affinities, minimization energies for both the receptors (TLR8 and TLR4) in complex with the MEBP variants.

Table 5 Minimization energies, Docking scores, binding affinities (MMGBSA), RMSD and RMSF of MEBP variants with both the receptors (TLR4 and TLR8).

ZRank Score is used to assess the quality of protein-protein docking. A more negative ZRank score indicates better docking. As can be seen from the tables, the ZRank scores are varying from (−100) to (−150) for the available MEBP variants. With TLR8, SPVC_537 has a better ZRank Score but an unfavorable binding affinity (11.92). With TLR4, SPVC_565 is showing favorable ZRank Score and acceptable binding affinity. It is interesting to note that there are at least four different MEBP variants having better ZRank scores when compared to REF_SEQ within a dataset size of ten variants. This definitely proves that changing the order of epitopes influences the 3D structure, which in turn influences the binding with immune machinery (TLRs) indicating the effectiveness of the immunogen (MEBPVC). Fig. 2 shows the docked MEBP variants with TLR4.

Figure 2
figure 2

Docked ligand of (a) SPVC_32, (b) SPVC_214, (c) SPVC_565, (d) REF_SEQ (yellow) with TLR4 (green).

Simulation analysis of MEBPVC variants complexed with TLRs

The molecular dynamic simulation production run for 100 ns yielded a center projected trajectory in which the MEBP vaccine complexes were centered along the system in order to calculate the relative RMSD and RMSF for the MEBP vaccine complexes. The calculated RMSD for all the MEBP-TLR complexes were interpreted as maximum deviation data points and the average deviation data points to arrive at a holistic conclusion. Normalizing the data points and considering average deviation data points will provide us with much more stable MEBP-TLR complexes. Considering only the maximum deviation data points and neglecting the rest of the stable data points of the MEBP vaccine complex simulation over a 100 ns span will not be feasible for a simulation of this larger time span. The RMSF calculations were also performed similarly (RMSD calculation). The obtained total RMSD/RMSF calculations of ten MEBP vaccine candidates are represented in Table 5. The maximum and average RMSD of the REF_SEQ vaccine candidate is 0.53 nm and 0.45 nm for the TLR4 complex and 0.62 nm and 0.48 nm for the TLR8 complex.

The maximum and average RMSF of the REF_SEQ is 0.54 nm and 0.19 nm for the TLR4 complex and 0.71 nm and 0.18 nm for the TLR8 complex respectively. Considering REF_SEQ complex as the reference, all the other MEBP vaccine complexes were screened accordingly. Combinatorial approach of considering both the vaccine potency score and stability will help us arrive at the most potent of the MEBP vaccine candidates. The combined MVP is calculated with our scoring algorithm based on various physicochemical parameters. The MVP score is observed to be relatively higher for SPVC_446 and SPVC_537 with a score of 8.858 and 8.899 respectively, when compared to the REF_SEQ with an MVP score of 8.595. Both SPVC_446 and SPVC_537 prove to be promising vaccine candidates with high potency scores. The maximum/average RMSD of SPVC_446 is 0.4 nm/0.39 nm and SPVC_537 is 0.43 nm/0.35 nm for TLR4. The maximum/average RMSF of SPVC_446 is 0.36 nm/0.32 nm and SPVC_537 is 0.49nm/0.4 nm for TLR8. Figure 3 represents all the RMSD and RMSF calculations and respective plots. The RMSD plots show that the SPVCs are more stable when in complex with both TLR4 and TLR8. It can be observed that REF_SEQ is relatively less stable when compared to SPVC_446 and SPVC_537. RMSF plots of SPVC_446 and SPVC_537 indicate better interactions and higher stability, when in complex with TLR4 (Fig.3e). However, all the three SPVCs show similar high stability when in complex with TLR8 without a clear distinction as seen in TLR4 complexes (Fig. 3f).

Figure 3
figure 3

Molecular dynamic simulation studies of each SPVC interactions with TLR4 and TLR8. (a) RMSD plot of REF_SEQ, REF_SEQ-TLR4 and REF_SEQ-TLR8 complexes (b) RMSD of SPVC_446, (c) RMSD of SPVC_537, (d) RMSF plot of REF_SEQ, SPVC_446 and SPVC_537, (e) RMSF plot of TLR4 in complex with REF_SEQ, SPVC_446 and SPVC_537, (f) RMSF plot of TLR8 in complex with REF_SEQ, SPVC_446 and SPVC_537.

Taking the stability of the complexes of MEBP vaccine candidates into consideration to derive a conclusion to select the best of the vaccine candidates, the SPVC_446 proves to be the best among the vaccine candidates. In spite of SPVC_537 having higher MVP score, the complex falls short in the stability parameter which is an important property to be looked into for biological activity.

Dosage versus immune response simulation analysis

As the last of the in silico tasks, we performed dosage vs immune response simulation for the ten vaccine constructs (MEBP variants) using the C-IMMSIM server with the same objective, to see if the variants trigger different responses than REF_SEQ, if so will the response indicate more potency or less. Two simulation experiments were done: (a) with adjuvant and HIS-tag and (b) without adjuvant and HIS-tag. The MEBPVC variants had all the parameters within the optimal and recommended ranges for them to be considered as a potent vaccine candidate individually with the exception of SPVC_387 (Supp. 2 Fig. S19) (without adjuvants + HIS-tag). A common observation has been that a repeated exposure led to an overall increase in the immune response and a decrease in the antigenic load (Supp. 2 Fig. S120).

Few observations are presented here. When compared to REF_SEQ, all other variants trigger strong antibody (especially IgM or IgM + IgG) responses with their 1st dose (exposure). Of all the variants, SPVC_214 is seen to trigger the highest titers of IgG + IgM. Of all the constructs, REF_SEQ triggers the weakest. The titers reach ~650,000 counts per ml for SPVC_214 and others but only ~580,000 counts per ml for REF_SEQ.

It is interesting to note that variants without adjuvants + HIS-tag seem to trigger more strongly than with adjuvants + HIS-tag. The antibody titers reach 90,000 counts per ml without adjuvants + HIS-tag as compared to only 20,000 counts per ml with adjuvants + HIS-tag on exposure to 1st dose of SPVC_214. The IgM + IgG titers reach ~760,000 counts per ml on the last (third) exposure. A similar trend is seen for all other variants as well, where without adjuvants + HIS-tag are triggering a better immune response. Figs. 4a–c and 5a–c show the level of immunoglobulins (with and without adjuvant + HIS-tag) at two different doses (1st, 2nd, and 3rd).

Figure 4
figure 4

Immunoglobulin counts of MEBP variants (with adjuvants + HIS-tag) after (a) 1st dose, (b) 2nd dose, (c) 3rd dose (X axis—vaccine constructs, Y axis—counts per ml).

Figure 5
figure 5

Immunoglobulin counts of MEBP variants (without adjuvants + HIS-tag) after (a) 1st dose, (b) 2nd dose, (c) 3rd dose (X axis—vaccine constructs, Y axis—counts per ml).

It is also observed that some variants such as SPVC_383 trigger high TH cell populations per state with counts reaching around 8200 per mm3 with an average antibody (IgG + IgM) response of around 570,000 counts. When compared to others, the same variant also shows the best B cell population counts (800 per mm3). According to the MEBP dose versus IFN-γ response simulation, it is interesting to note that variants with adjuvant + HIS-tag and without adjuvant + HIS-tag have totally different trends as seen in Fig. 6a,b. For example, SPVC_446 (with adjuvant+HIS-tag) triggers the highest concentration of IFN-γ and REF_SEQ has the lowest concentration of IFN-γ in the first dose. These above observations clearly demonstrate that change in the epitope order in a MEBP vaccine candidate influences immunogenicity.

Figure 6
figure 6

Concentration of IFN-γ for all MEBP variants ‘(a) with adjuvant + HIS-tag, (b) without adjuvant + HIS-tag) after the 1st, 2nd, and 3rd doses (X axis—vaccine constructs, Y axis—ng/ml).

Ranking the ten variants and identifying the most potent MEBPVC

Table 6 summarizes the receptor specific scores (RMVPs) and final MVP score for each variant. From the MVP score, SPVC_446 is predicted to be the most potent MEBPVC followed by SPVC_537. As can be seen, the least potent is REF_SEQ clearly proving that better and more potent MEBPVCs are possible by changing the epitope order and that epitope order influences immunogenicity (details about the normalized data are provided in Supp. 3).

Table 6 MVP of each variant is calculated by adding receptor associated MVP (RMVP)s.

Discussion

Vaccine development typically takes 10 years. In the pre-COVID-19 world, the fastest vaccine development time recorded was four years against mumps. It is no small feat to develop a vaccine against COVID-19 in a span of 9–10 months and vaccinate nearly 1.5 billion people. This shall be the new benchmark and reference for future vaccine development strategies and preparedness for future pandemics. This has become possible because of global cooperation for vaccine research and distribution35.

The current COVID-19 vaccines listed under EUL have respective advantages and disadvantages36. The major disadvantage of Pfizer/BioNtech Comirnaty vaccine is its stringent cold chain requirement though it has shown very good titers37. The adenovector-based vaccines show relatively less effective neutralizing antibody response38. Inactivated vaccines seem to show inferior immunogenicity and low T Cell response, though have shown lower adverse reactions39. Similar to inactivated vaccines, the protein subunit vaccines show low immunogenicity. However, the possible advantages and potential benefits attracted the pharma companies to invest in protein subunit platforms. More than 30% of the total COVID-19 vaccine candidates undergoing trials are protein subunit vaccines with 65% under preclinical trials. Peptide-based vaccines have many unique benefits such as (1) fully defined composition, (2) affordable large scale production, (3) stable in storage and freeze-dryable, (4) absence of biological contamination, (5) minimum allergic and autoimmune responses, (6) customizable multipurpose therapeutic, and (7) standard rDNA technology-based production and manufacturing protocols in place. MEBPVC has further advantages in designing multi-epitope, multi-target, multi-copy, multi-disease, and custom-size (molecular weight) vaccine constructs. The MEBP subunit vaccine platforms are in the initial phases of development.

Applications of in silico approach to design a MEBP vaccine is one of the pragmatic opportunities that can reduce the time in developing vaccines and reach the market in shorter time duration. The in silico methodology presented in this article shall further reduce the time to identify potential new vaccine candidates under the protein subunit vaccine platform.

It is known that peptide vaccines are weakly immunogenic. Considering the advantages offered by peptide-based vaccines that include MEBP vaccines, it is worth addressing the peptide vaccine-specific issues, where the major issue seems to be lower immunogenicity. This limitation is being effectively addressed through (a) combining with adjuvants such as β-defensin 2, HSP70, HBD-2, Matrix-M1, nanoparticles, (b) altering the size (molecular weight), and others. Adjuvants have shown to significantly boost immunogenicity but have not matched the current platforms such as RNA, adenovirus vector, and inactivated virus-based platforms40,41,42.

It is the fundamental phenomenon that changes in the amino acid order change the structure and function, giving the clue that the earlier reported MEBPVC (REF_SEQ) could have variants if the epitope order changed. A set of ten variants were generated manually to explore if the variants thus generated have altered immunogenicity.

The variants were analyzed at the sequence, structure, interaction and dosage levels. In homology modeling, it is a common rule of thumb that for any two sequences, if the sequence identity is > 30%, it is assumed that their 3D structures shall be similar and likely to have identical function43. Further, it is also believed that with the increase in the sequence identity, the structural similarity also increases, i.e., RMSD decreases. However, it is interesting to note that the variants show deviations from the rule of thumb. As can be seen from Table 2 and Table 3, there are many pairs that show deviations. There have been studies that proved that 3D structures of 100% identical sequences were having natural conformations that have RMSDs as large as 24Å44. There have been studies where an all-α helix protein (Protein G) was engineered and transformed into an all-β protein(Protein Rop) by changing only 50% of the amino acid composition45,46,47. There is a need to experimentally verify the MEBP variant 3D structures through experimental structure determination techniques such as X-Ray Crystallography, NMR and or Electron microscopy.

The analysis, indeed, strongly suggests that changing the epitope order in MEBPVC changes the structure and hence the various associated properties resulting in the alteration of immunogenicity of the variant. Hence more potent MEBPVCs can be identified if the protocol described in this article is followed. The step to generate a dataset of shuffled variants is key in the analysis as this step enables the comparative study which otherwise has not been reported till now in MEBP vaccine design protocols. MVP score has also been developed which provides an opportunity to rank and identify the most potent MEBPVC from the dataset. Further, the data generated becomes the necessary input for developing better scoring schemes and algorithms.

The MVP scores were calculated through summing the individual RMVP scores of TLR4/8. The RMVP scores had two influencers: positive and negative. The positive influencers majorly indicate the physicochemical parameters, stability parameters and binding affinity of the protein complex. It also gives an overall insight about the functional ability of the MEBP-TLR complex. Likewise negative influencers were also taken into consideration: disorderness, aggregation, RMSD and RMSF. The negative influencers directly incorporate the instability aspect of the MEBP-TLR complex to the MVP score. The RMSD and RMSF in depth analysis through molecular dynamic simulation provided data about the interaction stability and residue fluctuation (OPLS all atom forces fields and SPCE water model were used). SPVC_446 and SPVC_537 showed the highest MVP score and hence, are the most potential MEBPVCs from among the variant dataset. The normalized positive influencer scores of these two MEBPVCs were higher as it had positive influence over the stability and binding interactions. The normalized negative influencer scores of these two MEBPVCs were lower as it inversely influenced the disorderness and binding affinity of the MEBPVCs. The RMSD and RMSF tend to be higher for highly disordered or misfolded proteins48. Hence the RMSD and RMSF were considered as the negative influencers while calculating the MVP score. Lower the RMSD and RMSF of the MEBPVCs, higher the stability. The lower RMSD and RMSF indicates higher stability and lower misfolding candidates i.e. SPVC_446 and SPVC_537.

Compared to REF_SEQ, all other variants trigger strong antibody responses. SPVC_537 is seen to trigger the higher titers of IgG + IgM. Of all the constructs, REF_SEQ triggers the weakest. IFN-γ is secreted by natural killer (NK) cells, which is activated through cell damage signals from infected or damaged cells through Cytotoxic T cells (CTLs). IFN-γ plays a major role in activation of macrophages, dendritic cells and T cells to curb the further downstream infection49. According to the MEBP dose versus IFN-γ response simulation, SPVC_446 triggers the highest concentration of IFN-γ and REF_SEQ has the lowest concentration of IFN-γ in the first dose. All the other higher responsive SPVCs were rejected based on the lower MVP score and stability. Similarly, the SPVCs that triggered higher antibody responses were also ignored, and more stable complexes were chosen to avoid possibility of protein aggregation and complications caused through misfolded proteins50.

It is pertinent to mention that the number of variants was restricted to ten out of a possible 3,628,800 (10! possibilities) unique variants of similar length. The peptide length was restricted to 183 amino acids. The scope of the work was to test if the immunogenicity changes with the epitope order in the MEBPVC. For some parameters, it was also observed that the changes in the biophysical and immune parameters were not significant. This can be attributed to the small sample size, restriction on the length of the construct, and manual shuffling of epitopes for such small changes. The next challenges are to work with a bigger dataset of variants, optimize the parameters influencing the MEBP vaccine design, gain deeper insights into the mechanism(s) behind the mutations and their virulence and improvise the MVP score incorporating the knowledge and the decision making systems such as AI and ML51.

Materials and methods

In general, a MEBP sequence is constructed with three broad components (peptide sequence patterns), namely, Linkers (for example AAY, GPGPG, EAAAK), Adjuvants for example β-defensin 2 and HSP70, and predicted target specific MHCI and MHC II Epitopes (oligopeptides with size ranging from 8AA to 20AA). The MEBP sequence (REF_SEQ) published earlier is taken as a reference for this study25. Table 4 lists the properties of REF_SEQ MEBPVC.

Construction of MEBP and its variants

The construction of the REF_SEQ is described elsewhere25. Using the REF_SEQ another nine MEBPVCs were generated by shuffling the epitope-linker set at the predesignated positions of REF_SEQ (Fig. 7).

Figure 7
figure 7

Schematic template of a typical MEBPVC to generate variants from REF_SEQ. The MEBPVC contains five major components: (1) N-terminal adjuvant (blue color), (2) predicted MHC I epitopes (green color border, position number 1–5, TLDSKTQSL, GKQGNFKNL, CYGVSPTKL, KIADYNYKL, VVVLSFELL), (3) predicted MHC II epitopes (magenta color border, position number 6–10, IGINITRFQ, YGFQPTNGV, VLSFELLHA, LQIPFAMQM, IAIVMVTIM), (4) linkers AAY (orange color), GPGPG (cyan color), (5) C-terminal HIS-tag (red color). The MEBP variants were generated by shuffling the epitopes at the ten positions.

The generated MEBP variants are given unique IDs following the format as: SPVC_NNN, where SPVC stands for shuffled peptide vaccine candidate and NNN stands for a unique number. The ten sequences are provided as supplementary data (Supp. 1 file).

The following restraints were applied while generating the MEBP variants: (a) The position and order of N-terminal adjuvant (β-defensin 2) followed by EAAAK (linker) and the C-terminal HIS-tag were kept unchanged. (b) The length of the construct was kept unchanged at 183AA only. (c) The B-cell-derived T-cell epitope (9AA long) plus the linker (AAY/GPGPG), together, were rearranged manually to create nine MEBPVC variants. (d) No distinction was made between MHC-I (green) and MHC-II (magenta) epitopes. The molecular weight, the sequence length, the isoelectric point (pI), the aliphatic index, the number of atoms, half-life, and the chemical formula remained identical for all the variants.

Sequence alignment

The formatting code, all-to-all sequence alignment among the members of the MEBP dataset was done using BioInt52, Biobhasha (https://www.biobhasha.org), and BOSv2.0 (Biological Object-Based Software (BOS): An Integrative Biological Programming Environment). Multiple sequence alignment and sequence alignment renderer were performed using clustalw (https://www.genome.jp/tools-bin/clustalw) and ESpript 3.0 (https://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) 53,54.

Prediction of immunological and biophysical properties of MEBPVCs

The following relevant immuno- (antigenicity and allergenicity) and biophysical- (protein stability index, surface/solvent accessibility, solubility, inherent intrinsic disorder, aggrescan, hydrophobicity) properties were predicted using the online/offline web servers/tools to compare and develop a rationale for identifying the more potent ones.

Antigenicity

Antigenicity is the extent to which the host immune system responds to the antigen (foreign body) triggering both humoral and cellular responses. VaxiJen2.0 server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) was used to predict the antigenicity of the MEBP variants55. The result from VaxiJen 2.0 categorizes the peptide input into either Probable ANTIGEN or Probable NON-ANTIGEN. Only those variants which were categorized as Probable ANTIGEN were selected for further processing.

Allergenicity

Allergenicity is the extent to which an immunogen or antigen induces allergic reactions in the host system resulting in discomfort and or inconvenience or any allergic conditions such as asthma, diarrhea, skin rashes, and others. AllerTop v2.0 server (https://www.ddg-pharmfac.net/AllerTOP/) was used to predict the allergenicity of the MEBP variants56. The output from AllerTop v2.0 indicates if the input sequence is an allergen or not using the following restricted text values i.e. PROBABLE ALLERGEN or PROBABLE NON-ALLERGEN. Only those variants were selected for further processing which were categorized PROBABLE NON-ALLERGEN.

Peptide/protein stability index

Protein stability is an important property especially to understand the structure-function and activity relationships of a protein. The EXPASY ProtParam server (https://web.expasy.org/protparam/) was used for predicting the Instability index of the nine MEBPVCs57. The instability index was modified to the stability index by subtracting the score from 100. A score of more than 60 henceforth indicates the input protein to have better stability.

Solvent accessibility

Solvent accessibility is an important feature for understanding and interpreting the structure–activity relationship58. The solvent accessibility or the surface exposed residues provide data that helps in various predictions such as protein–protein interactions, receptor-ligand interactions, drug designing, protein folding, and others. Scratch Protein Predictor (http://scratch.proteomics.ics.uci.edu) was used to predict the solvent accessibility of the variants59. The output from Scratch Protein Predictor contained residue level accessibility with higher values indicating higher accessibility and vice versa. In the context of peptide vaccine design, higher accessibility and especially the residues with high accessibility in the epitope regions of MEBP is preferred since to elicit an immune response, the epitopes in the vaccine construct should be accessible, be exposed, and be on the surface. The accessibility predictor gives a string output, equal to the length of the sequence, with ‘e’ (for each exposed/accessible residue) and ‘b’(for each buried residue). A more meaningful accessibility score in the context of MEBP vaccine design is percent epitope accessibility (PEA) defined and calculated as per the formula given below:

$$\begin{gathered} PEA = \frac{{ \, Total \, count \, of \, ^{\prime}e^{\prime}s \, in \, the \, Epitope \, {\text{Re}} gions \, of \, the \, MEBPVC \, sequence}}{Total \, count \, of \, ^{\prime}e^{\prime}s \, in \, the \, MEBPVC \, sequence} \times 100 \hfill \\ \, \hfill \\ \end{gathered}$$

The higher PEA value is considered favorable.

Solubility

Solubility of protein is an important biophysical property that depends on the amino acid composition and the 3D structure. Solubility influences the production of a protein and its half-life in the cell. Less soluble proteins are a major concern since the proteins synthesized may precipitate out or form inclusion bodies which lead to various disease states. SoluProt v1.0 server (https://loschmidt.chemi.muni.cz/soluprot/) was used to predict the solubility of all the MEBP variants60. For each input sequence, i.e. an MEBP variant in the current context, the SoluProt v1.0 server gives a score in the range of 0–1.0 where > 0.5 score indicates soluble and < 0.5 indicates insoluble peptide. The vaccine constructs with higher solubility (> 0.5) were selected for further processing.

Inherent intrinsic disorder

Intrinsically disordered proteins (IDPs) [a.k.a intrinsically unstructured proteins (IUPs)] are proteins that deviate from the dogma that every protein has a rigid 3D structure. IDPs or intrinsically disordered regions (IDRs) regulate many important biological functions such as transcription regulation, tissue-specific expression, and other signal transduction pathways. IDRs show high conformational changes, influence the stability of the protein, and affect the binding modes during ligand-receptor interactions. These regions play an important role in protein-protein interactions. IUPred2A server (https://iupred2a.elte.hu/) was used to predict the disorder of the MEBP variants61. The output has residue-wise disorder values and contiguous IDRs. A value >0.5 is considered disordered and the value <0.5 is considered ordered. An average disorder for each MEBP variant was calculated using the residue-wise disorder. The variants with the low average disorder are considered for further analysis.

Protein aggregation

Protein aggregation is a biological process in which protein/peptide subunits instead of forming regular and functional assemblages, misfold, aggregate (intra- or extracellularly), and precipitate. Protein aggregation is one of the important phenomena implicated in diseases such as Parkinson’s, Alzheimer's, and prion diseases62. To predict the aggregation for the variants, we used the AGGRESCAN server (http://bioinf.uab.es/aggrescan/)63,64. Variants with lower predicted aggregation were considered for further analysis.

Hydrophobicity

Hydrophobicity is the ‘water-hating/avoiding/repelling’ property of molecules. Hydrophobic amino acids tend to fold and shrink together to minimize contact with the solvent water or hydrophilic surroundings. The hydrophobic effect is a well-known important property to understand the 3D structure of a protein65. Hydrophobic interactions are an important driving force in protein folding hence the overall 3D structure. The shape determines the function of the protein. In the context of MEBPs which are the potential immunogens, higher hydrophobicity indicates more globularity, rigid 3D structure, and associated accessible surface residues. Kyte and Doolittle’s method was used to generate the Hydrophobicity values by using EMBOSS pepwindow server (https://www.ebi.ac.uk/Tools/seqstats/emboss_pepwindow/)66. The outputs were screened and filtered for higher hydrophobicity and used for further processing.

Dosage versus immune response simulation

C-IMMSIM (http://150.146.2.1/C-IMMSIM/index.php) is one of the most commonly used in silico immune system simulation servers. Given the vaccine candidate (MEBP amino acid sequence), the dosage volume, and dosage intervals, C-IMMSIM predicts the humoral and cellular immune responses in the host system over a period of time. The server uses machine learning techniques to predict the immune responses of immunogens. The time step of injection, 3 doses, was set at 1 h, 84 h (3.5 days), and 168 h (7 days). The simulation steps and simulation volume were kept at 1100 and 10.

Structural studies

Ab initio 3D structure prediction

A local installation of the I-TASSER-suite was used to predict the tertiary structures of the MEBP variants. The suite uses the Local Meta-Threading server (LOMET) to find out the suitable template structure from the input sequence. In the second step, the server performs the template-based fragment assembly simulation to create a full structure and the final step is that it will produce the top five predicted models with TM-score (Quantitative assessment of similarity between protein structures) and C-score (confidence score for estimating the quality of predicted models).

Molecular docking and binding affinity calculations

The protein-protein docking was performed in the Discovery studio (Z_DOCK module)67. The receptors (TLR4 and TLR8) were downloaded from RCSB with PDBIDs:4G8A, 3W3M, and the MEBP variants were used as ligands. The docking resulted in ~2000 clusters with each cluster containing ~70 to 80 poses of ligands (MEBP variants) with the receptor (TLR4 and TLR8). The top poses with the highest Z_RANK-SCORE were selected to find R_DOCK-SCORES (Refined docked protein). The R_DOCK uses CHARMm energy to optimize docked poses produced by the Z_DOCK module. The top hits from the R_DOCK module were selected for further simulation studies. The binding affinities were calculated using the MMGBSA module from Discovery studio68.

Molecular dynamics simulation

The MEBP variants were scored and assorted based on the favorability of each property it represented individually. Among hundreds of variants, we have selected the top 10 MEBP for finer investigation to produce a single potent MEBP vaccine candidate based on RMVP score (MEBP vaccine potency). The molecular dynamics simulation helps us study the stability and interaction of the MEBP-TLR complex in an ion based solvent system. All the top-scored MEBP vaccine candidates were subjected to molecular dynamics simulation using GROMACS v.2021 and topologies were generated using the OPLS force field. The system was solvated in a cubic box conformation using SPCE water model, energy minimized until the steepest descent energy, i.e. atoms are realigned to reduce the maximum net forces on them. The minimized atoms exert least forces on each atom and therefore serves as a favourable start point for molecular dynamics simulations. Pressurized and increasing temperature conditions were implemented for 100 ps. The molecular dynamics simulation was produced for a span of 100 ns over the centered projected trajectory. RMSD and RMSF were calculated using the gmx_rms option with reference to the MEBP Vaccine complex to TLR4 and TLR8 for all the ten MEBP TLR4/8 complexes.

Implementation of a scoring scheme for ranking and discovering the best MEBPVC from the library of variants.

The values obtained for the ten MEBPVC variants for the above discussed properties were grouped into (a) positive influencers (Stability, Accessibility, Solubility, Hydrophobicity, Antigenicity, ZrankScore, Binding Affinity (MMGBSA) b) negative influencers (Disorder, Aggregation, RMSD and RMSF). All the values of a property were normalized using normalization by averaging i.e.

$${x}_{n,i}=\frac{{x}_{i}}{\underline{x}}$$

where x is the original value of the property, n is the normalized value, i is the index of the variant, \(\underline{x}\) is the average, all the variants.

$$\mathrm{Receptor \; specific \; MEBP \; Vaccine \; Potency \;}(\mathrm{RMVP})_{\mathrm{i},\mathrm{m }}= {\sum }_{k=1}^{8}{x}_{n,k}-{\sum }_{l=1}^{4}{x}_{n,l}$$

where k, l are the normalized values of each positive and negative influencers (property of the variant) respectively. Where m is the receptor i.e. TLR4(1) or TLR8(2).

Final MVP for each variant, i, was calculated by summing individual RMVPs.

Conclusions

In pursuit to design and or discover more potent MEBPVCs, given the list of predicted epitopes, a methodology to generate variants and three predictors, percent epitope accessibility (PEA), receptor-specific vaccine potency (RMVP) and MEBP vaccine potency (MVP) scores have been introduced to quantify and enable an opportunity for the development of efficient MEBP vaccine design platform. This enabled not only ranking but also identifying the best MEBPVC. Thus, the results prove that the reported MEBPVC (REF_SEQ) is not the most potent candidate after all. In this article, only the order of epitopes has been used for generating variants. However, other parameters such as length of epitopes, length of the sequence, copy numbers, and or other parameters should also be explored in the future to design and discover more potent MEBPVCs. In conclusion, our in silico analysis and results indeed prove that changes in the position or order of the epitopes change the properties of the MEBPVC. Henceforth, the dataset generation method, PEA, RMVP, and MVP score should enable generating novel MEBPVCs and may be adopted in all MEBPVC design pursuits. The method enables the design and discovery of the computationally validated most immunogenic MEBPVC, each having a unique epitope order. Experimental validation and verification has no substitute; hence, the computationally validated vaccine constructs, with IDs, SPVC_446 and SPVC_537, need to be validated through in vitro, and in vivo experimental studies. The experimental validation provides important insights and inputs for improvising and developing a more efficient and more reliable scoring function and the improved versions of the software.