Immunoinformatic design of a COVID-19 subunit vaccine using entire structural immunogenic epitopes of SARS-CoV-2

Coronavirus disease 2019 (COVID-19) is an acute pneumonic disease, with no prophylactic or specific therapeutical solution. Effective and rapid countermeasure against the spread of the disease’s associated virus, SARS-CoV-2, requires to incorporate the computational approach. In this study, we employed various immunoinformatics tools to design a multi-epitope vaccine polypeptide with the highest potential for activating the human immune system against SARS-CoV-2. The initial epitope set was extracted from the whole set of viral structural proteins. Potential non-toxic and non-allergenic T-cell and B-cell binding and cytokine inducing epitopes were then identified through a priori prediction. Selected epitopes were bound to each other with appropriate linkers, followed by appending a suitable adjuvant to increase the immunogenicity of the vaccine polypeptide. Molecular modelling of the 3D structure of the vaccine construct, docking, molecular dynamics simulations and free energy calculations confirmed that the vaccine peptide had high affinity for Toll-like receptor 3 binding, and that the vaccine-receptor complex was highly stable. As our vaccine polypeptide design captures the advantages of structural epitopes and simultaneously integrates precautions to avoid relevant side effects, it is suggested to be promising for elicitation of an effective and safe immune response against SARS-CoV-2 in vivo.


Scientific Reports
| (2020) 10:20864 | https://doi.org/10.1038/s41598-020-77547-4 www.nature.com/scientificreports/ based on recombinant spike protein have been shown to be very well suited for creating immunogenicity against SARS and MERS 7,8 . While DNA, vectored and subunit candidates presenting viral epitopes can elicit a focused antibody response, their subviral components may not portray the full antigenic complexity of the virus, resulting in limited protective efficacy or immunopathology due to unbalanced immune responses 6 . One promising option to confront this issue is the subunit vaccines harbouring numerous and diverse antigenic elements, allowing to produce an inclusive spectrum of native viral antigens. The process of vaccine development takes at least ten years from bench research to approved vaccine use 9 . Identification of apt elements for developing vaccine immunogens can be promoted using chemistry and topology. As a current trend, this is becoming more of a rational design exercise, which may considerably reduce both the time and costs required for vaccine development and production 10 . In this study, it was hypothesized that advanced computational screening could identify suitable peptides for the construction of a subunit vaccine. We used the full set of the putative structural proteins from SARS-CoV-2 as the substance for extracting antigenic elements creating both B-cell and T-cell immunity. By integrating these peptides together, we developed a multi-epitope-based vaccine polypeptide with appropriate binding to Toll-like receptor 3 (TLR-3) to elicit an effective immune response, and with least possible toxicity and hypersensitivity. We also considered incorporating an adjuvant for augmenting the host antigen-specific immune response. As this method has been validated experimentally with designs against other pathogenic species 11 , we suggest that the proposed structural formulation would be potential to generate an effective immune response against the novel virus. The wet lab researchers are expected to validate our design, hoping to reach protection for the healthy community against the COVID-19 pandemic.

Results
The basic steps of the procedure for designing the multi-epitope vaccine are shown in Fig. 1.

Structural T-cell and B-cell epitopes of SARS-CoV-2. Final selected cytotoxic T lymphocyte (CTL)
antigenic epitopes harboured in the structural proteins of SARS-CoV-2 are listed in Table 1, indicating the epitopes for spike glycoprotein (S), envelope protein (E), membrane protein (M) and nucleocapsid phosphoprotein (N). The full sets of predicted epitopes and their properties can be found in Supplementary Tables S1-S4 online, for S, E, M and N proteins, respectively. The IEDB MHC-II prediction tool was applied to predict helper T lymphocyte (HTL) (15-mer) epitopes and their MHC-II binding, results of which are shown in Table 2, and with more detail in Supplementary Table S5 online. Both Tables 1 and 2 also indicate the HLA restriction of the single predicted epitopes. The peptide DVSLVKPSFYVYSRVK contained in the E protein was identified as the only linear B lymphocyte (LBL) epitope, when applying the score value > 0.75 as the criterion for selection (More detail in Supplementary Table S6 online). Using appropriate server tools, antigenic, non-toxic and nonallergenic T-cell or B-cell epitopes that were able to induce interferon-gamma (IFN-γ), interleukin-4 (IL-4) and interleukin-10 (IL-10) cytokines were selected for the design of a multi-epitope vaccine.
Multi-epitope vaccine polypeptide construction. The total of 34 CTL, and 12 HTL epitopic peptides were fused to each other by KK, and GPGPG linkers, respectively, followed by adjoining a single LBL epitope 3D structure of the vaccine polypeptide. The sequence of the multi-epitope vaccine polypeptide was submitted to appropriate tools, and a refined model of its 3D structure was obtained ( Supplementary Fig. S1 online). The output from ProSA-web validation tools showed a Z-score of -4.33, indicating the good quality of the final vaccine polypeptide structure ( Supplementary Fig. S1 online). In addition, RAMPAGE analysis indicated that 92.3% amino acids of the final structure were in the favoured area, 6.1% were in the allowed area, and www.nature.com/scientificreports/     Table S7 online). PI value (the score given by ElliPro) of 0.882 shows that 88.2% residues are locating in the predicted ellipsoid area of the epitope and this epitope features the highest solvent accessibility.
Binding of the vaccine structure to TLR-3. The best docked complex of the immune receptor TLR-3 with the vaccine model was identified from among various server outputs by comparing the binding free energy. Analysis of the residues contributing at the protein-protein interface showed that the C-terminal domain of the polypeptide is involved in the interaction, where residues from vaccine polypeptide form polar or non-polar contacts with three domains on the TLR-3 structure ( Supplementary Fig. S2 online). The docked complex was further applied for running molecular dynamic (MD) simulation investigations.

MD relaxation and analysis of the receptor-vaccine complex.
The MD simulations of the docked multi-epitope-based subunit vaccine with TLR-3 as the receptor were carried out to achieve information about the conformational changes of TLR-3-vaccine polypeptide complex. Such studies were essential for several vital facets: (1) to comprehend whether the designed vaccine was stable at the bound pocket; (2) to verify that the induced conformational mobility of both TLR-3 receptor and the multi-epitope vaccine structure did not exert undesired impact on the conformation of the docked proteins; and (3) to corroborate that the epitopes of the multi-epitope vaccine were potential for efficiently being recognized by the human immune system, causing strong immune response. Three statistical factors were assessed based on 24,000 ps of simulation trajectory (Fig. 4). The root mean squared deviation (RMSD) values of TLR-3 and multi-epitope vaccine in the complex reflected the high conformational stability of the docked molecules. An average RMSD of 0.29 nm with maximum of 0.44 nm realized at 14,000 ps was noted for the TLR-3 molecule (Fig. 4A). The RMSD value for the multi-epitope vaccine model (Fig. 4B) showed that it mostly remained stable during simulation time, with a plateau at about 10,500 ps. The www.nature.com/scientificreports/ observed trends can be attributed to the moving multi-epitope vaccine at TLR-3 binding pocket in an effort to obtain a suitable and stable docked conformation. Region-wise structural fluctuations of the TLR-3-vaccine polypeptide complex were studied by calculating the root mean squared fluctuation (RMSF) parameter (Fig. 4C,D). The mean RMSF calculated for TLR-3 and the multi-epitope vaccine were 0.2 nm and 0.83 nm, respectively, which were overall in favour of protein residues' local stability. Along the vaccine sequence, residues located at the loop regions (such as I137, T138, W204, N376 and R654) had high fluctuation (Fig. 4D). The flexibility of the loop regions is essential for proper holding of the vaccine at the binding pocket.
The compactness of the complex structure was estimated by calculating the radius of gyration (Rg) of TLR-3 and multi-epitope peptide molecules. The related graphs (Fig. 4E,F) showed that during the simulation time, TLR-3 and multi-epitope vaccine molecules had mean Rg values of about 3.56 nm and 4.87 nm, respectively. Rg fluctuations is an indication of movements and conformational changes in flexible regions of the multi-epitope vaccine peptide in the TLR-3 binding pocket. This dynamics seems essential to suitably identify the vaccine and incorporating it in the binding pocket.
Free energy of the binding between vaccine polypeptide and TLR-3. To figure out the strength of the contact between multi-epitope vaccine and TLR-3 structures, the binding free energy between the two molecules was calculated using MMPBSA approach. According to Table 4, the nonpolar element (− 136.92 kcal/ www.nature.com/scientificreports/ mol) was an important energy term in the binding free energy of the complex. Our findings clarified that the favourable electrostatic energy (E ele = − 241.5 kcal/mol) was covered up by the huge polar energy component (ΔG GB = 240.3 kcal/mol) in the binding process of the multi-epitope vaccine polypeptide. Therefore, the nonpolar energy was known as the main driving force in the vaccine binding to TLR-3, and this hydrophobic contribution leads to a thermodynamically favourable interaction (∆G binding = − 138.11 kcal/mol). To further clarify the binding mode, the calculated energy was broken down into residue-residue pairs through the binding free energy decomposition analysis. According to the data, there were several residues of vaccine polypeptide with less than − 1.5 kcal/mol free energy of contribution in the binding mechanism. The binding pose of vaccine with the key residues are illustrated in Fig. S3.

Discussion
Classic live or attenuated vaccines have a long history of success, however their deployment is associated with issues of biosafety such as autoimmune or strong allergic responses, plus difficulties with their synthesis and manufacture. These drawbacks might be addressed by developing fully synthetic peptide-based vaccines 16 . Research on COVID-19 prevention through this approach was begun immediately after release of relevant basic data such as its full genomic sequence. Specific epitope regions in SARS-CoV-2 with high homology to SARS-CoV, the best-characterized coronavirus in terms of epitope responses, were identified [17][18][19] . Antigenic properties of spike glycoprotein were more focused by theory and experimental researchers [20][21][22] . Applying this basic knowledge, a number of peptide-based designs have been presented to date, which encompass those integrating epitopes from a single or few viral protein(s) [23][24][25][26][27] to those considering the whole proteome of the virus 28 . The present research contributes to current understanding in this field by offering an alternative approach for developing a potential COVID-19 vaccine, where the entire set of structural proteins were searched for most efficacious epitopic segments. One rationale for the choice of the whole set of structural proteins of SARS-CoV-2 for epitope identification in this study, was the ample evidence on immunity-related advantages of each of the S, E, M and N proteins of SARS-CoV or SARS-CoV-2 17,29-32 . In addition, the choice of structural proteins as the source of epitopes implies a highly-conserved epitope set, as structural proteins are subject to less sequence or structural evolution 18 . We found that our proposed epitopes demonstrate overlap with similar studies which have extracted the epitopes from individual structural proteins from different sequence database entries; the similarity of findings and the conservation analyses by those research groups may also indicate the high epitope conservancy among geographical strains 26,27 . Aside from these advantages, our selection of structural epitopes aimed to cope with potential vaccine side effects, particularly the antibody-dependent enhancement of infectivity (ADEI). ADEI is a great challenge of subunit vaccines, referring to the reduced specificity in response elicitation because of the numerous conformations adopted by a peptide vaccine 10 . Strategies to mitigate this concern include: 1-immunofocusing by considering several most antigenic epitopes 33 ; 2-inclusion of structural proteins: structurally flexible coronaviral epitopes may be of limited value for in vivo immunotargeting and require to be replaced by conserved epitopes with low structural plasticity 6 . Structural proteins are characterized by highly conserved sequences, thus the linear epitopes remain highly stable in these proteins and the conformational epitopes preserve their structural patterns during various steps of the virus cycle. Our choice to include the full set of SARS-CoV-2 structural proteins has thus combined the advantages of augmented immunogenicity and geographical conservancy with precautions of both strategies proposed for reducing ADEI.
Integrating multiple immunodominant sites of viral pathogens in the vaccine structure helps augment the antigenic effect. Reports revealed that multiple antigenic peptides induce stronger B and T cells immune responses than un-conjugated peptide epitopes [34][35][36] . Thus, consecutive sequences of HTL and CTL epitopic peptides were fused to each other using accepted linkers. GPGPG has been introduced as a spacer between adjacent epitopes, because of several positive contributions to the immune-modulatory function of polypeptide vaccines, including reduction of junctional epitopes, increased proteasome processing, and augmented immunogenicity 37 . The KK linker is the target sequence of cathepsin B, a lyzosomal protease of high importance for antigen processing. Connecting two peptide antigens together using the lysine linker helps avoid induction of antibodies to the amino acid sequence that is generated by joining of two peptides and most antibodies would be reactive to each peptide 38 . www.nature.com/scientificreports/ Epitope-based peptide vaccines induce relatively weak immune response, when used alone 39 . The immunoreactivity could be improved by appending proper adjuvants. The vaccine construct in this study was prepared by fusing the epitopic peptides to β-defensin as an immunogenic adjuvant. β-Defensin has previously been reported as a potent adjuvant when conjugated with MERS-CoV antigens. Vaccines containing defensins as adjuvants have been shown, both in vivo and in vitro, to activate the primary innate antiviral immune response and mediate other immunomodulatory activities against a number of viruses, including coronaviruses 40,41 . Use of appropriate adjuvants has also been shown to help induce durable IFN-γ responses 6 .
In this study, the binding of the built vaccine model with the immune receptor TLR-3 was assessed by performing molecular docking, MD simulation and free energy calculations. Triggering TLR-3 may help induce the TLR signalling networks which activate the immune pathways evolved specifically against viral pathogens. In addition, this TLR choice was a consensus from vaccine design research works on viral/retroviral pathogens, specifically coronavirus species 12,14,25,26,42,43 . While we note that other Toll-like immune receptors such as TLR-2, -4, -5, -7 and -8 may be stimulated by the COVID-19 virus, they were not considered here as their exact roles are not established, with some possibly functioning even to the advantage of the coronavirus 44,45 .

Conclusion
In the ongoing urgent situation brought about by SARS-CoV-2, it is hard to fast counteract the circulating disease through preventative or therapeutic measures. The multi-epitope-based subunit vaccine design obtained through an immunoinformatic pipeline in this study could be promising, as it incorporates a priori bioinformatics predictions and up-to-date immunological knowledge. Despite the current highly active research community as well as the efforts of the industry section in the road to find an optimal immunogenic formulation, the extreme diversity in available design choices, with no a priori image of their effectiveness, appears as a fundamental obstacle in this route. To overcome this issue, a rapid mass design and screening strategy incorporating the same immunoinformatic approach is recommended to be developed by the research community, with the hope to find one formulation with the highest immunogenicity and biosafety.

Methodology
SARS-CoV-2 structural protein sequences. The amino acid sequences of SARS-CoV-2 structural proteins, including the spike glycoprotein (S), envelope protein (E), membrane protein (M), and nucleocapsid phosphoprotein (N), were retrieved using the NCBI reference genome of the virus (accession number NC_045512.2).

Identifying cytotoxic T lymphocyte (CTL) epitopes. Predicting peptides that are capable of inducing
CTL responses is a crucial step in the design of epitope-based vaccine. The MHC-I Binding tool of Immune Epitope Database and Analysis Resource (IEDB; http://tools .iedb.org/mhci) was used to predict the CD8 + T-cell epitopes borne in S, E, M and N proteins 12,46,47 . In this step, the ANN 4.0 method was set as the prediction method. The human was selected as the source species. The maximum IC 50 value was set to 50 nM, and percentile rank < 1 was considered as threshold since lower score indicates high affinity 48,49 .  Table S8 online), using NN-align 2.3 method 13,14,50,51 . The maximum IC 50 value of 50 nM, 500 nM and 5000 nM indicate high, intermediate and low affinity of epitopes, respectively 52 . The 15-mer epitopes with IC 50 values < 50 nM were considered to be included in the vaccine polypeptide 52-54 . B-cell epitopes prediction. The ABCpreds server was used to predict 16-mer linear B-lymphocyte (LBL) epitopes, using a threshold of 0.51 55 . In addition, the ElliPro tool of IEDB was utilized to predict conformational and linear B-cell epitopes of the vaccine polypeptide.
Assessment of identified epitopes for antigenicity, allergenicity, and toxicity. The antigenic potential of each of the T and B cells epitopes was predicted by VaxiJen v2.0 applying a threshold of 0.4 56 . The predicted T and B cells epitopes were then further evaluated in terms of toxicity and allergenicity, with Tox-inPred and AllergenFP v1.0 servers, respectively 57,58 . Then, the ability of each of the HTL and B cell epitopes (CD4 + ) to induce the secretion of cytokines, such as interferon-gamma (IFN-γ), interleukin-4 (IL-4) and interleukin-10 (IL-10), to overcome the inflammatory response and prevent tissue damage was predicted by IFNepitope, IL4pred and IL10pred server tools, respectively [59][60][61] . The IL4pred and IL10pred operations were carried out based on SVM method and hybrid method, respectively, with other parameters kept as default 12,13 . Designing the multi-epitope vaccine polypeptide construct. To develop a multi-epitope vaccine construct, we selected those predicted epitopes which demonstrated high antigenic potential, were not identified as allergic or toxic, and had good solubility when highly expressed. To construct the vaccine polypeptide, the selected CTL, HTL and LBL epitopes were fused together using appropriate linkers 62,63 . In addition, a 45-amino acid sequence prepared from β-defensin-2 protein was added to the N terminus of the vaccine sequence using the EAAAK linker, to increase the immunogenic capacity of the multi-epitope vaccine 12,52,63 . Immunogenic, allergenic and physiochemical evaluation of vaccine construct. www.nature.com/scientificreports/ of 0.4 56 . The allergenicity of the vaccine was analysed using AllerTOP v.2.0 and AllergenFP v.1.0 servers 57,64 . The ProtParam server was employed to evaluate the physical chemistry properties of the vaccine construct, such as amino acid composition, molecular weight, theoretical isoelectric point (pI), grand average of hydropathicity (GRAVY), aliphatic and instability index, and in vitro and in vivo half-life 65 .
Vaccine polypeptide structure modelling, refinement and validation. The SOPMA server was applied for analysing the secondary structural properties of the multi-epitope vaccine polypeptide 66 . The Gal-axyWEB server was employed for modelling and refinement of the 3D structure model 67 . The server relaxes the model structure using repacking and molecular dynamics (MD) simulation. Next, RAMPAGE server and ProSA-web tools were used to validate the refined 3D model 68,69 . All of these tools give us the overall quality of the 3D structure of the peptide vaccine.
Molecular docking of the vaccine polypeptide to TLR-3. The binding of antigenic agents with the target immune cell protein is crucial for the creation of a suitable immune system response. For analysing the binding pattern of the multi-epitope vaccine polypeptide with TLR-3 (PDB ID: 2A0Z) 70 , molecular docking analysis was performed by Hdock, Zdock, Cluspro and Hawkdock [71][72][73][74] . Among the molecular species docked by each of these applications, the best outputs were extracted, followed by the four docked molecules uploaded into the Hawkdock program, and the free energy of binding of each complex calculated. Based on this screening, it was determined that the model selected from the Cluspro output had the best binding free energy. Therefore, this molecular species was chosen as the primary structure for initiating the MD simulation.

MD simulation of the vaccine-receptor complex.
To determine the structural stability and to study the molecular details of the interactions between TLR-3 and the multi-epitope vaccine polypeptide in the docked conformation, MD simulation was performed. Briefly, the system including vaccine polypeptide-TLR-3 was simulated by the Gromacs-2020 package applying OPLS-AA force field 75 . The complex system was solvated using TIP3P water model. Then, the genion module was utilized to neutralize the whole system. Next, the conjugate gradient algorithm was applied to minimize the energy of the system. In an NVT ensemble, the temperature of the system gradually increased from 0 to 310 K during 400 ps. Subsequently, in an NPT ensemble, 500 ps simulation was carried out at the pressure of 1 atm and the temperature of 310 K. Production simulation for 24,000 ps was then implemented. The particle-mesh Ewald (PME) and the LINCS algorithms were applied to assess all electrostatic connections and to restrain all bond lengths in the protein, respectively. Moreover, periodic boundary condition was utilized during the simulation. The final coordinates obtained for the complex system were analysed with classic MD analyses, plus the MMPBSA method for calculating the free energy of intermolecular interactions 76 . The results were visualized by Pymol (Schrodinger L.L.C.).

Data availability
No datasets were generated or analysed during the current study. Data such as epitope sequences, which were analysed during this study, are included in this published article and its Supplementary Information file. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.