Abstract
The outbreak of the SARS-CoV-2 (2019-nCoV) virus has highlighted the need for fast and efficacious vaccine development. Stimulation of a proper immune response that leads to protection is highly dependent on presentation of epitopes to circulating T cells via the HLA complex. SARS-CoV-2 is a large RNA virus, and testing all of its overlapping peptides in vitro to deconvolute an immune response is not feasible. Therefore, HLA-binding prediction tools are often used to narrow down the number of peptides to test. We tested the predictions of NetMHC suite tools using an in vitro peptide-MHC stability assay. We assessed 777 peptides that were predicted to be good binders across 11 MHC alleles in a complex-stability assay and benchmarked a selection of 19 epitope-HLA-binding prediction tools against the assay results. In this investigation of potential SARS-CoV-2 epitopes, we found that current prediction tools vary in performance when assessing binding stability and that their performance is highly dependent on the MHC allele in question. Designing a COVID-19 vaccine in which only a few epitope targets are included is therefore a very challenging task. Here, we present 174 SARS-CoV-2 epitopes with high predicted binding scores, validated to bind stably to one or more of the 11 HLA alleles tested. Our findings may contribute to the design of an efficacious vaccine against COVID-19.
Introduction
2019-nCoV (SARS-CoV-2) was first reported in Wuhan, China, on 31 December 2019, following a series of unexplained pneumonia cases1. The World Health Organization currently classifies the disease as a global pandemic, with cases reported from all continents; as of 4 October 2020, it had infected more than 34 million people and claimed more than 1 million lives globally2. Vaccine development is a high priority, and a number of public and private initiatives are focused on this task3. Many of the ongoing vaccine development efforts aim to raise an immune response against the spike protein. However, the spike protein is encoded by only about 1/8 of the SARS-CoV-2 genome, so this strategy may miss a large share of the potential immune reactivity. SARS-CoV-2 has a large proteome4. Immune deconvolution to identify T cell epitopes will require initial filtering to assess which SARS-CoV-2-derived peptides are likely to bind a given HLA allele and to be presented on the surface of infected cells, where they can activate passing T cells. The core binding groove of most MHC class I molecules accommodates 9 amino acid residues, with some variation in length and a suspected impact of flanking positions5,6. MHC class II molecules have been described to bind longer peptides (typically 13–25 residues) through their open binding groove7, which makes it possible to further examine the importance of the binding motif and its flanking regions.
Several computational tools (a selection is presented in Table 1) have been developed to predict the binding of peptides to HLA. Traditionally, these tools were trained using data from affinity assays8, but more recently many of them also incorporate data from peptides identified by HLA ligandome analysis. Most tools rely on small neural networks (NN) or variations of position-specific scoring matrices (PSSM) to calculate the probability of a peptide matching a consensus motif or model.
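As a minimal sketch of the PSSM idea (not the implementation of any tool in Table 1), the code below estimates a log-odds matrix from a handful of aligned 9-mers and scores a peptide by summing its per-position log-odds. The toy peptides, the uniform background frequencies and the pseudocount of 1 are illustrative assumptions; PSSM-based methods commonly add refinements such as sequence weighting and substitution-matrix-based pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0, background=None):
    """Estimate a log2-odds PSSM (one dict per position) from aligned peptides.

    Assumes equal-length peptides; background frequencies default to uniform (1/20).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        # Observed frequency (with pseudocount) relative to background, in bits.
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_peptide(pssm, peptide):
    """Sum per-position log-odds; higher means a closer match to the motif."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Toy usage with invented peptides (not data from this study).
toy_binders = ["ALDEGKATV", "SLKNPQGEV", "GLFDRAQTV", "NLWEKSAGV"]
pssm = build_pssm(toy_binders)
print(score_peptide(pssm, "TLAEGKQTV") > score_peptide(pssm, "TDAEGKQTP"))  # True
```

Every toy peptide carries L at position 2 and V at position 9, so the query that shares those residues receives the higher score; this is the sense in which a PSSM measures agreement with a consensus motif.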
NetMHC tools (such as NetMHC, NetMHCII, NetMHCpan, NetMHCIIpan and others) have been under constant development and have consistently performed well over the last decade9,10,11,12. Several tools are restricted in terms of which alleles are available for prediction, in particular for MHC class II. This restriction is primarily determined by the availability of training data, for which the largest public collection is currently the Immune Epitope Database (IEDB)13. Attempts to overcome this limitation have been made through pan-specific predictions that use the MHC sequence as input, most notably in NetMHCpan14. A number of recent publications make use of prediction tools to suggest vaccine candidate epitopes for SARS-CoV-215,16,17.
To assess whether current peptide-HLA prediction tools could be suitable for identifying epitopes relevant to a vaccine against SARS-CoV-2, we tested binders predicted by the NetMHC tools using NeoScreen, a new peptide-MHC complex-stability assay, on ten HLA class I alleles and one HLA class II allele. The selected class I alleles broadly cover populations of different ethnic origins (Table S1). We then chose all tools included in the benchmark recently reviewed by Mei et al.10, excluding the three tools with the lowest performance (MHCnuggets, HLA-CNN and RANKPEP) as well as SYFPEITHI, which we could not run on our system. Furthermore, we added newly developed tools such as HLAthena and DeepHLApan and a standard tool, SMM 1.0, to offer a comprehensive representation of the current prediction tool landscape (Table 1). Most of the selected tools are periodically tested in the IEDB Automated Benchmark35,36.
We found that algorithmic prediction of binding between SARS-CoV-2 peptides and HLA yields many complexes that turn out to exhibit low stability. Such peptides are thus very unlikely to elicit an immune response against SARS-CoV-2 and are therefore unsuitable for vaccine development. To investigate whether this finding reflects the quality of the available training data, we constructed a proof-of-concept prediction model for HLA-A*02:01, which we trained on 2193 historic in-house stability data points, and found that it outperforms the other tools. The training data consisted primarily of human cancer-derived or random peptide sequences. The SARS-CoV-2 peptides that we validated as binding or non-binding in this study are freely available to assist in vaccine design against COVID-19.
Results
We set out to identify peptides with epitope potential for a future COVID-19 vaccine. We began by translating the reference sequence of SARS-CoV-2 (ACCESSION MN908947, VERSION MN908947.3) into protein sequences. We then predicted potential epitopes using a sliding window of 9 residues for HLA class I and 12 residues for class II with NetMHC tools (NetMHC or NetMHCII, and their “pan” versions when an allele was not available); for details see Data S1. We selected the top 94 predicted peptides for each of 11 HLA alleles (94 × 11 = 1034) and validated the binding of each allele's peptides in an in vitro MHC-peptide complex stability assay (NeoScreen). We removed eight peptides that had been artificially introduced when translating the DNA sequence into protein sequence. Among the remaining 1026 peptides there was a high degree of overlap between alleles, resulting in 777 unique peptides. To first assess the variability of the stability measurements, we made replicate measurements (n = 4) of 30 randomly selected peptides across 8 different HLA alleles. Each peptide was measured at 4 different urea concentrations (0 M, 2 M, 4 M and 6 M), and we observed an average standard deviation between replicates of 0.10 at an average mean of 0.56 (Figure S1). All remaining experiments were performed in duplicate for all concentrations. We found that 174 of the 777 unique peptides formed a stable peptide-HLA complex. Of these 174 peptides, 98 had previously been measured and deposited in IEDB, either as 9-mers or as substrings of longer peptides; 3 peptides were reported in recent studies37,38,39 but not deposited in IEDB; and the remaining 73 peptides are novel.
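The windowing and de-duplication steps above can be sketched as follows. The sketch extracts every overlapping k-mer (k = 9 for class I, k = 12 for class II) and merges per-allele top-ranked lists into a set of unique peptides, mirroring how 1026 allele-peptide pairs collapsed to 777 unique peptides. It does not call NetMHC or reproduce the ranking step; the function names, the toy sequence and the rule of skipping windows that contain a stop symbol are assumptions for illustration.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def sliding_windows(protein, k):
    """Yield (start, peptide) for every overlapping k-mer (0-based start).

    A protein of length L gives L - k + 1 windows. Windows containing symbols
    outside the 20 standard amino acids (e.g. '*' for a stop codon) are skipped;
    this filter is an assumption of the sketch.
    """
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        if set(peptide) <= STANDARD_AA:
            yield start, peptide


def unique_peptides(per_allele_peptides):
    """Merge per-allele peptide lists into unique peptides.

    Returns the sorted unique peptides and a dict mapping peptide -> set of alleles.
    """
    alleles_per_peptide = {}
    for allele, peptides in per_allele_peptides.items():
        for peptide in peptides:
            alleles_per_peptide.setdefault(peptide, set()).add(allele)
    return sorted(alleles_per_peptide), alleles_per_peptide


# Toy usage (invented sequence; '*' marks a stop codon).
toy_protein = "ACDEFGHIKLMN*PQRST"
print([start for start, _ in sliding_windows(toy_protein, 9)])  # [0, 1, 2, 3]
unique, alleles = unique_peptides({
    "allele_1": ["ACDEFGHIK", "CDEFGHIKL"],
    "allele_2": ["CDEFGHIKL", "DEFGHIKLM"],
})
print(len(unique), sorted(alleles["CDEFGHIKL"]))  # 3 ['allele_1', 'allele_2']
```

Keeping the allele set per peptide, rather than only the unique list, preserves which alleles each shared peptide must still be measured on.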
The overlap with peptides deposited in IEDB points to cross-reactivity between SARS-CoV and SARS-CoV-2. Such cross-reactivity has been described in a recent study showing that individuals infected with SARS retained long-lasting memory T cells reacting to the N protein of SARS-CoV as well as to the N protein of SARS-CoV-240. Since the completion of our measurements and data search, the field has developed rapidly and many new studies have emerged. The full list of predicted binders (excluding the eight peptides introduced during translation) can be found in the Supplementary materials (Supplementary Data S1).
To further address whether alternative prediction tools would show higher concordance with measured stability, we performed predictions with all tools listed in Table 1. Predictions for the 19 different tools were performed either through their web servers or with stand-alone versions (see Materials and methods for details). Furthermore, using in-house stability data, we developed PrdX 1.0, a prediction tool for the single allele HLA-A*02:01, for which all other tools performed poorly.
We assessed the trade-off between the true positive and false positive rates of each tool using receiver operating characteristic (ROC) curves and computed the area under the curve (AUC) for all alleles with more than 10 stable binders.
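The ROC AUC has a convenient interpretation: it is the probability that a randomly chosen stable binder receives a higher prediction score than a randomly chosen non-binder, which equals the Mann-Whitney U statistic divided by the number of binder/non-binder pairs. The sketch below computes it that way, counting ties as one half via average ranks. It assumes that higher scores mean stronger predicted binding, so outputs such as IC50 (nM) or percentile rank would need to be negated first; the AUC values in this section are reported on a 0–100 scale, i.e. multiplied by 100.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic.

    labels: 1 for a stable binder, 0 otherwise.
    scores: predictions where higher means stronger predicted binding.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one binder and one non-binder")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy usage: 3 of the 4 binder/non-binder pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.7, 0.6, 0.2]))  # 0.75
```

The rank formulation avoids explicitly tracing the curve; sweeping a threshold over the scores and integrating the resulting ROC curve with the trapezoidal rule gives the same value.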
The analysis revealed that NetMHC 4.0 achieved the highest score for HLA-A*01:01 (AUC = 97.47; Fig. 1A), closely followed by NetMHCcons 1.1, NetMHCpan_BA 4.0 and IEDB-AR Consensus. PrdX 1.0 scored highest for HLA-A*02:01 (AUC = 85.54; Fig. 1B), NetMHCcons 1.1 scored highest for HLA-A*03:01 (AUC = 79.25; Fig. 1C), and MHCflurry 1.3.0 performed best for HLA-B*40:01 (AUC = 91.06; Fig. 1F). NetMHCstab 1.0 was the only tool that achieved the highest score for more than one allele: HLA-A*11:01 and HLA-A*24:02 (AUC = 89.80 and 86.03; Fig. 1D and E, respectively). Of the tools tested for HLA class II, IEDB-AR Consensus achieved the highest score for HLA-DRB1*04:01 (AUC = 81.31; Fig. 1G). Table 2 lists all AUC values, with the best result for each allele marked in bold. Notably, for HLA-A*02:01 we observed particularly poor performance for all tested tools except PrdX 1.0, despite the extensive amount of data available for this allele.
Figure 1. ROC curves for each allele with more than 10 stably bound peptides (A–G); (H) tools used in the benchmark (upper box, HLA class I; lower box, HLA class II; IEDB-AR Consensus is available for both); (I) precision-recall curves for HLA-A*02:01. The corresponding area under the curve (AUC) values are listed in Table 2.
To assess the correlation between predicted and measured peptide-HLA complex stability, the Spearman correlation coefficient (SCC) was calculated for all alleles. This revealed substantial inconsistencies in performance depending on the allele. PSSMHCpan 1.0 displayed the highest consistency, taking its allele coverage into account (Table 1), but its median correlation was lower than that of other tools such as IEDB-AR Consensus, MixMHCpred 2.0.2, NetMHCpan_EL 4.0 and PrdX 1.0. The Spearman correlations are summarized in Fig. 2.
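Spearman's coefficient is the Pearson correlation computed on ranks, so it rewards a tool for ordering peptides correctly regardless of the scale of its scores. The sketch below uses average ranks for ties (the same convention as SciPy's `scipy.stats.spearmanr`); the text does not state how ties were handled, so that choice is an assumption. As with the AUC, tools whose scores are "lower is better" yield negative coefficients unless their scores are negated.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(sxx * syy)


# Toy usage: a perfectly monotonic (but non-linear) relationship.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

Computing one coefficient per allele and then taking, for example, `statistics.median` across alleles gives a per-tool summary of the kind compared above.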
Lastly, to compare our benchmark with the IEDB Automated Benchmark, we calculated the average percentile score, as described on the IEDB website35, for all tools and alleles for which both AUC and SCC were available. A comparison of the tools that overlap between the IEDB Automated Benchmark and our study can be found in the Supplementary materials (Table S2).
Discussion
Here we benchmark a number of tools for identifying SARS-CoV-2 epitopes and use a stability assay to validate the binding of candidate epitopes to 10 HLA class I alleles and one HLA class II allele. We find that the false positive rate is high for all tested tools when testing the binding stability of predicted HLA-binding peptides from SARS-CoV-2. This creates a challenge for vaccine development efforts, especially for the design of epitope vaccines, in which only a limited number of epitopes can be included. Furthermore, it highlights the risk of failed vaccine design (for any pathogen or disease) if predicted HLA-binding protein regions do not in fact bind stably enough to allow immune presentation and a response.
Remarkably, all tested published tools performed poorly for HLA-A*02:01, the allele with the most training data available41. Based on our observations, we hypothesise that the publicly available training data are not of sufficiently high quality. This is supported by the AUC and Spearman results, which indicate that performance depends more on the allele than on the tool, suggesting that either the training data or the difficulty of modelling the allele is responsible for the poor predictions. To test the hypothesis that training data limit tool performance, we trained a vanilla NN on only 2193 historic in-house stability measurements and found that our model outperformed all tested prediction tools in this setting. Alternatively, this observation could be explained by more similar data distributions between the training and test data for PrdX 1.0.
We identified 174 potential SARS-CoV-2 vaccine candidate peptides, 98 of which had previously been deposited in IEDB following various studies13,42,43,44,45,46. Most of the previously deposited peptides had been measured in one or more affinity assays and had low Kd values (< 50 nM), indicating strong affinity. Additionally, 9 of these peptides had previously been measured in another stability assay and were recognised as stable binders47, providing independent support for our approach and measurements. Recent T cell studies have revealed a large overlap with the stable peptides from our assay: 60 of these peptides elicited a positive T cell response in one or more of those studies37,38,39. This result not only demonstrates the potential of complex-stability assays but also contributes to the collective findings on cross-reactivity between SARS-CoV and SARS-CoV-2.
In conclusion, we make freely available 174 COVID-19 epitopes that we have predicted and validated in vitro to be HLA-binding. We hope that this contribution will aid the development of a vaccine against SARS-CoV-2. We performed a benchmark analysis of 19 tools on 777 peptides that state-of-the-art NetMHC prediction tools predicted to be binders, and found high false positive rates for all benchmarked tools. We observed improved performance after training our own prediction tool, PrdX 1.0, for allele HLA-A*02:01 using in-house stability data. Our findings suggest that the performance of current state-of-the-art epitope prediction tools is affected by the varying quality of publicly available data.
Materials and methods
We tested nineteen prediction tools on a relevant dataset of peptides from the SARS-CoV-2 genome (assembly MN908947.3). The genome sequence was downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3)4. Using NetMHC tools, we predicted the top 94 peptides for HLA-A*01:01, HLA-A*02:01, HLA-A*03:01, HLA-A*24:02, HLA-B*40:01, HLA-C*04:01, HLA-C*07:01, HLA-C*07:02 (NetMHC 4.0), HLA-C*01:02 (NetMHCpan 4.0) and HLA-DRB1*04:01 (NetMHCII 2.3). The peptides were then analysed for binding stability to the respective HLA allele. Because the two alleles have overlapping binding specificities, peptides predicted to bind HLA-A*03:01 were also measured on HLA-A*11:01. For HLA-DRB1*04:01, we increased the length of the synthesised peptides from 9 to 12 residues to account for the effect of the regions flanking the core binding sequence.
Peptides were synthesised by standard Fmoc solid-phase synthesis on a modified cellulose membrane as the solid support, according to the SPOT synthesis protocol, starting with the acid-labile Ramage linker.
After synthesis, peptides were cleaved off the membranes using 95% trifluoroacetic acid (TFA), 3% triisopropylsilane (TIS) and 2% H2O. The peptides were then precipitated with diethyl ether and washed with methyl tert-butyl ether.
Peptides were subsequently dissolved in a proprietary mixture and dried under vacuum using a speed vac. Finally, 5% of all peptides were analysed by MALDI-TOF to confirm correct molecular weight. The anticipated yield per spot was 50 μg.
NeoScreen assay
The NeoScreen stability assay utilises urea denaturation to assess peptide-MHC complex stability. Briefly, peptides were dissolved in 200 µl DMSO with 1 mM β-mercaptoethanol and subsequently diluted into an assay buffer in 96-well plates to a final concentration of 2 µM. Positions A1 and H12 were reserved for a mixture of reference peptides with known stable binding to the MHC of interest. MHC I was diluted into an assay buffer with beta-2 microglobulin (b2m) and added at a 1:1 ratio to the diluted peptides. For MHC II, the urea-denatured alpha and beta chains were diluted into an assay buffer and added at a 1:1 ratio to the diluted peptides. The concentration of MHC depended on the chain, but final concentrations were in the range of 2–10 nM (hence peptide was added in excess). After folding, peptide-MHC complexes were transferred to 384-well plates, where they were challenged with 4 different urea concentrations. Following the period of urea-induced stress, the plates were developed in a conventional ELISA as described previously48,49. The absorbance signals at 450 nm from the 4 different wells were averaged and normalised to the reference peptides included in wells A1 and H12.
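A minimal sketch of the normalisation step described above, assuming that each peptide has one A450 reading per urea concentration (0, 2, 4 and 6 M, as in the replicate experiment in the Results), that the reference value is the mean of the averaged A1 and H12 wells, and that no background subtraction is applied. Scaling to 100 follows the convention stated under Benchmarking of tools (reference peptide = 100); the function names and readings are hypothetical.

```python
from statistics import mean

UREA_CONCENTRATIONS_M = (0, 2, 4, 6)  # one well (and one reading) per concentration


def mean_signal(readings):
    """Average one peptide's A450 readings across the urea concentrations."""
    if len(readings) != len(UREA_CONCENTRATIONS_M):
        raise ValueError("expected one reading per urea concentration")
    return mean(readings)


def normalised_stability(peptide_readings, reference_readings):
    """Scale a peptide's mean signal so that the reference peptides score 100.

    reference_readings: one list of readings per reference well (e.g. A1 and H12).
    """
    reference = mean(mean_signal(r) for r in reference_readings)
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * mean_signal(peptide_readings) / reference


# Hypothetical readings, not measured data.
well_a1 = [1.2, 1.0, 0.8, 0.6]
well_h12 = [1.0, 0.9, 0.7, 0.6]
peptide = [0.9, 0.6, 0.3, 0.2]
print(round(normalised_stability(peptide, [well_a1, well_h12]), 1))  # 58.8
```

Averaging across urea concentrations folds the whole denaturation profile into a single number, so a complex that survives higher urea levels scores higher than one that only gives signal without urea.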
Unlike previously developed assays, NeoScreen offers a high-throughput process without the need for iodine-labelled b2m or FACS-based quantification50,51. When compared with a recently developed method that uses thermal denaturation and differential scanning fluorimetry, the same stability trend was found: MART-1 wt had the lowest stability, whereas Tyrosinase and HTLV-TAX (the NeoScreen reference peptide for HLA-A*02:01) had very high stability51.
Benchmarking of tools
Table 1 summarises the tools tested in this benchmark analysis, listing the year of development, the algorithm used, web server availability and a reference for each tool. Most of the tested tools are available from the IEDB Analysis Resource (https://tools.iedb.org/main/) and were run through its web interface (https://tools.iedb.org/mhci/ or https://tools.iedb.org/mhcii/). MixMHCpred 2.0.2, MHCflurry 1.3.0 and PSSMHCpan 1.0 were downloaded from their respective GitHub pages (https://github.com/GfellerLab/MixMHCpred, https://github.com/openvax/mhcflurry, https://github.com/BGI2016/PSSMHCpan, respectively). ConvMHC, DeepHLApan and HLAthena were run on web servers hosted by their developers (https://jumong.kaist.ac.kr:8080/convmhc, https://biopharm.zju.edu.cn/deephlapan/, https://hlathena.tools/, respectively).
All tested peptides were scored in silico with each prediction tool for the relevant allele, where that allele was available. Predictions were compared with the stability values measured in the NeoScreen assay. Measurements were normalised to an allele-specific reference peptide (stability = 100). The reference peptides used are listed in the Supplementary materials (Table S3). The threshold for a stable binder was set to 60. Predictions were then evaluated with commonly used metrics: the receiver operating characteristic (ROC) curve and its area under the curve (AUC), which visualise and summarise the relationship between sensitivity and specificity; the corresponding equations can be found in the Supplementary methods. Spearman's rank correlation was also used to compare the rankings of predicted and measured values.
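The threshold above turns each normalised stability value into a binary label for ROC analysis. The sketch below shows that step, together with the filter to alleles with more than 10 stable binders used for the ROC curves in the Results and a helper that negates "lower is better" scores (such as predicted IC50 in nM or percentile rank, which NetMHC-family tools report) so that larger always means stronger predicted binding. Treating the threshold as inclusive (stability ≥ 60) and deciding which tools need negation are assumptions; the allele names in the usage lines are placeholders.

```python
STABILITY_THRESHOLD = 60.0  # normalised stability; reference peptide = 100
MIN_BINDERS = 10            # an allele needs more than this many stable binders


def binder_labels(stabilities, threshold=STABILITY_THRESHOLD):
    """1 for a stable binder, 0 otherwise (inclusive boundary is an assumption)."""
    return [1 if s >= threshold else 0 for s in stabilities]


def oriented_scores(scores, lower_is_better):
    """Return scores where larger always means stronger predicted binding."""
    return [-s for s in scores] if lower_is_better else list(scores)


def evaluable_alleles(measurements, min_binders=MIN_BINDERS):
    """Keep alleles with more than `min_binders` stable binders.

    measurements: dict mapping allele name -> list of normalised stabilities.
    """
    return sorted(
        allele for allele, values in measurements.items()
        if sum(binder_labels(values)) > min_binders
    )


# Toy usage with placeholder allele names and invented values.
print(binder_labels([59.9, 60.0, 75.0]))  # [0, 1, 1]
print(evaluable_alleles({"allele_x": [70.0] * 11, "allele_y": [70.0] * 10}))  # ['allele_x']
```

Orienting scores before evaluation matters because an IC50-based predictor that ranks peptides perfectly would otherwise appear to have an AUC of 0 rather than 1.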
PrdX
To assess the performance of predictors trained on stability data, we used PyTorch52 to train a fully connected, feed-forward neural network with two hidden layers of 64 and 32 units on historic in-house stability data for allele HLA-A*02:01. These data comprise a mixture of human cancer-related stability measurements and measurements made on synthetic random peptides. Peptides were encoded using the BLOSUM62 matrix, and the network, which had a simple architecture, was trained using a train-test split and early stopping.
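To make the PrdX 1.0 description concrete, the following is a minimal PyTorch sketch of a comparable model, not the authors' code. Each 9-mer is encoded by concatenating the BLOSUM62 rows of its residues (9 × 20 = 180 features) and passed through two hidden layers of 64 and 32 units. The ReLU activations, mean-squared-error loss on stability scaled by 1/100, Adam optimiser, full-batch updates, 20% hold-out split, patience of 20 epochs and all function names are assumptions, since the text does not specify them.

```python
import torch
from torch import nn

AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
# Standard BLOSUM62 scores; rows and columns follow AA_ORDER.
_BLOSUM62_TEXT = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
BLOSUM62 = {
    aa: [int(v) for v in line.split()]
    for aa, line in zip(AA_ORDER, _BLOSUM62_TEXT.strip().splitlines())
}


def encode(peptide):
    """Concatenate the BLOSUM62 row of each residue (a 9-mer gives 180 features)."""
    return [score for aa in peptide for score in BLOSUM62[aa]]


def make_model(n_features):
    # Two hidden layers (64 and 32 units); ReLU and one linear output are assumptions.
    return nn.Sequential(
        nn.Linear(n_features, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )


def train_with_early_stopping(peptides, stabilities, val_fraction=0.2,
                              patience=20, max_epochs=500, seed=0):
    """Full-batch training that keeps the weights with the lowest held-out loss."""
    torch.manual_seed(seed)
    x = torch.tensor([encode(p) for p in peptides], dtype=torch.float32)
    # Scale stability (reference peptide = 100) to roughly 0-1: an assumption.
    y = torch.tensor(stabilities, dtype=torch.float32).unsqueeze(1) / 100.0
    perm = torch.randperm(len(peptides))
    n_val = max(1, int(len(peptides) * val_fraction))
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    model = make_model(x.shape[1])
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    best_loss, best_state, stale_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        optimiser.zero_grad()
        loss = loss_fn(model(x[train_idx]), y[train_idx])
        loss.backward()
        optimiser.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x[val_idx]), y[val_idx]).item()
        if val_loss < best_loss:
            best_loss, stale_epochs = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    model.load_state_dict(best_state)
    return model, best_loss
```

Given lists of 9-mers and normalised stability values, `train_with_early_stopping` returns the fitted model and its best held-out loss; new peptides are scored by passing their `encode` output as a float32 tensor. Because early stopping selects the epoch with the lowest loss on the held-out split, that loss is optimistic, and an unbiased estimate needs a further test set that plays no part in training.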
Data availability
All epitopes are available at the vendor webpage (www.immunitrack.com) and in Supplementary materials (Data S2).
References
World Health Organization. Novel coronavirus (2019-nCoV) situation report-1. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200121-sitrep-1-2019-ncov.pdf?sfvrsn=20a99c10_4 (2020).
World Health Organization. Coronavirus disease (COVID-19) weekly epidemiological update-8. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20201005-weekly-epi-update-8.pdf (2020).
Chen, W. H., Strych, U., Hotez, P. J. & Bottazzi, M. E. The SARS-CoV-2 vaccine pipeline: an overview. Curr. Trop. Med. Rep. 7, 61–64 (2020).
Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
Bassani-Sternberg, M. et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat. Commun. 7, 13404 (2016).
Rammensee, H.-G. Chemistry of peptides associated with MHC class I and class II molecules. Curr. Opin. Immunol. 7, 85–96 (1995).
Wieczorek, M. et al. Major histocompatibility complex (MHC) class I and MHC class II proteins: conformational plasticity in antigen presentation. Front. Immunol. 8, 292 (2017).
Harndahl, M. et al. Peptide binding to HLA class I molecules: homogenous, high-throughput screening, and affinity assays. J. Biomol. Screen. 14, 173–180 (2009).
Peters, B., Nielsen, M. & Sette, A. T cell epitope predictions. Annu. Rev. Immunol. https://doi.org/10.1146/annurev-immunol-082119 (2019).
Mei, S. et al. A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction. Brief. Bioinform. 21, 1119–1135 (2020).
Saethang, T. et al. EpicCapo: epitope prediction using combined information of amino acid pairwise contact potentials and HLA-peptide contact site information. BMC Bioinform. 13, 313 (2012).
Bhattacharya, R. et al. Evaluation of machine learning methods to predict peptide binding to MHC Class I proteins. bioRxiv https://doi.org/10.1101/154757 (2017).
Vita, R. et al. The immune epitope database (IEDB): 2018 update. Nucleic Acids Res. 47, D339–D343 (2019).
Jurtz, V. et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368 (2017).
Fast, E., Altman, R. B. & Chen, B. Potential T-cell and B-cell epitopes of 2019-nCoV. bioRxiv https://doi.org/10.1101/2020.02.19.955484 (2020).
Grifoni, A. et al. A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2. Cell Host Microbe 27, 671–680.e2 (2020).
Abdelmageed, M. I. et al. Design of multi epitope-based peptide vaccine against E protein of human COVID-19: an immunoinformatics approach. bioRxiv https://doi.org/10.1101/2020.02.04.934232 (2020).
Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).
Moutaftsi, M. et al. A consensus epitope prediction approach identifies the breadth of murine TCD8+-cell responses to vaccinia virus. Nat. Biotechnol. 24, 817–819 (2006).
Han, Y. & Kim, D. Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction. BMC Bioinform. 18, 585 (2017).
Wu, J. et al. DeepHLApan: a deep learning approach for neoantigen prediction considering both HLA-peptide binding and immunogenicity. Front. Immunol. 10, 2559 (2019).
Sarkizova, S. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat. Biotechnol. 38, 199–209 (2020).
Bassani-Sternberg, M. et al. Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput. Biol. 13, e1005725 (2017).
O’Donnell, T. J. et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132.e4 (2018).
Karosiene, E., Lundegaard, C., Lund, O. & Nielsen, M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics 64, 177–186 (2012).
Jørgensen, K. W., Rasmussen, M., Buus, S. & Nielsen, M. NetMHCstab—predicting stability of peptide-MHC-I complexes; impacts for cytotoxic T lymphocyte epitope discovery. Immunology 141, 18–26 (2014).
Zhang, H., Lund, O. & Nielsen, M. The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics 25, 1293–1299 (2009).
Liu, G. et al. PSSMHCpan: a novel PSSM-based software for predicting class I peptide-HLA binding affinity. GigaScience 6, 1–11 (2017).
Peters, B. & Sette, A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinform. 6, 132 (2005).
Kim, Y., Sidney, J., Pinilla, C., Sette, A. & Peters, B. Derivation of an amino acid similarity matrix for peptide: MHC binding and its application as a Bayesian prior. BMC Bioinform. 10, 394 (2009).
Wang, P. et al. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput. Biol. 4, e1000048 (2008).
Jensen, K. K. et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 154, 394–406 (2018).
Nielsen, M., Lundegaard, C. & Lund, O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinform. 8, 238 (2007).
Sturniolo, T. et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat. Biotechnol. 17, 555–561 (1999).
Trolle, T. et al. Automated benchmarking of peptide-MHC class I binding predictions. Bioinformatics 31, 2174–2181 (2015).
Andreatta, M. et al. An automated benchmarking platform for MHC class II binding prediction methods. Bioinformatics 34, 1522–1528 (2018).
Peng, Y. et al. Broad and strong memory CD4+ and CD8+ T cells induced by SARS-CoV-2 in UK convalescent individuals following COVID-19. Nat. Immunol. https://doi.org/10.1038/s41590-020-0782-6 (2020).
Mateus, J. et al. Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans. Science 370, 89 (2020).
Dines, J. N. et al. The ImmuneRACE study: a prospective multicohort study of immune response action to COVID-19 events with the ImmuneCODE™ open access database. medRxiv https://doi.org/10.1101/2020.08.17.20175158 (2020).
le Bert, N. et al. SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls. Nature 584, 457–462 (2020).
Kim, Y. et al. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC Bioinform. 15, 241 (2014).
Qu, Z. et al. Structure and peptidome of the Bat MHC class I molecule reveal a novel mechanism leading to high-affinity peptide binding. J. Immunol. 202, 3493–3506 (2019).
Blicher, T., Kastrup, J. S., Buus, S. & Gajhede, M. High-resolution structure of HLA-A*1101 in complex with SARS nucleocapsid peptide. Acta Crystallogr. D Biol. Crystallogr. 61, 1031–1040 (2005).
Sylvester-Hvid, C. et al. SARS CTL vaccine candidates; HLA supertype-, genome-wide scanning and biochemical validation. Tissue Antigens 63, 395–400 (2004).
Ishizuka, J. et al. Quantitating T cell cross-reactivity for unrelated peptide antigens. J. Immunol. 183, 4337–4345 (2009).
Harndahl, M. et al. Large-scale analysis of peptide-HLA class I interactions. IEDB https://www.iedb.org/reference/1000945 (2006).
Rasmussen, M. et al. Large-scale analysis of peptide-HLA-I stability. IEDB https://www.iedb.org/reference/1028288 (2014).
Justesen, S., Harndahl, M., Lamberth, K., Nielsen, L. L. B. & Buus, S. Functional recombinant MHC class II molecules and high-throughput peptide-binding assays. Immunome Res. 5, 2 (2009).
Sylvester-Hvid, C. et al. Establishment of a quantitative ELISA capable of determining peptide—MHC class I interaction. Tissue Antigens 59, 251–258 (2002).
Harndahl, M. et al. Peptide-MHC class I stability is a better predictor than peptide affinity of CTL immunogenicity. Eur. J. Immunol. 42, 1405–1416 (2012).
Blaha, D. T. et al. High-throughput stability screening of neoantigen/HLA complexes improves immunogenicity predictions. Cancer Immunol. Res. 7, 50–61 (2019).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems vol. 32 8024–8035 (Curran Associates, Inc., 2019).
Acknowledgements
We acknowledge support from the Innovation Foundation Denmark [Grant Number Ref. No. 9065-00225B]. Special thanks to Savvas Kinalis for fruitful inputs on the inner workings of PyTorch.
Author information
Contributions
F.O.B., S.T., and S.J. designed the study. D.B.S.-J. conducted stability measurements. M.P. performed epitope predictions and developed predictive models. F.O.B., M.P., S.J., and O.W. analysed the data. E.J. synthesised peptides. F.O.B., M.P., and S.J. drafted the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
F.O.B. serves as a member of the Scientific Advisory board at Immunitrack ApS. S.J. & S.T. are the founders of Immunitrack ApS. All other authors have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Prachar, M., Justesen, S., Steen-Jensen, D.B. et al. Identification and validation of 174 COVID-19 vaccine candidate epitopes reveals low performance of common epitope prediction tools. Sci Rep 10, 20465 (2020). https://doi.org/10.1038/s41598-020-77466-4