A large peptidome dataset improves HLA class I epitope prediction across most of the human population


Prediction of HLA epitopes is important for the development of cancer immunotherapies and vaccines. However, current prediction algorithms have limited predictive power, in part because they were not trained on high-quality epitope datasets covering a broad range of HLA alleles. To enable prediction of endogenous HLA class I-associated peptides across a large fraction of the human population, we used mass spectrometry to profile >185,000 peptides eluted from 95 HLA-A, -B, -C and -G mono-allelic cell lines. We identified canonical peptide motifs per HLA allele, unique and shared binding submotifs across alleles and distinct motifs associated with different peptide lengths. By integrating these data with transcript abundance and peptide processing, we developed HLAthena, providing allele-and-length-specific and pan-allele-pan-length prediction models for endogenous peptide presentation. These models predicted endogenous HLA class I-associated ligands with 1.5-fold improvement in positive predictive value compared with existing tools and correctly identified >75% of HLA-bound peptides that were observed experimentally in 11 patient-derived tumor cell lines.
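The abstract reports model performance as positive predictive value (PPV). As a rough illustration only, and not the authors' evaluation code, PPV can be read as the fraction of truly HLA-bound peptides among the top-ranked predictions. The sketch below assumes a hypothetical list of prediction scores and hit labels; the function names, the toy data and the cutoff n are all illustrative assumptions rather than details taken from the paper.

```python
from typing import Sequence


def ppv_at_top_n(scores: Sequence[float], is_hit: Sequence[bool], n: int) -> float:
    """Return the fraction of true hits among the n highest-scoring peptides."""
    if len(scores) != len(is_hit):
        raise ValueError("scores and is_hit must have the same length")
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of peptides")
    # Rank peptide indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:n] if is_hit[i]) / n


def fold_improvement(ppv_new: float, ppv_baseline: float) -> float:
    """Ratio of two PPV values, e.g. a new model against an existing tool."""
    if ppv_baseline <= 0:
        raise ValueError("baseline PPV must be positive")
    return ppv_new / ppv_baseline


# Toy data (made up): three observed ligands followed by three decoys.
is_hit = [True, True, True, False, False, False]
new_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]        # hypothetical new model
baseline_scores = [0.9, 0.1, 0.8, 0.7, 0.2, 0.05]   # hypothetical existing tool

ppv_new = ppv_at_top_n(new_scores, is_hit, n=3)
ppv_base = ppv_at_top_n(baseline_scores, is_hit, n=3)
print(ppv_new, ppv_base, fold_improvement(ppv_new, ppv_base))
```

In this toy case the hypothetical new model ranks all three ligands in its top three while the baseline ranks only two of them there, so the ratio of the two PPVs is the kind of fold improvement the abstract refers to.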

Fig. 1: Mass spectrometric characterization of peptides eluted from HLA proteins in mono-allelic cell lines.
Fig. 2: Identification of shared motifs and submotifs amongst HLA-A, -B, -C and -G alleles.
Fig. 3: Mono-allelic data uncover length-specific HLA binding preferences.
Fig. 4: Proteasomal and peptidase shaping of the HLA-associated peptidome.
Fig. 5: Generation and evaluation of allele-and-length-specific and pan-allele-pan-length MS-based models on mono-allelic data.
Fig. 6: Integrative MS-informed models more accurately predict peptides directly observed on primary tumor cells.

Data availability

The original mass spectra for 79 of the 95 mono-allelic datasets generated for this study, the protein sequence database and tables of peptide spectrum matches for all 95 alleles have been deposited in the public proteomics repository MassIVE. MS data for the 16 previously published mono-allelic datasets can also be downloaded from MassIVE. Datasets for the patient samples are likewise accessible online. B721.221 RNA-seq data for HLA-C (C*04:01, C*07:01) are deposited under GEO: GSE131267. Melanoma RNA-seq data are deposited in dbGaP (ref. 15). Glioblastoma bulk RNA-seq data are available through dbGaP with accession number phs001519.v1.p1 (ref. 26). All other data are available from the corresponding authors upon reasonable request.

Code availability

Code used to generate plots characterizing allele-specific preferences (for example, logo plots, entropy plots, peptide projection and clustering, and overlap with IEDB data) as well as code to build a sample neural network prediction model is provided as Supplementary Code. The HLAthena predictors are available online for research purposes only. For commercial usage inquiries, please contact the authors or the Broad Institute.
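The Supplementary Code itself is not reproduced here. As a loose illustration of what an entropy plot summarizes, the sketch below computes the Shannon entropy of the amino acid distribution at each position of a set of equal-length peptides; positions with low entropy are the ones where a few residues dominate. The function name and the toy peptides are assumptions made up for this example and are not taken from the authors' scripts or data.

```python
import math
from collections import Counter
from typing import List, Sequence


def positional_entropy(peptides: Sequence[str]) -> List[float]:
    """Shannon entropy (in bits) of the residue distribution at each position."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    total = len(peptides)
    entropies = []
    for pos in range(length):
        # Count how often each amino acid occurs at this position.
        counts = Counter(p[pos] for p in peptides)
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


# Toy 5-mers (made up, not real ligands) to keep the arithmetic easy to follow.
toy_peptides = ["ALDKV", "ALEKL", "AMDKV", "ALEKL"]
entropy_by_position = positional_entropy(toy_peptides)
print([round(h, 3) for h in entropy_by_position])
```

Positions 1 and 4 of the toy set carry a single residue and therefore have zero entropy, whereas positions 3 and 5 are split evenly between two residues and reach one bit.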


  1. Lefranc, M.-P. et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 43, D413–D422 (2015).

  2. Robinson, J. et al. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 43, D423–D431 (2015).

  3. Jurtz, V. et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368 (2017).

  4. Abelin, J. G. et al. Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. Immunity 46, 315–326 (2017).

  5. O’Donnell, T. J. et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132.e4 (2018).

  6. Gfeller, D. et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. J. Immunol. 201, 3705–3716 (2018).

  7. Bulik-Sullivan, B. et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat. Biotechnol. 37, 55–63 (2018).

  8. Nielsen, M. & Andreatta, M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 8, 33 (2016).

  9. Rajasagi, M. et al. Systematic identification of personal tumor-specific neoantigens in chronic lymphocytic leukemia. Blood 124, 453–462 (2014).

  10. de Kruijf, E. M. et al. HLA-E and HLA-G expression in classical HLA class I-negative tumors is of prognostic value for clinical outcome of early breast cancer patients. J. Immunol. 185, 7452–7459 (2010).

  11. Zhang, R.-L. et al. Predictive value of different proportion of lesion HLA-G expression in colorectal cancer. Oncotarget 8, 107441–107451 (2017).

  12. Dawson, D. V., Ozgur, M., Sari, K., Ghanayem, M. & Kostyu, D. D. Ramifications of HLA class I polymorphism and population genetics for vaccine development. Genet. Epidemiol. 20, 87–106 (2001).

  13. Gragert, L., Madbouly, A., Freeman, J. & Maiers, M. Six-locus high resolution HLA haplotype frequencies derived from mixed-resolution DNA typing for the entire US donor registry. Hum. Immunol. 74, 1313–1320 (2013).

  14. Solberg, O. D. et al. Balancing selection and heterogeneity across the classical human leukocyte antigen loci: a meta-analytic review of 497 population studies. Hum. Immunol. 69, 443–464 (2008).

  15. Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).

  16. Pearson, H. et al. MHC class I-associated peptides derive from selective regions of the human genome. J. Clin. Invest. 126, 4690–4701 (2016).

  17. Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2015).

  18. Sette, A. & Sidney, J. HLA supertypes and supermotifs: a functional perspective on HLA polymorphism. Curr. Opin. Immunol. 10, 478–482 (1998).

  19. Robinson, J., Malik, A., Parham, P., Bodmer, J. G. & Marsh, S. G. E. IMGT/HLA database—a sequence database for the human major histocompatibility complex. Tissue Antigens 55, 280–287 (2000).

  20. Parham, P. & Moffett, A. Variable NK cell receptors and their MHC class I ligands in immunity, reproduction and human evolution. Nat. Rev. Immunol. 13, 133–144 (2013).

  21. Nielsen, M. et al. NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE 2, e796 (2007).

  22. Rist, M. J. et al. HLA peptide length preferences control CD8+ T cell responses. J. Immunol. 191, 561–571 (2013).

  23. Maenaka, K. et al. Nonstandard peptide binding revealed by crystal structures of HLA-B*5101 complexed with HIV immunodominant epitopes. J. Immunol. 165, 3260–3267 (2000).

  24. Kaur, G. et al. Structural and regulatory diversity shape HLA-C protein expression levels. Nat. Commun. 8, 15924 (2017).

  25. Celik, A. A., Simper, G. S., Hiemisch, W., Blasczyk, R. & Bade-Döding, C. HLA-G peptide preferences change in transformed cells: impact on the binding motif. Immunogenetics 70, 485–494 (2018).

  26. Keskin, D. B. et al. Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature 565, 234–239 (2019).

  27. Javitt, A. et al. Pro-inflammatory cytokines alter the immunopeptidome landscape by modulation of HLA-B expression. Front. Immunol. 10, 141 (2019).

  28. Di Marco, M. et al. Unveiling the peptide motifs of HLA-C and HLA-G from naturally presented peptides and generation of binding prediction matrices. J. Immunol. 199, 2639–2651 (2017).

  29. Liepe, J. et al. A large fraction of HLA class I ligands are proteasome-generated spliced peptides. Science 354, 354–358 (2016).

  30. Faridi, P. et al. A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands. Sci. Immunol. 3, eaar3947 (2018).

  31. Mylonas, R. et al. Estimating the contribution of proteasomal spliced peptides to the HLA-I ligandome. Mol. Cell. Proteom. 17, 2347–2357 (2018).

  32. Rolfs, Z., Solntsev, S. K., Shortreed, M. R., Frey, B. L. & Smith, L. M. Global identification of post-translationally spliced peptides with neo-fusion. J. Proteome Res. 18, 349–358 (2018).

  33. Rolfs, Z., Müller, M., Shortreed, M. R., Smith, L. M. & Bassani-Sternberg, M. Comment on ‘A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands’. Sci. Immunol. 4, eaaw1622 (2019).

  34. Bassani-Sternberg, M. et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat. Commun. 7, 13404 (2016).

  35. Schuster, H. et al. The immunopeptidomic landscape of ovarian carcinomas. Proc. Natl Acad. Sci. USA 114, E9942–E9951 (2017).

  36. Girdlestone, J. Regulation of HLA class I loci by interferons. Immunobiology 193, 229–237 (1995).

  37. Chong, C. et al. High-throughput and sensitive immunopeptidomics platform reveals profound interferonγ-mediated remodeling of the human leukocyte antigen (HLA) ligandome. Mol. Cell. Proteom. 17, 533–548 (2018).

  38. Kidera, A., Konishi, Y., Oka, M., Ooi, T. & Scheraga, H. A. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J. Protein Chem. 4, 23–55 (1985).

  39. Bremel, R. D. & Homan, E. J. An integrated approach to epitope analysis I: dimensional reduction, visualization and prediction of MHC binding using amino acid principal components and regression approaches. Immunome Res. 6, 7 (2010).

  40. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv:1802.03426 [stat.ML] (2018).

  41. Harndahl, M. et al. Peptide binding to HLA class I molecules: homogenous, high-throughput screening, and affinity assays. J. Biomol. Screen. 14, 173–180 (2009).

  42. Bassani-Sternberg, M., Pletscher-Frankild, S., Jensen, L. J. & Mann, M. Mass spectrometry of human leukocyte antigen class I peptidomes reveals strong effects of protein abundance and turnover on antigen presentation. Mol. Cell. Proteom. 14, 658–673 (2015).

  43. Hunt, D. F. et al. Characterization of peptides bound to the class I MHC molecule HLA-A2.1 by mass spectrometry. Science 255, 1261–1263 (1992).

  44. Rammensee, H. G., Friede, T. & Stevanović, S. MHC ligands and peptide motifs: first listing. Immunogenetics 41, 178–228 (1995).

  45. Rammensee, H., Bachmann, J., Emmerich, N. P., Bachor, O. A. & Stevanović, S. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50, 213–219 (1999).

  46. Kim, Y., Sidney, J., Pinilla, C., Sette, A. & Peters, B. Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior. BMC Bioinformatics 10, 394 (2009).


We acknowledge technical assistance from K. Pelton, S. Santagata, O. Spiro, L. Elagina, B. Knisbacher, S. Shukla, J. Brugge and A. Apffel. We further express gratitude for constructive input from M. Rooney, J. Abelin and Z. Hu. We acknowledge support from the National Institutes of Health: grant nos. NCI-1RO1CA155010-02 (to C.J.W.), NHLBI-5R01HL103532-03 (to C.J.W.), NIH/NCI U24 CA224331 (to C.J.W.), NIH/NCI R21 CA216772-01A1 (to D.B.K.), NCI-SPORE-2P50CA101942-11A1 (to D.B.K.), NHGRI T32HG002295 and NIH/NCI T32CA207021 (to S.S.), NCI 5T32CA009172-41 (to D.A.B.), NIH/NCI U24-CA210986 and NIH/NCI U01 CA214125 (to S.A.C.). This work was supported in part by The G. Harold and Leila Y. Mathers Foundation and the Bridge Project, a partnership between the Koch Institute for Integrative Cancer Research at MIT and the Dana-Farber/Harvard Cancer Center. D.A.B. is supported in part by the John R. Svenson Fellowship. C.J.W. is a scholar of the Leukemia and Lymphoma Society, and is supported in part by the Parker Institute for Cancer Immunotherapy. S.K. is a Cancer Research Institute/Hearst Foundation fellow.

Author information

D.B.K., C.J.W., N.H. and S.A.C. directed the overall study design. S.S. performed computational analyses and developed predictive models. S.K., C.R.H., H.K. and K.R.C. generated the MS data and performed data analysis. D.B.K. and G.L.Z. selected the HLA alleles for analysis. D.B.K., P.M.L. and L.W.L. generated the single-HLA allele cell lines and performed data generation. D.B.K., G.O., K.L.L., D.A.B., P.M.L. and L.W.L. developed the patient-derived tumor cell lines. I.K.Z. and J.M.R. generated and provided cells from an ovarian cancer PDX model. P.B. provided CLL samples for analysis. W.Z. provided expert technical assistance. T.E. generated RNA-seq data for mono-allelic cell lines. T.O. and T.L. generated and quantified ribosome profiling data. J.S. and W.J.L. performed HLA typing and validation of all cell lines. S.J. performed HLA binding validation assays. S.S., S.K., N.H., C.J.W. and D.B.K. wrote the manuscript, with contributions from all co-authors.

Corresponding authors

Correspondence to Nir Hacohen, Steven A. Carr, Catherine J. Wu or Derin B. Keskin.

Ethics declarations

Competing interests

D.B.K. has previously advised Neon Therapeutics, and owns equity in Aduro Biotech, Agenus Inc., Armata Pharmaceuticals, Biomarin Pharmaceutical Inc., Bristol–Myers Squibb Com., Celldex Therapeutics Inc., Editas Medicine Inc., Exelixis Inc., Gilead Sciences Inc., IMV Inc., Lexicon Pharmaceuticals Inc. and Stemline Therapeutics Inc. D.A.B. has received consulting fees from Octane Global, Defined Health, Dedham Group, Adept Field Solutions, Slingshot Insights, Blueprint Partnership, Charles River Associates, Trinity Group and Insight Strategy, and is a member of the RCC translational medicine advisory board of Bristol–Myers Squibb. K.L.L. owns equity in and is a founder of Travera LLC and is an advisor to Bristol–Myers Squibb Com. and Rarecyte. S.A.C. is a member of the scientific advisory boards of Kymera, PTM BioLabs and BioAnalytix and a scientific advisor to Pfizer and Biogen. C.J.W. and N.H. are founders of Neon Therapeutics and members of its scientific advisory board. N.H. is also an advisor for IFM Therapeutics. W.J.L. is a member of the scientific advisory board of CareDx. All other authors have no competing interests. Patent applications have been filed on aspects of the described work entitled as follows: ‘HLA single allele lines’ and ‘Methods for identifying neoantigens’.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Supplementary Figs. 1–6 and Notes 1–7.

Reporting Summary

Supplementary Data 1

Peptide exports for mono-allelic samples

Supplementary Data 2

Peptide exports for multi-allelic samples

Supplementary Table 1

Characteristics of HLA alleles and mono-allelic data

Supplementary Table 2

Allele similarity and submotifs derived from mono-allelic data

Supplementary Table 3

Mono-allelic data reveal length-based preferences

Supplementary Table 4

HLA presentation of IFN-γ response genes increases after treatment

Supplementary Table 5

Cross-validated model evaluation results on mono-allelic data

Supplementary Table 6

Cross-validated model evaluation results on multi-allelic data

Supplementary Code

Sample scripts for reproducing analysis and models.

Cite this article

Sarkizova, S., Klaeger, S., Le, P.M. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol 38, 199–209 (2020).
