  • Resource

A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types

Abstract

Tumor neoepitopes presented by major histocompatibility complex (MHC) class I are recognized by tumor-infiltrating lymphocytes (TIL) and are targeted by adoptive T-cell therapies. Identifying which mutant neoepitopes from tumor cells are capable of recognition by T cells can assist in the development of tumor-specific, cell-based therapies and can shed light on antitumor responses. Here, we generate a ranking algorithm for class I candidate neoepitopes by using next-generation sequencing data and a dataset of 185 neoepitopes that are recognized by HLA class I–restricted TIL from individuals with metastatic cancer. Random forest model analysis showed that the inclusion of multiple factors impacting epitope presentation and recognition increased output sensitivity and specificity compared to the use of predicted HLA binding alone. The ranking score output provides a set of class I candidate neoantigens that may serve as therapeutic targets and provides a tool to facilitate in vitro and in vivo studies aimed at the development of more effective immunotherapies.

Fig. 1: Evaluating criteria for their effects on nmer discovery.
Fig. 2: Analysis of unique minimal epitopes.
Fig. 3: Evaluation of mmp input features.
Fig. 4: Evaluation of models in a test set.

Data availability

All next-generation sequencing data are available on dbGaP under accession number phs001003.v1.p1. Source data are available from the NIH figshare repository at https://doi.org/10.35092/yhjc.c.4792338.v2 (ref. 56).

Code availability

The models developed and presented in this paper are available at https://github.com/JaredJGartner/SB_neoantigen_Models.

References

  1. Huang, J. et al. T cells associated with tumor regression recognize frameshifted products of the CDKN2A tumor suppressor gene locus and a mutated HLA class I gene product. J. Immunol. 172, 6057–6064 (2004).

  2. Zhou, J., Dudley, M. E., Rosenberg, S. A. & Robbins, P. F. Persistence of multiple tumor-specific T-cell clones is associated with complete tumor regression in a melanoma patient receiving adoptive cell transfer therapy. J. Immunother. 28, 53–62 (2005).

  3. Robbins, P. F. et al. Mining exomic sequencing data to identify mutated antigens recognized by adoptively transferred tumor-reactive T cells. Nat. Med. 19, 747–752 (2013).

  4. Lu, Y. C. et al. Mutated PPP1R3B is recognized by T cells used to treat a melanoma patient who experienced a durable complete tumor regression. J. Immunol. 190, 6034–6042 (2013).

  5. Lu, Y. C. et al. Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions. Clin. Cancer Res. 20, 3401–3410 (2014).

  6. Prickett, T. D. et al. Durable complete response from metastatic melanoma after transfer of autologous T cells recognizing 10 mutated tumor antigens. Cancer Immunol. Res. 4, 669–678 (2016).

  7. Tran, E. et al. Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. Science 344, 641–645 (2014).

  8. Tran, E. et al. T-cell transfer therapy targeting mutant KRAS in cancer. N. Engl. J. Med. 375, 2255–2262 (2016).

  9. Zacharakis, N. et al. Immune recognition of somatic mutations leading to complete durable regression in metastatic breast cancer. Nat. Med. 24, 724–730 (2018).

  10. Rizvi, N. A. et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).

  11. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 351, 1463–1469 (2016).

  12. Hellmann, M. D. et al. Genomic features of response to combination immunotherapy in patients with advanced non-small-cell lung cancer. Cancer Cell 33, 843–852 (2018).

  13. Le, D. T. et al. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science 357, 409–413 (2017).

  14. Le, D. T. et al. PD-1 blockade in tumors with mismatch-repair deficiency. N. Engl. J. Med. 372, 2509–2520 (2015).

  15. Peltomaki, P. DNA mismatch repair and cancer. Mutat. Res. 488, 77–85 (2001).

  16. Peters, B. & Sette, A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinf. 6, 132 (2005).

  17. Alvarez, B. et al. NNAlign_MA; MHC peptidome deconvolution for accurate MHC binding motif characterization and improved T-cell epitope predictions. Mol. Cell. Proteomics 18, 2459–2477 (2019).

  18. O’Donnell, T. J. et al. MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132 (2018).

  19. Duan, F. et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J. Exp. Med. 211, 2231–2248 (2014).

  20. Bulik-Sullivan, B. et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat. Biotechnol. 37, 55–63 (2019).

  21. Hundal, J. et al. pVACtools: a computational toolkit to identify and visualize cancer neoantigens. Cancer Immunol. Res. 8, 409–420 (2020).

  22. Bjerregaard, A. M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother. 66, 1123–1130 (2017).

  23. Kim, S. et al. Neopepsee: accurate genome-level prediction of neoantigens by harnessing sequence and amino acid immunogenicity information. Ann. Oncol. 29, 1030–1036 (2018).

  24. Kosaloglu-Yalcin, Z. et al. Predicting T cell recognition of MHC class I restricted neoepitopes. Oncoimmunology 7, e1492508 (2018).

  25. Brown, S. D. et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res. 24, 743–750 (2014).

  26. Balachandran, V. P. et al. Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer. Nature 551, 512–516 (2017).

  27. Parkhurst, M. R. et al. Unique neoantigens arise from somatic mutations in patients with gastrointestinal cancers. Cancer Discov. 9, 1022–1035 (2019).

  28. Tran, E. et al. Immunogenicity of somatic mutations in human gastrointestinal cancers. Science 350, 1387–1390 (2015).

  29. Lo, W. et al. Immunologic recognition of a shared p53 mutated neoantigen in a patient with metastatic colorectal cancer. Cancer Immunol. Res. 7, 534–543 (2019).

  30. Jurtz, V. et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368 (2017).

  31. Gfeller, D. et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. J. Immunol. 201, 3705–3716 (2018).

  32. Sarkizova, S. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat. Biotechnol. 38, 199–209 (2020).

  33. Paul, S. et al. HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity. J. Immunol. 191, 5831–5839 (2013).

  34. Chen, W., Yewdell, J. W., Levine, R. L. & Bennink, J. R. Modification of cysteine residues in vitro and in vivo affects the immunogenicity and antigenicity of major histocompatibility complex class I-restricted viral determinants. J. Exp. Med. 189, 1757–1764 (1999).

  35. Chen, J. L. et al. Structural and kinetic basis for heightened immunogenicity of T cell vaccines. J. Exp. Med. 201, 1243–1255 (2005).

  36. Sachs, A. et al. Impact of cysteine residues on MHC binding predictions and recognition by tumor-reactive T cells. J. Immunol. 205, 539–549 (2020).

  37. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).

  38. Horton, P. et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35, W585–W587 (2007).

  39. Abelin, J. G. et al. Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. Immunity 46, 315–326 (2017).

  40. Rasmussen, M. et al. Pan-specific prediction of peptide–MHC class I complex stability, a correlate of T cell immunogenicity. J. Immunol. 197, 1517–1524 (2016).

  41. Jorgensen, K. W., Rasmussen, M., Buus, S. & Nielsen, M. NetMHCstab—predicting stability of peptide–MHC-I complexes; impacts for cytotoxic T lymphocyte epitope discovery. Immunology 141, 18–26 (2014).

  42. Groettrup, M., Kirk, C. J. & Basler, M. Proteasomes in immune cells: more than peptide producers? Nat. Rev. Immunol. 10, 73–78 (2010).

  43. Larsen, M. V. et al. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. Eur. J. Immunol. 35, 2295–2303 (2005).

  44. Capietto, A. H. et al. Mutation position is an important determinant for predicting cancer neoantigens. J. Exp. Med. 217, e20190179 (2020).

  45. Calis, J. J. et al. Properties of MHC class I presented peptides that enhance immunogenicity. PLoS Comput. Biol. 9, e1003266 (2013).

  46. Chowell, D. et al. TCR contact residue hydrophobicity is a hallmark of immunogenic CD8+ T cell epitopes. Proc. Natl Acad. Sci. USA 112, E1754–E1762 (2015).

  47. Cohen, C. J. et al. Isolation of neoantigen-specific T cells from tumor and peripheral lymphocytes. J. Clin. Invest. 125, 3981–3991 (2015).

  48. Gros, A. et al. PD-1 identifies the patient-specific CD8+ tumor-reactive repertoire infiltrating human tumors. J. Clin. Invest. 124, 2246–2259 (2014).

  49. Gros, A. et al. Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients. Nat. Med. 22, 433–438 (2016).

  50. Parkhurst, M. et al. Isolation of T-cell receptors specifically reactive with mutated tumor-associated antigens from tumor-infiltrating lymphocytes based on CD137 expression. Clin. Cancer Res. 23, 2491–2505 (2017).

  51. Stevanovic, S. et al. Landscape of immunogenic tumor antigens in successful immunotherapy of virally induced epithelial cancer. Science 356, 200–205 (2017).

  52. Deniger, D. C. et al. T-cell responses to TP53 “Hotspot” mutations and unique neoantigens expressed by human ovarian cancers. Clin. Cancer Res. 24, 5562–5573 (2018).

  53. Yossef, R. et al. Enhanced detection of neoantigen-reactive T cells targeting unique and shared oncogenes for personalized cancer immunotherapy. JCI Insight 3, e122467 (2018).

  54. Gros, A. et al. Recognition of human gastrointestinal cancer neoantigens by circulating PD-1+ lymphocytes. J. Clin. Invest. 129, 4992–5004 (2019).

  55. Larsen, M. V. et al. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinf. 8, 424 (2007).

  56. Gartner, J. Datasets for ‘Development of a model for ranking candidate HLA class I neoantigens based upon datasets of known neoepitopes’. figshare https://doi.org/10.35092/yhjc.c.4792338.v2 (2020).

Acknowledgements

We thank members of the NIH High Performance Computing (HPC) group for all of their support, assistance and technical advice. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). We also thank all members of the tissue procurement team for all of their efforts in acquiring and maintaining the specimens used in this study.

Author information

Contributions

J.J.G., P.F.R. and S.A.R. designed the study and drafted the manuscript. J.J.G. trained models and evaluated all nmers and mmps. T.D.P. and S.C.C. generated exomes and RNA-seq libraries. N.Z., K.H., Y.F.L. and P.F.R. designed minigene constructs encoding candidate neoantigens and generated in vitro-transcribed RNA used to perform screening assays. M.R.P., M.F. and S. Kivitz synthesized peptides used for T-cell screening assays. M.R.P., A.G., E.T., M.S.J., A.C., K.H., N.Z., A.L., S. Krishna and A.S. evaluated T cells for their ability to recognize nmers/mmps in the context of the appropriate HLA class I restriction elements.

Corresponding author

Correspondence to Paul F. Robbins.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Cancer thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Percentile rank comparisons between NetMHCpan4.0 EL and MHCflurry1.6.

Percentile ranks of positive mmps were plotted with the MHCflurry1.6 rank on the x-axis and the NetMHCpan4.0 EL model rank on the y-axis. Red triangles correspond to mmps containing cysteine residues at positions 2, 3 or the C-terminus (n=12), while orange dots correspond to peptides containing cysteine residues at position 1 or between position 3 and the C-terminus (n=107).

Extended Data Fig. 2 Nmer localization predictions.

The WoLF PSORT algorithm was used to predict localization for all nmer proteins (n=9541). Blue bars are CD8+ positive nmers; orange bars are negative nmers. The y-axis represents the frequency of each group predicted to localize to a given compartment. The x-axis shows the WoLF PSORT prediction abbreviations: chlo = chloroplast, cyto = cytosol, cysk = cytoskeleton, E.R. = endoplasmic reticulum, extr = extracellular, golg = Golgi apparatus, lyso = lysosome, mito = mitochondria, nucl = nuclear, pero = peroxisome, plas = plasma membrane, vacu = vacuolar membrane. Hyphenated values denote compound predictions. Individual totals for the positive and negative groups can be found in Supplementary Table 12. P values comparing positive to negative nmers are displayed over each prediction. P values were calculated using a two-sided Fisher’s exact test and corrected for multiple comparisons using the Bonferroni correction.
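The statistical comparison described in this caption can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names and any counts passed to them are hypothetical, and the two-sided P value is computed with the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be positive/negative nmers and columns could be
    "predicted in compartment" / "not predicted in compartment"
    (an assumed layout, chosen only for illustration).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table whose probability does not exceed the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p_values):
    """Bonferroni correction: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In use, one P value would be computed per localization category and the full list passed to `bonferroni`, so that the number of categories tested sets the correction factor.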

Extended Data Fig. 3 Gene expression deciles of mmps.

Gene expression deciles of positive (n=119) and negative (n=2681162) mmps. Boxes indicate quartiles 2 and 3 (the interquartile range, IQR); the median is indicated by the line in each box; whiskers represent quartiles 1 and 4 ± 1.5× IQR, or the minimum/maximum value if it falls within the whisker values. Significance was calculated using the Mann-Whitney U test.

Extended Data Fig. 4 IEDB Immunogenicity scores of mmps.

IEDB immunogenicity scores were generated for each mmp using the IEDB immunogenicity tool. The panels are split into all mmps (positive n=119, negative n=2681162), those with a mutation at an anchor position (position 2, 3 or the C-terminus; positive n=55, negative n=1167363) and those without a mutation at position 2, 3 or the C-terminus (positive n=64, negative n=1513799). Boxes indicate quartiles 2 and 3 (the interquartile range, IQR); the median is indicated by the line in each box; whiskers represent quartiles 1 and 4 ± 1.5× IQR, or the minimum/maximum value if it falls within the whisker values. Significance was calculated using the Mann-Whitney U test.

Extended Data Fig. 5 Hydrophobicity scores of T-cell contact regions.

Hydrophobicity scores were calculated by summing the Kyte-Doolittle hydrophobicity scores of positions 4 through n-1. The panels are split into all mmps (positive n=119, negative n=2681162), those with a mutation at an anchor position (position 2, 3 or the C-terminus; positive n=55, negative n=1167363) and those without a mutation at position 2, 3 or the C-terminus (positive n=64, negative n=1513799). Boxes indicate quartiles 2 and 3 (the interquartile range, IQR); the median is indicated by the line in each box; whiskers represent quartiles 1 and 4 ± 1.5× IQR, or the minimum/maximum value if it falls within the whisker values. Significance was calculated using the Mann-Whitney U test.
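The hydrophobicity score defined in this caption can be illustrated with a short sketch. It assumes that n is the peptide length and that positions are 1-indexed, so the summed region runs from the fourth residue to the residue before the C-terminus; the function name is hypothetical and the example peptide is illustrative, not taken from the study data. The scale values are the standard Kyte-Doolittle hydropathy indices.

```python
# Standard Kyte-Doolittle hydropathy index for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def contact_region_hydrophobicity(peptide):
    """Sum Kyte-Doolittle scores over positions 4..n-1 (1-indexed, inclusive).

    Assumes n is the peptide length; peptides shorter than 5 residues
    have no such region and are rejected.
    """
    if len(peptide) < 5:
        raise ValueError("peptide must have at least 5 residues")
    region = peptide[3:-1]  # 0-based slice covering positions 4 through n-1
    return sum(KYTE_DOOLITTLE[aa] for aa in region)


# Illustrative 9mer: positions 4-8 are G, F, V, F, T.
score = contact_region_hydrophobicity("GILGFVFTL")
```

For this 9mer the summed residues are G, F, V, F and T, giving -0.4 + 2.8 + 4.2 + 2.8 - 0.7 = 8.7.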

Extended Data Fig. 6 Top nmer models using either MMP score or MHCflurry score as input.

ROC curves showing the mean performance of the top models using either MMP model scores or MHCflurry scores as input. The solid line represents the mean for each model across n=5 folds; the shaded area is the standard deviation at each point along the x-axis.
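A mean ROC curve with a standard-deviation band, as described in this caption, is commonly built by interpolating each fold's curve onto a shared false-positive-rate grid and then averaging point by point. The sketch below shows that general technique in plain Python. It is not the authors' implementation: the function names are hypothetical, the population standard deviation is an assumed choice, and where a curve rises vertically the upper value is taken.

```python
from statistics import mean, pstdev


def roc_points(labels, scores):
    """Return (fpr, tpr) lists from sweeping a threshold from high to low score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Items tied at the same score cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr


def tpr_at(x, fpr, tpr):
    """Linearly interpolate TPR at false-positive rate x (upper value on vertical steps)."""
    k = max(i for i, f in enumerate(fpr) if f <= x)
    if k == len(fpr) - 1:
        return tpr[k]
    frac = (x - fpr[k]) / (fpr[k + 1] - fpr[k])
    return tpr[k] + frac * (tpr[k + 1] - tpr[k])


def mean_roc(folds, grid):
    """folds: list of (labels, scores) pairs; returns (mean TPR, SD of TPR) per grid point."""
    curves = [roc_points(labels, scores) for labels, scores in folds]
    per_point = [[tpr_at(x, f, t) for f, t in curves] for x in grid]
    return [mean(v) for v in per_point], [pstdev(v) for v in per_point]
```

With five cross-validation folds, `folds` would hold five (labels, scores) pairs; the first returned list gives the solid line and the second gives the half-width of the shaded band at each grid point.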

About this article

Cite this article

Gartner, J.J., Parkhurst, M.R., Gros, A. et al. A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types. Nat Cancer 2, 563–574 (2021). https://doi.org/10.1038/s43018-021-00197-6

  • DOI: https://doi.org/10.1038/s43018-021-00197-6
