Accounting for proximal variants improves neoantigen prediction


Recent efforts to design personalized cancer immunotherapies use predicted neoantigens, but most neoantigen prediction strategies do not consider proximal (nearby) variants that alter the peptide sequence and may influence neoantigen binding. We evaluated somatic variants from 430 tumors to understand how proximal somatic and germline alterations change the neoantigenic peptide sequence and affect neoantigen binding predictions. On average, 241 missense somatic variants were analyzed per sample, and 5% of these had one or more in-phase missense proximal variants. Without proximal variant correction (PVC) of major histocompatibility complex (MHC) class I neoantigen peptides, the overall false discovery rate (incorrect neoantigens predicted) and false negative rate (strong-binding neoantigens missed) across peptides of 8–11 amino acids were estimated at 0.069 (6.9%) and 0.026 (2.6%), respectively.
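The two error rates quoted above can be made concrete with a minimal sketch. It assumes that the predictions made with proximal variant correction (PVC) serve as the reference set and that each candidate is keyed by a (peptide, HLA allele) pair; the function name `fdr_fnr` and the example peptides are hypothetical and are not taken from the study's pipeline.

```python
from typing import Set, Tuple

# Hypothetical keying: a candidate neoantigen is a (peptide sequence, HLA allele) pair.
Candidate = Tuple[str, str]


def fdr_fnr(uncorrected: Set[Candidate], corrected: Set[Candidate]) -> Tuple[float, float]:
    """Compare strong binders called without PVC against PVC-corrected calls,
    treating the corrected calls as the truth set (an assumption of this sketch)."""
    false_pos = uncorrected - corrected  # called without PVC, but not a real peptide/binder
    false_neg = corrected - uncorrected  # real strong binder that the uncorrected run missed
    fdr = len(false_pos) / len(uncorrected) if uncorrected else 0.0
    fnr = len(false_neg) / len(corrected) if corrected else 0.0
    return fdr, fnr
```

For instance, if one of four uncorrected calls disappears after correction and one of four corrected calls was missed without it, both rates are 0.25.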

Fig. 1: Overview of the pipeline.
Fig. 2: Mischaracterization of neoantigens before PVC.

Data availability

Several of the in-house sequencing datasets used in this study have been published previously and deposited in various databases. All sequence data for the HER2+ breast cancer samples can be accessed via the Database of Genotypes and Phenotypes (dbGaP; study accession phs001291; ref. 17). Data for the oral squamous cell carcinoma project and the hepatocellular carcinoma samples are part of other manuscripts currently in preparation and can be accessed under dbGaP study accessions phs001623 and phs001106, respectively. Data for the published glioblastoma case (ref. 18) and small cell lung cancer cases (ref. 19) can be accessed under dbGaP study accessions phs001663 and phs001049, respectively. TCGA data can be accessed under dbGaP study accession phs000178.


  1. Hackl, H., Charoentong, P., Finotello, F. & Trajanoski, Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat. Rev. Genet. 17, 441–458 (2016).

  2. Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).

  3. Liu, X. S. & Mardis, E. R. Applications of immunogenomics to cancer. Cell 168, 600–612 (2017).

  4. Hundal, J. et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 8, 11 (2016).

  5. Bjerregaard, A.-M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother. 66, 1123–1130 (2017).

  6. Rubinsteyn, A., Hodes, I., Kodysh, J. & Hammerbacher, J. Vaxrank: a computational tool for designing personalized cancer vaccines. Preprint at bioRxiv (2017).

  7. Meydan, C., Otu, H. H. & Sezerman, O. U. Prediction of peptides binding to MHC class I and II alleles by temporal motif mining. BMC Bioinformatics 14(Suppl. 2), S13 (2013).

  8. Rammensee, H. G., Friede, T. & Stevanović, S. MHC ligands and peptide motifs: first listing. Immunogenetics 41, 178–228 (1995).

  9. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv (2017).

  10. Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).

  11. Sette, A. et al. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. J. Immunol. 153, 5586–5592 (1994).

  12. Turajlic, S. et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 18, 1009–1021 (2017).

  13. Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).

  14. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).

  15. Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).

  16. Linette, G. P. & Carreno, B. M. Neoantigen vaccines pass the immunogenicity test. Trends Mol. Med. 23, 869–871 (2017).

  17. Lesurf, R. et al. Genomic characterization of HER2-positive breast cancer and response to neoadjuvant trastuzumab and chemotherapy-results from the ACOSOG Z1041 (Alliance) trial. Ann. Oncol. 28, 1070–1077 (2017).

  18. Johanns, T. M. et al. Immunogenomics of hypermutated glioblastoma: a patient with germline POLE deficiency treated with checkpoint blockade immunotherapy. Cancer Discov. 6, 1230–1236 (2016).

  19. Wagner, A. H. et al. Recurrent WNT pathway alterations are frequent in relapsed small cell lung cancer. Nat. Commun. 9, 3787 (2018).

  20. Griffith, M. et al. Genome Modeling System: a knowledge management platform for genomics. PLoS Comput. Biol. 11, e1004274 (2015).

  21. Griffith, M. et al. Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015).

  22. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).

  23. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  24. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).

  25. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).

  26. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).

  27. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).

  28. Griffith, M. et al. Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia. Exp. Hematol. 44, 603–613 (2016).

  29. Barnell, E. K. et al. Standard operating procedure for somatic variant refinement of tumor sequencing data. Genet. Med. (2018).

  30. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).

  31. Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).

  32. Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 18, 248–262 (2017).

  33. Chicz, R. M. et al. Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size. Nature 358, 764–768 (1992).

  34. Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2014).

  35. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  36. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).

  37. Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).

We are grateful to the research participants and their families, without whom this study would not be possible. We thank G. Dunn for early access to raw data for the published glioblastoma hypermutator case included in our analysis. We also thank R. Schreiber and B. Carreno for the initial discussions that inspired the study, and for their expertise and guidance during the study. R.G. was supported by the National Institutes of Health (NIH) National Cancer Institute (U01CA231844). S.J.S. was supported by the NIH National Library of Medicine (R01LM012222 and R01LM012482). O.L.G. was supported by the NIH National Cancer Institute (U01CA209936 and U01CA231844). M.G. was supported by the NIH National Human Genome Research Institute (R00HG007940) and the NIH National Cancer Institute (U01CA209936).

Author information

J.H. was involved in all aspects of this study, including designing and developing the methodology, analyzing and interpreting data, and writing the manuscript, with input from C.J.L., S.J.S., O.L.G., E.R.M., and M.G. S.K. was involved in developing neoantigen prediction software and participated in data analysis and in writing the manuscript. Y.-Y.F. contributed to data analysis, interpretation, and writing the manuscript. R.G., W.C.C., and R.U. provided unpublished tumor datasets and critical feedback on the manuscript. E.R.M. and M.G. supervised the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Elaine R. Mardis or Malachi Griffith.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Example of candidate neoantigen evaluation.

This figure shows the possible sub-peptide registers for a candidate neoantigen of length 9. The 17-mer peptide window for a 9-mer candidate spans 8 amino acids on each side of the residue altered by the somatic variant of interest (SVOI; red box). Only registers containing the amino acid changes from both the proximal variant (PV; orange box) and the SVOI were considered for this analysis (the five peptides shown in yellow in this example). The remaining registers (gray boxes) contain the SVOI but are not affected by the PV.
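The register selection described in this legend can be sketched as follows. The sketch assumes 0-based residue positions on the protein and a single PV; `peptide_window` and `shared_registers` are hypothetical helper names, not functions from the authors' software. A register starting at `s` covers residues `s` to `s + length - 1`, so it contains both altered residues exactly when `s` lies between `max(svoi, pv) - (length - 1)` and `min(svoi, pv)`.

```python
from typing import List


def peptide_window(protein: str, svoi_pos: int, length: int) -> str:
    """Residues within length - 1 positions on each side of the SVOI
    (a 17-mer for length 9, truncated at the protein ends)."""
    start = max(0, svoi_pos - (length - 1))
    return protein[start : svoi_pos + length]


def shared_registers(svoi_pos: int, pv_pos: int, length: int, protein_len: int) -> List[int]:
    """0-based start positions of length-`length` peptides that contain both
    the SVOI residue and the PV residue."""
    lo = max(svoi_pos, pv_pos) - (length - 1)  # earliest start still covering the later residue
    hi = min(svoi_pos, pv_pos)                  # latest start still covering the earlier residue
    lo = max(lo, 0)                             # peptide cannot start before the protein
    hi = min(hi, protein_len - length)          # peptide cannot run past the protein end
    return list(range(lo, hi + 1)) if lo <= hi else []
```

With the SVOI at index 8 of a 17-residue sequence and a hypothetical PV at index 12, `shared_registers(8, 12, 9, 17)` returns the five starts `[4, 5, 6, 7, 8]`; in general, a PV that is d residues away (d < 9) leaves 9 - d qualifying 9-mer registers, and a PV farther away than 8 residues leaves none.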

Supplementary Figure 2 Example of a germline SNP within the proximity of a somatic SNV.

An example from one of the TCGA melanoma samples with a missense SNV that has a germline SNP (dbSNP ID: rs9891498) 21 nucleotides upstream. When translated, the germline SNP results in the S357F (NP_001275708.1:p.Phe357Ser) alteration, which is 7 amino acids downstream of the missense somatic variant F350S (NP_001275708.1:p.Ser350Phe) in MARCH10.
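The correspondence between the 21-nucleotide offset and the 7-residue offset is ordinary codon arithmetic. The sketch below assumes 1-based coding-sequence (CDS) coordinates counted in transcript orientation (for a gene on the minus strand, a genomic upstream position can map to a downstream residue). The helper names are hypothetical, and the CDS positions in the example are illustrative values chosen only because they fall in codons 350 and 357; they are not the actual coordinates of these variants.

```python
from typing import Iterable, List


def codon_number(cds_pos: int) -> int:
    """1-based codon (amino acid) index for a 1-based CDS nucleotide position."""
    return (cds_pos - 1) // 3 + 1


def lengths_spanning_both(aa_a: int, aa_b: int, lengths: Iterable[int] = range(8, 12)) -> List[int]:
    """Peptide lengths (default 8-11) long enough to contain both altered residues."""
    distance = abs(aa_a - aa_b)
    return [n for n in lengths if n >= distance + 1]
```

Here `codon_number(1049)` and `codon_number(1070)` give 350 and 357, and `lengths_spanning_both(350, 357)` returns `[8, 9, 10, 11]`, so every class I peptide length considered can contain both changes.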

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2

Reporting Summary

Supplementary Table 1

This table shows, for each sample, the percentage of SVOIs with any neighboring variant within the specified 89-bp window and the percentage of all SVOIs with at least one in-phase proximal variant.
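The two per-sample percentages in this table could be tallied as sketched below. This is an illustration under stated assumptions, not the authors' code: it assumes the 89-bp window is centered on the SVOI (±44 bp) and that phasing has already been determined elsewhere, so each neighboring variant arrives with a hypothetical `in_phase` flag. The `Svoi` class and `sample_percentages` function are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

HALF_WINDOW = 44  # assumption: 89-bp window = SVOI position +/- 44 bp


@dataclass
class Svoi:
    pos: int  # genomic position of the somatic variant of interest
    # hypothetical input: (genomic position, in_phase flag) for each nearby called variant
    neighbors: List[Tuple[int, bool]] = field(default_factory=list)


def sample_percentages(svois: List[Svoi]) -> Tuple[float, float]:
    """Return (% of SVOIs with any neighbor in the window,
               % of SVOIs with at least one in-phase neighbor in the window)."""
    if not svois:
        return 0.0, 0.0
    any_neighbor = 0
    any_in_phase = 0
    for s in svois:
        # keep only other variants that fall inside the assumed window
        nearby = [(p, ph) for p, ph in s.neighbors if p != s.pos and abs(p - s.pos) <= HALF_WINDOW]
        if nearby:
            any_neighbor += 1
        if any(ph for _, ph in nearby):
            any_in_phase += 1
    n = len(svois)
    return 100.0 * any_neighbor / n, 100.0 * any_in_phase / n
```

For four SVOIs where one has an in-phase neighbor 21 bp away, one has an out-of-phase neighbor 10 bp away, one has a neighbor only 50 bp away, and one has no neighbors, the sketch reports 50.0% and 25.0%.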

Supplementary Table 2

This table shows the breakdown of all sequencing datasets used for this study and their corresponding accession IDs.

About this article

Cite this article

Hundal, J., Kiwala, S., Feng, YY. et al. Accounting for proximal variants improves neoantigen prediction. Nat Genet 51, 175–179 (2019).
