Recent efforts to design personalized cancer immunotherapies use predicted neoantigens, but most neoantigen prediction strategies do not consider proximal (nearby) variants that alter the peptide sequence and may influence neoantigen binding. We evaluated somatic variants from 430 tumors to understand how proximal somatic and germline alterations change the neoantigenic peptide sequence and also affect neoantigen binding predictions. On average, 241 missense somatic variants were analyzed per sample. Of these somatic variants, 5% had one or more in-phase missense proximal variants. Without incorporating proximal variant correction for major histocompatibility complex class I neoantigen peptides, the overall false discovery rate (incorrect neoantigens predicted) and the false negative rate (strong-binding neoantigens missed) across peptides of lengths 8–11 were estimated as 0.069 (6.9%) and 0.026 (2.6%), respectively.
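As a rough illustration of how the two error rates quoted above are defined, the following minimal Python sketch applies the standard confusion-matrix formulas. The function names and the example counts are hypothetical and are not taken from the study. How the study itself tallied peptides is not described in this section.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted neoantigens that are incorrect: FP / (TP + FP)."""
    predicted = true_pos + false_pos
    if predicted == 0:
        raise ValueError("no predicted neoantigens")
    return false_pos / predicted


def false_negative_rate(true_pos: int, false_neg: int) -> float:
    """Fraction of true strong binders that were missed: FN / (TP + FN)."""
    actual = true_pos + false_neg
    if actual == 0:
        raise ValueError("no true strong binders")
    return false_neg / actual


# Hypothetical counts for illustration only (not data from the study).
tp, fp, fn = 90, 10, 5
fdr = false_discovery_rate(tp, fp)
fnr = false_negative_rate(tp, fn)
print(f"FDR = {fdr:.3f}, FNR = {fnr:.3f}")
```

Note that the two rates share the true-positive count but use different denominators: the FDR is taken over everything predicted, the FNR over everything that truly binds.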


Data availability

Several of the in-house sequencing datasets used in the study have been previously published and deposited in various databases. All sequence data for the HER2+ breast cancer samples can be accessed via the Database of Genotypes and Phenotypes (dbGaP; study accession phs001291; ref. 17). Data for the oral squamous cell carcinoma project and hepatocellular carcinoma samples are part of other manuscripts currently in preparation and can be accessed under dbGaP study accessions phs001623 and phs001106, respectively. Results for the glioblastoma case (ref. 18) and small cell lung cancer cases (ref. 19) have been published and can be accessed under dbGaP study accessions phs001663 and phs001049, respectively. TCGA data can be accessed under dbGaP study accession phs000178.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Hackl, H., Charoentong, P., Finotello, F. & Trajanoski, Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat. Rev. Genet. 17, 441–458 (2016).
  2. Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
  3. Liu, X. S. & Mardis, E. R. Applications of immunogenomics to cancer. Cell 168, 600–612 (2017).
  4. Hundal, J. et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 8, 11 (2016).
  5. Bjerregaard, A.-M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother. 66, 1123–1130 (2017).
  6. Rubinsteyn, A., Hodes, I., Kodysh, J. & Hammerbacher, J. Vaxrank: a computational tool for designing personalized cancer vaccines. Preprint at bioRxiv https://doi.org/10.1101/142919 (2017).
  7. Meydan, C., Otu, H. H. & Sezerman, O. U. Prediction of peptides binding to MHC class I and II alleles by temporal motif mining. BMC Bioinformatics 14(Suppl. 2), S13 (2013).
  8. Rammensee, H. G., Friede, T. & Stevanović, S. MHC ligands and peptide motifs: first listing. Immunogenetics 41, 178–228 (1995).
  9. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2017).
  10. Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).
  11. Sette, A. et al. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. J. Immunol. 153, 5586–5592 (1994).
  12. Turajlic, S. et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 18, 1009–1021 (2017).
  13. Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).
  14. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
  15. Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
  16. Linette, G. P. & Carreno, B. M. Neoantigen vaccines pass the immunogenicity test. Trends Mol. Med. 23, 869–871 (2017).
  17. Lesurf, R. et al. Genomic characterization of HER2-positive breast cancer and response to neoadjuvant trastuzumab and chemotherapy-results from the ACOSOG Z1041 (Alliance) trial. Ann. Oncol. 28, 1070–1077 (2017).
  18. Johanns, T. M. et al. Immunogenomics of hypermutated glioblastoma: a patient with germline POLE deficiency treated with checkpoint blockade immunotherapy. Cancer Discov. 6, 1230–1236 (2016).
  19. Wagner, A. H. et al. Recurrent WNT pathway alterations are frequent in relapsed small cell lung cancer. Nat. Commun. 9, 3787 (2018).
  20. Griffith, M. et al. Genome Modeling System: a knowledge management platform for genomics. PLoS Comput. Biol. 11, e1004274 (2015).
  21. Griffith, M. et al. Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015).
  22. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
  23. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  24. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
  25. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
  26. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
  27. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
  28. Griffith, M. et al. Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia. Exp. Hematol. 44, 603–613 (2016).
  29. Barnell, E. K. et al. Standard operating procedure for somatic variant refinement of tumor sequencing data. Genet. Med. https://doi.org/10.1038/s41436-018-0278-z (2018).
  30. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
  31. Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).
  32. Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 18, 248–262 (2017).
  33. Chicz, R. M. et al. Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size. Nature 358, 764–768 (1992).
  34. Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2014).
  35. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
  36. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).
  37. Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).



Acknowledgements

We are grateful to the research participants and their families, without whom this study would not be possible. We thank G. Dunn for early access to raw data for the published glioblastoma hypermutator case included in our analysis. We also thank R. Schreiber and B. Carreno for the initial discussions that inspired the study, and for their expertise and guidance during the study. R.G. was supported by the National Institutes of Health (NIH) National Cancer Institute (U01CA231844). S.J.S. was supported by the NIH National Library of Medicine (R01LM012222 and R01LM012482). O.L.G. was supported by the NIH National Cancer Institute (U01CA209936 and U01CA231844). M.G. was supported by the NIH National Human Genome Research Institute (R00HG007940) and the NIH National Cancer Institute (U01CA209936).

Author information


  1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA

    Jasreet Hundal, Susanna Kiwala, Yang-Yang Feng, Connor J. Liu, Obi L. Griffith & Malachi Griffith

  2. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA

    Ramaswamy Govindan, Obi L. Griffith & Malachi Griffith

  3. Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA

    Ramaswamy Govindan, Obi L. Griffith & Malachi Griffith

  4. Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA

    William C. Chapman

  5. Department of Surgery/Otolaryngology, Brigham and Women’s Hospital and Dana-Farber Cancer Institute, Boston, MA, USA

    Ravindra Uppaluri

  6. Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA

    S. Joshua Swamidass

  7. Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA

    Obi L. Griffith & Malachi Griffith

  8. Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, USA

    Elaine R. Mardis




Contributions

J.H. was involved in all aspects of this study, including designing and developing the methodology, analyzing and interpreting data, and writing the manuscript, with input from C.J.L., S.J.S., O.L.G., E.R.M., and M.G. S.K. was involved in development of neoantigen prediction software and participated in the data analysis and writing the manuscript. Y.-Y.F. contributed to data analysis, interpretation, and writing the manuscript. R.G., W.C.C., and R.U. provided unpublished tumor datasets and provided critical feedback on the manuscript. E.R.M. and M.G. supervised the study. All authors read and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Elaine R. Mardis or Malachi Griffith.

Integrated supplementary information

  1. Supplementary Figure 1 Example of candidate neoantigen evaluation.

    This figure shows the possible sub-peptide registers for selection of a candidate neoantigen of length 9. The 17-mer peptide window for a 9-mer candidate is selected by scanning 8 amino acids on each side of the mutated amino acid resulting from the somatic variant of interest (SVOI; red box). Only those registers that contain amino acid changes resulting from both the proximal variant (PV; orange box) and the SVOI (red box) were considered for this analysis (five peptides shown in yellow for this example). The remaining registers shown (gray boxes) contain the SVOI but are not affected by the proximal variant.

  2. Supplementary Figure 2 Example of a germline SNP within the proximity of a somatic SNV.

    An example from one of the TCGA melanoma samples with a missense SNV that overlaps a germline SNP (dbSNP ID: rs9891498), 21 nucleotides upstream. When translated, the germline SNP results in the S357F (NP_001275708.1:p.Phe357Ser) alteration and is 7 amino acids downstream of the missense somatic variant F350S (NP_001275708.1:p.Ser350Phe) in MARCH10.
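The register selection described for Supplementary Figure 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the 17-residue window sequence, and the position of the proximal variant (4 residues downstream of the SVOI) are all hypothetical, chosen so that five registers contain both changes, as in the figure's example.

```python
from typing import List, Tuple


def registers_with_both(window: str, svoi_idx: int, pv_idx: int,
                        length: int = 9) -> List[Tuple[int, str]]:
    """Return (start, peptide) for every sub-peptide of `length` residues
    that covers both the SVOI position and the proximal-variant position
    (0-based indices into `window`)."""
    lo, hi = min(svoi_idx, pv_idx), max(svoi_idx, pv_idx)
    # A register starting at s covers indices s .. s + length - 1, so it must
    # start no later than `lo` and no earlier than `hi - length + 1`.
    first_start = max(0, hi - length + 1)
    last_start = min(lo, len(window) - length)
    return [(s, window[s:s + length])
            for s in range(first_start, last_start + 1)]


# Hypothetical 17-mer window: 8 residues on each side of the SVOI residue.
window = "ACDEFGHIKLMNPQRST"
svoi_idx = 8   # mutated residue from the SVOI sits in the centre
pv_idx = 12    # assumed proximal-variant position, for illustration only

both = registers_with_both(window, svoi_idx, pv_idx)
svoi_only_count = (len(window) - 9 + 1) - len(both)

for start, peptide in both:
    print(start, peptide)
```

With these assumed positions, nine 9-mer registers contain the SVOI, five of which also contain the proximal variant. The remaining four correspond to the gray boxes in the figure.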

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1 and 2

  2. Reporting Summary

  3. Supplementary Table 1

    This table shows, for each sample, the percentage of SVOIs harboring any neighboring variants within the specified 89-bp window and the percentage of all SVOIs that had any in-phase proximal variants.

  4. Supplementary Table 2

    This table lists all sequencing datasets used for this study and their corresponding accession IDs.
