Abstract
Recent efforts to design personalized cancer immunotherapies use predicted neoantigens, but most neoantigen prediction strategies do not consider proximal (nearby) variants that alter the peptide sequence and may influence neoantigen binding. We evaluated somatic variants from 430 tumors to understand how proximal somatic and germline alterations change the neoantigenic peptide sequence and also affect neoantigen binding predictions. On average, 241 missense somatic variants were analyzed per sample. Of these somatic variants, 5% had one or more in-phase missense proximal variants. Without incorporating proximal variant correction for major histocompatibility complex class I neoantigen peptides, the overall false discovery rate (incorrect neoantigens predicted) and the false negative rate (strong-binding neoantigens missed) across peptides of lengths 8–11 were estimated as 0.069 (6.9%) and 0.026 (2.6%), respectively.
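As a rough illustration of how the two error rates above could be computed, the sketch below compares binding calls made with and without proximal variant correction, register by register. The 500 nM IC50 cutoff, the `PeptideCall` record, the choice to treat the corrected prediction as the reference, and the example affinities are all assumptions made for illustration; they are not values or code from the study.

```python
from dataclasses import dataclass

# Assumed cutoff: peptides with predicted IC50 below 500 nM count as binders.
BINDER_IC50_NM = 500.0


@dataclass
class PeptideCall:
    """Predicted IC50 (nM) for one peptide register, before and after correction."""
    uncorrected_ic50: float  # peptide built from the somatic variant alone
    corrected_ic50: float    # peptide that also carries the in-phase proximal variant


def error_rates(calls):
    """Return (FDR, FNR), treating the corrected prediction as the reference."""
    tp = fp = fn = 0
    for call in calls:
        predicted = call.uncorrected_ic50 < BINDER_IC50_NM
        actual = call.corrected_ic50 < BINDER_IC50_NM
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # called a binder only because the proximal variant was ignored
        elif actual and not predicted:
            fn += 1  # binder missed because the proximal variant was ignored
    fdr = fp / (fp + tp) if (fp + tp) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fdr, fnr


# Made-up affinities, for illustration only.
example_calls = [
    PeptideCall(120.0, 95.0),     # binder either way
    PeptideCall(300.0, 2400.0),   # false discovery
    PeptideCall(1800.0, 210.0),   # false negative
    PeptideCall(5000.0, 7000.0),  # non-binder either way
    PeptideCall(50.0, 60.0),      # binder either way
]
fdr, fnr = error_rates(example_calls)
```

Here the false discovery rate is taken as FP / (FP + TP) and the false negative rate as FN / (FN + TP); a different reference set or cutoff would change both numbers.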


Data availability
Several of the in-house sequencing datasets used in the study have been previously published and deposited in various databases. All sequence data for the HER2+ breast cancer samples can be accessed via the Database of Genotypes and Phenotypes (dbGaP; study accession phs001291; ref. 17). Data for the oral squamous cell carcinoma project and hepatocellular carcinoma samples are part of other manuscripts currently in preparation and can be accessed under dbGaP study accessions phs001623 and phs001106, respectively. Results for the glioblastoma case (ref. 18) and small cell lung cancer cases (ref. 19) have been published and can be accessed under dbGaP study accessions phs001663 and phs001049, respectively. TCGA data can be accessed under dbGaP study accession phs000178.
References
Hackl, H., Charoentong, P., Finotello, F. & Trajanoski, Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat. Rev. Genet. 17, 441–458 (2016).
Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
Liu, X. S. & Mardis, E. R. Applications of immunogenomics to cancer. Cell 168, 600–612 (2017).
Hundal, J. et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 8, 11 (2016).
Bjerregaard, A.-M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother. 66, 1123–1130 (2017).
Rubinsteyn, A., Hodes, I., Kodysh, J. & Hammerbacher, J. Vaxrank: a computational tool for designing personalized cancer vaccines. Preprint at bioRxiv https://doi.org/10.1101/142919 (2017).
Meydan, C., Otu, H. H. & Sezerman, O. U. Prediction of peptides binding to MHC class I and II alleles by temporal motif mining. BMC Bioinformatics 14(Suppl. 2), S13 (2013).
Rammensee, H. G., Friede, T. & Stevanović, S. MHC ligands and peptide motifs: first listing. Immunogenetics 41, 178–228 (1995).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2017).
Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).
Sette, A. et al. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. J. Immunol. 153, 5586–5592 (1994).
Turajlic, S. et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 18, 1009–1021 (2017).
Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).
Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
Linette, G. P. & Carreno, B. M. Neoantigen vaccines pass the immunogenicity test. Trends Mol. Med. 23, 869–871 (2017).
Lesurf, R. et al. Genomic characterization of HER2-positive breast cancer and response to neoadjuvant trastuzumab and chemotherapy-results from the ACOSOG Z1041 (Alliance) trial. Ann. Oncol. 28, 1070–1077 (2017).
Johanns, T. M. et al. Immunogenomics of hypermutated glioblastoma: a patient with germline POLE deficiency treated with checkpoint blockade immunotherapy. Cancer Discov. 6, 1230–1236 (2016).
Wagner, A. H. et al. Recurrent WNT pathway alterations are frequent in relapsed small cell lung cancer. Nat. Commun. 9, 3787 (2018).
Griffith, M. et al. Genome Modeling System: a knowledge management platform for genomics. PLoS Comput. Biol. 11, e1004274 (2015).
Griffith, M. et al. Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Griffith, M. et al. Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia. Exp. Hematol. 44, 603–613 (2016).
Barnell, E. K. et al. Standard operating procedure for somatic variant refinement of tumor sequencing data. Genet. Med. https://doi.org/10.1038/s41436-018-0278-z (2018).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).
Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 18, 248–262 (2017).
Chicz, R. M. et al. Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size. Nature 358, 764–768 (1992).
Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2014).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).
Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).
Acknowledgements
We are grateful to the research participants and their families, without whom this study would not be possible. We thank G. Dunn for early access to raw data for the published glioblastoma hypermutator case included in our analysis. We also thank R. Schreiber and B. Carreno for the initial discussions that inspired the study, and for their expertise and guidance during the study. R.G. was supported by the National Institutes of Health (NIH) National Cancer Institute (U01CA231844). S.J.S. was supported by the NIH National Library of Medicine (R01LM012222 and R01LM012482). O.L.G. was supported by the NIH National Cancer Institute (U01CA209936 and U01CA231844). M.G. was supported by the NIH National Human Genome Research Institute (R00HG007940) and the NIH National Cancer Institute (U01CA209936).
Author information
Contributions
J.H. was involved in all aspects of this study, including designing and developing the methodology, analyzing and interpreting data, and writing the manuscript, with input from C.J.L., S.J.S., O.L.G., E.R.M., and M.G. S.K. was involved in development of neoantigen prediction software and participated in the data analysis and writing the manuscript. Y.-Y.F. contributed to data analysis, interpretation, and writing the manuscript. R.G., W.C.C., and R.U. provided unpublished tumor datasets and provided critical feedback on the manuscript. E.R.M. and M.G. supervised the study. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Integrated supplementary information
Supplementary Figure 1 Example of candidate neoantigen evaluation.
This figure shows the possible sub-peptide registers for selection of a candidate neoantigen of length 9. The 17-mer peptide window for a 9-mer candidate is selected by scanning 8 amino acids on each side of the mutated amino acid resulting from the somatic variant of interest (SVOI; red box). Only those registers that contain amino acid changes resulting from both the proximal variant (PV; orange box) and the SVOI were considered for this analysis (five peptides shown in yellow for this example). The remaining registers shown (gray boxes) contain the SVOI but are not affected by the proximal variant.
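The register selection described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the 0-based indexing, and the toy protein sequence are assumptions, and the input is taken to be a protein sequence that already carries both amino acid changes.

```python
def registers_with_both(protein, svoi_index, pv_index, length=9):
    """Return peptides of `length` that cover both 0-based mutated positions."""
    first, last = sorted((svoi_index, pv_index))
    # Window: (length - 1) residues on each side of the SVOI, clipped to the protein.
    window_start = max(0, svoi_index - (length - 1))
    window_end = min(len(protein), svoi_index + length)  # exclusive
    peptides = []
    for start in range(window_start, window_end - length + 1):
        end = start + length  # exclusive
        # Keep the register only if it spans the SVOI and the proximal variant.
        if start <= first and last < end:
            peptides.append(protein[start:end])
    return peptides


# Toy 30-residue sequence (hypothetical), SVOI at index 12, proximal variant at index 16.
toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
shared = registers_with_both(toy_protein, svoi_index=12, pv_index=16, length=9)
```

With the two positions 4 residues apart, 5 of the 9 possible 9-mer registers contain both changes, which matches the five-peptide case described in the caption.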
Supplementary Figure 2 Example of a germline SNP within the proximity of a somatic SNV.
An example from one of the TCGA melanoma samples with a missense SNV that overlaps a germline SNP (dbSNP ID: rs9891498), 21 nucleotides upstream. When translated, the germline SNP results in the S357F (NP_001275708.1:p.Phe357Ser) alteration and is 7 amino acids downstream of the missense somatic variant F350S (NP_001275708.1:p.Ser350Phe) in MARCH10.
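To make the spacing in this example concrete, the short sketch below counts how many peptide registers of each class I length (8 to 11, as in the abstract) would contain both protein positions 350 and 357. The helper name is hypothetical, and the count assumes both variants are in phase on the same transcript and ignores clipping at the protein ends.

```python
def shared_register_count(svoi_pos, pv_pos, length):
    """Number of `length`-mers covering both protein positions (protein ends ignored)."""
    distance = abs(svoi_pos - pv_pos)
    # A peptide of `length` residues can span two positions only if they are
    # fewer than `length` residues apart; each extra residue adds one register.
    return max(0, length - distance)


# Positions taken from the caption: somatic variant at 350, germline SNP at 357.
counts = {n: shared_register_count(350, 357, n) for n in range(8, 12)}
```

At a distance of 7 amino acids, the germline change falls inside at least one register for every peptide length considered, so it can alter candidate peptides of all four lengths.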
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1 and 2
Supplementary Table 1
This table shows, for each sample, the percentage of SVOIs harboring any neighboring variants within the specified 89-bp window and the percentage of the total SVOIs that had any proximal variants in phase.
Supplementary Table 2
This table shows the breakdown of all sequencing datasets used for this study and their corresponding accession IDs.
About this article
Cite this article
Hundal, J., Kiwala, S., Feng, Y.-Y. et al. Accounting for proximal variants improves neoantigen prediction. Nat. Genet. 51, 175–179 (2019). https://doi.org/10.1038/s41588-018-0283-9
This article is cited by
- Whole exome and transcriptome sequencing reveal clonal evolution and exhibit immune-related features in metastatic colorectal tumors. Cell Death Discovery (2021)
- Neoadjuvant PD-L1 plus CTLA-4 blockade in patients with cisplatin-ineligible operable high-risk urothelial carcinoma. Nature Medicine (2020)
- Identification and ranking of recurrent neo-epitopes in cancer. BMC Medical Genomics (2019)
- Best practices for bioinformatic characterization of neoantigens for clinical utility. Genome Medicine (2019)