Accounting for proximal variants improves neoantigen prediction

Abstract

Recent efforts to design personalized cancer immunotherapies use predicted neoantigens, but most neoantigen prediction strategies do not consider proximal (nearby) variants that alter the peptide sequence and may influence neoantigen binding. We evaluated somatic variants from 430 tumors to understand how proximal somatic and germline alterations change the neoantigenic peptide sequence and also affect neoantigen binding predictions. On average, 241 missense somatic variants were analyzed per sample. Of these somatic variants, 5% had one or more in-phase missense proximal variants. Without incorporating proximal variant correction for major histocompatibility complex class I neoantigen peptides, the overall false discovery rate (incorrect neoantigens predicted) and the false negative rate (strong-binding neoantigens missed) across peptides of lengths 8–11 were estimated as 0.069 (6.9%) and 0.026 (2.6%), respectively.
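
As a rough illustration of how the two error rates quoted above can be computed, the following Python sketch compares binding calls made before and after proximal variant correction (PVC). The 500 nM IC50 cutoff, the `PeptidePair` record, the `error_rates` function, and the choice of denominators (FDR = FP / (FP + TP), FNR = FN / (FN + TP)) are assumptions made for this example and are not taken from the study's code.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumed cutoff: peptides with predicted IC50 below this are called binders.
BINDING_THRESHOLD_NM = 500.0


@dataclass(frozen=True)
class PeptidePair:
    """Hypothetical record pairing predictions for the same candidate peptide."""
    ic50_uncorrected: float  # predicted IC50 (nM) ignoring proximal variants
    ic50_corrected: float    # predicted IC50 (nM) after proximal variant correction


def error_rates(pairs: Iterable[PeptidePair],
                threshold: float = BINDING_THRESHOLD_NM) -> Tuple[float, float]:
    """Return (FDR, FNR), treating the corrected prediction as the reference."""
    tp = fp = fn = 0
    for pair in pairs:
        called = pair.ic50_uncorrected < threshold  # binder without correction
        real = pair.ic50_corrected < threshold      # binder with correction
        if called and real:
            tp += 1
        elif called and not real:
            fp += 1  # incorrect neoantigen predicted
        elif real and not called:
            fn += 1  # binding neoantigen missed
    fdr = fp / (fp + tp) if (fp + tp) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fdr, fnr


# Made-up IC50 values, for illustration only.
example = [
    PeptidePair(100.0, 120.0),    # binder either way
    PeptidePair(200.0, 900.0),    # false discovery without correction
    PeptidePair(800.0, 300.0),    # missed binder without correction
    PeptidePair(1000.0, 2000.0),  # non-binder either way
]
fdr, fnr = error_rates(example)
print(f"FDR={fdr:.2f} FNR={fnr:.2f}")
```

Treating the corrected peptide as the reference reflects the idea in the abstract that the uncorrected peptide sequence is the one that may be wrong; a different reference or cutoff would change both rates.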

Fig. 1: Overview of the pipeline.
Fig. 2: Mischaracterization of neoantigens before PVC.

Data availability

Several of the in-house sequencing datasets used in the study have been previously published and deposited in various databases. All sequence data for the HER2+ breast cancer samples can be accessed via the Database of Genotypes and Phenotypes (dbGaP; study accession phs001291) [17]. Data for the oral squamous cell carcinoma project and hepatocellular carcinoma samples are part of other manuscripts currently in preparation and can be accessed under dbGaP study accessions phs001623 and phs001106, respectively. Results for the glioblastoma case [18] and small cell lung cancer cases [19] have been published and can be accessed under dbGaP study accessions phs001663 and phs001049, respectively. TCGA data can be accessed under dbGaP study accession phs000178.

References

  1. Hackl, H., Charoentong, P., Finotello, F. & Trajanoski, Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat. Rev. Genet. 17, 441–458 (2016).
  2. Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
  3. Liu, X. S. & Mardis, E. R. Applications of immunogenomics to cancer. Cell 168, 600–612 (2017).
  4. Hundal, J. et al. pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med. 8, 11 (2016).
  5. Bjerregaard, A.-M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. Cancer Immunol. Immunother. 66, 1123–1130 (2017).
  6. Rubinsteyn, A., Hodes, I., Kodysh, J. & Hammerbacher, J. Vaxrank: a computational tool for designing personalized cancer vaccines. Preprint at bioRxiv https://doi.org/10.1101/142919 (2017).
  7. Meydan, C., Otu, H. H. & Sezerman, O. U. Prediction of peptides binding to MHC class I and II alleles by temporal motif mining. BMC Bioinformatics 14(Suppl. 2), S13 (2013).
  8. Rammensee, H. G., Friede, T. & Stevanović, S. MHC ligands and peptide motifs: first listing. Immunogenetics 41, 178–228 (1995).
  9. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2017).
  10. Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).
  11. Sette, A. et al. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. J. Immunol. 153, 5586–5592 (1994).
  12. Turajlic, S. et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol. 18, 1009–1021 (2017).
  13. Carreno, B. M. et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science 348, 803–808 (2015).
  14. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
  15. Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
  16. Linette, G. P. & Carreno, B. M. Neoantigen vaccines pass the immunogenicity test. Trends Mol. Med. 23, 869–871 (2017).
  17. Lesurf, R. et al. Genomic characterization of HER2-positive breast cancer and response to neoadjuvant trastuzumab and chemotherapy-results from the ACOSOG Z1041 (Alliance) trial. Ann. Oncol. 28, 1070–1077 (2017).
  18. Johanns, T. M. et al. Immunogenomics of hypermutated glioblastoma: a patient with germline POLE deficiency treated with checkpoint blockade immunotherapy. Cancer Discov. 6, 1230–1236 (2016).
  19. Wagner, A. H. et al. Recurrent WNT pathway alterations are frequent in relapsed small cell lung cancer. Nat. Commun. 9, 3787 (2018).
  20. Griffith, M. et al. Genome Modeling System: a knowledge management platform for genomics. PLoS Comput. Biol. 11, e1004274 (2015).
  21. Griffith, M. et al. Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015).
  22. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
  23. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  24. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
  25. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012).
  26. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
  27. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
  28. Griffith, M. et al. Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia. Exp. Hematol. 44, 603–613 (2016).
  29. Barnell, E. K. et al. Standard operating procedure for somatic variant refinement of tumor sequencing data. Genet. Med. https://doi.org/10.1038/s41436-018-0278-z (2018).
  30. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
  31. Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).
  32. Charoentong, P. et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade. Cell Rep. 18, 248–262 (2017).
  33. Chicz, R. M. et al. Predominant naturally processed peptides bound to HLA-DR1 are derived from MHC-related molecules and are heterogeneous in size. Nature 358, 764–768 (1992).
  34. Vita, R. et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 43, D405–D412 (2014).
  35. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
  36. Andreatta, M. & Nielsen, M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517 (2016).
  37. Nielsen, M. et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017 (2003).

Acknowledgements

We are grateful to the research participants and their families, without whom this study would not be possible. We thank G. Dunn for early access to raw data for the published glioblastoma hypermutator case included in our analysis. We also thank R. Schreiber and B. Carreno for the initial discussions that inspired the study, and for their expertise and guidance during the study. R.G. was supported by the National Institutes of Health (NIH) National Cancer Institute (U01CA231844). S.J.S. was supported by the NIH National Library of Medicine (R01LM012222 and R01LM012482). O.L.G. was supported by the NIH National Cancer Institute (U01CA209936 and U01CA231844). M.G. was supported by the NIH National Human Genome Research Institute (R00HG007940) and the NIH National Cancer Institute (U01CA209936).

Author information

Contributions

J.H. was involved in all aspects of this study, including designing and developing the methodology, analyzing and interpreting data, and writing the manuscript, with input from C.J.L., S.J.S., O.L.G., E.R.M., and M.G. S.K. was involved in development of neoantigen prediction software and participated in the data analysis and writing the manuscript. Y.-Y.F. contributed to data analysis, interpretation, and writing the manuscript. R.G., W.C.C., and R.U. provided unpublished tumor datasets and provided critical feedback on the manuscript. E.R.M. and M.G. supervised the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Elaine R. Mardis or Malachi Griffith.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Example of candidate neoantigen evaluation.

This figure shows the possible sub-peptide registers for selection of a candidate neoantigen of length 9. The 17-mer peptide window for a 9-mer candidate is selected by scanning 8 amino acids on each side of the mutated amino acid resulting from the somatic variant of interest (SVOI; red box). Only those registers that contain the amino acid changes resulting from both the proximal variant (PV; orange box) and the SVOI (red box) were considered for this analysis (five peptides, shown in yellow in this example). The remaining registers (gray boxes) contain the SVOI but are not affected by the proximal variant.
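
The register selection described in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the function names `peptide_window` and `registers_with_both`, the 0-based residue positions, and the toy alphabet "protein" are all hypothetical, and the input sequence is assumed to already carry both amino acid changes.

```python
from typing import List


def peptide_window(protein: str, svoi_pos: int, k: int = 9) -> str:
    """Return the window spanning k-1 residues on each side of the SVOI.

    For k = 9 this is a 17-mer unless it is truncated by a protein end.
    Positions are 0-based (an assumption of this sketch).
    """
    start = max(0, svoi_pos - (k - 1))
    end = min(len(protein), svoi_pos + k)
    return protein[start:end]


def registers_with_both(protein: str, svoi_pos: int, pv_pos: int,
                        k: int = 9) -> List[str]:
    """Return every k-mer that covers both the SVOI and the proximal variant."""
    lo, hi = min(svoi_pos, pv_pos), max(svoi_pos, pv_pos)
    # A k-mer starting at s covers residues s .. s + k - 1, so it contains
    # both positions exactly when hi - k + 1 <= s <= lo.
    first_start = max(0, hi - k + 1)
    last_start = min(lo, len(protein) - k)
    return [protein[s:s + k] for s in range(first_start, last_start + 1)]


# Toy sequence: the SVOI is at index 12 ('M') and the PV at index 16 ('Q').
toy_protein = "ABCDEFGHIJKLMNOPQRSTUVWXY"
window = peptide_window(toy_protein, svoi_pos=12)
both = registers_with_both(toy_protein, svoi_pos=12, pv_pos=16)
print(window)  # EFGHIJKLMNOPQRSTU
print(both)    # five 9-mers, from IJKLMNOPQ to MNOPQRSTU
```

With the two changes four residues apart, 9 - 4 = 5 of the nine SVOI-containing registers also contain the proximal variant, which matches the five highlighted peptides in the legend's example. Once the two positions are 9 or more residues apart, no 9-mer can contain both and the list is empty.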

Supplementary Figure 2 Example of a germline SNP within the proximity of a somatic SNV.

An example from one of the TCGA melanoma samples with a missense SNV that overlaps a germline SNP (dbSNP ID: rs9891498), 21 nucleotides upstream. When translated, the germline SNP results in the S357F (NP_001275708.1:p.Phe357Ser) alteration and is 7 amino acids downstream of the missense somatic variant F350S (NP_001275708.1:p.Ser350Phe) in MARCH10.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2

Reporting Summary

Supplementary Table 1

This table shows, for each sample, the percentage of SVOIs harboring any neighboring variants within the specified 89-bp window and the percentage of the total SVOIs that had any proximal variants in phase.

Supplementary Table 2

This table shows the breakdown of all sequencing datasets used for this study and their corresponding accession IDs.

About this article

Cite this article

Hundal, J., Kiwala, S., Feng, Y. et al. Accounting for proximal variants improves neoantigen prediction. Nat Genet 51, 175–179 (2019). https://doi.org/10.1038/s41588-018-0283-9
