  • Article

Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements

Abstract

Cryptic promoters within transposable elements (TEs) can be transcriptionally reactivated in tumors to create new TE-chimeric transcripts, which can produce immunogenic antigens. We performed a comprehensive screen for these TE exaptation events in 33 TCGA tumor types, 30 GTEx adult tissues and 675 cancer cell lines, and identified 1,068 TE-exapted candidates with the potential to generate shared tumor-specific TE-chimeric antigens (TS-TEAs). Whole-lysate and HLA-pulldown mass spectrometry data confirmed that TS-TEAs are presented on the surface of cancer cells. In addition, we highlight tumor-specific membrane proteins transcribed from TE promoters that constitute aberrant epitopes on the extracellular surface of cancer cells. Altogether, we showcase the high pan-cancer prevalence of TS-TEAs and atypical membrane proteins that could potentially be therapeutically exploited and targeted.

Fig. 1: TE-chimeric transcripts are a prevalent phenomenon across all 33 cancer types in TCGA.
Fig. 2: Epigenetic and genetic signatures associate with cryptic TE promoter activation in cancer.
Fig. 3: Cancer cell lines express abundant TE-chimeric transcripts.
Fig. 4: TE-chimeric peptides are detected in tumor samples.
Fig. 5: TS-TEAs are presented as HLA complexes in cancer cell lines.
Fig. 6: TE-chimeric transcripts can generate antigens on the membrane surface of cells.

Data availability

All sequencing and mass spectrometry data generated in this study are available under the following accession codes: GEO, GSE201021; PRIDE, PXD033351. Links and accession codes for all publicly available data used in this study are detailed in the Methods section. Source data are provided with this paper.

Code availability

TEProF2, the custom pipeline used to identify TE-chimeric transcripts from RNA-sequencing data, is available at https://doi.org/10.5281/zenodo.7670515. All other code used to generate the analyses and figures has been placed in a notebook that is available at https://doi.org/10.5281/zenodo.7670584.

Acknowledgements

We thank Z. Andrysik and J.M. Espinosa from the Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA, for the generous gift of the TP53-KO HCT116 cell line. We thank J. Hoisington-López and M.L. Jaeger from The Edison Family Center for Genome Sciences & Systems Biology (CGSSB) for assistance with sequencing and B. Koebbe and E. Martin from CGSSB for data processing. T.W. is funded by NIH grants 5R01HG007175, U24ES026699 and U01HG009391 and by the American Cancer Society Research Scholar grant RSG-14-049-01-DMC. N.M.S. was a Howard Hughes Medical Institute (H.H.M.I.) Medical Research Fellow. H.J.J. was supported by a grant from NIGMS (T32 GM007067). The LC–MS/MS work from the Proteomics & Mass Spectrometry Facility at the Danforth Plant Science Center is supported by National Science Foundation grant DBI-1827534 for the acquisition of the Orbitrap Fusion Lumos LC–MS/MS.

Author information

Contributions

N.M.S., H.J.J. and T.W. conceived and implemented the study; N.M.S., J.M., A.W., C.F., B.K., D.L. and T.W. contributed to the computational analysis; N.M.S. developed the computational pipeline to search for tumor-specific TE-chimeric transcripts; N.M.S. and J.M. analyzed the CAGE and the ATAC-seq data. N.M.S. and C.F. analyzed the methylation array data. N.M.S., A.W. and Y.L. processed the whole-lysate, HLA-pulldown and synthetic peptide LC–MS/MS datasets. H.J.J., J.M. and X.X. generated transcriptomic profiles of cell lines; H.J.J. and Y.L. performed membrane extraction and Western blot analysis; H.J.J., N.L.B. and Y.L. maintained and collected cell lines for LC–MS; H.J.J. performed the HLA-pulldown; Y.L. and X.Q. performed TP53 HCT116 experiments; Y.L. performed immunofluorescence experiments; Y.L. and A.L. performed targeted IP-LC–MS/MS experiments; S.-C.T. and B.S.E. performed the LC–MS/MS; and the paper was prepared and revised by N.M.S., H.J.J., Y.L. and T.W. with input from all authors.

Corresponding author

Correspondence to Ting Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Artem Babaian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Expression of TE-chimeric transcripts across TCGA and GTEx.

a, Frequency of 26,816 TE-chimeric transcripts in 33 cancer types from TCGA. Each dot represents a separate sample, and the top of the graph lists the number of samples in each cancer type and the mean number of TE-chimeric transcripts for that cancer type. b, Same as (a) but with TCGA normal tissue samples. c, Same as (a) but with GTEx adult tissue samples. d, For tumors with matched normal samples in TCGA, box plots of the number of TE-chimeric transcripts across all samples. A superimposed dot plot connects matched tumor and normal samples with a line. The ‘N=’ lists the number of samples summarized by the box plots. All box plots use the following format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.
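The box plot format stated in the caption above (median, quartiles, whiskers at 1.5x the interquartile range) can be made concrete with a minimal sketch. The quartile method (`inclusive`), the convention that whiskers stop at the most extreme data point inside the fences, and the input values are assumptions for illustration; the plotting software used for the figures may compute quartiles slightly differently.

```python
import statistics

def box_plot_summary(values):
    """Summarize values the way a 1.5x-IQR box plot would draw them."""
    # Quartiles; the 'inclusive' method is an assumption of this sketch.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Made-up per-sample transcript counts, not data from the study.
summary = box_plot_summary([2, 4, 5, 7, 8, 9, 11, 40])
print(summary)
```

In this toy input the value 40 lies beyond the upper fence and would be drawn as an outlier point in the panels whose format includes "points, outliers".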

Extended Data Fig. 2 Epigenetic Correlations and Filtering with TCGA, GTEx, and FANTOM5 samples.

a, Spearman rho correlation of global methylation versus the number of all 26,816 TE-chimeric transcripts across cancer types and all samples. Purple bars represent significant correlations (adjusted P-value < 0.05). Exact P-values are the following: STAD: 2.93E-04, HNSC: 1.27E-05, LUSC: 1.58E-03, BLCA: 2.33E-02, UCEC: 3.20E-02, KIRP: 1.23E-01, SKCM: 5.73E-02, LUAD: 2.65E-01, READ: 6.69E-01, LIHC: 2.65E-01, KIRC: 4.47E-01, PCPG: 6.69E-01, ESCA: 8.485E-01, LGG: 6.69E-01, PAAD: 8.48E-01, COAD: 8.48E-01, SARC: 8.72E-01, BRCA: 8.66E-01, PRAD: 9.22E-01, CESC: 9.22E-01, THCA: 6.69E-01, All: 9.24E-01. b, Dot plot of the difference in the number of all TE-chimeric transcripts between samples that have a particular driver mutation and those that do not in a specific cancer type. Dots are ordered by difference. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. Exact P-values for significant differences are the following: COAD-APC: 1.17E-03, COAD-TP53: 3.15E-06, READ-TP53: 9.01E-04, STAD-TP53: 3.70E-02, HNSC-CASP8: 7.80E-04, HNSC-NOTCH1: 4.37E-02, HNSC-NSD1: 3.08E-04, BRCA-TP53: 4.37E-02, LIHC-TP53: 7.11E-03. c, Number of tumor and normal samples in which each TE-chimeric transcript was present. Those highlighted in blue passed our threshold for tumor-specificity. The bottom graph is a zoomed-in view of the section of the top graph enclosed by a dotted box. d, Number of TCGA tumor and GTEx adult normal samples in which each TE-chimeric transcript was present. Those highlighted in blue passed our threshold for tumor-specificity. e, Number of samples in each tissue type profiled by FANTOM5. f, Expression of candidate promoters in FANTOM5. Dashed box highlights candidates removed due to high expression in adult tissues.
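The caption above reports P-values adjusted with the Benjamini-Hochberg correction. As a minimal sketch of how that adjustment works, assuming the standard step-up procedure and using made-up raw P-values (not values from the study), each sorted P-value is scaled by m/rank and then capped so that adjusted values never decrease with rank:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is
    # capped by the adjusted value of the next-larger P-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw P-values from four tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

A test would then be called significant at a 5% false discovery rate when its adjusted value is below 0.05, which matches the "adjusted P-value < 0.05" threshold stated for panel a.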

Extended Data Fig. 3 Summary statistics for TE-chimeric transcripts.

a, Scatter plot of the mean fraction of the target gene’s expression a chimeric transcript accounts for and the number of tumor samples where the transcript is present. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. b, TE class distribution and enrichment of tumor-specific TE-chimeric transcripts. The size of the dot represents the number of TE-chimeric transcripts belonging to that class. c, TE subfamily number and enrichment of candidates.

Extended Data Fig. 4 Association of TE-chimeric transcripts with methylation, chromatin accessibility, and driver mutations.

a, Scatter plot of global methylation versus number of candidates across samples in 21 tumor types. The regression line is plotted, and the Spearman rho coefficient is displayed on each plot. b, Heat map of ATAC-seq peak expression z-score (left) and transcript RNA expression z-score (right) for 149 TE-chimeric transcripts. c, Dot plot of the difference in the number of candidates between samples that have a particular driver mutation and those that do not in a specific cancer type. Dots are ordered by difference. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. Supplementary Table 5 has exact P-values. d, Bar plot of the distribution of types of TP53 mutations across cancer types. e, Box and dot plots of global methylation levels of samples with (purple) and without (blue) TP53 mutations. *P < 0.05, **P < 0.01, ***P < 0.001. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. The ‘N=’ lists the number of tumor samples in each box plot. Exact P-values are the following: LGG: 9.11E-01, SARC: 1.30E-01, LUAD: 5.52E-03, HNSC: 8.03E-02, BRCA: 9.11E-01, COAD: 2.04E-02, STAD: 8.43E-12, LUSC: 1.40E-03, PRAD: 1.75E-03, UCEC: 1.75E-03, BLCA: 6.65E-01, LIHC: 1.75E-03, SKCM: 1.81E-01. f, Box plot of number of tumor samples in each cancer separated by TP53 mutation type. The ‘N=’ lists the number of tumor samples in each box plot. All box plots use the following format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers.
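Panel a above reports a Spearman rho coefficient between global methylation and the number of candidates per sample. A minimal sketch of that statistic, computed as the Pearson correlation of rank-transformed values with ties given their average rank, is shown here. The function names and the input numbers are illustrative assumptions, not the authors' code or data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up per-sample values: global methylation and candidate counts.
methylation = [0.62, 0.55, 0.48, 0.40, 0.35]
candidates = [3, 5, 9, 8, 14]
rho = spearman_rho(methylation, candidates)
print(round(rho, 3))
```

In this toy input, lower methylation goes with more candidates, so rho is negative; the sign and size of the coefficients in the actual figure are those reported by the authors, not anything this sketch implies.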

Extended Data Fig. 5 Cell line expression of tumor-specific TE-chimeric transcripts.

a, Box plots with overlaid dot plots of the number of candidates expressed in each cancer cell line profiled across various tumor types. The ‘N=’ lists the number of cell lines in each boxplot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. b, Heatmap of CAGE-seq TPM expression of candidates across the 10 cancer cell lines we profiled. At the top is a dot plot showing how many TCGA tumor samples a candidate is present in.

Extended Data Fig. 6 Open reading frame prediction of tumor-specific TE-chimeric transcripts.

a, Pie charts showing how often the two translation methods (longest reading frame and Kozak context) agreed in terms of type of protein product (normal, truncated, chimeric normal, chimeric truncated, and frameshift) and protein sequence. b, Pie charts of the number of resulting proteins with Pfam domain loss or transmembrane domain loss. c, Example of multiple Pfam domain loss in the candidate L1HS_COL28A1. d, Example of complete transmembrane domain loss in the candidate L2a_TRPM6. e, Histogram of size distribution of frameshift proteins. f, Histogram of size distribution of prepended amino acids in chimeric proteins. g, Dot plot of the proportion of all tumors in a specific cancer type covered by each candidate. The most shared candidate in each cancer is labeled.

Extended Data Fig. 7 CPTAC confirmation of TS-TEP protein sequences.

a, Dot plot of the number of TS-TEPs predicted to be in each of the BRCA samples available from CPTAC. The colored dots are samples where at least one TS-TEP candidate was validated. b, Same as (a) but for OV. c, Box plot with overlaid dot plot of the RNA expression of detected TS-TEPs across all the BRCA samples profiled. Highlighted dots are the candidate-sample combinations validated by mass spectrometry. d, Same as (c) but for OV. The ‘N=’ lists the number of expression values that are summarized by the box plot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. e, Diagram of transcript with the open reading frame location (top) and diagram of protein structure (bottom) for LTR33_SPINK4 which was found in the BRCA dataset. f, Same as (e) but for MLT1C_SPARCL1. g, Number of samples candidate is present in for each cancer type for LTR33_SPINK4. h, Same as (g) but for MLT1C_SPARCL1. i, HLA-binding affinity prediction of candidate TS-TEA protein sequences to the HLA alleles of the 35 BRCA samples where a TS-TEP was detected with mass spectrometry.

Extended Data Fig. 8 HLA-pulldown mass spectrometry detection of TS-TEAs and analysis of repetitiveness.

a, Bar plot of the number of HLA-bound peptides detected in our mass spectrometry experiments. Each cell line has 4-5 replicates; the ‘Total’ bar gives the total number of unique peptides detected across all replicates. b, Replicates in which each of the TS-TEA peptides was detected in the DMS 53 cell line. c, CAGE expression of the TE-chimeric transcripts from which the detected peptides can originate. Two of the candidate peptides have sequences common to multiple candidates that are expressed in the DMS 53 cell line. d, Violin plots with superimposed dot plots of the expression in reads per million (RPM) of genomic loci that can be translated into the peptide LPSEMNPVP. The x-axis groups the data by study and then by ‘candidate’ loci, which come from TE-chimeric transcripts identified in the paper, and ‘not candidate’ loci, which are other genomic locations that can also produce the same peptide. e, Same as (d) but for the peptide SPSSASLTL. f, Same as (d) but for the peptide SPSSASLAL. g, Scatter plot of all TS-TEP candidates where the x-axis is the log2-transformed number of antigenic 9-mers that can be generated from the candidate and the y-axis is the log2-transformed number of genomic loci encoding a 9-mer averaged across all antigenic 9-mers for a TS-TEP. The size of the dot is proportional to the number of tumor samples the TS-TEP is present in, and the color of the dot is the class of transposable element. h, Box plot and violin plot of the log2-transformed number of antigenic 9-mers that can be generated from each TS-TEP candidate (left) and the log2-transformed number of genomic loci encoding a 9-mer averaged across all antigenic 9-mers for each candidate (right) categorized by TE class. The ‘N=’ lists the number of candidates summarized by each box plot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. i, For the candidate L1P2_STK31, a line graph in which the x-axis shows the identity and position of amino acids at the N-terminus of the protein and the y-axis shows the number of genomic loci that can be translated into the 9-mer. Each dot represents the 9-mer that ends with the amino acid below it on the x-axis. Below the plot is a schematic of the portion of the protein translated from the TE (subfamily is L1P2) and the canonical exons of the gene (STK31).
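Panels g to i above count, for each 9-mer of a protein, how many genomic loci could encode it. The idea can be sketched with a sliding window over a protein sequence followed by a lookup against translated sequences. This is only an illustration under stated assumptions: the toy protein, the toy "translated loci" and the exact substring matching are hypothetical stand-ins and do not reproduce the genome-wide search performed in the study.

```python
def kmers(sequence, k=9):
    """All overlapping k-mers of a protein sequence, in N-to-C order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def count_encoding_loci(peptides, translated_loci):
    """For each peptide, count the translated loci that contain it exactly."""
    return {
        peptide: sum(peptide in locus for locus in translated_loci)
        for peptide in peptides
    }

# Toy N-terminal sequence that embeds the peptide LPSEMNPVP from panel d;
# the flanking residues are made up.
protein = "MLPSEMNPVPAK"
nine_mers = kmers(protein)

# Hypothetical translated sequences standing in for genomic loci.
loci = ["GGLPSEMNPVPTT", "MLPSEMNPVPAK", "AAAAAAAAAAAA"]
locus_counts = count_encoding_loci(nine_mers, loci)
print(locus_counts)
```

A 9-mer found at many loci is repetitive in the sense plotted on the y-axis of panels g to i; a 9-mer spanning the junction with canonical exons would be expected to match fewer loci.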

Extended Data Fig. 9 Repetitive nature of antigens and optimal antigen combination for TCGA.

a, Number of genomic loci, identified using BLAT, from which HLA-bound TE peptides reported in another study could originate. b, mTEC and TEC expression of all 2,297 candidates. c, Bar graph of the number of mTEC or TEC samples in which each candidate is detected. Only those candidates found in multiple mTEC or TEC samples are shown. d, Bar graph of the number of tumor samples that each candidate is present in for the optimal combination of 20 TS-TEAs that would cover the most TCGA samples (bottom). At the top, a line plot showing the cumulative number of tumor samples covered by that number of candidates.
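Panel d above describes a combination of 20 TS-TEAs chosen to cover the most TCGA samples, with a cumulative coverage curve. The caption does not say how the combination was selected, so the following is only a sketch of one common approach, a greedy maximum-coverage heuristic, which is swapped in here for illustration and is not guaranteed to find the true optimum. The candidate names and sample identifiers are hypothetical.

```python
def greedy_cover(candidate_to_samples, max_candidates):
    """Greedily pick candidates that each add the most uncovered samples.

    Returns a list of (candidate, cumulative_samples_covered) tuples.
    """
    remaining = dict(candidate_to_samples)
    covered = set()
    chosen = []
    while remaining and len(chosen) < max_candidates:
        # Ties are broken alphabetically because max() keeps the first
        # maximum it sees in the sorted candidate names.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds any new sample
        covered |= remaining.pop(best)
        chosen.append((best, len(covered)))
    return chosen

# Hypothetical presence data: candidate -> tumor samples expressing it.
presence = {
    "TE_A": {"s1", "s2", "s3"},
    "TE_B": {"s3", "s4"},
    "TE_C": {"s4", "s5", "s6"},
    "TE_D": {"s1"},
}
selection = greedy_cover(presence, max_candidates=2)
print(selection)
```

The second element of each tuple corresponds to the cumulative number of covered samples, which is the quantity shown by the line plot at the top of panel d.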

Extended Data Fig. 10 Synthetic peptide validation of HLA-bound TS-TEAs.

a–d, For each detected TS-TEA peptide, the spectrum of the peptide found in our HLA-pulldown experiments is displayed on the top in blue, and the spectrum of the synthetic peptide is displayed on the bottom in red. The number of common peaks is listed.

Supplementary information

Supplementary Information

Supplementary Methods and Figs. 1–5.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–17.

Source data

Source Data Fig. 6

Unprocessed western blot without annotation and unprocessed fluorescence microscope image.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shah, N.M., Jang, H.J., Liang, Y. et al. Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements. Nat Genet 55, 631–639 (2023). https://doi.org/10.1038/s41588-023-01349-3

  • DOI: https://doi.org/10.1038/s41588-023-01349-3
