Cryptic promoters within transposable elements (TEs) can be transcriptionally reactivated in tumors to create new TE-chimeric transcripts, which can produce immunogenic antigens. We performed a comprehensive screen for these TE exaptation events in 33 TCGA tumor types, 30 GTEx adult tissues and 675 cancer cell lines, and identified 1,068 TE-exapted candidates with the potential to generate shared tumor-specific TE-chimeric antigens (TS-TEAs). Whole-lysate and HLA-pulldown mass spectrometry data confirmed that TS-TEAs are presented on the surface of cancer cells. In addition, we highlight tumor-specific membrane proteins transcribed from TE promoters that constitute aberrant epitopes on the extracellular surface of cancer cells. Altogether, we showcase the high pan-cancer prevalence of TS-TEAs and atypical membrane proteins that could potentially be therapeutically exploited and targeted.
All sequencing and mass spectrometry data generated in this study are available under the following accession codes (GEO accession code: GSE201021; PRIDE code: PXD033351). Links and accession codes for all publicly available data used in this study are detailed in the Methods section. Source data are provided with this paper.
TEProF2, the custom pipeline used to identify TE-chimeric transcripts from RNA-sequencing data, is available at https://doi.org/10.5281/zenodo.7670515. All other code used to generate the analyses and figures has been placed in a notebook that is available at https://doi.org/10.5281/zenodo.7670584. Source data are provided with this paper.
We would like to thank Z. Andrysik and J.M. Espinosa from the Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA, for the generous gift of the TP53-KO HCT116 cell line. We would like to thank J. Hoisington-López and M.L. Jaeger from The Edison Family Center for Genome Sciences & Systems Biology (CGSSB) for assistance with sequencing and B. Koebbe and E. Martin from CGSSB for data processing. T.W. is funded by NIH grants 5R01HG007175, U24ES026699 and U01HG009391 and the American Cancer Society Research Scholar grant RSG-14-049-01-DMC. N.M.S. was a Howard Hughes Medical Institute (H.H.M.I.) Medical Research Fellow. H.J.J. was supported by a grant from NIGMS (T32 GM007067). The LC–MS/MS work from the Proteomics & Mass Spectrometry Facility at the Danforth Plant Science Center is supported by National Science Foundation grant DBI-1827534 for the acquisition of the Orbitrap Fusion Lumos LC–MS/MS.
The authors declare no competing interests.
Peer review information
Nature Genetics thanks Artem Babaian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Expression of TE-chimeric transcripts across TCGA and GTEx.
a, Frequency of 26,816 TE-chimeric transcripts in 33 cancer types from TCGA. Each dot represents a separate sample, and the top of the graph lists the number of samples in each cancer type and the mean number of TE-chimeric transcripts for that cancer type. b, Same as (a) but with TCGA normal tissue samples. c, Same as (a) but with GTEx adult tissue samples. d, For tumors with matched normal samples in TCGA, box plots of the number of TE-chimeric transcripts across all samples. A superimposed dot plot connects matched tumor and normal samples with a line. The ‘N=’ lists the number of samples summarized with the box plots. All box plots use the following format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range.
Extended Data Fig. 2 Epigenetic Correlations and Filtering with TCGA, GTEx, and FANTOM5 samples.
a, Spearman rho correlation of global methylation versus number of all 26,816 TE-chimeric transcripts across cancer types and all samples. Purple bars represent significant correlations (Adjusted P-value < 0.05). Exact p-values are the following: STAD: 2.93E-04, HNSC: 1.27E-05, LUSC: 1.58E-03, BLCA: 2.33E-02, UCEC: 3.20E-02, KIRP: 1.23E-01, SKCM: 5.73E-02, LUAD: 2.65E-01, READ: 6.69E-01, LIHC: 2.65E-01, KIRC: 4.47E-01, PCPG: 6.69E-01, ESCA: 8.485E-01, LGG: 6.69E-01, PAAD: 8.48E-01, COAD: 8.48E-01, SARC: 8.72E-01, BRCA: 8.66E-01, PRAD: 9.22E-01, CESC: 9.22E-01, THCA: 6.69E-01, All: 9.24E-01. b, Dot plot of difference in number of all TE-chimeric transcripts between samples that have a particular driver mutation and those that do not in a specific cancer type. Dots are ordered by difference. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. Exact p-values for significant differences are the following: COAD-APC: 1.17E-03, COAD-TP53: 3.15E-06, READ-TP53: 9.01E-04, STAD-TP53: 3.70E-02, HNSC-CASP8: 7.80E-04, HNSC-NOTCH1: 4.37E-02, HNSC-NSD1: 3.08E-04, BRCA-TP53: 4.37E-02, LIHC-TP53: 7.11E-03. c, Number of tumor and normal samples all TE-chimeric transcripts were present in. Those highlighted in blue passed our threshold for tumor-specificity. The bottom graph is a zoomed-in view of the section of the top graph that has a dotted box around it. d, Number of TCGA tumor and GTEx adult normal samples all TE-chimeric transcripts were present in. Those highlighted in blue passed our threshold for tumor-specificity. e, Number of samples in each tissue type profiled by FANTOM5. f, Expression of candidate promoters in FANTOM5. Dashed box highlights candidates removed due to high expression in adult tissues.
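Panel a reports a Spearman rho between global methylation and the number of TE-chimeric transcripts. As a rough illustration of what that statistic measures, the sketch below ranks both variables (averaging tied ranks) and computes the Pearson correlation of the ranks. The function names and the five-sample input are hypothetical and are not taken from the study's notebook.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-sample values, for illustration only.
methylation = [0.80, 0.75, 0.60, 0.55, 0.40]
transcript_counts = [3, 5, 4, 9, 12]
rho = spearman_rho(methylation, transcript_counts)
print(round(rho, 3))
```

In this toy input, lower methylation mostly goes with more transcripts, so rho is strongly negative. The sketch does not compute a P value.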
Extended Data Fig. 3 Summary statistics for TE-chimeric transcripts.
a, Scatter plot of the mean fraction of the target gene’s expression a chimeric transcript accounts for and the number of tumor samples where the transcript is present. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. b, TE class distribution and enrichment of tumor-specific TE-chimeric transcripts. The size of the dot represents the number of TE-chimeric transcripts belonging to that class. c, TE subfamily number and enrichment of candidates.
Extended Data Fig. 4 Association of TE-chimeric transcripts with methylation, chromatin accessibility, and driver mutations.
a, Scatter plot of global methylation versus number of candidates across samples in 21 tumor types. The regression line is plotted, and the Spearman rho coefficient is displayed on each plot. b, Heat map of ATAC-seq peak expression z-score (left) and transcript RNA expression in z-score (right) for 149 TE-chimeric transcripts. c, Dot plot of difference in number of candidates between samples that have a particular driver mutation and those that do not in a specific cancer type. Dots are ordered by difference. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. Supplementary Table 5 has exact p-values. d, Bar plot of the distribution of types of TP53 mutations across cancer types. e, Box and dot plots of global methylation levels of samples with (purple) and without (blue) TP53 mutations. *P < 0.05, **P < 0.01, ***P < 0.001. Wilcoxon rank sum test (two-sided) was used with Benjamini-Hochberg correction. The ‘N=’ lists the number of tumor samples in each boxplot. Exact p-values are the following: LGG: 9.11E-01, SARC: 1.30E-01, LUAD: 5.52E-03, HNSC: 8.03E-02, BRCA: 9.11E-01, COAD: 2.04E-02, STAD: 8.43E-12, LUSC: 1.40E-03, PRAD: 1.75E-03, UCEC: 1.75E-03, BLCA: 6.65E-01, LIHC: 1.75E-03, SKCM: 1.81E-01. f, Box plot of number of tumor samples in each cancer separated by TP53 mutation type. The ‘N=’ lists the number of tumor samples in each boxplot. All box plots use the following format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers.
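Several panels above apply a multiple-testing correction to Wilcoxon rank sum P values. The sketch below shows only the correction step, using the standard Benjamini-Hochberg step-up adjustment. The function name and the raw P values are hypothetical, and this is not presented as the code used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four independent comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum stops an adjusted value from exceeding that of any larger raw P value.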
Extended Data Fig. 5 Cell line expression of tumor-specific TE-chimeric transcripts.
a, Box plots with overlaid dot plots of the number of candidates expressed in each cancer cell line profiled across various tumor types. The ‘N=’ lists the number of cell lines in each boxplot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. b, Heatmap of CAGE-seq TPM expression of candidates across the 10 cancer cell lines we profiled. At the top is a dot plot showing how many TCGA tumor samples a candidate is present in.
Extended Data Fig. 6 Open reading frame prediction of tumor-specific TE-chimeric transcripts.
a, Pie charts showing how often the two translation methods (longest reading frame and Kozak context) agreed in terms of type of protein product (normal, truncated, chimeric normal, chimeric truncated, and frameshift) and protein sequence. b, Pie charts of number of resulting proteins with Pfam domain loss or transmembrane domain loss. c, Example of multiple Pfam domain loss in the candidate L1HS_COL28A1. d, Example of complete transmembrane domain loss in the candidate L2a_TRPM6. e, Histogram of size distribution of frameshift proteins. f, Histogram of size distribution of prepended amino acids in chimeric proteins. g, Dot plot of the proportion of all tumors in a specific cancer type covered by each candidate. The most shared candidate in each cancer is labeled.
Extended Data Fig. 7 CPTAC confirmation of TS-TEP protein sequences.
a, Dot plot of the number of TS-TEPs predicted to be in each of the BRCA samples available from CPTAC. The colored dots are samples where at least one TS-TEP candidate was validated. b, Same as (a) but for OV. c, Box plot with overlaid dot plot of the RNA expression of detected TS-TEPs across all the BRCA samples profiled. Highlighted dots are the candidate-sample combinations validated by mass spectrometry. d, Same as (c) but for OV. The ‘N=’ lists the number of expression values that are summarized by the box plot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. e, Diagram of transcript with the open reading frame location (top) and diagram of protein structure (bottom) for LTR33_SPINK4, which was found in the BRCA dataset. f, Same as (e) but for MLT1C_SPARCL1. g, Number of samples in each cancer type in which LTR33_SPINK4 is present. h, Same as (g) but for MLT1C_SPARCL1. i, HLA-binding affinity prediction of candidate TS-TEA protein sequences to the HLA alleles of the 35 BRCA samples where a TS-TEP was detected with mass spectrometry.
Extended Data Fig. 8 HLA-pulldown mass spectrometry detection of TS-TEAs and analysis of repetitiveness.
a, Bar plot of number of HLA-bound peptides detected from our mass spectrometry experiments. Each cell line has 4-5 replicates, and then there is a ‘Total’ bar that is the total number of unique peptides detected from all of the replicates. b, Replicates in which each of the TS-TEA peptides was detected in the DMS 53 cell line. c, CAGE expression of TE-chimeric transcripts that the detected peptides can come from. Two of the candidate peptides have sequences common to multiple candidates that are expressed in the DMS 53 cell line. d, Violin plots with superimposed dot plots of the expression in reads per million (RPM) of genomic loci that can be translated into the peptide LPSEMNPVP. The x-axis groups the data by study and then ‘candidate’ loci that come from TE-chimeric transcripts identified in the paper and ‘not candidate’ loci which are other genomic locations that can also make the same peptide. e, Same as (d) but for the peptide SPSSASLTL. f, Same as (d) but for the peptide SPSSASLAL. g, Scatter plot of all TS-TEP candidates where the x-axis is the log2-transformed number of antigenic 9-mers that can be generated from the candidate and the y-axis is the log2-transformed number of genomic loci encoding a 9-mer averaged across all antigenic 9-mers for a TS-TEP. The size of the dot is proportional to the number of tumor samples the TS-TEP is present in, and the color of the dot is the class of transposable element. h, Box plot and violin plot of the log2-transformed number of antigenic 9-mers that can be generated from each TS-TEP candidate (left) and the log2-transformed number of genomic loci encoding a 9-mer averaged across all antigenic 9-mers for each candidate (right) categorized by TE class. The ‘N=’ lists the number of candidates summarized by each box plot. Box plot format: center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range. 
i, For the candidate L1P2_STK31, a line graph in which the x-axis shows the identity and position of amino acids at the N-terminus of the protein and the y-axis shows the number of genomic loci that can be translated into the 9-mer. Each dot represents the 9-mer that ends with the amino acid below it on the x-axis. Below the plot is a schematic of the portion of the protein translated from the TE (subfamily is L1P2) and the canonical exons of the gene (STK31).
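Panel i indexes each 9-mer by the residue at which it ends. A minimal sketch of that indexing, assuming a plain sliding window over the protein sequence, is shown below. The function name and the toy sequence are hypothetical, and counting the genomic loci that encode each 9-mer would need a separate alignment step that is not shown.

```python
def kmers_by_end_position(protein, k=9):
    """Map each 1-based end position to the k-mer that ends at that residue."""
    return {end: protein[end - k:end] for end in range(k, len(protein) + 1)}


# Toy 12-residue sequence, not a real protein.
toy_protein = "ACDEFGHIKLMN"
ninemers = kmers_by_end_position(toy_protein)
for end, peptide in ninemers.items():
    print(end, toy_protein[end - 1], peptide)
```

A sequence of length n gives n - 8 overlapping 9-mers, so the first plotted position is residue 9.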
Extended Data Fig. 9 Repetitive nature of antigens and optimal antigen combination for TCGA.
a, Number of genomic loci, identified using BLAT, from which HLA-bound TE peptides reported in another study could originate. b, mTEC and TEC expression of all 2,297 candidates. c, Bar graph of the number of mTEC or TEC samples the candidate is detected in. Only those candidates found in multiple mTEC or TEC samples are shown. d, Bar graph of the number of tumor samples that each candidate is present in for the optimal combination of 20 TS-TEAs that would cover the most TCGA samples (bottom). At the top, a line plot showing the cumulative number of tumor samples covered by that number of candidates.
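Panel d describes a combination of 20 TS-TEAs chosen to cover the most TCGA samples, together with a cumulative coverage curve. The caption does not say how the combination was computed. One common way to approximate such a selection is greedy maximum coverage, sketched below. The greedy strategy, the function name, and the toy candidate-to-sample mapping are all assumptions for illustration, and a greedy pick is not guaranteed to be optimal.

```python
def greedy_cover(candidate_samples, max_candidates=20):
    """Pick candidates one at a time, each adding the most uncovered samples."""
    covered = set()
    chosen = []
    cumulative = []
    remaining = dict(candidate_samples)
    while remaining and len(chosen) < max_candidates:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining candidate covers a new sample
        covered |= gain
        chosen.append(best)
        cumulative.append(len(covered))
        del remaining[best]
    return chosen, cumulative


# Hypothetical candidates mapped to the tumor samples that express them.
toy = {
    "cand_A": {"s1", "s2", "s3"},
    "cand_B": {"s3", "s4"},
    "cand_C": {"s5"},
    "cand_D": {"s1", "s2"},
}
chosen, cumulative = greedy_cover(toy, max_candidates=20)
print(chosen, cumulative)
```

The `cumulative` list corresponds to the kind of curve drawn in the top panel: the number of samples covered after each additional candidate.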
Extended Data Fig. 10 Synthetic peptide validation of HLA-bound TS-TEAs.
a–d, For each detected TS-TEA peptide, the spectrum of the peptide found in our HLA-pulldown experiments is displayed on the top in blue, and the spectrum of the synthetic peptide is displayed on the bottom in red. The number of common peaks is listed.
Supplementary Methods and Figs. 1–5.
Supplementary Tables 1–17.
Source Data Fig. 6
Unprocessed western blot without annotation and unprocessed fluorescence microscope image.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shah, N.M., Jang, H.J., Liang, Y. et al. Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements. Nat Genet 55, 631–639 (2023). https://doi.org/10.1038/s41588-023-01349-3