Transposable elements (TEs) are an abundant and rich genetic resource of regulatory sequences1,2,3. Cryptic regulatory elements within TEs can be epigenetically reactivated in cancer to influence oncogenesis in a process termed onco-exaptation4. However, the prevalence and impact of TE onco-exaptation events across cancer types are poorly characterized. Here, we analyzed 7,769 tumors and 625 normal datasets from 15 cancer types, identifying 129 TE cryptic promoter-activation events involving 106 oncogenes across 3,864 tumors. Furthermore, we interrogated the AluJb-LIN28B candidate: the genetic deletion of the TE eliminated oncogene expression, while dynamic DNA methylation modulated promoter activity, illustrating the necessity and sufficiency of a TE for oncogene activation. Collectively, our results characterize the global profile of TE onco-exaptation and highlight this prevalent phenomenon as an important mechanism for promiscuous oncogene activation and ultimately tumorigenesis.
All custom scripts are available from the authors upon request.
Datasets generated and analyzed in this study are available on Gene Expression Omnibus under accession code GSE113946.
Xie, M. et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841 (2013).
Sundaram, V. et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
Rebollo, R., Romanish, M. T. & Mager, D. L. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012).
Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 1–21 (2016).
Botezatu, A. et al. in New Aspects in Molecular and Cellular Mechanisms of Human Carcinogenesis (ed. Bulgin, D.) 1–52 (InTech, 2016).
Pierotti, M. A., Sozzi, G. & Croce, C. M. Mechanisms of oncogene activation. in Holland-Frei Cancer Medicine (eds Kufe D. W. et al.) (BC Decker, 2003).
Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome—biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011).
Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007).
Babaian, A. et al. Onco-exaptation of an endogenous retroviral LTR drives IRF5 expression in Hodgkin lymphoma. Oncogene 35, 2542–2546 (2016).
Lamprecht, B. et al. Derepression of an endogenous long terminal repeat activates the CSF1R proto-oncogene in human lymphoma. Nat. Med. 16, 571–579 (2010).
Lock, F. E. et al. A novel isoform of IL-33 revealed by screening for transposable element promoted genes in human colorectal cancer. PLoS ONE 12, 1–30 (2017).
Scarfò, I. et al. Identification of a new subclass of ALK-negative ALCL expressing aberrant levels of ERBB4 transcripts. Blood 127, 221–233 (2016).
Wiesner, T. et al. Alternative transcription initiation leads to expression of a novel ALK isoform in cancer. Nature 526, 453–457 (2015).
Wolff, E. M. et al. Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet. 6, e1000917 (2010).
Liu, Y., Sun, J. & Zhao, M. ONGene: a literature-based database for human oncogenes. J. Genet. Genomics 44, 119–121 (2017).
Raskin, L. et al. Transcriptome profiling identifies HMGA2 as a biomarker of melanoma progression and prognosis. J. Invest. Dermatol. 133, 2585–2592 (2013).
Zhang, X. et al. SALL4: An emerging cancer biomarker and target. Cancer Lett. 357, 55–62 (2015).
Wang, T. et al. Aberrant regulation of the LIN28A/LIN28B and let-7 loop in human malignant tumors and its effects on the hallmarks of cancer. Mol. Cancer 14, 125 (2015).
Nguyen, L. H. et al. Lin28b is sufficient to drive liver cancer and necessary for its maintenance in murine models. Cancer Cell 26, 248–261 (2014).
Viswanathan, S. R. et al. Lin28 promotes transformation and is associated with advanced human malignancies. Nat. Genet. 41, 843–848 (2009).
Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Brocks, D. et al. DNMT and HDAC inhibitors induce cryptic transcription start sites encoded in long terminal repeats. Nat. Genet. 49, 1052–1060 (2017).
Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl Acad. Sci. USA 100, 15776–15781 (2003).
Suzuki, A. et al. Aberrant transcriptional regulations in cancers: genome, transcriptome and epigenome analysis of lung adenocarcinoma cell lines. Nucleic Acids Res. 42, 13557–13572 (2014).
Johnson, C. D. et al. The let-7 microRNA represses cell proliferation pathways in human cells. Cancer Res. 67, 7713–7722 (2007).
Newman, M. A., Thomson, J. M. & Hammond, S. M. Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA 14, 1539–1549 (2008).
Zhou, J., Ng, S. B. & Chng, W. J. LIN28/LIN28B: an emerging oncogenic driver in cancer stem cells. Int. J. Biochem. Cell Biol. 45, 973–978 (2013).
Moqtaderi, Z. et al. Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells. Nat. Struct. Mol. Biol. 17, 635–640 (2010).
Rice, P., Longden, L. & Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: Scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Guo, W. et al. A LIN28B tumor-specific transcript in cancer. Cell Rep. 22, 2094–2106 (2018).
Beketaev, I. et al. cis-regulatory control of Mesp1 expression by YY1 and SP1 during mouse embryogenesis. Dev. Dyn. 245, 379–387 (2016).
Heo, I. et al. Lin28 mediates the terminal uridylation of let-7 precursor microRNA. Mol. Cell 32, 276–284 (2008).
Viswanathan, S. R., Daley, G. Q. & Gregory, R. I. Selective blockade of microRNA processing by Lin28. Science 320, 97–100 (2008).
Rybak, A. et al. A feedback loop comprising lin-28 and let-7 controls pre-let-7 maturation during neural stem-cell commitment. Nat. Cell Biol. 10, 987–993 (2008).
Tanenbaum, M. E., Gilbert, L. A., Qi, L. S., Weissman, J. S. & Vale, R. D. A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell 159, 635–646 (2014).
Morita, S. et al. Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv-TET1 catalytic domain fusions. Nat. Biotechnol. 34, 1060–1065 (2016).
Huang, Y. H. et al. DNA epigenome editing using CRISPR-Cas SunTag-directed DNMT3A. Genome Biol. 18, 1–11 (2017).
Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22, 1760–1774 (2012).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25, 1–14 (2009).
Karolchik, D. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Kang, Y. J. et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Meth. 14, 959–962 (2017).
Bayat, A., Gaëta, B., Ignjatovic, A. & Parameswaran, S. Improved VCF normalization for accurate VCF comparison. Bioinformatics 33, 964–970 (2017).
Salimullah, M., Mizuho, S., Plessy, C. & Carninci, P. NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harb. Protoc. 6, 96–111 (2011).
Haberle, V. et al. CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Res. 43, e51 (2015).
Zhou, X. et al. The human epigenome browser at Washington University. Nat. Meth. 8, 989–990 (2011).
Wang, X. Primer sequences for 96 cancer-related miRNA assays. RNA 15, 716–723 (2009).
Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 17, 1–12 (2016).
Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat. Meth. 12, 982–988 (2015).
We would like to thank J. Hoisington-López and M.L. Jaeger from The Edison Family Center for Genome Sciences & Systems Biology (CGSSB) for assistance with sequencing; B. Koebbe and E. Martin from CGSSB for data processing; M. Savio, M. Patana and D. Schweppe from the Siteman Flow Cytometry core for FACS-related expertise; M. Goodell and Y. Huang for valuable expertise with the SunTag-DNMT3A system and L. Maggi (Washington University School of Medicine) for generously gifting the H838 cell line. This work was funded by NIH grant numbers 5R01HG007175, U24ES026699 and U01HG009391 and the American Cancer Society Research Scholar grant number RSG-14-049-01-DMC. H.S.J. was supported by a grant from NIGMS (no. T32 GM007067). A.Y.D. is supported by a grant from NHGRI (no. T32 HG000045). N.M.S. is a Howard Hughes Medical Institute (H.H.M.I.) Medical Research Fellow, http://www.hhmi.org/. The xenograft work cited in this publication was performed in a facility supported by NCRR grant number C06 RR015502. E.C.P. was supported by Postdoctoral Fellowship PF-17-201-01-TBG from the American Cancer Society.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Figure 1 RNA-seq computational pipeline detects numerous TE onco-exaptation events in 15 cancer types from TCGA.
a, Number of cases from various cancer types that were analyzed. b, Schematic of the computational pipeline describing how RNA-seq from TCGA was processed to identify onco-exaptation candidates.
a, The genomic locations of TEs that act as cryptic promoters for oncogenes across different cancer types. b, Distribution of TE classes across cancer types. The total number of unique TEs that contribute to onco-exaptation events is labeled on top. c, Distribution of each TE family that contributed to onco-exaptation across 15 cancer types.
Supplementary Figure 3 Oncogene expression profiles of the top ten onco-exaptation candidates across cancer types.
Oncogene expression of each tumor with and without an onco-exaptation event. Each gray dot represents a tumor while the red dots reveal whether the tumor is predicted to have an onco-exaptation event. Total expression is represented as log2(FPKM-UQ) provided by TCGA GDC (https://portal.gdc.cancer.gov/).
Kaplan-Meier curves for the eight cases in which a top-ten candidate was significantly prognostic in a cancer type (p < 0.05, two-sided log-rank test). The red line in each graph represents patients in whom the candidate was detected, and the blue line represents all patients in whom the candidate was not detected. In every case, presence of the candidate was associated with reduced overall survival. The number of biologically independent patients is listed within each plot.
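The survival comparison described in this caption can be illustrated with a minimal sketch. It is not the authors' script (custom scripts are stated to be available on request); the function names `kaplan_meier` and `logrank_test` and the toy survival times are hypothetical. It assumes two groups (candidate detected versus not detected), right-censored follow-up, and a two-sided log-rank test with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, two-sided p-value)."""
    # Pool both groups, tagging each observation (0 = group a, 1 = group b).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)             # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group a
        d = sum(1 for u, e, _ in data if u == t and e)       # events at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-a event count at time t.
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy data: group a (candidate detected) dies earlier than group b.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

For real cohorts, an established survival-analysis package is a safer choice than this hand-rolled version, which is quadratic in the number of patients and meant only to show the calculation.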
Supplementary Figure 5 Transcription-start-site verification of onco-exaptation candidates in the H727 lung cancer cell line.
a, WashU Epigenome browser (http://epigenomegateway.wustl.edu/browser/) view of CAGE-seq and mate-paired reads where the forward read initiates from AluJb and the reverse read ends in the gene body of LIN28B in H1299 and b, H838. c, WashU Epigenome browser view of H727 CAGE-seq and mate-paired reads where the forward read initiates from L1PA2 and the reverse read ends in the gene body of SYT1. d, WashU Epigenome browser view of H727 CAGE-seq and mate-paired reads where the forward read initiates from Tigger3a/MLT1D and the reverse read ends in the gene body of ARID3A. e, WashU Epigenome browser view of H727 CAGE-seq over SYT1 promoters. f, WashU Epigenome browser view of H727 CAGE-seq over ARID3A promoter.
Supplementary Figure 6 The AluJb TE is methylated in somatic tissue and is epigenetically dysregulated in lung cancer cell lines.
a, DNA methylome profiles of multiple somatic tissues from the Roadmap Epigenomics Project (http://www.roadmapepigenomics.org/) are displayed on the WashU Epigenome Browser (http://epigenomegateway.wustl.edu/browser/). b, Schematics showing DNA methylation levels of AluJb in different cancer cell lines. An alternative start codon present in AluJb generates a chimeric LIN28B peptide that lacks 3 amino acids contributed by exon 1, but has 22 novel amino acids prepended. c, Predicted amino acid sequence of AluJb-LIN28B protein. d, Cropped Western blot (repeated twice with similar results) representing the size difference between the AluJb-LIN28B protein and canonical LIN28B protein.
The CRISPR-mediated genetic breaks are illustrated for H1299 (left) and H838 (right) CRISPR clones. gRNA sequences are shown in different colors. AluJb1 KO clones were generated by deleting the genomic region between gRNA-A1 and gRNA-A3. AluJb2 KO clones were generated by deleting the genomic region between gRNA-A2 and gRNA-A3. LIN28BP KO clones were generated by deleting the genomic region between gRNA-L1 and gRNA-L2. Two independent clones were profiled for each set of CRISPR KOs. The genotyping gel results were replicated twice in independent experiments.
a, Promoter-luciferase (n = 3 independent experiments) results illustrating transcriptional activity of various TE arrangements in K562. Welch’s t-test was performed against the reverse-complement AluJb sequence (limited activity). b, Luciferase assays (n = 3 independent experiments) for mutagenized transcription factor motifs in K562. Relative luciferase activity of mutagenized vectors was compared to wild-type AluJb-P. c, Genotypes of K562 CRISPR KO clones for AluJb-P and LIN28BP deletions. Genotype check was repeated twice with similar results. d, Cropped Western blot for LIN28B in K562 CRISPR KO clones. K562 also expresses a smaller isoform of LIN28B that is not present in H1299 and H838. This experiment was repeated twice with similar results. e, Relative let-7a, let-7b and let-7g miRNA levels compared to wild-type in CRISPR-knockout clones of K562 as measured by qPCR (n = 3 independent experiments). f, CCK-8 growth assay measuring cell growth rate of K562 WT and CRISPR clones (n = 3 independent experiments). a,b, P-values were calculated using two-tailed Welch t-test. a,b,e,f, All data are represented as means ± SE.
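The summary statistics named in this caption (means ± SE and Welch's t-test) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the helper names `mean_se` and `welch_t` and the replicate values are hypothetical, and the sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom because the Python standard library has no Student t distribution.

```python
import math
from statistics import mean, variance

def mean_se(xs):
    """Return (mean, standard error) using the sample variance (n - 1)."""
    return mean(xs), math.sqrt(variance(xs) / len(xs))

def welch_t(xs, ys):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    vx = variance(xs) / len(xs)   # squared standard error of group x
    vy = variance(ys) / len(ys)   # squared standard error of group y
    t = (mean(xs) - mean(ys)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df

# Hypothetical relative luciferase readings, n = 3 replicates per construct.
control = [1.0, 2.0, 3.0]
construct = [2.0, 4.0, 6.0]
m, se = mean_se(construct)
t, df = welch_t(control, construct)
print(f"construct: {m:.2f} ± {se:.2f} (SE); t = {t:.3f}, df = {df:.2f}")
```

To obtain the two-tailed P value, `scipy.stats.ttest_ind(control, construct, equal_var=False)` performs the same test and returns the statistic together with the P value.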
Black boxes denote cropped images that are presented in the manuscript.
Supplementary Figures 1–9
Compiled oncogene and onco-exaptation list.
All TE-derived alternative isoforms of oncogenes.
Tumor-enriched onco-exaptation events and their distribution across 15 TCGA cancers.
Multiple TE-derived oncogene activation.
Top candidates in lung adenocarcinoma cell lines.
About this article
Cite this article
Jang, H.S., Shah, N.M., Du, A.Y. et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat Genet 51, 611–617 (2019). https://doi.org/10.1038/s41588-019-0373-3