Transposable elements drive widespread expression of oncogenes in human cancers

An Author Correction to this article was published on 16 April 2019

This article has been updated


Transposable elements (TEs) are an abundant and rich genetic resource of regulatory sequences[1–3]. Cryptic regulatory elements within TEs can be epigenetically reactivated in cancer to influence oncogenesis, a process termed onco-exaptation[4]. However, the prevalence and impact of TE onco-exaptation events across cancer types are poorly characterized. Here, we analyzed 7,769 tumors and 625 normal datasets from 15 cancer types, identifying 129 TE cryptic promoter-activation events involving 106 oncogenes across 3,864 tumors. Furthermore, we interrogated the AluJb-LIN28B candidate: genetic deletion of the TE eliminated oncogene expression, while dynamic DNA methylation modulated promoter activity, illustrating the necessity and sufficiency of a TE for oncogene activation. Collectively, our results characterize the global profile of TE onco-exaptation and highlight this prevalent phenomenon as an important mechanism for promiscuous oncogene activation and, ultimately, tumorigenesis.


Fig. 1: The TE onco-exaptation landscape across cancer types.
Fig. 2: TEs provide bona fide promoters for oncogenes in lung cancer cell lines.
Fig. 3: AluJb drives LIN28B expression and contributes to oncogenesis in lung cancer cell lines.
Fig. 4: Targeted DNA methylation dynamics uncover epigenetic control of AluJb promoter activity.

Code availability

All custom scripts are available from the authors upon request.

Data availability

Datasets generated and analyzed in this study are available on Gene Expression Omnibus under accession code GSE113946.

Change history

  • 16 April 2019

    In the version of this article initially published, grant PF-17-201-01-TBG from the American Cancer Society to author Erica C. Pehrsson was not included in the Acknowledgements. The error has been corrected in the HTML and PDF versions of the article.


References

  1. Xie, M. et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat. Genet. 45, 836–841 (2013).
  2. Sundaram, V. et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
  3. Rebollo, R., Romanish, M. T. & Mager, D. L. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012).
  4. Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 1–21 (2016).
  5. Botezatu, A. et al. in New Aspects in Molecular and Cellular Mechanisms of Human Carcinogenesis (ed. Bulgin, D.) 1–52 (InTech, 2016).
  6. Pierotti, M. A., Sozzi, G. & Croce, C. M. Mechanisms of oncogene activation. in Holland-Frei Cancer Medicine (eds Kufe, D. W. et al.) (BC Decker, 2003).
  7. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).
  8. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
  9. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
  10. Hon, G. C. et al. Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res. 22, 246–258 (2012).
  11. Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome—biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011).
  12. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007).
  13. Babaian, A. et al. Onco-exaptation of an endogenous retroviral LTR drives IRF5 expression in Hodgkin lymphoma. Oncogene 35, 2542–2546 (2016).
  14. Lamprecht, B. et al. Derepression of an endogenous long terminal repeat activates the CSF1R proto-oncogene in human lymphoma. Nat. Med. 16, 571–579 (2010).
  15. Lock, F. E. et al. A novel isoform of IL-33 revealed by screening for transposable element promoted genes in human colorectal cancer. PLoS ONE 12, 1–30 (2017).
  16. Scarfò, I. et al. Identification of a new subclass of ALK-negative ALCL expressing aberrant levels of ERBB4 transcripts. Blood 127, 221–233 (2016).
  17. Wiesner, T. et al. Alternative transcription initiation leads to expression of a novel ALK isoform in cancer. Nature 526, 453–457 (2015).
  18. Wolff, E. M. et al. Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet. 6, e1000917 (2010).
  19. Liu, Y., Sun, J. & Zhao, M. ONGene: a literature-based database for human oncogenes. J. Genet. Genomics 44, 119–121 (2017).
  20. Raskin, L. et al. Transcriptome profiling identifies HMGA2 as a biomarker of melanoma progression and prognosis. J. Invest. Dermatol. 133, 2585–2592 (2013).
  21. Zhang, X. et al. SALL4: An emerging cancer biomarker and target. Cancer Lett. 357, 55–62 (2015).
  22. Wang, T. et al. Aberrant regulation of the LIN28A/LIN28B and let-7 loop in human malignant tumors and its effects on the hallmarks of cancer. Mol. Cancer 14, 125 (2015).
  23. Nguyen, L. H. et al. Lin28b is sufficient to drive liver cancer and necessary for its maintenance in murine models. Cancer Cell 26, 248–261 (2014).
  24. Viswanathan, S. R. et al. Lin28 promotes transformation and is associated with advanced human malignancies. Nat. Genet. 41, 843–848 (2009).
  25. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
  26. Brocks, D. et al. DNMT and HDAC inhibitors induce cryptic transcription start sites encoded in long terminal repeats. Nat. Genet. 49, 1052–1060 (2017).
  27. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl Acad. Sci. USA 100, 15776–15781 (2003).
  28. Suzuki, A. et al. Aberrant transcriptional regulations in cancers: genome, transcriptome and epigenome analysis of lung adenocarcinoma cell lines. Nucleic Acids Res. 42, 13557–13572 (2014).
  29. Johnson, C. D. et al. The let-7 microRNA represses cell proliferation pathways in human cells. Cancer Res. 67, 7713–7722 (2007).
  30. Newman, M. A., Thomson, J. M. & Hammond, S. M. Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA 14, 1539–1549 (2008).
  31. Zhou, J., Ng, S. B. & Chng, W. J. LIN28/LIN28B: an emerging oncogenic driver in cancer stem cells. Int. J. Biochem. Cell Biol. 45, 973–978 (2013).
  32. Moqtaderi, Z. et al. Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells. Nat. Struct. Mol. Biol. 17, 635–640 (2010).
  33. Rice, P., Longden, I. & Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
  34. Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016).
  35. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: Scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
  36. Guo, W. et al. A LIN28B tumor-specific transcript in cancer. Cell Rep. 22, 2094–2106 (2018).
  37. Beketaev, I. et al. cis-regulatory control of Mesp1 expression by YY1 and SP1 during mouse embryogenesis. Dev. Dyn. 245, 379–387 (2016).
  38. Heo, I. et al. Lin28 mediates the terminal uridylation of let-7 precursor microRNA. Mol. Cell 32, 276–284 (2008).
  39. Viswanathan, S. R., Daley, G. Q. & Gregory, R. I. Selective blockade of microRNA processing by Lin28. Science 320, 97–100 (2008).
  40. Rybak, A. et al. A feedback loop comprising lin-28 and let-7 controls pre-let-7 maturation during neural stem-cell commitment. Nat. Cell Biol. 10, 987–993 (2008).
  41. Tanenbaum, M. E., Gilbert, L. A., Qi, L. S., Weissman, J. S. & Vale, R. D. A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell 159, 635–646 (2014).
  42. Morita, S. et al. Targeted DNA demethylation in vivo using dCas9-peptide repeat and scFv-TET1 catalytic domain fusions. Nat. Biotechnol. 34, 1060–1065 (2016).
  43. Huang, Y. H. et al. DNA epigenome editing using CRISPR-Cas SunTag-directed DNMT3A. Genome Biol. 18, 1–11 (2017).
  44. Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22, 1760–1774 (2012).
  45. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25, 1–14 (2009).
  46. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
  47. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
  48. Kang, Y. J. et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
  49. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  50. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
  51. Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Meth. 14, 959–962 (2017).
  52. Bayat, A., Gaëta, B., Ignjatovic, A. & Parameswaran, S. Improved VCF normalization for accurate VCF comparison. Bioinformatics 33, 964–970 (2017).
  53. Salimullah, M., Mizuho, S., Plessy, C. & Carninci, P. NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harb. Protoc. 6, 96–111 (2011).
  54. Haberle, V. et al. CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Res. 43, e51 (2015).
  55. Zhou, X. et al. The human epigenome browser at Washington University. Nat. Meth. 8, 989–990 (2011).
  56. Wang, X. Primer sequences for 96 cancer-related miRNA assays. RNA 15, 716–723 (2009).
  57. Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 17, 1–12 (2016).
  58. Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat. Meth. 12, 982–988 (2015).

Acknowledgements

We would like to thank J. Hoisington-López and M. L. Jaeger from The Edison Family Center for Genome Sciences & Systems Biology (CGSSB) for assistance with sequencing; B. Koebbe and E. Martin from CGSSB for data processing; M. Savio, M. Patana and D. Schweppe from the Siteman Flow Cytometry core for FACS-related expertise; M. Goodell and Y. Huang for valuable expertise with the SunTag-DNMT3A system; and L. Maggi (Washington University School of Medicine) for generously gifting the H838 cell line. This work was funded by NIH grant numbers 5R01HG007175, U24ES026699 and U01HG009391 and the American Cancer Society Research Scholar grant number RSG-14-049-01-DMC. H.S.J. was supported by a grant from NIGMS (no. T32 GM007067). A.Y.D. is supported by a grant from NHGRI (no. T32 HG000045). N.M.S. is a Howard Hughes Medical Institute (HHMI) Medical Research Fellow. The xenograft work described in this publication was performed in a facility supported by NCRR grant number C06 RR015502. E.C.P. was supported by Postdoctoral Fellowship PF-17-201-01-TBG from the American Cancer Society.

Author information

Contributions

H.S.J., N.M.S. and T.W. conceived and implemented the study. N.M.S., H.S.J., E.C.P., D.L. and T.W. contributed to the computational analysis. H.S.J. generated transcriptomic and epigenomic profiles of cell lines. H.S.J., X.X. and D.Z. performed the CRISPR-mediated deletion experiments. H.S.J. and Z.Z.D. performed the promoter-luciferase, motif mutagenesis and let-7 qPCR experiments. H.S.J. and A.Y.D. performed the growth and migration assays. H.S.J., A.Y.D., D.O. and J.I.G. performed the xenograft experiments. H.S.J., P.M.G. and S.K. performed the targeted methylation experiments. H.S.J. performed the rescue experiments. The manuscript was prepared and revised by H.S.J., N.M.S. and T.W. with input from all authors.

Corresponding author

Correspondence to Ting Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 RNA-seq computational pipeline detects numerous TE onco-exaptation events in 15 cancer types from TCGA.

a, Number of cases from various cancer types that were analyzed. b, Schematic of the computational pipeline describing how RNA-seq from TCGA was processed to identify onco-exaptation candidates.
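The central test in a pipeline like the one schematized in panel b is whether a transcript's start site falls inside an annotated TE. The sketch below is not the authors' pipeline (their scripts are available only on request). It assumes transcripts have already been assembled (for example with StringTie) and TEs annotated (for example with RepeatMasker), and it uses a hypothetical `Interval` record, a hypothetical `<GENE>.<isoform>` naming scheme and toy coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic record; 0-based, half-open coordinates."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str  # "+" or "-"


def tss(tx: Interval) -> int:
    """First transcribed base: leftmost base on "+", rightmost base on "-"."""
    return tx.start if tx.strand == "+" else tx.end - 1


def te_initiated(transcripts, tes, oncogenes):
    """Return (transcript, gene, TE) for oncogene transcripts whose TSS lies inside a TE."""
    hits = []
    for tx in transcripts:
        gene = tx.name.split(".")[0]  # assumption: "<GENE>.<isoform>" transcript names
        if gene not in oncogenes:
            continue
        pos = tss(tx)
        for te in tes:
            if te.chrom == tx.chrom and te.start <= pos < te.end:
                hits.append((tx.name, gene, te.name))
    return hits


# Toy example with made-up coordinates
tes = [Interval("chr6", 50, 150, "AluJb", "+"), Interval("chr1", 990, 1010, "L1PA2", "-")]
transcripts = [
    Interval("chr6", 100, 500, "LIN28B.alt", "+"),
    Interval("chr1", 200, 1000, "SYT1.alt", "-"),
    Interval("chr6", 120, 400, "GAPDH.1", "+"),
]
print(te_initiated(transcripts, tes, {"LIN28B", "SYT1"}))
```

A production pipeline would replace the nested loop with an interval index (or a tool such as bedtools) and would then compare detection in tumors against the normal samples before calling a candidate tumor-enriched.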

Supplementary Figure 2 TE locations and annotations that are implicated in onco-exaptation events.

a, Genomic locations of TEs that act as cryptic promoters for oncogenes across cancer types. b, Distribution of TE classes across cancer types; the total number of unique TEs contributing to onco-exaptation events is labeled on top. c, Distribution of each TE family that contributed to onco-exaptation across the 15 cancer types.
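The bar heights in panel b count unique TEs, so a TE that drives onco-exaptation in many tumors of one cancer type is counted once. A minimal sketch of that tally, assuming one hypothetical `(cancer_type, te_id, te_class)` record per detected event; the identifiers below are invented for illustration.

```python
from collections import defaultdict


def te_class_distribution(events):
    """events: (cancer_type, te_id, te_class) tuples, one per onco-exaptation event.

    Returns the number of unique TEs per class in each cancer type, plus the
    per-cancer total of unique TEs (the number labeled above each bar).
    """
    by_cancer = defaultdict(lambda: defaultdict(set))
    for cancer, te_id, te_class in events:
        by_cancer[cancer][te_class].add(te_id)  # sets collapse repeated events of one TE
    dist = {c: {cls: len(ids) for cls, ids in classes.items()} for c, classes in by_cancer.items()}
    totals = {c: len(set().union(*classes.values())) for c, classes in by_cancer.items()}
    return dist, totals


# Toy events; TE identifiers are hypothetical
events = [
    ("LUAD", "AluJb_1", "SINE"),
    ("LUAD", "AluJb_1", "SINE"),
    ("LUAD", "MLT1D_7", "LTR"),
    ("BRCA", "L1PA2_3", "LINE"),
]
dist, totals = te_class_distribution(events)
print(dist, totals)
```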

Supplementary Figure 3 Oncogene expression profiles of the top ten onco-exaptation candidates across cancer types.

Oncogene expression in each tumor with and without an onco-exaptation event. Each gray dot represents a tumor; red dots mark tumors predicted to have an onco-exaptation event. Total expression is represented as log2(FPKM-UQ) provided by the TCGA GDC.
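The GDC documents FPKM-UQ as (RC_g × 10^9) / (RC_g75 × L), where RC_g is the gene's read count, RC_g75 the 75th-percentile gene read count in the sample and L the gene length in base pairs. The sketch below applies that formula to a toy count table. Two choices are assumptions, not GDC specifications: the upper quartile is taken over all supplied genes using Python's inclusive quantile method (the GDC's exact gene set may differ), and a pseudocount of 1 is added before the log2 transform to keep zero values finite.

```python
import math
import statistics


def fpkm_uq(counts, lengths):
    """FPKM-UQ = count * 1e9 / (upper-quartile count * gene length in bp)."""
    # Assumption: upper quartile over all supplied genes, inclusive quantile method
    uq = statistics.quantiles(counts.values(), n=4, method="inclusive")[2]
    return {g: counts[g] * 1e9 / (uq * lengths[g]) for g in counts}


def log2_expr(values, pseudocount=1.0):
    """log2 transform; the pseudocount is an assumption that keeps zeros finite."""
    return {g: math.log2(v + pseudocount) for g, v in values.items()}


counts = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 500}  # toy read counts
lengths = {g: 1000 for g in counts}                          # toy gene lengths (bp)
uq_norm = fpkm_uq(counts, lengths)
print(uq_norm["A"], round(log2_expr(uq_norm)["A"], 3))
```

Normalizing by the upper quartile rather than by total reads makes the values less sensitive to a few extremely highly expressed genes, which is the usual motivation for UQ scaling.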

Supplementary Figure 4 Overall survival effect of onco-exaptation candidates.

Kaplan-Meier curves for the eight cases in which a top-10 candidate was significantly prognostic in a cancer type (P < 0.05, two-sided log-rank test). In each graph, the red line represents patients in whom the candidate was detected and the blue line represents all patients in whom it was not detected. In all eight cases, presence of the candidate was associated with worse overall survival. The number of biologically independent patients is listed within each plot.
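Survival libraries compute these curves and P values directly; the standard-library sketch below only shows what the Kaplan-Meier estimator and the two-group log-rank test do. Inputs are hypothetical survival times with event indicators (1 = death, 0 = censored). Under the null hypothesis the log-rank statistic is approximately chi-square with 1 degree of freedom, whose upper tail equals erfc(sqrt(chi2 / 2)), giving the two-sided P value.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: [(t, S(t))] at each distinct event time.

    events[i] is 1 if patient i died at times[i], 0 if censored there.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for x in times if x >= t)  # still under follow-up at t
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, two-sided p) with 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e} | {t for t, e in zip(times_b, events_b) if e}
    )
    o_minus_e = var = 0.0
    for t in event_times:
        n1 = sum(1 for x in times_a if x >= t)
        n2 = sum(1 for x in times_b if x >= t)
        d1 = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d2 = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group a
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)  # hypergeometric variance
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Toy cohorts: candidate present (a) vs not detected (b); not data from the study
present_t, present_e = [5, 8, 12, 14, 20], [1, 1, 1, 0, 1]
absent_t, absent_e = [10, 18, 25, 30, 40, 41], [1, 0, 1, 0, 1, 0]
print(kaplan_meier(present_t, present_e))
print(logrank(present_t, present_e, absent_t, absent_e))
```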

Supplementary Figure 5 Transcription-start-site verification of onco-exaptation candidates in the H727 lung cancer cell line.

a, WashU Epigenome Browser view of CAGE-seq and mate-paired reads in which the forward read initiates from AluJb and the reverse read ends in the gene body of LIN28B in H1299. b, The same view for H838. c, WashU Epigenome Browser view of H727 CAGE-seq and mate-paired reads in which the forward read initiates from L1PA2 and the reverse read ends in the gene body of SYT1. d, WashU Epigenome Browser view of H727 CAGE-seq and mate-paired reads in which the forward read initiates from Tigger3a/MLT1D and the reverse read ends in the gene body of ARID3A. e, WashU Epigenome Browser view of H727 CAGE-seq over the SYT1 promoters. f, WashU Epigenome Browser view of H727 CAGE-seq over the ARID3A promoter.
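The mate-pair evidence in these panels amounts to counting read pairs whose forward read starts inside the TE and whose reverse read ends inside the oncogene's gene body. A minimal sketch, assuming a hypothetical `ReadPair` record already extracted from alignments on a forward-strand locus, with toy coordinates; a minus-strand locus would swap the roles of the two ends.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one mate pair on a forward-strand locus."""
    chrom: str
    r1_start: int  # 5' end of the forward read, i.e. the putative TSS
    r2_end: int    # 3'-most aligned base of the reverse read


def inside(chrom, pos, interval):
    """interval is (chrom, start, end), 0-based half-open."""
    return chrom == interval[0] and interval[1] <= pos < interval[2]


def supporting_pairs(pairs, te, gene_body):
    """Count pairs that start inside the TE and end inside the gene body."""
    return sum(
        1 for p in pairs
        if inside(p.chrom, p.r1_start, te) and inside(p.chrom, p.r2_end, gene_body)
    )


te = ("chr6", 100, 400)           # toy TE interval
gene_body = ("chr6", 1000, 5000)  # toy gene-body interval
pairs = [
    ReadPair("chr6", 150, 1200),  # starts in TE, ends in gene body: counted
    ReadPair("chr6", 150, 800),   # ends before the gene body
    ReadPair("chr6", 50, 1200),   # starts upstream of the TE
    ReadPair("chr1", 150, 1200),  # different chromosome
]
print(supporting_pairs(pairs, te, gene_body))
```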

Supplementary Figure 6 The AluJb TE is methylated in somatic tissue and is epigenetically dysregulated in lung cancer cell lines.

a, DNA methylome profiles of multiple somatic tissues from the Roadmap Epigenomics Project, displayed on the WashU Epigenome Browser. b, Schematics showing DNA methylation levels of AluJb in different cancer cell lines. An alternative start codon in AluJb generates a chimeric LIN28B protein that lacks 3 amino acids contributed by exon 1 but has 22 novel amino acids prepended. c, Predicted amino acid sequence of the AluJb-LIN28B protein. d, Cropped Western blot (repeated twice with similar results) showing the size difference between the AluJb-LIN28B protein and the canonical LIN28B protein.
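Methylation levels such as those summarized in panel b are typically derived from bisulfite-sequencing read counts. The sketch below assumes input in the Bismark coverage-file layout (tab-separated chromosome, start, end, methylation percentage, methylated count, unmethylated count, 1-based positions). Pooling reads across CpGs, the `min_depth` filter and the toy region are assumptions; averaging per-CpG percentages is an equally common alternative.

```python
import math


def region_methylation(cov_lines, chrom, start, end, min_depth=1):
    """Read-weighted methylation of CpGs in [start, end] (1-based, inclusive).

    cov_lines: tab-separated Bismark coverage lines
    (chrom, start, end, methylation %, count methylated, count unmethylated).
    """
    meth = unmeth = 0
    for line in cov_lines:
        c, pos, _end, _pct, m, u = line.rstrip("\n").split("\t")
        pos, m, u = int(pos), int(m), int(u)
        if c == chrom and start <= pos <= end and m + u >= min_depth:
            meth += m
            unmeth += u
    total = meth + unmeth
    return meth / total if total else math.nan  # nan when no CpG passes the filter


cov = [
    "chr6\t100\t100\t80.0\t8\t2",
    "chr6\t110\t110\t50.0\t5\t5",
    "chr6\t500\t500\t0.0\t0\t10",  # outside the toy region
]
print(region_methylation(cov, "chr6", 90, 200))  # (8 + 5) / 20
```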

Supplementary Figure 7 Genotypes of AluJb-P and LIN28BP CRISPR-deleted clones.

The CRISPR-mediated deletion breakpoints are illustrated for H1299 (left) and H838 (right) CRISPR clones, with gRNA sequences shown in different colors. AluJb1 KO clones were generated by deleting the genomic region between gRNA-A1 and gRNA-A3, AluJb2 KO clones by deleting the region between gRNA-A2 and gRNA-A3, and LIN28BP KO clones by deleting the region between gRNA-L1 and gRNA-L2. Two independent clones were profiled for each set of CRISPR KOs. The genotyping gel results were replicated twice in independent experiments.
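For SpCas9, the blunt double-strand break falls 3 bp upstream (5') of the NGG PAM, so the expected size of a two-guide deletion, and of the corresponding genotyping PCR band, can be estimated from guide coordinates. The sketch assumes perfect end-to-end joining of the two cut sites (real clones often carry small indels, which is why deletions are genotyped); guide and primer coordinates are hypothetical.

```python
def cas9_cut(protospacer_start: int, strand: str) -> int:
    """Return the 0-based + strand index of the first base after the SpCas9 cut.

    protospacer_start is the leftmost + strand coordinate of the 20-nt protospacer
    (PAM excluded). The cut lies 3 bp from the PAM: for a "+" guide the PAM is on
    the right, for a "-" guide it is on the left (as CCN on the + strand).
    """
    if strand == "+":
        return protospacer_start + 17
    if strand == "-":
        return protospacer_start + 3
    raise ValueError("strand must be '+' or '-'")


def expected_deletion(guide_a, guide_b, amplicon_start, amplicon_end):
    """Deleted length and wild-type/deleted PCR band sizes, assuming clean joining."""
    left, right = sorted((cas9_cut(*guide_a), cas9_cut(*guide_b)))
    deleted = right - left
    wild_type = amplicon_end - amplicon_start
    return deleted, wild_type, wild_type - deleted


# Hypothetical guides flanking a TE and primers flanking both cut sites
print(expected_deletion((1000, "+"), (1500, "-"), 900, 1700))
```

Comparing the predicted wild-type and deleted band sizes with the gel is a quick first check before sequencing the junction.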

Supplementary Figure 8 K562 control experiments for AluJb-LIN28B candidate validation.

a, Promoter-luciferase results (n = 3 independent experiments) illustrating the transcriptional activity of various TE arrangements in K562. Welch's t-test was performed against the reverse-complement AluJb sequence, which has limited activity. b, Luciferase assays (n = 3 independent experiments) for mutagenized transcription factor motifs in K562. Relative luciferase activity of mutagenized vectors was compared with that of wild-type AluJb-P. c, Genotypes of K562 CRISPR KO clones for AluJb-P and LIN28BP deletions. The genotype check was repeated twice with similar results. d, Cropped Western blot for LIN28B in K562 CRISPR KO clones. K562 also expresses a smaller isoform of LIN28B that is not present in H1299 and H838. This experiment was repeated twice with similar results. e, Relative let-7a, let-7b and let-7g miRNA levels in K562 CRISPR KO clones compared with wild-type, as measured by qPCR (n = 3 independent experiments). f, CCK-8 assay measuring the growth rate of K562 wild-type and CRISPR KO clones (n = 3 independent experiments). a,b, P values were calculated using a two-tailed Welch's t-test. a,b,e,f, All data are presented as means ± SE.
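With n = 3 replicates per condition, the means ± SE and the two-tailed Welch's t-test named in this legend are easy to compute by hand. The standard-library sketch below uses the Welch-Satterthwaite degrees of freedom and obtains the P value by numerically integrating the Student's t density with Simpson's rule; the luciferase values are invented for illustration. In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the same statistic.

```python
import math
import statistics


def mean_se(xs):
    """Mean and standard error (sample SD / sqrt(n)), as in 'means ± SE'."""
    return statistics.mean(xs), statistics.stdev(xs) / math.sqrt(len(xs))


def t_pdf(x, df):
    """Student's t density; df may be non-integer (Welch-Satterthwaite)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def welch_ttest(a, b, steps=2000):
    """Two-tailed Welch's t-test; returns (t, df, p). steps must be even (Simpson's rule)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # p = 1 - 2 * integral of the t density from 0 to |t|, by composite Simpson's rule
    h = abs(t) / steps
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    p = 1 - 2 * s * h / 3
    return t, df, max(p, 0.0)


# Toy relative luciferase values (n = 3 each); not data from the study
mutant = [0.42, 0.51, 0.47]
wild_type = [1.00, 0.93, 1.08]
print(mean_se(mutant), welch_ttest(mutant, wild_type))
```

Welch's variant is a sensible default for such small replicate sets because it does not assume that the two conditions share a variance.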

Supplementary Figure 9 Uncropped western blots and gel images.

Black boxes denote cropped images that are presented in the manuscript.

Supplementary Information

Supplementary Information

Supplementary Figures 1–9

Reporting Summary

Supplementary Table 1

Compiled oncogene and onco-exaptation list.

Supplementary Table 2

All TE-derived alternative isoforms of oncogenes.

Supplementary Table 3

Tumor-enriched onco-exaptation events and their distribution across 15 TCGA cancers.

Supplementary Table 4

Multiple TE-derived oncogene activation.

Supplementary Table 5

Top candidates in lung adenocarcinoma cell lines.

Supplementary Table 6

Primer sequences.

About this article

Cite this article

Jang, H.S., Shah, N.M., Du, A.Y. et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat Genet 51, 611–617 (2019).
