  • Review Article

Computational tools to unmask transposable elements

Abstract

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, despite this demonstrated impact, many genomic studies still exclude TEs because their repetitive nature complicates analysis. Fortunately, a growing array of methods and software tools is being developed to handle them. This Review summarizes computational resources for TEs and highlights some of the challenges and remaining gaps in performing comprehensive genomic analyses that do not simply ‘mask’ repeats.

Fig. 1: Computational tools to analyse TEs.
Fig. 2: Discovery and annotation of TEs and repeats in genomes.
Fig. 3: Detection of polymorphic TE insertions.
Fig. 4: Functional impacts of TEs.

References

  1. McClintock, B. Mutable loci in maize. Carnegie Inst. Wash. 47, 155–169 (1948).

    Google Scholar 

  2. McClintock, B. The origin and behavior of mutable loci in maize. Proc. Natl Acad. Sci. USA 36, 344–355 (1950).

    CAS  PubMed  PubMed Central  Google Scholar 

  3. Kazazian, H. H. Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).

    CAS  PubMed  Google Scholar 

  4. Garrett, R. A., She, Q., Brügger, K., Faguy, D. & Redder, P. in Mobile DNA II (eds. Craig N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 1060–1073 (American Society of Microbiology, Washington, DC, 2002).

  5. Finnegan, D. J. Eukaryotic transposable elements and genome evolution. Trends Genet. 5, 103–107 (1989).

    CAS  PubMed  Google Scholar 

  6. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    CAS  PubMed  Google Scholar 

  7. Kronmiller, B. A. & Wise, R. P. TEnest: automated chronological annotation and visualization of nested plant transposable elements. Plant Physiol. 146, 45–59 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).

    CAS  PubMed  Google Scholar 

  9. Goodwin, T. J. & Poulter, R. T. The DIRS1 group of retrotransposons. Mol. Biol. Evol. 18, 2067–2082 (2001).

    CAS  PubMed  Google Scholar 

  10. Duval-Valentin, G., Marty-Cointin, B. & Chandler, M. Requirement of IS911 replication before integration defines a new bacterial transposition pathway. EMBO J. 23, 3897–3906 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  11. de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. & Pollock, D. D. Repetitive elements may comprise over two-thirds of the human genome. PLOS Genet. 7, e1002384 (2011).

    PubMed  PubMed Central  Google Scholar 

  12. Hata, K. & Sakaki, Y. Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene 189, 227–234 (1997).

    CAS  PubMed  Google Scholar 

  13. Slotkin, R. K. & Martienssen, R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8, 272–285 (2007).

    CAS  PubMed  Google Scholar 

  14. Malone, C. D. & Hannon, G. J. Small RNAs as guardians of the genome. Cell 136, 656–668 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Levin, H. L. & Moran, J. V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615–627 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  16. Ewing, A. D. & Kazazian, H. H. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 20, 1262–1270 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. Xing, J. et al. Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 19, 1516–1526 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  18. Hancks, D. C. & Kazazian, H. H. Roles for retrotransposon insertions in human disease. Mob. DNA 7, 9 (2016).

    PubMed  PubMed Central  Google Scholar 

  19. Huang, C. R. L., Burns, K. H. & Boeke, J. D. Active transposition in genomes. Annu. Rev. Genet. 46, 651–675 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  20. Emmons, S. W. & Yesner, L. High-frequency excision of transposable element Tc 1 in the nematode Caenorhabditis elegans is limited to somatic cells. Cell 36, 599–605 (1984).

    CAS  PubMed  Google Scholar 

  21. Fernandez, L., Torregrosa, L., Segura, V., Bouquet, A. & Martinez-Zapater, J. M. Transposon-induced gene activation as a mechanism generating cluster shape somatic variation in grapevine. Plant J. 61, 545–557 (2010).

    CAS  PubMed  Google Scholar 

  22. Miki, Y. et al. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643–645 (1992).

    CAS  PubMed  Google Scholar 

  23. van den Hurk, J. A. et al. L1 retrotransposition can occur early in human embryonic development. Hum. Mol. Genet. 16, 1587–1592 (2007).

    PubMed  Google Scholar 

  24. Muotri, A. R. et al. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature 435, 903–910 (2005).

    CAS  PubMed  Google Scholar 

  25. Coufal, N. G. et al. L1 retrotransposition in human neural progenitor cells. Nature 460, 1127–1131 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  26. Baillie, J. K. et al. Somatic retrotransposition alters the genetic landscape of the human brain. Nature 479, 534–537 (2011). This study is the first mapping of somatic retrotransposition events in the human brain and is performed with the capture-based polymorphic TE detection tool RC-seq.

    CAS  PubMed  PubMed Central  Google Scholar 

  27. Goodier, J. L. Retrotransposition in tumors and brains. Mob. DNA 5, 11 (2014).

    PubMed  PubMed Central  Google Scholar 

  28. Volff, J.-N. Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays 28, 913–922 (2006).

    CAS  PubMed  Google Scholar 

  29. Elbarbary, R. A., Lucas, B. A. & Maquat, L. E. Retrotransposons as regulators of gene expression. Science 351, aac7247 (2016).

    PubMed  PubMed Central  Google Scholar 

  30. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).

    CAS  PubMed  Google Scholar 

  31. Bourque, G. et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752–1762 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  32. Jacques, P.-É., Jeyakani, J. & Bourque, G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLOS Genet. 9, e1003504 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  33. Venuto, D. & Bourque, G. Identifying co-opted transposable elements using comparative epigenomics. Dev. Growth Differ. 60, 53–62 (2018).

    CAS  PubMed  Google Scholar 

  34. Kim, D.-S. et al. LINE FUSION GENES: a database of LINE expression in human genes. BMC Genomics 7, 139 (2006).

    PubMed  PubMed Central  Google Scholar 

  35. Mariner, P. D. et al. Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock. Mol. Cell 29, 499–509 (2008).

    CAS  PubMed  Google Scholar 

  36. Lubelsky, Y. & Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 24 (2016).

    PubMed  PubMed Central  Google Scholar 

  38. Lu, X. et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat. Struct. Mol. Biol. 21, 423–425 (2014).

    CAS  PubMed  Google Scholar 

  39. Naville, M. et al. Not so bad after all: retroviruses and long terminal repeat retrotransposons as a source of new genes in vertebrates. Clin. Microbiol. Infect. 22, 312–323 (2016).

    CAS  PubMed  Google Scholar 

  40. Lyon, M. F. Do LINEs have a role in X-chromosome inactivation? J. Biomed. Biotechnol. 2006, 59746 (2006).

    PubMed  PubMed Central  Google Scholar 

  41. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  42. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613–18618 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  43. Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015). This article presents the most comprehensive collection of TE consensus sequences from eukaryotic genomes, used with references 44 and 45 in RepeatMasker genome annotations.

    PubMed  PubMed Central  Google Scholar 

  44. Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013).

    CAS  PubMed  Google Scholar 

  45. Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016). References 44 and 45 present a eukaryotic TE consensus database with added HMM profiles used to improve genomic annotation of TEs.

    CAS  PubMed  Google Scholar 

  46. Wicker, T., Matthews, D. E. & Keller, B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 7, 561–562 (2002).

    CAS  Google Scholar 

  47. Chen, J., Hu, Q., Zhang, Y., Lu, C. & Kuang, H. P-MITE: a database for plant miniature inverted-repeat transposable elements. Nucleic Acids Res. 42, D1176–D1181 (2014).

    CAS  PubMed  Google Scholar 

  48. Copetti, D. et al. RiTE database: a resource database for genus-wide rice genomics and evolutionary biology. BMC Genomics 16, 538 (2015).

    PubMed  PubMed Central  Google Scholar 

  49. Bousios, A. et al. MASiVEdb: the sirevirus plant retrotransposon database. BMC Genomics 13, 158 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  50. Levy, A., Sela, N. & Ast, G. TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates. Nucleic Acids Res. 36, D47–D52 (2007).

    PubMed  PubMed Central  Google Scholar 

  51. Kim, T.-H., Jeon, Y.-J., Kim, W.-Y. & Kim, H.-S. HESAS: HERVs expression and structure analysis system. Bioinformatics 21, 1699–1700 (2005).

    CAS  PubMed  Google Scholar 

  52. Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44, D1141–D1147 (2016). This article presents a combination of multiple plant databases containing TE consensus sequences, annotated instances and polymorphic insertions.

    CAS  PubMed  Google Scholar 

  53. Murukarthick, J. et al. BrassicaTED - a public database for utilization of miniature transposable elements in Brassica species. BMC Res. Notes 7, 379 (2014).

    PubMed  PubMed Central  Google Scholar 

  54. Wang, J. et al. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum. Mutat. 27, 323–329 (2006).

    PubMed  PubMed Central  Google Scholar 

  55. Mir, A. A., Philippe, C. & Cristofari, G. euL1db: the European database of L1HS retrotransposon insertions in humans. Nucleic Acids Res. 43, D43–D47 (2015). The euL1db database contains the most comprehensive collection of polymorphic L1Hs insertions in human genomes.

    CAS  PubMed  Google Scholar 

  56. Gardner, E. J. et al. The mobile element locator tool (MELT): population-scale mobile element discovery and biology. Genome Res. 27, 1916–1929 (2017). This paper presents a great example of a polymorphic TE detection tool that also provides characterization of insertions, and it was used for the 1000 Genomes Project.

    CAS  PubMed  PubMed Central  Google Scholar 

  57. Daron, J. et al. Organization and evolution of transposable elements along the bread wheat chromosome 3B. Genome Biol. 15, 546 (2014).

    PubMed  PubMed Central  Google Scholar 

  58. Darzentas, N., Bousios, A., Apostolidou, V. & Tsaftaris, A. S. MASiVE: mapping and analysis of sirevirus elements in plant genome sequences. Bioinformatics 26, 2452–2454 (2010).

    CAS  PubMed  Google Scholar 

  59. Xiong, W., He, L., Lai, J., Dooner, H. K. & Du, C. HelitronScanner uncovers a large overlooked cache of helitron transposons in many plant genomes. Proc. Natl Acad. Sci. USA 111, 10263–10268 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  60. You, F. M., Cloutier, S., Shan, Y. & Ragupathy, R. LTR annotator: automated identification and annotation of LTR retrotransposons in plant genomes. IJBBB 5, 165–174 (2015).

    CAS  Google Scholar 

  61. Lee, H. et al. MGEScan: a Galaxy-based system for identifying retrotransposons in genomes. Bioinformatics 32, 2502–2504 (2016).

    CAS  PubMed  Google Scholar 

  62. Goecks, J., Nekrutenko, A., Taylor, J. & Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).

    PubMed  PubMed Central  Google Scholar 

  63. Steinbiss, S., Willhoeft, U., Gremme, G. & Kurtz, S. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res. 37, 7002–7013 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  64. Monat, C., Tando, N., Tranchant-Dubreuil, C. & Sabot, F. LTRclassifier: a website for fast structural LTR retrotransposons classification in plants. Mob Genet. Elements 6, e1241050 (2016).

    PubMed  PubMed Central  Google Scholar 

  65. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).

    CAS  PubMed  PubMed Central  Google Scholar 

  66. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl. 1), 351–358 (2005).

    Google Scholar 

  67. Smit, A. & Hubley, R. RepeatModeler 1.0.11. RepeatModeler http://www.repeatmasker.org/RepeatModeler/ (2018).

    CAS  PubMed  Google Scholar 

  68. Schaeffer, C. E., Figueroa, N. D., Liu, X. & Karro, J. E. phRAIDER: pattern-hunter based rapid ab initio detection of elementary repeats. Bioinformatics 32, i209–i215 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. Girgis, H. Z. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 16, 860 (2015).

    Google Scholar 

  70. Caballero, J., Smit, A. F. A., Hood, L. & Glusman, G. Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res. 42, e99 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  71. Flutre, T., Duprat, E., Feuillet, C. & Quesneville, H. Considering transposable element diversification in de novo annotation approaches. PLOS ONE 6, e16526 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  72. Novák, P., Neumann, P. & Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11, 378 (2010). This paper presents the first method to discover TEs in unassembled sequencing reads, on which many recent tools are based.

    PubMed  PubMed Central  Google Scholar 

  73. Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).

    PubMed  Google Scholar 

  74. Goubert, C. et al. De novo assembly and annotation of the Asian tiger mosquito (Aedes albopictus) repeatome with dnaPipeTE from raw genomic reads and comparative analysis with the yellow fever mosquito (Aedes aegypti). Genome Biol. Evol. 7, 1192–1205 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  75. Zytnicki, M., Akhunov, E. & Quesneville, H. Tedna: a transposable element de novo assembler. Bioinformatics 30, 2656–2658 (2014).

    CAS  PubMed  Google Scholar 

  76. Koch, P., Platzer, M. & Downie, B. R. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res. 42, e80 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  77. Chu, C., Nielsen, R. & Wu, Y. REPdenovo: inferring de novo repeat motifs from short sequence reads. PLOS ONE 11, e0150719 (2016).

    PubMed  PubMed Central  Google Scholar 

  78. Guo, R. et al. RepLong: de novo repeat identification using long read sequencing data. Bioinformatics 34, 1099–1107 (2018).

    PubMed  Google Scholar 

  79. Lerat, E. Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity 104, 520–533 (2010). This detailed review discusses bioinformatics tools for TE annotation and classification.

    CAS  PubMed  Google Scholar 

  80. Hoen, D. R. et al. A call for benchmarking transposable element annotation methods. Mob. DNA 6, 13 (2015).

    PubMed  PubMed Central  Google Scholar 

  81. Kazazian, H. H. et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164–166 (1988).

    CAS  PubMed  Google Scholar 

  82. Yu, F., Zingler, N., Schumann, G. & Strätling, W. H. Methyl-CpG-binding protein 2 represses LINE-1 expression and retrotransposition but not Alu transcription. Nucleic Acids Res. 29, 4493–4501 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  83. Muotri, A. R. et al. L1 retrotransposition in neurons is modulated by MeCP2. Nature 468, 443–446 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  84. Linheiro, R. S. & Bergman, C. M. Whole genome resequencing reveals natural target site preferences of transposable elements in Drosophila melanogaster. PLOS ONE 7, e30008 (2012). This article presents a polymorphic TE detection method for fly genomes that showed clade-specific TSD length and enrichment of target site palindromes for TIR and LTR element insertions.

    CAS  PubMed  PubMed Central  Google Scholar 

  85. Nelson, M. G., Linheiro, R. S. & Bergman, C. M. McClintock: an integrated pipeline for detecting transposable element insertions in whole genome shotgun sequencing data. G3 7, 2763–2778 (2017).

    PubMed  PubMed Central  Google Scholar 

  86. Kazazian, H. H. & Moran, J. V. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19, 19–24 (1998).

    CAS  PubMed  Google Scholar 

  87. Goodier, J. L. Transduction of 3′-flanking sequences is common in L1 retrotransposition. Hum. Mol. Genet. 9, 653–657 (2000).

    CAS  PubMed  Google Scholar 

  88. Nakagome, M. et al. Transposon insertion finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 15, 71 (2014).

    PubMed  PubMed Central  Google Scholar 

  89. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

    Google Scholar 

  90. Wu, J. et al. Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15, 795–715 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  91. Platzer, A., Nizhynska, V. & Long, Q. TE-Locate: a tool to locate and group transposable element occurrences using paired-end next-generation sequencing data. Biology 1, 395–410 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Zhuang, J., Wang, J. & Theurkauf, W. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. Tubio, J. M. C. et al. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345, 1251343 (2014). This paper presents a method for somatic TE insertion from short sequencing reads and shows extensive L1-driven transposition and 3′ transduction in cancer genomes.

    PubMed  PubMed Central  Google Scholar 

  94. Helman, E. et al. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053–1063 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  95. Hénaff, E., Zapata, L., Casacuberta, J. M. & Ossowski, S. Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution. BMC Genomics 16, 768 (2015).

    PubMed  PubMed Central  Google Scholar 

  96. Doucet, T. T. & Kazazian, H. H. Long interspersed element sequencing (L1-Seq): a method to identify somatic LINE-1 insertions in the human genome. Methods Mol. Biol. 1400, 79–93 (2016).

    PubMed  PubMed Central  Google Scholar 

  97. Tang, Z. et al. Human transposon insertion profiling: analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer. Proc. Natl Acad. Sci. USA 114, E733–E740 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  98. Solyom, S. et al. Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res. 22, 2328–2338 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  99. Erwin, J. A. et al. L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat. Neurosci. 19, 1583–1591 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  100. Witherspoon, D. J. et al. Mobile element scanning (ME-Scan) by targeted high-throughput sequencing. BMC Genomics 11, 410 (2010).

    PubMed  PubMed Central  Google Scholar 

  101. Kvikstad, E. M., Piazza, P., Taylor, J. C. & Lunter, G. A high throughput screen for active human transposable elements. BMC Genomics 19, 115 (2018).

    PubMed  PubMed Central  Google Scholar 

  102. Streva, V. A. et al. Sequencing, identification and mapping of primed L1 elements (SIMPLE) reveals significant variation in full length L1 elements between individuals. BMC Genomics 16, 220 (2015).

    PubMed  PubMed Central  Google Scholar 

  103. Disdero, E. & Filée, J. LoRTE: detecting transposon-induced genomic variants using low coverage PacBio long read sequences. Mob. DNA 8, 5 (2017).

    PubMed  PubMed Central  Google Scholar 

  104. Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 27, 677–685 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  105. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 14, 125 (2018).

    Google Scholar 

  106. Chaisson, M. J. P. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015). This study is a major effort to complete the human reference genome through long-read sequencing and a custom structural variant caller.

    CAS  PubMed  Google Scholar 

  107. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  108. Ewing, A. D. Transposable element detection from whole genome sequence data. Mob. DNA 6, 24 (2015).

    PubMed  PubMed Central  Google Scholar 

  109. Iskow, R. C. et al. Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 141, 1253–1261 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  110. Rishishwar, L., Mariño-Ramírez, L. & Jordan, I. K. Benchmarking computational tools for polymorphic transposable element detection. Brief Bioinform. 18, 908–918 (2017).

    PubMed  Google Scholar 

  111. Kofler, R. SimulaTE: simulating complex landscapes of transposable elements of populations. Bioinformatics 34, 1439 (2018).

    PubMed  PubMed Central  Google Scholar 

  112. Navarro, F. C. & Galante, P. A. RCPedia: a database of retrocopied genes. Bioinformatics 29, 1235–1237 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  113. Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015). This article presents the RNA-seq differential expression software TEtranscripts, shown to be the most accurate at identifying reads from repetitive elements.

    CAS  PubMed  PubMed Central  Google Scholar 

  114. Lanciano, S. et al. Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants. PLOS Genet. 13, e1006630 (2017).

    PubMed  PubMed Central  Google Scholar 

  115. Sundaresan, V. & Freeling, M. An extrachromosomal form of the Mu transposons of maize. Proc. Natl Acad. Sci. USA 84, 4924–4928 (1987).

    CAS  PubMed  PubMed Central  Google Scholar 

  116. Kamal, M., Xie, X. & Lander, E. S. A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl Acad. Sci. USA 103, 2740–2745 (2006).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. Lowe, C. B., Bejerano, G. & Haussler, D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl Acad. Sci. USA 104, 8005–8010 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  118. Chandrashekar, D. S., Dey, P. & Acharya, K. K. GREAM: a web server to short-list potentially important genomic repeat elements based on over-/under-representation in specific chromosomal locations, such as the gene neighborhoods, within or across 17 mammalian species. PLOS One 10, e0133647 (2015). This paper describes a tool that was developed to assess the impact of TEs on genes and biological pathways.

    PubMed  PubMed Central  Google Scholar 

  119. Criscione, S. W., Zhang, Y., Thompson, W., Sedivy, J. M. & Neretti, N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics 15, 583 (2014).

    PubMed  PubMed Central  Google Scholar 

  120. Han, B. W., Wang, W., Zamore, P. D. & Weng, Z. piPipes: a set of pipelines for piRNA and transposon analysis via small RNA-seq, RNA-seq, degradome- and CAGE-seq, ChIP-seq and genomic DNA sequencing. Bioinformatics 31, 593–595 (2015).

    CAS  PubMed  Google Scholar 

  121. Luteijn, M. J. & Ketting, R. F. PIWI-interacting RNAs: from generation to transgenerational epigenetics. Nat. Rev. Genet. 14, 523–534 (2013).

    CAS  PubMed  Google Scholar 

  122. Lerat, E., Fablet, M., Modolo, L., Lopez-Maestre, H. & Vieira, C. TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes. Nucleic Acids Res. 45, e17 (2017).

    PubMed  Google Scholar 

  123. Robberecht, C., Voet, T., Zamani Esteki, M., Nowakowska, B. A. & Vermeesch, J. R. Nonallelic homologous recombination between retrotransposable elements is a driver of de novo unbalanced translocations. Genome Res. 23, 411–418 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  124. He, D., Hormozdiari, F., Furlotte, N. & Eskin, E. Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 27, 1513–1520 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  125. Monlong, J. et al. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res. 7, 225 (2018).

    Google Scholar 

  126. Churakov, G. et al. A novel web-based TinT application and the chronology of the primate Alu retroposon activity. BMC Evol. Biol. 10, 376 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. Price, A. L., Eskin, E. & Pevzner, P. A. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 14, 2245–2252 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  128. Jiang, C., Chen, C., Huang, Z., Liu, R. & Verdier, J. ITIS, a bioinformatics tool for accurate identification of transposon insertion sites using next-generation sequencing data. BMC Bioinformatics 16, 72 (2015).

    PubMed  PubMed Central  Google Scholar 

  129. Daron, J. & Slotkin, R. K. EpiTEome: simultaneous detection of transposable element insertion sites and their DNA methylation levels. Genome Biol. 18, 91 (2017).

    PubMed  PubMed Central  Google Scholar 

  130. Glusman, G. et al. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLOS Comput. Biol. 2, e18 (2006).

    Google Scholar 

  131. Eddy, S. R. The C-value paradox, junk DNA and ENCODE. Curr. Biol. 22, R898–R899 (2012).

    CAS  PubMed  Google Scholar 

  132. Kellis, M. et al. Defining functional DNA elements in the human genome. Proc. Natl Acad. Sci. USA 111, 6131–6138 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  133. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

    PubMed  PubMed Central  Google Scholar 

  134. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). References 133 and 134 describe the aligners BowTie and BowTie2, which are capable of handling multi-mapped reads.

    CAS  PubMed  PubMed Central  Google Scholar 

  135. Thankaswamy-Kosalai, S., Sen, P. & Nookaew, I. Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics. Genomics 109, 186–191 (2017).

    CAS  PubMed  Google Scholar 

  136. Kahles, A., Behr, J. & Rätsch, G. MMR: a tool for read multi-mapper resolution. Bioinformatics 32, 770–772 (2016).

    CAS  PubMed  Google Scholar 

  137. Wang, J., Huda, A., Lunyak, V. V. & Jordan, I. K. A. Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags. Bioinformatics 26, 2501–2508 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  138. Chung, D. et al. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data. PLOS Comput. Biol. 7, e1002111 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  139. Wang, R. et al. LOcating non-unique matched tags (LONUT) to improve the detection of the enriched regions for ChIP-seq data. PLOS One 8, e67788 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  140. Nakato, R., Itoh, T. & Shirahige, K. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data. Genes Cells 18, 589–601 (2013). References 138–140 are examples of ChIP-seq peak callers developed to include multi-mapped reads in their analyses.

    CAS  PubMed  PubMed Central  Google Scholar 

  141. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).

  142. Anders, S., Pyl, P. T. & Huber, W. HTSeq — a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).

  143. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

  144. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).

  145. Boeke, J. D., Garfinkel, D. J., Styles, C. A. & Fink, G. R. Ty elements transpose through an RNA intermediate. Cell 40, 491–500 (1985).

  146. Eickbush, T. H. & Malik, H. S. in Mobile DNA II (eds. Craig, N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 1111–1144 (American Society for Microbiology, Washington, DC, 2002).

  147. Piégu, B., Bire, S., Arensburger, P. & Bigot, Y. A survey of transposable element classification systems — a call for a fundamental update to meet the challenge of their diversity and complexity. Mol. Phylogenet. Evol. 86, 90–109 (2015).

  148. Llorens, C. et al. The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 39, D70–D74 (2011).

  149. Vassetzky, N. S. & Kramerov, D. A. SINEBase: a database and tool for SINE analysis. Nucleic Acids Res. 41, D83–D89 (2013).

  150. Ma, B., Li, T., Xiang, Z. & He, N. MnTEdb, a collective resource for mulberry transposable elements. Database 2015, bav004 (2015).

  151. Shao, F., Wang, J., Xu, H. & Peng, Z. FishTEDB: a collective database of transposable elements identified in the complete genomes of fish. Database 2018, bax106 (2018).

  152. Xu, H. E. et al. BmTEdb: a collective database of transposable elements in the silkworm genome. Database 2013, bat055 (2013).

  153. Li, S.-F., Zhang, G.-J., Yuan, J.-H., Deng, C.-L. & Gao, W.-J. Repetitive sequences and epigenetic modification: inseparable partners play important roles in the evolution of plant sex chromosomes. Planta 243, 1083–1095 (2016).

  154. Roberts, A. P. et al. Revised nomenclature for transposable genetic elements. Plasmid 60, 167–173 (2008).

  155. Nakagawa, S. & Takahashi, M. U. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes. Database 2016, baw087 (2016).

  156. Paces, J., Pavlícek, A. & Paces, V. HERVd: database of human endogenous retroviruses. Nucleic Acids Res. 30, 205–206 (2002).

  157. Lappalainen, I. et al. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 41, D936–D941 (2013).

  158. Rahman, R. et al. Unique transposon landscapes are pervasive across Drosophila melanogaster genomes. Nucleic Acids Res. 43, 10655–10672 (2015).

  159. Ye, C., Ji, G. & Liang, C. detectMITE: a novel approach to detect miniature inverted repeat transposable elements in genomes. Sci. Rep. 6, 19688 (2016).

  160. Keane, T. M., Wong, K. & Adams, D. J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013).

  161. Hormozdiari, F. et al. Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 26, i350–i357 (2010).

  162. Gilly, A. et al. TE-Tracker: systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics 15, 377 (2014).

  163. Thung, D. T. et al. Mobster: accurate detection of mobile element insertions in next generation sequencing data. Genome Biol. 15, 488 (2014).

  164. Quadrana, L. et al. The Arabidopsis thaliana mobilome and its impact at the species level. eLife 5, e15716 (2016).

  165. David, M., Mustafa, H. & Brudno, M. Detecting Alu insertions from high-throughput sequencing data. Nucleic Acids Res. 41, e169 (2013).

  166. Tica, J. et al. Next-generation sequencing-based detection of germline L1-mediated transductions. BMC Genomics 17, 342 (2016).

  167. Du, C., Caronna, J., He, L. & Dooner, H. K. Computational prediction and molecular confirmation of Helitron transposons in the maize genome. BMC Genomics 9, 51 (2008).

  168. Fiston-Lavier, A. S., Barron, M. G., Petrov, D. A. & Gonzalez, J. T-Lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res. 43, e22 (2015).

  169. Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-Seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLOS Genet. 8, e1002487 (2012).

  170. Kofler, R. & Gómez-Sánchez, D. PoPoolationTE2: comparative population genomics of transposable elements using Pool-Seq. Mol. Biol. Evol. 33, 2759–2764 (2016).

  171. Cridland, J. M., Macdonald, S. J., Long, A. D. & Thornton, K. R. Abundance and distribution of transposable elements in two Drosophila QTL mapping resources. Mol. Biol. Evol. 30, 2311–2327 (2013).

  172. Chen, J., Wrightsman, T. R., Wessler, S. R. & Stajich, J. E. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ 5, e2942 (2017).

  173. Stuart, T. et al. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife 5, e20777 (2016).

Acknowledgements

This work was supported by a grant from the Canadian Institutes of Health Research (CIHR-MOP-115090). P.G.-P. is supported by the Programme de bourses de formation de doctorat du Fonds de Recherche Québec Santé (FRSQ-31874). G.B. is supported by the Fonds de Recherche Québec Santé (FRQS-25348). The authors also thank J.M.M. Monlong and the reviewers for very useful comments on the manuscript.

Reviewer information

Nature Reviews Genetics thanks E. Lerat, A. Smit and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

P.G.-P. and G.B. contributed to all aspects of the manuscript.

Corresponding author

Correspondence to Guillaume Bourque.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

NovoAlign: http://www.novocraft.com/products/novoalign

RepeatMasker: http://www.repeatmasker.org

Glossary

TE annotation

Assembled genomes are annotated to indicate which sequences are derived from transposable elements (TEs). The annotation reveals which families of TEs are present as well as the percentage of TE-derived sequences in a genome.
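As an illustration of the "percentage of TE-derived sequences" mentioned in this entry, the following is a minimal sketch that assumes the annotation has already been reduced to half-open (start, end) intervals on a single sequence. The function names and the toy coordinates are hypothetical and are not taken from any annotation tool. Overlapping annotations are merged first so that no base is counted twice.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single sequence


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_percentage(intervals: Iterable[Interval], genome_length: int) -> float:
    """Percentage of the sequence covered by at least one TE annotation."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return 100.0 * covered / genome_length


# Hypothetical annotations on a 1,000 bp toy sequence; the first three overlap.
toy_annotations = [(100, 300), (250, 400), (380, 450), (700, 800)]
toy_percent = te_percentage(toy_annotations, 1000)
```

In practice the intervals would come from an annotation file, and the calculation would be repeated per chromosome and per TE family.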

Repression mechanisms

Active transposable elements contain promoters that can initiate transcription. They are ‘silenced’ through various repression mechanisms to prevent transcription and further mobilization.

Polymorphic insertions

Individual transposable element instances that have not been fixed in a species genome and are present in some re-sequenced genomes but absent from others, such as the reference genome. Polymorphic insertions can be either germline or somatic.

Long interspersed nuclear element 1

(LINE-1; also known as L1). Autonomous class I transposons that encode reverse transcriptase, endonuclease and RNA-binding proteins that effectively mobilize RNA sequences and create novel insertions.

Alu

Primate-specific non-autonomous short interspersed nuclear element retrotransposon. Alus are highly abundant in primate genomes and can mobilize through the long interspersed nuclear element (LINE) retrotransposition machinery.

SVA

Primate-specific non-autonomous retrotransposons composed of fragments of Alus and retroviral long terminal repeat elements. The name SVA reflects this composite origin: short interspersed nuclear element (SINE), variable number tandem repeat (VNTR) and Alu sequences. They are mobilized by proteins of the long interspersed nuclear element (LINE) retrotransposition machinery.

Germline insertions

Transposable element insertions occurring in the parental germ line or during embryogenesis and shared between all cells of an individual.

Somatic insertions

Transposable element insertions occurring later in life in a specific tissue. These insertions are unique to one or a subset of cells of an individual.

Domesticated

(Also known as co-opted). A transposable element (TE) of which at least part of the sequence has been recruited to perform a specific function for the host, for example a TE-encoded protein that has taken on a physiological role. Such a recruited sequence is said to be domesticated.

Cis-regulation

A transposable element modulating the expression of nearby genes by having part of its sequence acting as a regulatory element.

Trans-regulation

A transposable element modulating cellular processes distant from its genomic location. Trans-regulation is mediated by its transcript or encoded protein.

Cryptons

DNA transposons initially identified in fungi that are characterized by the use of tyrosine recombinase instead of transposase for transposition.

Mavericks

Recently identified large eukaryotic DNA transposons (also known as Polintons) encoding up to ten proteins, including some that are similar to viral capsid proteins.

Multi-mapped reads

Sequencing reads that map ambiguously to more than one location on the reference genome. These are common for repetitive regions including transposable elements.
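The definition above can be made concrete with a small sketch. It assumes that an aligner has been run in a mode that reports every alignment of a read, and that the output has been reduced to (read_id, chromosome, position) tuples. This tuple layout and the function name are assumptions made for illustration and do not correspond to the interface of any particular aligner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Alignment = Tuple[str, str, int]  # (read_id, chromosome, 1-based position)


def split_by_mappability(
    alignments: Iterable[Alignment],
) -> Tuple[Set[str], Set[str]]:
    """Return (uniquely mapped read IDs, multi-mapped read IDs)."""
    locations: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for read_id, chrom, pos in alignments:
        # Collect the distinct genomic locations reported for each read.
        locations[read_id].add((chrom, pos))
    unique = {r for r, locs in locations.items() if len(locs) == 1}
    multi = {r for r, locs in locations.items() if len(locs) > 1}
    return unique, multi


# Hypothetical alignment records: read r2 is reported at three loci.
toy_alignments = [
    ("r1", "chr1", 100),
    ("r2", "chr1", 500),
    ("r2", "chr7", 9000),
    ("r3", "chr2", 42),
    ("r2", "chrX", 10),
]
unique_reads, multi_reads = split_by_mappability(toy_alignments)
```

Many pipelines discard the multi-mapped set at this point, which is one reason repetitive regions end up under-represented in downstream analyses.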

Long-read sequencing

Can be achieved by directly sequencing long DNA molecules, such as by using Pacific Biosciences or Oxford Nanopore Technologies platforms. Alternatively, linked-read sequencing of 10X Genomics generates synthetic long reads by barcoding long molecules of DNA and sequencing interspersed short fragments each retaining the originating long molecule barcode, effectively linking these short reads into longer contigs.

Consensus sequence

Nucleotide sequence representing an approximation of the active transposable element (TE) that gave rise to a group of interspersed repeats. They are generated from a multiple alignment of instances from the same TE family that have accumulated mutations over time.
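To illustrate how a consensus can be derived from a multiple alignment, the sketch below applies a plain majority rule to each alignment column. This is a deliberately simplified stand-in, not the procedure used by any specific repeat library or tool, which may apply more elaborate substitution models. The function name, the gap symbol and the toy alignment are assumptions.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Columns whose most frequent symbol is the gap are omitted from the result.
    Ties are broken in favour of bases over gaps, then alphabetically.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        best = min(counts, key=lambda b: (-counts[b], b == gap, b))
        if best != gap:
            consensus.append(best)
    return "".join(consensus)


# Hypothetical aligned copies of one TE family, each carrying its own mutations.
toy_copies = [
    "ACGTTGCA",
    "ACGATGCA",
    "ACG-TGCA",
    "ACGTTGAA",
]
toy_consensus = majority_consensus(toy_copies)
```

Because each copy accumulates different mutations, the column-wise majority tends to recover the ancestral base, which is why the consensus approximates the element that was once active.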

Miniature inverted repeat TE

(MITE). A recently coined name for non-autonomous short terminal inverted repeat DNA transposons.

Short interspersed nuclear elements

(SINEs). Non-autonomous elements whose propagation depends on the retrotransposition machinery of long interspersed nuclear elements (LINEs) in the same genome. They contain an internal RNA polymerase III promoter derived from a small RNA gene, usually a tRNA.

Nested repeats

Transposable elements (TEs) that inserted in or near previous TE insertions. These are very challenging to detect with short reads.

Terminal inverted repeats

(TIRs). Repeated sequences that are present in the terminal regions of various transposable elements (TEs) and are specific to particular TE families. These motifs contain transposase- and DNA-binding sites that are essential for transposition of the TE.

DIRS

Dictyostelium intermediate repeat sequence (DIRS) elements are classified as a superfamily of long terminal repeat transposons in the RepBase database and as a distinct order and superfamily in the 2007 Wicker unified transposable element classification system.

Target site duplications

(TSDs). Occur at insertion sites of most transposable elements (TEs), where the host genomic sequence is duplicated surrounding the new TE instance. As the two DNA strands are not cleaved at the exact same location, a few bases in between the two cuts will become duplicated during the second strand synthesis closing the insertion site.
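The duplication described above leaves the same short host sequence on both sides of the inserted element, which suggests a simple check. The sketch below assumes that the TE boundaries are known exactly (as half-open coordinates) and that the two copies of the TSD are still identical. Both assumptions are often violated in real data, where boundaries are approximate and older TSDs have diverged. The function name and default length limits are hypothetical.

```python
def find_tsd(
    sequence: str,
    te_start: int,
    te_end: int,
    max_len: int = 20,
    min_len: int = 2,
) -> str:
    """Return the longest exact repeat flanking the TE at [te_start, te_end).

    The left flank is read immediately upstream of the TE and the right flank
    immediately downstream. An empty string means no TSD was found.
    """
    if not (0 <= te_start < te_end <= len(sequence)):
        raise ValueError("TE coordinates must lie within the sequence")
    # The TSD cannot be longer than either available flank.
    longest = min(max_len, te_start, len(sequence) - te_end)
    for k in range(longest, min_len - 1, -1):
        left = sequence[te_start - k:te_start]
        right = sequence[te_end:te_end + k]
        if left == right:
            return left
    return ""


# Hypothetical locus: host flank, TSD copy, TE body, TSD copy, host flank.
toy_te = "CAGTACGTACTG"
toy_locus = "GGCA" + "TTAGC" + toy_te + "TTAGC" + "CCGA"
toy_tsd = find_tsd(toy_locus, te_start=9, te_end=9 + len(toy_te))
```

A tolerant variant would allow a small number of mismatches between the two flanks and would scan a few bases around the reported boundaries.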

Transduction

Host genomic DNA that is transcribed and inserted elsewhere in the genome through transposable element (TE) retrotransposition events. These duplicated sequences can be found with or without adjacent TE sequences as TE reverse transcription is often prematurely stopped.

SMRT

A PCR-free, single-molecule real-time (SMRT) sequencing platform from Pacific Biosciences that produces long reads. Reads are 1–60 kb in length, with a median of 10 kb.

ChIP-seq

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) consists of the capture and sequencing of DNA that is bound by a protein of interest, such as a transcription factor or modified histone.

RACE-seq

Rapid amplification of cDNA ends (RACE) is a method to amplify complete RNA molecules. RACE-seq involves sequencing the RNA molecules amplified through the RACE protocol. It is often used to detect novel transcripts.

CAGE-seq

Cap analysis of gene expression (CAGE) sequencing is a method to identify transcription start sites through sequencing of the 5′ ends of capped RNA transcripts.

PIWI-interacting RNA

(piRNA). piRNAs are short non-coding RNA molecules that bind to PIWI proteins. They are established as part of transposable element silencing mechanisms in animals.

About this article

Cite this article

Goerner-Potvin, P., Bourque, G. Computational tools to unmask transposable elements. Nat Rev Genet 19, 688–704 (2018). https://doi.org/10.1038/s41576-018-0050-x

  • DOI: https://doi.org/10.1038/s41576-018-0050-x
