Review Article | Published:

Computational tools to unmask transposable elements

Abstract

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, despite this demonstrated impact, many genomic studies still exclude TEs because their repetitive nature complicates analysis. Fortunately, a growing array of methods and software tools is being developed to accommodate them. This Review summarizes computational resources for TEs and highlights the challenges and gaps that remain in performing comprehensive genomic analyses that do not simply ‘mask’ repeats.

Acknowledgements

This work was supported by a grant from the Canadian Institutes of Health Research (CIHR-MOP-115090). P.G.-P. is supported by the Programme de bourses de formation de doctorat du Fonds de Recherche Québec Santé (FRSQ-31874). G.B. is supported by the Fonds de Recherche Québec Santé (FRQS-25348). The authors also thank J.M.M. Monlong and the reviewers for very useful comments on the manuscript.

Reviewer information

Nature Reviews Genetics thanks E. Lerat, A. Smit and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

P.G.-P. and G.B. contributed to all aspects of the manuscript.

Correspondence to Guillaume Bourque.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

NovoAlign: http://www.novocraft.com/products/novoalign

RepeatMasker: http://www.repeatmasker.org

Glossary

TE annotation

Assembled genomes are annotated to indicate which sequences are derived from transposable elements (TEs). The annotation reveals which families of TEs are present as well as the percentage of TE-derived sequences in a genome.

Repression mechanisms

Active transposable elements contain promoters that can initiate transcription. They are ‘silenced’ through various repression mechanisms to prevent transcription and further mobilization.

Polymorphic insertions

Individual transposable element instances that have not been fixed in a species genome and are present in some re-sequenced genomes but absent from others, such as the reference genome. Polymorphic insertions can be either germline or somatic.

Long interspersed nuclear element 1

(LINE-1; also known as L1). Autonomous class I transposons that encode reverse transcriptase, endonuclease and RNA-binding proteins that effectively mobilize RNA sequences and create novel insertions.

Alu

Primate-specific non-autonomous short interspersed nuclear element retrotransposon. Alus are highly abundant in primate genomes and can mobilize through the long interspersed nuclear element (LINE) retrotransposition machinery.

SVA

Primate-specific non-autonomous retrotransposons composed of fragments of Alus and retroviral long terminal repeat elements. The name SVA reflects their composite origin: short interspersed nuclear element (SINE), variable number tandem repeat (VNTR) and Alu sequences. They are mobilized by proteins encoded by long interspersed nuclear elements (LINEs).

Germline insertions

Transposable element insertions occurring in the parental germ line or during embryogenesis and shared between all cells of an individual.

Somatic insertions

Transposable element insertions occurring later in life in a specific tissue. These insertions are unique to one or a subset of cells of an individual.

Domesticated

(Also known as co-opted). Describes a transposable element (TE) for which at least part of the sequence has been recruited to perform a specific function for the host, for example by providing a TE-encoded protein with a physiological function.

Cis-regulation

Modulation of the expression of nearby genes by a transposable element when part of its sequence acts as a regulatory element.

Trans-regulation

Modulation by a transposable element of cellular processes distant from its genomic location. Trans-regulation occurs via the element's transcript or encoded protein.

Cryptons

DNA transposons initially identified in fungi that are characterized by the use of tyrosine recombinase instead of transposase for transposition.

Mavericks

Recently identified large eukaryotic DNA transposons (also known as Polintons) that encode up to ten proteins, including some that resemble viral capsid proteins.

Multi-mapped reads

Sequencing reads that map ambiguously to more than one location on the reference genome. These are common for repetitive regions including transposable elements.
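
As a toy illustration of why such reads need special handling, the sketch below splits each read's count evenly across all the loci at which it was reported (1/n weighting). The even split is only one simple convention and is an assumption here, not a method prescribed by this Review; the function name, read identifiers and locus labels are hypothetical.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Assign each read a total weight of 1, split evenly across its loci.

    `alignments` is an iterable of (read_id, locus) pairs, one pair per
    reported alignment. A uniquely mapped read contributes 1.0 to its
    locus; a read mapped to n loci contributes 1/n to each of them.
    """
    loci_per_read = defaultdict(set)
    for read_id, locus in alignments:
        loci_per_read[read_id].add(locus)

    counts = defaultdict(float)
    for loci in loci_per_read.values():
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


# Hypothetical toy input: read r2 maps to two copies of the same TE family.
toy_alignments = [
    ("r1", "L1_copy_a"),
    ("r2", "L1_copy_a"),
    ("r2", "L1_copy_b"),
    ("r3", "Alu_copy_x"),
]
toy_counts = fractional_counts(toy_alignments)
```

Discarding r2 instead would undercount both L1 copies, which is the kind of bias that repeat-aware analyses try to avoid.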

Long-read sequencing

Can be achieved by directly sequencing long DNA molecules, for example on Pacific Biosciences or Oxford Nanopore Technologies platforms. Alternatively, 10x Genomics linked-read sequencing generates synthetic long reads: long DNA molecules are barcoded and sequenced as interspersed short fragments, each of which retains the barcode of its molecule of origin, so that the short reads can be linked into longer contigs.

Consensus sequence

Nucleotide sequence representing an approximation of the active transposable element (TE) that gave rise to a group of interspersed repeats. It is generated from a multiple alignment of instances of the same TE family, which have accumulated mutations over time.
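
The idea can be sketched with a simple majority rule: for each column of the multiple alignment, keep the most frequent base. This is a minimal illustration under that assumption, not the procedure of any specific tool; real consensus builders typically use more elaborate models. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_copies, gap="-"):
    """Build a majority-rule consensus from pre-aligned TE copies.

    All sequences must have the same aligned length. Gap characters are
    ignored when voting, and columns made only of gaps are skipped.
    Ties are resolved in favour of the base seen first in the column.
    """
    if len({len(seq) for seq in aligned_copies}) != 1:
        raise ValueError("expected a non-empty set of equal-length aligned sequences")

    consensus = []
    for column in zip(*aligned_copies):
        bases = [base for base in column if base != gap]
        if bases:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical aligned copies of one family, each carrying its own mutations.
toy_copies = ["ACGTAC", "ACGAAC", "TCGTAC", "ACG-AC"]
toy_consensus = majority_consensus(toy_copies)
```

Each copy differs from the others at some position, yet the column-wise vote recovers a single sequence that none of the mutations dominates.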

Miniature inverted repeat TE

(MITE). A recently coined name for non-autonomous short terminal inverted repeat DNA transposons.

Short interspersed nuclear elements

(SINEs). Non-autonomous elements whose propagation depends on the retrotransposition machinery of long interspersed nuclear elements (LINEs) in the same genome. They contain an internal RNA polymerase III promoter derived from a small RNA gene, usually a tRNA.

Nested repeats

Transposable elements (TEs) that inserted in or near previous TE insertions. These are very challenging to detect with short reads.

Terminal inverted repeats

(TIRs). Repeated sequences that are present in the terminal regions of various transposable elements (TEs) and are specific to particular TE families. These motifs contain transposase and DNA binding sites that are essential for transposition of the TE.
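
Because the two ends are inverted copies of each other, the 5′ end of the element reads the same as the reverse complement of its 3′ end. The sketch below measures the longest perfectly matching TIR of a candidate element on that basis. It assumes exact matches and plain A/C/G/T sequence, which real TIRs, having accumulated mutations, often violate; the function names and the toy element are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(element):
    """Length of the longest perfect terminal inverted repeat of `element`.

    Compares the 5' end of the element with the reverse complement of its
    3' end, base by base, and stops at the first mismatch. The repeat is
    not allowed to extend past the middle of the element.
    """
    element = element.upper()
    rc = reverse_complement(element)
    half = len(element) // 2
    length = 0
    for base, rc_base in zip(element[:half], rc[:half]):
        if base != rc_base:
            break
        length += 1
    return length


# Hypothetical element: a 7 bp TIR (GGCCAGT ... ACTGGCC) around an internal region.
toy_element = "GGCCAGT" + "TTTAAACCC" + "ACTGGCC"
toy_tir = tir_length(toy_element)
```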

DIRS

Dictyostelium intermediate repeat sequence (DIRS) elements are classified as a superfamily of long terminal repeat transposons in the RepBase database and as a distinct order and superfamily in the 2007 Wicker unified transposable element classification system.

Target site duplications

(TSDs). Occur at insertion sites of most transposable elements (TEs), where a short stretch of host genomic sequence is duplicated on either side of the new TE instance. Because the two DNA strands are not cleaved at exactly the same position, the few bases between the two cuts are duplicated during the second strand synthesis that closes the insertion site.
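
Since the duplicated bases sit immediately on both sides of the insertion, a TSD can be looked for as the longest stretch shared by the end of the upstream flank and the start of the downstream flank. The following is a minimal sketch assuming exact matches; the function name, the default length bounds and the toy flanks are hypothetical choices, not values taken from this Review.

```python
def find_tsd(left_flank, right_flank, min_len=2, max_len=30):
    """Return the longest exact target site duplication, or "" if none.

    `left_flank` is the host sequence immediately upstream of the inserted
    element and `right_flank` the host sequence immediately downstream.
    A TSD of length k is present when the last k bases of the left flank
    equal the first k bases of the right flank.
    """
    upper = min(max_len, len(left_flank), len(right_flank))
    for k in range(upper, min_len - 1, -1):
        if left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return ""


# Hypothetical flanks around an insertion: AAGGTCA appears on both sides.
toy_left = "ACGTTGCAAGGTCA"
toy_right = "AAGGTCATTTGGC"
toy_tsd = find_tsd(toy_left, toy_right)
```

Searching from the longest candidate downwards avoids reporting a short chance match when a longer duplication is present.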

Transduction

Host genomic DNA that is transcribed and inserted elsewhere in the genome through transposable element (TE) retrotransposition events. These duplicated sequences can be found with or without adjacent TE sequences as TE reverse transcription is often prematurely stopped.

SMRT

A PCR-free, single-molecule real-time (SMRT) sequencing platform from Pacific Biosciences that produces long reads. Reads are 1–60 kb in length, with a median of 10 kb.

ChIP-seq

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) consists of the capture and sequencing of DNA that is bound by a protein of interest, such as a transcription factor or modified histone.

RACE-seq

Rapid amplification of cDNA ends (RACE) is a method to amplify complete RNA molecules. RACE-seq involves sequencing the RNA molecules amplified through the RACE protocol. It is often used to detect novel transcripts.

CAGE-seq

Cap analysis of gene expression (CAGE) sequencing is a method to identify transcription start sites through sequencing of the 5′ ends of capped RNA transcripts.

PIWI-interacting RNA

(piRNA). piRNAs are short non-coding RNA molecules that bind to PIWI proteins. They are an established component of transposable element silencing mechanisms in animals.

Fig. 1: Computational tools to analyse TEs.
Fig. 2: Discovery and annotation of TEs and repeats in genomes.
Fig. 3: Detection of polymorphic TE insertions.
Fig. 4: Functional impacts of TEs.