Computational tools to unmask transposable elements

Goerner-Potvin, Patricia; Bourque, Guillaume

doi:10.1038/s41576-018-0050-x

Review Article
Published: 19 September 2018

Computational tools to unmask transposable elements

Nature Reviews Genetics volume 19, pages 688–704 (2018)Cite this article

19k Accesses
130 Citations
182 Altmetric
Metrics details

Subjects

Abstract

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools are being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps to perform comprehensive genomic analyses that do not simply ‘mask’ repeats.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Computational tools to analyse TEs.**

**Fig. 2: Discovery and annotation of TEs and repeats in genomes.**

**Fig. 3: Detection of polymorphic TE insertions.**

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Article Open access 22 June 2021

Transposable elements in human genetic disease

Article 12 September 2019

Characterization and visualization of tandem repeats at genome scale

Article 02 January 2024

References

McClintock, B. Mutable loci in maize. Carnegie Inst. Wash. 47, 155–169 (1948).
Google Scholar
McClintock, B. The origin and behavior of mutable loci in maize. Proc. Natl Acad. Sci. USA 36, 344–355 (1950).
CAS PubMed PubMed Central Google Scholar
Kazazian, H. H. Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).
CAS PubMed Google Scholar
Garrett, R. A., She, Q., Brügger, K., Faguy, D. & Redder, P. in Mobile DNA II (eds. Craig N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 1060–1073 (American Society of Microbiology, Washington, DC, 2002).
Finnegan, D. J. Eukaryotic transposable elements and genome evolution. Trends Genet. 5, 103–107 (1989).
CAS PubMed Google Scholar
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
CAS PubMed Google Scholar
Kronmiller, B. A. & Wise, R. P. TEnest: automated chronological annotation and visualization of nested plant transposable elements. Plant Physiol. 146, 45–59 (2008).
CAS PubMed PubMed Central Google Scholar
Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
CAS PubMed Google Scholar
Goodwin, T. J. & Poulter, R. T. The DIRS1 group of retrotransposons. Mol. Biol. Evol. 18, 2067–2082 (2001).
CAS PubMed Google Scholar
Duval-Valentin, G., Marty-Cointin, B. & Chandler, M. Requirement of IS911 replication before integration defines a new bacterial transposition pathway. EMBO J. 23, 3897–3906 (2004).
CAS PubMed PubMed Central Google Scholar
de Koning, A. P. J., Gu, W., Castoe, T. A., Batzer, M. A. & Pollock, D. D. Repetitive elements may comprise over two-thirds of the human genome. PLOS Genet. 7, e1002384 (2011).
PubMed PubMed Central Google Scholar
Hata, K. & Sakaki, Y. Identification of critical CpG sites for repression of L1 transcription by DNA methylation. Gene 189, 227–234 (1997).
CAS PubMed Google Scholar
Slotkin, R. K. & Martienssen, R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8, 272–285 (2007).
CAS PubMed Google Scholar
Malone, C. D. & Hannon, G. J. Small RNAs as guardians of the genome. Cell 136, 656–668 (2009).
CAS PubMed PubMed Central Google Scholar
Levin, H. L. & Moran, J. V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615–627 (2011).
CAS PubMed PubMed Central Google Scholar
Ewing, A. D. & Kazazian, H. H. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 20, 1262–1270 (2010).
CAS PubMed PubMed Central Google Scholar
Xing, J. et al. Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 19, 1516–1526 (2009).
CAS PubMed PubMed Central Google Scholar
Hancks, D. C. & Kazazian, H. H. Roles for retrotransposon insertions in human disease. Mob. DNA 7, 9 (2016).
PubMed PubMed Central Google Scholar
Huang, C. R. L., Burns, K. H. & Boeke, J. D. Active transposition in genomes. Annu. Rev. Genet. 46, 651–675 (2012).
CAS PubMed PubMed Central Google Scholar
Emmons, S. W. & Yesner, L. High-frequency excision of transposable element Tc 1 in the nematode Caenorhabditis elegans is limited to somatic cells. Cell 36, 599–605 (1984).
CAS PubMed Google Scholar
Fernandez, L., Torregrosa, L., Segura, V., Bouquet, A. & Martinez-Zapater, J. M. Transposon-induced gene activation as a mechanism generating cluster shape somatic variation in grapevine. Plant J. 61, 545–557 (2010).
CAS PubMed Google Scholar
Miki, Y. et al. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643–645 (1992).
CAS PubMed Google Scholar
van den Hurk, J. A. et al. L1 retrotransposition can occur early in human embryonic development. Hum. Mol. Genet. 16, 1587–1592 (2007).
PubMed Google Scholar
Muotri, A. R. et al. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature 435, 903–910 (2005).
CAS PubMed Google Scholar
Coufal, N. G. et al. L1 retrotransposition in human neural progenitor cells. Nature 460, 1127–1131 (2009).
CAS PubMed PubMed Central Google Scholar
Baillie, J. K. et al. Somatic retrotransposition alters the genetic landscape of the human brain. Nature 479, 534–537 (2011). This study is the first mapping of somatic retrotransposition events in the human brain and is performed with the capture-based polymorphic TE detection tool RC-seq.
CAS PubMed PubMed Central Google Scholar
Goodier, J. L. Retrotransposition in tumors and brains. Mob. DNA 5, 11 (2014).
PubMed PubMed Central Google Scholar
Volff, J.-N. Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays 28, 913–922 (2006).
CAS PubMed Google Scholar
Elbarbary, R. A., Lucas, B. A. & Maquat, L. E. Retrotransposons as regulators of gene expression. Science 351, aac7247 (2016).
PubMed PubMed Central Google Scholar
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
CAS PubMed Google Scholar
Bourque, G. et al. Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res. 18, 1752–1762 (2008).
CAS PubMed PubMed Central Google Scholar
Jacques, P.-É., Jeyakani, J. & Bourque, G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLOS Genet. 9, e1003504 (2013).
CAS PubMed PubMed Central Google Scholar
Venuto, D. & Bourque, G. Identifying co-opted transposable elements using comparative epigenomics. Dev. Growth Differ. 60, 53–62 (2018).
CAS PubMed Google Scholar
Kim, D.-S. et al. LINE FUSION GENES: a database of LINE expression in human genes. BMC Genomics 7, 139 (2006).
PubMed PubMed Central Google Scholar
Mariner, P. D. et al. Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock. Mol. Cell 29, 499–509 (2008).
CAS PubMed Google Scholar
Lubelsky, Y. & Ulitsky, I. Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111 (2018).
CAS PubMed PubMed Central Google Scholar
Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 24 (2016).
PubMed PubMed Central Google Scholar
Lu, X. et al. The retrovirus HERVH is a long noncoding RNA required for human embryonic stem cell identity. Nat. Struct. Mol. Biol. 21, 423–425 (2014).
CAS PubMed Google Scholar
Naville, M. et al. Not so bad after all: retroviruses and long terminal repeat retrotransposons as a source of new genes in vertebrates. Clin. Microbiol. Infect. 22, 312–323 (2016).
CAS PubMed Google Scholar
Lyon, M. F. Do LINEs have a role in X-chromosome inactivation? J. Biomed. Biotechnol. 2006, 59746 (2006).
PubMed PubMed Central Google Scholar
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
CAS PubMed PubMed Central Google Scholar
Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl Acad. Sci. USA 104, 18613–18618 (2007).
CAS PubMed PubMed Central Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015). This article presents the most comprehensive collection of TE consensus sequences from eukaryotic genomes, used with references 44 and 45 in RepeatMasker genome annotations.
PubMed PubMed Central Google Scholar
Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013).
CAS PubMed Google Scholar
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016). References 44 and 45 present a eukaryotic TE consensus database with added HMM profiles used to improve genomic annotation of TEs.
CAS PubMed Google Scholar
Wicker, T., Matthews, D. E. & Keller, B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 7, 561–562 (2002).
CAS Google Scholar
Chen, J., Hu, Q., Zhang, Y., Lu, C. & Kuang, H. P-MITE: a database for plant miniature inverted-repeat transposable elements. Nucleic Acids Res. 42, D1176–D1181 (2014).
CAS PubMed Google Scholar
Copetti, D. et al. RiTE database: a resource database for genus-wide rice genomics and evolutionary biology. BMC Genomics 16, 538 (2015).
PubMed PubMed Central Google Scholar
Bousios, A. et al. MASiVEdb: the sirevirus plant retrotransposon database. BMC Genomics 13, 158 (2012).
CAS PubMed PubMed Central Google Scholar
Levy, A., Sela, N. & Ast, G. TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates. Nucleic Acids Res. 36, D47–D52 (2007).
PubMed PubMed Central Google Scholar
Kim, T.-H., Jeon, Y.-J., Kim, W.-Y. & Kim, H.-S. HESAS: HERVs expression and structure analysis system. Bioinformatics 21, 1699–1700 (2005).
CAS PubMed Google Scholar
Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44, D1141–D1147 (2016). This article presents a combination of multiple plant databases containing TE consensus sequences, annotated instances and polymorphic insertions.
CAS PubMed Google Scholar
Murukarthick, J. et al. BrassicaTED - a public database for utilization of miniature transposable elements in Brassica species. BMC Res. Notes 7, 379 (2014).
PubMed PubMed Central Google Scholar
Wang, J. et al. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum. Mutat. 27, 323–329 (2006).
PubMed PubMed Central Google Scholar
Mir, A. A., Philippe, C. & Cristofari, G. euL1db: the European database of L1HS retrotransposon insertions in humans. Nucleic Acids Res. 43, D43–D47 (2015). The euL1db database contains the most comprehensive collection of polymorphic L1Hs insertions in human genomes.
CAS PubMed Google Scholar
Gardner, E. J. et al. The mobile element locator tool (MELT): population-scale mobile element discovery and biology. Genome Res. 27, 1916–1929 (2017). This paper presents a great example of a polymorphic TE detection tool that also provides characterization of insertions, and it was used for the 1000 Genomes Project.
CAS PubMed PubMed Central Google Scholar
Daron, J. et al. Organization and evolution of transposable elements along the bread wheat chromosome 3B. Genome Biol. 15, 546 (2014).
PubMed PubMed Central Google Scholar
Darzentas, N., Bousios, A., Apostolidou, V. & Tsaftaris, A. S. MASiVE: mapping and analysis of sirevirus elements in plant genome sequences. Bioinformatics 26, 2452–2454 (2010).
CAS PubMed Google Scholar
Xiong, W., He, L., Lai, J., Dooner, H. K. & Du, C. HelitronScanner uncovers a large overlooked cache of helitron transposons in many plant genomes. Proc. Natl Acad. Sci. USA 111, 10263–10268 (2014).
CAS PubMed PubMed Central Google Scholar
You, F. M., Cloutier, S., Shan, Y. & Ragupathy, R. LTR annotator: automated identification and annotation of LTR retrotransposons in plant genomes. IJBBB 5, 165–174 (2015).
CAS Google Scholar
Lee, H. et al. MGEScan: a Galaxy-based system for identifying retrotransposons in genomes. Bioinformatics 32, 2502–2504 (2016).
CAS PubMed Google Scholar
Goecks, J., Nekrutenko, A., Taylor, J. & Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).
PubMed PubMed Central Google Scholar
Steinbiss, S., Willhoeft, U., Gremme, G. & Kurtz, S. Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res. 37, 7002–7013 (2009).
CAS PubMed PubMed Central Google Scholar
Monat, C., Tando, N., Tranchant-Dubreuil, C. & Sabot, F. LTRclassifier: a website for fast structural LTR retrotransposons classification in plants. Mob Genet. Elements 6, e1241050 (2016).
PubMed PubMed Central Google Scholar
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
CAS PubMed PubMed Central Google Scholar
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl. 1), 351–358 (2005).
Google Scholar
Smit, A. & Hubley, R. RepeatModeler 1.0.11. RepeatModeler http://www.repeatmasker.org/RepeatModeler/ (2018).
CAS PubMed Google Scholar
Schaeffer, C. E., Figueroa, N. D., Liu, X. & Karro, J. E. phRAIDER: pattern-hunter based rapid ab initio detection of elementary repeats. Bioinformatics 32, i209–i215 (2016).
CAS PubMed PubMed Central Google Scholar
Girgis, H. Z. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 16, 860 (2015).
Google Scholar
Caballero, J., Smit, A. F. A., Hood, L. & Glusman, G. Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res. 42, e99 (2014).
CAS PubMed PubMed Central Google Scholar
Flutre, T., Duprat, E., Feuillet, C. & Quesneville, H. Considering transposable element diversification in de novo annotation approaches. PLOS ONE 6, e16526 (2011).
CAS PubMed PubMed Central Google Scholar
Novák, P., Neumann, P. & Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11, 378 (2010). This paper presents the first method to discover TEs in unassembled sequencing reads, on which many recent tools are based.
PubMed PubMed Central Google Scholar
Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
PubMed Google Scholar
Goubert, C. et al. De novo assembly and annotation of the Asian tiger mosquito (Aedes albopictus) repeatome with dnaPipeTE from raw genomic reads and comparative analysis with the yellow fever mosquito (Aedes aegypti). Genome Biol. Evol. 7, 1192–1205 (2015).
CAS PubMed PubMed Central Google Scholar
Zytnicki, M., Akhunov, E. & Quesneville, H. Tedna: a transposable element de novo assembler. Bioinformatics 30, 2656–2658 (2014).
CAS PubMed Google Scholar
Koch, P., Platzer, M. & Downie, B. R. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res. 42, e80 (2014).
CAS PubMed PubMed Central Google Scholar
Chu, C., Nielsen, R. & Wu, Y. REPdenovo: inferring de novo repeat motifs from short sequence reads. PLOS ONE 11, e0150719 (2016).
PubMed PubMed Central Google Scholar
Guo, R. et al. RepLong: de novo repeat identification using long read sequencing data. Bioinformatics 34, 1099–1107 (2018).
PubMed Google Scholar
Lerat, E. Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity 104, 520–533 (2010). This detailed review discusses bioinformatics tools for TE annotation and classification.
CAS PubMed Google Scholar
Hoen, D. R. et al. A call for benchmarking transposable element annotation methods. Mob. DNA 6, 13 (2015).
PubMed PubMed Central Google Scholar
Kazazian, H. H. et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164–166 (1988).
CAS PubMed Google Scholar
Yu, F., Zingler, N., Schumann, G. & Strätling, W. H. Methyl-CpG-binding protein 2 represses LINE-1 expression and retrotransposition but not Alu transcription. Nucleic Acids Res. 29, 4493–4501 (2001).
CAS PubMed PubMed Central Google Scholar
Muotri, A. R. et al. L1 retrotransposition in neurons is modulated by MeCP2. Nature 468, 443–446 (2010).
CAS PubMed PubMed Central Google Scholar
Linheiro, R. S. & Bergman, C. M. Whole genome resequencing reveals natural target site preferences of transposable elements in Drosophila melanogaster. PLOS ONE 7, e30008 (2012). This article presents a polymorphic TE detection method for fly genomes that showed clade-specific TSD length and enrichment of target site palindromes for TIR and LTR element insertions.
CAS PubMed PubMed Central Google Scholar
Nelson, M. G., Linheiro, R. S. & Bergman, C. M. McClintock: an integrated pipeline for detecting transposable element insertions in whole genome shotgun sequencing data. G3 7, 2763–2778 (2017).
PubMed PubMed Central Google Scholar
Kazazian, H. H. & Moran, J. V. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19, 19–24 (1998).
CAS PubMed Google Scholar
Goodier, J. L. Transduction of 3′-flanking sequences is common in L1 retrotransposition. Hum. Mol. Genet. 9, 653–657 (2000).
CAS PubMed Google Scholar
Nakagome, M. et al. Transposon insertion finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 15, 71 (2014).
PubMed PubMed Central Google Scholar
1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Google Scholar
Wu, J. et al. Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15, 795–715 (2014).
CAS PubMed PubMed Central Google Scholar
Platzer, A., Nizhynska, V. & Long, Q. TE-Locate: a tool to locate and group transposable element occurrences using paired-end next-generation sequencing data. Biology 1, 395–410 (2012).
CAS PubMed PubMed Central Google Scholar
Zhuang, J., Wang, J. & Theurkauf, W. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).
CAS PubMed PubMed Central Google Scholar
Tubio, J. M. C. et al. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345, 1251343 (2014). This paper presents a method for somatic TE insertion from short sequencing reads and shows extensive L1-driven transposition and 3′ transduction in cancer genomes.
PubMed PubMed Central Google Scholar
Helman, E. et al. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053–1063 (2014).
CAS PubMed PubMed Central Google Scholar
Hénaff, E., Zapata, L., Casacuberta, J. M. & Ossowski, S. Jitterbug: somatic and germline transposon insertion detection at single-nucleotide resolution. BMC Genomics 16, 768 (2015).
PubMed PubMed Central Google Scholar
Doucet, T. T. & Kazazian, H. H. Long interspersed element sequencing (L1-Seq): a method to identify somatic LINE-1 insertions in the human genome. Methods Mol. Biol. 1400, 79–93 (2016).
PubMed PubMed Central Google Scholar
Tang, Z. et al. Human transposon insertion profiling: analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer. Proc. Natl Acad. Sci. USA 114, E733–E740 (2017).
CAS PubMed PubMed Central Google Scholar
Solyom, S. et al. Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res. 22, 2328–2338 (2012).
CAS PubMed PubMed Central Google Scholar
Erwin, J. A. et al. L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat. Neurosci. 19, 1583–1591 (2016).
CAS PubMed PubMed Central Google Scholar
Witherspoon, D. J. et al. Mobile element scanning (ME-Scan) by targeted high-throughput sequencing. BMC Genomics 11, 410 (2010).
PubMed PubMed Central Google Scholar
Kvikstad, E. M., Piazza, P., Taylor, J. C. & Lunter, G. A high throughput screen for active human transposable elements. BMC Genomics 19, 115 (2018).
PubMed PubMed Central Google Scholar
Streva, V. A. et al. Sequencing, identification and mapping of primed L1 elements (SIMPLE) reveals significant variation in full length L1 elements between individuals. BMC Genomics 16, 220 (2015).
PubMed PubMed Central Google Scholar
Disdero, E. & Filée, J. LoRTE: detecting transposon-induced genomic variants using low coverage PacBio long read sequences. Mob. DNA 8, 5 (2017).
PubMed PubMed Central Google Scholar
Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 27, 677–685 (2017).
CAS PubMed PubMed Central Google Scholar
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 14, 125 (2018).
Google Scholar
Chaisson, M. J. P. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015). This study is a major effort to complete the human reference genome through long-read sequencing and a custom structural variant caller.
CAS PubMed Google Scholar
Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
CAS PubMed PubMed Central Google Scholar
Ewing, A. D. Transposable element detection from whole genome sequence data. Mob. DNA 6, 24 (2015).
PubMed PubMed Central Google Scholar
Iskow, R. C. et al. Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 141, 1253–1261 (2010).
CAS PubMed PubMed Central Google Scholar
Rishishwar, L., Mariño-Ramírez, L. & Jordan, I. K. Benchmarking computational tools for polymorphic transposable element detection. Brief Bioinform. 18, 908–918 (2017).
PubMed Google Scholar
Kofler, R. SimulaTE: simulating complex landscapes of transposable elements of populations. Bioinformatics 34, 1439 (2018).
PubMed PubMed Central Google Scholar
Navarro, F. C. & Galante, P. A. RCPedia: a database of retrocopied genes. Bioinformatics 29, 1235–1237 (2013).
CAS PubMed PubMed Central Google Scholar
Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics 31, 3593–3599 (2015). This article presents the RNA-seq differential expression software TEtranscripts, shown to be the most accurate at identifying reads from repetitive elements.
CAS PubMed PubMed Central Google Scholar
Lanciano, S. et al. Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants. PLOS Genet. 13, e1006630 (2017).
PubMed PubMed Central Google Scholar
Sundaresan, V. & Freeling, M. An extrachromosomal form of the Mu transposons of maize. Proc. Natl Acad. Sci. USA 84, 4924–4928 (1987).
CAS PubMed PubMed Central Google Scholar
Kamal, M., Xie, X. & Lander, E. S. A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl Acad. Sci. USA 103, 2740–2745 (2006).
CAS PubMed PubMed Central Google Scholar
Lowe, C. B., Bejerano, G. & Haussler, D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl Acad. Sci. USA 104, 8005–8010 (2007).
CAS PubMed PubMed Central Google Scholar
Chandrashekar, D. S., Dey, P. & Acharya, K. K. GREAM: a web server to short-list potentially important genomic repeat elements based on over-/under-representation in specific chromosomal locations, such as the gene neighborhoods, within or across 17 mammalian species. PLOS One 10, e0133647 (2015). This paper describes a tool that was developed to assess the impact of TEs on genes and biological pathways.
PubMed PubMed Central Google Scholar
Criscione, S. W., Zhang, Y., Thompson, W., Sedivy, J. M. & Neretti, N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics 15, 583 (2014).
PubMed PubMed Central Google Scholar
Han, B. W., Wang, W., Zamore, P. D. & Weng, Z. piPipes: a set of pipelines for piRNA and transposon analysis via small RNA-seq, RNA-seq, degradome- and CAGE-seq, ChIP-seq and genomic DNA sequencing. Bioinformatics 31, 593–595 (2015).
CAS PubMed Google Scholar
Luteijn, M. J. & Ketting, R. F. PIWI-interacting RNAs: from generation to transgenerational epigenetics. Nat. Rev. Genet. 14, 523–534 (2013).
CAS PubMed Google Scholar
Lerat, E., Fablet, M., Modolo, L., Lopez-Maestre, H. & Vieira, C. TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes. Nucleic Acids Res. 45, e17 (2017).
PubMed Google Scholar
Robberecht, C., Voet, T., Zamani Esteki, M., Nowakowska, B. A. & Vermeesch, J. R. Nonallelic homologous recombination between retrotransposable elements is a driver of de novo unbalanced translocations. Genome Res. 23, 411–418 (2013).
CAS PubMed PubMed Central Google Scholar
He, D., Hormozdiari, F., Furlotte, N. & Eskin, E. Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 27, 1513–1520 (2011).
CAS PubMed PubMed Central Google Scholar
Monlong, J. et al. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res. 7, 225 (2018).
Google Scholar
Churakov, G. et al. A novel web-based TinT application and the chronology of the primate Alu retroposon activity. BMC Evol. Biol. 10, 376 (2010).
CAS PubMed PubMed Central Google Scholar
Price, A. L., Eskin, E. & Pevzner, P. A. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res. 14, 2245–2252 (2004).
CAS PubMed PubMed Central Google Scholar
Jiang, C., Chen, C., Huang, Z., Liu, R. & Verdier, J. ITIS, a bioinformatics tool for accurate identification of transposon insertion sites using next-generation sequencing data. BMC Bioinformatics 16, 72 (2015).
PubMed PubMed Central Google Scholar
Daron, J. & Slotkin, R. K. EpiTEome: simultaneous detection of transposable element insertion sites and their DNA methylation levels. Genome Biol. 18, 91 (2017).
PubMed PubMed Central Google Scholar
Glusman, G. et al. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLOS Comput. Biol. 2, e18 (2006).
Google Scholar
Eddy, S. R. The C-value paradox, junk DNA and ENCODE. Curr. Biol. 22, R898–R899 (2012).
CAS PubMed Google Scholar
Kellis, M. et al. Defining functional DNA elements in the human genome. Proc. Natl Acad. Sci. USA 111, 6131–6138 (2014).
CAS PubMed PubMed Central Google Scholar
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
PubMed PubMed Central Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). References 133 and 134 describe the aligners BowTie and BowTie2, which are capable of handling multi-mapped reads.
CAS PubMed PubMed Central Google Scholar
Thankaswamy-Kosalai, S., Sen, P. & Nookaew, I. Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics. Genomics 109, 186–191 (2017).
CAS PubMed Google Scholar
Kahles, A., Behr, J. & Rätsch, G. MMR: a tool for read multi-mapper resolution. Bioinformatics 32, 770–772 (2016).
CAS PubMed Google Scholar
Wang, J., Huda, A., Lunyak, V. V. & Jordan, I. K. A. Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags. Bioinformatics 26, 2501–2508 (2010).
CAS PubMed PubMed Central Google Scholar
Chung, D. et al. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data. PLOS Comput. Biol. 7, e1002111 (2011).
CAS PubMed PubMed Central Google Scholar
Wang, R. et al. LOcating non-unique matched tags (LONUT) to improve the detection of the enriched regions for ChIP-seq data. PLOS One 8, e67788 (2013).
CAS PubMed PubMed Central Google Scholar
Nakato, R., Itoh, T. & Shirahige, K. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data. Genes Cells 18, 589–601 (2013). References 138–140 are examples of ChIP-seq peak callers developed to include multi-mapped reads in their analyses.
CAS PubMed PubMed Central Google Scholar
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
CAS PubMed PubMed Central Google Scholar
Anders, S., Pyl, P. T. & Huber, W. HTSeq — a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
CAS PubMed Google Scholar
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
CAS PubMed Google Scholar
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
CAS PubMed PubMed Central Google Scholar
Boeke, J. D., Garfinkel, D. J., Styles, C. A. & Fink, G. R. Ty elements transpose through an RNA intermediate. Cell 40, 491–500 (1985).
CAS PubMed Google Scholar
Eickbush, T. H. & Malik, H. S. in Mobile DNA II (eds. Craig N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 1111–1144 (American Society of Microbiology, Washington, DC, 2002).
Piégu, B., Bire, S., Arensburger, P. & Bigot, Y. A survey of transposable element classification systems — a call for a fundamental update to meet the challenge of their diversity and complexity. Mol. Phylogenet. Evol. 86, 90–109 (2015).
PubMed Google Scholar
Llorens, C. et al. The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 39, D70–D74 (2011).
CAS PubMed Google Scholar
Vassetzky, N. S. & Kramerov, D. A. SINEBase: a database and tool for SINE analysis. Nucleic Acids Res. 41, D83–D89 (2013).
CAS PubMed Google Scholar
Ma, B., Li, T., Xiang, Z. & He, N. MnTEdb, a collective resource for mulberry transposable elements. Database 2015, bav004 (2015).
PubMed PubMed Central Google Scholar
Shao, F., Wang, J., Xu, H. & Peng, Z. FishTEDB: a collective database of transposable elements identified in the complete genomes of fish. Database 2018, bax106 (2018).
PubMed Central Google Scholar
Xu, H. E. et al. BmTEdb: a collective database of transposable elements in the silkworm genome. Database 2013, bat055 (2013).
PubMed PubMed Central Google Scholar
Li, S.-F., Zhang, G.-J., Yuan, J.-H., Deng, C.-L. & Gao, W.-J. Repetitive sequences and epigenetic modification: inseparable partners play important roles in the evolution of plant sex chromosomes. Planta 243, 1083–1095 (2016).
CAS PubMed Google Scholar
Roberts, A. P. et al. Revised nomenclature for transposable genetic elements. Plasmid 60, 167–173 (2008).
CAS PubMed Google Scholar
Nakagawa, S. & Takahashi, M. U. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes. Database 2016, baw087 (2016).
PubMed PubMed Central Google Scholar
Paces, J., Pavlícek, A. & Paces, V. HERVd: database of human endogenous retroviruses. Nucleic Acids Res. 30, 205–206 (2002).
CAS PubMed PubMed Central Google Scholar
Lappalainen, I. et al. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 41, D936–D941 (2013).
CAS PubMed Google Scholar
Rahman, R. et al. Unique transposon landscapes are pervasive across Drosophila melanogastergenomes. Nucleic Acids Res. 43, 10655–10672 (2015).
CAS PubMed PubMed Central Google Scholar
Ye, C., Ji, G. & Liang, C. detectMITE: a novel approach to detect miniature inverted repeat transposable elements in genomes. Sci. Rep. 6, 19688 (2016).
CAS PubMed PubMed Central Google Scholar
Keane, T. M., Wong, K. & Adams, D. J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013).
CAS PubMed Google Scholar
Hormozdiari, F. et al. Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics 26, i350–i357 (2010).
CAS PubMed PubMed Central Google Scholar
Gilly, A. et al. TE-Tracker: systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics 15, 377 (2014).
PubMed PubMed Central Google Scholar
Thung, D. T. et al. Mobster: accurate detection of mobile element insertions in next generation sequencing data. Genome Biol. 15, 488 (2014).
PubMed PubMed Central Google Scholar
Quadrana, L. et al. The Arabidopsis thaliana mobilome and its impact at the species level. eLife 5, e15716 (2016).
PubMed PubMed Central Google Scholar
David, M., Mustafa, H. & Brudno, M. Detecting Alu insertions from high-throughput sequencing data. Nucleic Acids Res. 41, e169 (2013).
CAS PubMed PubMed Central Google Scholar
Tica, J. et al. Next-generation sequencing-based detection of germline L1-mediated transductions. BMC Genomics 17, 342 (2016).
PubMed PubMed Central Google Scholar
Du, C., Caronna, J., He, L. & Dooner, H. K. Computational prediction and molecular confirmation of Helitron transposons in the maize genome. BMC Genomics 9, 51 (2008).
PubMed PubMed Central Google Scholar
Fiston-Lavier, A. S., Barron, M. G., Petrov, D. A. & Gonzalez, J. T-Lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res. 43, e22 (2015).
PubMed Google Scholar
Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-Seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLOS Genet. 8, e1002487 (2012).
CAS PubMed PubMed Central Google Scholar
Kofler, R. & Gómez-Sánchez, D. PoPoolationTE2: comparative population genomics of transposable elements using Pool-Seq. Mol. Biol. Evol. 33, 2759–2764 (2016).
CAS PubMed PubMed Central Google Scholar
Cridland, J. M., Macdonald, S. J., Long, A. D. & Thornton, K. R. Abundance and distribution of transposable elements in two Drosophila QTL mapping resources. Mol. Biol. Evol. 30, 2311–2327 (2013).
CAS PubMed PubMed Central Google Scholar
Chen, J., Wrightsman, T. R., Wessler, S. R. & Stajich, J. E. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ 5, e2942 (2017).
PubMed PubMed Central Google Scholar
Stuart, T. et al. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife 5, e20777 (2016).
PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by a grant from the Canadian Institute for Health Research (CIHR-MOP-115090). P.G.-P. is supported by the Programme de bourses de formation de doctorat du Fonds de Recherche Québec Santé (FRSQ-31874). G.B. is supported by the Fonds de Recherche Québec Santé (FRQS-25348). The authors also thank J.M.M. Monlong and the reviewers for very useful comments on the manuscript.

Reviewer information

Nature Reviews Genetics thanks E. Lerat, A. Smit and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Authors and Affiliations

Department of Human Genetics, McGill University, Montréal, Canada
Patricia Goerner-Potvin & Guillaume Bourque
Canadian Centre for Computational Genomics, Montréal, Canada
Guillaume Bourque
McGill University and Génome Québec Innovation Centre, Montréal, Canada
Guillaume Bourque

Authors

Patricia Goerner-Potvin
View author publications
You can also search for this author in PubMed Google Scholar
Guillaume Bourque
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.G.-P. and G.B. contributed to all aspects of the manuscript.

Corresponding author

Correspondence to Guillaume Bourque.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

TE annotation: Assembled genomes are annotated to indicate which sequences are derived from transposable elements (TEs). The annotation reveals which families of TEs are present as well as the percentage of TE-derived sequences in a genome.
Repression mechanisms: Active transposable elements contain promoters that can initiate transcription. They are ‘silenced’ through various repression mechanisms to prevent transcription and further mobilization.
Polymorphic insertions: Individual transposable element instances that have not been fixed in a species genome and are present in some re-sequenced genomes but absent from others, such as the reference genome. Polymorphic insertions can be either germline or somatic.
Long interspersed nuclear element 1: (LINE-1; also known as L1). Autonomous class I transposons that encode reverse transcriptase, endonuclease and RNA-binding proteins that effectively mobilize RNA sequences and create novel insertions.
Alu: Primate-specific non-autonomous short interspersed nuclear element retrotransposon. Alus are highly abundant in primate genomes and can mobilize through the long interspersed nuclear element (LINE) retrotransposition machinery.
SVA: Primate-specific non-autonomous retrotransposons composed of fragments of Alus and retroviral long terminal repeat elements. The SVA name comes from the fact that they are derived from short interspersed nuclear elements, variable number tandem repeats (VNTRs) and Alu elements. They mobilize through long interspersed nuclear element (LINE) mobilization proteins.
Germline insertions: Transposable element insertions occurring in the parental germ line or during embryogenesis and shared between all cells of an individual.
Somatic insertions: Transposable element insertions occurring later in life in a specific tissue. These insertions are unique to one or a subset of cells of an individual.
Domesticated: (Also known as co-opted). A transposable element (TE) for which at least part of its sequence has been recruited to perform a specific function for the host, such as providing a TE-encoded protein with physiological functions. The co-opted sequence has been domesticated.
Cis-regulation: A transposable element modulating the expression of nearby genes by having part of its sequence acting as a regulatory element.
Trans-regulation: A transposable element modulating cellular processes distant from its genomic location. Trans-regulation is done via its transcript or encoded protein.
Cryptons: DNA transposons initially identified in fungi that are characterized by the use of tyrosine recombinase instead of transposase for transposition.
Mavericks: Recently identified eukaryotic large DNA transposons (also known as Polintons) encoding up to ten proteins, including some that are similar to virus capsid.
Multi-mapped reads: Sequencing reads that map ambiguously to more than one location on the reference genome. These are common for repetitive regions including transposable elements.
Long-read sequencing: Can be achieved by directly sequencing long DNA molecules, such as by using Pacific Biosciences or Oxford Nanopore Technologies platforms. Alternatively, linked-read sequencing of 10X Genomics generates synthetic long reads by barcoding long molecules of DNA and sequencing interspersed short fragments each retaining the originating long molecule barcode, effectively linking these short reads into longer contigs.
Consensus sequence: Nucleotide sequence representing an approximation of the active transposable element (TE) that gave rise to a group of interspersed repeats. They are generated from a multiple alignment of instances from the same TE family that have accumulated mutations over time.
Miniature inverted repeat TE: (MITE). A recently coined name for non-autonomous short terminal inverted repeat DNA transposons.
Short interspersed nuclear elements: (SINEs). Non-autonomous elements for which their propagation is dependent on the retrotransposition machinery of long interspersed nuclear elements (LINEs) in the same genome. They contain an internal RNA polymerase III promoter derived from a small RNA gene, usually a tRNA.
Nested repeats: Transposable elements (TEs) that inserted in or near previous TE insertions. These are very challenging to detect with short reads.
Terminal inverted repeats: (TIRs). Repeated sequences that are present in the terminal regions of various transposable elements (TEs) are specific for particular TE families. These motifs contain transposase and DNA binding sites that are essential for transposition of the TE.
DIRS: Dictyostelium intermediate repeat sequence (DIRS) are classified as a superfamily of long terminal repeat transposons in the RepBase database and as a distinct order and superfamily in the 2007 Wicker unified transposable element classification system.
Target site duplications: (TSDs). Occur at insertion sites of most transposable elements (TEs), where the host genomic sequence is duplicated surrounding the new TE instance. As the two DNA strands are not cleaved at the exact same location, a few bases in between the two cuts will become duplicated during the second strand synthesis closing the insertion site.
Transduction: Host genomic DNA that is transcribed and inserted elsewhere in the genome through transposable element (TE) retrotransposition events. These duplicated sequences can be found with or without adjacent TE sequences as TE reverse transcription is often prematurely stopped.
SMRT: A PCR-free, single-molecule real-time (SMRT) sequencing platform from Pacific Biosciences that produces long reads. Reads are 1–60 kb in length, with a median of 10 kb.
ChIP-seq: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) consists of the capture and sequencing of DNA that is bound by a protein of interest, such as a transcription factor or modified histone.
RACE-seq: Rapid amplification of cDNA ends (RACE) is a method to amplify complete RNA molecules. RACE-seq involves sequencing the RNA molecules amplified through the RACE protocol. It is often used to detect novel transcripts.
CAGE-seq: Cap analysis of gene expression (CAGE) sequencing is a method to identify transcription start sites through sequencing of 5′ RNA transcripts.
PIWI-interacting RNA: (piRNA). piRNAs are short non-coding RNA molecules that bind to PIWI proteins. They are established as part of transposable element silencing mechanisms in animals.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Goerner-Potvin, P., Bourque, G. Computational tools to unmask transposable elements. Nat Rev Genet 19, 688–704 (2018). https://doi.org/10.1038/s41576-018-0050-x

Download citation

Published: 19 September 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41576-018-0050-x

This article is cited by

RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats
- Michel Choudalakis
- Pavel Bashtrykov
- Albert Jeltsch
Mobile DNA (2024)
Implications of the three-dimensional chromatin organization for genome evolution in a fungal plant pathogen
- David E. Torres
- H. Martin Kramer
- Bart P. H. J. Thomma
Nature Communications (2024)
Keep quiet: the HUSH complex in transcriptional silencing and disease
- Iris Müller
- Kristian Helin
Nature Structural & Molecular Biology (2024)
A SINE-VNTR-Alu at the LRIG2 locus is associated with proximal and distal gene expression in CRISPR and population models
- Ashley Hall
- Ben Middlehurst
- John P. Quinn
Scientific Reports (2024)
Regulation and function of transposable elements in cancer genomes
- Michael Lee
- Syed Farhan Ahmad
- Jian Xu
Cellular and Molecular Life Sciences (2024)

Computational tools to unmask transposable elements

Subjects

Abstract

Access options

Similar content being viewed by others

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Transposable elements in human genetic disease

Characterization and visualization of tandem repeats at genome scale

References

Acknowledgements

Reviewer information

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Related links

Glossary

Rights and permissions

About this article

Cite this article

This article is cited by

RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats

Implications of the three-dimensional chromatin organization for genome evolution in a fungal plant pathogen

Keep quiet: the HUSH complex in transcriptional silencing and disease

A SINE-VNTR-Alu at the LRIG2 locus is associated with proximal and distal gene expression in CRISPR and population models

Regulation and function of transposable elements in cancer genomes

Search

Quick links

Subjects

Abstract

Access options

Similar content being viewed by others

References

Acknowledgements

Reviewer information

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Related links

Glossary

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links