Non-long terminal repeat retrotransposons are referred to as long interspersed nuclear elements (LINEs) and their non-autonomous partners are short interspersed nuclear elements (SINEs). It is believed that an active SINE copy, upon retrotransposition, generates near-identical copies of itself, which subsequently accumulate mutations resulting in sequence polymorphism. Here we show that when a retrotransposition-competent cell line of the parasitic protist Entamoeba histolytica, transfected with a marked SINE copy, is induced to retrotranspose, >20% of the newly retrotransposed copies are identical neither to the marked SINE nor to the mobilized resident SINEs. Rather, they are recombinants of resident SINEs and the marked SINE. They are a consequence of retrotransposition and not DNA recombination, as they are absent in cells not expressing the retrotransposition functions. This high-frequency recombination provides a new explanation for the existence of mosaic SINEs, which may affect genetic analysis of SINE lineages and the measurement of phylogenetic distances.
Retrotransposons of the non-long terminal repeat (non-LTR) category are generally referred to as long interspersed nuclear elements (LINEs) and their non-autonomous partners are called short interspersed nuclear elements (SINEs)1. SINEs are extremely efficient genome invaders. The best-known SINE, the human Alu element, is found in 1.1 million copies in the human genome2. Individual SINE copies typically show extensive sequence polymorphism3,4,5. How this polymorphism is generated is not clear, and direct sequence comparison of newly retrotransposed SINE copies with their 'source' copy has not been reported. It should be possible to address this issue using retrotransposition-competent cell lines. In vivo retrotransposition of the human LINE, L1 (ref. 6), in cultured cells, and of the SINEs Alu7 and SVA8,9, as well as a LINE/SINE pair from eel10, has been well documented. However, analysis of sequence changes in retrotransposed copies compared with the marked SINE copy after a round of retrotransposition has been limited.
We have been studying the LINE and SINE elements in the early branching parasitic protist Entamoeba histolytica. These retrotransposons comprise ~11% of the 23-Mb genome11,12,13. EhLINEs belong to the R2 group of non-LTR retrotransposons14,15,16. EhLINEs/SINEs are generally located in intergenic regions11,13, but not within genes. The 4.8-kb EhLINE1 contains two open reading frames (ORFs) (Fig. 1a). ORF2 encodes the reverse transcriptase (RT) and endonuclease (EN) activities typically required for non-LTR element retrotransposition6,17,18,19. EhSINE1 (550 bp) is the likely non-autonomous partner of EhLINE1. The two share a 78-bp stretch of sequence homology at their 3′ ends (Fig. 1a)10,11,20. Of the 742 EhLINE1 copies in the E. histolytica genome, 88 are full-length, but they lack complete ORFs13. Therefore to study retrotransposition in these cells it is necessary to generate a cell line that expresses both ORFs of EhLINE1. Here we report the construction of such a cell line, which expresses ORF2 in a tetracycline (tet)-inducible manner, and can retrotranspose a SINE copy in the presence of tet. This is the first report of active retrotransposition in a parasitic protist cell line. Using this system, we demonstrate a novel feature of high-frequency recombination between SINE copies during retrotransposition.
A retrotransposition-competent E. histolytica cell-line
Although EhSINE1 transcripts are abundantly present in E. histolytica cells21,22, full-length transcripts of EhLINE1 are not detected (Fig. 1b). We show that E. histolytica cells maintained in the lab express ORF1p (Fig. 1c), but fail to express detectable levels of ORF2p; hence these cells are not expected to be retrotransposition competent.
The study of active retrotransposition requires the construction of a cell line expressing the LINE-encoded ORFs, as achieved with human L1 (ref. 6), which could also retrotranspose SINEs, such as Alu7 and SVA8,9. To express the functions required for retrotransposition, we reconstructed the complete ORF2 (lacking any stop codons) by overlapping PCR (Fig. 2a), cloned it in a tet-inducible expression vector (Fig. 2b) and introduced it into E. histolytica cells to obtain the cell line Eh-ORF2. Our strategy to measure retrotransposition was to introduce into this cell line a plasmid containing a marked EhSINE1 copy (with a 25-bp GC-rich tag) and a known target site of EhSINE1 insertion15 (Fig. 2c). The sequence used as the target site for insertion was identified in a previous study in which we had looked for 'empty' sites in the genome, where a SINE element was missing in one chromosomal copy, but had inserted in another copy of the E. histolytica polyploid genome. The empty versus occupied site was differentiated by PCR amplification with flanking primers. The target site used in this assay was one such sequence with an empty site15. Hence, it is a sequence used by the E. histolytica retrotransposition machinery for SINE insertion in vivo.
Retrotransposition events occurring at this target site were scored without using selection pressure. Because the assay is PCR-based, we expect the scoring to be very sensitive. The doubly transfected cell line (Eh-ORF2-SN) expressed the 2.9-kb ORF2 transcript (Fig. 3a) and ORF2p (111 kDa) (Fig. 3b) in a tet-inducible manner. The constitutive expression of ORF1p (60 kDa) was unaltered on tet induction (Fig. 3b). This cell line also expressed the transcript corresponding to the marked SINE1 (Fig. 3c). In the presence of tet, this cell line is expected to be retrotransposition competent. We added tet to cultures in early log phase and harvested the cells after 48 h (late log). We scored retrotransposition of the marked SINE copy to the insertion target site by PCR amplification of total genomic DNA using two sets of primer pairs to discount the possibility of PCR artefacts (Fig. 4). The identity of the amplicons was further confirmed by Southern hybridization with the marked SINE probe. We did not obtain amplicons in the absence of tet, whereas in the presence of tet, specific amplicons expected from the mobilization of the marked SINE to the insertion site were obtained with both primer pairs (Fig. 4a,b). We also did not obtain amplicons when tet was added to a cell line containing the marked SINE and insertion hotspot but lacking ORF2 (Eh-SN). A hallmark of retrotransposition is the generation of target site duplications (TSDs). We induced retrotransposition by tet addition in three independently grown cultures and sequenced the flanking sequences of thirteen random clones of the amplicons. We predominantly found a 22-bp TSD (Fig. 4c), which matched exactly in size and sequence with the TSD found at this insertion hotspot at its genomic location15. We did not find insertion at any region in the 176-bp fragment other than the hotspot.
Therefore, based on three criteria, namely the strict requirement of ORF2p expression for mobilization, specific insertion into the retrotransposition hotspot and the 22-bp TSDs accompanying the insertion, we conclude that the events we scored are due to retrotransposition and not to DNA recombination. This is the first demonstration of active retrotransposition in a primitive protist.
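The TSD criterion can be illustrated computationally. The following is a minimal sketch, assuming the boundaries of the inserted element within a sequenced clone are already known; the function name `find_tsd`, its parameters and the toy sequences are hypothetical and are not taken from the study, which identified TSDs by sequencing the flanks of cloned amplicons.

```python
def find_tsd(seq, insert_start, insert_end, max_len=50):
    """Return the longest direct repeat that immediately flanks the insert.

    seq[insert_start:insert_end] is the inserted element (assumed known).
    The k bases just upstream of the insert are compared with the k bases
    just downstream, for every k up to max_len, and the longest match wins.
    """
    best = ""
    for k in range(1, max_len + 1):
        if insert_start - k < 0 or insert_end + k > len(seq):
            break  # ran out of flanking sequence on one side
        upstream = seq[insert_start - k:insert_start]
        downstream = seq[insert_end:insert_end + k]
        if upstream == downstream:
            best = upstream
    return best


# Toy example (made-up sequences, not data from the paper).
toy_tsd = "ACGTTGCA"
toy_insert = "GGGCCC"
toy_seq = "TT" + toy_tsd + toy_insert + toy_tsd + "AA"
start = 2 + len(toy_tsd)
end = start + len(toy_insert)
print(find_tsd(toy_seq, start, end))
```

An exact-match comparison is used here for simplicity; a real analysis might tolerate sequencing errors or ambiguous insert boundaries.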
Analysis of the newly retrotransposed copies of EhSINEs
Next we checked the sequences of the newly retrotransposed copies. To recover all retrotransposition events due to SINEs, we obtained amplicons using the primer pair C1/C2 (Fig. 4d), which gave a 1.4-kb amplicon from the parent plasmid and a 2.0-kb amplicon expected from events where a SINE copy had retrotransposed at the insertion hotspot. We did not obtain bands shorter than 2.0 kb, which shows that in our system the predominant retrotransposition events are contributed by full-length SINEs. This is expected, as truncated SINE transcripts are not seen, whereas full-length transcripts are abundant in E. histolytica23,24. This also shows that 5′-truncations are not common during EhSINE retrotransposition. We cut out the 2.0-kb band and reamplified it using the primer pair B1/B2 (Fig. 4). We cloned the 0.8-kb amplicon so obtained, and sequenced 23 randomly selected clones. The data showed that the sequences belonged to three different categories (Table 1), namely: set I, ten sequences matching completely with the marked SINE; set II, eight sequences lacking the tag and matching with genomic SINE copies; and set III, five sequences containing the 25-bp tag at the expected location but otherwise matching with genomic SINEs rather than the marked SINE. In these five instances of set III, the tag had associated itself with genomic SINE sequences. Characteristics of each set are as follows. Of the ten sequences in set I, seven were 100% identical to the marked SINE, and three had one mismatch each. The nucleotide position of the mismatch is indicated in Table 1, and shows that the mismatches are not clustered. In set II, all eight sequences lacked the tag and showed only 95–96% sequence identity (19–27 mismatches) with the marked SINE. As this level of identity is seen between random genomic EhSINE1 copies, these sequences would not have arisen from the marked SINE. Rather, they may correspond to genomic SINEs.
It is estimated that 142 SINE copies are transcribed in E. histolytica12, some of which may be mobilized in our cell line upon tet induction. Sequence comparison showed that seven of the eight sequences in set II indeed showed 98–99% identity with genomic EhSINE1 sequences found in the E. histolytica EST database (3–11 mismatches with the best database hits when compared with the full-length SINE sequence; Table 1, set II). The number of mismatches seen in separate comparisons of the 5′-half and 3′-half with the EhSINE1 sequences in the database is explained later. We confirmed that the database comparison was valid, as a random check of PCR amplicons from four of these loci from our cultured cells showed 100% sequence match with the database. These events therefore resulted from the mobilization of transcribed genomic SINE copies and not from the marked SINE copy.
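The relation between the mismatch counts and the percent identities quoted for set II can be checked with simple arithmetic. The sketch below assumes a comparison length of 550 bp (the stated size of EhSINE1); the length actually compared for each clone may differ, so the output is only a rough consistency check, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(mismatches, length=550):
    """Percent identity for a gap-free comparison of `length` positions.

    The default length of 550 is an assumption based on the stated size of
    EhSINE1; it is not the exact aligned length used in the study.
    """
    if not 0 <= mismatches <= length:
        raise ValueError("mismatches must lie between 0 and length")
    return 100.0 * (length - mismatches) / length


# Mismatch ranges quoted in the text for set II.
for label, n in [("vs marked SINE, low", 19), ("vs marked SINE, high", 27),
                 ("vs best database hit, low", 3), ("vs best database hit, high", 11)]:
    print(f"{label}: {n} mismatches -> {percent_identity(n):.1f}% identity")
```

Under this assumption, 19–27 mismatches correspond to roughly 95–96.5% identity and 3–11 mismatches to roughly 98–99.5%, in line with the ranges reported for set II.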
In set III, all five sequences had the 25-bp tag but, surprisingly, showed only 94–95% overall sequence identity (22–27 mismatches) with the marked SINE. When we searched these sequences (minus the tag) for identical hits in the E. histolytica database we found at best 94–98% matches (11–28 mismatches in the full-length SINE sequence). However, when we searched the sequence on either side of the tag separately (5′-half and 3′-half of each SINE separately; Table 1, set III), 98–100% matches were obtained, and each side matched with different genomic SINE sequences (accession codes are provided in Supplementary Table S1). Thus it seems that these five sequences are recombinants, derived from at least three different SINE sequences, one of them being the marked SINE and two belonging to different genomic SINEs. If the tag was acquired by the genomic SINE copies through a DNA-recombination event before retrotransposition, the tag should be present in the transcripts of these SINEs. To check this, we took total RNA from Eh-ORF2-SN cells induced with tet and did RT–PCR in two parts using primers from the tag and from SINE sequences at either end (Fig. 5a,b). We sequenced ten random clones of the amplicons from each side. All clones were identical only to the marked SINE sequence, showing that tag-containing transcripts arose only from the marked SINE copy. To further check if the recombinants existed before the induction of retrotransposition, we performed genomic PCR with DNA from Eh-ORF2-SN cells before tet addition using primers from the tag and opposite primers specific to the five mobilized copies of set III (Fig. 5c). No amplicons were obtained with this DNA, whereas DNA from cells after tet addition gave the expected amplicons, showing that the tag was not associated with these sequences before retrotransposition.
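The logic of searching each half separately can be sketched as follows. This is a simplified illustration that assumes pre-aligned, equal-length sequences without indels and counts mismatches directly; the study itself used BLAST against the genome database. All names (`mismatches`, `best_match`, `halfwise_matches`, `sine_A`, `sine_B`) and the toy sequences are hypothetical.

```python
def mismatches(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_match(query, references):
    """Return (name, mismatch count) of the closest reference sequence."""
    scored = ((name, mismatches(query, ref)) for name, ref in references.items())
    return min(scored, key=lambda item: item[1])


def halfwise_matches(query, references, split):
    """Compare the full sequence, then the 5' and 3' parts on either side of `split`."""
    return {
        "full": best_match(query, references),
        "5'": best_match(query[:split], {n: r[:split] for n, r in references.items()}),
        "3'": best_match(query[split:], {n: r[split:] for n, r in references.items()}),
    }


# Toy references and a toy recombinant (made-up sequences).
references = {
    "sine_A": "ACGTACGTAC" + "GTACGTACGT",
    "sine_B": "ACCTACGAAG" + "GTTCGTAGGT",
}
recombinant = references["sine_A"][:10] + references["sine_B"][10:]
print(halfwise_matches(recombinant, references, split=10))
```

In this toy case the full-length comparison leaves residual mismatches with every reference, whereas each half matches a different reference perfectly, which is the signature interpreted above as recombination.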
We conclude that recombinant SINEs are formed consequent to retrotransposition. The process is rapid, as we scored these events within 48 h of retrotransposition induction, and occurs at high frequency (>20% of total events scored). Some of the events in set II might also be recombinants, as the number of mismatches reduced when the 5′ and 3′ halves of each sequence were searched separately with the database (Table 1). Although the number of mismatches in a full-length comparison ranged from 3 to 20, this number became 0 to 4 when matches were searched only for the 5′-half, and was 0 to 5 when matches were searched for the 3′-half. In two cases the sequence had to be matched in three parts (5′, 3′, and middle) to get minimum mismatches.
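The ">20%" figure follows directly from the clone counts reported for the three sets. A short worked calculation, using only the counts given in the text (the dictionary layout and the helper name `percent` are illustrative):

```python
# Clone counts per category as reported in the text (23 sequenced clones).
clone_counts = {"I": 10, "II": 8, "III": 5}
total = sum(clone_counts.values())


def percent(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    return 100.0 * count / total


for name, count in clone_counts.items():
    print(f"set {name}: {count}/{total} = {percent(count, total):.1f}%")
```

Set III alone accounts for 5 of 23 events (about 21.7%), so the stated frequency is a lower bound if some set II events are also recombinants.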
Retrotransposition-competent cell lines have greatly assisted in understanding the mechanism of non-LTR retrotransposition in vivo6,7,8,9,10. Here we report the first such system in a parasitic protist E. histolytica. Our results show that in this organism the newly retrotransposed EhSINE copies undergo high-frequency recombination, not known earlier to take place in this class of non-LTR retrotransposons. Chimeric molecules arising from reverse transcripts have been observed in yeast Ty elements, and were attributed to gene conversion23, whereas in retroviruses high-frequency recombination occurs during reverse transcription of the two co-packaged RNAs in the virion as a result of template switching24. In non-LTR elements, tripartite chimeric LINEs have earlier been reported in a fungal genome25, and in mammalian genomes U6/L1 pseudogene chimeras have been experimentally demonstrated26. However, recombination between multiple copies of the same SINE family during retrotransposition is a novel observation.
The demonstrated properties of RT to displace the RNA template during complementary DNA (cDNA) synthesis, and to perform multiple template jumping27,28, could lead to these recombinants. The template jumping activity reported for non-LTR retrotransposons involves end-to-end jumping between the template RNAs, and this is frequently accompanied by the addition of non-templated nucleotides at the junction. The recombinant SINEs reported here are not generated by end-to-end jumping. They would require the RT to switch at internal positions from one template RNA to the next. Such switching has not been reported for the RT encoded by non-LTR elements and needs to be explored further. Our data (Table 1) suggest that, most frequently, no switches take place during retrotransposition (at least 10 out of 23 occurrences; set I), but multiple switching events are commonly encountered. In set II, four events (sequences 2, 3, 4 and 8) probably did not involve any switch, whereas two (numbers 1 and 7) could involve one switch each, and the remaining two (numbers 5 and 6) could involve two switches each. In set III, all five events seemed to be a result of two switches. This may be due to the small sample size in this set.
If such recombination is indeed common, one should expect the E. histolytica SINE population to display a prominent mosaic structure. To check this, we did multiple alignment of 63 full-length EhSINE1 copies thought to have retrotransposed most recently12, and looked at all positions that were polymorphic in at least 20% of the SINEs. We found 16 such positions. We organized the sequences in these positions in blocks of four, and clustered the positions with identical sequences into sets in the leftmost block (Fig. 6). Upon aligning these sequences with the next block of four, it is evident that the sequences of identical sets in the first block segregate out and associate in various combinations. Nine sets in the first block associated with twelve sets in the second block in 34 different combinations. The combination of patterns in all four blocks taken together showed clear indication of mosaic formations that would be expected to arise from internal template jumps during retrotransposition.
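The selection of polymorphic positions from a multiple alignment can be sketched as below. The sketch assumes that "polymorphic in at least 20% of the SINEs" means that at least 20% of the aligned sequences differ from the most common base at that column; this interpretation, the function names and the toy alignment are assumptions, not details taken from the study.

```python
from collections import Counter


def polymorphic_positions(alignment, min_fraction=0.20):
    """Return 0-based columns where the minor bases reach `min_fraction`.

    `alignment` is a list of equal-length, pre-aligned sequences.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    positions = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        major = counts.most_common(1)[0][1]  # count of the most common base
        if (n - major) / n >= min_fraction:
            positions.append(i)
    return positions


def site_patterns(alignment, positions):
    """Reduce each sequence to the bases found at the selected columns."""
    return ["".join(seq[i] for i in positions) for seq in alignment]


# Toy alignment (made-up sequences).
toy_alignment = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTGC", "ACGTAC"]
cols = polymorphic_positions(toy_alignment)
print(cols, site_patterns(toy_alignment, cols))
```

The reduced patterns can then be grouped into blocks and compared across blocks to see whether identical patterns in one block pair with several different patterns in the next, as described for Fig. 6.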
Although sequence analysis of human Alu subfamilies showed the existence of mosaic elements29, these were not found experimentally7,8,9. Mosaics could have been missed in these studies because of the smaller number of retrotransposition events scored, and the selection pressure used. Our results are the first direct demonstration that SINE copies engage in active sequence exchange during retrotransposition, leading to the rapid spread of the sequence tag to the SINE population, and generation of diversity. As mRNA transcripts are also templates of the same retrotransposition machinery during retropseudogene formation, it will be interesting to see if mRNA transcripts could also engage in similar recombination during reverse transcription29.
Cell culture and growth conditions
Trophozoites of E. histolytica strain HM-1:IMSS (clone 6) were axenically maintained and stable transfectants were obtained as described previously30. Single and double transfectants were maintained with 10 μg ml−1 of Hygromycin B or G418 or both.
Western analysis was performed with 100 μg of total cell lysate separated by 10% SDS–polyacrylamide gel electrophoresis and blotted using a Mini Trans-Blot Electrophoretic Transfer Cell (Bio-Rad). Polyclonal antibodies, anti-EN (mouse) and anti-ORF1 (rabbit), were raised against the respective his-tagged recombinant proteins purified through Ni-NTA agarose. A 1:5,000 dilution of each antiserum followed by HRP-conjugated secondary antibodies (1:10,000, Sigma) was used to detect the respective polypeptides. ECL reagents were used for visualization (Millipore).
DNA sequence analysis
The genome sequence of E. histolytica, comprising 1,529 scaffolds, was downloaded from the NCBI database (accession ID AAFB00000000) and searched with BLAST for matches with the sequences of retrotransposed copies of sets II and III (Table 1 and Supplementary Table S1). In all cases, 25 nucleotides were removed from both ends of these copies before BLAST analysis to avoid any bias due to other factors (for example, addition of non-templated nucleotides during the RT reaction). Hits showing 100% coverage and maximum similarity were recorded. The coordinates of EhSINEs in the various scaffolds follow a previous report12.
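The trimming and hit-selection steps can be sketched as follows. This is an illustration only: the hit records are hypothetical dictionaries with made-up field names (`subject`, `coverage`, `identity`) and values, not the output format of any BLAST program, and the actual search settings used in the study are not specified here.

```python
def trim_ends(seq, n=25):
    """Remove n nucleotides from each end of a sequence before searching."""
    if len(seq) <= 2 * n:
        raise ValueError("sequence too short to trim")
    return seq[n:-n]


def best_full_coverage_hits(hits):
    """Keep hits with 100% query coverage and the highest identity among them."""
    full = [h for h in hits if h["coverage"] == 100.0]
    if not full:
        return []
    top = max(h["identity"] for h in full)
    return [h for h in full if h["identity"] == top]


# Toy records (hypothetical scaffold names and values).
toy_hits = [
    {"subject": "scaffold_x", "coverage": 100.0, "identity": 99.2},
    {"subject": "scaffold_y", "coverage": 100.0, "identity": 98.0},
    {"subject": "scaffold_z", "coverage": 80.0, "identity": 100.0},
]
print(trim_ends("A" * 25 + "CCGG" + "T" * 25))
print(best_full_coverage_hits(toy_hits))
```

Filtering on full coverage first means that a short, perfectly matching partial hit (such as `scaffold_z` in the toy data) does not outrank a full-length hit with slightly lower identity.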
How to cite this article: Yadav, V. P. et al. Recombinant SINEs are formed at high frequency during induced retrotransposition in vivo. Nat. Commun. 3:854 doi: 10.1038/ncomms1855 (2012).
Singer, M. F. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28, 433–434 (1982).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Cordaux, R., Hedges, D. J. & Batzer, M. A. Retrotransposition of Alu elements: how many sources? Trends Genet. 20, 464–467 (2004).
Roy, A. M. et al. Potential gene conversion and source genes for recently integrated Alu elements. Genome Res. 10, 1485–1495 (2000).
Salem, A. H., Ray, D. A., Hedges, D. J., Jurka, J. & Batzer, M. A. Analysis of the human Alu Ye lineage. BMC Evol. Biol. 5, 18 (2005).
Moran, J. V. et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).
Dewannieux, M., Esnault, C. & Heidmann, T. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35, 41–48 (2003).
Hancks, D. C., Goodier, J. L., Mandal, P. K., Cheung, L. E. & Kazazian, H. H. Jr. Retrotransposition of marked SVA elements by human L1s in cultured cells. Hum. Mol. Genet. 20, 3386–3400 (2011).
Raiz, J. et al. The non-autonomous retrotransposon SVA is trans-mobilized by the human LINE-1 protein machinery. Nucleic Acids Res. 40, 1666–1683 (2012).
Kajikawa, M. & Okada, N. LINEs mobilize SINEs in the eel through a shared 3′ sequence. Cell 111, 433–444 (2002).
Bakre, A. A., Rawal, K., Ramaswamy, R., Bhattacharya, A. & Bhattacharya, S. The LINEs and SINEs of Entamoeba histolytica: comparative analysis and genomic distribution. Exp. Parasitol. 110, 207–213 (2005).
Huntley, D. M., Pandis, I., Butcher, S. A. & Ackers, J. P. Bioinformatic analysis of Entamoeba histolytica SINE1 elements. BMC Genomics 11, 321 (2010).
Lorenzi, H. et al. Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics 9, 595 (2008).
Malik, H. S., Burke, W. D. & Eickbush, T. H. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805 (1999).
Mandal, P. K., Bagchi, A., Bhattacharya, A. & Bhattacharya, S. An Entamoeba histolytica LINE/SINE pair inserts at common target sites cleaved by the restriction enzyme-like LINE-encoded endonuclease. Eukaryot Cell 3, 170–179 (2004).
Van Dellen, K., Field, J., Wang, Z., Loftus, B. & Samuelson, J. LINEs and SINE-like elements of the protist Entamoeba histolytica. Gene 297, 229–239 (2002).
Feng, Q., Moran, J. V., Kazazian, H. H. Jr & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996).
Martin, S. L., Li, J., Epperson, L. E. & Lieberman, B. Functional reverse transcriptases encoded by A-type mouse LINE-1: defining the minimal domain by deletion analysis. Gene 215, 69–75 (1998).
Yang, J., Malik, H. S. & Eickbush, T. H. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl Acad. Sci. USA 96, 7847–7852 (1999).
Moran, J. V. & Gilbert, N. Mammalian LINE-1 retrotransposons and related elements. In Mobile DNA II (eds Craig, N. L., Craigie, R., Gellert, M. & Lambowitz, A. M.) 836–869 (American Society for Microbiology, 2002).
Cruz-Reyes, J., ur-Rehman, T., Spice, W. M. & Ackers, J. P. A novel transcribed repeat element from Entamoeba histolytica. Gene 166, 183–184 (1995).
Willhoeft, U., Buss, H. & Tannich, E. The abundant polyadenylated transcript 2 DNA sequence of the pathogenic protozoan parasite Entamoeba histolytica represents a nonautonomous non-long-terminal-repeat retrotransposon-like element which is absent in the closely related nonpathogenic species Entamoeba dispar. Infect. Immun. 70, 6798–6804 (2002).
Derr, L. K. & Strathern, J. N. A role for reverse transcripts in gene conversion. Nature 361, 170–173 (1993).
Delviks-Frankenberry, K. et al. Mechanisms and factors that influence high frequency retroviral recombination. Viruses 3, 1650–1680 (2011).
Gogvadze, E., Barbisan, C., Lebrun, M. H. & Buzdin, A. Tripartite chimeric pseudogene from the genome of rice blast fungus Magnaporthe grisea suggests double template jumps during long interspersed nuclear element (LINE) reverse transcription. BMC Genomics 8, 360 (2007).
Garcia-Perez, J. L., Doucet, A. J., Bucheton, A., Moran, J. V. & Gilbert, N. Distinct mechanisms for trans-mediated mobilization of cellular RNAs by the LINE-1 reverse transcriptase. Genome Res. 17, 602–611 (2007).
Bibillo, A. & Eickbush, T. H. The reverse transcriptase of the R2 non-LTR retrotransposon: continuous synthesis of cDNA on non-continuous RNA templates. J. Mol. Biol. 316, 459–473 (2002).
Bibillo, A. & Eickbush, T. H. End-to-end template jumping by the reverse transcriptase encoded by the R2 retrotransposon. J. Biol. Chem. 279, 14945–14953 (2004).
Carroll, M. L. et al. Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J. Mol. Biol. 311, 17–40 (2001).
Hamann, L., Buss, H. & Tannich, E. Tetracycline-controlled gene expression in Entamoeba histolytica. Mol. Biochem. Parasitol. 84, 83–91 (1997).
We acknowledge Jitender Kumar for helping to raise the anti-ORF1 antibody. V.P.Y. is a recipient of a Senior Research Fellowship from Council of Scientific and Industrial Research, Government of India. This work was funded by grant to S.B. from Department of Science and Technology and Department of Biotechnology, India.
The authors declare no competing financial interests.