
Transposon molecular domestication and the evolution of the RAG recombinase


Domestication of a transposon (a DNA sequence that can change its position in a genome) to give rise to the RAG1–RAG2 recombinase (RAG) and V(D)J recombination, which produces the diverse repertoire of antibodies and T cell receptors, was a pivotal event in the evolution of the adaptive immune system of jawed vertebrates. The evolutionary adaptations that transformed the ancestral RAG transposase into a RAG recombinase with appropriately regulated DNA cleavage and transposition activities are not understood. Here, beginning with cryo-electron microscopy structures of the amphioxus ProtoRAG transposase (an evolutionary relative of RAG), we identify amino acid residues and domains the acquisition or loss of which underpins the propensity of RAG for coupled cleavage, its preference for asymmetric DNA substrates and its inability to perform transposition in cells. In particular, we identify two adaptations specific to jawed vertebrates (arginine 848 in RAG1 and an acidic region in RAG2) that together suppress RAG-mediated transposition more than 1,000-fold. Our findings reveal a two-tiered mechanism for the suppression of RAG-mediated transposition, illuminate the evolution of V(D)J recombination and provide insight into the principles that govern the molecular domestication of transposons.


Data availability

The model of the cBbRAGL–nicked 3′TIR synaptic complex has been deposited in the PDB with accession code 6B40. The cryo-EM maps of cBbRAGL in complex with intact or nicked 3′TIRs have been deposited in the Electron Microscopy Data Bank with accession codes EMD-7043, EMD-7044, EMD-7045 and EMD-7046. High-throughput DNA sequence data to identify transposition events in the human genome have been deposited in the NCBI Sequence Read Archive with accession codes SRR8430227–SRR8430233 (project PRJNA514369).

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1.

    Gellert, M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu. Rev. Biochem. 71, 101–132 (2002).

  2.

    Schatz, D. G. & Swanson, P. C. V(D)J recombination: mechanisms of initiation. Annu. Rev. Genet. 45, 167–202 (2011).

  3.

    Lewis, S. M. The mechanism of V(D)J joining: lessons from molecular, immunological, and comparative analyses. Adv. Immunol. 56, 27–150 (1994).

  4.

    Sinzelle, L., Izsvák, Z. & Ivics, Z. Molecular domestication of transposable elements: from detrimental parasites to useful host genes. Cell. Mol. Life Sci. 66, 1073–1093 (2009).

  5.

    Levin, H. L. & Moran, J. V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615–627 (2011).

  6.

    Jangam, D., Feschotte, C. & Betrán, E. Transposable element domestication as an adaptation to evolutionary conflicts. Trends Genet. 33, 817–831 (2017).

  7.

    Thompson, C. B. New insights into V(D)J recombination and its role in the evolution of the immune system. Immunity 3, 531–539 (1995).

  8.

    Fugmann, S. D. The origins of the Rag genes—from transposition to V(D)J recombination. Semin. Immunol. 22, 10–16 (2010).

  9.

    Carmona, L. M. & Schatz, D. G. New insights into the evolutionary origins of the recombination-activating gene proteins and V(D)J recombination. FEBS J. 284, 1590–1605 (2017).

  10.

    Huang, S. et al. Discovery of an active RAG transposon illuminates the origins of V(D)J recombination. Cell 166, 102–114 (2016).

  11.

    Morales Poole, J. R., Huang, S. F., Xu, A., Bayet, J. & Pontarotti, P. The RAG transposon is active through the deuterostome evolution and domesticated in jawed vertebrates. Immunogenetics 69, 391–400 (2017).

  12.

    Chatterji, M., Tsai, C. L. & Schatz, D. G. Mobilization of RAG-generated signal ends by transposition and insertion in vivo. Mol. Cell. Biol. 26, 1558–1568 (2006).

  13.

    Reddy, Y. V., Perkins, E. J. & Ramsden, D. A. Genomic instability due to V(D)J recombination-associated transposition. Genes Dev. 20, 1575–1582 (2006).

  14.

    Curry, J. D. et al. Chromosomal reinsertion of broken RSS ends during T cell development. J. Exp. Med. 204, 2293–2303 (2007).

  15.

    Messier, T. L., O’Neill, J. P., Hou, S. M., Nicklas, J. A. & Finette, B. A. In vivo transposition mediated by V(D)J recombinase in human T lymphocytes. EMBO J. 22, 1381–1388 (2003).

  16.

    Little, A. J., Matthews, A. G., Oettinger, M. A., Roth, D. B. & Schatz, D. G. in Molecular Biology of B Cells (eds Alt, F. W. et al.) Ch. 2, 13–34 (Academic/Elsevier, 2015).

  17.

    Yin, F. F. et al. Structure of the RAG1 nonamer binding domain with DNA reveals a dimer that mediates DNA synapsis. Nat. Struct. Mol. Biol. 16, 499–508 (2009).

  18.

    Agrawal, A., Eastman, Q. M. & Schatz, D. G. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature 394, 744–751 (1998).

  19.

    Hiom, K., Melek, M. & Gellert, M. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell 94, 463–470 (1998).

  20.

    Kim, M. S., Lapkouski, M., Yang, W. & Gellert, M. Crystal structure of the V(D)J recombinase RAG1–RAG2. Nature 518, 507–511 (2015).

  21.

    Ru, H. et al. Molecular mechanism of V(D)J recombination from synaptic RAG1-RAG2 complex structures. Cell 163, 1138–1152 (2015).

  22.

    Kim, M. S. et al. Cracking the DNA code for V(D)J recombination. Mol. Cell 70, 358–370.e354 (2018).

  23.

    Ru, H. et al. DNA melting initiates the RAG catalytic pathway. Nat. Struct. Mol. Biol. 25, 732–742 (2018).

  24.

    Kriatchko, A. N., Anderson, D. K. & Swanson, P. C. Identification and characterization of a gain-of-function RAG-1 mutant. Mol. Cell. Biol. 26, 4712–4728 (2006).

  25.

    Sakano, H., Hüppi, K., Heinrich, G. & Tonegawa, S. Sequences at the somatic recombination sites of immunoglobulin light-chain genes. Nature 280, 288–294 (1979).

  26.

    Hsu, E. & Lewis, S. M. in Molecular Biology of B Cells (eds Alt, F. W. et al.) Ch. 9, 133–149 (Academic, 2015).

  27.

    Jones, J. M. & Gellert, M. The taming of a transposon: V(D)J recombination and the immune system. Immunol. Rev. 200, 233–248 (2004).

  28.

    Chatterji, M., Tsai, C. L. & Schatz, D. G. New concepts in the regulation of an ancient reaction: transposition by RAG1/RAG2. Immunol. Rev. 200, 261–271 (2004).

  29.

    Neiditch, M. B., Lee, G. S., Huye, L. E., Brandt, V. L. & Roth, D. B. The V(D)J recombinase efficiently cleaves and transposes signal joints. Mol. Cell 9, 871–878 (2002).

  30.

    Lu, C., Ward, A., Bettridge, J., Liu, Y. & Desiderio, S. An autoregulatory mechanism imposes allosteric control on the V(D)J recombinase by histone H3 methylation. Cell Rep. 10, 29–38 (2015).

  31.

    Ward, A., Kumari, G., Sen, R. & Desiderio, S. The RAG-2 inhibitory domain gates accessibility of the V(D)J recombinase to chromatin. Mol. Cell. Biol. 38, e00159 (2018).

  32.

    Corneo, B. et al. Rag mutations reveal robust alternative end joining. Nature 449, 483–486 (2007).

  33.

    Coussens, M. A. et al. RAG2’s acidic hinge restricts repair-pathway choice and promotes genomic stability. Cell Rep. 4, 870–878 (2013).

  34.

    Kapitonov, V. V. & Jurka, J. RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons. PLoS Biol. 3, e181 (2005).

  35.

    Krupovic, M., Béguin, P. & Koonin, E. V. Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery. Curr. Opin. Microbiol. 38, 36–43 (2017).

  36.

    Majumdar, S., Singh, A. & Rio, D. C. The human THAP9 gene encodes an active P-element DNA transposase. Science 339, 446–448 (2013).

  37.

    Henssen, A. G. et al. Genomic DNA transposition induced by human PGBD5. eLife 4, e10565 (2015).

  38.

    Lefranc, M. P. & Lefranc, G. The T Cell Receptor FactsBook (Academic, 2001).

  39.

    Bergeron, S., Anderson, D. K. & Swanson, P. C. RAG and HMGB1 proteins: purification and biochemical analysis of recombination signal complexes. Methods Enzymol. 408, 511–528 (2006).

  40.

    Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).

  41.

    Zhang, K. Gctf: Real-time CTF determination and correction. J. Struct. Biol. 193, 1–12 (2016).

  42.

    Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).

  43.

    Kimanius, D., Forsberg, B. O., Scheres, S. H. & Lindahl, E. Accelerated cryo-EM structure determination with parallelisation using GPUs in RELION-2. eLife 5, e18722 (2016).

  44.

    Slootweg, E. J. et al. Structural determinants at the interface of the ARC2 and leucine-rich repeat domains control the activation of the plant immune receptors Rx1 and Gpa2. Plant Physiol. 162, 1510–1528 (2013).

  45.

    Zhang, Y. H., Shetty, K., Surleac, M. D., Petrescu, A. J. & Schatz, D. G. Mapping and quantitation of the interaction between the recombination activating gene proteins RAG1 and RAG2. J. Biol. Chem. 290, 11802–11817 (2015).

  46.

    Kozuki, T. et al. Roles of the C-terminal domains of topoisomerase IIα and topoisomerase IIβ in regulation of the decatenation checkpoint. Nucleic Acids Res. 45, 5995–6010 (2017).

  47.

    Phillips, J. C. et al. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26, 1781–1802 (2005).

  48.

    Pawlowski, M., Bogdanowicz, A. & Bujnicki, J. M. QA-RecombineIt: a server for quality assessment and recombination of protein models. Nucleic Acids Res. 41, W389–W397 (2013).

  49.

    Trabuco, L. G., Villa, E., Mitra, K., Frank, J. & Schulten, K. Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 16, 673–683 (2008).

  50.

    Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).

  51.

    DiMaio, F. et al. Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nat. Methods 12, 361–365 (2015).

  52.

    Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010).

  53.

    Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D 66, 12–21 (2010).

  54.

    Barad, B. A. et al. EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Methods 12, 943–946 (2015).

  55.

    Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

  56.

    Coster, G., Gold, A., Chen, D., Schatz, D. G. & Goldberg, M. A dual interaction between the DNA damage response protein MDC1 and the RAG1 subunit of the V(D)J recombinase. J. Biol. Chem. 287, 36488–36498 (2012).

  57.

    Hu, J. et al. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat. Protocols 11, 853–871 (2016).

  58.

    Lapkouski, M., Chuenchor, W., Kim, M. S., Gellert, M. & Yang, W. Assembly pathway and characterization of the RAG1/2-DNA paired and signal-end complexes. J. Biol. Chem. 290, 14618–14625 (2015).



We thank Y. Kong for computational analysis of genome transposition data, W. Eliason for assistance with size-exclusion chromatography-multiple angle light scattering, C. Akey for help with freezing of the grids for BbRAGL-intact 3′TIR, R. Huang and H. Chuan for help with cryo-EM data collection, J. Wang for advice and insight in structural analysis, E. Hendrickson for HCT116 cells, XEGEN for NBD phylogenetic sequence analysis, Z. Chou, X. Liu and H. Zhang for an insertion-site mapping script, M. Ciubotaru for advice and members of the Schatz laboratory for comments on the manuscript. This work was supported in part by R01 AI32524 and R01 AI137079 (D.G.S.), R01 AI102778 (Y.X.), 2013CB917800 from the Ministry of Science and Technology of China and 91231206 from the National Natural Science Foundation of China (A.X.), UEFISCDI grant PN-III-ID-PCE-2016-0650 and Romanian Academy programs 1 & 3 of IBAR (M.D.S. and A.J.P.) and grants from the Centre National de la Recherche Scientifique and Aix-Marseille Université (P.P.).

Reviewer information

Nature thanks Frederick Alt, Thomas Boehm and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Y.Z. and D.G.S. designed the experiments. Y.Z. purified the proteins and performed the biochemical and cell-based experiments. T.C.C. performed freezing of the cryo-EM grids, data collection and processing, and model building with input from Y.X. Y.Z. and T.C.C. performed structural analyses. M.D.S. and A.J.P. created the computational model of BbRAGL. G.H. and Q.L. helped to establish the in vivo transposition assays, and Q.L. performed the human RAG transposition assays. J.D.M. performed computational analysis of genome transposition data. A.X. provided the BbRAG1L and BbRAG2L codon-optimized cDNAs and information about BbRAGL function. P.P. performed phylogenetic analyses of BbRAG1L sequences. D.G.S. wrote the paper with input from other authors.

Competing interests

The authors declare no competing interests.

Correspondence to Anlong Xu or Yong Xiong or David G. Schatz.

Extended data figures and tables

  1. Extended Data Fig. 1 ProtoRAG transposon and analysis of the BbRAG1L NBD* domain.

    a, Schematic of the ProtoRAG transposon and, below it, the jawed-vertebrate RAG locus and prototypical antigen receptor gene (IGH). b, Schematic of full-length and truncated BbRAG1L proteins (top), and cleavage reactions performed with those proteins (plus BbRAG2L) and TIR substrates, as indicated above and below the lanes. Core BbRAG1L (aa 468–1136) retains the cleavage pattern of full-length BbRAG1L, whereas full-length BbRAG1L exhibits strong single TIR cleavage (lanes 6, 7). Closed and open arrowheads, single 5′TIR and single 3′TIR cleavage products, respectively. For gel source data, see Supplementary Fig. 1. c, Sequence alignment of BbRAG1L NBD* with RAG1 NBD showing divergent sequences with similar predicted secondary structure elements (α-helices 1, 2 and 3). d, SEC-multiple angle light scattering (SEC-MALS) analysis of the purified NBD* protein, indicating that the protein is a dimer in solution.

  2. Extended Data Fig. 2 Biochemical properties and cryo-EM structure of cBbRAGL–3′TIR synaptic complexes.

    a, SEC-MALS of MBP–cBbRAGL, which indicates that the complex is a heterotetramer with two subunits each of cBbRAG1L and BbRAG2L. b, c, SEC profiles of cBbRAGL incubated with intact (b) or nicked (c) 3′TIR, 5′TIR or 3′/5′TIRs showing resolution of protein–DNA complex from free DNA. Gels display the components of pooled column fractions containing the protein–DNA complex. d, Representative 2D class averages of cryo-EM particles of cBbRAGL bound to intact or nicked 3′TIRs. e, Left, FSC curves of the half maps from gold standard refinements of the cBbRAGL–nicked 3′TIR complex with no symmetry applied (blue) and the cBbRAGL–intact 3′TIR complex with no symmetry applied (red) and with C2 symmetry applied (green). Right, FSC curves of the gold standard refinement of cBbRAGL–nicked 3′TIR complex with C2 symmetry applied (blue) and of the C2 symmetrized map and model (green). Resolutions of the maps are read by the cutoff values at FSC = 0.143. f, Colour-coded local-resolution estimation of the C2 symmetrized map of cBbRAGL in complex with nicked 3′TIR, viewed from a perspective similar (with a 30° rotation) to that of g. Resolution is, in general, better for cBbRAG1L than BbRAG2L. g, h, Cryo-EM maps of cBbRAGL bound to intact 3′TIRs (5.3 Å overall resolution; g) or nicked 3′TIRs (5.0 Å overall resolution; h). One BbRAG1L subunit (grey) has been rendered partially transparent to allow visualization of DNAs inside the protein. Continuous DNA density running through the protein core is visible with nicked but not intact TIRs, which suggests that the DNA in the vicinity of the active site becomes more rigidly constrained upon nicking. This is notable in light of the recent finding that DNA in the RAG active site melts and swivels in preparation for nicking (ref. 23).
    Clear differences between the two DNAs are visible in the bottom half of the structures, with 3′TIR-a (orange) protruding below the protein and density for 3′TIR-b (red) dissipating before the DNA emerges from the protein core. This argues that the two identical DNA molecules are engaged differently by cBbRAGL, with one (3′TIR-b) less rigidly constrained by its interactions with protein.

  3. Extended Data Fig. 3 Structural features of cBbRAGL.

    a, Comparison of the models of cBbRAGL and cRAG (PDB 5ZDZ) bound to nicked DNA but with DNA removed, illustrating the absence of NBD* from the cBbRAGL structure. NBD is a dimer that can pivot on a flexible hinge to accommodate the different spacer lengths of 12RSS and 23RSS, providing a structural explanation for the 12/23 rule (refs. 20–22, 58). We speculate that NBD*, HMGB1 and distal TIR sequences constitute a flexible domain located below the main complex, by analogy with RAG–RSS complexes. b, Superimposition of cBbRAGL–nicked 3′TIR synaptic complex with RAG–nicked RSS synaptic complex (PDB 5ZDZ). c, cBbRAG2L adopts a doughnut-shaped structure consistent with that of a six-bladed β-propeller. Because of low resolution, some elements cannot be unambiguously modelled as β-strands. Putative β-propellers I–VI are labelled, as are the N and C termini of the protein, showing that, as with RAG2, propeller I is composed of both N- and C-terminal sequences. d, Colour-coded linear diagram of cBbRAG1L subdomains (top) and cartoon of the BbRAG1L dimer (bottom) with the subdomains of one subunit colour-coded as in the linear diagram. The other subunit is grey except for the preR subdomain. Stars indicate a gap in the BbRAG1L model that spans aa 603–630. Nomenclature and figure layout as in ref. 20. DDBD, dimerization and DNA binding domain; PreR, pre-RNase H domain; RNH, RNase H domain; ZnC2 and ZnH2, domains that contribute two cysteines and two histidines, respectively, for zinc coordination; CTD, C-terminal domain; CTT*, C-terminal tail. e, Superimposition of cryo-EM map on the model of the nicked 3′TIR in the vicinity of the flipped bases near the site of nicking. f, g, Three-dimensional classes of cryo-EM maps of cBbRAGL bound to intact (f) or nicked (g) 3′TIRs (DNA omitted). One class is enlarged and shown from two vantage points below. The arrow points to the cleft that narrows in the open-to-closed transition.
    With intact DNA, three distinct 3D classes are distinguishable that vary in the degree of closure of the two arms of the V. h, Superimposition of three forms of cBbRAGL illustrating the movement of a 3′TIR and BbRAG2L subunit (colour-coded as in e, f) that takes place during the open-to-closed transition. One cBbRAG1L–2L dimer has been aligned and movement is visualized in the other dimer.

  4. Extended Data Fig. 4 Protein–DNA interactions in the cBbRAGL–nicked 3′TIR synaptic complex.

    a, Schematic diagram of the detailed interactions between BbRAG1L and nicked 3′TIR DNA. Bold underlined text, main-chain interactions; regular text, side-chain interactions; purple text, interactions involving BbRAG1L subunit a (defined as the subunit with an active site that engages the TIR depicted); cyan text, interactions involving symmetric BbRAG1L subunit b. BbRAG2L–DNA interactions could not be unambiguously assigned and are not depicted. b, c, Orthogonal views of the nicked 3′TIR–BbRAG1L subunit a interaction (b) and the nicked RSS–RAG1 subunit a interaction (c). Protein electrostatic surface potential is indicated with blue (positive charge) and red (negative charge) using the scale (KT/e) below d, e. d, BbRAG1L subunit b–nicked 3′TIR interaction. e, RAG1 subunit b–nicked RSS interaction.

  5. Extended Data Fig. 5 CTT, CTT* and mutational analysis of ProtoRAG TIRs.

    a, Superimposition showing CTT* extending from a structurally conserved region at the C terminus of the catalytic core regions of mouse RAG1 (mRAG1), zebrafish RAG1 (zRAG1) and BbRAG1L. b, Sequence alignment of CTT from six vertebrate RAG1 proteins. Species name abbreviations used in this paper: Mm, Mus musculus (mouse); Hs, Homo sapiens (human); Gg, Gallus gallus (chicken); Xl, Xenopus tropicalis (frog); Dr, Danio rerio (zebrafish); Bb, Branchiostoma belcheri (amphioxus); Pf, Ptychodera flava (acorn worm); Sp, Strongylocentrotus purpuratus (purple sea urchin); Af, Asterias forbesi (sea star); Hv, Hydra vulgaris (hydra); Rn, Rattus norvegicus (rat). c, Schematic indicating sub-regions of TIRs. Region 1 contains the heptamer and one additional bp, which in Fig. 1a and throughout the paper is defined as part of TR2. Otherwise, region 2 (broken up into 2a and 2b for the 5′TIR) corresponds to TR2. Poorly conserved regions 3 and 4 separate TR2 from a distal conserved 9-bp element (region 5). d–g, Cleavage of substrates containing a single 5′TIR (d, e) or a single 3′TIR (f, g), either intact (WT) or with the indicated region scrambled, by cBbRAGL (d, f) or the ΔNBD* cBbRAGL complex (e, g). Closed and open arrowheads, 5′TIR and 3′TIR cleavage products, respectively. Region 5 is completely dispensable for cleavage, and regions 3 and 4 contribute modestly to 3′TIR (but not 5′TIR) cleavage. Upon deletion of NBD* from cBbRAG1L, 3′TIR cleavage loses all dependency on regions 3 and 4, consistent with the possibility that NBD* engages in functionally important interactions with regions 3 and 4 of the 3′TIR.

  6. Extended Data Fig. 6 Activities of chimeric RAG1–BbRAG1L proteins and residues that influence coupled cleavage.

    a, b, Cleavage by NBD–CC* is dependent on the length of the spacer between the TIR heptamer and the RSS nonamer. Substrates depicted schematically above the gel images. In a, the substrates contain a single target based on T1 (Fig. 3b), the spacer of which ranges in length from 10 to 14 bp. In b, the substrate contains target T1 and a partner target based on T2 (Fig. 3b), the spacer of which ranges in length from 20 to 25 bp. Dark arrowheads, T1 cleavage products; open arrowheads, T2 cleavage products. c, d, Cleavage reactions using the NBD*–CC–CTT* and CC–CTT* proteins and T3 and T4 substrates (all depicted schematically in Fig. 3c), as indicated above the lanes. T3* and T4*, T3 and T4 targets with a C-to-A mutation of heptamer position 1 that renders the target uncleavable; [T4/T4]H+TR2 and [T3/T3]H+TR2, substrates in which both targets have had all substrate sequences except the heptamer and TR2 deleted. Asterisks as in Fig. 2g. e, Cartoon depicting differences in the major protein–DNA interactions of BbRAGL and RAG. f, Superposition of RAG1 and BbRAG1L in the region containing Glu649 and Ser963 in complexes bound to nicked DNA substrates, illustrating the similarity of positioning of the active site residues Glu962 and Glu1063 and flanking residues Asn961 and Asn1062. h, RAG1 Asn961 and BbRAG1L Asn1062 have the potential to participate in hydrogen-bond networks after nicking and could thereby stabilize the hairpin-competent configuration of the enzyme. This is notable in light of the fact that mutant RAG1(N961A) displays enhanced coupled cleavage compared to wild-type RAG1. i, Cleavage reactions using wild-type and mutant cBbRAG1L proteins (with BbRAG2L) and substrates containing one or two TIRs as indicated above and below the lanes (left). cBbRAG1L(V751E), but not cBbRAG1L(A1064S), reduces uncoupled single 3′TIR cleavage (lower black asterisk, lane 2; reduction also seen in lane 8) and single 5′TIR cleavage (seen most clearly in lane 5). 
    The strong reduction in cleavage seen with the BbRAG1L(V751E/A1064S) double mutant suggests the possibility that hydrogen bonding between these two residues holds the active site in an inactive configuration. Right, quantification of uncoupled cleavage as the ratio of the intensity of the 3′TIR single cleavage band (lower black asterisk) to that of the double cleavage band (red asterisk) as in lanes 1–3. Mean ± s.e.m. Two-tailed t-test: **P < 0.01, compared to wild-type cBbRAG1L. ns, not significant.

  7. Extended Data Fig. 7 In vitro transposition by wild-type and mutant RAG proteins.

    a, Schematic of intramolecular transposition. If the 3′ OH nucleophiles attack the strand on which they are located, the products are two deletion circles (top), but if they attack the opposite strand, a single inversion circle product is generated (bottom). Staggered attack on the target DNA backbone yields single-stranded gaps in the products, represented as five short vertical lines. b, Inverse PCR reaction to amplify inversion circles from purified intramolecular transposition product as in Fig. 5d, third lane. The band indicated with an arrow was excised, cloned and sequenced, yielding sites at which intramolecular transposition occurred to yield inversion circles, indicated in the map of the excised 12/23RSS central fragment (below). Half arrows indicate approximate locations of PCR primers. The locations of deletion circle joints detected by sequencing are not indicated. c, Schematic of intermolecular in vitro transposition assay. An RSS-flanked Tet gene is mobilized from a linear donor by RAG-mediated DNA cleavage and can transpose into a target plasmid, which is detected after bacterial transformation by the appearance of colonies on Kan/Tet/Str (KTS) plates (streptomycin (str) is not relevant in this assay). d, In vitro DNA cleavage and intramolecular transposition by position 848-mutant cRAG1 (with RAG2(1–383)). Increased transposition compared to wild-type cRAG1 is revealed by diminished intensity of the double cleavage band and increased intensity of the slow-migrating intramolecular inversion circle transposition product band (red arrow). The intensity of the inversion circle band underestimates the efficiency of transposition because deletion circle transposition products, which are of heterogeneous size and hence not visible as a discrete band, are also produced (ref. 18).
    e, Quantification of intramolecular transposition efficiency from three independent experiments as in d, measured by ratio of double cleavage band to 23RSS cleavage band (the latter serving as an internal control for the total amount of cleavage). The ratio decreases as intramolecular transposition increases in efficiency, consuming the double cleavage band. Mean, with data range indicated by box. Two-tailed t-test; P values are indicated. f, Distribution of transposition target site duplication lengths determined by sequencing of plasmid transposition products or from high-throughput sequencing of plasmid-to-genome transposition products (Extended Data Fig. 9d), as indicated above the bars. The RAG1 protein used is indicated below the bars. In vitro reactions as in Fig. 5e using RAG2(1–383); in vivo plasmid target reactions as in Fig. 5g using RAG2(1–350); genome transposition products generated using RAG2(1–350). In a small fraction of plasmids, sequencing revealed deletions at the site of insertion of the RSSs (red; deletion). g, In vitro cleavage and intramolecular transposition reactions using RAG2(1–352) and RAG2(1–383) (as indicated above the lanes) and wild-type or mutant cRAG1 (as indicated below the lanes). Transposition is readily detected with both forms of RAG2 and is increased by the RAG1(R848M) mutation. h, In vitro intermolecular transposition assays using RAG2(1–383) and RAG2(1–352) and wild-type or mutant cRAG1 (as indicated below the lanes). Deleting the RAG2 acidic hinge does not increase the efficiency of intermolecular transposition in vitro.

  8. Extended Data Fig. 8 In vivo transposition by RAG and BbRAGL proteins.

    a, Schematic of plasmid-to-plasmid in vivo transposition assay. An RSS-flanked Tet gene is mobilized from a donor plasmid by RAG-mediated DNA cleavage and can transpose into a target plasmid, which is detected after bacterial transformation by the appearance of colonies on KTS plates (streptomycin (str) reduces background in the assay by selecting against bacteria containing the rpsL gene, present in the donor plasmid). b, Schematic of in vivo GFP fluorescence recombination assay, used to generate data in c (right), e (right) and g. Excision of the polyadenylation sequence (poly-A) together with its flanking RSSs or TIRs (triangles) by RAG or BbRAGL and resealing of the plasmid allows expression of GFP. c, In vivo transposition (left) and recombination (right) activity in HEK293T cells of wild-type BbRAG1L and BbRAG1L(M949R) (together with BbRAG2L). Mean ± s.e.m. Two-tailed t-test: ***P < 0.005, compared to wild-type BbRAG1L. d, In vivo transposition activity assayed in human colon cancer cell line HCT116 with full-length RAG1(R848M/E649V) and either RAG2(1–350) or RAG2(1–383). As in HEK293T cells, transposition is strongly inhibited by the RAG2 acidic hinge. Mean ± s.e.m. e, In vivo transposition (left) and recombination (right) activity in HEK293T cells of human wild-type RAG1 and RAG1(R851M) with different forms of human RAG2, beginning at amino acid 1 and ending with the amino acid indicated below the bars. Mean ± s.e.m. Two-tailed t-test: ***P < 0.005; ****P < 0.001 compared to wild-type human RAG1. f, g, Protein expression (f) and recombination activity (g) in HEK293T cells of wild-type and mutant mouse RAG1 and RAG2 proteins used in the in vivo transposition assays. The data show that the large increases in transposition activity observed with some proteins (for example, RAG2(1–350), RAG2(1–352), and RAG1(R848M)) are not due to large increases in protein expression or cleavage/recombination activity. 
    h, i, In vivo transposition activity assayed in HEK293T cells with full-length RAG1(R848M) (h) or RAG1(R848M/E649V) (i) and various forms of RAG2, beginning at amino acid 1 and ending with the amino acid indicated below the bars. FL, full-length RAG2.

  9. Extended Data Fig. 9 Transposition into the human genome by mutant RAG proteins.

    a, Schematic of plasmid-to-genome in vivo transposition assay. An RSS-flanked Puro expression cassette is mobilized from a plasmid donor by RAG-mediated DNA cleavage and can transpose into the genome, which is detected by selection with puromycin and high-throughput sequencing. b, Schematic illustrating detection of bona fide transposition events into the genome by LAM-PCR and high-throughput sequencing. LAM-PCR is performed on genomic DNA with biotinylated primers (half arrows) that extend into the DNA flanking either the 12RSS or 23RSS; thereafter, independent libraries are prepared and sequenced for the 12RSS and 23RSS flanks. If the donor plasmid randomly inserts into the genome (i), then the RSS is flanked by donor plasmid sequences. If the RSS fragment is cleaved at one or both RSSs and randomly inserted into the genome (ii), then a match with an appropriate sequence duplication (indicative of a TSD) will not be found between the 12RSS and 23RSS libraries. Finally, if the RSS fragment is inserted into the genome by transposition (iii), a match with a 3–7-bp TSD will be found in the 12RSS and 23RSS libraries. c, Tissue culture plates stained with crystal violet showing puromycin-resistant colonies for experiments using RAG2(1–350) and either wild-type RAG1 or RAG1(R848M/E649V). Colony numbers increase about twofold with the mutant RAG1 protein but many colonies are seen with wild-type RAG1 owing to random integration of the donor plasmid. Essentially no colonies are seen if the donor plasmid is omitted (first column of plates). d, Summary of sequence data from the plasmid-to-genome transposition experiments.
For each of the six libraries, column 1 shows the total number of reads with a barcode and RSS, columns 2 and 3 show a breakdown of the number of reads in which RSS flanking sequences map to the human genome or the donor plasmid (a small fraction of reads do not map to either genome or plasmid owing to poor read quality), column 4 shows the number of unique reads that map to the genome (after elimination of duplicates) and column 5 shows the number of bona fide transposition events detected. e, Rainfall circos plot of transposition events into chromosomes of HEK293T cells. f–h, Genome features of transposon integration sites mediated by RAG1(R848M/E649V) and RAG2(1–350). f, Number (per cent) of transposition events into the genome features indicated. One-tailed Fisher’s exact test was used to determine whether the frequency of transposition events was greater than that expected by chance: genes (P = 9 × 10⁻³⁰); protein-coding genes (P = 5 × 10⁻³⁵); exons (P = 6 × 10⁻⁸⁶); protein-coding exons (P = 4 × 10⁻⁸²) and within 2 kb of a TSS (P = 5 × 10⁻¹⁸⁰). g, h, Meta-analysis of integration sites within gene bodies (g) and flanking TSSs (h).
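
The matching logic described in panel b can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the `Flank` record, its coordinate convention, the function names and the example coordinates are all hypothetical, and reads mapping to the donor plasmid (case i) are assumed to have been filtered out beforehand. A pair of flanks, one from the 12RSS library and one from the 23RSS library, is counted as a candidate transposition event only if the two genomic flanks abut the element from opposite sides and overlap by 3–7 bp, the overlap being the TSD.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TSD_MIN, TSD_MAX = 3, 7  # TSD length range given in the legend (bp)


@dataclass(frozen=True)
class Flank:
    """Genomic alignment of the sequence flanking one RSS (hypothetical record)."""
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    junction: str  # "end" if the RSS abuts `end`, "start" if it abuts `start`


def match(f12: Flank, f23: Flank) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, tsd_start, tsd_length) if the two flanks share a valid TSD."""
    # The flanks must lie on the same chromosome and approach the insertion
    # site from opposite sides.
    if f12.chrom != f23.chrom or f12.junction == f23.junction:
        return None
    upstream, downstream = (f12, f23) if f12.junction == "end" else (f23, f12)
    # After transposition the duplicated target site is present in both
    # flanks, so the two alignments overlap by exactly the TSD length.
    overlap = upstream.end - downstream.start
    if TSD_MIN <= overlap <= TSD_MAX:
        return (f12.chrom, downstream.start, overlap)
    return None


def find_events(lib12: List[Flank], lib23: List[Flank]) -> List[Tuple[str, int, int]]:
    """Pair every 12RSS flank with every 23RSS flank (quadratic; fine for a sketch)."""
    events = []
    for f12 in lib12:
        for f23 in lib23:
            hit = match(f12, f23)
            if hit is not None:
                events.append(hit)
    return events


# Illustrative, made-up coordinates: the chr1 pair overlaps by 5 bp,
# the chr2 pair does not overlap at all (random insertion, case ii).
lib12 = [Flank("chr1", 100, 205, "end"), Flank("chr2", 50, 150, "end")]
lib23 = [Flank("chr1", 200, 300, "start"), Flank("chr2", 400, 500, "start")]
print(find_events(lib12, lib23))
```

A real analysis would also need to handle duplicate reads, mapping quality and strand, which are omitted here for brevity.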

  10. Extended Data Fig. 10 Model of RAG evolution in metazoans.

    Steps leading from the ancestral Transib transposon, consisting of a RAG1-like ORF flanked by RSS-like TIRs (ref. 34), to the RAG recombinase and split antigen receptor genes of jawed vertebrates. I, capture of a RAG2-like ORF by a Transib transposon to generate the ancestral RAG transposon in an early deuterostome; II, key events in the evolution of RAG1, RAG2 and antigen receptor genes of jawed vertebrates: A, insertion of the RAG transposon into the exon of a gene encoding an immunoglobulin-domain receptor protein to generate the ancestral antigen receptor gene; B, loss of CTT* and acquisition of Glu649 and Ser963 by RAG1 facilitated evolution of the 12/23 rule and coupled cleavage, respectively, while acquisition of RAG1 Arg848 and the RAG2 acidic hinge powerfully suppressed RAG transposition activity. The order of events depicted in II is not known. RAG-related elements, if found in members of a given lineage, are indicated at right, as is the presence of the CTT* domain. Protostome lineages have been collapsed into a single branch. While vertical transmission is consistent with the distribution of RAG1 and RAG2 transposon and recombinase elements in deuterostomes (ref. 11), horizontal transmission might have contributed to the spread of Transib elements.

  11. Extended Data Table 1 Cryo-EM data collection, refinement and validation statistics

Supplementary information

  1. Supplementary Data

    This file contains Supplementary Data 1: Uncropped gel images for data obtained by gel-based electrophoretic separation and Supplementary Data 2: Examples of flow cytometry data for the in vivo GFP fluorescence recombination assay used to generate data for Extended Data Fig. 8c (right), e (right) and g.

  2. Reporting Summary

Fig. 1: Uncoupled DNA cleavage by BbRAGL.
Fig. 2: Cryo-EM structure of cBbRAGL-nicked 3′TIR complex.
Fig. 3: DNA cleavage properties of chimeric RAG1–BbRAG1L proteins.
Fig. 4: Residues that control coupled cleavage.
Fig. 5: Reawakening the RAG transposon in vivo.

