Evolutionary mining and functional characterization of TnpB nucleases identify efficient miniature genome editors

Abstract

As the evolutionary ancestor of Cas12 nuclease, the transposon (IS200/IS605)-encoded TnpB proteins act as compact RNA-guided DNA endonucleases. To explore their evolutionary diversity and potential as genome editors, we screened TnpBs from 64 annotated IS605 members and identified 25 active in Escherichia coli, of which three are active in human cells. Further characterization of these 25 TnpBs enables prediction of the transposon-associated motif (TAM) and the right-end element RNA (reRNA) directly from genomic sequences. We established a framework for annotating TnpB systems in prokaryotic genomes and applied it to identify 14 additional candidates. Among these, ISAam1 (369 amino acids (aa)) and ISYmu1 (382 aa) TnpBs demonstrated robust editing activity across dozens of genomic loci in human cells. Both RNA-guided genome editors demonstrated similar editing efficiency as SaCas9 (1,053 aa) while being substantially smaller. The enormous diversity of TnpBs holds potential for the discovery of additional valuable genome editors.

Fig. 1: Curation of ISfinder-annotated TnpB systems.
Fig. 2: Analyses of TAM and TnpB activity in E. coli.
Fig. 3: Identification of three active TnpB systems in human HEK293T cells.
Fig. 4: Characterization of TnpB-associated reRNA.
Fig. 5: Evolutionary and functional properties of active TnpB systems.
Fig. 6: De novo annotation and in-depth characterization of ISAam1 and ISYmu1 TnpB systems.

Data availability

All data generated or analyzed in this study are included in this article, the supplementary information files or the dedicated databases. Small-scale datasets are directly shown in the main figures, extended data figures or supplementary tables. Sequencing data generated in this study were concurrently submitted to the NCBI Sequence Read Archive under accession number PRJNA937454 (ref. 84) and the National Genomics Data Center (part of the China National Center for Bioinformation) under accession number PRJCA015164 (ref. 85). Source data are provided with this paper.

Code availability

The code developed for TAM depletion analysis and de novo IS605 annotation has been wrapped as two Snakemake (ref. 86) workflows, accessible at both Zenodo (ref. 87) and GitHub (https://github.com/Zhanglab-IOZ/TnpB).
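As a rough illustration of what a TAM depletion analysis computes, the sketch below compares how often each candidate motif appears in a randomized library before and after selection and reports a log2 depletion score. It is not the authors' Snakemake workflow, which is available at the repository above. The function name `tam_depletion`, the pseudocount and the toy motif counts are assumptions made for illustration only.

```python
import math
from collections import Counter


def tam_depletion(control_motifs, selected_motifs, pseudocount=1.0):
    """Return {motif: log2(control frequency / selected frequency)}.

    Positive scores mean the motif became rarer after selection,
    which is the expected signature of a motif that permits cleavage.
    """
    control = Counter(control_motifs)
    selected = Counter(selected_motifs)
    motifs = set(control) | set(selected)
    # The pseudocount keeps motifs that vanish entirely from dividing by zero.
    n_control = sum(control.values()) + pseudocount * len(motifs)
    n_selected = sum(selected.values()) + pseudocount * len(motifs)
    scores = {}
    for motif in motifs:
        f_control = (control[motif] + pseudocount) / n_control
        f_selected = (selected[motif] + pseudocount) / n_selected
        scores[motif] = math.log2(f_control / f_selected)
    return scores


# Toy data (hypothetical): one motif is largely lost after selection.
control_reads = ["TTGAT"] * 50 + ["ACGTA"] * 50
selected_reads = ["TTGAT"] * 2 + ["ACGTA"] * 98
scores = tam_depletion(control_reads, selected_reads)
ranked = sorted(scores, key=scores.get, reverse=True)  # most depleted first
```

In practice the motif lists would come from sequencing reads of the randomized region flanking the target, and the top-ranked motifs would be summarized as a sequence logo.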

References

  1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).

  2. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).

  3. Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).

  4. Wang, H. et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918 (2013).

  5. Adli, M. The CRISPR tool kit for genome editing and beyond. Nat. Commun. 9, 1911 (2018).

  6. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).

  7. Doudna, J. A. The promise and challenge of therapeutic genome editing. Nature 578, 229–236 (2020).

  8. Rousset, F. & Sorek, R. A treasure trove of molecular scissors. Science 374, 37–38 (2021).

  9. Sun, A. et al. The compact Casπ (Cas12l) ‘bracelet’ provides a unique structural platform for DNA manipulation. Cell Res. 33, 229–244 (2023).

  10. Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).

  11. Huang, T. P. et al. High-throughput continuous evolution of compact Cas9 variants targeting single-nucleotide-pyrimidine PAMs. Nat. Biotechnol. 41, 96–107 (2023).

  12. Bigelyte, G. et al. Miniature type V-F CRISPR–Cas nucleases enable targeted DNA modification in cells. Nat. Commun. 12, 6191 (2021).

  13. Kim, D. Y. et al. Efficient CRISPR editing with a hypercompact Cas12f1 and engineered guide RNAs delivered by adeno-associated virus. Nat. Biotechnol. 40, 94–102 (2022).

  14. Wu, Z. et al. Programmed genome editing by a miniature CRISPR–Cas12f nuclease. Nat. Chem. Biol. 17, 1132–1138 (2021).

  15. Xu, X. et al. Engineered miniature CRISPR–Cas system for mammalian genome regulation and editing. Mol. Cell 81, 4333–4345 (2021).

  16. Altae-Tran, H. et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science 374, 57–65 (2021).

  17. Kapitonov, V. V., Makarova, K. S. & Koonin, E. V. ISC, a novel group of bacterial and archaeal DNA transposons that encode Cas9 homologs. J. Bacteriol. 198, 797–807 (2015).

  18. Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692–696 (2021).

  19. Makarova, K. S. et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 18, 67–83 (2020).

  20. Filée, J., Siguier, P. & Chandler, M. Insertion sequence diversity in archaea. Microbiol. Mol. Biol. Rev. 71, 121–157 (2007).

  21. He, S. et al. The IS200/IS605 family and ‘Peel and Paste’ single-strand transposition mechanism. Microbiol. Spectr. 3, MDNA3-0039-2014 (2015).

  22. Siguier, P., Gourbeyre, E., Varani, A., Ton-Hoang, B. & Chandler, M. Everyman’s guide to bacterial insertion sequences. Microbiol. Spectr. 3, MDNA3-0030-2014 (2015).

  23. Siguier, P., Perochon, J., Lestrade, L., Mahillon, J. & Chandler, M. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 34, D32–D36 (2006).

  24. Makałowski, W., Gotea, V., Pande, A. & Makałowska, I. Transposable elements: classification, identification, and their use as a tool for comparative genomics. In Evolutionary Genomics (ed Anisimova, M.) 177–207 (Springer, 2019).

  25. Barabas, O. et al. Mechanism of IS200/IS605 family DNA transposases: activation and transposon-directed target site selection. Cell 132, 208–220 (2008).

  26. He, S. et al. Reconstitution of a functional IS608 single-strand transpososome: role of non-canonical base pairing. Nucleic Acids Res. 39, 8503–8512 (2011).

  27. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain bacteria. Nature 523, 208–211 (2015).

  28. Ji, Y. et al. Widespread but poorly understood bacteria: candidate phyla radiation. Microorganisms 10, 2232 (2022).

  29. Jager, D., Forstner, K. U., Sharma, C. M., Santangelo, T. J. & Reeve, J. N. Primary transcriptome map of the hyperthermophilic archaeon Thermococcus kodakarensis. BMC Genomics 15, 684 (2014).

  30. Gomes-Filho, J. V. et al. Sense overlapping transcripts in IS1341-type transposase genes are functional non-coding RNAs in archaea. RNA Biol. 12, 490–500 (2015).

  31. Zhang, F. & Huang, Z. Mechanistic insights into the versatile class II CRISPR toolbox. Trends Biochem. Sci. 47, 433–450 (2022).

  32. Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).

  33. Kim, H. et al. Surrogate reporters for enrichment of cells with nuclease-induced mutations. Nat. Methods 8, 941–943 (2011).

  34. Moon, S. B., Kim, D. Y., Ko, J. H., Kim, J. S. & Kim, Y. S. Improving CRISPR genome editing by engineering guide RNAs. Trends Biotechnol. 37, 870–881 (2019).

  35. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).

  36. Pasternak, C. et al. ISDra2 transposition in Deinococcus radiodurans is downregulated by TnpB. Mol. Microbiol. 88, 443–455 (2013).

  37. Takeda, S. N. et al. Structure of the miniature type V-F CRISPR–Cas effector enzyme. Mol. Cell 81, 558–570 (2021).

  38. Xiao, R., Li, Z., Wang, S., Han, R. & Chang, L. Structural basis for substrate recognition and cleavage by the dimerization-dependent CRISPR–Cas12f nuclease. Nucleic Acids Res. 49, 4120–4128 (2021).

  39. Al-Shayeb, B. et al. Diverse virus-encoded CRISPR–Cas systems include streamlined genome editors. Cell 185, 4574–4586 (2022).

  40. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  41. Nobles, C. L. et al. iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Genome Biol. 20, 14 (2019).

  42. Edraki, A. et al. A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Mol. Cell 73, 714–726 (2019).

  43. Awan, M. J. A., Amin, I. & Mansoor, S. Mini CRISPR–Cas12f1: a new genome editing tool. Trends Plant Sci. 27, 110–112 (2021).

  44. Nakamura, M., Gao, Y., Dominguez, A. A. & Qi, L. S. CRISPR technologies for precise epigenome editing. Nat. Cell Biol. 23, 11–22 (2021).

  45. Clow, P. A. et al. CRISPR-mediated multiplexed live cell imaging of nonrepetitive genomic loci with one guide RNA per locus. Nat. Commun. 13, 1871 (2022).

  46. Taghbalout, A. et al. Enhanced CRISPR-based DNA demethylation by Casilio-ME-mediated RNA-guided coupling of methylcytosine oxidation and DNA repair pathways. Nat. Commun. 10, 4296 (2019).

  47. Nakagawa, R. et al. Cryo-EM structure of the transposon-associated TnpB enzyme. Nature 616, 390–397 (2023).

  48. Sasnauskas, G. et al. TnpB structure reveals minimal functional core of Cas12 nuclease family. Nature 616, 384–389 (2023).

  49. Shmakov, S. et al. Diversity and evolution of class 2 CRISPR–Cas systems. Nat. Rev. Microbiol. 15, 169–182 (2017).

  50. Yin, Y., Yang, B. & Entwistle, S. Bioinformatics identification of anti-CRISPR loci by using homology, guilt-by-association, and CRISPR self-targeting spacer approaches. mSystems 4, e00455–19 (2019).

  51. Gilbert, C. & Cordaux, R. Horizontal transfer and evolution of prokaryote transposable elements in eukaryotes. Genome Biol. Evol. 5, 822–832 (2013).

  52. Tan, S. et al. LTR-mediated retroposition as a mechanism of RNA-based duplication in metazoans. Genome Res. 26, 1663–1675 (2016).

  53. Tan, S. et al. DNA transposons mediate duplications via transposition-independent and -dependent mechanisms in metazoans. Nat. Commun. 12, 4280 (2021).

  54. Makalowski, W., Gotea, V., Pande, A. & Makalowska, I. Transposable elements: classification, identification, and their use as a tool for comparative genomics. Methods Mol. Biol. 1910, 177–207 (2019).

  55. Bao, W. & Jurka, J. Homologues of bacterial TnpB_IS605 are widespread in diverse eukaryotic transposable elements. Mob. DNA 4, 12 (2013).

  56. Benler, S. & Koonin, E. V. Recruitment of mobile genetic elements for diverse cellular functions in prokaryotes. Front. Mol. Biosci. 9, 821197 (2022).

  57. Thornburg, B. G., Gotea, V. & Makałowski, W. Transposable elements as a significant source of transcription regulating signals. Gene 365, 104–110 (2006).

  58. Feschotte, C. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9, 397–405 (2008).

  59. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).

  60. Leenay, R. T. et al. Identifying and visualizing functional PAM diversity across CRISPR–Cas systems. Mol. Cell 62, 137–147 (2016).

  61. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).

  62. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  63. Hickman, A. B. et al. DNA recognition and the precleavage state during single-stranded DNA transposition in D. radiodurans. EMBO J. 29, 3840–3852 (2010).

  64. Park, J., Lim, K., Kim, J.-S. & Bae, S. Cas-analyzer: an online tool for assessing genome editing results using NGS data. Bioinformatics 33, 286–288 (2017).

  65. Dolan, S. jq: command-line JSON processor. https://github.com/jqlang/jq (2018).

  66. Kans, J. Entrez Direct: E-utilities on the Unix command line. In Entrez Programming Utilities Help (National Center for Biotechnology Information, 2013).

  67. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  68. Steinegger, M. & Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  69. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  70. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

  71. Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

  72. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).

  73. Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).

  74. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).

  75. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  76. Zhang, Y. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  77. Ho, L. S. T. & Ané, C. A linear-time algorithm for Gaussian and non-Gaussian trait evolution models. Syst. Biol. 63, 397–408 (2014).

  78. Ives, A. R. & Garland, T. Jr. Phylogenetic logistic regression for binary dependent variables. Syst. Biol. 59, 9–26 (2010).

  79. Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Springer, 2002).

  80. Eddy, S. R. et al. HMMER: biosequence analysis using profile hidden Markov models. http://hmmer.org (2020).

  81. Huang, L. et al. AcrDB: a database of anti-CRISPR operons in prokaryotes and viruses. Nucleic Acids Res. 49, D622–D629 (2021).

  82. Pourcel, C. et al. CRISPRCasdb a successor of CRISPRdb containing CRISPR arrays and cas genes from complete genome sequences, and tools to download and query lists of repeats and spacers. Nucleic Acids Res. 48, D535–D544 (2020).

  83. Alkhnbashi, O. S., Meier, T., Mitrofanov, A., Backofen, R. & Voss, B. CRISPR–Cas bioinformatics. Methods 172, 3–11 (2020).

  84. Xiang, G. et al. Screening and characterization of TnpB systems. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA937454 (2023).

  85. Xiang, G. et al. Screening and characterization of TnpB systems. https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA015164 (2023).

  86. Koster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).

  87. Yuanqing, L. Snakemake workflows for TAM depletion analysis and de novo IS605 annotation. https://doi.org/10.5281/zenodo.7952678 (2023).

Acknowledgements

We thank the Barrangou laboratory for sharing the pBAD33 plasmid. We thank Wang and Zhang laboratory members for helpful discussions. We thank the HPC Platform of the Beijing Institute of Genomics for providing the computational platform. We thank B. Peng at Peking University for data processing. We thank W. Li and K. Xu for helping with AAV experiments. We thank D. Liu for helping with project management. This work was supported by the National Key Research and Development Program of China (2019YFA0110000 to H.W. and 2019YFA0802600 to Y.E.Z.), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16010503 to H.W.), the Chinese Academy of Sciences (ZDBS-LY-SM005 to Y.E.Z.), the Ministry of Agriculture and Rural Affairs of China and the National Natural Science Foundation of China (31970565 to Y.E.Z. and 32101204 to G.X.). G.X. is supported by ‘ZhiYi’ Innovation Funding and the ‘ZhiYi’ Fellowship.

Author information

Contributions

H.W., Y.E.Z. and G.X. conceived, designed and supervised this work. G.X., J.S. and Y.H. performed experiments, with the help of S.C., Y. Cao, L.Y. and Y. Cai. Y.L. performed computational analysis, with the help of Y.G. Y.E.Z., H.W. and G.X. wrote the manuscript, with help from the other authors.

Corresponding authors

Correspondence to Guanghai Xiang, Yong E. Zhang or Haoyi Wang.

Ethics declarations

Competing interests

A patent application has been filed by the Institute of Zoology, Chinese Academy of Sciences (H.W., G.X., J.S., Y.H., Y.E.Z. and Y.L. as inventors). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Randall Platt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 A schematic of the “Peel, Paste and Copy” model.

The subscripts L and R refer to the left and right ends, respectively. G refers to tetranucleotide guide sequences, while C refers to tetra- or pentanucleotide cleavage sites. CL is also the transposon insertion target site (TS) on the genome (marked with the unfilled bar), while GL/GR and CR are harbored by the element (marked with filled bars). The top panel shows the arrangement of the IS605 locus together with the hairpin structures. During replication, TnpA catalyzes the excision (“Peel”) of the transposon from the original site and its insertion (“Paste”) into a new site (bottom left panel). Guided by RNA derived from the RE (reRNA, comprising sequences within the element as the scaffold and the downstream genomic sequence as the guide), TnpB recognizes the transposon-associated motif (TAM, equivalent to CL or TS for ISDra2) and introduces a double-strand break at the original site. The break is repaired by recombination with the transposon-present allele (“Copy”, bottom right panel).

Extended Data Fig. 2 CL and TAM for 28 TnpB proteins with signals in TAM profiling experiments plus ISDra2 TnpB.

a, ISfinder-annotated CL and experimentally identified TAM sequence. CL for six conflicting cases is marked in red.

Extended Data Fig. 3 Gating strategy to determine the percentage of GFP+/RFP+ cells.

a, Cells are initially gated for viability, subsequently for singularity, and ultimately for the presence of both GFP and RFP markers (GFP+/RFP+).

Extended Data Fig. 4 Characterization of TnpB proteins.

a, Distribution of pairwise sequence identities among the 65 TnpBs tested. With a median identity of only 20% (marked by the red dotted line), these TnpB sequences are highly diverged. b,c, Pairwise comparison between the sequences (b) and structures (c) of archaeal TnpBs and those of active bacterial TnpBs. Compared with the inactive archaeal TnpBs, the active archaeal TnpBs are more similar to the active bacterial TnpB systems at both the sequence and structure levels. For the box plot, the figure convention follows Fig. 5c. Wilcoxon tests were used (n = 48 and 576 for the active and inactive groups, respectively). d, Domain and structure of TnpB with ISTfu1 as an example. The top row shows the domain organization of ISTfu1 with 10 conserved residues marked as red lines. Herein, a fine-scale domain annotation based on knowledge of Un1Cas12f1 proteins is shown, while a gross-scale annotation based on Pfam prediction is used in Fig. 5. The bottom panels show that the conserved N is well overlaid between ISTfu1 TnpB and Un1Cas12f1 (PDB ID: 7C7L) or Casλ (PDB ID: 8DC2). “Superposition” indicates that all three structures are overlaid. For panels c and d, the TnpB structure is predicted with AlphaFold2.
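The median pairwise identity reported in panel a can be illustrated with a minimal sketch, assuming the sequences have already been aligned to equal length with `-` as the gap character. How identity was actually defined in the study is not stated here, so the gap handling below (columns with a gap in either sequence are skipped) is an assumption, and the three toy sequences and their names are hypothetical.

```python
from itertools import combinations
from statistics import median


def pairwise_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical sequences, not real TnpB proteins).
aligned = {
    "tnpb_a": "MKR-LA",
    "tnpb_b": "MKRQLA",
    "tnpb_c": "MARQ-G",
}
identities = [
    pairwise_identity(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
]
median_identity = median(identities)
```

For 65 sequences this yields 2,080 pairs, so the quadratic loop remains cheap at this scale.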

Extended Data Fig. 5 Functional screening in HEK293T and TAM characterization of TnpB systems from de novo annotation.

a, Two TnpB systems induced reporter activity in HEK293T, as shown in Fig. 3a. Data are shown as the mean ± SD of three biological replicates with actual values overlaid. b, TAM logos for 10 TnpB systems characterized via a negative selection assay in E. coli, as shown in Fig. 2a.

Extended Data Fig. 6 Editing efficiency comparison of ISAam1, ISYmu1 and ISDra2.

Surveyor assay gel images (a) and editing efficiency (b) of the ISAam1, ISYmu1 and ISDra2 systems at six randomly selected endogenous sites in HEK293T cells. For panel b, data are shown as the mean ± SD of three biological replicates with actual values overlaid. The samples derive from the same experiment, and the gels were processed in parallel.

Extended Data Fig. 7 gRNA design for seven nucleases at ten human genomic loci.

a, Color-coded nucleases and the corresponding TAM. The gRNAs are aligned according to their stranded positions. Taking CBLB as an example, the gRNAs overlap more for ISAam1 and the three Cas12f variants than for the other three nucleases. Sequences of all designed gRNAs are listed in Supplementary Table 8.

Extended Data Fig. 8 Editing efficiency of seven systems at eight genomic loci in the mouse N2a cell line.

a, Distribution of the average efficiency of three biological replicates corresponding to one individual locus. The figure convention follows Fig. 6c. Ordinary one-way ANOVA with Tukey’s multiple comparisons test was performed (**P < 0.01; *P < 0.05).

Extended Data Fig. 9 Comparison of editing efficiency of ISAam1 (a) or ISYmu1 (b) relative to five Cas nucleases at three genomic loci in HEK293T cells.

The gRNA design is shown in the left panel and the editing efficiency in the right panel. The seven nucleases are color-coded. Since it is impossible to design overlapping gRNAs targeting the same location across all seven nucleases, two groups of overlapping gRNAs were designed separately: one for ISAam1, the three Cas12f variants and Nme2-C.NR, and one for ISAam1 and SaCas9. The same approach was used for ISYmu1. Data are shown as the mean ± SD of three biological replicates with actual values overlaid.

Extended Data Fig. 10 Five TnpB systems described in this study have compact size and broad editing scope in the human genome.

a, Protein length distribution of representative Cas9, Cas12 and TnpB proteins. The proteins are sorted by decreasing size, with the dashed line marking the size of the smallest protein, ISAam1 TnpB. In addition to the seven proteins in Fig. 6, the widely used SpCas9 and AsCas12a, together with ISDra2 TnpB and three TnpBs identified in this study, are also shown. b, Protein length distribution of Cas9, Cas12 and TnpB homologs annotated in CRISPRCasdb. Notably, both Cas9 and Cas12 show a multimodal distribution, which could be caused by their intrinsic diversity in protein length. In contrast, TnpB shows only one peak at approximately 400 aa. For the violin plot, the bar indicates the first and third quartiles, the dot indicates the median and the curve indicates the data density. c, The proportion of potentially targetable exons in the human genome. Exons harboring at least one TAM site were counted. LincRNA refers to long intergenic noncoding RNA.
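The exon count in panel c can be sketched as follows, assuming exon sequences are available as plain strings and the TAM is written as an IUPAC motif. The helper names, the motif `TTTAN` and the toy exons are hypothetical and are not taken from the study. The sketch scans both strands and does not check that a full-length guide target fits next to the TAM.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_tam(seq, tam):
    """True if the motif occurs on either strand of seq."""
    pattern = re.compile("".join(IUPAC[base] for base in tam.upper()))
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def targetable_fraction(exons, tam):
    """Fraction of exons harboring at least one TAM site."""
    if not exons:
        return 0.0
    return sum(has_tam(exon, tam) for exon in exons) / len(exons)


# Toy exons (hypothetical): the first carries the motif on the forward
# strand, the second on the reverse strand, the third on neither.
exons = ["GGCTTTACGG", "CCGTAAAGCC", "GGGGCCCCGG"]
fraction = targetable_fraction(exons, "TTTAN")
```

A shorter or more degenerate motif matches more exons, which is why the TAM length and composition largely determine the editing scope of a given nuclease.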

Supplementary information

Supplementary Figs. 1–6 and Supplementary Protocol

Reporting Summary

Supplementary Tables 1–9

This file contains all supplementary tables.

Supplementary Data 1

This file contains source data for supplementary figures.

Source data

Source Data Extended Data Fig. 6

Unprocessed gels for Extended Data Fig. 6

Source Data Figs. 2–6 and Extended Data Figs. 5, 6, 8 and 9

Statistical source data for Figs. 2–6 and Extended Data Figs. 5, 6, 8 and 9

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiang, G., Li, Y., Sun, J. et al. Evolutionary mining and functional characterization of TnpB nucleases identify efficient miniature genome editors. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01857-x

  • DOI: https://doi.org/10.1038/s41587-023-01857-x
