Main

CRISPR–Cas systems provide adaptive immunity against mobile genetic elements (MGEs) in bacteria and archaea and are divided into two classes (classes 1 and 2) and six types (types I–VI)1,2. The class 2 systems include types II, V and VI, in which Cas9, Cas12 and Cas13, respectively, function as effector enzymes responsible for interference against MGEs. Recent studies have identified a dozen functionally divergent type V Cas12 effector proteins3,4,5,6,7,8,9,10,11. Cas12 enzymes associate with a CRISPR RNA (crRNA) or dual RNA guides (crRNA and trans-activating crRNA) and cleave double-stranded DNA (dsDNA) targets flanked by a protospacer-adjacent motif (PAM), using a single RuvC nuclease domain. Aside from the RuvC domain, Cas12 effector proteins share little sequence similarity, reflecting their diverse biochemical features. Type V Cas12 effectors reportedly evolved from the IS200/IS605 (IS stands for bacterial insertion sequences) superfamily transposon-associated TnpB protein2,12,13. Although the sequences of relatively large Cas12 proteins, such as Cas12a, have weak similarities to the TnpB sequence, five recently identified small Cas12 variants (V-U1 to V-U5) show greater sequence similarities to TnpB and are thought to represent the early stage of evolution from TnpB to the larger Cas12 enzymes2. Among the type V-U families, the U2–U4 clusters are classified into the Cas12f subtype, in which the effector proteins function as a dimer, while the U5 cluster is classified as the Cas12k subtype, in which the effector proteins are catalytically inactive.

A recent study demonstrated that the type V-M (formerly type V-U1, clade 2) Cas12m2 from Mycolicibacterium mucogenicum (hereafter referred to as Cas12m2 for simplicity) exhibits RNA-guided dsDNA interference activity11. Cas12m2 consists of 596 amino acid residues and is much smaller than other Cas12 enzymes, except for Cas12f7. Cas12m2 associates with a single crRNA to recognize dsDNA targets containing a 5′-TTN-3′ PAM. Interestingly, Cas12m2 lacks target dsDNA-cleavage activity, probably due to the non-canonical His–Asp–Asp (HDD) catalytic residues in the RuvC domain, rather than the canonical, DNA-cleavable active site Asp–Glu–Asp (DED). Hence, instead of target cleavage, Cas12m2 interferes with invading MGEs through transcriptional silencing by strong DNA binding11. However, because of the lack of structural information, the molecular mechanism of the miniature Cas12m2 and the evolutionary relationship between TnpB and Cas12m2 remain unknown.

Here, we report multiple cryo-electron microscopy (cryo-EM) structural states of Cas12m2–crRNA–target DNA ternary complexes and the Cas12m2–crRNA binary complex at overall resolutions of 2.9–3.7 Å. The structures reveal a unique target dsDNA-binding mechanism of Cas12m2, clearly explaining why Cas12m2 is capable of tightly binding target DNA but unable to cleave it. Furthermore, a structural comparison of Cas12m2 with TnpB illuminates their great structural similarities and diversities, providing insights into the evolutionary path from TnpB to Cas12 enzymes.

Results

Cryo-EM structure of the Cas12m2–crRNA–target DNA complex

To understand the molecular mechanism of Cas12m2, we analyzed its structure in complex with a 56-nucleotide crRNA (including a 20-nucleotide guide) and its 36-base pair (bp) dsDNA target with a 5′-TTG-3′ PAM by cryo-EM. Using three-dimensional (3D) classification and refinement, we discovered two conformational populations of the Cas12m2 ternary complex at overall resolutions of 2.9 Å (state I) and 3.1 Å (state II) (Fig. 1a–c, Table 1, Extended Data Fig. 1 and Supplementary Table 1). In both states, Cas12m2 binds the crRNA and its dsDNA target at a molar ratio of 1:1:1. Although the full-length crRNA–target DNA heteroduplex is visible in state I, a shorter heteroduplex (approximately 12 bp) is observed in state II. For simplicity, we describe the structural features of state I unless otherwise specified.

Fig. 1: Cryo-EM structure of the Cas12m2–crRNA–target DNA ternary complex.
figure 1

a, Domain structure of Cas12m2. b,c, Cryo-EM maps (b) and structural models (c) of the Cas12m2–crRNA–target DNA ternary complex. Zinc and magnesium ions in the TNB and RuvC domains are shown as gray spheres. d, Schematic of the crRNA and target DNA. The disordered regions are enclosed by dashed boxes. e, Structure of the crRNA-and-target DNA complex.

Table 1 Data collection, processing, model refinement and validation

The present structures revealed that Cas12m2 adopts a bilobed architecture composed of REC and nuclease (NUC) lobes (Fig. 1a–c and Extended Data Fig. 2a). The REC lobe consists of wedge (WED), REC1 and REC2 domains. The WED domain adopts an oligonucleotide–oligosaccharide-binding fold similar to those of other Cas12 enzymes14,15,16,17,18,19 (Extended Data Fig. 2b). The REC1 domain consists of three α-helices, and the REC2 domain has a characteristic coiled-coil structure, forming a distinctive architecture resembling a ‘sickle’ (Extended Data Fig. 2b). The NUC lobe contains RuvC and target nucleic acid-binding (TNB) domains. The RuvC domain has an RNase H fold, and the His317, Asp485 and Asp579 residues are located at positions similar to those of the well-conserved catalytic Asp, Glu and Asp residues in most other Cas12 enzymes14,15,16,17,18,19 (Extended Data Fig. 2c). The TNB domain is inserted into the RuvC domain and contains an HCCC-type zinc finger, in which a zinc ion is coordinated by His549, Cys552, Cys569 and Cys572 (Extended Data Fig. 2d). The guide RNA–target DNA heteroduplex is accommodated within the positively charged central channel formed by the REC and NUC lobes (Fig. 1c and Extended Data Fig. 2a). The displaced non-target DNA strand (NTS) is visible in our density map and is bound to unique pockets located in the REC and RuvC domains (discussed below).

crRNA architecture and recognition

The crRNA consists of the 20-nucleotide guide segment (G1 to C20) and the 36-nucleotide repeat-derived scaffold (G(−36) to C(−1)) (Fig. 1d). Nucleotides G1 to U17 in the crRNA and dA1 to dC17 in the target DNA strand (TS) form a 17-bp guide RNA–target DNA heteroduplex (Fig. 1d,e and Extended Data Fig. 2e). Unexpectedly, nucleotides dG(−3) to dC(−1) in the TS do not form base pairs with G18 to C20 in the crRNA and instead rehybridize with dG18* to dC20* in the NTS, suggesting that the 17-bp guide RNA–target DNA heteroduplex represents the optimal length for Cas12m2-mediated DNA recognition (to differentiate between NTS and TS in this study, an asterisk is used to denote the NTS; Fig. 1d,e and Extended Data Fig. 2e).

The crRNA scaffold comprises a pseudoknot (PK) and stem and loop regions. The 5′ end of the crRNA (G(−36) to U(−30)) is disordered (Fig. 1d,e and Extended Data Fig. 2f). Specifically, G(−23) forms a non-canonical base pair with A(−2), and C(−22) to C(−17) form canonical Watson–Crick base pairs with G(−8) to G(−3) to compose the PK structure. The PK coaxially stacks with the stem to form a continuous helix. G(−9) and A(−11) in the loop form hydrogen bonds with the ribose moieties of G(−16) and G(−14), respectively, thereby stabilizing the crRNA scaffold (Extended Data Fig. 3).

The crRNA scaffold is accommodated within the groove formed by the WED and RuvC domains and is mainly recognized by these domains (Fig. 2a,b and Extended Data Fig. 3). Notably, the PK and stem are bound by Arg237 and Arg241 through sugar–phosphate backbone interactions, while the nucleobase of the flipped-out A(−24) is sandwiched by His12 and Arg245 and its ribose moiety hydrogen bonds with Arg245 (Figs. 1d and 2c). The G(−12) in the loop forms a hydrogen bond and stacking interactions with Arg447 and Lys448, thereby stabilizing the crRNA loop region (Fig. 1d and Extended Data Fig. 3). Moreover, the first nucleotide C(−1) and the non-canonical G(−23):A(−2) base pair extensively interact with the WED domain in a base-specific manner. The nucleobase C(−1) forms a base-specific hydrogen bond and stacking interactions with the conserved His269 and Arg270, respectively, while G(−23) stacks with His269, stabilizing the non-canonical G(−23):A(−2) base pair (Fig. 2d). Altogether, the present structure reveals the mechanism of crRNA scaffold recognition by the compact Cas12m2.

Fig. 2: crRNA and target DNA recognition.
figure 2

a, Recognition sites of the crRNA scaffold and target DNA. b, Electrostatic surface potential of the Cas12m2–crRNA–target DNA ternary complex. The crRNA–target DNA heteroduplex is accommodated within a positively charged central channel formed by WED, REC1, REC2 and RuvC domains, and the PAM duplex is sandwiched by the WED and REC1 domains. cg, Recognition of the stem (c), the PK (d), the crRNA–target DNA heteroduplex (e), the rehybridized DNA duplex (f) and the PAM duplex (g). Hydrogen bonds are depicted with green dashed lines. h, DNA binding of Cas12m2 mutants in vitro, measured by SPR spectroscopy, relative to wild type. Values shown are the mean ± s.e.m. of three independent experiments. Statistical analysis was carried out using Tukey’s post hoc test in case of a significant two-way ANOVA result. Significant differences relative to wild-type Cas12m2 are indicated by asterisks. ***P < 0.001. Triple, Cas12m2R111A/R112A/R126A triple mutant. i, Unique Arg-rich pocket formed by the REC1 and REC2 domains. Hydrogen bonds are depicted with green dashed lines. j, Close-up view of the non-canonical RuvC active site. The magnesium ion coordinated with the NTS is shown as a gray sphere. The cryo-EM density for the RuvC conserved residues, the magnesium ion and the displaced NTS (dG16* and dT17*) is shown as a blue mesh. k, Schematic of the non-canonical RuvC active site.

Target DNA architecture and recognition

The guide RNA–target DNA heteroduplex is accommodated within the positively charged central channel formed by the REC1, REC2 and RuvC domains and is recognized by the Cas12m2 protein through interactions with its sugar–phosphate backbone (Fig. 2a,b and Extended Data Fig. 3). Met4 in the WED domain stacks with the first G1:dC17 base pair in the heteroduplex, while Arg301 in the WED domain interacts with the backbone phosphate group of dC18, thereby facilitating heteroduplex formation as observed in other Cas12 enzymes20 (Fig. 2e). Leu502 and Pro503 in the RuvC domain form hydrophobic interactions with the dG18* and dC(−1) bases, respectively, stabilizing reformation of the DNA duplex downstream of the heteroduplex (Fig. 2f).

The PAM-containing DNA duplex (referred to as the PAM duplex) is distorted into a narrow minor groove and is sandwiched by the REC1 and WED domains (Fig. 2a). The dT(−3*) and dT(−2*) nucleotides form hydrophobic interactions with Trp152 and Tyr141 and base-specific hydrogen bonds with Gln197 and Asn156, respectively (Fig. 2g). Tyr141 also interacts with the phosphate backbone of dT(−3*) (Fig. 2g). The dA19 and dA20 nucleobases, which base pair with dT(−2*) and dT(−3*), form hydrogen bonds with Gln195 and Gln197, respectively (Fig. 2g). Substitution mutagenesis of Cas12m2 confirmed the importance of Tyr141, Trp152, Asn156, Gln195 and Gln197 for DNA binding, as alanine substitution of any of these residues strongly affected the ability of Cas12m2 to interact with its DNA target in vitro (Fig. 2h and Extended Data Fig. 4). These structural and biochemical features explain the requirement of the 5′-TTN-3′ PAM sequence for Cas12m2-mediated target DNA binding.

The displaced NTS passes through the groove formed by the REC1 and REC2 domains toward the groove formed by the RuvC and TNB domains (Fig. 2a,b). Notably, the central region of the negatively charged backbone of the NTS is bound to an Arg-rich pocket in the REC1 and REC2 domains and interacts extensively with Arg111, Arg112, Arg126 and Arg173 (Fig. 2i). Such an Arg-rich pocket has not been observed in any other Cas12 enzymes, even though these Arg residues are highly conserved among Cas12m family enzymes (Extended Data Fig. 5). Cas12m2 mutants with alanine substitutions of Arg111, Arg112 and Arg126 showed reduced binding to target DNA in vitro (Fig. 2h and Extended Data Fig. 4c), suggesting the critical role of the unique Arg-rich pocket for the robust DNA-binding affinity of Cas12m enzymes.

Non-canonical HDD-type RuvC active site

In most Cas12 enzymes, the RuvC active site has the conserved DED catalytic residues that coordinate two magnesium ions on either side of the scissile phosphate (Extended Data Fig. 6a). The first magnesium ion (A) activates the nucleophilic water molecule, while the second magnesium ion (B) stabilizes the transition state, resulting in a two-divalent cation-dependent DNA-hydrolysis mechanism21,22 (Extended Data Fig. 6a,b).

In the Cas12m2 structure, the nucleotides dG14* to dT17* in the NTS are clearly visible, and the phosphate group of dG16* is located at the RuvC active site formed by His317, Asp485 and Asp579 residues (Fig. 2j and Extended Data Fig. 6c). At the position corresponding to magnesium ion (A) in other Cas12 enzymes, we observed a magnesium ion that was coordinated by His317 and Asp579, whereas no density corresponding to magnesium ion (B) was observed (Fig. 2j,k and Extended Data Fig. 6c,d). This is due to the replacement of the first Asp and the second Glu in the active site with His317 and Asp485, respectively, in Cas12m2. The His317 side chain has only one lone pair for metal coordination, which is used to bind magnesium ion (A), and the Asp485 side chain is located too far away to coordinate the second magnesium ion (B) (Fig. 2j,k and Extended Data Fig. 6c,d). Therefore, the non-canonical HDD motif leads to the absence of the second magnesium ion, resulting in loss of cleavage activity for the target DNA. Nonetheless, His317 and Asp579 are bound to the NTS via magnesium ion coordination, and His317 is stabilized by hydrogen bonding with Asp485, suggesting that the HDD motif plays a crucial role in target DNA binding, rather than target DNA cleavage. Consistent with these structural observations, our previous study showed that alanine substitution of Asp485 (HDD > HAD) leads to reduced DNA-binding affinity in vitro and transcriptional silencing activity in vivo11, indicating that Cas12m2 uses the RuvC catalytic center as the DNA-binding pocket for robust DNA-binding affinity.

Conformational changes upon heteroduplex formation lead to robust DNA binding

In state I, the target DNA forms a full R-loop containing a 17-bp heteroduplex, whereas, in state II, the target DNA only forms a 12-bp heteroduplex, and the displaced NTS is disordered (Fig. 3a,b). Thus, we propose that state II represents the intermediate state during formation of the 17-bp heteroduplex. Hereafter, we refer to state I as the full R-loop state and to state II as the intermediate state.

Fig. 3: Conformational changes upon heteroduplex formation enable robust DNA binding of Cas12m2.
figure 3

a,b, Structural models of the Cas12m2–crRNA–target DNA ternary complex, representing the intermediate state (a) and full R-loop state (b). The disordered regions are indicated as dashed lines. c, Superimposition of the intermediate state (light blue) and the full R-loop state (colored as in b). d,e, Close-up views around the lid motif of the intermediate state (d) and the full R-loop state (e). The disordered regions are indicated as the gray cycle surrounded by dashed lines. Hydrogen bonds are depicted with green dashed lines. f, Superimposition of the lid motif of the intermediate state (light blue) and the full R-loop state (colored as in b). The displaced NTS (dC15*–dT17*) in the full R-loop state clashes with the Lα2 helix in the intermediate state. g, Schematic of the bicistronic rfp-gfp operon (pTarget-operon), including the crRNA targeting sites in the promoter (A1) and at the end of the operon (3′-UTR; E1) described previously11. h, Normalized RFP and GFP fluorescence of E. coli cultures expressing different Cas12m2 mutants, directed to either the promoter (pCRISPR-A1) or the 3′-UTR (pCRISPR-E1). Fluorescence is shown relative to the average fluorescence of cultures expressing a non-targeting guide (pCRISPR-NT). Values shown are the mean ± s.e.m. of two values of biological duplicates (after averaging technical triplicates). Statistical analysis was carried out using Tukey’s post hoc test in case of a significant two-way ANOVA result. Significant differences relative to Cas12m2 values are indicated by asterisks. *P < 0.05, **P < 0.01, ***P < 0.001. dCas12m2, Cas12m2D485A mutant; triple, Cas12m2R111A/R112A/R126A triple mutant.

Structural comparisons between the full R-loop and the intermediate states revealed conformational changes in the REC2 domain (Fig. 3a–c and Supplementary Video 1). In the intermediate state, the REC2 domain is disordered due to its flexibility (Fig. 3a). By stark contrast, in the full R-loop, the REC2 domain is ordered and interacts with the sugar–phosphate backbone of the heteroduplex at the PAM-distal region (Fig. 3b). This structural rearrangement of the REC2 domain facilitates formation of the Arg-rich pocket between the REC1 and REC2 domains, allowing the binding of the displaced NTS (Fig. 3b). Structural comparisons also indicated local conformational changes in two short α-helices (Lα1 and Lα2) and the loop connecting them (together referred to as the lid motif) in the RuvC domain (Fig. 3c–f). In the intermediate state, residues 499–506 (part of Lα2 and the loop) are disordered due to the flexibility of this region, and Lα1 occludes the RuvC active site (Fig. 3d,f). By contrast, upon formation of the full R-loop, Thr502 in the lid motif hydrogen bonds with the phosphate backbone of dC2, while Pro503 and Leu504 in Lα2 form hydrophobic interactions with dC(−1) and dG18*, respectively (Fig. 3e). These structural rearrangements facilitate the local structural transition of Lα1, opening the RuvC active site to bind the NTS, as observed in Cas12b and Cas12i23,24 (Fig. 3e,f). Taken together, the conformational changes of the REC2 domain and the lid motif, which enable recognition of the PAM-distal end, provide a binding site for NTS within the Arg-rich pocket and the RuvC DNA-binding site, allowing Cas12m2 to silence transcription of target genes through its robust DNA-binding affinity.

To biochemically validate the importance of the two NTS-binding sites (that is, the Arg-rich pocket and the RuvC active site) in Cas12m2-mediated transcriptional repression, we performed an in vivo transcriptional silencing assay. We generated a target plasmid (pTarget-operon) containing a bicistronic operon with two fluorescence reporter genes, rfp and gfp. Escherichia coli cells harboring pTarget-operon and either pCRISPR-A1 (crRNA targeting the promoter region of the operon), pCRISPR-E1 (crRNA targeting the 3′ untranslated region (UTR) of the operon) or pCRISPR-NT (crRNA that does not target the operon) were made chemically competent and transformed with either pCas-Cas12m or plasmids encoding its mutants (including Cas12m2D485A, Cas12m2R111A, Cas12m2R112A and Cas12m2R126A mutants and the Cas12m2R111A/R112A/R126A triple mutant) (Fig. 3g). Wild-type Cas12m2 efficiently reduced red fluorescent protein (RFP) and green fluorescent protein (GFP) fluorescence upon crRNA targeting both the A1 and E1 sites, as observed in a previous study (Fig. 3h). By contrast, all mutants exhibited reduced transcriptional silencing capabilities when targeting the E1 site, and the Cas12m2R111A/R112A/R126A triple mutant showed lower transcriptional silencing efficacy when targeting both the A1 and E1 sites (Fig. 3h). These results explain why the Arg-rich pocket and the RuvC DNA-binding site, which result from conformational changes of the REC2 domain and the lid motif, play critical roles in Cas12m2-mediated transcriptional silencing.

Evolutionary path from TnpB to Cas12m2

The Cas12m family enzymes are thought to represent the early stage of evolution from TnpB to larger Cas12 enzymes2 (Extended Data Fig. 7). Thus, structural comparisons of TnpB and Cas12m2 may provide clues to trace the evolutionary path from TnpB to Cas12 enzymes. The structure of Cas12m2 strongly resembles that of the minimal TnpB containing WED, REC1 (REC domain in TnpB), RuvC and TNB domains25,26 (Fig. 4a). Both enzymes adopt bilobed architecture and accommodate the guide RNA–target DNA heteroduplex in similar manners (Fig. 4a). The PAM duplex (transposon-associated motif (TAM) duplex in TnpB) is commonly bound between the WED and REC1 domains, and the guide RNA scaffold is accommodated within the groove formed by the WED and RuvC domains. These structural similarities confirm that the RNA-guided DNA-targeting mechanism is highly conserved between TnpB and Cas12m2.

Fig. 4: Structural comparison of Cas12m2 and TnpB.
figure 4

a, Structural comparison of Cas12m2 and TnpB (from Deinococcus radiodurans ISDra2) (PDB 8H1J). b, Structural comparison of Cas12m2 and TnpB with emphasis on the guide RNA–target DNA heteroduplex. While Cas12m2 shares high sequence similarity with TnpB, Cas12m2 has an inserted REC2 domain (highlighted in blue), which is not present in TnpB, thereby facilitating recognition of the longer guide RNA–target DNA heteroduplex.

In contrast to these conserved structural features, there are also important structural differences. TnpB only recognizes a 12-bp guide RNA–target DNA heteroduplex, whereas Cas12m2 recognizes a longer 17-bp heteroduplex (Fig. 4b). This difference is mainly due to the characteristic REC2 insertion into the REC1 domain in Cas12m2 (Fig. 4a,b). TnpB lacks this REC2 domain, and thus the end of the guide RNA–target DNA heteroduplex is not recognized and the heteroduplexes are disordered beyond the 12th base pair. Therefore, these structural findings suggest that Cas12m2 acquired the ability to recognize the PAM-distal region of the guide RNA–target DNA heteroduplex through REC2 insertion into the core structure of TnpB. These features contributed to evolution toward CRISPR–Cas adaptive immunity.

Discussion

In this study, we determined the cryo-EM structure of the Cas12m2–crRNA–target DNA ternary complex, revealing that the non-canonical HDD-type RuvC catalytic triad forms an NTS-binding site via coordination of a single magnesium ion. Intriguingly, when we introduced counter-mutations to convert the HDD motif of Cas12m2 into a canonical DED motif, DNA-cleavage activity was not restored (data not shown). This observation indicates that there could be more reasons why Cas12m2 lost DNA-cleavage activity aside from the HDD-type RuvC triad. In a recent study, we divided the Cas12m family enzymes further into four clades, based on the conserved RuvC catalytic triad11 (clade 1, HEK, HQK; clade 2, HDD; clade 3, DED, NED; clade 4, DED, DND, NED). Because the other Cas12m enzyme clades have different catalytic residues, they could interfere with invading MGEs through mechanisms distinct from that of Cas12m2. Notably, clade 1 (HQK) may directly bind the NTS within the positively charged RuvC active site, while clade 3 and clade 4 may facilitate two-divalent cation-dependent DNA hydrolysis within the canonical DED-type RuvC active site. Biochemical and structural characterizations of the other Cas12m enzyme clades are required to further clarify the molecular mechanisms of Cas12m family enzymes.

Most Cas12 enzymes auto-process their own pre-crRNA into mature crRNA without the involvement of a host ribonuclease. Cas12a and Cas12i process their pre-crRNA at their WED domain by metal-independent, acid–base catalytic mechanisms21,24,27,28, whereas Cas12c and Cas12j process their pre-crRNA at the RuvC catalytic site by a metal-dependent mechanism8,20. A recent study showed that, similar to Cas12a and Cas12i, Cas12m2 cleaves its pre-crRNA upstream of the crRNA scaffold, and two conserved residues (His269 and Arg270) in the WED domain may participate in crRNA maturation11. However, unlike Cas12a and Cas12i, in which the 5′ end of the crRNA is surrounded by the WED domain, the 5′ end of Cas12m2–crRNA is projected away from the WED domain and exposed to the solvent in the present structure (Extended Data Fig. 8). His269 and Arg270 in the WED domain form base-specific interactions with the crRNA scaffold, suggesting that these residues are responsible for crRNA recognition, rather than crRNA processing (Fig. 2d). Indeed, a processing analysis of Cas12m2 revealed that alanine substitutions of His269 and Arg270 in the WED domain and Asp485 in the RuvC catalytic site had little or no effect on the pre-crRNA-processing pattern (Extended Data Fig. 9a). These structural and biochemical analyses suggest that Cas12m2 may process its pre-crRNA in a unique manner. Intriguingly, when we analyzed the cryo-EM structure of Cas12m2 complexed with a 56-nucleotide crRNA, we observed an additional density close to the 5′ region of the crRNA (Table 1, Extended Data Fig. 9b–e and Supplementary Table 1). This density aligns well with part of the distinctive sickle-like architecture of the REC1 and REC2 domains in Cas12m2, suggesting that Cas12m2 may facilitate pre-crRNA processing through the formation of a transient asymmetric homodimer. However, because the present resolution is not sufficient to unambiguously elucidate the residues responsible for crRNA maturation, further studies are required to fully understand the pre-crRNA-processing mechanism of Cas12m2.

A structural comparison of TnpB, the putative ancestor of Cas12 enzymes, with Cas12m2, Cas12f14 and Cas12a19 provides insights into the evolutionary path of Cas12 enzymes (Fig. 5a,b). Cas12m2 and Cas12a commonly have REC2 insertions that recognize the PAM-distal region of the guide RNA–target DNA heteroduplex, whereas Cas12f retains a TnpB-like REC domain. Nevertheless, Cas12f functions as an asymmetric homodimer to compensate for recognition of the PAM-distal region of the heteroduplex, with the second molecule serving as a replacement for the REC2 domain14,15. These observations suggest that CRISPR–Cas12 enzymes acquired the ability to recognize the PAM-distal region of the guide RNA–target DNA heteroduplex to engage in CRISPR–Cas adaptive immunity. All Cas12 enzymes except for Cas12f and Cas12k (which are both associated with one or more additional protein molecules) have the REC2 insertion to recognize the PAM-distal region14,15,29,30, supporting our suggestion. Structural comparison between Cas12m2 and Cas12a provides insights into the maturation of Cas12 enzymes (Fig. 5a,b). Notably, Cas12a has larger REC domains and a TNB domain19, which are responsible for target DNA recognition and loading, respectively. In addition, Cas12a contains a PAM interacting (PI) domain, which extensively interacts with the PAM duplex19 (Fig. 5a,b). These domain enlargements and the insertion may have been acquired to target the dsDNA of invading phage more efficiently and/or to compete with anti-CRISPR systems.

Fig. 5: Model of evolution from TnpB to Cas12 enzymes.
figure 5

a, Structural comparison of Cas12m2 with TnpB (from D. radiodurans ISDra2) (PDB 8H1J), Cas12f (from an uncultured archaeon) (PDB 7C7L) and Cas12a (from Francisella novicida) (PDB 6I1K). Second Cas12f molecule (mol. 2) and the specific REC2 insertions in Cas12m2 and Cas12a, which are responsible for recognition of the PAM-distal region, are highlighted in blue. The insertions from Cas12m2 to Cas12a are highlighted in orange. The blue cycle represents the region where TnpB acquired additional elements to engage in the adaptive immune system, and the orange cycles represent the region where domain insertions and enlargements occurred for more efficient target recognition and cleavage. b, Schematic of domain insertions and enlargements of Cas12 enzymes from TnpB. ZF, zinc finger domain.

Methods

Protein and RNA preparation for structural analysis

The Cas12m2 protein for structural analysis was expressed and purified using the protocol reported previously14,20. Briefly, the N-terminally His6-tagged Cas12m2 protein was expressed in E. coli Rosetta 2 (DE3). E. coli cells were cultured at 37 °C until the OD600 reached 0.8, and protein expression was then induced by the addition of 0.1 mM isopropyl β-d-thiogalactopyranoside (Nacalai Tesque). E. coli cells were further cultured at 20 °C overnight and collected by centrifugation. The cells were then resuspended in buffer A (20 mM HEPES–NaOH, pH 7.6, 20 mM imidazole and 1 M NaCl), lysed by sonication and centrifuged. The supernatant was mixed with 3 ml Ni-NTA Superflow resin (Qiagen), and the mixture was loaded into an Econo-Column (Bio-Rad). The protein was eluted with buffer B (20 mM HEPES–NaOH, pH 7.6, 0.3 M imidazole, 0.3 M NaCl) and then loaded onto a 5-ml HiTrap SP HP column (GE Healthcare) equilibrated with buffer C (20 mM HEPES–NaOH, pH 7.6, and 0.3 M NaCl). The protein was eluted with a linear gradient of 0.3–2 M NaCl and further purified by chromatography on a Superdex 200 column (GE Healthcare) equilibrated in buffer D (20 mM HEPES–NaOH, pH 7.6, 0.5 M NaCl). The purified proteins were stored at −80 °C until use. The crRNA was transcribed in vitro with T7 RNA polymerase and purified by 10% denaturing (7 M urea) polyacrylamide gel electrophoresis.

Electron microscopy sample preparation and data collection

The Cas12m2–crRNA–target DNA complex was reconstituted by mixing purified Cas12m2, the 56-nucleotide crRNA, the 36-nucleotide target DNA and the 36-nucleotide non-target DNA at a molar ratio of 1:1.2:1.5:1.5. The Cas12m2–crRNA binary complex was reconstituted by mixing purified Cas12m2 and the 56-nucleotide crRNA at a molar ratio of 1:1.2. The ternary and binary complexes were purified by size-exclusion chromatography on a Superdex 200 Increase 10/300 column (GE Healthcare) equilibrated with buffer E (20 mM HEPES–NaOH, pH 7.6, 50 mM NaCl, 2 mM MgCl2 and 10 μM ZnCl2). The purified complex solution (A260 = 4) was then applied to Au 300-mesh R1.2/1.3 grids (Quantifoil), and, after adding 3 μl amylamine, they were glow discharged in a Vitrobot Mark IV system (FEI) at 4 °C with a waiting time of 10 s and a blotting time of 4 s under 100% humidity conditions. The grids were plunge frozen in liquid ethane and cooled to the temperature of liquid nitrogen.

Micrographs for all datasets were collected on a Titan Krios G3i microscope (Thermo Fisher Scientific) running at 300 kV and equipped with a Gatan Quantum LS Energy Filter (GIF) and a Gatan K3 Summit direct electron detector in electron-counting mode (University of Tokyo, Japan). Movies were recorded at a nominal magnification of 105,000×, corresponding to a calibrated pixel size of 0.83 Å, with a total dose of approximately 50 electrons per Å2 per 48 frames using EPU software (Thermo Fisher Scientific). The dose-fractionated movies were subjected to beam-induced motion correction and dose weighting using MotionCor2 (ref. 31) implemented in RELION-3.1 (ref. 32), and contrast transfer function (CTF) parameters were estimated using patch-based CTF estimation in cryoSPARC version 3.3.2 (ref. 33).

Single-particle cryo-EM data processing

Data were processed using cryoSPARC33. For the ternary complex, 5,067,096 particles were initially selected from the 3,570 motion-corrected and dose-weighted micrographs using a two-dimensional (2D) reference and extracted at a pixel size of 3.32 Å. These particles were subjected to several rounds of 2D classification to curate particle sets. The particles were further curated by heterogeneous refinement, using a map derived from ab initio reconstruction as the template. The selected 1,030,197 particles were then re-extracted at a pixel size of 1.16 Å and subjected to 3D variability analysis34. The resulting maps with different conformations were used for subsequent heterogeneous refinement. The selected particles after heterogeneous refinement were refined using non-uniform refinement with optimization of the CTF value35, yielding maps at resolutions of 2.87 Å (state I) and 3.08 Å (state II), according to the FSC criterion of 0.143 (ref. 36). The local resolution was estimated with BlocRes in cryoSPARC.

For the binary complex, 6,034,518 particles were initially selected from a total of 8,004 motion-corrected and dose-weighted micrographs using a 2D reference and extracted at a pixel size of 3.32 Å. The particles were further curated by heterogeneous refinement using a map derived from ab initio reconstruction as the template. The selected 215,565 particles were then re-extracted at a pixel size of 1.66 Å and subjected to 3D classification without alignment, using a mask focused on the second subunit. After 3D classification, the selected particles were refined using non-uniform refinement, yielding a map at a resolution of 3.73 Å, according to the FSC criterion of 0.143. The local resolution was estimated with BlocRes in cryoSPARC.

Model building and validation

The model was built using the predicted model of the Cas12m2 protein created by AlphaFold2 (the AlphaFold model) as the reference, followed by manual model building with Coot37,38. The model was refined using phenix.real_space_refine version 1.20.1 (ref. 39) with secondary structure and metal coordination restraints. The metal coordination restraints were generated using ReadySet, as implemented in PHENIX. Structure validation was performed using MolProbity40. Residues 592–596 and nucleotides −36 to −30 of the crRNA, nucleotides −9 to −5 of the TS and nucleotides 22–26 of the NTS were not included in the final full R-loop state model because these regions were not well resolved on the density map. Residues 53–120, 499–506 and 592–596 and nucleotides −36 to −30 and 13–20 of the crRNA, nucleotides −9 to 5 and 27 of the TS and nucleotides −10 and 13–26 of the NTS were not included in the final intermediate state model. In the binary complex, residues 87–90, 488–508 and 592–596 of Cas12m2.1, residues 1–17, 65–94, 189–250 and 262–596 of Cas12m2.2 and nucleotides −36 to −30 and 16–20 of the crRNA were not included in the final model. The cryo-EM density maps were calculated with UCSF ChimeraX41, and the molecular graphics shown in the figures were prepared with CueMol (http://www.cuemol.org).

Assembly of Cas12m2 PAM and REC mutant plasmids

Plasmids used for protein expression of Cas12m2 PAM and REC mutants were assembled by PCR mutagenesis (Q5 polymerase, NEB; T4 DNA ligase, NEB) using the primers listed in Supplementary Table 2, with pML-1B-Cas12m as the PCR template11. Sequence-verified plasmids were used as templates for amplification of their inserts for restriction ligation assembly of pCas-Cas12m2-[x] plasmids, which were used for in vivo transcriptional silencing assays, with pCas-Cas12m2 as the template for backbone amplification (Supplementary Table 2).

In vivo transcriptional silencing assay

The transcriptional silencing activity of Cas12m variants in vivo was assessed using the protocol reported previously11, with minor modifications. Briefly, E. coli cells harboring pTarget-operon and either pCRISPR-A1, pCRISPR-E1 or pCRISPR-NT were made chemically competent and transformed with either pCas-Cas12m, pCas-dCas12m or pCas-Cas12m-[x] (the ten alanine-substitution mutants). After recovery, 5 μl of transformation mix was diluted with 195 μl M9TG medium in a 96-well 2-ml masterblock (Greiner) in triplicate, sealed with a gas-permeable membrane and grown at 37 °C and 750 r.p.m. for 20 h. Cells were diluted 1:1,000 in fresh M9TG medium in a total volume of 200 μl in a 96-well masterblock and grown overnight at 37 °C. Overnight cultures were diluted 1:10 in 200 μl PBS and analyzed using a Synergy Mx microplate reader (BioTek) and Gen5 (software version 3.10.06). The fluorescence of GFP (excitation, 488 nm; emission, 512 nm) and RFP (excitation, 532 nm; emission, 588 nm) was measured with gains of 91 and 140, respectively. Fluorescence values were normalized to blank-corrected OD600 values, and technical triplicates were averaged. In Fig. 3h, the averages of two biological replicates are shown relative to the average normalized fluorescence of wells containing E. coli expressing a non-targeting spacer (pCRISPR-NT). Statistical analysis was performed using Tukey’s post hoc test in case of significant two-way ANOVA results.

Cas12m2 purification for surface plasmon resonance analyses

The pML-1B-Cas12m2, pML-1B-dCas12m2 and pML-1B-Cas12m2-[x] plasmids were transformed into chemically competent E. coli Rosetta 2 (DE3) cells and purified using a modified version of the previously reported protocol11. Briefly, the transformed E. coli strains were inoculated into 800 ml LB medium supplemented with chloramphenicol and kanamycin, cultured overnight and grown to an OD600 of 0.6–0.8 at 37 °C and 150 r.p.m. The cultures were cooled on ice for 1 h before the addition of 0.4 mM isopropyl 1-thio-β-d-galactopyranoside and then grown for 16–20 h at 20 °C and 120 r.p.m. Cultured cells were pelleted, washed and resuspended in lysis buffer (1 M NaCl, 20 mM imidazole, 20 mM HEPES, pH 7.6) freshly supplemented with protease inhibitors (Roche cOmplete, EDTA-free) and lysed by sonication (Bandelin Sonopuls). The lysate was cleared (30 min, 30,000g), and the supernatant was filtered (0.45 μm). The filtrate was applied to a nickel column (HisTrap HP, GE Life Sciences), which was washed (500 mM NaCl, 20 mM imidazole, 20 mM HEPES, pH 7.6) and eluted (500 mM NaCl, 300 mM imidazole, 20 mM HEPES, pH 7.6) using an ÄKTA FPLC system. Elution fractions containing the expected band were pooled, concentrated in SEC buffer (500 mM NaCl, 20 mM HEPES, pH 7.6, 1 mM DTT) through centrifugal filters (Amicon Ultra-15, MW cutoff 30,000 Da) and further purified by size-exclusion chromatography (HiLoad 16/600 Superdex 200) by FPLC. Pure fractions were pooled and snap frozen in liquid nitrogen until analysis by surface plasmon resonance (SPR).

Surface plasmon resonance

SPR spectroscopy was performed at 25 °C on a Biacore T100 system (Cytiva). A series S CM5 sensor chip surface was modified with 2,500 response units of streptavidin (Invitrogen) using an amine coupling kit (Cytiva). Flow cell 1 was used as the reference. A 50-nucleotide biotinylated oligonucleotide containing a DNA target site was annealed to its complementary strand (Supplementary Table 2; IDT) in 25 mM HEPES, pH 7.5, containing 150 mM KCl, and 20 response units were immobilized on flow cell 2. Ribonucleotide protein complexes were formed by mixing purified protein variants with a twofold molar excess of PAM-SCNR RNA or a non-target RNA (Supplementary Table 2) to a final concentration of 500 nM protein in SPR buffer (20 mM HEPES, pH 7.8, 150 mM KCl, 10% glycerol, 5 mM MgCl2, 1 mM DTT and 0.05% Tween-20). Dilutions of this complex were injected over the chip surface at 50 µl min−1. Flow cells were regenerated using three consecutive injections of 4 M MgCl2, followed by an injection of the complementary oligonucleotide (1 µM) for 100 s at 5 µl min−1 in flow cell 2. Wild-type Cas12m2 displays multiphasic association and dissociation kinetics, which is most prominent at protein concentrations above 100 nM11; therefore, sensorgrams at lower concentrations (8–63 nM) were fitted to a two-state model using BIAevaluation software (Cytiva) to obtain a KD estimate. After double-reference correction, the remaining bulk effects masked the severely reduced binding capabilities of the mutant proteins in the association phase in the sensorgrams; therefore, the binding of Cas12m2 variants at 500 nM to the target site was quantified from the stable binding levels at the start of the dissociation phase (average signal of a 10-s window, 20 s after dissociation) relative to that of wild-type Cas12m2.

Pre-crRNA processing assay

Pre-crRNA processing was achieved using an in vitro transcription and translation system, as described previously11. A pCas and pCRISPR plasmid mixture expressing Cas12m2 and a CRISPR array (containing four spacers) was incubated at 29 °C for 5 h in a thermocycler. Total RNA was then extracted using a Direct-zol RNA MiniPrep kit, according to the manufacturer’s instructions (Zymo Research). For northern blotting analysis, 5 μg of each RNA sample obtained from cell-free transcription–translation was fractionated on an 8% polyacrylamide gel (7 M urea) at 300 V for 140 min. RNA was transferred onto Hybond-XL membranes (Amersham Hybond-XL, GE Healthcare) using an electroblotter at 50 V for 1 h at 4 °C (Tank-Elektroblotter Web M, PerfectBlue) and cross-linked with UV light at 0.12 J (UV lamp T8C, 254 nm, 8 W). Hybridization proceeded overnight in 17 ml of Roti-Hybri-Quick buffer, with 5 µl of [γ-32P]ATP end-labeled oligodeoxyribonucleotides at 42 °C. The membrane was visualized using a Phosphorimager (Typhoon FLA 7000, GE Healthcare).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.