R-loop formation and conformational activation mechanisms of Cas9

Cas9 is a CRISPR-associated endonuclease capable of RNA-guided, site-specific DNA cleavage [1–3]. The programmable activity of Cas9 has been widely utilized for genome editing applications [4–6], yet its precise mechanisms of target DNA binding and off-target discrimination remain incompletely understood. Here we report a series of cryo-electron microscopy structures of Streptococcus pyogenes Cas9 capturing the directional process of target DNA hybridization. In the early phase of R-loop formation, the Cas9 REC2 and REC3 domains form a positively charged cleft that accommodates the distal end of the target DNA duplex. Guide–target hybridization past the seed region induces rearrangements of the REC2 and REC3 domains and relocation of the HNH nuclease domain to assume a catalytically incompetent checkpoint conformation. Completion of the guide–target heteroduplex triggers conformational activation of the HNH nuclease domain, enabled by distortion of the guide–target heteroduplex, and complementary REC2 and REC3 domain rearrangements. Together, these results establish a structural framework for target DNA-dependent activation of Cas9 that sheds light on its conformational checkpoint mechanism and may facilitate the development of novel Cas9 variants and guide RNA designs with enhanced specificity and activity.

Cas9 enzymes rely on a dual guide RNA structure consisting of a CRISPR RNA (crRNA) guide and a trans-activating CRISPR RNA (tracrRNA) coactivator to cleave complementary DNA targets. S. pyogenes Cas9 (SpCas9) has found widespread use as a programmable DNA-targeting tool in genome editing and gene-targeting applications [4–6]. Target DNA binding by SpCas9 is dependent on the initial recognition of an NGG protospacer-adjacent motif (PAM) downstream of the target site [2,7–9], which triggers local DNA strand separation to initiate its directional hybridization with a 20-nt segment in the guide crRNA to form an R-loop structure [7,10,11]. Target strand (TS) binding is facilitated by structural pre-ordering of nucleotides 11-20 of the crRNA (counting from the 5′ end), termed the seed sequence, in an A form-like conformation [8,12]. Formation of a complete R-loop leads to the activation of the Cas9 HNH and RuvC nuclease domains to catalyse cleavage of the TS and non-target DNA strand (NTS), respectively [2,8,13].
Although highly specific, SpCas9 cleaves off-target sites with imperfect complementarity to the guide RNA, often resulting in considerable levels of off-target genome editing [14–18]. The off-target activity is dependent on the number, type and positioning of base mismatches within the guide-target heteroduplex [15,19–21]. PAM-proximal mismatches within the seed region are discriminated against by substantially increased dissociation rates [11,19,21,22], whereas PAM-distal mismatches are compatible with stable DNA binding [13,19,21,23,24]. Such off-targets are instead discriminated by a conformational checkpoint mechanism that monitors the integrity of the guide-target duplex to induce conformational activation of the nuclease domains [11,13,19,21–24].
Structural, biophysical and computational studies of SpCas9 have shed light on the mechanism of guide RNA binding, PAM recognition and nuclease activation, revealing that the enzyme undergoes extensive conformational rearrangements throughout these steps. In particular, high-resolution structures of the fully bound target DNA complex of SpCas9 [25–28] have revealed a target-DNA-dependent conformational rearrangement of the Cas9 REC lobe that is necessary for cleavage activation. However, the mechanisms that underpin R-loop formation and off-target discrimination during conformational activation have remained elusive.

Cryo-EM analysis of R-loop formation
To investigate the mechanism of R-loop formation, we initially determined the minimal extent of target DNA complementarity necessary for stable binding using fluorescence-coupled size-exclusion chromatography, revealing that the presence of six complementary nucleotides in the PAM-proximal region of the target DNA heteroduplex is sufficient for stable association with the SpCas9-guide RNA complex (Extended Data Fig. 1). Subsequently, catalytically inactive SpCas9 (dCas9) was reconstituted with a single-molecule guide RNA (sgRNA) and partially matched DNA targets containing 6, 8, 10, 12, 14 and 16 complementary nucleotides upstream of the PAM (Fig. 1a and Extended Data Fig. 2). We analysed the resulting complexes using cryo-electron microscopy (cryo-EM), yielding molecular reconstructions at resolutions of 3.0-4.1 Å (Extended Data Fig. 3 and Extended Data Tables 1 and 2). We additionally determined cryo-EM reconstructions of wild-type SpCas9 bound to 18-nt complementary DNA targets in the presence of 1 mM and 10 mM Mg2+, representing the checkpoint and catalytically active states, respectively (Extended Data Fig. 3 and Extended Data Table 2). Three-dimensional variability analysis [29] was used to analyse conformational heterogeneity within each complex (Supplementary Videos 1-8). Most of the detected variability within each complex can be attributed to the PAM-distal duplex and the REC2, REC3 and HNH domains, suggestive of conformational equilibrium sampling. The resulting structural models are representative of the most abundant conformational state of each complex (Extended Data Fig. 4).
Structural superpositions of the partially matched complexes with the guide-RNA-bound binary SpCas9 complex [12] provide a framework for the visualization of the DNA-binding mechanism, revealing stepwise domain rearrangements coupled to R-loop formation (Extended Data Fig. 5a). All complexes exhibit almost identical conformations of the bridge helix, REC1, RuvC and PAM-interaction domains, as well as the PAM-proximal double-stranded DNA (dsDNA) duplex and the sgRNA downstream (3′ terminal) of the seed region. Conformational differences are observed in the positioning of the REC2, REC3 and HNH domains relative to the emerging R-loop, consistent with the 3D variability analysis.
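As an illustration of how the extent of guide-target complementarity in such partially matched substrates can be quantified, the following minimal Python sketch counts contiguous matches starting from the PAM-proximal end. The function name and both sequences are hypothetical placeholders and are not the substrates listed in Supplementary Table 1; the sketch assumes that the protospacer is supplied as the non-target-strand sequence written 5′ to 3′, so that its 3′ end abuts the PAM.

```python
def pam_proximal_match_length(guide_spacer: str, protospacer: str) -> int:
    """Count contiguous guide-target matches starting at the PAM-proximal end.

    guide_spacer: RNA spacer, written 5' to 3' (its 3' end is PAM-proximal).
    protospacer:  non-target-strand DNA, written 5' to 3' (same sense as the guide).
    """
    guide_as_dna = guide_spacer.upper().replace("U", "T")
    target = protospacer.upper()
    if len(guide_as_dna) != len(target):
        raise ValueError("guide spacer and protospacer must have equal length")

    matches = 0
    # Walk from the 3' (PAM-proximal) end towards the 5' (PAM-distal) end
    # and stop at the first mismatch.
    for guide_base, target_base in zip(reversed(guide_as_dna), reversed(target)):
        if guide_base != target_base:
            break
        matches += 1
    return matches


# Hypothetical 20-nt spacer and two hypothetical protospacers.
GUIDE = "GACGCAUAAAGAUGAGACGC"
FULL_MATCH = "GACGCATAAAGATGAGACGC"
SIX_NT_MATCH = "CTGCGTATTTCTACAGACGC"  # only the six PAM-proximal positions match

print(pam_proximal_match_length(GUIDE, FULL_MATCH))
print(pam_proximal_match_length(GUIDE, SIX_NT_MATCH))
```

Counting from the 3′ end of the spacer mirrors the directional, PAM-proximal to PAM-distal progression of hybridization examined in this section.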

R-loop initiation by bipartite seed
The structure of the 6-nucleotide complementary target (6-nt match) complex reveals a 5-bp heteroduplex formed by the sgRNA seed sequence and TS DNA (Fig. 1b). Hybridization beyond the fifth seed sequence nucleotide is precluded by base stacking with the side chain of Tyr450, which was previously observed in the structure of the Cas9-sgRNA binary complex [12] (Fig. 1c). Comparisons with the binary complex structure indicate that TS hybridization is associated with the displacement of the REC2 domain out of the central binding channel (Fig. 1b). The PAM-distal duplex part of the DNA substrate is bound in a positively charged cleft formed by the REC2 and REC3 domains (Fig. 1b and Extended Data Fig. 5b), stabilized by interactions of the REC2 residues Ser219, Thr249 and Lys263 with the NTS backbone (Extended Data Fig. 5c), and REC3 residues Arg586 and Thr657 with the TS backbone (Extended Data Fig. 5d). Similar REC lobe conformation and protein contacts with the PAM-distal end of the DNA have been observed in a 3-bp heteroduplex complex described in a recent study [30]. Consequently, the NTS is positioned parallel to the guide RNA-TS DNA heteroduplex within the central binding channel (Fig. 1b). The 5′-terminal part of the sgRNA appears to be conformationally flexible but residual cryo-EM density suggests its placement in a positively charged cleft located between the HNH and PAM-interaction domains (Extended Data Fig. 5e).
The structure of the 8-nucleotide complementary target (8-nt match) complex reveals that expansion of the R-loop heteroduplex, enabled by unstacking of Tyr450, forces further repositioning of the REC2 and REC3 domains to widen the binding channel as the PAM-distal duplex shifts deeper inside the channel (Figs. 1d and 2a-c and Extended Data Fig. 5f). R-loop propagation and PAM-distal duplex displacement result in the formation of new intermolecular contacts, with Cas9 contacting the PAM-distal duplex backbone through REC2 domain residues Ser217, Lys234 and Lys253, and REC3 residues Arg557 and Arg654 (Extended Data Fig. 5g,h).
Together, these observations suggest that the seed sequence of the Cas9 guide RNA is bipartite and that its hybridization with target DNA proceeds in two steps, consistent with the existence of a short-lived intermediate state observed in FRET studies [11,31]. To validate the observed interactions, we tested the cleavage activities of structure-based Cas9 mutant proteins in vitro (Extended Data Fig. 6a). Alanine substitution of Tyr450 resulted in substantial reductions of off-target substrate cleavage rates, whereas on-target cleavage remained largely unperturbed (Fig. 1e and Extended Data Fig. 6b). As observed previously [32], the effect was more prominent for off-target substrates containing mismatches with the seed region of the guide RNA compared with off-targets containing only PAM-distal mismatches. Together, these results suggest that disruption of seed sequence interactions in the binary Cas9-sgRNA complex and early binding intermediates might exacerbate R-loop destabilization caused by off-target mismatches, resulting in an increased rate of off-target substrate dissociation and thus increased specificity. By contrast, a subset of mutations of DNA-interacting REC2 or REC3 residues resulted in increased off-target cleavage, as did the deletion of the REC2 domain (Extended Data Fig. 6b-e), consistent with single-molecule studies implicating the REC2 domain in Cas9 specificity [31]. Collectively, these results underscore the importance of specific Cas9-DNA contacts during early steps of R-loop formation for the specificity of Cas9.

R-loop propagation and remodelling
Further guide RNA-TS hybridization to form a 10-bp heteroduplex causes a rearrangement of the REC2 and REC3 domains and repositioning of the PAM-distal DNA duplex into the positively charged central binding channel formed by the REC3, RuvC and HNH domains (Fig. 2a). Here, the PAM-distal dsDNA duplex forms a continuous base stack with the sgRNA-TS heteroduplex (Fig. 2d). The displaced NTS is positioned underneath the HNH domain and continues to run parallel to the extending guide RNA-TS DNA heteroduplex (Extended Data Fig. 7a). X-ray crystallographic analysis of the 10-nt match complex at a resolution of 2.8 Å (Extended Data Table 3) confirmed that the TS and NTS remain hybridized at the PAM-distal end of the DNA substrate (Extended Data Fig. 7b). The PAM-distal duplex is wedged between the REC3 and RuvC domains and the L1 HNH linker (Fig. 2d and Extended Data Fig. 7a,b). The relocation of the PAM-distal duplex causes the REC2 domain to shift closer to the binding channel and occlude the cleavage site in TS DNA (Fig. 2d). This shift also establishes new electrostatic interactions between a negatively charged helix in REC2 (Glu260, Asp261, Asp269, Asp272, Asp273, Asp274 and Asp276) and a positively charged helix in REC3 (Lys599, Arg629, Lys646, Lys649, Lys652, Arg653, Arg654 and Arg655), hereafter referred to as the DDD and RRR helices, respectively (Fig. 2e), which are highly conserved across Cas9 orthologues that contain a REC2 domain (Extended Data Fig. 7c). Cleavage of off-target substrates in vitro was reduced by alanine substitutions of the interacting residues in the REC2 DDD helix, whereas mutations in the REC3 RRR helix only reduced cleavage of the off-target substrate containing a mismatch in the seed region (Extended Data Fig. 6d,e). These results suggest that the REC2-REC3 interaction contributes to Cas9 restructuring during R-loop extension; however, the DDD and RRR helices might have additional structural roles during upstream and downstream steps in the DNA-binding mechanism, particularly as the REC3 RRR helix contacts the backbone of the PAM-distal DNA duplex during early stages of target binding (Extended Data Fig. 5d,h).
The HNH nuclease domain remains docked on the RuvC and PI domains in the 6-, 8- and 10-nt match complexes, with the active site buried at the interdomain interface (Fig. 3a). R-loop extension past the seed region to form a 12-bp heteroduplex does not result in major REC2/3 domain rearrangements, with the PAM-distal duplex remaining coaxially stacked onto the guide RNA-TS DNA heteroduplex throughout the 12-, 14-, 16- and 18-nt match complexes (Fig. 3b). By contrast, the HNH domain becomes disordered along with the surrounding RuvC (residues 1011-1040) and PI (residues 1245-1251) loops in the 12-nt match complex (Fig. 3c). Upon extension of the R-loop heteroduplex to 14 bp, the RuvC and PI loops responsible for HNH docking remain structurally disordered (Fig. 3c and Extended Data Fig. 8a) and residual density is observed for the HNH domain as its L2 linker contacts the guide RNA-TS heteroduplex (Extended Data Fig. 8a). Further extension of the R-loop heteroduplex from 14 to 16 bp causes translocation of the HNH domain towards the guide RNA-TS DNA heteroduplex within the central binding channel (Fig. 3c). Facilitated by the formation of the PAM-distal part of the R-loop, a loop in the RuvC domain (residues 1030-1040) restructures into a helical conformation, establishing interactions with the L2 linker (Extended Data Fig. 8b). This repositions the L2 linker and shifts the HNH domain on top of the heteroduplex, sealing off the central binding channel (Fig. 3c and Extended Data Fig. 8c). The HNH domain remains in a catalytically incompetent orientation, with its active site located around 31 Å away from the scissile phosphate group in the TS.

Conformational checkpoint and activation
Previous studies have shown that substrates containing 4-bp mismatches at the PAM-distal end of the target sequence (positions 17-20) are generally refractory to Cas9 cleavage, whereas substrates containing mismatches at positions 19 and 20 are efficiently cleaved [13,23,24,33]. The cryo-EM reconstruction of the 18-nt match complex in the presence of 1 mM Mg2+ reveals that the most populated 3D class in the sample represents a pre-cleavage state with an intact TS and disordered NTS (Fig. 4a). Upon extension of the R-loop to 18 bp, the HNH domain continues to assume the catalytically incompetent orientation observed in the 16-nt match complex, whereas the conformation of the REC2 and REC3 domains remains the same as in the 12-, 14- and 16-nt match complexes (Figs. 3b and 4a). The observed conformation is thus consistent with a catalytically inactive checkpoint state inferred from previous biophysical and structural studies [23,24,33].
The cryo-EM reconstruction obtained from a sample reconstituted in the presence of 10 mM Mg2+ reveals a catalytically active conformation in which both the TS and the NTS are cleaved at the expected positions (Fig. 4b-f). In contrast to previously reported structures of catalytically active Cas9 enzymes [28,34,35], the PAM-proximal part of the cleaved NTS remains bound in the RuvC active site (Fig. 4b,f). In this state, the REC2 domain is shifted away from the TS cleavage site, enabling the HNH domain to undergo a rotation of about 140° to engage the TS scissile phosphate with its active site and catalyse its hydrolysis via a one-metal-ion mechanism (Fig. 4d), in agreement with previous structural data [28,34,35]. This rearrangement is facilitated by pronounced bending of the PAM-distal region of the guide RNA-TS DNA heteroduplex and a concomitant reorientation of the REC3 domain that preserves interactions with the heteroduplex (Fig. 4c). HNH domain rotation is brought about by restructuring of the L1 and L2 linkers, which results in the widening of the NTS binding cleft and exposure of the RuvC active site (Fig. 4b,e,f). The L1 linker, which is structurally disordered in the 18-nt match checkpoint complex, forms an α-helix and interacts with the minor groove of the guide RNA-TS DNA heteroduplex via multiple hydrogen-bonding interactions (Fig. 4e). The L2 linker helix becomes extended, allowing Phe916 to intercalate between NTS nucleobases by π-π stacking, thereby stabilizing the NTS in the RuvC active site (Fig. 4f). The NTS scissile phosphate is coordinated by two Mg2+ ions, its position consistent with a His983-dependent catalytic mechanism proposed by molecular dynamics simulations [36]. A recent complementary study reported the structure of a 17-nt match catalytic complex that exhibits nearly identical HNH domain positioning and bent conformation of the guide RNA-TS DNA heteroduplex as observed in the 18-nt match catalytic complex [37], indicating that catalytic activation can occur once a 17-bp heteroduplex is formed. Together, these structural observations provide a rationale for the allosteric coupling of R-loop formation with HNH domain rearrangement and RuvC active site accessibility, in agreement with single-molecule studies showing that PAM-distal end positioning modulates HNH domain conformation [33].

Conclusions
In sum, our structural analysis of SpCas9 along its DNA-binding pathway points to a mechanism whereby R-loop formation is allosterically and energetically coupled to domain rearrangements necessary for nuclease domain activation (Extended Data Fig. 9). The initial phase of R-loop formation is facilitated by TS hybridization to a bipartite seed sequence of the guide RNA and interactions of the PAM-distal DNA with the Cas9 REC2 and REC3 domains. The observation of a bipartite seed sequence in the Cas9 guide RNA and a two-step seed hybridization mechanism involving a conformational rearrangement brings parallels with other RNA-guided nucleic acid-targeting systems including the Cascade complex and Argonaute proteins, both of which feature discontinuous seed sequences in their guide RNAs [38–41]. We identify mutations that destabilize the binding intermediate states and thus increase off-target discrimination, which presents an opportunity for the development of novel high-fidelity SpCas9 variants. As most off-target sequences are only bound but not cleaved [19–21,42], these variants could prove useful for applications that rely on the fidelity of Cas9 target binding, such as transcriptional regulation or base editing [43].
Directional target DNA hybridization is associated with dynamic repositioning of the REC2, REC3 and HNH domains to initially assume a catalytically inactive, checkpoint conformation upon R-loop completion. As conformational activation of the nuclease domains is allosterically controlled by structural distortion of the PAM-distal end of the guide-target heteroduplex and the sensing of its integrity by Cas9, it is precluded by incomplete PAM-distal heteroduplex pairing (<17 bp). Bona fide off-target substrates are able to pass the conformational checkpoint because they maintain heteroduplex integrity despite the presence of PAM-distal mismatches, in agreement with our recent structural data [44]. Furthermore, guide RNA modifications that result in altered heteroduplex conformation have profound effects on Cas9 nuclease activity and specificity [45]. Together, our structural studies thus highlight the importance of maintaining guide-target complementarity and proper heteroduplex geometry, consistent with biophysical and computational studies showing that the conformation of the R-loop heteroduplex strongly affects off-target binding [11,46]. These findings thus have important implications for ongoing experimental and computational studies of CRISPR-Cas9 off-target activity, and will inform its further technological development.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-05114-0.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Expression and purification of Cas9 proteins
Wild-type and mutant SpCas9 proteins were expressed in Escherichia coli Rosetta 2 (DE3) (Novagen) for 16 h at 18 °C as fusion proteins with an N-terminal His6-MBP-TEV tag. Bacterial pellets were resuspended and lysed in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 5 mM imidazole, supplemented with protease inhibitors. Cell lysates were clarified using ultracentrifugation, loaded on a 15 ml Ni-NTA Superflow column (QIAGEN) and washed with 7 column volumes of 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 5 mM imidazole. Tagged Cas9 was eluted with 10 column volumes of 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 200 mM imidazole. Salt concentration was adjusted to 250 mM KCl and the protein was loaded on a 10 ml HiTrap Heparin HP column (GE Healthcare) equilibrated in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT. The column was washed with 5 column volumes of 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, and dCas9 was eluted with 15 column volumes of 20 mM HEPES-KOH pH 7.5, 1.5 M KCl, 1 mM DTT, in a 0-50% gradient (peak elution around 500 mM KCl). The His6-MBP tag was removed by TEV protease cleavage overnight at 4 °C with gentle shaking. The untagged protein was concentrated and further purified on a Superdex 200 16/600 gel filtration column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT. Pure fractions were concentrated to 10 mg ml−1, flash frozen in liquid nitrogen and stored at −80 °C.

Gel filtration binding assay
The dCas9-guide RNA complex was assembled by incubating 371 pmol dCas9 with 400 pmol of the sgRNA in 20 mM HEPES-KOH pH 7.5, 200 mM KCl, 2 mM MgCl2 for 10 min at room temperature. Then 250 pmol of Cy5-labelled dsDNA substrate was added and incubated for another 15 min. The volume was adjusted up to 100 µl with reaction buffer and the mixture was centrifuged to remove possible precipitates. Individual reactions were transferred to a 96-well plate and analysed using a Superdex 200 Increase 5/150 GL gel filtration column (GE Healthcare) attached to an Agilent 1200 Series Gradient HPLC system. The 260 nm, 280 nm and Cy5 signals were exported and plotted as a function of the retention volume in GraphPad Prism 9.

In vitro nuclease activity assays
Cleavage reactions were performed at 37 °C in reaction buffer, containing 20 mM HEPES pH 7.5, 250 mM KCl, 5 mM MgCl2 and 1 mM DTT. First, Cas9 protein was pre-incubated with sgRNA in a 1:1.25 ratio for 10 min at room temperature. The protein-RNA complex was rapidly mixed with the dsDNA substrates (containing 5′-ATTO-532 labelled TS) (Supplementary Table 1), to yield final concentrations of 1.67 µM protein and 66.67 nM substrate in a 7.5 µl reaction. Complexes were collected at 1 min, 2.5 min, 5 min, 15 min, 45 min, 90 min, 150 min and 24 h. Cleavage was stopped by addition of 2 µl of 250 mM EDTA, 0.5% SDS and 20 µg of proteinase K. Formamide was added to the reactions to a final concentration of 50%, and samples were incubated at 95 °C for 10 min, resolved on a 15% denaturing PAGE gel containing 7 M urea and imaged using a Typhoon FLA 9500 gel imager.
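The reaction composition stated above can be checked with simple arithmetic. The short Python sketch below (helper name hypothetical) converts the final concentrations into absolute amounts, using the identity 1 µM × 1 µl = 1 pmol; it is a bookkeeping aid rather than part of the experimental protocol.

```python
def picomoles(concentration_um: float, volume_ul: float) -> float:
    """Amount in pmol for a concentration in µM and a volume in µl.

    1 µM x 1 µl = 1e-6 mol/l x 1e-6 l = 1e-12 mol = 1 pmol.
    """
    return concentration_um * volume_ul


REACTION_VOLUME_UL = 7.5
PROTEIN_UM = 1.67         # final Cas9 concentration
SUBSTRATE_UM = 0.06667    # 66.67 nM expressed in µM

protein_pmol = picomoles(PROTEIN_UM, REACTION_VOLUME_UL)
substrate_pmol = picomoles(SUBSTRATE_UM, REACTION_VOLUME_UL)
molar_excess = protein_pmol / substrate_pmol

print(f"Cas9: {protein_pmol:.3f} pmol")
print(f"DNA substrate: {substrate_pmol:.3f} pmol")
print(f"Cas9:DNA molar ratio: {molar_excess:.1f}")
```

The resulting ratio of roughly 25:1 indicates that the enzyme is in large excess over the DNA substrate in these reactions.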

Statistics and reproducibility
Nuclease activity rate constants (kobs) were extracted from single exponential fits: [Product] = A × (1 − exp(−kobs × t)). kobs data are presented as mean ± s.e.m. (n = 4 independent replicates), obtained by direct fitting of four time-course datasets in GraphPad Prism 9 without calculating individual kobs values. Statistical analysis was performed using a two-sided t-test. The confidence interval used was 95%.
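For readers who wish to reproduce this type of analysis outside GraphPad Prism, the following Python sketch fits the same single-exponential model by least squares using only the standard library. It is not the procedure used in this study: the function names, the search bounds for kobs and the synthetic data are assumptions made for illustration, and the one-dimensional search assumes that the residual has a single minimum within the bracketed interval. For a fixed kobs the optimal amplitude A has a closed form, so only kobs needs to be searched.

```python
import math


def amplitude_for_rate(times, product, k):
    """Least-squares amplitude A for a fixed rate k in A * (1 - exp(-k * t))."""
    basis = [1.0 - math.exp(-k * t) for t in times]
    denom = sum(b * b for b in basis)
    if denom == 0.0:
        raise ValueError("at least one time point must be greater than zero")
    return sum(y * b for y, b in zip(product, basis)) / denom


def sum_squared_error(times, product, k):
    """Residual sum of squares at rate k, with A set to its optimal value."""
    a = amplitude_for_rate(times, product, k)
    return sum((y - a * (1.0 - math.exp(-k * t))) ** 2
               for t, y in zip(times, product))


def fit_single_exponential(times, product, k_min=1e-4, k_max=10.0, grid=200):
    """Return (A, k_obs) minimising the squared error of the model.

    A coarse log-spaced grid brackets the minimum; a golden-section search
    on log(k) then refines it. Bounds are illustrative assumptions (min^-1).
    """
    step = (math.log(k_max) - math.log(k_min)) / (grid - 1)
    log_k = [math.log(k_min) + i * step for i in range(grid)]
    errors = [sum_squared_error(times, product, math.exp(x)) for x in log_k]
    best = min(range(grid), key=errors.__getitem__)
    lo = log_k[max(best - 1, 0)]
    hi = log_k[min(best + 1, grid - 1)]

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if (sum_squared_error(times, product, math.exp(c))
                < sum_squared_error(times, product, math.exp(d))):
            hi = d
        else:
            lo = c
    k_obs = math.exp((lo + hi) / 2.0)
    return amplitude_for_rate(times, product, k_obs), k_obs


# Sampling times from the cleavage assay, in minutes (24 h = 1440 min).
TIMES_MIN = [1, 2.5, 5, 15, 45, 90, 150, 1440]
# Synthetic, noise-free data generated with hypothetical A = 0.9, k = 0.05 min^-1.
SYNTHETIC = [0.9 * (1.0 - math.exp(-0.05 * t)) for t in TIMES_MIN]

fitted_a, fitted_k = fit_single_exponential(TIMES_MIN, SYNTHETIC)
print(f"A = {fitted_a:.4f}, k_obs = {fitted_k:.4f} min^-1")
```

With experimental data, each replicate time course would replace the synthetic values; uncertainty estimates such as the s.e.m. reported here would require an additional step that this sketch does not attempt.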

Crystallization and X-ray structure determination
The 10-nt complementary ternary complex of dCas9 was assembled by first incubating dCas9 with the sgRNA in a 1:1.5 molar ratio, and pre-purifying the binary complex on a Superdex 200 16/600 gel filtration column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT. The binary complex was diluted in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT to 2.5 mg ml−1 and the partially complementary dsDNA substrate was added in 1:1.5 molar excess. For crystallization, 1 µl of the ternary complex (1.5-2.5 mg ml−1) was mixed with 1 µl of the reservoir solution (0.1 M sodium cacodylate pH 6.5, 0.8-1.2 M ammonium formate, 12-14% PEG4000) and crystals were grown at 20 °C using the hanging drop vapour diffusion setup. Crystals were collected after 3-4 weeks, cryoprotected in 0.1 M sodium cacodylate pH 6.5, 1.0 M ammonium formate, 13% PEG4000, 20% glycerol, 2 mM MgCl2, and flash-cooled in liquid nitrogen. Diffraction data were measured at the beamline PXIII of the Swiss Light Source (Paul Scherrer Institute, Villigen, Switzerland) at a temperature of 100 K and processed using the autoPROC and STARANISO package with anisotropic cut-off [47]. Phases were obtained by molecular replacement using the Phaser module of the Phenix package [48], with the NUC lobe of PDB ID 5FQ5 as the initial search model. The crystals belonged to the P1 space group and contained two copies of the complex in the asymmetric unit.

Cryo-EM sample preparation and data acquisition
To assemble the 6-, 8-, 10-, 12-, 14- and 16-nt match complexes, dCas9 protein was mixed with the sgRNA in a 1:1.5 molar ratio, and incubated at room temperature for 10 min in buffer 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT. The respective partially complementary dsDNA substrate (Supplementary Table 1) was then added in a 1:3 Cas9:DNA molar ratio and incubated for another 20 min at room temperature. The complexes were then purified using a Superdex 200 Increase 10/300 GL gel filtration column (GE Healthcare) and eluted in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT. The concentration of the monomeric peak was determined using the Qubit 4 Fluorometer Protein Assay, and the sample was then diluted to 0.275 mg ml−1 in cold 20 mM HEPES-KOH pH 7.5, 250 mM KCl buffer. Diluted complexes (3 µl) were applied to a glow-discharged 200-mesh holey carbon grid (Au 1.2/1.3, Quantifoil Micro Tools), blotted for 1.5-2.5 s at 90% humidity, 20 °C, plunge frozen in liquid propane/ethane mix (Vitrobot, FEI) and stored in liquid nitrogen. To prepare the 18-nt match (checkpoint) complex, wild-type Cas9-sgRNA complex was reconstituted with substrate DNA in 20 mM HEPES-KOH pH 7.5, 150 mM KCl, 1 mM DTT buffer, and incubated with 1 mM MgCl2 for 1 min at 37 °C prior to vitrification. The 18-nt match catalytic complex was reconstituted in 20 mM HEPES-KOH pH 7.5, 100 mM KCl, 1 mM DTT buffer, and incubated with 10 mM MgCl2 for 1 min at 37 °C prior to vitrification. Data collection was performed on a 300 kV FEI Titan Krios G3i microscope equipped with a Gatan Quantum Energy Filter and a K3 direct detection camera in super-resolution mode. Micrographs were recorded at a calibrated magnification of 130,000× with a pixel size of 0.325 Å and subsequently binned to 0.65 Å. Data acquisition was performed automatically using EPU with three shots per hole at −0.8 µm to −2.2 µm defocus. Data for the 18-nt match (checkpoint) complex were collected using a Titan Krios G4 equipped with a SelectrisX energy filter and a FalconIV detector at a magnification of 270,000×, pixel size of 0.45 Å, defocus −0.8 µm to −1.5 µm.

Extended Data Fig. 5 (continued) | and PAM-interacting (PI) domains, as well as the PAM-proximal DNA duplex and the sgRNA from the partially bound complex structures determined by crystallography and cryo-EM, and full R-loop complexes (PDB: 6O0X, 6O0Y, 6O0Z) [28]. b, Zoom-in view of the PAM-distal DNA duplex in the 6-nt match complex. The protein surface is coloured according to electrostatic surface potential, with red denoting negative and blue positive charge. c, Interactions between SpCas9 REC2 domain and the backbone of the PAM-distal NTS in the 6-nt match complex. d, Interactions between the REC3 domain and the backbone of the PAM-distal TS in the 6-nt match complex. e, Central slice through the 6-nt match complex. Cryo-EM density map is coloured according to Fig. 1a. White density indicates positioning of the 5′ sgRNA end. f, PAM-distal DNA duplex in the 8-nt match complex remains positioned in a positively charged cavity between the REC2 and REC3 domains. The protein surface is coloured according to electrostatic surface potential. g, Interactions of the REC2 domain with the PAM-distal DNA duplex in the 8-nt match complex. h, Interactions of the REC3 domain with the NTS of the PAM-distal duplex in the 8-nt match complex.

Extended Data

Fig. 1 | Target DNA binding induces Cas9 REC lobe restructuring. a, Top, schematic depicting DNA-bound complexes with increasing extent of complementarity to guide RNA. Bottom, domain composition of SpCas9. 1-A, REC1-A domain; I-III, RuvC domain motifs I-III; BH, bridge helix. b, Structural comparison of the SpCas9 binary (left), 6-nt match (middle) and 8-nt match (right) complexes. c, Zoomed-in view of the seed region of the guide RNA-target DNA heteroduplex in the 6-nt match complex. Tyr450 stacks between the fifth and sixth nucleotide, counting from the PAM-proximal end of the heteroduplex. d, Zoomed-in view of the seed region of the guide RNA-target DNA heteroduplex in the 8-nt match complex. e, Fitted cleavage rate (kobs) of wild-type (WT) and Y450A mutant Cas9 against on-target and off-target substrates. Data represent mean fit ± s.e.m. of n = 4 independent replicates. Two-tailed t-test, ****P < 0.0001, ***P = 0.0002, **P = 0.0011. The P-value for the on-target dataset was not significant (P = 0.1058).

Fig. 2 | R-loop propagation drives DNA repositioning within Cas9. a, Zoomed-in views of the conformational transitions in the PAM-distal DNA duplex and Cas9 REC2 and REC3 domains in the 6-, 8- and 10-nt match complexes. b, Zoomed-in view of the R-loop in the 6-nt match complex. c, Zoomed-in view of the R-loop in the 8-nt match complex. d, Zoomed-in view of the R-loop in the 10-nt match complex. e, Zoomed-in view of the interaction between the REC2 domain DDD helix and the REC3 RRR helix.

Fig. 4 | HNH domain rotation and DNA bending enable catalytic activation. a, The structure of the 18-nt match complex in the pre-cleavage, checkpoint state. b, The structure of the 18-nt match complex in the catalytically active state. c, Conformations of the guide-target heteroduplexes and REC2 and REC3 domains in the 18-nt match checkpoint (left) and catalytic (right) complexes. The structures are shown in the same orientations as in a,b. The HNH domain has been omitted from the images for clarity. d, Zoomed-in view of the HNH nuclease active site in the 18-nt match catalytic complex containing bound cleaved TS. e, Zoomed-in view of the L1 linker contacting the minor groove of the guide RNA-target DNA heteroduplex. f, Zoomed-in view of the RuvC nuclease active site containing the 3′-terminal product of cleaved NTS.

Extended Data Fig. 1 | Minimal target complementarity necessary for stable Cas9 binding. Size exclusion chromatography analysis of nuclease-inactive SpCas9 complexed with sgRNA and Cy5-labelled DNA substrates with increasing extent of guide-target complementarity. A254/A280 and Cy5 signals were normalised.

Extended Data Fig. 2 | Schematic representation of DNA substrates used in structural studies. Base pair complementarity between sgRNA, target strand (TS), and non-target strand (NTS) is indicated by black lines.

Extended Data Fig. 3 | Cryo-EM density maps of DNA-bound SpCas9 complexes. Front (left) and back (right) views of unsharpened cryo-EM density maps of the partially-bound SpCas9 complexes. Maps are coloured by local resolution, and gold-standard FSC of 0.143 resolution graphs and particle distribution heatmaps are indicated for each complex.

Extended Data Fig. 4 | Structural models of DNA-bound SpCas9 complexes. Cartoon representations of DNA-bound 6-, 8-, 10-, 12-, 14-, 16-, 18-nt match (checkpoint and catalytic) complexes of SpCas9. Each model was generated based on the corresponding map shown in Extended Data Fig. 3.

Extended Data Fig. 5 | Stabilisation of the PAM-distal duplex by REC2/3 domains. a, Structural overlays of the SpCas9 bridge helix (BH), REC1, RuvC,

Extended Data Fig. 6 | In vitro cleavage activities of structure-guided REC2 and REC3 mutants of Cas9. a, Off-target sequences selected for nuclease activity assays. Nucleotide mismatches between the TRAC guide RNA and the target are highlighted; matching nucleotides are denoted by a dot. b, In vitro cleavage kinetics of Y450A mutants from which k_obs values are derived using single exponential fitting. Data represent mean ± s.e.m. (n = 4). c, In vitro cleavage kinetics of REC2/REC3 mutants from which k_obs values are derived using single exponential fitting. Data represent mean ± s.e.m. (n = 4). d, Cleavage rate constants of PAM-distal duplex stabilising REC2/REC3 mutants on on- and off-targets. Data represent mean fit ± s.e.m. of n = 4 replicates; significance was determined by a two-tailed t-test. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. e, In vitro cleavage kinetics of DDD and RRR helix mutants from which k_obs values are derived using single exponential fitting. Data represent mean ± s.e.m. (n = 4). f, Cleavage rate constants of Cas9 DDD and RRR helix mutants. Data represent mean fit ± s.e.m. of n = 4 replicates; significance was determined by a two-tailed t-test. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. On- and off-target substrates were fluorescently labelled on the PAM-proximal end of the target DNA strand in all panels.

Extended Data Fig. 7 | PAM-distal positioning and REC lobe conformation in the 10-nt match complex. a, Cryo-EM density of the 10-nt match complex overlaid with the structural model. NTS density can be traced along the heteroduplex (white). b, Cartoon representations of the X-ray crystallographic structures of the 10-nt match complex as based on the two complex copies (molecules A and B) in the crystallographic asymmetric unit. The complexes exhibit highly similar conformations (RMSD 0.46 Å). c, Alignment of protein sequences of the REC2 DDD and REC3 RRR helices from REC2-containing Cas9 orthologs.

Extended Data Fig. 8 | HNH undocking induced by R-loop extension. a, Residual HNH domain density (white) observed in the 14-nt match complex, in which the elongated heteroduplex establishes a contact with the L2 linker. No NTS density is observed past the PAM region due to disorder. b, Zoom-in view of the interaction between the HNH domain L2 linker and the RuvC 1030-1040 helix induced by heteroduplex proximity of the 16-bp complex. c, HNH domain relocation towards the binding channel results in the formation of a positively charged NTS binding channel. No residual electron density (white) is observed for the NTS in the absence of the PAM-distal duplex. The protein is coloured according to electrostatic surface potential, with red being negative and blue positive. d, Surface electrostatics map of the 18-nt match catalytic state of SpCas9, showing the NTS binding cleft with cleaved NTS positioned within the active site.

Extended Data Fig. 9 | Molecular mechanism of Cas9 R-loop formation and conformational activation. In the RNA-bound (binary) complex, the central DNA binding channel is occluded by the REC2 and REC3 domains. Upon PAM recognition and initial 5-nt base pairing with the seed sequence of the guide RNA, the REC2 domain is displaced to form a binding cleft to accommodate the PAM-distal DNA duplex. Formation of an 8-bp heteroduplex further displaces the REC3 domain and fully opens the central binding channel, while the PAM-distal duplex remains in the REC2/3 cavity. Extension of the R-loop to a 10-bp heteroduplex places the guide-TS heteroduplex and the PAM-distal duplex into the central binding channel, accompanied by formation of electrostatic contacts between the REC2 and REC3 domains. Base pairing past the seed region results in undocking of the HNH domain from the RuvC and PI domain interface and its repositioning towards the target heteroduplex, into the checkpoint state. R-loop formation past 17 base pairs induces REC2 domain displacement from the binding channel and rotation of the HNH domain active site towards the TS cleavage site, while simultaneously positioning the NTS in the RuvC domain active site.