Structure and Dynamics of Cas9 HNH Domain Catalytic State

The bacterial CRISPR-Cas9 immune system has been harnessed as a powerful and versatile genome-editing tool and holds immense promise for future therapeutic applications. Despite recent advances in understanding Cas9 structures and its functional mechanism, little is known about the catalytic state of the Cas9 HNH nuclease domain, and identifying how the divalent metal ions affect the HNH domain conformational transition remains elusive. A deeper understanding of Cas9 activation and its cleavage mechanism can enable further optimization of Cas9-based genome-editing specificity and efficiency. Using two distinct molecular dynamics simulation techniques, we have obtained a cross-validated catalytically active state of Cas9 HNH domain primed for cutting the target DNA strand. Moreover, herein we demonstrate the essential roles of the catalytic Mg2+ for the active state formation and stability. Importantly, we suggest that the derived catalytic conformation of the HNH domain can be exploited for rational engineering of Cas9 variants with enhanced specificity.


I. Supplementary Figures
. spCas9-sgRNA complexed with PAM-containing double-stranded DNA (dsDNA) substrate. (a) Overall architecture of the ternary complex of spCas9, sgRNA and dsDNA (PDB code: 5F9R). Cas9 NUC lobe comprises of two nuclease domains (RuvC and HNH), C-terminal domain (CTD) and topoisomerase homology (Topo) domain, and REC lobe is spatially divided into three domains (REC1, REC2 and REC3); the two lobes are connected by an arginine-rich (bridge) helix (BH). The target and nontarget DNA strands (tDNA and ntDNA) are colored dark and light green, respectively, with the PAM duplex highlighted in crimson. (b) Close-up view of the HNH domain active center (PDB code: 5F9R). The putative catalytic residues are depicted in a stick model and colored by atom type (C, yellow; N, blue; O, red). (c) Structured-guided protein engineering to improve spCas9 specificity (PDB code: 4UN3). Neutralization of the selected basic residues on the HNH domain (colored blue) was shown to reduce spCas9 off-target effects while maintaining off-target activity. The cleavage site on tDNA is denoted with a scissor.   Figure S4. Cα RMSD distributions for the HNH and ββα fold calculated from the conventional and accelerated MD simulations relative to the starting crystal structure (PDB code: 5F9R). The average pairwise RMSDs for the HNH domain and ββα motif among the available Cas9 crystal structures in different binding forms is 1.4±0.6 and 1.4±0.7 Å, respectively (see Table S1), which are comparable to the corresponding peak values calculated from the cMD simulations. In contrast, aMD shows significantly larger RMSD values peaking at 4 Å, indicating the enhanced sampling accompanies considerable internal structural change. Figure S5. Putative catalytic state of Cas9 HNH domain modeled from T4 Endonuclease VII (Endo VII)/DNA complex (PDB code: 2QNC). (a) T4 Endo VII ββα-metal motif complexed with a DNA substrate. The Cα atoms of the active residues (i.e. Asp40, His41 and Asn62) are rendered as light blue spheres, and the coordinated Mg 2+ is depicted as a cyan sphere. (b) Cas9 HNH domain opposite to the target DNA strand (PDB code: 5F9R). The Cα atoms of the putative catalytic residues (i.e. Asp839, His840 and Asp861) are represented as dark blue spheres and the HNH domain ββα-metal motif is colored yellowgreen. (c) The scissile phosphate and flanking nucleotides of the DNA substrate in a superimposed onto the corresponding stretch in b alongside the ββα-metal motif. (d) Topology-independent structural alignment between Cas9 and Endo VII ββα-metal motifs (PDB codes: 5F9R and 2QNC) using the CLICK algorithm 4 The Cα RMSD of the equivalent residues (shown as spheres) between the two nucleases is 1.2 Å. The catalytic residues appear to be spatially superimposed well. (e) Cas9 HNH domain oriented toward the target DNA strand based on the transformation matrix obtained from d. (f) Direct "docking" of the HNH domain starting from the pre-catalytic state (PDB code: 5F9R) results in a number of steric clashes with other components (including Cas9 REC lobe, tDNA and sgRNA) in the system. The overlapping heavy atoms are shown as van der Walls spheres, using a distance cutoff of 1.4 Å. The backbone RMSD for the HNH domain between the pre-catalytic and resulting docked conformations is 25 Å here.  . Distribution of the minimum distance (d H840_+4P ) between the general base His840 and scissile phosphate (+4P) calculated from the wild type and Asp839Ala mutant simulations. The last 300-ns of the simulation trajectories were collected for computations. The above inset illustrates the calculated distance and the black dashed line denotes the corresponding value from the pre-catalytic crystal structure. It is evident that upon Asp839Ala substitution, the general base His840 goes away from the scissile phosphate (from ~5 Å to ~10 Å), leading to impaired nuclease activity. Figure S8. Structural superposition between the cMD ens -derived catalytic state and crystal pre-catalytic state (PDB code: 5F9R). The HNH domain within 5F9R (labeled as HNH cry ) is highlighted in green and the remainder of Cas9 colored blue, with the largest domain movement (involving HNH, REC2 and CTD) denoted by a yellow arrow. The cMD ens -derived catalytic structure is colored by domains as in Figure 3c: HNH (labeled as HNH cMD ), magenta; REC1, pink; REC2, pale yellow; REC3, iceblue; RuvC; orange; CTD; grey; Topo, hotpink, BH, cyan.  Supplementary Fig. 9g. Note that all the interacting pairs do not necessarily appear in one single snapshot, and the complete residue list is present in Table S3.   The residues whose alanine substitution was experimentally shown to enhance Cas9 specificity are highlighted in blue (see Supplementary Fig. 1c). The promising candidate residues for further testing, determined based on our study, are in red boldface. * Part of HNH domain flanking link regions (L1&L2) included into statistics § Salt bridge interaction is defined as the distance between the nitrogen and oxygen atoms is less than 4 Å;

II. Supplementary Tables
A hydrogen bond (H-bond) is defined as the distance between the donor and receptor atoms is less than 3.5 Å and the angle formed by the donor, hydrogen and acceptor atoms is less than 35° from 180°. † Post targeted MD (tMD)-derived interactions (G6 in Table 1). ‡ Presence (√) or not (-) in the initial pre-catalytic crystal structure (PDB code: 5F9R) ¶ Presence (√) or not (-) in the ensemble conventional MD (cMD ens )-derived catalytic state # Suggested amino acid mutations for further specificity improvement

III. Supplementary Computational Procedures Principal Component Analysis (PCA)
PCA is a technique for transforming a series of potentially coordinated observations into a set of orthogonal vectors called principal components (PCs) and is widely used to characterize the dominant modes of motion underlying protein dynamics 6,7 . The calculations of PCs involve two main steps, i) the calculation of covariance matrix, and ii) the diagonalization of this matrix. With the goal of comparing the conformational dynamics of HNH domain between different MD simulations, the whole simulation trajectories (G1-G4 and G10, Table 1) were first combined and superimposed to the starting crystal structure using the Cas9 Cα atoms excluding those on the HNH domain. After that, the PCA calculations were performed only on the HNH domain to determine the eigenvectors and associated eigenvalues (referred to collectively as eigenmode). The eigenvector with the largest eigenvalue corresponds to the lowest mode of motion. The PC analysis was done with the ccptraj module included within the AmberTools16 8 .

HNH Active State Modeling & HNH Pairwise RMSD Computation
Starting from the pre-catalytic Cas9 structure (PDB code: 5F9R 5 ), the detailed procedure modeling its putative catalytic state of HNH domain from the homologous T4 Endonuclease VII (Endo VII) complexed with a DNA Holliday junction (PDB code: 2QNC 9 ) was illustrated in Supplementary Fig. 5. It should be mentioned that 2QNC represents a catalytically active state where one Mg 2+ was coordinated at the interface between the enzyme ββα motif and scissile phosphate (see also Figure 2c-d in the main text), making it the best candidate for active state modeling among the available ββα-metal nuclease structures.
We took three steps to model the HNH active state. In step 1, the scissile phosphate and flanking nucleotides in the T4 Endo VII system (2QNC) was aligned to the corresponding tDNA stretch in the Cas9 complex of the pre-catalytic state (5F9R). In step 2, Cas9 HNH domain was moved toward the tDNA with the transformation matrix calculated from the paired ββα motifs in the two nucleases, resulting in a model of the HNH domain docked at the cleavage site. Notably, the equivalent residues between the above ββα motifs for transformation matrix calculation were determined based on topology-independent structure superposition by the CLICK algorithm 4 instead of generally used sequence alignment. The backbone RMSD of HNH domain between the pre-catalytic Cas9 state (5F9R) and the modeled "active" state is 25 Å (Supplementary Fig. 5f). In step 3, we repeated step 1 and step 2, replacing the crystal structure (5F9R) with snapshot structures from the sets of long cMD trajectories (G1 and G2, Table 1). We thereby obtained a modeled "active" state for every snapshot of the simulations. We calculated RMSD between the snapshot structure and its corresponding "active" state and used it as a metric to evaluate how close the snapshot conformation to its putative active state.

Details of Generating tMD-derived Catalytic State
We employed the targeted molecular dynamics (tMD) method to drive Cas9 conformational transition. The target structures for tMD were built by reference to the catalytically active T4 Endo VII system above ( Supplementary Fig. 5). To minimize the potential artificial effect by tMD, we extracted two snapshots from the sets of long cMD trajectories (G1 and G2, Table 1) as the starting structures that show most proximity to the their respective modeled "active" states in terms of HNH domain conformation (Supplementary Fig. 5). The backbone RMSD differences for the HNH domain from the target structures are about 10 Å, which are remarkably reduced as compared with that of 25 Å if using the pre-catalytic crystal structure (5F9R) as the starting point. Accordingly, the tMD stating points were much closer to the corresponding end points in the subspace defined by the first two principal components with regard to the crystal structure (Figure 1d). Not surprisingly, simple docking of the HNH domain toward the putative catalytic state inevitably brings about numerous steric clashes with the other components in the complex system ( Supplementary Fig. 5f), indicating considerable conformational rearrangements in Cas9 must be implicated during the pre-catalytic to catalytic state transition. We note that we did not employ the trajectory snapshots from aMD, albeit further approach to the target conformations with a minimum RMSD difference of ~5 Å, as it appears that the enhanced sampling via aMD also accompanies an appreciable distortion regarding the internal conformation of HNH domain (Supplementary Fig. 4 Table   1). During tMD, the Cα atoms of the protein residues (excluding HNH domain) exhibiting low fluctuations were weakly restrained with a force constant of 0.1 kcal/mol/Å 2 to prevent solute global drift. The guiding forces were exerted only on the HNH domain backbone atoms with a force constant of 0.5 kcal/mol/Å 2 , and the simulation time was set to 100 ns (G5 , Table 1), representing a RMSD deceasing rate of ca. 0.1 Å/ns. At the end of tMD, the RMSD between the initial and target coordinates declined to ~0.8 Å, indicating completion of the anticipated conformation change. We selected two structure snapshots that are at near the end of tMD for subsequent cMD (G6 , Table 1), in which one Mg 2+ was introduced at the interface between the HNH domain and tDNA in the framework of the one-metal-ion mechanism (Figure 2d) 10,11 . Here, we did not employ the tMD end structures (i.e. at 100 ns) as the start points for Mg 2+ introduction, given that the modeled target coordinates used in tMD do not necessarily represent a true catalytic state, and importantly, that the Mg 2+ may assist further conformation change to bridge the distance gap for catalysis as we previously demonstrated with the RuvC domain 12 . This consideration allowed for spontaneous adaptation of the system to the catalytic conformation. The deliberate building procedures could ensure least perturbation on the system and hence eliminate potential artificial effects by the tMD that is readily subjected to question. After sufficient equilibration, we finally obtained a reasonable catalytic conformation, featuring stable Mg 2+ -involved coordination configuration (Figure 2a) that matches well with that observed in the T4 Endo VII system (Figure 2c).

Details of Generating cMD ens -derived Catalytic State
The above tMD-based strategy to capture the catalytic state in essence is dependent on a modeled putative "target" state. One may question the reliability of the derived state and associated results, though the model was treated with careful considerations. To eliminate these concerns, we developed an ensemble samplingbased scheme targeting the active state forward. The basic idea is as follows: i) pre-define an a priori metric (or multiple if necessary) like distance, angle and RMSD; ii) use this metric to track conformational transition and screen a structure most approximate to expected target state; iii) perform ensemble conventional MD simulations (cMD ens ) starting from the above extracted structure; iv) screen another closest structure snapshot from previous cMD ens and initiate a new cycle of ensemble simulations. Ideally, we could get closer to or even hit the target conformation through several or more cycles, depending on the energetic barrier height between the initial and target states and the sampling length accessible to each independent run.
Here we used the geometric mean of the distances of +4P (the scissile phosphate) to the two active residues His840 and Asp861 ( "# %&'#( * "# %*'+, ) as a metric to monitor the HNH domain conformational change: the smaller this value, the closer to the target active state (Figure 4a,b). From the sets of long cMD trajectories (G1 and G2, Table 1), we extracted a structure bearing a minimum value of ~9 Å as the starting point for ensemble simulations (Figure 3a), where one Mg 2+ was placed around the reaction center as done for the post tMD simulations (see Materials and Methods). In each cycle, a total of 10 independent runs were carried out and each run lasted 500 ns (G8.1-G8.4, Table 1). Through four cycles, the above geometric mean got sable at ~6 Å (Figure 3a), which is comparable to that observed for the tMD-derived catalytic state (Figure 4a). Accordingly, the RMSD of the reaction interface from the tMD-derived catalytic state declined from initial ~3 Å to ~1 Å (Figure 4b). Moreover, the Mg 2+ -involved coordination composition and configuration here (Figure 2b) are essentially the same to those derived from tMD (Figure 2a), except that Tyr823 was engaged to Asp839 via an intercalated water molecule, again confirming the structural role of Tyr823 around the reaction center. These observations thus demonstrated formation of the cMD ens -derived catalytic state.

Cluster Analysis
The simulation structures used for visualization and comparison were determined through the cluster analysis with the package VMD (version 1.9.2) 13 . Following our previous experience with the same system 12 , the reaction interface atoms were selected for calculations, involving the heavy atoms of the three active residues, Asp839, His840 and Asp861, the Cα atoms of the remaining residues on the HNH ββα motif, the backbone of the scissile stretch on the tDNA (+3P to +5P), and the coordinated Mg 2+ between them. By varying the RMSD cutoff (0.6~1.0 Å here), we finally obtained four groups in which the first two account for > 80% of total population. The structure(s) closest to the centroid of the largest ensemble were extracted for analysis.

Binding Free Energy Calculation & Per-residue Energy Decomposition
The end-point Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) approach 14 was employed to estimate per-residue energetic contribution to Mg 2+ binding and the difference in the affinities of the tDNA to Cas9 with and without Mg 2+ bound at the reaction interface. Compared to the alternative Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA), MM-GBSA is computationally more efficient and has shown to give comparable or even better accuracy 14,15 . All the MM-GBSA calculations were performed with the program MMPBSA.py in AmberTools16 16 . The entropic contribution was not taken into account here, due to the high computational cost and potential convergence problem. However, omission of this term does not qualitatively affect the results 14,15 . The last 400 ns of each set of simulation trajectories were used for calculations, with 50-ps intervals. Specifically, in the case of Mg 2+ binding free energy calculation, the three water molecules closet to the coordinated Mg 2+ in each trajectory snapshot were retained and considered as part of the Cas9-sgRNA/tDNA "receptor".

Non-bonded Interaction Energy Calculation
The non-bonded interaction energies of the HNH ββα motif with the scissile phosphate and flanking nucleotides (+3P to +5P) were calculated by the software NAMD (version 2.12) 17 , employing the same structural ensemble as mentioned above. The truncation cutoff was set to 10 Å, consistent with that used in MD simulations.