A kinetic model predicts SpCas9 activity, improves off-target classification, and reveals the physical basis of targeting fidelity

Eslami-Mossallam, Behrouz; Klein, Misha; Smagt, Constantijn V. D.; Sanden, Koen V. D.; Jones, Stephen K.; Hawkins, John A.; Finkelstein, Ilya J.; Depken, Martin

doi:10.1038/s41467-022-28994-2

Published: 15 March 2022

Nature Communications volume 13, Article number: 1367 (2022)

Abstract

The S. pyogenes (Sp) Cas9 endonuclease is an important gene-editing tool. SpCas9 is directed to target sites based on complementarity to a complexed single-guide RNA (sgRNA). However, SpCas9-sgRNA also binds and cleaves genomic off-targets with only partial complementarity. To date, we lack the ability to predict cleavage and binding activity quantitatively, and rely on binary classification schemes to identify strong off-targets. We report a quantitative kinetic model that captures the SpCas9-mediated strand-replacement reaction in free-energy terms. The model predicts binding and cleavage activity as a function of time, target, and experimental conditions. Trained and validated on high-throughput bulk-biochemical data, our model predicts the intermediate R-loop state recently observed in single-molecule experiments, as well as the associated conversion rates. Finally, we show that our quantitative activity predictor can be reduced to a binary off-target classifier that outperforms the established state-of-the-art. Our approach is extensible, and can characterize any CRISPR-Cas nuclease – benchmarking natural and future high-fidelity variants against SpCas9; elucidating determinants of CRISPR fidelity; and revealing pathways to increased specificity and efficiency in engineered systems.

Introduction

CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) has become a ubiquitous tool in the biological sciences^1,2, with applications ranging from live-cell imaging³ and gene knockdown/overexpression^4,5 to genetic engineering^6,7 and gene therapy^8,9. Streptococcus pyogenes (Sp) Cas9 can be programmed with a ~100 nucleotide (nt) single-guide RNA (sgRNA) to target DNAs based on the level of complementarity to a 20 nt segment of the sgRNA¹⁰. Wildtype SpCas9 (henceforth Cas9) induces site-specific double-stranded breaks, and the catalytically dead Cas9 (dCas9) mutant allows for binding without cleavage^3,5. Apart from complementary on-targets, Cas9-sgRNA also binds and cleaves partially complementary off-targets^{11,12,13,14,15,16,17,18}. Off-target cleavage risks deleterious genomic alterations, which has so far impeded the widespread implementation of the CRISPR toolkit in human therapeutics¹⁹.

Strong off-target sites are identified in silico by a growing set of tools. These tools use bioinformatics^20,21, machine learning^22,23, or heuristic^12,14,24,25 approaches to rank genomic sites based on distinctive off-target activity scores. Though such models can identify strong off-targets, they are not quantitative and cannot assess activity on the many lesser off-targets; nor can they predict how activity changes with exposure time and enzyme concentration—even though these parameters are frequently exploited to limit off-target activity in cells²⁶.

To implement quantitative activity prediction, Cas9 targeting must be modelled in physical terms. Existing physical models^24,27,28 assume binding equilibration before cleavage, and it remains unclear what predictive power such approaches can ultimately deliver in this non-equilibrium system^29,30. To account for the non-equilibrium nature of the targeting reaction, we construct a mechanistic model that captures binding and cleavage reactions in kinetic terms. To gain insights into general mechanisms, we train and validate our model on high-throughput datasets that capture both binding and cleavage in bulk experiments^15,31. Though we restrict our training to off-targets with two or fewer mismatches, we accurately predict the activities on all more highly mismatched off-targets in the same datasets, as well as those reported in two independent high-throughput datasets¹¹.

To reveal the physical basis of Cas9 fidelity on genomic scales, we extract the free-energy landscapes that control PAM binding, strand-replacement, and cleavage on any target. Our characterization of Cas9 supports the notion that observed differences in binding and cleavage activities^{32,33,34,35,36,37,38,39,40,41} stem from a relatively long-lived DNA-bound RNA-DNA hybrid (R-loop) intermediate. This R-loop intermediate was recently observed directly in single-molecule experiments⁴², and our model predicts both its location and its conversion rates.

Though the strength of our model lies in its ability to calculate how (d)Cas9 activity evolves in time under various conditions, we also sought to compare our approach to existing binary off-target classifiers that identify strong off-targets. To this end, we reduce our quantitative activity predictor to a binary off-target classifier that outperforms the leading tools used today^12,24,28,43.

Results

The kinetic model

In Fig. 1a we show the reaction pathway that underpins the Cas9 targeting reaction on every target⁴⁴. The reaction starts with the Cas9-sgRNA ribonucleoprotein complex exiting the solution state to specifically bind to a 3 nt protospacer adjacent motif (PAM) DNA sequence (canonically 5’-NGG-3’) via protein-DNA interactions^44,45. Binding to the PAM sequence (state 0) opens the DNA double helix, and allows the first base of the target sequence to hybridize with the sgRNA^44,45, forming the first R-loop state (state 1). The DNA double helix further denatures as the RNA-DNA hybrid is extended in the guide-target strand-replacement reaction^46,47,48,49 (states 2–20). The hybrid grows and shrinks in single-nucleotide steps, until it is either fully reversed and Cas9 dissociates, or it reaches completion at 20 base pairs (bp) in state 20. If the full hybrid is formed, Cas9 can use its HNH and RuvC nuclease domains to cleave both DNA strands⁵⁰.

**Fig. 1: The reaction scheme and the implications of the model assumptions.**

If we know the conversion rates in Fig. 1a for a particular guide and target, the reaction scheme can be solved to calculate the binding and cleavage probabilities at any time (Methods). Fully parameterizing the model over all guide and target sequences requires the estimation of ~10²⁶ rates. To render parameter estimation tractable, we make four mechanistic-model assumptions:

(1) Mismatch positions are more important than mismatch types (e.g. G-G vs. G-A). This can be directly inferred from data^11,15, and we treat all 12 mismatch types equally.
(2) Mismatch energies are determined by local interactions. The energetic cost of multiple mismatches is taken to be equal to the sum of the energetic costs of the individual mismatches.
(3) dCas9 differs from Cas9 only in that dsDNA bond-cleavage catalysis is completely suppressed (k_cat = 0); all other rates are taken to be identical^51,52.
(4) All selectivity is governed by the hybrid-bond-reversal rates. Hybrid-bond-formation rates are treated as equal, independent of complementarity and location.

These assumptions reduce the total number of microscopic parameters to 44 (see Methods): the concentration-dependent rate of PAM binding from solution (k_on) and the associated free-energy gain (F₀); a single internal forward bond-formation rate (k_f); 20 free energies dictating R-loop progression at the on-target ($F_{1},\ldots,F_{20}$); 20 free-energy penalties for mismatches at different R-loop positions ($\delta\epsilon_{1},\ldots,\delta\epsilon_{20}$); and the rate at which the final cleavage reaction is catalyzed for Cas9 (k_cat). Once model parameters are estimated, all possible off-target free energies can be directly calculated using assumptions 1–4 above. In Fig. 1b we illustrate the calculation taking us from the on-target (pink) to the off-target (blue) free-energy landscape with mismatches entering the hybrid at the 3rd and 15th bp. How to translate between free energies and rates is detailed in Methods.
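
The step from on-target to off-target landscape can be sketched in a few lines. This is a minimal illustration under assumptions 1 and 2, not the authors' code: the function name `off_target_landscape` and all numerical values are placeholders chosen for the example, with energies expressed in units of k_BT.

```python
def off_target_landscape(F_on, d_eps, mismatches):
    """Shift an on-target landscape by the penalties of all mismatches met so far.

    F_on       : on-target free energies [F_0, F_1, ..., F_20] in units of k_B T
    d_eps      : mismatch penalties [d_eps_1, ..., d_eps_20] in units of k_B T
    mismatches : set of hybrid positions (1-20) that carry a mismatch
    """
    F_off, penalty = [], 0.0
    for n, F in enumerate(F_on):          # n = 0 is the PAM-bound state
        if n in mismatches:
            penalty += d_eps[n - 1]       # penalties are additive (assumption 2)
        F_off.append(F + penalty)
    return F_off


# Placeholder numbers for illustration only, not the fitted values of Fig. 4.
F_on = [-2.0 + 0.5 * n for n in range(21)]
d_eps = [5.0] * 20
F_off = off_target_landscape(F_on, d_eps, mismatches={3, 15})
```

With mismatches at the 3rd and 15th bp, every state from 3 onward is raised by the first penalty, and every state from 15 onward by the sum of both.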

Base-pairing interactions, protein-DNA interactions⁵², and induced conformational changes^50,51,53,54 all contribute to the stability of the Cas9-sgRNA-DNA complex. To account for the varying nature of these interactions, we allow for varying gains and losses in the on-target free-energy landscape as the hybrid is extended. These variable gains and losses allow for the formation of metastable states on the on-target, and constitute an essential extension of our previous fixed-gain model for RNA-guided nuclease kinetics³⁰, as well as of models describing DNA displacement reactions occurring in solution^55,56,57,58.

Training on binding and cleavage for moderately mismatched targets

We seek to reveal general properties of SpCas9 DNA targeting on genomic scales. To this end, we train and validate our model on data from two highly reproducible bulk-biochemical experiments performed on a large library of moderately to highly mismatched off-targets. The first set¹⁵ (NucleaSeq) has 97% correlation between replicated experiments, and estimates the effective cleavage rates ($k_{\rm clv}^{\rm eff}$) for a library of off-targets exposed to Cas9-sgRNA for 16 hours. The second set^15,31 (CHAMP) has 94% correlation between replicated experiments, and reports on the effective association constant ($K_{\rm A}^{\rm eff}$) over the same library and guide, but this time exposed to dCas9-sgRNA for 10 min. In Methods we detail how the experiments are modeled.

We estimate the model parameters by minimizing the total experimental-error weighted residue between prediction and experiment for off-targets (see Methods) with no more than two mismatches in the NucleaSeq (Fig. 2a–c) and CHAMP (Fig. 2d–f) experiments. The rates and association constants from different types of mismatches are averaged (see Methods and Supplementary Data 1), and the optimal solution is sought with a Simulated Annealing algorithm⁵⁹ (see Methods).

**Fig. 2: Training on cleavage and binding for moderately mismatched targets.**

The two training sets differ significantly (Fig. 2, and Supplementary Fig. 1a). Our model still reproduces effective cleavage rates (Fig. 2a–c) and effective association constants (Fig. 2d–f) with a Pearson correlation of 93% and 98% respectively, and quantitatively captures the difference between binding and cleavage activity. The time and concentration dependence of (d)Cas9 activity can be explored through a dashboard we provide (see Code Availability).

Validation on highly mismatched targets and independent data sets

Apart from the data we use for training (two or fewer mismatches), the NucleaSeq¹⁵ and CHAMP^15,31 sequence libraries also include block-mismatched targets with more than two mismatches. In Fig. 3a, b we show that we quantitatively predict effective association constants on these highly mismatched targets at a correlation of 98%. Our method also successfully separates out the single dominating off-target present among highly mismatched targets in the NucleaSeq experiments (Supplementary Fig. 1b), resulting in a perfect correlation.

**Fig. 3: Validation on highly mismatched targets and independent HiTS-FLIP data.**

To further validate our model, we test against two data sets from HiTS-FLIP experiments reported in the literature¹¹. The first independent validation set records the association rate relative to the on-target, estimated over 1500 seconds of exposure to dCas9-sgRNA at 1 nM concentration (Fig. 3c–e). The second independent validation set records the dissociation rate relative to the on-target, estimated over 1500 seconds following 12 hours of exposure to a saturating dCas9-sgRNA concentration (Fig. 3f–h). Our model quantitatively captures the relative association rates for all reported targets with 82% correlation (Fig. 3e). For the relative dissociation rates, the correlation is more modest at 46% (Fig. 3h), and the quantitative agreement is lost in some regions (Fig. 3f–h). We still seem to capture the general trends on moderately mismatched targets (Fig. 3f, g), though our model will never give binding/dissociation rates above/below that of the on-target, as is reported for some highly mismatched targets (Fig. 3e, h).

Physical characterization of SpCas9 and the intermediate R-loop state

As our model parameters carry physical meaning, estimating them from data amounts to characterizing the system in physical terms. For Cas9, it has been experimentally shown that R-loop progression is controlled by an intermediate metastable state on the on-target⁴². We expect this intermediate state to show up as a local minimum in our estimated on-target free-energy landscape. The free energy of any metastable state will have a strong influence on the observed dynamics, and we expect such energies to be well constrained by the data. We expect barriers between metastable states to be harder to resolve, as the details of barrier regions matter less for the observable dynamics.

We here report 33 near-equivalent optimization runs that all resulted in a residue that fell within 15% of the best solution found (see Supplementary Video 1). In Fig. 4a we plot the resulting on-target free-energy landscapes, with the optimal solution highlighted in pink. As expected, we see metastable states in the on-target free-energy landscape. With Cas9 in solution or PAM-bound, we have a well-defined free-energy minimum where the R-loop is closed (C). The on-target free energy (Fig. 4a) increases substantially when forming the first hybrid bp in state 1, and remains relatively high and poorly constrained up to and including state 8. The energies of states 9–12 are well constrained, and among them we find a second local minimum. We identify these states as belonging to an intermediate (I) R-loop state. For hybrids of length 13 to 19 bp we again see an ill-constrained barrier, ending when we enter a well-constrained local minimum of a fully formed hybrid at state 20. This last minimum defines the open (O) R-loop.

**Fig. 4: Physical parameters estimated from NucleaSeq and CHAMP datasets.**

Mismatch penalties are all around 5 k_BT (Fig. 4b), but show reproducible variation along the hybrid. Comparing Fig. 2a, d with Fig. 4b, it is clear that variations in mismatch penalties in the first 8 states correlate strongly with the measured effective cleavage rate/dissociation constant on targets with a single seed mismatch at the corresponding hybrid position. It is not clear if these variations are due to varying interactions with the protein, or reflect the fact that the possible mismatch types vary with position. In Fig. 4c we show the remaining rates needed to predict Cas9 cleavage activity at any target, time, and Cas9-sgRNA concentration (see Methods).

R-loop dynamics captures single-molecule experiments

The recent direct observation of the R-loop dynamics between metastable states⁴² allows us to further test our model against quantitative single-molecule data. To this end, we define a coarse-grained model (Fig. 5a) and calculate the effective rates between metastable states from our microscopic free-energy landscapes (see Methods). In Supplementary Fig. 2 we show that predictions based on our coarse-grained model replicate those of the microscopic model.

**Fig. 5: Metastable states control the targeting dynamics.**

Using effective rates between metastable states, we can rationalize the broad strokes of Cas9 fidelity by considering a few important examples⁴². For on-targets (Fig. 5b), the transition between the PAM bound state and the intermediate R-loop state is reversible ($k_{\rm PI}\approx k_{\rm IP}$) (Fig. 5c). Complexes that enter the intermediate state typically also enter the fully opened state ($k_{\rm IP}\ll k_{\rm IO}$). The transition from intermediate to open R-loop configuration is irreversible ($k_{\rm IO}\gg k_{\rm OI}$), and entering the open configuration guarantees cleavage ($k_{\rm OI}\ll k_{\rm cat}$). Taken together, the on-target reaction is essentially unidirectional toward cleavage, once the intermediate state is entered. The transition into the intermediate R-loop state is rate-limiting ($k_{\rm PI}\ll k_{\rm IO}\ll k_{\rm cat}$) for cleavage.

Mismatches between the target DNA and the sgRNA have differential effects on R-loop propagation depending on position. A PAM-proximal mismatch (position 1–8) (Fig. 5d) strongly suppresses the rate of transition from a closed to intermediate R-loop state (Fig. 5e). In contrast, a PAM-distal mismatch (position 12–17) (Fig. 5f) limits the effective rate of cleavage by reducing the intermediate to open transition rate (Fig. 5g), and allowing for re-closure of the R-loop before entering the open state ($k_{\rm IO}\approx k_{\rm IP}$).

These observations are in agreement with the experimental observation⁴², and in Fig. 5c, e we use purple triangles to indicate measured rates⁴² when available at zero torque. We quantitatively predict the conversion rates out of the intermediate R-loop state. The model also captures the position of the on-target intermediate state as being around hybrid length 9-12. Our model does not capture the rate of the open to intermediate transition, and future work will have to determine if this is due to a difference in experimental conditions or because our choice of training data is ill-suited to determine the free energies past the intermediate state.

Our model predicts rates on all off-targets, and so extends and refines the long-established rule of thumb that off-target rejection in the PAM proximal seed requires only one mismatch, while off-target rejection outside the seed region requires multiple mismatches¹⁰. In particular, our model quantifies the intermediate activity resulting from PAM distal mismatch, and so enables prediction of activity titration.

R-loop dynamics resembles conformational dynamics

Next, we wondered what structural properties of Cas9 give rise to the free-energy landscape of Fig. 4a. A comparison between DNA-bound and unbound Cas9-sgRNA structures has revealed that Cas9 repositions its HNH and RuvC nuclease domains to catalyze cleavage^45,60,61. Ensemble FRET experiments detected two dominant Cas9 conformers with distinct HNH states⁵⁰, and single-molecule FRET studies have identified a third intermediate conformer^51,53,54.

The relative position and occupancy of the HNH states is affected by R-loop mismatches^51,53,54, and Ivanov et al.⁴² suggest that the intermediate R-loop state is linked to the intermediate structural state seen in FRET experiments⁵¹. To test this hypothesis, we mimicked the experiments of Dagdas et al.⁵¹, and considered the time evolution of the occupancy of our metastable R-loop states for two target sequences (Fig. 6). The HNH-domain completes its conformational change within seconds after Cas9-sgRNA binds to on-target DNA⁵¹, and our model demonstrates a similar behavior for R-loop progression (Fig. 6a). The intermediate structural state is visited only transiently⁵¹, as is the intermediate R-loop state in our model (Fig. 6a). Compared to the on-target, PAM-distal mismatches maintain the entry rate into the intermediate structural state, while increasing the time spent in this state⁵¹; again in close agreement with our findings for the intermediate and open metastable R-loop states in the presence of a PAM distal mismatch (Fig. 6b). Taken together, our model supports the notion that the intermediate R-loop state is linked to the intermediate structural state seen in FRET experiments.

**Fig. 6: Dynamics among metastable states resemble structural dynamics.**

Kinetic modelling improves genome-wide off-target prediction

Current methods^{12,14,20,21,22,23,24,25,28,43} for identifying strong off-targets rank genomic sequences according to various measures of activity. They do not quantitatively predict biochemically measurable parameters, nor do they normally capture changes in conditions or activity over time. Our approach overcomes these limitations, and we do not suggest that these benefits should be abandoned in order to construct a binary off-target classifier. Still, to strengthen the case for including the full non-equilibrium nature of the problem in any Cas9 modelling, we reduce our quantitative kinetic model to a binary classifier (referred to as kinetic classifier) and test how well it performs against three established state-of-the-art off-target predictors: a recent benchmarking of models²⁸ shows the CRISPRoff classifier to outperform the competition, so we first test against this tool; second, we test against the more recent uCRISPR²⁴ tool, which is based on hybrid energetics and has not been tested against CRISPRoff; lastly, we test against the Cutting Frequency Determination (CFD) score¹², since it is a much-used tool for off-target classification.

To compare our model against the three selected off-target classifiers, we choose to rank all genomic sites based on cleavage activity in the low enzyme-concentration limit (see Methods). We make our comparison over all canonical PAM sites in the human genome. True positive off-targets are collected from sequencing-based cleavage experiments that used industry-standard sgRNAs and reported multiple off-target cleavage sites^{35,36,37,38,40,41,62} (Supplementary Table 1). We tested how well our kinetic model’s ranking of activity compares to that of the CFD score¹², CRISPRoff²⁸, and uCRISPR²⁴. For each sgRNA, we separately tested the models by using the union (sites found in any experiment) and intersection (sites found in every experiment) sets of the reported off-target sites as true positives. We perform precision-recall (PR) analysis (Supplementary Fig. 3) rather than using receiver-operator characteristics (Supplementary Fig. 4) since the datasets are highly unbalanced, with many more true negatives than true positives.

Figure 7a shows the PR curve when models are tested against the union of all reported off-targets while targeting the HBB gene. As the threshold for what is judged a strong off-target is swept, PR curves display the fraction of predicted off-targets that are found experimentally (precision) against the fraction of experimentally found off-targets that are also predicted (recall). Our kinetic classifier typically produces higher precision for all recalls, outperforming the other classifying schemes for the union set on the HBB gene. More importantly, the kinetic classifier also outperforms the leading off-target predictors for highly mismatched genomic off-targets of other sgRNAs: it performs best on the majority of targets in every pairwise matchup on both union (Fig. 7b, c) and intersection (Fig. 7d, e) sets, irrespective of whether the maximum F1 score or the area under the curve (AUC) is used.
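
The threshold sweep behind a PR curve can be illustrated with a short sketch. It is a generic construction, not the evaluation pipeline used for Fig. 7: the function names, the toy scores, and the toy labels are hypothetical, and tied scores are not given any special treatment.

```python
def precision_recall(scores, is_true):
    """Sweep the threshold down the ranked list and record (precision, recall).

    scores  : predicted activity per site (higher means stronger off-target)
    is_true : 1 if the site was found experimentally, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(is_true)
    curve, true_pos = [], 0
    for rank, i in enumerate(order, start=1):   # top `rank` sites are "predicted"
        true_pos += is_true[i]
        curve.append((true_pos / rank, true_pos / total_pos))
    return curve


def max_f1(curve):
    """Largest harmonic mean of precision and recall along the curve."""
    return max((2 * p * r / (p + r) for p, r in curve if p + r > 0), default=0.0)


# Toy ranking: four sites, two of which are experimentally confirmed.
toy_curve = precision_recall([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
toy_f1 = max_f1(toy_curve)
```

In this toy case the best trade-off is reached when the top three sites are called positive (precision 2/3, recall 1), giving a maximum F1 of 0.8.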

**Fig. 7: Genome-wide off-target classification.**

Discussion

Training our model (Fig. 1) of SpCas9 target activity on moderately mismatched targets, we extract the physical parameters (Fig. 4) that control activity on any target (Figs. 2 and 3). Going beyond present-day binary off-target classification schemes, we quantitatively predict cleavage and binding activity as a function of both time and SpCas9-sgRNA concentration.

We show that SpCas9’s targeting reaction contains an intermediate R-loop state, with both position and conversion rates that agree with single-molecule experiments⁴² (Fig. 5). Mismatches affect the dynamics of the R-loop states (Fig. 6) in a manner similar to how they affect the configurational states of SpCas9’s nuclease domains^42,51,53. Based on this, we lend support to the notion that R-loop formation is tightly coupled to protein conformation, pointing toward the relevant structure-function relation for the most important RNA-guided nuclease in use today.

Though our model captures the abundant low-activity off-targets that are discarded by binary classifiers, we sought to demonstrate the general utility of kinetic modelling by reducing our quantitative activity predictor to a binary classification tool. The resulting kinetic classifier outperforms established state-of-the-art classification tools on canonical PAM sites in the human genome (Fig. 7).

In a recent study, Jost et al.⁵ demonstrated that a series of mismatched guides can be used to titrate gene expression using CRISPRa/CRISPRi. Wildtype SpCas9 can also be (effectively) inactivated with PAM-distal mismatches in the guide⁶³. Our model can guide such titration of SpCas9-sgRNA inactivation by careful placement of mismatches. Our approach can also be used to calculate the total off-target activity over a genome, and so inform the design of sgRNAs for novel gene targets.

For simplicity and robustness, we built our model to exclude mismatch type parameters. This allows for extensive training using datasets based on a single guide sequence and off-target DNAs containing up to two mismatches. The limited set of adjustable model parameters (44 in total) and efficient data usage (422 data points used for training) do not seem to limit the model’s applicability (Figs. 2, 3, 7). The success of our low-complexity model strongly suggests that the path to increased predictive power and therapeutic relevance runs through bottom-up modelling of RNA-guided nucleases in kinetic terms.

Taken together, we have shown that our mechanistic and kinetic model gives biophysical insight and quantitative predictive power far beyond the training sets. This predictive power is only expected to increase when including sequence features and allowing for alternative PAM sequences in future modelling efforts. SpCas9 is only one of many RNA-guided nucleases with biotechnological applications, and other CRISPR associated nucleases (such as Cas12a, Cas13 and Cas14) offer a diversified genome-engineering toolkit^{15,64,65,66,67,68,69}. These nucleases can all be characterized with our approach, and it will be especially interesting to compare the free-energy landscape of our SpCas9 benchmark to that of engineered^41,54,70 and natural (e.g. N. meningitidis Cas9⁷¹) high-fidelity Cas9 variants.

Methods

Modelling of the (d)Cas9 targeting reaction

We consider a single DNA target sequence with a PAM, in contact with (d)Cas9-sgRNA in solution at fixed concentration (Fig. 1a). (d)Cas9-sgRNA binding to the PAM site is assumed to be first order,

$$k_{\rm on}=k_{\rm on}^{\rm ref}\,[\text{Cas9-sgRNA}]$$

where [Cas9-sgRNA] is the concentration of active complexes relative to some reference concentration (we use 1 nM). Binding is followed by a Cas9-mediated strand exchange reaction between sgRNA and the DNA. Once a 20 bp hybrid is formed, Cas9 can cleave the DNA, while dCas9 cannot. We model target recognition as a stochastic hopping process along a sequence of states: target unbound ($n=-1$), PAM bound ($n=0$), and strand exchange ($n=1,2,\ldots,20$). We use the column vector $\mathbf{P}(t)=(P_{-1}(t),\ldots,P_{20}(t))^{T}$ to represent the probabilities of being in the various states at time t. The evolution of probabilities is captured by the Master Equation

$$\partial_{t}\mathbf{P}(t)=\mathbf{K}\cdot\mathbf{P}(t),$$

where $\mathbf{K}$ is a tri-diagonal rate matrix. Letting $k_{n}^{\rm f}$ be the forward ($n\to n+1$) transition rate and $k_{n}^{\rm b}$ the backward ($n\to n-1$) transition rate (Fig. 1a), and defining $k_{-1}^{\rm b}=0$, we can give the elements of $\mathbf{K}$ as

$$\mathbf{K}_{nm}=\left\{\begin{array}{ll}k_{n-1}^{\rm f} & m=n-1\\ -(k_{n}^{\rm f}+k_{n}^{\rm b}) & m=n\\ k_{n+1}^{\rm b} & m=n+1\\ 0 & |n-m|\ge 2.\end{array}\right.$$

The Master Equation has the formal solution

$$\mathbf{P}(t)=\exp(\mathbf{K}t)\cdot\mathbf{P}(0)$$

which can be computed numerically, given any set of rates $\mathbf{K}$ and initial probabilities $\mathbf{P}(0)$. The above expression, with initial probabilities and rates adjusted to experimental conditions (see below), allows us to capture the full time-dependent evolution of the targeting reaction in quantitative terms.
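
A minimal numerical sketch of this solution is given below. It is not the authors' implementation: the helper names `rate_matrix` and `propagate` and all rate values are placeholders, and the matrix exponential is evaluated through an eigendecomposition with NumPy (a dedicated routine such as `scipy.linalg.expm` would be the more robust choice for stiff landscapes). States $n=-1,\ldots,20$ are stored at array index $n+1$.

```python
import numpy as np


def rate_matrix(k_f, k_b):
    """Tri-diagonal rate matrix K for states n = -1, 0, ..., 20 (array index n + 1).

    k_f[i] : forward rate out of state n = i - 1 (last entry is k_cat, 0 for dCas9)
    k_b[i] : backward rate out of state n = i - 1 (first entry must be 0)
    """
    k_f = np.asarray(k_f, dtype=float)
    k_b = np.asarray(k_b, dtype=float)
    K = np.diag(-(k_f + k_b))            # K[n, n]     = -(k^f_n + k^b_n)
    K += np.diag(k_f[:-1], k=-1)         # K[n, n - 1] =  k^f_{n-1}
    K += np.diag(k_b[1:], k=1)           # K[n, n + 1] =  k^b_{n+1}
    return K


def propagate(K, P0, t):
    """Evaluate P(t) = exp(K t) P(0) via the eigendecomposition of K."""
    w, V = np.linalg.eig(K)
    return (V @ (np.exp(w * t) * np.linalg.solve(V, P0))).real


# Placeholder rates in 1/s, chosen only to make the example run.
k_on, k_fwd, k_cat = 0.1, 10.0, 1.0
k_f = np.array([k_on] + [k_fwd] * 20 + [k_cat])   # states -1, 0..19, 20
k_b = np.array([0.0] + [5.0] * 21)                # states -1, 0..20
P0 = np.zeros(22)
P0[0] = 1.0                                       # start unbound (n = -1)

K = rate_matrix(k_f, k_b)
uncleaved = propagate(K, P0, t=60.0).sum()        # probability not yet cleaved
```

Because cleavage is the only exit from the chain, the summed probability decays over time for Cas9 and stays at 1 when the last forward rate is set to zero (dCas9).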

Parameter reduction

Based on mechanistic-model assumption 1, we average the data over mismatch types (see below), and only keep track of whether there is a match or a mismatch at every position. Model assumption 3 means that the model of dCas9 is the same as for Cas9, but with $k_{20}^{\rm f}=0$. Model assumption 4 implies that $k_{0}^{\rm f}=k_{1}^{\rm f}=\ldots=k_{19}^{\rm f}\equiv k_{\rm f}$. To see the implications of model assumption 2, we move to a description in terms of free energies.

Denote the free energy of any state n with F_n, and imagine that states n and $n-1$ are allowed to mutually equilibrate. Equilibration means that the relative occupancy is described by Boltzmann weights and that there are no net probability currents between the states

$$\frac{P_{n-1}^{\rm EQ}}{P_{n}^{\rm EQ}}=\frac{\exp\left(-\frac{F_{n-1}}{k_{\rm B}T}\right)}{\exp\left(-\frac{F_{n}}{k_{\rm B}T}\right)},\quad P_{n-1}^{\rm EQ}\,k_{n-1}^{\rm f}=P_{n}^{\rm EQ}\,k_{n}^{\rm b}.$$

The above relationships tie rates to free-energy differences through

$$\Delta F_{n}=F_{n}-F_{n-1}=k_{\rm B}T\,\ln\left(\frac{k_{n}^{\rm b}}{k_{n-1}^{\rm f}}\right).$$

Using $n=-1$ as the free-energy reference ($F_{-1}=0\,k_{\rm B}T$), the assumption that binding is first-order implies

$$F_{0}=F_{0}^{\rm ref}-k_{\rm B}T\,\ln([\text{Cas9-sgRNA}]).$$

Here $F_{0}^{\rm ref}$ is the free energy of the PAM bound state at the reference concentration (1 nM). Mechanistic-model assumption 2 now implies that $\Delta F_{1\le n\le 20}$ only depends on whether there is a mismatch at position $n$ or not, and we can write

$$\Delta F_{n}=\left\{\begin{array}{ll}\epsilon_{n}, & {\rm match}\\ \epsilon_{n}+\delta\epsilon_{n}, & {\rm mismatch}\end{array}\right.,\quad n=1,\ldots,20.$$

Here $\epsilon_{n}$ is the free-energy increase when extending the hybrid from length $n-1$ to length $n$ if the $n$:th hybrid bp is correctly matched, and $\delta\epsilon_{n}$ is the additional energy needed when the bp is incorrectly matched. We can write the backward transition rates as

$$k_{n}^{\rm b}=\left\{\begin{array}{ll}k_{\rm on}^{\rm ref}\exp\left(\frac{F_{0}^{\rm ref}}{k_{\rm B}T}\right), & n=0,\\ k_{\rm f}\exp\left(\frac{\Delta F_{n}}{k_{\rm B}T}\right), & n=1,\ldots,20.\end{array}\right.$$
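
As a consistency check, the $n=0$ case can be recovered from the relations already stated, using only detailed balance with $k_{-1}^{\rm f}=k_{\rm on}$, the first-order binding rate, and the concentration dependence of $F_0$ (with $F_{-1}=0$):

$$k_{0}^{\rm b}=k_{\rm on}\exp\left(\frac{F_{0}}{k_{\rm B}T}\right)=k_{\rm on}^{\rm ref}\,[\text{Cas9-sgRNA}]\,\frac{\exp\left(\frac{F_{0}^{\rm ref}}{k_{\rm B}T}\right)}{[\text{Cas9-sgRNA}]}=k_{\rm on}^{\rm ref}\exp\left(\frac{F_{0}^{\rm ref}}{k_{\rm B}T}\right).$$

The concentration cancels, so the rate of unbinding from the PAM does not depend on the amount of Cas9-sgRNA in solution, as expected for a unimolecular step.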

The model is now parameterized in terms of 41 free energies ($F_{0}^{\rm ref}$, $\epsilon_{1},\ldots,\epsilon_{20}$, $\delta\epsilon_{1},\ldots,\delta\epsilon_{20}$) and three forward rates ($k_{\rm on}^{\rm ref}$, $k_{\rm f}$, and $k_{\rm cat}$).
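
The translation from free energies to backward rates can be sketched as follows. The function name `backward_rates` and every numerical value are illustrative assumptions rather than fitted parameters; energies are in units of k_BT and rates in 1/s.

```python
import math


def backward_rates(k_on_ref, k_f, F0_ref, eps, d_eps, mismatches=frozenset()):
    """Return [k^b_0, ..., k^b_20] for a given mismatch pattern.

    eps[n - 1]   : on-target free-energy increment for extending the hybrid to n bp
    d_eps[n - 1] : extra penalty if hybrid position n is mismatched
    mismatches   : set of mismatched hybrid positions (1-20)
    """
    rates = [k_on_ref * math.exp(F0_ref)]          # n = 0: unbinding from the PAM
    for n in range(1, 21):
        dF = eps[n - 1] + (d_eps[n - 1] if n in mismatches else 0.0)
        rates.append(k_f * math.exp(dF))           # n >= 1: hybrid bond reversal
    return rates


# Placeholder parameters for illustration only.
eps = [-0.5] * 20
d_eps = [5.0] * 20
kb_on = backward_rates(0.1, 10.0, -3.0, eps, d_eps)
kb_off = backward_rates(0.1, 10.0, -3.0, eps, d_eps, mismatches={3})
```

A single mismatch at position 3 leaves every backward rate untouched except $k_{3}^{\rm b}$, which grows by the factor $\exp(\delta\epsilon_{3}/k_{\rm B}T)$; this is how assumption 4 places all selectivity in the reversal rates.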

Predicting NucleaSeq cleavage rates

To produce predictions for training and validation, we model the experimental setups. To model NucleaSeq data¹⁵, we use the solution to the Master Equation to calculate the expected cleaved fraction for any complementarity pattern. NucleaSeq is performed by exposing targets to saturating concentrations of Cas9-sgRNA, which we model by setting ${F}_{0}=-1000{k}_{{{{{{\rm{B}}}}}}}T$ and taking ${P}_{-1}(0)=1$, ${P}_{0\le n\le 20}(0)=0$ as initial condition. As done in the original experiment, we record the fraction of DNA that remains uncleaved (${\sum }_{n=-1}^{20}{P}_{n}(t)$) at the time points t = 0 s, 12 s, 60 s, 180 s, 600 s, 1800 s, 6000 s, 18000 s, and 60000 s, and fit out a single effective cleavage rate ${k}_{{{{{{\rm{clv}}}}}}}^{{{{{{\rm{eff}}}}}}}$. There is no a priori reason for the uncleaved fraction to follow an exponential decay, but as long as we follow the experimental data-analysis protocol we can use the effective cleavage rates to train and validate our model.
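The protocol above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the rate values, the hand-written matrix exponential, and the grid-search fit are all hypothetical choices, and saturation is imposed by setting the PAM unbinding rate to ${k}_{{{{{{\rm{on}}}}}}}\exp ({F}_{0})$ with ${F}_{0}=-1000$, which underflows to zero.

```python
import numpy as np

TIME_POINTS = np.array([0, 12, 60, 180, 600, 1800, 6000, 18000, 60000], dtype=float)

def generator(k_on, k_f, k_b, k_cat):
    """Matrix M with dP/dt = M @ P for states n = -1..20 (array index n + 1)."""
    fwd = np.concatenate(([k_on], np.full(20, k_f)))  # rates n -> n + 1, n = -1..19
    M = np.zeros((22, 22))
    for i in range(21):
        M[i + 1, i] += fwd[i]        # gain of the next state
        M[i, i] -= fwd[i]
        M[i, i + 1] += k_b[i]        # k_b[i] is the rate from n = i to n = i - 1
        M[i + 1, i + 1] -= k_b[i]
    M[21, 21] -= k_cat               # irreversible cleavage from n = 20
    return M

def expm_generator(A):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=0).max()
    s = max(0, int(np.ceil(np.log2(max(norm, 1e-300) / 0.5))))
    B = A / 2.0 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, 19):
        term = term @ B / j
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def uncleaved_fraction(k_on, k_f, k_b, k_cat, times=TIME_POINTS):
    M = generator(k_on, k_f, k_b, k_cat)
    P0 = np.zeros(22)
    P0[0] = 1.0                      # all targets start unbound (n = -1)
    return np.array([(expm_generator(M * t) @ P0).sum() for t in times])

def fit_effective_rate(times, fraction):
    """Least-squares fit of exp(-k t) on a log-spaced grid of candidate rates."""
    k_grid = np.logspace(-7, 2, 9001)
    pred = np.exp(-np.outer(k_grid, times))
    sse = ((pred - fraction) ** 2).sum(axis=1)
    return k_grid[np.argmin(sse)]

# Illustrative (not fitted) parameters: a uniformly downhill landscape
k_on, k_f, k_cat = 10.0, 100.0, 0.01
k_b = np.concatenate(([k_on * np.exp(-1000.0)], k_f * np.exp(np.full(20, -2.0))))
frac = uncleaved_fraction(k_on, k_f, k_b, k_cat)
k_clv_eff = fit_effective_rate(TIME_POINTS, frac)
```

For this toy landscape the fitted rate lies slightly below ${k}_{{{{{{\rm{cat}}}}}}}$, because the complex spends a small fraction of its time in partially formed R-loop states.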

Predicting CHAMP association constants

We model the CHAMP experiments¹⁵,³¹ by calculating the bound fraction (${\sum }_{n=0}^{20}{P}_{n}(t)$) of dCas9-sgRNA after 10 min at concentrations 0.1 nM, 0.3 nM, 1 nM, 3 nM, 10 nM, 30 nM, 100 nM and 300 nM, starting with the probabilities ${P}_{-1}(0)=1$, ${P}_{0\le n\le 20}(0)=0$. We use the equilibrium binding fraction

$${P}_{{{{{{\rm{bnd}}}}}}}^{{{{{{\rm{EQ}}}}}}}=\frac{[{{{{{\rm{Cas}}}}}}9-{{{{{\rm{sgRNA}}}}}}]}{[{{{{{\rm{Cas}}}}}}9-{{{{{\rm{sgRNA}}}}}}]+1/{K}_{{{{{{\rm{A}}}}}}}^{{{{{{\rm{eff}}}}}}}}$$

to fit out an effective association constant ${K}_{A}^{{{{{{\rm{eff}}}}}}}$. Again, there is no a priori reason to believe that this non-equilibrium system will equilibrate within 10 min, but as long as we follow the experimental data-analysis protocol we can use ${K}_{{{{{{\rm{A}}}}}}}^{{{{{{\rm{eff}}}}}}}$ for training and validation.
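The fitting step can be illustrated with a short Python sketch. The bound fractions would normally come from the Master Equation solution after 10 min; here they are synthetic values generated from the equilibrium expression itself, so that the recovered constant can be checked. The function name and the grid-search least-squares fit are assumptions of this example, since the fitting routine is not specified in the text.

```python
import numpy as np

CONCENTRATIONS_NM = np.array([0.1, 0.3, 1, 3, 10, 30, 100, 300], dtype=float)

def fit_association_constant(conc, bound):
    """Fit K_A (in 1/nM) in bound = c / (c + 1/K_A) by least squares on a grid."""
    K_grid = np.logspace(-4, 4, 8001)
    pred = conc[None, :] / (conc[None, :] + 1.0 / K_grid[:, None])
    sse = ((pred - bound[None, :]) ** 2).sum(axis=1)
    return K_grid[np.argmin(sse)]

# Synthetic bound fractions for a hypothetical K_A of 0.2 per nM
K_true = 0.2
bound = CONCENTRATIONS_NM / (CONCENTRATIONS_NM + 1.0 / K_true)
K_fit = fit_association_constant(CONCENTRATIONS_NM, bound)
```

When the system has not equilibrated within 10 min, the same fit still returns a number, which is why the text treats it as an effective constant.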

Predicting HiTS-FLIP association rates

To predict measured association rates in the HiTS-FLIP experiment¹¹, we assume the recorded fluorescence signal to be proportional to our calculated bound fraction of dCas9-sgRNA, when starting with the probabilities ${P}_{-1}(0)=1$, ${P}_{0\le n\le 20}(0)=0$. Following the experiments, we use linear regression to extract an effective association rate by fitting a straight line to the bound fraction at time points 500 s, 1000 s and 1500 s.

Predicting HiTS-FLIP dissociation rates

To predict measured dissociation rates in the HiTS-FLIP experiment¹¹, we again compare the fluorescence signal to our calculated bound fraction of dCas9, starting with the probabilities ${P}_{-1}(0)=1$, ${P}_{0\le n\le 20}(0)=0$. We let the protein associate at saturating concentrations for 12 h, and record the resulting occupational probabilities. We then use these probabilities as new initial probabilities, while also letting ${k}_{{{{{{\rm{on}}}}}}}=0$ ($[{{{{{\rm{Cas}}}}}}9-{{{{{\rm{sgRNA}}}}}}]=0$) in ${{{{{\bf{K}}}}}}$, before further evolving the system. This allows us to model complex dissociation in the presence of a high concentration of competitor on-targets in solution. Following the experiments, we fit an exponential decay to our predictions at time points 500 s, 1000 s, and 1500 s.

Averaging over mismatch types

Our model does not account for mismatch types, and for training we need to average over all experimentally measured mismatch sequences $s$ consistent with a mismatch pattern $p$. We expect rates to be proportional to exponentiated transition-state free energies, and association constants to be controlled by exponentiated binding free energies. We therefore choose to perform our mismatch-type averages over the logarithm of rates and association constants, bringing these averages close to averages of energies. For measured quantities $m={k}_{{{{{{\rm{clv}}}}}}}^{{{{{{\rm{eff}}}}}}}$ or ${K}_{{{{{{\rm{A}}}}}}}^{{{{{{\rm{ref}}}}}}}$, we choose a weighted mismatch-type average

$${\langle {\log }_{10}{m}^{\ast }\rangle }_{p}=\mathop{\sum\limits}_{s\in \left(\begin{array}{c}{{{{{\rm{sequences}}}}}}\,{{{{{\rm{with}}}}}}\\ {{{{{\rm{mm}}}}}}\,{{{{{\rm{pattern}}}}}}\,p\end{array}\right)}{W}_{s}{\log }_{10}{m}_{s}^{\ast }.$$

Here ${m}_{s}^{\ast }$ is the measured value for target sequence $s$. We take the weights to be given by

$${W}_{s}=\frac{1/\delta {({\log }_{10}{m}_{s}^{\ast })}^{2}}{{\sum }_{\sigma \in \left(\begin{array}{c}{{{{{\rm{sequences}}}}}}{{{{{\rm{with}}}}}}\\ {{{{{\rm{mm}}}}}}{{{{{\rm{pattern}}}}}}p\end{array}\right)}1/\delta {({\log }_{10}{m}_{\sigma }^{\ast })}^{2}}.$$

Here $\delta ({\log }_{10}{m}_{s}^{\ast })$ is the experimental error for the logarithm of the measurement at a particular sequence $s$. This choice of weights minimizes the error-normalized square deviation on the sequence-resolved data, if we have complete freedom to set the average for each mismatch pattern. Our model is more constrained than this, but with this weighting our model could, at least in principle, give the best possible approximation of the sequence-resolved data. The squared error in the mismatch-type average can be calculated as

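$$\delta {\langle {\log }_{10}{m}^{\ast }\rangle }_{p}^{2}=\mathop{\sum\limits}_{s\in \left(\begin{array}{c}{{{{{\rm{sequences}}}}}}\,{{{{{\rm{with}}}}}}\\ {{{{{\rm{mm}}}}}}\,{{{{{\rm{pattern}}}}}}\,p\end{array}\right)}{W}_{s}^{2}\,\delta {({\log }_{10}{m}_{s}^{\ast })}^{2}=\frac{1}{{\sum }_{s\in \left(\begin{array}{c}{{{{{\rm{sequences}}}}}}\,{{{{{\rm{with}}}}}}\\ {{{{{\rm{mm}}}}}}\,{{{{{\rm{pattern}}}}}}\,p\end{array}\right)}1/\delta {({\log }_{10}{m}_{s}^{\ast })}^{2}}.$$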
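A minimal Python sketch of this mismatch-type average is given below. It uses the inverse-variance weights defined above and the standard result that, for independent errors, the squared error of such a weighted mean is the inverse of the summed inverse variances. The function name and the input values are hypothetical.

```python
import math

def mismatch_type_average(log10_values, log10_errors):
    """Inverse-variance weighted mean of log10 measurements and its error.

    log10_values: log10 of the measured quantity for each sequence s
    log10_errors: experimental error of each log10 value
    """
    inv_var = [1.0 / e ** 2 for e in log10_errors]
    total = sum(inv_var)
    weights = [w / total for w in inv_var]            # W_s, summing to one
    mean = sum(w * x for w, x in zip(weights, log10_values))
    error = math.sqrt(1.0 / total)                    # error of the weighted mean
    return mean, error

# Two hypothetical sequences sharing one mismatch pattern
mean, error = mismatch_type_average([-2.0, -3.0], [0.1, 0.2])
```

The more precisely measured sequence dominates the average, so the result sits closer to -2 than to -3.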

Cost function

We look to simultaneously optimize our predictions of both effective cleavage rates from NucleaSeq (${k}_{{{{{{\rm{clv}}}}}}}^{{{{{{\rm{eff}}}}}}}$) and effective association constants from CHAMP (${K}_{{{{{{\rm{A}}}}}}}^{{{{{{\rm{ref}}}}}}}$). We combine the cost from each experiment

$${\chi }^{2}={\chi }_{{k}_{{{{{{\rm{clv}}}}}}}^{{{{{{\rm{eff}}}}}}}}^{2}+{\chi }_{{K}_{{{{{{\rm{A}}}}}}}^{{{{{{\rm{ref}}}}}}}}^{2}$$

by summing log deviations

$${\chi }_{m}^{2}=\mathop{\sum\limits}_{p\in \left(\begin{array}{c}{{{{{\rm{all}}}}}}\,{{{{{\rm{mm}}}}}}\,{{{{{\rm{patterns}}}}}}\\ {{{{{\rm{used}}}}}}\,{{{{{\rm{for}}}}}}\,{{{{{\rm{training}}}}}}\end{array}\right)}{w}_{p}^{m}{({\log }_{10}({m}_{p})-{\langle {\log }_{10}{m}^{\ast }\rangle }_{p})}^{2}.$$

In the above, ${m}_{p}$ represents the model prediction for the average measured quantity at mismatch pattern $p$. The weights ${w}_{p}^{m}$ are chosen so that the error-weighted contributions from the on-target, the $20$ singly mismatched off-targets, and the $20\cdot 19/2=190$ doubly mismatched off-targets are weighted equally as groups

$${w}_{p}^{m}=\frac{1}{\delta {\langle {\log }_{10}{m}^{\ast }\rangle }_{p}^{2}}\cdot \left\{\begin{array}{cc}1, & p={{{{{\rm{on}}}}}}\,{{{{{\rm{target}}}}}}\\ 1/20, & p\in {{{{{\rm{single}}}}}}\,{{{{{\rm{mm}}}}}}\\ 1/190, & p\in {{{{{\rm{double}}}}}}\,{{{{{\rm{mm}}}}}}.\end{array}\right.$$
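The cost function can be written compactly in Python, as in the sketch below. The function names, the group labels, and the example numbers are hypothetical; each experiment is represented by parallel lists of model predictions, mismatch-averaged data, errors, and group labels.

```python
GROUP_WEIGHT = {"on": 1.0, "single": 1.0 / 20, "double": 1.0 / 190}

def chi2_single_experiment(model_log10, data_log10, data_err, groups):
    """Group-weighted, error-normalized sum of squared log10 deviations."""
    return sum(
        GROUP_WEIGHT[g] * ((m - d) / e) ** 2
        for m, d, e, g in zip(model_log10, data_log10, data_err, groups)
    )

def chi2_total(cleavage, binding):
    """Sum of the NucleaSeq and CHAMP costs; each argument is a 4-tuple of lists."""
    return chi2_single_experiment(*cleavage) + chi2_single_experiment(*binding)

# Toy data: one on-target and one singly mismatched pattern per experiment
cleavage = ([0.0, 1.0], [0.1, 1.2], [0.1, 0.1], ["on", "single"])
binding = ([2.0, 1.0], [2.0, 1.1], [0.1, 0.1], ["on", "single"])
cost = chi2_total(cleavage, binding)
```

With these factors, the 20 singly mismatched patterns together carry the same nominal weight as the single on-target, as do the 190 doubly mismatched patterns.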

Simulated annealing

The Simulated Annealing algorithm⁵⁹ is commonly used for high-dimensional optimization problems. We optimize with respect to model parameters ${F}_{0}^{{{{{{\rm{ref}}}}}}}$, ${{\epsilon }}_{1},\ldots ,{{\epsilon }}_{20}$, $\delta {{\epsilon }}_{1},\ldots ,\delta {{\epsilon }}_{20}$, ${\log }_{10}({k}_{{{{{{\rm{on}}}}}}}^{{{{{{\rm{ref}}}}}}}/{{{{{\rm{s}}}}}})$, ${\log }_{10}({k}_{{{{{{\rm{f}}}}}}}/{{{{{\rm{s}}}}}})$, and ${\log }_{10}({k}_{{{{{{\rm{cat}}}}}}}/{{{{{\rm{s}}}}}})$. Trial moves are generated by adding uniform noise of magnitude $\alpha$ to the present value of each model parameter. The process is initiated with a noise strength $\alpha =0.1$. In the initiation cycle, the temperature is adjusted until we have an acceptance fraction of 40–60% over 1000 trial moves, based on the Metropolis condition. After this initial cycle, the temperatures follow an exponential cooling scheme with a 1% cooling rate (${T}_{k+1}=0.99{T}_{k}$). At every temperature, we adjust the noise strength $\alpha$ until an acceptance fraction of 40–60% is reached over 1000 trial moves. Once the desired acceptance fraction is reached, an additional 1000 trial moves are performed to allow the system to relax before the next cooling step. Once the temperature has dropped to one percent of its initial value, we apply the stop condition

$$|{\bar{\chi }}_{k}^{2}-{\bar{\chi }}_{k-1}^{2}|\le {10}^{-5}{\bar{\chi }}_{k-1}^{2}.$$

In the above, ${\bar{\chi }}_{k}^{2}$ denotes our cost function averaged over the last 1000 trial moves performed at temperature ${T}_{k}$. The result of this optimization is shown in Fig. 4.
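The annealing schedule described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the optimization code used for the paper: the function names, the starting temperature, the factors by which the temperature and noise strength are adjusted, the caps on the number of adjustment rounds and cooling steps, and the toy cost function are all hypothetical.

```python
import math
import random

def run_block(cost, x, fx, T, alpha, n_trials, rng):
    """n_trials Metropolis moves; returns state, cost, acceptance, mean cost."""
    accepted, total = 0, 0.0
    for _ in range(n_trials):
        y = [xi + rng.uniform(-alpha, alpha) for xi in x]
        fy = cost(y)
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
            accepted += 1
        total += fx
    return x, fx, accepted / n_trials, total / n_trials

def anneal(cost, x0, alpha=0.1, n_trials=1000, cooling=0.99, tol=1e-5,
           max_tune=50, max_steps=2000, seed=0):
    rng = random.Random(seed)
    x, fx, T = list(x0), cost(x0), 1.0
    # Initiation cycle: tune T at fixed alpha until 40-60% of moves are accepted
    for _ in range(max_tune):
        x, fx, acc, _ = run_block(cost, x, fx, T, alpha, n_trials, rng)
        if 0.4 <= acc <= 0.6:
            break
        T *= 0.5 if acc > 0.6 else 2.0
    T0, prev_mean = T, None
    for _ in range(max_steps):
        T *= cooling                       # exponential cooling, 1% per step
        for _ in range(max_tune):          # tune alpha at the new temperature
            x, fx, acc, _ = run_block(cost, x, fx, T, alpha, n_trials, rng)
            if 0.4 <= acc <= 0.6:
                break
            alpha *= 0.8 if acc < 0.4 else 1.25
        # Relaxation block; its mean cost enters the stop condition
        x, fx, _, mean = run_block(cost, x, fx, T, alpha, n_trials, rng)
        if (T <= 0.01 * T0 and prev_mean is not None
                and abs(mean - prev_mean) <= tol * abs(prev_mean)):
            break
        prev_mean = mean
    return x, fx, T

# Toy cost with a known minimum of 1 at (1, -2); shorter blocks keep it fast
target = [1.0, -2.0]

def toy_cost(x):
    return 1.0 + sum((xi - ti) ** 2 for xi, ti in zip(x, target))

x_opt, f_opt, T_final = anneal(toy_cost, [0.0, 0.0], n_trials=200, seed=1)
```

Because the stop condition is relative, it is only met when the cost has a nonzero minimum, which is why the toy cost includes a constant offset.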

Calculating coarse-grained transition rates

First, we find the intermediate state on every possible target. As the central local minimum in free energy (Fig. 4a) can be slightly displaced by mismatches on off-targets, we seek the free-energy minimum ${n}_{{{{{{\rm{I}}}}}}}$ between R-loop states 7 and 13 for every target. To calculate the effective rates of the coarse-grained model in Fig. 5a, we consider the first passage between metastable states. Take for example the passage from the PAM-bound state ($n=0$) to the intermediate state ($n={n}_{{{{{{\rm{I}}}}}}}$) on a specific target. To calculate the associated first-passage time, we truncate the full system to only include states $n=0,\ldots ,{n}_{{{{{{\rm{I}}}}}}}-1$. We use the rate matrix ${{{{{{\bf{K}}}}}}}_{{{{{{\rm{PI}}}}}}}$ with elements

$${({{{{{{\bf{K}}}}}}}_{{{{{{\rm{PI}}}}}}})}_{nm}={{{{{{\bf{K}}}}}}}_{nm},\,0\le n,m\le {n}_{{{{{{\rm{I}}}}}}}-1$$

and ${k}_{0}^{{{{{{\rm{b}}}}}}}=0$. With the initial state ${{{{{{\bf{P}}}}}}}_{{{{{{\rm{PI}}}}}}}(0)={(1,0,\ldots ,0)}^{T}$ we solve the Master Equation, and calculate the first-passage time distribution as

$${\Psi }_{{{{{{\rm{PI}}}}}}}(t)=-(1,\ldots ,1)\cdot {\partial }_{t}{{{{{{\bf{P}}}}}}}_{{{{{{\rm{PI}}}}}}}(t).$$

The effective transition rate ${k}_{{{{{{\rm{PI}}}}}}}$ is the inverse of the average first-passage time ${\tau }_{{{{{{\rm{PI}}}}}}}$, which can be calculated as

$${\tau }_{{{{{{\rm{PI}}}}}}}={\int }_{0}^{\infty }{{{{{\rm{d}}}}}}t\,t{\Psi }_{{{{{{\rm{PI}}}}}}}(t)=(1,\ldots ,1)\cdot {{{{{{\bf{K}}}}}}}_{{{{{{\rm{PI}}}}}}}^{-1}\cdot {{{{{{\bf{P}}}}}}}_{{{{{{\rm{PI}}}}}}}(0).$$

The same process was used to calculate all other rates of directly transitioning between metastable states, repeated over every target sequence.
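The first-passage calculation can be illustrated with the NumPy sketch below. It assumes the sign convention $\partial_t \mathbf{P} = -\mathbf{K}\cdot\mathbf{P}$, under which the matrix-inverse expression above gives a positive time; the function name and the rate values are hypothetical.

```python
import numpy as np

def mean_first_passage_time(k_f, k_b, n_I):
    """Mean time to first reach R-loop state n_I, starting from n = 0.

    k_b[n] is the backward rate out of state n; unbinding from the PAM-bound
    state is switched off, so k_b[0] is ignored.
    """
    K = np.zeros((n_I, n_I))
    for n in range(n_I):
        K[n, n] = k_f + (k_b[n] if n > 0 else 0.0)   # total rate out of state n
        if n + 1 < n_I:
            K[n + 1, n] = -k_f                       # n -> n + 1
            K[n, n + 1] = -k_b[n + 1]                # n + 1 -> n
    P0 = np.zeros(n_I)
    P0[0] = 1.0
    # (1, ..., 1) . K^{-1} . P(0), evaluated with a linear solve
    return np.linalg.solve(K, P0).sum()

# Checks against cases with known answers
tau_forward = mean_first_passage_time(100.0, np.zeros(21), n_I=10)   # 10 steps at 100 per s
tau_two_step = mean_first_passage_time(1.0, np.ones(21), n_I=2)
k_PI = 1.0 / tau_forward                                             # effective rate
```

Using a linear solve instead of an explicit inverse gives the same result and is the usual numerically safer choice.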

Constructing a binary off-target predictor

We rank all canonical PAM sites in the human genome according to their relative cleavage rate in the low concentration limit. In this limit, the cleavage rate is given by the PAM binding rate times the probability to cleave once the PAM site is bound. As the PAM binding rate is not expected to depend on the sgRNA sequence $s$, we can rank our off-targets based on the cleavage probability once bound³⁰,

$${P}_{{{{{{\rm{PAM}}}}}}\to {{{{{\rm{clv}}}}}}}(s)=\frac{{k}_{{{{{{\rm{cat}}}}}}}\,{e}^{\frac{{F}_{-1}(p(s))}{{k}_{{{{{{\rm{B}}}}}}}T}}}{{k}_{{{{{{\rm{cat}}}}}}}{\sum }_{n=0}^{19}{e}^{\frac{{F}_{n}(p(s))}{{k}_{{{{{{\rm{B}}}}}}}T}}+{k}_{{{{{{\rm{f}}}}}}}{e}^{\frac{{F}_{20}(p(s))}{{k}_{{{{{{\rm{B}}}}}}}T}}}.$$

Here $p(s)$ is the mismatch pattern of sequence $s$.

Statistics & Reproducibility

Only experimental data giving physical positive values for mismatch-averaged rates and association constants were included in the correlation analysis. See Supplementary Data 1.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data supporting the findings of this study are available from the corresponding authors upon reasonable request. Mismatch-averaged experimental data used for training and validation (Figs. 2 and 3), estimated microscopic parameters (Fig. 4), and genome-wide off-target classification evaluation (Fig. 7b–e), are all provided as Supplementary Data 1.

Code availability

The code enabling quantitative off-target activity prediction for any guide-target pair is available on our GitLab page (https://gitlab.tudelft.nl/depken_group/SpCas9_kinetic_model_dashboard). There you will also find a small dashboard application, allowing time-resolved activity predictions given a particular sequence and enzyme concentration. A clone of the repository at publication is also permanently available at https://doi.org/10.5281/zenodo.5790798. The purpose-made optimization code will be made available upon request.

References

1. Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat. Biotechnol. 32, 347–350 (2014).
2. Wang, H., La Russa, M. & Qi, L. S. CRISPR/Cas9 in Genome Editing and Beyond. Annu. Rev. Biochem. 85, 227–264 (2016).
3. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
4. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442 (2013).
5. Jost, M. et al. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol. 38, 355–364 (2020).
6. Niu, D. et al. Inactivation of porcine endogenous retrovirus in pigs using CRISPR-Cas9. Science 357, 1303–1307 (2017).
7. Hammond, A. et al. A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat. Biotechnol. 34, 78–83 (2016).
8. Amoasii, L. et al. Gene editing restores dystrophin expression in a canine model of Duchenne muscular dystrophy. Science 362, 1–6 (2018).
9. Park, C. Y. et al. Functional Correction of Large Factor VIII Gene Chromosomal Inversions in Hemophilia A Patient-Derived iPSCs Using CRISPR-Cas9. Cell Stem Cell 17, 213–220 (2015).
10. Jinek, M. et al. A Programmable Dual-RNA – Guided. Science 337, 816–822 (2012).
11. Boyle, E. A. et al. High-throughput biochemical profiling reveals sequence determinants of dCas9 off-target binding and unbinding. Proc. Natl Acad. Sci. 114, 5461–5466 (2017).
12. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
13. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822–826 (2013).
14. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
15. Jones, S. K. Jr et al. Massively parallel kinetic profiling of natural and engineered CRISPR nucleases. Nat. Biotechnol. 39, 84–93 (2021).
16. Kim, D., Luk, K., Wolfe, S. A. & Kim, J.-S. Evaluating and Enhancing Target Specificity of Gene-Editing Nucleases and Deaminases. Annu. Rev. Biochem. 88, 191–220 (2019).
17. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol. 31, 839–843 (2013).
18. Tsai, S. Q. & Joung, J. K. Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17, 300–312 (2016).
19. Cullot, G. et al. CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations. Nat. Commun. 10, 1–14 (2019).
20. Labun, K., Montague, T. G., Gagnon, J. A., Thyme, S. B. & Valen, E. CHOPCHOP v2: a web tool for the next generation of CRISPR genome engineering. Nucleic Acids Res. 44, W272–W276 (2016).
21. Heigwer, F., Kerr, G. & Boutros, M. E-CRISP: Fast CRISPR target site identification. Nat. Methods 11, 122–123 (2014).
22. Listgarten, J. et al. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nat. Biomed. Eng. 2, 38–47 (2018).
23. Chuai, G. et al. DeepCRISPR: Optimized CRISPR guide RNA design by deep learning. Genome Biol. 19, 1–18 (2018).
24. Zhang, D., Hurst, T., Duan, D. & Chen, S.-J. Unified energetics analysis unravels SpCas9 cleavage activity for optimal gRNA design. Proc. Natl Acad. Sci. 116, 8693–8698 (2019).
25. Stemmer, M., Thumberger, T., Del Sol Keyer, M., Wittbrodt, J. & Mateo, J. L. CCTop: An intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS One 10, 1–11 (2015).
26. Tycko, J., Myer, V. E. & Hsu, P. D. Methods for Optimizing CRISPR-Cas9 Genome Editing Specificity. Mol. Cell 63, 355–370 (2016).
27. Farasat, I. & Salis, H. M. A Biophysical Model of CRISPR/Cas9 Activity for Rational Design of Genome Editing and Gene Regulation. PLoS Comput. Biol. 12, 1–33 (2016).
28. Alkan, F., Wenzel, A., Anthon, C., Havgaard, J. H. & Gorodkin, J. CRISPR-Cas9 off-targeting assessment with nucleic acid duplex energy parameters. Genome Biol. 19, 177 (2018).
29. Bisaria, N., Jarmoskaite, I. & Herschlag, D. Lessons from Enzyme Kinetics Reveal Specificity Principles for RNA-Guided Nucleases in RNA Interference and CRISPR-Based Genome Editing. Cell Syst. 4, 21–29 (2017).
30. Klein, M., Eslami-Mossallam, B., Arroyo, D. G. & Depken, M. Hybridization Kinetics Explains CRISPR-Cas Off-Targeting Rules. Cell Rep. 22, 1413–1423 (2018).
31. Jung, C. et al. Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips. Cell 170, 35–47.e13 (2017).
32. O’Geen, H., Henry, I. M., Bhakta, M. S., Meckler, J. F. & Segal, D. J. A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic Acids Res. 43, 3389–3404 (2015).
33. Kuscu, C., Arslan, S., Singh, R., Thorpe, J. & Adli, M. Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nat. Biotechnol. 32, 677–683 (2014).
34. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat. Biotechnol. 32, 670–676 (2014).
35. Cameron, P. et al. Mapping the genomic landscape of CRISPR-Cas9 cleavage. Nat. Methods 14, 600–606 (2017).
36. Tsai, S. Q. et al. CIRCLE-seq: A highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607–614 (2017).
37. Kim, D. et al. Digenome-seq: Genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat. Methods 12, 237–243 (2015).
38. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–198 (2015).
39. Frock, R. L. et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 33, 179–188 (2015).
40. Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 1–9 (2017).
41. Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).
42. Ivanov, I. E. et al. Cas9 interrogates DNA in discrete steps modulated by mismatches and supercoiling. Proc. Natl Acad. Sci. U. S. A. 117, 5853–5860 (2020).
43. Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 17, 1–12 (2016).
44. Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573 (2014).
45. Jiang, F., Zhou, K., Gressel, S. & Doudna, J. A. A cas9 guide RNA complex preorganized for target DNA recognition. Science 348, 1477–1482 (2015).
46. Josephs, E. A. et al. Structure and specificity of the RNA-guided endonuclease Cas9 during DNA interrogation, target binding and cleavage. Nucleic Acids Res. 43, 8924–8941 (2015).
47. Rutkauskas, M. et al. Directional R-loop formation by the CRISPR-cas surveillance complex cascade provides efficient off-target site rejection. Cell Rep. 10, 1534–1543 (2015).
48. Szczelkun, M. D. et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl Acad. Sci. 111, 9798–9803 (2014).
49. Xiao, Y. et al. Structure Basis for Directional R-loop Formation and Substrate Handover Mechanisms in Type I CRISPR-Cas System. Cell 170, 48–60.e11 (2017).
50. Sternberg, S. H., Lafrance, B., Kaplan, M. & Doudna, J. A. Conformational control of DNA target cleavage by CRISPR-Cas9. Nature 527, 110–113 (2015).
51. Dagdas, Y. S., Chen, J. S., Sternberg, S. H., Doudna, J. A. & Yildiz, A. A conformational checkpoint between DNA binding and cleavage by CRISPR-Cas9. Sci. Adv. 3, 1–9 (2017).
52. Sung, K., Park, J., Kim, Y., Lee, N. K. & Kim, S. K. Target Specificity of Cas9 Nuclease via DNA Rearrangement Regulated by the REC2 Domain. J. Am. Chem. Soc. 140, 7778–7781 (2018).
53. Yang, M. et al. The Conformational Dynamics of Cas9 Governing DNA Cleavage Are Revealed by Single-Molecule FRET. Cell Rep. 22, 372–382 (2018).
54. Chen, J. S. et al. Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature 550, 407–410 (2017).
55. Irmisch, P., Ouldridge, T. E. & Seidel, R. Modeling DNA-Strand Displacement Reactions in the Presence of Base-Pair Mismatches. J. Am. Chem. Soc. 142, 11451–11463 (2020).
56. Srinivas, N. et al. On the biophysics and kinetics of toehold-mediated DNA strand displacement. Nucleic Acids Res. 41, 10641–10658 (2013).
57. Šulc, P., Ouldridge, T. E., Romano, F., Doye, J. P. K. & Louis, A. A. Modelling toehold-mediated RNA strand displacement. Biophys. J. 108, 1238–1247 (2015).
58. Broadwater, D. W. B. & Kim, H. D. The Effect of Basepair Mismatch on DNA Strand Displacement. Biophys. J. 110, 1476–1484 (2016).
59. Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Jr. Optimization by simulated annealing. Science 220, 671–680 (1983).
60. Jiang, F. et al. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science 351, 867–871 (2016).
61. Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997 (2014).
62. Kim, D., Kim, S., Kim, S., Park, J. & Kim, J. S. Genome-wide target specificities of CRISPR-Cas9 nucleases revealed by multiplex Digenome-seq. Genome Res. 26, 406–415 (2016).
63. Dahlman, J. E. et al. Orthogonal gene knockout and activation with a catalytically active Cas9 nuclease. Nat. Biotechnol. 33, 1159–1161 (2015).
64. Chen, J. S. et al. CRISPR-Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science 360, 436–439 (2018).
65. Gootenberg, J. S. et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science 356, 438–442 (2017).
66. Gootenberg, J. S. et al. Multiplexed and portable nucleic acid detection platform with Cas13, Cas12a, and Csm6. Science 444, 439–444 (2018).
67. Harrington, L. B. et al. Programmed DNA destruction by miniature CRISPR-Cas14 enzymes. Science 362, 839–842 (2018).
68. Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nat. Biotechnol. 34, 863–868 (2016).
69. Kleinstiver, B. P. et al. Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells. Nat. Biotechnol. 34, 869–874 (2016).
70. Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
71. Amrani, N. et al. NmeCas9 is an intrinsically high-fidelity genome-editing platform Jin-Soo Kim. Genome Biol. 19, 1–25 (2018).

Acknowledgements

We would like to thank Kristian Blom, Diewertje Dekker, and Sonny de Jong for valuable discussions and/or their help during the project. We also thank the members of the Chirlmin Joo lab and Stan Brouns lab for valuable discussions. We thank Evan Boyle for sharing data and answering all our questions. This work was supported by: Netherlands Organization for Scientific Research (NWO) (FOM-140), B.E.M.; Zwaartekracht NanoFront, NWO M.K.; Parents in KIND program, The Kavli Institute of Nanoscience Delft/the Department of Bionanoscience at TU Delft/through a Spinoza Prize awarded to M. Dogterom, M.D.; University of Texas College of Natural Sciences Catalyst award and the Welch Foundation (F-1808) I.J.F.; U.S. National Institute of Health (R01GM124141, F32AG053051) I.J.F. and S.K.J.

Author information

Behrouz Eslami-Mossallam
Present address: Dept. Building Physics and Systems, TNO Building and Construction Research, Leeghwaterstraat 44, Delft, The Netherlands
Misha Klein & Constantijn V. D. Smagt
Present address: Department of Physics and Astronomy, and LaserLaB Amsterdam, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, the Netherlands
Stephen K. Jones Jr.
Present address: VU LSC-EMBL Partnership for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius, Lithuania
John A. Hawkins
Present address: European Molecular Biology Laboratory, Genome Biology Department, Heidelberg, Germany
These authors contributed equally: Behrouz Eslami-Mossallam, Misha Klein.

Authors and Affiliations

Kavli Institute of NanoScience and Department of BionanoScience, Delft University of Technology, Delft, 2629HZ, the Netherlands
Behrouz Eslami-Mossallam, Misha Klein, Constantijn V. D. Smagt, Koen V. D. Sanden & Martin Depken
Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
Stephen K. Jones Jr., John A. Hawkins & Ilya J. Finkelstein
Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, 78712, USA
Stephen K. Jones Jr., John A. Hawkins & Ilya J. Finkelstein
Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, 78712, USA
Stephen K. Jones Jr., John A. Hawkins & Ilya J. Finkelstein
Oden Institute for Computational Engineering and Science, University of Texas at Austin, Austin, TX, 78712, USA
John A. Hawkins

Contributions

B.E.M. and M.K.: Designed and performed the research, and wrote the manuscript. K.v.d.S. and C.v.d.S.: Performed the research. S.K.J.: Provided data, and wrote the manuscript. J.H.: Provided data, and wrote the manuscript. I.J.F.: Provided data, and wrote the manuscript. M.D.: Conceived of the project, designed the research, and wrote the manuscript.

Corresponding author

Correspondence to Martin Depken.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Peter von Hippel and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Movie 1

Supplementary Data 1

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Eslami-Mossallam, B., Klein, M., Smagt, C.V.D. et al. A kinetic model predicts SpCas9 activity, improves off-target classification, and reveals the physical basis of targeting fidelity. Nat Commun 13, 1367 (2022). https://doi.org/10.1038/s41467-022-28994-2

Received: 11 June 2020
Accepted: 11 February 2022
Published: 15 March 2022
DOI: https://doi.org/10.1038/s41467-022-28994-2
