Enhanced proofreading governs CRISPR–Cas9 targeting accuracy

Abstract

The RNA-guided CRISPR–Cas9 nuclease from Streptococcus pyogenes (SpCas9) has been widely repurposed for genome editing1,2,3,4. High-fidelity (SpCas9-HF1) and enhanced specificity (eSpCas9(1.1)) variants exhibit substantially reduced off-target cleavage in human cells, but the mechanism of target discrimination and the potential to further improve fidelity are unknown5,6,7,8,9. Here, using single-molecule Förster resonance energy transfer experiments, we show that both SpCas9-HF1 and eSpCas9(1.1) are trapped in an inactive state10 when bound to mismatched targets. We find that a non-catalytic domain within Cas9, REC3, recognizes target complementarity and governs the HNH nuclease to regulate overall catalytic competence. Exploiting this observation, we design a new hyper-accurate Cas9 variant (HypaCas9) that demonstrates high genome-wide specificity without compromising on-target activity in human cells. These results offer a more comprehensive model to rationalize and modify the balance between target recognition and nuclease activation for precision genome editing.

Main

Efforts to minimize off-target cleavage by CRISPR–Cas9 have motivated the development of SpCas9-HF1 and eSpCas9(1.1) variants that contain amino-acid substitutions predicted to weaken the energetics of target site recognition and cleavage8,9 (Fig. 1a). Biochemically, we found that these Cas9 variants cleaved the on-target DNA with rates similar to that of wild-type (WT) SpCas9, whereas their cleavage activity was significantly reduced on substrates bearing mismatches (Extended Data Figs 1a and 2a). To test the hypothesis that SpCas9 with its single-guide RNA (sgRNA) might exhibit a greater affinity for its target than is required for effective recognition9,11, we measured DNA binding affinity and cleavage of SpCas9-HF1 and eSpCas9(1.1) variants. Contrary to a potential hypothesis that mutating these charged residues to alanine weakens target binding11, the affinities of these variants for on-target and PAM-distal mismatched substrates were similar to WT SpCas9 (Fig. 1b and Extended Data Figs 1a and 2b), indicating that cleavage specificity is improved through a mechanism distinct from a reduction of target binding affinity11.

Figure 1: High-fidelity Cas9 variants enhance cleavage specificity through HNH conformational control.

a, Locations of amino-acid alterations in existing high-fidelity SpCas9 variants mapped onto the dsDNA-bound SpCas9 crystal structure (Protein Data Bank (PDB) accession number 5F9R); HNH domain is omitted for clarity. b, Dissociation constants with mean and s.d. are shown; n = 3 independent experiments (overlaid as white circles). c, Cartoon of DNA-immobilized SpCas9 for measuring HNH conformation by smFRET, with DNA target numbering scheme. d–f, smFRET histograms showing HNH conformation with indicated Cas9 variants bound to on-target and mismatched targets using nucleotide numbers diagrammed in c. Black curves represent a fit to multiple Gaussian peaks.

The HNH nuclease domain of SpCas9 undergoes a substantial conformational rearrangement upon target binding12,13,14,15, which activates the RuvC nuclease for concerted cleavage of both strands of the DNA12,16. It was previously shown that the HNH domain stably docks in its active state with an on-target substrate, but becomes loosely trapped in a catalytically inactive conformational checkpoint when bound to mismatched targets10,12. We therefore hypothesized that SpCas9-HF1 and eSpCas9(1.1) variants may use a more sensitive threshold for HNH domain activation to promote off-target discrimination. To test this possibility, we labelled catalytically active WT SpCas9 (SpCas9HNH), SpCas9-HF1 (SpCas9-HF1HNH) and eSpCas9(1.1) (eSpCas9(1.1)HNH) with Cy3/Cy5 Förster resonance energy transfer (FRET) pairs at positions S355C (within the ‘stationary’ REC1 domain) and S867C (within the ‘mobile’ HNH domain) to measure HNH conformational states upon dsDNA binding12 (Fig. 1c–f and Extended Data Fig. 1c–e). Whereas SpCas9HNH stably populated the active state with on-target and mismatched substrates, as observed by steady-state single-molecule FRET (smFRET) (Fig. 1d), only ~32% of SpCas9-HF1HNH molecules occupied the HNH active state (EFRET = 0.97) with an on-target substrate, with the remaining ~68% trapped in the inactive intermediate state (EFRET = 0.45) (Fig. 1e). Of the dynamic molecules (~36% of all smFRET traces) observed for SpCas9-HF1HNH, kinetics analysis further revealed that the HNH transition rate from the inactive state to the active state was approximately eightfold slower than that of WT SpCas9HNH (~3% dynamic molecules10) (Extended Data Fig. 3). However, when SpCas9-HF1HNH was bound to a substrate with a single mismatch at the PAM-distal end (denoted 20–20 bp mm), stable docking of the HNH nuclease was entirely ablated (Fig. 1e). In addition, eSpCas9(1.1)HNH and other high-fidelity variants8,9 reduced the HNH active state in the presence of mismatches (Fig. 1f and Extended Data Fig. 2c, d). We therefore propose that high-fidelity variants of Cas9 reduce off-target cleavage by raising the threshold for HNH conformational activation when bound to DNA substrates.
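
To give a feel for what the two EFRET values mean in terms of dye separation, the following is a minimal sketch that inverts the standard Förster relation E = 1 / (1 + (r/R0)^6). It is not part of the analysis described here: the function name is hypothetical, the result is expressed relative to the Förster radius R0 (which is not stated in this text), and dye linkers and orientation effects are ignored.

```python
def distance_over_r0(e_fret: float) -> float:
    """Return r/R0 for a FRET efficiency, from E = 1 / (1 + (r/R0)**6).

    Hypothetical helper for illustration only; assumes an ideal
    donor-acceptor pair and ignores linker and orientation effects.
    """
    if not 0.0 < e_fret < 1.0:
        raise ValueError("FRET efficiency must lie strictly between 0 and 1")
    return (1.0 / e_fret - 1.0) ** (1.0 / 6.0)


# E_FRET values quoted in the text for the two HNH states.
for label, e in [("active", 0.97), ("inactive intermediate", 0.45)]:
    print(f"{label}: E = {e:.2f}, r/R0 = {distance_over_r0(e):.2f}")
```

Under these assumptions the active state corresponds to a dye separation of roughly 0.56 R0 and the inactive intermediate to roughly 1.03 R0. Because E changes very little with distance when it is close to 1, the estimate for the active state should be read as approximate.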

Since the HNH domain does not directly contact nucleic acids at the PAM-distal end13,17,18,19, it is likely that a separate domain of Cas9 senses target complementarity to govern HNH domain mobility. Structural studies suggested that a domain within the Cas9 recognition (REC) lobe, called REC3, interacts with the RNA–DNA heteroduplex and undergoes conformational changes upon target binding (Extended Data Fig. 2e, f)13,14,17,18,19. Because the function of this non-catalytic domain was previously unknown, we labelled SpCas9 with Cy3/Cy5 dyes at positions S701C (within the ‘mobile’ REC3 domain) and S960C (within the ‘stationary’ RuvC domain) to generate SpCas9REC3 and observed that the conformational states of REC3 become more heterogeneous as PAM-distal mismatches increase (Extended Data Fig. 4a–c). To determine whether PAM-distal sensing precedes HNH activation, we deleted REC3 from WT Cas9 (SpCas9∆REC3) (Fig. 2a). Deletion of REC3 decreased the cleavage rate by about 1,000-fold compared with WT Cas9, although the truncated protein retained near-WT affinity for the on-target substrate (Extended Data Fig. 4d, e). Unexpectedly, in vitro complementation of REC3 rescued the on-target cleavage rate by about 100-fold in a concentration-dependent manner (Fig. 2b and Extended Data Fig. 4d). Furthermore, the HNH domain in SpCas9∆REC3 (SpCas9∆REC3HNH) occupied the active state only when REC3 was supplemented in trans (Fig. 2c, d and Extended Data Fig. 4f). We therefore propose that REC3 acts as an allosteric effector that recognizes the RNA–DNA heteroduplex to allow for HNH nuclease activation.

Figure 2: The α-helical lobe regulates HNH domain activation.

a, Domain organization of SpCas9∆REC3. b, On-target DNA cleavage assay using SpCas9∆REC3 with increasing concentrations of the REC3 domain supplied in trans, resolved by denaturing polyacrylamide gel electrophoresis (PAGE); repeated three independent times with similar results. c, Schematic of SpCas9∆REC3HNH, with REC3 added in trans. Inactive to active structures represent HNH in the sgRNA-bound (PDB accession number 4ZT0) to dsDNA-bound (PDB accession number 5F9R) forms, respectively. d, smFRET histograms showing HNH conformation with SpCas9∆REC3HNH bound to an on-target substrate, with and without REC3. e, Schematic of SpCas9REC2; HNH domain is omitted for clarity. Inactive to active structures represent REC2 in the sgRNA- (PDB accession number 4ZT0) to dsDNA-bound (PDB accession number 5F9R) forms, respectively. f, g, smFRET histograms showing REC2 conformation with (f) WT SpCas9REC2 and (g) SpCas9-HF1REC2 bound to on-target and mismatched targets. For d, f and g, black curves represent a fit to multiple Gaussian peaks.

We next considered allosteric interactions that could couple the discontinuous REC3 and HNH domains. Structural studies suggested that REC2 occludes the HNH domain from the scissile phosphate in the sgRNA-bound state19, and undergoes a large outward rotation upon binding to double-stranded DNA (dsDNA)13,14 (Fig. 2e). To test whether the REC2 domain regulates access of HNH to the target strand scissile phosphate, we labelled SpCas9 with Cy3/Cy5 dyes at positions E60C (within the ‘stationary’ arginine-rich helix) and D273C (within the ‘mobile’ REC2 domain) to generate SpCas9REC2 to detect REC2 conformational changes (Extended Data Fig. 1b, c). We observed reciprocal changes in bulk FRET values ((ratio)A)20 between SpCas9HNH and SpCas9REC2 across multiple DNA substrates (Extended Data Fig. 4g), which suggests that the REC2 and HNH domains are tightly coupled to ensure catalytic competence. smFRET experiments further confirmed a large opening of REC2 during the transition from the sgRNA-bound state (EFRET = 0.96) to the target-bound state (EFRET = 0.43) (Fig. 2e, f). In contrast to WT SpCas9REC2, SpCas9-HF1REC2 occupies an intermediate state (EFRET = 0.63) when bound to a target with a single PAM-distal mismatch (Fig. 2g). Together with the observation that the HNH domain of SpCas9-HF1 does not occupy the active state with PAM-distal mismatches, these experiments suggest that REC2 sterically occludes HNH in the conformational checkpoint when SpCas9 is bound to off-target substrates.

Next, we investigated whether this conformational proofreading mechanism could be rationally exploited to design novel hyper-accurate Cas9 variants. We identified five clusters of residues containing conserved amino acids within 5 Å of the RNA–DNA interface, four of which are located within REC3 and one in the HNH-RuvC Linker 2 (L2) (Fig. 3a and Extended Data Fig. 5). Alone or in combination with Q926A, a substitution within L2 that confers higher specificity9, we generated alanine substitutions for each residue within the five different clusters of amino acids (clusters 1–5 ± Q926A) (Fig. 3a). We tested whether these cluster mutations affect cleavage accuracy and equilibrium binding in vitro, and found that cluster 1 alone and cluster 2 + Q926A suppressed off-target cleavage while retaining target binding affinities comparable to WT (Extended Data Fig. 6). We next screened all cluster variants in human cells using an enhanced GFP (eGFP) disruption assay5. On-target activity for cluster 1 was comparable to that of SpCas9-HF1 or eSpCas9(1.1), whereas cluster 2 variants displayed generally lower activity (Fig. 3b and Extended Data Fig. 7a). Furthermore, cluster 1 retained high on-target activity (>70% of WT) at 19 out of 24 endogenous gene sites tested, compared with 18 out of 24 for SpCas9-HF1 and 23 out of 24 for eSpCas9(1.1) (Fig. 3c and Extended Data Fig. 8a).
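
The ">70% of WT" criterion used above can be expressed as a short calculation. The sketch below is illustrative only: the function name and the example numbers are hypothetical and are not data from this study. It normalizes each site's activity to the wild-type value measured at the same site and counts the sites that exceed the threshold.

```python
def sites_above_threshold(variant, wild_type, threshold=0.70):
    """Count sites where variant activity exceeds `threshold` x WT activity.

    `variant` and `wild_type` are per-site activities (for example percent
    modification) listed in the same site order. Hypothetical helper.
    """
    if len(variant) != len(wild_type):
        raise ValueError("both lists must cover the same sites")
    count = 0
    for v, wt in zip(variant, wild_type):
        if wt <= 0:
            raise ValueError("WT activity must be positive to normalize")
        if v / wt > threshold:
            count += 1
    return count


# Made-up activities at four sites, for illustration only.
wt_activity = [50.0, 40.0, 60.0, 30.0]
variant_activity = [45.0, 20.0, 50.0, 29.0]
n_retained = sites_above_threshold(variant_activity, wt_activity)
print(f"{n_retained} of {len(wt_activity)} sites retain >70% of WT activity")
```

Normalizing per site rather than to a pooled WT average keeps site-to-site differences in baseline editing efficiency from dominating the comparison.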

Figure 3: Targeted mutagenesis within the REC3 domain reveals a SpCas9 variant with hyper-accurate behaviour in human cells.

a, Zoomed-in image of the REC3 domain and linker 2 (L2) with amino acids of cluster variants indicated (PDB accession number 5F9R). Boxed residues indicate amino acids also present in SpCas9-HF1. b, WT-normalized activity of Cas9 variants, using sgRNAs targeting 12 different sites within eGFP. c, WT-normalized endogenous gene disruption activity measured by T7 endonuclease 1 (T7E1) assay across 24 sites. For b and c, error bars represent median and interquartile ranges for n = 12 or 24 biologically independent samples, respectively; the interval with >70% of WT activity is highlighted in light grey. d, Activities of WT and high-fidelity Cas9 variants when programmed with singly mismatched sgRNAs against FANCF site 1. e, Activities of Cas9 variants when programmed with singly mismatched sgRNAs against FANCF site 4 and FANCF site 6. f, Histogram of the total number of GUIDE-seq detected off-target sites for Cas9 variants with six different sgRNAs.

We then focused on the specific contributions of mutations within cluster 1 by restoring each individual mutated residue to its WT identity, along with the Q926A mutation, and tested the resulting variants for on-target editing efficiency in human cells. On-target activity was significantly compromised when N692A/Q695A/Q926A mutations occurred together, but restoring either N692 (cluster 1 N692 + Q926A) or Q695 (cluster 1 Q695 + Q926A) alone led to robust on-target efficiency comparable to cluster 1, signifying differential contributions from these mutations to activity and specificity (Extended Data Figs 7b, c and 8a–c). Using sgRNAs with single mismatches against the endogenous human gene target FANCF site 1, we found that cluster 1 exhibited greater specificity than both SpCas9-HF1 and eSpCas9(1.1) in the middle and PAM-proximal regions of the spacer (Fig. 3d and Extended Data Fig. 8c). Additional single mismatch tolerance assays on FANCF sites 4 and 6 further corroborated the superior accuracy of cluster 1 (N692A/M694A/Q695A/H698A, hereafter referred to as HypaCas9) against mismatches at positions 1 through 18; however, single mismatches along FANCF site 2 were still tolerated across all SpCas9 variants tested (Fig. 3e and Extended Data Fig. 8d, e).
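
As an illustration of what a panel of singly mismatched sgRNAs looks like, the sketch below enumerates one substituted spacer per position. Several choices here are assumptions rather than details taken from this study: position 1 is taken to be the PAM-proximal (3') base of the spacer, each mismatch is made by swapping the base for its Watson–Crick complement, and the example spacer is an arbitrary placeholder, not one of the FANCF sites.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_mismatch_spacers(spacer):
    """Map PAM-relative position -> spacer carrying one substitution there.

    Assumptions (hypothetical, for illustration): the spacer is written
    5'->3' in DNA letters, position 1 is its last (PAM-proximal) base, and
    the substitution rule is a swap to the complementary base.
    """
    spacer = spacer.upper()
    n = len(spacer)
    variants = {}
    for position in range(1, n + 1):
        index = n - position  # position 1 -> last base of the 5'->3' string
        swapped = COMPLEMENT[spacer[index]]
        variants[position] = spacer[:index] + swapped + spacer[index + 1:]
    return variants


# Arbitrary 20-nt placeholder spacer, not a sequence from this study.
panel = single_mismatch_spacers("GACGTTACCGGATTCAAGCT")
for position in (1, 10, 20):
    print(position, panel[position])
```

A different substitution rule (for example transitions only) would change the panel but not the position bookkeeping.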

Next, we performed genome-wide, unbiased identification of double-stranded breaks enabled by sequencing (GUIDE-seq)6 to compare the genome-wide specificities of WT SpCas9, SpCas9-HF1, eSpCas9(1.1) and HypaCas9 using three sgRNAs previously shown to exhibit substantial off-target effects (FANCF site 2, VEGFA sites 2 and 3)6,9, and three previously uncharacterized sgRNAs with a moderate number of in silico predicted off-target sites (FANCF site 6, DNMT1 sites 3 and 4; Extended Data Fig. 9a). We assessed GUIDE-seq tag integration and on-target editing and observed comparable efficiencies among the four nucleases for all six sgRNAs (Extended Data Fig. 9b–d). Our GUIDE-seq analysis revealed that HypaCas9 exhibits dramatically improved genome-wide specificity compared with WT SpCas9, and showed equivalent or better genome-wide specificity relative to both SpCas9-HF1 and eSpCas9(1.1) for all sgRNAs examined (Fig. 3f and Extended Data Figs 9e and 10). These results corroborate the enhanced mismatch intolerance of HypaCas9 and demonstrate that its specificity improvements may extend beyond the PAM-proximal and middle regions of the spacer sequence.

To biochemically validate cleavage specificity in the middle region of the spacer with HypaCas9, we measured cleavage rates against the FANCF site 1 sequence with or without internal mismatches. Although HypaCas9 retained on-target activity comparable to WT and SpCas9-HF1 in human cells, its in vitro cleavage rate was slightly reduced for the one target site examined (Fig. 4a). However, the cleavage rate with internally mismatched substrates was considerably slower compared with WT and SpCas9-HF1, which may be explained by the altered threshold of HNH activation (Fig. 4a, b).

Figure 4: Mutating residues involved in proofreading increases the threshold for conformational activation to ensure targeting accuracy.

a, DNA cleavage kinetics of SpCas9 variants with the FANCF site 1 on-target and internally mismatched substrates; mean and s.d. are shown; n = 3 independent experiments (overlaid as white circles). b, smFRET histograms showing HNH conformation for indicated SpCas9 variants with a FANCF site 1 on-target and mismatched substrate at the 12th position; black curves represent a fit to multiple Gaussian peaks. c, Model for α-helical lobe sensing and regulation of the RNA–DNA heteroduplex for HNH activation and cleavage.

Our findings provide direct evidence to support previous speculation that Cas9 relies on protospacer sensing to enable accurate targeting21,22. In particular, we propose that REC3 binding to the RNA–DNA duplex is necessary for re-orienting REC2, which enables HNH docking at the active site (Extended Data Fig. 4h, i). Mutation of residues within REC3 that are involved in RNA–DNA heteroduplex recognition, such as those mutated in HypaCas9 or SpCas9-HF1, prevents transitions by the REC2 domain, which more stringently traps the HNH domain in the conformational checkpoint in the presence of mismatches (Fig. 4c and Extended Data Fig. 10). Curiously, nearly all of the amino acids within the cluster variants were strongly conserved (Extended Data Fig. 5), suggesting that these residues may also be involved in protospacer sensing and HNH nuclease activation across Cas9 orthologues. Furthermore, this observation may address how nature apparently has not selected for a highly precise Cas9 protein, whose native balance between mismatch tolerance and specificity may be optimized for host immunity. Our study delineates a general strategy for improving Cas9 specificity by tuning the natural conformational threshold, and offers opportunities for rational design of hyper-accurate Cas9 variants that do not compromise efficiency.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Protein purification and dye labelling

S. pyogenes Cas9 and truncation derivatives were cloned into a custom pET-based expression vector containing an N-terminal His6-tag, maltose-binding protein (MBP) and TEV protease cleavage site. Point mutations were introduced by Gibson assembly or around-the-horn PCR and verified by DNA sequencing. Proteins were purified as described23, with the following modifications: after Ni-NTA affinity purification and overnight TEV cleavage at 4 °C, proteins were purified over an MBPTrap HP column connected to a HiTrap Heparin HP column for cation exchange chromatography. The final gel filtration step (Superdex 200) was carried out in elution buffer containing 20 mM Tris-HCl, pH 7.5, 200 mM NaCl, 5% (v/v) glycerol and 1 mM TCEP. For FRET experiments, Cy3/Cy5-dye positions were selected within a cysteine-free Cas9 protein on the basis of a structural alignment of the sgRNA-bound (4ZT0) to dsDNA-bound (5F9R) structures. Each FRET pair consisted of one cysteine substitution within the ‘mobile’ domain (HNH, REC2 or REC3) and another within the relatively ‘stationary’ domain (REC1, arginine-rich helix or RuvC), such that the inter-residue distance change from the sgRNA-bound to dsDNA-bound states was between 10 and 90 Å (Extended Data Fig. 10). Dye-labelled Cas9 samples were subsequently prepared as described12. A list of all protein variants and truncations is in Supplementary Table 2.

Nucleic acid preparation

sgRNA templates were PCR amplified from a pUC19 vector containing a T7 promoter, 20 nucleotide target sequence and optimized sgRNA scaffold. The amplified PCR product was extracted with phenol:chloroform:isoamyl alcohol and served as the DNA template for sgRNA transcription reactions, which were performed as described24. DNA oligonucleotides and 5′ end biotinylated DNAs (Supplementary Table 3) were synthesized commercially (Integrated DNA Technologies), and DNA duplexes were prepared and purified by native PAGE as described23.

DNA cleavage and binding assays

DNA duplex substrates were 5′-32P-radiolabelled on both strands. For cleavage experiments, Cas9 and sgRNA were pre-incubated at room temperature for at least 10 min in 1× binding buffer (20 mM Tris-HCl, pH 7.5, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 5% glycerol, 50 μg ml−1 heparin) before initiating the cleavage reaction by addition of DNA duplexes. For REC3 in vitro complementation experiments, 100 nM SpCas9∆REC3–sgRNA complexes were pre-incubated with tenfold molar excess of REC3 for at least 10 min at room temperature before addition of 1 nM radiolabelled substrate. For REC3 titration experiments, 100 nM SpCas9∆REC3-sgRNA complexes were separately pre-incubated with 0, 10, 20, 50, 100, 200, 500, and 1,000 nM REC3 for at least 10 min at room temperature before addition of 1 nM radiolabelled substrate; reactions were quenched after 10 min with an equal volume of buffer containing 50 mM EDTA, 0.02% bromophenol blue, and 90% (vol/vol) formamide. DNA cleavage experiments were performed and analysed as previously described12.
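
Cleavage time courses of this kind are often summarized by an apparent first-order rate constant. The sketch below shows one simple way to estimate it and to express a difference between two enzymes as a fold change. It is an assumption-laden illustration, not the analysis of ref. 12: it supposes single-exponential kinetics with complete cleavage at long times, and the function name and synthetic time courses are hypothetical.

```python
import math


def first_order_rate(times_min, fraction_cleaved):
    """Estimate k (per minute) assuming f(t) = 1 - exp(-k * t).

    Uses a least-squares line through the origin for -ln(1 - f) versus t.
    Hypothetical helper; assumes the reaction goes to completion.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times_min, fraction_cleaved):
        if not 0.0 <= f < 1.0:
            raise ValueError("fractions must be in [0, 1)")
        y = -math.log(1.0 - f)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("need at least one non-zero time point")
    return numerator / denominator


# Synthetic, noise-free time courses for illustration only.
times = [0.5, 1.0, 2.0, 5.0, 10.0]
fast = [1.0 - math.exp(-0.5 * t) for t in times]
slow = [1.0 - math.exp(-0.0005 * t) for t in times]
fold = first_order_rate(times, fast) / first_order_rate(times, slow)
print(f"fold difference in cleavage rate: {fold:.0f}")
```

With real gel data, points close to complete cleavage carry large errors after the logarithmic transform, so a direct nonlinear fit with a free amplitude is usually preferable.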

DNA binding assays were conducted using increasing concentrations of SpCas9 variant–sgRNA complexes (0.1, 1, 3, 10, 30, 100, 300, and 1,000 nM) in 1× binding buffer without MgCl2 + 1 mM EDTA, and reactions were incubated with <0.1 nM radiolabelled duplex substrate at room temperature for 2 h. DNA-bound complexes were resolved on 8% native PAGE (0.5× TBE + 1 mM EDTA, without MgCl2) at 4 °C, as previously described10. Experiments were replicated at least three times, and presented gels are representative results.
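
Because the radiolabelled substrate is kept far below the protein concentrations used, the fraction of DNA bound can be approximated by a simple hyperbolic isotherm, from which a dissociation constant can be estimated. The sketch below illustrates this with the concentration series listed above. It is not the fitting procedure used in this study: the amplitude is fixed at 1, the grid-search fit and function names are hypothetical, and the data are synthetic.

```python
import math

# Protein-sgRNA concentrations from the titration described above (nM).
CONCENTRATIONS_NM = [0.1, 1, 3, 10, 30, 100, 300, 1000]


def fraction_bound(conc_nm, kd_nm):
    """Hyperbolic isotherm, valid when [DNA] is well below Kd and [protein]."""
    return conc_nm / (kd_nm + conc_nm)


def fit_kd(conc_nm, observed, kd_min=0.01, kd_max=1e4, steps=4000):
    """Return the Kd (nM) on a log-spaced grid that minimizes squared error.

    Hypothetical helper; a coarse stand-in for nonlinear least squares.
    """
    best_kd, best_sse = None, math.inf
    log_min, log_max = math.log10(kd_min), math.log10(kd_max)
    for i in range(steps + 1):
        kd = 10 ** (log_min + (log_max - log_min) * i / steps)
        sse = sum((fraction_bound(c, kd) - y) ** 2
                  for c, y in zip(conc_nm, observed))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic data generated with Kd = 5 nM, for illustration only.
synthetic = [fraction_bound(c, 5.0) for c in CONCENTRATIONS_NM]
print(f"fitted Kd = {fit_kd(CONCENTRATIONS_NM, synthetic):.2f} nM")
```

The log-spaced grid mirrors the roughly logarithmic spacing of the titration, so the resolution of the estimate is similar across several orders of magnitude in Kd.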

Bulk FRET experiments

All bulk FRET assays were performed at room temperature in 1× binding buffer, containing 50 nM SpCas9HNH (C80S/S355C/C574S/S867C labelled with Cy3/Cy5), SpCas9∆REC3HNH (M1–N497,GGS,V713–D1368 + C80S/S355C/C574S/S867C) or SpCas9REC2 (E60C/C80S/D273C/C574S labelled with Cy3/Cy5) with 200 nM sgRNA and DNA substrate where indicated. Fluorescence measurements were collected and analysed as described12. For REC3 in vitro complementation FRET experiments, SpCas9∆REC3HNH and sgRNA were pre-incubated with tenfold molar excess of REC3 for at least 10 min at room temperature before measuring bulk fluorescence.

Sample preparation for smFRET assay

Quartz slides coated with 99% PEG and 1% biotinylated PEG were obtained from MicroSurfaces. Sample preparation was performed as previously described10. Briefly, the slide surface was pre-blocked with casein (10 mg ml−1) for 10 min. The sample chamber was washed with 1× binding buffer, then incubated with 20 μl streptavidin (1 mg ml−1) for 10 min. Unbound streptavidin was washed away with 40 μl of 1× binding buffer.

To immobilize SpCas9 on its DNA substrate, 2.5 nM biotinylated DNA substrate was introduced and incubated in the sample chamber for 5 min. Excess DNA was washed with 1× binding buffer. SpCas9–sgRNA complexes were prepared by mixing 50 nM Cas9 and 50 nM sgRNA in 1× binding buffer and incubated for 10 min at room temperature. SpCas9–sgRNA was diluted to 100 pM, introduced to the sample chamber and incubated for 10 min. Before data acquisition, 20 μl imaging buffer (1 mg ml−1 glucose oxidase, 0.04 mg ml−1 catalase, 0.8% dextrose (w/v) and 2 mM Trolox in 1× binding buffer) was flowed into the chamber. The REC3 in vitro complementation assay was performed similarly to steady-state FRET experiments: 2.5 nM biotinylated DNA substrate (on-target) was immobilized on the surface, and excess DNA was washed with 1× binding buffer. SpCas9–sgRNA complexes were prepared by mixing 50 nM SpCas9∆REC3 and 50 nM sgRNA in 1× binding buffer and incubated for 10 min at room temperature. SpCas9–sgRNA was diluted to 100 pM, introduced to the sample chamber and incubated for 10 min. Before data acquisition, 20 μl imaging buffer was flowed into the chamber. After data acquisition, the sample chamber was washed with 1× binding buffer. Imaging buffer (20 μl) supplemented with 1 μM REC3 was flowed into the sample chamber and incubated for 10 min. After incubation, data for REC3 complementation were collected.

Microscopy and data analysis

A prism-type TIRF microscope was set up using a Nikon Ti-E Eclipse inverted fluorescence microscope equipped with a 60×, 1.20 numerical aperture Plan Apo water objective and the Perfect Focus System (Nikon). A 532-nm solid state laser (Coherent Compass) and a 633-nm HeNe laser (JDSU) were used for Cy3 and Cy5 excitation, respectively. Cy3 and Cy5 fluorescence was split into two channels using an Optosplit II image splitter (Cairn Instruments) and imaged separately on the same electron-multiplying charge-coupled device camera (512 pixels × 512 pixels, Andor Ixon EM+). Effective pixel size of the camera was set to 267 nm after magnification. Movies for steady-state FRET measurements were acquired at 10 Hz under 0.3 kW cm−2 532-nm excitation. Steady-state and dynamic FRET data analysis was performed as described previously10. Briefly, for steady-state FRET analysis, two fluorescent channels were registered with each other using fiducial markers (20 nm diameter Nile Red Beads, Life Technologies) to determine the Cy3/Cy5 FRET pairs. Cy3/Cy5 pairs that photobleached in one step and showed anti-correlated signal changes were used to build histograms. FRET values were corrected for donor leakage and the histograms were normalized to determine the percentage of distinct FRET populations. Only samples showing greater than 3% of molecules with active transitions were subjected to dynamic FRET analysis.
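
The two steps at the end of this analysis, correcting FRET values for donor leakage and converting a histogram into population percentages, can be sketched as follows. This is a simplified stand-in and not the pipeline of ref. 10: the correction formula (subtracting a fixed fraction of the donor signal from the acceptor channel, with equal detection efficiencies assumed) is a common convention that is not spelled out in this text, and populations are assigned here by fixed EFRET boundaries rather than by the multi-Gaussian fits shown in the figures. All names and numbers are hypothetical.

```python
def corrected_fret(donor, acceptor, leakage=0.0):
    """Apparent FRET efficiency after removing donor bleed-through.

    `leakage` is the fraction of donor signal detected in the acceptor
    channel (an assumed, instrument-specific value).
    """
    corrected_acceptor = acceptor - leakage * donor
    total = donor + corrected_acceptor
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return corrected_acceptor / total


def population_fractions(e_values, boundaries):
    """Fraction of molecules in each E_FRET bin set by sorted boundaries."""
    if not e_values:
        raise ValueError("need at least one FRET value")
    counts = [0] * (len(boundaries) + 1)
    for e in e_values:
        index = sum(1 for b in boundaries if e >= b)
        counts[index] += 1
    return [c / len(e_values) for c in counts]


# Made-up per-molecule (donor, acceptor) intensities, illustration only.
molecules = [(900, 150), (520, 500), (480, 560), (60, 980)]
e_values = [corrected_fret(d, a, leakage=0.1) for d, a in molecules]
low, mid, high = population_fractions(e_values, boundaries=[0.25, 0.75])
print(f"low {low:.0%}, intermediate {mid:.0%}, high {high:.0%}")
```

Fixed boundaries are easier to follow than Gaussian fitting but misassign molecules where populations overlap, which is why fitted peak areas are the more appropriate measure for broad or closely spaced states.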

Human cell culture and transfection

Descriptions of nuclease and guide RNA plasmids used for human cell culture are available in Supplementary Tables 2 and 3. Nuclease variants were generated by isothermal assembly into JDS246 (Addgene 43861)5, and guide RNAs were cloned into BsmBI-digested BPK1520 (Addgene 65777)25. Both U2OS cells (a gift from T. Cathomen, Freiburg) and U2OS–eGFP cells (encoding a single integrated copy of a pCMV–eGFP–PEST cassette)26 were cultured at 37 °C with 5% CO2 in advanced DMEM containing 10% heat-inactivated fetal bovine serum, 2 mM GlutaMax, penicillin–streptomycin, and 400 μg ml−1 Geneticin (for U2OS–eGFP cells only). Cell culture reagents were purchased from Thermo Fisher Scientific, cell line identities were validated by STR profiling (American Type Culture Collection, ATCC) and deep-sequencing, and cell culture supernatant was tested twice a month for mycoplasma. Transfections were performed using a Lonza 4-D Nucleofector with the SE Kit and the DN-100 program on ~200,000 cells with 750 ng of nuclease and 250 ng of guide RNA plasmids.

Human cell eGFP disruption assay

eGFP disruption experiments were performed as previously described5,26. Briefly, transfected cells were analysed ~52 h after transfection for loss of eGFP fluorescence using a Fortessa flow cytometer (BD Biosciences). Background loss was determined by gating a negative control transfection (containing nuclease and empty guide RNA plasmid) at ~2.5% for all experiments.

T7 endonuclease I assay

Roughly 72 h after transfection, genomic DNA was extracted from U2OS cells using an Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter Genomics), and T7 endonuclease I (T7E1) assays were performed as previously described26. Briefly, 600- to 800-nucleotide amplicons surrounding on-target sites were amplified from ~100 ng of genomic DNA using Phusion Hot-Start Flex DNA Polymerase (New England Biolabs, NEB) with the primers listed in Supplementary Table 3. PCR products were visualized (using a QIAxcel capillary electrophoresis instrument, Qiagen) and purified (Agencourt Ampure XP cleanup, Beckman Coulter Genomics). Denaturation and annealing of ~200 ng of the PCR product was followed by digestion with T7E1 (NEB). Digestion products were purified (Ampure) and quantified (QIAxcel) to approximate the mutagenesis frequencies induced by Cas9–sgRNA complexes.
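
The conversion from quantified digestion products to an approximate mutagenesis frequency is not spelled out here. A widely used estimate for mismatch-cleavage assays is 100 × (1 − √(1 − fraction cleaved)), which accounts for the fact that a mutant strand is only detected when it reanneals into a heteroduplex. The sketch below implements that estimate as an assumption; the function name and example band intensities are hypothetical.

```python
import math


def estimated_mutation_percent(uncut, cut_fragments):
    """Approximate percent modification from T7E1 band intensities.

    Uses 100 * (1 - sqrt(1 - fraction_cleaved)), a commonly used estimate
    that is assumed here rather than taken from this study.
    """
    cut_total = sum(cut_fragments)
    total = uncut + cut_total
    if total <= 0:
        raise ValueError("total signal must be positive")
    fraction_cleaved = cut_total / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Made-up band intensities (arbitrary units), illustration only.
print(f"{estimated_mutation_percent(640.0, [200.0, 160.0]):.1f}% modified")
```

The estimate assumes random reannealing and complete digestion of heteroduplexes, so it is best treated as an approximation, in line with the wording above.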

GUIDE-seq

GUIDE-seq experiments were performed with WT SpCas9, SpCas9-HF1, eSpCas9(1.1) and HypaCas9 for six different sgRNAs, essentially as previously described6. Briefly, U2OS cells were transfected as described above with the addition of 100 pmol of an end-protected double-stranded oligonucleotide (dsODN) GUIDE-seq tag. Approximately 72 h after nucleofection, genomic DNA was extracted and gene disruption was quantified by T7E1 assay (as described above). GUIDE-seq tag-integration efficiencies were assessed using restriction fragment length polymorphism (RFLP) assays as previously described9. Briefly, PCR reactions amplified from ~100 ng of genomic DNA from GUIDE-seq treated samples, using Phusion Hot-Start Flex DNA Polymerase (NEB), were treated with 20 U NdeI (NEB) for 3 h. Digested products were purified (Ampure) and quantified (QIAxcel) to approximate GUIDE-seq tag-integration efficiencies. To perform GUIDE-seq, sample libraries were assembled as previously described6 and sequenced on an Illumina MiSeq machine. Data were analysed using open-source guideseq software (version 1.1)27. GUIDE-seq data can be found in Supplementary Table 1, and are deposited with the NCBI Sequence Read Archive. Potential alternative alignments shown in Supplementary Table 1, resulting from RNA or DNA bulges28, depict one of many possible alternative alignments.

Data availability

Plasmids encoding the high-fidelity SpCas9 variants described in this manuscript have been deposited with the non-profit plasmid distribution service Addgene (http://www.addgene.org/). All sequencing data from this study are available through the NCBI Sequence Read Archive under accession number SRP116962. The authors declare that all other data supporting the findings of this study are available within the paper and its Supplementary Information files.

References

  1. 1

    Doudna, J. A. & Charpentier, E. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014)

    Article  Google Scholar 

  2. 2

    Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278 (2014)

    CAS  Article  Google Scholar 

  3. 3

    Mali, P., Esvelt, K. M. & Church, G. M. Cas9 as a versatile tool for engineering biology. Nat. Methods 10, 957–963 (2013)

    CAS  Article  Google Scholar 

  4. 4

    Barrangou, R. & Horvath, P. A decade of discovery: CRISPR functions and applications. Nat. Microbiol. 2, 17092 (2017)

    CAS  Article  Google Scholar 

  5. 5

    Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822–826 (2013)

    CAS  Article  Google Scholar 

  6. 6

    Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015)

    CAS  Article  Google Scholar 

  7. 7

    Tsai, S. Q. & Joung, J. K. Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17, 300–312 (2016)

    CAS  Article  Google Scholar 

  8. 8

    Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016)

    ADS  CAS  Article  Google Scholar 

  9. 9

    Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016)

    ADS  CAS  Article  Google Scholar 

  10. 10

    Dagdas, Y. S., Chen, J. S., Sternberg, S. H., Doudna, J. A. & Yildiz, A. A conformational checkpoint between DNA binding and cleavage by CRISPR-Cas9. Sci. Adv. 3, eaao0027 (2017)

    Article  Google Scholar 

  11. 11

    Bisaria, N., Jarmoskaite, I. & Herschlag, D. Lessons from enzyme kinetics reveal specificity principles for RNA-guided nucleases in RNA interference and CRISPR-based genome editing. Cell Syst. 4, 21–29 (2017)

    CAS  Article  Google Scholar 

  12. 12

    Sternberg, S. H., LaFrance, B., Kaplan, M. & Doudna, J. A. Conformational control of DNA target cleavage by CRISPR–Cas9. Nature 527, 110–113 (2015)

    ADS  CAS  Article  Google Scholar 

  13. 13

    Jiang, F. et al. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science 351, 867–871 (2016)

    ADS  CAS  Article  Google Scholar 

  14. 14

    Palermo, G., Miao, Y., Walker, R. C., Jinek, M. & McCammon, J. A. Striking plasticity of CRISPR-Cas9 and key role of non-target DNA, as revealed by molecular simulations. ACS Cent. Sci. 2, 756–763 (2016)

    CAS  Article  Google Scholar 

  15. 15

    Palermo, G., Miao, Y., Walker, R. C., Jinek, M. & McCammon, J. A. CRISPR-Cas9 conformational activation as elucidated from enhanced molecular simulations. Proc. Natl Acad. Sci. USA 114, 7260–7265 (2017)

  16.

    Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012)

  17.

    Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949 (2014)

  18.

    Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573 (2014)

  19.

    Jiang, F., Zhou, K., Ma, L., Gressel, S. & Doudna, J. A. A Cas9-guide RNA complex preorganized for target DNA recognition. Science 348, 1477–1481 (2015)

  20.

    Majumdar, Z. K., Hickerson, R., Noller, H. F. & Clegg, R. M. Measurements of internal distance changes of the 30S ribosome using FRET with multiple donor-acceptor pairs: quantitative spectroscopic methods. J. Mol. Biol. 351, 1123–1145 (2005)

  21.

    Szczelkun, M. D. et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc. Natl Acad. Sci. USA 111, 9798–9803 (2014)

  22.

    Cencic, R. et al. Protospacer adjacent motif (PAM)-distal sequences engage CRISPR Cas9 DNA target cleavage. PLoS ONE 9, e109213 (2014)

  23.

    Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997 (2014)

  24.

    Wright, A. V. et al. Rational design of a split-Cas9 enzyme complex. Proc. Natl Acad. Sci. USA 112, 2984–2989 (2015)

  25.

    Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015)

  26.

    Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat. Biotechnol. 30, 460–465 (2012)

  27.

    Tsai, S. Q., Topkar, V. V., Joung, J. K. & Aryee, M. J. Open-source guideseq software for analysis of GUIDE-seq data. Nat. Biotechnol. 34, 483 (2016)

  28.

    Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014)

  29.

    Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014)

Acknowledgements

We thank A. V. Wright, S. N. Floor, J. C. Cofsky, D. Burstein, C. Fellman, B. L. Oakes and O. Mavrothalassitis for discussions and reading the manuscript, M. S. Prew for technical assistance, and J. M. Lopez for assistance with GUIDE-seq data processing. J.S.C. and L.B.H. are supported by National Science Foundation Graduate Research Fellowships, and B.P.K. by Banting (Natural Sciences and Engineering Research Council of Canada) and Charles A. King Trust Postdoctoral Fellowships. J.A.D. is an Investigator of the Howard Hughes Medical Institute. This work was supported by the National Institutes of Health (GM094522 and GM118773 (A.Y.), R35 GM118158 (J.K.J.)), National Science Foundation (MCB-1617028 (A.Y.) and MCB-1244557 (J.A.D.)), and the Desmond and Ann Heathwood MGH Research Scholar Award (J.K.J.).

Author information

Contributions

J.S.C., Y.S.D. and B.P.K. contributed equally to the work, and conceived and designed experiments with input from L.B.H., S.H.S., J.K.J., A.Y. and J.A.D. J.S.C. performed protein expression, labelling and biochemical experiments. Y.S.D. performed single-molecule fluorescence assays and related data analysis. B.P.K. and M.M.W. performed human cell-based assays, and B.P.K. and A.A.S. performed and analysed GUIDE-seq experiments. J.S.C., Y.S.D., B.P.K., J.K.J., A.Y. and J.A.D. wrote the manuscript.

Corresponding author

Correspondence to Jennifer A. Doudna.

Ethics declarations

Competing interests

J.K.J. has financial interests in Beacon Genomics, Beam Therapeutics, Editas Medicine, Pairwise Plants, Poseida Therapeutics, and Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. J.A.D. is a co-founder of Caribou Biosciences, Editas Medicine, and Intellia Therapeutics; a scientific adviser to Caribou, Intellia, eFFECTOR Therapeutics and Driver; and executive director of the Innovative Genomics Institute at the University of California, Berkeley and University of California, San Francisco. S.H.S. is an employee of Caribou Biosciences, Inc. S.H.S., J.S.C., and J.A.D. are inventors on a patent application entitled ‘Reporter Cas9 variants and methods of use thereof’ (PCT/US2016/036754), filed by The Regents of the University of California. B.P.K. and J.K.J. are inventors on a patent application entitled ‘Engineered CRISPR-Cas9 nucleases’ (US 15/060,424), filed by The General Hospital Corporation. J.S.C., Y.S.D., B.P.K., A.Y., J.K.J., and J.A.D. have filed a patent application related to this work through The General Hospital Corporation and The Regents of the University of California.

Additional information

Reviewer Information: Nature thanks A. Ke and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Dual-labelled SpCas9 variants are fully functional for DNA cleavage.

a, SDS–PAGE analysis of unlabelled Cas9 variants. b, SDS–PAGE analysis of Cy3/Cy5-labelled Cas9 variants. The gel was scanned for Cy3/Cy5 fluorescence (middle, bottom) before staining with Coomassie blue (top). c–f, DNA cleavage time courses of Cas9 FRET constructs and their dual-labelled counterparts for (c) WT SpCas9, (d) SpCas9-HF1, (e) eSpCas9(1.1) and (f) HypaCas9. For a–f, experiments were repeated three independent times with similar results.

Extended Data Figure 2 The HNH domain in eSpCas9 variants still populates the docked state in the presence of PAM-distal mismatches.

a, Quantification of DNA cleavage time courses comparing WT SpCas9, SpCas9-HF1 and eSpCas9(1.1) variants with perfect and PAM-distal mismatched targets. b, Dissociation constants comparing WT SpCas9, SpCas9-HF1 and eSpCas9(1.1) variants with perfect and PAM-distal mismatched targets, as measured by electrophoretic mobility shift assays. For a, b, mean and s.d. are shown; n = 3 independent experiments (overlaid as white circles in b). c, d, smFRET histograms for (c) SpCas9-K855A and (d) SpCas9-N497A/R661A/Q695A. For c and d, black curves represent a fit to multiple Gaussian peaks. e, Schematic of SpCas9 domain structure with colour coding for separate domains. f, Vector map of global SpCas9 conformational changes from the sgRNA-bound (PDB accession number 4ZT0) to dsDNA-bound structures (PDB accession number 5F9R), domains coloured as in e.

Extended Data Figure 3 Kinetic analysis of transitions between active and inactive states of the HNH domain.

a, Representative time traces (top), transition density plots (TDPs, middle) and rates of the transitions in TDPs (bottom) for SpCas9-HF1 with on-target DNA (left), eSpCas9(1.1) with on-target DNA (middle) and eSpCas9(1.1) with 20–20 bp mm DNA (right); mean and s.e.m. are shown; n = 107, 24 and 74 individual molecules, respectively. The percentage of molecules showing at least one such transition was 36%, 7% and 29% for SpCas9-HF1 with on-target, eSpCas9(1.1) with on-target and eSpCas9(1.1) with 20–20 bp mm DNA, respectively. Kinetic analysis of other cases (SpCas9-HF1 and eSpCas9(1.1) bound to other off-target substrates, and HypaCas9 bound to on- and off-target substrates) is not shown, because the percentage of molecules showing at least one such transition was less than 3%. b, Comparison of on-target transition rates for WT SpCas9, SpCas9-HF1 and eSpCas9(1.1); mean and s.e.m. are shown; n = 51, 107 and 24 individual molecules, respectively. Transition rates for WT SpCas9 were taken from ref. 10.

Extended Data Figure 4 Nucleic acid sensing requires engagement with the REC3 domain and outward rotation of the REC2 domain.

a, Schematic of SpCas9REC3 with FRET dyes at positions S701C and S960C, with HNH domain omitted for clarity. Inactive to active structures represent REC3 in the sgRNA-bound (PDB accession number 4ZT0) to dsDNA-bound (PDB accession number 5F9R) forms, respectively. b, c, smFRET histograms showing HNH conformational activation with black curves representing a fit to multiple Gaussian peaks for (b) WT SpCas9REC3 and (c) SpCas9-HF1REC3 bound to perfect and PAM-distal mismatched targets. The purple peak denotes the sgRNA-only bound state, while the red and green peaks represent two states of REC3 with conformational flexibility upon binding to DNA substrates. d, REC3 in vitro complementation assay with SpCas9∆REC3 by measuring cleavage rate constants. e, On-target DNA binding assay in the presence or absence of the REC3 domain; mean and s.d. are shown. f, REC3 in vitro complementation assay with SpCas9∆REC3 by measuring HNH activation with (ratio)A values. g, (Ratio)A data with SpCas9REC2 and SpCas9HNH showing reciprocal FRET states with the indicated substrates. For d–g, mean and s.d. are shown; n = 3 independent experiments (overlaid as white circles in d, f, and g). h, Schematic of SpCas9∆REC3REC2 with FRET dyes at positions E60C and D273C, with the REC3 domain added in trans. Inactive to active structures represent REC2 in the sgRNA-bound (PDB accession number 4ZT0) to dsDNA-bound (PDB accession number 5F9R) forms, respectively. i, smFRET histograms measuring REC2 conformational states with SpCas9∆REC3REC2 in the absence and presence of the REC3 domain when bound to an on-target substrate.

Extended Data Figure 5 Identification of cluster variants on the basis of nucleic acid proximity and multiple sequence alignment of residues within clusters 1–5.

a, Schematic depicting interactions of WT SpCas9 residues within clusters 1–5 with the RNA–DNA heteroduplex, on the basis of PDB accession 5F9R (adapted from ref. 9). b, Alignment of selected Cas9 orthologues using MAFFT and visualized in Geneious 10.0, with red boxes outlining residues mutated to alanine within each cluster variant.

Extended Data Figure 6 Mutation clusters in the REC3 domain along the RNA–DNA heteroduplex demonstrate localized sensitivity to mismatches along the target sequence.

a, b, Quantified DNA cleavage rates (dotted line indicates detection limit for kcleave set at 10 min−1) displayed as (a) a heatmap and (b) a bar graph. c, d, Target DNA binding assay (c) resolved by native PAGE mobility shift assays; repeated three independent times with similar results and (d) quantification with WT-normalized dissociation constants. For b and d, mean and s.d. are shown; n = 3 independent experiments (overlaid as white circles).

Extended Data Figure 7 On-target activities of altered specificity variants using a human cell eGFP disruption assay.

a, Summary of eGFP disruption activities for SpCas9-HF1, eSpCas9(1.1), eSpCas9(1.1)-HF1 and cluster variants ± Q926A with mean and s.e.m., where n = at least 3 biologically independent samples (overlaid as white circles). b, Summary of eGFP disruption activities for the series of cluster 1 variants with each substituted residue restored to the canonical amino acid; mean and s.e.m. are shown; n = at least 3 biologically independent samples (overlaid as white circles); WT, cluster 1 (HypaCas9), and cluster 1 + Q926A data from a are re-plotted for comparison. c, WT-normalized plot of data in b; error bars represent median and interquartile range for n = 12 biologically independent samples; the interval with >70% of WT activity is highlighted in light grey.
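
Panel c expresses each variant's activity relative to WT and summarizes it by median and interquartile range. A minimal sketch of such a summary follows, assuming paired per-site measurements and the "inclusive" quartile convention (the legend does not state which convention was used). The function names and the toy numbers are hypothetical and are not data from this study.

```python
import statistics

def wt_normalize(variant_activity, wt_activity):
    """Express each variant measurement as a percentage of the paired WT value."""
    if len(variant_activity) != len(wt_activity):
        raise ValueError("variant and WT lists must be paired")
    return [100.0 * v / w for v, w in zip(variant_activity, wt_activity)]

def median_and_iqr(values):
    """Return (median, (first quartile, third quartile))."""
    # Assumption: 'inclusive' quartiles; other conventions give slightly different bounds.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)

def fraction_above(values, threshold=70.0):
    """Fraction of sites whose normalized activity exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)

# Hypothetical percentages of eGFP disruption at four sites (illustration only).
variant = [40.0, 45.0, 50.0, 55.0]
wt = [50.0, 50.0, 50.0, 50.0]
normalized = wt_normalize(variant, wt)
median, (q1, q3) = median_and_iqr(normalized)
```

The 70% threshold in `fraction_above` mirrors the interval highlighted in light grey in the panel.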

Extended Data Figure 8 Activities and specificities of high-fidelity SpCas9 variants targeted to endogenous human cell sites.

a, On-target activities of WT SpCas9, SpCas9-HF1, cluster 1 and cluster 2 variants across 24 endogenous human genes, assessed by T7E1 assay; mean and s.e.m. are shown; n = at least 3 biologically independent samples (overlaid as white circles). b, WT-normalized endogenous gene disruption data from a, for cluster 1 and 2 variants. Error bars represent median and interquartile ranges of 24 biologically independent samples with the >70% interval of WT activity highlighted in light grey; cluster 1 (HypaCas9) data from Fig. 3b are re-plotted for comparison. c–e, Summary of single mismatch tolerance of WT SpCas9, SpCas9-HF1, eSpCas9(1.1), and cluster 1 and cluster 2 variants on (c) FANCF site 1, (d) FANCF sites 4 and 6 and (e) FANCF site 2. Percentage modification in c–e assessed by T7E1 assay; mean and s.e.m. are shown for n = at least 3 biologically independent samples (overlaid as white circles).

Extended Data Figure 9 Genome-wide specificity profiles of high-fidelity SpCas9 variants defined using GUIDE-seq.

a, Number of in silico predicted target sites mismatched by ‘n’ positions for six sgRNAs against the reference human genome (hg38) via Cas-OFFinder29. b, Assessment of GUIDE-seq dsODN tag integration at the on-target site for each nuclease and guide combination, detected by RFLP assay. c, On-target editing, determined by T7E1 assay; mean and s.e.m. are shown; n = 3 biologically independent samples (overlaid as white circles) for b and c. d, dsODN tag-integration efficiency ratios (integration:mutagenesis, from b and c) for each nuclease and guide combination, with means and 95% confidence intervals shown for n = 6 biologically independent samples. e, GUIDE-seq genome-wide specificity profiles for WT SpCas9, SpCas9-HF1, eSpCas9(1.1) and HypaCas9 each paired with six different sgRNAs. Mismatched positions in off-target sites are highlighted in colour; GUIDE-seq read counts shown to the right of the sequences, which correlate with approximate cleavage efficiency at a given site; blue circles indicate sites with potential alternative alignments due to RNA or DNA bulges28 (see Supplementary Table 1); yellow circles indicate off-target sites that are only supported by asymmetric GUIDE-seq reads.
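
Panel a tallies candidate sites by the number of positions at which they differ from the sgRNA spacer. A minimal sketch of that tally follows, assuming equal-length, already-aligned sequences with no bulges. This is not the Cas-OFFinder algorithm, which searches a whole genome; the function names and the short toy sequences are hypothetical.

```python
from collections import Counter

def count_mismatches(spacer, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(spacer) != len(site):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(spacer.upper(), site.upper()) if a != b)

def tally_by_mismatch(spacer, sites, max_mismatches):
    """Map n -> number of sites with exactly n mismatches, for n = 0..max_mismatches.

    Sites with more than max_mismatches mismatches are ignored.
    """
    counts = Counter(count_mismatches(spacer, s) for s in sites)
    return {n: counts.get(n, 0) for n in range(max_mismatches + 1)}

# Hypothetical 10-nt sequences (illustration only; real spacers are 20 nt).
spacer = "ACGTACGTAC"
sites = ["ACGTACGTAC", "ACGTACGTAA", "TCGTACGTAA", "ACGTACGTAG"]
tally = tally_by_mismatch(spacer, sites, max_mismatches=3)
```

Here `tally[0]` counts perfect matches, so an on-target site contributes to the n = 0 bin.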

Extended Data Figure 10 Conformational gating drives targeting accuracy for SpCas9 variants.

a–c, Steady-state smFRET histograms measuring (a) HNH, (b) REC2 and (c) REC3 conformational states for HypaCas9 bound to on-target and PAM-distal mismatched substrates. Black curves represent a fit to multiple Gaussian peaks. d, e, Steady-state smFRET histograms of Cas9 variants bound to PAM-distal mismatched substrates were normalized to and subtracted from that of on-target smFRET histograms. This analysis reveals transitions from one FRET population (negative peak, shaded region) to another population (positive peak, unshaded regions) for (d) REC3 and (e) REC2. f, Measured distances between residues labelled with Cy3/Cy5 FRET dyes for different substrate-bound Cas9 structures. Residue pairs were designed to report conformational changes of the specified domain (HNH, REC2 or REC3). The distances were measured between Cα atoms of the indicated residues for the associated PDB structures.
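
The subtraction described for panels d and e can be sketched as a bin-by-bin difference of two normalized histograms. The sketch below assumes both histograms share the same FRET bins and that each is normalized to unit total before subtraction; the legend does not specify the normalization, so that choice, the function names and the toy counts are all assumptions.

```python
def normalize(counts):
    """Scale histogram counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]

def difference_histogram(on_target_counts, mismatch_counts):
    """On-target minus mismatched histogram, bin by bin, after normalization.

    Negative bins are more populated with the mismatched substrate;
    positive bins are more populated with the on-target substrate.
    """
    if len(on_target_counts) != len(mismatch_counts):
        raise ValueError("histograms must share the same bins")
    on = normalize(on_target_counts)
    mm = normalize(mismatch_counts)
    return [a - b for a, b in zip(on, mm)]

# Hypothetical counts in three FRET bins (low, mid, high); illustration only.
on_target = [0, 2, 8]
mismatched = [6, 4, 0]
diff = difference_histogram(on_target, mismatched)
```

Because both inputs are normalized to the same total, the difference sums to zero, so any negative peak is balanced by a positive peak elsewhere.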

Supplementary information

Supplementary Figure

This file contains the uncropped gel images from polyacrylamide gel electrophoresis experiments presented in the manuscript. (PDF 548 kb)

Reporting Summary (PDF 68 kb)

Supplementary Table 1

This file contains GUIDE-seq data. (XLSX 69 kb)

Supplementary Table 2

This file lists the DNA plasmids and proteins used in this study, including all enhanced specificity, high-fidelity, cluster and hyper-accurate SpCas9 variants tested, with Addgene ID numbers for deposited plasmids. The HNH, REC2 or REC3 subscript designation with an enhanced specificity, high-fidelity or cluster SpCas9 variant denotes the combination of the residue substitutions with the indicated FRET construct. (XLSX 43 kb)

Supplementary Table 3

This file contains a list of nucleic acids used in the study. (XLSX 18 kb)

Cite this article

Chen, J., Dagdas, Y., Kleinstiver, B. et al. Enhanced proofreading governs CRISPR–Cas9 targeting accuracy. Nature 550, 407–410 (2017). https://doi.org/10.1038/nature24268
