The RNA-guided CRISPR-associated (Cas) proteins Cas9 and Cas12a provide adaptive immunity against invading nucleic acids, and function as powerful tools for genome editing in a wide range of organisms. Here we reveal the underlying mechanisms of a third, fundamentally distinct RNA-guided genome-editing platform named CRISPR–CasX, which uses unique structures for programmable double-stranded DNA binding and cleavage. Biochemical and in vivo data demonstrate that CasX is active for Escherichia coli and human genome modification. Eight cryo-electron microscopy structures of CasX in different states of assembly with its guide RNA and double-stranded DNA substrates reveal an extensive RNA scaffold and a domain required for DNA unwinding. These data demonstrate how CasX activity arose through convergent evolution to establish an enzyme family that is functionally separate from both Cas9 and Cas12a.
All data that support the conclusions of this Article can be found in the figures and the Source Data. The cryo-EM structural models and electron density maps have been deposited in the Protein Data Bank under the codes 6NY1, 6NY2 and 6NY3 and the Electron Microscopy Data Bank under the codes EMD-8987, EMD-8988, EMD-8980, EMD-8994, EMD-8996, EMD-8991, EMD-8989 and EMD-8990. More information is summarized in Supplementary Table 1. All the plasmids and oligonucleotide sequences used in this study are summarized in Supplementary Table 2. Any other relevant data are available from the corresponding authors upon reasonable request.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electron microscopy data were collected at the Bay Area Cryo-EM (BACEM) facility located at UC Berkeley. We thank D. B. Toso and P. Grob for expert electron microscopy assistance; A. Chintangal and P. Tobias for computational support; and D. Savage, J. Cofsky and A. V. Wright for comments on the manuscript. Research in this publication was supported by the National Science Foundation under award number 1244557 (J.A.D.); by the National Institutes of Health under award number P50GM082250 (HARC Center, J.A.D.); and by the National Institute of General Medical Sciences of the National Institutes of Health under award number P01GM051487 (J.A.D. and E.N.). J.A.D. and E.N. are Howard Hughes Medical Institute Investigators.
Extended data figures and tables
a, RAxML maximum-likelihood phylogenetic tree of type V effector proteins with TnpB nucleases. Triangle denotes collapsed branches. Bootstrap values are indicated as percentage points; values above 88 are shown between the major branches. b, Per cent sequence-identity pairwise comparisons between the conserved RuvC domains of the class 2 effectors Cas9 (type II-A), Cas12a (type V-A) and CasX (type V-E) inferred from MAFFT alignment, depicted in an all-versus-all fashion. High sequence identity is shown in blue, and low sequence identity is shown in red. Histograms representing interfamily and intrafamily sequence-identity value distributions are shown along the edge. c, DNA cleavage site comparison among Cas12a, Cas12b and CasX. Five repeats with consistent results. d, DNA cleavage activity of DpbCasX mutations (n = 3, mean ± s.d.). e, Schematic of gfp gene. Target regions for guides 1 to 9 are marked along the gene. CasX guide screening by GFP disruption (n > 2, mean ± s.d.). f, CRISPRi efficiency for CasX active-site mutations. The Cas proteins and guide RNAs used in each assay are marked. Cas9-ng, non-targeting RNA guide of SpCas9; Cas9-g, targeting RNA guide of SpCas9; CasX-ng, non-targeting RNA guide of DpbCasX; CasX-g, targeting RNA guide of DpbCasX. GFP-disruption efficiency of targeting guide is shown by GFP signal divided by optical density at 600 nm, compared to the non-targeting guide control (n = 4, mean ± s.d.). g, Purification of apo-CasX, CasX–guide RNA binary complex and CasX–guide RNA–DNA ternary complex with three DNA designs, by size-exclusion chromatography. The representative S200 size-exclusion traces by absorbance at 280 nm are shown. Samples were taken from the labelled peaks and analysed by urea-PAGE with SYBR Gold staining. NTS, non-target strand from target DNA. TS, target strand from target DNA. All the reconstitutions have been repeated more than three times with consistent results.
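The pairwise per cent sequence identity in panel b can be illustrated with a minimal sketch. The function names, the choice to skip any column containing a gap, and the toy sequences are assumptions made for illustration; the legend does not state how gaps were handled, and this is not the authors' script.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Per cent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from both numerator and denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def all_versus_all(aligned: dict) -> dict:
    """Identity for every ordered pair of named, pre-aligned sequences."""
    names = sorted(aligned)
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }


# Toy aligned fragments (hypothetical, not real RuvC sequences).
toy_alignment = {"seqA": "ACDE-G", "seqB": "ACDFHG", "seqC": "TCD-HG"}
identity_matrix = all_versus_all(toy_alignment)
```

The resulting dictionary can be reshaped into a square matrix for a heat map of the kind described in the legend.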
a, PlmCasX T7E1 gene-editing validation of the mammalian cell GFP-disruption assay from Fig. 2g. b, PlmCasX T7E1 quantification of a (n = 3, mean ± s.d.). c, PlmCasX GFP-disruption dose response (n = 3, mean ± s.d.). The Cas proteins and guide RNAs used in each assay are marked. Cas9-ng, non-targeting RNA guide of SpCas9; Cas9-g, targeting RNA guide of SpCas9; CasX-ng, non-targeting RNA guide of DpbCasX; CasX-g, targeting RNA guide of DpbCasX. In the assays of human cells, CasX-g2 and CasX-g3 denote GFP targeting guides to the template and non-template strand, respectively, and the GFP targeting guide of Cas9 (Cas9-g), which is not expected to direct CasX activity, is used as the negative control. d, EGFP disruption of clonal EGFP HEK293T cell lines with PlmCasX and various doses of plasmid (n = 3, mean ± s.d.). Raw fluorescence-activated cell-sorting data are plotted with GFP on the x axis and forward scatter on the y axis, with gates drawn to demonstrate how GFP-negative cells are gated. e, Indels of GFP generated by PlmCasX cleavage as analysed by sub-cloning and Sanger sequencing of 20 clones. Three repeats with consistent results. f, Map depicting the target sites for each of the CasX and Cas9 guides on the EGFP coding sequence for Fig. 2h.
Extended Data Fig. 3 Electron microscopy analysis of CasX–guide RNA–DNA ternary complex with a 30-bp target DNA.
a, Target DNA sequence in this ternary complex. b, Electron microscopy analysis pipeline. From 7,500 drift-corrected micrographs, 1,698,815 particles were picked and used for 2D classification. By 2D-based manual screening, 713,219 good particles were selected for 3D classification into 4 classes. From the class that shows the most intact architecture, 363,431 particles were further used for heterogeneous refinement. This generated two reconstructions, state I and state II, which contained 71% and 29% of the particles, respectively. State I and state II were then independently refined to 3.8 Å and 4.2 Å, respectively. c, Euler angle distribution of the refined particles belonging to state I and state II. d, Gold-standard Fourier shell correlation (GSFSC) curve calculated using two independent half-maps. e, The density maps for both states, coloured by local resolution as calculated in cryoSPARC. The resolution ranges from 3 Å to 7 Å. c and d are taken directly from the standard output of cryoSPARC.
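The particle counts in panel b can be turned into per-stage retention fractions with a short calculation. The sketch below uses only the numbers quoted in this legend; the function and variable names are hypothetical, and the per-state particle counts are approximate because the legend reports the split as rounded percentages.

```python
def stage_retention(counts):
    """Return (stage, particles, fraction kept relative to the previous stage)."""
    rows = []
    previous = None
    for stage, n in counts:
        fraction = None if previous is None else n / previous
        rows.append((stage, n, fraction))
        previous = n
    return rows


# Particle counts quoted in the legend for the 30-bp ternary complex.
pipeline_30bp = [
    ("picked", 1_698_815),
    ("after 2D screening", 713_219),
    ("selected 3D class", 363_431),
]

rows = stage_retention(pipeline_30bp)
for stage, n, fraction in rows:
    kept = "" if fraction is None else f" ({fraction:.1%} of previous stage)"
    print(f"{stage}: {n:,}{kept}")

# Approximate particles per state from the rounded 71% / 29% split.
state_split = {"state I": 0.71, "state II": 0.29}
approx_particles = {
    state: round(share * 363_431) for state, share in state_split.items()
}
```

The same function can be applied to the counts given in the other electron microscopy legends by swapping in their numbers.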
Extended Data Fig. 4 Electron microscopy analysis of CasX–guide RNA–DNA ternary complex with full R-loop (45-bp target DNA).
a, Target DNA sequence in this ternary complex. b, Cryo-EM analysis pipeline. From 5,000 drift-corrected micrographs, 1,135,443 particles were picked and used for 2D classification. By 2D-based manual screening, 485,163 good particles were selected for 3D classification into 4 classes. From the class showing better structure preservation, 222,927 particles were further used for heterogeneous refinement; this generated two models, state I and state II, which contained 67% and 33% of the particles, respectively. State I and state II were then independently refined to 3.2 Å and 5.2 Å, respectively. c, The Euler angle distribution for state I and state II. d, GSFSC curve calculated using two independent half-maps. e, Cryo-EM structures of state I and state II coloured by local resolution as calculated in cryoSPARC. The resolution ranges from 3 Å to 7 Å. c and d are standard outputs of cryoSPARC.
a–c, Atomic models and cryo-EM maps (shown with a threshold of 8σ or 9σ) for the CasX ternary complex with 30-bp DNA in state I (a) and state II (b), and for state I of the CasX ternary complex with full R-loop (45-bp DNA) (c). Representative regions of the cryo-EM density for different secondary structure regions are shown. d, Map against model FSCs. e, f, Zoomed-in views of atomic models fitted in electron microscopy densities. Arg917 or Gln920 and the DNA residues within 4 Å distance are linked by dashed lines.
a, OBD (or wedge domain, WED) is shown in aquamarine, helical I (or REC1) domains are shown in yellow, helical II (or REC2) domains are shown in orange, RuvC domains are shown in green, TSL (or Nuc) domains are shown in pink and bridge helices are shown in blue. The NTSB domain in CasX is shown in red, and the PAM interaction (or PI) domain of LbCas12a is shown in purple. Guide RNA and target DNA are shown in grey. Two orientations are presented for each model. b, Overall structure and individual domains of CasX were analysed using the Dali server against the full PDB database. The protein hit with the highest Z-score for each target is shown in the top panel. The hits are marked with protein name and PDB code. The similarity scores between CasX overall structure or domains and Alicyclobacillus acidoterrestris Cas12b (here given as AacCas12b) are extracted from the Dali full PDB analysis and shown in the middle panel. The similarity scores between CasX overall structure or domains and LbCas12a are extracted from the Dali full PDB analysis and shown in the bottom panel. A Z-score above eight indicates a high degree of similarity; a Z-score below eight but above two indicates moderate similarity (usually an irrelevant random match); and a Z-score below two indicates noise. c, The TSL domain and full R-loop structures are subtracted from the ternary complex. Zinc ribbon residues are coloured in blue. The primary sequence across the TSL loop is shown. Tyrosines are marked with teal circles. Positively charged residues are marked with red circles. d, Zinc-finger validation by X-ray fluorescence elemental analysis. Bovine erythrocyte carbonic anhydrase that contains zinc in the active site was used as a positive control. Representative zinc peaks appeared in the purified CasX sample but not in the purified Cas9 sample.
e, Atomic models of DpbCasX, Alicyclobacillus acidoterrestris Cas12b, LbCas12a and SpCas9 (here given as SpyCas9) binary complexes are shown in surface representation. Protein parts are coloured in cyan, and nucleic acids in dark grey. CasX, Alicyclobacillus acidoterrestris Cas12b and SpCas9 require both crRNA and tracrRNA (or a fused sgRNA), whereas LbCas12a uses only crRNA. Guide RNAs are subtracted from the complexes and shown as ribbons in bottom panels independently. The mass ratio of protein to guide RNA is shown on the right. Values of relative mass occupancy for protein and guide RNA within the three binary complexes (protein + guide RNA) are shown. Protein mass occupancies are coloured in cyan, and guide RNA in dark grey. f, CRISPRi efficiency by guide RNA mutation (n = 3, mean ± s.d.). The sequence for the fused sgRNA is shown. tracrRNA, the joint loop, crRNA and spacer region are marked. The sequences for mutated guide RNA are aligned with the original guide RNA sequence and shown. Cas9 is used as a positive control. +, targeting guide; −, non-targeting guide as a negative control. NC, non-complementary CasX guide. WT, complementary wild-type guide for CasX. GFP-disruption efficiency of targeting guide is shown by the GFP signal divided by optical density at 600 nm, compared to the non-targeting guide control.
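The Z-score interpretation given in panel b of this legend (above eight, between two and eight, below two) amounts to a simple threshold rule. The following is a minimal sketch of that rule; the function name is hypothetical, and the treatment of scores exactly equal to 8 or 2 is an assumption, because the legend does not specify the boundaries.

```python
def classify_dali_z(z: float) -> str:
    """Map a Dali Z-score to the similarity categories described in the legend.

    Assumption: a score of exactly 8 is treated as moderate and a score of
    exactly 2 as noise; the legend leaves these boundary cases unspecified.
    """
    if z > 8:
        return "high similarity"
    if z > 2:
        return "moderate similarity"
    return "noise"


# Hypothetical scores, for illustration only (not values from the paper).
example_scores = [12.4, 5.1, 1.3]
labels = [classify_dali_z(z) for z in example_scores]
```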
a, Drift-corrected image of apo-CasX obtained with a 70° phase shift and defocus of 0.5 μm. Scale bar, 50 nm. b, Drift-corrected image of CasX–guide RNA complex with a defocus of −1.5 μm. c, Drift-corrected image of CasX–guide RNA–DNA complex with a defocus of −1.5 μm. Representative reference-free 2D class averages are shown in the bottom panels for the three samples (scale bar for bottom panels, 20 nm). d, Cryo-EM reconstruction of apo-CasX. Three representative orientations are shown with coloured domains. OBD, aquamarine; NTSB, red; helical I, yellow; helical II, orange; RuvC, dark green; TSL, light pink; and bridge helix, blue. e, BS3 cross-linking signals revealed by mass spectrometry for the apo-CasX sample. The two lysines within a cross-linked pair are connected with a purple curve. f, g, As for d and e, for the CasX–guide RNA binary complex. h, i, As for d and e, for the CasX–guide RNA–DNA ternary complex. j, k, Accessibility of target-strand DNA by the RuvC domain in state I and state II. Distance between the TS DNA cleavage region and RuvC active site as calculated using PyMOL is 43.8 Å for state I (j) and 10.9 Å for state II (k).
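The distances in panels j and k were measured in PyMOL; the legend does not say which atoms were used. As a rough, library-free sketch of the underlying geometry, the code below computes the Euclidean distance between coordinates in ångströms and the shortest distance between two groups of atoms. The function names and the coordinates are hypothetical placeholders, not values from the deposited models, and this does not use the PyMOL API.

```python
import math


def distance_angstrom(p, q):
    """Euclidean distance between two (x, y, z) coordinates given in Å."""
    return math.dist(p, q)


def min_distance(group_a, group_b):
    """Shortest distance between any atom in group_a and any atom in group_b."""
    return min(distance_angstrom(p, q) for p in group_a for q in group_b)


# Hypothetical coordinates standing in for atoms near the TS cleavage
# region and the RuvC active site.
ts_region = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ruvc_site = [(3.0, 4.0, 0.0)]
closest = min_distance(ts_region, ruvc_site)
```

With real models, the coordinate lists would be read from the deposited PDB entries for the residues of interest.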
Extended Data Fig. 8 Electron microscopy analysis of CasX–guide RNA–DNA ternary complex with shortened NTS (20-nucleotide NTS and 45-nucleotide TS).
a, Target DNA sequence in this ternary complex. b, Cryo-EM analysis pipeline. From 3,500 drift-corrected micrographs, 801,927 particles were picked and used for 2D classification. By 2D-based manual screening, 369,430 good particles were selected for 3D classification into 4 classes. From the class showing better structure preservation, 181,009 particles were further used for heterogeneous refinement; this generated two models, state I and state II, with 33.6% and 66.4% of the particles, respectively. State I and state II were then independently refined to 4.5 Å and 4.4 Å, respectively, by homogeneous reconstruction. c, The Euler angle distribution of refined particles belonging to state I and state II. d, GSFSC curve calculated using two independent half-maps, indicating an overall resolution of 4.5 Å for state I and 4.4 Å for state II. e, Cryo-EM structures of state I and state II, coloured by local resolution as calculated in cryoSPARC. The resolution ranges from 3 Å to 7 Å. c and d are directly adopted from the standard outputs of cryoSPARC.
a, The representative S200 size-exclusion traces by absorbance at 280 nm for wild-type CasX and for CasX with a deletion of the NTSB domain. SDS–PAGE of wild-type CasX protein and CasX protein with a deletion of the NTSB domain by Coomassie brilliant blue staining is shown in the right panel. b, Comparison of the cleavage activities of wild-type CasX and CasX with a truncation of the NTSB domain on an unwound probe (only the PAM region is base-paired, the rest of the probe is mismatched) and on just a single target DNA strand. All the assays have been repeated three times with consistent results.
a, Proposed overall architecture of apo-CasX. The different protein domains are coloured as in Fig. 3. b, Cryo-EM map of the guide RNA-bound CasX. Upon guide RNA binding, CasX undergoes a domain rearrangement (guide RNA is shown as a grey solid surface). c, Cryo-EM map of the CasX ternary complex in the NTS loading state (state I). Upon target dsDNA recognition and unwinding by the CasX–guide RNA complex, the non-target strand is preferentially positioned into the RuvC active site for cleavage. d, Cryo-EM map of the CasX ternary complex in the TS loading state (state II). After non-target-strand cleavage, the entire RNA–DNA duplex is bent by the TSL domain, thus positioning the target strand into the RuvC active site. e, Cryo-EM map of the CasX ternary complex mimicking a hypothetical trans-active state. After target-strand DNA cleavage, the tension within the bent RNA–DNA duplex favours the return of the CasX ternary complex to state I, thus enabling the RuvC domain to cut any accessible ssDNA. The model shown here corresponds to the CasX ternary complex with a short NTS DNA in state I to mimic the trans-ssDNA cleavage state (the 5′ overhang of TS DNA, which folds back to the RuvC domain, is coloured in blue).
This file contains the uncropped gels and Supplementary Tables 1-2
RNA bubble interacts with the helix-loop-helix fold in State II. CasX ternary complex models in State I and State II were aligned based on overall structure using UCSF Chimera and the two were morphed. The CasX protein is coloured in grey, with the RuvC helix-loop-helix in red. gRNA is cyan, except for the structural bubble shown in blue. The target DNA is not shown for better visualization of the gRNA conformational change.
Protein conformational change between State I and State II. CasX ternary complex models in State I and State II were aligned based on overall structure using UCSF Chimera, then morphed. OBD is aquamarine, NTSB is red, Helical-I is yellow, Helical-II is orange, RuvC is dark green, TSL is light pink and the bridge helix is blue. sgRNA and target DNA are not shown for better visualization of the protein conformational changes.
About this article
Nature Biotechnology (2019)