Comprehensive deletion landscape of CRISPR-Cas9 identifies minimal RNA-guided DNA-binding modules

Proteins evolve through the modular rearrangement of elements known as domains. Extant, multidomain proteins are hypothesized to be the result of domain accretion, but there has been limited experimental validation of this idea. Here, we introduce a technique for genetic minimization by iterative size-exclusion and recombination (MISER) for comprehensively making all possible deletions of a protein. Using MISER, we generate a deletion landscape for the CRISPR protein Cas9. We find that the catalytically-dead Streptococcus pyogenes Cas9 can tolerate large single deletions in the REC2, REC3, HNH, and RuvC domains, while still functioning in vitro and in vivo, and that these deletions can be stacked together to engineer minimal, DNA-binding effector proteins. In total, our results demonstrate that extant proteins retain significant modularity from the accretion process and, as genetic size is a major limitation for viral delivery systems, establish a general technique to improve genome editing and gene therapy-based therapeutics.


Cryo-EM image processing

All steps were performed using RELION-v3.1b unless otherwise indicated [2]. Movies were motion-corrected, exposure-filtered, and Fourier cropped to a pixel size of 0.9 Å using and the initial CTF processing. An initial set of 97,827 particles was picked using the general model of Boxnet2 [4]. These particles were extracted in a 256 pixel box Fourier cropped to 64 pixels (3.6 Å·px⁻¹). Iterative rounds of reference-free 2D classification resulted in 85,327 particles, which were used to generate an ab initio 3D reference by stochastic gradient descent. Particles were re-extracted and upsampled in a 128 pixel box (1.8 Å·px⁻¹) for further processing. Unsupervised 3D classification did not resolve distinguishable classes. Thus, all particles were subjected to 'gold-standard' 3D auto-refinement using a reference low-pass filtered to 25 Å and a soft shape-mask. This yielded a reconstruction at a nominal resolution of 6.4 Å based on the FSC0.143 criterion and using phase-randomization to correct for masking artifacts [5]. This set of particles was then used to train a picking model with Topaz-v0.2.3 [6]. This approach resulted in a set of 288,416 particle coordinates. The new set of particles was extracted in a 128 pixel box (1.8 Å·px⁻¹) and subjected to reference-free 2D classification, which resulted in a set of 167,245 particles. Additional attempts at 3D classification did not resolve distinguishable classes. This final set of particles was used for 3D auto-refinement as described above and resulted in a 6.2 Å reconstruction. Further processing using reference-based fitting of particle motion and CTF parameters did not yield improvements.
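The pixel sizes quoted above follow from simple box-size arithmetic, sketched below. This is an illustration only and not part of the processing pipeline: the function names are hypothetical, and the sketch assumes that the 128 pixel box spans the same physical area as the original 256 pixel box at 0.9 Å·px⁻¹. The Nyquist limit (twice the pixel size) is included to show that neither the 6.4 Å nor the 6.2 Å reconstruction is limited by sampling at 1.8 Å·px⁻¹.

```python
def binned_pixel_size(pixel_size_A: float, box_px: int, cropped_box_px: int) -> float:
    """Pixel size (Å/px) after Fourier cropping a box from box_px to cropped_box_px.

    The physical box edge (pixel_size_A * box_px) is unchanged by cropping,
    so the pixel size scales by box_px / cropped_box_px.
    """
    return pixel_size_A * box_px / cropped_box_px


def nyquist_limit_A(pixel_size_A: float) -> float:
    """Finest resolvable spacing (Å) at a given pixel size: two pixels per period."""
    return 2.0 * pixel_size_A


# Values taken from the text: 0.9 Å/px data, 256 px extraction box.
coarse = binned_pixel_size(0.9, 256, 64)   # box used for initial 2D classification
fine = binned_pixel_size(0.9, 256, 128)    # box used for refinement (assumed same physical size)

print(f"64 px box:  {coarse:.1f} Å/px, Nyquist {nyquist_limit_A(coarse):.1f} Å")
print(f"128 px box: {fine:.1f} Å/px, Nyquist {nyquist_limit_A(fine):.1f} Å")
```

Under these assumptions the 64 pixel box caps resolution at 7.2 Å, which is consistent with re-extracting at 1.8 Å·px⁻¹ before refinement.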

Resolution anisotropy of the final reconstruction was assessed using the 3DFSC web server [7].

The previously published coordinate model for the 5.2 Å cryo-EM structure of SpCas9 ternary

prevent growth biases during library construction and is flanked by BsaI sites for later Golden Gate cloning (here, plasmid pSAH060). Additionally, rather than mutagenic oligos, double-stranded PCR product was used for recombineering, and another cloning step was introduced to remove unmodified plasmids. These modifications are described in Experimental Design. B) Modified Golden Gate cloning generates a library of ligated N- and C-terminal fragments of the target gene, comprehensively producing protein deletion variants as well as duplication variants. An equimolar mixture of the two plasmid libraries is mixed and fully digested to produce free N- and C-terminal fragments of the target gene. This fragment mixture is then re-ligated in the presence of NheI and SpeI. Successful ligation of an N- and C-terminal fragment from differing libraries produces one of two possible 6 base-pair scar sequences. These novel scar sequences are not recognized by either NheI or SpeI, thus trapping the desired chimeric product as a final ligated vector. Because N- and C-terminal fragments are ligated randomly, these chimeric products produce both protein deletions and protein duplications. Ideally the library is both large enough and minimally biased to produce a large fraction of possible variants. The product of this step can be considered a MISER library of plasmid pSAH060. C) A final cloning step moves the MISER library into a desired context, i.e., an expression plasmid, here pSAH063. Step C also allows for size-based exclusion of undesired protein variants by extraction from an agarose gel (

A) Digesting a MISER library with a restriction enzyme that has exactly one site within the plasmid will linearize the majority of plasmids, while plasmids with the site deleted will remain circular. This reaction can then be transformed in order to recover a sublibrary containing deletions from a specific region. B) For example, the restriction enzyme SwaI was used to isolate deletions in the REC2 region. The enzyme recognition site is shown mapped to the sequence of pSAH064, the dCas9 expression plasmid, illustrating the overlap with various sequenced deletions. C) The restriction enzyme KpnI was used to isolate deletions in the REC3 region, as in B. D) The restriction enzyme PciI was used to isolate deletions in the HNH region, as in B. E) Sublibraries containing regional individual deletion variants were re-transformed, and colonies were picked and assayed for CRISPRi activity.

A subset of the most active clones was Sanger sequenced to identify the precise deletion. RuvC deletions could not be isolated by the sublibrary approach, and instead were cloned manually by PCR. Data are plotted as mean ± SD from biological triplicates.
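The logic of the two legends above (random ligation of N- and C-terminal fragments, then recovery of a regional sublibrary by digestion at a unique restriction site) can be sketched as follows. This is a simplified model under stated assumptions, not the authors' analysis code. Each variant is a hypothetical pair (n_end, c_start) in codon coordinates, the site coordinates are invented for illustration, the 6 bp scar is ignored, and a deletion that removes any part of the recognition site is assumed to destroy it (the chance that a junction recreates the site is neglected).

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), hypothetical codon coordinates


def classify_junction(n_end: int, c_start: int) -> str:
    """Classify a ligation product.

    The N-terminal fragment covers codons [0, n_end) and the C-terminal
    fragment covers [c_start, gene_length).
    """
    if c_start > n_end:
        return "deletion"      # codons [n_end, c_start) are missing
    if c_start < n_end:
        return "duplication"   # codons [c_start, n_end) appear twice
    return "full-length"       # no codons lost or repeated


def deleted_interval(n_end: int, c_start: int) -> Optional[Interval]:
    """Return the deleted codon interval, or None if nothing is deleted."""
    return (n_end, c_start) if c_start > n_end else None


def escapes_digest(n_end: int, c_start: int, site: Interval) -> bool:
    """True if the variant has lost the restriction site and so stays circular."""
    deleted = deleted_interval(n_end, c_start)
    if deleted is None:
        return False  # site still present, plasmid is linearized
    return deleted[0] < site[1] and site[0] < deleted[1]  # interval overlap


# Hypothetical example: a unique site spanning codons 300-301.
site: Interval = (300, 302)
variants: List[Tuple[int, int]] = [(180, 308), (250, 260), (310, 305), (301, 400)]

sublibrary = [v for v in variants if escapes_digest(*v, site)]
for v in variants:
    print(v, classify_junction(*v), "kept" if v in sublibrary else "linearized")
```

In this toy example only the two deletions that overlap the site are recovered. The duplication and the deletion outside the site are linearized and lost on transformation, which is why each enzyme enriches deletions from one region.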

Source data are provided as a Source Data file.

CE deletion variants, Library 1. B) Flow cytometry was performed to isolate the most functional CE variants from the "stacked" library described in (A). All highly functional CE variants from Library 1 were found to lack REC2 deletions (sequences of CE variants selected for display on this plot can be found in

BLI traces of Δ3CE and Δ4CE binding to dsDNA show that the relative binding is minimal at 300 nM, even with a 3-bp bubble in the seed region of the target (orange and purple). Subsequently a concentration of 1000 nM was used for these constructs. Dotted lines represent Δ3CE and Δ4CE RNPs interacting with a target without complementary spacers but containing NGG PAMs. Light grey and dark grey traces represent Δ3CE and Δ4CE RNPs, respectively, against dsDNA without a spacer or PAM.

sgRNAs. sgPCNA and sgRPA1, sgRNAs targeting essential genes. Reci, recipient vector for sgRNA cloning. Data represent the mean and standard deviation of triplicates (n=3). E) Immunoblotting for Flag-tagged MISER-dCas9 or WT-dCas9 KRAB fusion proteins stably expressed in U-251 cells co-expressing a non-targeting guide (sgNT1). The indicated MISER deletions result in reduction of protein size. dCas9 AS represents an alternative out-of-frame start-codon derived from the native sequence of the KRAB domain. Beta-actin (ACTB) was used as loading control. Protein ladders indicate reference molecular weight markers in kDa. Experiment was carried out once. F) Competitive proliferation assay as in (D). Note, the indicated sgRNA (sgRPA1-i9) shows stronger depletion with some of the MISER variants when compared to the WT-dCas9 KRAB fusion. Significance in increased cell depletion was assessed by comparing samples to the wild-type control using unpaired, two-tailed t-tests (alpha = 0.01). Data represent the mean and standard deviation of triplicates (n=3). G) Correlation between PAM-proximal A/U content of sgRNAs (5 most proximal bases) and cell depletion efficiency at day 9 of the competitive proliferation assay for the indicated MISER-dCas9 KRAB fusion variants. The scatter plot represents data from sgPCNA-i3/i4/i6 and sgRPA1-i1/i5/i8/i9.

approximately 3 microns defocus with scale indicated and representative reference-free 2D class averages from the Topaz-picked particle set. A total of 3,400 micrographs were collected, of which 2,554 were used (panel B). Diameter of 2D mask is 150 Å in all averages. All cryo-EM data were collected from a single grid. No statistical methods were used to predetermine sample size. B) Single-particle reconstruction workflow as described in methods and orientation distribution of the final reconstruction inset of the insertion site, respectively (lowercase). Six bp were inserted between dCas9 codons, beginning after the target codon. The above example targets the start codon, 'ATG' (bold uppercase). These six bp consisted of recognition sequences for either the restriction enzyme SpeI or NheI (underlined). Flanking primer sequences allowed the amplification of the entire OLS library (italics) using oligonucleotides SAH_284 and SAH_285 (