Structure–function insights into the initial step of DNA integration by a CRISPR–Cas–Transposon complex

Dear Editor, CRISPR (clustered regularly interspaced short palindromic repeats)–Cas (CRISPR-associated genes) surveillance complexes are RNA-based adaptive immune systems employed by prokaryotes against invading nucleic acids from bacteriophages and plasmids. The CRISPR-derived RNAs (crRNAs) guide the Cas effector complex to target and degrade the invading nucleic acids. Recently, bioinformatics analyses have revealed the presence of CRISPR–Cas loci in bacterial Tn7-like transposons, thereby implicating a functional relationship between RNA-guided DNA targeting and transposition, with the latter representing a new role unrelated to host defense. Support for this concept has emerged from recent functional studies on type I-F and type V-K effectors involved in sequence-specific DNA transposition, thereby significantly broadening the potential biological applications of CRISPR–Cas technology. To complement the available functional studies, our efforts have focused on structural studies of the Vibrio cholerae Tn6677 multi-subunit type I-F Cascade–TniQ complex, whereby transposition subunit TniQ initiates DNA transposition with the eventual help of other transposition-associated proteins TnsA, TnsB and TnsC in the gene cluster (Fig. 1a). Here we report cryo-EM structures of a V. cholerae type I-F Cascade in complex with transposition subunit TniQ before (binary Cascade–TniQ) and after (ternary Cascade–TniQ–dsDNA) target double-stranded DNA (dsDNA) binding at average resolutions of 2.9 Å and 3.2 Å, respectively (Supplementary information, Figs. S1 and S2). Cas6 and TniQ can be readily traced in the 2.9 Å structure of the binary complex, thereby providing insights into how the three Cas subunits (Cas8, Cas7, and Cas6) of the multi-subunit type I-F Cascade are assembled in an intertwined helical topology with crRNA and TniQ.
The binary complex reveals a 1:6:1 Cas8:Cas7:Cas6 subunit stoichiometry similar to that reported for type I-F Cascade complexes, but contains a head-to-tail aligned TniQ dimer, whose individual monomers are bound to Cas7.1 and Cas6, so as to facilitate the first step of DNA transposition (Supplementary information, Fig. S3a). The complex is assembled with a 60-nucleotide (nt) crRNA processed by the CRISPR-specific endoribonuclease Cas6 from long precursor CRISPR transcripts (pre-crRNA), which contain 28-nt repeat sequences separated by 32-nt plasmid- or phage-derived spacer sequences. The partially palindromic repeat sequence forms a stable stem-loop structure that is recognized and cleaved by Cas6 (Supplementary information, Fig. S3b). Cas6 binds to the 3′ stem-loop of crRNA, while forming multiple polar interactions with Cas7.1 (Supplementary information, Fig. S3b, left insert). It has been shown that the trimming of pre-crRNA into crRNA is essential for both complex assembly and DNA transposition, given that the H29A mutation, located in the pre-crRNA cleavage pocket, abolishes DNA transposition activity. Following Cas6-mediated maturation, Cas proteins assemble on the crRNA, whose 5′ end is recognized by Cas8 and whose 3′ stem-loop is held by Cas6; the crRNA is kinked by Cas7 thumb motifs in a periodic “5+1” pattern (Supplementary information, Fig. S3c). Our Cascade–TniQ–dsDNA ternary complex contains a G–G/C–C PAM (protospacer adjacent motif) sequence (Fig. 1b), which is required for binding of the target sequence, allowing CRISPR–Cas systems to discriminate between self and non-self. The target sequence contains 12 base pairs (bp) of PAM-proximal duplex DNA and 51 bp of PAM-distal duplex DNA (Fig. 1c; Supplementary information, Fig. S4).
We could observe only 5 bp of PAM-proximal duplex DNA with a 2-nt overhang at the 3′ end of the non-target strand (NTS), and a traceable 32-nt target DNA strand complementary and paired to the 32-nt spacer RNA, but no clear density for the PAM-distal duplex segment, which is adjacent to bound TniQ. The traceable parts of the crRNA–dsDNA are colored, while non-traceable parts are shown in grey (Fig. 1c; Supplementary information, Fig. S4). The G–G/C–C PAM is specifically recognized by Ser127 and Asn246 in Cas8 (Fig. 1d). Arg243 forms a wedge and stacks with PAM G (–1), facilitating the formation of the crRNA–target DNA heteroduplex and displacing the NTS, thus leading to the onset of R-loop formation. It has been proposed that the transposition protein TniQ serves as an important connection between sequence-specific DNA targeting by the Cascade complex and DNA transposition by the accompanying transposase subunits TnsA, TnsB, and TnsC. In line with this concept, we observe that TniQ forms a head-to-tail dimer whose individual monomers simultaneously bind to Cas6 and Cas7.1 of the Cascade complex (Fig. 1b, c), with clear density observed at the interface between TniQ and both Cas6 and Cas7.1. One TniQ interacts with Cas6 via a main chain polar interaction between Val268 and Asn16, whereas the other TniQ interacts with Cas7.1 by multiple polar interactions between side chains over a larger interface (Fig. 1e, inset). We also determined the crystal structure of apo-TniQ at 2.1 Å resolution, demonstrating that TniQ forms a head-to-tail dimer with a dimeric interface of 1931 Å² (Supplementary information, Fig. S5a). The dimeric state was confirmed in solution by SEC-MALS (Supplementary information, Fig. S5b).
Superposition of the TniQ dimer in the cryo-EM structure of ternary Cascade–TniQ–dsDNA complex (in color) with the crystal structure of the apo-form (in grey) reveals minimal conformational changes with an RMSD of 1.19 Å; the loop interacting with Cas6 becomes ordered and the helix interacting with Cas7.1 undergoes slight movement (circled in green, Supplementary information, Fig. S5c). Notably, the binary and DNA-bound ternary Cascade–TniQ complexes superpose with an RMSD of 2.33 Å. Minimal conformational changes are observed on ternary complex formation in the presence of bound target DNA, reflecting slight opening of the entire complex (Fig. 1f) and crRNA (Fig. 1g), especially within the TniQ-bound end.
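
The RMSD values quoted above come from least-squares superposition of matched atoms. The following is a minimal sketch of such a calculation, assuming the Kabsch algorithm and two pre-matched (N, 3) coordinate arrays (for example, equivalent Cα atoms). The superposition software used for the reported values is not stated in the text, so the function name `kabsch_rmsd` and the toy coordinates are illustrative assumptions only.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translational component by centering both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Apply the rotation to P and measure the residual deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy example (hypothetical coordinates, not from the deposited models).
ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
moved = ref + np.array([0.0, 0.0, 0.2]) * np.arange(4)[:, None]
print(f"RMSD = {kabsch_rmsd(ref, moved):.3f} Å")
```

In practice, the result depends on which atoms are matched (Cα only, backbone, or all atoms), so that choice should be reported alongside any RMSD.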


Cryo-EM Sample Preparation and Data Acquisition
A 3.0 µl aliquot of ~0.5 mg/ml purified Cascade–crRNA–TniQ binary or Cascade–crRNA–TniQ–dsDNA ternary complex was applied onto glow-discharged UltrAuFoil 300 mesh R1.2/1.3 grids (Quantifoil). Grids were blotted for 2 s at ~100% humidity and flash-frozen in liquid ethane using an FEI Vitrobot Mark IV. For the Cascade–crRNA–TniQ binary complex, images were collected on an FEI Titan Krios electron microscope operated at an acceleration voltage of 300 kV with a Gatan K3 Summit detector with a 1.08 Å pixel size at the National Cancer Institute's National Cryo-EM Facility. The defocus range was set from -1.0 µm to 2.5 µm. Movies were recorded in super-resolution mode at a dose rate of 14.7 e⁻/Å²/s with a total exposure time of 3.4 s, for an accumulated dose of 50 e⁻/Å². Intermediate frames were recorded every 0.068 s for a total of 50 frames. For the Cascade–crRNA–TniQ–dsDNA ternary complex, images were collected on an FEI Titan Krios electron microscope operated at an acceleration voltage of 300 kV with a Gatan K3 Summit detector with a 1.1 Å pixel size at Memorial Sloan Kettering Cancer Center. The defocus range was set from -1.0 µm to 2.5 µm. Movies were recorded in super-resolution mode at a dose rate of 16.4 e⁻/Å²/s with a total exposure time of 3 s, for an accumulated dose of 49.3 e⁻/Å². Intermediate frames were recorded every 0.075 s for a total of 40 frames.
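
The exposure parameters above are linked by simple arithmetic: accumulated dose is dose rate times exposure time, and the frame interval is exposure time divided by frame count. The following sketch checks the stated values for consistency. The helper name `exposure_summary` is illustrative, and the inputs are the numbers quoted in this section.

```python
def exposure_summary(dose_rate, exposure_s, n_frames):
    """Return (total dose, frame interval, dose per frame).

    dose_rate  : electrons per square angstrom per second
    exposure_s : total exposure time in seconds
    n_frames   : number of movie frames
    """
    total_dose = dose_rate * exposure_s
    frame_interval = exposure_s / n_frames
    dose_per_frame = total_dose / n_frames
    return total_dose, frame_interval, dose_per_frame


datasets = {
    "binary": (14.7, 3.4, 50),   # values quoted for the binary complex
    "ternary": (16.4, 3.0, 40),  # values quoted for the ternary complex
}

for name, params in datasets.items():
    total, interval, per_frame = exposure_summary(*params)
    print(f"{name}: total {total:.2f} e-/A^2, "
          f"{interval:.3f} s per frame, {per_frame:.2f} e-/A^2 per frame")
```

Both datasets come out at roughly 1 e⁻/Å² per frame, and the computed totals agree with the reported accumulated doses to within rounding of the stated dose rates.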

Image Processing
For the Cascade–crRNA–TniQ binary dataset, motion correction was performed with MotionCor2 [3]. Contrast transfer function parameters were estimated with Ctffind4 [4]. All other image processing steps were performed in RELION 3 [5]. Templates for automated particle selection were generated from 2D averages of ~2,000 manually picked particles. Automated particle selection yielded 3,134,195 particles from 3,614 images. After two rounds of 2D classification, a total of 1,423,694 particles were selected for 3D classification using the initial model generated by RELION as reference. Particles corresponding to the best class with the highest-resolution features were selected and subjected to a second round of 3D classification. One of the 3D classes showed good secondary structural features, and its 134,856 particles were polished using RELION particle polishing, yielding an electron microscopy map with a resolution of 2.9 Å after 3D auto-refinement (Supplementary information, Fig. S1). The dataset for the Cascade–crRNA–TniQ–dsDNA ternary complex was processed by the same procedure. Briefly, 3,036,464 particles were autopicked from 5,271 images, and 55,900 particles were selected for the final 3D reconstruction after two rounds of 2D and 3D classification, resulting in a Cascade–crRNA–TniQ–dsDNA ternary complex map with an overall resolution of 3.2 Å (Supplementary information, Fig. S2).
All resolutions were estimated using RELION 'post-processing' by applying a soft mask around the protein density and the Fourier shell correlation (FSC) = 0.143 criterion. Local resolution estimates were calculated from the two half-maps using ResMap [6]. Further details related to data processing and refinement are summarized in Supplementary information, Table S2.
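
To illustrate the FSC = 0.143 criterion, the following is a simplified sketch that computes a Fourier shell correlation curve between two cubic half-maps with NumPy and reports the resolution of the first shell that falls below the threshold. It is not RELION's implementation: it applies no mask, no phase-randomization correction, and no interpolation between shells. The function names `fsc_curve` and `resolution_at_threshold` are assumptions made for this example.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic 3D half-maps.

    Returns one correlation value per integer-radius shell, up to Nyquist.
    """
    half1 = np.asarray(half1, dtype=float)
    half2 = np.asarray(half2, dtype=float)
    if half1.shape != half2.shape or half1.ndim != 3 or len(set(half1.shape)) != 1:
        raise ValueError("half-maps must be cubic 3D arrays of equal shape")

    n = half1.shape[0]
    F1 = np.fft.fftn(half1)
    F2 = np.fft.fftn(half2)

    # Integer frequency index along each axis, then radial shell index.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int).ravel()

    # Accumulate cross term and both power terms per shell.
    cross = np.bincount(shell, weights=(F1 * np.conj(F2)).real.ravel())
    power1 = np.bincount(shell, weights=(np.abs(F1) ** 2).ravel())
    power2 = np.bincount(shell, weights=(np.abs(F2) ** 2).ravel())

    n_shells = n // 2
    denom = np.sqrt(power1[:n_shells] * power2[:n_shells])
    fsc = np.zeros(n_shells)
    np.divide(cross[:n_shells], denom, out=fsc, where=denom > 0)
    return fsc


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in the units of pixel_size) of the first shell, excluding
    the zero-frequency shell, whose FSC is below the threshold.
    Returns None if the curve never drops below the threshold."""
    below = np.nonzero(np.asarray(fsc)[1:] < threshold)[0]
    if below.size == 0:
        return None
    k = int(below[0]) + 1
    return box_size * pixel_size / k
```

For a box of `n` voxels and a pixel size `p`, shell `k` corresponds to a resolution of `n * p / k`, which is the conversion used in `resolution_at_threshold`.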

Atomic Model Building and Refinement
For the Cascade–crRNA–TniQ binary complex, the initial models of Cas8, Cas7, Cas6 and TniQ were manually built in COOT [2], using bulky side chains to establish the sequence register. All models were refined against summed maps using phenix.real_space_refine [1] by applying geometric and secondary structure restraints. For the Cascade–crRNA–TniQ–dsDNA ternary complex, the structure of the Cascade–crRNA–TniQ binary complex was docked into the cryo-EM density map using UCSF Chimera [7] and then manually rebuilt in COOT [2]. Because the density for the TniQ subunits was not good enough to trace all residues, we docked the model obtained from the crystal structure of TniQ. All models were refined against summed maps using phenix.real_space_refine [1] by applying geometric and secondary structure restraints. All figures were prepared with PyMOL (http://www.pymol.org) or Chimera [7]. The statistics for data collection and model refinement are shown in Supplementary information, Table S2.

SEC-MALS Experiments
For protein molar mass determination, purified TniQ proteins were analyzed using an ÄKTA-MALS system. A miniDAWN TREOS multi-angle light scattering detector (Wyatt Technology) and an Optilab T-rEX refractometer (Wyatt Technology) were used in line with a Superdex 200 10/300 gel filtration column (GE Healthcare) pre-equilibrated in buffer B at a flow rate of 0.2 mL/min. Separation and ultraviolet detection were performed on an ÄKTA Pure system (GE Healthcare), light scattering was monitored by the miniDAWN TREOS system, and concentration was measured by the Optilab T-rEX differential refractometer. Molar masses of proteins were calculated using the ASTRA 6.1 program (Wyatt Technology) with a dn/dc value of 0.185 mL/g.
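
The role of the dn/dc value can be illustrated with the basic light scattering relation used in MALS analysis. In the dilute, zero-angle limit, the excess Rayleigh ratio R(0) is approximately K*·c·M, where K* = 4π²n₀²(dn/dc)²/(N_A·λ₀⁴). The sketch below implements only this limiting relation and is not the ASTRA algorithm. The solvent refractive index (1.33), the laser wavelength (658 nm) and the input numbers in the example are assumptions chosen for illustration, not values taken from the experiment.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def optical_constant(dn_dc_ml_per_g=0.185, solvent_n=1.33, wavelength_nm=658.0):
    """Optical constant K* in cm^2 mol / g^2 (CGS units).

    solvent_n and wavelength_nm are assumed defaults for an aqueous buffer
    and a red laser; adjust them to the actual instrument and buffer.
    """
    wavelength_cm = wavelength_nm * 1e-7
    return (4.0 * math.pi ** 2 * solvent_n ** 2 * dn_dc_ml_per_g ** 2
            / (AVOGADRO * wavelength_cm ** 4))


def molar_mass(rayleigh_ratio_per_cm, conc_g_per_ml, **optics):
    """Molar mass (g/mol) from the excess Rayleigh ratio extrapolated to
    zero angle, neglecting the second virial coefficient (dilute limit)."""
    k_star = optical_constant(**optics)
    return rayleigh_ratio_per_cm / (k_star * conc_g_per_ml)


# Hypothetical inputs for illustration only (not measured values).
example_conc = 5.0e-4       # g/mL at the peak apex
example_rayleigh = 1.0e-5   # 1/cm
print(f"Estimated molar mass: {molar_mass(example_rayleigh, example_conc):.3g} g/mol")
```

Because K* scales with (dn/dc)², an error in the assumed dn/dc propagates quadratically into the calculated molar mass, which is why the value used is reported explicitly.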