Optimizing genome editing strategy by primer-extension-mediated sequencing

Efficient and precise genome editing is essential for clinical applications and for generating animal models, and it requires engineered nucleases with high editing ability and low off-target activity. Here we present a high-throughput sequencing method, primer-extension-mediated sequencing (PEM-seq), to comprehensively assess both the editing ability and the specificity of engineered nucleases. Using PEM-seq, we showed that CRISPR/Cas9-generated breaks could lead to chromosomal translocations and large deletions. We also found that Cas9 nickase had lower off-target activity, but with some loss of target cleavage ability. In contrast, high-fidelity Cas9 variants, including both eCas9 and the new FeCas9, significantly reduced Cas9 off-target activity with no obvious loss of editing efficiency. Moreover, we found that the AcrIIA4 inhibitor greatly reduced the activity of Cas9, but off-target loci were not suppressed as effectively as the on-target sites. Therefore, by fully evaluating engineered nucleases, PEM-seq could help choose a better genome editing strategy at given loci than methods that detect only off-target activity.

(b) Editing efficiencies detected by PEM-seq for Cas9 variants targeting RAG1A in HEK293T cells. The target site sequence is listed above, and the red letters indicate the PAM sequence. Error bars, mean ± SD. Two-tailed t-test, *, p < 0.05.
(c) Frequencies of total translocation junctions in 1-kb regions around off-target hotspots for the indicated variants targeting RAG1A in HEK293T cells. The total number of identified off-target hotspots for each Cas9 variant is shown above the bar. Error bars, mean ± SD. Two-tailed t-test, *, p < 0.05; **, p < 0.01.
(d) Scatter plot of RAG1A off-target hotspots for the indicated variants. The y axis shows the frequency of each hotspot per 100,000 editing events (indels plus translocations). Note that the WT libraries presented in (b-d) were prepared independently of those in Figs. 1 and 2. The libraries in this figure had fewer sequencing reads (~60%), and only 38 hotspots were identified, which is still more than with LAM-HTGTS.
(e-g) Editing efficiencies and off-target hotspots for Cas9 variants targeting the EMX1 site in HEK293T cells, depicted as described in the legend to panels (b-d).
Cas9:RAG1A in HEK293T cells with indicated ratios of Cas9 over AcrIIA4. Total numbers of identified off-target hotspots for the indicated samples are shown above the bars. Error bars, mean ± SD. Two-tailed t-test, *, p < 0.05; **, p < 0.01.
(c) Composition of indels and off-target junctions for Cas9:RAG1A libraries with indicated ratios of Cas9 over AcrIIA4. Total junction numbers from three pooled libraries are indicated above the bars. Note that the total bar length is normalized to the same value for each library, although inhibitor-treated samples always contained fewer junctions than untreated ones.
(d-f) Editing efficiencies and off-target hotspots at other Cas9-targeted loci with a 1:1 ratio of Cas9 over AcrIIA4, depicted as described in the legend to panels (a-c). Note that for the 1:1 ratio in panels (a-c), the amounts of plasmid DNA used for transfection of Cas9:RAG1A, AcrIIA4 and blank were 2 µg, 2 µg and 4 µg, respectively, whereas for the 1:1 ratio in panels (d-f), the amounts were 2 µg, 2 µg and 0, which led to a higher transfection efficiency.
(a) Workflow of the PEM-seq pipeline. Raw data were subjected to preprocessing, read alignment and de-duplication with RMB. Unique reads were then categorized into germline, indels and translocations according to their sequence features. See Online Methods for details.
(b) In vitro Cas9 digestion of RAG1A off-target hotspots. The indicated amplified fragments were incubated with purified Cas9 for 20 h. "On" is the RAG1A on-target site. "NC" is a negative control that contains no Cas9:RAG1A target site but can be targeted by Cas9:MYC1. Red triangles indicate the uncleaved fragment, and blue triangles indicate the larger cleaved fragments. Detailed information on these off-targets is shown in Supplementary Table S2.
(c) Circos plots of Cas9:RAG1A libraries with the RAG1A OT6 and OT8 loci as bait in 293T cells. Red triangles indicate the bait sites, red asterisks indicate the detected off-targets, and blue triangles indicate the position of the RAG1A on-target site. Detailed information on these off-targets is shown in Supplementary Table S2.
(a) Frequencies of indels detected by PEM-seq from titrated amounts of raw reads extracted from a Cas9:RAG1A library with a 14-bp or 7-bp RMB. The original Cas9:RAG1A library contained 7.85 million reads. We randomly took 1 M to 7 M raw reads (one sample per 0.5 M) and ran each sample through the PEM-seq "SuperQ" pipeline individually. For the simulation analysis, the 7-bp RMB was obtained by truncating the 14-bp RMB. If the RMB becomes saturated, the frequency of indels increases because of the misassignment of germline reads, as shown by the red line.
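The RMB saturation effect described in the legend above can be illustrated with a toy simulation. A 7-bp barcode allows only 4^7 = 16,384 distinct sequences, whereas a 14-bp barcode allows 4^14 = 268,435,456. The sketch below is not the PEM-seq "SuperQ" pipeline. It assumes that duplicates are collapsed on the pair (event, RMB), that every simulated read is an independent molecule, and that germline reads share a single event label while indels are spread over several labels. The function names `simulate_reads` and `indel_frequency`, the 10% indel fraction and the 50 indel types are hypothetical choices made only for illustration.

```python
import random


def simulate_reads(n_reads, indel_fraction=0.1, n_indel_types=50, seed=0):
    """Toy library (assumption): each read is an independent molecule
    carrying a random 14-bp RMB and one event label."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        if rng.random() < indel_fraction:
            # Hypothetical indel labels; real junctions are far more diverse.
            event = f"indel_{rng.randrange(n_indel_types)}"
        else:
            event = "germline"
        rmb = "".join(rng.choices("ACGT", k=14))
        reads.append((event, rmb))
    return reads


def indel_frequency(reads, rmb_len):
    """Collapse reads on (event, first rmb_len bases of the RMB) and
    return the fraction of unique reads that are indels."""
    unique = {(event, rmb[:rmb_len]) for event, rmb in reads}
    indels = sum(1 for event, _ in unique if event != "germline")
    return indels / len(unique)


if __name__ == "__main__":
    reads = simulate_reads(100_000)
    for depth in (10_000, 50_000, 100_000):
        sample = reads[:depth]  # reads were generated in random order
        print(depth,
              round(indel_frequency(sample, 14), 3),
              round(indel_frequency(sample, 7), 3))
```

Under these assumptions, the 14-bp estimate stays close to the simulated indel fraction at every depth, while the 7-bp estimate rises with depth: germline molecules that happen to share a truncated barcode are collapsed as duplicates, which shrinks the germline count and inflates the apparent indel frequency.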