Multiplexed Cas9 targeting reveals genomic location effects and gRNA-based staggered breaks influencing mutation efficiency

Understanding the impact of guide RNA (gRNA) and genomic locus on CRISPR-Cas9 activity is crucial for designing effective gene editing assays. However, it is challenging to profile Cas9 activity in the endogenous cellular environment. Here we leverage our TRIP technology to integrate ~1k barcoded reporter genes in the genomes of mouse embryonic stem cells. We target the integrated reporters (IRs) using RNA-guided Cas9 and characterize induced mutations by sequencing. We report that gRNA sequence and IR locus explain most variation in mutation efficiency. Predominant insertions of a gRNA-specific nucleotide are consistent with template-dependent repair of staggered DNA ends with 1-bp 5′ overhangs. We confirm that such staggered ends are induced by Cas9 in mouse pre-B cells. To explain the observed insertions, we propose a model in which Cas9 generates primarily blunt and occasionally staggered DNA ends. Mutation patterns indicate that the gRNA sequence controls the fraction of staggered ends, which could be used to optimize Cas9-based insertion efficiency.


Supplementary Figure: Correlation of expression levels at IR loci in mES TRIP cells (IR expression) and at the corresponding endogenous loci in wild-type mES cells (WT expression), with TGE features of wild-type mES cells. Note: "IR expression" is the same as "IR expression" or "Reporter expression" in Figure 3D. "WT expression" is the same as "Gene expression" in Figure 3D.
[Figure. Scale: Pearson's correlation. Features: Replication timing, Lamin-B1, H3K27me3, H3K9me3, H3K9me2, HiC alpha, H4K20me3, MacroH2A, 5hmC, H3K36me3, H3K9ac, H3K4me3, H3K79me2, DnaseI hypersensitivity, H3K4me2, H3K4me1, H3K27ac, AT content, Gene proximity, Insertion density, TTS proximity, TSS proximity, GC content, CpG island proximity, Ser2P, Tbp, Mcaf1, Ser5P, Ser7P, Taf3, Med1, Med12, Mtf2, Cbx7, Suz12, Yy1, Kap1, Atrx, Ring1b, Ezh2, Jarid2, Mbd3, Rcor1, Hdac1, Jaridb1, Smc3, Hdac2, Ctcf, Setdb1, Dpy30, Smc1, Chd4, P300, Ctr9, Jarid1a, Lsd1, Tbx3, Stat3, Luzp1, Nanog, Tcf3, Sox2, cMyc, Smad3, Smad2, p]

To exclude larger variations in expression due to the integration of exogenous DNA and/or quantification methods, we selected two groups of integration loci showing consistently high or low expression both in wild-type (WT) mES cells and in our mES TRIP pool (IR). We defined high and low expression as values in the top and bottom 25% of the expression distributions, respectively. The result of this selection based on the WT and IR expression value distributions is depicted in Supplementary Figure 2. This figure shows some variability in IR (or WT) expression when selecting the IR groups based on high/low WT (or IR) expression only, but a very clear separation between the expression distributions of the high/low IR groups based on both WT and IR expression. If the correlations we observed between expression and Cas9-induced mutation frequency had been heavily affected by the integration of exogenous DNA, our selection should increase the difference in IR mutation frequencies between the groups of IRs with consistently high and low WT and IR expression levels, relative to the groups selected based only on WT or IR expression.
Supplementary Figure 3 shows that we could not confirm this expectation: the selection based on consistent WT and IR expression did not improve the association between mutation frequency and IR expression reported in the main manuscript (Figure 3D). These results show that IRs with similar expression levels exhibited large variation in mutation frequency, and indicate that any changes induced by the integration of TRIP reporters were likely not the main source of such variation.
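The group selection described above can be illustrated with a minimal Python sketch. It is not the analysis code used for this study: the function names, the dictionary-based input format and the use of the "inclusive" quartile definition are assumptions made for illustration only.

```python
from statistics import quantiles


def quartile_groups(expr):
    """Split IR identifiers into bottom-25% and top-25% sets.

    `expr` maps a (hypothetical) IR identifier to an expression value.
    The quartile definition ("inclusive") is an assumption of this sketch.
    """
    q1, _, q3 = quantiles(list(expr.values()), n=4, method="inclusive")
    low = {ir for ir, value in expr.items() if value <= q1}
    high = {ir for ir, value in expr.items() if value >= q3}
    return low, high


def consistent_groups(wt_expr, ir_expr):
    """Keep only IRs that are low (or high) in BOTH WT and IR expression."""
    wt_low, wt_high = quartile_groups(wt_expr)
    ir_low, ir_high = quartile_groups(ir_expr)
    return wt_low & ir_low, wt_high & ir_high


# Toy values for illustration only (not measured data).
wt = {f"IR{i}": float(i) for i in range(1, 9)}
ir = {"IR1": 1.0, "IR2": 5.0, "IR3": 2.0, "IR4": 4.0,
      "IR5": 3.0, "IR6": 6.0, "IR7": 8.0, "IR8": 7.0}
low, high = consistent_groups(wt, ir)
```

Mutation frequencies can then be compared between the two returned sets and between groups defined on WT or IR expression alone.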

Supplementary Note 2 - Assessing the influence of Cas9 concentration on mutation frequency and patterns
To assess whether the Cas9 concentration would impact mutation frequency and/or patterns, we analyzed data from assays on 5 populations of cells sorted by different mCherry intensities (samples 1 to 5, with increasing Cas9 concentrations). These were intermediate assays used to fine-tune the experimental protocol, and thus did not cover all the conditions of the experiments described in the main manuscript. Specifically, we used a population of mES cells containing varying TRIP integrations with a single promoter, PGK. We targeted the IR loci in samples 1-5 using Cas9 complex. We had lower sequencing depth per IR than for the cell line and multi-promoter TRIP assays analyzed in the main manuscript (Supplementary Figure 4). Changes in frequencies were very subtle and mostly noticeable only between extreme Cas9 concentrations (e.g. sample 5 versus the rest). We observed that both deletion and insertion frequencies generally increased with Cas9 concentration. At the highest tested Cas9 concentration (sample 5), insertions became on average slightly more frequent than deletions. In this regard, we note that higher Cas9 concentrations resulted in less sequenced material and lower read counts per IR, possibly due to cell death by toxicity or damage to the DNA caused by extensive Cas9 cleavage. This could also bias the data towards less damaging mutations (e.g. cells carrying 1-bp insertions could survive better than those affected by larger deletions). The 5 samples with different Cas9 concentrations showed similar indel size distributions and patterns (Supplementary Figure 8), corroborating the results in the main manuscript (Figures 4A, 4B, 4D). The most common deletion sizes were 2, 4 and 5 bp (Supplementary Figure 6), and 1 bp was by far the most frequent insertion size (Supplementary Figure 6), as seen for the TRIP cell line and multi-promoter pools (Figure 4A).
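As an illustration of how insertion and deletion frequencies and indel size distributions can be tallied per sample, the following Python sketch summarises a list of classified reads. The input format (one `(kind, size)` tuple per read) and the function name are hypothetical, and read alignment and indel calling are assumed to have been done beforehand.

```python
from collections import Counter


def indel_profile(reads):
    """Summarise reads given as (kind, size) tuples.

    kind is 'wt', 'ins' or 'del'; size is the indel length in bp (0 for 'wt').
    This input format is an assumption of the sketch, not the study's format.
    """
    total = len(reads)
    if total == 0:
        raise ValueError("no reads to summarise")
    ins_sizes = Counter(size for kind, size in reads if kind == "ins")
    del_sizes = Counter(size for kind, size in reads if kind == "del")
    return {
        "insertion_freq": sum(ins_sizes.values()) / total,
        "deletion_freq": sum(del_sizes.values()) / total,
        "ins_sizes": ins_sizes,  # insertion size -> read count
        "del_sizes": del_sizes,  # deletion size -> read count
    }


# Toy reads for illustration only (not measured data).
toy_reads = [("wt", 0)] * 4 + [("ins", 1)] * 3 + [("del", 2)] * 2 + [("del", 5)]
profile = indel_profile(toy_reads)
```

Running the same summary on each sorted population would allow frequencies and size distributions to be compared across Cas9 concentrations.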
The ten most frequent deletions obtained for samples 1-5 (Supplementary Figure 8, bottom right) also mostly agreed with the results reported for the TRIP cell line (Figure 4B), both in ranking and relative frequency (note that the deleted regions in Supplementary Figure 8 are represented from left to right in decreasing order of frequency according to Figure 4B). Samples 1 and 2 showed more extreme values than the remaining three samples, namely for the frequency of CGTAT (rank 1 in Figure 4B) and TATGCG (rank 7 in Figure 4B). We note that each of these deletions represented a small proportion of the total and that the ratio of mutated reads in these samples was much smaller than in samples 3-5. Therefore, some variability was expected. Finally, the most inserted nucleotide in 1-bp insertions was overwhelmingly T, accounting for over 93% of 1-bp insertions in all 5 samples (Supplementary Figure 8, bottom left). This observation matched the TRIP cell line data as well (Figure 4D).
We also note that the loci in these studies are typically involved in tumor-associated translocations and thus could be more likely to translocate than a random pair of loci. Targeting tens of loci per cell could increase the frequency of translocations in general, but such events should remain rather rare for random pairs of loci. In addition, only a fraction of translocation events would lead to the ligation of an upstream to a downstream region of two DSBs. We could not check for this type of translocation, since our sequenced amplicons covered only a region of the IR gene body (EGFP cassette), including the expected DSB site, along with the IR promoter index and the IR barcode downstream, but not the promoter itself located upstream of the IR gene (Supplementary Figure 7). The reason for this was that the complete IR sequence was 1.8-3.4kb in length, which would be challenging to amplify reliably and consistently across ~1k IRs.

Supplementary Figure 7: Reporter construct and location of amplification primers for assessment of IR-promoter misassociation.
[Figure labels: Cas9-induced DSB site; Upstream of DSB; Downstream of DSB]

Importantly, our results on mutation frequencies and patterns were largely consistent with other studies targeting only a single to a few loci per cell. Moreover, our estimated IR mutation frequencies correlated strongly across experiments (Figure 2D), which we would not expect if each TRIP mES cell population had suffered a significant number of translocations following Cas9 targeting.
As an additional check, we investigated the occurrence of translocation events for 20 randomly chosen IRs using PCR amplification and Sanger sequencing. For this purpose, we sought to obtain longer amplicons spanning the IR-specific elements upstream and downstream of the expected Cas9 target site (Supplementary Figure 7). The promoter sequence was the most proximal IR-specific feature located upstream of the EGFP cassette in the IRs. We therefore designed primers to amplify a DNA region extending from the end of the 5'-TR sequence of the IR gene (Supplementary Figure 7 and Supplementary Table 2: 5'-TR-amp primer, reporter-specific primer just upstream of the promoter) to a portion of the genomic sequence located less than 1kb downstream of the IR gene (Supplementary Figure 7 and Supplementary Table 2, genomic primers 1-20). The amplification products had a total length between 2kb and 3.9kb and spanned the following elements: 183bp of 5'-TR, promoter, EGFP cassette, promoter index, barcode, 3'-TR, and a small fraction of genomic DNA downstream (Supplementary Figure 7). We used this approach to check whether the locus-specific genomic region downstream of the DSB remained associated with the correct promoter after Cas9 targeting. A match to a promoter other than the expected one would indicate a translocation.
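The logic of this promoter check can be sketched in Python as follows. This is a simplified illustration rather than the procedure applied to the Sanger data: the function name and the toy sequences are hypothetical, and exact substring matching stands in for a proper alignment that would tolerate sequencing errors.

```python
def check_promoter(amplicon, promoters, expected):
    """Classify an amplicon by the promoter sequence it contains.

    Returns 'correct' if only the expected promoter is found, 'mismatch' if a
    different (or an additional) promoter is found, which would be compatible
    with a translocation, and 'unassigned' if no promoter is found (e.g. after
    nonspecific amplification). Exact matching is a simplifying assumption.
    """
    found = [name for name, seq in promoters.items() if seq in amplicon]
    if not found:
        return "unassigned"
    return "correct" if found == [expected] else "mismatch"


# Toy sequences for illustration only; these are not real promoter sequences.
toy_promoters = {"PGK": "ACGTACGT", "other_promoter": "TTGGCCAA"}
ok = check_promoter("GGG" + "ACGTACGT" + "CCC", toy_promoters, "PGK")
swapped = check_promoter("GGG" + "TTGGCCAA" + "CCC", toy_promoters, "PGK")
empty = check_promoter("GGGCCC", toy_promoters, "PGK")
```

Amplicons classified as 'unassigned' carry no information about translocations and would be excluded from the tally.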
We experienced major difficulties in obtaining amplification product of sufficient quality for sequencing (see section "Amplification issues" below). Moreover, four of the twenty DNA regions yielded amplification product, but the corresponding sequences did not match any reporter elements upstream or downstream of the expected DSB site, besides a short region surrounding the 5'-TR or genomic primer (Supplementary Table 1). This was likely caused by nonspecific primer binding, since proper amplification of the targeted IR loci using the designed primers should at least contain the EGFP reporter sequence downstream of the expected DSB site. We were able to amplify and sequence five of the twenty IR loci. The promoter was correct for all five IRs with valid sequences (Supplementary Table 1). Although we cannot exclude the possibility of translocations, our data do not suggest that such events were frequent or that they had a major impact on our results and conclusions.
Amplification issues
As expected, the large IR DNA sequences were challenging to amplify. Amplification was further complicated by the predominance of low-abundance IRs, since most IRs were present in only a fraction of the cells in the population. Moreover, by design we had to use a combination of locus-specific (genomic DNA) and reporter-specific (reporter DNA) primers, so as to amplify the DNA at the desired location only when it contained a reporter, but regardless of whether the region upstream of the expected DSB matched the expected one. This meant that at least one of the primers had to target one of the TRs, which were the only reporter elements upstream of the promoter or downstream of the unique barcode. However, TR sequences are abundant in the genomic DNA of mES cells, making it difficult to design primers that would specifically recognize the intended sequence. As a result, nonspecific primer binding led to nonspecific and noisy sequence data. To mitigate nonspecific binding, we also designed longer primers than usual (32bp).