A scalable CRISPR/Cas9-based fluorescent reporter assay to study DNA double-strand break repair choice

Double-strand breaks (DSBs) are the most toxic type of DNA lesion. Cells repair these lesions using either end protection- or end resection-coupled mechanisms. To study DSB repair choice, we present the Color Assay Tracing-Repair (CAT-R) to simultaneously quantify DSB repair via end protection and end resection pathways. CAT-R introduces DSBs using CRISPR/Cas9 in a tandem fluorescent reporter, whose repair distinguishes small insertions/deletions from large deletions. We demonstrate CAT-R applications in chemical and genetic screens. First, we evaluate 21 compounds currently in clinical trials that target the DNA damage response. Second, we examine how 417 factors involved in the DNA damage response influence the choice between end protection and end resection. Finally, we show that impairing nucleotide excision repair favors error-free repair, providing an alternative way to improve CRISPR/Cas9-based knock-ins. CAT-R is a high-throughput, versatile assay to assess DSB repair choice, which facilitates comprehensive studies of DNA repair and drug efficacy testing.

(d) Example of the gating strategy for flow cytometry data acquisition and analysis.
(e) Scatter dot plots (HEK293, N = 8; RPE-1, N = 1; mean ± s.d.) show the cell cycle profiles of untreated HEK293 CAT-R and RPE-1 CAT-R cells. A two-tailed t-test was used to compare the cell lines in every cell cycle phase (*p < 0.001). N represents the number of independent experiments.
(f) Bar plots (N = 4, mean ± s.d.) showing the cell cycle profiles of sorted HEK293 CAT-R cells. Cells transfected with gRNA:eGFP were sorted by CAT-R phenotype (small InDels and large deletions). For cell cycle analysis, these populations were incubated with 10 μM EdU for 1.5 h at 37 °C and then processed according to the Click-iT protocol. N represents the number of independent experiments.
(g) The ratio of small InDels (SID) to large deletions (LAD) upon DSB induction is unaffected by experimental conditions. Data are derived from a minimum of 56 independent experiments; n represents the number of all replicates.
(h) Flow cytometry analysis of HEK293 CAT-R cells 72 h post-transfection with six different synthetic gRNA complexes targeting the eGFP coding sequence (N = 3, min to max, showing all points with median value). *p ≤ 0.05 versus WT control, ANOVA followed by Dunnett's multiple comparison test. All individual p-values are included in Source Data file 1. N represents the number of independent experiments.
(i) Box and whisker plots (N = 4, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) show the short-read sequencing data for small InDels, representing the inserted nucleobase types at the target site. N represents the number of independent experiments.
(j) Microhomology profiles based on short-read sequencing data for small InDels.
(k) Fluorescence-activated cell sorting (FACS) gating strategy for downstream analysis. Samples with mCherry+/eGFP- indicated the small InDels population, whereas samples with mCherry-/eGFP- corresponded to the large deletions population.
(b) Box and whisker plots (n Untreated = 6, n Niraparib = 2, n Rucaparib = 2, n Talazoparib = 2, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) showing the dose-dependent colony formation assay of PARPi. Two concentrations of PARPi were assayed in RPE-1 and RPE-1 TP53, BRCA1 KO cell lines. Cells were uniformly seeded at low density (250 and 1,000 cells, respectively) in individual wells of a standard 6-well plate and grown for 12 days in normal serum medium in the presence of PARPi. Colonies were visualized by crystal violet staining and quantified with ImageJ. N represents the number of independent experiments.
(c) Box and whisker plots (n End-protection = 10, n End-resection = 16, n NER = 6, n HRR = 36, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) of flow cytometry analysis for the HEK293 CAT-R cells. Values are normalized to wildtype (WT) control, and genes are grouped by their relevant pathway. *p ≤ 0.05 versus WT control, ANOVA followed by Dunnett's multiple comparison test. Data are derived from 2 independent experiments; n represents the number of all replicates.
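The gating logic of panel (k) and the SID:LAD ratio can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the threshold values, the function names, and the "other" category for events outside the two described gates are all assumptions made for illustration. In practice, positivity thresholds would be set from appropriate controls.

```python
from collections import Counter

# Hypothetical positivity thresholds in arbitrary fluorescence units.
# These numbers are illustrative only and are not taken from the paper.
MCHERRY_THRESHOLD = 1000.0
EGFP_THRESHOLD = 1000.0


def classify_event(mcherry, egfp):
    """Assign one flow cytometry event to a CAT-R phenotype gate."""
    mcherry_pos = mcherry >= MCHERRY_THRESHOLD
    egfp_pos = egfp >= EGFP_THRESHOLD
    if mcherry_pos and not egfp_pos:
        return "small_indel"      # mCherry+/eGFP-
    if not mcherry_pos and not egfp_pos:
        return "large_deletion"   # mCherry-/eGFP-
    return "other"                # any eGFP+ event; not assigned here


def summarize(events):
    """Return per-gate fractions and the small InDel to large deletion ratio."""
    events = list(events)
    if not events:
        raise ValueError("no events to summarize")
    counts = Counter(classify_event(m, g) for m, g in events)
    total = len(events)
    fractions = {
        gate: counts[gate] / total
        for gate in ("small_indel", "large_deletion", "other")
    }
    lad = counts["large_deletion"]
    ratio = counts["small_indel"] / lad if lad else float("nan")
    return fractions, ratio
```

Each event is passed as an `(mCherry, eGFP)` intensity pair, for example `summarize([(5000, 10), (5, 5)])`. The ratio is reported as NaN when no large-deletion events are present rather than raising an error.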
(d) Schematic of the potential Color Assay Tracing-Repair (CAT-R) outcomes after a Cas9-mediated double-strand break in the presence of a single-stranded oligodeoxynucleotide (ssODN) donor template. The ssODN bears two mutations that change the amino acid from Proline to Alanine, so that BFP is produced instead of GFP, indicating a knock-in event via the single-strand template repair pathway. The asymmetric design of the donor templates is also illustrated.
(e) Representative flow cytometry analysis plots of HEK293 CAT-R cells 72 h post-transfection with the synthetic gRNA and the ssODN. Numbers shown inside plots indicate percentages of live cells. Axes report relative fluorescence intensity in arbitrary units. The conversion from GFP to BFP is quantified based on the wildtype (WT) control.
(f) Box and whisker plots (n ssODN+gRNAa = 15, n ssODN+gRNAb = 9, n ssODN(L)+gRNAb = 60, n ssODN(L)+gRNAa = 18, n ssODN(R)+gRNAb = 18, n ssODN(R)+gRNAa = 9, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) of flow cytometric analysis for HEK293 CAT-R cells transfected with the synthetic gRNA and the different ssODN templates. *p ≤ 0.05 versus WT control, ANOVA followed by Dunnett's multiple comparison test. All individual p-values are included in Source Data file 3. Data are derived from 4 independent experiments; n represents the number of all replicates.
(g) Box and whisker plots (N = 3, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) showing expression levels in CRISPR/gRNA-transfected cells 72 h post-transfection with synthetic gRNAs targeting ERCC5 and ERCC8, validated by qPCR. Values are normalized to wildtype (WT) control. N represents the number of independent experiments.
(i) Box and whisker plots (n WT = 79, n ERCC1 = 9, n XPA = 9, n XPC = 9, centerlines mark the medians, box limits indicate the 25th and 75th percentiles, and whiskers extend to min and max, showing all points) presenting the frequency of GFP-to-BFP conversion using an asymmetric ssODN template in a mixed pool of CRISPR/gRNA-transfected cells. Values are represented as log2 fold change relative to wildtype (WT) control. *p ≤ 0.05 versus WT control, ANOVA followed by Dunnett's multiple comparison test. All individual p-values are included in Source Data file 3. Data are derived from 3 independent experiments; n represents the number of all replicates.
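As an illustration of the log2 fold change normalization mentioned in panel (i), the following Python sketch expresses each replicate's conversion frequency relative to the WT control. Using the mean of the WT replicates as the reference is an assumption: the legend does not state whether the mean, the median, or a per-experiment control was used. The function name and inputs are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(sample_values, wt_values):
    """Return log2(value / mean WT) for each replicate in sample_values.

    Assumes all frequencies are strictly positive, since the log of zero
    or of a negative ratio is undefined.
    """
    wt_mean = mean(wt_values)
    if wt_mean <= 0:
        raise ValueError("WT reference must be positive")
    if any(v <= 0 for v in sample_values):
        raise ValueError("sample values must be positive")
    return [math.log2(v / wt_mean) for v in sample_values]
```

A value of 0 means no change relative to WT, positive values indicate a higher GFP-to-BFP conversion frequency than WT, and negative values a lower one.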

Supplementary Figure 7. Expanding the utility of CAT-R.
(a) Workflow for generating a cell line stably expressing CAT-R using the AAVS1 safe harbor targeting system.