A panel of eGFP reporters for single base editing by APOBEC-Cas9 editosome complexes

The prospect of introducing a single C-to-T change at a specific genomic location has become feasible with APOBEC-Cas9 editing technologies. We present a panel of eGFP reporters for quantification and optimization of single base editing by APOBEC-Cas9 editosomes. Reporter utility is demonstrated by comparing activities of seven human APOBEC3 enzymes and rat APOBEC1 (BE3). APOBEC3A and RNA binding-defective variants of APOBEC3B and APOBEC3H display the highest single base editing efficiencies. APOBEC3B catalytic domain complexes also elicit the lowest frequencies of adjacent off-target events. However, unbiased deep-sequencing of edited reporters shows that all editosomes have some degree of local off-target editing. Thus, further optimization is required to generate true single base editors and the eGFP reporters described here have the potential to facilitate this process.

functions as a marker for assessing transfection and transduction efficiencies. Single base editing efficiencies are therefore quantified by dividing the fraction of eGFP and mCherry double-positive cells by the fraction of total mCherry-positive cells.
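The ratio described above can be sketched in a few lines of Python. This is an illustrative calculation only: the function name and the cell counts are hypothetical and are not taken from this study. Because both fractions share the same denominator (all cells analyzed), the ratio reduces to a ratio of cell counts.

```python
def editing_efficiency(double_positive: int, mcherry_positive: int) -> float:
    """Return the fraction of mCherry-positive cells that are also eGFP-positive.

    double_positive: number of cells positive for both eGFP and mCherry.
    mcherry_positive: number of all mCherry-positive cells (the
        double-positive cells are a subset of these).
    """
    if mcherry_positive <= 0:
        raise ValueError("no mCherry-positive cells were counted")
    if not 0 <= double_positive <= mcherry_positive:
        raise ValueError(
            "double-positive count must lie between 0 and the "
            "mCherry-positive count"
        )
    return double_positive / mcherry_positive


# Hypothetical flow cytometry counts, for illustration only.
example = editing_efficiency(double_positive=1200, mcherry_positive=8000)
print(f"{example:.1%}")
```

Normalizing to the mCherry-positive population, rather than to all cells, keeps the estimate independent of differences in transfection or transduction efficiency between samples.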
We first tested reporter utility by comparing the efficiencies of single base editing in transiently transfected 293T cells by the established rat APOBEC1 editosome (BE3) 1, the recently reported APOBEC3A and APOBEC3B C-terminal catalytic domain (ctd)-Cas9n-UGI complexes 17, and new editosome constructs for APOBEC3B (full-length), APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, and two naturally occurring variants of APOBEC3H (haplotype I and II) (Fig. 1). This panel spans the entire seven-enzyme human APOBEC3 repertoire. For each editosome complex, efficiencies were highest for the L202 reporter, lower for the L138 reporter, and lowest for the Y93 reporter (Fig. 1a-f, respectively). Moreover, within a given reporter data set, APOBEC3A and APOBEC3Bctd editosomes showed the highest activity, followed by APOBEC3B (full-length), rat APOBEC1, and APOBEC3H-II. All other editosomes showed negligible activity, which may be due in part to poor expression (APOBEC3D), a different dinucleotide editing preference (5′-CC, APOBEC3G), and/or as-yet-unknown reasons. DNA sequencing was not used to analyze these episomal DNA editing events due to a vast excess of non-edited reporter plasmid in each transient transfection reaction.
Next, chromosomal DNA editing efficiencies were compared by transiently co-transfecting each editosome construct and an appropriate eGFP gRNA into 293T cell pools pre-engineered to contain a single copy of each editing reporter by lentivirus-mediated transduction (Fig. 2, Methods). For each editosome, the overall frequencies of eGFP-positive cells were lower than those for transiently transfected reporters, likely due in part to fewer editing substrates per cell (i.e., one versus many). However, relative editing and reporter efficiencies were still similar, with APOBEC3A and APOBEC3Bctd editing more efficiently than full-length APOBEC3B, BE3, and APOBEC3H-II, and the L202 reporter performing better than the L138 and Y93 reporters (Fig. 2a,b).
In fact, Y93 chromosomal data were not shown because eGFP fluorescence rarely rose above background.
Sanger DNA sequencing was then used to assess mutational events in FACS-enriched, eGFP-positive cells. Due to enrichment by FACS (conservatively 85%), we anticipated finding a majority of on-target editing events and a minority of adjacent off-target edits and indels (i.e., additional mutational events within the DNA region analyzed by PCR and sequencing). However, only APOBEC3Bctd showed consistently high frequencies of on-target editing (8/9 for L202 and 13/16 for L138; Fig. 2c). In comparison, APOBEC3A showed fewer on-target editing events than expected, with only 1/6 for the L202 reporter and 9/14 for the L138 reporter (Fig. 2c). Significant numbers of indels were also recovered in APOBEC3A reactions, potentially due to imperfect FACS and/or preferential amplification of shorter DNA fragments by PCR.
These results were confirmed and extended by deep-sequencing the portion of each eGFP reporter that spans the intended editing target site (Methods). First, we noted that the overall frequency of on-target editing events reflects the proportion of eGFP-positive, reporter-activated cells in the overall mCherry-positive cell population (data not shown). Second, we used these unbiased deep-sequencing data sets to ask what frequencies and types of adjacent off-target base substitution mutations are observed alongside the on-target C-to-T editing events (Fig. 2d). Not surprisingly, the highly active APOBEC3A enzyme catalyzed the highest proportion of adjacent off-target events in both the L202 and L138 reporters with, for instance, >50% C-to-T at the position 5 nucleotides upstream of the intended target and high frequencies at other editing sites further upstream. APOBEC3A also caused mutations outside of the gRNA-targeted region (i.e., upstream of the single-stranded DNA in the R-loop created by gRNA annealing), indicating that this upstream DNA can become single-stranded at some frequency through different mechanisms such as transcription or DNA replication. BE3 editosomes also caused significant off-target events both within and upstream of the R-loop, whereas APOBEC3Bctd editosomes caused fewer overall off-target events, and most of these were confined to the 5′-end of the R-loop. In all instances, relatively few off-target mutations were observed downstream of the intended target cytosine. Similar observations have been made previously using BE3 at several different target sites (e.g., refs 1, 18-20).
Full-length APOBEC3B has two canonical deaminase domains, a catalytically active C-terminal domain and an inactive N-terminal domain known to bind RNA 21-23. The higher base editing activity of APOBEC3Bctd in comparison to full-length APOBEC3B suggested that RNA binding might somehow interfere with single base editing (e.g., a bound bulky RNA may prevent the catalytic site from accessing target cytosines in single-stranded DNA). To test this idea directly, we used human APOBEC3H-II, which was recently shown to bind RNA through a basic patch distinct from its DNA editing active site 24,25. Substitution of two adjacent arginines with glutamates (R175E/R176E) disrupts the RNA binding activity of APOBEC3H-II and increases its single-stranded DNA editing activity 24. A comparison of the single base editing activity of APOBEC3H-II editosomes and an otherwise identical R175E/R176E RNA binding mutant showed that the mutant is 3.1- to 5.5-fold more active regardless of whether the reporter is episomal or chromosomal (Fig. 3a,b). Sanger and MiSeq DNA sequencing showed similar levels of on-target editing events for each APOBEC3H editosome complex, but adjacent off-target events occurred at higher frequencies for the hyperactive RNA binding-defective enzyme (Fig. 3c,d). Both constructs also caused indels but at lower frequencies than APOBEC3A (Fig. 3e).

Discussion
This study describes the first fluorescent reporters for real-time quantification of single base editing by APOBEC-Cas9 editosomes in living cells. These eGFP reporters enabled us to perform the first comprehensive analysis of the base editing capabilities of the entire seven-protein human APOBEC3 repertoire. A detailed understanding of why some APOBEC enzymes are highly efficient DNA editors (APOBEC3A and APOBEC3Bctd), some are intermediate (rat APOBEC1, full-length APOBEC3B, and APOBEC3H-II), and others are poor will be important for developing optimized editors for specific fundamental, applied, and biomedical applications. For instance, the RNA binding activity of APOBEC3H is clearly inhibitory and, therefore, strategies to eliminate or lessen this activity without compromising DNA editing activity may be beneficial. Many other variables may also influence single base editing efficiencies, including Cas9 on/off rates, Cas9 endonuclease activity, linker length/composition, construct size, overall editosome solubility, subcellular localization, and as-yet-unidentified cellular factors that interact with APOBEC3 enzymes in human cells (e.g., refs 26-29). Reporter and editosome constructs described here could also be used, among many conceivable applications, to identify active variants of otherwise dead editosomes (reporter-up screen of editosome mutant libraries), variants of existing editosomes with increased single base selectivity (reporter-up screen with Y93 construct that  described here) could also be altered to 5′-CCA, 5′-ACA, or 5′-GCA (or moved to different codon positions as necessary) to screen for mutant editosomes with different di- and tri-nucleotide preferences (e.g., 5′-TC to 5′-CC in ref. 30). The eGFP reporters described here may also be easily adapted for use in a wide variety of different cellular systems (animal, plant, bacterial, parasite, etc.).

Methods
Single base editing reporters. The dual fluorescent HIV-based parental vector was reported 2 (pLenti-CMV-mCherry-T2A-eGFP). Single base editing reporters were made by replacing wild-type eGFP with mutant eGFP PCR products made by overlapping extension high-fidelity PCR with Phusion DNA polymerase (NEB) using primers listed in Supplementary Table 1. Full-length PCR products were gel purified, digested with XhoI and KpnI, and ligated into a similarly cut parental vector. The resulting L202, L138, and Y93 single base editing reporters were confirmed by diagnostic restriction digestions and Sanger sequencing.
The resulting PCR products were cut with NotI and XmaI and used to replace rat APOBEC1 in BE3 (NotI site in MCS and XmaI site in XTEN linker). The gRNAs targeting L202, L138, and Y93 in eGFP or a non-specific (NS) sequence as a control were synthesized as complementary oligonucleotides (Supplementary Table 1)
In a subset of chromosomal editing experiments, eGFP-positive cells were recovered by FACS and used to prepare genomic DNA (Qiagen Gentra Puregene), which was subjected to high-fidelity PCR using Phusion (NEB) to amplify eGFP target sequences. PCR products were gel-purified (GeneJET Gel Extraction Kit, Thermo Scientific) and cloned into a sequencing plasmid (CloneJET PCR Cloning Kit, Thermo Fisher). Sanger sequencing was done in 96-well format (Genewiz) using primers recommended with the CloneJET PCR Cloning Kit (Supplementary Table 1).
To perform MiSeq experiments, eGFP target sequences were amplified using primers in Supplementary Table 1 and Phusion high-fidelity DNA polymerase (NEB). To add diversity to the sequence library, zero, one, or two extra cytosine bases were added to forward and reverse primers for each amplicon. Barcodes were added to generate full-length Illumina amplicons. Samples were analyzed using Illumina MiSeq (University of Minnesota Genomics Center) with 2 × 75-nucleotide paired-end reads. Reads were paired using FLASh 33. Data processing was performed using a locally installed FASTX-Toolkit. Fastx-clipper was used to trim the 3′ constant adapter region from sequences, and a stand-alone script was used to trim 5′ constant regions. Trimmed sequences were then filtered for high-quality reads using the Fastx-quality filter. Sequences with a Phred quality score less than 30 (99.9% base calling accuracy) at any position were eliminated. Preprocessed sequences were then further analyzed using the FASTAptamer toolkit 34. FASTAptamer-Count was used to determine the number of times each sequence was sampled from the population. Each sequence was then ranked and sorted based on overall abundance, normalized to the total number of reads in each population, and directed into FASTAptamer-Enrich. FASTAptamer-Enrich calculates the fold enrichment ratios from a starting population to a selected population by using the normalized reads-per-million (RPM) values for each sequence. Sequences at abundances lower than 5 RPM in the A3-editosome samples were discarded. For reporter and A3-editosome comparisons, sequences that appeared only in the A3-containing samples (with an RPM value over 5) or that occurred at a frequency below 5 RPM in the No-editor control were included for analysis.
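The filtering and normalization logic described above can be illustrated with a minimal Python sketch. It is not the FASTX-Toolkit or FASTAptamer code and is not the pipeline used in this study. All function names are hypothetical, the Phred+33 quality encoding is an assumption, and the 5 RPM cutoff is treated as inclusive, which is one reading of the text.

```python
from collections import Counter

PHRED_OFFSET = 33   # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_PHRED = 30      # reads with any base below this score are dropped
MIN_RPM = 5.0       # abundance cutoff in reads per million


def passes_quality(quality_string, min_phred=MIN_PHRED):
    """True if every base in the read has a Phred score >= min_phred."""
    return all(ord(ch) - PHRED_OFFSET >= min_phred for ch in quality_string)


def reads_per_million(sequences):
    """Count identical sequences and normalize the counts to RPM."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def select_edited_sequences(editor_rpm, control_rpm, min_rpm=MIN_RPM):
    """Keep sequences that reach min_rpm in the editosome sample and are
    either absent from the no-editor control or below min_rpm there."""
    return {
        seq: rpm
        for seq, rpm in editor_rpm.items()
        if rpm >= min_rpm and control_rpm.get(seq, 0.0) < min_rpm
    }


# Hypothetical RPM tables, for illustration only.
editor = {"seqX": 10.0, "seqY": 3.0, "seqZ": 20.0, "seqW": 8.0}
control = {"seqZ": 50.0, "seqW": 2.0}
print(sorted(select_edited_sequences(editor, control)))
```

Normalizing to RPM before comparing samples makes abundance thresholds comparable between libraries of different sequencing depth.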
Immunoblots. 1 × 10^6 cells were lysed directly into 2.5x Laemmli sample buffer, separated by 8% SDS-PAGE, and transferred to PVDF-FL membranes (Millipore). Membranes were blocked in 5% milk in PBS and incubated with primary antibody diluted in 5% milk in PBS supplemented with 0.1% Tween 20. Secondary antibodies were diluted in 5% milk in PBS supplemented with 0.1% Tween 20 and 0.01% SDS. Membranes were imaged with a LI-COR Odyssey instrument. Primary antibodies used in these experiments were rabbit anti-Cas9 (Abcam ab204448) and mouse anti-HSP90 (BD Transduction Laboratories 610418). Secondary antibodies used were goat anti-rabbit IRDye 800CW (LI-COR 827-08365) and goat anti-mouse Alexa Fluor 680 (Molecular Probes A-21057). Relevant regions of each immunoblot are shown in Figs 1 and 3, and full images are provided in the supplement.

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.