Genome Editing With Targeted Deaminases

Precise genetic modifications are essential for biomedical research and gene therapy. Yet traditional homology-directed genome editing is limited by its requirements for DNA cleavage, a donor DNA template and the endogenous DNA break-repair machinery. Here we present programmable cytidine deaminases that enable site-specific cytidine-to-thymidine (C-to-T) genomic edits without the need for DNA cleavage. Our targeted deaminases are efficient and specific in Escherichia coli, converting a genomic C to T with 13% efficiency and 95% accuracy. Edited cells do not harbor unintended genomic abnormalities. These novel enzymes also function in human cells, leading to a site-specific C-to-T transition in 2.5% of cells with reduced toxicity compared with zinc-finger nucleases. Targeted deaminases therefore represent a platform for safer and effective genome editing in prokaryotes and eukaryotes, especially where double-strand breaks (DSBs) are toxic, as in human stem cells and in the targeting of repetitive elements.

efficiency in vivo, we integrated a single-copy GFP reporter into the E. coli genome 19 (Fig. 1b and Supplementary Methods 2) in which the GFP is normally not expressed due to a 'broken' start codon ('ACG'). Correction of the genomic ACG to ATG by targeted deamination would restore GFP protein expression, thereby producing GFP-positive cells quantifiable by flow cytometry. Among the four chimeric deaminases we tested, ZF-AID induced comparatively more robust correction efficiency (Fig. 1c). We confirmed the intended ACG-to-ATG conversion in 20/20 randomly chosen GFP+ bacterial colonies, as assessed by Sanger sequencing. Therefore, ZF-AID introduces C→T mutations at the locus specified by the fused DNA-binding module.
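The reporter logic described above can be sketched in a few lines of Python. This is only an illustration: the function names `deaminate` and `is_translatable` are hypothetical, and the biochemistry is simplified (deamination actually yields uracil, which is read as thymine after replication).

```python
def deaminate(seq: str, pos: int) -> str:
    """Return seq with the cytosine at 0-based index pos read as thymine.

    Simplification: the C->U->T path is collapsed into a single C->T step.
    """
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not a cytosine")
    return seq[:pos] + "T" + seq[pos + 1:]


def is_translatable(orf: str) -> bool:
    """A reporter ORF is treated as expressed only if it begins with ATG."""
    return orf.startswith("ATG")


broken_codon = "ACG"                   # 'broken' start codon from the text
fixed_codon = deaminate(broken_codon, 1)

print(is_translatable(broken_codon))   # False: GFP not expressed
print(is_translatable(fixed_codon))    # True: GFP expression restored
```

Only a C→T change at the middle position of the codon restores the start codon, which is why GFP rescue serves as a readout for editing at that specific base.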
Our targeted deaminases are modular, such that the DNA-addressing component is interchangeable with unrelated DNA-binding domains. To demonstrate this, we developed a TALE-AID fusion (recognizing a different binding sequence, 5'-TCACGATTCTTCCC-3' 20) and a corresponding reporter E. coli strain (Fig. 2c). Induction of TALE-AID for 10 hours led to GFP expression in 0.02% of the reporter population, lower than that from ZF-AID but nonetheless significantly higher than with TALE or AID expression alone (t-test, two-tailed, P (TALE-AID, TALE) = 0.0069, P (TALE-AID, AID) = 0.0186; n=4) (Fig. 1d). Importantly, GFP expression is dependent on correct sequence recognition, because TALE-AID and ZF-AID do not induce GFP expression in reporter E. coli cells lacking the cognate target sequences (Fig. 1d). Additional target DNA sequences do not increase editing efficiency, suggesting that a single ZF-AID or TALE-AID is sufficient for editing (Supplementary Data 1 and Supplementary Fig. 1). Thus, ZF-AID and TALE-AID convert C to T at sequence-defined genomic loci.
The results demonstrated the feasibility of using targeted deaminases for genome editing, but editing efficiency was low. We reasoned that the endogenous uracil repair pathways could reverse the targeted deamination, which would limit the desired C-to-T conversion. Therefore, we knocked out mutS and ung (Supplementary Method 2), two genes critical for uracil repair. Editing by ZF-AID increased to 0.5% (5-fold) in the ΔmutS knockout, and to 3.5% (35-fold) in the ΔmutS Δung double knockout (Fig. 1e). Similarly, editing by TALE-AID increased to 0.1% (7-fold increase) in the ΔmutS Δung knockout (Fig. 1e). We confirmed the GFP fluorescence signal by microscopy (Fig. 1f) and confirmed the C:G→T:A transitions by sequencing the gfp gene of 20 randomly chosen GFP+ colonies from both the ZF-AID- and TALE-AID-induced populations. Hence, suppression of uracil repair increases editing frequencies from the targeted deaminases.
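As a quick consistency check on the ZF-AID numbers above, the baseline editing frequency implied by each reported value and fold change can be back-calculated. The helper name `implied_baseline` is hypothetical, and the result is an inference from rounded figures rather than a value stated in the text.

```python
def implied_baseline(edited_pct: float, fold: float) -> float:
    """Baseline frequency (%) implied by an edited frequency and its fold increase."""
    return edited_pct / fold


# (strain, reported ZF-AID editing %, reported fold increase)
zf_aid = [
    ("ΔmutS", 0.5, 5),
    ("ΔmutS Δung", 3.5, 35),
]

for strain, pct, fold in zf_aid:
    print(f"{strain}: implied baseline ~ {implied_baseline(pct, fold):.2f}%")
```

Both ZF-AID entries point to the same implied starting frequency, which is what one would expect if both fold changes are measured against the same reference strain.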
All subsequent experiments in E. coli were done in the ΔmutS Δung background.

Optimization of targeted deaminases
We next conducted structural optimization of the targeted deaminases by varying linker lengths and sequence compositions 21,22 (Fig. 2a). While all tested variants led to robust GFP rescue, a longer linker improved editing frequencies, with ZF-8aa-AID achieving 7.5% GFP+ frequency after 10 hours (Fig. 2b) and 13% after 30 hours of induction (Supplementary Fig. 2a). Sequence composition of the linker also influences editing frequencies (t-test, two-tailed, P = 0.0032, n=4). Hence, the linker affects the performance of the overall construct.
Our initial TALE-AID (hereafter referred to as TALE-C1-AID) is less efficient than the ZF-AIDs (Fig. 1e). Given the importance of the linker between the DNA-binding module and the deaminase, we proceeded to investigate whether truncation of the 178aa C-terminus 20 could increase TALE-AID activity (Fig. 2c). Truncations were chosen at loop regions predicted in silico. We also constructed five bacterial GFP reporter strains, each with a genomic gfp locus carrying a broken start codon 2, 5, 8, 11, or 14 bp upstream of the TALE binding site (Fig. 2c). Targeted deamination frequencies were then measured by GFP rescue frequency and compared in a 5-by-5 matrix of TALE-AIDs and reporters (Fig. 2d). TALE-AID truncations showed significantly higher GFP rescue than TALE-C1-AID (Fig. 2d), with TALE-C3-AID achieving a genomic editing frequency of 2.5% on the 8bp-spacer reporter after 10 hours of induction (Fig. 2d), and 8% following 20 hours of induction (Supplementary Fig. 2b). Interestingly, TALE-C3-AID outperformed all other constructs regardless of the reporter spacer length, suggesting that this chimeric protein has the most favorable intrinsic structure among the TALE-AIDs tested. These results for ZF-AIDs and TALE-AIDs thus reveal important design considerations for engineering efficient targeted deaminases (Fig. 2e).

Specificity of targeted deaminases
Having investigated and improved deaminase targeting frequency, we next characterized targeting specificity using the following three methods: 1) investigating the effect of point-mutations in the target DNA sequence; 2) deep-sequencing the GFP locus of the population; and 3) whole-genome sequencing of three GFP+ clones.
For the first assay, we show that a single-nucleotide change in the cognate target sequence led to a 4-8 fold decrease in observed editing rates (Fig. 3a), indicating that ZF-8aa-AID is specific to the target locus. We next investigated the specificity of TALE-AID by individually varying each nucleotide in the TALE recognition site to the second most preferred base 1 for that position (Fig. 3b). Interestingly, TALE-C3-AID, which was designed to recognize a 14bp sequence, showed strong sequence specificity only for the 8bp proximal to the target site (5' TTCTTCCC 3' in the TALE recognition site). For reasons that remain to be investigated, sequence alterations at more distal positions in the TALE binding site led to variable targeting frequency (Fig. 3b).
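The point-mutation scan described above can be illustrated by enumerating single-nucleotide variants of the TALE recognition site. Note the assumption: the study substituted one specific base per position (the second most preferred base), which is not listed here, so this sketch simply enumerates every possible single substitution. The function name `single_nt_variants` is hypothetical.

```python
from typing import Iterator, Tuple

TALE_SITE = "TCACGATTCTTCCC"  # 14bp TALE recognition sequence from the text


def single_nt_variants(site: str, alphabet: str = "ACGT") -> Iterator[Tuple[int, str, str, str]]:
    """Yield (0-based position, reference base, substituted base, variant sequence)."""
    for i, ref in enumerate(site):
        for alt in alphabet:
            if alt != ref:
                yield i, ref, alt, site[:i] + alt + site[i + 1:]


variants = list(single_nt_variants(TALE_SITE))
print(len(variants))  # 14 positions x 3 alternative bases

# Variants that alter the 8bp proximal stretch (TTCTTCCC, positions 6-13)
proximal = [v for v in variants if v[0] >= len(TALE_SITE) - 8]
print(len(proximal))
```

Restricting the scan to one chosen substitute per position, as in the study, would amount to filtering this list with a per-position lookup table.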
Next, to examine on-target editing at single-bp resolution, we sorted 10,000 GFP+ and 10,000 GFP- cells after 30 hours of ZF-8aa-AID induction, and randomly isolated 200 individual colonies from each population. We Sanger sequenced 1kb surrounding the gfp target site and, as a control, the constitutively expressed gapA gene, which lies 1.9Mbp away from gfp. In the GFP+ population, all colonies harbored the intended C→T transition in the gfp start codon. Interestingly, 5.5% (11/200 colonies) of these colonies carried additional C→T transitions in the GFP transgene (Fig. 3c). Most of these additional mutations were confined to a +/-15bp region flanking the ZF binding site, but mutations >150bp away were also detected, suggesting catalytic processivity of AID 23. In the GFP- population, the only variant detected among 200 colonies was a G→A transition 1bp away from the intended target site (ACG→ACA) that was present in 2% of the population (Fig. 3c). No mutations were found in gapA in any colony from the two populations. We next repeated our assay using TALE-C3-AID. In the GFP+ population, besides the intended C→T mutation, an additional C→T mutation 4bp upstream of the intended site was found in 4.5% of the population (9/200 colonies, Fig. 3d). No other off-target mutations were detected in the GFP coding sequence or in the GFP- cells. We concluded that targeted deaminases have residual processivity that mutates nearby C's within a +/-15bp window of the target DNA sequence.
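Because these frequencies come from 200 colonies per population, it can help to see how much sampling uncertainty such counts carry. The sketch below computes a Wilson score interval for the two colony counts given above; the choice of interval and the function name `wilson_interval` are assumptions made for illustration, not part of the original analysis.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Colonies with additional C->T transitions, out of 200 GFP+ colonies each
counts = {"ZF-8aa-AID": 11, "TALE-C3-AID": 9}

for construct, k in counts.items():
    lo, hi = wilson_interval(k, 200)
    print(f"{construct}: {100 * k / 200:.1f}% (approx. 95% interval {100 * lo:.1f}-{100 * hi:.1f}%)")
```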
Finally, to assess genome-wide off-targeting, we sequenced with ~50X coverage the genomes of three GFP+ colonies edited by ZF-8aa-AID and three colonies edited by TALE-C1-AID, and compared them to control GFP- colonies in which the expression of deaminases had not been induced. We did not observe increased indel mutations in the ZF/TALE-AID expressing clones (Fig. 3e and 3f, Wilcoxon, P value = 0.7109). However, we detected elevated levels of genome-wide C:G→T:A transitions in WRC sequence motifs following expression of targeted deaminases (Wilcoxon, P value = 0.02, t-test) (Fig. 3e and 3f). In addition, we did not find any off-target mutations at predicted ZF/TALE off-target sites. The fact that off-target mutations are enriched at WRC motifs, the canonical AID recognition sequence 24, suggests that off-target mutations derive from the intrinsic activity of AID.
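To make the WRC motif concrete (W = A or T, R = A or G, followed by the deaminated C), the following sketch locates WRC cytosines on both strands of a sequence. It is a minimal illustration with hypothetical function names and a toy input, not the pipeline used for the whole-genome analysis.

```python
import re

# Lookahead so that overlapping motifs are all reported
WRC = re.compile(r"(?=([AT][AG]C))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def wrc_cytosines(seq: str) -> list:
    """0-based positions of the C in each WRC motif on the given strand."""
    return [m.start() + 2 for m in WRC.finditer(seq)]


def wrc_sites_both_strands(seq: str) -> list:
    """(forward-strand position, strand) for every WRC cytosine.

    A '-' entry marks a forward-strand G whose complementary C sits in a
    WRC motif on the reverse strand (seen as a G->A change on the forward strand).
    """
    n = len(seq)
    fwd = [(p, "+") for p in wrc_cytosines(seq)]
    rev = [(n - 1 - p, "-") for p in wrc_cytosines(revcomp(seq))]
    return sorted(fwd + rev)


print(wrc_sites_both_strands("AGCTT"))  # [(1, '-'), (2, '+')]
```

Scanning both strands matters because a deamination on the reverse strand appears as a G→A change on the forward strand, which is why the transitions are reported as C:G→T:A.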

Human genome engineering using targeted deaminases
Given the intense interest in precise genomic editing for human biomedical studies, we tested the functionality of our targeted deaminases in human cells. We constructed a mammalian reporter in which an EF1α promoter drives expression of a GFP harboring a broken start codon (ACG), followed by an IRES-mCherry selection marker. The reporter construct was stably integrated into HEK293FT cells by lentiviral transduction and a clonal cell line was isolated by FACS (Fig. 4a). The optimized ZF-AID construct (ZF-8aa-AID) was then delivered into the reporter cell line via transfection (Fig. 4a). Following 48 hours of ZF-8aa-AID expression, 0.12% of transfected cells turned GFP+. We next constructed ZF-AIDΔNES by truncating 15aa from the C-terminus of AID, which contains a strong nuclear export signal 25 and regions that interact with mismatch repair proteins 26. This is expected to correctly localize the ZF-AID to the nucleus and decouple AID from the mismatch repair pathway. The expression of ZF-AIDΔNES significantly increased GFP+ cell frequency compared to full-length ZF-AID (Fig. 4b) (0.56%, t-test, two-tailed, n=4, P value = 0.0013). Encouraged by our E. coli study, we examined whether inhibiting the counteracting pathways of uracil repair and mismatch repair would increase C-to-T transition in human cells. Interestingly, the combination of the UNG inhibitor UGI 27 and MSH2 shRNA increased ZF-AIDΔNES-mediated editing efficiency to 2.5% (Fig. 4b). In contrast, the expression of ZF GFPINL-AIDΔNES, a fusion protein whose zinc finger domain targets a site 265bp away from the GFP start codon, resulted in minimal GFP rescue (Fig. 4a,c), suggesting that genome editing by ZF-AIDΔNES is sequence-specific. Successful C:G→T:A targeting of the broken start codon was confirmed by Sanger sequencing of the GFP locus in 8/8 stable GFP+ colonies.
Therefore, engineered deaminases are capable of efficient sequence-specific genome editing in HEK293FT cells, and editing efficiency can be significantly increased by inhibiting the uracil repair pathway.
We next investigated the toxicity of the targeted deaminase in human cells. To test whether ZF-AIDΔNES can be safely used as a genome editing tool without incurring DSBs, we generated a HEK293FT reporter cell line carrying a non-functional frame-shifted GFP, which could be rescued by DSB-enhanced HDR with an exogenous donor DNA 3, 28 (GFP-In reporter in Fig. 4d). If DSBs occurred from ZF-AID

Discussion
Our study demonstrates that fusing cytidine deaminases with DNA binding modules enables site-specific deamination of genomic loci in both prokaryotic and eukaryotic cells. We designed and optimized the structure of targeted deaminases to effectively convert a specific C:G base pair to T:A in the E. coli genome, achieving up to 13% editing frequency. We then applied the optimized chimeric deaminases to a human cell line and found that these novel enzymes could create site-specific single-nucleotide transitions in as many as 2.5% of cells. The transfected cells demonstrated decreased cytotoxicity compared with targeted nucleases. We found that inhibition of uracil and mismatch repair was critical to achieving these high editing rates.
A recent study independently developed targeted deaminases using Cas9 as the DNA binding domain and reported obtaining similar results, albeit by different means 30. Their Cas9 deaminase achieved higher efficiencies by taking advantage of the fact that Cas9 binding generates an ssDNA loop, a natural substrate for AID and APOBEC deaminases. Instead of expressing UGI independently, they fused it to a Cas9 nickase, and cleverly suppressed mismatch repair by allowing Cas9 to nick the non-targeted strand.
The authors of this study suggested that, due to their efficiency and their avoidance of mutagenic dsDNA cuts, targeted deaminases might be promising tools for the correction of genetic diseases. However, our results suggest that further engineering is needed to reduce their processivity 30 and off-target activity. First, it is still difficult to pinpoint the activity of targeted deaminases to a specific cytidine within a +/-15bp window (Fig. 3c and 3d), and we also detected off-target mutations >150bp away from the DNA binding site. These findings indicate that targeting does not eliminate the processivity of deaminases and suggest the need to engineer them to reduce off-targeting. Additionally, our whole-genome sequencing data demonstrated elevated levels of global deamination at off-target WRC sites (Fig. 3e and 3f), suggesting that deaminases maintain their intrinsic DNA binding preferences and editing activity even when fused to targeted DNA binding domains. That the overexpression of APOBEC3B has been implicated as a driver for human breast, ovarian and cervical cancers through its generation of random C:G→T:A transitions 31, 32 provides a cautionary note about the need to constrain the intrinsic preferences and activity of deaminases. One possible solution to this problem would be to create obligatory dimeric targeted deaminases by splitting the deaminase protein and fusing each half to an independent DNA binding domain, so that an active deaminase protein would only be generated if both halves are targeted to two specific nearby DNA sequences. This approach could also reduce the processivity of the enzyme.
Our results set the stage for the future engineering of additional targeted DNA nucleotide mutases beyond cytidine deaminases, such as targeted adenosine deaminases 33, that effect changes to DNA without introducing DNA cuts or nicks. Aside from using such targeted mutases for gene therapy as suggested for deaminases in the recent study 30, suitably engineered to reduce their intrinsic mutability as we indicate, we foresee other potential uses of these tools. First, as nucleases tend to be toxic in prokaryotes 34, 35 due to their lack of efficient DSB repair pathways such as NHEJ, targeted DNA mutases may provide an effective means to engineer prokaryotic species such as Streptococcus for which few molecular tools are available. As demonstrated here with deaminases, targeted mutases have the potential to be highly portable in both prokaryotic and eukaryotic systems. Second, although DSBs are better tolerated in eukaryotes, certain cell types, such as human induced pluripotent stem cells, are very sensitive to DSBs incurred by nucleases. We demonstrated that targeted deaminases incur significantly less cytotoxicity than nucleases in HEK293 cells, and we envision that they and other targeted mutases will similarly be less toxic in these sensitive cell types. In addition, targeted mutases should be an effective way to generate precise mutations in non-dividing cells, in which HDR activity is extremely low 13. Moreover, because precise mutations can be made without an exogenous DNA donor, this tool may be usable in agricultural applications without triggering GMO regulation, similar to recent cases of mushroom and corn in which CRISPR was used to disrupt endogenous genes.
Finally, although our group demonstrated highly multiplexable (62X) targeting of a repetitive sequence in immortalized pig cells 36, it has proven very difficult to achieve this result in primary cells, likely because the high number of DSBs required leads to chromosomal rearrangements, senescence and apoptosis. We believe that the independence from DSBs may make targeted mutases a safer and more efficient tool for editing and studying repetitive elements in the genome.

Construction of fusion proteins:
To construct ZFP-AID fusion proteins, we first PCR amplified ZFP from pUC57-ZFP 17 and AID from pTrc99A-AID 37 and fused these two parts with various linkers using overlap PCR. The fusion constructs were cloned into a pTrc-Kan plasmid. We fused AID with TALE by cloning AID into pLenti-EF1a-TALE(0.5 NI)-WPRE 20 plasmid and then cloned TALE-AID fusions into the pTrc-Kan plasmid.
APOBEC1, 3F, and 3G genes were synthesized (GenScript) and cloned into the pTrc-ZFP-Kan plasmid. To generate pCMV-ZF-AID constructs, we amplified the ZF-AID cassette from pTrc-ZF-AID and cloned it into the pCMV-hygo 20 plasmid. The detailed construction methods are found in Supplementary Method 1 and illustrated in Supplementary 1-5.
Construction of E. coli reporter cell lines:
The GFP coding sequence was amplified from pRSET-EmGFP (Invitrogen). We modified the reporter by mutating the start codon to ACG and inserting a ZFP/TAL binding site upstream of the GFP coding sequence. To establish stable cell lines with a single copy of the GFP reporter sequence in the genome, we integrated the GFP cassette into the galK locus in the EcNR1 (MG1566   Fig.4) provides information about the number of raw sequence reads, aligned reads, genome coverage and validated SNVs (Supplementary Table 1 and 2), and the list of SNVs (Supplementary Table 3

Figure 1 | Design and targeted deaminase activity of chimeric deaminases in E. coli.
a, Schematic representation of the design of targeted deaminases. The DNA binding domain (DBD), either ZF or TALE, was fused to the N-terminus of the deaminase with a certain linker. b, Experimental overview: we integrated a GFP cassette (top) consisting of a broken start codon ACG, DNA binding sequence, and the GFP coding sequence into the bacterial genome. We subsequently transformed targeted deaminases (middle) in the pTrc-kan plasmid (Supplementary Method 1) into the strain and induced protein expression. Targeted deamination of the C in the broken start codon leads to an ACG→ATG transition (bottom), rescuing GFP translation, which is quantifiable via flow cytometry. c, ZF-deaminases were tested for targeted deaminase activity by measuring GFP rescue. ZF, ZF-APOBECs (ZF-APOBEC1, ZF-APOBEC3F, ZF-APOBEC3G) or ZF-AID indicate cells transformed with plasmids that express ZF, ZF-APOBECs or ZF-AID, respectively. All error bars indicate s.d. (All t-tests compare ZF-deaminases against the ZF control. P value < 0.05 *, P value < 0.01 **, P value < 0.001 ***, n=4). d, GFP rescue by ZF-AID and TALE-AID in the ZF-reporter and TALE-reporter strains. (All t-tests compare the fusion deaminases against the AID control. P value < 0.05 *, P value < 0.01 **, P value < 0.001 ***, n=4). e, GFP rescue by ZF-AIDs and TALE-AID in (wild type), (Δung), and (ΔmutS Δung) strains. All error bars indicate s.d. (All t-tests compare the fusion deaminases against the AID control. P value < 0.05 *, P value < 0.01 **, P value < 0.001 ***, n=4). f, E. coli (ΔmutS

Figure 2 | Optimization of targeted deamination frequency of AID fusions in E. coli.
a, Schematic representation of ZF-AID variants tested for targeted deaminase activity (upper) and the reporter (lower) with the ZF-recognition sequence in blue. b, GFP rescue by expression of the four ZF-AID variants and ZF or AID domains alone. All error bars indicate s.d. c, Schematic representation of TALE-AIDs and the reporters tested for targeted deaminase activity. Five TALE-AIDs (upper) with different TALE C-terminus truncations (C1 to C5) were constructed, with the remaining C-terminus lengths shown in parentheses. Full TALE-AID protein sequences can be found in Supplementary Sequence 2. Five reporters were constructed (lower) with different spacer lengths (2bp, 5bp, 8bp, 11bp, 14bp) between the broken start codon and the TALE DNA binding motif. The TALE binding site on the GFP reporter is shown in blue; the TALE N-terminus segment specifies the 5′ thymine base of the binding site. d, All five TALE-AIDs were tested for targeted deaminase activity on all five reporters (Fig. 2c). Green and grey encode high and low GFP rescue, respectively.

Figure 3 | Test of the specificity of AID fusions.
a, Test of ZF-8aa-AID sequence specificity using a GFP reporter with point-mutated ZF binding sequences. t-tests compare each mutated site against the unmodified site (top). P value < 0.05 *, P value < 0.01 **, P value < 0.001 ***, n=4. All error bars indicate s.d. b, Test of TALE-C1-AID sequence specificity using a GFP reporter with point-mutated TALE binding sites. t-tests compare each mutated site against the unmodified site (top). P value < 0.05 *, P value < 0.01 **, P value < 0.001 ***, n=4. All error bars indicate s.d. Note that we altered the first nucleotide, a TALE N-terminus-specified thymine, to each of the other three nucleotides individually, while we changed the other nucleotides in the TALE recognition domain to the nucleotide most likely to be recognized 1. c, Mutation location and spectrum in the GFP gene of GFP+ and GFP- cells collected after ZF-8aa-AID induction. A schematic structure of the GFP gene is shown above the mutation frequency along the gene's length among 200 Sanger-sequenced colonies of each cell population. Gray lines indicate positions of C/G nucleotides; red lines indicate occurrences of the AID-preferred motif (WRC). d, Mutation spectrum in the GFP gene of GFP+ and GFP- cells collected after TALE-C1-AID induction. e, Whole-genome SNV profiles of strains with/without ZF-AID induction. SNVs that may stem from cytosine deamination (C/G→T/A) are shown as either green (if the C was in the AID-preferred WRC motif) or blue (all other Cs) bars. f, Whole-genome SNV profiles of strains with/without TALE-AID induction. The color scheme is the same as in 3e.

Figure 4 | Targeted deamination and low toxicity of ZF-AID in human cells.
a, Schematic representation of the ACG-GFP reporter system in HEK293FT cells (upper) and the ZF-AID (lower) tested for targeted deaminase activity. IRES, internal ribosome entry site; NLS, nuclear localization signal. b, Targeted deamination activity of ZF-AIDs. ACG-GFP reporter cells were transfected with the constructs labeled on the X-axis.

[Figure axis labels: the TALE binding site 5'-TCACGATTCTTCCC-3' and 16 single-nucleotide variants of it]