Cytosine base editing systems with minimized off-target effect and molecular size

Cytosine base editing enables the installation of specific point mutations without double-strand breaks in DNA and is advantageous for various applications such as gene therapy, but further reduction of off-target risk and development of efficient delivery methods are desired. Here we show structure-based rational engineering of the cytosine base editing system Target-AID to minimize its off-target effect and molecular size. By intensive and careful truncation, the DNA-binding domain of its deaminase PmCDA1 is eliminated, and additional mutations are introduced to restore enzyme function. The resulting tCDA1EQ is effective in N-terminal fusion (AID-2S) or inlaid architecture (AID-3S) with Cas9, showing minimized RNA-mediated editing and gRNA-dependent/independent DNA off-targets, as assessed in human cells. Combined with the smaller Cas9 ortholog system (SaCas9), this yields a cytosine base editing system that is within the size limit of an AAV vector.


Introduction
Cytosine base editing is mediated by a cytidine deaminase guided by a nuclease-deficient CRISPR system. At the target site, deamination of cytosine generates uracil, which eventually converts a C•G base pair (bp) to a T•A bp without introducing a DNA double-strand break 1,2. The originally developed cytosine base editing systems, Base editor (BE) 1 and Target-AID 2, employ rAPOBEC1 and PmCDA1 as deaminases, respectively, and efficiently introduce mutations within editing windows of 12-16 bases and 16-20 bases upstream of the PAM (protospacer-adjacent motif).
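As a minimal illustration of the editing-window idea described above, the following Python sketch converts every C to T inside a window counted from the PAM. The function name, the toy protospacer, and the convention that position 1 is the base adjacent to the PAM are assumptions made for illustration only; real editing is probabilistic and depends on sequence context, so this shows only which positions fall inside each window.

```python
def edit_window(protospacer: str, start: int, end: int) -> str:
    """Return the protospacer with every C at positions start..end changed to T.

    Assumption: positions are counted from the PAM-proximal (3') end, so
    position 1 is the base next to the PAM and position len(protospacer)
    is the 5'-most base.
    """
    seq = protospacer.upper()
    n = len(seq)
    edited = []
    for i, base in enumerate(seq):
        pos_from_pam = n - i  # index 0 (5' end) corresponds to position n
        if start <= pos_from_pam <= end and base == "C":
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


# Hypothetical 20-nt protospacer written 5'->3' (not a real target site).
toy = "ACCTGCAAGTCCGATCAGCT"

# Window of 16-20 bases upstream of the PAM (the range given for Target-AID).
print(edit_window(toy, 16, 20))  # ATTTGCAAGTCCGATCAGCT

# Window of 12-16 bases upstream of the PAM (the range given for BE).
print(edit_window(toy, 12, 16))  # ACCTGTAAGTCCGATCAGCT
```

The two calls show how the same target can yield different edits depending only on where the window sits relative to the PAM.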
Recent studies have raised concern that deaminase-mediated base editing systems can induce genome-wide SNV off-targets, especially if overexpressed for a long period of time 3,4. In contrast to CRISPR-Cas9-dependent off-targets, which are based on mismatched annealing of guide RNAs, the SNV off-targets induced by base editing appear to be independent of the target sequence and are thought to be caused by non-specific, random deamination by the deaminase domain. BE systems have been well studied: the original BE3 and BE4 have been shown to induce both DNA and RNA off-targets 3-6, and several rAPOBEC1 mutant variants with reduced off-target effects were subsequently identified 5,7,8. Analytical methods have also been developed to evaluate the off-target potential of base editors. Genome-wide mutations can be thoroughly elucidated by whole genome sequencing (WGS). However, WGS is expensive, time-consuming, and low-throughput. In addition, it may not be sensitive enough for comparative analysis of improved base editors. Previous WGS-based studies indicated that actively transcribed regions are prone to base editing off-target effects 3,9, because the R-loops formed during transcription expose single-stranded DNA, which is a preferred substrate for the deaminases. To mimic such hot spots, a localized R-loop was formed by using an orthogonal nuclease-defective CRISPR system, which was co-transfected with a base editing system targeting another distant locus 7,8. Deep sequencing of the R-loop region allowed rapid and sensitive comparative evaluation of gRNA-independent off-target potency. Alternatively, the rate of non-specific mutations can be monitored as the occurrence of drug-resistant mutants in microbes such as yeast 2 and E. coli 10. By conducting these assays in parallel, their respective biases can be compensated for 7.
For the treatment of genetic diseases, base editing is considered a promising approach because it can install a specific SNV without inducing DNA double-strand breaks or requiring template DNA. In vivo delivery is one of the major bottlenecks to achieving efficient and specific editing in the target tissue. A smaller molecular size is advantageous, especially for in vivo delivery tools such as adeno-associated virus (AAV) vectors.
The AAV vector is one of the promising delivery methods for gene therapy, offering greater safety and efficiency 11, although its DNA cargo size limitation (4-5 kb) hinders wider applications including base editing. In conventional genome editing, the smaller CRISPR ortholog Staphylococcus aureus (Sa) Cas9 has enabled the development of single-AAV vectors 12,13, but adding a deaminase domain has been difficult 14. Instead of composing a single AAV vector, the base editing components can be split into two AAV vectors to circumvent the size limitation 15.
In this study, we performed structure-based rational engineering of the cytosine base editing system Target-AID to minimize its off-target effects and molecular size.

Results
Elimination of DNA-binding region and restoration of deaminase activity of PmCDA1
DNA deaminases have an intrinsic affinity for DNA and cause non-specific deamination. The structure of hAID, a human homolog of PmCDA1, has revealed its complex formation with double-stranded DNA in a region distinct from the catalytic core 16 (Fig. 1a). Based on the amino acid alignment of hAID and PmCDA1, the potential DNA-binding moieties of PmCDA1 were mapped to residues 21-27 and 172-192 of the 208-amino-acid (a.a.) protein (Fig. 1a). To delete the predicted DNA-binding region, we first made a series of truncations from the C-terminal end (1-201, 1-197, 1-190, 1-183, 1-179, 1-176, 1-161) and tested their base editing activity in yeast Saccharomyces cerevisiae (BY4741) cells (Supplementary Fig. S1). Although a previous report had shown that PmCDA1 truncated by 47 a.a. (1-161) still exhibited editing efficiency comparable to that of full-length PmCDA1 (1-208) in yeast 17, our versions, which were fused to the C-terminus of nCas9 and lacked the uracil DNA glycosylase inhibitor (UGI), showed a gradual decrease in activity as truncation proceeded. We omitted UGI because we assumed that the UGI fusion was too effective in yeast and masked evaluation of the net activity of the deaminase. Next, we performed a series of truncations from the N-terminus of the 1-161 version, fused to the N-terminus of nCas9. The N-terminal truncations of CDA1 (1-161) first showed further decreased activity, which then recovered as truncation proceeded to 21 and 28 a.a. (Supplementary Fig. S2). The predicted structure of the protein indicated that simultaneous truncation of the N- and C-termini minimizes the cross-section and gives a smoother protein surface with less exposure of hydrophobic residues (Supplementary Fig. S2). Further truncation to CDA1(30-150), which was predicted to be the smallest version with minimum exposure of the hydrophobic surface while keeping the enzymatic core domain intact (Fig. 1b, Supplementary Fig. S2), showed recovered activity (Supplementary Fig. S2). These results suggest that the changes in editing activity are attributable to the conformational stability of the protein. To further improve its activity, we introduced a series of mutations at the hydrophobic residues that were exposed after the truncation. Six mutations were tested in the first round, and W122E was found to significantly increase the activity of CDA1(30-150) (Supplementary Fig. S3). Seven additional mutations were then tested in combination with W122E, identifying W133R/Q as further improving the activity (Supplementary Fig. S3). CDA1(30-150) containing W122E and W133Q is hereafter termed tCDA1EQ.
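The residue bookkeeping behind this truncation-plus-substitution design can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the 208-residue string is a placeholder rather than the real PmCDA1 sequence, and the substituted positions (122 and 133) are taken as written in the text above. The point is that mutation positions stay numbered on the full-length protein even after the N-terminus has been removed.

```python
def truncate(seq: str, first: int, last: int) -> str:
    """Keep residues first..last (1-based, inclusive) of a protein sequence."""
    return seq[first - 1:last]


def substitute(seq: str, offset: int, position: int, old: str, new: str) -> str:
    """Replace one residue, with `position` numbered on the full-length protein.

    `offset` is the full-length residue number of the first residue in `seq`.
    """
    idx = position - offset
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {position} is outside the truncated sequence")
    if seq[idx] != old:
        raise ValueError(f"expected {old} at {position}, found {seq[idx]}")
    return seq[:idx] + new + seq[idx + 1:]


# Placeholder 208-residue sequence (NOT the real PmCDA1 sequence): all alanine
# except tryptophan at the two positions that are substituted below.
residues = ["A"] * 208
residues[122 - 1] = "W"
residues[133 - 1] = "W"
full_length = "".join(residues)

core = truncate(full_length, 30, 150)          # CDA1(30-150)
eq = substitute(core, 30, 122, "W", "E")       # W122E
eq = substitute(eq, 30, 133, "W", "Q")         # W133Q

print(len(full_length), len(core))  # 208 121
```

Checking the expected wild-type residue before substituting guards against off-by-one errors when switching between full-length and truncated numbering.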
As the engineered deaminase supposedly has less affinity for DNA by itself and might be less stable than the original PmCDA1, the nCas9-fusion architecture may have a greater impact on its base editing properties. Besides fusion to the nCas9 termini, the deaminase can be inlaid in the middle of nCas9 by splitting the nCas9 polypeptide and fusing both termini of the deaminase to the split site. Structurally, a.a. position 1054 in the RuvC domain of Cas9 lies on a flexible part of the protein surface, close to the non-target DNA strand 18, which is subject to deamination. While N-terminally fused tCDA1EQ showed varying editing efficiency among target sites as assessed by the CAN1 assay 2, the inlaid version showed consistent editing efficiencies comparable to those of the original Target-AID (Fig. 1d, Supplementary Fig. S4).
To assess non-specific, gRNA-independent off-target effects, we measured the occurrence of thialysine-resistant mutants (LYP1 assay) 2 for the engineered versions fused with UGI.
Both the N-terminal and inlaid tCDA1EQ versions showed significant decreases (5- to 79-fold) in mutant occurrence compared to the original Target-AID (Fig. 2a), indicating that their gRNA-independent off-target effects were greatly reduced. We named these N-terminal and inlaid tCDA1EQ versions AID-2S (Small and Specific) and AID-3S (Small, Specific and Superior), respectively.

Evaluation of AID-2S and AID-3S in mammalian cells
Next, we evaluated the editing efficiency and window of AID-2S and AID-3S in human HEK293T cells and compared them with the existing improved cytosine base editors YE1, YE2, and R33A+K34A, which were reported to exhibit reduced off-target effects 7. Four well-studied on-target sites (HEK2, HEK3, RNF2, and VEGFA) were edited by plasmid DNA transfection and analyzed by amplicon deep sequencing. Target-AID, AID-2S, and YE1 showed consistently high efficiency at all four target sites tested. AID-3S and YE2 showed middle-to-high efficiencies depending on the target site. R33A+K34A showed poor efficiency at the HEK3 target site (Fig. 1e, Supplementary Fig. S5). The averaged editing window of AID-2S is narrower than that of Target-AID and comparable to those of YE1 and YE2 (Fig. 1f).
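To make concrete what a per-position C-to-T conversion frequency from amplicon reads means, here is a minimal Python sketch. It is not the analysis pipeline used in this study: it assumes reads that are already aligned to the reference, have no insertions or deletions, and are written in the same orientation. The function name and the toy sequences are hypothetical.

```python
def c_to_t_frequency(reference: str, reads: list[str]) -> dict[int, float]:
    """Fraction of reads carrying T at each reference C (0-based index).

    Assumption: every read is pre-aligned to `reference` without indels,
    so read[i] corresponds to reference[i]. Ambiguous calls are skipped.
    """
    freqs: dict[int, float] = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        # Keep only reads that cover this position with an unambiguous base.
        covered = [r[i] for r in reads if len(r) > i and r[i] in "ACGT"]
        if covered:
            freqs[i] = sum(base == "T" for base in covered) / len(covered)
    return freqs


# Toy reference and reads (hypothetical, for illustration only).
reference = "ACGCT"
reads = ["ACGCT", "ATGCT", "ATGTT", "ACGCT"]

print(c_to_t_frequency(reference, reads))  # {1: 0.5, 3: 0.25}
```

An editing window can then be read off as the range of positions whose frequency exceeds a chosen threshold. The threshold is an analysis choice and is not specified here.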
The gRNA-independent off-target effects were assessed by using the orthogonal SaCas9 R-loop assay 7 in HEK293T cells (Fig. 2b). SaCas9 off-target sites 1-6 were selected following a previous study 7, and an additional site 7 (VEGFA locus) was chosen because its C-rich context may provide higher sensitivity to deamination by CBEs. Target-AID showed detectable off-target editing at all seven sites (Fig. 2c), while AID-2S showed no detectable off-target editing at sites 1 and 3 and significantly reduced off-target editing at sites 2, 5, 6, and 7, which was comparable to YE2 and R33A+K34A. YE1 showed rather higher off-target editing at sites 6 and 7. AID-3S showed the lowest, hardly detectable off-target editing across all seven sites tested. This may be attributed to the inlaid architecture, which sterically limits the access of the enzyme beyond the Cas9-bound DNA strand, in addition to the eliminated DNA affinity. On average, AID-2S and AID-3S exhibited approximately 4.5-fold and 13.7-fold reductions, respectively, in R-loop off-target editing compared to the original Target-AID while maintaining appreciable on-target editing efficiency (Fig. 2c, 2d). Together with the yeast LYP1 assay, these results consistently support that the genome-wide, gRNA-independent off-target effect is greatly mitigated in AID-2S and AID-3S. We also examined the gRNA-dependent off-target effect by deep sequencing of six reported sites (HEK2_OF1, 2; VEGFA_OF1, 2, 3, 4) 19,20 (Fig. 2e, Supplementary Fig. S6). AID-2S and AID-3S, together with YE2 and R33A+K34A, showed substantially reduced off-target editing at all the sites analyzed.
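An average fold reduction of this kind can be computed as in the following Python sketch. The text does not state exactly how the average was taken, so the ratio-of-means definition used here is an assumption, and the numbers are hypothetical placeholders rather than measured values from this study.

```python
from statistics import mean


def fold_reduction(reference_rates: list[float], variant_rates: list[float]) -> float:
    """Mean off-target editing of the reference editor divided by that of the variant.

    Assumption: "average fold reduction" is taken as a ratio of per-site means;
    a mean of per-site ratios would be an equally plausible definition.
    """
    variant_mean = mean(variant_rates)
    if variant_mean == 0:
        raise ValueError("variant shows no editing; fold reduction is undefined")
    return mean(reference_rates) / variant_mean


# Hypothetical percent C-to-T editing at three R-loop sites (NOT measured data).
reference_editor = [9.0, 3.0, 6.0]
engineered_variant = [1.0, 2.0, 3.0]

print(fold_reduction(reference_editor, engineered_variant))  # 3.0
```

The explicit guard matters in practice because a variant with no detectable editing at any site would otherwise trigger a division by zero.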

Minimization of cytosine base editing system
The engineered PmCDA1 (tCDA1EQ) is substantially smaller (121 a.a.) than the wild-type (208 a.a.). A smaller molecular size for a genome editing component is advantageous, especially for in vivo delivery tools such as the AAV vector, which limits the DNA length to 4-5 kb. Even with the small ortholog SaCas9 system, the addition of base editing components apparently exceeds the size limit (Fig. 1g). To develop SaAID-3S, which contains all necessary base editing components in a size loadable into a single AAV vector, tCDA1EQ was inlaid at a.a. position 615-616 of nSaCas9, within the HNH domain facing the polynucleotide-binding cleft. The small Scp1 promoter 21 and SpA terminator were also employed, giving a total length of 4036 bp plus 332 bp for the gRNA expression cassette. For comparison, the conventional SaCas9 version of Target-AID (SaAID) was also developed, which contains full-length PmCDA1 with linker, UGI, CMV promoter, and SV40 terminator, giving a total length of 5220 bp without the gRNA cassette. To normalize transfection efficiency, which may vary depending on the vector size, the transfected cells were sorted by the fluorescent signal of iRFP670 expressed from the vector backbone. At the two target sites tested, both constructs showed comparable editing efficiency (Fig. 1h), with differences in mutation window (Supplementary Fig. S8).
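The size arithmetic in this paragraph can be checked with a short Python sketch. All lengths are the figures quoted above. Using the upper end of the stated 4-5 kb range as the cutoff is an assumption made for illustration, and the variable names are hypothetical.

```python
# Upper bound of the "4-5 kb" AAV size range quoted in the text (assumed cutoff).
AAV_UPPER_LIMIT_BP = 5000

# Construct lengths in bp as quoted in the text.
constructs = {
    "SaAID-3S": {"effector": 4036, "gRNA_cassette": 332},
    "SaAID": {"effector": 5220, "gRNA_cassette": 0},  # reported without gRNA cassette
}


def total_bp(parts: dict[str, int]) -> int:
    """Sum the lengths of all parts of a construct."""
    return sum(parts.values())


for name, parts in constructs.items():
    size = total_bp(parts)
    verdict = "within" if size <= AAV_UPPER_LIMIT_BP else "over"
    print(f"{name}: {size} bp ({verdict} the {AAV_UPPER_LIMIT_BP} bp bound)")

# Deaminase size: residues 30-150 inclusive versus the 208-a.a. wild type.
tcda1eq_aa = 150 - 30 + 1
saved_aa = 208 - tcda1eq_aa
print(tcda1eq_aa, saved_aa, saved_aa * 3)  # 121 87 261
```

With these figures, SaAID-3S plus its gRNA cassette totals 4368 bp, whereas SaAID is already 5220 bp before a gRNA cassette is added. The deaminase truncation alone removes 87 codons, or 261 bp of coding sequence.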

Discussion
In this study, we developed high-fidelity cytosine base editors through structural engineering of the deaminase PmCDA1 to remove the potential non-specific DNA-binding moiety. Although previous studies explored a series of C-terminal truncations of PmCDA1 and showed a narrowed editing window and reduced genome-wide off-targets in yeast 17, simple stepwise removal of the region led to a substantial reduction of its net deaminase activity when measured without UGI. The UGI-fusion form is so effective in yeast that it may mask evaluation of the net activity of the deaminase, and results obtained with it may not be readily applicable to other organisms. We therefore intentionally omitted UGI for strict evaluation of the enzymatic activity in yeast, then added UGI in the subsequent off-target assays and human cell applications. Based on the predicted structure, we deliberately truncated both the N- and C-termini of PmCDA1 to cleanly eliminate the DNA-binding domain and to minimize the protein cross-section. The amino acid substitutions that lessen hydrophobicity at the exposed surface further recovered the activity, probably due to improved protein stability or folding. The resulting tCDA1EQ version demonstrated comparable on-target activity and greatly reduced off-target effects, especially in the AID-3S inlaid form. Cas9-inlaid architectures for base editing have been explored in CBEs to expand the editing window 18 and in ABEs to reduce RNA off-targets 14. Consistently, AID-3S showed a wider editing window and an even lower R-loop off-target effect than AID-2S. As AID-2S also outperformed pre-existing base editors in its on/off-target profile, its narrower editing window should be useful for precise on-target editing.
YE1 was initially developed to narrow the editing window 22 and was later revealed to have reduced off-target effects for both DNA and RNA 23. While YE2 and R33A+K34A elicited further lowered off-target effects, they performed less robust on-target editing 7. In this study, YE1 showed relatively high off-target editing, especially at R-loop sites 6 and 7. This might be due to the preference of APOBEC1 for the 5'-TC motif 1,5, which was clearly observed at both R-loop off-target sites (Supplementary Fig. S6). YE2 and R33A+K34A also showed the same trend but to a much lesser extent, probably due to weakened substrate-binding capacity. In contrast to APOBEC1 and other APOBEC family proteins, PmCDA1 apparently shows neither such strong motif preferences nor RNA editing 2,24,25. The Cas9-gRNA-dependent off-target effect was also shown to be significantly reduced in AID-2S and AID-3S. Possibly, the DNA affinity provided by the deaminase cooperates with the off-target binding of Cas9-gRNA to promote subsequent editing.
Editors combining minimal off-target effects with robust on-target editing are expected to have a wide range of applications, from plant and microbial breeding to clinical use. The AID-3S concept has also been demonstrated with the SaCas9 ortholog, providing the smallest base editing system of its kind, one that can be carried in a single AAV vector, facilitating its application in safer gene therapy.

Figure 1 Rational engineering of smaller and specific Target-AID. a, Ribbon model of the structure of human AID in complex with dsDNA. The non-catalytic dsDNA-binding domain is shown in green (N-terminus) and red (C-terminus), and its amino acid sequences are aligned with those of PmCDA1 at the bottom. b, The predicted space-filling structure of PmCDA1 before and after engineering. In addition to the direct DNA-binding sites (green and red), segments shown in blue were trimmed to minimize the protein cross-section. The mutated amino acids (W122 and W139) are marked in yellow. c, On-target editing efficiencies of Target-AID, AID-2S, and AID-3S without UGI in the yeast canavanine-resistance assay. CAN1-2 (blue dots) and CAN1-3 (orange dots) were selected as the target sites, and biological triplicates are plotted. d, Domain arrangements of CBE variants used in this study. The BE architecture is common to YE1, YE2, and R33A+K34A, except for the point mutations in rAPOBEC1. e and f, On-target editing profiles of the CBE variants analyzed by deep sequencing at the HEK2, HEK3, RNF2, and VEGFA sites in HEK293T cells. In e, the nucleotide positions (numbered from the PAM side of the target sequence toward the 5' side) with the highest C-to-T conversion frequencies for each target are shown. The mutation frequencies at each nucleotide position are also shown in Supplementary Fig. S5. In f, the averaged editing windows for the four targets are shown. For e, f, and h, the mean scores (square bars) with standard deviations (error bars) are shown, and each biological replicate is plotted as a dot if n<9. g, Domain architectures of SaAID and SaAID-3S. The gRNA expression cassette was combined into each effector plasmid. h, On-target editing frequencies of SaAID and SaAID-3S in HEK293T cells, shown as in e. To normalize transfection efficiencies, cells were sorted by the expression of iRFP670 from the plasmid backbone. The mutation frequencies at each nucleotide position are also shown in Supplementary Fig. S8.

Figure 2 Off-target assessment of AID-2S, -3S and rAPOBEC1 base editors. a, Occurrences of on-target mutation (canavanine resistance) and off-target mutation (thialysine resistance) were measured in yeast after induction of each construct as indicated. Biological triplicates are plotted for the CAN1-2 (blue dots) and CAN1-3 (orange dots) target sites. b, Schematic illustration of orthogonal R-loop off-target assessment. A local R-loop formed by nickase SaCas9 serves as a hot spot for off-target events by SpCas9-based base editors.