Upgraded adenine base editor (uABE) with minimized RNA off-targeting activity

Both adenine base editors (ABEs) and cytosine base editors (CBEs) have recently been shown to induce transcriptome-wide RNA off-target editing in a guide RNA-independent manner. As the optimized ABE, ABEmax, induces highly efficient A-to-I (inosine) editing within an E. coli tRNA-like structure, we construct a reporter system containing the E. coli Hokb gene with a tRNA-like motif for robust detection of RNA editing activities. Then, we design mutations to disrupt the interaction between TadA and tRNAs following structure-guided principles, and find that Arginine 153 (R153) within TadA is essential for recognizing core tRNA-like structures. Two ABEmax or mini ABEmax variants (TadA* fused with Cas9 (D10A)) with deletion of R153 within TadA and/or TadA* (named del153/del153* and mini del153) are successfully engineered, showing minimized RNA editing but comparable DNA on-targeting activities. Moreover, del153 in the recently reported ABE8e or ABE8s can also largely reduce their RNA off-targeting activities. Taken together, we develop a strategy to generate upgraded ABEs (uABEs) with minimized RNA off-target activities.


Introduction
Adenine base editors (ABEs), which were originally designed by fusing a wild-type E. coli TadA (ecTadA) and/or a laboratory-evolved E. coli TadA (TadA*) with a Cas9 (D10A) nickase (Cas9n), can induce efficient A-to-G or T-to-C conversions with very low levels of unwanted mutations or insertions [1,2]. ABE is designed based on the native structure of homodimerized ecTadA, which can deaminate an adenosine within a transfer RNA (tRNA) [3], with an evolved TadA* being capable of deaminating genomic DNA adenosines [1]. Although ABEs show no detectable guide RNA-independent DNA off-target editing [4,5], both ABEs and cytosine base editors (CBEs) can induce tens of thousands of A-to-I (inosine) or C-to-U (uracil) RNA edits transcriptome-wide in a guide RNA-independent manner in human cells [6,7].
Engineered CBE and ABE variants bearing rAPOBEC1 mutations [6] or TadA/TadA* mutations [7][8][9] have recently been reported to have reduced RNA off-targeting activities. In these studies [6,8,9], GATK HaplotypeCaller, a tool for evaluating germline single nucleotide polymorphisms (SNPs) and indels [10], was employed for analyzing RNA A-to-I edits. It is worth noting that RNA edits with 0-10% efficiency could not be recovered by this tool [6,8,9], suggesting a possible underestimation of RNA off-targets. This drove us to examine the ABE-induced off-target editing of cellular RNAs in greater depth.
Following structure-guided principles, we successfully engineered ABEmax and miniABEmax variants to generate upgraded ABEs (uABEs) that retained DNA on-target editing efficiency while largely decreasing RNA-editing activities.

Results
Engineering ABEmax with reduced RNA deaminase activities
Considering that cellular RNAs with a wide range of aneuploid copy numbers have been identified as RNA off-target substrates of ABEs [6][7][8][9], we reasoned that Mutect2, a GATK tool for sensitive detection of somatic point mutations in heterogeneous cancer samples [11], might be more appropriate than HaplotypeCaller (designed for detection of euploid germline SNPs) for detection of RNA edits [12] (Supplementary Fig. 1a). Thus, we reanalyzed the RNA off-targets induced by ABEs and their optimized variants (miniABEmax-V82G, ABE7.10-F148A, and ABEmaxAW) [6][7][8][9], and found that Mutect2 indeed recovered 2.7- to 11-fold as many ABE-induced RNA edits as HaplotypeCaller, with similar editing signatures (endogenous A-to-I edits were deducted using control sequencing data), revealing that these optimized ABE variants still retained a relatively large number of RNA edits (Supplementary Fig. 1b-g). Surprisingly, the RNA edits shared by HaplotypeCaller and Mutect2 accounted for only 22-68% of the HaplotypeCaller-calculated RNA off-targets (Supplementary Fig. 1h-m). We further generated Manhattan plots of ABEmax-induced RNA off-targets [6] to show the efficiency distributions of overlapped, HaplotypeCaller-specific, and Mutect2-specific RNA edits, respectively, demonstrating that Mutect2-specific RNA edits were far more numerous than HaplotypeCaller-specific edits, especially those with 0-10% editing efficiency, which were ignored by HaplotypeCaller (Supplementary Fig. 1l). Meanwhile, a lower overlapping ratio was found for samples possessing fewer RNA edits (Supplementary Fig. 1m). Nine Mutect2-specific edits (with > 10% efficiency in RNA-seq data) were randomly selected for PCR validation using the cDNAs that had been subjected to the RNA-seq experiment. Indeed, all of these amplicons showed A-to-G mutations with high or low efficiency (Supplementary Fig. 1n), confirming the reliability of Mutect2-specific edits and suggesting that it is necessary to engineer ABEmax variants based on Mutect2 analysis.
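The comparison between the two callers can be illustrated with a minimal sketch. It assumes each call set has already been reduced to a set of (chromosome, position) tuples; the function name, the tuple representation, and the example sites are hypothetical and are not taken from the authors' pipeline.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation of an RNA edit site: (chromosome, 1-based position).
Site = Tuple[str, int]


class CallerComparison(NamedTuple):
    overlapped: Set[Site]
    hc_specific: Set[Site]
    mutect2_specific: Set[Site]
    overlap_fraction_of_hc: float  # shared edits as a fraction of HaplotypeCaller edits
    fold_mutect2_over_hc: float    # number of Mutect2 edits divided by HaplotypeCaller edits


def compare_callers(hc_sites: Set[Site], mutect2_sites: Set[Site]) -> CallerComparison:
    """Split two call sets into shared and caller-specific edits."""
    if not hc_sites:
        raise ValueError("HaplotypeCaller call set is empty; ratios are undefined")
    overlapped = hc_sites & mutect2_sites
    return CallerComparison(
        overlapped=overlapped,
        hc_specific=hc_sites - mutect2_sites,
        mutect2_specific=mutect2_sites - hc_sites,
        overlap_fraction_of_hc=len(overlapped) / len(hc_sites),
        fold_mutect2_over_hc=len(mutect2_sites) / len(hc_sites),
    )


# Made-up example sites, for illustration only.
hc_example = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 7)}
mutect2_example = {
    ("chr1", 100), ("chr2", 50), ("chr4", 1), ("chr4", 2),
    ("chr5", 9), ("chr6", 3), ("chr7", 8), ("chr8", 4),
}
result = compare_callers(hc_example, mutect2_example)
print(len(result.overlapped), result.overlap_fraction_of_hc, result.fold_mutect2_over_hc)
```

In this toy example half of the HaplotypeCaller sites are shared and Mutect2 reports twice as many sites, which mirrors the kind of summary reported as an overlap percentage and a fold difference.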
Although an engineered TadA* has been evolved to be capable of deaminating DNA adenines [1,2], both TadA* and wild-type TadA retain the ability to deaminate cellular RNAs [9]. We analyzed ABEmax-induced RNA edits in ABEmax-overexpressing HEK293T cells from a published RNA-seq dataset (Supplementary Fig. 2a-b) [6]. Meanwhile, we generated our own data by co-transfecting HEK293T cells with an sgRNA to efficiently induce DNA A-to-G conversion within ABE site 8 (Supplementary Fig. 2c-e). The cells with the highest 15% of GFP signal were collected for on-target and off-target analysis. Higher overlapping ratios between two independent duplicates were observed for the A positions of RNA edits with higher editing efficiency, demonstrating the preferential affinity of ABEmax for highly edited RNAs (Supplementary Fig. 2a and 2d). Thus, we calculated sequence logos for ABEmax-induced RNA edits across different ranges of editing efficiency, showing that more highly edited adenines were preferentially located within a conserved motif closer to UACGA (Supplementary Fig. 2b and 2e), which closely resembles the conserved loop region of the tRNA substrate of ecTadA [3]. These data demonstrate that, consistent with a recent report [9], ABEmax induces efficient transcriptome-wide off-target RNA editing at sites harboring core E. coli tRNA-like sequences.
Therefore, we hypothesized that disrupting the interaction between the TadA/TadA* heterodimer and the tRNA loop structure may interfere with the catalytic activities of ABEmax on RNA. Since no crystal structure is available for the ecTadA-tRNA complex, we referred to the co-crystal structure of Staphylococcus aureus TadA (saTadA) and tRNA, together with an alignment of the highly similar conserved amino acid sequences of ecTadA and saTadA [3,13], which showed that the amino acids possibly responsible for interaction with tRNA are conserved between the two types of TadA (Fig. 1a and Supplementary Fig. 3a). Thus, we introduced a series of point mutations into either the TadA or TadA* monomer of ABEmax according to the interacting interface between homodimerized TadA and tRNA [3] to disrupt TadA/TadA* and tRNA interactions, and measured their RNA and DNA editing activities (Fig. 1b). To facilitate this test, we generated a robust reporter by cloning the E. coli Hokb (ecHokb) gene, which contains a tRNA-like CTACGAA sequence and has been reported to be highly edited by ecTadA at the RNA level [14], into a CMV promoter-driven vector. Then, this reporter was co-transfected with an sgRNA targeting HEK site 3 and ABEmax or its mutated variants, and the A-to-G editing efficiencies in ecHokb cDNA (reverse transcribed from mRNA) or genomic DNA (gDNA) were determined by deep sequencing of ecHokb cDNA or gDNA amplicons. The results showed that both ABEmax and 2xTadA induced highly efficient RNA but not DNA editing within the ecHokb locus. Notably, we identified three variants (N46A, H57A, and R153P) with substantially decreased RNA editing activities, especially R153P, whose RNA edits were reduced to a level comparable to the negative Cas9n control (Fig. 1c and Supplementary Fig. 3b; the endogenous RNA A-to-I edits detected in native HEK293T cells were deducted). In addition, their DNA on-target editing activities were retained (Fig. 1d). Moreover, similar to ABEmax [2], all variants induced very few byproducts and indels (Supplementary Fig. 3c-d). Three amino acids, including N46, R153, and the reported site E59 [6], were likely in close contact with tRNA near the enzymatic pocket in the structural prediction (Supplementary Fig. 3e). Additionally, the ABEmax-R153P variant exhibited comparable DNA on-target A-to-G editing activities at multiple target sites in human cells (HEK293T and U2OS cells) (Supplementary Fig. 4a-b). Thus, we identified three variants, especially ABEmax-R153P, with minimized RNA editing activities in the reporter assay.

Deletion of Arginine 153 (del153) reduces RNA off-targeting activities in upgraded ABEs
Considering the importance of R153 for deamination (Fig. 1), we tried to generate upgraded ABEs with reduced RNA off-targets by deleting R153 from TadA and/or TadA* within ABEmax (del153/del153*) or mini ABEmax (TadA* fused with Cas9n [9]; mini del153). As expected, compared with ABEmax or mini ABEmax, the RNA off-targets induced by del153/del153* and mini del153 were largely decreased, with as few as 291 (Mutect2) or 98 (HaplotypeCaller) RNA A-to-I edits for the mini del153 group (Fig. 2a-c; Supplementary Fig. 5a), while both variants retained relatively high DNA on-targeting activities (Fig. 2a). We also overlapped or merged the ABEmax-, del153/del153*-, or mini del153-induced RNA A-to-I edits called by HaplotypeCaller and Mutect2, respectively. Compared with ABEmax, both del153/del153* and mini del153 induced remarkably fewer RNA edits in the overlapped, HaplotypeCaller-specific, Mutect2-specific, and merged sets (Supplementary Fig. 5b). Manhattan plots and histograms further confirmed that both the number and efficiency of del153/del153*- and mini del153-induced RNA A-to-I edits were strikingly decreased (Fig. 2b-c), accompanied by much lower mean frequencies throughout the transcriptome (Supplementary Fig. 5c).
We further characterized the DNA on-target editing activities of the del153/del153* and mini del153 variants at another eight sites. The DNA on-targeting activities of del153/del153* and mini del153 were highly similar to those of ABEmax at nearly all tested sites, except for ABE site 12, which showed slightly lower but satisfactory editing efficiency (Fig. 2d; Supplementary Fig. 6a). Similar to SECURE-BE3, which strongly reduces RNA C-to-U edits [6], the del153/del153* and mini del153 variants decreased RNA A-to-I edits to only dozens or hundreds of off-targets when using HaplotypeCaller (Supplementary Fig. 5a). Considering both the on-targeting and off-targeting activities of the engineered variants, we designate del153/del153* and mini del153 as our best optimized ABE variants with minimized RNA editing activity. The recently reported ABE8e and ABE8s, which contain evolved mutations within TadA/TadA*, possess increased DNA on-targeting activities as well as elevated RNA off-targeting activities [15,16]. We tried to generate upgraded ABEs with higher DNA on-targeting and lower RNA off-targeting activity by deleting R153 from ABE8e or ABE8s, and found that RNA off-targets were remarkably decreased in both HaplotypeCaller and Mutect2 calculations, with ABE8s-del153 exhibiting a number of RNA edits comparable to that of mini ABEmax (Fig. 2e and Supplementary Fig. 6b). Notably, ABE8s-del153 and ABE8e-del153 showed comparable or slightly lower levels of DNA A-to-G editing activity than their parental editors; however, the on-targeting activity of ABE8e/8s or ABE8e/8s-del153 was much higher than that of ABEmax or mini ABEmax, and the editing window of ABE8e/8s or ABE8e/8s-del153 was also much wider than that of ABEmax or mini ABEmax (Fig. 2f; Supplementary Fig. 6c). Collectively, we propose that deletion of R153 is a good strategy for reducing RNA off-targeting activities in upgraded ABEs.

Discussion
Our observation of frequent and efficient ABEmax-induced transcriptome-wide RNA off-target editing is consistent with recent studies [7][8][9], although the number of RNA A-to-I edits was variable, possibly because of differential expression of ABEs [8] (Supplementary Fig. 6) and differences in detection methods. Having noticed that RNA edits with 0-10% efficiency were rarely recovered by HaplotypeCaller [6,8,9], we turned to Mutect2, a widely used tool for calling somatic mutations in cancers [11,17], which might be more suitable for analyzing SNPs within aneuploid mRNAs. Surprisingly, we identified 2.7- to 11-fold as many ABE-induced RNA edits, only 22-68% of which overlapped with HaplotypeCaller-generated edits (Supplementary Fig. 1). We conclude that the number of BE-induced RNA off-targets is underestimated, especially for those with < 10% editing efficiency, which may give rise to toxic or oncogenic proteins in therapeutic settings [18]. The sequence logo analysis suggests that TadA/TadA* preferentially edits cellular RNAs with a UACGA motif, regardless of the secondary structure of the RNAs. In addition, a large number of HaplotypeCaller-recovered RNA A-to-I edits cannot be captured by Mutect2, some of which can be validated by Sanger sequencing (data not shown). The detection performance of different tools for calling RNA mutations depends strongly on sequencing depth, the regions examined, and variant allele frequencies [12], which may lead to different results from different tools. Therefore, using Mutect2 alone is not the best way to examine RNA A-to-I edits, and developing a new tool, such as one combining HaplotypeCaller and Mutect2 [11,19], for more accurate evaluation of RNA off-targeting effects would be quite helpful [20].
Based on a structure-guided design [3] to disrupt the interaction between TadA/TadA* and tRNA-like mRNAs with a conserved UACGA motif [9], we successfully identify R153 as an important amino acid for the deaminase activity of TadA/TadA*, supported by the lower efficiency of RNA A-to-I edits induced by the R153A/R153A* variants. Interestingly, the RNA editing efficiency for our reporter ecHokb and for those RNAs with tRNA loop-like structures that are efficiently edited by ABEmax was decreased upon R153 mutation, whereas the total number of RNA edits was not significantly changed (Fig. 1). This indicates that R153 might be required for TadA to specifically bind to tRNA loop-like RNAs. Deletion of R153 within TadA/TadA* in the del153/del153* and mini del153 variants strikingly reduces the number of RNA off-targets while retaining high DNA on-targeting activity (Fig. 2), further supporting the soundness of our strategy. However, mutation of R153 into "P" or "A" may retain its structural interaction with RNAs, while mutation into an acidic amino acid "E" may disrupt this interaction, because the R153E mutant possesses very low on-targeting activity (data not shown). Moreover, our del153/del153* and mini del153 variants perform better than the previously reported optimized versions under our experimental conditions. Compared with the strong reduction of RNA C-to-U edits achieved by SECURE-BE3 [6], the RNA A-to-I edits induced by the del153/del153* and mini del153 variants are decreased to only dozens or hundreds of off-targets when using HaplotypeCaller. Given that mini del153 occasionally shows slightly lower DNA on-targeting efficiency, del153/del153* is preferentially recommended for targets with low targeting efficiency. We also combine the del153 strategy with the evolved ABEs, ABE8e and ABE8s [15,16], demonstrating that deletion of R153, a residue near some of the mutated amino acids in ABE8e/ABE8s [15,16], can also remarkably reduce the number of RNA edits while retaining their on-targeting activities in most cases (Fig. 2). Therefore, ABE8e-del153 and ABE8s-del153 are suitable when higher DNA on-targeting and lower RNA off-targeting activities are desired. In addition, it has been reported that BE-induced RNA off-target editing acts in an sgRNA-independent manner [6,9], and we do not consider sgRNA-dependent effects in the current study. Finally, these findings remind us to reconsider the off-targeting activities of the dCas9-fused epigenome editing tools reported by us and others [21,22].
In sum, we reveal R153 of TadA/TadA* as an essential amino acid for its RNA-interacting ability, and we successfully optimize ABEs by deleting R153 from TadA/TadA* to generate upgraded ABE (uABE) variants, which greatly reduce the number of RNA edits while retaining high DNA on-targeting activity. The successful engineering of CBE and ABE variants in our and two other studies [6][7][8][9] expands our understanding of the desired and undesired features of the DNA and RNA editing activities of base editors, and provides a feasible path for engineering base editors based on structural analysis to minimize unwanted properties while retaining the desired on-target editing ability of CBEs or ABEs.

RNA and genomic DNA extraction
Genomic DNA of HEK293T and U2OS cells was extracted using the phenol-chloroform method, and embryo genomic DNA was extracted using QuickExtract™ DNA Extraction Solution (Lucigen, QE09050). For RNA extraction, cells harvested from FACS were immediately treated with TRIzol reagent (Vazyme, R401-01), according to the manufacturer's instructions.

Targeted deep sequencing
Target sites were amplified with the primers listed in Supplementary Table 3 using Phanta® Max Super-Fidelity DNA Polymerase (Vazyme, P505). PCR products with different barcodes were pooled together for deep sequencing on the Illumina NextSeq 500 (2×150 PE) platform at the Novogene Bioinformatics Institute, Beijing, China. BWA and Samtools were employed for mapping the paired-end reads to the human reference genome (hg38), and VarDict was used to call single nucleotide variants (SNVs) in amplicon-aware mode. The aligned reads were visualized using the Integrative Genomics Viewer (IGV) and tabulated using pysamstats.
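As an illustration of how a per-position read tabulation can be turned into an editing efficiency, the following minimal sketch assumes that efficiency at a target adenine is defined as the fraction of reads carrying G at that position. This definition, the function name, and the example counts are assumptions made for illustration; the text does not state the exact formula used.

```python
from typing import Mapping, Optional


def a_to_g_efficiency(base_counts: Mapping[str, int]) -> Optional[float]:
    """Fraction of reads with G at a reference-A position.

    base_counts maps each base ("A", "C", "G", "T") to its read count at one
    position. Returns None when no reads cover the position.
    """
    depth = sum(base_counts.get(base, 0) for base in "ACGT")
    if depth == 0:
        return None
    return base_counts.get("G", 0) / depth


# Made-up counts for a single target adenine, for illustration only.
example_counts = {"A": 600, "C": 2, "G": 397, "T": 1}
efficiency = a_to_g_efficiency(example_counts)
print(f"A-to-G editing efficiency: {efficiency:.1%}")
```

For a target on the reverse strand the same calculation would be applied to T-to-C counts instead.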

RNA off-target analysis by RNA-seq
The libraries were sequenced on an Illumina HiSeq X Ten (PE150) at a depth of around 20 million reads per sample. The reads were mapped to the human reference genome (hg38) using STAR (version 2.5.1) with annotation from GENCODE v30. After removing duplicates, variants were identified by GATK (version 4.1.2) Mutect2 or HaplotypeCaller. For the Mutect2 method, variants were filtered with FilterMutectCalls. For the HaplotypeCaller method, variants with QD (Quality by Depth) < 2 were first filtered out. All variants were then verified and quantified by bam-readcount with parameters -q 20 -b 30. The depth for a given edit was required to be at least 10x, and at least 99% of reads in the wild-type samples were required to support the reference allele. Finally, only A-to-G edits on the transcribed strand were considered for downstream analysis. Motifs and sequence logos for RNA edits were analyzed with WebLogo (v3.6.0). The downloaded data from four published papers that were subjected to RNA off-target analysis are listed in Source Data for Sup Figures. Detailed information on called mutations is provided in Source Data for called mutations from RNA-seq data.
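The final filtering criteria can be sketched as follows. The record layout and names are hypothetical. The sketch also assumes that the 10x depth requirement applies to the edited sample, that a site with no wild-type coverage is rejected, and that "A-to-G on the transcribed strand" means A>G for plus-strand genes and T>C for minus-strand genes in genomic coordinates; none of these details is spelled out in the text.

```python
from dataclasses import dataclass

MIN_DEPTH = 10             # minimum read depth at the edit site (from the text)
MIN_WT_REF_PERCENT = 99    # minimum % of wild-type reads supporting the reference allele


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate RNA edit (genomic coordinates)."""
    chrom: str
    pos: int
    ref: str           # reference base on the genomic plus strand
    alt: str           # alternate base on the genomic plus strand
    gene_strand: str   # "+" or "-", strand of the overlapping transcript
    depth: int         # read depth in the edited sample (assumed meaning of "10x")
    wt_depth: int      # read depth in the wild-type sample
    wt_ref_count: int  # wild-type reads supporting the reference allele


def is_a_to_g_on_transcribed_strand(c: Candidate) -> bool:
    """A>G on plus-strand genes, or T>C on minus-strand genes."""
    change = (c.ref, c.alt)
    return (c.gene_strand == "+" and change == ("A", "G")) or (
        c.gene_strand == "-" and change == ("T", "C")
    )


def passes_filters(c: Candidate) -> bool:
    """Apply the depth, wild-type purity, and strand-aware A-to-G criteria."""
    if c.depth < MIN_DEPTH:
        return False
    if c.wt_depth == 0:
        return False  # assumption: no wild-type coverage means the site cannot be verified
    # Integer comparison avoids floating-point issues at the 99% boundary.
    if c.wt_ref_count * 100 < MIN_WT_REF_PERCENT * c.wt_depth:
        return False
    return is_a_to_g_on_transcribed_strand(c)


# Made-up candidates, for illustration only.
candidates = [
    Candidate("chr1", 1000, "A", "G", "+", 50, 100, 100),  # kept
    Candidate("chr2", 2000, "T", "C", "-", 30, 200, 199),  # kept (A-to-G on minus strand)
    Candidate("chr3", 3000, "T", "C", "+", 30, 200, 200),  # rejected: T-to-C on the transcript
    Candidate("chr4", 4000, "A", "G", "+", 9, 100, 100),   # rejected: depth below 10x
    Candidate("chr5", 5000, "A", "G", "+", 40, 100, 98),   # rejected: variant seen in wild type
]
kept = [c for c in candidates if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```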

Structural analysis
A structural model of the TadA-RNA complex was generated from the coordinates of PDB ID 2B3J using PyMOL (The PyMOL Molecular Graphics System, Version 1.9, Schrödinger, LLC.). TadA from Staphylococcus aureus (saTadA) is shown as a cartoon model in gray and the bound RNA is shown as a stick model colored by element, with the Zn2+ ion as a green sphere. The residues critical for RNA binding by TadA are shown in ball-and-stick representation and labelled with single-letter codes in red.

Statistics
Results were obtained from three independent or indicated experiments and were presented as the mean ± SD. For analyzing relative expression in FPKM, the FPKM from RNA-seq data in 3 embryos was used and presented as the mean ± SD. All original data presented in main figures were provided in Source Data for Main Figures, and original