Introduction

RNA-binding proteins (RBPs) bind to RNA regulatory elements to regulate the RNA lifecycle of networks of RNA species. As disruption of protein-RNA interactions is associated with many human diseases1, scalable technologies that identify protein-RNA interactions are critically needed to provide deeper insights into RNA regulation. Immunoprecipitation (IP)-based strategies coupled with high-throughput sequencing such as cross-linking immunoprecipitation (CLIP) are routinely used to identify RBP targets and binding sites of RBPs. The eukaryotic ribosome is also composed of a complex of RBPs, and ribosome profiling methods such as Ribo-seq2 and variants that leverage enhanced CLIP3 with antibodies recognizing ribosome subunit proteins can evaluate the mRNA translatome3. Unfortunately, these techniques rely on the digestion of the unprotected portions of the interacting RNA, hampering the discrimination of RBP binding sites on alternative mRNA isoforms and the association of multiple proteins with the same transcript.

Technologies such as TRIBE4 and STAMP5 address these issues by fusing RNA base editors (rBEs) to full-length RBPs, thereby obviating the need for RNase digestion and protein-RNA cross-linking. In STAMP, RBP-APOBEC1 fusions yield statistically significant clusters of C-to-U edits near the known RBP binding motif5,6. Since RNase digestions are nonobligatory in STAMP, long-read sequencing detected RBP interactions with specific mRNA isoforms5. STAMP requires fewer cells than CLIP and was demonstrated to enable single-cell analysis of RBP-RNA interactions5. However, the enzymes used in TRIBE and STAMP have reported native sequence context preferences for base deamination. APOBEC1 prefers editing cytosines that are flanked by adenine (A) and uracil (U) bases7 and disfavors editing sites with upstream guanine8. The catalytic domains of TRIBE enzymes (ADARcd and the mutated derivative used for hyperTRIBE) have a strong preference for editing double-stranded RNA, even without the double-stranded RNA-binding domains, and this leads to false negatives in TRIBE experiments4,9,10,11,12,13,14. Hence, we contend that the current paucity of available rBEs constitutes a constraint in the pursuit of transcriptome-scale exploration of protein-RNA interactions.

To substantially expand the repertoire of rBEs, we developed an experimental and computational framework consisting of a combination of reporter constructs and transcriptome-wide analysis in live human cells we term PRINTER (protein-RNA interaction-based triaging of enzymes that edit RNA) that evaluated the editing activities and specificities of 31 rBEs. We evaluated the most promising rBE candidates through their fusion with two distinct full-length human RBPs, RBFOX2 and RPS2, each known for their unique RNA interaction preferences. Our experiments successfully identified seven rBE enzymes capable of detecting transcriptome-wide protein-RNA interactions with high sensitivity and specificity. This expansion builds upon the foundation established by the previous three enzymes4,5,15, significantly broadening the toolkit for RNA interaction studies. Furthermore, our comprehensive characterization of editing biases associated with different rBEs when fused to RBPs underscores the importance of considering these biases, especially when selecting single or multiple rBE fusions. This choice becomes particularly crucial when studying RBPs with strict sequence motif preferences, such as RBFOX2, in contrast to RBPs with more broad sequence specificity, such as RPS2. Lastly, we recommend pairs of rBEs that are well-balanced for enabling dual-RBP editing measurements on the same RNA transcript. Our study sets the stage for enabling the next stage of discoveries that leverage diverse rBEs to illuminate RBP and ribosome-RNA interactions transcriptome-wide.

Results

A two-component reporter system evaluates rBEs for RNA editing activity in cells

To identify ideal fusion partners to full-length RBPs, we designed a framework to evaluate the activities of putative RNA base editors (rBEs) in live mammalian cells. Our system comprises two components that are transiently co-transfected into human embryonic kidney (Lenti-X 293 T Cell Line or “HEK293XT”) cells. The first is an editable mRNA reporter that consists of a 3’UTR with twelve bacteriophage MS2 stem-loops (12X MS2-SLs) that physically interact with the MS2 bacteriophage coat protein (MCP)16 RBP. The second component is an MCP-rBE fusion whose expression is controlled by doxycycline 24 h after transfection (Fig. 1a, b). Recruitment of the rBE to the reporter by the MCP interaction with MS2-SLs results in RNA editing on nearby substrate bases (Fig. 1b). After 48 h, DNA-free RNA is isolated from lysed cells and targeted RNA sequencing then selectively detects edits on the reporter mRNA (Supplementary Fig. 1a). As a positive control, we evaluated MCP-APOBEC1 used previously in STAMP5. To ensure the robustness of the results, two biological replicates were conducted for the experiments. As expected, APOBEC1 deposited C-to-U edits on the reporter mRNA. Sequencing identified multiple instances of uridine (U) in place of cytidine (C) at several positions along the twelve-MS2 stem-loop region, indicating C-to-U editing by the MCP-APOBEC1 fusion at those sites (Fig. 1c, green bars). In contrast, we only observed low levels of edits that may be due to expected PCR or sequencing errors in the absence of the MCP-APOBEC1 fusion (“No rBE,” Fig. 1c). Therefore, our two-component system successfully detects rBE-mediated RNA editing.

Fig. 1: A reporter system to assay RNA base editing activity of candidate RNA base editors (rBEs).
figure 1

a Reporter mRNA bearing twelve MS2 bacteriophage stem-loops (yellow bars) downstream of a Super folder GFP coding sequence (sfGFP) and a chimera bearing an MS2 coat-protein (MCP, black) domain fused to a candidate rBE (brown). b Strategy to test candidate rBEs. Plasmids encoding the constructs in a) are co-transfected into HEK293XT cells so that the MCP binds the MS2 stem-loops in the reporter, and the rBE catalyzes RNA editing. After total RNA is isolated, targeted RNA sequencing is used to detect edits along the reporter sequence. c Fraction of total covered bases at each position along the twelve MS2 stem-loop reporters exhibiting either C-to-U (green), A-to-I (orange), or no (black horizontal line) edits. d The number of C-to-U (green) and A-to-I (orange) edits on the reporter mRNA (on-target, left) or other poly(A)+ RNAs (off-target, right). e Ratio of the number of on-target C-to-U (green) and A-to-I (orange) edits on twelve MS2 loop construct vs. the number of off-target edits on poly(A)+ RNAs, plotted for n = 2 independent experiments across each enzyme. Boxes extend from the first to the third quartile of the data, with the center line indicating the median. Box whiskers extend to the farthest data point lying within 1.5x the inter-quartile range from the box in either direction. (Figure created with BioRender).

Transcriptome-wide analysis reveals rBEs display a range of editing activity and accuracy

We evaluated rBEs that previously demonstrated precise RNA editing (e.g., RESCUE-S ADAR2dd17, hereafter A2dd (R-S)) and DNA base editors with high levels of “on-target” DNA editing activity (e.g., evoCDA18) or “off-target” RNA editing activities (e.g., TadA-7.10 (V82G), hereafter 7.10 (V82G)) from previous studies that targeted editing to specific DNA or RNA loci using or Cas9- or Cas13-based technology8,17,18,19,20,21,22,23,24,25,26,27,28,29. We evaluated 31 candidate editors, including fourteen C-to-U editors (including APOBEC1), seventeen A-to-I editors (including TRIBE and hyperTRIBE enzymes), and two capable of catalyzing both A-to-I and C-to-U edits. Our panel of enzymes includes different protein families such as Escherichia coli (E. coli) tRNA-specific adenosine deaminase (TadA), activation-induced cytidine deaminase (AID)/Apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC), and adenosine deaminase acting on RNA (ADAR). These enzymes were expressed as C-terminal fusions to MCP (Fig. 1a, b) and individually evaluated by our reporter system in two biological replicates each.

The editors deposited a wide variety of C-to-U editing profiles on the reporter RNA. The mutant derivative of APOBEC1, evoAPOBEC18, yielded noticeably higher editing near the MS2 stem-loops than APOBEC1, albeit with edits reaching farther upstream into the GFP coding sequence (“edit spillover,” Fig. 1c). Since the GFP coding sequence is away from the MCP-binding sites, editing there may reflect off-target RNA editing (e.g., editing while the MCP is not bound to the RNA). The APOBEC3A-based mutant, A3A (Y132G/K30R)30, similarly catalyzed higher editing activity near the MCP binding sites than APOBEC1. However, A3A (Y132G/ K30R) had far less edit spillover, which may be a desirable feature for editing-based identification of binding sites of sequence-specific RBPs (Fig. 1c). Furthermore, the wild-type A3A and the A3A (Y132G/G188A/R189A/L190A)30 mutant produced similar editing rates but different editing patterns than APOBEC1, indicating that these enzymes may complement APOBEC1 (Fig. 1c). Lastly, the C-to-T-editing enzymes evoCDA18, evoFERNY8, evoA3A(N57G)18, APOBEC3G (A3G)31, AID* (truncated)32, AID (C12*)33, SECURE rAPOBEC1 (R33A)18, and SECURE rAPOBEC1 (R33A/K34A)18 did not produce detectable RNA editing on our reporter mRNA (Fig. 1c, see also Materials and Methods), confirming the engineered reduction of RNA-editing for the SECURE rAPOBEC1 DNA-editing enzymes18.

The A-to-I RNA-editing enzymes we assessed produced at least five editing patterns on the reporter RNA. The first group of editors, group A, deposited edits across the entirety of the reporter mRNA (TadA (D108N)21, TadA (V82G/D108N)21, TadA (V82G/D108N)-d21, and TadA (L84F/D108N)21; Fig. 1c). Group B editors, interestingly, primarily edited the GFP coding region with little to no editing near the 3’UTR MCP binding sites (TadA21, TadA-d21, TadA (V82G)18,21, TadA (V82G)-d18,21), while Group C editors barely had any signal, which we suspect arose from PCR or sequencing errors (TadA (L84F)21, ADARcd4, and ADARcd (E488Q)15 (hereafter hAcd (hyperTRIBE) for simplicity) (Fig. 1c). Group D editors primarily edited near the MS2 binding sites, albeit along a wide range of spillover rates (editing observed on the GFP coding sequence) with enzymes such as TadA-8.225 TadA-8e25, and TadA-8e-d25 (hereafter 8.2, 8e, and 8e-d respectively) at the high end and enzymes like 7.10 (V82G)18, TadA-7.10)18,21,34 (hereafter 7.10), and TadA-7.10-d18,21,34 (hereafter 7.10-d) at the low end (Fig. 1c). The last group of editors, group E, comprised ADAR-derivatives RESCUE ADAR2dd17 (hereafter A2dd (R)) and A2dd (R-S)17, edited both A-to-I and C-to-U simultaneously, an activity also observed with Cas13-directed RNA editing17. However, despite similar editing capability and spillover rates, the A2dd (R) enzyme displayed much more activity than the A2dd (R-S) mutant (Fig. 1c). Notably, the TadA and its mutated derivatives display a variety of A-to-I editing patterns (Fig. 1c). These results indicate a mutation strategy can help create designer editors. We demonstrate that our reporter system successfully evaluated A-to-I and C-to-U RNA editing activity and specificity.

In our reporter assays, we pinpointed enzymes that outperformed TRIBE and STAMP in terms of signal-to-noise ratio. Among these, A3A (Y132G/K30R), 7.10 (V82G), and 7.10 displayed the most potent on-target editing within the reporter mRNA’s 3’UTR, while maintaining minimal off-target edits on the GFP coding region (see Fig. 1c). EvoAPOBEC1, 8e, 8.2, 8e-d, and 8.2-d exhibited remarkable editing activity in proximity to the MCP binding sites. These enzymes are particularly valuable for studying proteins with brief dwell times on their target RNAs, despite some detectable spillover rates (refer to Fig. 1c). Importantly, the spillover appears to be primarily associated with the robust interaction between MCP and the cognate MS2 stem-loops (as indicated by a low dissociation constant, Kd = 10−9–10−8 M, ref. 35). Hence, we hypothesize that RBPs with shorter dwell times on RNA will yield more precise editing near genuine RBP binding sites, with manageable off-target effects. Additionally, we included A2dd (R) and A2dd (R-S) due to their capacity to edit both adenosine and cytidine, expanding the range of sequence substrates that can be captured.

Detecting genome-wide off-target editing for the top editing candidates

To determine the editing accuracy for the eight candidates compared to TRIBE and STAMP, we reasoned that since the reporter bears sequences that are not native to HEK293XT cells (namely the GFP and the MS2 SLs), we can gauge the accuracy of the MCP-rBE candidate by comparing the frequency at which the fusion edits the reporter (on-target editing) to the edits on endogenous transcripts that lack the MCP binding sites (off-target editing, Fig. 1d). We performed polyA+ RNA sequencing on the same RNA samples from both experimental replicates used to prepare the targeted RNA sequencing in the reporter-based screen (Fig. 1a–d). Analysis of transcriptome-wide data revealed strong agreement with our targeted RNA-seq assay results, uncovering the highest editing rates for 8.e and 8.e-d, then decreasing progressively in editing rates to evoAPOBEC1, A2dd (R), 7.10 (V82G), 7.10, APOBEC1, A3A (Y132G/ K30R) and A2dd (R-S) (Fig. 1d). The editing rate for hAcd (hyperTRIBE) enzyme was comparable to or slightly below background editing (Fig. 1d, e). Further, the off-target editing rates for each enzyme generally correlated with editing activity, with the most active editors (e.g., 8e) demonstrating higher off-target editing than the least active editors (e.g., A2dd (R-S)). Intriguingly, the 8e-d dimer demonstrated lower on-target editing activity while simultaneously increasing off-target editing relative to the 8e monomer (Fig. 1d, e). Our results indicate that rBEs 7.10 (V82G) and A3A (Y132G/K30R) have the highest editing and signal-to-noise activity of the A-to-I and C-to-U rBEs, respectively.

RBFOX2 fusions with candidate rBEs reveal authentic and distinct RNA targets

Having established that our enzymes performed well as fusions to MCP, we next assessed whether our candidate rBEs demonstrate accurate editing when fused to distinct human RBPs. We engineered 7.10 (V82G), A2dd (R), APOBEC1, and 8e ORFs as C-terminal fusions of RNA Binding Fox-1 Homolog 2 (RBFOX2) protein (Fig. 2a, b). We reasoned that since RBFOX2 binds specific sequence motifs ((U)GCAUG36, where (U) is present in some, but not all binding sites), fusing the candidate rBEs to this RNA-binding protein should yield edits near the known RBFOX2 binding motif. To distinguish RBFOX2-directed editing from free rBE-directed editing, we also expressed each rBE without fusion to an RNA-binding protein domain (“free rBEs”; Fig. 2a, b). In biological triplicates, we transiently transfected HEK293XT cells with plasmids encoding each fusion. After 72 hrs of inducing expression of the constructs, we generated and sequenced RNA-seq libraries. The sequencing data for RBP-fusions and free RBE experiments was first processed with the SAILOR algorithm to detect both A-to-I and C-to-U base changes5,37, before we used the FLARE algorithm6 to identify statistically significantly enriched edit clusters along the exons and introns of the target RNA species (Fig. 2b). We pinpointed clusters consistently identified in all three RBFOX2-rBE experiments and designated them as RBFOX2-rBE edits. Conversely, clusters present in all three free RBE experiments were categorized as background noise and subsequently excluded from the RBFOX2-rBE experimental dataset. We retained 7882 clusters (representing 4151 genes) for 8e, 4003 (2437 genes) for A2dd (R), 1897 (1274 genes) for APOBEC1, and 736 (549 genes) for 7.10 (V82G) (Fig. 2c, d). Our transcriptome-wide results were generally consistent with the reporter assay findings, as the enzymes with the highest number of clusters (8e, A2dd (R)) also achieved the greatest editing on the reporter (Fig. 2c, d, and Fig. 1d). The only inconsistent observation was with APOBEC1, which when fused to RBFOX2 resulted in more clusters and identified more targets than 7.10 (V82G), but it produced less editing on the reporter than 7.10 (V82G) (Fig. 2c, d, and Fig. 1d). This may be due to enzyme-specific biases, although it may be that the differences are within margins of error. Despite the differences in the number of clusters detected by each fusion, de novo motif enrichment analyses using HOMER38 identified statistically significant enrichment of the central GCAUG motif for all RBFOX2-rBEs evaluated (Fig. 2e, and Supplementary Fig. 2a–c). We observed that each of the RBFOX2-rBE fusions deposited edits near the GCAUG sequence statistically (p-value < 0.00001) more frequently than free rBEs (Fig. 2f, g). It is worth noting that the proportion of edit clusters containing the core RBFOX2 binding motif (GCAUG) may appear relatively low across various enzymes. However, these findings align with independent CLIP studies, revealing that in HepG2 and HEK293 cells, less than 33% and approximately 40% of RBFOX2 binding sites, respectively, feature the typical RBFOX2 binding motif39,40. In addition, many edited regions lacking the canonical motif may contain RBFOX2 motifs of intermediate affinity (see ref. 6). More importantly, empirical permutation tests demonstrated that the RBFOX2-rBE clusters were more likely to coincide with RBFOX2-APOBEC1 eCLIP peak sequences than randomly selected, similarly sized regions (Fig. 2h). A priori, we do not expect a perfect overlap between rBE- and eCLIP-based detection of RBFOX2 binding sites (Fig. 2h).

Fig. 2: RBP-Mediated RNA Editing with RBFOX2 and Top rBE Candidates.
figure 2

a Constructs feature RBFOX2 RNA-binding protein (RBP, pink) fused to RNA base editors (rBEs, brown) at the C-terminal end (“Sequence-specific RBP-associated rBEs”) and solo rBEs (“Free editors”). b RBFOX2-directed or free rBE activity detection in HEK293XT cells involves plasmid transfection. RBFOX2-RBP fused rBEs target GCAUG binding sites, differing from free rBEs. After 72 h, RNA is extracted for poly(A) + RNA sequencing and FLARE6 edit cluster detection. c, d FLARE analysis reveals edit clusters and RNA species edited by each RBFOX2-rBE fusion. Colors: RBFOX2 fusion to 8e (dark blue), A2dd (R) (teal), APOBEC1 (green), 7.10 (V82G) (red). e HOMER analysis identifies the canonical RBFOX2 binding-site motif ((U)GCAUG) as the top motif in each fusion’s edit clusters, using a cumulative hypergeometric distribution for p-values. f Replicable edit cluster fraction in RBFOX2-rBE fusions (n = 3 experiments) is higher than in random edit clusters from mRNA targets, showing enrichment. Box plots display data, with boxes from first to third quartiles, median center line, and whiskers extending to 1.5x the inter-quartile range. g Density plots show RBFOX2-rBE fusions’ replicable peak centers (n = 3 experiments) closer to the RBFOX2 motif than rBEs alone. RBFOX2-rBE fusion colors as in 2c; Free rBE colors distinct: 8e (purple), A2dd (R) (turquoise), APOBEC1 (light green), 7.10 (V82G) (pink). h Fraction of replicable edit clusters in RBFOX2-rBE fusions overlapping RBFOX2-APOBEC1 eCLIP peaks is higher than in random peak sets. Box plot details as in (f). i, j Analysis of relationships between edit cluster sets and RNA species for each RBFOX2-rBE fusion is depicted with grey dots. Lines bisect intersecting values, and black bars reflect set numbers. k The canonical RBFOX2 binding-site motif is predominant in unique clusters from RBFOX2-rBE fusion, as per HOMER analysis. (Figure created with BioRender).

Editing-based detection provides a larger temporal window of interaction capture (72 h in our experiments), whereas eCLIP offers a momentary snapshot of RBFOX2 interactions. Also, we expect edits up to 200 base pairs away from eCLIP-based RBP-RNA interaction sites41 (for a more in depth analysis, please see refs. 6,41). Notably, RBFOX2-APOBEC1 displays a higher frequency of capturing the RBFOX2 motif with the upstream U present in some RBFOX2 targets ((U)GCAUG) compared to other RBFOX2-rBE fusions (as depicted in Fig. 2e). However, it is important to consider that APOBEC1 prefers editing substrate bases surrounded by A/U-rich regions7. This preference may explain the more frequent occurrence of the upstream U in the motifs enriched by RBFOX2-APOBEC1.

Subsequently, we assessed the degree of overlap among RBFOX2-rBE edit clusters originating from various rBEs. Intriguingly, only 99 clusters exhibited commonality across all fusions (Fig. 2i, j). Even though the clusters detected by all fusions displayed a notable enrichment for the ((U)GCAUG motif (as shown in Fig. 2k), it is noteworthy that the clusters unique to −8e, -A2dd (R), and -APOBEC1 RBFOX2 fusions also exhibited statistically significant enrichment for the (U)GCAUG motifs. This suggests that each enzyme introduces edits at frequently distinct RBFOX2 binding sites (as illustrated in Fig. 2k and Supplementary Fig. 2d). Given that the RBFOX2 fusions solely differ by the rBE, our findings imply that enzyme-specific editing biases can lead to varying editing frequencies at distinct segments of the same RNA species bound by RBFOX2.

Sequence context preferences of rBEs influence the detection of sequence-specific RBP binding sites

To gain deeper insights into the intrinsic sequence preferences of the rBEs and how their fusion with sequence-specific RBPs influences their editing patterns at binding sites, we conducted a detailed analysis of the sequences surrounding the edits within the clusters derived from the RBFOX2-rBE fusion experiments. We employed a rigorous approach, training eight distinct two-layer convolutional neural networks (CNNs) to delve into the unique characteristics of each of the four rBEs individually, as well as their behavior when fused with RBFOX2. For each model, we used one-hot encoded 200-bp sequences from clusters associated with a specific rBE or RBFOX2-rBE fusions as positive examples, while an equal number of one-hot encoded 200-bp sequences from all other rBEs or RBFOX2-rBE fusions, respectively, served as negative examples. Each model was trained to provide a binary prediction, determining whether a given region was edited using that specific enzyme. To ensure effective learning without overfitting, we continued training until there was a minimal marginal reduction in the binary cross-entropy loss score on a separate validation dataset.

Across 8e, A2dd (R), APOBEC1, and 7.10 (V82G), these models can distinguish clusters generated by their respective target rBE from those produced by other rBEs fairly well, with the free rBE variations, on the whole, performing better (AUCs of 0.715, 0.839, 0.787, and 0.669, respectively) than those for the rBE fusions (AUCs of 0.701, 0.696, and 0.721) (Fig. 3a). Interestingly, models trained on clusters generated solely by free rBEs exhibited similar performance when evaluated on RBP-rBE fusion clusters, compared to models trained directly on RBP-rBE fusion clusters (Fig. 3b). These observations suggest that while fusion with an RBP can alter the editing profile of an rBE, the underlying rBE-specific biases can persist. Consequently, even when different RBFOX2-rBE fusions associate with the same binding sites, the composition of the surrounding sequence at the RBP binding sites, along with the inherent rBE preferences, jointly influence whether the binding event will result in edits.

Fig. 3: Analysis of RBFOX2-rBE and free rBE sequence biases.
figure 3

a A CNN with two convolutional layers identifies enzyme-specific biases in enzyme-RBFOX2 fusion and enzyme-alone experiments. Higher AUCs for enzymes alone suggest more identifiable biases. Color-coding: RBFOX2 fusions with 8e (dark blue), A2dd (R) (teal), APOBEC1 (green), and 7.10 (V82G) (red). For enzymes alone: 8e (purple), A2dd (R) (turquoise), APOBEC1 (light green), and 7.10 (V82G) (pink). b A CNN trained on enzyme-alone clusters predicts enzyme identity in fusion peaks well, often outperforming one trained on fusion clusters. Conversely, a CNN trained on enzyme-fusion clusters is less effective on free RBE clusters (right). c Edited sites show distinct, replicable enzyme-specific flanking base context preferences, evident in a PCA plot, whether alone or fused to RBFOX2. d In the PCA plot, the first (black) and second (purple) principal components distinctly reflect contributions from each RNA base or combinations (e.g., GC). e Density plots reveal consistent peak adenosine (A, left) and guanosine-cytidine (GC) content across RBFOX2-rBE and free rBE (top and bottom, respectively) for any enzyme. f Edit site context preferences, indicated by the height of each bar, vary by enzyme and are consistent between free and RBFOX2-rBE edits. Each flanking base set is color-coded to match its bar in the chart. g On genes with two GCAUG motifs edited by different enzymes, peak base contents and enzyme editing context specificities (from e) influence editing likelihood for each region. h Comparison of combined edit clusters (red bar) and individual RBFOX2-rBE edit cluster sets intersecting with RBFOX2-APOBEC1 eCLIP peak sequences. i Boxplots compare RBFOX2-APOBEC1 eCLIP peak sequence overlaps with replicable RBFOX2-rBE edit clusters against thirty random eCLIP sets; red circles for actual peaks, enrichment shown as red bars. Boxes cover middle 50% of data, median as center line, whiskers up to 1.5x inter-quartile range.

We next aimed to uncover which specific sequence features underlie these differences between the preferred editing contexts of each rBE. To do so, we characterized the four bases flanking high-confidence edit sites (SAILOR score > 0.9) for each rBE. A principal components analysis of the counts of each flanking base context within each RBP-rBE fusion dataset – based on proximity in PCA space along the first two principal components (PCs) – indicates that each rBE has highly replicable distributions of flanking base pairs and that edits generated by the same rBEs (whether alone or fused to RBPs) are found in more similar contexts to one another than to those generated by any other rBE or RBP-rBE fusion (Fig. 3c). These results support the conclusions from the CNN-based analyses described above (Fig. 3a, b). We examined the loadings for the contribution from each possible flanking base context with respect to PCs 1 and 2 and found that the factor loadings for PC1 exhibited strong positive contribution from A/U bases and negative contribution from G/C bases (Fig. 3d). The opposite was true for PC2, as it had negative contribution from A/U bases and positive contribution from G/C (Fig. 3d). Indeed, APOBEC1 and RBFOX2-APOBEC1 flanking contexts have high PC1 (A/U) values and low PC2 (G/C), consistent with the known APOBEC1 preference for cytosines flanked by As and Us7,42,43 and its low tolerance for Gs8, respectively. These rBE-specific affinities are evident at the cluster-wide level, as shown by similarly varying cluster-level base contents (Fig. 3e, and Supplementary Fig. 3a). We see that 7.10 (V82G) and A2dd (R) clusters tend to have higher GC content than APOBEC1 and 8e clusters, and APOBEC1 clusters are enriched for A content compared to clusters from other enzymes. Further examination of the distribution of edit flanking contexts revealed that out of the 16 possible flanking bases, 12 were either A’s or U’s. This reflects A/U flanking context bias, consistent with APOBEC1’s preferences7,42,43. In contrast, A2dd (R) edit sites exhibited enrichment for flanking contexts involving G’s and C’s, indicating a higher tolerance for these bases (Fig. 3f).

The implications of these varying preferred contexts for different RBP-rBE fusions are that authentic RBP binding sites, depending on their specific nucleotide contexts, may be better suited for editing by one RBP-rBE fusion over another. In other words, no single RBP-rBE fusion can edit the entire universe of existing binding sites. Pairs of core RBFOX2 binding sites (GCAUG) on the same gene are sometimes preferentially edited by different rBEs (Fig. 3g), with the nucleotide context of the 4 bases flanking the core GCAUG motif serving as a predictive element for which rBE is most likely to edit at each site. For example, when we focus on the FLARE-derived significant edit clusters found on CCNLI RNA, we notice that the RBFOX2 binding site with a greater GC content is only significantly edited by RBFOX2-A2dd (R), and the site with higher A content is only significantly edited by the RBFOX2-APOBEC1 fusion (as shown in Fig. 3g and Supplementary Fig. 3b).

These rBE-specific biases imply that at least for RBFOX2, a single RBP-rBE fusion is insufficient to capture the entire spectrum of RBP-RNA interactions. Combining multiple rBEs is expected to provide better coverage of true RBP binding sites than any individual fusion. To test this, we merged the confident cluster sets from all RBFOX2-rBEs into a single combined set and assessed the fraction of overlap with RBFOX2-APOBEC1 eCLIP peaks (Figs. 2c and 3h). Through empirical permutation testing, we observed that the combined cluster set captured a greater fraction of RBFOX2 eCLIP peaks than any of the individual fusions, and this overlap was statistically higher than expected by random chance (Fig. 3h, i). Importantly, even though 8e had higher editing activity, less active enzymes still captured eCLIP clusters that 8e missed (Fig. 3h), indicating that increasing the editing efficacy of certain rBEs may yield more clusters, but there will always be some subset of binding targets that remain undiscoverable without the use of other enzymes with editing context preferences better suited for those targets.

Lastly, we assessed whether the editors exhibited editing bias towards specific RNA regions. We found that, while most rBEs, both as fusions to RBFOX2 and individually, generally favored editing in 3’UTRs, 8e and A2dd (R) exhibited slightly more editing in coding regions (CDS) (Supplementary Fig. 3c). In conclusion, the suitability of rBEs to fuse with sequence-specific RBPs is influenced by sequence context, and a combination of multiple rBEs is recommended for achieving higher coverage and discovery of RBP-RNA binding sites.

Ribosomal protein RPS2-rBE fusions robustly detect transcriptome-wide mRNA translation changes

We previously demonstrated that the fusion of APOBEC1 to the core small ribosomal subunit protein RPS2 (RPS2-APOBEC1 or RiboSTAMP) enabled the measurement of ribosome-mRNA interactions, even in single cells5. Here, we evaluated RPS2 fusions to 8e, A2dd (R), and 7.10 (V82G), in comparison to APOBEC1 (Fig. 4a, b). Plasmids encoding RPS2-rBE fusions were transfected into HEK293XT cells and induced with doxycycline for 24 h, after which cells were treated with either dimethyl sulfoxide (DMSO) vehicle (-) or 100 nM of the mTOR pathway inhibitor Torin-1 (+) for 48 h (Fig. 4b) to induce changes in mRNA translation. The number of edits per sequencing read (edits per read or “EPR”) for each experiment was measured as a proxy for mRNA translation. We conducted three experiments for each RPS2-rBE fusion and concentrated our analyses on mRNAs that were consistently edited in all biological replicates. As expected, we observed generally lower EPR values in cells treated with Torin-1 compared to DMSO treatment for all RPS2-rBEs (Fig. 4c). We also determined that more mRNAs exhibited statistically significant decreases in EPR than increases (Fig. 4d). Notably, the Torin-1-mediated reduction of editing by the RPS2-rBE fusions was more evident among 5’ terminal oligopyrimidine tract (TOP)-containing mRNAs44 (Fig. 4e). Among the tested enzymes, the RPS2-A2dd (R) showed the lowest p-value and the largest t-test statistic for decreasing EPR in (TOP)-containing mRNAs between Torin-1 and DMSO conditions (t-test statistic 14.84, p-value < 10−44; Fig. 4f), followed by 8e (t-test statistic 9.43, p-value < 10−19), APOBEC1 (t-test statistic 4.24, p-value < 10−4) and 7.10 (V82G) (t-test 3.44, p-value < 10−3) (Fig. 4e, f). Notably, while all RPS2-rBE fusions similarly detected the majority of the fifty TOP-containing mRNAs, the overlap for non-TOP-containing RNAs that experienced significantly reduced editing exhibited a smaller overlap (Supplementary Fig. 4a, b).

Fig. 4: RNA Editing as a Proxy for Translation with RPS2 and Top rBE Candidates.
figure 4

a Construct with small ribosomal subunit protein RPS2 (purple) fused to a candidate rBE (brown) at its C-terminal end (“Translation-directed editor”). b RPS2-directed editing detection involves transfecting HEK293XT cells with a plasmid encoding the above construct and incubating with Torin-1 to inhibit translation or DMSO as a control. Post-treatment, poly(A) + RNA libraries are prepared and sequenced. Torin-1 is predicted to reduce editing, as indicated by reduction in edits per read (EPR). c EPR in Torin-1 treated cells shows a reduction compared to DMSO-treated cells across all RPS2-rBE fusions. d Statistically significant changes in RPS2-rBE EPR post Torin-1 treatment: decrease (red), increase (blue), or no significant change (grey) compared to DMSO control. e Log2-transformed decrease in EPR due to Torin-1, calculated as EPR in Torin-1 condition over untreated, across n = 3 experiments for RPS2-enzyme fusions in poly(A)+ RNAs (black) and TOP-containing mRNAs (pink). Boxes span the middle 50% of the data. The median is represented as the center line, and whiskers extend to the farthest data point lying within 1.5x the inter-quartile range from the box in either direction. f Edits per read fold-change between Torin-1- and DMSO-treated cells and associated statistical significance (log2-transformed p-value) for each RNA. Poly(A)+ RNAs (black) and TOP-containing mRNAs (pink) are highlighted. A two-sided t-test was used to determine significance. g Boxplots reveal EPR differences between Torin-1 and DMSO treatments in coding sequences (CDS, purple shading) and 3’ UTRs, over n = 3 experiments. Pronounced changes in CDS-specific editing for poly(A)+ RNAs (black) and TOP mRNAs (pink). Boxes cover middle 50%, median as center line, whiskers up to 1.5x inter-quartile range. (Figure created with BioRender).

We previously observed RPS2-APOBEC1 editing within 3’UTRs5. We compared the relative editing rates between mRNA coding sequences (CDSs) and 3’UTRs across our experimental conditions for our rBE-candidates. When considering all mRNAs, the ratios of EPR in CDS regions to 3’UTRs were similar across enzymes for mRNAs from both untreated and Torin-1 treatment conditions (Fig. 4g, compare the black CDS boxplot to the black 3’UTR boxplot for each individual enzyme). However, in TOP-containing genes, the CDS to 3’UTR editing ratio skews higher for each individual enzyme, indicating stronger decrease in editing among CDS regions compared to 3’UTRs (Fig. 4g, compare pink CDS boxplot to pink 3’UTRs boxplot for each individual enzyme). The ratios of the means of CDS/3’UTR editing as depicted in the boxplot are as follows: RPS2-8e = 1.873, RPS2-APOBEC1 = 1.384, RPS2-7.10 (V82G) = 2.331, and RPS2-A2dd (R) = 2.456. Since TOP genes are among the most highly translated genes, the increased propensity for CDS edits may indicate higher ribosome load on the coding sequences of these transcripts in standard growth conditions. This interpretation is bolstered by the observation that the CDS to 3’UTR editing ratio of TOP genes is lower in Torin-1 treated cells (Fig. 4g, and Supplementary Fig. 4d). Therefore, as with RPS2-APOBEC1, other RPS2-rBE fusions demonstrate editing in 3’UTRs, with RPS2-mediated editing more pronounced in CDS regions for all rBEs.

Like our observations for RBFOX2-rBE fusions, our analysis of the flanking base context for edits revealed distinct base context preferences for each enzyme fusion (Supplementary Fig. 4e, f) which are unaltered by the effects of Torin-1, as Torin-1 replicates clustered together with untreated replicates (Supplementary Fig. 4e). We observed that the loadings for PC1 had strong positive contribution from A/U bases, and strong negative contribution from G/C bases (Supplementary Fig. 4e, f). Conversely, the PC2 loadings had strong negative contributions from A/U and strong positive contributions from G/C bases (Supplementary Fig. 4e, f). These influences are also reflected in the bases that often neighbor edited sites (Supplementary Fig. 4g). However, as ribosomes are recruited to a substantially broader sequence area than sequence-specific RBPs like RBFOX2, we conclude that single rBE fusions to ribosome components is sufficient to measure general aspects of mRNA translation. To detect variations in RPS2-mediated editing rates, we suggest using 8e, A2dd (R), or APOBEC1. This is because the fold decreases in editing are more pronounced in Torin-1-treated cells compared to DMSO-treated cells with these RPS2-rBE fusions as opposed to the less sensitive RPS2-7.10 (V82G) fusion (Fig. 4f, g).

Evaluating combinations of C-to-U and A-to-I rBEs to assay dual editing compatibility

A-to-I and C-to-U edits can be used simultaneously to interrogate the binding of two distinct RNA-binding proteins on the same RNA transcript (as with TRIBE-STAMP45 that uses hAcd (hyperTRIBE) and APOBEC1). To test this possibility for our rBE fusions, we modified the synthetic 3’UTR in our twelve MS2 stem-loop reporter to also include PP7 stem-loops (Fig. 5a, left). The PP7 bacteriophage coat protein (PP7-CP) binds to the PP7 stem-loops, which coupled with the MS2 stem-loops can simultaneously recruit both C-to-U and A-to-I editing enzymes to the same 3’UTR. Different distributions of the MCP and PP7-CP binding sites on the reporter can also yield insights into the effects of binding site proximity on RNA co-editing (Fig. 5a, left). To create PP7-CP fusions, we replaced MCP with PP7-CP in the MCP-APOBEC1 plasmid and substituted APOBEC1 with either precise (7.10 (V82G)) or more active (8e) A-to-I editors selected based on enrichment scores and noise levels from RBFOX2 fusion and reporter experiments. After, plasmids encoding the MCP-APOBEC1, one of the PP7-CP-A-to-I fusions (PP7-CP-7.10 (V82G) or PP7-CP-8e), and one of the reporter mRNAs were co-transfected into HEK293XT cells (Fig. 5a, b) in biological duplicates. The editing experiments were then carried out for our initial rBE screen (Fig. 5a, b).

Fig. 5: A combinatorial editing reporter system to identify multiple RBP associations with the same transcript.
figure 5

a Components that are used to test combinatorial C-to-U and A-to-I rBE pairs. The system uses a reporter mRNA with varied MS2- (yellow rectangle) and PP7- (red rectangle) stem-loop distributions in the 3’ UTR. The reporter binds the MS2 (MCP, black) and PP7 (red) coat proteins fused to C-to-U and A-to-I rBEs. b A combinatorial editing strategy. Plasmids encoding C-to-U and A-to-I fused MCP and PP7 proteins are co-transfected into HEK293XT cells with the reporter bearing both binding sites. After, the edits are detected on the reporter with targeted RNA sequencing. ce Distribution of C-to-U (green) and A-to-I (orange) edits deposited by each of five different enzyme combinations and the reporter without enzymes. The edits are mapped along each of the three distinct reporters. One reporter bears a c MS2 (yellow) and PP7 (red) stem-loops spaced 50 bp apart, with 350 bp separating the pairs. The other two reporters contain d twelve or e four alternating MS2 and PP7 binding sites spaced 50 bp apart. (Figure created with BioRender).

Our dual-editing reporter system revealed distinct co-editing patterns. MCP-APOBEC1 and PP7-CP-7.10 (V82G) deposited edits near their respective binding sites in all the MS2 and PP7 stem-loop distributions tested but are more conspicuous in the constructs in which we alternated the binding sites (Fig. 5c–e, and Supplementary Fig. 5c–e). Moreover, rBE editing slightly decreased when paired in a dual editing experiment (Fig. 5c–e, and Supplementary Fig. 5c–e). Yet, the 8e and 7.10 (V82G) protein levels appeared similar whether transfected alone or with APOBEC1 (Supplementary Fig. 5d, e). This indicates that the rBEs potentially influence each other. The inter-enzyme influence can result from editing by one enzyme leading to substrate context incompatibility with the other, or from steric crowding on the reporter mRNA (Fig. 5c–e, and Supplementary Fig. 5c–e).

MCP-APOBEC1 and PP7-CP-7.10 (V82G) deposited fewer edits when co-expressed than when expressed on their own, consistent with our observation where one editor (8e) impacted the editing of the other (APOBEC1; Fig. 5c–e, and Supplementary Fig. 5c–e). Further, as with our 12X MS2-SL reporter, MCP-APOBEC1 and PP7-CP-7.10 (V82G) produced less edit spillover than PP7-CP-8e (Fig. 5c–e, and Supplementary Fig. 5c–e). Our reporter demonstrates the precision of the 7.10 (V82G) and APOBEC1 pair with paired MS2 and PP7 binding sites that were 350 bases apart. On this reporter, MCP-APOBEC1 and PP7-CP-7.10 (V82G) edits clustered near their corresponding binding sites, while PP7-CP-8e editing clusters near its binding site but produces much more evenly distributed edits across the 3’UTR (Fig. 5c, and Supplementary Fig. 5c). Further, reducing the number of twelve alternating MS2 and PP7 binding sites to four produced tighter editing patterns for MCP-APOBEC1 and PP7-7.10 (V82G) (Fig. 5d, e, and Supplementary Fig. 5d, e), highlighting the editing precision of this pair. We conclude and recommend that APOBEC1 and 7.10 (V82G) are a robust, balanced pair for dual-RBP-based editing applications.

Discussion

We have introduced PRINTER, a systematic workflow of experimental and computational assays designed to comprehensively assess the capabilities of over thirty candidate RNA base editors (rBEs) for probing protein-RNA interactomes within living cells. Our RNA “tethering” assay effectively recruits rBEs to a synthetic 3’ UTR on a reporter mRNA, where they catalyze edits. Subsequently, we rigorously analyze on- and off-target editing activities using targeted and transcriptome-wide RNA sequencing. Our findings reveal a rich diversity in the sensitivity and specificity of the rBEs.

Among the candidates evaluated, we have identified seven promising rBEs that hold significant potential to expand the scope of RBP-directed editing. These noteworthy editors include 8e, evoAPOBEC1, A2dd (R), 7.10 (V82G), 7.10, an APOBEC3A mutant (A3A (Y132G/K30R)), and A2dd (R-S), as listed in Table 1. Notably, several of these editors exhibited improved signals when compared to TRIBE and hyperTRIBE enzymes, underscoring their enhanced performance in RBP interactome studies. Additionally, we have identified editors with varying editing activity levels, such as A2dd (R-S) with reduced activity and evoAPOBEC1 with enhanced activity compared to the APOBEC1 enzyme used in STAMP, thus broadening the spectrum of editing control for RBP-mediated interactions.

Table 1 Comparison of top rBEs

However, it is worth noting that some enzymes in our screening did not perform robustly in our reporter assay. Cas9-mediated DNA editors (in the absence of Cas9) like evoCDA1, evoFERNY, evoA3A, AID, APOBEC3G, and several AID variants, were unable to edit RNA effectively. Additionally, SECURE rAPOBEC1 (R33A) and SECURE rAPOBEC1 (R33A, K34A), which had shown detectable RNA edits in previous studies, did not produce noticeable editing in our reporter RNA. Furthermore, ADARcd4 and hyperADARcd15 (referred to as hAcd in this study), the enzymes employed in TRIBE and hyper-TRIBE, respectively, failed to generate detectable edits in our experiments. This discrepancy in ADAR editing could be attributed to differences in enzyme sources, as the TRIBE enzymes originate from fruit flies, whereas the A2dd enzymes are human-derived. Another possibility is that dsRNA, which is the preferred substrate of ADAR4,9,10,11,12,13,14, is less abundant in the cytoplasm of cells, limiting the activity of these enzymes46.

Our RNA base editor tethering approach offers a rapid means to characterize enzymes before their application in RBP-mediated editing. It also enables the identification of DNA editors with reduced off-target RNA editing, addressing a critical aspect of the field’s current limitations. The challenge of achieving an optimal signal-to-noise ratio in RBP-directed editing experiments remains significant. Striking the right balance depends on the specific RBP-related problem at hand. In our studies, higher enzyme activity tends to result in more edits away from the binding sites, as exemplified by 8e in our reporter assays. While this may introduce some noise, it is important to note that 8e captures a largely distinct set of genuine RBFOX2 targets. Recent work with split 8e on DNA editing holds promise to make RNA editing inducible, given that the system is compatible with RNA editing as we have shown for intact 8e in this study47. On the other end of the spectrum, utilizing enzymes that rely on infrequent sequences or structures, such as the dsRNA editing ADARs, may fail to capture the full spectrum of protein-RNA interactions where these structures are rare, like in the cytoplasm.

Our combined computational and experimental strategies represent a step toward addressing the limitations inherent in editing-based detection approaches. These strategies provide valuable insights into the inherent biases of rBEs and their impact on the detection of protein-RNA interactions. Specifically, the observed preferences of certain rBEs for distinct sequence contexts, such as the GC-rich sequences preferred by A2dd (R), 7.10 (V82G), and 8e, or the A-rich contexts preferred by APOBEC1, offer valuable guidance for selecting the most suitable enzyme for a given study. Additionally, considering factors like 3’UTR bias and precision of editing helps researchers make informed decisions when choosing an rBE for their experiments. Specifically, 8e and A2dd (R) appeared to have less 3’UTR bias, which may make them more suitable for studies where RBPs associate with regions like the CDS. Precision was highest among 7.10 (V82G), 8e, and then A2dd (R) in those studies. We recommend using 7.10 (V82G) where GC tolerance and precision are required and 8e or A2dd (R) to maximize the number of recovered RNA targets and edit clusters.

Our study emphasizes the importance of acknowledging and addressing enzyme bias when designing experiments and interpreting their outcomes. Relying solely on one enzyme can lead to false negatives and an incomplete understanding of protein-RNA interactions. A case in point is our RBFOX2-rBE fusions, which produce edit clusters that overlap with genuine yet largely distinct sets of RBFOX2-APOBEC1 eCLIP targets. Thus, our recommendation to assay RBPs with multiple enzymes ensures a more comprehensive capture of genuine targets. These datasets can subsequently be analyzed to identify common sequence motifs that persist across different enzymes, facilitating the discovery of conserved binding sites. Furthermore, our work challenges the assumption that a single enzyme can universally address any type of protein-RNA interaction. The varying performance of different enzymes in different contexts underscores the need for a diversified toolbox of rBEs. Enzymes should be chosen based on their compatibility with the specific experimental goals, whether that involves GC-rich sequences, precision editing, or the detection of distinct types of interactions. To illustrate this point, consider RBFOX2-A2dd (R), which results in only a modest enrichment of RBFOX2 motif sequences and RBFOX2 eCLIP peaks. In contrast, RPS2-A2dd (R) outperformed all others in detecting Torin-1-mediated translational repression. This scenario underscores the ongoing need for the field to expand the repertoire of available enzymes for RBP-directed editing, a pursuit exemplified by the recent introduction of TadA-CD C-to-U editors, which are derived from the TadA-8e (8e) A-to-I editor profiled in our study25.

Our findings also highlight the potential for combining multiple rBEs to achieve more comprehensive results or evaluate which RBPs are bound to the same RNA molecule. We find that only some enzymes are compatible with “dual editing” on the same RNA45,48,49. Compatibility is important for the interpretability of the results because mismatched enzyme activities could lead to one enzyme masking the edits of the other. We recommend APOBEC1 (C-to-U) and 7.10 (V82G) (A-to-I) which captures co-binding across three distinct RBP binding site configurations. As with our single RBP studies, using more than one C-to-U and A-to-I enzyme pair would likely yield a more complete view of RBP co-binding than using a single pair alone. Thus, future work in identifying RNA base editors with different activities will also enable the field to identify more useful co-editing pairs.

In summary, our study not only addresses key limitations in the field of RNA base editor-mediated detection of protein-RNA interactions but also presents opportunities for further refinement and expansion. By modifying the sequences in the linker regions of our reporter (currently 50 bp RNA linker flanking MS2 stem-loops), we can evaluate enzymes with distinct sequence context preferences, paving the way for engineered rBEs, as has been previously conducted with DNA editors8,26,27. Additionally, there is vast potential to evaluate the ability of rBEs to detect localized RNAs50. For example, integrating the molecular recording strategy, localized RNA recording, and proximity-specific ribosome profiling data from yeast added insights into the interplay between RNA localization and translation at the ER and mitochondria that is impossible with either method alone50,51,52. Adapting optimized rBEs to identify localized RNAs can expand on the localized RNA recording strategy and complement methods like APEX-Seq and proximity-specific ribosome profiling50,51,52,53. Here, distinct editors may also be used to record RNA molecules that interact with multiple sub-cellular locales, as previously suggested with combined localized RNA recording and TRIBE50.

Methods

Cloning

12X-MS2 stem-loop mRNA reporter

The pcDNA3.1 (-) Mammalian Expression Vector (Invitrogen, Cat # V79520) was digested with the NheI (Thermo Fisher Scientific, Cat # FD0974) and the MssI (PmeI, Thermo Fisher Scientific, Cat # ER1341) restriction enzymes. After, fragments (IDT gene blocks) bearing the human codon-optimized (IDT tool) super folder green fluorescent protein54 (sfGFP) coding sequence and a synthetic 3’ UTR containing twelve version-six MS2 bacteriophage stem-loops (12XMBSV616) were cloned into the digested pcDNA3.1 (-) vector (Thermo Fisher Scientific, Cat # V79520) via Gibson assembly55.

MCP-RNA base-editor (rBE) fusions

Strategy 1: Gateway cloning

The pDONR221 Gateway vector (Thermo Fisher Scientific, Cat # 12536017) was first digested with AflII (New England Biolabs, Cat # R0520S) and EcoRV (New England Biolabs, Cat # R0195S) restriction enzymes to remove the attP1, ccdB, cmR, and attP2 cassettes. After, two gene fragments bearing the MS2 coat protein-coding sequence flanked by the attL1 and attL2 sequence were cloned into the digested vector via Gibson assembly. The resulting MCP-pDONR221, pHCMM14, was then used to insert the MCP into each rBE-bearing destinations vector using Gateway LR cloning56.

Preparation of destination vector for Gateway LR cloning

The pLIX403_Capture1_APOBEC_HA_P2A_mRuby (Plasmid #183901) vector5 was digested with PspXI (New England Biolabs, Cat # R0656S) and BstZ17I-HF (New England Biolabs, Cat # R3594S) to remove the APOBEC1 coding sequence. The digested fragments were then cleaned using the QIAquick PCR Purification Kit (Qiagen, Cat # 28104) and eluted in 18 μL water. Afterward, 2 μL of E-Gel Sample Loading Buffer, 1X (Thermo Fisher Scientific, Cat # 10482055), were added to the eluate, and 20 μL of the mix was loaded onto a well on a 2% Agarose E-gel (Thermo Fisher Scientific, Cat # G401002). Next, the cassette was loaded onto the E-Gel Power Snap Electrophoresis Device (Thermo Fisher Scientific, Cat # G8100) and run for 13 min on the 1-2% agarose gel setting. Finally, a band corresponding to the size of the backbone vector without the APOBEC1 sequence was excised from the agarose gel and purified using the Qiagen mini-elute gel extraction kit (Qiagen, Cat # 28604).

PCR-amplification of rBE coding sequences

The open reading frames encoding distinct rBE candidates were amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs, Cat # M0491). The PCR primers were designed to produce PCR product flanked by ~40 bp of sequence complementary to the target backbone vector, a requirement for Gibson assembly. The PCR products were then purified and quantified for the digested backbone vector (as described above for digested vector purification).

Gibson assembly of PCR-amplified rBE CDSs with the STAMP backbone vector

Gibson assembly reactions were assembled by adding 200–400 ng of each purified amplicon and ~50 ng of digested backbone vector together with 15 μL of 2X Gibson master mix and molecular grade water to a total of 20 μL volume. The reactions were then incubated at 50 °C for 50 min. After, 2 μL of the reaction were transformed into MultiShot™ FlexPlate TOP10 Competent Cells (Thermo Fisher Scientific, Cat # C4081201). Successful cloning was confirmed by Sanger sequencing. Finally, the Gibson reactions produced destination vectors bearing distinct rBE candidates.

Gateway cloning to generate MCP-rBE fusions

The entry clone plasmid bearing the MCP coding sequence (pHCMM14) was combined with each of the destination vectors carrying distinct rBE candidates (see above) using Gateway LR cloning (Thermo Fisher Scientific, Cat # 11791020). Next, 1 μL of the reaction was transformed into E. coli and the correct clones were confirmed by Sanger sequencing. The resulting plasmids were then transiently transfected into human embryonic kidney 293 cells (HEK293XT, see below).

Strategy 2: Gibson assembly to generate MCP-rBE fusions

We switched to Gibson assembly to speed up the generation of MCP-rBE fusions, which would skip the gateway cloning step. First, the MCP-APOBEC1 fusion generated above with Gateway cloning was digested with the AfeI (New England Biolabs, Cat # R0652S) or FastDigest Eco47III (Thermo Fisher Scientific, Cat # FD0324) and PspXI (New England Biolabs, Cat # R0656S) restrictions enzymes to remove the APOBEC1 from downstream of the MCP. The coding sequence for each of the remaining enzymes in our panel was then amplified using a primer that yielded fragments flanked by ~40 bp sequences complementary to the digested backbone vector. The vector and PCR products were then purified as described above. After ~50 ng of the digested vector was combined with 200-400 ng of the PCR-amplified rBE sequences, 15 μL of Gibson master mix, and water to bring the reactions to 20 μL each. The reactions were then incubated at 50 °C for 50 mins. After, the 2 μL of the Gibson reactions were transformed into E. coli and the clones were isolated from the resulting colonies and confirmed with Sanger sequencing. This approach yielded the remaining MCP-rBE fusions described in this report.

Transient transfection of HEK293XT cells

Plasmids were transfected into HEK293XT cells (Lenti-X 293 T Cell Line, Takara Inc., Cat # 632180) using Lipofectamine 3000 (Thermo Fisher Scientific, Cat # L3000015). The cell line was authenticated by the manufacturer using short tandem repeat (STR) analysis, and we authenticated in-house through morphological analysis using microscopy before and during each experiment. To prepare HEK cells for transfection, they were first grown in a 10 cm dish to 80% confluency. After, the media was removed, and the cells detached from the plate via TrypLE (Thermo Fisher Scientific, Cat # 12604039) treatment at 37 °C for ~3 mins and subsequent pipette-mixing with 10 mL of DMEM (Thermo Fisher Scientific, Cat # 11965092) + 10% FBS (Thermo Fisher Scientific, Cat # 26140-079). The suspension was then transferred to a 15 mL conical tube, and cells were pelleted by centrifugation at 200 x g for 5 min. The cells were resuspended in 3–5 mL of DMEM, and 20 μL of cell suspension was mixed with 20 μL of Trypan Blue (Thermo Fisher Scientific, Cat # 15250061). 10 μL of the mixture was then loaded onto a Dual-chamber Cell Counting Slide (Bio-Rad, Cat # 1450011), and the cells were counted using a Bio-RAD TC20 Automated Cell Counter. After cells were diluted to 174 cells/μL, adding 500 μL of cell suspension yielded ~87,000 cells per well of a 24-well plate. The cells were incubated overnight, after which DNA-lipid complexes were generated using lipofectamine 3000 within 24 hrs of plating. For 12-well plates, 1250 ng of DNA was used to create lipid-DNA complexes, while 625 ng was used for 24-well plates. Once created, the complexes were added drop-wise to the wells making sure to cover as much area of the well as possible with the drops. The plates were then placed in a 37 °C incubator overnight. The next day, 500 μL of fresh media containing 2ug/ml doxycycline (Dox) and 1 µg/mL puromycin were added to the existing media in each well. The cells were then incubated at 37 °C for 48 or 72 hrs. After, the media was removed via aspiration, and the cells recovered by TryPLe (500 μL) treatment for 2 mins at 37 °C, followed by resuspension in 1 mL of DMEM + 10% FBS. The cells were then pelleted via centrifugation at 200 x g for 5 mins at room temperature, the supernatant removed, and pellets stored at -80 °C until analysis. Alternatively, the aspirated media was replaced by 300–600 μL of TRIzol. The plates were either covered with foil seals and stored at -80 °C until RNA extraction or DNA-free total RNA was immediately isolated from the lysate using the Direct-zol RNA Miniprep Kit (Zymogen, Cat # R2052) using the RNA Purification protocol. The RNA was then quantified using the NanoDrop UV-Vis spectrophotometer.

Targeted reporter RNA sequencing

Experiments were performed in duplicate with an APOBEC1 positive control and a “reporter alone” negative, and libraries were prepared with the same amount of starting RNA material for all samples for qualitative comparison. To do so, 500 ng of total RNA was subjected to reverse transcription using a Superscript IV reverse transcriptase kit (Thermo Fisher Scientific, Cat # 18090010) in 20 µL volume using a primer specific to the reporter mRNA (HCMM64: 5’-AAAGGACAGTGGGAGTGG-3’) at a 0.1 μM final concentration. 1 μL of the resulting cDNA was then subjected to PCR amplification using Q5 High-Fidelity DNA Polymerase (New England Biolabs, Cat # M0491L) and primers specific to the reporter mRNA (HCMM64, and HCMM148: 5’- GAACCCACTGCTTACTGGCT-3’) each at a 0.5 μM final concentration. The resulting amplicon was then purified as described by the “DNA fragment purification” section below. 1 ng of the purified fragment was then used to prepare sequencing libraries with the Nextera XT DNA Library Preparation Kit (Illumina, Cat # FC-131-2002 or Cat # FC-131-1024; Indexes: Nextera XT DNA Library Preparation Kit (24 samples), Cat # FC-131-1024 or IDT for Illumina DNA/RNA UD, Cat # 20027213). The resulting libraries were then purified and normalized with either the Nextera XT kit or standard normalization. Equimolar amounts of each library were then pooled to make a 4 nM library. After, a 6.5 pM library was denatured and loaded onto a MiSeq reagent cartridge (Illumina, Cat # MS-102-3003, MS-102-3001, MS-103-1002, or MS-103-1001) and sequenced on the Illumina MiSeq with either single-end 150 or paired-end 150 read format.

DNA fragment purification

PCR amplification (two 50 μL reactions) and plasmid digest (three 50 μL) reactions were combined into a single Eppendorf tube. The reactions were then cleaned using the QIAquick PCR Purification Kit (Qiagen, Cat # 28104). To do so, the cleaned fragments were first concentrated in 18 μL of water via elution. Then, the eluate was added to 2 μL of 1X E-Gel Sample Loading Buffer (Thermo Fisher Scientific, cat # 10482055) and the resulting 20 μL loaded onto a well on a 2% Agarose E-gel (Thermo Fisher Scientific, Cat # G401002). The E-gel was then loaded onto the E-Gel Power Snap Electrophoresis Device (Thermo Fisher Scientific, Cat # G8100) and electrophoresed for 13 mins on the 1-2% agarose gel setting. Finally, a band corresponding to the size of the desired fragment was excised from the agarose gel and purified using the Qiagen min-elute gel extraction kit (Qiagen, Cat # 28604). Next, the purified DNA fragments were eluted in 14 μL water and quantified on a Nanodrop spectrophotometer. When the fragments were to be used for targeted sequencing, the purified fragments were quantified using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, cat # Q33216) with the dsDNA broad range kit (Thermo Fisher Scientific, cat # Q32850).

Poly(A)-enriched RNA sequencing

To prepare poly(A)-enriched RNA sequencing libraries, we processed 500 ng of total RNA using the TruSeq Stranded mRNA Library Prep kit (Illumina, Cat # 20020595; Indexes: IDT for Illumina TruSeq RNA UD Indexes, Cat # 20020591). The resulting libraries were then analyzed on a TapeStation (Agilent) and quantified via a Qubit 3.0 fluorometer. Equimolar amounts of the resulting sequencing libraries were combined to make a 4 nM pool. The pools were then sequenced on a Novaseq 6000 instrument (Illumina) in single-end 100 bp format at the Genomics Core at the Institute for Genomic Medicine at UCSD or the La Jolla Institute for Immunology.

Standard sequencing library normalization

For standard normalization, we calculated the concentration in nM for each library using the equation: concentration in nM = [(concentration in ng/μL) ÷ (660 g/mol × average library size in bp)] × 106. The average library size was obtained via analysis of 1 μL of the library on the Agilent TapeStation 2200 (Agilent, Cat # G2964AA) using D1000 screen tape (Agilent, Cat # 5067-5583). The library concentration in ng/μL was determined by analysis of 1 μL of the library on the Qubit 3.0 fluorometer (Thermo Fisher Scientific, Cat # Q33216) using the dsDNA broad range kit (Thermo Fisher Scientific, Cat # Q32850).

Next-generation sequencing

NovaSeq sequencing was carried out on the Novaseq 6000 at the La Jolla Institute for Immunology (LJI) or the UC San Diego Health Sciences Institute for Genomic Medicine (IGM) Genomics Center. Further, MiSeq sequencing was done at the Stem Cell Genomics and Microscopy Core at the Sanford Consortium for Regenerative Medicine (SCRM).

RPS2-BE editing +/- Torin-1 treatment

For mTOR perturbation experiments, cells were transiently transfected with the RPS2-BE fusions as described above (see transient transfection section), and expression of the constructs was induced via the addition of doxycycline at a final concentration of 2 μg/mL and incubated at 37 °C for 24 hrs. After, Torin-1 (Cell Signaling, Cat # 14379) dissolved in dimethyl sulfoxide (DMSO) was added to pre-warmed DMEM + 10% FBS, adding the media to the cultured cells yielded a final concentration of 100 nM. A set of cells were treated with a DMSO vehicle lacking Torin-1 to serve as a control. The cells were then incubated at 37 °C for 48 hrs, after which the media was aspirated, and cells were resuspended in 300 μL of TRIzol followed by RNA extraction, library prep, and sequencing (see above). To ensure the reliability of the results, the experiments were carried out on three replicates.

Dual editing

The reporter-based dual editing system

The 12X MS2-SL plasmid was digested with AfeI (New England Biolabs, Cat # R0652 S) or FastDigest Eco47III (Thermo Fisher Scientific, Cat # FD0324) and PmeI (New England Biolabs, Cat # R0560S) to swap in different MS2 and PP7 stem-loop distributions for the reporter-based dual-editing system. Gene blocks (IDT) coding one of the three distinct MS2 and PP7 distributions considered were cloned into the purified backbone via Gibson assembly (See detailed Gibson cloning procedure above). Further, the protein component was generated by digesting the plasmid encoding the MCP-8e-HA-P2A-mRuby construct with NheI (Thermo Fisher Scientific, Cat # FD0974) and AfeI (New England Biolabs, Cat # R0652S) or FastDigest Eco47III (Thermo Fisher Scientific, Cat # FD0324) to remove the MCP coding sequence. A gene block encoding the PP7-CP57 was then Gibson cloned into the purified backbone to generate PP7-CP-8e-HA-P2A-mRuby. After the newly-cloned construct was digested with AfeI (New England Biolabs, Cat # R06 Thuronyi 2S) or FastDigest Eco47III (Thermo Fisher Scientific, Cat # FD0324) and BshTI (AgeI) (Thermo Fisher Scientific, Cat # FD1464) to remove the 8e-HA-P2A-mRuby-coding sequences and PCR products encoding 8e-HA-P2A or 7.10 (V82G)-HA-P2A were cloned into the purified backbone together with a PCR product encoding the blue fluorescent protein58 (BFP) coding sequence. The reaction yielded PP7-CP-8e-HA-P2A-BFP and PP7-CP-7.10 (V82G)-HA-P2A-BFP. The resulting PP7-A-to-I editor-encoding plasmids were then individually co-transfected into HEK293XT cells with each of the MS2- and PP7-stem-loop-bearing reporters (one reporter at a time) and a plasmid encoding the MCP-APOBEC1-HA-P2A-mRuby C-to-U-editing construct. Experiments in which each editing fusion was co-transfected with the reporter without a second editor were done to help determine the specificity of the coat proteins for their given stem-loops. Experiments in which the reporter was transfected without an editor were done to help account for editing that may arise from endogenous enzymes or sequencing errors. All experiments were done in duplicate to ensure the reproducibility of the results.

Western blots

Cells were lysed with ice cold iCLIP Lysis Buffer3 containing 1X Protease Inhibitor Cocktail Set III (Milipore Sigma, Cat # 539134-1SET). Lysates were sonicated for 5 min at 30 second on/off intervals and protein was quantified using Pierce BCA Protein Assay Kit (Thermo Fisher, Cat # 23225). Total protein lysates were run on 4%–12% NuPAGE Bis-Tris gels in NuPAGE MOPS running buffer (Thermo Fisher, Cat # NP0050) and transferred to PVDF membranes. Membranes were blocked in 5% nonfat milk in TBST and incubated with the following primary antibodies: 1 h at room temperature with rabbit anti-HA-tag (Cell Signaling, Cat # 3724, Clone C29F4, Lot 10), washed 3X for 5 min with TBST, incubated for 2 h at RT in 5% nonfat dry milk powder in TBST with Rabbit TrueBlot: Anti-Rabbit IgG HRP (Rockland Immunochemicals, Cat # 18-8816-33, Clone eB182, Lot 46967), washed 3X for 5 min with TBST. Membranes were developed using the Azure C600 imaging system and Pierce ECL Western Blotting Substrate (Thermo Fisher, Cat # 32209).

Data processing

MS2 loop specificity data

Cutadapt was used to remove adapter sequences from original FASTQ files, after which bwa mem was used to align sequences to a FASTA file containing a single entry, representing the twelve-loop construct sequence. Finally, Pysamstats (version 1.1.2) was used to generate count tables of base counts at each position along the construct, which was processed using the R statistical software ggplot2 to visualize editing levels as a histogram. The histogram displays Green (C-to-U), orange (A-to-I), and black “No Edit” bars. To make it easier to see the C-to-U and A-to-I edit bars, only the top line of the black “No Edit” bars was kept, which produced the “No Edit” line.

MS2 loop on/off-target data (signal-to-noise)

Cutadapt was used to remove adapter sequences from the original FASTQ files, after which the resulting FASTQ files were aligned to the GRCh38 human reference genome using STAR 2.7.6a. The STAR default settings were used, except for --outSAMtype BAM SortedByCoordinate and --outSAMunmapped Within. Unmapped and mapped reads were extracted from the resulting bam, using the commands samtools view -f4 and samtools view -F4, respectively, into separate bams. Reads from the.bam file with unmapped reads were converted back to a FASTQ file, which was subsequently aligned to a FASTA file containing a single entry representing the twelve-loop construct sequence, again using the STAR 2.7.6a alignment software. SAILOR was then run twice (once for A-to-I edit detection and once for C-to-T edit detection) on all reference-mapped and construct-mapped.bam files, using either the reference genome or the twelve-loop construct FASTA file as a reference, respectively. Finally, SAILOR edit counts were loaded into a Jupyter Lab 1.2.21 notebook. On/off-target rates were then calculated by dividing, for each sample, the edit count on reads mapped to the reporter construct (on-target) by the edit count on reads mapped to the genome (off-target). This ensured, for example, that enzymes that have high on-target but also high off-target have a lower signal-to-noise ratio than enzymes with low on-target but much lower off-target.

RBFOX2-rBE data

Cutadapt was used to trim adapter sequences from reads in all original FASTQ files, after which the resulting trimmed reads were aligned to the GRCh38 human reference genome using STAR 2.7.6 with default settings except for --outSAMtype BAM SortedByCoordinate and --outSAMunmapped Within. SAILOR was run for each bam file, with parameters variably set for A-to-I detection or C-to-T detection, depending on the known editing modality of each enzyme being tested. C-to-T and A-to-I SAILOR output files were combined for A2dd (R), which is known to exhibit both editing modalities. For each replicate of each enzyme, edit counts were summed for 30-bp bins across regions of the transcriptome exhibiting edits. Using a background rate calculated as the mean editing fraction (fraction of editable Cs or As, respectively, edited per bin) across all bins, a Poisson test was conducted for each bin to test for significantly elevated editing levels. Benjamini-Hochberg false discovery rate correction was used to adjust p-values, and only bins with adjusted p-values below 0.1 were retained. After filtering bins, the remaining bins within 15 bp of each other were merged to form clusters. For each enzyme, peak coordinates were intersected between all three replicates, and only clusters present in all three replicates were retained. The clusters were loaded into Jupyter Lab 1.2.21 notebooks, where motif presence and RBFOX2-APOBEC1 eCLIP overlap were calculated using custom scripts.

RPS2 +/- torin1 data

Cutadapt was used to trim adapter sequences from reads in all original FASTQ files, after which the resulting trimmed reads were aligned to the GRCh38 human reference genome using STAR 2.7.6 with default settings except for --outSAMtype BAM SortedByCoordinate and --outSAMunmapped Within. SAILOR was run for each bam file, with parameters variably set for A-to-I or C-to-T detection, depending on the known editing modality of each enzyme being tested. C-to-T and A-to-I SAILOR output files were combined for A2dd (R), which is known to exhibit both editing modalities. The subread featurecounts software was used to obtain read counts for genes in all samples, and then edits per read (EPR) were calculated on a per-gene basis using the output of featurecounts and the outputs of SAILOR. EPR data was loaded into Jupyter Lab 1.2.21 notebooks, where relative decreases of editing under the influence of Torin-1 were calculated for each sample.

Reporter-based dual editing sequencing data

Reference FASTA and GTF formatted files were prepared for each designed reporter. Adapters were trimmed from short reads using cutadapt. A Burrows-Wheeler Alignment Tool (BWA) (version 0.7.17) index was then built for each reporter and BWA mem was used with default parameters to align reads to the respective reporter sequence. Edits were identified with Pysamstats (version 1.1.2) using parameters: --min-baseq 20 and --type variation_strand. Plots to visualize sites containing A-to-I (G) and C-to-T edits were created using a custom R script.

Reproducibility

To ensure reproducibility, several experiments were conducted using two replicates. These experiments included MS2 reporter assays, on-to-off-target analyses, and integrated C-to-U and A-to-I (MCP and PP7) experiments. The data from the reporter experiments were analyzed independently per replicate, while the edit values for both replicates were combined for the on-to-off-target analyses before comparing reporter (on-target) versus transcriptome (off-target).

The RBFOX2-rBE and RPS2-rBE experiments were performed with three replicates. For the RBFOX2-rBE data, only FLARE edit clusters that were consistent across all three replicates were considered for analysis. Additionally, clusters that were detected consistently across three free rBE experiments were treated as noise and subtracted from the RBFOX2 data. For the RPS2-rBE experiments, only edits on mRNAs edited across three independent replicates were considered for analysis.

Biological materials availability

The plasmids constructed for this study will be deposited to Addgene for distribution under a Uniform Biological Material Transfer Agreement (UBMTA).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.