Functions and consequences of AID/APOBEC-mediated DNA and RNA deamination

The AID/APOBEC polynucleotide cytidine deaminases have historically been classified as either DNA mutators or RNA editors based on their first identified nucleic acid substrate preference. DNA mutators can generate functional diversity at antibody genes but also cause genomic instability in cancer. RNA editors can generate informational diversity in the transcriptome of innate immune cells, and of cancer cells. Members of both classes can act as antiviral restriction factors. Recent structural work has illuminated differences and similarities between AID/APOBEC enzymes that can catalyse DNA mutation, RNA editing or both, suggesting that the strict functional classification of members of this family should be reconsidered. As many of these enzymes have been employed for targeted genome (or transcriptome) editing, a more holistic understanding will help improve the design of therapeutically relevant programmable base editors.

Genetic information is encoded by DNA, transcribed into RNA and translated into protein. When originally proposed 1 , this foundational tenet assumed faithful transmission of information such that mRNA accurately reflects what is encoded at the DNA level. However, it is now clear that RNA molecules can undergo several processing events that diversify the genomic information, resulting in different transcripts that, in some cases, encode different protein isoforms. Examples of such processes are alternative splicing 2 , alternative polyadenylation 3 and base modifications.
Most RNA base modifications are not easily detectable via synthesis-based RNA sequencing 4 , making it exceedingly difficult to distinguish between modified and unmodified RNA molecules 5 . One exception is RNA base deamination (also known as RNA editing), a widespread set of modifications that lead to a change in the RNA sequence itself. RNA editing can be detected simply by comparing the sequence of the transcript with that of its cognate gene.
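The comparison described above can be illustrated with a minimal sketch. It assumes the transcript (as cDNA) and its cognate gene are given on the same strand, with equal length and no gaps; the function name `call_candidate_edits` and the toy sequences are hypothetical. Because inosine is read as G and uracil as T during sequencing, A-to-I and C-to-U editing appear as A>G and C>T mismatches, respectively.

```python
from typing import List, Tuple

# Mismatch classes expected when cDNA is compared with the gene:
# inosine is read as G, uracil is read as T.
EDIT_CLASSES = {("A", "G"): "A-to-I", ("C", "T"): "C-to-U"}


def call_candidate_edits(genomic: str, cdna: str) -> List[Tuple[int, str, str, str]]:
    """Return (position, genomic_base, cdna_base, label) for each mismatch.

    Assumption: both sequences are on the same strand, have the same
    length and are already aligned without gaps.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be pre-aligned and of equal length")
    calls = []
    for pos, (g, c) in enumerate(zip(genomic.upper(), cdna.upper())):
        if g != c:
            calls.append((pos, g, c, EDIT_CLASSES.get((g, c), "other")))
    return calls


if __name__ == "__main__":
    gene = "GATACAATTTGATCAGTATA"  # toy genomic sequence
    read = "GATATAATTTGGTCAGTATA"  # toy cDNA with one C>T and one A>G mismatch
    for call in call_candidate_edits(gene, read):
        print(call)
```

A mismatch of this kind is only a candidate editing site: genomic polymorphisms and sequencing errors produce the same signal, so a real analysis would need matched genomic data and replicate support.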
In mammals, RNA editing refers specifically to the deamination of adenosine to inosine (A-to-I) or cytosine to uracil (C-to-U); for the purposes of this Perspective we will exclude the phenomenon of uracil insertion or deletion that was described as RNA editing in mitochondria of Trypanosoma brucei 6 . A-to-I editing is catalysed by the adenosine deaminase acting on RNA (ADAR) protein family 7-9 . C-to-U editing is performed by numerous cytosine deaminases, the best known of which belong to a family of mammalian enzymes known as the 'activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like' (AID/APOBEC) protein family 10 (Box 1).
The first member of the AID/APOBEC family to be characterized was the bona fide RNA editing enzyme APOBEC1 (Fig. 1a). Since then, additional RNA editing deaminases belonging to this family have been described, including APOBEC3A (A3A) and A3G. These RNA editors have the peculiar ability to also deaminate DNA, leading to single-nucleotide variant mutations that often occur processively in genomic DNA or reverse-transcribed viral cDNA (Fig. 1b). By contrast, other family members seem to have lost their ability to deaminate RNA: some, instead, catalyse mutation of viral DNA (or cDNA); others have very specific genomic DNA substrates (for example, AID edits the expressed immunoglobulin gene; Fig. 1c). Finally, APOBEC2 cannot edit RNA or DNA but has the ability to bind DNA with affinities much higher than those reported for any other family member 11 .
The interplay between RNA editing and DNA mutation and the types of molecular restrictions that determine substrate range and selectivity is the focus of this Perspective. We first summarize the main determinants of AID/APOBEC substrate selectivity in members that are able to deaminate only DNA (we call these 'specialists') and those that can deaminate both RNA and DNA (we term these 'generalists') (Table 1); note that family members where activity has not yet been tested on both substrates remain unassigned in this scheme. We then provide examples of how these different functionalities have allowed specific members of the AID/APOBEC family to drive evolution in different contexts. Finally, we discuss how AID/APOBEC enzymes have been co-opted into synthetic biology, specifically into the genome and transcriptome engineering technologies broadly known as programmable base editing, which have enormous therapeutic potential 12 . A broad understanding of the molecular features that drive AID/APOBEC selectivity will be key to the development of such precision therapeutics.

Determinants of substrate selectivity
AID/APOBEC enzymes share three major functional elements: they all contain the catalytic domain (comprising the enzymatic pocket that, in part, overlaps with the substrate binding surface), whereas some also contain a cofactor interaction region (that can also multimerize) and sequence elements that define the subcellular localization of each protein (Fig. 2a). Sequence and/or structural variations in any of these features can change nucleic acid preference, for example through minor alterations in the substrate binding groove, or through restricted subcellular localization, such as exclusion from the nucleus through interaction with cofactors or through intramolecular oligomerization (reviewed elsewhere 13,14 ).

Box 1 | AID/APOBECs and their cellular functions
Reported functions for key members of the activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like (AID/APOBEC) family of enzymes are discussed below.

APOBEC1
Human APOBEC1 expression is confined to the small intestine 160 and its only confirmed physiologic target is the apolipoprotein B (APOB) mRNA 57 (Fig. 1a). However, mouse APOBEC1 (mAPOBEC1) is more broadly expressed, mostly in immune cells but also in the small intestine and liver 63 . The availability of genetic models has enabled the activity of mAPOBEC1 to be studied more thoroughly than human APOBEC1, and hundreds of edited transcripts have been detected in numerous mouse tissues 161,162 . Interestingly, different cofactors drive mAPOBEC1 activity to different mRNAs (Fig. 1a). Although a specific function for APOBEC1 (other than its role in Apob editing) has been hard to discern, loss of mAPOBEC1 activity confirms its importance for the function of innate immune cells of the monocytic lineage (including macrophages, microglia and dendritic cells among others) and microglia-mediated nervous system homeostasis 63,163 . APOBEC1 can also deaminate DNA, a process that has been linked to cancer 142,164,165 (Fig. 1a).

AID
AID is a single-stranded DNA (ssDNA) deaminase with a strong target preference for unmodified dC in the sequence context 5′-WRC-3′ (W = dA or dT, R = dA or dG) 166-168 . AID catalyses the deamination of immunoglobulin (Ig) genes. When targeted to the rearranged V gene, it initiates somatic hypermutation (SHM); when it is targeted to the switch regions upstream of the constant gene, it initiates a deletional recombination programme that results in antibody isotype switching (Fig. 1c). Both of these mechanisms increase antibody diversification in the host 169 . Although AID can bind both RNA and ssDNA, it only deaminates ssDNA; no catalytic activity towards RNA has been reported to date 170,171 . Additionally, it targets cytosines within the expressed antibody gene at a higher frequency and specificity than all other genomic loci 172,173 ; although the underlying mechanism is unknown, it likely relies both on the locus architecture and on region-restricted binding cofactors. Finally, unregulated AID activity can catalyse off-target mutations and chromosomal translocations 107,108 (Fig. 1), although at rates substantially lower than those reported within the Ig gene.
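As an illustration of the 5′-WRC-3′ sequence context, the following minimal sketch lists every candidate target cytosine on a single strand. The function name `find_wrc_sites` and the toy sequence are hypothetical, and a motif match says nothing about whether AID actually deaminates that position in cells.

```python
import re
from typing import List, Tuple

# IUPAC codes: W = A or T, R = A or G. The target C is the 3' base of the motif.
# The lookahead allows overlapping motifs to be reported.
WRC = re.compile(r"(?=([AT][AG]C))")


def find_wrc_sites(ssdna: str) -> List[Tuple[int, str]]:
    """Return (index_of_target_C, motif) for each 5'-WRC-3' match on this strand."""
    seq = ssdna.upper()
    return [(m.start() + 2, m.group(1)) for m in WRC.finditer(seq)]


if __name__ == "__main__":
    for index, motif in find_wrc_sites("AGCTACGTGCAAC"):
        print(index, motif)
```

Only the strand supplied is scanned; to cover the complementary strand, the reverse complement would have to be scanned separately.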

APOBEC2
APOBEC2 is expressed in cardiac and skeletal muscle 174 . All APOBEC2 knockout animal models tested (from zebrafish to mouse) display a myopathy 175,176 . The strong conservation of APOBEC2 suggests an undiscovered essential molecular function 158,177 , which does not seem to require deamination 48 . Indeed, we have reported in a preprint that APOBEC2 does not deaminate RNA or DNA but has retained the ability to bind DNA at specific promoter regions and, through this functionality, to act as a transcriptional repressor 11 .

APOBEC3
APOBEC3 is encoded by a single gene in mice, which has expanded and diverged in humans and primates into a seven-gene subfamily encoding seven proteins (designated APOBEC3A (A3A), A3B, A3C, A3D, A3F, A3G and A3H) 178 . The major role of these proteins seems to be the restriction of viruses and genomic mobile elements 81,179 , through the deamination of their obligatory single-stranded cDNA intermediates 180 (Fig. 1b). Such damaged cDNA can be cleaved and degraded 181 . However, if the viral genome is not degraded, deamination by APOBEC3 results in mutations that support viral evolution (Fig. 4). Certain APOBEC3 family members have also been reported to target the host genome, leading to point mutations, DNA breaks and chromosomal instability 120,121 (Fig. 1b). These effects may drive cancer evolution (Fig. 4) through an increase in tumour heterogeneity 124 .

APOBEC4
APOBEC4 is expressed in the mammalian testis 182 . Two recent works suggest a role for APOBEC4 in promoter modulation within mammalian cells 183 and the antiviral response in birds 184 . However, as very little is known about this protein, it will not be discussed further in this Perspective.

[...] primary role in defining the A3A preference for a 5′-TC sequence motif: although these residues have the potential to interact extensively with either T−1 or C−1, the size of the −1 pocket precludes access of the larger purine 16,20 . Taken together, these findings highlight the importance of the residues in loop 7 in determining the local dinucleotide preference of a generalist, ... ...

Fig. 1 | Physiological and aberrant functions of the AID/APOBEC deaminases. a | APOBEC1 in humans and mice acts in the nucleus of enterocytes, together with its cofactor RNA-binding motif protein 47 (RBM47), to edit apolipoprotein B (APOB) mRNA 155 . Editing leads to C-to-U base change that converts Gln (CAA) to a stop codon (UAA). Edited and unedited APOB mRNAs are then translated in the cytoplasm, generating two distinct isoforms: short (APOB-48) and long (APOB-100) isoform, respectively. APOB-100 is the major component of plasma low-density lipoproteins whereas APOB-48 is essential for secretion of chylomicrons. In mice, APOBEC1, together with RBM47, catalyses RNA editing of a large set of additional transcript targets (mRNA set 1). A change of cofactor from RBM47 to APOBEC1 complementation factor (A1CF) leads to RNA editing of a different set of transcript targets (mRNA set 2), suggesting that target specificity resides with the cofactor. Finally, in mice and humans, APOBEC1 is also able to induce DNA editing within the genome of the cells, leading to undesired mutations (dashed red arrow) 109,141,142 . b | In humans, APOBEC3 family members play an essential role during retroviral infections (for example, in leukocytes). Specifically, once a retrovirus infects a cell, it releases its viral genome as single-stranded RNA (ssRNA), which is retro-transcribed (RT) to cDNA. APOBEC3 proteins can deaminate this single-stranded DNA (ssDNA) leading to C-to-T base changes and mutations within the viral genome. This edited viral genome can be degraded (if heavily edited) or integrated into the genome as a provirus. APOBEC3A (A3A) and A3G are also able to perform RNA editing on RNA viruses (such as SARS-CoV-2) as well as host mRNAs. Aberrant activity of A3A and A3B can also induce DNA mutations within the genome of cells (dashed red arrows). c | AID plays an essential role in B cell antibody diversification, where it catalyses deamination either within transcribed (black arrow) antibody variable region V(D)J gene segments of the immunoglobulin (Ig) gene leading to somatic hypermutation (SHM) (mutations represented as red bars) or within repetitive 'switch' regions upstream of the constant region gene segments, leading to class switch recombination (CSR) (switch regions Sμ or Sγ1 shown). Resulting mRNA encodes an IgG1 protein that contains a hypermutated variable region and a γ1 heavy chain. Unregulated AID activity can also result in mutations and translocations elsewhere in the genome (dashed red arrow). AID/APOBEC, activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like; dsDNA, double-stranded DNA.

Although these experiments indicate that the catalytic pocket confers some degree of substrate sequence specificity, small-molecule inhibitors that can discriminate between AID/APOBEC family members have yet to be identified, implying that the catalytic pockets of these enzymes share strong common features (recently reviewed elsewhere 25 ) and that the main determinant of substrate selectivity may, in fact, be the substrate binding groove.

The substrate binding groove of generalists is U-shaped.
In vitro assays indicate that generalist AID/APOBECs prefer structured substrates; disrupting stem-loops in ssDNA and RNA substrates directly alters the frequency with which they are deaminated by A3A and A3G (refs 26-29). Moreover, the co-crystal structures of A3A and A3G bound to nucleic acid demonstrate that the U-shaped substrate binding groove formed by loops 1, 3, 5 and 7 (with the catalytic pocket located at the bottom of the U shape where the π-stacking interaction occurs) 16,20 (Fig. 3d) optimally accommodates a stem-loop structure. Although crystal structures of APOBEC1 bound to ssDNA or single-stranded RNA (ssRNA) are not currently available, its best-studied target, apolipoprotein B (APOB) mRNA, is predicted to form a stem-loop secondary structure 30-32 . Thus, generalists bind substrates with similar conformations to tRNA, the substrate of the distantly related tRNA adenosine deaminases (such as TadA 33 ; Fig. 2a), suggesting that the shape of the binding groove in generalists has co-evolved with the structure of their nucleic acid substrate. Moreover, the shape of this groove could be predictive of generalists. We note here that a similarly shaped groove (termed 'patch 1') is evident in co-structures of A3H with ssDNA and RNA 34 . For the purposes of this Perspective, A3H is considered 'unassigned' because its ability to catalyse RNA editing has not yet been tested. However, given the shape of its substrate binding groove and its demonstrated ability to bind RNA (see below), we would predict that it too might function as a bona fide RNA editor.

APOBECs that require further studies on substrate preference and activity are classified as 'unassigned'. The underline indicates the C that is deaminated by each enzyme. A3A, APOBEC3A; AID/APOBEC, activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like; C, cytoplasmic; ds, double-stranded; N, nuclear; N/C, nuclear and cytoplasmic; ND, not determined; ss, single-stranded. a: Tissue expression courtesy of the Human Protein Atlas 154 .

Groove residues help generalists discriminate between RNA and DNA substrates.
Residues in the loops forming the substrate binding groove of AID/APOBECs have key roles in substrate discrimination (that is, binding of RNA versus DNA). For example, a W121A substitution in loop 7 of APOBEC1 almost completely abolishes deamination of RNA while retaining activity on DNA, indicating an essential role of this amino acid in substrate differentiation 21 .
Notably, alignment with other APOBECs reveals that W121 in APOBEC1 corresponds to Y113 in A3H (Fig. 3c), a residue that directly interacts with a ribose 2′-hydroxyl of bound RNA 15,19 (PDB:6B0B, PDB:5W3V) (Table 2). The same residue also corresponds to D131 in A3A and D316 in A3G. As discussed above, these residues have been shown to be important for deamination activity on ssDNA and for local dinucleotide sequence preference 16,18,20,35 , but no evidence yet exists for their function on RNA. However, two A3A protein variants were recently described that exclusively deaminate RNA. Both variants have a Y132G substitution combined with additional substitutions in loop 1 or helix 6, implicating these amino acids in substrate discrimination 36 . Nonetheless, much remains to be learned about how generalist APOBECs discriminate between RNA or DNA substrates; additional structures of the proteins bound to DNA or RNA, in combination with genetic studies targeting specific amino acids, will be necessary to pinpoint the residues that define substrate selectivity.

[...] 156 . In the vertebrate-specific branch, AID and APOBEC2 are the most ancient members (present in cartilaginous and bony fish) 157 . APOBEC1 emerged later in the tetrapod-lungfish divergence; and APOBEC3 appeared even later, in placental mammals. Both are believed to have evolved from AID gene duplications 82,158 . Paralogue expansion within placental mammals led to emergence of several APOBEC3 subfamily members, with the seven members of the human subfamily being among the most diverse 82,159 . More recently, orthologues of APOBEC4 have been found in invertebrates, suggesting it predates the rest of the family members and forms a separate invertebrate branch 158 . Right: domain delineation of members of the vertebrate-specific AID/APOBEC family. Each member of the family contains the core zinc-dependent cytidine deaminase domain (core CDA). Specific members contain accessory motifs within core CDA that determine subcellular localization, including nuclear localization signal (NLS), nuclear export signal (NES) and cytoplasmic retention signal (CRS). Some members contain additional accessory regions that provide specific molecular properties: for example, APOBEC2 contains an amino-terminal intrinsically disordered region (IDR) whereas the carboxy terminus of APOBEC1 is hydrophobic. b | Core CDA composed of a five-stranded β-sheet (β1-β5) surrounded by six α-helices (α1-α6). Several loops found within the deaminase fold (L-1 to L-10) with loops 1, 3, 5 and 7 forming the substrate binding groove. Catalytic pocket coordinates a zinc ion (Zn; green sphere) with the His-Glu (H and E) and Cys-Cys (C) motifs found on α2 and L-5/α3, respectively. AID/APOBEC, activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like.

Structural differences in grooves of specialists reflect functional differences.
The structure of AID (with a dCMP bound within the catalytic pocket) (PDB:5W0U) (Table 2) revealed a bifurcated, rather than U-shaped, substrate binding surface. Residues of loops 1, 3 and 7 are essential in shaping the substrate channel where the dCMP coordinates 37 (Fig. 3e), which is connected to a second groove, termed the 'assistant patch' 37 (Fig. 3e). Positively charged basic residues in these channels form a binding surface, which is separated near their point of convergence by negatively charged residues in loop 7 (the 'separation wedge') (Fig. 3e). Groove residues are highly conserved in AID proteins from different species, but not among other APOBECs, highlighting that this structure is specific to AID. Interestingly, similar separation wedge structures have been observed for proteins that recognize branched nucleic acids, such as T4 RNase H 38 or Cas9 (ref. 39), suggesting that AID recognizes structured substrates. Although AID targeting mechanisms are still not fully clarified, the conformation of the substrate binding region agrees with recent experiments that reveal a possible role for G-quadruplex structures in guiding and targeting AID, at least in the context of immunoglobulin class switch recombination (CSR) 37,40 . These data also highlight the importance of the substrate binding groove structure in allowing different AID/APOBECs to discriminate substrates based on their secondary structure. It must also be noted that AID, similar to other specialists, can bind RNA 41 , especially within RNA-DNA hybrids 42 , but cannot deaminate it 43 , suggesting again that binding is required but not sufficient for catalysis.
The structure of APOBEC2 was the first among the AID/APOBEC family to be published 44,45 (PDB:2NYT) (Table 2), but little is known about its molecular substrate and so co-crystal structures are currently unavailable. As such, it is not possible to assess the conformation of the substrate binding groove, but the APOBEC2 structure does provide some insight into its lack of deaminase activity. E60 in APOBEC2 forms a point of coordination with the zinc ion that is absent from catalytically active AID, A3A, A3G or APOBEC1, and this may affect catalytic activity by disrupting coordination of an essential water molecule or by modulating substrate affinity 46 (Fig. 3f). Deamination could also be prevented by obstruction of the nucleic acid binding pocket by loop 1 (Fig. 3f). However, given the flexibility of this loop seen in its solution structures 44 , intermolecular interactions affecting its conformation may allow transient access to the deaminase active site and transient interactions with nucleic acid. Recent work from our laboratory, which has been made available as a preprint, strongly suggests that APOBEC2 has retained the ability to interact with ssDNA containing GC-rich motifs; moreover, this interaction seems to affect gene expression 11 . It is tempting to speculate that, similar to AID, APOBEC2 may interact with G-quadruplex structures found within these GC-rich promoter sequences. Alternatively, APOBEC2 may interact with transient ssDNA structures resulting from RNA polymerase promoter melting, in a manner similar to other APOBECs 47 . We currently speculate that transcriptional repression through chromatin interaction may be an evolutionarily conserved function of APOBEC2 (ref. 48), especially in the context of cellular reprogramming.
Taken together, the available AID/APOBEC structures illustrate the flexibility of their core structure and how it maintains the active site requirements of the family while enabling substrate restriction and functional specialization in some members or broader substrate preference and functional plasticity in others.

Subcellular localization
Regardless of the innate capacity of an AID/APOBEC protein to bind and deaminate DNA, RNA or both substrates, its ability to do so in cells will depend on its subcellular localization and its access to the specific substrate. Whereas mRNA, viral RNA and viral DNA can all be deaminated in either the nucleus or the cytoplasm, the host genome can only be deaminated by nuclear-localized family members. For example, despite having DNA binding and deamination capabilities, the generalist A3G cannot mutate genomic DNA because it is confined to the cytoplasm.
The subcellular localization of each member of the AID/APOBEC family may depend on active or passive cellular mechanisms. Transit of AID and APOBEC1 between the nucleus and the cytoplasm relies on both an amino-terminal bipartite basic nuclear localization signal (NLS) sequence and a strong carboxy-terminal leucine-rich nuclear export signal (NES) sequence 49,50 (Fig. 2a). APOBEC1 also contains a C-terminal hydrophobic domain, which is involved in intramolecular interactions that can play a part in further defining subcellular localization 21 . An extensive study of AID and APOBEC2 protein chimaeras showed that nuclear import of AID involves residues in addition to the N-terminal NLS, whereas APOBEC2 lacks NLS or NES motifs and, instead, passively diffuses between the cytoplasmic and nuclear compartments 51 . Unlike the rest of the AID/APOBEC family, APOBEC2 contains an N-terminal glutamate-rich acidic intrinsically disordered region (IDR), which could further restrict its subcellular localization through intermolecular interactions with shuttling proteins or cofactors (Fig. 2a).
Single-domain human APOBEC3 paralogues, A3A, A3C and A3H, are small enough (~25 kDa) to passively enter and exit the nucleus, and are generally found throughout the cell during interphase 52 (Fig. 2a; Table 1). For example, A3H lacks an NLS but enters the nucleus through passive diffusion and is retained within the nucleolar subcompartment 53 . By contrast, the larger (>50 kDa) double-domain APOBEC3 paralogues cannot passively enter the nucleus; A3B is constitutively nuclear owing to its N-terminal NLS 52-54 , whereas A3D, A3F and A3G lack an NLS and are mostly found within the cytoplasm (Fig. 2a). Interestingly, A3G seems to contain a novel cytoplasmic retention signal (CRS) 55 . All human APOBEC3 paralogues are excluded from chromatin during mitosis when the nuclear envelope breaks down, which presumably inhibits genome mutagenesis 52 (ref. 14 offers an in-depth review on trafficking kinetics of the AID/APOBEC family of proteins).

Cofactors
AID/APOBEC enzymes interact with numerous protein cofactors that enable them to carry out their functions in the cell. Here, we focus on cofactors that affect substrate targeting or modulate catalytic activity.
To date, APOBEC1 is the only AID/APOBEC protein for which specific cofactors have been demonstrated to modulate its catalytic activity. In mice, APOBEC1 is expressed in the small intestine and the liver, where it edits a specific cytosine within the APOB pre-mRNA. The C-to-U RNA editing event recodes a CAA codon to a stop codon, resulting in a truncated form of the APOB protein, called APOB-48 (refs 56,57) (Box 1; Fig. 1a). Two cofactors of mouse APOBEC1 (mAPOBEC1), APOBEC1 complementation factor (A1CF) 58,59 and RNA-binding motif protein 47 (RBM47) 60 , have so far been identified, but given that doubly mutant mice lacking both of these cofactors still retain some C-to-U editing activity, other cofactors are likely to exist 61,62 . A1CF and RBM47 bind RNA, interact directly with APOBEC1 protein 58-60 and have an essential role in defining which RNAs are targeted for editing as well as determining the level of editing per target 61,62 . Elegant genetic dissection in a mouse system suggests that cofactors 'recruit' different (sometimes partially overlapping) sets of transcripts to the editing complex (Fig. 1a) and that cofactor dominance is associated with editing frequency 61,62 . Together with the fact that APOBEC1 exerts its biological function by deaminating target cytosines within cohorts of transcripts that define common pathways 63 , these experiments support the idea that distinct tissues drive APOBEC1 to specific sets of transcripts through the provision of different sets of cofactors 64 .
Several potential cofactors have been identified for AID 65-70 , but none has been proven to be the key determinant in targeting AID to the immunoglobulin locus, its physiological target. Finally, a secondary Zn2+ ion has been shown to allosterically modulate catalysis of A3A and A3G (ref. 71). Although not a cofactor in the traditional sense, this functionality points to possible surfaces that could be occupied by more traditional cofactors to regulate enzymatic function.

[...] Table 2). Target deoxycytidine (C0) located at bottom of the substrate binding groove formed by loops 1, 3 and 7 and forms a π-stacking interaction with Y130 (PDB:5SWW) (Table 2). b | Overlapping crystal structures of APOBEC1 (purple) (PDB:6X91) (Table 2), A3A bound to ssDNA (pink) (PDB:5KEG) (Table 2) and A3G bound to ssDNA (blue) (PDB:6BUX) (Table 2). Residues F120 (APOBEC1), Y130 (A3A) and Y315 (A3G) form critical aromatic π-stacking interactions with target C (C0, from co-crystal structure of A3A bound to ssDNA; turquoise) (PDB:5SWW) (Table 2). c | Alignment of amino acids present in loop 7 for different APOBECs and target C preference motif for each. d | A3A binds U-shaped substrates, such as ssDNA (orange) (nucleobases represented as blue sticks) (PDB:5SWW) (Table 2). e | AID in co-crystal structure with dCMP ligand (PDB:5W0U) (Table 2) shown as a molecular surface. AID loops 1, 3 and 7 form positively charged (blue) bifurcated substrate binding surface, comprising 'substrate channel' (which hosts dCMP) and a second groove, termed the 'assistant patch'. The two grooves are separated near the point of convergence by negatively charged residues in loop 7 (red) known as the 'separation wedge'. f | Co-crystal structure AID (light brown) with a dCMP ligand (orange) (PDB:5W0U) (Table 2) overlaid with crystal structure of APOBEC2 (blue) (PDB:2NYT) (Table 2). Loop 1 of APOBEC2 obstructs substrate (orange) at the active site. Residue E60 in APOBEC2 forms a fourth point of coordination with zinc ion (Zn; green sphere). AID/APOBEC, activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like. Part e is adapted from ref. 13.

AID/APOBECs drive adaptive evolution
RNA editing and DNA mutations have very different features; editing is transient and tunable, whereas mutations are irreversible and heritable. Despite these differences, both mechanisms create genetic variability that has an essential role in adaptive evolution 72-74 . In this section, we discuss how AID/APOBEC proteins can drive adaptive evolution in viral and cancer genomes owing to their ability to deaminate both RNA and DNA.

APOBEC3 proteins in viral genome evolution
Early experiments predicted that T cells express a factor that blocks the replication of viral infectivity factor (Vif)-deficient human immunodeficiency virus type 1 (HIV-1) 75,76 . A3G was later identified as one of the factors responsible for this HIV-1 restriction through active deamination of nascent retroviral cDNA 77-79 , with subsequent studies highlighting the involvement of A3D, A3F and A3H (refs 80,81). Although many of these experiments were performed in APOBEC3-overexpressing cells infected with pseudotyped HIV and may not fully reflect in vivo conditions, the general consensus is that several APOBEC3 proteins individually and synergistically restrict viral infectivity of HIV and many other viruses during natural infections, a view that is supported by the substantial expansion of the APOBEC3 family in organisms that support large infection loads, such as bats 82,83 . In a process known as hypermutation, APOBEC3 proteins can deaminate a substantial proportion of the total cytosines in the HIV cDNA in a single round of viral replication, with reports of up to 10% in in vitro or cell culture experiments and up to 98% in HIV sequences isolated from peripheral blood mononuclear cells. The resulting uracils are recognized and excised by the host uracil DNA N-glycosylase (UNG) protein, which initiates the base excision repair pathway and, ultimately, leads to heavily damaged genomes containing multiple abasic sites. These genomes can be further cleaved and degraded, thereby decreasing viral infectivity 77,83 . However, genomes with less extensive damage (and fewer abasic sites) can simply be repaired, often resulting in mutations that can support viral evolution 84 and the acquisition of drug resistance 85 , altered transmission and immune escape 85,86 (Fig. 4a).
Analysis of HIV genomes that have undergone hypermutation or are associated with immune escape reveals an enrichment of APOBEC3-defined mutational signatures, which, in conjunction with biochemically derived triplet preferences, strongly support a physiologic role for specific APOBEC3 family enzymes in both viral restriction and viral evolution (reviewed elsewhere 87 ). Although the majority of knowledge surrounding APOBECs and viral restriction comes from the study of retroviruses, DNA viruses such as hepatitis B virus (HBV) and human papilloma virus (HPV) are also restricted by APOBEC3 enzymes 88-90 . Additionally, some APOBEC3 proteins can also deaminate viral genomes composed solely of RNA, such as the positive-sense RNA genome of the betacoronavirus SARS-CoV-2. Soon after the beginning of the COVID-19 pandemic, RNA sequencing data from bronchoalveolar lavage fluid of patients with COVID-19 was used to monitor the mutational signatures shaping the viral genome before fitness selection 91,92 . The most common mutations detected in these sequencing data were A-to-G and T-to-C changes (possibly the outcome of ADAR1 activity on the positive-sense and negative-sense strands, respectively, during viral replication) followed by C-to-T and G-to-A changes, likely mediated by APOBEC3 proteins, the only AID/APOBEC family members known to bind and deaminate viral RNA 91-93 . The involvement of APOBEC3 proteins is further supported by the frequent occurrence of edited Cs within the motif 5′-U/ACU/A-3′ (refs 91,94) (although a recent preprint indicates this could also be explained by APOBEC1-mediated deamination 95 ) and in terminal loop rather than stem sequences 96 , and the upregulation of APOBEC3 proteins in samples from patients with COVID-19 97-100 .
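The attribution logic in the preceding paragraph can be sketched as a simple tally. The mapping below restates the interpretation given in the text (A-to-G and T-to-C as possible ADAR1 outcomes; C-to-T and G-to-A as likely APOBEC3 outcomes) and is a heuristic, not proof of which enzyme acted. The names `tally_substitutions` and `in_wcw_context` are hypothetical, and the inputs are assumed to be (reference base, variant base) pairs already called against a reference genome.

```python
from collections import Counter
from typing import Iterable, Tuple

# Candidate source for each substitution class, following the text.
# C>T / G>A and A>G / T>C pairs cover editing on either strand.
CANDIDATE_SOURCE = {
    ("A", "G"): "ADAR-like (A-to-I)",
    ("T", "C"): "ADAR-like (A-to-I)",
    ("C", "T"): "APOBEC-like (C-to-U)",
    ("G", "A"): "APOBEC-like (C-to-U)",
}


def tally_substitutions(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions per candidate source; anything else is 'other'."""
    counts: Counter = Counter()
    for ref, alt in variants:
        key = (ref.upper(), alt.upper())
        counts[CANDIDATE_SOURCE.get(key, "other")] += 1
    return counts


def in_wcw_context(reference: str, pos: int) -> bool:
    """True if reference[pos] is a C flanked by U/T or A on both sides,
    that is, the 5'-[U/A]C[U/A]-3' motif. U is treated as T."""
    if pos < 1 or pos + 1 >= len(reference):
        return False
    tri = reference[pos - 1 : pos + 2].upper().replace("U", "T")
    return tri[1] == "C" and tri[0] in "AT" and tri[2] in "AT"


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("G", "T")]
    print(tally_substitutions(calls))
    print(in_wcw_context("GAUCUAG", 3))
```

In practice, such counts would need to be normalized to base composition and compared with an expected background before drawing conclusions about any deaminase.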
Analysis of SARS-CoV-2 genomic sequences largely acquired through the process of viral genome surveillance of variants of interest over the course of the pandemic has revealed that, after fitness selection, about 40% of all mutations involve C-to-T changes (reviewed elsewhere 100,101 ), which are at least partially confined to a group of mutational hotspots 102 , a pattern consistent with APOBEC3 activity. Numerous other ssRNA viruses (including human T cell leukaemia virus type 1 (HTLV-1) and rubella) have been shown to be targeted by APOBEC3 proteins (reviewed elsewhere 81 ). Overall, deep sequencing data strongly support a functional role of APOBEC3 family members in the restriction of ssRNA viruses in natural settings. Taken together, these studies clearly show the effects of APOBEC3 mutagenesis on viral genomes and its relevance to virus evolution 84,103,104 . As generalists with a preference for viruses with ssRNA and DNA genomes 105 , A3A and A3G contribute to restriction of a range of viruses but can also drive evolution of retroviruses (such as HIV-1 (ref. 85)), DNA viruses (such as herpesviruses 74 ) and also ssRNA viruses that lack ssDNA intermediates (including SARS-CoV-2 and rubella among others 96 ).

AID/APOBECs and cancer evolution
The first solid piece of genetic evidence linking any AID/APOBEC family member to cancer was the finding that APOBEC1 overexpression in the liver of transgenic animals induces hepatocellular carcinoma 106 , although whether this was the result of RNA editing or DNA mutation remained unclear. Ectopic expression of AID was later shown to catalyse off-target DNA mutations and chromosomal translocations 107,108 , albeit at rates substantially lower than those reported for its true target, the immunoglobulin genes. Subsequently, some APOBEC3 family members (chiefly those with access to the nucleus) were reported to be a cause of DNA damage and mutagenesis 109,110 . Indeed, based on mutational signatures found in cancer genomes, AID/APOBEC-derived mutations are present in more than 50% of human cancer types, and account for 5-90% of all substitution mutations 111,112 . In addition, AID/APOBEC mutations can occur in clusters over kilobase-sized regions 113,114 . These hypermutated clusters are termed kataegis mutations 113,115 and have been reported in more than 60% of cancers 116 . They are especially prominent in cancer types where APOBEC3 mutagenesis is active 117 . Expression of some AID/APOBEC enzymes in tumours (such as AID in chronic myeloid leukaemia 118 or A3B in tamoxifen-resistant breast cancer 119 ) has been correlated with increased tumour evasion and drug resistance, suggesting that they drive tumour evolution. Independently of kataegis mutations, APOBEC3-catalysed mutagenesis can also lead to chromosomal instability 120,121 and, thus, to either cell-autonomous lethality 122,123 or to cancer evolution through increased tumour heterogeneity 124 . Given that these outcomes mirror those of APOBEC3-mediated viral restriction, we hypothesize that expression of APOBEC3 proteins is induced by the inflammatory cancer microenvironment in an attempt to kill malignant cells via localized hypermutation. 
However, when APOBEC3-mediated mutation is not successful in achieving tumour restriction 125,126 , the tumour cells that have evaded cell death (that is, those with non-lethal levels of mutation) can drive cancer evolution, thus leaving behind a mutational signature in the genome at sites that are likely directly related to the original drive to restrict 26 (Fig. 4b).
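Clusters of mutations over kilobase-sized regions, such as the kataegis mutations described above, are usually called from the spacing between neighbouring substitutions. The sketch below is a simplified illustration rather than any specific published caller: the function name is hypothetical, the thresholds (at least six mutations, neighbouring mutations no more than 1,000 bp apart) are assumed defaults that should be tuned, and the positions are assumed to lie on a single chromosome.

```python
def find_mutation_clusters(positions, max_gap=1000, min_mutations=6):
    """Group mutation positions into clusters of closely spaced substitutions.

    Assumptions (illustrative defaults, not taken from the text):
    - all positions are on one chromosome;
    - a cluster is a run of mutations in which every neighbouring pair is
      at most `max_gap` bp apart and which has at least `min_mutations`.
    Returns a list of (start, end, number_of_mutations) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the run collected so far.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_mutations:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final run.
    if len(current) >= min_mutations:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Toy example (hypothetical coordinates): one run of seven close mutations.
toy_positions = [100, 5000, 5200, 5350, 5900, 6100, 6800, 7000, 50000, 90000]
toy_clusters = find_mutation_clusters(toy_positions)
```

Requiring every neighbouring gap to be short is stricter than a criterion based on the mean inter-mutation distance, so this sketch errs towards calling fewer clusters.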
The RNA editing capacity of some AID/APOBEC deaminases has also been directly linked to the generation of heterogeneity essential to tumour evolution 127-131 (for a comprehensive recent review on the AID/APOBEC but also ADAR contribution to tumour evolution, see ref. 128). For example, loss of editing (through ablation of Apobec1) in the small intestine of a mouse model of intestinal cancer (the APC min mouse) leads to substantial tumour reduction 132 . Additionally, deletion of Apobec1 from the germline of a mouse model of testicular cancer (in which around 8% of male mice succumb to testicular teratocarcinomas by 4 weeks of age) ablates susceptibility 133 . Finally, it has recently been demonstrated that the location of A3A-catalysed DNA mutations in cancer genomes can be predicted in clinical samples by monitoring the frequency of A3A RNA editing at the same loci 28 . This finding supports the notion that editing precedes mutation and that RNA editors induced under inflammatory conditions can also inflict DNA damage, such as kataegis mutation. More generally, these data imply that the RNA editing state of a cell determines the fate of that cell, even in the absence of a heritable genomic mutation. Indeed, both A3A expression and RNA editing were detected in cancers such as acute myeloid leukaemia and myeloproliferative neoplasm 28 , yet APOBEC-associated genomic signatures are only a minor component of the mutational signatures present in these tumours 111 , further implying that A3A activity on RNA could precede DNA mutagenesis in cancer.

AID/APOBECs as base-editing tools
In this section we will discuss how AID/APOBEC enzymes have been used in genome and transcriptome engineering technologies, broadly known as programmable base editing (Fig. 5), to revert T-to-C or A-to-G transitions in DNA or mRNA, and how a fuller understanding of their substrate specificities can inform the design and optimization of these tools. As this Perspective is focused on AID/APOBECs, we will not discuss mRNA base-editing technologies that are based on adenosine deaminase enzymes (reviewed extensively elsewhere 134-139 ).

DNA-directed base-editing tools
The first members of the AID/APOBEC family to be used as the basis of a cytosine base editor (CBE) were AID, rat APOBEC1 (rA1) and A3G. A seminal paper from the Liu laboratory used catalytically dead CRISPR-associated endonuclease (dCas) fused to these AID/APOBEC family members, together with appropriate Cas9 guide RNAs (gRNAs), to target deaminase activity to specific loci and induce single base changes in the absence of a DNA break 140 . Given the substantial activity of rA1 as a DNA mutator 109,141,142 , its fusion with dCas9 was the most efficient at generating specific C-to-T (or G-to-A) substitutions within DNA, constituting the first CBE 140 . Several variations of this system were soon developed to increase base-editing efficiency (by fusion with a uracil DNA glycosylase inhibitor (UGI)), to reduce indel generation (for example, by using Cas9-D10A, a nickase mutant of Cas9) and to reduce off-target editing (by using A3A or AID instead of APOBEC1) (reviewed elsewhere 138 ).
Given that rA1 and A3A are generalists, it was unavoidable that DNA editing systems based on these deaminases would also lead to several thousand unwanted RNA editing events 143,144 . However, this off-target activity was almost entirely eliminated by introducing specific amino acid changes into rA1 and A3A. Two different two-amino acid changes to rA1 (R33A/K34A and W90Y/R126E) each resulted in reduced off-target activity on RNA while retaining efficient base editing on DNA 143,144 (Fig. 5b). Similarly, off-target RNA editing by A3A was reduced by introducing either an R128A or a Y130F amino acid change 144 (Fig. 5b). R128A and Y130F of A3A and R126E of rA1 occur in loop 7, emphasizing the importance of residues in this loop for deamination of RNA. Moreover, R33A/K34A changes were shown to affect the capability of APOBEC1 to bind RNA 49 . These mutations illustrate how a better understanding of the features that determine whether an AID/APOBEC protein acts as a generalist or a specialist might enable specificity issues to be avoided by facilitating more informed CBE design and optimization at the outset.

Fig. 5 (caption, continued) | Several CBE variants are illustrated, which differ with respect to the deaminase used (rat APOBEC1 (rA1) or human APOBEC3A (A3A)) and specific mutations within the deaminase. Note that amino acids identified in the human APOBEC1 structure 21 as likely to be functionally important can inform base-editing work with rA1. c-e | RNA-directed CBE tools based on APOBEC proteins. APOBEC variants are directed to a specific nucleotide in a transcript of interest via an antisense gRNA, the design of which varies according to the editing system. RNA base editing by CURE (cytidine-specific C-to-U RNA Editor): targeting is mediated by a gRNA that recruits a chimeric protein comprising either dCas13 or dCasRx and a Y132D variant of A3A to the RNA target, and the gRNA creates a 14-nucleotide loop containing the C to be edited (part c). RNA base editing with a SNAP-tagging system: a mouse APOBEC1 (mAPOBEC1)-SNAP chimaera is recruited to the target RNA via covalent linkage to a benzylguanine (BG)-modified gRNA; unlike in other systems, the C to be deaminated is positioned four to six nucleotides downstream of the region bound by the gRNA (part d). RNA base editing with an MS2-tagging system: a human APOBEC1 deamination domain (hA1 DD )-MS2 chimaera is recruited to a specific location on the target RNA by binding of MS2 coat proteins to an MS2 stem-loop on the gRNA; in this system, the C to be edited is specified by a C:A mismatch between target RNA and gRNA (part e). AID/APOBEC, activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like; WT, wild type.

RNA-directed base-editing tools
The development of AID/APOBECs as RNA-directed CBEs has proven to be more difficult than DNA-directed CBEs, leading one group to, instead, evolve ADAR proteins to induce C-to-U editing 145 . One possible explanation for these difficulties is that RNA deamination by APOBEC1, A3A and A3G requires the target RNA to adopt specific secondary structures 26,29-31,146 . This theory is supported by studies using the recently developed CURE (cytidine-specific C-to-U RNA Editor) system, which uses gRNAs to target a Y132D mutant version of A3A fused either to dPspCas13b or dCasRx to specific locations in a target transcript. Interestingly, A3A was only able to elicit RNA editing at the desired location when these gRNAs induced the target transcripts to form a loop 147 (Fig. 5c). Importantly, no off-target DNA editing was detected using CURE, although a few hundred off-target RNA edits were found 147 . Two other recently reported RNA-directed CBE approaches used mAPOBEC1 or human APOBEC1 in combination with either SNAP-tagged or MS2-tagged gRNAs to target specific target mRNAs 148,149 (Fig. 5d,e). Neither of these two methods was checked for off-target DNA editing, but the mAPOBEC1-SNAP system demonstrated that integration of an inducible editing enzyme reduces global off-target RNA editing, as had previously been shown for ADAR RNA base-editing technologies 149,150 . This method was not benchmarked against CURE, making a direct comparison difficult, but it is important to note that the reported RNA off-target activity of CURE (measured as a simple sum of sites and noting that CURE enzymes are overexpressed) is much lower than that of mAPOBEC1-SNAP (refs 147,149). Despite these recent developments, APOBEC1-based RNA-directed CBE systems still suffer from moderate levels of global off-target RNA editing and, owing to the inherent dinucleotide preference of A3A, CURE can only edit Cs present in a 5′-UC-3′ motif (Fig. 3c). 
A better understanding of how APOBEC1, A3A and A3G interact specifically with RNA will help improve the current systems and facilitate the development of new ones.

Expanding the potential of base editing
An important limitation of the RNA-directed CBE systems described here is that editing is restricted to locations that match the sequence context preferences of the enzymes used. In particular, no currently known APOBECs naturally edit Cs within a 5′-GC-3′ context (Fig. 3c). Therefore, it will be necessary to develop additional context-specific base editors to complete the spectrum of Cs that can be edited.
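The sequence-context restriction described above can be made concrete by sorting every cytosine in a transcript according to the nucleotide immediately 5′ of it. The sketch below is illustrative only: the function name and the toy transcript are hypothetical, positions are 0-based, and a C at the first position is skipped because it has no 5′ neighbour. Cs listed under "UC" would match the A3A preference noted for CURE, whereas those under "GC" fall in the context that no currently known APOBEC naturally edits.

```python
def cytosines_by_context(transcript: str) -> dict:
    """Map each 5'-NC-3' dinucleotide context to the 0-based positions of its C.

    Assumption: the input is a single transcript given as RNA or cDNA
    letters; any T is converted to U before scanning.
    """
    rna = transcript.upper().replace("T", "U")
    sites = {}
    for i in range(1, len(rna)):
        if rna[i] == "C":
            context = rna[i - 1] + "C"
            sites.setdefault(context, []).append(i)
    return sites


# Toy example (hypothetical transcript).
contexts = cytosines_by_context("AUCGCUUCCA")
candidate_sites = contexts.get("UC", [])    # Cs in the 5'-UC-3' context
out_of_reach_sites = contexts.get("GC", [])  # Cs in the 5'-GC-3' context
```

Matching the dinucleotide context is a necessary filter only; as discussed in the preceding section, the local RNA secondary structure also influences whether a given C is edited.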
Considering the importance of residues in loop 7 (but also in loops 1 and 3) in defining the substrate and sequence context preference of the AID/APOBEC enzymes, it seems reasonable to hypothesize that altering residues within these loops may be a way to change the local motif preferences and alleviate target motif limitations. Finally, recruitment of endogenous AID/APOBECs for base-editing purposes (as has been done for ADAR 151,152 ) remains an unexplored field. Further developments in this area are important because endogenous AID/APOBEC enzymes are generally overexpressed in contexts (such as cancer) in which therapeutic editing could be beneficial.

Conclusion and future perspectives
Here, we have argued that, under certain conditions, several AID/APOBEC deaminases can act on both RNA and DNA substrates whereas other family members are substrate-restricted. Through the analysis of recently published co-crystal structures we have attempted to describe the features that allow these enzymes to 'toggle' between substrates (as APOBEC1 and some APOBEC3 proteins do) and how such activity can be restricted (as in the case of AID and, perhaps, APOBEC2).
With the advent of programmable base editors, it will be important to analyse all known AID/APOBEC deaminases (not only all mammalian family members but also distant relatives that seem to exist in marine organisms 153 ) for their properties, in order to develop CBEs that can selectively target RNA or DNA and to expand the local sequence preference of such tools. Such analyses can also help answer biological questions arising from the close mechanistic relationship between RNA editing and DNA mutation. For example, it is well understood that DNA mutators of the APOBEC3 family are upregulated in cancer tissue; a holistic (but yet to be fully tested) view of the field would argue that these enzymes are actually upregulated in the context of a programmed RNA editing response to inflammation, and that DNA mutation is an off-target outcome of this response 28,143 . If, as implied, RNA is the preferred substrate for these enzymes, it will be important to understand the physiologic role of RNA editing in the context of an early host response to tumour inflammation. Finally, if kataegis mutations (detected in the majority of human cancers) are simply the by-product of the host's attempt to limit tumour growth, then RNA editing could be used diagnostically as an early biomarker for ongoing tumour diversification and relapse 28 .

Glossary

Base modifications
Chemically altered nucleotides within mature RNA molecules.

G-quadruplex
A non-canonical four-stranded secondary structure of guanine-rich DNA sequences.

Guide RNAs
(gRNAs). Short RNA sequences used in base-editing technologies to target the base editor to a specific sequence in DNA or RNA. Depending on the tagging system used, the base editor can be recruited by the gRNA using specific scaffolds (for Cas proteins), sequences (MS2 coat protein) or chemical modifications (for SNAP).

Intrinsically disordered region
(IDR). An unstructured protein domain that is believed to have roles in intermolecular and intramolecular interactions, such as complex formation and phase separation.

MS2-tagged
Refers to a molecule labelled using a tagging system based on the natural interaction between the MS2 bacteriophage coat protein and a stem-loop structure from the phage genome. The sequence forming the stem-loop can be attached to a guide RNA (gRNA) to target an MS2-tagged base editor.

Nuclear export signal
(NES). A short peptide motif enriched for hydrophobic residues (such as Leu) recognized by exportins (such as XPO1/CRM1) that tags a protein for nuclear exit.

Nuclear localization signal
(NLS). A short peptide motif enriched for positively charged residues that tags a protein for nuclear import.

Pseudotyped HIV
Chimaeric viruses composed of the envelope glycoprotein of vesicular stomatitis virus (VSV-G) and the human immunodeficiency virus type 1 (HIV-1) core; these viruses are more infectious than non-pseudotyped HIV-1 viruses.

SNAP-tagged
Refers to a molecule labelled using a tagging system based on the SNAP-tag self-labelling protein derived from the human O 6 -alkylguanine-DNA alkyltransferase. As a SNAP-tag will form a covalent linkage with benzylguanine (BG)-modified nucleotides, a SNAP-tagged base editor can be directed to specific targets by BG-modified guide RNAs (gRNAs).

Stem-loops
Specific structures that may occur in single-stranded RNA (ssRNA) when complementary sequences base pair to form a double helix that ends in an unpaired (single-stranded) loop. Stem-loops are also known as hairpin structures or hairpin loops.

Tumour restriction
The limitation of tumour growth and/or tumour suppression or ablation by numerous distinct molecular mechanisms. Here, we specifically refer to the limitation of tumour growth owing to cell death after activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like (AID/APOBEC)-mediated hypermutation.