RNA-guided programmable nucleases from CRISPR systems generate precise breaks in DNA or RNA at specified positions. In cells, this activity can lead to changes in DNA sequence or RNA transcript abundance. Base editing is a newer genome-editing approach that uses components from CRISPR systems together with other enzymes to directly install point mutations into cellular DNA or RNA without making double-stranded DNA breaks. DNA base editors comprise a catalytically disabled nuclease fused to a nucleobase deaminase enzyme and, in some cases, a DNA glycosylase inhibitor. RNA base editors achieve analogous changes using components that target RNA. Base editors directly convert one base or base pair into another, enabling the efficient installation of point mutations in non-dividing cells without generating excess undesired editing by-products. In this Review, we summarize base-editing strategies to generate specific and precise point mutations in genomic DNA and RNA, highlight recent developments that expand the scope, specificity, precision and in vivo delivery of base editors and discuss limitations and future directions of base editing for research and therapeutic applications.
The ability to precisely and efficiently edit DNA sequences within the genome of living cells has been a major goal of the life sciences since the first demonstration of restriction cloning1. Recently, RNA-programmable CRISPR-associated (Cas) nucleases have contributed to the pursuit of this goal2,3,4 through their ability to generate a double-strand DNA break (DSB) at a precise target location in the genome of a wide variety of cells and organisms5,6,7,8 (reviewed extensively elsewhere9,10,11,12). Catalytically inactivated Cas nucleases are also useful as programmable DNA-binding proteins that localize tethered proteins to target DNA loci2,13,14,15,16.
Generation of a DSB does not directly lead to DNA editing; rather, editing following nuclease treatment occurs as a result of cellular responses to DSBs. Processes including non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) can lead to gene disruption through the introduction of insertions, deletions, translocations or other DNA rearrangements at the site of a DSB17,18,19. Alternatively, a precise DNA edit can be made by supplying a donor DNA template encoding the desired DNA change flanked by sequence homologous to the regions upstream and downstream of the DSB. Cellular homology-directed repair (HDR) then results in the incorporation of sequence from the exogenous DNA template at the DSB site20,21. Although HDR is a flexible tool with the ability to make precise insertions, deletions or any point mutation of interest, HDR is largely restricted to the G2 and S phases of the cell cycle, limiting efficient HDR to actively dividing cells, and even in cultured cell lines HDR efficiency can be modest22,23,24. Moreover, NHEJ and HDR are competing processes and, under most conditions, NHEJ is more efficient than HDR. Thus, a majority of edited products will usually contain small insertions or deletions (indels)24,25.
In mammalian cells, DSB-induced NHEJ is an effective way to disrupt a gene of interest. To make comparisons between alleles, study the effects of specific mutations within genes or treat genetic disease through gene correction, however, more reliable techniques that generate precise DNA or RNA modifications are necessary. The largest class of known human pathogenic mutations, by far, is the point mutation (also called single-nucleotide polymorphism (SNP)), although sampling bias owing to the extensive use of short-read sequencing to analyse genomic diversity may skew this distribution26,27,28,29 (Fig. 1a). Installing or reversing pathogenic SNPs efficiently and cleanly is thus of great interest for the study and treatment of genetic disorders and requires a method to specifically change the sequence of an individual base pair within a vast genome.
DSBs created by nucleases such as Cas9 result in indels, translocations and rearrangements27,30,31,32 that are undesired by-products when attempting to install a point mutation. Base editing is a genome-editing method that directly generates precise point mutations in genomic DNA or in cellular RNA without directly generating DSBs, without requiring a DNA donor template and without relying on cellular HDR33,34,35. Because base editors do not normally create DSBs, they minimize the formation of DSB-associated by-products35,36. Instead, DNA base editors comprise fusions between a catalytically impaired Cas nuclease and a base modification enzyme that operates on single-stranded DNA (ssDNA) but not double-stranded DNA (dsDNA). When a base editor binds to its target locus in DNA, base pairing between the guide RNA and the target DNA strand leads to displacement of a small segment of single-stranded DNA in an ‘R loop’37. DNA bases within this ssDNA bubble are modified by the deaminase enzyme. To improve efficiency in eukaryotic cells, the catalytically impaired nuclease also generates a nick in the non-edited DNA strand, inducing cells to repair the non-edited strand using the edited strand as a template33,34,35.
Two classes of DNA base editor have been described: cytosine base editors (CBEs) convert a C•G base pair into a T•A base pair33,34,38, and adenine base editors (ABEs) convert an A•T base pair into a G•C base pair. Collectively, CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C and G to A)35,39 (Fig. 1b). In RNA, targeted adenosine conversion to inosine has also been developed using both antisense40,41,42,43,44,45,46,47,48,49 and Cas13-guided39 RNA-targeting methods. In this Review, we describe the development of DNA and RNA base editors, their capabilities and limitations and their current and future applications.
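The relationship between the two editor classes and the four transition mutations can be written out as a small lookup. The following is a minimal sketch in Python rather than part of any published tool; the names `TRANSITION_EDITOR` and `editor_for` are hypothetical, and the 'same strand' and 'opposite strand' labels simply record whether the deaminated base lies on the strand on which the mutation is written.

```python
# Transition -> (editor class, strand to deaminate), following the text:
# CBEs convert C•G to T•A and ABEs convert A•T to G•C. Deaminating the
# opposite strand appears as G->A or T->C on the strand being read.
TRANSITION_EDITOR = {
    ("C", "T"): ("CBE", "same strand"),
    ("G", "A"): ("CBE", "opposite strand"),
    ("A", "G"): ("ABE", "same strand"),
    ("T", "C"): ("ABE", "opposite strand"),
}


def editor_for(ref: str, alt: str):
    """Return (editor class, strand) for a transition, or None otherwise."""
    return TRANSITION_EDITOR.get((ref.upper(), alt.upper()))


print(editor_for("G", "A"))  # a transition reachable by a CBE
print(editor_for("C", "G"))  # a transversion: no CBE or ABE applies
```

Transversions return `None` because neither editor class described here installs them.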
Development of cytosine base editors
The first DNA base editors convert a C•G base pair to a T•A base pair by deaminating the exocyclic amine of the target cytosine to generate uracil (Fig. 2a). To localize deamination activity to a small target window within the mammalian genome, Liu and co-workers used an APOBEC1 cytidine deaminase, which accepts ssDNA as a substrate but is incapable of acting on dsDNA50. Fusion of APOBEC1 to dead Cas9 from Streptococcus pyogenes (dCas9; a mutant of Cas9 containing D10A and H840A mutations) resulted in base editor 1 (BE1)33 (Table 1). When bound to its cognate DNA, dCas9 performs local denaturation of the DNA duplex to generate an R loop in which the DNA strand not paired with the guide RNA exists as a disordered single-stranded bubble2,37. This feature enables BE1 to perform efficient and localized cytosine deamination in a test tube, with deamination activity restricted to an ~5-nucleotide window of ssDNA (positions ~4–8, counting the protospacer adjacent motif (PAM) as positions 21–23) generated by dCas9. Fusion to dCas9 presents the target site to APOBEC1 in high effective molarity, enabling BE1 to deaminate cytosines located in a variety of different sequence motifs, albeit with differing efficacies33 (Fig. 2b).
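The position numbering used for the activity window can be made concrete with a short sketch. The Python function below is illustrative only: `bases_in_window` and the example protospacer are hypothetical, and the function assumes a 20-nucleotide protospacer written 5ʹ to 3ʹ with position 1 at the PAM-distal end, so that the PAM would occupy positions 21–23.

```python
def bases_in_window(protospacer: str, base: str = "C", window=(4, 8)):
    """Return the 1-based protospacer positions of `base` inside `window`.

    Position 1 is the PAM-distal end of the protospacer, matching the
    numbering in the text (PAM = positions 21-23 for a 20-nt protospacer).
    """
    start, end = window
    return [
        i
        for i, b in enumerate(protospacer.upper(), start=1)
        if b == base and start <= i <= end
    ]


# Hypothetical 20-nt protospacer (not taken from the source), 5'->3'.
example = "GACCTTCAGGATTCGAACTG"
print(bases_in_window(example))  # cytosines at positions 4 and 7 -> [4, 7]
```

The cytosines at positions 3, 14 and 18 of this example are excluded because they lie outside positions 4–8; passing a different `base` or `window` adapts the same check to other editors.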
A major challenge for the use of base editors in mammalian cells is circumventing DNA repair processes that oppose target base pair conversion. Although BE1 mediates efficient, RNA-programmed deamination of target cytosines in vitro, it is not effective in human cells (deamination efficiency fell from 25–40% in vitro to 0.8–7.7% in cells)33. This decrease is largely due to effective cellular repair of the U•G intermediate in DNA51. Base excision repair (BER) of U•G in DNA is initiated by uracil N-glycosylase (UNG), which recognizes the U•G mismatch and cleaves the glycosidic bond between uracil and the deoxyribose backbone of DNA. BER will usually result in the reversion of the U•G intermediate created by BE1 back to a C•G base pair51,52 (Fig. 2c). To inhibit UNG, Liu and co-workers fused uracil DNA glycosylase inhibitor (UGI), a small protein from bacteriophage PBS, to the carboxy terminus of BE1, generating BE2. UGI is a DNA mimic that potently inhibits both human and bacterial UNG53. BE2 mediates efficient base editing in bacterial cells54 and moderately efficient editing in mammalian cells, enabling conversion of a C•G base pair to a T•A base pair through a U•G intermediate33 (Fig. 2c).
The base-editing efficiency of BE2 is limited by its ability to edit only one strand of DNA. To direct cellular replacement of the G present in the non-deaminated strand of DNA with A, Liu and co-workers designed third-generation base editors (BE3) that specifically nick the non-edited DNA strand (Fig. 2c). Nicking the non-edited DNA strand biases cellular repair of the U•G mismatch to favour a U•A outcome, greatly elevating base-editing efficiencies in mammalian cells. Restoration of His840 in dCas9 generates a base editor that uses Cas9 nickase (D10A) instead of dCas9, resulting in nicking of the non-edited DNA strand (Fig. 2c). The APOBEC1–Cas9 nickase–UGI fusion (BE3) yielded efficient editing in mammalian cells, averaging 37% across six loci in the initial report33. Notably, although indels are a detectable by-product upon treatment with BE3, their frequency is typically small relative to the base edit (indel formation averaged 1.1% across the six reported loci) and much less frequent than indels induced by DSBs33.
Nishida and co-workers described a similar system for cytosine base editing in yeast and mammalian cells, termed ‘Target-AID’34. In lieu of APOBEC1, they used cytidine deaminase 1 (CDA1) in a Cas9 nickase–CDA1–UGI base editor construct. Target-AID displays a slightly shifted activity window relative to BE3 (Table 1). Nishida and co-workers also noted that base editing at certain bases is less precise than expected, demonstrating that C-to-G or C-to-A edits are, in some cases, significant by-products of base editing34, as was also observed with BE3 (ref.33). Improvements to CBEs that minimize by-product formation and increase editing efficiency are discussed below.
Development of adenine base editors
The distribution of pathogenic point mutations in living systems is not uniform across the six possible ways to exchange one base pair for another (Fig. 1b). This uneven distribution is consistent with the relatively high rate of spontaneous cytosine deamination (estimated to be 100–500 deamination events per cell per day), which, if uncorrected, can mutate a G•C base pair to an A•T base pair55,56. A molecular machine capable of reversing such mutations by converting an A•T base pair into a G•C base pair is therefore of particular interest because it would enable correction of the most common type of pathogenic SNPs in the ClinVar database, representing ~47% of disease-associated point mutations (Fig. 1b). Like cytosine, adenine contains an exocyclic amine that can be deaminated to alter its base pairing preferences. Deamination of adenosine yields inosine (Fig. 3a). Although inosine in the third position of a tRNA anticodon is well known to pair with A, U or C in mRNA during translation, in the context of a polymerase active site, inosine exhibits the base pairing preference of guanosine57.
The major hurdle to the development of an ABE was the lack of any known adenosine deaminase enzymes capable of acting on ssDNA. Attempts to force RNA adenosine deaminases to act on DNA by installing them in place of APOBEC1 in BE3 resulted in no detectable adenine base editing35. To overcome this problem, Liu and co-workers evolved a deoxyadenosine deaminase enzyme that accepts ssDNA starting from an Escherichia coli tRNA adenosine deaminase enzyme, TadA35. E. coli cells were equipped with TadA mutants and defective antibiotic resistance genes. For a cell to grow in the presence of antibiotic, its mutant TadA–dCas9 fusion (TadA*–dCas9) must convert a deoxyadenosine to a deoxyinosine in the defective antibiotic resistance gene. Bacteria encoding TadA*–dCas9 fusions capable of repairing the mutated resistance gene were isolated and then tested in a mammalian cell context.
Although the TadA*–dCas9 fusions that emerged during this evolution and engineering process were capable of efficient A-to-I conversion in E. coli, simple TadA*–Cas9 nickase fusions resulted in only modest editing rates in mammalian cells. In its native (E. coli) context, TadA acts as a homodimer, with one monomer catalysing deamination and the other monomer contributing to tRNA substrate binding58. In the E. coli selection, endogenous wild-type TadA could form dimers with the mutated TadA*–dCas9 construct in trans; however, the absence of TadA in mammalian cells precludes TadA–TadA* heterodimerization. This challenge was addressed by engineering heterodimeric proteins that incorporate a wild-type non-catalytic TadA monomer, an evolved TadA* monomer and a Cas9 nickase (TadA–TadA*–Cas9 nickase) in a single polypeptide chain (Fig. 3b). The single-chain heterodimeric construct greatly improved adenine base-editing efficiency in mammalian cells when compared with the corresponding homodimeric TadA*–TadA*–Cas9 nickase editor, suggesting that the mutations required to support deoxyadenosine deamination are incompatible with the structural role played by the amino-terminal TadA monomer35.
As with the CBEs, ABEs catalyse deamination within a small window of exposed ssDNA generated by Cas9–guide RNA binding to the target locus. ABE7.10, containing 14 amino acid substitutions in the catalytic TadA* domain, is the most efficient and sequence context-independent ABE reported to date and performs A•T to G•C conversion within an editing window of protospacer positions ~4–7, counting the PAM as positions 21–23. Different ABE evolutionary relatives, such as ABE7.9 or ABE6.3, can offer higher editing efficiencies at positions closer to the PAM (such as position 8 or 9) (Table 1). Together, ABEs represent powerful new tools that enable precise conversion of a target A•T base pair to G•C in the genomic DNA of living cells35.
Base editing of RNA
Editing individual bases in RNA can also provide powerful capabilities for the life sciences and, potentially, for medicine. Owing to its single-stranded nature, 12 possible base editors that operate on RNA, rather than 6 possible base editors that operate on dsDNA, are needed to cover all possible changes. To date, the only reported programmable oligonucleotide-directed transformation that changes Watson–Crick base pairing in RNA is deamination of A to I59.
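The counts of 12 and 6 follow from simple enumeration, which the short Python sketch below reproduces. It is illustrative only, and the variable names are hypothetical: in single-stranded RNA every ordered pair of distinct bases is a separate conversion, whereas in dsDNA a change written on one strand and its complement written on the other strand describe the same base-pair change.

```python
from itertools import permutations

RNA_BASES = "ACGU"
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Single-stranded RNA: each ordered pair of distinct bases is its own change.
rna_changes = list(permutations(RNA_BASES, 2))

# dsDNA: X->Y on one strand is the same base-pair change as
# complement(X)->complement(Y) on the other strand, so group the two together.
dna_changes = {
    frozenset({(x, y), (DNA_COMPLEMENT[x], DNA_COMPLEMENT[y])})
    for x, y in permutations("ACGT", 2)
}

print(len(rna_changes), len(dna_changes))  # 12 6
```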
Antisense oligonucleotide-directed A-to-I RNA editing
All RNA base editors characterized in mammalian cells thus far use adenosine deaminases from the ADAR family that natively catalyse hydrolytic adenosine deamination, converting an adenosine to an inosine60,61. Unlike most other RNA-editing enzymes62, ADARs are not natively RNA guided. Instead, they contain a distinct RNA-binding domain that recognizes and localizes the enzyme to certain regions of double-stranded RNA63,64.
Pioneering efforts by Stafforst, Rosenthal, Nakagawa and their respective co-workers to generate a targetable adenine RNA editor tethered the catalytic domain of an ADAR enzyme to a guiding antisense RNA oligonucleotide40,41,42,43,44,45,46,47,48. These RNA editors rely on Watson–Crick base pairing between an antisense RNA and the target transcript to localize an ADAR deaminase domain (ADARDD) to the target RNA. At least three strategies have been developed to establish a physical linkage between the deaminase and the antisense RNA. First, fusing a SNAP tag to the ADAR and generating an antisense benzylguanine-modified RNA (BG-RNA)65 enabled editing in vitro48,66. Delivery of the modified antisense RNA combined with overexpression of a SNAP–ADAR fusion in cells resulted in a covalent linkage between the SNAP-tagged ADAR and the antisense RNA44,45,46,47,48,49 (Fig. 3c). Second, appending the RNA-binding λ-phage N protein to the ADAR deaminase domain and fusing the antisense RNA with a 17-nucleotide ‘BoxB’ hairpin that is bound by the N protein also enabled the association of the antisense RNA and the ADAR, allowing both the guiding RNA and the deaminase construct to be genetically encoded40,41 (Fig. 3d). Third, Stafforst, Fukuda and their respective co-workers showed that fusing the antisense RNA to the natural substrate for ADAR2 (also known as ADARB1) can localize ADAR2 to the antisense RNA for editing in cells42,43 (Fig. 3e).
Two key innovations improved the efficiency and specificity of these RNA-guided deamination systems. Stafforst and Schneider exploited the natural sequence preference of human ADAR1 (also known as DRADA) and ADAR2, which preferentially deaminate an adenine that is mispaired with a cytosine in a double-stranded RNA substrate67,68. They designed a 17-nucleotide antisense RNA sequence that placed a C opposite the target A to generate an A•C mismatch upon binding to the target RNA48. This use of the A•C mismatch to direct ADAR activity improved editing in vitro at the on-target adenine and in many of the motifs they tested, with no detectable editing at nearby adenines in the same RNA48,66. Rosenthal and co-workers combined the A•C mismatch strategy40 with use of a hyperactive human ADAR2 mutant (E488Q) to further increase editing efficiency and demonstrated RNA editing in HEK293T cells40, which was improved in efficiency by using two BoxB recruitment domains41 (Fig. 3d). Despite these improvements, the use of antisense–deaminase conjugates remained challenging owing to high rates of off-target deamination and strong context-dependent editing of adenine bases located in sequence motifs preferred by ADARs40,41,42,44,45,46,47,48.
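The A•C mismatch design can be illustrated with a minimal Python sketch. The function name, the example target sequence and the choice to place the target adenosine at the centre of a 17-nucleotide segment are all assumptions made for illustration; the source states only that the antisense RNA places a C opposite the target A.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_with_ac_mismatch(target: str, a_index: int) -> str:
    """Return the antisense RNA (5'->3') for `target` (5'->3'), with a C
    placed opposite the adenosine at 0-based index `a_index`."""
    target = target.upper()
    if target[a_index] != "A":
        raise ValueError("a_index must point at an adenosine")
    paired = [RNA_COMPLEMENT[b] for b in target]
    paired[a_index] = "C"  # A•C mismatch in place of the usual A•U pair
    return "".join(reversed(paired))  # reverse so the result reads 5'->3'


# Hypothetical 17-nt target segment with the target A at index 8 (the centre).
target = "GCUUCAGUAGGCAUUCG"
guide = antisense_with_ac_mismatch(target, 8)
print(guide)  # CGAAUGCCCACUGAAGC
```

Every other position of the guide remains Watson–Crick complementary to the target, so the mismatch is the only feature that singles out the intended adenosine.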
In the most recently reported antisense-guided RNA-editing system, Stafforst and co-workers dramatically reduced the off-target deamination that usually accompanies efficient RNA editing. They reduced the exposure time between the transcriptome and the deaminase by integrating an inducible SNAP–ADAR fusion construct into HEK293 cells and delivering chemically modified antisense 22-nucleotide BG-RNAs by lipofection49 (Fig. 3c). The SNAP tag spontaneously becomes covalently bound to the BG-RNA. Editing efficiency was impressively high at six assayed endogenous target transcripts (15–90%) and could be multiplexed without efficiency loss. Significant improvements to the specificity of editing were also made by adding a 2ʹ-methoxy group to every nucleotide in the antisense RNA except the cytosine that specifies the target adenine through the previously described A•C mismatch and its two neighbouring bases. This innovation minimized proximal off-target editing other than that at adenine-rich triplet targets49. Distal off-target editing transcriptome-wide was significant when hyperactive ADAR variants were used but was reduced to negligible levels with wild-type ADARs, although on-target editing rates were also lower with wild-type ADARs49.
The most notable limitation of this method is its sequence context dependence; GAN (where N is any nucleotide) target sites are not efficiently edited with any assayed variant owing to the native preference of ADAR1 and ADAR2. Future work may incorporate ADAR mutants, such as E488Q, which show a reduced sequence preference69, into this system to overcome the targeting sequence limitation. For tolerated sequence motifs, this approach represents a substantial improvement to efficiency and specificity of RNA editing when genomic integration of the RNA editor construct and delivery of a chemically modified antisense RNA can be performed49.
Cas13-directed A-to-I RNA base editing
Zhang and co-workers developed a different approach to RNA-guided RNA base editing that uses a catalytically dead RNA-guided Cas13b enzyme (dPspCas13b) to localize an ADAR to the target RNA39. dPspCas13b is fused to ADARDD to generate an RNA-guided editor (Fig. 3f; Table 1). This approach is termed RNA Editing for Programmable A-to-I Replacement (REPAIR)39. REPAIR incorporates two aspects of ADAR-mediated RNA editing described above: use of the hyperactive ADAR2DD-E488Q mutant and specifying the target adenine with an A•C mismatch39 (Fig. 3c–e). Notably, REPAIR may offer broad sequence context compatibility; when tested at all 16 possible NAN motifs in a luciferase reporter transcript, REPAIRv1 could edit all 16 codons, apparently overcoming the native ADAR preference through binding to the target site with high effective molarity39.
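The 16 NAN sequence contexts can be listed explicitly. The Python sketch below is illustrative, with hypothetical variable names; it enumerates every combination of 5ʹ and 3ʹ neighbour around a target adenosine and separates out the four GAN contexts, which are described earlier in this Review as poorly edited by native ADAR1 and ADAR2.

```python
from itertools import product

BASES = "ACGU"

# All 16 NAN contexts: the target A flanked by any 5' and any 3' neighbour.
nan_motifs = [f"{five}A{three}" for five, three in product(BASES, repeat=2)]

# The GAN subset, in which the 5' neighbour of the target A is a G.
gan_motifs = [motif for motif in nan_motifs if motif.startswith("G")]

print(len(nan_motifs), gan_motifs)  # 16 ['GAA', 'GAC', 'GAG', 'GAU']
```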
Zhang and co-workers demonstrated that REPAIRv1 offers higher editing efficiency (89%) than two antisense-mediated strategies (BoxB-ADAR2 (50%)40 and full-length ADAR2 (35%)42) when targeted to a Cluc reporter transcript. However, in two endogenous transcripts tested with REPAIRv1, editing efficiency was reduced to 15–40%39. Transcriptome-wide RNA sequencing (RNA-seq) revealed that REPAIRv1 displays off-target editing that is comparable to that of the BoxB-ADAR strategy and significantly greater than that resulting from overexpression of full-length ADAR2 (ref.39). Proximal off-target RNA base editing was also observed with REPAIRv1: adenine bases 50 bp upstream or downstream of the target adenine were edited at a frequency of ~10–20%39. Off-target RNA editing was attributed to overexpression of the hyperactive ADAR deaminase.
To improve the specificity of REPAIRv1, Zhang and co-workers introduced mutations into ADAR2DD-E488Q designed to reduce the binding affinity between ADARDD and non-target cellular RNA39. Use of the ADAR2DD-E488Q–T375G double mutant in the REPAIRv1 architecture resulted in REPAIRv2. In transcriptome-wide sequencing assays using a guide programmed to edit Cluc, REPAIRv2 yielded only 20 detectable off-target editing events, a 900-fold improvement relative to REPAIRv1. Although still detected, REPAIRv2 also dramatically reduced proximal off-target editing in the 100-nucleotide region upstream or downstream of the target adenine. As expected owing to its higher specificity, on-target editing efficiencies of REPAIRv2 were reduced relative to REPAIRv1 (from 89% to ~45% in the Cluc reporter), and it is possible that the sequence-targeting scope of REPAIRv2 is reduced compared with that of REPAIRv1. Nevertheless, its high specificity makes REPAIRv2 a promising tool for A-to-I RNA base editing in the mammalian transcriptome39.
Cellular decoding of inosine in mRNA
In DNA base editing of deoxyadenosine, the resulting deoxyinosine is decoded by a DNA or RNA polymerase either during DNA replication or during transcription. Inosine in RNA is functionally decoded by different machinery, such as the ribosome (when in protein-coding regions) or the spliceosome (when in splice sites). Whereas there is strong evidence that deoxyinosine in DNA is read as a G in the active site of a polymerase in human cells57, an inosine in the wobble position of a tRNA pairs with A, C or U in mRNA, enabling a single tRNA to decode multiple cognate codons70. Indeed, in microRNAs (miRNAs), the reduced binding strength between the I•C base pair compared with the G•C base pair is thought to be biologically relevant for directing mRNA decay71.
The ability of inosine to form base pairs with multiple bases raises concern that an inosine in an mRNA might be decoded as a mixture of bases in the context of a ribosome or spliceosome active site. Known examples of natural A-to-I editing in the coding regions of mRNA suggest that editing to an inosine at codon position 1 or 2 results predominantly in the inosine being read as a guanine, both in cells72 and in vitro73. For applications involving RNA editing to modulate splicing, observations are also consistent with the spliceosome reading an inosine as a guanine, as A-to-I editing can directly generate or destroy splice sites as if the I were a G74,75.
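The consequence of reading inosine as guanosine during translation can be sketched in a few lines of Python. This is a simplified illustration under the stated assumption that every inosine is decoded as G; the function name and the nine-nucleotide example transcript are hypothetical, and the codon table is the standard genetic code.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional UCAG-ordered table.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_with_inosine(mrna: str) -> str:
    """Translate an mRNA, reading each inosine (I) as guanosine (G)."""
    decoded = mrna.upper().replace("I", "G")
    codons = [decoded[i:i + 3] for i in range(0, len(decoded) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate_with_inosine("AUGCAGUAG"))  # unedited: MQ*
print(translate_with_inosine("AUGCIGUAG"))  # A-to-I at codon position 2: MR*
```

In this example, editing the central adenosine of a CAG codon yields CIG, which is read as CGG and therefore encodes arginine rather than glutamine.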
Base editor limitations and improvements
Base-editing product purity
Initial reports of CBEs identified that at some genomic loci, unanticipated C-to-non-T edits are observed, reducing base-editing product purity33,34,76,77,78. Liu and co-workers investigated the determinants of base-editing product purity by performing cytosine base editing in cells lacking various genes including UNG. In UNG–/– cells, product purity improved from an average of 68% to >98% across 12 target cytosines, indicating that UNG is required for by-product formation36. This insight was used to improve base-editing outcomes. Fusing a second UGI domain onto the carboxy terminus of BE3 improved the editing purity in UNG-containing cell lines, probably owing to increased inhibition of UNG. In addition, installation of a more flexible set of linkers improved efficiency of editing to generate a fourth-generation editor, BE4 (ref.36) (Fig. 2c; Table 1). Overexpression of UGI in trans with a BE3 also improves product purity and reduces indel formation in mammalian cells79, but this may be accompanied by a global increase in C-to-T mutation rates80,81.
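Product purity can be computed from sequencing read counts at a target cytosine. The Python sketch below assumes that product purity is the fraction of edited reads (those no longer showing C at the target position) that carry the desired T; this definition, the function name and the read counts are assumptions made for illustration and are not taken from the data in the cited studies.

```python
def product_purity(read_counts: dict) -> float:
    """Fraction of edited reads at a target C that carry the desired T.

    `read_counts` maps the base observed at the target position to a read
    count. Reads still showing C are unedited and are excluded.
    """
    edited = {base: n for base, n in read_counts.items() if base != "C"}
    total_edited = sum(edited.values())
    if total_edited == 0:
        raise ValueError("no edited reads at this position")
    return edited.get("T", 0) / total_edited


# Hypothetical read counts, chosen only to illustrate the arithmetic.
counts = {"C": 6000, "T": 2720, "G": 800, "A": 480}
print(round(product_purity(counts), 2))  # 2720 / 4000 = 0.68
```

Under this definition, product purity is independent of overall editing efficiency, which here would be 4,000 edited reads out of 10,000.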
In some cases, the ability of a CBE with no fused UGI to mutate a target C to a mixture of T, A and G bases provides a useful system for targeted random mutagenesis. Bassik, Chang and their respective co-workers developed two such systems that exploit C-to-non-T editing abilities of base editors for targeted mutagenesis in mammalian cells. These approaches, targeted activation-induced deaminase (AID)-mediated mutagenesis78 and CRISPR-X76, have been reviewed extensively82.
Adenine base editing by an ABE typically exhibits very high product purity; indeed, there are no reports of significant A-to-non-G edits to date35,83,84,85,86, perhaps because of the much weaker ability of cells to remove inosine from DNA than uracil. Consistent with this potential explanation, the use of ABEs in cells deficient in alkyl adenine DNA glycosylase (AAG; also known as MPG), an enzyme known to recognize and remove inosine in DNA87, did not improve editing efficiency35.
Generation of indels
DNA base editing can yield a low but detectable rate of indel formation. Liu and co-workers noted that as well as improved product purity profiles, UNG-knockout cells displayed reduced indel formation36. This observation is consistent with a model in which UNG-mediated creation of an abasic site following C-to-U deamination can lead to nicking of the deaminated strand of DNA by DNA (apurinic or apyrimidinic site) lyase (AP lyase)88 (Fig. 2c). If the opposite strand has been nicked by the Cas9 nickase component of the base editor, the resulting proximity of the two nicks results in a DSB, which is likely to be resolved by indel-prone end-joining processes (Fig. 2c). Liu and co-workers showed that indel formation can be substantially reduced by fusing the bacteriophage Mu-derived Gam protein (Mu-GAM) to BE4 to generate BE4-Gam, which further reduces indels in treated HEK293T cells relative to BE4 (ref.36). BE4-Gam treatment also resulted in increased product purity and reduced indel frequency compared with BE3 in rabbit embryos89.
Adenine base editing typically leads to very low (in some cases undetectable) indel frequencies, generally well below 1%, for treated cells in culture35,83,86,90,91, mice83 and plants92. The lower frequency of ABE-mediated indels is consistent with the requirement of a glycosylase or other enzyme involved in DNA repair to remove inosine and induce a nick in the edited strand to form an indel35. Because the removal of inosine is thought to be substantially less efficient than removal of uracil from DNA87, fewer nicks in the deaminated strand, fewer resulting DSBs and fewer indels would be expected to follow adenine base editing than cytosine base editing.
Off-target editing with DNA base editors
As with all genome-editing technologies, both cytosine and adenine DNA base editors have the potential to operate on DNA at off-target genomic loci33,34,35,93. Off-target base editing can be classified into ‘proximal off-target editing’, which is editing that takes place near (for example, within 200 bp of) the target locus but outside the activity window, and ‘distal off-target editing’, which is editing that takes place away from the target locus. While the off-target effects of DNA base editors continue to be investigated, early evidence suggests that distal off-target base editing generally occurs only at a subset of loci that experience off-target editing from Cas9 nuclease94. In contrast to RNA editors (see above), current data33,35 suggest that DNA base editors typically do not induce measurable proximal off-target edits, although an in-depth study of proximal off-target base editing has not yet been reported.
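The distinction between proximal and distal off-target editing can be expressed as a simple positional rule. The Python sketch below is one possible reading of the definitions above, not an established classifier: the function name, the coordinate convention and the choice to measure the 200-bp example distance from the edges of the activity window are all assumptions.

```python
def classify_edit(chrom, pos, target_chrom, window_start, window_end,
                  proximal_bp=200):
    """Classify an observed base edit relative to the target activity window.

    Coordinates are 1-based positions on a chromosome. `proximal_bp` is the
    example distance from the text, measured here from the window edges.
    """
    if chrom != target_chrom:
        return "distal off-target"
    if window_start <= pos <= window_end:
        return "within activity window"
    if window_start - proximal_bp <= pos <= window_end + proximal_bp:
        return "proximal off-target"
    return "distal off-target"


# Hypothetical activity window at chr1:1004-1008.
print(classify_edit("chr1", 1005, "chr1", 1004, 1008))  # within activity window
print(classify_edit("chr1", 1100, "chr1", 1004, 1008))  # proximal off-target
print(classify_edit("chr2", 1005, "chr1", 1004, 1008))  # distal off-target
```

Edits inside the window are reported separately because they are on-target or bystander edits rather than off-target events.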
As the Cas9 component mediates the DNA-targeting ability of base editors, off-target base edits have been interrogated through deep sequencing of genomic loci known to be edited by the Cas9 nuclease33,34,35,86,90,93. As expected, off-target loci that contain a C positioned in the activity window of the editor are sometimes edited at a low but detectable frequency by CBEs. As not all the Cas nuclease off-targets contain an editable cytosine, off-target profiles of CBEs are generally more favourable than those of the corresponding nucleases programmed with the same guide RNAs33,34,35,86,90,93. To improve the DNA specificity of cytosine base editing, high-fidelity versions of BE3 have been generated by incorporating mutations known to improve the editing fidelity of the Cas9 nuclease into the Cas9 portion of BE3. Liu and co-workers used the mutations discovered by Joung and co-workers95 to improve the DNA specificity of the Cas9 nuclease, resulting in high-fidelity BE3 (HF-BE3)95. HF-BE3 shows a substantial reduction in off-target editing, even when paired with highly promiscuous guide RNAs93 (Table 1). Kim and co-workers have generated an alternative high-fidelity base editor, called Sniper-BE3, using the same strategy with a different set of mutations96.
Kim and co-workers developed an unbiased in vitro screen for identifying off-target edits by CBEs using purified genomic DNA and BE3ΔUGI (BE3b36) ribonucleoproteins (RNPs), finding that the off-target loci deaminated by rat APOBEC1 (rAPOBEC1)–Cas9 nickase are indeed predominantly, but not entirely, a subset of the loci edited by the Cas9 nuclease94. Although off-target adenine base editing has not been broadly interrogated, examination of off-target ABE activity at known off-targets of the Cas9 nuclease when programmed with the same guide RNAs suggests that ABEs exhibit substantially lower off-target activity than Cas9 nucleases and even less than was observed from BE3 (refs33,35). Further studies to investigate off-target ABE activity in cells and in vivo are needed to fully characterize and explain the apparently higher DNA specificity of the ABEs than the CBEs.
In addition to the off-target editing that could be directed by the DNA-binding protein component of base editors, deamination of ssDNA not targeted by Cas9 (such as within a transient bubble of ssDNA during transcription), or in RNA, may occur from DNA base editors. Misregulation or overexpression of endogenous deaminases has been linked to elevated mutation rates97,98,99, and expression of the UGI component of CBEs could also lead to an elevated rate of C-to-T transitions in the genome through impeding repair of spontaneously generated uracils80,81. However, studies of CBE off-target editing to date do not report widespread C-to-T mutations upon CBE expression or treatment, and transient delivery methods such as RNP delivery are likely to further reduce the mutagenicity of UGI in the context of CBEs33,34,77,86,90,93,94.
Whole-genome sequencing (WGS), when performed on the genomic DNA from sufficient numbers of independent cells, has the potential to detect all types of off-target base editing in cells or whole higher organisms. However, the WGS experiments reported to date on base-edited animals have not been performed with sufficient power or controls to identify such events across an entire mammalian genome. Kim, Huang and their respective co-workers performed WGS on mutant mice generated through treatment with ABE7.10 and a guide RNA targeted to the Tyr locus83 or targeted to the Hoxd13 locus in a one-cell-stage embryo100. Computational analysis indicated that none of the SNPs identified in the treated mice were likely to have arisen through off-target base editing. Together, these studies further suggest high DNA specificity of ABE7.10.
We note that these studies do not exclude the possibility of deamination from base editors that is not directed by the DNA-binding or RNA-binding component of the editors but instead by random encounters between the deaminase domain of base editors and transient ssDNA. More data are required to characterize this possibility, including WGS of treated and untreated littermate controls and of mice treated with base editor mutants with catalytically inactivated deaminases. The continued development of context-dependent base editors101 or future base editor variants that lack the ability to bind ssDNA without assistance from the guide RNA represents a potential solution to further minimize the possibility of random non-directed off-target base editing.
Editing window and bystander edits
In the case of BE3, which incorporates S. pyogenes Cas9 (SpCas9) as the DNA-targeting moiety, the ‘activity window’ in which efficient editing is observed is approximately five nucleotides wide (positions 4–8, counting the PAM as positions 21–23)33,36,93 (Fig. 4a). Bases located outside the activity window but within the ssDNA R loop region may still be edited at a lower efficiency, particularly if they are located in a favourable editing motif (see below). For many genome-editing applications, only a single nucleotide is targeted for conversion, so an ideal base editor would have a narrow activity window that focuses activity only on the target base. However, such a narrow window necessitates that the base editor be targetable to a broad range of PAM sequences. As the repertoire of natural Cas nucleases with different PAM requirements and function in human cells (including SaCas9 (ref.102), LbCpf1 and AsCpf1 (ref.103), CjCas9 (ref.104), StCas9 (ref.105) and NmCas9 (ref.106)), engineered CRISPR proteins107,108 and laboratory-evolved CRISPR proteins86 continues to expand, the desirability of more precise base editor variants with narrower activity windows will increase.
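The position numbering used above can be made concrete with a minimal sketch. It assumes a 20-nucleotide protospacer written 5ʹ to 3ʹ (so that the PAM would occupy positions 21–23) and simply lists which target bases fall inside a given activity window. The function name `editable_positions` and the example sequence are hypothetical and are not taken from any published tool; real editing efficiency also depends on sequence context and the editor variant.

```python
def editable_positions(protospacer, target_base="C", window=(4, 8)):
    """Return 1-based protospacer positions of target_base inside the window.

    Position 1 is the PAM-distal end of a 20-nt protospacer, so the PAM
    would occupy positions 21-23 (the numbering used in the text).
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    low, high = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == target_base and low <= pos <= high
    ]


# Hypothetical protospacer, for illustration only.
example = "GACTCACGTAGCTAGCTAGC"
cbe_sites = editable_positions(example, "C", window=(4, 8))  # BE3-like window
abe_sites = editable_positions(example, "A", window=(4, 7))  # ABE7.10-like window
```

For this example sequence, the C at position 3 lies outside the BE3-like window and is therefore not reported, although in practice such bases may still be edited at lower efficiency.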
For some target sites, multiple editable Cs or As exist within or near the activity window, which can result in conversion of bases in addition to the target base. We use the term ‘bystander editing’ to describe editing in the protospacer at a nucleotide other than the target nucleotide (Fig. 4b). Bystander editing may be inconsequential, especially when base editing to disrupt promoters, splice sites or other regulatory sequences or when knocking out gene function by introducing premature stop codons. When editing protein-coding genes, within a canonical five-base editing window, most, but not all, base-editing cases will result in only the desired single amino acid change, in part because the genetic code dictates that almost all third-position transitions in a codon are silent (see Box 1 for a detailed analysis).
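The statement that almost all third-position transitions are silent can be checked directly against the standard genetic code. The sketch below is an illustration only, not the analysis in Box 1: it builds the standard codon table, applies a transition (C↔T or A↔G) at the third position of every codon and separates the changes that preserve the encoded amino acid from those that do not. The helper name `third_position_transitions` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
TRANSITION = {"C": "T", "T": "C", "A": "G", "G": "A"}


def third_position_transitions():
    """Split all third-position transitions into silent and non-silent."""
    silent, nonsilent = [], []
    for codon, aa in CODON_TABLE.items():
        edited = codon[:2] + TRANSITION[codon[2]]
        bucket = silent if CODON_TABLE[edited] == aa else nonsilent
        bucket.append((codon, edited))
    return silent, nonsilent


silent, nonsilent = third_position_transitions()
```

Under the standard code, the only non-silent cases are the ATA/ATG (Ile/Met) and TGA/TGG (stop/Trp) pairs; every other third-position transition leaves the amino acid unchanged.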
To minimize bystander editing, researchers have developed base editor variants with altered activity windows. Liu and co-workers engineered CBEs with mutations in the rAPOBEC1 domain that attenuate deamination activity, resulting in editors with reduced processivity and narrower activity windows (YE1-BE3, YE2-BE3 and YEE-BE3)109 (Table 1). These narrow-window CBEs enable selective editing of a target C over a neighbouring C that is located within the standard editing window of BE3. For ABE7.10, which is generally the most efficient and widely used ABE, the activity window spans approximately position 4 to position 7 in the protospacer (counting the PAM as positions 21–23). For certain targets, ABE7.9 or ABE6.3 may be more useful owing to a slightly broader activity window enabling editing from position 4 to position 9 (ref.35). Recent work by Kim and co-workers described how pairing a 5ʹ-extended guide RNA with ABE7.10 can increase editing at positions 2–3, although editing at these positions remains modest83 (Table 1). The use of base editor variants that exhibit strong sequence context preference serves as a promising additional strategy for minimizing bystander base editing. These variants are discussed below (see the Base-editing sequence context section).
Conversely, Huang and co-workers expanded the width of the editing window by engineering ‘BE-PLUS’, a CBE variant in which a SunTag110 was fused to the amino terminus of the Cas9-D10A nickase. Separately expressing a single-chain variable fragment (scFv)–APOBEC–UGI fusion allows up to ~10 UGI domains to associate with each SunTag111. This construct enabled editing from protospacer position 4 to position 16, with reduced indel and C-to-non-T editing compared with BE3, probably owing to the recruitment of many UGI domains111 (Table 1). Although base editors with enlarged editing windows are more prone to bystander editing, they also facilitate access to the target base pair and may be especially useful when targeting non-protein-coding sites.
Successful DNA target binding by CRISPR family nucleases requires a PAM, which is a conserved sequence upstream or downstream of the variable guide RNA protospacer sequence2,16 (Fig. 4a). For base editing, the PAM must be appropriately positioned relative to the target base to ensure efficient editing. Even though SpCas9 offers the least restrictive PAM among those CRISPR enzymes reported to function with high activity in mammalian cells, owing to this requirement, only ~26% of known pathogenic SNPs that are of the four types of base conversions (C to T, G to A, A to G or T to C) that can be performed can be targeted by SpCas9-derived base editors86 (Fig. 4c). This limitation creates the need to develop base editors with additional PAM compatibilities.
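The joint constraint of PAM position and activity window can be sketched as follows. For a target base at a given index in a DNA sequence, the code tries each window position in turn, derives where a protospacer would have to start and checks whether a compatible PAM sits immediately 3ʹ of it. This is a simplified illustration that considers only the strand supplied and a 3ʹ PAM; the function name `guides_for_target`, the regular expressions used to represent PAMs and the example sequence are assumptions made for this sketch.

```python
import re


def guides_for_target(seq, target_index, window=(4, 8),
                      pam_regex="[ACGT]GG", protospacer_len=20):
    """List protospacers that place seq[target_index] inside the window.

    Returns (protospacer_start, position_in_protospacer, pam) tuples, with
    0-based start indices and 1-based protospacer positions.
    """
    seq = seq.upper()
    hits = []
    for pos in range(window[0], window[1] + 1):
        start = target_index - (pos - 1)      # protospacer start for this pos
        pam_start = start + protospacer_len   # PAM lies directly 3' of it
        if start < 0 or pam_start + 3 > len(seq):
            continue
        pam = seq[pam_start:pam_start + 3]
        if re.fullmatch(pam_regex, pam):
            hits.append((start, pos, pam))
    return hits


# Hypothetical sequence with a single C at index 4.
example = "AAAAC" + "A" * 15 + "TGG" + "AAAA"
ngg_hits = guides_for_target(example, 4)                            # NGG PAM
ng_hits = guides_for_target(example, 4, pam_regex="[ACGT]G[ACGT]")  # relaxed NG PAM
```

Relaxing the PAM pattern from NGG to NG yields an additional candidate protospacer for the same target base in this example, which is the rationale for developing editors with broader PAM compatibility.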
To increase the number of targetable bases, researchers have developed base editors incorporating different CRISPR-associated nuclease enzymes (Table 1). Liu and co-workers described a set of alternative CBEs with Staphylococcus aureus Cas9 (SaCas9) and engineered variants of SpCas9 and SaCas9 capable of efficient editing with non-NGG PAMs109, including SaBE3, Sa(KKH)-BE3, VQR-BE3, VRER-BE3 and EQR-BE3. Chen and co-workers described a CBE derived from Cas12a (also known as Cpf1) (PAM = TTTV, where V is A, C or G), which allows access to T-rich regions of genomic DNA38. Because there is no known mutation capable of transforming Cas12 into a nickase that cleaves only the non-deaminated strand of DNA, Chen and co-workers characterized a dead LbCas12a base editor, which nevertheless displays editing efficiencies averaging 22% across ten target sites in HEK293T cells38 (Table 1).
Recently, Liu and co-workers used phage-assisted continuous evolution (PACE) to evolve SpCas9 to recognize a broader range of PAMs. A resulting evolved variant, xCas9(3.7), harbours mutations that allow it to access some target sequences with NG, GAA or GAT PAMs. Replacing Cas9 in the BE3 construct with xCas9(3.7) made xBE3, a CBE capable of editing some loci with NG, GAA and GAT PAMs86 (Table 1). Although xCas9 variants are capable of mediating DNA cleavage or base editing at several non-NGG PAMs, xCas9-mediated editing efficiency varies among different target sites, and like many engineered or evolved Cas9 variants, it is likely to require a high degree of complementarity between the guide RNA and the target sequence, including a G at the 5ʹ end of the guide RNA and at the corresponding first position of the protospacer101,112. Surprisingly, in addition to its expanded PAM acceptance, xCas9 also displays higher editing fidelity than SpCas9 (ref.86).
Nureki and co-workers used a rational design approach to develop another SpCas9 variant with broadened PAM compatibility, termed NG-Cas9 (ref.113). In mammalian cells, the relative activities of xCas9 and NG-Cas9 appear to be guide-RNA-dependent; Nureki and co-workers reported that NG-Cas9 is more active than xCas9 at 15 out of 15 NGC, 16 out of 18 NGT and 15 out of 19 NGA PAM sites, but NG-Cas9 exhibits a loss of efficiency at the canonical NGG PAM sites that is not observed with xCas9 (ref.113). NG-Cas9 also does not exhibit the increased fidelity observed with xCas9 but tolerates inclusion of fidelity-increasing mutations113. As a CBE, NG-Cas9 accepted a subset of NG PAM loci as substrates for efficient base editing113 (Table 1).
Alternative-PAM ABEs have been developed by adapting SaCas9 (ref.84), Sa(KKH)Cas9 (refs114,115), Sp(VQR)Cas9 (refs114,115) and Sp(VRER)Cas9 (ref.114) into the ABE7.10 architecture, resulting in efficient generation of mutant rice plants (Table 1). Additional ABE variants with altered PAM requirements would substantially augment the scope of targetable bases for adenine base editing.
Base-editing sequence context
In addition to PAM-imposed and activity window-imposed sequence restrictions, the particular deaminase enzyme variant used in a base editor may impose sequence context preferences that affect editing efficiency at a particular locus. For example, rAPOBEC1 exhibits poor processing of cytosines within some (but not all) GC motifs33,36 (Fig. 4d). By contrast, other cytidine deaminases such as AID or CDA1 do not display this particular sequence preference but exhibit lower editing efficiencies than rAPOBEC1 in most tested sequence contexts when tested in a BE3 architecture36. Yang and co-workers identified that rAPOBEC1-mediated base-editing rates are reduced by DNA methylation at CpG dinucleotides116 and that human APOBEC3A (hA3A) can edit cytosines found in CpG dinucleotides and in GC motifs more efficiently than rAPOBEC1 (ref.116).
Joung and co-workers harnessed the sequence preferences of different cytosine deaminase enzymes to engineer a mutant hA3A-based CBE that preferentially deaminates cytosines preceded by a T101 (Table 1) as a strategy to reduce bystander editing. Structure-guided design and screening of hA3A deaminase mutants resulted in an enhanced variant (eA3A) with a single mutation (N57G) that deaminates the target motif (TC) but significantly reduces activity at Cs in other sequence contexts, resulting in a context-dependent base editor that maintains a five-nucleotide activity window101. Importantly, Joung and co-workers performed a detailed analysis of the individual alleles that were generated upon successful base editing by eA3A-BE3, BE3 and other engineered variants (YEE-BE3, YE1-BE3 and YE2-BE3) to demonstrate that eA3A can make the desired allele at a high efficiency and purity101. A high-throughput sequencing data analysis package facilitated this detailed analysis of base-editing outcomes117,118.
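The logic of a context-dependent editor can be illustrated with a short sketch that sorts the cytosines in an activity window by their 5ʹ neighbour. It assumes a simple binary rule (a C preceded by T is ‘preferred’, any other C is ‘other’), which is a deliberate simplification of the graded preferences that real deaminases show. The function name `classify_cytosines` and the example protospacer are hypothetical.

```python
def classify_cytosines(protospacer, window=(4, 8), preferred_5prime="T"):
    """Group window cytosines by whether the preceding base is preferred.

    Positions are 1-based within the protospacer.
    """
    seq = protospacer.upper()
    groups = {"preferred": [], "other": []}
    for pos in range(window[0], window[1] + 1):
        if seq[pos - 1] != "C":
            continue
        # A cytosine at position 1 has no 5' neighbour in the protospacer.
        previous = seq[pos - 2] if pos >= 2 else ""
        key = "preferred" if previous == preferred_5prime else "other"
        groups[key].append(pos)
    return groups


# Hypothetical protospacer: C5 follows a T, C7 follows an A.
groups = classify_cytosines("GACTCACGTAGCTAGCTAGC")
```

In this example, a TC-preferring editor would be expected to favour position 5 over the potential bystander at position 7, whereas an editor without strong context preference could edit both.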
Context-specific base editors such as those developed by Joung and co-workers represent an important advance that offers more precise base editing with the trade-off of lower target site applicability because the target nucleotide must naturally exist in the preferred sequence context. Thus far, the data from mammalian cell editing with ABE7.10 indicate that it is relatively free from motif-related sequence preferences in human cells35, but Kim and co-workers have demonstrated that there is a preference for editing at TA motifs relative to GA, CA or AA motifs in Arabidopsis thaliana92. The development of additional context-specific ABE and CBE variants will be enabling for applications in which editing only a single base is paramount.
Improving intracellular expression and nuclear localization of base editors
For plasmid delivery of the Cas9 nuclease, optimization of codon use for mammalian cell expression improves soluble protein levels and increases editing efficiencies112. Optimization of the nuclear localization sequence (NLS) also improves Cas9-mediated editing in vivo119. Liu and co-workers identified that poor expression is also a bottleneck to the efficiency of base editors and optimized codon usage and NLSs to generate improved CBEs and ABEs, resulting in BE4max and ABEmax from BE4 and ABE7.10, respectively91. The use of ancestral sequence reconstruction starting from the protein sequences of the hundreds of known APOBEC homologues, a process that has been demonstrated to improve protein expression120, resulted in AncBE4max. All three optimized base editors offered substantially improved editing efficiency, especially under suboptimal conditions such as when delivery into cells is limiting91.
In an elegant independent study, Dow and co-workers optimized CBE codon usage by removing premature poly(A) sites and rare mammalian codons and improved CBE nuclear localization by adding a second NLS to the amino terminus of BE3 to generate an optimized FNLS-BE3 that results in much higher editing efficiencies than BE3. When packaged into lentivirus, FNLS-BE3 mediated efficient editing in murine intestinal organoids121. Hydrodynamic injection of the plasmid encoding FNLS-BE3, together with a guide RNA that programmes the base editor to make an S45F mutation in Ctnnb1, led to significantly more efficient base editing and corresponding physiological changes (tumour nodule formation) in the livers of mice than BE3 treatment121. Dow and co-workers also generated lentiviral constructs with the corresponding optimized editor versions of BE4-Gam, which enable improved editing rates with reduced indel formation121. Ensuring optimal expression of the base editor construct in the target cell type is critical for applications that require high editing efficiency, and the above developments thus represent important advances.
Delivery of base editors
DNA delivery strategies: plasmid transfection and viral delivery
Because most proteins cannot spontaneously traverse cell membranes, a delivery method is required to facilitate cell entry. A common strategy is to deliver DNA encoding the target protein through chemical transfection122, electroporation123 or viral infection124 and then rely on target cell transcription and translation to produce the desired protein.
For cell lines in culture (including HEK293T, HeLa, U2OS and murine NIH3T3 cells), lipid-mediated transfection of plasmids encoding base editors has resulted in high editing efficiencies without selection for transfected cells33,34,35,36,38,39,83,86,93. For cell types resistant to plasmid lipofection, electroporation followed by fluorescence-activated cell sorting (FACS) to isolate transfected cells has yielded favourable editing efficiencies for lymphoblastoid cell lines (LCLs)35 and mouse astrocytes33. Although plasmid-based delivery is a convenient strategy, DNA delivery raises the risk of exogenous DNA recombination into the genome, and protracted overexpression of genome-editing agents increases off-target editing rates77,93,125,126,127.
The use of viruses to deliver DNA encoding base editors is a promising delivery modality for some in vivo research or therapeutic applications. Use of non-integrating vectors such as adeno-associated virus (AAV), herpes simplex virus (HSV) or adenoviral vectors reduces the potential for random integration of exogenous DNA into the host genome. However, infection with adenovirus and HSV-1 may provoke inflammatory responses124. By contrast, AAV is thought to be both non-inflammatory and non-pathogenic128. When coupled with its broad tropism, well-studied serotypes and ability to infect both dividing and non-dividing cells, AAV is a particularly promising strategy for viral delivery of genome-editing agents.
AAV-mediated delivery of many CRISPR genome-editing agents, including base editors, is challenging owing to the 4.9 kbp packaging limit of AAV129. A CBE or ABE plus a guide RNA totals ~6 kbp. Kim and co-workers overcame this through the use of two trans-RNA splicing AAVs (tsAAVs)130 encoding each half of ABE7.10 (ref.83). Dual tsAAV-mediated delivery of ABE7.10 into skeletal muscle in a mouse model of Duchenne muscular dystrophy corrected a premature stop codon83. After dual infection, homologous recombination between the identical inverted terminal repeat (ITR) sequences generates the full-length ABE7.10 transcript83, enabling ABE7.10 protein production.
DNA-free base editing enables precise and specific changes to genomic DNA without exposing a cell to exogenous DNA90,93. Sustained overexpression of genome-editing agents erodes DNA specificity: after successful editing, the target site is no longer a binding site for the editing agent, and residual editor can only act to mediate off-target editing. Thus, controlling the exposure to editing agents, including base editors, can greatly improve their DNA specificity25,93,125,127.
Kim and co-workers established that purified Cas9 complexed with a guide RNA, forming an RNP complex, can be efficiently delivered into mammalian cells in culture by electroporation and that RNP delivery of Cas9 leads to improved DNA specificity relative to plasmid-based delivery127. Liu and co-workers demonstrated that cationic lipid-mediated delivery of Cas9 RNP complexes can facilitate in vivo delivery of Cas9 near the site of administration, as well as efficient delivery into cells in culture, and results in greatly improved DNA specificity relative to plasmid-based lipofection125,126,131.
BE3 protein has also been purified77,93, and Liu and co-workers have packaged BE3–guide RNA RNP complexes into cationic liposomes for lipid-mediated delivery to cultured cells, zebrafish embryos and the inner ear of postnatal mice90,93. Analogous to the delivery of Cas9, cationic lipid-mediated delivery of BE3 dramatically improves DNA specificity in human cells compared with plasmid delivery90,93. BE3 RNPs have also been delivered through electroporation into mice77 and through direct injection into Xenopus laevis embryos132. RNP delivery is also effective for alternative base editors; the engineered high-precision editor eA3A(N57Q) has been delivered as an RNP into human erythroid precursor cells via nucleofection of the RNP complex to correct a mutant haemoglobin beta (HBB) allele that causes beta-thalassemia, resulting in a fourfold increase in HBB expression101. The advantages of RNP delivery include improving editing specificity and removing the reliance on intracellular transcription and translation to generate the editing agent.
mRNA delivery of base editors
Delivery of mRNA is a commonly used strategy to deliver genome-editing agents into embryos. Kim, Huang, Lin, Liu, Zhang, Li and their respective co-workers have demonstrated that in vitro transcription followed by purification of an mRNA encoding BE3, when combined with a guide RNA, can be co-delivered into single-cell mouse77,133, human134,135, rabbit89, rat136 or zebrafish zygotes137,138 by electroporation or direct injection to generate point mutations with high efficiency and DNA specificity. These studies establish mRNA delivery of base editors into embryos as a robust and efficient strategy for the generation of animals with tailor-made point mutations.
Applications of base editing
Base editing to install or correct pathogenic point mutations
Because point mutations are the largest class of known pathogenic genetic variants (Fig. 1a) and CBEs and ABEs collectively have the potential to install or reverse up to ~60% of pathogenic point mutations28,29 (Fig. 1b), a major application of base editing is the study or treatment of disease-associated point mutations.
Examples of base-editor-induced gene correction in cultured cells are already numerous. Liu and co-workers showed that plasmid nucleofection of BE3 can convert the Alzheimer disease-associated allele APOE4 to APOE3r in mouse astrocytes and can correct the cancer-associated p53 mutation Y163C in breast cancer cells33. Subsequently, codon-optimized CBEs were delivered as plasmids in patient-derived fibroblasts to correct the L119P mutation in MPDU1 (ref.91) that causes the congenital disorder of glycosylation type 1f139. Liu and co-workers also showed that plasmid delivery of ABE7.10 can correct the hereditary haemochromatosis-causing mutation C282Y in an immortalized patient-derived LCL and can install a mutation known to increase fetal haemoglobin (HBG) expression in adults35. Joung, Huang and their respective co-workers reported correction of a mutant HBB allele in an engineered HEK293T cell line101 and in patient-derived primary fibroblasts135.
Direct injection of base-editor-encoding mRNA along with a guide RNA has also proved effective for editing pathogenic alleles in human embryos. Direct injection of mRNA encoding BE3 (refs134,135,140), YE1-BE3 (ref.140) or YEE-BE3 (ref.135) together with a guide RNA can generate homozygous mutants at a rate of up to 77% of embryos that survive to the blastomere stage134,135.
Viral delivery of base editors is an effective method for correcting pathogenic mutations in mouse disease models in vivo. Kim and co-workers used AAV to deliver ABE7.10 with a guide RNA programmed to correct a premature stop codon in the dystrophin gene (Dmd) in a mouse model of muscular dystrophy. Although the correction rate was only 3.3% of sequenced cells, dystrophin expression was restored in 17% of muscle fibres83, highlighting that low levels of editing can often lead to therapeutically relevant phenotypic change. Separately, Musunuru and co-workers generated an adenoviral vector encoding BE3 and a guide RNA programmed to make the W159X stop-codon mutation in murine Pcsk9. They measured a median rate of 25% editing in liver cells and observed a modest reduction in plasma PCSK9 protein levels and plasma cholesterol 4 weeks after injection141.
In vivo base editing has also been used to ascertain whether a genotype is causal for a particular phenotype. Dow and co-workers performed hydrodynamic transfection of an optimized BE3 plasmid construct, termed FNLS-BE3, with a guide RNA programmed to make the S45F cancer-associated mutation in Ctnnb1. They demonstrated efficient (nearly 100%) base editing in liver cells and showed that mice treated with FNLS-BE3 plus the on-target guide RNA grew a significant number of visible tumour nodules compared with controls121. Lin and co-workers delivered BE3 as an mRNA into one-cell-stage zebrafish embryos to generate a P302S mutation in tyr that mimics a common mutation observed in human ocular albinism. This approach enabled investigation into the effects of such a mutation on ocular pigmentation137. These studies hint at the promise of base editors as potential therapeutics and demonstrate their efficacy for researchers interested in ascertaining the phenotypic effects of precise genetic changes in cell culture and in vivo.
Base editing in postmitotic cells
Liu and co-workers demonstrated that base editing can occur in the non-mitotic sensory supporting and hair cells142 in vivo in the mouse inner ear90. BE3 combined with a guide RNA targeting β-catenin was used to control flux through the WNT signalling pathway90. Blocking phosphorylation at S33 through an S33F mutation extends the cellular half-life of β-catenin, increasing WNT signalling. For this target, maintaining low indel rates is critical, as indels are likely to disrupt the gene and reduce β-catenin levels, opposing the desired change90. Lipid-mediated delivery of BE3 complexed with the S33F guide RNA as an RNP into the inner ear of mice led to editing in postmitotic somatic cells at efficiencies up to 8%. Dissection and staining of treated hair cells identified that BE3 treatment, unlike treatment with the Cas9 nuclease and an HDR template, induced cellular reprogramming of other cells into cells resembling cochlear hair cells. These results establish the ability of base editing to occur in postmitotic cells that are resistant to DSB-stimulated HDR23,143.
Cytosine base editing to introduce premature stop codons
CBEs (but not ABEs) can install premature stop codons to disrupt genes in a homogeneous manner by precisely converting one of four codons into stop codons: CAA, CAG or CGA through C-to-T conversion on the coding strand, or TGG through C-to-T conversion on the non-coding strand. Kim and co-workers demonstrated this possibility by using BE3 to introduce a premature stop codon in Dmd in mouse embryos77. The CRISPR-Stop144 and iSTOP145 methods use this principle to enable high-throughput BE3-mediated gene inactivation without generation of DSBs and accompanying indels. Ciccia and co-workers generated a database describing a set of guide RNAs that, when complexed with BE3, are capable of generating premature stop codons in >98.6% of open reading frames in the human genome (reference genome assembly GRCh38). They published a freely accessible online database enabling researchers to find appropriate guide RNAs for iSTOP to use in eight species145. Adli and co-workers identified that this strategy results in a significant reduction in apoptosis when compared with Cas9 nuclease treatment144, possibly owing to lower DSB-induced toxicity146,147,148. Although NHEJ-mediated knockout of genes following DSBs is typically efficient and widely used, it leads to a mixed population of cells, DNA translocations and rearrangements27,149, and the induction of cell death146,147,148, all of which in principle are avoided through the use of base editors to install precise stop codons. Flow cytometry of CRISPR-Stop-treated cells indicated that stop codon introduction is similar in efficiency to Cas9-mediated gene knockout144.
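The codon logic behind stop-codon installation can be sketched in a few lines. Reading an open reading frame on its coding strand, a C-to-T change on that strand appears directly as C→T, whereas a C-to-T change on the opposite strand appears as G→A. The sketch below enumerates single-base changes of either kind that turn a sense codon into a stop codon. It is an illustration of the principle only and does not reproduce the iSTOP or CRISPR-Stop pipelines, which additionally account for PAM availability and the activity window; the function name `cbe_stop_candidates` and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# How a cytosine deamination appears when read on the coding strand.
EDIT_ON_CODING_STRAND = {"C": ("T", "coding"), "G": ("A", "non-coding")}


def cbe_stop_candidates(orf):
    """Find single-base CBE edits that convert a sense codon into a stop.

    Returns (codon_number, codon, edited_codon, deaminated_strand) tuples,
    with codons numbered from 1.
    """
    orf = orf.upper()
    candidates = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            continue
        for j, base in enumerate(codon):
            if base not in EDIT_ON_CODING_STRAND:
                continue
            new_base, strand = EDIT_ON_CODING_STRAND[base]
            edited = codon[:j] + new_base + codon[j + 1:]
            if edited in STOP_CODONS:
                candidates.append((i // 3 + 1, codon, edited, strand))
    return candidates


# Hypothetical reading frame: ATG CAA TGG CGA AAA
hits = cbe_stop_candidates("ATGCAATGGCGAAAA")
```

For TGG, two distinct single edits are reported (giving TAG or TGA), both of which require deamination of a cytosine on the non-coding strand.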
Perez-Pinera and co-workers confirmed that base-editor-induced C-to-T edits at the conserved splice acceptor site can induce exon skipping150. Their method (termed CRISPR-SKIP) was similar in efficiency to Cas9 DSB-mediated exon skipping, but unlike nuclease treatment, it did not generate DSBs150.
Base editing in embryos to generate animal models
A common goal of genome editing at the single-cell embryo stage is to generate model organisms. To minimize mosaicism and maximize the chance that editing occurs in the germ line, it is critical that editing occurs quickly and efficiently. As nuclease-mediated editing strategies often fail to generate homozygous, non-mosaic progeny in the F0 generation151,152, the high efficiency of base editing offers an attractive alternative. CBEs are particularly useful for generating loss-of-function animal models by inserting a premature stop codon into a gene of interest without generating DSBs or indels77,137.
Kim and co-workers demonstrated that microinjection of mRNA encoding BE3 together with a guide RNA, or electroporation of the BE3–guide RNA RNP complex mediates efficient generation of premature stop codons in one-cell-stage mouse embryos at two target sites: Q871X in Dmd or Q68X in Tyr77. Impressively, mRNA treatment yielded the target mutation in 11 out of 15 and 10 out of 10 blastocysts at the Dmd and Tyr loci, respectively. RNP delivery of BE3 was also effective; two out of seven of the embryos treated with a BE3 RNP pre-complexed with a guide RNA targeted to the Tyr locus were transplanted into surrogate mothers to yield homozygous, non-mosaic progeny with the expected albino phenotype77. Independently, Songyang and co-workers used either BE3 or a high-fidelity version of BE2 (BE2-HF2) mRNA to perform base editing in mouse zygotes, resulting in up to 50% of sequenced embryos harbouring a C-to-T point mutation at the target locus133. Li and co-workers made rabbit models of human disease using mRNA injection of BE3, BE4-Gam or ABE7.10 into blastocysts89. They performed ABE-mediated editing to generate the Dmd exon 9 point mutation T297A, which is associated with X-linked dilated cardiomyopathy in humans89. They also used BE3 to install the c.1821C > T mutation in Lmna, generating a rabbit model of Hutchinson–Gilford progeria syndrome89.
Huang and co-workers performed multiplexed base editing through co-injection of ABE7.10 and SaBE3 mRNA along with guide RNA sequences targeting Tyr (an S. aureus guide RNA was used to generate the Q58X mutation) and Hoxd13 (an S. pyogenes guide generated the Q312R mutation) in one-cell mouse embryos. Impressively, A-to-G and C-to-T edits were simultaneously observed in blastocysts100. The same strategy has been used to deliver ABE mRNA and guide RNAs into rat embryos115,136. Zhang and co-workers showed that co-injection of two different guide RNAs efficiently generated two transmissible A-to-G point mutations in the F0 generation simultaneously136, while Yin and co-workers used an ABE to generate a rat model of Pompe disease115. These data demonstrate that base editing is an enabling tool for generating mutant mice, rats and rabbits for animal studies; previous nuclease-based editing methods usually failed to generate non-mosaic mice with 100% mutation frequency in the F0 generation153.
Base editors as cellular event recorders
In addition to its applications in biomedical research to install and correct point mutations, base editing has also been used as a synthetic biology tool to record cellular signalling and exposure to stimuli154. Unlike the stochastic indels that result from Cas9 DNA cleavage, base editors generate predictable single point mutations. By coupling the stimulus of interest to the activity of the base editor, the resulting stimulus-dependent single point mutations can be used to record exposure to signals into the genome. Liu and co-workers developed a ligand-responsive editing system by appending a blocking sequence to a guide RNA through a ligand-dependent hammerhead ribozyme155. This system facilitated ligand-dependent base editing in mammalian cells155.
Subsequently, Liu and Tang demonstrated that controlling expression of a base editor or its accompanying guide RNA using stimulus-dependent promoters enables recording of a wide variety of stimuli (including exposure to light, nutrients, antibiotics or viruses) durably as point mutations in the genome of a cell. This recording system was termed CRISPR-mediated analogue multi-event recording apparatus 2 (CAMERA 2)54. Control of base editor expression through small-molecule-responsive promoters enabled dose-dependent and time-dependent recording of exposure to four small molecules (aTc, IPTG, arabinose and rhamnose) simultaneously in bacterial cells. Through careful design of two ratcheted protospacers, in which base editing directed by one guide RNA creates the binding site for the second guide RNA, the order of exposure could also be recorded54. The same principles were used in mammalian cells: signals including exposure to doxycycline, tetracycline or IPTG were recorded as base edits in the CCR5 safe-harbour locus. CAMERA 2 could also record changes in WNT signalling in mammalian cells.
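The reason that ratcheted protospacers can encode the order of exposure can be seen in a deliberately simplified toy model. It assumes two stimuli, here labelled "A" and "B" (hypothetical names), each inducing one guide RNA, and treats every edit as all-or-nothing: the guide induced by B can bind its site only after the edit directed by the A-induced guide has been made. Real recordings are graded across a cell population rather than binary, so this is a sketch of the logic and not of the published system.

```python
def record_exposures(stimuli):
    """Toy model of two ratcheted protospacers.

    Returns (site1_edited, site2_edited) after the stimuli are applied in
    order. Site 2 can be edited only if site 1 was edited beforehand.
    """
    site1_edited = False
    site2_edited = False
    for stimulus in stimuli:
        if stimulus == "A":
            site1_edited = True       # first edit creates the second target
        elif stimulus == "B" and site1_edited:
            site2_edited = True       # second guide can now bind and edit
    return site1_edited, site2_edited


a_then_b = record_exposures(["A", "B"])
b_then_a = record_exposures(["B", "A"])
```

In this model, exposure to A followed by B leaves both sites edited, whereas B followed by A leaves only the first site edited, so the final genotype distinguishes the two orders.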
Independently, Lu and co-workers used CBEs to develop a related platform for cellular reading and writing named DOMINO (DNA-based Ordered Memory and Iteration Network Operator)156. As in CAMERA 2, expression of the base editor and guide RNAs is controlled with different small-molecule-responsive promoters in E. coli. DOMINO can directly couple stimulus-dependent base editing to a phenotypic readout. For example, successful DNA editing by two input guide RNAs could enable a third guide RNA to bind to a target DNA operator site upstream of a genomically integrated GFP gene. Binding of the guide RNA–base editor complex to this operator resulted in GFP fluorescence reduction156.
Lu and co-workers also used DOMINO as a self-reinforcing ‘molecular clock’ in human HEK293 cells that records stimulus exposure time. They fused a CBE with the VP64 transcriptional activator to perform sequential editing of a repetitive operator region located just upstream of GFP. The circuit was designed such that over the course of 15 days the repetitive operator region was sequentially edited to generate more guide RNA binding sites, increasing localization of the editor to the operator region and thus increasing GFP expression. Both the number of GFP-positive cells and the C-to-T editing levels in the operator region reflected the number of days of exposure of the cell population to the active editor construct156. Both DOMINO and CAMERA 2 rely on the exquisite precision of base editing, as indel-generating methods would not be expected to predictably write new protospacer sequences. We anticipate that future cellular recording applications will use both CBEs and ABEs to develop more complex recording systems, as ABEs can erase signals written by CBEs and vice versa.
Base editing in plants
Base editing in plants could enable researchers and agriculturalists to rapidly generate novel plant mutants with an efficiency beyond that of conventional breeding157. Generation of precise, gain-of-function point mutations can improve many agronomic traits; for example, a point mutation in the plant ALS gene confers resistance to herbicides such as sulfonylureas and imidazolinones158. Generating precise point mutations in plant cells remains challenging using DSB-induced HDR159,160.
Multiple plant species of agronomic interest have been edited with CBEs and ABEs. Gao and co-workers demonstrated that BE3 generates efficient point mutations in maize, rice and wheat161. In a separate study, Kondo and co-workers showed that the Target-AID editor is capable of efficient editing in rice and tomatoes162. More recently, two independent reports from the Zhou and Zhu laboratories demonstrated that ABE editing is highly efficient in rice84,85,114. Gao and co-workers optimized the architecture of ABE7.10 for adenine base editing in rice and wheat and used the resulting editor in protoplasts and in regenerated plants163. Kim and co-workers recently described two phenotypic changes generated through transient Agrobacterium tumefaciens transfection of ABE7.10 into A. thaliana and Brassica napus92. Using a plant-optimized expression system, they performed editing in A. thaliana either to install a single codon change encoding a Y85H mutation in the FT protein, resulting in a late-flowering phenotype, or to disrupt a splice acceptor site in the PDS3 gene, generating a dwarf phenotype. After transformation, >85% of T1 plants showed >50% editing, and T2 seedlings isolated from T1 plants displayed the same phenotypes, indicating that the editing was germline-transmissible92.
These demonstrations establish that base editing is a promising approach for rapid engineering of polyploid plant genomes. We anticipate that RNP delivery of base editors into crop species will be particularly important from a regulatory and consumer perspective because transgene integration from plasmid delivery results in plants with genetically modified organism (GMO) status. RNP delivery of base editors would enable DNA-free precision editing that may avoid the creation of GMO crops158.
Conclusions and future perspectives
The ability to efficiently and cleanly install changes to genetic information in living systems at the highest-resolution level, that of the individual base pair, resembled science fiction until only recently. The major developments summarized in this Review have rapidly established base editing of individual nucleotides as a robust technology with the potential to broadly impact the life sciences and medicine.
The two classes of DNA base editors described thus far have repeatedly proved effective for making precise point mutations in the genome of a wide variety of living cells and organisms. That said, CBEs and ABEs make only two of the six possible changes of one base pair to another. Much additional work is needed to develop base editors that can install transversion mutations, and possibly other DNA or RNA changes, at programmable target loci. Success is likely to be facilitated by a deep understanding and creative manipulation of cellular mechanisms controlling base modification and DNA repair in mammalian cells.
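The count of six possible base-pair changes can be checked by enumeration. The sketch below is only an illustration of the arithmetic, with function and variable names of our own choosing: it groups the 12 single-base substitutions into base-pair changes by treating a substitution on one strand and the complementary substitution on the other strand as the same event, and labels each group as a transition or a transversion.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}


def base_pair_changes():
    """Map each distinct base-pair change to 'transition' or 'transversion'.

    X->Y on one strand is the same event as comp(X)->comp(Y) on the
    opposite strand, so the 12 substitutions collapse into 6 groups.
    """
    changes = {}
    for x, y in permutations("ACGT", 2):
        key = frozenset({(x, y), (COMPLEMENT[x], COMPLEMENT[y])})
        same_class = (x in PURINES) == (y in PURINES)
        changes[key] = "transition" if same_class else "transversion"
    return changes


changes = base_pair_changes()
n_transitions = sum(kind == "transition" for kind in changes.values())
print(len(changes), n_transitions, len(changes) - n_transitions)
```

The two transition groups correspond to the changes made by CBEs (C•G to T•A) and ABEs (A•T to G•C); the remaining four groups are the transversions for which no base editor is described here.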
Although early examples of in vivo base editing are very encouraging, challenges associated with delivery of large proteins into specific tissues remain an important focus of ongoing efforts, including the use of base editing to treat human genetic diseases. Thus, the development of novel base editor delivery systems, including those that target specific tissues, is likely to be another major focus in the coming years. Detailed analyses of the off-target editing activities of base editors in vivo under a variety of conditions relevant to ongoing research and therapeutic applications are also needed, as are assessments of the potential biological consequences of making off-target point mutations in vivo. For example, as base editors in general do not create DSBs that can lead to indels, translocations or large DNA rearrangements, can the clinically relevant consequences of off-target base editing be adequately assessed by monitoring the DNA sequences of a defined set of oncogenesis-associated genes and their regulatory regions? Experimentally testing such possibilities in animals would represent important steps towards advancing base editing into the clinic.
The continued development of additional editing technologies that maximize base-editing efficiency and targeting scope, while minimizing off-target base editing, will continue to propel the field towards increasingly ambitious and sophisticated applications. For the vast majority of base-editing applications described here, the target sequence is known in advance. Thus, the development of many distinct classes of future base editors that each convert a target DNA base pair or RNA base exclusively in a particular sequence context, or in a protospacer containing a particular PAM, is likely to play an important role in maximizing the precision and specificity of base editing.
Acknowledgements
D.R.L. acknowledges support from Defense Advanced Research Projects Agency (DARPA) HR0011-17-2-0049; the Ono Pharma Foundation; US National Institutes of Health (NIH) RM1 HG009490, R01 EB022376, U01 AI142756 and R35 GM118062; and Howard Hughes Medical Institute (HHMI). H.A.R. is supported by the Kilpatrick Educational fund from the Chemistry and Chemical Biology Department, Harvard University. The authors thank J.K. Joung, F. Zhang, A. Raguram, W.-H. Yeh, T. Huang, K. Zhao and W. Tang for their helpful comments.
Glossary
- Guide RNA
Short RNA sequence comprising a scaffold for binding to the necessary CRISPR-associated (Cas) enzyme and a variable spacer region that defines the target site for the enzyme. In natural CRISPR systems, the guide RNA is often made of two molecules of RNA with complementarity. Engineered ‘single-guide’ RNAs connecting the two natural guide RNA components are often accepted by Cas enzymes.
- Protospacer adjacent motif
(PAM). A small region of nucleotides in the target DNA sequence adjacent to the sequence specified by a guide RNA. The PAM is not specified in the guide RNA, but CRISPR-associated (Cas) enzymes do not bind or cleave a sequence unless it is next to the appropriate PAM.
- Cas9 nickase
A partially catalytically disabled mutant of a Cas9 enzyme that is able to create a single-stranded DNA break (a nick) but not a double-stranded DNA break.
- Activity window
The region of DNA or RNA, typically defined by the number of nucleotides from the protospacer adjacent motif (PAM), in which a particular base editor acts to induce efficient point mutations. The activity window for most base editors is approximately four to five nucleotides wide.
- Spacer
A region in a guide RNA of 15–25 nucleotides in length that specifies the target RNA or DNA locus.
- Proximal off-target editing
Unwanted editing of bases that occurs outside of the activity window but near the target site (for example, within 100 nucleotides upstream or downstream of it).
- Distal off-target editing
Unwanted editing of bases residing in locations of the genome or transcriptome unrelated to (for example, >100 nucleotides away from) the target site of the base editor.
- Cas13
A class 2, type VI RNA-guided RNase from the CRISPR system. Variants from several species have been characterized. It catalyses site-specific cleavage of single-stranded RNA.
- Wobble position
The third nucleotide in a codon.
- Base-editing product purity
The term used to describe the spectrum of mutations induced by a particular base-editing technology. Low product purity occurs when a target base is mutated to bases other than the desired point mutation or when small insertions or deletions are generated in addition to the desired edit; for example, C-to-G or C-to-A edits, rather than the desired C-to-T edit, from a cytosine base editor.
- SpCas9
An RNA-guided endonuclease variant isolated from the CRISPR system of Streptococcus pyogenes. It catalyses site-specific cleavage of double-stranded DNA at sites with an NGG protospacer adjacent motif (PAM).
- SaCas9
An RNA-guided endonuclease variant isolated from the CRISPR system of Staphylococcus aureus. It catalyses site-specific cleavage of double-stranded DNA at sites with an NNGRRT protospacer adjacent motif (PAM).
- Bystander editing
Editing of a non-target base that resides in the activity window of a particular base editor and guide RNA. Bystander editing occurs in addition to editing of the target base.
- Cas12a
A class 2, type V RNA-guided endonuclease from the CRISPR system. Variants from several species have been characterized. It catalyses site-specific cleavage of double-stranded DNA at sites with a TTTV protospacer adjacent motif (where V is A, C or G).
- Mosaicism
A state in which two or more cell populations with distinct genotypes are present in the same organism and derived from a single fertilized egg.
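The glossary definitions of the activity window and of bystander editing can be made concrete with a short sketch. It is an illustration under stated assumptions rather than a design tool: the window of protospacer positions 4–8 (counting the PAM-distal end as position 1) is assumed here as typical of a BE3-like CBE, and the example sequence and function names are hypothetical.

```python
def editable_positions(protospacer: str, base: str = "C", window=(4, 8)):
    """Return 1-based protospacer positions of `base` inside the window.

    Position 1 is the PAM-distal end; the default window (4-8) is an
    assumption, and real windows differ between base editors.
    """
    start, end = window
    return [i for i in range(start, end + 1) if protospacer[i - 1] == base]


def bystanders(protospacer: str, target_pos: int, base: str = "C", window=(4, 8)):
    """Return editable positions in the window other than the target."""
    return [p for p in editable_positions(protospacer, base, window)
            if p != target_pos]


# Hypothetical 20-nucleotide protospacer with the intended edit at position 5.
example = "GATCCACGTTAGCTAGCTAG"
print(editable_positions(example))         # cytosines within the window
print(bystanders(example, target_pos=5))   # candidate bystander edits
```

Cytosines outside the window, such as the one at position 13 in this example, are not reported, which reflects the distinction drawn above between bystander editing and proximal off-target editing.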