The spontaneous deamination of cytosine is a major source of transitions from C•G to T•A base pairs, which account for half of known pathogenic point mutations in humans. The ability to efficiently convert targeted A•T base pairs to G•C could therefore advance the study and treatment of genetic diseases. The deamination of adenine yields inosine, which is treated as guanine by polymerases, but no enzymes are known to deaminate adenine in DNA. Here we describe adenine base editors (ABEs) that mediate the conversion of A•T to G•C in genomic DNA. We evolved a transfer RNA adenosine deaminase to operate on DNA when fused to a catalytically impaired CRISPR–Cas9 mutant. Extensive directed evolution and protein engineering resulted in seventh-generation ABEs that convert targeted A•T base pairs efficiently to G•C (approximately 50% efficiency in human cells) with high product purity (typically at least 99.9%) and low rates of indels (typically no more than 0.1%). ABEs introduce point mutations more efficiently and cleanly, and with less off-target genome modification, than a current Cas9 nuclease-based method, and can install disease-correcting or disease-suppressing mutations in human cells. Together with previous base editors, ABEs enable the direct, programmable introduction of all four transition mutations without double-stranded DNA cleavage.
The formation of uracil and thymine from the spontaneous hydrolytic deamination of cytosine and 5-methylcytosine, respectively1,2, occurs an estimated 100–500 times per cell per day in humans1 and can result in C•G to T•A mutations, which account for approximately half of all known pathogenic single nucleotide polymorphisms (SNPs) (Fig. 1a). The ability to convert A•T base pairs to G•C base pairs at target loci in the genomic DNA of unmodified cells could therefore make it possible to correct a substantial fraction of human disease-associated SNPs.
Base editing is a form of genome editing that enables direct, irreversible conversion of one base pair to another at a target genomic locus without requiring double-stranded DNA breaks (DSBs), homology-directed repair (HDR) processes, or donor DNA templates3,4,5. Compared with standard genome editing methods to introduce point mutations, base editing can proceed more efficiently3 and with far fewer undesired products, such as stochastic insertions or deletions (indels) or translocations3,4,5,6,7,8.
The most commonly used base editors are third-generation designs (BE3) comprising (i) a catalytically impaired CRISPR–Cas9 mutant that cannot make DSBs; (ii) a single-strand-specific cytidine deaminase that converts C to uracil (U) within an approximately five-nucleotide window in the single-stranded DNA bubble created by Cas9; (iii) a uracil glycosylase inhibitor (UGI) that impedes uracil excision and downstream processes that decrease base editing efficiency and product purity5; and (iv) nickase activity to nick the non-edited DNA strand, directing cellular DNA repair processes to replace the G-containing DNA strand3,5. Together, these components enable efficient and permanent C•G to T•A base pair conversion in bacteria, yeast4,9, plants10,11, zebrafish8,12, mammalian cells3,4,5,6,7,8,13,14, mice8,15,16, and even human embryos17,18. Base editing capabilities have expanded through the development of base editors with different protospacer-adjacent motif (PAM) compatibilities7, narrowed editing windows7, enhanced DNA specificity8, and small-molecule dependence19. Fourth-generation base editors (BE4 and BE4-Gam) show further improved editing efficiency and product purity5.
To date, all reported base editors mediate C•G to T•A conversion. In this study, we used protein evolution and engineering to develop a new class of adenine base editors (ABEs) that convert A•T to G•C base pairs in DNA in bacteria and human cells. Seventh-generation ABEs convert A•T to G•C at a wide range of target genomic loci in human cells efficiently and with a very high degree of product purity, exceeding the typical performance characteristics of BE3. ABEs greatly expand the scope of base editing and, together with previously described base editors, enable the programmable installation of all four transitions (C to T, A to G, T to C, and G to A) in genomic DNA.
Evolution of an adenine deaminase that processes DNA
The hydrolytic deamination of adenosine yields inosine (Fig. 1b). Within the constraints of a polymerase active site, inosine pairs with C and therefore is read or replicated as G20. While replacing the cytidine deaminase of an existing base editor with an adenine deaminase could, in theory, provide an ABE (Fig. 1c), no enzymes are known to deaminate adenine in DNA. Although all reported examples of enzymatic adenine deamination occur on free adenine, free adenosine, adenosine in RNA, or adenosine in mispaired RNA–DNA heteroduplexes21, we began by replacing the APOBEC1 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 1) component of BE3 with natural adenine deaminases such as Escherichia coli TadA (ecTadA)22,23, human ADAR224, mouse ADA25, and human ADAT226 (Supplementary Sequences 1) to test the possibility that these enzymes might process DNA when present at a high effective molarity. Unfortunately, when plasmids encoding these deaminases fused to Cas9 D10A nickase were transfected into HEK293T cells together with a corresponding single guide RNA (sgRNA), we observed no A•T to G•C editing above that seen in untreated cells (Extended Data Figs 1, 2b). These results suggest that the inability of these natural adenine deaminase enzymes to accept DNA precludes their direct use in an ABE.
Given these results, we sought to evolve an adenine deaminase that accepts DNA as a substrate. We developed a bacterial selection method for base editing by creating defective antibiotic resistance genes that contain point mutations at critical positions (Supplementary Table 8 and Supplementary Sequences 2). Reversion of these mutations by base editors restores antibiotic resistance. To validate the selection, we used a bacterial codon-optimized version of BE23 (APOBEC1 cytidine deaminase fused to dCas9 and UGI), because bacteria lack the nick-directed mismatch repair machinery27 that enables more efficient base editing by BE3. We observed successful rescue of a defective chloramphenicol acetyl transferase (CamR) containing an A•T to G•C mutation at a catalytic residue (H193R) by BE2 and an sgRNA programmed to direct base editing to the inactivating mutation.
Next we adapted the selection plasmid for ABE activity by introducing a C•G to T•A mutation in the CamR gene, creating an H193Y substitution that confers minimal chloramphenicol resistance (Supplementary Table 8 and Supplementary Sequences 2). A•T to G•C conversion at the H193Y mutation should restore chloramphenicol resistance, linking ABE activity to bacterial survival.
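The logic of this selection can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the Tyr193 codon is TAT (the actual codon is given in Supplementary Sequences 2, not here); the function name abe_edit and the small codon table are hypothetical helpers, not part of the study. The sketch shows that restoring His requires deamination of the adenine on the strand complementary to the coding strand, which appears as a T to C change in the codon, whereas editing the coding-strand adenine would yield Cys instead.

```python
# Minimal subset of the standard genetic code needed for this illustration.
CODON_TABLE = {
    "CAT": "His", "CAC": "His",
    "TAT": "Tyr", "TAC": "Tyr",
    "TGT": "Cys", "TGC": "Cys",
}


def abe_edit(coding_codon, codon_index, strand):
    """Return the coding-strand codon after one A-to-G base edit.

    codon_index is 0-based on the coding strand. strand is "coding" when the
    deaminated adenine lies on the coding strand (read as A to G), or
    "template" when it lies on the complementary strand (read as T to C on
    the coding strand).
    """
    base = coding_codon[codon_index]
    if strand == "coding":
        expected, new_base = "A", "G"
    elif strand == "template":
        expected, new_base = "T", "C"
    else:
        raise ValueError("strand must be 'coding' or 'template'")
    if base != expected:
        raise ValueError(f"no editable adenine at index {codon_index} on the {strand} strand")
    return coding_codon[:codon_index] + new_base + coding_codon[codon_index + 1:]


# Assumed Tyr codon (hypothetical choice): TAT.
reverted = abe_edit("TAT", 0, "template")   # A on the complementary strand is edited
bystander = abe_edit("TAT", 1, "coding")    # A on the coding strand is edited instead
```

Under this assumption, only the complementary-strand edit regenerates a His codon, so the sgRNA in such a selection would need to place that adenine within the editing window.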
Previously described base editors3,5,7,8 use cytidine deaminase enzymes that operate on single-stranded DNA but reject double-stranded DNA. This feature is critical to restrict deaminase activity to a small window of nucleotides within the single-stranded bubble created by Cas9. TadA is a tRNA adenine deaminase22 that converts adenine to inosine (I) in the single-stranded anticodon loop of tRNAArg. E. coli TadA shares homology with the APOBEC enzyme28 used in our original base editors, and some APOBECs bind single-stranded DNA in a conformation that resembles tRNA bound to TadA28. TadA does not require small-molecule activators (unlike ADAR29) and acts on polynucleic acid (unlike ADA25). On the basis of these considerations, we chose ecTadA as the starting point of our efforts to evolve a DNA adenine deaminase.
We created unbiased libraries of ecTadA–dCas9 fusions containing mutations only in the adenine deaminase portion of the construct, to avoid altering the favourable properties of the Cas9 portion of the editor (Supplementary Table 7). The resulting plasmids were transformed into E. coli harbouring the CamR H193Y selection (Fig. 2a and Supplementary Table 8). Colonies that survived chloramphenicol challenge were strongly enriched for the TadA mutations A106V and D108N (Fig. 2b). Sequence alignment of ecTadA with S. aureus TadA, for which a structure in complex with tRNAArg has been reported30, predicts that the side-chain of D108 forms a hydrogen bond with the 2′-OH group of the ribose in the uridine upstream of the substrate adenosine (Fig. 2c). Mutations at D108 are likely to abrogate this hydrogen bond, decreasing the energetic opportunity cost of binding DNA. DNA sequencing confirmed that all clones that survived the selection showed A•T to G•C reversion at the targeted site in CamR. Collectively, these results indicate that mutations at or near TadA D108 enable TadA to perform adenine deamination on DNA substrates.
The TadA A106V and D108N mutations were incorporated into a mammalian codon-optimized TadA–Cas9 nickase fusion construct that replaces dCas9 with the Cas9 D10A nickase used in BE3 to manipulate cellular DNA repair to favour the desired base editing outcomes3, and adds a C-terminal nuclear localization signal (NLS). We designated the resulting TadA*–XTEN–nCas9–NLS construct, in which TadA* represents an evolved TadA variant and XTEN is a 16-amino acid linker used in BE33, as ABE1.2. Transfection of plasmids expressing ABE1.2 and sgRNAs targeting six human genomic sites (Extended Data Fig. 2a) resulted in very low but observable A•T to G•C editing efficiencies (3.2 ± 0.88%; all editing efficiencies are reported as mean ± s.d. of three biological replicates five days after transfection without enrichment for transfected cells unless otherwise noted) at or near protospacer position 5, counting the PAM as positions 21–23 (Fig. 3a). These data confirmed that an ABE capable of catalysing low levels of A•T to G•C conversion emerged from the first round of protein evolution and engineering.
Improved deaminase variants and ABE architectures
To improve editing efficiencies, we generated an unbiased library of ABE1.2 variants and challenged the resulting TadA*1.2–dCas9 mutants in bacteria with higher concentrations of chloramphenicol than were used in round 1 (Supplementary Tables 7 and 8). From round 2 we identified two new mutations, D147Y and E155V, that were predicted to lie in a helix adjacent to the TadA tRNA substrate (Fig. 2c). In mammalian cells, ABE2.1 (ABE1.2 + D147Y + E155V) exhibited twofold to sevenfold higher activity than ABE1.2 at the six genomic sites tested, resulting in an average of 11 ± 2.9% A•T to G•C base editing (Fig. 3a).
Next we sought to improve ABE2.1 through additional protein engineering. Fusing the TadA(2.1)* domain to the C terminus of Cas9 nickase, instead of the N terminus, abolished editing activity (Extended Data Fig. 2c), as previously shown for BE33. We also varied the length of the linker between TadA(2.1)* and Cas9 nickase. An ABE2 variant (ABE2.6) with a linker twice as long (32 amino acids, (SGGS)2-XTEN-(SGGS)2) as the linker in ABE2.1 offered modestly higher editing efficiencies, now averaging 14 ± 2.4% across the six genomic loci tested (Extended Data Fig. 2c).
Alkyl adenine DNA glycosylase (AAG) catalyses the cleavage of the glycosidic bond of inosine in DNA31. To test whether inosine excision impeded ABE performance, we created ABE2 variants designed to minimize potential sources of inosine excision. Given the absence of known protein inhibitors of AAG, we attempted to block endogenous AAG from accessing the inosine intermediate by separately fusing to ABE2.1 catalytically inactivated versions of enzymes involved in inosine binding or removal: human AAG (inactivated with an E125Q mutation31) or E. coli Endo V (inactivated with a D35A mutation32). Neither ABE2.1–AAG(E125Q) (ABE2.2) nor ABE2.1–Endo V(D35A) (ABE2.3) exhibited altered A•T to G•C editing efficiency in HEK293T cells compared with ABE2.1 (Extended Data Fig. 2d). Similarly, ABE2.1 showed no increase in base editing efficiency or product purity in Hap1 cells lacking AAG compared with Hap1 cells containing wild-type AAG (Extended Data Fig. 2e). Moreover, ABE2.1 induced virtually no indels (≤0.1%) or A•T to non-G•C products (≤0.1%) in HEK293T cells, consistent with inefficient excision of inosine (Extended Data Fig. 3). Together, these observations suggest that cellular repair of inosine intermediates created by ABEs is inefficient, obviating the need to subvert base excision repair. This situation contrasts with that of BE3 and BE4, which are strongly dependent on inhibiting uracil excision to maximize base editing efficiency and product purity3,5.
As a final ABE2 engineering study, we investigated the effect of TadA* dimerization on base editing efficiency. TadA natively operates as a homodimer, with one monomer catalysing deamination and the other monomer acting as a docking station for the tRNA substrate30. During selection in E. coli, endogenous TadA probably serves as the non-catalytic monomer. We hypothesized that tethering an additional wild-type or evolved TadA monomer might improve base editing in mammalian cells by minimizing reliance on intermolecular ABE dimerization. Indeed, co-expression with ABE2.1 of either wild-type TadA or TadA*2.1 (ABE2.7 or ABE2.8, respectively), as well as direct fusion of either evolved or wild-type TadA to the N terminus of ABE2.1 (ABE2.9 or ABE2.10, respectively), substantially improved editing efficiency (Fig. 3a and Extended Data Fig. 4a). A fused TadA*–ABE2.1 architecture (ABE2.9) offered the highest editing efficiency (averaging 20 ± 3.8% across the six genomic loci, a 7.6 ± 2.6-fold average improvement at each site over ABE1.2) and therefore a dimeric architecture was used in all subsequent experiments (Figs 2b, 3a).
Finally, we determined which of the two TadA* subunits within the TadA*–ABE2.1 fusion was responsible for catalysing conversion of adenine to inosine. We introduced an inactivating E59A mutation22 into either the N-terminal or the internal TadA* monomer of ABE2.9. The variant with an inactivated N-terminal TadA* subunit (ABE2.11) demonstrated comparable editing efficiencies to ABE2.9, whereas the variant with an inactivated internal TadA* subunit (ABE2.12) lost all editing activity (Extended Data Fig. 4a). These results establish that the internal TadA subunit is responsible for catalysing adenine deamination.
ABEs that efficiently edit a subset of targets
Next we performed a third round of bacterial evolution starting with TadA*2.1–dCas9 to increase editing efficiency further. We increased selection stringency by introducing two early stop codons (Q4stop and W15stop) into the kanamycin resistance gene (KanR, aminoglycoside phosphotransferase, Supplementary Table 8 and Supplementary Sequences 2). Each of the mutations requires an A•T to G•C reversion to correct the premature stop codon. We subjected a library of TadA*2.1–dCas9 variants containing mutations in the TadA domain to this higher stringency selection (Supplementary Table 8), resulting in the strong enrichment of three new TadA mutations: L84F, H123Y, and I156F. These mutations were imported into ABE2.9 to generate ABE3.1 (Fig. 2b). In HEK293T cells, ABE3.1 resulted in editing efficiencies averaging 29 ± 2.6% across the six tested sites, a 1.6-fold average increase in A•T to G•C conversion at each site over ABE2.9, and an 11-fold average improvement over ABE1.2 (Fig. 3b). Using longer (64- or 100-amino-acid) linkers between the two TadA monomers, or between TadA* and Cas9 nickase, did not consistently improve editing efficiency compared to ABE3.1 (Extended Data Figs 1, 4b).
Although ABE3.1 mediated efficient base editing at some targets, such as the CAC in site 1 (65 ± 4.2% conversion), for other sites, such as the GAG in site 5, editing efficiency was much lower (8.3 ± 0.67%) (Fig. 3b). The results from six genomic loci with different sequence contexts surrounding the target adenine suggested that ABEs from rounds 1–3 strongly preferred target sequence contexts of YAC, where Y is T or C. This preference is likely to have been inherited from the substrate specificity of native E. coli TadA, which deaminates the adenine in the UAC anticodon of tRNAArg. The utility of an ABE would be greatly limited, however, by such a target sequence restriction.
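The YAC preference described above can be expressed as a simple sequence-context check. The following Python sketch is illustrative only; the function name classify_context is a hypothetical helper, and the classification uses nothing more than the definition given in the text (Y is T or C, followed by the target A and then C).

```python
def classify_context(seq, a_index):
    """Classify the trinucleotide context of a target adenine.

    seq is a DNA string read 5' to 3'; a_index is the 0-based index of the
    target A. Returns "YAC" when the flanking bases match the preferred
    context (T or C upstream, C downstream), otherwise "non-YAC".
    """
    if a_index < 1 or a_index + 1 >= len(seq):
        raise ValueError("target A needs one flanking base on each side")
    tri = seq[a_index - 1:a_index + 2].upper()
    if tri[1] != "A":
        raise ValueError("a_index does not point at an adenine")
    if tri[0] in ("T", "C") and tri[2] == "C":
        return "YAC"
    return "non-YAC"


# Contexts named in the text: CAC (site 1), GAG (site 5), and the UAC
# anticodon context written here in DNA letters as TAC.
contexts = {tri: classify_context(tri, 1) for tri in ("CAC", "GAG", "TAC")}
```

With this definition, the efficiently edited CAC context of site 1 falls in the preferred class, whereas the poorly edited GAG context of site 5 does not.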
To overcome this sequence preference, we initiated a fourth evolution campaign focusing mutagenesis at TadA residues that were predicted to interact with the nucleotides upstream and downstream of the target adenine30. We subjected TadA*2.1–dCas9 libraries (Supplementary Table 7) containing randomized amino acids at five such positions (E25, R26, R107, A142, and A143) to a new selection in which A•T to G•C conversion of a non-YAC target (GAT, which causes a T89I mutation in the spectinomycin resistance protein) restores antibiotic resistance (Supplementary Table 8 and Supplementary Sequences 2). Surviving bacteria converged strongly on the TadA mutation A142N. Although apparent A•T to G•C base editing efficiency in bacterial cells with TadA*4.3–dCas9 (TadA*3.1 + A142N–dCas9) was higher than with TadA*3.1–dCas9, as judged by spectinomycin resistance (Extended Data Fig. 4c), in mammalian cells ABE4.3 exhibited decreased base editing efficiency (averaging 16 ± 5.8%) compared with ABE3.1 (Figs 2b, 3b). We hypothesized that the A142N mutation might benefit base editing in a context-dependent manner, and revisited its inclusion in later rounds of evolution (see below).
We performed a fifth round of evolution to increase ABE catalytic performance and broaden target sequence compatibility. We generated a library of TadA*3.1–dCas9 variants containing unbiased mutations throughout the TadA* domain as before (Supplementary Table 7). To favour ABE constructs with faster kinetics, we subjected this library to CamR H193Y selection with higher doses of chloramphenicol after allowing ABE to be expressed for only half the duration (7 h) of the previous rounds of evolution (about 14 h) (Supplementary Table 8). Surprisingly, importing a consensus set of mutations from surviving clones (H36L, R51L, S146C, and K157N) into ABE3.1, thereby creating ABE5.1, decreased overall editing efficiencies in HEK293T cells by 1.7 ± 0.29-fold (Figs 2b, 3b).
ABE5.1 included seven mutations since our dimerization state experiments on ABE2.1. We speculated that the accumulation of these mutations might have impaired the ability of the non-catalytic N-terminal TadA subunit to play its structural role in mammalian cells. In E. coli, endogenous wild-type TadA is provided in trans, potentially explaining the difference between bacterial selection phenotypes and mammalian cell editing efficiencies. Therefore, we examined the effect of using wild-type TadA instead of evolved TadA* variants in the N-terminal TadA domain of ABE5 variants. A heterodimeric construct containing wild-type E. coli TadA fused to an internal evolved TadA* (ABE5.3) exhibited greatly improved editing efficiency compared to homodimeric ABE5.1 with two identical evolved TadA* domains. ABE5.3 had an average editing efficiency across the six genomic test sites of 39 ± 5.9%, with an average improvement at each site of 2.9 ± 0.78-fold relative to ABE5.1 (Figs 2b, 3b). Notably, ABE5.3 also showed broadened sequence compatibility that now enabled 22–33% editing of non-YAC targets in sites 3–6 (Fig. 3b).
Concurrently, we subjected a library from round 5 to the non-YAC spectinomycin selection used in round 4. Although no highly enriched or beneficial mutations emerged (Extended Data Fig. 5a), mutations from two genotypes that emerged from this selection, N72D + G125A and P48S + S97C, were included in subsequent library generation steps. In addition, eight heterodimeric wild-type TadA–TadA* ABE5.3 variants (ABE5.5 to ABE5.12) containing 24-, 32-, or 40-residue linkers between the TadA domains or between TadA and Cas9 nickase showed no obvious improvements in base editing efficiency over ABE5.3 (Extended Data Figs 1, 5b). For subsequent studies, we therefore used the ABE5.3 architecture containing heterodimeric wtTadA–TadA*–Cas9 nickase with two 32-residue linkers.
Highly active ABEs with broad sequence compatibility
During the sixth round of evolution, we aimed to remove any non-beneficial mutations by DNA shuffling and to re-examine mutations from previous rounds of evolution that may benefit ABE performance once liberated from negative epistasis with other mutations. Evolved TadA*–dCas9 variants from rounds 1–5 along with wild-type E. coli TadA were shuffled and subjected to spectinomycin resistance T89I selection (Supplementary Table 8). Two mutations were strongly enriched from this selection: P48S/T and A142N (first seen in round 4). These mutations were added either separately or together to ABE5.3, forming ABE6.1 to ABE6.6 (Extended Data Fig. 1). ABE6.3 (ABE5.3 + P48S) resulted in 1.3 ± 0.28-fold higher average editing than ABE5.3 at the six genomic sites tested, and an average conversion efficiency of 47 ± 5.8% (Figs 2b, 3c). P48 is predicted to lie approximately 5 Å from the substrate adenosine 2′-hydroxyl in the TadA crystal structure (Fig. 2c), and we speculated that mutating this residue to Ser might improve compatibility with a deoxyadenosine substrate. Although at most sites ABE6 variants that contained the A142N mutation were less active than ABEs that lack this mutation, editing by ABE6.4 (ABE6.3 + A142N) at site 6, which contains a target A at position 7 in the protospacer, was 1.5 ± 0.13-fold more efficient than editing by ABE6.3, and 1.8 ± 0.16-fold more efficient than editing by ABE5.3 (Fig. 3c). These results suggest that ABEs containing A142N may offer improved editing of adenines closer to the PAM than position 5.
Although six rounds of evolution and engineering yielded substantial improvements, ABE6 editors still suffered from reduced editing efficiencies (about 20–40%) at target sequences containing multiple adenines near the targeted A (Fig. 3c). To address this challenge, we performed a seventh round of evolution in which new unbiased libraries of TadA*6–dCas9 variants were targeted to two separate sites in the kanamycin resistance gene: the Q4stop mutation used in round 3, which requires editing of TAT, and a new D208N mutation that requires editing of TAA (Supplementary Table 7 and 8, Supplementary Sequences 2). Surviving clones contained three enriched sets of mutations: W23L/R, P48A, and R152H/P.
Introducing these mutations separately or in combinations into mammalian cell ABEs (ABE7.1 to ABE7.10) substantially improved editing efficiencies, especially at targets that contain multiple A residues (Figs 2b, 3c, d, and Extended Data Figs 1, 6a, b). ABE7.10 edited the six genomic test sites with an average efficiency of 58 ± 4.0%, an average improvement at each site of 1.3 ± 0.20-fold relative to ABE6.3 (Fig. 3c), and 29 ± 7.4-fold compared to ABE1.2. Although mutational dissection revealed that all three of the new mutations contributed to the enhanced editing efficiency (Extended Data Figs 1, 6a, b), the R152P substitution is particularly noteworthy, as this residue is predicted to contact the C in the UAC anticodon loop of the tRNA substrate (Fig. 2b, c). We speculate that replacing this Arg with Pro abrogates base-specific enzyme–DNA interactions and thereby broadens target sequence compatibility.
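Fold improvements in this work are reported as averages of per-site ratios, which need not equal the ratio of the average efficiencies (for example, 58% versus 3.2% for ABE7.10 and ABE1.2). The Python sketch below illustrates the distinction. The per-site values are hypothetical numbers chosen only to make the arithmetic clear; they are not data from this study, and the function names are illustrative.

```python
from statistics import mean


def mean_per_site_fold(new, old):
    """Average of the site-by-site ratios new[i] / old[i]."""
    if len(new) != len(old):
        raise ValueError("both editors must be measured at the same sites")
    return mean(n / o for n, o in zip(new, old))


def ratio_of_means(new, old):
    """Ratio of the average efficiencies across sites."""
    return mean(new) / mean(old)


# Hypothetical per-site editing percentages (NOT measurements from this study).
early_editor = [10.0, 2.0, 1.0]
late_editor = [50.0, 40.0, 60.0]

per_site = mean_per_site_fold(late_editor, early_editor)   # mean of 5, 20 and 60
pooled = ratio_of_means(late_editor, early_editor)         # 50 divided by 13/3
```

Sites at which the early editor is nearly inactive dominate the per-site average, so that figure can substantially exceed the ratio of the mean efficiencies.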
Characterization of late-stage ABEs
We characterized the most promising ABEs from rounds 5–7 in depth. We chose an expanded set of 17 human genomic targets that place a target A at position 5 or 7 of the protospacer and collectively include all possible NAN sequence contexts (Extended Data Fig. 2a). Overall, we observed strong improvement of A•T to G•C editing efficiency in HEK293T cells during the progression from ABE5 to ABE7 variants (Fig. 3c, d). The base editing efficiency of the most active editor overall, ABE7.10, averaged 53 ± 3.7% at the 17 sites tested, exceeded 50% at 11 of these sites, and ranged from 34% to 68% (Fig. 3c, d). These results compare favourably to the typical C•G to T•A editing efficiency of BE33.
Next, we further characterized the base editing activity window of late-stage ABEs. We chose a human genomic site containing an alternating 5′-A-N-A-N-A-N-3′ sequence that could be targeted with either of two sgRNAs such that an A would be located either at every odd position (site 18) or at every even position (site 19) from 2 to 9 in the protospacer (Extended Data Fig. 2a). The resulting editing outcomes (Extended Data Fig. 7a), together with an analysis of editing efficiencies at every protospacer position across all 19 sites tested (Extended Data Fig. 7b), suggest that the activity windows of late-stage variants are approximately 4–6 nucleotides wide, from approximately protospacer positions 4 to 7 for ABE7.10, and positions 4 to 9 for ABE6.3, ABE7.8, and ABE7.9, counting the PAM as positions 21–23 (Fig. 5). We note that the precise editing window boundaries can vary in a target-dependent manner (Supplementary Table 1), as is the case with BE3 and BE4. We also tested ABE7.8–ABE7.10 in U2OS cells at sites 1–6 and observed similar editing results to those obtained in HEK293T cells (Extended Data Fig. 6c), demonstrating that ABE activity is not limited to HEK293T cells.
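As a rough guide to target selection, the approximate activity windows described above can be encoded in a few lines of Python. This is a sketch under stated assumptions: the window boundaries are the approximate values given in the text and can vary with the target, the protospacer is assumed to be 20 nucleotides with position 1 at the PAM-distal end (PAM at positions 21–23), and the example sequence and the names ACTIVITY_WINDOWS and adenines_in_window are hypothetical.

```python
# Approximate activity windows (1-based protospacer positions) from the text.
ACTIVITY_WINDOWS = {
    "ABE7.10": (4, 7),
    "ABE6.3": (4, 9),
    "ABE7.8": (4, 9),
    "ABE7.9": (4, 9),
}


def adenines_in_window(protospacer, editor):
    """Return 1-based positions of adenines inside the editor's window.

    protospacer is the 20-nt target sequence read 5' to 3' on the strand
    that carries the target adenine, excluding the PAM.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    first, last = ACTIVITY_WINDOWS[editor]
    return [
        position
        for position, base in enumerate(protospacer.upper(), start=1)
        if base == "A" and first <= position <= last
    ]


# Hypothetical protospacer with adenines at positions 4, 6, 9, 10, 15 and 20.
example = "GCTAGACTAATCGGATCGTA"
narrow = adenines_in_window(example, "ABE7.10")
wide = adenines_in_window(example, "ABE6.3")
```

For this example sequence, the adenine at position 9 falls inside the wider window of ABE6.3 but outside the approximate window of ABE7.10.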
Analysis of individual high-throughput DNA sequencing reads from ABE editing at 6–17 genomic sites in HEK293T cells revealed that base editing outcomes at nearby adenines within the editing window are not statistically independent events. The average normalized linkage disequilibrium between nearby target adenines increased steadily as ABE evolution proceeded (Extended Data Fig. 8), indicating that early-stage ABEs edit nearby adenines more independently, whereas late-stage ABEs edit nearby adenines more processively. These findings suggest that TadA might have evolved kinetic changes that decrease the likelihood of substrate release before additional adenines within the editing window are converted, resulting in processivity similar to that of BE33.
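The notion of non-independent editing at two nearby adenines can be made concrete with a small Python sketch. It assumes that linkage disequilibrium is normalized as the signed D′ statistic (D divided by its maximum attainable magnitude given the two editing frequencies); the normalization actually used for Extended Data Fig. 8 may differ, and the reads below are toy inputs rather than sequencing data.

```python
def normalized_ld(reads):
    """Signed D' between editing events at two adenines.

    reads is a list of (edited_at_site_1, edited_at_site_2) boolean pairs,
    one pair per sequencing read. Returns a value in [-1, 1]: 0 means the
    two sites are edited independently, 1 means edits always co-occur.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("no reads supplied")
    p1 = sum(1 for a, _ in reads if a) / n          # editing frequency, site 1
    p2 = sum(1 for _, b in reads if b) / n          # editing frequency, site 2
    p12 = sum(1 for a, b in reads if a and b) / n   # frequency of double edits
    d = p12 - p1 * p2
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    if d_max == 0:
        return 0.0  # one site is never or always edited; linkage is undefined
    return d / d_max


# Toy reads (hypothetical): independent editing versus fully coupled editing.
independent = [(True, True), (True, False), (False, True), (False, False)]
coupled = [(True, True), (True, True), (False, False), (False, False)]
```

In this framing, a more processive editor would shift reads from the mixed classes towards the doubly edited and unedited classes, raising the statistic towards 1.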
In contrast to the formation of C to non-T edits and indels that can arise from BE3-mediated base editing of cytidines, ABEs convert A•T to G•C very cleanly in HEK293T and U2OS cells, with indel frequencies and A to non-G editing similar to those of untreated cells (typically 0.1% or less) among the 17 genomic NAN sites tested (Fig. 4 and Supplementary Table 1). The undesired products of BE3 arise from uracil excision and downstream repair processes5. The remarkable product purity of all tested ABE variants suggests that the activity or abundance of enzymes that remove inosine from DNA may be low compared to those of uracil N-glycosylase (UNG), resulting in minimal base excision repair following ABE editing.
We compared the efficiency of ABE7.10-catalysed A•T to G•C editing to that of a current Cas9 nuclease-mediated HDR method, CORRECT33. At five genomic loci in HEK293T cells, we observed average target point mutation frequencies ranging from 0.47% to 4.2% with 3.3% to 10.6% indels using the CORRECT HDR method under optimized 48-h conditions (Fig. 5a). At the same five genomic loci, ABE7.10 resulted in average target mutation frequencies of 10–35% after 48 h, and 55–68% after 120 h (Fig. 5a), with less than 0.1% indels (Fig. 5b). The target mutation/indel ratio averaged 0.43 for CORRECT HDR, and more than 500 for ABE7.10, representing an improvement of over 1,000-fold in product selectivity for ABE7.10. Although HDR is well-suited to introduce insertions and deletions into genomic DNA, these results demonstrate that ABE7.10 can introduce A•T to G•C point mutations with much higher efficiency and far fewer undesired products than a current Cas9 nuclease-mediated HDR method.
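The product-selectivity comparison above is a ratio of ratios, which the following Python sketch works through using only the averages reported in the text. The function and variable names are illustrative, and the ABE7.10 value is treated as a lower bound because the text reports it as "more than 500".

```python
def selectivity(target_mutation_pct, indel_pct):
    """Ratio of desired point mutation frequency to indel frequency."""
    if indel_pct <= 0:
        raise ValueError("indel frequency must be positive")
    return target_mutation_pct / indel_pct


# Average target mutation/indel ratios as reported in the text.
CORRECT_HDR_RATIO = 0.43
ABE7_10_RATIO_LOWER_BOUND = 500.0

# Lower bound on the improvement in product selectivity.
fold_improvement = ABE7_10_RATIO_LOWER_BOUND / CORRECT_HDR_RATIO
```

Because the ABE7.10 ratio is a lower bound, the resulting value of roughly 1,160 is itself a lower bound, in line with the stated improvement of over 1,000-fold.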
Next we examined off-target editing by ABE7 variants. As no method yet exists to comprehensively profile the off-target activity of ABEs, we assumed that off-target ABE editing occurred primarily at the off-target sites that are edited when Cas9 nuclease is complexed with the same guide RNA, as is the case with BE33,8,34. We treated HEK293T cells with three well-characterized guide RNAs35 and either Cas9 nuclease or ABE7 variants, and sequenced the on-target loci and the 12 most active off-target human genomic loci associated with these guide RNAs as identified by the genome-wide GUIDE-Seq method35. The efficiency of on-target indels by Cas9 and the efficiency of on-target base editing by ABE7.10 both averaged 54% (Supplementary Tables 2–4). We observed detectable modification (0.2% indels or more) by Cas9 nuclease at nine of the 12 (75%) known off-target loci (Fig. 5c and Supplementary Tables 2–4). In contrast, when complexed with the same sgRNAs, ABE7.10, ABE7.9, or ABE7.8 produced 0.2% or more off-target base editing at only four of the 12 (33%) known Cas9 off-target sites. Moreover, the nine confirmed Cas9 off-target loci were modified with an average efficiency of 14% indels, whereas the four confirmed ABE off-target loci were modified with an average of only 1.3% A•T to G•C mutations (Supplementary Tables 2–4). Although seven of the nine confirmed Cas9 off-target loci contained at least one adenine within the ABE activity window, three of these seven off-target loci were not detectably edited by ABE7.8, 7.9, or 7.10. Together, these data suggest that ABE7 variants may be less prone to off-target genome modification than Cas9 nuclease, although a comprehensive, unbiased method of profiling the DNA specificity of ABEs is needed. In addition, we did not detect any apparent ABE-induced A•T to G•C DNA editing outside on-target or off-target protospacers following ABE treatment.
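The off-target tallies above follow from applying a 0.2% detection threshold to each candidate locus and counting the loci that meet it. A minimal Python sketch of that bookkeeping is shown below; the function names are illustrative, and the locus labels and per-locus values in the toy dictionary are hypothetical placeholders, not measurements from Supplementary Tables 2–4. Only the counts of nine and four out of 12 loci are taken from the text.

```python
DETECTION_THRESHOLD_PCT = 0.2  # modification level treated as detectable


def count_detected(modification_pct_by_locus, threshold=DETECTION_THRESHOLD_PCT):
    """Count loci whose modification frequency meets the threshold."""
    return sum(1 for pct in modification_pct_by_locus.values() if pct >= threshold)


def percent_of_loci(n_detected, n_total):
    """Express a count of detected loci as a percentage of loci examined."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_detected / n_total


# Counts reported in the text: 12 known off-target loci examined.
cas9_percent = percent_of_loci(9, 12)
abe_percent = percent_of_loci(4, 12)

# Hypothetical per-locus values used only to exercise the threshold logic.
toy_loci = {"locus_a": 0.2, "locus_b": 0.05, "locus_c": 14.0}
toy_detected = count_detected(toy_loci)
```

The threshold is inclusive here because the text defines detectable modification as 0.2% or more.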
Although additional studies are needed to examine possible RNA editing by ABEs, we observed no elevated adenine mutation rate among four abundant mRNAs in ABE7.10-treated HEK293T cells compared to untreated cells (Extended Data Table 1), nor any apparent ABE toxicity in bacterial or human cells under the conditions used here. We speculate that the evolved mutations at TadA residues known to interact with the ribose 2′-hydroxyl (Fig. 2c), the fused Cas9 nickase, or ABE nuclear localization may impede RNA editing.
Installation of disease-relevant mutations with ABE
Finally, we tested the potential of ABEs to introduce disease-suppressing mutations and to correct pathogenic mutations in human cells. Mutations in β-globin genes cause a variety of blood diseases. Humans with the rare benign condition HPFH (hereditary persistence of fetal haemoglobin) are resistant to some β-globin diseases, including sickle-cell anaemia. In certain patients, this phenotype is mediated by mutations in the promoters of the γ-globin genes HBG1 and HBG2 that enable sustained expression of fetal haemoglobin, which is normally silenced in humans around birth36. We designed an sgRNA that programs ABE to simultaneously mutate −198T to C in the promoter that drives HBG1 expression, and −198T to C in the promoter that drives HBG2 expression, by placing the target A•T base pair at protospacer position 7. These mutations are known to confer British-type HPFH and enable fetal haemoglobin production in adults37. ABE7.10 installed the desired T•A to C•G mutations in the HBG1 and HBG2 promoters with 29% and 30% efficiency, respectively, in HEK293T cells (Fig. 5c, Extended Data Fig. 9).
The iron storage disorder hereditary haemochromatosis (HHC) is an autosomal recessive genetic disorder commonly caused by a G to A mutation at nucleotide 845 in the human HFE gene, resulting in a C282Y substitution in the HFE protein that leads to excessive iron absorption and potentially life-threatening elevation of serum ferritin38. We transfected DNA encoding ABE7.10 and a guide RNA that places the target adenine at protospacer position 5 into an immortalized lymphoblastoid cell line (LCL) harbouring the HFE C282Y genomic mutation. Owing to the extreme resistance of LCL cells to transfection, we isolated transfected cells and measured editing efficiency by high-throughput DNA sequencing (HTS) of their genomic DNA. We observed the clean conversion of the Tyr282 codon to Cys282 in 28% of sequencing reads from transfected cells, with no evidence of undesired editing or indels at the on-target locus (Fig. 5c). Although much additional research is needed to develop these and other ABE editing strategies into potential clinical therapies for diseases with a genetic component, including the development of ABEs that accept a wide variety of PAMs7, these examples demonstrate the potential of ABEs to correct disease-driving mutations, and to install mutations known to suppress genetic disease phenotypes, in human cells.
In summary, seven rounds of evolution and engineering transformed a protein with no ability to deaminate adenine at target loci in DNA (wild-type TadA–dCas9) into forms that edit DNA weakly (ABE1s and ABE2s), variants that edit limited subsets of sites efficiently (ABE3s, ABE4s, and ABE5s), and, ultimately, highly active ABEs with broad sequence compatibility (ABE6s and ABE7s). We recommend ABE7.10 for general A•T to G•C base editing. When the target adenine is at protospacer positions 8–10, ABE7.9, ABE7.8, or ABE6.3 may offer higher editing efficiencies than ABE7.10, although conversion efficiencies at these positions are typically lower than at protospacer positions 4–7. The development of ABEs greatly expands the capabilities of base editing and the fraction of pathogenic SNPs that can be addressed by genome editing without introducing DSBs (Fig. 1a). Together with BE3 (ref. 3) and BE4 (ref. 5), these ABEs advance the field of genome editing by enabling the direct installation of all four transition mutations at target loci in living cells with a minimum of undesired byproducts.
DNA amplification was conducted by PCR using Phusion U Green Multiplex PCR Master Mix (Thermo Fisher Scientific) or Q5 Hot Start High-Fidelity 2× Master Mix (New England BioLabs) unless otherwise noted. All mammalian cell and bacterial plasmids were assembled using the USER cloning method as previously described40 and starting material gene templates were synthetically accessed as either bacterial or mammalian codon-optimized gBlock Gene Fragments (Integrated DNA Technologies). All sgRNA expression plasmids were constructed by one-piece blunt-end ligation of a PCR product containing a variable 20-nucleotide sequence corresponding to the desired sgRNA targeted site. Primers and templates used in the synthesis of all sgRNA plasmids are listed in Supplementary Table 5. All mammalian ABE constructs, sgRNA plasmids and bacterial constructs were transformed and stored as glycerol stocks at −80 °C in Mach1 T1R Competent Cells (Thermo Fisher Scientific), which are recA−. Molecular biology-grade HyClone water (GE Healthcare Life Sciences) was used in all assays and PCR reactions. All vectors used in evolution experiments and mammalian cell assays were purified using ZymoPURE Plasmid Midiprep (Zymo Research Corporation), which includes endotoxin removal. Antibiotics used for either plasmid maintenance or selection during evolution were purchased from Gold Biotechnology.
Generation of bacterial TadA* libraries (evolution rounds 1–3, 5 and 7)
In brief, libraries of bacterial ABE constructs were generated by two-piece USER assembly of a PCR product containing a mutagenized E. coli TadA gene and a PCR product containing the remaining portion of the editor plasmid (including the XTEN linker, dCas9, sgRNA, selectable marker, origin of replication, and promoter). Specifically, mutations were introduced into the starting template (Supplementary Table 7) in 8 × 25 μl PCR reactions containing 75 ng–1.2 μg of template using Mutazyme II (Agilent Technologies) following the manufacturer’s protocol and primers NMG-822 and NMG-823 (Supplementary Table 6). After amplification, the resulting PCR products were pooled and purified from polymerase and reaction buffer using a MinElute PCR Purification Kit (Qiagen). The PCR product was treated with DpnI (NEB) at 37 °C for 2 h to digest any residual template plasmid. The desired PCR product was subsequently purified by gel electrophoresis using a 1% agarose gel containing 0.5 μg/ml ethidium bromide. The PCR product was extracted from the gel using the QIAquick Gel Extraction Kit (Qiagen) and eluted with 30 μl H2O. Following gel purification, the mutagenized ecTadA DNA fragment was amplified with primers NMG-825 and NMG-826 (Supplementary Table 6) using Phusion U Green Multiplex PCR Master Mix (8 × 50 μl PCR reactions, 66 °C annealing, 20-s extension) in order to install the appropriate USER junction sequences onto the 5′ and 3′ ends of the fragment. The resulting PCR product was purified by gel electrophoresis. Next, the backbone of the bacterial base editor plasmid template (Supplementary Table 7) was amplified with primers NMG-799 and NMG-824 (Supplementary Table 6) and Phusion U Green Multiplex PCR Master Mix (100 μl per well in a 96-well PCR plate, 5–6 plates total, 66 °C, 4.5-min extension) following the manufacturer’s protocol.
Each PCR reaction was combined with 300 μl PB DNA binding buffer (Qiagen) and 25 ml of the pooled solution was loaded onto a HiBind DNA Midi column (Omega Bio-Tek). Bound DNA was washed with five column volumes of PE wash buffer (Qiagen) and the DNA fragment was eluted with 800 μl H2O per column. Both DNA fragments were quantified using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific).
TadA* libraries were assembled following a previously reported USER assembly procedure40 with the following conditions: 0.22 pmol ecTadA mutagenized DNA fragment 1, 0.22 pmol plasmid backbone fragment 2, 1 U of USER (Uracil-Specific Excision Reagent, New England Biolabs) enzyme, and 1 U of DpnI enzyme (New England Biolabs) per 10 μl USER assembly mixture were combined in 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/ml BSA at pH 7.9 (1× CutSmart Buffer, New England Biolabs). Generally, each round of evolution required ~1 ml USER assembly mixture (22 pmol of each DNA assembly fragment), which was distributed into 10-μl aliquots across multiple 8-well PCR strips. The reactions were warmed to 37 °C for 60 min, then heated to 80 °C for 3 min to denature the two enzymes. The assembly mixture was slowly cooled to 12 °C at 0.1 °C/s in a thermocycler to promote annealing of the freshly generated ends of the two USER junctions.
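The scale-up from the per-aliquot recipe to a full round of evolution is simple proportional arithmetic. The sketch below reproduces it in Python; the dictionary keys and the function name `scale_assembly` are illustrative and are not taken from the original protocol.

```python
# Amounts per 10 µl of USER assembly mixture, as stated in the text.
PER_10_UL = {
    "ecTadA fragment (pmol)": 0.22,
    "backbone fragment (pmol)": 0.22,
    "USER enzyme (U)": 1.0,
    "DpnI (U)": 1.0,
}


def scale_assembly(total_ul, aliquot_ul=10.0):
    """Return (number of aliquots, total amount of each component)."""
    n_aliquots = total_ul / aliquot_ul
    totals = {name: round(amount * n_aliquots, 6)
              for name, amount in PER_10_UL.items()}
    return n_aliquots, totals


# ~1 ml of assembly mixture per round of evolution.
n_aliquots, totals = scale_assembly(1000.0)
print(n_aliquots, totals)
```

For a 1-ml assembly this gives 100 aliquots of 10 μl and 22 pmol of each fragment, matching the figure quoted above.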
With a library of constructs in hand, we removed denatured enzymes and reaction buffer from the assembly mixture by adding 5 vol PB buffer (Qiagen) to the assembly reaction mixture and binding the material onto a MinElute column (480 μl per column). ABE hybridized library constructs were eluted in 30 μl H2O per column and 2 μl of this eluted material was added to 20 μl NEB 10-beta electrocompetent E. coli and electroporated with a Lonza 4D-Nucleofector System using bacterial program 5 in a 16-well Nucleocuvette strip. A typical round of evolution used ~300 electroporations to generate 5–10 million colony forming units (c.f.u.). Freshly electroporated E. coli were recovered in 200 ml pre-warmed Davis Rich Medium (DRM) at 37 °C, and incubated with shaking at 200 r.p.m. in a 500-ml vented baffled flask for 15 min before the addition of 30 μg/ml carbenicillin (for plasmid maintenance). The culture was incubated at 37 °C with shaking at 200 r.p.m. for 18 h. The plasmid library was isolated with a ZymoPure Plasmid Midiprep kit following the manufacturer’s procedure (50 ml culture per DNA column), except that the plasmid library was eluted in 200 μl pre-warmed water per column. Evolution rounds 1–3, 5 and 7 followed this procedure in order to generate the corresponding libraries with minor variations (Supplementary Table 7).
Generation of site-saturated bacterial TadA* library (evolution round 4)
Mutagenesis at Arg24, Glu25, Arg107, Ala142, and Ala143 of ecTadA was achieved by using ecTadA*(2.1)–dCas9 as a template and amplifying with appropriately designed degenerate NNK-containing primers (Supplementary Table 6). Briefly, the ecTadA*(2.1)–dCas9 template was amplified separately with two sets of primers: NMG-1197 + NMG-1200, and NMG-1199 + NMG-1202, using Phusion U Green Multiplex PCR Master Mix, to form PCR product 1 and PCR product 2, respectively. Both PCR products were purified individually using PB binding buffer and a MinElute column and eluted with 20 μl H2O per 200 μl PCR reaction. In a third PCR reaction, 1 μl PCR product 1 and 1 μl PCR product 2 were combined with the exterior, uracil-containing primers NMG-1202 and NMG-1197, and amplified by Phusion U Green Multiplex PCR Master Mix to form the desired extension-overlap PCR product with flanking uracil-containing USER junctions. In a fourth PCR reaction, ecTadA*(2.1)–dCas9 was amplified with NMG-1201 and NMG-1198 to generate the backbone DNA fragment for USER assembly. After DpnI digestion and gel purification of both USER assembly fragments, the extension-overlap PCR product (containing the desired NNK mutations in ecTadA) was incorporated into the ecTadA*(2.1)–dCas9 backbone by USER assembly as described above. The freshly generated NNK library was transformed into NEB 10-beta electrocompetent E. coli and the DNA was harvested as described above.
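For orientation, the theoretical diversity of an NNK library can be counted directly from the standard genetic code (N = A, C, G or T; K = G or T). The sketch below is not part of the original methods, and it assumes that all five listed positions are randomized in the same construct.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a in BASES for b in BASES for c in "GT"]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

n_positions = 5  # Arg24, Glu25, Arg107, Ala142, Ala143 (assumed simultaneous)
dna_variants = len(nnk_codons) ** n_positions
protein_variants = len(encoded - {"*"}) ** n_positions

print(len(nnk_codons), len(encoded - {"*"}), stop_codons)
print(dna_variants, protein_variants)
```

Under these assumptions NNK provides 32 codons per position that cover all 20 amino acids with a single stop codon (TAG), which is why it is a common choice for site-saturation libraries.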
Generation of DNA-shuffled bacterial TadA* library (evolution round 6)
DNA shuffling was achieved by a modified version of the nucleotide exchange and excision technology (NExT) DNA shuffling method41. Solutions of 10 mM each of dATP, dCTP, dGTP and dTTP/dUTP (7/3) were freshly prepared. Next, the TadA* fragment was amplified from 20 fmol of a pool of TadA*–XTEN–dCas9 bacterial constructs isolated from evolution rounds 1–5 in equimolar concentrations using Taq DNA Polymerase (NEB), primers NMG-822 and NMG-823 (Supplementary Table 6), and 400 μM each of dATP, dCTP, dGTP, and dUTP/dTTP (3/7) in 1× ThermoPol Reaction Buffer (63 °C, 1.5-min extension time). The freshly generated uracil-containing DNA library fragment was purified by gel electrophoresis and extracted with QIAquick Gel Extraction Kit (Qiagen), eluting with 20 μl H2O per extraction column. The purified DNA product was digested with 2 U of USER enzyme per 40 μl in 1× CutSmart Buffer at 37 °C and monitored by analytical agarose gel electrophoresis until digestion was complete. The reaction was quenched with 10 vol PN1 binding buffer (Qiagen) when the starting material was no longer observed (typically 3–4 h at 37 °C). Additional USER enzyme was added to the reaction if needed. The digested material was purified with QiaexII kit (Qiagen) using the manufacturer’s protocol and the DNA fragments were eluted in 50 μl pre-warmed H2O per column.
The purified shuffled TadA* fragments were reassembled into full-length TadA* amplicons by an internal primer extension procedure as follows. The eluted digested DNA fragments (25 μl) were combined with 4 U Vent Polymerase (NEB), 800 μM each of dATP, dCTP, dGTP, and dTTP, and 1 U Taq DNA polymerase in 1× ThermoPol Buffer supplemented with 0.5 mM MgSO4. The thermocycler program for the reassembly procedure was as follows: 94 °C for 3 min, 40 cycles of denaturation at 92 °C for 30 s, annealing over 60 s at increasing temperatures starting at 30 °C and adding 1 °C per cycle (cooling ramp 1 °C/s), and extension at 72 °C for 60 s with an additional 4 s per cycle, ending with one final cycle of 72 °C for 10 min. The full-length reassembled product was amplified by PCR with the following conditions: 15 μl unpurified internal assembly was combined with 1 μM each of USER primers NMG-825 and NMG-826, 100 μl Phusion U Green Multiplex PCR Master Mix and H2O to a final volume of 200 μl, 63 °C annealing, extension time 30 s. The PCR product was purified by gel electrophoresis and assembled using the USER method into the corresponding ecTadA*–XTEN–dCas9 backbone with corresponding flanking USER junctions generated from amplification of the backbone with USER primers NMG-799 and NMG-824 as before. Following transformation of the USER assembly products into NEB 10-beta electrocompetent E. coli, the library of evolution round 6 constructs was isolated using a ZymoPURE Plasmid Midiprep kit following the manufacturer’s instructions.
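The reassembly program changes its annealing temperature and extension time on every cycle, which is easier to check when written out explicitly. The following sketch generates the per-cycle settings; it assumes that the first cycle anneals at 30 °C with a 60-s extension and that the increments apply from the second cycle onward, a reading of the text rather than a confirmed instrument setting. The function names are hypothetical.

```python
def reassembly_schedule(n_cycles=40, anneal_start_c=30, anneal_step_c=1,
                        extend_start_s=60, extend_step_s=4):
    """List per-cycle (temperature in deg C, hold in s) for each step."""
    steps = []
    for i in range(n_cycles):
        steps.append({
            "cycle": i + 1,
            "denature": (92, 30),
            "anneal": (anneal_start_c + i * anneal_step_c, 60),
            "extend": (72, extend_start_s + i * extend_step_s),
        })
    return steps


def total_hold_seconds(steps, initial_s=180, final_s=600):
    """Sum of programmed holds; ramp times between steps are ignored."""
    cycling = sum(s["denature"][1] + s["anneal"][1] + s["extend"][1]
                  for s in steps)
    return initial_s + cycling + final_s


schedule = reassembly_schedule()
print(schedule[0], schedule[-1])
print(total_hold_seconds(schedule) / 60, "min of programmed holds")
```

Under this reading the last cycle anneals at 69 °C and extends for 216 s, and the program holds for 165 min in total before ramp times are added.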
Bacterial evolution of TadA variants
The previously described strain S1030 (ref. 42) was used in all evolution experiments and an electrocompetent version of the bacteria was prepared as previously described40 harbouring the appropriate selection plasmid specific to each round of evolution (Supplementary Table 7). In brief, 2 μl freshly generated TadA* library (300–600 ng/μl) prepared as described above was added to 22 μl freshly prepared electrocompetent S1030 cells containing the target selection plasmid and electroporated with a Lonza 4D-Nucleofector System using bacterial program 5 in a 16-well Nucleocuvette strip. A typical selection used 5–10 × 10^6 c.f.u. After electroporation, freshly transformed S1030 cells were recovered in a total of 250 ml pre-warmed DRM at 37 °C with shaking at 200 r.p.m. for 15 min. Following this brief recovery incubation, carbenicillin was added to a final concentration of 30 μg/ml to maintain the library plasmid, along with the appropriate antibiotic to maintain the selection plasmid; see Supplementary Table 7 for the list of selection conditions, including the antibiotics used for each round. Immediately after the addition of the plasmid maintenance antibiotics, l-arabinose was added to the culture to a final concentration of 100 mM to induce expression of TadA*–dCas9 fusion library members from the PBAD promoter. The culture was grown to saturation at 37 °C with shaking at 200 r.p.m. for 18 h, except that the incubation time for evolution round 5 was only 7 h.
Library members were challenged by plating 10 ml of the saturated culture onto each of four 245-mm square bioassay dishes containing 1.8% agar-2×YT, 30 μg/ml plasmid maintenance antibiotics, and a concentration of the selection antibiotic pre-determined to be above the minimum inhibitory concentration (MIC) of the S1030 strain harbouring the selection plasmid alone (Supplementary Table 8). Plates were incubated at 37 °C for 2 days and ~500 surviving colonies were isolated. The TadA* genes from these colonies were amplified by PCR with primers NMG-822 and NMG-823 (Supplementary Table 6) and submitted for DNA sequencing. Concurrently, the colonies were inoculated separately into 1-ml DRM cultures in a 96-deep well plate and grown overnight at 37 °C, 200 r.p.m. Aliquots (100 μl) of each overnight culture were pooled, the plasmid DNA was isolated, and the TadA* genes were amplified with USER primers NMG-825 and NMG-826 (Supplementary Table 6). The TadA* genes were subcloned back into the plasmid backbone (containing the XTEN linker–dCas9, and appropriate guide RNAs) with the USER assembly protocol described above. This enriched library was transformed into the appropriate S1030 (+ selection plasmid) electrocompetent cells, incubated with maintenance antibiotic and l-arabinose, and re-challenged with the selection condition. After a 2-day incubation, 300–400 surviving clones were isolated as described above and their TadA* genes were sequenced. Mutations arising from each selection round were imported into mammalian ABE constructs and tested in mammalian cells as described below.
General mammalian cell culture conditions
HEK293T (ATCC CRL-3216) and U2OS (ATCC HTB-96) cells were purchased from ATCC and cultured and passaged in Dulbecco’s modified Eagle’s medium (DMEM) plus GlutaMax (Thermo Fisher Scientific) supplemented with 10% (v/v) fetal bovine serum (FBS). HAP1 (Horizon Discovery, C631) and HAP1 AAG− cells (Horizon Discovery, HZGHC001537c002) were maintained in Iscove’s modified Dulbecco’s medium (IMDM) plus GlutaMax (Thermo Fisher Scientific) supplemented with 10% (v/v) FBS. Lymphoblastoid cell lines (LCL) containing a C282Y mutation in the HFE gene (Coriell Biorepository, GM14620) were maintained in Roswell Park Memorial Institute medium 1640 (RPMI-1640) plus GlutaMax (Thermo Fisher Scientific) supplemented with 20% FBS. All cell types were incubated, maintained, and cultured at 37 °C with 5% CO2. Cell lines were authenticated by the suppliers and tested negative for mycoplasma.
HEK293T tissue culture transfection protocol and genomic DNA preparation
HEK293T cells grown in the absence of antibiotic were seeded on 48-well poly-d-lysine coated plates (Corning). 12–14 h after seeding, cells were transfected at approximately 70% confluency with 1.5 μl Lipofectamine 2000 (Thermo Fisher Scientific) according to the manufacturer’s protocols and 750 ng ABE plasmid, 250 ng sgRNA expression plasmid, and 10 ng GFP expression plasmid (Lonza). Unless otherwise stated, cells were cultured for 5 days, with a medium change on day 3. Medium was removed, cells were washed with 1 × PBS solution (Thermo Fisher Scientific), and genomic DNA was extracted by addition of 100 μl freshly prepared lysis buffer (10 mM Tris-HCl, pH 7.0, 0.05% SDS, 25 μg/ml Proteinase K (ThermoFisher Scientific)) directly into each well of the tissue culture plate. The genomic DNA mixture was transferred to a 96-well PCR plate and incubated at 37 °C for 1 h, followed by an 80 °C enzyme denaturation step for 30 min. Primers used for mammalian cell genomic DNA amplification are listed in Supplementary Table 9.
Nucleofection of HAP1 and HAP1 AAG− cells and genomic DNA extraction
HAP1 and HAP1 AAG− cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit S (Lonza) according to the manufacturer’s protocol. In brief, 4 × 10^5 cells were nucleofected with 300 ng of ABE plasmid and 100 ng of sgRNA expression plasmid using the 4D-Nucleofector program DZ-113 and cultured in 250 μl medium in a 48-well poly-d-lysine coated culture plate for 3 days. DNA was extracted as described above.
Nucleofection of U2OS cells and genomic DNA extraction
U2OS cells were nucleofected using the SG Cell Line 4D-Nucleofector X Kit (Lonza) according to the manufacturer’s protocol. In brief, 1.25 × 10^5 cells were nucleofected in 20 μl SG buffer along with 500 ng ABE plasmid and 100 ng sgRNA expression plasmid using the 4D-Nucleofector program EH-100 in a 16-well Nucleocuvette strip (20 μl cells per well). Freshly nucleofected cells were transferred into 250 μl medium in a 48-well poly-d-lysine coated culture plate. Cells were incubated for 5 days and medium was changed every day. DNA was extracted as described above.
Electroporation of LCL HFE C282Y cells
LCL cells were electroporated using a Gene Pulser Xcell Electroporator (Bio-Rad) and 0.4-cm gap Gene Pulser electroporation cuvettes (Bio-Rad). In brief, 1 × 10^7 LCL cells were resuspended in 250 μl RPMI-1640 plus GlutaMax. To this medium was added 65 μg plasmid expressing ABE7.10, GFP, and the corresponding sgRNA targeting the C282Y mutation in the HFE gene. The mixture was added to a pre-chilled 0.4-cm gap electroporation cuvette and the cell–DNA mixture was incubated in the cuvette on ice for 10 min. Cells were pulsed at 250 V and 950 μF for 3 ms. Cells were transferred back onto ice for 10 min, then transferred to 15 ml pre-warmed RPMI-1640 supplemented with 20% FBS in a T-75 flask. The next day, an additional 5 ml medium was added to the flask and cells were left to incubate for a total of 5 days. After incubation, cells were isolated by centrifugation, resuspended in 400 μl medium, filtered through a 40-μm strainer (Thermo Fisher Scientific), and sorted for GFP fluorescence using a FACSAria III Flow Cytometer (Becton Dickinson Biosciences). GFP-positive cells were collected in a 1.5-ml tube containing 500 μl medium. After centrifugation, the medium was removed and cells were washed twice with 600 μl 1× PBS (Thermo Fisher Scientific). Genomic DNA was extracted as described above.
Comparison between ABE7.10 and HDR using the CORRECT method
HEK293T cells grown in the absence of antibiotic were seeded on 48-well poly-d-lysine coated plates (Corning). After 12–14 h, cells were transfected at ~70% confluency with 750 ng Cas9 or base editor plasmid, 250 ng sgRNA expression plasmid, 1.5 μl Lipofectamine 3000 (Thermo Fisher Scientific), and for HDR assays 0.7 μg single-stranded donor DNA template (100 nt, PAGE-purified from IDT) according to the manufacturer’s instructions. Single-stranded 100-mer oligonucleotide donor templates are listed in Supplementary Table 10.
Genomic DNA was harvested 48 h after transfection (as described43) using the Agencourt DNAdvance Genomic DNA isolation Kit (Beckman Coulter) according to the manufacturer’s instructions. A size-selective DNA isolation step ensured that there was no risk of contamination by the single-stranded donor DNA template in subsequent PCR amplification and sequencing steps. We re-designed amplification primers to ensure that there was minimal risk of amplifying the donor oligo template.
HTS of genomic DNA samples
Genomic sites of interest were amplified by PCR with primers containing homology to the region of interest and the appropriate Illumina forward and reverse adapters. Primer pairs used in this first round of PCR (PCR 1) for all genomic sites are listed in Supplementary Table 9. Specifically, 25 μl of a given PCR 1 reaction was assembled containing 0.5 μM of each forward and reverse primer, 1 μl genomic DNA extract and 12.5 μl Phusion U Green Multiplex PCR Master Mix. PCR reactions were carried out as follows: 95 °C for 2 min, then 30 cycles of (95 °C for 15 s, 62 °C for 20 s, and 72 °C for 20 s), followed by a final 72 °C extension for 2 min. PCR products were verified by comparison with DNA standards (Quick-Load 100 bp DNA ladder) on a 2% agarose gel supplemented with ethidium bromide. Unique Illumina barcoding primer pairs were added to each sample in a secondary PCR reaction (PCR 2). Specifically, 25 μl of a given PCR 2 reaction was assembled containing 0.5 μM of each unique forward and reverse Illumina barcoding primer pair, 2 μl unpurified PCR 1 reaction mixture, and 12.5 μl Q5 Hot Start High-Fidelity 2× Master Mix. The barcoding PCR 2 reactions were carried out as follows: 95 °C for 2 min, then 15 cycles of (95 °C for 15 s, 61 °C for 20 s, and 72 °C for 20 s), followed by a final 72 °C extension for 2 min. PCR products were purified by electrophoresis with a 2% agarose gel using a QIAquick Gel Extraction Kit, eluting with 30 μl H2O. DNA concentration was quantified with the KAPA Library Quantification Kit-Illumina (KAPA Biosystems) and sequenced on an Illumina MiSeq instrument according to the manufacturer’s protocols.
General HTS data analysis
Sequencing reads were demultiplexed in MiSeq Reporter (Illumina). Alignment of amplicon sequences to a reference sequence was performed as previously described using a Matlab script with improved output format (Supplementary Note 1). In brief, the Smith–Waterman algorithm was used to align sequences without indels to a reference sequence; bases with a quality score of less than 30 were converted to ‘N’ to prevent base miscalling as a result of sequencing error. Indels were quantified separately using a modified version of a previously described Matlab script in which sequencing reads with more than half the base calls below a quality score of Q30 were filtered out (Supplementary Note 2). Indels were counted as reads which contained insertions or deletions of greater than or equal to 1 bp within a 30-bp window surrounding the predicted Cas9 cleavage site.
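The two quality rules described here (masking individual base calls below Q30, and discarding reads in which more than half of the calls fall below Q30) can be sketched in a few lines of Python. This is not the Matlab script from the Supplementary Notes. It assumes Phred+33 quality encoding, and the function names are illustrative.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mask_low_quality(bases, scores, min_q=30):
    """Replace every base call with quality below min_q by 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, scores))


def passes_read_filter(scores, min_q=30):
    """Keep a read unless more than half of its calls are below min_q."""
    n_low = sum(1 for q in scores if q < min_q)
    return n_low <= len(scores) / 2


# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
scores = phred_scores("IIII#I")
masked = mask_low_quality("ACGTAC", scores)
print(masked, passes_read_filter(scores))
```

Masking rather than discarding individual low-quality calls keeps the rest of the read usable while preventing an uncertain call from being counted as an edit.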
Owing to homology in the HBG1 and HBG2 loci, primers were designed that would amplify both loci within a single PCR reaction. In order to computationally separate sequences of these two genomic sites, sequencing experiments involving this amplicon were processed using a separate Python script (Supplementary Note 3). In brief, reads were disregarded if more than half of the base calls were below Q30, and base calls with a quality score below Q30 were converted to N. HBG1 or HBG2 reads were identified as having an exact match to a 37-bp sequence containing two SNPs that differ between the sites. A base calling and indel window was defined by exact matches to 10-bp flanking sequences on both sides of a 43-bp window centred on the protospacer sequence. Indels were counted as reads in which this base calling window differed in length by 1 bp or more. This Python script yields output with identical quality to the aforementioned Matlab script (estimated base calling error rate of <1 in 1,000), but in far less time owing to the absence of an alignment step.
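The alignment-free approach described above reduces to three string operations: assign a read to a locus by exact match, cut out the window between two flanking sequences, and compare the window length with the expected length. The sketch below illustrates the idea with short placeholder sequences. The signatures, flanks and window length are hypothetical toy values (not the 37-bp, 10-bp and 43-bp sequences used in the study), and the length-difference threshold is left as a parameter.

```python
def classify_read(read, signatures):
    """Return the locus whose signature occurs in the read, else None."""
    hits = [name for name, sig in signatures.items() if sig in read]
    return hits[0] if len(hits) == 1 else None


def extract_window(read, left_flank, right_flank):
    """Return the sequence between two exact flank matches, else None."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def has_indel(window, expected_len, min_diff=1):
    """Flag a window whose length differs from expected by >= min_diff."""
    return abs(len(window) - expected_len) >= min_diff


# Hypothetical toy sequences, for illustration only.
SIGNATURES = {"HBG1": "ACGTTA", "HBG2": "ACGATA"}
LEFT, RIGHT, WINDOW_LEN = "CCAAT", "GGTTA", 6

read_a = "GGACGTTACCAATTTGACCGGTTAAA"   # intact 6-nt window
read_b = "GGACGATACCAATTTGCCGGTTAAA"    # 1-nt deletion in the window

for read in (read_a, read_b):
    window = extract_window(read, LEFT, RIGHT)
    print(classify_read(read, SIGNATURES), window,
          has_indel(window, WINDOW_LEN))
```

Because every step is an exact substring search, no alignment is needed, which is consistent with the speed advantage noted in the text. Reads that lack an exact flank match return `None` and would need to be discarded or handled separately.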
To calculate the total number of edited reads as a proportion of the total number of successfully sequenced reads, the fraction of edited reads as measured by the alignment algorithm was multiplied by (1 – fraction of reads containing an indel).
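This correction is needed because the alignment step reports editing only among reads without indels. A minimal sketch of the calculation follows; the function name and the example values are hypothetical.

```python
def edited_fraction_of_all_reads(edited_fraction_aligned, indel_fraction):
    """Rescale the editing fraction among indel-free reads to all reads."""
    return edited_fraction_aligned * (1.0 - indel_fraction)


# Hypothetical example: 50% editing among aligned reads, 2% indel reads.
corrected = edited_fraction_of_all_reads(0.50, 0.02)
print(round(corrected, 4))
```

With these illustrative inputs the corrected value is 49% of all successfully sequenced reads.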
RNA isolation from HEK293T cells and analysis
HEK293T cells were plated and a subset was transfected with ABE7.10 as described above and incubated for five days before being removed from the plate using TrypLE Express (Thermo Fisher Scientific) and pelleted. RNA was extracted using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. cDNA was generated from the isolated RNA using the ProtoScript II First Strand cDNA Synthesis Kit (New England Biolabs) according to the manufacturer’s instructions with a mixture of random primers and Oligo-dT primers. Amplification of the cDNA for high-throughput sequencing was performed to the top of the linear range (29 cycles for all four amplicons) using qPCR as described above. High-throughput sequencing of the amplicons was performed as described above. Sequences were aligned to the reference sequence for each RNA, obtained from the NCBI.
Linkage disequilibrium analysis
A custom Python script (Supplementary Note 4) was used to assess editing probabilities at the primary target adenine (P1), at the secondary target adenine (P2), and at both the primary and secondary target adenines (P1,2). Linkage disequilibrium was then evaluated as P1,2 – (P1 × P2). Linkage disequilibrium values were normalized with a normalization factor of Min(P1(1 – P2), (1 – P1)P2). This normalization controls for allele frequency and yields a normalized linkage disequilibrium value from 0 to 1.
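The normalization can be written out as a short function. The sketch below follows the formula exactly as stated in the text and is not the script from Supplementary Note 4. Returning `None` when the normalization factor is zero is an assumption made here for illustration, and negative raw values are not treated specially.

```python
def normalized_ld(p1, p2, p12):
    """Normalized linkage disequilibrium between two target adenines.

    p1, p2: fraction of reads edited at the primary / secondary adenine.
    p12: fraction of reads edited at both positions.
    """
    d = p12 - p1 * p2
    d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    if d_max == 0:
        return None  # undefined when either site is never or always edited
    return d / d_max


# Hypothetical frequencies: every read edited at P2 is also edited at P1.
linked = normalized_ld(0.5, 0.4, 0.4)
# Hypothetical frequencies: editing at the two sites is independent.
independent = normalized_ld(0.5, 0.4, 0.2)
print(linked, independent)
```

In the first example the two edits always co-occur when possible, giving a value of 1; in the second the joint frequency equals the product of the individual frequencies, giving 0.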
Scripts used in this work can be found in Supplementary Information, Supplementary Notes 1–4.
High-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under accession code SRP119577. Plasmids encoding ABE6.3, ABE7.8, ABE7.9, and ABE7.10 are available from Addgene.
This work was supported by DARPA HR0011-17-2-0049, US NIH RM1 HG009490, R01 EB022376, and R35 GM118062, and HHMI. A.C.K. and D.I.B. were Ruth L. Kirschstein National Research Service Awards Postdoctoral Fellows (F32 GM112366 and F32 GM106621, respectively). M.S.P. was an NSF Graduate Research Fellow and was supported by training grant T32 GM008313. We thank Z. Niziolek for technical assistance. N.M.G. thanks A. E. Martin for his encouragement.