Date: 2019-05-20
Abstract
Base editing requires that the target sequence satisfy the protospacer adjacent motif requirement of the Cas9 domain and that the target nucleotide be located within the editing window of the base editor. To increase the targeting scope of base editors, we engineered six optimized adenine base editors (ABEmax variants) that use SpCas9 variants compatible with non-NGG protospacer adjacent motifs. To increase the range of target bases that can be modified within the protospacer, we use circularly permuted Cas9 variants to produce four cytosine and four adenine base editors with an editing window expanded from ~4–5 nucleotides to up to ~8–9 nucleotides and reduced byproduct formation. This set of base editors improves the targeting scope of cytosine and adenine base editing.
Data availability
Plasmids encoding modified PAM adenine base editors and circularly permuted cytosine and adenine base editors have been deposited to Addgene. High-throughput sequencing data are deposited in the NCBI Sequence Read Archive (PRJNA498804).
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements
This work was supported by US NIH (grant nos. U01 AI142756, RM1 HG009490, R01 EB022376 and R35 GM118062); the St. Jude Research Consortium; the Ono Pharma Foundation; DARPA (grant no. HR0011-17-2-0049); and the HHMI. T.P.H. and K.T.Z. were supported by the Harvard Chemical Biology Program NIH Training Grant (no. T32 GM095450). S.M.M. was supported by an NSF graduate fellowship. B.L.O. was supported by the Innovative Genomics Institute Entrepreneurial Fellowship Program. C.F. is supported by US NIH grants (nos. K99 GM118909 and R00 GM118909). D.F.S. was supported by US NIH (grant no. DP2 EB018658).
Integrated supplementary information
Supplementary Fig. 1 Indel frequencies for ABE PAM variants at six NGA PAM sites, six NGCG PAM sites or three other PAM sites.
(a) Percent of all sequencing reads containing an indel following modification by VRQR-ABEmax, xABEmax, ABEmax, or NG-ABEmax at six genomic sites containing an NGA PAM in HEK293T cells. (b) Percent of all sequencing reads containing an indel following modification by VRQR-ABEmax, VRER-ABEmax, xABEmax, ABEmax, or NG-ABEmax at six genomic sites containing an NGCG PAM in HEK293T cells. (c) Percent of all sequencing reads containing an indel following modification by VRQR-ABEmax, xABEmax, ABEmax, or NG-ABEmax at three genomic sites (PAM: GAT, CGCC, TGCC) in HEK293T cells. Values and error bars reflect the mean ± s.d. of three independent biological replicates performed by different researchers on different days.
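The quantities plotted in these panels (percent of reads containing an indel, summarized as mean ± s.d. over three replicates) can be illustrated with a minimal Python sketch. The read counts below are hypothetical, the function names are illustrative only, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not state which estimator was used.

```python
from statistics import mean, stdev


def indel_percent(indel_reads, total_reads):
    """Percent of all sequencing reads that contain an indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * indel_reads / total_reads


def summarize(replicates):
    """Return (mean, sample s.d.) across biological replicates.

    Assumption: sample s.d. with an n - 1 denominator.
    """
    return mean(replicates), stdev(replicates)


# Hypothetical (indel reads, total reads) for three biological replicates.
counts = [(120, 10000), (150, 10000), (90, 10000)]
percents = [indel_percent(indels, total) for indels, total in counts]
avg, sd = summarize(percents)
print(f"{avg:.2f} ± {sd:.2f} % of reads contain an indel")
```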
Supplementary Fig. 2 Peak editing position and indel frequencies for SaABEmax and SaKKH-ABEmax at six NNGRRT or six NNHRRT PAM sites.
(a) Top: Percent of all sequencing reads containing an indel following modification by SaABEmax or SaKKH-ABEmax at six genomic sites containing an NNGRRT PAM. Bottom: Representative sample of the top two allelic outcomes at six genomic sites containing NNGRRT PAMs following modification with SaABEmax (n = 1 shown) in HEK293T cells. The top allelic outcome is the unmodified amplicon, followed by the most common editing outcome being a single A-to-G conversion (4 out of 6 sites) within a shifted window (protospacer positions 7–11). (b) Top: Percent of all sequencing reads containing an indel following modification by SaABEmax or SaKKH-ABEmax at six genomic sites containing an NNHRRT PAM (where H = A, C, or T). Bottom: Representative sample of the top two allelic outcomes at six genomic sites containing NNHRRT PAMs following modification with SaKKH-ABEmax (n = 1 shown) in HEK293T cells. The top allelic outcome is the unmodified amplicon, followed by the most common editing outcome being a single A-to-G conversion (5 out of 6 sites) within a shifted window (protospacer positions 7–11). Values and error bars reflect the mean ± s.d. of three independent biological replicates performed by different researchers on different days.
Supplementary Fig. 3 Indel frequencies for CP1300-CBEmax and CP1300-ABEmax at five genomic sites.
(a) Percent of all sequencing reads containing an indel following modification by CP-CBEmax variants compared to CBEmax at five genomic sites containing a variety of adenines and cytosines in HEK293T cells. (b) Percent of all sequencing reads containing an indel following modification by CP-ABEmax variants compared to ABEmax at five genomic sites containing a variety of adenines and cytosines in HEK293T cells. Values and error bars reflect the mean ± s.d. of three independent biological replicates performed by different researchers on different days.
Supplementary Fig. 4 Base-editing frequency for CP1300-CBEmax and CP1300-ABEmax at five genomic sites containing adenines and cytosines.
Base editing with (a) CP1300-CBEmax and (b) CP1300-ABEmax at five genomic sites containing a variety of adenines and cytosines in HEK293T cells. Nucleobase conversion is highly site dependent, with minimal activity at most of the five genomic sites tested. Values and error bars reflect the mean ± s.d. of three independent biological replicates performed by different researchers on different days.
Supplementary Fig. 5 Out-of-protospacer C·G-to-T·A conversion by CP1012-CBEmax, CP1028-CBEmax and CP1041-CBEmax variants.
Three of the five genomic sites treated with CP-CBEmax variants exhibited both nontarget strand editing and out-of-protospacer editing. Representative samples of three sites are shown, with the protospacer designated by the grey box, and out-of-protospacer C·G-to-T·A conversion highlighted in the red box. CP1012-CBEmax exhibited the most frequent out-of-protospacer editing, with CP1028-CBEmax and CP1041-CBEmax exhibiting this property on only one of the sites.
Supplementary Fig. 6 Predicted average minimum distance between original or new C termini and the ssDNA substrate for base editing.
(a) Crystal structure of the SpCas9:gRNA:DNA ternary complex with the ssDNA bubble partially resolved (PDB: 5F9R)36. Novel CP termini are represented as spheres (original N- and C- termini in dark grey). The minimal linear distance between the predicted position of the novel CP termini (and WT C-terminus) and the furthest resolved nucleobase in the ssDNA bubble (corresponding to protospacer position 12, counting the PAM as positions 21–23) is depicted. (b) Cryo-EM structure of the SpCas9:gRNA:DNA ternary complex with the ssDNA bubble fully resolved (PDB: 5Y36)38. Novel CP termini are represented as spheres (original N- and C- termini in dark grey). The minimal linear distance between the predicted position of the novel CP termini (and WT C-terminus) and a protospacer position typically targeted for base editing (corresponding to protospacer position 4) is depicted. (c) Average of the distances to two different target positions on the ssDNA substrate measured for the novel CP termini (or WT C-terminus) from (a) and (b), listed in ascending order.
Supplementary Fig. 7 Edited product distribution for CP-CBEmax-B variants at two genomic sites prone to non-C-to-T byproduct formation.
The product distribution among edited DNA sequencing reads (reads in which the target C is base edited) is shown for each CBEmax variant with no UGI (“CBEmax-B” variants) tested at the same two sites as in Fig. 2g. Subscripted numbers indicate protospacer positions, counting the first base of the PAM as position 21. Values and error bars reflect the mean ± s.d. of three biological replicates performed on different days at each site. ns, P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001, by two-tailed Student’s t-test.
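As an illustration of the significance labels in this caption, the sketch below computes a two-tailed Student's t-test for two groups of replicates and maps the P value to ns, *, ** or ***. It is a standard-library approximation written for this note, not the analysis code used for the figure: it assumes equal variances (the classical pooled Student's test), obtains the P value by numerically integrating the t density, and treats P = 0.05 exactly as ns, which the caption leaves unspecified. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability mass within [-|t|, |t|]
    return max(0.0, 1.0 - central)


def student_t_test(a, b):
    """Pooled-variance (equal variance) Student's t-test; returns (t, P)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, two_tailed_p(t, df)


def significance_label(p):
    """Map a P value to the labels used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumption: P = 0.05 exactly is reported as ns


# Hypothetical product purities (%) for two variants, three replicates each.
t_stat, p_value = student_t_test([62.0, 65.0, 60.0], [80.0, 83.0, 79.0])
print(significance_label(p_value))
```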
Supplementary Fig. 8 ClinVar analysis of targetable human pathogenic SNPs with expanded editing window CP-CBEmax and CP-ABEmax variants.
(a) Fraction of pathogenic T·A-to-C·G SNPs in ClinVar17,18 that could, in principle, be corrected by CBEmax with an editing window of positions 4–8 (left) versus CP-CBEmax with an editing window of positions 4–14 (right). (b) Fraction of pathogenic G·C-to-A·T SNPs in ClinVar that could, in principle, be corrected by ABEmax with an editing window of positions 4–8 (left) versus CP-ABEmax with an editing window of positions 4–14 (right).
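The notion of a SNP being correctable "in principle" can be sketched as a check that some protospacer places the target base inside the editing window and is immediately followed by a compatible PAM. The Python sketch below is a simplified illustration, not the ClinVar pipeline used for this figure. It assumes a 20-nt protospacer with the PAM starting at position 21, scans only the strand it is given (the reverse complement would need a separate call), and does not check the identity of the target base. The function names and the example sequence are hypothetical.

```python
# IUPAC codes needed for the PAMs named in this document (e.g. NGG, NGA, NNGRRT, NNHRRT).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "R": "AG", "H": "ACT",
}


def pam_matches(site, pam):
    """True if the DNA string `site` satisfies the IUPAC pattern `pam`."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam)
    )


def is_targetable(seq, target, pam, window, spacer_len=20):
    """True if seq[target] can sit inside `window` of a protospacer with a matching PAM.

    `target` is a 0-based index into `seq`; `window` is an inclusive pair of
    1-based protospacer positions, e.g. (4, 8).
    """
    first, last = window
    for pos in range(first, last + 1):
        start = target - (pos - 1)      # index of protospacer position 1
        pam_start = start + spacer_len  # PAM begins at position 21
        if start < 0 or pam_start + len(pam) > len(seq):
            continue
        if pam_matches(seq[pam_start:pam_start + len(pam)], pam):
            return True
    return False


# Hypothetical sequence with a single TGG PAM; the target at index 15 falls at
# protospacer position 12 relative to that PAM.
seq = "A" * 24 + "TGG" + "A" * 13
print(is_targetable(seq, 15, "NGG", (4, 8)))   # narrow window
print(is_targetable(seq, 15, "NGG", (4, 14)))  # expanded window
```

In this toy example the target is reachable only with the expanded window, which is the kind of gain the two columns of each panel compare across ClinVar entries.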
Supplementary information
Supplementary Figs. 1–8, Supplementary Tables 1–9, Supplementary Note and Supplementary Sequences
Reporting Summary
