Abstract
Cytosine base editors (CBEs) have the potential to correct human pathogenic point mutations. However, their genome-wide specificity remains poorly understood. Here we report Detect-seq, a method for evaluating CBE specificity that enables sensitive detection of CBE-induced off-target sites at the genome-wide level. Detect-seq uses chemical labeling and biotin pulldown to trace the editing intermediate deoxyuridine, thereby revealing the CBE editome. In addition to Cas9-independent and typical Cas9-dependent off-target sites, we discovered edits outside the protospacer sequence (out-of-protospacer edits) and on the target strand (the strand that pairs with the single-guide RNA (sgRNA)). These unexpected off-target edits are prevalent and can reach high editing ratios. Their occurrence is cell-type dependent and cannot be predicted from the sgRNA sequence. Moreover, we found out-of-protospacer and target-strand edits near the on-target sites tested, challenging the general assumption that CBEs do not induce proximal off-target mutations. Collectively, our approach allows unbiased analysis of the CBE editome and provides a widely applicable tool for evaluating the specificity of various emerging genome editing tools.
Data availability
All data generated for this paper have been deposited at NCBI GEO and are available under accession numbers GSE151265 and GSE152907. Source data are provided with this paper.
Code availability
Detect-seq tools are available at https://github.com/menghaowei/Detect-seq.
Acknowledgements
We thank W. Wei (Peking University) and J. Hu (Peking University) for discussion, W. Wei together with J. Chen (ShanghaiTech University) for kindly providing related plasmids, and J. Liu (Peking University) for help with experiments. We thank the National Center for Protein Sciences at Peking University in Beijing, China, for assistance with FACS and the Fragment Analyzer. Bioinformatics analysis was performed on the High-Performance Computing Platform of the School of Life Sciences. This work was supported by the National Natural Science Foundation of China (grant nos. 21825701 and 91953201), National Key R&D Program (grant no. 2019YFA0110900) and the Peking University Ge Li and Ning Zhao Education Fund.
Author information
Contributions
Z. Lei, H.M., Z. Lv and C.Y. conceived and guided the research. Z. Lei and M.L. led the development of Detect-seq protocol. H.M. developed the computational pipeline for Detect-seq. H.M., H.W. and H.Z. analyzed all high-throughput sequencing data. Z. Lei and H.M. optimized the targeted amplicon sequencing methodology. Z. Lv conducted cellular experiments and molecular cloning assays. Z. Lei executed Detect-seq experiments. L.L., K.Y., X.Z., Y.Z. and Y.Y. assisted with the experiments. Z. Lei, H.M., Z. Lv and C.Y. wrote the paper.
Ethics declarations
Competing interests
The authors have filed patent applications on related sequencing technologies.
Additional information
Peer review information Nature Methods thanks Jia Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview of Detect-seq.
Schematic procedures of Detect-seq. Genomic DNA is fragmented and then end-repaired to avoid spurious labeling or protection at overhangs. Endogenous 5fdC is blocked by O-ethylhydroxylamine (EtONH2)-based protection. A damage-repair step eliminates false-positive signals from endogenous DNA damage, including abasic (AP) sites and single-strand breaks (SSBs). Deoxyuridine (dU) generated by CBE in vivo is labeled by an in vitro reconstituted base excision repair (BER) reaction. UDG removes dU with high specificity, generating abasic sites. Endo IV cleaves the abasic sites, leaving a 3’-OH end. Bst DNA polymerase initiates DNA synthesis from the 3’-OH and successively replaces the nucleotides 3’ to it. Ligase finally seals the nicks. Through the ‘nick translation’ activity of Bst during the dU-labeling step, biotinylated dUTP and 5fdCTP are incorporated 3’ to the dU. Malononitrile treatment labels the incorporated 5fdCs. This produces a characteristic tandem C-to-T mutation pattern that traces CBE editing events. Biotin pulldown followed by NaOH treatment enriches CBE-edited DNA fragments and enhances signals for high-throughput sequencing.
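The tandem C-to-T signature can be illustrated with a minimal read-level check. This is a sketch under stated assumptions. It is not the Detect-seq tools released under Code availability. It assumes each read is already aligned to its reference without indels and on the same strand. The function name find_tandem_c_to_t and the min_ct=2 threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class TandemCall:
    positions: list[int]  # 0-based offsets of C-to-T mismatches in the read
    is_tandem: bool       # True if the read carries >= min_ct C-to-T changes


def find_tandem_c_to_t(ref: str, read: str, min_ct: int = 2) -> TandemCall:
    """Flag a read that shows several C-to-T mismatches against its reference.

    Hypothetical helper. Assumes `read` is aligned to `ref` without indels and
    that both are on the same strand; reads from the opposite strand would
    have to be checked for G-to-A changes instead.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must be the same length (no indels)")
    positions = [
        i for i, (r, q) in enumerate(zip(ref.upper(), read.upper()))
        if r == "C" and q == "T"
    ]
    return TandemCall(positions=positions, is_tandem=len(positions) >= min_ct)


call = find_tandem_c_to_t("ACCGTCA", "ATTGTTA")
print(call.positions, call.is_tandem)  # [1, 2, 5] True
```

A single isolated C-to-T is more easily explained by sequencing error or an SNV. Requiring several conversions on one read is therefore the intuitive reason a tandem pattern is a useful signature. Any real threshold would need to be tuned on spike-in controls.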
Extended Data Fig. 2 The characteristics of Detect-seq signal pattern.
a, Enriched peaks and Detect-seq signals at the on-target sites of the EMX1 and VEGFA_site_2 (shortened to VEGFA) sgRNAs. Sequencing data shown were generated from CBE-transfected HEK293T cells. Two independent biological replicates are shown, demonstrating high reproducibility. Red ‘T’s in the upper IGV (Integrative Genomics Viewer) view indicate C-to-T mutations on the non-target strand. Green ‘G’s in the lower IGV view indicate G-to-A mutations on the target strand (that is, C-to-T mutations on the non-target strand). b, A representative example showing that Detect-seq signals can be readily distinguished from SNVs. Red blocks without a black triangle above represent C-to-T mutations on the non-target strand, while red blocks with a black triangle above indicate a G-to-T SNV. The red and green inverted triangles indicate genuine C-to-T edits on the forward and reverse strands, respectively, according to targeted amplicon sequencing. The pRBS (putative sgRNA-binding site) is shaded, and the corresponding targeted amplicon sequencing results are shown below. c, Normalized signals within a 4 kb window around pRBSs identified by Detect-seq (navy line) and around sites obtained by genome sampling (green line). For each sgRNA, the left panel shows signals in WGS data and the right panel shows Detect-seq data.
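Panel c aggregates signal around many sites. One simple way to build such a profile is to average binned signal over every window that fits on its chromosome. This sketch rests on several assumptions:

- Per-base signal is held as Python lists keyed by chromosome.
- "Normalized" is taken to mean the per-site mean in each bin.
- The function name aggregate_profile is hypothetical.

The normalization actually used for the figure may differ.

```python
def aggregate_profile(signal, centers, half_window=2000, bin_size=100):
    """Mean binned signal in a window of 2 * half_window bases around each site.

    signal:  dict mapping chromosome name -> list of per-base signal values
    centers: iterable of (chromosome, 0-based position) site centres
    Sites whose window would run off the chromosome are skipped. The result
    has 2 * half_window // bin_size entries (40 bins for a 4 kb window with
    the defaults), each the mean per-base signal of that bin averaged over
    all sites used.
    """
    if (2 * half_window) % bin_size:
        raise ValueError("2 * half_window must be a multiple of bin_size")
    n_bins = (2 * half_window) // bin_size
    totals = [0.0] * n_bins
    used = 0
    for chrom, pos in centers:
        track = signal[chrom]
        start, end = pos - half_window, pos + half_window
        if start < 0 or end > len(track):
            continue  # window does not fit on the chromosome
        window = track[start:end]
        for b in range(n_bins):
            chunk = window[b * bin_size:(b + 1) * bin_size]
            totals[b] += sum(chunk) / bin_size
        used += 1
    if used == 0:
        return [0.0] * n_bins
    return [t / used for t in totals]
```

Running the same function on Detect-seq sites and on randomly drawn background positions gives two curves comparable in the spirit of the navy and green lines.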
Extended Data Fig. 3 Detect-seq identified prevalent pRBS-containing loci.
a, Genome-wide distribution of pRBSs identified by Detect-seq on each chromosome for the three sgRNAs. On- and off-target edits are indicated by red squares and blue circles, respectively. b, Sequence logos for each sgRNA generated with WebLogo from the DNA sequences at the pRBSs. The upper, middle and lower panels show sequence logos of pRBSs with high, medium and low editing frequency, respectively. c, Detect-seq identified dozens to hundreds of off-target sites that are highly reproducible between two biological replicates for EMX1, HEK293_site_4 and VEGFA_site_2 in HEK293T cells, as well as for HEK293_site_4 in MCF7 cells. d, A representative off-target site reproducibly observed in two biological replicates. The pRBS is shaded, and the corresponding targeted amplicon sequencing results are shown below. In the IGV views, green blocks indicate G-to-A mutations on the target strand (that is, C-to-T mutations on the non-target strand), while red blocks indicate C-to-T mutations on the target strand. Orange asterisks mark signal regions of out-of-protospacer edits. The red and green inverted triangles indicate genuine C-to-T edits on the forward and reverse strands, respectively, according to targeted amplicon sequencing. e, Recall ratio for all identified pRBSs of the three sgRNAs, based on downsampling of the Detect-seq data. f, Recall ratio for the top one-third of pRBSs (ranked by Detect-seq signal strength) of the three sgRNAs, based on downsampling of the Detect-seq data.
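Panels e and f summarize saturation by downsampling. Below is a minimal sketch of the recall calculation. It assumes that recall is the fraction of pRBSs called from the full data that are re-detected in a call set made from downsampled reads. The helper names are hypothetical. Using integer division to pick the "top one-third" is also an illustrative choice.

```python
def recall_ratio(full_sites, downsampled_sites):
    """Fraction of sites from the full data set that are re-detected after downsampling."""
    full = set(full_sites)
    if not full:
        return 0.0
    return len(full & set(downsampled_sites)) / len(full)


def recall_top_fraction(full_signal, downsampled_sites, divisor=3):
    """Recall restricted to the strongest 1/divisor of sites.

    full_signal: dict mapping site ID -> signal strength in the full data set.
    """
    ranked = sorted(full_signal, key=full_signal.get, reverse=True)
    top = ranked[:max(1, len(ranked) // divisor)]
    return recall_ratio(top, downsampled_sites)


full_signal = {"s1": 10, "s2": 8, "s3": 5, "s4": 3, "s5": 2, "s6": 1}
downsampled = {"s1", "s2", "s4"}
print(recall_ratio(full_signal, downsampled))         # 0.5
print(recall_top_fraction(full_signal, downsampled))  # 1.0
```

The toy numbers show the pattern such plots typically reveal. Strong sites saturate at lower depth than the full set, so recall for the top third approaches 1 before overall recall does.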
Extended Data Fig. 4 Cas9-dependent off-target sites are verified by an optimized targeted amplicon sequencing strategy.
a, Schematic workflow of the improved targeted amplicon sequencing procedure. Extended unique molecular identifiers (UMIs) are introduced to mark each amplicon during the first round of PCR amplification. b, Bioinformatic strategy for removing PCR duplicates and errors introduced during the second round of PCR amplification, based on the left and right UMIs. c, d, Matched targeted amplicon sequencing results for Fig. 1e (c) and Supplementary Fig. 4b (d). e, Detect-seq and matched targeted high-throughput sequencing results for a representative putative sgRNA-binding site for the constructs in Fig. 1c. The pRBS is shaded. Green and red blocks in the IGV views indicate C-to-T mutations on the non-target strand and target strand, respectively. The red and green inverted triangles indicate genuine C-to-T edits on the forward and reverse strands, respectively, according to targeted amplicon sequencing.
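The UMI-based cleanup in panel b can be sketched as follows:

- Reads are grouped by their (left UMI, right UMI) pair.
- Each group is collapsed to a per-position majority consensus.
- Families too small to vote are dropped.

The (left_umi, right_umi, sequence) tuple layout, the min_family_size cutoff and the rule that reports ties as 'N' are assumptions for illustration. They are not parameters reported in the paper.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Collapse reads sharing a (left UMI, right UMI) pair into one consensus sequence.

    reads: iterable of (left_umi, right_umi, sequence) tuples; reads within a
    family are assumed to be equal-length and already aligned.
    Families smaller than `min_family_size` are discarded. At each position the
    majority base is kept, and a tie between the two most common bases gives 'N'.
    """
    families = defaultdict(list)
    for left, right, seq in reads:
        families[(left, right)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to outvote a PCR error
        if len({len(s) for s in seqs}) != 1:
            continue  # skip families with inconsistent read lengths
        bases = []
        for column in zip(*seqs):
            top = Counter(column).most_common(2)
            if len(top) > 1 and top[0][1] == top[1][1]:
                bases.append("N")  # no clear majority at this position
            else:
                bases.append(top[0][0])
        consensus[key] = "".join(bases)
    return consensus


reads = [
    ("AAA", "CCC", "ACGT"),
    ("AAA", "CCC", "ACGT"),
    ("AAA", "CCC", "ACTT"),  # late-cycle PCR error at position 2
    ("GGG", "TTT", "ACGA"),  # singleton family, dropped
]
print(umi_consensus(reads))  # {('AAA', 'CCC'): 'ACGT'}
```

Keying on both UMIs, rather than one, reduces the chance that two independent molecules collide on the same tag. An error introduced in the second PCR round appears in only a minority of a family's reads and is outvoted.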
Extended Data Fig. 5 The distribution of the target-strand editing.
a, Signal distribution of Detect-seq. Here, C-to-T mutations reflect non-target-strand edits, whereas G-to-A mutations reflect target-strand edits. b, Distribution of edited cytosines on the target strand. The pRBS region of the VEGFA_site_2 sgRNA off-target sites is indicated by dashed lines. Positions are numbered with the PAM at positions 21–23.
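This sketch shows how the strand assignment in panel a and the position numbering in panel b can be computed for an individual edit. It makes the following assumptions:

- Mismatches are reported as seen on the reference (+) strand.
- A site's strand is the strand on which the protospacer (non-target strand) sequence reads 5' to 3'.
- The 23-nt protospacer plus PAM is given by its lowest 0-based reference coordinate.

Both function names are hypothetical.

```python
def edited_strand(ref_change: str, protospacer_strand: str) -> str:
    """Classify a CBE edit as lying on the target or the non-target strand.

    ref_change: mismatch as seen on the reference (+) strand, "C>T" or "G>A".
    protospacer_strand: "+" if the protospacer (non-target strand) sequence
    reads 5'->3' on the reference strand, "-" otherwise.
    """
    if ref_change not in ("C>T", "G>A"):
        raise ValueError("expected a C>T or G>A change")
    if protospacer_strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # C>T on the reference is a deaminated C on +; G>A is a deaminated C on -.
    deaminated = "+" if ref_change == "C>T" else "-"
    return "non-target" if deaminated == protospacer_strand else "target"


def protospacer_position(coord: int, site_start: int, strand: str,
                         site_len: int = 23) -> int:
    """Position of `coord` counted 5'->3' along the non-target strand.

    Positions 1-20 are the protospacer and 21-23 the PAM; values below 1 or
    above 23 fall outside the site (candidate out-of-protospacer edits).
    site_start: lowest 0-based reference coordinate of the 23-nt site.
    """
    if strand == "+":
        return coord - site_start + 1
    if strand == "-":
        return site_start + site_len - coord
    raise ValueError("strand must be '+' or '-'")


print(edited_strand("G>A", "+"))               # target
print(protospacer_position(122, 100, "-"))     # 1
```

For a "+" site this reproduces the convention stated in panel a: C-to-T changes are non-target-strand edits and G-to-A changes are target-strand edits. The "-" branch simply mirrors it.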
Extended Data Fig. 6 Comparisons of off-target effect induced by dCpf1-based and Cas9-based BEs.
a, Illustration of the two genomic sites used for direct comparison of the two BE systems. b, c, Off-target sites of BE4max (b) or LbCpf1-BE (c) at the RUNX1 and DYRK1A sites identified by Detect-seq, showing high reproducibility between two biological replicates. d, Genome-wide distribution of reproducible off-target sites for RUNX1 and DYRK1A. On- and off-target edits are indicated by red squares and blue circles, respectively. f, g, Sequence logos for RUNX1 (f) and DYRK1A (g) generated with WebLogo from DNA sequences at the pRBSs (putative sgRNA/crRNA binding sites). The pRBSs identified by Detect-seq for BE4max or LbCpf1-BE are compared with off-target sites predicted by Cas-OFFinder (allowing no more than 7 mismatches).
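The next sketch illustrates the mismatch threshold used in this comparison. It counts mismatches between each candidate site and the spacer and keeps sites with at most 7. It is a hypothetical brute-force filter over pre-extracted candidate sequences, not Cas-OFFinder itself, which searches a whole genome. It also assumes an SpCas9-style layout with an NGG PAM 3' of a 20-nt protospacer. LbCpf1 recognizes a 5' T-rich PAM, so its sites would need a different layout. All function names are illustrative.

```python
def count_mismatches(site: str, spacer: str) -> int:
    """Hamming distance between a candidate protospacer and the sgRNA spacer."""
    if len(site) != len(spacer):
        raise ValueError("site and spacer must be the same length")
    return sum(a != b for a, b in zip(site.upper(), spacer.upper()))


def matches_pam(pam: str, pattern: str = "NGG") -> bool:
    """True if `pam` fits `pattern`, where 'N' matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam.upper()))


def filter_candidates(sites, spacer, max_mismatches=7, pam_pattern="NGG"):
    """Keep protospacer+PAM candidates within `max_mismatches` of `spacer`.

    Each site is assumed to be the protospacer immediately followed by its
    PAM (SpCas9 orientation), read 5'->3' on the non-target strand.
    """
    n = len(spacer)
    kept = []
    for site in sites:
        proto, pam = site[:n], site[n:]
        if matches_pam(pam, pam_pattern) and count_mismatches(proto, spacer) <= max_mismatches:
            kept.append(site)
    return kept


spacer = "ACGTACGTACGTACGTACGT"
candidates = [
    spacer + "TGG",                      # perfect match
    "TGCATGC" + spacer[7:] + "AGG",      # 7 mismatches, kept
    "TGCATGCA" + spacer[8:] + "CGG",     # 8 mismatches, rejected
    spacer + "TTA",                      # no NGG PAM, rejected
]
print(len(filter_candidates(candidates, spacer)))  # 2
```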
Supplementary information
Supplementary Information
Supplementary Figs. 1–19.
Supplementary Table 1
Primer sequences for targeted amplicon sequencing, and designed spike-in model sequences.
Supplementary Table 2
Validation results for edits by targeted amplicon sequencing.
Supplementary Table 3
The lists of Detect-seq identified pRBSs.
Supplementary Table 4
Public data used in this study.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lei, Z., Meng, H., Lv, Z. et al. Detect-seq reveals out-of-protospacer editing and target-strand editing by cytosine base editors. Nat Methods 18, 643–651 (2021). https://doi.org/10.1038/s41592-021-01172-w
DOI: https://doi.org/10.1038/s41592-021-01172-w