Abstract
Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5–11% of Cas9 guide RNAs targeting the human genome are ‘precise-50’, yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky–Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing.
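The 'precise-50' criterion described above can be illustrated with a minimal sketch. It assumes a predicted outcome distribution is available as a mapping from genotype label to frequency. The function name, the labels and the example values are hypothetical and are not taken from inDelphi itself.

```python
from typing import Mapping


def is_precise(outcome_freqs: Mapping[str, float], threshold: float = 0.5) -> bool:
    """Return True if the single most frequent genotype accounts for at
    least `threshold` of all listed editing outcomes."""
    total = sum(outcome_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Normalize by the total so the input may be raw counts or fractions.
    return max(outcome_freqs.values()) / total >= threshold


# Hypothetical predicted distribution for one gRNA (illustrative values only).
example = {"1-bp ins A": 0.55, "2-bp del": 0.25, "7-bp MH del": 0.20}
print(is_precise(example))
```

With the default threshold of 0.5 this corresponds to the 'greater than or equal to 50%' definition in the abstract. Which outcomes count as 'major editing products' is defined by the authors and is not reproduced here.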
Data availability
High-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under accession codes SRP141261 and SRP141144. Processed data have been deposited under the following DOIs: https://doi.org/10.6084/m9.figshare.6838016, https://doi.org/10.6084/m9.figshare.6837959, https://doi.org/10.6084/m9.figshare.6837956, https://doi.org/10.6084/m9.figshare.6837953, and https://doi.org/10.6084/m9.figshare.6837947.
Change history
14 February 2019
In this Article, a data processing error affected Fig. 3e and Extended Data Table 2; these errors have been corrected online.
Acknowledgements
The authors thank O. Juez, R. Jodhani and C. Araneo for technical assistance and the MIT Biomicro Center, the Harvard Medical School Biopolymers Facility, and the Broad Institute Genomics Platform for sequencing. The authors acknowledge funding from an NSF Graduate Research Fellowship to M.W.S.; an NWO Rubicon Fellowship to M.A.; 1R01HG010372 (C.A.C.); DARPA HR0011-17-2-0049, NIH RM1 HG009490, R01 EB022376, R35 GM118062, HHMI (D.R.L.); 1R01HG008363, 1R01HG008754 (D.K.G.); 1K01DK101684, the Human Frontier Science Program, NWO, Brigham Research Institute, Harvard Stem Cell Institute, and American Cancer Society (R.I.S.).
Reviewer information
Nature thanks D. Durocher, R. Platt and the anonymous reviewer(s) for their contribution to the peer review of this work.
Author information
Contributions
M.W.S., J.Y.H. and D.K.G. contributed to the inDelphi model. M.W.S., M.A., C.A.C., D.R.L., D.K.G. and R.I.S. contributed to the editing libraries, assays and applications. M.A. and R.I.S. contributed to the library experimental protocol and performed library experiments in mESCs, DNA repair-deficient mESCs, and U2OS cells. D.W., S.J.C., O.K. and R.I.S. performed endogenous experiments in mESCs, HCT116, U2OS and HEK293T cells. M.A. performed endogenous experiments in primary patient fibroblasts. M.W.S., J.Y.H., C.A.C. and D.K.G. contributed to algorithm development and computational analysis. M.W.S., M.A., D.R.L., D.K.G. and R.I.S. contributed to writing and editing the manuscript.
Ethics declarations
Competing interests
The authors declare competing interests: patent applications have been filed on this work. D.R.L. is a consultant and co-founder of Editas Medicine, Beam Therapeutics and Pairwise Plants, companies that use genome editing technologies. D.K.G. is a co-founder of Think Therapeutics, a company that uses machine learning for therapeutic development.
Extended data figures and tables
Extended Data Fig. 1 Design and cloning of a high-throughput library to assess CRISPR–Cas9-mediated editing products, yielding diverse and replicate-consistent data that is concordant with repair spectra at endogenous human genomic loci.
a, Empirical distributions of various predicted and measured properties of DNA from 169,279 SpCas9 gRNA target sites in the human genome. The numbers of target sites per range used to design lib-A are indicated. b, Cumulative percentage of endogenous deletions in VO target sites in HEK293 (n = 89 target sites), HCT116 (n = 92) and K562 (n = 86) cells that delete up to the reported number of nucleotides (x axis). c, Schematic of the cloning process used to clone lib-A and lib-B (Methods, Supplementary Discussion, Supplementary Methods). d, Number of unique high-confidence editing outcomes (Supplementary Methods) called by simulated subsampling of lib-A data (n = 2,000 target sites) in mESCs (combined data from n = 3 independent biological replicates) and U2OS cells (combined data from n = 2 independent biological replicates). For ‘all’, the original non-subsampled data are presented. Each box depicts data for 2,000 target sites. Outliers are not depicted. e, Pearson’s r of genotype frequencies comparing lib-A in mESCs and U2OS cells with endogenous data in HEK293 (n = 87 target sites), HCT116 (n = 88), and K562 (n = 86) cells. Outliers are depicted as diamonds. The 1-bp insertion frequencies were adjusted at each target site by proportionally scaling them to be equal between the two cell types. f, Pearson’s r of genotype frequencies at lib-A target sites, comparing two independent biological replicate experiments in mESCs (n = 1,861 target sites, median r = 0.89) and U2OS cells (n = 1,921, median r = 0.77). Outliers are depicted as diamonds. Box plots denote the 25th, 50th and 75th percentiles and whiskers show 1.5 times the interquartile range.
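The replicate comparison in panel f can be sketched as a per-target-site Pearson correlation between two genotype frequency profiles. This is only an illustration: the dictionary representation, the function names and the choice to assign a frequency of 0 to a genotype missing from one replicate are assumptions, not the authors' pipeline.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_r(rep1: Dict[str, float], rep2: Dict[str, float]) -> float:
    """Correlate genotype frequencies at one target site across two replicates.

    Assumption: a genotype observed in only one replicate is given a
    frequency of 0.0 in the other.
    """
    genotypes = sorted(set(rep1) | set(rep2))
    x = [rep1.get(g, 0.0) for g in genotypes]
    y = [rep2.get(g, 0.0) for g in genotypes]
    return pearson_r(x, y)


# Hypothetical frequencies at a single target site (illustrative values only).
rep_a = {"1-bp ins": 0.30, "3-bp del": 0.50, "10-bp del": 0.20}
rep_b = {"1-bp ins": 0.25, "3-bp del": 0.55, "10-bp del": 0.15, "2-bp del": 0.05}
print(round(replicate_r(rep_a, rep_b), 3))
```

Computing this value for every target site and taking the median would give a summary comparable in form to the median r reported in the legend.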
Extended Data Fig. 2 Categorizing and modelling Cas9-mediated DNA repair products with manual data analysis and automated machine learning through inDelphi.
a, b, Categories of Cas9-mediated genotypic outcomes in data from endogenous contexts at VO target sites in K562 (n = 88 target sites), HCT116 (n = 92), HEK293 (n = 89) cells (collectively, a) and U2OS cells (b, n = 1,958 lib-A target sites). c, Categories and defined properties (Supplementary Methods) of all sequence alignments consistent with a Cas9-mediated 7-bp deletion. d, Hypothesized mechanisms for template-free DNA repair at Cas9-mediated DSBs based on components of the classical NHEJ, alternative NHEJ or MMEJ pathways (Supplementary Discussion). e, Function learned for modelling MH deletions (Supplementary Methods). f, Function learned for modelling MH-independent deletions (MHless-NN) mapping deletion length to a numeric score (psi, Supplementary Methods, point plot) and with deletion length penalty normalized to sum to 1 (phi, Supplementary Methods, histogram).
Extended Data Fig. 3 Influential role of hyperlocal sequence context features in predicting and causing 1-bp insertions.
a, Frequency of 1-bp insertions in mESCs (n = 1,981 lib-A target sites) and U2OS cells (n = 1,918) with varying −4 nucleotides. b, c, Plot of 1-bp insertion frequency in mESCs (n = 1,996 lib-A target sites) and U2OS cells (n = 1,966) compared to their total phi score (b) and predicted deletion length precision score (c) with Pearson’s r. d, Comparison of 1-bp insertion frequencies among all edited products from 1,966 lib-A target sites in U2OS cells (combined data from n = 2 independent biological replicates). e, Nucleotides and their effect on the frequency of 1-bp insertions in U2OS cells. Only bases with non-zero linear regression weights in 10,000-fold iterative cross-validation are shown. Total n = 1,966 lib-A target sites. f, Insertion frequency in mESCs (n = 205) and U2OS cells (n = 217) when varying four bases by the cleavage site (positions −5 to −2 counted from the NGG-PAM at positions 0–2) contained within three target sites designed with weak microhomology. g, Microhomology strength (deletion phi score) and 1-bp insertions in mESCs for 312 ‘4-bp’ target sites and 89 VO sequences. *P = 6.1 × 10^−9; two-sided two-sample t-test, test statistic = −5.94, d.f. = 399, Hedges’ g effect size = 0.49. Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds.
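The reported d.f. of 399 for panel g equals 312 + 89 − 2, which is what a pooled-variance two-sample t-test gives for groups of those sizes. As a rough sketch of how a Hedges' g effect size can be computed, the function below uses the common small-sample correction factor 1 − 3/(4·d.f. − 1). The source does not state which estimator the authors used, so this choice and the toy data are assumptions.

```python
import math
from statistics import mean, variance
from typing import Sequence


def hedges_g(a: Sequence[float], b: Sequence[float]) -> float:
    """Hedges' g: standardized mean difference with a small-sample correction.

    Uses the pooled standard deviation (as in a pooled two-sample t-test)
    and the approximate correction factor 1 - 3 / (4 * df - 1).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_sd = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / df)
    cohens_d = (mean(a) - mean(b)) / pooled_sd
    return cohens_d * (1 - 3 / (4 * df - 1))


# Degrees of freedom for the group sizes given in the legend.
print(312 + 89 - 2)

# Toy data, not from the paper.
print(hedges_g([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

The sign of g depends on the order of the two groups, so a reported magnitude such as 0.49 cannot be matched to a direction without the underlying data.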
Extended Data Fig. 4 inDelphi predictions represent nearly all editing outcomes and are accurate at predicting the frequencies of genotypes, indel lengths, and frameshift frequencies.
a, b, Pearson’s r for held-out lib-A target sites comparing inDelphi predictions with observed frequencies for genotypes (a) and indel lengths (b) in mESCs and U2OS cells. The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range. Densities were smoothed with noise but do not extend beyond the data. c, Pie chart depicting the output of inDelphi for specific outcome classes at lib-A target sites in mESCs. d, e, Comparison of two methods for frameshift predictions to observed values with Pearson’s r in HCT116 cells (d, n = 91 target sites) and K562 cells (e, n = 82 target sites). The error band represents the 95% confidence intervals around the regression estimate with 1,000-fold bootstrapping. f, Distribution of predicted frameshift frequencies among 1–60-bp deletions for SpCas9 gRNAs targeting exons (n = 1,000,294 gRNAs; mean = 66.4%) and shuffled versions (mean, 69.3%), and introns (n = 740,759) in the human genome. Dashed lines indicate means. ***P < 10^−300, two-sided Welch’s t-test, test statistic = −145.5, d.f. = 1,506,304, Hedges’ g = −0.19.
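A frameshift frequency can be derived from an indel-length distribution by summing the frequencies of all lengths that are not multiples of three. The sketch below assumes the distribution is a mapping from signed indel length (negative for deletions, positive for insertions) to frequency. This representation and the example values are hypothetical.

```python
from typing import Mapping


def frameshift_frequency(indel_length_freqs: Mapping[int, float]) -> float:
    """Fraction of edited outcomes whose indel length is not a multiple of 3.

    Keys are signed indel lengths in base pairs (negative = deletion,
    positive = insertion); values are frequencies or read counts.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Python's % returns a non-negative result for a positive modulus,
    # so -3 % 3 == 0 and -1 % 3 == 2.
    shifted = sum(f for length, f in indel_length_freqs.items() if length % 3 != 0)
    return shifted / total


# Hypothetical distribution for one target site (illustrative values only).
lengths = {+1: 0.30, -1: 0.20, -3: 0.10, -6: 0.10, -7: 0.30}
print(round(frameshift_frequency(lengths), 2))
```

If all indel lengths were equally likely, roughly two thirds of outcomes would shift the frame, which is a useful baseline when reading the means reported in panel f.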
Extended Data Fig. 5 Characterization of lib-B data including pathogenic microduplication repair in wild-type mESCs, wild-type U2OS cells and mESCs treated with DPKi3, NU7026 and MLN4924.
a, Box plots of the number of unique high-confidence editing outcomes (see Supplementary Methods) called by simulating data subsampling in data at 2,000 lib-B target sites in mESCs (combined data from n = 2 independent technical replicates) and U2OS cells (combined data from n = 2 independent biological replicates). In ‘all’, the full non-subsampled data are presented (see Supplementary Table 2 for read counts). Each box depicts data for 2,000 target sites. The box denotes the 25th, 50th, and 75th percentiles and whiskers show 1.5 times the interquartile range. Outliers are not depicted. b, Frequencies of repair to wild-type genotype at 567 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. c, Frequencies of repair to wild-type frame at 437 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. d, Frequency of pathogenic microduplication repair in wild-type mESCs (n = 1,480 target sites) compared to mESCs treated with MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563).
Extended Data Fig. 6 Altered distributions of Cas9-mediated genotypic products in Prkdc−/−Lig4−/− mESCs and mESCs treated with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs.
a, Comparison of MH deletions among all deletions at lib-B target sites in wild-type cells (n = 1,909 target sites), cells treated with DPKi3 (n = 1,999), MLN4924 (n = 1,995) or NU7026 (n = 1,999) and Prkdc−/−Lig4−/− cells (n = 1,446). Statistical tests performed against wild-type population. *P = 5.6 × 10^−5, **P = 3.5 × 10^−13, ***P = 5.0 × 10^−41, two-sided Welch’s t-test. b, Comparison of the frequency of each class of MH-less deletions among all deletion products in wild-type (lib-A and lib-B target sites, n = 3,829 target sites), DPKi3 (lib-B, n = 1,990), MLN4924 (lib-B, n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc−/−Lig4−/− (lib-A and lib-B target sites, n = 3,344). P values are compared to wild-type, two-sided Welch’s t-test. c, Frequency of 1-bp insertions at 1,055 target sites in lib-A in Prkdc−/−Lig4−/− mESCs. d, Frequencies of deletion repair to wild-type genotype in lib-B in wild-type mESCs (n = 1,480 target sites, combined data from two technical replicates) compared to conditions, with combined data from two independent biological replicates for each of Prkdc−/−Lig4−/− (n = 1,041 target sites), MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563). e, Table of Pearson’s r of the change in disease correction frequency compared to wild-type at n = 791 target sites for each pair of conditions. f, g, Annexin V-568 staining flow cytometry contour plots (f) and mean ± standard deviation values (g) in wild-type and Prkdc−/−Lig4−/− lib-A mESCs following transfection with SpCas9–P2A–GFP (representative data for n = 2 experiments). Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. For detailed statistics on significance tests, see Methods.
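The comparisons against the wild-type population above use Welch's t-test, which does not assume equal variances between groups. A minimal sketch of the test statistic and the Welch–Satterthwaite degrees of freedom follows. It uses only the standard library, stops short of computing a P value, and the toy inputs are not data from the paper.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard errors of the two sample means.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy data, not from the paper.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t_stat, dof)
```

Because the degrees of freedom depend on the sample variances, they are generally non-integer and smaller than the pooled value of n1 + n2 − 2 when the two groups differ in variance or size.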
Extended Data Fig. 7 Template-free Cas9-nuclease editing of human and mouse cells containing pathogenic alleles.
a, b, Flow cytometric contour plots showing GFP fluorescence and LDL–Dylight550 uptake (a) and fluorescence microscopy (b) of HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted (representative data for n = 2 experiments). c, Fluorescence microscopy of U2OS cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted (representative data for n = 2 experiments). d, e, Flow cytometry gating strategy used for mESC and LDLRdup–P2A–GFP untreated (d) and treated with SpCas9 and gRNA (e). f, g, Results of 12 pathogenic 1-bp deletion alleles selected by inDelphi for high 1-bp insertion frequency (combined data from n = 2 independent biological replicates) compared to lib-A (f) and presented in a table (g). The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. *P = 1.6 × 10^−4, two-sided Welch’s t-test. For detailed statistics, see Methods. In the table, the most frequent 1-bp insertion genotype predicted by inDelphi that does not correspond to the wild-type genotype is indicated by an asterisk. In fluorescence microscopy images, GFP fluorescence is shown in green, LDL–Dylight550 uptake in red, and Hoechst staining of nuclei in blue.
Supplementary information
Supplementary Information
This file contains Supplementary Discussion, Supplementary Methods and Supplementary References.
About this article
Cite this article
Shen, M.W., Arbab, M., Hsu, J.Y. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018). https://doi.org/10.1038/s41586-018-0686-x
DOI: https://doi.org/10.1038/s41586-018-0686-x
Keywords
- Hermansky-Pudlak Syndrome (HPS1)
- Low-density Lipoprotein Receptor (LDLR)
- Microhomology-mediated End Joining (MMEJ)
- gRNA Expression Plasmid