

Predictable and precise template-free CRISPR editing of pathogenic variants

An Author Correction to this article was published on 14 February 2019

This article has been updated


Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5–11% of Cas9 guide RNAs targeting the human genome are ‘precise-50’, yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky–Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing.
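The ‘precise-50’ criterion above reduces to a simple check on a predicted outcome distribution. Here is a minimal Python sketch. It assumes predictions arrive as a hypothetical mapping from genotype label to frequency over the major editing products (1- to 60-bp deletions and 1-bp insertions). The function names and genotype labels are illustrative only and are not part of inDelphi.

```python
from __future__ import annotations


def top_genotype_fraction(freqs: dict[str, float]) -> float:
    """Share of all major editing products taken by the single most frequent genotype."""
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Normalise so that inputs given as counts or percentages also work.
    return max(freqs.values()) / total


def is_precise_50(freqs: dict[str, float], threshold: float = 0.5) -> bool:
    """True if one genotype makes up at least `threshold` of the major products."""
    return top_genotype_fraction(freqs) >= threshold


# Hypothetical predicted outcome distribution for one gRNA (labels are made up).
example = {"del_2bp_MH": 62.0, "ins_1bp_A": 23.0, "del_7bp": 15.0}
print(is_precise_50(example))
```

Because the check normalises by the total, it gives the same answer whether the predictions are expressed as fractions, percentages or read counts.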


Fig. 1: High-throughput assaying of Cas9-mediated DNA repair products supports the design of the inDelphi model.
Fig. 2: Sequence context influences 1-bp insertions.
Fig. 3: inDelphi accurately predicts nearly all editing outcomes.
Fig. 4: Precise template-free correction of pathogenic alleles.


Data availability

High-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under accession codes SRP141261 and SRP141144. Processed data have been deposited under the following DOIs:,,,, and

Change history

  • 14 February 2019

    In this Article, a data processing error affected Fig. 3e and Extended Data Table 2; these errors have been corrected online.




The authors thank O. Juez, R. Jodhani and C. Araneo for technical assistance and the MIT Biomicro Center, the Harvard Medical School Biopolymers Facility, and the Broad Institute Genomics Platform for sequencing. The authors acknowledge funding from an NSF Graduate Research Fellowship to M.W.S.; an NWO Rubicon Fellowship to M.A.; 1R01HG010372 (C.A.C.); DARPA HR0011-17-2-0049, NIH RM1 HG009490, R01 EB022376, R35 GM118062, HHMI (D.R.L.); 1R01HG008363, 1R01HG008754 (D.K.G.); 1K01DK101684, the Human Frontier Science Program, NWO, Brigham Research Institute, Harvard Stem Cell Institute, and American Cancer Society (R.I.S.).

Reviewer information

Nature thanks D. Durocher, R. Platt and the anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

M.W.S., J.Y.H. and D.K.G. contributed to the inDelphi model. M.W.S., M.A., C.A.C., D.R.L., D.K.G. and R.I.S. contributed to the editing libraries, assays and applications. M.A. and R.I.S. contributed to the library experimental protocol and performed library experiments in mESCs, DNA repair-deficient mESCs, and U2OS cells. D.W., S.J.C., O.K. and R.I.S. performed endogenous experiments in mESCs, HCT116, U2OS and HEK293T cells. M.A. performed endogenous experiments in primary patient fibroblasts. M.W.S., J.Y.H., C.A.C. and D.K.G. contributed to algorithm development and computational analysis. M.W.S., M.A., D.R.L., D.K.G. and R.I.S. contributed to writing and editing the manuscript.

Corresponding authors

Correspondence to David R. Liu, David K. Gifford or Richard I. Sherwood.

Ethics declarations

Competing interests

The authors declare competing interests: patent applications have been filed on this work. D.R.L. is a consultant and co-founder of Editas Medicine, Beam Therapeutics and Pairwise Plants, companies that use genome editing technologies. D.K.G. is a co-founder of Think Therapeutics, a company that uses machine learning for therapeutic development.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Design and cloning of a high-throughput library to assess CRISPR–Cas9-mediated editing products, yielding diverse and replicate-consistent data that are concordant with repair spectra at endogenous human genomic loci.

a, Empirical distributions of various predicted and measured properties of DNA from 169,279 SpCas9 gRNA target sites in the human genome. The number of target sites per range used to design lib-A is indicated. b, Cumulative percentage of endogenous deletions in VO target sites in HEK293 (n = 89 target sites), HCT116 (n = 92) and K562 (n = 86) cells that delete up to the reported number of nucleotides (x axis). c, Schematic of the process used to clone lib-A and lib-B (Methods, Supplementary Discussion, Supplementary Methods). d, Number of unique high-confidence editing outcomes (Supplementary Methods) called by simulating subsampling of the lib-A data (n = 2,000 target sites) in mESCs (combined data from n = 3 independent biological replicates) and U2OS cells (combined data from n = 2 independent biological replicates). For ‘all’, the original non-subsampled data are presented. Each box depicts data for 2,000 target sites. Outliers are not depicted. e, Pearson’s r of genotype frequencies comparing lib-A in mESCs and U2OS cells with endogenous data in HEK293 (n = 87 target sites), HCT116 (n = 88) and K562 (n = 86) cells. Outliers are depicted as diamonds. At each target site, 1-bp insertion frequencies were adjusted by scaling them proportionally so that they were equal between the two cell types being compared. f, Pearson’s r of genotype frequencies at lib-A target sites, comparing two independent biological replicate experiments in mESCs (n = 1,861 target sites, median r = 0.89) and U2OS cells (n = 1,921, median r = 0.77). Outliers are depicted as diamonds. Box plots denote the 25th, 50th and 75th percentiles, and whiskers show 1.5 times the interquartile range.
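Panels e and f summarise agreement as Pearson’s r between genotype-frequency profiles at each target site. The sketch below shows that per-site comparison. It assumes that each replicate is a hypothetical mapping from genotype to frequency. It also treats a genotype that is absent from one replicate as having frequency zero; that is our own assumption, and the paper's exact handling is described in its Supplementary Methods.

```python
from __future__ import annotations

import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def genotype_profile_r(rep1: dict[str, float], rep2: dict[str, float]) -> float:
    """Correlate two genotype-frequency profiles measured at the same target site.

    Assumption: a genotype missing from one replicate is treated as frequency 0.
    """
    genotypes = sorted(set(rep1) | set(rep2))
    return pearson_r([rep1.get(g, 0.0) for g in genotypes],
                     [rep2.get(g, 0.0) for g in genotypes])


# Hypothetical replicate pairs for two target sites; per-site r values are
# then summarised, for example by their median.
sites = [
    ({"d1": 0.50, "d2": 0.30, "i1": 0.20}, {"d1": 0.45, "d2": 0.35, "i1": 0.20}),
    ({"d1": 0.10, "d3": 0.60, "i1": 0.30}, {"d1": 0.15, "d3": 0.55, "i1": 0.30}),
]
print(statistics.median(genotype_profile_r(a, b) for a, b in sites))
```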

Source data

Extended Data Fig. 2 Categorizing and modelling Cas9-mediated DNA repair products with manual data-analysis and automated machine learning through inDelphi.

a, b, Categories of Cas9-mediated genotypic outcomes in data from endogenous contexts at VO target sites in K562 (n = 88 target sites), HCT116 (n = 92), HEK293 (n = 89) cells (collectively, a) and U2OS cells (b, n = 1,958 lib-A target sites). c, Categories and defined properties (Supplementary Methods) of all sequence alignments consistent with a Cas9-mediated 7-bp deletion. d, Hypothesized mechanisms for template-free DNA repair at Cas9-mediated DSBs based on components of the classical NHEJ, alternative NHEJ or MMEJ pathways (Supplementary Discussion). e, Function learned for modelling MH deletions (Supplementary Methods). f, Function learned for modelling MH-independent deletions (MHless-NN) mapping deletion length to a numeric score (psi, Supplementary Methods, point plot) and with deletion length penalty normalized to sum to 1 (phi, Supplementary Methods, histogram).
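Panel c depends on the fact that several sequence alignments can explain the same deletion product. The simplified Python sketch below is our own illustration and does not reproduce inDelphi's procedure. It enumerates every deletion of a given length whose deleted interval touches the cut site, then groups the start positions that yield an identical product. The number of extra equivalent positions for each product serves as a rough proxy for flanking microhomology length. The cut-site convention (Cas9 cuts between `seq[cut - 1]` and `seq[cut]`) and all names are assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def deletion_products(seq: str, cut: int, length: int) -> dict[str, list[int]]:
    """Map each distinct product of a `length`-bp deletion touching the cut site
    to the deletion start positions that produce it.

    Assumes the cut lies between seq[cut - 1] and seq[cut]; a deletion removing
    seq[s:s + length] is kept only if s <= cut <= s + length.
    """
    if not 0 < cut < len(seq) or not 0 < length < len(seq):
        raise ValueError("cut and length must fall inside the sequence")
    groups: dict[str, list[int]] = defaultdict(list)
    for s in range(max(0, cut - length), min(cut, len(seq) - length) + 1):
        product = seq[:s] + seq[s + length:]
        groups[product].append(s)
    return dict(groups)


def equivalent_shifts(starts: list[int]) -> int:
    """Extra alignments beyond the first. Shifting a deletion by one base keeps the
    product unchanged only when the flanking bases match, so this approximates
    microhomology length within the cut-compatible window."""
    return len(starts) - 1


# Hypothetical target: a 4-bp unit (CAGG) repeated across the cut site.
for product, starts in deletion_products("TTCAGGCAGG", cut=5, length=4).items():
    print(product, starts, equivalent_shifts(starts))
```

Restricting start positions to deletions that touch the cut site means a long repeat can be under-counted at the window edges. A full treatment would follow the definitions in the paper's Supplementary Methods.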

Source data

Extended Data Fig. 3 Influential role of hyperlocal sequence context features in predicting and causing 1-bp insertions.

a, Frequency of 1-bp insertions in mESCs (n = 1,981 lib-A target sites) and U2OS cells (n = 1,918) with varying nucleotides at position −4 (counted from the NGG PAM at positions 0–2). b, c, 1-bp insertion frequency in mESCs (n = 1,996 lib-A target sites) and U2OS cells (n = 1,966) plotted against total phi score (b) and predicted deletion length precision score (c), with Pearson’s r. d, Comparison of 1-bp insertion frequencies among all edited products from 1,966 lib-A target sites in U2OS cells (combined data from n = 2 independent biological replicates). e, Nucleotides and their effect on the frequency of 1-bp insertions in U2OS cells. Only bases with non-zero linear regression weights in 10,000-fold iterative cross-validation are shown. Total n = 1,966 lib-A target sites. f, Insertion frequency in mESCs (n = 205) and U2OS cells (n = 217) when varying the four bases adjacent to the cleavage site (positions −5 to −2 counted from the NGG PAM at positions 0–2) within three target sites designed with weak microhomology. g, Microhomology strength (deletion phi score) and 1-bp insertions in mESCs for 312 ‘4-bp’ target sites and 89 VO sequences. *P = 6.1 × 10⁻⁹; two-sided two-sample t-test, test statistic = −5.94, d.f. = 399, Hedges’ g effect size = 0.49. Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds.
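Panel g reports a two-sample t statistic with d.f. = 399. Because 312 + 89 − 2 = 399, these degrees of freedom are consistent with a pooled-variance test. The panel also reports Hedges’ g. The standard-library sketch below computes both quantities from the textbook formulas: the pooled standard deviation and the small-sample correction J = 1 − 3/(4(n₁ + n₂) − 9). It has not been checked against the authors’ code, and the sign convention and exact correction for g vary slightly between statistics packages.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def pooled_sd(a: list[float], b: list[float]) -> float:
    """Pooled sample standard deviation of two groups."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    t = (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb))
    return t, na + nb - 2


def hedges_g(a: list[float], b: list[float]) -> float:
    """Cohen's d on the pooled SD, multiplied by the small-sample correction J."""
    n = len(a) + len(b)
    d = (mean(a) - mean(b)) / pooled_sd(a, b)
    return d * (1 - 3 / (4 * n - 9))


# Hypothetical toy groups, not data from the paper.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [3.0, 4.0, 5.0, 6.0, 7.0]
print(student_t(a, b), hedges_g(a, b))
```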

Source data

Extended Data Fig. 4 inDelphi predictions represent nearly all editing outcomes and are accurate at predicting the frequencies of genotypes, indel lengths, and frameshift frequencies.

a, b, Pearson’s r for held-out lib-A target sites comparing inDelphi predictions with observed frequencies for genotypes (a) and indel lengths (b) in mESCs and U2OS cells. The box denotes the 25th, 50th and 75th percentiles, and whiskers show 1.5 times the interquartile range. Densities were smoothed with noise but do not extend beyond the data. c, Pie chart depicting the output of inDelphi for specific outcome classes at lib-A target sites in mESCs. d, e, Comparison of two methods for frameshift prediction with observed values, with Pearson’s r, in HCT116 cells (d, n = 91 target sites) and K562 cells (e, n = 82 target sites). The error band represents the 95% confidence interval around the regression estimate from 1,000-fold bootstrapping. f, Distribution of predicted frameshift frequencies among 1–60-bp deletions for SpCas9 gRNAs targeting exons (n = 1,000,294 gRNAs; mean = 66.4%) and shuffled versions (mean = 69.3%), and introns (n = 740,759) in the human genome. Dashed lines indicate means. ***P < 10⁻³⁰⁰, two-sided Welch’s t-test, test statistic = −145.5, d.f. = 1,506,304, Hedges’ g = −0.19.
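Panels d–f convert an indel-length distribution into a frameshift frequency. Any indel whose length is not a multiple of 3 shifts the reading frame. The minimal sketch below assumes a hypothetical mapping from signed indel length (negative for deletions, positive for insertions) to predicted or observed frequency.

```python
from __future__ import annotations


def frameshift_fraction(indel_freqs: dict[int, float]) -> float:
    """Fraction of indel products whose length is not divisible by 3."""
    total = sum(indel_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Python's % returns a non-negative remainder, so -3 % 3 == 0 and -4 % 3 == 2.
    shifted = sum(f for length, f in indel_freqs.items() if length % 3 != 0)
    return shifted / total


# Uniform weight over 1-60-bp deletions: 40 of the 60 lengths are frameshifts.
uniform = {-length: 1.0 for length in range(1, 61)}
print(frameshift_fraction(uniform))
```

With uniform weight over 1- to 60-bp deletions, the sketch returns 2/3 (about 66.7%). This provides a simple arithmetic reference point for the exon and intron distributions in panel f.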

Source data

Extended Data Fig. 5 Characterization of lib-B data including pathogenic microduplication repair in wild-type mESCs, wild-type U2OS cells and mESCs treated with DPKi3, NU7026 and MLN4924.

a, Box plots of the number of unique high-confidence editing outcomes (see Supplementary Methods) called by simulating subsampling of the data at 2,000 lib-B target sites in mESCs (combined data from n = 2 independent technical replicates) and U2OS cells (combined data from n = 2 independent biological replicates). In ‘all’, the full non-subsampled data are presented (see Supplementary Table 2 for read counts). Each box depicts data for 2,000 target sites. The box denotes the 25th, 50th and 75th percentiles, and whiskers show 1.5 times the interquartile range. Outliers are not depicted. b, Frequencies of repair to wild-type genotype at 567 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells, with Pearson’s r. c, Frequencies of repair to wild-type frame at 437 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells, with Pearson’s r. d, Frequency of pathogenic microduplication repair in wild-type mESCs (n = 1,480 target sites) compared with mESCs treated with MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563).
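Most box plots in these legends follow the same convention: the box spans the 25th to 75th percentiles, and the whiskers reach 1.5 times the interquartile range. The sketch below applies the usual Tukey-style reading of that rule, which is also matplotlib's default for `boxplot`: whiskers stop at the most extreme data points inside the 1.5 × IQR fences. The quartile interpolation used here, `statistics.quantiles(..., method="inclusive")`, matches NumPy's default linear percentiles. Choosing it is our assumption, since the legends do not state which interpolation was used.

```python
from __future__ import annotations

import statistics


def box_stats(data: list[float], whis: float = 1.5) -> dict[str, object]:
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations within the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are the ones drawn as outliers (e.g. diamonds).
        "outliers": sorted(x for x in data if x < lo_fence or x > hi_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```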

Source data

Extended Data Fig. 6 Altered distributions of Cas9-mediated genotypic products in Prkdc−/−Lig4−/− mESCs and mESCs treated with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs.

a, Comparison of MH deletions among all deletions at lib-B target sites in wild-type cells (n = 1,909 target sites), cells treated with DPKi3 (n = 1,999), MLN4924 (n = 1,995) or NU7026 (n = 1,999), and Prkdc−/−Lig4−/− cells (n = 1,446). Statistical tests were performed against the wild-type population. *P = 5.6 × 10⁻⁵, **P = 3.5 × 10⁻¹³, ***P = 5.0 × 10⁻⁴¹, two-sided Welch’s t-test. b, Comparison of the frequency of each class of MH-less deletions among all deletion products in wild-type (lib-A and lib-B target sites, n = 3,829 target sites), DPKi3 (lib-B, n = 1,990), MLN4924 (lib-B, n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc−/−Lig4−/− (lib-A and lib-B target sites, n = 3,344). P values are relative to wild-type; two-sided Welch’s t-test. c, Frequency of 1-bp insertions at 1,055 target sites in lib-A in Prkdc−/−Lig4−/− mESCs. d, Frequencies of deletion repair to wild-type genotype in lib-B in wild-type mESCs (n = 1,480 target sites, combined data from two technical replicates) compared with each condition, using combined data from two independent biological replicates for each of Prkdc−/−Lig4−/− (n = 1,041 target sites), MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563). e, Table of Pearson’s r values for the change in disease correction frequency relative to wild-type at n = 791 target sites, for each pair of conditions. f, g, Annexin V-568 staining flow cytometry contour plots (f) and mean ± standard deviation values (g) in wild-type and Prkdc−/−Lig4−/− lib-A mESCs following transfection with SpCas9–P2A–GFP (representative data for n = 2 experiments). Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. For detailed statistics on significance tests, see Methods.
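The comparisons above use two-sided Welch’s t-tests, which do not assume equal variances in the two groups. The standard-library sketch below computes Welch’s t statistic and the Welch–Satterthwaite degrees of freedom. These degrees of freedom are generally non-integer and never exceed n₁ + n₂ − 2. For actual P values, SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test. The function name below is our own.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-site frequencies for two conditions (not data from the paper).
wild_type = [0.12, 0.18, 0.15, 0.22, 0.10, 0.17]
treated = [0.25, 0.31, 0.19, 0.40, 0.28]
print(welch_t(wild_type, treated))
```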

Source data

Extended Data Fig. 7 Template-free Cas9-nuclease editing of human and mouse cells containing pathogenic alleles.

a, b, Flow cytometric contour plots showing GFP fluorescence and LDL–Dylight550 uptake (a) and fluorescence microscopy (b) of HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA where denoted (representative data for n = 2 experiments). c, Fluorescence microscopy of U2OS cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA where denoted (representative data for n = 2 experiments). d, e, Flow cytometry gating strategy used for mESCs containing LDLRdup–P2A–GFP, untreated (d) and treated with SpCas9 and gRNA (e). f, g, Results for 12 pathogenic 1-bp deletion alleles selected by inDelphi for high 1-bp insertion frequency (combined data from n = 2 independent biological replicates), compared with lib-A (f) and presented in a table (g). The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. *P = 1.6 × 10⁻⁴, two-sided Welch’s t-test. For detailed statistics, see Methods. In the table, an asterisk marks the most frequent 1-bp insertion genotype predicted by inDelphi that does not correspond to the wild-type genotype. In fluorescence microscopy images, GFP fluorescence is shown in green, LDL–Dylight550 uptake in red, and Hoechst-stained nuclei in blue.

Source data

Extended Data Table 1 Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision
Extended Data Table 2 Endogenous repair of 24 designed high-precision gRNAs in human cell lines
Extended Data Table 3 Repair of ten pathogenic microduplication alleles in individual cellular experiments

Supplementary information

Supplementary Information

This file contains Supplementary Discussion, Supplementary Methods and Supplementary References.

Reporting Summary

Source data


About this article


Cite this article

Shen, M.W., Arbab, M., Hsu, J.Y. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).


