Abstract

Following Cas9 cleavage, DNA repair without a donor template is generally considered stochastic, heterogeneous and impractical beyond gene disruption. Here, we show that template-free Cas9 editing is predictable and capable of precise repair to a predicted genotype, enabling correction of disease-associated mutations in humans. We constructed a library of 2,000 Cas9 guide RNAs paired with DNA target sites and trained inDelphi, a machine learning model that predicts genotypes and frequencies of 1- to 60-base-pair deletions and 1-base-pair insertions with high accuracy (r = 0.87) in five human and mouse cell lines. inDelphi predicts that 5–11% of Cas9 guide RNAs targeting the human genome are ‘precise-50’, yielding a single genotype comprising greater than or equal to 50% of all major editing products. We experimentally confirmed precise-50 insertions and deletions in 195 human disease-relevant alleles, including correction in primary patient-derived fibroblasts of pathogenic alleles to wild-type genotype for Hermansky–Pudlak syndrome and Menkes disease. This study establishes an approach for precise, template-free genome editing.
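The ‘precise-50’ criterion above can be illustrated with a minimal sketch. It is not the inDelphi implementation: the function name, the dictionary format and the example frequencies are hypothetical, and it assumes that the predicted frequencies of the major editing products for a guide RNA are already available.

```python
from typing import Dict


def is_precise(genotype_freqs: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True if one genotype makes up at least `threshold` of all listed products.

    `genotype_freqs` maps a genotype label to its (predicted) frequency.
    Frequencies are renormalized, so they do not need to sum to 1.
    """
    total = sum(genotype_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    top_fraction = max(genotype_freqs.values()) / total
    return top_fraction >= threshold


# Hypothetical predicted outcome frequencies for two guide RNAs (illustration only)
guide_a = {"1-bp insertion (A)": 0.62, "2-bp deletion": 0.20, "7-bp deletion": 0.18}
guide_b = {"1-bp insertion (T)": 0.30, "3-bp deletion": 0.35, "9-bp deletion": 0.35}

print(is_precise(guide_a))  # True: the top genotype is 62% of products
print(is_precise(guide_b))  # False: no genotype reaches 50%
```

With `threshold=0.5` this corresponds to the precise-50 definition stated in the abstract; other thresholds would define analogous precision classes.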

Data availability

High-throughput sequencing data have been deposited in the NCBI Sequence Read Archive database under accession codes SRP141261 and SRP141144. Processed data have been deposited under the following DOIs: https://doi.org/10.6084/m9.figshare.6838016, https://doi.org/10.6084/m9.figshare.6837959, https://doi.org/10.6084/m9.figshare.6837956, https://doi.org/10.6084/m9.figshare.6837953, and https://doi.org/10.6084/m9.figshare.6837947.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The authors thank O. Juez, R. Jodhani and C. Araneo for technical assistance and the MIT Biomicro Center, the Harvard Medical School Biopolymers Facility, and the Broad Institute Genomics Platform for sequencing. The authors acknowledge funding from an NSF Graduate Research Fellowship to M.W.S.; an NWO Rubicon Fellowship to M.A.; 1R01HG010372 (C.A.C.); DARPA HR0011-17-2-0049, NIH RM1 HG009490, R01 EB022376, R35 GM118062, HHMI (D.R.L.); 1R01HG008363, 1R01HG008754 (D.K.G.); 1K01DK101684, the Human Frontier Science Program, NWO, Brigham Research Institute, Harvard Stem Cell Institute, and American Cancer Society (R.I.S.).

Reviewer information

Nature thanks D. Durocher, R. Platt and the anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Max W. Shen, Mandana Arbab

Affiliations

  1. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA

    • Max W. Shen
  2. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

    • Max W. Shen
    • David K. Gifford
  3. Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA

    • Mandana Arbab
    • David R. Liu
  4. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA

    • Mandana Arbab
    • David R. Liu
  5. Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA

    • Mandana Arbab
    • David R. Liu
  6. Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

    • Jonathan Y. Hsu
    • David K. Gifford
  7. Molecular Pathology Unit, Center for Cancer Research, and Center for Computational and Integrative Biology, Massachusetts General Hospital, Charlestown, MA, USA

    • Jonathan Y. Hsu
  8. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA

    • Daniel Worstell
    • Sannie J. Culbertson
    • Olga Krabbe
    • Christopher A. Cassa
    • Richard I. Sherwood
  9. Hubrecht Institute for Developmental Biology and Stem Cell Research, Royal Netherlands Academy of Arts and Sciences (KNAW), Utrecht, The Netherlands

    • Olga Krabbe
    • Richard I. Sherwood
  10. Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Christopher A. Cassa
    • David K. Gifford
  11. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

    • David K. Gifford

Contributions

M.W.S., J.Y.H. and D.K.G. contributed to the inDelphi model. M.W.S., M.A., C.A.C., D.R.L., D.K.G. and R.I.S. contributed to the editing libraries, assays and applications. M.A. and R.I.S. contributed to the library experimental protocol and performed library experiments in mESCs, DNA repair-deficient mESCs, and U2OS cells. D.W., S.J.C., O.K. and R.I.S. performed endogenous experiments in mESCs, HCT116, U2OS and HEK293T cells. M.A. performed endogenous experiments in primary patient fibroblasts. M.W.S., J.Y.H., C.A.C. and D.K.G. contributed to algorithm development and computational analysis. M.W.S., M.A., D.R.L., D.K.G. and R.I.S. contributed to writing and editing the manuscript.

Competing interests

The authors declare competing interests: patent applications have been filed on this work. D.R.L. is a consultant and co-founder of Editas Medicine, Beam Therapeutics and Pairwise Plants, companies that use genome editing technologies. D.K.G. is a co-founder of Think Therapeutics, a company that uses machine learning for therapeutic development.

Corresponding authors

Correspondence to David R. Liu or David K. Gifford or Richard I. Sherwood.

Extended data figures and tables

  1. Extended Data Fig. 1 Design and cloning of a high-throughput library to assess CRISPR–Cas9-mediated editing products, yielding diverse and replicate-consistent data that is concordant with repair spectra at endogenous human genomic loci.

    a, Empirical distributions of various predicted and measured properties of DNA from 169,279 SpCas9 gRNA target sites in the human genome. Number of target sites per range used to design lib-A are indicated. b, Cumulative percentage of endogenous deletions in VO target sites in HEK293 (n = 89 target sites), HCT116 (n = 92) and K562 (n = 86) cells that delete up to the reported number of nucleotides (x axis). c, Schematic of the cloning process used to clone lib-A and lib-B (Methods, Supplementary Discussion, Supplementary Methods). d, Number of unique high-confidence editing outcomes (Supplementary Methods) called by simulating data subsampling in data in lib-A (n = 2,000 target sites) in mESCs (combined data from n = 3 independent biological replicates) and U2OS cells (combined data from n = 2 independent biological replicates). For ‘all’, the original non-subsampled data are presented. Each box depicts data for 2,000 target sites. Outliers are not depicted. e, Pearson’s r of genotype frequencies comparing lib-A in mESCs and U2OS cells with endogenous data in HEK293 (n = 87 target sites), HCT116 (n = 88), and K562 (n = 86) cells. Outliers are depicted as diamonds. 1-bp insertion frequency adjustment was performed at each target site by proportionally scaling them to be equal between two cell types. f, Pearson’s r of genotype frequencies at lib-A target sites, comparing two independent biological replicate experiments in mESCs (n = 1,861 target sites, median r = 0.89) and U2OS cells (n = 1,921, median r = 0.77). Outliers are depicted as diamonds. Box plots denote the 25th, 50th and 75th percentiles and whiskers show 1.5 times the interquartile range. Source data

  2. Extended Data Fig. 2 Categorizing and modelling Cas9-mediated DNA repair products with manual data-analysis and automated machine learning through inDelphi.

    a, b, Categories of Cas9-mediated genotypic outcomes in data from endogenous contexts at VO target sites in K562 (n = 88 target sites), HCT116 (n = 92), HEK293 (n = 89) cells (collectively, a) and U2OS cells (b, n = 1,958 lib-A target sites). c, Categories and defined properties (Supplementary Methods) of all sequence alignments consistent with a Cas9-mediated 7-bp deletion. d, Hypothesized mechanisms for template-free DNA repair at Cas9-mediated DSBs based on components of the classical NHEJ, alternative NHEJ or MMEJ pathways (Supplementary Discussion). e, Function learned for modelling MH deletions (Supplementary Methods). f, Function learned for modelling MH-independent deletions (MHless-NN) mapping deletion length to a numeric score (psi, Supplementary Methods, point plot) and with deletion length penalty normalized to sum to 1 (phi, Supplementary Methods, histogram). Source data

  3. Extended Data Fig. 3 Influential role of hyperlocal sequence context features in predicting and causing 1-bp insertions.

    a, Frequency of 1-bp insertions in mESCs (n = 1,981 lib-A target sites) and U2OS cells (n = 1,918) with varying −4 nucleotides. b, c, Plot of 1-bp insertion frequency in mESCs (n = 1,996 lib-A target sites) and U2OS cells (n = 1,966) compared to their total phi score (b) and predicted deletion length precision score (c) with Pearson’s r. d, Comparison of 1-bp insertion frequencies among all edited products from 1,966 lib-A target sites in U2OS cells (combined data from n = 2 independent biological replicates). e, Nucleotides and their effect on the frequency of 1-bp insertions in U2OS cells. Only bases with non-zero linear regression weights in 10,000-fold iterative cross-validation are shown. Total n = 1,966 lib-A target sites. f, Insertion frequency in mESCs (n = 205) and U2OS cells (n = 217) when varying four bases by the cleavage site (positions −5 to −2 counted from the NGG-PAM at positions 0–2) contained within three target sites designed with weak microhomology. g, Microhomology strength (deletion phi score) and 1-bp insertions in mESCs for 312 ‘4-bp’ target sites and 89 VO sequences. *P = 6.1 × 10⁻⁹; two-sided two-sample t-test, test statistic = −5.94, d.f. = 399, Hedges’ g effect size = 0.49. Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. Source data

  4. Extended Data Fig. 4 inDelphi predictions represent nearly all editing outcomes and are accurate at predicting the frequencies of genotypes, indel lengths, and frameshift frequencies.

    a, b, Pearson’s r for held-out lib-A target sites comparing inDelphi predictions with observed frequencies for genotypes (a) and indel lengths (b) in mESCs and U2OS cells. The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range. Densities were smoothed with noise but do not extend beyond the data. c, Pie chart depicting the output of inDelphi for specific outcome classes at lib-A target sites in mESCs. d, e, Comparison of two methods for frameshift predictions to observed values with Pearson’s r in HCT116 cells (d, n = 91 target sites) and K562 cells (e, n = 82 target sites). The error band represents the 95% confidence intervals around the regression estimate with 1,000-fold bootstrapping. f, Distribution of predicted frameshift frequencies among 1–60-bp deletions for SpCas9 gRNAs targeting exons (n = 1,000,294 gRNAs; mean = 66.4%) and shuffled versions (mean, 69.3%), and introns (n = 740,759) in the human genome. Dashed lines indicate means. ***P < 10⁻³⁰⁰, two-sided Welch’s t-test, test statistic = −145.5, d.f. = 1,506,304, Hedges’ g = −0.19. Source data

  5. Extended Data Fig. 5 Characterization of lib-B data including pathogenic microduplication repair in wild-type mESCs, wild-type U2OS cells and mESCs treated with DPKi3, NU7026 and MLN4924.

    a, Box plots of the number of unique high-confidence editing outcomes (see Supplementary Methods) called by simulating data subsampling in data at 2,000 lib-B target sites in mESCs (combined data from n = 2 independent technical replicates) and U2OS cells (combined data from n = 2 independent biological replicates). In ‘all’, the full non-subsampled data are presented (see Supplementary Table 2 for read counts). Each box depicts data for 2,000 target sites. The box denotes the 25th, 50th, and 75th percentiles and whiskers show 1.5 times the interquartile range. Outliers are not depicted. b, Frequencies of repair to wild-type genotype at 567 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. c, Frequencies of repair to wild-type frame at 437 ClinVar pathogenic alleles versus predicted frequencies in lib-B in human U2OS cells with Pearson’s r. d, Frequency of pathogenic microduplication repair in wild-type mESCs (n = 1,480 target sites) compared to mESCs treated with MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563). Source data

  6. Extended Data Fig. 6 Altered distributions of Cas9-mediated genotypic products in Prkdc−/−Lig4−/− mESCs and mESCs treated with DPKi3, NU7026, and MLN4924 compared to wild-type mESCs.

    a, Comparison of MH deletions among all deletions at lib-B target sites in wild-type cells (n = 1,909 target sites), cells treated with DPKi3 (n = 1,999), MLN4924 (n = 1,995) or NU7026 (n = 1,999) and Prkdc−/−Lig4−/− cells (n = 1,446). Statistical tests performed against wild-type population. *P = 5.6 × 10⁻⁵, **P = 3.5 × 10⁻¹³, ***P = 5.0 × 10⁻⁴¹, two-sided Welch’s t-test. b, Comparison of the frequency of each class of MH-less deletions among all deletion products in wild-type (lib-A and lib-B target sites, n = 3,829 target sites), DPKi3 (lib-B, n = 1,990), MLN4924 (lib-B, n = 1,980), NU7026 (lib-B, n = 1,992) and Prkdc−/−Lig4−/− (lib-A and lib-B target sites, n = 3,344). P values are compared to wild-type, two-sided Welch’s t-test. c, Frequency of 1-bp insertions at 1,055 target sites in lib-A in Prkdc−/−Lig4−/− mESCs. d, Frequencies of deletion repair to wild-type genotype in lib-B in wild-type mESCs (n = 1,480 target sites, combined data from two technical replicates) compared to conditions, with combined data from two independent biological replicates for each of Prkdc−/−Lig4−/− (n = 1,041 target sites), MLN4924 (n = 1,569), NU7026 (n = 1,561) and DPKi3 (n = 1,563). e, Table of Pearson’s r of the change in disease correction frequency compared to wild-type at n = 791 target sites for each pair of conditions. f, g, Annexin V-568 staining flow cytometry contour plots (f) and mean ± standard deviation values (g) in wild-type and Prkdc−/−Lig4−/− lib-A mESCs following transfection with SpCas9–P2A–GFP (representative data for n = 2 experiments). Box plots denote the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. For detailed statistics on significance tests, see Methods. Source data

  7. Extended Data Fig. 7 Template-free Cas9-nuclease editing of human and mouse cells containing pathogenic alleles.

    a, b, Flow cytometric contour plots showing GFP fluorescence and LDL–Dylight550 uptake in (a) and fluorescence microscopy of (b) HCT116 cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted (representative data for n = 2 experiments). c, Fluorescence microscopy of U2OS cells containing the denoted LDLR alleles and treated with SaCas9 and gRNA when denoted (representative data for n = 2 experiments). d, e, Flow cytometry gating strategy used for mESC and LDLRdup–P2A–GFP untreated (d) and treated with SpCas9 and gRNA (e). f, g, Results of 12 pathogenic 1-bp deletion alleles selected by inDelphi for high 1-bp insertion frequency (combined data from n = 2 independent biological replicates) compared to lib-A (f) and presented in a table (g). The box denotes the 25th, 50th and 75th percentiles, whiskers show 1.5 times the interquartile range, and outliers are depicted as diamonds. *P = 1.6 × 10⁻⁴, two-sided Welch’s t-test. For detailed statistics, see Methods. In the table, the most frequent 1-bp insertion genotype predicted by inDelphi that does not correspond to the wild-type genotype is indicated by an asterisk. In fluorescence microscopy plots, GFP fluorescence is shown in green, LDL–Dylight550 uptake in red, and Hoechst staining nuclei in blue. Source data

  8. Extended Data Table 1 Frequency of gRNAs in the human genome with denoted Cas9-mediated outcome precision
  9. Extended Data Table 2 Endogenous repair of 24 designed high-precision gRNAs in human cell lines
  10. Extended Data Table 3 Repair of ten pathogenic microduplication alleles in individual cellular experiments
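
The frameshift frequencies referred to in Extended Data Fig. 4 can be illustrated with a short sketch. It rests only on the general fact that an indel shifts the reading frame when its net length change is not a multiple of 3; the function name, the input format and the example numbers are hypothetical and are not taken from the paper’s code or data.

```python
from typing import Dict


def frameshift_frequency(indel_length_freqs: Dict[int, float]) -> float:
    """Fraction of edited products whose net length change is not a multiple of 3.

    Keys are signed indel lengths in bp (negative = deletion, positive = insertion);
    values are the frequencies of products with that length change.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    shifted = sum(
        freq for length, freq in indel_length_freqs.items() if length % 3 != 0
    )
    return shifted / total


# Hypothetical indel length distribution for one target site (illustration only)
example = {+1: 0.25, -1: 0.25, -3: 0.25, -6: 0.25}
print(frameshift_frequency(example))  # 0.5: the +1 and -1 products shift the frame
```

Comparing such a value computed from predicted frequencies with the value computed from observed frequencies at each target site is one way to picture the per-site comparison described in panels d and e.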

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Discussion, Supplementary Methods and Supplementary References.

  2. Reporting Summary

About this article

DOI

https://doi.org/10.1038/s41586-018-0686-x
