CRISPR–Cas9 is poised to become the gene editing tool of choice in clinical contexts. Thus far, exploration of Cas9-induced genetic alterations has been limited to the immediate vicinity of the target site and distal off-target sequences, leading to the conclusion that CRISPR–Cas9 was reasonably specific. Here we report significant on-target mutagenesis, such as large deletions and more complex genomic rearrangements at the targeted sites in mouse embryonic stem cells, mouse hematopoietic progenitors and a human differentiated cell line. Using long-read sequencing and long-range PCR genotyping, we show that DNA breaks introduced by single-guide RNA/Cas9 frequently resolved into deletions extending over many kilobases. Furthermore, lesions distal to the cut site and crossover events were identified. The observed genomic damage in mitotically active cells caused by CRISPR–Cas9 editing may have pathogenic consequences.
The utility of the CRISPR–Cas9 system for gene therapy in humans has been recognized and extensively investigated1. Initial concerns about the off-target activity have been addressed by the development of sensitive detection methods, as well as modified Cas9 enzymes and improved delivery protocols that limit this type of damage2,3,4,5,6,7,8,9,10,11,12. The vast majority of on-target DNA repair outcomes after Cas9 cutting in a variety of cell types are thought to be insertions and deletions (indels) of less than 20 bp13,14,15. Although indels a few hundred nucleotides in size were also observed in experiments using Cas9 or other nucleases, they were reported to be rare16,17,18. Consequently, Cas9 has been assumed to be reasonably specific and the first approved clinical trials using Cas9 edited cells are underway (clinicaltrials.gov: NCT03081715, NCT03398967, NCT03166878, NCT02793856, NCT03044743, NCT03164135).
Studies using paired gRNAs to induce localized deletions also reported generation of more complex genotypes, such as inversions, endogenous and exogenous DNA insertions, and larger-than-expected deletions19,20,21,22,23. Single gRNAs were shown to induce deletions of up to 600 bp in mouse zygotes24. Deletions of up to 1.5 kb in a haploid cancer cell line potentially induced by single gRNAs have been described, but since the guides were directed to a small part of the genome and provided as a pool, the possibility of rare double-cutting events cannot be excluded25. Furthermore, the analysis of the alleles generated using both single and paired gRNAs has in most studies relied on amplification of short regions (<1 kb) around the target and potential off-target sites, limiting the scope of assessment. Lesions non-contiguous with the cleavage site, such as those reported in yeast upon I-SceI nuclease cutting, would also be missed by such short-range assessments26,27,28. Finally, cancer cell lines, whose genome and DNA repair mechanisms are abnormal, were often used in the context of studying Cas9-induced lesions, making extrapolations to normal tissues and cells problematic.
We speculate that current assessments may have missed a substantial proportion of potential genotypes generated by on-target Cas9 cutting and repair, some of which may have potential pathogenic consequences following somatic editing of large populations of mitotically active cells.
We first comprehensively explored allelic diversity induced by Cas9 at the X-linked PigA locus, which is hemizygous in male embryonic stem (ES) cells. In contrast to cancer-derived cell lines, ES cells have a normal karyotype and intact DNA repair mechanisms, which makes them more representative of a normal somatic cell. Although mouse ES cells and embryonic fibroblasts differ in their use of DNA repair pathways, it is not known how they compare to other somatic cells29. We introduced Cas9 and gRNA constructs targeting intronic and exonic sites of PigA into JM8 mouse ES cells using PiggyBac transposition. Cells with both constructs were selected and subsequently stained with FLAER reagent to quantify the proportion of PigA-deficient cells (Fig. 1a,b). Single gRNAs targeting exons 2 to 4 yielded very high rates of PigA loss (59–97%). Notably, single gRNAs targeting intronic sites also yielded PigA-deficient cells at significant frequencies. Ten different guides located 263–520 bp from the nearest exon caused 8–20% PigA loss, whereas two guides greater than 2 kb away induced 5–7% loss (Fig. 1c and Supplementary Table 1). We obtained similar results with transient expression using electroporation or lipofection of ribonucleoprotein complexes (RNP), proving that these observations were not a consequence of PiggyBac transposition, delivery method, antibiotic selection or cellular response to transfected plasmid DNA (Supplementary Fig. 1). Lower knockout efficiency using exonic guides correlated with slower editing dynamics when delivered by PiggyBac transposition (data not shown).
To understand what genetic changes underlie the generation of PigA-deficient cells, we amplified a 5.7-kb region around exon 2 from pools of cells edited with three selected gRNAs introduced by PiggyBac transposition, and sequenced the PCR products using the PacBio platform. We observed a depletion of read coverage on a kilobase-scale around the cut sites, consistent with the presence of large deletions (Fig. 2a). Cells edited with intronic guides and sorted for loss of PigA generally exhibited loss of the adjacent exon. If intronic regulatory sequences were present around the exon, the DNA of cells sorted for retention of PigA expression would be wild type or contain small indels around the cut site. However, the most frequent lesions in these cells were deletions extending many kilobases up- or downstream, away from the exon. We conclude that, in most cases, loss of PigA expression was likely caused by loss of the exon, rather than damage to intronic regulatory elements.
Clustering of PacBio reads yielded 183 unique, edited, high-quality alleles derived from three different gRNAs. These alleles ranged from simple deletions and insertions to complex rearrangements (Fig. 2a,b, Supplementary Table 2 and Supplementary Data 1). One of the alleles contained an insertion with a perfect match to four consecutive exons derived from the Hmgn1 gene (Fig. 2b). We speculate this represents a de novo insertion from the spliced and reverse-transcribed RNA, rather than from one of the pseudogenized forms of Hmgn1, as the pseudogenes diverge in sequence from the observed insertion.
To fully characterize a variety of edited PigA loci, we isolated single-cell clones. The PigA loci around the gRNA target site were amplified using PCR primer pairs positioned progressively further apart (up to 16 kb), until amplicons were generated. These were sequenced using conventional Sanger sequencing technology (Supplementary Fig. 2a). This strategy allowed us to recover an allele in most cases (133/141, 94%; Supplementary Tables 2 and 3 and Supplementary Data 1).
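The stepwise genotyping strategy can be illustrated with a minimal Python sketch. It rests on a simplified model that is not taken from the study: primer pairs are placed symmetrically around the cut site, the listed spans (other than the 16-kb maximum) are hypothetical, and a pair yields a product only when both primer-binding sites lie outside a single deleted interval.

```python
from typing import Optional, Sequence

# Hypothetical primer-pair spans in bp, from closest to furthest apart.
# Only the 16 kb maximum is stated in the text.
PRIMER_SPANS_BP = (1_000, 4_000, 8_000, 16_000)


def first_amplifying_span(deletion_start: int, deletion_end: int, cut_site: int,
                          spans: Sequence[int] = PRIMER_SPANS_BP) -> Optional[int]:
    """Return the smallest primer span that still brackets the deletion.

    The deletion is the half-open interval [deletion_start, deletion_end).
    Each primer pair is modelled as two positions placed symmetrically
    around the cut site (an assumption made for illustration only).
    """
    for span in sorted(spans):
        left = cut_site - span // 2
        right = cut_site + span // 2
        # Both primer-binding positions must survive the deletion.
        if left < deletion_start and deletion_end <= right:
            return span
    return None  # no amplicon: candidate for a larger rearrangement


print(first_amplifying_span(9_950, 10_010, 10_000))   # small indel
print(first_amplifying_span(9_000, 10_200, 10_000))   # 1.2 kb deletion
print(first_amplifying_span(0, 30_000, 10_000))       # very large deletion
```

In this toy model, alleles that return no span correspond to the cases in which no product could be recovered with any primer pair.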
Simple deletions overlapping both the cut site and the exon were found in almost three-quarters (69/93) of PigA-deficient alleles generated by single, intronic gRNAs (Supplementary Fig. 2b,c). The deletions varied in size, the largest spanning 9.5 kb. The remaining events were deletions combined with large insertions or more complex, multiple-lesion alleles. We obtained similar results using electroporation of RNP (Supplementary Fig. 1b). To assess the frequency of large deletions without strong selection for that outcome, we used an exonic gRNA causing 97% PigA loss. Although two-thirds of alleles (32/48) from PigA-deficient cells had indels <50 bp, as expected, >20% (10/48) had deletions >250 bp, extending up to 6 kb (Supplementary Fig. 2d). Because the deletions generated with the exonic gRNA were bidirectional, this is consistent with the average frequency of generating PigA-deficient cells with intronic guides positioned 263–520 bp from an exon (∼12%).
Notably, 23 of 133 recovered alleles contained additional lesions (single-nucleotide polymorphisms (SNPs), indels, large deletions and insertions) that were non-contiguous with the lesion at the cut site. In 13 out of 23 cases, the only exonic lesion detected was non-contiguous with the cut site (Fig. 2c). Furthermore, we observed alleles in which the intronic gRNA caused an inversion of a region containing the exon (Fig. 2d). Had the assessment been limited to the immediate vicinity of the cleavage site, such alleles would have been misclassified as wild type, and their phenotypic consequences would have been underestimated.
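The following Python sketch illustrates why a short-range assay can overlook non-contiguous lesions. The `Lesion` type, the coordinates and the 100-bp half-window are hypothetical and chosen only for illustration; the sketch assumes that a lesion is scored only if it lies entirely within the assayed window.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lesion:
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # e.g. "indel", "deletion", "inversion"


def split_by_assay_window(lesions: List[Lesion], cut_site: int,
                          half_window: int) -> Tuple[List[Lesion], List[Lesion]]:
    """Split lesions into those inside the assayed window and those outside it."""
    lo, hi = cut_site - half_window, cut_site + half_window
    seen = [l for l in lesions if lo <= l.start and l.end <= hi]
    missed = [l for l in lesions if l not in seen]
    return seen, missed


# Hypothetical allele: a small indel at the cut site plus a deletion
# located several hundred bp away (for example, overlapping an exon).
allele = [Lesion(998, 1003, "indel"), Lesion(1400, 1650, "deletion")]
seen, missed = split_by_assay_window(allele, cut_site=1000, half_window=100)
print([l.kind for l in seen], [l.kind for l in missed])
```

Under these assumptions the short-range assay would report only the small indel, while the distal deletion would go unrecorded.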
Insertions were present in 35 out of 133 recovered alleles. We could not find convincing local mapping for insertions shorter than 7 bp (13 alleles), which we speculate to be mostly non-templated nucleotides. The large majority of other insertions were constituted from sequence, which mapped to the PigA locus and encompassed inversions and duplications ranging from 11 bp to 2.5 kb (17 alleles; Fig. 2c,d and Supplementary Fig. 2c). The remaining five alleles contained DNA sequences that mapped to other parts of the mouse genome, such as interspersed repeats, or to exogenous, transfected sequences.
Six alleles did not contain lesions overlapping the nearest exon. Three of these were also wild type around the cut sites and are likely to contain lesions in other exons or larger rearrangements. The remaining three alleles contained only intronic lesions, which may interfere with splicing. In eight cases, it was not possible to recover any product with exon-spanning primers (Supplementary Fig. 3a, black primer pairs). To understand this class of events, we performed additional PCRs targeting each end of the PigA locus (Supplementary Fig. 3a, gray primer pairs). In three cases, just one end or neither end of the locus could be amplified, suggesting a larger deletion. In the remaining five cases, both ends were amplified. Since no product connecting the two ends could be obtained, these are likely to be translocations, inversions or large insertions (Supplementary Table 4).
To understand the diversity of potential deletion outcomes, we have repeated our original experiment in biological quadruplicate using the 5′ intronic gRNA. Cells with large deletions were enriched by sorting for PigA-negative cells and deletion fingerprints were generated by PCR. Each biological replicate differed substantially, despite a large number of unique deletion events sampled, indicating that the diversity of potential deletion outcomes is vast (Supplementary Fig. 4 and Supplementary Note).
Given that PigA is mono-allelic in the XY ES cells used in this study, we wished to exclude the possibility that the observations reflect some peculiarity of the lack of a homolog. The autosomal Cd9 locus was selected for this purpose as it is non-essential in ES cells and its protein product can be readily detected by cell surface staining. An exonic guide yielded 88% Cd9 loss, while 5′ and 3′ intronic guides generated 4.2% and 5.4% Cd9 loss, respectively (Fig. 3a,b and Supplementary Table 1a). Taking into account a 1.6% background of Cd9-low cells in the untransfected condition, we estimate the true proportion of Cd9 loss due to intronic cutting to be between 2.6% and 3.8%. This is consistent with results at the PigA locus, assuming both Cd9 alleles have to be destroyed to prevent Cd9 expression.
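The arithmetic behind this estimate can be reproduced with a short Python sketch. The background subtraction uses the percentages given above. The per-allele step is an added illustration, not an analysis from the study: it assumes that the two Cd9 alleles are disrupted independently and with equal probability, so that the biallelic loss rate is the square of the per-allele rate.

```python
import math

BACKGROUND_PCT = 1.6  # Cd9-low cells in the untransfected condition
observed_pct = {"5' intronic": 4.2, "3' intronic": 5.4}

# Background-corrected Cd9 loss attributable to intronic cutting.
corrected_pct = {guide: round(value - BACKGROUND_PCT, 1)
                 for guide, value in observed_pct.items()}

# Assumption (illustrative only): independent, equally likely disruption of
# both alleles, so biallelic loss = (per-allele loss) ** 2.
per_allele_pct = {guide: round(100 * math.sqrt(value / 100), 1)
                  for guide, value in corrected_pct.items()}

print(corrected_pct)    # background-corrected loss, in percent
print(per_allele_pct)   # implied per-allele loss, in percent
```

Under this independence assumption, the implied per-allele rates fall within the 8–20% PigA loss reported for intronic guides located 263–520 bp from an exon.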
To describe the genetic events underlying Cd9 loss, we isolated single-cell clones edited with the 3′ intronic guide, ascertained their expression status by flow cytometry and sequenced the area around the cut site using PacBio and Sanger technologies. The largest deletion spanned 5.5 kb. A pileup of 185 resolved alleles derived from 93 single-cell clones shows a clear enrichment for deletions overlapping the exon in clones negative for Cd9 compared to positive clones and ones exhibiting a mixture of Cd9-positive and Cd9-negative cells (Fig. 3c). The bimodal expression pattern of some of the clones may be the result of a mixed clone or a protracted repair event that was resolved during clone outgrowth. The haplosufficient nature of the Cd9 gene is demonstrated by the fact that we could detect at least one allele with an intact exon in all but one of the 66 Cd9-positive and mixed clones. Similarly, only one of the 27 Cd9 negative clones had an intact exon, this exception presumably harboring other undetected lesions. We have further confirmed by PCR genotyping that large deletions are a common outcome in single-cell clones edited at the Cd9 locus using additional intronic and exonic guides (Supplementary Table 5 and Supplementary Note).
The experiment at the Cd9 locus was performed in mouse ES cells derived from an F1 cross between Mus musculus (BL6) and Mus musculus castaneus (CAST) mouse strains, which allowed us to distinguish the homologous chromosomes. In no case was the repair outcome identical between homologs within a clone, despite 15 alleles reoccurring between clones. This result is consistent with the great diversity of outcomes at the PigA locus. Just over half of the edited clones (52 out of 93) contained precisely one CAST and one BL6 allele, as expected. Notably, in 18 clones only one allele was detected, potentially due to translocations, very large deletions, insertions or inversions, monosomy or loss of heterozygosity (LOH) either local or chromosome-wide. 21 clones contained an abnormal number of alleles, which could have resulted from a mixed clone, large duplication, repair events happening during clone outgrowth or aneuploidy induced by Cas9 cutting. Finally, two clones contained recombinant BL6-CAST alleles (Fig. 3d). In one case, an LOH event distal to the breakpoints converted part of the CAST allele to BL6. In another case, the BL6-CAST crossover boundary did not coincide with the breakpoint. We conclude that the creation of these alleles likely involved interhomolog strand invasion as they cannot be explained by a simple rejoining of the resected ends of two broken chromosomes.
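As a quick consistency check, the clone categories described above can be tallied in Python. The counts come from the text; the category labels are shorthand introduced here.

```python
# Clone counts from the text; labels are shorthand for this sketch.
clone_classes = {
    "one CAST and one BL6 allele": 52,
    "single allele detected": 18,
    "abnormal number of alleles": 21,
    "recombinant BL6-CAST allele": 2,
}

total_clones = sum(clone_classes.values())
share_pct = {label: round(100 * count / total_clones, 1)
             for label, count in clone_classes.items()}

print(total_clones)
for label, pct in share_pct.items():
    print(f"{label}: {pct}%")
```

The four categories account for all 93 clones, and the expected one-CAST, one-BL6 configuration makes up just over half of them.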
To investigate whether the observed on-target extensive DNA repair-associated damage is an intrinsic property of undifferentiated mouse ES cells, we examined the consequence of editing in a human differentiated cell line. An immortalized human female retinal pigment epithelial cell line (RPE1) was used. Although this is a female cell line, X-inactivation renders it functionally hemizygous at the PIGA locus. Editing PIGA with single exonic and intronic gRNAs delivered with PiggyBac vectors resulted in a loss of PIGA at frequencies comparable to those observed in mouse ES cells (Fig. 4a,b). PCR genotyping and Sanger sequencing of 41 PIGA-deficient single-cell clones edited with intronic gRNAs revealed large deletions, insertions, inversions and non-contiguous lesions overlapping the exon (Fig. 4c–e). In some clones only one small, intronic indel allele was detected, which we interpret as an inconsequential edit of the inactive chromosome, coupled with a loss-of-function lesion on the active X-chromosome; the lesion would inactivate one or both primer binding sites.
Similar results were obtained in lineage-negative cells from the bone marrow of mice homozygous for a Cas9-GFP cassette at the Rosa26 locus. Progenitor cells enriched by removal of differentiated cells on magnetic columns were electroporated with a crRNA:trRNA complex against the GFP locus, and GFP-negative single-cell clones were isolated and genotyped around the cut site with three different primer pairs spanning up to 3.6 kb. At least one large deletion product between 100 bp and ∼3 kb in size was detected in 35 out of 96 clones (Supplementary Fig. 5a,b). We verified eight deletion products by Sanger sequencing across the deletion junction (Supplementary Fig. 6a). Only wild-type-size products were detected in the remaining clones and none of the 96 control clones exhibited any deletion bands (Supplementary Table 5, “progenitor” experiment).
The editing in this study was conducted at actively transcribed loci in normal ES cells and progenitor cells, both with intact DNA repair processes, as well as in an immortalized, differentiated human cell line; each is a surrogate for various clinical editing applications. We show that extensive on-target genomic damage is a common outcome at all loci and in all cell lines tested. Moreover, the genetic consequences observed are not limited to the target locus, as events such as loss-of-heterozygosity will uncover recessive alleles, whereas translocations, inversions and deletions will elicit long-range transcriptional consequences. Given that a target locus would presumably be transcriptionally active, mutations that juxtapose this to one of the hundreds of cancer-driver genes may initiate neoplasia. In the clinical context of editing many billions of cells, the multitude of different mutations generated makes it likely that one or more edited cells in each protocol would be endowed with an important pathogenic lesion. Such lesions may constitute a first carcinogenic 'hit' in stem cells and progenitors, which have a long replicative lifespan and may become neoplastic with time. Such a circumstance would be similar to the activation of LMO2 by pro-viral insertion in some of the early gene-therapy trials, which caused cancer in these patients30. Results reported here also illustrate a need to thoroughly examine the genome when editing is conducted ex vivo. As genetic damage is frequent, extensive and undetectable by the short-range PCR assays that are commonly used, comprehensive genomic analysis is warranted to identify cells with normal genomes before patient administration.
Mouse ES cell culture and transfection.
gRNA-expression vectors contain a U6 promoter with an “F+E” scaffold31 and a PGK-Puro-2A-BFP cassette, flanked by PiggyBac repeats. The Cas9-expression vector contains a Cas9-Blast cassette expressed from a short EF1α promoter in a pKLV backbone13,32. CAST/BL6 (CB9; a gift from Prof A. Ferguson-Smith), AB2.2 mCherry/GFP reporter (a gift from X. Gao and P. Liu) or JM8.A3 mouse ES cells33,34 were cultured in M15 media (high-glucose DMEM, with 15% FCS, beta-mercaptoethanol and L-glutamine) on STO-neo-LIF-puro (SNLP) feeder cells.
Complexes of lipofectamine LTX (2.5 μl), plus reagent (0.5 μl), 200 ng hyperactive PiggyBac transposase35, 100 ng of the PiggyBac Cas9-Blast plasmid and 50 ng of the PiggyBac gRNA-Puro plasmid were prepared in 50 μl OptiMEM following manufacturer's instructions. Cells were trypsinized, washed in M15, resuspended in M15+LIF and seeded onto a gelatinized 24-well plate, containing the lipofectamine DNA complexes, at 3 × 10⁵ cells per well. From day 2, M15+LIF media containing puromycin (3 μg/ml) and blasticidin (10 μg/ml) was used. The same setup was used for the RPE1 cell line, except the Cas9-Blast plasmid was omitted. A similar setup was used for lipofection of RNP complexes with 20 pmol of both hybridized crRNA:trRNA (Sigma) and EnGen Cas9 NLS (NEB). Neon Transfection System (Thermo Fisher Scientific; 1,600 V/10 ms/3 pulses) was used for electroporation of 1.5 × 10⁵ cells in buffer R with 6 pmol each of crRNA:trRNA, electroporation enhancer (IDT) and Cas9 protein or 9 pmol each of crRNA:trRNA and Cas9 protein. Around 3 × 10⁵ cells were collected on day 14 (or day 17, in the case of the RPE1 cells), stained in PBS+0.1% BSA for 30 min at room temperature with 1 μg/ml FLAER reagent (Cedarlane) or anti-Cd9-PE antibody (cat. 124805, Biolegend), washed twice and analyzed using a Cytoflex flow cytometer. For single-cell cloning and PacBio experiments, cells were transfected in six-well plates with five times more cells and reagents, expanded onto 10-cm dishes and sorted by fluorescence-activated cell sorting for loss of FLAER or Cd9 staining on day 14 using MoFlo XDP (Beckman Coulter). Single-cell clones were isolated and grown in 96-well plates. DNA was extracted by proteinase K digestion followed by ethanol precipitation. PCR reactions were conducted using primers in Supplementary Table 6 and LongAmp polymerase (NEB) following manufacturer's instructions.
Primers were designed using Primer3-BLAST (Supplementary Table 6). Guide RNAs were designed using Benchling and CRISPRscan36. Alignment of Sanger-sequenced PCR products was performed using BLAT (v 36) and converted into BAM format using a customized script from T. Marschall (https://github.com/ALLBio/allbiotc2/tree/master/synthetic-benchmark). Mixed traces were resolved using PolyPeakParser37. Analysis of PacBio data was performed using the command-line version of SMRT-Link software (pbtranscript 1.0.1.TAG-1470). For the PigA locus pileup, circular consensus sequences were called with at least one full pass and a minimum predicted accuracy of 0.9. Individual PigA and Cd9 alleles were reconstructed by following the “Running Iso Seq using SMRTLink” tutorial on GitHub, except that the “--targeted_isoseq” option was used at the clustering step. Resulting alleles were mapped to the reference genome using bwa mem (v 0.7.17-r1188). In the case of the PigA locus, mapped reads were clustered further using a custom script. Genome coverage was calculated with “bedtools genomecov -dz” (v 2.27.1) using circular consensus sequences (PigA locus) or reconstructed alleles (Cd9 locus). All downstream analysis was performed using custom R (v 3.3.2) and bash scripts and visualized with the ggplot2 package. Flow cytometric data were processed with FlowJo (v 10.4.1).
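To make the coverage step concrete, the Python sketch below computes per-base depth from aligned blocks and keeps only covered positions, using zero-based coordinates. It is a toy re-implementation of the idea for illustration, not the bedtools code and not the custom scripts used in the study; the function name and the example reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Block = Tuple[int, int]  # aligned block as a half-open interval [start, end)


def per_base_depth(blocks: Iterable[Block]) -> Dict[int, int]:
    """Count how many aligned blocks cover each zero-based position.

    Positions with zero depth are omitted from the result.
    """
    depth: Counter = Counter()
    for start, end in blocks:
        for pos in range(start, end):
            depth[pos] += 1
    return dict(sorted(depth.items()))


# Hypothetical 10-bp amplicon: one wild-type read and one read carrying a
# deletion of positions 5-7, represented as two aligned blocks.
wild_type_read = [(0, 10)]
deleted_read = [(0, 5), (8, 10)]
depth = per_base_depth(wild_type_read + deleted_read)
print(depth)
```

In a pileup of this kind, positions removed by deletions show reduced depth relative to the flanks, which is the signature described as a depletion of read coverage around the cut sites.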
Mouse bone marrow cell culture and transfection.
Bone marrow cells from a homozygous C57BL/6 CAS9-EGFP knock-in mouse38 were isolated by flushing tibias and femurs in HBSS (Life Technologies) supplemented with 2% FBS and 10 mM HEPES (Sigma). Lineage-negative cells were isolated using Direct Lineage Cell Depletion Kit Mouse (Miltenyi Biotec) and cultured in X-Vivo (Lonza) with 2% FBS, 50 ng/ml stem cell factor, 50 ng/ml thrombopoietin, 10 ng/ml IL-6 (Peprotech). After culturing for 3 h, 1 × 10⁵ cells were electroporated (1,550 V/20 ms/1 pulse) in buffer T with 44 pmol of preassembled crRNA:trRNA duplex (guide #311, Supplementary Table 1; IDT) using the Neon Transfection System. GFP-negative cells were sorted 4 d after the electroporation and plated into Methocult M3434 media (6,000 cells per 3 ml, StemCell Technologies). Seven days later, single colonies were picked into 25 μl of direct PCR lysis buffer (Peqlab) with 1 μg/ml proteinase K and analyzed by PCR (Supplementary Fig. 5 and Supplementary Table 5).
Life Sciences Reporting Summary.
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
PacBio sequencing data are accessible at the European Nucleotide Archive under accession numbers ERS2396492 (PigA) and ERS2396493 (Cd9). Barcoding information is in Supplementary Data 2. Correspondence and requests for materials, additional data and code should be addressed to A.B. (firstname.lastname@example.org).
We wish to thank M. Friedrich for sharing his gRNA expression construct and technical advice, E. Metzakopian for technical advice and critical reading of the manuscript, G. Rutledge for critical reading of the early manuscript, A. Ferguson-Smith for the CAST/B6 hybrid ES cells, P. Liu and X. Gao for mCherry/GFP reporter cells, S. Jackson's group for the Cas9-expressing RPE1 cell line and the Cytometry Core Facility for assistance with cell sorting. This work was supported by the Wellcome Trust Grant number 098051.