To the Editor:
The recent correspondence by Schaefer et al.1 contradicts numerous publications in the field using similar analytical approaches and methods2, some performed in human induced pluripotent stem cell (iPSC) and embryonic stem cell (ESC) lines3,4. The authors suggest that the CRISPR–Cas9 system is mutagenic in genomic regions not expected to be targeted by the gRNA. We believe that the conclusions drawn in this work are not substantiated by the data presented. It is impossible to ascribe the observed differences in the subject mice to the effects of CRISPR–Cas9 per se. The genetic differences seen in this comparative study were likely present before editing with CRISPR–Cas9.
The selection of a single cohoused mouse (rather than the parents, bona fide littermates, and/or a larger number of cohoused controls) as the control is inadequate. The authors cannot rule out the possibility that the reported genomic differences between the two experimental animals and the one control animal existed before experimental manipulation with CRISPR–Cas9. Oey et al.5 have shown in the C57BL/6J strain (a different strain from that used by Schaefer et al.1) that littermates analyzed by whole-genome sequencing (WGS) can differ by as many as 985 single-nucleotide variants (SNVs); the vast majority of these SNVs were present in the parents, and a small minority arose naturally in the progeny. Although the strain difference prevents direct inference of mutation rates for the FVB mouse, the number of SNV differences between littermates is roughly similar to the number reported in Schaefer et al.1.
To further understand the observations in Schaefer et al.1, we reanalyzed their sequencing data deposited in the NCBI-SRA database. Raw sequence (fastq) files were retrieved, and, because the analysis parameters were not sufficiently described to reproduce the authors' analysis, we realigned the reads and identified variants using a standard analytical framework (Supplementary Methods). Similar to Schaefer et al.1, we identified SNV and indel differences between the control “FVB” mouse and the test “F03” and “F05” mice, with 4,022 SNVs and 2,799 indel variants found across the three mice. We focused our analysis on variants with exactly two alleles across the three mice. We filtered out variants where there were either three or more alleles across the three mice, or where all alleles were identical in all mice yet distinct from the mouse reference sequence (mm10). This left 3,978 SNVs and 2,713 indel variants for analysis (Supplementary Table 1).
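The allele-count filter described above can be sketched as follows. This is a minimal illustration, not the pipeline in the Supplementary Methods: the genotype representation, the function names `distinct_alleles` and `keep_site`, and the three toy sites are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical representation: one unphased diploid call per mouse, e.g. ("A", "G").
Genotype = Tuple[str, str]


def distinct_alleles(calls: Dict[str, Genotype]) -> set:
    """Collect every allele observed in any of the mice at one site."""
    return {allele for genotype in calls.values() for allele in genotype}


def keep_site(calls: Dict[str, Genotype]) -> bool:
    """Keep a site only if exactly two alleles occur across the three mice.

    Sites with three or more alleles are dropped, as are sites where all
    mice carry one identical allele (these differ only from the reference).
    """
    return len(distinct_alleles(calls)) == 2


# Toy sites, invented for illustration only (not taken from the data set).
sites = {
    "site_a": {"FVB": ("A", "A"), "F03": ("A", "G"), "F05": ("G", "G")},  # two alleles: kept
    "site_b": {"FVB": ("A", "A"), "F03": ("A", "G"), "F05": ("A", "T")},  # three alleles: dropped
    "site_c": {"FVB": ("T", "T"), "F03": ("T", "T"), "F05": ("T", "T")},  # one allele: dropped
}

kept = [name for name, calls in sites.items() if keep_site(calls)]
print(kept)
```

Treating every site with a single shared allele as excluded is a simplification: in a real variant list such a site appears only because that allele differs from mm10, which is the second exclusion criterion stated in the text.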
Our analysis shows striking genomic similarity between F03 and F05. Approximately 60% of SNV and indel genotypes are shared between F03 and F05 and distinct from the FVB control. Such a strong similarity between the F03 and F05 mice is unexpected for a mutagenesis event occurring within independently created mice, and it suggests either underlying genetic similarities between F03 and F05 or a mutagen that is strongly directive.
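One way to tally this kind of sharing is sketched below, assuming the same hypothetical per-mouse genotype tuples as a starting point. The helper names and the five toy sites are invented for illustration and are constructed so that three of five sites are shared; they are not the real variant calls.

```python
def normalized(genotype):
    """Sort the two alleles so that ("G", "A") and ("A", "G") compare equal."""
    return tuple(sorted(genotype))


def fraction_shared_vs_control(sites):
    """Fraction of sites where F03 and F05 agree with each other but not with FVB."""
    shared = sum(
        1
        for calls in sites
        if normalized(calls["F03"]) == normalized(calls["F05"])
        and normalized(calls["F03"]) != normalized(calls["FVB"])
    )
    return shared / len(sites)


# Toy data, invented for illustration only.
toy_sites = [
    {"FVB": ("A", "A"), "F03": ("G", "G"), "F05": ("G", "G")},  # shared, differs from FVB
    {"FVB": ("C", "C"), "F03": ("C", "T"), "F05": ("T", "C")},  # shared (unphased), differs
    {"FVB": ("T", "T"), "F03": ("A", "A"), "F05": ("A", "A")},  # shared, differs from FVB
    {"FVB": ("G", "G"), "F03": ("G", "A"), "F05": ("G", "G")},  # F03 only
    {"FVB": ("C", "C"), "F03": ("C", "C"), "F05": ("C", "G")},  # F05 only
]

fraction = fraction_shared_vs_control(toy_sites)
print(f"{fraction:.0%} of toy sites are shared by F03 and F05 and distinct from FVB")
```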
We annotated whether each variant was found in the mouse reference genome (mm10), which derives from a Black 6 (C57BL/6J) strain. In reviewing the variant list, it became clear that many of the variants were distributed relative to the mm10 reference in a way that would not be expected if a mutagen were applied (e.g., CRISPR–Cas9, as proposed by the authors, or potentially another step in the process). As summarized in Supplementary Table 2, there are 2,508 SNVs where the control FVB mouse genotype is homozygous and matches the mm10 reference and the F03 or F05 mice have a different genotype. Of these, 409 (16%) are 'complete switches', where F03 and F05 have identical homozygous genotypes that are not the mm10 reference. In examining the opposite case, we note 730 SNVs in the FVB control mouse homozygous for a genotype not matching the mm10 reference. Of these, a striking 578 SNVs (79%) appear as 'complete switches' for both the F03 and F05 mice back to the homozygous mm10 reference. By contrast, there are only 27 variants (4%) where both F03 and F05 mice have homozygous changes to a genotype that does not match the mm10 reference. Considering just the 'complete switches', a random substitution at a homozygous non-mm10 site can produce any of three other bases, only one of which is the mm10 allele, so the expected distribution of SNVs is 67% to one of the two non-mm10 alleles and 33% to the mm10 reference. Here we instead see 4% and 96%, respectively, a highly significant deviation (chi-squared P < 0.00001, with 2 × 2 contingency table of expected versus observed for mm10 and non-mm10; the chi-squared test is nondirectional).
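The arithmetic behind this comparison can be checked with a short sketch. It uses the counts quoted above (27 and 578) and an uncorrected Pearson chi-squared statistic on the 2 × 2 table of observed versus expected counts. Whether the original calculation applied a continuity correction is not stated, so that choice is an assumption, and the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(table):
    """Uncorrected Pearson chi-squared test on a 2 x 2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / n
            chi2 += (observed_count - expected_count) ** 2 / expected_count
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# 'Complete switches' starting from homozygous non-mm10 genotypes in FVB.
to_non_mm10, to_mm10 = 27, 578
total = to_non_mm10 + to_mm10

observed = (to_non_mm10, to_mm10)
# Null expectation: 2 of the 3 possible substitutions land on a non-mm10 base.
expected = (total * 2 / 3, total / 3)

chi2, p_value = chi2_2x2((observed, expected))
print(f"observed: {to_non_mm10 / total:.0%} non-mm10, {to_mm10 / total:.0%} mm10")
print(f"chi-squared = {chi2:.1f}, P = {p_value:.3g}")
```

Because the second row holds expected rather than sampled counts, a one-sample goodness-of-fit test would be an equally reasonable formulation; either way the deviation from the 67%/33% expectation is far beyond the P < 0.00001 threshold.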
An analysis with indels yields similar results. Of 1,698 homozygous indels matching mm10 in the FVB mouse, 458 (27%) are 'complete switches' in F03 and F05; and of 779 homozygous non-mm10 indels in FVB, 285 (36%) are complete switches back to the mm10 reference, whereas only 126 (16%) are complete switches to another genotype. An expected distribution cannot be calculated for indels because the number of possible indels is much larger and not defined. Nevertheless, there is no reason to expect that indels would appear with a greater preference for the reference mm10 over any other possible indel.
The SNV and indel analyses for these extreme cases indicate that a mutagen (either CRISPR–Cas9 or other process steps) is unlikely to be the cause of the variants observed in the F03 and F05 mice. The apparent bias toward the mouse reference genome requires an alternative explanation, such as variation in the breeding colony followed by Mendelian inheritance. Further investigation of the unexpectedly high levels of heterozygosity observed is discussed in a Supplementary Note (including Supplementary Fig. 1 and Supplementary Table 3), and illustrates some of the complexities involved with WGS analysis.
In conclusion, the genetic differences seen in Schaefer et al.1 were likely present before the editing process with CRISPR–Cas9. We encourage the authors to reconsider the current study and follow up with appropriately controlled experiments. Developing a deep understanding and control of the specificity of CRISPR–Cas9 technology is essential for research and therapeutic development.
Sequence data for F05 (SRR5450996), F03 (SRR5450997) and FVB (SRR5450998) were retrieved from the NCBI Sequence Read Archive. Sequence data for the FVB/NJ mouse sequenced by the Sanger Institute were retrieved from the Sanger FTP site (http://ftp-mouse.sanger.ac.uk/REL-1502-BAM/FVB_NJ.bam) with modified date 1/20/15. All analyses of the sequencing data are described in the Supplementary Methods and Supplementary Table 4.
Integrated supplementary information
Supplementary Figure 1, Supplementary Tables 1–4, Supplementary Note 1 and Supplementary Methods
About this article
Transgenic Research (2018)