Exosome-mediated horizontal gene transfer occurs in double-strand break repair during genome editing

The CRISPR-Cas9 system has been successfully applied in many organisms as a powerful genome-editing tool. Undoubtedly, it will soon be applied to human genome editing, including gene therapy. We have previously reported that unintentional DNA sequences derived from retrotransposons, genomic DNA, mRNA and vectors are captured at double-strand break (DSB) sites when DSBs are introduced by the CRISPR-Cas9 system. Therefore, unintentional insertions associated with DSB repair may represent a risk for gene therapies based on human genome editing. To address this possibility, comprehensive sequencing of DSB sites was performed. Here, we report that exosome-mediated horizontal gene transfer occurs in DSB repair during genome editing. Exosomes are present in all fluids from living animals, including seawater and breathing mammals, suggesting that exosome-mediated horizontal gene transfer is the driving force behind mammalian genome evolution. The findings of this study highlight an emerging risk for this leading-edge technology.

Since 2000, three types of genome editing technologies have been developed: zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and CRISPR-Cas9 1 . Of these, CRISPR-Cas9 features not only the easiest construct design but also high double-strand break (DSB) efficiency; however, CRISPR-Cas9 can cause DSBs at unintended sites 1,2 . In mouse zygotes, most DSBs introduced by CRISPR-Cas9 are repaired by nonhomologous end joining (NHEJ) without homologous DNA oligos for homologous recombination (HR) 3 . NHEJ-mediated repair of DSBs is prone to error, causing small indels 3 . In 2015, we reported that DSBs introduced by CRISPR-Cas9 can be repaired by the capture of retrotransposon sequences, reverse-transcribed spliced mRNA sequences (RMDR: RT-product-mediated DSB repair) and CRISPR-Cas9 vector sequences (non-RMDR: non-RT-product-mediated DSB repair) in mouse zygotes 4 . Most captured DNA sequences are truncated at their 5′ and 3′ ends. Short microhomologies (1-4 bp) between the captured DNA sequence and the DSB-introduced site were observed in only half of the cases, suggesting that both RMDR and non-RMDR proceed via NHEJ 4 . RMDR and non-RMDR have also been observed in DSBs induced by CRISPR-Cas9 in NIH-3T3 cells 4 .
The capture of DNA sequences was also observed at the DSB site introduced by the I-SceI restriction enzyme in Saccharomyces cerevisiae, a human hepatoma cell line and human monocytic leukemia cells and at naturally occurring DSB sites in Daphnia, Drosophila, and Aspergillus [5][6][7][8][9][10][11] . Ty1 retrotransposon insertions into DSB sites were induced by I-SceI in HR-deficient S. cerevisiae 8,9 . In the case of the hepatoma cell line LMH, I-SceI induced the insertion of truncated infected hepatitis B virus into DSB sites 10 . Endogenous nucleotide sequence insertions were also induced by I-SceI in the human monocytic leukemia cell line U937 11 . In Daphnia, Drosophila, and Aspergillus, greater than half of recent naturally gained introns originated from the repair of staggered DSBs [5][6][7] .
The capture of unintentional DNA sequences at DSB sites might be an evolutionary driving force of mammalian genomes, including horizontal gene transfer. In this report, comprehensive analyses of DSB sites introduced by CRISPR-Cas9 in vivo and in vitro were performed to identify the relationships between DSB repair and genome evolution and to verify the risk for this leading-edge technology. Our results highlight exosome-mediated horizontal gene transfer, which occurs in DSB repair during genome editing and represents a potential new risk for genome editing.

Results
Determination of indels by deep sequencing. First, we accurately determined the lengths of the indels introduced by the CRISPR-Cas9 system in vivo and in vitro by deep sequencing of PCR products amplified with two primers across the target DSB site (Fig. 1a).
We introduced DSBs at the Peg10 gene locus by transfecting NIH-3T3 cells with a CRISPR plasmid encoding both Cas9 and gRNA targeting the Peg10 gene and a PGK-Puro plasmid 4 . After transient selection with puromycin, DNA was extracted from the cells, and PCRs were performed to amplify the region containing the DSB site introduced into Peg10 (Fig. 1a). Then, the PCR products were subjected to high-throughput next-generation sequencing analyses. Greater than half of the sequence reads contained ±1-2 bp indels as previously described 12 (Fig. 1b, c, Table 1). These populations may have been repaired by error-prone NHEJ as previously reported 1 . Greater than 90% of the deletions (3-64 bp) exhibited microhomologies (1-4 bp) at the junction, suggesting that these deletions were also mediated by NHEJ 13 (Fig. 1b, Supplementary Fig. 1, Table 1, Supplementary Data 1).
Long insertions (>33 bp) were observed in 4% of sequence reads from DSB-induced NIH-3T3 cells (Fig. 1b, c). Greater than half of the long insertion sequences were derived from plasmid DNA (Fig. 1c, d). In total, 16% and 2% of long insertions were identical to mouse genomic DNA and mRNAs (Fig. 1c, d, Supplementary Data 2). These results are comparable to previous results obtained by gel extraction and subcloning/Sanger sequencing 4 .
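The length classes used above can be illustrated with a minimal Python sketch. It assumes that each merged amplicon read is compared only by length against an unedited reference amplicon; the reference length of 200 bp in the usage note and the label strings are hypothetical and are not taken from this study. A purely length-based comparison cannot detect substitutions or insertions and deletions that cancel out, so it is an approximation of the analysis rather than a description of the pipeline actually used.

```python
from collections import Counter

# ">33 bp" in the text, i.e. a net gain of at least 34 bp.
LONG_INSERTION_MIN = 34


def classify_read(read_len: int, ref_len: int) -> str:
    """Assign a read to a length class relative to the unedited amplicon."""
    delta = read_len - ref_len
    if delta == 0:
        return "no length change"
    if abs(delta) <= 2:
        return "1-2 bp indel"
    if delta < 0:
        return "deletion (>=3 bp)"
    if delta >= LONG_INSERTION_MIN:
        return "long insertion (>33 bp)"
    return "insertion (3-33 bp)"


def length_class_fractions(read_lengths, ref_len):
    """Return the fraction of reads falling into each length class."""
    counts = Counter(classify_read(n, ref_len) for n in read_lengths)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

For example, `length_class_fractions([200, 201, 150, 300], 200)` assigns one read to each of four classes. Reads in the long-insertion class would then be aligned against candidate source sequences (plasmid, host genome, E. coli, serum donor species) to determine their origin.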
Capture of bovine and E. coli genomic DNA by horizontal gene transfer. One of the two novel findings of our high-throughput sequencing analyses is that 21% of the long insertions were derived from Escherichia coli genomic DNA. These sequences are identical to the E. coli K12 strain, suggesting that they are derived from contamination by the host E. coli genomic fragments used to amplify the CRISPR-Cas9 vectors (Fig. 1d). DNA sequences from E. coli with or without microhomologies were captured (Supplementary Fig. 2). Finally, the remaining 2% of the long insertions consisted mostly (88%) of mouse repeats, including mouse short interspersed nuclear elements (SINEs), mouse long interspersed nuclear element-1s (L1s), mouse endogenous retroviruses, and mouse satellite repeats and simple repeats, whereas the other 12% of these insertions were derived from bovine genomic DNA, including bovine SINE and bovine satellite repeats (Fig. 1d-
Exosome-mediated horizontal gene transfer. Most of the inserted bovine DNA was derived from bovine satellite DNA sequences, such as BTSAT2, BTSAT3, and BTSAT4, and bovine SINE sequences, such as Bovc-tA2 14 (Table 2a). Dulbecco's Modified Eagle Medium (DMEM) containing 10% fetal bovine serum (FBS) was used to culture NIH-3T3 cells. Thus, DNA or RNA from FBS in the form of cell-free DNA/RNA, including exosomal DNA/RNA, might be the source of the bovine DNA sequences captured at the DSB sites in the cultured mouse cells [15][16][17][18][19][20][21] .
To confirm the possibility of such horizontal gene transfer from the cell culture medium, we repeated these experiments using goat serum instead of FBS (Fig. 2). As in the experiments with FBS, insertions of mouse genomic DNA, mRNA, plasmid DNA, and E. coli genomic DNA were detected (Fig. 2a, b). As expected, goat DNA sequences were captured at the DSB sites of NIH-3T3 cells (Fig. 2c-e, Table 2b, Supplementary Data 3). These data demonstrate that horizontal gene transfer can occur from the serum used in the culture medium. To clarify the origin of the captured bovine DNA sequences, i.e., whether these sequences arose from cell-free nucleic acids or nucleic acids in exosomes, we introduced DSBs by CRISPR-Cas9 in NIH-3T3 cells cultured in DMEM containing 10% exosome-free FBS, which contains a comparable amount of cell-free nucleic acids (Fig. 3, Supplementary Fig. 3). If horizontal gene transfer were mediated by cell-free nucleic acids, bovine DNA sequences originating from them should still be inserted at the DSB sites. In contrast, a reduction in the insertion of bovine DNA sequences in the presence of exosome-free serum would indicate that trans-species gene transfer is mediated by exosomes. The insertion rates of endogenous mouse DNA sequences, vector sequences, and E. coli sequences were comparable in cells cultured with exosome-free FBS or normal FBS; however, most of the bovine DNA insertions were abolished by culture with exosome-free 10% FBS/DMEM (Figs. 3a-c and 4,
Of the sequence reads, 35% were deletions, and 4% were large insertions (more than 33 bp; red region). d Of the large insertions (red region in c), 59% corresponded to partial sequences of the transfected plasmid DNA. An additional 16% and 2% of the reads were identical to mouse genomic DNA and mRNA sequences, respectively, and 21% of the large insertions corresponded to E. coli genomic DNA. The remaining 2% of the total reads are described in e (blue region). e 12% of the reads classified as others (blue region in d) were from Bos taurus (bovine), including genome, SINEs, and satellite DNA sequences. Structures of de novo inserted bovine sequences at the Peg10 loci (f, g). Both the post- and preintegration sequences are presented. The sgRNA sequence and the PAM sequences are presented in red and bold red characters, respectively. The black lines indicate the junction sites between pre- and postintegration sequences. The sequences in the blue boxes are overlapping microhomologies and are marked with black dotted lines. Each insertion was truncated at both the 5′ and 3′ ends. f Truncated Bov-tA1, BCS, and bovine SINEs were inserted with 6 and 1-bp microhomologies. g A truncated BTSAT3b, a bovine satellite, and a partial BERV2, bovine endogenous retrovirus, were inserted with a 1-bp overlapping microhomology
retrotransposon RNAs were highly expressed in FBS under all the conditions, suggesting that bovine satellite sequence RNAs and bovine retrotransposon RNAs were within the exosomes (Supplementary Fig. 4).
These data support exosome-mediated trans-species gene transfer; however, it remains possible that these horizontal gene transfer events were mediated by cell-free nucleic acids. Because exosomes and cell-free nucleic acids are reportedly present in all fluids from living animals, trans-species gene transfer events may also occur in mouse embryos in which DSBs are introduced by injection of CRISPR-Cas9 mRNA into zygotes. Thus, DNA was extracted from day 10 embryos that had been injected with CRISPR-Cas9 mRNA and Peg10 sgRNA at the zygote stage and was analyzed by high-throughput sequencing. One of 12 embryos (#20) captured BTAUL1, a bovine SINE (Fig. 5a, b, Supplementary Fig. 5, Supplementary Data 5). The KSOM medium used to culture the mouse zygotes contains bovine serum albumin (BSA) fraction V, which may contain exosomes or cell-free nucleic acids.

Discussion
In this report, we demonstrated that horizontal gene transfer assisted by CRISPR-Cas9 occurs in NIH-3T3 cells and mouse embryos. This phenomenon might be the driving force behind mammalian genome evolution. In fact, mice with fusions between the murine Peg10 gene and a bovine SINE were obtained (Supplementary Fig. 5). A number of possible trans-species horizontal gene transfer events have been reported in mammals. Chromodomains (chromatin organization modifier), a protein structural domain, are highly conserved in chromoviruses, and SCAN domains might originate from GYPSYDR-1 retrotransposons. Sirh-family genes, which are conserved in mammals, contain a gag-like domain from the Ty3/Gypsy-type retrotransposon of fugu fish [22][23][24][25][26] . Recently, in silico analyses demonstrated horizontal transfer of BovB (non-LTR retrotransposon from Bos taurus) and L1 retrotransposons (B. taurus) in eukaryotes 27 . In this study, we revealed that BovB and L1 were abundant in exosomes and that goat BovB was horizontally transferred into mouse NIH-3T3 cells. These data support the view that horizontal gene transfer events are mediated by exosomes.
Fig. 2 Trans-species horizontal gene transfer at the Peg10 gene locus from the serum included in the culture medium. a Distribution of indels at CRISPR-Cas9-induced DSB sites in NIH-3T3 cells cultured using DMEM containing 10% goat serum instead of FBS. In addition, 38% of the sequence reads were deletions. Large insertions (greater than 33 bp) represented 4% of the total sequence reads (red region). b Here, 51% of the large insertions corresponded to partial sequences of the plasmid DNA that was transfected into the NIH-3T3 cells. In addition, 16% and 1% of the reads were identical to mouse genomic DNA and mRNA sequences (MM10), respectively. Moreover, 29% of the large insertions corresponded to E. coli genomic DNA. The remaining 3% of the total reads are described in c (blue region). c Approximately 9% of the reads classified as others were from goat, including the goat genome and goat SINEs and goat satellite DNA. Structures of de novo inserted goat sequences at the Peg10-ORF1 loci (d, e). Both the postintegration site and preintegration sequences (bottom of the panel) are presented. The nucleotide sequences that correspond to the single guide RNA sequence and the PAM sequences are presented in red and bold red characters, respectively. The black lines indicate the junction sites between pre- and postintegration sequences. The sequences in the blue boxes are overlapping microhomologies and are marked with black dotted lines. Each insertion was truncated at both the 5′ and 3′ ends. d Partial goat DNA sequences from chromosome 28 were inserted with a 1-bp microhomology. e A truncated goat satellite DNA sequence was inserted with a 2-bp overlapping microhomology
CRISPR-Cas9 itself exhibits some propensity for inducing off-target mutations 2 . The DSBs produced by CRISPR-Cas9, whether on target or off target, were repaired by the capture of unintentional DNA sequences 2 . Although the risk of unintentional insertions is greater than 4%, considerable efforts have focused on reducing off-target effects. Paired CRISPR-Cas9 D10A nickases and a high-fidelity CRISPR-Cas9 nuclease reduce genome-wide off-target effects 28,29 .
These efforts hold promise because DSBs at off-target sites could then be neglected. However, unintentional insertions at on-target DSB sites cannot be suppressed by these off-target-reducing methods. Therefore, gene therapy using these genome-editing technologies may capture unintentional insertions. DSBs are typically repaired by NHEJ or HR. NHEJ is the predominant pathway in mammals 30,31 and Drosophila 32,33 , whereas HR is the major pathway in S. cerevisiae 34 . Another DSB repair mechanism, microhomology-mediated end-joining (MMEJ), repairs DSBs via the use of substantial microhomology. MMEJ uses microhomologies of 5-25 bp during the alignment of two broken ends, whereas NHEJ frequently proceeds through the annealing of short (1-4 bp) microhomologies 13 . Most of the insertion sequences identified in the present study displayed short microhomologies (1-4 bp) or no microhomology with the introduced DSB site, suggesting that they were captured by NHEJ rather than MMEJ.
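The distinction drawn here between NHEJ and MMEJ rests on the length of microhomology at a repair junction. A minimal Python sketch of how such a length could be measured for a simple deletion is given below. It assumes a reference sequence and the 0-based coordinates of the deleted segment are known; the function names and the example sequence are hypothetical, and the sketch does not reproduce the junction analysis performed in this study. The microhomology is counted as the number of positions by which the deletion could be shifted left or right without changing the repaired sequence, and the 1-4 bp and 5-25 bp ranges are those stated in the text.

```python
def deletion_microhomology(ref: str, start: int, end: int) -> int:
    """Length of junction microhomology for a deletion of ref[start:end].

    Counts how far the deletion window can slide right or left while
    producing the same repaired sequence.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("deleted segment must lie within the reference")
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    return left + right


def classify_junction(microhomology_len: int) -> str:
    """Map a microhomology length onto the ranges given in the text."""
    if microhomology_len == 0:
        return "no microhomology (NHEJ-consistent)"
    if microhomology_len <= 4:
        return "1-4 bp (NHEJ-consistent)"
    if microhomology_len <= 25:
        return "5-25 bp (MMEJ-consistent)"
    return ">25 bp (outside the ranges given)"
```

For the hypothetical reference `GGACGTTTTACGCC`, deleting positions 2 to 9 (`ACGTTTT`) leaves `GGACGCC`, and the function reports a 3-bp microhomology (`ACG`), which falls in the NHEJ-consistent range. Insertion junctions would need the flanking sequence of the donor as well, which this sketch does not handle.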
SCR7, an inhibitor of NHEJ, improves the efficiency of HR in genome editing 35,36 . Increasing the efficiency of HR may be a key strategy to reduce the risk of unintended insertions.

Methods
Animals. All animal studies were conducted in accordance with the guidelines approved by the animal care committee of the National Institute of Health Sciences (No. 1004). The animal welfare committee of National Institute of Health Sciences (No. 539) approved the protocol. Animals had access to a standard chow diet and water ad libitum and were housed in a pathogen-free barrier facility with a 12L:12D cycle, as previously described 4 .
Production of hCas9 mRNA and Peg10-ORF1-sgRNA. To produce the Cas9 mRNA, the T7 promoter was added to the Cas9 coding region of the pX330 plasmid by PCR amplification, as previously described 3 . Briefly, the T7-Cas9 PCR product was gel purified and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 ULTRA kit (Thermo Fisher Scientific, Waltham, MA). The T7 promoter was added to the Peg10-ORF1-sgRNA region of the pX330 plasmid by PCR amplification using the following primers as previously described: Peg10-ORF1-IVT-F (TGTAATACGACTCACTATAGGGTGTCTCTACTGTGGCAATGG) and IVT-R (AAAAGCACCGACTCGGTGCC) 4 .
The T7-sgRNA PCR product was gel purified and used as the template for IVT using the MEGAshortscript T7 kit (Thermo Fisher Scientific, Waltham, MA). Both the Cas9 mRNA and Peg10-ORF1-sgRNA were treated with DNase to eliminate template DNA, purified using the MEGAclear kit (Thermo Fisher Scientific, Waltham, MA), and eluted into RNase-free water as previously described 4 .
Exosome collection and exosome RNA isolation. Exosomes were prepared by a stepwise centrifugation-ultracentrifugation method as described previously with minor modifications 17 . Briefly, 1.4 ml FBS was centrifuged at 10,000×g for 30 min to remove the cell debris and then centrifuged at 100,000×g for 70 min using a
PCR and DNA sequencing. For analyses of unintentional sequence insertion associated with DSB repair, genomic DNA was prepared from the embryonic yolk sac or cultured cells using the DNeasy kit (QIAGEN, Hilden, Germany). The identity of the indels induced by DSB repair was confirmed by PCR and subsequent next-generation sequencing using MiSeq (Illumina Inc., San Diego, CA). The following primers were used: Peg10 F (5′-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGagagacgccgcaaaatgaat-3′; NNNNNNNN = Illumina barcode S sequence) and Peg10 R (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGgaggctttcgctggacac-3′; NNNNNNNN = Illumina barcode N sequence) as previously described 4 . A mixture of 1× ExTaq buffer (Takara Bio, Kusatsu, Japan), 2.5 mM dNTPs, primers and 2.5 U of ExTaq (Takara Bio, Kusatsu, Japan) was subjected to 32 PCR cycles of 96°C for 15 s, 65°C for 30 s, and 72°C for 30 s in a Bio-Rad C1000 Touch system. Each PCR product was purified using AMPure XP (Beckman Coulter, Indianapolis, IN) as previously described 4 .
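The two amplification primers listed above each consist of four concatenated parts: a flow-cell adapter, an 8-nt barcode shown as NNNNNNNN, a sequencing-primer site, and a lowercase Peg10-specific sequence. The following Python sketch shows how a barcoded primer could be assembled from these parts. The segment boundaries and the constant names (`P5`, `P7`, `READ1_SITE`, `READ2_SITE`) are our reading of the published primer strings and should be treated as an assumption; the barcode `ACGTACGT` in the usage note is an arbitrary placeholder, not an index used in this study.

```python
# Segments read off the primer sequences given in the text
# (boundaries and names are an interpretation, not stated in the source).
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
READ1_SITE = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
P7 = "CAAGCAGAAGACGGCATACGAGAT"
READ2_SITE = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
PEG10_F_LOCUS = "agagacgccgcaaaatgaat"
PEG10_R_LOCUS = "gaggctttcgctggacac"


def build_primer(adapter: str, barcode: str, read_site: str, locus: str) -> str:
    """Concatenate adapter, 8-nt barcode, read-primer site and locus sequence."""
    if len(barcode) != 8 or set(barcode.upper()) - set("ACGT"):
        raise ValueError("barcode must be 8 nucleotides of A, C, G or T")
    return adapter + barcode.upper() + read_site + locus


def peg10_primer_pair(barcode_s: str, barcode_n: str):
    """Return (forward, reverse) primers for one barcode combination."""
    forward = build_primer(P5, barcode_s, READ1_SITE, PEG10_F_LOCUS)
    reverse = build_primer(P7, barcode_n, READ2_SITE, PEG10_R_LOCUS)
    return forward, reverse
```

Calling `peg10_primer_pair("ACGTACGT", "ACGTACGT")` yields a 90-nt forward primer and an 84-nt reverse primer; replacing the barcode with `NNNNNNNN` in each result gives back the strings printed above. Keeping the locus-specific part in lowercase makes it easy to trim adapters and barcodes before aligning reads.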
For analyses of exosomes from FBS, exosome cDNA libraries were synthesized with the SMARTer smRNA-Seq Kit for Illumina (Takara Bio, Kusatsu, Japan). The concentrations of the PCR products containing DSB repair sites and of the cDNA synthesized from exosome RNA were quantified using a Kapa Library Quantification kit (Roche, Basel, Switzerland). These products (8 pM) were then subjected to 300 cycles of paired-end index sequencing (total 600 cycles) on an Illumina MiSeq sequencer according to the manufacturer's instructions (Illumina Inc., San Diego, CA). All the sequence data were converted to FASTQ format by using Illumina BaseSpace (https://basespace.illumina.com/home/index).
Evaluation of cell-free nucleic acids in FBS and exosome-free FBS. Briefly, 24 ml of FBS and exosome-free FBS were centrifuged at 10,000×g for 30 min to remove the cell debris, and cell-free nucleic acids were purified via a phenol-chloroform procedure. The nucleic acids were then precipitated with ethanol in the presence of glycogen and dissolved in DNase/RNase-free water. The concentration and quality of cell-free nucleic acids were determined by using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA) and Agilent 2100 Bioanalyzer and DNA HS chips (Agilent Technologies, Palo Alto, CA), respectively.
Determination of sequence length distribution. All analyses were performed using Galaxy (https://usegalaxy.org). FASTQ files were filtered by the FILTER By Quality program with default parameters. Paired-end reads were merged using the PEAR program and default parameters, and assembled reads with the Peg10-F sequence at the 5′ end and the Peg10-R sequence at the 3′ end were filtered by the Barcode Splitter program. The lengths of the filtered sequences were counted by the