Find and cut-and-transfer (FiCAT) mammalian genome engineering

While multiple technologies for small allele genome editing exist, robust technologies for targeted integration of large DNA fragments in mammalian genomes are still missing. Here we develop a gene delivery tool (FiCAT) that combines the precision of CRISPR-Cas9 (find module) with the payload transfer efficiency of an engineered piggyBac transposase (cut-and-transfer module). FiCAT couples the DNA scanning and targeting functions of Cas9 with the donor DNA processing and transfer capacity of piggyBac. PiggyBac functional domains are engineered to increase on-target integration while reducing off-target events. We demonstrate efficient delivery and programmable insertion of small and large payloads in cellulo, in human (Hek293T, K-562) and mouse (C2C12) cells, and in vivo in mouse liver. Finally, we evolve more efficient versions of FiCAT by generating a targeted diversity of 394,000 variants and performing 4 rounds of evolution. In this work, we develop a method for precise and efficient targeted insertion of multi-kilobase DNA fragments in mammalian genomes.

Human gene editing technologies have progressed significantly over the last few years through the development of new editing tools 1 . Traditionally, gene editing was based on the design of artificial endonucleases that induce a double-strand break (DSB) at the sequence of interest in the genome 2 . Cells repair these DSBs through one of two major pathways: nonhomologous end joining (NHEJ) or homology directed repair (HDR) 3 . More recently, DSB-independent editing has been developed, either by directly editing DNA bases with deaminases, namely base editors (BEs) 4 , or by replacing DNA bases in situ with the aid of a reverse transcriptase, namely prime editors (PEs) 5 . However, BEs and PEs only target a small number of bases, and HDR-based editing scales poorly with size 6 .
Pathological genetic defects range from a few bases to large deletions, and gene editing technologies need to handle a wider range of payload sizes. Precise gene delivery methodologies based on NHEJ have been developed, such as homology independent targeted integration (HITI) 7 . This methodology has been demonstrated for insertions of several kilobases but remains inefficient for very large edits 6 . While HITI might work to deliver exons, it may not be efficient enough to robustly deliver cDNAs of genes such as dystrophin (~14 kb) or ABCA4 (~6.8 kb). Recently, HITI efficiency has been improved by fusion to DNA binding domains 8 . In bacteria, precise gene delivery has been demonstrated using CRISPR programmable transposons 9,10 , but this technology is not yet available for mammalian cells.
Previous attempts to fuse zinc fingers or dead Streptococcus pyogenes Cas9 (Cas9) to the mammalian-compatible piggyBac (PB) or Sleeping Beauty transposases yielded systems with relatively low levels of precision [11][12][13] . PB transposase is an attractive tool for gene therapy because its efficiency scales well with payload size 14 , it is a mutation-independent technology, and it depends less on the endogenous DNA repair machinery.
In this work, we develop an efficient and precise programmable gene delivery technology based on an engineered Cas9-PB fusion protein capable of delivering small and large payloads. We test the technology in cellulo, achieving on-target efficiencies of 5-22% with low or absent off-target events, and we demonstrate on-target gene transfer in vivo to mouse liver, as well as to germline cells in mouse models. Finally, we perform directed evolution of FiCAT and further improve efficiency by ~25-30%.

Results and discussion
Cas9 fused to PB and on-target integration reporter system. We combined the genome-targeting precision of the Cas9 protein with PB variants that exhibit enhanced payload preparation (excision activity) and lower promiscuous DNA binding by expressing a Cas9-PB fusion protein. To isolate the best performing combination, we developed a sensitive reporter system for targeted gene insertion based on reconstitution of a fluorescent protein ORF upon on-target integration. A promoterless C-terminal (C-t) half of Emerald GFP (emGFP) preceded by a splice acceptor was randomly inserted into the genome of Hek293T cells to build a reporter cell line. A "docking site" (labeled as "target" in Fig. 1a) was added upstream of the C-t emGFP. In this reporter, we embedded sequences homologous to multiple genomic sites, including AAVS1 and TRAC. A PB transposon payload containing an N-terminal (N-t) half of emGFP followed by a splice donor was used as the reporter for programmable insertion, because on-target integration of this payload in the reporter cell line yields emGFP expression (Fig. 1a, Supplementary Fig. 1). Overall transposition efficiency (on-target and off-target) can be measured using a PB transposon encoding a full-length RFP under a constitutive promoter (Fig. 1b). This assay allowed accurate detection of on-target and total transposition activities by flow cytometry.
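The flow-cytometry readout described above reduces to two gated percentages: GFP-positive cells with the half-GFP transposon (on-target) and RFP-positive cells with the full-length RFP transposon (on- plus off-target). The following is a minimal Python sketch, assuming per-cell fluorescence values have already been exported and that the gate thresholds (hypothetical `GFP_GATE`, `RFP_GATE`) were set from untransfected reporter cells; real gating would also include viability and singlet gates, which are omitted here, and the event values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One flow-cytometry event (arbitrary fluorescence units)."""
    gfp: float
    rfp: float


def percent_positive(events, channel, threshold):
    """Percentage of events whose `channel` signal exceeds `threshold`."""
    if not events:
        return 0.0
    hits = sum(1 for e in events if getattr(e, channel) > threshold)
    return 100.0 * hits / len(events)


# Hypothetical gate thresholds, e.g. set from untransfected reporter cells.
GFP_GATE = 1_000.0
RFP_GATE = 1_000.0

# Half-GFP transposon: emGFP is only reconstituted after on-target integration.
on_target_sample = [Event(50, 10), Event(5_000, 20), Event(80, 15), Event(9_000, 30)]
# Full-length RFP transposon: reports on-target plus off-target integration.
total_sample = [Event(10, 4_000), Event(12, 30), Event(15, 7_000), Event(9, 2_500)]

on_target_pct = percent_positive(on_target_sample, "gfp", GFP_GATE)
total_pct = percent_positive(total_sample, "rfp", RFP_GATE)
print(f"on-target: {on_target_pct:.1f}%  total transposition: {total_pct:.1f}%")
```

Keeping the two transposons in separate samples mirrors the design in the text: each channel answers one question, so no compensation between GFP and RFP is needed in this simplified picture.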
Cas9 and PB diversity exploration. We started by exploring three variants of Cas9 (nuclease (Cas9), nickase (nCas9), and dead (dCas9)) fused to the N-t or C-t of nonmodified hyPB (Fig. 1b). The highest on-target insertion activity was obtained using the N-t-Cas9-PB-C-t configuration (referred to as Cas9-PB from this point on), where insertion depended on the intact nuclease activity of Cas9, suggesting a role of Cas9-generated DSBs in facilitating on-target insertion of the transposon payload. Different linkers (Supplementary Table 1) were tested for this combination, but no significant differences in activity were observed (Supplementary Fig. 2). To further improve the activity of Cas9-PB, we sought to introduce mutations in PB that, on the one hand, increase its donor DNA excision activity, thus providing more substrate primed for integration, and, on the other hand, decrease its intrinsic target DNA binding activity, thus increasing its dependence on Cas9 targeting to specific genomic sites (Supplementary Table 2). Previously reported PB mutants with increased excision (D450N and M194V) were selected 15 . To identify candidate mutations that decrease target DNA binding, we generated a structural model of PB using the Robetta structural prediction algorithm and superimposed the predicted structure of the PB catalytic core over that of HIV integrase 16 bound to host and donor DNA (Supplementary Figs. 3 and 4), since the catalytic cores of both enzymes adopt an RNase H-like fold. Based on this superimposition, we mutated the basic residues that contact the target DNA (R372, K375 and R376) and neighboring acidic residues (E377 and E380) to alanine. Towards the end of this work, the atomic structure of PB bound to donor and target DNA was determined by cryo-EM 17 , confirming R372, K375 and R376 as target DNA binding residues. To develop the FiCAT prototype, we generated Cas9-PB containing various combinations of PB mutations (Fig. 1c, d).
Reporter cell line assays showed the highest levels of programmable insertion for PB variants combining mutations that increase excision activity with mutations that decrease target DNA binding activity, which is consistent with our design principle. The dependence on Cas9 of Cas9-PB harboring target DNA (t-DNA) binding mutations in PB is further highlighted by the loss of integration activity in the absence of Cas9 (Supplementary Fig. 5a). To elucidate PB's role in integration into targeted loci, we mutated PB catalytic residues and found that FiCAT targeted insertion is lost when catalytic activity is compromised (Supplementary Fig. 3c). In addition, a donor DNA lacking inverted terminal repeat (ITR) sequences was generated and tested, with no detectable integration events (Supplementary Fig. 5b). Interestingly, cumulative mutation of target DNA binding residues (R372A, K375A, R376A) correlated with decreased integration activity, which may be consistent with an active role of PB in integration that requires a minimal intrinsic t-DNA binding capacity at the Cas9-generated DSB (Supplementary Fig. 3d). Similarly, a recent PB structural study suggests that the reduced insertion capacity of an R372A and K375A mutant is due to weakened target DNA binding 17 , but the detailed catalytic contribution of PB may require further mechanistic studies.
Cas9 and double-strand break role in FiCAT mechanism. To further explore the role of Cas9 DSB activity in facilitating targeted integration, we uncoupled on-site targeting from DSB activity: a nuclease-free zinc finger-PB fusion (Znf-PB) was used to localize the transposon, and an independent Cas9 nuclease was added to generate the on-site DSB.
We used a zinc finger targeting the region upstream of the half-GFP reporter in the reporter cell line. The Znf-PB fusion alone exhibited no targeted insertion activity, but activity was rescued by introducing DSBs near the Znf binding site with gRNA-guided Cas9 (Fig. 1e). These results are consistent with a mechanism in which DSB generation by Cas9 in the vicinity of PB facilitates the insertional activity of PB and bypasses its requirement for a TTAA motif at the insertion site. Characterization of the on-target site showed that Cas9-PB-mediated insertion occurs exactly at the Cas9-induced DSB, with small indels near the target site, and that ITR sequences are disrupted (Fig. 2a, Supplementary Fig. 6). An important practical consequence of this disruption, combined with the absence of TTAA, is that FiCAT-mediated integration is irreversible 18 (Supplementary Fig. 7). This mechanism likely contributes to the efficiency of programmable insertion by Cas9-PB, and the coupling of the "find" and "cut" activities of Cas9 with the "transfer" activity of modified PB contributes to the observed levels of precision.
On-target and off-target insertion sites characterization. We next characterized the precision of targeted insertion. First, FiCAT precision depends on Cas9 DNA recognition accuracy. We evaluated gRNA off-target levels computationally and by targeted sequencing and could not detect off-target signals above background (Supplementary Fig. 8). Second, using a single-tail adapter/tag (STAT)-PCR based method followed by next-generation sequencing to capture payload-genome junctions 19,20 , we precisely characterized FiCAT on-target and off-target insertion sites (Fig. 2b). The on-target insertions detected do not occur at TTAA sites surrounding the gRNA site, further demonstrating integration at Cas9-generated DSB sites and the resulting loss of the preferred excision substrate. We analyzed the precision of FiCAT targeting the TRAC locus in WT Hek293T cells (Fig. 2b, Supplementary Fig. 9, Supplementary Table 3). To capture on-target and off-target insertions without bias, the full-length RFP expression cassette transposon was used in this experiment. All insertions were characterized by STAT-PCR in enriched edited cells, and STAT-PCR results were compared across all FiCAT variants. For the variant R372A_K375A_D450N, all detected insertions were on-target, with a limit of detection (LOD) of 1% (Supplementary Fig. 10). This variant was selected for further characterization.
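One way to check that on-target junctions do not coincide with flanking TTAA sites is to scan the reference window around the gRNA for TTAA motifs and measure the distance from each mapped junction to the nearest one. The sketch below is illustrative only: the reference window and junction coordinate are hypothetical, and it is not the analysis pipeline used in this work. Because TTAA is its own reverse complement, scanning one strand finds every site.

```python
def ttaa_positions(seq):
    """0-based start positions of every TTAA motif in `seq`.

    TTAA is palindromic (its reverse complement is TTAA), so scanning
    a single strand is sufficient.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TTAA"]


def nearest_ttaa_distance(seq, junction):
    """Distance (bp) from a mapped insertion junction to the closest TTAA start, or None."""
    sites = ttaa_positions(seq)
    if not sites:
        return None
    return min(abs(site - junction) for site in sites)


# Hypothetical reference window around a gRNA site and a hypothetical junction coordinate.
window = "GGCTTAAGCATCGATCGGACTGACCTTAAG"
junction = 15
print(ttaa_positions(window), nearest_ttaa_distance(window, junction))
```

A junction at distance 0 would indicate canonical TTAA-targeted transposition; junctions at the Cas9 cut site with no overlapping TTAA are what the text reports for FiCAT.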
FiCAT comparison to HDR and HITI. We benchmarked FiCAT against current methods for precise gene delivery, namely Cas9-based HDR and HITI (Fig. 3a, b and Supplementary Fig. 11). We constructed payloads of multiple sizes ranging from 2.5 to 9.5 kb. FiCAT shows higher efficiencies, and the gap widens with larger payloads. The best mutants achieved insertions (up to 8 kb) with twofold higher efficiency than HDR and with high accuracy. We also compared FiCAT with a HITI variant in which Cas9 was fused to a catalytically dead version of PB, which may help recruit donor DNA to the insertion site, as recently suggested by a similar approach using the DNA binding domain of the SB100 transposase. FiCAT presents twofold higher efficiency than these aided HITI alternatives (Fig. 3b). PB is quite unique in presenting seamless excision with precise repair of the template gap 17 . FiCAT may present a safer excision mechanism, leaving a ligated circular template as opposed to the open double-stranded linear fragment left by HITI or HDR approaches, which likely presents higher genome toxicity and a higher risk of uncontrolled insertion.
Modified nuclease and payload (minicircle (MC)). To further expand the applications of FiCAT, we explored its performance with nucleases other than SpCas9. We obtained good programmable insertion activity for CjCas9 and LbCpf1, whereas CasX did not achieve any programmable integration in our assay. Notably, SaCas9 had the highest levels of programmable insertion among the Cas proteins tested, with levels similar to SpCas9 fused to modified hyPB (Fig. 3c). Editing activity for each target site and Cas protein variant is also shown for normalization purposes (Supplementary Fig. 12). In addition to using SaCas9 as the "find module" in FiCAT, we explored additional strategies to maximize on-target insertion. We compared on-target delivery efficiency with an MC transposon versus a transposon carrying a full plasmid backbone, and we expressed Cas9 fused to two monomers of hyPB (R372A_K375A_D450N) in the same polypeptide; using both approaches, we increased on-target insertion activity, reaching programmable insertion in 16% of cells (Fig. 3d).
FiCAT characterization in K-562, C2C12 and in vivo. To characterize FiCAT beyond Hek293T, we performed targeted gene delivery into myoblasts, the K-562 cell line and mouse liver. Precise insertion into the C2C12 myoblast cell line was achieved with an efficiency of ~20%. Junction PCR and STAT-PCR were used to measure on-target and off-target efficiency (Fig. 4a-d).
STAT-PCR was also used to measure targeted insertion in K-562 cells (Supplementary Fig. 13). To demonstrate FiCAT activity in vivo, we performed precise gene delivery to the liver of mice using either the in vivo JetPEI transfection reagent or hydrodynamic injection, in both cases by retro-orbital intravenous injection. In addition to FiCAT expression plasmid DNA, we produced FiCAT R372A_K375A_D450N mRNA by in vitro transcription (Supplementary Fig. 14a). We delivered FiCAT to mouse liver targeting the Rosa26 genomic safe harbor, together with an RFP, GFP or luciferase encoding transposon in either plasmid or MC form. We observed a high transgene copy number relative to the endogenous gene TFRC (Fig. 4e) and maintained transgene expression over time (Fig. 4f, Supplementary Fig. 14b). PCR of the junction between the 3′ ITR and the genomic locus was used to measure newly formed on-target insertions (Fig. 4g). In the different in vivo experiments, mice were maintained for 4-5 weeks after injection before analysis to allow episomal plasmid/MC DNA to dilute. We also tested FiCAT in a germline murine model, achieving 57% delivery efficiency of a GFP MC (Supplementary Fig. 14c).
FiCAT directed evolution. After deployment, benchmarking and characterization of FiCAT, we designed a combinatorial library of 17 PB amino acid variants to further improve FiCAT on-target activity. Mutations were chosen based on extensive analysis of biochemical and structural data (Supplementary Table 2) in order to enhance excision (450, 560, 564, 573, 589, 592, 594), reduce t-DNA binding activity (245, 275, 277, 347, 372, 375, 465) and explore the importance of residues homologous to key residues for HIV integrase integration specificity (325, 347, 351), reaching a total diversity of 394,000 variants comprising all possible combinations of the selected mutations (Fig. 5a, Supplementary Table 2). Candidates were selected with the reporter cell line for on-target insertion (GFP-positive cells) after inserting the FiCAT library variants into the cell genome by lentiviral transduction, followed by 4 consecutive selection cycles (Fig. 5a). The reporter cell line was first infected with lentivirus carrying Cas9 linked to the PB combinatorial library and then transfected with the ½ GFP transposon plasmid and a plasmid encoding a gRNA targeting AAVS1. Cells were sorted for GFP expression (on-target insertion of the payload), and PB variants were recovered from the extracted genomic DNA and cloned into a lentiviral vector for the next round of selection. We performed evolution cycles until the average efficiency of the evolving population exceeded that of the FiCAT variant R372A_K375A_D450N. The cycles were validated by assessing the average on-target efficiency of each cycle's population, both as a plasmid variant mixture (Fig. 5b) and in the infected population (Supplementary Fig. 15a). The best performing FiCAT variants were then selected and transfected individually with AAVS1 gRNA and MC ½ GFP. First, a random selection of 96 variants was screened (Supplementary Fig. 15b), the best performing variants were rescreened separately, and the six with the highest on-target efficiencies were selected (Fig. 5c). A summary of the best PB amino acid variants for high on-target insertion (Fig. 5d) confirms the importance of residues that increase excision activity (D450N) and reduce t-DNA binding (R372A, K375A). Additional excision-associated residues, F594L and T560A, seem to contribute to increased targeted efficiency. Interestingly, the N347S variant adjacent to the catalytic triad was also detected; cryo-EM of the PB strand transfer complex shows that N347 is a t-DNA binding residue 21 , and our earlier structural modeling (Supplementary Figs. 3 and 4) suggested that it occupies a position equivalent to the t-DNA binding residue N117 of integrase in the HIV intasome 22 . A spontaneous mutation, R202K, was also detected in one of the variants. The cryo-EM structure of PB shows that the side chain amino group of R202 hydrogen bonds with the phosphate backbone of the ITR (Supplementary Fig. 16). The protonated amino group of the lysine side chain likely establishes a stronger ionic interaction with the phosphate backbone of the ITR. Further characterization of the mechanistic basis by which these mutations enhance FiCAT activity will be needed to better understand the molecular process by which FiCAT performs programmable gene transfer.
To sum up, we have coupled Cas9 target DNA recognition and cleavage with the DNA cut-and-transfer activity of a modified PB to generate an efficient tool for precise gene delivery. It was key to engineer the pair together so that they act synergistically: Cas9 finds and marks the genomic insertion point, and the transposase, with potentiated donor excision and reduced promiscuous DNA binding, contributes to the genetic insertion. The system acts irreversibly by destroying the preferred transposase recognition site during insertion. This technology scales well with payload size. We demonstrated its efficacy in human cell lines (Hek293T, K-562), mouse myoblasts and in vivo in mouse liver. We envision FiCAT as a generalized platform for therapeutic gene writing in advanced therapies and other applications. We are currently working on preclinical proof-of-concept studies involving delivery of FiCAT with lipid nanoparticles, which will further define FiCAT's potential impact.
Nuclease, nickase and dead Cas9 fusions to hyPB and the ½ emGFP transposon were generated by Golden Gate assembly using BspQI and standard methods. Vectors expressing CasX, CjCas9, LbCpf1 and SaCas9 fused to hyPB R372A_K375A_D450N were cloned into pcDNA4 by Golden Gate assembly with Esp3I according to the manufacturer's recommendations.
The MC plasmid of the ½ emGFP SMN1 transposon was obtained by amplifying the previously described ½ GFP transposon, cloning it into pMC BESPx MCS1 (System Biosciences) and transforming it into the YCY10P3S2T Minicircle Production Strain (System Biosciences). MC production was performed according to the manufacturer's protocol. Different mutations were introduced into the hyPB sequence fused to Cas9 (Cas9_PB plasmid) by site-directed mutagenesis following the QuikChange Lightning mutagenesis kit instructions (Agilent). Primers were designed with QuikChange Primer Design to introduce the following mutations into the hyPB sequence: M194V, R245A, G325A, R372A, K375A, R376A, E377A, E380A, D450N, S564P. The Cas9-hyPB_R372A_K375A_D450N coding plasmid was deposited at Addgene (#179381). All plasmids are available upon request. PB ½ emGFP SMN1 was obtained by introducing the first half of the emGFP sequence and the SMN1 intron 6 sequence into a PB acceptor vector. pT4 SMN1 2/2 emGFP was obtained by adding the second half of SMN1 intron 6 and partial emGFP to the SB100X transposon vector. emGFP sequences containing SMN1 were obtained from DYP004reporter 26 , a kind gift from Sri Kosuri.
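The mutagenic primers were designed with QuikChange Primer Design. For orientation, Agilent's QuikChange manuals give a rule-of-thumb melting temperature for mutagenic primers, Tm = 81.5 + 0.41(%GC) − 675/N − %mismatch, where N is the primer length, and recommend primers of roughly 25-45 nt with Tm ≥ 78 °C. The sketch below simply evaluates that formula; the primer is a hypothetical placeholder rather than one of the primers used here, and the online design tool may apply additional rules.

```python
MIN_TM = 78.0  # guideline minimum Tm (°C) for QuikChange mutagenic primers


def quikchange_tm(primer: str, n_mismatch: int) -> float:
    """Rule-of-thumb Tm (°C): 81.5 + 0.41*%GC - 675/N - %mismatch."""
    primer = primer.upper()
    n = len(primer)
    gc_pct = 100.0 * sum(base in "GC" for base in primer) / n
    mismatch_pct = 100.0 * n_mismatch / n
    return 81.5 + 0.41 * gc_pct - 675.0 / n - mismatch_pct


# Hypothetical 40-nt primer (50% GC) carrying a 2-nt codon change.
example_primer = "ATGC" * 10
tm = quikchange_tm(example_primer, n_mismatch=2)
print(f"Tm = {tm:.1f} °C; meets guideline: {tm >= MIN_TM}")
```

The mismatch term explains why codon changes needing several substitutions (for example, to alanine) push designs toward longer primers.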
Luciferase transposon was obtained by cloning firefly luciferase preceded by a CMV promoter into pMC BESPx MCS1.
Transposon and HDR templates of different sizes were generated by cloning a partial cDNA (NC_000006.12) fragment upstream of the split emGFP reporter system. Lentiviral payload was prepared from pSICO obtained from Addgene (Addgene plasmid #11578) and Cas9 and Esp3I cloning sites were introduced to provide a Golden Gate acceptor vector for the PB variants combinatorial library.
emGFP splicing based reconstitution assay. A Hek293T cell line containing pT4 SMN1 2/2 emGFP was generated by PEI-mediated transfection of SB100X and pT4 SMN1 2/2 emGFP DNA constructs, followed by single clone expansion and PCR genotyping (Supplementary Table 5).
Junction PCRs for insertion site sequencing. Junction PCR was performed on cells sorted with a BD FACSAria (BD Biosciences). Selected cells had on-target insertion of the PB ½ emGFP or RFP transposon targeting AAVS1, TRAC, lama 271.1 or rosa26 target sites in the reporter cell line, Hek293T, K-562, C2C12 or liver tissue. For liver tissues, a second nested PCR was performed. Genomic DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen). Forward primers were designed on the 3′ ITR of the transposon, and reverse primers targeted the different genomic locations studied, taking into account insertion on the + or − strand (Supplementary Table 5).
Library prep and Illumina sequencing for targeted insertion analysis. We implemented STAT-PCR 18 , amplifying the 3′ ITR of the transposon DNA, coupled with Illumina sequencing to capture genome integration sites with high sensitivity. Genomic DNA was extracted from cells enriched by flow cytometry sorting using the DNeasy Blood and Tissue kit (Qiagen) and fragmented to 500 bp using a Q800R3 Sonicator. End repair, A-tailing and Y-adapter ligation were performed on 3 μg of fragmented genomic DNA using the KAPA Hyper Prep Kit (KR0961-v5.16), followed by AMPure XP SPRI bead purification at a 1X ratio. After adapter ligation, each sample was split in two and amplified with GSP5′ or GSP3′ to capture 5′ and 3′ junctions, respectively. To capture 5′ and 3′ transposon-genome junctions, two nested PCRs were performed using KAPA HiFi DNA Polymerase following the manufacturer's protocol: PCR1 with P5_1 and PB_5_GSP1 or PB_3_GSP1 in a 25 μl final volume, and PCR2 with P5_2 and PB_5_GSP2 or PB_3_GSP2 in a 25 μl final volume. The 5′ and 3′ PCR products were purified with AMPure XP SPRI beads at a 1X ratio, mixed in equimolar ratio and sequenced with the Illumina MiSeq Reagent Kit v2, 500 cycles (2 × 250 bp paired end). Three microliters of 100 μM custom index 1 and read 2 primers were added to the sequencing reaction.
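Mixing the 5′ and 3′ PCR products in equimolar ratio requires converting mass concentration to molarity, which depends on fragment length. A minimal sketch, assuming the common approximation of 660 g/mol per base pair of dsDNA; the concentrations, mean fragment sizes and 100 fmol target below are hypothetical values, not taken from this work.

```python
def dsdna_nM(conc_ng_per_ul, length_bp):
    """Molar concentration (nM) of dsDNA, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def equimolar_volumes(libraries, fmol_each):
    """Volume (µl) of each library that contributes `fmol_each` femtomoles.

    libraries: dict name -> (concentration in ng/µl, mean fragment length in bp).
    Note that 1 nM equals 1 fmol/µl.
    """
    volumes = {}
    for name, (conc, length) in libraries.items():
        volumes[name] = fmol_each / dsdna_nM(conc, length)
    return volumes


# Hypothetical purified STAT-PCR products.
libs = {"5prime_junctions": (13.2, 500), "3prime_junctions": (6.6, 500)}
volumes = equimolar_volumes(libs, fmol_each=100.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µl")
```

Normalizing by length matters here because 5′ and 3′ junction products can differ in size after sonication and nested PCR.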
Bioinformatics analysis of targeted integration. Illumina reads were clustered with usearch v11.0.667 27 and mapped to the reference using bwa-mem v0.7.17 28 . For on-target insertion characterization, reads covering the 5′ and 3′ junctions of the target insertion site were selected with Python scripts and Samtools 1.10 29 . The number of indels was obtained with CRISPR-GA 30 . For on-target and off-target experiments, clustered reads that mapped against the vector were selected and mapped against the reference genome using bwa-mem for short reads and minimap2 v2.17 31 for long reads. Significance of the insertion peaks was assessed with the macs2 v2.2.5 32 algorithm, taking into account the standard deviations of read start and end positions. We estimated the LOD of the method by computationally diluting the positive UMIs: we randomly selected 1%, 10%, 25%, 50% and 99% of the positive UMIs while keeping 100% of the negative UMIs, and repeated the dilution process for 100 replicates. We analyzed the dilution samples with the pipeline described above and applied a logarithmic transformation to the fold enrichment of on-target peaks (the predictor variable). We then extrapolated the dilution at a log-transformed fold enrichment of 0 to determine the minimum percentage of on-target sample needed to detect a significant peak (Supplementary Fig. 10). We estimated LODs between 0.1% and 9% for all positive samples.
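The dilution-based LOD estimate can be sketched in two steps: subsample the positive UMIs at each dilution while keeping all negative UMIs, then regress dilution on log fold enrichment and read off the dilution at log fold enrichment 0. In this sketch the peak-calling step (macs2 in this work) is not reproduced; the UMI identifiers and fold-enrichment values are hypothetical inputs, and ordinary least squares is an assumed choice of fit.

```python
import math
import random


def dilute_umis(positive, negative, fraction, rng):
    """Keep a random `fraction` of positive UMIs plus every negative UMI."""
    k = max(1, round(fraction * len(positive)))
    return rng.sample(positive, k) + list(negative)


def extrapolate_lod(dilutions, fold_enrichments):
    """OLS fit of dilution = a + b * log10(FE); returns a, the dilution at log10(FE) = 0."""
    xs = [math.log10(fe) for fe in fold_enrichments]
    ys = list(dilutions)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


rng = random.Random(0)
positive = [f"pos_umi_{i}" for i in range(1000)]  # hypothetical UMIs
negative = [f"neg_umi_{i}" for i in range(5000)]
DILUTIONS = [0.01, 0.10, 0.25, 0.50, 0.99]
diluted = {f: dilute_umis(positive, negative, f, rng) for f in DILUTIONS}

# In practice each diluted set would go through peak calling (100 replicates each).
# Hypothetical mean on-target fold enrichments at each dilution:
fold_enrichment = [10 ** (2 * f - 0.02) for f in DILUTIONS]
lod = extrapolate_lod(DILUTIONS, fold_enrichment)
print(f"estimated LOD: {100 * lod:.2f}% on-target sample")
```

Treating log fold enrichment as the predictor, as in the text, makes the extrapolation a direct read of the intercept.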
In vivo targeted insertion into mouse liver. Animal experimentation procedures were approved by the Animal Experimentation Ethics Committee of the Barcelona Biomedical Research Park. C57BL/6J mice, 8-10 weeks old, were used for this study. Animals were purchased from Jackson Laboratories, and males and females were used without distinction. FiCAT mRNA was produced by in vitro transcription with the RiboMAX Large Scale RNA Production System-T7 (Promega) following the manufacturer's instructions. Rosa26 gRNA 33 was purchased from Synthego. FiCAT mRNA or plasmid, sgRNA or gRNA plasmid targeting Rosa26, and PB512-B, luciferase or GFP MC transposon were injected retro-orbitally using two delivery methods. For in vivo JetPEI delivery, plasmids were used at a 1 FiCAT:2.5 gRNA:2.5 transposon molar ratio. A total of 60 μg of nucleic acids was complexed with in vivo JetPEI (Polyplus transfection) at an N/P ratio of 7. For hydrodynamic injection, a total of 10 to 10.2 μg of nucleic acids was used (6 μg MC-luciferase transposon/MC-GFP transposon, 2 μg FiCAT pDNA/3 μg FiCAT mRNA, 2 μg gRNA pDNA/1.2 μg sgRNA targeting Rosa26.2).
Nucleic acids were diluted in PBS to a volume (in ml) equal to 7% of the animal's body weight (in g) and injected in less than 7 s via retro-orbital systemic injection.
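Because mass is proportional to moles times length for dsDNA, a fixed molar ratio such as 1 FiCAT:2.5 gRNA:2.5 transposon translates into masses that depend on plasmid sizes, and the hydrodynamic injection volume follows from body weight. A minimal sketch; the plasmid lengths and the 25 g body weight below are hypothetical placeholders, not values from this work.

```python
def split_mass_by_molar_ratio(total_ug, components):
    """Divide `total_ug` of DNA so that the components meet the requested molar ratio.

    components: dict name -> (molar_ratio, length_bp). For dsDNA, mass is
    proportional to moles x length, so each share is weighted by ratio x length.
    """
    weights = {name: ratio * length for name, (ratio, length) in components.items()}
    total_weight = sum(weights.values())
    return {name: total_ug * w / total_weight for name, w in weights.items()}


def hydrodynamic_volume_ml(body_weight_g, fraction=0.07):
    """Injection volume (ml) equal to 7% of body weight (g) by default."""
    return body_weight_g * fraction


# Hypothetical plasmid sizes (bp); replace with the actual constructs.
plasmids = {
    "FiCAT": (1.0, 11_000),
    "gRNA": (2.5, 3_500),
    "transposon": (2.5, 6_000),
}
masses = split_mass_by_molar_ratio(60.0, plasmids)  # 60 ug total for in vivo JetPEI
volume = hydrodynamic_volume_ml(25.0)                # hypothetical 25 g mouse
for name, ug in masses.items():
    print(f"{name}: {ug:.1f} ug")
print(f"hydrodynamic volume: {volume:.2f} ml")
```

The same helper applies to any construct set; only the lengths change the mass split, not the molar ratio.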
Whole body imaging of luciferase expression was performed at different timepoints after FiCAT-gRNA-transposon or transposon control administration with IVIS spectrum imaging system (Caliper Life Sciences). Images were taken 5 min after intraperitoneal injection of D-Luciferin potassium salt (Gold Biotechnology) according to the manufacturer's instructions.
For qPCR copy number analysis of the PB512-B transposon, animals were euthanized 10 days after injection, and the liver was isolated and homogenized. Genomic DNA was extracted from liver samples with the DNeasy Blood and Tissue kit (Qiagen). Transposon copy number relative to the endogenous Tfrc gene was obtained by qPCR (primers listed in Supplementary Table 5).
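Relative copy number by qPCR is commonly computed with the ΔCt method. A minimal sketch, assuming roughly 100% amplification efficiency for both the transposon and Tfrc amplicons (an efficiency factor of 2 per cycle); whether this exact calculation was used here is an assumption, and the Ct values are hypothetical. Setting `reference_copies=2` would express the result per diploid genome, given two Tfrc alleles.

```python
from statistics import mean


def relative_copy_number(ct_target, ct_reference, reference_copies=1.0, efficiency=2.0):
    """Target copies relative to a reference gene via the delta-Ct method.

    ct_target / ct_reference: lists of technical-replicate Ct values.
    efficiency: fold amplification per cycle (2.0 means 100% efficiency).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return reference_copies * efficiency ** (-delta_ct)


# Hypothetical Ct values for one liver sample.
transposon_ct = [24.1, 24.0, 23.9]
tfrc_ct = [25.0, 25.1, 24.9]
ratio = relative_copy_number(transposon_ct, tfrc_ct)
print(f"transposon/Tfrc: {ratio:.2f}")
```

A lower transposon Ct than Tfrc Ct (negative ΔCt) indicates more transposon copies than reference copies in the sample.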
PB combinatorial library screening. The DNA library was produced by Twist Bioscience, cloned into a lentiviral vector containing Cas9 and an Esp3I Golden Gate cloning site, and transformed into ElectroMAX Stbl4 competent cells (Thermo Fisher), ensuring 100-fold representation of each combinatorial variant. Plasmids were purified with the HiPure Maxiprep kit (Life Technologies) and cotransfected with envelope and packaging plasmids into Hek293T cells to produce lentivirus. Lentivirus was harvested, filtered and titered by comparing the functional titer (GFP-fluorescent cells after infection with a GFP carrier lentivirus) with a qPCR-based titer 34 . The reporter cell line containing the C-t half of the GFP sequence was infected at an MOI of 1, corrected by PB copy number (to avoid bias from differences in cloning efficiency between cycles). Infected cells were transfected with the ½ GFP plasmid and a gRNA targeting the AAVS1 sequence in the reporter target site; transfections were performed as previously described. On-target positive cells were selected by flow cytometry sorting 5 days after transfection, and genomic DNA was extracted. To start each new cycle, PB was amplified by PCR from this genomic DNA and cloned into a lentiviral vector containing Cas9 by Golden Gate assembly.
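Two quantities govern the library screen: the number of transformants needed for 100-fold coverage of the 394,000 variants, and how lentiviral integrations distribute at MOI 1. The sketch assumes uniform representation of variants and Poisson-distributed infection events, which are standard simplifications rather than measurements from this work.

```python
import math


def transformants_needed(n_variants, coverage):
    """Colonies required so that each variant is represented `coverage` times on average."""
    return n_variants * coverage


def fraction_variants_missed(coverage):
    """Expected fraction of variants absent, assuming uniform Poisson sampling."""
    return math.exp(-coverage)


def poisson_infection(moi):
    """Fractions of cells with 0, 1 and >1 integrations at a given MOI (Poisson model)."""
    p0 = math.exp(-moi)
    p1 = moi * math.exp(-moi)
    return {"uninfected": p0, "single": p1, "multiple": 1.0 - p0 - p1}


needed = transformants_needed(394_000, 100)  # 39,400,000 colonies
missed = fraction_variants_missed(100)
infection = poisson_infection(1.0)
for state, frac in infection.items():
    print(f"{state}: {frac:.1%}")
```

Under this model, roughly 37% of cells at MOI 1 carry exactly one integration and about 26% carry more than one, which is one reason an MOI correction matters when linking a sorted phenotype to a single PB variant.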
PB structural modeling. A 3D structure of the Trichoplusia ni PB transposase was obtained with the Robetta protein structure prediction server (http://robetta.bakerlab.org). The core domain (aa 131-550) was predicted by the Rosetta Comparative Modeling method, which is based on a Monte Carlo algorithm with embedded Cartesian-space minimization and all-atom optimization 35 . The tertiary fold was analyzed and validated with the knowledge-based methods SPServer and ProSa-Web (Supplementary Fig. 3). Secondary structure was analyzed with the machine-learning based methods PSIPRED and HHPred. The PB core was then refined in PyMOL by comparative protein modeling. The refinement was guided by superimposing the PB model on the cryo-EM HIV-1 strand transfer complex intasome (PDB ID: 5U1C), consisting of the HIV integrase tetramer bound to viral DNA and target host DNA, and on the X-ray diffraction structure of the Tn5 transposase complex (PDB ID: 1MUS 36 ). Strand-transferring DNA and donor DNA were extrapolated from the superimpositions with the HIV-1 intasome and Tn5, respectively. The nucleotides at the interface in contact with the protein were analyzed with X3DNA as double-stranded DNA. We used statistical potentials to score the interaction between protein and DNA and to generate a theoretical PWM 37 . The theoretical PWM is obtained by testing all potential double-stranded DNA sequences at the interface, ranking them with the statistical potentials and selecting the top-ranked sequences to build a multiple sequence alignment. During the submission of this manuscript, a cryo-EM structure became available that agrees well with our modeling 17 . The cryo-EM structure of the PB transposase strand transfer complex (PDB ID: 6X67) confirmed the general fold of the model and the domains we hypothesized to be responsible for contact with donor and target DNA.
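The theoretical PWM procedure (enumerate every sequence of the interface length, rank by a score, keep the top-ranked sequences and count bases per position) can be sketched as follows. The scoring function is a hypothetical stand-in for the protein-DNA statistical potentials, which are not reproduced here, and for brevity the sketch scores only one strand rather than the double-stranded interface.

```python
from itertools import product


def theoretical_pwm(k, score, top_n, alphabet="ACGT"):
    """Position weight matrix from the `top_n` best-scoring sequences of length k.

    Every sequence over `alphabet` is enumerated, ranked by `score` (higher is
    better), and base frequencies are counted per position among the top_n.
    """
    candidates = ["".join(p) for p in product(alphabet, repeat=k)]
    ranked = sorted(candidates, key=score, reverse=True)[:top_n]
    pwm = []
    for pos in range(k):
        column = [seq[pos] for seq in ranked]
        pwm.append({base: column.count(base) / len(column) for base in alphabet})
    return pwm


def toy_score(seq, favoured="TTAA"):
    """Hypothetical stand-in for a statistical potential: matches to a favoured motif."""
    return sum(a == b for a, b in zip(seq, favoured))


pwm = theoretical_pwm(4, toy_score, top_n=13)
for pos, column in enumerate(pwm):
    print(pos, {base: round(freq, 2) for base, freq in column.items()})
```

Exhaustive enumeration is only practical for short interfaces (4^k sequences), which fits the few base pairs in direct contact with the protein.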
Statistics and reproducibility. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The next-generation sequencing data generated in this study have been deposited in the European Nucleotide Archive under the study accession code PRJEB39575. The piggyBac Catalytic Core with DNA has been deposited in the Model Archive database under https://modelarchive.org/doi/10.5452/ma-oaxcu with the accession code HKJnRCqk3U. Sequences of plasmids used in this work are provided as a Supplementary Data file, plasmids.fasta. Source data are provided with this paper.