Retargeted Foamy Virus Vectors Integrate Less Frequently Near Proto-oncogenes

Retroviral gene therapy offers immense potential to treat many genetic diseases and has already shown efficacy in clinical trials. However, retroviral vector-mediated genotoxicity remains a major challenge, and clinically relevant approaches to reduce integration near genes and proto-oncogenes are needed. Foamy retroviral vectors have several advantages over gammaretroviral and lentiviral vectors, including a potentially safer integration profile and a lower propensity to activate nearby genes. Here we successfully retargeted foamy retroviral vectors away from genes and into satellite regions enriched for trimethylated histone H3 at lysine 9 by modifying the foamy virus Gag and Pol proteins. Retargeted foamy retroviral vectors integrated near genes and proto-oncogenes less often (p < 0.001) than controls. Importantly, retargeted foamy retroviral vectors can be produced at high, clinically relevant titers (> 10⁷ transducing units/ml), and unlike other reported retargeting approaches, engineered target cells are not needed to achieve retargeting. As proof of principle for use in the clinic, we show efficient transduction and retargeting in human cord blood CD34+ cells. The modified Gag and Pol helper constructs we describe will allow any investigator to retarget therapeutic foamy retroviral vectors simply by using these helper plasmids during vector production.

regions in the human genome [21]. Although this approach was successful in retargeting LV integration, it requires modifying a target cell line to express a chimeric LEDGF-CBX1 protein, which is not practical for clinical use.
Foamy retroviral vectors (FVs) have several desirable properties with regard to genotoxicity. In contrast to gammaretroviral vectors (GVs), FVs naturally have a lower tendency to integrate within CpG islands and near promoters [12]. FVs also integrate less often within genes than LVs. In addition, FVs have not been shown to exhibit read-through transcription and have a lower potential to activate nearby genes than either GVs or LVs [22]. The interactions of the foamy retroviral PIC with host proteins are not as well defined as lentiviral PIC interactions. However, the foamy virus Gag contains a chromatin binding site (CBS) motif in the C-terminus. The CBS is present in foamy retroviral PICs and interacts with the core histones H2A and H2B on host chromatin [23]. The CBS is a distinguishing feature of the foamy virus Gag protein. Mutations in the CBS negatively affect the ability of the foamy virus PIC to interact with host chromatin.
Here we demonstrate a clinically relevant approach to retarget FV integration into satellite elements and H3K9me3 regions and away from proto-oncogenes. We were able to significantly alter the integration profile of FVs, reducing the frequency of retroviral vector integration sites (RISs) near genes and proto-oncogenes. Efficient retargeting required modifying the FV Gag CBS and fusing the FV IN to CBX1. Importantly, with this approach retargeted FVs can be produced at clinically relevant titers, and any target cell, including CD34+ human cord blood cells, can be used without prior modification by simply using alternate foamy helper plasmids during vector production.

Results
Retargeted FVs can be produced at clinically relevant titers. To retarget FVs we developed two modified FV helper plasmids. The CBS of the FV Gag helper plasmid was altered using a previously described triple alanine substitution of the RTY residues, a mutation shown to eliminate chromatin binding [23]. The resulting FV Gag helper plasmid (pFVGag-CBS-RTY) expresses a CBS domain with the triple alanine substitution of RTY (Gag-RTY). The FV Pol helper plasmid (pFVPol-CBX1) expresses CBX1 fused to the C-terminus of IN (IN-CBX1) with a flexible glycine-serine, (GS)₃, linker (Fig. 1a,b). In this approach modified FVs are expected to integrate in areas enriched for H3K9me3 that are not close to genes. A control vector preparation (FV Control) and three modified vector preparations (FV IN-CBX1, FV Gag-RTY, and FV IN-CBX1 & Gag-RTY) were produced in parallel by transient transfection with the appropriate Pol, Gag, and Env helper plasmids and the FV-SGW-KO vector plasmid, which expresses an enhanced green fluorescent protein (EGFP) gene. Vector titers were determined by exposing HT1080 cells to vector-containing media and analyzing the cells for EGFP expression. Although the unconcentrated titers of FV Gag-RTY and FV IN-CBX1 & Gag-RTY were between 4- and 5-fold lower than that of the unmodified control FV (Fig. 2), all modified FVs could be produced at clinically relevant titers, > 10⁷ transducing units (TU)/ml, after 100-fold concentration.
Retargeted FVs integrate near H3K9me3 enriched regions and satellite elements. We next asked whether the modified FVs showed a preference for integrating into or near H3K9me3 histone modified regions, since CBX1 is known to bind to H3K9me3. IMR90 normal human fibroblast cells were transduced with the control and modified vectors. RISs were sequenced using modified genome sequencing (MGS)-PCR [24,25], resulting in over 900 captured RISs (Table 1) for each vector. The RISs were analyzed to determine the distance of each site relative to H3K9me3 peaks in IMR90 cells previously determined by ChIP-Seq. FV IN-CBX1 & Gag-RTY was retargeted most efficiently relative to FV Control (p < 0.001), with 19.2% of the RISs being within areas with high levels of H3K9me3 (Fig. 3a). Interestingly, FV Gag-RTY also had a higher number of RISs within H3K9me3 regions when compared to FV Control (p < 0.001). FV with only the Pol modification, FV IN-CBX1, showed weaker but significant retargeting to H3K9me3 regions (p < 0.05). Thus, mutating the Gag CBS had a stronger effect on retargeting than the IN-CBX1 fusion, but the combination of both modifications led to the strongest retargeting.
Since the combination of IN-CBX1 and Gag-RTY proteins had the most dramatic effect on retargeting FV integration into H3K9me3 regions, CD34+ cells were transduced with FV IN-CBX1 & Gag-RTY and FV Control to compare RISs in clinically relevant cells. The redistribution of the RISs with the FV IN-CBX1 & Gag-RTY vector in CD34+ cells was consistent with the trend observed in IMR90 cells (Fig. 3b). Efficient retargeting of FV IN-CBX1 & Gag-RTY integration to H3K9me3 sites in CD34+ cells shows that, as expected, retargeting was not specific to the fibroblast cells. We further assessed the number of RISs that were within hotspots (Fig. 4a,c and Supplemental Figure S1). FV IN-CBX1 & Gag-RTY had a much greater tendency to form RIS hotspots than FV Control in both IMR90 and CD34+ cells. Importantly, the vast majority of the RISs that fell within hotspots were also near (< 50 kb) H3K9me3 regions (Fig. 4b,d). We also investigated the proximity of the RISs from IMR90 cells to other histone modification and DNase I hypersensitive site data (Supplemental Figure S2). FV IN-CBX1 & Gag-RTY had a significant but slight preference for DNase I hypersensitivity, H3K9ac, and H3K4me3 sites, which are associated with high levels of transcription, when compared to FV Control.
We also expected that retargeting away from genes might increase the frequency of integration into repeat elements. Therefore, we analyzed whether the retargeted vectors had a preference for any repeat elements (Table 2).
Retargeted FVs integrate less frequently near genes and proto-oncogenes. FVs have distinct RIS preferences relative to genes and CpG islands when compared to LVs and GVs [12]. Since a major determinant of retroviral vector safety is the distance of RISs to proto-oncogenes and regulatory elements, the proximity of retargeted RISs to RefSeq genes, CpG islands, and proto-oncogenes was determined in both IMR90 and CD34+ cells. Retargeted FVs had integration patterns that were distinct from, and more favorable than, those of control FV relative to transcription start sites (TSSs) and CpG islands (Fig. 5). There was dramatic retargeting away from TSSs and CpG islands with the FV Gag-RTY and FV IN-CBX1 & Gag-RTY vectors when compared to FV Control. Importantly, the number of RISs near proto-oncogene TSSs was significantly reduced (p < 0.001) for FV Gag
Cells transduced with retargeted FVs have efficient transgene expression and display no evidence of clonal outgrowth. Since the retargeted FVs integrated in transcriptionally less active regions in the genome, we analyzed the transgene expression in the transduced cells by flow cytometry. EGFP expression was slightly lower for all retargeted vectors when compared to the control in both cell types, but it is important to note that there was still a relatively high level of EGFP expression, 1.3-1.4 fold lower than the control, for all modified vectors in both normal human fibroblasts at a multiplicity of infection (MOI) of 0.5, and cord blood CD34+ cells at an MOI of 10. Vector genome copy numbers from CD34+ genomic DNA were determined with quantitative PCR (Fig. 6). FV IN-CBX1 & Gag-RTY had 19% fewer vector copies per cell when compared to FV Control, indicating that the lower level of EGFP expression was due to fewer vector copies rather than increased silencing of retargeted FV. We additionally performed colony forming unit (CFU) assays to determine if the retargeted FV efficiently transduced hematopoietic progenitor cells (Supplemental Figure S3). The difference in the percentage of EGFP positive CFUs, 12.3% for the control FV sample and 9.4% for the retargeted FV sample, was similar to the difference seen by flow cytometry for EGFP in the bulk CD34+ populations transduced with control and retargeted FV, again showing no evidence of increased silencing, even after 14 days in culture.
Table footnote: ***, statistically significant at p < 0.001 compared to control vector; ##, statistically significant at p < 0.01 compared to Gag-RTY vector; a, within or less than 1 kb from a CpG island. RIS, retroviral vector integration site; TSS, transcription start site.
Finally, we analyzed the clonal diversity of the transduced IMR90 and CD34+ cells by RIS analysis (Fig. 7 and Supplemental Table S2). We observed highly polyclonal populations with no evidence of dominant clones for all vectors in both cell types.

Discussion
Gene therapy has enormous potential, but genotoxicity is a major concern for use of retroviral vectors in the clinic. Here we have successfully modified the integration profile of FV, resulting in significantly fewer RISs near genes and proto-oncogenes, a profile that may prove to be safer. Retargeted FVs can be produced by simply using the modified Gag and Pol helper plasmids we describe here during vector production. Retargeted FVs can be produced at high titer and efficiently transduce human CD34+ cells, suggesting they will be useful for gene therapy of SCID-X1, chronic granulomatous disease, thalassemias, and potentially metabolic diseases that can be treated by hematopoietic stem cell (HSC) transplantation, such as metachromatic leukodystrophy.   to retarget FV modestly reduced titers, we were able to obtain clinically relevant titers (> 10⁷ TU/ml) after concentration. We also observed high levels of transgene expression relative to the control that were achieved using a low MOI in both normal human fibroblasts and cord blood CD34+ cells. There was also no evidence of vector silencing in CD34+ cells using the strong spleen focus forming virus (SFFV) promoter. The Gag-RTY modification reduced vector titer, which is consistent with a study by Tobaly-Tapiero et al. that found fewer integrated viral copies in genomic DNA when the vector expressed Gag-RTY than with a control FV [23]. That study found that Gag-RTY abolishes the affinity of the FV PIC for host H2A and H2B histones, which reduces the efficiency of integration. A surprising observation was the stronger influence of the Gag-RTY modification on RIS retargeting than the IN-CBX1 modification. Although it is known that the foamy retroviral Gag CBS interacts with H2A and H2B histones and is partially responsible for tethering the PIC to the host chromatin [23], this is the first study to demonstrate that mutations in the Gag CBS substantially retarget FV integration.
FV Gag-RTY integration frequently occurred in H3K9me3 regions (Fig. 3), which are known to be a preferred target for CBX1. Upon further analysis, it was determined that many of the retargeted RISs were also in satellite repeat elements and near centromeres. Satellite on the ability to engage nucleosomes [26]. It was further shown that specific amino acid substitutions in IN impede the ability of the FV intasome to interact with nucleosomes. Interestingly, these IN mutant FVs integrated more frequently within genes, which is a stark contrast to the reduction of integration within genes observed in our study with FVs harboring Gag-RTY (  accessible to the FV PIC and this may be dependent on the local chromatin structure immediately preceding integration. There may also be viral or host proteins that interact with the foamy viral Gag CBS and influence integration site preferences, which have yet to be determined. Our findings suggest the Gag CBS has a major role in integration site selection and that Gag-RTY retargets FV integration into satellite repeats, which are predominantly gene sparse and frequently bear the H3K9me3 mark. The addition of IN-CBX1 significantly increased the number of RISs within H3K9me3 regions. We also observed that FV IN-CBX1 & Gag-RTY had a greater preference for DNase I hypersensitivity, H3K9ac, and H3K4me3 sites than FV Control (Supplemental Figure S2). A possible explanation is that vectors retargeted to H3K9me3 sites or satellite elements have a preference for chromatin that is more accessible. However, it is important to consider that the state of histone methylation and acetylation during integration is not known. It has been shown that certain genes transiently bear the H3K9me3 mark during transcription activation [27,28].
IMR90 cells transduced with FV IN-CBX1 & Gag-RTY had a much greater number of RISs in satellites (Table 2) and did not have dramatically more RISs in H3K9ac regions (Supplemental Figure S2) when compared to FV Control, which suggests the retargeted FV is not being directed to H3K9me3 regions overlapping highly expressed genes. It is also important to determine if a retroviral vector has any strong integration site preferences and thus has a tendency to form integration hotspots. Hotspots were observed at much higher frequency with FV IN-CBX1 & Gag-RTY than FV Control (Fig. 4). However, FV IN-CBX1 & Gag-RTY RISs in hotspots had a strong tendency to also be near H3K9me3 regions. FV IN-CBX1 & Gag-RTY formed hotspots in putatively safe genomic regions, as H3K9me3 sites are associated with gene sparse regions. Also, we observed that RISs near H3K9me3 sites were farther away from proto-oncogene TSSs on average in CD34+ cells (Supplemental Table S1). Thus retargeted FVs have increased hotspots, but these are in gene sparse areas. This observation also provides more evidence that utilizing both the IN-CBX1 fusion protein and the Gag-RTY mutant protein results in the most efficient retargeting into gene sparse regions. The formation of integration hotspots thus appears to be a result of retargeted integration site selection rather than an indication of genotoxicity. Clonality analyses showed that all IMR90 and CD34+ cell populations were polyclonal at 23 days post vector exposure (DPVE) and 10   of clones contributing to the total cell population, with 8 clones contributing 1-2% in CD34+ cells at 10 DPVE. However, it is important to note that all of the RISs were greater than 100 kb from the nearest proto-oncogene TSS in these clones. Also, the total number of clones contributing to the retargeted FV cell populations may be slightly underrepresented due to the increased number of RISs in repetitive regions, which are not included in clonality analyses.
Future in vivo studies are needed to determine if FV IN-CBX1 & Gag-RTY has a dramatic effect on the resulting hematopoietic repopulating cell populations.
It has been shown that modifying the host cells to express a lentiviral IN-LEDGF fusion protein can alter the integration profile of LVs [20]. However, methods that require modifying the host cells prior to transduction are not clinically relevant. As an alternative to manipulating tethering factors, proteins such as zinc fingers and endonucleases fused directly to IN have shown weak retargeting in LVs [29][30][31]. Although there were only slight effects on retargeting integration, these experiments demonstrated the feasibility of modifying vectors directly. Recently, Ashkar et al. demonstrated that GV integration can be targeted away from actively transcribed areas in the genome by introducing mutations in the IN regions that interact with bromodomain and extra-terminal (BET) proteins. GV IN interacts with host BET proteins and directs the GV PIC into or near TSSs [32][33][34]. Specifically, in that study, single point mutations or complete truncation of the C-terminal region of GV IN resulted in an integration profile distinct from normal GV, with fewer RISs being near proto-oncogene TSSs [35]. However, vector titers were not expressed in TU/ml and the distributions of RISs were not analyzed in human cord blood CD34+ cells in their study. Also, while it was shown that the modified GVs had a similar percentage of RISs near proto-oncogene TSSs when compared to published normal FV data, we have demonstrated that the retargeted FVs have even less of a preference for proto-oncogene TSSs than normal FVs (Table 1). Thus, retargeted FVs have a potentially safer integration profile than BET retargeted GVs. Other inherent advantages of FVs over other retroviral vectors include a lower potential to activate nearby genes and the absence of observed read-through transcription [12,22].
An important advantage of our approach is the ability to transduce target cells by simply using different helper plasmids during vector production. To obtain optimal retargeting of FV into H3K9me3 regions, both the Gag mutant and the IN-CBX1 fusion protein were necessary. This suggests that reducing the affinity of the foamy retroviral Gag CBS for H2A and H2B histones may be necessary to facilitate retargeting FVs with a modified IN. Importantly, we also show that the foamy IN can tolerate relatively large protein fusions. The CBX1 protein fused to the foamy IN is 21 kDa. Using Gag-RTY in conjunction with other FV IN fusion proteins, such as an IN-zinc finger or an IN-Cas9 fusion protein, should be considered in future work when developing retargeted FVs.
In summary, the retargeted FVs we have described here integrate less frequently near proto-oncogenes than control FV. Since FV proviruses are also known to dysregulate nearby genes less than LVs or GVs [36], retargeted FVs may be a safer option for HSC gene therapy. Thus, the safety of retargeted FVs should be further explored for clinical use.

Materials and Methods
Developing constructs and vector production. Vector  Vectors were produced by transient transfection of FV plasmid and helper plasmids on HEK-293 cells using polyethylenimine as previously described [37], except that 14 μg of FV vector plasmid FV-SGW-KO; 0.3, 4, and 6 μg of the appropriate Env, Gag, and Pol FV helper plasmids; and 72.9 μg of polyethylenimine were used, and 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid was omitted. Vector-containing supernatant was passed through a 0.45 μm filter (Pall Corporation, Newquay, United Kingdom), concentrated 100-fold by ultracentrifugation at 23 °C, and frozen at −80 °C in serum-free media containing 5% DMSO until use. Before use on CD34+ human cord blood, vector preparations were dialyzed using an Amicon Ultra 0.5 mL Centrifugal 50 kDa Filter (EMD Millipore, Billerica, MA) to remove DMSO. Vector preparations were titered on HT1080 cells, and EGFP expressing cells were quantified 72 hours post vector exposure using a BD Accuri C6 Flow Cytometer (BD Biosciences, Franklin Lakes, NJ).
Cell culture, transduction of IMR90 and CD34+ human cord blood cells, and CFU assays. IMR90 cells (Coriell Institute, Camden, NJ) were cultured in Eagle's minimal essential medium (EMEM) (ATCC, Manassas, VA) supplemented with 15% fetal bovine serum (FBS) (Atlanta Biologicals, Lawrenceville, GA) at 37 °C in 5% CO₂. For each vector, six 10-cm dishes were seeded with 5 × 10⁴ cells on day 1, the appropriate vector was added at a multiplicity of infection of 0.5 on day 2, the cells were passaged 1:3 to one 10-cm plate on day 7, the cells were expanded to two 10-cm plates on day 12, and the DNA was extracted on day 23.
Cryopreserved CD34+ male human cord blood cells (Stemcell Technologies, Vancouver, Canada) were thawed, counted, and plated in a 12-well tissue culture treated plate at 5 × 10⁵ cells/mL in prestimulation media (IMDM + 10% heat inactivated FBS with 5000 U penicillin/streptomycin and cytokines: rhIL-3, rhIL-6, rhSCF, rhTPO, rhFlt-3, rhG-CSF (ProSpec-Tany TechnoGene Ltd., Rehovot, Israel), 100 ng/mL each) on day 1. Cells were incubated at 37 °C overnight. On day 2, a human fibronectin fragment coated 12-well suspension plate was prepared by coating wells with 2 μg/cm² RetroNectin® Reagent (Takara Bio, Otsu, Shiga, Japan). Cells were counted and plated in three wells at 5 × 10⁵ cells/well in prestimulation media. Cells were exposed to FV at an MOI of 10, or were mock transduced with prestimulation media. The final volume of each well was adjusted to 1 mL with prestimulation media and cells were incubated at 37 °C for 20 hours. On day 3, cells were washed, counted, and plated at 5 × 10⁵ cells/ml. Cells were maintained between 5 × 10⁵ and 1 × 10⁶ cells/ml for the remainder of the experiment. DNA was extracted and the cells were analyzed for EGFP expression by flow cytometry on days 6 and 11. Events were viewed on a live cell gate for the flow cytometry analysis. One day after cells were exposed to vector, 2000 cells were plated in semisolid Methocult™ H4230 methylcellulose media (Stemcell Technologies, Vancouver, Canada). Methocult™ H4230 was prepared according to the manufacturer's directions, and the following cytokines were added: 50 ng/mL rhSCF, 20 ng/mL rhIL-3, 20 ng/mL rhG-CSF, 20 ng/mL rhGM-CSF (ProSpec-Tany TechnoGene Ltd., Rehovot, Israel). Methocult™ H4230 plates were incubated at 37 °C. Eleven days later, CFUs were counted and scored for EGFP expression using fluorescence microscopy.
Scientific Reports | 6:36610 | DOI: 10.1038/srep36610
Quantitative PCR for vector copy number analysis. Analysis of vector copy number was performed through a multiplexed quantitative PCR assay using primers and probes targeting an EGFP transgene and RNase P as an internal control. The EGFP/EYFP assay consisted of a Custom TaqMan Probe (#4316034) containing a 5′ FAM reporter dye and a 3′ Minor Groove Binder/nonfluorescent quencher, with primer/probe sequences as previously reported [38]. The RNase P assay (Applied Biosystems #4403328) contained a probe with a 5′ Hex reporter dye. The standard curve was generated with genomic DNA extracted from HT1080 cells transduced with a single vector containing EYFP. Reactions were run in triplicate with TaqMan Genotyping Master Mix (Applied Biosystems #4371353) in a Bio-Rad CFX384 Touch under the following thermal cycling conditions: 95 °C 10 min + 40 × (95 °C 15 s + 60 °C 1 min).
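As a rough illustration of how a relative standard curve can convert Ct values into vector copies per cell, the following Python sketch fits Ct against log10 input for each assay and takes the ratio of the EGFP quantity to the RNase P quantity. The function names, the dilution-series inputs, and the assumption that the HT1080 standard carries exactly one vector copy per cell are hypothetical and are not taken from the methods above; the calculation actually used may have differed.

```python
def fit_standard_curve(log10_inputs, cts):
    """Least-squares fit of Ct = slope * log10(input) + intercept."""
    n = len(cts)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to get a relative input quantity."""
    return 10 ** ((ct - intercept) / slope)


def vector_copies_per_cell(egfp_ct, rnasep_ct, egfp_curve, rnasep_curve,
                           copies_in_standard=1.0):
    """Vector copies per cell relative to the standard.

    Assumption (hypothetical): the standard DNA has `copies_in_standard`
    vector copies per cell, so its EGFP/RNase P quantity ratio is 1.
    """
    egfp_quantity = quantity_from_ct(egfp_ct, *egfp_curve)
    rnasep_quantity = quantity_from_ct(rnasep_ct, *rnasep_curve)
    return copies_in_standard * egfp_quantity / rnasep_quantity
```

Because both assays are multiplexed in the same well, dividing by the RNase P quantity normalizes for the amount of genomic DNA loaded, which is the role of an internal control.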

Analysis of integration sites in FV transduced cells.
Genomic DNA was extracted from IMR90 normal human fibroblasts and CD34+ human cord blood cells using the Puregene Cell & Tissue kit (Qiagen Inc., Valencia, CA) according to the manufacturer's directions. Vector-genome junctions were sequenced using MGS-PCR [25].
Forward and reverse sequence reads were paired with PEAR software using the default settings [39]. Sequence reads were truncated at the third nucleotide that had a corresponding Q score < 27 (error probability p ≥ 0.002) to reduce the effects of sequencing errors when aligning queries to the genome. Perl scripts were used to process and map vector-genome junctions to the Genome Reference Consortium (GRC) build GRCh37 (hg19) of the human genome [40]. Integration sites that aligned to repetitive regions in the genome and could not be clearly resolved, as defined below, were excluded from all downstream analyses, except for the analysis of the proximity of integration sites to repeat elements and centromeres. An integration site was considered to be within a repetitive region if the alignment score of its second greatest scoring alignment was greater than 95% of that of its greatest scoring alignment. For integration sites with alignment scores < 100, this threshold was reduced to 90%.
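The two filtering rules above can be sketched in a few lines of Python. This is only an illustration and not the Perl code used in the study: the function names are hypothetical, the base that triggers truncation is assumed to be dropped along with everything after it, and the "alignment scores < 100" condition is assumed to refer to the greatest scoring alignment.

```python
def truncate_read(seq, quals, min_q=27, max_low=3):
    """Cut a read at its third base with a quality score below min_q.

    seq: read sequence; quals: per-base Phred scores of the same length.
    Assumption: the triggering base itself is removed.
    """
    low_seen = 0
    for i, q in enumerate(quals):
        if q < min_q:
            low_seen += 1
            if low_seen == max_low:
                return seq[:i]
    return seq  # fewer than max_low low-quality bases: keep the whole read


def is_repetitive(best_score, second_score):
    """Flag a site whose runner-up alignment scores too close to the best.

    The threshold is 95% of the best score, reduced to 90% when the best
    score is below 100 (assumed reading of the rule in the text).
    """
    threshold = 0.95 if best_score >= 100 else 0.90
    return second_score > threshold * best_score
```

For example, a read whose second, fourth, and sixth bases fall below Q27 would be cut to its first five bases, and a site with alignment scores of 200 and 195 would be treated as repetitive.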
Perl scripts were used to determine the proximity of integration sites to RefSeq genes and TSSs, centromeres, repeat elements (LINE, SINE, LTR, DNA, and satellite elements), CpG islands, histone modification ChIP-Seq peaks, DNase I hypersensitive site peaks, and proto-oncogenes. RefSeq genes, repeat elements, centromere locations, and CpG islands files for the hg19 genome were downloaded from UCSC (http://genome.ucsc.edu/cgi-bin/hgTables). The Roadmap Epigenomics Project IMR90 and CD34+ histone modification ChIP-Seq and DNase I hypersensitive site peak files were used [41]. The annotated peaks for each cell type were sorted by peak strength and then divided into 10 equal sized bins, with bin 1 containing the lowest scoring peaks and bin 10 containing the highest scoring peaks (strong peaks). An integration site was considered to be within a peak if it was between the start and stop positions of a peak, and near a peak if it was within 50 kb of the start or stop site of a peak. For the cancer gene analysis, we used 2,048 cancer genes that had corresponding RefSeq mRNAs from the Network of Cancer Genes 4.0 as of June 18, 2014. An integration site was considered to be within an oncogene when it was located within the oncogene transcript. RIS hotspots were defined as 3 RISs within the specified kb window. The RIS hotspot analysis was performed as previously described [42]. Briefly, RIS data sets were divided into a minimum of three non-overlapping randomly selected matched sized data sets of 320 RISs for IMR90 cell data and 511 RISs for CD34+ cell data for hotspot analyses.
Statistical analysis. Statistical significance was determined by using Student's t-test for vector titers and the χ² goodness-of-fit test for all other analyses.
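The peak binning, peak proximity, and hotspot definitions given in the integration site analysis can be illustrated with the following minimal Python sketch. It is not the authors' Perl implementation: the function names and tuple layouts are hypothetical, and the hotspot routine is only one simple reading of "3 RISs within the specified kb window" (any three consecutive sites on a chromosome spanning no more than the window), whereas the published analysis followed a previously described method.

```python
from collections import defaultdict


def decile_bins(peaks):
    """Sort peaks by score (last tuple element) and assign bins 1 to 10.

    Bin 1 holds the weakest peaks and bin 10 the strongest.
    """
    ranked = sorted(peaks, key=lambda peak: peak[-1])
    n = len(ranked)
    return [(peak, 1 + (i * 10) // n) for i, peak in enumerate(ranked)]


def classify_site(chrom, pos, peaks, near_kb=50):
    """Return 'within', 'near', or 'far' for one integration site.

    peaks: iterable of (chrom, start, stop) tuples.
    """
    near_bp = near_kb * 1000
    label = "far"
    for p_chrom, start, stop in peaks:
        if p_chrom != chrom:
            continue
        if start <= pos <= stop:
            return "within"
        if min(abs(pos - start), abs(pos - stop)) <= near_bp:
            label = "near"
    return label


def sites_in_hotspots(sites, window_kb, min_sites=3):
    """Count sites that belong to a run of min_sites sites within the window.

    sites: iterable of (chrom, pos) tuples.
    """
    window_bp = window_kb * 1000
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    total = 0
    for positions in by_chrom.values():
        positions.sort()
        flagged = [False] * len(positions)
        for i in range(len(positions) - min_sites + 1):
            j = i + min_sites - 1
            if positions[j] - positions[i] <= window_bp:
                for k in range(i, j + 1):
                    flagged[k] = True
        total += sum(flagged)
    return total
```

Sorting positions once per chromosome keeps the hotspot scan linear after the sort, which matters little for a few hundred sites but scales to larger data sets.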