Introduction
Recombinant vectors based on gammaretroviruses have been used extensively in preclinical animal models and human clinical trials with very little evidence of genotoxicity1,2. However, recent studies have established that such vectors do have the potential to activate endogenous proto-oncogenes and induce malignant transformation in targeted cell populations3,4. These events have spurred a great deal of interest in characterizing global patterns of vector integration, with the goal of understanding better the risks of vector-related insertional mutagenesis and other forms of genotoxicity and devising strategies to reduce these risks. As recently reported, several studies found that recombinant vectors based on retroviruses do not integrate randomly throughout the target cell genome, but instead cluster in and around genes, with some evidence of integration hot spots and preferences for palindromic sequences and matrix attachment regions and, in some cases, actively transcribed regions5,6,7,8,9,10,11,12,13,14,15,16,17,18. However, details of these integration patterns vary with the class of vector and, to a lesser degree, the target cell type. In addition, these studies have focused either on cell lines, or in vivo where the repertoire of integration events can be heavily biased.
The repertoire of integration sites can be influenced by several factors. In the most extreme case, vector-mediated activation of proto-oncogenes flanking a small minority of integration sites can result in the selective survival and expansion of individual transduced clones, eventually leading to a repertoire of integration sites that converge on clonality3,4,15,19. The fact that vectors based on gammaretroviruses and lentiviruses integrate preferentially within or near transcriptionally active genes suggests that specific patterns of integration, at least for these classes of vectors, can vary widely depending on the target cell population and associated differences in global gene expression patterns. It is possible that the repertoire of integration sites may also vary depending on whether individual vector proviruses are expressed. The influence of flanking sequences on provirus expression, a phenomenon known as chromosomal position effects, has been clearly documented for both wild-type virus infections20,21 and recombinant virus vectors based on gammaretroviruses and lentiviruses22,23. The mechanisms underlying chromosomal position effects presumably include the influence of cis-acting regulators such as enhancers or silencers present in flanking sequences, whether the sites of integration are located in transcriptionally permissive euchromatin or repressive heterochromatin, and possibly transcriptional interference due to unfavorable interactions with flanking transcription units. The degree to which selection based on vector gene expression can influence the repertoire of integration sites has profound implications for the use of such strategies in clinical gene transfer trials. As recently reviewed24, several methods of selection are being investigated as a means of increasing both the proportion of transduced cells and the level of vector expression. It is possible that such selection can further skew an already biased spectrum of integration sites near or within particular genes or gene-rich regions of the genome, altering the probability of genotoxicity.
To assess the global integration pattern of conventional gammaretrovirus-based vectors in clinically relevant primitive hematopoietic cells without the potential bias of in vivo reconstitution, as well as to gauge the effects of selection for vector expression on the repertoire of retroviral integration sites, we transduced mouse bone marrow cells with a reporter vector based on murine stem cell virus (MSCV), grew myeloid progenitor colonies in the absence and presence of drug selection, and cloned provirus junction sites. These sites were sequenced, assigned to positions in the mouse genome, and compared to a random set of simulated integration sites to detect any biases relative to coding elements of the mouse genome.
Results and discussion
Experimental system
For these studies we focused on the analysis of vector integration sites in primitive hematopoietic progenitor cells. As diagrammed in Fig. 1A, progenitor cells are relatively undifferentiated but exhibit a high capacity to replicate and to generate large numbers of differentiated cells. Although the ability to support in vivo engraftment is typically restricted to more primitive hematopoietic stem cells, progenitors do exhibit limited self-renewal capacity, placing them at risk of vector-mediated oncogenic transformation. We were able to restrict our analysis to integration events that occurred in the primitive progenitor population by plating transduced bone marrow cells in methylcellulose progenitor colony assays and waiting until the mature cells had died and the primitive progenitors had expanded and differentiated into colonies. We then prepared DNA from pools of these colonies for integration site analysis. To assess the effects of selection on the repertoire of integration events, we added the neomycin drug analog G418 to a set of parallel methylcellulose cultures. Moreover, this schema allowed for the analysis of a primitive and clinically relevant population of primary hematopoietic cells without the potential biases that have been frequently associated with in vivo engraftment3,4,15,17,19.
Figure 1.
Experimental schema. (A) Strategy for targeting progenitors. The differentiation pathways for myelo/monocytic, erythroid, and megakaryocytic/platelet lineages are diagrammed, including primitive hematopoietic stem cells (HSC), common myeloid progenitors (CMP), more restricted progenitors, and various stages of terminal differentiation (indicated by horizontal arrows). Integration sites for primitive progenitors were targeted by assaying only DNA from colonies of differentiated cells that were grown from the progenitors that were present at the time of transduction. (B) Diagram of integrated vector provirus. The gammaretrovirus vector used here was derived from the previously described MSCV-based vector MGPN225. It includes an expression cassette for GFP transcribed from the 5' vector LTR and an expression cassette for the drug resistance gene Neo transcribed from an internal Pgk promoter. To this vector was added a bacterial expression cassette for the drug resistance gene Zeo in the proximal portion of the 3' LTR, from which it is copied into the 5' LTR during formation of provirus, generating the flanking arrangement indicated in the diagram. Genomic DNA from transduced cells was digested with SpeI, which cuts at the indicated sites within the provirus as well as at clone-specific locations in the flanking genomic DNA, and with AvrII, which cuts only in the flanking genomic DNA, and the indicated fragments were subcloned by plasmid ligation, bacterial transformation, and colony selection under zeocin selection. Vertical arrows, restriction sites; horizontal arrows, transcription promoters; wavy lines, genomic DNA. (C) Example of vector transduction. Mouse bone marrow cells were transduced with the vector described above and myeloid progenitor colonies were grown in the absence and presence of G418 selection for Neo expression. Pools of colonies were collected and the frequency of GFP expression was determined by flow cytometry. 
Exemplary flow-cytometric histograms for unselected and selected pools are presented, along with the fraction of cells determined to be GFP-positive, indicated above the gates used for this determination. Light traces, untransduced controls; dark traces, experimental samples.
The vector used in these studies is based on the MSCV gammaretrovirus vector MGPN222,25. As previously described, this vector contains a coding cassette for green fluorescent protein (GFP) transcribed by the virus LTR promoter and a coding cassette for neomycin phosphotransferase (Neo) transcribed by an internal promoter from the murine phosphoglycerate kinase (Pgk) gene. To this vector we added a bacterial expression cassette for the zeocin drug resistance gene (Zeo) inserted in the proximal region of the 3' LTR. From this region the Zeo cassette is copied into the 5' LTR during reverse transcription and proviral integration, resulting in the flanking arrangement diagrammed in Fig. 1B. This arrangement allows for the recovery of junction fragments from both the 5' and the 3' ends of integrated provirus by digesting genomic DNA with SpeI and AvrII and recovering the desired fragments by plasmid ligation, bacterial transformation, and colony selection under zeocin selection. This approach offers several key advantages over PCR-based methods of cloning integration sites. It is relatively quick and easy and avoids the complications of nonspecific amplification, template competition, and random base substitutions associated with PCR. It also allows for the simultaneous cloning of junction fragments from both sides of the integrated provirus, as well as junction fragments that are typically much longer than those obtained by PCR. These advantages combine to allow for the cloning and unambiguous mapping of integration sites that may otherwise be missed by more conventional methods, such as sites located within short stretches of repetitive DNA.
Transduction and cloning of integration sites
We transduced bone marrow cells from female donors at a low multiplicity of infection to obtain predominantly single integration events per transduced cell, split them into two fractions, and plated them in methylcellulose myeloid progenitor cell cultures with or without G418 selection. We performed six independent transductions with an average of 449 myeloid progenitor colonies per pool. As diagrammed in Fig. 1C, flow-cytometric analysis of pools of colonies demonstrated that only about a third of the cells within colonies grown in the absence of selection expressed GFP, whereas this rate was increased to half or more when the colonies were grown under G418 selection. These results are consistent with previous studies with this vector22,26 and serve to demonstrate both the relatively low level of transduction and the profound sensitivity of this vector to silencing chromosomal position effects.
We prepared genomic DNA from pools of colonies and recovered provirus/genomic junctions as described above. Cloned fragments averaged around 3 kb, consistent with the size expected when mouse genomic DNA is digested with the combination of enzymes used here. Sequencing of the cloned fragments from both ends yielded genomic sequence with an average size of alignment with the mouse genome (when the sequences were submitted as BLAST queries) of 401 bp, with an average identity of 97.6%. Only unambiguous alignments with a score of better than e-16 that mapped to a single site in the mouse genome were used for further analyses. We identified a total of 130 unique integration sites from the unselected colony pools and 129 unique integration sites from the selected colony pools, for a total of 259 experimental integrations. We also generated a simulated dataset of 260 random integration sites. For this purpose, we identified segments surrounding randomly selected sites in the mouse genome, of lengths similar to those obtained experimentally, and subjected them to the same screening process of BLAST analysis applied to the experimental dataset. This resulted in a modestly more restricted simulated control dataset than those used in other previous studies9,27,28, effectively omitting about 2% of regions with high sequence homologies and redundancies and thereby matching more closely the treatment of the experimental dataset. We did not, however, adjust this control dataset for distances relative to SpeI and AvrII restriction sites used to generate the experimental dataset, as others have suggested13, due to the fairly uniform distribution of these sites throughout the genome, the ability of our cloning strategy to isolate large as well as small DNA fragments, and the incomplete nature of the mouse genome database.
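The construction of a simulated control set of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the chromosome lengths are placeholder values, the function name `sample_random_sites` is hypothetical, and the BLAST-based uniqueness screen applied to the real control set is represented only by an optional caller-supplied predicate.

```python
import random

def sample_random_sites(chrom_lengths, n_sites, seed=0, keep=None):
    """Draw n_sites positions uniformly over a genome.

    chrom_lengths: dict of chromosome name -> length in bp.
    keep: optional predicate (chrom, pos) -> bool standing in for a
          uniqueness screen; sites that fail it are redrawn.
    """
    rng = random.Random(seed)
    names = list(chrom_lengths)
    # Weight each chromosome by its length so every base is equally likely.
    weights = [chrom_lengths[c] for c in names]
    sites = []
    while len(sites) < n_sites:
        chrom = rng.choices(names, weights=weights, k=1)[0]
        pos = rng.randint(1, chrom_lengths[chrom])  # 1-based, inclusive
        if keep is None or keep(chrom, pos):
            sites.append((chrom, pos))
    return sites

# Placeholder lengths for illustration only (not real mouse chromosome sizes).
toy_genome = {"chr1": 2_000_000, "chr2": 1_500_000, "chrX": 1_000_000}
sites = sample_random_sites(toy_genome, 260)
```

Weighting by chromosome length keeps the draw uniform per base rather than per chromosome, which is what a random-integration null model requires.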
General distribution of integration sites
As diagrammed in Fig. 2A, we found vector integration sites to be broadly distributed throughout the mouse genome, with integrations observed in all 20 mouse chromosomes and no obvious recurring integration hot spots for either the unselected or the selected samples. There were no significant differences observed between the experimental and the simulated random datasets for individual chromosomes, with the exception of the X chromosome, which contained fewer than half of the expected integration events. It is possible that this may reflect a general inaccessibility of the inactive X chromosome in the female target cells or other unique properties of the X chromosome such as a high density of L1 elements that may inhibit either the mechanism or the detection of integration. However, there were also other, albeit less significant, differences in the frequencies of integrations for specific chromosomes. As diagrammed in Fig. 2B, a comparison between the frequency of integrations and the frequency of genes within individual chromosomes revealed a strong correlation between these two parameters for the experimental dataset (coefficient of determination, R2 = 0.45, P = 0.001). In contrast, we saw a much weaker correlation for the simulated random dataset (R2 = 0.23, P = 0.04), presumably reflecting the fact that, in general, larger chromosomes contain more genes than smaller chromosomes.
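For readers who wish to reproduce this type of per-chromosome comparison, the coefficient of determination for a simple linear fit can be computed as sketched below. The function name `r_squared` and the toy percentages are assumptions introduced for illustration; they are not values from this study.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical per-chromosome values: percent of genes vs percent of integrations.
pct_genes = [1, 2, 3]
pct_sites = [1, 3, 2]
r2 = r_squared(pct_genes, pct_sites)
```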
Figure 2.
General distribution of integration sites. (A) The percentages of integration sites present on individual mouse chromosomes for the combined experimental dataset versus the simulated random dataset are shown. The P value was determined using the one-sided Z statistic for two proportions. (B) Correlation between the percentage of integration sites and the percentage of genes on individual mouse chromosomes, shown for the combined experimental dataset and the simulated random dataset. R2, coefficient of determination; P value was determined using the F statistic. (C) Correlation between integration sites and gene density within a 1-Mb window (500 kb on either side of integration site) for the combined experimental dataset and the simulated random dataset. Heavy horizontal bar, median; open box ends, first and third quartiles; whiskers, 1.5 times the interquartile range; diamonds, individual outlier data points. KS, result of Kolmogorov–Smirnov test29.
Previous studies demonstrated a correlation between integration sites and gene density for MLV and HIV vectors12,14. To assess this correlation in our dataset, we counted the genes within a 1-Mb window (500 kb on either side) around individual sites of integration. As seen in Fig. 2C, there was a median of 14 genes within this window surrounding the integration sites observed experimentally, versus a median of only 8 genes for the simulated random set. Using the nonparametric and distribution-free two-sample Kolmogorov–Smirnov test29, we found these distributions to be significantly different (KS = 0.30, P < 0.001). We also observed a small difference between the integration sites observed experimentally in the absence and presence of selection (KS = 0.17, P = 0.04), with a greater preference seen with the unselected set (median: 16 genes) versus the selected set (median: 12.5 genes). Taken together, this initial analysis suggests that gammaretrovirus vectors have a significant bias for integration into gene-rich regions in primary mouse hematopoietic progenitor cells and that this preference may be influenced, albeit to a small degree, by selection.
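The two calculations behind this comparison, counting genes in a window around each site and measuring the distance between two empirical distributions, can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, genes are reduced to single coordinates on one chromosome, and only the KS statistic is computed (not its P value).

```python
from bisect import bisect_left, bisect_right

def genes_in_window(gene_positions, site, half_window=500_000):
    """Count genes within +/- half_window bp of an integration site.

    gene_positions must be sorted and lie on the same chromosome as site.
    """
    lo = bisect_left(gene_positions, site - half_window)
    hi = bisect_right(gene_positions, site + half_window)
    return hi - lo

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the two empirical
    cumulative distribution functions, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Toy coordinates for illustration only.
toy_genes = [100, 600_000, 1_200_000, 2_000_000]
count = genes_in_window(toy_genes, 700_000)
```

Because both empirical distribution functions change only at observed values, checking each observed value is sufficient to find the maximum difference.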
Frequency of integration within and near genes
We next sought to determine whether there was a bias for integration within genes. For this purpose, we included integration events within the primary transcripts of both RefSeq genes and all known genes. As outlined in Table 1, the rate of integration observed for the experimental dataset averaged 30.1% for the RefSeq genes and 41.7% for all known genes, which were essentially identical to the rates observed for the simulated random set. There were also no significant differences observed between the experimental datasets for progenitor colonies grown in the absence and presence of selection, although there appeared to be a trend toward integration within genes for the selected set (45.7%) compared to the unselected set (37.7%) when all genes were included in the analysis (P = 0.12, one-sided Z statistic for two proportions).
As summarized in Table 1, we also reanalyzed the integration site data originally reported by Wu et al.9 for gammaretrovirus vector transduction of human HeLa cells. For this analysis we BLAST queried 747 sequences longer than 88 bp to the more recent build 35 of the human genome and obtained 547 alignments 45 bp and longer with a score better than e-16. We also generated a simulated random dataset for the human genome in a manner similar to that used for the mouse random set. Comparisons between these datasets indicated there was still a statistically significant but very small bias for integrations within RefSeq genes (39.1% for experimental versus 34.5% for control, P = 0.04). This difference from the original analysis reported by Wu et al. presumably reflects the larger number of genes annotated in the more recent build of the human genome, as well as the use of more stringent criteria for establishment of the simulated random dataset. However, comparisons between the rates of integration into all known genes in HeLa cells (260 of 547) versus mouse progenitor cells (108 of 259) indicated no significant difference (P = 0.07, one-tailed Z statistic for two proportions). Taken together these results suggest there are few if any differences in the general integration patterns between human and mouse cells.
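The Z statistic for two proportions used throughout these comparisons can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function name `two_proportion_z` is hypothetical, a pooled standard error is assumed, and the optional continuity correction is included because the text does not state which variant was applied.

```python
import math

def two_proportion_z(x1, n1, x2, n2, continuity=False):
    """One-sided Z test of whether proportion x1/n1 exceeds x2/n2.

    Uses the pooled standard error. With continuity=True, the absolute
    difference is reduced by 0.5 * (1/n1 + 1/n2) before standardizing.
    Returns (z, upper-tail P value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    if continuity:
        shrunk = max(0.0, abs(diff) - 0.5 * (1 / n1 + 1 / n2))
        diff = math.copysign(shrunk, diff)
    z = diff / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Counts quoted in the text: all known genes, HeLa (260/547) vs mouse (108/259).
z_cc, p_cc = two_proportion_z(260, 547, 108, 259, continuity=True)
z_raw, p_raw = two_proportion_z(260, 547, 108, 259)
```

With these counts the corrected variant gives a one-sided P of about 0.07 and the uncorrected variant about 0.06; neither crosses the 0.05 threshold.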
Frequency of integration near transcriptional start sites
Although we did not detect a bias toward integration within genes in the mouse genome, we did observe a very strong bias for integration within 5 kb upstream of transcriptional start sites for all known genes. As summarized in the last column of Table 1, 18.9% of all experimental integrations were within this window, versus a significantly lower 5% for the simulated random dataset. There were no significant differences observed between the experimental datasets for progenitor colonies grown in the absence (20.1%) and presence (17.1%) of selection and also no significant difference between the experimental datasets observed in mouse bone marrow progenitor cells and our recalculated dataset for HeLa cells (16.5%). Thus, this tendency appears to represent a general propensity for gammaretrovirus vectors in multiple settings.
To pursue this correlation further, we determined the distance between integration sites and the nearest transcriptional start sites for all known genes. As seen in Fig. 3A, there was a clear clustering of integrations near transcriptional start sites for the combined experimental dataset, with a median of -0.1 kb (upstream) and an interquartile range of -8 kb to +15 kb. This is in contrast to the simulated random dataset, with a median of 12 kb and an interquartile range of -22 to 80 kb (KS = 0.24, P < 0.001). As diagrammed in Fig. 3B, a higher resolution analysis of the region within 20 kb of promoters revealed an even stronger difference between the combined experimental and simulated random datasets (KS = 0.34, P < 0.001), with the strongest bias apparently occurring within 7.5 kb upstream and 2.5 kb downstream of transcription start sites as evidenced by the statistically significant differences observed for five of six individual bins within this region. Moreover, 104 of 259 experimental integration sites (40.2%) could be found in this -7.5 kb to +2.5 kb window, versus only 17 of 260 sites (6.5%) for the simulated random datasets. Closer inspection of this window indicated a clear gradient of integration from the first kilobase of the gene, dropping off steeply into the transcript, but decreasing only gradually into the 10 kb upstream, which presumably contains promoter and control elements. This distribution is similar to that reported by Wu et al.9 with a gammaretrovirus vector in human HeLa cells, again emphasizing the conservation of this bias.
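The signed distance used in this analysis (negative for upstream, positive for downstream) depends on the strand of the gene as well as on genomic coordinates. A minimal sketch is given below, assuming each transcription start site is supplied as a (position, strand) pair on the same chromosome as the integration site; the function name and the toy coordinates are hypothetical.

```python
from bisect import bisect_left

def signed_distance_to_nearest_tss(site, tss_list):
    """Distance in bp from an integration site to the nearest TSS.

    tss_list: non-empty list of (position, strand) tuples sorted by
              position, with strand '+' or '-'.
    Negative values mean the site is 5' (upstream) of the TSS relative to
    the direction of transcription; positive values mean 3' (downstream).
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, site)
    # Only the neighbours on either side of the site can be the nearest.
    candidates = tss_list[max(0, i - 1): i + 1]
    pos, strand = min(candidates, key=lambda t: abs(site - t[0]))
    offset = site - pos
    # On the minus strand, higher coordinates are upstream of the TSS.
    return offset if strand == "+" else -offset

# Toy annotation for illustration only.
toy_tss = [(1000, "+"), (5000, "-")]
upstream_plus = signed_distance_to_nearest_tss(900, toy_tss)
upstream_minus = signed_distance_to_nearest_tss(5200, toy_tss)
downstream_minus = signed_distance_to_nearest_tss(4800, toy_tss)
```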
Figure 3.
Integration site distribution around transcription start sites. (A) Distances of all integration sites from transcription start sites. The distance from the nearest transcription start site is shown in kilobases for individual integration sites within the combined experimental dataset versus the simulated random dataset. Negative numbers indicate integrations located 5' (upstream) of all known transcription start sites, positive numbers, integrations 3' (downstream) of all known transcription start sites. See Fig. 2C legend for additional information. (B) Distances for integration sites near transcription start sites. The percentages of integration sites within the indicated discrete windows around all known transcription start sites are shown for the combined experimental dataset versus the simulated random dataset. KS, result of Kolmogorov–Smirnov test29. *P < 0.05 for experimental versus simulated datasets within discrete bins determined using one-tailed Z statistic for two proportions. (C) Effect of selection on distance of integration sites from nearby transcription start sites. The percentages of integration sites within the indicated discrete windows around all known transcription start sites are shown for the unselected versus selected experimental datasets. KS, result of Kolmogorov–Smirnov test29.
Although there were no significant differences between the unselected and the selected experimental samples considering the whole dataset (KS = 0.11, P = 0.45), as seen in Fig. 3C we did observe small but significant differences between these samples when we truncated the datasets to include only integration events located within 20 kb of promoters (KS = 0.22, P = 0.03). Further comparisons of these truncated datasets by the one-tailed Wilcoxon rank-sum test indicated that this difference was due to a shift of integration sites to the 3' (downstream) side of the promoter (Z = -2.05, P = 0.02) relative to the unselected dataset.
Transcriptional state of genes surrounding integration sites
We next sought to determine whether there was a bias for integration events within or near transcriptionally active genes. For this purpose, we used expression array data for mouse bone marrow from the publicly available Genomics Institute of the Novartis Research Foundation SymAtlas panels. As diagrammed in Fig. 4, we ordered all of the genes present in the arrays based on their level of expression and then segregated them to establish five equal bins. We then identified all of the genes located within 20 kb of the mapped integration sites (93 evaluable sites for the unselected dataset and 74 evaluable sites for the selected dataset) and assigned them to individual bins based on their expression level. Comparisons between the combined experimental dataset and the total array dataset indicated a significant overall bias in integration sites (KS = 0.22, P = 0.01). As indicated in Fig. 4, this difference was due in large part to a bias for integration events within or near the 20% of genes expressed at the highest level in mouse bone marrow cells, including a 2.3-fold overrepresentation for the unselected dataset and 1.7-fold overrepresentation for the selected dataset. Although there also appeared to be a bias toward integration events within or near the 20% of genes expressed at the lowest level in mouse bone marrow for the unselected versus selected datasets, there were no overall differences observed between these two datasets (KS = 0.19, P = 0.09).
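The binning procedure described above can be sketched as follows. This is an illustration under simplifying assumptions: expression is represented by one number per gene, the array is assumed to contain at least five genes, and the function names and toy values are hypothetical.

```python
from bisect import bisect_left

def quintile_counts(all_expression, flanking_expression):
    """Assign flanking genes to five equal-sized expression bins.

    all_expression: expression values for every gene on the array.
    flanking_expression: values for genes near integration sites.
    Returns counts per bin, ordered from lowest to highest expression.
    """
    ranked = sorted(all_expression)
    n = len(ranked)
    # Upper boundary (inclusive) of each of the four lower bins.
    cuts = [ranked[(n * k) // 5 - 1] for k in range(1, 5)]
    counts = [0] * 5
    for value in flanking_expression:
        counts[bisect_left(cuts, value)] += 1
    return counts

def fold_representation(counts):
    """Observed share of each bin divided by the 20% expected by chance."""
    total = sum(counts)
    return [c * 5 / total for c in counts]

# Toy values for illustration only.
array_values = list(range(1, 11))
flanking = [9, 10, 10, 1, 5]
counts = quintile_counts(array_values, flanking)
folds = fold_representation(counts)
```

A fold value above 1 in the last bin corresponds to overrepresentation of the most highly expressed genes among those flanking integration sites.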
Figure 4.
Expression levels of genes flanking integration sites. The expression levels for all evaluable genes within a 20-kb window around integration sites were determined using publicly available Genomics Institute of the Novartis Research Foundation (GNF) SymAtlas panels (Mouse GNF1M MAS5 and gcRMA arrays) for mouse bone marrow. The bone marrow expression levels for all genes on the arrays were ordered from lowest to highest and used to establish five equal bins representing the 20% intervals indicated on the x axis of the graph. The identified genes flanking the unselected and selected experimental integration site databases were then assigned to these bins based on their relative expression levels. The percentages of genes within each bin are presented as histograms. KS, result of Kolmogorov–Smirnov test29. *P < 0.05 versus the whole array distribution (indicated by the line labeled "neutral") using the one-tailed Z statistic for two proportions for individual bins.
Given the apparent influence of transcriptional activity on integration site bias, we further compared the orientation of individual proviruses integrated within or near known genes relative to the direction of transcription of those genes. In the dataset from colonies grown in the absence of selection, we found that 49 of 91 evaluable proviruses were integrated in the same sense orientation as flanking genes, demonstrating no significant preference for a particular orientation. In contrast, we found only 31 of 82 evaluable proviruses from the dataset of selected colonies to be integrated in the sense orientation relative to flanking genes, representing a modest but statistically significant increase in the frequency of proviruses arranged in the antisense orientation (P = 0.025, Z test). Taken together, these two results suggest that both the level and the orientation of target site transcription can influence the likelihood of integration and the likelihood of expression for individual proviruses.
Associations with specific locations and genes
Although the integration sites were distributed relatively randomly throughout the mouse genome, there were a total of 25 instances in which cloned integration site junctions mapped to similar or possibly even identical locations with other sites. As summarized in Table 1 of the supplementary materials, most of these (4 for the unselected set and 10 for the selected set) appeared to have arisen from the repeated isolation of single integration events, although the possibility of independent integration events in very close proximity cannot be ruled out. There was also a total of 11 examples in which verifiably independent integration events occurred at least twice, and in some cases three times, within a 100-kb window of each other. In particular, we observed three of these sites in both the unselected and the selected datasets. Such clustering has been proposed as a means of defining common retrovirus vector insertion sites30. Analysis of the simulated random dataset revealed only four instances of such clustering, indicating that the rate of verifiably nonclonal common integration sites was small but significantly higher for the selected dataset (P = 0.03, one-tailed Z statistic for two proportions), while we observed no significant clustering for the unselected dataset. However, it is clear from this analysis that there are no individual integration sites that constitute highly significant hot spots for oncoretrovirus vector integration.
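A simple way to screen a set of mapped sites for this kind of clustering is to list every pair of sites on the same chromosome that fall within 100 kb of each other, as sketched below. The function name `close_pairs` and the toy coordinates are assumptions for illustration; deciding whether a pair reflects independent integration events or repeated recovery of one event requires additional information that is not modeled here.

```python
from collections import defaultdict

def close_pairs(sites, window=100_000):
    """List pairs of integration sites separated by at most window bp.

    sites: iterable of (chromosome, position) tuples.
    Returns a list of (chromosome, left_position, right_position).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    pairs = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        for i, left in enumerate(positions):
            for right in positions[i + 1:]:
                if right - left > window:
                    break  # positions are sorted, so later ones are farther
                pairs.append((chrom, left, right))
    return pairs

# Toy coordinates for illustration only.
toy_sites = [("chr1", 100), ("chr1", 50_100), ("chr1", 500_000),
             ("chr2", 10), ("chr2", 200_000)]
pairs = close_pairs(toy_sites)
```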
As another means of identifying potential integration hot spots, we compared our dataset to that reported by Wu et al. for gammaretrovirus vector transduction of human HeLa cells9. As summarized in Table 2 of the supplementary materials, we found 11 examples of common gene targets in these two datasets, often located in the same relative location (intron or 5'). Comparison with our human simulated random dataset (a 1.8-fold larger size analysis) revealed only seven integration sites in common with the mouse experimental datasets, of which only three encode nuclear products. This indicates that the similarity of target genes in the mouse and human experimental datasets is statistically significant (P = 0.007, one-tailed Z statistic for two proportions), suggesting that there may indeed be hot spots of integration that are conserved between species.
Given the potential for oncogenic transformation, we also assessed our experimental datasets for integration sites located within or near genes involved in malignant transformation. For this purpose we included genes identified as common insertion sites in the retroviral tagged cancer gene database31, as well as other genes with a documented association with tumor formation. As summarized in Table 3 of the supplementary materials, we found 7 integration events for the unselected dataset and 11 integration events for the selected dataset (including two pairs), representing an overall rate of 6.9% for such associations. However, we also found a total of 14 such associations for the simulated random dataset (5.4%, data not shown), indicating that there was no statistically significant bias for integration near genes associated with malignant transformation. It is worth noting that we did not detect integrations within or near the MDS1/Evi1 genes reported by others in mouse and nonhuman primate hematopoietic stem cell transduction and transplantation models14,19. This disparity presumably reflects the fact that we assessed transduction events in a setting in which the target cells were expanded for only a short period of time so that small improvements in cell division or survival were not amplified to the degree that can occur in vivo.
Finally, during our analysis of integration sites we noticed an apparent preference for integration within or near (20 kb) genes that encode nuclear proteins (based on Gene Ontology (GO) terms embedded in the Ensembl database). In particular, there were a total of 77 of 200 evaluable genes fitting this category for the combined experimental dataset, versus 43 of 157 evaluable genes for the simulated random dataset (P = 0.02, one-tailed Z statistic for two proportions). Although this correlation appears significant, the availability of a large number of alternative GO categories requires an adjustment for multiple testing, which in turn reduces the significance of this particular correlation considerably. However, reanalysis of the integration site data of Wu et al.9 for gammaretrovirus vector transduction of human HeLa cells also revealed a similar bias, with 179 of 438 evaluable genes fitting this category for the experimental dataset, versus only 134 of 402 evaluable genes for the simulated random dataset (P = 0.01). This bias was also present for genes encoding proteins which contain the bipartite nuclear localization signal (InterPro domain classification IPR001472), since such genes were present at or near 45 of 200 evaluable integration events for the experimental dataset, compared to only 20 out of 157 evaluable integration events for the simulated random dataset (P = 0.01). Taken together, these results indicate a clear bias for integrations within or near genes that encode nuclear proteins.
General conclusions and implications
The analysis reported here points to an underlying mechanism of integration site selection for gammaretrovirus-based vectors that is highly conserved across species, cell lineages, and stages of differentiation. The preference for integration into gene-rich regions, and especially near transcriptionally active promoters, may simply reflect the generally open chromatin structure and chromosome accessibility correlated with such regions32,33. As recently proposed, it is also possible that this preference may reflect a directed targeting of the viral preintegration complex to elements of the transcription complex or other regulatory elements bound to promoters7. This latter hypothesis may be of particular relevance in explaining the apparent bias for integration sites within or near genes encoding nuclear proteins, which may employ a unique constellation of transcription factors that serve as a good target for the preintegration complex.
The observed differences in the patterns of integration sites recovered in the absence and presence of selection suggest that the pattern and orientation of genomic gene expression in the immediate vicinity of integrated provirus can have a direct impact on the likelihood and/or level of vector expression. The shift toward integrations immediately downstream of endogenous promoters may reflect a harnessing of endogenous promoters to augment vector transcription. Indeed, such a phenomenon may play a role in the generation of leukemic versus carrier states for cells infected with the deltaretrovirus human T cell leukemia virus type 1 (HTLV-1)20. In contrast, the bias for integration events in less gene-rich regions, integrations within or near the most transcriptionally silent genes in the target cell population, and integrations oriented in the antisense direction relative to flanking transcription units may reflect an incompatibility between vector and flanking gene expression. As recently proposed, this may involve the mechanism of transcriptional interference between integrated provirus and the genes flanking particular sites of integration21,34. Indeed, the role of transcriptional interference in vector expression could even be a driving force for the bias of one particular orientation in the selected dataset34. Moreover, our data indicate that selection can have a modest but statistically significant impact on the repertoire of integration sites.
The apparent impact of selection on the integration site repertoire has several possible implications for the use of selection in the setting of clinical gene transfer. For example, the shift toward integrations within the start of genes presumably portends an increased risk of gene disruption or dysregulation. Likewise, the increased preference for integrations within or near the most transcriptionally silent class of genes raises the specter of inappropriate gene activation, either through trans-activation mediated by vector-derived enhancers or through transcription readthrough. Several approaches have been proposed for improving the safety of recombinant retrovirus vectors, including the use of self-inactivating vectors and more tissue-specific enhancers to reduce the potential for trans-activation, the use of stronger transcription terminators to reduce transcription run-through, and the use of chromatin insulators to reduce the potential of trans-activation and to reduce the influence of chromosomal position effects on vector expression22,35. Future studies will be needed to determine whether the integration site biases observed here in primitive primary hematopoietic progenitor cells have any appreciable effect on vector genotoxicity, and whether these biases can be reduced through the use of one or more of the approaches for improving vector safety.
Materials and methods
Recombinant vector
The gammaretrovirus vector MGPN2 was described previously25. This vector contains the LTR and extended packaging signal from MSCV and expresses the enhanced GFP gene from the viral 5' LTR promoter and the Neo gene from a Pgk promoter. A 572-bp fragment containing a zeocin resistance gene with a synthetic EM7 promoter was inserted into the NheI site of the 3' LTR, from which it is copied into the 5' LTR during generation of provirus. Vector producer lines were generated in the GP+E86 ecotropic packaging line36. All cultures were maintained at 37°C and 7.5% CO2 in Dulbecco's modified Eagle's medium supplemented with 10% heat-inactivated characterized fetal bovine serum, 2 mM L-glutamine, 1 mM sodium pyruvate, 0.1 mM nonessential amino acids, and antibiotics.
Bone marrow transduction
Mouse bone marrow cells were transduced as previously described37. Marrow was harvested from the femora of 6- to 12-week-old B6 × D2 F1 female donors treated 2 days previously with 5-fluorouracil (150 mg/kg ip). Cells were preinduced at 1 × 10⁶ cells/ml in Iscove's modified Dulbecco's medium containing 10% defined FBS, L-glutamine, sodium pyruvate, nonessential amino acids, antibiotics, 5% interleukin-3 culture supplement (IL-3; Collaborative Biomedical Products, Bedford, MA, USA), 100 ng/ml recombinant human IL-6 (PeproTech, Inc., Rocky Hill, NJ, USA), and 50 units/ml recombinant mouse stem cell factor (PeproTech, Inc.). After 48 h culture at 37°C in 5% CO2, the marrow cells were overlaid on irradiated (15 Gy), subconfluent GP+E86 producer cells at a density of 5–10 × 10⁶ cells per 10-cm plate and an estimated multiplicity of infection of 0.3 virus per cell in 10 ml of the above medium further supplemented with 8 μg/ml Polybrene. After an additional 48 h culture, the nonadherent bone marrow cells were carefully collected on ice and washed in cold Hanks' buffered saline solution (HBSS).
Progenitor cultures
Marrow cells were suspended at 1.7 × 10⁵ cells/ml in methylcellulose culture medium designed to support the differentiation of a wide spectrum of myelo/monocytic, erythroid, and megakaryocytic progenitors (complete methylcellulose with erythropoietin; StemGenix, Amherst, NY, USA) either in the absence or in the presence of 0.7 mg/ml active G418. After 7 to 10 days of culture at 37°C and 5% CO2, colonies were enumerated, collected, and pooled in HBSS for further analysis.
Flow cytometry analysis
Pools of progenitor cells were washed in HBSS supplemented with 2% FBS to remove undissolved methylcellulose and were analyzed by flow cytometry on a FACScan flow cytometer (Becton–Dickinson, San Jose, CA, USA) using CellQuest software.
Cloning of integration sites
Genomic DNA was prepared from progenitor colony pools by standard methods38 and digested with SpeI and AvrII, which cut inside (SpeI) and outside (both) of the integrated vector but not within the Zeo expression cassette. These fragments were ligated to cloning plasmids (devoid of SpeI and AvrII restriction sites) previously digested with XbaI (which generates the same complementary overhangs as SpeI and AvrII) and treated with phosphatase. The ligated plasmids were again digested with SpeI and AvrII to remove plasmids with multiple inserts. XL-1 Blue (Stratagene, La Jolla, CA, USA) bacterial cells were electroporated, and transformed colonies containing Zeo–LTR–genomic junction fragments were selected with 25 μg/ml zeocin.
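The overhang compatibility that this cloning scheme relies on can be checked with a small sketch. The recognition sites are the standard ones for these enzymes (SpeI A^CTAGT, AvrII C^CTAGG, XbaI T^CTAGA); the helper names are illustrative only. The sketch reports which ligated junctions rebuild an intact SpeI or AvrII site and would therefore be cut again in the redigestion step. Treating this as the basis for removing multiple inserts is one reading of the protocol, not something the text states in detail.

```python
# Recognition sites written 5'->3'; '^' marks the cut on the top strand.
SITES = {"SpeI": "A^CTAGT", "AvrII": "C^CTAGG", "XbaI": "T^CTAGA"}


def overhang(site):
    """Return the 5' overhang left by a palindromic site cut at '^'."""
    seq = site.replace("^", "")
    cut = site.index("^")
    return seq[cut:len(seq) - cut]


def junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed by ligating a left end to a right end."""
    left = SITES[left_enzyme]
    right = SITES[right_enzyme]
    upstream = left[:left.index("^")]          # bases kept on the left end
    downstream = right[right.index("^") + 1:]  # overhang plus right-end bases
    return upstream + downstream


def recut_by(seq):
    """List the enzymes whose full recognition site is present in seq."""
    return [name for name, site in SITES.items()
            if site.replace("^", "") in seq]


for left in ("XbaI", "SpeI", "AvrII"):
    for right in ("XbaI", "SpeI", "AvrII"):
        seq = junction(left, right)
        print(f"{left}/{right}: {seq} recut by {recut_by(seq) or 'none'}")
```

All three enzymes leave the same CTAG overhang, so any pair of ends can ligate, but a vector/insert junction (XbaI joined to SpeI or AvrII) matches none of the three sites and survives redigestion. Under this sketch, hybrid SpeI/AvrII junctions between two inserts would also escape recutting.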
Integration site analysis
Plasmids containing retroviral/genomic junctions were sequenced with four primers: M13R; puc19MCS, 5'-AGTGAATTCGAGCTCGGTA-3'; Zeo orf, 5'-GCCGAGGAGCAGGACTGA-3'; and EM7 promoter, 5'-TATGCCGATATACTATGC-3'. Sequences were BLAST searched against the Ensembl mouse genome database (build 33) or against the NCBI nonredundant database (build 35 for human). Insertion sites were considered authentic if they contained adjoining retroviral sequences and matched genomic sequence with better than 90% identity and score of e-16 or higher.
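The acceptance criteria above can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the record fields are hypothetical names rather than the output format of any particular BLAST tool, and the phrase "score of e-16 or higher" is read here as an E value of 1e-16 or smaller, which is an interpretation rather than something the text states explicitly.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical summary of the best genomic hit for one junction clone."""
    clone_id: str
    percent_identity: float
    e_value: float
    has_ltr_junction: bool  # adjoining retroviral sequence was found


def is_authentic(hit, min_identity=90.0, max_e_value=1e-16):
    """Apply the three criteria: LTR junction, identity, and E value."""
    return (hit.has_ltr_junction
            and hit.percent_identity > min_identity
            and hit.e_value <= max_e_value)


# Illustrative records only; these are not data from the study.
hits = [
    BlastHit("clone_01", 99.2, 1e-80, True),
    BlastHit("clone_02", 88.5, 1e-40, True),   # fails identity
    BlastHit("clone_03", 97.0, 1e-10, True),   # fails E value
    BlastHit("clone_04", 98.1, 1e-60, False),  # no retroviral junction
]

authentic = [hit.clone_id for hit in hits if is_authentic(hit)]
print(authentic)
```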
Simulated random dataset
Random sites in the mouse genome were chosen using a random number generator. Sequences of lengths about the same size as the experimental data (400 bp for mouse, 106 bp for human) were then identified adjacent to these sites and BLAST searched using the criteria used for the experimental datasets described above.
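The sampling step can be sketched as follows. This is an illustration, not the authors' procedure: it assumes sites are drawn uniformly across the genome by weighting chromosomes by length, it assigns a random strand so that orientation can be compared, and the chromosome lengths shown are toy placeholders rather than values from any genome build.

```python
import random


def pick_random_sites(chrom_lengths, n_sites, fragment_len, seed=None):
    """Draw random genomic fragments, weighting chromosomes by length.

    Returns (chromosome, start, end, strand) tuples with 1-based,
    inclusive coordinates.
    """
    rng = random.Random(seed)
    names = list(chrom_lengths)
    weights = [chrom_lengths[name] for name in names]
    sites = []
    for _ in range(n_sites):
        chrom = rng.choices(names, weights=weights, k=1)[0]
        # Choose a start so that the whole fragment fits on the chromosome.
        start = rng.randint(1, chrom_lengths[chrom] - fragment_len + 1)
        strand = rng.choice("+-")
        sites.append((chrom, start, start + fragment_len - 1, strand))
    return sites


# Toy chromosome lengths in bp (hypothetical placeholders).
toy_genome = {"chr1": 2_000_000, "chr2": 1_500_000, "chrX": 1_000_000}

# 400-bp fragments, matching the mouse read length given in the text.
for site in pick_random_sites(toy_genome, n_sites=5, fragment_len=400, seed=1):
    print(site)
```

Each sampled fragment would then be extracted from the reference sequence and passed through the same BLAST criteria as the experimental junctions, so that both datasets share the same mapping biases.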
Website URLs
The Ensembl Web site is at http://www.ensembl.org, the Genomics Institute of the Novartis Research Foundation SymAtlas expression array panels are at http://symatlas.gnf.org, BLAST searches were done at http://www.ncbi.nlm.nih.gov/BLAST, and the mouse retrovirus tagged cancer gene database is at http://RTCGD.ncifcrf.gov.
References
1. Kohn, D. B., et al. (2003). American Society of Gene Therapy (ASGT) ad hoc subcommittee on retroviral-mediated gene transfer to hematopoietic stem cells. Mol. Ther. 8: 180–187.
2. Baum, C., et al. (2004). Chance or necessity? Insertional mutagenesis in gene therapy and its consequences. Mol. Ther. 9: 5–13.
3. Hacein-Bey-Abina, S., et al. (2003). LMO2-associated clonal T cell proliferation in two patients after gene therapy for SCID-X1. Science 302: 415–419.
4. Li, Z., et al. (2002). Murine leukemia induced by retroviral gene marking. Science 296: 497.
5. Bushman, F. (2003). Targeting survival: integration site selection by retroviruses and LTR–retrotransposons. Cell 115: 135–138.
6. Narezkina, A., et al. (2004). Genome-wide analyses of avian sarcoma virus integration sites. J. Virol. 78: 11656–11663.
7. Engelman, A. (2005). The ups and downs of gene expression and retroviral DNA integration. Proc. Natl. Acad. Sci. USA 102: 1275–1276.
8. Holman, A. G. and Coffin, J. M. (2005). Symmetrical base preferences surrounding HIV-1, avian sarcoma/leukosis virus, and murine leukemia virus integration sites. Proc. Natl. Acad. Sci. USA 102: 6103–6107.
9. Wu, X., Li, Y., Crise, B. and Burgess, S. M. (2003). Transcription start sites in the human genome are favored sites for MLV integration. Science 300: 1749–1751.
10. Wu, X., Li, Y., Crise, B., Burgess, S. M. and Munroe, D. J. (2005). Weak palindromic consensus sequences are a common feature found at the integration target sites of many retroviruses. J. Virol. 79: 5211–5214.
11. Johnson, C. N. and Levy, L. S. (2005). Matrix attachment regions as targets for retroviral integration. Virol. J. 2: 68.
12. Schroder, A. R., Shinn, P., Chen, H., Berry, C., Ecker, J. R. and Bushman, F. (2002). HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110: 521–529.
13. Mitchell, R. S., et al. (2004). Retroviral DNA integration: ASLV, HIV and MLV show distinct target site preferences. PLoS Biol. 2: 1127–1137.
14. Hematti, P., et al. (2004). Distinct genomic integration of MLV and SIV vectors in primate hematopoietic stem and progenitor cells. PLoS Biol. 2: 2183–2190.
15. Calmels, B., et al. (2005). Recurrent retroviral vector integration at the MDS1–EVI1 locus in non-human primate hematopoietic cells. Blood 106: 2530–2533.
16. Laufs, S., et al. (2003). Retroviral vector integration occurs in preferred genomic targets of human bone marrow-repopulating cells. Blood 101: 2191–2198.
17. Laufs, S., Nagy, K. Z., Giordano, F. A., Hortz-Wagenblatt, A., Zeller, W. J. and Fruehauf, S. (2004). Insertion of retroviral vectors in NOD/SCID repopulating human peripheral blood progenitor cells occurs preferentially in the vicinity of transcription start regions and in introns. Mol. Ther. 10: 874–881.
18. Trobridge, G., et al. (2006). Foamy virus vector integration sites in normal human cells. Proc. Natl. Acad. Sci. USA 103: 1498–1503.
19. Kustikova, O., et al. (2005). Clonal dominance of hematopoietic stem cells triggered by retroviral gene marking. Science 308: 1171–1174.
20. Doi, K., et al. (2005). Preferential selection of human T-cell leukemia virus type I (HTLV-I) provirus integration sites in leukemic versus carrier states. Blood 106: 1048–1053.
21. Lewinski, M. K., et al. (2005). Genome-wide analysis of chromosomal features repressing human immunodeficiency virus transcription. J. Virol. 79: 6610–6619.
22. Emery, D. W., Yannaki, E., Tubb, J. and Stamatoyannopoulos, G. (2000). A chromatin insulator protects retrovirus vectors from chromosomal position effects. Proc. Natl. Acad. Sci. USA 97: 9150–9155.
23. Ramezani, A., Hawley, T. S. and Hawley, R. G. (2003). Performance- and safety-enhanced lentiviral vectors containing the human interferon-beta scaffold attachment region and the chicken beta-globin insulator. Blood 101: 4717–4724.
24. Milsom, M. D. and Fairbairn, L. J. (2004). Protection and selection for gene therapy in the hematopoietic system. J. Gene Med. 6: 133–146.
25. Cheng, L., et al. (1997). A GFP reporter system to assess gene transfer and expression in human hematopoietic progenitor cells. Gene Ther. 4: 1013–1022.
26. Yannaki, E., Tubb, J., Aker, M., Stamatoyannopoulos, G. and Emery, D. W. (2002). Topological constraints governing the use of the chicken HS4 chromatin insulator in oncoretrovirus vectors. Mol. Ther. 5: 589–598.
27. Nakai, H., et al. (2005). Large-scale molecular characterization of adeno-associated virus vector integration in mouse liver. J. Virol. 79: 3606–3614.
28. Yant, S. R., Wu, X., Huang, Y., Garrison, B., Burgess, S. M. and Kay, M. A. (2005). High-resolution genome-wide mapping of transposon integration in mammals. Mol. Cell. Biol. 25: 2085–2094.
29. Horn, S. D. (1977). Goodness-of-fit tests for discrete data: a review and an application to a health impairment scale. Biometrics 33: 237–247.
30. Suzuki, T., et al. (2002). New genes involved in cancer identified by retroviral tagging. Nat. Genet. 32: 166–174.
31. Akagi, K., Suzuki, T., Stephens, R. M., Jenkins, N. A. and Copeland, N. G. (2004). RTCGD: retroviral tagged cancer gene database. Nucleic Acids Res. 32: D523–D527.
32. Gilbert, N., Boyle, S., Fiegler, H., Woodfine, K., Carter, N. P. and Bickmore, W. P. (2004). Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118: 555–566.
33. Lee, C.-K., Shibata, Y., Rao, B., Strahl, B. D. and Lieb, J. D. (2004). Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat. Genet. 36: 900–905.
34. Eszterhas, S. K., Bouhassira, E. E., Martin, D. I. and Fiering, S. (2002). Transcriptional interference by independently regulated genes occurs in any relative arrangement of the genes and is influenced by chromosomal integration position. Mol. Cell. Biol. 22: 469–479.
35. von Kalle, C., Fehse, B., Layh-Schmitt, G., Schmidt, M., Kelly, P. and Baum, C. (2004). Stem cell clonality and genotoxicity in hematopoietic cells: gene activation side effects should be avoidable. Semin. Hematol. 41: 303–318.
36. Markowitz, D., Goff, S. and Bank, A. (1988). A safe packaging line for gene transfer: separating viral genes on two different plasmids. J. Virol. 62: 1120–1124.
37. Bodine, D. M., Karlsson, S. and Nienhuis, A. W. (1989). Combination of interleukin 3 and 6 preserves stem cell function in culture and enhances retrovirus-mediated gene transfer into hematopoietic stem cells. Proc. Natl. Acad. Sci. USA 86: 8897–8901.
38. Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY.
Appendices
Appendix A
Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ymthe.2006.02.016.
Acknowledgements
This work was supported by grants from the National Heart, Lung, and Blood Institute.