Introduction

The human T-cell leukemia virus type-1 (HTLV-1) is critical in the development and progression of endemically clustered adult T-cell leukemia/lymphoma (ATLL) (Hinuma et al. 1981; Poiesz et al. 1980; Watanabe et al. 1984; Yoshida 1983). In ATLL, the tumor cells harbor one or more copies of HTLV-1, although the provirus often shows extensive deletion in the gag and/or pol genes (Korber et al. 1991; Ohshima et al. 1991; Seiki et al. 1982). Leukemic cells of ATLL, but not of other types of lymphoma or leukemia tested so far, were found to contain the provirus genome and to be monoclonal with respect to the integration site of the provirus genome (Yoshida et al. 1984). Thus, integration of the provirus genome is a prerequisite for the development of ATLL (Yoshida et al. 1984). Nevertheless, development of ATLL requires a long latent period of about 40–60 years, and only about 1 in 1,000 HTLV-1 carriers develops ATLL per year; it is therefore predicted that, in addition to the direct influence of HTLV-1, alterations in intracellular genes are essential for the development of ATLL. The etiology of ATLL and the mechanisms of HTLV-1-mediated pathogenesis have been the subject of intensive investigation since the discovery of HTLV-1. However, the role of the provirus has remained unclear ever since its association with ATLL was found. In contrast, retroviral integration into murine hematopoietic cells can lead to the generation of leukemia by enhancing expression of cellular proto-oncogenes or by disrupting expression of tumor suppressor genes. Retroviral proviruses in murine leukemias thus provide powerful tags for identifying leukemia genes (Jonkers and Berns 1996). In the case of HTLV-1, there are only a few reports showing an alteration in gene expression caused by HTLV-1 proviral integration into the corresponding locus. In the HTLV-1-transformed T-cell lines MT-2 and C10/MJ, the expression of the interleukin-9 receptor (Kubota et al. 1996) and the PDGF beta-receptor (PDGFβR) (Chi et al. 1997), respectively, has been shown to be influenced by HTLV-1 integration. In C10/MJ cells, integration of the HTLV-1 provirus into the PDGFβR locus caused the expression of a truncated PDGFβR protein that acquired transformation capability (Chi et al. 1997). In an ATLL-derived cell line, HuT-102, elevated expression of a chimeric mRNA joining a segment of the R region of the long terminal repeat (LTR) of HTLV-1 and the 5′-untranslated region (UTR) of IL-15 was observed (Bamford et al. 1996). However, it is difficult to correlate these results in cultured cell lines with the etiology of ATLL, since most of the cell lines carry multiple copies of the HTLV-1 provirus, which are likely to be due to secondary infection with HTLV-1 during establishment and maintenance of the cell lines. Moreover, in vitro infection with HTLV-1 causes immortalization of T-lymphocytes, often leading to the growth of non-tumorigenic cells, even in cell lines derived from ATLL patients. Thus, it is essential to analyze samples from ATLL patients, not from cell lines, to assess the cis-effect of HTLV-1 proviral integration on leukemogenesis.

Previous analyses of HTLV-1 integration sites revealed randomness of the chromosomal integration site (Leclercq et al. 2000; Macera et al. 1992; Ohshima et al. 1998; Richardson et al. 2001; Seiki et al. 1984). However, the relationship between the proviral integration site and the structure of the gene near it, and the influence of integration on gene expression, have never been described in patients with ATLL, mainly because of difficulties in isolating the viral integration site from the limited amount of patient sample and the lack of information on the human genome.

To overcome these problems, we have developed a high-performance method for characterization of the viral integration site based on adaptor-ligated PCR (AL-PCR). This new method allowed us to obtain the sequence flanking the proviral integration site within a couple of days using less than 500 ng of DNA and without requiring cloning in bacterial hosts. In addition, by taking advantage of rapid progress in the human genome project, we were able to perform the first extensive analysis of the cis-effect of the HTLV-1 provirus in a total of 106 integration sites, including 58 new sites from 33 ATLL patients and five ATLL-derived cell lines, together with the previously reported ones (accession numbers AF228926-AF228988, AY003886-AY003899, and S80210-S80215).

Materials and methods

Patients

A total of 33 newly diagnosed ATLL patients between 1989 and 1996 were studied. These patients were natives of Nagasaki Prefecture, one of the clustering areas of ATLL incidence in Japan. Diagnosis and subclassification of ATLL were done according to previously described criteria (Shimoyama 1991). Mononuclear cells from ATLL patients were isolated by Ficoll-Hypaque gradients. All samples contained 65–80% of leukemic cells and were positive for CD4 and negative for CD8 surface markers (Shimoyama 1991). Heparinized peripheral blood was obtained after informed consent and prior to any medication from ATLL patients.

Cell lines

The ATN-1 cell line used in this study has been described (Naoe et al. 1988). The IL-2-dependent, ATLL cell lines OMT, KOB, KK-1, and ST-1 were obtained from Dr. Yamada of Medical School of Nagasaki University (Hata et al. 1999; Maeda et al. 1999; Yamada et al. 1991, 1999).

Oligonucleotide primer sequence

Oligonucleotide primer sequences used are shown in Table 1.

Table 1 Oligonucleotide primers used for adaptor-ligated PCR (AL-PCR) and quantitative RT-PCR analysis. Underline denotes the GC-rich region

Adaptor-ligated PCR

As shown in Fig. 1a, b, 500 ng of genomic DNA was partially digested with 0.1 U of restriction endonuclease Sau3AI in 75 μl of reaction mixture containing 50 mM KCl, 5 mM MgCl2 in 20 mM Tris-HCl (pH 7.4) for 1 h at 37°C, followed by a partial fill-in reaction using 33 μM each of dA, dG, and 0.5 U of Klenow enzyme in 20 μl of reaction mixture containing 10 mM MgCl2, 10 mM dithiothreitol in 50 mM Tris-HCl (pH 7.6) for 30 min at 37°C. A set of adaptors 1 and 2 was ligated to the end of the DNA fragment (100 ng) using 250 U of T4 DNA ligase (Nippon Gene, Toyama, Japan) in 5 μl of ligation mixture containing 9.4% polyethylene glycol, 310 mM NaCl, 15 mM MgCl2, 30 mM dithiothreitol, 1.6 mM ATP, 2 μM each adaptor in 80 mM Tris-HCl (pH 7.9) for 2 h at 37°C. One-fortieth of the DNA was then used as the first PCR template with adaptor primer 1 (AP1) and either HTLV-1_0920L for amplification from the 5′ LTR or HTLV-1_8242U for amplification from the 3′ LTR, followed by nested PCR using adaptor primer 2 (AP2) and either HTLV-1_0339L for amplification from the 5′ LTR or HTLV-1_8911U for amplification from the 3′ LTR to enhance the specificity. Amplification of each product was carried out using LA Taq (TaKaRa, Tokyo, Japan) according to the company-supplied protocol: one cycle of denaturing at 95°C for 5 min, followed by 30–35 cycles of denaturing for 1 min at 95°C, annealing for 1 min at 52°C, and elongation for 3 min at 72°C. The PCR product was then analyzed using the HTLV-1_0213L or HTLV-1_8986U primer, either by direct sequencing or, when needed, by sequencing after conventional subcloning into pBluescript (STRATAGENE, La Jolla, Calif.).

Fig. 1a–c

Isolation of the genomic DNA sequences flanking the HTLV-1 using an adaptor-ligated PCR. a Schematic presentation of the amplification procedure. White box HTLV-1, gray box human genome, diagonal line box adaptor 1, black diagonal line box GC-rich region in adaptor 1, spotted box adaptor 2. b Structure of adaptor and adaptor primer. Box GC-rich region. c Final PCR amplification of the provirus flanking sequences yields DNA fragments in eight ATLL patients. It could yield specific PCR products of the provirus flanking sequences

Comparison of integration sites with database sequences

The sequences flanking integration sites were analyzed by comparison with the NCBI database. An integration target sequence was scored as part of a transcription unit if it was (1) a member of the RefSeq set of well-studied genes (http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html) or (2) predicted to be a transcription unit by NCBI or Ensembl, with that assignment supported by mRNA or spliced EST sequence. Repeat sequences were identified using RepeatMasker analysis. Integration site sequences in the human genome have been deposited in GenBank (accession numbers AB114357-AB114413).
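The scoring rule above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `TranscriptionUnit` record, the function names, and all coordinates are hypothetical, and the annotation is assumed to be available as simple chromosome intervals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TranscriptionUnit:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    in_refseq: bool            # criterion (1): member of the RefSeq set
    predicted: bool            # criterion (2): predicted transcription unit
    mrna_or_est_support: bool  # criterion (2): supported by mRNA or spliced EST


def is_accepted(unit: TranscriptionUnit) -> bool:
    """Apply the two criteria: RefSeq member, or a supported prediction."""
    return unit.in_refseq or (unit.predicted and unit.mrna_or_est_support)


def units_hit(chrom: str, position: int,
              units: List[TranscriptionUnit]) -> List[str]:
    """Return names of accepted units whose interval contains the site."""
    return [u.name for u in units
            if u.chrom == chrom and u.start <= position <= u.end
            and is_accepted(u)]


# Made-up annotation, for illustration only.
ANNOTATION = [
    TranscriptionUnit("geneA", "chr8", 1_000, 9_000, True, False, False),
    TranscriptionUnit("predB", "chr8", 20_000, 30_000, False, True, False),
    TranscriptionUnit("predC", "chr14", 5_000, 50_000, False, True, True),
]
```

With this made-up annotation, a site at chr8:5,000 is scored as inside `geneA`, whereas a site at chr8:25,000 is not scored, because `predB` is a prediction without mRNA or spliced EST support.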

Quantitative RT-PCR (qPCR)

Normal CD3+ T cells were purified from a healthy donor with Human T-cell Enrichment Columns (R&D Systems, Minneapolis, Minn.). Poly(A)+ RNA was isolated with MicroPoly(A)Pure (Ambion, Austin, Tex.) and treated with DNA-free reagent (Ambion) to remove any residual contamination with genomic DNA. First-strand cDNA was synthesized from 100 ng of Poly(A)+ RNA with SuperScript III (Invitrogen, Carlsbad, Calif.) per the company-supplied protocol. Quantitative PCR (qPCR) was performed with an ABI Prism 7000 Sequence Detection System (Applied Biosystems, Foster City, Calif.) and the qPCR Core Kit for SYBR Green I (EUROGENTEC, Seraing, Belgium). We used the standard curve method to quantify and normalize amounts of each gene relative to specific and beta-actin mRNAs. The expression standard was the mean value of two different batches of cDNAs from resting CD4+ cells in the Human Blood Fractions MTC (Clontech, Palo Alto, Calif.), which were collected from a total of 6–20 persons. Each primer set was positioned downstream of the HTLV-1 integration site, and none of them contained a complete ORF.
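As an illustration of the standard curve method in general, the following sketch fits a line of threshold cycle (Ct) against log10 template quantity and converts sample Ct values into relative quantities normalized to actin. The function names, the dilution series, and every Ct value are hypothetical and are not data from this study; the exact normalization used here may differ in detail.

```python
from typing import Sequence, Tuple


def fit_standard_curve(log10_quantities: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get a quantity in standard-curve units."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_qty: float, actin_qty: float,
                        ref_target_qty: float, ref_actin_qty: float) -> float:
    """Actin-normalized target amount, relative to the expression standard."""
    return (target_qty / actin_qty) / (ref_target_qty / ref_actin_qty)


# Hypothetical tenfold dilution series: quantities 1, 10, 100, 1000.
slope, intercept = fit_standard_curve([0.0, 1.0, 2.0, 3.0],
                                      [30.0, 26.7, 23.4, 20.1])
sample_qty = quantity_from_ct(23.4, slope, intercept)
```

In this sketch the fold changes reported for a gene would correspond to `relative_expression` evaluated with the resting CD4+ cDNA as the reference pair; a separate standard curve is assumed for each primer set.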

Statistical analysis

Statistical data analysis was performed using the R language provided by The Comprehensive R Archive Network at http://www.r-project.org/. The two-sample test for equality of proportions was calculated by the following equation

$$Z_{0} = \frac{|q_{1} - q_{2}|}{\sqrt{q(1 - q)(1/n_{1} + 1/n_{2})}}$$

where $n_{1}$ or $n_{2}$ is the total number of cases ($n_{1}$ is experimental, $n_{2}$ is predicted from the human genome); $r_{1}$ or $r_{2}$ is the number of positive cases ($r_{1}$ is experimental, $r_{2}$ is predicted from the human genome); $q_{1}$ or $q_{2}$ equals $r_{1}/n_{1}$ or $r_{2}/n_{2}$, respectively, and $q$ equals $(r_{1}+r_{2})/(n_{1}+n_{2})$. $Z_{0}$ obeys the normal distribution, and the P-value is obtained from the two-sided probability of $Z_{0}$.
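The test statistic can be sketched in a few lines of Python; this is an illustration of the equation, not the R code used for the analysis. The function names are hypothetical. In the example, the experimental counts (22 of 37) are those reported for ATLL patients, but the comparison counts (330 of 1,000) are an assumed stand-in for the genome-derived expectation, so the resulting P-value is illustrative only.

```python
import math
from typing import Tuple


def two_proportion_z(r1: int, n1: int, r2: int, n2: int) -> float:
    """Z0 = |q1 - q2| / sqrt(q (1 - q) (1/n1 + 1/n2)) with pooled q."""
    q1 = r1 / n1
    q2 = r2 / n2
    q = (r1 + r2) / (n1 + n2)
    return abs(q1 - q2) / math.sqrt(q * (1.0 - q) * (1.0 / n1 + 1.0 / n2))


def two_sided_p(z0: float) -> float:
    """Two-sided tail probability of a standard normal variate."""
    return math.erfc(z0 / math.sqrt(2.0))


def proportion_test(r1: int, n1: int, r2: int, n2: int) -> Tuple[float, float]:
    """Return (Z0, P) for the two-sample test for equality of proportions."""
    z0 = two_proportion_z(r1, n1, r2, n2)
    return z0, two_sided_p(z0)


# 22/37 is the experimental count; 330/1000 is an ASSUMED comparison sample
# standing in for "about 33% of the genome lies in transcription units".
z0, p_value = proportion_test(22, 37, 330, 1000)
```

Because the pooled proportion $q$ depends on $n_{2}$, the P-value changes with the size assumed for the genome-derived sample; the value actually used for $n_{2}$ is not restated here.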

Results

HTLV-1 flanking sequence identification

To analyze the genomic sequences flanking proviruses, we have developed an efficient AL-PCR procedure, as illustrated in Fig. 1a, b. The presence of a GC-rich region next to the ligation site in adaptor 1 and a mismatch at the 3′ end of adaptor 2 markedly reduce non-specific PCR amplification. The partial fill-in reaction at the Sau3AI site prevents concatenation of unrelated inserts during adaptor ligation. The PCR products obtained from most of the ATLL patients were sequenced directly with a DNA sequencer without any subcloning step, which sped up the isolation of proviral-flanking sequences. As shown in Fig. 1c, we obtained specific PCR products of the provirus flanking sequences from eight ATLL patients. Ladder products were due to partial digestion with the restriction endonuclease. When there were two or more integration sites in the same sample, as in ATLL cell lines, cloning in bacterial hosts was required.

Using this AL-PCR procedure, 35 sequences from 33 ATLL patients and 23 sequences from five ATLL cell lines were obtained; two patients each carried two integration sites, and all ATLL cell lines had multiple (2–6) integration sites. Thirty-two out of 35 and 21 out of 23 sequences could be placed on the genome. The remaining 3 out of 35 and 2 out of 23 sequences did not yield a high-quality match to the genome because they were too short to determine a unique placement or contained only repetitive sequences.

Identification of the host sequences targeted for HTLV-1 integration in vivo

The distribution of integration sites is shown mapped on the human chromosomes in Fig. 2. The matches to known genetic elements are summarized in Table 2, while Table 3 shows the list of HTLV-1 integration sites derived from 33 ATLL patients and previously reported ones including seven ATLL patients (accession number AF228967-AF228988). Table 4 shows the list of HTLV-1 integration sites derived from five ATLL-derived cell lines and one HTLV-1 infected T-cell line (MT-4) (accession number S80210-S80215). Table 5 shows the list of HTLV-1 integration sites derived from 26 tropical spastic paraparesis (TSP)/HTLV-1-associated myelopathy (HAM) patients (accession number AF228926-S80215 and AY003886-AY003899) and eight asymptomatic carriers (accession number AF228954-AF228965).

Fig. 2

Sites of HTLV-1 integration in the human genome. Locations of chromosomal sequences matching HTLV-1 integration sites are shown with arrowheads. Black arrowheads ATLL patients, white arrowheads ATLL and HTLV-1-transformed T-cell lines, gray arrowheads TSP/HAM patients and carriers integration site, * integration site which is inserted into transcription units. Integration sites were dispersed throughout the entire chromosome

Table 2 Chromosomal features associated with human T-cell leukemia virus type-1 (HTLV-1) integration sites. The integration sites studied included those mapped to unique locations on the genome and those in identifiable repeats. UN unknown
Table 3 Summary of HTLV-1 integration sites in adult T-cell leukemia/lymphoma (ATLL) patients
Table 4 Summary of HTLV-1 integration sites in ATLL and HTLV-1-transformed T-cell lines
Table 5 Summary of HTLV-1 integration sites in tropical spastic paraparesis (TSP)/HTLV-1-associated myelopathy (HAM) patients and carriers

As reported before (Leclercq et al. 2000; Macera et al. 1992; Ohshima et al. 1998; Richardson et al. 2001; Seiki et al. 1984), integration sites were dispersed over all chromosomes, and there was neither similarity nor bias (P>0.05, comparison of the size of each chromosome and the number of integration sites). However, analysis of the integration sites in ATLL patients at the molecular level revealed a highly significant deviation from random placement (P<0.02) within the transcription units [22 out of 37 (59.5%)]. Around 33% of the human genome consists of transcription units, including introns (Lander et al. 2001; Venter et al. 2001), so the frequency of integration in transcription units is significantly biased. Although integration in transcription units was also observed at frequencies of about 47–60% in patients with TSP/HAM, asymptomatic carriers, and ATLL-derived cell lines [9 out of 19 (47.4%), 3 out of 5 (60%), and 11 out of 21 (52.4%), respectively], no significant deviation from random placement (P>0.05) was observed, as shown in Table 2, most likely because of the small sample sizes analyzed. In any case, HTLV-1 integration in ATLL patients significantly favored transcription units.

Deregulation of gene expression by HTLV-1 integration

Since HTLV-1 preferentially integrated into transcription units in ATLL patients, we next used RT-PCR to study whether the corresponding genes were expressed in normal T-lymphocytes. We found that all known genes associated with viral integration in ATLL patients and ATLL cell lines were expressed in normal T-lymphocytes, which suggested that integration site selection was influenced by transcriptional activity (data not shown).

To investigate whether integration of HTLV-1 caused deregulation of cellular genes, we measured by qPCR the expression of nine genes associated with HTLV-1 integration in ATLL patients. The samples for qPCR were obtained from lymphocytes of 20 ATLL patients (nine of whom carried HTLV-1 integration within known genes) as well as CD3+ T-lymphocytes and CD4+/CD8− T-lymphocytes from healthy donors with and without T-cell stimulation.

As shown in Table 3 and Fig. 3a, b, two of nine ATLL patients showed significant elevation of gene expression associated with HTLV-1 integration when compared with ATLL patients without HTLV-1 integration at the corresponding locus, normal T-lymphocytes, and CD4 single-positive T-cells. In ATLL cases 12 and 19, the ankyrin-1 (ANK-1) and gephyrin (GPHN) genes showed about 4.4- and 102-fold higher expression than in CD4 single-positive T-cells, respectively. In Fig. 3, the dotted line denotes the average expression level of ATLL patients without HTLV-1 integration at the corresponding loci. The other seven patients showed no significant difference in expression level when compared with ATLL patients with or without HTLV-1 integration at the corresponding locus, or with normal T-lymphocytes and CD4 single-positive T-cells with and without mitogenic stimulation.

Fig. 3a, b

Change in gene expression by HTLV-1 integration. The relative expression level of ankyrin-1 (ANK-1) (a), and gephyrin (GPHN) (b) was compared between patients with and without HTLV-1 integration in the corresponding gene together with normal controls. Symbols are shown with their meaning in the bottom box. Dotted line Average expression level of ATLL patients without HTLV-1 integration at the corresponding loci. Results from two to four PCR reactions were averaged in one plot on diagram. We used the standard curve method to quantify and normalize amounts of each gene relative to specific and beta-actin mRNAs. Expression standard was chosen from the mean value of two different batches of cDNAs from resting CD4+ cells in the Human Blood Fractions MTC (Clontech, Palo Alto, Calif.) collected from a total of 6–20 persons. Arrow integration event

Chimeric transcripts

The LTR of HTLV-1 contains a variety of functional elements, such as a promoter region, a polyadenylation signal, and a splicing donor site. Integration of HTLV-1 into a cellular transcriptional unit occasionally produces a chimeric transcript (Bamford et al. 1996; Chi et al. 1997) containing both viral and cellular sequences. Such chimeric transcripts have been reported only in one ATLL-derived cell line, HuT-102, and one HTLV-1-infected T-cell line, C10/MJ, and not in ATLL patients. We performed RT-PCR assays to see whether there were any chimeric transcripts in five ATLL patients whose proviral integrations occurred in the same direction as the transcriptional unit of the corresponding gene, as shown in Table 3. The results of RT-PCR revealed no evidence of a chimeric transcript between HTLV-1 and the endogenous gene.

Discussion

We have developed a rapid and convenient method for the analysis of the proviral integration site based on AL-PCR. To obtain the sequence flanking the proviral integration site, conventional methods such as screening of a genomic library, inverse PCR (Ochman et al. 1988), or ligation-mediated PCR (Smith 1992) always require a cloning step. Screening of a genomic library entails great cost and labor, takes longer, and uses much more DNA. Inverse PCR and ligation-mediated PCR have low reproducibility and low specificity. Formation of concatemers in conventional inverse PCR or ligation-mediated PCR during the circularization or adaptor-ligation step often causes amplification of chimeric DNA fragments, leading to misinterpretation of the location of the integration site. By including a partial fill-in reaction before adaptor ligation, we were able to suppress such concatemer formation effectively. Furthermore, the use of the restriction enzyme Sau3AI in combination with partial digestion made it easy to adjust the amplified fragment to a size suitable for PCR in more than 95% of cases. If a PCR product of suitable size is hard to obtain, restriction enzymes other than Sau3AI can easily be used. With our AL-PCR, most of the integration sites from ATLL patients were determined by direct sequencing if a major population was detected by the Southern method. A minor population that is barely detectable by the Southern method will be masked by the major population when amplified by PCR. However, we confirmed that minor populations could be detected by cloning (data not shown). Moreover, for less than 10% of our samples, the integration site could not be located in the genome because of the presence of large repeat regions.

By including suppression PCR technology (Park et al. 2003; Siebert et al. 1995), we are able to suppress amplification of non-specific product; the templates possessing adaptor 1 on both ends form an inverted repeat and a “Panhandle” through the GC-rich region in the adaptor molecule. As the melting temperature of this structure is more than 85°C, it is more stable than the primer-template hybrid and thereby suppresses exponential amplification.

Improvement of the adaptor sequence as well as the use of the partial fill-in reaction enabled us to obtain a very specific PCR product ready to use as the template for direct sequencing. This method is particularly useful for the analysis of HTLV-1 integration in ATLL patients, since most leukemic cells have monoclonal integration of HTLV-1. With our method, we could obtain the sequence of the viral flanking region from less than 500 ng of genomic DNA within a couple of days. Because of its high specificity, we have also found this AL-PCR method quite useful for the isolation of chromosomal breakpoints from several leukemic patients by amplifying the region containing the breakpoint, using nested primers derived from just one side of the breakpoint, with unknown sequence on the other side (manuscript in preparation).

It is still unknown whether HTLV-1 integration into the host genome plays any role in the development of ATLL. Previous reports suggested that HTLV-1 integration in the human genome was randomly distributed at the chromosomal level. The availability of the human genome sequence allowed us to do a much more straightforward and quantitative analysis of integration sites. We found that about 60% of HTLV-1 integrations occurred in expressed loci in patients with ATLL. We also observed a similar association with transcriptional units in ATLL-derived cell lines and in asymptomatic carriers, although it was not statistically significant, probably because of the small sample size. A similar phenomenon has been described for human immunodeficiency virus (HIV); 69% of HIV integrations occurred in actively transcribed loci when a human lymphoid cell line was infected with HIV or an HIV-based vector in vitro (Schroder et al. 2002). Thus, it might be possible that there is a common mechanism that biases integration toward active genes, such as increased chromatin accessibility in transcribed regions, thereby removing inhibitory effects of unfavorable chromatin environment, such as open chromatin regions or in naked DNA on nucleosomes (Bushman 2002; Holmes-Son et al. 2001). Alternatively, as has been suggested for integration targeting by yeast retrotransposons (Ji et al. 1993; Kirchner et al. 1995), favorable interactions between the preintegration complex and locally bound transcription factors may promote integration in active genes. In addition to these mechanisms, the integration of HTLV-1 in ATLL patients could also be affected by selection for growth advantage during leukemogenesis. If such an event occurred, the resulting integrations would be more likely to be found in cellular genes involved in tumorigenesis, as is the case in the mouse, where retroviral integration in hematopoietic cells can lead to the generation of leukemias by enhancing expression of cellular proto-oncogenes or by disrupting expression of tumor suppressor genes. Thus, it is important to clarify the function of each gene involved in HTLV-1 proviral integration. If HTLV-1 integration causes deregulation of cellular genes, we may find a change in their expression level or in the structure of their transcripts. Our results indicated that two out of nine genes analyzed were actually overexpressed in ATLL patients. One of these genes, GPHN, originally identified as a receptor-associated protein, links the β-subunit of glycine receptors to the subsynaptic cytoskeleton (Kirsch and Betz 1995; Kirsch et al. 1991; Prior et al. 1992). GPHN binds to phosphatidylinositol 3,4,5-trisphosphate binding proteins involved in actin dynamics and downstream signaling, and interacts with the ATM-related family member RAFT1 (Sabatini et al. 1999), an important regulator of mRNA translation initiation. RAFT1 mutants that could not associate with GPHN failed to signal to downstream molecules, including the p70 ribosomal S6 kinase and the eIF-4E binding protein, 4E-BP1 (Beretta et al. 1996; Brown et al. 1995; Brunn et al. 1997; Chung et al. 1992; Kuo et al. 1992; Sabatini et al. 1999; von Manteuffel et al. 1996). Since translation initiation is one of the key events regulated in response to mitogenic stimulation and nutrient availability, and is tightly coupled to mammalian cell-cycle progression and growth, overexpression of GPHN might stimulate these signaling pathways. Moreover, recent reports have also identified GPHN as a fusion partner of MLL in two patients with de novo acute monoblastic leukemia or acute undifferentiated leukemia, both carrying t(11;14)(q23;q24) (Eguchi et al. 2001; Kuwada et al. 2001). Although the biological significance of overexpression of GPHN in ATLL needs to be clarified, it might cause dysregulation of cellular partner protein(s) that may lead to malignant transformation in ATLL.

Because of the limited amount of patient samples, we were unable to determine the integrity of the transcripts corresponding to the ANK-1 and GPHN genes in the patients showing overexpression. However, the orientation of HTLV-1 integration was opposite to the direction of ANK-1 and GPHN; thus, it is unlikely that chimeric transcripts between the LTR and the corresponding gene caused overexpression of these genes. More detailed analysis of these patients is under way.

Our data suggest that the role of HTLV-1 integration in deregulating host genes at the step of leukemic transformation is limited, for two reasons: (1) the lack of a common integration site of HTLV-1 in ATLL patients and (2) only about 10% of ATLL cases, with or without integration into a particular transcriptional unit, showed viral integration-associated upregulation of cellular genes. Failure to detect a change in expression of cellular genes by viral integration in seven out of nine ATLL cases does not necessarily mean that these genes are not involved in leukemogenesis, since such an alteration might be critical only at the early stage of leukemogenesis and no longer required at the late stage of leukemia, after accumulation of many genetic alterations. Thus, further functional analysis of these genes is required to draw a conclusion.

In conclusion, we demonstrated preferential integration of HTLV-1 into expressed loci in ATLL patients. This integration occasionally causes deregulation of corresponding cellular genes, which may lead to leukemogenesis in a fraction of ATLL.