Introduction

Approximately 8% of the human genome is composed of human endogenous retroviruses (HERVs) and other long terminal repeat (LTR)-like elements (International Human Genome Sequencing Consortium 2001). These sequences can be easily recognized because they consist of at least three genes, gag (coding structural proteins), pol (viral enzymes), and env (surface envelope proteins), as well as LTRs. HERVs have been subjected to many amplification and transposition events during evolution, resulting in a widespread distribution of complete or partial retroviral sequences. HERVs represent the remnants of ancestral retroviral infections that were integrated into the germline of primates 35–45 million years (MY) ago (Sverdlov 2000). It was recently reported, on the basis of phylogenetic analyses of either the highly conserved reverse transcriptase (RT) domain of the pol gene or the transmembrane (TM) moiety of the envelope gene, that at least 26 HERV families with distinct retroviral lineages reside within the human genome (Tristem 2000; Benit et al. 2001). Subsequent retrotransposition events and recombination mechanisms amplified these sequences to a high degree during primate evolution. Such retroviral elements have been described in primates and humans (Kabat et al. 1996; Kim and Crow 1999; Shi et al. 1991). Several HERV families with very different copy numbers (1–1,000 copies) and genomic integration sites have been identified. These different copy numbers could represent either multiple integration events or proviruses that have been amplified by retrotransposition following integration.

It has been suggested that HERVs have played a role in influencing the functional organization of the human genome (Baltimore 1985). The open reading frames of many extinct HERVs are disrupted by premature stop codons that prevent read-through of the genetic information. However, the structural genes of some HERV families are expressed preferentially in human placenta (Kjellman et al. 1999; Mi et al. 2000; Venables et al. 1995) and several cancer cell lines (Armbruester et al. 2002; Lower et al. 1993; Yi et al. 2004). Expression of HERVs could be influenced by the outcome of infections in different ways that can be either beneficial or detrimental to the host. Full-length retroviral sequences may interact with cellular oncogenes (Varmus 1982), and retroviral LTR sequences have the capacity to exert a regulatory influence as promoters and enhancers of cellular genes (Leib-Mosch et al. 1993). A small number of these sequences have acquired a role in regulating gene expression, and some of these may be related to differences between individuals and to the expression of disease (Sverdlov 1998).

The murine leukemia virus (MuLV)-related endogenous retrovirus HC2 was isolated by high-stringency hybridization with the 847-bp pol gene. It was found to be more closely related to the human endogenous retrovirus (HERV) S71 than to other MuLV-related retroviruses such as gibbon ape leukemia virus (GaLV), feline leukemia virus (FeLV), baboon endogenous virus (BaEV), and simian sarcoma virus/simian associated sarcoma virus (SSV/SSAV) (Kabat et al. 1996). The genome of S71, which is localized on chromosome 18q21 (Brack-Werner et al. 1989), is truncated in the pol gene and carries a 920-bp insertion of a solitary HERV-K LTR element (Haltmeier et al. 1995). The deletion within the pol region spans the 3′ domain of the protease gene, the complete reverse transcriptase gene, and the 5′ end of the tether (Werner et al. 1990). The S71-related retroviral elements pCRTK1 and pCRTK6, which contain the intact protease and reverse transcriptase domains that are missing in S71, were identified by polymerase chain reaction (PCR) amplification. Over the past decades, a considerable number of studies have classified HERVs using different methodologies. In general, HERV families are named according to the type of tRNA used by their primer binding site (PBS), which is located downstream of the 5′ LTR. For example, HERV-W, HERV-H, and HERV-F are primed by tryptophan, histidine, and phenylalanine tRNAs, respectively. Despite previous characterization, PBS data were not available for the HERV-S71 family due to the absence of a 5′ LTR (Tristem 2000). Recently, a new family, HERV-T (formerly HERV-S71; renamed by Benit et al. 2001), was identified from human sequences in the databases. Based on phylogenetic analysis, the HERV-T family is most closely related to the HERV-R family (Benit et al. 2001) and consists of 100 copies within the human genome (de Parseval et al. 2003). Previous studies suggest that the human endogenous retrovirus HERV-HC2 is closely related to HERV-S71 and to the retroviral fragments pCRTK1 and pCRTK6 (Werner et al. 1990; Haltmeier et al. 1995), suggesting that they form a single phylogenetic group. HERV-HC2 is an incomplete provirus containing intact gag and pol genes and a 3′ LTR, but missing the 5′ LTR and env gene (Kabat et al. 1996). HERV-HC2 is present as a single copy, located at chromosome 10q26, within the haploid human genome (Kim and Crow 1998). We previously reported that HERV-HC2 belongs to the HERV-T family, and comprises four members in the human genome (Kim and Crow 1999) and 12 members in the primate genome (Kim et al. 1999a). We also reported that the HERV-HC2 pol gene is expressed in several tissues of the squirrel monkey (Yi et al. 2003). This information led us to examine expression of HERV-HC2 in normal human tissues and cancer cells.

Materials and methods

Cell culture

Human cancer cells (RT4, PFSK-1, BT-474, HCT-116, TE-1, UO-31, Jurkat, HepG2, A549, MCF7, OVCAR-3, MIA-PaCa-2, PC3, LOX-IMVI, AZ521, 2F7, U-937, C-33A) were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% heat-inactivated fetal calf serum, 2 mM glutamine, 1 mM nonessential amino acids, 1 mM sodium pyruvate, 100 U/ml penicillin, and 0.1 mg/ml streptomycin, at 37°C in a 5% CO2 incubator.

Isolation of total RNA and RT-PCR analysis of HERV-HC2 belonging to the HERV-T family

Total RNA from human cancer cell lines was isolated using a High Pure RNA isolation kit (Roche, Indianapolis, IN) following the protocol provided by the manufacturer. Human tissues (brain, prostate, testis, heart, kidney, liver, lung, placenta, skeletal muscle, spleen, thymus, uterus) were purchased from Roche. RNA was treated with 1 U DNase I (RNase-free) at 25°C for 20 min to remove contaminating DNA. Using only pure mRNA (1 μg/μl), expression patterns of the HERV-HC2 pol gene were examined with the Titan One Tube RT-PCR System (Roche). RT-PCR was performed in a 25 μl volume. To verify that the mRNA samples prepared from the human tissues and cancer cells did not contain genomic DNA, we also performed PCR amplification of the pure mRNA samples without a reverse transcription (RT) reaction. Using an RT-PCR approach, novel 339-bp pol fragments of the HERV-HC2 family were amplified with the primer pair HS307 (5′-TGGGTAAAGCTCCTTCCTTTAG-3′, bases 5,128–5,149) and DS706 (5′-ATGACATATGAGGTCCTTTTC-3′, bases 5,444–5,465) from HERV-HC2 (GenBank accession no. Z70664). RT-PCR conditions followed the standard protocol of the Titan One Tube RT-PCR System, with an annealing temperature of 50°C. As a standard control, G3PDH was amplified using the primers GPH-S (5′-CAAAGTTGTCATGGATGACC-3′, bases 31,721–31,740) and GPH-AS (5′-CCATGGAGAAGGCTGGGG-3′, bases 31,898–31,915) from human G3PDH (GenBank accession no. AC068657).

Molecular cloning of RT-PCR products, sequencing, and data analyses

RT-PCR products were separated on a 1.8% agarose gel, purified with the QIAEX II gel extraction kit (Qiagen, Hilden, Germany) and cloned into the pGEM-T-easy vector (Promega, Madison, WI). The cloned DNA was isolated by the alkaline lysis method using the High Pure plasmid isolation kit (Roche). Individual plasmid DNA was digested with the restriction enzyme EcoRI. Positive samples were subjected to sequence analyses on both strands with T7 and M13 reverse primers using an automated DNA sequencer (Model 373A) and the DyeDeoxy terminator kit (Applied Biosystems, Foster City, CA). Nucleotide sequence analyses were performed using the GAP, PILEUP, and PRETTY programs from the GCG software package (http://www.accelrys.com/products/gcg/). The neighbor-joining tree was obtained using the MEGA2 program (Kumar et al. 2001). Bootstrap evaluation of the branching patterns was performed with 100 replications. To determine sequence divergence, distances were estimated by the Kimura two-parameter method in the MEGA2 program (Kimura 1980). Nucleotide sequences of the HERV-HC2 family were retrieved from the GenBank database with the aid of the BLAST network server (Altschul et al. 1997).
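As an illustration of the distance measure named above, the following minimal Python sketch computes the Kimura two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is not the MEGA2 implementation; the function name k2p_distance and the decision to skip gapped or ambiguous sites are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only. Sites where either
    sequence has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument of the logarithm not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two identical sequences the function returns zero; one transition in ten compared sites gives a distance slightly above the raw 10% difference, because the model corrects for multiple substitutions at the same site.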

Nucleotide sequence accession numbers

The nucleotide sequence data reported in this paper will appear in the DDBJ/EMBL/GenBank nucleotide sequence databases with the accession numbers: AB167270–AB167348.

Results and discussion

Several studies on HERV transcription have been performed by Northern blot analyses (Lower et al. 1996), indicating that HERV sequences are expressed preferentially in the placenta. The best example of an HERV with a known function is HERV-W, which is involved in placental development. In particular, the high-level placental expression of three endogenous coding env genes, namely HERV-R, HERV-W, and the newly identified HERV-FRD, has been implicated in two major physiological processes in human placental tissue. We have already used RT-PCR to examine the expression of HERV-HC2 pol fragments in mRNA from New World monkey tissues, which were found to have a more specific expression pattern than that detected in squirrel monkeys (Yi et al. 2003). This information motivated us to examine Old World monkey tissues (manuscript in preparation), human tissues, and abnormal tissues for comparison. Therefore, in this study, RT-PCR and sequencing were used to analyse expression of the pol fragment of HERV-HC2 belonging to the HERV-T family in 12 human tissues and 18 human cancer cell lines. Expression of the pol gene was detected in all human tissues examined (brain, prostate, testis, heart, kidney, liver, placenta, skeletal muscle, spleen, thymus, and uterus) except for lung. It was also detected in all cancer cell lines examined (RT4, PFSK-1, BT-474, HCT-116, TE-1, UO-31, Jurkat, HepG2, A549, MCF7, OVCAR-3, MIA-PaCa-2, PC3, LOX-IMVI, AZ521, 2F7, U-937, and C-33A) (Fig. 1a, b). Thus, the HERV-HC2 pol gene may play a biological role in human tissues and cancer cells. These data suggest that the coding capacity of HERV-HC2 was acquired during hominid evolution. Recently, HERV-W structural genes (gag/pol/env) were found to be expressed in multiple human tissues and cancer cell lines, implying a potential transcriptional role for the HERV-W family in human cancer genomes (Yi et al. 2004). In addition, coding envelope genes of HERV-K, T, W, FRD, and R have been detected in the majority of normal human tissues (de Parseval et al. 2003).

Fig. 1

Reverse transcription-polymerase chain reaction (RT-PCR) analysis of mRNA for the expression of human endogenous retrovirus (HERV)-HC2 pol fragments from various (a) human tissues and (b) cancer cells. (c) Comparative expression pattern between human normal tissues and cancer cells. G3PDH: internal positive control

As expected, control PCRs carried out without the RT step never resulted in any amplification. The RT-PCR products were cloned and sequenced. The clones of the expressed HERV-HC2 pol sequences are described in Table 1: 29 clones from different human tissues and 50 clones from human cancer cells were sequenced. The HERV-HC2 pol fragments from human tissues and cancer cell lines showed a high degree of nucleotide sequence similarity (79.1–99.0%) to those of HERV-HC2 family members from human chromosomes retrieved from the NCBI database (AL662890 on chromosome 6, AC007276 on chromosome 7, AL117337 and AC073174 on chromosome 10, AC009533 on chromosome 12, AC116553 on chromosome 16, AC092364 on chromosome 19, AC011749 on chromosome Y and M32788 on chromosome 18) and to our previous data from a human monochromosomal panel [AB016525 (HC2-3) on chromosome 3, AB016526 (HC2-7) on chromosome 7, AB016527 (HC2-10) on chromosome 10 and AB016528 (HC2-16) on chromosome 16]. Examination of inferred amino acid sequences revealed that 20/29 clones from human tissues and 28/50 clones from human cancer cell lines (highlighted in Table 1) had no translation interruptions resulting from point mutations or deletions/insertions. Alignment of the putative amino acid sequences of the pol fragments of the HERV-HC2 family from various human tissues and cancer cell lines together with our previous data (Kim and Crow 1999) (Fig. 2) revealed features that could be of great importance in understanding their functional roles. The percentage identity of 98 amino acid sequences of HERV-HC2 pol fragments ranged from 79.2 to 99.9% (Tables 2, 3). We also analyzed synonymous and non-synonymous substitutions within the HERV-HC2 family in order to discover the evolutionary forces at work. In pol fragments from human tissues and human cancer cells, mean synonymous substitution (Ks) ranged from 0 to 18.6%, whereas mean non-synonymous substitution (Ka) ranged from 0 to 22.8%; the value of Ka/Ks ranged from 0 to 3.38. In terms of the Ka/Ks ratio, 53.2% of the values in pairwise comparisons were <1 (data not shown). The data suggest that negative selection is acting on these sequences of the HERV-HC2 family.
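The Ka/Ks summary above can be sketched in a few lines of Python. A ratio below 1 is conventionally read as purifying (negative) selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection. The sketch below only tallies the share of pairwise comparisons with Ka/Ks < 1; the function names are hypothetical, the input values are invented for illustration and are not data from this study, and treating comparisons with Ks = 0 as undefined (and excluding them) is an assumption, since the text does not say how such pairs were handled.

```python
from typing import Iterable, Optional, Tuple


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def fraction_below_one(pairs: Iterable[Tuple[float, float]]) -> float:
    """Share of pairwise comparisons whose Ka/Ks ratio is below 1.

    Comparisons with Ks == 0 are excluded (an assumption of this sketch).
    """
    ratios = [
        r for r in (ka_ks_ratio(ka, ks) for ka, ks in pairs) if r is not None
    ]
    if not ratios:
        raise ValueError("no comparison with a defined Ka/Ks ratio")
    return sum(r < 1 for r in ratios) / len(ratios)


# Hypothetical pairwise (Ka, Ks) values, NOT data from this study.
example_pairs = [(0.01, 0.05), (0.10, 0.05), (0.02, 0.0), (0.00, 0.03)]
print(fraction_below_one(example_pairs))
```

In this invented example, one of the four pairs has Ks = 0 and is dropped, and two of the remaining three have a ratio below 1.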

Table 1 Human endogenous retrovirus (HERV)-HC2 pol gene sequences from various human tissues and cancer cell lines
Fig. 2

Amino acid sequence alignment of the HERV-HC2 pol fragments without nonsense or frameshift mutations. Consensus sequences are shown on the top row. Dashes indicate residues identical to the consensus sequences and dots indicate gaps introduced to maximize the alignments

Table 2 Percentage identity of amino acid sequences of HERV-HC2 pol fragments from various human tissues
Table 3 Percentage identity of amino acid sequences of HERV-HC2 pol fragments from various human cancer cells

The pol gene sequences of the HERV-HC2 family have been detected in hominoids, Old World monkeys and New World monkeys by PCR amplification (Kim et al. 1999a), in contrast to HERV-W pol gene sequences, which have been detected in hominoids and Old World monkeys, but not in New World monkeys (Kim et al. 1999b). To understand the evolutionary relationships within the HERV-HC2 family in human tissues and cancer cells, we performed phylogenetic analyses with these clones. Using all HERV-HC2 pol members, including our previous data from a human monochromosomal DNA panel and the GenBank database, a neighbor-joining tree was constructed. The tree showed that HERV-HC2 members have been continuously amplified and scattered by duplication events during primate evolution, with clones from multiple normal human tissues and cancer cells clustering randomly (Fig. 3). As shown in Fig. 3, HERV-HC2 family members fell into three distinct groups (I, II, III) through evolutionary divergence, indicating that the HERV-HC2 pol family has been amplified at least three times after the original integration into the genome. We used the pol gene sequence of HERV-S71 (M32788), which belongs to the HERV-T family, as an outgroup. On this tree, we found interesting features within groups T-1, T-2, T-3, T-4, T-5, T-6, and T-7. Some clones of the HERV-HC2 family from human tissues and cancer cell lines showed 100% nucleotide sequence similarity to each other. Strikingly, clone HC23-1 from normal human testis was completely identical to AC009533 from human chromosome 12 in group T-1. In a similar manner, HERV-HC2 family members from different tissues (brain, prostate, testis, kidney, liver, placenta, spleen, thymus, and uterus) and cancer cells (PFSK-1, TE-1, Jurkat, HepG2, OVCAR-3, MIA-PaCa-2, PC3, LOX-IMVI, 2F7, and U-937) belonging to groups T-3, T-4, T-5, and T-7 showed sequences completely identical to AC007276 from human chromosome 7, AL117337 and AC073174 from human chromosome 10, and AB016525 (HC2-3) from human chromosome 3, respectively.

Fig. 3

Phylogenetic tree for pol fragments of HERV-HC2 from human tissues and cancer cell lines. The tree was constructed using the neighbor-joining method. Branch lengths are proportional to the distances between the taxa. The values at the branch points indicate the percentage support for a particular node after 100 bootstrap replicates were performed. HC2-clones are derived from human tissues and CHC2-clones are derived from cancer cells

On the assumption that the dendrogram in Fig. 3 reflects the phylogenetic relationships of the HERV-HC2 family, we computed the pairwise divergence for the three groups as group I = 12.7%, group II = 18.8%, and group III = 22.5%, employing the Kimura two-parameter method (Kimura 1980). We then estimated the divergence times of the three groups as 31 MY (group I), 47 MY (group II), and 56 MY (group III), using the average evolutionary rate of 0.2% per million years (Anderssen et al. 1997). This is in agreement with the result of our PCR analysis, which showed that HERV-HC2 pol gene sequences were present in hominoids, Old World monkeys, and New World monkeys (Kim et al. 1999), suggesting that the HERV-T family was integrated into the primate genome before the divergence of New World monkeys and prosimians (Fig. 4). Taken together, our data support the notion that the HERV-HC2 pol elements were inserted into the primate genome about 56 MY ago, suggesting that an evolutionary rate of 0.2% per MY is applicable to the HERV-HC2 pol family during primate evolution. Accordingly, groups I, II, and III proliferated about 31, 47, and 56 MY ago, respectively.
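The arithmetic behind these age estimates can be reproduced with a short Python sketch. The reported values are consistent with the relation T = d / (2r), where d is the pairwise divergence (%) and r the evolutionary rate (% per MY); the factor of 2, which reflects the two lineages accumulating changes independently, is inferred here from the reported numbers and is not stated explicitly in the text. The function name divergence_time_my is hypothetical.

```python
def divergence_time_my(divergence_pct: float, rate_pct_per_my: float = 0.2) -> float:
    """Estimate divergence time in million years (MY) as T = d / (2r).

    divergence_pct  : pairwise divergence between sequences, in percent
    rate_pct_per_my : evolutionary rate per lineage, in percent per MY
    The factor of 2 is an assumption inferred from the reported values.
    """
    if rate_pct_per_my <= 0:
        raise ValueError("rate must be positive")
    return divergence_pct / (2.0 * rate_pct_per_my)


# Pairwise divergences reported for groups I, II, and III.
for group, d in [("I", 12.7), ("II", 18.8), ("III", 22.5)]:
    print(f"group {group}: {divergence_time_my(d):.2f} MY")
```

Under this assumption the three groups give about 31.75, 47.00, and 56.25 MY, which match the reported 31, 47, and 56 MY once truncated to whole numbers.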

Fig. 4

Putative integration times of HERV-HC2 belonging to the HERV-T family during primate evolution based on the data presented in Fig. 3. An average evolutionary rate of 0.2% per million years was used. Evolutionary estimation of the HERV-HC2 family was based on the average divergence of members of each subfamily from their respective consensus sequences. The evolutionary tree was modified from Lebedev et al. (2000), Kim et al. (1999), and Anderssen et al. (1997)

There have been numerous reports that several HERV sequences are expressed in various tissues, indicating that they could have a biological role. The syncytin gene (HERV-W env gene) is one well-known functional HERV family member that is actively expressed in human placenta and testis, with two transcripts of 4 and 8 kb in Northern blot analysis (Mi et al. 2000). Syncytin could mediate placental cytotrophoblast fusion connected to placental development. HERVs have also been frequently proposed to play a role in the etiology of chronic diseases such as cancer, autoimmunity and neuropsychiatric disease (Lower 1999). The growing number of studies on HERV transcription has revealed preferential expression of HERV families both in normal human tissues and in abnormal tissues such as tumors. In comparative analyses between normal tissues and cancer cells, the HERV-HC2 pol gene was expressed in A549 lung cancer cells whereas no expression appeared in normal lung tissue (Fig. 1c). To date, using RT-PCR, RNA in situ hybridization and Northern blot analysis, only the HERV-E env gene has been shown to be actively expressed in prostate carcinoma tissues and cell lines, but not in normal prostate tissues and cells (Wang-Johanning et al. 2003). No expression of the HERV-E env gene was detected in breast tissue. In contrast, HERV-K env transcripts have been detected in most breast cancer cell lines and in many breast tumor tissues (Wang-Johanning et al. 2001). More strikingly, screening human sequence databases for endogenous retroviral elements with coding envelope genes has characterized 16 candidate genes. As part of a study aimed at understanding the potential biological function of these latter sequences, a previously uncharacterized fusogenic envelope gene belonging to the HERV-FRD family was identified using a cell–cell fusion assay, suggesting that, like HERV-W, such sequences could have fusogenic properties in placenta formation (Blaise et al. 2004). Furthermore, a systematic screen of the expression level of 16 coding env genes present in the human genome was performed in a series of 19 healthy human tissues. The newly identified HERV-T env gene was generally transcribed in the majority of the tissues; specifically, high-level expression was detected in the thyroid, implying that this phenomenon might be relevant to the hormone-producing status of the thyroid gland and to specific sequences in the HERV LTR (de Parseval et al. 2003). It has been strongly suggested that enhancer and promoter elements in retroviral LTRs influence the transcription of neighboring genes (Kowalski et al. 1999).

In summary, we examined the expression pattern and relationship of HERV-HC2 belonging to the HERV-T family in human tissues and human cancer cells. From our previous data on various primates, we already knew that the HERV-HC2 family is present in hominoids, as well as in both Old World and New World monkeys. Based on the average divergence of the HERV-HC2 family, these sequences have evolved at a rate of 0.2% nucleotide differences per MY during primate evolution, suggesting that they were integrated into primate genomes approximately 56 MY ago. Although the functional and biological roles of HERV-T family members remain obscure, our data provide useful information for studies on the transcriptional potential of the HERV-T family in the human genome, as well as on the pathology of the HERV-T family in human cancers. In further studies, it will be worthwhile to investigate the relationship between expression of HERV-T and neighboring genes, as well as the gene function of the HERV-T family.