Introduction

The incidence of clinical prostate cancer differs substantially between ethnic groups: African Americans have a 10- to 40-fold higher incidence than Asians (Gronberg 2003). Despite an expanding body of epidemiological data, the etiology of prostate cancer remains poorly understood. However, evidence supports the involvement of both genetic and environmental factors, which might also contribute to the ethnic differences in incidence rates.

The growth and development of the prostate gland, and the maintenance of its physiological integrity, depend on circulating androgens and intact intracellular steroid signaling pathways (Cunha et al. 1987). The effects of androgens are mediated through the androgen receptor (AR), a ligand-activated nuclear transcription factor encoded by the AR gene on the X chromosome (Xq11–12). The AR gene comprises 8 exons spanning more than 90 kb of genomic DNA and encodes an AR protein with four functional domains: an amino-terminal transcription activation domain (TAD), a DNA-binding domain (DBD), a hinge region and a carboxyl-terminal ligand-binding domain (LBD) (Janne et al. 1993). Androgens, particularly dihydrotestosterone, bind to the AR with high affinity and stimulate the transcription of a cascade of androgen-responsive genes. In addition to stimulating the expression of genes associated with the differentiated phenotype of the prostate, such as prostate-specific antigen (PSA), the AR has been reported to regulate genes involved in cell-cycle control, e.g., the cyclin-dependent kinases CDK2 and CDK4 and the CDK inhibitor p16 (Lu et al. 1997). Thus, AR transactivation plays an important role in the normal growth and function of the prostate gland.

Exon 1 of the AR gene contains the polymorphic CAG and GGN repeat motifs, which lie approximately 1.1 kb apart and encode a polyglutamine and a polyglycine tract, respectively (Edwards et al. 1992). The GGN repeat is complex, with the structure (GGT)3(GGG)1(GGT)2(GGC)n (Platz et al. 1998). The number of CAG repeats ranges from 8 to 35, with an average of 20–23, whereas the GGN tract varies from about 10 to 35 repeats. In vitro investigations suggest that CAG repeat length correlates inversely with AR transactivation (Chamberlain et al. 1994). Longer CAG repeats have been associated with male infertility (Patrizio et al. 2001), whereas short CAG repeats have been reported to predispose to prostate cancer (Coetzee et al. 1994; Mishra et al. 2005; Krishnaswamy et al. 2006). However, the effect of variation in GGN tract length on AR activity is unclear. Earlier transient transfection studies with reporter constructs reported that deletion of the GGN tract resulted in no change, an increase or a decrease in AR transcriptional activity (Jenster et al. 1994; Gao et al. 1996). Jenster et al. (1994) found that complete deletion of the (GGC)n sequence had no substantial effect on AR activity, whereas Gao et al. (1996) found that the same mutation reduced the capacity to activate a luciferase reporter gene by 30%.

Epidemiological investigations of the association between GGN repeat number and prostate cancer risk have produced inconsistent results. Because CAG and GGN repeat lengths in the AR gene vary between ethnic groups, and because the AR mediates androgen action, it has been suggested that these polymorphisms may explain part of the large ethnic difference in prostate cancer risk.

In India, a significant association between short CAG repeats and prostate cancer risk has been reported in North Indian men (Mishra et al. 2005) and, in our earlier study, in South Indian men (Krishnaswamy et al. 2006). This prompted us to determine the role of GGN repeat polymorphisms, and their linkage with CAG repeats, in prostate cancer risk in South Indian men. We therefore analyzed the relation between the AR-GGN microsatellite and prostate cancer risk and investigated whether this relation varies with tumor grade, PSA level and age at diagnosis. We also assessed whether specific combinations of CAG and GGN microsatellite alleles are significantly associated with prostate cancer risk. Finally, we tested for possible linkage between the repeats among all individuals studied, irrespective of disease status.

Materials and methods

Subjects

The present case-control study included 86 histologically confirmed prostate cancer patients and 119 male control subjects from southern India. The controls comprised 79 healthy, age-matched, unrelated individuals with normal serum PSA levels (≤4 ng/ml), a normal digital rectal examination and no history of cancer, and 40 subjects with benign prostatic hyperplasia (BPH). Patients and controls were of the same ethnic background. Relevant clinical and pathological data were collected for all patients. The age of prostate cancer patients ranged from 44 to 98 years (mean 67.5 years); BPH patients were 55–77 years old (mean 65.5 years), and healthy controls were 50–81 years old (mean 66.5 years). Gleason scores (GS) were obtained from pathological grading of the tumors, and patients were classified as low grade (GS < 7; n = 47) or high grade (GS ≥ 7; n = 39). The study was approved by the Institutional Medical and Ethics Committee. Blood samples were collected from patients and controls after obtaining written informed consent.

Genotyping of GGN repeat polymorphism

Genomic DNA was isolated from blood leucocytes by the standard phenol/chloroform method (Sambrook et al. 1989). Exon 1 of the AR gene was genotyped using primers flanking the GGN repeat motif: 5′-FAM-CCGCTTCCTCATCCTGGCACAC-3′ (forward primer) and 5′-GCCGCCAGGGTACCACACATC-3′ (reverse primer). Each PCR was carried out in a 10 μl reaction mixture containing 20 ng DNA, 1 μl of 10× PCR buffer, 5 pM of each primer, 200 μM dNTPs (deoxynucleotide triphosphates), 0.4 μl of 100% DMSO, 0.6 μl of 100% glycerol and 0.5 U of AmpliTaq Gold (Perkin–Elmer). PCR conditions consisted of initial denaturation at 96°C for 12 min, followed by 30 cycles of 96°C for 1 min 30 s, 60°C for 1 min and 72°C for 3 min, and a final extension at 72°C for 5 min. For GeneScan analysis, 3.0 μl of the PCR product was mixed with 0.2 μl of the LIZ500™ size standard and 6.8 μl of formamide. After denaturation at 95°C for 5 min and cooling on ice for 5 min, the samples were run on an ABI 3730 DNA Analyzer (Applied Biosystems, USA). The raw data were analyzed using GeneMapper software to determine the number of repeats. PCR and genotyping were repeated for all samples to confirm the number of repeats.

Genotyping of the CAG repeat polymorphism was performed by PCR followed by GeneScan analysis on the ABI 3730 DNA Analyzer, as described earlier (Krishnaswamy et al. 2006).
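GeneMapper reports each amplicon as a fragment size in base pairs; the text does not state how sizes were converted to repeat numbers. One common approach, assumed here, is to calibrate against an allele whose repeat count has been confirmed by sequencing and to count 3 bp per trinucleotide (CAG or GGN). The function name, the reference size (294.0 bp) and the reference count below are hypothetical, not values from this study.

```python
def repeats_from_size(size_bp: float, ref_size_bp: float, ref_repeats: int,
                      unit_len: int = 3, tol: float = 0.35) -> int:
    """Estimate the repeat count of an amplicon from its sized length.

    Hypothetical calibration: a sequence-verified reference allele with
    ref_repeats trinucleotides is sized at ref_size_bp; each additional
    repeat adds unit_len bp to the fragment.
    """
    estimate = ref_repeats + (size_bp - ref_size_bp) / unit_len
    n = round(estimate)
    # Sizes far from a whole repeat step suggest a sizing artefact or a
    # non-trinucleotide variant, so flag them instead of silently rounding.
    if abs(estimate - n) > tol:
        raise ValueError(f"{size_bp} bp is off-ladder (estimate {estimate:.2f} repeats)")
    return n


# Hypothetical calibration: a sequenced 20-repeat allele sized at 294.0 bp.
print(repeats_from_size(300.2, ref_size_bp=294.0, ref_repeats=20))  # 22
```

The tolerance makes borderline sizes fail loudly, which suits a protocol in which PCR and genotyping are repeated to confirm each call.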

Statistical analysis

Descriptive statistics (mean, median and standard deviation) were calculated for age, PSA level, tumor grade, and CAG and GGN repeat lengths. Mean GGN repeat lengths of cases and controls were compared using the unpaired t test. Subjects were dichotomized at the mean GGN repeat length, and the relative risk associated with GGN repeats was estimated by the odds ratio (OR). Differences in the proportions of specific CAG and GGN alleles between cases and controls were likewise evaluated by calculating ORs. The χ² test was used to assess whether the distribution of GGN repeats varied by CAG repeat level and to test for linkage disequilibrium between the repeats, separately among cases and controls. To further test whether the GGN microsatellite contributes to risk in combination with the CAG microsatellite, logistic regression analysis was carried out with CAG and GGN as binary covariates.
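As an illustration of the OR calculation, the sketch below computes an odds ratio with a 95% confidence interval from a 2 × 2 table using Woolf's log method. The study used SPSS and does not state which interval method was applied, so this is one standard choice rather than the authors' procedure; the function name is hypothetical and the counts in the usage line are toy values.

```python
import math


def odds_ratio_ci(a: float, b: float, c: float, d: float,
                  z: float = 1.959963984540054):
    """Odds ratio with a Woolf (log-based) confidence interval.

    2 x 2 layout:                cases   controls
        exposed   (e.g. GGN<=21)   a        b
        unexposed (e.g. GGN>21)    c        d
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction keeps the log OR and its SE finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Toy counts only, not the study's data.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the interval is symmetric on the log scale, an OR whose interval spans 1 (such as 0.91 with 0.52–1.58) indicates no significant association at the 5% level.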

Mean GGN repeat lengths in groups defined by Gleason score, age and PSA level were compared by t test, and the distribution of GGN repeats within each prognostic factor (age, grade and PSA level) was also tabulated. One-way analysis of variance (ANOVA) was used to compare more than two means. All tests were two-sided, and P < 0.05 was considered significant. All statistical analyses were performed using SPSS (version 13).
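The two mean comparisons can be sketched with the Python standard library: Student's unpaired t statistic with pooled variance, and the one-way ANOVA F statistic. Only the test statistics are computed here; p-values come from the t and F distributions (for example via scipy.stats.ttest_ind and scipy.stats.f_oneway, which are not used below). Function names and group data are hypothetical.

```python
import math
from statistics import mean


def pooled_t(x, y):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    sp2 = ss / df  # pooled variance estimate
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, df


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic; returns (F, (df_between, df_within))."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)


# Toy GGN repeat lengths for two hypothetical grade groups.
low_grade = [21, 22, 22, 21, 23]
high_grade = [21, 21, 20, 22, 21]
t, df = pooled_t(low_grade, high_grade)
f, _ = one_way_anova_f(low_grade, high_grade)
print(math.isclose(t ** 2, f))  # True: with two groups, F equals t squared
```

The identity F = t² for two groups shows why ANOVA is only needed when more than two means are compared, as with the four age groups.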

Results

The mean ages of prostate cancer patients, BPH patients and healthy controls were 67.5, 65.5 and 66.5 years, respectively. The mean serum PSA level at diagnosis was 44.4 ng/ml in prostate cancer patients. PSA values were within the normal range (≤4 ng/ml) in both BPH patients and healthy controls. Selected characteristics of prostate cancer cases and controls are presented in Table 1.

Table 1 Principal characteristics of study subjects

The number of GGN repeats among cases and healthy controls ranged from 15 to 26 with a mean of 21, whereas in BPH patients it ranged from 15 to 23, also with a mean of 21. Thus, mean GGN repeat length did not differ significantly between cases and controls, and within the control group the mean was 21 in both BPH patients and healthy controls. The allelic distributions of the (GGN)n polymorphism in patients and controls are shown in Fig. 1.

Fig. 1 Distribution of GGN repeats in the androgen receptor gene among prostate cancer patients and controls

For the polyglycine tract (GGT)3(GGG)1(GGT)2(GGC)n, the number of GGT and GGG trinucleotides did not vary in any of the samples analyzed, whereas the number of GGC repeats was highly variable. The pattern was always three GGT, one GGG and two GGT codons, followed by a variable number of GGC repeats. Hence, an allele with 22 GGN repeats comprises the 6 invariant codons of the consensus sequence plus 16 GGC repeats. Two alleles, with 21 and 22 repeats, predominated in the (GGN)n system; among the 12 distinct alleles observed, they together accounted for 78% of alleles in patients and 75% in controls.
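The decomposition above can be expressed directly: with six invariant codons, the GGC count is the total GGN length minus 6. Because the AR gene is X-linked, each man carries a single allele, so allele frequencies are simple proportions of men. A minimal sketch, with hypothetical helper names and toy genotypes rather than study data:

```python
from collections import Counter

# Invariant prefix observed in every sample: (GGT)3 (GGG)1 (GGT)2 = 6 codons.
INVARIANT_CODONS = 3 + 1 + 2


def ggc_count(total_ggn: int) -> int:
    """Number of variable GGC codons in a (GGT)3(GGG)1(GGT)2(GGC)n allele."""
    if total_ggn < INVARIANT_CODONS:
        raise ValueError("total GGN length is shorter than the invariant prefix")
    return total_ggn - INVARIANT_CODONS


def allele_frequencies(alleles):
    """Relative frequency of each repeat length (one allele per man)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return {length: counts[length] / n for length in sorted(counts)}


def top_two_share(alleles):
    """Combined frequency of the two most common alleles."""
    counts = Counter(alleles)
    return sum(c for _, c in counts.most_common(2)) / len(alleles)


toy = [21, 22, 22, 21, 22, 23, 20]  # hypothetical genotypes, not study data
print(ggc_count(22))  # 16
print(allele_frequencies(toy))
print(round(top_two_share(toy), 3))
```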

To assess the risk associated with GGN repeats, study subjects were dichotomized at the mean GGN repeat length. Men with ≤21 GGN repeats did not have a significantly different risk of prostate cancer compared with men with >21 repeats (OR 0.91; 95% CI 0.52–1.58) (Table 2).

Table 2 Risk of prostate cancer in relation to the number of GGN repeats in exon 1 of AR gene

The CAG repeat polymorphism analyzed in our previous study revealed a significant difference in mean CAG repeat length between prostate cancer patients and controls (17.0 vs. 20.7; P < 0.001), and men with ≤19 CAG repeats had a significantly higher risk of cancer than those with >19 repeats (OR 5.90; 95% CI 3.2–11.2; P < 0.001) (Krishnaswamy et al. 2006). To determine whether specific combinations of CAG and GGN alleles differed significantly between cases and controls, we combined these CAG repeat data with the GGN repeat data from the present study. The CAG ≤19/GGN ≤21 and CAG ≤19/GGN >21 haplotypes were more prevalent in cases than in controls. Thus, men with CAG ≤19/GGN ≤21 (OR 5.2; 95% CI 2.17–12.48; P < 0.001) and CAG ≤19/GGN >21 (OR 6.9; 95% CI 2.85–17.01; P < 0.001) had an increased risk compared with men with CAG >19/GGN >21, whereas men with CAG >19/GGN ≤21 were not at increased risk (OR 1.1; 95% CI 0.14–2.83) (Table 3). Logistic regression analysis with CAG and GGN as binary covariates also showed a significant association (P < 0.001).
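The combined-genotype comparison can be sketched as follows: each man is labeled by his dichotomized CAG (≤19 vs >19) and GGN (≤21 vs >21) lengths, and the odds ratio for each group is computed against the reference group CAG >19/GGN >21. Function names and the toy subjects are hypothetical, and confidence intervals are omitted for brevity.

```python
from collections import Counter


def combined_group(cag: int, ggn: int, cag_cut: int = 19, ggn_cut: int = 21) -> str:
    """Label a man by his dichotomized CAG and GGN repeat lengths."""
    cag_label = f"CAG<={cag_cut}" if cag <= cag_cut else f"CAG>{cag_cut}"
    ggn_label = f"GGN<={ggn_cut}" if ggn <= ggn_cut else f"GGN>{ggn_cut}"
    return f"{cag_label}/{ggn_label}"


def odds_ratios_vs_reference(cases, controls, reference="CAG>19/GGN>21"):
    """Odds ratio of each combined group relative to the reference group.

    cases, controls: iterables of (cag, ggn) repeat-length pairs.
    Returns None where the OR is undefined (a zero in the denominator).
    """
    case_n = Counter(combined_group(c, g) for c, g in cases)
    ctrl_n = Counter(combined_group(c, g) for c, g in controls)
    ref_cases, ref_ctrls = case_n[reference], ctrl_n[reference]
    result = {}
    for group in sorted((set(case_n) | set(ctrl_n)) - {reference}):
        a, b = case_n[group], ctrl_n[group]  # Counter returns 0 if absent
        denom = b * ref_cases
        result[group] = (a * ref_ctrls) / denom if denom else None
    return result


# Toy (CAG, GGN) pairs, not study data.
toy_cases = [(17, 21), (17, 22), (18, 20), (21, 22)]
toy_controls = [(21, 22), (21, 22), (17, 21), (22, 20)]
print(odds_ratios_vs_reference(toy_cases, toy_controls))
```

Using a single reference group keeps the three comparisons on a common baseline, which is how Table 3 reports the combined risks.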

Table 3 Risk of prostate cancer in relation to the combined distribution of number of CAG and GGN repeats in exon 1 of AR gene

We further tested for a possible association (linkage) between the CAG and GGN microsatellites separately among cases and controls by cross-classifying subjects into groups based on the mean GGN (≤21 and >21) and mean CAG (≤19 and >19) repeat lengths (Table 4). No significant linkage between the two microsatellites was observed among either cases or controls.
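A minimal sketch of the χ² test on one such 2 × 2 cross-classification (run separately for cases and for controls) follows. The text does not say whether a continuity correction was used, so it is optional here; note that scipy.stats.chi2_contingency applies Yates' correction to 2 × 2 tables by default. The function name is hypothetical and the counts are toy values.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-square test of independence for a 2 x 2 table (1 df).

    Rows: CAG <=19 / >19; columns: GGN <=21 / >21, tabulated within
    cases or within controls.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    chi2 = n * diff ** 2 / margins
    # With 1 df, chi2 = Z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


print(chi2_2x2(10, 10, 10, 10))  # no association: (0.0, 1.0)
print(chi2_2x2(20, 5, 5, 20))    # strong toy association
```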

Table 4 Frequency distribution of AR gene CAG and GGN repeat lengths among prostate cancer cases and controls

We also analyzed the GGN repeat polymorphism in prostate cancer patients grouped by age, Gleason score and PSA level (Table 5). With respect to tumor grade, patients with well and moderately differentiated tumors were classified as low grade (GS < 7) and those with poorly differentiated tumors as high grade (GS ≥ 7). Although high-grade tumors showed a trend towards shorter mean GGN repeat length, the difference was not significant (P = 0.09).

Table 5 Comparison of AR-GGN repeat length in prostate cancer patients by age, grade and serum PSA level at diagnosis

With respect to age at diagnosis, subjects were stratified into four groups based on quartiles (≤62, 63–66, 67–72 and >72 years), and with respect to PSA level into two groups, using the mean PSA value as the cut-off. Mean GGN repeat length did not differ significantly across the age groups or the PSA groups. Moreover, stratified analysis of GGN repeat distributions by age at onset, tumor grade and PSA level revealed no significant association with any of these variables (Table 5).
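The stratification can be sketched as a set of grouping functions followed by a per-stratum mean. The age cut-points and the Gleason threshold are those given in the text; placing a PSA value exactly equal to the mean in the lower group is an assumption, and the function names and patient records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def age_group(age: int) -> str:
    """Age-at-diagnosis quartile groups reported for the cases (years)."""
    if age <= 62:
        return "<=62"
    if age <= 66:
        return "63-66"
    if age <= 72:
        return "67-72"
    return ">72"


def grade_group(gleason: int) -> str:
    return "low (GS<7)" if gleason < 7 else "high (GS>=7)"


def psa_group(psa: float, cutoff: float) -> str:
    # Assumption: a value equal to the cut-off is placed in the lower group.
    return "<=mean" if psa <= cutoff else ">mean"


def mean_ggn_by(patients, key):
    """Mean GGN repeat length per stratum; patients are dicts with a 'ggn' field."""
    strata = defaultdict(list)
    for p in patients:
        strata[key(p)].append(p["ggn"])
    return {k: mean(v) for k, v in sorted(strata.items())}


patients = [  # hypothetical records, not study data
    {"age": 60, "gleason": 6, "psa": 10.0, "ggn": 22},
    {"age": 70, "gleason": 8, "psa": 80.0, "ggn": 21},
    {"age": 65, "gleason": 7, "psa": 30.0, "ggn": 21},
    {"age": 75, "gleason": 5, "psa": 5.0, "ggn": 22},
]
psa_cut = mean(p["psa"] for p in patients)
print(mean_ggn_by(patients, lambda p: grade_group(p["gleason"])))
print(mean_ggn_by(patients, lambda p: psa_group(p["psa"], psa_cut)))
print(mean_ggn_by(patients, lambda p: age_group(p["age"])))
```

The per-stratum means would then be compared by t test (two groups) or one-way ANOVA (four age groups).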

Discussion

Studies on the association between AR-GGN repeat length and prostate cancer risk have produced conflicting results. Our study shows that GGN repeats alone are not associated with prostate cancer risk, but in combination with CAG repeats, men with CAG ≤19/GGN ≤21 or CAG ≤19/GGN >21 have an increased risk compared with men with CAG >19/GGN >21. We did not, however, observe any statistically significant association between GGN repeat length and age at diagnosis, Gleason score or PSA level.

The distribution of GGN microsatellites has been reported to differ significantly among ethnic groups. High-risk African Americans had the lowest frequency (20%) of the 22-repeat GGN allele, whereas the corresponding frequencies in intermediate-risk whites and low-risk Asians were 57% and 70%, respectively (Irvine et al. 1995). Among Western (Platz et al. 1998; Chen et al. 2002) and Chinese men (Hsing et al. 2000), a GGN repeat length of 23 was predominant, and it has therefore been suggested that 23 GGN repeats might encode the optimal AR protein conformation and activity. In our study, however, only 1.7% of subjects had 23 GGN repeats; repeat lengths clustered around 22, with 42% of subjects carrying 22 repeats and 33% carrying 21 repeats, reflecting the polymorphic nature of the GGN tract and distinct ethnic variation in its length.

Consistent with our results, no significant genotype-specific prostate cancer risk associated with the GGN repeat polymorphism was found among Caucasians in Britain (Edwards et al. 1999), French–German men (Correa-Cerro et al. 1999) or Caucasians in America (Chen et al. 2002). Moreover, short GGN repeats did not confer increased risk in familial prostate cancer (Miller et al. 2001; Cicek et al. 2004). In addition, a recent study of early-onset prostate cancer in British men also reported no association between GGN repeats and prostate cancer risk (Forrest et al. 2005).

In contrast to our results, men with ≤16 GGC repeats have been reported to have a higher risk than men with >16 repeats (Stanford et al. 1997). Moreover, a significantly increased frequency of alleles with ≤16 GGC repeats has been reported in both hereditary and sporadic prostate cancer in a predominantly Caucasian study population (Chang et al. 2002). In addition, among Chinese men, those with <23 GGN repeats had a 12% increased risk of prostate cancer compared with those with ≥23 GGN repeats (Hsing et al. 2000). Thus, our GGN results agree with some previous studies and differ from others, reflecting ethnic differences in the AR GGN polymorphism and its association with prostate cancer (Table 6).

Table 6 Studies on AR-GGN repeat polymorphism and prostate cancer risk

When CAG and GGN repeats were combined, men with the CAG ≤19/GGN ≤21 and CAG ≤19/GGN >21 haplotypes had an increased risk compared with men with the CAG >19/GGN >21 haplotype. Table 7 summarizes the combined distributions of the repeats reported in different studies. Our results are thus consistent with earlier findings in which the subgroup with two short repeats (CAG <22; GGC ≤16) had a twofold increased risk relative to the subgroup with two long repeats (CAG ≥22; GGC >16) (Stanford et al. 1997). Platz et al. (1998) reported an increased risk for men with GGN = 23 and CAG <21 compared with men with a GGN other than 23 and CAG >23. In contrast, men with CAG <22 and GGN ≤23 repeats were not at increased risk of prostate cancer (Chen et al. 2002).

Table 7 Studies on combined distribution of AR-CAG and GGN repeat polymorphism in prostate cancer

Because the two microsatellites lie close together (approximately 1.1 kb apart) within exon 1 of the X-linked AR gene, linkage disequilibrium between them would be expected. Irvine et al. (1995) observed significant linkage disequilibrium between the CAG and GGN repeats among cases but not among controls, whereas Platz et al. (1998) reported linkage disequilibrium among both cases and controls. In contrast, we found no significant linkage between the repeats among either cases or controls. The absence of linkage might indicate that one or both repeats mutate at a relatively high rate, independently of each other.
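Because men are hemizygous for the X chromosome, each man's CAG/GGN combination is an observed haplotype, so standard linkage disequilibrium measures can be computed without phasing. The sketch below treats the dichotomized repeat classes used in Table 4 as two-allele loci, a simplification since both repeats are multiallelic, and returns D, Lewontin's |D′| and r². For a 2 × 2 table of n men, n·r² equals the uncorrected Pearson χ² statistic. The function name is hypothetical and the counts are toy values.

```python
def ld_stats(n11: int, n12: int, n21: int, n22: int):
    """D, |D'| and r^2 for two dichotomized loci from haplotype counts.

    Simplification: each repeat is reduced to a short/long class.
    n11: CAG short & GGN short   n12: CAG short & GGN long
    n21: CAG long  & GGN short   n22: CAG long  & GGN long
    """
    n = n11 + n12 + n21 + n22
    p = (n11 + n12) / n  # frequency of the short CAG class
    q = (n11 + n21) / n  # frequency of the short GGN class
    if p in (0, 1) or q in (0, 1):
        raise ValueError("a locus is monomorphic; LD is undefined")
    d = n11 / n - p * q
    # Lewontin's normalization: D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p * (1 - p) * q * (1 - q))
    return d, d_prime, r2


d, d_prime, r2 = ld_stats(20, 5, 5, 20)  # toy haplotype counts
print(round(d, 3), round(d_prime, 3), round(r2, 3))
print(round(50 * r2, 6))  # n * r^2, the Pearson chi-square for this table
```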

Studies of the association between GGN repeats and age at diagnosis have yielded conflicting findings. Stanford et al. (1997) found men with ≤16 GGC repeats to be at increased risk regardless of age at diagnosis. However, Miller et al. (2001) reported a reduced risk among men diagnosed at age ≤66 years and an increased risk among men diagnosed at age ≥66 years. Moreover, Chen et al. (2002) observed a reduced risk with ≤17 repeats in men aged 70 years or older and no evidence of association in men younger than 70 years. Our study, however, found no association between GGN repeat length and age at diagnosis.

Although we observed a trend towards shorter mean GGN repeat length in high-grade tumors, the association was not significant. Hakimi et al. (1997) reported that short GGN repeats identify a subpopulation of patients with clinically localized disease. However, Edwards et al. (1999) observed long GGN alleles at higher frequency in advanced stages and grades, and reported that long GGN alleles were associated with shorter time to relapse and worse overall survival. We therefore propose that assessing CAG and GGN repeats in relation to relapse and overall survival may help predict the growth behavior of early-stage tumors and validate the prognostic significance of these repeats in prostate cancer.

Because AR transactivation drives PSA expression, we examined the association between PSA level and AR GGN genotype but found no significant association. As the GGN repeats lie within the transactivation domain, a single amino acid difference could alter the binding affinity of this domain enough to up- or down-regulate the formation of a critical regulatory complex. However, the two functional studies reported so far are contradictory: one found no substantial effect of GGN deletion on AR activity (Jenster et al. 1994), whereas the other reported diminished AR activity (Gao et al. 1996). The functional significance of the GGN repeats therefore needs further evaluation.

The differing results across study populations may partly be explained by gene–environment interactions. Differences in study design and in the reference CAG and GGN lengths used may also contribute to the divergent results of epidemiological studies. It has also been proposed that the polymorphic CAG and GGN repeats act as low-penetrance prostate cancer alleles that may require additional genetic or environmental factors to increase cancer risk (Nwosu et al. 2001). Moreover, they might be in linkage disequilibrium with other disease-causing mutations in the AR gene or with unknown adjacent genes that affect prostate cancer risk. Although repeat number is a quantitative trait, the 21- and 22-repeat GGN alleles are very frequent and the others are rare. A positive association could therefore reflect a difference between the 21-repeat and 22-repeat alleles rather than a quantitative effect of repeat number.

To the best of our knowledge, this is the first study to investigate the association between the GGN repeat polymorphism and the relative risk of prostate cancer in Indian men. Our results suggest that specific AR haplotypes contribute to prostate cancer risk. Given the importance of the AR in prostate cancer, identifying factors that interact with the polyglutamine and polyglycine regions of the AR to alter its function and modulate prostate cancer risk is an important area for future research.