Introduction

In Mexico, prostate cancer (PC) is currently the leading cause of morbidity (27.3 per 100 000 inhabitants) and mortality (11.3 per 100 000 inhabitants) among males. An estimated 30% of PC cases are diagnosed before the age of 60 years.1 Notably, a high proportion of all cases present a Gleason scale ≥7 at the time of diagnosis and are classified as high-grade or poorly differentiated cancer.2

Prior studies indicate that age, ethnicity, and a positive family history of PC are important risk factors for PC development.3 Twin studies reinforce that genes have a remarkable influence (42%) on the development of PC.4 Given that prostate cell growth is hormone dependent, genes that participate in the synthesis pathways of steroid hormones have a critical role.5 The androgen receptor (AR) is a ligand-dependent transcriptional regulator encoded on the X chromosome (q11–q12). The gene comprises eight exons, in which different polymorphisms have been found. As with other steroid receptors, AR contains a transactivation domain (located in the first exon) that encloses a trinucleotide motif, (CAG)n.6 This motif has been related to receptor activity: a lower number of CAG repeats encodes a more active receptor and greater androgenic activity, even with the same amount of androgens.7, 8

Preliminary studies suggest that a short CAG repeat length considerably increases the risk for PC.9, 10 Nonetheless, the CAG repeat length associated with PC has been discrepant among populations, suggesting that average CAG repeat length exhibits inter-ethnic variation.11 In this scenario, Afro-descendants and Caucasians present the shortest CAG repeat length and high PC incidence rates.3, 12 In contrast, Asian and Hispanic populations have been reported to have the longest CAG repeat length and, consequently, a low risk for PC.9, 10, 12 However, only two studies have examined CAG repeat length and the related risk for PC in Hispanics, and their results conflict.13, 14 In a case–control study carried out in Hispanic residents of San Antonio, TX, USA, fewer than (CAG)18 repeats were associated with a threefold greater probability of PC.13 In contrast, a study in Hispanic residents of Los Angeles and Hawaii (USA), which employed cutoff points of <22 or <23 CAG repeats, observed no association with PC risk or its extension.14 To add to the limited findings among Hispanic populations, our study aimed to evaluate the association between CAG repeat length and the risk of PC in a population-based case–control study carried out with Mexican male residents of Mexico City.

Materials and methods

Study subject selection

We analyzed the genetic data of 158 unrelated males with incident PC (cases) and 326 males without a diagnosis of PC (controls), aged 42–94 years, who resided in Mexico City, Mexico, and had no previous history of any other cancer type. Both groups were a random sample from a population-based case–control study (402 cases and 805 controls) performed from November 2011 to August 2014.15 Cases were recruited at three public and three social security hospitals in Mexico City. Controls were males matched by age (±5 years) with index cases, with no previous report of prostate-specific antigen >4 ng ml−1 and without malignant PC-related symptoms (that is, dysuria, hematuria, among others). Controls were identified, after the case was detected, through the master sample framework utilized in National Health Surveys. In agreement with the 2005 National Census of Population and Housing, we selected 33 Basic Geostatistical Areas in Mexico City, each represented by 10 city blocks, which were visited from North to East in order to find a male meeting all of the criteria for consideration as a control. Only one male per household was included in the study; if the next home belonged to the same family, that household was not visited. Males who declined to participate in the study answered four questions about educational level, marital status, birthplace, and length of time living in Mexico City. For the original study, the participation rates among cases and controls were 85.9% and 87.5%, respectively.

Following the STREGA (STrengthening the REporting of Genetic Association Studies) statement, we included a genome control (GC) group in order to support our findings and reduce false-positive results.16 GC is a useful tool to depict the genetic architecture of complex populations. The GC group was composed of 300 unrelated individuals (150 females and 150 males) drawn from a representative group of 1640 unrelated Mexican mestizo individuals from the Central Valley of Mexico previously described by our research group; only persons whose eight great-grandparents were born in Mexico were eligible.17

This population-based case–control study was conducted in accordance with the principles established by the Declaration of Helsinki and was approved by the Ethics Committee of the Mexican National Institute of Public Health (INSP; CI-980) and by the committee of each participating hospital. In addition, each GC participant signed an informed consent form validated by the Ethics Committee of the Bimodi Research Unit. Through face-to-face interviews with each case and population control, we obtained information about sociodemographic characteristics, smoking habit, and familial history of PC in first-degree relatives. For each case, we obtained information on Gleason scale and histopathological diagnosis.

Sample collection

Peripheral venous blood was collected from all subjects in Vacutainer tubes containing EDTA (Becton Dickinson (BD), Franklin Lakes, NJ, USA). Peripheral blood mononuclear cells were obtained by Ficoll-Hypaque density gradients (Histopaque; Sigma Chemical Co., Sigma-Aldrich, St Louis, MO, USA), and genomic DNA was extracted from these cells using TRIzol reagent (Thermo Fisher Scientific, Suwanee, GA, USA). Sample concentration and purity (260/280 ratio) were evaluated with a Thermo Scientific NanoDrop 1000 Spectrophotometer, and DNA integrity was assessed by electrophoresis in 0.8% agarose gels.

Polymorphism analysis

PCR was performed with oligonucleotide primers previously reported by Westberg et al.18 Approximately 10 ng of target DNA was amplified. The reaction was standardized at a 6-μl total volume, containing 0.05 μM primers, 1 × reaction buffer with (NH4)2SO4, 20 mM MgCl2, 200 μM of each nucleotide (Thermo Fisher Scientific), 1 M betaine (Sigma-Aldrich), and 1 U Taq DNA polymerase (Thermo Fisher Scientific). The thermocycling procedure consisted of 30 cycles of denaturation at 94 °C for 1 min, annealing at 65 °C for 1 min, and extension at 72 °C for 45 s, followed by a final extension step of 10 min at 72 °C. The resulting amplicons were analyzed by capillary electrophoresis on the ABI Prism 3130XL Genetic Analyzer employing GeneMapper ID ver. 3.2 software (Applied Biosystems, Carlsbad, CA, USA).

Statistical analyses

Selected characteristics of cases and controls were compared. Depending on the type of variable, Student's t-test, χ2-test or Fisher's exact test was utilized. Family history of PC in first-degree relatives was measured as ‘yes’ or ‘no’. According to birth state, birthplace was categorized into six regions: Mexico City; South (Campeche, Chiapas, Guerrero, Oaxaca, Quintana Roo, and Yucatán); West-Central (Aguascalientes, Colima, Guanajuato, Jalisco, and Michoacán); East-Central (Hidalgo, State of Mexico, Morelos, Puebla, Querétaro, and Tlaxcala); North (Chihuahua, Coahuila, Durango, San Luis Potosí, Zacatecas, Baja California, Baja California Sur, Sinaloa, Sonora, Nayarit, Nuevo León, and Tamaulipas); and East (Veracruz and Tabasco).

Average CAG repeat length was compared between cases and population controls using Student's t-test, and genetic differences between populations were evaluated by means of an analysis of molecular variance test. On the basis of a search of the previous literature, we categorized CAG repeat length using the following cutoffs: ≤18, 19–25, and >25; <18 vs ≥18; <19 vs ≥19; <21 vs ≥21; and <23 vs ≥23. To evaluate the association of CAG repeat length with cancer aggressiveness and age at diagnosis, we generated two different groupings of cases. Based on the Gleason scale19 at diagnosis, we considered as well-differentiated or low-risk those cancers in which the Gleason scale was <7; in contrast, poorly differentiated or high-risk cancers were those with a Gleason scale ≥7. In relation to age at diagnosis, we categorized cases <60 years of age as ‘early-onset’ and those ≥60 years of age as ‘elderly-onset’.
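The groupings described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, and the direction of each comparison (for example, ≥7 for high-risk Gleason and ≥60 years for elderly-onset) is an assumption inferred from the complementary "<" category stated in the text.

```python
# Illustrative sketch only; names are hypothetical and not from the study.
# Dichotomous cutoffs listed in the text (below cutoff vs at-or-above cutoff).
CUTOFFS = (18, 19, 21, 23)


def three_level_category(cag_repeats: int) -> str:
    """Three-level grouping; the lowest level is assumed to be <=18."""
    if cag_repeats <= 18:
        return "<=18"
    if cag_repeats <= 25:
        return "19-25"
    return ">25"


def below_cutoff(cag_repeats: int, cutoff: int) -> bool:
    """True when the repeat count falls on the 'short' side of a cutoff."""
    return cag_repeats < cutoff


def gleason_group(gleason: int) -> str:
    """Gleason <7 is low-risk; 7 or more is assumed to be high-risk."""
    return "low-risk" if gleason < 7 else "high-risk"


def onset_group(age_at_diagnosis: int) -> str:
    """Age <60 is early-onset; 60 or more is assumed to be elderly-onset."""
    return "early-onset" if age_at_diagnosis < 60 else "elderly-onset"


if __name__ == "__main__":
    # A made-up subject with 18 repeats, Gleason 8, diagnosed at 57 years.
    print(three_level_category(18), gleason_group(8), onset_group(57))
    print({c: below_cutoff(18, c) for c in CUTOFFS})
```

Keeping each cutoff as a separate boolean mirrors the analysis plan, in which every dichotomy is evaluated in its own model rather than combined.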

Allele and genotype frequencies in all studied groups were estimated using Arlequin v.3.1 software.20 Hardy-Weinberg expectation was calculated by applying the Weir and Cockerham F statistic (FIS W&C), using Genètix ver. 4.05.2 software.21
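As a rough illustration of what a Hardy-Weinberg check measures, the sketch below computes a simple inbreeding coefficient, FIS = 1 − Ho/He, from diploid genotypes. It is a simplified stand-in and not the Weir and Cockerham estimator implemented in the software named above; the function name and the genotypes are invented for illustration, and the input is assumed to come from women only because AR is X-linked.

```python
from collections import Counter


def simple_fis(genotypes):
    """Return 1 - Ho/He for a list of diploid genotypes (allele pairs).

    He is the sample-size-corrected expected heterozygosity,
    2n/(2n-1) * (1 - sum of squared allele frequencies).
    This is a simplified statistic, not Weir and Cockerham's estimator.
    """
    n = len(genotypes)
    total_alleles = 2 * n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    sum_sq = sum((count / total_alleles) ** 2 for count in allele_counts.values())
    expected_het = (total_alleles / (total_alleles - 1)) * (1 - sum_sq)
    if expected_het == 0:
        raise ValueError("Locus is monomorphic; FIS is undefined.")
    observed_het = sum(1 for a, b in genotypes if a != b) / n
    return 1 - observed_het / expected_het


if __name__ == "__main__":
    # Hypothetical CAG repeat genotypes for four women (made-up data).
    example = [(18, 19), (19, 19), (18, 20), (20, 20)]
    print(round(simple_fis(example), 4))
```

Values near zero are what Hardy-Weinberg proportions predict; positive values indicate a deficit of heterozygotes and negative values an excess.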

The crude and adjusted associations between AR CAG repeat length and PC risk were estimated using independent, unconditional logistic regression models for each case type. For each model, we employed AR CAG repeat length as a continuous variable and, separately, evaluated each of the previously mentioned cutoff points. Age at interview was included as a continuous variable in bivariate and multivariable models. In addition to age, we evaluated smoking history and birthplace as potential confounders. Owing to the low prevalence of a familial history of PC among controls, we only report the association between AR CAG repeats and PC for all individuals and for males without a familial history of PC. Cases in this latter analysis were considered as sporadic PC (no family history of PC). The sensitivity and specificity of the cutoffs were evaluated using Receiver Operating Characteristic (ROC) curves. All of the analyses were performed utilizing STATA ver. 14.0 statistical software.
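For a single dichotomous cutoff with no covariates, the odds ratio from an unconditional logistic regression equals the cross-product ratio of the 2×2 table, so the crude estimate and the sensitivity and specificity of a cutoff can be sketched directly. The Python below is an illustration under that assumption, not the STATA analysis used in the study; the function names and the counts are hypothetical, and the interval is a Wald-type 95% CI.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR and Wald CI from a 2x2 table.

    a: cases below the cutoff       b: cases at or above the cutoff
    c: controls below the cutoff    d: controls at or above the cutoff
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def sensitivity_specificity(a, b, c, d):
    """Sensitivity: share of cases below the cutoff.
    Specificity: share of controls at or above the cutoff."""
    return a / (a + b), d / (c + d)


if __name__ == "__main__":
    # Made-up counts for illustration; these are not the study's data.
    a, b, c, d = 20, 20, 90, 210
    print(crude_odds_ratio(a, b, c, d))
    print(sensitivity_specificity(a, b, c, d))
```

Adjusted estimates (for age, smoking history, and birthplace) require fitting the full regression model and cannot be read off a single table in this way.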

Results

At diagnosis, 78.2% of cases were classified as high-grade and poorly differentiated PC (Gleason ≥7), while 26.6% were considered early-onset prostate cancer (EO-PC; data not shown in tables). Consistent with the matched study design, we did not find significant differences in the average ages of cases and controls (65.2±8.9 vs 64.5±9.4 years; P=0.19). Compared with males born in Mexico City, those born in the East-Central region had a nearly threefold (odds ratio, 2.70; 95% CI, 1.64–4.46; P<0.01) increased risk of being cases. The prevalence of former smokers and of a familial history of PC was significantly higher among cases than controls (Table 1).

Table 1 Selected characteristics of the study population according to cases and controls

Allele frequencies

The allele frequency distribution of the CAG polymorphism is shown in Table 2. The most frequent alleles in the three groups were 17, 18, 19, 20 and 21 (average 19.33±2.5, range 10–29). Overall, the studied groups presented high-diversity patterns: the number of different alleles (k) was 15 in cases, 16 in the genome control (GC) and 17 in population controls. Comparison among populations (cases, controls and GC) suggested that the groups were similar (genetic differences less than 1%, P≥0.25). The distribution of this polymorphism, assessed in women from the GC, was in agreement with Hardy-Weinberg equilibrium (P>0.05).

Table 2 Allele frequencies and descriptive statistics parameters regarding the locus studied in cases, controls and genome control

Compared with population controls, PC cases in general (controls vs cases, 19.5±2.5 vs 19.0±2.6; P=0.06) and PC cases classified as poorly differentiated at the time of diagnosis (19.5±2.5 vs 18.9±2.5; P=0.06) had a marginally lower average number of repeats. In contrast, EO-PC cases (18.6±2.2; P=0.02) presented a significantly lower average number of CAG repeats (Figures 1a and c). Birthplace was associated with the number of CAG repeats. Males born in the South region presented on average a greater number of triplets than those born in Mexico City (20.7±3.5; P=0.003); in contrast, subjects born in the North region had a marginally lower average number of CAG repeats (17.8±2.8; P=0.07). The average CAG repeat length among those with a familial history of PC was lower (18.7±3.9; P=0.11), but the difference was not statistically significant (Table 3).

Figure 1

Distribution of CAG repeat length in AR gene, according to prostate cancer cases and controls. *Student's t-test, P-value for cases and controls <0.05; **Student's t-test, P-value for cases and controls=0.06.

Table 3 Androgen receptor (AR) CAG repeats according to selected characteristics of the study population

PC and CAG repeat length

In order to assess a linear association between CAG repeat length and PC, we fitted a model employing CAG repeat length as a continuous variable. Overall, for each one-unit increase in the number of CAG repeats, we observed a reduction in the risk of PC. However, this was only statistically significant for EO-PC (odds ratio, 0.83; 95% CI, 0.71–0.97; P=0.02). For all PC and for Gleason ≥7, the reduction in risk associated with an increasing number of repeats was marginally significant (Table 4). Among the different cutoff points evaluated, the cutoff at 19 repeats, (CAG)19, exhibited the best combination of sensitivity (0.48) and specificity (0.70) for identifying the risk condition (data not shown). Having fewer than 19 CAG repeats was significantly associated with a twofold greater risk of EO-PC (odds ratio, 2.31; 95% CI, 1.14–4.69; P=0.02) and with a marginal increase in the risk of PC in general or of poorly differentiated cancer (Gleason ≥7). The other cutoff points, although associated with a greater risk of PC in general, did not yield statistically significant associations. In subjects without a familial history of PC, the association observed with EO-PC remained statistically significant and of the same magnitude (Table 4).
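Because a logistic model is linear on the log-odds scale, a per-repeat odds ratio can be rescaled to any difference of k repeats by raising it to the power k, and the same applies to its confidence limits. The short Python sketch below illustrates this arithmetic with the EO-PC estimate reported above; the function name is hypothetical, and the calculation assumes that the log-linear trend holds across the range of repeat lengths considered.

```python
def rescale_odds_ratio(or_per_unit, lower, upper, k):
    """Rescale a per-unit odds ratio and its CI to a k-unit difference (k > 0).

    Assumes the log-odds change linearly with the exposure, so that
    OR_k = OR ** k and the confidence limits transform the same way.
    """
    if k <= 0:
        raise ValueError("k must be positive for the limits to keep their order.")
    return or_per_unit ** k, lower ** k, upper ** k


if __name__ == "__main__":
    # Per-repeat estimate for EO-PC reported in the text: 0.83 (0.71-0.97).
    or_3, low_3, high_3 = rescale_odds_ratio(0.83, 0.71, 0.97, k=3)
    print(f"{or_3:.2f} ({low_3:.2f}-{high_3:.2f})")
```

Under this assumption, a difference of three additional repeats corresponds to an odds ratio of about 0.57 (0.36 to 0.91) for EO-PC.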

Table 4 Androgen receptor (AR) CAG repeats length and its association with prostate cancer (PC) using different cutoff points

Discussion

In agreement with previous reports, this study suggests that in Mexican men a lower number of CAG repeats in the gene encoding the AR might be associated with a greater risk of sporadic PC and EO-PC. The cutoff point that best identified subjects at risk for presenting this neoplasm was a (CAG)19 repeat length. Likewise, these results suggest that the differential distribution of CAG repeat length across regions of Mexico could be a potential explanation for the regional distribution of PC across the country observed in the Histopathological National Register of Cancer from 1993 to 2002;22 in that study most of the prostate tumors were reported in Mexico City, followed by the northern, central and southern states of the country.

The results of this study are consistent with the majority of studies conducted in Afro-descendant, Caucasian and Asian populations,9, 10 especially with a study carried out in Sweden, which found that short CAG repeats in the AR gene correlate with young age at diagnosis of sporadic PC.23 Moreover, our results are in agreement with the two studies that have evaluated this association in Hispanic populations. The study carried out in Hispanic residents of San Antonio, Texas, USA, reported that a short CAG repeat length (18-repeat cutoff) was associated with a threefold greater risk for PC.13 This association was stronger when the analysis was restricted to cases of PC aged <65 years (odds ratio, 3.03; 95% CI, 1.27–7.26). The results of the multiethnic cohort in Hawaii and Los Angeles, CA, USA, were also consistent with ours, because we did not find a significant association when using cutoff points similar to those utilized by the abovementioned authors (<22 and <23 CAG repeats).14

The main proposed biological mechanism for the association between shorter CAG repeat length and PC risk is increased receptor activity; however, several studies suggest that CAG repeat length may increase PC risk by mediating androgen-induced TMPRSS2 and ERG proximity.24 About half of human PCs display a TMPRSS2-ERG gene fusion, which is one of the most common genomic aberrations in PC. This gene fusion is associated with a poor PC prognosis.25

Comparison between the subjects included and not included in this analysis (Supplementary Table S1) showed that a higher proportion of included controls did not have a familial history of PC (1.2 vs 3.5%). The external validity of our results is therefore limited to subjects without a familial history of PC, and we were not able to establish whether the magnitude of this association is similar among subjects with a family history of PC. Given that we did not find differences in CAG repeat length between the population controls and the GC, the probability that our results are a consequence of selection bias is low. In addition, although all analyses were adjusted for place of birth as a proxy measure for genetic ancestry, residual confounding remains possible. Some evidence suggests that body mass index is related to the risk for PC and to the number of CAG repeats. However, we did not include body mass index in our analysis because the information we had was collected at the time of the interview and did not adequately represent the subjects’ body composition, mainly among cases, in whom it can be affected by the disease. Likewise, we discarded the possibility that the results observed were the consequence of a differential measurement error in the determination of the number of CAG repeats, because the person charged with carrying out this determination was unaware of case or control status. If a measurement error was present, it would be of the non-differential type and would bias the results toward underestimation of the association.

To our knowledge, this is the first study that examines the role of a lower number of CAG repeats as a risk factor for PC in Mexican males. In addition, findings from this study may help inform the establishment of cutoff points, related to (CAG)19, for determining potential PC risk in this population. A previous small study (68 PC cases vs 48 healthy men) carried out in Mexican men suggests an association between shorter CAG repeat length and later age at PC diagnosis;26 however, some methodological limitations (selection of the study subjects and measurement of CAG repeat length) limit the validity and interpretation of those results. Longitudinal studies are needed to determine and evaluate CAG cutoff points as potential risk factors for developing PC, as well as for disease progression. A high proportion of cancers respond to androgen ablation; however, some of these cancers become resistant to treatment, and it is conceivable that this lack of response could be associated with the presence of this polymorphism. Finally, it is crucial to establish a National Cancer Registry in Mexico, which could provide a comprehensive repository of data to help guide related research and practice guidelines to reduce the risk of PC morbidity and mortality.