Introduction

Prostate cancer is the most common noncutaneous neoplasm among men. According to the American Cancer Society,1 189 000 US men were diagnosed with prostate cancer in the year 2002, and approximately 30 000 US men died from this disease. A clear risk factor for prostate cancer is age, suggesting an important role of exposure to endogenous and/or exogenous factors (eg, hormones or environmental agents). Another known risk factor is ethnicity: prostate cancer is almost two times more common in African Americans than in Caucasians, whereas Asians have the lowest risk.1,2 Finally, there is an increased risk among men with a positive family history of prostate cancer.3 These factors suggest that both genes and environment predispose men to this disease.

Substantial evidence from linkage studies supports the existence of genetic factors for prostate cancer development and progression. A number of chromosomal regions have been linked to prostate cancer, including 1q24–q25, 1q42.2–q43, 1p36, 8p22–23, 16q23, 17p11, 19q, 20q13, and Xq27–28.4 On 17p11, ELAC2 was the first prostate cancer-susceptibility gene cloned.5 Common allelic variants of ELAC2 were shown to be associated with prostate cancer in some studies,6,7 but not all.8,9,10,11 Another candidate prostate cancer-susceptibility gene, RNASEL, was identified in the HPC1 locus on 1q24–25.12 Subsequent studies support a role for both inactivating mutations and variants in RNASEL in prostate cancer risk.13,14,15

Since prostate growth depends on active testosterone,16 and testosterone administration leads to the development of prostate adenocarcinomas in rats,17 studies of genes involved in androgen metabolism have provided further insight into the genetic basis of this disease. Three such genes are CYP17A1, CYP3A4, and SRD5A2. CYP17A1 is involved in the biosynthesis of testosterone in the gonads and adrenals, catalyzing steroid 17α-hydroxylase and 17,20-lyase activities at key points in this process.18 CYP3A4 catalyzes the 6β-hydroxylation of testosterone, suggesting that it may be involved in the metabolism and oxidative deactivation of this steroid.19 The SRD5A2 gene product catalyzes the conversion of testosterone to dihydrotestosterone,20 a highly active form of this androgen.

Numerous previous studies have detected associations between variants in CYP17A1, CYP3A4, and SRD5A2 and prostate cancer.21,22,23,24,25,26,27,28,29,30,31 These studies have focused only on specific individual variants in the candidate genes. However, fully deciphering the impact of these genes on prostate cancer may require the evaluation of multiple variants or haplotypes. Therefore, we present here a two-phase comprehensive analysis of the association between multiple single-nucleotide polymorphism (SNP) genotypes/haplotypes in CYP17A1, CYP3A4, and SRD5A2 and prostate cancer.

Materials and methods

Study population

A total of 1117 men (637 cases and 480 controls) from 506 sibships were included in the study. In total, 92 sibships were concordant (ie, where all included men were diagnosed with prostate cancer, and there was no control sibling) and contained altogether 197 men. The remaining 414 sibships were discordant (ie, at least one control sibling) and contained 440 cases and 480 controls. Altogether, 52 discordant sibships had more than one control sib. Hence, while all of the 1117 study subjects contributed information for calculating allele frequencies, only the 440 cases and 480 controls (920 men total) from discordant families were informative for the conditional logistic regression analysis of association.
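
The subject counts above can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch in Python; the dictionary layout and variable names are illustrative and are not part of the original analysis.

```python
# Counts as reported in the Study population paragraph.
concordant = {"sibships": 92, "cases": 197, "controls": 0}
discordant = {"sibships": 414, "cases": 440, "controls": 480}

total_sibships = concordant["sibships"] + discordant["sibships"]
total_cases = concordant["cases"] + discordant["cases"]
total_controls = concordant["controls"] + discordant["controls"]
total_men = total_cases + total_controls

# Only discordant sibships contribute to the conditional logistic regression.
informative_men = discordant["cases"] + discordant["controls"]

print(total_sibships, total_cases, total_controls, total_men, informative_men)
```

The totals agree with the figures quoted in the text: 506 sibships, 637 cases, 480 controls, 1117 men overall, and 920 men in discordant sibships.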

The study population was recruited between January 1998 and January 2001 from the major medical institutions in the greater Cleveland area and from the Henry Ford Health System in Detroit. The study was approved by the collaborating institutions' Review Boards, and informed consent was obtained from all participating men. Characteristics of the study population have been described elsewhere.13

Men diagnosed with histologically confirmed prostate cancer at age 73 or younger were invited to join the study if they had a living unaffected brother who was either older than the proband, or at most 8 years younger than the age at diagnosis of the proband. This age restriction was selected in an attempt to increase the potential for finding a genetic association with disease and to substantiate that the controls were not unaffected simply due to being of a younger age. To help confirm that the controls were not diseased, we tested the prostate-specific antigen (PSA) levels in their blood (see Discussion). Information on the cases' Gleason score and tumor stage (TNM) was determined from their medical records. The study population was 90% Caucasian (European American), with the remainder primarily African American (9%).

SNP discovery

A complementary approach was used to identify SNPs within CYP17A1, CYP3A4, and SRD5A2. We performed SNP discovery by sequencing individuals from two populations: (1) the Coriell Polymorphism Discovery Resource (Collins et al32; Coriell Cell Repositories, Camden, NJ, USA); and (2) prostate cancer case–control sibships. The Coriell population includes 24 individuals: six European Americans, six African Americans, three Hispanic Americans, three Native Americans, and six Asian Americans. The prostate cancer case–control sibships used for SNP discovery were two randomly selected subsets of our entire study population, and included 67 cases and 43 controls for CYP17A1 and CYP3A4, and 51 cases and 41 controls for SRD5A2. There was no overlap between the two subgroups. Of the 110 individuals sequenced for CYP17A1 and CYP3A4, 106 were European American, two were Hispanic American, and two were African American. Of the 92 individuals sequenced for SRD5A2, 84 were European American and eight were African American.

For sequencing, PCR primers amplifying most of the coding regions, splice sites, 5′ and 3′ regions, and parts of introns of CYP17A1, CYP3A4, and SRD5A2 were designed using the Primer3 program (http://www.genome.wi.mit.edu/cgi-bin/primer/primer3.cgi). PCR products were sequenced using energy DYEnamic™ ET Terminator Kit on the MegaBACE™ DNA Analysis System (Amersham Biosciences, Sunnyvale, CA, USA) by standard protocols. Sequence analysis was performed by assigning quality values (Phred, University of Washington, Seattle, Washington), assembling contigs (Phrap, University of Washington), automated identification of candidate heterozygote SNPs (PolyPhred, University of Washington), automated identification of candidate homozygote SNPs (High Quality Mismatch, Amersham Biosciences), and by operator confirmation (Consed, University of Washington). All polymorphisms were confirmed by single-nucleotide primer extension.

In addition, we searched the following databases for known SNPs: dbSNP (http://www.ncbi.nlm.nih.gov/SNP/), GeneSNPs (http://www.genome.utah.edu/genesnps), the Human Cytochrome P450 Allele Nomenclature Committee (HCANC) (http://www.imm.ki.se/CYPalleles/), the Human Gene Mutation Database (HGMD) (http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html), and HGVbase (http://hgvbase.cgb.ki.se/) (formerly HGBASE at http://hgbase.interactiva.de/).

Genotyping

For Phase I of the study, we genotyped the SNPs discovered in CYP17A1, CYP3A4, and SRD5A2 among 276 men (including the 92 and 110 men used in SNP discovery) from case–control sibships. Altogether, 276 men were selected for Phase I to allow for detection of SNPs with low allele frequencies, and because they could conveniently be fitted into three 96-well plates along with controls. These men included 153 cases and 123 brother controls, 70% European American and 30% African American. We then used the information from the 276 men to determine initial case–control allele frequency differences and haplotype-tagging SNPs. From these results, we determined which SNPs should be genotyped in the remainder of our study population (ie, Phase II of the study).

Genotyping of SNPs was primarily performed utilizing the MegaBACE SNuPe™ Genotyping Kit (Amersham Biosciences) on the MegaBACE™ DNA Analysis System (Amersham Biosciences). The Primer3 program was used to design PCR primers to amplify regions containing the SNPs of interest (see the Online Supplementation (Table 4) for PCR primers and conditions). PCR fragments were purified with 0.5 U of Shrimp Alkaline Phosphatase (Amersham Biosciences) and 10 U of Exonuclease I (Amersham Biosciences) at 37°C for 40 min and at 85°C for 15 min. The single base extension (SBE) reaction was set with 1 pmol of HPLC-purified SBE primer, 2–4 μl of SNuPe Premix (Amersham Biosciences), 2–4 μl of sterile water, and 1 μl of purified PCR fragment, and incubated for 25 cycles of 96°C for 10 s, 50°C for 5 s, and 60°C for 10 s. For Phase I of the study, SNuPe reactions were set in 96-well plates at 10 μl volume and purified with AutoSeq™96 Plates (Amersham Biosciences) prior to injecting into the MegaBACE1000 system. For Phase II of the study, SNuPe reactions were set in 384-well plates at 5–6 μl volume, diluted with 3–4 μl of sterile water and purified with 1 U of Shrimp Alkaline Phosphatase (Amersham Biosciences) at 37°C for 45 min and at 85°C for 15 min prior to injecting into the MegaBACE4000 system. In cases where low signal was anticipated (due to faint PCR), SNuPe reactions were desalted using a custom 384-well filter plate incorporating modified size-exclusion technology (Millipore Corporation, Billerica, MA, USA). The Scierra Genotyping LWS™ system (Amersham Biosciences) was utilized for the tracking and management of samples and laboratory activity for Phase II of the study.

Specific software (SNPriDe; Amersham Biosciences) was developed for the automated design of SBE primers. Using a purified PCR fragment containing the SNP of interest as a template, a third, internal primer was designed so that the 3′ end annealed adjacent to the polymorphic base pair, and during the SBE reaction a fluorescently labeled dideoxynucleotide (terminator) was added onto the primer. The signal data were automatically processed, outputting the maximum-likelihood SNP genotypes (MegaBACE SNP Profiler, Amersham Biosciences). The system includes a user interface for editing and verification. Three SNPs, SRD5A2_SNP20 (V89L), SRD5A2_SNP22 (A49T), and CYP17_SNP29 (−34T>C) were analyzed by restriction enzyme digestion (see Online Supplementation for details).

Haplotype estimation

Alleles within each of the three candidate genes were in strong linkage disequilibrium with one another. Thus, for each gene, haplotypes were estimated using the resulting genotypes, by disease status and within major ethnic groups using the software PHASE.33 This program uses Markov chain Monte Carlo to estimate haplotypes, imputes information for missing genotypes, and incorporates a statistical model for the distribution of unresolved haplotypes based on coalescent theory.33 For the estimation of haplotypes, PHASE uses individual level information and provides the most likely haplotypes. Haplotype-tagging SNPs were determined by custom Perl scripts (Amersham Biosciences). We first determined haplotypes and haplotype-tagging SNPs among the 276 men genotyped for Phase I of the study, where tagging SNPs were those necessary to define the most common haplotypes (eg, ≥5%). After completing genotyping on the entire study population (Phase II of the study), we used the resulting data to estimate haplotypes.
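
To illustrate what selecting haplotype-tagging SNPs involves, the following Python sketch finds the smallest subset of SNP positions that still distinguishes all common haplotypes. It is not the custom script used in the study, whose algorithm is not described here; the exhaustive search, the function name tagging_snps, and the toy haplotypes and frequencies are all assumptions made for illustration.

```python
from itertools import combinations


def tagging_snps(haplotypes):
    """Return the smallest set of SNP column indices that keeps all haplotypes distinct."""
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for cols in combinations(range(n_snps), size):
            projected = {tuple(h[c] for c in cols) for h in haplotypes}
            if len(projected) == len(haplotypes):
                return list(cols)
    # Reached only if the input contains duplicate haplotypes.
    return list(range(n_snps))


# Hypothetical haplotypes over four SNPs (0 = major allele, 1 = minor allele)
# with made-up frequencies; these are not data from the study.
haplotype_freqs = {"0000": 0.48, "0110": 0.25, "1100": 0.15, "0011": 0.09, "1001": 0.03}

# Keep haplotypes at or above the 5% threshold, as in Phase I.
common = [h for h, freq in haplotype_freqs.items() if freq >= 0.05]
tags = tagging_snps(common)
print(common, tags)
```

An exhaustive search like this is only practical for the small numbers of SNPs per gene considered here; with many more SNPs a greedy or heuristic selection would be needed.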

Association analyses

Altogether, 414 discordant sibships (440 cases and 480 controls) were included in the analyses. We first compared case versus control allele frequencies within major ethnic groups. Then we evaluated the association between the resulting genotypes/haplotypes and prostate cancer risk by calculating odds ratios (OR, estimates of relative risk) and 95% confidence intervals from conditional logistic regression with family as the matching variable, and a robust variance estimator that incorporates familial correlations. This is a standard approach for analyzing sibling-matched case–control data, although sibling sets without any controls do not contribute any information (197 cases total here).34 In our analyses of CYP17A1, CYP3A4, and SRD5A2, we used a dominant coding, which assumes that the relative risk of carrying one or two copies of a variant allele (or haplotype) is equivalent. The most likely haplotypes were used in the association analyses. The nontagging SNPs (included due to interesting initial results in Phase I) were included in both the genotype- and haplotype-level analyses. All our analyses were stratified by ethnicity.
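
The conditional logistic regression described above can be sketched in Python for a single dominantly coded variant. This is a simplified illustration under stated assumptions, not the analysis code used in the study: it omits age adjustment, uses a model-based standard error rather than the robust variance estimator, and the function names and toy sibships are hypothetical.

```python
import math
from itertools import combinations


def dominant(genotype):
    """Dominant coding: 1 if the man carries one or two variant alleles, else 0."""
    return 1 if genotype >= 1 else 0


def fit_conditional_logit(strata, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of one log odds ratio; strata = [(x_values, case_flags), ...]."""
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x, is_case in strata:
            m = sum(is_case)
            if m == 0 or m == len(x):
                continue  # sets without both cases and controls carry no information
            observed = sum(xi for xi, c in zip(x, is_case) if c)
            # Every way of choosing m "cases" within the sibship.
            totals = [sum(s) for s in combinations(x, m)]
            weights = [math.exp(beta * t) for t in totals]
            norm = sum(weights)
            mean = sum(w * t for w, t in zip(weights, totals)) / norm
            second = sum(w * t * t for w, t in zip(weights, totals)) / norm
            score += observed - mean
            info += second - mean ** 2
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    return beta, se


# Hypothetical sibships: (variant-allele counts, case flags), case listed first.
toy_sibships = (
    [([1, 0], [1, 0])] * 6      # case carries the variant, brother does not
    + [([0, 2], [1, 0])] * 3    # brother carries the variant, case does not
    + [([1, 1], [1, 0])] * 2    # both carry it: uninformative under dominant coding
)
strata = [([dominant(g) for g in genos], flags) for genos, flags in toy_sibships]
beta, se = fit_conditional_logit(strata)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(round(odds_ratio, 2), [round(v, 2) for v in ci])
```

For one-to-one matched pairs the conditional estimate reduces to the ratio of discordant pairs, here 6/3 = 2, which provides a simple check on the fit. The enumeration over all case assignments is feasible only because sibships are small.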

To control for potential confounding, age was adjusted for in all regression models. In addition to looking at the main effects of each SNP or haplotype, we also stratified the analyses by the case's disease aggressiveness, where high aggressiveness was defined by TNM stage ≥T2B or Gleason score ≥7, and low aggressiveness by TNM stage <T2B and Gleason score <7.

Results

Phase I: SNP discovery

We detected 34 novel SNPs: 11 in CYP17A1, 18 in CYP3A4, and five in SRD5A2. In addition, we ‘rediscovered’ 11 SNPs from the public databases. Including these 11 SNPs, we selected a total of 53 SNPs from the databases: 18 in CYP17A1, 15 in CYP3A4, and 20 in SRD5A2. These were chosen based on the intention to obtain an even distribution of SNPs across the genes and the availability in the databases at that time (January–April 2001). In all, 21 SNPs were chosen from dbSNP, 27 from GeneSNPs, 12 from HGMD, eight from HGVbase, and two from HCANC (the total number of SNPs listed here exceeds 53 as several SNPs were present in multiple databases). Table 1 lists all the 87 SNPs (34 novel, 53 from databases), with their origins, exact locations, and allele frequencies.

Table 1 The origins, nucleotide changes and allele frequencies of single nucleotide polymorphisms (SNPs) in CYP17A1, CYP3A4, and SRD5A2 observed in the Coriell Diversity set (CDS), European Americans, and African Americans

Among the 34 novel SNPs, 26 (76%) were discovered in both the Coriell and case–control populations. Three SNPs were only observed in the Coriell data, and the remaining five were found only in the prostate cancer sibships. Of these five, three were relatively rare (allele frequencies 0.2–1.5%), suggesting that they may not have been discovered in the Coriell population simply due to its small sample size (n=24). Nevertheless, the other two SNPs that were only found in the prostate cancer sibships (CYP3A4_SNP12 and CYP17_SNP42) showed higher allele frequencies (7.5 and 21.8%, respectively), suggesting that they might be specific to the prostate cancer case–control population.
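
The sample-size argument above can be made concrete with a rough calculation: the chance that an allele of a given frequency appears at least once among the 48 chromosomes of 24 individuals. The Python sketch below assumes independently sampled chromosomes and, unrealistically for an ethnically diverse panel, the same allele frequency in the Coriell set as in the sibships; the function name is illustrative.

```python
def detection_probability(allele_freq, n_individuals):
    """Chance that at least one of 2n sampled chromosomes carries the allele."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


# Allele frequencies quoted in the text for SNPs found only in the sibships.
for freq in (0.002, 0.015, 0.075, 0.218):
    print(f"{freq:.3f}: {detection_probability(freq, 24):.3f}")
```

Under these assumptions, alleles at 0.2–1.5% would frequently be missed in 24 individuals, whereas alleles at 7.5 or 21.8% would almost always be seen, which is consistent with the interpretation given above.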

Phase I: genotyping and haplotyping

The 87 SNPs were genotyped in a total of 276 males from prostate cancer sibships (29 in CYP17A1, 33 in CYP3A4, and 25 in SRD5A2). Altogether 11 SNPs gave ambiguous genotyping results. This might have been due to unoptimized genotyping reactions or primer self-priming due to secondary structures and nonspecificity of PCR and/or SNuPe primers, especially within the cytochrome P450 gene family. Of the remaining 76 SNPs, a similar percentage of those novel (41%, or 12/29) and public (38%, or 18/47) had allele frequencies ≥10%. However, 19/47 (40%) of the public SNPs were found to be monoallelic in the 276 men, suggesting that they are either extremely rare, population specific, or artifacts.

In the light of these results, we excluded the 11 SNPs with ambiguous genotype results, the 19 SNPs that appeared monoallelic in all samples tested, and an additional four (three novel and one public SNP) that were seen only in the Coriell Diversity Set but not in the prostate cancer sibships. We also excluded one SNP because >15% of data were missing (due to a low success rate for PCR and SNuPe reaction). Finally, we excluded 12 SNPs because their minor allele frequencies were less than 5% in all of the following four subgroups: European American controls, European American cases, African American controls, and African American cases (see Table 1 for details). Following these exclusions, a total of 40 SNPs remained for consideration in the Phase II association study (14 in CYP17A1, 16 in CYP3A4, and 10 in SRD5A2) (Table 1).
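
The exclusion steps above can be tallied as follows. This Python sketch simply replays the counts reported in the text; the list structure and labels are illustrative.

```python
candidate_snps = 87  # 34 novel + 53 from public databases

# (reason, number excluded), in the order given in the text.
exclusions = [
    ("ambiguous genotyping results", 11),
    ("monoallelic in all samples tested", 19),
    ("seen only in the Coriell Diversity Set", 4),
    (">15% missing data", 1),
    ("minor allele frequency <5% in all four subgroups", 12),
]

remaining = candidate_snps
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"{reason}: -{n_excluded} -> {remaining} remaining")

# Per-gene breakdown of the SNPs carried forward, as reported.
per_gene = {"CYP17A1": 14, "CYP3A4": 16, "SRD5A2": 10}
print(remaining, sum(per_gene.values()))
```

Both totals come to 40, matching the number of SNPs reported as remaining for consideration in Phase II.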

Using the preliminary genotype information, haplotypes estimated with a frequency ≥5% in at least one of the four major subgroups (ie, European American controls, European American cases, African American controls, African American cases) were identified. Each gene had a single ‘common’ haplotype, with a frequency ranging between 42 and 51% (not shown). Haplotype-tagging SNPs were identified and used as a basis for inclusion in Phase II of the study. In addition, nontagging SNPs exhibiting suggestive case versus control allele frequencies were considered (Table 1). Altogether, 24 SNPs were selected for Phase II.

Phase II

The 24 tagging and suggestive SNPs were genotyped in an additional 841 men, giving information on a total of 1117 individuals for Phase II. Case and control allele frequencies by ethnic groups are presented in Table 1. Haplotypes estimated with a frequency ≥3% in at least one of the four major subgroups of the study population were identified. The major haplotypes for CYP17A1, CYP3A4, and SRD5A2 along with their frequencies are presented in Figure 1.

Figure 1 Major haplotypes for (a) CYP17A1, (b) CYP3A4, and (c) SRD5A2. Solid black triangles refer to the locations of novel SNPs, while white triangles denote the locations of public SNPs. All haplotypes with frequency ≥3% in at least one of the four subgroups (European American controls, European American cases, African American controls, African American cases) are given in the center of each panel, along with their case and control frequencies. EA, European American; AA, African American; the composite haplotype refers to all the remaining rare haplotypes pooled together.

In our association analyses, we did not detect any associations between CYP17A1 genotypes/haplotypes and prostate cancer. When looking at CYP3A4, we found that CYP3A4_SNP1 was inversely associated with prostate cancer (OR=0.53, 95% CI=0.29–0.99; P-value=0.05) (Table 2a). Furthermore, our haplotype analysis revealed an inverse association between CYP3A4_Hap4 and prostate cancer (OR=0.46, 95% CI=0.21–1.02; P-value=0.05) (Table 3a). We also found that two SNPs in SRD5A2 were positively associated with prostate cancer: SRD5A2_SNP26 (OR=1.57, 95% CI=1.08–2.30; P-value=0.02), and SRD5A2_SNP20 (V89L) (OR=1.56, 95% CI=1.08–2.25; P-value=0.02) (Table 2a). These SNPs, however, were in strong linkage disequilibrium.
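
As a rough consistency check on results reported in this form, an approximate two-sided P-value can be recovered from an odds ratio and its 95% confidence interval. The Python sketch below assumes a symmetric Wald interval on the log-odds scale; the reported values come from a robust variance estimator and are rounded, so only approximate agreement should be expected. The function name is illustrative.

```python
import math


def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate standard error, z statistic and two-sided P from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


# Estimates quoted in the text.
reported = {
    "CYP3A4_SNP1": (0.53, 0.29, 0.99),
    "SRD5A2_SNP26": (1.57, 1.08, 2.30),
}
for name, (or_, lo, hi) in reported.items():
    se, z, p = wald_p_from_ci(or_, lo, hi)
    print(f"{name}: se={se:.3f}, z={z:.2f}, p={p:.3f}")
```

The recovered values are close to the reported P-values of 0.05 and 0.02, respectively, allowing for rounding of the published estimates.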

Table 2 (a) All nonstratified association results between CYP17A1, CYP3A4, and SRD5A2 variants and risk of prostate cancer among cases and sibling controls; (b) Statistically significant allele associations obtained from analysis stratified by aggressiveness

Table 3 (a) All nonstratified haplotype association results for CYP17A1, CYP3A4, and SRD5A2 (see Figure 1 for details of haplotypes); (b) Statistically significant haplotype associations obtained from analysis stratified by high aggressiveness (ie, high TNM stage or Gleason score) and low aggressiveness (ie, low TNM stage and Gleason score)

When we stratified the study population by high and low aggressiveness of prostate cancer, several interesting associations emerged (see Tables 2b and 3b). First, five SNPs in CYP3A4 showed statistically significant associations with low aggressiveness, four of them showing an inverse association: CYP3A4_SNP11 (CYP3A4*1B) (OR=0.20, 95% CI=0.06–0.67; P-value=0.009), CYP3A4_SNP47 (OR=0.19, 95% CI=0.06–0.62; P-value=0.006), CYP3A4_SNP1 (OR=0.21, 95% CI=0.05–0.86; P-value=0.03), and CYP3A4_SNP15 (OR=0.41, 95% CI=0.22–0.79; P-value=0.007). One of the five SNPs, CYP3A4_SNP25, showed positive association with low aggressiveness (OR=6.54, 95% CI=0.99–43.10; P-value=0.05). Second, we observed an inverse association between CYP3A4_Hap4 and low aggressiveness (OR=0.06, 95% CI=0.008–0.50; P-value=0.009) (Table 3b). Finally, we detected an inverse association between SRD5A2_Hap3 and high aggressiveness (OR=0.52, 95% CI=0.29–0.91; P-value=0.02) (Table 3b).

Discussion

The motivation for studying the relation between prostate cancer and the three candidate genes involved in the testosterone biosynthetic pathway (CYP17A1, CYP3A4, and SRD5A2) comes from several observations. First, the prostate is an androgen-dependent organ. Second, prostate cancer typically strikes at an old age – after decades of exposure to testosterone. Third, men castrated at an early age do not develop prostate cancer. And, finally, African Americans experience earlier puberty,35 higher serum levels of total testosterone,36 and higher incidence of prostate cancer.

Our study did not detect any associations with CYP17A1, including the previously reported association between prostate cancer and the −34 T>C promoter SNP in CYP17A1 (CYP17_SNP29 in this study).22,25,26,27,29,31 There are also other negative reports for this promoter SNP,37,38,39 making its role in the development of prostate cancer controversial. We did, however, detect associations with a number of CYP3A4 genotypes and haplotypes. Interestingly, all but one of the statistically significant findings in CYP3A4 showed protective effects with the associated minor alleles.

These included CYP3A4*1B (CYP3A4_SNP11 in our study),21 which was inversely associated when we stratified the analysis by low disease aggressiveness, as previously reported for this population.40 Note that most of the other positive associations for CYP3A4*1B have come from case-only studies, when comparing men with more aggressive to those with less aggressive disease.21,24 We also found CYP3A4_Hap4 to be inversely associated with low aggressiveness. CYP3A4_Hap4 differs from the most common haplotype (Hap1) by seven SNPs, including SNP47, SNP11, SNP1, and SNP15, all of which showed inverse genotype-level associations in our stratified analyses. Moreover, the CYP3A4_Hap4 association was more statistically significant than any of the genotype-level associations. Since the associated SNPs are not necessarily exclusive to the associated haplotype, it is difficult to discern which may be the causal variant, or whether they may all simply be in linkage disequilibrium with – and on the same haplotype as – the causal, yet unknown, allele. Furthermore, since many of our SNPs are in linkage disequilibrium, some associations we saw may not represent independent effects, but rather reflect association with another linked SNP.

In addition to the findings in CYP3A4, we detected positive associations between prostate cancer risk and two SNPs in SRD5A2. One of these is a novel association (SRD5A2_SNP26), whereas the other (SRD5A2_SNP20, or V89L) has already been reported in the current and previous populations (30; personal communication). However, due to the extremely high linkage disequilibrium between the two SNPs, we cannot distinguish which (if either) may be causal for disease. We did not confirm the previously reported23,28 association between prostate cancer and SRD5A2_SNP22 (A49T), possibly due to the low frequency (3.6%) of this SNP. Finally, we observed an inverse association between the high aggressiveness of prostate cancer and the SRD5A2_Hap3; this differs from the most common haplotype (Hap1) by two SNPs, both of which showed suggestive case–control differences in Phase I (data not shown).

The complementary approach used here to discover SNPs within the three candidate genes entailed resequencing a diversity panel and a disease-specific population, and searching in public databases. Using a diverse population in SNP discovery efforts is important because it should lead to the detection of a large number of SNPs. On the other hand, using disease-specific populations may reveal mutations not seen in a diversity set. By using both types of populations, we were able to discover 34 novel SNPs in the three genes. Moreover, studying these populations was crucial because more than one-third of the SNPs selected from public databases were monoallelic in our study group.

In our study, we included 53 SNPs from the public databases; 19 of them did not show up in our population. Furthermore, we did not have sequence coverage for 15 of the 34 biallelic public SNPs. Of the remaining 19 public SNPs, we rediscovered 11 (58%). We missed three public SNPs that had a reasonable allele frequency even though we had sequence coverage for those SNPs. In addition, we missed five public SNPs with very low minor allele frequencies. Several of these missed SNPs were located at the very ends of our contigs, making it less likely that we had sequence coverage in both directions. We typically followed through only variants that were seen in both directions.

We restricted our consideration to SNPs with allele frequencies ≥5% and haplotypes with frequency ≥3% in at least one of the following subpopulations: European American controls, European American cases, African American controls, and African American cases. This restriction was made because we would have severely limited statistical power to detect associations at low frequencies. We used a slightly lower frequency cutoff for haplotypes to try and distinguish those that might be carrying a causal variant.

To help confirm the controls' disease status, we determined the PSA levels in their blood. If they had PSA levels above 4 ng/ml, they were informed of this and advised that they should schedule an appointment with their primary care physician to further evaluate this test result. We retained such individuals in the study as controls unless a subsequent diagnosis of prostate cancer was made, at which time they were reclassified as cases. Keeping them in the study is important because automatically excluding men with elevated PSA levels regardless of their ultimate prostate cancer status can lead to biased estimates of association.41,42

Since prostate cancer risk varies by population, we adjusted our analyses by ethnicity (ie, European American and African American). Unfortunately, the relatively small number of African Americans studied here (96 total) resulted in unstable estimates of association within this ethnic group. Nevertheless, by using a sibling-based study design, we are assured that our controls have been ascertained from the cases' genetic source population, excluding the potential for bias due to population stratification.43

A two-phase design such as used here allows one to include a large number of candidate SNPs (87 here) and initially screen them in a relatively small sample set (276 men here). Once the relevant SNPs are identified (24 here), resources can be focused on genotyping a larger sample set (841 additional men here) in order to achieve adequate statistical power to detect the potential associations. This reveals the major haplotypes with reasonable resources, and hence may be a valuable approach for future studies. The number of subjects studied in each phase, however, will depend upon the haplotype structure of one's chromosomal region of interest, the expected frequency of causal genotypes or haplotypes, and the magnitude of their impact on disease.

In summary, we have detected a number of intriguing associations between prostate cancer and genotypes/haplotypes in CYP3A4 and SRD5A2. These results need to be confirmed in other, independent and ethnically diverse populations. If confirmed, functional studies would be valuable for deciphering the potential causal impact of these SNPs on prostate cancer. Deciphering which of these truly impact disease susceptibility would prove invaluable for screening purposes, and finding aggressiveness genes may provide important information about the most appropriate recommended course of treatment among men already diagnosed with prostate cancer.