Introduction

In most Western countries, prostate cancer (PrCa) is one of the most commonly diagnosed cancers in men. In Finland in 2009, there were over 4500 cases diagnosed and 785 deaths,1 making it the most commonly diagnosed cancer and the second most common cause of cancer death. There has been a concerted effort over many years to understand the mechanisms that drive the development and progression of prostate tumors as part of an attempt to improve detection and therapeutic interventions.

Family history has long been known to be a major risk factor2, 3, 4, 5, 6 and many genetic studies have attempted to identify genetic variants that predispose men to development of the disease or contribute to the aggressiveness of the tumor. Many different studies consistently report PrCa as highly heritable.5, 7, 8, 9, 10 At least 15 different loci have been linked to hereditary PrCa (HPC) through linkage analysis in highly aggregated families. So far, a few genes have been positively identified in the search for high-penetrance PrCa susceptibility loci, but the evidence does not suggest that risk alleles in these genes explain large proportions of either HPC or nonhereditary PrCa because the risk alleles at these loci appear to be quite rare. HPC1/RNASEL (1q23-25),11 HPC2/ELAC2 (17p)12 and MSR1 (8p22-23)13 are genes with rare, high-penetrance risk alleles that have been found through sequencing under linkage peaks in HPC families. Multiple other risk loci, such as PCAP (1q42.2-43),14, 15 CAPB (1p36),15, 16 MYC (8q24),17, 18, 19 HPC20 (20q13)20 and HPCX1 (Xq27-28),21 have been implicated through linkage analysis in HPC families.

Common, low-penetrance polymorphisms on 3q, 8q, 10q, 11q, 17q, 19q and Xp have been consistently and repeatedly detected in genome-wide association studies (GWASs).17, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32 However, more work remains to be done to fully elucidate the role of heredity in the complex disease of PrCa. The causative genes responsible for these associations are still unknown.

Ordered subset analysis (OSA) is a widely used technique to address the genetic heterogeneity of many complex diseases and traits and to allow detection of gene–gene interactions.33 We have previously reported genome-wide linkage analysis of hereditary PrCa in 69 highly aggregated Finnish families,34 where we replicated a previously reported linkage35, 36 to HPC9 on 17q21-q22 and identified a novel locus on 2q37.2. Here we describe a secondary analysis of these data using OSA to condition on linkage to either or both linkage peaks to attempt to identify additional loci linked to this disease. Recently, functional variants in the HOXB13 gene37 have been proposed as candidates for the causal risk alleles in the 17q21-q22 region, making it quite important to condition on linkage to this region in the search for additional loci important to familial PrCa risk.

Materials and methods

Families and genotyping

Sixty-nine multiplex Finnish families, all Caucasian, were included in the study. All of the 69 families had at least three confirmed cases of PrCa and 6 out of 69 families had ≥5 affected members. The detailed description of the families, our sample collection protocol and confirmation of diagnoses are presented elsewhere.38, 39 A total of 54 families were genotyped with microsatellite markers. Forty-four of the families from the microsatellite study, plus an additional 15 previously ungenotyped families, were genotyped with SNP markers. Details regarding the DNA preparation, PCR conditions and allele-scoring techniques for the markers are described elsewhere.38

Linkage analyses

Primary, nonparametric linkage (NPL) analyses were performed in GENEHUNTER-PLUS.40, 41 The X-chromosome version of GENEHUNTER-PLUS (v1.3) was used in X-chromosome analyses. The nonparametric affecteds-only linkage analyses included NPL scores from GENEHUNTER-PLUS using the ‘all’ option and allele-sharing LOD scores as developed by Kong and Cox40 (performed by the ASM program in conjunction with GENEHUNTER-PLUS).

OSA

To address the apparent genetic heterogeneity observed by many studies of HPC,34 we used OSA to condition on NPL scores at one or both of the two linked loci. NPL methods are powerful for detecting loci that contribute to risk in a large proportion of families, but less powerful when the proportion of linked families is small. By conditioning on one or both of our two already identified loci, we reduce heterogeneity and so increase power to detect linkage to other loci. Multipoint NPL scores calculated with GENEHUNTER-PLUS software were used as the ranking covariate in OSA to take advantage of the extended pedigree structure, ranking families by familywise NPL score.

The OSA program first arranged the familywise NPL scores in ascending order (low to high) and then in descending order (high to low) to find a subset of families that maximized the evidence of linkage to other regions of the genome.33 The ‘optimal slice’ that gave the maximum OSA LOD score determined a subset of ‘adjacent’ families in the covariate rankings (not necessarily including the end points), thus allowing families with extremely low or high covariate scores to be excluded from the linked subset. The OSA program generates this subset by taking the more significant of the two ordered ranking subsets and then sequentially dropping families from the top of the subset to see if an even more significant subset can be found, which does not necessarily include the tails of the covariate distribution.33

The OSA results were then graphically compared with the multipoint GENEHUNTER-PLUS results, and the empirical P-values for the change in the LOD score were examined. An empirical probability was calculated to assess whether the observed OSA LOD score (based on ordering the families by the covariate) was significantly increased over the OSA LOD obtained from a random reordering of families. This empirical probability was computed using a permutation test, which randomized the order of the families 10 000 times to determine the proportion of times the randomly ordered families gave an OSA LOD score greater than the observed OSA LOD score. This permutation test examines the hypothesis that the covariate-defined subset yields a higher LOD score than the randomly assigned family subsets.33 The OSA method maximizes the OSA LOD score over the subsets. Therefore, the maximum LOD score will always be at least as large as the overall LOD score in all families. This means that the distribution of the maximum OSA LOD score cannot be the same as the distribution of a LOD score that has not been maximized. Evaluation of the OSA LOD scores for a subset has to be done in the context of the evidence for linkage in the entire sample and must account for the selection of a subset of the data. By evaluating the change in evidence in favor of linkage, we can account for both the baseline evidence and the nonrandom subsetting of the data.
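
The ordering and permutation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the OSA program itself. It assumes that each family's LOD contribution at each map position is available as a row of a hypothetical array `family_lods` and that these contributions can be summed across families. It considers only nested subsets taken from one end of the ranking (the optimal-slice refinement is omitted), and it counts ties as exceedances. All function and parameter names are illustrative.

```python
import numpy as np


def osa_max_lod(family_lods, order):
    """Return (max LOD, subset size) over nested subsets taken in the given order."""
    # Row k holds the summed LOD curve of the first k + 1 ranked families.
    cumulative = np.cumsum(family_lods[order], axis=0)
    # Best map position for each nested subset.
    subset_max = cumulative.max(axis=1)
    best = int(subset_max.argmax())
    return float(subset_max[best]), best + 1


def osa_permutation_test(family_lods, covariate, ascending=True,
                         n_perm=10_000, seed=0):
    """Return (OSA LOD, delta LOD, subset size, empirical P-value)."""
    family_lods = np.asarray(family_lods, dtype=float)
    covariate = np.asarray(covariate, dtype=float)

    # Rank families by the covariate (low to high, or high to low).
    order = np.argsort(covariate if ascending else -covariate, kind="stable")
    observed, n_subset = osa_max_lod(family_lods, order)

    # Baseline: maximum LOD when all families are included.
    baseline = float(family_lods.sum(axis=0).max())

    # Random reorderings of the families. The baseline is the same for every
    # ordering, so comparing OSA LODs is equivalent to comparing delta LODs.
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        permuted, _size = osa_max_lod(family_lods, rng.permutation(len(covariate)))
        if permuted >= observed:  # assumption: ties count as exceedances
            exceed += 1
    return observed, observed - baseline, n_subset, exceed / n_perm
```

Because the full sample is always one of the nested subsets, the returned OSA LOD can never fall below the baseline, which is why the change in LOD, rather than the OSA LOD itself, is the quantity evaluated.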

We analyzed three variables: individual family NPL scores for the peak on 2q37, individual family NPL scores for the peak at 17q21-q22 and a maximum score, which was calculated as the larger of the two NPL scores. A P-value was considered significant at P≤0.025, to account for the two opposing ranking methods for the covariate.33 Family-based association analyses were also performed (Supplementary Methods).
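
The three ranking covariates and the significance threshold can be written out as a short Python sketch. The function names and input arrays are hypothetical, and the nominal level of 0.05 is an assumption inferred from the stated threshold of 0.025 for two ranking directions.

```python
import numpy as np

NOMINAL_ALPHA = 0.05   # assumption: nominal level before adjustment
N_ORDERINGS = 2        # ascending and descending covariate rankings
THRESHOLD = NOMINAL_ALPHA / N_ORDERINGS  # 0.025, as stated in the text


def build_covariates(npl_2q37, npl_17q21):
    """Return the three familywise ranking covariates used for conditioning."""
    chr2 = np.asarray(npl_2q37, dtype=float)
    chr17 = np.asarray(npl_17q21, dtype=float)
    return {
        "chr2": chr2,
        "chr17": chr17,
        # Larger of the two familywise NPL scores for each family.
        "max": np.maximum(chr2, chr17),
    }


def is_significant(p_value):
    """Apply the P <= 0.025 criterion to an empirical P-value."""
    return p_value <= THRESHOLD
```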

Results

Conditioning on linkage to chromosome 2

Conditioning on linkage to chromosome 2q37 yielded six loci with OSA LOD scores >2 and significant ΔLOD scores by permutation testing (Table 1 and Figure 1). The highest OSA LOD score of 4.88 (ΔLOD=3.193, P=0.009) was on chromosome Xq26.3-q27.1 in an optimal-slice subset of 41 families with weak to moderate evidence of linkage to chromosome 2. Of these families, 18 had evidence of male-to-male transmission and 23 had no evidence of male-to-male transmission.

Table 1. Maximum OSA LOD scores in descending order in the 69 families conditioning on linkage to chromosome 2, chromosome 17 or both

Figure 1. Nonparametric LOD score plot by chromosome for the most significant OSA LOD scores when conditioning on linkage to chromosome 2. LOD scores for the overall sample (dashed line) and the subset with maximum NPL score (solid line) at (a) 140, (b) 156, (c) 112, (d) 26, (e) 94 and (f) 117 cM.

Other loci with high OSA LOD scores were on 12q21.1-q23.3 (OSA LOD=3.05, ΔLOD=1.835, P=0.02) in a subset of 17 families unlinked to chromosome 2 and 8q24.22-q24.3 (OSA LOD=3.195, ΔLOD=2.963, P=0.02) in an optimal-slice subset of 15 families with weak evidence of linkage to chromosome 2.

Conditioning on linkage to chromosome 17

Conditioning on linkage to chromosome 17q21-q22 yielded four loci with OSA LOD scores >2 and significant ΔLOD scores by permutation testing (Table 1 and Figure 2). The highest OSA LOD score was on 3q26.31-q27.1 (OSA LOD=3.49, ΔLOD=2.39, P=0.02) in a subset of 47 families with moderate to no linkage to chromosome 17. The second strongest locus was on 12q14.2-q21.31 (OSA LOD=3.23, ΔLOD=2.33, P=0.02) in a subset of 34 families with weak to no linkage to chromosome 17.

Figure 2. Nonparametric LOD score plot by chromosome for the most significant OSA LOD scores when conditioning on linkage to chromosome 17. LOD scores for the overall sample (dashed line) and the subset with maximum NPL score (solid line) at (a) 185, (b) 82, (c) 142 and (d) 128 cM.

Conditioning on linkage to chromosomes 2 and 17

We then calculated the maximum family LOD score for chromosomes 2 and 17, so we could condition on linkage to either chromosome 2 or 17. Four loci had OSA LOD scores >2 and had significant ΔLOD scores by permutation testing (Table 1 and Figure 3). None were as strong as the scores found when conditioning on just one of the chromosomes; however, two loci were completely novel to this third analysis: 18q12.1-q12.2, which was not quite significant (OSA LOD=2.54, ΔLOD=1.65, P=0.03) in the 38 families not linked to either locus, and 22q11.1-q11.21 (OSA LOD=2.40, ΔLOD=2.36, P=0.006) in an optimal-slice subset of 12 families with weak evidence of linkage to either chromosome 2 or 17. Family-based association analysis results can be found in the Supplementary Results and Supplementary Table 1.

Figure 3. Nonparametric LOD score plot by chromosome for the most significant OSA LOD scores when conditioning on linkage to chromosome 2 or 17. LOD scores for the overall sample (dashed line) and the subset with maximum NPL score (solid line) at (a) 82, (b) 55, (c) 1 and (d) 94 cM.

Discussion

The considerable genetic heterogeneity of PrCa makes it difficult to identify the various genetic factors that contribute to the risk of developing the disease. Here we have used ordered subset linkage analysis in order to find additional loci that cannot be found with other linkage methods. Inflation in the false-positive rate in these OSA analyses (induced by examining multiple family subsets for a given covariate) is controlled using a permutation test. Simulation studies performed by Hauser et al33 show that the type 1 error rate is adequately controlled by the permutation procedure. We have not controlled for OSA comparisons over the different conditioning loci because some correlation may well exist between the loci. A Bonferroni correction of such a P-value would be very conservative. These are exploratory analyses and as such should be seen as a hypothesis-generating approach that requires follow-up and cross comparison with other studies. This is similar to the procedure of Cox et al,42 who proposed the idea of examining the difference between overall and conditional LOD scores to evaluate the effects of epistasis or genetic heterogeneity, and Hauser et al33 showed that this idea could also be applied to OSA.

Our OSA analysis conditioning on linkage to chromosome 2 revealed a subset of families with strong evidence of linkage to chromosome X. The OSA LOD score peak of 4.88 on chromosome Xq26.3-q27.1 was found in families with weak to moderate evidence of linkage to chromosome 2. The first documented X-linked PrCa susceptibility locus in the region Xq27-q28 (HPCX) was identified in 1998 in a study of 360 families with multiple cases of PrCa.21 From this study, a subset of Finnish families with no male-to-male transmission and late age of onset of PrCa (>65 years) demonstrated stronger evidence for linkage to Xq27-q28 than other families belonging to the complete data set.39 In the current data set, there was no significant overall evidence for linkage to Xq26-q28, but this region was significant in the subset of families that showed weak to moderate evidence of linkage to chromosome 2. Thus, this OSA procedure is successfully detecting an X-linked subset in the presence of strong heterogeneity. Our results for chromosome X lie within the vicinity of HPCX and the genomic coordinates for SPANXC (Xq27.1), a member of the SPANX family, which encodes differentially expressed testis-specific proteins that localize to various subcellular compartments. Originally, the SPANX family of genes was proposed as putative candidates for the HPCX-susceptibility locus.43 However, recent studies have not identified mutations in any of these genes that definitively account for PrCa risk in X-linked HPC families.44

Also found by conditioning on linkage to chromosome 2 was a significant peak on 12q21-q23, with an OSA LOD of 3.67 in families unlinked to chromosome 2. Nominal but nonsignificant evidence of linkage to 12q was detected in the complete data set with a heterogeneity LOD (HLOD) of 2.34 on 12q22-q23,34 which overlaps with the present region. Again, the OSA technique was able to identify the subset of families strongly linked to this region. This locus spans 30 Mb and is close to a locus reported by the International Consortium for Prostate Cancer Genetics (ICPCG) when performing OSA using 406 marker loci (distributed across the genome) as the OSA-ranking covariate in 426 PrCa families from Johns Hopkins Hospital, University of Michigan, University of Umeå and University of Tampere.45 Some of the families included here were part of this ICPCG study. This 12q locus also overlaps with the locus recently found in African Americans.46 Gain and loss of 12q has been reported in many studies of prostate tumors.47, 48

A third significant locus found by conditioning on linkage to chromosome 2 was a peak on 8q24.22-q24.3 in families with weak evidence of linkage to chromosome 2. There was not even suggestive evidence of linkage to this region in the overall data set, but OSA was able to identify the 15 families showing linkage to 8q24.22-q24.3. Amplification of 8q24 in PrCa has been consistently reported.49, 50, 51 This 7.5-Mb locus overlaps with the 8q24 region, which has been reported in linkage and GWASs of PrCa17, 23, 24, 29, 32 and has been replicated in many studied populations.22, 25, 26, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68 The prostate stem cell antigen maps to this region69 as does the oncogene MYC. The region has been extensively studied and is relatively gene-poor. A number of functional enhancers have been identified in the region70, 71 and some of these enhancers have been shown to regulate MYC.

The strongest signal found by conditioning on linkage to chromosome 17 was on 3q26.3-q27.1 in families with moderate to no linkage to chromosome 17. This region did not exhibit even nominal evidence of linkage in the complete data set. The locus spans 8 Mb and contains a number of genes known to be upregulated in cancer cells. Amplification of this region is associated with PrCa72, 73, 74, 75 and it contains the gene PIK3CA, which is associated with other tumors including breast, ovarian, colorectal and gastric cancers.76, 77, 78, 79, 80

Some of these loci were found in subsets that already had some evidence of linkage to chromosome 2 or 17. These loci may not have been detected in the original linkage analyses because the effect on risk due to these loci is smaller than the effect on risk of the chromosome 2 or 17 loci, leading to insufficient power. Although several of the loci detected by this OSA analysis have been seen in other studies, they were not significant in these data even when using HLOD scores. The two novel loci on 18q and 22q may only be detectable after conditioning on other loci with larger effects on PrCa risk. There may also be a statistical interaction on risk between these loci and the conditioning loci, but the current analyses cannot test explicitly for this. Current thinking suggests that for many common cancers, multiple loci may be acting together to account for the high risk of developing the disease in highly aggregated families. If the additional loci found here truly affect the risk of developing PrCa, then they may have a modifying role in the complex process of neoplastic transformation, perhaps through multiple different mechanisms. Alternatively, because these are highly selected families, it is possible that each locus has independent, noninteracting effects on PrCa risk, and risk genotypes are segregating for multiple loci in some of the same families because of the mode of ascertainment.

For OSA to have sufficient power, there must be adequate correlation between evidence of linkage and the levels of the OSA covariate. In our analysis, the covariate was itself a linkage signal, and so the extent to which this varies between families will be closely correlated with the overall genetic heterogeneity in the sample.

To investigate these loci further, useful approaches for refining the signal and identifying causative variants would be targeted sequencing of the region in selected individuals or whole-genome sequencing (WGS). Linkage results such as these were helpful in identifying the candidate causal mutations in the HOXB13 gene37 on chromosome 17q21. Thus, these linkage results may be very useful in WGS studies of individuals with a family history of PrCa, because they will allow us to prioritize these regions in the search for rare variants with major effects on PrCa risk.81, 82, 83