Introduction

Prostaglandin–endoperoxide G/H synthase 2, also known as cyclooxygenase-2 (COX-2), is encoded by the gene PTGS2. COX-2 functions as a homodimer and converts arachidonic acid to prostaglandin H2. In turn, this is converted to other pro-inflammatory prostaglandins such as prostaglandin E2 (PGE2). There is a causal relationship between chronic inflammation and cancer,1 with PGE2 known to stimulate cancer progression.2 In particular, inflammatory diseases of the colon such as Crohn’s disease and ulcerative colitis are associated with increased colorectal cancer (CRC) risk.3 Aspirin and other non-steroidal antiinflammatory inhibitors of COX-2 are known to act as CRC chemopreventatives.4 COX-2 is known to be overexpressed in 85% of human colorectal carcinomas, 50% of colorectal adenomas,5 as well as other epithelial and nonepithelial tumours (reviewed in Koki and Masferrer6). There is evidence that COX-2 overexpression is also involved in the aetiology of familial adenomatous polyposis (FAP), a CRC predisposition syndrome caused by an inherited mutation of the Adenomatous polyposis coli (APC) gene. The polyps removed from FAP patients show overexpression of COX-2 in the fibroblast and endothelial cells.7 In a FAP animal model, COX-2 overexpression has a direct role in the formation of tumours as polyposis in Apc-knockout mice is suppressed by the inhibition of COX-2.8 It has also been reported that FAP patients experience significantly less adenoma recurrence when taking the COX-2 inhibitor drug, celecoxib.9 High COX-2 expression is also observed in another familial CRC syndrome, MUTYH-associated polyposis.10

As COX-2 overexpression can be oncogenic, genetic variants that alter the expression of COX-2 may influence development of CRC. A large number of single nucleotide polymorphisms (SNPs) in and around the PTGS2 locus have been reported. If these genetic variations alter the production or function of the COX-2 enzyme, they may modulate the risk of cancer. Non-synonymous coding SNPs within the exons of PTGS2 are very rare. However, the relatively common non-coding SNPs rs20417 (−765G>C) in the promoter and rs5275 (+8473T>C) in the 3′-UTR of the gene have been shown to reduce promoter activity11, and alter mRNA stability,12, 13 respectively. The rs5275 SNP is located within the microRNA miR-542-3p binding site of the COX-2 transcript, with the C allele interfering with microRNA binding. Transcripts bearing the C allele exhibit less microRNA-mediated transcript degradation, increased mRNA stabilisation and thereby direct more COX-2 production.13

The rs5275 C allele has a frequency of 0.37 in Caucasian populations, with a higher frequency in African (0.67) and a lower frequency in Asian (0.17) populations (dbSNP, submission ss66858504). This SNP has been well studied in many disease contexts and cancer in particular. There are reported significant associations with basal cell, bladder, breast, colorectal, gastric, lung, prostate and ovarian cancers.14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 Across cancer tissues of different origin, each allele paradoxically has been associated with increased risk of cancer. However, if studies are stratified by cancer type, there is greater consistency in the assignment of the risk allele.29 Other biological factors such as gender and age and environmental factors such as non-steroidal antiinflammatory drug use and smoking status also influence the strength of association with cancer incidence.15, 25, 27, 28, 30, 31 Considered together, the prevailing evidence suggests that rs5275 is associated with cancer incidence, but the associations are complex.

In our previous linkage study of families with inherited non-syndromic CRC predisposition, we identified weak linkage to the chromosome 1 region containing the PTGS2 gene locus.32 In the present study we chose to sequence the coding region of PTGS2 in a subset of 16 CRC-affected people with linkage to the chromosome 1 region. The sequencing suggested that novel or rare coding SNPs in PTGS2 were not contributing to disease. However, we noted a relatively high frequency of the C allele of the common SNP, rs5275. We then genotyped rs5275 in all family members in the Australian and Spanish families previously collected for the linkage study, and looked for an association with CRC diagnosis and age of diagnosis.

Materials and methods

Study population

This family-based association study examined non-syndromic, high-risk CRC families. Families with known colon cancer syndromes, including HNPCC or Lynch syndrome, FAP, hereditary mixed polyposis, juvenile polyposis, Peutz–Jeghers, Cowden’s syndrome or MUTYH-associated polyposis were excluded. These families and the criteria for inclusion in the study have been described previously.32 Briefly, these families are defined as those containing at least one affected person who has one or more first-degree affected relative(s) and where the known causal mutations had been excluded. As in our previous study, affected status was defined as diagnosis with either colorectal carcinoma or advanced adenoma. In turn, advanced adenoma was defined as three or more synchronous or metachronous adenomas and/or adenoma(s) with villous morphology, and/or with severe dysplasia, and/or diameter ≥10 mm. The cohort is mostly sibships with missing parental genotypes and is described further in Table 1. In some instances extended pedigrees were genotyped; these include families of up to four generations (including non-genotyped founders). A number of the pedigrees contained parent–child, avuncular or cousin affected pairs. The families show weak genetic linkage with CRC (non-parametric LOD=1.71) in a 21 Mb region contained within chromosome 1q25.2–q32.1 with the PTGS2 gene locus situated in the middle of this region. The population examined comprises 418 genotyped individuals, 241 described in the linkage study and a further 177 individuals from cohort families, which were considered uninformative for our earlier non-parametric linkage analysis. In total there were 183 affected cases (179 had a recorded age of onset) and 223 unaffected relatives, mostly sibs. An additional 12 people without CRC diagnosis, but under 55 years of age or with previous detection of colorectal polyps were considered to have unknown disease status. All people in the study had their age at blood draw recorded. There were 173 males and 245 females in the study (Table 1).
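The affected-status definition above combines several and/or criteria, which can be easier to follow when written as a predicate. The following is a minimal Python sketch of that rule only; the class and field names are hypothetical and are not taken from the study's own data dictionary.

```python
from dataclasses import dataclass


@dataclass
class AdenomaFindings:
    """Hypothetical record of a person's adenoma findings."""
    adenoma_count: int = 0            # synchronous or metachronous adenomas
    villous_morphology: bool = False  # any adenoma with villous morphology
    severe_dysplasia: bool = False    # any adenoma with severe dysplasia
    max_diameter_mm: float = 0.0      # diameter of the largest adenoma


def is_advanced_adenoma(f: AdenomaFindings) -> bool:
    """Three or more adenomas, and/or any adenoma that is villous,
    severely dysplastic or at least 10 mm in diameter."""
    if f.adenoma_count < 1:
        return False
    return (
        f.adenoma_count >= 3
        or f.villous_morphology
        or f.severe_dysplasia
        or f.max_diameter_mm >= 10
    )


def is_affected(has_carcinoma: bool, f: AdenomaFindings) -> bool:
    """Affected = colorectal carcinoma or advanced adenoma."""
    return has_carcinoma or is_advanced_adenoma(f)
```

For example, a single 12 mm adenoma counts as advanced, whereas two small tubular adenomas without severe dysplasia do not.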

Table 1 Family statistics

Variant screening

Genomic DNA was extracted from white blood cells using a PAXgene Blood DNA Kit (Qiagen, Melbourne, Australia). DNAs were amplified, labelled using BigDye Terminator v3.1 chemistry and run on a 3730xl capillary sequencer (Applied Biosystems, Foster City, CA, USA). Primers were selected using the VariantSEQr (Applied Biosystems) resequencing set for PTGS2 (with sequences provided in Supplementary Table 1). Sequences were inspected visually for variants.

Genotyping

To ensure exceptionally high genotyping quality, the rs5275 (8473T>C) genotype was determined for every possible individual using both the PCR-based primer-introduced restriction analysis (PIRA) method as described by Hu et al24 and the TaqMan 5′ nuclease allelic discrimination assay (Applied Biosystems). Two individuals could only be genotyped using the PIRA assay owing to small quantities of available DNA.

TaqMan assays were performed in 25 μl volumes with 10 ng of template DNA. TaqMan probe sequences and thermal cycling conditions were as described in the variantGPS database (http://variantgps.nci.nih.gov) under polymorphism ID, ‘CGFID_4979’; otherwise, conditions were as described by the manufacturer. The genotyping assays were performed in 96-well plates in a LightCycler 480 real-time PCR system (Roche Applied Sciences, Sydney, Australia). For PIRA assays, 50 ng of template DNA was amplified (in duplicate) and the resultant PCR products were digested overnight using BclI (New England Biolabs Inc., Beverly, MA, USA). The digested PCR products were separated on Criterion TBE 10% polyacrylamide gels (Bio-Rad Laboratories, Sydney, Australia) and visualised using Gel Red Nucleic Acid Stain (Biotium, Hayward, CA, USA) and a Typhoon 9400 scanner (GE Healthcare, Sydney, Australia).

Statistical analysis

A survival model was fitted and Kaplan–Meier plots drawn in R (version 2.14, 64-bit) using the ‘survival’ package. Generalised family-based association tests (FBAT)33 implemented in the PBAT version 3.6 software34 were carried out using generalised estimating equations and P-values generated from the asymptotic Normal distribution. Genotypes were analysed using both dichotomous affection status and censored time-to-onset analyses. For the time-to-onset analysis, we used a Wilcoxon Logrank FBAT statistic. The test used a null hypothesis of linkage and no association with sandwich variance estimation. Pedigree correlation estimates can become unstable in large pedigrees. Sandwich variance is useful for robustly estimating correlation between members in these large pedigrees. These analyses were also repeated with consideration of gender. FBAT analyses using nuclear families were implemented in the FBAT V2.0.4 beta software (http://www.hsph.harvard.edu/fbat/fbat.htm).35 The generalised disequilibrium test V0.1.1 software36 was used to test for association in dichotomous relative pairs.
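To illustrate the Kaplan–Meier estimate referred to above, the following is a minimal Python sketch of the product-limit estimator for right-censored ages. It is not the code used in this study, which relied on the R ‘survival’ package; the function name and the example ages are invented for illustration. Diagnosed relatives contribute an event at their age of diagnosis, and undiagnosed relatives are censored at their age at blood draw.

```python
from collections import Counter


def kaplan_meier(ages, diagnosed):
    """Product-limit estimate of the proportion still undiagnosed.

    ages      -- age at diagnosis (event) or at blood draw (censored)
    diagnosed -- True where the age is an age of diagnosis
    Returns a list of (age, estimate) pairs, one per distinct event age.
    """
    events = Counter(a for a, d in zip(ages, diagnosed) if d)
    leaving = Counter(ages)          # everyone leaves the risk set at their age
    at_risk = len(ages)
    survival = 1.0
    curve = []
    for age in sorted(leaving):
        d = events.get(age, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((age, survival))
        at_risk -= leaving[age]      # drop events and censored people alike
    return curve


# Hypothetical data: seven relatives, four of them diagnosed.
ages = [48, 52, 55, 55, 60, 63, 70]
diagnosed = [True, False, True, True, False, True, False]
for age, s in kaplan_meier(ages, diagnosed):
    print(f"age {age}: proportion undiagnosed = {s:.3f}")
```

In the study itself a separate curve would be estimated for each genotype and gender, which in this sketch corresponds to calling the function once per subgroup.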

Results

Variant screening

DNAs from 16 people diagnosed with CRC and good evidence for linkage to the chromosome 1 region were forward and reverse capillary sequenced across the exons, 3′-UTR and promoter region of PTGS2. In total, 16 SNPs were discovered, of which two were novel (Supplementary Table 2). For each novel SNP, there was only a single observation, so it is highly unlikely that they are the cause of inherited cancer predisposition in these 16 people. Given this, we did not follow up these observations in a larger cohort. Interestingly, we observed the SNP rs5275 C variant allele was enriched in frequency compared with Caucasian population estimates (0.394; dbSNP135), with 77% of people in the sample having the CT or CC genotype, giving a C allele frequency of 0.462. Given this increased C allele frequency and previous reports of an association of rs5275 with CRC diagnosis, we chose to further analyse this SNP in the whole cohort. A single SNP should be a reasonable marker for the involvement of PTGS2 with CRC as the gene is particularly small, only occupying 8.6 kb of space on chromosome 1. This small locus size implies other genetic variations within PTGS2 should be in high linkage disequilibrium.

Genotyping and quality control

Across 418 individuals, all except two were genotyped for rs5275 using both the PIRA and TaqMan methods. There were 20 discordant genotypes between the two technologies. In our hands, TaqMan was clearly the more accurate genotyping method: all 20 discordant genotypes were resolved as PIRA genotyping errors (following repeat BclI restriction digest and genotyping on another polyacrylamide gel). We found the C allele frequency to be 0.334, which is consistent with other Caucasian populations. The rs5275 genotype was found to be in Hardy-Weinberg equilibrium.
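The allele frequency and Hardy-Weinberg statements above follow from simple arithmetic on genotype counts. The Python sketch below shows one common way to do this, a 1-degree-of-freedom chi-square goodness-of-fit test; the text does not state which test was actually used, and the genotype counts in the example are hypothetical, chosen only so that they sum to 418 and give a C allele frequency close to 0.334.

```python
import math


def c_allele_frequency(n_tt, n_ct, n_cc):
    """Frequency of the C allele from genotype counts."""
    n = n_tt + n_ct + n_cc
    return (2 * n_cc + n_ct) / (2 * n)


def hwe_chi_square(n_tt, n_ct, n_cc):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions.

    Returns (chi2, p_value). Assumes both alleles are observed,
    so that no expected count is zero.
    """
    n = n_tt + n_ct + n_cc
    p = c_allele_frequency(n_tt, n_ct, n_cc)
    q = 1 - p
    observed = {"TT": n_tt, "CT": n_ct, "CC": n_cc}
    expected = {"TT": n * q * q, "CT": 2 * n * p * q, "CC": n * p * p}
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not the study's Table 2 values).
freq = c_allele_frequency(n_tt=186, n_ct=185, n_cc=47)
chi2, p_value = hwe_chi_square(n_tt=186, n_ct=185, n_cc=47)
print(f"C allele frequency = {freq:.3f}, chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P-value above the chosen threshold (conventionally 0.05) gives no evidence of departure from Hardy-Weinberg proportions. In family data the genotypes are not independent, so a test of this kind is only approximate.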

Summary statistics and survival analysis

Summary statistics subdivided by disease status and gender are presented in Table 2. The median age of disease diagnosis across the genotypes was 55.5, 55.0 and 58.0 for the TT, CT and CC genotypes, respectively. Interestingly, there is some discordance between males and females in the median age of disease diagnosis (Table 2). There is evidence that males carrying at least one C allele have an earlier age of disease diagnosis, while the opposite is observed for females. This is illustrated in more detail in Figures 1a and b. In Kaplan–Meier curves (Figures 2a and b) carriers of the heterozygous CT genotype in both genders trended toward an earlier age of diagnosis than the more common TT genotype carriers. For the 12 females diagnosed with CRC carrying the rare CC genotype, the trend suggests the risk imposed by the CC genotype may vary with age. Until 55 years of age female CC carriers have the lowest rate of cancer diagnosis; however, within the following 5 years the proportion of carriers diagnosed with CRC is comparable with that of male CT genotype carriers. Little can be stated regarding male CC carriers as there were only four observations. The apparent trends in the data were tested using a Cox proportional hazard model conditioned on the pedigrees and stratified by gender. Without clustering there was evidence of a violation of the proportional hazards assumption within pedigrees (P=0.026), with the effect of pedigree increasing over time. To account for the non-independent observations within families, we clustered the data within pedigrees. The odds ratios (OR), with CC as the reference, for the TT and CT genotypes were 1.02 and 1.32, respectively (Likelihood ratio test, P=0.221; Wald test, P=0.168). Although the result is non-significant, the increased OR for the CT genotype, but not the CC or TT genotypes was interesting. We followed this up by collapsing the genotype into heterozygotes (CT genotype) and homozygotes (CC or TT genotypes). In this instance, homozygotes were found to be protected against CRC (OR=0.78) with the difference in risk approaching significance (Likelihood ratio test, P=0.101; Wald test, P=0.069). Although the survival analysis was not significant, it suggests tests of association with CRC age of diagnosis should have the most power under a heterozygous-advantage genetic model.

Table 2 Results of rs5275 genotyping split by disease status and gender
Figure 1

Combination boxplot-stripchart by genotype and gender. (a) Affected male C allele carriers have an earlier median age of CRC diagnosis, while the opposite is observed for (b) female C allele carriers. The box represents the upper and lower quartiles and the intersecting horizontal line, the median. The filled grey points are individual patient ages of diagnosis with multiple cases stacked horizontally. Below the genotype the number in parentheses is the total number of cases observed for that genotype with a reported age of CRC diagnosis.

Figure 2

Kaplan–Meier analysis of ages of diagnosis, with consideration of rs5275 genotype and gender. The survival functions of right-censored age of diagnosis data for (a) 78 males and (b) 101 females were formed with a Kaplan–Meier estimator and genotype and gender as factors. The estimates are plotted by genotype as a function of the proportion of CRC as yet undiagnosed at a given age. As illustrated in the legend on panel (a), the dotted, dashed and solid lines represent the CC, CT and TT genotype estimates, respectively. Crosses on the estimator line mark censored people, that is, relatives without evidence of CRC, plotted at their age at blood draw.

Family-based association testing

FBATs allow determination of associations in the presence of intra-family correlations that result from allelic bias and the tests are robust to population stratification. In particular, the classes of FBATs implemented in the PBAT software are well suited for this current study, as the tests can generalise to large or small arbitrary pedigrees and to pedigrees with missing parental genotypes.34 Further, PBAT can handle both dichotomous and censored age of diagnosis phenotypes. The PBAT FBAT showed the C allele of rs5275 as significantly associated with the diagnosis of CRC in these predisposed families (Table 3). The additive (increasing risk with increasing C dosage, P=0.0159), dominant (CT and CC impose increased risk, P=0.0094) and heterozygous-advantage (only CT imposes risk, P=0.0496) models all show significant association between carriers of the minor C allele and the diagnosis of CRC. We also considered age at CRC diagnosis as a phenotype and found similar results (Table 4). The minor C allele was significantly associated with an earlier age of diagnosis under the heterozygous-advantage genetic model (P=0.0089), the dominant model (P=0.0116) and additive model (P=0.0486) using an FBAT–Wilcoxon statistic. Considering the gender-discordant trends highlighted in the survival analysis, we also stratified the age of diagnosis association analysis by gender (Table 4). Although there were considerably fewer subjects in each test, in the more populous female group the associations under the dominant (P=0.0451) and heterozygous-advantage (P=0.0427) models were still significant. While no significant association was observed in males, in a similar fashion, the dominant (P=0.1886) and heterozygous-advantage (P=0.1605) models were also the best supported.
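The three genetic models named above differ only in how an rs5275 genotype is turned into a numeric score for the C allele. The Python sketch below makes that coding explicit, following the parenthetical definitions in the text. It is an illustration only: the function name is hypothetical and the internal coding used by PBAT may differ in detail.

```python
def code_genotype(genotype: str, model: str) -> int:
    """Score an rs5275 genotype ('TT', 'CT' or 'CC') under a genetic model.

    additive               -- number of C alleles (0, 1 or 2)
    dominant               -- 1 if at least one C allele (CT or CC), else 0
    heterozygous_advantage -- 1 only for the CT heterozygote, else 0
    """
    alleles = "".join(sorted(genotype.upper()))
    if alleles not in ("CC", "CT", "TT"):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    c_count = alleles.count("C")
    if model == "additive":
        return c_count
    if model == "dominant":
        return int(c_count >= 1)
    if model == "heterozygous_advantage":
        return int(c_count == 1)
    raise ValueError(f"unknown model: {model!r}")


# Scores for each genotype under each model.
for model in ("additive", "dominant", "heterozygous_advantage"):
    print(model, {g: code_genotype(g, model) for g in ("TT", "CT", "CC")})
```

The heterozygous-advantage coding is the same grouping as the heterozygote versus homozygote collapse used in the survival analysis.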

Table 3 Results of the family-based association testing between diagnosis of colorectal cancer and rs5275 genotype
Table 4 Results of the family-based association testing between age of diagnosis of colorectal cancer and rs5275 genotype

Discussion

Our previous linkage study suggested linkage between familial inherited non-syndromic CRC and a region of chromosome 1 spanning the PTGS2 gene locus.32 There is extensive evidence to implicate the involvement of the PTGS2 locus gene product, COX-2, in carcinogenesis (Mann and DuBois37) and many genetic association and biochemical studies support the association of rs5275 with CRC diagnosis or aetiology. We confirm and extend these earlier studies, reporting here the first study to consider the association between rs5275 and cancer diagnosis within predisposed families. We show good evidence that rs5275 genotype is significantly associated with both diagnosis and age of diagnosis of non-syndromic CRC, with carriers of the rs5275 variant C allele having a higher risk of developing CRC (P<0.01). This association is consistent with previous reports considering sporadic CRC, which also report a significant association of the C allele with increased cancer risk.18, 19, 21

We found consideration of the age of diagnosis to generate some interesting findings. While age is a major risk factor for all CRC, Kaplan–Meier plots suggest a possible gender-discordant effect with female carriers of the rare CC genotype having a reduced rate of CRC diagnosis when young, but increased diagnosis of familial CRC in later years. We state this tentatively, as there were only 12 observations to support the finding. Although the trend observed in age of diagnosis for CC carriers is complex, carriers of the CT genotype in both genders seem to consistently have an earlier age of onset than carriers of the TT genotype. To test for differences in age of diagnosis between genotypes, we used a likelihood ratio test with clustering by pedigree. Although this test was not significant, the simple survival analysis we used has caveats; it does not account for degrees of relatedness between members of a pedigree, nor does it consider the genetic model or correlation due to genetic linkage. To account for this, we used PBAT, which can model an association in the presence of genetic linkage with consideration of the genetic model and full pedigree structure. In this instance, we found a highly significant association (P<0.01) between the C allele and both diagnosis and age of diagnosis of CRC. Although the dominant genetic model was well supported for both the diagnosis and age of diagnosis phenotype (P=0.0094 and P=0.0116, respectively), we note the heterozygous-advantage model was less well supported for the diagnosis phenotype (P=0.0496, compared with P=0.0089), while the additive model was less well supported for the age of diagnosis phenotype (P=0.0486, compared with P=0.0159). This discrepancy can be explained by the difference in statistic used to test the two phenotypes and by consideration of the Kaplan–Meier plot. The FBAT–Wilcoxon statistic used in the age of diagnosis analysis assigns more weight to early CRC diagnoses, while the FBAT statistic for dichotomous traits considers all diagnoses equally with a simple binary encoding.38 In young females, CT carriers have the highest rate of CRC diagnosis so there is an expectation of good support for a heterozygous-advantage model using an FBAT–Wilcoxon statistic.

Across the two phenotypes, the consistency we observe in the support for an association with the C allele under the additive, dominant and heterozygous-advantage genetic models is encouraging. The age of diagnosis is a more information-rich phenotype than diagnosis alone; however, specification of the age of diagnosis is prone to some vagaries and bias. Foremost, the progression of CRC from early- to late-stage disease can take multiple years, and patients are diagnosed at various stages of disease progression. Additionally, in a familial cancer setting, a diagnosis in the proband may often lead to earlier surveillance of the proband’s siblings and offspring. Finally, there is the potential for error in self-reporting an age of diagnosis. Although age of diagnosis is a more information-rich phenotype, a simple dichotomous trait will avoid these caveats, as it is a test of ‘if’ and not ‘when’ a person will get disease.

Consideration of the association testing and survival analysis together suggests that the association of rs5275 genotype with non-syndromic CRC is complex and has a possible gender effect. The tentative finding that female carriers of the CC genotype are protected from CRC development early in their lives, but show increased risk around 55–60 years of age, is particularly interesting. Considering the median age of menopause is estimated to be 51.3 years,39 this increased risk may correlate with prior onset of menopause. Age-discordant and menopause-dependent effects in women’s predisposition towards CRC diagnosis and prognosis are well documented.40 It is also known that hormone replacement therapy is protective against the onset of CRC in women, which suggests that reduced oestrogen production postmenopause may increase the risk of CRC.41 Functional evidence exists for the involvement of oestrogen in CRC. Oestrogen has access to normal colon epithelium, with oestrogen receptor beta (ERβ) abundantly expressed in normal colonic epithelia in both genders; however, in CRC cells this expression of ERβ is significantly reduced.42, 43 We postulate that the observed age-dependent risk imposed in females by the CC genotype is due to the protective role of oestrogen in younger women.

Our findings of a complex association are supported by comparisons to the published case-control studies showing significant association of rs5275 with cancer risk (further discussion is available in the Supplementary Material). Collectively, the differences in findings in these studies suggest the allele associated with risk is dependent upon factors, such as the cancer tissue of origin, environmental influences and ethnicity. If only CRC association studies showing significant association are considered, a more consistent trend emerges. A study by Ali et al19 with 726 Caucasian cases and 729 controls, found a complex gender discordant association similar to this present study. In males, carriers of the CT genotype, but not CC genotype, were at increased risk of CRC with a respective OR of 1.31 and 1.02. In females, the trend was reversed (with OR=0.94 and OR=1.40). The two other sporadic CRC case-control studies reporting significant associations with rs5275 genotype are from the same group and do not explicitly segregate the data by gender.18, 21 However, if we consider the associations found within this pair of Caucasian cohorts and relate this to the proportion of males in each, there is some evidence of a gender-discordant effect. In the first cohort with 53.3% male cases, only the CT genotype was found to be associated with CRC incidence (OR=1.47).21 In the second cohort with 22.4% male cases, only the CC genotype was significantly associated with CRC (OR=1.14). We suggest that the gender-discordant risk of carrying the C allele is responsible for the observation of an apparent heterozygous-advantage model in the mostly male cohort.

Given the complexity of the association, we also employed other association tests to test the sensitivity of the result (Supplementary Material). Only the PBAT algorithm that considers affected people across a number of genetic models and makes use of full pedigree structures was sensitive enough to find an association. Our results also suggest that further family-based studies of rs5275 would benefit from an increased number of families in the analysis.

Given that the C allele leads to increased mRNA stabilisation of the PTGS2 transcript,44 the association we observe between the rs5275 C allele and non-syndromic CRC suggests that tumours of C allele-carrying patients may overexpress COX-2, similar to patients with FAP and MUTYH-associated polyposis (MAP) syndromes. It would be interesting to measure COX-2 expression levels in colorectal tissue and tumours from patients with inherited non-syndromic CRC predisposition, and to correlate this against their rs5275 genotype.

This current study strengthens the evidence for a link between rs5275 genotype and risk of diagnosis of CRC. We also establish an association between rs5275 genotype and age of diagnosis of CRC. This is the first study of rs5275 in a familial cancer setting, and the evidence for association suggests that rs5275 genotype should be further investigated in this context in larger studies. Our use of gender as a factor and age of diagnosis as a phenotype reveals complexities in the association with rs5275 genotype; this in turn suggests that chemopreventative COX-2 inhibitor drugs may have different efficacies depending upon age, gender and rs5275 genotype. We suggest consideration of gender may help explain the contrary results in the literature for the protection associated with the ‘C’ allele in breast17 and ovarian28 cancer and risk in prostate cancer.26 Oestrogen activation of COX-2 is a compelling explanation for the tentative gender-discordant and female age-dependent associations observed in this study. The evidence, in this and other studies, that the risk imposed by carrying the rs5275 C allele depends upon gender and tissue suggests there would be fairly low selection pressure, and may explain why the C allele is relatively common in some populations. The inclusion of gender as a factor in future association studies, particularly of PTGS2, would be prudent.