Tagging single-nucleotide polymorphisms in candidate oncogenes and susceptibility to ovarian cancer

Low–moderate risk alleles that are relatively common in the population may explain a significant proportion of the excess familial risk of ovarian cancer (OC) not attributed to highly penetrant genes. In this study, we evaluated the risks of OC associated with common germline variants in five oncogenes (BRAF, ERBB2, KRAS, NMI and PIK3CA) known to be involved in OC development. Thirty-four tagging SNPs in these genes were genotyped in ∼1800 invasive OC cases and 3000 controls from population-based studies in Denmark, the United Kingdom and the United States. We found no evidence of disease association for SNPs in BRAF, KRAS, ERBB2 and PIK3CA when OC was considered as a single disease phenotype; but after stratification by histological subtype, we found borderline evidence of association for SNPs in KRAS and BRAF with mucinous OC and in ERBB2 and PIK3CA with endometrioid OC. For NMI, we identified a SNP (rs11683487) that was associated with a decreased risk of OC (unadjusted Pdominant=0.004). We then genotyped rs11683487 in another 1097 cases and 1712 controls from an additional three case–control studies from the United States. The combined odds ratio was 0.89 (95% confidence interval (CI): 0.80–0.99) and remained statistically significant (Pdominant=0.032). We also identified two haplotypes in ERBB2 associated with an increased OC risk (Pglobal=0.034) and a haplotype in BRAF that had a protective effect (Pglobal=0.005). In conclusion, these data provide borderline evidence of association for common allelic variation in NMI with risk of epithelial OC.

(earlier known as STK15) (Dicioccio et al, 2004). However, this was not confirmed in a larger consortium study. Most of the oncogenes known to be altered in OC development have not yet been studied.
Of the oncogenes known to be involved in OC, KRAS is the most frequently mutated (Forbes et al, 2006). KRAS functions in the receptor tyrosine kinase pathway (Gemignani et al, 2003), and several other genes that function in this pathway are mutated in multiple tumour types (Cuatrecasas et al, 1997). Activating mutations of KRAS appear to be an early event in OC development, but predominantly in tumours of the mucinous histological subtype (Gemignani et al, 2003). Mutations in codons 12 and 13 have been detected in approximately 50% of mucinous OCs (Gemignani et al, 2003). BRAF, in the mitogen-activated protein kinase pathway, is a downstream effector of KRAS and is critical in the transduction of cell growth signals (Cuatrecasas et al, 1998). Overexpression of BRAF has been found in a variety of cancers, and mutations have been reported in 12% of OCs (Gemignani et al, 2003; Russell and McCluggage, 2004; Sieben et al, 2004).
PIK3CA encodes the catalytic subunit of the lipid kinase phosphatidylinositol 3-kinase (PI3K), which is involved in the regulation of cell proliferation, adhesion, transformation, survival, apoptosis and motility (Volinia et al, 1994; Fruman et al, 1998; Cantley, 2002). The helical and kinase domains of PIK3CA are hotspots for mutations, which have been found in multiple tumour types including ovary, breast, lung, brain, colon and stomach (Muller et al, 2007). PIK3CA mutations have been shown to correlate with increased gene expression in several OC cell lines. Detectable amplification of the gene has also been shown in 58% of ovarian tumours using fluorescence in situ hybridisation (Shayesteh et al, 1999).
The human epidermal growth factor receptor-2 gene, ERBB2 (HER-2/Neu), encodes a transmembrane protein that acts as a growth factor receptor and is involved in cell proliferation and cell differentiation (Wu et al, 2004). Breast, prostate, lung, gastrointestinal, kidney, liver and bladder cancers have all shown an elevated expression of ERBB2 (Wu et al, 2004). For OC, 20–30% of primary stage III/IV tumours show ERBB2 overexpression (Hellstrom et al, 2001). Protein expression using antibody staining on a subset of ovarian tumours from the MALOVA study showed that 39% of the carcinomas overexpressed ERBB2 (Hogdall et al, 2003b).
The MYC family of proto-oncogenes, including NMYC and MYC, and their interacting partners, are transcription factors that have a well-documented role in tumourigenesis. MYC overexpression caused by gene amplification induces uncontrolled hyper-proliferation and occurs in ∼35% of epithelial OCs. Another gene, the NMYC and STAT interactor (NMI), which interacts with NMYC, MYC, MAX, FOS, other transcription factors (Zhu et al, 1999) and BRCA1 (Li et al, 2002), is overexpressed in human leukaemias and other cancers (Bao and Zervos, 1996).
The aim of this study was to evaluate the risks of OC associated with common genetic variation in five of the candidate oncogenes described above (BRAF, ERBB2, KRAS, NMI and PIK3CA) using a SNP-tagging approach. To do this, we genotyped 34 common tagging SNPs (tSNPs) in 1816 invasive epithelial OC cases and 3000 unaffected controls from five different case–control studies from the United States, United Kingdom and Denmark as part of a multicentre collaboration. We then evaluated one positive finding in a further 1097 cases and 1712 controls from three other US studies.

Study individuals
In the first stage of this study, we genotyped OC cases and controls from five different populations. These were (1) The Danish MALOVA study (446 cases and 1221 controls); (2) The UK SEARCH study (730 cases and 855 controls); (3) The Genetic Epidemiology of Ovarian Cancer Study (GEOCS; previously FROC) from Stanford, CA, USA (327 cases and 429 controls); (4) The USC (A) study from Los Angeles, CA, USA (197 cases and 224 controls); and (5) The UKOPS study from the United Kingdom (116 cases and 271 controls). In stage 2, a putative positive association was followed up in three other case–control studies: (1) The USC (B) study, CA, USA (237 cases and 360 controls); (2) The DOVE study, Seattle, WA, USA (584 cases and 716 controls); and (3) The HOPE study, Pittsburgh, USA (276 cases and 636 controls). USC (A) and (B) are subsets of the same USC population. The USC (A) samples were collected between 2000 and 2004 and USC (B) samples were collected from 1993 to 1999. All study individuals were non-Hispanic Whites. Details for several of these studies have been published before (Dicioccio et al, 2004; Pearce et al, 2005; Song et al, 2006, 2007; Gayther et al, 2007; Rossing et al, 2007) and are summarised in Table 1. Local ethics committee approval was given for the collections and genotyping in all individuals.

Candidate gene and tSNP selection
We chose to analyse candidate oncogenes for which there is evidence that the genes were amplified or mutated in OCs. The genes we chose to examine initially were BCL2, BRAF, MYC, CTNNB1, EGFR, ERBB2, FGF3, HRAS, KIT, MDM2, NMI and PIK3CA. Some of these genes were excluded if no HapMap genotyping data were available, if the gene was poorly tagged or if there were <3 tSNPs or >15 tSNPs in the genes. We used data from the CEPH population, from The International HapMap Project Data Rel 20/phase II Jan06 (www.hapmap.org), Haploview version 3.32 (Barrett et al, 2005) and Tagger (de Bakker et al, 2005) to select tSNPs that capture common genetic variation in each candidate gene, and putative regulatory regions up and downstream of the gene (within 5 kb), with a minimum squared correlation of 0.8 (r² ≥ 0.8). The multi-marker (aggressive) tagging option of Tagger was used to select tSNPs. If a selected tSNP failed assay design or genotyping, an alternative tagging SNP was chosen.
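The idea of capturing common variants at r² ≥ 0.8 can be illustrated with a simplified greedy, pairwise-only selection. This is a sketch under stated assumptions, not the multi-marker (aggressive) algorithm implemented in Tagger: the function name, the SNP labels and the r² values below are all hypothetical and chosen only for illustration.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs greedily until every SNP is captured at r2 >= threshold.

    snps: list of SNP identifiers (hypothetical labels here).
    r2:   dict mapping unordered SNP pairs to pairwise r2; missing pairs count as 0.
    """
    def captured_by(tag):
        # A SNP always captures itself; r2 is looked up in either pair order.
        return {s for s in snps
                if s == tag
                or r2.get((tag, s), r2.get((s, tag), 0.0)) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs
        # (ties are broken by input order).
        best = max(snps, key=lambda t: len(captured_by(t) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Toy input: hypothetical SNP names and r2 values, not data from this study.
toy_snps = ["snpA", "snpB", "snpC", "snpD"]
toy_r2 = {
    ("snpA", "snpB"): 0.95,
    ("snpA", "snpC"): 0.85,
    ("snpB", "snpC"): 0.60,
    ("snpC", "snpD"): 0.10,
}
toy_tags = greedy_tag_snps(toy_snps, toy_r2)
print(toy_tags)
```

In this toy example snpA captures snpB and snpC, while snpD is in weak linkage disequilibrium with everything else and has to be genotyped directly.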

Genotyping SNPs
A combination of iPLEX (Sequenom Inc., Hamburg, Germany) and TaqMan ABI 7900HT Sequence Detection System (Applied Biosystems, Warrington, UK) was used to genotype the samples as described earlier (Zhao et al, 2006). The MALOVA and SEARCH samples were genotyped by a combination of TaqMan and iPLEX; UKOPS, USC and GEOCS were genotyped with TaqMan only. Genotyping with iPLEX was performed at the Sequenom laboratory in Hamburg, Germany. TaqMan genotyping of stage 1 samples was performed at the Gynaecological Cancer Research Laboratories, University College London, and Strangeways Research Laboratory, University of Cambridge (both UK). For stage 2, samples were genotyped by TaqMan at the Keck School of Medicine, University of Southern California, USA. Genotyping was repeated for studies/plates when call rates were below 90%, if there were discordant duplicate samples or if negative controls tested positive (Song et al, 2006; Gayther et al, 2007).

Statistical methods
Deviation from Hardy-Weinberg equilibrium (HWE) was assessed in controls within study populations using the standard χ² test. Unconditional logistic regression was used to assess the relationship between each tSNP and risk of OC for each study and pooled across studies (stratified by study), with the primary test of association being a test for trend (Ptrend). The per-allele odds ratio and odds ratios for the heterozygote and rare homozygote relative to the common homozygote were estimated by stratified logistic regression. The programme TagSNPs (Stram et al, 2003) was used to model the relevant multi-marker haplotypes resulting from aggressive SNP tagging. Heterogeneity between study strata was tested by comparing logistic regression models with and without a genotype-stratum interaction term using the likelihood ratio test. All the reported P-values are two sided. There was no association in controls between age and genotype frequency for any of the SNPs, and adjusting for age did not materially alter the effect estimates and thus age was not included in the models (data not shown). Where there was evidence for association, we compared the fit of log-additive co-dominant, dominant and recessive genetic models using likelihood ratio tests.
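The HWE check on control genotypes can be sketched as a one-degree-of-freedom χ² goodness-of-fit test. The following is a minimal illustration, assuming a biallelic, polymorphic SNP; the function name and the genotype counts are hypothetical and are not taken from the study data.

```python
import math


def hwe_chi_square(n_common_hom, n_het, n_rare_hom):
    """Chi-square test (1 d.f.) for deviation from Hardy-Weinberg equilibrium.

    Assumes a biallelic SNP with both alleles present (otherwise an
    expected count is zero and the statistic is undefined).
    """
    n = n_common_hom + n_het + n_rare_hom
    p = (2 * n_common_hom + n_het) / (2 * n)   # common-allele frequency
    q = 1 - p                                  # rare-allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_common_hom, n_het, n_rare_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts in exact HWE proportions (p = 0.6).
chi2_eq, p_eq = hwe_chi_square(360, 480, 160)
# Hypothetical extreme case with no heterozygotes at all.
chi2_ex, p_ex = hwe_chi_square(500, 0, 500)
print(chi2_eq, p_eq, chi2_ex, p_ex)
```

A small P-value in controls usually points to a genotyping problem rather than biology, which is why such SNPs are flagged before association testing.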
We also conducted analyses to determine if haplotype effects were present. Haplotype blocks (regions of strong linkage disequilibrium) were defined using the confidence interval option of Haploview (Gabriel et al, 2002), with minor adjustments to include adjacent SNPs, but maintaining the cumulative frequency of the common haplotypes at >90%. All genes had one haplotype block, except KRAS, which had two blocks. Haplotype analysis was conducted using the programme TagSNPs (Stram et al, 2003).
TagSNPs implements an expectation-substitution approach to account for the uncertainty caused by the unphased genotype data (Stram et al, 2003). The genotype data for GEOCS, MALOVA, SEARCH, UKOPS and USC (A) samples were used in this analysis. Unconditional logistic regression was used to test the association between each haplotype relative to the most common haplotype (Zaykin et al, 2002; Stram et al, 2003). Haplotypes that occurred with a frequency of 2% or greater in the combined data were considered 'common', and those with less than 2% frequency were pooled together as rare haplotypes.
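The 2% rule for pooling rare haplotypes can be sketched as follows. This is only an illustration of the thresholding step, not of the TagSNPs programme itself; the function name, the haplotype labels and the frequencies are hypothetical.

```python
def pool_rare_haplotypes(freqs, cutoff=0.02):
    """Keep haplotypes with frequency >= cutoff; pool the rest under 'rare'."""
    pooled = {h: f for h, f in freqs.items() if f >= cutoff}
    rare_total = sum(f for f in freqs.values() if f < cutoff)
    if rare_total > 0:
        pooled["rare"] = rare_total
    return pooled


# Hypothetical haplotype frequencies in the combined data.
toy_freqs = {"hapA": 0.40, "hapB": 0.35, "hapC": 0.235,
             "hapD": 0.010, "hapE": 0.005}
toy_pooled = pool_rare_haplotypes(toy_freqs)
print(toy_pooled)
```

Here hapD and hapE fall below the cutoff and are combined into a single 'rare' category, which keeps sparse categories out of the regression model.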

RESULTS
Forty SNPs were selected to tag the common genetic variation in BRAF, ERBB2, KRAS, NMI and PIK3CA. Six of these (PIK3CA: rs1607237, rs6443624 and rs3729692; KRAS: rs11047912 and rs17388893; BRAF: rs11771946) failed assay design, manufacture or genotype testing and could not be efficiently tagged by any other SNP. Therefore, we were able to genotype 34 SNPs in total (Table 2). The tSNPs were selected using HapMap Data Rel 20/phase II on NCBI B35 assembly, dbSNP b125. For all five genes, we captured 176 of 188 (94%) common SNPs with r² > 0.8. Tagging efficiencies were the same using the most recent HapMap data release (HapMap Data Rel 21a/phase II on NCBI B35 assembly, dbSNP b125), which captured 199 of 212 (94%) of the common SNPs with r² > 0.8. This panel of SNPs was genotyped in five different OC case–control studies from the United Kingdom (SEARCH and UKOPS), United States (GEOCS and USC (A)) and Denmark (MALOVA). Combined, these studies comprise 1816 invasive epithelial OC cases and 3000 unaffected female controls. For most tSNPs, genotype distributions in controls were consistent with HWE in all populations in which genotyping passed quality control criteria. For one SNP (rs2699905), controls from GEOCS, SEARCH and UKOPS deviated significantly from HWE (P<0.01). Seven tSNPs (rs2952155, rs11047917, rs11551174, rs10487888, rs2865084, rs1733832 and rs289831) could not be genotyped for GEOCS, UKOPS and USC (A) because TaqMan assays for these tSNPs failed assay manufacture. This is reflected in the variable numbers of cases and/or controls that were successfully genotyped for each tSNP, listed in Table 3.

Association between genotype frequencies and OC risk
We found no evidence of association between tSNPs or multimarker haplotypes in BRAF, ERBB2, KRAS and PIK3CA and susceptibility to invasive epithelial OC (Table 3 and Supplementary Table 1). A SNP in NMI (rs11683487) showed evidence of association with reduced risk of OC (heterozygous odds ratio (OR) 0.80 (95% confidence interval (CI) 0.69–0.93); homozygous OR 0.87 (95% CI 0.71–1.02); Ptrend = 0.038) (Table 3). The best-fitting genetic model for this SNP was a dominant model (P = 0.004; rare allele carriers vs common allele homozygotes, OR 0.81 (95% CI 0.71–0.94)). There was no statistically significant heterogeneity across studies for any SNP.
The association with rs11683487 was investigated further by performing a second stage of genotyping in three additional populations from the United States (USC (B), DOVE and HOPE) (Table 1). Together, these three studies comprised an additional 1097 cases and 1712 controls. There was no association between rs11683487 and the risk of OC in the samples used for validation on their own (Pdominant = 0.92; OR = 1.01 (0.85–1.20)). After combining the data from both stages, the association with rs11683487 was weaker, but still statistically significant (Pdominant = 0.032; OR = 0.89 (0.80–0.99); Figure 1A; Supplementary Table 2).
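As a consistency check, an approximate two-sided Wald P-value can be recovered from a reported odds ratio and its 95% CI by working on the log scale. This is a rough sketch, assuming the interval is a symmetric Wald interval on the log-odds scale; the published P-values may come from a different test and the inputs are rounded, so small differences are expected. The function name is hypothetical.

```python
import math


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic and two-sided P from an OR and its 95% CI."""
    # Standard error of log(OR), recovered from the width of the log-scale CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Combined (stage 1 + stage 2) dominant-model estimate reported in the text.
z_combined, p_combined = wald_p_from_or_ci(0.89, 0.80, 0.99)
print(round(z_combined, 2), round(p_combined, 3))
```

For the combined estimate this gives a P-value of roughly 0.03, in line with the reported Pdominant, which illustrates how close the upper confidence limit is to 1.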
Earlier studies have shown that different histological subtypes of OC have different genetic and biological backgrounds and are associated with different aetiological pathways. Therefore, we stratified cases by histological subtype and repeated the analyses. In the combined sample set, there were 859 OC cases of the serous histological subtype, 274 endometrioid cases, 192 mucinous cases and 138 clear-cell cases. We found no additional evidence of genetic associations in the serous subtype, but we did find borderline evidence of association for one SNP each in the ERBB2 and PIK3CA genes with endometrioid OC, and for three SNPs each in the BRAF and KRAS genes and one SNP in the NMI gene with mucinous OC. These data are summarised in Table 4.
We performed tests of association with common haplotypes for the five genes. There was no evidence of association with OC risk for haplotypes in KRAS and PIK3CA (Table 5). We found statistically significant haplotype effects for BRAF, ERBB2 and NMI. Two haplotypes in ERBB2 were associated with an increased OC risk: h233 (OR = 1.17 (1.02–1.34), P = 0.022) and h411 (OR = 1.19 (1.03–1.37), P = 0.016). A haplotype in BRAF, h333423241, was associated with a decrease in the risk of OC (OR = 0.81 (0.68–0.95), P = 0.012). Global tests of association were significant for BRAF (P = 0.005) and ERBB2 (P = 0.034). The association observed with the NMI haplotype was fully explained by the single tSNP association.

DISCUSSION
Somatic alterations that activate proto-oncogenes and drive cells towards unregulated proliferation are a well-documented feature of all cancers. It has also become clear that different combinations of oncogenes contribute to the development of different tumour types. BRAF, ERBB2, KRAS and PIK3CA are all oncogenes shown to be involved in OC development, and NMI interacts with the oncogenes NMYC, MYC, MAX and FOS. NMI has also been shown to form a complex with MYC and BRCA1 and therefore may play a role in breast cancer and OC (Li et al, 2002). In this study, we evaluated the association between 34 tSNPs in these genes and the risk of invasive epithelial OC using a case–control study design. To our knowledge, none of these genes has been investigated before for their association with invasive OC. We found borderline evidence for a statistically significant association with disease risk for a tSNP, rs11683487, in intron 1 of the NMI gene. The common allele (G) occurs in the Caucasian population with a frequency of approximately 58% and we observed that the rare allele (T) was associated with a decreased risk of OC. The association for rs11683487 may be a false positive. Where the prior probability of association is low, very stringent significance levels are required to ensure that a detected association is a true positive. Genome-wide significance is generally considered to be P<10⁻⁷ (Thomas et al, 2005). False-positive associations due to population stratification are also possible, but this seems an unlikely explanation for data from multiple studies from different populations in which the analyses were restricted to white subjects.
If the association we identified in NMI is real, then this could be either due to a direct causal effect of the tSNP or because the SNP is in linkage disequilibrium with the true causal variant, possibly in a different gene. NMI enhances the transcription of several other genes altered in OCs (MYC and NMYC) when it is induced by interleukin-2 and interferon-γ. The role of MYC amplification in ovarian and other cancers is well established. A detailed mapping of SNPs at a locus on chromosome 8q, near MYC, has recently provided substantial evidence that this locus is associated with susceptibility to breast, prostate and ovarian cancer (Ghoussaini et al, 2008). A further link comes from the finding that NMI forms a complex with MYC and BRCA1 (Li et al, 2002).
The NMI SNP rs11683487 tags eight other SNPs with r² > 0.8, one of which is a non-synonymous coding SNP (rs1048135) tagged with r² = 1, for which the rare (G) allele codes for leucine instead of serine. We examined whether there is any evidence supporting the role of these SNPs in abrogating NMI function. The programme PMut (Ferrer-Costa et al, 2005) predicted that the rare allele (coding for leucine), with a score of 3/10, had 'pathological significance', and the allele was classed as 'damaging' by the SIFT programme (Cheng et al, 2006). The bioinformatics tool PupaSNP (http://pupasuite.bioinfo.cipf.es/) (Conde et al, 2006; Reumers et al, 2008) also suggested that this allele may disrupt the binding of exonic splicing enhancers. In addition, PupaSNP indicated that rs11683487 and rs11730 may have transcription and translation regulatory functions, and that rs11730 may affect exon splicing.
We found no evidence of association with disease risk for polymorphisms in BRAF, ERBB2, KRAS and PIK3CA at P≤0.05, when OC was considered as a single disease phenotype. The combined sample size from five studies provides 98% power at the 5% significance level to detect a co-dominant allele with a frequency of 0.3 that confers a relative risk of 1.2, and 95% power to detect a dominant allele with a frequency of 0.1 that confers a relative risk of 1.3. It is therefore unlikely that the common tagged variants in these genes contribute significantly to OC risk. However, we cannot rule out the possibility that associations exist for the known poorly tagged variants. With the most recent HapMap data (release 21a), of the 212 common variants, 199 were tagged with r² > 0.8 and 205 with r² > 0.5. Furthermore, even though tSNPs based on HapMap data are likely to tag most of the common SNPs, there is a possibility that other unknown common variants were poorly tagged, or that less common variants in these genes that influence disease susceptibility exist. We must also consider the possibility that common variants within these genes confer susceptibility to specific subtypes of OC. There is evidence in the literature that the genetic changes associated with OC development differ for different histological subtypes (reviewed in Elmasry and Gayther, 2005). For example, somatic activating KRAS mutations are found to some extent in most OC subtypes, but are much more common in mucinous ovarian tumours. Also, germline BRCA1 and BRCA2 mutations tend to predispose to serous OCs (Lakhani et al, 2004). We found some evidence of association with disease risk for different histological subtypes, for SNPs in all five of the oncogenes studied, and it is perhaps interesting that SNPs in the KRAS gene and its downstream effector BRAF were associated with mucinous OC.
However, the sample sizes after subtype stratification meant that these studies had insufficient power to detect associations at stringent levels of statistical significance, and so the data must be treated with caution. Much larger sample sizes, gathered through the Ovarian Cancer Association Consortium (OCAC), will be needed to establish if any of these associations are real.
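The power figures quoted for the stage 1 sample (1816 cases and 3000 controls) can be approximated with a simple two-proportion normal calculation. This is a sketch under stated assumptions, since the text does not say how power was computed: the relative risk is treated as an odds ratio, the co-dominant model is approximated by comparing allele frequencies (two alleles per person, treated as independent), and the dominant model by comparing carrier frequencies. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_power(p_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    p_controls: exposure (allele or carrier) frequency in controls.
    odds_ratio: assumed effect size, used in place of the relative risk.
    """
    # Exposure frequency in cases implied by the odds ratio.
    p_cases = odds_ratio * p_controls / (1 + (odds_ratio - 1) * p_controls)
    se = (p_cases * (1 - p_cases) / n_cases
          + p_controls * (1 - p_controls) / n_controls) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value (far tail ignored).
    return NormalDist().cdf(abs(p_cases - p_controls) / se - z_alpha)


# Co-dominant allele, frequency 0.3, effect 1.2: compare allele counts.
power_codominant = approx_power(0.30, 1.2, 2 * 1816, 2 * 3000)

# Dominant allele, frequency 0.1, effect 1.3: compare carrier frequencies.
carrier_freq = 1 - (1 - 0.10) ** 2
power_dominant = approx_power(carrier_freq, 1.3, 1816, 3000)

print(round(power_codominant, 2), round(power_dominant, 2))
```

Under these assumptions the approximation gives values close to the 98% and 95% stated in the text. Re-running it with a subtype-sized case group (for example, the 192 mucinous cases) shows how sharply power falls after stratification.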
Haplotype analysis identified significant associations in BRAF, ERBB2 and NMI. The association between the NMI haplotype and risk of OC is explained by the single SNP rs11683487. The global test of haplotype effect was statistically significant for BRAF and ERBB2. Interestingly, the two haplotypes in ERBB2 that are significantly associated with increased risk of OC contain the opposite allele at each SNP locus. Using HapMap data we evaluated whether these two putative risk haplotypes in ERBB2 shared an untagged common variant, but this was not the case. It is possible that there is an as-yet-unidentified variant that tags both haplotypes. There may be an allele that is found only in the protective haplotype of BRAF, which was not captured with our genotyping.
This study is one of several in the published literature to use the multi-centre OCAC to follow up on putative susceptibility alleles for OC (e.g., Gayther et al, 2007; Pearce et al, 2008; Ramus et al, 2008). These studies highlight the importance of consortia for validating suggested genetic associations from case–control studies and for identifying novel susceptibility loci for the disease. In addition to dramatically increasing the power of association studies, another role of consortia like the OCAC has been to implement stringent data quality and genotyping guidelines, which are likely to minimise reports of false-positive associations.
In conclusion, we genotyped 34 tSNPs that tag the common variants in BRAF, ERBB2, KRAS, NMI and PIK3CA in OC cases and controls. We found borderline evidence of a statistically significant association with invasive OC for a SNP in NMI and haplotypes in BRAF and ERBB2. Further studies will be needed to confirm if this genetic risk association is real or not.

ACKNOWLEDGEMENTS
We thank all members of the research team, including research nurses, research scientists, data entry personnel and consultant gynaecological oncologists, for their help in establishing the UKOPS case–control collection and Andy Ryan and Jeremy Ford for data and sample management. We also thank Debby Bass, Shari Hutchison, Carlynn Jackson, Jessica Kopsic and

Notes to the haplotype table: CI = confidence interval; MAF = minor allele frequency; OR = odds ratio. In the haplotypes, the numbers correspond to nucleotides: 1 = A, 2 = C, 3 = G, 4 = T. SNP order in haplotypes is 5′–3′ of the genes. BRAF: rs10487888, rs1733832, rs1267622, rs13241719, rs17695623, rs17161747, rs17623382 and rs6944385; ERBB2: rs2952155, rs2952156 and rs1801200; KRAS (block 1): rs12305513, rs12822857 and rs10842508; KRAS (block 2): rs12579073, rs10842513, rs4623993, rs6487464, rs10842514 and rs11047917; NMI: rs394884, rs11551174, rs289831, rs3771886 and rs11683487; PIK3CA: rs2865084, rs7621329, rs1517586, rs2699905, rs7641889, rs7651265, rs7640662 and rs2677760. (a) Compared with common haplotype. Bold text indicates positive results, either by P-value or CI ranges that do not cross 1.00.