Common genetic variability in ESR1 and EGF in relation to endometrial cancer risk and survival

We investigated common genetic variation in the entire ESR1 and EGF genes in relation to endometrial cancer risk, myometrial invasion and endometrial cancer survival. We genotyped a dense set of single-nucleotide polymorphisms (SNPs) in both genes and selected haplotype tagging SNPs (tagSNPs). The tagSNPs were genotyped in 713 Swedish endometrial cancer cases and 1567 population controls and the results incorporated into logistic regression and Cox proportional hazards models. We found five adjacent tagSNPs covering a region of 15 kb at the 5′ end of ESR1 that decreased the endometrial cancer risk. The ESR1 variants did not, however, seem to affect myometrial invasion or endometrial cancer survival. For the EGF gene, no association emerged between common genetic variants and endometrial cancer risk or myometrial invasion, but we found a five-tagSNP region that covered 51 kb at the 5′ end of the gene where all five tagSNPs seemed to decrease the risk of dying from endometrial cancer. One of the five tagSNPs in this region was in strong linkage disequilibrium (LD) with the untranslated A61G (rs4444903) EGF variant, earlier shown to be associated with risk for other forms of cancer.

have, however, concentrated on only a few single-nucleotide polymorphisms (SNPs) in the gene. We aimed to capture the entire common variation in the ESR1 and EGF genes by genotyping a dense set of markers in 92 Swedish controls and then selecting haplotype tagging SNPs (tagSNPs), which were genotyped in 713 Swedish endometrial cancer cases and 1567 Swedish controls. We assessed the association of the tagSNPs with endometrial cancer risk, myometrial invasion and endometrial cancer survival using logistic regression and Cox regression models.

Study population and DNA extraction
Details of the population selection process for this study have been published earlier (Einarsdottir et al, 2007). In brief, 68% (719) of all endometrial cancer cases among women 50-74 years of age identified through the nation-wide cancer registries in Sweden between 1994 and 1995 agreed to participate in this study. During that period, 64% (1574) of the age-frequency matched controls selected from the Swedish Registry of Total Population agreed to participate. Only women with an intact uterus were considered eligible as controls. All participants provided detailed questionnaire information, and histological specimens for all the endometrial cancer cases were reviewed and re-classified by the study pathologist.
Following informed consent, 603 cases and 1574 controls donated whole blood for DNA extraction. For deceased cases and those cases that declined to donate blood but consented to our use of tissue, we collected archived paraffin-embedded, noncancerous tissue samples (n = 116). We extracted DNA from 4 ml of whole blood using the QIAamp DNA Blood Maxi Kit (Qiagen, Solna, Sweden) according to the manufacturer's instructions. From non-malignant cells in paraffin-embedded tissue, we extracted DNA using a standard phenol/chloroform/isoamyl alcohol protocol (Isola et al, 1994). We successfully isolated DNA from 600 (blood) and 116 (tissue) endometrial cancer patients and 1567 (blood) controls. We randomly selected 92 controls out of the 1567 controls to be used for linkage disequilibrium (LD) characterisation and haplotype reconstruction of the ESR1 and EGF genes.
This study was approved by the Institutional Review Boards in Sweden and at the National University of Singapore.

SNP markers and genotyping
The ESR1 gene covers 295.7 kb of genomic sequence on chromosome 6, and EGF spans 99.4 kb on chromosome 4. We selected SNPs in the ESR1 and EGF genes and their 20-kb flanking sequences from dbSNP (build 124) and Celera databases, aiming for an initial marker density of at least one SNP per 5 kb. SNPs were genotyped at the Genome Institute of Singapore using the Sequenom primer extension-based assay (Sequenom, San Diego, CA, USA) and the BeadArray system from Illumina (San Diego, CA, USA) following the manufacturers' instructions. All genotyping plates included positive and negative controls, DNA samples were randomly assigned to the plates, and all genotyping results were generated and checked by laboratory staff unaware of case-control status. Only SNPs where >85% of the samples gave a genotype call were analysed further. As quality control, we genotyped 200 randomly selected SNPs in the 92 control samples using both the Sequenom system and the BeadArray system. The genotype concordance was >99.5%, suggesting high genotyping accuracy and high concordance between the two platforms.
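The call-rate filter and the cross-platform concordance check described above can be illustrated with a minimal sketch. The function names and the genotype encoding (strings such as "AG", with None for a failed call) are assumptions made for illustration and are not taken from the genotyping pipeline used in the study.

```python
from typing import Optional, Sequence

# Hypothetical encoding: a genotype is a string such as "AA" or "AG";
# None marks a sample for which the platform produced no call.
Genotype = Optional[str]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call for one SNP."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def passes_call_rate(calls: Sequence[Genotype], threshold: float = 0.85) -> bool:
    """Keep a SNP only if more than `threshold` of the samples were called."""
    return call_rate(calls) > threshold


def concordance(platform_a: Sequence[Genotype],
                platform_b: Sequence[Genotype]) -> float:
    """Share of identical calls among samples called on both platforms.

    Assumes alleles are written in the same order on both platforms
    (for example always "AG", never "GA").
    """
    pairs = [(a, b) for a, b in zip(platform_a, platform_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sample was called on both platforms")
    return sum(a == b for a, b in pairs) / len(pairs)
```

Samples missing on either platform are excluded from the concordance denominator, which is one common convention; other definitions would give slightly different values.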

LD characterisation and TagSNP selection
We successfully genotyped 228 SNPs in the ESR1 gene and 104 SNPs in the EGF gene in the 92 controls. The SNP names, physical positions, minor allele frequencies (MAF) and the Hardy-Weinberg equilibrium (HWE) P-values have been published earlier as Supplementary Tables 1 and 2 in Einarsdóttir et al (2008). We thereafter identified regions of LD and selected 'tagging' SNPs (tagSNPs). We produced LD plots of the D′ and R² values for ESR1 and EGF using the LDheatmap function in the statistical software R (Team, 2005). The plots have been published earlier as Supplementary Figures 1 and 2 in Einarsdóttir et al (2008). We reconstructed haplotypes for the genes using the PLEM algorithm (Qin et al, 2002) implemented in the tagSNPs program (Stram et al, 2003) and selected tagSNPs based on the R² coefficient, which quantifies how well the tagSNP haplotypes predict the SNPs or the number of copies of haplotypes an individual carries. We chose tagSNPs so that common SNP genotypes (minor allele frequency ≥0.03) and common haplotypes (frequency ≥0.03) were predicted with R² ≥0.8 (Gabriel et al, 2002). The well-studied PvuII (rs2234693), XbaI (rs9340799), codon 243 (rs4986934) and codon 325 (rs1801132) variants in ESR1 had been genotyped earlier in our study subjects, in both cases and controls, and were therefore 'forced' into the selection of tagSNPs. In order to evaluate our tagSNPs' performance in capturing unobserved SNPs within the genes and to assess whether we needed a denser set of markers, we carried out an SNP-dropping analysis (Weale et al, 2003; Iles, 2006). In brief, each of the genotyped SNPs was dropped in turn and tagSNPs were selected from the remaining SNPs so that their haplotypes predicted the remaining SNPs with an R² value of 0.85. We then estimated how well the tagSNP haplotypes of the remaining SNPs predicted the dropped SNP, an evaluation that can provide an unbiased and accurate estimate of tagSNP performance (Weale et al, 2003; Iles, 2006).
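As an illustration of the pairwise LD measures shown in such plots, the following sketch computes D′ and the squared correlation between two biallelic SNPs from phased haplotype frequencies. It uses the standard textbook definitions; it is not the haplotype-based R² coefficient used by the tagSNPs program for tagSNP selection, and the function name and argument names are hypothetical.

```python
from typing import Tuple


def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("both SNPs must be polymorphic")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return d_prime, r2
```

Two SNPs with different allele frequencies can have D′ close to 1 while r² is well below 1, which is why a tagging threshold is usually placed on a correlation-type measure rather than on D′.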

Endometrial tumour characteristics and follow-up
We retrieved information for the endometrial cancer cases on date and cause of death until 31 December 2004 from the Swedish Causes of Death Registry and on date of emigration from the Swedish National Population Registry. Follow-up time began at date of diagnosis and ended on 31 December 2004, or at the date of death or emigration, whichever came first.
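The follow-up rule above can be sketched as a small function. The names are hypothetical, and treating deaths from causes other than endometrial cancer as censored observations is an assumption made for this illustration; the text only specifies when follow-up starts and ends.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2004, 12, 31)


def follow_up(diagnosis: date,
              death: Optional[date] = None,
              emigration: Optional[date] = None,
              endometrial_cancer_death: bool = False,
              end: date = END_OF_FOLLOW_UP) -> Tuple[int, bool]:
    """Return (follow-up time in days, event indicator) for one case.

    Follow-up starts at diagnosis and stops at death, emigration or the
    end of follow-up, whichever comes first. The event indicator is True
    only if follow-up ended with a death due to endometrial cancer
    (assumption: other deaths are censored).
    """
    candidates = [end]
    if death is not None:
        candidates.append(death)
    if emigration is not None:
        candidates.append(emigration)
    exit_date = min(candidates)
    event = (death is not None and death == exit_date
             and endometrial_cancer_death)
    return (exit_date - diagnosis).days, event
```

The resulting pairs of time and event indicator are the usual input to a Cox proportional hazards model.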
Endometrioid endometrial carcinomas constituted the majority of the cancers. The endometrioid tumours were classified according to cell differentiation: Grade I (well-differentiated carcinomas, maximum 5% solid areas); Grade II (moderately differentiated, 6-50% solid areas); and Grade III (poorly differentiated or undifferentiated, more than 50% solid areas). Myometrial invasion was classified as present (at least 50% of the myometrial thickness or through the serosa) or absent (none or <50% of the myometrial thickness).

Statistical analyses
We applied unconditional logistic regression models for assessing the association between ESR1 and EGF tagSNPs and risk of endometrial cancer (case-control analysis) or myometrial invasion (case-only analysis). Adjusting for age (in 5-year age groups) did not affect our results. We estimated the hazard ratio (HR) of death due to endometrial cancer in relation to the genes' tagSNPs using Cox proportional hazards models. The tagSNPs were included as covariates in the models either one at a time or in groups of five (codominant main effects only). The latter method was used for detection of association with haplotypes and is referred to as the 'sliding window' analysis in the main text. Although it does not require resolution of gametic phase, tests based on such models can be powerful within regions of strong LD (Clayton et al, 2004). Likelihood ratio tests were used to generate P-values for comparing models with or without covariates.
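To make the single-tagSNP analysis concrete, the sketch below fits an unadjusted logistic regression with the number of minor alleles as the only covariate and compares it with the intercept-only model by a likelihood ratio test. It is a pure-Python illustration under stated assumptions (grouped genotype counts, no further covariates, a hypothetical function name), not the R or SAS code used for the analyses reported here.

```python
import math
from typing import Sequence, Tuple


def additive_logistic_lrt(cases: Sequence[int],
                          controls: Sequence[int],
                          max_iter: int = 50) -> Tuple[float, float, float]:
    """Per-allele odds ratio and likelihood ratio test for one SNP.

    cases[g] and controls[g] are the numbers of cases and controls
    carrying g copies of the minor allele (g = 0, 1, 2).
    Returns (odds ratio per allele, LRT statistic, P-value on 1 df).
    """
    n_cases, n_controls = sum(cases), sum(controls)

    def log_lik(b0: float, b1: float) -> float:
        total = 0.0
        for g in range(3):
            eta = b0 + b1 * g
            log1pexp = math.log1p(math.exp(eta))
            # log P(case) = eta - log(1 + e^eta); log P(control) = -log(1 + e^eta)
            total += cases[g] * (eta - log1pexp) - controls[g] * log1pexp
        return total

    # The intercept-only model has a closed-form maximum likelihood estimate.
    b0, b1 = math.log(n_cases / n_controls), 0.0
    ll_null = log_lik(b0, 0.0)

    # Newton-Raphson iterations for the two-parameter model.
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for g in range(3):
            n = cases[g] + controls[g]
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            resid = cases[g] - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break

    lrt = max(2.0 * (log_lik(b0, b1) - ll_null), 0.0)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return math.exp(b1), lrt, p_value
```

A sliding-window model would include five such allele-count covariates at once and compare it with the intercept-only model on five degrees of freedom; that extension is not shown here.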
Confounding has been defined as the presence of a common cause to the exposure and the outcome (Hernan et al, 2002). We believe that lifestyle and reproductive endometrial cancer risk factors are unlikely to cause genetic variation and we thus did not adjust for them in the analyses.
We made adjustments to our test results to account for multiplicity. We did so for each outcome (risk, myometrial invasion, and survival) separately. We used a permutation-based approach that controls for the family-wise error rate (probability of rejecting one or more true null hypotheses of no association). Analyses were carried out using the statistical software R (Team, 2005) and the SAS system (release 9.1, SAS Institute Inc., Cary, NC, USA). Table 1 shows selected characteristics of the study participants. Statistically significant differences between cases and controls reflected established associations.
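One way a permutation-based adjustment that controls the family-wise error rate can work is sketched below. The exact procedure used in this study is not detailed in the text, so the single-step maximum-statistic approach, the Armitage trend statistic and all names are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def trend_statistic(genotypes: Sequence[int], status: Sequence[int]) -> float:
    """Armitage trend statistic: N times the squared correlation between
    minor-allele count (0, 1, 2) and case status (1 = case, 0 = control)."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_y = sum(status) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, status))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0
    return n * sxy * sxy / (sxx * syy)


def max_stat_adjusted_p(snps: Sequence[Sequence[int]],
                        status: Sequence[int],
                        n_perm: int = 1000,
                        seed: int = 0) -> List[float]:
    """Family-wise adjusted P-values from the permutation distribution
    of the maximum statistic across all SNPs."""
    rng = random.Random(seed)
    observed = [trend_statistic(g, status) for g in snps]
    exceed = [0] * len(snps)
    labels = list(status)
    for _ in range(n_perm):
        # Shuffling case status keeps the correlation (LD) between SNPs intact.
        rng.shuffle(labels)
        max_stat = max(trend_statistic(g, labels) for g in snps)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add one to numerator and denominator so that no P-value equals zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because case status is permuted jointly for all SNPs, the adjustment accounts for the dependence between neighbouring tagSNPs and is typically less conservative than a Bonferroni correction in regions of strong LD.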

Characteristics of participants
Those cases who participated in our study through tissue sample donation were on average 2.1 years older than the cases who donated a blood sample (P = 0.002). The former group was also more likely to have poorly differentiated (Grade 3) tumours (P = 0.08). As no significant differences in genotype frequencies within Grade 1, Grade 2, and Grade 3 were evident between the two groups of cases (data not shown), this difference is unlikely to be a cause for concern.

Genotyping, LD pattern, and coverage
We selected a dense set of markers in the ESR1 and EGF genes for genotyping in 92 randomly selected controls. We successfully genotyped 228 SNPs in the ESR1 gene and 104 SNPs in the EGF gene in the 92 controls. The SNP names, physical positions, MAF, and HWE P-values have been published earlier as Supplementary Tables 1 and 2 in Einarsdóttir et al (2008). Summary statistics on genotyping results and SNP coverage for the ESR1 and EGF genes are shown in Table 2. Out of the SNPs successfully genotyped in the 92 controls, those SNPs that conformed to HWE and that were at least 3% in MAF were included in our study (Table 2). We thus included 157 SNPs in ESR1 and 54 SNPs in EGF in our study for LD mapping and tagSNP selection. The LD plots from the SNPs included in our study have been published earlier as Supplementary Figures 1 and 2 in Einarsdóttir et al (2008).
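The MAF and HWE inclusion criteria can be illustrated as follows. The text does not state which HWE test or significance threshold was applied, so the one-degree-of-freedom chi-square test and the 0.05 cut-off below are assumptions, as are the function names; with only 92 samples an exact test would often be preferred in practice.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts at a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square goodness-of-fit test (1 df) for HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def include_snp(n_aa: int, n_ab: int, n_bb: int,
                min_maf: float = 0.03, alpha: float = 0.05) -> bool:
    """Keep a SNP with MAF of at least min_maf and no HWE departure at alpha
    (alpha = 0.05 is an illustrative assumption)."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= alpha)
```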
Using the SNP-dropping method (Weale et al, 2003), we assessed the ability of the selected tagSNPs to capture variation in the SNPs we did not genotype in our study. We found that the tagSNPs could efficiently capture non-genotyped SNPs in the genes (Table 2).

Association analyses
We selected 52 tagSNPs in ESR1 and 15 tagSNPs in EGF that could predict the SNPs included in our study and their haplotypes with an R² of at least 0.8. The tagSNPs were genotyped in all cases and controls (see Supplementary Table 1 for numbers as well as MAF and HWE calculations), but seven tagSNPs in ESR1 and one tagSNP in EGF could not be genotyped in the cases who participated through tissue sample donation. The figures also show P-values (Pwin) generated from a sliding window approach where five adjacent tagSNPs were analysed together in a regression model (without resolution of gametic phase).
The single tagSNP analysis indicated a 15-kb region of five tagSNPs, TAG5 (rs3853250) to TAG9 (rs1709181), that were all associated with endometrial cancer risk (Figure 1), although the associations did not withstand multiple testing correction. The sliding window analysis including TAGs 5-9 gave a P-value of 0.072 (Figure 1). This region, which covered intron 1 and intron 2 at the 5′ end of ESR1, included the PvuII (TAG6, OR 0.83, 95% CI 0.74-0.95) and XbaI (TAG7, OR 0.82, 95% CI 0.71-0.94) variants. We show individual genotype associations for the five tagSNPs in Table 3.
We reconstructed haplotypes from TAGs 5-9 and found that five common (>0.03) haplotypes accounted for 95% of the chromosomes. We included expected 'dosages' of the five common haplotypes and the rare haplotypes (combined into a single group) as covariates in a logistic regression model with the most common haplotype as reference (Stram et al, 2003). We found that a haplotype (haplotype 2) carrying all five minor alleles of TAGs 5-9 was associated with endometrial cancer risk (P = 0.0026), compared with haplotype 1, which carried none of the minor alleles (Table 4). This association did not carry over to the global test (P = 0.067).
The strongest signal in relation to endometrial cancer risk (Figure 1) came from TAG12 (rs1033182, OR 0.82, 95% CI 0.72-0.94, P = 0.003), situated 20 kb from TAG9, but this again did not withstand adjustment for multiple testing (P = 0.154). When we carried out the single tagSNP analyses within groups defined by duration of menopausal oestrogen-only use or family history, we found that the protective effect of TAG12 on endometrial cancer risk was to a large extent confined to never users of menopausal oestrogen. Yet, the P-value for interaction did not suggest that the effect of TAG12 depended on oestrogen-only use (P = 0.18).
Neither the single tagSNP analysis nor the window analysis showed an association of common genetic variation in the ESR1 gene with myometrial invasion (Supplementary Figure 1

EGF
Our data did not indicate that common genetic variants in the EGF gene are associated with endometrial cancer risk (Supplementary Figure 3) or myometrial invasion (Supplementary Figure 4). With regard to endometrial cancer survival, the single tagSNP analysis indicated a 51-kb 5′ region of five tagSNPs in EGF, TAG1 (rs718768, HR 0.42, 95% CI 0.22-0.81, P = 0.0038) to TAG5 (rs1024600, HR 0.49, 95% CI 0.26-0.92, P = 0.016), that were associated with the risk of dying from endometrial cancer (Figure 2). The association signals were, however, no longer significant after multiple testing adjustment. The sliding window analysis including TAGs 1-5 gave a P-value of 0.055 for endometrial cancer survival (Figure 2). Table 3 shows individual genotype associations for the five tagSNPs in EGF.
In this region of EGF, 10 kb downstream of TAG1 and 2 kb upstream of TAG2 (rs881878), lies the untranslated A61G (rs4444903) polymorphism. This polymorphism was genotyped in our 92 Swedish controls and was in LD with TAG2 (R² = 0.84). TAG2 was associated with the risk of dying from endometrial cancer with an HR of 0.57 (95% CI 0.33-0.97, P = 0.028). When we reconstructed haplotypes from TAGs 1-5 in EGF, we found that four common haplotypes (>0.03) accounted for 95% of the chromosomes in the cases. We constructed a Cox proportional hazards model including the expected 'dosages' of the common haplotypes and rare haplotypes (combined into a single group) with the most common haplotype as reference (Table 4). A haplotype (haplotype 2) carrying all rare alleles of the five tagSNPs decreased the risk of dying from endometrial cancer (P = 0.006, HR 0.33, 95% CI 0.15-0.73) compared with a haplotype carrying none of the rare alleles. This association did not, however, carry over to the global test (P = 0.10).
Figure 1 Association of the 52 tagSNPs in ESR1 with endometrial cancer risk. Squares and horizontal lines represent odds ratios (change in risk with each addition of the rare allele) and their confidence intervals. Sizes of the squares reflect the minor allele frequencies. P = P-value for an association of each tagSNP with endometrial cancer risk. Pwin = P-value from a model including a window of five tagSNPs (the P-value aligns with the middle tagSNP of each window) for an association with endometrial cancer risk.

DISCUSSION
Using a comprehensive tagging approach of common variation in the ESR1 and EGF genes, we assessed whether common variants in the genes affected endometrial cancer risk, myometrial invasion, or endometrial cancer survival. The ESR1 gene did not seem to significantly affect myometrial invasion or endometrial cancer survival. However, a 5′ region of five tagSNPs in ESR1 decreased endometrial cancer risk in our dataset. For the EGF gene, no association emerged between common variants in the gene and endometrial cancer risk or myometrial invasion, but we found a five-tagSNP region at the 5′ end of the gene where all five tagSNPs seemed to decrease the risk of dying from endometrial cancer. One of the five tagSNPs in this region was in strong LD with the A61G (rs4444903) EGF variant. None of the association signals in ESR1 or EGF withstood multiple testing correction.
Most earlier genetic association studies of the ESR1 gene in relation to endometrial cancer risk have focused on only a few common genetic polymorphisms. Among the most commonly studied variants are the PvuII (TAG6, rs2234693), XbaI (TAG7, rs9340799), codon 243 (TAG14, rs4986934) and codon 325 (TAG21, rs1801132) variants. Two groups that have explored both the PvuII and XbaI variants in relation to endometrial cancer risk (Weiderpass et al, 2000; Iwamoto et al, 2003) found that the variants decreased the risk of the disease. However, a group that investigated the PvuII, codon 243 and codon 325 variants found no association with endometrial cancer risk (Sasaki et al, 2002). In our earlier study of the ESR1 gene, we investigated the PvuII, XbaI, codon 243 and codon 325 variants, and a 5′ promoter microsatellite (rs2234670), and found the PvuII and XbaI variants as well as the microsatellite to be associated with endometrial cancer risk. In the current study, we went on to explore the gene in greater detail and found a 5′ region of five tagSNPs in the ESR1 gene, including the PvuII (TAG6, rs2234693) and XbaI (TAG7, rs9340799) variants, that decreased endometrial cancer risk. It is interesting that, in our earlier publication of the ESR1 gene in relation to breast cancer risk and survival, the same region showed a tendency towards a decreased risk of breast cancer (Einarsdóttir et al, 2008). Nevertheless, it is unlikely that any of the SNPs genotyped within this region affect ESR1 protein structure as none of them were located in exons. It is still a possibility, however, that the SNPs themselves or one or more SNPs in LD with any of them affect the regulation of ESR1 protein expression. In fact, the PvuII variant has been suggested to produce a functional binding site for the transcription factor B-myb (Herrington et al, 2002).
This is the first study to explore the EGF gene in relation to endometrial cancer risk and survival. We found five tagSNPs, one upstream of the gene and the others in the 5′ introns, that were associated with the risk of dying from endometrial cancer. This five-tagSNP region encompassed the 5′ untranslated A61G (rs4444903) polymorphism, which was in high LD with TAG2, one of the five tagSNPs. The A61G variant has been found to affect the risk of malignant melanoma (Shahbazi et al, 2002), hepatocellular carcinoma in patients with cirrhosis (Tanabe et al, 2008), gastric cancer (Hamai et al, 2005; Jin et al, 2007), and glioma (Costa et al, 2007). It has also been suggested to affect malignant melanoma survival (Okamoto et al, 2006) and oesophageal cancer survival (Jain et al, 2007). Furthermore, the 61*G allele has been found to have a significantly more active promoter (Vauleon et al, 2007) and produce more EGF protein than the 61*A allele (Shahbazi et al, 2002; Tanabe et al, 2008).
Our study was a population-based case-control study with a well-defined study base. All participants were Caucasian and born in Sweden between 1919 and 1944, at a time when foreign immigration to Sweden was still rare (Statistics Sweden, 2004), which means population stratification is of limited concern in our study. Cases were ascertained from the nation-wide Swedish Cancer Registries, which contain virtually complete data on incident cancers in Sweden (The Swedish National Board of Health and Welfare, 2006). Information on date and cause of death of the cases was obtained from the Causes of Death Registry in Sweden, which has been found to be highly reliable (Nystrom et al, 1995). It is therefore likely that there is little misclassification of the outcome. Differential misclassification of the exposure is also unlikely to have accounted for our results. Genotyping was carried out using genotyping methods with low error rates, all genotyping plates included positive and negative controls, DNA samples were randomly assigned to the plates, and our genotyping personnel were blinded to case-control status. We also replicated genotype calls for a subset of samples using a separate genotyping method with over 99.5% genotype concordance. One limitation of our study was the relatively low participation rate and the small number of deaths included in the survival analysis. The lack of participation may have been associated with severe disease or death. In an attempt to minimise the problem, we sought to obtain tissue samples from the deceased cases and those cases that had declined donation of a blood sample, and were able to obtain the majority of the samples requested. It is thus unlikely that the relatively low participation was related to genotype, and we presume that the main consequence of the low participation, and hence of the low number of deaths, was decreased power in our study.
Figure 2 Association of the 15 tagSNPs in EGF with endometrial cancer survival. Squares and horizontal lines represent hazard ratios (change in risk with each addition of the rare allele) and their confidence intervals. Sizes of the squares reflect the minor allele frequencies. P = P-value for an association of each tagSNP with endometrial cancer survival. Pwin = P-value from a model including a window of five tagSNPs (the P-value aligns with the middle tagSNP of each window) for an association with endometrial cancer survival.
Another limitation that deserves mentioning is that we were unable to genotype seven tagSNPs in ESR1 and one tagSNP in EGF in the tissue samples. If these eight tagSNPs were associated with severe disease, the association with risk of endometrial cancer death might have been biased towards the null in our study because we could not genotype all the severe cases. None of the eight tagSNPs were actually associated with endometrial cancer survival in our study. The fact that the results were not different when we restricted our analyses to the most severe cases among those who donated blood samples indicates that the eight tagSNPs were truly not associated with severe disease.