Main

Breast cancer is a complex disease, with multiple genetic and environmental factors involved in its etiology. Rare mutations in the DNA repair genes BRCA1 and BRCA2 confer a high lifetime risk of breast cancer (Antoniou et al, 2003) and are routinely screened for in women with a strong family history of the disease. Studies focused on other DNA repair genes have led to the discovery that rare coding variants in CHEK2, ATM, BRIP1 and PALB2 (Swift et al, 1987; Meijers-Heijboer et al, 2002; Seal et al, 2006; Rahman et al, 2007) are associated with moderately increased breast cancer risk. However, few, if any, candidate-gene- or pathway-based association studies have identified convincing associations with breast cancer risk for common genetic variants (The Breast Cancer Association Consortium, 2006). In contrast, empirical genome-wide association studies (GWAS) have proven to be a successful approach to identify common variants associated with small increases in risk, with more than 70 identified in this way to date (Easton et al, 2007; Hunter et al, 2007; Stacey et al, 2007, 2008; Ahmed et al, 2009; Thomas et al, 2009; Zheng et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; Cai et al, 2011; Fletcher et al, 2011; Haiman et al, 2011; Ghoussaini et al, 2012; Siddiq et al, 2012; Bojesen et al, 2013; Garcia-Closas et al, 2013; Michailidou et al, 2013). For the great majority of these associations, the causal variant(s), and even the causal gene, are unknown; thus, the identification of novel candidate genetic susceptibility pathways through this approach is not straightforward.

An intronic variant in the FGFR2 gene was one of the first single-nucleotide polymorphisms (SNPs) identified by GWAS as tagging a breast cancer susceptibility locus (Easton et al, 2007; Hunter et al, 2007). It is now well-established that the minor allele of this SNP is associated with increased risk of breast cancer, particularly estrogen receptor (ER)-positive disease (Garcia-Closas et al, 2008). Fine-mapping of the region has suggested that at least one causal variant is located in intron 2 of FGFR2 (Easton et al, 2007; Udler et al, 2009), and functional studies have proposed that rs2981578 affects FGFR2 expression (Meyer et al, 2008; Udler et al, 2009; Huijts et al, 2011). These findings strongly suggest that FGFR2 is a breast cancer susceptibility gene.

FGFR2 is a fibroblast growth factor (FGF) receptor gene; the amino-acid sequence of the protein it encodes is highly conserved across all FGF receptors. The other FGF receptor genes and other genes acting downstream of them in the FGF pathway may also be implicated in the development of breast cancer, although associations with disease risk have not been assessed comprehensively by a study with adequate sample size to detect odds ratios (ORs) of the magnitude observed for SNPs in FGFR2.

We hypothesised that common variants in other genes in the FGF pathway, and in the other FGF receptor genes in particular, might also confer increased breast cancer risk. The primary aim of our investigation was to comprehensively assess associations between breast cancer risk and common variation in the FGF receptor genes FGFR1, FGFR3, FGFR4 and FGFRL1 by genotyping selected tag-SNPs in the Breast Cancer Association Consortium (BCAC). A secondary objective was to assess common variants in other genes in the FGF pathway based on a two-stage design.

Materials and methods

Participants

Study participants were women from 49 studies participating in BCAC: 38 from populations of predominantly European ancestry, 9 of Asian women and 2 of African–American women (Table 1 and Supplementary Table 1). The majority were population-based or hospital-based case–control studies, but some studies selected subjects based on age or oversampled for cases with a family history or bilateral disease. Cases and controls from the CNIO-BCS were also studied in a previous assessment of selected genes in the FGF pathway. All study participants gave informed consent and each study was approved by the corresponding local ethics committee.

Table 1 Number of cases and controls included, by study

Gene and SNP selection

Ingenuity Pathways Analysis and selected publications (Eswarakumar et al, 2005; Presta et al, 2005; Chen & Forough, 2006; Schwertfeger, 2009) were used to identify genes reported to be involved downstream of the FGF genes in the FGF pathway, particularly those related to angiogenesis. A total of 39 genes, including the FGF receptors FGFR1 (located at 8p11.22), FGFR2 (10q26.13), FGFR3 (4p16.3), FGFR4 (5q35.2) and FGFRL1 (4p16.3), were selected for tagging. Single-nucleotide polymorphisms with minor allele frequency (MAF) >5% in the coding and non-coding regions, and within 5 kb upstream and 5 kb downstream of each gene, were identified using HapMap CEU genotype data and dbSNP 128 as reference. The minimum number of tag-SNPs was then selected from all identified SNPs using Haploview (Barrett et al, 2005) based on the following criteria: r2>0.8 and Illumina assay score >0.60. A total of 384 SNPs tagging the 39 genes were assayed in the CNIO-BCS (Stage 1), 324 of which were successfully genotyped (Supplementary Table 2). The 31 SNPs tagging genes FGFR1, FGFR3, FGFR4 and FGFRL1 were all genotyped in BCAC, along with a further 26 of the 324 tag-SNPs. The latter group comprised SNPs selected based on evidence of association with breast cancer under a log-additive model in the Stage 1 CNIO-BCS. Single-nucleotide polymorphisms in FGFR2 were not considered, as all were included as part of a separate fine-mapping study (Meyer et al, submitted). Results from Stage 1 are summarised in Supplementary Table 2.
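The tag-SNP selection described above can be illustrated with a small greedy pairwise-tagging sketch. This is not Haploview's implementation, and a greedy pass is not guaranteed to find the true minimum set; it only shows what it means for every common SNP to be captured at r2>0.8 by at least one tag. The function name, the SNP labels and the r2 values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(snps: List[str],
                    r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Pick tags so that every SNP is a tag or has r2 > threshold with a tag."""

    def captures(a: str, b: str) -> bool:
        # A SNP always captures itself; unlisted pairs are treated as r2 = 0.
        return a == b or r2.get(frozenset((a, b)), 0.0) > threshold

    untagged: Set[str] = set(snps)
    tags: List[str] = []
    while untagged:
        # Take the SNP that captures the most still-untagged SNPs
        # (ties resolved by input order, since max() keeps the first maximum).
        best = max(snps, key=lambda s: sum(captures(s, u) for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if captures(best, u)}
    return tags


# Hypothetical pairwise r2 values for four SNPs in one gene region.
example_r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpB", "snpC")): 0.85,
    frozenset(("snpA", "snpC")): 0.60,
}
example_tags = greedy_tag_snps(["snpA", "snpB", "snpC", "snpD"], example_r2)
```

In this toy example snpB captures snpA and snpC, and snpD, which is not correlated with any other SNP, has to be genotyped directly.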

Genotyping

Genotyping of the 57 SNPs in the BCAC samples was conducted using a custom Illumina Infinium array (iCOGS) in four centers, as part of a multi-consortia collaboration (the Collaborative Oncological Gene–Environment Study, COGS) as described previously (Michailidou et al, 2013). Genotypes were called using Illumina’s proprietary GenCall algorithm.

For the genotyping of the 384 SNPs in the Stage 1 CNIO-BCS, genomic DNA was isolated from peripheral blood lymphocytes using automatic DNA extraction (MagNA Pure, Roche Diagnostics, Indianapolis, IN, USA) according to the manufacturer’s recommended protocols. This DNA was quantified using Picogreen (Invitrogen, Life Technologies, Grand Island, NY, USA) and for each sample a final quantity of 250 ng was extracted and used for GoldenGate genotyping with VeraCode Technology (Illumina Inc., San Diego, CA, USA). Samples were arranged on 25 96-well plates containing one negative control and at least one study sample in duplicate. Three Centre d’Etude du Polymorphisme Humain (CEPH) trios were used as internal intra- and inter-plate duplicates and to check for Mendelian segregation errors. DNA was extracted, quantified, plated and genotyped at the Spanish National Genotyping Centre (CeGen), Madrid, Spain. All genotypes were determined for each SNP and each plate using manual clustering. Single-nucleotide polymorphisms with call rate <90% were excluded, as were samples with no-calls for more than 20% of included SNPs.

Statistical methods

For each SNP, we estimated ORs and 95% confidence intervals (CIs) using unconditional logistic regression. For the analysis of BCAC data, we considered per-allele and co-dominant models using common-allele homozygotes as reference and including study and ethnicity-specific principal components as covariates, as previously described (Michailidou et al, 2013). Departure from the Hardy–Weinberg equilibrium (HWE) was tested for in controls from individual studies using the genhwi module in STATA 11.2 (College Station, TX, USA). A study-stratified χ2 test (1df) was applied across studies (Haldane, 1954; Robertson & Hill, 1984). Between-study heterogeneity in ORs was assessed for each of the three broad racial groups using the metan command in STATA to meta-analyse study-specific per-allele log-OR estimates and generate I2 statistics; values greater than 50% were considered notable (Higgins & Thompson, 2002). Odds ratios specific to disease subtypes defined by ER, PR and HER2 status (positive and negative) were estimated separately for each ethnic subgroup using polytomous logistic regression with control status as the reference outcome. Differences in ORs by disease subtypes were assessed using a likelihood ratio test (LRT). All statistical tests were two-sided.
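As an illustration of the heterogeneity measure used above, the following sketch computes Cochran's Q and the I2 statistic of Higgins & Thompson (2002) from study-specific per-allele log-OR estimates and their standard errors. It assumes inverse-variance weights and is not the metan implementation; the function name and the input values are hypothetical.

```python
from typing import List, Tuple


def i_squared(log_ors: List[float],
              std_errs: List[float]) -> Tuple[float, float, float]:
    """Return (pooled log-OR, Cochran's Q, I2 in percent)."""
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2 is the share of Q in excess of its expectation (df), floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Two hypothetical studies with equal precision and different log-ORs.
pooled, q, i2 = i_squared([0.0, 0.2], [0.1, 0.1])
```

Under the convention stated above, an I2 value above 50% for a SNP would be flagged as notable between-study heterogeneity.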

The effective number of independent SNPs (VeffLi) was estimated using the method described by Li & Ji (2005). This method was applied via the web-interface matSpDlite (http://gump.qimr.edu.au/general/daleN/matSpDlite/), based on the observed correlations between SNPs (Nyholt, 2004). VeffLi was then used to calculate a Bonferroni-corrected significance threshold (α*). Power calculations were carried out using Quanto v1.2.4 (http://hydra.usc.edu/gxe/).
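The calculation behind VeffLi and α* can be sketched as follows. In the Li & Ji (2005) method, each eigenvalue λ of the SNP correlation matrix contributes I(|λ|≥1) plus the fractional part of |λ|. The sketch takes the eigenvalues as given (matSpDlite derives them from the observed correlations), and it assumes a nominal α of 0.05, which is not stated explicitly in the text. The function names are hypothetical.

```python
import math
from typing import Iterable


def effective_number_li_ji(eigenvalues: Iterable[float]) -> float:
    """Effective number of independent tests from correlation-matrix eigenvalues."""
    total = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        # Indicator term plus fractional part, as in Li & Ji (2005).
        total += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return total


def bonferroni_threshold(alpha: float, n_effective: float) -> float:
    """Bonferroni-corrected per-test significance threshold."""
    return alpha / n_effective


# Two SNPs with correlation 0.5 have eigenvalues 1.5 and 0.5.
two_snp_example = effective_number_li_ji([1.5, 0.5])
# Threshold for the effective number reported in this study (assumed alpha = 0.05).
alpha_star = bonferroni_threshold(0.05, 23)
```

With an effective number of 23, 0.05/23 rounds to 0.0022, which agrees with the α* reported in the Results.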

Single-nucleotide polymorphism imputation

The genotypes of untyped SNPs were imputed based on data from the March 2012 release of the 1000 Genomes Project using IMPUTE v2.2. These were converted to allele doses using the impute2mach function in the GenABEL library in R (Aulchenko et al, 2007) and analysed under a per-allele model. Imputed SNPs with an estimated MAF <5% were excluded, as were SNPs with an imputation r2<0.80.
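For readers unfamiliar with allele doses, the sketch below shows the arithmetic in general terms: a dose is the expected count of one allele given the three imputed genotype probabilities, and the two exclusion criteria are then applied per SNP. This is not the impute2mach function; the function names, the ordering of the probabilities and the example values are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def allele_dose(probs: Tuple[float, float, float]) -> float:
    """Expected count of allele B from (P(AA), P(AB), P(BB))."""
    _, p_het, p_hom = probs
    return p_het + 2.0 * p_hom


def passes_filters(doses: Sequence[float], imputation_r2: float,
                   min_maf: float = 0.05, min_r2: float = 0.80) -> bool:
    """Keep a SNP only if estimated MAF >= 5% and imputation r2 >= 0.80."""
    # Allele frequency is the mean dose divided by two alleles per person.
    freq = sum(doses) / (2.0 * len(doses))
    maf = min(freq, 1.0 - freq)
    return maf >= min_maf and imputation_r2 >= min_r2


# One hypothetical individual with uncertain imputed genotype.
example_dose = allele_dose((0.1, 0.3, 0.6))
```

Under a per-allele model, the dose takes the place of the 0, 1 or 2 allele count used for directly genotyped SNPs.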

Results

All SNPs in the present analysis had overall call rates >95%. Very strong evidence of departure from HWE was observed for rs34869253 for one study (pKarma, P=3.3 × 10−21), which was excluded from the subsequent analyses of that SNP. After quality control, there were data available for 53 835 cases and 50 156 controls from BCAC, including 89 050 European women (46 450 cases and 42 600 controls), 12 893 Asian (6269 cases and 6624 controls) and 2048 African–American women (1116 cases and 932 controls) (Table 1).

Results from the analysis of the 31 tag-SNPs in FGFR genes for white Europeans are summarised in Table 2. No strong evidence of association was observed, although one SNP (rs743682) in FGFR3 (MAF=9%) was marginally significant after correction for multiple testing based on a VeffLi of 23 (per-allele OR=1.05, 95% CI=1.02–1.09, P=0.0020, α*=0.0022). All SNPs with an associated P-value <0.05 were intronic, with the exception of rs1966265, which is a missense variant in FGFR4. However, PolyPhen (http://genetics.bwh.harvard.edu/pph2/) predicts this amino acid change to be benign, with a score of 0.000. On the basis of ENCODE data, no SNP with an associated P-value <0.05 was located in a region involved or predicted to be involved in epigenetic regulation, nor at, or within 2 kb of, a CpG island. For European women, we did not observe any evidence of between-study heterogeneity for any SNPs (I2≤19%; P≥0.15) and little evidence of differential associations by disease subtypes defined by ER (P≥0.036), PR (P≥0.084) or HER2 status (P≥0.019).

Table 2 Summary results for SNPs in FGF receptor genes for white European women

We similarly observed little evidence of association with overall breast cancer risk in Asian and African–American women (Supplementary Tables 3 and 4, respectively). Nevertheless, a consistent result was observed for Europeans and Asians for rs1966265 in FGFR4. The estimated OR per risk (G) allele was 1.03 (95%CI=1.01–1.05; P=0.0060) for European women and 1.08 (95%CI=1.03–1.14; P=0.0036) for Asian women. There was no evidence of heterogeneity by race for any of the 31 SNPs in FGF receptors (I2=18%; P=0.14).

The SNPs genotyped were estimated to capture a variable proportion of the common variation in the four genes considered, as described in the 1000 Genomes Project; at r2≥0.80, this coverage was 75% for FGFR1, 77% for FGFR3, 66% for FGFR4 and 17% for FGFRL1. This coverage was dramatically improved with the inclusion of imputed common SNPs (with imputation r2>0.80) to 95%, 93%, 97% and 84% for FGFR1, FGFR3, FGFR4 and FGFRL1, respectively. No stronger evidence of association was observed for any imputed SNPs (Supplementary Tables 5–8).

Finally, we observed little evidence of association for any of the 26 SNPs in other genes in the FGF pathway, selected based on results from Stage 1 (Supplementary Table 9). The results were consistent across the three ethnic groups considered and for disease subtypes defined by ER, PR and HER2 expression.

It is noteworthy that strong association signals were observed in the Stage 1 Spanish study for selected tag-SNPs rs10736303 (MAF=0.49; per-allele OR=1.37, 95% CI=1.21–1.55, P=2.8 × 10−7), and rs2981582 (MAF=0.40; per-allele OR=1.35, 95% CI=1.19–1.53, P=8.3 × 10−7), both in FGFR2.

Discussion

In this multicentre case–control study, we comprehensively assessed common variation in the FGF receptor genes FGFR1, FGFR3, FGFR4 and FGFRL1 in 53 835 cases and 50 156 controls and found little evidence of association with risk of breast cancer. This is the largest study we know of assessing a family of genes via a candidate approach based on the findings from GWAS.

A non-trivial issue in analyses of this kind is the establishment of a statistical significance threshold that adequately controls the proportion of false-positive findings. As permutation-testing was not feasible due to the sample size and number of dummy variables required to adjust for study, we dealt with the issue of non-independence of multiple tests by estimating that the 31 tag-SNPs represented an effective number of 23 independent variables, and applying a Bonferroni correction accordingly. The association of one SNP (rs743682) in FGFR3 for European women was found to be statistically significant on this basis. However, the P-value threshold applied is somewhat questionable in the context of the total of more than 70 000 SNPs nominated for genotyping by BCAC and the total 210 000 genotyped on the iCOGS array. Thus, the current result is far from genome-wide statistical significance and certainly requires independent replication. In any case, the per-allele ORs for FGFR3_rs743682 (1.05, 95% CI=1.02–1.09) and FGFR4_rs1966265 (1.03, 95% CI=1.01–1.05) appear to be substantially lower than that for rs2981582 in FGFR2 (1.26, 95% CI=1.23–1.30) (Easton et al, 2007).

We estimated that for common SNPs (MAF >0.05) associated with overall breast cancer risk in European women, we had greater than 99% power to detect at genome-wide statistical significance (P<5 × 10−8) a per-allele OR as low as 1.23 (the lower 95% confidence limit for the OR for FGFR2_rs2981582). For a per-allele OR as low as 1.05 and for SNPs with MAF of 0.10, 0.20 and 0.30, the estimated power was 1%, 10% and 24%, respectively. That is, our study provides strong evidence that common variation in FGFR1, FGFR3, FGFR4 and FGFRL1 is not associated with breast cancer risk to the degree observed for SNPs in FGFR2, although associations of smaller magnitude may exist.

The hypothesis underlying our study was based on the identification of a functional SNP in intron 2 of FGFR2 associated with breast cancer susceptibility (Easton et al, 2007; Meyer et al, 2008; Udler et al, 2009; Huijts et al, 2011). A recent study has subsequently identified three independent risk signals within FGFR2, and uncovered likely causal variants and functional mechanisms behind them (Meyer et al, 2013). Although an association between these SNPs and expression of FGFR2 has not been established, these results provide strong evidence that FGFR2 is the target gene, and it therefore seems plausible that other FGF receptors or genes acting in the FGF pathway might also be implicated in breast cancer risk. However, we find little evidence that this is the case for the receptors, at least not to the extent observed for common variants in FGFR2. Admittedly, the degree to which common variation in the FGF receptor genes was tagged by the genotyped SNPs was good for FGFR1, FGFR3 and FGFR4 and poor for FGFRL1, but substantial improvement was afforded by imputation. Nevertheless, it is possible that common variation not captured by the genotyped or imputed SNPs may be associated with breast cancer risk. It is also possible that these genes may be implicated in disease susceptibility via regulatory mechanisms involving variants outside the chromosomal boundaries defined for each gene considered. That said, few studies have assessed common variation in candidate genes to this extent, in terms of both gene coverage and sample size.

The power of our study was much lower for Asian and African–American women; however, our primary focus on European women is consistent with our hypothesis, based on the previous finding in FGFR2 in this population. Our study was also limited by the power and gene coverage of the Stage 1 component, which assessed tag-SNPs in the selected genes of the FGF pathway. Therefore, no conclusions can be drawn about the potential implication of common variation in these genes in breast cancer susceptibility. Nevertheless, we checked the chromosomal locations of the 76 established risk-associated loci (http://www.nature.com/icogs/primer/shared-susceptibility-loci-for-breast-prostate-and-ovarian-cancers/) and found that none was located within 10 kb of any of the 39 genes considered, with the exception of the FGFR2 locus.

In conclusion, in this, possibly the largest candidate-gene association study carried out to date, we have observed little evidence of association between common variation in the FGFR1, FGFR3, FGFR4 and FGFRL1 genes and risk of breast cancer. Our results suggest that common variants in these FGF receptors are not associated with risk of breast cancer to the degree observed for FGFR2.