Main

Worldwide, breast cancer is the most common cancer in women with an estimated 1.7 million cases and 521 900 deaths in 2012 (Torre et al, 2015). The heritability of breast cancer, that is, the proportion of the phenotypic variance that can be attributed to the germline genotype, has been estimated to be about 30% (Lichtenstein et al, 2000; Mucci et al, 2013). Furthermore, women with a first-degree relative affected by breast cancer have a twofold higher risk of developing the disease (Pharoah et al, 1997). Known genetic factors contributing to a higher lifetime risk of breast cancer comprise rare variants with moderate to high penetrance in BRCA1, BRCA2, PALB2, ATM and CHEK2, as well as over 90 common variants with low penetrance (Michailidou et al, 2015). Taken together, these variants explain about 37% of the excess familial risk (Michailidou et al, 2015) (Figure 1).

Figure 1. Proportions of familial risk of breast cancer explained by hereditary variants.

Apart from genetic factors, there are various well-established environmental factors (including endogenous and exogenous exposure to hormones and other substances, and lifestyle factors) associated with breast cancer risk. These include reproductive factors such as age at menarche, age at first birth, parity, breastfeeding and age at menopause. Reproductive factors are generally considered to be non-modifiable risk factors, that is, they cannot be controlled or changed through public health measures. Other risk factors, such as extended use of menopausal hormone therapy, high alcohol consumption, lack of physical activity and high body mass index (BMI), are considered modifiable.

Women carrying a certain genetic variant may be at greater risk for breast cancer, if a related environmental factor is present or absent. Such gene–environment interactions have been identified for bladder cancer, where smokers carrying variants in two carcinogen-metabolising genes (NAT2 and GSTM1) are at higher risk of disease compared with non-smokers (Garcia-Closas et al, 2005), as well as for oesophageal squamous cell carcinoma, where alcohol users carrying variants in the alcohol-metabolising pathway (ADH1B and ALDH2) are at higher risk compared with non-users (Wu et al, 2012).

General concept of gene–environment interaction studies

Knowledge of gene–environment interaction is important for risk prediction and the identification of certain high-risk populations to inform public health strategies for targeted prevention. Gene–environment interaction studies may also help to discover novel genetic risk factors; for example, variants in NAT2 and GSTM1 might not have been identified as relevant genetic risk factors if their association with bladder cancer risk had been evaluated in a population of non-smokers. In the same way, novel environmental risk factors might be identified when the environmental exposure is a mixture of several components (Hunter, 2005). For example, when whole-body fat distribution, measured with new techniques such as MRI, is investigated as a breast cancer risk factor, gene–environment interactions may help to determine whether different adipose tissue compartments, such as subcutaneous or visceral fat, are likely to increase or decrease breast cancer risk. Gene–environment interaction studies may also be used to gain insight into the biological mechanisms underlying an association between a risk factor and the disease of interest. In the bladder cancer example, gene–environment interaction studies pointed to aromatic amines as the most important bladder carcinogen in tobacco smoke.

To investigate gene–environment interaction, information on both components of the relationship has to be available. Most often, gene–environment interaction is evaluated on the multiplicative scale, that is, investigators assess whether the joint effect measured as relative risks or odds ratios of the genetic and the environmental factor is significantly greater or smaller than would be expected by multiplying the relative risks for the genetic or environmental factor alone. However, gene–environment interaction can also be assessed as departure from the additive model, when looking at risk differences rather than ratio measures.
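The two scales can be contrasted with a minimal sketch. It assumes three relative risks, each measured against the group with neither the genetic nor the environmental factor; the function name and the example values are illustrative and not taken from any study cited here. Departure from additivity is expressed as the relative excess risk due to interaction (RERI), a standard measure that is not discussed further in this review.

```python
def interaction_measures(rr_g, rr_e, rr_ge):
    """Summarise interaction on the multiplicative and additive scales.

    rr_g  : relative risk for the genetic factor alone
    rr_e  : relative risk for the environmental factor alone
    rr_ge : relative risk for both factors together
    All three are relative to the doubly unexposed group.
    """
    # Multiplicative scale: observed joint effect divided by the product
    # of the single-factor effects (1.0 means no interaction).
    ratio = rr_ge / (rr_g * rr_e)
    # Additive scale: relative excess risk due to interaction
    # (0.0 means no interaction).
    reri = rr_ge - rr_g - rr_e + 1.0
    return ratio, reri


# Hypothetical values chosen so that the joint effect is exactly multiplicative.
ratio, reri = interaction_measures(rr_g=1.2, rr_e=1.5, rr_ge=1.8)
print(f"multiplicative ratio = {ratio:.2f}, RERI = {reri:.2f}")
```

In this hypothetical case the joint effect equals the product of the single effects, so there is no interaction on the multiplicative scale, yet the RERI is positive. Absence of interaction on one scale therefore does not imply absence on the other.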

One of the biggest challenges is assembling a sample large enough to investigate gene–environment interaction with sufficient power. As a general rule of thumb, studies aiming to evaluate departure from multiplicativity have to be at least four times the size needed to investigate effects of the genetic or environmental factor alone (Smith and Day, 1984). This review therefore focuses on studies analysing a large sample from a single study, and on studies in which data from multiple studies were pooled to achieve greater power.

Studies assessing the potential modification of known breast cancer risk variants by environmental factors

Several studies have investigated whether breast cancer risk associated with the common susceptibility variants might be modified by environmental factors. Here the investigators formally tested for gene–environment interaction on the multiplicative scale. Details on the studies can be found in Table 1. Prior to the large studies, a number of smaller studies investigated possible interactions between the known susceptibility variants in FGFR2 and menopausal hormone therapy and yielded inconclusive results (Travis et al, 2010). Initially, studies included a limited set of single-nucleotide polymorphisms (SNPs), which had been identified by the first genome-wide association studies (GWAS) as being associated with breast cancer risk.

Table 1 Overview of studies investigating gene–environment interaction for breast cancer risk in the general population

Travis et al (2010) investigated gene–environment interactions between 12 SNPs and several risk factors, analysing a sample of 7610 breast cancer cases and 10 196 controls from the Million Women Study. After accounting for the number of tests performed, none of the gene–environment interactions investigated reached statistical significance (Travis et al, 2010). The study of Milne et al (2010), which was based on data from up to 26 349 breast cancer cases and 32 208 controls from 21 case–control studies participating in the Breast Cancer Association Consortium (BCAC), confirmed the absence of significant gene–environment interactions after accounting for multiple testing. This study investigated 12 SNPs, 9 of them overlapping with those included in the study by Travis et al (2010) (Milne et al, 2010). An investigation in 8576 breast cancer cases and 11 892 controls nested within the Breast and Prostate Cancer Cohort Consortium (BPC3) also found no evidence for gene–environment interactions between 17 common susceptibility loci and 9 environmental factors investigated (Campa et al, 2011).

The second gene–environment interaction study within BCAC (Nickels et al, 2013) assessed 11 additional newly identified susceptibility SNPs (23 SNPs in total) using a larger sample size than before, including up to 34 793 breast cancer cases and 41 099 controls. Additional environmental risk factors were also investigated, such as use of menopausal hormone therapy and alcohol consumption. This study replicated two potential interactions that had previously been reported by Travis et al (2010) and Milne et al (2010) but deemed statistically non-significant. In the Million Women Study, the per-allele odds ratio (OR) for CASP8-rs1045485 was 0.99 (95% confidence interval (CI) 0.92–1.07) in women who reported consuming on average less than one drink per day and 1.23 (95% CI 1.09–1.38) in those who reported consuming one or more drinks per day (P-value for interaction=0.003; Travis et al, 2010). In BCAC, CASP8-rs1045485 was also associated with an increased breast cancer risk in women who consumed at least 20 g of alcohol per day (per-allele OR 1.45, 95% CI 1.14–1.85). In women who consumed less, an inverse association was present (per-allele OR 0.91, 95% CI 0.84–0.98, P-value for interaction=0.0003). The second confirmed interaction was between LSP1-rs3817198 and number of live births. In Milne et al (2010), the association between LSP1-rs3817198 and breast cancer was strongest in women who had had at least four live births (per-allele OR 1.24, 95% CI 1.11–1.38) and weaker for those with fewer births, with the weakest association in women with only one live birth (per-allele OR 1.04, 95% CI 0.97–1.11, P-value for interaction=0.002). The per-allele ORs for LSP1-rs3817198 were very similar in the larger data set analysed by Nickels et al (2013), with per-allele ORs of 1.26 (95% CI 1.16–1.37) and 1.08 (95% CI 1.01–1.16), respectively (P-value for interaction=2.4 × 10⁻⁶). When the data set was restricted to 6266 cases and 3899 controls not included in the previous report, the interaction remained significant (P-value for interaction=0.002). Furthermore, Nickels et al (2013) reported a novel potential interaction between 1p11.2-rs11249433 and ever being parous (P-value for interaction=5.3 × 10⁻⁵), with a per-allele OR of 1.14 (95% CI 1.11–1.17) in parous women and 0.98 (95% CI 0.92–1.05) in nulliparous women.
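To illustrate how stratum-specific per-allele ORs relate to a P-value for interaction, the following sketch compares two reported ORs with a simple z-test on the difference of their logarithms. This is an approximation built only from the published point estimates and confidence intervals, and it assumes that the two strata are independent; it is not the method used in the cited studies, which fitted interaction terms to individual-level data. The function names are illustrative.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate the standard error of a log OR from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_strata(or_a, ci_a, or_b, ci_b):
    """Return (ratio of ORs, z statistic, two-sided P-value) for stratum b vs a."""
    diff = math.log(or_b) - math.log(or_a)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(diff), z, p


# CASP8-rs1045485 in the Million Women Study, as quoted above:
# less than one drink per day vs one or more drinks per day.
ratio, z, p = compare_strata(0.99, (0.92, 1.07), 1.23, (1.09, 1.38))
print(f"ratio of ORs = {ratio:.2f}, z = {z:.2f}, P = {p:.4f}")
```

With these inputs the approximate P-value should come out at about 0.002, the same order of magnitude as the reported P-value for interaction of 0.003. Small differences are expected because the inputs are rounded and the original analysis was adjusted for covariates.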

The reported interactions were followed up in another study conducted within BPC3 (Barrdahl et al, 2014). The study comprised up to 16 285 breast cancer cases and 19 376 controls and investigated 39 SNPs. A meta-analysis with results for 19 SNPs reported by BCAC was also included. The two interactions confirmed by Nickels et al (2013), between CASP8-rs1045485 and alcohol consumption and between LSP1-rs3817198 and number of live births, were not replicated because results were inconsistent between BCAC and BPC3. The OR for interaction between LSP1-rs3817198 and number of live births in BPC3 was 0.97 (95% CI 0.94–1.00), whereas in BCAC a positive departure from a multiplicative model was observed (OR for interaction=1.06 (95% CI 1.04–1.09)). Similarly, no interaction between CASP8-rs1045485 and alcohol consumption was found in BPC3 (OR for interaction=0.96 (95% CI 0.81–1.15)), whereas in BCAC the joint risks of alcohol intake and the CASP8 variant were super-multiplicative (OR for interaction=1.59 (95% CI 1.24–2.05)).

For the interaction between 1p11.2-rs11249433 and ever being parous, the OR for interaction in BPC3 was in the same direction as in BCAC (OR for interaction BCAC=1.16 (95% CI 1.08–1.24); OR for interaction BPC3=1.07 (95% CI 0.95–1.20); OR for interaction META=1.13 (95% CI 1.07–1.20)). Here the lack of replication could be attributed to the smaller sample size of the study by Barrdahl et al (2014) in BPC3 compared with the study by Nickels et al (2013) in BCAC.
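The pooled estimate can be approximated from the two published interaction ORs with a fixed-effect, inverse-variance meta-analysis. The sketch below is an illustration under that assumption; the cited study may have used a different weighting or model, and the function name is hypothetical.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of ORs.

    estimates: iterable of (odds_ratio, ci_lower, ci_upper) with 95% CIs.
    Returns (pooled OR, lower 95% limit, upper 95% limit).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in estimates:
        # Standard error of the log OR, recovered from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * pooled_se),
        math.exp(pooled_log_or + z * pooled_se),
    )


# Interaction ORs for 1p11.2-rs11249433 and parity, as quoted above.
bcac = (1.16, 1.08, 1.24)
bpc3 = (1.07, 0.95, 1.20)
pooled, lower, upper = pool_fixed_effect([bcac, bpc3])
print(f"pooled OR = {pooled:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The result should lie close to the reported meta-analysis estimate of 1.13 (95% CI 1.07–1.20). Because BCAC contributes the narrower confidence interval, it receives the larger weight, which is why the pooled value sits nearer to the BCAC estimate.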

The heterogeneity in results for CASP8-rs1045485 and LSP1-rs3817198 between BCAC and BPC3 is unlikely to be explained by differences in the genetic make-up of the populations, since both were restricted to Caucasians and the minor-allele frequencies were comparable between BCAC and BPC3 samples. Although self-reports of the number of live births and other parity-related factors are generally accurate (Olson et al, 1997), this is often less the case for alcohol intake (Thiebaut et al, 2007). This may be one reason for the lack of replication of the interaction between CASP8-rs1045485 and alcohol consumption, since misclassification biases the multiplicative interaction parameter towards the null (Garcia-Closas et al, 1999). Apart from that, many other sources of heterogeneity between the included studies in BCAC and BPC3 could potentially lead to inconsistent results, for example, differences in ranges of exposure levels, the time period in which recruitment took place (e.g. the 1980s or the early 2000s), and study design. Also, the interaction between LSP1-rs3817198 and number of pregnancies was modelled similarly in BCAC and BPC3, but this was not the case for the interaction between CASP8-rs1045485 and alcohol consumption. In BCAC, alcohol consumption was modelled as a binary variable (<20 g per day versus ≥20 g per day; Nickels et al, 2013), whereas in BPC3 alcohol consumption was modelled as a linear continuous variable (in g per day) (Barrdahl et al, 2014).

Further efforts in BCAC led to the identification of 41 novel susceptibility variants for breast cancer (Michailidou et al, 2013). Potential modifications by environmental factors of the associations of these 41 variants as well as 6 additional variants associated with oestrogen receptor negative (ER−) breast cancer were evaluated in another study within BCAC (Rudolph et al, 2015). The data set was fairly large for some of the investigated gene–environment interactions (26 633 breast cancer cases and 30 119 controls) and, as can be seen from post hoc power calculations presented in Table 1, the power was sufficient to detect moderately sized gene–environment interactions. However, the interactions identified were small, so that none of them reached statistical significance after accounting for the number of tests performed (Rudolph et al, 2015). The power presented in Table 1 for the different studies is calculated for an assumed minor-allele frequency of 0.35. Figure 2 illustrates that power varies tremendously in gene–environment interaction studies depending on the frequency of the minor allele of the SNP investigated.

Figure 2. Power for detecting gene–environment interaction given different allele frequencies. Power was calculated with Quanto 1.2.4, assuming a log-additive inheritance mode, a population prevalence of disease of 1%, an OR of 1.10 for the marginal association between SNP and disease, an OR of 1.20 for the marginal association between environmental factor and disease, a prevalence of the environmental factor of 0.15, a sample of 10 000 unmatched case–control pairs and a two-sided alpha of 5 × 10⁻⁶.

Despite power issues, studies of gene–environment interaction could contribute towards gaining insight into the biological mechanisms underlying modifying effects of susceptibility alleles with respect to breast cancer risk. In Nickels et al (2013), all findings with a P-value for interaction below 10⁻⁴ were reported, including that between 2q35-rs13387042 and current combined oestrogen/progestagen menopausal hormone therapy use, which was restricted to oestrogen-receptor-positive disease. At that time, the gene involved at this locus was still unknown. The closest known genes were TNP1 (transition protein 1), IGFBP5 (insulin-like growth-factor-binding protein 5), IGFBP2 (insulin-like growth-factor-binding protein 2) and TNS1 (tensin 1/matrix-remodelling-associated protein 6). The authors wrote that ‘observed effect modification would suggest that the gene involved may be responsive to steroid hormones’ (Nickels et al, 2013), which would have implicated IGFBP5 or IGFBP2. Indeed, fine mapping of the 2q35 locus and functional studies led to the recognition that the G-allele downregulates IGFBP5 (Ghoussaini et al, 2014).

Interaction effects may be different for rarer variants with higher penetrance. A number of studies have thus assessed the effects of environmental factors on cancer occurrence among BRCA1/2 mutation carriers (e.g. Brohet et al, 2007). No large studies, that is, with at least 10 000 cases, have investigated the interactions between mutations in BRCA1/2 or other high-risk genes and environmental factors. Therefore, gene–environment interaction was not formally investigated as departure from a multiplicative or additive model. One meta-analysis of relatively small studies was published, in which modifiers of cancer risk in BRCA1 or BRCA2 mutation carriers were reviewed (Friebel et al, 2014). Late age at first birth was identified as a probable factor modifying the risk of BRCA1 mutation carriers. A first birth at age 30 years or older was associated with a 35% (pooled estimate 0.65, 95% CI 0.42–0.99) reduction in risk of breast cancer compared with a first birth at an age younger than 30 years (Friebel et al, 2014). Friebel et al (2014) mention two possible explanations for this observation: being older when giving birth for the first time may have differential effects in women carrying a BRCA1 mutation compared with women without a BRCA1 mutation, or the observed associations may be due to risk-reducing oophorectomy or bias in the ascertainment. In the general population, an older age at first birth is associated with an increased breast cancer risk (Reeves et al, 2009). Thus, if the associations between age at first birth and breast cancer risk are unbiased and differ depending on the presence of a BRCA1 mutation, then gene–environment interaction is present. However, data to assess the modification of breast cancer risk associated with rare variants like BRCA1 and BRCA2 are sparse and there are even fewer data for other rare variants like CHEK2. Whether gene–environment interactions are more common or stronger with rarer genetic variants has to be shown in future investigations.

To summarise, the associations between the susceptibility loci known today and breast cancer risk are not likely to be strongly modified by environmental factors. The potential gene–environment interactions that have been identified are of small to moderate magnitude. With interaction effects of this size, differences between additive and multiplicative interactions are subtle, as departures from either the multiplicative or additive model are minimal. A simulation study by Aschard et al (2012) showed that the inclusion of gene–gene and gene–environment interactions with small effects (relative risks between 0.5 and 2.0) is unlikely to strongly improve risk prediction for breast cancer. Apart from that, for most breast cancer susceptibility loci the causal variants as well as the underlying biological mechanisms have still to be discovered.

It has recently been shown that combining small effects from common genetic susceptibility loci into a genetic risk score helps to discriminate between women with low and high breast cancer risk (Mavaddat et al, 2015). In a study by Li et al (2013) among Chinese women, the association between a genetic risk score constructed with 10 SNPs and breast cancer risk was not modified by any of the investigated environmental risk factors. One example in which several small single-SNP gene–environment interactions translated into a significant interaction between the respective genetic risk score and the environmental risk factor is a recent study by Qi et al (2014) with respect to BMI. In this study, consuming fried foods four or more times a week compared with consuming fried foods less than once a week was associated with an increase of 1 kg m⁻² (SE 0.2 kg m⁻²) in BMI among women within the upper tertile of the genetic score. Among women in the lower tertile, frequent consumption of fried foods was associated with an increase in BMI of 0.5 kg m⁻² (SE 0.2 kg m⁻²; P for interaction=0.005). Similar results were observed for men. Of the 32 variants included in the genetic score, 4 (FTO, GNPDA2, NEGR1, and SEC16B) showed nominally significant interactions with total fried-food consumption on BMI (P for interaction <0.05) and all interactions were in the same direction (that is, stronger association between fried-food consumption and BMI in carriers of one or more BMI-increasing alleles). Therefore, if multiple unidirectional gene–environment interactions are present, then considering these may further improve discrimination, although the effect of each interaction may be small.
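As an illustration of how small per-SNP effects are combined, the sketch below computes a weighted genetic risk score as the sum of risk-allele counts multiplied by the log of each per-allele OR. This is one common construction and is assumed here for illustration only; the cited studies may have built their scores differently (for example, as unweighted allele counts). The function name, allele counts and ORs are hypothetical.

```python
import math


def genetic_risk_score(allele_counts, per_allele_ors):
    """Weighted genetic risk score on the log-odds scale.

    allele_counts  : number of effect alleles (0, 1 or 2) carried at each SNP
    per_allele_ors : per-allele odds ratio for each SNP, in the same order
    """
    if len(allele_counts) != len(per_allele_ors):
        raise ValueError("one per-allele OR is required for each SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(allele_counts, per_allele_ors)
    )


# Hypothetical woman genotyped at three SNPs with made-up per-allele ORs.
score = genetic_risk_score(allele_counts=[0, 1, 2], per_allele_ors=[1.10, 1.20, 0.90])
print(f"score = {score:.4f}, relative odds = {math.exp(score):.3f}")
```

An interaction between such a score and an environmental factor could then be tested in the same way as for a single SNP, with the score taking the place of the allele count in the regression model.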

Exploratory studies aiming to find new genetic risk factors and gain aetiological insight

Instead of focusing on variants known to be associated with breast cancer risk, exploratory gene–environment interaction studies investigate large numbers of variants or use a genome-wide approach. This approach offers the possibility to identify new genetic risk factors and to gain knowledge on the biological mechanisms underlying the associations between environmental risk factors and breast cancer risk.

Hein et al (2013) published a GWAS that aimed to identify potential genetic modifiers of the association between menopausal hormone therapy use and ductal and lobular breast cancer risk. In the first stage, a case-only GWAS was conducted with current use of menopausal hormone therapy as the dependent variable in 731 breast cancer cases of a German study (MARIE). The 1200 SNPs with the lowest P-values for interaction were selected for stage two. In stage two, gene–environment interactions were tested using case–control analyses in a pooled sample of additional MARIE cases (N=1375) and controls (N=1974) as well as 795 cases and 764 controls from a Swedish case–control study (SASBAC). The lowest joint P-value across stages one and two identified the interaction between variant rs6707272 on chromosome 2 and current use of menopausal hormone therapy (joint P-value for interaction=3.0 × 10⁻⁷). This result was followed up in a replication stage using 5795 cases and 5390 controls from nine studies of the BCAC but the interaction could not be replicated (P-value for interaction=0.21). Most likely this study suffered from an underpowered replication stage, and from an overestimation of the interaction effect in stages one and two, a phenomenon known as ‘the winner’s curse’ (Kraft, 2008). Rudolph et al (2013) used a larger sample both for detection and replication, conducting a second genome-wide study with a case-only design in the first stage to identify potential interactions between SNPs and menopausal hormone therapy use on breast cancer risk. A meta-analysis of four studies comprising 2920 cases in total yielded 1391 variants showing interactions with use of menopausal hormone therapy with P-values for interaction <0.003, which were followed up. The replication stage based on case–control analysis comprising 7689 cases and 9266 controls from 11 studies in BCAC failed to detect any interactions at genome-wide significance level (Rudolph et al, 2013). These results indicate that strong modifiers of the established association between menopausal hormone therapy use and breast cancer risk are unlikely and that larger studies will be required to detect more moderate interactions.
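The logic of the case-only design used in the first stage of these studies can be sketched as follows. If genotype and exposure are independent in the source population and the disease is rare, the odds ratio relating genotype to exposure among cases alone estimates the multiplicative interaction. The sketch uses a simple 2×2 table with a Woolf-type confidence interval rather than the regression models fitted in the cited studies, and all counts are hypothetical.

```python
import math


def case_only_interaction_or(exposed_carriers, exposed_noncarriers,
                             unexposed_carriers, unexposed_noncarriers,
                             z=1.96):
    """Estimate the multiplicative interaction OR from cases only.

    Valid only if genotype and exposure are independent in the source
    population. Returns (OR, lower 95% limit, upper 95% limit).
    """
    odds_ratio = (exposed_carriers * unexposed_noncarriers) / (
        exposed_noncarriers * unexposed_carriers
    )
    # Standard error of the log OR from the four cell counts.
    se = math.sqrt(
        1 / exposed_carriers + 1 / exposed_noncarriers
        + 1 / unexposed_carriers + 1 / unexposed_noncarriers
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical breast cancer cases cross-classified by current hormone
# therapy use (exposed or unexposed) and carrier status for a variant.
estimate, lower, upper = case_only_interaction_or(120, 280, 200, 600)
print(f"case-only interaction OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because no controls are needed, this design tends to be more powerful than a case–control test of interaction, which makes it attractive for a screening stage. Its reliance on gene–environment independence is one reason to confirm findings with case–control analyses, as was done in the second stages described above.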

An exploratory study with the aim to identify novel genetic susceptibility loci for breast cancer was published in 2014 (Schoeps et al, 2014). In total, 71 527 SNPs were tested for interaction with 10 environmental risk factors of breast cancer. The SNPs were enriched for association with breast cancer risk as they had been proposed by BCAC consortium members to be genotyped using a custom Illumina iSelect genotyping array (iCOGS). To gain power, three recently developed two-step methods to test gene–environment interactions as well as a joint test of association and interaction were employed in the study [22]. Two SNPs (rs10483028 and rs2242714) in strong linkage disequilibrium located on chromosome 21 showed statistically significant interactions with BMI in postmenopausal women. The variants were identified by all four methods applied. A third variant, rs12197388 on chromosome 6, was detected through the joint test (P-value 2df test <7.0 × 10⁻⁷), based on its association with breast cancer but did not show significant interaction. Overall, this study exemplifies that new risk loci can be discovered by accounting for gene–environment interaction.

For breast cancer, gene–environment interaction studies aiming at identifying new environmental risk factors are lacking. The main reasons are the difficulty of obtaining comprehensive, reliable data on environmental exposure for the large sample sizes that would be needed, and the lack of sufficient studies with biological specimens for, for example, metabolomics or proteomics. Further technical development may help to overcome this problem, similarly to the vast advances that have been made in measuring genomic variability.

Conclusion

Studying gene–environment interaction offers the opportunity to obtain new knowledge on several aspects of breast cancer aetiology and it may help to improve prevention. Although several comprehensive large-scale gene–environment interaction studies have been conducted for breast cancer risk, relevant gene–environment interaction has not been established to date. The potential interactions that have been identified with common genetic variants are of small to moderate size, and they are unlikely to improve risk prediction individually, but may be helpful when combined. Stronger interactions could be present with rarer variants, but this has still to be examined. Presently, we should consider hereditary variants and environmental factors as multiplicative/additive factors in the prediction of breast cancer risk. Hence in risk prediction models both should be included and modelled as independent factors pending further research.

Although the few exploratory studies that attempted to identify new genetic susceptibility loci by accounting for gene–environment interaction showed interesting results, this approach is likely to be more successful with the availability of larger data resources. The large genotyping experiment currently being conducted by the OncoArray consortium (2013) together with the resources already available in BCAC and BPC3 will create opportunities to further study gene–environment interactions.