Genome-wide interaction analysis of menopausal hormone therapy use and breast cancer risk among 62,370 women

Use of menopausal hormone therapy (MHT) is associated with increased risk for breast cancer. However, the relevant mechanisms and its interaction with genetic variants are not fully understood. We conducted a genome-wide interaction analysis between MHT use and genetic variants for breast cancer risk in 27,585 cases and 34,785 controls from 26 observational studies. All women were post-menopausal and of European ancestry. Multivariable logistic regression models were used to test for multiplicative interactions between genetic variants and current MHT use. We considered interaction p-values < 5 × 10–8 as genome-wide significant, and p-values < 1 × 10–5 as suggestive. Linkage disequilibrium (LD)-based clumping was performed to identify independent candidate variants. None of the 9.7 million genetic variants tested for interactions with MHT use reached genome-wide significance. Only 213 variants, representing 18 independent loci, had p-values < 1 × 105. The strongest evidence was found for rs4674019 (p-value = 2.27 × 10–7), which showed genome-wide significant interaction (p-value = 3.8 × 10–8) with current MHT use when analysis was restricted to population-based studies only. Limiting the analyses to combined estrogen–progesterone MHT use only or to estrogen receptor (ER) positive cases did not identify any genome-wide significant evidence of interactions. In this large genome-wide SNP-MHT interaction study of breast cancer, we found no strong support for common genetic variants modifying the effect of MHT on breast cancer risk. These results suggest that common genetic variation has limited impact on the observed MHT–breast cancer risk association.

Breast cancer is one of the most common cancers in women. There were 268,600 new cases and 41,760 deaths due to breast cancer estimated in the U.S. in 2019 1 . The use of menopausal hormone therapy (MHT) is associated with up to 23% increased risk of breast cancer. MHT use has been reduced among postmenopausal women since the report by the Women's Health Initiative (WHI) clinical trial and observational study 2,3 which has been subsequently confirmed by other studies and meta-analyses 4,5 . Breast cancer risk increases with longer duration of use 6 , and is higher for combined estrogen-progesterone MHT (EPT) use as compared with estrogen-only (ET) regimens 4,5 . Additionally, the association between MHT use and breast cancer may also differ by tumor molecular subtype. A prospective cohort study in UK found that current MHT use was associated with increased risk for estrogen receptor positive (ER+) breast cancers, but not with ER-breast cancers 7 . Several other observational studies also found that MHT use was associated with elevated risk of ER+ breast cancer [8][9][10][11][12] .
The biological mechanisms underlying the effect of MHT use on breast cancer risk is not fully understood. One proposed mechanism is that higher estrogen and progesterone levels increase the proliferation of breast epithelial cells, which results in accumulation of genetic mutations and insufficient DNA repair 13,14 , and therefore induces mutagenesis 15,16 . Genome-wide association studies (GWAS) have identified over 200 single nucleotide polymorphisms (SNPs) that are associated with invasive breast cancer risk [17][18][19] . Further analyses based on these GWAS findings have identified several genes that might interact with MHT use on breast cancer risk, including SNPs regulating the fibroblast growth factor receptor two (FGFR2) gene 20 , as well as SNPs close to the Kruppel like factor 4 (KLF4) gene and the insulin like growth-factor-binding protein 5 (IGFBP5) gene [21][22][23] . A metaanalysis of four genome-wide case-only interaction studies found suggestive evidence of interactions between MHT use and SNPs in genes related to transmembrane signaling and immune cell activation 24 . However, none of the findings reached genome-wide significance.
In the present study, we performed a comprehensive genome-wide interaction analysis of current MHT use by pooling individual-level data from 26 epidemiological studies. We also performed genome-wide interaction analysis of MHT use on ER+ breast cancer specifically.

Methods
Study population and data collection. Individual level data were pooled from 26 epidemiological studies, including eight population-based case-control studies, 13 nested studies from prospective cohort studies and five studies with mixed design from the Breast Cancer Association Consortium (BCAC) (Table S1). Data collection instruments for individual studies have been described previously 19,23 . Breast cancer cases were defined as incident invasive or in-situ breast tumors, confirmed by medical records, pathological reports or death certificates. Cases of benign breast disease or cases diagnosed more than five years before study enrollment were excluded.
Participants were excluded if they were male, pre-menopausal, of non-European ancestry, with unknown age at reference date, or missing information on MHT use. Reference date was defined as date of diagnosis for cases, and date of interview for controls. Menopausal status was reported at time of interview. For women with missing menopausal status, we assumed postmenopausal status for those who were > 54 years old. Only studies with information on MHT use in at least 150 breast cancer cases and 150 controls were included in the data analysis.
Ethnical approval and consent to participate. All participating studies were approved by the relevant ethics committees and informed consent was obtained from study participants.
Menopausal hormone therapy use definition. MHT use was defined as use for at least three months of any type of MHT, including EPT and ET. Current MHT use was defined as use at, or within the six months prior to the reference date. Former MHT use was defined as women who had a history of using MHT but had quit more than 6 months prior to the reference date.
Genotyping. Samples were genotyped by the Illumina custom iSelect genotyping array (iCOGs) 25,26 or the Illumina OncoArray 500K (OncoArray) 19,27 . Details on genotyping, imputation and quality-control checks have been published previously 19,26 . For these analyses, 9680 cases and 10,598 controls were genotyped using iCOGs, and 17,905 cases and 24,187 controls were genotyped using OncoArray. Both datasets were imputed to the 1000 Genomes Phase 3 release 28 . For samples that were genotyped on both iCOGs and OncoArray, OncoArray data was used. SNPs were excluded if imputation r 2 < 0.5 for iCOGs, and r 2 < 0.8 for OncoArray. A total of 9,661,037 genetic variants (SNPs and indels) were included for analysis in both datasets. Statistical analysis. We used multivariable logistic regression models to test for interaction between each genetic variant and current MHT use (compared to never users) on breast cancer risk, adjusting for age at reference date, study, former MHT use, an indicator for study design (1 for population-based case-control or prospective studies, 0 for non-population or mixed case-control studies), an interaction term of study design www.nature.com/scientificreports/ indicator and current MHT use to account for different main effect of current MHT use by study design, and principal components to account for potential population stratification 29 , thus fitting a model of the form: Each genetic variant was assessed as a continuous variable in a log-additive odds ratio model. For genetic variants that were not directly genotyped, the expected number of copies of the variant allele ("dosage") was used 30 . OncoArray and iCOGs datasets were analyzed separately, and platform-specific interaction parameter estimates ( β gc ) were combined using METAL 31 to obtain summary estimates for each SNP. Similar analyses were also performed for EPT use only and for ER+ breast cancer. Q-Q plots were used to assess whether the distribution of the p-values indicated genomic inflation. A p-value at 5 × 10 -8 was used as the genome-wide significance level 32 .
For variants reaching suggestive evidence of interaction (p < 1 × 10 -5 ), we performed linkage disequilibrium (LD)-based clumping to identify independent loci that might interact with MHT use on breast cancer risk (SWISS version 1.0.05b). SNPs in LD (r 2 > 0.1 based on the build-in 1000G_2014-11_EUR) within 1 Mb from the most significantly associated SNP were removed so that independent SNPs remained in each region.
We also performed sensitivity analysis among patients from the population-based studies only. All analyses were performed using R version 3.6.1 unless otherwise specified.

Results
A total of 62,370 post-menopausal women from 26 studies (27,585 cases and 34,785 controls), were included in the analyses (Table S1). Cases were slightly older (mean age: 64 years) than controls (mean age: 63 years). Current use of MHT was more common among breast cancer cases (34%) than controls (28%), showing a suggestive increased breast cancer risk (OR = 1.16; 95% CI: 0.99, 1.36; Fig. 1A). A total of 20,131 cases and 22,601 controls from 18 studies also had information on current use of EPT. Current EPT use was more common among cases (19%) than controls (13%) and was associated with an estimated 48% risk increase of breast cancer, compared to non-EPT users (OR: 1.48; 95% CI: 1.29, 1.70; Fig. 1B).
A total of 9,661,271 SNPs and indels were successfully imputed from both the OncoArray and iCOGs genotyping platforms and were included in the combined analysis. We did not observe any interactions between variants and current MHT use at genome-wide significance level (p-value < 5 × 10 -8 , Fig. 2A). 213 SNPs had suggestive evidence of interaction with MHT use on breast cancer risk (p-value < 1 × 10 -5 ). After LD-based clumping, 18 independent SNPs remained, none of which were in LD with currently known breast cancer risk GWAS loci ( Table 1). The strongest evidence of interaction was for SNP rs4674019, located at chromosome 2q35 (p-value = 2.27 × 10 -7 ). When restricting the analyses to population-based studies only (23,063 cases and 30,250 controls), this same SNP rs4674019 showed statistically significantly interaction with current MHT use on breast cancer risk (p-value = 3.75 × 10 -8 ; Fig. S1).
Similarly, we did not observe any genome-wide significant interactions between SNPs and combined EPT use on breast cancer risk (Fig. 2B). There were 71 SNPs that reached suggestive significance level at p-value < 1 × 10 -5 . After LD-based clumping, 21 independent SNPs showed suggestive interactions ( Table 2). The strongest evidence of interaction was for SNP rs4865075, located on chromosome 4q12 (p-value = 5.5 × 10 -7 ). Sensitivity analysis using population-based studies only did not find statistically significant interactions.
Restricting our cases to those with ER+ breast cancer did not result in any genome-wide significant findings (Figs. S2 and S3). No genomic inflation was observed in primary or subgroup analyses (Figs. S1, S2 and S3).

Discussion
In this large genome-wide analysis of postmenopausal women of European ancestry, we did not identify any genetic variants that were strong modifiers of the association between current MHT use on breast cancer risk. Although the interaction between SNP rs4674019 and current MHT use was statistically significant among population-based studies only, the variate allele frequency is relatively rare (EAF = 5%) and needs further validation.
Consistent with previous literature 2, 3,33 , we found that current use of MHT, and in particular current EPT use, was associated with an increased risk of breast cancer for postmenopausal women. The mechanisms underlying this association are not fully understood. It has been hypothesized that estrogen stimulates cell proliferation through ERα-mediated hormone activity and increases mutation rates through a cytochrome P450-mediated metabolic activation that results in DNA damage 34 . In addition, the risk associated with ER+ breast cancer is substantially higher than for ER-breast cancer, particularly for EPT use, suggesting an ER-dependent pathway 5 . In vitro and in vivo studies found that estradiol and 4-OH-estradiol, metabolites of estrogen, may induce mutations and damage DNA by forming DNA adducts to bind to adenine and guanine on the DNA backbone 35,36 . The role of progestogens in human breast carcinogenesis is less clear, although it has been suggested that synthetic progestogens are pro-proliferative and may thus promote cancer cell growth 37,38 .
Although MHT use has been found to be associated with increased breast cancer risk in both epidemiologic and experimental studies, no published studies to date have identified genome-wide significant interactions for breast cancer risk between candidate single variants and MHT use among postmenopausal women 39,40 . In a previous two-stage GWAS interaction analysis among ~ 2700 cases and ~ 2700 controls, five SNPs had suggestive evidence of interaction with current MHT use; but none of them reached genome-wide significance 41 . A meta-analysis of genome-wide case-only studies in 2920 cases also found no statistically significant interactions logit[Pr(D)] = α + β c CurrentMHT + β g SNP + β gc SNP × CurrentMHT www.nature.com/scientificreports/ between SNPs and MHT use on breast cancer overall or by subtype 24 . Although our study had a larger sample size and increased statistical power than previous genome-wide analyses, we similarly did not find any genomewide statistically significant interactions between genetic variants and MHT use in this study, and we further did not replicate previously suggested SNPs (data not shown). www.nature.com/scientificreports/ The region for which the strongest evidence of interaction with current MHT use on breast cancer risk was observed (lead SNP rs4674019), was also implicated in the analysis restricted to combined EPT use only (p-value = 4.5 × 10 -6 ). The rs4674019 SNP is an intronic variant in the coding region for the long intergenic nonprotein coding RNA 607 (LINC00607). Although the functionality of long non-coding RNAs is still not clear, it has been recently recognized that abnormal expression of long non-coding RNAs may play an important role in cell cycle control and cell differentiation, which is related to cancer and neurodegenerative disease [42][43][44] . Expression levels of LINC00607 were found to be significantly downregulated among lung adenocarcinoma tissues, compared to adjacent tissues 45 . Other GWAS have shown genetic variants in the LINC00607 gene to be associated with height in people of European ancestry 46 . Previous evidence for long noncoding RNAs in relation to  www.nature.com/scientificreports/ breast cancer risk is limited; but it is possible that changes in exogenous hormone levels due to MHT use result in differential expression that eventually leads to tumorigenesis. We also observed suggestive evidence of interaction between current use of both MHT and EPT and rs146251672. SNP rs146251672 is located in the intronic region for the prickle planar cell polarity protein 2 (PRICKLE2) gene on chromosome 3. PRICKLE2 encodes a non-canonical Wnt signaling protein that mediates feedback amplification to generate asymmetric planar cell polarity (PCP) signaling 47 . The Wnt pathway has been found to be activated in more than half of breast tumors, and is associated with lower overall survival for breast cancer patients 48 . In particular, the upregulation of the Wnt/PCP pathway has been suggested to be associated with more malignant phenotypes, such as abnormal tissue polarity, invasion and metastasis 49 . Exposure to estrogen has been associated with accelerated tumor formation in ER-knockout/Wnt-1 mice 36 . It is plausible that MHT acts partially through the alternative Wnt pathway rather than ER-dependent pathways to promote breast tumor development.
This study constitutes the largest genome-wide interaction analysis for current MHT use and breast cancer risk in postmenopausal women to date. We analyzed data from more than 62,000 women for whom we had both MHT use and genotypes from more than 9.6 million genetic variants. We controlled our analysis for potential confounding by population stratification by adjusting for principal components. We performed LD-based clumping, which accounted for correlations between genotypes to identify the strongest signal in each independent region, providing more targeted variants and regions for future investigation.
There are some limitations to our study. We used a single binary definition of current MHT use within 6 months prior to reference date and could not evaluate other measures such as age at MHT initiation or duration of MHT use. This could lead to some exposure misclassification, particularly for the non-population based studies, where it is possible that those cases had stopped their MHT use at time of recruitment and were classified as non-current users. Such misclassification would have attenuated the main effect of MHT and reduced our statistical power to detect any interactions. In our sensitivity analysis using population-based studies only, we found stronger interactions between the lead SNPs and MHT use. However, given a smaller sample size in the sensitivity analysis, it is possible that we did not have sufficient statistical power to detect any other potential interactions. We assumed no significant interactions between SNPs and former MHT use, and only adjusted for Table 2. Independent genetic variants with suggestive interaction of current combined EPT use on breast cancer after LD-based clumping. SNP single nucleotide polymorphism, Chr chromosome, A1 reference allele, A2 alternative allele, EAF estimated allele frequency for alternative allele, OR odds ratios per alternative allele with current combined estrogen-progesterone hormone therapy use on breast cancer risk, 95% CI corresponding 95% confidence intervals; a rsid and position are based on the Genome Reference Consortium Human genome build 37. www.nature.com/scientificreports/ potential confounding from the main effect of former MHT use in the model based on previous evidence. It is possible that comparing the interaction effect of current MHT use to a combined reference group of never and former users may attenuate the point estimates of the interaction term. The use of estrogen only hormone therapy (ET) was also not available among the study participants, although the statistical power might be further limited since the main association of ET and breast cancer risk is much smaller than EPT use 5 . Although the impact may be small given our case-control study design and large sample size, it is still possible that the observed SNP-MHT interaction was due to the interaction between SNPs and potential uncontrolled confounders of MHT use that were not available in our study 50 , such as tolerance of menopausal symptoms or socioeconomic status. In addition, our study sample only included women of European ancestry, and thus, our findings may not be generalizable to other race/ethnicity groups. It is important to note that the lack of statistical interaction, on the log-scale, does not necessarily imply a lack of biological interaction. The results are consistent with a model in which the effects of genetic variants and MHT use combine multiplicatively on risk, which could still indicate important interactions at a functional level. Overall, our results suggest that it is not necessary to include interaction variables for G × MHT use in development of breast cancer risk prediction models. Although our results suggested that potential interaction effect between SNP rs4674019 and current MHT, further validation is needed. Several suggestive interactions also warrant further investigations in independent studies.

Conclusion
In this large genome-wide SNP-MHT interaction study of breast cancer, we found no strong support for common genetic variants modifying the effect of MHT on breast cancer risk. These results suggest that common genetic variation has limited impact on the observed MHT-breast cancer risk association.

Data availability
The data that support the findings of this study are available from the Breast Cancer Association Consortium but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the Breast Cancer Association Consortium.