Background

Epidemiological studies provide strong evidence for a role of endogenous hormones in the aetiology of breast cancer.1,2 Pooled analyses of data from prospective studies estimated that a doubling of circulating oestradiol or oestrone was associated with a 30–50% increase in breast cancer risk in postmenopausal women and a 20–30% increase in breast cancer risk in premenopausal women; there was no evidence that premenopausal progesterone levels were associated with breast cancer risk.2,3 We have previously screened 642 SNPs tagging 42 genes involved in sex steroid synthesis or metabolism, and tested for the association with premenopausal urinary oestrone glucuronide and pregnanediol-3-glucuronide levels, measured in urine samples collected at pre-specified days of the woman’s menstrual cycle.4 Oestrone-3-glucuronide and pregnanediol-3-glucuronide are urinary metabolites of oestrogen and progesterone, respectively,5,6 that are used in the context of reproductive medicine to monitor ovarian activity.7 None of the variants that we tested was associated with urinary pregnanediol-3-glucuronide, but a rare haplotype, defined by two SNPs spanning the cytochrome P450 family 3 subfamily A (CYP3A) gene cluster, was associated with a highly significant 32% difference in urinary oestrone-3-glucuronide.4 Fine-scale mapping analyses identified the SNP rs45446698 as a putative causal variant at this locus; rs45446698 is one of seven highly correlated SNPs that cluster within the CYP3A7 promoter and comprise the CYP3A7*1C allele.8 A genome-wide association study (GWAS) of postmenopausal plasma oestradiol levels found no association at this locus.9 A subsequent GWAS of pre- and postmenopausal hormone levels similarly found no association with plasma oestradiol at this locus; they did however find associations at this locus with DHEAS and progesterone.10

The CYP3A genes (CYP3A5, CYP3A7 and CYP3A4) encode enzymes that metabolise a diverse range of substrates;11 in addition to a role in the oxidative metabolism of hormones, CYP3A enzymes metabolise ~50% of all clinically used drugs, including many of the agents used in treating cancer.12 CYP3A4, the major isoform in adults, is predominantly expressed in the liver, where it is the most abundant P450, accounting for 30% of total CYP450 protein. CYP3A7, the major isoform in the foetus, is generally silenced shortly after birth.13 In CYP3A7*1C carriers, a region within the foetal CYP3A7 promoter has been replaced with the equivalent region from the adult CYP3A4 gene;14 this results in adult expression of CYP3A7 in CYP3A7*1C carriers and may influence metabolism of endogenous hormones, exogenous hormones used in menopausal hormone treatment and clinically prescribed drugs, including agents used in treating cancer, in these individuals.12,15 In order to identify additional variants that are associated with premenopausal urinary hormone levels and to further characterise the associations at the CYP3A locus, we carried out a GWAS of urinary oestrone-3-glucuronide and pregnanediol-3-glucuronide levels, using mid-luteal-phase urine samples from women of European ancestry and followed up by testing for an association with breast cancer risk in cases and controls from the Breast Cancer Association Consortium (BCAC). To determine whether the CYP3A7*1C allele influences metabolism of exogenous hormones, we evaluated gene-environment interactions with menopausal hormone treatment for breast cancer risk, and to investigate whether adult expression of CYP3A7 impacts on agents used in treating cancer, we analysed associations with breast cancer-specific survival.

Methods

GWAS subjects

Generations Study

Full details of the Generations Study have been published previously.16 Briefly, the Generations Study is a cohort study of more than 110,000 women from the UK general population, who were recruited beginning in 2003 and from whom detailed questionnaires and blood samples have been collected to investigate risk factors for breast cancer.

British Breast Cancer Study

Full details of the British Breast Cancer Study have been published previously.17 Briefly, the British Breast Cancer Study is a national case–control study of breast cancer, in which cases of breast cancer were ascertained through the cancer registries of England and Scotland and through the National Cancer Research Network. Cases were asked to invite a healthy female first-degree relative with no history of cancer and a female friend or non-blood relative to participate in the study.

Mammography Oestrogens and Growth Factors study

Full details of the Mammography Oestrogens and Growth Factors study have been published previously.18 Briefly, this is an observational study nested within a trial of annual mammography screening in young women that was conducted in Britain.19 Approximately 54,000 women aged 39–41 years were randomly assigned to the intervention arm from 1991 to 1997 and offered annual mammograms until age 48 years. From 2000 to 2003, women in the intervention arm who were still participating in this trial were invited to participate in the Mammography Oestrogens and Growth Factors study; they were asked to provide a blood sample and complete a questionnaire detailing demographic, lifestyle and reproductive factors. More than 8000 women were enrolled in the study.

GWAS subjects were drawn from the Generations Study (N = 184), the British Breast Cancer Study (N = 284) and the Mammography Oestrogens and Growth Factors study (N = 109). To be eligible for the GWAS analysis of oestrone-3-glucuronide and pregnanediol-3-glucuronide levels, women had to be having regular menstrual cycles (i.e., their usual cycle length had to be between 21 and 35 days) and not using menopausal hormone therapy or oral contraceptives. All of the women included in this analysis reported being of European ancestry, and none had been diagnosed with breast cancer at the time of study recruitment.

Measurement of hormone levels

The protocol for collecting timed urine samples has been published previously.18 Briefly, a woman’s predicted date of ovulation was estimated from the date of the first day of her last menstrual period and her usual cycle length; ovulation was predicted to occur 14 days before the date of her next menstrual period. On this basis, women were asked to provide a series of early morning urine samples on pre-specified days of their cycle. For this analysis, the mid-luteal-phase sample, taken at 7 days after the predicted day of ovulation, was used. To confirm that ovulation had occurred, consistent with the predicted date of ovulation, pregnanediol-3-glucuronide was measured; to take account of the differences in volume in early morning urine samples from different women, we measured creatinine, a waste product of normal muscle and protein metabolism that is released at a constant rate by the body. Samples in which pregnanediol-3-glucuronide, adjusted for creatinine levels, was >0.3 µmol/mol, were taken forward for measurement of creatinine-adjusted oestrone-3-glucuronide. Pregnanediol-3-glucuronide and oestrone-3-glucuronide were analysed by commercial competitive ELISA Kits (Arbor Assays, Ann Arbor, USA) according to the manufacturer’s instructions. For pregnanediol-3-glucuronide, the lower limit of detection was determined as 0.64 nmol/l; intra- and inter-assay coefficients of variation were 3.7% and 5.2%, respectively. For oestrone-3-glucuronide, the lower limit of detection was determined as 19.6 pmol/l; intra- and inter-assay coefficients of variation were 3.5% and 5.9%, respectively. Creatinine was determined using the creatininase/creatinase-specific enzymatic method20 using a commercial kit (Alpha Laboratories Ltd. Eastleigh, UK) adapted for use on a Cobas Fara centrifugal analyser (Roche Diagnostics Ltd, Welwyn Garden City, UK). For within-run precision, the coefficient of variation was <3%, while for intra-batch precision, the coefficient of variation was <5%.
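
The timing rule described above (ovulation predicted 14 days before the next period, mid-luteal sample 7 days later) and the creatinine adjustment can be sketched in Python. This is an illustration only, not the study's own code; the function names are hypothetical, and the creatinine adjustment is assumed to be a simple ratio of analyte to creatinine concentration.

```python
from datetime import date, timedelta

LUTEAL_PHASE_DAYS = 14      # ovulation predicted 14 days before the next period
MID_LUTEAL_OFFSET_DAYS = 7  # mid-luteal sample taken 7 days after ovulation


def predicted_ovulation(last_period_start: date, usual_cycle_length: int) -> date:
    """Predicted ovulation date: start of the next period minus 14 days."""
    next_period_start = last_period_start + timedelta(days=usual_cycle_length)
    return next_period_start - timedelta(days=LUTEAL_PHASE_DAYS)


def mid_luteal_sample_date(last_period_start: date, usual_cycle_length: int) -> date:
    """Date of the mid-luteal urine sample (7 days after predicted ovulation)."""
    ovulation = predicted_ovulation(last_period_start, usual_cycle_length)
    return ovulation + timedelta(days=MID_LUTEAL_OFFSET_DAYS)


def creatinine_adjusted(analyte_conc: float, creatinine_conc: float) -> float:
    """Analyte per unit creatinine (assumed simple ratio; units follow the inputs)."""
    if creatinine_conc <= 0:
        raise ValueError("creatinine concentration must be positive")
    return analyte_conc / creatinine_conc
```

For a woman with a 28-day cycle whose last period began on 1 January, this rule places ovulation on 15 January and the mid-luteal sample on 22 January.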

For 303 premenopausal women participating in the Generations Study (184 as above and an additional 119 for whom timed urine samples were accrued more recently), urinary progesterone levels were also measured using an “in house” ELISA. In all, 96-well plates (Greiner Bio-One GmbH, Frickenhausen, Germany) were coated with 100 µl of 5 µg/ml GAM (Arbor Assays, Ann Arbor, USA) in ELISA coating buffer (100 mM Na Bicarbonate, pH 9.6) covered and incubated in a fridge at 4 °C overnight. Before use, the plates were washed three times with wash buffer 0.05 M Tris/HCl and 0.05% Tween 20, pH 7.4 (Tween® 20, Sigma-Aldrich, Inc., St. Louis, MO, USA). Standards, samples and controls (20 µl per well) were added to each well, followed by 80 µl of progesterone 3-HRP conjugate (Astra Biotech GmbH, Berlin, Germany) at 1:10,000 in assay buffer (PBS pH 7.4 containing 0.1% BSA and 250 ng/ml Cortisol), followed by 50 μl of monoclonal progesterone Ab (Astra Biotech GmbH, Berlin, Germany) 1:50,000 in assay buffer. Plates were incubated at room temperature for 2 h on a microtitre plate shaker (IKA®, Schuttler MTS4, IKA Labortechnik, Staufen, Germany), then washed five times with assay wash buffer and 120 µl of substrate solution (3,3,5,5-tetramethylbenzidine, Millipore Corporation, Temecula, CA, USA) was added to each well. Plates were incubated at room temperature without shaking in the dark. After 20 min, the reaction was stopped by adding 80 µl of 2 N H2SO4 solution (Sigma-Aldrich Company Ltd., Dorset, UK). Finally, the plates were read on a plate reader at 450 nm. Standard curves were prepared with a total of eight different concentrations (16, 8, 4, 2, 1, 0.5, 0.25 and 0 ng/ml). Samples, standards and controls were included in duplicate. Inter- and intra-assay coefficients of variation were calculated from two controls of low and high progesterone in duplicate in each of eight assays. 
The inter-assay coefficients of variation for low and high pools, respectively, were 11.4 and 9.1%; the intra-assay coefficients of variation were 8.9 and 5.6%. The lower limit of detection was calculated at 0.1 ng/ml. Cross-reaction with other steroids was oestrone: 0.17%, oestradiol: 0.28%, oestriol: 0.18%, dehydroepiandrosterone: 0.02%, testosterone: 0.36%, dihydrotestosterone: 0.15%, 17α-hydroxyprogesterone: 2.9%, androstenedione: 0.14%, 11-deoxycortisol: 0.46%, corticosterone: 0.18%, cortisone: 0.04% and cortisol: 0.04%.

GWAS genotyping and quality control

DNA from 577 women was genotyped using Illumina Infinium OncoArray 500 K BeadChips. We excluded samples for which <95% of SNPs were successfully genotyped. Identity-by-descent analysis was used to identify closely related individuals, enabling exclusion of first-degree relatives. We applied SmartPCA21 to our data and used phase II HapMap samples to identify individuals with non-Caucasian ancestry. The first two principal components for each individual were plotted, and k-means clustering was used to identify samples separated from the main Caucasian cluster. SNPs with call rates <95% were excluded, as were SNPs with minor allele frequency (MAF) < 2% and those whose genotype frequencies deviated from Hardy–Weinberg proportions at P < 1 × 10−5. Following QC, 487,659 SNPs were successfully genotyped in 560 samples (Generations Study: N = 179, British Breast Cancer Study: N = 278 and Mammography Oestrogens and Growth Factors study: N = 103). Genome-wide imputation was performed using 1KGP Phase 3 reference data. Haplotypes were pre-phased using SHAPEIT2.22 Imputation was performed using IMPUTE2.23 Imputed SNPs with INFO scores <0.8 or MAFs <2% were excluded from subsequent analyses. After QC, a set of 7,792,694 successfully imputed SNPs was available for association analysis.
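
The per-SNP filters above (call rate, MAF and Hardy–Weinberg deviation) can be illustrated with a short Python sketch. The thresholds are those stated in the text; the function names are hypothetical, and the Hardy–Weinberg test shown is a simple one-degree-of-freedom chi-square test, which is an assumption because the text does not state which test was used.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi2_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def passes_snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> bool:
    """Apply the thresholds from the text: call rate, MAF and HWE P value."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    return (
        call_rate >= 0.95
        and minor_allele_frequency(n_aa, n_ab, n_bb) >= 0.02
        and hwe_chi2_p_value(n_aa, n_ab, n_bb) >= 1e-5
    )
```

For rare variants an exact test is often preferred to the chi-square approximation; the sketch uses the approximation only to stay dependency-free.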

Genotyping rs45446698 and sequencing of the CYP3A7*1C allele

For the 119 Generations Study women who were not included in the GWAS but for whom progesterone was subsequently measured, rs45446698 was genotyped by TaqMan (Thermo Fisher Scientific Ltd, UK). The call rate was 100% with 100% concordance between 12 duplicates. To confirm that rs45446698 tags the CYP3A7*1C allele, we sequenced this region in 31 women selected on the basis of their rs45446698 genotype (9 common homozygotes and 22 carriers). A 370-bp DNA region (chr7: 99 332 745-99 333 114; GRCh37/hg19) was amplified using Phusion High-Fidelity DNA Polymerase (New England Biolabs, UK) and primers CCATAGAGACAAGAGGAGA (forward) and CTGAGTCTTTTTTTCAGCAGC (reverse). The PCR product was purified using QIAquick Gel Extraction Kit (Qiagen) and Sanger sequenced using a commercially available service (Eurofins Genomics, Germany).

Statistical analysis of GWAS data

Tests of association between SNP genotypes and log-transformed creatinine-adjusted oestrone-3-glucuronide and pregnanediol-3-glucuronide adjusted for study were performed using linear regression in SNPTEST v2.5.24 Test statistic inflation was assessed visually using a QQ plot (Supplementary Fig. S1) and formally by calculating the inflation factor, λ. There was no evidence of systematic test statistic inflation (λ = 1.01 for both oestrone-3-glucuronide and pregnanediol-3-glucuronide). For the single significant association (rs45446698), we used multivariable linear regression to adjust for potential confounders: age at menarche (<12, 12, 13, 14 and >14 years), age at collection of urine samples (<35, 35–40 and ≥40 years), body mass index (BMI: <18.5, 18.5–<20.0, 20.0–<25.0, 25.0–<30.0 and ≥30.0 kg/m2) and parity (0, 1, 2 and ≥3 live births).
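
Two of the quantities used here can be made concrete with a short Python sketch: the inflation factor λ, and the conversion of a regression coefficient on the log scale into a percentage difference in hormone level. The sketch assumes the common definition of λ as the median one-degree-of-freedom chi-square statistic divided by its expected median, and assumes a natural-log transformation; neither detail is stated in the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided association P values."""
    std_normal = NormalDist()
    # A two-sided P value maps to a 1-df chi-square statistic via z**2.
    chi2_stats = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def percent_change(beta: float) -> float:
    """Percentage difference implied by a coefficient on the natural-log scale."""
    return (math.exp(beta) - 1) * 100
```

Under these assumptions, P values that are uniformly distributed give λ close to 1, and a per-carrier coefficient of ln(0.508) corresponds to a 49.2% reduction.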

Follow-up genotyping of rs45446698

Genotype data for rs45446698 were generated as part of iCOGS25 and OncoArray.26 Full details of SNP selection, array design, genotyping and post-genotyping QC have been published.25,26 Participants genotyped in both collaborations were excluded from the iCOGS data sets with the exception of the GxE interaction analysis of menopausal hormone treatment, for which five studies (CPS-II, PBCS, UKBGS, MCCS and pKARMA) were excluded from OncoArray, rather than iCOGS, in order to maximise the number of studies with sufficient cases and controls for analysis. We excluded cases with breast tumours of unknown invasiveness, or in situ disease, and those for whom age at diagnosis was not known. After QC exclusions,26 the call rate for rs45446698 in OncoArray data was 99.66% and there was no evidence of deviation from Hardy–Weinberg equilibrium in controls (Supplementary Table S1). In iCOGS data, rs45446698 was imputed using 1KGP Phase 3 reference data (info score = 0.94); we used gene dosages (≤0.2 = 0, >0.8 and ≤1.2 = 1, >1.8 = 2) to call genotypes for 99.22% of samples.
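
The dosage thresholds used to call genotypes from imputed iCOGS data can be written out as a small Python sketch. The thresholds are those given in the text; the function names are hypothetical, and dosages falling between the bands are returned as uncalled.

```python
from typing import Iterable, Optional


def call_genotype(dosage: float) -> Optional[int]:
    """Map an imputed allele dosage (0 to 2) to a hard genotype call, or None."""
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("dosage must lie between 0 and 2")
    if dosage <= 0.2:
        return 0
    if 0.8 < dosage <= 1.2:
        return 1
    if dosage > 1.8:
        return 2
    return None  # ambiguous dosage: no genotype called


def call_rate(dosages: Iterable[float]) -> float:
    """Fraction of samples for which a genotype could be called."""
    calls = [call_genotype(d) for d in dosages]
    if not calls:
        raise ValueError("no dosages supplied")
    return sum(c is not None for c in calls) / len(calls)
```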

Statistical analysis of rs45446698 and breast cancer risk

Due to the low MAF of rs45446698 (3.7%, 0.03% and 0.4% in individuals of European, Asian and African ancestry, respectively), we restricted our analyses to individuals of European ancestry and excluded studies with <50 cases or controls; there were 35 (iCOGS) and 56 (OncoArray) studies for the current case–control analysis (Supplementary Tables S1 and S2).

We combined heterozygote and rare homozygote genotypes and estimated carrier ORs using logistic regression, adjusted for 15 principal components25,26 and study. Stratum-specific carrier ORs were estimated for a set of pre-specified prognostic variables (oestrogen receptor (ER), progesterone receptor (PR), HER2, grade and stage). We excluded studies with <50 cases or controls in any individual stratum from stratified analyses. Interactions were assessed based on case-only models (ER, PR, HER2, stage and grade). In the subset of studies for which covariate data were available, we used multivariable logistic regression to adjust for reference age (defined as age at diagnosis for cases and age at interview for controls), age at menarche, BMI and parity (as above). Finally, we stratified our analyses on menopausal status at reference age. When menopausal status was missing, the reference age was used as a surrogate (<54 premenopausal and ≥54 postmenopausal). To select the reference age that most accurately captured menopausal status in this group of studies, we compared the area under the ROC curve (AUC), based on women who had reported natural menopause, across different reference age cut-offs (50–56 years); on this basis, a reference age of 54 was selected. P values were estimated using likelihood ratio tests with one degree of freedom. All P values reported, for all analyses, are two-sided. Statistical analyses were performed using STATA version 11.0 (StataCorp, College Station, TX, USA).
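
The carrier coding and the age-based surrogate for menopausal status can be sketched as follows. The crude odds ratio shown is an unadjusted 2 × 2 calculation with a Wald confidence interval, included only to illustrate what a carrier OR measures; the analyses in the text used logistic regression adjusted for principal components and study. All function names are hypothetical.

```python
import math
from typing import Optional, Tuple


def is_carrier(rare_allele_count: int) -> bool:
    """Heterozygotes and rare homozygotes are combined into one carrier group."""
    return rare_allele_count >= 1


def menopausal_status(reported: Optional[str], reference_age: float,
                      cutoff: float = 54) -> str:
    """Use reported status if available, otherwise the age surrogate (<54 / >=54)."""
    if reported is not None:
        return reported
    return "post" if reference_age >= cutoff else "pre"


def crude_odds_ratio(case_carriers: int, case_noncarriers: int,
                     control_carriers: int, control_noncarriers: int
                     ) -> Tuple[float, float, float]:
    """Unadjusted carrier OR with a 95% Wald confidence interval."""
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper
```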

Statistical analysis of gene-environment interaction (GxE) with menopausal hormone treatment

Postmenopausal women from 13 (iCOGS) and 27 (OncoArray) studies provided the data on menopausal hormone treatment. Menopausal status and postmenopausal hormone use were derived as of the reference date (defined as date of diagnosis for cases and interview for controls); women with unknown age at reference date were excluded from this analysis. All analyses were conducted only in postmenopausal women. Carrier ORs for breast cancer risk were estimated using logistic regression stratified by current use of menopausal hormone treatment, oestrogen–progesterone therapy and oestrogen-only therapy, respectively. Analyses were adjusted for study, ten principal components, reference age, age at menarche, parity, BMI, former use of menopausal hormone treatment and use of any menopausal hormone treatment preparation other than the one of interest in analyses of current use of menopausal hormone treatment by type. To account for potential heterogeneity of the main effects of menopausal hormone treatment/oestrogen–progesterone therapy/oestrogen-only therapy by study design, we included an interaction term between the risk factor of interest and an indicator variable for study design (prospective cohorts/population-based case–control studies, non-population-based studies). Interactions between rs45446698 and current use of menopausal hormone treatment, oestrogen–progesterone therapy and oestrogen-only therapy were assessed using likelihood ratio tests, based on logistic regression models with and without interaction between rs45446698 and current use of menopausal hormone treatment, oestrogen–progesterone therapy and oestrogen-only therapy, respectively. Statistical analyses were performed using SAS 9.4 and R (version 3.4.4).
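
The likelihood ratio test for interaction compares the fit of nested models with and without the interaction term. A minimal Python sketch of the final step is given below; it assumes a single interaction term (one degree of freedom) and takes the two maximised log-likelihoods as inputs, which would come from whatever software fitted the models. The function name is hypothetical.

```python
import math


def lrt_p_value(loglik_without: float, loglik_with: float) -> float:
    """P value of a 1-df likelihood ratio test between two nested models.

    loglik_without: maximised log-likelihood of the model without the interaction.
    loglik_with:    maximised log-likelihood of the model with the interaction.
    """
    stat = 2.0 * (loglik_with - loglik_without)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))
```

The closed-form tail used here holds only for one degree of freedom; a test with more interaction terms would need a general chi-square survival function.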

Statistical analysis of breast cancer-specific survival in cases

In total, 38 (iCOGS) and 63 (OncoArray) studies provided follow-up data for analysis of breast cancer-specific survival. Analysis of outcome was restricted to patients who were at least 18 years old at diagnosis and for whom vital status at, and date of, the last follow-up were known. Patients ascertained for a second tumour were excluded. Time-to-event was calculated from the date of diagnosis. For prevalent cases with study entry after diagnosis, left truncation was applied, i.e., follow-up started at the date of study entry.27 Follow-up was right-censored at the date of death (with death known to be due to breast cancer considered an event), the date the patient was last known to be alive if death did not occur or at 10 years after diagnosis, whichever came first. Follow-up was censored at 10 years due to limited data availability after this time. Hazard ratios (HR) for association of rs45446698 genotype with breast cancer-specific survival were estimated using Cox proportional hazards regression implemented in the R package survival (v. 2.43-3) stratified by country. iCOGS and OncoArray estimates were combined using an inverse-variance-weighted meta-analysis.
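
The follow-up rules (left truncation at study entry, censoring at 10 years) and the inverse-variance-weighted combination of the two data sets can be sketched in Python. This is an illustration under simplifying assumptions, not the study's R code: times are expressed in years since diagnosis, the meta-analysis is fixed-effect, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def follow_up_window(years_to_entry: float, years_to_last_follow_up: float,
                     died_of_breast_cancer: bool, horizon: float = 10.0
                     ) -> Optional[Tuple[float, float, bool]]:
    """Return (entry, exit, event) in years since diagnosis, or None if no time at risk."""
    entry = max(0.0, years_to_entry)               # left truncation at study entry
    exit_ = min(years_to_last_follow_up, horizon)  # censor at 10 years
    if exit_ <= entry:
        return None
    # A breast cancer death counts as an event only if it falls within the horizon.
    event = died_of_breast_cancer and years_to_last_follow_up <= horizon
    return entry, exit_, event


def ivw_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted pooling of (log HR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * pooled_se),
            math.exp(pooled + 1.96 * pooled_se))
```

The entry and exit times produced by the first function correspond to the start and stop times of a counting-process survival record, which is one common way to pass left-truncated data to a Cox model.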

Results

We tested 8,280,353 autosomal SNPs for association with luteal-phase creatinine-adjusted oestrone-3-glucuronide and pregnanediol-3-glucuronide in 560 premenopausal women. For oestrone-3-glucuronide, we identified a single peak mapping to the CYP3A locus at chromosome 7q22.1 (Fig. 1 and Supplementary Table S3); conditioning on any of the top SNPs, there were no additional independent signals. Four of the SNPs that were significant at P < 1 × 10−8 comprise part of the seven-SNP CYP3A7*1C allele,8,15 including the top, directly genotyped SNP, rs45446698 (Supplementary Table S3). The rare rs45446698-C allele (MAF = 0.035) was associated with a 49.2% reduction in luteal-phase oestrone-3-glucuronide (95% CI −56.1% to −41.1%, P = 3.1 × 10−18, Table 1) and explained 11.5% of the variation in oestrone-3-glucuronide in these premenopausal women. Since hormone levels may be influenced by both demographic and reproductive factors, we adjusted for age at urine collection, age at menarche, body mass index and parity; these adjustments did not alter the association (fully adjusted model: 44.8% reduction, 95% CI −53.3% to −34.8%, P = 2.1 × 10−12, Table 1).

Fig. 1: Manhattan plot of single-nucleotide polymorphism (SNP) associations with luteal-phase urinary oestrone-3-glucuronide levels in 560 premenopausal women.

–log10 P values for SNP associations are plotted against the genomic coordinates (hg19). The red line indicates the conventionally accepted threshold for genome-wide significance (P = 1 × 10−8).

Table 1 Association of rs45446698 with levels of oestrone-3-glucuronide, pregnanediol-3-glucuronide and progesterone in premenopausal women of European ancestry.

For pregnanediol-3-glucuronide, there were no associations that were significant at a threshold of P < 1 × 10−8 (Supplementary Fig. S2). An association between the CYP3A locus and progesterone levels has been reported previously;10 accordingly, we measured progesterone in addition to pregnanediol-3-glucuronide in premenopausal women from the Generations Study. Progesterone was moderately correlated with pregnanediol-3-glucuronide (r = 0.37, P = 7.4 × 10−12), but while there was no association between the rs45446698-C allele and urinary pregnanediol-3-glucuronide levels (5.5% reduction, 95% CI −24.2% to +17.7%, P = 0.61) in this group of women, the rs45446698-C allele was associated with significantly lower luteal-phase urinary progesterone levels (26.7% reduction, 95% CI −39.4% to −11.6%, P = 0.001, Table 1). Adjusting these analyses for covariates, as above, did not alter the results (Table 1).

To test for the association between rs45446698 and breast cancer risk, we combined genotype data from 56 studies (OncoArray; Supplementary Table S1) with imputed data from 35 studies (iCOGS; Supplementary Table S2) in a total of 90,916 cases and 89,893 controls of European ancestry. The rs45446698-C allele was associated with a reduction in breast cancer risk (OR = 0.94, 95% CI 0.91–0.98, P = 0.002, Table 2) with no evidence of heterogeneity between data sets (Phet = 0.58). There was no evidence that the reduction in breast cancer risk associated with being an rs45446698-C carrier differed according to HER2 status, tumour grade or stage (Supplementary Table S4). Stratifying by ER status, the association was limited to ER-positive (ER+) breast cancers (OR = 0.91, 95% CI 0.87–0.96, P = 0.0002 and OR = 1.03, 95% CI 0.95–1.11, P = 0.50 for ER+ and ER− cancers, respectively; Pint = 0.03, Table 2). Stratifying by ER and PR status, the association was limited to ER+/PR+ cancers (ER+/PR+: OR = 0.86, 95% CI 0.82–0.91, P = 6.9 × 10−8; ER+/PR−: OR = 1.06, 95% CI 0.96–1.16, P = 0.25; Pint = 0.0001, Table 2). Adjusting for demographic and reproductive factors in the subset of studies for which these additional covariates were available did not alter this association (Supplementary Table S5). Defining reference age as age at diagnosis for cases and age at interview for controls and using this as a proxy for menopausal status (<54 or ≥54 years), we further stratified our analysis on menopausal status; there was little evidence that the association with ER+/PR+ breast cancer differed by menopausal status (premenopausal OR = 0.94, 95% CI 0.84–1.06, P = 0.31, postmenopausal OR = 0.86, 95% CI 0.80–0.93, P = 0.0001, Phet = 0.28).

Table 2 Association of rs45446698 among women of European ancestry overall and stratified by hormone receptor status.

On the assumption that genetic variants that influence metabolism of endogenous hormones5 may also impact on metabolism of exogenous hormones, we investigated whether menopausal hormone treatment modified the association between rs45446698 genotype and ER+/PR+ breast cancer risk in 17,831 postmenopausal breast cancer cases and 40,437 postmenopausal controls. The rs45446698-C carrier OR was lower (i.e., more protective) in current users of any menopausal hormone treatment but particularly in those who used combined oestrogen–progesterone therapy (current users: OR = 0.68, 95% CI 0.52–0.90, P = 0.007; never users: OR = 0.85, 95% CI 0.76–0.95, P = 0.005, Table 3). This difference was not, however, statistically significant (Pint = 0.15, Table 3).

Table 3 Association of rs45446698 genotype with ER+/PR+ breast cancer risk among women of European ancestry stratified by current use of postmenopausal hormone treatment.

Finally, to determine whether rs45446698 genotype could affect patient outcome by influencing metabolism of cytotoxic agents that are CYP3A substrates,15 we tested for the association between rs45446698 genotype and 10-year breast cancer-specific survival in 91,539 breast cancer cases from 71 studies for whom follow-up data were available. There was no overall association between rs45446698 genotype and breast cancer-specific survival (HR = 0.99, 95% CI 0.91–1.09, P = 0.90, Table 4), nor was there any evidence of an association in analyses stratified by tumour characteristics (Supplementary Table S6). Stratifying by treatment regimen, we found no evidence that rs45446698 genotype influenced outcome in cases who were treated with a hormonal agent (i.e., tamoxifen or an aromatase inhibitor, Table 4). There was, however, some evidence that in cases who were treated with a taxane, carriers of the rs45446698-C allele had reduced breast cancer-specific survival compared with non-carriers (HR = 1.46, 95% CI 1.08–1.97, P = 0.01, Table 4).

Table 4 Association of rs45446698 with breast cancer-specific survival in breast cancer cases of European ancestry stratified by treatment regimen.

Discussion

The present GWAS identified a single, highly significant association between the CYP3A7*1C allele (tagged by rs45446698) and premenopausal urinary oestrone-3-glucuronide. This finding alone is not novel; we have previously reported an association between the CYP3A7*1C allele, parent oestrogens and several oestrogen metabolites.5 What we have demonstrated for the first time is the extent to which this signal dominates the genetic architecture of hormone levels in premenopausal women of Northern European ancestry (Fig. 1; rs45446698 P = 3.1 × 10−18, all other signals P > 1 × 10−8); we estimate that 11.5% of the variance in urinary oestrone-3-glucuronide levels is explained by this one allele.

Two previous GWAS of circulating oestrogen levels have been published; neither reported an association with the CYP3A locus.9,10 This lack of replication may be explained by our choice of study population. The first GWAS9 was conducted in postmenopausal women (N = 1623) participating in the Nurses’ Health Study and the Sisters in Breast Screening Study. The second was conducted within the Twins UK study (N = 2913) and included men as well as pre-, peri- and postmenopausal women. A strength of our GWAS is that all of the women were premenopausal and had regular menstrual cycles; circulating levels of oestrogens in premenopausal women are much higher than those in postmenopausal women.4,28 For each woman, we assayed a single urine sample taken in the mid-luteal phase of her cycle at exactly 7 days after her predicted day of ovulation. Thus, although our study is relatively small (N = 560), we may have had greater power to detect an association at the CYP3A locus than previous studies due to the very homogeneous premenopausal study population that we selected.

Our findings also demonstrate the potential significance of the choice of hormone or hormone metabolite; both of the previous GWAS assayed plasma oestradiol. In a targeted analysis of urinary oestrogen metabolites, we have previously shown that the association between the CYP3A7*1C allele and oestrone (45.3% lower levels in carriers, P = 0.0005) is more pronounced than the association with oestradiol (26.7% lower levels, P = 0.07) with the implication that measuring urinary oestrone-3-glucuronide (rather than plasma oestradiol) may have contributed to our positive findings. Similarly, by measuring pregnanediol-3-glucuronide and progesterone in premenopausal women from the Generations Study, we were able to demonstrate a significant association of rs45446698 with progesterone (27% reduction, P = 0.001) in the absence of an association with pregnanediol-3-glucuronide (6% reduction, P = 0.61).

The fact that we measured a urinary oestrogen metabolite (oestrone-3-glucuronide) rather than serum or plasma oestrogens (oestradiol or oestrone) limits the interpretation of our results in terms of a causal association. Estimates of the association between circulating oestrogens and breast cancer risk are based on measurements of hormone levels in plasma or serum,3 and in a recent study that measured luteal-phase serum oestrogens and urinary oestrogen metabolites in 249 premenopausal women,29 serum oestradiol and oestrone were only moderately correlated with urinary oestrone (serum oestradiol: r = 0.39, serum oestrone: r = 0.48). Our analysis of rs45446698 genotypes in 90,916 cases and 89,893 controls from BCAC, however, provides robust evidence of an association of the CYP3A7*1C allele with breast cancer risk overall (OR = 0.94, P = 0.002) and a more pronounced protective effect on ER+/PR+ breast cancers (OR = 0.86, P = 6.9 × 10−8). The specificity of this association (comparing ER+/PR− with ER+/PR+ cancers, Phet = 0.001) and our replication of Ruth and colleagues’ report of a signal at the CYP3A locus in their analysis of circulating progesterone levels10 raise the possibility that premenopausal progesterone levels might influence risk of ER+/PR+ breast cancers. This would be in contrast to the findings from Key et al., who reported no evidence of an association between premenopausal progesterone levels and breast cancer risk overall and no heterogeneity in estimates stratified by PR status.3 However, the number of cases of PR+ (N = 158) and PR− (N = 61) breast cancer was small, and this analysis may have lacked power to detect modest associations in subgroups of cancers.
Alternatively, the association of rs45446698 genotype with ER+/PR+ breast cancer risk, specifically, may be due to the fact that PR is a marker for an intact oestrogen signalling pathway,30 which would be consistent with a direct link between the levels of oestrogen (or oestrogen signalling) and proliferation in this subgroup of cancers.

Our analysis of the CYP3A7*1C allele, menopausal hormone treatment and breast cancer risk was inconclusive; while the carrier ORs were consistent with a greater protective effect of this allele in women taking exogenous hormones, particularly oestrogen–progesterone therapy, none of the interactions was statistically significant. Overall, there were 14,119 ER+/PR+ breast cancer cases and 32,418 controls for this subgroup analysis, but for what was, arguably, the most pertinent subgroup (i.e., current oestrogen–progesterone therapy use), the number of cases who were current users was relatively small (CYP3A7*1C carriers N = 107, non-carriers N = 1498) and power to detect modest interactions was limited. There are limitations to this analysis; we focussed on current menopausal hormone treatment use (adjusted for past use) as it is for current use that the association with breast cancer risk is the strongest,31 but we did not have information on dose, duration or the formulation that was used.

Finally, we found no association between CYP3A7*1C carrier status and survival in patients treated with tamoxifen, a known CYP3A substrate. This may reflect the fact that compared to CYP3A4, CYP3A7 is a poor metaboliser of tamoxifen,32 or that standard doses of tamoxifen achieve high levels of oestrogen receptor saturation.33 There was some evidence that breast cancer-specific survival was reduced in CYP3A7*1C carriers who were treated with a taxane, compared with non-carriers (P = 0.01); this may, however, be a chance finding given the number of comparisons that were tested.

In conclusion, we present strong evidence that the CYP3A7*1C allele impacts on the metabolism of endogenous hormones, which, in turn, reduces the risk of hormone receptor-positive breast cancer in carriers. Optimal strategies for breast cancer prevention in women at high risk of breast cancer and in the general population are an area of active research. In this context, CYP3A7*1C carriers represent a naturally occurring cohort in which the effects of reduced exposure to endogenous oestrogens and progesterones throughout a woman’s premenopausal years can be further investigated. Our results regarding the impact of CYP3A7*1C carrier status on exogenous hormones and chemotherapeutic agents are preliminary but warrant further investigation, preferably in the setting of randomised trials.