Introduction

DNA methylation has been proposed to play an important role in the aetiology of complex traits and diseases, including cancer1,2. DNA methylation, especially when detected in blood, has been investigated for its association with breast cancer risk. Peripheral blood DNA methylation at the known breast cancer susceptibility genes BRCA1 and ATM has been found to be associated with elevated breast cancer risk3,4,5. Several genome-wide studies of DNA methylation have reported peripheral blood DNA methylation changes associated with the risk of breast cancer6,7,8,9,10.

Two DNA methylation-based measures have been reported to be associated with breast cancer risk. One measure is DNA methylation-based biological age (DNAm age). Several epigenetic clocks have been developed to estimate DNAm age11, and the Hannum12 and Horvath13 clocks have received the most attention. Both of these clocks were developed by regressing chronological age on individual methylation sites to select a set of sites that predict chronological age. The Levine clock14 was developed more recently by selecting a set of sites to predict ‘phenotypic age’, a linear combination of 10 clinical biomarkers (including chronological age) associated with the hazard of aging-related mortality. The difference between DNAm age and chronological age is called ‘epigenetic age acceleration’, which reflects the rate of biological aging. A positive value of epigenetic age acceleration suggests that biological age is older than chronological age. Prospective analyses have found associations between accelerated DNAm ages based on the Horvath and Levine clocks and increased risk of breast cancer for middle-aged women14,15. Recently, a prospective study assessed DNAm age for 2,764 middle-aged women using DNA extracted from pre-diagnosis blood samples and found that each of the three clocks above was predictive of breast cancer risk: each 5-year increase in epigenetic age acceleration (calculated as the residuals from regressing DNAm age on chronological age and blood cell composition and thus independent of chronological age and blood cell composition) was associated with about a 10–15% increase in breast cancer risk16.

Another DNA methylation measure reported to be associated with breast cancer risk is a global measure, genome-wide average DNA methylation, defined as the average methylation value across multiple probes, commonly calculated from the HumanMethylation450 (HM450) BeadChip17,18. Two studies found that a lower genome-wide average DNA methylation in pre-diagnosis peripheral blood was associated with increased risk of breast cancer8,9. These results are consistent with previous findings that conventional global blood DNA methylation measures, such as LINE-1 and LUMA, are negatively associated with breast cancer risk19,20. However, this association was not observed in the Norwegian Women and Cancer Study9, the Sister Study10 or a recent meta-analysis of four studies21, two of which are the studies of Refs. 8 and 9. A similar measure, the median methylation value across probes, has been found to be prospectively associated with risks of mature B-cell neoplasms22, urothelial cell carcinoma23 and prostate cancer24.

These measures could be treated as putative DNA methylation-based breast cancer risk factors. It is unknown whether these DNA methylation-based risk factors and conventional risk factors modify breast cancer risk independently or in combination. Knowing the associations between the two groups of risk factors could help answer this question and provide insights into breast cancer aetiology and risk prediction. As Christensen pointed out25: “in-depth studies of both established and putative breast cancer risk factors for their relationship with epigenetic age acceleration can add to our understanding of the biology underlying disease risk factors and present new opportunities for primary and secondary prevention of breast cancer.”

DNAm age has been investigated for its associations with lifestyle factors and some inconsistent results were reported11. Levine et al.26 found age at menopause was negatively associated with accelerated DNAm age based on the Horvath clock, opposite to the direction of the two factors’ associations with breast cancer risk. From a twin and family study, we found evidence that the variance in genome-wide average DNA methylation is determined by (unmeasured) environmental factors across the lifespan17; no specific environmental factor was investigated. To the best of our knowledge, associations of these DNA methylation-based risk factors with conventional breast cancer risk factors, such as family history or mammographic density, have not been investigated.

We therefore conducted a study with the aim of investigating associations between DNA methylation-based risk factors and conventional risk factors for breast cancer.

Results

Characteristics of the Australian mammographic density twins and sisters study (AMDTSS) sample

Table 1 shows the distribution of the investigated risk factors for the AMDTSS sample. Our sample is comparable with samples from previous studies8,9,14,15 that reported on the DNA methylation-based breast cancer risk factors considered in this study.

Table 1 Characteristics of the study sample.

The mean (SD) DNAm ages based on the Hannum, Horvath and Levine clocks were 57.3 (6.4), 55.5 (6.5) and 53.0 (7.4) years, respectively. By construction, the mean epigenetic age acceleration was zero years for each clock, and the SDs were 4.2, 4.7 and 5.6 years, respectively. The mean (SD) percentage genome-wide average DNA methylation was 53.0% (0.3%).

Figures 1 and 2 show the correlations between chronological age and the DNA methylation-based risk factors. The three DNAm age measures were correlated with each other, and each of them was correlated with chronological age (all r > 0.61; all P < 10−15). The three epigenetic age acceleration measures were correlated with each other (all r > 0.62; all P < 10−15), and independent of chronological age by construction. Genome-wide average DNA methylation was independent of chronological age, DNAm age and epigenetic age acceleration (all P > 0.19). No obvious heterogeneity in the correlations across MZ twins, DZ twins and sisters was observed.

Figure 1 Correlation of chronological age with DNAm age.

Figure 2 Correlations of genome-wide average DNA methylation with chronological age and DNAm age.

Associations between DNA methylation-based and conventional risk factors

Table 2 shows that the epigenetic age acceleration measures were associated with different conventional risk factors: the Hannum clock was negatively associated with age at first live birth, the Horvath clock was positively associated with BMI and age at menarche, and the Levine clock was positively associated with BMI, pack-years of smoking and alcohol drinking (all nominal P < 0.05); BMI was the only risk factor associated with two epigenetic age acceleration measures. Genome-wide average DNA methylation was negatively associated with the number of live births, and positively associated with age at first live birth (both nominal P < 0.05). No association was found for the other risk factors.

Table 2 Association estimates between epigenetic age acceleration, genome-wide average DNA methylation and conventional breast cancer risk factors.

Similar results were found from the sensitivity analyses. For categorized BMI, positive associations were found for the Horvath clock: regression coefficient (β) = 1.10, 95% confidence interval (CI) = 0.03 to 2.17, P = 0.04 for overweight women, and β = 1.12, 95% CI = 0.08 to 2.17, P = 0.04 for obese women. For the Levine clock, the association for obese women was marginally significant: β = 1.29, 95% CI = −0.01 to 2.58, P = 0.05. No association was found for the Hannum clock or genome-wide average DNA methylation. The analyses including the whole sample for pack-years of smoking, number of live births, length of HRT use and length of oral contraceptive use gave similar results (data not shown).

Replication and meta-analyses

The associated risk factors found above were examined using the Melbourne Collaborative Cohort Study (MCCS) sample (Table 3). At the level of nominal significance (P < 0.05), the association between BMI and the Levine clock was replicated using both the MCCS women and men, and the association between pack-years of smoking and the Levine clock was replicated using the MCCS men.

Table 3 Association estimates between DNA methylation-based risk factors and conventional risk factors from the MCCS and meta-analysis.

The associations for the Hannum clock were not statistically significant from any of the meta-analyses, and the rest of the associations were found to be nominally significant in at least one of the meta-analyses, with the exception that the association between number of live births and genome-wide average DNA methylation was marginally significant (P = 0.07). Smoking status was associated with the greatest number of DNA methylation-based risk factors investigated: the Horvath clock, Levine clock and genome-wide average DNA methylation.

No evidence of heterogeneity by sex was found for any association, except for the association between smoking status and the Horvath clock (P = 0.04). Note that this association was also significant in the meta-analysis when including women only.

Within-twin-pair associations

The associated risk factors found above were examined using within-twin-pair analyses (Table 4). For all twin pairs, there was evidence of a within-twin-pair association between the Horvath clock and BMI. There was no evidence of within-twin-pair association heterogeneity between MZ and DZ pairs (all P > 0.05).

Table 4 Within-twin-pair association estimates between DNA methylation-based risk factors and conventional risk factors.

Discussion

Using a sample of unaffected women comparable with those in the previous studies3,7,11,13 which have reported that DNAm age and genome-wide average DNA methylation are predictive of breast cancer risk, we investigated the associations between these DNA methylation-based risk factors and multiple conventional breast cancer risk factors. We further conducted replication and meta-analyses using independent and homogeneous samples, as well as within-twin-pair analyses that control for familial confounding. We found that DNAm age was associated with lifestyle factors (BMI, smoking and alcohol drinking) and hormonal factors (age at menarche), while genome-wide average DNA methylation was associated with hormonal factors (number of live births and age at first live birth) and smoking. We also found evidence that the association of DNAm age with BMI remained after controlling for familial factors shared by twins.

Following our previous finding that the variance in genome-wide average DNA methylation is determined to a large extent by as yet unmeasured environmental factors across the human lifespan, including non-genetic factors shared by relatives17, here we found that smoking, number of live births and age at first live birth were associated with this summary measure of DNA methylation across the genome. To the best of our knowledge, our study is the first to report that genome-wide average DNA methylation is associated with these three conventional risk factors.

Similar to previous studies14,27,28,29,30, we also found DNAm age was positively associated with lifestyle risk factors including BMI, smoking and alcohol drinking. This suggests that some lifestyle factors accelerate biological ageing, consistent with both DNAm age and those lifestyle factors being associated with an increased risk of breast cancer. We found that the Levine clock was associated with more lifestyle factors than the other two clocks. Similar findings have arisen from previous studies; see Table 1 of Horvath et al.31 Interestingly, the Levine clock was found to be most strongly associated with breast cancer risk16. These differences might be due to differences in the development of these clocks; the Horvath and Hannum clocks are trained on chronological age12,13, while the Levine clock is trained on the ‘phenotypic age’ which in addition to chronological age also considers nine other health-related factors, so is theoretically more relevant to health14.

We found that DNAm age was associated with hormonal risk factors, as has been previously found26,32. Binder and colleagues found a negative association between the Horvath clock and age at menarche32, which is in the opposite direction to our observed association. One difference between the two studies is that Binder et al. studied adolescent girls while we studied older women. Note that the point estimates for the associations of age at menarche with the other two clocks were also positive, providing some more evidence about the direction of association between DNAm age and age at menarche for older women.

Results from the within-twin-pair analyses suggest that our observed association of the Horvath clock with BMI is unlikely to be due to confounding by familial factors shared by twins, because such potentially confounding effects are cancelled out in a regression analysis of within-twin-pair differences. In this sense, this association is more likely to be causal. We did not find evidence for genetic confounders, given that the within-twin-pair association was similar for MZ and DZ twin pairs. The null results for the other within-twin-pair associations do not necessarily imply that the associations observed in Tables 2 and 3 are due to familial confounding. Note that the confidence interval of each within-twin-pair association in Table 4 contained the point estimate of the corresponding association in Tables 2 and 3, suggesting that the associations tend to be alike, and that the lack of statistical significance in the within-twin-pair analyses is possibly a result of insufficient sample size.

Our findings imply that DNA methylation-based risk factors potentially interplay with their associated conventional risk factors in modifying breast cancer risk. Such interplay may be due to the DNA methylation-based risk factors mediating the associations between conventional risk factors and cancer risk, or to other mechanisms. Studies with appropriate data, design and analytic methods are required to further investigate the interrelationships between DNA methylation-based risk factors, conventional risk factors and breast cancer risk. Findings from such studies could provide novel insights into the aetiology, early detection and prevention of breast cancer.

The strengths of our study include: (1) using samples comparable with those used in studies that reported the DNA methylation-based risk factors, (2) investigating a comprehensive list of conventional breast cancer risk factors, which notably includes family history and mammographic density, and (3) using a sample for discovery and homogeneous samples for replication analyses, as well as conducting a meta-analysis to take advantage of the increased sample size. One potential limitation is that no multiple testing adjustment was performed; our results must be interpreted with this in mind.

In conclusion, our study found evidence that the lifestyle risk factors BMI, smoking and alcohol drinking, and the hormonal breast cancer risk factors age at menarche, age at first live birth and number of live births, are associated with the DNA methylation-based biological age and global DNA methylation. We also found evidence that the observed associations are unlikely to be due to familial confounding but are likely causal. DNA methylation-based risk factors could interplay with conventional risk factors to modify breast cancer risk. Such interplay requires further investigation.

Methods

Study sample for discovery

The sample comprised participants from the Australian Mammographic Density Twins and Sisters Study (AMDTSS)33, which was approved by the Human Research Ethics Committee at the University of Melbourne in accordance with the Declaration of Helsinki. All participants provided written informed consent. The analytical sample included 479 women from 130 families: 66 monozygotic twin (MZ) pairs, 66 dizygotic twin (DZ) pairs and 215 sisters of twins34. They were aged 40–78 years and had a mean (standard deviation [SD]) age of 55.7 (8.0) years. All women were healthy and none had been diagnosed with breast cancer when recruited.

DNAm age and genome-wide average DNA methylation

Methylation of DNA extracted from dried peripheral blood spots was measured using the HM450 assay. Data were normalized using Illumina’s reference factor-based normalization methods and subset-quantile within array normalization35 for type I and II probe bias correction, and an empirical Bayes batch effects removal method, ComBat36, was applied to minimise technical variation across physical batches; see Li et al.34 for more details.

DNAm age was calculated based on each of the Hannum, Horvath and Levine clocks, using the online calculator (https://dnamage.genetics.ucla.edu). As done previously16, we investigated epigenetic age acceleration, calculated as the residuals from regressing DNAm age on chronological age and blood cell composition. Therefore, epigenetic age acceleration was independent of chronological age and blood cell composition. Blood cell composition (CD8+ T-cells, CD4+ T-cells, natural killer cells, B-cells, monocytes and granulocytes) was estimated using the Houseman method37 implemented in the minfi package, with the data of Reinius et al.38 as reference. Genome-wide average DNA methylation was calculated as the average methylation beta-value across all autosomal probes that passed quality control.
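The two derived measures described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names and the toy values are hypothetical, only a single cell-type proportion is used as a covariate for brevity, and the regression is solved from the normal equations using the Python standard library alone.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_residuals(y, covariates):
    """Residuals from regressing y on an intercept plus the covariate columns."""
    rows = [[1.0] + list(cov) for cov in covariates]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def genome_wide_average(beta_values):
    """Mean methylation beta-value across the probes supplied."""
    return mean(beta_values)


# Toy data (hypothetical): DNAm age, then (chronological age, one cell proportion).
dnam_age = [52.1, 60.3, 47.8, 66.0, 58.4, 71.2]
covariates = [(50, 0.55), (61, 0.60), (45, 0.50), (68, 0.65), (55, 0.58), (70, 0.52)]

# Epigenetic age acceleration: what is left of DNAm age after removing the
# part explained by chronological age and cell composition.
acceleration = ols_residuals(dnam_age, covariates)
```

Because the residuals are orthogonal to every column in the regression, the resulting measure has mean zero and is uncorrelated with chronological age, which is the "independent by construction" property referred to in the text.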

Conventional breast cancer risk factors

We studied multiple conventional breast cancer risk factors, including lifestyle factors, hormonal factors, family history and mammographic density (Table 1). All risk factors except mammographic density were collected via telephone-administered questionnaire survey. Lifestyle factors included body mass index (BMI), smoking status, smoking intensity measured as pack-years and alcohol drinking. Hormonal factors included age at menarche, parity, number of live births, age at first live birth, oral contraceptive use, length of oral contraceptive use in years, menopausal status, age at menopause, hormonal replacement therapy (HRT) use and duration of HRT use in years. Family history was defined as having at least one first-degree relative diagnosed with breast cancer. Mammographic density was measured using the computer-assisted thresholding technique, CUMULUS (Imaging Research Program, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Canada), at the conventional brightness threshold. We studied three measures: dense area, non-dense area and percentage dense area. Details of the measurement can be found in Odefrey et al.33 Mammographic density data were available for 435 women.

Statistical analysis

Correlations between DNA methylation-based risk factors and chronological age were assessed using Pearson’s correlation coefficient. We investigated the association of each DNA methylation-based risk factor (the three epigenetic age acceleration measures and genome-wide average DNA methylation) with the conventional risk factors separately using a linear regression model, in which the DNA methylation-based risk factor was the dependent variable and the conventional risk factor was the independent variable. To account for the relatedness between family members, the regression model was fitted using the Generalised Estimating Equations method with family as cluster. The model was adjusted for age, BMI, smoking status and blood cell composition, with the exception that blood cell composition was dropped from the model investigating epigenetic age acceleration, since epigenetic age acceleration is independent of cell composition by construction. A P-value < 0.05 was used to define nominal statistical significance. All statistical tests were two-sided.
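The idea of a family-clustered linear model can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's model: it assumes an independence working correlation (the text does not say which was used), includes one predictor and no covariates, applies no small-sample correction, and uses a normal approximation for the two-sided P-value. All names and values are hypothetical.

```python
from collections import defaultdict
from math import erfc, sqrt


def clustered_linear_fit(y, x, cluster):
    """Slope of y on x (with intercept) and its cluster-robust standard error."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    # Sum the score contributions (x - mean) * residual within each family,
    # so that correlated residuals of relatives are not treated as independent.
    score = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        score[g] += (xi - mx) * (yi - intercept - slope * xi)
    se = sqrt(sum(s ** 2 for s in score.values())) / sxx
    return slope, se


def two_sided_p(z):
    """Two-sided P-value for a Wald statistic under a normal approximation."""
    return erfc(abs(z) / sqrt(2))


# Toy data (hypothetical): age acceleration, BMI and family identifier.
acceleration = [1.2, -0.5, 3.1, 0.4, -2.0, 1.8]
bmi = [24.0, 22.5, 31.0, 27.0, 21.0, 29.5]
family = ["f1", "f1", "f2", "f2", "f3", "f3"]

slope, se = clustered_linear_fit(acceleration, bmi, family)
p_value = two_sided_p(slope / se)
```

With an independence working correlation the point estimate equals the ordinary least-squares slope; only the standard error changes, because residuals are summed within families before being squared.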

Mammographic dense, non-dense and percentage dense areas were power-transformed with powers of 0.25, 0.25, 0.4, respectively, and the other right-skewed continuous risk factors (BMI, pack-years of smoking, number of live births, and durations of HRT and oral contraceptive use) were log transformed, to have an approximately normal distribution. The analyses for pack-years of smoking, age at menopause, number of live births, age at first live birth, length of HRT use and length of oral contraceptive use, were restricted to participants who were ever smokers, post-menopausal, parous, ever HRT users and ever oral contraceptive users, as appropriate.
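The transformations and subgroup restrictions above might look like the following sketch. The dictionary keys, function names and participant records are hypothetical; only the powers (0.25, 0.25 and 0.4) and the use of a log transformation come from the text.

```python
from math import log

# Powers stated in the text for the three mammographic density measures.
POWERS = {"dense_area": 0.25, "non_dense_area": 0.25, "percent_dense_area": 0.4}


def transform_density(record):
    """Power-transform each mammographic density measure in one record."""
    return {name: record[name] ** power for name, power in POWERS.items()}


def log_pack_years_of_ever_smokers(participants):
    """Log-transformed pack-years, restricted to ever smokers."""
    return [log(p["pack_years"]) for p in participants if p["ever_smoker"]]


# Toy records (hypothetical values).
density = transform_density(
    {"dense_area": 16.0, "non_dense_area": 81.0, "percent_dense_area": 32.0}
)
participants = [
    {"ever_smoker": True, "pack_years": 10.0},
    {"ever_smoker": False, "pack_years": 0.0},
    {"ever_smoker": True, "pack_years": 2.5},
]
log_pack_years = log_pack_years_of_ever_smokers(participants)
```

Restricting to ever smokers before taking logs avoids evaluating the logarithm at zero, which is one practical reason for the subgroup restriction.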

Sensitivity analysis

BMI was analysed as a categorical variable: normal (BMI < 25; 217 women; used as the reference group), overweight (25 ≤ BMI < 30; 146 women) and obese (BMI ≥ 30; 116 women). The analyses for pack-years of smoking, number of live births, length of HRT use and length of oral contraceptive use also included non-smokers, nulliparous women, HRT non-users and oral contraceptive non-users, for whom the corresponding variable was set to zero, with one added to each value before log transformation.
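A minimal sketch of the two sensitivity-analysis recodings, assuming hypothetical function names: BMI is mapped to the three categories with the cut-points given above and then to indicator variables with the normal group as reference, and a count-like exposure is transformed as log(x + 1) so that a value of zero is defined.

```python
from math import log1p


def bmi_category(bmi):
    """Categorise BMI using the cut-points stated in the text."""
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def bmi_indicators(bmi):
    """Indicator variables (overweight, obese); normal is the reference group."""
    category = bmi_category(bmi)
    return (int(category == "overweight"), int(category == "obese"))


def log_plus_one(value):
    """log(value + 1), defined for non-users and non-smokers coded as zero."""
    return log1p(value)


# Toy values (hypothetical).
indicators = [bmi_indicators(b) for b in (22.4, 27.0, 33.1)]
pack_years_transformed = [log_plus_one(v) for v in (0.0, 5.0, 20.0)]
```

With this coding, each regression coefficient for an indicator is the difference in the outcome between that category and women with BMI < 25.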

Replication and meta-analysis

For the associations with a nominal P < 0.05, we performed replication analyses using the data of participants (1,350 women, 2,004 men) from the Melbourne Collaborative Cohort Study (MCCS)39. The participants were the controls from case-control studies of several cancers nested in the MCCS. As in the AMDTSS, DNA was extracted from peripheral blood, and DNA methylation was measured using the HM450 assay and normalized using Illumina’s reference factor-based normalization methods and subset-quantile within array normalization; see Dugue et al.23,27 for more details.

The association was investigated using the same linear regression model. In addition to the covariates used in the AMDTSS analysis, country of birth (Australia/New Zealand, United Kingdom/Malta, Italy, or Greece) and sample type (dried blood spot, peripheral blood mononuclear cells, or buffy coat) were adjusted for, all fitted as fixed effects. Batch effects were minimized by fitting study and the plate on which the sample was processed as random effects. We also investigated the associations between smoking status and DNA methylation-based risk factors. The analysis was stratified by sex. The heterogeneity in the association by sex was investigated by fitting an interaction term of the risk factor with sex in the regression model for the whole sample.

Taking advantage of the homogeneity between the two samples40, results from the two studies were pooled via a fixed-effect meta-analysis, using the generic inverse variance method. Two meta-analyses were performed: (1) AMDTSS sample and MCCS women, and (2) AMDTSS sample, MCCS women and MCCS men.
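The generic inverse variance method for a fixed-effect meta-analysis can be sketched as below. The function name and the input numbers are hypothetical and do not correspond to any estimate in Table 3: each study's estimate is weighted by the inverse of its squared standard error, and the pooled standard error is the square root of the inverse of the summed weights.

```python
from math import sqrt


def fixed_effect_meta(estimates, standard_errors):
    """Pooled estimate and standard error by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    return pooled, pooled_se


# Toy inputs (hypothetical): one discovery estimate and one replication estimate.
pooled, pooled_se = fixed_effect_meta([0.0, 1.0], [1.0, 0.5])

# 95% confidence interval under a normal approximation.
ci_lower = pooled - 1.96 * pooled_se
ci_upper = pooled + 1.96 * pooled_se
```

In this toy example the second study has half the standard error and therefore four times the weight, so the pooled estimate sits much closer to it.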

Within-twin-pair analysis

Taking advantage of our sample including twin pairs, we performed within-twin-pair analyses to investigate the associations after controlling for the confounding effects of familial factors (both known and unknown) shared by twins. Within-twin-pair differences in the DNA methylation-based risk factors, conventional risk factors and covariates were calculated, and used to estimate their within-twin-pair associations using an ordinary linear regression model without an intercept41. The within-twin-pair analyses were performed separately for all twin pairs, MZ pairs and DZ pairs. We investigated the heterogeneity in the within-twin-pair association by zygosity by fitting an interaction term of the risk factor with zygosity in the regression model for all twin pairs.
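A minimal sketch of the within-twin-pair regression, assuming a single risk factor and no covariates (the analysis described above also differences the covariates). The function name and the toy data are hypothetical. Anything shared by both twins of a pair is removed by the differencing, and the slope is then estimated by least squares through the origin.

```python
from math import sqrt


def within_pair_slope(pairs):
    """Regress within-pair outcome differences on exposure differences.

    pairs: list of ((x1, y1), (x2, y2)) for twin 1 and twin 2 of each pair.
    Returns the slope of the no-intercept regression and its standard error.
    """
    dx = [a[0] - b[0] for a, b in pairs]
    dy = [a[1] - b[1] for a, b in pairs]
    sxx = sum(d * d for d in dx)
    slope = sum(u * v for u, v in zip(dx, dy)) / sxx
    residual_ss = sum((v - slope * u) ** 2 for u, v in zip(dx, dy))
    se = sqrt(residual_ss / (len(pairs) - 1) / sxx)
    return slope, se


# Toy data (hypothetical): the outcome is 2 * exposure plus a pair-specific
# offset (+10, -5, +1) that stands in for unmeasured familial factors.
pairs = [
    ((1.0, 12.0), (3.0, 16.0)),
    ((2.0, -1.0), (5.0, 5.0)),
    ((4.0, 9.0), (1.0, 3.0)),
]
slope, se = within_pair_slope(pairs)
```

Although the pair-specific offsets differ widely, they cancel in the differences, so the sketch recovers a slope of 2; this is the sense in which the design controls for shared familial factors, both known and unknown.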