DNA methylation-based biological age, genome-wide average DNA methylation, and conventional breast cancer risk factors

DNA methylation-based biological age (DNAm age), as well as genome-wide average DNA methylation, have been reported to predict breast cancer risk. We aimed to investigate the associations between these DNA methylation-based risk factors and 18 conventional breast cancer risk factors in disease-free women. A sample of 479 individuals from the Australian Mammographic Density Twins and Sisters Study was used for discovery, a sample of 3354 individuals from the Melbourne Collaborative Cohort Study was used for replication, and meta-analyses pooling results from the two studies were conducted. DNAm age based on three epigenetic clocks (Hannum, Horvath and Levine) and genome-wide average DNA methylation were calculated using HumanMethylation450 BeadChip assay data. The DNAm age measures were positively associated with body mass index (BMI), smoking, alcohol drinking and age at menarche (all nominal P < 0.05). Genome-wide average DNA methylation was negatively associated with smoking and number of live births, and positively associated with age at first live birth (all nominal P < 0.05). The association of DNAm age with BMI was also evident in within-twin-pair analyses that control for familial factors. This study suggests that some lifestyle and hormonal risk factors are associated with these DNA methylation-based breast cancer risk factors, and that the observed associations are unlikely to be due to familial confounding but are likely causal. DNA methylation-based risk factors could interplay with conventional risk factors in modifying breast cancer risk.

DNA methylation has been proposed to play an important role in the aetiology of complex traits and diseases, including cancer 1,2 . DNA methylation, especially when detected in blood, has been investigated for its association with breast cancer risk. Peripheral blood DNA methylation at the known breast cancer susceptibility genes BRCA1 and ATM has been found to be associated with elevated breast cancer risk 3-5 . Several genome-wide studies of DNA methylation have reported peripheral blood DNA methylation changes associated with the risk of breast cancer 6-10 .
Two DNA methylation-based measures have been reported to be associated with breast cancer risk. One measure is DNA methylation-based biological age (DNAm age). Several epigenetic clocks have been developed to estimate DNAm age 11 , and the Hannum 12 and Horvath 13 clocks have received the most attention. Both clocks were developed by regressing chronological age on methylation levels at individual CpG sites using penalized (elastic net) regression, which selects a set of sites that predicts chronological age. The more recent Levine clock 14 was developed by selecting a set of sites to predict phenotypic age, which considers nine health-related factors in addition to chronological age.

Results
Characteristics of the Australian Mammographic Density Twins and Sisters Study (AMDTSS) sample. Table 1 shows the distribution of the investigated risk factors in the AMDTSS sample. Our sample is comparable with samples from previous studies 8,9,14,15 that reported on the DNA methylation-based breast cancer risk factors considered in this study.
The mean (SD) DNAm age based on the Hannum, Horvath and Levine clocks was 57.3 (6.4), 55.5 (6.5) and 53.0 (7.4) years, respectively. By construction, the mean epigenetic age acceleration was zero years; the SDs were 4.2, 4.7 and 5.6 years, respectively. The mean (SD) genome-wide average DNA methylation was 53.0% (0.3%). Figures 1 and 2 show the correlations between chronological age and the DNA methylation-based risk factors. The three DNAm age measures were correlated with each other, and each was correlated with chronological age (all r > 0.61; all P < 10^-15). The three epigenetic age acceleration measures were correlated with each other (all r > 0.62; all P < 10^-15) and, by construction, independent of chronological age. Genome-wide average DNA methylation was not correlated with chronological age, DNAm age or epigenetic age acceleration (all P > 0.19). No obvious heterogeneity in the correlations was observed across monozygotic (MZ) twins, dizygotic (DZ) twins and sisters.

Associations between DNA methylation-based and conventional risk factors.
Table 2 shows that the epigenetic age acceleration measures were associated with different conventional risk factors: the Hannum clock was negatively associated with age at first live birth; the Horvath clock was positively associated with BMI and age at menarche; and the Levine clock was positively associated with BMI, pack-years of smoking and alcohol drinking (all nominal P < 0.05). BMI was the only risk factor associated with two epigenetic age acceleration measures. Genome-wide average DNA methylation was negatively associated with the number of live births and positively associated with age at first live birth (both nominal P < 0.05). No associations were found for the other risk factors.
Similar results were found in the sensitivity analyses. For BMI analysed as a categorical variable, positive associations were found for the Horvath clock: regression coefficient (β) = 1.10, 95% confidence interval (CI) = 0.03 to 2.17, P = 0.04 for overweight women, and β = 1.12, 95% CI = 0.08 to 2.17, P = 0.04 for obese women. For the Levine clock, the association for obese women was marginally significant: β = 1.29, 95% CI = −0.01 to 2.58, P = 0.05. No association was found for the Hannum clock or genome-wide average DNA methylation. The analyses including the whole sample for pack-years of smoking, number of live births, duration of HRT use and duration of oral contraceptive use gave similar results (data not shown).
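The reported β, 95% CI and P-values can be checked for internal consistency. The following is a minimal sketch, assuming each interval is a symmetric Wald interval (β ± 1.96 × SE); it back-calculates the standard error and the two-sided P-value. The function name `wald_check` is hypothetical and is not part of the study's analysis code.

```python
from statistics import NormalDist

def wald_check(beta, lower, upper, level=0.95):
    """Back-calculate SE, z and two-sided P from a symmetric Wald confidence interval.

    Assumption: the interval was computed as beta +/- z_crit * SE (normal approximation).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_crit)             # half-width divided by z_crit
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided P-value
    return se, z, p

# Estimates for categorized BMI reported in the sensitivity analyses.
reported = [
    ("Horvath, overweight", 1.10, 0.03, 2.17),
    ("Horvath, obese", 1.12, 0.08, 2.17),
    ("Levine, obese", 1.29, -0.01, 2.58),
]
for label, beta, lower, upper in reported:
    se, z, p = wald_check(beta, lower, upper)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

The back-calculated P-values round to 0.04, 0.04 and 0.05, in line with the values reported above.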
The associations found above were examined in the Melbourne Collaborative Cohort Study (MCCS) sample (Table 3). At the level of nominal significance (P < 0.05), the association between BMI and the Levine clock was replicated in both MCCS women and men, and the association between pack-years of smoking and the Levine clock was replicated in MCCS men.
None of the associations for the Hannum clock was statistically significant in any of the meta-analyses. The remaining associations were nominally significant in at least one of the meta-analyses, except that the association between number of live births and genome-wide average DNA methylation was only marginally significant (P = 0.07). Smoking status was associated with the greatest number of the DNA methylation-based risk factors investigated: the Horvath clock, the Levine clock and genome-wide average DNA methylation.
No evidence of heterogeneity by sex was found for any association except that between smoking status and the Horvath clock (P = 0.04). This association was also significant in the meta-analysis restricted to women.
Within-twin-pair associations. The associations identified above were examined using within-twin-pair analyses (Table 4). Across all twin pairs, there was evidence of a within-twin-pair association between the Horvath clock and BMI. There was no evidence that the within-twin-pair associations differed between MZ and DZ pairs (all P > 0.05).

Discussion
Using a sample of unaffected women comparable with those in previous studies 3,7,11,13 that reported DNAm age and genome-wide average DNA methylation to be predictive of breast cancer risk, we investigated the associations between these DNA methylation-based risk factors and multiple conventional breast cancer risk factors. We further conducted replication analyses and meta-analyses using independent and homogeneous samples, as well as within-twin-pair analyses that control for familial confounding. We found that DNAm age was associated with lifestyle factors (BMI, smoking and alcohol drinking) and a hormonal factor (age at menarche), while genome-wide average DNA methylation was associated with hormonal factors (number of live births and age at first live birth) and smoking. We also found evidence that the association of DNAm age with BMI remained after controlling for familial factors shared by twins.
Following our previous finding that the variance in genome-wide average DNA methylation is determined to a large extent by as yet unmeasured environmental factors across the human lifespan, including non-genetic factors shared by relatives 17 , here we found that smoking, number of live births and age at first live birth were associated with this summary measure of DNA methylation across the genome. To the best of our knowledge, our study is the first to report that genome-wide average DNA methylation is associated with these three conventional risk factors.
Similar to previous studies 14,27-30 , we found that DNAm age was positively associated with lifestyle risk factors, including BMI, smoking and alcohol drinking. This suggests that some lifestyle factors may accelerate biological ageing, consistent with both DNAm age and these lifestyle factors being associated with an increased risk of breast cancer. We found that the Levine clock was associated with more lifestyle factors than the other two clocks; similar findings have been reported previously (see Table 1 of Horvath et al. 31 ). Interestingly, the Levine clock has been found to be the most strongly associated with breast cancer risk 16 . These differences might reflect how the clocks were developed: the Horvath and Hannum clocks were trained on chronological age 12,13 , whereas the Levine clock was trained on 'phenotypic age', which considers nine other health-related factors in addition to chronological age and so is, in principle, more relevant to health 14 .
We found that DNAm age was associated with hormonal risk factors, as previously reported 26,32 . Binder and colleagues found a negative association between the Horvath clock and age at menarche 32 , which is opposite in direction to our observed association. One difference between the two studies is that Binder et al. studied adolescent girls, whereas we studied older women. The point estimates for the associations of age at menarche with the other two clocks were also positive, providing further support for a positive association between DNAm age and age at menarche in older women.
Results from the within-twin-pair analyses suggest that the observed association of the Horvath clock with BMI is unlikely to be due to confounding by familial factors shared by twins, because such confounding effects cancel out in a regression analysis of within-twin-pair differences. In this sense, this association is more likely to be causal. We did not find evidence of genetic confounding, given that the within-twin-pair association was similar for MZ and DZ twin pairs. The null results for the other within-twin-pair associations do not necessarily imply that the associations observed in Tables 2 and 3 are due to familial confounding. The confidence intervals of the within-twin-pair associations in Table 4 contained the point estimates of the corresponding associations in Tables 2 and 3, suggesting that the associations are similar and that the lack of statistical significance in the within-twin-pair analyses is possibly a result of insufficient sample size.
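The cancellation argument can be illustrated with simulated data. The following is a minimal sketch, assuming a hypothetical familial factor that raises both BMI and age acceleration equally in both members of a pair, and a hypothetical true effect of 0.3 years of age acceleration per unit of BMI. The effect sizes, the helper `ols_slope` and the simple difference model are illustrative assumptions, not the data or the exact within-twin-pair model behind Table 4.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_pairs = 5000
true_slope = 0.3  # hypothetical causal effect of BMI on age acceleration

# Hypothetical familial factor shared by both twins in a pair; it raises
# BMI and age acceleration alike and therefore confounds the pooled analysis.
family = rng.normal(size=n_pairs)
bmi = 25 + 2.0 * family[:, None] + rng.normal(size=(n_pairs, 2))
accel = true_slope * bmi + 1.5 * family[:, None] + rng.normal(size=(n_pairs, 2))

def ols_slope(x, y):
    """Slope from a least-squares fit of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Pooled analysis of individuals: biased by the shared familial factor.
pooled = ols_slope(bmi.ravel(), accel.ravel())

# Within-pair analysis: differencing the twins removes anything shared by the pair.
d_bmi = bmi[:, 0] - bmi[:, 1]
d_accel = accel[:, 0] - accel[:, 1]
within = ols_slope(d_bmi, d_accel)

print(f"pooled slope: {pooled:.2f}, within-pair slope: {within:.2f}")
```

With these settings the pooled slope is, in expectation, 0.9 (the true effect plus confounding), whereas the within-pair slope is close to 0.3, because the familial term is identical for both twins and drops out of the differences.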
Our findings suggest that DNA methylation-based risk factors may interplay with their associated conventional risk factors in modifying breast cancer risk. Such interplay may be due to the DNA methylation-based risk factors mediating the associations between conventional risk factors and cancer risk, or to other mechanisms. Studies with appropriate data, design and analytic methods are required to further investigate the interrelationships between DNA methylation-based risk factors, conventional risk factors and breast cancer risk. Findings from such studies could provide novel insights into the aetiology, early detection and prevention of breast cancer.
The strengths of our study include: (1) using samples comparable with those used in the studies that reported the DNA methylation-based risk factors, (2) investigating a comprehensive list of conventional breast cancer risk factors, notably including family history and mammographic density, and (3) using one sample for discovery and homogeneous samples for replication, together with meta-analyses that take advantage of the increased sample size. One potential limitation is that no adjustment for multiple testing was performed, so our results should be interpreted with this in mind.
In conclusion, our study found evidence that the lifestyle risk factors BMI, smoking and alcohol drinking, and the hormonal breast cancer risk factors age at menarche, age at first live birth and number of live births, are associated with DNA methylation-based biological age and genome-wide average DNA methylation. We also found evidence that the observed associations are unlikely to be due to familial confounding but are likely causal. DNA methylation-based risk factors could interplay with conventional risk factors to modify breast cancer risk. Such interplay requires further investigation.

Methods
Study sample for discovery. The sample comprised participants from the Australian Mammographic Density Twins and Sisters Study (AMDTSS) 33 .
DNAm age and genome-wide average DNA methylation. Methylation of DNA extracted from dried peripheral blood spots was measured using the HM450 assay. Data were normalized using Illumina's reference factor-based normalization methods and subset-quantile within array normalization 35 for type I and II probe bias correction, and an empirical Bayes batch-effect removal method, ComBat 36 , was applied to minimise technical variation across physical batches; see Li et al. 34 for more details.
DNAm age was calculated based on each of the Hannum, Horvath and Levine clocks using the online calculator (https://dnamage.genetics.ucla.edu). As done previously 16 , we investigated epigenetic age acceleration, calculated as the residuals from regressing DNAm age on chronological age and blood cell composition; epigenetic age acceleration was therefore independent of chronological age and blood cell composition. Blood cell composition (CD8+ T-cells, CD4+ T-cells, natural killer cells, B-cells, monocytes and granulocytes) was estimated using the Houseman method 37 implemented in the minfi package, with Reinius et al. 38 as the reference data set. Genome-wide average DNA methylation was calculated as the average methylation beta-value across all autosomal probes that passed quality control.
Conventional breast cancer risk factors. We studied multiple conventional breast cancer risk factors, including lifestyle factors, hormonal factors, family history and mammographic density (Table 1). All risk factors except mammographic density were collected via a telephone-administered questionnaire. Lifestyle factors included body mass index (BMI), smoking status, smoking intensity measured in pack-years, and alcohol drinking. Hormonal factors included age at menarche, parity, number of live births, age at first live birth, oral contraceptive use, duration of oral contraceptive use in years, menopausal status, age at menopause, hormone replacement therapy (HRT) use and duration of HRT use in years. Family history was defined as having at least one first-degree relative diagnosed with breast cancer. Mammographic density was measured using the computer-assisted thresholding technique CUMULUS (Imaging Research Program, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Canada) at the conventional brightness threshold. We studied three measures: dense area, non-dense area and percentage dense area. Details of the measurement can be found in Odefrey et al. 33 . Of the 479 women, 435 had mammographic density data available.
Statistical analysis. Correlations between DNA methylation-based risk factors and chronological age were assessed using Pearson's correlation coefficient. We investigated the association of each DNA methylation-based risk factor (the three epigenetic age acceleration measures and genome-wide average DNA methylation) with each conventional risk factor separately using a linear regression model, in which the DNA methylation-based risk factor was the dependent variable and the conventional risk factor was the independent variable. To account for relatedness between family members, the model was fitted using generalised estimating equations (GEE) with family as the cluster. The model was adjusted for age, BMI, smoking status and blood cell composition, except that blood cell composition was dropped from the models for epigenetic age acceleration, which is independent of cell composition by construction. A P-value of 0.05 was used to define nominal statistical significance. All statistical tests were two-sided.
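The derivation of epigenetic age acceleration and the family-clustered regression described above can be sketched in NumPy. This is a sketch under stated assumptions: the working correlation of the GEE is not specified here, so an independence working correlation is assumed, under which the GEE point estimates coincide with ordinary least squares and the standard errors are the cluster-robust (sandwich) estimator with family as the cluster. The function names `age_acceleration` and `gee_independence`, the dropped cell type and all data are hypothetical; in practice a dedicated GEE implementation (for example, the GEE class in statsmodels) would normally be used.

```python
import numpy as np

def age_acceleration(dnam_age, age, cells):
    """Epigenetic age acceleration: residuals of DNAm age regressed on age and cell composition."""
    X = np.column_stack([np.ones_like(age), age, cells])
    coef, *_ = np.linalg.lstsq(X, dnam_age, rcond=None)
    return dnam_age - X @ coef  # mean zero and uncorrelated with age by construction

def gee_independence(y, covariates, cluster):
    """Linear GEE with an independence working correlation (assumed) and family clusters.

    Point estimates equal ordinary least squares; standard errors use the
    cluster-robust sandwich estimator, allowing correlation within families.
    """
    X = np.column_stack([np.ones(len(y)), covariates])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for family_id in np.unique(cluster):
        rows = cluster == family_id
        score = X[rows].T @ resid[rows]  # summed score contribution of one family
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated, hypothetical data: 200 families of two relatives each.
rng = np.random.default_rng(1)
family = np.repeat(np.arange(200), 2)
n = family.size
age = rng.uniform(40, 70, n)
# Six cell-type proportions sum to one, so one is dropped to avoid collinearity
# with the intercept (an assumption of this sketch).
cells = rng.dirichlet(np.ones(6), n)[:, :5]
dnam_age = 5 + 0.9 * age + cells @ rng.normal(size=5) + rng.normal(0, 4, n)
accel = age_acceleration(dnam_age, age, cells)

bmi = 26 + rng.normal(0, 4, n)
# Age acceleration regressed on BMI, adjusted for age; cell composition is
# omitted because age acceleration is already residualized on it.
beta, se = gee_independence(accel, np.column_stack([bmi, age]), family)
print("BMI coefficient:", beta[1], "robust SE:", se[1])
```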
Mammographic dense, non-dense and percentage dense areas were power-transformed with powers of 0.25, 0.25 and 0.4, respectively, and the other right-skewed continuous risk factors (BMI, pack-years of smoking, number of live births, and durations of HRT and oral contraceptive use) were log-transformed to give approximately normal distributions. The analyses for pack-years of smoking, age at menopause, number of live births, age at first live birth, duration of HRT use and duration of oral contraceptive use were restricted to participants who were ever smokers, post-menopausal, parous, ever HRT users and ever oral contraceptive users, as appropriate.
Sensitivity analysis. BMI was analysed as a categorical variable: normal (BMI < 25; 217 women; reference group), overweight (25 ≤ BMI < 30; 146 women) and obese (BMI ≥ 30; 116 women). The analyses for pack-years of smoking, number of live births, duration of HRT use and duration of oral contraceptive use were repeated including non-smokers, nulliparous women, HRT non-users and oral contraceptive non-users, for whom the corresponding variable was set to zero; one was added before log transformation.
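The transformations and BMI categories above can be written out as follows. This is a minimal sketch: the helper names are hypothetical, and adding one to every value (log1p) is one reading of how zeros were handled in the sensitivity analysis.

```python
import numpy as np

BMI_LABELS = np.array(["normal", "overweight", "obese"])

def categorize_bmi(bmi):
    """Normal (< 25, reference), overweight (25 to < 30) and obese (>= 30)."""
    # np.digitize with bins [25, 30] returns 0, 1 or 2 using left-closed intervals.
    return BMI_LABELS[np.digitize(np.asarray(bmi, dtype=float), [25, 30])]

def power_transform(x, power):
    """Power transform, e.g. 0.25 for dense and non-dense area, 0.4 for percentage dense area."""
    return np.asarray(x, dtype=float) ** power

def log_plus_one(x):
    """log(x + 1), so that non-users coded as 0 map to 0 (assumed sensitivity-analysis coding)."""
    return np.log1p(np.asarray(x, dtype=float))

print(categorize_bmi([22.0, 27.5, 31.2]))
print(np.log([1.0, 10.0, 40.0]))        # main analysis: ever-users only, all values > 0
print(log_plus_one([0.0, 10.0, 40.0]))  # sensitivity analysis: zeros retained
```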

Replication and meta-analysis.
For associations with a nominal P < 0.05, we performed replication analyses using data from participants (1,350 women and 2,004 men) in the Melbourne Collaborative Cohort Study (MCCS) 39 . The participants were controls from case-control studies of several cancers nested within the MCCS. As in the AMDTSS, DNA was extracted from peripheral blood, and DNA methylation was measured using the HM450 assay and normalized using Illumina's reference factor-based normalization methods and subset-quantile within array normalization; see Dugué et al. 23,27 for more details.
The associations were investigated using the same linear regression model. In addition to the covariates used in the AMDTSS analysis, country of birth (Australia/New Zealand, United Kingdom/Malta, Italy or Greece) and sample type (dried blood spot, peripheral blood mononuclear cells or buffy coat) were adjusted for as fixed effects. Batch effects were minimized by fitting study and the plate on which each sample was processed as random effects. We also investigated the associations between smoking status and the DNA methylation-based risk factors. The analysis was stratified by sex. Heterogeneity in the associations by sex was assessed by fitting an interaction term between the risk factor and sex in the regression model for the whole sample.
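The heterogeneity test rests on the interaction term. In the following minimal sketch with simulated, hypothetical data and no other covariates, the coefficient of the risk factor-by-sex term equals exactly the difference between the slopes from sex-stratified models; in the MCCS model, which shares covariate and random effects across sexes, this equivalence is only approximate. The helper `ols` and the 0/1 coding of sex are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4000
sex = rng.integers(0, 2, n)  # hypothetical coding: 0 = women, 1 = men
x = rng.normal(size=n)       # a standardized conventional risk factor
y = 0.5 * x + 0.2 * x * sex + 0.3 * sex + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Whole-sample model with a risk factor-by-sex interaction term.
b0, b_x, b_sex, b_int = ols(y, x, sex, x * sex)

# Sex-stratified models.
slope_women = ols(y[sex == 0], x[sex == 0])[1]
slope_men = ols(y[sex == 1], x[sex == 1])[1]

print(f"interaction: {b_int:.3f}; slope difference: {slope_men - slope_women:.3f}")
```

Testing whether the interaction coefficient differs from zero is therefore a test of whether the association differs between women and men.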
Taking advantage of the homogeneity between the two samples 40 , results from the two studies were pooled via a fixed-effect meta-analysis, using the generic inverse variance method. Two meta-analyses were performed: (1) AMDTSS sample and MCCS women, and (2) AMDTSS sample, MCCS women and MCCS men.
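The generic inverse-variance method weights each study-specific estimate by the reciprocal of its squared standard error, so the pooled estimate is Σ w_i β_i / Σ w_i with w_i = 1/SE_i², and its standard error is √(1/Σ w_i). The following is a minimal sketch, assuming approximately normal estimates; the function `fixed_effect_meta` is hypothetical and the estimates are made up, not the Table 3 results.

```python
import math
from statistics import NormalDist

def fixed_effect_meta(betas, ses, level=0.95):
    """Generic inverse-variance fixed-effect meta-analysis of study-specific estimates."""
    weights = [1 / se ** 2 for se in ses]  # weight = 1 / variance
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (pooled - z_crit * pooled_se, pooled + z_crit * pooled_se)
    p = 2 * (1 - NormalDist().cdf(abs(pooled / pooled_se)))  # two-sided P-value
    return pooled, pooled_se, ci, p

# Hypothetical estimates (not the Table 3 values): AMDTSS, MCCS women, MCCS men.
betas = [0.12, 0.07, 0.05]
ses = [0.05, 0.03, 0.025]
for label, k in [("AMDTSS + MCCS women", 2), ("AMDTSS + MCCS women + MCCS men", 3)]:
    pooled, pooled_se, ci, p = fixed_effect_meta(betas[:k], ses[:k])
    print(f"{label}: beta = {pooled:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}, P = {p:.2g}")
```

The two loop iterations mirror the two meta-analyses described above, with and without the MCCS men.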
Within-twin-pair analysis. Taking advantage of our sample including twin pairs, we performed within-twin-pair analyses to investigate the associations after controlling for the confounding effects of familial