Exploratory analysis of the human breast DNA methylation profile upon soymilk exposure

Upon soy consumption, isoflavone metabolites attain bioactive concentrations in breast tissue possibly affecting health. Though in vitro epigenetic activity of soy metabolites has been described, the in vivo impact on the epigenome is largely unknown. Therefore, in this case-control study, the breast glandular tissue DNA methylome was explored in women undergoing an aesthetic breast reduction. After a run-in phase, 10 generally healthy Belgian or Dutch women received soymilk for 5 days. MethylCap-seq methylation profiles were compared with those of 10 matched controls. Isoflavones and their microbial metabolites were quantified in urine, serum, and glandular breast tissue (liquid chromatography-mass spectrometry) and 17β-estradiol in glandular breast tissue (immunoassay). Global DNA methylation levels were obtained for 6 cases and 5 controls using liquid chromatography-mass spectrometry. Although lower MethylCap-seq coverages were observed, mass spectrometry results and computational LINE-1 methylation analysis did not provide evidence supporting global methylation alterations upon treatment. At a false discovery rate of 0.05, no differentially methylated loci were identified. Moreover, a set of previously identified loci was specifically tested, but earlier reported results could not be validated. In conclusion, after a 5-day soymilk treatment, no major general epigenetic reprogramming in breast tissue could be found in this exploratory study.


ID
) and were therefore not listed. Units for each variable are: years, kg/m², nmol/L, and pmol/g for respectively the age, BMI, the S concentrations and the BG concentrations. *Ages were calculated based on provided birth year and date of surgery. Age for one participant (SB22), indicated by $, was manually modified to 18, given that a rounding error led to 17 years of age, though no participants were younger than 18 years at the date of procedure. **Smokers.  10 is the only report on the effect of dietary isoflavones on the DNA methylation degrees of a limited subset (5) of cancer-related genes in human mammary ductoscopy samples. Though no treatment-related changes were observed in this study, results suggested an association between higher post-treatment genistein levels and RARβ2 (RARB) and CCND2 hypermethylation, which might increase breast cancer risk 10 . As epigenetic alterations in breast cancer have been related to histological and outcome data 11 , additional research is required to evaluate the in vivo epigenetic impact of isoflavone exposure in breast tissue, preferably in a genome-wide manner. Therefore, here, we explored the breast tissue epigenome of healthy women with and without soymilk supplementation, both globally (by liquid chromatography -mass spectrometry, LC-MS) and locally (by MethylCap-seq).

Study population.
A total of 30 healthy women undergoing an aesthetic breast reduction, all complying with the study protocol, participated in this study. However, for the methylomics analysis, 10 controls were selected based on their age and menstrual cycle or menopausal status, to match the overall characteristics of subjects in the soymilk group. As a result, the sample size was decreased to 20. The age and BMI, based on self-reported weight and height measurements, ranged from 17 to 60 y and from 18.7 to 33.0 kg/m², respectively (Table 1). Five women (25%; soymilk: 2; control: 3) were in the follicular phase of their menstrual cycle and 3 (15%; soymilk: 2; control: 1) in the luteal phase, whereas 12 (60%; soymilk: 6; control: 6) were (peri)menopausal. Two women were taking oral contraceptives (10%; soymilk: 1; control: 1). All participants reported average fat and fiber intakes, and 4 women (20%; soymilk: 2; control: 2) were smoking on a daily basis. With regard to past and present isoflavone intakes, the study population consisted of mostly non-consumers (80%), with only 2 subjects consuming soy-derived products on a daily-to-weekly basis (soymilk: 1; control: 1).
Exposure to isoflavones and 17β-estradiol. Exposure to genistein, daidzein, and its microbial metabolites upon soy supplementation, was assessed as the sum of unconjugated aglycones and deconjugated (sulfo) glucuronides and sulfates, measured in hydrolyzed urine, serum, and glandular breast tissue. None of the urine samples collected at the end of the run-in phase after the intervention phase for the control group contained detectable amounts of isoflavones, whereas the estimated daily urinary isoflavone excretion confirmed compliance to the soymilk diet in the treatment group (genistein: 6.6-13,140.0 µmol/day; daidzein: 2.4-2,025.0 µmol/day; individual data not shown). Genistein and total daidzein (i.e., sum of daidzein, dihydrodaidzein, O-desmethylangolensin, and equol) serum concentrations ranged from 246.4 to 2,831 nmol/L and from 105.1 to 1,314 nmol/L upon treatment, respectively. In breast glandular tissue, exposure levels of 114.0-493.8 pmol/g genistein, of 22.2-666.0 pmol/g total daidzein, and of 0.021-1.489 pmol/g 17β-estradiol were measured. 17β-estradiol was not significantly different between soymilk and control groups (Wilcoxon rank-sum test, P = 0.74). Individual hormone and metabolite levels can be found in Table 1.

DNA methylation profiles.
Sequencing statistics and data exploration. Overall coverages, amount and fraction of mapped fragments, and library sizes upon data summary are shown in Table 2. Hierarchical cluster analysis of soy supplementation and control samples revealed that the technical replicates cluster together (Fig. 1), yet that there is no major distinction between treated and control subjects. Though the right cluster appears to be featured by more soy-treated samples and somewhat lower coverages, no sample characteristic (age, BMI, total coverage, library size, treatment) was found to clearly explain these two clusters (data not shown).
Global methylation degrees. Comparison of general sequencing characteristics between the soymilk and control groups revealed significant differences in coverage (P = 0.009), number of mapped fragments (P = 0.012) and library size upon data summary (P = 0.003), despite sample randomization before gDNA extraction and MethylCap-seq. The lower coverages for the soymilk samples may suggest global hypomethylation upon soy exposure. However, as reflected by the variation in sequencing characteristics for the technical replicates (Table 2), technical sequencing variation is an important, yet difficult to assess, confounder. Therefore, we attempted to find additional evidence supporting overall differences in DNA methylation between the treatment groups. These analyses were based on (i) independent measurement of methylation degree (by LC-MS) on a subset of the population under study, (ii) an evaluation of the association between the MethylCap sequencing characteristics and measured soy metabolite concentrations in the tissue for the soy-exposed group only, (iii) evaluation of LINE-1 methylation (proxy for global methylation 12 ) shifts associated with soy exposure in the MethylCap-seq data (full population). First, for a subset of 6 cases and 5 controls (i.e. 11 subjects of the total of 20), sufficient gDNA was available to measure global DNA methylation by LC-MS analysis. No significant difference was observed between cases (2.45 ± 0.55%) and controls (2.34 ± 0.46%) (mean ± standard deviation) (P = 0.72).
Next, if sequencing characteristics differ due to soy supplementation, one would expect that these parameters are also correlated with the in vivo concentrations of soy metabolites (genistein, total daidzein and total isoflavones) in soy treated individuals. However, Spearman correlation revealed no borderline or significant associations between MethylCap-seq library size and respectively genistein (ρ = 0.47, P = 0.18), total daidzein (ρ = 0.03, P = 0.95), and total isoflavone (ρ = 0.37, P = 0.30) tissue concentrations in those subjects exposed to soy. Additionally, the correlations are all positive, opposite of what would be anticipated if soy supplementation would lead to global hypomethylation (i.e. lower library sizes with higher isoflavone concentrations). It should be noted that the small sample size (n = 10, i.e. only cases) implies a low power.
Finally, LINE-1 repeat methylation was considered, often used as a proxy for "global methylation" levels 12 . On average, approximately 1% of all reads mapped to L1Hs, with no significant differences in LINE-1 methylation between both groups (P = 0.49). It should be noted that adjustment for total coverage might have adjusted for the total degree of DNA methylation as well. However, it seems unlikely that global hypomethylation equally affects all loci in the genome, implying that some (relative) methylation shifts should be observed, cfr. Akalin et al. 13 . Moreover, other relative comparisons, e.g. average genic-intergenic methylation degree ratios, promoter-genic methylation degree ratios, revealed no significant differences between treatment groups (data not shown).
Differential DNA methylation. Upon performing the extra filtering steps for minimum coverage, 758,275 variables were withheld for further analysis, leading to the final dataset (see Materials and Methods section). Given results from the global DNA methylation analysis, TMM was applied for normalization, assuming that most loci are not differentially methylated. At a false discovery rate (FDR) of 0.05, no significant loci were found using limma-voom (minimum FDR: 78,9%). The P-value distribution suggests that a minor treatment effect may be present in the data (minor enrichment at P = 0, Fig. 2), yet this may also be due to (slightly) imperfect normalization of pre-existing library size differences.
Previously, RARB and CCND2 promoter hypermethylation has been associated with in vivo serum genistein concentrations 10 . Moreover, as summarized by Pudenz et al. 14  and BRCA1 (exon 1) and BRCA2 (exon 2) exon methylation in breast (cancer) cells 15 . Therefore, we also evaluated differential methylation in these specific regions, except for the BRCA1 and BRCA2 exonic regions where coverages were too low to pass the filtering step (see Materials and Methods section). As less putatively methylated regions (20) were present in these loci compared to the full genome-wide approach, adjustment for multiple testing is less conservative. However, still no significant results were found using this approach (minimal FDR: 36.1%; Table 3). Furthermore, similar to the analyses by Qin et al. 10 , Spearman correlation between log-cpm values and resp. genistein, total daidizein and total isoflavone concentrations was assessed for these 20 regions in the treatment group (Table 4), again without significant results (likewise for Pearson correlations). For RARB, the exact genomic region tested by Qin et al. 10 was unclear, yet evaluation of other regions in this gene did not yield significant results upon FDR adjustment (data not shown). Finally, for completeness, an additional analysis was performed for all regions in gene promoters (49,378) given the well-described functional relevance of promoter methylation, but also this analysis did not yield significant results (data not shown).
Pathway analysis. Using a P-value cut-off of 0.0005 (see Materials and Methods), Entrez Identifiers of 177 selected regions were considered for both GO enrichment and KEGG pathway analysis. Using this larger set, top GO enrichment results included vacuoles/lysosomes and voltage gated calcium channel gene sets. KEGG pathway analysis includes involvement of signaling pathways such as mTOR, Oxytocine, MAPK, AGE-RAGE among the top results. However, none of these GO terms or KEGG pathways remained significant enriched upon FDR adjustment (Top 10 results in Tables 5 and 6).

Discussion
Here, we report the first genome-wide in vivo study on the impact of soymilk consumption on the glandular mammary epigenome. The aim of this study, exploratory in nature, was to detect consistent methylation differences that are specific to exposure to isoflavones by soy intake. Previously reported in vitro experiments 1 suggest hypomethylation upon soy exposure, which might confer altered breast cancer risk, as reviewed by 16 . Though this study has several limitations (see below), it did not reveal data supporting the latter to be physiologically relevant. Library size estimates appeared to be somewhat lower in the soymilk group, but these estimates are easily affected by experimental/technical variation and are, therefore, no reliable proxies for global DNA methylation. Moreover, LC-MS analysis of global DNA methylation, association with exposure measures, and MethylCap-seq based Line-1 methylation assessment indicated no differences.
Furthermore, locus-specific analysis revealed no significantly differentially methylated loci. Moreover, no significant results for RARβ2 and CCND2 could be observed, though Qin et al. linked hypermethylation of these loci with genistein serum concentrations 10 . The latter study reported dose-specific effects of a longer supplementation phase (one menstrual cycle vs. 5 days) in a different study population, treated with different sources of isoflavones (soy supplements vs. soymilk). Also other previously reported in vitro results 14 could not be reproduced, which is not so surprising as in vitro findings are often difficult to validate in vivo. Indeed, this is the first genome-wide in  vivo study effectively assessing the impact of soymilk on glandular breast tissue, which is the most relevant target in the context of breast health. On the other hand, the affinity of MethylCap-seq depends on the sequence context of the methylated regions, implying that overall coverage (and therefore power) for some loci might have been too limited to validate these known findings 17 . Nevertheless, previous studies by our group using the same methodology (and often for lower sample sizes) did typically lead to significant results, which were often successfully independently validated (e.g. Tomar et al. 18 , Van Vlodrop et al. 19 ), implying that the applied methodology is a very unlikely reason for the general lack of significant findings here. Yet, as the sample size was relatively small in this exploratory analysis, better powered DNA methylation experiments are required to observe potential small effects. Indeed, though this could also be attributed to imperfect normalization of observed library size differences, the P-value distribution of the results suggests that such minor effects may be present. Moreover, we cannot exclude potential impact on breast cancer risk, be it positive or negative. For example, some of the top loci in the general differential methylation analysis, TIAM1, DUSP22 and ID  JAK2 are described to be related to proliferation, survival and invasiveness of breast cancer [20][21][22][23] . Also, top KEGG pathway analysis results exhibit some pathways that have a known association to breast and other cancers such as mTOR, Oxytocine, MAPK, AGE-RAGE signaling pathways [24][25][26][27] . Yet, since both the differential methylation and pathway analyses yielded no significant results upon FDR correction, it is clear that these findings at most point towards the necessity of larger studies to identify possible minor or subpopulation specific effects and do not support major epigenetic modulation of breast cancer risk (positive or negative) as such. Next to increasing power by larger sample numbers and a possible impact of duration of soy consumption, also the (variance of the) age of the participants should be considered in larger studies. In the study at hand, both control and treatment populations were very heterogeneous (Table 1), as we aimed to identify general (i.e. age-independent) methylation differences in the adult female population due to soy consumption. However, whereas our analyses indicate no large, age-independent differences in breast methylation, recent literature links the possible positive or negative effect of soy consumption on breast cancer risk in adults to the period or age when soy consumption started 4,28 . Early-life exposure to soy may alter estrogen mediated processes and therefore, alter the effect of genistein, daidzein and other isoflavones, which are estrogen antagonists. Whether this effect is sustained by DNA methylation remains largely unknown 28 . Note that also other sources of heterogeneity (e.g. smoking, contraceptive use, menstrual cycle) present in our study may obscure effects only present in specific subgroups.
In conclusion, in this exploratory analysis, we observed no impact of soymilk consumption on the human mammary gland epigenome. Furthermore, our study could not confirm previously described results of either in vivo or in vitro studies. Therefore, overall, our exploratory results do not support major general impact of short duration soymilk consumption on breast health through DNA methylation. Yet, we suggest that larger scale research with prolonged exposure and on different time points through female development, menstrual cycles and considering age, is essential for a full understanding of the impact of soy metabolites on (epigenetically regulated) breast health.

Materials and Methods
Subjects and treatments. A total of 30 generally healthy Belgian or Dutch women, scheduled for an aesthetic breast reduction, were recruited for this study; 20 of them were included in this epigenomics study. The exclusion criteria were breast cancer, antibiotic treatment within the previous month, and soy allergy. Ethical approval was granted by the Ethics Committee of the Ghent University Hospital (EC UZG 2005/022). The volunteers were fully informed of the aims of the study and gave their written consent. All experiments were performed in accordance with relevant local and national guidelines and regulations.
One batch of commercially available soymilk derived from whole soybeans in 250 mL cartons (Alpro ® Soya Drink Nature, Alpro NV, Wevelgem, Belgium) was kindly provided by the manufacturer and analyzed in triplicate at study onset and closure as described by Bolca et al. 29 . One portion of soymilk (250 mL) contained 16.98 ± 0.76 mg genistein and 5.40 ± 0.22 mg daidzein aglycone equivalents, and 8.25 g proteins, 7 g carbohydrates, 4.75 g lipids, 1.5 g fibers, 0.375 µg vitamin B12, 0.6 µg vitamin B2, and minerals. Study design. This study was a randomized dietary intervention trial with a run-in phase of at least 4 days and a supplementation phase of 5 days before breast surgery. Following eligibility assessment, volunteers were randomly allocated to the soymilk (n = 10) or control (n = 20, of which 10 included in the epigenomics study) group. All participants were asked to abstain from soy-based products during the whole experimental period. A detailed list of isoflavone-containing foods and dietary supplements was distributed in order to guide the volunteers in this respect. Additionally, subjects were instructed to report every case of doubt or fortuitous consumption and to provide detailed information on that eating occasion, including type and portion size. During the supplementation phase, 250 mL soymilk was consumed daily with breakfast, lunch, and dinner. The control group did not receive any supplementation before surgery. Compliance was evaluated by subject inquiry and urinary isoflavone excretion.
Subjects delivered a spot urine sample after the run-in phase and before anesthesia. During surgery (12-18 h after last soy supplementation), blood and breast biopsies were collected. Serum was obtained by centrifugation  Table 6. Top 10 results for pathway analysis for putatively differentially methylated regions (P < 0.0005). Columns represent KEGG pathway, the number of genes in a pathway, the number of "differentially methylated" genes (loose P < 0.0005 cut-off), the P-value, and False Discovery Rate (FDR).
SciEntiFic REPORTS | (2018) 8:13617 | DOI:10.1038/s41598-018-31767-x (10 min at 600 g, room temperature) after coagulation. Aliquots of both urine and serum samples were stored at −20 °C until analysis. The tissue samples were immediately frozen in liquid nitrogen and stored at −80 °C until analysis. Without thawing the tissue samples, fractions containing almost exclusively glandular tissue were dissected, based on gross inspection. Areas of adipose tissue intimately intermixed with fibroglandular tissue were avoided and connective tissue was removed. Before processing, all samples were randomized and the investigators were blinded to the treatments when working with the samples. In addition, a general questionnaire was used to obtain information on the subjects' history of antibiotic treatments, hormonal therapies, other medication, food supplement intakes, and anthropometric measures. Each participant reported her habitual fat and fiber intakes, and her soy consumption since birth using validated self-administered food-frequency questionnaires 30 . Chemicals. Genistein, daidzein, and equol were purchased from Extrasynthèse (Genay, France), and dihydrodaidzein and O-desmethylangolensin from Plantech UK (Reading, UK). For the hydrolysis of conjugated isoflavones, a 33 g/L-solution of Type H-1 Helix pomatia extract (min. 300 U β-glucuronidase/mg and 15.3 U sulfatase/ mg; Sigma-Aldrich, Bornem, Belgium) in sodium acetate buffer (0.1 mol/L, pH = 5) was prepared. A 400 µmol/L and 40 µmol/L-solution of 4-hydroxybenzophenone in methanol was used as internal standard in the quantitative analyses of urine, serum, and breast tissue, respectively.

MethylCap-seq.
High-quality genomic DNA (gDNA) (20-50 kb; 11.04-127.42 ng/mg) was obtained from 22 mammary gland samples (10 soymilk, 10 matched controls, 2 technical replicates -one for each treatment group) using the PureLink Genomic DNA Mini kit (Invitrogen, Merelbeke, Belgium). Briefly, 50-85 mg of manually dissected glandular tissue was lysed with proteinase K (20 g/L) for 4 h at 55 °C and 300 rpm. The lysate was centrifuged (3 min at 17968 g, room temperature) and treated with RNase A (20 g/L in 50 mM Tris-HCl (pH 8.0), 10 mM EDTA; 2 min at room temperature). The sample was purified through a silica-based membrane and gDNA was eluted twice with 50 µL of water. DNA quantity and fragment length were measured using Qubit dsDNA HS assay (Invitrogen, Merelbeke, Belgium) and 1% agarose gelelectrophoresis, respectively, and all samples were diluted to a final concentration of 500 ng gDNA/75 µL water.
Global DNA methylation percentages. Global DNA methylation levels were assessed as previously reported 39 . As a minimum amount of 1 µg gDNA input was required, analyses were performed on the qualifying subset of 6 treatment and 5 control samples. In summary, gDNA was enzymatically hydrolyzed to deoxyribonucleosides and dissolved in LC-grade water. Similarly, stock solutions were prepared for reference standards of 2′-deoxycytidine (dC) and 5-methyl-2′-deoxycytidine (5-mdC), purchased from Sigma (D3897-1G) and Jena Bioscience (N-1044) respectively, to create a series of calibration standards, used in all of the experiments. Global DNA methylation was assessed by quantifying both 5-mdC and dC using Waters Acquity UPLC coupled to Waters Micromass Quattro Premier Mass Spectrometer (electrospray ionization in positive mode). Global DNA methylation was calculated as a percentage of 5-mdC versus the sum of 5-mdC and dC. Data analysis. For MethylCap-seq, sequence reads were mapped to the human reference genome (GRCh37) with BOWTIE 40 and aligned fragments were summarized using the in-house developed Map of the Human Methylome (http://www.biobix.be/map-of-the-human-methylome/). As previously described in e.g. Van Vlodrop et al. 19 , this map contains genomic regions that are putatively independently methylated, without relying on existing gene annotation. For each of these regions, the maximum number of mapped fragments ("peak height") per sample was used for subsequent analysis (duplicate fragments removed). For annotation, we relied on Ensembl Genes 91. Unless mentioned otherwise, subsequent data analysis was performed using R (3.3.2) 41 .
For the exploratory analysis, the technical replicates were included and log-transformed counts per millions (log-cpm) values were calculated using the "cpm" function present in the R Bioconductor EdgeR package (version 3.16.5) 42 . Subsequently, the 10,000 loci with most variance were selected for hierarchical clustering ("complete" method on Euclidean distance matrix, using the "hclust" function).
For the statistical assessment of differential methylation, those technical replicates with the lowest overall coverage where removed from the analysis. Average log-cpm values were used to compare the genic-intergenic resp. promoter-genic methylation ratios between treatment groups. For the locus specific comparisons, only those loci featured by Gene Symbol annotation and an average coverage of at least 4 (prior to normalization) were used. This SciEntiFic REPORTS | (2018) 8:13617 | DOI:10.1038/s41598-018-31767-x filtering step, aiming to remove uninformative variables from the data set, together with the removal of duplicate entries led to a final set of 759,275 loci out of a total of 3,618,706.
For the analysis of differentially methylated loci, the limma-voom model was used 43 , since limma-voom presented considerably improved quality statistics compared to others (based on MA-plot and P-value distribution, data not shown). As is standard practice for limma-voom, TMM normalization (from EdgeR package) was performed on the filtered counts 42,44 . Known to be often associated with gene expression, particularly promoter methylation is of interest, leading us to focus on the 49,378 promoter regions (2000 bp upstream to 500 bp downstream of canonical transcription start site, based on Ensembl annotation) for the promotor specific differential methylation analysis.
Gene Ontology (GO) enrichment analysis was performed using the "goana" function in the limma package. This package uses Entrez gene identifiers and can simultaneously report on the three main GO categories: biological processes, molecular functions and cellular components. For gene set analysis, a P-value cutoff (P-value < 0.0005) was used rather than a multiple testing adjusted cut-off given the general lack of significant results for the individual genes (see Results). Although individual loci considered are thus less reliable, this is putatively compensated by the higher number of actually differentially methylated loci, leading to increased power for the gene set analysis. Additionally, pathway analysis was investigated using the "kegga" function in the limma package. Cut-off criteria and P-value correction were performed analogously to the GO enrichment analysis.
LINE-1 methylation was evaluated by mapping (BOWTIE, loose parameter settings: L = 5, e = 100) both paired-end reads separately on the Homo Sapiens LINE-1 consensus sequence (L1HS) obtained from Repbase 45 and counting the mapped reads. Duplicate reads were removed first using FastUniq 46 . Paired-ends mapping both to L1HS were counted as a single event. L1HS count data were normalized for coverage differences by division by the total number of sequenced fragments.
Additional data analyses were performed in R (3.3.2) as well: the Student t-test was used to compare sequencing characteristics, LINE-1 methylation, and global DNA methylation between groups, the Wilcoxon rank-sum test for the comparison of 17β-estradiol concentration and both genic-intergenic and promoter-genic methylation ratios between treatment groups, and Spearman correlation to evaluate the correlation between isoflavone concentrations and library sizes resp. locus-specific methylation degrees. Throughout the manuscript, Benjamin-Hochberg P-value adjustment was performed to correct for multiple testing, leading to False Discovery Rate (FDR) estimates.

Data Availability
The datasets generated and analysed during the current study are available in the Gene Expression Omnibus (GEO) repository, GSE112727.