Characterizing steroid hormone receptor chromatin binding landscapes in male and female breast cancer

Male breast cancer (MBC) is rare and poorly characterized. Like the female counterpart, most MBCs are hormonally driven, but relapse after hormonal treatment is also noted. The pan-hormonal action of steroid hormonal receptors, including estrogen receptor alpha (ERα), androgen receptor (AR), progesterone receptor (PR), and glucocorticoid receptor (GR) in this understudied tumor type remains wholly unexamined. This study reveals genomic cross-talk of steroid hormone receptor action and interplay in human tumors, here in the context of MBC, in relation to the female disease and patient outcome. Here we report the characterization of human breast tumors of both genders for cistromic make-up of hormonal regulation in human tumors, revealing genome-wide chromatin binding landscapes of ERα, AR, PR, GR, FOXA1, and GATA3 and enhancer-enriched histone mark H3K4me1. We integrate these data with transcriptomics to reveal gender-selective and genomic location-specific hormone receptor actions, which associate with survival in MBC patients.

B reast cancer in men is rare and largely understudied. Male breast cancer (MBC) accounts for around 1% of all 1.67 million breast cancers diagnosed each year, worldwide 1 . As compared to the female counterpart, MBC is a distinct disease regarding clinicopathological features (e.g., age of onset and frequency of hormone receptor positivity 2 ) and molecular features (e.g., frequency of BRCA2 mutation 3 and gene expression subtypes 4 ).
The majority of male (and female) breast cancers are hormonally driven 5 , where ERα genomic action dictates transcriptional programs that drive tumor cell proliferation 6 . Analogous to the female counterpart, male breast cancers are treated with endocrine therapies (such as tamoxifen) to block ERα transcriptional activity, yet relapse after hormonal treatment has also been noted 2,7 . Even though the genomic action of ERα in MBC remains completely elusive, multiple reports have studied ERα genomics in the female disease. ERα-DNA binding profiles in tumors are dynamically affected by endocrine therapeutics 8 and can differentiate female patients on outcome 9,10 . Cell line studies revealed ERα-DNA binding and ERα-driven transcriptional activation and cell proliferation to depend on its pioneer factors, including FOXA1 11 and GATA3 12 .
Apart from ERα, other steroid hormone receptors are expressed in breast cancer as well, including androgen receptor (AR) 13 , progesterone receptor (PR) 14 , and glucocorticoid receptor (GR) 14 . AR expression is frequently observed in most male (and female) breast cancers [13][14][15][16] , although its role in breast cancer is poorly understood 17 . AR activation in breast cancer cells facilitates downstream gene expression that drives tumorigenesis in a similar manner to ERα 16 . This tumorigenic action of AR is most extensively studied in prostate cancer 18,19 , where differential AR-DNA binding profiles can classify prostate cancer patients on outcome [20][21][22] .
AR and PR are favorable prognostic markers in female breast cancer (FBC) 23,24 . In addition, PR has recently been shown to modulate ERα-DNA binding, directly reprogramming ERα-driven transcriptional programs 25 . GR expression has been associated with FOXA1 and GATA3 expression in ERα-positive FBC, and is associated with a favorable outcome in this patient population 26 . Its functional role in breast cancer in relation to other steroid hormone receptors is poorly characterized. Cumulatively, these data illuminate the likely interplay between different steroid hormone receptors in breast cancer. Although ERα cistromics has previously been studied in female breast tumors 9,10 , and its interplay with transcription factors has been reported in cell lines 10,11,15,[27][28][29][30][31] , all these transcription factors have never been profiled together in a single study in human breast tumors.
We have characterized DNA binding of six different hormonerelated transcription factors in an understudied field of human pathophysiology: male breast cancer. Through multidimensional genomic data integration on the level of transcription factor binding, copy number cistrome profiling (using off-target sequencing reads from ChIP-seq data) 32 , transcriptomics and the enhancer enriched histone mark H3K4me1, we present a first comprehensive overview of male breast cancer, which we compared with the female counterpart. This comprehensive overview reveals gender-selective and genomic location-specific hormone receptor action, which associate with survival in MBC.

Results
Steroid hormone receptor profiling in male breast cancer. We aimed to generate a compendium of (epi)genomic, transcriptomic and clinical data for 49 ERα-positive MBC samples to better characterize the molecular makeup of this disease. To determine the chromatin binding landscape of ERα in relation to steroid hormone receptors AR, PR, and GR and its pioneer factors FOXA1 and GATA3, we performed ChIP-seq analyses in clinical specimens from patients who did not receive any therapy prior to surgery. These results were integrated with gene expression data and compared with female breast cancer and cell line ChIP-seq data (Fig. 1a, Supplementary Fig. 1). Samples were selected randomly for ChIP-seq of different factors. In this pioneering work, each transcription factor was profiled in MBC for the first time (30 ERα ChIP-seq datastreams and ≥7 samples/factor for other factors with the exception of GATA3 and PR, with 3 and 4 samples, respectively). To position these results into epigenetic context, H3K4me1 was included as active enhancer marker 33 . We generated RNA-seq data for the series, used to classify samples on subtypes related to outcome; M1 (poor) and M2 (good) 4 (Supplementary Fig. 1). Finally, we used copy number data (detected using off-target sequencing reads) 32 and RNA-seq data to perform integrative clustering (IntClust) classification, which was previously associated with FBC prognostication (Supplementary Fig. 1 Supplementary Fig. 2), FOXA1, AR, GR, PR, and GATA3 (ChIP-seq validated by ChIP-QPCR in Supplementary Fig. 3) using validated antibodies ( Supplementary  Fig. 4A, B), shown at two well-known ERα bound regions in FBC: loci RARA and GREB1 (Fig. 1b). In this principal examination of SHRs, FOXA1 and GATA3 in a tumor series, we observed these two ERα-regions were bound by all other factors. This is reminiscent of FBC transcription factors, such as GATA3, FOXA1, RARα, which have been found to be bound in the same regions 35 . These findings were confirmed on a genome-wide scale, where all ERα sites were considered bound in ≥50% of tumors in which we find co-binding by all other factors tested (Fig. 1c). In accordance with reports in cell lines 11,25,29,36 , all factors studied mainly bind intronic and distal intergenic regions (Fig. 1d). DNA motif analysis revealed self-preference for all factors ( Interplay of ERα with other SHRs and pioneer factors. We have shown ERα binding sites have considerable overlap with other factors studied. Next, overlap of ERα with the other steroid hormone receptors (Fig. 2a, left) or pioneer factors FOXA1 and GATA3 (Fig. 2a, right) was studied. All sites shared for individual factors: ERα (15 out of 30 samples), AR (5 out of 10 samples), FOXA1 (3 out of 7 samples), PR (2 out of 4 samples), GR (3 out of 7 samples) and GATA3 (1 out of 3 samples). AR and GR have virtually no unique binding regions, while selective sites for ERα and PR are identified. When examining ERα, FOXA1 and GATA3, we identified selective sites in each. Strikingly, 71% of FOXA1 sites were FOXA1-selective while ER and GATA3selective sites were 42% and 33%, respectively. Next, we counted the reads in the union of sites for all factors (Fig. 2a) and computed the correlation score for each sample (Pearson correlation coefficient), which was represented in a correlation matrix (Fig. 2b). Contrary to what is described for ERα/FOXA1 behavior in FBC cell lines 11 , we observed FOXA1 profiles do not correlate with other profiles, while all SHRs and GATA3 cluster together. Most notably, a strong similarity in genomic profiles of ERα with GR and AR was found. Practically all AR sites were co-occupied    (Fig. 2c), which was also seen in female breast tumor and MCF7 cells ( Supplementary Fig. 6A, B). MBC is ERα-driven 37,38 . Therefore, a one-to-one comparison of binding sites of ERα with each factor was performed (Fig. 2c, d and Supplementary Fig. 7), where only samples were analyzed in which both ERα and the other factor were profiled (peak selection as stated above and in figure legends). A significant proportion of ERα binding sites overlap with AR and GR sites as seen in cell line data 39,40 which we confirmed in MBC (Fig. 2c, d). Interestingly, although PR modulates ERα binding in FBC 25 , many PR sites in MBC were devoid of ERα (46%). These findings were confirmed in DNA binding correlation matrices (Fig. 2e). ERα and AR show clustering within patient rather than between factors (Fig. 2e), indicating a stronger correlation between factors within the same tumor as compared to the same factor between tumors.
A dominant dogma of ERα biology purports ERα binding is dependent on its pioneering factor FOXA1, with 95% of binding found at enhancer regions 11,29,30,41,42 . In line with literature 11 , we find~50% overlap between ERα and FOXA1 in cell lines, where sites enriched for FOXA1 or ERα (Fig. 3a, b and Supplementary  Fig. 8) were largely found in intronic and distal intergenic regions (Fig. 3c). These findings are in stark contrast to observations in both male and female breast cancers, in which ERα sites devoid of FOXA1 were strongly promoter-enriched ( Fig. 3a-c), suggesting the model systems currently used do not adequately capture the genomic distributions of ERα found in clinical samples. In contrast to FOXA1-enriched sites, sites selectively occupied by ERα were weaker for active enhancer mark H3K4me1 33 ( Supplementary Fig. 9A, B). Interestingly, GATA3 was found at both the FOXA1-enriched and ERα-enriched sites (Supplementary Fig. 9A). Motifs at ERα selective sites are related to ESR1 and devoid of forkhead motifs ( Supplementary Fig. 10), which is in contrast to the total of ERα sites ( Supplementary Fig. 5).
MBC subtypes differ in hormone receptor action. Having characterized MBC with respect to SHRs, GATA3 and FOXA1 DNA binding, we next performed gene expression analyses in these samples (n = 46). In order to assess underlying ERα binding patterns between published MBC intrinsic subtypes M1 and M2 4 , we first confirmed M1/M2 subtype clustering using our RNA-seq data set using only subtype genes (Fig. 4a). Supporting our hypothesis that ERα function may deviate between M1 and M2 subtypes, we found 1395 differential ERα DNA binding sites (Fig. 4b). Analogous analyses were not performed for other factors than ERα, since ChIP-seq datastreams were not sufficiently powered to represent both the M1 and M2 subtypes (Supplementary Fig. 1). With available datastreams, we confirmed the occupancy of FOXA1, AR, GR, PR, and GATA3 at these differential ER α DNA binding sites (Fig. 4c). M1-and M2-specific sites were comparable in genomic location and motif usage (Fig. 4d, e and Supplementary Fig. 11). Genes with proximal binding sites (<20 kb or within the gene body) were subsequently examined for molecular and biological associations using pathway analysis (IPA, Qiagen). Both ERα ChIP-seq associated M1/ M2 genes and gene expression-based M1/M2 genes strongly associated with ERα pathway indicators as expected, though some additional regulators are specifically found in the previously reported expression-based classification, such as ERBB2 and KRAS (Fig. 4f). Interestingly, among canonical pathways, AR signaling was the only hormonal signaling pathway more associated with the ERα binding based genes compared to the gene expression-based genes (Fig. 4g), in line with the strong overlap of AR/ERα binding in these tumors (Figs. 2, 4c).
Comparing genomics of ERα and FOXA1 between genders. As ERα is the key driver and therapeutic target in both genders, we compared ERα chromatin binding in female (17 from Ross-Innes et al. 10 , 9 from Jansen et al. 9 and 10 generated in-house) and male (30 generated in-house) breast tumors (Fig. 5a), along with its pioneer factor FOXA1 (n = 7 for both genders) (Fig. 5b). Interestingly, no clear differences in ERα and FOXA1 binding was found between genders, on the level of peak overlap ratio  Supplementary Fig. 13), and vice versa. Furthermore, motif enrichment at ERα and FOXA1 sites was highly comparable between genders (Fig. 5g, h). While clear clustering was observed for ERα between (male and female) tumors and cell lines (Fig. 5a), no separation on gender was observed for any of the factors studied in an integrative analysis ( Supplementary Fig. 14), as well as separately for all factors studied ( Fig. 5a, b, Supplementary Fig. 15). ERα sites that classify FBC on outcome 10 were used to predict male outcome (k-nearest neighbor classifier; Methods section), and a weak but similar trend of ERα signal strength was observed in these sites (Fig. 5i). However, overall survival (OS) was not significantly different between the two groups of male patients ( Supplementary Fig. 16). These results suggest that although the vast majority of ERα and FOXA1 sites are conserved between breast cancers from both genders, ERα sites indicative for outcome in FBC may not be applicable in the male disease.
Genomic profiles of ERα and FOXA1 stratify MBC patients on outcome. Since prognostic ERα sites in female tumors do not seem to be indicative for male patient outcome in our MBC series, we analyzed binding sites of ERα and pioneer factor FOXA1 in MBC for outcome prediction. Analogous analyses for ERα/ GATA3 were not performed due to insufficient power due to ChIP-seq peak overlaps 27 17 Pair-wise correlation matrices  Fig. 17), 365 ERα and 470 FOXA1 sites differed (Fig. 6a). Differential ERα and FOXA1 sites between patient subgroups were coupled to proximal genes (<20 kb or within the gene body). Unsupervised hierarchical clustering revealed clusters dominated by either of M1 or M2 subtypes, both in our cohort (Fig. 6b) and a validation set 4 ( Fig. 6c; n = 66 patients). We performed logistic regression with elastic net regularization 43 to construct a supervised binary classification model by which predictive gene signatures could be identified, which was trained in our cohort and tested in the validation cohort. Both ERα-and FOXA1-based classifiers captured predictive features, which were outperformed by the union of both classifier gene lists (Fig. 6d, e). A bootstrapping analysis confirmed that comparable performance is rarely achieved with random gene sets (p = 0.013, one-tailed test with bootstrapped performance distribution; Methods section; Supplementary  Fig. 18). Dividing patients into two groups of equal size based on the signature (high-risk and row-risk group of LN-status) significantly classified patients on distant metastasis free survival (DMFS; p = 0.048, log-rank test; Fig. 6f; Methods section), which was marginally significant in a multivariate Cox analysis including LN-status (p = 0.066, Cox proportional hazards test). The gene expression signature is a linear combination of gene expression levels, in which 14 contributing genes classify MBC on outcome (Fig. 6g). Cumulatively, we show that global ERα and FOXA1 chromatin binding selectivity reveals gender-specific prognostic features that successfully classify MBC patients on survival.

Discussion
This work has characterized the DNA binding landscape of ERα in male breast cancer, along with its pioneer factors FOXA1,  ERα Genes associated with M1/M2 subtypes necessity to address transcription factor functioning in the physiological context of human tissue, rather than limiting analyses to cell line models.
Our data reveal that genomic functions of ERα and AR in male breast tumors are largely overlapping, which strongly co-localized with GR and PR at the same regions. Even though many sites for GATA3, FOXA1, and PR were not shared with ERα, both AR and GR show virtually no unique binding sites with respect to ERα binding. AR has been shown to compensate for ERα in ERα-/AR+ female breast cancers 40,44 , however the biological interaction between ERα and AR is relatively unknown in both FBC and MBC 45 . The observed genomic overlap of steroid hormone receptor binding profiles is likely due to the close sequence homology of DNA binding domains between all steroid hormone receptors, which warrants potential competition between them in DNA binding. Alternatively, genomic overlap of SHR binding profiles may be the consequence of 'tethering', in which factors associate to the DNA indirectly through complex formation with DNA-bound factors 35 , as was recently described for ERα and PR 25 . Such genomic convergence of steroid hormone receptor action in tumors may provide a novel starting point for pharmaceutical intervention strategies, yielding direct biological rationale for the use of small molecule therapeutics to target AR (e.g., clinicaltrials.gov NCT01990209), GR or PR in hormone receptor-positive breast cancer. As MBC is a rare cancer with limited numbers of available tumors for genomic studies, ChIPseq analyses for some factors including PR and GATA3 were performed on relatively low number of tumors. To focus our analyses on the most-robust peaks and thus minimizing potential impact of patient heterogeneity, for all SHRs we only considered peaks that were found in around 50% of patient samples. This could be considered a rather conservative approach compared to other tissue-derived ChIP-seq papers, where the union of all peaks 46 or peaks identified in at least 2 out of 21 tumors 10 were used as consensus for analyses. Nonetheless, results for PR and GATA3 still warrant validation in larger cohorts.
Although, we found the vast majority of ERα sites to be shared between male and female breast cancers, ERα sites that are associated with patient outcome appeared gender-selective. In line with these results, genomic selectivity of combinatorial steroid hormone receptor action is associated with the genderspecific intrinsic MBC subtypes M1 and M2. While these data suggest ERα function may be driving these subtypes, causality can only be illustrated when cell line models, organoids or patientderived xenografts are available for mechanistic studies. As the most clinically relevant observation, we have identified distinct genomic signatures of ERα action, which selectively and exclusively classifies MBC patients on outcome. With differential binding of ERα and FOXA1 as a guide, we developed a gene expression signature that is significantly associated with DMFS in MBC patients. The union of genes under differential control of ERα and FOXA1 jointly classify patients on outcome, and it remains to be determined which transcription factor is facilitated by FOXA1 at these sites. The MLL3 histone methyltransferase may represent one candidate to be tested in future studies based on the published FOXA1 and MLL3 interaction in FBC cells 47 . The 14 genes classifier we identified may be of added value as male breast cancer-specific prognostic classifier, but further validation of these results would be needed. Furthermore, small molecule inhibitors are available for a number of the 14 genes represented in the classifier, such as CAMKK2 (STO-609) 48 , CAPN9 49 , BACE2 50 , and TNFSF11 (aka RANKL) 51 , and future studies could further elucidate whether these inhibitors may be applicable in the treatment of male breast cancer.
With this, we present the first comprehensive genomic overview of shared and unique features of four steroid hormone receptors in human cancer, with outcome prediction. By studying MBC, gender-selective features of ERα action were identified with potentially direct clinical implications, revealing the first biologydriven biomarker for outcome prediction in this highly understudied cancer-type.

Methods
Tumor specimens. In this study, primary male and female breast tumors were used, none of whom received neoadjuvant endocrine therapy. Male breast cancer patients received surgery at the Netherlands Cancer Institute-Antoni van Leeuwenhoek (NKI-AVL; Amsterdam, the Netherlands), University Medical Center Utrecht (UMCU; Utrecht, the Netherlands), Vrije Universiteit Medical Center (VUMC; Amsterdam, the Netherlands), Radboud University Medical Center (RadboudUMC; Nijmegen, the Netherlands), University Medical Center Groningen (UMCG; Groningen, The Netherlands), Leiden University Medical Center (LUMC; Leiden, the Netherlands), and Erasmus Medical Center (ErasmusMC; Rotterdam, the Netherlands).
Female breast cancer patients received surgery at the Netherlands Cancer Institute-Antoni van Leeuwenhoek (NKI-AVL; Amsterdam, the Netherlands). Tumor content and immunohistochemical analyses were assessed by pathological examination. For clinicopathological parameters, see Supplementary Tables 1 (male tumors) and 2 (female tumors). Local medical ethical authorities at abovementioned centers approved of the collection protocols. All samples were from anonymous left-over material, which would be discarded otherwise. Anonymized, coded leftover material which is not traced back to the patient and therefore does not interfere with care and/or prognosis, under strict requirements can be used without written informed consent according to Dutch legislation on Secondary Use 52 .
Immunohistochemistry. Immunohistochemistry staining for ER, PR, and AR was performed as previously described 8 . Immunohistochemistry of other factors was performed on a BenchMark Ultra autostainer (GATA3 and FOXA1) or Discovery Ultra autostainer (GR). Briefly, paraffin sections were cut at 3 µm, heated and deparaffinized in the instrument with EZ prep solution (Ventana Medical Systems). FOXA1 was detected using clone 2F83 (1/100,000 dilution, 16 min at RT, Seven      63 . To generate consensus peaksets for each factor we used DiffBind (v1.14) 10 with a cutoff defined where a peak must be seen in at least 50% of the samples for that factor. In the event that there were only 2 samples sequenced for a factor, only peaks were considered found in both samples. A list of the union of these sites was generated for Fig. 2a, b). For two-way Venn diagrams, we used these cutoffs in each factor in the set of 4 (ERα vs AR), 3 (ERα vs FOXA1), and 2 (ERα vs GR/PR) (Fig. 2c, d). For differential binding analyses (Fig. 2e), DiffBind was used 10 with the following parameters, FDR of 0.1 for comparison between two different factors and p-value of 0.01 for comparison between different LN-status. Given a set of binding site, genomic distribution and significantly enriched motifs were obtained using CEAS 63 and SeqPos 64 in Galaxy Cistrome (v1.0.0) 64 . All identified motifs were included in the wordcloud figures without removing close homologues, to prevent selection bias (Supplementary Figures 5, 10 and 11).
RNA isolation and RNA-seq. Sections (30 µm thick) were cut from the frozen tumor tissues for RNA isolation. Total RNA was extracted using the mirVana miRNA isolation kit (Ambion, USA) according to the manufacturer's protocol until the end of F1. Quality and quantity of the total RNA was assessed by the 2100 Bioanalyzer (Agilent, USA). Total RNA samples having RIN >8 were subjected to library generation. Strand-specific libraries were generated using the TruSeq Stranded mRNA sample preparation kit (Illumina, USA; RS-122-2101/2) according to the manufacturer's instructions (Illumina, Part # 15031047 Rev. E). 3′ adenylated and adapter ligated cDNA fragments were subject to 12 cycles of PCR. The libraries were analyzed on a 2100 Bioanalyzer using a 7500 chip (Agilent, Santa Clara, CA), diluted and pooled equimolarly into a multiplexed, 10 nM sequencing pools and stored at −20°C. Strand-specific cDNA libraries were sequenced with 100 base paired-end reads on a HiSeq2500 using V4 chemistry (Illumina).
RNA-seq analysis. For all analyses we used the reference file Ensembl GRCh37.75. Adapter filtered reads were subject to STAR alignment (v2.4.2) 65 carried out using default parameters. For expression analysis, HTSeq (v0.6.1p1) 66 was used to count reads for all genes in our RNA-seq samples using the htseq-count command. The DESeq2 (v1.16.0) R package was then used to generate a gene expression matrix from these data 67 . Normalization of the data was carried out using the 'rlog' method within the package. Only samples with at least 5 reads across each gene were retained for further analyses (N = 46).
Prediction of male patient outcome on profiles of ERα and FOXA1. We constructed a k-nearest neighbor classifier based on ERα binding profile of female patients using scikit-learn module (v0. 19) 68 in Python. Taking read counts of ERα in the ERα binding sites that classify female outcome, male patient outcome is predicted by the outcome of five closest female data in terms of Minkowski distance.
Logistic regression with elastic net regularization. We used R package glmnet (v2.0) 43 for constructing a logistic regression model using RNA-seq data in our cohort. Leave-one-out cross validation was performed for finding robust coefficients. Taking the independent validation cohort, linear combination of gene expression levels using the coefficients (gene expression signature) were obtained to rank the patients. Measuring sensitivity and specificity of LN-status prediction with varying threshold gives a receiver-operating characteristics (ROC) curve from which area under the curve (AUC) can be measured. A threshold dividing highand low-risk group was chosen as the median. We used potential target genes of ERα and FOXA1, and union of potential targets to construct three classification models. Note that due to the high dimensionality, no contributing gene is found when the model is trained with whole genome data.
Bootstrapping analysis of classification model. We used the R package boot (v1.3) 69 for bootstrapping analysis. For each classification model, 1000 random models were constructed in the same manner but with random gene sets with the same size to obtain bootstrapped AUC distribution. Taking the bootstrapped AUC distribution as a null hypothesis distribution, significance of the performance was assessed by one-tailed test.
Survival analysis. We used R package Survival (v2.38) 70 for survival analysis. Given two patient groups based on data availability, the log-rank test was used for overall survival (OS) comparison in our cohort and distant metastasis free survival (DMFS) comparison in a validation cohort. Cox regression was used to assess association of LN-status and gene expression signature to survival. The available additional prognostic factors used in multivariate Cox regression were LN-status, endocrine therapy, chemotherapy, radiotherapy and age at diagnosis.
Data availability. All ChIP-seq data generated in the study are available on GEO repository: GSE104399 and RNA-seq data on GEO repository: GSE104730. All public data streams used in the study are listed in Supplementary Table 3.