Introduction

Beyond genomic alterations, aberrant epigenomes contribute to many cancers, as demonstrated by widespread changes to DNA methylation patterns, redistribution of histone marks, and disruption of chromatin structure.1 Altered epigenomes and transcriptomes are closely intertwined and share non-genomic mechanisms of dysregulation in cancer, and are therefore not just a passive by-product of cancer.2 Epigenomic modifiers have the ability to affect the behavior of an entire network of cancer genes and can take on oncogenic roles themselves.3 Furthermore, epigenetic factors cooperate and team up with transcription factors to control specific gene target networks.4,5 In such works and in the following text, a cis-regulatory, synergistic molecular event between epigenetic and transcription factors is referred to as transcriptional cooperation (Fig. 1).

Fig. 1
figure 1

Universal workflow for computational elucidation of regulatory cooperation networks. a By conjoining epigenomics and transcriptomics data, it is possible to define an effector network comprised of target genes affected by epigenomic regulation. The epigenomic effector network is regulated by chromatin binding or chromatin modification events resulting in gene expression changes. b Concerted analysis of chromatin immunoprecipitation ChIP-Seq and RNA-Seq data (or similar data) enables identification of epigenomic and transcriptomic master regulators and transcription factor networks. c By elucidating transcription factors associated with an epigenomic event or regulator, it is possible to identify a well-defined epigenomic-transcriptomic cooperation network supported by complementary multi-omics data. A color scheme denoting, both data types and systems biology analyses, is maintained throughout the entire document. Each color represents a specific analysis technique executed on either genome-wide epigenome and transcriptome profiles. The genome-wide intersection of epigenomic target regions (light green) with differentially expressed (DE) transcripts (purple) results in the effector network of regulated target genes. The intersection of analysis of motif enrichment (AME) and transcription factor target (TFT), and upstream regulator analysis (URA) approaches provides insights into cooperative networks of transcription factors associated with epigenomic regulators. Importantly, such genome-wide information can be accessed at the sequence or gene level providing different level of depth and resolution. ChIP-Seq data from AME (green) is enhanced by position site-specific matrix (PSSM) models of transcription factor motifs. TFT analysis can be performed on gene target sets derived from ChIP-Seq (yellow) or transcriptomics data (red). Furthermore, gene expression data contains valuable directional information indicated by arrows next to the gene expression data utilized by URA (blue), which incorporates hierarchical systems biology networks. The core analysis of the workflow includes multi-omics data integration between chromatin binding and differential gene expression events

The combination of both transcriptomic and epigenomic profiling offers insight into different levels of gene regulation, transcription factor binding motifs, DNA and chromatin modifications, and how each component is coupled to a functional output. Chromatin remodelers and transcription factors are in close communication via recognition of post-translational histone modifications.6 Thereby, they have the ability to harmonize and synchronize a dynamic exchange of chromatin between open, transcriptionally active conformations, and compacted, silenced ones.3 Coordinated experiments interrogating transcriptional responses and chromatin binding via chromatin immuno-precipitation with next generation sequencing (ChIP-Seq) are well-equipped to capture different epigenomic and transcriptomic levels governing the circuitry of a regulatory network.5

Regulatory networks in biology are intrinsically hierarchical and governed by interactions and chemical modifications.7,8 The regulome describes the interplay between genes and their products and defines the control network of cellular factors determining the functional outcome of a genomic element. The reconstruction of regulatory gene networks is stated as one of the main objectives of systems biology.9,10 However, an accurate description of the regulome is a difficult task due to the dynamic nature of epigenetic, transcriptional, and signaling networks. Systems biology has the ability to integrate genome-wide omics data recorded by ChIP-Seq, assay for transposase-accessible chromatin using sequencing (ATAC-Seq), whole genome bisulfite sequencing (WGBS-Seq), and RNA sequencing (RNA-Seq) technology to identify gene targets of a regulatory event.11 The integrated analysis of such data—on the one hand based on gene networks, on the other hand based on sequence features of high-resolution sequencing data—captures cooperation among regulators. Effective experimental design and data analysis of complementary epigenomic and transcriptomic platforms are required to decipher such epigenomic and transcriptional cooperation, which has a profound impact in development and disease.

We took advantage of published, information-rich transcriptomic and epigenomic data to study regulatory networks surrounding histone lysine demethylation. The presence or absence of methylation on histone lysine residues correlates with altered gene expression and is an integral part of the epigenetics code.12 In particular histone 3 lysine 9 methylation (H3K9) is regarded as an epigenetic mark associated with suppressed gene activity.13 The H3K9 lysine demethylase 3 A (KDM3A, also referred to as JMJD1A, Gene ID: 55818, HGNC ID: 20815) demethylates mono-methylated or di-methylated histone marks, thereby activating gene regulation within spermatogenesis, metabolism, stem cell activity and tumor progression.14,15,16 Genome-wide ChIP-Seq data of KDM3A identified specific gene targets and transcriptional networks in androgen response, hypoxia, glycolysis, and lipid metabolism, emphasizing the importance of cooperation with transcription factors.5 However, among epigenetic profiling experiments, a common observation is that enrichment studies provide significance for multiple transcription factors and not just one single, prioritized hit. This underscores the concept of transcriptional cooperation among epigenetic players but also emphasizes the need to design a reliable workflow that includes cross-validation with complementary, multi-omics platforms and analysis techniques.

Results

Deciphering the regulatory landscape of an epigenomic and transcriptomic network

The regulatory landscape of an epigenomic player includes histone modifications, non-enzymatic chromatin interactions, cooperation with transcription factors, transcriptional modulation of gene target networks, and eventually stimulation of specific effector pathways. With the help of hierarchical experimental design, the complementary power of epigenomic and transcriptomic data can be leveraged, and thus used to address distinct levels of the regulome.4 However, different genome-scale data platforms and analysis techniques result in the detection of significant, yet only occasionally overlapping, insight into regulatory networks. To address this, the intersection of multi-omics data levels is useful to augment and validate epigenetic regulation of transcriptional programs. At the same time, there are unique, platform-specific insights, which need to be analyzed accordingly.

Prioritization of cooperating transcription factors by integration of complementary data

The goal of our network analysis is to detect epigenomic and transcriptomic cooperation. Specifically, it is of interest to identify and prioritize transcription factors that are closely associated with an epigenomic factor by integrating complementary data (Fig. 1). Analysis of motif enrichment (AME) determines significant enrichment of transcription factor motifs among promotors or given sequences. Transcription factor target (TFT) analysis defines which transcription factor governs a set of target genes. Upstream regulator analysis (URA) integrates TFT networks with reconstructions of systems biology maps. Each platform provides a measure of significance for each detected transcription factor feature and corrects for multiple hypothesis testing using unbiased genome-wide data.17 Importantly, computations of significance of enrichment may be performed at the level of either the gene or the sequence. Furthermore, different members of transcription factor families have the ability to recognize the same sequence motif. Therefore, dedicated searches may account for such ambiguity and overcome potential gene-specific database biases of individual transcription factors (Fig. 2). In a subsequent step, results of complementary platforms are intersected and compared. Taken together, combinations of high-throughput sequencing data deliver coordinates of epigenomic modification, enrichment of transcription factor motifs, transcriptional output, and networks of transcription factor targets (Figs. 1, 2).

Fig. 2
figure 2

Transcription factor enrichment associated with activity of an epigenetic modifier is assessed by complementary multi-omics platforms and resolved at the family and gene level. a Transcription factor (TF) enrichment associated with epigenomic activity is quantified using complementary omics platforms. ChIP-Seq data provides sequence-based insights on motifs (green) and genomic coordinates (yellow) of epigenomic activity. Transcriptomics data provides functional insight into regulated gene networks (red) and the direction of response (blue). The data was analyzed using analysis of motif enrichment (AME), transcription factor target analysis (TFT), or up-stream regulator analysis (URA). b On the one hand, the analysis is carried out at the level of transcription factor families based on position-specific matrix-assisted searches of structural motifs of transcription factor site recognition. This approach is sequence-based and considers the possibility that multiple, homologous members of transcription factor families have the ability to recognize the same transcription factor site. c On the other hand, the analysis is conducted at the transcript level with gene-specific insight into transcription factors and their expression levels. This approach takes advantage of regulatory networks, which are assigned to specific isoforms of homologous members of transcription factor families, and includes direction of regulation. The later approach yields a set of transcriptional coactivators that is about two orders of magnitude smaller and more specific than the transcription factor family-based approach

Epigenomic switch of histone demethylation makes chromatin accessible and activates gene expression

Our approach is showcased by elucidating the epigenomic switch of KDM3A emphasizing its role as a master regulator. In order to better understand the impact of KDM3A on transcriptional networks, coordinated ChIP-Seq and transcriptomic data of KDM3A binding and demethylation activity in combination with knockdown of KDM3A was utilized.5 Such a combined array of matching epigenomics and transcriptomics experiments has the ability to focus on the cooperative forces of epigenetic regulation as well as its transcriptional consequences. A ChIP-Seq experiment offers direct insight into chromatin binding events and chemical modifications of histones. By overlaying genomic binding events with tracks of epigenomic marks, such as histone acetylation or methylation, associated with open or closed states of chromatin, a functional epigenomic landscape arises. Such ChIP-Seq profiles in combination with transcriptomics and functional genomics allow interrogation of the genome-wide impact of knockdown of a specific epigenomic regulator. Via genome-wide annotation and integration of sequencing reads, it becomes apparent that corresponding profiles of binding and histone modifications are reversed upon loss of function, mirroring the enzymatic function of the epigenetic modifier. Cooperative epigenomic and transcription factor binding coincides with promoter sites on meta gene coordinates enriched for histone lysine demethylation—overall indicators of transcriptionally activating epigenetic remodeling.

Regulation of transcriptional networks by H3K9 chromatin demethylation

Coordinated ChIP-Seq and transcriptomic data classified genome-wide interactions of the chromatin demethylase KDM3A using antibodies specific for KDM3A, and its histone marks H3K9me1 and H3K9me2, conjointly with shRNA knockdown of KDM3A in the CRL-2505 cell line. Transcriptomic impact of 4326 differentially expressed genes upon KDM3A knockdown showed 2460 genes as positively regulated by KDM3A activity (down in the CRL-2505 prostate cancer line with shRNA knockdown of KDM3A), and 1866 genes as negatively regulated by KDM3A activity. Using this data we defined the set of 56.9% differentially expressed genes as positively regulated by KDM3A activity (down in the CRL-2505 prostate cancer line with shRNA knockdown of KDM3A), and 43.1% of differentially expressed genes as negatively regulated by KDM3A activity. KDM3A binding locations were defined by a loss of ChIP-Seq signal following knockdown of KDM3A. Concurrently, H3K9me1/2 histone marks following KDM3A knockdown are recognized as target regions of KDM3A histone lysine demethylation mediated by KDM3A. Changes in these ChIP-Seq marks upon KDM3A knockdown were contrasted against reference genomic DNA input or control non-coding shRNA samples. Quantification of the activity-based ChIP-Seq array matched with knockdown of KDM3A resulted in 37525 peaks associated with KDM3A binding, 45246 and 32665 H3K9 mono- and di-demethylation (H3K9me1/2-KDM) events, respectively. Overall, the peak counts of both histone marks showed a gain of signal upon knockdown of KDM3A reflecting the demethylase activity. By integrating continuous ChIP-Seq signals, an average profile of a meta-gene can be generated and functional coordinates analyzed for regulatory control. In such a meta-gene profile, promotor regions are located within 1000 bp upstream of the gene-coding body, with the transcription start site (TSS) as the start of the gene-coding body at the zero position, and intergenic regions as the remaining regions outside of the gene body. KDM3A localized to the response element-rich promoter regions and demethylated H3K9me1/2 histone marks in the proximity of the TSS. Taken together, the meta-gene analysis classified areas important for transcriptional regulation and defined genomic sequence coordinates relevant for cooperation with transcription factors.

Accounting for motif similarity and structural homology of transcription factor families

The array of ChIP-Seq data was subjected to AME and TFT analysis, while the list of differentially expressed genes served as input for URA and TFT analysis (Fig. 2a). Each transcription factor hit was reported with its HGNC identification number, hierarchical classification of human transcription factors (TFClass) family barcode, motif logo, and significance corrected for multiple hypothesis testing using an adjusted p value cut-off of 0.05 (Table 1, Supplementary Information). Each individual analysis yielded between 29 and 41 significantly enriched transcription factor families, each corresponding to more than 1000 (1083 and up to 1292, respectively) associated genes (Fig. 2b). In comparison to the entire realm of 1539 existing transcription factors, such wide-ranging data tables provide little benefit, despite the impressive p values produced by analysis tools at first glance. For example hypoxia inducible factor 1 alpha subunit (HIF1A, Gene ID: 4609, HGNC ID: 4910, TFClass: 1.2.5) of the PAS domain factors (TFClass 1.2.5) is detected with an adjusted p value below 1.0E-100 by AME in the ChIP-Seq data. Other members of the same family like the aryl hydrocarbon receptor nuclear translocator (ARNT, Gene ID: 405, HGNC ID: 700, TFClass: 1.2.5) show similar significance, since the detection is based on the same sequence logo, highlighting the lack of ability to differentiate between structurally homologous transcription factors. Therefore, we intersected all four sets of AME ChIP-Seq, TFT ChIP-Seq, URA transcriptomics, and TFT transcriptomics, and narrowed down 21 transcription factor families supported by all datasets (Fig. 2a). Despite a considerable improvement of 21 projected families among 110 existing transcription factor families, the final set maps back to 967 transcription factors. In part, such lack of specificity is due to the large family of more than 3 adjacent zinc finger factors (TFClass: 2.3.3), whose motif was detected by the analysis but contains 487 members, accounting for almost a third of all transcription factors (Fig. 3a). Systems biology networks and enrichment studies provide insight into directionality of the response and draw attention to different sized effector networks (Fig. 3b, c). Only few nodes of the transcription factor target network were hyperconnected and showed promoter association with multiple transcription factors in epigenomics and transcriptomics datasets (Figs. 4, 5). Such a high degree of network connectivity speaks to a synergistic effect, where selected master regulators cooperate and act in sync, resulting in robust transcriptional output.

Table 1 Detection of transcriptional cooperation by multi-omics integration of complementary data and analysis techniques
Fig. 3
figure 3

Visualization of epigenomic and transcriptional cooperation illustrates redundancy and complexity of a target network. a Hierarchical trees of human transcription factors correspond to transcription factor superclass, class, and family from inward out. Transcription factor motifs often get recognized by multiple members of the same transcription factor family due to structural homology of DNA binding domains. The transcription factor target analysis (TFT) is carried out on sequence-specific epigenomics data. b Size of the transcriptional effector network and the direction of response are key parameters when evaluating target genes in epigenomic cooperation. The TFT analysis is based on differentially expressed transcripts with transcriptomic up and down response in red and blue, respectively. c Transcription factor target networks provide insight into enrichment and direction of the response. Identified cooperating transcription factors show agreement between complementary data of expression levels, direction of regulation, target sets, and hierarchical linkage

Fig. 4
figure 4

Network of transcriptional cooperation of an epigenomic master regulator visualized by transcription factor family trees. a A network of transcription factors is detected using complementary epigenomic and transcriptomic data. The realm of potentially relevant transcription factor families is large yet many data points are not mirrored or validated by different platforms. b, c Confidence and mutual data support increases by integrating analysis of motif enrichment (AME), transcription factor target analysis (TFT), or up-stream regulator analysis (URA). For each data source B chromatin modifications or c differential expression of target genes detected transcription factors supported by at least two complementary techniques, AME and TFT for epigenomics data, and URA and TFT for transcriptomics. d High-confidence target network of transcription factors validated by four different omics platforms integrating ChIP-Seq based motifs and transcriptional networks. Legends of colored areas in Venn diagrams illustrate intersections of complementary datasets and analysis platforms with numbers of identified transcription factor families, respectively. The numbers next to individual nodes of the hierarchical family tree indicate transcription factor superclass, class, and family from inward out

Fig. 5
figure 5

The network of networks convoluted in epigenomic and transcriptional cooperation by an epigenomic master regulator displays specificity yet redundancy and hyperconnectivity. a A highly specific effector network is accomplished by epigenomic and transcriptional cooperation. Key features of the network are hyperconnectivity of utilized nodes and targeting of effector genes by multiple cooperating transcription factors. b Each identified node of the transcription factor network comprising epigenomic cooperation is supported by complementary epigenomic and transcriptomic data. Each cooperating transcription factor supports the epigenomic factor—in this case KDM3A—by contributing motif-specific recognition, by directing chromatin accessibility, and by controlling transcriptional coactivation

Multi-omics integration of complementary data yields well-refined target network

In order to improve the detected output, gene-specific systems biology networks were employed. In particular, URA and TFT analysis fueled by transcriptomic data provide useful insight. Databases of gene sets rely in part on experimental data of gene-specific knockdowns to characterize the impact of a transcription factor on a target network. Furthermore, a consistent directional response amplifies the significance of a detected hit. Therefore, such directional, gene-specific networks have the ability to overcome ambiguities. For example, members of the JUN-related factors (TFClass: 1.1.1) are detected but show different signs of regulation depending on the factor of interest, the response of its target genes, and the change of expression of the factor itself. After intersection of all four datasets, 11 transcription factors belonging to 10 transcription factor families were determined (Fig. 2c). This set of master regulators is supported by complementary omics platforms and different analysis techniques representing a high-confidence cooperation network of the epigenomic master regulator KDM3A (Fig. 5). The cooperation network includes cancer associated factors Jun proto-oncogene, AP-1 transcription factor subunit (JUN, Gene ID: 3725, HGNC ID: 6204, TFClass: 1.1.1), CCAAT/enhancer binding protein beta (CEBPB, Gene ID: 1051, HGNC ID: 1834, TFClass: 1.1.8), myogenic differentiation 1 (MYOD1, Gene ID: 3091, HGNC ID: 7611, TFClass: 1.2.2), HIF1A, sterol regulatory element binding transcription factor 1 (SREBF1, Gene ID: 379, HGNC ID: 11289, TFClass: 1.2.6), MYC proto-oncogene, bHLH transcription factor (MYC, Gene ID: 4609, HGNC ID: 7553, TFClass: 1.2.6), androgen receptor (AR, Gene ID: 367, HGNC ID: 644, TFClass: 2.1.1), Sp1 transcription factor (SP1, Gene ID: 6667, HGNC ID: 11205, TFClass: 2.3.1), Meis homeobox 1 (MEIS1, Gene ID: 4211, HGNC ID: 7000, TFClass: 3.1.4), zinc finger E-box binding homeobox 1 (ZEB1, Gene ID: 6935, HGNC ID: 11642, TFClass: 3.1.8), and ELK1, ETS transcription factor (ELK1, Gene ID: 2002, HGNC ID: 3321, TFClass: 3.5.2) representing less than 0.8% of all possible transcription factors (Figs. 4d, 5). Thereby it represents a hyperconnnected network of networks surrounding an epigenomic cooperation event. The identified factors can further be surveyed at the level of basal expression or regulation in patient-derived tumor specimens in TCGA underlining elevated expression in tumor progression. The analysis validates previously reported associations and contacts implicated in chromatin remodeling and discovered newly identified cooperative interactions (Table 2). For the specific role of KDM3A in cancer, epigenomic and transcriptional cooperation with transcription factors is key. KDM3A cooperates with mitogenic basic helix-loop-helix factors including MYC, HIF1A, and SREBF1, and derives a lipogenic program from association with nuclear receptors like AR. Ultimately, by overlaying the motif-specific and genomic data produced through matched experiments, epigenomic events can be correlated with the transcriptomic effect of histone remodelers and transcription factors.

Table 2 Epigenomic and transcriptional cooperation events in cancer. Original findings and reported cooperation events of epigenomic regulators with transcription factor are enumerated

Discussion

ChIP-Seq based approaches provide sequence resolution but detection of enriched transcription factor motifs is ambiguous and is most appropriately accomplished at the transcription factor family level to account for and include homologous factors. In contrast, transcriptomics studies provide directionality of regulation—transcriptional activation or repression upon epigenomic activity—an important aspect lacking in coordinate-based ATAC-Seq or ChIP-Seq experiments. Integration of different sequence, gene, or network-based approaches prioritizes high-fidelity cooperation partners in epigenomic regulation. Therefore, any combination of complementary data from sequence, gene, or network-based approaches is identified as desirable input for reliable regulatory systems biology analyses.

For the oncogenic nature and target specificity of an epigenomic master regulator, epigenomic and transcriptional cooperation with transcription factors is key. KDM3A is able to support initiation of transcription by its ability to specifically remove mono-methylation and di-methylation marks from the H3K9 residue leading to chromatin de-condensation.14 Transcriptionally silenced genes contain methylation marks on the H3K9 subunit18 and arrays of ChIP-Seq experiments monitoring H3K9 methylation marks revealed global histone demethylation effects of KDM3A. Combined assessment of histone demethylation events and gene expression changes indicated major transcriptional activation, suggesting that distinct oncogenic regulators, in particular transcription factors, may synergize with the epigenetic patterns controlled by KDM3A. Furthermore, the epigenetic factor was shown to cooperate with the androgen receptor to control prostate tissue-specific gene target networks introducing the concept of the epioncogene and transcriptional cooperation.19

While KDM3A is able to control chromatin accessibility, the mechanism by which it targets specific genes is of current interest and may influence understanding of epigenetic dysregulation in human disease. While several cancers exhibit deregulated KDM3A activity, in prostate adenocarcinoma it functions as a transcriptional coregulator with the androgen receptor.5,14,20,21 Such cooperative coactivation of the androgen receptor with KDM3A features a role for KDM3A as an active force in commencing oncogenesis in prostate epithelial cells. KDM3A is known to control the transcription and function of oncogenic transcription factors.22,23 However, an expanded study outlining the effects of perturbed KDM3A H3K9 demethylation upon human transcription factor response element recognition in cancer has so far been missing.

The cooperation network includes previously validated interactions of MYC, HIF1A, and AR in cancer but also highlights close association of KDM3A with transcriptional networks of factors rather studied in development and tissue differentiation like JUN, CEBPB, MYOD, SREBF1, SP1, MEIS1, ZEB1, or ELK1 (Table 1). By taking advantage of motif-specific target networks, KDM3A has the ability to induce glycolytic genes in urothelial bladder carcinoma.24,25 Epigenomic regulation of SREBF1 activity has been reported to stimulate lipogenesis, and SREBP1 regulates lipid accumulation and cell cycle progression in androgen independent prostate cancer cell lines.26,27,28 KDM3A regulates the transcriptional program of the AR, serves epigenomic master regulator by epigenomic and transcriptional cooperation of prostate adenocarcinoma.5 Despite some factors including the forkhead box (FOX) factors (TFClass: 3.3.1) family were frequently detected at the transcription factor family levels, lack of consistent overlap of epigenomic and transcriptomic data eventually excluded prominent cancer drivers like forkhead box A1 (FOXA1, Gene ID: 3169, HGNC ID: 5021, TFClass: 3.3.1), forkhead box M1 (FOXM1, Gene ID: 2305, HGNC ID: 3818, TFClass: 3.3.1), forkhead box O1 (FOXO1, Gene ID: 2308, HGNC ID: 3819, TFClass: 3.3.1), or forkhead box O3 (FOXO3, Gene ID: 2309, HGNC ID: 3821, TFClass: 3.3.1). Despite FOX factors are known to cooperate with nuclear hormone receptors (TFClass: 2.1.1), it is possible that the putative association with KDM3A steams from the fact that the closely cooperating AR frequently has FOX motifs nearby. Systematic, genome-wide surveys have elucidated that FOX motifs are adjacent androgen response elements (AREs),29,30 thereby facilitating cooperation at the level of transcription factors and promotion of prostate cancer progression.

Hundreds of transcription factors are significantly associated with each individual data analysis platform or high-throughput sequencing technology. Big data challenges can be overcome by systems biology analysis and integration of multi-omics data. Motif similarity is visualized by transcription factor family trees classifying superclass, class, and family of transcription factors (from inward to outward, Fig. 4) based on the characteristics of their DNA-binding domains. Single epigenomic or transcriptomic datasets examined by different analysis tools result in improved resolution but leave ambiguities. The footprints of cooperating transcription factors are found in cognate sequence motifs specific to their DNA binding domains. Such sequence motifs are more pronounced in events of cooperating epigenomic activity. Detected motif enrichment highlights the modularity, versatility, and efficacy of epigenomic cooperation, providing target specificity at genome-wide reach. The number of detected events in genome-wide epigenomic binding studies provides statistical power for sequence motif discovery and gene target enrichment. As a consequence, high-resolution epigenomic studies often arrive at multiple plausible solutions, though each suggested interaction or association may carry statistical significance. By intersecting complementary data platforms and analysis techniques, high-fidelity gene target networks involved in epigenomic and transcriptomic cooperation can be identified.

Within the regulome, epigenetic master regulators position themselves at the top of cellular hierarchies and control distinct phenotypic programs via reversible chemical modifications of chromatin, histone or nucleotide marks, without altering the core DNA sequence. Epigenetic oncogenes or tumor suppressors can arise when epigenetic master regulators are somatically activated or lost, and contribute to cancer initiation and progression.31 In cancer, such epigenetic master regulators are found at the top of regulatory hierarchies, particularly in pathways related to cellular proliferation, survival, fate, and differentiation. For the manifestation of a genomic or non-genomic aberration of an epigenetic master regulator, it is a necessity that its own activity is affected by somatic mutation, copy number alteration, expression levels, protein cofactors, or methylation status. Epigenetic master regulators often accomplish target specificity of their phenotypic program by cooperation with members of the transcriptional machinery and therefore may depend on tissue-specific expression of such auxiliary factors. In cancer, an epigenetic master regulator populates an extreme state and is either permanently switched on or off. An epigenetic master regulator will become a cancer driver, if it is not functionally neutral but rather contributes to tumorigenesis or disease progression due to its hyperactive or deactivated state. Genomic profiling of cancer patients has the ability to identify coincidence or mutual exclusivity of somatic alterations of epigenomic and transcription factors. Extreme states of epigenetic master regulators by somatic loss or gain of function in cancer may emphasize preexisting cooperative interactions with transcription factors, which may be subtle and difficult to detect under normal circumstances. A defined challenge in the field of epigenetic master regulators is to identify cancer-specific vulnerabilities in gene targets and biological pathways that are frequently and consistently perturbed under the control of an epigenetic driver.

Conclusion

In conclusion, the identification of transcriptional cooperation and regulatory hierarchies highlights the importance of epigenetic regulators in mitogenic control and their potential as therapeutic targets. Epigenetic regulators such as KDM1A, KDM3A, KDM5A, KDM6A, KDM7A, EZH2, DOT1L, and others have been shown to be critical in oncogenesis and cancer resistance.3,32,33 The discovery of the specific role of KDM3A in the interplay between a tissue-specific steroid receptor transcription factor and metabolic signaling provides a foundation for rational design of combination approaches where metabolic, epigenetic, and hormone-deprivation therapies may synergize. Our integrated multi-platform analysis reveals a complex molecular landscape of epigenomic and transcriptomic cooperation in cancer, providing avenues for precision medicine.34 A close teamwork of the transcriptional and epigenomic machinery was discovered, in which one component opens the chromatin, another recognizes gene-specific DNA motifs, and others scaffold between histones, cofactors, and the transcriptional complex. This highlights a close connection between the epigenomic and transcriptomic machinery, albeit much of the underlying principles remain to be discovered. In conclusion, transcriptomics in combination with epigenomic profiling and measurement of chromatin accessibility enable global detection of epigenetic modifications and characterization of transcriptional and epigenetic footprints. Chromatin remodelers and transcription factors are in close communication via recognition of post-translational histone modifications and coordinate the dynamic exchange of chromatin between open, transcriptionally active conformations and compacted, silenced ones. In cancer, due to the ability to team up with transcription factors, epigenetic factors concert mitogenic and metabolic gene networks, claiming the role of a cancer master regulators or epioncogenes. Exploration into the cooperative roles of epigenetic histone modifiers and transcription factor families in gene regulatory networks contributes to our understanding of how a seemingly promiscuous epigenomic program is converted into a specific transcriptional response assisting in oncogenesis.

Methods

Experimental design

Optimal experimental design mirrors different layers of the regulatory network and organizes sequencing assays in an array of coordinated experiments.4,5,35 Coordinated epigenomic and transcriptomic profiles are well-equipped to capture different regulatory levels governing the circuitry of a cooperative network. The workflow introduced in our approach is open and applicable to different epigenome and transcriptome profiles including ChIP-Seq, ATAC-Seq, RNA-Seq, and microarray experiments. Since the analysis primarily relies on differential expression, microarray or RNA-Seq data are equally applicable. Additionally, data on alternative transcripts may allow resolving regulation of splice-isoforms. For coordinated multi-omics analysis, it is important to integrate compatible datasets and look for matching conditions, each associated with presence or absence of a defined epigenomic event or factor. Genomic editing offers tools to conduct target-specific loss or gain of function studies such that an array of coordinated experiments can be assembled. Complementary epigenomic and transcriptomic data—on the one hand in form of significantly enriched genomic regions (epigenomic target regions), on the other hand as differentially expressed transcripts (target genes)—serves as input for four different analysis platforms (Figs. 1, 2).

Investigation of KDM3A as epigenetic switch in human cancer

KDM3A coordinates transcriptional activation by H3K9 demethylation thereby enabling chromatin accessibility and replacement of components of nucleosome-stalled polymerase complexes by tissue-specific transcription factors. The workflow is exemplified by reprocessing published data on KDM3A activity recorded with matching knockdown conditions in a human prostate carcinoma epithelial cellular model deposited in NCBI GEO entries GSE109748 and GSE7049829,36,37 (CRL-2505, American Type Culture Collection, Manassas, VA). The utilized cellular model is a variant derived from a xenograft and simulates castration-induced regression and relapse typical of human prostate carcinoma epithelial cells independent of dihydroxytestosterone stimulation. Furthermore, KDM3A is activated by somatic copy number amplification in lung, prostate, uterine, bladder, testicular germ cell, ovarian, cervical, breast, sarcoma, melanoma, and other cancers making its cooperation network an important target of broad interest in oncology.

Data processing

Illumina HiSeq 2000 (Illumina, San Diego, CA) fastq files were aligned to the reference human genome 19 using the Bowtie software package.38 Peak-calling utilized a model-based analysis of ChIP-Seq (MACS) algorithm.39,40 Significant ChIP-Seq genomic locations relative to nearby gene bodies were annotated by ChIPSeek.41 ChIP-Seq peak regions were sorted and filtered by BEDtools.42 Average ChIP enrichment profiles over specific genomic features were calculated using the cis-regulatory element annotation system tool.43,44 ChIPSeq binding profiles were visualized in the integrative genomics viewer (IGV).45 Utilized conditions include ChIP-Seq profiles of antibodies specific for chromatin marks H3K9me1, H3K9me2, and the epigenomic modifier KDM3A in combination with small hairpin RNA (shRNA) knockdown of KDM3A matched with coordinated transcriptomic data of control and KDM3A knockdown cells using human transcriptome platform GPL10558 (HT-12 V4.0, Illumina, San Diego, CA).5 The epigenomic and transcriptomics datasets contained 77911 features and 4356 differentially expressed transcripts upon KDM3A knockdown, respectively, with p values and q values below 0.05 adjusted for multiple hypothesis testing.

Network analysis and transcription factor target enrichment

Human transcription factors were annotated according to their Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) identification number using the using the multi-symbol checker tool. Discovered transcription factors were classified by shared DNA binding domains according to the hierarchical classification of human transcription factors (TFClass) database.46 Transcription factor binding and promoter sites were annotated utilizing transcription factor databases.30,47 For transcription factor binding site searches we built manual or utilized deposited position site-specific matrices or sequence logos of curated, non-redundant transcription factor databases. Statistically significant enrichment of these transcription factor motifs was determined using find individual motif occurrences (FIMO) and motif enrichment tools of the motif-based sequence analysis toolkit (MEME) suite.48 Upstream regulators were determined by ingenuity pathway analysis (IPA, Qiagen, Redwood City, CA) based on differentially expressed genes with an adjusted p value below 0.05. Significant enrichment of target gene networks with consistent transcription factor motifs was calculated for all target genes with annotated transcription factor motifs in the 3′ promoter region of their transcription start sites.49

Data availability

Data is deposited in NCBI GEO entries GSE109748 and GSE70498. Identified transcription factors and statistics are assembled in the Supplementary Information. All data supporting the findings of this study is openly available within the paper and the Supplementary Information deposited at the npj Systems Biology and Applications website. A preprint version of this manuscript is made available to the scientific community on the preprint server bioRxiv 309484.50

Supplementary Table 16 are compiled as Supplementary Information. Supplementary Table 1: Master regulators among epigenomic and transcriptomic cooperation network. Supplementary Table 2: Detection and hierarchical classification of human transcription factors. Supplementary Table 3: ChIP-Seq analysis of motif enrichment (AME). Supplementary Table 4: ChIP-Seq transcription factor target (TFT) analysis. Supplementary Table 5: Transcriptomics upstream regulator analysis (URA). Supplementary Table 6: Transcriptomics transcription factor target (TFT) analysis.