Coordinated experiments focused on transcriptional responses and chromatin states are well-equipped to capture different epigenomic and transcriptomic levels governing the circuitry of a regulatory network. We propose a workflow for the genome-wide identification of epigenomic and transcriptional cooperation to elucidate transcriptional networks in cancer. Gene promoter annotation in combination with network analysis and sequence-resolution of enriched transcriptional motifs in epigenomic data reveals transcription factor families that act synergistically with epigenomic master regulators. By investigating complementary omics levels, a close teamwork of the transcriptional and epigenomic machinery was discovered. The discovered network is tightly connected and surrounds the histone lysine demethylase KDM3A, basic helix-loop-helix factors MYC, HIF1A, and SREBF1, as well as differentiation factors AP1, MYOD1, SP1, MEIS1, ZEB1, and ELK1. In such a cooperative network, one component opens the chromatin, another one recognizes gene-specific DNA motifs, others scaffold between histones, cofactors, and the transcriptional complex. In cancer, due to the ability to team up with transcription factors, epigenetic factors concert mitogenic and metabolic gene networks, claiming the role of a cancer master regulators or epioncogenes. Significantly, specific histone modification patterns are commonly associated with open or closed chromatin states, and are linked to distinct biological outcomes by transcriptional activation or repression. Disruption of patterns of histone modifications is associated with the loss of proliferative control and cancer. There is tremendous therapeutic potential in understanding and targeting histone modification pathways. Thus, investigating cooperation of chromatin remodelers and the transcriptional machinery is not only important for elucidating fundamental mechanisms of chromatin regulation, but also necessary for the design of targeted therapeutics.
Beyond genomic alterations, aberrant epigenomes contribute to many cancers, as demonstrated by widespread changes to DNA methylation patterns, redistribution of histone marks, and disruption of chromatin structure.1 Altered epigenomes and transcriptomes are closely intertwined and share non-genomic mechanisms of dysregulation in cancer, and are therefore not just a passive by-product of cancer.2 Epigenomic modifiers have the ability to affect the behavior of an entire network of cancer genes and can take on oncogenic roles themselves.3 Furthermore, epigenetic factors cooperate and team up with transcription factors to control specific gene target networks.4,5 In such works and in the following text, a cis-regulatory, synergistic molecular event between epigenetic and transcription factors is referred to as transcriptional cooperation (Fig. 1).
The combination of both transcriptomic and epigenomic profiling offers insight into different levels of gene regulation, transcription factor binding motifs, DNA and chromatin modifications, and how each component is coupled to a functional output. Chromatin remodelers and transcription factors are in close communication via recognition of post-translational histone modifications.6 Thereby, they have the ability to harmonize and synchronize a dynamic exchange of chromatin between open, transcriptionally active conformations, and compacted, silenced ones.3 Coordinated experiments interrogating transcriptional responses and chromatin binding via chromatin immuno-precipitation with next generation sequencing (ChIP-Seq) are well-equipped to capture different epigenomic and transcriptomic levels governing the circuitry of a regulatory network.5
Regulatory networks in biology are intrinsically hierarchical and governed by interactions and chemical modifications.7,8 The regulome describes the interplay between genes and their products and defines the control network of cellular factors determining the functional outcome of a genomic element. The reconstruction of regulatory gene networks is stated as one of the main objectives of systems biology.9,10 However, an accurate description of the regulome is a difficult task due to the dynamic nature of epigenetic, transcriptional, and signaling networks. Systems biology has the ability to integrate genome-wide omics data recorded by ChIP-Seq, assay for transposase-accessible chromatin using sequencing (ATAC-Seq), whole genome bisulfite sequencing (WGBS-Seq), and RNA sequencing (RNA-Seq) technology to identify gene targets of a regulatory event.11 The integrated analysis of such data—on the one hand based on gene networks, on the other hand based on sequence features of high-resolution sequencing data—captures cooperation among regulators. Effective experimental design and data analysis of complementary epigenomic and transcriptomic platforms are required to decipher such epigenomic and transcriptional cooperation, which has a profound impact in development and disease.
We took advantage of published, information-rich transcriptomic and epigenomic data to study regulatory networks surrounding histone lysine demethylation. The presence or absence of methylation on histone lysine residues correlates with altered gene expression and is an integral part of the epigenetics code.12 In particular histone 3 lysine 9 methylation (H3K9) is regarded as an epigenetic mark associated with suppressed gene activity.13 The H3K9 lysine demethylase 3 A (KDM3A, also referred to as JMJD1A, Gene ID: 55818, HGNC ID: 20815) demethylates mono-methylated or di-methylated histone marks, thereby activating gene regulation within spermatogenesis, metabolism, stem cell activity and tumor progression.14,15,16 Genome-wide ChIP-Seq data of KDM3A identified specific gene targets and transcriptional networks in androgen response, hypoxia, glycolysis, and lipid metabolism, emphasizing the importance of cooperation with transcription factors.5 However, among epigenetic profiling experiments, a common observation is that enrichment studies provide significance for multiple transcription factors and not just one single, prioritized hit. This underscores the concept of transcriptional cooperation among epigenetic players but also emphasizes the need to design a reliable workflow that includes cross-validation with complementary, multi-omics platforms and analysis techniques.
Deciphering the regulatory landscape of an epigenomic and transcriptomic network
The regulatory landscape of an epigenomic player includes histone modifications, non-enzymatic chromatin interactions, cooperation with transcription factors, transcriptional modulation of gene target networks, and eventually stimulation of specific effector pathways. With the help of hierarchical experimental design, the complementary power of epigenomic and transcriptomic data can be leveraged, and thus used to address distinct levels of the regulome.4 However, different genome-scale data platforms and analysis techniques result in the detection of significant, yet only occasionally overlapping, insight into regulatory networks. To address this, the intersection of multi-omics data levels is useful to augment and validate epigenetic regulation of transcriptional programs. At the same time, there are unique, platform-specific insights, which need to be analyzed accordingly.
Prioritization of cooperating transcription factors by integration of complementary data
The goal of our network analysis is to detect epigenomic and transcriptomic cooperation. Specifically, it is of interest to identify and prioritize transcription factors that are closely associated with an epigenomic factor by integrating complementary data (Fig. 1). Analysis of motif enrichment (AME) determines significant enrichment of transcription factor motifs among promotors or given sequences. Transcription factor target (TFT) analysis defines which transcription factor governs a set of target genes. Upstream regulator analysis (URA) integrates TFT networks with reconstructions of systems biology maps. Each platform provides a measure of significance for each detected transcription factor feature and corrects for multiple hypothesis testing using unbiased genome-wide data.17 Importantly, computations of significance of enrichment may be performed at the level of either the gene or the sequence. Furthermore, different members of transcription factor families have the ability to recognize the same sequence motif. Therefore, dedicated searches may account for such ambiguity and overcome potential gene-specific database biases of individual transcription factors (Fig. 2). In a subsequent step, results of complementary platforms are intersected and compared. Taken together, combinations of high-throughput sequencing data deliver coordinates of epigenomic modification, enrichment of transcription factor motifs, transcriptional output, and networks of transcription factor targets (Figs. 1, 2).
Epigenomic switch of histone demethylation makes chromatin accessible and activates gene expression
Our approach is showcased by elucidating the epigenomic switch of KDM3A emphasizing its role as a master regulator. In order to better understand the impact of KDM3A on transcriptional networks, coordinated ChIP-Seq and transcriptomic data of KDM3A binding and demethylation activity in combination with knockdown of KDM3A was utilized.5 Such a combined array of matching epigenomics and transcriptomics experiments has the ability to focus on the cooperative forces of epigenetic regulation as well as its transcriptional consequences. A ChIP-Seq experiment offers direct insight into chromatin binding events and chemical modifications of histones. By overlaying genomic binding events with tracks of epigenomic marks, such as histone acetylation or methylation, associated with open or closed states of chromatin, a functional epigenomic landscape arises. Such ChIP-Seq profiles in combination with transcriptomics and functional genomics allow interrogation of the genome-wide impact of knockdown of a specific epigenomic regulator. Via genome-wide annotation and integration of sequencing reads, it becomes apparent that corresponding profiles of binding and histone modifications are reversed upon loss of function, mirroring the enzymatic function of the epigenetic modifier. Cooperative epigenomic and transcription factor binding coincides with promoter sites on meta gene coordinates enriched for histone lysine demethylation—overall indicators of transcriptionally activating epigenetic remodeling.
Regulation of transcriptional networks by H3K9 chromatin demethylation
Coordinated ChIP-Seq and transcriptomic data classified genome-wide interactions of the chromatin demethylase KDM3A using antibodies specific for KDM3A, and its histone marks H3K9me1 and H3K9me2, conjointly with shRNA knockdown of KDM3A in the CRL-2505 cell line. Transcriptomic impact of 4326 differentially expressed genes upon KDM3A knockdown showed 2460 genes as positively regulated by KDM3A activity (down in the CRL-2505 prostate cancer line with shRNA knockdown of KDM3A), and 1866 genes as negatively regulated by KDM3A activity. Using this data we defined the set of 56.9% differentially expressed genes as positively regulated by KDM3A activity (down in the CRL-2505 prostate cancer line with shRNA knockdown of KDM3A), and 43.1% of differentially expressed genes as negatively regulated by KDM3A activity. KDM3A binding locations were defined by a loss of ChIP-Seq signal following knockdown of KDM3A. Concurrently, H3K9me1/2 histone marks following KDM3A knockdown are recognized as target regions of KDM3A histone lysine demethylation mediated by KDM3A. Changes in these ChIP-Seq marks upon KDM3A knockdown were contrasted against reference genomic DNA input or control non-coding shRNA samples. Quantification of the activity-based ChIP-Seq array matched with knockdown of KDM3A resulted in 37525 peaks associated with KDM3A binding, 45246 and 32665 H3K9 mono- and di-demethylation (H3K9me1/2-KDM) events, respectively. Overall, the peak counts of both histone marks showed a gain of signal upon knockdown of KDM3A reflecting the demethylase activity. By integrating continuous ChIP-Seq signals, an average profile of a meta-gene can be generated and functional coordinates analyzed for regulatory control. In such a meta-gene profile, promotor regions are located within 1000 bp upstream of the gene-coding body, with the transcription start site (TSS) as the start of the gene-coding body at the zero position, and intergenic regions as the remaining regions outside of the gene body. KDM3A localized to the response element-rich promoter regions and demethylated H3K9me1/2 histone marks in the proximity of the TSS. Taken together, the meta-gene analysis classified areas important for transcriptional regulation and defined genomic sequence coordinates relevant for cooperation with transcription factors.
Accounting for motif similarity and structural homology of transcription factor families
The array of ChIP-Seq data was subjected to AME and TFT analysis, while the list of differentially expressed genes served as input for URA and TFT analysis (Fig. 2a). Each transcription factor hit was reported with its HGNC identification number, hierarchical classification of human transcription factors (TFClass) family barcode, motif logo, and significance corrected for multiple hypothesis testing using an adjusted p value cut-off of 0.05 (Table 1, Supplementary Information). Each individual analysis yielded between 29 and 41 significantly enriched transcription factor families, each corresponding to more than 1000 (1083 and up to 1292, respectively) associated genes (Fig. 2b). In comparison to the entire realm of 1539 existing transcription factors, such wide-ranging data tables provide little benefit, despite the impressive p values produced by analysis tools at first glance. For example hypoxia inducible factor 1 alpha subunit (HIF1A, Gene ID: 4609, HGNC ID: 4910, TFClass: 1.2.5) of the PAS domain factors (TFClass 1.2.5) is detected with an adjusted p value below 1.0E-100 by AME in the ChIP-Seq data. Other members of the same family like the aryl hydrocarbon receptor nuclear translocator (ARNT, Gene ID: 405, HGNC ID: 700, TFClass: 1.2.5) show similar significance, since the detection is based on the same sequence logo, highlighting the lack of ability to differentiate between structurally homologous transcription factors. Therefore, we intersected all four sets of AME ChIP-Seq, TFT ChIP-Seq, URA transcriptomics, and TFT transcriptomics, and narrowed down 21 transcription factor families supported by all datasets (Fig. 2a). Despite a considerable improvement of 21 projected families among 110 existing transcription factor families, the final set maps back to 967 transcription factors. In part, such lack of specificity is due to the large family of more than 3 adjacent zinc finger factors (TFClass: 2.3.3), whose motif was detected by the analysis but contains 487 members, accounting for almost a third of all transcription factors (Fig. 3a). Systems biology networks and enrichment studies provide insight into directionality of the response and draw attention to different sized effector networks (Fig. 3b, c). Only few nodes of the transcription factor target network were hyperconnected and showed promoter association with multiple transcription factors in epigenomics and transcriptomics datasets (Figs. 4, 5). Such a high degree of network connectivity speaks to a synergistic effect, where selected master regulators cooperate and act in sync, resulting in robust transcriptional output.
Multi-omics integration of complementary data yields well-refined target network
In order to improve the detected output, gene-specific systems biology networks were employed. In particular, URA and TFT analysis fueled by transcriptomic data provide useful insight. Databases of gene sets rely in part on experimental data of gene-specific knockdowns to characterize the impact of a transcription factor on a target network. Furthermore, a consistent directional response amplifies the significance of a detected hit. Therefore, such directional, gene-specific networks have the ability to overcome ambiguities. For example, members of the JUN-related factors (TFClass: 1.1.1) are detected but show different signs of regulation depending on the factor of interest, the response of its target genes, and the change of expression of the factor itself. After intersection of all four datasets, 11 transcription factors belonging to 10 transcription factor families were determined (Fig. 2c). This set of master regulators is supported by complementary omics platforms and different analysis techniques representing a high-confidence cooperation network of the epigenomic master regulator KDM3A (Fig. 5). The cooperation network includes cancer associated factors Jun proto-oncogene, AP-1 transcription factor subunit (JUN, Gene ID: 3725, HGNC ID: 6204, TFClass: 1.1.1), CCAAT/enhancer binding protein beta (CEBPB, Gene ID: 1051, HGNC ID: 1834, TFClass: 1.1.8), myogenic differentiation 1 (MYOD1, Gene ID: 3091, HGNC ID: 7611, TFClass: 1.2.2), HIF1A, sterol regulatory element binding transcription factor 1 (SREBF1, Gene ID: 379, HGNC ID: 11289, TFClass: 1.2.6), MYC proto-oncogene, bHLH transcription factor (MYC, Gene ID: 4609, HGNC ID: 7553, TFClass: 1.2.6), androgen receptor (AR, Gene ID: 367, HGNC ID: 644, TFClass: 2.1.1), Sp1 transcription factor (SP1, Gene ID: 6667, HGNC ID: 11205, TFClass: 2.3.1), Meis homeobox 1 (MEIS1, Gene ID: 4211, HGNC ID: 7000, TFClass: 3.1.4), zinc finger E-box binding homeobox 1 (ZEB1, Gene ID: 6935, HGNC ID: 11642, TFClass: 3.1.8), and ELK1, ETS transcription factor (ELK1, Gene ID: 2002, HGNC ID: 3321, TFClass: 3.5.2) representing less than 0.8% of all possible transcription factors (Figs. 4d, 5). Thereby it represents a hyperconnnected network of networks surrounding an epigenomic cooperation event. The identified factors can further be surveyed at the level of basal expression or regulation in patient-derived tumor specimens in TCGA underlining elevated expression in tumor progression. The analysis validates previously reported associations and contacts implicated in chromatin remodeling and discovered newly identified cooperative interactions (Table 2). For the specific role of KDM3A in cancer, epigenomic and transcriptional cooperation with transcription factors is key. KDM3A cooperates with mitogenic basic helix-loop-helix factors including MYC, HIF1A, and SREBF1, and derives a lipogenic program from association with nuclear receptors like AR. Ultimately, by overlaying the motif-specific and genomic data produced through matched experiments, epigenomic events can be correlated with the transcriptomic effect of histone remodelers and transcription factors.
ChIP-Seq based approaches provide sequence resolution but detection of enriched transcription factor motifs is ambiguous and is most appropriately accomplished at the transcription factor family level to account for and include homologous factors. In contrast, transcriptomics studies provide directionality of regulation—transcriptional activation or repression upon epigenomic activity—an important aspect lacking in coordinate-based ATAC-Seq or ChIP-Seq experiments. Integration of different sequence, gene, or network-based approaches prioritizes high-fidelity cooperation partners in epigenomic regulation. Therefore, any combination of complementary data from sequence, gene, or network-based approaches is identified as desirable input for reliable regulatory systems biology analyses.
For the oncogenic nature and target specificity of an epigenomic master regulator, epigenomic and transcriptional cooperation with transcription factors is key. KDM3A is able to support initiation of transcription by its ability to specifically remove mono-methylation and di-methylation marks from the H3K9 residue leading to chromatin de-condensation.14 Transcriptionally silenced genes contain methylation marks on the H3K9 subunit18 and arrays of ChIP-Seq experiments monitoring H3K9 methylation marks revealed global histone demethylation effects of KDM3A. Combined assessment of histone demethylation events and gene expression changes indicated major transcriptional activation, suggesting that distinct oncogenic regulators, in particular transcription factors, may synergize with the epigenetic patterns controlled by KDM3A. Furthermore, the epigenetic factor was shown to cooperate with the androgen receptor to control prostate tissue-specific gene target networks introducing the concept of the epioncogene and transcriptional cooperation.19
While KDM3A is able to control chromatin accessibility, the mechanism by which it targets specific genes is of current interest and may influence understanding of epigenetic dysregulation in human disease. While several cancers exhibit deregulated KDM3A activity, in prostate adenocarcinoma it functions as a transcriptional coregulator with the androgen receptor.5,14,20,21 Such cooperative coactivation of the androgen receptor with KDM3A features a role for KDM3A as an active force in commencing oncogenesis in prostate epithelial cells. KDM3A is known to control the transcription and function of oncogenic transcription factors.22,23 However, an expanded study outlining the effects of perturbed KDM3A H3K9 demethylation upon human transcription factor response element recognition in cancer has so far been missing.
The cooperation network includes previously validated interactions of MYC, HIF1A, and AR in cancer but also highlights close association of KDM3A with transcriptional networks of factors rather studied in development and tissue differentiation like JUN, CEBPB, MYOD, SREBF1, SP1, MEIS1, ZEB1, or ELK1 (Table 1). By taking advantage of motif-specific target networks, KDM3A has the ability to induce glycolytic genes in urothelial bladder carcinoma.24,25 Epigenomic regulation of SREBF1 activity has been reported to stimulate lipogenesis, and SREBP1 regulates lipid accumulation and cell cycle progression in androgen independent prostate cancer cell lines.26,27,28 KDM3A regulates the transcriptional program of the AR, serves epigenomic master regulator by epigenomic and transcriptional cooperation of prostate adenocarcinoma.5 Despite some factors including the forkhead box (FOX) factors (TFClass: 3.3.1) family were frequently detected at the transcription factor family levels, lack of consistent overlap of epigenomic and transcriptomic data eventually excluded prominent cancer drivers like forkhead box A1 (FOXA1, Gene ID: 3169, HGNC ID: 5021, TFClass: 3.3.1), forkhead box M1 (FOXM1, Gene ID: 2305, HGNC ID: 3818, TFClass: 3.3.1), forkhead box O1 (FOXO1, Gene ID: 2308, HGNC ID: 3819, TFClass: 3.3.1), or forkhead box O3 (FOXO3, Gene ID: 2309, HGNC ID: 3821, TFClass: 3.3.1). Despite FOX factors are known to cooperate with nuclear hormone receptors (TFClass: 2.1.1), it is possible that the putative association with KDM3A steams from the fact that the closely cooperating AR frequently has FOX motifs nearby. Systematic, genome-wide surveys have elucidated that FOX motifs are adjacent androgen response elements (AREs),29,30 thereby facilitating cooperation at the level of transcription factors and promotion of prostate cancer progression.
Hundreds of transcription factors are significantly associated with each individual data analysis platform or high-throughput sequencing technology. Big data challenges can be overcome by systems biology analysis and integration of multi-omics data. Motif similarity is visualized by transcription factor family trees classifying superclass, class, and family of transcription factors (from inward to outward, Fig. 4) based on the characteristics of their DNA-binding domains. Single epigenomic or transcriptomic datasets examined by different analysis tools result in improved resolution but leave ambiguities. The footprints of cooperating transcription factors are found in cognate sequence motifs specific to their DNA binding domains. Such sequence motifs are more pronounced in events of cooperating epigenomic activity. Detected motif enrichment highlights the modularity, versatility, and efficacy of epigenomic cooperation, providing target specificity at genome-wide reach. The number of detected events in genome-wide epigenomic binding studies provides statistical power for sequence motif discovery and gene target enrichment. As a consequence, high-resolution epigenomic studies often arrive at multiple plausible solutions, though each suggested interaction or association may carry statistical significance. By intersecting complementary data platforms and analysis techniques, high-fidelity gene target networks involved in epigenomic and transcriptomic cooperation can be identified.
Within the regulome, epigenetic master regulators position themselves at the top of cellular hierarchies and control distinct phenotypic programs via reversible chemical modifications of chromatin, histone or nucleotide marks, without altering the core DNA sequence. Epigenetic oncogenes or tumor suppressors can arise when epigenetic master regulators are somatically activated or lost, and contribute to cancer initiation and progression.31 In cancer, such epigenetic master regulators are found at the top of regulatory hierarchies, particularly in pathways related to cellular proliferation, survival, fate, and differentiation. For the manifestation of a genomic or non-genomic aberration of an epigenetic master regulator, it is a necessity that its own activity is affected by somatic mutation, copy number alteration, expression levels, protein cofactors, or methylation status. Epigenetic master regulators often accomplish target specificity of their phenotypic program by cooperation with members of the transcriptional machinery and therefore may depend on tissue-specific expression of such auxiliary factors. In cancer, an epigenetic master regulator populates an extreme state and is either permanently switched on or off. An epigenetic master regulator will become a cancer driver, if it is not functionally neutral but rather contributes to tumorigenesis or disease progression due to its hyperactive or deactivated state. Genomic profiling of cancer patients has the ability to identify coincidence or mutual exclusivity of somatic alterations of epigenomic and transcription factors. Extreme states of epigenetic master regulators by somatic loss or gain of function in cancer may emphasize preexisting cooperative interactions with transcription factors, which may be subtle and difficult to detect under normal circumstances. A defined challenge in the field of epigenetic master regulators is to identify cancer-specific vulnerabilities in gene targets and biological pathways that are frequently and consistently perturbed under the control of an epigenetic driver.
In conclusion, the identification of transcriptional cooperation and regulatory hierarchies highlights the importance of epigenetic regulators in mitogenic control and their potential as therapeutic targets. Epigenetic regulators such as KDM1A, KDM3A, KDM5A, KDM6A, KDM7A, EZH2, DOT1L, and others have been shown to be critical in oncogenesis and cancer resistance.3,32,33 The discovery of the specific role of KDM3A in the interplay between a tissue-specific steroid receptor transcription factor and metabolic signaling provides a foundation for rational design of combination approaches where metabolic, epigenetic, and hormone-deprivation therapies may synergize. Our integrated multi-platform analysis reveals a complex molecular landscape of epigenomic and transcriptomic cooperation in cancer, providing avenues for precision medicine.34 A close teamwork of the transcriptional and epigenomic machinery was discovered, in which one component opens the chromatin, another recognizes gene-specific DNA motifs, and others scaffold between histones, cofactors, and the transcriptional complex. This highlights a close connection between the epigenomic and transcriptomic machinery, albeit much of the underlying principles remain to be discovered. In conclusion, transcriptomics in combination with epigenomic profiling and measurement of chromatin accessibility enable global detection of epigenetic modifications and characterization of transcriptional and epigenetic footprints. Chromatin remodelers and transcription factors are in close communication via recognition of post-translational histone modifications and coordinate the dynamic exchange of chromatin between open, transcriptionally active conformations and compacted, silenced ones. In cancer, due to the ability to team up with transcription factors, epigenetic factors concert mitogenic and metabolic gene networks, claiming the role of a cancer master regulators or epioncogenes. Exploration into the cooperative roles of epigenetic histone modifiers and transcription factor families in gene regulatory networks contributes to our understanding of how a seemingly promiscuous epigenomic program is converted into a specific transcriptional response assisting in oncogenesis.
Optimal experimental design mirrors different layers of the regulatory network and organizes sequencing assays in an array of coordinated experiments.4,5,35 Coordinated epigenomic and transcriptomic profiles are well-equipped to capture different regulatory levels governing the circuitry of a cooperative network. The workflow introduced in our approach is open and applicable to different epigenome and transcriptome profiles including ChIP-Seq, ATAC-Seq, RNA-Seq, and microarray experiments. Since the analysis primarily relies on differential expression, microarray or RNA-Seq data are equally applicable. Additionally, data on alternative transcripts may allow resolving regulation of splice-isoforms. For coordinated multi-omics analysis, it is important to integrate compatible datasets and look for matching conditions, each associated with presence or absence of a defined epigenomic event or factor. Genomic editing offers tools to conduct target-specific loss or gain of function studies such that an array of coordinated experiments can be assembled. Complementary epigenomic and transcriptomic data—on the one hand in form of significantly enriched genomic regions (epigenomic target regions), on the other hand as differentially expressed transcripts (target genes)—serves as input for four different analysis platforms (Figs. 1, 2).
Investigation of KDM3A as epigenetic switch in human cancer
KDM3A coordinates transcriptional activation by H3K9 demethylation thereby enabling chromatin accessibility and replacement of components of nucleosome-stalled polymerase complexes by tissue-specific transcription factors. The workflow is exemplified by reprocessing published data on KDM3A activity recorded with matching knockdown conditions in a human prostate carcinoma epithelial cellular model deposited in NCBI GEO entries GSE109748 and GSE7049829,36,37 (CRL-2505, American Type Culture Collection, Manassas, VA). The utilized cellular model is a variant derived from a xenograft and simulates castration-induced regression and relapse typical of human prostate carcinoma epithelial cells independent of dihydroxytestosterone stimulation. Furthermore, KDM3A is activated by somatic copy number amplification in lung, prostate, uterine, bladder, testicular germ cell, ovarian, cervical, breast, sarcoma, melanoma, and other cancers making its cooperation network an important target of broad interest in oncology.
Illumina HiSeq 2000 (Illumina, San Diego, CA) fastq files were aligned to the reference human genome 19 using the Bowtie software package.38 Peak-calling utilized a model-based analysis of ChIP-Seq (MACS) algorithm.39,40 Significant ChIP-Seq genomic locations relative to nearby gene bodies were annotated by ChIPSeek.41 ChIP-Seq peak regions were sorted and filtered by BEDtools.42 Average ChIP enrichment profiles over specific genomic features were calculated using the cis-regulatory element annotation system tool.43,44 ChIPSeq binding profiles were visualized in the integrative genomics viewer (IGV).45 Utilized conditions include ChIP-Seq profiles of antibodies specific for chromatin marks H3K9me1, H3K9me2, and the epigenomic modifier KDM3A in combination with small hairpin RNA (shRNA) knockdown of KDM3A matched with coordinated transcriptomic data of control and KDM3A knockdown cells using human transcriptome platform GPL10558 (HT-12 V4.0, Illumina, San Diego, CA).5 The epigenomic and transcriptomics datasets contained 77911 features and 4356 differentially expressed transcripts upon KDM3A knockdown, respectively, with p values and q values below 0.05 adjusted for multiple hypothesis testing.
Network analysis and transcription factor target enrichment
Human transcription factors were annotated according to their Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) identification number using the using the multi-symbol checker tool. Discovered transcription factors were classified by shared DNA binding domains according to the hierarchical classification of human transcription factors (TFClass) database.46 Transcription factor binding and promoter sites were annotated utilizing transcription factor databases.30,47 For transcription factor binding site searches we built manual or utilized deposited position site-specific matrices or sequence logos of curated, non-redundant transcription factor databases. Statistically significant enrichment of these transcription factor motifs was determined using find individual motif occurrences (FIMO) and motif enrichment tools of the motif-based sequence analysis toolkit (MEME) suite.48 Upstream regulators were determined by ingenuity pathway analysis (IPA, Qiagen, Redwood City, CA) based on differentially expressed genes with an adjusted p value below 0.05. Significant enrichment of target gene networks with consistent transcription factor motifs was calculated for all target genes with annotated transcription factor motifs in the 3′ promoter region of their transcription start sites.49
Data is deposited in NCBI GEO entries GSE109748 and GSE70498. Identified transcription factors and statistics are assembled in the Supplementary Information. All data supporting the findings of this study is openly available within the paper and the Supplementary Information deposited at the npj Systems Biology and Applications website. A preprint version of this manuscript is made available to the scientific community on the preprint server bioRxiv 309484.50
Supplementary Table 1–6 are compiled as Supplementary Information. Supplementary Table 1: Master regulators among epigenomic and transcriptomic cooperation network. Supplementary Table 2: Detection and hierarchical classification of human transcription factors. Supplementary Table 3: ChIP-Seq analysis of motif enrichment (AME). Supplementary Table 4: ChIP-Seq transcription factor target (TFT) analysis. Supplementary Table 5: Transcriptomics upstream regulator analysis (URA). Supplementary Table 6: Transcriptomics transcription factor target (TFT) analysis.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Practicing science is experiencing the creativity of life. Undertaking systems biology is bringing it to life. This work on cancer systems biology is generously supported by the National Institutes of Health and the National Science Foundation. Emilia Mili Filipp—fierce and fearless friend, protector and companion on all adventures of Franzi Violet Filipp and Leland Volker Filipp. F.V.F. is grateful for the support of grant CA154887 from the National Institutes of Health, National Cancer Institute and grant CRN-17-427258 by the University of California, Office of the President, Cancer Research Coordinating Committee, the Goethe Institute, Washington, DC, USA, and the Federal Foreign Office, Berlin, Germany.