Prediction of mammalian tissue-specific CLOCK–BMAL1 binding to E-box DNA motifs

Marri, Daniel; Filipovic, David; Kana, Omar; Tischkau, Shelley; Bhattacharya, Sudin

doi:10.1038/s41598-023-34115-w

Article
Open access
Published: 12 May 2023

Scientific Reports volume 13, Article number: 7742 (2023)

Abstract

The Brain and Muscle ARNTL-Like 1 protein (BMAL1) forms a heterodimer with either Circadian Locomotor Output Cycles Kaput (CLOCK) or Neuronal PAS domain protein 2 (NPAS2) to act as a master regulator of the mammalian circadian clock gene network. The dimer binds to E-box gene regulatory elements on DNA, activating downstream transcription of clock genes. Identification of transcription factor binding sites and genomic features that correlate to DNA binding by BMAL1 is a challenging problem, given that CLOCK–BMAL1 or NPAS2–BMAL1 bind to several distinct binding motifs (CANNTG) on DNA. Using three different types of tissue-specific machine learning models with features based on (1) DNA sequence, (2) DNA sequence plus DNA shape, and (3) DNA sequence and shape plus histone modifications, we developed an interpretable predictive model of genome-wide BMAL1 binding to E-box motifs and dissected the mechanisms underlying BMAL1–DNA binding. Our results indicated that histone modifications, the local shape of the DNA, and the flanking sequence of the E-box motif are sufficient predictive features for BMAL1–DNA binding. Our models also provide mechanistic insights into tissue specificity of DNA binding by BMAL1.

Introduction

All animals and plants have a robust time-keeping mechanism which enables them to anticipate and adapt to periodic changes in the environment. In mammals, this time-keeping mechanism, also known as the circadian system, is made up of a hierarchy of oscillators. A central clock in the suprachiasmatic nucleus of the hypothalamus coordinates peripheral clocks in multiple tissues¹. The intracellular gene regulatory network of both the central and peripheral circadian clocks involves a relatively small set of master transcription factors (TFs) interconnected through multiple negative and positive feedback loops². The core activators of the circadian network, Circadian Locomotor Output Cycles Kaput (CLOCK) and Brain and Muscle ARNT-Like 1 (BMAL1), are transcription factors from the basic helix–loop–helix (bHLH) family that form the heterodimeric complex CLOCK–BMAL1. In the absence of CLOCK, Neuronal PAS domain protein 2 (NPAS2), also a member of the bHLH-PAS transcription factor family, can compensate by forming a heterodimer with BMAL1 to regulate the circadian clock³. In the classical model of clock gene regulation, the CLOCK–BMAL1 or NPAS2–BMAL1 dimer binds to a hexanucleotide sequence known as the E-box motif (canonical sequence CANNTG, where N is any nucleotide) within the promoter or enhancer regions of clock-controlled genes to regulate their transcription^3,4. BMAL1 has also been shown to bind to E-box-like sequences, such as CACGTT in the promoter of the murine Per2 gene⁵. However, experimental support for genome-wide binding of BMAL1 to such sequences is lacking. Therefore, in this study we focus solely on the classical E-box with the sequence CANNTG. Alterations in the expression or binding activity of the core clock TFs disrupt natural circadian oscillations, and can lead to numerous pathologies including insomnia, cancer, cardiovascular disease, and metabolic disorder^6,7. 
Here we attempt to improve our understanding of gene regulation by the CLOCK–BMAL1 or NPAS2–BMAL1 complex and its perturbation using interpretable predictive models of DNA binding by the master regulatory factor BMAL1.

Genome-wide identification of transcription factor binding sites (TFBS) is a challenging problem. Typically, only a small fraction of classically defined sequence motifs for a particular TF are bound⁸. For example, the canonical E-box binding motif occurs more than 7 million times across the mouse genome, but less than 0.7% of these motifs are bound by CLOCK–BMAL1 or NPAS2–BMAL1 in mouse peripheral tissues⁹. Binding of a particular TF to its cognate DNA motif depends on several molecular features including the DNA sequence of the core motif, sequences flanking the core motif, chromatin accessibility, local shape of the DNA, presence of co-factors, histone modifications, DNA methylation, and other biophysical parameters^10,11,12,13. These features and their relative contribution to binding can vary greatly across cell and tissue types^14,15. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the current gold standard for assaying genome-wide TF binding locations¹⁶. However, assaying the binding of a given TF under various conditions and in different tissues is prohibitively expensive. As such, several predictive computational models of genome-wide TF-DNA binding have been developed. From these models, DNA sequence and chromatin accessibility emerge as the most important determinants of TF binding^17,18,19. Chromatin accessibility assays such as deoxyribonuclease hyper-sensitive sites sequencing (DNase-seq), and assay for transposase-accessible chromatin sequencing (ATAC-seq) have been used to improve TFBS prediction²⁰. Recently, improved model predictions for TF binding have been obtained by leveraging advancements in machine learning and specifically deep learning techniques^21,22,23. However, these models are often difficult to interpret and thus offer limited insights into the mechanisms governing the tissue specificity of TF-DNA binding.

In this study, we present interpretable machine learning-based models capable of predicting which canonical E-box motifs occurring in accessible chromatin regions of the mouse liver, heart, and kidney are likely to be bound by BMAL1. Our predictive models are based on the XGBoost machine learning algorithm²⁴, with logistic regression used as a baseline algorithm to evaluate model performance. Published data from a BMAL1 ChIP-seq study⁹ was used to train and evaluate the models. When considering which features to include in our predictive models, we noted that DNA shape²⁵ and histone modifications²⁶ have been shown to be efficient predictors of TF binding in addition to DNA sequence. Specifically, it has been proposed that TFs prefer specific 3D DNA conformations and not just specific sequences²⁷. For example, incorporation of DNA shape features led to improved model performance when predicting in vivo binding of TFs from the basic helix–loop–helix (bHLH) family²⁵. Particularly, five distinct shape features—electrostatic potential (EP), minor groove width (MGW), propeller twist (ProT), roll, and helix twist (HelT) have been shown to be useful for TF-DNA binding prediction²⁸.

Interpreting the structure of our models, we identified genomic and epigenomic features most predictive of BMAL1–DNA binding. Most of the flanking DNA sequence features showed low importance in predicting the binding of BMAL1, except the second flanking nucleotide upstream of the E-box motif in the liver. On the other hand, the histone modifications H3K27ac, H3K4me1, H3K4me3, and H3K36me3, together with the DNA shape features EP, Roll, and MGW, were significant predictors of BMAL1–DNA binding in all tissues, resulting in high-performing models. However, our cross-tissue predictive models showed that although BMAL1 binds certain DNA conformations and chromatin contexts with high specificity, these specificities vary across tissues.

Methods

ChIP-seq dataset preprocessing

Uniformly processed BMAL1 ChIP-seq peaks from the C57BL/6J mouse liver, kidney and heart were obtained from Gene Expression Omnibus under the accession code GSE110604⁹. BMAL1 ChIP-seq experiments were performed at Zeitgeber time 6 (ZT6). The locations of accessible chromatin regions in DNase I-hypersensitive (DHS) sites for all three tissues (DNase-seq) were obtained from the Encyclopedia of DNA Elements, ENCODE (Supplementary Materials). The DNase-seq experiments were performed on unsynchronized tissues. The Genome Reference Consortium Mouse Build 38 (GRCm38) was used as the reference genome. DHS sequences were processed in Python with BEDTools²⁹ to extract all E-box sequences (CANNTG) in accessible chromatin. E-box motifs in accessible chromatin regions but not overlapping their respective tissue ChIP-seq bed files were used as instances of unbound motifs (the negative dataset for the model). All accessible chromatin singleton E-boxes (instances of only one E-box motif under a BMAL1 peak) and E-boxes that were closest to the summit of the BMAL1 peaks for peaks with multiple E-boxes were labeled as bound (the positive dataset). All other E-boxes under BMAL1 peaks were considered ambiguous and ignored in further analysis. In total, 1175, 1082, and 663 E-boxes under BMAL1 peaks in the liver, kidney, and heart, respectively, were found to be ambiguous because their peaks contained multiple E-boxes. We extended each E-box motif sequence to include 4-basepair (bp) flanking sequences upstream and downstream of the E-box. Since the E-box motif sequence is a palindrome, the reverse complement was ignored. Each E-box, thus represented by a 14-nucleotide sequence (6-bp core plus 4-bp sequence on either end), was one-hot encoded. The binary (bound and unbound) E-box data produced highly imbalanced datasets, as there were far more unbound than bound E-boxes in the mouse accessible chromatin. Bound E-box motifs in the liver, kidney and heart numbered 3725, 3237 and 1313, respectively. 
Unbound E-box motifs in the liver, kidney and heart numbered 189,581, 262,053 and 291,840, respectively. Specifically, the negative samples outnumbered the positives by factors of 51 in the liver, 223 in the heart, and 82 in the kidney. These counts are consistent with previously reported numbers of E-box motifs and with the small percentage of those motifs that are bound.
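The motif extraction and encoding steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used BEDTools on DHS intervals); the function names `find_eboxes` and `one_hot`, the demo sequence, and the choice to encode all 14 positions rather than only the 10 variable ones are assumptions made for illustration.

```python
import re

# Lookahead pattern so that overlapping CANNTG matches are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
FLANK = 4          # bp added on each side of the 6-bp core, as in the text
ALPHABET = "ACGT"  # column order of the one-hot encoding (an assumption)


def find_eboxes(seq, flank=FLANK):
    """Return (core_start, 14-mer) for every CANNTG whose flanks fit in seq."""
    seq = seq.upper()
    hits = []
    for match in EBOX.finditer(seq):
        start = match.start()
        lo, hi = start - flank, start + 6 + flank
        if lo >= 0 and hi <= len(seq):
            window = seq[lo:hi]
            # Skip windows containing ambiguous bases such as N.
            if set(window) <= set(ALPHABET):
                hits.append((start, window))
    return hits


def one_hot(window):
    """Flatten a sequence into four 0/1 indicators per base (A, C, G, T)."""
    return [int(base == letter) for base in window for letter in ALPHABET]


# Hypothetical accessible-chromatin sequence containing one CACGTG E-box.
demo = "GGTTACGTCACGTGAAGCTTCC"
hits = find_eboxes(demo)
encoded = one_hot(hits[0][1])
```

Here each 14-nucleotide window yields 56 binary values; dropping the conserved CA and TG positions, as the models in this study do, would leave 40.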

DNA shape preprocessing

Because of the degrees of freedom of the DNA sugar phosphate backbone, neighboring base pairs and bases within a pair can vary their position relative to each other causing a change in the shape of the DNA either through rotation or translation. We used the R/Bioconductor package DNAshapeR³⁰ to estimate DNA shape features. The DNAshapeR algorithm predicts DNA shape features given a DNA sequence and encodes them in feature vectors. The feature vectors for each shape category were normalized to values between 0 and 1 by Min–Max normalization and placed in groups of 10 values for MGW, ProT and EP and groups of 11 values for HelT and Roll to be used as inputs for the predictive models. The number of bins for each shape feature is based on the length of the sliding window used to generate the features—5 bp for MGW, ProT, EP, and 6 bp for HelT and Roll.
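As a small illustration of the Min–Max step, the sketch below rescales one shape vector to the interval [0, 1]. The MGW values are invented, and whether normalization was applied per vector or across the whole dataset is not stated in the text, so the per-vector form shown here is an assumption.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector carries no contrast; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical minor groove width predictions (10 values for a 14-bp window).
mgw = [5.1, 4.8, 4.2, 3.9, 4.5, 5.6, 5.9, 5.3, 4.9, 5.0]
scaled = min_max_normalize(mgw)
```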

Histone modification preprocessing

We downloaded ChIP-seq data for five histone modifications, H3K27ac, H3K4me1, H3K4me3, H3K27me3 and H3K36me3, for mouse liver, kidney and heart tissues from ENCODE (Supplementary Materials). Histone modification ChIP-seq was performed on unsynchronized tissues. These histone modifications were chosen based on data availability for all tissues and their established roles in transcription factor binding. The corresponding bed files were used to generate signal profiles and heatmaps using deepTools³¹. From the profiles and heatmaps generated, we found that the histone modification ChIP-seq signals extended meaningfully over at most a 1.5-kb region (± 750 bp) centered on the E-box core motif. Using this 1.5-kb region, we extracted the histone modification features for the binary dataset for each tissue using bwtool³². The features were then divided into ten bins with the same number of nucleotides in each bin.
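The binning step can be sketched as follows, assuming a per-base signal vector of length 1500 has already been extracted for one E-box. The function name `bin_signal` and the synthetic signal are illustrative only; the actual extraction in this study was done with bwtool.

```python
def bin_signal(signal, n_bins=10):
    """Average a per-base signal into n_bins equal-width bins."""
    if len(signal) % n_bins != 0:
        raise ValueError("signal length must be divisible by n_bins")
    size = len(signal) // n_bins
    return [sum(signal[i * size:(i + 1) * size]) / size for i in range(n_bins)]


# Synthetic 1.5-kb signal: a step function that rises by 1 every 150 bp.
signal = [float(i // 150) for i in range(1500)]
bins = bin_signal(signal)
```

With a 1.5-kb window and ten bins, each bin covers 150 bp, so every histone modification contributes ten values per E-box.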

Machine learning models

XGBoost

Extreme Gradient Boosting (XGBoost) is an ensemble learning method based on boosted trees for classification and regression²⁴. We used up to 20 features as inputs for each E-box motif: ten sequence features (one for each nucleotide), five DNA shape features, and five histone modification features. The first two and last two nucleotides of the E-box motif were treated as fixed and not encoded as features because they are the same in all motifs. Using the Scikit-learn library, we tuned the following hyperparameters to reduce overfitting: the number of boosting iterations in training (n_estimators), the minimum sum of sample weights required in a leaf node (min_child_weight), the maximum depth of each tree (max_depth), the sampling rate of the training set in each iteration (subsample), the learning rate (learning_rate), and the feature sampling rate when constructing each tree (colsample_bytree). Hyperparameter tuning of the XGBoost model was performed through a grid search of the hyperparameter space with the following values: n_estimators = {30, 40, 50, 60, 70, 80, 90, 100}, min_child_weight = {1, 2, 3, 4, 5, 6}, subsample = {0.5, 0.6, 0.7, 0.8, 0.9, 1}, max_depth = {1, 2, 3, 4, 5}, learning_rate = {0.1, 0.2, 0.3, 0.4, 0.5} and colsample_bytree = {0.6, 0.7, 0.8, 0.9, 1}, giving 36,000 possible hyperparameter combinations. In addition to tuning the hyperparameters, we evaluated model performance using fivefold cross-validation on predicting the binding status of E-box motifs in accessible chromatin.
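The size of this search space can be checked with a short sketch that writes the grid as a plain dictionary and enumerates it with the standard library. A dictionary of this shape is also what scikit-learn's `GridSearchCV` accepts as `param_grid`; the variable names below are illustrative, and no model is trained here.

```python
from itertools import product

# Hyperparameter values as listed in the text.
param_grid = {
    "n_estimators": [30, 40, 50, 60, 70, 80, 90, 100],
    "min_child_weight": [1, 2, 3, 4, 5, 6],
    "subsample": [0.5, 0.6, 0.7, 0.8, 0.9, 1],
    "max_depth": [1, 2, 3, 4, 5],
    "learning_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
    "colsample_bytree": [0.6, 0.7, 0.8, 0.9, 1],
}

# Number of grid points is the product of the list lengths.
n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)

# Each grid point is one full hyperparameter setting.
first_setting = dict(zip(param_grid, next(product(*param_grid.values()))))

# Every setting is fitted once per fold under fivefold cross-validation.
n_fits_fivefold = n_combinations * 5
```

An exhaustive search with fivefold cross-validation therefore implies 180,000 model fits per tissue and feature set.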

Logistic regression

Logistic regression is a parametric classification model that estimates the probability that an observation belongs to a given class³³. It is commonly used as a baseline for machine learning-based classification models. In this study, we tuned the following logistic regression hyperparameters to reduce overfitting: the optimization algorithm used to fit the model (solver) and the maximum number of iterations allowed for the solver to converge (max_iter).

Results

BMAL1 binds most frequently to the CACGTG E-box motif in all tissues

BMAL1 is known to bind to E-box motifs, and these motifs are considered to have a consensus sequence of CANNTG³⁴. Therefore, we scanned the mouse mm10 reference genome and identified instances of the canonical E-box motif (CANNTG). Because we enumerated all possible combinations of the two central nucleotides, the reverse complement of each E-box was covered, but a particular E-box and its reverse complement were counted separately. For each DNase-seq dataset obtained from ENCODE for C57BL/6J mouse tissues (liver, heart, and kidney), we found the subset of E-boxes overlapping DNase-seq hypersensitive sites (DHS), i.e., E-boxes in accessible chromatin. Tissue-specific lists of E-boxes in accessible chromatin were then compared with their tissue-matched BMAL1 ChIP-seq peaks⁹ to extract all BMAL1-bound and unbound E-boxes in accessible chromatin. Additionally, we found instances where BMAL1-bound E-boxes were not located in accessible chromatin (0.8% of all peaks). We excluded these E-boxes from model training and evaluation, to avoid confounding between the two classes of bound E-boxes. First, we compared occurrences of BMAL1-bound E-boxes in accessible chromatin across liver, heart, and kidney and observed that they were highly tissue-specific, with only 398 E-boxes bound in common in all three tissues (Fig. 1A,B). E-boxes bound in all three tissues were often found in promoters of core circadian clock genes (results not shown). Next, we counted all instances of the canonical E-box motif (CANNTG) in the mouse genome, where N represents any nucleotide type. The canonical E-box includes 16 distinct E-box types, one for each possible NN dinucleotide in the center of the motif. We computed the fraction of each individual E-box type compared to the total number of E-boxes (Fig. 1C). The E-box types CACATG and CATGTG represented the highest fraction of E-boxes in the mouse genome, jointly comprising 17.3% of all instances. 
These two motifs are the reverse complements of each other, and like all other non-palindromic E-boxes that are reverse complements of each other, the two show roughly equal frequencies. Interestingly, the palindromic BMAL1-preferred E-Box motif, CACGTG, occurs the fewest number of times (1.83% of all instances) in the mouse genome (Fig. 1C).
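The relationship between the 16 E-box types, their reverse complements, and the palindromic subset can be made concrete with a short enumeration. This is an illustrative sketch using only the CANNTG definition given above; the variable names are not from the study.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# All 16 CANNTG types, one per central dinucleotide.
ebox_types = ["CA" + "".join(nn) + "TG" for nn in product("ACGT", repeat=2)]

# Palindromic types equal their own reverse complement.
palindromes = [e for e in ebox_types if reverse_complement(e) == e]

# Non-palindromic types group into reverse-complement pairs.
rc_pairs = sorted(
    {tuple(sorted((e, reverse_complement(e))))
     for e in ebox_types if reverse_complement(e) != e}
)
```

The enumeration gives four palindromic types (CAATTG, CACGTG, CAGCTG, CATATG) and six reverse-complement pairs, including CACATG/CATGTG, which is why paired types are expected to occur at roughly equal genome-wide frequencies.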

We then applied the same procedure to E-boxes in accessible chromatin of mouse liver, kidney, and heart. The palindromic motif CAGCTG was the most common E-box type across accessible chromatin regions in all three tissues, while the BMAL1-preferred E-Box CACGTG was among the three least common motifs which were all palindromes (Fig. 1D). We then used the overlap between the BMAL1 ChIP-seq and DNase-seq peaks to compute the percentage of BMAL1-bound E-boxes in the mouse liver, kidney, and heart, relative to the total number of E-boxes of the same type in accessible chromatin of their respective tissue. The BMAL1-preferred E-box CACGTG was the most frequently bound E-box type across all three tissues. In addition, about 18% of CACGTG E-boxes accessible in the liver were also bound in the liver, and for the kidney and heart these fractions were 15%, and 4%, respectively. Furthermore, less than 20% of all individual E-boxes found in accessible chromatin in any particular tissue were also bound in that same tissue (Fig. 1E). The kidney and heart had a higher number of E-boxes in accessible chromatin when compared to the liver. However, the liver had a higher number of BMAL1-bound E-boxes.

We observed instances where there were none (zero), exactly one (singleton) and two or more (multi) E-box motif(s) under a single BMAL1 ChIP-seq peak in all tissues (Fig. 1F). We then extracted all singleton E-boxes and E-boxes closest to the summit of the BMAL1 peak within multi-E-box peaks and labeled these as bound (positive dataset). The E-Boxes in accessible chromatin that were not bound by BMAL1 were labelled as unbound (negative dataset). All other E-boxes were left out from further analysis. The ratios of the positive to negative datasets were 1:51, 1:82, and 1:223 in liver, kidney, and heart, respectively.

Together, these results indicate that BMAL1 likely interacts, in a tissue-specific manner, with multiple different E-box types across the liver, kidney, and heart, with CACGTG being the most highly associated with BMAL1 binding.

Predicting genome wide BMAL1 binding within tissues

Nucleotides flanking the E-box have been shown to affect the binding specificity of E-box-binding TFs¹⁸. Therefore, we extended the genomic sequence of all BMAL1-bound (positive) and unbound (negative) E-boxes by 4 bp upstream and downstream of the E-box and one-hot encoded it (Fig. 2A). Additionally, we computed the following DNA shape features for the extended 14-bp sequence: electrostatic potential (EP), minor groove width (MGW), propeller twist (ProT), roll, and helix twist (HelT), using the k-mer + k-shape (k = 1) sequence feature model¹³ (Fig. 2A). Even though the shape features are derived from DNA sequence, they can potentially capture higher-order interdependencies between neighboring nucleotides and thus add extra information to the model input. DNA shape features can also explain the importance of flanking sequence in TF-DNA binding specificity¹⁸. Visualization of the DNA shape features EP, ProT, and Roll showed differences in DNA shape between the bound and unbound motifs across the liver, kidney, and heart, while the MGW feature showed a difference between the bound and the unbound motifs for the kidney only (Supplementary Figs. 1–3). The shape feature vector for each category was then normalized to values between 0 and 1 using Min–Max normalization and binned in groups of ten values for the DNA shape features EP, MGW and ProT, and groups of 11 values for HelT and Roll. These normalized DNA shape feature vectors were used as input features for the predictive models as shown in Fig. 2A.

Epigenetic modifications are also known to influence transcription factor binding. Specifically, histone modifications are involved in regulation of transcription factor occupancy and subsequent regulation of gene expression^25,35. Histone modification ChIP-seq binding signal values, across the genomic regions spanning ± 750 bps around the E-box were used to compute feature vectors for five histone modifications: H3K27ac, H3K4me1, H3K4me3, H3K27me3 and H3K36me3. The ± 750-bp region was chosen to consider local profiles of histone modifications around the size of a typical promoter or enhancer. The histone feature vector was binned into ten bins with the signal strength averaged across 150 bps of each bin (Fig. 2A)³⁶.

We implemented three different models using subsets of the final encoded feature set: (i) DNA sequence-only; (ii) DNA sequence and DNA shape (sequence + shape); and (iii) DNA sequence, DNA shape, and histone modification (sequence + shape + HM)³⁶. We used two machine learning algorithms to predict the binding status of E-boxes in accessible chromatin. XGBoost²⁴ was our principal predictive algorithm, and we compared its performance with that of a baseline logistic regression model. Using grid search and stratified fivefold cross-validation, we tuned model hyperparameters and derived the optimal hyperparameters for each model based on the liver, heart, and kidney datasets. The model with the optimal hyperparameters was trained through stratified fivefold cross-validation and used to predict the binding of BMAL1 to the E-boxes in the liver, heart and kidney, and the average performance across the five folds was reported (Fig. 2B,C). Model performance was evaluated using two metrics, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), which showed that XGBoost outperformed logistic regression (Table 1, Supplementary Figs. 4, 5).

Table 1 Model performance scores: the performance of models predicting BMAL1–DNA binding status in open chromatin of the liver, kidney, and heart using XGBoost and logistic regression.

DNA shape and histone modification features improve within tissue model performance

Performance of sequence-only models

In order to derive mechanistic insights, we developed two interpretable machine learning models based on logistic regression (LR) and XGBoost algorithms. LR was used as our baseline model. We trained and validated our XGBoost classifier on the liver, heart, and kidney with ten sequence features comprising two central nucleotides of the E-Box and an additional four flanking nucleotides up and down- stream of the E-box (NNNNCANNTGNNNN where the conserved CA and TG subsequences are not included). We calculated the average AUROC and AUPRC scores for each tissue using stratified fivefold cross-validation. The AUPRC can be considered a more appropriate metric in our case, given the unbalanced distribution in the two classes—bound vs unbound E-boxes. The mean AUROC scores were 0.71, 0.78, and 0.80 for the liver, kidney, and heart respectively (Fig. 3A), with corresponding mean AUPRC scores of 0.09, 0.10, and 0.06, respectively (Fig. 3B). The relatively high AUROC and AUPRC scores across all tissues suggest differences in the two central nucleotides and flanking sequence between BMAL1 bound and unbound E-boxes. However, there does not appear to be sufficient information in DNA sequence alone for a robust prediction.
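To make the point about class imbalance concrete, the sketch below implements both metrics in plain Python and computes the positive fraction of the liver dataset from the counts reported in the Methods. A classifier that ranks at random has an expected AUROC of 0.5 but an expected AUPRC close to the positive fraction, which is the natural baseline for reading the AUPRC scores. The step-wise average precision used here is one common AUPRC estimate and may differ from the estimator used in this study; the toy labels and scores are invented.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Tied scores are ranked in input order (a simplification).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example: 2 bound and 6 unbound E-boxes with invented scores.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_auroc = auroc(labels, scores)
toy_ap = average_precision(labels, scores)

# Positive fraction in the liver dataset (counts from the Methods).
liver_baseline = 3725 / (3725 + 189581)
```

For the liver, the positive fraction is about 0.019, so the sequence-only AUPRC of 0.09 is several-fold above chance even though it is low in absolute terms.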

Performance of sequence + shape models

The three-dimensional structure of DNA gives rise to specific local conformations. Features to quantify DNA shape have been derived computationally using Monte Carlo simulations from local DNA sequence^28,37. Five DNA shape features, electrostatic potential (EP), minor groove width (MGW), propeller twist (ProT), roll, and helix twist (HelT), were found to contribute to the binding affinity of transcription factors from the basic helix–loop–helix (bHLH) family¹³. We combined the DNA shape feature matrix with the sequence features as input for the model, to evaluate the contribution of DNA shape to BMAL1 binding. The mean AUROC scores were 0.97, 0.98, and 0.98 for the liver, kidney, and heart, respectively (Fig. 3A), all markedly higher than those of the sequence-only models. Compared to the sequence-only model, the mean AUPRC metric increased sharply from 0.09 to 0.79 for the liver, 0.10 to 0.51 for the kidney, and 0.06 to 0.71 for the heart (Fig. 3B), suggesting significant differences in local DNA shape features between the bound and unbound E-boxes. Inspection of feature importance revealed that the EP, Roll and ProT DNA shape features contributed 33% to the prediction of BMAL1 binding to the E-boxes in the liver. For the kidney, the EP, ProT and MGW DNA shape features contributed 68% to prediction of BMAL1 binding, while in the heart, EP, Roll and MGW contributed 70% to the prediction. Overall, the EP, Roll, MGW and ProT DNA shape features had the biggest influence on prediction of bound E-boxes across all three tissues (Supplementary Fig. 6). We also trained and evaluated DNA shape-only models; however, their performance was lower than even that of the DNA sequence-only models, suggesting that the local shape or configuration of the DNA near the E-box by itself is not sufficient to predict BMAL1 binding (results not shown).

Performance of sequence + shape + histone modification (HM) models

HMs in gene promoter and enhancer regions are known to be correlated with transcription factor (TF) binding³⁸. However, the mechanisms of interaction between TF binding and HMs are not fully understood. Recent studies have shown that the extent to which HMs improve the performance of models predicting TF binding is TF-specific, with models of bHLH transcription factor binding showing significantly improved accuracy when HMs are included^39,40. Based on these findings, several models have been developed to improve TF binding prediction using results from epigenetic assays^41,42. We examined the importance of HMs in prediction of BMAL1 binding by adding five histone features (H3K27ac, H3K4me1, H3K4me3, H3K27me3 and H3K36me3) to the sequence and DNA shape feature matrix. These HM features were chosen based on data availability and their roles in transcription factor binding described in the literature⁴⁰. Using models incorporating HM features, we obtained mean AUROC scores of 0.99, 0.988 and 0.99 for the liver, kidney, and heart, respectively (Fig. 3A). The mean AUPRC performance increased significantly to 0.95, 0.65 and 0.79 for the liver, kidney, and heart, respectively (Fig. 3B).

Feature importance reveals tissue-specific BMAL1 binding grammar

Given the improved performance of the sequence + shape + HM models, we used the ELI5 permutation importance method⁴³ to identify features most predictive of BMAL1–DNA binding. The importance for each DNA shape and histone modification feature was calculated as the sum of the importance of all bins for that particular feature. The feature importance of each nucleotide type at a particular position relative to the E-Box motif was normalized to the sum of all feature importance at that nucleotide position. The immediate flanking sequences upstream and downstream of the core E-box binding motif were important predictors of BMAL1 binding in the liver, heart and kidney as compared to distal flanking sequences (Fig. 4). Analysis of the binding specificities of the bHLH transcription factors CBf1 and Tye7 in yeast has previously shown that 2-bp flanking sequences contribute to binding of these transcription factors to the E-box¹⁸. In our quantitative analysis of the E-box sequence, we did not find the two central base pairs of the CANNTG E-box motif to directly contribute to the model performance across the three mouse tissues, even though BMAL1 has a strong preference for the CG central dinucleotide across all three mouse tissues. Analysis of the feature weights showed the nucleotide G at the second proximal upstream flanking sequence (Seq-2) to be a strong predictor of BMAL1–DNA binding in the liver (Fig. 4A). This nucleotide accounted for more than 50% of feature weights used in predicting BMAL1–DNA binding in the liver. Other contributing features included EP (10%) and H3K27ac (6%). Most of the DNA shape and histone modification features had weights greater than 5% indicating their importance in predicting BMAL1 DNA binding in the liver, while most of the DNA sequence features except Seq-2 had a feature weight of less than 5%. In the kidney, H3K27ac had the highest feature importance, contributing 21% to the overall feature importance (Fig. 4B). 
EP followed with a feature importance of 19%. Three histone modifications (H3K27ac, H3K4me3 and H3K4me1) and four DNA shape features (EP, ProT, MGW and Roll) all had feature weights > 5%. In the heart, H3K27ac and H3K4me3 had the highest feature importance (both > 20%) followed by EP (8%). Most of the DNA sequence features had weights < 5% in both heart and kidney. The histone modifications H3K27ac, H3K4me1, H3K4me3 and DNA shape features EP, and Roll showed high importance scores across all three tissues (Fig. 4A–C). The histone modifications with the largest contribution to BMAL1 binding were H3K27ac, H3K4me1 in all three tissues, and H3K4me3 and H3K36me3 in the kidney and heart. These results show that the combination of the TF binding motif and its flanking sequence, local shape of DNA, and histone modifications is sufficient to produce predictive models of BMAL1 binding to E-box motifs, especially in the mouse liver. The second upstream flanking nucleotide (Seq-2) had by far the highest feature importance score in the liver. The nucleotides G and C were overrepresented at the second proximal upstream flanking sequence of the liver bound E-box motifs. This was supported by the sequence logo of the bound E-box sequence with 4 bps upstream and downstream of the core E-box motif (Fig. 4D). Analysis of the bound E-box motifs along with their upstream and downstream flanking sequence revealed that the nucleotide G is enriched at the third position of the 5’ flanking region (1228 out of 3374 bound E-boxes in the liver) (Fig. 5A,B). This was not the case for bound E-box motifs in the kidney and heart.
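The two aggregation rules described at the start of this section (summing bin-level importances per shape or histone feature, and normalizing nucleotide importances within a sequence position) can be sketched as below. The feature naming scheme `FEATURE_bin` and all numeric values are hypothetical; the actual per-feature importances came from ELI5 permutation importance.

```python
from collections import defaultdict


def aggregate_bins(importances):
    """Sum per-bin importances into one score per feature.

    Keys are assumed to look like 'EP_3' or 'H3K27ac_7' (feature, bin index).
    """
    totals = defaultdict(float)
    for name, value in importances.items():
        feature = name.rsplit("_", 1)[0]
        totals[feature] += value
    return dict(totals)


def normalize_position(importance_by_base):
    """Scale the four nucleotide importances at one position to sum to 1."""
    total = sum(importance_by_base.values())
    if total == 0:
        return {base: 0.0 for base in importance_by_base}
    return {base: value / total for base, value in importance_by_base.items()}


# Invented bin-level importances for two features with two bins each.
bin_importances = {"EP_0": 0.02, "EP_1": 0.03, "H3K27ac_0": 0.05, "H3K27ac_1": 0.01}
per_feature = aggregate_bins(bin_importances)

# Invented importances of A, C, G, T at one flanking position.
seq2 = normalize_position({"A": 1.0, "C": 1.0, "G": 2.0, "T": 0.0})
```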

Cross-tissue models highlight differences in BMAL1–DNA binding in different tissues

To test the hypothesis that the DNA binding of BMAL1 is determined by similar factors across the three tissues, we developed cross-tissue models for binding prediction with features based on (a) sequence only; (b) sequence plus DNA shape; and (c) sequence plus DNA shape plus histone modifications. We trained these models on all data available in the respective tissue, using the optimal hyperparameters previously derived for the respective within-tissue model. Trained models were used to predict BMAL1 binding in a different tissue. Performance of the sequence-only models trained on tissue X and predicting tissue Y (X_Y model) was similar to the performance of the within-tissue sequence-only model in tissue X, for all tissues (Fig. 6A, blue bars). Surprisingly, the addition of the DNA shape and HM features resulted in decreased performance scores across all cross-tissue models relative to the sequence-only models (Fig. 6A–C). The sequence plus shape model trained on the liver data was able to correctly classify 22% of the E-boxes bound in both kidney (liver_kidney) and heart (liver_heart) (Fig. 6C, brown bars). This model predicted most of the bound E-boxes in the kidney and heart as unbound, yielding a high false negative rate. The addition of histone modification features improved the AUROC and AUPRC for most cross-tissue models (Fig. 6A,B, green bars). However, the cross-tissue sequence plus DNA shape plus HM model trained on the liver data correctly classified only 18% of E-boxes bound in the kidney and 19% in the heart, giving an even higher false negative rate than the sequence plus shape model.

Interestingly, models trained on kidney and heart and evaluated on the liver (kidney_liver and heart_liver) displayed a dramatic increase in performance with the sequence plus DNA shape plus HM models compared to other types of models. The AUROC performance of the kidney_liver model increased from 0.68 for the sequence plus DNA shape model to 0.83 for the sequence plus DNA shape plus HM model, while the AUPRC score increased sharply from 0.055 to 0.38. In summary, while sequence-only models performed relatively poorly in cross-tissue binding prediction, adding genomic features like DNA shape and epigenetic features like histone modifications generally decreased the performance even further in most cross-tissue models. These results highlight the tissue specificity of BMAL1 DNA binding.
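
For readers less familiar with the two metrics quoted above, the following is a small pure-Python sketch of AUROC (in its rank-based, Mann-Whitney form) and of average precision, which is one common estimator of AUPRC. The function names and toy data are assumptions for illustration; the paper does not state in this excerpt which implementation was used to compute its scores.

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy, made-up data: 2 bound sites among 10 candidates.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

prevalence = sum(y_true) / len(y_true)
print(f"AUROC: {auroc(y_true, scores):.4f}")
print(f"average precision: {average_precision(y_true, scores):.4f}")
print(f"prevalence (approximate chance-level AUPRC): {prevalence:.2f}")
```

A classifier that ranks at random has an expected AUROC of 0.5, whereas its expected AUPRC is roughly the fraction of positives in the data. AUPRC values such as 0.055 and 0.38 are therefore best read against the proportion of bound E-boxes in the test tissue, which this excerpt does not report.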

Discussion

Identification of transcription factor (TF)-DNA binding determinants can improve our understanding of gene regulatory grammar⁴⁵. However, precise DNA-binding sequences and the amount of flexibility in these sequences are currently unknown for many TFs, including BMAL1, a master regulator of the circadian clock. Features other than the simple DNA binding site sequence clearly contribute to usage of a DNA sequence as a TF binding site. While the architecture of the core clock gene regulatory network in the suprachiasmatic nucleus of the brain is believed to be similar to the architecture in peripheral tissues, clock-controlled gene expression is largely tissue-specific⁹,⁴⁶. Here we used XGBoost, an ensemble decision tree-based machine learning algorithm, to predict the binding of BMAL1 to its putative binding motif (the E-box) in three mouse tissues: liver, heart and kidney. We developed three different types of models: (1) sequence-only, (2) sequence plus DNA shape, and (3) sequence plus DNA shape plus histone modifications (Fig. 2A,B).

The CACGTG E-box type occurred less frequently than any other E-box type, both in the whole mouse genome and in accessible chromatin regions of all three tissues (Fig. 1C,D). However, this E-box type was the most frequently bound by BMAL1 (Fig. 1E). This is consistent with the observation that the BMAL1-preferred binding motif is CACGTG⁴⁷. Interestingly, even though the dinucleotide CG was over-represented at the center of the BMAL1-bound E-boxes, these nucleotides did not enhance model performance. Additionally, the heart had more E-boxes in accessible chromatin than the liver and kidney, but had the lowest number of bound E-boxes in accessible chromatin (Fig. 1D,E). The role of circadian rhythms in the heart is not well understood, and only 6% of protein coding genes in the mouse heart are circadian-regulated as compared to 11–16% in the liver⁴⁸. Most likely, this is a consequence of lower overall BMAL1 binding in the heart. However, the low level of BMAL1 binding to otherwise accessible E-boxes in the heart remains to be resolved. One possibility is that a heart-specific E-box binding factor interferes with BMAL1 binding to E-boxes. For example, it has been shown that elevated levels of Usf1, a ubiquitous TF, can interfere with the binding of a mutant CLOCKΔ19:BMAL1 to E-box sites⁴⁹, and other such factors likely exist. Interestingly, neither kidney nor heart within-tissue models achieved the same level of performance as the liver within-tissue model (Fig. 3A,B). There is evidence that the heart circadian rhythm might be phase-shifted when compared to the liver, indicating that maximal BMAL1 binding in the heart might occur at a different time than the time shown to result in maximal BMAL1 binding in the liver, Zeitgeber time 6 (ZT06)⁹. Since the same time was used in the heart BMAL1 ChIP-seq experiment, this could result in some of the heart BMAL1-bound E-boxes being labeled as unbound, which would affect model learning and be reflected in lower model performance, as observed. A limitation of our work is that we considered only E-boxes in accessible chromatin and disregarded inaccessible E-boxes. Our observations confirmed that on average more than 75% of BMAL1 peaks lie in accessible chromatin; BMAL1 is therefore more likely to bind in accessible chromatin. However, it has been demonstrated that BMAL1-CLOCK can act as a pioneering factor and rhythmically control the accessibility of chromatin surrounding the BMAL1-bound sites⁵⁰.
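
The tallying of E-box types discussed above can be sketched in a few lines of Python. This is a minimal illustration that scans a sequence for the CANNTG pattern and counts each 6-bp core; the function names and the toy sequence are hypothetical, and the sketch omits the restriction to accessible chromatin and the overlap with ChIP-seq peaks that the actual analysis relies on.

```python
import re
from collections import Counter

# Lookahead so that overlapping E-boxes (e.g. CACATGTG) are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (start, core) for every CANNTG match, 0-based starts."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


def count_ebox_types(seq):
    """Count how often each 6-bp E-box core occurs in the sequence."""
    return Counter(core for _, core in find_eboxes(seq))


# Toy, made-up sequence (not genomic data).
toy_seq = "TTCACGTGAACAGCTGTTCACATGTGAA"

for start, core in find_eboxes(toy_seq):
    print(start, core)
print(count_ebox_types(toy_seq))
```

Because the reverse complement of CANNTG is again of the form CANNTG, scanning one strand finds every site, although a non-palindromic core such as CACATG reads as CATGTG on the opposite strand. Whether such pairs are counted as one E-box type or two is a design choice this sketch leaves open.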

Recent studies have shown that DNA shape computed using core TF binding motifs and their flanking sequences improves TF binding prediction for many human TFs¹³,²⁵,⁵¹. Additionally, DNA topology is highly correlated with the structure and stability of the nucleosome, suggesting that topological changes can influence the binding of TFs to DNA⁵². In our sequence plus shape models the EP, Roll, MGW and ProT DNA shape features had the highest influence on prediction of bound E-boxes (Supplementary Fig. 6). A recent study¹³ showed that for Max, a basic helix–loop–helix (bHLH) TF like BMAL1 and CLOCK, it was Roll and ProT that were the dominant determinants of TF-DNA binding affinity. These observations agree with our findings. Further, in agreement with previous studies, we found that DNA shape features by themselves do not enhance model accuracy.

Analysis of feature importance of our sequence plus shape plus histone modifications (HMs) models showed that the HMs H3K27ac and H3K4me3, and the DNA shape feature EP, dominated binding prediction in the kidney and the heart and were also ranked highly in the liver. It has been shown that H3 acetylation and methylation modifications surrounding CLOCK–BMAL1 bound sites change in a rhythmic fashion⁵³. Our results demonstrate that even with only a snapshot of these HMs, i.e., a single ChIP-seq experiment from mice that are not light/dark synchronized, we can discern which E-boxes are bound and which are not with high levels of sensitivity. We propose that this is likely due to the information that is encoded in the shape and flanking sequence of the E-box motif in addition to the average levels of histone acetylation and methylation. Furthermore, we propose that this information is tissue specific, as evidenced by the performance of our cross-tissue models. Intriguingly, the DNA sequence features by themselves had little to no effect on binding prediction in the kidney and heart. However, the second nucleotide upstream of the E-box made a large contribution to predicting BMAL1–DNA binding in the liver. The nucleotide G in this position contributed about 50% of the feature importance score in the liver. Analysis of the bound E-box motifs with their upstream and downstream flanking sequences revealed that the nucleotide G at the third position of the 5′ flanking sequence is enriched in bound E-box motifs in the liver. Since the heart and kidney models do not rely on this feature, it is understandable that the liver_kidney and liver_heart cross-tissue models show an unexpected decrease in performance when DNA shape and histone modification features are added to the sequence features. On the other hand, the kidney_liver and heart_liver cross-tissue models show a boost in performance with the addition of histone modification features. These results suggest that there is some degree of commonality in BMAL1 binding between different tissues. However, among cross-tissue models the sequence-only models exhibit the most robust performance, with the exception of the kidney_heart and heart_kidney models, indicating that DNA shape and chromatin context features can exhibit a high degree of tissue specificity and are more similar between kidney and heart than they are between liver and the other two tissues.

E-box binding specificity of the yeast bHLH TFs Cbf1 and Tye7 is governed by sequences flanking the E-box as reflected in DNA shape⁵⁴. Our findings extend this concept, indicating that not only might DNA shape and chromatin context confer different binding specificities to different TFs in the same tissue, but that they might also confer different binding specificities to the same TF in different tissues.

Data availability

The ChIP-seq datasets used for this study are available in GEO (accession number GSE110604) from a previous study⁹. Accession numbers for the DNase-seq and histone modification datasets are included in the supplementary material. The code used for the machine learning modeling is available at https://github.com/BhattacharyaLab.

References

1. Ko, C. H. & Takahashi, J. S. Molecular components of the mammalian circadian clock. Hum. Mol. Genet. 15(Spec No 2), 271–277 (2006).
2. Takahashi, J. S., Hong, H. K., Ko, C. H. & McDearmon, E. L. The genetics of mammalian circadian order and disorder: implications for physiology and disease. Nat. Rev. Genet. 9, 764–775 (2008).
3. Landgraf, D., Wang, L. L., Diemer, T. & Welsh, D. K. NPAS2 compensates for loss of CLOCK in peripheral circadian oscillators. PLoS Genet. 12, e1005882 (2016).
4. Cox, K. H. & Takahashi, J. S. Circadian clock genes and the transcriptional architecture of the clock mechanism. J. Mol. Endocrinol. 63, R93–R102 (2019).
5. Yoo, S. H. et al. A noncanonical E-box enhancer drives mouse Period2 circadian oscillations in vivo. Proc. Natl. Acad. Sci. USA 102, 2608–2613 (2005).
6. Kathiresan, S. & Srivastava, D. Genetics of human cardiovascular disease. Cell 148, 1242 (2012).
7. Schödel, J. et al. Common genetic variants at the 11q13.3 renal cancer susceptibility locus influence binding of HIF to an enhancer of cyclin D1 expression. Nat. Genet. 44, 420–425 (2012).
8. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
9. Beytebiere, J. R. et al. Tissue-specific BMAL1 cistromes reveal that rhythmic transcription is associated with rhythmic enhancer–enhancer interactions. Genes Dev. 33, 294–309 (2019).
10. Dror, I., Golan, T., Levy, C., Rohs, R. & Mandel-Gutfreund, Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res. 25, 1268–1280 (2015).
11. Morgunova, E. & Taipale, J. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 47, 1–8 (2017).
12. Wang, J. et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 22, 1798–1812 (2012).
13. Zhou, T. et al. Quantitative modeling of transcription factor binding specificities using DNA shape. Proc. Natl. Acad. Sci. USA 112, 4654–4659 (2015).
14. Filipovic, D. et al. Predictive Models of Genome-Wide Aryl Hydrocarbon Receptor DNA Binding Reveal Tissue Specific Binding Determinants. bioRxiv 2022.05.13.491754. https://doi.org/10.1101/2022.05.13.491754 (2022).
15. Steuernagel, L. et al. Computational identification of tissue-specific transcription factor cooperation in ten cattle tissues. PLoS ONE 14, e0216475 (2019).
16. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
17. Arvey, A., Agius, P., Noble, W. S. & Leslie, C. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res. 22, 1723–1734 (2012).
18. Gordân, R. et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104 (2013).
19. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
20. Das, M. K. & Dai, H. K. A survey of DNA motif finding algorithms. BMC Bioinform. 8, 1–13 (2007).
21. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
22. Quang, D. & Xie, X. DanQ: A hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, 107 (2016).
23. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
24. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13–17-August-2016. 785–794 (2016).
25. Mathelier, A. et al. DNA shape features improve transcription factor binding site predictions in vivo. Cell Syst. 3, 278-286.e4 (2016).
26. Wang, Y., Li, X. & Hu, H. H3K4me2 reliably defines transcription factor binding regions in different cells. Genomics 103, 222–228 (2014).
27. Slattery, M. et al. Absence of a simple code: How transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014).
28. Li, J. et al. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding. Nucleic Acids Res. 45, 12877 (2017).
29. Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
30. Chiu, T. P. et al. DNAshapeR: An R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32, 1211–1213 (2016).
31. Ramírez, F. et al. deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
32. Pohl, A. & Beato, M. bwtool: A tool for bigWig files. Bioinformatics 30, 1618–1619 (2014).
33. Peng, C. Y. J., Lee, K. L. & Ingersoll, G. M. An introduction to logistic regression analysis and reporting. J. Educ. Res. 96, 3–14. https://doi.org/10.1080/00220670209598786 (2010).
34. Wang, Z., Wu, Y., Li, L. & Su, X. D. Intermolecular recognition revealed by the complex structure of human CLOCK–BMAL1 basic helix–loop–helix domains with E-box DNA. Cell Res. 23, 213–224 (2012).
35. Liu, S. et al. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinform. 18, 1–11 (2017).
36. Untitled Diagram - diagrams.net. https://app.diagrams.net/?src=about.
37. Zhou, T. et al. DNAshape: A method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res. 41, 56–62 (2013).
38. Benveniste, D., Sonntag, H. J., Sanguinetti, G. & Sproul, D. Transcription factor binding predicts histone modifications in human cell lines. Proc. Natl. Acad. Sci. USA 111, 13367–13372 (2014).
39. Guccione, E. et al. Myc-binding-site recognition in the human genome is determined by chromatin context. Nat. Cell Biol. 8, 764–770 (2006).
40. Xin, B. & Rohs, R. Relationship between histone modifications and transcription factor binding is protein family specific. Genome Res. 28, 321–333 (2018).
41. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
42. Ramsey, S. A. et al. Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites. Bioinformatics 26, 2071–2075 (2010).
43. Korobov, M. & Lopuhin, K. ELI5 Documentation Release 0.11.0. (2021).
44. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004).
45. Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003).
46. Mure, L. S. et al. Diurnal transcriptome atlas of a primate across major neural and peripheral tissues. Science 359, 0318 (2018).
47. Hogenesch, J. B., Gu, Y. Z., Jain, S. & Bradfield, C. A. The basic-helix–loop–helix-PAS orphan MOP3 forms transcriptionally active complexes with circadian and hypoxia factors. Proc. Natl. Acad. Sci. USA 95, 5474–5479 (1998).
48. Zhang, R., Lahens, N. F., Ballance, H. I., Hughes, M. E. & Hogenesch, J. B. A circadian gene expression atlas in mammals: Implications for biology and medicine. Proc. Natl. Acad. Sci. USA 111, 16219–16224 (2014).
49. Shimomura, K. et al. Usf1, a suppressor of the circadian Clock mutant, reveals the nature of the DNA-binding of the CLOCK:BMAL1 complex in mice. Elife 2, 426 (2013).
50. Menet, J. S., Pescatore, S. & Rosbash, M. CLOCK:BMAL1 is a pioneer-like transcription factor. Genes Dev. 28, 8–13 (2014).
51. Wang, S. et al. Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture. Mol. Ther. Nucleic Acids 24, 154–163 (2021).
52. Gupta, P., Zlatanova, J. & Tomschik, M. Nucleosome assembly depends on the torsion in the DNA molecule: A magnetic tweezers study. Biophys. J. 97, 3150 (2009).
53. Koike, N. et al. Transcriptional architecture and chromatin landscape of the core circadian clock in mammals. Science 338, 349 (2012).
54. Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009).

Author information

Authors and Affiliations

Department of Biomedical Engineering, Michigan State University, East Lansing, MI, USA
Daniel Marri, David Filipovic & Sudin Bhattacharya
Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
Daniel Marri, David Filipovic, Omar Kana & Sudin Bhattacharya
Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
David Filipovic
Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI, USA
Omar Kana & Sudin Bhattacharya
Institute for Integrative Toxicology, Michigan State University, East Lansing, MI, USA
Omar Kana & Sudin Bhattacharya
Department of Pharmacology, Southern Illinois University School of Medicine, Springfield, IL, USA
Shelley Tischkau

Authors

Daniel Marri
David Filipovic
Omar Kana
Shelley Tischkau
Sudin Bhattacharya

Contributions

D.M., S.B., and S.T. designed the study. D.M., D.F., and O.K. developed the predictive model and generated all the model figures. D.M., S.B., O.K., S.T., and D.F. wrote the manuscript. D.M. prepared the figures and table. All authors reviewed the manuscript.

Corresponding author

Correspondence to Sudin Bhattacharya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Supplementary Information 5.

Supplementary Information 6.

Supplementary Figures.

Supplementary Information 7.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Marri, D., Filipovic, D., Kana, O. et al. Prediction of mammalian tissue-specific CLOCK–BMAL1 binding to E-box DNA motifs. Sci Rep 13, 7742 (2023). https://doi.org/10.1038/s41598-023-34115-w

Received: 15 February 2023
Accepted: 25 April 2023
Published: 12 May 2023
DOI: https://doi.org/10.1038/s41598-023-34115-w
