Conservation of trans-acting circuitry during mammalian regulatory evolution

Stergachis, Andrew B.; Neph, Shane; Sandstrom, Richard; Haugen, Eric; Reynolds, Alex P.; Zhang, Miaohua; Byron, Rachel; Canfield, Theresa; Stelhing-Sun, Sandra; Lee, Kristen; Thurman, Robert E.; Vong, Shinny; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Giste, Erika; Dunn, Douglas; Vierstra, Jeff; Hansen, R. Scott; Johnson, Audra K.; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Treuting, Piper M.; Kaul, Rajinder; Groudine, Mark; Bender, M. A.; Borenstein, Elhanan; Stamatoyannopoulos, John A.

doi:10.1038/nature13972


Article
Open access
Published: 19 November 2014


Nature volume 515, pages 365–370 (2014)


Abstract

The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ∼8.6 million transcription factor (TF) occupancy sites at nucleotide resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ∼95% similar to that derived from human TF footprints. However, only ∼20% of mouse TF footprints have human orthologues. Despite substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architectures is nearly identical to that of human. Our results indicate that evolutionary selection on mammalian gene regulation is targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity.


Main

Gene regulation is classically partitioned into cis- and trans-acting compartments, which are in turn integrated to form a regulatory network. The cis compartment comprises DNA elements that encode TF recognition sites, while the trans compartment encompasses hundreds of TF genes and their DNA recognition repertoires. The cross-regulation of TF genes by one another creates a regulatory network that facilitates complex information processing and potentiates robustness at the cellular and higher levels¹.

In metazoan genomes, actuatable TF recognition sites are clustered into compact (∼100–300 bp) regulatory DNA regions that give rise to DNase I hypersensitive sites (DHSs) upon TF occupancy in place of a canonical nucleosome². Mice and humans diverged ∼90 million years ago³, and an extensive survey of mouse DHSs indicates that the cis-regulatory DNA compartment has evolved markedly since the last common ancestor⁴, generalizing and extending observations from selected TFs assayed by ChIP-seq in one or a few tissues^5,6. However, given the limited experimental resolution of previous studies, it is currently unknown how dynamic individual in vivo TF recognition sites are within broader regulatory regions, or more generally how cis-regulatory dynamics relate to the conservation of the higher-level cellular and physiological features that define mammals. Earlier studies of individual regulatory elements in Drosophila⁷ and zebrafish⁸ indicate a potential for functional conservation without sequence conservation, and the maintenance of regulatory activity with different phenotypic outcomes. However, the generality of these observations and their broader relevance for mammalian evolution are unclear.

Genomic DNase I footprinting enables systematic delineation of TF–DNA interactions at nucleotide resolution and on a global scale^9,10,11, permitting: (1) the simultaneous interrogation of hundreds of DNA-binding TFs expressed in a given cell type in a single experiment; (2) de novo derivation of the cis-regulatory lexicon of an organism; and (3) systematic mapping of TF-to-TF cross-regulatory networks^1,10.

To delineate an expansive set of specific mouse genomic sequence elements contacted by TFs in vivo, we performed genomic DNase I footprinting on 25 diverse mouse cell and tissue types (Extended Data Table 1). From an average of 323 million uniquely mapped DNase I cleavages per cell type, we identified an average of ∼1 million high-confidence (false discovery rate (FDR) 1%^10,11) DNase I footprints (6 to 40 base pairs (bp)), and a total of 8.6 million differentially occupied footprints (Fig. 1a and Extended Data Fig. 1a). DNase I footprints were highly reproducible (Extended Data Fig. 1b) and robust to intrinsic DNase I cleavage propensities (Extended Data Fig. 2a).

**Figure 1: Footprinting the mouse genome and comparison with human footprints.**

Evolutionary turnover of TF footprints

To study the evolution of TF occupancy patterns between mouse and human, we compared mouse DNase I footprint maps with those from 41 diverse human cell types^10,12 by using bi-directional pairwise alignments of the mouse and human genomes⁴ to resolve mouse DNase I footprints to the human genome (Fig. 1b). In total, 65% of mouse TF footprint sequences could be localized within the human genome, comparable to the cross-alignment rate of entire ∼150-bp DHSs⁴ (Fig. 1c). However, whereas 35% of mouse DHSs have human orthologues that are also DNase I hypersensitive in at least one human cell type⁴, only 22% of mouse TF footprints have human sequence orthologues that are occupied in any of the human cell types assayed (Fig. 1c). This indicates that the individual DNA elements within DHSs that are directly contacted by TFs in vivo have undergone massive turnover since the last common ancestor of mouse and human.

Conservation of TF recognition lexicon

Although most mouse TFs have human orthologues, the collective consequences of divergence in DNA binding domains and of lineage-specific expansion of certain TF families (for example, KRAB zinc fingers) for the genomic occupancy landscape are unknown. We thus next explored the evolutionary stability of the mammalian TF recognition repertoire encompassed within mouse and human TF footprints. At directly occupied recognition sites for a given TF, footprinting data closely recapitulate TF ChIP-seq^10,11 (Extended Data Fig. 3), and average per-nucleotide DNase I cleavage profiles mirror the morphology of the DNA–protein binding interface^10,11,13. Examination of cleavage profiles at occupied sites for diverse TFs showed these to be nearly identical between mouse and human cell types (Fig. 2a and Extended Data Fig. 2b), suggesting that in vivo DNA recognition preferences for many TFs have experienced little change between mouse and human.

**Figure 2: Mouse TF footprints define a conserved cis-regulatory lexicon.**

To investigate comprehensively the divergence of mouse and human TF recognition repertoires, we performed de novo motif discovery on the 8.6 million mouse TF footprints. In total, we defined 604 unique motif models collectively accounting for the large majority of footprints (Fig. 2b), of which 355 models (59%) matched those within motif databases and 249 were novel (Extended Data Fig. 4a). Comparison of known and novel mouse-derived motif models to motif models derived de novo from 8.4 million human DNase I footprints¹⁰ revealed that >94% of the collective TF lexicon is conserved between mouse and human (Fig. 2c). The human lineage has witnessed expansion of certain TF gene families, notably zinc finger TFs¹⁴; our results indicate that the proportion of genomic DNA elements bound by lineage-specific TFs in vivo is comparatively small. The fact that TF footprints in mouse and human contain highly similar effective in vivo recognition sequence repertoires indicates that regulatory divergence between mouse and human has occurred chiefly at the level of individual TF-binding cis-regulatory elements.

A total of 22 novel motif models were selective for the mouse lineage and 14 were selective for the human lineage (Fig. 2c). The 22 novel mouse-selective motifs are found chiefly in distal elements (Extended Data Fig. 4b), where they populate ∼2% of DNase I footprints and show cell/tissue-specific occupancy, predominantly for mouse ES cells (Fig. 2d, e). This suggests that the TFs recognizing these elements may have important roles in very early development, when humans and rodents show more differences than at later stages¹⁵, and further highlights the role of distal gene regulation in species divergence¹⁶. Notably, whereas sequence matches to the 14 human-selective models in human DNase I footprints showed evidence of strong human-specific evolutionary constraint^10,17 (Fig. 2f), nucleotide diversity at sequence matches to the 22 mouse-selective models in human DNase I footprints is compatible with significantly reduced human-specific evolutionary constraint (P < 0.05) (Fig. 2f), consistent with a loss of TF occupancy (and selective pressure) due to divergence (or loss) of the cognate factor within the human lineage.

Conservation of TF-to-TF connections

We next sought to characterize the core mouse TF regulatory network, and to compare its features with the human TF network. Genomic footprinting provides a direct and empirical approach for mapping the core TF regulatory network of an organism comprising cross-regulatory interactions (network edges) between TF genes (network nodes)¹. Footprint-anchored TF regulatory networks precisely recapitulate well-validated TF-to-TF regulatory connections^1,18, and are agnostic to whether any given TF-to-TF regulatory interaction is positive (activating) or negative (repressive), as these may vary conditionally even for a given TF. Following the approach of ref. 1, we mapped mouse TF-to-TF networks connecting the 586 mouse TF genes with known recognition sequences (Supplementary Information) within each of the 25 cell/tissue types (Fig. 3a). This revealed an average of 22,970 unique TF-to-TF edges per cell type, totalling 77,084 non-redundant edges across all 25 cell types. Differences between cell types arose both from the cell-selective usage of TFs and from the cell-selective occupancy patterns of these TFs. For example, the neuronal developmental regulator OTX2 is selective for neuronal tissue, but its connectivity/occupancy patterns differ between distinct neuronal cell/tissue types (Fig. 3b).

**Figure 3: Evolutionary dynamics of cis-regulatory logic.**

Mouse TF regulatory networks from functionally similar cell and tissue types are coherently organized into anatomical and functional groups (Fig. 3c), analogous to results from human TF regulatory networks¹. However, although the similarity (pairwise Jaccard index) between mouse and human networks was generally highest between orthologous mouse–human cell and tissue pairs (Fig. 3d, e), network differences within each species were smaller than differences between species (Fig. 3e).

We next asked to what extent specific mouse TF-to-TF regulatory connections were conserved in human. We first identified TF-to-TF connections that were mouse-specific, human-specific or shared across both orthologous human and mouse cell types (Fig. 4a and Extended Data Table 2). We then differentiated shared regulatory edges (that is, present in both a mouse cell type and its human orthologue) arising from TF occupancy of an orthologous binding element from those shared edges arising from occupancy of non-orthologous sequence within regulatory DNA of the orthologous target gene (Fig. 4a). In the former case, both sequence and circuitry are conserved; in the latter, circuitry only. Overall, ∼44% of the TF-to-TF regulatory connections are conserved between orthologous mouse and human cell types (P < 0.001) (Fig. 4b). However, >40% of these connections represent edges created by TF binding to a novel sequence element arising since mouse–human divergence (Fig. 4b). As such, conservation of functional regulatory circuitry is considerably greater than indicated by sequence conservation alone.
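The edge-level comparison above can be sketched as set operations on directed (regulator, target) edges. This is a minimal illustration, not the authors' code: it assumes both networks are already expressed in a shared orthologous gene namespace, and `orthologous_support` is a hypothetical precomputed set of shared edges that are backed by an orthologous footprinted binding element.

```python
def classify_edges(mouse_edges, human_edges, orthologous_support):
    """Partition (regulator, target) edges of a mouse cell type and its human orthologue."""
    mouse_edges, human_edges = set(mouse_edges), set(human_edges)
    shared = mouse_edges & human_edges
    # sequence and circuitry conserved: shared edge backed by an orthologous element
    seq_and_circuit = shared & set(orthologous_support)
    return {
        "mouse_specific": mouse_edges - human_edges,
        "human_specific": human_edges - mouse_edges,
        "shared_orthologous_element": seq_and_circuit,
        # circuitry only: shared edge arising from non-orthologous sequence
        "shared_novel_element": shared - seq_and_circuit,
    }


def summarize(classes):
    """Return (fraction of mouse edges shared, fraction of shared edges from novel elements)."""
    n_shared = len(classes["shared_orthologous_element"]) + len(classes["shared_novel_element"])
    n_mouse = n_shared + len(classes["mouse_specific"])
    frac_conserved = n_shared / n_mouse if n_mouse else 0.0
    frac_novel = len(classes["shared_novel_element"]) / n_shared if n_shared else 0.0
    return frac_conserved, frac_novel


# Toy networks (invented edges) to show the bookkeeping
mouse = {("Gata1", "Klf1"), ("Gata1", "Tal1"), ("Tal1", "Klf1"), ("Klf1", "Nfe2")}
human = {("Gata1", "Klf1"), ("Tal1", "Klf1"), ("Nfe2", "Gata1")}
classes = classify_edges(mouse, human, orthologous_support={("Gata1", "Klf1")})
print(summarize(classes))
```

Keeping the two kinds of shared edge apart is what lets conservation of circuitry be measured separately from conservation of sequence.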

**Figure 4: Conservation of TF-to-TF regulatory circuitry.**

Comparative TF network architecture

We next compared the overall architecture of mouse and human TF networks. The architecture of complex networks can be analysed in terms of simple regulatory circuit ‘building blocks’ termed network motifs, such as the feed-forward loop (FFL)¹⁹. In human, despite the general selectivity of specific TF-to-TF edges for specific cell types, the pattern of utilization of three-node network motifs within each individual cell type network is nearly identical¹. Computing network motif utilization within each of the 25 mouse TF networks also revealed uniform patterns across mouse cell/tissue type regulatory networks (Extended Data Fig. 5a). Strikingly, these patterns are nearly identical with human, indicating that mouse and human TF networks utilize virtually the same architecture (Fig. 5a and Extended Data Fig. 5).
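As an illustration of network-motif counting, the sketch below enumerates feed-forward loops (FFLs) in a directed TF network. It is a simplified stand-in, not the tool used in the study: it counts only induced FFLs (the three FFL edges and no other edges among the three TFs) and ignores self-loops, whereas a full three-node census assigns every connected triad to one of 13 directed subgraph types. The name "intermediate" for the middle node is a hypothetical label.

```python
from collections import defaultdict


def count_induced_ffls(edges):
    """Return (driver, intermediate, target) triples forming induced feed-forward loops."""
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        if a != b:  # self-loops are ignored in this sketch
            succ[a].add(b)
            pred[b].add(a)
    ffls = []
    for x in list(succ):
        for z in succ[x]:
            # y must be regulated by x and regulate z: x->y, y->z, x->z
            for y in succ[x] & pred.get(z, set()):
                if y in (x, z):
                    continue
                # induced subgraph: reject any extra edge y->x, z->y or z->x
                if x in succ.get(y, ()) or y in succ.get(z, ()) or x in succ.get(z, ()):
                    continue
                ffls.append((x, y, z))
    return ffls
```

Because the driver is the only node with two outgoing edges inside an induced FFL, each loop is found exactly once, via its x->z edge.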

**Figure 5: Conserved organizing principles of mammalian TF regulatory networks.**

To analyse evolutionary conservation at the level of individual regulatory circuits, we identified all instances of each three-node network motif within each mouse cell type, extracted the constituent TFs, and computed how the same TFs were connected in orthologous human cell types. Despite the conservation of overall network architecture between mouse and human, this analysis revealed that the specific combinations of TFs comprising individual regulatory circuits have undergone substantial remodelling between mouse and human (Fig. 5b and Extended Data Fig. 6). Overall, 39% of combinations of three TFs found within one or more three-node circuits in a given mouse cell type were also organized into at least one type of three-node circuit in an orthologous human cell type (Extended Data Fig. 6b). For example, >25% of three-TF combinations organized into ‘regulating mutual’ circuits were conserved between orthologous mouse and human cell types, whereas only 8% of three-TF combinations that form ‘mutual-and-three-chain’ circuits show such conservation. By contrast, 12% of three-TF combinations that form ‘mutual-and-three-chain’ circuits lose one cross-regulatory interaction, transforming them into FFL circuits in orthologous human cell types (Fig. 5b and Extended Data Fig. 6c). Collectively, TF circuits conserved between mouse and human were enriched in four major network motif types: (1) the FFL motif; (2) the ‘regulated mutual’ motif; (3) the ‘regulating mutual’ (RM) motif; and (4) the ‘clique’ motif (Fig. 5b and Extended Data Fig. 6c). As such, these circuits appear to comprise the most vital building blocks of mammalian TF regulatory architectures.
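The per-triple comparison can be sketched as follows, under a simplifying assumption: a three-TF combination is taken to be "organized into at least one type of three-node circuit" in human when the three TFs form a connected three-node subgraph there. For three nodes this holds exactly when at least two of the three TF pairs are linked by an edge in either direction.

```python
from itertools import combinations


def linked_pairs(triple, edges):
    """Number of unordered TF pairs in `triple` joined by an edge in either direction."""
    edge_set = set(edges)
    return sum(1 for a, b in combinations(triple, 2)
               if (a, b) in edge_set or (b, a) in edge_set)


def forms_three_node_circuit(triple, edges):
    # three nodes are (weakly) connected iff at least two of the three pairs are linked
    return linked_pairs(triple, edges) >= 2


def fraction_conserved(mouse_triples, human_edges):
    """Fraction of mouse three-TF combinations that also form a circuit in human."""
    if not mouse_triples:
        return 0.0
    hits = sum(forms_three_node_circuit(t, human_edges) for t in mouse_triples)
    return hits / len(mouse_triples)
```

Telling which circuit type a conserved triple adopts in human (for example, a ‘mutual-and-three-chain’ becoming an FFL) would additionally require classifying the induced triad, which this sketch does not do.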

Conserved TF positions within networks

We next asked to what degree the position of a specific TF within a given network motif circuit was conserved between mouse and human. To analyse this, we focused on FFL and RM circuits, as these are both strongly conserved overall and have a clear top-down hierarchical organization (Fig. 5a, b). Computation of the propensity for each TF (of 586) to occupy each of the nodes within these network motifs revealed that the preferred position of a given TF within FFL and RM circuits is strongly conserved between orthologous human and mouse cell types (Fig. 5c, d). It also revealed conserved preferential positioning of entire classes of TFs within particular network motif positions. For example, TFs with ubiquitous cellular functions such as CTCF, SP1 and NRF1 systematically localize within the driver positions of FFL and RM circuits (Fig. 5c, d), while TFs involved in cell lineage fate decisions (for example, SOX2, NFE2 and FOXP3) preferentially localize within the final passenger positions (Fig. 5c, d and Extended Data Fig. 7a, b). We also found the passenger edges of FFL and RM motifs to be significantly more cell-selective than the driver edges (Extended Data Fig. 7c, d). These findings raise the possibility that one of the major functions of conserved mammalian network motifs may be to stabilize the expression of TFs that drive cell-type-specific regulatory programs via exploitation of stable cell-ubiquitous regulatory interactions.
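A minimal sketch of the positional analysis for FFLs: given FFL instances as ordered triples, it computes, for each TF, the fraction of its FFL memberships spent in each position. The input format and the label "intermediate" for the middle node are assumptions; the same function would be applied separately to mouse and human FFL lists before comparing profiles.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three FFL nodes (x -> y -> z, plus x -> z)
POSITIONS = ("driver", "intermediate", "passenger")


def position_propensity(ffls):
    """Map each TF to the fraction of its FFL memberships in each position."""
    counts = defaultdict(Counter)
    for triple in ffls:
        for role, tf in zip(POSITIONS, triple):
            counts[tf][role] += 1
    return {tf: {role: c[role] / sum(c.values()) for role in POSITIONS}
            for tf, c in counts.items()}


# Invented FFL instances, for illustration only
ffls = [("SP1", "GATA1", "NFE2"), ("SP1", "TAL1", "NFE2"), ("CTCF", "SP1", "NFE2")]
prop = position_propensity(ffls)
print(prop["SP1"]["driver"], prop["NFE2"]["passenger"])
```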

A conserved developmental program

To explore how the TF regulatory network interacts with downstream non-TF structural/effector genes and to test for conserved interactions, we first quantified, for each TF, whether it preferentially regulates other TF genes or non-TF ‘structural’ genes across different mouse and human cell types (Extended Data Fig. 8a). This parameter varied widely between different TFs; in general, TFs involved in developmental state specification such as HOXB1, OCT4 and SOX2 preferentially regulated other TF genes, while general transcriptional regulators such as NRF1, CTCF and SP1 preferentially regulated non-TF genes (Extended Data Fig. 8b, c). To test how these preferences varied by cell type, we averaged TF gene versus structural gene propensities for all TFs within each cell-type regulatory network. This revealed that the TF networks of pluripotent and early developmental cell types and tissues such as ES cells and fetal brain were globally significantly more oriented towards regulation of TF genes compared with the TF networks of more highly differentiated cell types (for example, B cells, T cells) and tissues (for example, adult brain) (Extended Data Fig. 8d). These TF versus structural gene preferences, both at the individual TF level and at the cell-type regulatory network level, were strongly conserved between mouse and human (Extended Data Fig. 8d, e). The above findings suggest the operation of a conserved global developmental regulatory program that directs a shift in the orientation of TF regulatory networks from TF genes to structural genes during the transition from primitive to definitive cells.
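One simple way to express the TF-versus-structural-gene preference is sketched below: for each regulator, the fraction of its targets that are TF genes, averaged over regulators to give a network-level orientation. This particular statistic is an assumption made for illustration; the study's exact propensity measure is the one summarized in Extended Data Fig. 8 and may differ.

```python
from collections import defaultdict


def tf_target_propensity(edges, tf_genes):
    """For each regulator, the fraction of its target genes that encode TFs."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return {reg: len(tgts & tf_genes) / len(tgts) for reg, tgts in targets.items()}


def network_orientation(edges, tf_genes):
    """Average TF-target propensity across all regulators in one cell-type network."""
    props = tf_target_propensity(edges, tf_genes)
    return sum(props.values()) / len(props) if props else 0.0


# Invented edges; tf_genes must be a set
tf_genes = {"OCT4", "SOX2", "NANOG", "CTCF"}
es_like = [("OCT4", "SOX2"), ("OCT4", "NANOG"), ("CTCF", "ACTB"), ("CTCF", "SOX2")]
print(network_orientation(es_like, tf_genes))
```

Higher values indicate a network oriented towards regulating TF genes, the direction reported for ES cells and fetal tissues.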

Taken together, our results expose several major organizing principles of mammalian gene regulation, and a fundamental hierarchy in the modes of evolutionary transmission of regulatory information, ranging from poor conservation of cis-acting sequence elements to the preservation of trans-acting and network-level regulatory features (Fig. 6). Conservation of trans-acting components is reflected both in the effective in vivo recognition repertoires of human and mouse TFs, which differ only slightly, and in the conserved patterns of TF-to-gene interactions. The dichotomy between cis- and trans-acting regulatory components is most apparent in the context of the core TF regulatory network. Whereas the individual DNA bases contacted by TFs in vivo have undergone extensive turnover since the last common ancestor of mouse and human, the repertoire of TFs regulating other TF genes is vastly more conserved. Notably, this cis-acting versus trans-acting disparity in mammals greatly eclipses that previously described for different Drosophila species²⁰.

**Figure 6: Hierarchy of evolutionary constraint on cis- versus trans-regulatory features.**

At the TF network level, organization of the regulatory circuitry in both mouse and human cell types appears to be governed by common principles that result in highly similar network architectures (Fig. 6). Conserved shifts in TF network orientation during the transition from primitive to definitive cells in both organisms suggest that the mammalian regulatory network architecture has converged around a central goal of guiding cell identity during development.

Collectively, our results indicate that evolutionary selection on gene regulation is targeted chiefly at the level of regulatory networks, and explain how essential features of the mammalian body plan and physiology have been maintained in the face of massive turnover of the cis-regulatory landscape.

Methods

Definition of DNase I footprint

Following the original description of ref. 21, DNase I footprints signify short polynucleotide segments over which the cleavage pattern induced by DNase I is attenuated by the presence of a ‘binding protein on the DNA sequence’. This concept was subsequently generalized to encompass altered cleavage patterns that include both attenuation and potentiation of cleavage, the latter resulting from alteration of the minor groove upon TF–DNA engagement²². It is critical to recognize that DNase I footprints represent TF occupancy at specific positions along the genome. Recently, several publications have mistakenly conflated individual DNase I footprints with aggregated DNase I cleavage profiles for a given TF motif^23,25. Aggregated DNase I cleavage plots were introduced by ref. 9 to visualize and summarize averaged per-nucleotide DNase I cleavage patterns across hundreds to thousands of instances of a given TF recognition sequence (typically within DHSs) genome-wide^9,10. Because they encompass both occupied and unoccupied motifs, the morphology of the averaged profile depends greatly on the proportion of occupied elements. In the case of TFs with few high-affinity, highly occupied sites, such as the glucocorticoid receptor, aggregated cleavage profiles will predominantly reflect the unoccupied elements, and thus converge on intrinsic DNase I cleavage biases, which have now been well defined²⁴. Failure to acknowledge this feature of the data has led to erroneous statements concerning DNase I footprinting of low-occupancy TFs, and to restatements of previously published conclusions^10,21.
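The dependence of aggregated profiles on occupancy can be made concrete with a two-component mixture: if a fraction f of motif instances is occupied, the averaged per-nucleotide profile is f times the occupied profile plus (1 - f) times the unoccupied (bias-driven) profile. The three-bin toy profiles below are invented numbers, used only to show how a low occupied fraction flattens the averaged footprint toward the unoccupied pattern.

```python
def aggregate(occupied, unoccupied, frac_occupied):
    """Per-nucleotide average over a mix of occupied and unoccupied motif instances."""
    return [frac_occupied * o + (1 - frac_occupied) * u
            for o, u in zip(occupied, unoccupied)]


def footprint_depth(profile):
    """Mean of the two flank bins minus the central bin (toy 3-bin profile)."""
    return (profile[0] + profile[2]) / 2 - profile[1]


occupied = [10.0, 1.0, 10.0]    # protected centre flanked by cleavage (invented)
unoccupied = [4.0, 4.0, 4.0]    # flat, bias-only cleavage (invented)
for f in (1.0, 0.5, 0.05):
    print(f, footprint_depth(aggregate(occupied, unoccupied, f)))
```

In this toy setting the apparent depth scales linearly with f, so a TF with few occupied sites yields a shallow aggregate profile even though each occupied site is fully footprinted.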

Genomic footprinting

A description of each cell and tissue type used in this study can be found in Extended Data Table 1 and at https://genome.ucsc.edu/encode/dataSummaryMouse.html. IACUC approval for all mouse samples was obtained from the Fred Hutchinson Cancer Research Center. Mouse cell and tissue types were subjected to DNase I digestion and high-throughput sequencing, following previous methods²⁶. 36-bp sequence tags were aligned to the reference genome, build NCBI37/mm9, using Bowtie 3, version 0.12.7 with parameters: --mm -n 3 -v 3 -k 2 and --phred33-quals. DNase I footprint discovery and false discovery rate estimation (software available at https://github.com/StamLab/footprinting2012) were performed as previously described¹⁰ using 36-mer sequencing reads and unique mappability information for mouse, build NCBI37/mm9 (available at http://www.uwencode.org/proj/hotspot/). For clarity, we note that the footprint detection algorithm we employed differs substantially from (and greatly outperforms) an early algorithm⁹. A recently published modification of the algorithm of ref. 10 termed Wellington incorporates stranded cleavage information and specifically identifies high occupancy sites, although at the expense of greatly reduced sensitivity²⁷. Of note, another recently published DNase I footprint detection algorithm²⁵ was reported to have compared itself against the algorithm of ref. 10, but in fact compared itself against an ad hoc concoction of the ref. 9, ref. 10 and ref. 28 algorithms.

The number and proportion of all DNase I cleavages that fell within DNase I hotspot regions were calculated as previously described²⁶ (Extended Data Table 1). To identify the total cohort of DNA elements contained within mouse FDR 1% DNase I footprints we first computed the multi-set union of all footprints across all cell types using BEDOPS²⁹. For each element of the union, we then collected all significantly overlapping footprints, which were defined as those footprints with 65% or more of their bases in common with the element (bedmap --fraction-map 0.65). A footprint’s genomic coordinates were redefined to the minimum and maximum coordinates from its overlap set (bedmap --echo-map-range), which always included the footprint itself. All redefined footprints from the union then passed through a subsumption and uniqueness filter: when a footprint was genomically contained within another, the filter discarded the smaller of the two or selected just one footprint if identical. Footprints passing through the filter comprised the final set of 8.6 million combined footprints across all cell types. Unlike footprints from any single cell type, the combined set included overlapping footprints. We further computed the number of cell types from which each of these 8.6 million combined footprints were derived. To identify the reproducibility of a DNase I footprint, we calculated for every sample the proportion of DNase I footprints that were independently discovered in 1 or more other samples from the same species using an overlap criterion of 25% (bedmap --fraction-either 0.25).
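The combination and reproducibility steps above were run with BEDOPS; the pure-Python sketch below mimics their logic on half-open intervals from a single chromosome so that the criteria are explicit. It is an O(n²) illustration under those assumptions, not a substitute for `bedmap`, and it reads `--fraction-map` as a threshold on overlapping bases relative to the mapped footprint and `--fraction-either` as a threshold relative to either interval.

```python
def overlap(a, b):
    """Number of bases shared by half-open intervals a=(start, end) and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def combine_footprints(footprints, min_frac=0.65):
    """footprints: (start, end) intervals pooled from all cell types (duplicates allowed)."""
    redefined = []
    for ref in footprints:
        # footprints sharing >= min_frac of their own bases with ref (always includes ref)
        hits = [fp for fp in footprints if overlap(ref, fp) >= min_frac * (fp[1] - fp[0])]
        redefined.append((min(h[0] for h in hits), max(h[1] for h in hits)))
    unique = sorted(set(redefined))  # uniqueness filter
    # subsumption filter: drop any interval contained in a different interval
    return [a for a in unique
            if not any(b != a and b[0] <= a[0] and a[1] <= b[1] for b in unique)]


def reproducible_fraction(sample, others, min_frac=0.25):
    """Fraction of a sample's footprints overlapping another sample's by >= min_frac of either."""
    def hit(a, b):
        ov = overlap(a, b)
        return ov > 0 and (ov >= min_frac * (a[1] - a[0]) or ov >= min_frac * (b[1] - b[0]))
    n = sum(1 for a in sample if any(hit(a, b) for b in others))
    return n / len(sample)
```

Note that overlapping but non-nested footprints survive the filter, matching the statement that the combined set can contain overlapping footprints.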

Accounting for intrinsic DNase I cleavage preferences

Differences in the rate of DNase I cleavage of phosphate bonds between different flanking base combinations were originally discussed by ref. 21, and have more recently been exhaustively quantified by ref. 24, who performed deep sequencing of DNase I-digested naked DNA from yeast and from human fetal lung fibroblast cells (IMR90). We define $a_k$ as the relative cleavage bias of the 6-mer spanning positions $[k-3, k+2]$, as described in ref. 24. For each nucleotide $j$ within a genomic window $[i, l]$, the normalized expected cleavage rate is $a_j / \sum_{k=i}^{l} a_k$. We redistributed the total observed cleavages ($N = \sum_{k=i}^{l} n_k$) in a window $[i, l]$ such that the observed and expected counts for each base $j$ are $n_j$ and $N a_j / \sum_{k=i}^{l} a_k$, respectively. The per-nucleotide deviation from intrinsic sequence specificity was then defined by comparing the observed count $n_j$ with this expected count. The sequence bias normalization was computed separately for each strand and then recombined for visualization purposes.
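The bias model can be sketched as follows for one strand. The 6-mer lookup table `kmer_bias` is hypothetical (in practice it would come from the naked-DNA measurements of ref. 24), and the log2 observed/expected ratio with a pseudocount is only one possible deviation measure, chosen here for illustration rather than taken from the study.

```python
import math


def bias_track(seq, kmer_bias, start, end, default=1.0):
    """a_k for k in [start, end): bias of the 6-mer at positions k-3..k+2 (0-based).

    Requires start >= 3 and end + 2 <= len(seq); unknown 6-mers get `default`."""
    return [kmer_bias.get(seq[k - 3:k + 3], default) for k in range(start, end)]


def expected_counts(observed, bias):
    """Redistribute N = sum(n_j) in proportion to a_j / sum_k a_k."""
    total = sum(observed)
    norm = sum(bias)
    return [total * a / norm for a in bias]


def log2_deviation(observed, expected, pseudocount=1.0):
    # illustrative deviation measure (assumption): log2((n_j + c) / (e_j + c))
    return [math.log2((n + pseudocount) / (e + pseudocount))
            for n, e in zip(observed, expected)]
```

Because the expected counts sum to N, the procedure only reshapes where cleavages are expected within the window; it does not change the window's total signal.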

Using deeply mapped DNase I cleavage preferences²⁴, we analysed each FDR 1% footprint in all mouse and human cell/tissue types and counted the total number of mapped tags falling in each footprint and in its left and right flanking regions. We then randomly assigned the same number of simulated tags to positions within these regions, using probabilities proportional to the DNase I cut-rate bias model for the sequence context surrounding each position. A new footprint occupancy score (FOS) was calculated over the same left-flank (L), central (C) and right-flank (R) regions as before¹⁰ and compared to the FOS value of the original footprint. Footprints that showed smaller FOS values using the DNase I cut-rate bias model were considered potential false-positive footprints.
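The bias-only simulation can be sketched as below. We assume the FOS definition of ref. 10, FOS = (C + 1)/L + (C + 1)/R, where C, L and R are the mean tag counts per base in the central and flanking regions (lower values indicate stronger footprints); the region lengths, the fixed random seed and the per-base input format are illustrative choices.

```python
import math
import random


def fos(counts, left, center, right):
    """FOS = (C+1)/L + (C+1)/R from per-base tag counts laid out as L, C, R regions."""
    def mean(xs):
        return sum(xs) / len(xs)
    l = mean(counts[:left])
    c = mean(counts[left:left + center])
    r = mean(counts[left + center:left + center + right])
    if l == 0 or r == 0:
        return math.inf  # no flanking cleavage: treat as no footprint
    return (c + 1) / l + (c + 1) / r


def simulate_bias_only(n_tags, bias, rng):
    """Place n_tags at positions with probability proportional to the cut-rate bias."""
    sim = [0] * len(bias)
    for pos in rng.choices(range(len(bias)), weights=bias, k=n_tags):
        sim[pos] += 1
    return sim


def is_potential_false_positive(counts, bias, left, center, right, seed=0):
    """Flag a footprint if bias alone yields a smaller (stronger) FOS than observed."""
    rng = random.Random(seed)
    simulated = simulate_bias_only(sum(counts), bias, rng)
    return fos(simulated, left, center, right) < fos(counts, left, center, right)
```

A single simulation per footprint mirrors the description above; repeating the draw with several seeds would give a more stable comparison.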

Correspondence of DNase I footprints with ChIP-seq peaks

TF occupancy profiles generated by ChIP-seq represent a mixture of both direct (TFs directly contacting the DNA) and indirect (TFs contacting another protein or complex that is contacting the DNA) occupancy events. Of note, for the majority of TFs analysed to date, the indirect component predominates¹⁰. In contrast to ChIP-seq, DNase I footprinting provides information exclusively at sites of direct TF occupancy¹⁰. In Extended Data Fig. 3, motif models (from TRANSFAC, JASPAR Core, and UniPROBE) were used in conjunction with the FIMO motif scanning software³⁰, version 4.6.1 using a P < 1 × 10⁻⁵ threshold, to find all motif instances of CTCF (Transfac model V_CTCF_01), GATA1 (Jaspar model MA0035.2-GATA1), MAX (Jaspar model MA0058.1-MAX), Myc (Jaspar model MA0147.1-Myc), and TBP (Transfac model V_TATA_01) within DNase I hotspots of the MEL cell line. We padded each discovered motif instance by ±30 nucleotides and counted, at each base position within the padded motif, the number of uniquely mapping DNase I sequencing reads with a 5′ end mapping to that position. We sorted padded motif instances by their total counts, and then normalized each instance’s counts to a mean of 0 and variance of 1. A heat map, with 1 row per motif instance, was generated using matrix2png³¹, version 1.2.1. A 46-species phyloP evolutionary conservation score heat map over the same ordered motif instances and bases was generated using the same processing techniques. Motif instances that overlapped DNase I footprints by at least 3 nucleotides were annotated. Uniformly processed mm9 MEL ChIP-seq peaks were downloaded from the UCSC Genome Browser website and motif instances overlapping ChIP-seq peaks by at least 3 nucleotides were also annotated.
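The per-instance heat-map matrix described above can be sketched as follows, assuming 5′ cleavage counts are available as a position-to-count mapping for one chromosome and ignoring strand orientation. Rows are sorted by total cleavage and then z-scored; rows with no variation are set to zero, a convention chosen here.

```python
import statistics


def zscore(row):
    """Scale a row to mean 0 and variance 1 (population SD); flat rows become zeros."""
    mu = statistics.fmean(row)
    sd = statistics.pstdev(row)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in row]


def cleavage_matrix(instances, cut_counts, pad=30):
    """One row per motif instance: 5' cleavage counts over [start - pad, end + pad)."""
    rows = [[cut_counts.get(p, 0) for p in range(start - pad, end + pad)]
            for start, end in instances]
    rows.sort(key=sum, reverse=True)  # most-cleaved instances first
    return [zscore(row) for row in rows]
```

The resulting matrix could then be written out for a tool such as matrix2png, one row per motif instance.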

Identification of orthologous human sequence at mouse footprints

We aligned the coordinates for the 8.6 million combined mouse footprints to the human genome using the ‘over chain’ best pairwise alignment file available from the UCSC Genome Browser. Mouse footprints with 50% or more of their constituent sequences aligned to the human genome, with at least half not aligned to insertions or deletions, were considered successfully aligned. For a description of the alignment procedure, see ref. 4.
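The alignment acceptance rule can be sketched per footprint as below. The per-base status labels are hypothetical, and the code encodes one reading of the criterion (an assumption): at least 50% of the footprint's bases fall within the alignment, and at least half of those aligned bases are not in insertions or deletions.

```python
def passes_ortholog_criterion(status, min_aligned=0.5, min_gapless=0.5):
    """status: per-base alignment state of a mouse footprint ('match', 'indel' or 'none').

    Assumed reading: >= min_aligned of bases fall in the alignment, and >= min_gapless
    of those aligned bases are outside insertions/deletions."""
    n = len(status)
    aligned = [s for s in status if s in ("match", "indel")]
    if n == 0 or len(aligned) < min_aligned * n:
        return False
    gapless = sum(1 for s in aligned if s == "match")
    return gapless >= min_gapless * len(aligned)
```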

Aggregated DNase I cleavage profiles

Mouse motif models from TRANSFAC³², version 2011.1, JASPAR Core³³, and UniPROBE³⁴ were used in conjunction with the FIMO motif scanning software, version 4.6.1, using a P < 1 × 10⁻⁵ threshold, to find predicted motif instances within hotspot regions as identified by the hotspot algorithm²⁶. All motif instances identified for a given model were padded by 10 bp on each side, and aligned in a strand-sensitive manner. DNase I cleavages were averaged for each aligned nucleotide to create an aggregate profile for the motif model.
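A minimal sketch of the aggregated profile: each motif instance is padded by 10 bp, instances on the minus strand are reversed so that all windows share the motif's orientation, and cleavages are averaged position by position. The input format is assumed, and this simplified strand handling ignores any strand-specific offset of DNase I cut positions.

```python
def aggregate_profile(instances, cut_counts, pad=10):
    """instances: (start, end, strand) motif hits of one model; cut_counts: position -> cleavages."""
    rows = []
    for start, end, strand in instances:
        row = [cut_counts.get(p, 0) for p in range(start - pad, end + pad)]
        if strand == "-":
            row.reverse()  # orient every window 5'->3' relative to the motif
        rows.append(row)
    # mean cleavage at each aligned position across all instances
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because every window is averaged regardless of whether it is occupied, the result is subject to the occupancy-dependence discussed under the definition of a DNase I footprint.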

De novo motif model discovery and comparison

The method for the identification of de novo motif models using mouse DNase I footprints was identical to that previously described using human DNase I footprints¹⁰. Across 25 mouse cell types, we identified 604 unique motif models within DNase I footprints.

We compared de novo motif models to models available as part of various experimentally grounded databases, including TRANSFAC, JASPAR Core, and UniPROBE using the TOMTOM software, version 4.6.1 (ref. 35). TOMTOM parameters were set to their default values during model comparisons with the exception of the min-overlap argument, which was set to 5. When partitioning the de novo motifs by assigning each to a single category, the order of match assignment preference was to TRANSFAC, JASPAR Core, UniPROBE and finally to the novel motif category. The novel motif models were further classified using previously published motif models derived from human DNase I footprinting experiments¹⁰. We also determined the proportion of motif models in each experimentally grounded database that matched to mouse de novo motif models using TOMTOM with the same parameter settings.

Analysis of nucleotide diversity (π)

To quantify the nature of selection operating on regulatory DNA, we surveyed nucleotide diversity (π) in DNase I footprints. Population genetics analyses were performed as previously described on 53 unrelated, publicly available human genomes released by Complete Genomics, version 1.10 (ref. 36). Relatedness was determined both by pedigree and with KING³⁷. Variant sites were filtered by coverage (>20% of individuals must have calls). Additionally, Complete Genomics makes partial calls at some sites (that is, one allele is A and the other is N). These were counted as fully missing. Repeats were defined by RepeatMasker, downloaded from the UCSC Genome Browser (http://www.repeatmasker.org). CpGs and repeats were removed from all footprints before analysis. π for a single variant is 2pq, where p = major allele frequency and q = minor allele frequency. π was calculated for each cell type by summing 2pq over all variants and dividing by the total number of bases considered. Although binding elements for mouse-selective motif models are enriched in mouse DNase I footprints, instances of these models in human footprints are also present, but to a significantly lesser degree. To identify instances of mouse-selective motif models in human regulatory elements, human DHSs were scanned using each of the novel mouse-selective motif models and the FIMO software tool (P < 1 × 10⁻⁵). Predicted motif instances in human DHSs were then filtered to those that overlapped human DNase I footprints identified in any human cell type by at least three nucleotides.
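The π calculation can be sketched as follows. The input format is assumed: per-variant major and minor allele counts among called chromosomes of diploid individuals, after CpG, repeat and partial-call removal have already been applied upstream.

```python
def nucleotide_diversity(variants, n_bases, n_individuals, min_called=0.2):
    """pi = sum over variants of 2pq, divided by the number of bases considered.

    variants: (major_count, minor_count) among called chromosomes (diploid, so
    called individuals = total / 2). Sites called in <= min_called of individuals
    are skipped, mirroring the >20% coverage filter."""
    total = 0.0
    for major, minor in variants:
        n = major + minor
        if n == 0 or n / 2 <= min_called * n_individuals:
            continue
        p, q = major / n, minor / n
        total += 2 * p * q
    return total / n_bases
```

Lower π in a set of footprints, relative to a suitable background, is the signature of stronger purifying constraint used in the comparison of mouse- and human-selective motif instances.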

Calculation of cell-selective motif occupancy

We scanned for instances of a motif model using the FIMO software tool (P < 1 × 10⁻⁵) and filtered predicted motif instances to those that overlapped DNase I footprints identified in a particular cell type by at least three nucleotides. To derive a final occupancy value for a motif model in that cell type, we counted the total number of DNase I footprinted motif instances for that motif model and normalized it by the total number of bases contained within DNase I footprints in that cell type.
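The occupancy value described above is a count of footprinted motif instances divided by footprinted bases. The sketch below assumes half-open (BED-style) intervals given as `(chrom, start, end)` tuples, with FIMO hits already thresholded. It merges overlapping footprints so that no base is counted twice, which is an assumption. The quadratic scan is for clarity only; a genome-scale version would use a sorted-interval sweep. All function names are hypothetical.

```python
from collections import defaultdict


def _overlap(a_start, a_end, b_start, b_end):
    """Bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def count_footprinted(motifs, footprints, min_overlap=3):
    """Count motif instances overlapping any footprint by >= min_overlap bases."""
    by_chrom = defaultdict(list)
    for chrom, start, end in footprints:
        by_chrom[chrom].append((start, end))
    return sum(
        1
        for chrom, start, end in motifs
        if any(_overlap(start, end, s, e) >= min_overlap for s, e in by_chrom.get(chrom, ()))
    )


def footprint_bases(footprints):
    """Total footprinted bases, merging overlaps so no base is counted twice."""
    by_chrom = defaultdict(list)
    for chrom, start, end in footprints:
        by_chrom[chrom].append((start, end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def motif_occupancy(motifs, footprints, min_overlap=3):
    """Footprinted motif instances per footprinted base in one cell type."""
    return count_footprinted(motifs, footprints, min_overlap) / footprint_bases(footprints)


footprints = [("chr1", 100, 120), ("chr1", 200, 230)]  # 50 footprinted bases
motifs = [("chr1", 95, 105), ("chr1", 118, 128), ("chr2", 100, 110), ("chr1", 210, 220)]
print(count_footprinted(motifs, footprints))  # 2 (the 2-base overlap is rejected)
```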

Calculation of promoter-proximal occupancy of motif models

We scanned for instances of a novel mouse-selective motif model using the FIMO software tool (P < 1 × 10⁻⁵) and filtered predicted motif instances to those that overlapped DNase I footprints identified in any cell type by at least three nucleotides. We classified those within 5 kb of a transcriptional start site using RefSeq annotations as ‘promoter-proximal’ and all others as ‘promoter-distal’.
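The proximal/distal split above depends only on the distance from each footprinted motif instance to its nearest RefSeq TSS. A minimal sketch follows. It assumes half-open instance intervals, TSS positions supplied per chromosome, and a distance of 0 when a TSS falls inside the instance; the helper names are hypothetical.

```python
import bisect


def distance_to_nearest_tss(start, end, positions):
    """Distance from [start, end) to the nearest TSS in a sorted position list.

    Returns 0 if a TSS lies inside the interval, or None if the list is empty.
    """
    i = bisect.bisect_left(positions, start)  # first TSS at or after start
    best = None
    for j in (i - 1, i):  # nearest TSS on the left and on the right
        if 0 <= j < len(positions):
            t = positions[j]
            if start <= t < end:
                d = 0
            elif t < start:
                d = start - t
            else:
                d = t - (end - 1)
            best = d if best is None else min(best, d)
    return best


def classify_by_tss(instances, tss_by_chrom, window=5000):
    """Label each (chrom, start, end) instance promoter-proximal or promoter-distal."""
    sorted_tss = {chrom: sorted(p) for chrom, p in tss_by_chrom.items()}
    labels = []
    for chrom, start, end in instances:
        d = distance_to_nearest_tss(start, end, sorted_tss.get(chrom, []))
        labels.append("promoter-proximal" if d is not None and d <= window else "promoter-distal")
    return labels


tss = {"chr1": [50000, 10000]}  # illustrative coordinates only
print(classify_by_tss([("chr1", 14000, 14010), ("chr1", 30000, 30010)], tss))
# ['promoter-proximal', 'promoter-distal']
```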

TF regulatory network construction

Transcription factor (TF) regulatory networks were constructed as previously described¹ using 5,000-nucleotide buffers anchored on canonical TF transcriptional start site (TSS) annotations. TF genes and motif models used for network construction were collected from the JASPAR Core, UniPROBE and TRANSFAC databases (Supplementary Information). To create genome-wide networks, this method was extended to include all mm9 RefSeq genes, each anchored at its 5′-most TSS annotation³⁸.
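The edge rule can be sketched as follows. A directed edge TF → gene is recorded when a footprinted motif instance of that TF lies within the buffer around the gene's TSS. This is an assumption-laden illustration, not the published pipeline: it assumes a symmetric ±5,000-nucleotide window, counts any overlap with that window as sufficient, and takes motif instances already assigned to TFs and filtered to footprints. Coordinates in the example are made up.

```python
from collections import defaultdict


def build_network(footprinted_instances, tss, buffer=5000):
    """Directed regulator -> target edges.

    footprinted_instances: (tf, chrom, start, end) half-open intervals of motif
    instances already filtered to DNase I footprints.
    tss: target gene -> (chrom, tss_position). Restricting this to TF genes
    gives a TF-to-TF network; using all genes gives a genome-wide one.
    An edge is added when an instance overlaps [tss - buffer, tss + buffer]
    (window assumed symmetric). Self-edges are kept here.
    """
    by_chrom = defaultdict(list)
    for gene, (chrom, pos) in tss.items():
        by_chrom[chrom].append((gene, pos))
    edges = set()
    for tf, chrom, start, end in footprinted_instances:
        for gene, pos in by_chrom.get(chrom, ()):
            if start <= pos + buffer and end > pos - buffer:
                edges.add((tf, gene))
    return edges


tss = {"Gata1": ("chrX", 10000), "Tal1": ("chr4", 50000), "Hbb-b1": ("chr7", 3000)}
instances = [
    ("Gata1", "chr4", 54000, 54010),  # 4 kb from the Tal1 TSS
    ("Tal1", "chrX", 16000, 16010),   # 6 kb from the Gata1 TSS: no edge
    ("Tal1", "chr7", 2000, 2010),
]
print(sorted(build_network(instances, tss)))  # [('Gata1', 'Tal1'), ('Tal1', 'Hbb-b1')]
```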

Clustering and similarities of TF regulatory networks

We computed the pairwise Jaccard distances between TF regulatory networks and applied Ward clustering³⁹ using the hclust and dendrogram functions in R. The heat map representation in Fig. 3d used the Jaccard index as the similarity measure. Importantly, all comparisons were made using the same subset of 567 orthologous TF genes with known associated motif models in both species.
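The network comparison above treats each network as a set of directed edges restricted to the shared orthologous TF genes. The sketch below computes the Jaccard index and the resulting distance matrix; it does not reproduce the Ward clustering, which the Methods perform in R. The function names, the toy networks, and the convention that two empty edge sets have similarity 1 are assumptions.

```python
def restrict(edges, genes):
    """Keep only edges whose regulator and target are both in `genes`."""
    return {(src, tgt) for src, tgt in edges if src in genes and tgt in genes}


def jaccard_index(a, b):
    """|A & B| / |A | B|; two empty sets are treated as identical (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_distances(networks, genes):
    """Names and a square matrix of 1 - Jaccard index between restricted networks."""
    names = sorted(networks)
    edge_sets = {name: restrict(networks[name], genes) for name in names}
    matrix = [[1.0 - jaccard_index(edge_sets[x], edge_sets[y]) for y in names] for x in names]
    return names, matrix


networks = {
    "ES": {("A", "B"), ("B", "C"), ("A", "X")},  # A -> X is dropped: X is not shared
    "MEL": {("A", "B"), ("C", "A")},
}
names, matrix = jaccard_distances(networks, {"A", "B", "C"})
print(names, matrix[0][1])  # one shared edge out of three: distance 2/3
```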

TF regulatory edge conservation

To identify conserved regulatory connections that are also sequence conserved, we first collected all motif instances in a specific mouse cell type that overlapped a DNase I footprint by at least 3 nucleotides and gave rise to a regulatory edge in that cell type's TF regulatory network. We then aligned the coordinates of each such mouse motif instance to the human genome using the 'over chain' best pairwise alignment file available from the UCSC Genome Browser. A mouse motif instance was considered successfully aligned if 50% or more of its underlying sequence aligned to the human genome, with at least half not aligned to insertions or deletions. If a footprinted mouse motif instance aligned to a motif instance of the same TF in an orthologous human cell type that also overlapped a footprint by 3 nucleotides or more, we asked whether that human motif instance gave rise to the same regulatory edge in the human network. If it did, the edge in the mouse regulatory network was classified as a shared edge between species arising from orthologous binding elements. Notably, an edge that connects two TFs within a regulatory network may arise from a single footprinted TF binding element or from multiple distinct ones. In the latter case, the regulatory edge is considered to arise from an orthologous binding element as long as at least one of its underlying binding elements is a shared connection arising from an orthologous binding element.
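The "at least one orthologous element" rule above can be expressed as a set test. The sketch assumes that the liftOver step and the 3-nucleotide footprint check have already produced a mapping from each mouse motif instance to the footprinted human instance of the same TF at the aligned position. Instances that failed alignment are simply absent. All identifiers are hypothetical.

```python
def shared_edges(mouse_edges, lifted, human_edge_support):
    """Mouse edges shared with human via orthologous binding elements.

    mouse_edges: edge -> set of mouse footprinted instance IDs supporting it.
    lifted: mouse instance ID -> footprinted human instance ID of the same TF
            at the aligned position (absent if alignment or overlap failed).
    human_edge_support: edge -> set of human instance IDs supporting it.
    An edge counts as shared if ANY supporting mouse instance maps to a human
    instance that supports the same edge in the human network.
    """
    shared = set()
    for edge, instances in mouse_edges.items():
        support = human_edge_support.get(edge, set())
        if any(lifted.get(i) in support for i in instances):
            shared.add(edge)
    return shared


mouse_edges = {
    ("Gata1", "Tal1"): {"m1", "m2"},  # m2 is orthologous and supports the human edge
    ("Tal1", "Klf1"): {"m3"},         # m3 aligns, but to a non-supporting instance
    ("Klf1", "Gata1"): {"m4"},        # edge absent from the human network
}
lifted = {"m2": "h7", "m3": "h9", "m4": "h1"}
human_edge_support = {("Gata1", "Tal1"): {"h7"}, ("Tal1", "Klf1"): {"h2"}}
print(shared_edges(mouse_edges, lifted, human_edge_support))  # {('Gata1', 'Tal1')}
```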

We calculated an empirical P value to evaluate the significance of the number of shared edges found between orthologous mouse and human cell types. We first generated 1,000 randomized human TF regulatory networks. When creating a randomized network, we ignored the usual requirement that a motif instance must significantly overlap a human footprint. The genomic space used to construct a random network was identical to that used in the observed case (within 5,000 nucleotides of a canonical TSS). A random subset of the generated edges was chosen so that the in-degree of every TF gene node was identical to that in the observed human TF regulatory network (and, hence, the total number of edges was the same), and all edges were unique. We then determined the number of functionally conserved edges between the observed mouse TF regulatory network and each randomized human TF regulatory network, and counted how many randomized networks yielded at least as many functionally conserved edges as the observed human network. The empirical P value was calculated as one plus this count, divided by 1,000. This analysis was performed for every pair of orthologous cell types. For every pair, no randomized network reached or exceeded the number of functionally conserved edges obtained with the observed human TF regulatory network.
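The randomization test above can be sketched with in-degree-preserving sampling. This assumes that "functionally conserved" edges are simply directed edges present in both networks under shared orthologue names, and that each target's candidate regulators (any motif instance within the TSS buffer, footprinted or not) are precomputed. Following the Methods, the P value divides by the number of randomizations; a common alternative is (r + 1)/(n + 1). Function and variable names are hypothetical.

```python
import random
from collections import Counter


def randomized_network(candidate_sources, in_degree, rng):
    """One random network with the observed in-degree at every target.

    candidate_sources: target gene -> TFs with any motif instance (footprinted
    or not) within the TSS buffer of that target.
    in_degree: target gene -> number of incoming edges in the observed network.
    Sampling without replacement keeps every edge unique; it assumes each
    target has at least as many candidates as observed incoming edges.
    """
    edges = set()
    for target, k in in_degree.items():
        pool = sorted(set(candidate_sources.get(target, ())))
        for source in rng.sample(pool, k):
            edges.add((source, target))
    return edges


def empirical_p(mouse_edges, human_edges, candidate_sources, n_iter=1000, seed=0):
    """(1 + number of randomizations with >= observed shared edges) / n_iter."""
    rng = random.Random(seed)
    in_degree = Counter(target for _, target in human_edges)
    observed = len(mouse_edges & human_edges)
    hits = 0
    for _ in range(n_iter):
        shuffled = randomized_network(candidate_sources, in_degree, rng)
        if len(mouse_edges & shuffled) >= observed:
            hits += 1
    return (hits + 1) / n_iter
```

When no randomization matches the observed overlap, as reported for every cell-type pair, this gives the minimum attainable value of 1/1,000.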

Network motif architectures

We removed self-edges from every TF regulatory network and used the mfinder software tool for network motif analysis⁴⁰. A z-score was calculated over each of 13 network motifs of size 3 (three-node network motifs), using 250 randomized networks of the same size for a null estimate. We vectorized z-scores from every cell type and normalized each to unit length to create triad significance profiles¹⁹.
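The step from mfinder output to a triad significance profile (TSP) can be sketched as follows. Each of the 13 subgraph counts is converted to a z-score against the randomized networks, and the 13-dimensional vector is scaled to unit length. This assumes the counts have already been extracted from mfinder. The population standard deviation and the choice of z = 0 when the randomized counts do not vary are assumptions, not mfinder's documented behaviour.

```python
import math
import statistics


def z_scores(real_counts, random_counts):
    """z-score of each subgraph count against counts from randomized networks.

    real_counts: one count per subgraph type (13 for three-node motifs).
    random_counts: one such list per randomized network (250 in the Methods).
    """
    zs = []
    for i, real in enumerate(real_counts):
        sample = [counts[i] for counts in random_counts]
        mean = statistics.mean(sample)
        sd = statistics.pstdev(sample)
        zs.append((real - mean) / sd if sd > 0 else 0.0)
    return zs


def triad_significance_profile(zs):
    """Normalize a z-score vector to unit Euclidean length."""
    norm = math.sqrt(sum(z * z for z in zs))
    return [z / norm for z in zs] if norm > 0 else list(zs)


print(z_scores([14], [[8], [12]]))             # [2.0]: mean 10, SD 2
print(triad_significance_profile([3.0, 4.0]))  # [0.6, 0.8]
```

Normalizing to unit length makes profiles comparable across cell types whose networks differ greatly in size, because larger networks tend to produce larger absolute z-scores.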

Distribution of three-node network motifs

We enumerated all three-node circuits in a mouse TF regulatory network, and determined if and how each was connected in an orthologous human cell-type TF regulatory network. Software is available for download at https://github.com/StamLab/network-motifs.
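The circuit comparison can be sketched as follows. It is not the code in the StamLab repository. It enumerates every connected node triple in the mouse network, computes an isomorphism-invariant label for the induced subgraph by taking the smallest relabelled edge list over all six node orderings, and asks whether the same triple forms the same circuit, a different circuit, or no connected circuit in the human network. Self-edges are ignored, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations, permutations, product


def triad_class(edges, trio):
    """Canonical label of the subgraph induced on three nodes, or None if the
    three nodes are not (weakly) connected. Self-edges are ignored."""
    nodes = set(trio)
    sub = {(a, b) for a, b in edges if a in nodes and b in nodes and a != b}
    # Three nodes are connected iff at least two of the three node pairs are linked.
    if len({frozenset(e) for e in sub}) < 2:
        return None
    keys = []
    for order in permutations(trio):
        index = {node: i for i, node in enumerate(order)}
        keys.append(tuple(sorted((index[a], index[b]) for a, b in sub)))
    return min(keys)  # identical for all isomorphic circuits


def connected_triads(edges):
    """All node triples that form a connected three-node circuit."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    trios = set()
    for centre, ns in neighbours.items():
        for u, v in combinations(sorted(ns), 2):
            trios.add(tuple(sorted((centre, u, v))))
    return trios


def compare_triads(mouse_edges, human_edges):
    """Tally how each mouse three-node circuit appears in the human network."""
    tally = Counter()
    for trio in connected_triads(mouse_edges):
        mouse_key = triad_class(mouse_edges, trio)
        human_key = triad_class(human_edges, trio)
        if human_key is None:
            tally["not connected in human"] += 1
        elif human_key == mouse_key:
            tally["same circuit"] += 1
        else:
            tally["different circuit"] += 1
    return tally


# Sanity check: the 64 directed graphs on three labelled nodes collapse to
# 13 connected circuit classes (the 13 three-node network motifs).
pairs = [(a, b) for a in "xyz" for b in "xyz" if a != b]
classes = {triad_class({p for p, keep in zip(pairs, mask) if keep}, ("x", "y", "z"))
           for mask in product((False, True), repeat=6)}
classes.discard(None)
print(len(classes))  # 13
```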

Central-facing versus peripheral-facing TF enrichments

Enrichments were calculated as the log base 2 of the ratio of two proportions. The numerator was the number of outgoing edges from a TF node that connected to other TF nodes, divided by the total number of incoming edges to all TF nodes. The denominator was the number of outgoing edges from that TF node that connected to non-TF gene nodes, divided by the total number of incoming edges to all non-TF gene nodes.
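The enrichment above can be sketched for every regulator in one network. This assumes the network is a set of directed (regulator, target) edges and that the set of TF genes is known. TFs with no edges to one of the two target classes are skipped because the logarithm is undefined there; the paper does not say how such cases were handled. The function name is hypothetical.

```python
import math
from collections import Counter


def facing_enrichment(edges, tf_genes):
    """log2 enrichment of each TF's outgoing edges towards TF vs non-TF targets."""
    in_tf = sum(1 for _, tgt in edges if tgt in tf_genes)  # incoming edges to all TFs
    in_non_tf = len(edges) - in_tf                          # incoming edges to non-TFs
    out_tf, out_non_tf = Counter(), Counter()
    for src, tgt in edges:
        (out_tf if tgt in tf_genes else out_non_tf)[src] += 1
    result = {}
    for tf in set(out_tf) | set(out_non_tf):
        if out_tf[tf] == 0 or out_non_tf[tf] == 0:
            continue  # undefined log ratio; skipped by assumption
        numerator = out_tf[tf] / in_tf
        denominator = out_non_tf[tf] / in_non_tf
        result[tf] = math.log2(numerator / denominator)
    return result


edges = {("A", "B"), ("A", "g1"), ("B", "A"), ("B", "g1"), ("B", "g2"), ("B", "g3")}
print(facing_enrichment(edges, {"A", "B"})["A"])  # 1.0: (1/2) / (1/4) = 2
```

Positive values mark TF-facing regulators and negative values mark effector-facing ones, matching the sign convention in Extended Data Fig. 8d.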

Accession codes

Primary accessions

Gene Expression Omnibus

GSE51341

Data deposits

All data are available through the mouse ENCODE data repository at UCSC (http://genome.ucsc.edu/ENCODE/) and through GEO series accession GSE51341, or as indicated in Extended Data Table 1. TF regulatory networks may be viewed and downloaded from https://tools.stamlab.org/interactome/mouse and processed data can be downloaded at http://www.mouseencode.org. Human DNase I data can be accessed with GEO series accession GSE51341 and processed data can be viewed and downloaded from http://genome.ucsc.edu/.

References

1. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks. Cell 150, 1274–1286 (2012).
2. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
3. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
4. Vierstra, J. et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science (in the press).
5. Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).
6. Villar, D., Flicek, P. & Odom, D. T. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nature Rev. Genet. 15, 221–233 (2014).
7. Ludwig, M. Z., Bergman, C., Patel, N. H. & Kreitman, M. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403, 564–567 (2000).
8. Fisher, S., Grice, E. A., Vinton, R. M., Bessling, S. L. & McCallion, A. S. Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science 312, 276–279 (2006).
9. Hesselberth, J. R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009).
10. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
11. Samstein, R. M. et al. Foxp3 exploits a pre-existent enhancer landscape for regulatory T cell lineage specification. Cell 151, 153–166 (2012).
12. Stergachis, A. B. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
13. Vierstra, J., Wang, H., John, S., Sandstrom, R. & Stamatoyannopoulos, J. A. Coupling transcription factor occupancy to nucleosome architecture with DNase-FLASH. Nature Methods 11, 66–72 (2014).
14. Looman, C., Abrink, M., Mark, C. & Hellman, L. KRAB zinc finger proteins: an analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution. Mol. Biol. Evol. 19, 2118–2130 (2002).
15. Raff, R. A. The Shape of Life: Genes, Development, and the Evolution of Animal Form (Univ. Chicago Press, 1996).
16. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
17. Vernot, B. et al. Personal and population genomics of human regulatory variation. Genome Res. 22, 1689–1697 (2012).
18. Sullivan, A. M. et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep. 8, 2015–2030 (2014).
19. Milo, R. et al. Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002).
20. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Evolutionary changes in cis and trans gene regulation. Nature 430, 85–88 (2004).
21. Galas, D. J. & Schmitz, A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. 5, 3157–3170 (1978).
22. Stamatoyannopoulos, J. A., Goodwin, A., Joyce, T. & Lowrey, C. H. NF-E2 and GATA binding motifs are required for the formation of DNase I hypersensitive site 4 of the human beta-globin locus control region. EMBO J. 14, 106–116 (1995).
23. He, H. H. et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nature Methods 11, 73–78 (2014).
24. Lazarovici, A. et al. Probing DNA shape and methylation state on a genomic scale with DNase I. Proc. Natl Acad. Sci. USA (2013).
25. Sung, M. H., Guertin, M. J., Baek, S. & Hager, G. L. DNase footprint signatures are dictated by factor dynamics and DNA sequence. Mol. Cell http://dx.doi.org/10.1016/j.molcel.2014.08.016 (2014).
26. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genet. 43, 264–268 (2011).
27. Piper, J. et al. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 41, e201 (2013).
28. Mercer, T. R. et al. The human mitochondrial transcriptome. Cell 146, 645–658 (2011).
29. Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).
30. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
31. Pavlidis, P. & Noble, W. S. Matrix2png: a utility for visualizing matrix data. Bioinformatics 19, 295–296 (2003).
32. Wingender, E., Dietze, P., Karas, H. & Knüppel, R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241 (1996).
33. Bryne, J. C. et al. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 36, D102–D106 (2008).
34. Newburger, D. E. & Bulyk, M. L. UniPROBE: an online database of protein binding microarray data on protein–DNA interactions. Nucleic Acids Res. 37, D77–D82 (2009).
35. Gupta, S., Stamatoyannopoulos, J., Bailey, T. & Noble, W. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
36. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).
37. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
38. Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–1323 (2009).
39. Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236 (1963).
40. Milo, R. et al. Superfamilies of evolved and designed networks. Science 303, 1538–1542 (2004).

Acknowledgements

We thank our colleagues for their insightful comments and critical readings of the manuscript. We also thank many individuals who provided mouse cell and tissue samples. This work was supported by NIH grants U54HG004592, U54HG007010 and U01ES01156 to J.A.S.; RC2HG005654 to J.A.S. and M.G.; and R37 DK44746 to M.G. and M.A.B. A.B.S. was supported by grant FDK095678A from NIDDK.

Author information

Andrew B. Stergachis and Shane Neph: These authors contributed equally to this work.

Authors and Affiliations

Department of Genome Sciences, University of Washington, Seattle, 98195, Washington, USA
Andrew B. Stergachis, Shane Neph, Richard Sandstrom, Eric Haugen, Alex P. Reynolds, Theresa Canfield, Sandra Stelhing-Sun, Kristen Lee, Robert E. Thurman, Shinny Vong, Daniel Bates, Fidencio Neri, Morgan Diegel, Erika Giste, Douglas Dunn, Jeff Vierstra, R. Scott Hansen, Audra K. Johnson, Peter J. Sabo, Rajinder Kaul, Elhanan Borenstein & John A. Stamatoyannopoulos
Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, 98109, Washington, USA
Miaohua Zhang, Rachel Byron & Mark Groudine
Department of Medicine, University of Washington, Seattle, 98195, Washington, USA
R. Scott Hansen, Rajinder Kaul & John A. Stamatoyannopoulos
Department of Biological Structure, University of Washington, Seattle, 98195, Washington, USA
Matthew S. Wilken & Thomas A. Reh
Department of Comparative Medicine, University of Washington, Seattle, 98195, Washington, USA
Piper M. Treuting
Division of Radiation Oncology, University of Washington, Seattle, 98195, Washington, USA
Mark Groudine
Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, 98109, Washington, USA
M. A. Bender
Department of Pediatrics, University of Washington, Seattle, 98195, Washington, USA
M. A. Bender
Department of Computer Science and Engineering, University of Washington, Seattle, 98102, Washington, USA
Elhanan Borenstein
Santa Fe Institute, Santa Fe, 87501, New Mexico, USA
Elhanan Borenstein

Contributions

J.A.S., A.B.S. and S.N. designed the experiments. S.N., A.B.S., A.P.R., E.H. and R.S. carried out the analysis supervised by J.A.S. and E.B.; A.B.S., J.A.S. and S.N. wrote the paper; and all other authors carried out or supervised various aspects of experimental data collection.

Corresponding author

Correspondence to John A. Stamatoyannopoulos.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Cell-selectivity and reproducible detection of DNase I footprints.

a, Distribution of the number of mouse cell types in which each of the 8.6 million distinct footprinted cis-regulatory elements in mouse is contained within a DNase I footprint. b, For each mouse and human cell type, shown is the percentage of DNase I footprints identified in that cell type that are also observed in at least one other mouse or human cell type, respectively (data represent the median and the 25th and 75th percentiles). c, Red: percentage of mouse DNase I footprints with sequence aligning to the human genome that are occupied in one or more human cell types. Brown: percentage of human DNase I footprints with sequence aligning to the mouse genome that are occupied in one or more mouse cell types.

Extended Data Figure 2 Negligible impact of intrinsic DNase I cleavage biases on delineation of DNase I footprints.

a, Box-and-whisker plot displaying the percentage of DNase I footprints found in each of the mouse and human samples that are potentially better explained by intrinsic DNase I cleavage specificity (box represents the mean and the 25th and 75th percentiles; whiskers represent the minimum and maximum values across all human and mouse samples, respectively). b, Effects of protein occupancy and sequence context on DNase I cleavage profiles. Top: heat maps of per-nucleotide DNase I cleavages; the ratio of observed to expected cleavages computed using empirically modelled DNase I cleavage bias²⁴; and DNase I footprints called at 1% FDR surrounding Sp1, Ctf1 and Nrf1 recognition sequences in MEL cells. Each heat map pixel row corresponds to an individual motif instance within a DNase I hotspot. A blue tick mark in the 'footprint' column indicates that the motif instance overlaps a DNase I footprint called at 1% FDR; a blank indicates that it does not. Bottom: aggregated DNase I cleavage profiles of occupied (that is, within DNase I footprints) Sp1, Ctf1 and Nrf1 recognition sequences in MEL cells, shown side by side with the log₂ ratio of observed versus expected (from intrinsic cleavage preferences) DNase I cleavage. Note that in all cases the cleavage profile of occupied elements differs markedly from expectation.

Extended Data Figure 3 DNase I footprints accurately recapitulate ChIP-seq data.

For five different TFs with corresponding ChIP-seq data in MEL cells, displayed are (left) heat maps showing per-nucleotide DNase I cleavage and (right) vertebrate conservation by phyloP for all motif instances of that TF within MEL DNase I hotspots (irrespective of whether they overlap a DNase I footprint), ranked by the local density of DNase I cleavages. The number of motif instances for that TF is indicated to the left of the heat map. Purple ticks indicate the presence of the corresponding TF ChIP-seq peaks at each motif instance. Green ticks indicate the presence of DNase I footprints at each motif instance. Below each graph is indicated the percentage of TF footprints that reside outside of a ChIP-seq verified binding site, as well as the percentage of ChIP-seq peaks that do not contain a DNase I footprint for that TF (indicating indirect TF occupancy). Of note, occupied motifs within DNase I footprints accurately recapitulate sites of direct TF occupancy, as 99% of DNase I footprinted motifs for a given TF overlap a cognate ChIP-seq peak. In contrast, for most TFs the majority of ChIP-seq peaks arise from indirect TF occupancy events (and thus lack DNase I footprinted sequence elements for their cognate TF).

Extended Data Figure 4 Annotation of the de novo mouse motif models.

a, Left: bar chart showing the percentage of the motif models within different experimentally grounded motif databases that match our de novo mouse motif models. Right: bar chart showing the number of novel de novo motif models in mouse that match de novo motif models in human. b, The proportion of mouse-selective motif model DNase I footprints within distal regulatory regions.

Extended Data Figure 5 Conserved organizing principles of the mammalian TF regulatory network.

a, b, Shown is the relative enrichment or depletion of the 13 three-node network motifs in each of the mouse (a) and human (b) regulatory networks. c, Shown is the relative enrichment or depletion of the 13 three-node network motifs in each of the mouse regulatory networks compared with the relative enrichment of the same motifs in the C. elegans neuronal connectivity network.

Extended Data Figure 6 The conservation of individual three-node circuit types.

a, Examples of three-node circuits formed by TFs in both mouse and human regulatory T (T_reg) cells. b, For each of eight orthologous mouse and human cell-type pairings, shown is the percentage of three-node circuits in the mouse cell type that are maintained as any three-node circuit in the orthologous human cell type. c, For each of seven orthologous mouse and human cell-type pairings, shown is: (left) heat map showing the overall propensity of individual three-node circuits in the mouse cell-type regulatory network to form the same or other three-node circuits in the human cell-type regulatory network; (middle) bar plot showing the percentage of specific three-node circuits in the mouse cell-type regulatory network that are maintained as the same three-node circuits in the human cell-type regulatory network; (right) the relative enrichment or depletion of the 13 three-node network motifs in a regulatory network constructed using the subset of edges present in both mouse and human cell-type regulatory networks.

Extended Data Figure 7 TF position propensities and cell selectivity of conserved network motifs.

a, Shown is the propensity of all TFs within the ES cell regulatory network to occupy the different positions within a FFL. FFL positions are defined in panel c. b, Shown is the GO term enrichment of TFs that preferentially occupy position C within FFLs as opposed to TFs that preferentially occupy positions A and B within FFLs. Asterisk indicates a q value less than 0.05. P values and q values calculated using the Gene Ontology enrichment analysis and visualization tool (GOrilla). c, For all instances of FFLs in mouse ES cells, shown is the tissue specificity of each component edge across the other 24 mouse cell types. P values were calculated using a Wilcoxon rank sum test. d, Same as c but for regulating mutual motifs.

Extended Data Figure 8 Polarity of TF genes and regulatory networks during development.

a, Schematic illustrating the definitions of effector-facing and TF-facing TFs. b, Top: a box-and-whisker plot showing the distribution of the relative log enrichment of TF-facing to effector-facing TFs in mouse ES cells. Bottom: relative target landscape enrichments for individual TFs grouped by functional category. c, Shown is the GO term enrichment of TFs that preferentially regulate TFs (TF-facing) as opposed to TFs that preferentially regulate effector genes (effector-facing). Asterisk indicates a q value less than 0.05. P values and q values calculated using the Gene Ontology enrichment analysis and visualization tool (GOrilla). d, For each cell type, shown is the average propensity of each TF within the regulatory network to regulate TF genes versus effector genes. Relative enrichment values were calculated such that 0 indicates a cell-type regulatory network that is equally geared towards regulating TF genes and effector genes. Cell types are grouped/coloured according to their developmental origin. P values were calculated using a Wilcoxon rank sum test. e, Same as b but for human iPS cells. For box-and-whisker plots, the box represents the mean and the 25th and 75th percentiles, whiskers represent the minimum and maximum values excluding outliers, and outliers (open circles) are defined as values more than 1.5 times the interquartile range above the upper quartile or below the lower quartile.

Extended Data Table 1 Baseline DNase I characteristics of the different mouse cell types

Extended Data Table 2 Orthologous mouse and human cell types used for in-depth analyses

Supplementary information

Supplementary Information

This file contains Supplementary Data. (XLSX 62 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

About this article

Cite this article

Stergachis, A., Neph, S., Sandstrom, R. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014). https://doi.org/10.1038/nature13972

Received: 21 February 2014
Accepted: 15 October 2014
Published: 19 November 2014
Issue Date: 20 November 2014
DOI: https://doi.org/10.1038/nature13972
