We determined genome-wide nucleosome occupancies in mouse embryonic stem cells and their neural progenitor and embryonic fibroblast counterparts to assess features associated with nucleosome positioning during lineage commitment. Cell-type- and protein-specific binding preferences of transcription factors to sites with either low (Myc, Klf4 and Zfx) or high (Nanog, Oct4 and Sox2) nucleosome occupancy as well as complex patterns for CTCF were identified. Nucleosome-depleted regions around transcription start and transcription termination sites were broad and more pronounced for active genes, with distinct patterns for promoters classified according to CpG content or histone methylation marks. Throughout the genome, nucleosome occupancy was correlated with certain histone methylation or acetylation modifications. In addition, the average nucleosome repeat length increased during differentiation by 5–7 base pairs, with local variations for specific regions. Our results reveal regulatory mechanisms of cell differentiation that involve nucleosome repositioning.
Embryonic stem cells and differentiated cells derived from them share the same DNA sequence but have distinct cellular functions. Many of the underlying cell-fate decisions occur through changes to chromatin features that affect gene expression. The specific location of nucleosomes on the DNA is important for controlling access to the DNA1,2. Binding of protein factors to the 145–147 base pairs of DNA wrapped around the histone octamer core is frequently impeded, whereas the linker DNA between nucleosomes is more easily accessible. Recent advancements in high-throughput sequencing methods allowed for the genome-wide mapping of individual nucleosomes at single–base pair resolution3,4, with yeast serving as a model system for the initial pioneering studies5,6,7. More recently, tissue- and disease-specific features of nucleosome positions in higher organisms were reported8,9,10,11,12,13,14. These include studies of human cell lines8,9,10 and mouse hepatocyte cells15. We set out to identify features of nucleosome positioning at functional genomic elements during lineage commitment in mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs) derived from these ESCs, as well as mouse embryonic fibroblasts (MEFs) from the corresponding mouse strain. By comparing these three cell types, we identified local and global rearrangements of nucleosome occupancy that revealed important roles of nucleosome positioning in cell differentiation.
Nucleosome occupancy maps of ESCs, NPCs and MEFs
We mapped nucleosome positions by genome-wide paired-end sequencing of nucleosomal DNA from mouse ESCs, NPCs and MEFs after digesting the linker DNA between nucleosomes with micrococcal nuclease (MNase) (Online Methods, Supplementary Fig. 1a–c, Supplementary Note). Examples of the resulting nucleosome coverage maps are depicted in Figure 1. We calculated these patterns around transcription factor–binding sites that were determined previously by chromatin immunoprecipitation and DNA sequencing (ChIP-seq) in mouse ESCs16 or, in the case of the CCCTC-binding factor (CTCF), in both ESCs and MEFs17,18. Nucleosome positioning and transcription-factor binding followed a complex relation (Fig. 1a). Some transcription-factor sites identified by ChIP-seq in ESCs were nucleosome depleted, with a nucleosome occupancy reduced to 40–80% of that of the flanking regions in all cell types. In contrast, other transcription factors were preferentially bound to nucleosome-enriched regions or displayed distinct patterns that changed during cell differentiation, as described in further detail below. DNA sequence–dependent binding affinities of the histone octamer also contributed to nucleosome positioning. This is inferred from an exemplary comparison of the experimental nucleosome occupancies to those predicted from the DNA sequences19, in which the computed peaks of nucleosome occupancy correlated well with our experimental nucleosome positions in MEFs but displayed large differences relative to the ESC data set (Fig. 1b). Thus, for the latter cell type the intrinsic binding preferences were overwritten by other factors, for example by the presence of CTCF that binds in ESCs to the region shown and is flanked by two well-positioned nucleosomes.
We found many promoter regions to be nucleosome depleted at the transcription start site (TSS), as shown for the Smarca4 promoter in Figure 1c. At this locus, one nucleosome was constitutively absent downstream of the TSS in all three cell lines. An additional nucleosome was also removed upstream of the TSS in ESCs, where this gene was upregulated by about two-fold in comparison to MEFs. As described in the following, specific features of individual nucleosome occupancy profiles at functional genomic elements like transcription factor–binding sites or TSSs can be evaluated in a genome-wide analysis of the experimental nucleosome occupancy profiles.
Nucleosome occupancy at transcription factor–binding sites
We calculated average nucleosome occupancy profiles for binding sites that were experimentally determined by ChIP-seq in ESCs and that comprised 12 developmentally important transcription factors16, p300 histone acetyltransferase18, chromatin remodelers Chd7 (ref. 20) and Brg1 (ref. 21), as well as DNase I–hypersensitivity sites20. Four types of patterns were observed (Fig. 2). (i) Some transcription factors, such as c-Myc, n-Myc, Zfx and Klf4, were preferentially bound in nucleosome-depleted regions in ESCs (Fig. 2a). These regions retained a largely reduced nucleosome occupancy in NPCs and MEFs. A similar pattern was observed for the DNase I–hypersensitivity sites, supporting the previous conclusion that these reflect nucleosome-depleted regions20. (ii) For another class of proteins, such as Stat3 and p300, the binding sites resided in nucleosome-depleted regions in ESCs but became preferentially occupied by nucleosomes in differentiated cells, possibly because these factors were no longer bound (Fig. 2b). (iii) Some proteins, such as E2f1, Tcfcp2l1 and Esrrb, showed a more complex pattern with a small nucleosome occupancy peak at the binding sites, surrounded by wider regions of reduced nucleosome occupancy (Fig. 2c). This could reflect a regulatory role of nucleosome positioning for those transcription factors that remain expressed at different developmental stages22. In addition, binding for this transcription-factor group might require active translocation or eviction of a nucleosome by chromatin remodelers23,24. (iv) Another transcription-factor class including the master regulators Nanog, Sox2 and Oct4 had binding sites in ESCs that coincided with well-positioned nucleosomes (Fig. 2d). We conclude that these factors can efficiently bind while the DNA target site interacts with a histone octamer, as has been postulated for so-called 'pioneering factors' that initiate cellular programs.
Cell type–dependent CTCF-directed nucleosome positioning
To further explore nucleosome rearrangement around transcription factor–binding sites, we analyzed CTCF binding sites that have been previously mapped in both ESCs and MEFs17,18 (Fig. 3a–c, Supplementary Fig. 2). CTCF establishes insulatory or boundary elements to demarcate repressive and active chromatin regions25 by setting a local boundary organizing 10–20 nucleosomes26 as well as by bridging distant chromatin regions27. CTCF binding sites were recently identified by ChIP-seq by another study17 and by the ENCODE project18. According to the ENCODE data set used in our analysis, the total number of CTCF binding sites is 34,000 in ESCs and 41,000 in MEFs, with only ~30% of sites coinciding in the two cell types (Fig. 3d).
Both ESCs and MEFs displayed a nucleosome-depleted region of ~200 base pairs (bp) at the center of the CTCF binding sites, which was somewhat more pronounced in ESCs (Fig. 3a). Two nucleosomes were positioned directly adjacent to this site and flanked by up to nine regularly spaced nucleosomes, similar to the pattern reported previously for human T cells26. Notably, the subset of CTCF sites that were occupied only in MEFs but not in ESCs was found to already be partly nucleosome depleted in ESCs (Fig. 3a). In MEFs, nucleosomes were substantially rearranged around the CTCF sites (Fig. 3b). The previous positions of CTCF that were unique to ESCs but not associated with CTCF in MEFs displayed an ~20% increase in nucleosome occupancy. The sites of bound CTCF in MEFs fell into two classes with respect to their nucleosome occupancy (Fig. 3 and Supplementary Fig. 2a). (i) The constitutive CTCF sites present both in ESCs and MEFs were also nucleosome depleted in MEFs but to a lesser extent than in ESCs. This might be related to an ~27% reduction of the ratio of CTCF to core histone H4 expression (Supplementary Table 1). Notably, the CTCF sites that were unique to MEFs displayed an additional local peak of nucleosome occupancy (Supplementary Fig. 2b). This suggests that CTCF is able to bind to nucleosomal DNA at these sites, possibly through additional interaction partners. In support of this view, an example with CTCF bound in the middle of the nucleosome peak is shown in Supplementary Figure 2b. (ii) In addition, a subset of CTCF sites located within enhancer elements displayed a very different pattern (Supplementary Fig. 2c).
Nucleosome occupancies at TSSs and transcription end sites
To characterize the nucleosome occupancy at promoters, we aligned the maps of mouse transcripts at their TSSs and clustered them according to their expression levels for each cell type into active (top 5% of expression level), inactive (bottom 5% of expression level) and the remaining 90% of the transcripts. We observed a broad nucleosome-depleted region downstream of the TSS for ESCs, NPCs and MEFs (Fig. 4a). It extended into the gene body similarly to the pattern reported for mouse hepatocytes15 but very differently from the typical patterns observed in yeast and invertebrates5,6,7,28,29. The nucleosome occupancy profiles in ESCs, NPCs and MEFs were dependent on the gene expression level. For the most active genes, the nucleosome-depleted region became wider and deeper. Supplementary Figure 3 shows the occupancy maps of the 5% of genes with the highest and lowest expression levels. In general, the TSS nucleosome occupancy displayed a significant anticorrelation with expression of the corresponding transcripts (Supplementary Table 2), indicating that a reduced nucleosome occupancy at the promoter favors gene expression.
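The expression-based grouping described above can be illustrated with a minimal sketch. The function name, the input dictionary and the tie handling (ties keep input order) are assumptions made for illustration only; the text does not specify how the grouping was implemented.

```python
def classify_by_expression(expression, fraction=0.05):
    """Label transcripts as 'active' (top fraction), 'inactive' (bottom
    fraction) or 'middle' (everything else) by expression level.

    expression: dict mapping a transcript id to its expression value
    (hypothetical input format).
    """
    # Rank transcripts from lowest to highest expression.
    ranked = sorted(expression, key=expression.get)
    # Number of transcripts in each tail (at least one).
    n_tail = max(1, round(len(ranked) * fraction))
    inactive = set(ranked[:n_tail])
    active = set(ranked[-n_tail:])
    classes = {}
    for transcript in ranked:
        if transcript in active:
            classes[transcript] = "active"
        elif transcript in inactive:
            classes[transcript] = "inactive"
        else:
            classes[transcript] = "middle"
    return classes


# Toy example with 40 transcripts: 2 active, 2 inactive, 36 in between.
example = {f"t{i}": float(i) for i in range(40)}
labels = classify_by_expression(example)
```

Average occupancy profiles would then be computed separately for each of the three groups and each cell type.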
Next, we grouped average nucleosome profiles at the transcription termination sites (TTSs) of the 5% highest, the 5% lowest and the remaining 90% of genes in ESCs, NPCs and MEFs, according to gene expression level (Fig. 4b). Inactive genes were characterized by a relatively small nucleosome-depleted region around the TTS, possibly due to a nucleosome-excluding DNA sequence30. This nucleosome-depleted region was largely increased upstream of the TTS for the highly active genes in all cell types (Fig. 4b and Supplementary Fig. 4) and was different from the nucleosome-depleted region downstream of the TTS reported previously for yeast31,32,33. Thus, both TSSs and TTSs have unique nucleosome signatures that might reflect more complex regulatory mechanisms than those found in simpler eukaryotes.
Promoter nucleosome occupancy and gene expression changes
To investigate whether changes in nucleosome occupancy between ESCs and MEFs were correlated with gene expression changes, we evaluated and identified three different promoter classes (Fig. 5, Supplementary Fig. 5 and Supplementary Tables 3 and 4). The first class was defined by the simultaneous presence of the trimethylation modification of histone H3 at lysine residue 4 (H3K4me3) and at lysine 27 (H3K27me3) in ESCs and is referred to as 'bivalent' promoters34,35 (Fig. 5a,b and Supplementary Fig. 5). Two other classes were distinguished according to their DNA sequence composition as high CpG (HCG) (Fig. 5c,d) or low CpG (LCG) promoters (Fig. 5e,f)36. For bivalent and HCG promoters, the nucleosome occupancy profiles were characterized by a strong nucleosome-depleted region (Fig. 5a–d) similar to the average TSS patterns (Fig. 4a) and with no substantial differences between the three cell types. In contrast, the nucleosome occupancy around the TSS was high for LCG promoters (Fig. 5e,f) and for promoters that carried only the H3K27me3 modification (Supplementary Fig. 5). We then sorted the TSSs within each class according to their average nucleosome occupancy in ESCs in the region at −500 to 500 bp around the TSS and compared these to MEFs while keeping the same ordering. The overall pattern remained very similar between ESCs and MEFs for bivalent (Fig. 5b) and HCG promoters (Fig. 5d) but changed for a large number of LCG promoters (Fig. 5f).
To test whether nucleosome occupancy changes in the region at −500 to 500 bp were linked to gene expression changes between ESCs and MEFs, we conducted a correlation analysis of the corresponding log2 ratios (Supplementary Tables 3 and 4). In this analysis we found no simple correlation between the two parameters except for LCG promoters. In addition, for the two small groups of ESC bivalent promoters that had transcripts detected by RNA-seq in ESCs but not in MEFs (36 genes) and ESC H3K27me3 promoters that were found to be expressed in MEFs but not in ESCs (20 genes), the data were indicative of an increase of nucleosome occupancy at the TSS during silencing.
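A correlation analysis of log2 ratios of this kind could be sketched as follows. The text does not name the correlation coefficient, so the use of the Pearson coefficient is an assumption, as are the function names and the toy values; all inputs are assumed to be strictly positive.

```python
import math


def log2_ratios(values_mef, values_esc):
    """Per-promoter log2(MEF / ESC) ratios; inputs must be positive."""
    return [math.log2(m / e) for m, e in zip(values_mef, values_esc)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: occupancy rises exactly where expression falls.
occ_change = log2_ratios([2.0, 1.0, 0.5, 0.25], [1.0, 1.0, 1.0, 1.0])
expr_change = log2_ratios([0.5, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])
r = pearson(occ_change, expr_change)
```

A coefficient close to −1 in such a comparison corresponds to the anticorrelated behavior discussed for LCG promoters, whereas values near 0 correspond to the absence of a simple relation.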
We further dissected the relation between nucleosome occupancy and gene expression changes for subgroups of bivalent (H3K4me3 and H3K27me3) promoters in ESCs. In NPCs these resolve into promoters that either carry only the H3K4me3 or only the H3K27me3 mark34,35. For the averaged profiles, we did not observe substantial changes of nucleosome occupancy (Supplementary Fig. 5b). Clustering of bivalent promoters according to their nucleosome occupancy pattern revealed one group, designated as cluster I, which showed a well-positioned nucleosome occupancy peak in the region from −500 to −350 bp upstream of the TSS (Fig. 5g). This nucleosome was preferentially removed in both NPCs and MEFs (Fig. 5h,i) with a concomitant average gene expression increase of 2.2-fold in NPCs and 4.6-fold in MEFs, similar to the average gene expression change observed for all bivalent promoters. Thus, activation of cluster I genes could involve the complete removal of a nucleosome instead of changing its associated histone modifications. Within this cluster, gene ontology categories were enriched that are associated with differentiated cell function (blood-vessel development, 9 genes, P < 4.7 × 10−3; positive regulation of transcription and gene expression, 11 genes, P < 4.1 × 10−2; cell migration and cell motility, 7 genes, P < 7.8 × 10−2; axonogenesis, neuron projection morphogenesis and cell morphogenesis, 6 genes, P < 7.9 × 10−2; Supplementary Table 5).
Histone modification–dependent nucleosome occupancies
Next, we investigated whether nucleosome occupancies changed between transcriptionally active or inactive chromatin regions. Histone modifications were determined by ChIP-seq, and well-defined peaks (P < 10−5) were selected. We identified about 10,000 clusters for each histone mark studied. These included the bona fide repressive trimethylation modification of histone H3 at lysine residue 9 (H3K9me3), as well as the permissive acetylation of histone H3 at either lysine residue 9 (H3K9ac) or at lysine 27 (H3K27ac) in ESCs and MEFs. Average nucleosome occupancy patterns in ESCs, NPCs and MEFs were calculated around the centers of these clusters (Fig. 6). We found that H3K9me3 clusters displayed increased nucleosome occupancy (Fig. 6a), whereas H3K9ac and H3K27ac clusters showed the opposite trend (Fig. 6b,c). H3K27ac sites common to ESCs and MEFs were substantially depleted of nucleosomes in both ESCs and MEFs (Fig. 6d). However, genomic positions that were acetylated in ESCs but not in MEFs had a pronounced peak of nucleosome occupancy in MEFs. The same was true for the MEF-specific H3K27ac sites: these positions had increased nucleosome occupancy in ESCs, where they were not acetylated. Thus, histone modifications could mark nucleosomes for changes in their density along the DNA in the corresponding regions.
Increase of nucleosome repeat length during differentiation
An essential parameter that describes the primary chromatin organization is the nucleosome repeat length (NRL), which is the average distance between two neighboring nucleosomes. We determined NRLs according to a previously described method9. It is based on calculating the frequency of nucleosome distances between the starts of all mononucleosomal DNA fragments and then analyzing the preferred distances between the nearest-neighbor nucleosomes, next-nearest neighbors, etc. The resulting plots yielded well-defined peaks for the preferred internucleosome distances (Fig. 7a). From the plot of peak position and corresponding nucleosome number, we obtained values of 186.1 ± 0.4 bp (ESCs), 193.1 ± 0.6 bp (NPCs) and 191.1 ± 0.5 bp (MEFs): that is, the average NRL increased by 5–7 bp during differentiation (Fig. 7b). On the basis of previous findings, the changes in NRL could involve a change in the molar ratio of linker to core histones37. We found that for only the linker histone variants H1.0 and H1.7, gene expression in relation to that of the core histones was raised substantially in both NPCs and MEFs (Supplementary Table 1). Accordingly, these variants might be particularly important for inducing an NRL increase during differentiation.
Nucleosome position distances at specific genomic loci displayed large local variations from these average NRLs, for example at the TSS and TTS, due to the binding of other protein factors. Moreover, in a 4-kb region of orderly packed nucleosomes around CTCF binding sites, the NRL was reduced to 177.5 ± 1.5 bp for ESCs and 179.4 ± 0.7 bp for MEFs: that is, both values were ~10 bp smaller than the corresponding genome-wide NRLs (Fig. 7c,d).
Our analysis of nucleosome positioning in mouse ESCs in comparison to their lineage-committed NPC and MEF counterparts revealed distinct profiles at functional genomic elements that are relevant for cell differentiation. The analysis of nucleosome occupancy at transcription factor–binding sites indicated the presence of different chromatin interaction mechanisms (Figs. 1,2 and 3); some binding sites were constitutively depleted of nucleosomes in all three cell types (Fig. 2a). This might be an important feature of a certain set of constitutive sites that are always competent for transcription-factor binding if the appropriate factor becomes expressed. In contrast, for other transcription factors the nucleosome occupancy at target sites showed little correlation with transcription-factor binding, which suggested that these transcription factors can bind to nucleosomal DNA or that binding occurs in only a small fraction of the cells at any given point in time (Fig. 2d). In a third group, transcription-factor and histone-octamer binding appeared to be in a competitive equilibrium, in which an increase of the transcription-factor concentration during development could be sufficient to displace the nucleosome from a position that interferes with binding to result in occupancy profiles like those observed in Figure 2b,c. In addition, chromatin-remodeling complexes like Chd7 and Brg1 (Fig. 2d) could bind to a nucleosome and actively translocate it along the DNA to a new position, as discussed previously23. Because the DNA occupancy of developmental transcription factors is highly predictive of gene expression in mouse ESCs38, any modulation of this parameter by nucleosome occupancy at a given transcription factor–binding site could directly influence the expression of the target genes.
A particularly interesting example of the complex relationship between protein binding and nucleosome positioning was revealed here for CTCF (Fig. 3 and Supplementary Fig. 2). Consistent with the previous findings39, the CTCF binding sites in ESCs were located in regions with reduced nucleosome occupancy and acted as a nucleosome boundary element to position adjacent nucleosomes. Notably, some CTCF binding sites specific for MEFs appear to be predisposed in ESCs for later CTCF binding (Fig. 3a). In contrast, CTCF binding sites unique to ESCs became occupied with nucleosomes in MEFs, which would be consistent with the dissociation of CTCF and possibly related to its decreased expression in MEFs (Supplementary Table 1). As reported previously, CTCF binding sites in general have an intrinsically high affinity for the histone octamer, which could promote their incorporation into a nucleosome if CTCF dissociates9. Unexpectedly, a fraction of CTCF proteins in MEFs was apparently associated with nucleosomes (Supplementary Fig. 2a,b), and for CTCF binding sites located within enhancers, no regular nucleosome positioning pattern was detected (Supplementary Fig. 2c). Thus, our results suggest that multiple modes of CTCF interaction with chromatin exist, which might involve other protein factors or RNAs that mediate CTCF binding to nucleosomal DNA at certain sites during differentiation.
In order to investigate the gene-specific functions of nucleosome positioning, we conducted an analysis of nucleosome occupancy at the TSS (Figs. 4a and 5). The profile of the nucleosome-depleted region varied for different classes of promoters that were selected either on the basis of the presence of the H3K4me3 and H3K27me3 histone modifications or the CpG content of the DNA sequence (Fig. 5 and Supplementary Fig. 5). For the majority of promoters, a nucleosome-depleted region centered around +100 bp was present that became more pronounced for highly active genes. This profile might be related to the presence of RNA polymerase II that has been mapped recently at the promoters of mouse ESCs40. RNA polymerase II could be present in either a transcriptionally engaged form or bound in a stalled state that requires additional factors for initiation of transcription36. Notably, the transcript ends displayed a decrease of nucleosome occupancy toward the gene body for the most active genes (Fig. 4b).
The average TSS nucleosome pattern determined here was similar to nucleosome profiles reported previously for mouse hepatocytes15 and selected human promoters9,10,41 that were also characterized by a rather broad nucleosome-depleted region. This pattern is markedly different from those previously reported for simpler eukaryotes, in which a single nucleosome was missing upstream of the TSS and downstream of the TTS, and this was followed by an oscillatory pattern of several regularly positioned nucleosomes5,6,7,28,29. In addition, a nucleosome-depleted region extending into the gene body was found at the TTSs, and it became more pronounced with increased gene expression (Fig. 4b). This feature could be related to a coupling of the TTS with 3′ polyadenylation of the transcript42.
When evaluating all promoters within one cell type, we found that an increased nucleosome occupancy in the region of 500 bp around the TSS was correlated with a reduction of transcription (Supplementary Table 2). Although no corresponding anticorrelated changes of promoter nucleosome occupancy and gene expression were found between ESCs and MEFs for the majority of genes (Supplementary Tables 3 and 4), we identified certain specific groups of promoters that displayed such a behavior. This qualifies them as potential candidates for a regulatory mechanism that would operate through nucleosome repositioning during differentiation. For example, the bivalent promoters in ESCs contained a group of promoters with a specific nucleosome occupancy profile. In this cluster, the loss of a nucleosome at position −500 to −350 bp upstream of the TSS in both NPCs and MEFs was accompanied by an increase in gene expression (Fig. 5g–i). In addition, other ESC bivalent and H3K27me3-only promoters as well as LCG promoters displayed anticorrelated relations between nucleosome occupancy and gene expression changes during differentiation (Supplementary Tables 3 and 4).
These changes might be directly related to the addition or removal of certain histone marks, as concluded from our analysis of exemplary histone modifications with respect to nucleosome occupancy (Fig. 6). H3K9me3 clusters in ESCs were found to be nucleosome enriched, whereas H3K9ac and H3K27ac clusters were nucleosome depleted. H3K9me3 and H3K9ac are particularly notable because H3K9me3 is highly correlated with local mutation rates in cancer cells, whereas H3K9ac is strongly anticorrelated43. Together with recent findings that chromatin regions characterized by different histone modifications vary in their nucleosome repeat length9, our data link histone modifications with important structural functions with respect to nucleosome positioning and occupancy. As reviewed recently, chromatin-remodeling complexes recognize a variety of histone modifications24. Thus, it is tempting to speculate that these molecular machines are involved in changes of nucleosome occupancy at transcription factor–binding sites and promoters after marking a given nucleosome with specific histone modification signals.
Finally, we observed genome-wide changes of the primary chromatin structure during cell differentiation, as reflected in the increase of the average NRL from 186.1 ± 0.4 bp in ESCs by 7 bp in NPCs and by 5 bp in MEFs (Fig. 7). As reported previously, this NRL change is related to an increase in the ratio of linker to core histones37. Although mouse ESCs have a histone H1/nucleosome ratio of 0.46 (ref. 37), this parameter increases to 0.75–0.83 in various differentiated mouse tissues44,45. The upregulated gene expression of linker histone variants H1.0 and H1.7 in NPCs and MEFs versus ESCs (Supplementary Table 1) indicates that these factors might be particularly important for the change in the NRL. Notably, the 5–7-bp difference in NRL observed between ESCs and the differentiated NPCs and MEFs could have large effects on the folding properties of the nucleosome chain because the helical phasing of the DNA double helix would relocate neighboring nucleosomes by a torsional angle of about 36° per additional base pair. In agreement with this view, large differences in the chromatin folding properties as a function of NRL have been observed experimentally, as reported, for example, in ref. 46. In addition, our calculation for selected short genomic regions revealed that the NRL shows local variation, for example in the 4-kb region surrounding CTCF binding sites, which had an ~10 bp smaller NRL than the genome-wide average value.
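As a rough worked example of the phasing argument, assuming about 10 bp per helical turn (which is what the figure of 36° per base pair implies), the relative rotation between neighboring nucleosomes for the observed NRL changes would be approximately:

```latex
\Delta\varphi \approx \Delta\mathrm{NRL} \times \frac{360^\circ}{10\ \mathrm{bp}}
             = \Delta\mathrm{NRL} \times 36^\circ/\mathrm{bp},
\qquad
\Delta\varphi(5\ \mathrm{bp}) \approx 180^\circ,
\quad
\Delta\varphi(7\ \mathrm{bp}) \approx 252^\circ .
```

Under this simplified assumption, a 5-bp increase would place a neighboring nucleosome on roughly the opposite face of the DNA helix.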
In summary, we identified a number of substantial rearrangements of nucleosome positions at different functional genomic elements like transcription factor–binding sites and promoters that are likely to modulate protein binding to these regions. In addition, global changes of nucleosome density that occur throughout the genome during cell lineage commitment of mouse ESCs could also affect the DNA accessibility by changing the folding of the nucleosome chain. Accordingly, we conclude that the cell type–specific organization of nucleosomes on identical genomes represents an additional regulatory layer that controls DNA access of protein factors for selecting tissue-specific gene expression programs.
Isolation of nucleosomes.
ESCs from 129P2/Ola mice47 were cultured in ESGRO complete medium (Millipore). Differentiation of ESCs into neuronal precursors was induced by formation of embryoid bodies in embryoid body–formation medium (Millipore) and treatment with 5 μM retinoic acid for 4 d. Neuronal embryoid bodies were dissociated and seeded on Matrigel (BD Biosciences) in neuronal stem cell medium (PAN) for 4 d. MEFs were generated from pregnant 129P2/Ola E13.5 mice and cultured in DMEM supplemented with 10% FCS and glutamine for up to passage 5. For MNase digestion, cells were harvested and resuspended in low-salt buffer (10 mM HEPES, pH 8, 10 mM KCl, 0.5 mM DTT) at 4 °C. After disruption of the cells with a dounce homogenizer, the nuclei were collected by centrifugation and washed once with the MNase Buffer (10 mM Tris-HCl, pH 7.5, 10 mM CaCl2), resuspended in the MNase Buffer and digested with 0.5 units MNase (Fermentas) per microliter and incubated for 6–11 min at 37 °C. The MNase digestion was stopped by putting the samples on ice and adding EDTA to a concentration of 10 mM. After digestion with 0.1 μg μl−1 RNase A (Fermentas) and removal of protein by phenol-chloroform extraction, the DNA was ethanol precipitated, and the resulting DNA pellet was dissolved in H2O. DNA fragments corresponding to mononucleosomes or dinucleosomes were separated on a 2% agarose gel by using an E-Gel electrophoresis system (Life Technologies). The libraries for sequencing were prepared according to the standard protocol for the Illumina HiSeq2000 sequencing platform.
Deep sequencing of nucleosomal DNA.
High-throughput paired-end sequencing of at least 50-bp read length was performed on the Illumina HiSeq2000 platform at the DKFZ sequencing core facility in Heidelberg, Germany. We mapped about 150 million nucleosome positions per sequencing reaction and used in the final analysis three biological-replicate experiments for ESCs and two replicate experiments for each of NPCs and MEFs, yielding a total of 300 million–450 million nucleosome positions per cell type. In line with the previous studies48,49, we observed that the chromosome-wide nucleosome density was dependent on the average GC content, which was anticorrelated with the MNase preferences found with purified genomic mouse DNA (Supplementary Fig. 1, Supplementary Note). For mapping the position of individual nucleosomes, MNase sequence preferences were found to be negligible, in agreement with a recent study50. Following previous findings described in refs. 32 and 51, we checked the dependence of the nucleosome maps on the level of MNase digestion. Using slightly different levels of MNase digestions in four replicate experiments in ESCs, we obtained average mononucleosome fragment lengths around 150 bp, 155 bp, 160 bp and 180 bp, with ~150 million mapped reads in each reaction. The changes of the integral parameters such as the nucleosome repeat length during the cell differentiation were found to be independent of the degree of MNase digestion. However, in line with previous studies31,48,50, different levels of MNase digestion affected nucleosome distributions, with individual nucleosome peaks sometimes missing or appearing, without clear indications that one of the samples with 150-bp, 155-bp or 160-bp average length was a better representation of the situation in vivo. For MNase digestion with 180-bp average fragment length, a large fraction of the linker remained undigested, leading to a largely reduced coverage of individual nucleosome positions. 
Accordingly, the nucleosome occupancy maps used here were generated by combining only samples with MNase digestions that had an average mononucleosome fragment length between 150 and 160 bp.
Data analysis of nucleosome occupancies.
DNA reads were aligned to the mm9 assembly of the mouse genome with Bowtie52, reporting only unique hits with up to two mismatches. The nucleosome occupancy maps were calculated with custom-made Perl scripts by counting how many reads covered a given DNA base pair (Supplementary Fig. 1b). Sites with anomalously high coverage were considered artifacts and excluded from the analysis. No further peak calling or smoothing was conducted. In addition, no assumptions about the length of the nucleosomal DNA had to be made to derive the nucleosome occupancy maps, as both boundaries of each nucleosome were determined by paired-end sequencing. The nucleosome signatures at transcription factor–binding sites, TSSs and TTSs were calculated as the sum of nucleosome occupancies in a window of –2,000 to +2,000 bp around a given site. For each gene, the sum of reads was normalized to 1. The averaged nucleosome profile was then normalized to yield a nucleosome occupancy equal to 1 at position –2,000 bp53. For nucleosome alignment around CTCF binding sites, we analyzed the data set from a previous study17 downloaded from the GEO archive (GSE27944) and the data sets from ENCODE18 downloaded from the UCSC Genome Browser (accession codes: wgEncodeEM001703 and wgEncodeEM001698), which resulted in qualitatively similar patterns. Only the patterns obtained with ENCODE data are reported here. For nucleosome alignment around p300 sites, we used the ENCODE data set wgEncodeLicrTfbsEsb4P300ME0C57bl6StdPk. For nucleosome alignment around binding sites of 12 developmental transcription factors, we used the data set from ref. 16, which was initially mapped to the mm8 genome build and converted to mm9 by using the liftOver tool of the UCSC Genome Browser. The histone-modification data from refs. 34 and 35 were also converted from mm8 to mm9 before the analysis. Brg1 ChIP-seq data from ref. 21 (GEO archive GSE14344) were reclustered with MACS54 using a P = 10⁻⁵ cutoff for peak detection. Chd7 ChIP-seq data were from ref. 20 (GEO archive GSM558674). DNase I–hypersensitivity raw data from the latter study were provided by the authors and mapped and clustered as described above.
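The occupancy and profile calculations described in this section can be illustrated with a minimal Python sketch. It is not the Perl code used in the study. The function names, the 0-based half-open fragment coordinates and the single-chromosome input are assumptions made for illustration, and strand orientation of genes is ignored.

```python
def occupancy_map(fragments, chrom_length):
    """Per-base occupancy: number of paired-end fragments covering each position.

    fragments: iterable of (start, end) pairs, 0-based, end exclusive (assumed format).
    """
    diff = [0] * (chrom_length + 1)
    for start, end in fragments:
        diff[start] += 1   # coverage rises at the fragment start
        diff[end] -= 1     # and falls just past the fragment end
    occupancy, running = [], 0
    for step in diff[:chrom_length]:
        running += step
        occupancy.append(running)
    return occupancy


def aggregate_profile(occupancy, sites, flank=2000):
    """Average occupancy around sites, scaled to 1 at the leftmost position.

    Each site's window is first normalized so that it sums to 1, the
    windows are averaged, and the average is divided by its value at -flank.
    """
    width = 2 * flank + 1
    total = [0.0] * width
    used = 0
    for site in sites:
        lo, hi = site - flank, site + flank + 1
        if lo < 0 or hi > len(occupancy):
            continue  # skip windows that run off the chromosome
        window = occupancy[lo:hi]
        window_sum = sum(window)
        if window_sum == 0:
            continue  # skip sites without any coverage
        for i, value in enumerate(window):
            total[i] += value / window_sum
        used += 1
    if used == 0:
        raise ValueError("no usable sites")
    mean = [t / used for t in total]
    if mean[0] == 0:
        raise ValueError("zero occupancy at the reference position")
    return [m / mean[0] for m in mean]
```

Because fragment boundaries come directly from the paired-end alignments, the sketch needs no assumed nucleosome length. A real analysis would additionally flip windows for genes on the minus strand.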
For nucleosome alignment around the TSS and TTS, we used the Eldorado gene annotation provided in the Genomatix Genome Analyzer software (Genomatix)55. Alignments with the RefSeq gene annotation resulted in similar patterns. Nucleosome-occupancy cluster plots for visualizing multiple transcripts were generated in Matlab (Mathworks). These profiles were based on the average occupancy at the TSS in the region from position –500 to +500 bp. Hierarchical clustering was performed with Ward's minimum-variance method as implemented in Matlab, which computes mutually exclusive groups of occupancy profiles with minimum within-cluster variance56. The resulting clusters were analyzed with the DAVID gene annotation clustering tool57.
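To make the clustering criterion concrete, the following Python sketch implements a naive version of Ward's agglomerative method on a small set of profiles. It is an illustration only and not the Matlab implementation used here. The function name `ward_clusters` and the choice to stop at a fixed number of clusters are assumptions.

```python
def ward_clusters(profiles, n_clusters):
    """Naive Ward agglomeration: repeatedly merge the two clusters whose
    fusion gives the smallest increase in within-cluster sum of squares.

    profiles: list of equal-length numeric lists (one occupancy profile each).
    Returns a list of clusters, each a sorted list of profile indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(profiles)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, centroid_a = clusters[a]
                members_b, centroid_b = clusters[b]
                n_a, n_b = len(members_a), len(members_b)
                dist2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
                # Ward's merge cost: increase in within-cluster variance.
                cost = n_a * n_b / (n_a + n_b) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        n_a, n_b = len(members_a), len(members_b)
        merged = [(n_a * x + n_b * y) / (n_a + n_b)
                  for x, y in zip(centroid_a, centroid_b)]
        clusters[a] = (members_a + members_b, merged)
        del clusters[b]
    return [sorted(members) for members, _ in clusters]
```

This version scales cubically with the number of profiles, so it is suitable only for small examples. Library implementations use update formulas that avoid recomputing all pairwise costs.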
The nucleosome repeat length (NRL) was calculated essentially as described previously9, with the following modifications. A histogram of the number of occurrences of distances from 1 to 3,000 bp between all nucleosomes was computed and smoothed with a 50-bp window. The NRL was then determined from a linear fit of the detected peak positions versus the peak number. Up to 12 peaks could be identified in this analysis. To make the calculations more robust, a threshold of 20 reads was set to limit the contribution of artificially enriched nucleosome fragments starting at a given genomic position: such nucleosomes were given a statistical weight of 20. Less-abundant nucleosome fragments entered the calculation with weights equal to their number of occurrences in the high-throughput sequencing data.
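The steps of the NRL calculation can be sketched in Python as follows. This is a simplified illustration, not the code used in the study. The function names, the moving-average smoothing, the simple local-maximum peak finder with a `min_sep` parameter, and the use of the product of the two capped weights for each pair of positions are all assumptions.

```python
from collections import Counter


def distance_histogram(starts, max_dist=3000, cap=20):
    """Weighted histogram of distances between fragment start positions.

    Positions hit by more than `cap` fragments get weight `cap`; each pair of
    positions contributes the product of its two weights (an assumption).
    """
    counts = Counter(starts)
    positions = sorted(counts)
    weights = [min(counts[p], cap) for p in positions]
    hist = [0.0] * (max_dist + 1)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = positions[j] - positions[i]
            if d > max_dist:
                break
            hist[d] += weights[i] * weights[j]
    return hist


def smooth(hist, window=50):
    """Moving average over a window of roughly `window` bp."""
    half = window // 2
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - half), min(len(hist), i + half + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def find_peaks(signal, min_sep=100):
    """Positions that are the maximum within +/- min_sep bp (ties: leftmost)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        lo, hi = max(0, i - min_sep), min(len(signal), i + min_sep + 1)
        if signal[i] > 0 and signal[i] == max(signal[lo:hi]):
            if not peaks or i - peaks[-1] > min_sep:
                peaks.append(i)
    return peaks


def repeat_length(peaks):
    """Slope of the least-squares line through (peak number, peak position)."""
    n = len(peaks)
    if n < 2:
        raise ValueError("need at least two peaks for a linear fit")
    xs = range(1, n + 1)
    mean_x, mean_y = sum(xs) / n, sum(peaks) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, peaks))
    return sxy / sxx
```

Using the slope of the fit rather than the position of the first peak makes the estimate insensitive to a constant offset in the peak positions.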
Expression profiling by RNA sequencing.
Total RNA was purified and prepared for sequencing as described previously58. Sequencing was performed on Illumina platforms. RNA reads were aligned with TopHat59. Further expression analysis was performed with the Genomatix software using the most recent Eldorado gene annotation55. For each transcript, a normalized expression value was calculated from the read distribution, accounting for differences in transcript length and in the number of mapped reads. The program DESeq60 was used for the analysis of differential expression (Supplementary Tables 6 and 7).
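As an illustration of what normalizing for transcript length and sequencing depth means, the Python sketch below computes an RPKM-style value. This is an assumption chosen for illustration and not necessarily the formula implemented in the Genomatix software. The function name, the scale factor and the use of the summed counts as the number of mapped reads are likewise hypothetical.

```python
def normalized_expression(read_counts, lengths_bp, scale=1e9):
    """RPKM-style value: scale * reads / (transcript length * total reads).

    read_counts: dict mapping transcript id to number of mapped reads.
    lengths_bp:  dict mapping transcript id to transcript length in bp.
    The total number of mapped reads is taken as the sum of read_counts,
    which is a simplification.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {
        transcript: scale * count / (lengths_bp[transcript] * total)
        for transcript, count in read_counts.items()
    }
```

With this normalization, a transcript that is three times longer and collects three times as many reads receives the same value as the shorter one, which is the intended length correction.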
For each sample, 1 × 10⁶ cells were cross-linked with 1% PFA, and cell nuclei were prepared by using a swelling buffer (25 mM HEPES, pH 7.8, 1 mM MgCl2, 10 mM KCl, 0.1% NP-40, 1 mM DTT). Chromatin was sheared to mononucleosomal fragments. After IgG preclearance, the sheared chromatin was incubated overnight with 4 μg of antibodies against either H3K9ac (Abcam, ab4441), H3K27ac (Abcam, ab4729) or H3K9me3 (Abcam, ab8898). After washes with sonication buffer (10 mM Tris-HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5% N-lauroylsarcosine, 0.1% Na-deoxycholate), high-salt buffer (50 mM HEPES, pH 7.9, 500 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS), lithium buffer (20 mM Tris-HCl, pH 8.0, 1 mM EDTA, 250 mM LiCl, 0.5% NP-40, 0.5% Na-deoxycholate) and 10 mM Tris-HCl, chromatin was eluted from the protein G magnetic beads and the cross-link was reversed overnight. After RNase A and proteinase K digestion, DNA was purified and cloned into a barcoded sequencing library for the Illumina HiSeq2000 sequencing platform. Single reads of 50-bp length were mapped with Bowtie and clustered with MACS54, using a P-value cutoff of 10⁻⁵.
MNase-seq, ChIP-seq and RNA-seq data have been deposited to the GEO database under the accession number GSE40896.
We are grateful to A. Valouev and R. Chereji for help with the algorithms for calculations of NRL and average TSS patterns, respectively; to M. Gerstein for advice on data processing; to G. Längst, G. Wedemann and K. Fejes Tóth for discussions; and to the Deutsches Krebsforschungszentrum Sequencing Core Facility for conducting the sequencing. This work was funded within project EpiGenSys by the German Federal Ministry of Education and Research (BMBF) as a partner of the ERASysBio+ initiative in the EU FP7 ERA-NET Plus program through grant 0315712A to K.R. Computational resources and data storage were provided by grants from the BMBF (01IG07015G, Services@MediGRID) and the German Research Foundation (DFG INST 295/27-1). V.B.T. acknowledges the support from the Heidelberg Center for Modeling and Simulation in the Biosciences and a Deutsches Krebsforschungszentrum intramural grant, and Y.V. was supported by BMBF MedSys grant 0315409E to T.H.
Cell Reports (2019)