DNA sequence-dependent formation of heterochromatin nanodomains

The mammalian epigenome contains thousands of heterochromatin nanodomains (HNDs) marked by di- and trimethylation of histone H3 at lysine 9 (H3K9me2/3), which have a typical size of 3–10 nucleosomes. However, what governs HND location and extension is only partly understood. Here, we address this issue by introducing the chromatin hierarchical lattice framework (ChromHL) that predicts chromatin state patterns with single-nucleotide resolution. ChromHL is applied to analyse four HND types in mouse embryonic stem cells that are defined by histone methylases SUV39H1/2 or GLP, transcription factor ADNP or chromatin remodeller ATRX. We find that HND patterns can be computed from PAX3/9, ADNP and LINE1 sequence motifs as nucleation sites and boundaries that are determined by DNA sequence (e.g. CTCF binding sites), cooperative interactions between nucleosomes as well as nucleosome-HP1 interactions. Thus, ChromHL rationalizes how patterns of H3K9me2/3 are established and changed via the activity of protein factors in processes like cell differentiation.


Software and code
Data collection
ATRX ChIP-seq libraries were generated with the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, NEB #E7370). Sequencing was done on the Illumina HiSeq 2000 platform. Sequencing reads were aligned with the Bowtie2 software (bowtie-bio.sourceforge.net/bowtie2/index.shtml), allowing for up to 2 mismatches.
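A mismatch threshold of this kind could, for instance, be applied after alignment by reading the `XM:i` optional field that Bowtie2 writes to each aligned SAM record (the number of mismatches in the alignment). The sketch below is only one possible way to do this and is an assumption, not the procedure used in the study; the function names `mismatch_count` and `filter_by_mismatches` and the toy records are hypothetical.

```python
def mismatch_count(sam_line):
    """Return the XM:i value of a SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("XM:i:"):
            return int(tag[5:])
    return None  # e.g. unaligned reads carry no XM:i tag


def filter_by_mismatches(sam_lines, max_mismatches=2):
    """Keep header lines and aligned records with at most max_mismatches."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines are passed through
            kept.append(line)
            continue
        n = mismatch_count(line)
        if n is not None and n <= max_mismatches:
            kept.append(line)
    return kept


# Hypothetical toy records (not data from the study)
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXM:i:0",
    "read2\t0\tchr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:-18\tXM:i:3",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU",
]
kept = filter_by_mismatches(example)
```

In this toy input only the header and `read1` pass the filter: `read2` carries three mismatches and `read3` is unaligned.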

April 2020
Data
Data from the ATRX knockout experiments generated in this study are deposited in the GEO database under accession number GSE158744. Previously published datasets used in our analysis are available in GEO entries GSE40086, GSE54412, GSE97945, GSE61874, GSE57092, GSE40910, GSE82127, GSE29184 and GSE30206 as detailed below.
For a given HND type, differential H3K9me3 or H3K9me2 peaks were called with MACS2 based on the WT and KO datasets of Suv39h1/h2 (GSE40086), Glp (GSE54412), and ATRX (GSE158744). ADNP-associated HNDs were defined as the intersection of ADNP-bound ChIP-seq peaks with all H3K9me3 peaks in wild type ESCs from these experiments (GSE97945). For H3K9me3 in NPCs, we used datasets GSE61874 and GSE57092 with peak calling performed by MACS for Fig. S15 as well as EPIC for Fig. 6B. The nucleosome repeat length (NRL) was determined using NucTools based on the previously published MNase-seq dataset (GSE40910). The dyad-dyad differences were computed using the chemical mapping dataset (GSE82127). In addition, we used published ChIP-seq datasets of CTCF (GSE29184) and HP1 (GSE57092), CpG methylation (GSE30206), and H3K4me1 and H3K27me3 histone modifications (GSE29184).
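The intersection step used to define ADNP-associated HNDs can be illustrated with a minimal interval-overlap sketch. It assumes peaks are given as half-open `(chrom, start, end)` tuples and returns the overlapping segments; whether the study kept the overlapping segment or the whole peak is not stated here, so this choice is an assumption. The function name `intersect_peaks` and the coordinates are hypothetical.

```python
from collections import defaultdict


def intersect_peaks(peaks_a, peaks_b):
    """Return overlapping segments between two lists of (chrom, start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, a_start, a_end in sorted(peaks_a):
        # Simple all-against-all comparison per chromosome; adequate for a
        # sketch, a sweep or interval tree would scale better.
        for b_start, b_end in sorted(by_chrom.get(chrom, [])):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # non-empty overlap
                overlaps.append((chrom, start, end))
    return overlaps


# Hypothetical toy peaks (not data from the study)
adnp_peaks = [("chr1", 100, 500), ("chr2", 50, 150)]
h3k9me3_peaks = [("chr1", 400, 900), ("chr1", 1000, 1200), ("chr2", 200, 300)]
adnp_hnds = intersect_peaks(adnp_peaks, h3k9me3_peaks)
```

With these toy inputs, only the chr1 peaks overlap, giving the single segment `("chr1", 400, 500)`.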


Life sciences study design

Sample size
For the ATRX ChIP-seq, two independently generated samples were analyzed for each of the four sample types (input, IgG, H3 and H3K9me3) for both wild-type and ATRX knockout cells.
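The resulting design can be enumerated explicitly as a quick consistency check: four sample types in two genotypes give eight readouts, and two independent samples per readout give 16 libraries. The sketch below only restates that arithmetic; the variable names are hypothetical.

```python
from itertools import product

sample_types = ["input", "IgG", "H3", "H3K9me3"]
genotypes = ["wild-type", "ATRX knockout"]
replicates = [1, 2]

# Every combination of genotype, sample type and replicate is one library
libraries = list(product(genotypes, sample_types, replicates))

# A readout is a genotype/sample-type pair, irrespective of replicate
readouts = {(genotype, sample) for genotype, sample, _ in libraries}

n_libraries = len(libraries)
n_readouts = len(readouts)
```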
Data exclusions
No data were excluded from the analysis.

Replication
All eight different ATRX ChIP-seq readouts were repeated once. Replication was successful as shown in Supplementary Figure S1.
Randomization
No randomization was used. The analysis of ATRX by ChIP-seq was observational and studied the effect of the ATRX knockout.

Blinding
Investigators were not blinded during data collection and analysis.

Reporting for specific materials, systems and methods

Validation
The antibodies used in this study are commercially available, and the manufacturers describe their validation procedures on the following pages:
H3K9me3 (Abcam), https://www.abcam.com/histone-h3-tri-methyl-k9-antibody-chip-grade-ab8898.html: ChIP-grade, validated by the manufacturer for ChIP analysis and previously used in the publications listed on the web page.
Rabbit IgG (R&D Systems), https://www.rndsystems.com/products/normal-rabbit-igg-control_ab-105-c#product-citations: control for unspecific binding, validated by the manufacturer and previously used in the publications listed on the web page.
H3 rabbit polyclonal, ChIP-grade (Abcam, ab1791, lot GR103864-1), https://www.abcam.com/histone-h3-antibody-nuclearmarker-and-chip-grade-ab1791.html: validated by the manufacturer for ChIP analysis and previously used in the publications listed on the web page.
The study does not use commonly misidentified cell lines.

ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.

Methodology
Replicates
ChIP-seq and sequencing of the corresponding input samples were performed with two replicates in each of the two conditions (wild-type and ATRX knockout).