## Main

Hematopoietic stem cells (HSCs) reside in the bone marrow (BM) and replenish diverse blood cell types1,2. During differentiation, hematopoietic stem and progenitor cells (HSPCs) restrict their potential to fewer lineages to yield mature blood cells3. These cell fate decisions have recently been dissected through single-cell mRNA sequencing (scRNA-seq) technologies4,5,6.

The regulation of gene expression partially relies on post-translational modifications of histones that modulate chromatin activity7,8. Chromatin dynamics during hematopoiesis have been analyzed for accessible regions in single cells9,10 and active chromatin marks in sorted blood cell types11. Although the role of repressive chromatin has been characterized in embryonic stem cells12,13,14,15 and early development16,17,18, repressive chromatin states during hematopoiesis have remained unexplored.

The following two repressive chromatin states have a major role in gene regulation: a polycomb-repressed state, marked by H3K27me3 at gene-rich regions19,20, and a heterochromatin state mainly found in gene-poor regions, marked by H3K9me316. Conventional techniques to detect histone modifications involve chromatin immunoprecipitation (ChIP), which relies on affinity-purification of histone–DNA complexes. As immunoprecipitations are not feasible for single cells individually, protocols were developed that fragment and barcode single cells before pooling them for immunoprecipitation21,22,23. Alternatives to ChIP24 circumvent this affinity-purification by using antibody tethering of either protein A-micrococcal nuclease (pA-MN)24,25,26,27,28 or protein A-Tn5 transposase29,30,31,32,33,34 that produce recoverable fragments only at the site of interest. Although these strategies allow profiling of histone modifications in single cells31,32,34, they do not enrich for specific cell types, making it challenging to profile rare cell types, such as HSCs, that contribute about 0.01% of the cells35. Therefore, we develop sort-assisted single-cell chromatin immunocleavage (sortChIC), which combines single-cell histone modification profiling with cell enrichment.

## Results

### SortChIC maps histone modifications in single cells

To detect histone modifications in single cells, we stain surface antigens for cell type recognition, fix cells in ethanol and incubate them with an antibody against a histone modification. We then add pA-MN that binds to the histone-bound antibody at specific regions of the genome where the modification is present (Fig. 1a). Subsequently, single cells in G1 phase of the cell cycle are sorted based on their Hoechst staining into 384 well plates (Extended Data Fig. 1a). Next, MN is activated by adding calcium, allowing MN to digest antibody-proximal internucleosomal DNA regions. Removing the need for purification steps, nucleosomes are digested and genomic DNA fragments are ligated to adapters containing a unique molecular identifier (UMI) and cell-specific barcode. The genomic fragments are amplified by in vitro transcription and PCR and sequenced.

To test sortChIC performance, we apply it to the well-characterized cell line K562, where we map four histone modifications that represent major chromatin states regulating gene expression (Fig. 1b–e). For modifications associated with gene activation, we profile H3K4me1 (Fig. 1b), found at active enhancers and promoters, and H3K4me3 (Fig. 1c), found at promoters of active genes36. For modifications associated with repression, we profile H3K9me3 found in gene-poor regions (Fig. 1d) and H3K27me3 found in gene-rich regions (Fig. 1e)20.

For each histone modification, we process 1,128 G1 phase K562 cells. Using the MN cut site position and UMIs, we map unique MN cut sites. Following filtering, we retain 3,113 cells (Extended Data Fig. 1b) with the large majority of reads falling in peaks identified from pseudobulks (Extended Data Fig. 1c). We compare pseudobulk sortChIC profiles with bulk ChIP-seq results37, which are highly correlated (Pearson correlation > 0.8; Extended Data Fig. 1d–e). Single-cell tracks underneath each average track (Fig. 1b–e) illustrate the high reproducibility of the signal between cells. Of note, the H3K9me3 histone modification profiles obtained from sortChIC represent the heterochromatin state without the need for input normalization (Extended Data Fig. 1f), which is required for ChIP experiments38. Lastly, we compare the sensitivity and specificity of sortChIC with existing methods. To compare sortChIC with pA-MN22,27,28 and Tn5-based methods30,31,32 (Extended Data Fig. 2a–c), we quantify sensitivity and signal specificity (Gini coefficient and signal enrichment). In terms of sensitivity, we find sortChIC to perform better than scChIP-seq and Tn5-based methods. While single-cell chromatin immunocleavage sequencing (scChIC-seq) and indexing single-cell immunocleavage sequencing (iscChIC-seq) have comparable or slightly higher sensitivity (Extended Data Fig. 2b,c, top left panel), both achieve this high signal at the expense of specificity (Extended Data Fig. 2b,c, bottom panels). A caveat for these comparisons is the use of different cell lines, antibodies and primary tissue samples.
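
Two of the quantities used above, unique MN cut sites and the Gini coefficient, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the read record layout and the function names `unique_cut_sites` and `gini` are assumptions made for illustration, and the Gini coefficient is computed here over hypothetical per-bin cut counts.

```python
from collections import defaultdict


def unique_cut_sites(reads):
    """Collapse reads that share cell barcode, cut position, strand and UMI.

    `reads` is an iterable of (cell_barcode, chrom, cut_position, strand, umi)
    tuples; this record layout is hypothetical.
    """
    seen = set()
    per_cell = defaultdict(list)
    for cell, chrom, pos, strand, umi in reads:
        key = (cell, chrom, pos, strand, umi)
        if key in seen:
            continue  # amplification duplicate of a molecule already counted
        seen.add(key)
        per_cell[cell].append((chrom, pos, strand))
    return dict(per_cell)


def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means the signal is spread evenly over all bins; values approaching 1
    mean the signal is concentrated in a few bins.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Toy input: the second read duplicates the first; the third has a new UMI.
reads = [
    ("cell_1", "chr1", 100, "+", "AAC"),
    ("cell_1", "chr1", 100, "+", "AAC"),
    ("cell_1", "chr1", 100, "+", "GGT"),
    ("cell_2", "chr1", 100, "+", "AAC"),
]
cuts = unique_cut_sites(reads)
```

In this toy input, `cell_1` keeps two unique cuts and `cell_2` keeps one. A profile with all signal in one of four bins gives a Gini coefficient of 0.75, whereas an even profile gives 0.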

### Active marks prime HSPCs, H3K27me3 marks mature alternatives

Next, we map active and repressive chromatin changes during blood formation. To equally include rare and common cell types from the mouse BM, we use cell surface markers Sca1, cKit and a set of lineage markers (Lin) to sort whole BM, lineage marker negative (Lin−) and LSK (Lin−Sca1+cKit+) cells that contain HSCs and multipotent progenitors (MPPs) and profile the same set of histone modifications (Extended Data Fig. 3a). Applying Latent Dirichlet Allocation (LDA)39 and visualizing the output with Uniform Manifold Approximation and Projection (UMAP) reveals distinct clusters that contain LSKs, unenriched cell types or mixtures of lineage negative and unenriched cell types (Fig. 2a and Extended Data Fig. 3b). We use the H3K4me3 signal in promoter regions (transcription start site (TSS) ±5 kb) to determine marker genes for eight blood cell types (Fig. 2b). These regions contain known cell-type-specific genes such as the B-cell-specific transcription factor (TF), Ebf1 (Fig. 2c), and the neutrophil-specific gene, S100a8 (Fig. 2d). Specific regions are marked in a cell-type-dependent manner for H3K4me1 and H3K4me3. Conversely, these regions are depleted for H3K27me3 (Fig. 2e). This is exemplified by the TSS of the B-cell-specific TF, Ebf1 (Fig. 2f). Next, we analyze published scRNA-seq data to determine mRNA abundances4 associated with our cell-type-specific promoter regions and confirm that these sets of genes are cell-type-specific (Extended Data Fig. 3c). Interestingly, we find that HSPCs already have H3K4me3 and H3K4me1 signal at the Ebf1 promoter and gene body, suggesting that HSPCs may already have active marks at genes before their expression in different lineages.

We extend the Ebf1 observation to all TSSs in our eight cell-type-specific gene sets defined using H3K4me3, by comparing fold changes in differentiated cell types relative to HSPCs (Extended Data Fig. 3d–f). We find both up- and down-regulation of active chromatin. For example, at B-cell-specific genes, active chromatin levels increase from HSPCs to B cells and plasmacytoid dendritic cells (pDCs) but decrease in basophils/eosinophils, neutrophils and erythroblasts (Extended Data Fig. 3d,e). This divergence occurs in all eight cell-type-specific gene sets, suggesting that cell-type-specific regions in HSPCs already have an intermediate level of active chromatin marks, which are modulated depending on the final cell type.
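
The fold-change comparison relative to HSPCs can be sketched as follows. This is a minimal illustration, assuming depth-normalized pseudobulk signal per cell type at one TSS; the function name `log2_fold_changes`, the pseudocount and the toy values are hypothetical and not taken from the analysis.

```python
import math


def log2_fold_changes(signal, reference="HSPCs", pseudocount=1.0):
    """Log2 fold change of each cell type relative to the reference.

    `signal` maps cell type to a depth-normalized signal at one TSS.
    The pseudocount (an assumption) avoids division by zero.
    """
    ref = signal[reference] + pseudocount
    return {
        cell_type: math.log2((value + pseudocount) / ref)
        for cell_type, value in signal.items()
        if cell_type != reference
    }


# Toy values for a B-cell-specific gene: intermediate in HSPCs,
# higher in B cells, lower in erythroblasts.
toy_signal = {"HSPCs": 3.0, "B cells": 7.0, "Erythroblasts": 0.0}
fold_changes = log2_fold_changes(toy_signal)
```

With these toy values, B cells show a log2 fold change of 1 and erythroblasts of −2, which mirrors the divergence from an intermediate HSPC level described above.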

Repressive H3K27me3 at B-cell-specific genes, by contrast, is upregulated in non-B cells compared to HSPCs, while only a few of them lose H3K27me3 signal upon B-cell differentiation (Extended Data Fig. 3f). Across other cell types, we observe a similar trend where mature cells upregulate H3K27me3 at genes specific for alternative cell fates, likely silencing cell-type-inappropriate genes.

In sum, our analysis of hematopoietic cell-type specific genes shows that in HSPCs active chromatin premarks genes of different blood cell fates, while H3K27me3 repressive chromatin during hematopoiesis silences genes of alternative fates.

### Dynamic H3K9me3 regions reveal HSPCs and three lineages

To understand chromatin regulation in heterochromatic regions, we explore H3K9me3. H3K9me3 analysis reveals the following four clusters: one cluster containing mostly LSKs, one containing mostly unenriched cells and two clusters containing a mixture of unenriched and lineage-negative cells (Fig. 3a,b). Large megabase-scale domains marked by H3K9me3 are constant across cell types; however, smaller regions display cluster-specific signals (Fig. 3c). Analysis of 50 kb regions across the genome identifies 6,085 cluster-specific H3K9me3 regions (q < 10⁻⁹, deviance goodness of fit). These regions have a 62.8 kb median distance to the nearest TSS, while noncluster-specific H3K9me3 regions have a 138 kb median distance to a TSS (Extended Data Fig. 4a). This suggests that cluster-specific H3K9me3 regions may be associated with gene regulation.
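
The median distance to the nearest TSS can be computed as in the sketch below. It assumes a single chromosome, a sorted list of TSS coordinates and one representative position per region (for example, the bin midpoint); the function names and coordinates are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def distance_to_nearest_tss(position, sorted_tss):
    """Distance from a position to the closest TSS on the same chromosome.

    `sorted_tss` must be sorted in ascending order and non-empty.
    """
    if not sorted_tss:
        raise ValueError("at least one TSS is required")
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - position)  # nearest TSS at or after
    if i > 0:
        candidates.append(position - sorted_tss[i - 1])  # nearest TSS before
    return min(candidates)


def median_tss_distance(positions, sorted_tss):
    """Median over regions of the distance to the nearest TSS."""
    return median(distance_to_nearest_tss(p, sorted_tss) for p in positions)


# Toy coordinates (base pairs) on one chromosome.
tss = [1_000, 50_000, 200_000]
region_midpoints = [60_000, 900, 150_000]
toy_median = median_tss_distance(region_midpoints, tss)
```

Here the three toy regions lie 10,000, 100 and 50,000 bp from their nearest TSS, so the median is 10,000 bp.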

We hypothesize that H3K4me1 may also show differential enrichment in these cluster-specific H3K9me3 regions. Therefore, we select 150 regions with the largest depletion of H3K9me3 compared to HSPCs, resulting in four sets of cluster-specific regions (Extended Data Fig. 4b). The H3K4me1 signal in each of these four sets of regions shows cell-type-specific enrichment (Extended Data Fig. 4c), which anticorrelates with H3K9me3 (Fig. 3d). We use this anti-correlation to annotate H3K9me3-defined cell clusters as erythroid, lymphoid and myeloid lineages (Fig. 3e). We find that regions depleted of H3K9me3 in HSPCs show upregulation of H3K4me1 in HSPCs (Fig. 3f). For H3K9me3-depleted regions in myeloid cells, we find that H3K4me1 is upregulated not only in neutrophils but also in other cell types that share the myeloid lineage, such as monocytes (Fig. 3g). This anti-correlation is exemplified at the Gbe1 locus. In this region, HSPCs, lymphoid and myeloid cell types show enrichment of H3K4me1 accompanied by a marked depletion in H3K9me3 (Fig. 3h). At these H3K9me3 regions, we also detect cell-type-specific signal in H3K4me3 and in H3K27me3, although the pattern is weaker than in H3K4me1 (Extended Data Fig. 4d). Overall, we find fewer cell clusters with distinguishable H3K9me3 distribution compared to active chromatin marks. We show that this reduction is the consequence of cell types of the same lineage sharing the same H3K9me3 signal.

### Repressive chromatin changes are mostly cell fate-independent

We next ask whether global patterns in chromatin dynamics during hematopoiesis differ between repressive and active marks. We apply differential analysis on 50 kb regions for all four marks, resulting in 10,518 dynamic bins for H3K4me1, 2,225 for H3K4me3, 5,494 for H3K27me3 and 6,085 for H3K9me3 (Supplementary Table 1). For each histone modification, we count the cell type pseudobulk signal across the bins and perform hierarchical clustering. In active marks, we find that the largest differences come from erythroblast versus nonerythroblasts (Extended Data Fig. 5a). This observation is consistent with the TSS analysis, where the erythroblasts show the largest changes in active chromatin (Extended Data Fig. 3d–e). In accordance with the same TSS-centric analysis, we find intermediate levels of H3K4me1 and H3K4me3 in HSPCs (Extended Data Fig. 5a), suggesting a more accessible chromatin state in HSPCs.

We use generalized principal component analysis (GLMPCA) to project the active mark data onto the two most significant axes of chromatin variation40, which reveals a central position for HSPCs relative to other cell types, suggesting that active chromatin during hematopoiesis diverges depending on the cell type (Fig. 4a, left two panels). By contrast, clustering repressive chromatin dynamics mainly distinguishes HSPCs and differentiated cell types (Extended Data Fig. 5a). Projecting the repressive mark data reveals a peripheral position of HSPCs compared to other cell types (Fig. 4a, right two panels). By comparing bins that gain or lose chromatin marks in mature cell types relative to HSPCs, we find more than half of the bins that gain or lose repressive marks co-occur in all other cell fates (Fig. 4b), suggesting that changes in repressive chromatin during hematopoiesis are independent of cell fate. By contrast, only 8% of bins in active chromatin show cell-type-independent changes. Differences between HSPCs and non-HSPCs at affected bins show distinct separation between HSPCs and non-HSPCs in repressive marks. We do not observe this for active marks (Extended Data Fig. 5b), corroborating that a large fraction of changes in repressive chromatin is independent of cell fate. These cell fate-independent changes are exemplified for H3K27me3 at the Hoxa region, which shows low levels of H3K27me3 in HSPCs, which are upregulated in differentiated cell types (Fig. 4c). In addition, HSPCs at the immunoglobulin heavy chain (Igh) region carry high levels of H3K9me3, which is lost in myeloid and lymphoid cells, suggesting that this region, encoding the heavy chains of immunoglobulins, is derepressed during differentiation (Fig. 4d).
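
One way to make the notion of a cell-fate-independent bin concrete is sketched below. It assumes per-bin log2 fold changes of each mature cell type relative to HSPCs and calls a bin fate-independent when every cell type changes in the same direction beyond a threshold. The threshold value, the function names and the toy data are hypothetical and do not reproduce the statistical test used in the analysis.

```python
def shared_direction(fold_changes, threshold=1.0):
    """Return "gain" or "loss" if all cell types change the same way.

    `fold_changes` holds one log2 fold change (versus HSPCs) per mature
    cell type. Returns None when the change is not shared by all of them.
    """
    if all(fc >= threshold for fc in fold_changes):
        return "gain"
    if all(fc <= -threshold for fc in fold_changes):
        return "loss"
    return None


def fraction_fate_independent(bins, threshold=1.0):
    """Fraction of dynamic bins whose change is shared by all cell types.

    A bin counts as dynamic if at least one cell type passes the threshold.
    """
    dynamic = [
        fcs for fcs in bins.values()
        if any(abs(fc) >= threshold for fc in fcs)
    ]
    if not dynamic:
        return 0.0
    shared = sum(
        1 for fcs in dynamic if shared_direction(fcs, threshold) is not None
    )
    return shared / len(dynamic)


# Toy bins: log2 fold changes in three mature cell types versus HSPCs.
toy_bins = {
    "bin_a": [1.5, 2.0, 1.2],     # gained everywhere
    "bin_b": [-1.1, -3.0, -2.0],  # lost everywhere
    "bin_c": [2.0, -0.1, 0.3],    # gained in one cell type only
    "bin_d": [0.1, 0.2, 0.0],     # not dynamic
}
toy_fraction = fraction_fate_independent(toy_bins)
```

In the toy data, two of the three dynamic bins change in the same direction in all cell types, giving a fraction of 2/3.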

Next, we ask whether H3K27me3 and H3K9me3 regulate distinct processes. We confirm that H3K27me3 dynamics occur at TSS-proximal GC-rich regions while H3K9me3 is dynamic at TSS-distal AT-rich regions (Extended Data Fig. 5c–d)20. Gene ontology (GO) analysis of H3K9me3 regions unique to HSPCs shows enrichment of phagocytosis, complement activation and B-cell-receptor signaling (Extended Data Fig. 5e), suggesting that HSPCs use H3K9me3 to repress genes that are required in differentiated blood cells. In contrast, GO analysis of HSPC-specific H3K27me3 regions does not show enrichment for biological processes related to blood development.

Taken together, we find that during differentiation, intermediate levels of active chromatin marks in HSPCs are up- or down-regulated depending on the specific cell fate. In contrast, most dynamic repressive chromatin regions are gained or lost independent of the specific cell fate.

### TF motifs underlie chromatin dynamics

Next, we ask whether regulatory DNA sequences underlying the sortChIC data can explain the chromatin changes. We hypothesize that regions with correlated sortChIC signal across cells can be explained by TF binding motifs shared across these regions41,42 (Extended Data Fig. 6a). We adapted MARA, a ridge regression framework, to infer TF motif activities in single cells. SortChIC signals are the observed variables, TF binding motifs are covariates and TF motif activities are latent variables to be inferred. We find statistically significant TF motifs that explain correlations in single-cell chromatin dynamics across different genomic regions. We use TF motif activity42,43,44,45,46 as a term to connect our method to earlier contributions to this problem. Overlaying the predicted single-cell TF motif activities onto the UMAP shows the expected cell-type-specific TF motif activities. We find high ERG motif activity in HSPCs47 (Fig. 5a, left), high CEBP motif activity in neutrophils48,49 (Fig. 5a, mid-left), high EBF motif activity in B cells50 (Fig. 5a, mid-right) and high TAL1 motif activity in erythroblasts51 (Fig. 5a, right), in agreement with the reported role of each TF.
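
The regression can be written compactly. As a sketch, and assuming a centered signal matrix with a Gaussian noise model (the exact normalization and noise model are not specified here), let $X_{rc}$ be the sortChIC signal of region $r$ in cell $c$, $N_{rm}$ the number of predicted binding sites of motif $m$ in region $r$, $A_{mc}$ the activity of motif $m$ in cell $c$, and $\lambda$ the ridge penalty. The symbols are introduced for illustration only.

```latex
X_{rc} \approx \sum_{m} N_{rm} A_{mc},
\qquad
\hat{A} = \arg\min_{A} \; \lVert X - N A \rVert_F^2 + \lambda \lVert A \rVert_F^2
        = \left( N^{\top} N + \lambda I \right)^{-1} N^{\top} X
```

Under these assumptions, the penalty $\lambda$ shrinks the activities of motifs that explain little of the signal toward zero, which keeps the inference stable when many motifs co-occur in the same regions.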

We summarize the inferred single-cell TF activities underlying the cell-type-specific distribution of active H3K4me1 in Fig. 5b. We predict motifs active in pDCs belonging to the IRF and RUNX family (Fig. 5b and Extended Data Fig. 6b–d), consistent with their role in type 1 interferon secretion52,53, dendritic cell progenitor development54 and pDC migration55, respectively. We find natural killer (NK) cells to have high E26 transformation-specific (ETS) family motif activity (Fig. 5b and Extended Data Fig. 6b,e), consistent with the role of Ets1 in the development of natural killer and innate lymphocyte cells56,57. Finally, we predict TFs that have the lowest activity in HSPCs and pDCs, such as the NR4A family (Fig. 5b and Extended Data Fig. 6b,f). Considering that NR4A family members are most highly expressed in HSPCs (data not shown), we conclude that NR4A mainly prevents enhancer activation, consistent with a repressive function of Nr4a1 in HSPCs58,59. The low activity of several TFs suggests that pDCs could be in a more progenitor-like state, consistent with our pseudobulk clustering results in H3K4me1, H3K4me3 and H3K27me3 (Extended Data Fig. 5a).

We apply our TF motif analysis to the two repressive chromatin landscapes to predict motifs that explain HSPC-specific distributions. In H3K27me3, we predict a CCAT motif belonging to the Yin Yang family60, specifically active in HSPCs (Fig. 5c). The Yy1 gene encodes a polycomb group protein, shown to regulate HSC self-renewal61. In H3K9me3, we predict an AT-rich motif belonging to the transcriptional repressor PLZF, specifically active in HSPCs (Fig. 5d), that has been implicated in regulating the cell cycle of HSCs62.

Taken together, our framework predicts TFs underlying cell-type-specific chromatin dynamics. We suggest that differentiating cells decide which active regions to up- or down-regulate depending on the cell-type-specific TFs that associate with these regions.

### Distinct cell types can share similar heterochromatin states

To understand the relationship between the eight cell types identified by histone marks of gene-rich regions (H3K4me1, H3K4me3 and H3K27me3) and the four clusters identified by H3K9me3, we stain cells with both H3K4me1 and H3K9me3 antibodies63. This double-incubation strategy generates cuts that come from both H3K4me1 and H3K9me3, and uses our single mark sortChIC data to infer the relationships between the two marks in single cells (Fig. 6a). We sort Lin− and unenriched cells to profile abundant and rare cell types. Joint UMAP landscapes reveal clusters that are depleted or enriched for mature lineage markers (Fig. 6b). We use clusters from H3K4me1 and H3K9me3 single-incubated data to develop a model of how the double-incubated data could be generated (Fig. 6c).

For this, we select 811 regions associated with cell-type-specific genes found in our H3K4me1 analysis (Fig. 2e) and 6,085 cluster-specific regions (50 kb bins) found in our H3K9me3 analysis (Extended Data Fig. 5a, right panel) as features in our model, making a total of 6,896 regions. We verify that these features show cluster-specific differences, by clustering the single-incubated H3K4me1 and H3K9me3 signal across cell types (Extended Data Fig. 7a,b).

Because we do not know which cluster from H3K4me1 pairs with which cluster from H3K9me3, we generate an in-silico model of all possible pairings (Fig. 6c, left). For each double-incubated cell, we perform model selection to select the cell pair with the highest probability (Fig. 6c, right, and Extended Data Fig. 7c–e). This selection reveals that cell types share a common heterochromatin landscape, reflecting their myeloid64 or lymphoid lineage65 (Fig. 6d). Erythroblasts do not share a heterochromatin landscape with any other cell type. Surprisingly, we find pDCs associated with the HSPC-enriched H3K9me3 landscape, suggesting that these cells may have already committed toward a pDC fate through active chromatin, while their heterochromatin remains undifferentiated.
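
The pairing step can be illustrated with a minimal sketch. It assumes that each single-incubated cluster is summarized as a probability vector over the selected regions, that a double-incubated cell is modeled as a multinomial draw from a weighted mixture of one H3K4me1 profile and one H3K9me3 profile, and that the mixing weight is chosen from a small grid. The function names, the weight grid and the toy profiles are hypothetical, and the model used in the study may differ in its details.

```python
import math
from itertools import product


def log_likelihood(counts, probs, floor=1e-12):
    """Multinomial log-likelihood up to a constant that depends only on counts."""
    return sum(c * math.log(max(p, floor)) for c, p in zip(counts, probs) if c > 0)


def best_pair(counts, active_profiles, repressive_profiles,
              weights=(0.25, 0.5, 0.75)):
    """Pick the (active cluster, repressive cluster, weight) that best
    explains the cut counts of one double-incubated cell."""
    best = None
    for (a_name, a), (r_name, r), w in product(
        active_profiles.items(), repressive_profiles.items(), weights
    ):
        mixed = [w * pa + (1.0 - w) * pr for pa, pr in zip(a, r)]
        ll = log_likelihood(counts, mixed)
        if best is None or ll > best[0]:
            best = (ll, a_name, r_name, w)
    return best


# Toy profiles over four regions (each sums to 1).
h3k4me1 = {
    "B cells": [0.7, 0.1, 0.1, 0.1],
    "Neutrophils": [0.1, 0.7, 0.1, 0.1],
}
h3k9me3 = {
    "Lymphoid": [0.1, 0.1, 0.7, 0.1],
    "Myeloid": [0.1, 0.1, 0.1, 0.7],
}
cell_counts = [20, 2, 18, 3]
ll, active, repressive, weight = best_pair(cell_counts, h3k4me1, h3k9me3)
```

For this toy cell, most cuts fall in the regions favored by the B cell and lymphoid profiles, so that pair is selected with a mixing weight of 0.5.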

This confirms that distinct cell types in related lineages can share their heterochromatin state (Fig. 6e,f), suggesting a hierarchical model where changes in heterochromatin might restrict lineages and changes in active chromatin define cell types within lineages.

### Distinct repressive chromatin trajectories in hematopoiesis

To systematically analyze a continuous trajectory from fluorescence-activated cell sorting (FACS)-validated HSCs to differentiated cell types across histone modifications, we expand our dataset to include different HSPC subpopulations and cKit+ progenitor cells. Specifically, we sort HSCs, including both long-term (LT) and short-term (ST) HSCs, MPPs, common myeloid progenitors (CMPs), and megakaryocyte/erythrocyte progenitors (MEPs). Furthermore, we validate our differentiated cell types by sorting B cells, NK cells, erythroblasts, neutrophils, monocytes, pDCs and cDCs (Extended Data Fig. 8a). In total, we increase our BM dataset by 17,270 new cells across H3K4me1, H3K4me3, H3K27me3 and H3K9me3 (Extended Data Fig. 8b), giving a total of 39,857 cells in our dataset.

A subset of the new sortChIC cells has combinations of Sca1, cKit and Lin marker levels from FACS that allow the definition of a FACS-based differentiation stage (Fig. 7a). We plot these Sca1, cKit, Lin-stained cells onto a ternary plot to project cells along a FACS-defined differentiation trajectory. Cells arrange along a continuum of differentiation potential as follows: from uncommitted progenitors (Sca1+, cKit+ and Lin−) and committed progenitors (Sca1−, cKit+ and Lin−) to mature cells (Sca1−, cKit− and Lin+). Plotting relative levels of Sca1, cKit and Lin onto the UMAP reveals HSCs, progenitors and mature cells (Fig. 7b).
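
The ternary projection can be sketched as below. The sketch assumes that the three marker levels have already been rescaled to comparable, non-negative values (how the FACS intensities are rescaled is not specified here) and places Sca1, cKit and Lin at the corners (0, 0), (1, 0) and (0.5, √3/2) of an equilateral triangle. The corner assignment and the function name `ternary_xy` are choices made for illustration.

```python
import math


def ternary_xy(sca1, ckit, lin):
    """Map three non-negative marker levels to 2D ternary coordinates.

    The levels are converted to fractions that sum to 1 and projected
    onto a triangle with corners Sca1 (0, 0), cKit (1, 0) and
    Lin (0.5, sqrt(3) / 2).
    """
    if min(sca1, ckit, lin) < 0:
        raise ValueError("marker levels must be non-negative")
    total = sca1 + ckit + lin
    if total <= 0:
        raise ValueError("at least one marker level must be positive")
    frac_ckit = ckit / total
    frac_lin = lin / total
    x = frac_ckit + 0.5 * frac_lin
    y = (math.sqrt(3) / 2.0) * frac_lin
    return x, y


# A cell with equal relative levels of all three markers sits at the centroid.
center = ternary_xy(1.0, 1.0, 1.0)
```

A cell dominated by a single marker lands on the corresponding corner, and mixed marker levels fall in the interior, so cells moving from Sca1+cKit+ progenitors to Lin+ mature cells trace a path across the triangle.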

Next, we use the labeled cells from FACS (Extended Data Fig. 8a) to assign each cell to a cell type in a supervised and probabilistic manner (Extended Data Fig. 9a–e), creating a high-confidence dataset of 14 subtypes (Fig. 7c). Of note, we find that monocytes are epigenetically distinct from neutrophils and DCs in H3K4me1, H3K4me3 and H3K27me3, but in H3K9me3 all mature myeloid cell types appear to cluster together (Fig. 7c and Extended Data Fig. 9a–c). We validate the presence of pDCs in our dataset, which form distinct islands in H3K4me1, H3K4me3 and H3K27me3 but are spread across the HSPC cluster in H3K9me3 (Extended Data Fig. 9b).

We analyze neutrophil, B cell, erythroblast and HSPC-specific marker gene sets (±5 kb around TSS) for H3K4me1, H3K4me3 and H3K27me3 alterations from HSCs to different mature cell types. For mature cell-type-specific genes, we find that active marks start with intermediate levels in HSCs, which diverge during differentiation into mature cell types (Fig. 7d and Extended Data Fig. 10a–c). In contrast, marker genes of mature cell types show low H3K27me3 in HSCs that increase during differentiation in cell types that do not express them (Fig. 7d and Extended Data Fig. 10b–c, right). Genes specifically expressed in HSPCs lose active marks and accumulate H3K27me3 in all differentiation trajectories (Extended Data Fig. 10d).

To summarize these trajectory dynamics, we take dynamic bins (Supplementary Table 1) and apply principal component analysis (PCA) (Fig. 7e). To estimate chromatin velocities for each mark, we fit a trajectory-specific cubic spline across pseudotime for each bin, then calculate the derivatives with respect to pseudotime. Bin-level velocities are then projected onto the PCA for each histone mark (Fig. 7e). In active marks, we find trajectories that diverge according to erythroid, myeloid and lymphoid lineages. Repressive chromatin, by contrast, shows cell-type-independent changes before lineage specification. At the bin level, we use regions that are upregulated for each histone mark independently for neutrophils, B cells or erythroblasts relative to HSPCs and plot the mean histone mark levels per cell along pseudotime (Fig. 7f, Supplementary Fig. 1a–b, regions defined previously, and Supplementary Table 1). For all three bin sets, we find that active marks diverge across cell types, while repressive marks show dynamics that are shared across cell types consistent with our earlier findings (Fig. 4b).
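
The idea of a chromatin velocity, the rate of change of a bin's signal along pseudotime, can be illustrated with a simplified stand-in. The sketch below does not fit a cubic spline; it uses finite differences on an already smoothed signal sampled at increasing pseudotime points, which approximates the same derivative. The function name and toy values are hypothetical.

```python
def finite_difference_velocity(pseudotime, signal):
    """Approximate d(signal)/d(pseudotime) at each sampled point.

    Uses central differences in the interior and one-sided differences
    at the two ends. `pseudotime` must be strictly increasing.
    """
    n = len(pseudotime)
    if n != len(signal) or n < 2:
        raise ValueError("need two or more matched pseudotime and signal values")
    velocities = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        velocities.append(
            (signal[hi] - signal[lo]) / (pseudotime[hi] - pseudotime[lo])
        )
    return velocities


# Toy bin whose signal rises faster as differentiation proceeds.
toy_time = [0.0, 0.5, 1.0]
toy_signal = [1.0, 2.0, 4.0]
toy_velocity = finite_difference_velocity(toy_time, toy_signal)
```

For the toy bin, the estimated velocities are 2, 3 and 4 signal units per unit pseudotime. Stacking such per-bin velocities into a vector and projecting it onto the principal components gives the direction of change for a trajectory.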

### Chromatin commitment coincides with lineage restriction

To compare the global dynamics of the four different histone marks along a common trajectory, we use the marker levels of Sca1, cKit and Lin and ask when global chromatin states are specified along the Sca1-cKit-Lin trajectory. Overlaying the relative levels of Sca1, cKit and Lin onto the PCA shows that Sca1 levels are already low when chromatin has specified the myeloid (CMPs) or erythroid lineage (MEPs; Supplementary Fig. 2a). Plotting principal component 1 along the Sca1-cKit-Lin trajectory shows that the first differences at the chromatin level can be observed at the exit of multipotency, when MEPs and CMPs emerge after the loss of Sca1 (Supplementary Fig. 2b,c), suggesting that chromatin changes co-occur with lineage commitment. These results are in line with previous studies identifying a switch from multilineage priming to lineage restriction on marker genes during progenitor cell commitment66. Overall, we apply sortChIC to interrogate FACS-validated rare subpopulations and differentiated cell types in the BM, enabling systematic analysis of active and repressive chromatin dynamics during hematopoiesis.

## Discussion

Here we provide a comprehensive map of chromatin regulation at both euchromatic and heterochromatic regions during blood formation. We find that repressive chromatin shows distinct dynamics compared with active chromatin, demonstrating that profiling repressive chromatin regulation in single cells reveals new dynamics. In HSPCs, active chromatin premarks genes of all lineages and is up- or down-regulated depending on the specific cell fate, mediated by cell-type-specific TFs. Consequently, active chromatin shows divergent changes for different blood cell fates (Fig. 8, left panel). In contrast, changes in repressive chromatin often occur in the same direction regardless of the specific cell fate, resulting in large differences between HSPCs and mature cell types (Fig. 8, middle and right panel). In accordance with the premarked active chromatin state in HSCs, the majority of mature cell-type-specific genes show low levels of H3K27me3 in HSCs and consolidate their differentiation choice by silencing genes specific to HSCs and of the unchosen trajectory. This progressive transition to a restricted chromatin state agrees with previous studies showing a genome-wide transition during ES cell differentiation67. Although our results are correlative, previous work characterizing the consequences of HSC-specific deletion of EED68, a core component of both PRC1 and PRC2, showed a loss of differentiation capacity, while preserving HSC self-renewal. This suggests an integral role of H3K27me3 after the onset of lineage commitment in hematopoiesis.

Our findings further expand the role of H3K9me316. We find that H3K9me3 changes underlie the lineage restriction in hematopoiesis and are rewired as HSPCs differentiate. Although in vivo dynamics in H3K9me3 have been reported during early development16,17,18, our results extend the knowledge of H3K9me3 dynamics to homeostatic renewal in adult physiology. Joint analysis of active and repressive marks corroborates the hierarchical chromatin changes and shows a similarity between pDCs and HSPCs69,70 in their heterochromatin state.

Our FACS sorting strategy profiled the epigenomes of rare and abundant cell types in the BM. Although our analysis did not find clear subpopulations within rare progenitor cells previously observed in scRNA-seq studies4,71, the cell type resolution obtained with sortChIC is comparable to scRNA-seq studies. Rather than a way to further subcategorize existing cell types, sortChIC profiles layers of regulation that guide differentiation. If the sensitivity can be further improved, additional chromatin states might become visible that are indistinguishable from scRNA-seq. Future multi-omics studies integrating the detection of chromatin modifications with transcription72,73,74 should further facilitate the integrated analysis of diverse histone modifications and allow us to more clearly understand how these multiple layers of gene regulation are related.

## Methods

Our research complies with all relevant ethical guidelines. Experimental procedures were approved by the Dier Experimenten Commissie of the Royal Netherlands Academy of Arts and Sciences and performed according to the guidelines.

### Animal experiments

Primary BM cells were collected from 3-month-old male C57BL/6 mice. Femur and tibia were extracted, and the bone ends were cut away to access the BM, which was flushed out using a 22 G syringe with HBSS (-Ca, -Mg, -phenol red; Gibco, 14175053) supplemented with Pen-Strep and 1% FCS. The BM was dissociated and debris was removed by passing it through a 70 μm cell strainer (Corning, 431751). Cells were washed with 25 ml supplemented HBSS before lineage marker staining was performed following the instructions of the EasySep Mouse Hematopoietic Progenitor Cell Isolation Kit (Stemcell), using half of the recommended concentration of the biotinylated antibodies. This was followed by 30 min incubation at 4 °C with a staining layout-dependent antibody cocktail detailed below. Where indicated, lineage depletion was performed by incubating cells with magnetic streptavidin beads following the instructions of the EasySep Mouse Hematopoietic Progenitor Cell Isolation Kit. After two additional washes with HBSS (+PS, +FCS), cells were prepared following the sortChIC protocol for the four different histone modifications.

### Cell culture

K562 cells (ATCC CCL-243) were grown in RPMI 1640 Medium GlutaMAX, supplemented with 10% FCS, Pen-Strep and nonessential amino acids. After collecting, cells were washed three times with room temperature PBS before continuing with the sortChIC protocol.

#### sortChIC-seq: Cell preparation: fixation

Three buffers are used for the majority of cell preparation: a Basic ChIC buffer (47.5 ml RNase-free H2O, 1 ml 1 M HEPES pH 7.5 (Invitrogen), 1.5 ml 5 M NaCl, 3.6 μl pure spermidine solution (Sigma Aldrich), 0.05% Tween20), a Wash buffer (Basic ChIC buffer with 1 ethylenediaminetetraacetic acid (EDTA)-free protease inhibitor cocktail tablet per 50 ml (Sigma Aldrich)) and an Antibody incubation buffer (Wash buffer with 4 ml ml−1 0.5 M EDTA). All steps performed on ice were as follows: in step 1, cells were resuspended in 300 μl PBS per 1 million cells in a 15 ml protein low binding falcon tube and 700 μl ethanol (precooled to −20 °C) per 1 million cells was added while vortexing cells at middle speed. In step 2, cells were fixed for 1 h at −20 °C. In step 3, after fixation, cells were washed twice in 1 ml antibody incubation buffer. In case cells had to be stored before sorting, DMSO was added to a final concentration of 10% and cells were frozen at −80 °C. After thawing, cells were washed once in 0.5 ml antibody incubation buffer before continuing with pA-MN targeting.
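
As a quick check of the recipe, the final concentrations implied by the listed volumes can be worked out with a few lines of Python. The calculation assumes that the volumes are additive and ignores the small contributions of spermidine and Tween20; the helper name `final_millimolar` is hypothetical.

```python
def final_millimolar(stock_molar, stock_ml, total_ml):
    """Final concentration (mM) after diluting a stock into a total volume."""
    return stock_molar * stock_ml / total_ml * 1000.0


# Basic ChIC buffer: 47.5 ml water + 1 ml 1 M HEPES + 1.5 ml 5 M NaCl.
total_ml = 47.5 + 1.0 + 1.5
hepes_mM = final_millimolar(1.0, 1.0, total_ml)
nacl_mM = final_millimolar(5.0, 1.5, total_ml)

# Fixation: 700 ul ethanol added to 300 ul PBS per 1 million cells.
ethanol_fraction = 700.0 / (300.0 + 700.0)
```

Under these assumptions, the Basic ChIC buffer has a total volume of 50 ml with roughly 20 mM HEPES and 150 mM NaCl, and cells are fixed in about 70% ethanol.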

#### sortChIC-seq: Cell preparation: nuclei

Cells were washed once in 1 ml antibody incubation buffer (0.05% Tween replaced by 0.05% Saponin for this and following steps with nuclei). Nuclei were isolated by further Saponin incubation overnight in parallel to the antibody staining. For BM, we sorted nine plates each for H3K4me1, H3K4me3 and H3K9me3.

#### sortChIC-seq: pA-MN targeting

In step 4, cells were pelleted at 500 g for 4 min, resuspended in 200 μl antibody incubation buffer per 1 million cells and aliquoted into 0.5 ml protein low binding tubes containing the primary histone mark antibody (details can be found in the Materials section of the Supplementary Note) diluted in 200 μl antibody incubation buffer. In step 5, cells were incubated overnight at 4 °C on a roller. In step 6, cells were washed once with 500 μl Wash Buffer. In the case of double-labeling experiments, cells were incubated with antibodies against H3K4me1 and H3K9me3 together at the same concentrations as for the single-mark experiments. In step 7, cells were resuspended in 500 μl Wash Buffer containing pA-MN (3 ng ml−1) and Hoechst 34580 (5 μg ml−1) and, in step 8, incubated for 1 h at 4 °C on a roller. Finally, in step 9, cells were washed an additional two times with 500 μl Wash Buffer before passing them through a 70 μm cell strainer (Corning, 431751).

#### sortChIC-seq: FACS sorting

In step 10, for all experiments, cells were gated for G1 cell cycle stage based on the Hoechst staining, in addition to cell surface markers, and sorted on an Influx FACS machine into 384-well plates containing 5 μl sterile filtered mineral oil (Sigma Aldrich) per well, using forward scatter and trigger pulse width to further remove cell doublets. Cells were sorted using index sorting, which records FACS information for every sorted well. To further exclude sorting of more than the intended single cell per well, we used custom sort settings: objective=single, number of drops=1, extra coincidence=complete empty (no signal in the previous and next drop) and phase mask=center 10/16 (cell is in the middle of the sorted drop).

Sort layouts for separate experiments can be found in Extended Data Figs. 1a, 3a and 8a, with the total number of plates sorted per condition listed in Supplementary Table 4. Antibody details can be found in Supplementary Note section Materials. Data were collected using BD FACS software (version 1.2.0.124).

#### sortChIC-seq: pA-MN activation

The following small volumes were distributed using a Nanodrop II system (Innovadyme) and plates were spun for 2 min at 4 °C and 2,000g after each reagent addition.

In step 11, 100 nl of basic ChIC buffer containing 2 mM CaCl2 was added per well to induce pA-MN-mediated chromatin digestion. In step 12, for digestion, plates were incubated for 30 min in a PCR machine set at 4 °C. Afterwards (step 13), the reaction was stopped by adding 100 nl of a stop solution containing 40 mM EGTA (chelates Ca2+ and stops MN; Thermo, 15425795), 1.5% NP40 and 10 nl 2 mg ml−1 proteinase K (Invitrogen, AM2548). In step 14, plates were incubated in a PCR machine for a further 20 min at 4 °C, before chromatin was released and pA-MN was permanently destroyed by proteinase K digestion at 65 °C for 6 h, followed by 80 °C for 20 min to heat-inactivate proteinase K. Afterwards, plates can be stored at −80 °C until further processing.

#### sortChIC-seq: Library preparation

In step 15, DNA fragments are blunt-ended by adding 150 nl end repair mix (Supplementary Table 5) per well and incubating for 30 min at 37 °C followed by 20 min at 75 °C for enzyme inactivation. In step 16, blunt fragments are subsequently A-tailed by adding 150 nl per well of A-tailing mix (Supplementary Table 6) and incubating for 15 min at 72 °C. Because AmpliTaq 360 strongly prefers to incorporate dATP as a single-base overhang even in the presence of other nucleotides, a general dNTP removal step is not necessary.

Next, fragments are ligated to T-tail-containing forked adapters (see Supplementary Note section Materials for sequences).

In step 17, for ligation, 50 nl of 5 μM adapter in 50 mM Tris pH 7 is added to each well with a mosquito HTS (ttp labtech). After centrifugation (step 18), 150 nl of adapter ligation mix (Supplementary Table 7) is added before (step 19) plates are incubated for 20 min at 4 °C, followed by 16 h at 16 °C for ligation and 10 min at 65 °C to inactivate ligase.

In step 33, 5 μl of the RNA is primed for reverse transcription by adding 0.5 μl dNTPs (10 mM) and 1 μl of 20 μM random hexamer reverse transcription primer (for sequence, see Supplementary Note section Materials) and (step 34) hybridizing by incubation at 65 °C for 5 min followed by immediate cooling on ice. In step 35, reverse transcription is performed by adding 2 μl first strand buffer (part of Invitrogen, 18064014), 1 μl 0.1 M DTT (Invitrogen, 15846582), 0.5 μl RNAseOUT (Invitrogen, LS10777019) and 0.5 μl SuperscriptII (Invitrogen, 18064014) and (step 36) incubating the mixture at 25 °C for 10 min followed by 1 h at 42 °C. In step 37, single-stranded DNA is purified by adding 0.5 μl RNAse A (Thermo Fisher Scientific, EN0531) and (step 38) incubating for 30 min at 37 °C. In step 39, a final PCR amplification to add the Illumina small RNA barcodes and handles is performed by adding 25 μl of NEBNext Ultra II Q5 Master Mix (NEB, M0492L), 11 μl nuclease-free water and 2 μl of RP1 and RPIx primers (10 μM).

In step 40, PCR is performed with the following protocol: activation for 30 s at 98 °C; 8–12 cycles (depending on starting material) of 10 s at 98 °C, 30 s at 60 °C and 30 s at 72 °C; and final amplification for 10 min at 72 °C. In step 41, PCR products are cleaned by two consecutive DNA bead clean-ups with a 0.8X bead-to-sample ratio. In step 42, the final product is eluted in 7 μl nuclease-free water, and the abundance and quality of the final library are assessed by QUBIT and bioanalyzer.

### pA-MN production

The pA-MN fusion protein was produced following the methods section in ref. 24 (details can be found in Supplementary Note section Materials).

### Statistics and reproducibility

No statistical method was used to predetermine the sample size. Low-quality cells (for example, number of cuts below threshold, cuts not containing expected MN cut motif, and cells with unspecific cuts) were removed from further analysis. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

### Data preprocessing

We developed a preprocessing pipeline called SingleCellMultiOmics (version v.0.1.25) to process sortChIC data (https://github.com/BuysDB/SingleCellMultiOmics/wiki). The pipeline for sortChIC processes raw fastq files through the following software:

Demultiplexing is performed with demux.py (from SCMO v0.1.25) and adaptors are trimmed using cutadapt (version 3.5). Reads are mapped with bwa (version: 0.7.17-r1188) and are assigned to molecules with bamtagmultiome.py (SCMO v0.1.25). Finally, count tables are generated using bamToCountTable.py (SCMO v0.1.25). The code was run using python version 3.7.6 and R version 4.1.2. Details can be found in the Supplementary Note section Methods.

An example of this full pipeline is available in the sortchicAnalysis git repository: https://github.com/jakeyeung/sortchicAnalysis/tree/main/example_processing_pipeline.

### Calculating reads falling in peaks in sortChIC for K562 cells

For each histone modification, we merged K562 single-cell sortChIC data, and used the resulting pseudobulk as input for hiddenDomains75, with minimum peak length of 1,000 bp. We determined 40,574, 58,257, 28,499 and 28,380 peaks for H3K4me1, H3K4me3, H3K27me3 and H3K9me3, respectively. For each histone modification, we counted the fraction of total reads that fall within each set of peaks.
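The fraction of reads in peaks can be illustrated with a minimal sketch. It assumes that peaks within one set do not overlap and are given as half-open intervals, and that each read is summarized by a single position (for example, its cut site). The function name and input layout are hypothetical and are not taken from the pipeline.

```python
from bisect import bisect_right


def fraction_in_peaks(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: list of (chrom, pos)
    peaks: list of (chrom, start, end), half-open and non-overlapping (assumption)
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    hits = 0
    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue
        # Index of the last peak starting at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            hits += 1
    return hits / len(read_positions) if read_positions else 0.0
```

Sorting the peaks once and using binary search keeps the lookup cost logarithmic in the number of peaks per chromosome.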

### Comparison of sortChIC data with other single-cell chromatin profiling assays

To perform a fair comparison of sortChIC data with other similar assays, we downloaded the raw data from Bartosovic et al. (GSE163532)32, Grosselin et al. (GSE117309)22, Ku et al. (GSE105012)27, Wu et al. (GSE139857)31, Kaya-Okur et al. (GSE124557)30 and Ku et al. (GSE139857)28, from GEO, and mapped and quantified them using the pipelines described by the authors in the original study. For details of study-specific processing, see Supplementary Note section Materials.

### Dimensionality reduction based on multinomial models

We counted the number of cuts mapped to peaks across cells and applied the LDA model39 (from topicmodels version 0.2-12), which is a matrix factorization method that models discrete counts across predefined regions as a hierarchical multinomial model. LDA can be thought of as a discrete version of probabilistic PCA, replacing the Gaussian likelihood with a multinomial one76,77. Details can be found in Supplementary Note section Materials.

### Defining eight sets of blood cell-type-specific genes for cell typing

We used the LDA outputs to define topics associated with each cell type. Details can be found in Supplementary Note section Materials.

### Defining genomic regions for dimensionality reduction

We initially defined regions based on 50 kb nonoverlapping windows genome-wide, applying LDA and using the Louvain method to define clusters to merge single-cell bam files. These merged bam files were then used to call substantially marked regions using hiddenDomains75 with minimum bin size of 1 kb. We merged the regions across clusters and generated a new count matrix using the hiddenDomains peaks as features. This new count matrix was used as input for dimensionality reduction.
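The initial binning step can be sketched as follows: each cut is assigned to the 50 kb nonoverlapping window that contains it, and cuts are counted per cell and window. This is an illustrative sketch only; the input layout (one tuple per cut) and the function name are assumptions.

```python
from collections import Counter

BIN_SIZE = 50_000  # 50 kb nonoverlapping windows


def bin_cuts(cuts, bin_size=BIN_SIZE):
    """Count cuts per (cell, chrom, bin_start).

    cuts: iterable of (cell, chrom, pos) with 0-based positions (assumption).
    """
    counts = Counter()
    for cell, chrom, pos in cuts:
        bin_start = (pos // bin_size) * bin_size
        counts[(cell, chrom, bin_start)] += 1
    return counts
```

The resulting sparse counts can be arranged into a bins-by-cells matrix before dimensionality reduction.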

### Batch correction in dimensionality reduction

Initial LDA of the count matrix revealed batch effects in H3K4me1 and H3K9me3 between cell types of plates that contained only one sorted type. We fit a linear model in the latent space learned from LDA with a cell-type-specific batch effect to correct batch effects. Details can be found in Supplementary Note section Methods.

### Differential histone mark levels analysis

To calculate the fold change in histone mark levels at a genomic region between a cell type and HSPCs, we modeled the discrete counts Y across cells with a Poisson regression. We fitted a null model, which is independent of cell type, and a full model, which depends on the cell type, and compared their deviances to predict whether a region was ‘unchanging’ or ‘dynamic’ across cell types. We implemented the model in R using glm(); details can be found in Supplementary Note section Materials.
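The comparison of null and full models can be sketched in Python for the special case where cell type is the only covariate. In that case the Poisson maximum-likelihood rates are per-group ratios of summed counts to summed depths, so no iterative fitting is needed. The use of total cuts per cell as an offset, and all names below, are assumptions made for illustration; the authors' glm() specification is given in their Supplementary Note.

```python
import math
from collections import defaultdict


def poisson_deviance_drop(counts, depths, cell_types, baseline="HSPCs"):
    """Deviance(null) - Deviance(full) for one region, plus log2 fold changes.

    counts: cuts in the region per cell; depths: total cuts per cell (offset,
    an assumption); cell_types: label per cell.
    """
    y_sum = defaultdict(float)
    n_sum = defaultdict(float)
    for y, n, ct in zip(counts, depths, cell_types):
        y_sum[ct] += y
        n_sum[ct] += n

    total_y = sum(y_sum.values())
    if total_y == 0:
        return 0.0, len(y_sum) - 1, {}

    null_rate = total_y / sum(n_sum.values())           # one rate for all cells
    rates = {ct: y_sum[ct] / n_sum[ct] for ct in y_sum}  # one rate per cell type

    # Closed form of the deviance difference for a one-factor Poisson model.
    drop = 2.0 * sum(
        y_sum[ct] * math.log(rates[ct] / null_rate)
        for ct in rates if y_sum[ct] > 0
    )
    base = rates.get(baseline, 0.0)
    log2_fc = {
        ct: math.log2(r / base)
        for ct, r in rates.items()
        if ct != baseline and r > 0 and base > 0
    }
    degrees_of_freedom = len(rates) - 1
    return drop, degrees_of_freedom, log2_fc
```

Under the null model, the deviance difference is approximately chi-squared distributed with the returned degrees of freedom, which is what allows a region to be called dynamic at a chosen threshold.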

### Defining bins above background levels for each mark

For each mark, we counted fragments falling in 50 kb bins summed across all cells. We then plotted this vector of summed counts as a histogram in log scale, which shows a bimodal distribution. We manually defined a cut-off for each mark as a background level and took bins that were above this cut-off. This cut-off resulted in 22,067, 12,661, 18,512 and 19,881 bins for H3K4me1, H3K4me3, H3K27me3 and H3K9me3, respectively.
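A minimal sketch of this filtering step is shown below. The cut-off itself was chosen manually from the histogram, so it is passed in as a parameter here. The pseudocount of 1 before taking the logarithm and the function name are assumptions.

```python
import math


def bins_above_background(counts_by_bin, log10_cutoff):
    """Keep bins whose summed counts across cells exceed a manual cut-off.

    counts_by_bin: dict mapping bin id -> list of per-cell fragment counts.
    log10_cutoff: threshold on log10(summed count + 1); the +1 is an assumption.
    """
    kept = []
    for bin_id, per_cell in counts_by_bin.items():
        total = sum(per_cell)
        if math.log10(total + 1) > log10_cutoff:
            kept.append(bin_id)
    return kept
```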

### Calculating bins that change independent of cell type

We used a cut-off of q < $$10^{-50}$$ for H3K4me1, H3K4me3 and H3K27me3, and q < $$10^{-9}$$ for H3K9me3 from the deviance test statistic (details of ‘differential histone mark analysis’ can be found in Supplementary Note section Materials) to define bins that are changing between cell types. Details can be found in Supplementary Note section Materials.

### Predicting activities of TFs in single cells

We adapted motif activity response analysis (MARA) described in ref. 42 to accommodate the sortChIC data. Briefly, we model the log-imputed sortChIC-seq signal learned from LDA as a linear combination of TF binding sites and activities of TF motifs using a ridge regression framework:

$$\tilde Y_{g,c} = \sum\limits_{m = 1}^M N_{g,m}A_{m,c} + \epsilon$$

where $$\tilde Y_{g,c}$$ is the batch-corrected sortChIC-seq signal in genomic region g in cell c; $$N_{g,m}$$ is the number of TF binding sites in region g for TF motif m; $$A_{m,c}$$ is the activity of TF motif m in cell c; and $$\epsilon$$ is Gaussian noise. The L2 penalty for ridge regression was determined automatically using an 80/20 cross-validation scheme. Z scores of motifs greater than 0.7 were kept as statistically significant motifs. Details can be found in Supplementary Note section Materials.
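For one cell, the ridge estimate of the motif activities solves the normal equations (NᵀN + λI)a = Nᵀy. The sketch below solves that system with a small Gauss-Jordan elimination so that it has no dependencies. It takes the penalty λ as given rather than selecting it by cross-validation, and it omits any centering or Z-score computation; the function name is hypothetical.

```python
def ridge_activities(N, y, lam):
    """Solve (N^T N + lam * I) a = N^T y for the activity vector a.

    N: list of G rows, each with M site counts; y: length-G signal for one cell;
    lam: L2 penalty (fixed here; chosen by cross-validation in the text).
    """
    G, M = len(N), len(N[0])
    # Augmented matrix [N^T N + lam * I | N^T y].
    aug = []
    for i in range(M):
        row = [
            sum(N[g][i] * N[g][j] for g in range(G)) + (lam if i == j else 0.0)
            for j in range(M)
        ]
        row.append(sum(N[g][i] * y[g] for g in range(G)))
        aug.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(M):
        pivot = max(range(col, M), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(M):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][M] for i in range(M)]
```

With λ = 0 this reduces to ordinary least squares; increasing λ shrinks the activities toward zero, which stabilizes the fit when motifs have correlated site counts.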

### Joint H3K4me1 and H3K9me3 analysis by double incubation

We assume that counts from double-incubated cells (H3K4me1 + H3K9me3) were generated by drawing N reads from a mixture of two multinomials, one from a cell type c from H3K4me1 (parametrized by relative frequencies $$\overrightarrow p _c$$) and one from a lineage l from H3K9me3 (parametrized by relative frequencies $$\overrightarrow q _l$$):

$$\overrightarrow y |c,l,w \sim {{{\mathrm{Multinomial}}}}\left( N,{w\overrightarrow p _c + \left( {1 - w} \right)\overrightarrow q _l } \right),$$

where w is the fraction of H3K4me1 that was mixed with H3K9me3. We used this model to calculate the likelihood that a double-incubated cell was generated by a specific pair of cell type and lineage combination. Details can be found in Supplementary Note section Materials.
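The likelihood calculation can be sketched as follows. The multinomial coefficient is dropped because it is the same for every candidate pair. How w is estimated is described in the authors' Supplementary Note; the grid search used here is an assumption, as are the function names and the requirement that all reference frequencies are strictly positive.

```python
import math


def mixture_loglik(y, p_c, q_l, w):
    """Log-likelihood (up to a constant) of counts y under w*p_c + (1-w)*q_l."""
    return sum(
        y_i * math.log(w * p + (1.0 - w) * q)
        for y_i, p, q in zip(y, p_c, q_l)
        if y_i > 0
    )


def best_pair(y, p_by_celltype, q_by_lineage, w_grid):
    """Return (loglik, cell_type, lineage, w) maximizing the mixture likelihood.

    The grid search over w is an assumption made for this sketch.
    """
    best = None
    for cell_type, p_c in p_by_celltype.items():
        for lineage, q_l in q_by_lineage.items():
            for w in w_grid:
                ll = mixture_loglik(y, p_c, q_l, w)
                if best is None or ll > best[0]:
                    best = (ll, cell_type, lineage, w)
    return best
```

Comparing the log-likelihoods across all pairs gives, for each double-incubated cell, the most likely combination of H3K4me1 cell type and H3K9me3 lineage.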

### Imputing Sca1-cKit-Lin marker levels

Some cells had measurements for only two of the three markers (Sca1, cKit or Lin); for these cells, we imputed the missing third marker by averaging its level across the ten nearest neighbors of the cell with the missing marker level. Details can be found in Supplementary Note section Materials.
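A nearest-neighbor imputation of this kind can be sketched as below. The space in which neighbors are defined is not specified in this section, so the sketch accepts an arbitrary feature vector per cell. That choice, the Euclidean distance and the function name are assumptions.

```python
def impute_missing_marker(query_features, reference_cells, k=10):
    """Average a marker over the k nearest reference cells.

    query_features: feature vector of the cell with the missing marker.
    reference_cells: list of (features, marker_value) for cells where the
    marker was measured. Feature space and distance are assumptions.
    """
    if not reference_cells:
        raise ValueError("need at least one reference cell")

    def squared_distance(a, b):
        return sum((x - z) ** 2 for x, z in zip(a, b))

    nearest = sorted(
        reference_cells,
        key=lambda cell: squared_distance(query_features, cell[0]),
    )[:k]
    return sum(value for _, value in nearest) / len(nearest)
```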

### Reference-based cell typing using multinomials

We generated a ground truth reference dataset using FACS-defined labels, then used this reference to calculate the probability of each cell to be assigned to a cell type by assuming the counts from a cell were generated from a multinomial distribution parametrized by a cell type-specific vector of genomic locus probabilities. Details can be found in Supplementary Note section Materials.
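As a sketch of this assignment, the code below computes the multinomial log-likelihood of a cell's counts under each reference cell type and normalizes the likelihoods into probabilities. The uniform prior over cell types, the requirement of strictly positive reference probabilities and the function name are assumptions.

```python
import math


def cell_type_probabilities(y, reference):
    """Posterior probability of each cell type for one cell's count vector y.

    reference: dict mapping cell type -> locus probability vector (strictly
    positive, summing to 1). A uniform prior is assumed.
    """
    loglik = {
        cell_type: sum(n * math.log(p) for n, p in zip(y, probs) if n > 0)
        for cell_type, probs in reference.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(loglik.values())
    weights = {ct: math.exp(ll - top) for ct, ll in loglik.items()}
    total = sum(weights.values())
    return {ct: w / total for ct, w in weights.items()}
```

The multinomial coefficient cancels in the normalization, so only the terms that depend on the cell type are computed.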

### Inferring pseudotime across different differentiation trajectories

We manually selected two PCs for each cell type trajectory, selecting components that show large variation from progenitors (HSCs, LT, ST and MPPs), committed progenitors (for example, CMPs and MEPs), to mature cell types (for example, neutrophils, DCs, basophils, monocytes, pDCs, NK cells and B cells) of interest. Details can be found in Supplementary Note section Materials.

### Chromatin velocity in each histone modification

After defining a pseudotime for each differentiation trajectory, we fit a trajectory-specific cubic spline of the sortChIC signal along pseudotime for each genomic region. We then calculate the derivative from the spline fits and use it to extrapolate the sortChIC signal of each cell from pseudotime t to a future pseudotime t + 0.01. Details can be found in Supplementary Note section Materials78.
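The extrapolation step can be illustrated with a first-order sketch. For simplicity, a single cubic polynomial stands in for one segment of the fitted spline; it is not a spline fit. The first-order update, the coefficient layout and the function names are assumptions made for illustration.

```python
def cubic(coeffs, t):
    """Evaluate a + b*t + c*t^2 + d*t^3 (one cubic segment, not a full spline)."""
    a, b, c, d = coeffs
    return a + b * t + c * t**2 + d * t**3


def cubic_derivative(coeffs, t):
    """Analytic derivative of the cubic with respect to pseudotime."""
    _, b, c, d = coeffs
    return b + 2 * c * t + 3 * d * t**2


def predict_future_signal(coeffs, t, dt=0.01):
    """First-order extrapolation of the signal from pseudotime t to t + dt."""
    return cubic(coeffs, t) + cubic_derivative(coeffs, t) * dt
```

Applying this per genomic region gives, for each cell, a predicted future chromatin state whose difference from the current state plays the role of a velocity vector.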

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.