Unimodal single-cell nano-CT outperforms scCUT&Tag

To multiplex the profiling of chromatin states in the same single cell, we designed nanobody-Tn5 fusion proteins (nano-Tn5; Fig. 1a) based on single-chain secondary antibodies30, conferring the specificity of the Tn5 towards primary antibodies raised in either mouse (ms-Tn5) or rabbit (rb-Tn5). Because nano-Tn5 binds directly to the primary antibodies, the secondary antibody incubation of the CUT&Tag protocol is not required and is omitted (Fig. 1c). The monovalent interaction of the nanobody with the primary antibody further allows the antibody and nano-Tn5 incubation steps to be combined, which greatly simplifies the nano-CT protocol, reduces the number of required washes and increases the yield of recovered nuclei to between 28% and 68% (Methods; Fig. 1b,c). This allows profiling of low-input bulk samples from as few as ~25,000 cells as starting material, compared with the 1 million16,32 or 150,00015 cells previously required for scCUT&Tag protocols on the Chromium platform.

Fig. 1: nano-CUT&Tag (nano-CT). a, Schematic image of the Tn5 fusion proteins used in the experiments. b, Bar plot depicting number of cells used as input for nano-CT and number of cells recovered. c, Comparison of the antibody- and Tn5-binding strategy between scCUT&Tag and nano-CT. d, Cartoon depiction of the tagmentation and library preparation strategy. The nano-Tn5 is loaded with MeA/Me-Rev oligonucleotides, and the tagmented genomic DNA is used as template for linear amplification; the amplified product is then tagmented in a second round with standard Tn5 loaded with MeB/Me-Rev oligonucleotides. The resulting library is amplified by PCR and sequenced. e, Violin plot depicting number of unique reads per cell obtained by scCUT&Tag15 and nano-CT targeting H3K27me3 per replicate. Violin plots 1–4 from left show multimodal nano-CT performed without ATAC (1 and 3 from left) or with ATAC-seq (2 and 4 from left), and violin plot 5 depicts the unimodal nano-CT experiment. f, Individual UMAP embeddings of the single-modality scCUT&Tag (left) and nano-CT (right) data depicting the identified clusters (scCUT&Tag: 13,932 cells in 4 biological replicates; nano-CT: 6,798 cells in 1 biological replicate; 200,000 cells used as input). g, UMAP co-embedding of the scCUT&Tag data (13,932 cells in 4 biological replicates) together with nano-CT data (6,798 cells in 1 biological replicate; 200,000 cells used as input). Raw matrices obtained by scCUT&Tag and nano-CT were merged together and analyzed without integration. VASC, vascular; AST, astrocytes; RGCs, radial glial cells; OECs, olfactory ensheathing cells; OPCs, oligodendrocyte progenitor cells; MOLs, mature oligodendrocytes; BG, Bergmann glia; EXC, excitatory neurons; INH, inhibitory neurons; MGL, microglia.

We further developed a tagmentation and nano-CT library prep protocol for single-cell applications (Fig. 1d). We first perform a deterministic tagmentation to capture more sites using a nano-Tn5 loaded with MeA/Me-Rev (P5) oligonucleotides. The tagmented genomic DNA is then used as a template for combined linear amplification and barcoding on the 10x Genomics Chromium platform. After the recovery of the barcoded DNA, a second adapter is introduced using a second tagmentation with standard Tn5 loaded with MeB/Me-Rev (P7) oligonucleotides, followed by library amplification by PCR (Fig. 1d). In a bulk experiment, this tagmentation protocol reduced the number of PCR cycles needed to amplify the library from 15 to around 6 (Extended Data Fig. 1a) while yielding a library of similar concentration. The nucleosome phasing size profile typical for scCUT&Tag is not present in the nano-CT library (Extended Data Fig. 1b). The reduction in the number of PCR cycles reflects higher capture efficiency by the nano-Tn5 fusion protein and tagmentation protocol when compared to conventional CUT&Tag and cannot be explained solely by the introduction of a linear pre-amplification step (Extended Data Fig. 1a).

We first tested the newly developed protocol by profiling H3K27me3 in post-natal day 19 (P19) mouse brain. We obtained 6,798 single-cell profiles with a median of 3,720 unique fragments, which is a 15.8-fold increase in the number of unique fragments per cell over the previous generation of scCUT&Tag technology when performed on the same platform and tissue (Fig. 1e). The increase in the number of unique fragments was associated with a reduction in the fraction of reads in peak regions by about 1.7-fold, from 69% to 39%, when using the same peak-calling parameters (Extended Data Fig. 1d).
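The fraction of reads in peak regions used above can be illustrated with a minimal sketch. The function name, the tuple layout of fragments and peaks, and the rule that any overlap counts as "in peak" are assumptions made for illustration; this is not the peak-calling or quantification pipeline used in the study.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_in_peaks(fragments, peaks):
    """Fraction of fragments that overlap at least one peak.

    fragments, peaks: lists of (chrom, start, end) half-open intervals.
    Assumption: peaks on the same chromosome do not overlap one another.
    """
    # Index peaks per chromosome, sorted by start coordinate.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    starts_by_chrom = {}
    for chrom, intervals in peaks_by_chrom.items():
        intervals.sort()
        starts_by_chrom[chrom] = [s for s, _ in intervals]

    if not fragments:
        return 0.0

    in_peak = 0
    for chrom, start, end in fragments:
        starts = starts_by_chrom.get(chrom)
        if not starts:
            continue
        # Last peak that begins before the fragment ends.
        idx = bisect_left(starts, end) - 1
        if idx >= 0 and peaks_by_chrom[chrom][idx][1] > start:
            in_peak += 1
    return in_peak / len(fragments)
```

In practice the same quantity would be computed per cell by grouping fragments by cell barcode before calling the function.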

We then constructed a 5-kilobase-bin-by-cell matrix, performed clustering and dimensionality reduction using latent semantic indexing (LSI) and uniform manifold approximation and projection (UMAP) and identified 13 clusters (Fig. 1f). We merged the newly generated data with the previous generation of scCUT&Tag data15 by combining the raw count matrices and performed dimensionality reduction (Fig. 1g). The individual clusters intermingled well among the technologies (Extended Data Fig. 1c) without the use of integration methods, underscoring the consistency of the nano-CT protocol with the previous scCUT&Tag technology (Fig. 1g).
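As an illustration of the first step, a bin-by-cell count matrix can be sketched as a sparse counter keyed by genomic bin and cell barcode. The function name, the input tuple layout and the choice to assign each fragment to the bin containing its midpoint are assumptions for this sketch, not details taken from the study.

```python
from collections import Counter

BIN_SIZE = 5_000  # 5-kb bins, as described in the text


def bin_by_cell_counts(fragments, bin_size=BIN_SIZE):
    """Count fragments per (genomic bin, cell barcode).

    fragments: iterable of (chrom, start, end, cell_barcode).
    Assumption: a fragment is assigned to the bin containing its midpoint.
    Returns a Counter keyed by ((chrom, bin_start), cell_barcode).
    """
    counts = Counter()
    for chrom, start, end, barcode in fragments:
        midpoint = (start + end) // 2
        bin_start = (midpoint // bin_size) * bin_size
        counts[((chrom, bin_start), barcode)] += 1
    return counts
```

The resulting sparse counts would then be normalized and decomposed (for example by LSI) before clustering; those steps are not shown here.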

We could identify and deconvolute more fine-grained clusters from the nano-CT data than from the scCUT&Tag data. For example, we deconvoluted the cluster containing astroependymal cells into four new sub-clusters (clusters 2, 3, 6, and 8) and the vascular lepto-meningeal cell (VLMC) cluster into three new sub-clusters (clusters 0, 7, and 11) (Fig. 1f,g). On the basis of H3K27me3 profiles, we also detected cells from the broad spectrum of the oligodendrocyte lineage, in contrast to scCUT&Tag, which could deconvolute these intermediate cell states only upon integration with single-cell RNA sequencing (scRNA-seq) data (Fig. 1f). Moreover, the marker bins identified by single-cell nano-CT showed a significantly higher capture rate (23.1%) than markers identified by scCUT&Tag (6.6%) (Extended Data Fig. 2a–c) and the top enriched markers (50 for each cluster) showed significantly higher P value (Extended Data Fig. 2d,e; Wilcoxon test). Although nano-CT data also showed increased levels of background noise compared with scCUT&Tag data, the background in nano-CT datasets remained below that of benchmarking ENCODE H3K27me3 ChIP-seq data33 (Extended Data Fig. 2f,g).

Multimodal single-cell nano-CT in the mouse brain

Next, we loaded the nanobody-Tn5 fusion proteins with different barcoded oligonucleotides to be able to track the insertions by distinct Tn5 fusion proteins (Fig. 2a and Supplementary Table 1). We performed nano-CT with single-cell indexing on the 10x Genomics Chromium ATAC-seq v1.1 platform on fresh mouse brain tissue obtained from 19-day-old mice (P19). By using uniquely barcoded oligonucleotides, we targeted two histone modifications simultaneously, H3K27ac and H3K27me3 (two biological replicates). In addition, we also profiled chromatin accessibility by a prior treatment with non-fused, barcoded Tn5 (ATAC-seq) in the same sample in two biological replicates, altogether profiling three epigenomic modalities in single cells (Fig. 2a). Performing ATAC-seq before nano-CT consistently resulted in increased nuclei loss and clumping and thus profiling of the three modalities required at least 200,000 cells as input material.
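The idea of tracking insertions through modality-specific barcodes can be sketched as follows. The barcode sequences, the one-mismatch tolerance and the function names are hypothetical and chosen only for illustration; the actual oligonucleotide sequences are listed in Supplementary Table 1 and are not reproduced here.

```python
# Hypothetical barcodes, NOT the sequences used in the study.
HYPOTHETICAL_BARCODES = {
    "ATAGAGGC": "ATAC",
    "TATAGCCT": "H3K27ac",
    "CCTATCCT": "H3K27me3",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_modality(read_prefix, barcode_to_modality, max_mismatches=1):
    """Return the modality whose barcode uniquely matches the read start.

    Returns None when no barcode is within max_mismatches or when the
    best match is ambiguous.
    """
    candidates = []
    for barcode, modality in barcode_to_modality.items():
        if len(read_prefix) < len(barcode):
            continue  # read too short to contain this barcode
        distance = hamming(read_prefix[:len(barcode)], barcode)
        if distance <= max_mismatches:
            candidates.append((distance, modality))
    if not candidates:
        return None
    candidates.sort()
    if len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
        return None  # two barcodes fit equally well
    return candidates[0][1]
```

Reads assigned to each modality can then be written to separate files and processed as independent single-cell libraries that share cell barcodes.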

Fig. 2: Multimodal nano-CT. a, Cartoon depicting the strategy used to profile multiple epigenomic modalities. Individual Tn5 and nano-Tn5 are loaded with barcoded oligonucleotides that are used in the analysis to identify the source of tagmentation and demultiplex the modalities. b, Violin plots depicting the number of unique fragments per cell per replicate and modality. c, Violin plots depicting FrIP per cell per replicate and modality. d, UMAP embeddings of the multimodal nano-CT data for ATAC-seq, H3K27ac, and H3K27me3. The lines connect representations of the same cells in the individual modalities (4,434 cells in two biological replicates, which passed quality control for all three modalities individually and originate from the three-modal datasets; 200,000 cells used as input for all replicates). e, UMAP embedding of the individual modalities with cluster labels. n = 2 biological replicates; each biological replicate was profiled both by nano-CT with ATAC (3-modal) and nano-CT without ATAC (2-modal): 4,960 cells ATAC-seq, 12,464 cells H3K27ac, 12,763 cells H3K27me3; 200,000 cells were used as input for all replicates. A cell is shown in a modality's UMAP if it passes quality control in that modality, regardless of the other modalities. AST_NT, astrocytes non-telencephalon; AST-TE, astrocytes telencephalon; AST_3, astrocytes 3; AST_4, astrocytes 4; INH1–4, inhibitory neurons; EXC1–4, excitatory neurons; MGL1–3, microglia 1–3; MAC, macrophages; VEC, vascular endothelial cells; PER, pericytes; CHP, choroid plexus epithelial cells; EPE, ependymal cells; CHP-EPE, choroid plexus + ependymal cells; BG, Bergmann glia; VSMC, vascular smooth muscle cells; ABC, arachnoid barrier cells.

We demultiplexed the three modalities and preprocessed the datasets individually using the Cellranger pipeline (10x Genomics). We selected the cells using two parameters (the number of reads per cell and the fraction of reads in peaks per cell) and identified the high-quality cells using Gaussian mixture model clustering in each modality separately (Supplementary Fig. 1). Out of 13,428 and 5,157 cells identified in the two-modality and three-modality datasets, 11,981 (89.2%) and 4,434 (85.9%) cells, respectively, passed the quality control filter in all modalities (Extended Data Fig. 3a). We then compared the number of unique fragments per cell and the fraction of reads in peaks (FrIP) with our previous scCUT&Tag dataset15. We found that multimodal nano-CT consistently outperforms the previous scCUT&Tag protocol in terms of the number of fragments per cell in each modality (median 6,123–14,496 and 1,832–6,510 fragments per cell for nano-CT and 315–547 and 217–329 fragments per cell for scCUT&Tag; H3K27ac and H3K27me3, respectively; Fig. 2b and Extended Data Fig. 3b), also after downscaling the individual replicates to the same sequencing depth (30 million reads per sample; median 2,949 reads per cell for nano-CT and 221 for scCUT&Tag in H3K27me3; Extended Data Fig. 3d). We also detected a lower signal-to-noise ratio, measured as the fraction of reads in peak regions (35.9–52.4% and 36.3–48.9% for nano-CT and 51.2–62.1% and 67.9–69.9% for scCUT&Tag; H3K27ac and H3K27me3, respectively; Fig. 2c and Extended Data Fig. 3c). An increase in the number of profiled modalities also resulted in modest increases in assay noise, as measured by the fraction of reads in peak regions (Fig. 2c) and fingerprint analysis (Extended Data Fig. 2g). The change in the library construction strategy resulted in a minimal number of duplicate linear amplification fragments across all samples (0.11–0.14% of total reads; Extended Data Fig. 3f).
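The comparison at equal sequencing depth can be illustrated with a small sketch that randomly subsamples reads to a common target and then summarizes reads per cell. The function names, the fixed random seed and the representation of each read by its cell barcode are assumptions for illustration; the study may have downscaled its data differently (for example, at the level of raw sequencing files).

```python
import random
from collections import Counter
from statistics import median


def downsample_reads(read_barcodes, target_depth, seed=0):
    """Randomly keep target_depth reads, sampled without replacement.

    read_barcodes: one cell barcode per read.
    Returns all reads unchanged if there are no more than target_depth.
    """
    reads = list(read_barcodes)
    if len(reads) <= target_depth:
        return reads
    return random.Random(seed).sample(reads, target_depth)


def median_reads_per_cell(read_barcodes):
    """Median number of reads per cell barcode (0 if there are no reads)."""
    per_cell = Counter(read_barcodes)
    return median(per_cell.values()) if per_cell else 0
```

Applying the same target depth to every sample before computing the per-cell median removes sequencing depth as a confounder when two protocols are compared.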

Nano-CT also vastly outperforms a previously reported multimodal histone profiling method, multi-CUT&Tag, in terms of the number of fragments per cell, with medians of 95 and 428 reads per cell in multi-CUT&Tag and 9,135 and 3,201 reads per cell in nano-CT for H3K27ac and H3K27me3, respectively (Extended Data Fig. 3e).

We then constructed a cell-by-peak matrix for each modality, performed dimensionality reduction using LSI and UMAP, and clustered the cells using each modality separately. We identified individual clusters in each modality, obtaining 15 ATAC-seq clusters, 28 H3K27ac clusters, and 24 H3K27me3 clusters, and broadly classified them into major cell classes: neurons, oligodendrocytes, astroependymal, immune, and vascular cells (Fig. 2d,e). We then annotated the clusters by co-embedding the active modalities (ATAC and H3K27ac) together with the single-cell RNA-seq (scRNA-seq) mouse brain atlas dataset34 using canonical correlation analysis (CCA) (Extended Data Fig. 3g,h). All clusters displayed a combination of unique marker regions in all modalities, including enhancers labeled by H3K27ac proximal to Mag/Mbp (OLG), Rbfox3 (pan-NEU), C1qa/C1qb (microglia/macrophage), Gad1 (inhibitory neurons), Neurod1 (excitatory neurons), Foxg1/Lhx2 (astrocytes telencephalon), Irx2 (astrocytes non-telencephalon), and Foxc1 (vascular endothelial cells/VLMCs/pericytes), among others (Fig. 3 and Extended Data Fig. 4a,b).

Fig. 3: ATAC, H3K27ac and H3K27me3 at loci harboring microglia and mature oligodendrocyte marker genes. Genome browser tracks of the multimodal data for several clusters showing marker peak regions. Markers: Mag for ATAC/H3K27ac in mature oligodendrocytes and Dmkn for H3K27me3 in microglia.

To validate the specificity of individual deconvoluted modalities, we investigated chromatin states at the Hox gene clusters, which are expressed in the caudal but not the rostral central nervous system, and found strong enrichment of H3K27me3, but not ATAC and H3K27ac, at the HoxA locus (Fig. 4a), underscoring the specificity of individual modalities at the cluster level. Similar enrichment of H3K27me3 was observed in other Hox loci including HoxB, HoxC, and HoxD (Extended Data Fig. 5a–c). Then we performed principal component analysis (PCA) of ATAC-seq, H3K27ac, and H3K27me3 pseudo-bulk tracks for each individual cluster identified by scCUT&Tag15 (single modality) and nano-CT (multimodal). We selected the 50 most significant marker regions (peaks) from all clusters and across all modalities, merged any overlapping regions and used them as features for PCA. The PCA showed that cell populations identified through the individual modalities co-clustered together regardless of the method used for obtaining the data (Fig. 4b). We then selected a set of highly specific peaks for both H3K27ac and H3K27me3 on the basis of our previous scCUT&Tag data in astrocytes15 and generated a metagene plot of H3K27ac/H3K27me3 signal obtained by multimodal nano-CT. We observed high enrichment of the respective modifications only in the respective set of marker peaks (Fig. 4c). We also plotted a correlation matrix of H3K27me3/H3K27ac signal in all peak regions identified in nano-CT and observed strong correlation of the respective H3K27ac and H3K27me3 tracks, and no correlation in H3K27ac–H3K27me3 combinations (Fig. 4d). Despite the high enrichment of the respective H3K27ac and H3K27me3 modalities, a small subset of peak regions showed signal of both H3K27ac and H3K27me3 specifically in the nano-CT dataset (Fig. 4c). To further quantify this overlap, we intersected peaks called from H3K27ac and H3K27me3 in the astrocytes telencephalon cluster and found that 1,093 peaks were overlapping, representing 4.4% of H3K27ac peaks and 11.5% of H3K27me3 peaks (Fig. 4e).

Fig. 4: Quality control and benchmarking of nano-CT. a, Genome browser nano-CT pseudo-bulk view of the HoxA region on chromosome 6 for all three modalities. b, Principal component analysis (PCA) of pseudo-bulk tracks for each cluster identified from the respective modalities by scCUT&Tag and nano-CT. Top 50 marker regions were selected from each nano-CT cluster and modality, and all peaks were merged and flattened before running PCA. c, Metagene plots showing the signal distribution of H3K27ac and H3K27me3 in astrocyte populations obtained by nano-CT and scCUT&Tag around specific H3K27ac and H3K27me3 peaks. The peaks were defined and selected on the basis of reference scCUT&Tag data15. d, Scatter plot matrix showing correlation of H3K27ac and H3K27me3 signal in astrocyte populations defined by scCUT&Tag and nano-CT. r, Pearson correlation coefficient. Cell labels as in Fig. 2. e, Venn diagram showing the genomic overlap of significant H3K27ac and H3K27me3 peaks in cluster AST-TE.

We also investigated whether overlapping modalities such as ATAC-seq and H3K27ac interfere with each other. ATAC-seq tagmentation is performed as the first and optional step in the nano-CT protocol (Fig. 1c) and therefore we hypothesized that the ATAC-seq signal should not be affected by the histone modalities. Indeed, the open chromatin signal obtained in nano-CT correlated well with scATAC-seq profiled as a single modality on the Chromium platform (10x Genomics datasets) regardless of the levels of H3K27ac signal within the same region in astrocytes (Extended Data Fig. 6a–c). On the other hand, H3K27ac signal was slightly affected by the overlapping ATAC-seq signal, specifically in regions with high levels of open chromatin and hence high ATAC-seq signal (Extended Data Fig. 6d–f). In summary, single-cell nano-CT can be used to simultaneously obtain robust and specific multimodal epigenetic profiles of several histone modifications and open chromatin from single cells.

Integrative multimodal analysis of the epigenomic states

The variability in the number of identified clusters suggests that some modalities might be more informative towards cell identity, given the similar number of fragments per cell (Fig. 2b). To investigate whether the cell identities assigned using different modalities concord, we generated a confusion matrix for all combinations of modalities (Extended Data Fig. 7). The major cell type identities were fully recapitulated across all three modalities (Extended Data Fig. 7a–c), whereas specific clusters were identified only in a subset of the individual modalities (Extended Data Fig. 7d–f). For example, pericytes and vascular smooth muscle cells can be deconvoluted from the H3K27ac modality, but not from the H3K27me3 or ATAC-seq modality (Extended Data Fig. 7d–f).
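Because the same cell barcodes are shared across modalities, such a confusion matrix can be built by cross-tabulating the cluster labels that two modalities assign to each shared cell. The sketch below shows one way to do this; the function names and the dictionary-based inputs are assumptions for illustration.

```python
from collections import Counter, defaultdict


def confusion_matrix(labels_a, labels_b):
    """Cross-tabulate cluster labels from two modalities.

    labels_a, labels_b: dicts mapping cell barcode -> cluster label.
    Only barcodes present in both modalities are counted.
    Returns counts[label_a][label_b] = number of shared cells.
    """
    counts = defaultdict(Counter)
    for barcode in labels_a.keys() & labels_b.keys():
        counts[labels_a[barcode]][labels_b[barcode]] += 1
    return counts


def row_fractions(counts):
    """Normalize each row so that its entries sum to 1."""
    return {
        label_a: {label_b: n / sum(row.values()) for label_b, n in row.items()}
        for label_a, row in counts.items()
    }
```

A row whose mass is concentrated in a single column indicates a cluster that is recapitulated by the other modality, whereas several rows mapping to the same column indicate clusters that only the first modality resolves.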

To further improve the clustering, we used all of the multimodal matrices to perform weighted nearest neighbors (WNN) analysis35. The WNN largely recapitulated the clusters identified by each individual modality (Fig. 5a and Extended Data Fig. 8a). We then investigated whether features that explain most of the variability in single cells (LSI component loadings) overlapped among the different modalities. We found that H3K27ac and the ATAC-seq features overlapped the most (10,601 overlapping regions), but also that a large fraction of genomic regions showed variability in all three modalities simultaneously (9,463) (Fig. 5b). For example, the locus surrounding the gene encoding the transcription factor Foxg1 was heavily regulated in two major classes of astrocytes. Whereas the chromatin surrounding Foxg1 was primarily open and K27 acetylated in telencephalon astrocytes, it was strongly K27 trimethylated in non-telencephalon astrocytes (Fig. 5c). Given that Foxg1 has previously been linked with early neurogenesis in the forebrain36 and with inhibition of gliogenesis37,38, the epigenomic state of astrocytes likely reflects the developmental origin of the astrocyte populations. Other developmental and patterning transcription factors such as Lhx2, Foxb1 or Irx2 were also found to be differentially enriched at an epigenomic level in astrocyte populations (Fig. 5d and Extended Data Fig. 8b,c).

Fig. 5: Multimodal analysis and visualization of the nano-CT data. a, UMAP embedding of the individual modalities with cluster labels identified through WNN analysis. Embedding is based on individual modalities, whereas cluster identities are assigned from WNN dimensionality reduction. b, Venn diagram showing the overlap of peaks identified from the individual modalities. c,d, UMAP projection and visualization of ATAC, H3K27ac and H3K27me3 signal intensity in single cells at the Foxg1 (c) and Irx2 loci (d). Gray lines connect the cells with the same single-cell barcodes across the different modalities. Clusters for telencephalon astrocytes (AST_TE) and non-telencephalon astrocytes (AST_NT) were selected for the visualization. Aggregated pseudo-bulk tracks for all modalities together with genomic annotations are shown to the right.

Sequential waves of H3K27me3 in the oligodendrocyte lineage

One of the strengths of multimodal nano-CT is that it allows for direct and simultaneous analysis of the dynamics of multiple histone marks and chromatin accessibility in the same cells. Our P19 brain dataset covers the progression of the whole oligodendrocyte lineage from oligodendrocyte progenitor cells towards mature oligodendrocytes. Therefore, we focused on the oligodendrocyte lineage and used the combined WNN embedding to generate a pseudo-time of oligodendrocyte differentiation (Fig. 6a). We then projected the pseudo-time identified from WNN onto UMAPs for the individual modalities, which recapitulated the gradient found in the WNN embedding (Extended Data Fig. 9a). Moreover, the predicted trajectory was consistent with the published trajectory of oligodendrocyte differentiation identified by scRNA-seq (Extended Data Fig. 9b), and gene expression and H3K27ac were correlated for lineage marker genes (Extended Data Fig. 9c–e).

Fig. 6: nano-CT reveals sequential H3K27me3 waves during oligodendrocyte differentiation. a, UMAP embedding showing pseudo-time calculated by slingshot on the basis of WNN dimensionality reduction and cluster identities. b, Scatter plot depicting meta-region score for all modalities (y-axis) and pseudo-time (x-axis). The score was calculated as a sum of normalized score across all regions. The regions were selected on the basis of P value (P < 0.05, Wilcoxon test) and log fold change > 0 at the marker regions of the ATAC modality, and top 200 regions were used. The line depicts local polynomial regression fit (loess) of the data and shaded regions depict 95% confidence intervals. c, Heat map representation of the H3K27me3 signal intensity at the marker regions that gain H3K27me3 during oligodendrocyte differentiation (P < 0.05, Wilcoxon test, log fold change > 0, top 200 regions). Each column depicts a single cell and each row a single genomic region (peak). Cells are ordered by pseudo-time calculated as shown in a. The order of the regions is based on k-means clustering of the matrix with k = 2. d, Scatter plots depicting meta-region score for all modalities (y-axis) and pseudo-time (x-axis). The score was calculated as a sum of normalized score across all regions. The regions were selected on the basis of P value (P < 0.05, Wilcoxon test) and log fold change > 0 at the marker regions of the H3K27ac modality, and top 200 regions were used. The regions were further stratified to wave 1 and wave 2 regions on the basis of k-means clustering as shown in c. The line depicts local polynomial regression fit (loess) of the data and shaded regions depict 95% confidence intervals.

Then, we investigated the magnitude of changes in all measured chromatin modalities at sites that are the most dynamically opening or acquiring histone modifications. We observed that loci at which chromatin opens (ATAC-seq) acquire H3K27ac with a slight pseudo-time delay, whereas there is little change in the overall H3K27me3 state of these sites (Fig. 6b). For the top open chromatin regions, the ATAC signal reaches its plateau at an earlier stage in pseudo-time than H3K27ac. A similar, although less prominent, effect is observed when looking specifically at loci that acquire H3K27ac during oligodendrocyte differentiation (Extended Data Fig. 9f). Thus, our analysis indicates that chromatin opening precedes deposition of H3K27ac at loci that are marked for gene expression.
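A per-cell meta-region score of the kind plotted against pseudo-time in Fig. 6b can be sketched as the summed, normalized signal over a selected set of regions. The normalization is not specified in this section, so the depth normalization used below, the scale factor and the function name are all assumptions made for illustration.

```python
def meta_region_scores(counts, regions, scale=10_000):
    """Per-cell sum of depth-normalized signal over selected regions.

    counts: dict mapping cell barcode -> dict of region -> raw count.
    regions: the selected marker regions (for example, the top 200).
    Assumption: each cell is normalized by its total counts times `scale`.
    """
    scores = {}
    for cell, region_counts in counts.items():
        total = sum(region_counts.values())
        if total == 0:
            scores[cell] = 0.0  # no signal in this cell
            continue
        selected = sum(region_counts.get(r, 0) for r in regions)
        scores[cell] = scale * selected / total
    return scores
```

Computing this score separately for the ATAC, H3K27ac and H3K27me3 matrices over the same region set, and plotting each against pseudo-time, would give curves comparable in spirit to those described above.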

Strikingly, we found that H3K27me3 deposition occurs in two distinct waves during oligodendrocyte differentiation (Fig. 6c,d). The first wave of H3K27me3 occurred early on in the differentiation process and was associated with repression of genes expressed predominantly in neurons, whereas the second wave repressed both neuronal genes (Extended Data Fig. 9g) and genes expressed in oligodendrocyte progenitor cells (for example Sox5, Sox6, and Ptprz1), which are associated with the Gene Ontology (GO) terms gliogenesis, glial cell differentiation, and oligodendrocyte differentiation (Extended Data Fig. 9h). Thus, oligodendrocyte lineage progression encompasses two sequential H3K27me3 repressive states, which would not be possible to discriminate using transcriptomic data. In summary, multimodal nano-CT analysis allows unique insights into the epigenomic processes driving biological processes such as oligodendrocyte differentiation.

Chromatin velocity of oligodendrocyte lineage

It has been shown that differentiation kinetics can be predicted from scRNA-seq data using RNA velocity modeling on the basis of the ratio of spliced/unspliced mRNA39. Recently, a similar concept has been proposed for chromatin velocity, using information from two anti-correlated chromatin modalities—heterochromatin and open chromatin29. Our multimodal chromatin profiling allows us to define several chromatin velocities: ATAC/H3K27ac; ATAC/H3K27me3; and H3K27ac/H3K27me3. Our analysis (Fig. 6b) suggests that chromatin accessibility precedes H3K27ac, which is analogous to the spliced/unspliced mRNA relationship. Although the difference between ATAC-seq and H3K27ac is subtle, it led us to hypothesize that we can directly leverage the velocity framework to predict the directionality of a differentiation pathway. We tested this idea by generating a gene-by-cell matrix for ATAC and H3K27ac and using these as an input into the scvelo algorithm40. Indeed, ATAC/H3K27ac velocity accurately predicted the differentiation trajectory of the oligodendrocyte lineage from oligodendrocyte progenitor cells towards mature oligodendrocytes (Fig. 7a). Phase plots of ATAC-seq versus H3K27ac of several genes associated with oligodendrocyte differentiation such as Mal or Mog further supported this directionality (Fig. 7b).
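The analogy can be made concrete with a deliberately simplified steady-state velocity model, in which the ATAC signal of a gene takes the place of unspliced counts and the H3K27ac signal takes the place of spliced counts. The sketch below is not the scvelo implementation, which fits a more elaborate model; the function names and the least-squares fit through the origin are assumptions chosen for illustration.

```python
def steady_state_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced versus spliced.

    In this analogy, `unspliced` holds ATAC signal per cell and
    `spliced` holds H3K27ac signal per cell for a single gene.
    """
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("spliced layer has no signal")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def velocities(unspliced, spliced):
    """Per-cell velocity: residual from the fitted steady-state line.

    A positive value means more ATAC signal than expected for the
    observed H3K27ac level, so H3K27ac is predicted to increase.
    """
    gamma = steady_state_gamma(unspliced, spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]
```

Under this simplification, cells in which a locus has opened but not yet acquired H3K27ac lie above the fitted line and receive a positive velocity, which is the directionality that the phase plots visualize.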

Fig. 7: nano-CT-based chromatin velocity analysis. a, UMAP projection and chromatin velocity visualization. The chromatin velocity was calculated by using the ATAC-seq gene-by-cell matrix as input into the unspliced layer and the H3K27ac gene-by-cell matrix as input into the spliced layer and then running the scvelo algorithm using default parameters. b, Phase plots of ATAC-seq and H3K27ac signal for key genes associated with oligodendrocyte differentiation (Mal, Mag). c, UMAP projection of the latent time calculated by the scvelo algorithm. d, Heat map showing H3K27ac signal normalized with sctransform42. Rows depict individual top velocity driver genes, sorted by time of value with maximum intensity and columns depict individual cells sorted by latent time. e, Heat map representing gene expression profiles measured by scRNA-seq41 in the oligodendrocyte lineage. Rows depict individual genes, clustered by similarity and columns depict single cells ordered in pseudo-time. f, Violin plot showing normalized expression of a set of marker genes identified in the scRNA-seq dataset, and normalized expression of a set of genes identified as the key driver genes by scvelo. g, UMAP projection and velocity vectors projection of chromatin velocity calculated using the H3K27ac gene-by-cell matrix as input into the unspliced layer and the H3K27me3 gene-by-cell matrix as input into the spliced layer and then running the scvelo algorithm using default parameters.

We then identified the key velocity driver genes (Extended Data Fig. 10a and Supplementary Table 2) and plotted normalized H3K27ac, ATAC signal, and velocity along inferred latent time (Fig. 7c), highlighting the dynamic changes in chromatin during oligodendrocyte differentiation (Fig. 7d and Extended Data Fig. 10b,c). We validated that the majority of the driver genes also show variable gene expression (measured by scRNA-seq) in the oligodendrocyte lineage41 (Fig. 7e). Interestingly, these driver genes are expressed at relatively low levels compared to the typical marker genes identified by scRNA-seq (Fig. 7f). The GO terms associated with these genes were nervous system development, cell development, and neurogenesis or generation of neurons (Extended Data Fig. 10d), suggesting that the set of genes identified through chromatin velocity comprised non-canonical oligodendrocyte differentiation genes. This indicates the potential of multimodal chromatin profiling and chromatin velocity modeling for identifying genes that might be dynamically regulated through changes in the chromatin landscape but are difficult to pick up through gene expression profiling.

Finally, we attempted to model chromatin velocity using, as input into the velocity analysis, other combinations of chromatin modalities that do not follow the same causal relationship as ATAC/H3K27ac. H3K27ac and H3K27me3 are anti-correlated and mutually exclusive histone marks, and this combination of modalities did not correctly predict the oligodendrocyte differentiation trajectory (Fig. 7g). Thus, anti-correlated active and repressive chromatin marks might require other modeling or data pre-processing strategies to correctly predict chromatin velocity.