Date: 2022-10-31

Using antibodies against mutually exclusive histone H3 lysine 27 trimethylation (H3K27me3) and RNA polymerase II phosphorylated at serine 5 of the C-terminal domain (PolIIS5P) in human K562 chronic myelogenous leukemia cells as controls, we systematically tested a variety of protocol conditions for antibody–barcode association with the goal of optimizing both assay efficiency and fidelity of target identification (Extended Data Fig. 1a). In contrast with previous reports21, we found that both pre-incubation of barcoded protein A-Tn5 (pA–Tn5) complexes and combined incubation and tagmentation of all antibodies simultaneously resulted in high levels of spurious cross-enrichment between targets (Extended Data Fig. 1b,c), leading us to use adapter-conjugated antibodies loaded into pA–Tn5 to tagment multiple targets in sequence. We also found that tagmenting in sequence beginning with the target predicted to be less abundant (PolIIS5P in this case) modestly reduced off-target read assignment (Extended Data Fig. 1d). We further found that primary antibody conjugates resulted in superior target distinction versus secondary antibody conjugates (Extended Data Fig. 1b,c) but also variable data quality, likely owing to fewer pA–Tn5 complexes accumulating per target locus in the absence of a secondary antibody. To overcome this obstacle, we (1) loaded pA–Tn5 onto primary antibody-conjugated i5 forward adapters; (2) tagmented target chromatin in sequence; and (3) added a secondary antibody followed by pA–Tn5 loaded with i7 reverse adapters and carried out a final tagmentation step (Fig. 1a). This resulted in libraries that were as robust as matched CUT&Tag experiments, particularly for H3K27me3 (Extended Data Fig. 1e). We dubbed this combined approach MulTI-Tag (Fig. 1a). 
MulTI-Tag profiles for each of H3K27me3 and PolIIS5P profiled in sequence were highly accurate for on-target peaks as defined by ENCODE chromatin immunoprecipitation followed by sequencing (ChIP-seq) (Fig. 1b,c) and had similar specificity of enrichment to CUT&Tag as measured by fraction of reads in peaks (Extended Data Fig. 1f), indicating that MulTI-Tag recapitulates target enrichment without cross-contamination that may confound downstream analysis.
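The fraction of reads in peaks used above as a specificity measure can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `fraction_in_peaks`, the rule of assigning a fragment by its midpoint and the toy coordinates are all assumptions made for illustration.

```python
from bisect import bisect_right


def fraction_in_peaks(fragments, peaks):
    """Fraction of fragments whose midpoint lies inside a peak.

    fragments, peaks: lists of (chrom, start, end), half-open intervals.
    Peaks are assumed not to overlap within a chromosome.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = 0
    for chrom, start, end in fragments:
        intervals = by_chrom.get(chrom, [])
        mid = (start + end) // 2
        # Index of the last peak whose start is <= mid, or -1 if none.
        i = bisect_right(intervals, (mid, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= mid < intervals[i][1]:
            hits += 1
    return hits / len(fragments) if fragments else 0.0


# Toy data (hypothetical coordinates).
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
toy_fragments = [
    ("chr1", 120, 160),  # midpoint 140, inside the first peak
    ("chr1", 190, 230),  # midpoint 210, outside
    ("chr1", 480, 540),  # midpoint 510, inside the second peak
    ("chr2", 100, 150),  # chromosome without peaks
]
toy_frip = fraction_in_peaks(toy_fragments, toy_peaks)
```

Computing the same quantity per barcode, against the peak set of each target, gives one way to check whether reads assigned to one target accumulate in the peaks of another.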

Fig. 1: MulTI-Tag directly identifies user-defined chromatin targets in the same cells. a, Schematic describing the MulTI-Tag methodology. (1) Antibody–oligonucleotide conjugates are used to physically associate forward-adapter barcodes with targets and are loaded directly into pA–Tn5 transposomes for sequential binding and tagmentation. (2) pA–Tn5 loaded exclusively with reverse adapters are used for a secondary CUT&Tag step to efficiently introduce the reverse adapter to conjugate-bound loci. (3) Target-specific profiles are distinguished by barcode identity in sequencing. b, Genome browser screenshot showing individual CUT&Tag profiles for H3K27me3 (first row) and RNA PolIIS5P (second row) in comparison with MulTI-Tag profiles for the same targets probed individually in different cells (third and fourth rows) or sequentially in the same cells (fifth and sixth rows). c, Heat maps describing the enrichment of H3K27me3 (red) or RNA PolIIS5P (blue) signal from sequential MulTI-Tag profiles at CUT&Tag-defined H3K27me3 peaks (left) or RNA PolIIS5P peaks (right). d, Genome browser screenshot showing H3K27me3 (red), H3K4me2 (purple) and H3K36me3 (teal) MulTI-Tag signal from experiments in H1 hESCs using an individual antibody (rows 1, 3 and 5) or all three antibodies in sequence (rows 2, 4 and 6). e, Normalized CUT&Tag (light colors) and MulTI-Tag (dark colors) enrichment of H3K27me3, H3K4me2 and H3K36me3 across genes in H1 hESCs. RPM, reads per million.

In H1 human embryonic stem cells (hESCs), we simultaneously profiled three targets that represent distinct waypoints during developmental gene expression: H3K27me3, enriched in developmentally regulated heterochromatin22,23; H3K4me2, enriched at active enhancers and promoters24; and H3K36me3, co-transcriptionally catalyzed during transcription elongation25,26 (Fig. 1d,e). In comparison with control experiments in which each of the three targets was profiled individually, MulTI-Tag retains similar accuracy of target-specific enrichment in peaks (Extended Data Fig. 2a) and efficiency of signal over background (Extended Data Fig. 2b). Moreover, both control and MulTI-Tag experiments exhibit characteristic patterns of enrichment for each mark, including H3K4me2 at promoters, H3K36me3 in gene bodies and H3K27me3 across both (Fig. 1e). Of note, we observed regions with overlap between H3K27me3 and H3K4me2 for both CUT&Tag and MulTI-Tag samples consistent with known ‘bivalent’ chromatin in hESCs27. The enrichment of these regions in our MulTI-Tag was similar to standard CUT&Tag, indicating that tagmenting targets in sequence does not preclude detection of expected co-enrichment of two targets at the same loci (Extended Data Fig. 2c,d).

Given the successful adaptation of CUT&Tag for single-cell profiling16,28,29,30, we sought to use MulTI-Tag for single-cell molecular characterization (Fig. 2a). To do so, we adapted the Takara ICELL8 microfluidic system for unique single-cell barcoding via combinatorial indexing (Fig. 2a and Methods). In a pilot combinatorial indexing MulTI-Tag experiment profiling H3K27me3 and H3K36me3 either individually or in combination in a mixture of human K562 cells and mouse NIH3T3 cells, we calculated cross-species collision rates as 9.9% (231/2,334, H3K27me3), 10.7% (173/1,623, H3K36me3) and 11.0% (358/3,262, H3K27me3–H3K36me3) of cells yielding <90% of reads from a single species (Extended Data Fig. 3a,b). These statistics are similar to the same metrics reported for combinatorial indexing-based assay for transposase-accessible chromatin with sequencing (ATAC-seq) (7–12%; refs. 10,31). To confirm that MulTI-Tag could be used to distinguish a mixture of cells originating from the same species, we jointly profiled H3K27me3 and H3K36me3 in K562 cells, H1 hESCs and a mixture of the two cell types, yielding 21,548 cells (7,025 K562, 7,601 H1 and 6,922 Mixed) containing at least 100 unique H3K27me3 and 100 unique H3K36me3 reads (Fig. 2b and Extended Data Fig. 3c). For most peaks defined by ENCODE ChIP-seq (91.4% and 92.4% for H3K27me3 in H1 and K562 cells; 84.9% and 94.8% for H3K36me3 in H1 and K562 cells), more than 80% of fragments corresponded to the expected target (Extended Data Fig. 3d,e). Moreover, MulTI-Tag uniformity of coverage at representative loci (Extended Data Fig. 3f), cell recovery from input, and library complexity as measured by unique reads per cell were similar or superior to analogous published methods for single-cell chromatin profiling21,28,29,32 (Extended Data Fig. 3g).
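The collision criterion described above (a cell counts as a collision when less than 90% of its reads come from a single species) can be written out as a short sketch. The function names and the per-cell input format are assumptions for illustration; only the three collision/total counts are taken from the text.

```python
def is_collision(human_reads, mouse_reads, purity=0.90):
    """True if neither species contributes at least `purity` of the cell's reads."""
    total = human_reads + mouse_reads
    if total == 0:
        raise ValueError("cell has no mapped reads")
    return max(human_reads, mouse_reads) / total < purity


def collision_rate(cells, purity=0.90):
    """cells: iterable of (human_reads, mouse_reads) pairs, one per cell barcode."""
    cells = list(cells)
    collisions = sum(is_collision(h, m, purity) for h, m in cells)
    return collisions / len(cells)


# Counts reported in the text: (collisions, total cells) per experiment.
reported = {
    "H3K27me3": (231, 2334),
    "H3K36me3": (173, 1623),
    "H3K27me3-H3K36me3": (358, 3262),
}
percentages = {name: round(100 * c / n, 1) for name, (c, n) in reported.items()}
```

Recomputing the percentages from the reported counts reproduces the rounded values quoted in the paragraph.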

Fig. 2: MulTI-Tag in single cells. a, Schematic describing single-cell MulTI-Tag experiments. H1 hESCs (fuchsia) and K562 cells (gold) were profiled separately or in a mixture of the two cell types in bulk, and then cells were dispensed into nanowells on a Takara ICELL8 microfluidic device for combinatorial barcoding via amplification. b, Genome browser screenshot showing aggregated single-cell MulTI-Tag data (rows 2, 4, 6 and 8) in comparison with ENCODE ChIP-seq data (rows 1, 3, 5 and 7) profiling H3K27me3 (rows 1, 2, 5 and 6) and H3K36me3 (rows 3, 4, 7 and 8) in K562 (rows 1–4) and H1 (rows 5–8) cells. All single-cell MulTI-Tag data are from cells co-profiled with H3K27me3 and H3K36me3. c, Connected UMAP plots for single-cell MulTI-Tag data from H1 and K562 cells. Projections based on H3K27me3 (left), H3K36me3 (right) or a WNN integration of H3K27me3 and H3K36me3 data (center) are shown. NMI of cell type cluster accuracy is denoted for each projection. Lines are connected between points that represent the same single cell in different projections. d, WNN UMAP projections with MulTI-Tag enrichment scores plotted for POLR3E (top left), HOXD3 (top right), HOXB3 (bottom left) and SALL4 (bottom right). The balance of enrichment between H3K36me3 and H3K27me3 in each cell is denoted by color, and the total normalized counts in each cell are denoted by the transparency shading.

We used uniform manifold approximation and projection (UMAP)33,34 to project single-cell data into low-dimensional space based on enriched features defined for H3K27me3, H3K36me3 or a combination of both based on weighted nearest neighbor (WNN) integration35 and clustered the resulting projections (Fig. 2c). Using our known cell type labels to calculate cluster normalized mutual information (NMI) on a scale of 0 (no cell type distinction by cluster) to 1 (perfect cell type distinction by cluster), H3K27me3 (0.913), H3K36me3 (0.944) and H3K27me3–H3K36me3 combined (0.930) were all highly proficient in cluster distinction (Fig. 2c). Additionally, 99.1% (6,383/6,443) of ‘Mixed’ cells occupied non-ambiguous clusters defined nearly exclusively by either H1 or K562 cells (Fig. 2c). Constitutively expressed (POLR3E) or silenced (HOXD3) genes exhibited cluster non-specific enrichment of H3K36me3 and H3K27me3, respectively, and genes expressed exclusively in K562 (HOXB3) or H1 (SALL4) cells were enriched for H3K36me3 in the cell-specific cluster versus H3K27me3 in the other (Fig. 2d). To further demonstrate the flexibility of target combinations possible with MulTI-Tag, we profiled K562, H1 and K562–H1 Mixed cells in three additional target pair combinations (H3K27me3–PolIIS5P, H3K27me3–H3K9me3 and H3K27me3–H3K4me1) (Extended Data Fig. 4a,b). All individual marks distinguished cell types with high efficiency with the exception of H3K4me1, likely owing to the fact that only 27 K562 cells were analyzed for H3K4me1 enrichment after quality control filtering (Extended Data Fig. 4c). In all, these results show that MulTI-Tag can use enrichment of multiple targets to distinguish mixtures of cell types.
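To make the NMI scale concrete, the following sketch computes normalized mutual information between known cell type labels and cluster assignments. The normalization variant is not stated in this section; dividing by the arithmetic mean of the two entropies is an assumption here, and the labels are toy values.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(cell_types, clusters):
    """Mutual information divided by the mean of the two entropies (assumed variant)."""
    n = len(cell_types)
    joint = Counter(zip(cell_types, clusters))
    n_type, n_cluster = Counter(cell_types), Counter(clusters)
    mi = sum(
        (n_tc / n) * log(n_tc * n / (n_type[t] * n_cluster[c]))
        for (t, c), n_tc in joint.items()
    )
    denom = (entropy(cell_types) + entropy(clusters)) / 2
    return mi / denom if denom > 0 else 1.0


# Toy labels: clusters that match cell type exactly, and clusters unrelated to it.
types = ["H1", "H1", "K562", "K562"]
perfect = nmi(types, [0, 0, 1, 1])
unrelated = nmi(types, [0, 1, 0, 1])
```

With these toy inputs, clusters that coincide with cell type score 1 and clusters independent of cell type score 0, the two ends of the scale described above.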

Because MulTI-Tag uses barcoding to define fragments originating from specific targets, we can directly ascertain and quantify relative target abundances and instances of their co-occurrence at the same loci in single cells. To establish methods for cross-mark analysis in single cells, we co-profiled the aforementioned transcription-associated marks (H3K27me3–H3K4me2–H3K36me3) by MulTI-Tag in single H1 and K562 cells with high target specificity (Fig. 3a,b and Extended Data Fig. 5a–e). When we calculated the percentage of unique reads originating from each of the three profiled targets in each single cell, we found that H3K27me3 represented the vast majority (89.4% and 80.0% in K562 cells and H1 cells) of unique reads (Fig. 3c). This is consistent with previously reported mass spectrometry36 and single-molecule imaging37 quantification of H3K27me3 versus H3K4me2 species and with a reported higher abundance of H3K27me3 in differentiated versus pluripotent cells38. By mapping fragments from any target in H1 and K562 cells onto genes in a window from 1 kilobase (kb) upstream of the transcription start site (TSS) to the gene terminus, we found notable instances of genes that show co-enrichment of distinct targets in the same single cells, including H3K4me2 and/or H3K36me3 enrichment in NR5A2 linked with H3K27me3 enrichment in HOXB3 in the same H1 cells and vice-versa in K562 cells (Fig. 3e). We were also able to classify genes by the frequency with which they were singly or co-enriched with specific targets in an individual cell. H1 hESCs had a higher frequency of most co-enriched target combinations than K562 cells (Extended Data Fig. 5f), including ‘bivalent’ H3K27me3–H3K4me2 co-enrichment in the same gene in individual cells27 (Fig. 3e,f). We used Cramér’s V (ref. 39) to quantify the degree of co-enrichment between each pair of targets in the same genes in the same single cells, and we confirmed that H1 cells had a higher degree of co-enrichment between H3K27me3 and H3K4me2 than K562 cells (Fig. 3g). Curiously, the same was true for association between H3K27me3 and H3K36me3, despite previous observations that H3K27me3 and H3K36me3 appear to be antagonistic in vitro and in vivo40,41 (Fig. 3g). Nevertheless, in CUT&Tag, in bulk MulTI-Tag and in previously published ENCODE ChIP-seq data from H1 hESCs, we were similarly able to detect co-occurrence of H3K27me3 at the 5′ ends and H3K36me3 at the 3′ ends of several genes, concomitant with their low expression as quantified by ENCODE RNA sequencing (RNA-seq) data (Extended Data Fig. 6a–d). Together, these results shed light on patterns of chromatin enrichment at single-cell, single-locus resolution.
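Two of the per-cell quantities discussed above, the share of unique reads per target and Cramér's V between two targets across genes, can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function names, the binary enriched/not-enriched call per gene and the absence of any bias correction are assumptions.

```python
from collections import Counter
from math import sqrt


def target_fractions(read_counts):
    """read_counts: dict target -> unique reads in one cell. Returns target -> fraction."""
    total = sum(read_counts.values())
    return {t: c / total for t, c in read_counts.items()} if total else {}


def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical vectors of equal length.

    Here x[i] and y[i] would be the enrichment calls (e.g. 0/1) of two
    targets at gene i within a single cell.
    """
    n = len(x)
    rows, cols = Counter(x), Counter(y)
    if len(rows) < 2 or len(cols) < 2:
        return 0.0  # association is undefined when one variable is constant
    observed = Counter(zip(x, y))
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


# Toy cell (hypothetical counts and calls).
fractions = target_fractions({"H3K27me3": 80, "H3K4me2": 15, "H3K36me3": 5})
v_same = cramers_v([1, 1, 0, 0], [1, 1, 0, 0])   # targets always co-occur
v_indep = cramers_v([1, 1, 0, 0], [1, 0, 1, 0])  # targets unrelated
```

Because V is computed per cell, its distribution across cells can then be compared between cell types.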

Fig. 3: Coordinated multifactorial analysis in the same cells using MulTI-Tag. a, Schematic describing a three-antibody MulTI-Tag experiment. b, Connected UMAP plots for single-cell MulTI-Tag data from H1 and K562 cells. Projections based on H3K27me3 (top), H3K4me2 (left), H3K36me3 (right) or a WNN integration of H3K27me3, H3K4me2 and H3K36me3 data (center) are shown. Lines are connected between points that represent the same single cell in different projections. c, Violin plots describing the distribution of the proportions of MulTI-Tag H3K27me3 (red), H3K4me2 (purple) or H3K36me3 (teal) unique reads out of total unique reads in individual H1 (left) or K562 (right) cells. d, Schematic describing coordinated multifactorial analysis strategy for MulTI-Tag. Genes in individual cells are analyzed for the enrichment of all MulTI-Tag targets, and gene–cell target combinations are mapped onto a matrix for clustering and further analysis. e, Top: heat map describing co-occurrence of MulTI-Tag targets in six genes of interest in each of 373 H1 cells and 372 K562 cells. The balance of enrichment between H3K4me2/H3K36me3 and H3K27me3 in each cell is denoted by color, and the total normalized counts in each cell are denoted by the transparency shading. Bottom: Instances of ‘bivalent’ enrichment of H3K27me3 and H3K4me2 or H3K36me3 in the same gene in the same cell are highlighted, with color reflecting normalized counts. f, WNN UMAP projection with cells colored by the sum of all counts occurring in a ‘bivalent’ context (that is, H3K27me3 and H3K4me2/H3K36me3 enrichment in the same gene). g, Violin plots describing calculated Cramér’s V of association between target combinations listed at bottom in individual H1 (fuchsia, n = 373) or K562 (gold, n = 372) cells.

To ascertain how histone modifications co-occur in single cells in a continuous developmental context, we differentiated H1 hESCs into three germ layers (Endoderm, Mesoderm and Ectoderm); harvested nuclei at 24-hour timepoints across the three time courses; and used MulTI-Tag to co-profile H3K27me3, H3K4me1 and H3K36me3, resulting in 7,727 cells meeting quality filters (Fig. 4a and Extended Data Fig. 7a). A UMAP based on H3K36me3 was unable to distinguish cell types as calculated by NMI for distinct cluster assignment of the four terminal cell types (NMI = 0.0166; Extended Data Fig. 7b). However, UMAPs based on H3K27me3 (NMI = 0.4060), H3K4me1 (NMI = 0.277) or WNN synthesis of H3K27me3 and H3K4me1 signal (NMI = 0.3403) all distinguished two major clusters corresponding to endoderm and mesoderm, along with H1-dominant or ectoderm-dominant clusters that were partially mixed, consistent with H1 hESC gene expression profiles being more similar to ectoderm42 (Fig. 4b and Extended Data Fig. 7b). To determine how well MulTI-Tag profiles reflect expected developmental trajectories, we used H3K27me3, H3K4me1 or combined H3K27me3–H3K4me1 MulTI-Tag data to infer pseudotemporally ordered differentiation trajectories using monocle3 (ref. 43). We then calculated two quality metrics: frequency of cell type assignment to an incorrect trajectory and inversion frequency, or the likelihood that ‘correct’ trajectory timepoints derived from known differentiation age were ‘out of order’ based on the inference (Fig. 4d and Extended Data Fig. 8a–f). Relative to either H3K27me3 or H3K4me1 pseudotime alone, inferred H3K27me3–H3K4me1 pseudotime correlated more closely with known differentiation age based on experimental timepoints (Fig. 4c and Extended Data Fig. 8g) and minimized both incorrect trajectory assignment and trajectory-specific inversion rates (Extended Data Fig. 8h). 
Moreover, the H3K27me3–H3K4me1 inferred trajectories alone recapitulated two major known branch points in hESC tri-lineage differentiation: partitioning of Ectoderm and Mesendoderm lineages at the outset of differentiation based on TGF-β and WNT signaling and subsequent separation of Endoderm and Mesoderm based on BMP and FGF signaling44,45 (Fig. 4d). These results show that multifactorial data integration is important for accurately representing continuous developmental chromatin states.
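One possible reading of the inversion frequency described above is the fraction of cell pairs whose inferred pseudotime order contradicts their known differentiation age. The sketch below implements that reading only; the exact definition is given in the Methods, and the function name, the pairwise formulation and the handling of ties are assumptions.

```python
from itertools import combinations


def inversion_frequency(known_age, pseudotime):
    """Fraction of comparable cell pairs ordered differently by age and pseudotime.

    known_age:  experimental timepoint per cell (e.g. hours of differentiation)
    pseudotime: inferred pseudotime per cell, same order as known_age
    """
    inverted = comparable = 0
    for (a1, p1), (a2, p2) in combinations(zip(known_age, pseudotime), 2):
        if a1 == a2 or p1 == p2:
            continue  # ties carry no ordering information
        comparable += 1
        if (a1 < a2) != (p1 < p2):
            inverted += 1
    return inverted / comparable if comparable else 0.0


# Toy trajectories (hypothetical values).
ordered = inversion_frequency([0, 24, 48, 72], [0.0, 0.2, 0.5, 0.9])
one_swap = inversion_frequency([0, 24, 48], [0.1, 0.9, 0.5])
```

The pairwise loop is quadratic in the number of cells, which is acceptable for a sketch; for thousands of cells a merge-sort inversion count would be the usual replacement.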

Fig. 4: MulTI-Tag profiling of continuous developmental trajectories. a, Schematic describing differentiation of H1 hESCs (black) into three germ layers—Ectoderm (blue shading), Endoderm (red shading) and Mesoderm (green shading)—followed by MulTI-Tag profiling of H3K27me3, H3K4me1 and H3K36me3. b, Connected UMAP plots for single-cell MulTI-Tag data from H1 hESCs differentiated to three germ layers. Projections based on H3K27me3 (left), H3K36me3 (right) or a WNN integration of H3K27me3 and H3K36me3 data (center) are shown. Lines are connected between points that represent the same single cell in different projections. c, Violin plot showing the distribution of inferred pseudotimes derived from a WNN integration of H3K27me3 and H3K4me1 data for each cell type profiled. Number of cells profiled for each cell type is denoted at left. d, WNN UMAP projection colored by percent H3K27me3 as a proportion of total unique reads in each single cell. User-defined cell type clusters are denoted by dashed lines, and computationally derived pseudotemporal trajectories are denoted by solid lines and user-classified by color. e, Heat map describing co-occurrence of MulTI-tag targets in selected genes of interest whose RNA-seq expression increases (top) or decreases (bottom) during differentiation from hESC to mesoderm in 4,754 single cells classified as hESC or different stages of differentiated mesoderm. Heat maps are sorted left to right by increasing pseudotime in the mesendoderm/mesoderm trajectory. The balance of enrichment between H3K4me1/H3K36me3 and H3K27me3 in each cell is denoted by color, and the total normalized counts in each cell are denoted by the transparency shading. f, hESCs plotted to the WNN UMAP projection and colored by predicted H3K27me3 percent as a proportion of total unique reads (Methods). hESCs adjacent to the ectoderm trajectory or the mesendoderm trajectory are denoted by arrows. 
g, Heat maps denoting H3K27me3 enrichment in ‘high-H3K27me3’ and ‘low-H3K27me3’ hESCs (left); log fold change (LFC) in enrichment (center); and −log10(P value) of differential enrichment (right) for select genes colored by their function in hESCs (black), mesendoderm (gray), endoderm (red), mesoderm (green) or ectoderm (blue).

To determine how continuous transitions in chromatin enrichment across differentiation correlate with changes in developmental gene expression, we quantified changes in H3K27me3, H3K4me1 and H3K36me3 enrichment across pseudotime in transcription factors (TFs) with the highest reported fold change enrichment in RNA-seq44 between a terminal cell type (endoderm, mesoderm or ectoderm) and hESCs. Notably, there were trajectory-specific differences in enrichment changes: for TFs whose expression declines during differentiation as measured by RNA-seq, we observed a decline in H3K36me3 enrichment across pseudotime accompanied by relatively low and stable levels of H3K4me1 and H3K27me3 in the mesoderm and endoderm trajectories, whereas the ectoderm trajectory was characterized only by a decline in H3K4me1 enrichment (Extended Data Fig. 9a). For TFs whose expression increases, H3K27me3 is lost gradually in a pseudotime-dependent manner in endoderm and mesoderm trajectories, whereas, in the ectoderm trajectory, H3K27me3 is low at the onset of differentiation, and H3K36me3 enrichment increases across pseudotime (Extended Data Fig. 9b). These phenomena were particularly pronounced for core regulators of cell identity, including LEF1 in mesoderm and SOX17 and FOXA2 in endoderm, whereas ectoderm regulators, such as OTX2, were largely devoid of H3K27me3 early in the ectoderm trajectory (Fig. 4e and Extended Data Fig. 9c,d), indicating that different trajectories manifest distinct temporal chromatin trends at genes important for differentiation.

The unique enrichment profile of the ectoderm trajectory led us to wonder whether changes in global histone modification enrichment may be similarly distinct. As with our experiments in H1 and K562 cells, we calculated the percentage of unique reads assigned to each of the three targets in single cells and analyzed how target balance changed across trajectories. We found that the ectoderm trajectory exhibited a rapid, pseudotime-dependent reduction in H3K27me3 as a percentage of all targets (Extended Data Fig. 10a), resulting in terminal ectoderm exhibiting significantly lower H3K27me3 percentage than other cell types (Fig. 4d and Extended Data Fig. 10b). Notably, hESCs predicted to participate in the ectoderm trajectory also had a lower percentage of H3K27me3 than those participating in the mesendoderm trajectory (P < 1 × 10⁻⁵, Wilcoxon rank-sum test) (Fig. 4f). To ascertain whether H3K27me3 level was correlated with developmental gene regulation, we partitioned hESCs into ‘low’ and ‘high’ H3K27me3 groupings, calculated normalized differences in gene-specific enrichment and examined a panel of known regulators of germ cell differentiation (Fig. 4g and Extended Data Fig. 10c). Curiously, whereas most genes exhibited a negligible or modest decline in enrichment despite different global H3K27me3 levels, including constitutively silenced genes such as HOXB3, TFs specifically active in the first phase of germ layer specification after pluripotency exit, including TBXT (T) and OTX2, were strongly de-repressed in the ‘low’ population of cells (Fig. 4f and Extended Data Fig. 10d), suggesting that low H3K27me3 in hESCs is accompanied by a uniquely configured developmental state. 
TFs de-repressed in the ‘low’ population were enriched for Gene Ontology terms related to organ/anatomical development and pattern specification but not for terms related to neurogenesis, suggesting that such cells were generally primed for differentiation rather than representing spuriously differentiated ectoderm (Extended Data Fig. 10e). Finally, we quantified intragenic ‘bivalent’ H3K27me3–H3K4me1 co-occurrence across cell types and found that ectoderm bivalency is significantly lower than hESCs, endoderm or mesoderm, consistent with the original observation that bivalency is absent in neuronally derived lineages27 (Extended Data Fig. 10f). Bivalency was equivalent in H3K27me3-low and H3K27me3-high hESC populations, however, indicating that pluripotency-specific chromatin characteristics are maintained in H3K27me3-low hESCs despite their distinct chromatin environment (Extended Data Fig. 10f). Taken together, these results show that global changes in chromatin modification enrichment and co-enrichment that can be detected before differentiation are associated with specific developmental endpoints.
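The low/high comparison described above can be sketched under explicit assumptions. How the hESCs were partitioned and how enrichment was normalized are specified in the Methods, not here; the median split, the pseudocount and all names below are hypothetical choices made only to show the shape of the calculation.

```python
from math import log2
from statistics import median


def split_by_fraction(h3k27me3_fraction):
    """Split cells at the median H3K27me3 fraction (assumed threshold).

    h3k27me3_fraction: dict cell_id -> H3K27me3 share of unique reads.
    Returns (low_ids, high_ids).
    """
    cut = median(h3k27me3_fraction.values())
    low = [c for c, f in h3k27me3_fraction.items() if f < cut]
    high = [c for c, f in h3k27me3_fraction.items() if f >= cut]
    return low, high


def log2_fold_change(low_values, high_values, pseudocount=1.0):
    """log2 ratio of mean gene enrichment, low group over high group."""
    mean_low = sum(low_values) / len(low_values)
    mean_high = sum(high_values) / len(high_values)
    return log2((mean_low + pseudocount) / (mean_high + pseudocount))


# Toy cells and toy normalized H3K27me3 counts for one gene (hypothetical).
toy_fractions = {"cell_a": 0.5, "cell_b": 0.6, "cell_c": 0.8, "cell_d": 0.9}
toy_gene_counts = {"cell_a": 1.0, "cell_b": 1.0, "cell_c": 3.0, "cell_d": 3.0}
low_ids, high_ids = split_by_fraction(toy_fractions)
lfc = log2_fold_change(
    [toy_gene_counts[c] for c in low_ids],
    [toy_gene_counts[c] for c in high_ids],
)
```

A negative value indicates less H3K27me3 at the gene in the ‘low’ group than in the ‘high’ group, which is the direction described for de-repressed TFs.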