Chromatin profiling provides a versatile means to investigate functional genomic elements and their regulation. However, current methods yield ensemble profiles that are insensitive to cell-to-cell variation. Here we combine microfluidics, DNA barcoding and sequencing to collect chromatin data at single-cell resolution. We demonstrate the utility of the technology by assaying thousands of individual cells and using the data to deconvolute a mixture of ES cells, fibroblasts and hematopoietic progenitors into high-quality chromatin state maps for each cell type. The data from each single cell are sparse, comprising on the order of 1,000 unique reads. However, by assaying thousands of ES cells, we identify a spectrum of subpopulations defined by differences in chromatin signatures of pluripotency and differentiation priming. We corroborate these findings by comparison to orthogonal single-cell gene expression data. Our method for single-cell analysis reveals aspects of epigenetic heterogeneity not captured by transcriptional analysis alone.
At a glance
Gene Expression Omnibus
- Mapping human epigenomes. Cell 155, 39–55 (2013). &
- A decade of exploring the cancer epigenome–biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011). &
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
- Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011). et al.
- Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013). et al.
- Single-cell genomics. Nat. Methods 8, 311–314 (2011). &
- Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012). , &
- Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013). et al.
- Linking stochastic fluctuations in chromatin structure and gene expression. PLoS Biol. 11, e1001621 (2013). , , , &
- Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015). et al.
- Single-molecule analysis of combinatorial epigenomic states in normal and tumor cells. Proc. Natl. Acad. Sci. USA 110, 7772–7777 (2013). et al.
- Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014). et al.
- Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014). et al.
- Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 148, 886–895 (2012). et al.
- Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014). et al.
- The present and future role of microfluidics in biomedical research. Nature 507, 181–189 (2014). , &
- Droplet microfluidics for high-throughput biological assays. Lab Chip 12, 2146–2155 (2012). , , &
- Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015). et al.
- Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015). et al.
- High-throughput single-cell labeling (Hi-SCL) for RNA-Seq using drop-based microfluidics. PLoS ONE 10, e0116328 (2015). et al.
- Genome-wide chromatin maps derived from limited numbers of hematopoietic progenitors. Nat. Methods 7, 615–618 (2010). , &
- Automated microfluidic chromatin immunoprecipitation from 2,000 cells. Lab Chip 9, 1365–1370 (2009). et al.
- Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943–949 (2014). et al.
- Epigenetic characterization of the early embryo with a chromatin immunoprecipitation protocol applicable to small cell populations. Nat. Genet. 38, 835–841 (2006). , &
- Regulatory principles of pluripotency: from the ground state up. Cell Stem Cell 15, 416–430 (2014). &
- Single-cell gene expression profiles define self-renewing, pluripotent, and lineage primed states of human pluripotent stem cells. Stem Cell Rep. 2, 881–895 (2014). et al.
- Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014). et al.
- Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014). et al.
- Nanog safeguards pluripotency and mediates germline development. Nature 450, 1230–1234 (2007). et al.
- An embryonic stem cell–like gene expression signature in poorly differentiated aggressive human tumors. Nat. Genet. 40, 499–507 (2008). et al.
- Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). et al.
- Chromatin in pluripotent embryonic stem cells and differentiation. Nat. Rev. Mol. Cell Biol. 7, 540–546 (2006). &
- Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008). et al.
- Foxa2 and H2A.Z mediate nucleosome depletion during embryonic stem cell differentiation. Cell 151, 1608–1616 (2012). et al.
- Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 8, 532–538 (2006). et al.
- A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006). et al.
- Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell 152, 642–654 (2013). et al.
- Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics. Cell Rep. 10, 1386–1397 (2015). et al.
- Naive and primed pluripotent states. Cell Stem Cell 4, 487–492 (2009). &
- Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008). et al.
- Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature 516, 56–61 (2014). et al.
- Single-cell analysis and sorting using droplet-based microfluidics. Nat. Protoc. 8, 870–891 (2013). et al.
- Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). &
- Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010). et al.
- Modern Applied Statistics with S 4th edn. (Springer, New York, 2002).
- Supplementary Figure 2: Barcode design and library preparation. (215 KB)
A) Two orientation of the same barcode adapter. The different regions of the barcode are color coded. B) Four possible configurations of the same barcode adapter ligating on both ends of a genomic read. The sequences depict the end result of a successful library preparation. Shaded regions depict the part of the fragments that are sequenced for each of the paired end reads. C) Microfluidics 96 “parallel drop-maker” device is shown. Each drop-maker encapsulates barcodes from a well in one quadrant of a 384-well plate. D) Micrograph shows drops from a barcode library emulsion stored at 4°C for 6 months. Inset compares diameter distribution of stored drops against the same batch immediately after library generation.
- Supplementary Figure 3: Comparison between H3K4me2 Bulk ChIP-seq and Drop-ChIP-seq. (90 KB)
A) The fractions of reads that are mapped onto gene promoters and enhancer regions are comparable between Drop-ChIP and Bulk ChIP-seq. Enhancers regions were identified as peaks of H3K27ac with no overlap with promoter regions. B) Average GC content of Drop-ChIP reads is 46%, identical to that in bulk ChIP-seq. C) The distance of distal marks to the nearest promoter region is comparable between Drop-ChIP (65.5kb) and ChIP-seq reads (68.0kb). D) Venn diagrams show the number of promoter and enhancer regions that are most highly represented in Drop-ChIP measurements and in ChIP-seq reads, as well as the overlap between the two measurements. E) Genome-wide correlation between bulk ChIP-seq and Drop-ChIP data, measured in fragments per million reads (fpm) in non-overlapping 5kb windows. Only chromosome 19 was used for the plot. Correlation score (r=0.83) computed genome wide.
- Supplementary Figure 4: Aggregation of Drop-ChIP data from 200 single cells yields a high quality profile that recapitulates bulk ChIP-seq. (92 KB)
A) Chromatin maps generated by aggregating Drop-ChIP data from 200 single cells (‘200’) closely match conventional ChIP-seq data (‘Bulk’). Tracks depict H3K4me3 or H3K4me2 in ES cells, MEFs or EML cells for intervals on Chr6: 125,140,000-125180000 and Chr17: 35,486,000-35,526,000. Scatter plots (right) compare aggregate profile signals against conventional ChIP-seq signals in non-overlapping 5kb windows in chromosome 19 (genome-wide correlations between respective datasets are indicated, top right plot is identical to Supplementary Figure 3E). B) Receiver operating characteristic (ROC) curve plots the fraction of true peaks vs. the fraction of false peaks called from H3K4me3 chromatin profiles derived by aggregating the indicated numbers of ES cell profiles. Area under the curve (AUC) for aggregates of different sizes is plotted in the inset. Excellent recovery of bulk peaks (AUC>0.9) is attained for 500 single-cells or more.
- Supplementary Figure 5: Data from single cells are essential for detecting subpopulations. (91 KB)
A) The correlation of H3K4me3 Drop-ChIP profiles for a mixture of 200 ES cells and 200 MEFs with H3K4me3 bulk profiles of ES cells (y-axis) is plotted vs their correlation with H3K4me3 MEF bulk profile (x-axis) (left). Two groups of cells, one correlating stronger with ES profile and another resembling MEF profile, are clearly observed (and enables correct assignment of cell type with >95% accuracy; Fig. 4C). Similar clusters could also be derived by unsupervised divisive hierarchical clustering (middle). Aggregate profiles derived for these clusters closely match ES cells and MEFs, respectively (right; shown for region on Chromosome 17). B) To demonstrate the importance of single-cell data, we combined reads from randomly selected sets of 5 cells and repeated both analyses. Neither supervised (left) nor unsupervised (middle) procedures were able to distinguish the two cell states within the mixed population when starting from these compromised data. Consequently, the chromatin profiles could no longer be deconvoluted (right). C) For comparison, bulk H3K4me3 profiles for ES cells and MEFs are shown over the same locus.
- Supplementary Figure 6: Single-cell chromatin profiles de-convolve mixed cell populations. (81 KB)
A) A mixture of ES cells and MEFs was applied to the microfluidics device and subjected to Drop-ChIP for H3K4me3. Divisive hierarchical clustering was then applied to the single-cell profiles as described in the methods. The resulting hierarchal tree reveals two major clusters, whose aggregate profiles closely match bulk H3K4me3 profiles for ES cells and MEFs, respectively (shown for region on Chromosome 17). B) Re-clustering of these cells together with single-cell H3K4me3 profiles from pre-labeled ES cells and MEFs confirms that the original completely unbiased clustering distinguished ES cells from MEFs with >95% specificity and sensitivity.
- Supplementary Figure 7: Clustering dendrogram of chromatin signatures. (464 KB)
Heat map shows pairwise correlations between 91 different chromatin signatures derived from 314 ChIP-seq datasets for histone marks, chromatin regulators and transcription factors assayed in different cell types (see Online Methods). The 91 signatures were clustered based on Pearson correlations derived from the degree of overlap between the enriched intervals that make up each signature. Labels (right) refer to signatures that form the major clusters. See also Supplementary Table 4.
- Supplementary Figure 8: Sensitivity of K4me2 clusters to technical aspects of the analysis. (112 KB)
A) The distribution of single cell coverage in each of the 4 clusters found and described in Figure 6A. B) The fraction of cells that were assigned to their original cluster when re-clustering a randomly chosen subset of 50% of cells (averaged over 100 random subsets). See also Figure 5C. C) The correlations between signatures are not driven by the overlap between them. The correlation between chromatin signatures based on co-variation across cells in the population is plotted vs. their overlap-based correlations, when using the real single-cell data (top) or shuffled data (bottom). Red dots indicate correlations between members of the chosen set of 91 signatures. See also the quality control for clustering analysis. The plots show that correlations in the shuffled data are equal to (and determined by) the overlaps between the signatures. In contrast, the correlations between the signatures in the real data are higher than that due to actual co-variation of signal in distinct genomic regions across single cells. D) The clustering of single cells is largely the same whether one uses the representative signatures, the entire set, or a completely different set of signatures. Top (replica of Fig. 5B): Multidimensional scaling (MDS) plot comparing the chromatin landscapes of single ES cells and MEFs (colored circles). The distance between any two cells is proportional to the distance between their 91-dimensional signature vectors. The plot shows 1,000 single cells (randomly sampled from the 5,405 cells with H3K4me2 data), colored by their cluster association. Middle: the same single-cell profiles re-clustered using the complete set of 314 chromatin signatures (see Online Methods). Bottom: the same single-cell profiles re-clustered using a different set of chromatin signatures (datasets taken from Meshorer, see Supplementary Note 1). E) The distributions of single-cell scores for G1/S related signature, G2/M related signature and OCT4 signature are plotted for ES1, ES2 and ES3. See Supplementary Note 1 for more information on the cell cycle related signatures.
- Supplementary Figure 9: Detection limit of rare subpopulations. (114 KB)
In-silico subsampling of data to simulate lowering the size and frequency of ES and MEF subpopulations derived from H3K4me2 data. The fraction of cells from the subsampled clusters that were correctly classified (True Positives) is plotted for ES1 (A) and for MEF (B) as a function of relative cluster size (Frequency) and of the total number of cells in the in-silico sample. C) True positive rate (color coded) averaged over all ES clusters is plotted as a function of the frequency and size of the targeted subpopulation. The contour line of True Positives = 75% is fitted by a model (white line) that predicts that to maintain a given purity of clustering the size of the subpopulation should grow inversely proportional with the frequency of the subpopulation. D) The model used in C predicts that the number of cells required for accurately detecting a subpopulation grows quadratically for rare subpopulations. This prediction was numerically validated by subsampling our data for populations of up to 5000 cells and for subpopulations as rare as 3% (dashed lines).
- Supplementary Figure 10: Chromatin signatures that separate K4me2 clusters also separate between RNA expression clusters. (66 KB)
A) The distribution of RNA expression scores for 6 signatures that emerged as differential across the Drop-ChIP ES clusters, is plotted for single cells from two clusters that were derived based on these signatures. B) The mean (normalized) expression of genes in the two clusters from A is plotted for 3 sets of genes: those that have more H3K4me2 in ES1 (446 genes), those that are more enriched in ES3 (47 genes) and all other genes. For more details, see Supplementary Note 2.
- Supplementary Text and Figures (2,355 KB)
Supplementary Figures 1–10, Supplementary Tables 1 and 3, and Supplementary Notes 1–3
- Supplementary Table 2 (38 KB)
- Supplementary Table 4 (16 KB)
Signatures data set sources
- Supplementary Design Files (290 KB)
Design of microfluidic devices. This compressed folder includes 3 ACAD designs, one for the 96 parallel drop makers, one for the co-flow drop maker and one for the 3 point merger.