Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state

Journal name: Nature Biotechnology
Volume: 33
Pages: 1165–1172
DOI: 10.1038/nbt.3383

Abstract

Chromatin profiling provides a versatile means to investigate functional genomic elements and their regulation. However, current methods yield ensemble profiles that are insensitive to cell-to-cell variation. Here we combine microfluidics, DNA barcoding and sequencing to collect chromatin data at single-cell resolution. We demonstrate the utility of the technology by assaying thousands of individual cells and using the data to deconvolute a mixture of ES cells, fibroblasts and hematopoietic progenitors into high-quality chromatin state maps for each cell type. The data from each single cell are sparse, comprising on the order of 1,000 unique reads. However, by assaying thousands of ES cells, we identify a spectrum of subpopulations defined by differences in chromatin signatures of pluripotency and differentiation priming. We corroborate these findings by comparison to orthogonal single-cell gene expression data. Our method for single-cell analysis reveals aspects of epigenetic heterogeneity not captured by transcriptional analysis alone.

At a glance

Figures

  1. Figure 1: Overview of Drop-ChIP procedure for acquiring single-cell chromatin data.

    (a) Microfluidics workflow. A library of drops containing DNA barcodes is prepared by emulsifying DNA suspensions from plates (top left). Cells are encapsulated and lysed in drops and then their chromatin is fragmented (bottom left). Chromatin-bearing drops and barcode drops are merged in a microfluidic device, and DNA barcodes are ligated to the chromatin fragments, thus indexing them to their originating cell. (b) The combined contents of many drops are immunoprecipitated in the presence of 'carrier' chromatin and the enriched DNA is sequenced. (c) Sequencing reads are partitioned by their barcode sequences to yield single-cell chromatin profiles (left). An unsupervised algorithm identifies groups of related single-cell profiles, which are then aggregated to produce high-quality chromatin profiles for subpopulations (right). See also Supplementary Figure 1.

  2. Figure 2: Labeling single-cell chromatin by drop-based microfluidics.

    (a) Micrograph shows an aqueous suspension of cells ('S') co-flowed together with lysis buffer and MNase ('B') as they enter the drop maker junction and disperse in oil ('O'), resulting in the formation of cell-bearing drops (see also Supplementary Video 1). (b) Micrograph shows cell-bearing drops (~50-μm diameter) and barcode-bearing drops (~30-μm diameter) paired in a microfluidics “three-point merger” device. As adjacent drops flow by the electrodes (+ and −), an induced electric field triggers their coalescence; simultaneously, labeling buffer (B) containing ligase is injected into the merged drops (Supplementary Video 2). (c) Table depicts estimated frequencies of possible drop fusion outcomes. The number of cells in each drop was measured from Supplementary Video 1 (see panel a). Drops containing cells or cell debris may fuse with one (90%) or two (10%) barcode drops (green frame). Two-barcode fusion events can be detected and corrected in silico. Background reads contributed by drops that contain only cell debris are also filtered in silico. (d) The frequency distribution of barcodes is plotted as a function of the number of reads contributed by each barcode and fitted to a sum of two Poisson distributions, one for the background reads (blue) and one for the single-cell reads (green; see Online Methods). Barcodes in the highlighted range are assumed to originate from single cells and are retained for further analysis. Scale bars, 100 μm.
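
The barcode selection described in panel d can be illustrated with a small expectation-maximization fit. This is a minimal sketch, not the authors' implementation (which is described in the Online Methods). It assumes the per-barcode read counts are available as a mapping from barcode to count; the function names `fit_two_poisson` and `select_single_cells`, the quartile-based starting values and the 0.5 posterior cutoff are all illustrative assumptions.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_two_poisson(counts, n_iter=100):
    """Fit w*Pois(lam_bg) + (1-w)*Pois(lam_cell) by EM (hypothetical helper).

    Returns (w, lam_bg, lam_cell, resp_bg), where resp_bg[i] is the posterior
    probability that counts[i] comes from the background component.
    """
    ordered = sorted(counts)
    # Crude starting values (assumption): lower and upper quartiles.
    lam_bg = max(ordered[len(ordered) // 4], 0.5)
    lam_cell = max(ordered[(3 * len(ordered)) // 4], lam_bg + 1.0)
    w = 0.5
    resp_bg = [0.5] * len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability of the background component.
        resp_bg = []
        for k in counts:
            log_bg = math.log(w) + poisson_logpmf(k, lam_bg)
            log_cell = math.log(1.0 - w) + poisson_logpmf(k, lam_cell)
            top = max(log_bg, log_cell)
            p_bg = math.exp(log_bg - top)
            p_cell = math.exp(log_cell - top)
            resp_bg.append(p_bg / (p_bg + p_cell))
        # M-step: update the mixing weight and the two Poisson means.
        n_bg = sum(resp_bg)
        n_cell = len(counts) - n_bg
        w = min(max(n_bg / len(counts), 1e-9), 1.0 - 1e-9)
        lam_bg = max(sum(r * k for r, k in zip(resp_bg, counts)) / max(n_bg, 1e-9), 1e-9)
        lam_cell = max(sum((1.0 - r) * k for r, k in zip(resp_bg, counts)) / max(n_cell, 1e-9), 1e-9)
    return w, lam_bg, lam_cell, resp_bg


def select_single_cells(reads_per_barcode, cutoff=0.5):
    """Keep barcodes whose posterior for the single-cell component exceeds cutoff."""
    barcodes = list(reads_per_barcode)
    counts = [reads_per_barcode[b] for b in barcodes]
    _, _, _, resp_bg = fit_two_poisson(counts)
    return [b for b, r in zip(barcodes, resp_bg) if 1.0 - r > cutoff]
```

The sketch only separates background from single-cell barcodes; the caption's "highlighted range" may also impose an upper bound on read counts, which is not modeled here.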

  3. Figure 3: Symmetric barcoding and amplification of chromatin fragments.

    (a) Barcode adapters (top) are 64-bp double-stranded oligonucleotides with universal primers, barcode sequences and restriction sites, whose symmetric design allows ligation on either side. Schematic (bottom left) depicts possible outcomes of ligation in drops, including symmetrically labeled nucleosomes, asymmetrically labeled nucleosomes and adapter concatemers. Concatemers are removed by digestion of PacI sites formed by adapter juxtaposition (bottom center), allowing selective PCR amplification of symmetrically adapted chromatin fragments (bottom right). See also Supplementary Figure 2. (b) Gel electrophoresis of DNA products at successive assay stages. Left lane, DNA ladder; MNase, DNA fragments purified after capture, lysis and MNase digestion of single cells in drops confirm efficient digestion to mononucleosomes (~1 million drops collected); Concat, Illumina library prepared from adapter-ligated chromatin fragments without PacI digestion reveals overwhelming concatemer bias; Library, Illumina library prepared from adapter-ligated chromatin fragments digested with PacI, reveals appropriate MNase digestion pattern, shifted by the size of barcode and Illumina adapters. (c) Pie charts depict numbers of uniquely aligned sequencing reads that satisfy successive filtering criteria (values reflect data from 100 single cells, averaged over 82 trials). We select reads that have barcode sequences on both ends (top) with matching sequence (middle). We then apply a Poisson model to identify barcodes that represent single cells (bottom). (d) Heat map depicts homogeneity of barcode selection. Barcodes (rows) are colored according to their relative prevalence (rank order) across 37 experiments (columns). The absence of bias toward particular barcodes (light or dark horizontal stripes) indicates the homogeneity of the barcode library. The mean normalized rank over all barcodes (right) is close to 0.5, consistent with balanced representation. (e) Stability of the barcode library emulsion over time. The fraction of reads with matching barcodes on both ends is plotted as a function of time from encapsulation of the barcode library. (f) The microfluidics system was applied to barcode a mixed suspension of human and mouse cells. For each barcode, plot depicts the number of reads aligning to the mouse genome (y axis) versus the number of reads aligning to the human genome (x axis). The data suggest that the vast majority of barcodes are each unique to a single cell.
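
The first two filtering steps in panel c and the species-mixing check in panel f can be sketched in a few lines. This is an illustration only: it assumes each sequenced fragment has already been reduced to a tuple of (left barcode, right barcode, genome), with `None` marking an end on which no barcode was recognized. Exact-match comparison of the two barcodes and the names `partition_reads` and `species_purity` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def partition_reads(fragments):
    """Group fragments by barcode, keeping only symmetrically labeled ones.

    Each fragment is (left_barcode, right_barcode, genome); a barcode of None
    means no barcode was recognized on that end.
    """
    per_barcode = defaultdict(Counter)
    for left, right, genome in fragments:
        if left is None or right is None:
            continue  # barcode missing on at least one end
        if left != right:
            continue  # barcodes on the two ends do not match
        per_barcode[left][genome] += 1
    return {barcode: dict(c) for barcode, c in per_barcode.items()}


def species_purity(genome_counts):
    """Fraction of a barcode's reads that align to its dominant genome."""
    total = sum(genome_counts.values())
    return max(genome_counts.values()) / total if total else 0.0
```

A barcode whose purity is close to 1 behaves like a single-species (and plausibly single-cell) barcode, whereas intermediate values would indicate a barcode shared between a human and a mouse cell.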

  4. Figure 4: Single-cell H3K4me3 chromatin data inform about subpopulations of known cell types.

    (a) Drop-ChIP data are shown for 50 ES cells (ESCs) and 50 MEFs across representative gene loci. Each row represents data from a single cell. Each column includes reads in 330-kb regions centered on selected genes (Anxa1, chr19: 20465000; M6pr, chr6: 122269000; Egr2, chr10: 67022000; Ring1b, chr17: 34262000; Cyb5d1, chr11: 69207000; Ctbp2, chr7: 140254000; Pou5f1, chr17: 35612000; Sox2, chr3: 34573000). A high proportion of reads aligns to genomic positions enriched in both bulk ChIP-seq assays ('Bulk') and aggregated chromatin profiles from 200 single-cell assays ('200'), providing evidence that single-cell data are informative. (b) The precision (fraction of single-cell reads overlapping known H3K4me3 peaks) and sensitivity (fraction of known H3K4me3 peaks occupied by single-cell reads) are plotted for the top 50 ES cells by sensitivity and for all ES cells in the data set. These data are compared to random profiles simulated by arbitrarily positioning reads. Middle bar marks the median, box covers the 25th–75th percentiles and whiskers cover the 1st–99th percentiles. The average ES cell H3K4me3 profile has a precision of 53% ± 12% and a sensitivity of 7% ± 4%, whereas the average ES cell H3K4me2 profile has a precision of 42% ± 5% and a sensitivity of 3% ± 2% (not shown). (c) For 400 single-cell H3K4me3 profiles, scatterplot depicts normalized detection of ES cell–specific intervals versus MEF-specific intervals. In this experiment, ES cells (red) and MEFs (green) were separately barcoded in the microfluidics device, but collectively immunoprecipitated and processed. A naive classification (black line) distinguishes ES cell profiles from MEF profiles with >95% specificity and sensitivity. (d) ES cells, MEFs and EML cells were separately barcoded but collectively processed to acquire 883 single-cell profiles (314 ES cells, 376 MEFs, 193 EMLs). These profiles were clustered using an unsupervised divisive hierarchical clustering algorithm (see Online Methods). The hierarchical tree discriminates between cell types with >95% accuracy, indicating that the information content of single-cell profiles is sufficient to accurately group related cells and thereby distinguish cell states within a mixed population. See also Supplementary Figures 3–6 and Online Methods.
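
The two quality measures defined in panel b can be computed directly from a list of reads and a list of reference peaks. The following is a minimal sketch of those definitions, not the authors' pipeline: it assumes reads are given as (chromosome, position) pairs and peaks as half-open (chromosome, start, end) intervals, and it uses a simple linear scan rather than an interval index. The function name `precision_sensitivity` is hypothetical.

```python
def precision_sensitivity(reads, peaks):
    """Return (precision, sensitivity) of one single-cell profile.

    precision   = fraction of reads that fall inside any known peak
    sensitivity = fraction of known peaks that contain at least one read
    """
    reads_in_peaks = 0
    occupied = set()
    for chrom, pos in reads:
        hit = False
        for i, (peak_chrom, start, end) in enumerate(peaks):
            if chrom == peak_chrom and start <= pos < end:
                occupied.add(i)
                hit = True
        if hit:
            reads_in_peaks += 1
    precision = reads_in_peaks / len(reads) if reads else 0.0
    sensitivity = len(occupied) / len(peaks) if peaks else 0.0
    return precision, sensitivity
```

With on the order of 1,000 reads per cell and many thousands of reference peaks, sensitivity is bounded well below 1 even for a perfectly precise profile, which is consistent with the low sensitivities quoted in the caption.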

  5. Figure 5: A spectrum of ES cell subpopulations with variable chromatin signatures for pluripotency and priming.

    (a) Single-cell H3K4me2 data for 4,643 ES cells and 762 MEFs were subjected to agglomerative hierarchical clustering based on their scores in 91 signature sets of genomic regions (see Online Methods). Pie chart at left depicts the proportions of individual ES cells that cluster into each of three clusters (1,436 cells in ES1, 1,550 cells in ES2 and 1,648 cells in ES3), and pie chart at right depicts the relative numbers of ES cells and MEFs that cluster into a fourth group, which corresponds to MEFs. Heat map (below) depicts the mean signature scores (rows) for each cluster (columns). (b) Multidimensional scaling (MDS) plot compares the chromatin landscapes of single ES cells and MEFs (colored dots). The distance between any two dots (cells) approximates the distance between their 91-dimensional signature vectors. The plot shows 1,000 single cells (randomly sampled from the 5,405 cells with H3K4me2 data), colored on the basis of their cluster association. Tight co-localization of the MEF cluster and, to a lesser degree, the ES1 cluster suggests that the corresponding landscapes are relatively more homogeneous. In contrast, the ES2 and ES3 clusters are more broadly distributed and may reflect a gradient of single-cell states. (c) MDS plot as in b, but with cells that frequently switched clusters in bootstrapping tests on varying subsets of cells indicated in black (see Online Methods). These unstable cells are exclusively located on the borders between clusters. (d) Violin plots show the distribution of peak widths for peaks called from aggregate ES1, ES2 or ES3 profiles (see Online Methods). (e) Venn diagram depicts the relative numbers and overlaps of peaks called from aggregate ES1, ES2 or ES3 profiles. The ES1 cluster is notable for higher pluripotency-signature scores, larger numbers of peaks and tighter internal concordance. In contrast, the ES3 cluster has higher activity over Polycomb signatures and increased heterogeneity, potentially reflecting a mixture of primed states. See also Supplementary Figures 7 and 8, Supplementary Note 1 and the online source data for this figure.

  6. Figure 6: Orthogonal single-cell assays corroborate ES cell subpopulations and cell-to-cell variability in regulatory programs.

    (a) The distribution of single-cell scores for eight dominant signatures is plotted for ES1, ES2 and ES3. Vertical lines depict the mean score of each signature in MEFs. The DNAme signature consists of 10,000 regions identified by Kelsey et al. (ref. 28) as most variable in their methylation status in ES cells. (b) Heat map depicts positive and negative correlations between six selected signatures, based on co-variation of H3K4me2 across single ES cells. (c) Heat map depicts positive and negative correlations between six selected signatures, based on co-variation of expression across single ES cells (see Supplementary Note 2). (d) Scatterplot depicts correlations between the indicated signature pairs across single ES cells, as determined from H3K4me2 or RNA expression data. Best-fit line and Pearson correlation are also indicated. Thus, orthogonal single-cell techniques lead to similar conclusions regarding ES cell subpopulations and underlying patterns of variability in pluripotency and Polycomb signatures, suggestive of a continuum from pluripotent to primed states. See also Supplementary Figure 10.

  7. Supplementary Fig. 1: Flow chart of the microfluidic protocol.
  8. Supplementary Fig. 2: Barcode design and library preparation.

    A) Two orientations of the same barcode adapter. The different regions of the barcode are color coded. B) Four possible configurations of the same barcode adapter ligating on both ends of a genomic read. The sequences depict the end result of a successful library preparation. Shaded regions depict the part of the fragments that are sequenced for each of the paired-end reads. C) Microfluidics 96 “parallel drop-maker” device is shown. Each drop-maker encapsulates barcodes from a well in one quadrant of a 384-well plate. D) Micrograph shows drops from a barcode library emulsion stored at 4°C for 6 months. Inset compares diameter distribution of stored drops against the same batch immediately after library generation.

  9. Supplementary Fig. 3: Comparison between H3K4me2 Bulk ChIP-seq and Drop-ChIP-seq.

    A) The fractions of reads that are mapped onto gene promoters and enhancer regions are comparable between Drop-ChIP and Bulk ChIP-seq. Enhancer regions were identified as peaks of H3K27ac with no overlap with promoter regions. B) Average GC content of Drop-ChIP reads is 46%, identical to that in bulk ChIP-seq. C) The distance of distal marks to the nearest promoter region is comparable between Drop-ChIP (65.5 kb) and ChIP-seq reads (68.0 kb). D) Venn diagrams show the number of promoter and enhancer regions that are most highly represented in Drop-ChIP measurements and in ChIP-seq reads, as well as the overlap between the two measurements. E) Genome-wide correlation between bulk ChIP-seq and Drop-ChIP data, measured in fragments per million reads (fpm) in non-overlapping 5-kb windows. Only chromosome 19 was used for the plot. The correlation score (r = 0.83) was computed genome-wide.
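
The comparison in panel E amounts to binning fragments into fixed windows, normalizing to fragments per million, and correlating the two tracks. The sketch below illustrates that calculation under stated assumptions: fragment positions are integer coordinates on a single chromosome, the window size is 5 kb as in the caption, and the helper names `fpm_per_window` and `pearson` are hypothetical rather than taken from the original analysis.

```python
import math
from collections import Counter

WINDOW = 5000  # non-overlapping 5-kb windows, as in the caption


def fpm_per_window(positions, n_windows, window=WINDOW):
    """Fragments per million in each of the first n_windows windows."""
    total = len(positions)
    if total == 0:
        return [0.0] * n_windows
    counts = Counter(pos // window for pos in positions)
    return [counts.get(i, 0) * 1e6 / total for i in range(n_windows)]


def pearson(x, y):
    """Pearson correlation of two equal-length tracks (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Applying `fpm_per_window` to the bulk and Drop-ChIP fragment lists over the same windows and passing the two resulting tracks to `pearson` gives a correlation of the kind reported in the caption.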

  10. Supplementary Fig. 4: Aggregation of Drop-ChIP data from 200 single cells yields a high-quality profile that recapitulates bulk ChIP-seq.

    A) Chromatin maps generated by aggregating Drop-ChIP data from 200 single cells (‘200’) closely match conventional ChIP-seq data (‘Bulk’). Tracks depict H3K4me3 or H3K4me2 in ES cells, MEFs or EML cells for intervals on Chr6: 125,140,000-125,180,000 and Chr17: 35,486,000-35,526,000. Scatter plots (right) compare aggregate profile signals against conventional ChIP-seq signals in non-overlapping 5-kb windows in chromosome 19 (genome-wide correlations between respective datasets are indicated; the top right plot is identical to Supplementary Figure 3E). B) Receiver operating characteristic (ROC) curve plots the fraction of true peaks vs. the fraction of false peaks called from H3K4me3 chromatin profiles derived by aggregating the indicated numbers of ES cell profiles. Area under the curve (AUC) for aggregates of different sizes is plotted in the inset. Excellent recovery of bulk peaks (AUC > 0.9) is attained for 500 or more single cells.

  11. Supplementary Fig. 5: Data from single cells are essential for detecting subpopulations.

    A) The correlation of H3K4me3 Drop-ChIP profiles for a mixture of 200 ES cells and 200 MEFs with the H3K4me3 bulk profile of ES cells (y-axis) is plotted vs. their correlation with the H3K4me3 MEF bulk profile (x-axis) (left). Two groups of cells, one correlating more strongly with the ES cell profile and another resembling the MEF profile, are clearly observed, enabling correct assignment of cell type with >95% accuracy (Fig. 4C). Similar clusters could also be derived by unsupervised divisive hierarchical clustering (middle). Aggregate profiles derived for these clusters closely match ES cells and MEFs, respectively (right; shown for region on Chromosome 17). B) To demonstrate the importance of single-cell data, we combined reads from randomly selected sets of 5 cells and repeated both analyses. Neither supervised (left) nor unsupervised (middle) procedures were able to distinguish the two cell states within the mixed population when starting from these compromised data. Consequently, the chromatin profiles could no longer be deconvoluted (right). C) For comparison, bulk H3K4me3 profiles for ES cells and MEFs are shown over the same locus.

  12. Supplementary Fig. 6: Single-cell chromatin profiles de-convolve mixed cell populations.

    A) A mixture of ES cells and MEFs was applied to the microfluidics device and subjected to Drop-ChIP for H3K4me3. Divisive hierarchical clustering was then applied to the single-cell profiles as described in the methods. The resulting hierarchical tree reveals two major clusters, whose aggregate profiles closely match bulk H3K4me3 profiles for ES cells and MEFs, respectively (shown for region on Chromosome 17). B) Re-clustering of these cells together with single-cell H3K4me3 profiles from pre-labeled ES cells and MEFs confirms that the original completely unbiased clustering distinguished ES cells from MEFs with >95% specificity and sensitivity.

  13. Supplementary Fig. 7: Clustering dendrogram of chromatin signatures.

    Heat map shows pairwise correlations between 91 different chromatin signatures derived from 314 ChIP-seq datasets for histone marks, chromatin regulators and transcription factors assayed in different cell types (see Online Methods). The 91 signatures were clustered based on Pearson correlations derived from the degree of overlap between the enriched intervals that make up each signature. Labels (right) refer to signatures that form the major clusters. See also Supplementary Table 4.

  14. Supplementary Fig. 8: Sensitivity of K4me2 clusters to technical aspects of the analysis.

    A) The distribution of single-cell coverage in each of the four clusters found and described in Figure 6A. B) The fraction of cells that were assigned to their original cluster when re-clustering a randomly chosen subset of 50% of cells (averaged over 100 random subsets). See also Figure 5C. C) The correlations between signatures are not driven by the overlap between them. The correlation between chromatin signatures based on co-variation across cells in the population is plotted vs. their overlap-based correlations, when using the real single-cell data (top) or shuffled data (bottom). Red dots indicate correlations between members of the chosen set of 91 signatures. See also the quality control for clustering analysis. The plots show that correlations in the shuffled data are equal to (and determined by) the overlaps between the signatures. In contrast, the correlations between the signatures in the real data are higher than the overlap-based values, owing to actual co-variation of signal in distinct genomic regions across single cells. D) The clustering of single cells is largely the same whether one uses the representative signatures, the entire set, or a completely different set of signatures. Top (replica of Fig. 5B): Multidimensional scaling (MDS) plot comparing the chromatin landscapes of single ES cells and MEFs (colored circles). The distance between any two cells is proportional to the distance between their 91-dimensional signature vectors. The plot shows 1,000 single cells (randomly sampled from the 5,405 cells with H3K4me2 data), colored by their cluster association. Middle: the same single-cell profiles re-clustered using the complete set of 314 chromatin signatures (see Online Methods). Bottom: the same single-cell profiles re-clustered using a different set of chromatin signatures (datasets taken from Meshorer, see Supplementary Note 1). E) The distributions of single-cell scores for the G1/S-related signature, the G2/M-related signature and the OCT4 signature are plotted for ES1, ES2 and ES3. See Supplementary Note 1 for more information on the cell cycle-related signatures.

  15. Supplementary Fig. 9: Detection limit of rare subpopulations.

    In-silico subsampling of data to simulate lowering the size and frequency of ES and MEF subpopulations derived from H3K4me2 data. The fraction of cells from the subsampled clusters that were correctly classified (True Positives) is plotted for ES1 (A) and for MEF (B) as a function of relative cluster size (Frequency) and of the total number of cells in the in-silico sample. C) True positive rate (color coded) averaged over all ES clusters is plotted as a function of the frequency and size of the targeted subpopulation. The contour line of True Positives = 75% is fitted by a model (white line) that predicts that, to maintain a given purity of clustering, the size of the subpopulation should grow in inverse proportion to the frequency of the subpopulation. D) The model used in C predicts that the number of cells required for accurately detecting a subpopulation grows quadratically for rare subpopulations. This prediction was numerically validated by subsampling our data for populations of up to 5,000 cells and for subpopulations as rare as 3% (dashed lines).
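
The step from panel C to panel D can be made explicit. This is a sketch that assumes the fitted contour has exactly the form stated in the caption. The symbols $N$ (total number of cells assayed), $f$ (frequency of the subpopulation), $n_{\mathrm{sub}}$ (number of cells in the subpopulation) and the constant $c$ (set by the target true-positive rate) are introduced here for illustration and do not appear in the original.

```latex
n_{\mathrm{sub}} = fN, \qquad n_{\mathrm{sub}} \ge \frac{c}{f}
\quad\Longrightarrow\quad N \ge \frac{c}{f^{2}}
```

Under this reading, halving the frequency of a subpopulation roughly quadruples the total number of cells needed to detect it at the same true-positive rate.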

  16. Supplementary Fig. 10: Chromatin signatures that separate K4me2 clusters also separate between RNA expression clusters.

    A) The distribution of RNA expression scores for 6 signatures that emerged as differential across the Drop-ChIP ES clusters is plotted for single cells from two clusters that were derived based on these signatures. B) The mean (normalized) expression of genes in the two clusters from A is plotted for 3 sets of genes: those that have more H3K4me2 in ES1 (446 genes), those that are more enriched in ES3 (47 genes) and all other genes. For more details, see Supplementary Note 2.

Videos

  1. Video 1: Cell encapsulation.
    A movie showing an aqueous suspension of cells co-flowing with lysis buffer and MNase into the drop-maker junction and dispersing in oil to form cell-bearing drops (see Figure 2a).
  2. Video 2: Drop labeling.
    A slowed-down movie showing barcode drops (small) and drops containing cellular chromatin (large) merging, with labeling buffer flowing into the T-junction.

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Rivera, C.M. & Ren, B. Mapping human epigenomes. Cell 155, 39–55 (2013).
  2. Baylin, S.B. & Jones, P.A. A decade of exploring the cancer epigenome–biological and translational implications. Nat. Rev. Cancer 11, 726–734 (2011).
  3. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  4. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
  5. Shalek, A.K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
  6. Kalisky, T. & Quake, S.R. Single-cell genomics. Nat. Methods 8, 311–314 (2011).
  7. Munsky, B., Neuert, G. & van Oudenaarden, A. Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012).
  8. Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
  9. Brown, C.R., Mao, C., Falkovskaia, E., Jurica, M.S. & Boeger, H. Linking stochastic fluctuations in chromatin structure and gene expression. PLoS Biol. 11, e1001621 (2013).
  10. Cusanovich, D.A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
  11. Murphy, P.J. et al. Single-molecule analysis of combinatorial epigenomic states in normal and tumor cells. Proc. Natl. Acad. Sci. USA 110, 7772–7777 (2013).
  12. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
  13. Patel, A.P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
  14. Xu, X. et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 148, 886–895 (2012).
  15. Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014).
  16. Sackmann, E.K., Fulton, A.L. & Beebe, D.J. The present and future role of microfluidics in biomedical research. Nature 507, 181–189 (2014).
  17. Guo, M.T., Rotem, A., Heyman, J.A. & Weitz, D.A. Droplet microfluidics for high-throughput biological assays. Lab Chip 12, 2146–2155 (2012).
  18. Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
  19. Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
  20. Rotem, A. et al. High-throughput single-cell labeling (Hi-SCL) for RNA-Seq using drop-based microfluidics. PLoS ONE 10, e0116328 (2015).
  21. Adli, M., Zhu, J. & Bernstein, B.E. Genome-wide chromatin maps derived from limited numbers of hematopoietic progenitors. Nat. Methods 7, 615–618 (2010).
  22. Wu, A.R. et al. Automated microfluidic chromatin immunoprecipitation from 2,000 cells. Lab Chip 9, 1365–1370 (2009).
  23. Lara-Astiaso, D. et al. Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943–949 (2014).
  24. O'Neill, L.P., VerMilyea, M.D. & Turner, B.M. Epigenetic characterization of the early embryo with a chromatin immunoprecipitation protocol applicable to small cell populations. Nat. Genet. 38, 835–841 (2006).
  25. Hackett, J.A. & Surani, M.A. Regulatory principles of pluripotency: from the ground state up. Cell Stem Cell 15, 416–430 (2014).
  26. Hough, S.R. et al. Single-cell gene expression profiles define self-renewing, pluripotent, and lineage primed states of human pluripotent stem cells. Stem Cell Rep. 2, 881–895 (2014).
  27. Singer, Z.S. et al. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014).
  28. Smallwood, S.A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014).
  29. Chambers, I. et al. Nanog safeguards pluripotency and mediates germline development. Nature 450, 1230–1234 (2007).
  30. Ben-Porath, I. et al. An embryonic stem cell–like gene expression signature in poorly differentiated aggressive human tumors. Nat. Genet. 40, 499–507 (2008).
  31. Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  32. Meshorer, E. & Misteli, T. Chromatin in pluripotent embryonic stem cells and differentiation. Nat. Rev. Mol. Cell Biol. 7, 540–546 (2006).
  33. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).
  34. Li, Z. et al. Foxa2 and H2A.Z mediate nucleosome depletion during embryonic stem cell differentiation. Cell 151, 1608–1616 (2012).
  35. Azuara, V. et al. Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 8, 532–538 (2006).
  36. Bernstein, B.E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
  37. Zhu, J. et al. Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell 152, 642–654 (2013).
  38. Farlik, M. et al. Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics. Cell Rep. 10, 1386–1397 (2015).
  39. Nichols, J. & Smith, A. Naive and primed pluripotent states. Cell Stem Cell 4, 487–492 (2009).
  40. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008).
  41. Kumar, R.M. et al. Deconstructing transcriptional heterogeneity in pluripotent stem cells. Nature 516, 56–61 (2014).
  42. Mazutis, L. et al. Single-cell analysis and sorting using droplet-based microfluidics. Nat. Protoc. 8, 870–891 (2013).
  43. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  44. Guttman, M. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010).
  45. Venables, W.N. Modern Applied Statistics with S 4th edn. (Springer, New York, 2002).

Author information

  1. Current address: Fraunhofer ICT-IMM, Mainz, Germany.

    • Ralph A Sperling
  2. These authors contributed equally to this work.

    • Assaf Rotem,
    • Oren Ram &
    • Noam Shoresh

Affiliations

  1. Department of Physics and School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA.

    • Assaf Rotem,
    • Ralph A Sperling &
    • David A Weitz
  2. Epigenomics Program, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Assaf Rotem,
    • Oren Ram,
    • Noam Shoresh &
    • Bradley E Bernstein
  3. Howard Hughes Medical Institute, Chevy Chase, Maryland, USA.

    • Oren Ram &
    • Bradley E Bernstein
  4. Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.

    • Oren Ram &
    • Bradley E Bernstein
  5. Broad Technology Labs, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Alon Goren

Competing financial interests

D.A.W. and B.E.B. are both founders and consultants for HiFiBio.

Acknowledgments

We thank A. Regev, N. Yosef, E. Shema, I. Tirosh, H. Zhang, S. Gillespie and J. Xing for their valuable comments and critiques of this work. We also thank G. Kelsey for sharing single-cell data for comparisons. This research was supported by funds from Howard Hughes Medical Institute, the National Human Genome Research Institute's Centers of Excellence in Genome Sciences (P50HG006193), ENCODE Project (U54HG006991), the National Heart, Lung, and Blood Institute (U01HL100395), the National Science Foundation (DMR-1310266), the Harvard Materials Research Science and Engineering Center (DMR-1420570) and the Defense Advanced Research Projects Agency (HR0011-11-C-0093).

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Flow chart of the microfluidic protocol. (229 KB)
  2. Supplementary Figure 2: Barcode design and library preparation. (215 KB)

  3. Supplementary Figure 3: Comparison between H3K4me2 Bulk ChIP-seq and Drop-ChIP-seq. (90 KB)

    A) The fractions of reads that are mapped onto gene promoters and enhancer regions are comparable between Drop-ChIP and bulk ChIP-seq. Enhancer regions were identified as peaks of H3K27ac with no overlap with promoter regions. B) Average GC content of Drop-ChIP reads is 46%, identical to that in bulk ChIP-seq. C) The distance of distal marks to the nearest promoter region is comparable between Drop-ChIP (65.5kb) and ChIP-seq reads (68.0kb). D) Venn diagrams show the number of promoter and enhancer regions that are most highly represented in Drop-ChIP measurements and in ChIP-seq reads, as well as the overlap between the two measurements. E) Genome-wide correlation between bulk ChIP-seq and Drop-ChIP data, measured in fragments per million reads (fpm) in non-overlapping 5kb windows. Only chromosome 19 was used for the plot; the correlation score (r=0.83) was computed genome-wide.

  4. Supplementary Figure 4: Aggregation of Drop-ChIP data from 200 single cells yields a high quality profile that recapitulates bulk ChIP-seq. (92 KB)

    A) Chromatin maps generated by aggregating Drop-ChIP data from 200 single cells (‘200’) closely match conventional ChIP-seq data (‘Bulk’). Tracks depict H3K4me3 or H3K4me2 in ES cells, MEFs or EML cells for intervals on Chr6: 125,140,000-125,180,000 and Chr17: 35,486,000-35,526,000. Scatter plots (right) compare aggregate profile signals against conventional ChIP-seq signals in non-overlapping 5kb windows in chromosome 19 (genome-wide correlations between respective datasets are indicated; the top right plot is identical to Supplementary Figure 3E). B) Receiver operating characteristic (ROC) curve plots the fraction of true peaks vs. the fraction of false peaks called from H3K4me3 chromatin profiles derived by aggregating the indicated numbers of ES cell profiles. Area under the curve (AUC) for aggregates of different sizes is plotted in the inset. Excellent recovery of bulk peaks (AUC>0.9) is attained for 500 single cells or more.

  5. Supplementary Figure 5: Data from single cells are essential for detecting subpopulations. (91 KB)

    A) The correlation of H3K4me3 Drop-ChIP profiles for a mixture of 200 ES cells and 200 MEFs with H3K4me3 bulk profiles of ES cells (y-axis) is plotted vs. their correlation with the H3K4me3 MEF bulk profile (x-axis) (left). Two groups of cells, one correlating more strongly with the ES profile and another resembling the MEF profile, are clearly observed, enabling correct assignment of cell type with >95% accuracy (Fig. 4C). Similar clusters could also be derived by unsupervised divisive hierarchical clustering (middle). Aggregate profiles derived for these clusters closely match ES cells and MEFs, respectively (right; shown for region on Chromosome 17). B) To demonstrate the importance of single-cell data, we combined reads from randomly selected sets of 5 cells and repeated both analyses. Neither supervised (left) nor unsupervised (middle) procedures were able to distinguish the two cell states within the mixed population when starting from these compromised data. Consequently, the chromatin profiles could no longer be deconvoluted (right). C) For comparison, bulk H3K4me3 profiles for ES cells and MEFs are shown over the same locus.

  6. Supplementary Figure 6: Single-cell chromatin profiles de-convolve mixed cell populations. (81 KB)

    A) A mixture of ES cells and MEFs was applied to the microfluidics device and subjected to Drop-ChIP for H3K4me3. Divisive hierarchical clustering was then applied to the single-cell profiles as described in the methods. The resulting hierarchical tree reveals two major clusters, whose aggregate profiles closely match bulk H3K4me3 profiles for ES cells and MEFs, respectively (shown for region on Chromosome 17). B) Re-clustering of these cells together with single-cell H3K4me3 profiles from pre-labeled ES cells and MEFs confirms that the original completely unbiased clustering distinguished ES cells from MEFs with >95% specificity and sensitivity.

  7. Supplementary Figure 7: Clustering dendrogram of chromatin signatures. (464 KB)

    Heat map shows pairwise correlations between 91 different chromatin signatures derived from 314 ChIP-seq datasets for histone marks, chromatin regulators and transcription factors assayed in different cell types (see Online Methods). The 91 signatures were clustered based on Pearson correlations derived from the degree of overlap between the enriched intervals that make up each signature. Labels (right) refer to signatures that form the major clusters. See also Supplementary Table 4.

  8. Supplementary Figure 8: Sensitivity of K4me2 clusters to technical aspects of the analysis. (112 KB)

    A) The distribution of single cell coverage in each of the 4 clusters found and described in Figure 6A. B) The fraction of cells that were assigned to their original cluster when re-clustering a randomly chosen subset of 50% of cells (averaged over 100 random subsets). See also Figure 5C. C) The correlations between signatures are not driven by the overlap between them. The correlation between chromatin signatures based on co-variation across cells in the population is plotted vs. their overlap-based correlations, when using the real single-cell data (top) or shuffled data (bottom). Red dots indicate correlations between members of the chosen set of 91 signatures. See also the quality control for clustering analysis. The plots show that correlations in the shuffled data are equal to (and determined by) the overlaps between the signatures. In contrast, the correlations between the signatures in the real data are higher than expected from overlap alone, owing to actual co-variation of signal in distinct genomic regions across single cells. D) The clustering of single cells is largely the same whether one uses the representative signatures, the entire set, or a completely different set of signatures. Top (replica of Fig. 5B): Multidimensional scaling (MDS) plot comparing the chromatin landscapes of single ES cells and MEFs (colored circles). The distance between any two cells is proportional to the distance between their 91-dimensional signature vectors. The plot shows 1,000 single cells (randomly sampled from the 5,405 cells with H3K4me2 data), colored by their cluster association. Middle: the same single-cell profiles re-clustered using the complete set of 314 chromatin signatures (see Online Methods). Bottom: the same single-cell profiles re-clustered using a different set of chromatin signatures (datasets taken from Meshorer, see Supplementary Note 1). E) The distributions of single-cell scores for the G1/S-related signature, the G2/M-related signature and the OCT4 signature are plotted for ES1, ES2 and ES3. See Supplementary Note 1 for more information on the cell cycle-related signatures.

  9. Supplementary Figure 9: Detection limit of rare subpopulations. (114 KB)

    In-silico subsampling of data to simulate lowering the size and frequency of ES and MEF subpopulations derived from H3K4me2 data. The fraction of cells from the subsampled clusters that were correctly classified (True Positives) is plotted for ES1 (A) and for MEF (B) as a function of relative cluster size (Frequency) and of the total number of cells in the in-silico sample. C) True positive rate (color coded) averaged over all ES clusters is plotted as a function of the frequency and size of the targeted subpopulation. The contour line of True Positives = 75% is fitted by a model (white line) that predicts that, to maintain a given purity of clustering, the size of the subpopulation should grow in inverse proportion to the frequency of the subpopulation. D) The model used in C predicts that the number of cells required for accurately detecting a subpopulation grows quadratically for rare subpopulations. This prediction was numerically validated by subsampling our data for populations of up to 5000 cells and for subpopulations as rare as 3% (dashed lines).

  10. Supplementary Figure 10: Chromatin signatures that separate K4me2 clusters also separate between RNA expression clusters. (66 KB)

    A) The distribution of RNA expression scores for 6 signatures that emerged as differential across the Drop-ChIP ES clusters is plotted for single cells from two clusters that were derived based on these signatures. B) The mean (normalized) expression of genes in the two clusters from A is plotted for 3 sets of genes: those that have more H3K4me2 in ES1 (446 genes), those that are more enriched in ES3 (47 genes) and all other genes. For more details, see Supplementary Note 2.
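
The window-based comparison described for Supplementary Figures 3E and 4A (signal in fragments per million, fpm, in non-overlapping 5kb windows, compared by a correlation score) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `fpm_per_window` and `pearson_r` are hypothetical, the data are simulated, Pearson correlation is assumed as the correlation score, and fpm is normalized to the fragments of a single toy chromosome rather than to a genome-wide total.

```python
import numpy as np

WINDOW = 5_000  # window size in bp, matching the 5kb windows in the captions


def fpm_per_window(starts, chrom_length, window=WINDOW):
    """Count fragments per non-overlapping window and scale to fragments per million.

    Assumption: normalization uses only the fragments passed in, not a
    genome-wide total.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    idx = np.asarray(starts, dtype=np.int64) // window
    counts = np.bincount(idx, minlength=n_windows).astype(float)
    total = counts.sum()
    return counts * 1e6 / total if total > 0 else counts


def pearson_r(a, b):
    """Pearson correlation between two window-level signal vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Simulated data (hypothetical): both samples draw fragments from the same
# uneven enrichment landscape, at very different depths.
rng = np.random.default_rng(0)
chrom_length = 1_000_000
weights = rng.gamma(shape=0.5, size=chrom_length // WINDOW)
weights /= weights.sum()


def sample_starts(n):
    """Draw n fragment start positions according to the enrichment weights."""
    win = rng.choice(len(weights), size=n, p=weights)
    return win * WINDOW + rng.integers(0, WINDOW, size=n)


bulk = fpm_per_window(sample_starts(200_000), chrom_length)
aggregate = fpm_per_window(sample_starts(20_000), chrom_length)
r = pearson_r(bulk, aggregate)
print(f"windows: {len(bulk)}, Pearson r: {r:.3f}")
```

Scaling to fpm before correlating makes profiles of different sequencing depth directly comparable on a scatter plot; the Pearson coefficient itself is unaffected by this scaling.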

Video

  1. Video 1: Cell encapsulation. (12.81 MB, Download)
    A slowed down movie showing barcode drops (small) and drops containing cellular chromatin (large) merge together with labeling buffer flowing into the T-junction.
  2. Video 2: Drop labeling. (4.92 MB, Download)
    A slowed down movie showing barcode drops (small) and drops containing cellular chromatin (large) merge together with labeling buffer flowing into the T-junction.

PDF files

  1. Supplementary Text and Figures (2,355 KB)

    Supplementary Figures 1–10, Supplementary Tables 1 and 3, and Supplementary Notes 1–3

Excel files

  1. Supplementary Table 2 (38 KB)

    Barcodes design

  2. Supplementary Table 4 (16 KB)

    Signatures data set sources

Zip files

  1. Supplementary Design Files (290 KB)

    Design of microfluidic devices. This compressed folder includes 3 ACAD designs, one for the 96 parallel drop makers, one for the co-flow drop maker and one for the 3 point merger.

Additional data