We report a single-cell bisulfite sequencing (scBS-seq) method that can be used to accurately measure DNA methylation at up to 48.4% of CpG sites. Embryonic stem cells grown in serum or in 2i medium displayed epigenetic heterogeneity, with '2i-like' cells present in serum culture. Integration of 12 individual mouse oocyte datasets largely recapitulated the whole DNA methylome, which makes scBS-seq a versatile tool to explore DNA methylation in rare cells and heterogeneous populations.
1. Jones, P.A. Nat. Rev. Genet. 13, 484–492 (2012).
2. Smith, Z.D. & Meissner, A. Nat. Rev. Genet. 14, 204–220 (2013).
3. Jaitin, D.A. et al. Science 343, 776–779 (2014).
4. Deng, Q. et al. Science 343, 193–196 (2014).
5. Macaulay, I.C. & Voet, T. PLoS Genet. 10, e1004126 (2014).
6. Lee, H.J. et al. Cell Stem Cell 14, 710–719 (2014).
7. Miura, F. et al. Nucleic Acids Res. 40, e136 (2012).
8. Shirane, K. et al. PLoS Genet. 9, e1003439 (2013).
9. Chambers, I. et al. Nature 450, 1230–1234 (2007).
10. Islam, S. et al. Nat. Methods 11, 163–166 (2014).
11. Hayashi, K. et al. Cell Stem Cell 3, 391–401 (2008).
12. Torres-Padilla, M.E. & Chambers, I. Development 141, 2173–2181 (2014).
13. Ficz, G. et al. Cell Stem Cell 13, 351–359 (2013).
14. Habibi, E. et al. Cell Stem Cell 13, 360–369 (2013).
15. Stadler, M.B. et al. Nature 480, 490–495 (2011).
16. Ziller, M.J. et al. Nature 500, 477–481 (2013).
17. Hon, G.C. et al. Nat. Genet. 45, 1198–1206 (2013).
18. Guo, H. et al. Genome Res. 23, 2126–2135 (2013).
19. Smallwood, S.A. et al. Nat. Genet. 43, 811–814 (2011).
20. Quail, M.A. et al. Nat. Methods 9, 10–11 (2012).
21. Krueger, F. & Andrews, S.R. Bioinformatics 27, 1571–1572 (2011).
22. Illingworth, R.S. et al. PLoS Genet. 6, e1001134 (2010).
23. Creyghton, M.P. et al. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
24. Li, Y. et al. PLoS Biol. 8, e1000533 (2010).
25. Bock, C. et al. Mol. Cell 47, 633–647 (2012).
We thank K. Tabbada and the Wellcome Trust Sanger Institute sequencing pipeline team for assistance with Illumina sequencing, R. Walker for assistance with flow cytometry, T. Hore (Babraham Institute, Cambridge, UK) for providing ESCs maintained in 2i medium and serum conditions, and T. Hore, J. Huang, I. Macaulay, S. Lorenz, M. Quail, T. Voet and H. Swerdlow for helpful discussions. This work was supported by the UK Biotechnology and Biological Sciences Research Council grant BB/J004499/1, UK Medical Research Council grant MR/K011332/1, Wellcome Trust award 095645/Z/11/Z and EU FP7 EpiGeneSys and BLUEPRINT.
W.R. is a consultant to Cambridge Epigenetix Ltd.
Integrated supplementary information
(a) Mapping efficiency of scBS-Seq samples and negative controls. Boxplot representation of the mapping efficiencies (on sequences obtained after trimming and mapping against the mouse genome) for each single cell and negative control (red crosses represent individual cell values). The overall higher mapping efficiency of oocytes versus ESCs can be explained by the amount of DNA in each cell (4n for MII oocytes and 2n for ESCs), which results in a relatively lower contribution of spurious sequences in MIIs (see Supplementary Fig. 2). All negative controls had less than 3.5% mapping efficiency (the dashed line indicates 5% mapping efficiency). (b) Visualization of scBS-Seq library fragment size distribution on the Bioanalyser platform. The Bioanalyser trace of library MII#1 is shown as an example.
(a) The relatively low mapping efficiency of scBS-Seq is associated with a significant fraction of sequences that map to multiple genomic locations and are therefore discarded. (b) Analysis of the G+C content of the raw sequences (i.e., prior to mapping) of scBS-Seq libraries revealed many sequences with <3% G+C, which were absent from bulk samples. These correspond to poly-T stretches (poly-Ts), i.e., (T)N with N > 50. Poly-Ts are present in both actual samples and the corresponding negative controls, suggesting that a contaminant is their main source. (c,d) The amount of poly-Ts is higher in ESCs than in oocytes, and the percentage of sequences with poly-Ts and the percentage of sequences with multiple alignments are tightly correlated across samples. (e) This suggests that poly-Ts are the major cause of the low mapping efficiency of scBS-Seq. To test this, we trimmed sequences containing poly-Ts of at least 50 bp from the raw fastq file and repeated the mapping. This drastically reduced the percentage of sequences with multiple alignments and increased the percentage of sequences with unique alignments. Poly-Ts are inherent to our current methodology; alternative protocols we developed do not generate these artifacts but yield significantly fewer measured CpGs.
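The read-level diagnostics described above (G+C content of raw reads and detection of poly-T runs of at least 50 bp) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the FASTQ parser assumes plain 4-line records, and "trimming" is interpreted here as removing every poly-T run from a read.

```python
import re

def iter_fastq_sequences(lines):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def gc_fraction(seq):
    """Fraction of G+C bases in a read; 0.0 for an empty read."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_poly_t(seq, min_len=50):
    """True if the read contains a run of at least `min_len` consecutive T."""
    return re.search("T{%d,}" % min_len, seq.upper()) is not None

def trim_poly_t(seq, min_len=50):
    """Remove every run of at least `min_len` consecutive T (hypothetical trimming rule)."""
    return re.sub("T{%d,}" % min_len, "", seq.upper())

def poly_t_summary(seqs, min_len=50, low_gc=0.03):
    """Percentage of reads with <3% G+C and percentage carrying a poly-T run."""
    seqs = list(seqs)
    n = len(seqs)
    if n == 0:
        return {"n_reads": 0, "pct_low_gc": 0.0, "pct_poly_t": 0.0}
    n_low = sum(gc_fraction(s) < low_gc for s in seqs)
    n_poly = sum(has_poly_t(s, min_len) for s in seqs)
    return {"n_reads": n,
            "pct_low_gc": 100.0 * n_low / n,
            "pct_poly_t": 100.0 * n_poly / n}
```

Correlating `pct_poly_t` with the multi-mapping rate across libraries would reproduce the comparison in panels (c,d); the mapping step itself is outside this sketch.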
For each individual MII scBS-Seq library and one representative example of bulk BS-Seq (PBAT), the percentage of informative CpGs is plotted for 10% increments of mapped sequences. This demonstrates that, in contrast to the bulk BS-Seq example (black line), the MII scBS-Seq libraries (colored lines) have not reached saturating sequencing depth, indicating that further sequencing would yield additional information. MII#2 Deep Seq and MII#5 Deep Seq correspond to the deeper sequencing of these libraries (see main text and Supplementary Table 1).
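A saturation curve of this kind can be approximated by subsampling mapped reads in 10% increments and counting the CpGs covered at each step. The Python sketch below assumes that each mapped read has already been reduced to the list of CpG identifiers it covers, that a CpG is "informative" once covered by at least `min_depth` reads, and that the percentage is taken over a user-supplied genome-wide CpG count; the function name and these choices are hypothetical.

```python
import random
from collections import Counter

def saturation_curve(reads, n_total_cpgs, steps=10, min_depth=1, seed=0):
    """Percentage of informative CpGs after adding mapped reads in equal increments.

    reads: list where each element lists the CpG ids covered by one mapped read.
    n_total_cpgs: number of CpGs used as the denominator (assumed genome-wide count).
    Returns a list of (percent_of_reads_used, percent_informative_cpgs).
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # one random order, so each subsample contains the previous one
    depth = Counter()
    used = 0
    curve = []
    for step in range(1, steps + 1):
        k = round(len(order) * step / steps)
        for i in order[used:k]:
            depth.update(reads[i])  # add only the newly included reads
        used = k
        informative = sum(1 for d in depth.values() if d >= min_depth)
        curve.append((step * 100 // steps, 100.0 * informative / n_total_cpgs))
    return curve
```

Using nested subsamples keeps the curve monotone, so a library that is still climbing at 100% has not reached a plateau.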
(a) For each single MII BS-Seq library, and for the bulk MII sample, CpGs were grouped based on their read depth. The proportion of CpGs in each group with a methylation value of either 0% or 100% (digital output) was calculated for each sample. The boxplot represents the results from all 12 single MII libraries. The results from the bulk MII sample are superimposed as solid blue circles. As expected, the proportion of digital CpGs in the scBS-Seq libraries was very high (>90% for read depth 2-5 in all cells, dashed line). In contrast, the bulk sample had fewer digital CpGs (66% at read depth 5) due to cell-to-cell variability within the population. (b) Histograms of the distribution of CpG methylation values for MII bulk and MII single cells for CpGs with at least 2 reads.
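The "digital output" measure in panel (a) is the fraction of CpGs at each read depth whose methylation value is exactly 0% or 100%. A minimal Python sketch, assuming per-CpG counts of methylated and total reads; the function name and the pooling of high depths into a single `max_depth` group are assumptions for illustration.

```python
from collections import defaultdict

def digital_fraction_by_depth(calls, max_depth=10):
    """Fraction of CpGs with 0% or 100% methylation, grouped by read depth.

    calls: iterable of (n_methylated_reads, n_total_reads) per CpG.
    Depths above max_depth are pooled into the max_depth group (an assumption).
    """
    totals = defaultdict(int)
    digital = defaultdict(int)
    for n_meth, n_total in calls:
        if n_total == 0:
            continue  # uncovered CpG, not informative
        d = min(n_total, max_depth)
        totals[d] += 1
        if n_meth == 0 or n_meth == n_total:
            digital[d] += 1
    return {d: digital[d] / totals[d] for d in sorted(totals)}
```

In a single cell, a covered CpG is expected to be either fully methylated or unmethylated on each allele, so values near 1.0 are expected at low depth, whereas a bulk population mixes cells and gives lower values.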
(a) CpG concordance was calculated for each cell pair as the proportion of overlapping CpGs with an identical methylation state. On average, 1.8 million CpGs were measured for each pairwise analysis. Within each cell type, samples are ordered from bottom to top as in Supplementary Table 1 (for oocytes, the bottom sample is MII#1 and the top sample is MII#12). (b) The Pearson correlation matrix of MII, 2i ESC and serum ESC scBS-Seq libraries was calculated using methylation values in 2 kb windows.
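Both pairwise measures can be sketched in plain Python. The sketch assumes per-cell dictionaries keyed by `(chromosome, position)`, binary CpG states (0 or 1) for concordance, and a window value equal to the mean of the CpG calls falling in each non-overlapping 2 kb bin; these choices and all function names are assumptions, not the authors' exact procedure.

```python
import math
from collections import defaultdict

def concordance(a, b):
    """Proportion of CpGs measured in both cells with the same state, and the overlap size."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("nan"), 0
    same = sum(a[k] == b[k] for k in shared)
    return same / len(shared), len(shared)

def window_means(calls, window=2000):
    """Mean methylation per (chromosome, window index) bin."""
    acc = defaultdict(lambda: [0.0, 0])
    for (chrom, pos), value in calls.items():
        key = (chrom, pos // window)
        acc[key][0] += value
        acc[key][1] += 1
    return {k: s / n for k, (s, n) in acc.items()}

def pearson(x, y):
    """Pearson correlation; NaN if fewer than two points or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def window_correlation(a, b, window=2000):
    """Correlate window-level methylation over windows covered in both cells."""
    wa, wb = window_means(a, window), window_means(b, window)
    shared = sorted(wa.keys() & wb.keys())
    return pearson([wa[k] for k in shared], [wb[k] for k in shared])
```

Running both functions over every cell pair yields the concordance and correlation matrices.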
Supplementary Figure 6 scBS-seq accurately determines CpG island (CGI) methylation status in MII oocytes.
(a) Heatmap displaying, in individual MII libraries, the methylation level of CGIs identified in bulk as methylated (>80%) or unmethylated (<20%; random selection). The number on top indicates the number of individual MIIs in which these CGIs are commonly informative. The discrepancy between the numbers of methylated and unmethylated CGIs informative across single cells reflects the different CpG densities of these two groups, as previously described19. (b) Histogram displaying, for MII bulk and individual MII libraries, the percentage of total CGIs (23,020) found methylated, unmethylated or with an intermediate level of methylation, and the percentage of wrong calls (i.e., CGIs methylated in bulk (>80%) but called unmethylated (<20%) in single cells, and vice versa). (c) Boxplot presenting the methylation level, in each individual MII, of CGIs found methylated in bulk (>80%). The percentage of these CGIs informative in each MII that have a methylation level lower than 80% is shown below the plot. (d) Similar to (c) for unmethylated CGIs (<20%).
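The CGI categories in panel (b) follow directly from the >80% and <20% thresholds. The Python sketch below assumes that a per-CGI methylation level (in %) has already been computed for the single cell and for bulk, and counts a wrong call whenever the two disagree in opposite directions; the function names, and tallying wrong calls alongside rather than instead of the other categories, are assumptions.

```python
from collections import Counter

def call_cgi(level):
    """Classify a CGI methylation level (%) with the thresholds used in the caption."""
    if level > 80:
        return "methylated"
    if level < 20:
        return "unmethylated"
    return "intermediate"

def cgi_call_summary(single, bulk, n_total_cgis=23020):
    """Percent of all CGIs per call in one cell, plus calls opposite to bulk.

    single, bulk: dict CGI id -> methylation level (%); unmeasured CGIs are absent.
    """
    counts = Counter()
    for cgi, level in single.items():
        call = call_cgi(level)
        counts[call] += 1
        if cgi in bulk and {call, call_cgi(bulk[cgi])} == {"methylated", "unmethylated"}:
            counts["wrong"] += 1
    return {k: 100.0 * v / n_total_cgis for k, v in counts.items()}
```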
(a) Snapshot displaying read distribution across 61 Mbp of chromosome 19. Below the annotation tracks are displayed the mapped reads and the quantification (number of reads per 25 kb window (log)). (b) The representation of different genomic contexts in single cell and bulk libraries is shown as fold enrichment over the expected value (dashed line). The boxplot represents the values for all single cell samples, and the bulk samples are superimposed as blue diamonds (MII), purple crosses (serum ESCs) and red plus signs (2i ESCs).
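One way to express the "fold enrichment over the expected value" in panel (b) is to compare the fraction of measured CpGs falling in a genomic context with the fraction of all genomic CpGs in that context, so that 1.0 corresponds to the dashed line. The caption does not give the exact formula, so this definition and the function name below are assumptions.

```python
def fold_enrichment(covered, genome, contexts):
    """Observed/expected representation of each genomic context (assumed definition).

    covered: set of CpG ids measured in one library.
    genome: set of all CpG ids in the reference.
    contexts: dict context name -> set of CpG ids annotated to that context.
    """
    out = {}
    for name, ctx in contexts.items():
        expected = len(ctx & genome) / len(genome)
        observed = len(ctx & covered) / len(covered) if covered else 0.0
        out[name] = observed / expected if expected else float("nan")
    return out
```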
Number of CpGs (a) and CGIs (b) for the union and intersection of all possible combinations of the 12 individual MII scBS-Seq libraries. The union shows that pooling data from multiple scBS-Seq samples increases the number of measured sites. The intersection shows that the number of measured sites common to multiple scBS-Seq datasets decreases as the number of datasets increases. Dotted lines show the numbers obtained in standard BS-Seq experiments and the total numbers of CpGs and CGIs in the mouse genome.
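The union and intersection sizes over all combinations of cells can be enumerated with `itertools.combinations`. In this Python sketch each cell is represented as a set of measured CpG (or CGI) identifiers; the function name is hypothetical. For 12 cells there are 4,095 non-empty combinations, which is tractable, although set operations on millions of CpGs per cell will be slow in pure Python.

```python
from itertools import combinations

def union_intersection_by_k(datasets):
    """Sizes of the union and intersection for every combination of k datasets.

    datasets: list of sets of site ids, one set per single-cell library.
    Returns {k: (union_sizes, intersection_sizes)}, one entry per combination.
    """
    out = {}
    for k in range(1, len(datasets) + 1):
        unions, inters = [], []
        for combo in combinations(datasets, k):
            unions.append(len(set().union(*combo)))
            inters.append(len(set.intersection(*combo)))
        out[k] = (unions, inters)
    return out
```

Plotting the lists for each k as boxplots would show the rising union and falling intersection.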
The imprinted Plagl1 locus (top) and the Plagl1 maternal DMR/CGI (bottom) are shown for all 12 individual MIIs, MIIs merged and MII bulk. Quantification is the absolute level of methylation (%) at individual CpG resolution, as indicated on the scale on the left of each sample (0 is 0% methylation, 1 is 100% methylation).
Cluster dendrograms are shown for (a) genome-wide methylation estimates (equivalent to the dendrogram shown in Figure 2b) and (b) the top 300 most variable sites among single ESC samples (equivalent to the dendrogram shown in Figure 2c). The cell IDs are included for direct comparison between dendrograms. (c) The distance matrix for the 300 most variable sites is broadly similar to that for all sites (Figure 2b). Cells are presented in the order shown in (b).
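Selecting the most variable sites and building a cell-by-cell distance matrix on them can be sketched as below. The caption does not specify the variance estimator or the distance metric, so this Python sketch assumes the sample variance across cells with data and the mean absolute methylation difference over sites measured in both cells; the function names are hypothetical, and the hierarchical clustering step is left out.

```python
def top_variable_sites(matrix, n_top=300, min_cells=2):
    """Rank sites by sample variance across cells and keep the top n_top.

    matrix: dict site -> dict cell -> methylation estimate in [0, 1].
    """
    scored = []
    for site, vals in matrix.items():
        v = list(vals.values())
        if len(v) < min_cells:
            continue  # too few cells to estimate a variance
        mean = sum(v) / len(v)
        var = sum((x - mean) ** 2 for x in v) / (len(v) - 1)
        scored.append((var, site))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [site for _, site in scored[:n_top]]

def cell_distance_matrix(matrix, sites, cells):
    """Mean absolute difference between two cells over sites measured in both."""
    dist = {}
    for a in cells:
        for b in cells:
            diffs = [abs(matrix[s][a] - matrix[s][b])
                     for s in sites if a in matrix[s] and b in matrix[s]]
            dist[a, b] = sum(diffs) / len(diffs) if diffs else float("nan")
    return dist
```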
The top 300 ranked most variable sites in ESCs show similar methylation patterns across ESCs, as indicated by the low distance between sites.
(a) Receiver Operating Characteristic (ROC) curves showing the fraction of annotated sites (sensitivity) versus the fraction of non-annotated sites (1-specificity). Sites with high variance are more likely to belong to a given genomic context if the ROC curve is above the diagonal (e.g. H3K4me1), and less likely to belong to genomic contexts if the ROC curve is below the diagonal (e.g. CGI). (b) Different genomic contexts have different mean methylation values. (c) For most genomic contexts, variance was greatest for sites with mean methylation rates close to 50%. H3K27ac and H3K4me1 sites were among the most variable, even after accounting for mean methylation rate. CGI and p300 sites with intermediate mean methylation rates were also highly variable.
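The ROC curves in panel (a) can be built by ranking sites by decreasing variance and, at each cut-off, recording the fraction of annotated sites recovered (sensitivity) against the fraction of non-annotated sites recovered (1-specificity). A minimal Python sketch, assuming a precomputed variance per site and a set of sites annotated to one context; ties in variance are not grouped, and the function names are hypothetical.

```python
def roc_points(variances, annotated):
    """ROC points for ranking sites by decreasing variance against an annotation.

    variances: dict site -> variance; annotated: set of sites in the genomic context.
    Returns a list of (1 - specificity, sensitivity) points starting at (0, 0).
    """
    ranked = sorted(variances, key=variances.get, reverse=True)
    pos = sum(1 for s in ranked if s in annotated)
    neg = len(ranked) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both annotated and non-annotated sites")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for s in ranked:
        if s in annotated:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area above 0.5 corresponds to a curve above the diagonal (high-variance sites enriched in the context, as for H3K4me1), and an area below 0.5 to a curve below it (as for CGIs).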
(a) Summary table showing the number of raw sequences, informative CpGs and CGIs. For scRRBS, the number of CpG dinucleotides and the number of informative CGIs were calculated using the methylation calls present in the .bed file of GEO accession number GSE47343 from Guo et al.18. (b) Plots showing the number of raw sequences generated and the corresponding number of CpGs obtained in MII oocytes for both methods.
About this article
Cite this article
Smallwood, S., Lee, H., Angermueller, C. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11, 817–820 (2014). https://doi.org/10.1038/nmeth.3035