Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity

Abstract

We report scM&T-seq, a method for parallel single-cell genome-wide methylome and transcriptome sequencing that allows for the discovery of associations between transcriptional and epigenetic variation. Profiling of 61 mouse embryonic stem cells confirmed known links between DNA methylation and transcription. Notably, the method revealed previously unrecognized associations between heterogeneously methylated distal regulatory elements and transcription of key pluripotency genes.

Figure 1: Quality control and global methylation and transcriptome patterns identified in serum ESCs profiled using scM&T-seq.
Figure 2: Genome-wide associations between methylation and transcriptional heterogeneity in mouse ESCs.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus

Data deposits

This study reused some data from Smallwood et al. (ref. 3), available in the Gene Expression Omnibus under accession GSE56879.

References

  1. Shapiro, E., Biezuner, T. & Linnarsson, S. Nat. Rev. Genet. 14, 618–630 (2013).
  2. Guo, H. et al. Genome Res. 23, 2126–2135 (2013).
  3. Smallwood, S.A. et al. Nat. Methods 11, 817–820 (2014).
  4. Farlik, M. et al. Cell Rep. 10, 1386–1397 (2015).
  5. Levsky, J.M., Shenoy, S.M., Pezo, R.C. & Singer, R.H. Science 297, 836–840 (2002).
  6. Yan, L. et al. Nat. Struct. Mol. Biol. 20, 1131–1139 (2013).
  7. Macaulay, I.C. et al. Nat. Methods 12, 519–522 (2015).
  8. Dey, S.S., Kester, L., Spanjaard, B., Bienko, M. & van Oudenaarden, A. Nat. Biotechnol. 33, 285–289 (2015).
  9. Schübeler, D. Nature 517, 321–326 (2015).
  10. Jones, P.A. Nat. Rev. Genet. 13, 484–492 (2012).
  11. Singer, Z.S. et al. Mol. Cell 55, 319–331 (2014).
  12. Kalmar, T. et al. PLoS Biol. 7, e1000149 (2009).
  13. Chambers, I. et al. Nature 450, 1230–1234 (2007).
  14. Singh, A.M., Hamazaki, T., Hankowski, K.E. & Terada, N. Stem Cells 25, 2534–2542 (2007).
  15. Torres-Padilla, M.E. & Chambers, I. Development 141, 2173–2181 (2014).
  16. Ficz, G. et al. Cell Stem Cell 13, 351–359 (2013).
  17. Klein, A.M. et al. Cell 161, 1187–1201 (2015).
  18. Kolodziejczyk, A.A. et al. Cell Stem Cell 17, 471–485 (2015).
  19. Habibi, E. et al. Cell Stem Cell 13, 360–369 (2013).
  20. Stadler, M.B. et al. Nature 480, 490–495 (2011).
  21. Lee, H.J., Hore, T.A. & Reik, W. Cell Stem Cell 14, 710–719 (2014).
  22. Papp, B. & Plath, K. EMBO J. 31, 4255–4257 (2012).
  23. Whyte, W.A. et al. Cell 153, 307–319 (2013).
  24. Krueger, F. & Andrews, S.R. Bioinformatics 27, 1571–1572 (2011).
  25. Wu, T.D. & Nacu, S. Bioinformatics 26, 873–881 (2010).
  26. Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).
  27. Trapnell, C. et al. Nat. Biotechnol. 28, 511–515 (2010).
  28. Bourgon, R., Gentleman, R. & Huber, W. Proc. Natl. Acad. Sci. USA 107, 9546–9551 (2010).

Acknowledgements

We thank A. Kolodziejczyk and S.A. Teichmann for providing a list of 86 ESC pluripotency and differentiation genes (ref. 18). We thank W. Haerty for his supervision and valuable advice to T.X.H. We thank the Wellcome Trust Sanger Institute sequencing pipeline team for assistance with Illumina sequencing. We thank the members of the Sanger–European Bioinformatics Institute (EBI) Single-Cell Genomics Centre for general advice. W.R. is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC), the Wellcome Trust and the EU. G.K. is supported by the BBSRC, the UK Medical Research Council (MRC) and the EU. C.P.P. is supported by the Wellcome Trust and the MRC. T.V. is supported by the Wellcome Trust and KU Leuven (SymBioSys, PFV/10/016). H.J.L. is supported by EU Network of Excellence EpiGeneSys. O.S. is supported by the European Molecular Biology Laboratory (EMBL), the Wellcome Trust and the EU.

Author information

Contributions

C.A. performed all statistical analyses of the data. H.J.L., I.C.M., S.J.C. and S.A.S. developed the protocol and performed experiments. H.J.L., I.C.M., C.A., S.J.C., O.S., W.R. and C.P.P. interpreted the results. M.J.T. contributed to method development. T.X.H. processed RNA-seq data. F.K. processed BS-seq data. W.R., G.K., I.C.M. and T.V. contributed protocols and reagents. H.J.L., I.C.M., W.R. and T.V. conceived the project. W.R., O.S., T.V. and G.K. jointly supervised the project. O.S., H.J.L., S.J.C., W.R. and I.C.M. wrote the paper with input from all other authors. Names of authors who contributed equally to this work are ordered alphabetically on the first page.

Corresponding authors

Correspondence to Thierry Voet or Gavin Kelsey or Oliver Stegle or Wolf Reik.

Ethics declarations

Competing interests

W.R. is a consultant and shareholder of Cambridge Epigenetix.

Integrated supplementary information

Supplementary Figure 1 Detailed flow chart of the scM&T-seq protocol.

Single cells are collected and lysed before poly-A RNA is captured on magnetic beads and physically separated from DNA. Amplified cDNA is generated from mRNA on beads whilst DNA is bisulfite converted and Illumina sequencing libraries are prepared from both components in parallel.

Supplementary Figure 2 Quality metrics of scRNA-seq data obtained from mouse ESCs profiled using scM&T-seq.

(a,b) Number of genes detected (y-axis) as a function of the expression cutoff (x-axis). In each cell, between 4,000 and 8,000 genes were expressed at TPM > 1 (dashed line at x = 1). High-quality cells generally have about 5,000 genes detectable at a cutoff of TPM > 1, indicating high quality among the 61 serum ESCs (or the 14 2i ESCs). (c,d) Distribution of Pearson correlation coefficients calculated pairwise across the 61 serum ESCs (or the 14 2i ESCs). The observed correlation coefficients were mostly between 0.7 and 0.99, indicating a high degree of technical consistency in the measured transcriptomes of the cells considered and attesting to the high quality of the scRNA-seq data.
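The two quality metrics in this caption can be sketched in a few lines. The following is an illustrative example only, not the authors' pipeline: the function names, the toy TPM values, and the choice to correlate raw (rather than log-transformed) TPM values are all assumptions made here.

```python
from itertools import combinations
from math import sqrt


def genes_detected(tpm, cutoff=1.0):
    """Count genes whose TPM exceeds the cutoff in a single cell."""
    return sum(1 for value in tpm if value > cutoff)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_pearson(cells):
    """Correlate every pair of cells; keys are (cell_i, cell_j) tuples."""
    return {
        (i, j): pearson(cells[i], cells[j])
        for i, j in combinations(sorted(cells), 2)
    }


# Hypothetical toy data: TPM values for four genes in three cells.
cells = {
    "cell_A": [0.0, 2.0, 5.0, 10.0],
    "cell_B": [0.5, 4.5, 10.5, 20.5],
    "cell_C": [3.0, 0.0, 1.0, 8.0],
}

for name, tpm in cells.items():
    print(name, "genes detected at TPM > 1:", genes_detected(tpm))

for pair, r in pairwise_pearson(cells).items():
    print(pair, round(r, 3))
```

On real data, the per-cell gene counts would be compared against the roughly 5,000-gene benchmark mentioned in the caption, and the distribution of pairwise coefficients would be plotted as a histogram.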

Supplementary Figure 3 Quality metrics of single-cell methylomes in serum ESCs profiled using alternative protocols.

Shown are quality metrics for the scM&T-seq protocol used to profile 20 serum ESCs, compared with scBS-seq (Smallwood et al. 2014) used to profile 20 serum cells. (a) Read mapping efficiency. (b) Read duplication rate. (c) Genome-wide CpG and CHH methylation rate per cell. (d) Analysis of representation bias for different genomic contexts. (e) FASTQC report of adapter content from one representative single-cell bisulfite library (Read 1 of cell B06). A large proportion of sequenced fragments are concatemers of the primer used in first-strand synthesis, which substantially limits the alignment rates of these libraries. It may be possible to improve mapping efficiencies by reducing oligo concentrations or reaction times, but this is likely to result in reduced genomic coverage. Source data

Supplementary Figure 4 Methylation coverage in different genomic contexts.

Shown is the percentage of genomic contexts of different classes (y-axis) that are covered for an increasing number of minimum cells (x-axis), considering both scBS-seq (Smallwood et al. 2014, green) and scM&T-seq (blue). Note that the total number of serum cells is 20 for scBS-seq and 61 for scM&T-seq. Source data
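A coverage curve of this kind reduces to counting, for each region, how many cells cover it. The sketch below is a minimal illustration under assumptions made here: `coverage_curve` is a hypothetical helper, and each region is represented simply by the set of cells in which it has at least one covered CpG.

```python
def coverage_curve(covered, n_cells):
    """Percentage of regions covered in at least k cells, for k = 1..n_cells.

    covered: dict mapping a region name to the set of cell IDs covering it.
    """
    counts = [len(cell_ids) for cell_ids in covered.values()]
    total = len(counts)
    return [
        100.0 * sum(1 for c in counts if c >= k) / total
        for k in range(1, n_cells + 1)
    ]


# Hypothetical toy input: four regions profiled in three cells.
covered = {
    "region_1": {"a", "b", "c"},
    "region_2": {"a"},
    "region_3": set(),
    "region_4": {"a", "b"},
}

print(coverage_curve(covered, n_cells=3))  # [75.0, 50.0, 25.0]
```

Because the two protocols were applied to different numbers of serum cells (20 and 61), curves computed this way are only comparable at matched values of the minimum cell count.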

Supplementary Figure 5 Genome-wide methylation coverage.

Shown is the percentage of genome-wide 10kb, 5kb, and 1kb windows covered (y-axis) by an increasing minimum number of cells (x-axis), for scBS-seq (Smallwood et al. 2014, green) and scM&T-seq (blue). Note that the total number of serum cells is 20 for scBS-seq and 61 for scM&T-seq. Source data

Supplementary Figure 6 Hierarchical clustering of DNA-methylation profiles generated by scM&T-seq and scBS-seq.

Shown is a joint hierarchical clustering of 61 serum and 16 2i cells profiled using scM&T-seq and 20 serum and 12 2i ESCs profiled by scBS-seq (Smallwood et al. 2014), together with corresponding synthetic bulk samples and an independent bulk BS-seq sample from serum ESCs (Ficz et al. 2013). The clustering analysis was performed on gene body methylation of the 500 genes with the largest epigenome heterogeneity. Source data

Supplementary Figure 7 Correlation between single-cell methylomes and the methylome of a bulk cell population.

Shown is a scatter plot relating bulk gene-body methylation (Ficz et al. 2013) on the x-axis to synthetic bulk estimates of gene-body methylation derived using either scBS-seq (Smallwood et al. 2014, green) or scM&T-seq (blue) on the y-axis. Synthetic bulk methylation profiles are derived from averages of the single-cell methylation profiles. The true bulk methylation profile is concordant with both single-cell profiles; the scM&T-seq bulk estimates correlate slightly better (R=0.77) than the scBS-seq bulk estimates (R=0.69). Source data
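A synthetic bulk profile of the kind described here can be sketched as a per-gene mean over cells. This is an assumption-laden illustration: the caption says only that single-cell profiles are averaged, so the choice to average per-cell methylation rates (rather than pool reads) and to skip cells without coverage (encoded as `None`) is made here for the example, and `synthetic_bulk` is a hypothetical name.

```python
def synthetic_bulk(profiles):
    """Average per-cell methylation rates gene by gene.

    profiles: list of per-cell lists, one rate in [0, 1] per gene,
    with None marking genes that were not covered in that cell.
    Genes covered in no cell stay None in the result.
    """
    n_genes = len(profiles[0])
    bulk = []
    for gene in range(n_genes):
        values = [cell[gene] for cell in profiles if cell[gene] is not None]
        bulk.append(sum(values) / len(values) if values else None)
    return bulk


# Hypothetical toy input: three cells, three genes.
profiles = [
    [1.0, 0.0, None],
    [0.5, None, None],
    [0.0, 1.0, None],
]

print(synthetic_bulk(profiles))  # [0.5, 0.5, None]
```

The resulting vector would then be correlated against the true bulk profile, restricted to genes for which both values are available.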

Supplementary Figure 8 Principal-component analysis of gene-body methylation and gene expression in serum-grown ESCs.

Shown are projections onto the first two principal components (left) alongside the percentage of variance explained by individual components (right) for both gene expression levels (a) and gene body methylation (b). Cells are color-coded based on clustering obtained using gene expression values, showing that the methylation principal components partially recapitulate the structure in the expression data. Source data

Supplementary Figure 9 Scatter-plot matrix of principal components from methylation and gene expression profiles.

Shown are scatter plots between individual principal components of gene expression levels (y-axis) and corresponding gene body methylation (x-axis), using 61 serum cells profiled using scM&T-seq. Cells are color coded as in Supplementary Fig. 8. There is a strong correlation between the second principal component of DNA methylation and the corresponding component from gene expression, suggesting shared axes of variation between transcriptome and methylome profiles. Source data

Supplementary Figure 10 Clustering analysis of transcriptome and methylation data from 61 serum ESCs.

Shown are heatmaps for the gene body methylation (left) and gene expression profiles (right) using the 300 most heterogeneous genes (based on gene expression). The order of genes was taken from an individual clustering analysis based on gene methylation whereas cells were clustered separately either using DNA methylation or expression data, showing unlinked clusters (colored clusters). The bar plots in the center show the heterogeneity in DNA methylation (left) and gene expression (right). Source data

Supplementary Figure 11 Bootstrap robustness analysis of the gene-specific correlation analysis.

Shown are the absolute (a) and relative (b) reductions in the number of significant methylation-expression associations for different genomic contexts, as well as the root mean squared error of Pearson’s correlation coefficient (c), when considering either the full dataset or bootstrapped samples for the methylation-RNA correlation analysis. Bootstrap samples were obtained from independent draws of 60%, 70%, or 80% of the total set of cells. As expected, a reduction in the number of analyzed cells resulted in reduced power to detect significant associations (a, b). Overall, only a relatively small number of linkages were affected and the concordance with the full dataset remained high (c). Source data
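The robustness check in panel (c) can be illustrated with a small resampling loop. This is a hedged sketch, not the published analysis: it assumes that each draw is a subsample of cells taken without replacement, and the function name, number of draws, seed, and toy values are all hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def subsample_rmse(meth, expr, fraction, n_draws=20, seed=0):
    """RMSE between subsampled and full-data per-gene correlations.

    meth, expr: dict mapping gene -> per-cell values in the same cell order.
    fraction: share of cells kept in each draw (e.g. 0.6, 0.7, 0.8).
    """
    rng = random.Random(seed)
    n_cells = len(next(iter(meth.values())))
    k = min(n_cells, max(3, round(fraction * n_cells)))
    full = {gene: pearson(meth[gene], expr[gene]) for gene in meth}
    squared_errors = []
    for _ in range(n_draws):
        idx = rng.sample(range(n_cells), k)  # cells kept in this draw
        for gene in meth:
            r = pearson([meth[gene][i] for i in idx],
                        [expr[gene][i] for i in idx])
            squared_errors.append((r - full[gene]) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: two genes measured in ten cells.
meth = {
    "gene1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    "gene2": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.05],
}
expr = {
    "gene1": [9.0, 8.5, 8.7, 6.0, 6.5, 4.0, 4.2, 2.0, 1.5, 1.0],
    "gene2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
}

for fraction in (0.6, 0.7, 0.8):
    print(fraction, subsample_rmse(meth, expr, fraction))
```

A gene whose values are constant within a draw would make the correlation undefined; the toy data avoid this, but real data would need such draws to be skipped or handled explicitly.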

Supplementary Figure 12 Correlation coefficients for associations between DNA-methylation profiles in alternative genomic contexts and gene expression levels.

Shown are boxplots of the correlation coefficient (Pearson r) between DNA methylation in different genomic contexts and corresponding gene expression levels (see Supplementary Table 2). Source data

Supplementary Figure 13 Volcano plots for association tests between DNA-methylation profiles in alternative genomic contexts and gene expression levels.

For each context, shown is the correlation coefficient (Pearson r, x-axis) versus the adjusted p-value (Benjamini-Hochberg adjustment; y-axis). The blue horizontal line corresponds to the 10% FDR significance level. Each dot corresponds to a gene, and the dot size corresponds to the adjusted p-value of the association test. Genes colored in red correspond to known pluripotency genes (Supplementary Table 5). The vertical orange line denotes the average correlation coefficient across all genes for a given annotation. Source data
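The Benjamini-Hochberg adjustment and the 10% FDR threshold mentioned in this caption can be sketched as follows. This is the standard step-up procedure written out for illustration; the function name and the toy p-values are assumptions, and the code is not taken from the supplementary software.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from the association tests.
pvalues = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvalues)
significant = [q <= 0.10 for q in adjusted]  # 10% FDR threshold

print(adjusted)
print(significant)  # [True, True, True, False]
```

In a volcano plot, each gene's adjusted value would be plotted against its Pearson r, with the horizontal line drawn at the 0.10 threshold used above.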

Supplementary Figure 14 Comparison of results of cell-specific correlation analysis with known covariates (mean CpG methylation rate).

Supplementary Figure 15 Comparison of cell-specific correlation analysis with known covariates (CpG coverage).

For alternative genomic contexts, shown are scatter plots between cell-specific methylation-expression correlation coefficients and the (technical) CpG coverage in the corresponding cell. The lack of associations suggests that technical factors do not drive the heterogeneity in the coupling between methylation and expression between cells. Source data

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Table 3 (PDF 2789 kb)

Supplementary Table 1

scRNA-seq and scBS-seq quality metrics. (XLSX 119 kb)

Supplementary Table 2

Genomic contexts considered for the methylation–gene expression association analyses. (XLSX 9 kb)

Supplementary Table 4

Gene-level results of the association tests between DNA-methylation variation in alternative genomic contexts and gene expression variation. (XLSX 21480 kb)

Supplementary Table 5

List of 86 literature-derived pluripotency genes. (XLS 33 kb)

Supplementary Table 6

Summary statistics obtained for the cell-specific association analysis correlating the methylome and the transcriptome in individual cells. (XLSX 51 kb)

Supplementary Software

scMT-seq software (ZIP 11 kb)


Cite this article

Angermueller, C., Clark, S., Lee, H. et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods 13, 229–232 (2016). https://doi.org/10.1038/nmeth.3728
