Protein–DNA interactions are critical to the regulation of gene expression, but it remains challenging to define how cell-to-cell heterogeneity in protein–DNA binding influences gene expression variability. Here we report a method for the simultaneous quantification of protein–DNA contacts by combining single-cell DNA adenine methyltransferase identification (DamID) with messenger RNA sequencing of the same cell (scDam&T-seq). We apply scDam&T-seq to reveal how genome–lamina contacts or chromatin accessibility correlate with gene expression in individual cells. Furthermore, we provide single-cell genome-wide interaction data on a polycomb-group protein, RING1B, and the associated transcriptome. Our results show that scDam&T-seq is sensitive enough to distinguish mouse embryonic stem cells cultured under different conditions and their different chromatin landscapes. Our method will enable the analysis of protein-mediated mechanisms that regulate cell-type-specific transcriptional programs in heterogeneous tissues.
The sequencing data from this study are available from the Gene Expression Omnibus, accession number GSE108639.
All computational code used for this study is available upon request.
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Kind, J. et al. Genome-wide maps of nuclear lamina interactions in single human cells. Cell 163, 134–147 (2015).
Flyamer, I. M. et al. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature 544, 110–114 (2017).
Stevens, T. J. et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature 544, 59–64 (2017).
Cusanovich, D. A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Jin, W. et al. Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528, 142–146 (2015).
Guo, H. et al. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23, 2126–2135 (2013).
Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014).
Farlik, M. et al. Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics. Cell Rep. 10, 1386–1397 (2015).
Mooijman, D. et al. Single-cell 5hmC sequencing reveals chromosome-wide cell-to-cell variability and enables lineage reconstruction. Nat. Biotechnol. 34, 852–856 (2016).
Zhu, C. et al. Single-cell 5-formylcytosine landscapes of mammalian early embryos and ESCs at single-base resolution. Cell Stem Cell 20, 720–731 (2017).
Wu, X. et al. Simultaneous mapping of active DNA demethylation and sister chromatid exchange in single cells. Genes Dev. 31, 511–523 (2017).
Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nat. Biotechnol. 33, 1165–1172 (2015).
Dey, S. et al. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 33, 285–289 (2015).
Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
Hou, Y. et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Res. 26, 304–319 (2016).
Angermueller, C. et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat. Methods 13, 229–232 (2016).
Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9, 781 (2018).
van Steensel, B. et al. Chromatin profiling using targeted DNA adenine methyltransferase. Nat. Genet. 27, 304–308 (2001).
Vogel, M. J. et al. Detection of in vivo protein–DNA interactions using DamID in mammalian cells. Nat. Protoc. 2, 1467–1478 (2007).
Kind, J. et al. Single-cell dynamics of genome–nuclear lamina interactions. Cell 153, 178–192 (2013).
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
Meuleman, W. et al. Constitutive nuclear lamina–genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res. 23, 270–281 (2013).
Monkhorst, K. et al. X inactivation counting and choice is a stochastic process: evidence for involvement of an X-linked activator. Cell 132, 410–421 (2008).
Nishimura, K. et al. An auxin-based degron system for the rapid depletion of proteins in nonplant cells. Nat. Methods 6, 917–922 (2009).
Peric-Hupkes, D. et al. Molecular maps of the reorganization of genome–nuclear lamina interactions during differentiation. Mol. Cell 38, 603–613 (2010).
Aughey, G. N. et al. CATaDa reveals global remodelling of chromatin accessibility during stem cell differentiation in vivo. eLife 7, e32341 (2018).
Schones, D. E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).
Valouev, A. et al. Determinants of nucleosome organization in primary human cells. Nature 474, 516–520 (2011).
Boyle, A. P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008).
Wang, H. et al. Role of histone H2A ubiquitination in polycomb silencing. Nature 431, 873–878 (2004).
Zylicz, J. J. et al. The implication of early chromatin changes in X chromosome inactivation. Cell 176, 182–197 (2019).
Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572.e524 (2017).
Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).
Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell cycle progression. Cell 132, 487–498 (2008).
Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
Lun, A. T. et al. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Lun, A. T. et al. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res 5, 2122 (2016).
Boedigheimer, M. J. et al. Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories. BMC Genomics 9, 285 (2008).
Johnson, W. E. et al. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
Knijnenburg, T. A. et al. Multiscale representation of genomic signals. Nat. Methods 11, 689–694 (2014).
Robinson, M. D. et al. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
We thank the members of the Kind, Dey and van Oudenaarden laboratories for their comments on the manuscript and J. Gribnau (Erasmus UMC) for kindly providing the 129/Sv:CAST/EiJ mESCs and for advice on differentiation. We also thank B. de Barbanson and J. Yeung for suggestions regarding computational analyses and statistics, R. van der Linden for FACS and M. Muraro and L. Kester for input on the scDam&T-seq technique. S.S.D. and A.C. received computational support from the Center of Scientific Computing at UCSB based on funding from NSF MRSEC (DMR-1720256) and NSF CNS-1725797. This work was funded by a European Research Council Starting grant (no. ERC-StG 678423-EpiID), a Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Open grant (no. 824.15.019) and an ALW/VENI grant (no. 016.Veni.181.013). The Oncode Institute is supported by the KWF Dutch Cancer Society.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
a) Comparison between the binarized (OE >= 1) single-cell maps for scDamID and scDam&T-seq (horizontal tracks; both panels show the 75 single-cell samples with the highest sample depths). Each row represents a single cell; each column a 100-kb bin along the genome. Unmappable genomic regions are indicated in red along the top of the track. b) Comparison of scDamID and scDam&T-seq CFs. CF distributions are depicted in the margins. Pearson’s r and p-value are indicated. The p-value indicates the result of a two-sided test (0 indicates a value smaller than 32-bit floating point precision, that is, 1.18e-38). c) Raw signal (RPKM values) on LAD boundaries, for both scDamID and scDam&T-seq. LAD positions were defined independently based on HT1080 cells (Genome Research 23, 270–281, 2013). d) Run-length frequencies of uninterrupted ‘OE >= 1’ runs for two Dam-LMNB1 clones (#14 and #5-5) and one Dam clone (#5-8) for both scDam&T-seq (top) and scDamID (bottom). Run-length frequencies of randomized matrices with preserved marginals (Nature communications 5, 4114, 2014) are shown in light colors. e) Pearson autocorrelation of raw signal (y-axis) vs. genomic distance (x-axis) of in silico population samples for two Dam-LMNB1 clones and one Dam clone, measured with scDam&T-seq (top) and scDamID (bottom). f) Comparison of sample complexities obtained with scDam&T-seq (dark markers) and scDamID (light markers) for Dam-LMNB1 clones and one Dam clone. Unique detected GATCs are depicted on the y-axis vs. GATC-aligning reads (including duplicates) on the x-axis. g) Overview of losses during processing of raw sequencing data in scDamID and scDam&T-seq. Bars from left to right follow the order of the processing pipeline: raw reads are first filtered on the correct adapter structure and then aligned to the human genome, after which reads that do not yield a unique alignment are filtered out, as are reads that do not align immediately adjacent to GATCs. Finally, duplicate reads are removed, on account of the haploid nature of the KBM7 cell line.
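The binarization and run-length analysis described in panels a and d can be illustrated with a minimal sketch. It assumes observed-over-expected (OE) values are already available as one list of per-bin values per cell; the function names are hypothetical and this is not the authors' code. The randomized-matrix control with preserved marginals is not reproduced here.

```python
from collections import Counter
from itertools import groupby


def binarize_oe(oe_rows, threshold=1.0):
    """Mark a bin as a contact (True) when its OE value is >= threshold."""
    return [[value >= threshold for value in row] for row in oe_rows]


def run_lengths(row):
    """Lengths of uninterrupted runs of contact bins in one cell."""
    return [sum(1 for _ in group) for is_contact, group in groupby(row) if is_contact]


def run_length_frequencies(binary_rows):
    """Pool runs across cells and return {run length: relative frequency}."""
    counts = Counter(length for row in binary_rows for length in run_lengths(row))
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}
```

Comparing these frequencies with those of a shuffled matrix would indicate whether contacts occur in longer contiguous stretches than expected by chance, which is the comparison the caption describes.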
a) Distributions of the number of unique transcripts detected using CEL-Seq (Cell 163, 134–147, 2015) and scDam&T-seq. b) Correlation in expression values (TPM, transcripts per million reads) between scDam&T-seq and CEL-Seq (left panel) and two scDam&T-seq libraries processed in parallel (right panel). Pearson’s correlation coefficient is indicated. The p-value indicates the result of a two-sided test (0 indicates a value smaller than 32-bit floating point precision, i.e. 1.18e-38). c) Correlation of the fraction of cells (passing cutoffs) in which a gene was detected between scDam&T-seq and CEL-Seq (left panel) and two scDam&T-seq libraries processed in parallel (right panel). Pearson’s correlation is indicated. The p-value indicates the result of a two-sided test (0 indicates a value smaller than 32-bit floating point precision, i.e. 1.18e-38). d) Coefficient of variation (CV) of gene expression (y-axis) vs. mean expression values (x-axis), as measured by scDam&T-seq (4 libraries across 2 KBM7 clones) and CEL-Seq. The dotted line indicates the CV of Poisson-distributed data (CV = λ^(−1/2), since the variance of a Poisson distribution equals its mean λ). e) Principal component analysis on normalized expression data obtained from CEL-Seq and scDam&T-seq (Dam-LMNB1 clone #14), where the first three principal components are shown, as well as correlation of PC1 with sample depth (number of unique transcripts detected). Numbers in parentheses indicate the fraction of data variance explained by the principal component. f) Overview of losses during processing of transcriptomic data obtained with CEL-Seq and scDam&T-seq. Bars from left to right follow the order of the processing pipeline: raw reads are aligned to the human genome, after which reads that do not yield unique alignments are filtered out, as are reads that do not match exons. Finally, duplicate reads are removed based on the UMIs. g) Fraction of transcriptomic reads mapping uniquely to gene introns, exons or intergenic loci for scDam&T-seq (top) and CEL-Seq (bottom) in KBM7 samples. Error bars indicate a 95% confidence interval of the mean (calculated by bootstrap procedure). n=315 for scDam&T-seq, n=87 for CEL-Seq. h) Relation between number of unique transcripts detected (y-axis) and number of unique GATCs detected (x-axis) with scDam&T-seq for two DamID adapter concentrations. Pearson’s r and p-values (two-sided test) are indicated. The dotted line indicates a linear regression estimate, the shaded area indicates a 95% confidence interval of regression estimates (determined by bootstrap procedure).
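Several of the captions report confidence intervals obtained by a bootstrap procedure. The sketch below shows one common variant, a percentile bootstrap for the mean. The exact procedure used in the study is not specified in this text, so the resampling scheme, the number of resamples and the function name are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper
```

Applied to per-cell fractions of reads mapping to exons, for example, the returned pair would give the lower and upper ends of an error bar such as those in panel g.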
a) Auxin-mediated control of AID-Dam (clone #c8) and AID-Dam-LMNB1 (clone #b2) cell lines. DamID PCR products of cells 24 and 48 hours after auxin washout (left). Time course and quantitative PCR analysis of auxin induction for a locus within a LAD, 0, 8, 10, 12 and 24 hours after auxin washout (right). Quantification of the m6A levels as described for the DpnII assay (Cell 153, 178–192, 2013). Dot-plot depicts the mean value of n = 2 independent experiments. b) mESC in silico population Dam-LMNB1 RPKM values projected on the starts and ends of LAD boundaries defined previously (Molecular cell 38, 603–613, 2010), for two different Dam-LMNB1 clones (#b2 and #e6) and a total of 166 single-cell samples.
a) 10-bp resolution frequency spectrum of in silico population Dam signal stratified in four regimes of increasing CTCF binding activities (corresponding to Fig. 2d). The black vertical lines indicate 174 bp. b) Distribution of 20-kb bins as a function of bulk H3K4me3 (y-axis, left) or bulk H3K36me3 (y-axis, right) ChIP-seq and in silico population Dam data (x-axis). Pearson’s r and p-values are indicated. c) Percentage of methylation (for scNMT-seq) and RPKM values (for scDam&T-seq Dam signal) at DNaseI hypersensitivity sites, related to Fig. 1d from Clark et al. (Nature communications 9, 781, 2018). Solid lines indicate the mean across single-cell samples while the shaded areas indicate the standard deviation of signals observed across single-cell samples. n=95 Dam-only single-cell samples, n=72 scNMT-seq single-cell samples. d) Number of unique genes observed with scNMT-seq and scDam&T-seq, using single-cell samples downsampled to 150,000 reads. Samples below cutoff were not considered for this analysis.
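The downsampling comparison in panel d can be sketched as follows. This illustration assumes each read has already been assigned to a gene, that reads are drawn uniformly without replacement, and that the cutoff refers to read depth. The caption does not confirm any of these details, and the function name is hypothetical.

```python
import random


def unique_genes_after_downsampling(read_genes, depth=150_000, seed=0):
    """Count distinct genes among `depth` reads drawn without replacement.

    `read_genes` is a list with one gene identifier per read. Samples with
    fewer reads than `depth` return None and are left out of the comparison.
    """
    if len(read_genes) < depth:
        return None
    rng = random.Random(seed)
    return len(set(rng.sample(read_genes, depth)))
```

Fixing the depth in this way makes gene counts comparable between methods whose libraries were sequenced to different depths.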
Supplementary Figure 5 Single-cell associations between transcription levels and variance, and Dam or Dam-LMNB1 contacts.
a) Relation between expression log2FC values and mean expression levels for Dam-LMNB1. See Fig. 3a for analysis of expression log2FC values. Solid line indicates the mean, shaded area indicates 1.96 times the standard deviation around the mean. b) As (a), but for Dam. c) Relation between expression log2FC values and expression variance-to-mean ratio for Dam-LMNB1. The variance-to-mean ratios were adjusted by controlling for mean expression levels, since the raw variance-to-mean values were not constant (nor linearly correlating) with mean expression levels (see methods for details). Solid line indicates the mean, shaded area indicates 1.96 times the standard deviation around the mean. d) As (c), but for Dam. e) Relative enrichment (min-max normalized) of several histone post-translational modifications (PTMs) in genomic regions with different CF values. f) The fraction of the genome coverage for (constitutive) cLADs, (facultative) fLADs, ciLADs and fiLADs over a range of CF values.
a) AluI signal obtained from 129/Sv:CAST/EiJ mESCs. Each row represents a single cell; each column a 100-kb bin along the genome. Red colors indicate an enrichment of signal compared to expected (OE, based on AluI-motif density) whereas blue colors indicate a depletion. The top cells with exclusive 129/Sv genomic annotations are likely a contamination of feeder cells (mouse embryonic fibroblasts). b) Dam signal from the same clone as in (a), on the maternal (129/Sv) and paternal (CAST/EiJ) alleles, and for 2i and serum, respectively. c) Example of a cell (marked with a blue X) that has no duplication of the paternal chromosome 12 (unlike the majority of the population), but harbors a duplication of the maternal chromosome 12 instead, observable in the Dam signals. This reciprocally corresponds to allele-specific transcription with approximately double the maternal level and half the paternal level, compared to the majority. d) Relation between allelic imbalance (‘bias’) of Dam signals (x-axis) and transcription (y-axis). Note that chromosomes 5, 8 and 12 (and sex chromosomes) seem frequently (partially) duplicated and were excluded from this analysis, as were single-cell samples for which there was evidence of any CNV on any autosome (see methods). Pearson’s correlation is indicated. The p-value indicates the result of a two-sided test. The solid line indicates a linear regression estimate, the shaded area indicates a 95% confidence interval of regression estimates (determined by bootstrap procedure).
Supplementary Figure 7 In silico identification of cell identities and corresponding regulatory landscapes with scDam&T-seq.
a) Normalized expression values for the top five down-regulated (left) and up-regulated (right) genes in 2i compared to serum. Box plots indicate the 25th and 75th percentile (box), and the median (line). Data points are overlaid as circles. n=61 and n=71 for serum and 2i conditions, respectively. b) Distribution of variances of differential 2i/serum accessibility between 100-kb bins within TADs (Cell 171, 557–572.e524, 2017, black line) and within randomly reordered TADs (50 iterations, green lines). P-values between variance distributions of the original and randomized TADs were calculated using a two-sided Mann-Whitney U test. Box plots indicate the 25th and 75th percentile (box), median (line) and 1.5 times the inter-quartile range (IQR) past the 25th and 75th percentiles (whiskers). c) Relation between log2 fold-change in Dam signal (RPM) values of in silico population samples (y-axis) and normalized transcription (x-axis) between 2i and serum conditions. Pearson’s correlation coefficient and p-value (two-sided test) are indicated. The solid line indicates a linear regression estimate, the shaded area indicates a 95% confidence interval of regression estimates (determined by bootstrap procedure). d) Dam OE values of in silico population samples at TSSs of down-regulated (left), upregulated (middle), or unaffected (right) genes in 2i (orange) compared to serum (blue). e) Dam OE values measured in single cells at TSSs of the top 5 down-regulated (left) and up-regulated (right) genes in 2i compared to serum. Box plots indicate the 25th and 75th percentile (box), and the median (line). Data points are overlaid as circles. f) Genome-wide correlation between RING1B ChIP-seq (bulk, input normalized) and DamID (62 cells, Dam normalized) in 100-kb bins. Spearman’s rho and p-value (two-sided test, determined by bootstrap) are indicated. Chromosomes 12, X and Y were excluded from the analysis. 
g) Transcriptional allelic bias on chromosome X (blue) compared to the allelic bias on the somatic chromosomes (green, downsampled to chromosome X coverage). Box plots indicate the 25th and 75th percentile (box), median (line) and 1.5 times the inter-quartile range (IQR) past the 25th and 75th percentiles (whiskers). Data points are overlaid as circles. h) Average allelic profiles of RING1B scDam&T-seq signal on chromosome 6 for cells that show a transcriptional bias on chromosome X towards neither allele (top), towards 129/Sv (middle), or towards CAST/EiJ (bottom). i) H2AK119Ub ChIP-seq data from GSM3267034 (Cell 176, 182–197, 2019) showing allele-specific signals upon X-inactivation (left panel). Comparison of allelic imbalance (ratio maternal over total) between DamID signals measured in cells showing either maternal or paternal X-inactivation, to H2AK119Ub allelic imbalance, per 100-kb bin. Pearson’s correlation coefficient and p-value (two-sided test) are indicated in the right panels. The solid line indicates a linear regression estimate, the shaded area indicates a 99% confidence interval of regression estimates (determined by bootstrap procedure).
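Panel i defines allelic imbalance as the ratio of maternal over total signal per 100-kb bin and compares two such profiles by Pearson correlation. A minimal sketch of that calculation is given below. The input layout (one maternal and one paternal count per bin), the handling of bins without allele-specific signal and the function names are assumptions, not the authors' implementation.

```python
import math


def allelic_imbalance(maternal, paternal):
    """Per-bin ratio maternal / (maternal + paternal); None if a bin is empty."""
    return [
        m / (m + p) if (m + p) > 0 else None
        for m, p in zip(maternal, paternal)
    ]


def pearson_r(x, y):
    """Pearson correlation over bins where both profiles have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```

A value of 0.5 corresponds to balanced signal between the two alleles, while values towards 1 or 0 indicate maternal or paternal bias, respectively.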
About this article
Cite this article
Rooijers, K., Markodimitraki, C.M., Rang, F.J. et al. Simultaneous quantification of protein–DNA contacts and transcriptomes in single cells. Nat Biotechnol 37, 766–772 (2019). https://doi.org/10.1038/s41587-019-0150-y