Propagation of clonal regulatory programs contributes to cancer development. It is poorly understood how epigenetic mechanisms interact with genetic drivers to shape this process. Here, we combine single-cell analysis of transcription and DNA methylation with a Luria–Delbrück experimental design to demonstrate the existence of clonally stable epigenetic memory in multiple types of cancer cells. Longitudinal transcriptional and genetic analysis of clonal colon cancer cell populations reveals a slowly drifting spectrum of epithelial-to-mesenchymal transcriptional identities that is seemingly independent of genetic variation. DNA methylation landscapes correlate with these identities but also reflect an independent clock-like methylation loss process. Methylation variation can be explained as an effect of global trans-acting factors in most cases. However, for a specific class of promoters—in particular, cancer–testis antigens—de-repression is correlated with and probably driven by loss of methylation in cis. This study indicates how genetic sub-clonal structure in cancer cells can be diversified by epigenetic memory.
Raw and transformed data are available from the Gene Expression Omnibus and Sequence Read Archive (accession GSE144357).
MetaCell (ref. 25) code is available from https://tanaylab.github.io/metacell/. Source code, additional metadata and a vignette exemplifying cell cycle normalization with MetaCell are available from https://github.com/tanaylab/Meir_et_al_nat_gen_2020_clonemem.
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Zwemer, L. M. et al. Autosomal monoallelic expression in the mouse. Genome Biol. 13, R10 (2012).
Iberg-Badeaux, A. et al. A transcription factor pulse can prime chromatin for heritable transcriptional memory. Mol. Cell. Biol. 37, e00372-16 (2017).
Shaffer, S. M. et al. Memory sequencing reveals heritable single cell gene expression programs associated with distinct cellular behaviors. Preprint at https://www.biorxiv.org/content/10.1101/379016v2 (2018).
Vardi, N., Levy, S., Assaf, M., Carmi, M. & Barkai, N. Budding yeast escape commitment to the phosphate starvation program using gene expression noise. Curr. Biol. 23, 2051–2057 (2013).
Shipony, Z. et al. Dynamic and static maintenance of epigenetic memory in pluripotent and somatic cells. Nature 513, 115–119 (2014).
Choi, M. et al. Epigenetic memory via concordant DNA methylation is inversely correlated to developmental potential of mammalian cells. PLoS Genet. 13, e1007060 (2017).
Arand, J. et al. Selective impairment of methylation maintenance is the major cause of DNA methylation reprogramming in the early embryo. Epigenet. Chromatin 8, 1 (2015).
Li, E., Bestor, T. H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915–926 (1992).
Liu, X. et al. UHRF1 targets DNMT1 for DNA methylation through cooperative binding of hemi-methylated DNA and methylated H3K9. Nat. Commun. 4, 1563 (2013).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Horvath, S. & Raj, K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 19, 371–384 (2018).
Cruickshanks, H. A. et al. Senescent cells harbour features of the cancer epigenome. Nat. Cell Biol. 15, 1495–1506 (2013).
Irizarry, R. A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
Klutstein, M., Nejman, D., Greenfield, R. & Cedar, H. DNA methylation in cancer and aging. Cancer Res. 76, 3446–3450 (2016).
Shih, A. H., Abdel-Wahab, O., Patel, J. P. & Levine, R. L. The role of mutations in epigenetic regulators in myeloid malignancies. Nat. Rev. Cancer 12, 599–612 (2012).
Zhou, W. et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 50, 591–602 (2018).
Maxfield, K. E. et al. Comprehensive functional characterization of cancer–testis antigens defines obligate participation in multiple hallmarks of cancer. Nat. Commun. 6, 8840 (2015).
The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. Cell 171, 305–320.e24 (2017).
Milyavsky, M. et al. Prolonged culture of telomerase-immortalized human fibroblasts leads to a premalignant phenotype. Cancer Res. 63, 7147–7157 (2003).
Jaitin, D. A. et al. Massively parallel single cell RNA-Seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
Brocks, D., Chomsky, E., Mukamel, Z., Lifshitz, A. & Tanay, A. Single cell analysis reveals dynamics of transposable element transcription following epigenetic de-repression. Preprint at https://www.biorxiv.org/content/10.1101/462853v1 (2019).
Luo, C. et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science 357, 600–604 (2017).
Baran, Y. et al. MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions. Genome Biol. 20, 206 (2019).
Olivares-Chauvet, P. et al. Capturing pairwise and multi-way chromosomal conformations using chromosomal walks. Nature 540, 296–300 (2016).
Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572.e24 (2017).
We thank O. Schwartzman for assistance with whole-exome sequencing analysis, P. Olivares-Chauvet for help with analysis of the Hi-C data, and all members of the Tanay group for fruitful discussions. We also thank E. Hagai, G. Levi-Cohen, A. Sharp and Z. Porat from the Weizmann Institute of Science flow cytometry unit. Research was supported by the European Research Council (724824 - scAssembly), Israel Science Foundation and Chan Zuckerberg Initiative. A.T. is a Kimmel investigator.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Schematic readout of the transcriptional memory test using a Luria–Delbrück design. b, Distributions of the total number of UMIs obtained per clone in different cell lines. n = number of clones profiled. c, Distributions of total UMIs obtained per cell in different cell lines. n = number of cells profiled. d, Normalized expression variability (log2(variance/mean)) per gene in single cells obtained by 10× (x-axis) and MARS-seq (y-axis). Genes with high normalized variance are annotated. Blue, cell-cycle markers. e, Normalized gene expression variance in HCT116 cells. Selected variable genes (black) and cell-cycle markers (blue) are annotated. The purple line shows a rolling-median trend. For both plots, cells are down-sampled to 6 K UMIs. f, Normalized pooled expression of 17,949 common genes in single cells obtained by 10× (x-axis) and MARS-seq (y-axis). Expression values were computed as log2 of UMIs / 10 K UMIs. The five top differential genes are annotated in red. g, Total log2 UMI counts in two MARS-seq technical replicates of 260 well-covered HCT116 clonal populations (>10 K UMIs in both replicates). Color of dots indicates first (black) or second (blue) culturing batches. h, Normalized expression of selected variable genes and gene modules in technical replicates of HCT116 clones. i-j, As in g-h, for 199 H1299 clonal populations. All replicates were covered by at least 5 K UMIs (random pairs of quadruple experiments are shown). k-l, As in g-h, for 157 A549 clonal populations where all replicates are covered by at least 5 K UMIs. m-n, As in g-h, for 57 WI38 clonal populations where all replicates are covered by at least 5 K UMIs. Three randomly selected replicates were summed to represent a single technical replicate. ρ values represent Spearman’s correlations and r values represent Pearson’s.
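The variability score in panels d and e (log2(variance/mean) per gene after down-sampling cells to a fixed UMI depth) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the random seed and the choice to drop cells below the target depth are assumptions made here.

```python
import numpy as np

def downsample_cell(counts, target, rng):
    """Draw exactly `target` UMIs without replacement from one cell's gene counts."""
    if int(counts.sum()) < target:
        raise ValueError("cell has fewer UMIs than the down-sampling target")
    return rng.multivariate_hypergeometric(counts, target)

def normalized_variability(matrix, target, seed=0):
    """Return the down-sampled matrix and log2(variance/mean) per gene.

    matrix: (n_cells, n_genes) integer UMI counts (hypothetical input layout).
    Cells with fewer than `target` UMIs are dropped (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    matrix = np.asarray(matrix, dtype=np.int64)
    kept = matrix[matrix.sum(axis=1) >= target]
    ds = np.array([downsample_cell(row, target, rng) for row in kept])
    mean = ds.mean(axis=0)
    var = ds.var(axis=0, ddof=1)
    score = np.full(ds.shape[1], np.nan)
    ok = (mean > 0) & (var > 0)          # avoid log of zero or division by zero
    score[ok] = np.log2(var[ok] / mean[ok])
    return ds, score
```

Under pure Poisson sampling the variance equals the mean, so scores near zero indicate sampling noise only and positive scores indicate over-dispersed genes.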
Extended Data Fig. 2 Identification of cell-cycle independent transcription variation of HCT116, H1299 and WI38 single cells.
a-c, Normalized pooled expression in clonal populations (x-axis) and single cells (y-axis) in WI38 (left), H1299 (center) and HCT116 (right) cells. Expression values were computed as log2 of UMIs / 10 K UMIs. Genes with high differential expression in each system are displayed as red dots and annotated. n (WI38, HCT116) = 27,052, n (H1299) = 17,698. d, Distributions of the total number of molecules per cell in inferred cell-cycle metacells of HCT116. Colors are as in Fig. 1b. e, log2 total expression signatures (left) and ratio of cell-cycle phases (right) in WI38 cells. The right panel shows only cells that were annotated as replicating in the left panel. f, As in e, for H1299 cells. g, As in e, for HCT116 cells. A full list of genes used in this assay for all cell lines is in Supplementary Table 1. h, Illustration of expression randomization in each cell according to the cell-cycle-based cell-cell similarity graph. i, Correlation of scRNA-seq gene profiles with EpCAM expression as a function of each gene’s total expression (left), with a running median shown in red. Following subtraction of the trendline, correlations are generally independent of gene sampling depth (right). j, Matrix of gene-gene correlations in HCT116 cells before (upper triangle) and after (lower triangle) cell-cycle-based randomization, showing selected cell-cycle related (Supplementary Table 1) and unrelated (Supplementary Table 3) gene modules. The number of analyzed cells is defined in Extended Data Fig. 1c. k, Maximal correlation value of each gene with another gene before (x-axis) and after (y-axis) cell-cycle-based randomization. Loss of correlation (blue dots) indicates that the co-expression patterns of this gene were independent of the cell cycle and were thus eliminated by the randomization. l-m, As in j-k, for NCI-H1299 cells. n-o, As in j-k, for WI38 cells.
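The depth adjustment described in panel i (subtracting a running-median trend of correlation against total gene expression) could look roughly like the sketch below. The window size and the function name are assumptions for illustration; the legend does not specify them.

```python
import numpy as np

def depth_adjusted_correlations(corr, total_expr, window=101):
    """Subtract a running median of `corr`, computed along genes ordered by total expression.

    corr: per-gene correlation with a reference gene (for example EpCAM).
    total_expr: per-gene total UMI count.
    window: number of neighbouring genes in the running median (hypothetical default).
    """
    corr = np.asarray(corr, dtype=float)
    order = np.argsort(total_expr)
    sorted_corr = corr[order]
    n = len(sorted_corr)
    half = window // 2
    trend_sorted = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window is truncated at the ends
        trend_sorted[i] = np.median(sorted_corr[lo:hi])
    trend = np.empty(n)
    trend[order] = trend_sorted          # map the trend back to the original gene order
    return corr - trend, trend
```

After subtraction, the residual correlations should no longer track gene sampling depth, which is the behaviour the right-hand plot of panel i reports.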
a-c, Spearman’s correlations (depth-adjusted) of HCT116 scRNA-seq gene profiles without (blue, red) and following (light blue, tomato) permutation. Bar graphs show the top positively (left) and negatively (right) correlated genes with EpCAM, VIM and IDI1, with the respective distributions of original and permuted correlations shown at the bottom. d, Normalized expression of epithelial module genes in HCT116 clones. For each gene (row), expression is divided by the maximal value observed in clones. Displayed are 233 clones (columns) covered by at least 50 K UMIs. e, Matrix showing clustered gene-gene correlations of all genes defined to maintain strong cell-cycle independent co-variation in Extended Data Fig. 2k (and summarized in Supplementary Table 2). Labels of genes related to the epithelial module shown in d are colored in green, and its anti-correlated gene Vimentin (VIM) is colored in magenta. f, As in Fig. 1j for clones, we grouped cells obtained by MARS-seq (top) and 10× (bottom) into five bins based on expression of the EpCAM gene module (Ep5 consisting of cells with the highest module expression). Bars show the mean expression of each bin for the EpCAM gene (blue) and for genes negatively enriched in EpCAM-high cells (red). Error bars represent the standard error of the binomial distribution. g, Distribution of normalized expression of Cholesterol (purple), Epithelial (antique-white) and EMT genes (red), binned and ordered according to the cell-cycle associated HCT116 metacells shown in Fig. 1b.
a, As shown in Extended Data Fig. 3e for HCT116, clustering of gene-gene correlations of all H1299 cell-cycle independent genes labeled in Supplementary Fig. 3d (and summarized in Supplementary Table 2). The numbers of cells and clones analyzed from each cell line are defined in Extended Data Fig. 1b,c. b, As in Extended Data Fig. 3d, showing normalized expression of ID and SERPINE1 gene modules in all H1299 single-cell derived clones. c, Comparison of NCI-H1299 gene-gene correlation over single cells (upper triangle) and clones (lower triangle). d, As in Supplementary Fig. 1d, showing for each gene the log2-ratio of relative expression in high SERPINE1 (top 25% H1299 cells and 20% H1299 clones) and low SERPINE1 (lower 30% cells and 40% clones) cells (x-axis) and clones (y-axis). Genes of the ID module (red dots) and of the SERPINE1 gene module (blue dots) are labeled. e, Total output (log-normalized expression) of the SERPINE1 gene module (x-axis) and ID gene module (y-axis) in cells and clones shows a bi-modal, clonally conserved population structure in the NCI-H1299 system. f, As in c, comparison of WI38 gene-gene correlation over single cells (upper triangle) and clones (lower triangle). g, As in a, clustering of gene-gene correlations of all WI38 cell-cycle independent genes labeled in Extended Data Fig. 2o. Collagen module genes are labeled in black. h, As in b, showing normalized expression of Collagen module genes in WI38 clones. i, Gene-gene correlation of the most variable genes in A549 clones. Labels of selected genes are shown on the right. j, As in Extended Data Fig. 3d, showing normalized expression of variable genes in A549 clones.
Extended Data Fig. 5 Longitudinal whole exome sequencing (WES) analysis of selected clonal populations.
a, Coverage summary of 27 whole exome sequencing (WES, see Methods) experiments. Total reads obtained per sample (orange) and median on-target coverage per base (blue) are shown. Other statistics and WES quality control are available in Supplementary Tables 12,13. b, Fraction of SNPs detected per coverage bin in different cell lines (mutational burden). Calls from all clones were aggregated per cell line. Coverage per base was obtained with the DepthOfCoverage module of GATK v3.5. c, Allele frequency (AF) distributions of detected variants in HCT116 clones sampled after 78 days (top) and 168 days (bottom). d, Spatial distribution of SNPs detected in HCT116 clones. e, Comparison of allele frequencies in five HCT116 single-cell derived clones after 78 days (x-axis) and 168 days (y-axis). If selection were strongly affecting the process, allele frequencies would not be expected to remain as stable as observed. f, Expression of marker genes in six A549 clones that were selected for exome analysis (colored in brown). g, Similar to Fig. 2c, kinship analysis of A549 clones. Rows above the columns show normalized expression of KRT18 (red) and SERPINE1 (blue) genes in each clone. h, Selection of seven NCI-H1299 clones (colored in magenta). i, Kinship analysis as in Fig. 2c for NCI-H1299 clones. j, Normalized expression of the SERPINE1 and ID modules in H1299. Single cells are represented by small grey dots. Clones profiled by WES are labeled in black and red (as in panel i). Note the concordance between the genetic and transcriptional sub-clonal structure for these cells.
Extended Data Fig. 6 scPBAT and PBAT-capture of HCT116 clones using a colon cancer oriented probe-set.
a, Distribution of methylation calls in low-depth HCT116 single-cell PBAT analysis. b, Whole genome coverage and on-target coverage for HCT116 clonal populations. Coverage = total number of methylation calls. c, Density plot of pooled average methylation of on-target regions in single cells (black line) and clonal populations (antique-white line). d, Distributions of pooled average methylation of on-target regions in clones, grouped by their respective pooled average methylation in single cells (regions with > 50 calls in single cells and > 500 calls in clones, n = 341, 69, 49, 49, 39, 32, 67, 77, 186, 595). e, Pooled average methylation of individual cells in very low (0–1%) and low (1–2%) CG-content regions. f, Trends showing the correspondence between DNA methylation and CpG content for 1,022 single cells. g, Similar to f, showing correlation with genomic replication time. h, Single-cell methylation average in regions with low CG-content (0–3%), defining classification into low, mid and high-bg cells. i, Distribution of the log2 ratio of coverage of genomic sequences in early- and late-replicating regions. The vertical dashed grey line defines the threshold for classifying single cells into G1 and S phase. j, Distribution of average methylation per cell in genome-wide low CpG regions (0–3%) for cells inferred to be in G1 and S phases in panel i. nG1 = 254, nS = 767. k, Average promoter and enhancer (see Methods) methylation in groups of single cells. For all groups, n > 6000, chi-squared test, in all cases P < 2 × 10^−16. l, Genomic spatial distribution of the colon PBAT-capture probe set. m, Number of regions covered by the probe set, stratified by genomic context. n, Distribution of methylation of covered regions in TCGA colon cancer (COAD). Shown is average methylation of CpGs that reside within (blue) and outside (orange) the PBAT-capture probe set, grouped by genomic context (for all comparisons n > 6000, two-tailed KS test: D > 0.11, P < 2 × 10^−16). o-q, Average methylation of 293 TCGA colon cancer tumors (COAD), in different ranges of CpG content.
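The G1/S classification in panel i rests on a per-cell log2 ratio of coverage in early- versus late-replicating regions compared against a threshold. A minimal sketch is given below; the pseudocount, the threshold value passed by the caller and the function name are assumptions, and the direction (higher early/late ratio means S phase) reflects the expectation that early-replicating DNA is duplicated first.

```python
import numpy as np

def classify_phase(early_cov, late_cov, threshold):
    """Label each cell G1 or S from its early/late replication coverage ratio.

    early_cov, late_cov: per-cell read (or call) counts in early- and
    late-replicating regions (hypothetical inputs).
    threshold: log2-ratio cut-off; the value used in the study is not given
    in the legend, so it must be supplied by the caller.
    """
    early = np.asarray(early_cov, dtype=float)
    late = np.asarray(late_cov, dtype=float)
    # Pseudocount of 1 guards against division by zero in sparse cells.
    log_ratio = np.log2((early + 1.0) / (late + 1.0))
    phase = np.where(log_ratio > threshold, "S", "G1")
    return log_ratio, phase
```

In practice the counts would first be scaled by the total size of the early and late region sets so that the ratio centers near zero for non-replicating cells; that scaling is left out of this sketch.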
Extended Data Fig. 7 Clonal methylation at functional regions is associated with epithelial transcriptional output in HCT116.
a, Example of the selection of clones for KNN-based normalization of DNA methylation over the clonal HCG (y-axis) and LCG (x-axis) space. Red dots indicate the K = 25 nearest neighboring clones used to normalize methylation of the selected clone (shown as a blue dot). b, Distribution of correlations between average methylation of capture regions in clones and average methylation of clones in Low-CG (LCG) loci before (grey) and after (black) KNN normalization. c, Same as b, for High-CG (HCG) loci. d, Clustering of Spearman’s cross-correlation between gene expression and normalized average methylation of capture regions over 251 HCT116 clones covered simultaneously by RNA-seq and PBAT-capture. Green annotation of genes indicates epithelial genes. e, Epithelial transcriptional output per clone (x-axis) and clonal average methylation (y-axis) in 73 capture regions defined in d as Epithelial regions (Ep. regions, top) and 62 capture regions defined in d as EMT-related regions (bottom) in 155 HCT116 clones, covered by at least 50 K UMIs and 50 K on-target methylation calls. f, As in e, showing expression of the EMT-related gene Zinc finger E-box-binding homeobox 1 (ZEB1) and methylation in Epithelial (top) and EMT (bottom) associated capture regions defined in d. g, Pooled average methylation of enhancer CpGs in EpCAM-high and -low clones, highlighting enhancers of epithelial up- (blue) or down-regulated (red) genes.
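The KNN-based normalization in panel a can be illustrated with the sketch below: each clone's regional methylation is compared with the mean over its K nearest clones in the (LCG, HCG) average-methylation plane. Standardizing the two axes, using Euclidean distance and subtracting (rather than dividing by) the neighbour mean are assumptions of this sketch, as are all names.

```python
import numpy as np

def knn_normalize(region_meth, lcg, hcg, k=25):
    """Subtract from each clone the mean regional methylation of its k nearest clones.

    region_meth: (n_clones, n_regions) average methylation per capture region.
    lcg, hcg: per-clone average methylation in low-CG and high-CG loci.
    k: number of neighbours (K = 25 in the legend).
    """
    region_meth = np.asarray(region_meth, dtype=float)
    coords = np.column_stack([lcg, hcg]).astype(float)
    coords = (coords - coords.mean(axis=0)) / coords.std(axis=0)  # put both axes on one scale
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a clone is never its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest clones
    expected = region_meth[nn].mean(axis=1)   # (n_clones, n_regions) neighbour mean
    return region_meth - expected, nn
```

The intent is that residual methylation no longer tracks the global LCG and HCG trends, matching the drop in correlation shown in panels b and c.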
a, Average methylation of EpCAM-high (blue, n = 51) and EpCAM-low (light-blue, n = 51) clones over a region of chromosome 20 (top panel: 5 kb bins; lower magnification: single CpGs). Green dots mark “cold” CpGs as defined in Fig. 3l. b, Bars indicate pooled expression levels in EpCAM-high and -low clones for genes within the genomic region shown in a. Chi-squared P values: TRIB3, RBCK1 < 2 × 10^−16, SOX12 = 7 × 10^−6, TBC1D20 = 2 × 10^−5, C20orf54 = 1.3 × 10^−3, CSNK2A1 = 3 × 10^−3. c, Distribution of deviation from persistency (blue trend in Fig. 3l, see Methods) of enhancer CpGs. Ep-high and Ep-low represent CpGs with differential methylation of 0.1 or higher in EpCAM-high vs. -low clones. nother = 767, nEp.high = 122, nEp.low = 152. Two-tailed KS test, Ep-high: D = 0.2, P = 4 × 10^−4. Ep-low: D = 0.16, P = 3 × 10^−3. d, Inter-clonal variance (Fig. 3l) for enhancer CpGs, colored as in c. Green, epithelial-related CpGs in chr20 as in panel a. e, Schematic of the screen for GEMINIs. f, Bars indicate each clone’s LCG average methylation, color-coded by the number of GEMINIs de-repressed in that clone. Two-tailed KS test (D = 0.22, P = 0.039), comparing LCG average methylation for clones with and without GEMINIs (excluding VIM-high clones). g, Coverage depth of the DKO transcriptome. h, Statistics of single-cell PBAT methylation profiles of 974 DKO cells (orange boxes) and 1,022 WT cells (blue boxes). i, Pooled average methylation in WT (blue line) and DKO (orange line) cells, as a function of genomic CpG content. j, Distribution of pooled methylation of DKO and WT HCT116 cells (x-axis), showing 1,988 CpGs with n > 8 calls in both pools. k, Normalized pooled expression (log2 UMI / 10 K UMIs) in DKO clonal populations (x-axis) and DKO single cells (y-axis). Genes with the highest differential expression are highlighted. l-m, Reproducibility of technical replicates in MARS-seq for 203 DKO clones, showing total UMI counts (log2 transformed) in two MARS-seq technical replicates and normalized expression of selected variable genes and gene modules. Rho (ρ) represents Spearman’s correlation coefficient and r represents Pearson’s. n-p, Cell-cycle analysis of 3,371 single DKO cells, as shown in Fig. 1b and Extended Data Fig. 2d for wild-type HCT116.
Extended Data Fig. 9 Identification of cell-cycle independent transcriptional variance in DKO HCT116 single cells.
a, Maximal correlation values of each gene with another gene before (x-axis) and after (y-axis) cell-cycle based randomization of DKO cells (blue dots indicate genes maintaining cell-cycle variance, for full list see Supplementary Table 2). b, Matrix of gene-gene Spearman’s correlations in DKO cells before (upper triangle) and after (lower triangle) cell-cycle based randomization. c, Distribution of gene module expression per cell, classified by cell-cycle associated metacells in DKO (as defined in Extended Data Fig. 8n-p). d, Matrix of gene-gene Spearman’s correlations in DKO single cells (upper triangle) and DKO clones (lower triangle), indicating cell-cycle independent gene modules summarized in Supplementary Table 3. e, Genes with highest (blue bars) and lowest (red bars) expression change between EpCAM high and low DKO clones. f, Comparing gene expression fold-change of EpCAM high and low clones in HCT116 WT (x-axis) and DKO (y-axis).
a, Schematic of the screen for TAD de-repression. Clones can maintain deterministic repression of transcription in TADs that are “closed”. De-repression of a TAD in a clone can result in stochastic (possibly uncorrelated) de-repression of genes within it. b, Distribution of contact distances for 488 M HCT116 Hi-C contacts. c, TADs are defined between two peaks of insulation (y-axis), as exemplified here for a segment of chromosome 7. d, Log mean expression in HCT116 clones (x-axis) and TAD auto-correlation scores (y-axis, see Methods). Genes showing statistically significant (positive) auto-correlation are labeled (light blue for FDR < 0.25 and dark blue for FDR < 0.05; for the full list see Supplementary Table 9). e, We computed the correlation of expression between all genes and all TADs, and for each gene we measured the rank of its TAD auto-correlation. Shown is the distribution of these TAD auto-correlation ranks (a value of 1 means the gene’s own TAD was the most correlated to it). f, Cumulative distribution of TAD auto-correlations in HCT116 clones, for observed data (black line) and for shuffled data (randomly assigning genes to TADs, grey line). g-i, Same screen as in d-f, for A549 clones. j-l, Same screen as in d-f, for WI38 clones. m, Fold-change in expression of genes in HB-high vs. HB-low HCT116 clones (y-axis), plotted against expression in HB-low clones (x-axis, left) and against fold-change in HB-high vs. HB-low single cells (x-axis, right). n, Expression across HCT116 clones of selected genes that correlate with expression of the HB gene module (x-axis), compared to expression from genes in the beta-globin chromosomal domains (y-axis).
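The rank statistic in panel e can be sketched as follows: correlate each gene with the pooled expression of every TAD and record where its own TAD ranks. The exact scoring is given in the paper's Methods, not here, so several choices below are assumptions: pooling by summation, leaving the gene out of its own TAD's pool, Pearson correlation, and all function names.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation, returning 0.0 when either vector is constant."""
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

def tad_autocorrelation_ranks(expr, tad_of_gene):
    """Return each gene's correlation with its own TAD and the rank of that correlation.

    expr: (n_clones, n_genes) normalized expression (hypothetical layout).
    tad_of_gene: TAD label per gene.
    Rank 1 means the gene's own TAD is the TAD most correlated with it.
    """
    expr = np.asarray(expr, dtype=float)
    tad_of_gene = np.asarray(tad_of_gene)
    tads = np.unique(tad_of_gene)
    tad_expr = np.column_stack([expr[:, tad_of_gene == t].sum(axis=1) for t in tads])
    n_genes = expr.shape[1]
    auto = np.empty(n_genes)
    ranks = np.empty(n_genes, dtype=int)
    for g in range(n_genes):
        own = int(np.searchsorted(tads, tad_of_gene[g]))
        pooled = tad_expr.copy()
        pooled[:, own] -= expr[:, g]      # leave the gene out of its own TAD pool
        cors = np.array([pearson(expr[:, g], pooled[:, j]) for j in range(len(tads))])
        auto[g] = cors[own]
        ranks[g] = 1 + int(np.sum(cors > cors[own]))
    return auto, ranks
```

A shuffled control like the grey line in panel f would call the same function with `tad_of_gene` randomly permuted and compare the two distributions of `auto`.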
About this article
Cite this article
Meir, Z., Mukamel, Z., Chomsky, E. et al. Single-cell analysis of clonal maintenance of transcriptional and epigenetic states in cancer cells. Nat Genet 52, 709–718 (2020). https://doi.org/10.1038/s41588-020-0645-y