Genetic and epigenetic intra-tumoral heterogeneity cooperate to shape the evolutionary course of cancer1. Chronic lymphocytic leukaemia (CLL) is a highly informative model for cancer evolution as it undergoes substantial genetic diversification and evolution after therapy2,3. The CLL epigenome is also an important disease-defining feature4,5, and growing populations of cells in CLL diversify by stochastic changes in DNA methylation known as epimutations6. However, previous studies using bulk sequencing methods to analyse the patterns of DNA methylation were unable to determine whether epimutations affect CLL populations homogeneously. Here, to measure the epimutation rate at single-cell resolution, we applied multiplexed single-cell reduced-representation bisulfite sequencing to B cells from healthy donors and patients with CLL. We observed that the common clonal origin of CLL results in a consistently increased epimutation rate, with low variability in the cell-to-cell epimutation rate. By contrast, variable epimutation rates across healthy B cells reflect diverse evolutionary ages across the trajectory of B cell differentiation, consistent with epimutations serving as a molecular clock. Heritable epimutation information allowed us to reconstruct lineages at high resolution with single-cell data, and to apply this directly to patient samples. The CLL lineage tree shape revealed earlier branching and longer branch lengths than in normal B cells, reflecting rapid drift after the initial malignant transformation and a greater proliferative history. Integration of single-cell bisulfite sequencing analysis with single-cell transcriptomes and genotyping confirmed that genetic subclones mapped to distinct clades, as inferred solely on the basis of epimutation information.
Finally, to examine potential lineage biases during therapy, we profiled serial samples during ibrutinib-associated lymphocytosis, and identified clades of cells that were preferentially expelled from the lymph node after treatment, marked by distinct transcriptional profiles. The single-cell integration of genetic, epigenetic and transcriptional information thus charts the lineage history of CLL and its evolution with therapy.
Multiplexed scRRBS and single-cell Smart-seq2 datasets have been deposited to the NCBI Gene Expression Omnibus (GEO) under accession number GSE109085. ChIP–seq datasets have been deposited to the NCBI GEO under accession number GSE119103. Other data are available from the corresponding author upon reasonable request.
Flavahan, W. A., Gaskell, E. & Bernstein, B. E. Epigenetic plasticity and the hallmarks of cancer. Science 357, eaal2380 (2017).
Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).
Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).
Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868–880 (2018).
Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253–264 (2016).
Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).
Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 44, 1207–1214 (2012).
Shipony, Z. et al. Dynamic and static maintenance of epigenetic memory in pluripotent and somatic cells. Nature 513, 115–119 (2014).
Shibata, D. Mutation and epigenetic molecular clocks in cancer. Carcinogenesis 32, 123–128 (2011).
Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat. Genet. 43, 768–775 (2011).
Hudson, R. R. & Kaplan, N. L. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985).
Chen, T. L. et al. NF-κB p50 (nfkb1) contributes to pathogenesis in the Eμ-TCL1 mouse model of chronic lymphocytic leukemia. Blood 130, 376–379 (2017).
Arsura, M., Hofmann, C. S., Golay, J., Introna, M. & Sonenshein, G. E. A. A-myb rescues murine B-cell lymphomas from IgM-receptor-mediated apoptosis through c-myc transcriptional regulation. Blood 96, 1013–1020 (2000).
Qu, Y. et al. Cancer-specific changes in DNA methylation reveal aberrant silencing and activation of enhancers in leukemia. Blood 129, e13–e25 (2017).
Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
Rendeiro, A. F. et al. Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat. Commun. 7, 11938 (2016).
Sekimata, M. & Homma, Y. Sequence-specific transcriptional repression by an MBD2-interacting zinc finger protein MIZF. Nucleic Acids Res. 32, 590–597 (2004).
Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).
Caron, G. et al. Cell-cycle-dependent reconfiguration of the DNA methylome during terminal differentiation of human B cells into plasma cells. Cell Reports 13, 1059–1071 (2015).
Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).
Shlush, L. I. et al. Cell lineage analysis of acute leukemia relapse uncovers the role of replication-rate heterogeneity and microsatellite instability. Blood 120, 603–612 (2012).
Wang, L. et al. Transcriptomic characterization of SF3B1 mutation reveals its pleiotropic effects in chronic lymphocytic leukemia. Cancer Cell 30, 750–763 (2016).
Herman, S. E. M. et al. Ibrutinib-induced lymphocytosis in patients with chronic lymphocytic leukemia: correlative analyses from a phase II study. Leukemia 28, 2188–2196 (2014).
Satterwhite, E. et al. The BCL11 gene family: involvement of BCL11A in lymphoid malignancies. Blood 98, 3413–3420 (2001).
Herndon, T. M. et al. Direct in vivo evidence for increased proliferation of CLL cells in lymph nodes compared to bone marrow and peripheral blood. Leukemia 31, 1340–1347 (2017).
Phelan, J. D. et al. A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature 560, 387–391 (2018).
Herishanu, Y. et al. The lymph node microenvironment promotes B-cell receptor signaling, NF-κB activation, and tumor proliferation in chronic lymphocytic leukemia. Blood 117, 563–574 (2011).
Dadashian, E. L. et al. TLR signaling is activated in lymph-node resident CLL cells and is only partially inhibited by ibrutinib. Cancer Res. 79, 360–371 (2019).
Siegmund, K. D., Marjoram, P., Woo, Y.-J., Tavaré, S. & Shibata, D. Inferring clonal expansion and cancer stem cell dynamics from DNA methylation patterns in colorectal cancers. Proc. Natl Acad. Sci. USA 106, 4828–4833 (2009).
Messmer, B. T. et al. In vivo measurements document the dynamic cellular kinetics of chronic lymphocytic leukemia B cells. J. Clin. Invest. 115, 755–764 (2005).
Rassenti, L. Z. et al. Relative value of ZAP-70, CD38, and immunoglobulin mutation status in predicting aggressive disease in chronic lymphocytic leukemia. Blood 112, 1923–1930 (2008).
Guo, H. et al. Profiling DNA methylome landscapes of mammalian cells with single-cell reduced-representation bisulfite sequencing. Nat. Protocols 10, 645–659 (2015).
Guo, H. et al. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23, 2126–2135 (2013).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protocols 9, 171–181 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Van den Berge, K. et al. Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biol. 19, 24 (2018).
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
Van den Berge, K., Soneson, C., Robinson, M. D. & Clement, L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 18, 151 (2017).
Illingworth, R. S. et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet. 6, e1001134 (2010).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
O’Geen, H., Echipare, L. & Farnham, P. J. in Epigenetics Protocols (ed. Tollefsbol, T. O.) 265–286 (Humana Press, 2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Tajima, F. Infinite-allele model and infinite-site model in population genetics. J. Genet. 75, 27 (1996).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Ushijima, T. et al. Fidelity of the methylation pattern and its variation in the genome. Genome Res. 13, 868–874 (2003).
Biezuner, T. et al. A generic, cost-effective, and scalable cell lineage analysis platform. Genome Res. 26, 1588–1599 (2016).
Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
Bian, S. et al. Single-cell multiomics sequencing and analyses of human colorectal cancer. Science 362, 1060–1063 (2018).
de Bourcy, C. F. A. et al. Phylogenetic analysis of the human antibody repertoire reveals quantitative signatures of immune senescence and aging. Proc. Natl Acad. Sci. USA 114, 1105–1110 (2017).
Xu, J. et al. Single-cell lineage tracing by endogenous mutations enriched in transposase accessible mitochondrial DNA. eLife 8, e45105 (2019).
Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339 (2019).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Zhou, X., Shen, X.-X., Hittinger, C. T. & Rokas, A. Evaluating fast maximum likelihood-based phylogenetic programs using empirical phylogenomic data sets. Mol. Biol. Evol. 35, 486–503 (2018).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Soubrier, J. et al. The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol. Biol. Evol. 29, 3345–3358 (2012).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2017).
Robinson, D. F. & Foulds, L. R. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981).
Raval, A. et al. TWIST2 demonstrates differential methylation in immunoglobulin variable heavy chain mutated and unmutated chronic lymphocytic leukemia. J. Clin. Oncol. 23, 3877–3885 (2005).
Perez, C. A., Ott, J., Mays, D. J. & Pietenpol, J. A. p63 consensus DNA-binding site: identification, analysis and application into a p63MH algorithm. Oncogene 26, 7363–7370 (2007).
Hsiao, L.-L. et al. A compendium of gene expression in normal human tissues. Physiol. Genomics 7, 97–104 (2001).
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
We thank the Epigenomics Core Facility at Weill Cornell Medicine for technical help. R.C. is supported by Leukemia Research Foundation (LRF) and Marie Skłodowska-Curie fellowships. A.G. is supported by Broad Institute SPARC funding. D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, American Society of Hematology (ASH) Scholar Award, Pershing Square Sohn Prize for Young Investigators in Cancer Research, and the National Institutes of Health (NIH) Director’s New Innovator Award (DP2-CA239065). This work was also supported by the Starr Foundation, the Max Planck Society, Leukemia & Lymphoma Society (LLS) Translational Research Program, National Cancer Institute (R01-CA229902), and Stand Up To Cancer Innovative Research Grant (SU2C-AACR-IRG-0616).
Nature thanks Ken Duffy and the other anonymous reviewer(s) for their contribution to the peer review of this work.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Multiplexed scRRBS is an accurate and reproducible method for single-cell DNAme analysis.
a, Detailed schematic of the protocol for multiplexed scRRBS. b, Summary table of the healthy donor and CLL patient samples used in this study. c, Representative size distribution of the multiplexed scRRBS libraries assessed by Agilent Bioanalyzer before and after primer-dimer removal. The DNA fragment size in multiplexed scRRBS libraries is typically 200–1,000 bp, with some visible peaks corresponding to the MspI fragments for repeat elements, and primer-dimer contaminants (approximately 170 bp). LM, lower marker; UM, upper marker; MW, molecular-weight size marker. d, Number of CpGs observed in multiplexed scRRBS libraries across relevant genomic regions comparing multiplexed scRRBS (left) and bulk RRBS (right) assays for normal B (B01) and CLL (CLL01) cells. The enrichment in exons, promoters and CpG islands (CGIs) observed in multiplexed scRRBS libraries corresponded to approximately 40% of the total sequenced CpGs, akin to bulk RRBS assays. e, Downsampling analysis showing that around 1.7 million paired-end reads per cell provided roughly 85% of unique CpGs, with further sequencing resulting in only a marginal increase in coverage. f, Correlation of average CpG methylation across in silico merged single cells and bulk RRBS obtained from matched samples for normal B (B01, n = 40,257 CpGs) (left) and CLL (CLL01, n = 9,578 CpGs) (right) cells. P values are indicated for two-sided Pearson’s correlation test. g, Pooling individual single cells together rapidly increases the number of CpGs recovered, approaching bulk RRBS coverage with more than 48 cells. The percentages of CpG sites detected in single-cell data (blue and red for normal B and CLL cells, respectively), the in vitro pooled single-cell datasets (light blue and light red, respectively) and matched bulk RRBS libraries (striped bars) are shown. Error bars represent 95% confidence interval. h, As in g for the percentage of average CpG DNA methylation.
Single, pooled cells and bulk RRBS showed a similar percentage of CpG methylation, suggesting measured genome-wide DNAme profiles of individual cells accurately recapitulate bulk methylation profiles in the same cell type.
a, Because multiplexed scRRBS samples only approximately 10% of the DNA methylome in each cell, the set of individual CpGs covered in common shrinks as more cells are compared. Left, expected number of times a given CpG is observed across k cells (k, the number of cells, is indicated on the x axis). Right, expected number of measured CpGs given k cells. b, Biallelic coverage within a given single cell was detected in only 4.6 ± 2% of approximately 230 germline single-nucleotide polymorphisms (SNPs) available for analysis, suggesting that the observed single-cell CpG data largely represent only one of the two alleles of the near-diploid CLL genome. c, Histograms of the distribution of CpG methylation values for single normal B (blue) and CLL (red) cells and matched bulk RRBS libraries showing highly digitized patterns of DNAme in single cells (that is, CpG sites either methylated or unmethylated) in contrast to bulk RRBS, which shows intermediate DNAme values. d, Representative analysis for three non-contiguous genomic windows around the promoter region of TWIST2, previously shown to be implicated in CLL pathogenesis68. Shown from top to bottom are the annotation of the TWIST2 promoter locus with CGI sites indicated (green); the estimated methylation rate of in silico pooled single cells for healthy donors and CLL; and the CpG methylation patterns (black circles: methylated; white circles: unmethylated) of single cells. Note the higher level of DNAme percentage in CLL compared with healthy donor cells at these selected regions.
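The coverage argument in panel a can be illustrated with a short calculation. This is a minimal sketch, assuming that each cell covers any given CpG independently with probability p of about 0.1 (from the approximately 10% sampling stated above). The independence assumption, the function names and the `TOTAL_CPGS` value are illustrative and are not taken from the Methods, and the caption does not say whether the measured CpGs in the right panel are those seen in every cell or in at least one cell, so both are shown.

```python
def expected_observations(p: float, k: int) -> float:
    """Expected number of cells (out of k) in which one given CpG is covered."""
    return k * p


def expected_shared_cpgs(total_cpgs: int, p: float, k: int) -> float:
    """Expected number of CpGs covered in every one of k cells."""
    return total_cpgs * p ** k


def expected_union_cpgs(total_cpgs: int, p: float, k: int) -> float:
    """Expected number of CpGs covered in at least one of k cells."""
    return total_cpgs * (1.0 - (1.0 - p) ** k)


TOTAL_CPGS = 1_000_000  # hypothetical number of assayable CpGs
P_COVERED = 0.1         # assumed per-cell coverage probability

for k in (1, 2, 5, 10):
    print(
        k,
        round(expected_observations(P_COVERED, k), 2),
        round(expected_shared_cpgs(TOTAL_CPGS, P_COVERED, k), 2),
        round(expected_union_cpgs(TOTAL_CPGS, P_COVERED, k), 2),
    )
```

Under these assumptions the shared set falls geometrically with k while the union saturates, which is why pairwise or windowed comparisons are more practical than requiring a CpG to be covered in all cells.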
Extended Data Fig. 3 CLL epigenomes show an increased epimutation rate across all genomic regions, with low cell-to-cell variability in epimutation rates.
a, Representative analysis of the WT1 locus. The CpG island is indicated in green, along with the CpG methylation patterns (black circles denote methylated; white circles denote unmethylated) in single cells. We note that CLL cells exhibit lower cell-to-cell variation in the epimutation rate than normal B cells. b, Comparison of cell-to-cell epimutation rate difference per genomic region between CLL cells (n = 12; M-CLL (CLL01–CLL07), n = 309 pairs; U-CLL (CLL08–CLL12), n = 218 pairs) and healthy B cells (n = 5; B01–B02, B04–B06, n = 256 pairs). c, Difference in average CpG methylation per genomic region between CLL samples (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B samples (n = 6; B01–B06 (n = 666 cells)). d, Percentage of change in CpG methylation at CGIs when comparing the DNAme level of individual cells in each sample to the baseline (defined as the average DNAme level across all samples) for CLL cells (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B cells (n = 6; B01–B06 (n = 666 cells)). e, Multivariable linear regression model that accounts for potential technical confounders (bisulfite conversion rate, number of aligned reads, number of covered CpGs) in CLL samples (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B samples (n = 6; B01–B06 (n = 666 cells)). f, Single-cell epimutation rate across index-sorted normal B cells (B04, n = 96 cells; B05, n = 96 cells; B06, n = 92 cells). g, As in f for the difference in cell-to-cell epimutation rate (B04, n = 48 pairs; B05, n = 48 pairs; B06, n = 46 pairs). h, Direct comparison of difference in cell-to-cell epimutation rate between CLL cells (n = 12; M-CLL (CLL01–CLL07), n = 309 pairs; U-CLL (CLL08–CLL12), n = 218 pairs) and index-sorted B cells (n = 3; B04–B06; NBC, n = 35 pairs; loMBC, n = 35 pairs; intMBC, n = 35 pairs; hiMBC, n = 35 pairs). Box plots are as defined in Fig. 1. Error bars represent 95% confidence interval.
P values were determined by two-sided Mann–Whitney U-test (b, f–h), followed by a Bonferroni adjustment procedure (b).
a, Concordance odds ratio (COR) of the DNA methylation state between any two neighbouring CpGs as a function of their genomic distance (see Methods for details). b, Left, scaled COR (0–1) for CGIs at transcription start sites (TSS) (the B01 and CLL01 samples are shown as representative examples). Right, average rate of decay (slope of the first order fit line) in the COR for normal B samples (n = 6) and CLL samples (n = 12) for CGIs at TSS (B01–B06 (n = 666 cells; n = 48,065,000 CpGs) and CLL01–CLL12 (M-CLL, n = 619 cells, n = 38,968,846 CpGs; U-CLL, n = 436 cells, n = 37,464,310 CpGs)). c, As in b for CGIs at TSS of genes belonging to the TP53 gene set69. Healthy donor B cell samples (n = 6): n = 666 cells, n = 6,308,174 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 5,113,493 CpGs; U-CLL, n = 436 cells, n = 4,982,039 CpGs. d, As in b for CGIs at the TSS of housekeeping genes70. Healthy donor B cell samples (n = 6): n = 666 cells, n = 2,087,432 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 1,686,295 CpGs; U-CLL, n = 436 cells, n = 1,620,802 CpGs. e, Average rate of decay in the COR for normal B (n = 6) and CLL (n = 12) samples for CGIs at the TSS of genes belonging to gene sets previously reported to be affected by a high epimutation rate6. Healthy donor B cell samples (n = 6): n = 666 cells, n = 48,065,000 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 38,968,846 CpGs; U-CLL, n = 436 cells, n = 37,464,310 CpGs. Error bars represent 95% confidence interval. P values were determined by two-sided Mann–Whitney U-test.
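The idea behind panel a, an odds ratio of methylation-state concordance for neighbouring CpGs binned by genomic distance, can be sketched as follows. The Methods are not reproduced here, so this is only an assumption-laden illustration: it uses the standard odds ratio of a 2 × 2 table with a 0.5 pseudocount, and the function name, input format and bin size are hypothetical rather than the authors' implementation.

```python
from collections import defaultdict


def concordance_odds_ratio(pairs, bin_size=10, pseudocount=0.5):
    """Return {distance_bin_start: odds ratio} for neighbouring-CpG pairs.

    pairs: iterable of (distance_bp, state_a, state_b), with states coded
    1 (methylated) or 0 (unmethylated).
    """
    # counts[bin] = [both methylated, both unmethylated, only a, only b]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for distance, a, b in pairs:
        table = counts[(distance // bin_size) * bin_size]
        if a == 1 and b == 1:
            table[0] += 1
        elif a == 0 and b == 0:
            table[1] += 1
        elif a == 1:
            table[2] += 1
        else:
            table[3] += 1

    result = {}
    for start, (mm, uu, mu, um) in sorted(counts.items()):
        # The pseudocount (Haldane-Anscombe correction) avoids division by zero.
        concordant = (mm + pseudocount) * (uu + pseudocount)
        discordant = (mu + pseudocount) * (um + pseudocount)
        result[start] = concordant / discordant
    return result


# Hypothetical toy data: close pairs mostly agree, distant pairs do not.
toy_pairs = (
    [(3, 1, 1)] * 3 + [(4, 0, 0)] * 3 + [(5, 1, 0), (6, 0, 1)]
    + [(21, 1, 1), (22, 0, 0), (25, 1, 0), (27, 0, 1)]
)
print(concordance_odds_ratio(toy_pairs))
```

A decay rate such as the one in panel b could then be estimated by fitting a line to the odds ratio against distance, although the exact scaling and fit used in the paper are not specified in this legend.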
a, Frequency of four gametes according to the level of average methylation of each individual CpG site in each CLL sample (CLL01–CLL12; randomly sampled CpGs shown out of the total CpGs assessed in each CLL sample; range 156,662–2,371,498 CpGs per sample covered in >5 cells in each sample). Smooth local regression line (LOESS) is shown in red. b, Low epimutation (loEpi) CpGs are defined as being 1.5× median absolute deviation (MAD) away from the median frequency of four gametes in each DNAme window of 0.05 (range 0.1–0.9) for a given sample. Shown is a representative example of this procedure for DNAme window of 0.5–0.55 in the CLL04 patient sample. c, Percentage of low epimutation CpGs (average of 1.22% ± 0.42 (mean ± s.e.m.); range 0.04–2.9%) out of the total CpGs assessed in each CLL sample. CLL01, n = 14,711 loEpi CpGs; CLL02, n = 2,573 loEpi CpGs; CLL03, n = 25,270 loEpi CpGs; CLL04, n = 29,114 loEpi CpGs; CLL05, n = 16,603 loEpi CpGs; CLL06, n = 11,413 loEpi CpGs; CLL07, n = 19,330 loEpi CpGs; CLL08, n = 19,916 loEpi CpGs; CLL09, n = 11,440 loEpi CpGs; CLL10, n = 18,614 loEpi CpGs; CLL11, n = 7,067 loEpi CpGs; CLL12, n = 308 loEpi CpGs. d, Additional sequence logos of the DNA motifs determined to be significantly overrepresented in low epimutation CpGs (±25 bp around CpGs at promoters (TSS ± 1 kb) or at enhancers) across all CLL samples. For each motif, the E value and the TOMTOM P value are shown. See Methods for details on the de novo motif enrichment analysis and the statistical tests used. e, Median protein expression (log10(normalized intensity-based absolute quantification (iBAQ))) of transcription factors for which motifs were enriched in regions with low epimutation CpGs, confirming that the identified transcription factors are expressed at the protein level in B cells and/or haematopoietic compartments. Error bars represent 95% confidence interval. All available human proteome data from lymphoid/haematopoietic lineages are displayed71.
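Two building blocks of panels a and b can be sketched in a few lines: the four-gamete test for a pair of CpGs, and MAD-based flagging of CpGs whose four-gamete frequency lies far from the median. This is a sketch under stated assumptions: the per-CpG frequency definition and the windowing are in the Methods and are not reproduced, "away from the median" is read here as below the median, and the function names and toy data are hypothetical.

```python
from statistics import median


def has_four_gametes(cells, site_a, site_b):
    """True if all four state combinations (00, 01, 10, 11) of two CpGs
    are observed among cells that cover both sites.

    cells: list of dicts mapping CpG identifier -> 0 or 1.
    """
    observed = {
        (cell[site_a], cell[site_b])
        for cell in cells
        if site_a in cell and site_b in cell
    }
    return len(observed) == 4


def low_outliers(values, n_mads=1.5):
    """Indices of values more than n_mads * MAD below the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    threshold = centre - n_mads * mad
    return [i for i, v in enumerate(values) if v < threshold]


# Hypothetical toy data: four cells, two CpG sites.
toy_cells = [
    {"cpg1": 0, "cpg2": 0},
    {"cpg1": 0, "cpg2": 1},
    {"cpg1": 1, "cpg2": 0},
    {"cpg1": 1, "cpg2": 1},
]
print(has_four_gametes(toy_cells, "cpg1", "cpg2"))
print(low_outliers([10, 10, 10, 12, 8, 2]))
```

In practice the thresholding would be applied separately within each methylation window (0.05 wide, per the legend), so that CpGs are compared only with others of similar average methylation.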
a, Number of reads (left) and expression of IGH genes (right) in index-sorted B cells, validating our index-sorting strategy (CD27−IgM+IgD+++IgG− (NBC, n = 24 cells), CD27−IgM+IgD+IgG− (loMBC, n = 24 cells), CD27+IgM+IgD++IgG− (intMBC, n = 24 cells), and CD27+IgG+ (hiMBC, n = 23 cells)). Violin plots represent kernel density estimation showing the distribution shape of the data. b, Proportion of cells with gene expression (read count > 0) and exhibiting above-threshold DNAme. Data are mean ± s.e.m. across all genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information across the three samples (n = 1,816 genes). c, Transcriptional entropy across cells (see Methods) showing higher transcriptome entropy in CLL cells (CLL03, n = 94; CLL04, n = 92) than in healthy donor B cells (B04, n = 84) across various downsampling regimes (range 5,000–100,000; step-size of 1,000). Data are mean ± s.e.m. d, e, Single-cell transcriptional entropy (d) and epimutation rate (e) between normal CD27− B (NBC and loMBC) and CD27+ B (intMBC and hiMBC) cells. f, Left, distribution of the Spearman’s rho between expression and promoter DNAme rate (n = 3,094 genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information) in CLL04. The observed Spearman’s rho values were compared to values obtained by randomly permuting cell labels for the methylation values (see Methods). Right, heat maps of Spearman’s rank-order correlation for representative genes with positive or negative single-cell expression-methylation correlation. Scale bar represents promoter methylation and RNA read counts scaled by maximal value. g, As in f for individual normal B cells (n = 5,729 genes; n = 16 permutations; see Methods for details). h, As in g for CLL03 (n = 2,699 genes; n = 16 permutations). i, As in g for CLL04 (n = 3,094 genes; n = 16 permutations). 
j, Absolute change in Spearman’s rho when comparing matched versus scrambled DNAme and RNA single-cell data in CLL (CLL03 and CLL04) and normal B (B04) cells. From the pool of genes used in g–i, only overlapping genes (n = 951) across the three samples were used in the comparison. k, As in f for individual normal B cells (n = 2,500 most variable genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information; n = 16 permutations; see Methods for details). l, As in k for CLL03. m, As in k for CLL04. n, Absolute change in Spearman’s rho when comparing matched versus scrambled DNAme and RNA single-cell data in CLL (CLL03 and CLL04) and normal B (B04) cells. From the pool of genes used in k–m, only overlapping genes (n = 459) across the three samples were used in the comparison. o, Hydroxymethylation (5hmC) level at genes with positive correlation between expression and promoter DNA methylation (top correlated 10% of genes) compared with negatively correlated genes (top anti-correlated 10% of genes) in both normal B (B04; n = 336 and 330 genes, respectively) and CLL (CLL03 (n = 290 and 278 genes, respectively); CLL04 (n = 320 and 314 genes, respectively)) cells. Error bars represent 95% confidence interval. Published 5hmC data were used for the analysis19. Box plots are as defined in Fig. 1. P values were determined by two-sided Kolmogorov–Smirnov test (f–i, k–m), two-sided Wilcoxon signed-rank test (j, n) or two-sided Welch’s t-test (o).
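The comparison in panels f–n, in which observed Spearman correlations between expression and promoter methylation are contrasted with correlations obtained after permuting cell labels, can be sketched as below. This is a minimal pure-Python illustration for a single gene, assuming matched per-cell vectors; the function names, the seed and the toy values are hypothetical, and only the count of 16 permutations comes from the legend.

```python
import random


def _ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i : j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (var_x * var_y) ** 0.5


def permuted_rhos(expression, methylation, n_permutations=16, seed=0):
    """Null rho values obtained by shuffling the cell labels of methylation."""
    rng = random.Random(seed)
    shuffled = list(methylation)
    rhos = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        rhos.append(spearman_rho(expression, shuffled))
    return rhos


# Hypothetical per-cell values for one gene.
toy_expression = [5, 0, 3, 8, 1, 0, 6, 2]
toy_methylation = [0.1, 0.9, 0.4, 0.0, 0.8, 1.0, 0.2, 0.6]
print(spearman_rho(toy_expression, toy_methylation))
print(permuted_rhos(toy_expression, toy_methylation)[:3])
```

Repeating this across genes gives an observed and a permuted distribution of rho, which the legend reports comparing with a two-sided Kolmogorov–Smirnov test.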
a, Additional representative (random cell subsampling) methylation-based lineage trees of CLL cells. b, As in a for index-sorted normal B cells, showing that naive CD27− B cells (NBC; CD27−IgM+IgD+++IgG−) precede CD27+ memory terminally differentiated B cells (hiMBC; CD27+IgG+) in the lineage tree. c, Representative (cell subsampling) methylation-based lineage trees of CLL cells reconstructed using only autosomes or chromosome X. Tree topologies are similar to when using whole-genome information (see a and Fig. 3d), showing rapid drift after the initial malignant transformation. d, As in c for lineage trees of CLL cells obtained by holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction; for example, excluding chromosomes 1–3, left), or downsampling the number of CpGs per cell to equal numbers (120,000 CpGs per cell; right). e, Normalized Robinson–Foulds distances between any two trees (n = 30 tree replicates; see Methods) of CLL01 reconstructed by maximum (M)-likelihood versus maximum-parsimony analyses. Differences (Δ) are indicated. f, Average maximum tree depth of lineage trees (n = 10 tree replicates; see Methods) of CLL (CLL01) and normal B (B02) cells when using whole-genome information compared to lineage trees obtained by holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction). Error bars represent 95% confidence interval. g, Distribution of root-to-tip branch lengths (that is, the length from the root to each tip in the lineage tree) between CLL and normal B cells (M-CLL (CLL07), U-CLL (CLL10) and B05 are shown as representative examples). h, Patristic distances between index-sorted B cells from B04, B05 and B06 healthy donor samples (NBC, n = 24 cells for each sample; loMBC, n = 24 cells for each sample; intMBC, n = 24 cells for each sample; hiMBC, n = 23 cells for each sample). 
i, Patristic distances between CLL (CLL01) and normal B (B02) cells obtained from lineage trees reconstructed by using only autosomes, chromosome X, holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction), or downsampling the number of CpGs per cell to equal numbers (120,000 CpGs per cell), respectively. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (e, h, i) or Welch’s t-test (f).
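The tree summaries used in panels g–i, root-to-tip branch length and patristic distance (the sum of branch lengths along the path connecting two tips), can be computed from any rooted tree with branch lengths. The sketch below assumes a hypothetical child-to-parent dictionary as the tree representation; the paper's trees were built with phylogenetic software, and this toy structure is not its format.

```python
def root_to_tip(tip, parent):
    """Sum of branch lengths from a tip up to the root.

    parent: dict mapping node -> (parent_node, branch_length); the root
    has no entry.
    """
    total = 0.0
    node = tip
    while node in parent:
        node, length = parent[node]
        total += length
    return total


def _distances_to_ancestors(node, parent):
    """Map node and each of its ancestors to their distance from node."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        distances[node] = total
    return distances


def patristic_distance(tip_a, tip_b, parent):
    """Path length between two tips through their most recent common ancestor."""
    from_a = _distances_to_ancestors(tip_a, parent)
    total = 0.0
    node = tip_b
    while node not in from_a:  # climb from tip_b until the paths meet
        node, length = parent[node]
        total += length
    return total + from_a[node]


# Hypothetical toy tree: ((A:1.0, B:2.0)n1:0.5, C:3.0)root
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
print(root_to_tip("A", toy_tree))
print(patristic_distance("A", "B", toy_tree))
print(patristic_distance("A", "C", toy_tree))
```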
a, Schematic of the joint multiplexed scRRBS, transcriptome and genotyping capture protocol. b, Normalized Robinson–Foulds distances between any two trees (n = 30 tree replicates; see Methods) of CLL12 (n = 56 cells; see Fig. 3h) reconstructed by maximum-likelihood versus maximum-parsimony analyses. Differences (Δ) are indicated. c, Proportion of wild-type (white) and mutated (black) SF3B1 cells in each clade identified from the lineage tree shown in Fig. 3h. d, Comparison of the number of unique CpGs (left) and the CpG methylation level (right) between the wild-type-enriched and the mutated-enriched SF3B1 clade of cells identified from the lineage tree in Fig. 3h. e, Volcano plot of differentially methylated gene promoters (absolute weighted average DNAme difference > 0.3 and two-sided non-parametric permutation test P < 0.05) between the wild-type and mutated SF3B1 cells from the lineage tree shown in Fig. 3h. f, Single-cell alternative 3′ splicing score (fraction of reads that map downstream to the 3′ end (up to 100 bp) of the exons versus within the exons) for cells belonging to wild-type (n = 30) and mutated (n = 26) SF3B1 clades identified from the lineage tree shown in Fig. 3h. g, Volcano plot of differentially expressed genes between the wild-type-enriched and mutated-enriched SF3B1 clade. Genes (n = 57) with absolute log2(SF3B1 mutated-enriched/SF3B1 wild-type-enriched gene expression) > 0.5 and Benjamini–Hochberg FDR-adjusted weighted F-test P < 0.2 are shown in red. Genes that were previously reported to be affected by SF3B1 mutation22 are also labelled. h, Gene expression projections on lineage trees for two representative genes identified in g. i, Comparison of transcriptional distances (measured as Euclidean distances of the first three principal components after principal component analysis) as a function of lineage distance between cell pairs from the lineage tree shown in Fig. 3h. 
j, Cells belonging to the SF3B1-mutated-enriched clade show significantly lower relative node heights (that is, height of internal tree nodes relative to the root node; see Methods) compared with the wild-type SF3B1-enriched clade, consistent with SF3B1 mutation being a later subclonal event in CLL15. k, As in j for root-to-tip branch lengths (that is, the length from the root to each tip in the lineage tree). l, Distribution of node ages (estimated number of divisions before present; see Methods) between the wild-type (white, n = 30 nodes) and mutated (grey, n = 25 nodes) SF3B1-enriched clades. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (b, d, f, i, j, l) or two-sided Fisher’s exact test (c).
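The tree statistics in panels j and k can be illustrated on a toy tree. This is a sketch under stated assumptions rather than the procedure described in the Methods: node height is assumed here to be the longest path from an internal node down to any descendant tip, scaled by the height of the root, and the tree and names are hypothetical.

```python
# Toy rooted tree stored as parent -> [(child, branch length), ...].
# Nodes absent from the dictionary keys are tips. Values are hypothetical.
toy_children = {
    "root": [("n1", 0.3), ("cell3", 0.5)],
    "n1": [("cell1", 0.1), ("cell2", 0.2)],
}


def root_to_tip_lengths(children, root):
    """Return {tip: summed branch length from the root to that tip}."""
    lengths = {}
    stack = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        kids = children.get(node, [])
        if not kids:
            lengths[node] = dist
        for child, length in kids:
            stack.append((child, dist + length))
    return lengths


def node_height(children, node):
    """Longest path from `node` down to any descendant tip (assumed definition)."""
    kids = children.get(node, [])
    if not kids:
        return 0.0
    return max(length + node_height(children, child) for child, length in kids)


def relative_node_heights(children, root):
    """Height of each internal node divided by the height of the root."""
    root_height = node_height(children, root)
    return {node: node_height(children, node) / root_height for node in children}


tip_lengths = root_to_tip_lengths(toy_children, "root")
rel_heights = relative_node_heights(toy_children, "root")
```

Under this assumed definition, a clade whose internal nodes sit closer to the tips has lower relative node heights, which is the pattern a later-arising subclone would be expected to show.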
Extended Data Fig. 9 Joint single-cell methylomics and RNA-seq link epigenetic and transcriptional information in CLL evolution with therapy.
a, Representative methylation-based lineage trees integrating cells before treatment (T0; white circle; n = 40 out of 96 randomly sampled cells) and during treatment (T1; red circle; n = 40 out of 96 randomly sampled cells) for samples CLL03, CLL04 and CLL05. See Fig. 4c for the percentage of T1 cells in each of the two clades (defined as the ones occurring after the first major split in the lineage tree) inferred from these lineage trees. b, Comparison of the CpG methylation level between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in a and in Fig. 4b for samples CLL03, CLL04, CLL05 and CLL11, respectively. c, As in b for number of unique CpGs. d, Volcano plot of differentially methylated genes (absolute weighted average DNAme difference > 0.3 and two-sided non-parametric permutation test P < 0.05) between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in a and Fig. 4b for CLL03 (n = 515 genes), CLL04 (n = 429 genes), CLL05 (n = 690 genes) and CLL11 (n = 578 genes), respectively. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (b, c).
Extended Data Fig. 10 Cells preferentially expelled from the lymph nodes are marked by a distinct transcriptional profile.
a, Gene sets (canonical pathways; CP) enriched in differentially expressed genes (n = 336) between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in Fig. 4b and Extended Data Fig. 9a. A two-sided hypergeometric test was used to measure the enrichment of these genes in each gene set, followed by a Benjamini–Hochberg (BH) FDR procedure (cut-off of adjusted P < 0.2). b, Gene expression projections on lineage tree for TLR pathway genes from Fig. 4d for samples CLL04, CLL05 and CLL11, respectively. Scale bar represents RNA read counts scaled by maximal value. Expression value projection is performed only for T1 cells, comparing T1 versus T0-enriched clades. Asterisks indicate cells without RNA information. c, Fold change in gene expression of NF-κB-related genes between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in Fig. 4b and Extended Data Fig. 9a.
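The enrichment procedure named in panel a (a hypergeometric test followed by Benjamini–Hochberg adjustment) can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the two-sided P value is taken here as the summed probability of all outcomes no more likely than the observed one, which is one common convention, and all counts and P values are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, M, n, N):
    """P(exactly k gene-set members) when drawing N genes from a universe
    of M genes, n of which belong to the gene set."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def hypergeom_two_sided(k, M, n, N):
    """Two-sided P value: sum of probabilities of all outcomes that are
    no more likely than the observed overlap k (assumed convention)."""
    p_obs = hypergeom_pmf(k, M, n, N)
    lowest, highest = max(0, N - (M - n)), min(n, N)
    total = 0.0
    for i in range(lowest, highest + 1):
        p = hypergeom_pmf(i, M, n, N)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Return BH-adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: universe of 10 genes, gene set of 5, and all 5
# differentially expressed genes fall in the gene set.
p_example = hypergeom_two_sided(5, 10, 5, 5)

# Hypothetical raw P values for four gene sets, filtered at adjusted P < 0.2.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.2 for q in adjusted]
```

In practice the universe would be the set of expressed genes and the drawn genes would be the differentially expressed genes; those choices are not specified in this caption.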
Primers and adapters used in MscRRBS library construction.
List of cell barcodes used in MscRRBS library construction.
Summary table of healthy donors and CLL patient samples used in this study. Table includes sample ID, treatment condition, number of cells sequenced, number of cells with >50,000 unique CpGs, and cell type with surface markers.
Summary statistics for normal B and CLL cells with >50,000 unique CpGs. Table includes cell ID, sample name, number of reads, number of unique CpGs, and bisulfite conversion rate.
Low epimutation CpG sites (n = 166,720) identified from the 4-gamete analysis. A total of 166,720 unique CpGs across the 12 CLL patient samples exhibited a lower frequency of 4 gametes than expected based on their DNAme level. For each CpG site, genomic coordinates (chromosome, start, end) are given.
Discriminative motif discovery results summary for low epimutation CpG sites (n = 16,619) within gene promoters.
Low epimutation CpG sites (n = 1,585) identified from the 4-gamete analysis within enhancers. Amongst the 166,720 low epimutation CpGs identified by the 4-gamete test, 1,585 CpGs overlapped with enhancers. For each CpG site, genomic coordinates (chromosome, start, end) are given.
Discriminative motif discovery results summary for low epimutation CpG sites (n = 1,585) within enhancers.
Enriched gene lists in the two clades of cells inferred based on the epimutation information from methylation-based lineage tree shown in Fig. 3h. Differential expression is compared between cells that map to the SF3B1 wild type enriched clade and cells that map to the SF3B1 mutated enriched clade. Genes (n = 57) with absolute log2(fold-change) > 0.5 and Benjamini-Hochberg FDR adjusted weighted F test P-values < 0.2 are shown in bold. The direction of change between the two clades of cells inferred based on the epimutation information is also reported.
Enriched gene lists in the two clades of cells inferred based on the epimutation information from individual lineage trees integrating cells before treatment (T0) and during treatment (T1) for CLL03, CLL04, CLL05, and CLL11 individual patient samples. Differential expression is compared between T1 cells that map to the T1 enriched clades and T1 cells that map to the T0 enriched clades (CLL03, n = 9,808 genes; CLL04, n = 10,210 genes; CLL05, n = 10,426 genes; CLL11, n = 1,400 genes). P-values are shown for weighted F test (see Methods for details).
Enriched gene lists in the two clades of cells inferred in combination across CLL03, CLL04, CLL05, and CLL11 patient samples based on the epimutation information from lineage trees integrating cells before treatment (T0) and during treatment (T1). Differential expression is compared between T1 cells that map to the T1 enriched clades and T1 cells that map to the T0 enriched clades. Weighted F test P-values were combined across patient samples (CLL03, CLL04, CLL05, and CLL11; 8,372 genes expressed >5 cells in ≥3 samples) via Fisher's combined probability test. Average of log2(fold change) across samples is also reported.
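As a brief illustration of Fisher's combined probability test, the statistic is X = -2 Σ ln(p_i), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis for k independent P values. The sketch below uses the closed-form chi-squared survival function for even degrees of freedom; the input P values are hypothetical and must be greater than zero.

```python
from math import exp, log


def fisher_combined(pvals):
    """Combine independent P values with Fisher's method.

    Returns (statistic, combined P value). For 2k degrees of freedom the
    chi-squared survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvals)
    statistic = -2.0 * sum(log(p) for p in pvals)
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        series += term
    return statistic, exp(-half) * series


# Hypothetical per-sample P values for one gene across four patient samples.
stat, combined_p = fisher_combined([0.04, 0.20, 0.03, 0.10])
```

With a single input the method returns that P value unchanged, which is a convenient sanity check.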
Characteristics of the 12 CLL patients used in this study.
List of genes tested through Genoptix (Genoptix, Carlsbad, CA) clinical grade CLL gene panel testing.
About this article
Cite this article
Gaiti, F., Chaligne, R., Gu, H. et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576–580 (2019). https://doi.org/10.1038/s41586-019-1198-z