Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia

Abstract

Genetic and epigenetic intra-tumoral heterogeneity cooperate to shape the evolutionary course of cancer1. Chronic lymphocytic leukaemia (CLL) is a highly informative model for cancer evolution as it undergoes substantial genetic diversification and evolution after therapy2,3. The CLL epigenome is also an important disease-defining feature4,5, and growing populations of cells in CLL diversify by stochastic changes in DNA methylation known as epimutations6. However, previous studies using bulk sequencing methods to analyse the patterns of DNA methylation were unable to determine whether epimutations affect CLL populations homogeneously. Here, to measure the epimutation rate at single-cell resolution, we applied multiplexed single-cell reduced-representation bisulfite sequencing to B cells from healthy donors and patients with CLL. We observed that the common clonal origin of CLL results in a consistently increased epimutation rate, with low variability in the cell-to-cell epimutation rate. By contrast, variable epimutation rates across healthy B cells reflect diverse evolutionary ages across the trajectory of B cell differentiation, consistent with epimutations serving as a molecular clock. Heritable epimutation information allowed us to reconstruct lineages at high-resolution with single-cell data, and to apply this directly to patient samples. The CLL lineage tree shape revealed earlier branching and longer branch lengths than in normal B cells, reflecting rapid drift after the initial malignant transformation and a greater proliferative history. Integration of single-cell bisulfite sequencing analysis with single-cell transcriptomes and genotyping confirmed that genetic subclones mapped to distinct clades, as inferred solely on the basis of epimutation information. 
Finally, to examine potential lineage biases during therapy, we profiled serial samples during ibrutinib-associated lymphocytosis, and identified clades of cells that were preferentially expelled from the lymph node after treatment, marked by distinct transcriptional profiles. The single-cell integration of genetic, epigenetic and transcriptional information thus charts the lineage history of CLL and its evolution with therapy.

Fig. 1: CLL epigenomes show increased epimutation rate with low cell-to-cell variation.
Fig. 2: CLL higher epimutation rate is associated with higher transcriptional entropy consistent with transcriptional dysregulation.
Fig. 3: Lineage relationships inferred from single-cell DNA methylomes.
Fig. 4: Joint single-cell methylomics and RNA sequencing link lineage and transcriptional information in CLL evolution.

Data availability

Multiplexed scRRBS and single-cell Smart-seq2 datasets have been deposited to the NCBI Gene Expression Omnibus (GEO) under accession number GSE109085. ChIP–seq datasets have been deposited to the NCBI GEO under accession number GSE119103. Other data are available from the corresponding author upon reasonable request.

References

  1. Flavahan, W. A., Gaskell, E. & Bernstein, B. E. Epigenetic plasticity and the hallmarks of cancer. Science 357, eaal2380 (2017).

  2. Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).

  3. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).

  4. Beekman, R. et al. The reference epigenome and regulatory chromatin landscape of chronic lymphocytic leukemia. Nat. Med. 24, 868–880 (2018).

  5. Oakes, C. C. et al. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia. Nat. Genet. 48, 253–264 (2016).

  6. Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).

  7. Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 44, 1207–1214 (2012).

  8. Shipony, Z. et al. Dynamic and static maintenance of epigenetic memory in pluripotent and somatic cells. Nature 513, 115–119 (2014).

  9. Shibata, D. Mutation and epigenetic molecular clocks in cancer. Carcinogenesis 32, 123–128 (2011).

  10. Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat. Genet. 43, 768–775 (2011).

  11. Hudson, R. R. & Kaplan, N. L. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111, 147–164 (1985).

  12. Chen, T. L. et al. NF-κB p50 (nfkb1) contributes to pathogenesis in the Eμ-TCL1 mouse model of chronic lymphocytic leukemia. Blood 130, 376–379 (2017).

  13. Arsura, M., Hofmann, C. S., Golay, J., Introna, M. & Sonenshein, G. E. A. A-myb rescues murine B-cell lymphomas from IgM-receptor-mediated apoptosis through c-myc transcriptional regulation. Blood 96, 1013–1020 (2000).

  14. Qu, Y. et al. Cancer-specific changes in DNA methylation reveal aberrant silencing and activation of enhancers in leukemia. Blood 129, e13–e25 (2017).

  15. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).

  16. Rendeiro, A. F. et al. Chromatin accessibility maps of chronic lymphocytic leukaemia identify subtype-specific epigenome signatures and transcription regulatory networks. Nat. Commun. 7, 11938 (2016).

  17. Sekimata, M. & Homma, Y. Sequence-specific transcriptional repression by an MBD2-interacting zinc finger protein MIZF. Nucleic Acids Res. 32, 590–597 (2004).

  18. Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).

  19. Caron, G. et al. Cell-cycle-dependent reconfiguration of the DNA methylome during terminal differentiation of human B cells into plasma cells. Cell Reports 13, 1059–1071 (2015).

  20. Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).

  21. Shlush, L. I. et al. Cell lineage analysis of acute leukemia relapse uncovers the role of replication-rate heterogeneity and microsatellite instability. Blood 120, 603–612 (2012).

  22. Wang, L. et al. Transcriptomic characterization of SF3B1 mutation reveals its pleiotropic effects in chronic lymphocytic leukemia. Cancer Cell 30, 750–763 (2016).

  23. Herman, S. E. M. et al. Ibrutinib-induced lymphocytosis in patients with chronic lymphocytic leukemia: correlative analyses from a phase II study. Leukemia 28, 2188–2196 (2014).

  24. Satterwhite, E. et al. The BCL11 gene family: involvement of BCL11A in lymphoid malignancies. Blood 98, 3413–3420 (2001).

  25. Herndon, T. M. et al. Direct in vivo evidence for increased proliferation of CLL cells in lymph nodes compared to bone marrow and peripheral blood. Leukemia 31, 1340–1347 (2017).

  26. Phelan, J. D. et al. A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature 560, 387–391 (2018).

  27. Herishanu, Y. et al. The lymph node microenvironment promotes B-cell receptor signaling, NF-κB activation, and tumor proliferation in chronic lymphocytic leukemia. Blood 117, 563–574 (2011).

  28. Dadashian, E. L. et al. TLR signaling is activated in lymph-node resident CLL cells and is only partially inhibited by ibrutinib. Cancer Res. 79, 360–371 (2019).

  29. Siegmund, K. D., Marjoram, P., Woo, Y.-J., Tavaré, S. & Shibata, D. Inferring clonal expansion and cancer stem cell dynamics from DNA methylation patterns in colorectal cancers. Proc. Natl Acad. Sci. USA 106, 4828–4833 (2009).

  30. Messmer, B. T. et al. In vivo measurements document the dynamic cellular kinetics of chronic lymphocytic leukemia B cells. J. Clin. Invest. 115, 755–764 (2005).

  31. Rassenti, L. Z. et al. Relative value of ZAP-70, CD38, and immunoglobulin mutation status in predicting aggressive disease in chronic lymphocytic leukemia. Blood 112, 1923–1930 (2008).

  32. Guo, H. et al. Profiling DNA methylome landscapes of mammalian cells with single-cell reduced-representation bisulfite sequencing. Nat. Protocols 10, 645–659 (2015).

  33. Guo, H. et al. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23, 2126–2135 (2013).

  34. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).

  35. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  36. Macaulay, I. C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).

  37. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protocols 9, 171–181 (2014).

  38. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  39. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  40. Van den Berge, K. et al. Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biol. 19, 24 (2018).

  41. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).

  42. Van den Berge, K., Soneson, C., Robinson, M. D. & Clement, L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 18, 151 (2017).

  43. Illingworth, R. S. et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet. 6, e1001134 (2010).

  44. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  45. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  46. Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).

  47. O’Geen, H., Echipare, L. & Farnham, P. J. in Epigenetics Protocols (ed. Tollefsbol, T. O.) 265–286 (Humana Press, 2011).

  48. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  49. Zhang, Y. et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 9, R137 (2008).

  50. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  51. Tajima, F. Infinite-allele model and infinite-site model in population genetics. J. Genet. 75, 27 (1996).

  52. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  53. Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).

  54. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).

  55. Ushijima, T. et al. Fidelity of the methylation pattern and its variation in the genome. Genome Res. 13, 868–874 (2003).

  56. Biezuner, T. et al. A generic, cost-effective, and scalable cell lineage analysis platform. Genome Res. 26, 1588–1599 (2016).

  57. Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).

  58. Bian, S. et al. Single-cell multiomics sequencing and analyses of human colorectal cancer. Science 362, 1060–1063 (2018).

  59. de Bourcy, C. F. A. et al. Phylogenetic analysis of the human antibody repertoire reveals quantitative signatures of immune senescence and aging. Proc. Natl Acad. Sci. USA 114, 1105–1110 (2017).

  60. Xu, J. et al. Single-cell lineage tracing by endogenous mutations enriched in transposase accessible mitochondrial DNA. eLife 8, e45105 (2019).

  61. Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339 (2019).

  62. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

  63. Zhou, X., Shen, X.-X., Hittinger, C. T. & Rokas, A. Evaluating fast maximum likelihood-based phylogenetic programs using empirical phylogenomic data sets. Mol. Biol. Evol. 35, 486–503 (2018).

  64. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).

  65. Soubrier, J. et al. The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol. Biol. Evol. 29, 3345–3358 (2012).

  66. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2017).

  67. Robinson, D. F. & Foulds, L. R. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981).

  68. Raval, A. et al. TWIST2 demonstrates differential methylation in immunoglobulin variable heavy chain mutated and unmutated chronic lymphocytic leukemia. J. Clin. Oncol. 23, 3877–3885 (2005).

  69. Perez, C. A., Ott, J., Mays, D. J. & Pietenpol, J. A. p63 consensus DNA-binding site: identification, analysis and application into a p63MH algorithm. Oncogene 26, 7363–7370 (2007).

  70. Hsiao, L.-L. et al. A compendium of gene expression in normal human tissues. Physiol. Genomics 7, 97–104 (2001).

  71. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).

Acknowledgements

We thank the Epigenomics Core Facility at Weill Cornell Medicine for technical help. R.C. is supported by Leukemia Research Foundation (LRF) and Marie Skłodowska-Curie fellowships. A.G. is supported by Broad Institute SPARC funding. D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, American Society of Hematology (ASH) Scholar Award, Pershing Square Sohn Prize for Young Investigators in Cancer Research, and the National Institutes of Health (NIH) Director’s New Innovator Award (DP2-CA239065). This work was also supported by the Starr Foundation, the Max Planck Society, Leukemia & Lymphoma Society (LLS) Translational Research Program, National Cancer Institute (R01-CA229902), and Stand Up To Cancer Innovative Research Grant (SU2C-AACR-IRG-0616).

Reviewer information

Nature thanks Ken Duffy and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

F.G., R.C., C.J.W., A.M. and D.A.L. conceived and designed the project. R.C., H.G., A.A., C.S., N.D.O., L.W., J.A.F., E.B.B., J.N.A., R.F. and A.G. performed patient selection and prepared samples for sequencing. R.C., H.G., A.G., D.A.L. and A.M. designed and developed multiplexed scRRBS and joint multiplexed scRRBS and single-cell RNA sequencing. F.G., R.M.B., S.K.-H., R.C.S., K.G., D.R., K.T.K., A.P., K.Y.H., E.B., K.C., M.A. and D.A.L. performed the computational genomics analyses. F.G., R.C., C.J.W., A.M. and D.A.L. wrote the manuscript with comments and contributions from all authors.

Corresponding author

Correspondence to Dan A. Landau.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Multiplexed scRRBS is an accurate and reproducible method for single-cell DNAme analysis.

a, Detailed schematic of the protocol for multiplexed scRRBS. b, Summary table of the healthy donor and CLL patient samples used in this study. c, Representative size distribution of the multiplexed scRRBS libraries assessed by Agilent Bioanalyzer before and after primer-dimer removal. The DNA fragment size in multiplexed scRRBS libraries is typically 200–1,000 bp, with some visible peaks corresponding to the MspI fragments for repeat elements, and primer dimer contaminants (approximately 170 bp). LM, lower marker; UM, upper marker; MW, molecular-weight size marker. d, Number of CpGs observed in multiplexed scRRBS libraries across relevant genomic regions comparing multiplexed scRRBS (left) and bulk RRBS (right) assays for normal B (B01) and CLL (CLL01) cells. The enrichment in exons, promoters and CpG islands (CGIs) observed in multiplexed scRRBS libraries corresponded to approximately 40% of the total sequenced CpGs, akin to bulk RRBS assays. e, Downsampling analysis showing that around 1.7 million paired-end reads per cell provided roughly 85% of unique CpGs, with further sequencing resulting in only a marginal increase in coverage. f, Correlation of average CpG methylation across in silico merged single cells and bulk RRBS obtained from matched samples for normal B (B01, n = 40,257 CpGs) (left) and CLL (CLL01, n = 9,578 CpGs) (right) cells. P values are indicated for two-sided Pearson’s correlation test. g, Pooling individual single cells together rapidly increases the number of CpGs recovered, approaching bulk RRBS coverage with more than 48 cells. The percentage of CpG sites detected in single-cell data (blue and red for normal B and CLL cells, respectively), the in vitro pooled single-cell datasets (light blue and light red, respectively) and matched bulk RRBS libraries (striped bars) are shown. Error bars represent 95% confidence interval. h, As in g for the percentage of average CpG DNA methylation.
Single, pooled cells and bulk RRBS showed a similar percentage of CpG methylation, suggesting measured genome-wide DNAme profiles of individual cells accurately recapitulate bulk methylation profiles in the same cell type.

Extended Data Fig. 2 Single-cell DNA methylation coverage analysis.

a, The approximately 10% sampling of the multiplexed scRRBS DNA methylome reduces the overlap of individual CpGs across cells. Left, expected number of times a given CpG is observed across k cells (k, the number of cells, is given on the x axis). Right, expected number of measured CpGs given k cells. b, Biallelic coverage within a given single cell was detected in only 4.6 ± 2% of approximately 230 germline single nucleotide polymorphisms (SNPs) available for analysis, suggesting that the observed single-cell CpG data largely represent only one of the two alleles of the near-diploid CLL genome. c, Histograms of the distribution of CpG methylation values for single normal B (blue) and CLL (red) cells and matched bulk RRBS libraries showing highly digitized patterns of DNAme in single cells (that is, CpG sites either methylated or unmethylated) in contrast to bulk RRBS, which shows intermediate DNAme values. d, Representative analysis for three non-contiguous genomic windows around the promoter region of TWIST2, previously shown to be implicated in CLL pathogenesis68. Shown from top to bottom are the annotation of the TWIST2 promoter locus with CGI sites indicated (green); the estimated methylation rate of in silico pooled single cells for healthy donors and CLL; and the CpG methylation patterns (black circles: methylated; white circles: unmethylated) of single cells. Note the higher level of DNAme percentage in CLL compared with healthy donor cells at these selected regions.
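
The coverage expectations in panel a can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cell captures every CpG independently with the same probability p (about 0.10, taken from the legend); the paper's own calculation may differ, and the function names below are hypothetical.

```python
# Illustrative coverage model: each cell samples every CpG independently
# with probability p (an assumption, not the paper's stated method).

def expected_union_fraction(p: float, k: int) -> float:
    """Expected fraction of CpGs observed in at least one of k cells."""
    return 1.0 - (1.0 - p) ** k


def expected_shared_fraction(p: float, k: int) -> float:
    """Expected fraction of CpGs observed in every one of k cells."""
    return p ** k


def expected_cells_per_cpg(p: float, k: int) -> float:
    """Expected number of cells (out of k) in which a given CpG is observed."""
    return p * k


if __name__ == "__main__":
    p = 0.10  # approximate per-cell sampling rate quoted in the legend
    for k in (1, 2, 8, 48):
        print(k, expected_union_fraction(p, k), expected_shared_fraction(p, k))
```

Under this simplified model the fraction of CpGs shared by all cells falls geometrically with k, while the pooled coverage rises towards saturation, which is the qualitative behaviour the legend describes.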

Extended Data Fig. 3 CLL epigenomes show an increased epimutation rate across all genomic regions, with low cell-to-cell variability in epimutation rates.

a, Representative analysis of the WT1 locus. CpG island is indicated in green, along with the CpG methylation patterns (black circles denote methylated; white circles denote unmethylated) in single cells. We note that CLL cells exhibit lower cell-to-cell variation in the epimutation rate than normal B cells. b, Comparison of cell-to-cell epimutation rate difference per genomic region between CLL cells (n = 12; M-CLL (CLL01–CLL07), n = 309 pairs; U-CLL (CLL08–CLL12), n = 218 pairs) and healthy B cells (n = 5; B01–B02, B04–B06, n = 256 pairs). c, Difference in average CpG methylation per genomic region between CLL samples (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B samples (n = 6; B01–B06 (n = 666 cells)). d, Percentage of change in CpG methylation at CGIs when comparing the DNAme level of individual cells in each sample to the baseline (defined as the average DNAme level across all samples) for CLL cells (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B cells (n = 6; B01–B06 (n = 666 cells)). e, Multivariable linear regression model that accounts for potential technical confounders (bisulfite conversion rate, number of aligned reads, number of covered CpGs) in CLL samples (n = 12; CLL01–CLL12 (M-CLL, n = 619 cells; U-CLL, n = 436 cells)) and normal B samples (n = 6; B01–B06 (n = 666 cells)). f, Single-cell epimutation rate across index-sorted normal B cells (B04, n = 96 cells; B05, n = 96 cells; B06, n = 92 cells). g, As in f for the difference in cell-to-cell epimutation rate (B04, n = 48 pairs; B05, n = 48 pairs; B06, n = 46 pairs). h, Direct comparison of difference in cell-to-cell epimutation rate between CLL cells (n = 12; M-CLL (CLL01–CLL07), n = 309 pairs; U-CLL (CLL08–CLL12), n = 218 pairs) and index-sorted B cells (n = 3; B04–B06; NBC, n = 35 pairs; loMBC, n = 35 pairs; intMBC, n = 35 pairs; hiMBC, n = 35 pairs). Box plots are as defined in Fig. 1. Error bars represent 95% confidence interval.
P values were determined by two-sided Mann–Whitney U-test (b, f–h), followed by a Bonferroni adjustment procedure (b).

Extended Data Fig. 4 Long-range DNA methylation concordance decay.

a, Concordance odds ratio (COR) of the DNA methylation state between any two neighbouring CpGs as a function of their genomic distance (see Methods for details). b, Left, scaled COR (0–1) for CGIs at transcription start sites (TSS) (the B01 and CLL01 samples are shown as representative examples). Right, average rate of decay (slope of the first order fit line) in the COR for normal B samples (n = 6) and CLL samples (n = 12) for CGIs at TSS (B01–B06 (n = 666 cells; n = 48,065,000 CpGs) and CLL01–CLL12 (M-CLL, n = 619 cells, n = 38,968,846 CpGs; U-CLL, n = 436 cells, n = 37,464,310 CpGs)). c, As in b for CGIs at TSS of genes belonging to the TP53 gene set69. Healthy donor B cell samples (n = 6): n = 666 cells, n = 6,308,174 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 5,113,493 CpGs; U-CLL, n = 436 cells, n = 4,982,039 CpGs. d, As in b for CGIs at the TSS of housekeeping genes70. Healthy donor B cell samples (n = 6): n = 666 cells, n = 2,087,432 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 1,686,295 CpGs; U-CLL, n = 436 cells, n = 1,620,802 CpGs. e, Average rate of decay in the COR for normal B (n = 6) and CLL (n = 12) samples for CGIs at the TSS of genes belonging to gene sets previously reported to be affected by a high epimutation rate6. Healthy donor B cell samples (n = 6): n = 666 cells, n = 48,065,000 CpGs; CLL samples (n = 12): M-CLL, n = 619 cells, n = 38,968,846 CpGs; U-CLL, n = 436 cells, n = 37,464,310 CpGs. Error bars represent 95% confidence interval. P values were determined by two-sided Mann–Whitney U-test.
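
The concordance odds ratio is defined in the paper's Methods, which are not part of this excerpt. The sketch below therefore uses one common construction as an assumption: for CpG pairs binned by genomic distance, a 2×2 table of methylation states is built and the odds ratio (concordant product over discordant product) is computed with a 0.5 pseudocount. The function name, the bin size and the pseudocount are hypothetical choices for illustration.

```python
from collections import defaultdict


def concordance_odds_ratio(pairs, bin_size=10, pseudocount=0.5):
    """Odds ratio of methylation concordance per distance bin.

    pairs: iterable of (distance_bp, state_a, state_b), with states coded
           0 (unmethylated) or 1 (methylated).
    Returns a dict mapping the start of each distance bin to its odds ratio.
    """
    # One 2x2 contingency table per distance bin: table[state_a][state_b].
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for distance, a, b in pairs:
        bin_start = (distance // bin_size) * bin_size
        tables[bin_start][a][b] += 1

    result = {}
    for bin_start, t in sorted(tables.items()):
        uu = t[0][0] + pseudocount  # both unmethylated
        um = t[0][1] + pseudocount  # discordant
        mu = t[1][0] + pseudocount  # discordant
        mm = t[1][1] + pseudocount  # both methylated
        result[bin_start] = (mm * uu) / (um * mu)
    return result
```

Plotting the returned values against the bin start would give a decay curve of the kind shown in panel a; a value of 1 corresponds to neighbouring CpGs whose states are statistically independent.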

Extended Data Fig. 5 Epimutations at single CpG resolution.

a, Frequency of four gametes according to the level of average methylation of each individual CpG site in each CLL sample (CLL01–CLL12; randomly sampled CpGs shown out of the total CpGs assessed in each CLL sample; range 156,662–2,371,498 CpGs per sample covered in >5 cells in each sample). Smooth local regression line (LOESS) is shown in red. b, Low epimutation (loEpi) CpGs are defined as being 1.5× median absolute deviation (MAD) away from the median frequency of four gametes in each DNAme window of 0.05 (range 0.1–0.9) for a given sample. Shown is a representative example of this procedure for DNAme window of 0.5–0.55 in the CLL04 patient sample. c, Percentage of low epimutation CpGs (average of 1.22% ± 0.42 (mean ± s.e.m.); range 0.04–2.9%) out of the total CpGs assessed in each CLL sample. CLL01, n = 14,711 loEpi CpGs; CLL02, n = 2,573 loEpi CpGs; CLL03, n = 25,270 loEpi CpGs; CLL04, n = 29,114 loEpi CpGs; CLL05, n = 16,603 loEpi CpGs; CLL06, n = 11,413 loEpi CpGs; CLL07, n = 19,330 loEpi CpGs; CLL08, n = 19,916 loEpi CpGs; CLL09, n = 11,440 loEpi CpGs; CLL10, n = 18,614 loEpi CpGs; CLL11, n = 7,067 loEpi CpGs; CLL12, n = 308 loEpi CpGs. d, Additional sequence logos of the DNA motifs determined to be significantly overrepresented in low epimutation CpGs (±25 bp around CpGs at promoters (TSS ± 1 kb) or at enhancers) across all CLL samples. For each motif, the E value and the TOMTOM P value are shown. See Methods for details on the de novo motif enrichment analysis and the statistical tests used. e, Median protein expression (log10(normalized intensity-based absolute quantification (iBAQ))) of transcription factors for which motifs were enriched in regions with low epimutation CpGs, confirming that the identified transcription factors are expressed at the protein level in B cells and/or haematopoietic compartments. Error bars represent 95% confidence interval. All available human proteome data from lymphoid/haematopoietic lineages are displayed71.
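
The two ingredients of panels a and b, the four-gamete test and the MAD-based cut-off, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the four-gamete condition is taken in its classical form (all four state combinations of two sites observed across cells, after ref. 11), and "away from the median" is assumed here to mean below the median, since the CpGs in question are labelled low epimutation. Function names are hypothetical.

```python
from statistics import median


def has_four_gametes(site_a, site_b):
    """True if all four state combinations (00, 01, 10, 11) occur across cells.

    site_a, site_b: per-cell methylation states of two CpGs, coded 0 or 1,
    with None where the CpG is not covered in that cell.
    """
    seen = {
        (a, b)
        for a, b in zip(site_a, site_b)
        if a is not None and b is not None
    }
    return len(seen) == 4


def low_epimutation_flags(four_gamete_freq, k=1.5):
    """Flag CpGs whose four-gamete frequency lies more than k * MAD below the median.

    four_gamete_freq: per-CpG frequencies within a single DNAme window
    (for example 0.50-0.55). The direction (below) is an assumption.
    """
    values = list(four_gamete_freq)
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return [x < med - k * mad for x in values]
```

In use, CpGs would first be grouped into average-methylation windows of width 0.05 and the flagging applied within each window separately, so that the comparison is always between CpGs of similar methylation level.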

Extended Data Fig. 6 Methylation-transcription relationships at the single-cell level.

a, Number of reads (left) and expression of IGH genes (right) in index-sorted B cells, validating our index-sorting strategy (CD27−IgM+IgD+++IgG− (NBC, n = 24 cells), CD27−IgM+IgD+IgG− (loMBC, n = 24 cells), CD27+IgM+IgD++IgG− (intMBC, n = 24 cells), and CD27+IgG+ (hiMBC, n = 23 cells)). Violin plots represent kernel density estimation showing the distribution shape of the data. b, Proportion of cells with gene expression (read count > 0) and exhibiting above-threshold DNAme. Data are mean ± s.e.m. across all genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information across the three samples (n = 1,816 genes). c, Transcriptional entropy across cells (see Methods) showing higher transcriptome entropy in CLL cells (CLL03, n = 94; CLL04, n = 92) than in healthy donor B cells (B04, n = 84) across various downsampling regimes (range 5,000–100,000; step-size of 1,000). Data are mean ± s.e.m. d, e, Single-cell transcriptional entropy (d) and epimutation rate (e) between normal CD27− B (NBC and loMBC) and CD27+ B (intMBC and hiMBC) cells. f, Left, distribution of the Spearman’s rho between expression and promoter DNAme rate (n = 3,094 genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information) in CLL04. The observed Spearman’s rho values were compared to values obtained by randomly permuting cell labels for the methylation values (see Methods). Right, heat maps of Spearman’s rank-order correlation for representative genes with positive or negative single-cell expression-methylation correlation. Scale bar represents promoter methylation and RNA read counts scaled by maximal value. g, As in f for individual normal B cells (n = 5,729 genes; n = 16 permutations; see Methods for details). h, As in g for CLL03 (n = 2,699 genes; n = 16 permutations). i, As in g for CLL04 (n = 3,094 genes; n = 16 permutations).
j, Absolute change in Spearman’s rho when comparing matched versus scrambled DNAme and RNA single-cell data in CLL (CLL03 and CLL04) and normal B (B04) cells. From the pool of genes used in g–i, only overlapping genes (n = 951) across the three samples were used in the comparison. k, As in f for individual normal B cells (n = 2,500 most variable genes with sufficient RNA (expression seen in >5 cells) and DNAme (>5 CpGs per promoter) information; n = 16 permutations; see Methods for details). l, As in k for CLL03. m, As in k for CLL04. n, Absolute change in Spearman’s rho when comparing matched versus scrambled DNAme and RNA single-cell data in CLL (CLL03 and CLL04) and normal B (B04) cells. From the pool of genes used in k–m, only overlapping genes (n = 459) across the three samples were used in the comparison. o, Hydroxymethylation (5hmC) level at genes with positive correlation between expression and promoter DNA methylation (top correlated 10% of genes) compared with negatively correlated genes (top anti-correlated 10% of genes) in both normal B (B04; n = 336 and 330 genes, respectively) and CLL (CLL03 (n = 290 and 278 genes, respectively); CLL04 (n = 320 and 314 genes, respectively)) cells. Error bars represent 95% confidence interval. Published 5hmC data were used for the analysis19. Box plots are as defined in Fig. 1. P values were determined by two-sided Kolmogorov–Smirnov test (f–i, k–m), two-sided Wilcoxon signed-rank test (j, n) or two-sided Welch’s t-test (o).

Extended Data Fig. 7 Methylation-based lineage trees provide a native lineage tracing system.

a, Additional representative (random cell subsampling) methylation-based lineage trees of CLL cells. b, As in a for index-sorted normal B cells, showing that naive CD27− B cells (NBC; CD27−IgM+IgD+++IgG−) precede CD27+ memory terminally differentiated B cells (hiMBC; CD27+IgG+) in the lineage tree. c, Representative (cell subsampling) methylation-based lineage trees of CLL cells reconstructed using only autosomes or chromosome X. Tree topologies are similar to when using whole-genome information (see a and Fig. 3d), showing rapid drift after the initial malignant transformation. d, As in c for lineage trees of CLL cells obtained by holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction; for example, excluding chromosomes 1–3, left), or downsampling the number of CpGs per cell to equal numbers (120,000 CpGs per cell; right). e, Normalized Robinson–Foulds distances between any two trees (n = 30 tree replicates; see Methods) of CLL01 reconstructed by maximum (M)-likelihood versus maximum-parsimony analyses. Differences (Δ) are indicated. f, Average maximum tree depth of lineage trees (n = 10 tree replicates; see Methods) of CLL (CLL01) and normal B (B02) cells when using whole-genome information compared to lineage trees obtained by holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction). Error bars represent 95% confidence interval. g, Distribution of root-to-tip branch lengths (that is, the length from the root to each tip in the lineage tree) between CLL and normal B cells (M-CLL (CLL07), U-CLL (CLL10) and B05 are shown as representative examples). h, Patristic distances between index-sorted B cells from B04, B05 and B06 healthy donor samples (NBC, n = 24 cells for each sample; loMBC, n = 24 cells for each sample; intMBC, n = 24 cells for each sample; hiMBC, n = 23 cells for each sample).
i, Patristic distances between CLL (CLL01) and normal B (B02) cells obtained from lineage trees reconstructed by using only autosomes, chromosome X, holding-out chromosomes (hold-out three chromosomes at a time before phylogeny reconstruction), or downsampling the number of CpGs per cell to equal numbers (120,000 CpGs per cell), respectively. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (e, h, i) or Welch’s t-test (f).
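As an illustration of the tree comparison in panel e, the following Python sketch computes a normalized Robinson–Foulds distance between two rooted trees written as nested tuples of leaf names. It uses the clade-based (rooted) variant and normalizes by the total number of non-trivial clades in both trees. The tree encoding, the function names and this choice of normalization are assumptions made for illustration; the software and settings used by the authors are given in their Methods.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple, e.g. (("A", "B"), ("C", "D")); anything that is
    not a tuple is treated as a leaf. The root clade (all leaves) is excluded.
    """
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


def normalized_rf(tree_a, tree_b):
    """Fraction of clades that appear in only one of the two trees (0 to 1)."""
    ca, cb = clades(tree_a), clades(tree_b)
    total = len(ca) + len(cb)
    if total == 0:
        return 0.0  # neither tree has any internal structure to compare
    return len(ca ^ cb) / total
```

A value of 0 means the two trees contain exactly the same clades, and a value of 1 means they share none, so lower distances between replicate trees indicate a more reproducible topology.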

Extended Data Fig. 8 Multiplexed scRRBS integration with single-cell transcriptomes and genotyping.

a, Schematic of the joint multiplexed scRRBS, transcriptome and genotyping capture protocol. b, Normalized Robinson–Foulds distances between any two trees (n = 30 tree replicates; see Methods) of CLL12 (n = 56 cells; see Fig. 3h) reconstructed by maximum-likelihood versus maximum-parsimony analyses. Differences (Δ) are indicated. c, Proportion of wild-type (white) and mutated (black) SF3B1 cells in each clade identified from the lineage tree shown in Fig. 3h. d, Comparison of the number of unique CpGs (left) and the CpG methylation level (right) between the wild-type-enriched and the mutated-enriched SF3B1 clade of cells identified from the lineage tree in Fig. 3h. e, Volcano plot of differentially methylated gene promoters (absolute weighted average DNAme difference > 0.3 and two-sided non-parametric permutation test P < 0.05) between the wild-type and mutated SF3B1 cells from the lineage tree shown in Fig. 3h. f, Single-cell alternative 3′ splicing score (fraction of reads that map downstream to the 3′ end (up to 100 bp) of the exons versus within the exons) for cells belonging to wild-type (n = 30) and mutated (n = 26) SF3B1 clades identified from the lineage tree shown in Fig. 3h. g, Volcano plot of differentially expressed genes between the wild-type-enriched and mutated-enriched SF3B1 clade. Genes (n = 57) with absolute log2(SF3B1 mutated-enriched/SF3B1 wild-type-enriched gene expression) > 0.5 and Benjamini–Hochberg FDR-adjusted weighted F-test P < 0.2 are shown in red. Genes that were previously reported to be affected by SF3B1 mutation22 are also labelled. h, Gene expression projections on lineage trees for two representative genes identified in g. i, Comparison of transcriptional distances (measured as Euclidean distances of the first three principal components after principal component analysis) as a function of lineage distance between cell pairs from the lineage tree shown in Fig. 3h. 
j, Cells belonging to SF3B1-mutated enriched clade show significantly lower relative node heights (that is, height of internal tree nodes relative to the root node; see Methods) compared with wild-type SF3B1-enriched clade, consistent with SF3B1 mutation being a later subclonal event in CLL15. k, As in j for root-to-tip branch lengths (that is, the length from the root to each tip in the lineage tree). l, Distribution of node ages (estimated number of divisions before present; see Methods) between the wild-type (white, n = 30 nodes) and mutated (grey, n = 25 nodes) SF3B1 enriched clade. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (b, d, f, i, j, l) or two-sided Fisher’s exact test (c).

Extended Data Fig. 9 Joint single-cell methylomics and RNA-seq link epigenetic and transcriptional information in CLL evolution with therapy.

a, Representative methylation-based lineage trees integrating cells before treatment (T0; white circle; n = 40 out of 96 randomly sampled cells) and during treatment (T1; red circle; n = 40 out of 96 randomly sampled cells) for samples CLL03, CLL04 and CLL05. See Fig. 4c for the percentage of T1 cells in each of the two clades (defined as the ones occurring after the first major split in the lineage tree) inferred from these lineage trees. b, Comparison of the CpG methylation level between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in a and in Fig. 4b and for samples CLL03, CLL04, CLL05 and CLL11, respectively. c, As in b for number of unique CpGs. d, Volcano plot of differentially methylated genes (absolute weighted average DNAme difference > 0.3 and two-sided non-parametric permutation test P < 0.05) between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in a and Fig. 4b for CLL03 (n = 515 genes), CLL04 (n = 429 genes), CLL05 (n = 690 genes) and CLL11 (n = 578 genes), respectively. Box plots are as defined in Fig. 1. P values were determined by two-sided Mann–Whitney U-test (b, c).

Extended Data Fig. 10 Cells preferentially expelled from the lymph nodes are marked by a distinct transcriptional profile.

a, Gene sets (canonical pathways; CP) enriched in differentially expressed genes (n = 336) between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in Fig. 4b and Extended Data Fig. 9a. A two-sided hypergeometric test was used to measure the enrichment of these genes in each gene set, followed by a Benjamini–Hochberg (BH) FDR procedure (cut-off of adjusted P < 0.2). b, Gene expression projections on lineage tree for TLR pathway genes from Fig. 4d for samples CLL04, CLL05 and CLL11, respectively. Scale bar represents RNA read counts scaled by maximal value. Expression value projection is performed only for T1 cells, comparing T1 versus T0-enriched clades. Asterisks indicate cells without RNA information. c, Fold change in gene expression of NF-κB-related genes between the T1-enriched clade of cells and the remaining T1 cells identified from the lineage trees in Fig. 4b and Extended Data Fig. 9a.
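The enrichment workflow in panel a, a hypergeometric test per gene set followed by Benjamini–Hochberg adjustment, can be sketched in Python as follows. Note one deliberate simplification: the legend reports a two-sided hypergeometric test, whereas this sketch computes only the one-sided upper tail (over-representation), which is the simplest well-defined case. Function names and argument order are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes in the gene set,
    n: differentially expressed genes, k: differentially expressed genes
    that fall in the gene set. One-sided (over-representation) only.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Gene sets whose adjusted P value falls below the chosen cut-off (0.2 in the legend) would then be reported as enriched.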

Supplementary information

Reporting Summary

Supplementary Table 1

Primers and adapters used in MscRRBS library construction.

Supplementary Table 2

List of cell barcodes used in MscRRBS library construction.

Supplementary Table 3

Summary table of healthy donors and CLL patient samples used in this study. Table includes sample ID, treatment condition, number of cells sequenced, number of cells with >50,000 unique CpGs, and cell type with surface markers.

Supplementary Table 4

Summary statistics for normal B and CLL cells with >50,000 unique CpGs. Table includes cell ID, sample name, number of reads, number of unique CpGs, and bisulfite conversion rate.

Supplementary Table 5

Low epimutation CpG sites (n = 166,720) identified from the 4-gamete analysis. A total of 166,720 unique CpGs across the 12 CLL patient samples exhibited a lower frequency of 4 gametes than expected based on their DNAme level. For each CpG site, genomic coordinates (chromosome, start, end) are given.

Supplementary Table 6

Discriminative motif discovery results summary for low epimutation CpG sites (n = 16,619) within gene promoters.

Supplementary Table 7

Low epimutation CpG sites (n = 1,585) identified from the 4-gamete analysis within enhancers. Amongst the 166,720 low epimutation CpGs identified by the 4-gamete test, 1,585 CpGs overlapped with enhancers. For each CpG site, genomic coordinates (chromosome, start, end) are given.

Supplementary Table 8

Discriminative motif discovery results summary for low epimutation CpG sites (n = 1,585) within enhancers.

Supplementary Table 9

Enriched gene lists in the two clades of cells inferred based on the epimutation information from methylation-based lineage tree shown in Fig. 3h. Differential expression is compared between cells that map to the SF3B1 wild type enriched clade and cells that map to the SF3B1 mutated enriched clade. Genes (n = 57) with absolute log2(fold-change) > 0.5 and Benjamini-Hochberg FDR adjusted weighted F test P-values < 0.2 are shown in bold. The direction of change between the two clades of cells inferred based on the epimutation information is also reported.

Supplementary Table 10

Enriched gene lists in the two clades of cells inferred based on the epimutation information from individual lineage trees integrating cells before treatment (T0) and during treatment (T1) for CLL03, CLL04, CLL05, and CLL11 individual patient samples. Differential expression is compared between T1 cells that map to the T1 enriched clades and T1 cells that map to the T0 enriched clades (CLL03, n = 9,808 genes; CLL04, n = 10,210 genes; CLL05, n = 10,426 genes; CLL11, n = 1,400 genes). P-values are shown for weighted F test (see Methods for details).

Supplementary Table 11

Enriched gene lists in the two clades of cells inferred in combination across CLL03, CLL04, CLL05, and CLL11 patient samples based on the epimutation information from lineage trees integrating cells before treatment (T0) and during treatment (T1). Differential expression is compared between T1 cells that map to the T1 enriched clades and T1 cells that map to the T0 enriched clades. Weighted F test P-values were combined across patient samples (CLL03, CLL04, CLL05, and CLL11; 8,372 genes expressed >5 cells in ≥3 samples) via Fisher's combined probability test. Average of log2(fold change) across samples is also reported.
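For readers unfamiliar with Fisher's combined probability test, the Python sketch below shows how per-sample P values for one gene can be combined. Under the null hypothesis, −2 Σ ln(p) follows a chi-squared distribution with 2k degrees of freedom for k independent P values; for even degrees of freedom the survival function has a closed form, so no external library is needed. The function name is an illustrative assumption, and all inputs are assumed to be strictly greater than 0.

```python
from math import exp, factorial, log


def fisher_combined_pvalue(pvalues):
    """Combine independent P values with Fisher's method.

    Uses the closed-form chi-squared survival function for 2k degrees of
    freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!, where x = -2 * sum(ln p).
    Every P value must be > 0.
    """
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # equals x / 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))
```

With a single P value the function returns that value unchanged, which is a quick sanity check on the formula.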

Supplementary Table 12

Characteristics of the 12 CLL patients used in this study.

Supplementary Table 13

List of genes tested through Genoptix (Genoptix, Carlsbad, CA) clinical grade CLL gene panel testing.

About this article

Cite this article

Gaiti, F., Chaligne, R., Gu, H. et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576–580 (2019). https://doi.org/10.1038/s41586-019-1198-z

  • DOI: https://doi.org/10.1038/s41586-019-1198-z
