Adjacent CpG sites in mammalian genomes can be co-methylated owing to the processivity of methyltransferases or demethylases, yet discordant methylation patterns have also been observed, which are related to stochastic or uncoordinated molecular processes. We focused on a systematic search and investigation of regions in the full human genome that show highly coordinated methylation. We defined 147,888 blocks of tightly coupled CpG sites, called methylation haplotype blocks, after analysis of 61 whole-genome bisulfite sequencing data sets and validation with 101 reduced-representation bisulfite sequencing data sets and 637 methylation array data sets. Using a metric called methylation haplotype load, we performed tissue-specific methylation analysis at the block level. Subsets of informative blocks were further identified for deconvolution of heterogeneous samples. Finally, using methylation haplotypes we demonstrated quantitative estimation of tumor load and tissue-of-origin mapping in the circulating cell-free DNA of 59 patients with lung or colorectal cancer.
We thank S. Kaushal for managing and handling patient samples in the UCSD Moores Cancer Center Biorepository Tissue Technology Shared Resource, and S.M. Lippman, R. Liu and B. Ren for insightful discussions. This study was supported by US National Institutes of Health grants R01GM097253 (Kun Zhang), R01CA217642 (Kang Zhang), R01EY025090 (Kang Zhang) and P30CA23100 (S.M.L.), and a VA Merit Award (Kang Zhang).
S.G., D.D. and Kun Zhang are listed as inventors in patent applications related to the methods disclosed in this manuscript, and Kun Zhang is a co-founder and scientific advisor of Singlera Genomics, Inc.
Integrated supplementary information
(a) Distribution of MHB sizes. (b) Distribution of MHB CpG densities (CpGs/bp). (c) Co-localization with known genomic features, broken down by CpG density. We split all MHBs into quartiles by CpG density: (0, 0.046], (0.046, 0.097], (0.097, 0.155], (0.155, 6]. Square brackets denote inclusive bounds. MHBs in the first quartile (those with the lowest CpG densities) are mostly in CGI shelves or shores, and are enriched for LADs, LOCKs and enhancers.
Supplementary Figure 2 Loss of CpG linkage disequilibrium replicated in two additional tumor tissue samples from patients with kidney cancer.
Two kidney cancer WGBS data sets were downloaded from NCBI GEO (GSE63183) and processed with the same computational procedures.
Supplementary Figure 3 Validation of MHBs with TCGA Methylation HM450K beadchip and ENCODE RRBS data.
(a) Squared Pearson correlation coefficient (r²) versus methylation LD r². (b) Pearson correlation coefficients for CpGs in RRBS and HM450K data were significantly higher in regions overlapping MHBs than in regions not overlapping MHBs. IN denotes RRBS or HM450K CpGs within MHBs. OUT denotes RRBS or HM450K regions outside MHBs.
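The methylation LD r² in panel (a) can be illustrated with the standard linkage disequilibrium formula applied to read-level methylation states at two CpG sites. The sketch below is not the authors' pipeline: the function name `methylation_ld_r2` and the input layout (one `(state_a, state_b)` pair per read, with 1 meaning methylated) are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def methylation_ld_r2(pairs: Iterable[Tuple[int, int]]) -> float:
    """Return LD r^2 between two CpG sites.

    Each pair holds the methylation state (1 = methylated, 0 = unmethylated)
    of site A and site B as observed together on a single read.
    """
    observed = list(pairs)
    n = len(observed)
    if n == 0:
        raise ValueError("at least one read covering both sites is required")

    # Marginal methylation frequencies at each site.
    p_a = sum(a for a, _ in observed) / n
    p_b = sum(b for _, b in observed) / n
    # Frequency of reads methylated at both sites.
    p_ab = sum(1 for a, b in observed if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # One site is invariant across reads, so r^2 is undefined.
        return float("nan")

    d = p_ab - p_a * p_b  # classical disequilibrium coefficient D
    return d * d / denom


coupled = [(1, 1), (0, 0), (1, 1), (0, 0)]      # sites always agree
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]  # sites are uncorrelated
print(methylation_ld_r2(coupled), methylation_ld_r2(independent))
```

For binary states this r² equals the squared Pearson correlation of the two sites across reads; the Pearson r² on the other axis of panel (a) is computed from array or RRBS data rather than from read-level haplotypes.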
Supplementary Figure 4 Profiles of H3K27ac, H3K4me3 and H3K4me1 over methylation haplotype blocks for 12 human adult tissue types.
The x-axis denotes the distance from the centers of MHBs (±1,000 bp) and the y-axis denotes the average read density in RPKM (input-normalized reads per kilobase per million). Roadmap Epigenomics histone data were downloaded from NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/).
Supplementary Figure 5 PCA of human tissues and cells based on methylation haplotype loads in MHB regions.
WGBS data sets for tissues and cells were downloaded from five other studies, and WGBS data for 10 adult tissues were generated in this study.
Supplementary Figure 6 Distinct patterns of functional enrichment for TFBS associated with layer-specific MHBs.
(a) Venn diagrams of transcription factors (TFs) with binding sites associated with layer-specific hypo- or hyper-MHL regions. (b) Functional enrichment analysis of associated TFBS using GREAT (http://bejerano.stanford.edu/great/public/html/).
Supplementary Figure 7 Distribution of incidence of cancer-associated HMH in plasma samples from patients with colorectal cancer or lung cancer.
The y-axis denotes the frequency of caHMH and the x-axis denotes the incidence (number of samples) of the caHMH in CRC plasma samples (a) or LC plasma samples (b). The majority of caHMH are patient-specific, whereas a few have high incidence among the cancer plasma samples.
Supplementary Figure 8 Deconvolution of cancer and normal plasma samples using non-negative decomposition with quadratic programming.
(a) Deconvolution accuracy as a function of tumor fraction using simulated data. (b) Cancer DNA proportions estimated by deconvolution of plasma samples using CCT or LCT as the tumor reference.
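The idea of non-negative decomposition can be sketched as a least-squares fit of a mixture profile to reference profiles with the mixing fractions constrained to be non-negative. The example below uses projected gradient descent as a simple stand-in for a quadratic programming solver; it is not the authors' implementation, and the function name `deconvolve`, the toy reference values and the final rescaling to sum to 1 are all assumptions made for illustration.

```python
from typing import List, Sequence


def deconvolve(reference: Sequence[Sequence[float]],
               mixture: Sequence[float],
               steps: int = 2000) -> List[float]:
    """Estimate non-negative mixing fractions by projected gradient descent.

    reference[i][k] is the signal of feature i in reference component k;
    mixture[i] is the signal of feature i in the mixed sample.
    """
    n_feat = len(reference)
    n_comp = len(reference[0])

    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so 1 / norm is a safe step size for the least-squares gradient.
    norm = sum(v * v for row in reference for v in row)
    if norm == 0:
        raise ValueError("reference matrix must not be all zeros")
    step = 1.0 / norm

    weights = [1.0 / n_comp] * n_comp
    for _ in range(steps):
        residual = [
            sum(row[k] * weights[k] for k in range(n_comp)) - y
            for row, y in zip(reference, mixture)
        ]
        gradient = [
            sum(reference[i][k] * residual[i] for i in range(n_feat))
            for k in range(n_comp)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]

    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy reference: columns are (normal, tumor) signals for four features.
toy_reference = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
# Simulated plasma sample: 80% normal, 20% tumor.
toy_mixture = [0.8 * n + 0.2 * t for n, t in toy_reference]
fractions = deconvolve(toy_reference, toy_mixture)
print([round(f, 3) for f in fractions])
```

Simulating mixtures at known tumor fractions and comparing them with the recovered fractions is one way to produce an accuracy curve of the kind described in panel (a).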
Supplementary Figure 9 Estimated tumor fraction in plasma correlated with the normalized yield of DNA extraction from plasma.
The color bar represents the number of tissue-specific MHBs (for each respective tissue) over the MHL threshold in each plasma sample.
Supplementary Figure 11 Distributions of counts of highly methylated tsMHBs in human plasma samples.
(a) Distributions of counts of ts-MHBs in normal plasma samples for 11 reference tissues. (b) Distributions of counts of ts-MHBs in lung cancer plasma samples for lung tissue or pan-cancer tissue (CT). (c) Distributions of counts of ts-MHBs in colorectal cancer plasma samples for colon tissue or pan-cancer tissue (CT).
Distribution of enrichment Z-scores in each set of reference-specific MHBs for colon cancer plasma samples (a) and lung cancer plasma samples (b). Integrating signatures from cancer and tissue of origin (Colon+CT; Lung+CT) improves the prediction accuracy (c,d) for both types of plasma samples compared with using cancer signatures alone (CT). The ROC curves were created by adjusting the Z-score cutoff for calculating specificities and sensitivities. AUC denotes area under the curve.
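Building an ROC curve by adjusting a Z-score cutoff can be sketched as follows: each cutoff yields one (false-positive rate, sensitivity) point, and the AUC is the area under those points by the trapezoidal rule. The function names `roc_points` and `auc` and the Z-scores below are invented for illustration and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def roc_points(pos: Sequence[float],
               neg: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs for every cutoff.

    A sample is called positive when its Z-score is >= the cutoff.
    """
    cutoffs = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every score calls nothing positive
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in pos) / len(pos)
        false_positive_rate = sum(s >= cutoff for s in neg) / len(neg)
        points.append((false_positive_rate, sensitivity))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical enrichment Z-scores for cancer and normal plasma samples.
cancer_scores = [3.0, 2.5, 1.8]
normal_scores = [0.5, 1.0, 2.0]
print(round(auc(roc_points(cancer_scores, normal_scores)), 4))
```

Specificity at each cutoff is one minus the false-positive rate, so the same points give the sensitivity and specificity pairs mentioned in the legend.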
About this article
Cite this article
Guo, S., Diep, D., Plongthongkum, N. et al. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 49, 635–642 (2017). https://doi.org/10.1038/ng.3805