Knowing the DNA methylation status of a cell is crucial to understanding its gene expression pattern and hence its phenotype. However, most available datasets of human DNA methylation cover only a small fraction of the roughly 30 million CpG sites in the human genome and are limited to in vitro-cultured cells or to bulk tissues containing undefined mixtures of cells. In Nature, Loyfer et al. report a publicly available, comprehensive atlas of the human methylome, together with cell type-specific markers and computational tools for analysing mixed samples, which provides a wealth of data for further discovery.

The atlas was constructed using whole-genome bisulfite sequencing (WGBS) of purified cell populations from healthy adults, comprising 205 samples from 137 individuals and representing 77 primary cell types. The authors analysed the WGBS data with software they developed to segment the genome into 2,783,421 ‘methylation blocks’ (each covering at least 3 adjacent CpG sites) that are differentially methylated between cell types. Notably, this focus on methylation blocks captures the regional nature of DNA methylation, something that was not possible with previous array-based datasets that profiled individual CpG sites.
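The idea of a methylation block can be illustrated with a toy segmentation. This is a minimal greedy sketch, not the authors' segmentation software: it assumes a block is a run of adjacent CpGs whose methylation levels (beta values between 0 and 1) agree with the block's running mean in every sample to within a tolerance, and it keeps only blocks of at least three CpGs, mirroring the minimum block size above. The function name `segment_blocks` and the `tol` threshold are hypothetical.

```python
from typing import List, Tuple


def segment_blocks(
    betas: List[List[float]],  # betas[i][s]: methylation level of CpG i in sample s
    tol: float = 0.2,
    min_cpgs: int = 3,
) -> List[Tuple[int, int]]:
    """Greedily group adjacent CpGs into blocks with homogeneous methylation.

    Returns half-open CpG index ranges (start, end) for blocks of >= min_cpgs CpGs.
    """
    blocks: List[Tuple[int, int]] = []
    if not betas:
        return blocks
    start = 0
    mean = list(betas[0])  # running per-sample mean of the current block
    for i in range(1, len(betas)):
        row = betas[i]
        # Extend the block only if CpG i agrees with the block mean in every sample.
        if all(abs(x - m) <= tol for x, m in zip(row, mean)):
            n = i - start  # number of CpGs already in the block
            mean = [(m * n + x) / (n + 1) for m, x in zip(mean, row)]
        else:
            # Close the current block; keep it only if it is long enough.
            if i - start >= min_cpgs:
                blocks.append((start, i))
            start, mean = i, list(row)
    if len(betas) - start >= min_cpgs:
        blocks.append((start, len(betas)))
    return blocks


if __name__ == "__main__":
    # Two samples: 4 CpGs methylated in A only, 2 intermediate CpGs, 3 CpGs methylated in B only.
    toy = [[0.9, 0.1]] * 4 + [[0.5, 0.5]] * 2 + [[0.1, 0.9]] * 3
    print(segment_blocks(toy))  # prints [(0, 4), (6, 9)]
```

In the toy input, the two intermediate CpGs form a run that is too short to count as a block, so only the two flanking runs are reported.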

Methylation patterns across blocks were highly similar between biological replicates, supporting the principle that methylation status is determined mainly by cell lineage rather than by genetic or environmental factors. The methylation patterns could also be used to confirm developmental relationships between cell types. When the 205 samples were clustered with an unsupervised agglomerative algorithm according to the methylation status of the 20,997 blocks with the highest variability across samples, the resulting fan diagram not only grouped biological samples of the same cell type together but also recapitulated known lineage relationships. For example, pancreatic islet cells clustered with pancreatic duct and acinar cells and then with hepatocytes, all of which share an endodermal origin.
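The clustering step can be sketched in two parts: pick the blocks whose methylation varies most across samples, then merge samples bottom-up. The source does not specify the linkage or distance used, so this sketch assumes average linkage on the mean absolute difference in methylation over the selected blocks. The names `top_variable_blocks` and `average_linkage` are hypothetical, and a real analysis of 205 samples would use an optimized clustering library rather than this naive loop.

```python
from statistics import pvariance
from typing import FrozenSet, List, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def top_variable_blocks(matrix: List[List[float]], k: int) -> List[int]:
    """Indices of the k blocks (rows) with the highest variance across samples (columns)."""
    order = sorted(range(len(matrix)), key=lambda r: pvariance(matrix[r]), reverse=True)
    return order[:k]


def average_linkage(matrix: List[List[float]], blocks: List[int]) -> List[Merge]:
    """Agglomerative clustering of samples; returns merges in the order they occur."""
    n_samples = len(matrix[0])

    def dist(a: int, b: int) -> float:
        # Mean absolute methylation difference over the selected blocks.
        return sum(abs(matrix[r][a] - matrix[r][b]) for r in blocks) / len(blocks)

    pair = {(a, b): dist(a, b) for a in range(n_samples) for b in range(a + 1, n_samples)}
    clusters = [frozenset([s]) for s in range(n_samples)]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster sample pairs.
                links = [pair[min(a, b), max(a, b)] for a in clusters[i] for b in clusters[j]]
                score = sum(links) / len(links)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    # Rows are blocks, columns are samples: two samples each of two hypothetical cell types.
    betas = [
        [0.90, 0.85, 0.10, 0.15],
        [0.10, 0.12, 0.80, 0.90],
        [0.50, 0.50, 0.50, 0.50],  # invariant block, dropped by the variance filter
    ]
    for left, right, d in average_linkage(betas, top_variable_blocks(betas, k=2)):
        print(sorted(left), sorted(right), round(d, 3))
```

Replicates of the same hypothetical cell type merge first, which is the behaviour the fan diagram relies on to group same-type samples.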

To identify cell type-specific markers, the authors organized the samples into 39 groups of the same cell type and searched for blocks that were differentially methylated (mainly unmethylated) in one group compared with all other groups. The top 25 differentially unmethylated blocks for each cell type together comprise an atlas of 953 cell type-specific methylation markers, which has considerable potential for the analysis of composite tissue samples and cell-free DNA (cfDNA). Using these markers, the authors developed a deconvolution algorithm for DNA methylation sequencing data and confirmed, with in silico mixtures of DNA sequences, that it accurately infers cell type composition. Consistent with previous studies using more limited methylome data, cfDNA from the blood of healthy donors was found to derive mainly from leukocytes, with minor contributions from vascular endothelial cells and hepatocytes; surprisingly, the new atlas also revealed a significant contribution from megakaryocytes and erythrocyte progenitor cells. Applying the same algorithm to WGBS data from 52 patients hospitalized with COVID-19 showed a significant contribution of vascular endothelial cells to cfDNA that correlated with disease severity. Similar analyses of cfDNA could provide insight into tissue-specific damage in COVID-19 and many other diseases.
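Marker selection can be sketched as ranking, for each cell-type group, the blocks that are unmethylated in that group and methylated in every other group. This is a simplified illustration under stated assumptions, not the authors' selection procedure: it works on group-mean methylation levels, uses a hypothetical ceiling `max_target` for the target group, and scores each block by its margin over the least-methylated other group. Only `top_n=25` comes from the text.

```python
from typing import Dict, List


def select_markers(
    means: Dict[str, List[float]],  # means[block_id][g]: mean methylation of group g
    groups: List[str],
    top_n: int = 25,
    max_target: float = 0.3,  # hypothetical: target group must be mostly unmethylated
) -> Dict[str, List[str]]:
    """Return up to top_n differentially unmethylated block IDs per group."""
    markers: Dict[str, List[str]] = {}
    for g, name in enumerate(groups):
        scored = []
        for block_id, row in means.items():
            target = row[g]
            others = [v for k, v in enumerate(row) if k != g]
            # Margin: how much less methylated the target is than the closest other group.
            margin = min(others) - target
            if target <= max_target and margin > 0:
                scored.append((margin, block_id))
        # Largest margin first; ties broken by block ID for a deterministic result.
        scored.sort(key=lambda t: (-t[0], t[1]))
        markers[name] = [block_id for _, block_id in scored[:top_n]]
    return markers


if __name__ == "__main__":
    toy_means = {
        "b1": [0.05, 0.90, 0.85],
        "b2": [0.20, 0.60, 0.70],
        "b3": [0.90, 0.10, 0.80],
        "b4": [0.50, 0.50, 0.50],
    }
    print(select_markers(toy_means, ["A", "B", "C"], top_n=2))
    # prints {'A': ['b1', 'b2'], 'B': ['b3'], 'C': []}
```

A group can end up with fewer than `top_n` markers when few blocks pass the filter, as group C does in the toy input.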
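Deconvolution can be posed as finding non-negative cell-type proportions whose weighted sum of reference marker profiles best matches a mixed sample. The text does not spell out the authors' algorithm, so this sketch assumes a standard non-negative least squares (NNLS) formulation, solved by projected gradient descent to stay dependency-free; in practice a dedicated solver such as `scipy.optimize.nnls` would be faster. The reference matrix is hypothetical: each row is a marker, each column a cell type, and each entry a fraction of unmethylated reads.

```python
from typing import List


def nnls_pg(A: List[List[float]], b: List[float], iters: int = 2000) -> List[float]:
    """Minimize 0.5 * ||Ax - b||^2 subject to x >= 0 by projected gradient descent."""
    m, n = len(A), len(A[0])
    # Step 1/||A||_F^2 is safe: ||A||_F^2 bounds the largest eigenvalue of A^T A.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]
    return x


def deconvolve(reference: List[List[float]], sample: List[float]) -> List[float]:
    """Estimate cell-type proportions (summing to 1) for one mixed sample."""
    x = nnls_pg(reference, sample)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


if __name__ == "__main__":
    ref = [
        [0.90, 0.05, 0.05], [0.90, 0.05, 0.05],  # markers unmethylated in cell type 1
        [0.05, 0.90, 0.05], [0.05, 0.90, 0.05],  # markers unmethylated in cell type 2
        [0.05, 0.05, 0.90], [0.05, 0.05, 0.90],  # markers unmethylated in cell type 3
    ]
    truth = [0.7, 0.2, 0.1]
    # An in silico mixture: each marker's signal is the proportion-weighted sum of profiles.
    mixed = [sum(r * t for r, t in zip(row, truth)) for row in ref]
    print([round(p, 3) for p in deconvolve(ref, mixed)])  # prints [0.7, 0.2, 0.1]
```

Recovering known proportions from a simulated mixture, as in the usage example, is the same style of check as the in silico validation described above.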

To further characterize the differentially methylated regions, Loyfer et al. showed by gene-set analysis that the genes adjacent to these regions mainly reflect cell type-specific functions. They also showed that the cell type-specific unmethylated regions have high DNA accessibility and are enriched for H3K27ac, a histone mark of active enhancers and promoters, and for the enhancer mark H3K4me1, as well as for enhancer annotations mapped by ChromHMM and for binding motifs of key transcription factors of that cell type. Consistent with these differentially unmethylated regions acting as enhancers, the authors used a computational algorithm to identify nearby genes whose expression is increased in the cell type in which the marker is unmethylated; this analysis identified hallmark genes for many cell types, such as the insulin and glucagon genes for pancreatic islet markers. The unmethylated genomic regions specific to each of the 39 cell types were then mapped to generate a catalogue of cell type-specific putative enhancer regions for further analysis.
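The marker-to-gene linking step can be sketched as a window search followed by an expression filter. The authors' algorithm is not described in detail here, so everything below is an assumption for illustration: genes are linked if their transcription start site lies within a hypothetical 50 kb `window` of the marker and their expression in the marker's cell type is at least `min_fold` times the highest expression in any other cell type. Gene names and coordinates in the example are invented placeholders.

```python
from typing import Dict, List, Tuple

Marker = Tuple[str, int, str]  # (chromosome, position, cell type in which it is unmethylated)
Gene = Tuple[str, str, int]  # (gene name, chromosome, transcription start site)


def link_markers_to_genes(
    markers: List[Marker],
    genes: List[Gene],
    expression: Dict[str, Dict[str, float]],  # expression[gene][cell_type]
    window: int = 50_000,  # hypothetical search radius in bp
    min_fold: float = 2.0,  # hypothetical specificity threshold
) -> Dict[str, List[str]]:
    """Map each cell type to nearby genes that are preferentially expressed in it."""
    linked: Dict[str, List[str]] = {}
    for chrom, pos, cell in markers:
        for name, g_chrom, tss in genes:
            # Only consider genes whose TSS lies within `window` bp of the marker.
            if g_chrom != chrom or abs(tss - pos) > window:
                continue
            expr = expression.get(name, {})
            target = expr.get(cell, 0.0)
            background = max((v for c, v in expr.items() if c != cell), default=0.0)
            # Keep genes expressed at least `min_fold` times higher than in any other cell type.
            if target > 0 and target >= min_fold * background:
                hits = linked.setdefault(cell, [])
                if name not in hits:
                    hits.append(name)
    return linked


if __name__ == "__main__":
    markers = [("chr1", 100_000, "islet"), ("chr1", 500_000, "liver")]
    genes = [
        ("GENE_A", "chr1", 120_000),
        ("GENE_B", "chr1", 130_000),
        ("GENE_C", "chr2", 100_000),
        ("GENE_D", "chr1", 510_000),
    ]
    expression = {
        "GENE_A": {"islet": 50.0, "liver": 1.0},
        "GENE_B": {"islet": 5.0, "liver": 5.0},
        "GENE_C": {"islet": 100.0, "liver": 0.0},
        "GENE_D": {"islet": 0.0, "liver": 30.0},
    }
    print(link_markers_to_genes(markers, genes, expression))
    # prints {'islet': ['GENE_A'], 'liver': ['GENE_D']}
```

In the toy data, GENE_B is near the islet marker but not preferentially expressed, and GENE_C is specific but on another chromosome, so neither is linked.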

Whereas most of the differentially methylated regions were unmethylated in the cell type of interest, about 3% showed the opposite pattern: they were methylated in one cell type but unmethylated in the others. These hypermethylated regions were enriched for target sequences of the chromatin looping factor CTCF, suggesting that they might be involved in cell type-specific 3D genome organization. A comparison of DNA methylation patterns with published CTCF occupancy data at a locus that is methylated specifically in colon and small intestine suggested that DNA methylation prevents CTCF binding at this locus.
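Enrichment claims like this one are usually backed by a 2×2 contingency test. The text does not say which test the authors used; as an illustration only, this sketch computes a one-sided Fisher's exact test from hypothetical counts of hypermethylated and hypomethylated markers that do or do not overlap a CTCF motif, summing the hypergeometric tail directly with `math.comb`.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the hypergeometric
    null with all row and column totals fixed. Rows: hypermethylated vs
    hypomethylated markers; columns: overlapping vs not overlapping a CTCF motif.
    """
    row1 = a + b  # hypermethylated markers
    col1 = a + c  # markers overlapping a CTCF motif
    n = a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: all 3 hypermethylated markers overlap CTCF motifs, no hypomethylated ones do.
    print(fisher_greater(3, 0, 0, 3))  # prints 0.05
```

With real marker sets the counts would be far larger, and exact integer arithmetic in `math.comb` keeps the tail sum precise before the final division.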

Thus, the detailed, genome-wide mapping of the human methylome presented in this study offers both mechanistic insights into gene regulation and enhancer activity and great clinical potential for analysis of cfDNA in disease states.