A DNA methylation atlas of normal human cell types

Loyfer, Netanel; Magenheim, Judith; Peretz, Ayelet; Cann, Gordon; Bredno, Joerg; Klochendler, Agnes; Fox-Fisher, Ilana; Shabi-Porat, Sapir; Hecht, Merav; Pelet, Tsuria; Moss, Joshua; Drawshy, Zeina; Amini, Hamed; Moradi, Patriss; Nagaraju, Sudharani; Bauman, Dvora; Shveiky, David; Porat, Shay; Dior, Uri; Rivkin, Gurion; Or, Omer; Hirshoren, Nir; Carmon, Einat; Pikarsky, Alon; Khalaileh, Abed; Zamir, Gideon; Grinbaum, Ronit; Abu Gazala, Machmud; Mizrahi, Ido; Shussman, Noam; Korach, Amit; Wald, Ori; Izhar, Uzi; Erez, Eldad; Yutkin, Vladimir; Samet, Yaacov; Rotnemer Golinkin, Devorah; Spalding, Kirsty L.; Druid, Henrik; Arner, Peter; Shapiro, A. M. James; Grompe, Markus; Aravanis, Alex; Venn, Oliver; Jamshidi, Arash; Shemer, Ruth; Dor, Yuval; Glaser, Benjamin; Kaplan, Tommy

doi:10.1038/s41586-022-05580-6

Published: 04 January 2023

Nature volume 613, pages 355–364 (2023)

Abstract

DNA methylation is a fundamental epigenetic mark that governs gene expression and chromatin organization, thus providing a window into cellular identity and developmental processes¹. Current datasets typically include only a fraction of methylation sites and are often based either on cell lines that underwent massive changes in culture or on tissues containing unspecified mixtures of cells^2,3,4,5. Here we describe a human methylome atlas, based on deep whole-genome bisulfite sequencing, allowing fragment-level analysis across thousands of unique markers for 39 cell types sorted from 205 healthy tissue samples. Replicates of the same cell type are more than 99.5% identical, demonstrating the robustness of cell identity programmes to environmental perturbation. Unsupervised clustering of the atlas recapitulates key elements of tissue ontogeny and identifies methylation patterns retained since embryonic development. Loci uniquely unmethylated in an individual cell type often reside in transcriptional enhancers and contain DNA binding sites for tissue-specific transcriptional regulators. Uniquely hypermethylated loci are rare and are enriched for CpG islands, Polycomb targets and CTCF binding sites, suggesting a new role in shaping cell-type-specific chromatin looping. The atlas provides an essential resource for study of gene regulation and disease-associated genetic variants, and a wealth of potential tissue-specific biomarkers for use in liquid biopsies.

Main

Understanding how the same DNA sequence is interpreted differently in different cell types is a fundamental challenge of biology. Gene expression, DNA accessibility and chromatin packaging are well-established essential determinants of cellular phenotype. Underneath these lies DNA methylation, a stable epigenetic mark that underpins the lifelong maintenance of cellular identity.

Available human DNA methylation datasets suffer from major limitations. Multiple studies that have characterized methylomes of embryonic development, differentiation, cancer or other settings^6,7,8,9 have relied on the Illumina BeadChip platforms, which are limited to a predefined subset of 450,000 or 860,000 CpG methylation sites, representing just 1.5–3% of around 30 million CpG sites in the human genome¹⁰. In addition, by measuring each CpG site independently, such assays overlook coordinated patterns of DNA methylation occurring in blocks, the critical functional units of DNA methylation^11,12.

Most DNA methylation analyses interrogated primarily bulk tissue, thus precluding the study of minority cell types such as tissue-resident immune cells, fibroblasts or endothelial cells, whereas others analysed cultured cells, which may contain nonphysiological methylation patterns introduced in vitro¹³. As a partial solution, recent studies used single-cell RNA sequencing data from whole tissues to identify marker genes expressed in specific cell types, then identified specific CpGs whose methylation is anticorrelated with expression. These could be used on array-based methylomes to deconvolute bulk tissue and assess cell type composition or sample purity^14,15, but might be insufficiently accurate for identification of rare cellular contributions in liquid biopsies. Some studies of the human methylome did analyse isolated primary cells using whole-genome bisulfite sequencing (WGBS), but their scope was limited^2,4,5.

To overcome these limitations and to accurately characterize the human cell methylome, we performed deep genome-wide sequencing with paired-end, 150 base pair (bp)-long reads at an average sequencing depth of 30× (6.62× or greater) on fluorescence-activated cell sorting (FACS)-purified populations of 39 human cell type groups obtained from freshly dissociated adult healthy tissues. We coalesced methylation patterns across the entire genome into blocks of homogeneously methylated CpG sites and used these to study variation in methylation patterns across cell types. Here we identify and characterize genomic regions that are uniquely methylated in a tissue or cell-type-specific manner, provide vignettes of their possible biological function and introduce a fragment-level deconvolution algorithm with applications such as clinical diagnosis based on circulating cell-free DNA (cfDNA) methylation.

Methylation atlas of human cell types

To portray genome-wide DNA methylation across a variety of cell types, we performed WGBS (150-bp-long paired-end reads to a mean depth of at least 30×) on 205 samples representing 77 primary cell types from 137 consenting donors. These were carefully sorted and mapped to the human genome (hg19, hg38). Average sample purity (that is, proportion of material from desired cell type) was over 90% as determined by flow cytometry, gene expression and DNA methylation analysis. Several samples showed lower purity (for example, colon fibroblasts 78%, smooth muscle cells (SMC) 82%, endothelial cells 86% or adipocytes 87%). Detailed descriptions of sample isolation and purity estimations, as well as sample information, are provided in Supplementary Table 1, Supplementary Figs. 1–3 and Supplementary Information.

The cell types analysed (Fig. 1) represent most major human cell types, allowing a composite view of physiological systems (for example, gastrointestinal tract, haematopoietic cells and pancreas), as well as a comparison of similar cell types in different environments (for example, tissue-resident macrophages).

**Fig. 1: Methylation atlas of the adult human body.**

The 205 methylomes show great similarities between replicates with distinctive changes between cell types in a block-like manner, as shown in Fig. 1. We sought to identify genomic regions differentially methylated in specific cell types to shed light on cell-type-specific biological processes, define cell identity and facilitate development of methylation biomarkers to identify the cellular origin of circulating cfDNA fragments^{1,11,12,16,17,18,19,20,21}.

We developed wgbstools, a computational machine learning suite, to represent, compress, visualize and analyse WGBS data (https://github.com/nloyfer/wgbs_tools). We segmented the genome into 7,104,162 nonoverlapping continuous blocks by identification of change points in DNA methylation patterns across multiple conditions. Each block spans highly correlated CpG sites that are similarly methylated within each sample but may covary across cell types (Supplementary Information). We retained 2,783,421 methylation blocks of at least three CpGs with an average length of 544 bp (interquartile range (IQR) = 565 bp) and eight CpGs (IQR = 5 CpGs). Robust analysis of these compact genomic units is more straightforward than that of individual CpG sites and, because of the regional nature of methylation, the blocks can be viewed as the biological ‘atoms’ of human DNA methylation¹².
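As a rough illustration of working at the block level, the sketch below averages methylation over predefined blocks of consecutive CpGs. It is not the wgbstools implementation: the function name, the `(methylated, total)` count representation, the pooling of reads within a block and the toy numbers are all assumptions made for this example, and the segmentation step itself is not reproduced.

```python
from typing import List, Optional, Tuple

# Illustrative representation (an assumption, not the wgbstools format):
# each CpG is (methylated_reads, total_reads); each block is a half-open
# range [start, end) of CpG indices.
CpG = Tuple[int, int]
Block = Tuple[int, int]


def block_methylation(cpgs: List[CpG], blocks: List[Block],
                      min_cpgs: int = 3) -> List[Optional[float]]:
    """Return pooled average methylation per block.

    Blocks with fewer than `min_cpgs` CpGs, or with no coverage,
    are reported as None.
    """
    averages: List[Optional[float]] = []
    for start, end in blocks:
        if end - start < min_cpgs:
            averages.append(None)
            continue
        methylated = sum(m for m, _ in cpgs[start:end])
        total = sum(t for _, t in cpgs[start:end])
        averages.append(methylated / total if total else None)
    return averages


# Toy data: a methylated block, an unmethylated block and a short block.
toy_cpgs = [(9, 10), (8, 10), (10, 10), (1, 10), (0, 10), (1, 10), (5, 10)]
toy_blocks = [(0, 3), (3, 6), (6, 7)]
block_averages = block_methylation(toy_cpgs, toy_blocks)
print(block_averages)
```

Pooling reads across the CpGs of a block weights each CpG by its coverage; averaging per-CpG ratios instead would be an equally plausible choice.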

Interindividual variation in methylation

Methylation patterns were extremely robust across different individuals. For most cell types, 0.5% or less of blocks showed a difference of 50% or more across different donors compared with 4.9% among samples of different cell types (Extended Data Fig. 4). This high similarity in DNA methylation across donors is on a par with the estimated interindividual variability of genomic sequence²². Although the 50% threshold is somewhat arbitrary, other thresholds (35–50%) show a similar trend, with 0.5% or less of blocks being variable. Similar interindividual variation was observed in replicates obtained from different laboratories (Supplementary Table 1). Strikingly, for cell types with n ≥ 3 biological replicates, 195 of 197 samples (99%) showed the highest similarity to another replicate (rather than to another cell type from the same donor). These results demonstrate the reproducibility of preparations but also, in agreement with previous studies⁶, highlight the fundamental biological phenomenon that DNA methylation is primarily determined by cell lineage and cell-type-specific programmes rather than by genetic or environmental factors.
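The comparison above can be sketched as the share of blocks whose average methylation differs by at least a given threshold between two samples. This is a minimal illustration only; the function name, the handling of missing values and the toy vectors are assumptions, not the authors' code.

```python
from typing import List, Optional


def fraction_variable_blocks(sample_a: List[Optional[float]],
                             sample_b: List[Optional[float]],
                             threshold: float = 0.5) -> float:
    """Fraction of blocks whose methylation differs by >= threshold.

    Blocks lacking a value in either sample are skipped.
    """
    pairs = [(a, b) for a, b in zip(sample_a, sample_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no blocks are covered in both samples")
    variable = sum(1 for a, b in pairs if abs(a - b) >= threshold)
    return variable / len(pairs)


# Toy vectors (made up): two donors of one cell type and a second cell type.
donor_1 = [0.90, 0.10, 0.95, 0.05]
donor_2 = [0.88, 0.12, 0.90, 0.10]
other_type = [0.90, 0.80, 0.20, 0.05]

same_type = fraction_variable_blocks(donor_1, donor_2)
different_type = fraction_variable_blocks(donor_1, other_type)
print(same_type, different_type)
```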

Methylation records developmental history

Whereas DNA methylation patterns reflect the functional identity of a cell, they could also be used to track its developmental history. To identify patterns shared by the progeny of early progenitors, we calculated average methylation within blocks of at least four CpGs and selected those showing the highest variability across all samples (21,000 blocks, top 1%; Supplementary Table 2). We then clustered all 205 methylomes using an unsupervised agglomerative algorithm that iteratively identifies and connects the two closest samples regardless of their labelling²³. This analysis systematically grouped biological samples of the same cell type (Fig. 2), similar to array-based clustering of purified human blood cells⁶. This supports the reproducibility of cell isolation and suggests that three or four replicates of each normal cell type are sufficient to infer its methylation patterns for practical applications such as biomarker identification.
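The two steps described above, selecting the most variable blocks and then repeatedly joining the two closest groups of samples, can be sketched as follows. The distance (mean absolute difference) and the average-linkage rule used to compare groups are assumptions chosen for illustration, as are all function names and the toy matrix; the text specifies only an unsupervised agglomerative procedure.

```python
from itertools import combinations
from statistics import pvariance


def top_variable_blocks(matrix, fraction=0.01):
    """matrix[s][b] is the methylation of block b in sample s.
    Return the indices of the most variable blocks."""
    n_blocks = len(matrix[0])
    variances = [pvariance([row[b] for row in matrix]) for b in range(n_blocks)]
    keep = max(1, int(n_blocks * fraction))
    ranked = sorted(range(n_blocks), key=lambda b: variances[b], reverse=True)
    return ranked[:keep]


def distance(x, y):
    """Mean absolute difference between two methylation vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cluster_distance(ci, cj, vectors):
    """Average pairwise distance between two clusters (average linkage)."""
    total = sum(distance(vectors[a], vectors[b]) for a in ci for b in cj)
    return total / (len(ci) * len(cj))


def agglomerate(vectors):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]],
                                                  clusters[p[1]], vectors))
        merges.append((clusters[i], clusters[j]))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return merges


# Toy matrix (made up): samples 0-1 and 2-3 are replicates of two cell types.
toy = [
    [0.10, 0.90, 0.5, 0.9, 0.5],
    [0.12, 0.88, 0.5, 0.9, 0.5],
    [0.90, 0.10, 0.5, 0.9, 0.5],
    [0.88, 0.12, 0.5, 0.9, 0.5],
]
top = top_variable_blocks(toy, fraction=0.4)
vectors = [[row[b] for b in top] for row in toy]
merges = agglomerate(vectors)
print(top, merges)
```

In this toy case replicates are joined before the two cell types are connected, mirroring the behaviour reported for the atlas.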

**Fig. 2: Unsupervised agglomerative clustering reflects human developmental lineage of healthy cell types.**

Strikingly, the resulting fanning diagram recapitulates key elements of lineage relationships among human tissues. For example, pancreatic islet cell types (alpha, beta and delta), which originate from the same embryonic endocrine progenitor²⁴, densely cluster together. Consistent with methylomes reflecting lineage rather than function, islet cells further cluster with pancreatic duct and acinar cells, and then with hepatocytes, with which they share endodermal origins. Conversely, endoderm-derived islet cells do not cluster with ectoderm-derived neurons²⁵ despite common tissue-specific gene regulation and exocytosis machinery²⁶.

Additional examples include the clustering of gastric, small intestine and colon epithelial cells; the clustering of all blood cell types; and the clustering of multiple mesoderm-derived cell types including vascular endothelial cells, adipocytes and skeletal muscle. Interestingly, lung bronchial epithelium clustered with oesophagus and oral epithelium whereas lung alveolar epithelium clustered with intestinal epithelium, consistent with evidence of early developmental origins of the alveolar cell lineage²⁷.

Some methylation patterns were common to lineages that formed during early developmental stages. For example, 892 regions were unmethylated in epithelial cells derived from early endodermal derivatives and methylated in mesoderm- and ectoderm-derived cells (Methods). We suggest that these were demethylated in the endoderm germ layer, with derived cell types retaining these patterns decades later (Extended Data Fig. 5a). Because endoderm derivatives do not share common function or gene expression, this provides yet another example of methylation patterns as a stable lineage mark.

Finally, we applied the same segmentation and clustering approach to a published methylation atlas from the Roadmap Epigenomics project⁴. The algorithm did not group related cell types, and often clustered samples based on donor identity. This further emphasizes the importance of careful purification of homogeneous cell types, avoiding mixed cell populations (Extended Data Fig. 5b).

Cell-type-specific methylation markers

We next turned to study genomic regions differentially methylated in a cell-type-specific manner. We organized the 205 samples into 39 groups of specific cell types, including blood cell types (B, T, natural killer (NK), granulocytes, monocytes and tissue-resident macrophages), breast epithelium (basal and luminal), lung epithelium (alveolar and bronchial), pancreatic endocrine (alpha, beta and delta) and exocrine (acinar and duct) cells, vascular endothelial cells from various sources, cardiomyocytes and cardiac fibroblasts and more. We also defined 12 supergroups in which related cell types were grouped, including muscle cells, gastrointestinal epithelium, pancreas and more (Supplementary Table 3).

We then focused on differentially methylated blocks comprising five or more CpGs that are unmethylated in one group of cell types but methylated in all other samples, or vice versa. Intriguingly, almost all regions (97%) were unmethylated in one cell type and methylated in all others. We then sorted these differential regions by absolute difference in methylation in target cell type versus all other samples (Methods and Supplementary Information).

The top 25 differentially unmethylated regions for each cell type comprise a human cell-type-specific methylation atlas of 1,246 markers (Fig. 3 and Supplementary Table 4). These regions are uniquely unmethylated in particular cell types (average methylation 13%) and methylated in all other samples (average methylation 91%), and can serve as sensitive biomarkers for quantification of the presence of DNA from a specific cell type in a mixture. The markers include 953 cell-type-specific unmethylated loci, as well as an additional 293 loci that are unmethylated in a few related cell types. A fragment-level analysis further shows that the vast majority of DNA fragments at these regions are unmethylated in the target cell type compared with almost none in all other cell types (Extended Data Fig. 6). The atlas has various applications, including the analysis of circulating cell-free DNA fragments^{18,19,20,21,28,29,30}. Importantly, only about 1% of cell type-specific markers are covered by reduced representation bisulfite sequencing (RRBS), 4–8% by methyl-sequencing hybrid capture panels and 14–24% are represented in single-CpG 450K/EPIC arrays¹⁰, emphasizing the benefits of whole-genome sequencing for exhaustive identification of biomarkers.

**Fig. 3: A human methylation atlas of 205 samples across 39 cell type groups.**

Human cell-type-specific regulatory maps

We next turned to characterize these sets of cell-type-specific differentially unmethylated regions. For this we identified the top 250 unmethylated markers for each cell type (Supplementary Table 4b) and used GREAT to identify those genes adjacent to each group of markers, and to test their enrichment for various gene-set annotations³¹. Genes adjacent to loci uniquely unmethylated in a given cell type typically reflect the functional identity of that cell type. For example, genes near B cell markers were enriched for B cell morphology, differentiation, IgM levels and lymphopoiesis; NK cell markers were associated with NK cell-mediated cytotoxicity, the haematopoietic system, cytotoxicity and lymphocyte physiology; Fallopian tube markers were enriched for egg coat and perivitelline space; and cardiomyocyte markers for cardiac relaxation, systolic pressure, muscle development and hypertrophy (Supplementary Table 5).

We then analysed the DNA accessibility and chromatin packaging of cell-type-specific markers as defined by assay for transposase-accessible chromatin using sequencing (ATAC–seq), DNase I hypersensitive site sequencing (DNaseI–seq)^4,32 and histone marks indicative of active promoters and enhancers⁴. The top 250 unmethylated markers for monocytes and macrophages are highly accessible and characterized by H3K27ac and H3K4me1 in monocytes, whereas markers of other cell types show no enrichment in monocytes (Fig. 4a), with similar results for markers of other cell types (Extended Data Fig. 7). We also show strong coordinated enrichment of chromHMM enhancer annotations at cell-type-specific markers³³ (Fig. 4a). These findings are consistent with previous studies that have associated tissue-specific demethylation with gene enhancers^1,34.

**Fig. 4: Cell-type-specific markers as putative enhancers.**

To further assess the biological importance of cell-type-specific unmethylated regions, we studied their association with transcription factors (TFs) that could either affect DNA methylation or bind DNA in a cell-type-specific manner, depending on methylation and chromatin^35,36,37,38. We identified the top 1,000 unmethylated markers per cell type (Supplementary Table 4c) and performed motif analysis using HOMER³⁹ to calculate the enrichment of known TF binding motifs (Supplementary Table 6a). For most cell types the top motifs included master regulators and key TFs (Fig. 4b). For example, B cells are enriched for Ebf2/HEB/E2A, granulocytes for CEBP/AP1/ETS and T cells for ETS/RUNX. This association between cell-type-specific unmethylated regions and TF binding motifs can identify new gene regulatory circuits and expose distal enhancers active in specific cell types.

We aimed to identify the target genes of putative enhancers marked by cell-type-specific demethylation. Top markers frequently fall within intronic regions and are likely to regulate these genes (for example, glucagon in pancreatic alpha cells, NPPA and MYL4 in cardiomyocytes and MBP in oligodendrocytes; Supplementary Table 7), or proximally to probable targets (for example, a beta cell marker 5 kb from the insulin gene). Other markers are further apart from their target genes. We devised a computational algorithm to identify genes in the proximity of cell-type-specific markers showing increased gene expression levels under matching conditions (Methods). This highlighted hallmark genes for many cell types and suggested putative targets for many of the top 25 unmethylated markers for each cell type. For example, hepatocyte markers were associated with APOE, APOC1, APOC2 and the glucagon receptor. Similarly, cardiomyocyte markers were associated with NPPA, NPPB and myosin genes; and pancreatic islet markers with insulin and glucagon genes (Supplementary Table 7). These findings further support the principle that loci specifically unmethylated in a given cell type are probably enhancers positively regulating genes expressed in this cell type, often controlling adjacent genes. We note, however, that genes adjacent to a locus specifically unmethylated in a given cell type are often broadly expressed beyond this cell type (Discussion).

To generate a catalogue of putative regulatory regions in each cell type we applied a fragment-level analysis across all samples from each cell type, independently of other cell types. We scanned the entire genome and identified genomic regions in which at least 85% of DNA fragments with at least four CpGs are unmethylated (Methods). This identified a set of unmethylated genomic regions in each of the 39 cell type groups analysed, including 36,111 regions on average (Supplementary Dataset 1). These regions were then annotated for genomic features, showing that 56% on average overlapped CpG islands, 46% were near promoter regions and 44% overlapped CTCF binding sites, thus highlighting the regulatory and structural roles of unmethylated loci. When available, we crossed these regions with chromatin immunoprecipitation sequencing (ChIP–seq) peaks from ENCODE⁵ and Roadmap Epigenomics⁴ under matching conditions, including H3K4me3, H3K27ac, H3K4me1, H3K27me3, CTCF and ATAC–seq, and generated a cell-type-specific catalogue of putative enhancer regions comprising unmethylated regions that overlap H3K27ac, but not H3K4me3, peaks (Supplementary Dataset 2). Motif analysis of these regions identified key TFs in each cell type, similar to those shown in Fig. 4 (Supplementary Table 6b,c).
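The fragment-level criterion above can be illustrated for a single candidate region. In this sketch each fragment is a string with one character per covered CpG; the `C`/`T`/`.` encoding, the rule that a fragment counts as unmethylated when at least 75% of its CpGs are unmethylated, and the function name are assumptions for illustration. Only the thresholds of 85% of fragments and at least four CpGs per fragment come from the text.

```python
def is_unmethylated_region(fragments, min_cpgs=4,
                           fragment_cutoff=0.75, region_cutoff=0.85):
    """Decide whether a region is unmethylated at the fragment level.

    Each fragment is a string over 'C' (methylated CpG), 'T' (unmethylated
    CpG) and '.' (no call). `fragment_cutoff` is an illustrative assumption.
    """
    unmethylated = 0
    informative = 0
    for fragment in fragments:
        calls = [c for c in fragment if c in "CT"]
        if len(calls) < min_cpgs:
            continue  # too few CpGs to classify this fragment
        informative += 1
        if calls.count("T") / len(calls) >= fragment_cutoff:
            unmethylated += 1
    if informative == 0:
        return False
    return unmethylated / informative >= region_cutoff


# Toy fragments (made up).
mostly_open = ["TTTT"] * 9 + ["CCCC"]                 # 90% unmethylated
mixed = ["TTTT", "TTTTT", "TTCT", "TTT", "CCCC"]     # 75% of informative
print(is_unmethylated_region(mostly_open), is_unmethylated_region(mixed))
```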

Cell-type-specific hypermethylated loci

We studied those genomic regions methylated in one cell type but unmethylated elsewhere in the human body. These are enriched for CpG islands (38% of methylated regions compared with 1.7–2.7% of cell-type-specific unmethylated regions), and are marked by H3K27me3 and Polycomb in other cell types (Fig. 5a–c), as previously reported for cancer and developmental processes^40,41. These cell-type-specific hypermethylated regions were generally less significant for motif enrichment (compared with uniquely unmethylated regions). Intriguingly, only around 3% of the total set of cell-type-specific differentially methylated regions are hypermethylated.

**Fig. 5: Cell-type-specific hypermethylated regions are enriched for CpG islands, Polycomb targets and CTCF and REST/NRSF.**

After pooling all cell-type-specific hypermethylated regions, we identified strong enrichment for target sequences of the chromatin regulator CTCF (P ≤ 1 × 10^–18; Fig. 5d). This suggests that DNA methylation of CTCF binding sites could act as a tissue-specific regulatory switch to modulate its binding, potentially affecting tissue-specific three-dimensional genomic organization^35,42,43. To test this idea we compared patterns of DNA methylation at CTCF sites with genome-wide CTCF protein binding in specific tissues. Figure 5e shows the methylation pattern and published in vivo CTCF occupancy at one locus, which is methylated specifically in the colon and intestine. Consistent with DNA methylation preventing CTCF binding, ChIP data show selective absence of CTCF binding at this locus in the colon. In addition, loci methylated in specific cell types were enriched for targets of the transcriptional repressor of neural genes, RE1-silencing TF/neuron-restrictive silencer factor (REST/NRSF) (P ≤ 1 × 10^–24), and this was seen most prominently in the methylome of pancreatic islet cells (Fig. 5f). Whereas DNA methylation has not been shown to affect the binding or activity of REST, this finding raises the intriguing possibility that methylation of REST targets in islets could permit endocrine differentiation independently of REST repression.

Fragment-level methylome deconvolution

Last, we developed a computational fragment-level deconvolution algorithm for DNA methylation sequencing data and used the top 25 markers defined for each cell type (a total of 1,246 markers) to study methylomes obtained from composite tissue samples and cfDNA. Briefly, we generated an atlas in which the percentage of unmethylated fragments is computed for every marker (row) in each cell type (column). A non-negative least-squares (NNLS) algorithm is then used to fit an input sample and estimate its relative contributions (Supplementary Information).
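To make the fitting step concrete, the sketch below solves a small non-negative least-squares problem with projected gradient descent in plain Python. This solver is a stand-in chosen to keep the example dependency-free, not the authors' implementation (a library routine such as `scipy.optimize.nnls` would normally be used), and the atlas values, cell types and mixing proportions are invented.

```python
def nnls_projected_gradient(atlas, sample, iterations=2000):
    """Minimise ||atlas @ x - sample||^2 subject to x >= 0.

    atlas[m][c] is the fraction of unmethylated fragments at marker m in
    cell type c; sample[m] is the same quantity in the mixture.
    """
    n_markers, n_types = len(atlas), len(atlas[0])
    # 1 / ||A||_F^2 is a safe step size: it never exceeds 1 / lambda_max(A^T A).
    step = 1.0 / sum(v * v for row in atlas for v in row)
    x = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(a * xi for a, xi in zip(row, x)) - b
                    for row, b in zip(atlas, sample)]
        gradient = [sum(atlas[m][c] * residual[m] for m in range(n_markers))
                    for c in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, gradient)]
    return x


# Toy atlas (made up): three markers (rows) by three cell types (columns).
toy_atlas = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
true_mix = [0.7, 0.3, 0.0]
toy_sample = [sum(a * w for a, w in zip(row, true_mix)) for row in toy_atlas]

coefficients = nnls_projected_gradient(toy_atlas, toy_sample)
fractions = [c / sum(coefficients) for c in coefficients]
print([round(f, 4) for f in fractions])
```

Normalizing the fitted coefficients so that they sum to one turns them into relative contributions per cell type.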

To estimate the accuracy of our fragment-level approach, we used in silico mixtures of sequenced reads. For each cell type we applied a leave-one-out approach to mix one held-out sample in leukocyte reads, then used the deconvolution algorithm to infer cellular composition in the mixture. We repeated this process at concentrations varying from 0 to 10%. As shown in Fig. 6a, we found that the 1,246 markers (top 25 per cell type) allowed accurate detection of DNA from a given source at around 0.1% resolution, an improvement of nearly one order of magnitude in comparison with array-based approaches²⁸. Four-way in silico mixes, in which endothelial and hepatocyte methylomes were also included to realistically mimic cfDNA composition, yielded similar results (Extended Data Fig. 8).
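The in silico mixing step can be sketched as drawing reads from a held-out sample and from a leukocyte background at a chosen proportion. The function name, the read representation and the read counts below are hypothetical; the real benchmark operates on sequenced fragments and then applies the deconvolution algorithm to each mixture.

```python
import random


def mix_reads(target_reads, background_reads, fraction, total, seed=0):
    """Build a mixture of `total` reads in which `fraction` of the reads
    come from `target_reads` and the rest from `background_reads`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_target = round(total * fraction)
    mixed = (rng.sample(target_reads, n_target)
             + rng.sample(background_reads, total - n_target))
    rng.shuffle(mixed)
    return mixed


# Toy reads (made up): (source label, methylation pattern) pairs.
held_out = [("hepatocyte", "TTTT")] * 200
leukocytes = [("leukocyte", "CCCC")] * 5000

mixture = mix_reads(held_out, leukocytes, fraction=0.01, total=2000)
n_hepatocyte = sum(1 for source, _ in mixture if source == "hepatocyte")
print(n_hepatocyte, len(mixture))
```

Repeating this for a range of fractions and comparing the estimated contribution with the known fraction gives a detection curve of the kind described above.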

**Fig. 6: Fragment-level deconvolution using cell type-specific biomarkers.**

We then estimated the cellular composition of leukocytes and cfDNA using WGBS data from 23 healthy donors; 99.5% of leukocyte-derived DNA was attributed to granulocytes, monocytes, macrophages and NK, T and B cells, consistent with typical blood counts (Fig. 6b and Supplementary Table 8). The cfDNA of healthy subjects was mostly derived from leukocytes: granulocytes (29.7%), monocytes/macrophages (20%) and lymphocytes (3%). Solid tissues contributing to cfDNA included vascular endothelial cells (6%) and hepatocytes (3.1%) (Fig. 6c), consistent with previous results²⁸. The current atlas also shows a significant contribution of megakaryocytes (31%) and erythrocyte progenitor cells (5%) to cfDNA, which were not observed in previous studies that used reference methylomes of a more limited scope.

Endothelial cfDNA in patients with COVID-19

Analysis based on DNA methylation patterns offers an opportunity to identify the tissue origins of cfDNA. COVID-19 inflicts damage to multiple tissues, some of which have no biomarkers. We used the atlas to deconvolve shallow WGBS data from 52 patients hospitalized owing to COVID-19 (ref. ⁴⁴). We identified excessive cell-free DNA fragments from granulocytes, erythrocyte progenitors, lung and liver, consistent with published analysis of these samples (Supplementary Information). Strikingly, we also identified a significant contribution of vascular endothelial cells to the cfDNA of these patients, which could not be detected in the published analysis in the absence of an endothelial cell methylome reference (Fig. 6d). Interestingly, the concentration of endothelial cell-derived cfDNA was higher in patients with severe disease (WHO score ≥7) compared with those with milder disease (WHO score ≤6; P ≤ 6 × 10^–5, Mann–Whitney). These results suggest that vascular endothelial cell death plays a substantial role in the pathogenesis of COVID-19, potentially related to coagulopathy, and highlight the benefit of using a comprehensive cell-type-specific atlas for cfDNA methylome analysis.

Cell type deconvolution of composite tissues

Finally, we analysed whole-genome methylomes from ENCODE⁵ and the Roadmap Epigenomics atlas⁴ using our atlas (based on 25 markers per cell type). Deconvolution of some methylomes showed a homogenous composition as intended—for example, 97–99% T cell DNA in Roadmap T cell samples (Supplementary Table 9). However, analysis of other samples showed a highly heterogeneous composition, as previously reported based on array-based bulk tissue deconvolution algorithms such as EpiDISH and EpiScore^14,15,45. For example, heart ventricle samples comprised 29% cardiomyocytes, 41% endothelial cells and 18% cardiac fibroblasts (Fig. 6e); liver methylomes comprised around 60% hepatocytes, 21% blood and 20% endothelial cells; and colon methylomes comprised about 50% colon epithelium, 26% colon fibroblasts and 19% blood. Most strikingly, Roadmap lung samples were dominated by blood (40%), endothelium (34%) and smooth muscle (5%), with only 22% of DNA derived from lung epithelial cells (Fig. 6f–i and Supplementary Table 9). Importantly, a similar deconvolution of the 205 samples presented here yielded an average contribution of 94% for the expected cell type for each sample (median of 95%, Supplementary Table 10), or of 91% (median of 92%) in a more stringent leave-one-out cross-validation analysis (Supplementary Table 11), highlighting the purity of collected samples.

Naturally, fragment-level analysis is limited to cell types for which whole-genome sequencing data are available, and some cell types can be analysed only by array-based algorithms^15,28. Nonetheless, the markers and algorithm presented here allow analysis of composite bulk tissue and plasma samples, across multiple cell types and with high accuracy.

Discussion

The comprehensive atlas of human cell type methylomes described here sheds light on principles of DNA methylation and provides a valuable resource for multiple lines of investigation, as well as translational applications.

Our analysis used whole-genome sequencing data to show that methylation patterns are strikingly similar among healthy replicates of the same cell type from different individuals. The similarity between individuals reflects the robustness of cell differentiation and maintenance circuits, at least as far as healthy tissues are concerned. Pathologies involving destabilization of the epigenome presumably disrupt these circuits, resulting in a larger variety of methylation patterns among cells descended from a specific normal cell type. We predict that, even in cancers (of the same primary anatomic site and histologic type), comparative methylome analysis of purified epithelial cells, performed at the level of methylation blocks, will show a smaller interindividual variation than typically assumed.

As the atlas demonstrates, each cell type has a set of genomic regions that are uniquely unmethylated in that cell type compared with others, as well as additional genomic regions that share methylation patterns with related cell types. Using unsupervised clustering of cell-type-specific methylomes, we found that cell types were clustered in ways that reflected their developmental origins rather than expression patterns. This offers a fascinating view of DNA methylation as a record of the methylomes of progenitor cells, retained in the genome through dramatic developmental transitions and decades of life thereafter. We propose that comparative methylome analysis will allow reconstruction of parts of the methylomes of fetal structures or cell types, similarly to the reconstruction of last common ancestors in evolutionary biology.

The vast majority of cell-type-specific differentially methylated regions were specifically demethylated in one cell type. The chromatin of these regions is typically highly accessible and bears histone marks associated with active gene regulation, as found in enhancers and promoters. Moreover, these loci are enriched for TF binding site motifs that operate in that cell type. We devised an integrated approach that, based on distance and gene expression profiles, allowed us to highlight potential target genes for these putative enhancer regions. Many enhancer regions were associated with nearby genes that are broadly expressed, potentially reflecting gene regulation by multiple tissue-specific enhancers. Our findings are consistent with previous studies that showed tissue-specific hypomethylation occurring at gene enhancers^35,36,37. Our data-driven approach for marker identification is complementary to recent gene-centric approaches^14,15 that use tissue-specific single-cell RNA sequencing data to define marker genes and identify neighbouring CpGs specifically unmethylated in target cell types. Finally, we devised a fragment-level genomic analysis to identify tens of thousands of unmethylated regions, per cell type, which were annotated with genomic features, DNA accessibility, chromatin marks and TF binding motifs to produce a cell type-specific catalogue of putative enhancers. Further analysis of this atlas could help to reveal and validate the complete set of human enhancers in each cell type.

Conversely, we identified genomic regions specifically methylated in one or two cell types, representing around 3% of cell-type-specific differentially methylated regions. These are often located in CpG islands and characterized by H3K27me3 and Polycomb binding in tissues where the locus is not methylated^40,41. This epigenetic repressive switching was previously described in cancer and during early development^41,46, but its role during differentiation of specific cell types remains unclear. These regions are enriched for CTCF binding sites, suggesting a role for DNA methylation in attenuating the binding of CTCF and thus modulation of the cell-type-specific, three-dimensional organization of neighbouring DNA^35,36,47.

For DNA methylation sequencing data, the atlas described here is, to our knowledge, the most comprehensive compendium to date. We identified more than one thousand cell-type-unique DNA methylation regions that could serve as accurate and specific biomarkers for fragment-level analysis and identification of cell death events by monitoring of cfDNA. Notably, most of these marker regions are not covered by 450K/EPIC BeadChip DNA methylation arrays, and were not previously appreciated. To allow interpretation of array data, we offer alternative sets of cell-type-specific markers limited to CpG sites included in BeadChip 450K arrays. Similarly, we identified cell-type-specific markers in regions targeted by both RRBS and hybrid capture panels (Extended Data Fig. 9 and Supplementary Tables 12–17). As shown in Extended Data Fig. 10, the array-adapted atlas allows high-resolution interpretation of array methylomes of pancreatic islet, lung and breast biopsies, highlighting the presence of cell types not previously profiled^48,49,50.

Many cell types are missing from the atlas, typically because of limited availability of material. Examples include osteoblasts, cholangiocytes, cells of the adrenal gland, urethral epithelium and haematopoietic stem cells. Additionally, we did not separate many subpopulations of interest—for example, different types of neurons or lymphocytes. The atlas is viewed as a living, publicly available database to be updated in the future. The resolution of the atlas yields a quantitative understanding of composite tissues and allows one to identify missing methylomes of additional cell types yet to be characterized. We also acknowledge that the purity of the sorted cell populations varies, owing to variation in the quality of antibodies used for FACS and the extent to which they allow separation of cell types. Nonetheless, even the least pure cell types in the atlas (for example, some preparations of vascular endothelial cells, fibroblasts, SMC and adipocytes showing 70–80% purity), when averaged over replicates, are useful for identification of differentially methylated regions and for inference of cell composition in mixtures.

In summary, we present a comprehensive methylation atlas of primary human cell types along with an extensive set of cell-type-specific markers and computational tools for fragment-level analysis of mixed cell type samples. These complement the plethora of array-based methylomes and deconvolution tools available for the analysis of array data. Together, the data shed light on the roles of DNA methylation in cellular biology and gene regulation and facilitate the identification of enhancers active in each cell type. Perhaps the most promising utility of our atlas is the potential for fragment-level deconvolution of mixed cell type samples, allowing sensitive identification of the tissue of origin of cfDNA in plasma of individuals with cancer and other diseases^18,19,20,21,28,29,30.

Methods

Human tissue samples

Human tissues were obtained from various sources, as detailed in Supplementary Table 1. The majority (148) of the 205 samples analysed were sorted from tissue remnants obtained at the time of routine, clinically indicated surgical procedures at the Hadassah Medical Center. In all cases, normal tissue distant from any known pathology was used. Surgeons and/or pathologists were consulted before removal of tissue to confirm that its removal would not compromise the final pathologic diagnosis in any way. For example, in patients undergoing right colectomy for carcinoma of the caecum, the most distal part of the ascending colon and most proximal part of the terminal ileum were obtained for cell isolation. Normal bone marrow was obtained at the time of joint replacement in patients with no known haematologic pathology. The patient population included 135 individuals (n = 60 males, n = 74 females) aged 3–83 years. The majority of donors were White. Approval for collection of normal tissue remnants was provided by the Institutional Review Board (IRB, Helsinki Committee), Hadassah Medical Center, Jerusalem, Israel. Written informed consent was obtained from each donor or legal guardian before surgery.

As described in Supplementary Table 1, some cells and tissues were obtained through collaborative arrangements: pancreatic exocrine and liver samples (cadaveric organ donors, n = 5) from M. Grompe, Oregon Health & Science University; adipocytes (subcutaneous adipocytes at time of cosmetic surgery following weight loss, n = 3), oligodendrocytes and neurons (brain autopsies, n = 14) from K. L. Spalding and H. Druid, Karolinska Institute, Stockholm; and research-grade cadaveric pancreatic islets from J. Shapiro, University of Alberta (n = 16). In all cases, tissues were obtained and transferred in compliance with local laws and after the approval of the local ethics committee on human experimentation. Sixteen cell types were obtained from commercial sources, including 15 from Lonza and one from Sigma-Aldrich. Three pancreatic islet preparations were obtained from the Integrated Islet Distribution Program (https://iidp.coh.org).

Tissue dissociation and FACS sorting of purified cell populations

Fresh tissue obtained at the time of surgery was trimmed to remove extraneous tissue. Cells were dispersed using enzyme-based protocols optimized for each tissue type. The resulting single-cell suspension was incubated with the relevant antibodies and FACS sorted to obtain the desired cell type (Extended Data Fig. 2 and Supplementary Information).

Purity of live sorted cells was determined by messenger RNA analysis for key known cell-type-specific genes, whereas the purity of cells fixed before sorting was determined using previously validated cell-type-specific methylation signals (Extended Data Fig. 2c and Supplementary Information). DNA was extracted using the DNeasy Blood and Tissue kit (no. 69504, Qiagen) according to the manufacturer’s instructions, and stored at −20 °C for bisulfite conversion and whole-genome sequencing.

WGBS

Up to 75 ng of sheared genomic DNA was subjected to bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research), with liquid handling on a MicroLab STAR (Hamilton). Dual-indexed sequencing libraries were prepared using Accel-NGS Methyl-Seq DNA library preparation kits (Swift BioSciences) and custom liquid handling scripts executed on the Hamilton MicroLab STAR. Libraries were quantified using KAPA Library Quantification Kits for Illumina Platforms (Kapa Biosystems). Four uniquely dual-indexed libraries, along with the 10% PhiX v.3 library (Illumina), were pooled and clustered on an Illumina NovaSeq 6000 S2 flow cell followed by 150 bp, paired-end sequencing. Total read count and average sequencing depth (in read pairs), as well as percentage of CpGs, per sample, at 1× and 10×, are detailed in Supplementary Table 1. Also listed are average methylation levels, per sample, at CpG, nonCpG and CC dinucleotides. Intriguingly, sorted neuron samples showed higher CpA methylation (approximately 10%) compared with other samples (approximately 1%).

WGBS computational processing

Paired-end FASTQ files were mapped to the human (hg19, hg38), lambda, pUC19 and viral genomes using bwa-meth (v.0.2.0)⁵¹, then converted to BAM files using SAMtools (v.1.9)⁵². Duplicated reads were marked by Sambamba (v.0.6.5) with parameters ‘-l 1 -t 16 --sort-buffer-size 16000 --overflow-list-size 10000000’ (ref. ⁵³). Reads that had low mapping quality, were duplicated or were not mapped in a proper pair were excluded using SAMtools view with parameters ‘-F 1796 -q 10’. Reads were stripped of nonCpG nucleotides and converted to PAT files using wgbstools (v.0.1.0)⁵⁴.
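To make the read filter concrete, the following minimal sketch decodes the exclusion mask 1796 into standard SAM flag bits and mimics the per-read decision made by ‘-F 1796 -q 10’. The function names are hypothetical and this is an illustration of the flag arithmetic only, not part of the published pipeline.

```python
# Standard SAM flag bits, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(mask):
    """Return the names of all flag bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if mask & bit]


def passes_filter(flag, mapq, exclude_mask=1796, min_mapq=10):
    """Mimic `samtools view -F exclude_mask -q min_mapq` for a single read:
    drop the read if any excluded bit is set or if MAPQ is below the minimum."""
    return (flag & exclude_mask) == 0 and mapq >= min_mapq


excluded = decode_flag(1796)   # bits 4 + 256 + 512 + 1024
```

Under this decoding the mask covers unmapped, secondary, QC-failed and duplicate alignments; the proper-pair bit (0x2) is not part of the mask itself.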

Genomic segmentation into multisample homogenous blocks

We developed and implemented a multichannel dynamic programming segmentation algorithm to divide the genome into continuous genomic regions (blocks) showing homogeneous methylation levels across multiple CpGs for each sample⁵⁴. A generative probabilistic model is used in which each block induces a Bernoulli distribution with parameter ${\theta }_{i}^{k}$, where i is the block index and k the sample index (k = 1,..., K), and each observation (occurrence of one CpG on one sequenced fragment) is represented by a random variable sampled i.i.d. (independent and identically distributed) from the same distribution ${\rm{Ber}}({\theta }_{i}^{k})$. The log-likelihood of all sequencing data is the sum of log-likelihoods across all blocks, each decomposing as the sum of log-likelihoods across all samples. The log-likelihood of the ith block can therefore be formalized as:

$${\rm{score}}({{\rm{block}}}_{i})={ll}_{i}=\sum _{k=1}^{K}\left({({N}_{C})}_{i}^{k}\times \log ({\hat{\theta }}_{i}^{k})+{({N}_{T})}_{i}^{k}\times \log (1-{\hat{\theta }}_{i}^{k})\right)$$

where ${({N}_{C})}_{i}^{k}$ and ${({N}_{T})}_{i}^{k}$ are the numbers of methylated and unmethylated observations, respectively, in the ith block in the kth sample, and ${\hat{\theta }}_{i}^{k}$ is a Bayes estimator of the Bernoulli distribution parameter, calculated with pseudocounts ${\alpha }_{C},{\alpha }_{T}$ for each block/sample:

$${{\hat{\theta }}_{i}}^{k}=\frac{{{({N}_{C})}_{i}}^{k}+{\alpha }_{C}}{{{({N}_{C})}_{i}}^{k}+{{({N}_{T})}_{i}}^{k}+{\alpha }_{C}+{\alpha }_{T}}$$

These hyperparameters are used for regularization, to control the trade-off between overfitting (shorter blocks) and generalization (longer blocks). Dynamic programming is then used to find the optimal segmentation across the genome. Briefly, we maintain a 1 × N table T (N = 28,217,448 CpGs) for optimal segmentation scores across all prefixes. Specifically, T[i] holds the score of the optimal segmentation of all CpG sites from 1 through to i, and T[N] holds the final, optimal, score across the entire genome. The table itself is updated sequentially from 1 to N, where the optimal segmentation up to position i is achieved by the addition of a new block to a shorter optimal segmentation (for example, up to position i′):

$$T[i]=\mathop{\max }\limits_{i{\prime} < i}\{T[i{\prime} ]+{\rm{score}}({\rm{block}}[i{\prime} +1,...,i])\}$$

For this, all previous optimal segmentations are considered and a new block is added from position (iʹ + 1) to position i (with a maximal block size of 5,000 bp). The combination that maximizes log-likelihood is selected as the optimal segmentation from 1 to i, and the start index of the last block is recorded in a traceback table. Once the score of optimal segmentation is calculated in T[N], the traceback table is used to retrieve the full segmentation. An upper bound on block length (5,000 bases) is set to improve running times and each chromosome is run separately. The linear distance between consecutive CpGs is ignored under this model. The model and segmentation algorithm are further described in Supplementary Information.
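The recurrence and traceback described above can be sketched on toy data as follows. This is a simplified illustration, not the wgbstools implementation: the pseudocount values (1 and 1) are assumed, the block-length limit is expressed as a number of CpGs rather than base pairs, and all function names are hypothetical.

```python
import math


def block_score(n_c, n_t, alpha_c=1.0, alpha_t=1.0):
    """Log-likelihood of one block, summed over samples.
    n_c / n_t: per-sample methylated / unmethylated observation counts.
    Pseudocount values are assumptions made for this sketch."""
    score = 0.0
    for c, t in zip(n_c, n_t):
        theta = (c + alpha_c) / (c + t + alpha_c + alpha_t)  # Bayes estimator
        score += c * math.log(theta) + t * math.log(1.0 - theta)
    return score


def segment(meth, unmeth, max_cpgs=50):
    """Optimal segmentation of CpGs 0..n-1 into blocks.
    meth[k][j] / unmeth[k][j]: observations for sample k at CpG j.
    Returns half-open (start, end) CpG index pairs."""
    n_samples, n = len(meth), len(meth[0])
    # Prefix sums give the counts of any candidate block in O(K).
    pre_c = [[0] * (n + 1) for _ in range(n_samples)]
    pre_t = [[0] * (n + 1) for _ in range(n_samples)]
    for k in range(n_samples):
        for j in range(n):
            pre_c[k][j + 1] = pre_c[k][j] + meth[k][j]
            pre_t[k][j + 1] = pre_t[k][j] + unmeth[k][j]

    best = [0.0] + [-math.inf] * n  # best[i]: optimal score for the first i CpGs
    back = [0] * (n + 1)            # back[i]: start of the last block ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_cpgs), i):
            n_c = [pre_c[k][i] - pre_c[k][j] for k in range(n_samples)]
            n_t = [pre_t[k][i] - pre_t[k][j] for k in range(n_samples)]
            cand = best[j] + block_score(n_c, n_t)
            if cand > best[i]:
                best[i], back[i] = cand, j

    blocks, i = [], n
    while i > 0:                    # traceback from the last CpG
        blocks.append((back[i], i))
        i = back[i]
    return blocks[::-1]


# Toy data: two samples, six CpGs, ten observations per CpG.
meth = [[10, 10, 10, 0, 0, 0],       # sample 1: methylated, then unmethylated
        [10, 10, 10, 10, 10, 10]]    # sample 2: methylated throughout
unmeth = [[0, 0, 0, 10, 10, 10],
          [0, 0, 0, 0, 0, 0]]
blocks = segment(meth, unmeth)
```

In this toy case the boundary falls where sample 1 switches state, even though sample 2 is homogeneous throughout, which illustrates why a single multisample segmentation is driven by any sample that changes.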

Segmentation and clustering analysis

We segmented the genome into 7,104,162 blocks using wgbstools (with parameters ‘segment --max_bp 5000’) with all of the 205 samples as reference, and retained 2,099,681 blocks covering at least four CpGs. For hierarchical clustering (Fig. 2) we selected the top 1% (20,997) of blocks showing the highest variability in average methylation across all samples. Blocks with sufficient coverage of at least ten observations (calculated as sequenced CpG sites) across two-thirds of the samples were further retained. We then computed the average methylation for each block and sample using wgbstools (--beta_to_table -c 10), marked blocks with fewer than ten observations as missing values and imputed their methylation values using sklearn KNNImputer (v.0.24.2)⁵⁵. The 205 samples were clustered with the unsupervised agglomerative clustering algorithm²³, using scipy (v.1.6.3)⁵⁶ and L1 norm. The fanning diagram was plotted using ggtree (v.2.2.4)⁵⁷.
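A minimal sketch of the block selection and clustering steps on synthetic data is given below. It assumes that variability is measured as the per-block standard deviation and that the agglomerative algorithm uses average linkage (UPGMA); neither choice is stated explicitly here. The imputation step is omitted, so the input matrix must have no missing values, and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_samples(betas, top_n, n_clusters):
    """betas: blocks x samples matrix of average methylation (no missing values).
    Returns the indices of the selected blocks and one cluster label per sample."""
    betas = np.asarray(betas, dtype=float)
    sd = betas.std(axis=1)                      # per-block variability (assumed measure)
    top = np.argsort(sd)[::-1][:top_n]          # most variable blocks first
    # Samples are the observations, so the matrix is transposed; L1 = cityblock.
    tree = linkage(betas[top].T, method="average", metric="cityblock")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return top, labels


# Synthetic example: 200 blocks, 6 samples; samples 3-5 share 20 unmethylated blocks.
rng = np.random.default_rng(0)
betas = np.clip(0.8 + rng.normal(0, 0.01, size=(200, 6)), 0, 1)
betas[:20, 3:] = 0.1
top, labels = cluster_samples(betas, top_n=20, n_clusters=2)
```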

Cell-type-specific markers

The 205 atlas samples were divided into 51 groups by cell type, yielding 39 basic groups and 12 composite supergroups (Supplementary Table 3). We then performed a one-versus-all comparison to identify differentially methylated blocks unique for each cell type. For this we used the wgbstools ‘find_markers’ function to first identify blocks covering at least five CpGs (length 10–1,500 bp), then calculate the average methylation per block/sample and rank the blocks according to the difference in average methylation between target samples and all other samples. To allow some flexibility, this difference was computed (for unmethylated markers) as the difference between the 75th percentile in target samples (typically allowing one outlier) and the 2.5th percentile in the background group (typically allowing about five outlier samples). For methylated markers, this was computed as the difference between the 25th and 97.5th percentiles (Supplementary Information). Low-coverage blocks (fewer than 25 observations), in which the estimation error of average methylation was around 10%, were replaced by a default value of 0.5, which is neither unmethylated nor methylated, thus reducing the block’s methylation difference and downgrading its rank. For cell-type-specific markers we selected the top 25 per cell type, for a total of 1,246 markers (Supplementary Table 4a).
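The percentile-based ranking score can be illustrated with a short sketch. It is not the ‘find_markers’ implementation: the linear-interpolation quantile and all function names are assumptions made for illustration, and only the quantile levels and the low-coverage default come from the text.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def mask_low_coverage(beta, n_obs, min_obs=25, default=0.5):
    """Replace the average methylation of a low-coverage block by a neutral value."""
    return beta if n_obs >= min_obs else default


def unmethylated_marker_score(target, background, tg_q=0.75, bg_q=0.025):
    """Gap between a low background quantile and a high target quantile.
    Larger values mean the block is unmethylated in the target only."""
    return quantile(background, bg_q) - quantile(target, tg_q)


def methylated_marker_score(target, background, tg_q=0.25, bg_q=0.975):
    """Mirror image of the score above, for blocks methylated in the target only."""
    return quantile(target, tg_q) - quantile(background, bg_q)


# Hypothetical block: four target samples (one outlier), 41 background samples (one outlier).
target = [0.05, 0.04, 0.06, 0.50]
background = [0.9] * 40 + [0.1]
score = unmethylated_marker_score(target, background)
```

Because the score uses quantiles rather than extremes, a single aberrant sample on either side lowers the score only partially instead of eliminating the block.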

Atlases for 450K/EPIC, RRBS and hybrid capture panels were identified similarly while examining a subset of genomic regions, overlapping various probe sets or genomic regions (-b option). Chromatin analysis was performed on the top 250 markers per cell type (total of 11,713 markers; Supplementary Table 4b). Motif analysis was performed on the top 1,000 markers per cell type (total of 50,286 markers; Supplementary Table 4b) using the difference between the 25th and 75th percentile, to allow putative enhancers unmethylated in additional cell types.

Enrichment for gene set annotations

Analysis of gene set enrichment was performed using GREAT³¹. For each cell type we selected the top 250 differentially unmethylated regions and ran GREAT via batch web interface using default parameters. Enrichments for ‘Ensembl Genes’ were ignored, and a significance threshold of binomial false discovery rate ≤0.05 was used.

Enrichment for chromatin marks

For each cell type we analysed the top 250 differentially unmethylated regions versus published ChIP–seq (H3K27ac and H3K4me1) and DNase sequencing from the Roadmap Epigenomics project (downloaded from ftp.ncbi.nlm.nih.gov/pub/geo/DATA/roadmapepigenomics/by_experiment and http://egg2.wustl.edu/roadmap/data/byDataType/dnase/BED_files_enh) in bigWig and bed formats. These include E032 for B cell markers, E034 for T cell markers, E029 for monocyte/macrophage markers, E066 for liver hepatocytes, E104 for heart cardiomyocytes and fibroblasts and E109 and E110 for gastric/small intestine/colon⁴. Annotations for chromHMM were downloaded (15-states version) from https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final³, and genomic regions annotated as enhancers (7_Enh) were extracted and reformatted in bigWig format. Raw single-cell ATAC–seq data were downloaded from GEO GSE165659 (ref. ³²) as ‘feature’ and ‘matrix’ files for 70 samples. For each sample, cells of the same type were pooled to output a bedGraph file, which was mapped from hg38 to hg19 using UCSC liftOver⁵⁸. Overlapping regions were dropped using bedtools (v.2.26.0)⁵⁹. Finally, bigWig files were created using bedGraphToBigWig (v.4)⁶⁰. Heatmaps and average plots were prepared using deepTools (v.3.4.1)⁶¹, with the functions ‘computeMatrix’, ‘plotHeatmap’ and ‘plotProfile’. We used default parameters except for ‘referencePoint=center’, 15 kb margins and ‘binSize=200’ for ChIP–seq, DNaseI and chromHMM data, and 75 kb margins with ‘binSize=1000’ for ATAC–seq data.

Motif analysis

For each cell type we analysed the top 1,000 differentially unmethylated regions for known motifs (Supplementary Table 6a) using the HOMER function ‘findMotifsGenome.pl’, with parameters ‘-bits’ and ‘-size 250’³⁹. Similar analyses were performed for the unmethylated regions in each cell type (Supplementary Table 6b), as well as unmethylated regions overlapping H3K27ac, but not H3K4me3, peaks (Supplementary Table 6c).

Methylation marker–gene associations

For each cell-type-specific marker we identified all neighbouring genes up to 750 kb apart. We then examined the expression levels of these genes across the GTEx dataset covering 50 tissues and cell types⁶². We then standardized the expression of each gene across all conditions, by replacing expression values with standard deviations (z-scores) above/below the average expression of that gene across samples. This was followed by column-wise standardization in which the relative enrichment of a gene under a given condition is normalized by the enrichment of other genes under that condition. This highlighted the most overexpressed genes for each tissue. We then classified each ‘marker–gene–condition’ combination as tier 1: distance ≤5 kb, expression ≥10 TPM and z-score ≥1.5; tier 2: same as tier 1 but with distance ≤50 kb; tier 3: up to 750 kb, expression ≥25 TPM and z-score ≥5; and tier 4: same as tier 3 but with z-score ≥3.5.
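The tier rules can be written as a small decision function, sketched below. The order in which tiers are tested (lowest tier number first) and the use of the population standard deviation in the z-score helper are assumptions; the function names are hypothetical, and the column-wise standardization step is not shown.

```python
import statistics


def zscores(values):
    """Standardize one gene's expression across conditions (population s.d. assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def classify_tier(distance_kb, tpm, z):
    """Return the tier (1-4) of a marker-gene-condition combination, or None."""
    if tpm >= 10 and z >= 1.5:
        if distance_kb <= 5:
            return 1
        if distance_kb <= 50:
            return 2
    if distance_kb <= 750 and tpm >= 25:
        if z >= 5:
            return 3
        if z >= 3.5:
            return 4
    return None


z = zscores([1.0, 2.0, 3.0])
```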

A catalogue of unmethylated loci and putative enhancers for each cell type

For each genomic region (blocks of at least four CpGs), and for any of the 39 cell type groups, fragments with at least four CpGs from all replicates were merged and classified as either U (fragment-level methylation 15% or less), M (at least 85%) or X (over 15% but below 85%). The percentage of U fragments was then calculated using ‘wgbstools homog --threshold .15,.85’, and blocks with at least 85% unmethylated fragments retained. These blocks were overlapped with genomic features based on UCSC hg19 annotations, including CpG islands and transcriptional start site regions (up to 1 kb from a gene start site). We also used narrowPeak annotations downloaded from Roadmap⁴ and ENCODE project⁵ (accessions listed in Supplementary Table 6d). hg38 bed files were converted to hg19 using liftOver⁵⁸. For putative enhancers, nonpromoter active regulatory regions were defined as those overlapping H3K27ac, but not H3K4me3, peaks under matching conditions. TF binding sites were downloaded from JASPAR 2022 (ref. ⁶³).
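The fragment classification and block filter described above can be sketched as follows. The sketch assumes a PAT-like encoding in which each fragment is a string with 'C' for a methylated CpG, 'T' for an unmethylated CpG and '.' for a missing call, paired with a count of identical fragments; this encoding and the function names are assumptions for illustration rather than a description of ‘wgbstools homog’.

```python
def classify_fragment(pattern, min_cpgs=4, u_max=0.15, m_min=0.85):
    """Label a fragment as 'U', 'M' or 'X', or None if it covers too few CpGs."""
    meth = pattern.count("C")
    unmeth = pattern.count("T")
    covered = meth + unmeth            # '.' (missing) calls are ignored
    if covered < min_cpgs:
        return None
    frac = meth / covered
    if frac <= u_max:
        return "U"
    if frac >= m_min:
        return "M"
    return "X"


def fraction_unmethylated(fragments):
    """fragments: iterable of (pattern, count). Returns the share of U fragments."""
    counts = {"U": 0, "M": 0, "X": 0}
    for pattern, n in fragments:
        label = classify_fragment(pattern)
        if label is not None:
            counts[label] += n
    total = sum(counts.values())
    return counts["U"] / total if total else float("nan")


# Hypothetical block: 17 U fragments, 1 X, 1 M, and 5 fragments too short to count.
fragments = [("TTTT", 17), ("TTTTC", 1), ("CCCC", 1), ("TT", 5)]
u_share = fraction_unmethylated(fragments)
retained = u_share >= 0.85
```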

Interindividual variation in cell type methylation

We define the similarity score between two samples as the fraction of blocks, among those containing at least three CpGs and at least ten binary observations (sequenced CpG sites), in which the average methylation of the two samples differs by less than 0.5. Only cell types with n ≥ 3 FACS-sorted replicates from different donors are considered (136 samples in total).
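The block-level comparison can be sketched as below. The function returns the fraction of eligible blocks whose average methylation differs by at least 0.5 between two samples; its complement is the share of blocks on which the samples agree. The input layout, the requirement that each sample individually has at least ten observations, and the function name are assumptions made for illustration.

```python
def fraction_differing(blocks_a, blocks_b, min_cpgs=3, min_obs=10, delta=0.5):
    """blocks_a / blocks_b: per-block tuples (n_cpgs, n_methylated, n_observations)
    for two samples, listed in the same block order."""
    eligible = differing = 0
    for (n_cpgs, meth_a, tot_a), (_, meth_b, tot_b) in zip(blocks_a, blocks_b):
        if n_cpgs < min_cpgs or tot_a < min_obs or tot_b < min_obs:
            continue                     # block too short or too thinly covered
        eligible += 1
        if abs(meth_a / tot_a - meth_b / tot_b) >= delta:
            differing += 1
    return differing / eligible if eligible else float("nan")


# Hypothetical pair of samples over four blocks.
sample_a = [(5, 90, 100), (4, 10, 50), (2, 50, 50), (6, 5, 8)]
sample_b = [(5, 85, 100), (4, 40, 50), (2, 0, 50), (6, 40, 50)]
diff = fraction_differing(sample_a, sample_b)
agreement = 1.0 - diff
```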

CTCF ChIP–seq analysis

CTCF ChIP–seq data were downloaded from the ENCODE project⁵ as 168 bigWig files, covering 61 tissues/cell types (hg19). Samples of the same cell type were averaged using multiBigwigSummary (v.3.4.1)⁶¹.

Endodermal marker analysis

All 892 endodermal hypomethylated markers were found using wgbstools function ‘find_markers’ (v.0.2.0), with parameters ‘--delta_quants 0.4 --tg_quant 0.1 --bg_quant 0.1’ (ref. ⁵⁴). For endoderm-derived epithelium, 51 samples were compared with 103 nonepithelial samples from mesoderm or ectoderm. Blocks were selected as markers if the average methylation of the 90th percentile of the epithelial samples was lower than the tenth percentile of the nonepithelial samples by at least 0.4.

UXM fragment-level deconvolution algorithm

We developed a fragment-level deconvolution algorithm: each fragment was annotated as U (mostly unmethylated), M (mostly methylated) or X (mixed) depending on the number of methylated and unmethylated CpGs⁶⁴. We then calculated, for each genomic region (marker) and across all cell types, the proportion of U/X/M fragments with at least k CpGs. Here we used k = 4 and thresholds of less than or equal to 25% methylated CpGs for U reads, and more than or equal to 75% methylated CpGs for M reads. We then constructed reference atlas A with 1,232 regions (top 25 markers per cell type), in which the ${A}_{i,j}$ cell holds the U proportion of the ith marker in the jth cell type. Given an input sample, the U proportion at each marker is computed to form a 1,232 × 1 vector b. Then, non-negative least squares (NNLS) is applied to infer coefficient vector x by minimizing ${\Vert A\times x-b\Vert }_{2}$ subject to non-negative x, normalized to ${\Sigma }_{j}{x}_{j}=1$. Alternatively, each marker can be weighted differently based on fragment coverage in the input sample. For this, b can be defined as the number of U fragments in each region and the rows of A similarly multiplied by ${C}_{i}$, the total number of fragments in each region, thus minimizing ${\Vert {\rm{diag}}(C)\times A\times x-b\Vert }_{2}$. Additional details are available in Supplementary Information.
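The NNLS step, in both its unweighted and coverage-weighted forms, can be sketched with SciPy on a toy atlas. The atlas values, mixture proportions and function name below are hypothetical, and the sketch is not the UXM_deconv code; it only illustrates the optimization described above.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(A, b, coverage=None):
    """Estimate cell-type proportions x >= 0 minimizing ||A x - b||_2.
    If `coverage` is given, b is expected to hold U-fragment counts and
    each atlas row is scaled by the total fragment count at that marker."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    if coverage is not None:
        A = A * np.asarray(coverage, dtype=float)[:, None]
    x, _residual = nnls(A, b)
    total = x.sum()
    return x / total if total > 0 else x   # normalize to sum to 1


# Hypothetical atlas: 6 markers (rows) x 3 cell types (columns), U proportions.
A = np.array([
    [0.95, 0.03, 0.02],
    [0.90, 0.05, 0.04],
    [0.02, 0.93, 0.03],
    [0.04, 0.96, 0.01],
    [0.03, 0.02, 0.94],
    [0.01, 0.04, 0.97],
])
true_mix = np.array([0.90, 0.08, 0.02])
b = A @ true_mix                           # noise-free U proportions per marker
estimate = deconvolve(A, b)

coverage = np.array([120, 80, 60, 200, 150, 90])   # fragments per marker
u_counts = coverage * b
weighted = deconvolve(A, u_counts, coverage)
```

With noise-free input both variants recover the same proportions; the weighted form matters for real samples, where markers with few fragments give noisier U proportions.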

In silico simulation of WGBS deconvolution

Mixtures were simulated in silico for cardiomyocytes (n = 4), bladder epithelium (n = 5), breast epithelium (n = 7), endothelial cells (n = 19) and erythrocyte progenitors (n = 3) in a leave-one-out manner. For this, one sample was held out and segmentation and marker selection (25 per cell type) were rerun using the remaining 204 samples. We then simulated mixtures by sampling and mixing reads from the held-out sample at 10, 3, 1, 0.3, 0.1, 0.03 and 0% into a background of leukocyte samples. This was repeated ten times. Finally, mixed samples were analysed using the UXM fragment-level algorithm with markers from the reduced (204) atlas, using fragments with at least three CpGs. Merging, splitting and mixing of reads were performed using wgbstools (v.0.1.0)⁵⁴.
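The read-mixing step can be illustrated with a short sketch that spikes reads from a held-out sample into a background at the listed proportions. Sampling with replacement, the fixed mixture size and the function name are assumptions made for illustration; the published analysis used wgbstools for merging, splitting and mixing.

```python
import random


def simulate_mixture(target_reads, background_reads, fraction, n_total, seed=0):
    """Draw n_total reads, of which round(fraction * n_total) come from the target."""
    rng = random.Random(seed)
    n_target = round(fraction * n_total)
    mix = rng.choices(target_reads, k=n_target)
    mix += rng.choices(background_reads, k=n_total - n_target)
    rng.shuffle(mix)
    return mix


# Hypothetical read pools; each read is tagged with its source for bookkeeping.
target_reads = [("target", i) for i in range(500)]
background_reads = [("leukocyte", i) for i in range(5000)]

fractions = [0.10, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0]
target_counts = []
for f in fractions:
    mix = simulate_mixture(target_reads, background_reads, f, n_total=10000)
    target_counts.append(sum(1 for source, _ in mix if source == "target"))
```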

Array-based analysis was performed by computing, for each mixed set of fragments, average methylation levels across each of around 480,000 CpG sites present in the 450K array (‘wgbstools beta_to_450k’). We then deconvolved these data according to the method of Moss et al.²⁸ (https://github.com/nloyfer/meth_atlas).

We also simulated four-way mixtures in which background plasma methylomes were simulated as a combination of 90% fragments from leukocytes, 7.5% from a vascular endothelial sample and 2.5% from a hepatocyte sample. As described above, this was done by holding out the three samples (for example, cardiomyocytes, endothelial cells and hepatocytes) and then rerunning segmentation and marker selection on the (202 = 205 – 3) remaining samples, to obtain a set of markers that was then used for fragment-level deconvolution of mixtures.

WGBS deconvolution

Leukocytes and matching plasma samples (n = 23) were processed as described above and analysed using the WGBS methylation atlas, including 1,246 markers plus (for plasma samples) an additional 25 megakaryocyte markers. Fifty-two plasma samples from 28 patients with SARS-CoV-2 (ref. ⁴⁴) downloaded as FASTQ files were processed as described above. Because of the low coverage (1–2×) of these samples, we extended the marker set from the top 25 to the top 250 markers per cell type (Supplementary Table 4b), and also included 250 megakaryocyte markers⁶⁵. Roadmap⁴ and ENCODE⁵ samples were processed as described above and analysed using the UXM algorithm.

Deconvolution of 450K array data

Previously published 450K array data were downloaded from either The Cancer Genome Atlas (lung and breast biopsies)^49,50 or GEO accession no. GSE62640 (ref. ⁴⁸) and deconvoluted with meth_atlas NNLS software (https://github.com/nloyfer/meth_atlas) using our array-adapted atlas (Supplementary Table 12). Breast biopsies were grouped using PAM50 classifications⁶⁶.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

DNA methylation data are available in formats bigWig (position and average methylation across 28,217,448 CpGs) and beta (a similar wgbstools-compatible binary format) at GEO (accession no. GSE186458). BigWig and beta files for hg38 are also available. Fragment-level information (in pat format, including CpG starting index, methylation pattern of all covered CpGs and number of fragments with exact multiCpG pattern) is also available. Raw fastq files have been deposited at the European Genome-phenome Archive (EGA) under study accession number EGAS00001006791 and can be downloaded upon request to EGA (through the atlas Data Access Committee).

Code availability

Code is available at github.com/nloyfer/wgbs_tools and github.com/nloyfer/UXM_deconv.

References

Dor, Y. & Cedar, H. Principles of DNA methylation and their implications for biology and medicine. Lancet 392, 777–786 (2018).
Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477–481 (2013).
Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Reinius, L. E. et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE 7, e41361 (2012).
Smith, Z. D. et al. A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484, 339–344 (2012).
Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).
Titus, A. J., Gallimore, R. M., Salas, L. A. & Christensen, B. C. Cell-type deconvolution from DNA methylation: a review of recent applications. Hum. Mol. Genet. 26, R216–R224 (2017).
Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).
Lehmann-Werman, R. et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc. Natl Acad. Sci. USA 113, E1826–E1834 (2016).
Guo, S. et al. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat. Genet. 49, 635–642 (2017).
Franzen, J. et al. DNA methylation changes during long-term in vitro cell culture are caused by epigenetic drift. Commun. Biol. 4, 598 (2021).
Teschendorff, A. E., Zhu, T., Breeze, C. E. & Beck, S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 21, 221 (2020).
Zhu, T. et al. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat. Methods 19, 296–306 (2022).
Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).
Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).
Sun, K. et al. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc. Natl Acad. Sci. USA 112, E5503–E5512 (2015).
Kang, S. et al. CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA. Genome Biol. 18, 53 (2017).
Li, W. et al. CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic Acids Res. 46, e89 (2018).
Klein, E. A. et al. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Ann. Oncol. 32, 1167–1177 (2021).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Michener, C. D. & Sokal, R. R. A quantitative approach to a problem in classification. Evolution 11, 130–162 (1957).
Gu, G., Dubauskaite, J. & Melton, D. A. Direct evidence for the pancreatic lineage: NGN3⁺ cells are islet progenitors and are distinct from duct progenitors. Development 129, 2447–2457 (2002).
Le Douarin, N. M. On the origin of pancreatic endocrine cells. Cell 53, 169–171 (1988).
van Arensbergen, J. et al. Derepression of Polycomb targets during pancreatic organogenesis allows insulin-producing beta-cells to adopt a neural gene activity program. Genome Res. 20, 722–732 (2010).
Frank, D. B. et al. Early lineage specification defines alveolar epithelial ontogeny in the murine lung. Proc. Natl Acad. Sci. USA 116, 4362–4371 (2019).
Moss, J. et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 9, 5068 (2018).
Liu, M. C. et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncol. 31, 745–759 (2020).
Gai, W. et al. Applications of genetic-epigenetic tissue mapping for plasma DNA in prenatal testing, transplantation and oncology. eLife 10, e64356 (2021).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Kirillov, A. et al. A role for nuclear NF-kappaB in B-cell-specific demethylation of the Igkappa locus. Nat. Genet. 13, 435–441 (1996).
Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).
Schübeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).
Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).
Feldmann, A. et al. Transcription factor occupancy can mediate active turnover of DNA methylation at regulatory regions. PLoS Genet. 9, e1003994 (2013).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Straussman, R. et al. Developmental programming of CpG island methylation profiles in the human genome. Nat. Struct. Mol. Biol. 16, 564–571 (2009).
Gal-Yam, E. N. et al. Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc. Natl Acad. Sci. USA 105, 12979–12984 (2008).
Wang, H. et al. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680–1688 (2012).
Wiehle, L. et al. DNA (de)methylation in embryonic stem cells controls CTCF-dependent chromatin boundaries. Genome Res. 29, 750–761 (2019).
Cheng, A. P. et al. Cell-free DNA tissues of origin by methylation profiling reveals significant cell, tissue, and organ-specific injury related to COVID-19 severity. Med (N Y) 2, 411–422 (2021).
Zheng, S. C. et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics 10, 925–940 (2018).
Li, Y. et al. Genome-wide analyses reveal a role of Polycomb in promoting hypomethylation of DNA methylation valleys. Genome Biol. 19, 18 (2018).
Nordin, M., Bergman, D., Halje, M., Engström, W. & Ward, A. Epigenetic regulation of the Igf2/H19 gene cluster. Cell Prolif. 47, 189–199 (2014).
Hall, E. et al. Sex differences in the genome-wide DNA methylation pattern and impact on gene expression, microRNA levels and insulin secretion in human pancreatic islets. Genome Biol. 15, 522 (2014).
Weisenberger, D. J. Characterizing DNA methylation alterations from The Cancer Genome Atlas. J. Clin. Invest. 124, 17–23 (2014).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Pedersen, B. S., Eyring, K., De, S., Yang, I. V. & Schwartz, D. A. Fast and accurate alignment of long bisulfite-seq reads. Preprint at https://doi.org/10.48550/arXiv.1401.1129 (2014).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Loyfer, N. & Kaplan, T. wgbstools: A computational suite for DNA methylation sequencing data representation, visualization, and analysis. https://github.com/nloyfer/wgbs_tools (accessed 28 November 2022).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinformatics 69, e96 (2020).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
Loyfer, N. & Kaplan, T. UXM: a fragment-level algorithm for computational deconvolution of whole tissues and plasma samples using DNA methylation sequencing data. https://github.com/nloyfer/UXM_deconv (accessed 28 November 2022).
Moss, J. et al. Megakaryocyte and erythroblast DNA in plasma and platelets. Preprint at bioRxiv https://doi.org/10.1101/2022.10.03.510502 (2022).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).

Acknowledgements

We thank H. Cedar and N. Friedman for insightful discussions. We also thank members of the Dor, Kaplan and Rosenfeld laboratories. This work was supported by grants from GRAIL, Alzheimer’s Drug Discovery Foundation, Human Islet Research Network (nos. HIRN UC4DK116274 and UC4DK104216), the Ernest and Bonnie Beutler Research Program of Excellence in Genomic Medicine, The Alex U Soyka pancreatic cancer fund, The Israel Science Foundation, the Waldholtz/Pakula family, the Robert M. and Marilyn Sternberg Family Charitable Foundation, the Helmsley Charitable Trust and DON Foundation (to Y.D.), Israel Science Foundation (no. 1250/18 to T.K.) and the Center for Interdisciplinary Data Science Research (to T.K., Y.D. and B.G.). N.L. was supported by CIDR Data Science and Leibniz fellowships. Y.D. holds the Walter and Greta Stiel Chair and Research Grant in Heart Studies.

Author information

Einat Carmon
Present address: Department of Surgery, Samson Assuta Ashdod University Hospital, Ashdod, Israel
Alex Aravanis
Present address: Illumina, Inc., San Diego, CA, USA
These authors contributed equally: Netanel Loyfer, Judith Magenheim, Ayelet Peretz

Authors and Affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
Netanel Loyfer, Sapir Shabi-Porat & Tommy Kaplan
Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Judith Magenheim, Ayelet Peretz, Agnes Klochendler, Ilana Fox-Fisher, Merav Hecht, Tsuria Pelet, Joshua Moss, Zeina Drawshy, Ruth Shemer, Yuval Dor & Tommy Kaplan
GRAIL, Inc., Menlo Park, CA, USA
Gordon Cann, Joerg Bredno, Hamed Amini, Patriss Moradi, Sudharani Nagaraju, Alex Aravanis, Oliver Venn & Arash Jamshidi
Sharett Institute of Oncology, Hadassah Hebrew University Medical Center, Jerusalem, Israel
Joshua Moss
Department of Obstetrics and Gynecology, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Dvora Bauman, David Shveiky, Shay Porat & Uri Dior
Department of Orthopedics, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Gurion Rivkin & Omer Or
Department of Otolaryngology, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Nir Hirshoren
Department of General Surgery, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Einat Carmon, Abed Khalaileh, Gideon Zamir, Ronit Grinbaum, Machmud Abu Gazala, Ido Mizrahi & Noam Shussman
Surgery Division, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Alon Pikarsky
Department of Cardiothoracic Surgery, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Amit Korach, Ori Wald, Uzi Izhar & Eldad Erez
Department of Urology, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Vladimir Yutkin
Department of Vascular Surgery, Shaare Zedek Medical Center, Jerusalem, Israel
Yaacov Samet
Department of Endocrinology and Metabolism, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Devorah Rotnemer Golinkin & Benjamin Glaser
Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
Kirsty L. Spalding
Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
Henrik Druid
Department of Forensic Medicine, The National Board of Forensic Medicine, Stockholm, Sweden
Henrik Druid
Department of Medicine (H7) and Karolinska University Hospital, Karolinska Institutet, Stockholm, Sweden
Peter Arner
Department of Surgery and the Clinical Islet Transplant Program, University of Alberta, Edmonton, Alberta, Canada
A. M. James Shapiro
Papé Family Pediatric Research Institute, Oregon Health & Science University, Portland, OR, USA
Markus Grompe

Contributions

A.A., G.C., R.S., B.G., T.K. and Y.D. conceived and initiated the project and designed the experiments. D.B., D.S., S.P., U.D., G.R., O.O., N.H., E.C., A. Pikarsky, A. Khalaileh, G.Z., R.G., M.A.G., I.M., N.S., A. Korach, O.W., U.I., E.E., V.Y., Y.S., D.R.G., K.L.S., H.D., P.A., A.M.J.S. and M.G. provided materials. J. Magenheim, A. Peretz, A. Klochendler, I.F.-F., M.H., T.P. and Z.D. performed experiments. G.C., J.B., H.A., P.M., S.N., O.V. and A.J. conducted sequencing and analysis. N.L., S.S.-P., J. Moss and T.K. developed and performed computational analyses. N.L., Y.D. and T.K. wrote the manuscript.

Corresponding authors

Correspondence to Yuval Dor, Benjamin Glaser or Tommy Kaplan.

Ethics declarations

Competing interests

This work was supported by GRAIL, Inc. G.C., J.B., A.A., O.V. and A.J. are employees, shareholders and/or founders at GRAIL, Inc. J. Magenheim, J. Moss, I.F.-F., R.S., Y.D., B.G. and T.K. have filed patents on cfDNA analysis technology. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Matteo Pellegrini and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 A human whole-genome DNA methylation atlas of healthy cell types.

205 healthy samples were obtained from adult humans, isolated and deeply sequenced (WGBS, mean depth ≥30x), to form a comprehensive human cell-type-specific methylation atlas.

Extended Data Fig. 2 Sample preparation and purity.

(A) Fresh tissue was obtained at surgery and dissociated (optimized per tissue type), then incubated with antibodies and FACS-sorted. Sorted cells were analysed using qRT-PCR for key cell-type-specific genes, or targeted PCR for cell-type-specific DNA methylation markers. DNA methylation was also analysed using whole-genome bisulfite sequencing. (B) Example of FACS sorting for pancreatic endocrine cell types. Left panel: staining for the beta cell marker C-peptide (x-axis) versus alpha cell marker glucagon (y-axis). Note that no double-positive cells are observed. Centre panel: staining for C-peptide (x-axis) versus delta cell marker somatostatin (y-axis). Right panel: unstained control (only fluorescent secondary antibodies added, no primary antibodies). (C) Fragment-level validation of sample purity using targeted PCR. Cell-type-specific markers were designed using pre-existing 450K data, each covering 4–7 neighbouring CpGs. Shown is the percentage of unmethylated molecules in each cell type (including endothelial cells and leukocytes). Colour gradient fades from fully unmethylated molecules (allT), through those unmethylated in all but one CpG (allT-1), etc. Amplicon locations are reported in hg19, for acinar, alpha, beta, delta, duct and endothelial markers (from left to right). (D) Fragment-level validation of the same locations, using the atlas WGBS data. Y-axis marks the percentage of unmethylated fragments (with ≥4 CpGs). As these markers show, approximately 90% of molecules in the target cell type are unmethylated, compared with less than 5% in other cell types, thus emphasizing the purity of the DNA methylation atlas using a set of independently selected DMRs.

Extended Data Fig. 3 Purity estimation for pancreas, lung, heart, liver, breast, and GI using atlas markers.

The percentage of unmethylated fragments (y-axis) among fragments of ≥4 CpGs from selected differentially methylated markers can serve as an (under)estimate of the atlas purity. Here we show one such marker for each cell type, selected from the top 25 markers, and use fragment-level analysis to demonstrate the purity in the target cell type compared with other cell types from the same tissue or environment. (A) Pancreas. (B) Lung. (C) Heart. (D) Liver. (E) Breast. (F) GI tract. For most cell types, 90% of the molecules in the target cell type are unmethylated, compared with less than 5% in other cell types. This is an underestimate, as some heterogeneity could occur in each cell type, reflecting stochastic noise, cellular states, age or environmental changes.

Extended Data Fig. 4 Biological replicates of the same cell type from different individuals show a surprisingly low rate of differentially methylated blocks.

We focused on 37 cellular subtypes with n≥3 replicates (e.g. endothelial cells from a specific tissue) and measured the average percentage of methylation blocks (≥3 CpGs) that differ in their methylation by 50% (absolute delta beta), across replicates (shown as Y-axis). Nearly all cellular subtypes (36/37) differ by ≤0.5% of blocks suggesting a very high degree of conservation among replicates. Dotted red line marks the average number of differential blocks between two random samples of different cell types (4.9%). Box plots mark median and interquartile range (IQR), with 1.5*IQR whiskers.
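The replicate comparison described in this legend can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes block-level average methylation (beta) values are already available as a replicates-by-blocks array, and it assumes the statistic is averaged over all pairs of replicates, which the legend does not state explicitly. The function name `fraction_differential_blocks` is hypothetical.

```python
from itertools import combinations

import numpy as np


def fraction_differential_blocks(betas, min_delta=0.5):
    """Mean fraction of blocks whose methylation differs between replicates.

    betas: array of shape (n_replicates, n_blocks) with average methylation
           per block in [0, 1]; NaN marks blocks without coverage.
    min_delta: absolute beta difference that counts as "differential"
               (0.5 corresponds to the 50% threshold in the legend).
    Averaging over replicate pairs is an assumption of this sketch.
    """
    betas = np.asarray(betas, dtype=float)
    fractions = []
    for i, j in combinations(range(betas.shape[0]), 2):
        diff = np.abs(betas[i] - betas[j])
        valid = ~np.isnan(diff)  # ignore blocks missing in either replicate
        fractions.append(np.mean(diff[valid] >= min_delta))
    return float(np.mean(fractions))


# Toy example: three replicates, four blocks.
toy = [
    [0.1, 0.9, 0.5, 0.2],
    [0.1, 0.3, 0.5, 0.2],
    [0.7, 0.9, 0.5, 0.2],
]
result = fraction_differential_blocks(toy)
```

On real data one would restrict the input to blocks of ≥3 CpGs, as in the legend, before calling the function.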

Extended Data Fig. 5 Unsupervised agglomerative clustering of human methylomes.

(A) Same as Fig. 2, coloured by developmental lineage from germ layers, including endoderm (green), mesoderm (blue) and ectoderm (red). (B) Same as Fig. 2, for the Roadmap Epigenomics DNA methylation atlas.

Extended Data Fig. 6 Marker specificity across 953 cell-type-specific markers.

For every cell type (row), we plot each of the top 25 markers (shown as boxes). For each marker, we compare the percentage of unmethylated fragments (≥3 CpGs) in the target samples (blue dots) versus their percentage in background samples (golden dots). Blue and golden bars plot the average proportion across all target and background samples, respectively.

Extended Data Fig. 7 Markers of putative enhancers in other atlas cell types.

Including top 250 unmethylated markers for B cells (top left), hepatocytes (top right), gastric/small intestine/colon epithelium (bottom left), and cardiomyocytes/heart fibroblasts (bottom right). Grey lines mark the same ChIP-seq/ATAC/DNase/chromHMM signal, averaged across all 11,371 unmethylated markers (top 250 per cell type).

Extended Data Fig. 8 Fragment-level analysis of in silico mixes.

Shown are in silico simulations for four cell types, which are computationally mixed at various proportions with a plasma-like mixture of 90% leukocytes, 7.5% vascular endothelial cells, and 2.5% hepatocytes. Each mixture was analysed using our atlas (red), and compared to Moss et al. (grey). Box plots show average contribution in 10 simulations, with 1 SD error bars.
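As a rough illustration of how such an in silico mixture could be composed, the sketch below spikes a target cell type into the plasma-like background given in the legend (90% leukocytes, 7.5% vascular endothelial cells, 2.5% hepatocytes) and draws fragments accordingly. The function names, the dictionary layout and the choice to sample fragments with replacement are assumptions made for illustration; the legend does not describe the authors' sampling procedure.

```python
import random

# Plasma-like background from the legend.
BACKGROUND = {
    "leukocytes": 0.90,
    "vascular_endothelium": 0.075,
    "hepatocytes": 0.025,
}


def mixture_proportions(target, target_fraction, background=BACKGROUND):
    """Proportion of each cell type when `target` makes up `target_fraction`."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    # Scale the background so that all proportions sum to 1.
    props = {cell: (1.0 - target_fraction) * frac for cell, frac in background.items()}
    props[target] = props.get(target, 0.0) + target_fraction
    return props


def sample_mixture(fragments_by_type, proportions, n_fragments, seed=0):
    """Draw fragments (with replacement) according to the mixing proportions."""
    rng = random.Random(seed)
    cell_types = list(proportions)
    weights = [proportions[c] for c in cell_types]
    sources = rng.choices(cell_types, weights=weights, k=n_fragments)
    return [rng.choice(fragments_by_type[s]) for s in sources]


# Toy example: 1% cardiomyocyte DNA in the plasma-like background.
props = mixture_proportions("cardiomyocytes", 0.01)
toy_fragments = {
    "leukocytes": ["CCCC"],
    "vascular_endothelium": ["CCCT"],
    "hepatocytes": ["CCTT"],
    "cardiomyocytes": ["TTTT"],
}
mixed = sample_mixture(toy_fragments, props, n_fragments=1000)
```

Repeating the sampling with different seeds gives independent simulated mixtures, analogous to the 10 simulations summarized in each box plot.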

Extended Data Fig. 9 Specialized DNA methylation atlases for the analysis of (A) Illumina BeadChip 450K array, (B) EPIC array, (C) RRBS sequencing data, (D) Illumina TruSeq EPIC capture panel, (E) Roche SeqCapEpi capture panel, and (F) Agilent SureSelectXT capture panel.

Extended Data Fig. 10 Deconvolution of previously published 450K DNA methylation array data.

(A) Deconvolution of pancreatic islet methylomes. Methylation arrays from 53 male and 34 female non-diabetic donors⁴⁸ were analysed using atlas methylomes, revealing detailed cellular composition including previously uncharacterized alpha and delta cells. No statistically significant sex differences in cellular composition were observed. (B) Analysis of 865 pulmonary methylomes from TCGA⁴⁹. WGBS-based markers for lung alveolar epithelium and lung bronchial epithelium cells reveal differential cell populations in 443 LUAD, 11 SCLC, 337 LUSC, 32 normal adjacent (LUAD), and 42 normal adjacent (LUSC) lung methylomes. Note that only alveolar cell DNA is identified in lung adenocarcinomas, whereas small cell lung cancers and squamous cell carcinomas also contain bronchial DNA, consistent with the presumed cellular origins of each type of lung cancer. Note also that epithelial cells are a minority in both lung adenocarcinoma and normal lungs. This is probably due to the abundance of stromal cells in bulk preparations of either normal lungs or lung cancers. (C) DNA methylation from 721 cancerous and 97 normal breast biopsies from TCGA. WGBS-based markers for breast luminal and basal epithelial cells were used to study the cellular composition of TCGA samples⁵⁰, which were classified into five subtypes using PAM50, a 50-gene expression-based classification⁶⁶. Different cell composition is observed for normal-like, basal-like, luminal A, luminal B, and Her2-enriched PAM50 subtypes, compared with healthy breast biopsies. The low fraction of breast basal cells in breast cancer is likely to result from the abundance of non-epithelial cells in both the normal breast and breast cancer. Box plots mark median and interquartile range (IQR), with 1.5*IQR whiskers.

Supplementary information

Supplementary Information

Legends for Supplementary Tables; cell sorting: FACS plots and RNA/methylation enrichments; genome-wide segmentation to methylation blocks; selection of type-specific markers; UXM fragment-level deconvolution algorithm; purity estimations: proportion of unmethylated fragments in various samples; analysis of cfDNA composition in patients with COVID-19; dissociation protocols; appendix: RBC lysis buffer (X10); Supplementary Table 18: antibodies for flow cytometry; and Supplementary Table 19: reagents used.

Reporting Summary

Peer Review File

Supplementary Tables

This file contains Supplementary Tables 1–17 (see main Supplementary Information PDF for legends).

Supplementary Dataset 1

Genome-wide set of unmethylated regions per cell type, annotated. Zip file contains 39 bed files, each with all genomic regions (blocks of at least four CpGs in which at least 85% of sequenced fragments are unmethylated in at least 85% of covered CpGs). Bed file information includes position (chr, from, to, CpG_from, CpG_to, hg19), regional annotation by HOMER and neighbouring gene, and annotations of when a region is 1 kb or less from a transcriptional start site, overlapping a CpG island, chromHMM annotation in that cell type or similar, ChIP–seq peak annotations for H3K4me3 (promoter), H3K27ac (gene regulation), H3K4me1 (enhancer), H3K27me3, overlapping ATAC–seq peak, presence of CTCF binding site and CTCF ChIP–seq peak.
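The region criterion quoted above can be read as a two-level threshold: a fragment counts as unmethylated when at least 85% of its covered CpGs are unmethylated, and a block qualifies when at least 85% of its fragments are unmethylated. The sketch below encodes that reading. It is an interpretation rather than the authors' implementation: fragments are assumed to be strings with `T` for an unmethylated CpG, `C` for a methylated CpG and `.` for a CpG without a call, and the minimum of four covered CpGs per fragment is borrowed from the figure legends as an assumption. Both function names are hypothetical.

```python
def is_unmethylated_fragment(pattern, min_cpgs=4, min_unmeth=0.85):
    """Classify one fragment; returns None if it covers too few CpGs.

    pattern: string over {'C', 'T', '.'} (methylated, unmethylated, no call).
    """
    covered = [c for c in pattern if c in "CT"]
    if len(covered) < min_cpgs:
        return None  # too short to be informative
    return covered.count("T") / len(covered) >= min_unmeth


def is_unmethylated_region(patterns, min_frag_fraction=0.85, **fragment_kwargs):
    """True if enough informative fragments in the block are unmethylated."""
    calls = [is_unmethylated_fragment(p, **fragment_kwargs) for p in patterns]
    calls = [c for c in calls if c is not None]
    if not calls:
        return False  # no informative fragments, so no call is made
    return sum(calls) / len(calls) >= min_frag_fraction


# Toy blocks: 9 of 10 fragments unmethylated passes, 8 of 10 does not.
passes = is_unmethylated_region(["TTTT"] * 9 + ["CCCC"])
fails = is_unmethylated_region(["TTTT"] * 8 + ["CCCC"] * 2)
```

Fragments with too few covered CpGs are skipped rather than counted as methylated, so sparsely covered blocks are not penalized.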

Supplementary Dataset 2

Genome-wide set of unmethylated regions per cell type, overlapping H3K27ac, but not H3K4me3, ChIP–seq peaks, annotated. Zip file containing bed files, listing for each cell type a set of putative enhancers defined as the subset of hypomethylated regions (Extended Dataset 1) marked by active gene regulation chromatin marks (H3K27ac peaks) but lacking promoter marks (H3K4me3 peaks).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Loyfer, N., Magenheim, J., Peretz, A. et al. A DNA methylation atlas of normal human cell types. Nature 613, 355–364 (2023). https://doi.org/10.1038/s41586-022-05580-6

Received: 30 November 2021
Accepted: 18 November 2022
Published: 04 January 2023
Issue Date: 12 January 2023
DOI: https://doi.org/10.1038/s41586-022-05580-6
