Introduction

The identities of cells and tissues in multicellular organisms can be maintained by their particular epigenomes.1 DNA methylation is a relatively stable component of the epigenome that establishes and stabilizes cellular phenotypes by maintaining gene expression states.2, 3, 4 The DNA methylation pattern of a particular cell type is inherited through successive cell cycles and propagated along a specific lineage.5, 6 DNA methylation can reflect the tissue of origin even after long-term culture.7, 8 Furthermore, induced pluripotent stem cells (iPSCs), which are reprogrammed from mature cells by defined transcription factors, have been found to harbor residual DNA methylation from the original donor cells.9, 10

The term ‘epigenetics’ was coined by Waddington11 in 1942 to refer to ‘the causal mechanisms by which the genes of a genotype bring about a phenotype’. Currently, the widely accepted definition of ‘epigenetics’ is ‘heritable changes in genome function that occur without changes in the DNA sequence’.12 This definition implies that particular states that define cell identity are heritable and maintained.13 Here we discuss epigenetic memory, a natural mechanism by which the identity of a cell is maintained through successive cell cycles during development and differentiation.5, 14

This review covers DNA methylation as a form of epigenetic memory in stem cells and cancer cells. We have organized this review into three main sections. The first section is an introduction to DNA methylation in mammals. We briefly describe the mechanisms of maintenance and erasure of DNA methylation and of de novo DNA methylation. We also introduce DNA methylation analysis technologies. The second section summarizes DNA methylation as a mechanism of epigenetic memory in various types of stem cells, including embryonic stem cells (ESCs), iPSCs, hematopoietic stem cells (HSCs), mesenchymal stem cells (MSCs) and neural stem cells (NSCs). The last section is devoted to the role of DNA methylation in cancer initiation and evolution. We also describe DNA methylation as a marker of cancer origin and discuss its use in classifying cancers of unknown primary.

DNA methylation

DNA methylation on the fifth position of cytosine (5mC) is a stable epigenetic mark that has important roles in mammalian development, differentiation and maintenance of cellular identity through the control of gene expression.15 Over the past 40 years, changes in DNA methylation have been observed in many human diseases, especially cancer.16

DNA methylation in vertebrates is mainly restricted to CpG sites, but substantial non-CpG methylation has been found in pluripotent stem cells.17, 18 There are ~29 million CpGs in the human genome, and 60–80% of them are methylated.19 Approximately 7% of CpGs are located in CpG islands (CGIs), which are regions of high CpG density.20 Approximately 70% of annotated gene promoters are associated with a CGI, and CGIs are largely protected from DNA methylation.21 The enzymes responsible for DNA methylation are the DNA methyltransferases (DNMTs), including DNMT1, DNMT3A, DNMT3B and the rodent-specific DNMT3C.22, 23
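To make "regions of high CpG density" concrete, the sketch below scores a single sequence window against the classic CGI criteria of Gardiner-Garden and Frommer (length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6). These thresholds are an assumption for illustration; the annotations cited above may use other definitions, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # 'CG' cannot overlap itself, so str.count is exact here
    gc_content = (c + g) / n if n else 0.0
    # Observed CpGs divided by the number expected from base composition alone
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def meets_cgi_criteria(seq: str, min_len: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds to one window (hypothetical helper)."""
    s = cpg_stats(seq)
    return s["length"] >= min_len and s["gc_content"] > min_gc and s["obs_exp"] > min_obs_exp


# Synthetic examples: a CpG-dense window and a window with 50% GC but no CpGs
print(meets_cgi_criteria("CG" * 100))
print(meets_cgi_criteria("CAGT" * 50))
```

Genome-wide CGI callers typically slide such a window along the sequence and merge qualifying windows; this single-window check is only the core test.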

In this section, we briefly describe the molecular mechanisms of maintenance and erasure of DNA methylation, as well as de novo methylation. We also introduce recent DNA methylation analysis technologies that can be used in clinical applications.

Maintenance and erasure of DNA methylation

DNA methylation patterns are transmitted with high fidelity during DNA replication.13 DNMT1 maintains global DNA methylation and shows a strong preference for hemimethylated DNA.24 DNMT1 is recruited to the DNA replication fork through direct interactions with PCNA (proliferating cell nuclear antigen) and UHRF1 (ubiquitin-like, containing PHD and RING finger domains 1, also known as Np95 and ICBP90).25, 26 UHRF1 recognizes hemimethylated CpG sites via its SET- and RING-associated (SRA) domain and recruits DNMT1 to these sites.27 Chromatin-associated enzymes also regulate DNMT1 through post-translational modifications.15 LSD1 (lysine-specific demethylase 1, also known as KDM1) is essential for maintaining global DNA methylation; it regulates the methylation status of DNMT1 and thereby modulates its stability.28 Histone H3 lysine 9 methylation (H3K9me) is also necessary for DNA methylation maintenance; it is bound by UHRF1 and contributes to regulating DNMT1 stability during S phase.29 These interactions of DNMT1 with other heterochromatin-associated proteins restrict stable DNMT1 activity to DNA replication, which ensures the fidelity of global DNA methylation.15

DNA methylation can be removed through passive and active mechanisms. Passive DNA demethylation occurs when functional maintenance machinery is absent during successive rounds of replication, so that methylation is progressively diluted. By contrast, active DNA demethylation is an enzymatic process that removes or modifies the methyl group of 5mC.30 The ten–eleven translocation (TET) family enzymes TET1, TET2 and TET3 mediate active demethylation.31 TET proteins oxidize 5mC to 5-hydroxymethylcytosine (5hmC) and further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC).32 DNA demethylation is then completed either by replication-dependent dilution of these oxidized derivatives or by thymine–DNA glycosylase-mediated base excision repair.33, 34
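Passive dilution can be illustrated with a simple expected-value model. The sketch below assumes that the parental strand keeps its methylation, that the new strand is methylated with a fixed maintenance probability (`fidelity`, a hypothetical parameter) only opposite a methylated parental CpG, and that there is no de novo methylation or TET-mediated demethylation. Under these assumptions the per-strand level follows m(n+1) = m(n)(1 + f)/2, so it halves at every division when maintenance is lost entirely.

```python
from typing import List


def per_strand_methylation(m0: float, fidelity: float, divisions: int) -> List[float]:
    """Expected fraction of methylated CpG strands after each cell division.

    Simplified model (assumption): parental strands keep their methylation,
    new strands are methylated with probability `fidelity` opposite a methylated
    parental CpG, and no de novo or active demethylation occurs.
    """
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    levels = [m0]
    m = m0
    for _ in range(divisions):
        # Half of all strands are parental (level m), half are new (level fidelity * m)
        m = (m + fidelity * m) / 2.0
        levels.append(m)
    return levels


# Complete loss of maintenance: the level halves at every division (passive dilution)
print(per_strand_methylation(0.8, fidelity=0.0, divisions=3))
# High-fidelity maintenance keeps the level nearly constant
print([round(x, 3) for x in per_strand_methylation(0.8, fidelity=0.99, divisions=3)])
```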

De novo DNA methylation

Many CGI promoters are protected from DNA methylation by transcription factor binding, nucleosome exclusion and H3K4 methyltransferases, such as SETD1A (SET domain containing 1A) or MLL proteins.15 Although these promoters remain unmethylated, some repressed promoters acquire DNA methylation during development.15 De novo DNA methylation is carried out by DNMT3A and DNMT3B in complex with DNMT3L, a closely related homolog that lacks catalytic activity.35, 36 DNMT3L interacts with histone H3 tails that are unmethylated at lysine 4 and recruits the de novo DNMTs.35

DNMT3A and DNMT3B are recruited to target promoters in complex with other epigenetic repressors, including histone deacetylases and H3K9 methyltransferases.15, 37 Targeting to regions destined for stable silencing is frequently mediated by the binding of repressive transcription factors.15 These factors induce chromatin remodeling by recruiting LSH (lymphoid-specific helicase, also known as HELLS), linker histone H1 and heterochromatin protein 1, and the H3K9 methyltransferase G9A is also recruited to this complex together with DNMT3A or DNMT3B.37, 38, 39, 40 This crosstalk between DNA methylation and histone modification suggests that histone modifications such as H3K9me initiate heterochromatin formation, and that subsequent DNA methylation ensures stable silencing of the promoter.15

DNA methylation analysis technologies

DNA methylation analysis relies on three main principles: (1) digestion of genomic DNA with methylation-sensitive restriction enzymes; (2) affinity-based enrichment of methylated DNA fragments; and (3) sodium bisulfite conversion.41 Although many DNA methylation analysis methods exist, bisulfite sequencing (BS) is widely accepted as the gold standard for detecting DNA methylation.42, 43 BS is based on sodium bisulfite conversion and provides quantitative DNA methylation levels at single-base resolution. Sodium bisulfite treatment of genomic DNA converts unmethylated cytosine to uracil, which is then read as thymine after PCR amplification and sequencing.44 5mC is resistant to this conversion and remains as cytosine, so it can be distinguished from unmethylated cytosine.44 Initially, BS was used to assay individual loci by locus-specific PCR followed by Sanger sequencing.44 More recently, reduced representation BS has extended the genomic coverage of BS by using high-throughput sequencing. Reduced representation BS combines restriction digestion with BS to analyze regions of high CpG density such as CGIs.45 Finally, whole-genome BS provides single-base resolution and quantitative methylation rates for all cytosines in the genome.46, 47 Whole-genome BS has been applied to various tissues and cell lines to provide a complete map of the ~29 million CpG sites in the human genome.19
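The readout logic of BS can be sketched in a few lines: unmethylated C becomes T, methylated C stays C, and the methylation level of each CpG is the fraction of reads showing C. This toy assumes complete conversion, the top strand only and reads already aligned to the reference without indels; it also ignores 5hmC, which standard bisulfite treatment does not distinguish from 5mC. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate complete bisulfite conversion plus PCR of the top strand.

    Unmethylated C is read as T; C at positions in `methylated` (5mC) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_positions(reference: str) -> List[int]:
    """0-based positions of the C in each CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(reference: str, reads: Iterable[str]) -> Dict[int, float]:
    """Per-CpG methylation level = C calls / (C + T calls) across aligned reads."""
    sites = cpg_positions(reference)
    meth = {i: 0 for i in sites}
    total = {i: 0 for i in sites}
    for read in reads:
        for i in sites:
            if i < len(read) and read[i] in ("C", "T"):
                total[i] += 1
                if read[i] == "C":
                    meth[i] += 1
    return {i: meth[i] / total[i] for i in sites if total[i]}


ref = "ACGTTCGACCG"  # CpGs at positions 1, 5 and 9
reads = [bisulfite_convert(ref, {1, 9}), bisulfite_convert(ref, {1})]
print(reads[0])
print(methylation_levels(ref, reads))
```

Note that the non-CpG cytosine at position 8 is converted in every read, which is how conversion efficiency is often monitored in practice.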

The Infinium methylation 450k microarray is a cost-effective, high-throughput method for detecting DNA methylation in many human samples.16 This assay involves bisulfite treatment of genomic DNA and subsequent hybridization to probes for over 450 000 CpG sites throughout the genome. The platform targets gene regions including promoters, 5′ untranslated regions (UTRs), first exons, gene bodies and 3′ UTRs.48 Notably, The Cancer Genome Atlas consortium used this platform to profile >7500 samples from over 200 different cancer types.49, 50, 51, 52 The MethylationEPIC (EPIC) BeadChip, the successor to the 450k array, contains more than 850 000 probes. It covers >90% of the sites on the 450k array, plus >350 000 CpGs in regions identified as potential enhancers by FANTOM5 and the ENCODE project.53, 54 The EPIC array is expected to be a valuable tool for understanding human development and disease, in particular the role of enhancer DNA methylation.52
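On these arrays, the methylation level of each CpG is usually summarized as a beta value computed from the methylated (M) and unmethylated (U) probe intensities, beta = M / (M + U + offset), and often transformed to an M value, log2(beta / (1 - beta)), for statistical testing. The sketch below assumes the commonly used offset of 100 and clips negative intensities to zero; the function names are hypothetical, and real preprocessing pipelines add normalization steps not shown here.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), with negative intensities clipped to 0.

    The offset (assumed 100 here) stabilizes the ratio when both signals are low.
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(beta: float, eps: float = 1e-6) -> float:
    """M value as the base-2 logit of beta; beta is clipped to avoid log(0)."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


# Hypothetical probe intensities (arbitrary units)
b = beta_value(meth=900.0, unmeth=0.0)
print(b, m_value(b))
```

Beta values are easy to interpret as approximate methylation fractions between 0 and 1, whereas M values are unbounded and behave better in linear models, which is why both are commonly reported.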

DNA methylation in stem cells

Stem cells can be applied to cell therapy, drug development, disease modeling and the study of cellular differentiation. DNA methylation has critical roles in the maintenance of stem cell identity and lineage commitment during differentiation.55 In this section, we describe DNA methylation as an epigenetic memory of stem cells such as ESCs, iPSCs, HSCs, MSCs and NSCs.

Embryonic stem cells

ESCs are pluripotent, self-renewing cells. Mouse ESCs can maintain their self-renewal ability even in the absence of all three catalytically active DNMTs (DNMT1, DNMT3A and DNMT3B).56 However, the differentiation of ESCs is almost completely inhibited when the DNMTs are absent.57 Global DNA hypomethylation prevents ESCs from silencing pluripotency factors and from expressing differentiation-associated markers.57

ESCs exhibit significant levels of non-CpG methylation and express high levels of DNMT3A and DNMT3B.17, 19 Non-CpG methylation, primarily at CpA sites, accounts for ~25% of all methylated cytosines in human ESCs.19 Non-CpG methylation is mediated by DNMT3A and DNMT3B and depends on the presence of DNMT3L, which may direct de novo DNMT activity during pluripotency but is silenced upon differentiation.58, 59 The prevalence of non-CpG methylation in ESCs, as well as in iPSCs, suggests that it could be important for pluripotency, but it is currently unclear whether it is a cause or a consequence of the pluripotent state.60
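The ~25% figure is a share of methylated cytosines, not of all cytosines, so it is computed by assigning each methylated C to its sequence context and tallying. The sketch below assumes top-strand methylation calls at known positions (hypothetical input) and groups contexts by dinucleotide (CG, CA, CC, CT); the function names are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable


def dinucleotide_context(reference: str, pos: int) -> str:
    """Context of the cytosine at `pos` on the top strand, e.g. 'CG' or 'CA'."""
    ref = reference.upper()
    if ref[pos] != "C":
        raise ValueError(f"position {pos} is not a cytosine")
    nxt = ref[pos + 1] if pos + 1 < len(ref) else "N"  # 'N' if C is the last base
    return "C" + nxt


def context_shares(reference: str, methylated: Iterable[int]) -> Dict[str, float]:
    """Share of all methylated cytosines that fall in each dinucleotide context."""
    counts = Counter(dinucleotide_context(reference, p) for p in methylated)
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


ref = "CGCACGCAT"  # methylated cytosines at 0 (CpG), 2 (CpA) and 4 (CpG)
shares = context_shares(ref, [0, 2, 4])
non_cpg_share = 1.0 - shares.get("CG", 0.0)
print(shares, round(non_cpg_share, 3))
```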

In contrast to somatic cells, which transmit considerable epigenetic information directly to their daughter cells, ESCs preserve their epigenetic memory dynamically by balancing the addition and removal of DNA methylation.4 Although ESCs show high DNA methylation turnover rates, their epigenomes remain well organized and highly stable.4

Induced pluripotent stem cells

iPSCs were originally generated through ectopic expression of four transcription factors: OCT4, SOX2, KLF4 and MYC. iPSCs can also be used to uncover the epigenetic mechanisms of reprogramming.61 During reprogramming, the mature somatic epigenome undergoes a global reset, and the epigenomes of iPSCs are remarkably similar to those of ESCs.62, 63 However, iPSCs have been found to harbor residual DNA methylation signatures from their donor cells and to differentiate preferentially into their original cell lineage (Figure 1).9, 64, 65 Moreover, a subset of human iPSCs retain their epigenetic memory even after extended passaging.10, 66

Figure 1

A model of epigenetic memory in iPSCs (modified from Ohi et al.66). Induced pluripotent stem cells (iPSCs) harbor residual DNA methylation signatures from their donor cells. During reprogramming, pluripotency genes are demethylated and reactivated. Developmental regulators that are silenced in the somatic cell are incompletely demethylated. Somatic cell genes are differentially methylated and repressed in iPSCs. Black, white and gray circles represent methylated, unmethylated and partially methylated CpGs, respectively.

Epigenetic memory has also been reported in direct reprogramming, the conversion of fully differentiated cells into other cell types that bypasses an intermediate pluripotent stage. Direct reprogramming of fibroblasts into neural stem cells by defined factors showed that some epigenetic memory of the fibroblasts persists, although the reprogrammed neural stem cells were able to suppress the donor cell-specific transcription network.67 Together, these studies of reprogramming technologies provide insight into epigenetic memory and show how it could potentially be used for disease modeling and therapeutic applications.68

Hematopoietic stem cells

HSCs are a rare cell population responsible for generating the erythroid, myeloid and lymphoid lineages.69 DNA methylation is critical for the regulation of HSC self-renewal during hematopoiesis; it facilitates commitment to a lymphoid or myeloid fate and establishes the identity of differentiated cells.69, 70 DNMT1 is essential for protecting HSCs from the premature activation of predominant differentiation programs.69, 71 Dnmt1-knockout mice suffer from self-renewal defects and marked misregulation of the myeloid and lymphoid compartments.70, 71 By contrast, DNMT3A and DNMT3B are required to repress the HSC self-renewal gene network during HSC differentiation.72 Combined loss of Dnmt3a and Dnmt3b enhances HSC self-renewal and severely inhibits differentiation.72

DNA methylation levels increase upon lymphoid commitment but decrease upon myeloid commitment.73 Although DNA hypomethylation is a general feature of myeloid cells, DNA methylation is dynamically regulated throughout the stages of differentiation.69 During neutrophil development, DNA methylation appears to change at specific differentiation stages, and these changes coincide with changes in the activation of key hematopoietic transcription factors.74 During the differentiation of monocytes into macrophages and dendritic cells, demethylation of individual CpG sites follows distinct time courses.75, 76 However, de novo DNA methylation has rarely been detected during these differentiation processes.75

Mesenchymal stem cells

MSCs are multipotent adult stem cells that have self-renewal capacity, support hematopoiesis and can differentiate into osteocytes, chondrocytes and adipocytes.77 Human MSCs from various tissues, including bone marrow, umbilical cord, adipose tissue, dental pulp, skin and many others, have been used clinically as potential regenerative cell therapies.78 However, the high proliferation rate of MSCs in an artificial cell culture environment could favor genetic and epigenetic alterations.79, 80 DNA methylation patterns of human MSCs are maintained throughout long-term culture and aging, but senescence-associated DNA methylation differences are observed in regions with H3K9me3, H3K27me3 and targets of EZH2.8 Therefore, DNA methylation can be a good molecular marker for the quality control of MSCs.80 The DNA methylation profiles of MSCs can reflect their cell type of origin and can be useful for the classification of MSCs.7, 81
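A nearest-centroid classifier is one of the simplest ways to use methylation profiles to assign a sample to a cell type of origin: average the beta values of reference samples per class, then pick the class whose average profile is closest. The sketch below is illustrative only; it is not the method used in the cited studies, and the tissue labels and beta values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroids(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Mean beta value at each CpG for every labeled group of reference samples."""
    return {
        label: [sum(site) / len(samples) for site in zip(*samples)]
        for label, samples in training.items()
    }


def classify(sample: Sequence[float], cents: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda label: math.dist(sample, cents[label]))


# Hypothetical beta values at three CpGs for MSCs of two tissue origins
training = {
    "bone_marrow": [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9]],
    "adipose": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]],
}
print(classify([0.85, 0.15, 0.7], centroids(training)))
```

In practice, classifiers of this kind are trained on many informative CpGs selected in advance, and more robust methods are usually preferred when classes overlap.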

Neural stem cells

NSCs are a subtype of progenitor cells in the nervous system that have the capacity to self-renew and to differentiate into distinct cell types such as neurons, astrocytes and oligodendrocytes.82 NSCs at early gestation can only self-renew, and they then differentiate exclusively into neurons during midgestation. At late gestation, NSCs begin to differentiate into astrocytes and oligodendrocytes.83 DNA methylation plays an important role in defining the timing of the NSC fate specification switch from neurogenesis to astrocytogenesis.84, 85, 86 Many astrocytic genes, such as GFAP (glial fibrillary acidic protein), are methylated in early and mid-gestational NSCs, then demethylated in late-stage NSCs.87, 88 Thus, epigenetic mechanisms have a critical role in fine-tuning and coordinating gene expression during neurogenesis.86

5hmC is present at much lower levels than 5mC, but it is particularly abundant in brain tissue.89 DNA hydroxymethylation may play important roles in mediating dynamic gene expression changes during brain development.85, 90 Intriguingly, Tet1 mutant mice also show adult neurogenesis deficits and impairment in learning and memory.91 It remains to be determined how 5hmC and DNA demethylation regulate neurogenesis.86

DNA methylation in cancer cells

Aberrant DNA methylation is common across many types of cancer. Global hypomethylation of the cancer genome, promoter hypermethylation of tumor suppressor genes and potentially direct mutagenesis of 5mC-containing sequences through deamination of methylated cytosine can contribute to cancer.16 These alterations generally co-exist in tumors, suggesting that epigenetic mechanisms are central to the evolution of human cancer.16 In this section, we describe epigenetic reprogramming during tumor initiation and describe the roles of DNA methylation in tumor evolution. We also describe DNA methylation as an epigenetic memory of cell or tissue origin of cancers and its utility as a molecular marker for classifying cancers of unknown primary (CUP).

Epigenetic reprogramming during tumor initiation

The malignant transformation of a normal cell into a cancerous cell has similarities to the reprogramming of a somatic cell into a pluripotent cell.92 Transformation resets the transcriptional network and chromatin structure and produces cells with unlimited self-renewal potential.92 Several reprogramming transcription factors, such as Sox2 and c-Myc, are well-known oncogenes, whereas many genes that act as barriers to reprogramming, including p53 and Ink4A/Arf, function as tumor suppressors.93 DNA methylation is a potent barrier to cellular reprogramming, and DNA methylation changes markedly during malignant transformation, as it does during cellular reprogramming.94

Stem cell-like chromatin patterns frequently lead to DNA hypermethylation during cancer progression.95 Genes hypermethylated in cancer are heavily biased toward those that are regulated by PRC2 (polycomb repressive complex 2) and marked by H3K27me3 in ESCs and adult stem cells.95, 96, 97 Regions of focal DNA hypermethylation in cancer are located primarily at CGIs and are concentrated within long-range hypomethylated regions.98 In the nucleus, these hypomethylated regions correspond broadly to nuclear lamina-associated domains, which are generally associated with repressive chromatin and with polycomb group protein-marked genes in ESCs.16, 98, 99

The widespread DNA methylation changes in cancer may be caused by mutations in components of the citric acid cycle and the epigenetic machinery.16 For example, mutations in IDH1 (isocitrate dehydrogenase 1) and IDH2 alter the DNA and histone demethylation pathways by causing the accumulation of D-2-hydroxyglutarate, which competes with the α-ketoglutarate needed by the TET and histone lysine demethylase (KDM) enzymes.100, 101 IDH1 and TET2 mutations are mutually exclusive in acute myeloid leukemia.102, 103 Thus, interactions between epigenetic and genetic events drive progressive cellular abnormalities throughout the entire course of cancer development.16

Epigenetic heterogeneity and tumor evolution

According to the clonal evolution theory of tumor cell populations, cancers evolve by an iterative process of clonal expansion, genetic diversification and clonal selection within adaptive microenvironments.104, 105 Genetic diversity is essential for tumor evolution. In addition, it is now thought that not only the genome but also the epigenome can contribute to tumor evolution and that the genome and epigenome are intertwined.106 For example, promoter DNA hypermethylation of DNA repair genes is known to cause genetic changes, and mutations of epigenetic modifiers can cause epigenetic disruptions.106 Although epigenetic modifications are enzymatically reversible, some epigenetic marks are retained through cancer progression and represent the history of the cancer cells (Figure 2).106, 107, 108 Epigenetic marks can also reflect the responsive potential of cancer cells to therapeutic treatment.106, 107, 108

Figure 2

A model of epigenetic memory in cancer cells. Cancers evolve by an iterative process of clonal expansion, genetic and epigenetic diversification and clonal selection within adaptive microenvironments. DNA methylation can be retained as an epigenetic memory of tumor evolution. Vertical line represents cancer treatments such as chemotherapy and radiation therapy. Small circles filled with black, white and gray represent DNA methylation pattern.

Profiling intratumoral heterogeneity is a powerful way to reconstruct tumor evolution, from tumor initiation through the subsequent stepwise development of cancer.107, 109 Recently, genome-wide profiling of intratumoral heterogeneity has been extended from the genome to the epigenome. Interestingly, intratumoral heterogeneity analyses in prostate cancers and gliomas have shown that the histories inferred from DNA methylation are remarkably similar to those obtained from copy-number changes or somatic mutations.108, 110, 111 These studies suggest that genetic and epigenetic mechanisms are widely co-dependent during tumor evolution.107

Epigenetic heterogeneity in cancer has clinical importance for cancer diagnosis and treatment.109 A rare population of cancer cells with unique epigenetic states can drive drug resistance.112 In addition, the degree of epigenetic heterogeneity has been associated with patient response to drug treatment.107 Epigenetic heterogeneity at the single-cell level may play a role in determining the responses of patients to therapies, and concurrent treatment with epigenetic drugs against chromatin regulators can improve anti-cancer drug responses.113, 114, 115
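One published way to quantify the degree of epigenetic heterogeneity from bisulfite reads is the proportion of discordant reads (PDR), where a read is discordant if the CpGs it covers are neither all methylated nor all unmethylated. The sketch below is a simplified, read-level version; the minimum of four CpGs per read is an assumed threshold, and the read encoding and function names are hypothetical.

```python
from typing import Sequence


def is_discordant(calls: Sequence[bool]) -> bool:
    """A read is discordant if it mixes methylated and unmethylated CpGs."""
    return any(calls) and not all(calls)


def proportion_discordant_reads(reads: Sequence[Sequence[bool]], min_cpgs: int = 4) -> float:
    """Fraction of eligible reads (covering >= min_cpgs CpGs) with discordant calls."""
    eligible = [r for r in reads if len(r) >= min_cpgs]
    if not eligible:
        raise ValueError("no reads cover enough CpGs")
    return sum(is_discordant(r) for r in eligible) / len(eligible)


# Hypothetical per-read CpG calls (True = methylated) at one locus
reads = [
    [True, True, True, True],      # concordant, fully methylated
    [False, False, False, False],  # concordant, unmethylated
    [True, False, True, True],     # discordant
    [True, False, True],           # too few CpGs, excluded
]
print(proportion_discordant_reads(reads))
```

Two loci with the same average methylation level can differ in PDR: reads that are each fully methylated or fully unmethylated give a PDR of 0, whereas reads with mixed patterns raise it.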

DNA methylation profiling of CUP

CUP are a molecularly heterogeneous group of cancers for which the primary site remains obscure after metastasis.116, 117 CUP accounts for ~3–9% of all cancer diagnoses, and it is the fourth most common cause of cancer-related deaths worldwide.117, 118 Overall median survival of CUP patients is 9 months, and only 25% survive for 1 year or more.119, 120 Identification of the primary tumor site and treatment with origin-selective therapy can improve the survival of CUP patients.117 Sophisticated imaging, immunohistochemical testing and molecular-profiling tools have been tested for the identification of primary sites in CUP cases.117

DNA methylation patterns are tumor-type specific, and methylation analysis has already been used successfully in the clinic for the pharmacogenetic management of gliomas.121, 122, 123, 124, 125 A recent attempt to diagnose the primary sites of CUP by using DNA methylation signatures (EPICUP) is a promising advance for CUP patients.126 EPICUP showed 99.6% specificity and 97.7% sensitivity in a validation set of 7691 tumors, and it predicted the tissue of origin in 188 (87%) of 216 CUP patients.126 These results suggest that DNA methylation, as an epigenetic memory of cancer cell origin, can be a useful biomarker for unmasking the primary tumor site of a CUP and could be applied clinically to the diagnosis and treatment of CUP patients.
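The performance figures above follow the usual definitions of sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). For a multiclass classifier such as EPICUP these are typically computed per tumor type in a one-vs-rest fashion; that aggregation is an assumption here, not a description of the cited analysis. In the sketch below the confusion-matrix counts are hypothetical, while the 188 of 216 figure is taken from the text.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical one-vs-rest counts for a single tumor type
print(round(sensitivity(tp=977, fn=23), 3))
print(round(specificity(tn=9960, fp=40), 3))

# Share of CUP cases with a predicted tissue of origin (counts from the text)
print(f"{188 / 216:.0%}")
```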

Future perspectives

Intratumoral heterogeneity plays a critical role in cancer drug resistance. As single-cell analysis technologies for the genome, epigenome, transcriptome and proteome become more widely available, they will make it possible to resolve such heterogeneity.127, 128, 129 Single-cell analysis of DNA methylation is technically difficult because bisulfite conversion is a relatively harsh treatment that randomly fragments DNA.130 Although single-cell genome-wide BS technologies have been developed, more clinically applicable DNA methylation analysis tools are needed for rare cell populations such as stem cells, immune cells and circulating tumor cells, as well as for cell-free DNA.130, 131, 132 Although DNA methylation is critical in mammalian development and disease progression, the direct function of DNA methylation at specific sites remains unclear. Recently developed technologies for targeted DNA methylation editing, such as dCas9-Dnmt3a/Tet1, will be useful for validating the function of site-specific DNA methylation in gene expression and cell-fate determination.133, 134, 135, 136, 137 The identification of the cell of origin is essential to stem cell biology and cancer research. As an epigenetic memory of cell origin, DNA methylation profiles will be useful in the development of regenerative medicine and of tumor-type-specific and patient-specific treatments.