Introduction

DNA methylation refers to the addition of a methyl group to the cytosine bases of DNA to form 5-methylcytosine. DNA methylation occurs in both prokaryotes and eukaryotes. In bacteria, DNA methylation distinguishes host genomic DNA from invading phage DNA, so that the phage DNA is cleaved by host restriction enzymes 1. DNA methylation is conserved in most major eukaryotic groups, including plants and many fungi and animals, although it has been lost in some organisms such as the budding yeast Saccharomyces cerevisiae and the nematode worm Caenorhabditis elegans 2. In plants, DNA methylation occurs in the CG, CHG, and CHH sequence contexts (H = A, C, or T). In mammals, DNA methylation is largely restricted to the symmetric CG context, although non-CG methylation is prevalent in embryonic stem (ES) cells 3, 4. In both mammals and plants, centromeric and pericentromeric regions, as well as other repetitive elements, are heavily methylated. Many genic regions are also highly methylated. In contrast, promoter regions mostly lack DNA methylation 5, 6.

Mammals have three active DNA methyltransferases: DNMT1, DNMT3A, and DNMT3B. DNMT1 maintains DNA methylation by methylating hemi-methylated DNA after replication during cell division, whereas DNMT3A and DNMT3B establish de novo DNA methylation 7. A third member of the DNMT3 family, DNMT3-like (DNMT3L), has no catalytic activity and functions as a regulator of DNMT3A and DNMT3B 8, 9. Mammalian DNMT2 is a tRNA methyltransferase rather than a DNA methyltransferase and has been renamed tRNA aspartic acid methyltransferase 1 10. In plants, three DNA methyltransferases have been characterized: MET1, CMT3, and DRM2 11, 12, 13. MET1, a homolog of mammalian DNMT1, is responsible for maintenance of symmetric CG methylation 14. CMT3 is a plant-specific DNA methyltransferase required for the maintenance of CHG methylation 12. DRM2 is responsible for de novo DNA methylation in all sequence contexts. Its role is most prominent for CHH methylation. Because CHH sites are asymmetric, the parental strand offers no complementary methylated cytosine to guide methylation of the newly synthesized strand, so CHH methylation cannot be maintained by copying and must rely entirely on de novo methylation. DRM2 activity is guided by the RNA-directed DNA methylation (RdDM) pathway 15, 16.
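The sequence contexts defined above (CG, CHG, and CHH, with H = A, C, or T) can be assigned mechanically to any cytosine. The following Python sketch is a minimal illustration, not a published tool. The function names and the context-to-enzyme table are hypothetical, and the table simply restates the plant assignments given in this section (MET1 for CG, CMT3 for CHG, DRM2 via RdDM for CHH). Only the strand passed in is scanned. Cytosines on the complementary strand would be found by scanning its reverse complement.

```python
from typing import Dict, Optional

# Hypothetical lookup restating the plant assignments described in the text.
PLANT_CONTEXT_ENZYME = {
    "CG": "MET1 (maintenance)",
    "CHG": "CMT3 (maintenance)",
    "CHH": "DRM2 via RdDM (de novo)",
}


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at position i of one strand.

    Returns None if position i is not a C, if the downstream bases run off the
    end of the sequence, or if they contain anything other than A, C, G, or T.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]  # the two bases 3' of the cytosine
    if not downstream or any(base not in "ACGT" for base in downstream):
        return None
    if downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # C followed by a single H at the end: context unresolved
    return "CHG" if downstream[1] == "G" else "CHH"


def classify_cytosines(seq: str) -> Dict[int, str]:
    """Map every classifiable cytosine position on this strand to its context."""
    contexts = {}
    for i in range(len(seq)):
        ctx = cytosine_context(seq, i)
        if ctx is not None:
            contexts[i] = ctx
    return contexts


contexts = classify_cytosines("ACGTCAGTCATTC")
for position, ctx in contexts.items():
    print(position, ctx, PLANT_CONTEXT_ENZYME[ctx])
```

For this example sequence, the cytosines at positions 1, 4, and 8 are reported as CG, CHG, and CHH. The final C is skipped because its downstream context is unknown.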

In plants, DNA demethylation depends on four bifunctional 5-methylcytosine DNA glycosylases: Repressor of silencing 1 (ROS1), Demeter (DME), DME-like 2 (DML2), and DML3. These enzymes remove the methylated base and cleave the DNA backbone at the resulting abasic site. A DNA polymerase then fills the gap and a DNA ligase seals it 17, 18, 19. ROS1 is ubiquitously expressed. It prevents transcriptional gene silencing (TGS) of transgenes and some endogenous genes and maintains some expression of transposons 17, 20, 21. DME is required for gene imprinting and genome-wide DNA demethylation in the central cell and endosperm 19, 22, 23, 24. In mammals, active demethylation contributes to the genome-wide erasure of DNA methylation marks observed in preimplantation embryos and primordial germ cells (PGCs). Active demethylation also takes place at specific loci in somatic cells under certain circumstances 25. However, the mechanisms underlying active demethylation in mammals remain highly controversial, and different mechanisms may operate in different contexts 26, 27.
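The excision-based sequence described above (base removal, backbone cleavage, gap filling, ligation) can be traced as a series of state changes at one site. This Python sketch is purely illustrative bookkeeping under simplifying assumptions. The token labels ("5mC", "AP", "gap", "C|") and the function name are hypothetical. The sketch models a single strand and omits any processing of the cleaved DNA ends before gap filling.

```python
from typing import List, Tuple


def excise_and_replace(strand: List[str], i: int) -> Tuple[List[str], List[str]]:
    """Trace simplified excision-based demethylation of the 5mC at index i.

    Returns the repaired strand and a log of the intermediate states.
    The input list is not modified.
    """
    if strand[i] != "5mC":
        raise ValueError(f"position {i} holds {strand[i]!r}, not 5mC")
    s = list(strand)  # work on a copy
    log: List[str] = []

    def record(step: str) -> None:
        log.append(f"{step}: {'-'.join(s)}")

    s[i] = "AP"   # glycosylase activity: release the 5mC base, leaving an abasic (AP) site
    record("glycosylase")
    s[i] = "gap"  # lyase activity of the same bifunctional enzyme: cut the backbone at the AP site
    record("lyase")
    s[i] = "C|"   # DNA polymerase: insert an unmethylated C; '|' marks the unsealed nick
    record("polymerase")
    s[i] = "C"    # DNA ligase: seal the nick
    record("ligase")
    return s, log


repaired, steps = excise_and_replace(["A", "5mC", "G", "T"], 1)
print("\n".join(steps))
```

The log makes the key point of the bifunctional mechanism visible. One enzyme accounts for the first two states, while the last two depend on separate repair enzymes.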

Recent studies have deepened our understanding of how DNA methylation is established and maintained, and of active DNA demethylation pathways in plants and animals. High-throughput sequencing and ChIP-on-chip technologies have been applied to profile DNA methylation at the whole-genome level 28. In this review, we discuss recent advances in DNA methylation and demethylation, the effects of the RNA interference pathway and histone modifications on DNA methylation, and the diverse functions of DNA methylation.

De novo DNA methylation

RNA-directed DNA methylation in plants

In the Arabidopsis genome, DNA methylation is highly concentrated in centromeric regions and in repetitive sequences throughout the genome 5, 29. Methylated DNA loci are frequently accompanied by siRNAs. Approximately one-third of methylated loci are rich in siRNAs, supporting an important role for siRNAs in DNA methylation 5, 30. Whole-genome tiling array studies in Arabidopsis suggested that most non-coding genomic sequences are capable of generating RNA transcripts. Recent studies suggest that both siRNAs and long non-coding RNAs are involved in de novo DNA methylation 31.

RdDM is a conserved de novo DNA methylation mechanism in plants. The phenomenon was first observed in transgenic plants carrying viroid cDNA: the viroid sequences integrated in the plant genome became methylated when viroid RNA replicated 32, indicating that RNA can direct DNA methylation. RdDM was subsequently recognized as a general transcriptional silencing mechanism in plants. It is involved in various epigenetic phenomena, including transgene silencing, transposon suppression, gene imprinting, and paramutation 33, 34, 35, 36.

Many components of the RdDM pathway have recently been identified (Figure 1). The pathway involves two plant-specific, atypical DNA-dependent RNA polymerases whose largest subunits, NRPD1 and NRPE1, were identified independently by different groups 31, 34, 37, 38, 39. NRPD1 and NRPE1 are similar to the largest subunits of canonical DNA-dependent RNA polymerases but function specifically in de novo DNA methylation and transcriptional gene silencing. Further subunits and properties of these two polymerases have been uncovered by forward genetic screens and affinity purification 40, 41, 42, 43. Like DNA-dependent RNA polymerase II (Pol II), the two plant-specific polymerases are multi-subunit protein complexes. Following the nomenclature of DNA-dependent RNA polymerases, they have been named Pol IV and Pol V 42, 43. Pol IV and Pol V each have specific subunits, but they share some subunits with each other and others with Pol II 42.

Figure 1

The RNA-directed DNA methylation pathway in plants. In transposons and other DNA repeat regions, aberrant single-stranded RNAs are proposed to be produced by DNA-dependent RNA polymerase IV (Pol IV). The chromatin remodeling protein CLSY may facilitate Pol IV transcription. RNA-dependent RNA polymerase RDR2 converts the aberrant single-stranded RNAs to double-stranded RNAs, which are then cleaved into 24-nt siRNAs by the Dicer-like protein DCL3. The 24-nt siRNAs are bound by an ARGONAUTE protein AGO4, AGO6, or AGO9. In intergenic non-coding (IGN) regions, DNA-dependent RNA polymerase V (Pol V) generates single-stranded scaffold RNA transcripts. Generation of Pol V RNA transcripts requires RDM4/DMS4, DRD1, DMS3, and RDM1. RDM1 may bind single-stranded methylated DNA and help recruit Pol V and Pol II to appropriate chromatin regions. DRD1, DMS3, and RDM1 form a stable protein complex, named DDR. KTF1 is an RNA-binding protein, which tethers AGO4 to nascent Pol V or Pol II RNA transcripts to form the RNA-directed DNA methylation effector complex. IDN2 may stabilize the base-pairing between the nascent scaffold transcripts and 24-nt siRNAs. The effector complex directs the de novo DNA methyltransferase DRM2 to specific chromatin regions to catalyze new DNA methylation.

Pol IV is proposed to synthesize RNA transcripts using RdDM target loci as templates (Figure 1). The RNA-dependent RNA polymerase RDR2 converts these single-stranded RNAs into double-stranded RNAs 33. The Dicer-like protein DCL3 then cleaves the double-stranded RNAs into 24-nt primary siRNAs 33. siRNA maturation requires HUA ENHANCER 1 (HEN1), a conserved S-adenosyl-L-methionine-dependent RNA methyltransferase that methylates the 2′-OH group of the 3′-terminal nucleotide of siRNAs 44, 45. In addition to siRNAs, RdDM requires non-coding RNA transcripts produced by Pol V (Figure 1) 31. The siRNAs are loaded onto the ARGONAUTE proteins AGO4 and/or AGO6 16, 46, 47. KTF1 binds the nascent non-coding transcripts produced by Pol V. AGO4 is held on these transcripts by base pairing between the transcripts and the AGO4-bound siRNAs 48, 49. The interaction between AGO4 and the WG/GW motifs of KTF1 reinforces the association between siRNAs and Pol V transcripts. Together they form a functional RdDM effector complex that directs the de novo DNA methyltransferase DRM2 to genomic loci being transcribed by Pol V 31, 47, 48, 50. Moreover, AGO4 has slicer activity that cleaves the scaffold non-coding transcripts. RDR2 may copy the cleaved RNA fragments, and DCL3 may then process them into 24-nt secondary siRNAs. IDN2, a recently identified RdDM component, is an SGS3-like protein containing XS and XH domains separated by a coiled-coil region 51, 52. The XS domain of IDN2 binds double-stranded RNA in vitro 51, suggesting a role in siRNA production or in the RdDM effector complex. RDM1 was recently identified in forward genetic screens for Arabidopsis mutants with reduced TGS 53. RDM1 functions together with AGO4 and DRM2 as part of the RdDM effector complex. RDM1 appears to bind single-stranded methylated DNA and may contribute to recruiting the RdDM effector complex to methylated DNA 53. Such recruitment helps to explain how siRNA production and amplification occur preferentially at methylated loci. It also supports the notion of a self-reinforcing feedback loop between siRNA generation and DNA methylation.
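The base pairing that holds AGO4-bound siRNAs on scaffold transcripts can be illustrated with a toy search for scaffold sites that are perfectly complementary to a 24-nt siRNA. This Python sketch assumes strict Watson-Crick pairing over the full siRNA length. Real RNA duplexes can also accommodate mismatches and G:U wobble pairs, which the sketch ignores. The function names and sequences are hypothetical.

```python
from typing import List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sirna_target_sites(sirna: str, scaffold: str) -> List[int]:
    """0-based start positions in `scaffold` (5'->3') that can pair with `sirna`.

    Pairing is antiparallel, so the matching scaffold segment equals the
    reverse complement of the siRNA. Only perfect Watson-Crick matches count.
    """
    target = reverse_complement_rna(sirna)
    scaffold = scaffold.upper()
    width = len(target)
    return [
        i
        for i in range(len(scaffold) - width + 1)
        if scaffold[i : i + width] == target
    ]


sirna = "AAAACCCCGGGGUUUUACGUACGU"  # hypothetical 24-nt siRNA
scaffold = "GG" + reverse_complement_rna(sirna) + "CC"  # hypothetical scaffold transcript
print(find_sirna_target_sites(sirna, scaffold))
```

Here the only site is at position 2, immediately after the two padding nucleotides. Comparing against the reverse complement encodes the antiparallel orientation of the duplex directly.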

In eukaryotes, DNA-dependent RNA polymerases are generally composed of 12 subunits, some of which are shared by Pol I, II, and III 54. Like RNA polymerases I, II, and III, the plant-specific Pol IV and Pol V are multi-subunit complexes. Their subunits are paralogous or identical to the 12 subunits of Pol II 34, 37, 38, 39, 40. Genetic and biochemical analyses have demonstrated the functions of the subunits NRPD/E2, NRPD/E4, and NRPE5, as well as of the largest subunits NRPD1 and NRPE1 34, 37, 38, 39, 40, 41, 42. NRPD/E2 and NRPD/E4 are common to Pol IV and Pol V 37, 40, whereas NRPE5 functions exclusively in Pol V 41, 42. The lack of genetic evidence for the functions of the other subunits suggests that they are functionally redundant or non-essential for RdDM. Pol IV and Pol V thus appear to have evolved from the general RNA polymerases to function specifically in producing non-coding RNAs for RdDM and TGS.

The transcriptional activity of Pol II is tightly regulated by a variety of general transcription factors during initiation and elongation 55, 56. Whereas transcriptional regulation of eukaryotic Pol II has been extensively studied, little is known about how Pol IV and Pol V are regulated. KTF1, an SPT5-like elongation factor, was characterized as a scaffold protein in the RdDM effector complex 48 and was also identified as a Pol V-interacting component 43. RDM4/DMS4, a homolog of the yeast transcription factor IWR1, was identified in two different genetic screens 57, 58. RDM4/DMS4 interacts with Pol V and is required for the generation of Pol V-dependent transcripts. These results suggest that transcription factors such as KTF1 and RDM4/DMS4 regulate the polymerase activity of Pol IV and Pol V. However, KTF1 is dispensable for the generation of Pol V-dependent non-coding transcripts. These transcripts apparently over-accumulate in ktf1 and ago4 mutants 48, 49, indicating that KTF1 and AGO4 act downstream of Pol V transcription. The C-terminal region of KTF1 contains more than 40 WG/GW repeats that bind AGO4 48, 59. KTF1 also binds RNA transcripts, which may be produced by Pol V and/or Pol II. Through these dual functions, KTF1 helps recruit AGO4 to the scaffold transcripts. This forms a functional effector complex that directs the DNA methyltransferase DRM2 to RdDM target loci 48, 49.

Non-coding RNAs produced by RNA Pol II are widespread in plants and animals. Zheng et al. 60 found that mutation of the second largest subunit of Pol II leads to loss of DNA methylation and TGS at specific genomic loci. The Pol II-dependent non-coding RNAs physically associate with AGO4, resulting in DNA methylation and TGS. Moreover, Pol II-dependent transcripts may also help recruit Pol IV and Pol V to promote siRNA biogenesis and function 60. Interestingly, RDM1 interacts and co-localizes with Pol II, but it is also required for the generation of Pol V-dependent transcripts 53. RDM1 may bind single-stranded DNA at sites of Pol II transcription, and be involved in recruiting Pol IV, Pol V, and other RdDM components. Further studies are required to clarify the functions of Pol II, Pol IV, and Pol V in the RdDM pathway.

Two other RdDM components, defective in RNA-directed DNA methylation 1 (DRD1) and defective in meristem silencing 3 (DMS3), were identified in forward genetic screens 61, 62. DRD1 is an SNF2-like chromatin remodeler, and DMS3 contains a domain similar to the hinge region of structural maintenance of chromosomes (SMC) proteins. Both proteins are required for the production of Pol V-dependent transcripts and for the association of Pol V with chromatin 16, 31, 63. Affinity purification recently identified a protein complex containing DRD1, DMS3, and RDM1 (DDR; Figure 1) 63. Pol V components are enriched in the DDR complex, suggesting that the complex may help recruit Pol V to specific chromatin regions 63. Another SNF2-like protein, CLSY1, functions together with Pol IV and RDR2. It is required for the production of 24-nt siRNAs and for the spreading of TGS 64.

De novo DNA methylation in animals

In mammals, DNA methylation patterns are set up early in development through a highly orchestrated process that involves genome-wide demethylation and de novo methylation. During preimplantation development, DNA methylation marks inherited from gametes are largely erased. On implantation, the embryo undergoes a wave of de novo methylation that establishes a new methylation pattern, which is copied during division of somatic cells 65, 66, 67. Genetic and biochemical evidence indicates that DNMT3A and DNMT3B are mainly responsible for de novo methylation, and DNMT1 is the major enzyme for maintaining DNA methylation patterns 68, 69, 70, 71, 72, 73, 74.

Another developmental stage that exhibits substantial de novo DNA methylation in mammals is gametogenesis. DNA methylation in both male and female germ cells plays a critical role in the establishment of genomic imprinting. Genomic imprinting is an epigenetic process that marks alleles according to their parental origin and results in monoallelic expression of a small subset of genes 75. Genetic studies demonstrate that DNMT3A and DNMT3L are essential for setting up DNA methylation imprints in germ cells 8, 9, 76. Although DNMT3L has no enzymatic activity, it has been shown to interact with DNMT3A and stimulate its activity 9, 77, 78, 79, 80. A more recent study showed that DNMT3L binds the N-terminal tail of histone H3, suggesting a role for DNMT3L in recruiting DNMT3A to specific genomic regions 81. Interestingly, the interaction between DNMT3L and histone H3 is inhibited by methylation at H3K4, indicating that H3K4 methylation may regulate germline imprinting 81. Consistent with this notion, Ciccone et al. 82 demonstrated that KDM1B (also known as AOF1 and LSD2), a histone H3K4 demethylase highly expressed in growing oocytes, is required for de novo DNA methylation of maternally imprinted genes (Figure 2). Active transcription across imprinting control regions also appears to be required for the establishment of DNA methylation imprints in female germ cells 83. It is possible that removal of H3K4 methylation and active transcription at imprinted loci create or maintain a chromatin state that facilitates the access of the DNMT3A-DNMT3L complex to these loci.

Figure 2

De novo DNA methylation in the mammalian germline. Two Piwi proteins, MILI and MIWI2, are required for piRNA (Piwi-interacting RNA) generation. The piRNAs are generated in fetal male gonads and play important roles in silencing transposons by inducing DNA methylation. Primary piRNAs are bound by cytoplasmic MILI, which cleaves antisense transposon transcripts. Secondary piRNAs are bound by MIWI2, which cleaves sense transposon transcripts. Sense transposon transcripts produce primary piRNAs with a 5′ uridine (U), whereas antisense transposon transcripts produce secondary piRNAs with an adenine (A) at position 10. Generation of piRNAs also requires Tudor domain-containing (TDRD) proteins, mouse VASA homolog (MVH), and the putative DExD-box helicase MOV10L1. The interaction between Piwi proteins and TDRD proteins is essential for piRNA generation. TDRD9-MIWI2 and TDRD1-MILI are two conserved complexes that generate primary and secondary piRNAs, respectively. MVH and MOV10L1 are required for the activity of both MILI and MIWI2 in piRNA generation and de novo DNA methylation. DNMT3L interacts with unmethylated H3K4 and recruits DNMT3A to specific genomic regions for DNA methylation. The histone H3K4 demethylase KDM1B catalyzes demethylation of H3K4, thereby promoting de novo DNA methylation.
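The 5′ U / position-10 A relationship in the caption follows from how partner piRNAs overlap. This sketch assumes the widely used ping-pong model, which the caption itself does not state. In that model, a Piwi-bound piRNA cleaves a complementary transcript opposite its nucleotides 10 and 11. The 5′ end of the resulting piRNA is therefore complementary to the first 10 nt of the guide, and its position 10 pairs with the guide's 5′ U. The Python sketch below computes that predicted 10-nt 5′ end. The function names and the example guide sequence are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def predicted_partner_5prime(guide: str, overlap: int = 10) -> str:
    """Predicted 5' end of the piRNA produced when `guide` cleaves its target.

    Assumes cleavage opposite guide positions 10/11, so the new piRNA's first
    `overlap` nucleotides are the reverse complement of the guide's first `overlap`.
    """
    if len(guide) < overlap:
        raise ValueError("guide is shorter than the overlap")
    return reverse_complement_rna(guide[:overlap])


guide = "UGACCAGUAACGUUAGCAUGCAUGCA"  # hypothetical 26-nt piRNA with a 5' U
partner = predicted_partner_5prime(guide)
print(partner, partner[9])  # position 10 (index 9) pairs with the guide's 5' U
```

Because the last base of the reverse complement always pairs with the guide's first base, any guide beginning with U yields a partner with A at position 10.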

DNA methylation in mammalian germ cells is also essential for suppression of transposons. DNMT3L-deficient male mice, in addition to exhibiting imprinting defects, fail to establish de novo methylation of transposons and, as a result, show uncontrolled transposon expression and spermatogenesis failure 84, 85. This phenotype is remarkably similar to that of mice deficient for the Piwi family members MILI or MIWI2, suggesting that the Piwi-piRNA complexes and DNA methylation machinery act together to suppress transposons in germ cells 86, 87, 88.

The Piwi proteins belong to the ARGONAUTE superfamily. In contrast to the Ago subfamily of ARGONAUTE proteins, whose members associate with microRNAs or siRNAs, Piwi subfamily members bind piRNAs (Figure 2). Tens of thousands of piRNA species, typically 24-32 nucleotides long, have been found in mammals, zebrafish, and Drosophila, but not in plants. A considerable proportion of piRNAs map to transposon-encoding regions. The expression of Piwi proteins and piRNAs is mainly restricted to the germline, and the Piwi-piRNA pathway has been implicated in a variety of germline functions 89, 90, 91. In mice, two Piwi proteins, MILI and MIWI2, are involved in piRNA generation in fetal male gonads. These piRNAs play important roles in silencing retrotransposons via DNA methylation (Figure 2) 86, 88, 89. Genetic studies indicate that, in male germ cells, MILI or MIWI2 deficiency impairs de novo methylation of retrotransposons, whereas DNMT3L deficiency leaves the piRNA pathway largely intact 92, 93. These results suggest that the Piwi-piRNA pathway acts upstream of DNA methylation. piRNA-containing complexes are proposed to guide the de novo DNA methylation machinery to transposon sequences. This role is analogous to that of AGO4/6-siRNA complexes in RdDM and transposon suppression. No direct interaction between Piwi proteins and DNMT3 proteins was detected 93, suggesting that Piwi proteins may recruit DNMT3 proteins indirectly.

Tudor domain-containing proteins (TDRDs) have been shown to participate in the Piwi pathway to suppress retrotransposons via DNA methylation 94, 95, 96, 97, 98. TDRD1 associates with MILI, whereas TDRD9 associates with MIWI2 (Figure 2) 95. Dimethylation of arginines in the N-terminal region of MILI is essential for its interaction with TDRD1. A global mass spectrometry study suggested that every mouse Piwi protein complex contains specific TDRD proteins 95. These TDRD-Piwi associations are essential for retrotransposon repression. MVH is a homolog of Drosophila VASA, a germ cell-specific DEAD-box RNA helicase 99. Like Piwi and TDRD proteins, MVH is expressed in male germ cells from early embryogenesis until around the time of spermatid development 100. The spermatogenesis defect in MVH-deficient mice resembles that in MILI-deficient mice 86. MVH interacts with both MILI and MIWI and plays essential roles in piRNA production and subsequent DNA methylation (Figure 2) 101. Another putative DExD-box RNA helicase, MOV10L1, is also essential for the piRNA-dependent DNA methylation pathway 102, 103. Loss of MOV10L1 in mice activates LTR and long interspersed nuclear element-1 (LINE-1) retrotransposons. This is followed by cell death, complete blockage of spermatogenesis, and male infertility 103. MOV10L1 is required for piRNA biogenesis and for loading piRNAs onto MILI and MIWI2 in mammalian male germ cells (Figure 2) 102. These results indicate that the requirement for small RNAs and ARGONAUTE-family proteins in de novo DNA methylation is conserved between mammalian male germ cells and plant RdDM.

Active DNA demethylation in plants and animals

Although DNA methylation can be established and maintained, it can also be removed in plants and animals. When DNA methylation pathways, particularly maintenance methylation, are inactive, DNA methylation is diluted with each round of DNA replication, leading to passive DNA demethylation. In other cases, DNA methylation is removed through active DNA demethylation pathways. Passive and active DNA demethylation may act together to reduce DNA methylation during specific developmental stages. We now review active DNA demethylation mechanisms.
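Passive dilution can be made concrete with a simple expected-value model. For illustration only, assume no de novo methylation and no active demethylation. Let e be the fraction of hemimethylated sites whose new strand is methylated by maintenance after each replication. Every parental strand is retained and templates one new strand, and only new strands paired with methylated templates can become methylated. The methylated-strand fraction therefore obeys m(n+1) = m(n) × (1 + e) / 2, so m(n) = m(0) × ((1 + e) / 2)^n. With no maintenance (e = 0), methylation halves with every replication. With perfect maintenance (e = 1), it stays constant. The function name and parameters below are hypothetical.

```python
def methylated_strand_fraction(
    n_replications: int,
    maintenance_efficiency: float = 0.0,
    initial_fraction: float = 1.0,
) -> float:
    """Expected fraction of methylated strands after n rounds of replication.

    Model assumptions (illustrative only): no de novo methylation, no active
    demethylation, and each new strand opposite a methylated parental strand
    is methylated with probability `maintenance_efficiency`.
    """
    if n_replications < 0:
        raise ValueError("n_replications must be non-negative")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    fraction = initial_fraction
    for _ in range(n_replications):
        # Strand count doubles; methylated strands = old methylated + maintained copies.
        fraction *= (1.0 + maintenance_efficiency) / 2.0
    return fraction


for n in range(4):
    print(n, methylated_strand_fraction(n))
```

Without maintenance, the printed fractions fall from 1.0 to 0.5, 0.25, and 0.125. This geometric decay is the signature of purely passive demethylation.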

In plants, active DNA demethylation was first discovered using a transgene system. Our previous research showed that ROS1 counteracts the DNA methylation pathway to prevent gene silencing in plants 20. In the wild-type (WT) genetic background, the RD29A-LUC transgene is transcribed in response to appropriate environmental cues 104. Mutations in ROS1 cause hypermethylation of the RD29A promoter in the RD29A-LUC transgene, leading to silencing of the transgene and of its homologous endogenous gene (Figure 3A). ROS1 is also required to suppress DNA methylation at a number of other endogenous genomic loci, including many transposons 21, 105. A screen for second-site suppressors of ros1 further confirmed the DNA demethylation function of ROS1. The screen recovered most known RdDM components, including NRPD1, NRPE1, NRPD2, AGO4, DRD1, HEN1, and DMS3. It also identified several previously uncharacterized RdDM components, including RDM1, RDM2/NRPD4, RDM3/KTF1, and RDM4 40, 48, 53, 57. In the ros1 background, mutation of each of these RdDM components reduces DNA methylation at the transgenic RD29A promoter and at its homologous endogenous gene (Figure 3A). These results strongly suggest that ROS1 counteracts de novo DNA methylation of transgenes by actively promoting DNA demethylation.

Figure 3

Active DNA demethylation and its function in plants. The plant 5-methylcytosine DNA glycosylases ROS1, DME, DML2, and DML3 function as active DNA demethylases. (A) ROS1 was identified in a screen for repressors of silencing in Arabidopsis plants expressing an RD29A promoter-driven luciferase reporter gene. ROS1 prevents transgene silencing caused by RNA-directed DNA methylation. ROS1 also prevents hypermethylation and alleviates silencing of some endogenous genes and transposons. ROS3 is an RNA-binding protein that may direct ROS1 to specific genome targets. (B) DME is preferentially expressed in endosperms and is responsible for genome-wide DNA demethylation and gene imprinting. Genome-wide DNA demethylation activates transposons and other repetitive DNA sequences, enhancing siRNA production in endosperms. These siRNAs might be transported into embryos and contribute to DNA hypermethylation, ensuring genome stability in embryos. Black and white circles represent methylated and unmethylated cytosines, respectively.

ROS1 encodes a nuclear 5-methylcytosine DNA glycosylase/demethylase that contains a C-terminal DNA glycosylase domain and an N-terminal histone H1-like basic region 17, 20. ROS1 belongs to a small family of four DNA glycosylases that also includes DME, DML2, and DML3 25, 106. DNA glycosylases function in DNA repair by excising damaged or mismatched bases. The bifunctional DNA glycosylase ROS1 not only removes the 5-methylcytosine base from the deoxyribose but also cleaves the DNA backbone at the resulting abasic site 17, 18, 20. Overexpression of ROS1 in transgenic plants reduces cytosine methylation and increases expression of target genes 17. Another Arabidopsis DNA glycosylase, DME, is preferentially expressed in the central cell of the female gametophyte and in the endosperm of developing seeds. It is required for gene imprinting in the endosperm 19, 24. DME also has 5-methylcytosine glycosylase activity, which is required for the reduced methylation and expression of maternal alleles at imprinted loci 18, 19. Recent studies showed that DME is responsible for genome-wide DNA demethylation in the endosperm (Figure 3B) 23, 24. These results demonstrate that ROS1 and DME are 5-methylcytosine DNA glycosylases that function as DNA demethylases in Arabidopsis. DML2 and DML3 belong to the same family as ROS1 and DME. Whole-genome DNA methylation analyses of ros1 dml2 dml3 triple mutants showed that hundreds of endogenous loci are hypermethylated, suggesting that these demethylases act throughout the genome 30, 106. Most hypermethylated loci are located in promoters and 3′-UTRs. This distribution suggests that ROS1, DML2, and DML3 might preferentially demethylate the 5′ and 3′ ends of genes rather than gene bodies 106. Besides ros1, our forward genetic screen identified another repressor-of-silencing mutant, ros3 107. ROS3 also counteracts DNA methylation by RdDM components. ROS3 is an essential regulator of DNA demethylation and acts in the same genetic pathway as ROS1 to prevent DNA hypermethylation (Figure 3A). ROS3 and ROS1 co-localize in discrete foci dispersed throughout the nucleus. ROS3 is capable of binding single-stranded RNAs, which might guide ROS1-mediated DNA demethylation 107.

ROS1-mediated active demethylation and the RdDM pathway have opposing functions but are interdependent in an interesting way. Mutations in RdDM components significantly reduce ROS1 transcript levels 105, 108, 109, suggesting a feedback mechanism that regulates DNA methylation. Reduced methylation at some genomic loci in RdDM mutants appears to lower ROS1 expression, which may protect the genome from further demethylation. Central cell-specific expression of DME causes genome-wide DNA demethylation in the endosperm relative to the embryo. However, non-CG methylation is lower in dme mutant endosperms than in WT endosperms 23, 24. This result suggests that DME indirectly promotes non-CG DNA methylation, possibly by upregulating RdDM components. Moreover, in the RdDM mutants nrpd1 and nrpe1, DNA methylation increases at specific genomic loci but decreases at many repetitive DNA sequences 110. The increased methylation at certain loci in these mutants may be due to downregulation of ROS1. Together, these results indicate that genomic DNA methylation in Arabidopsis is dynamically regulated by both active DNA demethylation and DNA methylation mechanisms.

Compared with the compelling evidence for active DNA demethylation in plants, the evidence for active DNA demethylation mechanisms in animals is controversial. No orthologs of ROS1 or DME have been found in mammals 25. Global DNA demethylation occurs at two stages of embryogenesis in mammals. The first is demethylation of the paternal genome in zygotes. The second is demethylation in PGCs from embryonic day 10.5 (E10.5) to E12.5 (Figure 4) 27. After fertilization, the paternal genome in zygotes undergoes active DNA demethylation and remains demethylated until implantation of the blastocyst. In contrast, the maternal genome undergoes passive demethylation because DNMT1 is excluded from the nucleus 111. Imprinted genes are resistant to this wave of DNA demethylation. Several proteins, including DNMT1, ZFP57, and PGC7, are required for maintaining DNA methylation imprints or protecting them from demethylation (Figure 4) 112, 113, 114, 115, 116. PGCs appear at E7.5, begin to migrate at E8.5, and arrive at the genital ridge at E11.5. During this migration, global active DNA demethylation occurs in the presence of DNMT1 117. Besides the global DNA demethylation described above, sequence-specific DNA demethylation also occurs in somatic cells in response to various signals 118, 119, 120, 121.

Figure 4

Dynamic changes in DNA methylation during mouse development. Shortly after fertilization, the paternal genome undergoes active demethylation, and the maternal genome is passively demethylated during subsequent cleavage divisions. After implantation, the embryo undergoes de novo methylation that establishes a new methylation pattern. Imprinted genes escape the waves of demethylation and de novo methylation during embryogenesis. Genome-wide demethylation and de novo methylation also occur in the male and female germ cells during gametogenesis, which are critical for the establishment of genomic imprinting. DNMTs and other regulatory factors involved in these processes are indicated.

Glycosylase-dependent DNA demethylation was first proposed in animals 122, 123. 5-Methylcytosine glycosylase activity was initially detected in chicken embryo extracts, which contain thymine-DNA glycosylase (TDG). However, TDG is much less active against 5-methylcytosine than against mismatched thymine 124. Other components of the chicken embryo extracts may therefore be involved in active DNA demethylation. In addition to TDG, the methyl-CpG-binding domain protein MBD4 also has DNA glycosylase activity. As with TDG, the 5-methylcytosine glycosylase activity of MBD4 is much lower than its activity against mismatched thymine 125. Recently, MBD4 was reported to carry out active DNA demethylation at the promoter of CYP27B1 126. Thus, TDG and MBD4 might require additional proteins to carry out DNA demethylation in vivo.

Activation-induced cytidine deaminase (AID) and apolipoprotein B mRNA-editing enzyme (APOBEC) can deaminate 5-methylcytosine to thymine in vitro 127. A study of zebrafish embryos suggested that AID, MBD4, and the DNA repair-related protein GADD45A cooperate to promote DNA demethylation, which is required for the development of zebrafish embryos 128. In this model, AID deaminates 5-methylcytosine to thymine, and MBD4 then excises the thymine from the resulting T/G mismatch. GADD45A might help coordinate the functions of AID and MBD4. A recent study suggests that AID deficiency interferes with genome-wide erasure of DNA methylation marks in mouse PGCs 129. However, no clear developmental phenotypes were found in AID- or MBD4-deficient mice 130, 131, 132, possibly because of functional redundancy.

DNMT3A and DNMT3B have been shown to deaminate 5-methylcytosine in vitro, although they are commonly known as DNA methyltransferases 121. The 5-methylcytosine deamination function of DNMTs may occur when S-adenosylmethionine (SAM) concentrations are very low. DNMT3A/DNMT3B associate with TDG/MBD4, and this interaction stimulates the glycosylase activity, allowing for the repair of the T/G mismatch 133, 134. The interaction between DNMTs and TDG/MBD4 facilitates active DNA demethylation by coupling the functions of 5-methylcytosine deaminase and thymine glycosylase. Although this mechanism has been shown to demethylate the pS2/TFF1 gene promoter upon activation by estrogens 121, its significance in mediating demethylation in other contexts remains to be determined.
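The deamination-coupled route proposed above differs from the plant glycosylase route in its intermediate. Deamination converts a 5mC:G pair into a T:G mismatch, which is why the proposals pair a deaminase with a thymine glycosylase (TDG or MBD4). This Python sketch traces the proposed base-pair states at one site. It is a hypothetical illustration of the model described in the text, not evidence that the pathway operates in vivo, and the state labels are assumptions.

```python
from typing import List, Tuple

BasePair = Tuple[str, str]  # (base on the methylated strand, base on the partner strand)


def deamination_coupled_route(pair: BasePair) -> List[Tuple[str, BasePair]]:
    """Return (step, resulting pair) for the proposed route at one 5mC:G site."""
    if pair != ("5mC", "G"):
        raise ValueError("this sketch starts from a 5mC:G pair")
    return [
        ("start", pair),
        # Deaminase (AID/APOBEC, or DNMT3A/DNMT3B at low SAM) converts 5mC to T.
        ("deamination", ("T", "G")),
        # Thymine glycosylase (TDG or MBD4) removes the mismatched T, leaving an AP site.
        ("glycosylase", ("AP", "G")),
        # Repair synthesis and ligation restore an unmethylated C opposite G.
        ("repair", ("C", "G")),
    ]


for step, (top, bottom) in deamination_coupled_route(("5mC", "G")):
    print(f"{step:>11}: {top}:{bottom}")
```

The route returns a plain list of labeled states rather than mutating a sequence. This keeps the one feature that distinguishes it, the T:G mismatch, explicit in the output.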

Consistent with the DNA repair-driven demethylation hypothesis, Hajkova et al. 135 recently showed that components of the base excision repair (BER) pathway (i.e., Parp1, Ape1, and Xrcc1) are upregulated in mouse PGCs and zygotes at the time points when active DNA demethylation takes place. It remains to be determined what triggers BER. Because AID, APOBEC, and DNMT3A/DNMT3B are not expressed in E11.5 PGCs, the authors suggest that deamination of 5-methylcytosine may not play a major role in PGCs 135. Another possible DNA demethylation mechanism in mammals depends on oxidation of the 5-methyl group of cytosine 136. Trypanosome base J, a modified thymine (β-D-glucosyl hydroxymethyluracil), is produced by sequential hydroxylation and glucosylation of the methyl group of thymine 27, 137. This process requires JBP1 and JBP2, which belong to the 2-oxoglutarate (2OG)- and Fe(II)-dependent oxygenase family 138, 139. A computational search for homologs of JBP1 and JBP2 identified TET1, a mammalian enzyme that has since been shown to catalyze the conversion of 5-methylcytosine to 5-hydroxymethylcytosine 136. A recent study showed that TET1 is involved in maintaining Nanog expression in ES cells 140. Downregulation of Nanog after TET1 knockdown correlates with hypermethylation of the Nanog promoter, suggesting that TET1 has an important role in DNA demethylation. 5-Hydroxymethylcytosine may act as an intermediate that is ultimately replaced by unmethylated cytosine through DNA repair pathways 27, 140. TET1 is highly expressed in preimplantation embryos and PGCs, and TET1 depletion in preimplantation embryos favors cell specification toward the trophectoderm lineage 135, 140. It would be interesting to determine whether TET1 plays a major role in the genome-wide erasure of DNA methylation marks in PGCs and zygotes.

Recently, Okada et al. 141 provided evidence that the ELP3-containing elongator complex is involved in demethylating the paternal genome in mouse zygotes. Human ELP3 contains a histone acetyltransferase (HAT) domain and a radical SAM domain. Knockdown of ELP3, as well as of ELP1 or ELP4, impairs paternal DNA demethylation in zygotes. Importantly, the radical SAM domain of ELP3, but not its HAT domain, is required for paternal DNA demethylation, suggesting that demethylation is mediated through a reaction that depends on an intact radical SAM domain 141. Alternatively, the role of the elongator complex in active DNA demethylation may be indirect.

Interplay between histone modifications and DNA methylation

Both DNA methylation and histone modifications are important epigenetic marks for gene regulation. Histone H3 lysine 9 dimethylation (H3K9me2) is an important histone modification for TGS. In Arabidopsis, H3K9me2 is catalyzed by the histone methyltransferase SUVH4/KYP, which is also required for maintenance of non-CG methylation 142. SUVH4 belongs to a protein family whose members contain a SET- and RING-associated (SRA) domain and a SET domain. The SRA domain of SUVH4 can directly bind methylated DNA, suggesting that DNA methylation is required for the recruitment of SUVH4. Conversely, the non-CG DNA methyltransferase CMT3 can directly interact with the N-terminal tail of histone H3, but only when the tail is simultaneously methylated at both H3K9 and H3K27 143. These results suggest that histone methylation at H3K9 and H3K27, catalyzed by SUVH4 and an unknown enzyme, respectively, provides a histone code that recruits CMT3 to methylate DNA at these loci. They also suggest a self-reinforcing loop between histone H3K9 methylation and DNA methylation 144.

VIM1 is a methylcytosine-binding protein that binds methylated cytosine at CG and CHG sites via its SRA domain 145. The Arabidopsis vim1 mutation causes centromeric DNA hypomethylation and centromeric heterochromatin decondensation in interphase. Moreover, the localization of the centromere-specific histone H3 variant HTR12 is altered in the vim1 mutant. VIM1, a member of a small family of proteins containing PHD, RING, and SRA domains 145, 146, is similar to the mammalian protein ubiquitin-like, containing PHD and RING finger domains 1 (UHRF1). Recent studies demonstrated that UHRF1 is required for maintenance of CG DNA methylation 147, 148. Like UHRF1 in mammals, VIM1 may be required for recruiting the maintenance DNA methyltransferase DNMT1/MET1 to hemimethylated DNA after DNA replication. VIM1 associates with histone proteins 145, suggesting that it serves as an interface between histones and the DNA methylation machinery. In Arabidopsis, the vim1vim2vim3-triple mutant is late flowering, which is associated with hypomethylation at CG sites in the 5′ region of the flowering repressor FWA and release of FWA gene silencing 146. Importantly, although the vim1vim3-double mutant showed decreased CG methylation in the 5S rRNA genes, it gained ectopic CHH methylation 146. A similar phenomenon was found in the met1 and ddm1 mutants, in which some specific genomic loci become hypermethylated, especially at CHH sites, through the RdDM pathway 149, 150. Thus, CG hypomethylation caused by mutation of MET1, DDM1, or VIMs can lead to increased CHH methylation. The work of Mathieu et al. 109 suggested that this increase in CHH methylation is caused by RdDM combined with a lack of active DNA demethylation due to downregulation of ROS1 expression.

De novo DNA methylation in plants is established by DRM2, a homolog of animal DNMT3. In plants, targeting of DRM2 relies on the RdDM pathway. However, two SRA-SET proteins, SUVH2 and SUVH9, are also essential for DRM2-mediated de novo DNA methylation in Arabidopsis 151. In the suvh2suvh9-double mutant, DNA methylation of the RdDM targets FWA and suppressor of drm1drm2cmt3 (SDC) was significantly reduced, especially at asymmetric CHH sites, leading to the release of silencing. Unlike in the suvh4 mutant, histone methylation in the suvh2suvh9 mutant was apparently not affected 151. These results indicate that SRA-domain proteins in Arabidopsis are involved not only in symmetric CG and CHG methylation but also in asymmetric CHH methylation.

Involvement of histone modifications in DNA methylation is also indicated by studies of increase in bonsai methylation 1 (IBM1). IBM1 is a JmjC domain-containing protein in the JHDM2 family of histone H3K9 demethylases 152, 153. IBM1 was identified in a screen for mutants that show ectopic DNA methylation in the genic region of BONSAI (BNS) 154. Mutation of IBM1 induces CHG hypermethylation at BNS, which is suppressed by mutations in the CHG methyltransferase CMT3 and the histone H3K9 methyltransferase KYP 154. These results are consistent with the positive correlation between CHG methylation and H3K9 methylation 142, 143, 144. A genome-wide analysis revealed that the ibm1 mutation induces CHG hypermethylation in thousands of genes, whereas transposons and other repetitive DNA elements are unaffected. IBM1 might specifically counteract the function of CMT3 and KYP, preventing CHG methylation, but not CG methylation, in open reading frames on a genome-wide scale 155. Further study is required to determine how IBM1 targets open reading frames and why it acts specifically at CHG sites. Another Arabidopsis JmjC protein, JMJ14 (a homolog of the human KDM5/JARID1 proteins), contributes to de novo DNA methylation through the RdDM pathway 156. Mutation of JMJ14 causes DNA hypomethylation and releases gene silencing at RdDM targets. JMJ14 seems to promote DNA methylation by demethylating H3K4me3, acting downstream from the RdDM components RDR2 and AGO4 156.

HDA6 is a histone deacetylase that deacetylates histone H4 at specific genomic loci. Mutation of HDA6 results in loss of TGS at some RdDM targets, suggesting a function for HDA6 in RdDM 40, 157. In hda6 mutants, symmetric cytosine methylation at CG and CHG sites is reduced, which supports a role for HDA6 in maintaining DNA methylation 157. Aberrant RNA Pol II transcripts occur throughout the 45S rRNA gene intergenic spacers 158. In hda6 mutants, these transcripts are upregulated, resulting in overproduction of siRNAs. The overproduced siRNAs direct de novo DNA methylation at CHH sites, but the increased de novo methylation fails to suppress Pol I or Pol II transcription in hda6 mutants 158. These results suggest that histone deacetylation by HDA6 promotes symmetric DNA methylation at specific chromatin regions while suppressing asymmetric DNA methylation. Our previous forward genetic screens identified SUP32/UBP26, which encodes an Arabidopsis deubiquitination enzyme. Mutation of SUP32 decreases RNA-directed DNA methylation and histone H3K9 dimethylation, and releases TGS 159. SUP32 deubiquitinates histone H2B at lysine 143, and this H2B deubiquitination appears to be an important step that coordinates heterochromatic histone modifications with DNA methylation 159.

In the Arabidopsis genome, DNA methylation is strongly anticorrelated with the histone H2A variant H2A.Z: H2A.Z is absent from hypermethylated DNA regions in the bodies of actively transcribed genes and from methylated transposons 160. Mutation of the MET1 DNA methyltransferase causes both loss and gain of DNA methylation and leads to opposite changes (gains and losses) in H2A.Z deposition, whereas mutation of PIE1, a subunit of the SWR1 complex that deposits H2A.Z, leads to genome-wide hypermethylation 160. These results indicate that DNA methylation can influence chromatin structure and affect gene silencing by excluding H2A.Z, and that H2A.Z may protect genes from DNA methylation. It is unclear whether this protection involves active DNA demethylation. The negative correlation between H2A.Z and methylated DNA is conserved between plants and animals 161.

In mammals, the maintenance DNA methyltransferase DNMT1 recognizes hemimethylated DNA as a template and catalyzes DNA methylation on newly synthesized DNA strands. Proliferating cell nuclear antigen (PCNA), a component of the replication machinery, is known to recruit DNMT1 to replication foci containing hemimethylated DNA 162. However, disruption of this interaction causes only a minor reduction in DNA methylation 163. Recent studies provided evidence that the multidomain protein UHRF1, also known as NP95 and ICP90, recruits DNMT1 to hemimethylated DNA during S phase. UHRF1 directly interacts with DNMT1 and co-localizes with it during S phase. Disruption of Uhrf1 in mouse ES cells results in severe loss of global DNA methylation 147, 148. Structural studies revealed that the SRA domain of UHRF1 recognizes hemimethylated DNA and flips 5-methylcytosine out of the DNA helix 164, 165, 166.

Polycomb group proteins are required for epigenetic transcriptional repression. Enhancer of zeste homolog 2 (EZH2), the component of Polycomb repressive complex 2 that catalyzes H3K27 methylation, has been shown to associate with DNA methyltransferases. EZH2 seems to be important for DNA methylation of EZH2-target promoters and may serve as a recruitment platform for DNA methyltransferases 167. In ES cells, the majority of Polycomb-target genes are marked by both the repressive H3K27me3 and the activating H3K4me3 marks 168, 169, 170, 171. This 'bivalent' modification pattern is thought to keep genes poised for either activation or repression. Interestingly, genes marked by H3K27me3 in ES cells often undergo de novo DNA methylation in cancer cells 172, 173, 174. Other studies, however, indicated that EZH2 is not required for DNA methylation, suggesting that additional events are involved in de novo DNA methylation 175, 176.

Numerous studies have shown a close relationship between histone H3K9 methylation and DNA methylation in animals. The H3K9 methyltransferase G9a is involved in the recruitment of the de novo DNA methyltransferases DNMT3A and DNMT3B, as well as the maintenance DNA methyltransferase DNMT1 177, 178. G9a has two distinct functional domains, a SET domain and an ankyrin repeat domain. The SET domain is responsible for H3K9 methyltransferase activity, whereas the ankyrin domain recruits DNMT3A and DNMT3B independently of the histone methyltransferase activity. Both activities of G9a are required for silencing of the pluripotency-determining gene Oct3/4 in early embryos 179. Members of the mammalian HP1 family mediate communication between G9a and DNMT1: methylated H3K9 tails create a binding platform for HP1, which is sufficient for the recruitment of DNMT1 and the silencing of euchromatic genes 180. In adult neural stem cells, knockdown of G9a leads to demethylation of the Oct4 promoter and partial reactivation of Oct4 expression, supporting the link between histone methylation and DNA methylation 181. In zebrafish, DNMT3 is required for proper neurogenesis. The neurogenesis regulator LEF1 is a DNMT3-specific target gene that is demethylated and upregulated in the dnmt3 mutant. DNMT3 cooperates with G9a in regulating DNA methylation of LEF1: DNA methylation and H3K9 trimethylation on the LEF1 promoter are reduced in both dnmt3 and G9a mutants. These results suggest that cooperation between the DNA methyltransferase DNMT3 and the histone methyltransferase G9a is required for silencing critical regulators during neurogenesis 182.

SUV39H1 and SUV39H2, which mediate H3K9 trimethylation at pericentric heterochromatin 183, have also been shown to regulate DNA methylation (Figure 4). In suv39h1/h2-double null mouse ES cells, DNA methylation is impaired at pericentric satellite repeats but not at the other sequences examined. Co-immunoprecipitation experiments showed that DNMT3B interacts with HP1, suggesting that HP1 may act as a bridge between H3K9me3 and DNMT3B. These data indicate that the histone methyltransferases SUV39H1/H2 direct H3K9 trimethylation and DNMT3B-dependent DNA methylation to pericentric repeats, reinforcing the stability of these heterochromatic regions 184.

As in plants, histone methylation in animals seems to act upstream of DNA methylation in some cases, but DNA methylation also affects histone methylation. For example, the methylcytosine-binding protein MBD1 has been shown to recruit the H3K9 methyltransferase SETDB1 to the large subunit of chromatin assembly factor CAF-1, forming a CAF-1/MBD1/SETDB1 complex that facilitates H3K9 methylation during DNA replication 185. Moreover, SETDB1 interacts with the de novo DNA methyltransferases DNMT3A and DNMT3B. SETDB1 and DNMT3A simultaneously occupy the promoter of RASSF1A, and this co-occupancy is essential for silencing of the gene in human cancer cells. These results support a functional connection between the histone H3K9 methyltransferase SETDB1 and the DNA methyltransferase DNMT3A in epigenetic transcriptional repression 186.

In contrast to H3K9 methylation, which usually facilitates DNA methylation, H3K4 methylation appears to protect DNA from de novo methylation. The finding that the ADD domain of DNMT3L binds the histone H3 tail only when H3K4 is unmethylated suggests a possible molecular mechanism for the inhibitory effect of H3K4 methylation on DNA methylation 81, 187. The recent discovery that H3K4 demethylation by KDM1B is a prerequisite for the establishment of genomic imprinting in female germ cells provides genetic evidence supporting such a mechanism (Figure 2B) 82. It is worth noting that DNMT3L-deficient mice show severe phenotypes in the germline but not in somatic tissues. There is evidence that DNMT3A and DNMT3B can also bind, via their ADD domains, the H3 tail when H3K4 is unmodified, suggesting that a similar mechanism may operate in somatic cells 188, 189.

Recent studies have revealed more complex relationships between the histone and DNA methylation systems. Wang et al. 190 recently identified DNMT1 as a substrate for the lysine demethylase LSD1 (also known as KDM1A). Lsd1-deficient mouse ES cells show progressive loss of global DNA methylation. This loss correlates with a decrease in DNMT1 protein, due to reduced DNMT1 stability. DNMT1 protein is methylated in vivo, and LSD1 deficiency enhances DNMT1 methylation (Figure 4) 190. Furthermore, DNMT1 can be methylated by Set7/9 (an H3K4 methyltransferase) and demethylated by LSD1 in vitro 190, 191. These results suggest that DNMT1 stability is regulated by lysine methylation. LSD1 and Set7/9 (and perhaps other histone methyltransferases and demethylases as well), by acting directly on both histones and DNMT1, may play a role in coordinating histone methylation and DNA methylation.

Function of DNA methylation

Function of DNA methylation in plants

The genomes of higher plants contain many transposable elements that can potentially disrupt genome stability. In plants, DNA methylation is generally high at these transposable elements 5. Methylation at CG sites relies mainly on the methyltransferase MET1 and the chromatin remodeler DDM1, whereas methylation at CHG and CHH sites depends mainly on the methyltransferases CMT3 and DRM2, respectively 36. Loss of DNA methylation in mutants of these genes releases transcriptional silencing of transposable elements and other repetitive DNA sequences. Mutation of DDM1 causes a variety of developmental abnormalities. One of these, the clam phenotype, is caused by transposition of CAC1, an endogenous CACTA family transposon, from its original site into the DWF4 locus 192. The bns phenotype arises in the ddm1 background from silencing of APC13, a gene into which a LINE retrotransposon is inserted 193. Genomic tiling arrays also revealed transposition of several LTR retrotransposons in ddm1-induced mutants with developmental defects 194. In epigenetic recombinant inbred lines derived from a cross between WT and met1, movement of the LTR retrotransposon EVD was detected. The 5′-LTR of EVD is highly methylated in WT plants, but this methylation is erased in met1 and in the lines in which transposition occurred. The EVD RNA transcript is present in met1 but not in WT. The transcript is translated and reverse transcribed, producing extrachromosomal DNA 195. EVD extrachromosomal DNA and EVD movement are absent in the first generation of the met1 mutant but occur in the second and subsequent generations, similar to what happens in the ddm1 mutant. In the met1nrpd2a-double mutant, the EVD RNA transcript is synergistically increased, EVD extrachromosomal DNA accumulates, and EVD transposes at a high rate 196. These results indicate that NRPD2, as a component of RdDM, cooperates with MET1 to inhibit EVD transposition. In nrpd2 and other RdDM single mutants, transposons show increased transcript levels, but transposition generally does not occur.
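To make the three sequence contexts concrete, the Python sketch below classifies a cytosine by the bases that follow it on the same strand (H = A, C, or T) and looks up the enzyme the text assigns to that context. It is a simplified illustration under stated assumptions: only the given strand is read (cytosines on the reverse strand would require scanning the reverse complement), ambiguous bases such as N are left unclassified, and the function and dictionary names are hypothetical, not taken from any methylation-analysis package.

```python
from typing import Optional

# Main Arabidopsis methyltransferase for each context, as summarized in the text
# (CG methylation also requires the chromatin remodeler DDM1).
MAIN_ENZYME = {"CG": "MET1", "CHG": "CMT3", "CHH": "DRM2"}


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of seq as CG, CHG, or CHH (H = A, C, or T).

    Only the given strand is read. Returns None if seq[i] is not C or if the
    downstream bases needed for classification are missing or ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in "ACT" or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    return "CHH" if seq[i + 2] in "ACT" else None


# Usage: list every classifiable cytosine in a short read with its context and enzyme.
read = "ACGTCAGACTTA"
for pos, base in enumerate(read):
    ctx = cytosine_context(read, pos)
    if ctx is not None:
        print(pos, ctx, MAIN_ENZYME[ctx])
```

Because CG and CHG are palindromic, a methylated cytosine in either context has a partner cytosine on the opposite strand that can serve as a template after replication; a CHH cytosine has no such partner, which is why CHH methylation must be re-established de novo.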

Genomic imprinting is widespread in mammals as well as in flowering plants, although the underlying mechanisms differ. In Arabidopsis, parent-of-origin-specific gene expression is found primarily in the endosperm during seed development. The previously characterized imprinted genes FWA, MEA, and FIS2 are expressed specifically from the maternal genome in the endosperm, while their paternal alleles are silenced; PHERES1, in contrast, is expressed predominantly from the paternal allele 19, 197, 198, 199. FWA, a suppressor of flowering, was initially identified from late-flowering epigenetic mutants that show ectopic FWA expression. Hypomethylation of the DNA repeat region in the FWA promoter is required for normal FWA expression 200. FWA is expressed in the endosperm but not in any other tissue, and FWA imprinting depends on DME 197. Unlike ROS1, which is expressed constitutively, DME is expressed mainly in the central cell and endosperm during seed development 22. DME specifically demethylates the maternal genome in central cells (the progenitors of the endosperm), leading to expression of maternally expressed imprinted genes (Figure 3B) 19, 23, 24.

Although maternal-origin-specific gene expression was found in the endosperm of Arabidopsis, only a small number of imprinted genes (such as FWA, MEA, FIS2, and PHERES1) had previously been identified. New imprinted genes were subsequently found when imprinted sequences in the endosperm and embryo were explored genome-wide 23, 24. The new imprinted genes also show hypomethylation and high expression in endosperms relative to other parts of the plant. Moreover, at the whole-genome level, transposable elements are extensively demethylated in endosperms (Figure 3B). These data suggest that imprinting in plants might have evolved from targeted methylation of transposable elements inserted near genic regulatory elements 23, 24. The genome-wide demethylation in endosperms is accompanied by extensive non-CG hypermethylation of siRNA-targeted transposon sequences.

Mosher et al. 201 found that the 24-nt siRNAs, which are capable of inducing DNA methylation, are highly expressed during early embryogenesis. The predominant phase of Pol IV-dependent 24-nt siRNA accumulation is initiated in the maternal gametophyte and continues during seed development. This discovery of maternally expressed Pol IV-dependent siRNAs in the endosperms of developing seeds greatly expands the catalog of imprinted loci, which now include siRNA-producing sequences throughout the Arabidopsis genome 110. Because the genome-wide demethylation and transcription occur in the genome of central cells and persist in endosperms after fertilization, the widespread demethylation in the maternal-origin genome may contribute to the overexpression of 24-nt siRNAs in endosperms during embryogenesis (Figure 3B). Several recent studies indicated that the genome-wide hypomethylation caused by mutation of DDM1 or MET1 induces the expression of 24-nt siRNAs, and thereby activates de novo methylation pathways at the hypomethylated sites 149, 150, 202. The highly expressed 24-nt siRNAs in endosperms may be translocated to embryos and help reinforce silencing of transposable elements in the embryonic genome (Figure 3B).

Transposable elements in the Arabidopsis genome are normally methylated and silenced. Pollen grains contain one vegetative cell and two accompanying sperm cells. Transposable elements can be reactivated and transpose in the pollen vegetative cell but not in the sperm cells, which provide DNA to the fertilized zygote 203. DNA in the pollen vegetative nucleus has reduced methylation, which correlates with the reactivation and transposition of transposable elements. The expression of transposable elements in the vegetative cell may be important to ensure the silencing of these elements in sperm cells 203. The silencing status of transposable elements in the male gametes is essential for genome stability and integrity. Many genes involved in the RdDM pathway have reduced expression in mature pollen 203, 204. This reduced expression coincides with the downregulation of 24-nt siRNAs in the vegetative cell nucleus. However, the epigenetic reactivation of transposable elements in the pollen vegetative nucleus leads to the accumulation of 21-nt siRNAs from transposable elements. These 21-nt siRNAs could move to the sperm cell to posttranscriptionally silence transposons that may have escaped TGS 203. Although passive DNA demethylation contributes to the reactivation of transposable elements in the pollen vegetative nucleus 203, it is unclear whether active DNA demethylation is also involved.

Mutation of DDM1 leads to severe loss of DNA methylation in the Arabidopsis genome. This loss persists in F1 plants obtained in crosses with the WT, although the ddm1 mutation is recessive 202. This finding suggests that DDM1 is involved in the maintenance of DNA methylation but is not sufficient for establishment of DNA methylation. In subsequent generations, however, the hypomethylated DNA loci can be partially remethylated by siRNA-dependent DNA methylation 202. These remethylatable sequences are characterized by an abundance of siRNAs. Methylation of the remethylatable sequences requires the RNA-directed DNA methylation pathway 202. The results suggest an important role of RdDM in protecting genomes against long-term epigenetic defects 202, 205.

In Arabidopsis, siRNA-dependent de novo DNA methylation mainly targets transposons and other repetitive DNA sequences in pericentromeric regions and at chromosome ends, where few genes are located. RdDM, however, also functions in gene regulation, especially of genes flanked by transposons and other repetitive DNA sequences. Some RdDM mutants display late-flowering phenotypes under short-day conditions 39. FWA is a flowering suppressor in Arabidopsis. The promoter region of FWA is hypermethylated and silenced in WT plants 206, whereas in fwa mutant plants FWA is hypomethylated and expressed. De novo DNA methylation of FWA depends on RdDM components such as DRM2, NRPD1, RDR2, and DCL3 207, which partially explains the late-flowering phenotype of RdDM mutants. In the drm1drm2cmt3 (ddc)-triple mutant, DNA methylation is lost in most non-CG (CHG and CHH) sequence contexts, and the mutant shows developmental defects. The gene responsible for these developmental phenotypes was identified as SDC, which encodes an F-box protein and has seven tandem repeats in its promoter. In WT, the SDC tandem repeats are methylated. This methylation requires DRM2 and CMT3, which are recruited to the repeats in an siRNA-dependent manner. In the ddc mutant, DNA methylation of the repeats is reduced, and the resulting activation of SDC expression causes the developmental phenotypes 208. For most RdDM components, however, no severe developmental phenotype has been found in defective alleles. An exception is RDM4/DMS4: rdm4/dms4 mutants show pleiotropic developmental phenotypes 57, 58. As a putative transcriptional regulator, RDM4/DMS4 is required not only for the RdDM pathway, by affecting Pol V transcription, but also for Pol II transcription of some protein-coding genes, including developmentally important ones.

In contrast to the lack of strong developmental phenotypes in most RdDM mutants in Arabidopsis, mutation of the RDR2 and NRPD1 orthologs in maize causes significant developmental defects 209, 210. The different developmental effects of RdDM in Arabidopsis and maize may result from differences in the abundance and genomic distribution of RdDM targets. Compared with the small Arabidopsis genome, the maize genome contains more transposons and other non-coding DNA repeats, so more developmentally important genes are flanked and influenced by repetitive sequences that are targeted by RdDM. Moreover, a genome-wide DNA methylation analysis in rice indicated that DNA methylation in rice is enriched in the promoter regions of some endogenous genes, where it is associated with transcriptional repression 211. As a monocot, maize may have a DNA methylation pattern similar to that of rice. The developmental defects of the maize rdr2 and nrpd1 mutants are therefore likely due to reduced DNA methylation at the promoters of development-related genes. Investigating DNA methylation in plants with larger genomes will enhance our understanding of the regulation and function of DNA methylation.

Paramutation is a well-studied epigenetic phenomenon first described in maize. It is defined as an interaction between two alleles of a single locus that results in a heritable change in one allele induced by the other. A classic example is the booster 1 (b1) locus, which encodes a bHLH transcription factor that activates genes involved in anthocyanin biosynthesis in maize 212. Two alleles of b1, B-I (paramutable, active) and B′ (paramutagenic, inactive), have the same DNA sequence but differ in DNA methylation. When plants carrying B-I and B′ are crossed, the inactive B′ allele silences the B-I allele, so the bHLH transcription factor gene is not expressed in the hybrid plants. Tandem repeats ∼100 kb upstream of the b1 locus are required for paramutation 213. Recent studies in maize showed that MOP1, ZmRPD1, and RMR7 (the orthologs of Arabidopsis RDR2, NRPD1, and NRPD2, respectively) are required for paramutation, suggesting that the RdDM pathway is required for establishing and maintaining silencing in paramutation 210, 214, 215.
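The defining feature of paramutation, a heritable allele-to-allele conversion without any change in DNA sequence, can be captured in a few lines of Python. This is toy genetic bookkeeping, not a model of the RdDM machinery: alleles are plain strings ("B'" stands for B′), the function names are hypothetical, and the model assumes that every B-I allele meeting B′ is converted to B′ and that the converted allele is itself paramutagenic in later crosses.

```python
from typing import Tuple

Genotype = Tuple[str, str]


def paramutate(genotype: Genotype) -> Genotype:
    """Apply paramutation within one plant (toy model).

    Assumption: if a B' allele is present, every B-I allele is converted to B'.
    Only the epigenetic state label changes; the DNA sequence is not modeled.
    """
    if "B'" in genotype:
        return tuple("B'" if allele == "B-I" else allele for allele in genotype)
    return tuple(genotype)


def b1_expressed(genotype: Genotype) -> bool:
    """b1 (and hence anthocyanin pigmentation) is on only if an active B-I allele remains."""
    return "B-I" in genotype


f1 = paramutate(("B-I", "B'"))
print(f1, b1_expressed(f1))
# The allele inherited from the B-I parent now behaves as B' and silences a fresh B-I.
backcross = paramutate((f1[0], "B-I"))
print(backcross, b1_expressed(backcross))
```

The second cross is the point of the example: the change induced in the F1 is carried forward, which is what separates paramutation from ordinary dominance.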

In rice, the spontaneous mutant Epi-d1 shows a metastable dwarf phenotype 216. The phenotype is mitotically and meiotically heritable and is associated with metastable epigenetic silencing of the DWARF1 (D1) gene. The silenced state correlates with DNA hypermethylation in the D1 promoter region. The epigenetic state of D1 is bidirectionally mutable, switching from active to repressed and from repressed to active. This bidirectional switching might indicate that D1 is regulated by RNA-directed de novo DNA methylation. Epigenetic alleles such as Epi-d1 could provide a mechanism for rapid adaptation to changing environmental conditions 216.

Function of DNA methylation in animals

In mammals, DNA methylation and demethylation are involved in diverse processes including early embryogenesis 217, 218, stem cell differentiation 219, 220, genomic imprinting 75, 221, X chromosome inactivation 222, 223, and silencing of repetitive elements 224. DNA methylation is also involved in regulating neuronal development and cancer development 225, 226.

In mammals, complex changes in DNA methylation levels occur during embryonic development. Immediately after fertilization, the paternal and maternal genomes in zygotes differ in DNA methylation levels, i.e., the DNA methylation level is initially higher in the paternal than in the maternal genome 217. Within 3 to 6 h after fertilization, however, the maternal genome is rapidly methylated through de novo DNA methylation, while the paternal genome is actively demethylated in an elongator-dependent manner (Figure 4) 141. Demethylation of the paternal genome persists in preimplantation embryos. After rapid de novo DNA methylation in the maternal genome of zygotes, passive DNA demethylation occurs gradually over successive DNA replication cycles 217, 227. This replication-dependent demethylation is due to exclusion of the maintenance methyltransferase DNMT1 from the nucleus of embryos 111, 228. At the morula stage, the paternal and maternal genomes have equally low DNA methylation levels (Figure 4). Embryonic DNA methylation patterns are established after implantation through lineage-specific de novo methylation. DNA methylation levels increase rapidly in the primitive ectoderm, which gives rise to the entire embryo, whereas methylation is either inhibited or not maintained in the trophoblast and the primitive endoderm lineage, which give rise to the placenta and yolk sac membrane, respectively 217, 218, 229, 230. Demethylation and de novo methylation also occur during gametogenesis and, as discussed above, are critical for the establishment of genomic imprinting and for the suppression of transposons in germ cells 65.
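The gradual, replication-dependent loss described above follows from simple arithmetic. Assume that each replication pairs every parental strand with a new, unmethylated strand, and that a DNMT1-like activity copies the methyl mark onto the new strand at a hemimethylated CG site with probability e (the maintenance efficiency). If m(n) is the fraction of methylated CG positions per strand after n cycles, half of the strands are parental (level m(n)) and half are new (level e·m(n)), so m(n+1) = m(n)·(1 + e)/2. With DNMT1 excluded from the nucleus (e = 0), the level halves with every cycle. The Python sketch below implements this recursion; the function name and the efficiency parameter are illustrative assumptions, and de novo methylation and active demethylation are deliberately ignored.

```python
from typing import List


def passive_dilution(m0: float, cycles: int, maintenance: float = 0.0) -> List[float]:
    """Per-strand CG methylation level after each replication cycle (toy model).

    m0: starting fraction of methylated CG positions per strand (0..1).
    maintenance: hypothetical probability that a hemimethylated site is restored
        on the new strand (a stand-in for DNMT1 activity; 0 = DNMT1 excluded).
    De novo methylation and active demethylation are not modeled.
    """
    if not (0.0 <= m0 <= 1.0 and 0.0 <= maintenance <= 1.0):
        raise ValueError("m0 and maintenance must lie in [0, 1]")
    levels = [m0]
    m = m0
    for _ in range(cycles):
        # Half of the strands are parental (level m); half are new and are
        # methylated only where the template was methylated and maintenance acted.
        m = (m + m * maintenance) / 2
        levels.append(m)
    return levels


# DNMT1 excluded: methylation halves with every replication cycle.
print(passive_dilution(1.0, 3))       # [1.0, 0.5, 0.25, 0.125]
# Complete maintenance keeps the level constant.
print(passive_dilution(1.0, 3, 1.0))  # [1.0, 1.0, 1.0, 1.0]
```

Any efficiency below 1 still produces a geometric decline, only more slowly, which is why even partial exclusion of DNMT1 from the nucleus would be expected to erode methylation over the cleavage divisions.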

Our knowledge about the roles of DNA methylation in mammalian development comes mainly from genetic manipulation of DNMTs in mice. Studies of the zygotic functions of DNMTs have shown that the establishment of embryonic methylation patterns requires both de novo and maintenance methyltransferase activities, and that the maintenance of DNA methylation above a threshold level is essential for embryonic development (Figure 4) 71, 72, 74. Complete elimination of DNMT1 function results in embryonic lethality around E9.5, with extensive loss of global DNA methylation 72. DNMT3B is also essential for embryogenesis: DNMT3B-deficient embryos show growth impairment and multiple developmental defects after E9.5 and die after E12.5. DNMT3A-mutant mice die at around 4 weeks of age. DNMT3A/DNMT3B-double knockout embryos die around E9.5, similar to DNMT1-null mutants 74.

The mechanisms underlying the developmental defects observed in the DNMT mutants are not fully understood. Loss of DNA methylation does not affect ES cell proliferation and viability; its effects become apparent only during or after gastrulation, when pluripotent embryonic cells begin to differentiate 71, 72, 74. Consistent with these data, DNMT1-deficient or DNMT3A/DNMT3B-double mutant ES cells show severe differentiation defects 231. Conditional disruption of DNMT1 in mouse embryonic fibroblasts (MEFs) results in severe demethylation and cell death, and DNMT3B-deficient MEFs show moderate demethylation, chromosomal instability, and abnormal proliferation 232, 233. These findings suggest that DNA methylation is essential for cellular differentiation and the normal functioning of differentiated cells.

The pluripotency genes are generally hypomethylated in stem cells and gain methylation during cell differentiation 234. Oct4 and Nanog are two such genes, and both are essential for the pluripotency of ES cells. Recently, Kim et al. 235 demonstrated that the transcription factor Oct4 is sufficient to reprogram human neural stem cells to pluripotency. De novo DNA methylation and silencing of the pluripotency genes contribute to loss of pluripotency in differentiated cells. Genome-wide DNA methylation analysis revealed widespread differences in DNA methylation between the genomes of ES cells and differentiated cells 4. Non-CG methylation is widespread in ES cells and induced pluripotent stem cells but not in differentiated cells, and it is enriched in gene bodies and depleted in protein binding sites and enhancers 4. These results suggest that DNA methylation is closely associated with mammalian development and cell pluripotency. AID is involved in active DNA demethylation through its deaminase activity. It is required for promoter demethylation and induction of Oct4 and Nanog gene expression during reprogramming toward pluripotency in single heterokaryons 236. AID protein binds the silent, methylated Oct4 and Nanog promoters in fibroblasts but not the active, demethylated promoters in ES cells. AID-mediated active DNA demethylation is required for nuclear reprogramming of human somatic cells toward pluripotency 236.

Although DNMT1 is dispensable for ES cell maintenance, it is required to maintain the somatic progenitor state through cell divisions. Depletion of the maintenance DNA methyltransferase DNMT1 in epidermal progenitors leads to premature differentiation. Genome-wide DNA methylation analysis showed that some epidermal differentiation gene promoters are methylated in self-renewing progenitor cells but become demethylated during differentiation 237. DNMT1 and UHRF1, which targets DNMT1 to hemi-methylated DNA, are involved in suppressing the induction of epidermal differentiation genes. In contrast, Gadd45a and Gadd45b, which promote DNA demethylation, are required for full induction of these genes 237. These results suggest that dynamic regulation of DNA methylation is important for both the maintenance and the differentiation of progenitor cells.

Somatic tissues have specific gene expression patterns. Because promoter DNA methylation suppresses gene expression, the correlation between gene expression and DNA methylation has been investigated in diverse tissues. SERPINB5, a potential tumor suppressor gene, is expressed in specific human tissues, and its expression level across tissues correlates strongly with the DNA methylation level of its promoter region 238. Global DNA methylation studies suggest that DNA methylation is critical for regulating the expression of some tissue-specific genes 239, 240, 241. Furthermore, comparison of DNA methylation levels in embryonic tissues derived from different germ layers revealed that differentially methylated regions located about 2 kb from CG islands may be involved in tissue-specific gene expression 242. These results suggest that different tissue types have unique DNA methylation patterns that can contribute to their lineage specificity. As noted earlier, DNA methylation also helps regulate neuronal development and the development of cancers 225, 226. In neuronal development, the de novo DNA methyltransferase DNMT3A is expressed in postnatal neural stem cells and is required for neurogenesis 243.

Defects in DNA methylation can cause several developmental disorders and diseases in humans. Deficiency of the de novo DNA methyltransferase DNMT3B causes a rare autosomal recessive disorder, immunodeficiency, centromeric instability, and facial anomalies (ICF) syndrome 74, 244. ICF syndrome is characterized by intellectual disability, reduced growth, distinct facial abnormalities, and immunodeficiency in children 245, 246. Cells from ICF patients show hypomethylation of classical satellite DNA 247. Mutations in MECP2, which encodes a methyl-CG-binding protein, were identified as the cause of Rett syndrome 248. Rett syndrome is characterized by the loss of acquired skills in affected girls. The X-linked MECP2 contributes to the repression of many genomic targets; one of these is brain-derived neurotrophic factor, an important protein in neuronal plasticity 119. Gadd45 proteins were identified as key factors that promote DNA demethylation through DNA repair 249. Mice lacking Gadd45b exhibit specific deficits in neural activity-induced proliferation of neural progenitors 250. Gadd45b is required for activity-induced DNA demethylation of specific promoters and for the expression of the corresponding genes, which are critical for adult neurogenesis 250. Thus, Gadd45b may link neuronal circuit activity to DNA demethylation and gene expression in the mammalian brain.

A hallmark of many cancers is global hypomethylation and regional hypermethylation of CG islands. In mouse models, hypomethylation has been shown to induce genomic instability and tumorigenesis 251, 252. Although the causes of hypomethylation in cancer cells are poorly understood, a recent study suggests that upregulation of a DNA demethylase system may play a role in some cases 182. Hypermethylation of CG islands can lead to silencing of tumor suppressor genes and, thus, may also contribute to tumorigenesis. Cancer cells often show overexpression of DNMTs, especially DNMT1, which may cause abnormal hypermethylation. DNMT1 has been shown to be essential for the survival and proliferation of human cancer cells 253.

Conclusions

DNA methylation is a relatively stable epigenetic mark, yet it is dynamically regulated. Plants and animals share some common mechanisms for regulating DNA methylation and demethylation. In both, maintenance of DNA methylation during DNA replication depends on the maintenance DNA methyltransferases. RNA-directed DNA methylation is a key mechanism that silences repetitive DNA elements in plants; it relies on plant-specific 24-nt siRNAs to guide de novo DNA methylation. In mammals, de novo DNA methylation in germ cells also relies on small RNAs, in this case the animal-specific piRNAs. A requirement for small RNAs in at least some de novo DNA methylation is therefore shared by plants and mammals. Moreover, both types of small RNAs are loaded onto ARGONAUTE proteins, and both mainly target transposable elements and other repetitive DNA sequences. Comparing de novo DNA methylation in plants and mammals may help reveal the general principles of RNA-guided DNA methylation.

In mammals, genome-wide DNA methylation and demethylation occur during gametogenesis and early embryogenesis. These processes may involve multiple DNA demethylation mechanisms. ELP3 is involved in active DNA demethylation of the paternal genome in zygotes 141. TET1 is required for ES cell maintenance and inner cell mass specification 140. Although AID has no genome-wide effect in mammals, it is required for the demethylation and induction of two key pluripotency genes, Oct4 and Nanog, during reprogramming toward pluripotency 236. In plants, genome-wide DNA methylation changes are found in the central cell and its descendant, the endosperm, a nutritive tissue; neither passes its DNA to the next generation. Whether DNA methylation is globally reset during gametogenesis and embryogenesis in plants remains unclear.

Although active DNA demethylation has been studied, how the process is initiated and regulated remains largely unknown. ROS3, an RNA-binding protein, seems to cooperate with ROS1 in the plant DNA demethylation pathway 107, but whether and how non-coding RNAs are involved in DNA demethylation remains to be determined. It would also be useful to know how the DNA methylation and demethylation pathways are coordinated in vivo. In mammals, the DNA methyltransferases DNMT3A and DNMT3B have also been reported to participate in DNA demethylation 121. Learning how DNA methylation and demethylation reverse each other will help us understand the dynamic regulation of DNA methylation.

Gene-body methylation is widespread in plants and animals, but its function requires further study. Whole-genome DNA methylation and transcription profiles suggest that gene-body methylation is conserved between plants and animals 161, 254. Gene-body methylation correlates with the transcription pattern of genes, suggesting that it plays an important role in gene regulation. A recent study indicated that DNA methylation in non-promoter intergenic regions and gene bodies promotes gene expression by functionally antagonizing Polycomb repression; such non-promoter methylation seems to be important for maintaining active chromatin states of genes 243. Further studies are required to clarify how gene-body methylation is established and maintained, and how it may regulate gene transcription in animals as well as in plants.