Introduction

Methylation at the 5-position of cytosine (5-methylcytosine, 5mC), which occurs predominantly in the context of cytosine-phosphate-guanine (CpG) dinucleotides, is a common modification present in mammalian genomes. DNA methylation is essential for mammalian development and has crucial roles in a variety of biological processes including regulation of gene expression, genomic imprinting, X chromosome inactivation and retrotransposon silencing.1, 2 Aberrant changes of genomic DNA methylation patterns and genetic mutations of components of the DNA methylation machinery are linked to numerous human diseases including developmental syndromes, neurological diseases, immunological disorders and various types of cancer.3, 4, 5, 6

In mice, DNA methylation patterns are established and maintained by three active DNA methyltransferases (Dnmts)—Dnmt1, Dnmt3a and Dnmt3b. Dnmt3a and Dnmt3b function primarily as de novo methyltransferases that set up DNA methylation patterns during early embryogenesis, whereas Dnmt1 is the major maintenance enzyme that copies the CpG methylation pattern from the parental strand onto the daughter strand during DNA replication.7 Genetic studies revealed that Dnmt1 and Dnmt3b are essential for embryogenesis and Dnmt3a is required for postnatal survival.8, 9, 10 Dnmt3a also cooperates with its cofactor Dnmt3L, a Dnmt3-like protein with no enzymatic activity, in mediating DNA methylation in developing germ cells, including the establishment of methylation marks at imprinting control regions (ICRs).11, 12, 13

DNA methylation is considered to be a relatively stable modification. However, waves of global demethylation occur in two developmental stages—preimplantation embryos and developing primordial germ cells (PGCs)—through both DNA replication-independent ‘active’ and DNA replication-dependent ‘passive’ processes.2, 14 Progress in understanding the mechanisms of demethylation and the major players involved had been slow until the recent discovery that 5mC can be converted to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (Tet) family of dioxygenases.15, 16 Studies in the past several years have revealed that 5hmC is an intermediate in the process of demethylation and distinct Tet proteins appear to be involved in methylation erasure in the zygote and PGCs. Evidence has also emerged for the involvement of Tet-mediated 5mC oxidation in other biological processes.

In this review, we discuss recent progress in our understanding of the biological functions of 5hmC and Tet proteins with an emphasis on mammalian development.

Tet proteins as 5mC dioxygenases

Although 5hmC was identified in mammalian DNA in 1972,17 its significance and biological function had not been explored, largely because it is present at relatively low levels in most cell types. In 2009, two research groups reported abundant 5hmC in mouse Purkinje neurons and embryonic stem (ES) cells.15, 18 More importantly, Tahiliani et al.15 identified Tet1, Tet2 and Tet3 as mammalian homologs of the trypanosome proteins JBP1 and JBP2, which oxidize the 5-methyl group of thymine, and experimentally demonstrated that Tet1 has the capacity to catalyze the conversion of 5mC to 5hmC. Shortly afterward, Tet2 and Tet3 were also shown to have 5mC hydroxylase activity.16, 19 Subsequent studies revealed that Tet proteins can further catalyze the oxidation of 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), two less abundant bases.20, 21

The Tet proteins belong to the 2-oxoglutarate (2OG)- and Fe(II)-dependent dioxygenase (2OGFeDO) superfamily.15, 22 All three Tet proteins contain a C-terminal catalytic domain, which consists of a cysteine-rich region and a double-stranded β-helix fold characteristic of the 2OGFeDO superfamily. Tet1 and Tet3, but not Tet2, also contain an N-terminal CXXC zinc finger domain, a DNA-binding motif (Figure 1).

Figure 1

Schematic diagrams of mouse Tet proteins. There are three Tet proteins in mice: Tet1, Tet2 and Tet3. They all have a C-terminal catalytic domain, consisting of a cysteine-rich region and the double-stranded β-helix (DSBH) fold characteristic of the 2-oxoglutarate- and Fe(II)-dependent dioxygenase family. Tet1 and Tet3, but not Tet2, have an N-terminal CXXC zinc finger domain, a DNA-binding domain.

The CXXC domain, present in multiple chromatin-interacting proteins such as CFP1, MBD1, MLL and Dnmt1, has been shown to selectively bind unmethylated CpG dinucleotides.23 Sequence alignment revealed that the CXXC domains of Tet1 and Tet3 lack a KFGG motif that is present in many other CXXC domains.23, 24 The DNA-binding property of the Tet1 CXXC domain is controversial: one report showed no specific DNA-binding activity,24 whereas another report showed binding to unmodified, 5mC-modified and 5hmC-modified CpG-rich DNA.25 The discrepancy could be because of the different DNA-binding assays and substrates used in these studies. Recently, it was shown that the CXXC domains of Xenopus and human TET3 bind only unmodified cytosines, regardless of the sequence contexts (with a slight preference for CpG content), and that in Xenopus, the CXXC domain of Tet3 works cooperatively with the catalytic domain in targeting Tet3 to its binding sites during development.26 Notably, mouse Tet3 has a shorter isoform that lacks the N-terminal 135 amino acids including the CXXC domain (GenBank accession no. NM_183138). It would be interesting to determine the expression patterns and functional differences of the two isoforms.

Tet2 lost its CXXC domain during evolution due to a chromosomal inversion event that split the ancestral Tet2 gene into two distinct genes, Idax, which encodes a protein containing the ancestral CXXC domain of Tet2, and Tet2, which encodes the current Tet2 protein. Interestingly, the two proteins physically interact.27 IDAX, which preferentially binds DNA sequences containing unmethylated CpG via its CXXC domain, could have a role in recruiting Tet2 to its genomic targets. Strikingly, IDAX induces Tet2 degradation by caspase-mediated cleavage.27

5hmC, 5fC and 5caC as intermediates of DNA demethylation pathways

The finding that 5mC can be converted to 5hmC by Tet proteins immediately raised the possibility that this conversion could be involved in DNA demethylation.15, 16 Indeed, Tet-mediated oxidation of 5mC appears to be the only source of 5hmC, as Dnmt1/Dnmt3a/Dnmt3b triple knockout ES cells lack both 5mC and 5hmC28, 29, 30 and depletion of Tet proteins substantially reduces or abolishes the production of 5hmC in various cells and tissues.15, 16, 29, 31, 32, 33, 34, 35, 36, 37 However, Tet proteins alone do not seem to be sufficient to complete DNA demethylation by converting 5hmC, 5fC or 5caC to unmodified cytosine. Thus, it is generally accepted that 5hmC, 5fC and 5caC are intermediates in the process of DNA demethylation.

Various mechanisms of DNA demethylation involving Tet-mediated oxidation have been proposed (Figure 2). The simplest mechanism is ‘passive’ dilution of 5hmC, 5fC and 5caC owing to the lack of maintenance during DNA replication. In support of this mechanism, in vitro assays have revealed that Dnmt1 methylates hemi-hydroxymethylated CpG sites much more poorly than hemi-methylated CpG sites.38, 39
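The arithmetic behind passive dilution can be illustrated with a minimal sketch. It assumes a deliberately simple model that is not taken from the studies cited above: newly synthesized strands carry no modification, and a hypothetical parameter, maintenance_efficiency, gives the fraction of hemi-modified sites that are restored after each round of replication. With no maintenance the modified fraction halves at every division; with perfect maintenance it is stable.

```python
def passive_dilution(initial_fraction, divisions, maintenance_efficiency=0.0):
    """Return the modified fraction before and after each round of replication.

    Illustrative model only (an assumption, not a published method):
    after semi-conservative replication, half of the strands are new and
    unmodified; a fraction `maintenance_efficiency` of the resulting
    hemi-modified sites is copied onto the new strand.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")

    levels = [initial_fraction]
    for _ in range(divisions):
        # Old strands keep their marks (factor 1/2 of all strands);
        # new strands gain marks only through maintenance.
        levels.append(levels[-1] * (1.0 + maintenance_efficiency) / 2.0)
    return levels


if __name__ == "__main__":
    # No maintenance: the mark is halved at every division.
    print(passive_dilution(1.0, 3))       # [1.0, 0.5, 0.25, 0.125]
    # Perfect maintenance: the mark is preserved.
    print(passive_dilution(0.8, 2, 1.0))  # [0.8, 0.8, 0.8]
```

In this sketch, poor recognition of hemi-hydroxymethylated sites by Dnmt1 corresponds to a maintenance_efficiency close to zero, which is why oxidized bases would be expected to decline with each cell division.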

Figure 2

Proposed DNA demethylation pathways involving Tet proteins. Tet proteins catalyze 5mC oxidation to 5hmC, which can be further converted to 5fC and 5caC by Tet proteins or to 5-hydroxymethyluracil (5hmU) by AID/APOBEC deaminases (recent evidence suggests that 5hmC is an unlikely substrate for AID/APOBEC). 5mC can also be deaminated to thymine (T) by AID/APOBEC. 5fC, 5caC, 5hmU and T can then be excised by glycosylases (TDG and SMUG1) and replaced by unmodified cytosine (C) following base excision repair. As 5hmC, 5fC and 5caC are poorly recognized by Dnmt1, demethylation can also be achieved by passive dilution with DNA replication. In addition, Dnmt3a and Dnmt3b have been shown to function as dehydroxymethylases that directly convert 5hmC to C in vitro, and putative deformylases and decarboxylases could directly convert 5fC and 5caC, respectively, to C. Solid lines represent processes with relatively strong evidence and dashed lines represent processes that need to be further confirmed.

Several DNA replication-independent ‘active’ pathways have also been suggested. First, 5hmC can be further oxidized to 5fC and 5caC, which can be recognized and excised from DNA by thymine DNA glycosylase (TDG).20, 40, 41 The resulting abasic site could then be repaired by the base excision repair (BER) pathway, thus generating an unmodified cytosine. Another possibility is that deformylases or decarboxylases could convert 5fC and 5caC directly to unmodified cytosine, although whether such enzymes exist remains an open question. Second, the AID/APOBEC family of deaminases has been shown to deaminate 5hmC to 5-hydroxymethyluracil, which can then be excised by TDG and SMUG1, another DNA glycosylase, and replaced by cytosine through BER.42, 43 Deamination of 5mC by AID/APOBEC enzymes, resulting in a T:G mismatch leading to subsequent repair by TDG and BER, has also been implicated in DNA demethylation.43, 44, 45, 46 Third, a recent study provided in vitro evidence that Dnmt3a and Dnmt3b, in addition to their methyltransferase activity, function as dehydroxymethylases that convert 5hmC directly to cytosine. The methyltransferase and dehydroxymethylase activities seem to be regulated by the redox state of the enzymes. Reducing conditions (for example, the presence of DTT or β-mercaptoethanol) inhibit their dehydroxymethylase activity, whereas oxidizing conditions (for example, the presence of H2O2) inhibit their methyltransferase activity.47 The bacterial HhaI methyltransferase has also been shown to have dehydroxymethylase activity in vitro.48 Interestingly, a previous study suggests that Dnmt3a and Dnmt3b exhibit dual actions in mammalian cells, being involved in both CpG methylation and active demethylation at some loci (for example, the pS2 gene promoter), although the mechanism of demethylation was suspected to involve Dnmt3a/3b-mediated deamination of 5mC, TDG and BER.49 It would be interesting to revisit the mechanism of demethylation and determine whether the dehydroxymethylase activities of Dnmt3a and Dnmt3b are partly responsible.
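The proposed replication-independent routes can be summarized as a small directed graph, which makes it easy to list every chain of steps leading from 5mC back to unmodified cytosine. The sketch below is only an illustrative encoding of the pathways described in the text; the edge list, the labels and the function name enumerate_routes are assumptions introduced here, and the graph says nothing about how well supported each step is. Replication-dependent passive dilution is deliberately left out.

```python
# Each edge is (substrate, product, proposed activity), following the text.
# The encoding is hypothetical and carries no information about evidence.
EDGES = [
    ("5mC", "5hmC", "Tet oxidation"),
    ("5hmC", "5fC", "Tet oxidation"),
    ("5fC", "5caC", "Tet oxidation"),
    ("5fC", "C", "TDG excision + BER"),
    ("5caC", "C", "TDG excision + BER"),
    ("5fC", "C", "putative deformylase"),
    ("5caC", "C", "putative decarboxylase"),
    ("5hmC", "5hmU", "AID/APOBEC deamination"),
    ("5hmU", "C", "TDG/SMUG1 excision + BER"),
    ("5mC", "T", "AID/APOBEC deamination"),
    ("T", "C", "TDG excision + BER"),
    ("5hmC", "C", "Dnmt3a/3b dehydroxymethylase (in vitro)"),
]


def enumerate_routes(start="5mC", end="C"):
    """List every route from `start` to `end` as (substrate, activity, product) steps."""
    routes = []

    def walk(base, steps):
        if base == end:
            routes.append(steps)
            return
        # The graph has no cycles, so this depth-first walk always terminates.
        for substrate, product, activity in EDGES:
            if substrate == base:
                walk(product, steps + [(substrate, activity, product)])

    walk(start, [])
    return routes


if __name__ == "__main__":
    for route in enumerate_routes():
        print(" ; ".join(f"{s} -[{a}]-> {p}" for s, a, p in route))
```

With this edge list the enumeration yields seven routes, of which the shortest have two steps (for example, deamination of 5mC to T followed by TDG excision and BER).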

Tet proteins in demethylation in preimplantation embryos

Genome-wide analysis reveals that the male and female gametes have different levels of CpG methylation, with ∼90% in sperm and ∼40% in oocytes.50, 51 After fertilization, most DNA methylation marks inherited from gametes are erased during preimplantation development; exceptions include those associated with ICRs and intracisternal-A particles, which resist this wave of global demethylation.50, 51, 52, 53 The mechanisms by which the paternal and maternal genomes undergo demethylation are distinct. In the zygote, the male pronucleus, but not its female counterpart, undergoes rapid global loss of 5mC before the onset of DNA replication, suggesting an active mechanism.54, 55, 56 In contrast, the maternal genome is passively demethylated during cleavage divisions, presumably owing to the exclusion of Dnmt1, the maintenance DNA methyltransferase, from the nucleus.57

Recent studies using immunofluorescence revealed that, in the zygote, concomitant with the loss of 5mC signal in the male pronucleus, there is a dramatic increase in 5hmC, as well as 5fC and 5caC, thus suggesting Tet-mediated 5mC oxidation.58, 59, 60, 61 Tet3, but not Tet1 and Tet2, is highly expressed in oocytes and zygotes.36, 58, 59 Indeed, depletion of maternal Tet3 blocks the conversion of 5mC to 5hmC in the male pronucleus in the zygote.36, 58 5mC oxidation seems to be a key step in the erasure of paternal methylation marks, as Tet3 deficiency inhibits demethylation of paternal genes.36 Although BER has been proposed to be involved in active demethylation in preimplantation embryos,62, 63 5hmC, 5fC and 5caC do not appear to be rapidly replaced by unmodified cytosine. Instead, they persist in the paternal genome and gradually decline during cleavage divisions.36, 59, 60, 61 These results suggest that, although 5hmC, 5fC and 5caC are generated in the zygote by an enzyme-catalyzed process, their loss during preimplantation development is primarily through a DNA replication-coupled passive process (Figure 3).

Figure 3

DNA demethylation during preimplantation development. Shortly after fertilization, paternal 5mC is rapidly oxidized by Tet3. The resulting 5hmC, as well as maternal 5mC, gradually declines during subsequent cleavage divisions primarily through passive dilution. After implantation, Dnmt3a/3b-mediated de novo methylation occurs to establish lineage-specific methylation patterns.

Although the maternal and paternal genomes are exposed to an identical environment in the zygote, the maternal genome is protected from Tet3-mediated 5mC oxidation. PGC7 (also known as Stella and Dppa3), a maternal factor, has recently been shown to be required for this protection. Depletion of maternal PGC7 results in conversion of 5mC to 5hmC in both the male and female pronuclei.58 Consistent with this finding, a previous study showed that PGC7 protects the maternal genome from demethylation in early embryos.64 In normal zygotes, Tet3 is enriched and preferentially associated with the male pronucleus.36, 65 PGC7 seems to bind histone H3K9me2, which is abundant in the maternal chromatin but absent in the paternal chromatin with the exception of some imprinted loci, and inhibit Tet3 binding to the maternal chromatin and paternally imprinted loci.65

Reprogramming of the parental genomes in early embryos is believed to be important for the establishment of totipotency, but its biological significance remains largely unknown. Embryos conceived from Tet3-depleted oocytes implant normally but show high frequency of degeneration and morphological abnormalities, starting from mid-gestation, with only ∼20% surviving to term,36 whereas embryos without PGC7 show preimplantation defects and rarely reach the blastocyst stage.66 These findings support the notion that epigenetic reprogramming is crucial for embryonic development, although the developmental phenotypes observed may not be entirely attributable to defects in epigenetic reprogramming.

Tet proteins in demethylation in PGCs

In mice, PGCs are specified around embryonic day (E) 7.25 in the epiblast of the developing embryo, with the involvement of bone morphogenetic protein (BMP) signaling and the transcription factors BLIMP1 and PRDM14.67 Shortly afterward, PGCs begin migrating along the embryonic–extraembryonic interface and eventually arrive at the genital ridge, mostly by E11.5.68 PGCs initially have similar epigenetic marks as other epiblast cells, including significant levels of DNA methylation,69, 70 and thus need to be reprogrammed to generate an epigenome for the development of germ cells.

Previous studies indicated that, during their migration, PGCs undergo genome-wide demethylation including the erasure of DNA methylation marks at ICRs.71, 72 Exceptions include intracisternal-A particles and other active retrotransposons, which appear to be resistant to complete demethylation.71, 72, 73 In the past several years, a number of groups have generated genome-wide high-resolution DNA methylation profiles of PGCs at different stages of the reprogramming process.45, 73, 74, 75, 76, 77 These studies reveal that PGCs undergo demethylation in two phases. The first phase occurs during PGC expansion and migration from ∼E8.5, and involves global demethylation affecting sequences of almost all genomic features. Passive demethylation may have a major role in this phase of genome-wide loss of methylation, as Dnmt3a and Dnmt3b, as well as Uhrf1 (also known as NP95), an essential factor for Dnmt1 function, are repressed in PGCs.78, 79 The second phase occurs from E9.5 to E13.5 and affects specific loci including ICRs, CpG islands (CGIs) on the X chromosome and germline-specific genes.73, 74, 75, 76

Recent studies provide evidence for the involvement of Tet-mediated 5mC oxidation in demethylation in PGCs.37, 75, 76, 80, 81, 82 Tet1 and Tet2 are expressed in PGCs between E9.25 and E11.5, but Tet3 is undetectable in PGCs.75, 80, 81 Hackett et al.75 and Yamaguchi et al.76 used immunofluorescence to analyze PGCs at various time points and showed that both 5mC and 5hmC levels are low at E8.5, 5hmC levels begin to increase between E9.5 and E10.5, peak at ∼E11.5 and then gradually decline from E11.5 to E13.5. Genetic studies reveal that deficiency for Tet1 or both Tet1 and Tet2 has no effect on global demethylation in PGCs, but results in defective demethylation and altered expression of specific genes including meiotic genes and imprinted genes.37, 80 These results suggest that Tet1 and Tet2 are responsible for the production of 5hmC in PGCs and that 5hmC enrichment is followed by replication-coupled dilution. Using PGCs differentiated from wild-type or Tet1- and Tet2-depleted ES cells in vitro, Vincent et al.81 showed that, in the absence of Tet1 and Tet2, the first phase of global demethylation is unaffected but numerous promoters and gene bodies become hypermethylated. Taken together, these findings suggest that Tet-mediated conversion of 5mC to 5hmC is mainly involved in the second phase of demethylation in PGCs including the erasure of imprints at ICRs (Figure 4).

Figure 4

DNA demethylation in primordial germ cells (PGCs). DNA demethylation in PGCs occurs in two phases. The first phase involves global demethylation. The second phase affects specific loci including imprinting control regions. Tet1/Tet2-mediated 5mC oxidation occurs mainly in the second phase, and 5hmC enrichment is followed by gradual decline at a rate consistent with passive dilution.

Previous studies have shown that embryonic germ cells (EGCs) have the capacity to reprogram somatic genomes, including the erasure of imprints at ICRs, in hybrid cells.83 Recently, Piccolo et al.82 used this system to address the requirement of Tet1 and Tet2 in EGC-induced pluripotent reprogramming. Intriguingly, Tet2 induces 5mC oxidation at pluripotent genes (for example, Oct4), as well as expression of these genes, and is thus required for the efficient reprogramming capacity of EGCs, whereas Tet1 is necessary to induce 5mC oxidation specifically at ICRs. These results suggest that Tet1 and Tet2 may have distinct genomic targets.

Despite the participation of Tet1 and Tet2 in epigenetic reprogramming in PGCs, these enzymes do not seem to be essential for germ cell development and fertility. Mice deficient for either Tet1 or Tet2 or both are viable and fertile, although Tet1-null and Tet1/Tet2-double-null female mice show reduced fertility owing to meiotic defects.32, 35, 37, 80, 84

Tet proteins in embryonic and postnatal development

Tet1, Tet2 and Tet3 show different expression patterns. Tet1 and Tet2 are highly expressed in the inner cell mass of mouse blastocysts, as well as in ES cells (which are derived from inner cell mass), whereas Tet3 is highly expressed in mouse oocytes and zygotes.16, 58, 59 Upon differentiation of mouse ES cells, Tet1 and Tet2 are rapidly downregulated and Tet3 is upregulated.15, 16, 31 Tet2 and Tet3 also appear to be widely expressed, at various levels, in adult tissues.16 Consistent with the distinct expression patterns of Tet proteins, recent genetic studies indicate that these enzymes have different functions in mammalian development.

Multiple groups have reported that depletion of Tet1 in mouse ES cells results in 5hmC reduction, alterations in gene expression and defects in self-renewal or differentiation.16, 29, 30, 31, 32 However, Tet1-null mice show no overt developmental abnormalities, although some mutant mice are slightly smaller at birth.32, 80 Several Tet2-mutant alleles have been generated. Tet2-null mice develop normally and are fertile.35, 84 However, Tet2 deletion, either systemically or in the hematopoietic compartment, results in hematological phenotypes in adult animals characterized by progressive enlargement of the hematopoietic stem cell pool and eventual myeloid malignancies.33, 34, 35, 84 Tet2 expression is ubiquitous in the hematopoietic compartment, including in the stem and progenitor subsets and in mature myeloid and lymphoid cells.33, 35 Although the molecular mechanisms by which Tet2 deficiency leads to the hematological phenotypes remain to be elucidated, Tet2-null mice show decreased levels of 5hmC and concurrent increased levels of 5mC in bone marrow and spleen.33, 34, 35 Tet2 could regulate genes important for hematopoiesis by modulating DNA methylation. Consistent with the observed phenotypes in Tet2-deficient mice, TET2 is frequently mutated in patients with various myeloid malignancies, such as myelodysplastic syndromes, myeloproliferative neoplasms, chronic myelomonocytic leukemia, acute myeloid leukemia and secondary acute myeloid leukemia.85 The fact that Tet2 deficiency in mice recapitulates the major phenotypes in human patients suggests that TET2 mutations are driver mutations in hematological malignancies.

Tet1 and Tet2 seem to have partially redundant functions in embryonic development. Although a fraction of Tet1/Tet2 double knockout (DKO) mice are viable and fertile, some DKO embryos exhibit mid-gestation abnormalities and most DKO animals die perinatally with a variety of malformations, such as exencephaly, hemorrhage in the head and profound growth retardation. Tet3 is upregulated in DKO mice, suggesting that compensation by Tet3 may contribute to the viability of DKO mice.37 Systemic deletion of Tet3 leads to neonatal lethality, and maternal deletion impairs reprogramming in the zygote.36 Given the wide expression of Tet3 in somatic tissues,16 it would be interesting to determine the function of Tet3 in adult animals.

Concluding remarks

Since the discovery in 2009 that Tet proteins can convert 5mC to 5hmC,15 tremendous progress has been made in understanding the functions of these enzymes and their products (5hmC, 5fC and 5caC). It is now widely accepted that 5hmC, 5fC and 5caC serve as intermediates in the process of DNA demethylation. Genetic studies in mice have confirmed the involvement of distinct Tet proteins in demethylation in the zygote and PGCs. Despite the progress, complete models of DNA demethylation in various physiological contexts remain to be assembled. One of the challenges is that multiple mechanisms seem to work cooperatively to achieve demethylation, and the relative contributions of these mechanisms and how they are orchestrated need to be clarified. Some of the proposed mechanisms may not be relevant or significant. For instance, recent biochemical studies suggest that 5hmC is an unlikely substrate for the AID/APOBEC family of deaminases.86, 87 Emerging evidence suggests that 5hmC, in addition to its role in DNA demethylation, may function as a stable epigenetic mark. Indeed, recent studies have identified several 5hmC-specific ‘readers’ including MBD3, MeCP2, Uhrf1 and Uhrf2.88, 89, 90, 91 Further work is needed to determine the significance of 5hmC in regulating chromatin structure and function, including gene expression, and the mechanism by which 5hmC is maintained. Tet proteins are large molecules and may have other functions, some of which may be independent of their enzymatic activities. For example, several recent reports show that Tet proteins interact with O-GlcNAc transferase and promote histone O-GlcNAcylation.92, 93, 94 Another area of intense research is the role of Tet proteins and 5hmC in cancer. We expect to see exciting discoveries addressing these issues in the coming years.