5-hydroxymethylcytosine Marks Mammalian Origins Acting as a Barrier to Replication

In most mammalian cells, DNA replication occurs once, and only once, between cell divisions. Replication initiation is a highly regulated process with redundant mechanisms that prevent errant initiation events. In lower eukaryotes, replication is initiated from a defined consensus sequence, whereas a consensus sequence delineating mammalian origins of replication has not been identified. Here we show that 5-hydroxymethylcytosine (5hmC) is present at mammalian replication origins. Our data support the hypothesis that 5hmC has a role in cell cycle regulation. We show that 5hmC level is inversely proportional to proliferation; indeed, 5hmC negatively influences cell division by increasing the time a cell resides in G1. Our data suggest that 5hmC recruits replication-licensing factors and is then removed prior to or during origin firing. We further propose that TET2, the enzyme catalyzing 5mC to 5hmC conversion, acts as a barrier to rereplication. In a broader context, our results significantly advance the understanding of 5hmC involvement in cell proliferation and disease states.

rereplication: (i) Geminin binds and inhibits CDT1 activity 21 , (ii) CDT1 is degraded 22 , and (iii) CDC6 is exported from the nucleus 23 . Inappropriate expression of CDT1, CDC6, and/or Geminin is sufficient to cause genomic rereplication, often resulting in polyploidy 24 . Interestingly, dividing mammalian cells assemble significantly more potential origins than are used during genome duplication. While the cause for this phenomenon is unknown, some groups have speculated that multiple inactive origins are licensed to ensure genome integrity during genetically harmful events.
Replication is initiated at specific locations in the mammalian genome; however, no mammalian replication origin consensus sequence is known. Mammalian origins are over-represented in GC-rich regions, including CpG islands [25][26][27][28] and, interestingly, mammalian origins of replication also appear to be heritable from one cell division to the next. Based on this and other evidence, many researchers have linked replication origins to epigenetic factors including DNA and chromatin modifications 2,28-31 . Among DNA modifications, 5-methylcytosine (5mC) is present at some, but not all, replication origins 32 . Mammalian replication origins positively correlate with regions of active transcription [33][34][35] and, in parallel, the newly identified base 5-hydroxymethylcytosine (5hmC) correlates with regions of active transcription [36][37][38] .
5-hydroxymethylcytosine is a relatively recently discovered DNA base in mammalian cells 36,39 . While the function of 5hmC is unclear, its presence at promoters has been correlated with transcription [40][41][42][43] . 5hmC is the product of enzyme-catalyzed oxidation of 5-methylcytosine (5mC). Presently, three enzymes have been described to catalyze this oxidation reaction: Tet1, Tet2, and Tet3 44 . Initial reports suggest that 5hmC is involved in a variety of cellular processes, including stem cell pluripotency, tumorigenesis, cell fate, embryogenesis and differentiation [45][46][47][48] . 5hmC has been suggested to affect these processes through transcriptional regulation.
5hmC is widely distributed throughout the mammalian genome and is enriched to the greatest extent within gene bodies and to a relatively lesser degree at promoters [49][50][51][52] . In biochemical assays the presence of 5hmC at promoters strongly inhibits transcription, while the presence of 5hmC within gene bodies has almost no effect on transcription 53 . In vivo assays that correlate 5hmC with transcription have produced mixed results: high 5hmC levels at a subset of mammalian promoters correlate with increased transcription, at other promoters high 5hmC levels correlate with reduced gene expression 36,38,44,54,55 . As 5hmC is present throughout the genome, its presence at non-promoter regions suggests an alternative function for this highly modified base.
Since (i) 5hmC is found within gene bodies, (ii) the presence of 5hmC within gene bodies has almost no detectable effect on transcription, (iii) transcription coincides with active origins of replication, and (iv) both 5hmC and origins of replication appear to be heritable, we hypothesized that 5hmC could play a role in replication.
In this manuscript, we demonstrate the role of 5hmC in cell cycle regulation. We show that selected replication licensing components bind to 5hmC-modified DNA and we propose that 5hmC marks replication origins. Our data indicate that 5hmC is globally enriched at replication origins and that 5hmC is depleted at recently fired origins. Global 5hmC levels are inversely correlated with proliferation; indeed, cell cycle analysis demonstrates that high 5hmC levels significantly increase the time a cell spends in G1 phase. Taken together, our data provide a better understanding of 5hmC involvement in cell proliferation and disease states.

Results
Proteins involved in genome maintenance and cell cycle bind preferentially to 5-hydroxymethylcytosine-modified DNA. Beads coated with unmodified or 5hmC-modified DNA substrates were incubated with HeLa nuclear extracts (Supplemental Fig. S1). Substrates were recovered and weak DNA binding proteins were washed from the sample. Notably, the binding and wash buffers contain EDTA, which inhibits a 5hmC-specific nuclease 56 . As equal portions of unmodified DNA and 5hmC-modified DNA were loaded onto the beads (Supplemental Fig. S2A) with similar recovery efficiencies (Supplemental Fig. S2B), the beads did not introduce experimental bias. Proteins bound to unmodified or 5hmC-modified DNA were identified through Electrospray and MALDI Mass Spectrometry (Supplemental Fig. S3; Table S1). Selected window of protein masses allowed us to eliminate degradation products that produced confounding results. We identified 100 distinct proteins that interact with unmodified DNA (Supplemental Fig. S4, and Table S2) and 125 distinct proteins that interact with 5hmC-modified DNA. Proteins that interacted with both the unmodified and 5hmC-modified substrates (Supplemental Fig. S5 and Table S3) were eliminated from the analysis, yielding 48 proteins that specifically interact with 5hmC-modified DNA (Fig. 1A). These proteins were analyzed using a gene ontology over-representation test 57 . Proteins that appeared at the top of our over-representation list included proteins involved in mitosis and chromosome organization (Fig. 1B and Supplemental Table S4). Interestingly, proteins involved in transcription were neither over-represented nor under-represented in this sample. Our results suggest that proteins that interact with 5hmC may be involved in mitosis or maintenance of chromosome integrity.
The MCM2-7 helicase binds to 5hmC-modified DNA in vitro and in vivo. Our top hit observed on the 5hmC binding protein gene ontology over-representation test was "DNA duplex unwinding" including the MCM3 protein. Since MCM3 is a subunit of the hexameric MCM2-7 helicase, we speculated that all components of this helicase would interact more strongly with 5hmC-modified DNA. While only MCM3 was observed in our initial screen, we found that all 6 subunits of the replicative helicase, MCM2-7, interacted with 5hmC-modified DNA. We were unable to detect an interaction between any of the MCM2-7 subunits and unmodified DNA ( Fig. 2A). In order to validate the pull-down efficiency, a positive control previously identified by mass spectrometry was included. As expected, the DNA mismatch repair protein MSH2, was shown to bind both unmodified and 5hmC-modified DNA ( Fig. 2A).
As compared with input, DNA immunoprecipitated from mouse embryonic stem (mES) cells with MCM2 antiserum showed a greater than 5-fold enrichment of 5hmC, suggesting that 5hmC and the MCM2-7 helicase interact in vivo (Fig. 2B). In addition, we show that all MCM2-7 subunits co-immunoprecipitate with DNA enriched with 5hmC; however, only MCM2, MCM4 and MCM6 showed a statistically significant increase in 5hmC compared with the input control (Supplemental Fig. S6A). Notably, DNA that co-immunoprecipitates with MCM4 is also enriched with 5mC (Supplemental Fig. S6B).
Origins are enriched for 5hmC, while active origins are 5hmC depleted. The MCM2-7 helicase is not only a replicative helicase but also the principal component in replication licensing, ensuring that the genome replicates once and only once. MCM2-7 helicase is present not only at active origins but also at inactive origins. Based on the preferential binding of MCM2-7 to 5hmC-modified DNA, we hypothesized that 5hmC was present at both inactive and active origins. As we have shown, 5hmC is enriched at DNA regions occupied by MCM2 (Fig. 2B), demonstrating that 5hmC occupies origins in vivo. This result was verified by intersecting 5hmC-enriched genomic regions with origin of replication sequences 25,58 (data sets are described in materials and methods) (Fig. 2C). Further clarifying this finding, we included heat maps illustrating 5mC and 5hmC signal around replication origins (Fig. 2E,F) 25,58 . Figure 2F supports the hypothesis that 5hmC co-occurs with selected replication origins. Recently fired origins of replication can be identified by the incorporation of BrdU into the nascent strand followed by a BrdU pull-down of size-fractionated nascent strand (Supplemental Fig. S7) 25 . Using this method, we observed a strong depletion of 5hmC in the nascent strand as measured by mass spectrometry (Fig. 2D). At the same time, 5mC was slightly and non-significantly reduced. Our mass spectrometry results also indicate that the newly replicated DNA, which contains a fired origin of replication, is relatively GC rich (Supplemental Fig. S8A); the absolute 5mC and 5hmC base compositions at origins are shown in Supplemental Fig. S8B,C. These results suggest that (i) 5hmC is globally enriched at replication origins and (ii) 5hmC is depleted at fired origins.
5hmC content is proportional to doubling time.
Since we found that 5hmC interacts with several components of the replication licensing machinery and that 5hmC is present at origins of replication, we hypothesized that 5hmC may have an effect on the cell cycle. We evaluated this hypothesis by creating HeLa cell lines that stably express either YFP (Control), a Tet2 CD:YFP fusion (Tet2 CD), or a Tet2 CD:YFP fusion mutant (Tet2 CD H1295Y/D1297A) that has been previously reported 59,60 to be catalytically inactive (Tet2 CD/CI). The Tet2 CD and Tet2 CD/CI cell lines express more exogenous Tet2 CD, as shown in Supplemental Fig. S9A. As expected, the Tet2 CD stable HeLa cell line has a significant increase in 5hmC compared to the control cell line (Supplemental Fig. S9B). Surprisingly, we found that the HeLa cell line expressing the catalytically inactive Tet2 CD mutant also had increased cellular 5hmC content (Supplemental Fig. S9B). This finding leads us to conclude that the Tet2 CD H1295Y/D1297A (Tet2 CD/CI) mutations do not completely abolish Tet activity but rather reduce the catalytic activity of Tet2. All cell lines maintain similar 5mC levels (Supplemental Fig. S9B). We measured the proliferative properties of each of these cell lines in both asynchronous and synchronized cultures. In asynchronous cultures, we found that the doubling time of HeLa cells expressing the Tet2 CD:YFP fusion protein was significantly longer (18.11 ± 1.34 hrs) than that of the control HeLa cell line (15.61 ± 1.45 hrs; p-value = 0.045). The HeLa cell line expressing the Tet2 CD/CI mutant, which we show retains catalytic activity, also demonstrated a cell cycle delay (18.78 ± 0.84 hrs; p-value = 0.009) (Fig. 3A-D).
Figure caption (fragment): Proteins that bound to each substrate were resolved using SDS-PAGE and identified by mass spectrometry. A significant fraction of proteins interacted with both unmodified DNA and 5hmC-modified DNA. (B) Proteins that interacted exclusively with 5hmC-modified DNA were subjected to a Panther Gene Ontology over-representation test. Results of the over-representation test are displayed as fold above expected for a random protein population.
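How the doubling times above were computed is not detailed in this section. A common approach, assumed here purely for illustration, treats growth between two cell counts as exponential, so that the doubling time is t · ln 2 / ln(N_t / N_0). The function name and the cell counts in this sketch are hypothetical.

```python
import math

def doubling_time(n0: float, nt: float, hours: float) -> float:
    """Doubling time in hours, assuming pure exponential growth from n0 to nt."""
    if n0 <= 0 or nt <= n0 or hours <= 0:
        raise ValueError("need 0 < n0 < nt and hours > 0")
    return hours * math.log(2) / math.log(nt / n0)

# Hypothetical counts (not measured data): 1e5 cells grow to 8e5 cells in 48 h,
# i.e. three doublings.
example_td = doubling_time(1e5, 8e5, 48.0)
print(round(example_td, 2))
```

A longer doubling time computed this way corresponds to slower proliferation, which is the direction of the effect reported for the Tet2 CD and Tet2 CD/CI lines.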
Because cells expressing either Tet2 CD or Tet2 CD/CI were dividing more slowly, we anticipated a delay in the cell cycle. Using flow cytometric analysis to compare DNA content (propidium iodide, PI) and recently synthesized DNA (BrdU), we were able to determine the fraction of cells in G1, S, or G2/M phases of the cell cycle (Fig. 3A). We found that cells expressing Tet2 CD were delayed in G1 phase (47.25 ± 2.56% of cells) compared to control cell lines (28.67 ± 2.37%; p-value < 0.001) (Fig. 3B). While spending similar, yet statistically different, amounts of time in S and G2/M phases of the cell cycle, cells expressing the Tet2 CD spent 7.57 ± 0.49 hrs in G1 phase, about 1.6 times as long as cells that only expressed the YFP control plasmid (4.59 ± 0.38 hrs; p-value < 0.001) (Fig. 3C). As the Tet2 CD/CI mutant expressed a partially functional copy of Tet2, we observed a similar delay in G1 phase (7.32 ± 0.24 hrs; p-value < 0.001) compared to the control. Values are summarized in Supplemental Table S5.
To confirm our observations, we synchronized cells in G1/S phase using a double thymidine block and in G2/M phase using a nocodazole block. Control cells released from a double thymidine block progressed to G2 phase more rapidly than cells that expressed either variant of Tet2 CD (Fig. 3D,E). These cells also showed a delay in reaching G2/M phase of the cell cycle (Fig. 3D,F). After the release from nocodazole block, cell population that expressed Tet2 CD were significantly delayed in G1 phase compared to the control population (Supplemental Fig. S10A,B). Through the cell cycle analysis of synchronized populations, we confirm that Tet2 CD expressing cells are delayed in G1 phase and that these cells have an overall cell cycle delay.
www.nature.com/scientificreports
We were also interested in determining if 5hmC levels vary throughout the cell cycle. With this in mind, we evaluated the quantity of 5hmC in G1, S and G2/M phases in HeLa control cells and HeLa cells expressing Tet2 CD. Our results suggest that 5hmC quantity increases throughout the cell cycle (Fig. 3G). However, because the DNA content is also increasing throughout the cell cycle, these results indicate that 5hmC as a fraction of DNA content peaks in G1 phase of the cell cycle and is overall reduced as the cell progresses through S and G2/M phases (Data summarized in Supplemental Fig. S10C). This finding is consistent with the hypothesis that 5hmC is removed prior to or during origin firing.
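The distinction drawn above, between total 5hmC per cell and 5hmC per unit of DNA, can be made concrete with a small normalization. All numbers below are invented placeholders, and the relative DNA contents (1.0 in G1, about 1.5 in mid S, 2.0 in G2/M) are a simplifying assumption rather than measured values.

```python
# Hypothetical relative values, for illustration only (not measured data).
total_5hmc = {"G1": 1.00, "S": 1.20, "G2/M": 1.50}  # 5hmC per cell, arbitrary units
dna_content = {"G1": 1.0, "S": 1.5, "G2/M": 2.0}    # assumed DNA per cell, relative to G1

def per_dna(total: dict, dna: dict) -> dict:
    """Divide the per-cell 5hmC signal by the per-cell DNA content for each phase."""
    return {phase: total[phase] / dna[phase] for phase in total}

fraction = per_dna(total_5hmc, dna_content)
for phase, value in fraction.items():
    print(f"{phase}: {value:.2f}")
```

With these placeholder inputs the per-cell signal rises from G1 to G2/M while the DNA-normalized signal falls, which is the pattern described in the paragraph above.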
We hypothesized that 5hmC-induced G1 delay could be mediated by the upregulation of cyclin dependent kinase inhibitors or p53. We found no evidence that p53 nor any cyclin dependent kinase inhibitor (p15 INK4b ,  p16 INK4a , p18 INK4c , p19 INK4d , p21 CIP1 , p27 KIP1 ) was upregulated in the cell lines that overexpress Tet2 CD or Tet2 CD/CI (Fig. 3H). This result suggests that over-production of 5hmC causes a G1 delay through a mechanism that is unrelated to cyclin dependent kinase activity.
Taken together, these data suggest that 5hmC increases the time required for a cell to divide by retarding the cell cycle; the most pronounced delay from overabundance of Tet2 or 5hmC occurs in G1 phase of the cell cycle.
MCM2 chromatin occupancy mirrors 5hmC levels throughout the cell cycle. The preferential binding of the pre-replication complex to 5hmC-modified DNA, together with the cell cycle analysis, led us to speculate that 5hmC directs the assembly of the pre-replication complex. We expected that cells with elevated 5hmC levels would have increased chromatin occupancy of MCM2-7 in G1 phase of the cell cycle and, at the same time, that MCM2-7 chromatin occupancy would mirror 5hmC levels (Fig. 3G) in the various cell cycle phases.
To measure MCM2 chromatin occupancy throughout the cell cycle, cell lines were fixed and permeabilized, followed by extraction of soluble proteins. Chromatin-bound MCM2 was antibody-stained and evaluated in each phase of the cell cycle using flow cytometry. Consistent with our hypothesis, MCM2 chromatin occupancy was increased in G1 phase and decreased in G2/M phase (Supplemental Fig. S11A-C), and this result mirrors the 5hmC genomic content (Fig. 3G). As expected, MCM2 chromatin occupancy is increased in cell lines with elevated 5hmC levels. These findings are consistent with the hypothesis that 5hmC binds MCM2 and prevents its release, and they allow us to hypothesize that elevated 5hmC levels retard the cell cycle by causing a G1 delay or arrest. In line with this hypothesis, an increase in genomic 5hmC levels could lead to a more prominent association with MCM2, which in turn would be reflected in G1 length.
Elevated levels of 5hmC prevent rereplication. Consistent with previous studies, HeLa cells display high prevalence of polyploid cells 61 . Interestingly, during cell cycle analysis we observed that cells with elevated 5hmC levels have a significant reduction in the population of polyploid cells (Fig. 3A,B). This observation led us to speculate that Tet2 CD and/or 5hmC could be a barrier to genomic rereplication.
It is well known that overexpression of CDT1 in p53-deficient cells results in substantial genomic rereplication 24 . We observed a similar effect in p53-deficient H1299 cells. In our hands, H1299 cells have a > 4n population of 1.5% (Fig. 4A), whereas when CDT1 is exogenously overexpressed 55.7% of cells are polyploid (Fig. 4B). The overexpression of the TET2 CD in H1299 cells resulted in a significant reduction of polyploidy, yielding a 4-fold decrease in the polyploid cell population compared to the control cell line (Fig. 4A,C). TET2 CD is able to partially rescue the rereplication phenotype induced by CDT1 overexpression: 55.7% of cells are polyploid with CDT1 alone versus 29.4% when TET2 CD is co-expressed with CDT1 (Fig. 4B,D). As expected, H1299 cells also experience a G1 delay when TET2 CD is overexpressed, supporting our previous results.
High levels of 5hmC inversely correlate with organ proliferation rates. As we found that elevated 5hmC levels delay progression through G1 phase, thus reducing cell division, we wanted to see if this phenomenon was also apparent in vivo. We analyzed the 5hmC content in DNA isolated from mouse spleen, testis, liver, kidney, lung, heart and brain. We compared 5hmC levels with proliferation rate as measured by MKi-67 expression. We found a compelling linear relationship between doubling time (1/MKi-67 expression) and 5hmC levels (5hmC vs. 1/MKi-67 expression: R² = 0.826; Spearman's Rank Correlation Coefficient r_s = 0.857, p-value = 0.0137), showing that lower levels of 5hmC are indicative of increased proliferation (Fig. 5A). Our data suggest that 5mC content correlates poorly with doubling time (R² = 0.138; Spearman's Rank Correlation Coefficient r_s = −0.286, p-value = 0.535) (Fig. 5B).
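The two statistics quoted above, the coefficient of determination of a linear fit and Spearman's rank correlation, can be computed as in the following sketch. The tissue values are invented placeholders, chosen only so that the seven rank pairs differ by a summed squared rank difference of 8, and the function names are illustrative. The sketch uses the standard library only and does not reproduce the reported p-values.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient; its square is R^2 for a simple linear fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Placeholder data for seven tissues (NOT the measured values).
inverse_ki67 = [0.5, 0.8, 1.0, 1.6, 2.2, 3.0, 4.1]        # proxy for doubling time
percent_5hmc = [0.20, 0.12, 0.15, 0.25, 0.33, 0.58, 0.40]

r_squared = pearson_r(inverse_ki67, percent_5hmc) ** 2
rs = spearman_rs(inverse_ki67, percent_5hmc)
print(f"R^2 = {r_squared:.3f}, r_s = {rs:.3f}")
```

For n = 7 untied observations, r_s = 1 − 6Σd²/(n(n² − 1)), so a value of 0.857 corresponds to Σd² = 8 and a value of −0.286 to Σd² = 72, which matches a comparison across seven tissues.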
Indeed, the brain, composed of cells that proliferate relatively slowly, had the highest quantities of 5hmC (0.584 ± 0.051% 5hmC). Most cells in the brain are not actively dividing; according to our hypothesis, we expect high 5hmC levels in brain. Interestingly, some brain regions are actively dividing, which we expect to have low 5hmC levels. Consistent with our hypothesis, the neural progenitor cells in the dentate gyrus incorporated relatively high levels of BrdU but were almost devoid of 5hmC, while the non-proliferative neurons did not incorporate BrdU but had high 5hmC levels. Cortex sections composed of a homogenous population of non-dividing neurons showed a uniformly high level of 5hmC (Fig. 5C).
A similar situation was observed in the testis. Overall, testis DNA has low 5hmC levels (0.131 ± 0.007% 5hmC) and proliferates rapidly. Observation of testis sections revealed that slowly or non-dividing Sertoli cells contained high 5hmC levels, while the more rapidly dividing cells within the seminiferous tubules had low 5hmC levels (Fig. 5C).
DNA isolated from the slowly dividing heart tissue showed increased 5hmC levels (0.380 ± 0.037% 5hmC) and relatively low Ki-67 expression. Composed of slowly dividing cells, immunofluorescent images of heart sections showed homogenously high 5hmC levels (Fig. 5C).
These data strongly suggest rapidly dividing cells have low 5hmC content and non-dividing cells have elevated quantities of 5hmC.

Discussion
Mammalian cells have multiple origins of replication. While each cell has many more origins of replication than required for accurate duplication of the genome, in healthy cells any given origin of replication should not initiate replication more than once prior to division. An origin firing more frequently would lead to polyploidy or other replication problems. To achieve a single replication event, each active origin of replication becomes "licensed" for replication when certain factors are present at the origin. Among others, these factors include ORC1-6, CDT1, CDC6, and the MCM2-7 helicase. With some deviations, eukaryotic replication licensing is a highly conserved process. S. cerevisiae has a well-defined origin of replication sequence where replication licensing factors assemble. Interestingly, no mammalian origin of replication consensus sequence has been identified. Several groups have speculated that DNA methylation or histone modifications may signal the assembly of the replication licensing machinery 25,33,[62][63][64][65][66][67][68][69][70] . Even if a limited relationship exists between DNA methylation or histone modifications and replication licensing, no specific histone modification or DNA methylation pattern is definitive of replication origins.
Our results suggest that 5hmC is a common feature of replication origins that may direct pre-RC assembly. We find that the licensing factor and replicative helicase MCM2-7 binds preferentially to 5hmC-modified residues rather than unmodified residues. 5hmC pull-down sequencing data intersected with origin of replication tiling arrays show that 5hmC is a characteristic component of replication origins; we further suggest that 5hmC is removed prior to or during origin firing. In line with these findings, our data suggest that 5hmC is involved in cell cycle regulation and origin assembly.
Indeed, cells that have the highest 5hmC levels are also proliferating the least. Consistent with this finding, we show that increased 5hmC levels slow the cell's departure from G1. We demonstrate that 5hmC levels do not significantly vary throughout the cell cycle; origins are enriched for 5hmC and 5hmC is depleted at the fired origin. This led us to speculate that 5hmC must be removed from DNA prior to or during origin firing and is deposited immediately after newly replicated DNA is synthesized. We suggest that 5hmC is deposited rapidly on both mother and daughter strands after the replication fork passes beyond the origin of replication. We find this to be an attractive model since Tet1 was recently shown to interact with PCNA at the replication fork 71,72 . The notion that 5hmC is deposited after replication is consistent with our data and with recent reports suggesting that the cellular 5hmC level relative to DNA content (5hmC/total DNA) is slightly decreased 73 throughout the cell cycle. The most striking evidence linking 5hmC to replication is our finding that TET2 CD overexpression strongly reduces the rereplication phenotype caused by CDT1 overexpression.
Several groups have provided evidence that supports a role for 5hmC in transcriptional regulation. These reports are not only compatible with our findings but also indirectly support our hypothesis, given that active origins of replication strongly correlate with genomic regions harboring active transcription. Moreover, we demonstrate that the genomic distribution of origins of replication overlaps with that of 5hmC.

Conclusions
Consistent with our results and previous publications, we propose the following model: (1) 5hmC marks the location of the origin of replication; in G1 phase, replication licensing factors are recruited to origins of replication, (2) as the cell progresses through S-phase, 5hmC is oxidized further, removed, or replaced with 5mC at active replication origins, (3) the origin fires, (4) 5hmC is deposited on the nascent and parental DNA strands (Model shown in Fig. 6).
As compared to the tissue of origin, cancer genomes have reduced 5hmC levels 74,75 . Furthermore, within any given cancer type the 5hmC levels have prognostic value [76][77][78] : cancers with higher 5hmC levels have better outcomes than cancers with low 5hmC levels. As cancer is broadly characterized by unrestricted cell growth and division, our model, which suggests that 5hmC maintains cells in G1/G0, has potential utility in cancer diagnostics and prognostics. Our findings and the proposed model describe a general mechanism that explains both 5hmC levels and 5hmC genomic position.

Methods
Cell culture. HeLa cells were maintained in a 5% CO2, humidified, water-jacketed incubator at 37 °C. HeLa cells were grown in DMEM supplemented with 10% Fetal Bovine Serum, 2 mM glutaMAX, 100 U/ml penicillin, and 100 U/ml streptomycin. Cell lines stably expressing YFP, Tet2 CD:YFP, and Tet2 CD/CI:YFP fusion were grown in the same medium supplemented with 1 mg/ml G418 (Constructs shown in Supplemental Table S6). Cells were passaged as previously described 79 [...]
[...] Samples were boiled and resolved using SDS-PAGE. Samples were analyzed either by mass spectrometry or by Western analysis. Samples to be used in mass spectrometric analyses were resolved on a 4-20% acrylamide gel, while samples for Western analysis were resolved on a 10% acrylamide gel.
Protein mass spectrometry. The gel resulting from 4-20% SDS-PAGE was fixed in a solution of 10% methanol/7% acetic acid for 30 minutes at 25 °C. The gel was then stained with SyproRuby Protein Stain (ThermoFisher cat. nr. S12000) at 25 °C overnight. The gel was washed in 10% methanol/7% acetic acid and imaged using a UV light source. Two bands resolved on 4-20% SDS-PAGE were excised from each lane. The bands contained proteins with molecular weights between 40 and 120 kDa. Peptides were identified using an electrospray ion trap mass spectrometer and matched to the relevant protein using a Mascot search. This search yielded two lists: (i) proteins that interact with unmodified DNA and (ii) proteins that interact with 5hmC-modified DNA (Supplemental Table S1). Proteins that occur in both lists were removed from the 5hmC pull-down list, yielding proteins that interact with 5hmC-modified DNA but not with unmodified DNA. These 5hmC-interacting proteins were then analyzed using a Panther Gene Ontology over-representation test. Statistically significantly over-represented biological functions are reported.
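The list filtering described above amounts to a set difference. The sketch below shows the idea with a handful of identifiers: MSH2 (reported in the Results to bind both substrates) and MCM3 (reported as a 5hmC-specific hit) are taken from the text, while the PROT_* names are hypothetical placeholders, not entries from the supplemental tables.

```python
# Illustrative identifiers only; PROT_* entries are hypothetical placeholders.
unmodified_hits = {"MSH2", "PROT_A", "PROT_B"}
hmc_hits = {"MSH2", "MCM3", "PROT_B", "PROT_C"}

# Proteins found in both pull-downs are treated as non-specific binders.
shared = unmodified_hits & hmc_hits

# Proteins unique to the 5hmC pull-down go forward to the over-representation test.
hmc_specific = hmc_hits - unmodified_hits

print(sorted(shared))
print(sorted(hmc_specific))
```

Applied to the counts in the Results (125 proteins bound to 5hmC-modified DNA, 48 remaining after filtering), the same subtraction implies that 77 of the 5hmC-bound proteins also appeared in the unmodified-DNA list, assuming no other exclusion criterion was applied.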
Western blotting. Western blots were performed as previously described 80 .
Chromatin immunoprecipitation. ChIP was performed as previously described 81 [...]
[...] in 100 µl ELISA buffer was heated to 98 °C for 5 min. Samples were immediately placed on ice. DNA samples were then incubated in antibody-coated wells at 37 °C for 30 minutes. Wells were washed three times with ELISA buffer. α-ssDNA antiserum conjugated to HRP in ELISA Buffer (1:100 dilution) was incubated with each well for 30 min at 37 °C. Wells were washed three times with ELISA buffer and 100 µl of developer solution was added to each well; the color reaction proceeded for 30 min at 25 °C. Wells were analyzed for absorbance at 450 nm using a Wallac Victor 2 plate reader.
5mC quantification. The 5mC DNA ELISA Kit (Zymo Research, Cat. nr. D5325) was used according to the manufacturer's instructions to quantify 5mC levels. Briefly, DNA at 1 ng/µl in 100 µl 5mC coating buffer was heated to 98 °C for 5 min. DNA was immediately placed on ice for 10 min. DNA was added to the appropriate ELISA plate and incubated for 1 hour at 37 °C. Each well was washed three times with 5mC ELISA Buffer and blocked with 200 µl of 5mC ELISA Buffer for 30 min at 37 °C. An antibody mix consisting of α-5-Methylcytosine antiserum (1:200) and secondary antibody (1:1000) was prepared in 5mC ELISA Buffer. The antibody mix was added to the wells and incubated for 1 hour at 37 °C. Wells were washed three times with 5mC ELISA buffer. 100 µl of developer solution was added to each well; the color was developed for 30 min. The plate was read on a Wallac Victor 2 plate reader at 450 nm.
Quantification of DNA modifications using LC-MS/MS. DNA was hydrolyzed to deoxynucleosides by benzonase from E. coli (Santa Cruz Biotech), nuclease P1 from P. citrinum (Sigma), and alkaline phosphatase from E. coli (Sigma) in 10 mM ammonium acetate pH 6.0 and 1 mM magnesium chloride at 40 °C for 40 min, added 3 volumes of acetonitrile and centrifuged at 16,000 g for 30 min at 4 °C). The supernatants were dried and dissolved in 50 µl water for LC-MS/MS analysis of 5-hm(dC). A portion of each sample was diluted for the analysis of 5-methyl(dC) and unmodified deoxynucleosides. Chromatographic separation was performed using an Agilent 1290 Infinity II UHPLC system with an ZORBAX RRHD Eclipse Plus C18 150 × 2.1 mm ID (1.8 μm) column protected with an ZORBAX RRHD Eclipse Plus C18 5 × 2.1 mm ID (1.8 µm) guard column (Agilent). The mobile phase consisted of water and methanol (both added 0.1% formic acid). The following conditions were employed during analyses: for 5-hm(dC): 0.1 ml/min flow, starting with 5% methanol for 4 min, followed by 1-min gradient of 5-70% methanol, 5 min with 70% methanol, and 5 min re-equilibration with 5% methanol;forfor 5-methyl(dC) 0.25 ml/min flow, 3-min gradient of 5-90% methanol, followed by 4 min re-equilibration with 5% methanol; and for unmodified deoxynucleosides 0.25 ml/min flow, 20% methanol. Mass spectrometric detection was performed using an Agilent 6495 Nascent strand isolation and BrdU pulldown. DNA and nascent strand were purified as described previously 25 with slight modifications. Briefly, 1 × 10 8 Dividing mES cells were pulsed with 100 µM BrdU for 60 min. Cells were washed with PBS and harvested in 20 ml of DNAzol (Invitrogen, cat. nr. 10503027) for 10 min at 25 °C. Samples were digested with 200 µg/ml Proteinase K (NEB, cat. nr. P8107S) at 37 °C overnight. 
After centrifugation at 4500 rpm at 4 °C for 15 min, the supernatant was added to a fresh tube and DNA was precipitated with the same volume of 100% EtOH for 1 hour at 25 °C. DNA was washed twice with 70% EtOH for 5 min and air-dried at 25 °C. DNA was suspended in 1 ml of TEN20 (10 mM Tris-Cl pH 7.9, 2 mM EDTA, 20 mM NaCl, 0.1% SDS, 1000 U RNasin) at 70 °C, denatured at 95 °C for 15 min and chilled on ice. Two aliquots composed of 500 µl denatured DNA was loaded onto a 10 ml, 5% to 30% sucrose gradient in TEN30 buffer (10 mM Tris pH 7.9, 2 mM EDTA pH 8.0, 300 mM NaCl) and centrifuged in a Beckman SW40 rotor at 21,600 RPM for 20 hrs at 4 °C. One milliliter fractions were withdrawn from the top of the gradient using a wide-bore pipette tip. Fifty microliters of each fraction was run on a 2% alkaline agarose gel at 30 V, overnight at 4 °C. The gel was neutralized in excess of neutralizing solution (1M Tris pH 7.6, 1.5 M NaCl) for 45 min 4 °C. The gel was stained with Ethidum Bromide (0.5 µg/ml) for 45 min at 25 °C. Prior to imaging, the gel was destained three times in ddH 2 O. Fractions corresponding to 0.5-2.5 kb were pooled and precipitated with 2.5 volumes ethanol and 0.3 M sodium acetate pH 5.5 for 2 hours at −80 °C. Pellets were washed twice with 1 mL of 70% ethanol and suspended in 1 mL of water. BrdU-IP was performed as previously described 82 with the following adjustments. BrdU-labeled nascent strands were precipitated with 10 µl of monoclonal α-BrdU antibody (BD Biosciences, cat. nr. 555627) in IP buffer (10 mM Sodium Phosphate pH 7.0, 140 mM NaCl, 0.05% Triton X-100 in ddH 2 O) overnight at 4 °C. The following day, 30 µl of Dynabeads Protein G beads (Invitrogen, cat. nr. 10003D) were added to each sample and incubated 4 hours at 4 °C with rotation. 
Beads were washed twice for 5 min with 1 ml of IP buffer, suspended in 200 µl of digestion buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 0.5% SDS in ddH2O) and incubated overnight at 37 °C with Proteinase K at a final concentration of 0.25 mg/ml. One hundred microliters of digestion buffer was then added to each sample, and the samples were incubated for 2 hours at 68 °C. An organic extraction was performed, and the DNA was ethanol precipitated and dried.
Bioinformatics analysis. Origin of replication (OoR) data were obtained from a NimbleGen chip in mm8 coordinates (http://genome.cshlp.org/content/21/9/1438.long; GSM718735). We downloaded the processed data and the chip information and made a bed file with chromosomal coordinates and log2 ratio values. We then converted the mm8 coordinates to mm9 coordinates using the UCSC liftOver tool. We selected "significant" spots as those with a score > 0.5 (n = 18199). The 5-mC and 5-hmC DIP-seq data (GSE42250, GSE46111) were in mm9 coordinates and were provided as wiggle files (IgG signal subtracted; scores represent tag numbers per 10 million reads). We downloaded the wig files, converted them to bed format and intersected these data with the significant OoR data. We also intersected the 5-mC and 5-hmC data with random intervals (generated using bedtools shuffle on the significant OoR tiles).
We determined how many of the 5-mC and 5-hmC tile regions were enriched (IgG- and depth-normalized score > 5) and found no difference between OoR and random regions for 5-mC. In contrast, there was a substantial overrepresentation of tiles with enrichment for 5-hmC. We annotated the OoR regions, and the regions that were both OoR and enriched in 5-hmC, using HOMER v4.8.2 (cmd = annotatePeaks.pl OoR_pos_and_val_mm9.bed mm9).
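The filtering and intersection steps described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on liftOver, bedtools and HOMER; it only shows the logic of thresholding OoR probes (score > 0.5), thresholding DIP-seq tiles (score > 5), and comparing the overlap of OoR regions with that of length-matched random regions. All coordinates, scores, chromosome sizes and function names are hypothetical toy values, and keeping each shuffled region on its original chromosome is an assumption rather than a documented setting.

```python
import random

def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(regions, tiles):
    """Fraction of regions that overlap at least one tile."""
    if not regions:
        return 0.0
    hits = sum(any(overlaps(r, t) for t in tiles) for r in regions)
    return hits / len(regions)

def shuffle_regions(regions, chrom_sizes, rng):
    """Move each region to a random position on the same chromosome,
    preserving its length (assumption: same-chromosome shuffling)."""
    shuffled = []
    for chrom, start, end in regions:
        length = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled

# Hypothetical toy inputs: (chrom, start, end, score)
oor_probes = [
    ("chr1", 100, 200, 0.8),
    ("chr1", 500, 600, 0.3),
    ("chr1", 900, 1000, 1.2),
    ("chr2", 50, 150, 0.6),
]
dip_tiles = [
    ("chr1", 150, 250, 7.0),
    ("chr1", 950, 1050, 2.0),
    ("chr2", 100, 200, 9.5),
]
chrom_sizes = {"chr1": 2000, "chr2": 1000}  # toy sizes, not mm9

# "Significant" OoR spots: score > 0.5
significant_oor = [(c, s, e) for c, s, e, score in oor_probes if score > 0.5]
# Enriched DIP-seq tiles: normalized score > 5
enriched_tiles = [(c, s, e) for c, s, e, score in dip_tiles if score > 5]

observed = fraction_overlapping(significant_oor, enriched_tiles)

# Background: overlap of length-matched random regions with the same tiles
rng = random.Random(0)
random_regions = shuffle_regions(significant_oor, chrom_sizes, rng)
background = fraction_overlapping(random_regions, enriched_tiles)

print(f"OoR regions overlapping enriched tiles: {observed:.2f}")
print(f"Random regions overlapping enriched tiles: {background:.2f}")
```

In practice the background would be estimated from many shuffles rather than one, and the all-against-all overlap test would be replaced by a sorted or indexed interval search for genome-scale data.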
Enrichment profiles of 5mC and 5hmC around ORI regions (Fig. 2E,F) were generated using ngs.plot v2.61 with the following options: -L 1 -P 32 -RR 10 -GO km. The data were then replotted using ggplot2 in R to view enrichment in ORI regions only. The following datasets were used: 5-mC and 5-hmC DIP-seq (GSE42250, GSE46111) and origin of replication (GSM718735).
Growth curve. HeLa cells stably expressing YFP, Tet2 CD:YFP, or Tet2 CD/CI:YFP were seeded at a density of 10^5 cells/well in a six-well dish. At 24, 36, 48, and 60 hours after seeding, cells were harvested and counted using a Countess Cell counter (Life Technologies).
Cell cycle and 5hmC analysis. HeLa cells stably expressing YFP, Tet2 CD:YFP, or Tet2 CD/CI:YFP fusion proteins were incubated with 20 μM BrdU for 60 minutes. Cells were harvested, washed in PBS, and fixed in 1 ml of ice-cold methanol overnight at −20 °C on a rotator. The following day, cells were centrifuged at 1200 rpm for 4 min, the supernatant was removed and the cells were suspended in 1 ml of 2 N HCl for 20 min at 25 °C. Cells were centrifuged at 1200 rpm for 4 min, the supernatant was discarded and the cells were suspended in 1 ml of 100 mM glycine in PBS for 20 min at 25 °C. Cells were centrifuged at 1200 rpm for 4 min, the supernatant was discarded and the cells were suspended in 1 ml of 0.1% (v/v) Triton X-100 in PBS for 30 min at 25 °C. Cells were centrifuged at 1200 rpm for 4 min, the supernatant was discarded and the cells were suspended in washing solution (PBS supplemented with 0.1% (v/v) Tween 20 and 1% (v/v) goat serum) for 30 min at 25 °C. Cells were centrifuged at 1200 rpm for 4 min, the supernatant was discarded and the cells were suspended in 100 μl PBS containing rat α-BrdU antiserum and incubated at 4 °C overnight with rotation. For 5hmC analysis, cells were incubated with 1:100 rabbit α-5hmC antiserum. The following day, cells were washed three times with 1 ml of washing solution for 5 min at 25 °C. Cells were suspended in 100 μl of washing solution supplemented with goat α-rat IgG conjugated to Alexa Fluor 647 for 2 hours at 4 °C with rotation. For 5hmC analysis, cells were incubated with 1:200 goat α-rabbit Alexa Fluor 488. Cells were washed three times with 1 ml of washing solution at 25 °C for 5 min. Cells were suspended in 500 μl of propidium iodide/RNase A solution and incubated for 30 min at 25 °C. Flow cytometric analysis of BrdU content was performed using a BD Accuri C6 benchtop flow cytometer (BD Biosciences), whereas 5hmC analysis during the cell cycle was performed using an LSR Fortessa (BD Biosciences). Collected data were analyzed using FlowJo software (FlowJo LLC).
Gating strategies are shown in Supplemental Fig. S12.
Double thymidine and nocodazole synchronization. HeLa cells were synchronized in G1/early S phase by double treatment with 2 mM thymidine (Sigma) for 14 h, each followed by release in fresh growth medium supplemented with 24 μM deoxycytidine (Sigma) for 9 h. Cells were collected at different time points after the second thymidine treatment and deoxycytidine release. For synchronization from mitosis, a double thymidine block was performed followed by a 2 h deoxycytidine release, after which 100 ng/ml of nocodazole (Sigma) was added for 10 h. Mitotic HeLa cells were then shaken off the plate, washed, seeded and released in fresh medium, and samples were collected at different time points. Cells were analyzed for cell cycle profile (DNA content) using the DNA stain Hoechst 33258 (1.5 μg/ml) and anti-phospho-Histone H3 (Ser10) as a marker of cell cycle progression. Cell cycle distribution was determined by flow cytometry on an LSRII flow cytometer (BD Biosciences) and data were analyzed with FlowJo software. All experiments were performed in duplicate. Gating strategies for the double thymidine block and the nocodazole block are shown in Supplemental Fig. S14A,B, respectively.

MCM2 chromatin occupancy. In brief, proteins that were not chromatin bound were extracted from the cells as previously described 83 . Cells were probed using anti-MCM2 antibody and DyLight 549 goat anti-rabbit secondary antibody (Catalog Number: DI-1549, Vector Laboratories, USA) and stained with the DNA stain Hoechst 33258 (1.5 μg/ml). The samples were run on an LSRII flow cytometer (BD Biosciences) and the data were analyzed using FlowJo software. The gating strategy is shown in Supplemental Fig. S15.