DNA methylation is an epigenetic modification that differs between plant organs and tissues, but the extent of variation between cell types is not known. Here, we report single-base-resolution whole-genome DNA methylomes, mRNA transcriptomes and small RNA transcriptomes for six cell populations covering the major cell types of the Arabidopsis root meristem. We identify widespread cell-type-specific patterns of DNA methylation, especially in the CHH sequence context, where H is A, C or T. The genome of the columella root cap is the most highly methylated Arabidopsis cell characterized so far. It is hypermethylated within transposable elements (TEs), accompanied by increased abundance of transcripts encoding RNA-directed DNA methylation (RdDM) pathway components and 24-nt small RNAs (smRNAs). The absence of the nucleosome remodeller DECREASED DNA METHYLATION 1 (DDM1), required for maintenance of DNA methylation, and low abundance of histone transcripts involved in heterochromatin formation suggest that a loss of heterochromatin may occur in the columella, thus allowing access of RdDM factors to the whole genome, and producing an excess of 24-nt smRNAs in this tissue. Together, these maps provide new insights into the epigenomic diversity that exists between distinct plant somatic cell types.
DNA methylation is an epigenetic modification of cytosine bases implicated in gene regulation. In plants, DNA methylation occurs in three distinct cytosine contexts: CG, CHG and CHH. CG and CHG methylation is stably maintained by DNA METHYLTRANSFERASE 1 (MET1) and CHROMOMETHYLASE 3 (CMT3), respectively. De novo DNA methylation is catalysed by DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2) in all three sequence contexts, in a process that is guided by 24-nt smRNAs, known as RdDM (refs 1,2). DNA methylation may also be maintained independently of the RdDM pathway through the concerted action of DDM1 and CHROMOMETHYLASE 2 (CMT2)3,4. DDM1 functions to displace the linker histone H1 in heterochromatic regions of the genome, allowing CMT2 access to the DNA, where it is able to catalyse the methylation of cytosines in the CHG and CHH contexts3,4. Although DNA methylation can be a stable epigenetic mark, faithfully maintained for many hundreds of generations5, dynamic changes in DNA methylation patterns can be observed during short time scales in response to the environment6,7, or in different cell types of a single individual8,
In Arabidopsis, a major biological role of DNA methylation is in silencing TE transcription. Loss of DNA methylation because of mutations in DDM1 or MET1 is sufficient for transcriptional activation of demethylated TE sequences, and transposition of some of these activated TEs3,12,13. Although TE insertions may contribute to novel modes of gene regulation, excess TE activity produces deleterious mutations, and efficient TE silencing is crucial for the maintenance of genome integrity. Plants may be most vulnerable to TE activity in the stem cells, as these are the progenitor cells from which all others derive, and TE insertions within the stem cells will therefore be inherited by all descendant cells. Indeed, highly complex mechanisms of TE silencing have been reported in the sperm and embryo. TE silencing in the sperm is thought to be assisted by 21-nt smRNAs derived from the vegetative cell nucleus, a non-generative companion to the sperm, and in the developing embryo by endosperm-derived 24-nt smRNAs9,
Columella is the most CHH-hypermethylated cell type in Arabidopsis
To investigate patterns of DNA methylation in different plant cell types, we used protoplasting followed by fluorescence-activated cell sorting (FACS) of cell populations marked by green fluorescent protein (GFP) in a range of reporter lines. These lines represent the major cell types or tissues in the root: epidermis (ProWER–GFP), cortex (ProCOR–GFP), endodermis (ProSCR–GFP), stele (ProWOL–GFP), whole columella root cap (PET111 enhancer trap line) and lower columella (ProCYCD5–GFP) (Fig. 1a). Two independently generated reporter lines were analysed for the endodermis. Following isolation of highly enriched populations of each cell type (Supplementary Fig. 1), we generated single-base resolution maps of cytosine methylation by whole-genome bisulphite sequencing and transcriptome profiles by RNA-seq and smRNA-seq (Fig. 1b and Supplementary Table 1). Analysis of global levels of DNA methylation in the six cell populations revealed that methylation in all sequence contexts (mCG, mCHG, mCHH) was higher in the columella, with dramatically increased levels of mCHH (Fig. 1c). Comparison with previously published Arabidopsis methylomes showed that mCHH levels in the columella are higher than in any other tissue or cell type analysed to date10,11 (Fig. 1c). The enrichment of mCHH in the columella was most pronounced in the pericentromeric regions of the chromosomes (Fig. 1d). Whole root tips from the PET111 transgenic line, and from Col-0, showed patterns and levels of mC similar to those of the non-columella cell types (Fig. 1c,d), indicating that the differences observed in the columella cell populations were due to cell type and not a widespread perturbation of DNA methylation in the transgenic lines used for cell isolation.
Columella hypermethylation is the major source of widespread differential DNA methylation in the root meristem
To further investigate the large differences in DNA methylation patterns, we identified differentially methylated regions (DMRs) in the genome between the cell types. With a target false discovery rate of 5%, we identified 38,307 DMRs among the different cell types (Fig. 2a). Of these, 13.6% (5,225) were differentially methylated only in the CG context (CG-DMRs), whereas 82.9% (31,761) were differentially methylated only in the CH context (CH-DMRs) (Fig. 2a, Supplementary Tables 2 and 3). Regions differentially methylated in both the CG and CH context (C-DMRs) were rare, with only 1,321 such regions observed (Fig. 2a and Supplementary Table 4). The DMR length also seemed to be associated with DNA methylation context, with CG-DMRs being, on average, shorter than CH- and C-DMRs (Fig. 2b). Overall, 13.8% of the nuclear genome was differentially methylated between the six cell types, mostly in the CH context (Fig. 2c).
Some regions of the genome are prone to spontaneous changes in DNA methylation levels17,18. To determine if the regions of differential DNA methylation between cell types were due to spontaneous fluctuations in DNA methylation levels between the different transgenic lines used, we compared the root cell-type-specific DMRs with two types of previously identified spontaneous DMRs: transgenerational DMRs (ref. 17) and population DMRs (ref. 19). We found that 76 and 60% of root cell-type-specific CG-DMRs and C-DMRs, respectively, overlapped with population DMRs, whereas only 5 and 2% of root cell-type-specific CG- and C-DMRs, respectively, overlapped with transgenerational CG- and C-DMRs (Supplementary Fig. 2). We concluded that the majority of root cell-type-specific DMRs occur in regions of the genome known to be epigenetically labile, probably as a result of variation in smRNAs.
To determine if the enrichment of DNA methylation in pericentromeric regions (Fig. 1d) was linked to DMRs, we assessed the distribution of DMRs along the chromosomes (Fig. 2d and Supplementary Fig. 3). Although CG-DMRs are most abundant in the chromosome arms, the number of CH- and C-DMRs peaked in the proximal and distal pericentromeric regions, respectively. Closer inspection of the genomic features intersecting each set of DMRs revealed that more than 80% of CG-DMRs overlapped with protein-coding gene bodies (Fig. 2e), and 73% of CH-DMRs and 44% of C-DMRs overlapped with TEs. The remaining CH-DMRs and C-DMRs were found to overlap mainly with intergenic regions or pseudogenes.
Hierarchical clustering based on differences in DNA methylation showed that the columella cells form a highly distinct group compared with other cells of the root (Fig. 2f). Interestingly, DNA methylation patterns seemed to be more similar between cell types located physically close to one another in the root, regardless of their lineage, whereas transcriptional profiles were more dependent on cell lineage than physical position in the root (Supplementary Fig. 4). This may suggest that methylation patterns are in part regulated by positional information or cell–cell communication. Columella cells were highly distinct in their DNA methylation landscape, particularly in the mCHH context. Methylation at CH- and C-DMRs was higher in the columella than in other cell types, suggesting that CHH hypermethylation in the columella is the primary basis for CH- and C-DMRs among root meristem cells (Fig. 2g and Supplementary Fig. 5).
As mCHH is deposited by two distinct DNA methyltransferases, DRM2 and CMT2 (refs 3,4), we sought to determine which methyltransferase was responsible for mediating changes in mCHH in each set of DMRs. We analysed mCHH levels within DMR coordinates in leaves of wild-type, drm1 drm2 and cmt2 plants to categorize DMRs as DRM2 or CMT2 targets, using previously published DNA methylation data20 (Supplementary Fig. 6). For CH-DMRs, both drm1 drm2 and cmt2 showed decreased mCHH in these regions, but the effect of cmt2 was much larger, whereas for C-DMRs only drm1 drm2 caused a decrease in mCHH levels. These results reveal that mCHH within CH-DMRs and C-DMRs is mainly catalysed by CMT2 and DRM2, respectively. DRM2 is involved in two types of RdDM: the canonical Pol IV-mediated RdDM guided by 24-nt smRNAs1,2, and RDR6-mediated RdDM guided by 21- and 22-nt smRNAs21. We detected upregulation of 21–24-nt smRNA abundance within both CH-DMRs and especially C-DMRs in the columella, but 24-nt smRNAs were predominant (Fig. 2h and Supplementary Fig. 7), suggesting that the canonical Pol IV-mediated RdDM pathway plays a major role in establishing these DMRs. We did not observe higher steady-state transcript abundance of TEs in the columella (Supplementary Fig. 8).
Gene body methylation in the CG context is correlated with constitutive gene expression22,
Transposable elements are targets for CHH hypermethylation
Although only a small percentage of CH-DMRs were found to intersect with gene bodies (Fig. 2f), these still represented more than 1,000 gene loci because of the abundant nature of CH-DMRs. To further investigate whether there was a correlation between mCHH levels within genes and the transcript abundance of those genes, we placed all TAIR10 genes in order on the basis of the average transcript abundance among cell populations and further analysed patterns of DNA methylation (Fig. 3). This revealed that, whereas levels and patterns of mCG and mCHG were similar between cell types (Fig. 3a), lowly expressed and silent genes were CHH hypermethylated in the columella. Furthermore, we found that the number of genes harbouring TEs was also enriched in genes with lower expression (Fig. 3a), suggesting that increases in mCHH within lowly expressed genes may be due to the hypermethylation of TEs contained within these genes. As mCHH serves to transcriptionally silence TEs in Arabidopsis, and most CH-DMRs were found within annotated TEs, we compared patterns and levels of DNA methylation across all TEs in the genome (Fig. 3b–d). Levels of mCG and mCHG in TEs were only moderately higher in both of the columella cell populations, consistent with our observations on a genome-wide scale (Fig. 1c). However, a large increase in mCHH in TEs was observed in both of the columella cell populations compared with the other cell types, and this was consistent across all known TE superfamilies in Arabidopsis (Supplementary Fig. 10). This indicates that, although some CH-DMRs were found to intersect with protein-coding genes, differences in mCHH between cell types can be attributed almost entirely to the CHH hypermethylation of TEs in the columella. As TEs are greatly enriched in the pericentromeric heterochromatin, this would also explain the enrichment of mCHH and CH-DMRs in the pericentromeric regions (Figs 1c and 2d).
Enhanced RNA-directed DNA methylation in the columella
As we observed an increase in mCHH in TEs, as well as an increase in 24-nt smRNA abundance at CH-DMRs, we next sought to determine whether there might be transcriptional upregulation of the RdDM pathway in the columella. Analysis of the RNA-seq data revealed an increase in transcripts encoding components of the RdDM pathway in the columella compared with the other cell populations (Fig. 4a). In particular, we found an enrichment for transcripts encoding proteins needed for smRNA biogenesis, such as the major unique Pol IV component NRPD1a, as well as CLSY1, RDR2 and DCL3, and those components involved directly in the deposition of DNA methylation were only mildly upregulated in the columella25,
DDM1 protein is not present in the columella
In the vegetative cell nucleus of the pollen, a loss of mCG and mCHH throughout the genome is coupled with increased mCHH at the centromere, the absence of DDM1 protein, and loss of heterochromatin9,10 (Supplementary Fig. 11). This triggers TE transcriptional activation and increased production of 21-nt smRNAs from TE transcripts, which are thought to be transported to the sperm cells to reinforce TE silencing in the germline9,10. We observed an increase in 24-nt smRNAs and CHH hypermethylation of TEs in columella cells. Although no decrease in DDM1 transcript abundance specific to the columella was detected (Figs 4a and 5a), analysis of a transgenic line expressing the DDM1–GFP fusion protein revealed that DDM1–GFP was undetectable in the columella, whereas it was present in the nuclei of other root cell types (Fig. 5b). This indicates that DDM1 is transcribed in the columella, but either the transcripts are not translated or there is rapid degradation of DDM1 protein. Despite an apparent lack of DDM1 in the columella, and in contrast to ddm1, normal levels of mCG and mCHG are maintained at TEs, and there are elevated levels of mCHH (Figs 3d and 5c).
DDM1-dependent mCHH deposition is catalysed by the DNA methyltransferase CMT2 (ref. 3), and the RdDM pathway together with CMT2 is responsible for almost all mCHH in the genome4. We classified all methylated TEs into four clusters, based on the mCHH levels within TE bodies of the wild-type, drm1 drm2, cmt2 and ddm1 leaf tissue (Fig. 5c, left). mCHH levels in TEs in clusters 1 and 2 were decreased in drm1 drm2, indicating that they were RdDM-dependent. mCHH levels in TEs in clusters 3 and 4 were decreased in cmt2, indicating that their methylation was CMT2-dependent. Strikingly, in the columella, TEs in all four clusters were hypermethylated (Fig. 5c). RdDM-dependent TEs were hypermethylated, accompanied by 24-nt, but not 21-nt, smRNA accumulation. CMT2-dependent TEs were hypermethylated in the columella, and those located in chromosome arms were accompanied by 24-nt smRNA accumulation, consistent with 24-nt smRNA enrichment in CH-DMRs. The edges of CMT2-dependent TEs are subjected to RdDM, and 24-nt smRNAs are enriched in these regions3. However, the edges as well as the bodies of CMT2-dependent TEs accumulated 24-nt smRNAs in the columella (Fig. 5c,d), suggesting that the bodies of CMT2-dependent TEs are also subjected to RdDM. This may account for the CHH hypermethylation of CMT2-dependent TEs despite the lower expression of CMT2 in the columella (Fig. 4a).
DDM1 is normally required for the displacement of histone H1 at heterochromatic regions of the genome, allowing DNA methyltransferases MET1, CMT3 and CMT2 to access and methylate the DNA3. As loss of H1 suppresses the reduction in DNA methylation in ddm1 mutants, we examined transcript levels for the two canonical histone H1 genes, H1.1 and H1.2, and observed lower abundance of transcripts for both genes in columella cells than in other cells in the root meristem (Fig. 4a). H2A.W.6 and H2A.W.7, which are required for chromatin condensation29, were also downregulated in the columella (Fig. 4a), suggesting that the columella may lose heterochromatin through a reduction of heterochromatin-related components. Loss of heterochromatin in the columella may play a role in enhancing generation of the 24-nt smRNA transcripts needed for RdDM, leading to the observed CHH hypermethylation of TEs.
Plants are complex multicellular organisms that contain a broad variety of cell types with specialized functions. Although differences in patterns of DNA methylation have been observed previously between different somatic tissues16 and reproductive cell types9,
Seedlings were grown vertically for 6 days after plating on 1× Murashige and Skoog media supplemented with 1% sucrose and 1% agar. All seedlings were grown under standard long day conditions (16 h of light, 8 h of darkness, 22 °C). FACS was performed using cell-specific GFP lines as described previously35. The columella root cap was marked with the enhancer trap PET111 (ref. 36), the bottom two layers of the columella were marked with ProCYCD5–GFP (ref. 37), the stele with ProWOL–GFP (ref. 38), the endodermis with ProSCR–GFP (ref. 39), the cortex with ProCORTEX–GFP (ref. 40), and both the epidermis and lateral root cap with ProWER–GFP (ref. 41). Sorted cells were collected directly into lysis buffers compatible with the downstream applications. Cells used for bisulphite sequencing, mRNA-seq and smRNA-seq were lysed in Buffer AP1 (Qiagen), Buffer RLT (Qiagen) and Trizol (Invitrogen), respectively. All samples were immediately stored at −80 °C until genomic DNA and RNA were extracted using the DNeasy Plant mini kit (Qiagen) and the RNeasy Plant mini kit (Qiagen) or Trizol, respectively.
MethylC-seq library preparation, read mapping and base calling were performed as described previously42,
Identification of differentially methylated regions
DMRs were identified using the methylpy pipeline45. Briefly, differentially methylated sites (DMSs) were identified by root mean square tests with a false discovery rate of 0.05 using 1,000 permutations. Only cytosine positions covered by at least four reads were examined for differential methylation. DMSs within 200 bp of each other were then collapsed into DMRs. DMRs were classified into CG-DMRs (only CG difference), CH-DMRs (only CHG and/or CHH difference) and C-DMRs (CG and CHG and/or CHH difference). In addition, CG-DMRs, CH-DMRs and C-DMRs with fewer than five, five and ten DMSs, respectively, were discarded from subsequent analyses. Differential methylation tests were performed among all samples, not in a pairwise manner, generating a set of all non-redundant DMRs among the samples. The methylation levels in each region were calculated as weighted methylation levels46, in which the methylation level was equal to the frequency of C base calls at C positions within the region divided by the frequency of C and T base calls at C positions within the region.
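The collapsing, classification and filtering steps described above can be illustrated with a minimal Python sketch. It is not the methylpy implementation: the names DMS, collapse_dms, classify_dmr, passes_filter and weighted_methylation are hypothetical, a single chromosome is assumed, and 'within 200 bp' is interpreted here as the distance between adjacent DMSs, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DMS:
    """A differentially methylated site (hypothetical container)."""
    pos: int      # coordinate of the cytosine; one chromosome assumed
    context: str  # "CG", "CHG" or "CHH"


def collapse_dms(sites, max_gap=200):
    """Group DMSs so that neighbours no more than max_gap bp apart share a region.

    Assumption: 'within 200 bp' is read as the gap between adjacent sites.
    """
    regions = []
    for site in sorted(sites, key=lambda s: s.pos):
        if regions and site.pos - regions[-1][-1].pos <= max_gap:
            regions[-1].append(site)
        else:
            regions.append([site])
    return regions


def classify_dmr(region):
    """Label a region as CG-DMR, CH-DMR or C-DMR from the contexts of its DMSs."""
    contexts = {s.context for s in region}
    has_cg = "CG" in contexts
    has_ch = bool(contexts & {"CHG", "CHH"})
    if has_cg and has_ch:
        return "C-DMR"
    return "CG-DMR" if has_cg else "CH-DMR"


# Minimum number of DMSs required to keep a DMR of each class (from the text).
MIN_DMS = {"CG-DMR": 5, "CH-DMR": 5, "C-DMR": 10}


def passes_filter(region):
    """Keep a region only if it contains enough DMSs for its class."""
    return len(region) >= MIN_DMS[classify_dmr(region)]


def weighted_methylation(c_calls, t_calls):
    """Weighted methylation level: total C calls / total (C + T) calls.

    c_calls and t_calls hold per-cytosine read counts within one region.
    """
    total = sum(c_calls) + sum(t_calls)
    return sum(c_calls) / total if total else float("nan")
```

Because C and T calls are summed over the whole region before dividing, highly covered cytosines contribute more to the weighted level than sparsely covered ones, which is the property that distinguishes it from a simple mean of per-site levels.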
RNA-seq library preparation was performed using the Illumina TruSeq RNA Library Prep kit from polyA+ selected mRNA as per the manufacturer's instructions. smRNA sequencing data were obtained from a previous study47. smRNA data were processed and mapped to the TAIR10 genome as described previously48. smRNA levels were normalized to TE size and library size by counting the reads per kilobase of TE per million reads mapped (RPKM). Only reads that mapped uniquely to the genome contributed to the average count for each TE. RNA-seq data were mapped to the TAIR10 reference genome using Tophat2 with the default parameters49 and quantified using Cuffdiff (ref. 50).
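As an illustration of the RPKM normalization described above, the following minimal Python sketch computes reads per kilobase of TE per million mapped reads. The function name rpkm and its parameter names are hypothetical and are not taken from the pipeline used in this study.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    read_count: uniquely mapped reads assigned to the feature (here, a TE)
    feature_length_bp: feature length in base pairs
    total_mapped_reads: mapped reads in the library
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions
```

For example, 50 reads on a 2-kb TE in a library of 10 million mapped reads correspond to an RPKM of 2.5.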
Associating DMRs with proximal genes
DMRs located within 3 kb of gene upstream regions, gene bodies and 3 kb of gene downstream regions were extracted, and their positions relative to genes were assigned according to the midpoint of each DMR. Some DMRs were located within multiple genomic features, for example in the 3 kb upstream regions, gene bodies or 3 kb downstream regions of more than one gene. We refer to all possible pairwise comparisons between DMRs and nearby genomic features as ‘combinations’. Pearson correlation coefficients between the methylation levels of DMRs and the expression levels of proximal genes (FPKM) were computed and plotted as density.
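The assignment and correlation steps can be sketched in Python as follows. This is an illustrative sketch only: the function names relative_position and pearson are hypothetical, coordinates are assumed to be inclusive, and the strand handling (swapping upstream and downstream for genes on the minus strand) is an assumption rather than a detail stated above.

```python
from math import sqrt

FLANK = 3_000  # 3-kb upstream and downstream windows, as in the text


def relative_position(dmr_start, dmr_end, gene_start, gene_end, strand="+"):
    """Place a DMR relative to one gene using the DMR midpoint.

    Returns "upstream", "gene body", "downstream", or None when the midpoint
    lies outside the gene and its 3-kb flanks.
    """
    mid = (dmr_start + dmr_end) // 2
    if gene_start <= mid <= gene_end:
        return "gene body"
    if gene_start - FLANK <= mid < gene_start:
        return "upstream" if strand == "+" else "downstream"
    if gene_end < mid <= gene_end + FLANK:
        return "downstream" if strand == "+" else "upstream"
    return None


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)
```

In this sketch, each DMR–gene ‘combination’ would contribute one coefficient, computed from the DMR methylation level and the gene FPKM across the cell populations.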
mCHH levels within annotated TE bodies at least 400 bp in length were computed, and only TEs with a minimum of 10% mCHH in at least one sample from Col-0, drm1 drm2, cmt2 and ddm1 were designated as methylated TEs. These TEs were then grouped into four clusters using the R kmeans function, with the ‘centers’ argument set to 4.
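The filtering and clustering logic can be sketched in Python as below. This is a simplified stand-in for the analysis, which used the R kmeans function: the sketch runs Lloyd's algorithm from caller-supplied starting centres, so its cluster assignments need not match those produced in R. The names is_methylated_te and kmeans are hypothetical, and mCHH levels are assumed to be expressed as fractions (10% = 0.10).

```python
def is_methylated_te(length_bp, mchh_by_sample, min_length=400, min_mchh=0.10):
    """True if the TE is long enough and reaches the mCHH cut-off in any sample.

    mchh_by_sample: mCHH fractions for Col-0, drm1 drm2, cmt2 and ddm1.
    """
    return length_bp >= min_length and max(mchh_by_sample) >= min_mchh


def kmeans(points, centres, n_iter=100):
    """Lloyd's k-means from given starting centres; returns (labels, centres).

    points: equal-length tuples (one mCHH value per genotype for each TE)
    centres: starting centres; their number sets k (4 in the text)
    """
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [
            min(
                range(len(centres)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centres[k])),
            )
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Supplying the starting centres explicitly keeps the sketch deterministic; random initialization, as is typical in practice, would require a fixed seed for reproducible cluster labels.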
The DDM1–GFP transgenic line has been described previously9. Seeds were plated on 1/2× Linsmaier and Skoog media. Three days after germination, seedlings were incubated in propidium iodide for 5 min to stain the cell walls of root tips, and imaged using a Zeiss LSM 710 confocal microscope.
All sequence data can be downloaded from NCBI GEO under accession GSE79710, and can also be viewed at http://neomorph.salk.edu/Arabidopsis_root_methylomes.php.
We thank K. Slotkin (Ohio State Univ., USA) and J.A.H. Murray (Univ. Cardiff, UK) for kindly providing DDM1–GFP seeds and ProCYCD5–GFP seeds, respectively. T.K. was supported by the Japan Society for the Promotion of Sciences Research Abroad Fellowship. T.S. was supported by the Jean Rogerson Postgraduate Scholarship. This research was supported by grants from the National Science Foundation (MCB-1344299 to J.R.E. and IOS-1021619 to P.N.B.), by the National Institutes of Health (GM R01-043778 to P.N.B.) and by the Gordon and Betty Moore Foundation (GBMF3034 to J.R.E. and GBMF3405 to P.N.B.). R.L. was supported by the Australian Research Council (FT120100862). R.J.S. was supported by the National Institutes of Health (R00GM100000). J.R.E. and P.N.B. are investigators of the Howard Hughes Medical Institute.