Divergent DNA methylation patterns associated with gene expression in rice cultivars with contrasting drought and salinity stress response

DNA methylation is an epigenetic mechanism that play an important role in gene regulation in response to environmental conditions. The understanding of DNA methylation at the whole genome level can provide insights into the regulatory mechanisms underlying abiotic stress response/adaptation. We report DNA methylation patterns and their influence on transcription in three rice (Oryza sativa) cultivars (IR64, stress-sensitive; Nagina 22, drought-tolerant; Pokkali, salinity-tolerant) via an integrated analysis of whole genome bisulphite sequencing and RNA sequencing. We discovered extensive DNA methylation at single-base resolution in rice cultivars, identified the sequence context and extent of methylation at each site. Overall, methylation levels were significantly different in the three rice cultivars. Numerous differentially methylated regions (DMRs) among different cultivars were identified and many of which were associated with differential expression of genes important for abiotic stress response. Transposon-associated DMRs were found coupled to the transcript abundance of nearby protein-coding gene(s). Small RNA (smRNA) abundance was found to be positively correlated with hypermethylated regions. These results provide insights into interplay among DNA methylation, gene expression and smRNA abundance, and suggest a role in abiotic stress adaptation in rice.

Rice is an important crop accounting for food security of over half the world population. Water-deficit and salinity are the major abiotic factors that affect rice productivity worldwide. Rice germplasm exhibit variability in their response to these abiotic stresses; some genotypes possess ability to tolerate extreme drought and salinity stresses, whereas many of them are highly susceptible. These phenotypic variability may be attributed to genetic and epigenetic variations, and different regulatory architecture among them. The study of tolerance/response mechanisms to abiotic stresses has been intensively worked out based on genomics, transcriptomics and proteomics analyses. Several genes and genomic variations involved in stress responses have been identified and a few regulatory networks have been proposed [22][23][24][25] . However, only a few studies have analyzed the epigenetic regulation of stress response in rice, that too at low-throughput level, mostly focused on a set of genes/loci 26,27 . High-throughput sequencing technologies provide an opportunity to recognize the DNA methylation at single-base resolution quickly and comprehensively in different biological contexts in plants 18,[28][29][30] . Although several techniques have been applied to study genome-wide DNA methylation, most detailed methylome maps at single-base resolution have been obtained using bisulphite sequencing [31][32][33][34] .
A global analysis of extent and pattern of DNA methylation in rice varieties with contrasting response to abiotic stresses is lacking. In this study, we investigated the differences in DNA methylation patterns of three different rice cultivars with contrasting responses to drought and salinity stresses. The expression of stress-responsive genes was found to be influenced by methylation status of associated regions, which suggested possible role of DNA methylation in stress adaptation. The high resolution methylome maps of different rice genotypes and differentially methylated regions contributes to our understanding of the role of epigenetic regulation of stress responses in plants.

Results
Three well-characterized rice cultivars [IR64, drought and salinity sensitive; Nagina22 (N22), drought-tolerant; Pokkali, salinity-tolerant] with contrasting response to drought and/or salinity stresses, were used in this study for genome-wide analysis of DNA methylation, gene expression and small RNA profiling to reveal the epigenetic regulation of abiotic stress response. These rice cultivars are commonly used worldwide for the generation of various genomic resources and donors of stress-related agronomic traits 22,25,35,36 . DNA methylation landscape of rice genome in different cultivars. Genome-wide profiling of DNA methylation using bisulphite sequencing was performed in IR64, N22 and Pokkali rice cultivars. In total, 160-212 million high-quality reads were obtained (Supplementary Table S1), reflecting > 30 × genome coverage for each sample. Only the sequence reads that mapped uniquely to the reference rice genome were considered for the analysis. The uniquely mapped reads provided 87-89% coverage of the rice genome and over 80% (81-83%) of cytosines (Cs) in the genome (Supplementary Table S1). Using the non-conversion rate of Cs (0.05%) in the non-methylated chloroplast genome as error rate, the methylation status of each C was determined at a P-value cut-off of 0.001. Applying this criteria, we identified methylated C residues (mCs) for which sufficient read depth (at least five) was available. Overall, the frequencies of mCs were similar in IR64 (11.32%) and Pokkali (11.5%), and marginally higher in N22 (12.30%). Of the total mCs, the highest fraction was of CG (47-49%) followed by CHG (28-31%) and CHH (20-24%) for all the three rice cultivars (Fig. 1a). The methylation level of mCs in CG context was much higher than CHG and CHH contexts (Fig. 1b). The average methylation levels was also highest in CG context (87-88%) followed by CHG (67-68%) and CHH (41-43%) contexts ( Supplementary Fig. S1). The relative frequency of mCs in different contexts and the tendency of higher methylation level in CG context and lower methylation levels in CHH context were similar to those observed in other plants 10,32,37 . A higher methylation was observed in N22 for Cs in all contexts (Fig. 1b). Overall, significant differences were observed in the methylation levels among different rice cultivars.
We observed extensive DNA methylation in the pericentromeric regions of the chromosomes as compared to ends, reflecting dense methylation of repetitive sequences (Fig. 1c, Supplementary Fig.  S2). A considerable number of mCs were observed in the non-repetitive intergenic and genic regions as well. The comparison of DNA methylation levels with density of genes and transposable elements (TEs) revealed a positive correlation with the density of TEs and negative correlation with gene density. We found substantial variations between IR64 and N22 in the distribution of mCs in all contexts, whereas only few regions showed variations in distribution of mCs between IR64 and Pokkali (Fig. 1c,  Supplementary Fig. S2). Further, we examined strand-specific distribution of DNA methylation in the three rice cultivars. Both DNA strands (sense and anti-sense) of the genome exhibited similar methylation levels with minor differences (Supplementary Fig. S3).
DNA methylation patterns in protein-coding genes and TEs. We analyzed DNA methylation patterns in genic and TE regions. In general, CHG methylation levels were much higher in TEs (28-32%) as compared to protein-coding genes (7-9%) in all the rice cultivars (Fig. 2a). The CG context methylation of TEs was also higher than genes in N22, which was in contrast to that of IR64 and Pokkali. In CHH context, the methylation level of TEs and genes was similar in N22, but was lower for TEs than genes in IR64 and Pokkali. These results suggest that protein-coding genes and TEs are differentially recognized for methylation in different rice cultivars. Further, we investigated patterns of DNA methylation in various genomic features in all contexts ( Supplementary Fig. S4). For CG, largest number of mCs were located within gene body in rice cultivars. However, the frequency of mCs in CHG and CHH contexts were considerably higher in the upstream and downstream regions as compared to the gene body. Least number of CHH context mCs existed in the exonic regions (CDS and UTRs). Further, we analyzed the pattern of DNA methylation within gene body, and upstream and downstream flanking sequences. DNA methylation sites in all contexts increased rapidly while moving upstream to transcriptional start sites (TSSs) and downstream to termination sites (TTSs, Fig. 2b). In contrast to CHG and CHH, number of mCs in CG context was higher in the gene body as compared to the flanking sequences. Notably, the frequency of mCs in CHH context was higher than CG context mCs in the upstream region near the TSSs. Similar observations have been made in other plant species too 38,39 . However, the methylation level throughout the upstream, gene body and downstream regions was highest in CG context followed by CHG and CHH ( Supplementary Fig. S5).
Because DNA methylation has a major role in transposon silencing, we examined the patterns of DNA methylation in TEs as well (Fig. 2c). The analysis revealed enrichment of DNA methylation throughout the TE body as compared to flanking sequences in all contexts. However, the enrichment of CG and CHG methylation was several-fold higher than CHH methylation. Notably, the methylation level was markedly higher in the TE body as compared to protein-coding genes. Also, TEs were highly methylated near the TSSs and TTSs as opposed to the protein-coding genes (Fig. 2b,c), consistent with earlier reports 10,13 . Among the rice cultivars, the patterns of DNA methylation in CHG and CHH contexts were almost similar, whereas substantial differences were observed for CG context. The TE body was most highly methylated in N22 followed by IR64 and Pokkali.
Differentially methylated regions in different rice cultivars. To investigate the differential methylation in the three rice cultivars, we identified differentially methylated regions (DMRs). For identification of DMRs, we calculated methylation levels in 100 bp bins. A total of 64,212 DMRs between N22 and IR64 (N22/IR64), and 35,723 DMRs between Pokkali and IR64 (PK/IR64) could be identified (q-value ≤ 0.01, Fisher's exact test followed by SLIM correction) (Fig. 3a). We observed that hypermethylation was more common in N22, whereas the frequency of hyper and hypomethylation was similar in Pokkali. The DMRs were found more likely to be located near (2 kb upstream and downstream) the genes (42-45%), but not so often within the gene body (16-18%) ( Supplementary Fig. S6). Overall, the DMRs present within/near protein-coding genes presented the major fraction (57-62%) of total DMRs. The genes with DMRs within their body and 2 kb flanking sequences were regarded as DMR-associated genes. A large number of protein-coding rice genes were identified as DMR-associated genes in N22/ IR64 (57.9%) and PK/IR64 (38.5%). In N22/IR64, number of hyper DMR-associated genes were significantly higher (2.2 times) than hypo DMR-associated genes, whereas the number of hyper and hypo DMR-associated genes were almost similar in PK/IR64 (Fig. 3b). Gene ontology (GO) analysis revealed that genes involved in diverse biological processes, such as metabolic processes, response to stress, signal transduction, translation and epigenetic regulation of gene expression were differentially methylated among rice cultivars ( Supplementary Fig. S7). Notably, a large fraction of these genes were found to be associated with response to abiotic stress. The enrichment analysis of DMR-associated genes revealed significant overrepresentation of genes involved in metabolic processes (lipid metabolic process and secondary metabolic process) and response to stress in both N22/IR64 and PK/IR64 (Fig. 3c).

Differential gene expression in different cultivars.
We determined the transcript abundance of all the rice genes in IR64, N22 and Pokkali cultivars using RNA-seq approach (Supplementary Table S2). The transcripts showing at least two-fold change with P-value less than 0.05 were identified as differentially expressed genes (DEGs). A total of 5172 (2620 upregulated and 2552 downregulated) and 3226 (1202 upregulated and 2024 downregulated) rice transcripts were found to be differentially expressed in N22/ IR64 and PK/IR64, respectively (Fig. 4a, Supplementary Table S3). Among these, 1395 (544 upregulated and 851 downregulated) transcripts were differentially expressed in both the comparisons. The genes involved in various cellular processes, including metabolic processes, amino acid metabolism, cell wall components, response to abiotic stimulus (osmotic stress, salt stress and water stress), defense response, photosynthesis, transcription and signal transduction were well represented among the differentially expressed genes (Supplementary Table S3). At least 356 transcription factor encoding genes belonging to 50 families were found to be differentially expressed in N22 and/or Pokkali as compared to IR64. The genes belonging to MYB/MYB-related, AP2-EREBP, bHLH and homeobox families were most highly represented among the differentially expressed genes (Fig. 4b). In addition, a large number of members of C3H, NAC, bZIP, PHD and WRKY families were also differentially expressed in N22 and/or Pokkali. GO enrichment analysis revealed that the genes involved in response to abiotic stimulus and amine metabolic processes were significantly enriched in both N22 and Pokkali cultivars (Fig. 4c). In addition, the genes involved in translation, protein folding and regulation of catalytic activity were significantly enriched in N22 cultivar. The changes in expression of many of these genes under stress conditions have been reported in previous studies too and implicated in stress tolerance mechanisms 10,35,36,40,41 . The modulation of various metabolic pathways, such as osmoprotectants, cell wall components, amino acid metabolism, lipid metabolism, hormone metabolism and secondary metabolites have been correlated with the physiological differences among the rice cultivars under stress conditions. Differential methylation is coupled to differential gene expression in different cultivars. To examine influence of DNA methylation on expression of neighbouring genes, we assessed the relationship between DMRs and transcript abundance on a genome-wide scale. A total of about 39% (2010) and 30% (966) of the DEGs in N22/IR64 and PK/IR64, respectively, were found to be associated with DMRs. We found that differential expression of DMR-associated genes was dependent on the direction of change in methylation status (Fig. 5a). In general, the genes proximal to hypermethylated DMRs exhibited lower levels of transcript abundance (downregulation) relative to entire gene set. The genes proximal to hypomethylated DMRs displayed similar or moderately higher levels of transcript abundance (upregulation) compared to all genes. However, some of the DMR-associated genes showed positive correlation with the transcript abundance. DNA methylation appeared to be correlated with on/off status of gene expression too for DMRs (14.5% for N22/IR64 and 7.2% for PK/IR64) ( Supplementary Fig. S8a,b). For other cases, DNA methylation state was correlated with quantitative differences in gene expression. DMRs with negative correlation with gene expression were found more likely located near genes boundaries opposed to those not associated with gene expression.
Furthermore, enrichment analysis of DMR-associated DEGs revealed a negative correlation between methylation status and transcript abundance (Fig. 5b). A significant enrichment of downregulated genes was observed in genes proximal to hyper-DMRs present within the gene body and vice-versa for both N22/IR64 and PK/IR64. However, no significant correlation was observed for the DMRs present in the flanking sequences except for PK/IR64. These results suggest that the gene body methylation play an important role in regulation of gene expression and might contribute, in large part, to the differential stress response of the rice cultivars.
Differential methylation of genes associated with stress response and epigenetic regulation of gene expression. The expression of genes involved in diverse cellular processes vary in different cultivars. The activation or repression of many of these genes is dependent on their chromatin structure, which is mainly determined by epigenetic marks, such as DNA methylation. Several examples of epigenetic regulation of gene expression in response to environmental stress are known 10,15,18,20,26,27,42 . We also found the differential methylation of several genes belonging to various functional categories in the rice cultivars. These genes encoded proteins involved in various cellular processes, such as transcription regulation, metabolic processes and signal transduction (Supplementary Table S4). A substantial number of genes encoded for proteins with unknown function. The differential methylation level and corresponding differential gene expression of rice genes in N22/IR64 and PK/IR64 are shown in Fig. 6a. The GO analysis of DMR-associated genes in both N22/IR64 and PK/IR64 revealed a significant enrichment of genes that participate in stress response (Fig. 6b,c). In addition, genes involved in metabolic processes, such as lipid metabolic processes were enriched in both the comparisons. The epigenetic regulation of gene expression and protein modification processes were other important GO terms significantly overrepresented among the DMR-associated genes in N22/IR64 and PK/IR64, respectively (Fig. 6b). The biological processes, regulation of gene expression, response to abiotic stimulus and chromatin binding were significant among the genes exhibiting on/off gene expression with respect to their methylation status ( Supplementary Fig. S8c). Many of these genes represented known abiotic stress-responsive genes. For example, genes encoding for transcription factors (MYB, AP2-EREBP, WRKY, NAC and HB families), sodium transporter HKT1 homologs, F-box, calcium-dependent protein kinases, proteinases, peptidases, oxidoreductase, glutathione S-transferase, histone deacetylase and putative dicer-like proteins, were differentially methylated in N22 and/or Pokkali cultivars as compared to IR64 (Supplementary Table S4).
A few representative examples of differential methylation of genes involved in abiotic stress response and epigenetic regulation of gene expression have been shown in Fig. 6c. The transcript abundance of these proteins were found to be negatively correlated with their differential methylation status. For example, transcription factors, such as A20/AN1-like zinc finger (Os05g23470), homeobox (Os02g43330 and Os03g43930) and AGL19 (Os10g39130) exhibited differential methylation in N22/IR64 and/or PK/ (c) Heatmap representation of the differential methylation (M) and differential expression (E) of DMR-associated genes known to be involved in abiotic stress response and epigenetic regulation of gene expression. Gene identifiers (MSU v7) and gene descriptions (best Arabidopsis ortholog) are given on the left and right sides of the heatmap, respectively. Colors at the bottom represents status of differential methylation (hypo/hyper) and differential expression (up/down). N22, Nagina 22; PK, Pokkali.
IR64. The methylation status of these transcription factors can regulate the expression of a cascade of several downstream targets. In addition, genes encoding calcium-dependent protein kinase, calmodulin, transporter protein, drought-responsive proteins, detoxifying enzymes (oxidoreductase, glutathione S-transferase and thioredoxin), F-box protein, and heat-shock proteins also exhibited differential methylation status and transcript abundance in the rice cultivars (Fig. 6c). The role of many of these genes in drought/salinity stress response has already been demonstrated. Interestingly, we observed differential methylation of many of the genes involved in epigenetic regulation of gene expression. For example, histone deacetylase (Os08g25570) was hypermethylated (downregulated), but histone methyltransferase (Os01g70220) was hypomethylated (upregulated) in N22. One of the RNA-directed DNA methylation protein (Os06g42430) was found to be hypermethylated with lower transcript abundance in Pokkali. Many of the genes encoding components of gene silencing machinery, such as dicer-like proteins (Os09g14610, Os08g05320 and Os04g43050) exhibited differential methylation and transcript abundance in the rice cultivars. Although the differential expression of a few of these genes have been reported earlier 43,44 , their differential methylation was not known. These proteins are required for establishment and maintenance of different context cytosine methylation and many aspects of epigenetic regulation. Although the exact mechanism of involvement of these proteins remains to be elucidated, our results provide evidence for their role in environmental stress responses.
Methylation status of TEs is correlated with transcription of proximal protein-coding genes. Further, we examined the possibility of involvement of methylation within TEs to regulate gene expression in different cultivars. We determined the relative TE density at DMR-associated DEGs or non-DEGs as compared to all DEGs or non-DEGs, respectively, for N22/IR64 and PK/IR64. The enrichment of TEs was substantially higher for DMR-associated DEGs as compared to non-DEGs for both N22/IR64 (P-value 1.49e-05) and PK/IR64 (P-value 5.26e-05) (Fig. 7a). However, TE density for all expressed genes and DEGs was lower. Further, we analyzed the methylation level difference of DMRs associated with protein-coding genes and TEs. We found significantly higher differences in the methylation level of DMRs associated with TEs as compared to protein-coding genes for PK/IR64 (Fig. 7b).

Validation of DNA methylation and gene expression.
To validate the results of differential methylation patterns, we selected a series of DMR candidate regions. Based on the whole-genome bisulphite-sequencing results, we picked at least 15 genomic regions (DMRs) lying within the genes known to be involved in abiotic stress response and epigenetic regulation of gene expression. In these regions, we performed traditional bisulphite-PCR followed by Sanger sequencing for the three rice cultivars. Notably, more than 90-95% of the methyl cytosines at CG sites were validated, whereas 80-90% and 75-87% of CHG and CHH sites, respectively, could be validated in the three rice cultivars. The correlation between methylation levels of all methyl cytosines ranged from 0.78-0.83 for CG, 0.73-0.77 for CHG and 0.70-0.71 for CHH context (Supplementary Fig. S9). In addition, about 70% of genes/loci reported to be differentially methylated in different stress-related rice cultivars in earlier studies 20,26,27 were represented in our dataset.
Further, we validated transcript levels of at least 16 DMR-associated genes in rice cultivars by real-time PCR analysis. A high correlation (Pearson correlation 0.75) between the differential gene expression results from RNA-seq and qRT-PCR data, was obtained ( Supplementary Fig. S10). Overall, the results were in concordance with results obtained from whole genome bisulphite sequencing and RNA-seq data analyses indicating high-quality of our data.

Role of small RNAs in DNA methylation.
A correlation between small RNA (smRNA) abundance and DNA methylation has been established by many studies 10,18,29,32 . Therefore, we sought to investigate the relationship between DNA methylation and smRNAs in the rice cultivars. To perform genome-wide discovery of smRNAs, small RNA libraries for IR64, N22 and Pokkali seedlings were sequenced and more than 15 million reads were generated for each sample. After pre-processing, the unique reads (smR-NAs) of 21-24 nt were aligned to the rice genome (Supplementary Table S5). A total of 46-47% of the smRNAs (21-24 nt) mapped uniquely to the rice genome. The small RNAs were well distributed throughout the rice genome without any obvious difference in the different cultivars (Fig. 8a, Supplementary  Fig. S2). The integration of genomic coordinates of smRNAs with the rice genome annotation revealed that a large fraction (68-70%) of smRNAs were originated from the genic and flanking sequences. The abundance of smRNAs was higher in the upstream sequences as compared to the gene body and downstream sequences (Fig. 8b). Highest abundance of smRNAs was observed in the upstream sequences near the TSSs. Within the genebody, highest abundance of smRNAs was observed at the 3′ -end. However, the abundance of smRNAs was much higher within TE body as compared to the upstream and downstream sequences (Fig. 8c). This pattern seems to be consistent with the CHH methylation patterns in protein-coding genes and TEs (Fig. 2b,c). Further, we searched for the presence of mCs in different contexts within the smRNA loci. Overall 9-10% of total mCs were associated with smRNA loci in different rice cultivars and 53-55% of smRNA loci contained at least one mC. Notably, the proportion of CHH context mCs lying within smRNAs (13.1-15.2%) was substantially higher as compared to CG (7.5-8.5%) and CHG (8.5-9.3%) context mCs in all the three rice cultivars (Fig. 8d). The higher frequency of CHH context mCs located within the smRNA loci suggested a positive correlation between them.
Next, we investigated the correlation between smRNA abundance and differential DNA methylation. We estimated the abundance of smRNA in all the protein-coding genes and DMR-associated genes in the rice cultivars. We found that abundance of smRNAs was significantly higher in the DMR-associated genes in all the three cultivars (Fig. 8e). In addition, we noticed significantly higher (~1.6-1.8 fold) enrichment of smRNAs in the hypermethylated genes and significant (~two-fold) depletion of smRNAs in the hypomethylated genes in both N22/IR64 and PK/IR64 comparisons (Fig. 8f). These results suggest that smRNAs participate in alteration of DNA methylation levels in different rice cultivars.

Discussion
Growing evidences from recent studies have suggested that DNA methylation plays a crucial role in regulation of stress responses/adaptation in plants 15,26,45,46 . Rice has rich germplasm resources with variability in their adaptive response to environmental cues, such as drought and salinity. The epigenetic marks, such as DNA methylation may be responsible for phenotypic consequences like tolerance to abiotic stresses. Therefore, it is important to investigate the DNA methylomes of rice cultivars with contrasting phenotype towards abiotic stress tolerance to understand the epigenetic regulation of stress adaptation and identify specific marks that contribute to the modulation of this agronomic trait.
In this study, we assessed the dynamics of DNA methylation at single base resolution in rice cultivars with contrasting drought and salinity stress response. Our results revealed several features regarding distribution of mCs, differential methylation patterns and their relationship with gene expression. The highest fraction of mCs in CG context followed by CHG and CHH contexts observed in the rice cultivars seems to be due to higher statistical power in detecting change at CG sites followed by CHG and CHH sites, which follow the same pattern of average methylation levels 28,32 . Significant differences were observed in the methylation patterns in different rice cultivars with contrasting stress response. We found several cultivar-specific methylated regions in the rice genome and detected genome-wide differences in DNA methylation status, which were coupled to the gene expression, strengthening the possible role of epigenetic mechanisms in abiotic stress adaptation. Significant differences in methylation patterns of specific loci have been reported in different cultivars/lines with contrasting phenotype under stress conditions 26,27,46,47 . Hypomethylation/demethylation is considered as a common feature associated with adaptive response to various stresses 19,[47][48][49] . Based on methylation sensitive amplified polymorphism analysis, it has been shown that hypomethylation and hypermethylation are more frequent in drought-tolerant and drought-sensitive rice genotypes, respectively, under drought stress conditions 50 . Other studies also suggested the genotype and developmental/tissue specificity of epigenetic regulation of abiotic stress responses in rice cultivars 26,27 . Differential methylation in IR64, N22 and Pokkali rice cultivars under stress conditions is also expected to contribute to their contrasting stress response phenotype. Overall, these results suggest cultivar-specific changes in DNA methylation and different unknown mechanism(s) might be responsible for the differential stress responses.
Our genome-wide analysis provide unique insight into the dynamics of DNA methylation in rice cultivars. The results suggest that DNA methylation regulate stress responsive genes via altering their expression. Our observations are consistent with previous reports associating methylation status of the DNA with transcriptional control of specific loci in different plant species 18,48,[51][52][53] . Although we found significant enrichment of down and upregulated genes associated with hyper and hypo-DMRs, respectively, a large fraction of the DEGs did not exhibit significant differences in their methylation levels among cultivars. It may be speculated that the effect of DNA methylation on gene expression may be mediated either directly or indirectly via some transcriptional regulatory proteins, which can recognize mCs in the promoter regions. A mechanism involving the recruitment of methyl CG-binding proteins to remodel chromatin by utilizing histone deacetylase activity has been proposed that regulate gene expression [54][55][56][57] . An alternate mechanism of gene silencing via inhibition of transcription activator binding due to promoter DNA methylation has also been reported earlier [58][59][60] . Further, we observed a stronger correlation of gene body methylation/demethylation as compared to that of flanking sequences with decreased/increased transcript abundance. These results are consistent with previous reports, which demonstrated that DNA methylation in gene bodies affects gene expression more effectively than the promoter methylation in various organisms including rice 18,48,[61][62][63] . However, these observations are in contrast with other studies, which reported a strong correlation between promoter methylation and gene expression 42,64 . Further studies are required to clarify that how the transcript abundance is regulated differentially by the gene body or promoter methylation in different biological contexts. Overall, the identification of diverse categories of genes including those involved in abiotic stress response with differential methylation patterns provides evidence that epigenetic modifications play a crucial role in plants adaptation/response to abiotic stresses.
The association of DNA methylation with inactivation of transposon activity is well known 65,66 . Several genome-wide studies have shown heavy methylation of transposon sequences 67,68 . It is now well established that the position and methylation status of nearby transposon(s) can regulate the gene expression 39,42,[69][70][71] . We also observed heavy methylation of TEs in rice cultivars and their methylation patterns were quite different than protein-coding genes. Higher methylation of TEs is due to their ability to recruit the silencing machinery and is considered as an evolutionary mechanism to silence their expression and mobility 65,66 . Many of the DMRs were associated with TEs and changes in the methylation of TEs was coupled to the expression of proximal protein-coding genes. Similar results have been reported in other studies on different plants as well 18,42,72 . A few evidences have demonstrated that abiotic stresses can evoke heritable changes in the epigenetic framework that can confer enhanced stress tolerance in the progeny due to transgenerational memory 45,73,74 . Interestingly, a copia-type retrotransposon (ONSEN) in Arabidopsis was found to have heat-induced transcription and transposition activity, which was found to be transgenerationally inherited 75 . Altogether, it can be speculated that changes in DNA methylation patterns can remove epigenetic constraints from TEs, thereby resulting in transcriptional changes in stress-responsive genes.
The de novo DNA methylation is established mainly via small RNA (smRNA)-guided RdDM pathway [76][77] . Recently, bisulphite sequencing of selected genes in rdd mutant (a triple DNA demethylase mutant) Arabidopsis plants revealed that RdDM-associated CHH methylation play a positive role in stress-responsive gene expression 42 . We observed a concordance in distribution patterns of mCs in CHH context and smRNAs within gene/TE body and their flanking sequences in the rice cultivars. In addition, a larger fraction of CHH context mCs were associated with smRNA loci. As CHH methylation is established by de novo methyltransferases only, our data also suggested a larger participation of smRNAs in CHH methylation which can regulate the target gene expression. Further, smRNAs were significantly enriched in hypermethylated regions, but depleted in hypomethylated regions. Overall, these results suggest that smRNAs also play a crucial role in shaping the DNA methylation landscape.

Conclusions
Our results suggest the potential role of cultivar-specific DNA methylation patterns as an important regulatory mechanism for sensing and responding to the stress conditions via modulation of stress-responsive gene expression. Our investigation suggest that variability for drought/salinity tolerance in rice germplasm is dependent on the extent and patterns of DNA methylation. We have provided evidence suggesting that DNA methylation play an important role in abiotic stress adaptation/response by regulating expression of a set of stress-responsive genes in rice largely via methylation/demethylation of proximal TEs. The DMRs described here provide an initial set of targets of epigenetic variations across rice germplasm and may provide basis of future selection strategies. Further, it would be interesting to study the changes in methylation pattern in the rice cultivars under stress conditions. Overall, the understanding of epigenetic regulation of abiotic stress responses can have a significant impact in breeding for development of improved varieties with enhanced stress tolerance.

Methods
Plant material and genomic DNA isolation. Three rice (Oryza sativa) cultivars, IR64 (drought and salinity sensitive), Pokkali (salinity-tolerant) and Nagina 22 (drought-tolerant), were used for DNA methylome analysis in this study. Rice seeds after surface sterilization with sodium hypochlorite solution were grown in hydroponics in a culture room under 14 h light/10 h dark conditions at 28 ± 1 °C. After two weeks of growth, seedlings of different cultivars were harvested, frozen in liquid nitrogen and stored at − 80 °C till further use. The experimental set-up was repeated to collect three independent biological replicates of each tissue sample. Genomic DNA was isolated from frozen tissues using Qiagen DNeasy Minikit (Qiagen) as per manufacturer's protocol. The quality and quantity of the genomic DNA samples were checked by Nanodrop Spectrophotometer (Thermo Scientific) and Qubit Fluorimeter (Life Technologies). Agarose gel electrophoresis was also carried out to ascertain the quality of genomic DNA samples.
Whole genome bisulphite sequencing. The genomic DNA isolated from IR64, N22 and Pokkali seedlings (pooled in equal quantity from the three independent biological replicates) were processed for bisulphite sequencing. The genomic DNA samples were fragmented via sonication to a size of 100-300 bp, end repaired and TruSeq-methylated adapters were ligated to the DNA fragments. Approximately, 500 ng of adapter-ligated DNA fragments were used for bisulphite conversion using EZ DNA Methylation-Gold TM kit (Zymo Research Corporation, CA, USA) according to manufacturer's protocol. After desalting, size selection and PCR amplification, library quality was analyzed. The qualified libraries were sequenced on the HiSeq 2000 system (Illumina) for 90 cycles in paired-end mode to achieve more than 30 × genome coverage for each sample and processed sequenced data (after removal of reads containing adaptor sequences and low-quality reads) was obtained. The sequence data were further filtered using our in-house NGS QC Toolkit (v2.3) 78 .
Read alignment and identification of mCs. The filtered 90 bp paired-end reads from each sample were aligned to the rice genome sequence (MSU v7.0) using Bismark (v0.8) 79 under default parameters. Reads from each sample were processed to remove clonal reads by retaining only one of such reads. Only the reads aligned at unique location in the genome were retained. Alignment of all the reads was performed on the rice chloroplast genome also to estimate the bisulphite conversion efficiency and error-rate. A bisulphite conversion efficiency of ≥ 99% was observed for all the samples. The non-conversion rate of chloroplast genome Cs was considered as a measure of false discovery rate (error rate). The alignment files on the rice genome were used as input in Methylkit (v0.5.6) for further analysis. The mCs were identified by P-value calculation using binomial distribution [P = binomial (m, x, error_rate), where m = number of reads giving methylated call, n = number of reads giving unmethylated call; x = m + n, the sequencing depth of single C and error_rate was the error rate for bisulphite nonconversion of the sample] as described earlier 37 . This step excludes mCs, which might be the result of non-conversion of cytosines during bisulphite conversion. A P-value cut-off of 0.001 and minimum read-depth of five was used to identity true mCs. The methylation level (percentage of reads showing mC among all the reads covering the same cytosine site) of each identified mC was determined. Various analyses, including coverage determination, distribution of mCs in the genome and various genomic features, and determination of frequency of flanking nucleotides were carried out using custom perl scripts.

Identification of DMRs.
For screening of DMRs, we determined the frequency of mCs in each bin size of 100 bp throughout the genome. Only the cytosine sites covered by at least five reads in each sample were considered. For each bin, the methylation level at each cytosine site was calculated for each sample. The bins containing at least three mCs and a minimum difference of 20% methylation level with q-value of 0.01 were identified as DMRs. For estimation of q-value, P-value was determined by Fisher's exact test followed by correction using Sliding Linear Model (SLIM). The DMRs were positioned within gene body or 2 kb flanking sequences based on the position of mid-point relative to gene position coordinates.
Gene ontology (GO) enrichment analysis. The enrichment analysis of GO categories within the methylated genes (methylation at promoters and/or within annotated transcribed regions) was performed using the BiNGO tool and visualized using Cytoscape (v3.7). The GO categories with P-value 0.05 after applying FWER correction were considered as significantly enriched.
RNA sequencing and data analysis. Total RNA was extracted using TRI Reagent (Sigma Life Science, USA), according to manufacturer's instructions. The quality and quantity of RNA samples were assessed using Agilent Bioanalyzer (Agilent Technologies, Singapore) as described previously 80 . For sequencing, cDNA libraries were generated from total RNA of each tissue sample and sequencing was performed on Illumina platform to generate 100 bp paired-end reads. The Fastq files were obtained and various quality controls were performed using NGS QC Toolkit 78 . Filtered high-quality reads were mapped on the rice genome (MSU7) using Tophat (v2.0.0) software. To analyze gene expression, a consensus reference-guided assembly of the transcriptome data from all samples was generated using Cufflinks (v2.0.2) and differential expression of genes among rice cultivars was determined by Cuffdiff as described 81 . Only the genes exhibiting significant difference (at least two-fold change with P-value ≤ 0.05) were considered.
Methylation-expression correlation analysis. The correlation between methylation and gene expression was determined by comparison of methylation status of DMR-associated genes and their expression level/differential expression measured by RNA-seq. A box-and-whisker plot (boxplot R function) of the differential expression levels of the genes associated with hypo-or hyper-methylated DMRs and all rice genes was generated. The significance of differences was estimated using a two-tailed Wilcoxon rank-sum test (wilcox.test R function) using R programming environment.
Small RNA sequencing and data analysis. The total RNAs isolated from seedlings of IR64, N22 and Pokkali rice cultivars were used for library preparation using TruSeq Small RNA Sample Prep Kit (Illumina Technologies) according to the manufacturer's instructions. Each small RNA library was sequenced for 50 cycles on Illumina platform and the sequence data was obtained in FASTQ files for further processing. The sequence data was pre-processed using modified perl script as described previously 82 and only the unique reads were retained. Further, all the unique reads from each sample were mapped to the rice genome sequence using Bowtie2. Only the uniquely mapped reads of 21-24 nt in length were used for subsequent analysis.
Locus-specific bisulphite sequencing of selected DMRs. About 1 μ g of genomic DNA extracted from unstressed control seedlings of rice genotypes was bisulphite converted using EZ DNA Methylation-Gold TM kit (Qiagen). The bisulphite converted DNA was desalted and an aliquot of it was used for PCR amplification of the selected genomic regions representing DMRs using locus-specific forward and reverse primers. The purified PCR products were cloned in pGEM-T Easy vector (Promega) and confirmed clones were sequenced via Sanger sequencing method. An average of 10 clones were chosen randomly for sequencing. Methylation status of cytosines was assessed by comparing the sequence of the bisulphite treated DNA with that of untreated DNA.
Quantitative PCR analysis. The gene-specific primers for qRT-PCR were designed using Primer Express (v3.0) software (Applied Biosystems, Foster City, CA). The primer sequences used in this study are listed in Supplementary Table S6. The quantitative PCR analysis was performed as described previously 80 employing ABI 7500 Real-Time PCR System (Applied Biosystems). At least two independent biological replicates for each sample and three technical replicates of each biological replicate were analyzed for the analysis. The transcript level of each gene in different tissue samples was normalized with the transcript level of internal control gene, Ubiquitin5 (UBQ5) 83 and fold change was calculated as compared to the control condition.
Data availability. The DNA methylation, RNA-seq and small RNA sequencing data generated in this study have been deposited with NCBI at Gene expression Omnibus (GEO) under series accession numbers GSE60288, GSE60287 and GSE64651, respectively.