  • Resource

Genome-wide identification of mRNA 5-methylcytosine in mammals

Abstract

Accurate and systematic transcriptome-wide detection of 5-methylcytosine (m5C) has proved challenging, and there are conflicting views about the prevalence of this modification in mRNAs. Here we report an experimental and computational framework that robustly identified mRNA m5C sites and determined sequence motifs and structural features associated with the modification using a set of high-confidence sites. Based on this framework, we developed a quantitative atlas of RNA m5C sites in human and mouse tissues. In a given tissue, we typically identified several hundred exonic m5C sites. About 62–70% of the sites had low methylation levels (<20% methylation), while 8–10% of the sites were moderately or highly methylated (>40% methylation). Cross-species analysis revealed that species, rather than tissue type, was the primary determinant of methylation levels, indicating strong cis-directed regulation of RNA methylation. Together, these data provide a valuable resource for investigating the regulation and functions of RNA methylation.

Fig. 1: The evaluation of BS-seq library construction protocols with different reaction conditions.
Fig. 2: Development of a computational framework to identify high-confidence m5C sites.
Fig. 3: Performance of our computational pipeline and sequence and structural features of mRNA m5C.
Fig. 4: The validation of mRNA m5C sites using human and mouse NSUN2-knockout models.
Fig. 5: Global characterization of mRNA m5C and the effect of CDS m5C sites on translation.
Fig. 6: The profiles of mRNA m5C in human and mouse.

Data availability

The sequence data have been deposited in the NCBI GEO database under the accession code GSE122260. All other data are available from the corresponding author upon reasonable request.

Code availability

All relevant code and data processing pipelines have been deposited in GitHub (https://github.com/SYSU-zhanglab/RNA-m5C).

References

  1. Li, S. & Mason, C. E. The pivotal regulatory landscape of RNA modifications. Annu. Rev. Genomics Hum. Genet. 15, 127–150 (2014).

  2. Gilbert, W. V., Bell, T. A. & Schaening, C. Messenger RNA modifications: form, distribution, and function. Science 352, 1408–1412 (2016).

  3. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).

  4. Machnicka, M. A. et al. MODOMICS: a database of RNA modification pathways–2013 update. Nucleic Acids Res. 41, D262–D267 (2013).

  5. Grozhik, A. V. & Jaffrey, S. R. Distinguishing RNA modifications from noise in epitranscriptome maps. Nat. Chem. Biol. 14, 215–225 (2018).

  6. Ramaswami, G. et al. Accurate identification of human Alu and non-Alu RNA editing sites. Nat. Methods 9, 579–581 (2012).

  7. Bass, B. et al. The difficult calls in RNA editing. Interviewed by H Craig Mak. Nat. Biotechnol. 30, 1207–1209 (2012).

  8. Bahn, J. H. et al. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 22, 142–150 (2012).

  9. Peng, Z. et al. Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat. Biotechnol. 30, 253–260 (2012).

  10. Ramaswami, G. et al. Identifying RNA editing sites using RNA sequencing data alone. Nat. Methods 10, 128–132 (2013).

  11. Blanco, S. & Frye, M. Role of RNA methyltransferases in tissue renewal and pathology. Curr. Opin. Cell Biol. 31, 1–7 (2014).

  12. Schaefer, M. et al. RNA methylation by Dnmt2 protects transfer RNAs against stress-induced cleavage. Genes Dev. 24, 1590–1595 (2010).

  13. Blanco, S. et al. Stem cell function and stress response are controlled by protein synthesis. Nature 534, 335–340 (2016).

  14. Sharma, S., Yang, J., Watzinger, P., Kotter, P. & Entian, K. D. Yeast Nop2 and Rcm1 methylate C2870 and C2278 of the 25S rRNA, respectively. Nucleic Acids Res. 41, 9062–9076 (2013).

  15. Schosserer, M. et al. Methylation of ribosomal RNA by NSUN5 is a conserved mechanism modulating organismal lifespan. Nat. Commun. 6, 6158 (2015).

  16. Luo, Y., Feng, J., Xu, Q., Wang, W. & Wang, X. NSun2 deficiency protects endothelium from inflammation via mRNA methylation of ICAM-1. Circ. Res. 118, 944–956 (2016).

  17. Li, Q. et al. NSUN2-mediated m5C methylation and METTL3/METTL14-mediated m6A methylation cooperatively enhance p21 translation. J. Cell Biochem. 118, 2587–2598 (2017).

  18. Shen, Q. et al. Tet2 promotes pathogen infection-induced myelopoiesis through mRNA oxidation. Nature 554, 123–127 (2018).

  19. Guallar, D. et al. RNA-dependent chromatin targeting of TET2 for endogenous retrovirus control in pluripotent stem cells. Nat. Genet. 50, 443–451 (2018).

  20. Yang, X. et al. 5-methylcytosine promotes mRNA export—NSUN2 as the methyltransferase and ALYREF as an m5C reader. Cell Res. 27, 606–625 (2017).

  21. Cheng, J. X. et al. RNA cytosine methylation and methyltransferases mediate chromatin organization and 5-azacytidine response and resistance in leukaemia. Nat. Commun. 9, 1163 (2018).

  22. Khoddami, V. & Cairns, B. R. Identification of direct targets and modified bases of RNA cytosine methyltransferases. Nat. Biotechnol. 31, 458–464 (2013).

  23. Hussain, S. et al. NSun2-mediated cytosine-5 methylation of vault noncoding RNA determines its processing into regulatory small RNAs. Cell Rep. 4, 255–261 (2013).

  24. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

  25. Hussain, S., Aleksic, J., Blanco, S., Dietmann, S. & Frye, M. Characterizing 5-methylcytosine in the mammalian epitranscriptome. Genome Biol. 14, 215 (2013).

  26. Squires, J. E. et al. Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res. 40, 5023–5033 (2012).

  27. Edelheit, S., Schwartz, S., Mumbach, M. R., Wurtzel, O. & Sorek, R. Transcriptome-wide mapping of 5-methylcytidine RNA modifications in bacteria, archaea, and yeast reveals m5C within archaeal mRNAs. PLoS Genet. 9, e1003602 (2013).

  28. Legrand, C. et al. Statistically robust methylation calling for whole-transcriptome bisulfite sequencing reveals distinct methylation patterns for mouse RNAs. Genome Res. 27, 1589–1596 (2017).

  29. Amort, T. et al. Distinct 5-methylcytosine profiles in poly(A) RNA from mouse embryonic stem cells and brain. Genome Biol. 18, 1 (2017).

  30. David, R. et al. Transcriptome-wide mapping of RNA 5-methylcytosine in Arabidopsis mRNAs and noncoding RNAs. Plant Cell 29, 445–460 (2017).

  31. Batista, P. J. et al. m6A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell 15, 707–719 (2014).

  32. Blanco, S. et al. Aberrant methylation of tRNAs links cellular stress to neuro-developmental disorders. EMBO J. 33, 2020–2039 (2014).

  33. Hoernes, T. P. et al. Nucleotide modifications within bacterial messenger RNAs regulate their translation and are able to rewire the genetic code. Nucleic Acids Res. 44, 852–862 (2016).

  34. Park, J. E., Yi, H., Kim, Y., Chang, H. & Kim, V. N. Regulation of poly(A) tail and translation during the somatic cell cycle. Mol. Cell 62, 462–471 (2016).

  35. Stumpf, C. R., Moreno, M. V., Olshen, A. B., Taylor, B. S. & Ruggero, D. The translational landscape of the mammalian cell cycle. Mol. Cell 52, 574–582 (2013).

  36. Sazanov, L. A. A giant molecular proton pump: structure and mechanism of respiratory complex I. Nat. Rev. Mol. Cell Biol. 16, 375 (2015).

  37. Stroud, D. A. et al. Accessory subunits are integral for assembly and function of human mitochondrial complex I. Nature 538, 123 (2016).

  38. Colombini, M. VDAC: the channel at the interface between mitochondria and the cytosol. Mol. Cell. Biochem. 256, 107–115 (2004).

  39. Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).

  40. Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).

  41. Tan, M. H. et al. Dynamic landscape and regulation of RNA editing in mammals. Nature 550, 249–254 (2017).

  42. Zhang, R. et al. Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nat. Methods 11, 51–54 (2014).

  43. Dominissini, D. et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206 (2012).

  44. Schmittgen, T. D. & Livak, K. J. Analyzing real-time PCR data by the comparative CT method. Nat. Protocols 3, 1101 (2008).

  45. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).

  46. Kopylova, E., Noe, L. & Touzet, H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).

  47. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

  48. Frazee, A. C., Jaffe, A. E., Langmead, B. & Leek, J. T. Polyester: simulating RNA-seq datasets with differential transcript expression. Bioinformatics 31, 2778–2784 (2015).

  49. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  50. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

  51. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  52. Chan, P. P. & Lowe, T. M. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 44, D184–D189 (2016).

  53. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).

  54. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

  55. Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  56. Artieri, C. G. & Fraser, H. B. Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation. Genome Res. 24, 2011–2021 (2014).

  57. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

Acknowledgements

We thank J.B. Li and members of R.Z.’s laboratory for critical discussion of the project and L. Wu for manuscript editing. We thank SYSU Ecology and Evolutionary Biology Sequencing Core Facility for the sequencing service. This study was supported by grants from the National Key R&D Program of China (no. 2018YFC1003100), Guangdong Major Science and Technology Projects (no. 2017B020226002 to R.Z.), Guangdong Innovative and Entrepreneurial Research Team Program (no. 2016ZT06S638 to R.Z.) and National Natural Science Foundation of China (nos. 91631108 and 31571341 to R.Z.).

Author information

Contributions

R.Z. conceived the project. T.H. and W.C. conducted the experiments. J.L. performed the bioinformatics analysis. J.L., T.H., W.C., N.G. and R.Z. wrote the manuscript.

Corresponding author

Correspondence to Rui Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Evaluation of the sequencing mappers and RNA bisulfite treatment conditions.

(a) Comparison of different sequencing mappers. The mapping efficiency and accuracy of each mapper were evaluated with 100 bp (left) or 50 bp (right) simulated paired-end reads generated with the R package polyester. Six mapping strategies were tested: Bowtie2 (2.2.9), HISAT2 (2.10.0), HISAT2 plus Bowtie2, meRanGh (HISAT2-2.0.4) and meRanGs (STAR-2.5.2b) from meRanTK-1.20, and BS-RNA (2.10.0). Mapping rates are shown on the y axis. Pseudogene denotes simulated reads originating from pseudogenes that were mapped to their parent genes; although these reads were not mapped to their original genomic coordinates, they are not necessarily incorrect mappings. (b) Comparison of the coverage of each ERCC mix between the medium-stringency and high-stringency conditions. Coverage was normalized to all detected ERCC reads in each library (Methods). The ERCC mix concentrations reported in the manufacturer’s protocol are shown on the x axis; each concentration comprises several ERCC transcripts. (c) Distribution of coverage across the in vitro transcribed transcript, normalized to the total base count of the in vitro transcript. Dashed lines indicate the positions of the 5 Cs. C380 was omitted because of a marked drop in sequencing depth at that position, indicating that the full-length transcript was not successfully synthesized.
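The mapper comparison in (a) rests on classifying each simulated read against its known origin. A minimal sketch of that bookkeeping follows, assuming each read carries its simulated coordinate, an optional parent-gene coordinate for pseudogene-derived reads, and the coordinate reported by the mapper. The `SimRead` record, its field names and the 5 bp tolerance are hypothetical illustrations, not the polyester output format or the exact criteria used in the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimRead:
    """One simulated read; mapped_* fields are None if the read is unmapped.
    All field names are hypothetical, not a polyester or mapper format."""
    read_id: str
    true_chrom: str
    true_pos: int                       # simulated origin (1-based)
    from_pseudogene: bool               # read simulated from a pseudogene
    parent_chrom: Optional[str] = None  # corresponding parent-gene locus
    parent_pos: Optional[int] = None
    mapped_chrom: Optional[str] = None
    mapped_pos: Optional[int] = None

def classify(read: SimRead, tol: int = 5) -> str:
    """Return 'unmapped', 'correct', 'pseudogene' or 'incorrect'."""
    if read.mapped_chrom is None or read.mapped_pos is None:
        return "unmapped"

    def near(chrom, pos):
        # same chromosome and start within tol bp of the expected position
        return (chrom == read.mapped_chrom and pos is not None
                and abs(pos - read.mapped_pos) <= tol)

    if near(read.true_chrom, read.true_pos):
        return "correct"
    if read.from_pseudogene and near(read.parent_chrom, read.parent_pos):
        return "pseudogene"  # mapped to the parent gene: not necessarily wrong
    return "incorrect"

def mapping_rates(reads):
    """Fraction of reads in each category."""
    counts = {"unmapped": 0, "correct": 0, "pseudogene": 0, "incorrect": 0}
    for r in reads:
        counts[classify(r)] += 1
    return {k: v / len(reads) for k, v in counts.items()}
```

Keeping "pseudogene" as its own category, rather than folding it into "incorrect", mirrors the caption's point that such reads are not necessarily mis-mapped.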

Supplementary Fig. 2 Characterizing the conversion rate, C-reads and m5C site distribution in different studies.

(a) Comparison of conversion rates among different studies. The raw sequencing data from all studies were mapped using our pipeline, and all annotated transcripts were used to estimate the overall conversion rate. For our samples, conversion rates estimated using ERCC mixes are also shown. Annot., all genes from the Ensembl annotation. (b) Cumulative distributions of C-reads among different studies. The number of Cs (C-content) in each C-read is shown on the x axis. A lowly converted sample and a highly converted sample from this study, along with three samples from other studies, are shown as examples. The rankings of the samples by overall conversion rate within their studies are given in parentheses. Forward (blue) and reverse (orange) reads are plotted separately. The dashed line indicates a C-content of 3. (c) Comparison of gene-specific conversion rates among BS-seq data generated with different library construction protocols. The same five samples as in (b) are shown as examples. Genes passing two C-position coverage cutoffs are shown: >1,000 (grey) and >10,000 (blue). The overall conversion rate (orange) and the conversion rate estimated from ERCC mixes (red) are indicated as dashed lines. (d) Stacked barplot showing the number of genes with different numbers of putative m5C sites, using the sites reported in the original studies. The mRNA m5C site list of Legrand et al. is not available. (e) Stacked barplot showing the number of m5C sites in different cluster statuses (Methods). (f) Barplot showing the Gini coefficient in each sample.
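The conversion-rate and C-content quantities in panels (a) to (c) can be sketched as follows, assuming per-position base counts at reference C positions that are treated as unmethylated, so that the conversion rate is T / (C + T). Reading the "C-position coverage cutoff" as summed C + T coverage over a gene's C positions is an interpretation, and all function names are hypothetical rather than the pipeline's own code.

```python
from collections import defaultdict

def conversion_rate(c_count, t_count):
    """C-to-T conversion rate T / (C + T) at reference C positions that are
    assumed to be unmethylated (for example, ERCC spike-in transcripts)."""
    total = c_count + t_count
    if total == 0:
        raise ValueError("no coverage at reference C positions")
    return t_count / total

def pooled_conversion_rate(per_position):
    """Overall rate from a list of (C, T) base counts pooled over C positions."""
    c = sum(p[0] for p in per_position)
    t = sum(p[1] for p in per_position)
    return conversion_rate(c, t)

def gene_conversion_rates(records, min_coverage=1000):
    """records: (gene, C, T) per reference C position. Returns
    {gene: rate} for genes whose summed C + T coverage exceeds min_coverage."""
    c_sum, t_sum = defaultdict(int), defaultdict(int)
    for gene, c, t in records:
        c_sum[gene] += c
        t_sum[gene] += t
    return {g: conversion_rate(c_sum[g], t_sum[g])
            for g in c_sum if c_sum[g] + t_sum[g] > min_coverage}

def c_content(read_seq):
    """Number of Cs remaining in a (forward-oriented) read."""
    return read_seq.upper().count("C")

def cumulative_c_content(reads, max_c=10):
    """Cumulative fraction of reads with C-content <= k, for k = 0..max_c;
    reads with more than max_c Cs are counted in the last bin."""
    counts = [0] * (max_c + 1)
    for seq in reads:
        counts[min(c_content(seq), max_c)] += 1
    out, running = [], 0
    for x in counts:
        running += x
        out.append(running / len(reads))
    return out
```

In a well-converted library, the cumulative curve should rise steeply near a C-content of 0, whereas incomplete conversion leaves a long tail of reads with many residual Cs.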

Supplementary Fig. 3 Verification of RNA m5C via RIP RT-PCR and m5C RIP-seq.

(a) Fold changes between IP and input samples of in vitro transcribed RNAs with different numbers of m5C sites, relative to the non-m5C control RNA. Three anti-m5C antibodies (Diagenode, C15200003; Zymo Research, A3001; Abcam, ab10805) were examined using RIP RT-PCR. Three transcripts (Oligo-C, Oligo-10m5C and Oligo-28m5C, Supplementary Table 5) containing 0, 10 and 28 m5C sites, respectively, were used. Error bars, standard deviation based on three technical replicates. (b) Top: dot blot of in vitro transcribed m5C and non-m5C RNAs probed with an anti-m5C antibody (Diagenode, C15200003). Bottom: methylene blue staining of the RNAs served as a loading control. (c, d) Relationship between the enrichment fold and the number of m5C sites in the in vitro transcribed transcripts. (c) Transcripts containing 0, 1, 3, 5, 8, 10 or 28 m5Cs were examined using RIP RT-PCR. Briefly, m5C and non-m5C RNAs in IP and input samples were measured by RT-PCR, and the log2 fold change between m5C and non-m5C RNAs was then calculated. (d) For the transcript with 5 m5C sites, m5C and non-m5C transcripts were mixed at different allelic frequencies (0, 10, 20 and 50%), and two transcripts were tested. Error bars, standard deviation based on three technical replicates. (e) Relationship between the enrichment fold and the number of m6A sites in the in vitro transcribed transcripts. Transcripts containing 0, 1, 2, 3, 4, 5, 8, 10 or 17 m6As were examined using RIP RT-PCR. For the transcript with 1 m6A site, m6A and non-m6A transcripts were mixed at different allelic frequencies (20, 40 and 80%). Error bars, standard deviation based on three technical replicates. (f) RPKM of CTP- or m5CTP-transcribed transcripts in m5C RIP-seq, corresponding to Fig. 2c. (g) Control window selection in m5C RIP-seq data analysis. An RPKM cutoff on the input was applied to remove lowly expressed windows (grey). Next, the two overlapping sliding windows containing a given m5C site were selected as the m5C-containing windows (orange). The upstream and downstream windows located 200 nt away from the m5C-containing windows were selected as control windows (green). (h) Cumulative distributions of winscores of windows with either non-clustered or clustered m5C sites in HeLa cells. Non-clustered, windows containing the high-confidence sites called using our pipeline; clustered, windows containing the sites called without sample-specific filters, minus windows containing the high-confidence sites. Control windows were selected from the regions upstream and downstream of the m5C-site-containing windows. Windows with RPKM ≥ 1 in the input sample were required for analysis. The difference between control windows and m5C candidate windows was assessed using the Kolmogorov–Smirnov test.
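The log2 fold change between m5C and non-m5C RNAs in IP versus input, as described for (c), can be computed from RT-PCR threshold cycles with the comparative CT method (Schmittgen & Livak, ref. 44). The sketch below assumes roughly 100% amplification efficiency and uses the input as the reference for each RNA. Function names are hypothetical, and the example Ct values are illustrative numbers, not data from this study.

```python
import statistics

def log2_enrichment(ct_ip_mod, ct_input_mod, ct_ip_ctrl, ct_input_ctrl):
    """log2 enrichment of modified vs. control RNA in IP relative to input,
    via the comparative CT (2^-ddCt) method, assuming ~100% PCR efficiency.
    Lower Ct means more template, hence the sign flip on ddCt."""
    d_ct_mod = ct_ip_mod - ct_input_mod      # modified RNA: IP vs. input
    d_ct_ctrl = ct_ip_ctrl - ct_input_ctrl   # control RNA: IP vs. input
    dd_ct = d_ct_mod - d_ct_ctrl
    return -dd_ct

def summarize(replicates):
    """Mean and standard deviation of log2 enrichment over technical
    replicates, each given as a 4-tuple of Ct values."""
    values = [log2_enrichment(*r) for r in replicates]
    return statistics.mean(values), statistics.stdev(values)
```

For example, Ct values of 20 (IP, m5C), 22 (input, m5C), 25 (IP, control) and 22 (input, control) give a log2 enrichment of 5, that is, a 32-fold enrichment of the m5C RNA.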

Supplementary Fig. 4 The development of the computational filters.

(a) Distribution of clustered sites across different C-cutoffs. One sample is used here to illustrate how the C-cutoff filter is determined from the Gini coefficient: one HeLa cell sample from Yang et al. was re-analyzed using our computational pipeline, and the number of sites per gene called with each C-cutoff is shown, along with the Gini coefficient. In this sample, the Gini-coefficient-determined C-cutoff was 3, the largest C-cutoff at which the corresponding Gini coefficient was < 0.15. (b) Examples of false-positive sites removed by the C-cutoff filter. After the C-cutoff filter was applied, the minimal frequencies of these two sites were < 0.1, and the sites were therefore removed. Sites from one HeLa cell sample of Yang et al. and one mouse muscle sample from this study are shown. The base counts of sequenced Cs or Ts (y axis) are plotted against the C-content of the corresponding reads (x axis), where C-content is defined as the number of Cs in a given read. The Gini-coefficient-determined C-cutoff is 3 in both samples. (c) Examples of false-positive sites removed by the signal ratio filter. (d) Examples of high-confidence sites. Reads covering these two sites contain only 0–3 Cs. (e) Outline of the filtering procedure applied in our computational pipeline. Red numbers give the number of sites remaining after each filtering step. The proportions of sites with significantly decreased m5C levels (p < 0.05, two-sided Fisher’s exact test) upon NSUN2 knockdown after each filtering step are given in parentheses. These numbers are for the HeLa cell BS-seq data with two biological replicates; the sites shown in filtering steps 1 to 6 are the union of the two replicates. (f) Relationship between the cluster status of the sites and the proportion of sites with decreased m5C levels upon NSUN2 knockdown in HeLa cells. The cluster degree of m5C sites called with different C-cutoffs is shown as a barplot, and the proportions of sites with significantly decreased m5C levels (p < 0.05, two-sided Fisher’s exact test) in the knockdown sample are shown as a line plot.
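The C-cutoff selection in (a) can be sketched as follows: compute the Gini coefficient of the number of sites per gene for each candidate cutoff, then keep the largest cutoff whose Gini coefficient stays below 0.15. Two points are assumptions here: the read filter is taken to discard reads whose C-content exceeds the cutoff (consistent with panel d, not a confirmed definition), and the Gini coefficient uses the standard sorted-values formula. Function names are hypothetical.

```python
def passes_c_cutoff(read_seq, c_cutoff):
    """Assumed read filter: keep a read only if its C-content (number of
    Cs in the read) does not exceed the C-cutoff."""
    return read_seq.upper().count("C") <= c_cutoff

def gini(values):
    """Gini coefficient of non-negative counts (0 = perfectly even;
    values near 1 = sites concentrated in few genes)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

def choose_c_cutoff(sites_per_gene_by_cutoff, max_gini=0.15):
    """sites_per_gene_by_cutoff: {C-cutoff: [number of sites in each gene]}.
    Returns the largest cutoff whose Gini coefficient is below max_gini."""
    passing = [c for c, counts in sites_per_gene_by_cutoff.items()
               if gini(counts) < max_gini]
    return max(passing) if passing else None
```

A high Gini coefficient flags false-positive "clustered" sites, which tend to pile up within a few poorly converted genes rather than being spread evenly across genes.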

Supplementary Fig. 5 Only NSUN2 regulates mRNA m5C in HeLa cells.

(a) Expression levels of the NSUN methyltransferase family and DNMT2 in HeLa cells. Data were obtained from the Human Protein Atlas (https://www.proteinatlas.org). (b) Comparison of the expression levels of NSUN2, NSUN3, NSUN4 and NSUN6 in control and knockdown samples. mRNA expression was measured using RT-PCR with three technical replicates. P-values were calculated using a two-sided Student’s t-test: **P < 0.01; ***P < 0.001. Error bars denote standard deviation. (c) Western blot analysis of NSUN2, NSUN4 and NSUN6 upon siRNA knockdown. NSUN3 data are not shown because the NSUN3 antibody did not work. (d) Boxplot showing m5C levels in control and knockdown HeLa cells, using the m5C sites identified in the control sample. Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ± 1.5 × IQR.
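The per-site knockdown comparison used in Supplementary Fig. 4e and 4f (two-sided Fisher’s exact test, p < 0.05) can be sketched on a 2 × 2 table of C and T read counts in control versus knockdown samples. This dependency-free version sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is the usual two-sided definition. Requiring the knockdown level to be lower is an assumption about how "decreased" is defined, and the function names are hypothetical.

```python
from math import comb

def _table_prob(a, row1, col1, n):
    """Hypergeometric probability of a 2x2 table with top-left cell a."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = _table_prob(a, row1, col1, n)
    probs = [_table_prob(x, row1, col1, n) for x in range(lo, hi + 1)]
    # small relative tolerance so ties with p_obs are not lost to rounding
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def significantly_decreased(ctrl_c, ctrl_t, kd_c, kd_t, alpha=0.05):
    """Rows: control / knockdown; columns: C (methylated) / T (converted)."""
    level_ctrl = ctrl_c / (ctrl_c + ctrl_t)
    level_kd = kd_c / (kd_c + kd_t)
    p = fisher_exact_two_sided(ctrl_c, ctrl_t, kd_c, kd_t)
    return level_kd < level_ctrl and p < alpha
```

In practice, a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be used instead. The explicit version here just makes the test's logic visible.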

Supplementary Fig. 6 tRNA contamination does not affect the identification of mRNA m5C sites.

tRNA information was obtained from GtRNAdb. All reads covering the m5C sites were extracted and mapped with Bowtie2 to the reference sequences (C-to-T converted tRNAs with 100 bp flanking regions). The number of sites with reads overlapping tRNAs by > 10 bp is shown. Reads with > 10 bp overlap with tRNA sequences were further aligned to the tRNAs with blastn (E-value < 0.1), and the number of sites with reads having > 40 or > 45 matched bases is also shown. HeLa, libraries from HeLa cells; Human, all libraries from human tissues; Mouse, all libraries from mouse tissues.
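The two steps of the contamination check can be sketched as follows: in silico C-to-T conversion of the flanked tRNA references, and flagging sites whose covering reads overlap the mature tRNA by more than 10 bp. The alignment tuples, the half-open coordinates and the function names are hypothetical stand-ins for parsed Bowtie2 output. The blastn step is not reproduced here.

```python
def c_to_t(seq):
    """In silico bisulfite conversion of a reference: every C becomes T."""
    return seq.upper().replace("C", "T")

def overlap(a_start, a_end, b_start, b_end):
    """Length of overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def sites_with_trna_reads(site_reads, trna_spans, min_overlap=10):
    """site_reads: {site_id: [(ref_name, start, end), ...]} alignments of the
    reads covering each m5C site to the flanked tRNA references.
    trna_spans: {ref_name: (start, end)} of the mature tRNA inside each
    flanked reference (e.g. starting at 100 with 100 bp flanks).
    Returns the ids of sites with any read overlapping a tRNA by more
    than min_overlap bp."""
    flagged = set()
    for site, alignments in site_reads.items():
        for ref, start, end in alignments:
            t_start, t_end = trna_spans[ref]
            if overlap(start, end, t_start, t_end) > min_overlap:
                flagged.add(site)
                break  # one overlapping read is enough to flag the site
    return flagged
```

Counting reads against the mature tRNA span, rather than the whole flanked reference, keeps reads that fall only in the genomic flanks from being counted as tRNA contamination.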

Supplementary Fig. 7 The generation of NSUN2 knockout cells.

(a) Schematic of NSUN2 knockout generation using the CRISPR–Cas9 system. The sgRNA sequence and mutant information are indicated. A single-base deletion was introduced at C177 in the CDS, resulting in a premature stop codon at amino acid position 65. (b) Sanger sequencing validation of NSUN2 mutagenesis; wild-type and NSUN2 mutant sequences are shown. (c, d) Western blot validation of NSUN2 mutagenesis (c) and rescue (d).
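The frameshift reasoning in (a) can be illustrated with a short sketch. A single-base deletion shifts the reading frame downstream of the deleted base, so a new in-frame stop codon can appear a few codons later. The NSUN2 CDS is not included here, so the test uses a hypothetical toy sequence. With the real CDS, deleting position 177 and scanning the new frame would be expected to give the stop at codon 65 reported above. Function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codon_index(cds_pos):
    """1-based codon (amino acid) number containing 1-based CDS position."""
    return (cds_pos - 1) // 3 + 1

def first_stop_after_deletion(cds, del_pos):
    """Delete the base at 1-based CDS position del_pos and return the 1-based
    codon number of the first in-frame stop codon in the mutant, or None."""
    mutant = cds[:del_pos - 1] + cds[del_pos:]
    for i in range(0, len(mutant) - 2, 3):
        if mutant[i:i + 3].upper() in STOP_CODONS:
            return i // 3 + 1
    return None
```

For instance, `codon_index(177)` places C177 in codon 59, so the reported stop at position 65 lies six codons downstream of the deletion.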

Supplementary Fig. 8 Overlap of m5C sites between replicates.

(a, b) Top: numbers and levels of sites that are shared between replicates or found in one replicate only, in human (a) and mouse (b). Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ± 1.5 × IQR. Bottom: scatter plots comparing m5C levels between replicates. Three human and mouse tissues are shown as examples. Only sites covered by at least 20 reads in both samples are shown. For mouse tissues with multiple replicates, two replicates were randomly selected (one from our samples and one from Yang et al.’s samples) for analysis.
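The replicate scatter plots can be reproduced in outline by pairing the methylation levels C / (C + T) of sites present in both replicates with at least 20 covering reads in each. The Pearson correlation included below is an assumed summary statistic that is not reported in the caption, and the input dictionaries and function names are hypothetical.

```python
import math

def shared_site_levels(rep1, rep2, min_reads=20):
    """rep1, rep2: {site_id: (C count, T count)}. Returns paired methylation
    levels C / (C + T) for sites covered by >= min_reads in both replicates."""
    pairs = []
    for site in sorted(rep1.keys() & rep2.keys()):
        c1, t1 = rep1[site]
        c2, t2 = rep2[site]
        if c1 + t1 >= min_reads and c2 + t2 >= min_reads:
            pairs.append((c1 / (c1 + t1), c2 / (c2 + t2)))
    return pairs

def pearson_r(pairs):
    """Pearson correlation of paired levels (assumed summary, not from the
    caption); needs at least two pairs with non-zero variance."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The coverage threshold matters because the level of a site covered by only a few reads can change by tens of percentage points from a single read.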

Supplementary Fig. 9 Characterizing the high-confidence m5C sites in human and mouse.

(a) Number of m5C sites identified in human and mouse tissues or cells; the number of mapped reads in each library is shown as dots. (b) Genic positions of m5C sites. The 5′ UTR, CDS and 3′ UTR of protein-coding genes are shown; ncRNA, non-coding transcripts. (c) Distribution of m5C sites across transcripts. Sites identified in all samples were first combined, and each m5C site was then assigned to the longest isoform of the corresponding gene. The number of bins was set according to the relative lengths of the 5′ UTR, CDS and 3′ UTR (human, 10, 50 and 40 bins; mouse, 10, 60 and 50 bins). The red curve was fitted with polyfit. (d) Cluster degree of m5C sites in human and mouse.
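The metagene binning in (c) can be sketched by splitting each region of the longest isoform (5′ UTR, CDS, 3′ UTR) into its own fixed number of bins and placing a site according to its relative position within its region. The default bin counts are the human values given above (10, 50, 40). The 0-based transcript coordinate and the function names are assumptions.

```python
def metagene_bin(pos, utr5_len, cds_len, utr3_len, bins=(10, 50, 40)):
    """Map a 0-based position on the transcript (5' UTR + CDS + 3' UTR) to a
    global bin index; each region is split into its own number of bins."""
    lengths = (utr5_len, cds_len, utr3_len)
    offset_bins, start = 0, 0
    for length, n_bins in zip(lengths, bins):
        if start <= pos < start + length:           # zero-length regions skipped
            rel = (pos - start) / length            # relative position in [0, 1)
            return offset_bins + min(int(rel * n_bins), n_bins - 1)
        start += length
        offset_bins += n_bins
    raise ValueError("position outside transcript")

def bin_counts(sites, bins=(10, 50, 40)):
    """sites: iterable of (pos, utr5_len, cds_len, utr3_len) tuples."""
    counts = [0] * sum(bins)
    for pos, u5, cds, u3 in sites:
        counts[metagene_bin(pos, u5, cds, u3, bins)] += 1
    return counts
```

For mouse, passing `bins=(10, 60, 50)` gives the 120-bin layout described in the caption.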

Supplementary Fig. 10 The profiles of mRNA m5C in different tissues.

(a) Number of sites shared between tissues in human and mouse. (b) Heatmap of normalized m5C levels in human and mouse. For each species, only sites with methylation levels ≥ 10% in at least one sample were selected for analysis. Methylation levels were normalized across samples for each site, and NA values were treated as zero. (c) Gene Ontology (GO) terms enriched in m5C-containing genes in mouse muscle and heart. Sites with methylation levels > 20% were selected. GO terms with a BH-corrected p value < 0.01, an enrichment score > 1.5 and gene counts > 10 are shown. GO term analysis was performed using DAVID, with the biological process, molecular function and cellular component categories selected.
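The site selection and per-site normalization for the heatmap in (b) can be sketched as follows. The caption does not specify the normalization method, so min-max scaling of each site across samples is assumed here (a per-site z-score would be an equally plausible reading). Missing values are treated as zero before both selection and scaling. The input layout and function name are hypothetical.

```python
def select_and_normalize(levels, min_level=0.10):
    """levels: {site_id: [methylation level per sample, or None for NA]}.
    Treats NA as 0, keeps sites with level >= min_level in at least one
    sample, then min-max scales each site's values to [0, 1] (assumed)."""
    out = {}
    for site, row in levels.items():
        vals = [0.0 if v is None else v for v in row]
        if max(vals) < min_level:
            continue  # never reaches 10% in any sample
        lo, hi = min(vals), max(vals)
        span = hi - lo
        out[site] = [(v - lo) / span if span > 0 else 0.0 for v in vals]
    return out
```

Scaling each row separately emphasizes which tissues carry the highest relative methylation at a site, at the cost of hiding differences in absolute level between sites.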

Supplementary information

Supplementary Figs. 1–10 and Supplementary Notes 1–8

Reporting Summary

Supplementary Table 1

Comparison of BS-seq protocols between different studies

Supplementary Table 2

The mapping statistics and conversion status of all RNA BS-seq libraries studied

Supplementary Table 3

The mapping statistics of m5C RIP-seq libraries

Supplementary Table 4

m5C site lists in human and mouse

Supplementary Table 5

Sequences of DNA templates used in this study

About this article

Cite this article

Huang, T., Chen, W., Liu, J. et al. Genome-wide identification of mRNA 5-methylcytosine in mammals. Nat Struct Mol Biol 26, 380–388 (2019). https://doi.org/10.1038/s41594-019-0218-x

  • DOI: https://doi.org/10.1038/s41594-019-0218-x
