Systematic calibration of epitranscriptomic maps using a synthetic modification-free RNA library

Abstract

Recent years have witnessed rapid progress in the field of epitranscriptomics. Functional interpretation of the epitranscriptome relies on sequencing technologies that determine the location and stoichiometry of various RNA modifications. However, contradictory results have been reported among studies, bringing the biological impacts of certain RNA modifications into doubt. Here, we develop a synthetic RNA library resembling the endogenous transcriptome but without any RNA modification. By incorporating this modification-free RNA library into established mapping techniques as a negative control, we reveal abundant false positives resulting from sequence bias or RNA structure. After calibration, precise and quantitative mapping expands the understanding of two representative modification types, N6-methyladenosine (m6A) and 5-methylcytosine (m5C). We propose that this approach provides a systematic solution for the calibration of various RNA-modification mappings and holds great promise in epitranscriptomic studies.

Fig. 1: In vitro transcription generates a modification-free RNA library.
Fig. 2: The IVT RNA library used as a negative control in MeRIP-seq/m6A-seq.
Fig. 3: Detection of single-nucleotide m6A sites by m6A-REF-seq/MAZTER-seq using the IVT RNA library as a negative control.
Fig. 4: Quantitative m6A mapping with single-nucleotide resolution.
Fig. 5: Detection of RNA m5C based on BS-seq coupled with an IVT RNA library used as a negative control.

Data availability

All data generated for this paper have been deposited at NCBI’s Gene Expression Omnibus under accession number GSE151028. Source data are provided with this paper.

Code availability

Code and scripts are available at https://github.com/Epigenomics-Lab/RNAMod.

References

  1. He, C. Grand challenge commentary: RNA epigenetics? Nat. Chem. Biol. 6, 863–865 (2010).

  2. Saletore, Y. et al. The birth of the epitranscriptome: deciphering the function of RNA modifications. Genome Biol. 13, 8–12 (2012).

  3. Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).

  4. Zaccara, S., Ries, R. J. & Jaffrey, S. R. Reading, writing and erasing mRNA methylation. Nat. Rev. Mol. Cell Biol. 20, 608–624 (2019).

  5. Linder, B. & Jaffrey, S. R. Discovering and mapping the modified nucleotides that comprise the epitranscriptome of mRNA. Cold Spring Harb. Perspect. Biol. 11, a032201 (2019).

  6. Li, X., Xiong, X. & Yi, C. Epitranscriptome sequencing technologies: decoding RNA modifications. Nat. Methods 14, 23–31 (2016).

  7. Motorin, Y. & Helm, M. Methods for RNA modification mapping using deep sequencing: established and new emerging technologies. Genes 10, 35 (2019).

  8. Helm, M. & Motorin, Y. Detecting RNA modifications in the epitranscriptome: predict and validate. Nat. Rev. Genet. 18, 275–291 (2017).

  9. Schwartz, S. & Motorin, Y. Next-generation sequencing technologies for detection of modified nucleotides in RNAs. RNA Biol. 14, 1124–1137 (2017).

  10. Dominissini, D. et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206 (2012).

  11. Meyer, K. D. et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell 149, 1635–1646 (2012).

  12. Gu, W., Hurto, R. L., Hopper, A. K., Grayhack, E. J. & Phizicky, E. M. Depletion of Saccharomyces cerevisiae tRNAHis guanylyltransferase Thg1p leads to uncharged tRNAHis with additional m5C. Mol. Cell. Biol. 25, 8191–8201 (2005).

  13. Schaefer, M., Pollex, T., Hanna, K. & Lyko, F. RNA cytosine methylation analysis by bisulfite sequencing. Nucleic Acids Res. 37, e12 (2009).

  14. Dominissini, D. et al. The dynamic N1-methyladenosine methylome in eukaryotic messenger RNA. Nature 530, 441–446 (2016).

  15. Li, X. et al. Transcriptome-wide mapping reveals reversible and dynamic N1-methyladenosine methylome. Nat. Chem. Biol. 12, 311–316 (2016).

  16. Marchand, V., Blanloeil-Oillo, F., Helm, M. & Motorin, Y. Illumina-based RiboMethSeq approach for mapping of 2′-O-Me residues in RNA. Nucleic Acids Res. 44, e135 (2016).

  17. Dai, Q. et al. Nm-seq maps 2′-O-methylation sites in human mRNA with base precision. Nat. Methods 14, 695–698 (2017).

  18. Schwartz, S. et al. Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell 159, 148–162 (2014).

  19. Wang, Y., Xiao, Y., Dong, S., Yu, Q. & Jia, G. Antibody-free enzyme-assisted chemical approach for detection of N6-methyladenosine. Nat. Chem. Biol. 16, 896–903 (2020).

  20. Linder, B. et al. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat. Methods 12, 767–772 (2015).

  21. Koh, C. W. Q., Goh, Y. T. & Goh, W. S. S. Atlas of quantitative single-base-resolution N6-methyl-adenine methylomes. Nat. Commun. 10, 5636 (2019).

  22. Garcia-Campos, M. A. et al. Deciphering the ‘m6A code’ via antibody-independent quantitative profiling. Cell 178, 731–747 (2019).

  23. Zhang, Z. et al. Single-base mapping of m6A by an antibody-independent method. Sci. Adv. 5, eaax0250 (2019).

  24. Li, X. et al. Base-resolution mapping reveals distinct m1A methylome in nuclear- and mitochondrial-encoded transcripts. Mol. Cell 68, 993–1005 (2017).

  25. Safra, M. et al. The m1A landscape on cytosolic and mitochondrial mRNA at single-base resolution. Nature 551, 251–255 (2017).

  26. Shu, X. et al. A metabolic labeling method detects m6A transcriptome-wide at single base resolution. Nat. Chem. Biol. 16, 887–895 (2020).

  27. Liu, N. et al. Probing N6-methyladenosine RNA modification status at single nucleotide resolution in mRNA and long noncoding RNA. RNA 19, 1848–1856 (2013).

  28. Liu, W. et al. Identification of a selective DNA ligase for accurate recognition and ultrasensitive quantification of N6-methyladenosine in RNA at one-nucleotide resolution. Chem. Sci. 9, 3354–3359 (2018).

  29. Xiao, Y. et al. An elongation and ligation-based qPCR amplification method for the radiolabeling-free detection of locus-specific N6-methyladenosine modifications. Angew. Chem. Int. Ed. Engl. 57, 15995–16000 (2018).

  30. Meyer, K. D. DART-seq: an antibody-free method for global m6A detection. Nat. Methods 16, 1275–1280 (2019).

  31. Worpenberg, L., Jakobi, T., Dieterich, C. & Roignant, J.-Y. Identification of methylated transcripts using the TRIBE approach. In Epitranscriptomics (eds Wajapeyee, N. & Gupta, R.) Methods in Molecular Biology Vol. 1870, 89–106 (Humana Press, 2019).

  32. Grozhik, A. V. & Jaffrey, S. R. Distinguishing RNA modifications from noise in epitranscriptome maps. Nat. Chem. Biol. 14, 215–225 (2018).

  33. Slama, K. et al. Determination of enrichment factors for modified RNA in MeRIP experiments. Methods 156, 102–109 (2019).

  34. Helm, M., Lyko, F. & Motorin, Y. Limited antibody specificity compromises epitranscriptomic analyses. Nat. Commun. 10, 5669 (2019).

  35. McIntyre, A. B. R. et al. Limits in the detection of m6A changes using MeRIP/m6A-seq. Sci. Rep. 10, 6590 (2020).

  36. Zhou, H. et al. Evolution of a reverse transcriptase to map N1-methyladenosine in human messenger RNA. Nat. Methods 16, 1281–1288 (2019).

  37. Huang, T., Chen, W., Liu, J., Gu, N. & Zhang, R. Genome-wide identification of mRNA 5-methylcytosine in mammals. Nat. Struct. Mol. Biol. 26, 380–388 (2019).

  38. Legrand, C. et al. Statistically robust methylation calling for whole transcriptome bisulfite sequencing reveals distinct methylation patterns for mouse RNAs. Genome Res. 27, 1589–1596 (2017).

  39. Squires, J. E. et al. Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic Acids Res. 40, 5023–5033 (2012).

  40. Yang, X. et al. 5-methylcytosine promotes mRNA export—NSUN2 as the methyltransferase and ALYREF as an m5C reader. Cell Res. 27, 606–625 (2017).

  41. Schwartz, S. et al. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell 155, 1409–1421 (2013).

  42. Meyer, K. D. & Jaffrey, S. R. Rethinking m6A readers, writers, and erasers. Annu. Rev. Cell Dev. Biol. 33, 319–342 (2017).

  43. Molinie, B. et al. m6A-LAIC-seq reveals the census and complexity of the m6A epitranscriptome. Nat. Methods 13, 692–698 (2016).

  44. Schmid, K. et al. Variable presence of 5-methylcytosine in commercial RNA and DNA. RNA Biol. 12, 1152–1158 (2015).

  45. Schmidt, W. M. & Mueller, M. W. CapSelect: a highly sensitive method for 5′ CAP-dependent enrichment of full-length cDNA in PCR-mediated analysis of mRNAs. Nucleic Acids Res. 27, e31 (1999).

  46. Wulf, M. G. et al. Non-templated addition and template switching by Moloney murine leukemia virus (MMLV)-based reverse transcriptases co-occur and compete with each other. J. Biol. Chem. 294, 18220–18231 (2019).

  47. Douvlataniotis, K., Bensberg, M., Lentini, A., Gylemo, B. & Nestor, C. E. No evidence for DNA N6-methyladenine in mammals. Sci. Adv. 6, eaay3335 (2020).

  48. Geula, S. et al. m6A mRNA methylation facilitates resolution of naive pluripotency toward differentiation. Science 347, 1002–1006 (2015).

  49. Liu, J. et al. The RNA m6A reader YTHDC1 silences retrotransposons and guards ES cell identity. Nature 591, 322–326 (2021).

  50. Batista, P. J. et al. m6A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell 15, 707–719 (2014).

  51. Liu, J. et al. N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science 367, 580–586 (2020).

  52. Zhou, J. et al. Dynamic m6A mRNA methylation directs translational control of heat shock response. Nature 526, 591–594 (2015).

  53. Li, H. B. et al. m6A mRNA methylation controls T cell homeostasis by targeting the IL-7/STAT5/SOCS pathways. Nature 548, 338–342 (2017).

  54. Imanishi, M., Tsuji, S., Suda, A. & Futaki, S. Detection of N6-methyladenosine based on the methyl-sensitivity of MazF RNA endonuclease. Chem. Commun. 53, 12930–12933 (2017).

  55. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2012).

  56. Wei, J. et al. Differential m6A, m6Am, and m1A demethylation mediated by FTO in the cell nucleus and cytoplasm. Mol. Cell 71, 973–985 (2018).

  57. Harcourt, E. M., Ehrenschwender, T., Batista, P. J., Chang, H. Y. & Kool, E. T. Identification of a selective polymerase enables detection of N6-methyladenosine in RNA. J. Am. Chem. Soc. 135, 19079–19082 (2013).

  58. Wang, S. et al. N6-methyladenine hinders RNA- and DNA-directed DNA synthesis: application in human rRNA methylation analysis of clinical specimens. Chem. Sci. 7, 1440–1446 (2016).

  59. Lorenz, D. A., Sathe, S., Einstein, J. M. & Yeo, G. W. Direct RNA sequencing enables m6A detection in endogenous transcript isoforms at base-specific resolution. RNA 26, 19–28 (2020).

  60. Parker, M. T. et al. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification. eLife 9, e49658 (2020).

  61. Liu, H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat. Commun. 10, 4079 (2019).

  62. Jenjaroenpun, P. et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 49, e7 (2021).


Acknowledgements

This work was supported by the Ministry of Science and Technology of China (G.-Z.L., National Science and Technology Major Project, grant nos. 2019YFA0802203 and 2018YFA0109100), the National Natural Science Foundation of China (G.-Z.L., grant nos. 31922015, 31870808 and 91753129), the Natural Science Foundation of Guangdong Province (G.-Z.L., grant no. 2018B030306044), the Guangdong Basic and Applied Basic Research Foundation (Z.Z., grant no. 2019A1515110099) and the China Postdoctoral Science Foundation (2019M653164 to Z.Z. and 2020M672949 to T.C.). We thank E. Drummond at the University of British Columbia for editing the manuscript.

Author information

Contributions

Z.Z. and G.-Z.L. conceived the project; Z.Z. and T.C. conducted experiments with assistance from H.-X.C., L.-Q.C., Y.-L.Z., L.J., W.Z., C.L. and W.-S.Z.; Z.Z. analyzed NGS data with the assistance of Y.-Y.X., B.-D.L., G.-S.C. and Y.Z.; Z.Z. and G.-Z.L. wrote the manuscript with input from D.-Z.M., W.H.N., J.C., G.J. and J.Y. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Guan-Zheng Luo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Methods thanks Hongjun Song, Matthias Schaefer, Cornelia Vesely and the other, anonymous, reviewers for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quality control for IVT RNA samples.

a-c) LC-MS/MS detection and quantification of m5C in two commercial NTP samples. Data are presented as mean ± SD (n = 3). ND, not detected. d) Quantification of three RNA modifications (m6A, m5C and m1A) in cellular mRNA and IVT RNA derived from HEK293T cells. Data are presented as mean ± SD (n = 3). e) Cumulative percentage curves of genes recapitulated in the IVT RNA library. Different thresholds (read count ≥2, 5 or 10) were used to determine whether a gene was reproduced in the IVT RNA library. f) Read coverage across genes grouped by length. g-h) Gene counts, lengths, GC contents and expression levels (log10 TPM) for different groups of genes. Genes are grouped according to their expression change or presence in the two libraries derived from HEK293T cells (g) and mESCs (h). The number of genes in each group is indicated in the top-left panel. none: unchanged genes; up: genes upregulated in the IVT RNA library; down: genes downregulated in the IVT RNA library; mRNAmiss: genes present in the IVT RNA library but not in the cellular mRNA library; IVTmiss: genes present in the cellular mRNA library but not in the IVT RNA library. Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). i) SNVs and indels identified in cellular mRNA and IVT RNA libraries derived from HEK293T cells. Data are presented as mean values of two replicates. IVT unique indicates SNVs or indels present only in the IVT RNA library.

Extended Data Fig. 2 The recurrent peaks identified by MeRIP-seq/m6A-seq.

Three commercial antibodies (NEB [E1610S], SYSY [202003] and Abcam [ab190886]) and two popular peak calling algorithms (Macs2 and exomePeak2) are applied for peak identification (see Supplementary Methods section 2 for a detailed description). a-c) The overlapping peaks identified by MeRIP-seq/m6A-seq using different antibodies. Cellular mRNA and the corresponding IVT RNA are derived from HEK293T cells. d) Ratio of recurrent peaks in published MeRIP-seq datasets for human (n = 143) and mouse (n = 66). The m6A-irrelevant peaks (IVT peaks) are compared to specific datasets derived from HEK293T cells (n = 10) or mESCs (n = 14). Box boundaries represent 25th and 75th percentiles; the center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). The same box plot definition applies to panel (h). e) Cumulative frequency curves of IVT peaks recurrently identified in published datasets using different antibodies and peak calling software. f) Representative gene loci showing the false positive peaks identified in published datasets. Read pileups in dark colors represent the peaks in IP samples, whereas light colors represent the reads from input samples. g) Venn diagram showing the miCLIP peaks in the cellular mRNA and IVT RNA libraries. h) Ratio and counts of m6A-irrelevant peaks identified by miCLIP that recurrently appear in published MeRIP-seq datasets (n = 143). The datasets derived from HEK293T cells (n = 10) are processed and compared separately. m6A-irrelevant peaks identified by MeRIP-seq in this study are also processed and shown for comparison. Detailed information on the published datasets used in this study is provided in Supplementary Table 6.

Extended Data Fig. 3 Analyses of MeRIP-seq/m6A-seq in HEK293T using Macs2.

Three commercial antibodies (NEB [E1610S], SYSY [202003] and Abcam [ab190886]) are used for the MeRIP-seq experiments. a) Metagene plot of the transcriptome-wide distribution of each kind of peak, including mRNA (peaks in cellular mRNA), calibrated (mRNA peaks excluding the false positive peaks), false positive (peaks in both cellular mRNA and IVT RNA), IVT (peaks in IVT RNA) and IVT unique (peaks present only in IVT RNA). The peak summit is taken as the peak position. b) Cumulative frequency curves of the distance between each peak summit and the nearest GGACH motif. c) Frequency of the GGACH motif within the 200-nt regions flanking the summit of each kind of peak. d) Nucleotide composition of immunoprecipitated oligonucleotides with randomized sequences (the central position is a defined A or m6A). e) Base ratio for the 200-nt regions flanking the summit of detected peaks. Base ratio = density of an individual base in the peak region divided by its density across the whole transcript. Box boundaries represent 25th and 75th percentiles; the center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). The same box plot definition applies to panel (g). f) The two most significant motifs and the corresponding E-values for false positive peaks identified using the three antibodies. g) Expression levels of genes with peaks, calculated separately from mRNA and IVT RNA. A two-sided t-test or paired t-test is applied to calculate significance. Correlations of expression levels are plotted for genes containing false positive peaks in the mRNA and IVT RNA libraries.
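
The base ratio defined for panel (e) can be illustrated with a minimal sketch. It assumes that the peak region is the window of ±200 nt around the summit and that density means the fraction of A, C, G or U in a sequence; the function names `base_density` and `base_ratio` are hypothetical and are not taken from the authors' code.

```python
from collections import Counter

BASES = "ACGU"

def base_density(seq):
    """Fraction of each base (A, C, G, U) in an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept cDNA-style input as well
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/U bases")
    return {b: counts[b] / total for b in BASES}

def base_ratio(transcript, summit, flank=200):
    """Base density in the peak window divided by density in the whole transcript.

    Assumption: the peak window spans `flank` nt on each side of the summit
    (0-based index), clipped to the transcript ends.
    """
    start = max(0, summit - flank)
    end = min(len(transcript), summit + flank + 1)
    peak = base_density(transcript[start:end])
    background = base_density(transcript)
    # A base absent from the transcript has no defined ratio.
    return {
        b: (peak[b] / background[b] if background[b] > 0 else float("nan"))
        for b in BASES
    }
```

A ratio above 1 for a given base means that the base is over-represented around the peak summit relative to the transcript as a whole.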

Extended Data Fig. 4 RNA modifications detected using LC-MS/MS, dot-blot assays and MeRIP-seq/m6A-seq.

a) LC-MS/MS detection of m6A, m1A and m5C in mESC cellular mRNA, IVT RNA and mRNA from two Mettl3 KO cell lines. b) Dot-blot assays of m6A for mESC cellular mRNA and IVT RNA. c) LC-MS/MS and dot-blot assays of m6A in FTO-treated mRNA. d) IVT RNA or Mettl3 KO samples are used as negative controls for m6A peak calling. The peak summit is taken as the peak position for plotting.

Extended Data Fig. 5 MazF digestion and the false positive rate (FPR) and undigested rate calculated using m6A-REF-seq data.

a) Total ACA motif counts in ERCC spike-ins (Thermo Fisher Scientific, 4456740) and the counts of ACA motifs remaining intact after MazF treatment. Only one ACA site is left after calibration. Only ACA sites with sequencing depth ≥10 are counted. b) Undigested rate for each ACA motif on different ERCC sequences. Only ACA sites with sequencing depth ≥10 in all samples are counted (n = 354 sites in 6 replicates). Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). c) Venn diagrams of the undigested ACA motifs in two replicates of IVT RNA samples derived from HEK293T cells and mESCs. ACA motifs with ≥10 undigested reads are counted. d) Illustration of the false positive rate (FPR) and undigested rate calculated from m6A-REF-seq data. Undigested ACA motifs in the cellular mRNA sample indicate putative m6A sites, whereas ACA motifs undigested in both the cellular mRNA and IVT RNA samples indicate putative false positives. Only ACA sites with depth ≥10 are used in this study. De: the number of digested reads in cellular mRNA; Ue: the number of undigested reads in cellular mRNA; Di: the number of digested reads in IVT RNA; Ui: the number of undigested reads in IVT RNA.
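
The read counts defined in panel (d) can be turned into per-site undigested rates as sketched below. The caption does not spell out the FPR formula, so this sketch deliberately stops at the two undigested rates. It assumes that the undigested rate is the number of undigested reads divided by the total reads at the ACA site, and the function names are hypothetical.

```python
MIN_DEPTH = 10  # depth cutoff stated in the caption

def undigested_rate(undigested, digested, min_depth=MIN_DEPTH):
    """Undigested reads / total reads at one ACA site (assumed definition).

    Returns None when the site does not reach the depth cutoff.
    """
    depth = undigested + digested
    if depth < min_depth:
        return None
    return undigested / depth

def summarize_site(Ue, De, Ui, Di):
    """Undigested rates of one ACA site in cellular mRNA and in IVT RNA.

    Ue/De: undigested/digested reads in cellular mRNA.
    Ui/Di: undigested/digested reads in IVT RNA.
    """
    return {
        "mRNA": undigested_rate(Ue, De),
        "IVT": undigested_rate(Ui, Di),
    }
```

A site whose undigested rate is high in cellular mRNA but close to zero in the IVT control is the pattern expected of a genuine m6A site, whereas a comparable rate in both libraries points to incomplete digestion rather than methylation.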

Extended Data Fig. 6 The sequence context analysis of high confidence m6A sites and comparison with published datasets.

a) Frequency of the 5-mer motifs in which high confidence m6A sites and false-positive sites are located in mESCs. Background indicates the transcript-wide motif frequency. b) Flanking sequences of high confidence m6A sites (upper) and false-positive sites (lower). c) Relative distances of m6A locations derived from MeRIP-seq/m6A-seq and m6A-REF-seq before and after calibration using the IVT RNA control in HEK293T cells. d-e) Comparison and classification of previously reported m6A sites in mESCs (d) and HEK293T cells (e). De novo m6A sites reported in MAZTER-seq with confidence group ≥1 (Garcia-Campos et al., 2019) are labeled as ‘MAZTER-seq’, and the subgroup with prevKnown_ConfidenceGroups of ‘high’ or ‘highest’ is labeled as ‘MAZTER-seq, high’. m6A sites derived from Zhang et al., 2019 using m6A-REF-seq are labeled as ‘m6A-REF-seq’. ‘True’ indicates sites reported in previous works that are also detected as high confidence m6A sites in this study. ‘False’ indicates sites determined to be false positives in this study. ‘No undigested signal on mRNA’ indicates sites completely digested in this study. A sequencing depth cutoff of ≥10 is used to group the sites for evaluation. f-g) The undigested rates of putative m6A sites detected in this and previous studies. The undigested rates of candidate sites were determined from the cleavage status of RNA samples in this study. Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ±1.5× interquartile range (IQR).

Extended Data Fig. 7 Pairing bias of high confidence m6A sites and false positive sites in HEK293T and mESCs.

a) Predicted RNA structures of two synthetic oligos (RNA6 and RNA7) with the ACA motif located near a stem region. The RNA structures are predicted with the Vienna package. b) Gel image of the two synthetic oligos after MazF cleavage. A completely cleaved oligo (RB1) is shown as a positive control. c) Increasing the amount of MazF does not improve the digestion efficiency for oligo RNA7. Sequences of the synthetic oligos are provided in Supplementary Table 22. MazF digestion assays on synthetic oligos were conducted once. d) Pairing bias for high confidence m6A sites and false positive sites. Coordinate 0 indicates the location of the site of interest. Pairing probabilities of the 25 bp upstream and downstream flanking sequences are calculated for the high confidence m6A sites, false-positive sites and randomly sampled sites. Pairing bias = log2(Pair_prob_false(i,j)/Pair_prob_random(i,j)), where Pair_prob(i,j) is the mean pairing probability of bases between position i and position j. Red indicates that the RNA fragments are more likely to form base pairs than those at random sites.
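
The pairing bias formula in panel (d) can be written out as a short sketch. It assumes that the mean pairing probabilities are already available as two square matrices of equal size (one for the sites of interest, one for the random sites); how those matrices are produced (for example, from a structure prediction tool) is outside the sketch. The small pseudocount that guards against division by zero is an addition made here and is not described in the caption.

```python
import math

def pairing_bias(prob_sites, prob_random, pseudocount=1e-6):
    """Element-wise log2 ratio of two mean pairing-probability matrices.

    prob_sites[i][j]  : mean pairing probability between positions i and j
                        around the sites of interest (e.g. false-positive sites).
    prob_random[i][j] : the same quantity around randomly sampled sites.
    pseudocount       : assumed guard against zeros; not from the source.
    """
    if len(prob_sites) != len(prob_random):
        raise ValueError("matrices must have the same number of rows")
    bias = []
    for row_s, row_r in zip(prob_sites, prob_random):
        if len(row_s) != len(row_r):
            raise ValueError("matrices must have the same number of columns")
        bias.append([
            math.log2((ps + pseudocount) / (pr + pseudocount))
            for ps, pr in zip(row_s, row_r)
        ])
    return bias
```

Positive values correspond to the red cells of the figure: position pairs that are more likely to be base-paired around the sites of interest than around random sites.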

Extended Data Fig. 8 Characterization of m6A sites with various stoichiometry (undigested rates) determined by m6A-REF-seq.

Data from HEK293T cells (n = 3,219) and mESCs (n = 2,607) are calibrated and corrected using the IVT RNA control. a-b) The corrected m6A stoichiometry (undigested rates) in two replicates for HEK293T cells (a) and mESCs (b). A two-sided Pearson correlation test was applied to calculate significance. c-d) qPCR quantification for six sites with corrected stoichiometry values. Data are presented as mean values. Detailed information and calculation processes are described in Supplementary Methods (section 3.2) and Supplementary Tables 23-25. e) The distribution of m6A stoichiometry/undigested rates for high confidence m6A sites containing the motifs GGACA, [AA/AG/GA]ACA, and sequence combinations other than RRACA. f) Stoichiometry/undigested rates of m6A sites within each 5-mer motif. Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). The same box plot definition applies to panels (g), (h), (j) and (k). g-h) Frequency of m6A sites from HEK293T cells (g) and mESCs (h) within each 5-mer motif. Sites are divided equally into five groups according to their stoichiometry/undigested rates. i) Metagene plot of the transcriptome-wide m6A distribution with respect to different motifs. j-k) Metagene plot of the m6A distribution with respect to different stoichiometry/undigested rates in HEK293T cells (j) and mESCs (k). l-m) The stoichiometry/undigested rates of pairs of m6A sites located in the same host gene in HEK293T cells (l) and mESCs (m).

Extended Data Fig. 9 Bisulfite-treatment and high confidence m5C sites.

a) The workflow of the BS-seq protocol under high-stringency conditions. b-c) Scatter plots and correlations of gene expression before and after bisulfite treatment for cellular mRNA (b) and IVT RNA (c). d) Scatter plot and correlation of gene expression fold changes for the cellular mRNA and IVT RNA libraries. The fold change is computed as the log2 ratio of the read count of BS-treated RNA to that of the original RNA. e) Illustration of the false positive rate (FPR) and non-conversion rate calculated from BS-seq data. Four representative C sites are shown to demonstrate the detailed process. A non-converted C in the cellular mRNA sample indicates a putative m5C site, whereas a non-converted C in the IVT RNA sample indicates a putative false positive. The non-conversion rate for a high confidence m5C site should be significantly lower in the IVT RNA sample than in the cellular mRNA sample. The FPR value and the putative stoichiometry/non-conversion rate of high confidence m5C sites are calculated as the formulas show. f) The calibrated identities of candidate m5C sites reported in previous studies. The site identities are determined as True (high confidence sites), False (false-positive sites) or Completely convert (unmethylated) by comparing them to the datasets in this study. A sequencing depth cutoff of ≥10 is used to group the sites for evaluation. g) Counts of reproducible sites with different non-conversion rates reported in previous studies. ‘overlapping’ indicates sites identified in both datasets, and ‘non_overlapping’ indicates sites identified only in the current study. Pearson’s correlation of the non-conversion rate (putative m5C stoichiometry) between two studies is calculated and plotted. Huang2019, Yang2017 and Squires2012 indicate the m5C sites reported in Huang et al. (2019), Yang et al. (2017) and Squires et al. (2012), respectively. A two-sided Pearson correlation test was applied to calculate significance.
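
The comparison described in panel (e) can be sketched as follows. The caption states only that the non-conversion rate should be significantly lower in IVT RNA than in cellular mRNA and does not name the statistical test or give the FPR formula. The sketch therefore assumes a non-conversion rate of non-converted reads over total reads at a C site, and uses a one-sided Fisher's exact test purely as an illustrative choice; all function names are hypothetical.

```python
from math import comb

MIN_DEPTH = 10  # depth cutoff stated in the caption

def non_conversion_rate(non_converted, converted, min_depth=MIN_DEPTH):
    """Non-converted reads / total reads at one C site (assumed definition)."""
    depth = non_converted + converted
    if depth < min_depth:
        return None
    return non_converted / depth

def fisher_one_sided(nc_mrna, c_mrna, nc_ivt, c_ivt):
    """One-sided Fisher's exact test (assumed test, not named in the source).

    Returns the probability, under the hypergeometric null, of seeing at
    least `nc_mrna` non-converted reads in the mRNA sample given the margins.
    """
    n_mrna = nc_mrna + c_mrna
    n_ivt = nc_ivt + c_ivt
    total_nc = nc_mrna + nc_ivt
    denom = comb(n_mrna + n_ivt, total_nc)
    upper = min(n_mrna, total_nc)
    tail = sum(
        comb(n_mrna, x) * comb(n_ivt, total_nc - x)
        for x in range(nc_mrna, upper + 1)
    )
    return tail / denom
```

For example, a site with 8 non-converted reads out of 20 in cellular mRNA and 0 out of 20 in IVT RNA yields a small P value under this test, whereas a site with 2 out of 20 in both libraries does not, which matches the intuition that non-conversion shared with the modification-free control is not evidence of m5C.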

Extended Data Fig. 10 RNA m5C BS-seq performance under different conditions, Sanger validation and RNA pairing analysis.

a) The workflow for conducting BS-seq under different bisulfite treatment conditions. b) Comparison of the overall conversion rates estimated from ERCC mixes under different conversion conditions. Data are presented as mean ± SD (n = 4). c) Box plot of conversion rates for each ERCC oligo under different conditions (n = 39,372, 4 replicates). A read depth of ≥10 is used to filter out low-abundance oligos. Box boundaries represent 25th and 75th percentiles; center line represents the median; whiskers indicate ±1.5× interquartile range (IQR). d) Counts of non-converted sites before and after calibration using the IVT RNA control. Three datasets under different conditions are compared to show the effectiveness of calibration. e) Putative methylation rates (non-conversion rates) of five candidate m5C sites validated by BS-converted Sanger sequencing. The methylation rates reported in previous studies are also included. The PCR primers for sequencing are listed in Supplementary Table 29. f) Correlation of methylation rates for five sites determined by BS-seq and Sanger sequencing. g) Pairing probabilities of the 25 bp upstream and downstream flanking sequences are calculated for the high confidence m5C sites, false-positive sites and randomly sampled sites. Coordinate 0 indicates the location of the site of interest. Pairing bias = log2(Pair_prob_false(i,j)/Pair_prob_random(i,j)), where Pair_prob(i,j) is the mean pairing probability of bases between position i and position j. Red indicates that the RNA fragments are more likely to form base pairs than those at random sites.

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–5 and Tables 1–5, 7, 14, 17, 22–24, 26 and 29

Reporting Summary

Supplementary Tables

Supplementary Tables 6, 8–13, 15, 16, 18–21, 25, 27 and 28.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 1

Unprocessed blots and gels.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 2

Unprocessed western blots.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 4

Unprocessed blots.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 7

Unprocessed gels.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

About this article

Cite this article

Zhang, Z., Chen, T., Chen, HX. et al. Systematic calibration of epitranscriptomic maps using a synthetic modification-free RNA library. Nat Methods 18, 1213–1222 (2021). https://doi.org/10.1038/s41592-021-01280-7

  • DOI: https://doi.org/10.1038/s41592-021-01280-7
