Nat. Methods 14, 695–698 (2017); published online 15 May 2017; corrected after print 28 February 2018
We were alerted by readers that the reported Nm consensus sequence in mRNA matches the 3′-adaptor sequence used in sequencing library preparations, and this could be caused by mispriming1. In our approach, the majority of RNA fragments without Nm at the 3′ end are blocked from ligating to the 3′ adaptor because of the presence of 3′ phosphate from the last oxidation elimination step (OE) (Fig. 1a of the original paper), while Nm sites accumulate at the 3′ ends (Supplementary Figs. 1 and 2; Supplementary Notes 1 and 2). However, because of the low Nm abundance in messenger RNA (mRNA), only very limited amounts of mRNA fragments carry 3′ Nm and thus can be successfully ligated to the 3′ adaptor. Mispriming could occur if the 3′ end of the reverse transcription (RT) primer hybridizes to a few bases of the 5′-ligated RNA (Fig. 1a). Although our method effectively identifies Nm sites in abundant ribosomal RNA (rRNA, Supplementary Fig. 1), its application to less abundant mRNA can be contaminated by mispriming, leading to false-positive Nm sites and the erroneous AGAUC motif on mRNA (original Fig. 2d), which corresponds to the 5′-end sequence of the 3′ adaptor.
To eliminate mispriming, we kept the original procedure intact but designed new 3′ and 5′ adaptors with the following features (Fig. 1a): (i) We added a six-letter in-line barcode (ATCACG) at the 5′ end of the original 3′-adaptor sequence. After RT, all first-strand cDNAs generated from correct priming should contain the complementary sequence of the in-line barcode. In contrast, cDNAs generated from mispriming will not contain it, because the barcode is not part of the template from which they are synthesized. We can thus readily identify and filter out the mispriming reads. (ii) We added five randomized nucleotides to the 3′ and 5′ adaptors at the ligation junctions to reduce ligation-associated bias2,3,4. These nucleotides also serve as unique molecular identifiers (UMIs) to identify and exclude PCR duplicates, so that the number of original molecules present before PCR can be accurately quantified5.
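The two adaptor features described above can be illustrated with a minimal sketch. It is not the customized pipeline referred to in the Online Methods. It assumes, purely for illustration, that each read has already been trimmed of the 3′ adaptor, is given in the sense orientation of the RNA, and has the hypothetical layout `[5-nt UMI][insert][5-nt UMI][ATCACG]`. The function names are invented here, and duplicates are collapsed on the insert sequence plus UMI as a stand-in for mapping position plus UMI.

```python
BARCODE = "ATCACG"  # in-line barcode given in the text
UMI_LEN = 5         # randomized nucleotides at each ligation junction


def parse_read(read):
    """Return (insert, umi) for a correctly primed read, else None.

    Assumed layout (hypothetical): UMI + insert + UMI + BARCODE.
    """
    if not read.endswith(BARCODE):
        return None  # barcode absent: treated as a mispriming product
    body = read[:-len(BARCODE)]
    if len(body) <= 2 * UMI_LEN:
        return None  # nothing left for an insert between the two UMIs
    umi = body[:UMI_LEN] + body[-UMI_LEN:]
    insert = body[UMI_LEN:-UMI_LEN]
    return insert, umi


def filter_and_deduplicate(reads):
    """Drop reads lacking the barcode, then keep one read per (insert, UMI)."""
    seen = set()
    kept = []
    for read in reads:
        parsed = parse_read(read)
        if parsed is None or parsed in seen:
            continue  # misprimed read or PCR duplicate
        seen.add(parsed)
        kept.append(parsed)
    return kept
```

Under these assumptions, two reads with the same insert but different UMIs count as two original molecules, whereas identical insert and UMI pairs collapse to one.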
With mispriming eliminated, the refined Nm-seq was applied to the same input as in the original paper (10 μg mRNA from HeLa and HEK293 cells). Using a customized pipeline (see Online Methods), we detected 2,103 confident Nm sites in HeLa cells and 699 Nm sites in HEK cells, with a site distribution profile similar to that reported in the original Figure 2c (Fig. 1b) and a codon preference different from that in the original Figure 2e (Fig. 1c, Supplementary Fig. 3, Supplementary Note 3). Additional features are summarized in Supplementary Figures 4 and 5, and are consistent between HeLa and HEK293 cells (Supplementary Figs. 6 and 7). In both cell lines, Um is the dominant Nm modification (64% of all Nm sites in HeLa mRNA and 78% of all Nm sites in HEK mRNA), which is consistent with our previous LC-MS/MS data, with a depletion of A flanking the modification site (Fig. 1d).
The majority of Nm sites occurred in 1,267 RefSeq-annotated genes in HeLa cells, 88.9% of which are protein coding. We found a different distribution of Nm sites in codons than originally reported. 60.4% of sites occurred in six codons corresponding to six amino acids (Leu (17.0%), Phe (11.7%), Ser (11.1%), Val (7.5%), Asp (6.9%) and Thr (6.1%)) (Supplementary Table 1). Nm distribution within a codon was found to be 30%, 36% and 34% at each position, contrary to the increased methylation at the first position originally reported. These new features are consistent in HEK cells (Supplementary Fig. 7 and Supplementary Table 2).
Selected Nm sites on mRNA have been confirmed with a low-throughput validation approach6 (Supplementary Fig. 8) and are enriched in FBL-binding sites (Supplementary Note 5), indicating functional roles for future exploration. The new Nm-seq data have been deposited in GEO (GSE90164).
References
1. Gillen, A.E. et al. BMC Genomics 17, 338 (2016).
2. Sorefan, K. et al. Silence 3, 4–14 (2012).
3. Zhuang, F. et al. Nucleic Acids Res. 40, e54 (2012).
4. Sun, G. et al. RNA 17, 2256–2262 (2011).
5. Marx, V. Nat. Methods 14, 473–476 (2017).
6. Dong, Z.W. et al. Nucleic Acids Res. 40, e157 (2012).
Additional information
The online version of the original article can be found at https://doi.org/10.1038/nmeth.4294
Integrated supplementary information
Supplementary Figure 1 Detection of Nm sites in HeLa rRNA
(a-b), Nm-seq profiles of human 18S (a) and 28S (b) rRNA above MCC-determined optimal threshold (blue). Known Nm sites are shown as red bars below. (c-d), Receiver Operating Characteristic (ROC, orange) and Matthews Correlation Coefficient (MCC, green) curves for human 18S (c) and 28S (d) rRNA plotted using increasing normalized 3′ end coverage thresholds at each position.
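As an illustration of how an MCC-determined threshold of this kind could be obtained, the sketch below scans candidate thresholds on a per-position score and keeps the one with the highest Matthews correlation coefficient against a set of known sites. It is a minimal example under stated assumptions, not the authors' pipeline: the function names, the dictionary input and the rule that a position is called when its score is at or above the threshold are all hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


def best_threshold(scores, known_sites):
    """Return (threshold, MCC) maximizing MCC.

    scores: dict mapping position -> normalized 3' end coverage (assumed input)
    known_sites: set of positions annotated as Nm sites
    """
    best_t, best_m = None, -math.inf
    for t in sorted(set(scores.values())):
        tp = fp = tn = fn = 0
        for pos, score in scores.items():
            called = score >= t  # assumed calling rule
            known = pos in known_sites
            if called and known:
                tp += 1
            elif called:
                fp += 1
            elif known:
                fn += 1
            else:
                tn += 1
        m = mcc(tp, fp, tn, fn)
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m
```

Because the scan only tests thresholds equal to observed scores, ties are resolved in favor of the lowest threshold reaching the maximum MCC.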
Supplementary Figure 2 Examples of Nm sites in HeLa mRNA
Nm-seq plots of methylated transcripts: (a) NKIRAS1 (b) KLHL5. Normalized summed sequence coverage of Nm-seq and input are shown below and above the transcript, respectively. Individual paired-end reads within the Nm site window are shown in magnification.
Supplementary Figure 3 Features of the HeLa Nm methylome
(a) Distribution of 2′-OMe sites between the four nucleobases in the various transcript segments and overall. (b) Fraction of Nm sites detected within mRNA and ncRNA. (c) The percentage of methylated genes according to the number of Nm sites per gene. (d) The percentage of methylated genes increases with expression level.
Supplementary Figure 4 RNA secondary structure surrounding Nm sites, m6Am in mRNA and Gene Ontology (GO) analysis
The secondary structures of a 200-nt window centered on Nm sites were analyzed using the Structure Surfer tool based on: (a) PARS score (b) ds/ssRNA score and (c) DMS-seq. (d) LC-MS/MS quantification of internal (i.e., excluding the first transcribed nucleotide) m6A and internal m6Am in HeLa mRNA. The level of each modified nucleoside is presented as a percentage of the unmodified one. Mean values ± s.e.m. are shown, n = 3. (e) GO analysis of Nm-methylated HeLa genes relative to all adequately expressed genes (above the 1st quartile) reveals enrichment of GO terms related to cell-cell interactions, splicing and more (fold enrichment ≥ 2, Bonferroni corrected P ≤ 0.005). Fold-enrichment and P values are indicated for each category.
Supplementary Figure 5 Distribution of Nm sites in HeLa mRNA
(a) Metagene profile of Nm site distribution along a normalized mRNA transcript. (b) Metagene profile of Nm site distribution relative to the first and nearest splice sites in a 400-nt non-normalized window.
Supplementary Figure 6 Features of the Nm methylomes in HEK293 cells (part 1)
(a) Distribution of 2′-O-methyl sites between the four nucleobases in the various transcript segments and overall. (b) Fraction of Nm sites detected within mRNA and ncRNA. (c) Metagene profile of Nm site distribution along a normalized mRNA transcript illustrated below. (d) The percentage of methylated genes according to the number of Nm sites per gene.
Supplementary Figure 7 Features of the Nm methylomes in HEK293 cells (part 2)
(a) HEK293 Nm sites in different transcript segments of coding genes. (b) Distribution of Nm sites between the three codon positions. (c) Distribution of Nm sites among different amino acid codons.
Supplementary Figure 8 Nm site validation by measuring percentage of reverse transcription (RT) stop with limited concentration of dNTPs
(a) Scheme of Nm site validation strategy adapted from (Nucleic Acids Res. 40, e157 (2012)). Nm methylation causes RT to stop when dNTP concentrations are limiting. Two sets of paired primers, one with a forward primer annealing upstream of the Nm site (FU), the other with a forward primer annealing downstream of the Nm site (FD), were used to quantify “RT efficiency” as defined. “RT fold change” was defined as the ratio between the RT efficiency with low dNTPs and that with high dNTPs. (b) RT fold change of Nm candidate sites in mRNA (blue), positive ribosomal RNA Nm sites (orange), and negative control sites (grey) in the presence of dNTP concentrations of 0.5 μM (low), 1 μM (low), or 40 μM (high). Top right panel, box plot of RT fold change of the three groups of sites. Box, 25–75%. P values were calculated with a t-test (two-tailed, heteroscedastic). We selected seven Nm candidate sites in mRNAs identified by our optimized method, and measured percentages of reverse transcription (RT) stops with limited concentrations of dNTPs. The candidate sites showed RT stop percentages comparable to those observed for known Nm sites in rRNAs, and significantly lower than those of negative sites, confirming that the current method is sensitive and accurate.
Supplementary information
Supplementary Figures Tables and Texts
Supplementary Figures 1–8, Supplementary Tables 1–2 and Supplementary Notes 1–5 (PDF 974 kb)
Supplementary Online Methods
Supplementary Online Methods 1 (PDF 113 kb)
Supplementary Protocol
Supplementary Protocol 1. (PDF 72 kb)
Cite this article
Dai, Q., Moshitch-Moshkovitz, S., Han, D. et al. Correction: Corrigendum: Nm-seq maps 2′-O-methylation sites in human mRNA with base precision. Nat Methods 15, 226–227 (2018). https://doi.org/10.1038/nmeth0318-226c
DOI: https://doi.org/10.1038/nmeth0318-226c