Nat. Methods 14, 695–698 (2017); published online 15 May 2017; corrected after print 28 February 2018

We were alerted by readers that the reported Nm consensus sequence in mRNA matches the 3′-adaptor sequence used in sequencing library preparations, and this could be caused by mispriming1. In our approach, the majority of RNA fragments without Nm at the 3′ end are blocked from ligating to the 3′ adaptor because of the presence of 3′ phosphate from the last oxidation elimination step (OE) (Fig. 1a of the original paper), while Nm sites accumulate at the 3′ ends (Supplementary Figs. 1 and 2; Supplementary Notes 1 and 2). However, because of the low Nm abundance in messenger RNA (mRNA), only very limited amounts of mRNA fragments carry 3′ Nm and thus can be successfully ligated to the 3′ adaptor. Mispriming could occur if the 3′ end of the reverse transcription (RT) primer hybridizes to a few bases of the 5′-ligated RNA (Fig. 1a). Although our method effectively identifies Nm sites in abundant ribosomal RNA (rRNA, Supplementary Fig. 1), its application to less abundant mRNA can be contaminated by mispriming, leading to false-positive Nm sites and the erroneous AGAUC motif on mRNA (original Fig. 2d), which corresponds to the 5′-end sequence of the 3′ adaptor.

Figure 1: Nm-seq, based on oxidative cleavage for mapping 2′-O-methylation.

(a) Schematic illustration of revised Nm-seq. Following eight rounds of OED, 5′ phosphorylation and the last OE, 3′- and 5′-adaptor ligation generate two kinds of RT templates: (i) fragments with 3′ monophosphates (black cross), which block 3′ ligation, and (ii) 3′-Nm (red triangle) fragments, which ligate with both 3′ and 5′ adaptors. Mispriming may occur if the 3′ end of excess RT primer hybridizes to 5′-ligated RNA, which would exclude the introduced in-line barcode. Correct priming would include the in-line barcode, which enables mispriming reads to be filtered out. (b) Metagene profile of Nm site distribution along mRNA transcripts in HeLa cells. The y-axis represents the density of identified Nm sites along transcripts. (c) Distribution of Nm sites in HeLa mRNA among different amino acid codons. (d) Um is the dominant Nm modification in HeLa mRNA, with a depletion of A flanking the modification site. The gray ring represents the nucleotide distribution at the Nm site, the inside ring represents the nucleotide distribution at the −1 position (5′) of Um, and the outside ring represents the nucleotide distribution at the +1 position (3′).

To eliminate mispriming, we kept the original procedure intact but designed new 3′ and 5′ adaptors with the following features (Fig. 1a): (i) We added a six-letter in-line barcode (ATCACG) at the 5′ end of the original 3′-adaptor sequence. After RT, all first-strand cDNAs generated from correct priming should contain the complementary sequence of the in-line barcode. In contrast, cDNAs generated from mispriming will not contain it, because the barcode is not part of the template from which they are synthesized. We can thus readily identify and filter out the mispriming reads. (ii) We added 5-nt randomized nucleotides to the 3′ and 5′ adaptors at the ligation junctions to reduce ligation-associated bias2,3,4. They also serve as unique molecular identifiers (UMIs) to identify and exclude PCR duplicates so that the real numbers of original molecules before PCR can be accurately quantified5.
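The two adaptor features can be illustrated with a minimal sketch. It is not the customized pipeline referred to in the Online Methods. It assumes, for illustration only, that each read is given in the sense orientation as a DNA string, that the original 3′-adaptor sequence has already been trimmed, and that a correctly primed read therefore ends with the 5-nt randomized stretch followed by the in-line barcode. The 5′-adaptor randomized stretch is ignored for simplicity, and the function names `split_read` and `filter_and_dedup` are hypothetical.

```python
BARCODE = "ATCACG"  # in-line barcode stated in the text
UMI_LEN = 5         # length of the randomized stretch stated in the text


def split_read(read):
    """Return (insert, umi) for a read that carries the barcode, else None.

    Assumed layout (not taken from the source): insert + UMI + BARCODE.
    """
    if not read.endswith(BARCODE):
        # No barcode: treated here as a candidate mispriming read.
        return None
    body = read[:-len(BARCODE)]
    if len(body) <= UMI_LEN:
        # Nothing left for the insert once the UMI is removed.
        return None
    return body[:-UMI_LEN], body[-UMI_LEN:]


def filter_and_dedup(reads):
    """Drop reads without the barcode, then keep one read per (insert, UMI)."""
    seen = set()
    kept = []
    for read in reads:
        parsed = split_read(read)
        if parsed is None or parsed in seen:
            continue
        seen.add(parsed)
        kept.append(parsed)
    return kept


reads = [
    "ACGTTGCA" + "AACCG" + BARCODE,  # correctly primed
    "ACGTTGCA" + "AACCG" + BARCODE,  # PCR duplicate of the read above
    "ACGTTGCA" + "TTGGA" + BARCODE,  # same insert, different UMI
    "ACGTTGCAAGATC",                 # no barcode, discarded
]
unique_reads = filter_and_dedup(reads)
```

With these toy reads, two of the four survive: the duplicate collapses onto its first occurrence and the read lacking the barcode is discarded. A real pipeline would match on the barcode plus adjacent adaptor sequence and tolerate sequencing errors, which this sketch does not attempt.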

With mispriming eliminated, the refined Nm-seq was applied to the same input as in the original paper (10 μg mRNA from HeLa and HEK293 cells). Using a customized pipeline (see Online Methods), we detected 2,103 confident Nm sites in HeLa cells and 699 Nm sites in HEK cells. The distribution of Nm sites along transcripts is similar to that reported in the original Figure 2c (Fig. 1b), whereas the codon preference differs from that in the original Figure 2e (Fig. 1c, Supplementary Fig. 3, Supplementary Note 3). Additional features are summarized in Supplementary Figures 4 and 5 and are consistent between HeLa and HEK293 cells (Supplementary Figs. 6 and 7). In both cell lines, Um is the dominant Nm modification (64% of all Nm sites in HeLa mRNA and 78% of all Nm sites in HEK mRNA), which is consistent with our previous LC-MS/MS data, with a depletion of A flanking the modification site (Fig. 1d).

The majority of Nm sites occurred in 1,267 RefSeq-annotated genes in HeLa cells, 88.9% of which are protein coding. We found a different distribution of Nm sites in codons than originally reported. 60.4% of sites occurred in six codons corresponding to six amino acids (Leu (17.0%), Phe (11.7%), Ser (11.1%), Val (7.5%), Asp (6.9%) and Thr (6.1%)) (Supplementary Table 1). Nm distribution within a codon was found to be 30%, 36% and 34% at each position, contrary to the increased methylation at the first position originally reported. These new features are consistent in HEK cells (Supplementary Fig. 7 and Supplementary Table 2).
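The within-codon distribution quoted above is a tally of site positions modulo three. A minimal sketch of that arithmetic follows, assuming each site is given as a 0-based offset from the first base of the start codon of its coding sequence. This coordinate convention and the function names are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def codon_position(cds_offset):
    """Map a 0-based CDS offset to a codon position of 1, 2 or 3."""
    return cds_offset % 3 + 1


def position_fractions(offsets):
    """Fraction of sites falling at each of the three codon positions."""
    if not offsets:
        raise ValueError("no sites given")
    counts = Counter(codon_position(o) for o in offsets)
    total = sum(counts.values())
    return {pos: counts.get(pos, 0) / total for pos in (1, 2, 3)}


# Toy offsets: two sites at position 1, one each at positions 2 and 3.
fractions = position_fractions([0, 1, 2, 3])
```

Fractions near one-third at each position, as with the reported 30%, 36% and 34%, indicate no strong preference for any codon position.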

Selected Nm sites on mRNA have been confirmed with a low-throughput validation approach6 (Supplementary Fig. 8) and are enriched at FBL-binding sites (Supplementary Note 5), suggesting functional roles that warrant future exploration. The new Nm-seq data have been deposited in GEO (GSE90164).