Systematic comparison of tools used for m6A mapping from nanopore direct RNA sequencing

N6-methyladenosine (m6A) has been increasingly recognized as a new and important regulator of gene expression. To date, transcriptome-wide m6A detection primarily relies on well-established methods using next-generation sequencing (NGS) platform. However, direct RNA sequencing (DRS) using the Oxford Nanopore Technologies (ONT) platform has recently emerged as a promising alternative method to study m6A. While multiple computational tools are being developed to facilitate the direct detection of nucleotide modifications, little is known about the capabilities and limitations of these tools. Here, we systematically compare ten tools used for mapping m6A from ONT DRS data. We find that most tools present a trade-off between precision and recall, and integrating results from multiple tools greatly improve performance. Using a negative control could improve precision by subtracting certain intrinsic bias. We also observed variation in detection capabilities and quantitative information among motifs, and identified sequencing depth and m6A stoichiometry as potential factors affecting performance. Our study provides insight into the computational tools currently used for mapping m6A based on ONT DRS data and highlights the potential for further improving these tools, which may serve as the basis for future research.

Selection of the best cut-offs for ONT tools. Plots show the distribution of F1 scores with varing cut-offs of all tools using MeRIP-seq, miCLIP and miCLIP2 results as validation sets. "miCLIP and miCLIP2" represents the sum of F1 scores from miCLIP and miCLIP2.
Supplementary Fig. 9 Performance of m6A detection for ONT tools under the optimal cut-off. a-b Precision, recall and F1 scores for ONT tools using miCLIP results (a) and miCLIP2 results (b) as validation set. Minmum coverage requirements of sites indicated in the left side.
Supplementary Fig. 10 Assessment of intrinsic bias and the effects of using a negative control on m6A detection. a Metagene plots of m6A sites detected in KO samples under the best cut-off. "Background" means all RRACH sites with coverage >= 5 nanopore reads in KO samples. Number of sites indicated in parentheses. b Rates of different types intrinsic bias in the sites detected both in WT and KO samples. "Background" means all RRACH sites with coverage >= 5 nanopore reads. One-side Binomial tests were applied to calculate significance. c IGV snapshots of two highly probable false positives around SNP/homopolymer in WT and KO samples. Coverage for each site is indicated on the left of each short sequence. d-e Comparison of recall before and after calibration with the KO sample results (d) and a random dataset (e). WT shows the recall for m6A sites detected in the WT samples, WT-KO shows the recall for sites calibrated using KO results, and WT-random shows the recall for sites calibrated with a randomly selected dataset. miCLIP results were used as a validation set. f-g Comparison of precision, recall and F1 score before and after calibration with the KO sample results (f) and a random dataset (g). miCLIP2 results were used as a validation set. h-i Comparison of number of m6A sites, precision, recall and F1 scores using original mode, compare mode and calibration with the KO sample results for Tombo, ELIGOS and Epinano. The validation set is derived from the miCLIP results in h, the miCLIP2 results in i.

Supplementary Fig. 11 Detection capacity for m6A among sequence motifs. a-b
Relative recall rates (a) and precision (b) for all tools for 12 motifs bearing the consensus sequence RRACH. Relative recall rates/precision were calculated as the ratio of the recall/precision for the sites in a specific motif to the overall mean (across all sites). A m6A profile created from miCLIP2 results was used for validation. c A comparison of insertion frequency and base quality for modified and unmodified sites in GGACA and GAACA motifs. In the Boxplot, the upper and lower limits represent the 75th and 25th percentiles, respectively, while the center line represents the median; upper and lower whiskers indicate ±1.5 × the interquartile range (IQR). The number of sites is indicated in parentheses. Relative recall rates (a) and precision (b) for m6A sites versus coverage. Relative recall rates/precision were calculated as the ratio of the recall rate/precision for m6A sites with a given coverage to the overall mean (across all sites). The validation set used came from the miCLIP2 results.    Table 5 The Best F1 score for each tool and the corresponding threshold using miCLIP and miCLIP2 results as validation sets. "Sum" represents the sum of F1 scores from miCLIP and miCLIP2.

Supplementary
Supplementary Table 6 Recall and precision for all tools for 12 motifs bearing the consensus sequence RRACH. m6A profiles created from miCLIP and miCLIP2 results were used for validation separately.