Deep learning modeling m6A deposition reveals the importance of downstream cis-element sequences

Luo, Zhiyuan; Zhang, Jiacheng; Fei, Jingyi; Ke, Shengdong

doi:10.1038/s41467-022-30209-7

Article
Open access
Published: 17 May 2022

Nature Communications volume 13, Article number: 2720 (2022)

Abstract

The N⁶-methyladenosine (m⁶A) modification is deposited to nascent transcripts on chromatin, but its site-specificity mechanism is mostly unknown. Here we model the m⁶A deposition to pre-mRNA by iM6A (intelligent m⁶A), a deep learning method, demonstrating that the site-specific m⁶A methylation is primarily determined by the flanking nucleotide sequences. iM6A accurately models the m⁶A deposition (AUROC = 0.99) and, surprisingly, uncovers that the cis-elements regulating the m⁶A deposition preferentially reside within the 50 nt downstream of the m⁶A sites. The m⁶A enhancers mostly include part of the RRACH motif and the m⁶A silencers generally contain CG/GT/CT motifs. Our findings are supported by both independent experimental validations and evolutionary conservation. Moreover, our work provides evidence that mutations resulting in synonymous codons can affect the m⁶A deposition and that the TGA stop codon favors m⁶A deposition nearby. Our iM6A deep learning modeling enables fast-paced biological discovery that would be cost-prohibitive and impractical with traditional experimental approaches, and uncovers a key cis-regulatory mechanism for m⁶A site-specific deposition.

Introduction

The N⁶-methyladenosine (m⁶A) modification is the most common internal modification in eukaryotic mRNA and is widely distributed in various tissues^1,2. It was first identified in mRNA in the 1970s^3,4,5. m⁶A is involved in diverse biological processes including cell differentiation, cancer progression and neurological development^6,7,8,9,10. Consistent with its functional importance, m⁶A has been found to affect various aspects of RNA biology, including splicing, polyadenylation, export, degradation, and translation^11,12. Its major function is believed to be the regulation of mRNA turnover^13,14,15,16.

The m⁶A modification on mRNA is catalyzed by the m⁶A methyltransferase complex (MTC), which is comprised of METTL3 and METTL14 as the catalytic core^17,18,19. Additional components including WTAP, VIRMA, ZC3H13, and HAKAI are also found to interact with METTL3-METTL14 and affect the complex activity^{20,21,22,23,24,25,26}. The m⁶A consensus sequence RRACH as a stringent motif or RAC as a more inclusive motif (R = A or G, H = A, C, or U) was first determined by biochemical experiments^27,28,29,30. Despite the wide prevalence of the m⁶A consensus motif in transcripts, very few of them are methylated, highlighting the site-specificity of m⁶A methylation. To investigate the global m⁶A distribution at the transcriptomic level, the m⁶A-seq and MeRIP-seq were first developed to map the m⁶A peak regions (typically ~200 nt or longer) using commercially available m⁶A antibodies^1,2. To achieve single-nucleotide resolution mapping for m⁶A, m⁶A-CLIP/miCLIP/PA-m⁶A-seq crosslinked m⁶A-antibody to its m⁶A mRNA target by UV and achieved precise m⁶A mapping in transcripts by detecting the reverse transcription errors due to the residual peptide crosslinked to m⁶A (CIMS sites: crosslink induced mutational sites, CITS sites: crosslink induced truncation sites)^31,32,33. In addition, the reverse transcription errors by m⁶A modification itself also enabled the single-nucleotide resolution mapping (MITS sites: m⁶A induced truncation sites)³³. Though a few new precise m⁶A mapping methods have recently been developed by exploring alternative ideas^{34,35,36,37,38}, the m⁶A-CLIP method has generated the major share of precise m⁶A sites in human and mouse transcripts^{8,14,32,33,39}. The m⁶A mapping studies showed that m⁶As were preferentially enriched in last exons, both their coding region and 3'UTR (untranslated region), as well as in long internal exons^1,2.

Based on the existing m⁶A sites precisely determined by experiments, computational methods have been developed to model the m⁶A sites in mRNA, including machine learning-based methods (WHISTLE, SRAMP, and MethyRNA) and deep learning-based methods (TDm6A, DeepM6ASeq)^{39,40,41,42,43}. These bioinformatics methods mostly focused on gradual improvement of the m⁶A site modeling accuracy, but used relatively small-scale data integration and contributed little to the discovery of biological mechanisms.

Here we first describe a new deep learning method, based on the ResNet (residual neural network), for modeling the m⁶A deposition in pre-mRNA. The ResNet avoids the vanishing gradient problem in deep neural networks by using skip connections⁴⁴. Skip connections allow the signal to bypass some layers in the neural network, feeding the output of one layer directly as input to later layers⁴⁴, which enables us to build deeper neural networks (adding more layers) and improve the accuracy of classification. In addition, the ResNet can handle very large datasets to investigate more complex issues. This deep learning method has been successfully used to handle high-throughput sequencing data and model biological processes^45,46.
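
The skip connection idea can be illustrated with a minimal sketch. It is not the iM6A architecture: a small fully connected layer stands in for the convolutional layers of a real ResNet, and the names `dense`, `relu`, and `residual_block` are hypothetical. The block computes y = x + F(x), so when F contributes nothing the input passes through unchanged, which is what keeps very deep stacks trainable.

```python
from typing import List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer; weights[i] is the weight row for output i.
    (Assumption: a dense layer replaces the convolution of a real ResNet.)"""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(v: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Skip connection: output = input + F(input), with F = ReLU(dense(.))."""
    fx = relu(dense(v, weights, bias))
    return [x + f for x, f in zip(v, fx)]


# With all-zero weights F(x) is zero, so the block reduces to the identity.
x = [1.0, -2.0, 0.5]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
identity_out = residual_block(x, zero_w, zero_b)
```

Because the input is added back after the transformation, each block only has to learn a correction to its input rather than a full mapping.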

Our ResNet deep learning approach, iM6A (intelligent m⁶A), models the m⁶A site-specific deposition in the genome with state-of-the-art accuracy. Using saturation mutational analysis, we systematically perturbed the input sequence to the iM6A deep learning model to see how each change affected the m⁶A deposition output. Surprisingly, we discovered that the 50 nt region downstream of the m⁶A sites contained a high density of the cis-elements for the m⁶A deposition. This pattern held for both last exons and internal exons. We further characterized m⁶A enhancers and silencers by implementing linear regression to interpret the iM6A deep learning output. The iM6A modeling as well as the identified functional cis-elements were validated by independent experimental data and evolutionary conservation. By a similar process of model perturbation, we found that synonymous codon mutations can affect m⁶A deposition and that the TGA stop codon may promote the adjacent m⁶A deposition. The iM6A approach enabled high-throughput and effective biological discovery that would be cost-prohibitive for traditional experimental methods, and uncovered a key cis-regulatory mechanism governing m⁶A site-specific deposition.

Results

iM6A accurately models m⁶A deposition

As with any nucleotide-related biological process, the question arises whether the site-specificity of m⁶A deposition is determined in whole or in part by a “code” in flanking primary nucleotide sequences. Is there an m⁶A cis-element code? To address this question directly, we developed iM6A (intelligent m⁶A, Fig. 1a), a deep residual neural network (ResNet)⁴⁴, to model the m⁶A site-specific deposition at the genome-wide level. We first collected a high-quality set of m⁶A sites that were precisely determined by the m⁶A-CLIP experiments in the mouse transcriptome^14,33. We used pre-mRNA sequences as input: the m⁶A sites on pre-mRNA served as positive sites, while the remaining nucleotides were treated as negative sites. The whole dataset was divided into training and test datasets. The training dataset contained all the transcripts on most chromosomes, excluding chromosome 9 (Chr9), whose transcripts were held out and reserved for the later independent test of iM6A modeling. iM6A evaluated the full length of each transcript and output the probability of each nucleotide position being an m⁶A site (see details in the Methods). iM6A modeled the m⁶A sites in the test set with an AUROC (area under the receiver operating characteristic curve) of 0.991 (Fig. 1b). For comparison, we also implemented a traditional machine learning method, SVM (Support Vector Machine)³⁹, and an alternative deep learning method, CNN-RNN (Convolutional Neural Network-Recurrent Neural Network)⁴⁰, to model m⁶A deposition on the same training and test datasets (see Methods for more details). The comparisons showed that iM6A achieved better performance than both SVM and CNN-RNN (Fig. 1b).
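
The evaluation design described above, holding out one chromosome and scoring per-nucleotide probabilities by AUROC, can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: the transcript records, the field name `chrom`, and the function names are hypothetical, and the AUROC is computed by direct pairwise comparison, which is only practical for small examples.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_chromosome(transcripts: List[Dict[str, str]],
                        held_out: str = "chr9"
                        ) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Put every transcript on the held-out chromosome into the test set."""
    train = [t for t in transcripts if t["chrom"] != held_out]
    test = [t for t in transcripts if t["chrom"] == held_out]
    return train, test


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive site outscores a random negative
    site (ties count one half). O(n_pos * n_neg), fine for a toy example."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three transcripts and five scored positions.
toy_transcripts = [{"id": "t1", "chrom": "chr1"},
                   {"id": "t2", "chrom": "chr9"},
                   {"id": "t3", "chrom": "chr2"}]
train_set, test_set = split_by_chromosome(toy_transcripts)
toy_auroc = auroc([0, 0, 1, 0, 1], [0.1, 0.4, 0.35, 0.2, 0.9])
```

Splitting by chromosome rather than by random site keeps overlapping sequence context from leaking between training and test data.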

**Fig. 1: iM6A models m⁶A deposition with single-nucleotide resolution.**

In addition, when performance was measured by the AUPRC score (area under the precision-recall curve), iM6A again outperformed the SVM and CNN-RNN methods (Supplementary Fig. 1b). The m⁶A sites experimentally determined by m⁶A-CLIP were accurately distinguished from the non-methylated sites by iM6A in the independent test (Fig. 1c). For comparison with mouse, we implemented the iM6A strategy to model the m⁶A site-specificity in the human genome by using a high-quality set of human m⁶A sites that were precisely determined by the m⁶A-CLIP experiment^8,14,32,33 and obtained the same high AUROC and AUPRC performances (Supplementary Fig. 1a, c, d).

Our iM6A training used the m⁶A sites experimentally determined by the m⁶A-CLIP method, which identified a major share of the single-nucleotide resolution m⁶A sites that had been mapped so far (Supplementary Fig. 1l). To make sure that the iM6A model was accurate for all m⁶A sites independent of the experimental methods that precisely mapped them, we examined whether iM6A could identify m⁶A sites mapped by alternative experimental methods (Supplementary Fig. 1g) including m6A-label-seq³⁶, MAZTER-seq³⁴, m6ACE-seq³⁵, and miCLIP2⁴⁷. The m6A-label-seq method detected m⁶A sites by chemically substituting the m⁶A with a⁶A (N⁶-allyladenosine) at the m⁶A sites, MAZTER-seq identified a relatively small subset of m⁶A sites that were in the m⁶ACA motifs by a methyl-sensitive RNase, and m6ACE-seq detected m⁶A sites by crosslinking them to the m⁶A antibody followed by exonuclease digestion to achieve single-base resolution. In addition, miCLIP2 was an optimized CLIP method that combined miCLIP with machine learning to improve m⁶A detection⁴⁷. The m⁶A sites precisely mapped by all these alternative experimental methods were identified with high probability values by iM6A (Fig. 1d and Supplementary Fig. 1j, k for mouse, Supplementary Fig. 1e–i for human), indicating that iM6A modeling was accurate and supported by a variety of the m⁶A mapping experimental methods. Furthermore, we investigated whether the m⁶A probability modeled by iM6A for an m⁶A site was quantitatively associated with its methylation level. The m⁶A peak enrichment value quantifies the methylation level by normalizing the m⁶A-IP read count to the input read count for an m⁶A peak region^1,2,14,33. We categorized the m⁶A sites into low, medium, and high groups based on their m⁶A peak enrichment value, and found that the m⁶A probability modeled by iM6A was associated with the quantitative distribution of the peak enrichment value across the three groups (Fig. 1e for mouse, and Supplementary Fig. 1m for human). MAZTER-seq is another method that could experimentally quantify the m⁶A methylation level for a small subset of m⁶A sites that were in the RRACA (R = A or G) motif, with a higher m⁶A methylation level associated with a lower cleavage efficiency by a methylation-sensitive RNase³⁴. The m⁶A probability modeled by iM6A was also associated with the quantitative distributions across the different cleavage efficiency groups (Supplementary Fig. 1n for mouse data, and Supplementary Fig. 1o for human data). All the results above supported that the m⁶A probability score generated by iM6A quantitatively reflected the methylation level at the m⁶A site.

It is known that the m⁶A site consensus could be either RRACH (H = A, C, or U) as a highly stringent set or RAC as a more inclusive set. Accordingly, to be comprehensive, we independently trained the RAC iM6A model and the RRACH iM6A model using either the RAC or the RRACH experimentally determined m⁶A sites in genes on most chromosomes, excluding chromosome 9 (Chr9), as the training dataset, and tested the performance of the RAC and the RRACH iM6A models on genes from chromosome 9 (Chr9). The RAC iM6A model performed very similarly to the RRACH iM6A model (Fig. 1f for mouse data, and Supplementary Fig. 1p for human data). In addition, we trained the iM6A model with 80, 400, 2 K, and 10 K nt of sequence on both sides, and the performance increased along with sequence length (Supplementary Fig. 1q). For all the analyses in the remaining Results sections, we used the RAC iM6A-10K model to generate all the data.

Cis-elements that govern the m⁶A deposition locate largely within 50 nt downstream of the m⁶A sites

Though iM6A as a deep learning approach was powerful in accurately modeling the m⁶A sites in the genome, this deep learning black box did not by itself aid understanding of the underlying cis-element rules, i.e., the m⁶A cis-element code. To systematically identify the cis-elements that determine m⁶A modification, we performed single-nucleotide saturation mutagenesis (Fig. 2a) on the sequences flanking the m⁶A sites in the last exon, which contains about 70% of all m⁶A sites in the transcripts³³, and calculated the positional mutational effects on the m⁶A deposition by iM6A. We found that the mutations that either increased or decreased m⁶A probability significantly (|ΔProbability| > 0.1) were largely enriched in the downstream region of the m⁶A sites, especially within the 50 nt downstream of m⁶A sites (Fig. 2b for mouse; and Supplementary Fig. 2a for human), suggesting that cis-elements that influence m⁶A deposition are located largely in this region.
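
In outline, the in silico saturation mutagenesis above substitutes every flanking nucleotide with each of the three alternative bases, rescores the m⁶A site, and keeps the substitutions with |ΔProbability| > 0.1. The sketch below follows that outline under loud assumptions: `toy_score` is a hypothetical stand-in for the trained model (it is not iM6A), and the flank width and function names are illustrative.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def saturation_mutagenesis(seq: str, site: int,
                           score: Callable[[str, int], float],
                           flank: int = 100,
                           threshold: float = 0.1
                           ) -> List[Tuple[int, str, str, float]]:
    """Return (offset from site, ref base, alt base, delta probability) for
    each single-nucleotide substitution with |delta| above the threshold."""
    baseline = score(seq, site)
    hits = []
    start = max(0, site - flank)
    end = min(len(seq), site + flank + 1)
    for pos in range(start, end):
        ref = seq[pos]
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            delta = score(mutant, site) - baseline
            if abs(delta) > threshold:
                hits.append((pos - site, ref, alt, delta))
    return hits


def toy_score(seq: str, site: int) -> float:
    """Hypothetical scorer, NOT iM6A: 0 unless the site is the A of an RAC
    motif; then 0.5, raised to 0.8 if GACT occurs in the 10 nt downstream."""
    if site < 1 or site + 1 >= len(seq):
        return 0.0
    if seq[site] != "A" or seq[site - 1] not in "AG" or seq[site + 1] != "C":
        return 0.0
    downstream = seq[site + 2: site + 12]
    return 0.8 if "GACT" in downstream else 0.5


toy_seq = "TTTTGACTTTGACTTTTT"
hits = saturation_mutagenesis(toy_seq, 5, toy_score)
offsets = sorted({h[0] for h in hits})
```

Plotting the retained offsets as a histogram around position 0 would give the kind of positional profile that the text describes for Fig. 2b.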

**Fig. 2: *Cis*-elements that regulate m⁶A deposition locate largely within 50 nt downstream of m⁶A sites.**

While the last exon hosts a majority of m⁶A sites, the long internal exon also contains many m⁶A sites¹⁴. We applied the same strategy to investigate the cis-elements flanking the m⁶A sites in the long internal exon, and found that the 50 nt region downstream of the m⁶A sites again contained most of the cis-elements that regulate the m⁶A deposition (Fig. 2c for mouse; and Supplementary Fig. 2b for human), suggesting that the m⁶A deposition in both the last exon and the long internal exon may follow a similar mechanism.

To systematically and quantitatively analyze the cis-element effect on m⁶A deposition in the 50 nt downstream region, we implemented a linear regression approach (Fig. 2d) which had been demonstrated to be effective in identifying functional motifs for microRNA targeting⁴⁸ and pre-mRNA splicing regulation⁴⁹: each substitution simultaneously created five overlapping 5-mers and disrupted five others, and the net effect for each pentamer motif was determined by the slope of the linear regression equation when pooling all the data (see details in the Methods). Pentamer motifs were ranked by their effect value and statistical significance, and the top 20 enhancers and silencers are shown. For the last exon, the m⁶A enhancers mostly included part of the RRACH motif; the m⁶A silencers mostly contained the CG/GT/CT dinucleotides (Fig. 2e, f for mouse, and virtually the same motif set for human, Supplementary Fig. 2c, d). Almost the same set of m⁶A enhancers and m⁶A silencers was obtained for the long internal exon (Supplementary Fig. 2f, g). We also observed a strong effect value correlation for all pentamers between the study in the last exon and the study in the long internal exon (Fig. 2g for mouse, and Supplementary Fig. 2e for human), supporting that the same cis-element code governed m⁶A site-specific deposition in both locations. Moreover, a strong effect correlation was obtained for all pentamers between the study in mouse and the study in human (Supplementary Fig. 2h for the last exon, and Supplementary Fig. 2i for the long internal exon), supporting that both mouse and human had the same cis-element code in regulating m⁶A deposition for both the last exon and the long internal exon.
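
The bookkeeping behind the pentamer regression can be sketched as follows. The enumeration of the five disrupted and five created 5-mers follows directly from the description above; the regression encoding (+1 when a substitution creates the pentamer, −1 when it disrupts it, 0 otherwise) is an assumption made for illustration, since the exact design is given in the Methods rather than here. The 9-nt context and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def kmers_covering(seq: str, pos: int, k: int = 5) -> List[str]:
    """All k-mers of seq that overlap position pos and lie fully inside seq."""
    out = []
    for start in range(pos - k + 1, pos + 1):
        if start >= 0 and start + k <= len(seq):
            out.append(seq[start:start + k])
    return out


def substitution_kmers(seq: str, pos: int, alt: str,
                       k: int = 5) -> Tuple[List[str], List[str]]:
    """Return (disrupted k-mers, created k-mers) for one substitution."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return kmers_covering(seq, pos, k), kmers_covering(mutant, pos, k)


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance")
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx


def kmer_effect(kmer: str,
                events: Sequence[Tuple[List[str], List[str], float]]) -> float:
    """events holds (disrupted, created, delta probability) per substitution.
    Assumed encoding: x = +1 if created, -1 if disrupted, 0 otherwise."""
    xs = [float((kmer in created) - (kmer in disrupted))
          for disrupted, created, _ in events]
    ys = [delta for _, _, delta in events]
    return ols_slope(xs, ys)


# Toy example: a G-to-A substitution at the centre of a 9-nt context.
disrupted, created = substitution_kmers("CAGGGCTGA", 4, "A")
toy_slope = ols_slope([1.0, -1.0, 0.0], [0.2, -0.2, 0.0])
```

A positive slope under this encoding would mark a pentamer as an enhancer and a negative slope as a silencer.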

We further investigated the m⁶A enhancer and silencer motif distribution in the region flanking the m⁶A sites. The m⁶A enhancers had a higher frequency around the m⁶A sites than around control sites matched for the exact RAC motif (Fig. 2h for mouse, and Supplementary Fig. 2j for human). In contrast, the m⁶A silencers had a lower frequency around the m⁶A sites than around the matched control sites (Fig. 2i for mouse, and Supplementary Fig. 2k for human). The difference was more evident in the downstream region of the m⁶A sites than in the upstream region (Fig. 2h, i for mouse, Supplementary Fig. 2j, k for human), supporting the hypothesis that the functional cis-elements largely resided in the 50 nt downstream of the m⁶A sites. Next, we examined the m⁶A enhancer and silencer motif distribution on several sets of m⁶A sites experimentally mapped by different methods, including m⁶A-CLIP, m6A-label-seq, m6ACE-seq, and MAZTER-seq. The m⁶A enhancers showed consistently higher frequency around the positive m⁶A sites than around the control (Supplementary Fig. 3a, c, e, g, i, k, m) for both the human and mouse datasets, while the m⁶A silencers exhibited lower frequency (Supplementary Fig. 3b, d, f, h, j, l, n). All of these positional distribution investigations confirmed that the frequency difference for the m⁶A enhancers and silencers was more evident in the downstream region of m⁶A sites than in the upstream region, and that this held regardless of the experimental approach used to map the m⁶A sites.

Furthermore, we examined the sequence conservation flanking the m⁶A sites and found that the flanking sequences of the m⁶A sites were more conserved than those of the control (Fig. 2j for mouse, and Supplementary Fig. 2l for human). Moreover, the greater functional importance of the downstream sequences flanking m⁶A sites compared with the upstream sequences was supported by their being more conserved across species and by the fact that such conservation did not exist in the control (Fig. 2j for mouse, and Supplementary Fig. 2l for human).

Finally, given that the enhancers surrounding the m⁶A sites mostly include part of the RRACH motif, which is itself a potential methylation motif, we examined the distribution of methylated sites flanking the m⁶A sites. We found that the RAC sites adjacent to m⁶A sites have a higher frequency of being m⁶A sites (Supplementary Fig. 2m), indicating that they are more likely to be methylated. The RAC sites adjacent to non-m⁶A sites have a lower frequency of being m⁶A sites (Supplementary Fig. 2n), indicating that they are unlikely to be methylated. Moreover, both methylated and non-methylated RAC sites are enriched in the 50 nt region downstream of m⁶A sites (Supplementary Fig. 2m), suggesting that both could enhance m⁶A deposition.

Taken together, our data strongly supported that the cis-elements regulating m⁶A deposition largely reside within the 50 nt downstream of the m⁶A sites, with additional functional subsequences being less concentrated in other regions (Fig. 2b, c, and Supplementary Fig. 1q). Enhancers include mostly part of RRACH motif, while silencers generally contain CG/GT/CT motifs.

Experimental validation of the iM6A modeling

Using an independent experimental dataset, we validated the m⁶A deposition modeling by iM6A. The lymphoblastoid cell lines (LCLs) were from a collection of 60 Yoruba (YRI) human individuals. m⁶A signals of the LCLs were experimentally determined in the transcriptome by the m⁶A RIP-seq method (m⁶A RNA immunoprecipitation and sequencing)⁵⁰. Within the genomes of the 60 individuals, there was adequate data to obtain a reference allele, alternate SNVs (single-nucleotide variants), and heterozygote examples, making it possible to investigate how SNVs influence m⁶A deposition. We implemented a computational method (see Fig. 3a and Methods section) to quantify the association between a specific SNV and the m⁶A level of the m⁶A peak region in which this SNV was located. The m⁶A peak regions that contained the m⁶A sites were usually 200 nt or longer. iM6A calculated the effect of specific SNVs on m⁶A deposition and identified 47 SNVs that either increased or decreased the m⁶A deposition (|ΔProbability| > 0.1). Among them, a statistically significant majority (33 SNVs out of 47, P < 0.004, Binomial test, Fig. 3b) showed the same direction of change in m⁶A deposition as modeled by iM6A and as determined experimentally. Furthermore, we examined the correlation between the iM6A-modeled m⁶A deposition changes (ΔProbability) and the experimentally measured m⁶A deposition changes (ΔPeakEnrichment), and found that iM6A quantitatively modeled the experimental m⁶A deposition changes (Fig. 3c, P < 0.0003, Student’s t-test). Among the 47 SNVs, ten were located upstream of m⁶A sites, four were at the m⁶A site, and the remaining 33 were located within the 50 nt downstream of m⁶A sites (Fig. 3d). Thus, SNVs that affected m⁶A deposition were statistically biased towards the downstream region (P < 0.0002, Binomial test), supporting that the downstream region of m⁶A sites contained most of the cis-elements regulating m⁶A deposition. 
We also found that the underlying cis-element alterations for each SNV, represented as the sum of the effect value changes for all the involved pentamers (ten pentamers in total, five disrupted and five created simultaneously), quantitatively agreed with the experimental m⁶A deposition changes (Fig. 3e, P < 0.0001, Student’s t-test). Four examples in which the m⁶A deposition was affected by an SNV are shown in Fig. 3f–i. The rs7831 in the PDCD11 gene was an A to C mutation that was modeled by iM6A to be at an m⁶A site and to decrease its m⁶A probability value from 0.8 to about 0, and indeed we observed an evident loss of experimental m⁶A signal in the alternative allele data (C nucleotide, blue color) in comparison to that of the reference allele data (A nucleotide, red color) (Fig. 3f). The rs75907001 in the DOPEY2 gene was a T to C mutation which was modeled by iM6A to decrease the m⁶A probability value of the m⁶A site from 0.6 to about 0.4, and an experimental m⁶A signal decrease was observed (Fig. 3g). This T to C mutation, located 4 nt downstream of an m⁶A site, disrupted three m⁶A enhancer motifs (ACTCT, CTTGG, and TTGGG), and simultaneously created four CG/GT/CT-containing silencer motifs (ACTCC, CTCCT, TCCTG, and CCTGG) (Fig. 3g). The rs9090 in the PARM1 gene was a C to T mutation which was modeled by iM6A to increase the probability value of the m⁶A site from about 0 to 0.15, and an experimental m⁶A signal increase was recorded (Fig. 3h). This C to T mutation, located three nucleotides downstream of an m⁶A site, created four enhancers (AGACT, GACTG, ACTGT, and CTGTT) and one silencer (TGTTT) and simultaneously disrupted three silencers (GACCG, ACCGT, and CCGTT), leading to an overall increase of m⁶A signal supported by both the iM6A modeling and the experimental data (Fig. 3h). Another example was the rs1057278 located in the TTLL3 gene. 
This G to A mutation, located eleven nucleotides downstream of an m⁶A site according to iM6A, disrupted four CG/GT/CT silencers (CAGGG, AGGGC, GGGCT, and GGCTG) and simultaneously created five enhancers (CAGGA, AGGAC, GGACT, GACTG, and ACTGA), collectively contributing to an increase of m⁶A signal again supported by both the iM6A modeling and the experimental data (Fig. 3i). Altogether, the experimental data supported the m⁶A deposition modeling by iM6A and confirmed that the m⁶A-regulating cis-elements are located downstream of the m⁶A sites.
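
The directional agreement reported above (33 of 47 SNVs) can be checked against a binomial null with a few lines of standard-library Python. This is a sketch under assumptions: the null probability of 0.5 and the one-sided upper tail are choices made here for illustration, since the text does not state the exact test configuration, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
               for k in range(successes, trials + 1))


# 33 of the 47 SNVs changed m6A in the direction modeled by iM6A.
# Assumed null: agreement in direction is a fair coin flip (p = 0.5).
p_agreement = binomial_upper_tail(33, 47)
```

The same helper applies to the positional bias (33 of 47 SNVs downstream), although the appropriate null proportion there depends on how the upstream and downstream windows are defined.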

**Fig. 3: Experimental validation of iM6A modeling.**

Many pathogenic SNVs are associated with m⁶A deposition changes

Even though a number of studies have revealed that the dysregulation of m⁶A impacts various diseases⁵¹, little is known about how nucleotide variants impact m⁶A deposition. To address this question, 68,286 SNVs were extracted from the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) (see details in the Methods). As shown previously (Fig. 2), flanking sequence can influence m⁶A deposition. We selected the SNVs in the 500 nt region flanking the m⁶A sites. The clinical significance of these SNVs was categorized based on the ClinVar annotation. We found that many SNVs were modeled to alter m⁶A deposition, either enhancing or dampening it (Fig. 4a). Though a large proportion of SNVs do not affect m⁶A deposition, some do have evident effects on m⁶A deposition (Fig. 4a). We focused on the events that could change the m⁶A probability (|ΔProbability| ≥ 0.1) (Fig. 4b), and found that many of these were also highly enriched in the region 50 nt downstream of the m⁶A sites (Fig. 4c), as was found previously for SNVs created by single-nucleotide saturation mutagenesis (Fig. 2b, c, and Supplementary Fig. 2a, b) and the SNV experimental validation data (Fig. 3d). To exclude the effect of SNVs on the protein-coding sequence, we focused on the SNVs that cause only synonymous mutations; those that could change the m⁶A probability (|ΔProbability| ≥ 0.1) were also highly enriched in the region 50 nt downstream of m⁶A sites (Supplementary Fig. 4a, b). We further categorized all the SNVs (68,286 SNVs in the ClinVar database) into two groups (m⁶A probability change or no change) based on ΔProbability (|ΔProbability| ≥ 0.1 or |ΔProbability| < 0.1), and we found that the pathogenic SNVs had greater prevalence in the group for which m⁶A probability significantly changed (P < 0.001, Fisher’s exact test, Fig. 4d), demonstrating that pathogenic SNVs were more likely to alter m⁶A deposition than non-pathogenic SNVs.
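
The enrichment comparison above rests on a 2 × 2 contingency table (pathogenic or not, m⁶A probability changed or not) evaluated by Fisher's exact test. The sketch below shows a one-sided version of that test using only the standard library. The actual counts are not given in the text, so the table used here is a purely hypothetical toy example, and the one-sided alternative is an assumption.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]]:
    probability of observing a or more in the top-left cell given the
    margins, under the hypergeometric null of no association."""
    row1 = a + b          # e.g. SNVs with |delta probability| >= 0.1
    col1 = a + c          # e.g. pathogenic SNVs
    total = a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1)) / denom


# Hypothetical toy table, NOT data from the study:
#                 pathogenic   non-pathogenic
# changed             3              0
# unchanged           0              3
p_toy = fisher_exact_greater(3, 0, 0, 3)
```

With the real counts from the two |ΔProbability| groups in place of the toy table, a small p-value would indicate that pathogenic SNVs are over-represented among SNVs that change m⁶A probability.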

Four examples in which m⁶A deposition was affected by pathogenic SNVs are shown in Fig. 4e–h. In the DARS2 gene, rs121918205 was a C to G mutation that broke the C in the RAC consensus of an m⁶A site, leading to a change in m⁶A probability of −0.3 according to iM6A, and rs121918213 was a T to A mutation that also led to a change in m⁶A probability of −0.3 (Fig. 4e). DARS2 encodes mitochondrial aspartyl-tRNA synthetase, and its deficiency may be involved in leukoencephalopathy⁵². The two SNVs above could affect the m⁶A modification in the DARS2 transcript as a novel disease cause. The rs760539449 located in the SOX10 gene was an A to G mutation that led to a change in m⁶A probability of −0.4 (Fig. 4f). The rs760539449 was annotated as a likely pathogenic SNV in ClinVar. iM6A modeled this mutation to result in a loss of the m⁶A deposition, offering a potentially new insight into this disease mutation. The rs1555162327 in the TUBA1A gene was a C to T mutation that generated an increased m⁶A probability value of 0.6 according to iM6A (Fig. 4g). Another example was the rs199473377 located in the KCNJ2 gene; this G to A mutation caused the m⁶A probability to increase to 0.4 according to iM6A (Fig. 4h). In summary, iM6A worked as a method to annotate the disease-related SNVs that could affect m⁶A deposition. Even though all the SNVs shown in Fig. 4e–h also cause missense mutations, iM6A could help to annotate the effect of pathogenic SNVs on m⁶A deposition beyond protein-coding sequence mutations. In the ClinVar database, missense and nonsense SNVs are more likely to be annotated as pathogenic because protein functional disruption is straightforward to infer from them. In other words, the pathogenic SNVs that are documented currently in ClinVar primarily focus on protein sequence disruption. Our iM6A annotation provides an alternative angle to interpret these disease-causing SNVs from the m⁶A RNA modification perspective. 
As m⁶A disease research matures, the ClinVar database could come to include pathogenic SNVs whose effect is exerted through m⁶A deposition alone, and our iM6A work could promote disease research in this direction. Defining the disease-associated mutations among millions of SNVs is a grand challenge. Databases such as RMvar⁵³ and RMDisease⁵⁴ collect genetic variants that might be associated with m⁶A modification, while iM6A could contribute synergistically by deciphering the cis-element mechanisms and could provide a new perspective in understanding the diseases caused by RNA modifications.

Synonymous codons may influence m⁶A deposition

Since many m⁶A sites are located in the coding region, an open question is whether synonymous codon usage affects m⁶A deposition and serves as a new layer of regulation. To test this hypothesis, we performed saturation synonymous codon swaps (Fig. 5a) for the coding sequences flanking the m⁶A sites in the last exon and calculated with iM6A the positional mutational effects on m⁶A deposition. We found that the mutational events that either increased or decreased the m⁶A probability significantly (|ΔProbability| > 0.1) were also highly enriched in the downstream region of the m⁶A sites (Fig. 5b for mouse; and Supplementary Fig. 5a for human), supporting that the synonymous codons that influenced m⁶A deposition were located in this region. Next, we systematically and quantitatively analyzed the effect of the synonymous codons on m⁶A deposition in the 15 downstream codons (15 × 3 nt = 45 nt, covering most of the downstream 50 nt region). Similar to the pentamer analysis in Fig. 2d, we implemented the linear regression approach (Fig. 5c) to identify the effect of synonymous codons on m⁶A deposition: each synonymous codon substitution created one codon and disrupted the original codon simultaneously, and the effect for each synonymous codon was determined by the slope of the linear regression equation. Synonymous codons were ranked by their effect value and statistical significance. The top 10 enhancing or silencing synonymous codons and their corresponding amino acids are shown: the m⁶A-enhancing synonymous codons mostly include part of the RRACH motif, and the m⁶A-silencing synonymous codons mostly have the CG/GT/CT motifs, agreeing with the pentamer motif property of the m⁶A enhancers and the m⁶A silencers (Fig. 5d for mouse, and virtually the same codon set for human, Supplementary Fig. 5b). Interestingly, we saw that many sets of synonymous codons encoding the same amino acid contained codons with opposing effects on m⁶A deposition. 
For example, both AGA and CGT encoded arginine (R) with the former enhancing m⁶A deposition and the latter silencing m⁶A deposition (Fig. 5e). More examples included the synonymous codon pair GAC and GAT for the aspartic acid (D) and the pair ACT and ACC for the threonine (T) (Fig. 5e). We also observed a strong effect correlation for all synonymous codons between the studies in mouse and in human, supporting that the same synonymous codon bias influenced m⁶A site-specific deposition in both mouse and human (Fig. 5f). We further investigated the positional distribution of the m⁶A enhancing and silencing synonymous codons in the region flanking the m⁶A sites. The m⁶A enhancing synonymous codons had a higher frequency around the m⁶A sites than the control (Fig. 5g for mouse, and Supplementary Fig. 5c for human). In contrast, the m⁶A silencing synonymous codons had a lower frequency around the m⁶A sites than the control (Fig. 5h for mouse, and Supplementary Fig. 5d for human). The density difference for the enhancing/silencing synonymous codon is more evident in the downstream region of an m⁶A site than its upstream region (Fig. 5g, h for mouse, and Supplementary Fig. 5c, d for human), arguing that the functional cis-elements fall more often in the 50 nt downstream of the m⁶A sites.
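
The synonymous codon swap described in this section can be enumerated with the standard genetic code, as in the sketch below. It only generates the mutant coding sequences; scoring each mutant with a trained model is omitted. The function names are hypothetical, and skipping stop codons in the swap is a choice made here for illustration.

```python
from typing import Iterator, List, Tuple

# Standard genetic code in TCAG order (NCBI translation table 1).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}


def synonymous_codons(codon: str) -> List[str]:
    """Other codons that encode the same amino acid as the given codon."""
    amino = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items()
                  if aa == amino and c != codon)


def synonymous_swaps(cds: str, frame_start: int = 0
                     ) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (codon index, original codon, new codon, mutant CDS) for every
    single synonymous codon substitution in an in-frame coding sequence."""
    for i in range(frame_start, len(cds) - 2, 3):
        ref = cds[i:i + 3]
        if CODON_TABLE.get(ref) in (None, "*"):
            continue  # skip ambiguous codons and stop codons (assumption)
        for alt in synonymous_codons(ref):
            yield (i - frame_start) // 3, ref, alt, cds[:i] + alt + cds[i + 3:]


# Toy coding sequence: ATG (Met, no synonyms) followed by GAC (Asp).
toy_swaps = list(synonymous_swaps("ATGGAC"))
arginine_partners = synonymous_codons("AGA")
```

Each mutant sequence leaves the encoded protein unchanged, so any modeled change in m⁶A probability is attributable to the nucleotide sequence alone.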

The stop codon TGA may favor the m⁶A deposition nearby

We investigated whether different stop codons could affect the m⁶A deposition. We categorized all the coding genes based on their stop codon (TAA/TAG/TGA), and investigated the m⁶A probability value distribution in the region flanking the stop codon (Fig. 6a). We found three positions at which the m⁶A sites could be adjacent to the stop codon. If TRR is the stop codon, then these are (1) Position −2 (Fig. 6a), straddled by motif NRACTRR; (2) Position 2, straddled by TRACN; and (3) Position 3, straddled by TRRACN. All three positions showed relatively higher m⁶A probability than other positions near stop codons (Fig. 6a for mouse, Supplementary Fig. 6a for human). Interestingly, transcripts with the TGA stop codon had higher m⁶A probability as calculated by iM6A for all three positions in comparison to transcripts with the TAA or TAG stop codon, particularly for Position −2: NRACTRR and Position 2: TRACN (Fig. 6a for mouse, Supplementary Fig. 6a for human). Next, we performed stop codon swaps and evaluated the resulting impact on m⁶A deposition (Fig. 6b). We found that the m⁶A probability at all three locations decreased when the stop codon was changed to TAA or TAG, particularly for Position −2: NRACTRR and Position 2: TRACN (Fig. 6b, c for mouse, Supplementary Fig. 6b, c for human). Conversely, m⁶A probability at the three locations increased when TAA or TAG was changed to TGA, again particularly for Position −2: NRACTRR and Position 2: TRACN (Fig. 6b, d for mouse, Supplementary Fig. 6b, d for human). Both stop codon swap experiments support that the TGA stop codon may favor m⁶A deposition at and adjacent to a stop codon location. We further categorized the transcripts into two groups according to their stop codons (TGA or non-TGA), and found that transcripts with the m⁶A sites at and adjacent to stop codons were statistically enriched with the TGA stop codon (P < 1 × 10⁻⁴, Fisher’s exact test) (Fig. 6e for mouse, and Supplementary Fig. 6e for human). As evolutionary conservation provides evidence for functional importance, we further explored the conservation of stop codons in all the transcripts. Indeed, the stop codon of a transcript was more conserved if it was part of an m⁶A site, supporting its functional importance (Fig. 6f and Supplementary Fig. 7a for mouse, Supplementary Figs. 6f, 7b for human). When the stop codon of a transcript was part of an m⁶A site, the TGA stop codons were more conserved than the non-TGA stop codons, supporting that the TGA stop codon may favor the m⁶A deposition at and adjacent to the stop codon (Fig. 6f and Supplementary Fig. 7a for mouse, Supplementary Figs. 6f, 7b for human). Moreover, TGA as a trimer motif may promote m⁶A deposition in comparison to the TAA and TAG trimers (Supplementary Table 1).

**Fig. 6: Stop codon TGA may favor m⁶A deposition at and adjacent to Stop codon.**

Evidence for an evolutionarily conserved m⁶A regulatory code in mouse and human

For all the findings in this work, our data consistently suggests that the same m⁶A cis-element code governs m⁶A deposition in both human and mouse. To comprehensively address this hypothesis, we implemented a head-to-head test comparison for the human iM6A model and the mouse iM6A model, both of which were trained on that species’ genes from most chromosomes except chromosome 9. Thus, the genes from human chromosome 9 and mouse chromosome 9 offered two independent testing datasets being untouched for both the human and mouse iM6A models during their training. Human and mouse iM6A model performed comparably on both chromosome 9 transcripts (Fig. 7a, b), supporting that a common cis-element code governed the m⁶A deposition in both human and mouse. To further investigate the possibility of a common cis-element code, we compared the protein sequence conservation between human and mouse for the known components of the m⁶A methyltransferase complex including METTL3, METTL14, WTAP, and VIRMA (Fig. 7c), and found that >95% of amino acids were identical for each of the four proteins between human and mouse, supporting their functional conservation and, therefore, the likelihood of the cis-elements code commonality.

Our iM6A method modeled the m⁶A site-specific deposition in the pre-mRNA transcript, showing that the cis-elements regulating m⁶A deposition located preferentially within 50 nt downstream of m⁶A sites. It also identified which pentamers were m⁶A enhancers and silencers with the former mostly being part of RRACH motif and the latter mostly containing CG/GT/CT dinucleotides (Fig. 7d).

Discussion

In this study, we used experimentally determined m⁶A sites from mouse and human as the training dataset to build our iM6A deep learning model, the modeling of which was confirmed to be accurate with AUROC = 0.99 by the independent testing dataset and by using m⁶A sites experimentally determined by a variety of precise m⁶A mapping methods. Taking advantage of the high modeling accuracy of the iM6A deep learning model, we implemented saturated mutagenesis on input transcripts to systematically perturb the iM6A deep learning black box. This led to discovering that the downstream 50 nt of the m⁶A sites located a high density of cis-elements regulating m⁶A deposition. Applying the linear regression model as a machine learning method to the saturated mutagenesis data, we were able to systematically identify m⁶A enhancers and silencers in the region. Thorough bioinformatics characterization of these cis-elements including positional plot and sequence conservation analysis confirmed their designated function. Our finding was further supported by independent experimental validations, and uncovered a key cis-regulatory mechanism for m⁶A site-specific deposition.

Though deep learning modeling is powerful in integrating large datasets and accurate in modeling compared to traditional machine learning methods, it is hard to interpret the underlying biological insights from its deep learning model as a black box. Conversely, traditional machine learning approaches such as linear regression are useful for connecting model parameters to the biological function. In this study, we took advantage of both deep learning and machine learning: we first implemented the iM6A deep learning model to accurately model m⁶A deposition, then applied linear regression as a machine learning approach to systematically characterize the cis-element contribution in the high-throughput saturated mutagenesis data from the iM6A deep learning modeling. Our joint method of combining both deep learning and machine learning may be applicable to other biological investigations.

Our work revealed that m⁶A cis-elements are largely located within the 50 nt region downstream of the m⁶A site. This 50 nt range echoes that of several other RNA processing events. A typical exon is at least 50 nt long, though tiny exons do exist⁵⁵. The cis-elements that regulate cleavage and polyadenylation are generally located within 50 nt of the cleavage site⁵⁶. The 50 nt size may reflect the physical sizes of the different RNA processing complexes. A detailed investigation of the molecular mechanism underlying the 50 nt range, based on thorough biochemistry experiments, would be an interesting and useful future research project. Also, our work revealed that the cis-elements regulating the m⁶A deposition include the RRACH enhancers and the CG/GT/CT silencers. Which trans-factors recognize these cis-elements and how they regulate m⁶A deposition are worthwhile directions for future mechanistic investigations.

The iM6A modeling introduced here should prove effective in future mechanism investigations of m⁶A regulation and deposition because it not only enables accurate modeling of the m⁶A deposition but also offers a high-throughput, fast and efficient mechanism discovery which would be cost-prohibitive and time-impractical for traditional laboratory experimentation. Anticipating broad interests in our iM6A strategy, we have deposited our iM6A source code at GitHub (https://github.com/ke-laboratory/iM6A) as well as the probability values of m⁶A candidates in human and mouse genes (https://doi.org/10.5281/zenodo.4734266).

Methods

iM6A Model

Model architecture

iM6A is based on a deep residual neural network⁴⁴. The basic unit of iM6A is the Residual Network (ResNet) block, and we implemented the ResNet structure according to the CNN Architectures and implementations by MLT (https://github.com/Machine-Learning-Tokyo/CNN-Architectures). Each block is composed of batch-normalization (BatchNorm) layers, rectified linear units (ReLU), and convolutional (Conv1D) layers organized in a specific manner (Fig. 1a). In a ResNet block, k, w, ar, and r are the number of convolutional kernels, the window size, the dilation rate of each convolutional kernel in the layer, and the number of repetitions, respectively. The current combination of k, w, ar, and r values is shown in Fig. 1a and was obtained by hyperparameter search (Supplementary Data 1). iM6A starts with a Conv1D layer, followed by four ResNet blocks. The output of every ResNet block is added to the input of the penultimate layer (Cropping1D), which is connected to a Conv1D layer with softmax activation.

The training input of iM6A for each gene is the full length of the pre-mRNA sequence with 5000 nucleotides on each side, covering the transcript from 5 kb upstream of the TSS (transcription start site) to 5 kb downstream of the TES (transcription end site). The sequence is transformed by one-hot encoding, in which N, A, C, G, and T are encoded as [0,0,0,0], [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1], respectively. Then, the one-hot-encoded nucleotide sequence was split into blocks of length 5000 + 5000 + 5000 in such a way that the i-th block consists of the nucleotide sequence positions from 5000(i−2) + 1 to 5000i + 5000. Along with the sequence input, the locations of the individual m⁶A sites in the positive training set were marked, and the output label was also split into blocks of length 5000 in such a way that the i-th block consists of the positions from 5000(i−1) + 1 to 5000i. The input and output information was jointly fed into the ResNet deep learning network for training. A similar input strategy has been implemented in SpliceAI⁴⁵, which used deep learning to model splice sites in pre-mRNA. The output of the model is the probability value of each position being an m⁶A site.
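The encoding and block layout described above can be illustrated with a minimal pure-Python sketch. The function names (`one_hot_encode`, `split_into_blocks`) are hypothetical and are not taken from the iM6A source code, and padding the last block with N and zero labels is an assumption, since the text does not say how transcripts whose length is not a multiple of 5000 are handled.

```python
# Encoding stated in the text: N -> all zeros, A/C/G/T -> unit vectors.
ONE_HOT = {
    "N": [0, 0, 0, 0],
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(seq):
    """Return one 4-element vector per nucleotide (unknown symbols map to N)."""
    return [ONE_HOT.get(base, ONE_HOT["N"]) for base in seq.upper()]


def split_into_blocks(seq, labels, block=5000, flank=5000):
    """Split a flank-padded sequence into overlapping input blocks and label blocks.

    `seq` holds the transcript plus `flank` nt on each side; `labels` holds one
    0/1 value per transcript position (1 = m6A site).
    Assumption: the last block is padded with N (inputs) and 0 (labels).
    """
    n = len(labels)
    if len(seq) != n + 2 * flank:
        raise ValueError("seq must be the transcript plus `flank` nt on each side")
    n_blocks = -(-n // block)  # ceiling division
    pad = n_blocks * block - n
    encoded = one_hot_encode(seq + "N" * pad)
    padded_labels = list(labels) + [0] * pad
    # Block i (0-based) sees `flank` nt of context on both sides of its labels.
    inputs = [encoded[i * block: i * block + block + 2 * flank] for i in range(n_blocks)]
    outputs = [padded_labels[i * block: (i + 1) * block] for i in range(n_blocks)]
    return inputs, outputs
```

With the default `block=5000` and `flank=5000`, each input block has length 15,000 and each label block has length 5000, matching the 5000 + 5000 + 5000 layout in the text.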

Model training and testing

We downloaded the gene annotation tables (vM7 for mouse, v19 for human) from GENCODE (https://www.gencodegenes.org/) and extracted the longest transcript for each gene. Both mouse and human m⁶A sites were collected from published data^8,14,32,33, which were determined by m⁶A-CLIP experiments. The consensus motif for an m⁶A site could be either RRACH as a high-stringency set or RAC as a more inclusive set. We generated two separate iM6A models, one for each stringency: the RRACH dataset for the RRACH iM6A model and the RAC dataset for the RAC iM6A model. Transcripts containing m⁶A sites were kept as input (mouse RRACH: 8475 genes, 41,551 m⁶A sites; mouse RAC: 8939 genes, 57,712 m⁶A sites; human RRACH: 8598 genes, 54,354 m⁶A sites; human RAC: 10,314 genes, 81,519 m⁶A sites). We used pre-mRNA sequences as input: the m⁶A sites on the pre-mRNA served as positive sites, while the remaining nucleotides were treated as negative sites. The whole dataset was divided into training and test datasets. The training dataset contained the transcripts on all chromosomes except chromosome 9, the transcripts of which were held out and reserved for testing.

The iM6A model was trained for 10 epochs with a batch size of 30 on an NVIDIA GPU. By pulling the Singularity container (tensorflow-19.01-py2) from the official NVIDIA website, we created an environment for model training and testing. Extra packages (biopython: 1.76; scikit-learn: 0.20.3; matplotlib: 2.2.4; keras: 2.0.5) were installed into an external path by pip. For training, the Adam optimizer was used to minimize the categorical cross-entropy loss between the target and modeled outputs. The learning rate of the optimizer was set to 0.001 for the first 6 epochs, and then reduced by a factor of 2 in every subsequent epoch. We trained the model five times and obtained five trained models. For testing, each input was evaluated using all five trained models, and the average of their outputs was used as the modeled value.
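The learning-rate schedule and the five-model averaging can be written down as a short sketch. The helper names are hypothetical, epochs are assumed to be counted from 1, and the code does not reproduce the actual Keras training loop.

```python
def learning_rate(epoch, base_lr=0.001, constant_epochs=6, factor=2.0):
    """Learning rate for a 1-based epoch index.

    Constant for the first `constant_epochs` epochs, then divided by `factor`
    in every subsequent epoch, as described in the text.
    """
    if epoch <= constant_epochs:
        return base_lr
    return base_lr / factor ** (epoch - constant_epochs)


def ensemble_average(predictions):
    """Average per-position probabilities over several trained models.

    `predictions` is a list with one probability list per model, all of the
    same length.
    """
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]
```

Under these assumptions, epochs 1 to 6 use 0.001, epoch 7 uses 0.0005, and epoch 10 uses 0.001 / 16.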

Comparison of iM6A with other methods

We compared the modeling performance of iM6A with that of the machine learning-based SVM method³⁹ and that of the deep learning-based CNN-RNN method⁴⁰. Both SVM and CNN-RNN models were trained on the same training samples used for iM6A, and the m⁶A and non-m⁶A sites conformed to the RRACH motif in the same way as in Chen et al., 2019. For the positive training data, the input was the sequence centered on the m⁶A sites (39,138 sites). For the negative training data, the input was the sequence centered on non-m⁶A sites, which were randomly selected from the same full transcripts that contained the positive sites. The sequence length for the SVM model was 41 nt as described in Chen et al., 2019, while the sequence length for the CNN-RNN model was 1001 nt as in Wang and Wang, 2020. The positive-to-negative ratio was 1:1. For independent testing, the sequences centered on the m⁶A and non-m⁶A sites in chromosome 9 were used to quantify the modeling performance of the models, with receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) as the performance evaluation metrics.

Validation of iM6A modeling by experimentally detected m⁶A sites

We downloaded the gene annotation tables (vM7 for mouse, v19 for human) from GENCODE (https://www.gencodegenes.org/) and extracted the longest transcript for each coding gene (mouse: 22,357 genes, human: 20,536 genes). The probability value of each nucleotide being an m⁶A site in the pre-mRNA of the transcripts were modeled by iM6A, and the sites selected were those conforming to the RRACH (the iM6A RRACH model) or the RAC (the iM6A RAC model). We collected the m⁶A sites detected by the experimental methods including m⁶A-CLIP^8,14,32,33, m6A-label-seq³⁶, MAZTER-seq³⁴, and m6ACE-seq³⁵. The heatmap was used to visualize the experimentally detected sites in all modeled sites. The modeled sites were ranked by its probability value, and the black line denoted whether methylation was identified by the experimental method at the site (Fig. 1c, d).

Calculation of the m⁶A probability and the enrichment score for the m⁶A sites derived from m⁶A-CLIP

The peak enrichment value for the m⁶A sites in mouse (mouse embryonic stem cell, mESC) and human (the A549 cell line) were quantified by the m⁶A-CLIP^14,33. Based on the enrichment score, the m⁶A were categorized into three groups (low: score < 5, medium: 5 ≤ score < 20, high: score ≥ 20). The probability of the site being an m⁶A site was modeled by the iM6A, and the box plot was used to visualize the peak enrichment value and the modeled m⁶A probability (Fig. 1e and Supplementary Fig. 1h).

Calculation of the m⁶A probability and the cleavage efficiencies for the m⁶A sites derived from MAZTER-seq

The m⁶A sites identified by MAZTER-seq³⁴ were downloaded. According to their supplemental tables, the m⁶A sites were categorized into the groups of control, low, intermediate, high, and highest confidence. We filtered the dataset to retain the sites conforming to the RRACA motif and extracted the normalized cleavage efficiency of the sites from the table. Box plot was used to visualize the normalized cleavage efficiency and the modeled m⁶A probability (Supplementary Fig. 1i, j).

Comparison of the RRACH model with the RAC model

Both RRACH and RAC of the iM6A models for mouse (Fig. 1f) were trained independently on the genes of all the other chromosomes except those of the chromosome 9 (Chr9). The m⁶A sites in Chr9 were modeled by either the RRACH iM6A model or the RAC iM6A model, and the scatter plot was used to visualize the modeled probability of the m⁶A sites between the RRACH model and the RAC model. Each dot represented one site in Chr9 discovered by both models, and the labeled axes provided the probability estimate for that site by the two models. The R-value was calculated by Pearson Correlation Coefficient. The same analysis was performed for human (Supplementary Fig. 1k).

Positional mutational effects on m⁶A deposition implemented by single nucleotide saturation mutagenesis

For the m⁶A sites in the last exon, we modeled their probability by iM6A. The sites were sorted based on probability value, and a single m⁶A site with the highest probability value was kept for each gene. In addition, the probability was required to be ≥0.4. Then, we selected the sites located at least 250 nt away from both the last exon start and the last exon end. We obtained 2048 sites for mouse and 2724 sites for human in the last exon region. The same strategy was applied to the m⁶A sites in long internal exons, and we obtained 893 sites for mouse (the m⁶A sites in the Plekhm3 gene were excluded for their unusual sequence property) and 806 sites for human.

For each position in the sequence (−250 to 250) flanking the m⁶A site, the nucleotide was substituted by each of the three other nucleotides (Fig. 2a). The change in the m⁶A probability value (ΔProbability) after each mutation was calculated by iM6A (Fig. 2a).
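A minimal sketch of this single-nucleotide saturation mutagenesis is given below. The `score` argument is a hypothetical stand-in for an iM6A prediction call (any function that takes a sequence and a 0-based site index and returns a probability); it is not the actual iM6A API.

```python
def saturation_mutants(seq, site, flank=250):
    """Yield (offset, ref, alt, mutant_seq) for every single-nucleotide
    substitution within `flank` nt of the 0-based position `site`."""
    start = max(0, site - flank)
    end = min(len(seq), site + flank + 1)
    for pos in range(start, end):
        ref = seq[pos]
        for alt in "ACGT":
            if alt != ref:
                yield pos - site, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def delta_probabilities(seq, site, score, flank=250):
    """Return {(offset, alt): probability change} for every substitution.

    `score(sequence, site)` is a placeholder for the model prediction.
    """
    baseline = score(seq, site)
    return {
        (offset, alt): score(mutant, site) - baseline
        for offset, _ref, alt, mutant in saturation_mutants(seq, site, flank)
    }
```

Each position yields three mutants, so a full −250 to 250 window produces 501 × 3 = 1503 ΔProbability values per site.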

Quantify the effect of all cis-element pentamers by linear regression

To prepare the m⁶A sites in last exon for the systematic effect analysis of all pentamers, we first modeled the m⁶A probability by iM6A for all m⁶A sites in last exons. In addition, the probability should be ≥0.4. All of these sites were sorted based on their m⁶A probability value, and only a single m⁶A site with the highest probability value was kept for each gene. We further selected the m⁶A sites which located at least 50 nt away from both last exon start and last exon end. We obtained 5292 sites for mouse and 4772 sites for human from which we randomly selected 1500 sites for both mouse and human, as 1500 sites was sufficient for our analysis. The same strategy is applied to the m⁶A sites in long internal exon (length > 100 nt), and we got 1460 sites for mouse and 1416 sites for human.

For each position in the downstream region of an m⁶A site (i.e., from position 3 to position 46), the nucleotide was substituted by each of the three other nucleotides. The resulting probability change (ΔProbability) of this m⁶A site was calculated by iM6A. Each substitution simultaneously created five overlapping 5-mers and broke five, and a value of 1 or −1 was assigned to each created or broken 5-mer, respectively. Linear regression was applied to each 5-mer (1024 pentamers in total) after pooling all the data; the effect of each motif was then ranked based on the slope of the linear regression equation, and the statistical significance was quantified by p-value (Fig. 2d).
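The created/broken 5-mer coding and the per-motif slope can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, each 5-mer is fitted with its own ordinary least-squares regression over all pooled mutation events (with 0 for events that do not touch it), and the p-value calculation is omitted.

```python
from collections import defaultdict


def kmer_changes(seq, pos, alt, k=5):
    """Return {kmer: +1 (created) or -1 (broken)} for one substitution."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    changes = defaultdict(int)
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        changes[seq[start:start + k]] -= 1     # k-mer destroyed by the mutation
        changes[mutant[start:start + k]] += 1  # k-mer introduced by the mutation
    return {kmer: value for kmer, value in changes.items() if value != 0}


def kmer_slopes(events, k=5):
    """Least-squares slope of delta probability against the +1/-1 indicator.

    `events` is an iterable of (seq, pos, alt, delta_probability) tuples.
    """
    rows = [(kmer_changes(seq, pos, alt, k), delta) for seq, pos, alt, delta in events]
    n = len(rows)
    mean_y = sum(delta for _, delta in rows) / n
    slopes = {}
    for kmer in {kmer for changes, _ in rows for kmer in changes}:
        xs = [changes.get(kmer, 0) for changes, _ in rows]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, (_, y) in zip(xs, rows))
        if sxx > 0:
            slopes[kmer] = sxy / sxx
    return slopes
```

A positive slope marks a 5-mer whose creation raises the modeled probability (an enhancer candidate), and a negative slope marks a silencer candidate.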

Positional plot of pentamers in sequences flanking m⁶A sites

For the potential m⁶A sites in the RAC consensus in the last exon of each gene, we calculated their m⁶A probability values by iM6A. The m⁶A sites were sorted based on their m⁶A probability value, and a single m⁶A site with the highest m⁶A probability value was kept for each gene. We selected the m⁶A sites with the higher m⁶A probability values (probability ≥ 0.7) as the positive sites, while the control was the exact RAC motif matched site with a lower m⁶A probability value (probability < 0.1). For the m⁶A enhancer and silencer positional plot, we randomly selected 1000 positive sites or control sites located in the last exon, and extracted the 50 nt upstream and downstream sequence flanking the m⁶A site. The pentamers were enumerated from the 5' end to the 3' end of the 101 nt sequence. For the positional plot, we counted the numbers of the top 100 enhancers and top 100 silencers at each position of the 101 nt sequence (see details in the section for quantifying the effect of all cis-element pentamers). The frequency of the enhancers or silencers was also calculated. The plots were compared between the positive sites and the control, and the data were presented as mean ± S.E.M. (standard error of the mean) (Fig. 2h, i).
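The positional counting step can be sketched in a few lines. The function name is hypothetical, and the frequency is assumed to be the fraction of windows carrying a listed pentamer at each start position; the text does not define the normalization further.

```python
def positional_frequency(windows, motifs, k=5):
    """Fraction of windows whose k-mer starting at each position is in `motifs`.

    `windows` are equal-length sequences centred on the candidate site
    (101 nt in the text); `motifs` is a set of k-mers such as the top 100
    enhancers or silencers.
    """
    length = len(windows[0])
    counts = [0] * (length - k + 1)
    for window in windows:
        for start in range(length - k + 1):
            if window[start:start + k] in motifs:
                counts[start] += 1
    return [count / len(windows) for count in counts]
```

Running this separately on positive and control windows gives the two curves that are compared in the positional plot.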

In parallel, we collected the m⁶A sites detected by different experimental methods, including m⁶A-CLIP^8,14,32,33, m6A-label-seq³⁶, m6ACE-seq³⁵, and MAZTER-seq³⁴. The experimentally determined m⁶A sites (m⁶A-CLIP, m6A-label-seq, m6ACE-seq, and MAZTER-seq) served as the positive sites while the control was the exact RAC motif matched site which was not determined by the experimental methods. Moreover, those control sites did not overlap with the m⁶A peak regions^14,33 and came from the transcripts that also contained the positive sites. For the sites detected by MAZTER-seq, we intersected it with the sites determined by other methods (m⁶A-label-seq, m6ACE-seq, and m⁶A-CLIP) to get the high-quality sites as to lower the multiple technical noises of MAZTER-seq as discussed in Garcia-Campos et al. 2019³⁴. The overlapped sites were served as the positive sites while the control was the exact RAC motif matched site which was not determined by the MAZTER-seq. In addition, those control sites did not overlap with the m⁶A peak regions^14,33 and came from the transcripts that also contained the positive sites. For the m⁶A enhancer and silencer positional plot, we selected the positive sites or control sites located in the last exon, and extracted the 50 nt upstream and downstream sequence flanking the m⁶A site. The pentamers were enumerated from the 5' end to the 3' end of the 101 nt sequence. For the positional plot, we counted the numbers of top 100 enhancers and top 100 silencers at each position of the 101 nt sequence (see details in the section for quantifying the effect of all cis-element pentamers). The frequency of the enhancers or silencers were also calculated. The plots were compared between the positive sites and the control, and the data were presented as mean ± S.E.M. (Supplementary Fig. 3).

Conservation analysis for the sequence flanking the m⁶A sites

For the potential m⁶A sites in the RAC consensus at the last exon, we calculated their m⁶A probability values by iM6A. The m⁶A sites were sorted based on their m⁶A probability value, and a single m⁶A site with the m⁶A highest probability value were kept for each gene. We selected the m⁶A sites with the higher m⁶A probability values (probability ≥ 0.7) as the positive sites, while the control was the exact RAC motif matched site with a lower m⁶A probability value (probability < 0.1). In addition, these RAC sites were located in the noncoding region of last exon (at least 50 nt from the stop codon). We calculated the phyloP score of each nucleotide flanking the RAC sites. The average phyloP score for the sequence flanking the RAC sites were calculated, the plots were compared between the positive sites and the control, and the data were presented as mean ± S.E.M. (Fig. 2j).

Experimental validation of iM6A modeling by the m6A profiling in the lymphoblastoid cell lines (LCLs) of 60 Yoruba (YRI) individuals

The m⁶A levels were profiled across the transcriptome in LCLs derived from 60 YRI individuals⁵⁰. We downloaded raw sequencing data from the Gene Expression Omnibus (GEO) repository (GSE125377). Raw sequencing data were mapped to the hg19 reference genome by HISAT2 with the parameters “--known-splicesite-infile <splice-site file extracted from the RefSeq hg19 GTF file> -k 1 --no-unal”. The BAM files obtained from the alignment were converted to BigWig files, which were visualized in the UCSC Genome Browser (Fig. 3).

The m6A profiling dataset in the LCLs of 60 YRI individuals was downloaded from Zenodo (https://doi.org/10.5281/zenodo.3870952), which includes the bed file of m⁶A peaks, the normalized enrichment score of each peak for the 60 samples, and the imputed genotype data of the 60 samples. We downloaded all SNP sites from the dbSNP database (https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/BED/), and extracted all the SNVs located in the m⁶A peaks. The corresponding genotype of each SNV for the 60 samples was also extracted. For each SNV, 0, 0.5, and 1 were assigned to the homozygote of the major allele, the heterozygote, and the homozygote of the minor allele, respectively. The association between SNV and m⁶A level was tested by linear regression. We obtained 3297 SNVs that were correlated with m⁶A level (p-value ≤ 0.1). Then, we calculated the effects of these SNVs on m⁶A deposition by iM6A, and found 47 SNVs that could affect m⁶A deposition significantly (|ΔProbability| ≥ 0.1). The changes in peak enrichment (ΔPeakEnrichment) of the m⁶A peaks corresponding to these 47 SNVs were calculated using the m⁶A profiling experimental data (Fig. 3).
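The genotype coding and the regression of m⁶A level on genotype can be illustrated as follows. The genotype string format (`"A/G"`) and the function names are assumptions made for the example, and only the least-squares fit is shown; the p-value used for filtering in the text is not computed here.

```python
def encode_genotype(genotype, major, minor):
    """Map a two-allele genotype such as 'A/G' to 0, 0.5 or 1.

    0 = homozygous major, 0.5 = heterozygous, 1 = homozygous minor.
    """
    alleles = genotype.split("/")
    if len(alleles) != 2 or any(a not in (major, minor) for a in alleles):
        raise ValueError("unexpected genotype: %r" % genotype)
    return alleles.count(minor) / 2


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Here `xs` would be the encoded genotypes of the 60 samples for one SNV and `ys` the normalized enrichment scores of the peak that contains it.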

Characterization of m⁶A associated SNVs

We downloaded data from the ClinVar database (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz), and extracted the SNVs located in last exons. We obtained 68,286 SNVs, and utilized iM6A to calculate their effects on m⁶A deposition. To characterize SNVs that altered m⁶A deposition, we selected the SNVs located within 500 nt upstream or downstream of an m⁶A site. Then those SNVs were categorized by clinical significance according to ClinVar, and we only kept the sites annotated with uncertain significance/benign/likely benign/pathogenic/likely pathogenic. We singled out uncertain significance as the first group (VUS), grouped benign and likely benign as the second group (Benign), and grouped pathogenic and likely pathogenic as the third group (Patho.) (Fig. 4). Then, the SNVs were categorized into two groups (m⁶A probability changed group or no change group) based on ΔProbability (|ΔProbability| ≥ 0.1 for the changed group). A bar plot was used to show the log₂(odds ratio, m⁶A probability changed group over no change group) for the percentages of SNVs with different clinical significances in ClinVar, and the p-value was calculated by Fisher’s exact test. To visualize the effect of SNVs on m⁶A deposition, saturation mutagenesis was performed in the region from −100 to +100 nt around the m⁶A sites, and the ΔProbability of each mutation event was displayed as a heatmap (Fig. 4).
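For readers who want to reproduce the statistics, the log₂ odds ratio and a two-sided Fisher's exact test on a 2 × 2 table can be sketched in pure Python. The table layout (rows: changed / no-change group; columns: SNVs inside / outside one clinical category) is an assumption about how the counts are arranged, and in practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead of this hand-written version.

```python
from math import comb, log2


def log2_odds_ratio(a, b, c, d):
    """log2 odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return log2((a * d) / (b * c))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)
```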

Saturation synonymous codon swap

For the m⁶A sites in last exons, we calculated m⁶A probability values by iM6A. Then the m⁶A sites were sorted based on their probability value, and a single m⁶A site with the highest probability value was kept for each gene. Then, we selected the m⁶A sites which located at least 60 nt away from both the last exon start and the end of the coding sequence. In total, we obtained 1473 m⁶A sites for mouse and 1532 m⁶A sites for human in the last exon region. For each codon position in the codons (−15 to +15 codons, −45 nt to +45 nt) flanking the m⁶A site, the codon was substituted by each of its synonymous codons (Fig. 5a). The delta changes of the m⁶A probability value (ΔProbability) after the codon swap was calculated by iM6A.
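The codon swap itself can be sketched with the standard genetic code. The function names are hypothetical, and the sketch only enumerates the in-frame synonymous substitutions; scoring each mutant with iM6A is left out.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """Return the other codons that encode the same amino acid, sorted."""
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon)


def synonymous_swaps(cds, first_codon=0, last_codon=None):
    """Yield (codon_index, original, replacement, mutant_cds) for every
    synonymous swap of the in-frame codons first_codon .. last_codon - 1."""
    n_codons = len(cds) // 3
    last_codon = n_codons if last_codon is None else min(last_codon, n_codons)
    for i in range(first_codon, last_codon):
        original = cds[3 * i:3 * i + 3]
        for replacement in synonymous_codons(original):
            yield i, original, replacement, cds[:3 * i] + replacement + cds[3 * i + 3:]
```

For example, the arginine codon AGA has five synonymous alternatives (AGG, CGA, CGC, CGG, CGT), whereas ATG and TGG have none and therefore contribute no swaps.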

Quantify the effect of all synonymous codons on m⁶A deposition by linear regression

To prepare the m⁶A sites in last exons for the systematic effect analysis of all synonymous codons, we first modeled the m⁶A probability value by iM6A for all m⁶A sites in last exons. These sites were sorted based on their m⁶A probability value, and only a single m⁶A site with the highest probability value was kept for each gene. We further selected the m⁶A sites located at least 60 nt away from both the last exon start and the coding sequence end. In total, we obtained 1473 sites for mouse and 1532 sites for human. For each codon position in the downstream region of an m⁶A site (from position 1 to position 15), the codon was substituted by each of its synonymous codons. The resulting probability change (ΔProbability) of this m⁶A site was calculated by iM6A. Each codon substitution created one codon and simultaneously broke the original codon. A value of 1 or −1 was assigned to the created codon or the broken codon, respectively. Linear regression was applied to each codon (64 codons in total) after pooling all the data together; the effect of each synonymous codon was then ranked based on the slope of the linear regression equation, and the statistical significance was quantified by p-value (Fig. 5c).

Positional plot of trimers in sequences flanking m⁶A sites

For the potential m⁶A sites in the RAC consensus at the coding region of the last exon for each gene, we modeled its probability value by iM6A. We selected the m⁶A sites with relatively high probability values (probability ≥ 0.7) as the positive sites, while the control was the exact RAC motif matched site with lower m⁶A probability (probability < 0.01). For the trimers enumeration, we extracted the 50 nt upstream and downstream sequences of each m⁶A site. The trimers were enumerated from the 5' end to the 3' end of the 101 nt sequence. For the positional plot, we counted the numbers of the top 20 enhancers codons and the top 20 silencers codons at each position of the 101 nt sequence (see details in the section for quantifying the effect of all synonymous codons on the m⁶A deposition by linear regression). The frequency of enhancers or silencers were also calculated. The plots were compared between the positive sites and control, and the data were presented as mean ± S.E.M. (Fig. 5g, h and Supplementary Fig. 5c, d).

Distribution of m⁶A probability value around stop codon

All the coding genes were categorized by their stop codons (three groups: TAA/TAG/TGA). To plot the distribution of m⁶A probability value around stop codon (the position 0 was the T nucleotide for stop codons), we first calculated the m⁶A probability value of each nucleotide flanking the stop codon (−500 nt to +500 nt) by iM6A. The total probability value for each position was summed, and the average probability value was computed by dividing the total number of transcripts at each position (Fig. 6a).

For the stop codon swap, the TGA stop codon was substituted by the TAA or the TAG stop codon (Fig. 6b). Similarly, TAA or TAG was replaced by TGA. The average m⁶A probability value around the stop codon was also calculated by the iM6A. For the m⁶A sites adjacent to the stop codon (Position −2: NRACTRR, Position 2: TRACN, Position 3: TRRACN), we evaluated the m⁶A probability value changes by the CDF (Cumulative Distribution Function) plot, and the p-values were calculated by the KS-test (Kolmogorov-Smirnov test).
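The three stop-codon-adjacent contexts and the stop codon swap can be made concrete with a small sketch. Positions are counted relative to the T of the stop codon (position 0), as in the text; the function names are hypothetical and the iM6A scoring step is not included.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")
PURINES = "AG"


def stop_adjacent_sites(seq, stop_index):
    """Return the relative positions (-2, 2, 3) that hold an RAC adenosine.

    The contexts are NRACTRR (A at -2), TRACN (A at 2) and TRRACN (A at 3),
    where TRR is the stop codon starting at the 0-based index `stop_index`.
    """
    if seq[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at the given index")

    def is_rac(a_index):
        return (
            a_index >= 1
            and a_index + 1 < len(seq)
            and seq[a_index - 1] in PURINES
            and seq[a_index] == "A"
            and seq[a_index + 1] == "C"
        )

    return [rel for rel in (-2, 2, 3) if is_rac(stop_index + rel)]


def swap_stop_codon(seq, stop_index, new_stop):
    """Replace the stop codon at `stop_index` with `new_stop`."""
    if new_stop not in STOP_CODONS:
        raise ValueError("replacement must be a stop codon")
    return seq[:stop_index] + new_stop + seq[stop_index + 3:]
```

Note that Position 2 requires the third base of the stop codon to be A, so swapping TGA or TAA to TAG removes that candidate site altogether, while Positions −2 and 3 remain RAC sites under any stop codon.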

Conservation analysis of stop codons with or without m⁶A sites

For the RAC sites adjacent to the stop codons (NRACTRR, TRACN, and TRRACN, and TRR represented the stop codon), we calculated the average phyloP score of its corresponding stop codon. The m⁶A sites were categorized into two groups (m⁶A or non-m⁶A) based on its probability value (the cutoff = 0.05), while its corresponded stop codon was also categorized into two groups (TGA or non-TGA). The donut plot was used to show the percentage of stop codon for the m⁶A sites and the non-m⁶A sites, and the p-value was calculated by the Fisher’s exact test. The conservation score of stop codons for each group was compared by the box plot, and the p-value was determined by the Student’s t-test.

Conservation analysis of stop codons with or without m⁶A-CLIP sites

For the RAC sites adjacent to or at the stop codons (NRACTRR, TRACN, and TRRACN, and TRR represented the stop codon), we calculated the average phyloP score of its corresponding stop codon. The stop codons adjacent to or with the m⁶A-CLIP sites were the stop codons that overlapped with either the m⁶A sites detected by m⁶A-CLIP at the three positions or the m⁶A peak region^8,14,32,33. The control was the stop codons that was neither adjacent to nor at the m⁶A-CLIP sites nor overlapped with the m⁶A peak regions^14,33, and we further require the control stop codons to come from the transcripts with the m⁶A-CLIP sites (i.e., these transcripts had adequate expression level to have m⁶A sites detected by the m⁶A-CLIP). Moreover, the stop codons were categorized into two groups (TGA or non-TGA). The conservation score of stop codons for each group was compared by the box plot, and the p-value was determined by the Student’s t-test.

Comparison of mouse iM6A model with human iM6A model

Both mouse and human iM6A models were trained independently on the genes of all chromosomes except chromosome 9 (Chr9) in mouse and human, respectively. The m⁶A sites in Chr9 of mouse or human were modeled by either the mouse iM6A model or the human iM6A model independently, and a scatter plot was used to compare the modeled probabilities of the m⁶A sites between the mouse and human models (Fig. 7a, b). Each dot represented one Chr9 site discovered by both models, and the labeled axes gave the probability estimate for that site from each of the two models. The R-value was calculated as the Pearson correlation coefficient.
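A minimal sketch of the chromosome hold-out and of the correlation between the two models' probabilities is given below. The gene record format (`chrom` key), the `"chr9"` label and the toy probabilities are assumptions made for illustration and do not come from the iM6A code.

```python
from math import sqrt


def hold_out_chromosome(genes, held_out="chr9"):
    """Split gene records into (training, held-out) lists by chromosome."""
    train = [g for g in genes if g["chrom"] != held_out]
    test = [g for g in genes if g["chrom"] == held_out]
    return train, test


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # raises ZeroDivisionError if either input is constant


# Toy probabilities for the same Chr9 sites from two models (made up).
mouse_model = [0.10, 0.50, 0.90, 0.30]
human_model = [0.15, 0.45, 0.95, 0.20]
print(round(pearson_r(mouse_model, human_model), 3))
```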

Comparison of the protein sequence conservation of METTL3, METTL14, WTAP, and VIRMA between mouse and human

The mouse and human protein sequences of METTL3 (Mouse: Q8C3P7, Human: Q86U44), METTL14 (Mouse: Q3UIK4, Human: Q9HCE5), WTAP (Mouse: Q9ER69, Human: Q15007), and VIRMA (Mouse: A2AIV2, Human: Q69YN4) were downloaded from UniProt (https://www.uniprot.org/). The MEGA (Molecular Evolutionary Genetics Analysis) software⁵⁷ was used to align the protein sequences. We visualized the sequence conservation with a heatmap. The full length of each protein was illustrated by a gray box, while the individual amino acids that differed between mouse and human were marked as black lines. The percentage of conserved amino acids between the mouse and human proteins was also calculated accordingly (Fig. 7c).
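Given a pairwise alignment exported from an aligner such as MEGA, the conserved percentage and the positions of differing residues can be computed as in the sketch below. The function name and the toy alignment are hypothetical. Counting gap columns as differences and using the full alignment length as the denominator are assumptions, since the text does not state how gaps were handled.

```python
def compare_aligned(seq_a, seq_b):
    """Return (percent identical columns, list of differing column indices).

    seq_a, seq_b: aligned sequences of equal length, with '-' for gaps.
    Assumption: gap columns count as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("alignment is empty")
    diffs = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    percent = 100.0 * (len(seq_a) - len(diffs)) / len(seq_a)
    return percent, diffs


# Toy alignment (made up): one substitution and one gap in five columns.
percent, diffs = compare_aligned("MKT-A", "MKSLA")
print(percent, diffs)  # 60.0 [2, 3]
```

The list of differing column indices corresponds to the positions that would be drawn as black lines on the gray box.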

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The datasets of the probability for the m6A candidates are available at Zenodo (https://doi.org/10.5281/zenodo.4734266). The ClinVar dataset is available at https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz. The dbSNP dataset is available at https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/BED/. The m6A profiles of 60 YRI samples are available from the Gene Expression Omnibus repository under accession no. GSE125377 and at https://doi.org/10.5281/zenodo.3870952.

Code availability

The source code of iM6A is available at GitHub (https://github.com/ke-laboratory/iM6A).

References

Dominissini, D. et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature 485, 201–206 (2012).
Meyer, K. et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell 149, 1635–1646 (2012).
Perry, R. & Kelley, D. Existence of methylated messenger RNA in mouse L cells. Cell 1, 37–42 (1974).
Desrosiers, R., Friderici, K. & Rottman, F. Characterization of Novikoff hepatoma mRNA methylation and heterogeneity in the methylated 5’ terminus. Biochemistry 14, 4367–4374 (1975).
Wei, C., Gershowitz, A. & Moss, B. Methylated nucleotides block 5' terminus of HeLa cell messenger RNA. Cell 4, 379–386 (1975).
Batista, P. et al. m(6)A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell 15, 707–719 (2014).
Geula, S. et al. Stem cells. m6A mRNA methylation facilitates resolution of naive pluripotency toward differentiation. Science 347, 1002–1006 (2015).
Vu, L. et al. The N6 methyladenosine m6A forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells. Nat. Med. 23, 1369–1376 (2017).
Yoon, K. et al. Temporal control of mammalian cortical neurogenesis by m6A methylation. Cell 171, 877–889.e17 (2017).
Wang, Y. et al. N6-methyladenosine RNA modification regulates embryonic neural stem cell self-renewal through histone modifications. Nat. Neurosci. 21, 195–206 (2018).
Zaccara, S., Ries, R. J. & Jaffrey, S. R. Reading, writing and erasing mRNA methylation. Nat. Rev. Mol. Cell Biol. 20, 608–624 (2019).
He, P. C. & He, C. m(6)A RNA methylation: from mechanisms to therapeutic potential. EMBO J. 40, e105977 (2021).
Wang, X. et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117–120 (2014).
Ke, S. et al. m(6)A mRNA modifications are deposited in nascent pre-mRNA and are not required for splicing but do specify cytoplasmic turnover. Genes Dev. 31, 990–1006 (2017).
Zaccara, S. & Jaffrey, S. R. A unified model for the function of YTHDF proteins in regulating m(6)A-modified mRNA. Cell 181, 1582–1595.e18 (2020).
Sommer, S., Lavi, U. & Darnell, J. J. The absolute frequency of labeled N-6-methyladenosine in HeLa cell messenger RNA decreases with label time. J. Mol. Biol. 124, 487–499 (1978).
Wang, P., Doxtader, K. & Nam, Y. Structural basis for cooperative function of Mettl3 and Mettl14 methyltransferases. Mol. Cell 63, 306–317 (2016).
Śledź, P. & Jinek, M. Structural insights into the molecular mechanism of the m(6)A writer complex. Elife 5, e18434 (2016).
Wang, X. et al. Structural basis of N(6)-adenosine methylation by the METTL3-METTL14 complex. Nature 534, 575–578 (2016).
Ping, X. et al. Mammalian WTAP is a regulatory subunit of the RNA N6-methyladenosine methyltransferase. Cell Res. 24, 177–189 (2014).
Knuckles, P. et al. Zc3h13/Flacc is required for adenosine methylation by bridging the mRNA-binding factor Rbm15/Spenito to the m. Genes Dev. 32, 415–429 (2018).
Wen, J. et al. Zc3h13 regulates nuclear RNA m6A methylation and mouse embryonic stem cell self-renewal. Mol. Cell 69, 1028–1038.e6 (2018).
Schwartz, S. et al. Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5’ sites. Cell Rep. 8, 284–296 (2014).
Haussmann, I. et al. m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540, 301–304 (2016).
Patil, D. et al. m(6)A RNA methylation promotes XIST-mediated transcriptional repression. Nature 537, 369–373 (2016).
Růžička, K. et al. Identification of factors required for m(6)A mRNA methylation in Arabidopsis reveals a role for the conserved E3 ubiquitin ligase HAKAI. New Phytol. 215, 157–172 (2017).
Harper, J., Miceli, S., Roberts, R. & Manley, J. Sequence specificity of the human mRNA N6-adenosine methylase in vitro. Nucleic Acids Res. 18, 5735–5741 (1990).
Kane, S. & Beemon, K. Inhibition of methylation at two internal N6-methyladenosine sites caused by GAC to GAU mutations. J. Biol. Chem. 262, 3422–3427 (1987).
Schibler, U., Kelley, D. & Perry, R. Comparison of methylated sequences in messenger RNA and heterogeneous nuclear RNA from mouse L cells. J. Mol. Biol. 115, 695–714 (1977).
Wei, C. & Moss, B. Nucleotide sequences at the N6-methyladenosine sites of HeLa cell messenger ribonucleic acid. Biochemistry 16, 1672–1676 (1977).
Chen, K. et al. High-resolution N(6)-methyladenosine (m(6)A) map using photo-crosslinking-assisted m(6)A sequencing. Angew. Chem. Int. Ed. Engl. 54, 1587–1590 (2015).
Linder, B. et al. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat. Methods 12, 767–772 (2015).
Ke, S. et al. A majority of m6A residues are in the last exons, allowing the potential for 3' UTR regulation. Genes Dev. 29, 2037–2053 (2015).
Garcia-Campos, M. et al. Deciphering the “m6A code” via antibody-independent quantitative profiling. Cell 178, 731–747.e16 (2019).
Koh, C. W. Q., Goh, Y. T. & Goh, W. S. S. Atlas of quantitative single-base-resolution N(6)-methyl-adenine methylomes. Nat. Commun. 10, 5636 (2019).
Shu, X. et al. A metabolic labeling method detects m(6)A transcriptome-wide at single base resolution. Nat. Chem. Biol. 16, 887–895 (2020).
Meyer, K. DART-seq: an antibody-free method for global m6A detection. Nat. Methods 16, 1275–1280 (2019).
Zhang, Z. et al. Single-base mapping of m6A by an antibody-independent method. Sci. Adv. 5, eaax0250 (2019).
Chen, K. et al. WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res. 47, e41 (2019).
Wang, J. & Wang, L. Deep analysis of RNA N6-adenosine methylation (m6A) patterns in human cells. NAR Genom. Bioinform. https://doi.org/10.1093/nargab/lqaa007 (2020).
Zhang, Y. & Hamada, M. DeepM6ASeq: prediction and characterization of m6A-containing sequences using deep learning. BMC Bioinforma. 19, 524 (2018).
Zhou, Y., Zeng, P., Li, Y., Zhang, Z. & Cui, Q. SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res. 44, e91 (2016).
Chen, W., Tang, H. & Lin, H. MethyRNA: a web server for identification of N6-methyladenosine sites. J. Biomol. Struct. Dyn. 35, 683–687 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, Las Vegas, 2016).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
Lal, A. et al. Deep learning-based enhancement of epigenomics data with AtacWorks. Nat. Commun. 12, 1507 (2021).
Körtel, N. et al. Deep and accurate detection of m6A RNA modifications using miCLIP2 and m6Aboost machine learning. Nucleic Acids Res. 49, e92 (2021).
Khan, A. A. et al. Transfection of small RNAs globally perturbs gene regulation by endogenous microRNAs. Nat. Biotechnol. 27, 549–555 (2009).
Ke, S. et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374 (2011).
Zhang, Z. et al. Genetic analyses support the contribution of mRNA N(6)-methyladenosine (m(6)A) modification to human disease heritability. Nat. Genet. 52, 939–949 (2020).
Roundtree, I. A., Evans, M. E., Pan, T. & He, C. Dynamic RNA modifications in gene expression regulation. Cell 169, 1187–1200 (2017).
Scheper, G. C. et al. Mitochondrial aspartyl-tRNA synthetase deficiency causes leukoencephalopathy with brain stem and spinal cord involvement and lactate elevation. Nat. Genet. 39, 534–539 (2007).
Luo, X. et al. RMVar: an updated database of functional variants involved in RNA modifications. Nucleic Acids Res. 49, D1405–D1412 (2021).
Chen, K. et al. RMDisease: a database of genetic variants that affect RNA modifications, with implications for epitranscriptome pathogenesis. Nucleic Acids Res. 49, D1396–D1404 (2021).
Fu, X. D. & Ares, M. Jr. Context-dependent control of alternative splicing by RNA-binding proteins. Nat. Rev. Genet. 15, 689–701 (2014).
Tian, B. & Graber, J. H. Signals for pre-mRNA cleavage and polyadenylation. Wiley Interdiscip. Rev. RNA 3, 385–396 (2012).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).

Acknowledgements

We thank Liangjiang Wang for sharing the code of TDm6A and Jia Meng for sharing the code of WHISTLE. We thank Dennis Weiss, Susan E. Liao, and members of Ke Laboratory for comments, suggestions, and thoughtful discussions. Ke Laboratory and this research are funded by NIH/NIGMS Maximizing Investigators’ Research Award (MIRA) R35 Award (R35 GM133711 to S.K.), American Cancer Society Pilot Award (ACS-2019-Pilot-Ke/IRG-16-191-33/IRG-21-136-36-IRG to S.K.) and the Jackson Laboratory Cancer Center New Investigator award from the NIH/NCI Cancer Center Support Grant (2 P30 CA034196-34 to S.K.).

Author information

Authors and Affiliations

The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Zhiyuan Luo & Shengdong Ke
Graduate Program in Biophysical Sciences, The University of Chicago, Chicago, IL, 60637, USA
Jiacheng Zhang
Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL, 60637, USA
Jingyi Fei
Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL, 60637, USA
Jingyi Fei

Authors

Zhiyuan Luo
Jiacheng Zhang
Jingyi Fei
Shengdong Ke

Contributions

S.K. and Z.L. conceived and designed the study and wrote the manuscript. Z.L. conducted the experiments and performed the data analysis, with additional contributions from J.Z. and J.F. to the design and testing of the experimental validation. S.K. supervised the research.

Corresponding author

Correspondence to Shengdong Ke.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Luo, Z., Zhang, J., Fei, J. et al. Deep learning modeling m⁶A deposition reveals the importance of downstream cis-element sequences. Nat Commun 13, 2720 (2022). https://doi.org/10.1038/s41467-022-30209-7

Received: 21 June 2021
Accepted: 06 April 2022
Published: 17 May 2022
DOI: https://doi.org/10.1038/s41467-022-30209-7
