MiREDiBase, a manually curated database of validated and putative editing events in microRNAs

Marceca, Gioacchino P.; Distefano, Rosario; Tomasello, Luisa; Lagana, Alessandro; Russo, Francesco; Calore, Federica; Romano, Giulia; Bagnoli, Marina; Gasparini, Pierluigi; Ferro, Alfredo; Acunzo, Mario; Ma, Qin; Croce, Carlo M.; Nigita, Giovanni

doi:10.1038/s41597-021-00979-8

Download PDF

Article
Open access
Published: 04 August 2021

MiREDiBase, a manually curated database of validated and putative editing events in microRNAs

Gioacchino P. Marceca¹^na1,
Rosario Distefano²^na1,
Luisa Tomasello²,
Alessandro Lagana³,
Francesco Russo⁴,
Federica Calore²,
Giulia Romano⁵,
Marina Bagnoli ORCID: orcid.org/0000-0001-7655-4630⁶,
Pierluigi Gasparini^2,7,8,
Alfredo Ferro¹,
Mario Acunzo ORCID: orcid.org/0000-0002-3942-9061⁵,
Qin Ma ORCID: orcid.org/0000-0002-3264-8392⁹,
Carlo M. Croce²^na1 &
…
Giovanni Nigita ORCID: orcid.org/0000-0001-8880-4334²^na1

Scientific Data volume 8, Article number: 199 (2021) Cite this article

5514 Accesses
14 Citations
5 Altmetric
Metrics details

Subjects

Abstract

MicroRNAs (miRNAs) are regulatory small non-coding RNAs that function as translational repressors. MiRNAs are involved in most cellular processes, and their expression and function are presided by several factors. Amongst, miRNA editing is an epitranscriptional modification that alters the original nucleotide sequence of selected miRNAs, possibly influencing their biogenesis and target-binding ability. A-to-I and C-to-U RNA editing are recognized as the canonical types, with the A-to-I type being the predominant one. Albeit some bioinformatics resources have been implemented to collect RNA editing data, it still lacks a comprehensive resource explicitly dedicated to miRNA editing. Here, we present MiREDiBase, a manually curated catalog of editing events in miRNAs. The current version includes 3,059 unique validated and putative editing sites from 626 pre-miRNAs in humans and three primates. Editing events in mature human miRNAs are supplied with miRNA-target predictions and enrichment analysis, while minimum free energy structures are inferred for edited pre-miRNAs. MiREDiBase represents a valuable tool for cell biology and biomedical research and will be continuously updated and expanded at https://ncrnaome.osumc.edu/miredibase.

microRNAs in action: biogenesis, function and regulation

Article 28 June 2023

Landscape of adenosine-to-inosine RNA recoding across human tissues

Article Open access 04 March 2022

The RNA Atlas expands the catalog of human non-coding RNAs

Article 17 June 2021

Introduction

MiRNAs are the most studied class of small non-coding RNAs involved in gene expression regulation. According to the canonical miRNA biogenesis pathway, miRNAs are initially transcribed into primary transcripts (pri-miRNAs) that present hairpin structures and undergo a double RNase III-mediated processing¹. The first step occurs within the nucleus, where the Drosha-DGCR8 enzymatic complex cleaves pri-miRNAs into ~70 nucleotide long transcripts. These typically maintain the stem-loop conformation and represent the precursors of miRNAs (pre-miRNAs). Pre-miRNAs are then exported to the cytoplasm, where they are ultimately processed by Dicer into ~22 nucleotides long single-stranded RNAs (mature miRNAs)¹. These can be found as −5p or −3p forms, depending on which miRNA’s arm they derive from¹. To date, it is estimated that more than 1,900 pre-miRNAs are expressed in humans, giving rise to over 2,600 different mature miRNAs².

MiRNAs are important modulators of gene expression^3,4. The rule underlying their inhibitory activity over translation consists of a thermodynamically stable base pairing between a specific miRNA region, termed “seed region,” and a complementary nucleotide sequence of an mRNA, termed “miRNA responsive element” (MRE), causing the enzymatic degradation of the targeted transcript^3,4. Conventionally, the seed region consists of nucleotides 2–8 located at the 5′ end of miRNAs and is usually assumed to interact with MREs included within the 3′ untranslated region (3′UTR) of target mRNAs^3,4. MiRNAs take part in a vast range of physiological processes, including cell cycle control⁵, angiogenesis⁶, brain development⁷, behavioral changes, and cognitive processes⁸. Conversely, dysregulations of their expression or mutations in miRNA seed regions/MREs often lead to several pathologies, including tumors^5,9.

RNA editing consists of the co- or post-transcriptional enzymatic modification of a primary RNA sequence through single-nucleotide substitutions, insertions, or deletions¹⁰. Recent transcriptome-wide analyses have revealed the pervasive presence of RNA editing in the human transcriptome. Currently, the adenosine-to-inosine (A-to-I) and cytosine-to-uracil (C-to-U) RNA editing are considered the canonical editing types¹¹, with the A-to-I type being the most prevalent one^12,13,14,15.

A-to-I RNA editing is catalyzed by enzymes of the Adenosine Deaminase Acting on RNA (ADAR) family, specifically ADAR1 (two isoforms) and ADAR2 (one predominant isoform)¹⁶. A-to-I RNA editing frequently occurs in non-coding transcripts, including pri-miRNA transcripts, and shows tissue-dependent patterns^12,15,17. C-to-U RNA editing is catalyzed by enzymes of the Apolipoprotein B mRNA Editing Enzyme Catalytic Polypeptide (APOBEC) family, APOBEC1 and APOBEC3, at least in the context of mRNAs¹⁸. However, to date, no proof has been reported concerning the role of APOBECs in miRNA editing, and only a few studies have discussed this editing type in miRNAs^19,20.

Editing of pri-miRNA exerts significant effects on miRNA biogenesis and function, with profound implications in pathophysiological processes, such as the progression of neurodegenerative diseases and cancers^16,21. For instance, the editing of pri-miRNAs could induce a local structural change that prevents Drosha from recognizing the hairpin conformation, averting its cleavage, and allocating the edited pri-miRNAs to degradation¹⁶. Differently, editing events falling within the mature miRNA region not inducing the pre-miRNA suppression generate miRNAs diversified in their primary sequence, subsequently causing a change in miRNAs’ target repertoire (miRNA re-targeting)¹⁶.

Given the extensive number of high-confidence RNA editing sites retrieved so far and their relevance in the biomedical field, several efforts have been made to develop online resources capable of summarizing, contextualizing, and interpreting such data. These include databases like DARNED²², RADAR²³, and REDIportal²⁴, and more complex resources such as TCEA²⁵. However, no dedicated online resources have been explicitly implemented for the study of miRNA editing until now. Here, we present MiREDiBase, the first comprehensive and integrative catalog of validated and putative miRNA editing events. MiREDiBase is manually curated and provides users with valuable information to study edited miRNAs as potential disease biomarkers.

Results

Data collection

The MiREDiBase data processing workflow is depicted in Fig. 1. We first explored the PubMed literature by searching for specific keywords, such as “microRNA editing” and “miRNA editing,” narrowing the temporal range between 2000 and 2019. Retrieved articles were then manually filtered, discarding those not containing information on miRNA editing. Editing events detected or validated by targeted methods were included in the database and considered as authentic modifications. Among editing events detected through wide-transcriptome methods, we retained those established as “reliable” or “high-confidence” by the authors, classifying them as putative modifications. Statistical significance was taken into consideration when possible, eventually maintaining only significant editing events. We did not consider enzyme perturbation experiments as validation methods. For putative edited pre-miRNA sequences with no official miRNA name, e.g., “Antisense-hsa-mir-451” in Blow et al.²⁶, we employed the BLASTN tool to generate alignments between the putative pre-miRNA sequence and miRBase’s pre-miRNA sequences (v22)^2,27. Only perfect matches were retained and provided with their respective official name, as indicated by miRBase. In case editing positions were presented in the form of coordinates of previous genomic assemblies (i.e., hg19/GRCh37), these were converted to the hg38/GRCh38 assembly using the University of California Santa Cruz (UCSC) liftOver tool²⁸. Editing sites associated with miRBase’s dead-entries were discarded.

**Fig. 1: MiREDiBase’s data pre-processing workflow.**

In the second step, we expanded our search by employing the three most prominent online resources for A-to-I events available at present: DARNED²², RADAR²³, and REDIportal²⁴. Resources were manually screened, removing editing sites associated with dead entries and opposite strands. Editing sites falling into misassigned miRNAs in the hg19 genomic assembly (i.e., miRNAs of the hsa-mir-548 family and hsa-mir-3134 present in DARNED) were excluded from the database. The retained data were then integrated into the initial dataset.

Database content and statistics

Considering the recent knowledge about genomic differences and similarities among primates, we retained data from Homo sapiens and three primate species (Pan troglodytes, Gorilla gorilla, and Macaca mulatta). In particular, the current version of MiREDiBase includes 2,989 validated and putative unique A-to-I (2,885) and C-to-U (104) editing events occurring in 571 human miRNA transcripts (Fig. 2a,b, see Data Availability section) and 70 unique A-to-I (46) and C-to-U (24) editing events taking place in 55 primate miRNA transcripts (Supplementary Figs. 1, 2a,b, see Data Availability section). Overall, 909 (29.7%) editing events occur outside of the pre-miRNA sequences, 971 (31.7%) within pre-miRNA sequences, outside of the mature sequence, and 1,179 (38.6%) within mature miRNA sequences (Fig. 3a, Supplementary Fig. 3a, see Data Availability section). These data were manually extracted from 51 original papers (Supplementary Table 1), which refer to 256 biological sources (Supplementary Tables 2–5).

**Fig. 2: Descriptive statistics about human data in the current version of MiREDiBase.**

**Fig. 3: Editing events distribution in human microRNAs.**

Human editing sites in MiREDiBase are distributed across several genomic positions throughout the human genome, covering most chromosomes (Fig. 2b). However, of the 2,989 unique editing sites, only 257 (8.6%) have been validated by low-throughput methods or ADAR expression perturbation experiments. The majority of such events fall into clustered miRNAs located in chromosomes 14 (9.5% A-to-I; 7.7% C-to-U), chrX (9.4% A-to-I; 7.7% C-to-U), chr1 (7.7% A-to-I; 6.7% C-to-U), and chr19 (6.7% A-to-I; 11.5% C-to-U), respectively. Such a phenomenon very likely depends on local structural elements and motifs in these primary transcripts that function as editing inducers^29,30 and would deserve more in-depth investigations. For the vast majority, the functionality of miRNA editing events has currently remained undetermined. So far, only 24 editing sites (0.8%) were functionally characterized by appropriate techniques (Fig. 3c). Among these, twelve were demonstrated to impair miRNA biogenesis; seven cause functional re-targeting; three cause impaired biogenesis and functional re-targeting; two cause enhancement of biogenesis.

Concerning primates, the majority of data refer to macaque (Macaca mulatta), for which our database reports 40 A-to-I and 24 C-to-U editing sites (Supplementary Figs. 1, 2, see Data Availability section). Here, 26 (65%) A-to-I editing sites are conserved between human and macaque, whereas only 8 (33%) C-to-U sites are conserved between these two species. This figure might suggest that A-to-I editing of miRNA transcripts is more conserved than the C-to-U type; however, it might also be due to the current low number of C-to-U instances reported for both human and primates. Only three editing sites are reported for both chimpanzee (Pan troglodytes) and gorilla (Gorilla gorilla), occurring in one pre-miRNA transcript for each species (see Data Availability section). None of the editing sites from primates have been validated yet.

When looking at editing sites falling within mature miRNA sequences, data from MiREDiBase let emerge two distinct patterns for A-to-I and C-to-U editing in humans (Fig. 3b). Examining the A-to-I type, most edited sites (325 out of 1018, 31.9%) are located at positions 2–5 of the seed region. Other hotspots for A-to-I editing seem to be represented by positions 1, 6–9, and 12, which account for 325 more edited sites. In the case of C-to-U miRNA editing, most modification sites are located outside of the seed region. In particular, 48 out of 104 edited sites (46.2%) are located at positions 10–12 and 15, whereas only 17 edited sites (16.3%) fall within the seed region. A very similar pattern can be observed in macaque (Supplementary Fig. 3b).

To help users interpret and contextualize data, miRNA editing events occurring within pre-miRNA or mature miRNA sequences were supplied with in silico predictions. We computed 2,150 MFE pre-miRNA predictive structures using editing sites internal to pre-miRNA sequences and 1,018 miRNA-targeting predictions and enrichment analyses. In both cases, users have the opportunity to compare the edited miRNAs with their relative wild-type versions.

To infer whether local motifs influence A-to-I editing, we investigated the nucleotide composition around the editing sites across the different regions of the miRNA transcripts. Indeed, previous works have already demonstrated a neighbor preference for ADAR-mediated editing. Specifically, comparative studies showed that ADARs have a higher affinity for the 5′ nearest neighbor consisting of U ~ A > C ~ G^31,32. A neighbor preference for the 3′ nearest neighbor was shown only for ADAR2, consisting of U ~ G > C ~ A³². More generally, the UAG triplet has been found as the most favored among others, even in miRNAs^33,34. These results were recently confirmed by a structural study, which demonstrated that ADAR nearest neighbor preference in humans is mainly determined by nucleotide-amino acid interactions rather than local duplex stability³⁵. Analysis of the nearest neighbors of edited adenosines from MiREDiBase revealed a similar pattern, although showing interesting clues when subdividing the miRNA transcripts into distinct regions (Supplementary Fig. 4). In humans, edited adenosines falling into the mature sequences and regions of the pre-miRNA (excluding the loop region) showed a 5′ neighbor preference consisting of A ~ U > C > G and a 3′ neighbor preference consisting of G > A ~ C ~ U (Supplementary Fig. 4a). Likewise, the pri-miRNA regions (out of the stem-loop sequence) showed a 3′ neighbor preference consisting of G > A ~ C ~ U but a distinct 5′ neighbor preference, consisting of A ~ C > G > U. The loop region showed a sharply different neighbor preference, consisting of G ~ A > C ~ U at 5′ and C ~ G > A ~ U at 3′. Such differences might indicate that local RNA structures also affect the editing preference during ADAR activity. Concerning the neighbor preference in editing of macaque miRNA transcripts, our analysis evidenced a 5′ neighbor preference consisting of U > C > A > G and a 3′ neighbor preference consisting of G > U > C ~ A, with UAG being the most representative motif (Supplementary Fig. 4b). No analysis was carried out for the other miRNA regions due to the scarcity (pre-miRNA regions) or lack of data.

Finally, biological sources in MiREDiBase can be grouped into three main categories (Table 1). The “normal condition” group (human and primates) accounts for 92 different healthy tissues/organs analyzed for miRNA editing. Among these, 85 were obtained from adult individuals and seven from pre-natal developmental stages (Supplementary Tables 2 and 5). The “adverse condition” group (human only) is broadly represented by tumors, with 60 distinct oncological conditions and 62 different sample subtypes. The neurological disorders include four pathological conditions and six sample subtypes. The inflammatory condition, cardiovascular disease, and genetic disorder are currently the less representative classes, with two pathological conditions and three sample subtypes for the former and one pathological condition and sample type for the latter two, respectively (Supplementary Table 3). The “cell line” group (human only) accounts for 78 commercial cell lines and ten primary human cells cultured in vitro. Of the 78 commercial cell lines, 71 are malignant, while the remaining represent non-malignant conditions. Among the ten primary human cells, only one refers to a malignant condition, while nine represent normal conditions (Supplementary Table 4).

Table 1 Number of normal and adverse conditions and cell lines present in MiREDiBase.

Full size table

User interface and data accessibility

MiREDiBase provides users an intuitive and straightforward web interface to access data, requiring no bioinformatics skills to perform accurate searches across the database. Users can explore MiREDiBase by interacting with the Search (Fig. 4) or the Compare module. Each module starts with a modal box by which users can filter miRNA editing sites.

**Fig. 4: The MiREDiBase Search module.**

The Search module provides four filtering fields, including organism (e.g., Human), modification type (e.g., the A-to-I editing), genomic region (e.g., chromosome, pre-miRNA, or miRNA), and, optionally, biological source (e.g., BRCA – breast carcinoma). The “Search module” generates a table listing a set of editing sites supplied with essential information based on the selected filtering options. Reported information covers the organism, modification type, chromosome, strand, genomic position, pre-miRNA and mature miRNA relative positions, employed detection strategies, and whether the site is putative or validated. By clicking on the dedicated left-sided buttons, users can dig down to find supplementary information about each editing site. Here, the detection strategies information is expanded, categorizing the editing site as putative (i.e., only detected by high-throughput sequencing methods) or validated (i.e., authenticated by targeted methods), indicating the confidence level for each modification. Additional information covers publications, external resources, biological sources, 2D structures of edited and non-edited pre-miRNAs, miRNA-target predictions, and associated functional enrichment data (Fig. 4), which enable ready access to a putative biological interpretation. The results in each module can be easily downloaded through dedicated buttons.

The Compare module aims at exploring differentially edited sites in adverse vs. normal conditions. It provides a set of essential information supplied with the editing level for each examined condition. Like the Search module, the Compare module allows users to filter out RNA editing sites by specifying the organism, modification type, disease, and pre-miRNA.

All miRNAs reported in MiREDiBase are linked to their specific miRbase and RNAcentral³⁶ web pages. Moreover, A-to-I genomic coordinates were mapped onto the UCSC hg38/GRCh38 genome assembly and available via the UCSC website. If applicable, editing sites provide links to external RNA editing resources, such as DARNED, RADAR, and REDIportal, to improve miRNA editing research.

To encourage users to familiarize themselves with our tool, MiREDiBase offers, throughout the website, helpers reporting explanations on how to interpret results, along with statistics and complete documentation on how to use each module. Advanced users can instead exploit the RESTful API, which provides a standalone web interface to explore available methods for extracting data, with the opportunity to embed RESTful API HTTP calls within users’ code (Supplementary Fig. 5).

The MiREDiBase platform adopts a multi-containerized microservice architecture (Supplementary Fig. 5), which provides user-friendly and efficient ways to access all manually collected data (see Methods section for more details).

Discussion

At the beginning of the study on miRNA editing, Sanger sequencing represented the standard method to reliably identify editing events^34,37,38. However, this low-throughput technique only enabled the detection of a relatively restricted set of editing sites. In later years, the employment of high-throughput sequencing (HTS) technologies and the design of ad-hoc bioinformatic pipelines have dramatically improved the computational identification of RNA editing events³⁹, including those occurring in miRNAs.

Given the ever-increasing number of editing sites detected at a genome-wide scale, the need to create a comprehensive catalog of such modifications has become imperative. In light of this, Kiran and Baranov published DARNED, the first online repository providing centralized access to published data on RNA editing²². DARNED currently includes ~350,000 predicted RNA A-to-I editing sites from humans, mice (Mus musculus), flies (Drosophila melanogaster), and a few C-to-U instances. However, only a small portion of these modification events was manually annotated, and no information is provided about editing levels. DARNED’s last update dates back to 2012⁴⁰.

In 2013, Ramaswami and Li presented RADAR, a rigorously annotated A-to-I RNA editing database containing manually curated editing sites²³. Like DARNED, RADAR includes data from humans, mice, and flies and currently accounts for ~1.4 million editing sites, providing several useful information like tissue-specific editing level, conservation in other model organisms, and genomic context. RADAR does not include C-to-U editing data, and the update took place in 2014.

In 2017, Picardi and colleagues developed REDIportal, which today is the most extensive collection of RNA editing in humans, including more than 4.5 million A-to-I modification events detected across 55 body sites from thousands of RNA-seq experiments²⁴. Moreover, with its last update, REDIportal also includes ∼90,000 putative A-to-I editing events from the mouse brain transcriptome and incorporates CLAIRE, a searchable catalog of RNA editing levels across cell lines⁴¹.

Although these three mentioned online resources are undoubtedly the most authoritative repositories of RNA editing events, none of them is strictly dedicated to miRNA editing. The vast majority of the editing events reported in these databases fall into mRNAs and long non-coding RNAs (lncRNAs), with only a minority occurring in miRNAs. Indeed, a few online resources have been lastly developed that partially focus on the effects of RNA editing on miRNA functionality. For instance, the Editome-Disease Knowledgebase (EDK)⁴² is a manually curated database that aims to link experimentally validated RNA editing events in non-coding RNAs to various diseases. However, this database currently contains only 16 validated A-to-I instances in miRNAs and does not provide any information about publications, position of editing sites, or detection/validation methods. The Cancer Editome Atlas (TCEA) is a powerful, user-friendly bioinformatics resource that characterizes more than 192 million editing events at ~4.6 million editing sites from approximately 11,000 samples across 33 cancer types recovered from The Cancer Genome Atlas²⁵. However, TCEA is focused on editing events occurring in coding transcripts. From the miRNA standpoint, TCEA only allows users to predict A-to-I editing’s effects in the 3′ UTR of mRNAs in terms of miRNA-mRNA interactions. Analogous considerations apply for miR-EdiTar⁴³, a database that exploits DARNED data to predict the potential effects of A-to-I editing over miRNA targeting.

To cover the gap between the fields of RNA editing and miRNA biology, we developed MiREDiBase, the first-of-its-kind database dedicated explicitly to miRNA modifications. In the current version, MiREDiBase includes more than three thousand A-to-I and C-to-U miRNA editing events manually collected from the literature, occurring in humans and primates. MiREDiBase allows users to consult the RNA secondary structure of both the wild-type and edited pre-miRNAs and infer the possible function of edited mature miRNA, based on the predicted targetome and subsequent functional analysis.

We implemented a user-friendly interface that allows users to track each search step to improve the user experience. Moreover, MiREDiBase includes a “Compare” section, which compares adverse versus normal conditions in a study-specific manner. Finally, the MiREDiBase platform relies on cutting-edge technologies, aiming at providing reliability and continuous operability. The platform represents an orchestration of different containerized services on top of Docker. Each service fulfills a specific purpose, such as a Web Application Service (Vue.js/Quasar - a Progressive JavaScript Framework), a RESTful API Service (FastAPI - a modern, high-performance, web framework for building secure APIs), and a Database Service (MongoDB - a NoSQL document-based database). The platform is designed to provide the smoothest and user-friendly experience to users.

We are aware that the lack of data on more commonly adopted model organisms and the inclusion of C-to-U RNA editing sites represent weaknesses in our work. The choice to include primates rather than other model species in this first release was motivated by the fact that primates present the highest genomic and transcriptomic similarity compared to humans⁴⁴. Moreover, primates are recognized as excellent candidates to investigate epigenetic control of genome functions and are highly relevant for biomedical studies⁴⁴. The choice to include putative C-to-U miRNA editing events was because this editing type is considered “canonical” among mammals. Indeed, previous Sanger-sequence validation of putative C-to-U editing sites in miRNAs found no evidence for real C-to-U miRNA editing^15,45, letting hypothesize that such events were HTS artifacts. On the other hand, Negi et al. recently found and validated C-to-U editing at the fifth position of mature human miR-100, demonstrating that such an instance was functionally associated with CD4(+) T cell differentiation²⁰. Given these controversies, we believe that collecting C-to-U miRNA sites with high consensus would serve to orientate future studies on this topic.

Besides expanding the database with published data, our main future goals are (i) to include editing events from other species, primarily model organisms like Mus musculus and Drosophila melanogaster, and (ii) adding other modification types. We believe that this will help interpret the functional roles of modified miRNA transcripts within the cell system. For example, after analyzing human brain samples for RNA editing events, Paul et al. unexpectedly found that a consistent percentage of miRNA editing events are non-canonical, especially C-to-A and G-to-U¹¹. Similar data were reported by Wang and co-workers⁴⁶, raising questions on whether these editing events exert essential function in neurons and if specific enzymes can catalyze such modifications. Likewise, miRNA methylation has recently caught the scientific community’s attention, being demonstrated to affect miRNA biogenesis⁴⁷. However, the study of this phenomenon and its potential functional implications have remained widely unexplored. With continuous updating, we believe that MiREDiBase will gradually become a precious resource for researchers in the field of epitranscriptomics, leading to a better understanding of miRNA modification phenomena and their functional consequences.

Methods

Data processing

Each editing event was supplied with essential information recovered from miRBase (v22), including the relative position within pre-miRNA and mature miRNA, genomic position, and pre-miRNA region (5′- or 3′-arm, or loop region). For editing events occurring outside the pre-miRNA sequence, we adopted the notation “pri-miRNA.” Editing events were then enriched with metadata manually collected from selected publications. Overall, we extracted eight different information types: detection/validation method, experiment type, biological source, correspondent condition (adverse or normal), comparison (pathological vs. physiological condition), editing level, enzyme affinity, and functional characterization.

The “detection method” does not specify the method adopted by authors to identify miRNA editing events. Instead, it indicates which kind of methodological approach (targeted, wide-transcriptome, or both) the authors selected for editing detection. Only in two cases, the method has been specified to highlight particularly sensitive and innovative approaches, i.e., miR-mmPCR-seq^15,48 and RIP-seq⁴⁹.

The “validation method” refers to methods confirming sequencing data, especially those obtained by wide-transcription approaches, including enzyme knock-down (only ADAR in the current version), knock-out, differential expression, and modification-specific enzymatic cleavage⁵⁰.

The “experiment type” specifies whether, in a particular study, individual editing events were identified in vitro, in vivo, or ex vivo. Editing events obtained by analyzing small RNA-seq data from The Cancer Genome Atlas (TCGA)⁵¹ or Genotype-Tissue Expression (GTEx) atlas⁵² were considered as detected in vivo. Editing events obtained by analyzing sequence libraries from the Sequence Read Archive (SRA) database⁵³ were considered as detected in vitro, in vivo, or ex vivo depending on the library derivation.

The “pathological condition” specifies whether a miRNA editing event was detected in one or multiple diseases. For a given study, physiological and pathological conditions were compared to whether editing levels for an individual miRNA were simultaneously available for both conditions.

In studies with multiple editing level values per miRNA editing site, we considered only the minimum and maximum values, rounding them up by multiples of five (e.g., editing levels of 21.1% and 44% were rounded up to 20% and 45%, respectively). Whether a single value was reported for an individual miRNA, this was rounded, creating an interval of 5% (e.g., if a study reported the editing level as 13% for a specific editing site, the editing level was presented as “from 10% to 15%”).

Information concerning enzyme affinity (only ADARs in the current version) was retrieved whether authors carried out enzyme-transfection experiments causing enzyme overexpression. Finally, we annotated all the functionally characterized editing events with information regarding their specific biological function. In the event of functional re-targeting, validation methods were reported along with the set of validated lost and gained targets.

Sequence logo generation

Sequence logos present in Supplementary Fig. 4 were generated by using ggseqlogo function from ggseqlogo R package (v0.1).

Secondary structure prediction analysis

We generated the minimum free energy (MFE) structures for all those pre-miRNAs subjected to editing and their wild-type (WT) counterparts. The double-stranded RNA structures were created by employing the RNAfold tool from the ViennaRNA package⁵⁴ with default settings. Finally, we considered all editing sites occurring within the mature miRNA region to infer possible miRNA target re-direction as well as diversified biological functions.

MiRNA-Target prediction and functional enrichment analyses

The miRNA-target prediction analysis, for both edited and WT miRNA, was achieved by using our web-based containerized application isoTar⁵⁵, designed to simplify and perform miRNA consensus target prediction and functional enrichment analyses. For miRNA target predictions, we established a minimum consensus of 3. An adjusted P-value < 0.05 was considered as a threshold for the functional enrichment analysis.

Platform design and implementation

To achieve reliability and continuous delivery (short-cycle updates), we developed each lightweight, standalone, microservice on top of Docker (v19.03.12) (https://docs.docker.com). The platform itself consists of three microservices, which are orchestrated by Docker Compose (v1.26.0) (https://docs.docker.com/compose), a tool for managing multi-container-based applications. Each microservice provides a specific functionality: a web-based user interface (UI), a RESTful API for data retrieving, and a NoSQL document-based database for data storing. To offer users an engaging and responsive experience, we developed a high-performance platform which relies on cutting-edge open-source technologies, such as Quasar (v1.12.8) (https://quasar.dev) for the UI, FastAPI (v0.55.1) (a Python (v3.8.1) framework, https://fastapi.tiangolo.com) for the RESTful API, and MongoDB (v4.2.8) (https://www.mongodb.com) for the data storage. We have tested MiREDiBase on the following browsers: Firefox (80+), Google Chrome (85+), Edge (85+), Safari (13+), and Opera (70+). MiREDiBase is freely accessible to the scientific community through the link: https://ncrnaome.osumc.edu/miredibase, without requiring registration or login.

Data availability

The data set related to MiREDiBase is freely available at figshare (https://doi.org/10.6084/m9.figshare.14666466.v1)⁵⁶, and via MiREDiBase website (https://ncrnaome.osumc.edu/miredibase/download).

Code availability

The whole platform is openly available at https://github.com/ncRNAome-OSU/miredibase and at Zenodo (https://doi.org/10.5281/zenodo.4421966)⁵⁷.

References

Ha, M. & Narry Kim, V. Regulation of microRNA biogenesis. Nature Reviews Molecular Cell Biology 15, 509–524 (2014).
Article CAS PubMed Google Scholar
Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 47, D155–D162 (2019).
Article CAS PubMed Google Scholar
Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
Article CAS PubMed PubMed Central Google Scholar
Jonas, S. & Izaurralde, E. Towards a molecular understanding of microRNA-mediated gene silencing. Nat. Rev. Genet. 16, 421–433 (2015).
Article CAS PubMed Google Scholar
Liang, L.-H. & He, X.-H. Macro-management of microRNAs in cell cycle progression of tumor cells and its implications in anti-cancer therapy. Acta Pharmacol. Sin. 32, 1311–1320 (2011).
Article CAS PubMed PubMed Central Google Scholar
Salinas-Vera, Y. M. et al. AngiomiRs: MicroRNAs driving angiogenesis in cancer (Review. Int. J. Mol. Med. 43, 657–670 (2019).
CAS PubMed Google Scholar
Ziats, M. N. & Rennert, O. M. Identification of differentially expressed microRNAs across the developing human brain. Mol. Psychiatry 19, 848–852 (2014).
Article CAS PubMed Google Scholar
Wingo, T. S. et al. Brain microRNAs associated with late-life depressive symptoms are also associated with cognitive trajectory and dementia. NPJ Genom Med 5, 6 (2020).
Article CAS PubMed PubMed Central Google Scholar
Croce, C. M. Causes and consequences of microRNA dysregulation in cancer. Nat. Rev. Genet. 10, 704–714 (2009).
Article CAS PubMed PubMed Central Google Scholar
Gott, J. M. & Emeson, R. B. Functions and mechanisms of RNA editing. Annu. Rev. Genet. 34, 499–531 (2000).
Article CAS PubMed Google Scholar
Paul, D., Ansari, A. H., Lal, M. & Mukhopadhyay, A. Human Brain Shows Recurrent Non-Canonical MicroRNA Editing Events Enriched for Seed Sequence with Possible Functional Consequence. Noncoding RNA 6 (2020).
Picardi, E. et al. Corrigendum: Profiling RNA editing in human tissues: towards the inosinome Atlas. Sci. Rep. 6, 20755 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Zheng, Y. et al. Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles. Nucleic Acids Res. 44, e123 (2016).
Article PubMed PubMed Central CAS Google Scholar
Brümmer, A., Yang, Y., Chan, T. W. & Xiao, X. Structure-mediated modulation of mRNA abundance by A-to-I editing. Nat. Commun. 8, 1255 (2017).
Article ADS PubMed PubMed Central CAS Google Scholar
Li, L. et al. The landscape of miRNA editing in animals and its impact on miRNA biogenesis and targeting. Genome Res. 28, 132–143 (2018).
Article CAS PubMed PubMed Central Google Scholar
Nishikura, K. A-to-I editing of coding and non-coding RNAs by ADARs. Nat. Rev. Mol. Cell Biol. 17, 83–96 (2016).
Article CAS PubMed Google Scholar
Tan, M. H. et al. Dynamic landscape and regulation of RNA editing in mammals. Nature 550, 249–254 (2017).
Article ADS PubMed PubMed Central Google Scholar
Wedekind, J. E., Dance, G. S. C., Sowden, M. P. & Smith, H. C. Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet. 19, 207–216 (2003).
Article CAS PubMed Google Scholar
Joyce, C. E. et al. Deep sequencing of small RNAs from human skin reveals major alterations in the psoriasis miRNAome. Hum. Mol. Genet. 20, 4025–4040 (2011).
Article CAS PubMed PubMed Central Google Scholar
Negi, V. et al. Altered expression and editing of miRNA-100 regulates iTreg differentiation. Nucleic Acids Res. 43, 8057–8065 (2015).
Article CAS PubMed PubMed Central Google Scholar
Han, L. et al. The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers. Cancer Cell 28, 515–528 (2015).
Article CAS PubMed PubMed Central Google Scholar
Kiran, A. & Baranov, P. V. DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics 26, 1772–1776 (2010).
Article CAS PubMed Google Scholar
Ramaswami, G. & Li, J. B. RADAR: a rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res. 42, D109–13 (2014).
Article CAS PubMed Google Scholar
Picardi, E., D’Erchia, A. M., Lo Giudice, C. & Pesole, G. REDIportal: a comprehensive database of A-to-I RNA editing events in humans. Nucleic Acids Res. 45, D750–D757 (2017).
Article CAS PubMed Google Scholar
Lin, C.-H. & Chen, S. C.-C. The Cancer Editome Atlas: A Resource for Exploratory Analysis of the Adenosine-to-Inosine RNA Editome in Cancer. Cancer Research 79, 3001–3006 (2019).
Article CAS PubMed Google Scholar
Blow, M. J. et al. RNA editing of human microRNAs. Genome Biol. 7, R27 (2006).
Article PubMed PubMed Central CAS Google Scholar
Griffiths-Jones, S. miRBase: the microRNA sequence database. Methods Mol. Biol. 342, 129–138 (2006).
CAS PubMed Google Scholar
Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res. 47, D853–D858 (2019).
Article CAS PubMed Google Scholar
Nigita, G., Alaimo, S., Ferro, A., Giugno, R. & Pulvirenti, A. Knowledge in the Investigation of A-to-I RNA Editing Signals. Front Bioeng Biotechnol 3, 18 (2015).
PubMed PubMed Central Google Scholar
Daniel, C., Widmark, A., Rigardt, D. & Öhman, M. Editing inducer elements increases A-to-I editing efficiency in the mammalian transcriptome. Genome Biol. 18, 195 (2017).
Article PubMed PubMed Central CAS Google Scholar
Polson, A. G. & Bass, B. L. Preferential selection of adenosines for modification by double-stranded RNA adenosine deaminase. EMBO J. 13, 5701–5711 (1994).
Article CAS PubMed PubMed Central Google Scholar
Lehmann, K. A. & Bass, B. L. Double-stranded RNA adenosine deaminases ADAR1 and ADAR2 have overlapping specificities. Biochemistry 39, 12875–12884 (2000).
Article CAS PubMed Google Scholar
Riedmann, E. M., Schopoff, S., Hartner, J. C. & Jantsch, M. F. Specificity of ADAR-mediated RNA editing in newly identified targets. RNA 14, 1110–1118 (2008).
Article CAS PubMed PubMed Central Google Scholar
Kawahara, Y. et al. Frequency and fate of microRNA editing in human brain. Nucleic Acids Res. 36, 5270–5280 (2008).
Article CAS PubMed PubMed Central Google Scholar
Matthews, M. M. et al. Structures of human ADAR2 bound to dsRNA reveal base-flipping mechanism and basis for site selectivity. Nat. Struct. Mol. Biol. 23, 426–433 (2016).
Article CAS PubMed PubMed Central Google Scholar
The RNAcentral Consortium. et al. RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res. 45, D128–D134 (2017).
Article CAS Google Scholar
Luciano, D. J., Mirsky, H., Vendetti, N. J. & Maas, S. RNA editing of a miRNA precursor. RNA 10, 1174–1177 (2004).
Article CAS PubMed PubMed Central Google Scholar
Yang, W. et al. Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat. Struct. Mol. Biol. 13, 13–21 (2006).
Article ADS CAS PubMed Google Scholar
Diroma, M. A., Ciaccia, L., Pesole, G. & Picardi, E. Elucidating the editome: bioinformatics approaches for RNA editing detection. Brief. Bioinform. 20, 436–447 (2019).
Article CAS PubMed Google Scholar
Kiran, A. M., O’Mahony, J. J., Sanjeev, K. & Baranov, P. V. Darned in 2013: inclusion of model organisms and linking with Wikipedia. Nucleic Acids Res. 41, D258–61 (2013).
Article CAS PubMed Google Scholar
Schaffer, A. A. et al. The cell line A-to-I RNA editing catalogue. Nucleic Acids Res. 48, 5849–5858 (2020).
Article CAS PubMed PubMed Central Google Scholar
Niu, G. et al. Editome Disease Knowledgebase (EDK): a curated knowledgebase of editome-disease associations in human. Nucleic Acids Res. 47, D78–D83 (2019).
Article CAS PubMed Google Scholar
Laganà, A. et al. miR-EdiTar: a database of predicted A-to-I edited miRNA target sites. Bioinformatics 28, 3166–3168 (2012).
Article PubMed PubMed Central CAS Google Scholar
Rogers, J. & Gibbs, R. A. Comparative primate genomics: emerging patterns of genome content and dynamics. Nat. Rev. Genet. 15, 347–359 (2014).
Article CAS PubMed PubMed Central Google Scholar
Blanc, V. et al. Genome-wide identification and functional analysis of Apobec-1-mediated C-to-U RNA editing in mouse small intestine and liver. Genome Biol. 15, R79 (2014).
Article PubMed PubMed Central CAS Google Scholar
Wang, Y. et al. Systematic characterization of A-to-I RNA editing hotspots in microRNAs across human cancers. Genome Res. 27, 1112–1125 (2017).
Article CAS PubMed PubMed Central Google Scholar
Alarcón, C. R., Lee, H., Goodarzi, H., Halberg, N. & Tavazoie, S. F. N6-methyladenosine marks primary microRNAs for processing. Nature 519, 482–485 (2015).
Article ADS PubMed PubMed Central CAS Google Scholar
Zhang, R. et al. Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing. Nat. Methods 11, 51–54 (2014).
Article CAS PubMed Google Scholar
Ishiguro, S. et al. Base-pairing probability in the microRNA stem region affects the binding and editing specificity of human A-to-I editing enzymes ADAR1-p110 and ADAR2. RNA Biol. 15, 976–989 (2018).
Article PubMed PubMed Central Google Scholar
Marceca, G. P. et al. Detecting and Characterizing A-To-I microRNA Editing in Cancer. Cancers 13 (2021).
Tomczak, K., Czerwińska, P. & Wiznerowicz, M. Review The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. (Pozn) 19, 68–77 (2015).
Google Scholar
GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Article PubMed Central CAS Google Scholar
Kodama, Y., Shumway, M. & Leinonen, R. & International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 40, D54–D56 (2012).
Article CAS PubMed Google Scholar
Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Article PubMed PubMed Central Google Scholar
Distefano, R. et al. isoTar: Consensus Target Prediction with Enrichment Analysis for MicroRNAs Harboring Editing Sites and Other Variations. Methods Mol. Biol. 1970, 211–235 (2019).
Article CAS PubMed PubMed Central Google Scholar
Marceca, G. P. et al. MiREDiBase Dataset - Version 1. figshare https://doi.org/10.6084/m9.figshare.14666466.v1 (2021).
Marceca, G. P. et al. MiREDiBase: a manually curated database of validated and putative editing events in microRNAs. Zenodo https://doi.org/10.5281/zenodo.4767702 (2020).

Download references

Acknowledgements

We thank the Cancer IT Operation Group of The Ohio State University (OSU) and Thomas Moore for his valuable technical assistance. At the same time, we want to special thank Paolo Fadda from The Genomics Shared Resource (GSR) of OSU, together with Jing Jiang, Cankun Wang and Yujia Xiang from Dr. Ma’s Laboratory (Department of Biomedical Informatics at OSU) for their helpful suggestions during the beta testing phase of MiREDiBase. This work was supported by National Cancer Institute (National Institutes of Health) grant R35CA197706 to C.M.C. F.R. is supported by the BRIDGE - Translational Excellence Programme (bridge.ku.dk), Faculty of Health and Medical Sciences, University of Copenhagen, funded by the Novo Nordisk Foundation with grant agreement no. NNF18SA0034956 and grant agreement no. NNF14CC0001. M.A. was supported by CTSA award No. UL1TR002649 and NCATS 5KL2TR002648.

Author information

These authors contributed equally: Gioacchino P. Marceca, Rosario Distefano, Carlo M. Croce, Giovanni Nigita.

Authors and Affiliations

Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
Gioacchino P. Marceca & Alfredo Ferro
Department of Cancer Biology and Genetics and Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
Rosario Distefano, Luisa Tomasello, Federica Calore, Pierluigi Gasparini, Carlo M. Croce & Giovanni Nigita
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Alessandro Lagana
Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
Francesco Russo
Division of Pulmonary Diseases and Critical Care Medicine, Virginia Commonwealth University, Richmond, VA, USA
Giulia Romano & Mario Acunzo
Fondazione IRCCS Istituto Nazionale dei Tumori (INT), Milan, Italy
Marina Bagnoli
School of Biomedical Sciences and Pharmacy, College of Health, Medicine and Wellbeing, University of Newcastle, Callaghan, NSW, Australia
Pierluigi Gasparini
Hunter Medical Research Institute, New Lambton Heights, NSW, Australia
Pierluigi Gasparini
Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
Qin Ma

Authors

Gioacchino P. Marceca
View author publications
You can also search for this author in PubMed Google Scholar
Rosario Distefano
View author publications
You can also search for this author in PubMed Google Scholar
Luisa Tomasello
View author publications
You can also search for this author in PubMed Google Scholar
Alessandro Lagana
View author publications
You can also search for this author in PubMed Google Scholar
Francesco Russo
View author publications
You can also search for this author in PubMed Google Scholar
Federica Calore
View author publications
You can also search for this author in PubMed Google Scholar
Giulia Romano
View author publications
You can also search for this author in PubMed Google Scholar
Marina Bagnoli
View author publications
You can also search for this author in PubMed Google Scholar
Pierluigi Gasparini
View author publications
You can also search for this author in PubMed Google Scholar
Alfredo Ferro
View author publications
You can also search for this author in PubMed Google Scholar
Mario Acunzo
View author publications
You can also search for this author in PubMed Google Scholar
Qin Ma
View author publications
You can also search for this author in PubMed Google Scholar
Carlo M. Croce
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni Nigita
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.M.C. and G.N. conceived, designed, and supervised the project; G.P.M., R.D., L.T. and G.N. designed the database; G.P.M. collected the data; R.D. designed and developed the platform; L.T., A.L., F.R., F.C., G.R., M.B., P.G., A.F., M.A., and Q.M. contributed to the beta-testing phase of the database. G.P.M., R.D. and G.N. wrote the first draft of the manuscript. L.T., A.L., F.R., F.C., G.R., M.B., P.G., A.F., M.A., Q.M. and C.M.C. participated in manuscript revision. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Carlo M. Croce or Giovanni Nigita.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Marceca, G.P., Distefano, R., Tomasello, L. et al. MiREDiBase, a manually curated database of validated and putative editing events in microRNAs. Sci Data 8, 199 (2021). https://doi.org/10.1038/s41597-021-00979-8

Download citation

Received: 11 January 2021
Accepted: 10 June 2021
Published: 04 August 2021
DOI: https://doi.org/10.1038/s41597-021-00979-8

This article is cited by

The role of ADAR1 through and beyond its editing activity in cancer
- Yue Jiao
- Yuqin Xu
- Jiao Liu
Cell Communication and Signaling (2024)
Non-coding RNAs as therapeutic targets and biomarkers in ischaemic heart disease
- Andrea Caporali
- Maryam Anwar
- Costanza Emanueli
Nature Reviews Cardiology (2024)
miRNAs in pancreatic cancer progression and metastasis
- Ellie T. Y. Mok
- Jessica L. Chitty
- Thomas R. Cox
Clinical & Experimental Metastasis (2024)
RNA Editing Therapeutics: Advances, Challenges and Perspectives on Combating Heart Disease
- Maria Birgaoanu
- Marco Sachse
- Aikaterini Gatsiou
Cardiovascular Drugs and Therapy (2023)