Introduction

RNA editing refers to a variety of posttranscriptional alterations of RNA molecules, including chemical modifications as well as insertions and deletions of nucleotides, but excluding RNA processing events such as splicing, capping, and polyadenylation1,2. Profiling each type of RNA editing across the transcriptome and understanding its biochemical and physiological functions are major tasks of molecular and genome biology, and both have seen rapid progress in the last decade3,4,5,6. Among over 100 different types of RNA editing, adenosine (A)-to-inosine (I) editing of RNAs transcribed from animal nuclear genomes is arguably the best studied7,8,9. The A-to-I conversion is catalyzed by a family of adenosine deaminases acting on RNA (ADARs) and the resultant I is recognized as guanine (G) in translation. For simplicity, we refer to A-to-I editing as A-to-G editing hereafter. If the editing takes place in protein-coding regions, it could be either nonsynonymous (also known as recoding) or synonymous, depending on whether the encoded amino acid is altered or not. A-to-G editing has been reported in multiple animal phyla10,11, including many vertebrates10,12,13,14,15,16,17, as well as fruit flies18,19,20,21,22,23, cephalopods24,25,26,27, nematodes28,29, and cnidarians30. Although an editing mechanism could emerge by chance and become fixed by genetic drift31, studies of functional consequences of a handful of A-to-G recoding events led to the initial belief that recoding offers an “extreme advantage”32, because disrupting recoding could be lethal33. This view has been challenged in the last few years by transcriptome-wide analysis of RNA editing. Specifically, there is a long tradition in molecular evolutionary genetics of comparing the rate of synonymous nucleotide substitution (dS) with that of nonsynonymous substitution (dN) in protein-coding DNA sequence evolution. 
As synonymous changes are presumably neutral, while nonsynonymous changes may or may not be neutral, an observation of dN > dS indicates overall positive selection promoting beneficial nonsynonymous substitutions, whereas dN < dS indicates overall purifying selection hindering deleterious nonsynonymous substitutions. Although RNA editing is a molecular phenotype, similar comparisons between synonymous and nonsynonymous editing can be made34. For instance, in humans, the fraction of sites subject to nonsynonymous editing is lower than that subject to synonymous editing and the editing level (i.e., the proportion of RNA molecules edited at a site) is also lower for nonsynonymous than synonymous editing34. These patterns suggest that nonsynonymous editing is generally deleterious and is selectively removed and/or suppressed when compared with synonymous editing, which is presumably inconsequential to protein function. Therefore, most A-to-G coding RNA-editing events appear to be nonadaptive and are probably attributable to cellular errors resulting from ADARs’ limited specificity34. This conclusion is compatible with the fact that only a handful of editing events have known functions33, and that only 1.8% of ~2000 human coding RNA-editing events are shared with mouse35,36.

The trend, however, is drastically different in coleoid cephalopods, which include octopuses, squids, and cuttlefishes. Tens of thousands of coding A-to-G editing events, including a considerable proportion of recoding, have been identified in the neural tissues of coleoids25,27. In particular, the frequency of nonsynonymous sites subject to high levels of editing exceeds that of synonymous sites, leading to the inference that nonsynonymous editing has been promoted by positive selection and is generally advantageous in coleoids25,27. We will refer to this hypothesis as the adaptive hypothesis. Furthermore, because the high editing activity appears to be limited to their neural tissues, it was speculated that the extraordinary abundance of RNA editing in coleoids is related to their complex nervous system and behavior24,25,27,37. Nonetheless, with the exception of recoding of an octopus potassium channel that is associated with cold adaptation26, no benefit of the widespread editing is known in coleoids. Here we propose and provide evidence for an alternative, nonadaptive explanation of the preponderance of highly edited nonsynonymous sites in coleoids.

Results

A nonadaptive hypothesis and its predictions

Let us consider a genomic position in a coding region that is currently occupied by G and does not accept A (see top row in Fig. 1a). As the editing activity in the species rises, a G-to-A mutation at the site may become neutral and fixed if the resultant A is edited back to G in a sufficiently large proportion of mRNA molecules (see middle row in Fig. 1a). Upon the G-to-A substitution, the high editing level at the site will be selectively maintained, because it is G rather than A that is permissible at the mRNA level. As the above situation applies only to nonsynonymous G-to-A substitutions and the coupled nonsynonymous A-to-G editing, it inflates the number of nonsynonymous editing sites and nonsynonymous editing levels relative to the corresponding synonymous values. Although here the nonsynonymous editing has permitted the fixation of the otherwise deleterious G-to-A mutation, the derived genotype with a genomic A that is highly edited is no fitter than the original genotype with a genomic G. Thus, the editing is nonadaptive. We assumed in the above scenario that the editing level is so high that the otherwise deleterious G-to-A mutation becomes neutral. It is also possible that the editing level is not high enough, rendering the G-to-A mutation slightly deleterious (see bottom row in Fig. 1a). A slightly deleterious mutation may nevertheless get fixed and the editing level may be selectively increased in subsequent evolution. Even under this scenario, there is no net fitness gain from the original genotype with a genomic G to the derived genotype with a genomic A that is highly edited. We refer to the above nonadaptive model including both of the described scenarios as the harm-permitting model, because RNA editing permits the fixation of otherwise harmful mutations. 
Although the possibility of harm-permitting by RNA editing has been proposed multiple times31,38,39,40, especially regarding the editing of organelle transcriptomes, empirical evidence that it is entirely or primarily responsible for creating “adaptive signals” of RNA editing is lacking.

Fig. 1

The harm-permitting model and a strategy to detect the harm-permitting effect. a The harm-permitting effect of nonsynonymous editing. The top row shows that, when a nonsynonymous A site is not edited (or is subject to a low level of editing), a G-to-A mutation at the site is too deleterious to get fixed. The middle row shows that, when the site is highly edited, the G-to-A mutation becomes neutral and is fixed by genetic drift. The high editing level is then selectively constrained. The bottom row shows that, when the editing level of the site is intermediate, the G-to-A mutation is slightly deleterious and fixed by genetic drift. The editing level may be further elevated by positive selection (or maintained by negative selection). Despite the relatively high nonsynonymous editing levels in the middle and bottom rows, no adaptation (i.e., no net increase in fitness) occurred when the final genotype is compared with the original genotype. DNA is shown in blue, whereas RNA is in red. Post-edited nucleotides are marked with stars. b Restorative editing restores an ancestral amino acid state lost upon an amino acid substitution, which may have occurred in the exterior branch as shown here or in an earlier branch. In other words, the post-editing state is identical to an ancestral pre-editing state. c Diversifying editing creates an amino acid state that differs from pre-editing states in a set of ancestors considered. Although only the state of one ancestor is shown here, the states of multiple ancestors may be considered. In b and c, X and Y represent different amino acid states, whereas the arrow shows the effect of editing. Restorative but not diversifying editing can confer a harm-permitting effect.

Given the exceptionally high editing activity in coleoid neural tissues25,27, we hypothesize that the reported preponderance of nonsynonymous editing is explained by the harm-permitting model and is nonadaptive. To test this hypothesis, we divide nonsynonymous editing into two categories: restorative and diversifying41. Restorative editing converts the amino acid state back to an ancestral state (Fig. 1b), whereas diversifying editing converts the amino acid state to a non-ancestral state (Fig. 1c). As restorative editing but not diversifying editing can confer a harm-permitting effect, our hypothesis predicts that the reported preponderance of nonsynonymous editing in coleoids is attributable to restorative but not diversifying editing. In particular, we predict that (i) the frequency of sites edited is greater for restorative (FR) than synonymous (FS) editing, and that (ii) the median editing level is higher for restorative (LR) than synonymous (LS) editing. It further predicts that (iii) the frequency of sites edited is no greater for diversifying (FD) than synonymous (FS) editing, and that (iv) the median editing level is no higher for diversifying (LD) than synonymous (LS) editing. By contrast, the adaptive hypothesis does not have specific predictions about FR and LR, but predicts that FD and LD are respectively greater than FS and LS. It is noteworthy that although only restorative editing can be harm-permitting, not all restorative editing is necessarily harm-permitting. For instance, the restorative editing would be neutral if it restores a neutral G-to-A substitution.

Patterns of restorative and diversifying editing

To test the nonadaptive hypothesis, we analyzed the published neural transcriptomes of six mollusk species27, whose phylogenetic relationships are depicted in Fig. 2a. Among them, the four coleoids have widespread coding A-to-G editing in neural tissues, whereas the two outgroups have substantially fewer editing sites27.

Fig. 2

Comparison of restorative and diversifying editing with synonymous editing in coleoids. a The phylogenetic relationship of the six mollusks studied here. Branch lengths represent divergence times based on the mid-points of the divergence time ranges in a previous study27. b Frequencies of sites with synonymous (FS), restorative (FR), and diversifying (FD) editing, respectively, in each of the four coleoids. A significant difference between FS and FR (or FD) is indicated by stars above the bin of FR (or FD) (*P< 0.05; **P< 0.01; ***P< 0.001; ****P< 0.0001; ns, not significant; χ2-test). c Synonymous (LS), restorative (LR), and diversifying (LD) editing levels in each of the four coleoids. The lower and upper edges of a box represent the first (qu1) and third quartiles (qu3), respectively, the horizontal line inside the box indicates the median (md), and the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1). The median editing levels are also given below the corresponding boxes. A significant difference between LS and LR (or LD) is indicated by stars (*P< 0.05; **P< 0.01; ***P< 0.001; ****P< 0.0001; ns, not significant; Mann–Whitney U-test). Source data are provided as a Source Data file.

We identified 3979 one-to-one orthologous genes in these six species and inferred ancestral coding sequences at all interior nodes of the species tree (Fig. 2a). We regarded a nonsynonymous editing event in an exterior node of the tree that modifies the amino acid state from X to Y as restorative if the inferred genomic sequence-based amino acid state is Y at any node of the tree that is ancestral to the focal exterior node (Fig. 1b; also see Methods), or diversifying if Y is not present at any node of the tree that is ancestral to the focal exterior node (Fig. 1c). It is worth noting that these definitions are based on amino acid states and are applied to nonsynonymous editing only. Synonymous editing is presumably neutral, so need not be separated into restorative and diversifying editing. Furthermore, separating synonymous editing into the two categories would be less accurate because of lower reliabilities in inferring ancestral sequences at synonymous sites. Of the two categories of nonsynonymous editing sites, the number of diversifying editing sites is 15.7–20.4 times that of restorative editing sites in the four coleoids (Supplementary Table 1).
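The classification rule described above can be illustrated with a minimal sketch. The function name `classify_edit` and its arguments are hypothetical and are not taken from the authors' pipeline; the sketch assumes that the pre-editing and post-editing amino acid states of the focal species and the inferred states at all of its ancestral nodes are already available as single-letter codes.

```python
from typing import Sequence


def classify_edit(pre_edit_aa: str, post_edit_aa: str,
                  ancestral_aas: Sequence[str]) -> str:
    """Classify one A-to-G editing event observed at an exterior node.

    pre_edit_aa   -- amino acid encoded by the genomic (unedited) codon, X
    post_edit_aa  -- amino acid encoded after editing, Y
    ancestral_aas -- inferred genomic amino acid states at every node
                     ancestral to the focal exterior node (assumed input)
    """
    if pre_edit_aa == post_edit_aa:
        # Editing does not alter the amino acid.
        return "synonymous"
    if post_edit_aa in ancestral_aas:
        # Y is found at some ancestor, so editing restores an old state.
        return "restorative"
    # Y is absent from all ancestors considered.
    return "diversifying"
```

For example, an event that changes K to R at a site whose ancestors carried R would be labeled restorative, whereas the same event at a site whose ancestors all carried K would be labeled diversifying.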

In each of the four coleoids, FR is significantly lower than FS when all editing sites are considered (Fig. 2b). However, because the harm-permitting hypothesis concerns relatively highly edited sites, we analyzed sites with editing levels exceeding 10% and found FR to be significantly higher than FS (Fig. 2b inset). We observed that LR is significantly greater than LS in all four coleoids (Fig. 2c). By contrast, FD is significantly smaller than FS when we considered all editing sites (Fig. 2b) or sites with >10% editing levels (Fig. 2b inset). LD is not significantly different from LS except in the squid (Fig. 2c). These results largely confirm all four predictions of the nonadaptive hypothesis and are at odds with the predictions of the adaptive hypothesis, strongly suggesting that the preponderance of nonsynonymous editing in coleoids is explained by the harm-permitting model and is nonadaptive. Figure 2c shows that, although LR is significantly higher than LS in each coleoid, it is lower than 2.5%. One might ask whether such low median levels of restorative editing can be harm-permitting. As mentioned, not all restorative editing is necessarily harm-permitting, which could explain why LR is not particularly high. Nevertheless, Fig. 2c reveals a larger fraction of restorative editing than synonymous editing with appreciable editing levels. For example, in the squid, 29.14% and 15.65% of restorative editing sites but only 23.03% and 6.82% of synonymous editing sites have editing levels >5% and >20%, respectively. Depending on the harm of the G-to-A mutation and the relative dominance of the A and G isoforms, these appreciable levels of A-to-G editing could substantially increase the fixation probability of the G-to-A mutation. It should also be noted that the harm-permitting hypothesis is proposed as an alternative to the adaptive hypothesis. 
If moderate levels of nonsynonymous editing could be beneficial as asserted by the adaptive hypothesis, there is no reason why they could not be harm-permitting. Furthermore, the general trend of LR > LS and LD ≈ LS supports the harm-permitting hypothesis relative to the adaptive hypothesis.
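The two kinds of comparison used above (frequencies of edited sites and editing levels) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the χ2-test is written as a Pearson test on a 2 × 2 table without continuity correction (the text does not say whether a correction was applied), and only the Mann–Whitney U statistic is computed, not its P-value.

```python
import math


def chi2_2x2(edited_a, total_a, edited_b, total_b):
    """Pearson chi-squared test (1 d.f., no continuity correction).

    Compares the frequency of edited sites in class a (e.g., restorative)
    with that in class b (e.g., synonymous). Returns (statistic, P-value).
    """
    a, b = edited_a, total_a - edited_a   # class a: edited, unedited
    c, d = edited_b, total_b - edited_b   # class b: edited, unedited
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def mann_whitney_u(levels_x, levels_y):
    """U statistic for sample x: number of (x, y) pairs with x > y,
    counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in levels_x for y in levels_y)
```

A frequency such as FR would be supplied as the count of edited restorative sites and the count of all sites at which restorative editing is possible; the same applies to FS and FD.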

To examine the robustness of our results, we conducted four additional analyses. First, we respectively examined editing sites that are specific to each of the four coleoids, because species-specific editing events have similar evolutionary ages, allowing fairer comparisons. The results obtained are highly similar to those in Fig. 2 and are robust to potential misidentifications of species-specific editing (Supplementary Fig. 1). Second, we probed editing events identified from individual tissues in bimac. FD < FS holds across tissues, but editing level comparisons as well as comparisons between FR and FS are mostly non-significant, likely due to the reduced statistical power as a result of decreased sample sizes (Supplementary Table 2). Third, because editing levels of neighboring editing sites may be co-affected by a mutation, which would reduce the statistical power in comparing synonymous with nonsynonymous editing sites, we compared synonymous editing sites in one half of the gene set with nonsynonymous editing sites in the other half. Specifically, we ranked all genes by the dN/dS ratio between octopus and squid orthologs, and respectively grouped genes with odd ranks into bin 1 and those with even ranks into bin 2. We then compared synonymous editing in bin 1 with nonsynonymous editing in bin 2, as well as synonymous editing in bin 2 with nonsynonymous editing in bin 1. The results (Supplementary Fig. 2) are similar to those obtained from all editing sites (Fig. 2). Fourth, we respectively investigated FR/FS and FD/FS in five editing level ranges (0–20%, 20–40%, 40–60%, 60–80%, and 80–100%) in each coleoid (Supplementary Fig. 3). While both FR/FS and FD/FS generally increase with the editing level, FD/FS is smaller than 1 except when the editing level exceeds 40%. 
It is important to stress that only a few percent of diversifying editing sites in a coleoid fall in this editing level range (Supplementary Table 3), suggesting that the vast majority of diversifying editing is nonadaptive (see below for quantitative estimates).
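The gene-splitting step of the third robustness analysis can be sketched as below. The function name `split_by_rank` and the dictionary input are hypothetical; the sketch assumes that a dN/dS value between octopus and squid orthologs is available for every gene and that ties in dN/dS are broken arbitrarily by the sort.

```python
def split_by_rank(dnds_by_gene):
    """Rank genes by dN/dS and split them into two interleaved bins.

    dnds_by_gene -- dict mapping gene identifier to its dN/dS ratio
    Returns (bin1, bin2): genes with odd ranks and genes with even ranks.
    """
    ranked = sorted(dnds_by_gene, key=dnds_by_gene.get)
    bin1 = ranked[0::2]  # ranks 1, 3, 5, ...
    bin2 = ranked[1::2]  # ranks 2, 4, 6, ...
    return bin1, bin2
```

Interleaving by rank gives the two bins similar dN/dS distributions, so synonymous editing in one bin and nonsynonymous editing in the other are drawn from genes under comparable constraint but from non-overlapping genomic neighborhoods.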

Accelerated nonsynonymous G-to-A substitutions

The harm-permitting model further predicts that the rate of nonsynonymous G-to-A substitution relative to that of synonymous G-to-A substitution (dN/dS for G-to-A) should be elevated, because the high editing activity renders some otherwise deleterious nonsynonymous G-to-A mutations acceptable. Furthermore, this elevation should be particularly pronounced in genes exclusively expressed in neural tissues but not in genes unexpressed in neural tissues, because the high editing activity is so far observed only in neural tissues25,27. However, because only bimac and squid have available RNA-sequencing data from several non-neural tissues and because genes unexpressed in neural tissues are not in the transcript sequence data of the octopus and cuttlefish, and hence are excluded from our alignments, we had to define two groups of genes with relatively high and relatively low specificities in neural expression, respectively. The genes with high neural expression specificities are expressed exclusively in neural tissues in the bimac or squid, whereas those with low neural expression specificities are expressed in both neural and non-neural tissues in both the bimac and squid. The harm-permitting model predicts that dN/dS for G-to-A is greater for genes with relatively high neural expression specificities than for those with relatively low neural expression specificities. As the harm-permitting effect is present only when a G-to-A mutation at a site is deleterious without editing, we focused on nonsynonymous sites that are conserved in the two outgroup species (i.e., nautilus, sea hare, and the immediately ancestral node of the focal species share the same pre-editing state) to increase the sensitivity of our test. Furthermore, the elevation in dN/dS should be specific to G-to-A changes, because the potential harms of other changes such as C/T-to-A and G-to-C/T cannot be alleviated by A-to-G editing.

To this end, we considered all six branches descendent from the common ancestor of the four coleoids. We computed dN and dS of each of these branches using the extant and inferred ancestral sequences, and then calculated dN/dS by dividing the total dN by the total dS of these branches. In support of our prediction, dN/dS for G-to-A changes is greater for genes of relatively high neural expression specificities than those of relatively low specificities (Fig. 3). By respectively bootstrapping the two groups of genes 200 times, we found that the above difference is statistically significant (P < 0.005). By contrast, no significant difference in dN/dS exists between the two groups of genes when C/T-to-A changes or G-to-C/T changes are considered (Fig. 3). It is noteworthy that dN/dS < 1 in all cases in Fig. 3, consistent with the harm-permitting model that does not involve positive selection.
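The bootstrap comparison of the two gene groups can be sketched as follows. This is a simplified illustration, not the authors' procedure: each gene is assumed to contribute one (dN, dS) pair already summed over the six branches, the pooled ratio is taken as total dN over total dS, and the one-sided P-value is assumed to be the fraction of bootstrap replicates in which the high-specificity group does not exceed the low-specificity group. All names are hypothetical.

```python
import random


def pooled_ratio(genes):
    """Total dN divided by total dS over a list of (dN, dS) pairs."""
    total_n = sum(n for n, _ in genes)
    total_s = sum(s for _, s in genes)
    return total_n / total_s


def bootstrap_p(high, low, reps=200, seed=0):
    """One-sided bootstrap P-value for pooled dN/dS (high) > pooled dN/dS (low).

    Each group of genes is resampled with replacement, independently,
    `reps` times.
    """
    rng = random.Random(seed)
    not_greater = 0
    for _ in range(reps):
        h = rng.choices(high, k=len(high))
        l = rng.choices(low, k=len(low))
        if pooled_ratio(h) <= pooled_ratio(l):
            not_greater += 1
    return not_greater / reps
```

Under this reading, a result of zero out of 200 replicates corresponds to P < 1/200 = 0.005, which matches the resolution of the P-value reported above.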

Fig. 3

Coleoid nonsynonymous to synonymous substitution rate ratios (dN/dS) for various nucleotide changes. The P-value is based on 200 bootstrap samples; ns, not significant. Source data are provided as a Source Data file.

The potential benefit of shared editing among species

It has been suggested that shared editing among multiple species is likely beneficial, because otherwise the editing status is unlikely to be evolutionarily conserved36. In support of this suggestion was the finding that, even in mammals, where most nonsynonymous editing appears neutral or deleterious, the frequency of conserved sites subject to nonsynonymous editing in both human and mouse significantly exceeds the frequency of conserved sites subject to synonymous editing in both species36. A similar phenomenon was reported in fruit flies23. In coleoids, a sizable fraction of nonsynonymous editing is shared by at least two species and highly edited sites tend to be shared27. To understand the potential evolutionary forces maintaining RNA editing at specific sites across multiple coleoids, we analyzed editing shared by a clade of two or more species.

A nonsynonymous editing event shared by a clade of species that modifies the amino acid state from X to Y is considered restorative if the inferred genomic sequence-based amino acid state is Y at any node of the tree that is ancestral to the most recent common ancestor of the clade, or diversifying if Y is not present at any of these ancestral nodes. In the study of shared editing, we considered the average editing level in the clade where the editing is shared. For each of the three groups of editing respectively shared between octopus and bimac, between squid and cuttlefish, and among the four coleoids, both FR and FD are lower than FS (Fig. 4a). However, when the editing level exceeds 10%, FR is significantly higher than FS for shared editing between octopus and bimac and that between squid and cuttlefish, and FD is significantly higher than FS for shared editing between squid and cuttlefish (Fig. 4a inset). LR and LD are both higher than LS for each group of shared editing (Fig. 4b). A significantly greater FD than FS for shared editing could be caused by (i) positive selection promoting the initial fixation of mutations that lead to nonsynonymous editing and/or (ii) purifying selection preventing the loss of presumably beneficial nonsynonymous editing; therefore, it is a clear indicator of adaptive nonsynonymous editing. 
A significantly greater LD than LS for shared editing could be caused by (i) positive selection promoting the increase of editing levels of presumably beneficial nonsynonymous editing, (ii) purifying selection preventing the decrease of editing levels of presumably beneficial nonsynonymous editing, (iii) purifying selection preferentially preventing the loss of high-level nonsynonymous editing presumably because high editing levels are associated with larger benefits than low editing levels, and/or (iv) positive selection preferentially promoting the loss of low-level nonsynonymous editing, probably because an A-to-G substitution is favored at an edited site, especially when the editing level is low. Regardless, a significantly greater LD over LS also indicates adaptive nonsynonymous editing. Hence, diversifying editing shared by different coleoids shows adaptive signals, suggesting that a fraction is adaptive.

Fig. 4

Patterns of shared editing in coleoids. a Frequencies of sites with shared synonymous (FS), restorative (FR), and diversifying (FD) editing, respectively, among sites shared between the octopus and bimac, between the squid and cuttlefish, and among all four coleoids, respectively. A significant difference between FS and FR (or FD) is indicated by stars above the bin of FR (or FD) (*P< 0.05; **P< 0.01; ***P< 0.001; ****P< 0.0001; ns, not significant; χ2-test). The inset shows the corresponding fractions of sites with editing levels >10%. b Synonymous (LS), restorative (LR), and diversifying (LD) editing levels among the three sets of shared editing sites. The lower and upper edges of a box represent the first (qu1) and third quartiles (qu3), respectively, the horizontal line inside the box indicates the median (md), and the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1). The median editing levels are also given below the corresponding boxes. A significant difference between LS and LR (or LD) is indicated by stars (*P< 0.05; **P< 0.01; ***P< 0.001; ****P< 0.0001; ns, not significant; Mann–Whitney U-test). c Fraction of sites edited in the common ancestor of the four coleoids that have a genomic G in a coleoid (****P< 0.0001; χ2-test). Source data are provided as a Source Data file.

As most nonsynonymous editing is species-specific (Supplementary Table 1), the above finding is not inconsistent with the analysis of individual species revealing the nonadaptive nature of most editing events. We estimated that, of species-specific diversifying editing sites, 0.15%, 0%, 1.20%, and 0.16% are adaptive in the octopus, bimac, squid, and cuttlefish, respectively (see Methods). Similarly, 3.50%, 3.00%, 14.98%, and 12.26% of shared diversifying editing sites are adaptive in octopus, bimac, squid, and cuttlefish, respectively. Taken together, 1.37%, 1.72%, 4.91%, and 2.18% of diversifying editing sites are adaptive in the four coleoids, respectively.

What is the general benefit of the shared editing that shows adaptive signals? Two hypotheses exist. First, editing may be beneficial because of the intra-organism protein diversity created25,27,32,42. That is, editing allows the existence of two protein isoforms per edited site in an organism, which may confer a higher fitness, analogous to heterozygote advantage at polymorphic sites. Alternatively, editing offers a new isoform that may be simply fitter than the unedited isoform. In this latter hypothesis, the benefit of editing is comparable to that of a nucleotide substitution. To distinguish between these two hypotheses, we focused on sites that are edited in at least three of the four coleoids, because editing should have existed at these sites in the common ancestor of the four species according to the parsimony principle (Fig. 2a). We then estimated the frequency of replacement of editing with an A-to-G substitution in any of the four species. Such replacements are expected to be more or less neutral for synonymous editing. For nonsynonymous editing, such replacements are deleterious under the first hypothesis due to the loss of protein diversity but are neutral under the second hypothesis. Hence, the first hypothesis predicts a lower frequency of such replacements for nonsynonymous editing than synonymous editing, whereas the second hypothesis predicts equal frequencies of such replacements for synonymous and nonsynonymous editing.
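The quantity compared between synonymous and nonsynonymous editing in this test can be sketched as follows. The data layout and the function name are hypothetical: each site is represented as a mapping from the four coleoid species to one of three assumed labels, "edited" (genomic A that is edited), "G" (genomic G), or "other".

```python
def replacement_frequency(sites):
    """Fraction of ancestrally edited sites with a genomic G in any coleoid.

    A site is treated as edited in the common ancestor of the four coleoids
    (by parsimony) when it is edited in at least three of the four species.
    """
    ancestral = [s for s in sites
                 if sum(state == "edited" for state in s.values()) >= 3]
    if not ancestral:
        return float("nan")
    replaced = [s for s in ancestral
                if any(state == "G" for state in s.values())]
    return len(replaced) / len(ancestral)
```

Computing this fraction separately for synonymous and nonsynonymous (or diversifying) sites gives the two frequencies whose difference distinguishes the protein-diversity hypothesis from the fitter-isoform hypothesis.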

Our analysis showed that the frequency of A-to-G substitutions at nonsynonymous editing sites is significantly lower than that at synonymous editing sites (Fig. 4c; Supplementary Table 4). Because it is the shared diversifying editing for which the nature of the benefit is in question, we restricted the analysis to diversifying editing only, but obtained a similar result (Fig. 4c; Supplementary Table 4). Furthermore, A sites edited in at least three coleoid species are more likely to be edited in all four coleoids when the editing is nonsynonymous than when it is synonymous (36.6% vs. 27.6%, P = 0.001, chi-squared test; Supplementary Table 4), suggesting that shared nonsynonymous editing is less likely to be lost than shared synonymous editing. Together, these observations suggest that the benefit of adaptive shared editing is the provision of two protein isoforms per edited site in an organism.

Discussion

The recent discovery of the preponderance of nonsynonymous A-to-G RNA editing among highly edited sites in coleoid neural tissues led to the assertion of widespread adaptive editing in these organisms, but the potential benefits of the editing are unknown. In this work, we proposed an alternative, nonadaptive explanation. Our reanalysis of published transcriptome data from four coleoids and two outgroup species lends strong support to the nonadaptive hypothesis. Combined with previous findings from other species, the new finding suggests a generally nonadaptive nature of coding A-to-G editing among animals. As explained in the harm-permitting model, nonadaptive editing such as some restorative editing, may, however, be selectively protected (middle row in Fig. 1a) or even promoted (bottom row in Fig. 1a). Although such editing events likely originated as molecular errors due to ADARs’ limited target specificity, they are no longer errors today. The fact that a nonadaptive feature can nevertheless be under purifying selection or even be positively selected is well known in evolutionary biology40,43.

In the harm-permitting model, A-to-G editing permits the fixation of otherwise deleterious G-to-A mutations and hence the editing is nonadaptive. In theory, it is also possible that A-to-G editing emerged in evolution after a G-to-A substitution at the same site. If the substitution is slightly deleterious, the editing would be slightly beneficial (i.e., compensatory). However, such sites have minimal contributions to FR and LR, so this possibility does not alter our interpretation of the nonadaptive nature of restorative editing (see Methods).

The principle of our test of the nonadaptive hypothesis of RNA editing is similar to that of the test of the adaptive hypothesis, except that the new test requires a distinction between restorative and diversifying editing, which in turn depends on ancestral coding sequences inferred for the interior nodes of a phylogeny (Fig. 1b, c). Although ancestral sequence inference is generally reliable, it is not expected to be 100% correct44. Will errors and potential biases in this inference bias our test? The answer is no. FR is the number of edited sites with an ancestral nonsynonymous G-to-A substitution divided by the total number of sites with an ancestral nonsynonymous G-to-A substitution. As our ancestral sequence inference is based on genomic sequences and is blind to RNA editing, any potential bias in estimating the number of sites with an ancestral G-to-A substitution is cancelled out in computing FR. The same applies to FD. Errors and potential biases in ancestral sequence inference only increase the stochastic errors of FR and FD estimates, reducing the statistical power in testing our hypothesis. Notwithstanding, the vast majority of our key statistical tests yielded significant results, suggesting that sufficient statistical power remains in these tests.

Although our study explains the preponderance of nonsynonymous editing in coleoids, we have not addressed a related question—why the editing activity was drastically elevated in neural tissues during coleoid evolution. A substantial rise in editing activity is expected to be harmful, because its effect is similar to inducing A-to-G mutations. Indeed, expression of the human ADAR2 gene in the budding yeast Saccharomyces cerevisiae, which does not naturally possess any ADAR gene, inhibits yeast growth because of ADAR2’s RNA editing activity45. Our observation of a significantly lower FD than FS in every coleoid examined (Fig. 2b) strongly suggests that diversifying editing is generally deleterious and has been selectively purged. Hence, it is almost certain that the pervasive coding RNA editing was not the reason for the elevation of the editing activity in coleoids but its byproduct. Whatever the reason was, the relevant benefit must at least offset the harm from pervasive nonsynonymous editing, under the assumption that the evolutionary elevation of the editing activity was not due to genetic drift alone, because the population size of ancestral coleoids was probably not small. It is worth mentioning that a number of physiological functions have been proposed for A-to-G editing, including suppressing the proliferation of transposons46, inhibiting viral replication47, marking RNAs for degradation32, marking RNAs to prevent innate immunity against self-RNAs48,49, regulating alternative splicing32, and modulating nuclear retention of RNAs32. As the primary physiological function of A-to-G editing is unknown, it is difficult to discern why the editing activity rose drastically in coleoids.

Similar to previous findings in mammals and flies23,36, we observed some adaptive signals from nonsynonymous editing shared between species. Our additional analysis suggests that the benefit of these adaptive editing events lies in the protein diversity brought by editing. While this finding supports the prevailing view on why coding RNA editing may be adaptive, it is important to stress that, based on our estimation, only about 2.5% of nonsynonymous editing appears adaptive in an average coleoid.

Liscovitch-Brauer and colleagues27 noted that flanking regions of sites edited in multiple species tend to be evolutionarily conserved and asserted that coleoids “use extensive RNA editing to diversify their neural proteome at the cost of limiting genomic sequence flexibility and evolution.” However, we believe that the observation prompting such a conclusion is caused by an ascertainment bias. Specifically, because of the various requirements for a site to be edited, such as specific flanking sequences27 and secondary structures50, a shared editing site by definition satisfies these requirements in its neighborhood in multiple species. Thus, the site is expected to show a higher interspecific similarity in flanking sequences than a randomly picked site, regardless of whether the editing is shared because of selective constraints or not. The same ascertainment bias occurs in the comparison of intraspecific polymorphisms of flanking sequences between shared editing sites and random sites. In particular, given the flanking sequence requirement for editing, an edited site with a lower flanking sequence polymorphism is expected to be edited in a greater percentage of individuals in the species. Hence, provided that a site is found to be edited in multiple species when only one individual is examined per species, the polymorphism is expected to be low irrespective of the presence/absence of selective constraints on the editing.

The nonadaptive hypothesis we proposed is based on the harm-permitting effect of high levels of editing, which inflates the frequency and level of restorative editing, relative to those of synonymous editing. As previous comparisons of synonymous and nonsynonymous editing in non-coleoid species never considered this effect, one wonders whether their conclusions are still valid. Ignoring the harm-permitting effect renders conclusions of nonadaptive editing more conservative. Hence, such conclusions should still hold. For claims of adaptive editing that are based on comparisons between synonymous and nonsynonymous editing frequencies and levels, a reanalysis taking into account the harm-permitting effect is warranted. In other words, a significantly greater FD than FS and/or a significantly greater LD than LS is required to demonstrate positive selection promoting nonsynonymous editing. This is especially true for the group of fungi that, like coleoids, show pervasive A-to-G editing51,52,53.

It is worth mentioning that transcriptome-wide analyses of several other types of RNA editing such as C-to-U editing54 and m6A modification (methylation of A at the nitrogen-6 position)55 also suggest that most editing events are nonadaptive. In addition, variations in several steps of RNA production and processing such as alternative transcriptional initiation56, alternative splicing57, and alternative polyadenylation58 have been shown to be largely molecular errors. Similarly, it is plausible that variations in the translational process such as stop-codon read-through59 and events of posttranslational modifications such as phosphorylation60 and glycosylation61 are primarily manifestations of molecular errors. Whether it is generally true that phenotypic variations at the molecular level are less likely to be adaptive than those at the cellular, tissue, organ, and organismal levels is worth exploration62.

Methods

Transcriptomes, editing sites, and ancestral sequences

The transcriptomes of six mollusk species and the list of A-to-I editing sites in the four coleoid species were previously published27. We extracted coding sequences from the previously assembled transcriptomes27 on the basis of the annotations in the data set and converted Gs in the sequences to As at edited sites. In some genes, we observed stop codons occurring upstream of the last three nucleotides of the annotated coding sequence, possibly due to erroneous inclusions of 3′-untranslated regions. We therefore removed nucleotides downstream of the first stop codon in these sequences. All but one A-to-G editing site in the data are upstream of the first stop codons, suggesting that these annotation errors barely influenced the previous analysis of RNA editing. If a gene appeared more than once in the original dataset for a species, only the longest sequence was retained in our analyses.
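The two clean-up steps described above (truncating a coding sequence after its first in-frame stop codon and keeping only the longest sequence per gene) can be illustrated with a minimal Python sketch. This is not the in-house code used in the study; the function names, the use of the standard stop codons, and the order in which the two steps are applied are assumptions made for illustration.

```python
# Illustrative sketch only; names and the standard stop-codon set are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def trim_at_first_stop(cds):
    """Return the sequence truncated right after its first in-frame stop codon.

    Nucleotides downstream of that stop codon are dropped; a sequence
    without an in-frame stop codon is returned unchanged (upper-cased).
    """
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return cds[:i + 3]
    return cds


def longest_per_gene(records):
    """Keep the longest sequence for each gene.

    records: iterable of (gene_id, sequence) pairs, possibly with
    repeated gene_id values.
    """
    best = {}
    for gene_id, seq in records:
        if gene_id not in best or len(seq) > len(best[gene_id]):
            best[gene_id] = seq
    return best
```

For example, `trim_at_first_stop("ATGAAATAAGGGCCC")` keeps the first three codons and discards the six nucleotides that follow the stop codon.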

Orthologous genes among the six mollusks were previously identified27 and a total of 3979 genes have orthologs in all 6 species and contain at least 1 A-to-G editing site in at least 1 coleoid. We first made a protein sequence alignment of orthologous sequences using MAFFT63 and then generated a coding sequence alignment of these genes using PAL2NAL64. Ancestral sequences were inferred using the codeml program in PAML465 under default parameters without excluding gap sites and the best joint inferences of all interior nodes were used in subsequent analyses. The unrooted topology of the tree in Fig. 2a was used in ancestral sequence inference. Subsequent analyses used in-house Perl scripts.

All reported editing sites in the 3979 genes27 were included in our analyses, unless otherwise noted. Although some editing sites may be sequencing errors, the probability of error is expected to be low given the very small number of other types of DNA–RNA mismatches observed27.

Restorative and diversifying editing

The tree in Fig. 2a shows three interior nodes ancestral to each coleoid species. A coding A site in a coleoid is considered a potential site for restorative editing if changing the A to G is nonsynonymous and if the corresponding amino acid after the change becomes identical to the amino acid state at any one of the three ancestral nodes. A potential site for restorative editing becomes a restorative editing site if it is edited in the focal species. By definition, FR is the number of sites with restorative editing divided by the number of potential sites for restorative editing, whereas LR is the median editing level at restorative editing sites. A coding A site in a focal species is considered a potential site for diversifying editing if changing the A to G is nonsynonymous and if the corresponding amino acid after the change differs from all amino acid states of the three ancestral nodes. A potential site for diversifying editing becomes a diversifying editing site if it is edited in the focal species. By definition, FD is the number of sites with diversifying editing divided by the number of potential sites for diversifying editing, whereas LD is the median editing level at diversifying editing sites. FS is the number of sites with synonymous editing divided by the number of A sites where A-to-G editing would be synonymous, whereas LS is the median editing level at synonymous editing sites34. Although the comparison between FR (or FD) and FS, and that between LR (or LD) and LS are not entirely independent from each other, each comparison is fair.
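The definitions above can be made concrete with a short Python sketch that classifies a coding A site as a potential site for synonymous, restorative, or diversifying editing and then computes the corresponding editing frequencies (FS, FR, FD). The sketch is illustrative rather than the in-house implementation: the function names and input layout are hypothetical, the standard genetic code is assumed, and stop codons are simply treated as one more amino acid state.

```python
# Illustrative sketch only; names, input layout, and genetic code are assumptions.
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_a_site(codon, pos, ancestral_aas):
    """Classify the A at index pos (0-2) of codon.

    ancestral_aas: amino acid states at the interior nodes ancestral to
    the focal species, at the aligned codon position.
    Returns 'synonymous', 'restorative', 'diversifying', or None if the
    nucleotide at pos is not an A.
    """
    if codon[pos] != "A":
        return None
    edited = codon[:pos] + "G" + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    if before == after:
        return "synonymous"
    if after in ancestral_aas:
        return "restorative"   # matches at least one ancestral state
    return "diversifying"      # differs from all ancestral states


def editing_frequencies(sites):
    """Compute the editing frequency for each class of potential sites.

    sites: iterable of (codon, pos, ancestral_aas, is_edited).
    Returns {class: edited sites / potential sites}.
    """
    potential, edited = Counter(), Counter()
    for codon, pos, ancestral_aas, is_edited in sites:
        cls = classify_a_site(codon, pos, ancestral_aas)
        if cls is None:
            continue
        potential[cls] += 1
        if is_edited:
            edited[cls] += 1
    return {cls: edited[cls] / n for cls, n in potential.items()}
```

In this sketch, the middle A of an AAA (Lys) codon is a potential restorative site if any of the three ancestral nodes carries Arg (AGA) at that position, and a potential diversifying site otherwise.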

An editing event is considered to be shared by a clade of two or more species if the event occurs in all species of the clade in the tree of Fig. 2a and all of these species have the same pre- and post-editing amino acid states. In studying shared editing by a clade, we followed the above procedure in distinguishing restorative from diversifying editing, except that we considered all interior nodes ancestral to the most recent common ancestor of the clade instead of all interior nodes ancestral to one species.

Comparing median editing levels

When the mRNA concentration is low, RNA editing cannot be detected unless the editing level is sufficiently high. This bias would make the median editing level appear higher in weakly expressed genes than in strongly expressed genes even when no such difference actually exists. To alleviate this bias, we considered only those sites that are covered by at least 400 RNA-sequencing reads when comparing median editing levels. Nevertheless, the bias does not affect the comparison between synonymous and nonsynonymous editing, because their detectabilities are equally influenced by the gene expression level. For a shared editing site, the average editing level and average read number of all species in the focal clade are used to represent the site. We did not apply editing level cutoffs in the comparison of editing levels of different sites because such cutoffs could introduce biases.
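As an illustration of the coverage filter described above, the following Python sketch computes a median editing level from sites covered by at least 400 reads and summarizes a shared editing site by its across-species averages. The function names and the tuple layout are hypothetical and do not come from the original scripts.

```python
# Illustrative sketch only; names and data layout are assumptions.
from statistics import median

MIN_READS = 400  # coverage threshold stated in the text


def median_editing_level(sites, min_reads=MIN_READS):
    """Median editing level among sufficiently covered sites.

    sites: iterable of (editing_level, read_count).
    Returns None if no site passes the coverage filter.
    """
    levels = [level for level, reads in sites if reads >= min_reads]
    return median(levels) if levels else None


def shared_site_summary(per_species):
    """Represent a shared editing site by its averages across species.

    per_species: list of (editing_level, read_count), one entry per
    species in the focal clade.
    Returns (average editing level, average read number).
    """
    n = len(per_species)
    avg_level = sum(level for level, _ in per_species) / n
    avg_reads = sum(reads for _, reads in per_species) / n
    return avg_level, avg_reads
```

Under this layout, the pair returned by `shared_site_summary` for a shared site can be passed to `median_editing_level` alongside single-species sites, so that the same coverage threshold applies to the averaged read number.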

Proportion of diversifying editing that is adaptive

Under the presumption that the excess of FD over FS represents adaptive editing, we calculated FD and FS in each of 10 editing level intervals (0–10%, 10–20%, ..., 90–100%). For each interval exhibiting FD > FS, the number of adaptive diversifying editing sites equals ADP = ND(1 − FS/FD), where ND is the number of diversifying editing sites in the interval. Summing up these ADP numbers yields the total number of diversifying editing sites that are adaptive.
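The calculation above can be sketched in a few lines of Python. The function name and the input layout (one tuple of ND, FD, and FS per editing level interval) are assumptions made for illustration. Intervals with FD ≤ FS contribute nothing to the total.

```python
# Illustrative sketch only; the function name and input layout are assumptions.
def adaptive_diversifying_sites(intervals):
    """Estimate the total number of adaptive diversifying editing sites.

    intervals: iterable of (ND, FD, FS), one tuple per editing level
    interval, where ND is the number of diversifying editing sites in
    the interval and FD and FS are the diversifying and synonymous
    editing frequencies in that interval.
    """
    total = 0.0
    for nd, fd, fs in intervals:
        if fd > fs:  # only intervals with an excess of FD over FS count
            total += nd * (1 - fs / fd)  # ADP = ND(1 - FS/FD)
    return total
```

For a hypothetical interval with ND = 10, FD = 0.01, and FS = 0.005, half of the diversifying editing sites (ADP = 5) would be counted as adaptive.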

Contributions of compensatory editing to FR and LR

In the harm-permitting model, A-to-G editing permits the fixation of otherwise deleterious G-to-A mutations and hence the editing is nonadaptive. In theory, it is also possible that A-to-G editing emerged in evolution after a G-to-A substitution at the same site. If the substitution is slightly deleterious, the editing would be slightly beneficial (i.e., compensatory). For several reasons, such sites should contribute minimally to FR and LR. First, the probability that the G-to-A substitution occurred in the most recent common ancestor of cephalopods (the top five species in Fig. 2a) or more recently is small, because it could occur at any time prior to the emergence of the editing at the site, which most likely took place when the cellular editing activity rose substantially in the branch immediately preceding the common ancestor of coleoids. Hence, the probability that the editing is classified as restorative is small and such compensatory events are unlikely to affect our analysis of restorative editing sites. Although such compensatory events are potentially included in the diversifying editing sites we analyzed, diversifying editing still shows lower editing frequencies and editing levels when compared with synonymous editing. Thus, our interpretation that diversifying editing is overall under purifying selection remains valid. Furthermore, even for the minority of compensatory events that are classified as restorative, the impact is small. This is because deleterious G-to-A mutations that could get fixed without editing are presumably only slightly deleterious. Hence, the benefit of A-to-G editing at such sites is also presumably small such that their editing level may not be selectively raised or selectively maintained at high levels. More importantly, there will be a comparable number of slightly beneficial G-to-A substitutions followed by slightly deleterious A-to-G editing that are included in the category of restorative editing. The effects of these two groups of events likely cancel each other out.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.