Introduction

Ribosomes presumably evolved through serial accretions of tRNAs and tRNA-like RNAs1,2,3,4,5,6,7,8,9,10,11. The ribosomal dimeric RNA core surrounding the peptide synthesis site12,13,14 also resembles tRNA dimers linked by complementary anticodons, according to the self-referential hypothesis on the origin of translation15,16,17,18,19. Evidence for this process also exists in modern vertebrate mitochondrial ribosomes: regular mitochondrial tRNAs constitutively fulfill 5S rRNA functions20,21. In the latter case, extreme mitogenome reduction perhaps reversed evolution to a tRNA-insertion stage, enabling further mitogenome reduction. Together, this evidence suggests that rRNAs derived from tRNAs.

tRNA accretion

Several hypotheses suggest different historical scenarios for tRNA evolution, all assuming accretions of smaller sequences22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41. A similar hypothesis exists for 5S rRNAs42.

Some evidence suggests that tRNAs originate from stem-loop hairpins initiating replication43,44,45,46,47,48,49,50. Other analyses show striking similarities in nucleotide triplet biases of tRNAs and protein coding genes51,52. Theoretical RNA rings, sequences artificially designed according to coding constraints53,54, seem homologous to tRNAs55,56,57.

rRNA accretion history: cladistics

Two main approaches have been developed and used to recover accretion histories of ribosomal RNAs. Both consider secondary structure subcomponents of rRNAs as units undergoing this process. One approach is based on homology, character polarity58 and cladistic comparisons to infer accretion history from comparisons among numerous sequences59. This classical comparative biology method uses parsimony as its main conceptual tool60 and was also used to recover evolution of molecular functions61,62 and protein accretion63,64. Various empirical tests show that this method recovers actual histories better than chance65,66,67,68,69,70,71,72,73,74,75.

rRNA accretion history: structure

A second approach is structure-based, and assumes that the ribosome grew from its spatial core towards its periphery, with the most ancient structural subcomponents located at the physical center of the ribosome, and the more recent ones at its periphery76,77,78. The method corresponds to that of spatial comparisons in disciplines such as plant community ecology. Structures encompass large amounts of information: in ribosomes, contact biases between amino acids and nucleotide triplets recover the very ancient evolution of genetic code codon-amino acid assignments79. Though reasonable, the structural method has not, to our knowledge, been tested empirically in other contexts of reconstructing biomolecular histories. One of its merits is that it produces (slightly) different histories for each taxon for which accurate structural data are available, which makes it possible to search for consensuses.

The theoretical premises of the structural approach rest on observations that ontogenies of different structures recover their phylogenies: chemical prebiotic evolution80; genetic code evolution81; embryology82,83; and ecological communities84. Spatial variation in vegetation can reconstruct the ontogeny of forests (forest succession85), but plant colonization at forest periphery and clearings differs from de novo colonization of areas where no forest is adjacent and no humus exists: primary and secondary successions differ86. In addition, the structural model unrealistically assumes equal ribosomal growth in all directions from the core to the periphery87,88,89. Its name, the onion peeling model, is formally incorrect (in onions, peripheral rings are most ancient), reflecting emphasis on structure rather than historical process90.

Comparing accretion histories: cladistic vs structure

Overall, one can assume that the two approaches complement each other, one recovering history using phylogenetic methods, and the other using principles from ecology and embryology for historical reconstruction. Accretion ranks of the 16S rRNA secondary structure subcomponents according to cladistic- and structure-based methods differ (Fig. 1). This analysis shows some congruence between accretion ranks obtained by the two independent methods, for 26 among 44 secondary structure elements (59%), which is not significantly more than 50% according to a one-tailed sign test. The highest percentage of secondary structure subelements with reasonable match between accretion ranks from the two methods is for 16S rRNA domain 3; the lowest percentage is for domain 2. Notably, domain 4, presumed most ancient and consisting of two secondary structure subelements, has one subelement for which both methods are highly congruent, and another for which the two methods give very different ranks.

Figure 1

Accretion rank of 16S rRNA structural subelements according to the structural onion model (periphery most recent; ranks from Fig. 2 in78) as a function of accretion rank according to the phylogenetic method (ranks from the phylogenies of 16S secondary structure elements in Fig. 2 and the corresponding supplementary figure in59). Accretion ranks are divided by the highest rank according to that method (structural, 27; phylogeny, 39), then multiplied by 100. Filled symbols indicate structural subelements for which the absolute value of the difference between these scaled accretion ranks is <25; hollow symbols have differences >25. Considering all 44 datapoints, the correlation between the two methods is r = 0.308, P = 0.021, meaning that 9.5% of the variation is common between methods (a,b); for the 26 filled symbols, r = 0.898, P < 0.001, and 80.6% of the variation is common. Hence methods (a,b) are congruent for 26/44 × 100 = 59% of the structural subelements.
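The arithmetic behind Fig. 1 can be retraced with a short sketch. It assumes that the sign test is the upper tail of a binomial distribution with probability 0.5, and that "variation in common" is the squared correlation coefficient; the function names are illustrative and not taken from the original analyses.

```python
from math import comb

def scaled_rank(rank, max_rank):
    """Express an accretion rank as a percentage of that method's highest rank."""
    return 100.0 * rank / max_rank

def is_congruent(struct_rank, phyl_rank, max_struct=27, max_phyl=39, threshold=25.0):
    """True when the two scaled ranks differ by less than `threshold` points."""
    diff = scaled_rank(struct_rank, max_struct) - scaled_rank(phyl_rank, max_phyl)
    return abs(diff) < threshold

def one_tailed_sign_test(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5): the upper-tail sign test."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

def shared_variance_percent(r):
    """Percentage of variation common to two variables: 100 * r squared."""
    return 100.0 * r * r

p_sign = one_tailed_sign_test(26, 44)  # 26 congruent subelements among 44
print(f"one-tailed sign test P = {p_sign:.3f}")
print(f"shared variance, all 44 points: {shared_variance_percent(0.308):.1f}%")
print(f"shared variance, 26 filled points: {shared_variance_percent(0.898):.1f}%")
```

Under these assumptions the sign test stays above the 0.05 threshold, in line with the statement that 59% is not significantly more than 50%.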

Secondary structure classification

The overall impression resulting from Fig. 1 is that both structural and phylogenetic methods have some level of congruence, for a bit more than half of the secondary structure subelements, across all four 16S rRNA structural domains. Hence, for almost half of the secondary structure subelements, we do not know the accretion rank. A third independent method for estimating RNA history could improve the resolution of rRNA accretion ranks.

A method clustering RNA secondary structures found two main RNA secondary structure groups, one characterized by small, presumably ancient tRNA-like secondary structures, and a presumed more derived group, characterized by larger, rRNA-like secondary structures, including viruses91,92. The tRNA-like cluster was designated as tRNA-like because it included tRNAs. The decision to assume it is most ancient was not only based on the inclusion of tRNAs in that cluster. This cluster includes a high diversity of RNA types (viroids, ribozymes, tRNAs, replication origins, 5S rRNAs). Ancient groups tend to be more diverse because more time is available for “evolutionary radiation” (this term from species evolution might not be adequate in the context of RNA species). The same rationale was applied to functional tRNA species, ranking as most ancient those with the highest diversity of isoacceptor tRNAs93.

The decision to consider the other RNA cluster as rRNA-like was because this cluster included all subdomains of small and large rRNAs. Note that this clustering is phenotypic, based on secondary structure similarities, not phylogenetic. The assumption that tRNA-like structures are primitive, and that rRNA-like structures are more derived is in line with the tRNA-accretion hypothesis for rRNA formation. Results show that tRNA-like RNAs have few unpaired nucleotides within stems (bulges); for rRNA-like secondary structures, the proportion of bulges among all unpaired nucleotides is greater. Bulges are targets for regulation and enzymatic degradation, properties of advanced metabolism. In prebiotic conditions, these might be disadvantageous, increasing degradation risks.

Polarity of the tRNA-rRNA axis of RNA secondary structure evolution

This assumption about the evolutionary direction of secondary structures was tested explicitly on tRNAs from diverse organisms (organelles, Archaea, Bacteria, Eukaryota and Megavirales). First, similarities of all tRNAs from specific organisms with tRNA-like vs rRNA-like groups91 were estimated, projecting each tRNA secondary structure on a presumed tRNA-rRNA axis of RNA secondary structure evolution. Then correlations were calculated between the genetic code inclusion rank of the tRNA cognate amino acids94 and this tRNA-rRNA similarity score, expecting that tRNAs with relatively recent cognates have more rRNA-like secondary structures, and those with ancient cognates, are more typically tRNA-like. Results were overall positive (weakest in Eukaryota), confirming tRNA-rRNA polarity: two independent scales of evolutionary ranks, one for amino acids, and one for RNA secondary structures, converge56. Here again, polarity is not deduced from phylogenetic reconstructions, but from presumed orders of integration of the tRNA’s cognate amino acid.

Note that the phylogenetic and the structural methods also make polarity assumptions. In the former, these are deduced from cladistic parsimony principles95, in the latter, from structure: the more peripheral a structural element in the ribosome, the more recent, including information on stacking interactions among subdomains78,96. These results strengthen the hypothesis that tRNAs are ancestral and rRNAs derived.

Independent references for RNA evolution

The tRNA-rRNA evolutionary axis score is based on a sample of known RNA secondary structures. Hence, it suffers from sampling biases, and from some level of circularity: biological data are used to make inferences about biological phenomena, a caveat it shares with the phylogenetic method. A possible solution is to use as reference theoretical minimal RNA rings, a set of short sequences designed in silico according to a few basic constraints: the shortest possible sequence coding for a start and a stop codon, and once for each of the 20 biogenic amino acids.

These constraints define at most 25 circular RNA sequences of 22 nucleotides, which code according to partially overlapping codons, along three consecutive translation rounds, for a start codon, 20 different amino acids, and a stop codon. The stop codon is physically next to the start codon, closing the RNA ring. These RNA rings, mainly defined by coding sequences, resemble ancestral tRNAs97,98, with a predicted anticodon and its corresponding cognate amino acid for each RNA ring55.
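The coding constraints can be checked mechanically. The sketch below rests on several assumptions: the standard genetic code written in the DNA alphabet, ATG as the start codon, and the AB sequence string printed with Table 1, of which only the first 22 characters are used (the printed string has 24 characters and appears to repeat its first two nucleotides at the end). Function names are illustrative.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate_three_rounds(linear):
    """Translate a linearized ring read three consecutive times around."""
    tripled = linear * 3
    return [CODON_TABLE[tripled[i:i + 3]] for i in range(0, len(tripled) - 2, 3)]

def satisfies_ring_constraints(linear):
    """Start codon first, stop codon last, all 20 amino acids in between."""
    peptide = translate_three_rounds(linear)
    body = peptide[:-1]
    return (linear.startswith("ATG")   # first codon is the start codon
            and peptide[-1] == "*"     # last codon is a stop codon
            and "*" not in body        # no premature stop
            and len(set(body)) == 20)  # every amino acid is encoded

def coding_rotations(ring):
    """All linearizations of a circular sequence that satisfy the constraints."""
    rotations = (ring[i:] + ring[:i] for i in range(len(ring)))
    return [r for r in rotations if satisfies_ring_constraints(r)]

AB_PRINTED = "TATGAATGGTGCCATTCAAGACTA"  # string as printed with Table 1
ab_ring = AB_PRINTED[:22]                # assumption: one 22-nucleotide turn
print(coding_rotations(ab_ring))
```

With this input, a single rotation passes: its 22 overlapping codons start with ATG, end with a stop codon located next to the start codon on the ring, and cover the 20 amino acids, methionine being the only one read twice.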

The theoretical minimal RNA rings realistically mimic primitive RNAs and their evolution, along several coding properties99,100,101,102 and primary and secondary structure properties50,56,57. These properties coevolve with the genetic code integration order of the cognate amino acid matching the anticodon defined by homology of the RNA rings with ancestral tRNAs50,56,57,99,100,101,102. Considering that the design of RNA rings is purely rational and mainly based on the structure of the genetic code, this means that the genetic code’s structure intrinsically embeds information on the evolution of these various properties. However, we do not yet understand what determines these complex evolutionary trajectories.

Notably, the tRNA-rRNA scores obtained for secondary structures of these RNA rings, correlate, as observed for real tRNAs56, with the evolutionary ranks of integration of the cognate amino acids matching their predicted anticodons57. This parallels the result described in the previous section for regular tRNAs and the genetic code integration order of their cognate amino acid56. Here too, the polarity results from this order, not from phylogenetic reconstruction.

Working hypothesis and predictions

Hence, RNA rings are designed as proto-mRNAs but also have properties that are expected for proto-tRNAs. As plausible proto-tRNAs, they are used here as references for ancestral RNAs, in line with results of evolutionary analyses of their different properties50,56,57,99,100,101,102. Analyses use similarities between RNA ring secondary structures and those of structural subelements of 16S rRNAs. The method assumes that high similarities with RNA ring secondary structures indicate ancient 16S rRNA structural subelements, and low similarities recent ones. These similarities are then compared with accretion ranks produced by each of the phylogenetic and the structural hypotheses, expecting: 1. negative correlations if the different methods produce congruent accretion ranks; 2. that these correlations are most negative for RNA rings with ancient cognate amino acids, and gradually more positive for RNA rings with recent cognate amino acids.

Materials and methods

The quantification of similarities between secondary structures is identical to previous analyses56,57,91,92. Optimal secondary structures of spliced RNA rings were predicted by Mfold103.

Four secondary structure properties are extracted from secondary structures, as shown, as an example, for structural subelement h45 from the 16S rRNA of the bacterium Thermus thermophilus (Fig. 2): 1. the percentage of nucleotides in stems formed by complementary self-hybridization among nucleotides, %stem among all nucleotides in the sequence; 2. the percentage of nucleotides, among those in loops, that are in loops topping stems (external loops), as opposed to unpaired nucleotides forming bulges within stems (internal loops), %eloops; and the 3. stem and 4. loop GC contents, in percentages.

Figure 2

Secondary structure of domain IV (ochre, structural subelements h44 and h45) and part of domain III (pink, structural subelement h43) of 16S rRNA of Thermus thermophilus (adapted from http://rna.ucsc.edu/rnacenter/images/figs/thermus_16s_2ndry.jpg). Boundaries between secondary structure subelements are from Fig. 2 in59. Subelement h44 ranges from nucleotides 1397 to 1505. Its only external loop is from nucleotides 1450 to 1454. Sixty nucleotides are involved in stems (G-U included, C-A, U-C and G-A excluded and considered as internal bulges). Hence, a total of 41 nucleotides are considered unpaired, including the external loop. %stem = 100 × 60/101 = 59.4; %eloop = 100 × 4/41 = 9.8; %GCstem = 100 × 52/60 = 86.7; and %GCloops = 100 × 22/41 = 53.7.
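The four variables can be computed from any secondary structure given as a sequence plus dot-bracket notation. The following is a minimal sketch under stated assumptions: the dot-bracket string already encodes which pairs count as stems (G-U included, other mismatches left unpaired), external loops are unpaired runs closed directly by a single base pair, and all other unpaired nucleotides count as internal. Function names and the toy hairpin are hypothetical; the counts passed to `variables_from_counts` are those given in the Fig. 2 caption.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner, using a stack of open brackets."""
    stack, partner = [], {}
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def pct(part, whole):
    """Percentage, returning 0.0 when the denominator is empty."""
    return 100.0 * part / whole if whole else 0.0

def structure_variables(sequence, dot_bracket):
    """%stem, %eloop, %GCstem and %GCloop for one secondary structure."""
    partner = pair_table(dot_bracket)
    paired = sorted(partner)
    unpaired = [i for i in range(len(sequence)) if i not in partner]
    external = set()  # nucleotides in loops topping a stem
    for i, j in partner.items():
        if i < j and all(k not in partner for k in range(i + 1, j)):
            external.update(range(i + 1, j))
    gc = lambda positions: sum(sequence[p] in "GC" for p in positions)
    return {"%stem": pct(len(paired), len(sequence)),
            "%eloop": pct(len(external), len(unpaired)),
            "%GCstem": pct(gc(paired), len(paired)),
            "%GCloop": pct(gc(unpaired), len(unpaired))}

def variables_from_counts(stem, unpaired, eloop, gc_stem, gc_unpaired):
    """Same four variables, starting from nucleotide counts."""
    return {"%stem": pct(stem, stem + unpaired),
            "%eloop": pct(eloop, unpaired),
            "%GCstem": pct(gc_stem, stem),
            "%GCloop": pct(gc_unpaired, unpaired)}

# Hypothetical 14-nucleotide hairpin with one internal loop.
toy = structure_variables("GCAGCAAAAGCUGC", "((.((....)).))")
# Counts taken from the Fig. 2 caption.
caption = variables_from_counts(stem=60, unpaired=41, eloop=4,
                                gc_stem=52, gc_unpaired=22)
print({name: round(value, 1) for name, value in caption.items()})
```

Unpaired nucleotides at the free ends of a sequence are treated here as internal, which is a simplifying choice rather than a rule stated in the text.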

Similarity between two secondary structures is estimated by the Pearson correlation coefficient r between these four variables as obtained for each secondary structure (Fig. 3), in this case between values from Fig. 2 and those of secondary structures formed by two alternative splicings of RNA ring 25, also called AB53. Table 1 presents the four secondary structure variables for AB for all 22 alternative splicings of that RNA ring. Such data were obtained for all 25 RNA rings. Similar secondary structure data for 22 alternative splicings of RNA ring 13, called AL, were presented previously57 (Table 3 therein). For each comparison, Fig. 3 has four datapoints, one datapoint per secondary structure variable. For each datapoint, the X-axis is defined by the value obtained for the AB secondary structure, and the Y-axis by the value obtained for the corresponding variable for the 16S secondary structure subelement shown in Fig. 2. These pairings are not arbitrary: the x- and y-axis values are for the same secondary structure property, but for a different secondary structure (x-axis, RNA ring 25; y-axis, rRNA structural subelement, in this case h45 of Thermus thermophilus). Similarities are estimated by r: the more positive r, the more similar the secondary structures.

Figure 3

Similarity between secondary structure properties of structural subelement h45 of Thermus thermophilus 16S rRNA secondary structure and those of the secondary structure formed by AB (Table 1, secondary structures corresponding to splicings 7 and 19, filled and hollow symbols, respectively), as estimated by Pearson’s correlation coefficient r (note that r-squares are indicated in the figure). Each datapoint represents one of the four variables extracted from secondary structures; Y-axis values are from Fig. 2. Similarities with AB secondary structures for splicings 7 and 19 are r = 0.633 and r = −0.979, respectively. The latter is statistically significant at P < 0.05 (and indicates a stronger than random lack of similarity); the former indicates no similarity.
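A compact way to see how such a similarity is obtained is sketched below. The rRNA values are the four percentages computed in the Fig. 2 caption; the RNA ring values are hypothetical placeholders, since Table 1 values are not reproduced here. The P value uses a property of Student's t distribution with two degrees of freedom, which applies only because each comparison has exactly four datapoints: the two-tailed P then reduces to 1 − |r|.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def two_tailed_p_four_points(r):
    """Two-tailed P for a Pearson r from exactly four datapoints (2 df).

    With 2 degrees of freedom the t distribution has a closed-form CDF,
    and the two-tailed P value simplifies to 1 - |r|.
    """
    return 1.0 - abs(r)

# Order of variables: %stem, %eloop, %GCstem, %GCloop.
RRNA_SUBELEMENT = [59.4, 9.8, 86.7, 53.7]  # values from the Fig. 2 caption
RING_STRUCTURE = [54.5, 40.0, 50.0, 30.0]  # hypothetical RNA ring values

r_s = pearson_r(RING_STRUCTURE, RRNA_SUBELEMENT)
print(f"rS = {r_s:.3f}, two-tailed P = {two_tailed_p_four_points(r_s):.3f}")
```

Applied to the two values reported in Fig. 3, this gives P = 0.021 for r = −0.979 and P = 0.367 for r = 0.633, matching the statement that only the former is significant at P < 0.05.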

Table 1 Secondary structure variables extracted as explained in Fig. 2, for the 22 secondary structures formed by RNA ring 25 (AB, TATGAATGGTGCCATTCAAGACTA)53, according to 22 splicing positions. Splicing at position 1 corresponds to the splicing position producing highest homology with an ancestral tRNA55; each successive splicing of the RNA ring is shifted by a single nucleotide. These secondary structure data were used previously57,109,110,111.

The secondary structure variables were extracted for all secondary structure subelements of the small subunit rRNAs of six organisms: the 16S rRNAs of the bacterium Thermus thermophilus and the archaeon Sulfolobus solfataricus104 (Table 2), the 16S rRNAs of two further bacteria, Escherichia coli and Streptomyces coelicolor105 (Table 3), and the 18S rRNAs of two eukaryotes, Homo sapiens and Saccharomyces cerevisiae (Table 4). Secondary structures of the 16S rRNAs of Thermus thermophilus and Escherichia coli, and of the 18S rRNAs of Saccharomyces cerevisiae and Homo sapiens, are available at http://apollo.chemistry.gatech.edu/RibosomeGallery/.

Table 2 Variables extracted from secondary structures of 16S rRNA of Thermus thermophilus (a bacterium) and Sulfolobus solfataricus (an archaeon). Columns are: 1. secondary structure subelement of 16S rRNA; 2 and 3. accretion ranks according to phylogenetic and structural models, respectively59,78; 4–7 and 8–11. secondary structure variables as in Table 1 (explained in Figs. 2 and 3) for each of the two organisms. Domains range from 1. h1-h18; 2. h19-h27; 3. h28-h43; 4. h44-h45.
Table 3 Variables extracted from secondary structures of 16S rRNA of bacteria Escherichia coli and Streptomyces coelicolor. Columns 2–9 correspond to columns 4–11 in Table 2.
Table 4 Variables extracted from secondary structures of 18S rRNA of eukaryotes Homo sapiens and Saccharomyces cerevisiae. Columns 2–9 correspond to columns 4–11 in Table 2.

Step by step description of analyses

  1.

    There are 25 RNA rings, each 22 nucleotides long. These are considered according to the splicing matching homology with ancestral tRNAs, as shown previously (Table 1 in50,57,100,102 and Table 2 in101).

  2.

    Each RNA ring can be spliced at 22 positions, and a different optimal secondary structure (predicted by Mfold103) exists for RNA ring sequences spliced at each potential splicing position. The 25 RNA rings form 25 × 22 = 550 secondary structures.

  3.

    Four secondary structure variables are extracted from each of these 550 secondary structures. Table 1 presents as an example these four variables for the 22 alternative splicings of a specific RNA ring, RNA ring 25.

  4.

    For each of the (about 45) structural subelements of small rRNA subunits of the 6 examined organisms, the four secondary structure variables are extracted, as was done for the 550 RNA ring structures at step 3. These variables are presented for the approximately 6 × 45 = 270 secondary structure subelements in Tables 2–4.

  5.

    The secondary structures of RNA rings are compared to the secondary structures of rRNA structural subelements by analyses as presented in Fig. 3. These analyses plot the values obtained for each of the 4 secondary structure variables of an rRNA structural subelement as a function of the corresponding values obtained for a given RNA ring secondary structure. A Pearson correlation coefficient r, called rS, estimates similarities between rRNA and RNA ring secondary structures. Figure 3 presents comparisons between 16S rRNA subelement h45 of Thermus thermophilus and two RNA ring 25 secondary structures, one obtained by splicing that ring at position 7, and one at position 19.

    For each of the 550 RNA ring secondary structures, there are as many rS as there are rRNA secondary structure subelements, about 45.

  6.

    According to our hypothesis, the (about) 45 rSs comparing a given RNA ring secondary structure to all rRNA structural subelements are potential estimates of the accretion order of the rRNA secondary structures.

  7.

    These rSs are compared to the accretion order of the rRNA secondary structure subelements, as determined by other methods and published by other authors (separately for the cladistic and the structural accretion ranks). This comparison is done by calculating the Pearson correlation coefficient between the rS and the accretion orders, producing rH, one for the cladistic method, rHphyl, and one for the structural method, rHstru. Note that rS are z-transformed before calculating rH, using Fisher's formula z = 0.5 × ln((1 + r)/(1 − r)). The z transformation linearizes the scale of r, which is not linear.

  8.

    Hence, each of the 550 RNA ring secondary structures produces one rHphyl and one rHstru per organism. For each organism, there are 550 rHphyls and 550 rHstrus. The minimal and maximal rHphyls and rHstrus for each organism are in Table 5. Table 5 includes percentages of negative rHphyls and rHstrus (the working hypothesis expects negative rHs), and numbers of negative and positive rHphyls and rHstrus that have two tailed P < 0.05.

    Table 5 Most negative and most positive Pearson correlation coefficients r (x100) (rH) between accretion ranks according to phylogenetic (rHphyl) and structural (rHstru) models and secondary structure similarities with RNA rings for 16S rRNAs of six organisms, and percentages of rHs (%neg) that are negative, as expected by the working hypothesis, among the 550 correlations calculated for each of rHphyl and rHstru, for each organism. * indicates statistically significant differences (P < 0.05) from 50% (550/2 = 275 negative rHphyl and rHstru are expected if the sign of rH has an unbiased distribution between negative and positive trends) according to a chi-square test. “Co” indicates the cognate amino acid corresponding to the anticodon of the RNA ring(s) producing these correlations. Cognate G always corresponds to RNA ring 25 (AB). N indicates numbers of datapoints involved in the calculation of rH correlation coefficients.
  9.

    For any given RNA ring secondary structure, there are 6 rHphyls and 6 rHstrus, because analyses were done for 6 organisms. There are in total 6 × 550 = 3300 rHphyls and 3300 rHstrus. Further analyses describe general patterns within these data, according to RNA rings, and according to splicing positions.

  10.

    For each RNA ring, there are 22 secondary structures which produce 22 rHphyls and 22 rHstrus per organism, hence 6 × 22 = 132 rHphyls and 132 rHstrus across all 6 organisms. An alternative way to explain this is: for each of the 25 RNA rings, there are 3300/22 = 132 rHphyls and 132 rHstrus across all 6 organisms.

    Percentages of negative rHphyls and rHstrus for each RNA ring (calculated among the 132 rHphyls and among the 132 rHstrus, pooling all organisms) are used in the y axis of Fig. 4.

    Figure 4

    Percentage of negative Pearson correlation coefficients r between accretion ranks (phylogenetic method, filled symbols; structural method, hollow symbols) and similarities between 16S rRNA and RNA ring secondary structures, r’s pooled across organisms and secondary structures formed by the 22 alternative splicing of each RNA ring, as a function of the genetic code integration order of the RNA ring’s predicted cognate amino acid according to Davis’s hypothesis on N-fixing amino acids105. The working hypothesis expects negative r’s in particular for ancient amino acids.

  11.

    There are 25 RNA rings. Hence, for a given splicing position, there are 25 rHphyls and 25 rHstrus per organism. Pooling these data across 6 organisms, for any given splicing position, there are 6 × 25 = 150 rHphyls and 150 rHstrus. Percentages of negative rHphyls and rHstrus for each splicing position, calculated from these 150 rHphyls and 150 rHstrus, form the y axis in Fig. 5.

    Figure 5

    Percentage of negative Pearson correlation coefficient r between accretion ranks (phylogenetic method, filled symbols; structural method, hollow symbols) and similarities between 16S rRNA and RNA ring secondary structures, r’s pooled across organisms and RNA rings, as a function of the splicing position of the RNA ring. The splicing position with the highest percentage of negative correlations is position “1”, which corresponds to the splicing that produces the best homology between RNA rings and ancestral tRNAs57.

Analyses in Table 5, Figs. 4 and 5 each take into consideration all 3300 rHphyls and 3300 rHstrus. Hence, these are not biased representations of the data. They show separately effects of each ‘treatment factor’ (organism, RNA ring, splicing position) on each rHphyl and rHstru.
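Steps 5 to 11 can be summarized as a short sketch. It assumes the standard Fisher z-transformation, z = 0.5 × ln((1 + r)/(1 − r)), clamps r slightly inside (−1, 1) to keep the logarithm finite (a choice made here, not stated in the text), and uses hypothetical toy values throughout; function names are illustrative.

```python
from math import log, sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fisher_z(r, limit=0.999999):
    """Fisher z-transformation, with r clamped to avoid infinite values."""
    r = max(-limit, min(limit, r))
    return 0.5 * log((1 + r) / (1 - r))

def r_h(ring_variables, subelement_variables, accretion_ranks):
    """Correlation between z-transformed similarities (rS) and accretion ranks."""
    z_scores = [fisher_z(pearson_r(ring_variables, sub))
                for sub in subelement_variables]
    return pearson_r(z_scores, accretion_ranks)

def percent_negative(values):
    """Percentage of correlations that are negative."""
    return 100.0 * sum(v < 0 for v in values) / len(values)

# Bookkeeping from steps 2 and 8 to 11.
N_RINGS, N_SPLICINGS, N_ORGANISMS = 25, 22, 6
N_STRUCTURES = N_RINGS * N_SPLICINGS       # 550 RNA ring secondary structures
N_RH_TOTAL = N_STRUCTURES * N_ORGANISMS    # 3300 rH per accretion method
N_RH_PER_RING = N_SPLICINGS * N_ORGANISMS  # 132 rH per RNA ring
N_RH_PER_SPLICING = N_RINGS * N_ORGANISMS  # 150 rH per splicing position

# Hypothetical toy data: one ring structure, four subelements, ranks 1 to 4.
toy_ring = [50, 40, 60, 30]
toy_subelements = [[52, 38, 63, 33], [45, 20, 70, 50],
                   [30, 60, 40, 55], [20, 10, 80, 45]]
toy_rh = r_h(toy_ring, toy_subelements, [1, 2, 3, 4])
print(f"toy rH = {toy_rh:.3f}")
```

In the toy data the subelement most similar to the ring has the lowest accretion rank, so rH comes out negative, which is the direction the working hypothesis expects.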

Results and discussion

There are 25 theoretical minimal RNA rings. Each has exactly 22 nucleotides, hence each RNA ring has 22 alternative splicing positions. Different splicings produce different sequences forming different secondary structures, as shown for RNA ring 25, AB, in Table 1, and previously for RNA ring 9 (Table 1 in106) and RNA ring 13 (Table 3 in57). Hence, there are 25 × 22 = 550 secondary structures to which secondary structure subelements of the 16S rRNAs can be compared.
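Enumerating the alternative splicings of a ring amounts to opening the circular sequence at each position. The sketch below assumes that the first 22 characters of the AB string printed with Table 1 represent one full turn of the ring; the numbering of splicing positions relative to tRNA homology is not reproduced, so index 0 is arbitrary.

```python
def alternative_splicings(ring):
    """All linear sequences obtained by opening a circular sequence at each position."""
    return [ring[i:] + ring[:i] for i in range(len(ring))]

AB_RING = "TATGAATGGTGCCATTCAAGACTA"[:22]  # assumption: one turn of ring AB
splicings = alternative_splicings(AB_RING)

print(len(splicings), "linear sequences, each", len(splicings[0]), "nucleotides long")
print("structures for 25 rings:", 25 * len(splicings))
```

Each linear sequence would then be passed to a folding program such as Mfold to predict its optimal secondary structure.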

The secondary structure variables shown in Table 1 and Figs. 2 and 3 were extracted for each of these 550 RNA ring secondary structures and are compared, as shown in Fig. 3, with the corresponding secondary structure variables of all secondary structure subelements of all six organisms considered here.

Table 5 shows the most negative and the most positive rH correlations between secondary structure similarities and accretion ranks according to the phylogenetic and the structural method for each of the six organisms (rHphyl and rHstru, respectively). Similarities (rS) were between the secondary structure variables described in Tables 2–4 and corresponding variables for the secondary structures formed by each of the 22 alternative splicings of each of the 25 theoretical minimal RNA rings. Considering that the main prediction of the working hypothesis expects negative correlations, it is notable that in each organism, the absolute value of the most negative correlation is larger than that of the most positive correlation, except for one among 12 comparisons, according to the structural method, for Sulfolobus solfataricus.

Similarly, percentages of negative correlations are in all organisms, for both rHphyl and rHstru, always greater than 50%, significantly so according to a chi-square test in all but three among 12 tests, rHphyl in Homo, rHstru in yeast and in Sulfolobus. In addition, percentages of negative rHs are significantly greater for rHphyl than rHstru within three among six species, Sulfolobus, Streptomyces and yeast. In Homo, percentages of negative rHstru were significantly greater than percentages of negative rHphyl. The overall pattern is that results match the working hypothesis, and this more for rHphyl than rHstru. The opposite occurs in Homo. This could be interpreted as due to recent evolution of small rRNA structure in that species, but would require additional analyses and data from other species.
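The test against 50% can be sketched as a chi-square goodness-of-fit test with one degree of freedom, for which the P value has a closed form through the complementary error function. The count of 310 negative correlations is a hypothetical example, not a value from Table 5, and the absence of a continuity correction is an assumption.

```python
from math import erfc, sqrt

def chi_square_vs_half(n_negative, n_total=550):
    """Chi-square statistic and P value for H0: half of the rH are negative."""
    expected = n_total / 2
    n_positive = n_total - n_negative
    chi2 = ((n_negative - expected) ** 2 + (n_positive - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_square_vs_half(310)  # hypothetical count of negative rH
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Under these assumptions, a departure of at least 23 from the expected 275, that is 298 or more negative correlations among 550 (about 54.2%), is needed to reach P < 0.05.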

A second noteworthy point is that the most positive correlations are in 7 among 12 cases with RNA ring 2, which has a predicted anticodon for a stop codon, coding sometimes for selenocysteine. This is presumably one of the latest amino acids integrated in the genetic code (21st). This result fits the prediction that the most positive correlations between accretion ranks and secondary structure similarities would correspond to RNA rings with recent cognates. In other words, these secondary structures would not be references for initial RNAs starting the accretion process, but for the latest RNAs in the accretion process.

Figure 4 plots percentages of negative r’s between accretion ranks and secondary structure similarities between small rRNA subelements and RNA ring secondary structures, pooling all organisms and alternative splicings of RNA rings. Patterns confirm several points: 1. Most correlations between accretion ranks and secondary structure similarities are negative as expected by the working hypothesis, for most RNA rings; 2. In most cases, there are more negative correlations for the phylogenetic than the structural method for reconstructing accretion ranks; 3. Percentages of negative correlations decrease with the genetic code integration order of the cognate amino acid of RNA rings (see above comments for RNA ring with selenocysteine as predicted cognate).

Figure 5 presents the percentages of negative r’s between accretion ranks and secondary structure similarities between small rRNA subelements and RNA ring secondary structures, pooling all organisms and RNA rings, as a function of RNA ring splicing position. Results show that correlations are most frequently negative, that is, they fit the working hypothesis best, when RNA rings are spliced at position “1”. This is the position defined by the highest homology between the RNA ring and an ancestral tRNA55. This observation is also in line with the working hypothesis that RNA rings are proto-tRNAs, and that accretion of proto-tRNAs, tRNAs and tRNA-like RNAs formed rRNAs. Note that the assumption that RNA rings are proto-tRNAs is under debate107. Nevertheless, and apparently confirming this status of proto-tRNAs, pseudo-phylogenetic analyses of RNA ring sequences reveal two clusters of RNA rings, one coinciding with RNA rings whose presumed cognate amino acid is the cognate of tRNAs for which the tRNA acceptor stem includes a primitive code108.

Particularly noteworthy is that results of analyses presented here for the small rRNA subunit are in line with results obtained for the large rRNA subunit106. These analyses compared structural subelements of the large rRNA subunit with the same RNA ring secondary structures as those used here. As described here for the small rRNA subunit, comparisons of the large rRNA subunit with RNA ring secondary structures show that results: a. are slightly more congruent with the phylogenetic than the structural method; b. are strongest for comparisons with RNA rings with predicted ancient cognate amino acids; c. are weakest for comparisons with RNA rings with predicted recent cognate amino acids.

Conclusions

Results strongly corroborate the working hypothesis that tRNA accretions formed rRNAs. They show that RNA rings are likely proto-tRNAs, and that these are good reference points for primitive RNAs in general, and tRNAs in particular. Results confirm that RNA ring cognates are good estimates for RNA ring evolutionary ranks, and that similarities between secondary structures bear information on evolutionary direction of RNA secondary structures, from tRNA to rRNA-like, also among rRNA structural subelements. This has been suggested by several previous lines of analyses presented in the Introduction10,11,12,15,16,17,18,19,20,21,56,57,91,92, expanding upon evidence for common origins for tRNAs and rRNAs1,2,3,4,7,8,9. Analyses presented here for the small rRNA subunit show greater congruence between accretion orders derived from the secondary structure method used here and the phylogenetic method than between the former and the structural method. Similar analyses done for the large rRNA subunit produce qualitatively similar results, independently confirming our method and evolutionary conclusions. Overall, both phylogenetic and structural methods produce accretion orders that are congruent with the secondary structure method applied through the tRNA-rRNA axis of RNA secondary structure evolution. It is probable that the structural methods are more prone to errors due to evolutionary convergences than the phylogenetic method, though convergences remain the main difficulty in reconstructing evolution.