Comparisons between small ribosomal RNA and theoretical minimal RNA ring secondary structures confirm phylogenetic and structural accretion histories

Ribosomal RNAs are complex structures that presumably evolved by tRNA accretions. Statistical properties of tRNA secondary structures correlate with genetic code integration orders of their cognate amino acids. Ribosomal RNA secondary structures resemble those of tRNAs with recent cognates. Hence, rRNAs presumably evolved from ancestral tRNAs. Here, analyses compare secondary structure subcomponents of small ribosomal RNA subunits with secondary structures of theoretical minimal RNA rings, presumed proto-tRNAs. Two independent methods determined different accretion orders of rRNA structural subelements: (a) classical comparative homology and phylogenetic reconstruction, and (b) a structural hypothesis assuming an inverted onion ring growth where the three-dimensional ribosome’s core is most ancient and peripheral elements most recent. Comparisons between (a) and (b) accretions orders with RNA ring secondary structure scales show that recent rRNA subelements are: 1. more like RNA rings with recent cognates, indicating ongoing coevolution between tRNA and rRNA secondary structures; 2. less similar to theoretical minimal RNA rings with ancient cognates. Our method fits (a) and (b) in all examined organisms, more with (a) than (b). Results stress the need to integrate independent methods. Theoretical minimal RNA rings are potential evolutionary references for any sequence-based evolutionary analyses, independent of the focal data from that study.

. Accretion rank of 16S rRNA structural subelements according to the structural onion model (periphery most recent 78 ranks therein from Fig. 2) as a function of accretion rank according to the phylogenetic method ( 59 , ranks are therein from the phylogenies for 16S secondary structure elements in the Fig. 2 and in their corresponding supplementary figure). Accretion ranks are divided by the highest rank according to that method (structural, 27; phylogeny, 39), then multiplied by 100. Full symbols indicate structural subelements for which the absolute value of the difference between accretion ranks (divided by maximal ranks) is <25, hollow symbols have differences >25. Considering all 44 datapoints, the correlation between the two methods is r = 0.308, P = 0.021, meaning that 9.5% of the variation is common between methods (a,b); for the 26 filled symbols, r = 0.898, P = 0, 80.6% of the variation is common. Hence methods (a,b) are congruent for 26/44 × 100 = 59% of the structural subelements.

Secondary structure classification
The overall impression resulting from Fig. 1 is that both structural and phylogenetic methods have some level of congruence, for a bit more than half of the secondary structure subelements, across all four 16S rRNA structural domains. Hence, for almost half of the secondary structure subelements, we do not know the accretion rank. A third independent method for estimating RNA history could improve the resolution of rRNA accretion ranks.
A method clustering RNA secondary structures found two main RNA secondary structure groups, one characterized by small, presumably ancient tRNA-like secondary structures, and a presumed more derived group, characterized by larger, rRNA-like secondary structures, including viruses 91,92 . The tRNA-like cluster was independent references for RnA evolution The tRNA-rRNA evolutionary axis score is based on a sample of known RNA secondary structures. Hence, it suffers from sampling biases, and from some level of circularity: biological data are used to infer on biological phenomena, a caveat it shares with the phylogenetic method. A possible solution to this is to use as reference theoretical minimal RNA rings, a set of short sequences designed in silico according to few basic constraints: the shortest possible sequence coding for a start and a stop codon, and once for each of the 20 biogenic amino acids.
These constraints define at most 25 circular RNA sequences of 22 nucleotides, which code according to partially overlapping codons, along three consecutive translation rounds, for a start codon, 20 different amino acids, and a stop codon. The stop codon is physically next to the start codon, closing the RNA ring. These RNA rings, mainly defined by coding sequences, resemble ancestral tRNAs 97,98 , with a predicted anticodon and its corresponding cognate amino acid for each RNA ring 55 .
The theoretical minimal RNA rings realistically mimic primitive RNAs and their evolution, along several coding properties [99][100][101][102] and primary and secondary structure properties 50,56,57 . These properties coevolve with the genetic code integration order of the cognate amino acid matching the anticodon defined by homology of the RNA rings with ancestral tRNAs 50,56,57,[99][100][101][102] . Considering that the design of RNA rings is purely rational and mainly based on the structure of the genetic code, this means that the genetic code's structure intrinsically embeds information on the evolution of these various properties. However, we do not yet understand what determines these complex evolutionary trajectories.
Notably, the tRNA-rRNA scores obtained for secondary structures of these RNA rings, correlate, as observed for real tRNAs 56 , with the evolutionary ranks of integration of the cognate amino acids matching their predicted anticodons 57 . This parallels the result described in the previous section for regular tRNAs and the genetic code integration order of their cognate amino acid 56 . Here too, the polarity results from this order, not from phylogenetic reconstruction.

Working hypothesis and predictions
Hence, RNA rings are designed as proto-mRNAs but have also properties that are expected for proto-tRNAs. As plausible proto-tRNAs, they are used here as references for ancestral RNAs, in line with results of evolutionary analyses of their different properties 50,56,57,[99][100][101][102] . Analyses use similarities between RNA ring secondary structures and those of structural subelements of 16S rRNAs. The method assumes that high similarities with RNA ring secondary structures indicate ancient structural subelements, and low similarities recent 16S rRNA structural subelements. These similarities are then compared with accretion ranks produced by each of the phylogenetic and the structural hypotheses, expecting: 1. negative correlations if the different methods are producing congruent accretion ranks; 2. these correlations should be most negative for RNA rings with ancient cognate amino acids, and gradually be more positive for RNA rings with recent cognate amino acids.

Materials and methods
The quantification of similarities between secondary structures is identical to previous analyses 56,57,91,92 . Optimal secondary structures of spliced RNA rings were predicted by Mfold 103 .
Four secondary structure properties are extracted from secondary structures, as shown as example for structural subelement h45 from the archaean Thermus thermophilus 16S rRNA (Fig. 2): 1. the percentage of nucleotides in stems formed by complementary self-hybridization among nucleotides, %stem among all nucleotides in the sequence; 2. the percentage of nucleotides, among those in loops, that are in loops topping stems (external loops), as opposed to unpaired nucleotides forming bulges within stems (internal loops), %eloops; and the 3. stem and 4. loop GC contents, in percentages.
Similarities between two secondary structure pairs are estimated by Pearson correlation coefficients r between these four variables as obtained for each secondary structure (Fig. 3), in this case between values from Fig. 2 and those of secondary structures formed by two alternative splicings of RNA ring 25, also called AB 53 . Table 1 presents the four secondary structure variables for AB for all 22 alternative splicings of that RNA ring. Such data were obtained for all 25 RNA rings. Similar secondary structure data for 22 alternative splicings of RNA ring 13, called AL, were presented previously 57 , (therein Table 3). For each comparison, Fig. 3 has four datapoints for each secondary structure, one datapoint per secondary structure variable. For each datapoint, the X-axis is defined by the value obtained for the AB secondary structure, and the Y-axis by the value obtained for the corresponding variable for the 16S secondary structure subelement shown in Fig. 2. These pairings are not arbitrary: the x-and y-axis values are for the same secondary structure property, but for a different secondary structure (x-axis, RNA ring 25; y-axis, rRNA structural subelement, in this case h45 of Thermus thermophilus). Similarities are estimated by r, the more positive r, the more similar the secondary structures.
The secondary structure variables of all secondary structure subelements of two Archaea, Thermus thermophilus and Sulfolobus solfataricus 104 (Table 2), two bacteria, Escherichia coli and Streptomyces coelicolor 105 (Table 3), and the 18S rRNA of two eukaryotes, Homo sapiens and Saccharomyces cerevisiae (Table 4). Secondary rRNA structures for prokaryote 16S of Thermus thermophilus, Escherichia coli, and eukaryote 18S Saccharomyces cerevisiae and Homo sapiens are available at http://apollo.chemistry.gatech.edu/RibosomeGallery/.
Step by step description of analyses. 1 Table 1 presents as an example these four variables for the 22 alternative splicings of a specific RNA ring, RNA ring 25.  www.nature.com/scientificreports www.nature.com/scientificreports/ secondary structure variables of a rRNA structural subelement as a function of the corresponding values obtained for a given RNA ring secondary structure. A Pearson correlation coefficient r, called rS, estimates similarities between rRNA and RNA ring secondary structures. Figure 3 presents comparisons between 16S rRNA subelement h45 of Thermus thermophilus and two RNA ring 25 secondary structures, one obtained by splicing that ring at position 7, and one at position 19. For each of the 550 RNA ring secondary structures, there are as many rS as there are rRNA secondary structure subelements, about 45. 6. According to our hypothesis, the (about) 45 rSs comparing a given RNA ring secondary structure to all rRNA structural subelements are potential estimates of the accretion order of the rRNA secondary structures. 7. These rSs are compared to the accretion order of the rRNA secondary structure subelements, as these were determined by other methods and published by other authors (separately for each cladistic and structural accretion ranks). This comparison is done by calculating the Pearson correlation coefficient between the rS and the accretion orders, producing rH, one for the cladistic method, rHphyl, and one for the structural method, rHstru. Note that rS are z-transformed before calculating rH using the formula z = −ln((1 + r)/ (1 − r)). The z transformation linearizes the scale of r, which is not linear. 8. Hence, each of the 550 RNA ring secondary structures produces one rHphyl and one rHstru per organism.
For each organism, there are 550 rHphyls and 550 rHstrus. The minimal and maximal rHphyls and rHstrus for each organism are in Table 5. Table 5 includes percentages of negative rHphyls and rHstrus (the working hypothesis expects negative rHs), and numbers of negative and positive rHphyls and rHstrus that have two tailed P < 0.05. 9. For any given RNA ring secondary structure, there are 6 rHphyls and 6 rHstrus, because analyses were done for 6 organisms. There are in total 6 × 550 = 3300 rHphyls and 3300 rHstrus. Further analyses describe general patterns within these data, according to RNA rings, and according to splicing positions.     www.nature.com/scientificreports www.nature.com/scientificreports/ as shown for RNA ring 25, AB, in Table 1, and previously for RNA rings 9 106 , (therein Table 1) and 13 57 , (therein Table 3). Hence, there are 25 × 22 = 550 secondary structures to which secondary structure subelements of the 16S rRNAs can be compared.

Results and discussion
The secondary structure variables shown in Table 1 and Figs. 2 and 3 were extracted for each of these 550 RNA ring secondary structures and are compared, as shown in Fig. 3, with the corresponding secondary structure variables of all secondary structure subelements of all six organisms considered here. Table 5 shows the most negative and the most positive rH correlations between secondary structure similarities and accretion ranks according to the phylogenetic and the structural method for each of the six organisms (rHphyl and rHstru, respectively). Similarities (rS) were between the secondary structure variables described in Tables 2-4 and corresponding variables for the secondary structures formed by each of the 22 alternative splicings of each of the 25 theoretical minimal RNA rings. Considering that the main prediction of the working hypothesis expects negative correlations, it is notable that in each organism, the absolute values of the negative correlation is larger than the absolute value of the positive correlation, besides for one among 12 comparisons, according to the structural method, for Sulfolobus solfataricus.
Similarly, percentages of negative correlations are in all organisms, for both rHphyl and rHstru, always greater than 50%, significantly so according to a chi-square test in all but three among 12 tests, rHphyl in Homo, rHstru  Table 5. Most negative and most positive Pearson correlations coefficients r (x100) (rH) between accretion ranks according to phylogenetic (rHphyl) and structural (rHstru) models with secondary structure similarities with RNA rings for 16S rRNAs of six organisms, and percentages of rHs (%neg) that are negative as expected by the working hypothesis among the 550 correlation calculated for each rHphyl and rHstru, for each organism. * indicates statistically significant differences (P < 0.05) from 50% (550/2 = 275 negative rHphyl and rHstru are expected if the sign of rH has an unbiased distribution between negative and positive trends) according to a chi-square test. "Co" indicates the cognate amino acid corresponding to the anticodon of the RNA ring(s) producing these correlations. Cognate G always corresponds to RNA ring 25 (AB). N indicates numbers of datapoints involved in the calculation of rH correlation coefficients. www.nature.com/scientificreports www.nature.com/scientificreports/ in yeast and in Sulfolobus. In addition, percentages of negative rHs are significantly greater for rHphyl than rHstru within three among six species, Sulfolobus, Streptomyces and yeast. In Homo, percentages of negative rHstru were significantly greater than percentages of negative rHphyl. The overall pattern is that results match the working hypothesis, and this more for rHphyl than rHstru. The opposite occurs in Homo. This could be interpreted as due to recent evolution of small rRNA structure in that species, but would require additional analyses and data from other species.
A second noteworthy point is that the most positive correlations are in 7 among 12 cases with RNA ring 2, which has a predicted anticodon for a stop codon, coding sometimes for selenocysteine. This is presumably one of the latest amino acids integrated in the genetic code (21 st ). This result fits the prediction that the most positive correlations between accretion ranks and secondary structure similarities would correspond to RNA rings with recent cognates. In other words, these secondary structures would not be references for initial RNAs starting the accretion process, but for the latest RNAs in the accretion process. Figure 4 plots percentages of negative r's between accretion ranks and secondary structure similarities between small rRNA subelements and RNA ring secondary structures, pooling all organisms and alternative splicings of RNA rings. Patterns confirm several points: 1. Most correlations between accretion ranks and secondary structure similarities are negative as expected by the working hypothesis, for most RNA rings; 2. In most cases, there are more negative correlations for the phylogenetic than the structural method for reconstructing accretion ranks; 3. Percentages of negative correlations decrease with the genetic code integration order of the cognate amino acid of RNA rings (see above comments for RNA ring with selenocysteine as predicted cognate). Figure 5 presents the percentages of negative r's between accretion ranks and secondary structure similarities between small rRNA subelements and RNA ring secondary structures, pooling all organisms and RNA rings, as a function of RNA ring splicing position. Results show that correlations are most frequently negative, meaning fitting the working hypothesis, when RNA rings are spliced at position "1". This is the position defined by the highest homology between the RNA ring and an ancestral tRNA 55 . This observation is also in line with the working hypothesis that RNA rings are proto-tRNAs, and that accretion of proto-tRNAs, tRNAs and tRNA-like RNAs formed rRNAs. Note that the assumption that RNA rings are proto-tRNAs is under debate 107 . Nevertheless, and apparently confirming this status of proto-tRNAs, pseudo-phylogenetic analyses of RNA ring sequences reveal two clusters of RNA rings, one coinciding with RNA rings whose presumed cognate amino acid is the cognate of tRNAs for which the tRNA acceptor stem includes a primitive code 108 .
Particularly noteworthy is that results of analyses presented here for the small rRNA subunit are in line with results obtained for the large rRNA subunit 106 . These analyses compared structural subelements of the large rRNA subunit with the same RNA ring secondary structures as those used here. As described here for the small rRNA, for the large rRNA subunit, comparisons with RNA ring secondary structures show that: a. are slightly more congruent with the phylogenetic than the structural method; b. results are strongest for comparisons with RNA rings with predicted ancient cognate amino acids; c. weakest for comparisons with RNA rings with predicted recent cognate amino acids.

conclusions
Results are strong corroboration of the working hypothesis that tRNA accretions formed rRNAs. They show that RNA rings are likely proto-tRNAs, and that these are good reference points for primitive RNAs in general, and tRNAs in particular. Results confirm that RNA ring cognates are good estimates for RNA ring evolutionary ranks, and that similarities between secondary structures bear information on evolutionary direction of RNA secondary Figure 5. Percentage of negative Pearson correlation coefficient r between accretion ranks (phylogenetic method, filled symbols; structural method, hollow symbols) and similarities between 16S rRNA and RNA ring secondary structures, r's pooled across organisms and RNA rings, as a function of the splicing position of the RNA ring. The splicing position with the highest percentage of negative correlations is position "1", which corresponds to the splicing that produces the best homology between RNA rings and ancestral tRNAs 57 .
structures, from tRNA to rRNA-like, also among rRNA structural subelements. This has been suggested by several previous lines of analyses presented in the Introduction [10][11][12][15][16][17][18][19][20][21]56,57,91,92 , expanding upon evidences for common origins for tRNAs and rRNAs [1][2][3][4][7][8][9] . Analyses presented here for the small rRNA subunit show greater congruence between accretion orders derived from the secondary structure method used here and the phylogenetic method than between the former and the structural method. Similar analyses done for the large rRNA subunit produce qualitatively similar results, independently confirming our method and evolutionary conclusions. Overall, both phylogenetic and structural methods produce accretion orders that are congruent with the secondary structure method applied through the tRNA-rRNA axis of RNA secondary structure evolution. It is probable that the structural methods are more prone to errors due to evolutionary convergences than the phylogenetic method, though convergences remain the main difficulty in reconstructing evolution.