The 16 S rRNA sequence has long been used uncritically as a molecular clock to infer phylogenetic relationships among prokaryotes without fully elucidating the evolutionary changes that this molecule undergoes. In this study, we investigated the functional evolvability of 16 S rRNA, using comparative RNA function analyses between the 16 S rRNAs of Escherichia coli (Proteobacteria) and Acidobacteria (78% identity, 334 nucleotide differences) in the common genetic background of E. coli. While the growth phenotype of an E. coli mutant harboring the acidobacterial gene was disrupted significantly, it was restored almost completely following introduction of a 16 S rRNA sequence with a single base-pair variation in helix 44; the remaining 332 nucleotides were thus functionally similar to those of E. coli. Our results suggest that 16 S rRNAs share an inflexible cradle structure formed by ribosomal proteins and have evolved by accumulating species-specific yet functionally similar mutations. While this experimental evidence suggests the neutral evolvability of 16 S rRNA genes and hence satisfies the necessary requirements to use the sequence as a molecular clock, it also implies the promiscuous nature of the 16 S rRNA gene, i.e., the occurrence of horizontal gene transfer among bacteria.
The ribosome is an RNA-centered and highly complex ribonucleoprotein particle, in which mRNA-encoded genetic information is translated into proteins1, 2. As this biological process is essential to sustain cellular activities, the ribosome is found in every type of cellular organism, including bacteria, archaea, and eukaryotes.
Carl Woese first proposed that the systematic comparison of small subunit ribosomal RNA (SSU rRNA: 16 S rRNA for prokaryotes and 18 S rRNA for eukaryotes) sequences from different organisms would make it possible to infer the evolutionary relationships among organisms in the form of phylogenetic trees3,4,5. Recursive phylogenetic mapping of each newly discovered (micro-) organism onto an already constructed phylogenetic tree has progressively established a persuasive global tree of life4, 6,7,8,9, and this pictorial concept has penetrated deeply into the field of biology as a consensus view on the way organisms have evolved3. Having stood the test of time for over 30 years, this SSU rRNA sequence-based approach remains the gold standard method to infer organismal phylogeny.
However, because decisive fossil records, by which any molecular clock should be validated or calibrated, are not available for prokaryotes, a strong theoretical basis, such as the existence of neutral evolvability or the exclusion of horizontal gene transfer (HGT) events, should be established before a specific gene is used as a molecular clock. Currently, we lack evidence for both of these aspects and therefore cannot answer simple questions such as whether 16 S rRNA has evolved neutrally (without interacting with ribosomal proteins) or whether the complexity hypothesis10 applies to 16 S rRNA, which is a representative “informational gene” product, to prevent HGT within the context of the highly complex ribosomal particle1, 2 of each organism. To address the molecular evolvability of 16 S rRNA, we propose a new methodology named comparative RNA function (CRF) analysis, in which the functional similarity or dissimilarity of 16 S rRNAs is analyzed experimentally, focusing exclusively on the functional effects of mutations that accumulate differently in the respective rRNAs during speciation.
In this study, using the Escherichia coli ribosomal system as a common platform, we conducted a CRF analysis of the 16 S rRNAs of E. coli and Acidobacteria, which are phylogenetically distinct at the phylum level, with only 78% identity (or 334 nucleotide differences out of a total of ~1,500 nucleotides). Our experimental results revealed that 332 (99.4%) of the differing nucleotides in the acidobacterial 16 S rRNA gene were functionally similar to those of E. coli in the E. coli genetic background, providing strong evidence that the primitive 16 S rRNAs were held by a common framework of ribosomal proteins and then accumulated lineage-specific neutral mutations during evolution. While this functional similarity in distantly related 16 S rRNAs seemingly supports the use of these sequences as a reliable clock, it also suggests a unique evolutionary characteristic of the gene: 16 S rRNA is quite amenable to HGT, which highlights its promiscuous nature.
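The two percentages quoted above follow from simple ratios, which can be checked with a short sketch. The denominator used for sequence identity (1542, the 16SEco length given later in the text) is an assumption made here for illustration; the reported identity comes from a sequence alignment, whose exact length is not stated in this section.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

TOTAL_DIFFERENCES = 334   # nucleotide differences between 16SEco and 16SNS11 (from the text)
DELETERIOUS = 2           # the single deleterious base pair (two nucleotides)
ECO_LENGTH = 1542         # assumed denominator: length of 16SEco, not the alignment length

# Share of differing nucleotides that behaved as functionally similar.
neutral_share = percent(TOTAL_DIFFERENCES - DELETERIOUS, TOTAL_DIFFERENCES)

# Rough identity estimate; the alignment-based value may differ slightly.
approx_identity = percent(ECO_LENGTH - TOTAL_DIFFERENCES, ECO_LENGTH)

print(f"functionally similar differences: {neutral_share:.1f}%")
print(f"approximate sequence identity:    {approx_identity:.0f}%")
```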
Functional compatibility of 16 S rRNA across bacterial phyla
To obtain 16 S rRNAs that are evolutionarily distinct yet functional in E. coli, we screened a metagenomic library of 16 S rRNA genes in the E. coli strain KT105 (the null mutant of all seven rrn operons in the genome), following a previously described procedure11 with modifications, as described in the Supplementary Materials and Methods. We identified a clone of the 16 S rRNA gene from an acidobacterial species (designated 16SNS11) showing a minimal sequence identity of 78.4% to that of E. coli (16SEco) (Supplementary Table 1). We performed a CRF analysis using 16SEco and 16SNS11 to investigate how these sequence differences affect the function of 16 S rRNA.
The 16SNS11 differed from 16SEco by 334 nucleotides (Supplementary Fig. 1), among which 88 nucleotides had direct interactions with ribosomal proteins (dotted in red in Supplementary Fig. 2A). Overall, the secondary structure of the RNA helices appeared to be preserved between 16SEco and 16SNS11 according to the compensatory mechanism as shown in the RNA–RNA contact map in Supplementary Fig. 2B 12, 13. A typical example is shown in Fig. 1a for the helix 17 (h17) structure. However, we also found larger sequence and structural differences in some of the helices. For example, helix 6 (h6) differed between 16SEco and 16SNS11 in both the bulge and stem structures (Fig. 1b). In helix 33 (h33), several nucleotides were inserted in 16SNS11 (Fig. 1c). Therefore, not only are the typical compensatory mutations distinct between the two sequences but also the several unpaired (insertion/deletion) mutations.
The 3′ minor domain of the acidobacterial 16 S rRNA deleteriously affects E. coli growth
We measured the doubling times (DTs) of the E. coli KT105 strains containing 16SEco and 16SNS11 (Fig. 2). We found that KT105 with 16SNS11 (KT105/16SNS11) showed a significantly increased DT (72.6 min), indicating decreased viability compared to that of KT105/16SEco (40.2 min). Thus, introduction of a foreign 16 S rRNA sequence perturbed the function of the ribosome and decreased the host’s viability. There are two possible reasons for the growth perturbation. One possibility is that the foreign 16 S rRNA (16SNS11 in this case) contained a number of nucleotides that differ from those of the host 16S rRNA sequence (16SEco in this case), each of which would be slightly deleterious but not lethal. Such slightly deleterious sites would cumulatively cause significant perturbation of ribosomal functionality. In this situation, the viability of the host should be inversely correlated to the number of deleterious sites; step-by-step back mutation to the E. coli sequence would gradually recover the host’s viability. The second possibility is that the deleterious nucleotide(s) of the foreign 16 S rRNA (16SNS11 in this case) localizes in a specific region, and the remaining nucleotides that differ are functionally neutral. If so, the growth of the host (E. coli in this case) would be restored rapidly by a relatively small number of back mutations.
To identify the nucleotide(s) responsible for the perturbation of growth in KT105/16SNS11, we designed a series of chimeric 16 S rRNAs of 16SEco and 16SNS11, in which one of the four domains (5′, central, 3′ major, and 3′ minor) in 16SNS11 was replaced with the corresponding domain of 16SEco. The resulting chimeric genes (16SNS11-5E, 16SNS11-CntE, 16SNS11-3MjE, and 16SNS11-3MnE) (Supplementary Fig. 3) were each introduced into E. coli KT101 to construct a chimeric mutant series of KT105. We found that 16SNS11-CntE, 16SNS11-3MjE, and 16SNS11-3MnE supported the growth (colony formation) of KT105, whereas 16SNS11-5E did not. The loss of the functionality in 16SNS11-5E was presumably due to the disruption of the central pseudoknot structure (h2), which is formed between the 5′ major and central domains and is essential for ribosomal function14,15,16. This disruption was technically unavoidable in designing the 16SNS11-5E chimera.
Next, DTs were determined for the viable chimeric mutants (i.e., KT105/16SNS11-CntE, KT105/16SNS11-3MjE, and KT105/16SNS11-3MnE) (Fig. 2). A prolonged DT was observed for KT105/16SNS11-CntE (96.6 min), suggesting that partial disruption of the functional interactions between the 5′ and central domains caused defective growth, though not the complete lethality observed for KT105/16SNS11-5E. No obvious recovery in DT was observed for KT105/16SNS11-3MjE (70.9 min), suggesting that the 3′ major domains of E. coli and NS11 are functionally similar. In contrast, KT105/16SNS11-3MnE showed a significant recovery of viability; its DT was shortened from 72.6 min to 43.6 min, which was similar to that of KT105/16SEco (40.2 min). These results strongly suggest that the deleterious nature of 16SNS11 in the E. coli ribosomal particle was predominantly due to the 3′ minor domain. As the 3′ minor domain is the smallest of the four domains (~150 bases, whereas the entire 16SEco sequence is 1542 bases), the first possibility mentioned above, in which many slightly deleterious sites act cumulatively, could be ruled out. In contrast, the second possibility, namely that deleterious sites are localized in a specific region (the 3′ minor domain in the present case), seemed plausible.
Biochemical characterization of mutant ribosomes
To confirm that the defective growth of the 16S rRNA-substituted strains was due to decreased ribosomal function, we conducted biochemical characterization of the ribosomes. In vitro translational activity of the ribosomes was determined using the E. coli cell-free transcription/translation system. The 70 S ribosome was purified from each KT105 mutant strain, and the activity was measured using green fluorescent protein (GFP)17, 18 as a reporter. The reaction was initiated by the addition of purified 70 S ribosome into the assay solution, and the increase of GFP fluorescence was measured over 5 h. As shown in Supplementary Fig. 4A, the activity of the ribosome with 16SNS11 was approximately a third of that of the ribosome with 16SEco, which was markedly restored in the ribosome with 16SNS11-3MnE. Further characterization of the ribosomes was performed using sucrose density gradient analysis to study their subunit composition (Supplementary Fig. 4B). The mutated ribosome with 16SNS11 displayed a decreased assembly ratio of the 30 S subunit and association ratio between 30 S and 50 S subunits, both of which were markedly restored by substituting the 3′ minor domain with that of E. coli (16SNS11-3MnE). These results confirmed that the growth phenotype and ribosomal functions were well correlated.
99.4% of sequence variations between Acidobacteria and E. coli 16 S rRNA are functionally similar to each other
According to the above-mentioned second possibility (presence of deleterious sites in the 3′ minor domain), we continued to identify the deleterious nucleotide(s) within the domain of 16SNS11. Figure 3 shows the secondary structure map of the 3′ minor domain of 16SEco. Forty-one functionally important nucleotides for subunit-subunit interaction and ribosome biogenesis reported in the literature are shown in red19,20,21,22,23. Thirty-two nucleotides, which differ between 16SEco and 16SNS11, are indicated by arrows, and the corresponding nucleotides for 16SNS11 are indicated beside each arrow. Among these 32 nucleotides, we narrowed down the list of potentially deleterious nucleotides to ten (nucleotides 1416, 1417, 1421, 1429, 1430, 1463, 1464, 1465, 1484, and 1516) considering their functional importance. Of these, nucleotides 1463, 1464, and 1465 are located in the highly variable region of h4424, and nucleotides 1429 and 1430 in 16SEco do not affect the host’s growth upon mutation25. Nucleotide 1421 is involved in the central helix of h44, which pairs with nucleotide 1479. In 16SEco, this region is G1421–C1479, whereas in 16SNS11, the base pair was replaced with C1421–G1479. Regarding the secondary structure, it was shown previously that the conservation of base pairing is critical, whereas the base identity is insignificant26. Thus, we excluded these six nucleotides from the subsequent analysis and focused on the following four sites: nucleotides 1416, 1417, 1484, and 1516 (shown in blue in Fig. 3). In 16SEco, G1416 and G1417 are involved in the assembly of the 30 S subunit27, whereas C1484 and G1516 contribute to the interaction between 30 S and 50 S subunits involved in bridge B3 and bridge B2b, respectively21. One of the 11 post-transcriptionally modified bases is G1516, which is methylated by the site-specific methyltransferase RsmJ (encoded by the ygiQ gene)28.
We introduced mutations at four positions (A1416G, A1417G, U1484C, and A1516G), causing each of the four nucleotides in 16SNS11 to revert to the respective 16SEco sequence (Fig. 3). In addition, since nucleotides 1416 and 1484 form a base pair, we created the double mutant A1416G/U1484C. These variant genes were generated by site-directed mutagenesis using 16SNS11 as a template and then introduced into the E. coli host. Among the mutants, the A1416G mutant did not support growth (colony formation); however, the mutation likely retained the non-Watson-Crick base pair (G1416–U1484) (Fig. 3). The same base pair (G1416–U1484) in 16SEco was reported to allow growth, although the growth rate was reduced remarkably25. The reason for this discrepancy remains unclear, but it might be due to the difference in 16 S rRNA background (16SEco or 16SNS11). The U1484C mutation in 16SNS11 should result in disruption of the base pair (A1416–U1484 to A1416–C1484), but the resulting mutant retained viability. Sun et al. suggested that the mutation involved in bridge B3 might affect the conformational rearrangements of the 30 S initiation complex, which is required for association with the 50 S subunit and/or the ordered formation of bridging interactions during subunit association25. Considering the fact that the base pair-disrupting mutation in 16SNS11 was not detrimental and that the C1484G mutation in 16SEco retained functionality25, base-pairing between nucleotides 1416 and 1484 may not be essential for its function.
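The base-pair states at positions 1416 and 1484 discussed above can be tabulated with a minimal sketch. The classification into Watson-Crick, wobble, and mismatch is a standard simplification introduced here for illustration and is not the authors' analysis; the function and variable names are hypothetical, and the nucleotide identities are taken from the text.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}

def classify_pair(nt_a: str, nt_b: str) -> str:
    """Classify two RNA nucleotides as a Watson-Crick pair, a G-U wobble, or a mismatch."""
    pair = (nt_a.upper(), nt_b.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"

# Nucleotides at positions (1416, 1484) for each construct, as stated in the text.
PAIR_1416_1484 = {
    "16SEco": ("G", "C"),
    "16SNS11": ("A", "U"),
    "16SNS11-A1416G": ("G", "U"),
    "16SNS11-U1484C": ("A", "C"),
    "16SNS11-A1416G/U1484C": ("G", "C"),
}

for construct, (n1416, n1484) in PAIR_1416_1484.items():
    print(f"{construct:24s} {n1416}1416-{n1484}1484  {classify_pair(n1416, n1484)}")
```

Under this simplified scheme, the non-viable A1416G single mutant carries a wobble pair, whereas the viable U1484C single mutant carries a mismatch, in line with the observation that pairing status alone does not predict function at this site.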
Next, DTs were determined for the viable point mutants. All functional single mutants showed better growth rates than the parental E. coli KT105/16SNS11 (Fig. 4). In particular, the DT of the double mutant KT105/16SNS11-A1416G/U1484C was drastically shortened from 72.6 min (KT105/16SNS11) to 42.7 min, which was slightly shorter than that of the chimeric mutant KT105/16SNS11-3MnE (43.6 min) and comparable to that of KT105/16SEco (40.2 min). These data suggest that the deleterious effect of 16SNS11 on the E. coli growth phenotype can be explained primarily by the base-pair pattern formed between nucleotides 1416 and 1484.
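To compare the reported DTs on a common scale, one can express each construct's DT as the fraction of the growth defect that it recovers relative to 16SNS11. This "recovery fraction" is an illustrative metric introduced here, not one used by the authors; the DT values are those reported in the text, and the names below are hypothetical.

```python
# Doubling times (min) reported in the text for KT105 carrying each 16 S rRNA gene.
DT_MIN = {
    "16SEco": 40.2,
    "16SNS11": 72.6,
    "16SNS11-CntE": 96.6,
    "16SNS11-3MjE": 70.9,
    "16SNS11-3MnE": 43.6,
    "16SNS11-A1416G/U1484C": 42.7,
}

def recovery_fraction(dt_variant: float,
                      dt_defective: float = DT_MIN["16SNS11"],
                      dt_reference: float = DT_MIN["16SEco"]) -> float:
    """Fraction of the DT gap between defective and reference strains closed by a variant.

    1.0 means the variant grows like the reference; 0.0 means no recovery;
    negative values mean the variant grows more slowly than the defective strain.
    """
    return (dt_defective - dt_variant) / (dt_defective - dt_reference)

for construct, dt in DT_MIN.items():
    print(f"{construct:24s} DT = {dt:5.1f} min  recovery = {recovery_fraction(dt):+.2f}")
```

On this scale, both the 3′ minor domain replacement and the A1416G/U1484C double mutant close roughly nine-tenths of the gap, while the 3′ major domain replacement closes almost none of it.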
In this study, we conducted a CRF analysis using the E. coli genetic system and investigated the functional evolvability of 16 S rRNAs, showing that the 16 S rRNA gene from a different phylum, Acidobacteria (16SNS11), whose sequence differed from the E. coli allele by 334 nucleotides, was functional in the E. coli ribosomal background. Further mutational analysis revealed that, among the 334 different sites, only a single base pair was deleterious, but the remaining 332 (99.4%) nucleotides were found to be similarly functional to those of E. coli. To the best of our knowledge, except for a preliminary study29 this is the first experimental study to report the similarity of 16 S rRNA genes between distantly related bacteria in terms of functionality, contradicting the notion of the complexity hypothesis that rRNAs coevolve with ribosomal proteins in a species-specific manner10.
Historically, comparative RNA sequence analysis of the 16 S rRNA30, which maps the variable nucleotides onto the secondary structure, revealed that nucleotides in this molecule generally co-vary so that mutations do not disrupt the secondary structure. Although it was never confirmed that each species’ 16 S rRNA has equal functionality, such mutations were tentatively named compensatory neutral mutations12, 13, which, in turn, became a theoretical basis for the existence of neutral evolvability of the 16 S rRNA gene for use as a molecular clock for phylogenetic studies of prokaryotic species. In addition, the 16 S rRNA gene is regarded as a reliable phylogenetic marker because it has been assumed that the gene would rarely experience HGT, since the 16 S rRNA forms the structural core of the 30 S ribosome (complexity hypothesis)1, 2. Against this background, and given the lack of fossil records for prokaryotes, 16 S rRNA was considered to be the de facto ultimate chronometer for the phylogenetic study of prokaryotes3. The current notion of the way microorganisms have evolved on Earth is based largely on the 16 S rRNA-based phylogenetic tree, which has been accepted widely over the last 30 years3, 6. However, the basic assumption, the evolutionary neutrality of the 16 S rRNA gene, is based solely on comparative sequence analysis (and intuition) and is not supported by any functional analysis. Structural and/or sequence comparisons of naturally occurring ribosomes from different organisms are not necessarily informative because each individual ribosome (or any protein or RNA) has evolved in its own evolutionary context, which should be rich in various biological noise31.
It should especially be stressed that, apart from typical base-pairing nucleotides that intuitively seem to evolve following the mechanism of compensatory neutral mutations12, 13, it is unclear how mutations in free (unpaired) nucleotides, or those interacting with proteins, affect functionality of the ribosome; this is why this study aimed to rigorously investigate the functional evolvability of this molecule, which must be linked to the evolutionary history of this molecule in nature.
The results obtained through our CRF analysis led to the clear conclusion that bacterial 16 S rRNAs are significantly connected by a neutral network; the majority (99.4%) of varying nucleotides (including free/unpaired and protein-binding nucleotides) that accumulated uniquely during the course of evolution of each organism (E. coli and acidobacterial species) were functionally similar (Figs 2 and 4), seemingly supporting the use of 16 S rRNA sequences as a reasonable clock, at least in terms of neutral evolvability. Notably, this finding conversely suggests that the majority of critical interactions between 16 S rRNA and surrounding proteins do not evolve because of their stringent mutual constraints; even a slight sequence change can be extremely toxic. Establishment of such an inflexible framework or “cradle” must have preceded the divergence of 16 S rRNAs. More specifically, our results suggest that the common cradle for the 16 S rRNAs of Proteobacteria and Acidobacteria developed before the branching point of the two lineages and has remained unchanged. Since the divergence, 16 S rRNA genes of both lineages have accumulated mutations mainly by the compensatory neutral mutation mechanism, with a relatively small number of stand-alone point mutations (Supplementary Fig. 1). This evolutionary model for the ribosome, which we call the cradle model, is in complete opposition to the complexity hypothesis10, which claims that the genes involved in complex biosystems, as represented by the ribosome, co-evolve with each other and thus scarcely experience HGT between species.
The finding that evolution of the 16 S rRNA gene may not follow the complexity hypothesis but instead follow the cradle model strongly suggests that HGT of 16 S rRNA genes between species (even across phyla) could have occurred more frequently than previously thought, with this promiscuous nature representing an “inconvenient truth” for using the 16 S rRNA gene as a clock. Consistent with our cradle model and opposing the complexity hypothesis, ribosomal protein genes are reported to undergo frequent HGT32, 33. However, our results also suggest the presence of a minimal number of species-specific, non-neutral mutations in 16 S rRNA (e.g., single base pair incompatibility of nucleotides 1416 and 1484 [Figs 3 and 4]), which would work as a barrier to full-length transfer of the 16 S rRNA genes between species. A similar functional barrier was also observed between the 5′ and central domains (i.e., the pseudoknot in helix 2)34, which would prevent unfavorable HGT within two consecutive domains of the 16 S rRNA gene (Fig. 2). In other words, three consecutive domains from the 5′ end (i.e., the 5′, central, and 3′ major domains) or the 3′ major domain alone would be freely transferable between the 16 S rRNA genes of E. coli and Acidobacteria without a major perturbation in growth. Although the artificial chimeras that we designed in this study were based on the well-defined domain structure of 16 S rRNA (Supplementary Fig. 1A and Supplementary Fig. 3), the transfer unit in nature may not encompass the full length of the gene, or a discrete domain, but can be a fragment of arbitrary length on a case-by-case basis. In fact, sporadic naturally occurring chimeric 16 S rRNA genes have been reported in a moderate number of bacterial species35,36,37,38,39,40,41, the recombination sites of which appear to be independent of domain-domain junctions.
A systematic description of the chimeragenesis history of 16 S rRNA, which is technically difficult and has not yet been conducted, would be essential to estimate the evolutionary significance of the cradle model for the molecular evolution of 16 S rRNA genes.
In summary, the major experimental contribution of this paper is a proof of concept that 16 S rRNA genes from two distantly related bacteria that differ at the phylum level (Acidobacteria and Proteobacteria in this case) are similar in function. This result, however, does not necessarily ensure that the same rule can be readily applied to other bacterial lineages. To further generalize our findings, the functionality of 16 S rRNAs from other phylogenetic lineages, or in a different experimental setting (e.g., use of non-E. coli bacteria as a host), needs to be tested. We anticipate that such systematic studies will help clarify Woese’s assumption that 16 S rRNA genes have universal neutral evolvability. However, even if the experimental evidence from this and possible future studies suggests the universal neutral evolvability of 16 S rRNA genes, and hence satisfies the necessary requirements for using the sequence as a logical molecular clock, this finding would inevitably also imply the promiscuous nature of the 16 S rRNA gene, i.e., the occurrence of horizontal gene transfer among bacteria.
We are beginning to recognize that many genomes and genes have experienced HGT, which includes both operational and informational genes10, and the 16 S rRNA gene is no exception. It is undoubtedly true that organisms evolve following the tree-shape evolutionary model, but molecules or assemblages (i.e. genomes) thereof do not necessarily follow this model; in fact, they often violate it. Therefore, the evolutionary history of potentially promiscuous 16 S rRNA genes may well be described in a way that differs from the tree-shape. A more appropriate representation might be a network, web, or ring shape, which incorporates both vertical and horizontal evolutionary history. Discerning the vertical and horizontal information clearly and analyzing both would help elucidate how 16 S rRNAs and their hosts have evolved vertically and how the hosts interacted with each other upon molecular chimeragenesis.
Materials and Methods
Bacterial strains and growth conditions
E. coli KT101 (∆rrnG ∆rrnA ∆rrnD ∆rrnE ∆rrnH ∆rrnB ∆rrnC/pTRNA67, pRB101, rna::KmR) is a derivative of SQ171 (∆7 prrn strain), a null mutant of the rRNA (rrn) operon in the chromosome42,43,44. The plasmid pRB101 (AmpR, sacB, pSC101 ori) contains the entire wild-type rrnB operon (including the 16 S rRNA gene), which complements the growth of KT101. The strain was cultivated at 37 °C in LB medium (1% [w/v] tryptone, 0.5% [w/v] yeast extract, 0.5% [w/v] NaCl) (Merck) containing 100 µg/mL ampicillin (Amp) and 25 µg/mL kanamycin (Km). E. coli KT105 is a derivative of KT101 in which pRB101 was completely replaced with pRB105 (rrnB, TmpR, pSC101 ori) using sucrose-induced counterselection. The pRB105 plasmid was used as a vector for introducing foreign 16 S rRNA genes or mutagenesis experiments. The KT105 strain was cultivated in LB medium containing 25 µg/mL Km and 10 µg/mL trimethoprim (Tmp). The DTs of the KT105 derivative mutants were determined as described previously11. Briefly, 0.7 µL of overnight pre-culture was inoculated into 200 µL of LB/Km/Tmp in a flat-bottom 96-well plate (Corning). The plate was incubated at 37 °C with vigorous agitation (200 rpm) in a VersaMax plate reader (Molecular Devices) and the OD600 was monitored every 15 min for 24 h.
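A doubling time can be estimated from such OD600 time series by fitting a line to ln(OD600) over the exponential phase and taking DT = ln 2 / slope. The sketch below is a generic illustration of that calculation under stated assumptions, not the procedure of ref. 11: the OD600 window used to select the exponential phase (0.05 to 0.3) and the function name are hypothetical.

```python
import math

def doubling_time(times_min, od600, od_min=0.05, od_max=0.3):
    """Estimate doubling time (min) by least-squares fit of ln(OD600) versus time.

    Only readings with od_min <= OD600 <= od_max are used; this window is an
    assumed stand-in for the exponential growth phase.
    """
    points = [(t, math.log(od)) for t, od in zip(times_min, od600)
              if od_min <= od <= od_max]
    if len(points) < 2:
        raise ValueError("need at least two readings in the exponential window")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx  # specific growth rate, per minute
    if slope <= 0:
        raise ValueError("no net growth in the selected window")
    return math.log(2) / slope

# Synthetic example: readings every 15 min for a culture doubling every 40.2 min.
times = list(range(0, 300, 15))
ods = [0.01 * 2 ** (t / 40.2) for t in times]
print(f"estimated DT: {doubling_time(times, ods):.1f} min")
```

With real plate-reader data, the fitted window would need to be checked against the growth curve, since blank subtraction and lag or stationary phases can bias the slope.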
Functional screening of foreign 16 S rRNA genes in E. coli ∆7
Functional screening of foreign (non-E. coli) 16 S rRNA genes from metagenome samples was carried out as described previously11, 42, 45, with slight modification34 (see Supplementary Materials and Methods and Supplementary Table 2). A total of ~4,000 colonies of KT105 were collected, from which 48 colonies were selected. Plasmid DNA (pRB105) was extracted from these mutants, and nucleotide sequences of the entire foreign 16 S rRNA genes were determined. A BLAST search46 was carried out using the National Center for Biotechnology Information (NCBI) nucleotide database, 16 S rRNA sequences (Bacteria and Archaea), with the program selection optimized for highly similar sequences (megablast). Sequence alignment of 16 S rRNA genes was performed using the SINA alignment service (http://www.arb-silva.de/aligner/).
Domain-based chimeragenesis was carried out between the 16 S rRNA genes of E. coli and Acidobacteria as described previously45, with some modifications. For details, see Supplementary Materials and Methods and Supplementary Fig. 3. Site-specific point mutations were introduced to the 16 S rRNA gene using the QuikChange mutagenesis protocol47. Primer pairs used for mutagenesis experiments are listed in Supplementary Table 2.
Data and materials availability
DNA sequence data for NS5 and NS11 16 S rRNA genes have been deposited under the accession numbers LC093165 and LC093166, respectively.
We thank Shoko Kamada for technical assistance. This work was partly supported by Grant-in-Aid for JSPS Fellows 26-7760 (to M.T.), the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (B) 23380197 (to K.M.), Grant-in-Aid for Scientific Research on Innovative Areas 24119515 (to K.M.), Grant-in-Aid for Challenging Exploratory Research 24651231 (to K.M.), and Grant-in-Aid for Young Scientists (B) 25830132 (to K.K.). Funding for open access charge: Japan Society for the Promotion of Science (JSPS).