If the table of the genetic code is rearranged to put complementary codons face-to-face, it becomes apparent that the code displays latent mirror symmetry with respect to two sterically different modes of tRNA recognition. These modes involve distinct classes of aminoacyl-tRNA synthetases (aaRSs I and II) with recognition from the minor or major groove sides of the acceptor stem, respectively. We analyze the anticodon pairs complementary to the face-to-face codon couplets. Taking into account the invariant nucleotides on either side (5′ and 3′), we consider the risk of anticodon confusion and subsequent erroneous aminoacylation in the ancestral coding system. This logic leads to the conclusion that ribozymic precursors of tRNA synthetases had the same two complementary modes of tRNA aminoacylation. This surprising case of molecular mimicry (1) shows a key potential selective advantage arising from the partitioning of aaRSs into two classes, (2) is consistent with the hypothesis that the two aaRS classes were originally encoded by the complementary strands of the same primordial gene and (3) provides a ‘missing link’ between the classic genetic code, embodied in the anticodon, and the second, or RNA operational, code that is embodied mostly in the acceptor stem and is directly responsible for proper tRNA aminoacylation.
The idea of the genetic code was one of the most important and captivating implications of the discovery of the double-helical structure of DNA (Yanofsky, 2007). It has transpired that the genetic code comprises a (nearly) universal assignment of nucleotide triplets (codons) to corresponding amino acids (Table 1). The deciphering of the code piqued the interest of the scientific community in a much more challenging problem—the origin(s) of the code.
An ‘origin-of-code’ scenario that explains the existence of genetic coding material and of the amino acids associated with its expression is worth mentioning if, and only if, it evades the proverbial ‘chicken-or-egg’ conundrum. The hypothesis of a direct stereochemical affinity (‘key–lock’) between an amino acid and a codon (Woese, 1965) meets this criterion, whereas the hypothesis of code ‘adaptor’ molecules (Crick, 1958) does not. The essence of the adaptor hypothesis is that an adaptor is able to recognize simultaneously an amino acid and a cognate codon or codons, but then the burden of explanation is merely passed onto this homunculus. In principle, the existence of such adaptors does not rule out at least weak direct stereochemical recognition between an amino acid and its codon(s); however, it certainly makes this affinity unnecessary (Crick, 1958, 1968).
The discovery of tRNAs confirmed Crick's hypothesis, but raised the following problem: tRNAs implement the code via a complementary replica of the codon, the anticodon. However, the anticodon is located in the middle of the tRNA cloverleaf, at the maximum possible distance from the CCA end, where the cognate amino acid is attached (Figure 1a). Because of this separation, tRNA molecules cannot self-aminoacylate; instead, there are 20 amino acid-specific aminoacyl-tRNA synthetases (aaRSs) that perform this function. Thus, it is these aaRSs that actually define the familiar matrix of the genetic code, by linking the specific amino acids and tRNAs with the corresponding anticodons.
The aaRSs are proteins. Moreover, they are pleiotropic proteins that are directly involved in the synthesis of all other proteins, and this means that the aaRSs represent the chicken-or-egg paradox at its most puzzling. The problem is further aggravated by the distinctive partitioning of the protein aaRSs (p-aaRSs) into two classes (I and II) of 10 members each (Eriani et al., 1990). Despite performing exactly the same function, tRNA aminoacylation, these two enzyme classes share no homology—either in primary sequence or at higher 2D and 3D levels (Eriani et al., 1990). However, their modes of tRNA recognition almost perfectly complement each other in a symmetric, mirror-like fashion; class I recognition occurs from the minor groove side of the acceptor stem, whereas class II recognition occurs from the major groove side (Rould et al., 1989; Cusack et al., 1990; Eriani et al., 1990; Ruff et al., 1991; Carter, 1993). We believe that the two different types of groove recognition imply particular distributions of codons and amino acids, and these distributions, together with certain additional considerations, unambiguously point to the essential role that the primordial double-stranded (sense–antisense) translation played in the formation of the very core of the genetic code.
Accordingly, in this review article, we will concentrate on the properties of tRNAs and aaRSs that are associated with complementary codons. In particular, the rearrangement of the genetic code that puts complementary codons face-to-face with each other discloses an intriguing, highly nonrandom pattern. This pattern (1) shows a key potential selective advantage arising from the partitioning of aaRSs into two classes, (2) supports the hypothesis that the two classes of aaRSs were originally encoded by the complementary strands of the same ancestral gene (Rodin and Ohno, 1995), and (3) might provide a ‘missing link’ between the classic genetic code embodied in the anticodons and the operational code that is embodied mostly in the acceptor stem and is directly responsible for aminoacylation (Schimmel et al., 1993).
The problem of two codes
The 3D structure of tRNAs is formed by two domains (Figure 1b): the top domain (minihelix), to which the cognate aaRS attaches a specific amino acid at the 3′CCA end, and the bottom domain (dumbbell), with the anticodon positioned in the very center, which determines amino acid specificity. The minihelix and dumbbell form the characteristic L shape, mainly due to additional base pairings between the D and TΨC loops. To a striking extent, these two domains appear to be functionally independent of each other. The tRNAs of at least ten amino acids can be charged successfully with the correct amino acids by the cognate p-aaRSs when truncated to a minihelix, or even a smaller piece that contains the 3′CCA end (reviewed by Schimmel and Beebe, 2006). Reciprocally, truncated aaRSs (in extreme cases, so truncated that they cannot even reach the anticodon) maintain the same tRNA–amino acid specificity (Schimmel and Beebe, 2006).
The RNA operational code
This striking anticodon-independent, yet amino acid-specific aminoacylation of tRNAs led to the idea of there being a second, RNA operational, code that is localized mainly in the acceptor stem of the tRNAs (in the vicinity of the amino acid attachment site) and is recognized by the corresponding module of p-aaRSs (see also de Duve 1988; Schimmel et al., 1993).
To an unexpected extent, the operational code determines which aaRS is cognate for a given tRNA (Schimmel et al., 1993), and it is the operational code again that brings the classic code, which is associated with anticodons, into action. The question then arises: are the two codes independent by origin? Of great significance in this regard is the observation that the replication initiation sites of RNA genomes resemble the minihelix with the 3′CCA terminus (Weiner and Maizels, 1987, 1999). It was, therefore, proposed (Schimmel et al., 1993) that:
a mini- (or even micro-) helix tRNA precursor might have had the ability to interact specifically with amino acids, probably long before it merged with the anticodon-containing dumbbell (Schimmel et al., 1993).
although the present-day operational code is implemented by the protein aaRSs, the original precursor of the operational code (presumably implemented by the ribozymic aaRSs) might have been older than the classic genetic code, and, if so
Yet, the codons-to-amino acids assignment does not seem to have been shaped by pure chance. Noticeably, similar triplets tend to encode similar amino acids (Table 1). And, although the frozen accident-based scenario does not entirely rule out the gradual selection-driven optimization of the genetic code along the ‘similar codons for similar amino acids’ lines (Crick, 1968), it is clear that direct stereochemical affinity between amino acids and cognate nucleotide triplets (anticodons and/or codons) would provide this agreement much more readily (Woese, 1965; Szathmary, 1993, 1999; Yarus, 1998). Furthermore, both Corey–Pauling–Koltun models of stereochemical C4N complexes of anticodons (noncovalently linked with the unpaired 73rd discriminator base) with appropriate amino acids (Shimizu, 1982) and aa-binding sites of RNA aptamers that were selected from random RNA pools during evolution in vitro (Yarus 1998; Caporaso et al., 2005; Yarus et al., 2005) show that this stereochemical affinity, although often weak, is quite real. Finally, it is only logical to suppose that the putative ribozymic precursors of synthetases, r-aaRSs, experienced precisely the same problem with the two codes.
Indeed, at first glance, the obvious advantage of r-aaRSs over their protein successors is the ability of the ribozymes to easily recognize the anticodon via a trivial complementary pairing. This anticodon-recognition site can be thought of as an anti-anticodon triplet—a codon-like triplet. However, for any r-aaRS to charge its cognate tRNAs with the correct amino acid, the r-aaRS must have possessed not only this anticodon-recognition site but also the specific aa-binding site. Note that the aa-binding site would have needed to be located close to the 3′ end of the tRNA (Figure 1), that is, again very far from the ‘anti-anticodon’ site. Therefore, it appears that if r-aaRSs did exist, they faced exactly the same ‘remoteness’ problem; in order to aminoacylate their cognate tRNAs, they would require catalysts of their own, that is, ‘meta-r-aaRSs,’ which, in turn, would inherit the same problem and require catalysts of their own, ad infinitum. Therefore, the advantages of direct recognition of anticodons by hypothetical r-aaRSs only readdress, rather than solve, the paradox.
This brings us to the only reasonable solution: a duplication of the anticodon within the same tRNA molecule (Di Giulio, 1992), which means that these two, presently very different, codes (operational and classic) were originally one and the same (Rodin et al., 1996). This, in turn, implies that the bottom (dumbbell) module of tRNA might have originated by duplication from the top (minihelix) module, or vice versa (Figure 1b; Szathmary, 1999). The well-known internal sequence periodicity of tRNAs (Bloch et al., 1985) is consistent with this duplication model.
Concerted dual complementarity of second bases in two codes
The first three positions of the acceptor stem can be considered the best candidates for being the anticodon homolog, because they represent the major identity elements of tRNAs and they are located adjacent to the discriminator base and the 3′CCA site of amino acid attachment. However, straightforward analysis failed to uncover any traces of homology in this case. We performed a different analysis based on the assumption that the earliest primitive translation might have been more strand symmetric than the later mechanism, which translated the strands after they had differentiated into the sense/coding and antisense/non-coding strands. Accordingly, instead of looking at the individual tRNAs, we tested pairs of consensus tRNAs with complementary anticodons, and were richly rewarded: for the majority of these tRNA pairs, the complementarity of anticodons was accompanied by the complementarity of second bases in their acceptor stems (Figure 1; Rodin et al., 1996).
We interpreted this parallelism as a remnant of the common ancestry of two, classic and operational, codes (Rodin et al., 1996; Rodin and Ohno, 1997). However, only the central nucleotide of the putative anticodon's duplicate in the acceptor stem showed this concerted dual complementarity, so it remained speculative that the primordial operational code even possessed a triplet structure. In a general context, the question was how could the anticodon and the duplicate in the acceptor stem coevolve within the same tRNA molecule during the expansion of the genetic code?
We have re-examined the dual complementarity for ancestral tRNAs that were reconstructed from the updated compilation of more than 8000 tRNA gene sequences covering the three main kingdoms of organisms: eubacteria, archaebacteria and eukaryotes (Sprinzl and Vassilenko, 2005). Dual complementarity was detected for pairs of ancestral tRNAs with completely complementary anticodons and was not detected for pairs in which only the second bases of anticodons were complementary (Rodin and Rodin, 2006a). This surprising difference suggests that the dual complementarity originated when the three-letter frame of translation had already been in full use.
Although coevolution of these two codes could have started with the duplicates of same trinucleotides, both classic and operational codes were originally highly ambiguous. Specifically, only the second bases could have actually encoded the groups of similar amino acids at the time when the first protein aaRSs began to replace their ribozymic precursors (for details, see Rodin and Rodin, 2006a).
The dual complementarity is consistent with the archaic in-frame translation of both strands—sense and antisense
Figure 2 illustrates this statement for a hypothetical primitive gene consisting of complementary GCC and GGC triplets that encode Ala and Gly, respectively. Ala and Gly were likely the first amino acids incorporated into the genetic coding (Eigen and Schuster, 1979; Rodin and Ohno, 1997; Klipcan and Safro, 2004; Patel, 2005; Trifonov, 2005). Translation of both strands in the same frame suggests that at first the repertoire of codons expanded by complementary pairs rather than one-by-one. In Figure 2, the GCC → GUC (Ala → Val) transition in one strand is necessarily complemented by the GGC → GAC (Gly → Asp) transition in the opposite strand (Figure 2a); accordingly, the evolving code gains a new pair of codons, GUC and GAC, for a new pair of amino acids, Val and Asp. Such a concerted recruitment of Val and Asp in translation would imply at least two duplication events for tRNAAla and tRNAGly genes, with the subsequent Val- and Asp-specific mutational ‘tune-ups.’ If the original pair of tRNAs with complementary anticodons, that is, tRNAAla and tRNAGly, carried complementary second bases in the acceptor stem (and it is likely that they did (Figure 2b)), then there is a high probability that their duplicates with complementarily mutated anticodons, tRNAVal and tRNAAsp, preserve this dual complementarity (Figures 2b and c) while gaining the specific identity elements for new amino acids (Val and Asp, in this case) elsewhere in their cognate tRNA molecules (for details, see Rodin and Rodin, 2006).
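The pairwise expansion described above can be illustrated with a minimal sketch. It assumes plain Watson–Crick complementarity and in-frame reading of both strands; the function names and the four-entry codon table are illustrative choices, not part of the original analysis.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Only the four codons discussed in the text; deliberately not a full code table.
CODON_TO_AA = {"GCC": "Ala", "GGC": "Gly", "GUC": "Val", "GAC": "Asp"}


def reverse_complement(triplet: str) -> str:
    """Return the triplet read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(triplet))


def complementary_pair(codon: str):
    """Return (codon, amino acid) for a codon and for its in-frame partner."""
    partner = reverse_complement(codon)
    return (codon, CODON_TO_AA[codon]), (partner, CODON_TO_AA[partner])


def mutate(codon: str, position: int, base: str) -> str:
    """Replace the base at a 0-based position."""
    return codon[:position] + base + codon[position + 1:]


# Parental pair, then the pair obtained after a C->U change at the
# central position of the Ala codon.
parent = complementary_pair("GCC")
mutant = complementary_pair(mutate("GCC", 1, "U"))
print(parent)
print(mutant)
```

A single central-base change in one strand thus yields a new complementary codon pair rather than a single new codon, which is the point made for Val and Asp.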
First pairs of complementarily encoded amino acids
We expected that pairs of tRNAs with G:U or A*C illegitimate pairings in their anticodons would also demonstrate significant dual complementarity, because G:U and A*C represent transitory mutational states in simultaneous sense–antisense in-frame coding, where one strand has already gained the mutation, whereas the opposite strand remains in the parental state (Figure 2a). However, we observed significant dual complementarity in pairs with such illegitimate pairing (G:U and A*C) at the flanking positions, but not in pairs with the illegitimate base pairs in the central position. Namely, out of 16 amino acid tetrads, only two, (Ala(GCC), Gly(GGC), Val(GUC) and Asp(GAC)) and (Ala(GCG), Val(GUG), Arg(CGC) and His(CAC)), were complementary at the second position in their acceptor and anticodon in all four combinations: two legitimate G-C and A-U, one wobbling G-U and one A*C (Figure 2b). This means that (1) the central nucleotide-based skeleton of the genetic code (Table 1) was initially established for a few (four to six) amino acids, such as those from the above two tetrads and (2) subsequently, the code expanded mostly via conservative, or even silent, substitutions of the flanking nucleotides (Figure 2a). Interestingly, two basic amino acids, Arg and His, show significant stereochemical affinity to cognate triplets in selected RNA aptamers (Yarus et al., 2005). Gly, Ala, Asp and Val were the most preponderant of the abiotically synthesized amino acids (Miller, 1987). These four amino acids and their nearest one-transition-step-apart mutational derivatives generated the tRNA tree with a major NRN vs ИYИ dichotomy (Figure 2b). This dichotomy is consistent with (1) the primacy of Gly, Ala, Asp and Val in nearly every scenario of the origin of the genetic code, (2) the double-strand coding-based expansion of the genetic code, and (3) the preservation of dual complementarity.
Significantly, no other amino acid tetrad generated such a bilateral branching pattern (Figure 2c).
The problem of the r-aaRS → p-aaRS transition
Our Ariadne's thread in the labyrinth of possible evolutionary transitions from ribozymes to proteins is the concept that in the emerging genetic code and associated translation machinery, both complementary strands of ancestral genes could have been used not only as catalysts (Kuhns and Joyce, 2003), but also, later, as first templates for encoded protein synthesis—that is, the future coding (sense) and noncoding (antisense) strands were originally both coding (Eigen and Schuster, 1979; Fukuchi and Otsuka, 1992; Rodin and Ohno, 1995, 1997; Carter and Duax, 2002; Pham et al., 2007). Therefore, we looked for ‘fingerprints’ of this primordial strand symmetry not only in tRNAs, but also in aaRSs and in the organization of the genetic code itself. The complementary modes of tRNA recognition by class I and II p-aaRSs were of particular interest in this regard.
Complementarity-based subcode for two modes of tRNA aminoacylation
All class I synthetases, except TyrRS (Yaremchuk et al., 2002) and TrpRS (Yang et al., 2006), approach the acceptor helix of the tRNA from the minor groove side and attach the amino acid to the 2′OH of the terminal adenosine A76; by contrast, all class II synthetases, except PheRS (Goldgur et al., 1997), approach the acceptor helix from the opposite (major groove) side and attach the amino acid to the 3′OH. The distribution of these two classes in the code table does not appear to be arbitrary. In particular, it is immediately clear that all amino acids from the second column of the genetic code table (NCN codons) belong to class II, whereas all but Phe from the first column (NUN codons) belong to class I (Table 1). The main chemical properties of amino acids are determined by their side-chain R-groups: the nonpolar aliphatic Gly, Ala, Pro, Val, Leu and Ile; polar uncharged Ser, Thr, Asn, Cys, Met and Gln; negatively charged Asp and Glu; positively charged Lys and Arg; and ring/aromatic His, Phe, Tyr and Trp. Interestingly, the two classes of aaRSs are equally represented in each R-group (Patel, 2005, 2007). However, the amino acids with larger R-groups belong to class I, whereas their counterparts with smaller R-groups belong to class II (Patel, 2005). Moreover, the median hydrophobicities of the two classes are very different (Pham et al., 2007).
With the exception of LysRS, every synthetase class assignment is invariant throughout the eubacteria, archaea and eukarya, suggesting that class assignment has not altered since the universal common ancestor of the three major kingdoms was extant (Cusack, 1997). It is unknown, however, whether this invariance was preserved due to steric or other constraints associated with amino acids, or their tRNAs (Frugier et al., 1993; Sissler et al., 1997; Ribas de Pouplana and Schimmel, 2001a, 2001b). The example of tRNALys is particularly revealing in this regard; in some archaebacteria, tRNALys is aminoacylated by class I LysRS (Ibba et al., 1997) instead of the ‘regular’ class II LysRS.
This double assignment of LysRS hints that either of the two enzyme classes is probably versatile enough to be able to aminoacylate tRNAs in all 20 cases. Why, then, are the p-aaRSs divided into two classes? Our recent analyses (Rodin and Rodin, 2006b) suggest a potential explanation.
If the direction in which the aaRS approaches the tRNA acceptor stem is used as a criterion of classification (instead of the class I vs class II dichotomy), then a strikingly nonrandom pattern of tRNA aminoacylation is revealed (Rodin and Rodin, 2006b). This alternative classification is shown in Table 2, with the two different modes of tRNA recognition represented by yellow (minor groove side) and blue (major groove side). Also, in this table, the AGG and AGA codons are assigned to blue Ser or Gly instead of yellow Arg, as they are in some mitochondrial codes (Knight et al., 2001). Alternatively, one can obtain the same yellow vs blue pattern by assuming an Arg↔Lys swap between codons AGR and AAR in Table 1. Remarkably, this swap is consistent with the fact (brought to our attention by E Szathmary) that the Arg-specific binding sites of selected RNA aptamers contain Lys's AAA codons (Caporaso et al., 2005). Another observation is that despite the two swaps, Phe↔Tyr and Lys↔Arg, this representation of the genetic code (Table 2) does not violate an equality of the two modes of tRNA recognition, from the minor and major groove sides, in each R-group ‘subclass’ of amino acids (Patel, 2007).
Thus modified, the first genetic code column (NUN codons) is uniformly yellow, the second column (NCN) is uniformly blue, and the two remaining columns (NAN and NGN) appear to complement each other almost perfectly at the flanking codon positions (Table 2). Specifically, in the fourth column (NGN), all yellow codons start with a pyrimidine (Y=C or U) and all blue codons except UGG (Trp) start with a purine (R=A or G). The third column (NAN) also shows the yellow/blue split, but in a complementary, mirror-image manner (R/Y) and this time at the third, not the first, codon position.
The flanking codon nucleotides are directly connected by the Watson–Crick pairings under only one specific coding scenario, that is, when both complementary gene strands encode proteins, in the same frame. This prompted us to rearrange the genetic code in a way that places complementary codons face-to-face with each other. In this representation, a remarkable mirror symmetry becomes evident (Figure 3). Moreover, this symmetry demonstrates the otherwise latent subcode for the two modes of tRNA recognition and, correspondingly, the two types of anticodon pairs. Specifically (Figure 3b):
(i) If two complementary codons contain YY vs RR at the second and adjacent (either first or third) positions, their aaRSs approach the tRNA acceptor from the same side of the groove (minor (yellow) for 5′NAR3′ × 5′YUИ3′ codon pairs or major (blue) for 5′RGN3′ × 5′ИCY3′ codon pairs).
(ii) If these positions are occupied by RY and YR, the modes of tRNA recognition are different, one from the minor groove side and the other from the major groove side, namely: minor (yellow) 5′YGN3′ vs major (blue) 5′ИCR3′, and mirror-symmetrically, major (blue) 5′ИAY3′ vs minor (yellow) 5′RUN3′.
These two rules also hold for anticodons with G↔C, A↔U and R↔Y replacements.
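The codon patterns named in the two rules can be written out as a small lookup. This is a sketch of the idealized yellow/blue pattern only, applied to codons rather than anticodons; it does not encode the known exceptions (for example TrpRS), and the function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def reverse_complement(triplet: str) -> str:
    """Return the complementary triplet read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(triplet))


def recognition_mode(codon: str) -> str:
    """Idealized groove side ('minor' = yellow, 'major' = blue) for a codon."""
    first, second, third = codon
    if second == "U":          # YUN and RUN codons: minor groove side
        return "minor"
    if second == "C":          # NCY and NCR codons: major groove side
        return "major"
    if second == "A":          # NAR minor, NAY major
        return "minor" if third in PURINES else "major"
    if second == "G":          # RGN major, YGN minor
        return "major" if first in PURINES else "minor"
    raise ValueError(f"unexpected base in {codon!r}")


def pair_modes(codon: str):
    """Modes for a codon and for its complementary partner."""
    return recognition_mode(codon), recognition_mode(reverse_complement(codon))


all_codons = ["".join(p) for p in product("ACGU", repeat=3)]
minor_count = sum(recognition_mode(c) == "minor" for c in all_codons)

print(pair_modes("GAG"))   # an NAR x YUN pair
print(pair_modes("GGC"))   # an RGN x NCY pair
print(pair_modes("CGA"))   # a YGN x NCR pair
print(minor_count, len(all_codons) - minor_count)
```

Under these idealized patterns the 64 triplets split evenly between the two modes, and each complementary pair falls into either the same-side or the opposite-side category depending only on the second base and one flanking base.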
The distinction between (i) and (ii) makes sense. The YR and RY dinucleotides include CG, GC, UA and AU palindromes, each of which is indistinguishable from its complement. Thus, even single base shifts in the recognition of the corresponding tRNAs would substantially increase the risk of amino acid confusion. Such errors were likely quite frequent in the primordial RNA life catalyzed by ribozymes. The problem persists if the cognate synthetases spread the tRNA recognition beyond these anticodons in the same direction. If they spread the tRNA recognition in opposite directions, this would greatly decrease the risk of incorrect aminoacylation (Figure 4). For YY and RR dinucleotides at the first–second or the second–third positions, this risk is not immediately obvious, and consequently, the synthetases either both approach from the major groove side or both approach from the minor groove side.
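The statement about palindromic dinucleotides is easy to check by enumeration. The sketch below lists all 16 RNA dinucleotides and keeps those equal to their own reverse complement; helper names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary sequence read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def ry_pattern(seq: str) -> str:
    """Reduce a sequence to purine (R) / pyrimidine (Y) resolution."""
    return "".join("R" if base in "AG" else "Y" for base in seq)


dinucleotides = ["".join(p) for p in product("ACGU", repeat=2)]

# Dinucleotides that read the same on both strands.
self_complementary = sorted(
    d for d in dinucleotides if reverse_complement(d) == d
)
patterns = {ry_pattern(d) for d in self_complementary}

print(self_complementary)
print(patterns)
```

Only RY and YR dinucleotides can be self-complementary; no YY or RR dinucleotide equals its own complement, which is why the confusion risk differs between the two rule types.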
If we assume that the tRNA cloverleaf recognition spreads from the anticodon center in the opposite directions, then the corresponding pair of aaRSs must bind to their cognate tRNAs from the opposite (major or minor groove) sides. This is precisely the case with the two classes of protein synthetases. However, the subcode rules (i) and (ii) apply to codons and anticodons, whereas the protein synthetases most likely evolved from minimal catalytic modules that interacted with the acceptor stem (Schimmel et al., 1993). This contrast, in conjunction with the chicken-or-egg conundrum, suggests that the subcode for the two aminoacylations revealed by the ‘yellow-blue’ pattern (Figure 3) was initially established by two r-aaRSs (Figure 4). However, in our first report (Rodin and Rodin, 2006b), we overlooked what is quite possibly the most conclusive argument yet in support of this hypothesis. It is described below.
The two modes of tRNA aminoacylation are not always symmetric
In general, the revealed yellow/blue pattern (Figure 3) appears to be almost perfectly symmetric with respect to the discrimination between complementary anticodons. (The only deviation is caused by a ‘major-groove-side’ TrpRS. Yet, Trp is not that exceptional because it still represents class I, and is a relatively late amino acid.) For pairs of the first type (YY vs RR), the symmetry looks invariant to ‘flipping’ the colors. For pairs of the second type (RY vs YR), the direction in which the given r-aaRS ‘spreads’ also does not matter, that is, it is irrelevant which half-tRNA, from minor groove side or major groove side (yellow or blue), is involved. What matters, though, is that the complementary partner spreads in the opposite direction (Figure 4). We have tacitly accepted this isotropy before (Rodin and Rodin, 2006b), but this symmetry is deceptive. On closer inspection, the two directions cease to be equal if we take into account that in all tRNAs, regardless of their complementary partnerships, the anticodon triplet has adjacent YU dinucleotides (mostly CU) at its 5′ side, and adjacent RN dinucleotides (mostly AA) at its 3′ side.
Two r-aaRSs can therefore recognize a pair of complementary anticodons under four scenarios:
(1) both from the minor groove sides (yellow and yellow), 5′ × 5′
(2) one from the minor and one from the major groove side (yellow and blue), 5′ × 3′
(3) both from the major groove sides (blue and blue), 3′ × 3′
(4) one from the major and one from the minor groove side (blue and yellow), 3′ × 5′.
For the example shown in Figure 4, only the second scenario (5′ × 3′) allows the putative r-aaRSs to meet dissimilar sequences within their anticodon loops, thereby providing faultless discrimination between the cognate tRNAs at aminoacylation. In contrast, the other scenarios would contain similar tetranucleotides that included at least one anticodon (underlined in Figure 4b); moreover, these tetranucleotides would be indistinguishable for the two r-aaRSs at binary resolution, purine (R) vs pyrimidine (Y). The fourth scenario is actually the worst: the two putative r-aaRSs would encounter the identical tetranucleotides (5′UCGA3′) that included both anticodons, even without shifting. Thus, although the fourth scenario is a mirror copy of the second, it has a much higher potential for anticodon confusion, and subsequent erroneous aminoacylation.
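The comparison of the four scenarios for the Arg(UCG)–Ser(CGA) pair can be reproduced with a short sketch. It assumes a single invariant flanking nucleotide on each side of the anticodon (U at the 5′ side and A at the 3′ side, chosen to match the 5′UCGA3′ example) and uses the longest shared run of nucleotides as a crude stand-in for confusion risk; this score and the function names are illustrative assumptions, not the scoring used for Table 3.

```python
from itertools import product

FLANK_5 = "U"  # assumed nucleotide immediately 5' of the anticodon
FLANK_3 = "A"  # assumed nucleotide immediately 3' of the anticodon


def window(anticodon: str, side: str) -> str:
    """Tetranucleotide seen when recognition extends to the given side."""
    return FLANK_5 + anticodon if side == "5'" else anticodon + FLANK_3


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest substring common to a and b."""
    best = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            if a[i:j] in b:
                best = max(best, j - i)
    return best


def scenario_overlaps(anticodon_1: str, anticodon_2: str) -> dict:
    """Shared-run length for each (side_1, side_2) recognition scenario."""
    return {
        (s1, s2): longest_shared_run(window(anticodon_1, s1),
                                     window(anticodon_2, s2))
        for s1, s2 in product(("5'", "3'"), repeat=2)
    }


overlaps = scenario_overlaps("UCG", "CGA")  # Arg and Ser anticodons
safest = min(overlaps, key=overlaps.get)
for scenario, shared in overlaps.items():
    print(scenario, shared)
print("lowest overlap:", safest)
```

With these assumptions the 5′ × 3′ scenario gives the shortest shared run, the 3′ × 5′ scenario yields two identical tetranucleotides, and the two same-side scenarios each share a complete anticodon after a one-base shift.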
In a similar fashion, we tested each of the 32 pairs of complementary anticodons for the risk of amino acid confusion under each of the four scenarios (Table 3; low- and high-risk cases denoted by pluses and minuses, respectively). As expected, the macrosymmetry of this test was well pronounced. For example, the 5′ × 3′ and 3′ × 5′ scenarios are equal with respect to the total plus/minus ratio of 24:8 (3:1). In sharp contrast, the 5′ × 5′ and 3′ × 3′ scenarios are generally more easily confused (the ratio is 1:1). However, the detailed analysis below indicates that the distribution of complementary anticodon pairs and cognate amino acids among these four scenarios is definitely asymmetric and nonrandom.
Ribozymic precursors of tRNA synthetases had the same two complementary modes of tRNA aminoacylation
Let us consider first the group of complementary 5′RGN3′ and 5′ИCY3′ anticodons at the bottom of Table 3. These eight GC-rich pairs are reliably distinguishable under any of the recognition scenarios, as if there was no preference/selection at all. This suggests that the actual scenario chosen (3′ × 3′) might reflect the most fundamental aspects of translation. Remarkably, all amino acids from these pairs are believed to be the first or at least among the earliest candidates recruited in translation (Eigen and Schuster, 1979; Klipcan and Safro, 2004; Trifonov, 2005; Patel, 2005, 2007). Furthermore, in all of these major groove/major groove cases, the putative r-aaRSs grow and spread their recognition of tRNAs from the 3′ end, that is, moving first along the acceptor stem, then along the TΨC domain (together comprising the minihelix), next along the variable loop, and eventually reaching the anticodon. This is perfectly consistent with the original replication-tag functions of the acceptor-like precursors of tRNAs (Weiner and Maizels, 1987, 1999; Maizels and Weiner, 1994); the idea that the ancient operational code is embodied in this part of the tRNA molecule (Schimmel et al., 1993; Schimmel and Beebe, 2006); and the concerted complementarity of the acceptor's second bases and complementarity of anticodons (Rodin et al., 1996; Rodin and Ohno, 1997).
The next 16 pairs of anticodons (enclosed in red in the middle of Table 3) contain RY and YR at the second and adjacent (either first or third) positions. The Arg(UCG)–Ser(CGA) pair (Figure 4) belongs to this group. The advantage of the 5′ × 3′ scenario is clear—at the R/Y resolution, its +/− ratio is 12:4 (3:1), whereas the mirror 3′ × 5′ scenario yields 8:8 (1:1) (Table 3). Moreover, the selection of this scenario fits with the aforementioned earlier choice of the 3′ × 3′ scenario. Indeed, when the ancient operational code allotted the major groove side r-aaRS to Ala, Thr, Pro and Ser in their complementary pairs with Gly (also major groove side), it made the subsequent reassignment of the same amino acids, Ala, Thr, Pro and Ser, to the mirror (minor groove side) r-aaRS (in pairs with Arg, Trp and Cys; see Table 3) exceedingly costly, and therefore unlikely.
At higher resolution, when r-aaRSs distinguish G from A and C from U, G–U is considered to be a weak, wobbling, bond, while A*C is a clear mismatch. At this level, the relative excess of low-risk cases for amino acid confusion under the 5′ × 3′ scenario becomes even more pronounced, with a ratio of 15:1. The 5′ × 3′ scenario is less favorable than the 3′ × 5′ scenario for only one minor/major groove pair of complementary anticodons (GCA (Cys) and UGC (Ala)), the actual loops being 5′CU–GCA–AA3′ and 5′UU–UGC–AA3′ (identical tetranucleotides are underlined), respectively. Note that the p-aaRSs for both of these amino acids do not need the anticodon for error-proof aminoacylation of their cognate tRNAs (Schimmel and Beebe, 2006).
At first glance, the four (out of eight) anticodon pairs of the YY vs RR type at the top of Table 3 are at variance with all of the above; they do not share any amino acids with the previous two groups, and their best discrimination can be achieved by the ‘minus-free’ 3′ × 5′ scenario instead of the actual 5′ × 5′ scenario (with four high-risk cases). However, two of these four minuses represent the pairs with stop codons! Advanced translation needs termination marks, and it seems rational that these triplets, which would otherwise be extremely confusable with their complementary partners, should be selected such that they do not convey genetic information, thereby being available for ‘punctuation’. Moreover, all pairs of amino acids belonging to this group, other than Glu (CUC) × Leu (GAG), entered translation relatively late (Eigen and Schuster, 1979; Klipcan and Safro, 2004; Trifonov, 2005; Patel, 2005, 2007), possibly when the repertoire of potential assignments to the 3′ r-aaRSs (blue) had been filled, thus increasing the risk for a new amino acid to be confused with an old one.
The actual evolutionary pathway (green in Table 3) includes only two amino acids (both likely to be latecomers) that are either indistinguishable (Cys) or barely distinguishable (Gln) from their complementary partners, even at the higher G/C/A/U level of recognition. It is hardly a coincidence that in many prokaryotes these amino acids represent two of the three (Cys, Gln and Asn) older, indirect routes for aa-tRNA synthesis: SepRS/SepCysS and Glu-tRNAGln, through which Cys and Gln actually entered translation (Ibba et al., 2000; Di Giulio, 2002; O'Donoghue et al., 2005).
In general, there is a strong correlation between the group of abiotically synthesized amino acids (Eigen and Schuster, 1979; Miller, 1987) and the risk of their confusion with a complementary partner. In 16 out of 32 anticodon pairs, complementary anticodons are easily distinguished under any of the four tRNA recognition scenarios (Table 3). Remarkably, all abiotically synthesized amino acids (that is, Ala, Gly, Asp, Val, Leu, Glu, Ser, Ile, Thr and Pro) fall into this group. To achieve this by chance alone is exceedingly improbable.
Origin of the two modes of tRNA recognition is consistent with the stereochemical affinity of amino acids to their own coding triplets
In pools of RNA aptamers that were selected from presumably random sequences to bind to specific amino acids, the aa-binding sites contained cognate codons and/or anticodons at frequencies considerably greater than expected (Caporaso et al., 2005; Yarus et al., 2005). This striking association was reported for seven amino acids: Arg, Trp, His, Tyr, Ile, Leu and Phe. Nonspecific aptamers with cognate triplets and an affinity to the hydrophobic L-valine side chain have also been selected (Majerfeld and Yarus, 1994).
Six amino acids with the presumed stereochemical affinity to their own coding triplets—Arg, Trp, His, Tyr, Val and Ile—come from pairs of complementary anticodons 5′NCR3′ × 5′YGИ3′ and 5′NAY3′ × 5′RYИ3′ (red frame in the middle of Table 3). It is the minor/major, 5′ × 3′ (yellow/blue), scenario of tRNA recognition that gives these six amino acids the lowest risk of confusion with complementary partners. Furthermore, in their aptamers, the Arg- and Tyr-binding sites contain not only a coding triplet per se, but also all of the pentanucleotides (or their complements) of the anticodon loop. This makes the confusion of Arg and Tyr with their complementary partners (Ser and Ile, respectively) unlikely, if and only if the correct tRNA-recognition scenario (5′ × 3′ in these two cases) is used. These pentanucleotides are 5′UGUAG3′ for the Ile × Tyr pair and 5′UCGAA3′ for the Arg × Ser pair (Figures 4a and b).
Remarkably, the above correlation does not seem to hold for the eight pairs of complementary anticodons 5′NCY3′ × 5′RGИ3′ and their amino acids, Gly, Ala, Thr, Pro and Ser (at the bottom of Table 3). To be more precise, neither positive nor negative results have been reported yet for this group of presumably the earliest amino acids in terms of attempts to select aa-binding RNAs. Yet, even the complete lack of stereochemistry between these amino acids and their cognate triplets is not discouraging. On the contrary, we would expect this if, as we proposed, the recognition of the corresponding pairs of tRNAs was encoded originally in the acceptor stem, was from the major groove side (that is, under the 3′ × 3′ (blue/blue) scenario), and was independent of the anticodon domain. The strong dependence appears later, in the 5′ × 3′ minor/major (yellow/blue) scenario, and thus substantiates the very existence of the two complementary modes of tRNA recognition under aminoacylation.
Furthermore, if indeed the anticodon and first three paired bases in the acceptor helix had a common origin, then the ancient operational code contained not only proto-anticodons on one strand, but necessarily proto-codons on the opposite strand. However, the updated analysis of dual complementarity clearly points to a significant ambiguity of this ancestral double-stranded code (Rodin and Rodin, 2006a). Therefore, it seems reasonable that, in addition to examining individual amino acids for stereochemical affinities to their anticodons or codons, we also test (by both modeling and SELEX-like experiments) the binding preferences between (1) groups of similar amino acids and their coding triplets (Rodin and Rodin, 2006a), and (2) amino acids and codon–anticodon pairs (rather than just individual codons or anticodons) (Patel, 2007). Note, in this regard, that the Arg-binding site of the Tetrahymena group I self-splicing introns is located in the major groove of an rRNA precursor's P7 helix, with codon and anticodon triplets opposing each other on the complementary strands (Yarus, 1991, 1993). This said, when we speculate on the early coevolution of the two codes, it might be even more tempting to apply the above tests to the amino acids that are represented by pairs of complementary anticodons 5′RGN3′ × 5′ИCY3′, that is, the eight blue/blue pairs at the bottom of Table 3.
It is probable that the pairs of anticodons 5′NAR3′ × 5′YUИ3′ (at the top of Table 3) do not conform to the pattern because, again, they include stop codons and the amino acids that most probably entered translation relatively late (Phe and Gln). Gln is particularly indicative here because, as we have already mentioned, it is an indirect addition to the genetic code and it is the only amino acid whose binding sites in selected RNA aptamers do not contain coding triplets (Caporaso et al., 2005; Yarus et al., 2005).
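The grouping of complementary anticodon pairs used throughout this section can be checked by simple enumeration. The sketch below is illustrative only: the function names and group labels (`NCY`, `NCR`, `NAY`, `NAR`) are assumptions introduced here, each pair is represented by the member with a central C or A, and no amino acid assignments or risk scores from Table 3 are included.

```python
# Minimal sketch: enumerate all complementary anticodon pairs and sort them into
# the four types discussed in the text. Labels and function names are hypothetical.
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(triplet: str) -> str:
    """Return the complementary triplet, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(triplet))


def classify_pairs() -> dict:
    """Group each (anticodon, partner) pair by central base and R/Y of the third base."""
    groups = {}
    for bases in product("ACGU", repeat=3):
        anticodon = "".join(bases)
        if anticodon[1] not in "CA":
            continue  # central G or U: this triplet appears as the partner of another
        third = "Y" if anticodon[2] in "CU" else "R"
        label = f"N{anticodon[1]}{third}"
        groups.setdefault(label, []).append((anticodon, reverse_complement(anticodon)))
    return groups


if __name__ == "__main__":
    for label, pairs in sorted(classify_pairs().items()):
        print(label, len(pairs), pairs[:2])
```

Under these assumptions each of the four types contains 8 pairs (32 in total), which agrees with the counts in the text: eight 5′NCY3′ pairs, sixteen RY/YR-containing pairs (5′NCR3′ plus 5′NAY3′) and eight 5′NAR3′ pairs.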
Mimicry between two p-aaRSs and their ribozymic precursors
Most remarkably, for pairs of RY- and YR-containing anticodons (that is, pairs of type (ii) in the subcode for two aminoacylations), it is the 5′ × 3′ scenario which (1) is consistent with the earliest 3′ × 3′ scenario, (2) is more secure than the mirror 3′ × 5′ scenario and (3) is what is actually observed in extant class I and class II p-aaRSs. This crucial nonequivalence between the 5′ × 3′ and 3′ × 5′ scenarios suggests that the revealed subcode for two aminoacylations (Figure 3), and the existence of two complementary versions of p-aaRSs, must have been evolutionarily connected through the direct ribozymic precursors of p-aaRSs.
This intriguing connection highlights the importance of molecular mimicry in the RNA → RNP (RNA plus protein) transition (Nakamura, 2001; Liang and Landweber, 2005; Delarue, 2007), and strongly supports our earlier hypothesis that the two complementary recognition patterns of acceptor stems by the class I and class II p-aaRSs were inherited from the two isofunctional ribozymes (Rodin et al., 1996). This is also relevant to the possible origin of the two p-aaRSs from the complementary strands of the same ancestral gene (Rodin and Ohno, 1995) that, conceivably, directly recapitulates the preceding complementarity of the two r-aaRSs. Three observations strongly support this hypothesis: the nonrandom complementarity of signature motifs from the class I and II catalytic domains that are aligned in a ‘head-to-tail’ orientation (Rodin and Ohno, 1995); the real precedent of sense–antisense coding of class I and II aaRS homologs (Carter and Duax, 2002); and the recent artificial creation of the 130-residue minimal catalytic domain of TrpRS, which perfectly fits the minimal catalytic domain of class II aaRSs complementarily and retains the ability to specifically activate tryptophan (Pham et al., 2007). Moreover, our present analysis shows that this r-aaRS → p-aaRS succession was quite efficient, even without a ‘color change’: class I p-aaRSs directly followed their minor groove side, and class II p-aaRSs directly followed their major groove side ribozymic forerunners.
Also, Watson–Crick pairing-based recognition of the operational proto-code by r-aaRSs might imply a local distortion of the acceptor helix. Interestingly, interactions of typical class I protein aaRSs with tRNAs do cause serious changes of the acceptor stem end, including unwinding and disruption of base pairing (Rould et al., 1989; Carter, 1993).
Initially, the ancestors of the two p-aaRSs could have played a chaperone role by protecting the acceptor stem from both sides (Ribas de Pouplana and Schimmel, 2001a, 2001b). As to their participation in coding, the updated analysis of the dual complementarity indicates that p-aaRSs began to replace isofunctional ribozymes long before all 64 codons received their final assignment, and yet most likely only after the complementary core of the code (Figure 3) had been established (Rodin and Rodin, 2006a, 2006b). Since then, duplications of tRNA and p-aaRS genes, and their very specific coevolution, might have gradually reduced the code's ambiguity, as outlined in Ribas de Pouplana and Schimmel, 2001a, 2001b; Carter and Duax, 2002; Rodin and Rodin, 2006a, 2006b; Schimmel and Beebe, 2006; and Pham et al., 2007.
NCN and NUN codons: the first choice between two aminoacylation modes
Although the four scenarios of tRNA recognition in Table 3 have been evaluated for pairs of complementary anticodons, strictly speaking the ‘plus’ mark does not necessarily imply their simultaneous entry into the coding system. This means that the gamut of choices (colored green in Table 3) selectively favors any scenario for the formation of the genetic code that takes into consideration the partitioning of aaRSs into two classes. The recent phenomenological model of progressive differentiation-like reduction of codon ambiguity (Delarue, 2007) is no exception. This elegant model is also based on the pattern of tRNA aminoacylation by class I and II aaRSs, similar to the pattern in Table 2 except for its third (NAN) column. However, in contrast to our complementarity-based model, Delarue (2007) interpreted this pattern as a binary decision tree, emphatically asymmetric, as in a longitudinal differentiation process. Each decision is a choice between two options (first r-aaRSs, then their protein successors, p-aaRSs), but the reason why the minor (yellow) or major (blue) groove side is preferred in each particular case (step) remains unclear. In fact, the subcode for two complementary aminoacylations (Figure 3) and its selective evaluation (Table 3, Figure 4) could provide such an explanation. These two models will be compared in detail elsewhere; here, we focus on their mutual benefits associated with the choice between class I and II aaRSs, which is the first choice in the model of Delarue (2007).
This choice is really self-evident (Table 2): the central C in codons (G in anticodons) leads to tRNA recognition from the major groove side (blue), whereas the central U in codons (A in anticodons) leads to tRNA recognition from the minor groove side (yellow). Recognition of tRNAs by r-aaRSs was likely based on complementary interaction with anticodons. However, this interaction looks absolutely symmetric with regard to G–C vs C–G or A–U vs U–A pairings. Therefore, the choice itself says nothing about why it should involve the first two columns of the genetic code table and not, for example, the third and fourth columns. The answer comes from our analysis of the risk of confusing complementary anticodons that is associated with adjacent 3′U and 5′A/G nucleotides. If our (or Nature's) primary motivation is to minimize the risk of confusion, then the optimal distribution of amino acids and cognate triplets among the two modes of tRNA recognition must necessarily be the one shown in Table 3, that is, one that starts from the major/major groove side (blue/blue) pairs of complementary triplets.
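The first choice described above amounts to a lookup on the central codon base. The following sketch encodes only the rule stated in this paragraph; the function name `groove_side` is a hypothetical label, and codons with a central A or G are deliberately left undecided because the rule quoted here covers only the NCN and NUN columns.

```python
# Minimal sketch of the first choice between the two aminoacylation modes.
# Only the NCN/NUN rule stated in the text is encoded; the function name is hypothetical.
from typing import Optional


def groove_side(codon: str) -> Optional[str]:
    """Return the recognition side implied by the central codon base, if stated."""
    central = codon[1].upper().replace("T", "U")
    if central == "C":
        return "major"   # NCN codons (central G in anticodons), blue in Table 2
    if central == "U":
        return "minor"   # NUN codons (central A in anticodons), yellow in Table 2
    return None          # NAN and NGN columns are not covered by this rule


if __name__ == "__main__":
    for codon in ("GCC", "GUC", "GAC"):
        print(codon, groove_side(codon))
```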
If we again consider the possible recruitment of the GUC(Val)–GAC(Asp) pair into a primitive sense–antisense translation as the C → U–G → A derivative of the GCC(Ala)–GGC(Gly) pair (Figure 2a), then the two complementary expansions of the code, Ala → Val and Gly → Asp, may seem equivalent with respect to the fidelity of tRNA aminoacylation by r-aaRSs, but they are not (Figure 5). Indeed, r-ValRS, with its putative anticodon-binding GUC site, can recognize not only its cognate GAC anticodon but also the Ala anticodon (GGC) because of U:G wobble pairing (Figure 5). Importantly, such confusion of new (Val) and old (Ala) amino acids would pleiotropically affect all ‘old’ Ala codons, not just the one mutated (‘new’) individual codon. In contrast, r-AspRS, with its GAC site, is unable to recognize the Gly anticodon (GCC) because of A*C mispairing (Figure 5). In this case, G:U wobble recognition also occurs, but it comes from the ‘old’ r-GlyRS, not from the ‘new’ r-AspRS, and therefore does not carry the risk of pleiotropic negative effects of Gly → Asp in many old Gly codons.
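The asymmetry between the Ala → Val and Gly → Asp expansions can be illustrated with a small pairing check. This is a minimal sketch under stated assumptions: each r-aaRS is modeled as nothing more than a triplet site that pairs antiparallel with the anticodon, G:U and U:G are treated as tolerated wobble pairs, and any other non-Watson–Crick contact blocks recognition. The function names are hypothetical.

```python
# Minimal sketch: antiparallel pairing between a putative r-aaRS binding site
# and an anticodon, with G:U wobble tolerated. Names and the recognition
# criterion are illustrative assumptions, not the authors' model.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing(site: str, anticodon: str) -> list:
    """Classify each contact; site position i faces anticodon position len-1-i."""
    result = []
    for s, a in zip(site, reversed(anticodon)):
        if (s, a) in WATSON_CRICK:
            result.append("WC")
        elif (s, a) in WOBBLE:
            result.append("wobble")
        else:
            result.append("mismatch")
    return result


def can_recognize(site: str, anticodon: str) -> bool:
    """Assume recognition is possible when no contact is a clear mismatch."""
    return "mismatch" not in pairing(site, anticodon)


if __name__ == "__main__":
    print("r-ValRS site GUC vs Ala anticodon GGC:", pairing("GUC", "GGC"))
    print("r-AspRS site GAC vs Gly anticodon GCC:", pairing("GAC", "GCC"))
    print("r-GlyRS site GGC vs Asp anticodon GUC:", pairing("GGC", "GUC"))
```

Under these assumptions the new r-ValRS site tolerates the old Ala anticodon through a central U:G wobble, whereas the new r-AspRS site meets an A*C mismatch with the old Gly anticodon; the only wobble in the Gly/Asp pair arises from the old r-GlyRS site facing the new Asp anticodon.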
To avoid multiple mishaps along the above (Ala → Val) lines, a principally different mode of tRNA recognition is needed that would make r-ValRS much less (if at all) confusable with the already established r-AlaRS (this is apparently not required in the case of r-AspRS). And this is precisely what happens in reality: AspRS is of the same type as GlyRS—major (blue) groove type—whereas ValRS adopted the new, minor (yellow) groove, mode of tRNA recognition that safely distinguishes it from AlaRS (major groove).
Switching of r-aaRSs from the major to minor groove sides implies spreading out of tRNA recognition in the opposite (from anticodons) direction (Figure 4). Inevitably, the flanking positions of anticodons (and complementarily codons) will be replaced under rules (i) and (ii) of the subcode for two aminoacylations: the first position is changed for the third position and vice versa.
The assignment of the minor and major groove sides of tRNA recognition to the (NUN) and (NCN) columns of codons (Table 2) steers further choices in the evolution of the genetic code toward the route that was actually taken, by leaving only two options for each type of complementary pair (shown by black frames in Table 3) out of four conceivable ones. In fact, this differentiation of NYN into blue NCN and yellow NUN, likely predetermined by the primal choice of the 3′ × 3′ scenario for early amino acids such as Ala and Gly (Figure 5), makes the fourth, 3′ × 5′, scenario very unlikely, and the advantages of the second, 5′ × 3′, scenario even more convincing (Table 3). Thus, the asymmetric differentiation-based model (Delarue, 2007) and our ‘symmetric’ complementarity-based model (Rodin and Rodin, 2006b) supplement, rather than contradict, each other.
The genetic code-shaping processes—coevolution of anticodons with their putative duplicates in the acceptor stem of tRNAs, and the transition from r-aaRSs to p-aaRSs with the same two modes of tRNA recognition—are apparently interrelated. Both suggest that the codon repertoire is likely expanded by means of complementary pairs. This is beautifully reflected in the yin-yang-like mirror symmetric pattern of tRNA aminoacylation that is revealed in the genetic code table after its complementary transformation (Figure 3). In fact, the possible in-frame coding of two p-aaRSs by the two complementary strands of the same ancestral gene represents perhaps the most important variation on the theme.
Yet, our higher-resolution analysis of the mirror symmetric pattern (Figure 3) revealed the fundamental nonequivalence of different pairs of complementary anticodons, as far as the risk of their confusion during aminoacylation is concerned. The cause of such errors is located in the anticodon-flanking nucleotides U and R. These two nucleotides are almost invariant; hence, they do not affect the aa-specificity of tRNAs. This was also most likely the case for primordial tRNAs that were aminoacylated by r-aaRSs. However, Table 3 and Figure 4 show how important these U and R nucleotides can become if the risk of confusion of tRNAs is taken into account during the recognition of tRNAs by aaRSs from the minor and major groove sides. The cost of such confusion is the highest for complementary triplets, because they, more often than not, encode very different amino acids. It seems reasonable, therefore, to assume that each aa-specific tRNA had both aminoacylation options (either by minor groove or major groove r-aaRSs) available at first, and the priority in choosing the correct option was a lower risk of confusion with its complementary partner.
When looked at from this point of view, the two complementary modes of tRNA recognition by aaRSs (from the minor vs major groove sides) are not perfectly symmetric (Figure 4 and Table 3). Of particular interest in this respect is the difference between the two groups of complementary RR- and YY-containing pairs of anticodons, that is, the 5′RGN3′ vs 5′ИCY3′ pairs, which represent early amino acids, and the 5′NAR3′ vs 5′YUИ3′ pairs, which represent later amino acids. The example in Figure 6 shows why recognition confusion is virtually impossible for early amino acids and very likely for later amino acids. Not surprisingly, the stop codons UAA and UAG belong to the later group.
In conclusion, we believe that the uncovered subcode for the two tRNA recognition modes (from the minor and major groove sides)—the subcode that is essentially associated with complementary anticodons and their adjacent U and R nucleotides—represents an ancient and very important milestone in the history of life. Furthermore, our analysis suggests that the two complementary modes of tRNA aminoacylation, mediated by the ancient ribozymes, constitute the missing link between the two fundamental components of the genetic coding system: the classic code embodied in the anticodon and the operational code embodied mostly in the acceptor stem. Originally, in conformity with an updated dual complementarity and the 3′ × 3′ scenario of tRNA recognition for earliest pairs of amino acids (Table 3), the anticodon loop structure might have evolved to fit an operational code (with a fixed second base in the acceptor stem) rather than the other way around. In this sense, the presumable antiquity of the operational code (Schimmel et al., 1993) is compatible with the logical primacy of anticodons (Szathmary, 1999; Rodin and Rodin, 2006b). This is also consistent with the model of early coding pentanucleotides 5′URNYA3′ that provided archaic ribosome-free translation (Crick et al., 1976).
We thank Paul Schimmel, Charles Carter, Eors Szathmary, Apoorva Patel and Massimo Di Giulio for many thought-provoking discussions and valuable suggestions. We also thank anonymous reviewers for many useful criticisms and suggestions that have enhanced the quality and readability of the manuscript. Finally, we are greatly indebted to Keely Walker, Sarah Cheung and Christine Foreman for work on the manuscript.