Translation of non-standard codon nucleotides reveals minimal requirements for codon-anticodon interactions

The precise interplay between the mRNA codon and the tRNA anticodon is crucial for ensuring efficient and accurate translation by the ribosome. The insertion of RNA nucleobase derivatives in the mRNA allowed us to modulate the stability of the codon-anticodon interaction in the decoding site of bacterial and eukaryotic ribosomes, allowing an in-depth analysis of codon recognition. We found the hydrogen bond between the N1 of purines and the N3 of pyrimidines to be sufficient for decoding of the first two codon nucleotides, whereas adequate stacking between the RNA bases is critical at the wobble position. Inosine, found in eukaryotic mRNAs, is an important example of destabilization of the codon-anticodon interaction. Whereas single inosines are efficiently translated, multiple inosines, e.g., in the serotonin receptor 5-HT2C mRNA, inhibit translation. Thus, our results indicate that despite the robustness of the decoding process, its tolerance toward the weakening of codon-anticodon interactions is limited.

This manuscript describes experiments designed to test the role of hydrogen bonding in codon-anticodon formation during decoding in the ribosomal A site in bacteria and mammalian cells. The bacterial experiments were performed with an in vitro translational system and the mammalian experiments were done in vivo in human HEK293 cells. The research design involved insertion of nonstandard nucleotides into mRNAs that were then translated by normal tRNAs. The experiments clearly show that some hydrogen bonds are necessary for recognition, especially in the first two codon positions; the wobble position was less dependent on hydrogen bonding to form a codon-anticodon interaction and presumably relies on stacking forces, although this could not be tested explicitly in this system. A second part of the manuscript tests the role of inosines edited into certain eukaryotic mRNAs, showing that substitution of inosine for adenine has little effect on how the mRNA is decoded (the amino acids inserted) but that the presence of multiple inosines drastically reduces the rate of translation of the edited mRNA. Overall, this is a very useful set of experiments that for the first time tests the roles of specific hydrogen bonds in codon recognition.
The description of the data, however, is somewhat sloppy, with the authors claiming that certain nonstandard nucleotides are invariably interpreted as equivalent to a single standard nucleotide. This occurs on pages 8-9 and includes claims that zebularine is decoded as C, resulting in incorporation of leucine at ZeUU codons (Tables S2 and S3 show that isoleucine is incorporated instead of leucine; this could simply be a typing mistake, but it occurs in both tables). The authors state that UZeU is partly decoded as Phe but only in the bacterial system, despite Table S3 listing Phe as a decoded amino acid in HEK 293. Further on, they state that in the AUP codon, P (purine) is recognized as A, whereas in Table S2 the codon is shown specifying Ile or Met, showing that P is recognized as A or G. Later, on page 9, they state that DAP (2,6-diaminopurine) in the codon AUDap cannot be decoded as Met in HEK 293 cells despite Table S3 showing a peptide that incorporated Met in response to AUDap.
The authors also make a strong argument for bacteria being "less restrictive towards modified nucleotides", by which, in context, they mean that there are fewer alternative interpretations of the nucleotides (for example, as A or G) in the HEK 293 data than in the bacterial data. That conclusion rests on determining the amino acid sequence of synthesized peptides, and the amount of data from the proteins produced in vivo in HEK 293 cells is far less than from the bacterial experiments. The alternative interpretation of the non-standard nucleotides is often much weaker, producing much less peptide, so it may be that the alternatives were not observed because of lack of data. The authors should either justify their conclusion or reduce their claim for "less restrictive" translation in bacteria.

Specific comments:
Page 8, line 11 up: "degenerative" should be "degenerate".
Page 9, line 16: "ac4C34" should be "ac2C34".
Page 10, line 13: "ribosomen" should be "ribosome".
Page 12, line 12: the use of "peculiar" here is not needed; although the word can mean "particular or special", it is much more commonly understood to mean "strange or odd". I doubt that the authors mean that.
Page 12, line 4-5 up: the statement that the eukaryotic system might be more accurate (which hasn't been well determined for the human system used here; the reference is to accuracy in yeast) as a result of longer proteins in eukaryotes is highly speculative. The length of the proteins isn't the reason eukaryotes might be more accurate but is an adaptationist's explanation for the observation. There are probably many reasons why the situation is as it is, but sheer protein length is only one of them and doesn't address the mechanistic reason underlying increased accuracy.
Page 13, line 9 up: "the codon-anticodon helix is not as delicate and fragile as previously expected". No reference is provided to show that this was the expectation and, more importantly, the phrase is far too informal for a paper like this. What, in a technical sense, does "delicate and fragile" mean?
Response #3: We agree that this phrasing was misleading. We formulated the sentence more precisely (p. 10, line 5): "We found that single inosines did not affect the yield of the translated peptide product (Figure 4D). As expected, the inosines were exclusively decoded as G, resulting in an amino acid change from Gln to Arg and from Asn to Ser in the case of the modified 5-HT2CR and GluR-B mRNAs, respectively (Figure 4E and Table S3)."

Comments to reviewer #2:
Comment #1: The authors study the translation efficiency and selectivity of amino acid incorporation using artificial codons containing several types of base analogs, focusing on hydrogen bond interactions between codon and anticodon. The finding that amino acid selectivities vary between prokaryotic and eukaryotic systems will interest researchers who use artificial codon and anticodon systems. However, it seems to me that the study is based only on the number of hydrogen bonds, which makes it difficult to scrutinize the relationship between translation and the codon-anticodon interaction. Since the data are useful for researchers working on translation, the manuscript should be published in a more specialized journal, such as Nucleic Acids Res.
Response #1: Thank you for reviewing our manuscript. To our knowledge, our study is the first to systematically address the codon-anticodon interaction during protein synthesis in a natural setting by using artificial codons. Although some limitations are present (see comments below), we think that our data can scrutinize the relationship between the codon-anticodon interaction and decoding. Since decoding is such a central process in every living cell, this manuscript will be of interest to a broad readership. As you pointed out, these results might also be relevant for other biological processes that depend on nucleobase interactions.

Comment #2: The strength of hydrogen bonds at each position between pairing bases is different, and thus a more quantitative study of hydrogen bonds is required. For example, in replication, the incorporation efficiencies of 6-methoxypurine-thymine, 2-aminopurine-cytosine, and 6-methoxypurine-cytosine differ tenfold from one another, even though each forms only one hydrogen bond (Bioorg. Med. Chem. Lett., 12, 1391, 2002).
Response #2: Thanks for raising a valid point about the number of H-bonds. As you state, the base pairs 6-methoxypurine-thymine, 2-aminopurine-cytosine, and 6-methoxypurine-cytosine each form just one H-bond yet show different incorporation efficiencies during replication. However, this seems to be partly explainable by the position of the H-bond. The 6-methoxypurine-thymine pair forms a central H-bond between N1 (purine) and N3 (pyrimidine) and was reported to have the highest incorporation efficiency of the three base pairs, whereas the 2-aminopurine-cytosine pair, which forms its H-bond between the 2-amino group and the carbonyl oxygen at C2, is less efficiently incorporated into DNA (Hirao et al., 2002). This position-dependent effect is also observed in our study: a single central H-bond leads to high product yields only if it is formed between N1 (purine) and N3 (pyrimidine). For pyridone or c1A, which provide the single H-bond at a different location, translation efficiencies were drastically reduced. We changed the manuscript accordingly: 1.) p. 12, line 12: "Translation of the respective codon is only modestly impaired when H-bonds between the purine-N1 and the pyrimidine-N3 at the first two codon positions are formed. In the cases of pyridone or c1A, by contrast, the single H-bond is at a different location and translation efficiencies are drastically reduced.
The only exception was the translation of single purines within an AAA codon (Lys) in HEK293T cells.
[…]" 2.) p. 13-14, line 33: "Over the last decades, different nucleotide analogs and base pairs have been screened for their ability to form stable W-C base pairs, predominantly during replication and transcription 68,69.
In line with our findings, most of these nucleotide derivatives provided the structural prerequisites to form at least one H-bond. Interestingly, the pair 6-methoxypurine-thymine forms a central H-bond between N1 (purine) and N3 (pyrimidine) and was reported to have the highest incorporation efficiency (among the three mentioned base pairs), while the 2-aminopurine-cytosine pair, which forms the H-bond between the 2-amino group and the carbonyl oxygen at C2, is less efficiently incorporated into DNA 69. The only exceptions are fluorine-containing bases: although they do not form H-bonds, they were incorporated during DNA replication 70,71. More recently, an artificial base pair was identified that did not depend on H-bond interactions but still allowed efficient transcription and subsequent translation 44. In this artificial base pair, the components are highly hydrophobic and their interaction leads to a base pair isosteric to W-C pairs. The missing H-bonds can most likely be compensated by other hydrophobic and stacking interactions 72. In contrast to these studies, we systematically eliminated potential H-bond partners only on the mRNA codon side of the codon-anticodon helix, revealing the robustness of the decoding process in an authentic setting for protein synthesis. Clearly, changes within purines or pyrimidines also affect polarity, stacking and the syn/anti equilibrium and can lead to steric effects. These parameters contribute to the binding strength in a complex manner and would require extended and complex quantum-mechanical calculations as well as precise crystallographic structures to provide a satisfactory energy balance of their contributions."

Comment #3: In addition, for the codon-anticodon interaction, the authors should consider not only hydrogen bonds but also the polarity of the bases, stacking, and steric effects, including hydration, in base pairing. Furthermore, the recent study using hydrophobic unnatural base pairs (ref. 44) shows that a base pair without hydrogen bonds can be used for the codon-anticodon interaction. However, the authors do not discuss this in depth. There is a contradiction between the authors' idea and Romesberg's data.
Response #3: We agree that more than hydrogen bonding is responsible for the formation and stability of Watson-Crick base pairs. We are aware that changes within purines or pyrimidines also affect polarity and stacking and can lead to steric effects. However, an analysis of the translation products does not allow us to dissect the exact cause of either lower protein yields or misincorporations. In our opinion, even a deeper analysis, in the absence of extended and complex quantum-mechanical calculations, might not be able to dissect the complex interplay among H-bonds, dipole orientation, stacking, sterics and stereoelectronics. A sentence on this point has been added (p. 14, line 15).
With respect to Romesberg's data: indeed, the unnatural base pair in this work (Zhang and Romesberg, 2015) allows the establishment of an intact codon-anticodon interaction. In contrast to our study, the authors screened for a functioning base pair instead of altering only one interaction partner. By combining two novel partners, the lack of H-bonds might be compensated by hydrophobic interactions or other interacting forces. Defining rules for a functioning codon-anticodon base pair does not seem trivial, since >200 compounds were screened (Zhang and Romesberg, 2018) to find a suitable pair. In contrast, we aimed at defining and determining the interactions that allow base pairing within a Watson-Crick or wobble geometry. As you suggested, it would be interesting to also systematically alter the tRNA side in order to find a compensating modification (see comment below). We added a paragraph to the main text to address this and briefly discuss the possibility of base pairing even in the complete absence of H-bonds (p. 13, line 33; see comment #2).
Comment #4: 1. Opposite the abasic site, pyrene might be better than the natural bases as a pairing partner (Nature, 399, 704, 1999.). The limitation of the authors' current research is that they use only codon modifications in mRNA. For more precise research, modified anticodons in tRNA should be examined.

Response #4:
We initiated this project to investigate the effects of mRNA modifications on translation.
Therefore, we started modulating the codon-anticodon interaction by altering the mRNA codon nucleotides. We do not, however, consider this a limitation, because it allows us to study an artificial mRNA in a natural environment. Nevertheless, we fully agree that also altering the helix with nucleotide derivatives within the anticodon would be very interesting. The first chemically synthesized tRNA carrying all natural modifications (interacting with unmodified and modified mRNAs) was already employed in this study. This is definitely the starting point for extending the investigation of codon-anticodon interactions to "the view" of the tRNAs.
However, this is currently beyond the scope of this manuscript.

Comment #5: The authors should discuss the possibility of syn conformation, especially for Benz.
Response #5: Indeed, this possibility is difficult to exclude, although it is doubtful that it could contribute to isosteric base pairing with natural tRNAs. However, the equilibrium between anti and syn may contribute to the binding efficiency. This is now alluded to in the text (together with comment #2).

Comment #6: In the Fig. 1 legend, there is no indication of the difference between the black and gray colors of the base pairs (probably black for mRNA and gray for tRNA).
Response #6: Thank you for bringing this to our attention. In order to highlight the structures of the modified derivatives and to (optically) tone down the standard nucleotides, we chose to depict them in black and grey, respectively. We added this information to the legend of Fig. 1: "The modified and standard nucleobases are depicted in black and grey, respectively."

Comment #7: 4. Probably, Supplementary Figures S5, S6, and S7 are not cited in the main text.
Response #7: Thank you. We included the reference to Figures S5 (p. 7, line 17

Response #1: Thank you for those positive comments on our work.

Comment #2:
The description of the data, however, is somewhat sloppy with the authors claiming that certain non-standard nucleotides are invariably interpreted as equivalent to a single standard nucleotide.
Response #2: We agree that we did not comment on rather rare misincorporation events in the main text. Our intention was not to bloat the manuscript with small effects. We now included them in the main text (see below).

Comment #3: This occurs on pages 8-9 and includes claims that zebularine is decoded as C, resulting in incorporation of leucine at ZeUU codons (Tables S2 and S3 show that isoleucine is incorporated instead of leucine; this could simply be a typing mistake, but it occurs in both tables).
Response #3: The comment is absolutely justified. Because Ile and Leu have exactly the same molecular mass, MS analysis cannot differentiate between them, and the software annotated the respective amino acid as Ile. Since quantitative incorporation of Ile would require a pyrimidine-pyrimidine base pair (Ze-U at the first codon position), the respective amino acid can only be interpreted as Leu. This is in line with our previous work demonstrating that Ze is exclusively interpreted as C in the first codon position (Hoernes et al., 2018). However, to reduce ambiguity, we clarified these entries.

Comment #4: The authors state that UZeU is partly decoded as Phe but only in the bacterial system, despite Table S3 listing Phe as a decoded amino acid in HEK 293.
Response #4: We included the findings from HEK293T cells in the main text in order to provide a more precise picture. Further, we included a reference to Table S4, which provides the quantities of all identified peptide species (p. 8, line 10).
"In bacteria, UZeU was also partly decoded (~8%) as a phenylalanine (Phe) codon, indicating that Ze can also base pair with A to a limited extent (Figure 2I and Table S2). Although the Ze-A base pair is also observed in HEK293T cells, it is less abundant than in bacteria (Figure 2J and Tables S3 and S4)."

Comment #5: Further on, they state that in the AUP codon, P (purine) is recognized as A, whereas in Table S2 the codon is shown specifying Ile or Met, showing that P is recognized as A or G. Later, on page 9, they state that DAP (2,6-diaminopurine) in the codon AUDap cannot be decoded as Met in HEK 293 cells despite Table S3 showing a peptide that incorporated Met in response to AUDap.
Response #5: In order to condense the discussion, we did not comment on low-level decoding effects. However, your point is valid. In bacteria, AUP was decoded to ~2% as Met and to ~98% as Ile. In HEK cells, AUDap was decoded to ~98% as Ile and only ~2% as Met. Although we do not discuss these decoding events in detail, we summarize these effects in Table S4. We rephrased the main text so as not to mislead the readers (p. 9, line 2): "In accordance with the results obtained when P was within the first two codon nucleotides, this base derivative was decoded almost exclusively as an A in the AUP codon (Figure 3C and D, Tables S2, S3 and S4)."

Comment #6: The authors also make a strong argument for bacteria being "less restrictive towards modified nucleotides", by which, in context, they mean that there are fewer alternative interpretations of the nucleotides (for example, as A or G) in the HEK 293 data than in the bacterial data. That conclusion rests on determining the amino acid sequence of synthesized peptides, and the amount of data from the proteins produced in vivo in HEK 293 cells is far less than from the bacterial experiments. The alternative interpretation of the non-standard nucleotides is often much weaker, producing much less peptide, so it may be that the alternatives were not observed because of lack of data. The authors should either justify their conclusion or reduce their claim for "less restrictive" translation in bacteria.
Response #6: Thank you for pointing this out; we agree with your comment. Our interpretation rests on the ratio of "miscoded" to "wild-type" peptides, which does not depend on the quantity of the produced peptides. Nevertheless, we cannot exclude that we miss some very low-level variants due to the limited amounts produced in the HEK293T system. In order to determine the actual detection limit of the MS/MS analyses, especially for our HEK293T samples, we calculated the amounts of variants that we would be able to detect within each of our samples with the employed setup. To identify a C-terminal peptide representing a variant with high confidence, we require an MS/MS intensity of at least 4.025E+05. By taking the signal of the C-terminal peptides of our samples into account, we could determine the detection limit, which varies for each sample: the higher the peptide yield, the better the sensitivity (see below). Thus, we are confident that we could detect low-level variants, supporting our notion that miscoding is more prevalent in the bacterial in vitro system than in HEK293T cells. In addition, we commented on this in the discussion (p. 12, line 31): "Notably, due to the lower amounts of purified translation products from HEK293T cells, we cannot completely exclude the existence of low-level peptide variants (below the detection limit; typically <1%)."

MS/MS intensity necessary to identify a C-terminal peptide at peak maximum: 4.02E+05

Table: Spectrum | MS/MS intensity (C-terminal peptide at peak maximum)
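The detection-limit argument in Response #6 reduces to simple arithmetic, sketched below for illustration. It assumes (our assumption, not a stated method) that a variant is identifiable only when its own MS/MS intensity reaches the 4.025E+05 threshold, that variant and wild-type C-terminal peptides ionize equally well, and that decoding percentages such as ~98% Ile and ~2% Met are intensity shares of all identified peptide species. The function names and the intensity values in the usage section are hypothetical and are not measured data.

```python
import math

# Minimum MS/MS intensity (at peak maximum) needed to identify a C-terminal
# peptide with high confidence, as stated in Response #6.
MIN_IDENT_INTENSITY = 4.025e5


def _check_positive(value: float) -> None:
    """Reject zero, negative, NaN, or infinite intensities."""
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"intensity must be a positive finite number, got {value!r}")


def detection_limit_percent(c_terminal_intensity: float) -> float:
    """Smallest variant share (in %) that could still be identified in a sample
    whose total C-terminal peptide signal is `c_terminal_intensity`.

    Assumption: a variant at share p% yields an intensity of
    p / 100 * c_terminal_intensity (equal ionization of all species).
    """
    _check_positive(c_terminal_intensity)
    return 100.0 * MIN_IDENT_INTENSITY / c_terminal_intensity


def is_detectable(variant_percent: float, c_terminal_intensity: float) -> bool:
    """True if a variant at `variant_percent` would reach the identification threshold."""
    _check_positive(c_terminal_intensity)
    return variant_percent / 100.0 * c_terminal_intensity >= MIN_IDENT_INTENSITY


def variant_percentages(intensities: dict[str, float]) -> dict[str, float]:
    """Share (in %) of each decoded amino acid among all identified peptide species.

    The shares depend only on the ratio of intensities, not on the total yield;
    a lower yield raises the detection limit but does not shift the ratio.
    """
    for value in intensities.values():
        _check_positive(value)
    total = sum(intensities.values())
    return {aa: 100.0 * value / total for aa, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical intensities for one codon, for illustration only.
    sample = {"Ile": 9.8e7, "Met": 2.0e6}
    total_signal = sum(sample.values())
    shares = variant_percentages(sample)
    print(shares)
    print(f"detection limit: {detection_limit_percent(total_signal):.2f}%")
    print("Met variant detectable:", is_detectable(shares["Met"], total_signal))
```

In this sketch, a tenfold higher C-terminal signal lowers the detection limit tenfold, which mirrors the statement that higher peptide yields give better sensitivity, while the variant shares themselves are unaffected by the overall yield.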