DNA makes RNA. RNA makes proteins. The language in which information is transferred from RNA to protein is what we call the 'genetic code'.

This code is read through the interaction between three bases on the messenger RNA, called a codon, and three complementary bases on the transfer RNA, called an anticodon. Under ribosomal guidance, each anticodon pairs with its matching codon, and the amino acids carried by the transfer RNAs are joined into a polypeptide. This process is called translation.

A fundamental feature of organisms is the near universality of the genetic code. With the exception of methionine and tryptophan, each amino acid is encoded by 2-6 different codons, called synonymous codons.
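The redundancy described above can be checked directly. The following is a minimal Python sketch that builds the standard codon table from its conventional U, C, A, G ordering and counts how many codons specify each amino acid. The variable names are illustrative, and '*' is used here to mark a stop codon.

```python
from collections import Counter

# Standard genetic code written in the conventional U, C, A, G ordering.
# Amino acids use one-letter symbols; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 64 triplets, generated in the same order as AMINO_ACIDS.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of codons that specify each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())

for symbol, count in sorted(degeneracy.items(), key=lambda item: item[1]):
    print(symbol, count)
```

In this table only methionine (M) and tryptophan (W) have a single codon, while leucine (L), serine (S) and arginine (R) each have six.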

Interestingly, even though several codon options are available for each amino acid, organisms do not use them randomly. They show a preference for certain codons and overuse them when incorporating a given amino acid into the growing polypeptide chain. This property is called codon bias. The evolutionary reason for the emergence of codon bias is unclear.
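Codon bias can be made concrete by counting how often each synonymous codon is used in a coding sequence. The sketch below is a simplified illustration, not a standard published measure: the function name codon_usage and the toy sequence are hypothetical, and real analyses would pool many genes from one organism.

```python
from collections import Counter, defaultdict

# Standard genetic code in U, C, A, G ordering; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codon_usage(mrna):
    """For each amino acid, return the fraction of its occurrences
    that are encoded by each synonymous codon in the given mRNA."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    counts = Counter(mrna[i:i + 3] for i in range(0, usable, 3))

    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    usage = defaultdict(dict)
    for codon, n in counts.items():
        amino_acid = CODON_TABLE[codon]
        usage[amino_acid][codon] = n / totals[amino_acid]
    return dict(usage)


# Hypothetical toy sequence: leucine appears four times, three times as CUG.
toy_mrna = "AUGCUGCUGCUGUUAUAA"
usage = codon_usage(toy_mrna)
print(usage["L"])
```

A strongly uneven split among the codons for one amino acid, as with CUG versus UUA in this toy example, is what codon bias looks like in the data.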


Even though the genetic code is considered universal, several exceptions to the codon-amino acid mapping exist. In the standard genetic code table, UGA is a 'stop' signal that tells the translation machinery to terminate and release the polypeptide. However, mitochondria employ UGA to incorporate tryptophan into the growing polypeptide chain.

Likewise, in the standard table, AUA codes for isoleucine. However, mitochondria use this codon for methionine. Further, organisms have evolved special coding strategies for incorporating selenocysteine and pyrrolysine.
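The effect of these two reassignments can be illustrated by translating the same message under both tables. The sketch below is a simplification: it changes only the two codons mentioned above (UGA and AUA), whereas real mitochondrial tables differ between lineages and may contain further changes that are omitted here. The function name translate and the toy sequence are hypothetical.

```python
# Standard genetic code in U, C, A, G ordering; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Simplified variant table: only the two reassignments discussed in the text.
MITO_TABLE = dict(STANDARD_TABLE)
MITO_TABLE["UGA"] = "W"  # stop -> tryptophan
MITO_TABLE["AUA"] = "M"  # isoleucine -> methionine


def translate(mrna, table):
    """Translate codon by codon, halting at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy message: AUG AUA UGA UUU UAA
toy_mrna = "AUGAUAUGAUUUUAA"
print(translate(toy_mrna, STANDARD_TABLE))  # MI
print(translate(toy_mrna, MITO_TABLE))      # MMWF
```

Under the standard table the message ends at UGA, while under the simplified variant table the same UGA is read as tryptophan and translation continues to UAA.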

Although understanding organism-specific preferences in codon usage helps improve protein yields, the experimental outcome remains largely unpredictable. To add a new unnatural amino acid to the cellular inventory, one needs a codon that uniquely specifies that amino acid.

Encoding unnatural amino acids

It is possible to plug in synthetic amino acids of choice into an existing genome. © Pawan K. Dhar

Scientists are now able to genetically encode unnatural amino acids beyond the standard set of 20. Modifying the standard genetic code has resulted in a new method of making proteins with new functions. For example, one can make proteins with new side chains that carry fluorophores, metal ions and so on.

More than 70 unnatural amino acids (UAAs) have been incorporated into proteins in model organisms and cell lines. These represent a wide range of structures and functions not found in the natural set of 20 amino acids. Unnatural amino acids with orthogonal chemical reactivities have enabled the site-specific modification of proteins through a diverse 'toolkit' of chemistries.

Although synthetic and semi-synthetic methods have proven useful for incorporating unnatural amino acids into proteins, the yields are, as expected, low and the process is technically cumbersome. Also, it cannot be scaled up in its present form. Future work is expected to improve the targeted integration of unnatural amino acids and the resulting protein output.

Scientists have also demonstrated the feasibility of introducing several incremental and precise modifications into the genome of E. coli such that the genome recoding results in a new functionality. This builds upon similar previous work.

The team developed a multiplex automated genome engineering (MAGE) approach that permits repeated introduction and maintenance of mutations in the cell population. The MAGE approach was used to replace all 314 TAG stop codons with synonymous TAA codons in E. coli. Using MAGE, they confirmed viability for each modification and identified associated phenotypes. Their genome editing methods permit targeted modification of a vast genetic landscape. In theory, one could produce a completely synthetic genome leading to the construction of an organism with a new operating system.
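The sequence-level goal of this recoding, swapping each terminal TAG for TAA, can be sketched on paper. The Python example below is only an in silico illustration of the intended edit and is not the MAGE procedure itself, which works on living cells. The function names and the gene names and sequences are hypothetical.

```python
def recode_stop(cds):
    """Return the coding sequence with a terminal TAG stop codon
    replaced by TAA. Only the final in-frame codon is examined, so an
    out-of-frame 'TAG' inside the gene is left untouched."""
    if len(cds) % 3 == 0 and cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


def recode_all(genes):
    """Recode every gene and report how many sequences were changed."""
    recoded = {name: recode_stop(cds) for name, cds in genes.items()}
    changed = sum(1 for name in genes if recoded[name] != genes[name])
    return recoded, changed


# Hypothetical toy genes (DNA, coding strand).
toy_genes = {
    "geneA": "ATGAAATAG",     # ends in TAG: recoded
    "geneB": "ATGCTAGCTTGA",  # out-of-frame TAG inside, ends in TGA: unchanged
    "geneC": "ATGCTAGCTTAG",  # out-of-frame TAG inside, ends in TAG: only the end changes
}
recoded, changed = recode_all(toy_genes)
print(recoded, changed)
```

Checking only the final in-frame codon, rather than replacing every occurrence of the letters TAG, keeps the protein sequence of each gene unchanged.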


Despite fascinating advances in the large-scale manipulation of genomes, several issues remain. Given that the arrangement of amino acids in the standard genetic code is non-random, we do not know how much of the genetic code can be rewritten. Further, it would be interesting to see how many of the existing redundant codons can be freed up to accommodate new amino acids. Also, how many novel and practical applications are we looking at?

Given that the existing codon to phenotype relationship has been optimized over millions of years, how long will the new protein be functional and stable in the native setting? What are the boundary conditions of adding a new amino acid to the existing genetic vocabulary?

Errors also arise due to mispairing between the tRNA and mRNA templates. Will such errors reduce or increase the fitness potential of the synthetic genetic code?

Several models have been proposed to understand the evolution of the genetic code: (a) steric complementarity, which suggests specific interactions between amino acids and triplet codons or anticodons; (b) frozen accident, which suggests a random fixation and propagation of the fixed code; and (c) adaptive evolution, which suggests progressive modification of a certain initial codon format.

Clearly, the genetic code is not a frozen accident. Given the significant conservation of genes, interactions and pathways across the evolutionary spectrum, nature does not seem to operate by creating a large number of random combinations at every level. This applies in particular to the genetic code.

Also, there currently seems to be no convincing evidence that early genetic codes were retired during evolution. The good thing is that nature introduced alternative genetic codes in many places, and these are still alive and kicking.

It remains to be seen how much of modern genetic hacking nature will allow.