  • Opinion

Why are there four letters in the genetic alphabet?

Abstract

We list, without thinking, the four base types that make up DNA as adenine, guanine, cytosine and thymine. But why are there four? This question is now all the more relevant as organic chemists have synthesized new base pairs that can be incorporated into nucleic acids. Here, I argue that there are theoretical, experimental and computational reasons to believe that having four base types is a frozen relic from the RNA world, when RNA was genetic as well as enzymatic material.

Figure 1: Base-pairing pattern of a DNA molecule.
Figure 2: Base-pairing pattern dependent on shape complementarity.
Figure 3: In silico evolution of RNA.

Acknowledgements

I thank the biologists at the Wissenschaftskolleg zu Berlin for vivid discussions, and B. Papp and V. Müller, who kindly read the manuscript before submission.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Related links

FURTHER INFORMATION

Eörs Szathmáry's web page

Scripps Research Institute

Steven Benner's web page

Glossary

AMINO-A

An adenine molecule with a second amino (-NH2) group attached to its carbon in position 2, which acts as an extra hydrogen-bond donor.

DEAMINATION

The reaction of a water molecule with the amino-group on position 4 of the pyrimidine ring of cytosine, which results in the conversion of cytosine to uracil.

DIRECTIONAL SELECTION

Natural selection that acts to promote the fixation (an increase in frequency in the population to 100%) of a particular allele.

EPIMERIZATION

The spontaneous change of configuration of chemical groups that are attached to a so-called asymmetric carbon atom. Such isomers are not mirror images of each other.

ERROR-CODING THEORY

A theory that was developed by Hamming to analyse the detection and correction of errors in messages consisting of 'zeros' and 'ones'.

KLENOW FRAGMENT

The large fragment of Escherichia coli DNA polymerase I, which lacks the 5′→3′ exonuclease domain.

MALTHUSIAN GROWTH RATE

The per capita rate of growth of a population modelled in continuous time.

MUTATION–SELECTION EQUILIBRIUM

The equilibrium at which selection that decreases the frequency of an unfavourable allele exactly balances mutations that increase its frequency.

ORTHOGONALITY

Features of natural and/or artificial bases that, within a given set (alphabet), reduce the incorporation of non-cognate base pairs.

PROCESSIVITY

The ability of polymerases to repeatedly add bases to the primer, extending even a new type of base.

RIBO-ORGANISM

A cell in the RNA world.

RNA WORLD

A hypothetical, but widely believed, era in early evolution when RNA-like molecules were not only genetic but also enzymatic material.

SIMULATED PROTOCELL MODEL

An in silico implementation of a ribo-organism.

STABILIZING SELECTION

Selection for the mean or intermediate phenotype; consequently, peripheral variants are eliminated, which maintains an existing state of adaptation in a stable environment.
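
As a rough illustration of the ERROR-CODING THEORY entry above, the following minimal sketch shows how a single parity bit lets a receiver detect a one-bit error in a message of 'zeros' and 'ones'. The function names and the 3-bit messages are assumptions chosen for illustration; the sketch is generic Hamming-style error detection and does not reproduce any particular mapping of nucleotides onto bits.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of '1's becomes even."""
    return bits + str(bits.count("1") % 2)


def has_even_parity(word: str) -> bool:
    """Return True if the word contains an even number of '1's."""
    return word.count("1") % 2 == 0


# Hypothetical code: every 3-bit message extended by an even-parity bit.
codewords = [add_even_parity(f"{i:03b}") for i in range(8)]

# The smallest distance between any two distinct codewords.
min_distance = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

# Flip the second bit of one codeword to simulate a single error.
sent = add_even_parity("101")
received = sent[0] + ("1" if sent[1] == "0" else "0") + sent[2:]
error_detected = not has_even_parity(received)
```

Because any two codewords in this sketch differ in at least two positions, a single flipped bit can never turn one valid codeword into another, so it is always detected, although it cannot be located or corrected with one parity bit alone.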

About this article

Cite this article

Szathmáry, E. Why are there four letters in the genetic alphabet? Nat Rev Genet 4, 995–1001 (2003). https://doi.org/10.1038/nrg1231

  • DOI: https://doi.org/10.1038/nrg1231
