Why are there four letters in the genetic alphabet?


We list, without thinking, the four base types that make up DNA as adenine, guanine, cytosine and thymine. But why are there four? This question is now all the more relevant as organic chemists have synthesized new base pairs that can be incorporated into nucleic acids. Here, I argue that there are theoretical, experimental and computational reasons to believe that having four base types is a frozen relic from the RNA world, when RNA was genetic as well as enzymatic material.


Figure 1: Base-pairing pattern of a DNA molecule.
Figure 2: Base-pairing pattern dependent on shape complementarity.
Figure 3: In silico evolution of RNA.




I thank the biologists at the Wissenschaftskolleg zu Berlin for vivid discussions, and B. Papp and V. Müller, who kindly read the manuscript before submission.


Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.


Glossary



An adenine molecule with a second amino (-NH2) group attached to its carbon in position 2, which acts as an extra hydrogen-bond donor.


The reaction of a water molecule with the amino-group on position 4 of the pyrimidine ring of cytosine, which results in the conversion of cytosine to uracil.


Natural selection that acts to promote the fixation (an increase in frequency in the population to 100%) of a particular allele.


The spontaneous change of configuration of chemical groups that are attached to a so-called asymmetric carbon atom. Such isomers are not mirror images of each other.


A theory that was developed by Hamming to analyse the detection and correction of errors in messages consisting of 'zeros' and 'ones'.


The Escherichia coli DNA polymerase, without the exonuclease subunit.


The per capita rate of growth of a population modelled in continuous time.


The equilibrium at which selection that decreases the frequency of an unfavourable allele exactly balances mutations that increase its frequency.


Features of natural and/or artificial bases that in a given set (alphabet) decrease the degree of incorporating non-cognate base pairs.


The ability of polymerases to repeatedly add bases to the primer, extending even a new type of base.


A cell in the RNA world.


A hypothetical, but widely believed, era in early evolution when RNA-like molecules were not only genetic but also enzymatic material.


An in silico implementation of a ribo-organism.


Selection for the mean or intermediate phenotype; consequently, peripheral variants are eliminated, which maintains an existing state of adaptation in a stable environment.
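
The glossary entry on Hamming's theory of error detection in messages of 'zeros' and 'ones' can be made concrete with a small sketch. The function names below (`hamming_distance`, `add_even_parity`, `has_even_parity`) are hypothetical and chosen only for illustration; they are not taken from the article. The sketch shows the simplest case, a single even-parity bit, which detects (but cannot correct) any single-bit error.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length binary words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of '1's is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def has_even_parity(word: str) -> bool:
    """Return True if the word contains an even number of '1's."""
    return word.count("1") % 2 == 0


# Illustrative values only (assumed, not from the article).
codeword = add_even_parity("101")   # message plus parity bit
corrupted = "1110"                  # the same word with one bit flipped

print(hamming_distance(codeword, corrupted))  # number of differing positions
print(has_even_parity(codeword))              # valid codeword passes the check
print(has_even_parity(corrupted))             # single-bit error fails the check
```

Any two valid words of such a parity code differ in at least two positions, which is why a single flipped bit always produces an invalid word.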

About this article

Cite this article

Szathmáry, E. Why are there four letters in the genetic alphabet? Nat. Rev. Genet. 4, 995–1001 (2003).
