Why are there four letters in the genetic alphabet?

Szathmáry, Eörs

doi:10.1038/nrg1231

Opinion
Published: 01 December 2003

Eörs Szathmáry^1,2

Nature Reviews Genetics volume 4, pages 995–1001 (2003)

Abstract

We list, without thinking, the four base types that make up DNA as adenine, guanine, cytosine and thymine. But why are there four? This question is now all the more relevant as organic chemists have synthesized new base pairs that can be incorporated into nucleic acids. Here, I argue that there are theoretical, experimental and computational reasons to believe that having four base types is a frozen relic from the RNA world, when RNA was genetic as well as enzymatic material.

**Figure 1: Base-pairing pattern of a DNA molecule.**

**Figure 2: Base-pairing pattern dependent on shape complementarity.**

**Figure 3: *In silico* evolution of RNA.**

Acknowledgements

I thank the biologists at the Wissenschaftskolleg zu Berlin for vivid discussions, and B. Papp and V. Müller, who kindly read the manuscript before submission.

Author information

Authors and Affiliations

Institute for Advanced Study, Berlin (Wissenschaftskolleg zu Berlin)
Eörs Szathmáry
Institute for Advanced Study, Budapest (Collegium Budapest), 2 Szentháromság, Budapest, H-1014, Hungary
Eörs Szathmáry

Ethics declarations

Competing interests

The author declares no competing financial interests.

Glossary

AMINO-A: An adenine molecule with a second amino (-NH₂) group attached to its carbon in position 2, which acts as an extra hydrogen-bond donor.
DEAMINATION: The reaction of a water molecule with the amino-group on position 4 of the pyrimidine ring of cytosine, which results in the conversion of cytosine to uracil.
DIRECTIONAL SELECTION: Natural selection that acts to promote the fixation (an increase in frequency in the population to 100%) of a particular allele.
EPIMERIZATION: The spontaneous change of configuration of chemical groups that are attached to a so-called asymmetric carbon atom. Such isomers are not mirror images of each other.
ERROR-CODING THEORY: A theory that was developed by Hamming to analyse the detection and correction of errors in messages consisting of 'zeros' and 'ones'.
KLENOW FRAGMENT: The large fragment of Escherichia coli DNA polymerase I, which lacks the 5′→3′ exonuclease domain.
MALTHUSIAN GROWTH RATE: The per capita rate of growth of a population modelled in continuous time.
MUTATION–SELECTION EQUILIBRIUM: The equilibrium at which selection that decreases the frequency of an unfavourable allele exactly balances mutations that increase its frequency.
ORTHOGONALITY: Features of natural and/or artificial bases that, within a given set (alphabet), reduce the incorporation of non-cognate base pairs.
PROCESSIVITY: The ability of polymerases to repeatedly add bases to the primer, extending even a new type of base.
RIBO-ORGANISM: A cell in the RNA world.
RNA WORLD: A hypothetical, but widely believed, era in early evolution when RNA-like molecules were not only genetic but also enzymatic material.
SIMULATED PROTOCELL MODEL: An in silico implementation of a ribo-organism.
STABILIZING SELECTION: Selection for the mean or intermediate phenotype; consequently, peripheral variants are eliminated, which maintains an existing state of adaptation in a stable environment.
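
The ERROR-CODING THEORY entry above can be made concrete with a minimal sketch. It assumes the simplest textbook case, a single even-parity bit appended to a short binary message. The function names and the example messages are hypothetical and chosen only for illustration; they are not taken from the article. The sketch shows how the Hamming distance between codewords determines whether a single-bit error can be detected.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of '1's becomes even."""
    return bits + str(bits.count("1") % 2)


def parity_ok(codeword: str) -> bool:
    """Return True if the codeword contains an even number of '1's."""
    return codeword.count("1") % 2 == 0


# Hypothetical two-bit messages (illustrative only).
messages = ["00", "01", "10", "11"]
codewords = [add_even_parity(m) for m in messages]

# Smallest Hamming distance between any two distinct codewords.
min_distance = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

# Simulate a single-bit error by flipping the last bit of one codeword.
original = codewords[1]
flipped_bit = "1" if original[-1] == "0" else "0"
corrupted = original[:-1] + flipped_bit

print(codewords, min_distance, parity_ok(original), parity_ok(corrupted))
```

Because every pair of codewords in this sketch differs in at least two positions, flipping any single bit yields a string that is not a valid codeword, so the error is detected, although it cannot be corrected.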

About this article

Cite this article

Szathmáry, E. Why are there four letters in the genetic alphabet? Nat Rev Genet 4, 995–1001 (2003). https://doi.org/10.1038/nrg1231

Issue Date: 01 December 2003
DOI: https://doi.org/10.1038/nrg1231
