3.3 Reading the Genetic Code

Once scientists determined that messenger RNA (mRNA) served as a copy of each gene's DNA and specified the sequence of amino acids in proteins, they immediately had many more questions about the process of protein formation. Specifically, these researchers knew that proteins are made from 20 different amino acids. Moreover, they also knew that there were only four nucleotides in mRNA: adenine (A), cytosine (C), guanine (G), and uracil (U). But how exactly could these four nucleotides code for all 20 amino acids? The answer to this question turned out to be simpler than one might expect.

Determining the Number of Nucleotides Per Amino Acid

Right away, researchers knew that the genetic code had to be more complex than one nucleotide per amino acid. After all, if this were the case, a person's DNA could only code for four different amino acids. In fact, even two nucleotides per amino acid (i.e., a doublet code) could not account for 20 amino acids, because such a code provides only 16 combinations (four bases at each of two positions = 4 × 4 = 16 doublets).
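The counting argument above can be checked by listing every possible codeword directly. The following is a minimal sketch in Python; the names `BASES`, `AMINO_ACIDS`, and `count_codewords` are illustrative choices, not part of the original text.

```python
from itertools import product

BASES = "ACGU"      # the four mRNA nucleotides named in the text
AMINO_ACIDS = 20    # number of amino acids the code must specify


def count_codewords(length):
    """Count every distinct sequence of `length` bases by enumerating them."""
    return sum(1 for _ in product(BASES, repeat=length))


# Compare singlet and doublet codes against the 20 amino acids.
for length in (1, 2):
    total = count_codewords(length)
    verdict = "enough" if total >= AMINO_ACIDS else "too few"
    print(f"{length} base(s) per amino acid: {total} codewords ({verdict})")
```

Both cases fall short of 20, which is why neither a singlet nor a doublet code could work.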

Figure 1: Distinct possibilities: Overlapping or non-overlapping genetic code?

Early researchers studying the genetic code had to determine whether the triplet code in mRNA was overlapping or non-overlapping. In a non-overlapping code, each sequential set of three nucleotides encodes one amino acid. In an overlapping code, each three-nucleotide codon begins one nucleotide after the start of the previous codon.

Thus, early researchers quickly determined that the smallest combination of As, Cs, Gs, and Us that could encode all 20 amino acids in RNA would be a triplet (three-base) code. A triplet combination, or codon, would allow for 64 possible combinations (four bases at each of three positions = 4 × 4 × 4 = 64). However, with only 20 amino acids, a triplet code would also suggest redundancy: in other words, more than one codon might correspond to the same amino acid, or there might even be "spare" or unused codons. If such "spare" codons were present, what was their purpose? Did they serve to "break up" the code, much like commas in a sentence? Furthermore, how could a three-nucleotide code be "read" by the protein-forming machinery of the ribosome? Was it an overlapping or non-overlapping code (Figure 1)? Was it a continuous code, or were there "commas" (spare nucleotides) between codons that served as signals for the next amino acid (Table 1)? These questions were answered by way of several elegant experiments.
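To make the triplet arithmetic and the two reading schemes concrete, here is a short Python sketch. It illustrates the hypotheses researchers were weighing and does not model the ribosome; the function names and the nine-base `message` are hypothetical and chosen only for demonstration.

```python
BASES = "ACGU"
AMINO_ACIDS = 20


def smallest_codon_length(n_targets, n_bases=len(BASES)):
    """Return the smallest length whose codeword count covers n_targets."""
    length = 1
    while n_bases ** length < n_targets:
        length += 1
    return length


def non_overlapping_codons(mrna, size=3):
    """Read consecutive, non-overlapping groups of `size` bases."""
    return [mrna[i:i + size] for i in range(0, len(mrna) - size + 1, size)]


def overlapping_codons(mrna, size=3):
    """Read a group of `size` bases starting at every successive base."""
    return [mrna[i:i + size] for i in range(len(mrna) - size + 1)]


size = smallest_codon_length(AMINO_ACIDS)
print(size, len(BASES) ** size)          # codon length and codeword count

message = "AUGGCCUAA"                    # hypothetical example sequence
print(non_overlapping_codons(message))   # three codons from nine bases
print(overlapping_codons(message))       # seven codons from the same bases
```

Under the non-overlapping scheme, nine bases yield three codons; under the fully overlapping scheme, the same nine bases yield seven, and each base (apart from those near the ends) is shared by three codons.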