Reading the Genetic Code

By: Amy Ralston, Ph.D. (Write Science Right) & Kenna Shaw, Ph.D. (Nature Education) © 2008 Nature Education
Citation: Ralston, A. & Shaw, K. (2008) Reading the genetic code. Nature Education 1(1)

How can just four nitrogenous bases--adenine, cytosine, guanine, and uracil--possibly code for all 20 amino acids?

 

Once scientists determined that messenger RNA (mRNA) served as a copy of each gene's DNA and specified the sequence of amino acids in proteins, they immediately had many more questions about the process of protein formation. Specifically, these researchers knew that proteins are made from 20 different amino acids. Moreover, they also knew that there were only four nucleotides in mRNA: adenine (A), cytosine (C), guanine (G), and uracil (U). But how exactly could these four nucleotides code for all 20 amino acids? The answer to this question turned out to be simpler than one might expect.

Determining the Number of Nucleotides Per Amino Acid

Figure 1: Distinct possibilities: Overlapping or non-overlapping genetic code?

Right away, researchers knew that the genetic code was more complex than one nucleotide per amino acid. After all, if this were the case, a person's DNA could code for only four different amino acids. In fact, even two nucleotides per amino acid (i.e., a doublet code) could not account for 20 amino acids, because such a code provides only 16 combinations (four bases at each of two positions = 4 × 4 = 16), enough for at most 16 amino acids.

Thus, early researchers quickly determined that the smallest combination of As, Cs, Gs, and Us that could encode all 20 amino acids in RNA would be a triplet (three-base) code. A triplet combination, or codon, would allow for 64 possible combinations (four bases at each of three positions = 4 × 4 × 4 = 64). However, with only 20 amino acids, a triplet code would also suggest redundancy: in other words, more than one codon might correspond to the same amino acid, or there might even be "spare" or unused codons. If such "spare" codons were present, what was their purpose? Did they serve to "break up" the code, much like commas in a sentence? Furthermore, how could a three-nucleotide code be "read" by the protein-forming machinery of the ribosome? Was it an overlapping or non-overlapping code (Figure 1)? Was it a continuous code, or were there "commas" (spare nucleotides) between codons that served as signals for the next amino acid (Table 1)? These questions were answered by way of several elegant experiments.
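The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the names BASES, AMINO_ACIDS, codeword_count, and minimal_codon_length are made up for this example and do not come from the original research.

```python
from itertools import product

BASES = "ACGU"      # the four bases found in mRNA
AMINO_ACIDS = 20    # number of amino acids the code must cover


def codeword_count(length):
    """Count the distinct base sequences of the given length."""
    return len(list(product(BASES, repeat=length)))


def minimal_codon_length(targets=AMINO_ACIDS):
    """Return the shortest word length that offers at least `targets` words."""
    length = 1
    while codeword_count(length) < targets:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, "base(s) per word:", codeword_count(n), "possible words")
print("shortest sufficient length:", minimal_codon_length())
```

Enumerating the words (rather than simply computing 4 to the power n) makes it easy to see that a doublet code stops at 16 words, while a triplet code offers 64, leaving 44 more words than the 20 amino acids strictly require.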

Table 1: Did the Code Have "Commas" or Not?

An overlapping code provided scientists with predictions they could test.

Ruling Out Overlaps

In their investigation of the exact nature of the genetic code, scientists first turned to the question of possible overlaps. Specifically, researchers Akira Tsugita and Heinz Fraenkel-Conrat (1960) reasoned that if the code were overlapping, a mutation (or change) in one nucleotide would cause changes in more than one amino acid in the resulting protein. Fortunately, recent technological advancements had made it possible for Tsugita and Fraenkel-Conrat to determine the amino acid sequence in short proteins. Thus, by comparing protein sequences made from both nonmutated and mutated viral RNA, they were able to resolve this issue. First, the research team treated tobacco mosaic virus RNA (the genetic material of this virus) with nitrous acid, leading to a point mutation in the RNA sequence. Then, they compared the protein produced by the mutated RNA with that produced by the "normal" viral RNA. Strikingly, the amino acid sequence of the "mutant" protein contained a change in only one amino acid, strongly suggesting use of a non-overlapping code.
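The logic of this test can be sketched in Python. The sequence below is a made-up RNA message, not tobacco mosaic virus sequence, and the helper names read_codons and changed_codons are hypothetical. The sketch counts how many three-base words change after a single base substitution (here C to U) under each reading scheme; it counts changed triplets, not amino acids.

```python
def read_codons(rna, step):
    """Read three-base words from rna, advancing `step` bases each time.

    step=3 models a non-overlapping code; step=1 models a fully
    overlapping code in which each base belongs to three words.
    """
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, step)]


def changed_codons(original, mutant, step):
    """Count the words that differ between two equal-length messages."""
    pairs = zip(read_codons(original, step), read_codons(mutant, step))
    return sum(1 for a, b in pairs if a != b)


original = "AUGGCUCAUGGA"   # hypothetical message
mutant = "AUGGCUUAUGGA"     # same message with C -> U at index 6

print("non-overlapping:", changed_codons(original, mutant, step=3))  # 1
print("overlapping:", changed_codons(original, mutant, step=1))      # 3
```

Under the non-overlapping reading, the substitution touches exactly one word, which matches the single amino acid change that was observed. Under the fully overlapping reading, the same substitution touches three adjacent words, so neighboring amino acids would often be expected to change together.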

Determining Codon Length

However, Tsugita and Fraenkel-Conrat's findings alone did not resolve whether the genetic code was read in sets of three nucleotides or perhaps more. This issue was addressed by a separate research team consisting of Francis Crick, Leslie Barnett, Sydney Brenner, and Richard Watts-Tobin. In 1961, this group provided the first evidence for a triplet code by way of experiments using the T4 bacteriophage (a bacteria-specific virus).

In particular, these researchers devised a clever assay that enabled them to deduce the properties of the genetic code following introduction of a special kind of mutation, known as a frameshift mutation. A frameshift mutation is caused by either the addition or the deletion of a base in the original DNA sequence, which in turn causes the protein-forming machinery to shift positions (or reading frames) on the RNA. Such a frameshift alters codon groupings, and thus the corresponding protein is made with incorrect amino acids from the point of the mutation onward (Figure 2).
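A short Python sketch shows how a single added or deleted base regroups every downstream codon. The message is invented for illustration, and the function names (codons, insert_base, delete_base, first_changed_codon) are hypothetical helpers, not part of any published method.

```python
def codons(rna):
    """Group rna into complete non-overlapping triplets from the first base."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def insert_base(rna, pos, base):
    """Return rna with `base` inserted before index `pos`."""
    return rna[:pos] + base + rna[pos:]


def delete_base(rna, pos):
    """Return rna with the base at index `pos` removed."""
    return rna[:pos] + rna[pos + 1:]


def first_changed_codon(a, b):
    """Index of the first codon that differs between a and b, or None."""
    for i, (x, y) in enumerate(zip(codons(a), codons(b))):
        if x != y:
            return i
    return None


original = "AUGGCUCAUGGAUUC"              # hypothetical message
plus = insert_base(original, 4, "A")      # a "+" frameshift
minus = delete_base(original, 4)          # a "-" frameshift

print(codons(original))  # ['AUG', 'GCU', 'CAU', 'GGA', 'UUC']
print(codons(plus))      # ['AUG', 'GAC', 'UCA', 'UGG', 'AUU']
print(codons(minus))     # ['AUG', 'GUC', 'AUG', 'GAU']
print("first changed codon:", first_changed_codon(original, plus))  # 1
```

In both mutants the first codon is untouched, but every codon from the mutation site onward is regrouped, which is why a frameshift typically disrupts the rest of the protein rather than a single amino acid.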

In their work, the research team first introduced a single frameshift mutation into a viral gene encoding a protein involved in the infection of E. coli bacteria. (Bacterial infection was the readout in this experiment.) This lone frameshift mutation rendered the resulting protein ineffective. The researchers then introduced additional frameshift mutations in the hope that doing so would restore the correct reading frame (and, in turn, allow the protein to once again play a role in the infection of E. coli). The experiment worked! For example, when the first mutation added a base (+), a later suppressor mutation (-), which deleted a base, was able to put the code back on track.

Interestingly, the team noted that the introduction of three separate frameshift mutations that each added a base (+ + +) to the same DNA was also sometimes (when the mutations were close together) able to put the code back on track. Similarly, three mutations that each deleted a base (- - -) could also rescue protein function and infectivity. Therefore, the code was only thrown off by nontriplet changes. This finding strongly supported the existence of a triplet code, or at least a code written in multiples of three bases. Thus, when Crick and his colleagues analyzed their results, they were the first people to see that the genetic code was based on multiples of three bases!
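The pattern in these results can be reproduced with a small Python sketch. The message and the helper names (codons, apply_edits, tail_restored) are assumptions made for illustration, and the check is deliberately simple: it only asks whether the last few codons of the mutant message read the same as in the original.

```python
def codons(rna):
    """Group rna into complete non-overlapping triplets from the first base."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def apply_edits(rna, edits):
    """Apply (position, base) edits given relative to the original message.

    A base such as "A" is inserted before `position`; a base of None
    deletes the base at `position`. Edits are applied right to left so
    that earlier positions stay valid.
    """
    seq = list(rna)
    for pos, base in sorted(edits, key=lambda e: e[0], reverse=True):
        if base is None:
            del seq[pos]
        else:
            seq.insert(pos, base)
    return "".join(seq)


def tail_restored(original, mutant, n_tail=3):
    """True if the final n_tail codons read the same in both messages."""
    return codons(mutant)[-n_tail:] == codons(original)[-n_tail:]


original = "AUGGCUCAUGGAUUCAAGCGUUGA"  # hypothetical message, 8 codons

cases = {
    "+": [(4, "A")],
    "+ +": [(4, "A"), (6, "A")],
    "+ -": [(4, "A"), (8, None)],
    "+ + +": [(4, "A"), (6, "A"), (8, "A")],
    "- - -": [(4, None), (6, None), (8, None)],
}
for label, edits in cases.items():
    mutant = apply_edits(original, edits)
    print(label.ljust(6), "frame restored:", tail_restored(original, mutant))
```

Only the combinations whose net change is zero or a multiple of three bases (+ -, + + +, and - - -) bring the downstream codons back into register. The codons between the first and last mutation are still misread, which is consistent with the observation that rescue worked only when the mutations were close together.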

References and Recommended Reading


Crick, F. H. C., et al. General nature of the genetic code for proteins. Nature 192, 1227–1232 (1961). doi:10.1038/1921227a0

Tsugita, A., & Fraenkel-Conrat, H. The amino acid composition and C-terminal sequence of a chemically evoked mutant of TMV. Proceedings of the National Academy of Sciences 46, 636–642 (1960).

