The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing

A schematic shows four nucleotide molecules against a white background. Each contains a gray cylinder, representing the deoxyribose sugar, attached to a colored hexagonal prism, representing the nitrogenous base. The base is twice as long as the diameter of the gray cylinder. The green nitrogenous base is labeled A (upper left), the red base is labeled T (upper right), the blue base is labeled G (lower left), and the orange base is labeled C.
All of the information needed to build and maintain an organism — whether it's a human, a dog, or a bacterial cell — is contained in its DNA. DNA molecules are composed of four nucleotides, and these nucleotides are linked together much like the words in a sentence. Together, all of the DNA "sentences" within a cell contain the instructions for building the proteins and other molecules that the cell needs to carry out its daily work.

How do researchers "read" gene sequences?

Determining the order of the nucleotides within a gene is known as DNA sequencing. The earliest DNA sequencing methods were time-consuming, but a major breakthrough came in 1977 with the development of the process called Sanger sequencing. Sanger sequencing is named after English biochemist Frederick Sanger, and it is sometimes also referred to as chain-termination sequencing or dideoxy sequencing. Some 25 years after its creation, the Sanger method was used to sequence the human genome, and, with many technological improvements and modifications, it remains an important method in laboratories across the world today.

How does Sanger sequencing work?

Sanger sequencing is modeled after the natural process of DNA replication, and it uses chain-terminating "dummy" nucleotides to stop replication at positions where a specific nucleotide occurs. Because this truncated replication occurs over and over again across many copies of the same DNA, fragments of varying lengths accumulate and can be used to determine the position of each nucleotide in the sequence.

Understanding DNA replication

A schematic shows a region of DNA, with part of the DNA being single-stranded and most of the DNA being double-stranded. The sugar-phosphate backbone is depicted as a segmented grey cylinder. Nitrogenous bases are represented by blue, orange, red or green vertical rectangles attached above each segment of the sugar-phosphate backbone. A transparent blue globular structure, representing the enzyme DNA polymerase, is bound to a several nucleotide-long region along the DNA strand about a quarter of the way from the right side. The DNA is single-stranded to the right of DNA polymerase and double-stranded to the left, indicating that DNA polymerase is moving from left to right as it replicates the DNA strand. The region of DNA bound by DNA polymerase is visible inside the transparent enzyme at a higher magnification. Six nucleotides in this region are bound to six complementary nucleotides arranged above and in parallel to the single strand, forming red-green or blue-orange pairs of rungs between the grey cylinders.
Figure 1: DNA polymerase assembles nucleotides to make a new DNA strand.
In order to understand how Sanger sequencing works, it's first necessary to understand the process of DNA replication as it exists in nature. DNA is a double-stranded, helical molecule composed of nucleotides, each of which contains a phosphate group, a sugar molecule, and a nitrogenous base. Because there are four naturally occurring nitrogenous bases, there are four different types of DNA nucleotides: adenine (A), thymine (T), guanine (G), and cytosine (C). Within double-stranded DNA, the nitrogenous bases on one strand pair with complementary bases along the other strand; in particular, A always pairs with T, and C always pairs with G. Then, during DNA replication, the two strands in the double helix separate. This allows an enzyme called DNA polymerase to access each strand individually (Figure 1). As the DNA polymerase moves down the single-stranded DNA, it uses the sequence of nucleotides in that strand as a template for replication. Thus, whenever the DNA polymerase recognizes a T in the template strand, it adds an A to the complementary daughter strand it is building; similarly, whenever it encounters a C in the original strand, it adds a G to the daughter strand. This process happens along both strands simultaneously, resulting in the eventual production of two double-helical molecules, each of which contains one "old" strand and one "new" strand of DNA.
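
The pairing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the enzyme: the function name `synthesize_daughter` and the example template are made up for this sketch, and the strand is read left to right without modeling the opposite (antiparallel) orientation of the two strands.

```python
# Base-pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_daughter(template):
    """Build the complementary strand one base at a time.

    Simplification (assumption of this sketch): both strands are written
    in the order the template is read, so strand direction is ignored.
    """
    return "".join(PAIRS[base] for base in template)


template = "TACGGCAT"  # hypothetical template strand
daughter = synthesize_daughter(template)
print(daughter)  # ATGCCGTA
```

Applying the same function to the daughter strand returns the original template, which mirrors the way each strand of the double helix can serve as a template for the other.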

Setting up the sequencing experiment

The Sanger method relies upon a variation of the replication process described above in order to determine the sequence of nucleotides in a segment of DNA. Before Sanger sequencing can begin, however, researchers must first make many copies of, or amplify, the DNA segment they wish to sequence. This is done either by cloning the DNA or by using the polymerase chain reaction (PCR). Once the DNA has been amplified, it is heated so that the two strands separate, and a synthetic primer is added to the mixture. The primer's sequence is complementary to a short stretch at the start of the target DNA, which means that the primer and the DNA target bind with each other and give DNA polymerase a starting point. At this point, the target sequence is exposed to a solution that contains DNA polymerase and all of the nucleotides required for synthesis of the complementary DNA strand, along with one special ingredient.
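
As a rough sketch of what "complementary" means for a primer, the Python snippet below looks for the place on a template where a primer could bind. Both sequences are hypothetical and written in the conventional 5'-to-3' direction, so the primer binds where its reverse complement appears in the template. The function names are invented for this illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Complement each base and reverse the order (strands pair antiparallel)."""
    return "".join(PAIRS[base] for base in reversed(seq))


def find_annealing_site(template, primer):
    """Return the 0-based index on the template where the primer can bind, or -1."""
    return template.find(reverse_complement(primer))


template = "GATTACAGGCTTACGA"  # hypothetical target strand, 5'->3'
primer = "TCGTAAG"             # hypothetical synthetic primer, 5'->3'

print(find_annealing_site(template, primer))  # 9
```

Here the primer pairs with the last seven bases of the template, leaving the rest of the strand single-stranded and available for DNA polymerase to copy.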

Adding ddNTPs

A schematic shows 16 nucleotides arranged to form a single horizontal strand of DNA. Horizontal cylinders represent the deoxyribose sugar molecule and vertical rectangles represent the nitrogenous base in each nucleotide. The deoxyribose sugar and the nitrogenous base in the single DNA strand are shown in grey. Below the single strand, four ddNTPs and four individual nucleotides are shown. The sugar molecules in the lower four ddNTPs are black, while the sugar molecules in the lower four individual nucleotides are grey. The nitrogenous bases on the ddNTPs and lower four nucleotides are different colors, representing different chemical identities of the bases: ddATP is green, ddCTP is orange, ddTTP is red, and ddGTP is blue. Similarly, adenine is green; cytosine is orange; thymine is red; and guanine is blue.
Figure 2: The four ddNTPs.

As described above, the next major step in the Sanger process is to expose the target sequence to DNA polymerase and significant amounts of all four nucleotides. In their unbound form, nucleotides have three phosphate groups and are formally called deoxynucleotide triphosphates, or dNTPs (where the "N" is a placeholder for A, T, G, or C). During the construction of a new DNA strand, a hydroxyl group (an oxygen atom bonded to a hydrogen atom) on the 3' carbon of the sugar of the last nucleotide in the strand reacts with the innermost phosphate group of the incoming dNTP, forming a chemical bond between the two. This bond formation causes the DNA chain to grow by one nucleotide. In Sanger sequencing, however, a special type of "dummy" nucleotide is included with the regular dNTPs that surround the growing DNA strand. These special nucleotides are known as dideoxynucleotide triphosphates, or ddNTPs (Figure 2), and they lack the crucial 3' hydroxyl group that is attached to the sugar of dNTPs. Therefore, whenever a ddNTP is added to a growing DNA strand, the strand has no hydroxyl group with which to bind the next nucleotide in the chain, and the DNA strand stops growing.

When researchers carry out the Sanger process, they are manipulating many copies of the template strand at once, so dNTPs must greatly outnumber ddNTPs in order for DNA synthesis to proceed unimpeded on these copies until a ddNTP happens to be added. Once the reaction is complete, the final result of the sequencing experiment is a group of new DNA strands of varying lengths. Each of these strands ends in a ddNTP that identifies the base at that position in the new strand and, because bases pair in a complementary fashion, the base at the corresponding position on the template strand (Figure 3).
This schematic diagram shows how to form truncated DNA strands to help determine the nucleotide sequence of a DNA strand. Individual dideoxy-nucleotides are added to the sequencing mixture. Addition of a dideoxy-nucleotide truncates synthesis because another nucleotide cannot be added to the strand. By analyzing information from the truncated sequences for all four sequencing mixtures, the entire DNA sequence can be determined.
Figure 3: By adding together information about all of the truncated strands, researchers can determine the nucleotide sequence of the DNA target.
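
The chain-termination idea can be imitated with a small simulation. The Python sketch below is only an illustration under stated assumptions: the strand being built is a made-up example, each reaction contains a single type of ddNTP, and `p_dd` is an arbitrary probability that a ddNTP (rather than a normal dNTP) is added at a matching position. None of these names or numbers come from a real protocol.

```python
import random


def extend_copy(new_strand, dd_base, p_dd, rng):
    """Extend one primed copy; return (fragment, terminated_by_ddNTP)."""
    fragment = ""
    for base in new_strand:
        fragment += base
        # The ddNTP competes with the normal dNTP only where its base is needed.
        if base == dd_base and rng.random() < p_dd:
            return fragment, True  # no 3' hydroxyl, so the chain cannot grow
    return fragment, False  # full-length product that never picked up a ddNTP


def run_reaction(new_strand, dd_base, copies=5000, p_dd=0.2, seed=0):
    """Simulate many copies and return the sorted lengths of terminated fragments."""
    rng = random.Random(seed)
    lengths = set()
    for _ in range(copies):
        fragment, terminated = extend_copy(new_strand, dd_base, p_dd, rng)
        if terminated:
            lengths.add(len(fragment))
    return sorted(lengths)


new_strand = "ATGCCGTA"  # hypothetical strand the polymerase would build
for dd_base in "ACGT":
    print("dd" + dd_base + "TP reaction, fragment lengths:",
          run_reaction(new_strand, dd_base))
```

With enough copies, each reaction should yield terminated fragments whose lengths match every position at which that reaction's base occurs. If `p_dd` is set close to 1, nearly every copy stops at the first matching position and longer fragments become rare, which is one way to see why normal dNTPs need to be in excess.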

Reading the sequence: Now and then

When Sanger sequencing was first introduced, four separate reactions were run, one for each type of ddNTP. The four sets of reaction products were then separated by gel electrophoresis, a process that organizes DNA fragments in order of size. This enabled researchers to assess the lengths of the truncated strands in each sample. This was important, because the length of each truncated strand revealed the position at which a ddNTP was added to the strand, thereby halting DNA elongation.
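
Reading a four-lane gel amounts to sorting the fragments by length and noting which lane each one came from. The Python sketch below shows that logic with hypothetical lane data; the fragment lengths and the function name `read_gel` are assumptions made for illustration, and the lengths count only newly added nucleotides, not the primer.

```python
# Hypothetical gel: for each ddNTP reaction (lane), the lengths of the
# truncated fragments that were observed.
lanes = {
    "A": [1, 8],
    "T": [2, 7],
    "G": [3, 6],
    "C": [4, 5],
}


def read_gel(lanes):
    """Reconstruct the newly synthesized strand from four lanes of fragment lengths."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            # A fragment of length n in lane X means position n of the new strand is X.
            base_at_length[length] = base
    # Read from the shortest fragment to the longest.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


print(read_gel(lanes))  # ATGCCGTA
```

The result is the sequence of the newly made strand; taking the complement of each base gives the sequence of the template.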

More recently, automation of the Sanger technique has made this process more efficient by combining all four ddNTP reactions in a single test tube. Each of the four ddNTPs in the tube is labeled with a different fluorescent color. Rather than being run on a gel and read manually, the reaction products are passed through a small tube containing a gel-like matrix. As the different-sized DNA fragments pass through the tube, a sequencing machine reads the fluorescent label at each position. Sequencing machines have vastly increased the speed and efficiency of DNA sequencing, and this technology continues to evolve at an astonishing rate.
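
In the automated, single-tube version, the lane is replaced by a color. The Python sketch below shows one way to picture the base-calling step. The dye-to-base mapping simply borrows the colors used in the figures on this page and is an assumption, as are the fragment list and the function name `call_bases`; real instruments work from fluorescence signals rather than tidy labels.

```python
# Assumed dye colors, borrowed from the figure color scheme on this page.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "G", "orange": "C"}

# Hypothetical products of a single-tube reaction, in no particular order:
# (fragment length, color of the fluorescent ddNTP at its end).
fragments = [
    (5, "orange"), (1, "green"), (7, "red"), (3, "blue"),
    (8, "green"), (2, "red"), (6, "blue"), (4, "orange"),
]


def call_bases(fragments):
    """Sort fragments from shortest to longest and read each terminal dye."""
    return "".join(DYE_TO_BASE[color] for _, color in sorted(fragments))


print(call_bases(fragments))  # ATGCCGTA
```

Sorting by length stands in for the separation in the gel-like matrix, where shorter fragments travel faster and therefore pass the detector first.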

How is DNA sequencing used by scientists?

In recent years, DNA sequencing technology has advanced many areas of science. For example, the field of functional genomics is concerned with figuring out what certain DNA sequences do, as well as which pieces of DNA code for proteins and which have important regulatory functions. An invaluable first step in making these determinations is learning the nucleotide sequences of the DNA segments under study. Another area of science that relies heavily on DNA sequencing is comparative genomics, in which researchers compare the genetic material of different organisms in order to learn about their evolutionary history and degree of relatedness. DNA sequencing has also aided complex disease research by allowing scientists to catalogue certain genetic variations between individuals that may influence their susceptibility to different conditions.

How can all people benefit from DNA sequencing?

At the individual level, biomedical research into the cause and course of common human diseases is primed to greatly improve health care. The application of DNA sequencing to the identification of disease-causing genetic variants will lead to improvements and expansion in genetic testing, as well as development of more targeted, personalized drug therapies in the years to come. Already today, the benefits of DNA sequencing can be seen in agriculture thanks to the production of disease-resistant plants and animals. In addition, microbial genome sequencing projects may someday lead to the development of new biofuels and pollutant-monitoring systems. DNA sequencing techniques are also used in forensic science, providing crucial evidence in criminal cases. In the United States, for instance, the Federal Bureau of Investigation (FBI) funds and operates a national database containing the genetic profiles of known offenders that can be searched whenever DNA evidence is obtained at a crime scene. According to the FBI, as of 2008, this database had profiles of over 6.5 million offenders and had assisted in almost 81,000 investigations.
