Essentials of Genetics

Unit 4: How Do Scientists Study and Manipulate the DNA inside Cells?

4.2 The Order of Nucleotides in a Gene Is Revealed by DNA Sequencing

All of the information needed to build and maintain an organism, whether it's a human, a dog, or a bacterial cell, is contained in its DNA. DNA molecules are built from four types of nucleotides, and these nucleotides are linked together much like the letters in a sentence. Together, all of the DNA "sentences" within a cell contain the instructions for building the proteins and other molecules that the cell needs to carry out its daily work.

How do researchers "read" gene sequences?

Determining the order of the nucleotides within a gene is known as DNA sequencing. The earliest DNA sequencing methods were time-consuming, but a major breakthrough came in 1977 with the development of the process called Sanger sequencing. Sanger sequencing is named after the English biochemist Frederick Sanger, and it is also referred to as chain-termination sequencing or dideoxy sequencing. Some 25 years after its creation, the Sanger method was used to sequence the human genome, and, with many technological improvements and modifications, it remains an important method in laboratories around the world today.

How does Sanger sequencing work?

Sanger sequencing is modeled on the natural process of DNA replication. It adds modified "dummy" nucleotides that, once incorporated into a growing strand, stop replication at that point, so each stopped strand ends at a position occupied by one specific base. Because this truncated replication happens over and over again on many copies of the same DNA, new strands of varying lengths accumulate, and these can be used to determine the position of each nucleotide in the sequence.

Understanding DNA replication

Figure 1: DNA polymerase assembles nucleotides to make a new DNA strand.

In order to understand how Sanger sequencing works, it's first necessary to understand the process of DNA replication as it occurs in nature. DNA is a double-stranded, helical molecule composed of nucleotides, each of which contains a phosphate group, a sugar molecule, and a nitrogenous base. Because there are four naturally occurring nitrogenous bases in DNA, there are four different types of DNA nucleotides: adenine (A), thymine (T), guanine (G), and cytosine (C). Within double-stranded DNA, the bases on one strand pair with complementary bases on the other strand: A always pairs with T, and C always pairs with G. During DNA replication, the two strands of the double helix separate, which allows an enzyme called DNA polymerase to access each strand individually (Figure 1). As DNA polymerase moves along a single strand, it uses the sequence of nucleotides in that strand as a template. Thus, whenever DNA polymerase encounters a T in the template strand, it adds an A to the complementary daughter strand it is building; similarly, whenever it encounters a C in the template strand, it adds a G to the daughter strand. This process happens on both strands at the same time, eventually producing two double-helical molecules, each of which contains one "old" strand and one "new" strand of DNA.
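The pairing rules in the preceding paragraph can be expressed as a short lookup table. The Python sketch below is an illustration rather than a model of the real enzyme: it writes each new strand base by base alongside its template, ignores strand direction (5' to 3'), and uses hypothetical function names.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def build_daughter_strand(template: str) -> str:
    """Return the strand a polymerase would build on `template`.

    Simplification: bases are written position by position, aligned with
    the template, and strand direction (5' to 3') is ignored.
    """
    daughter = []
    for base in template.upper():
        if base not in PAIRS:
            raise ValueError(f"not a DNA base: {base!r}")
        daughter.append(PAIRS[base])  # add the complementary nucleotide
    return "".join(daughter)


def replicate(double_helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Separate the two strands and copy each one.

    Each resulting helix pairs one "old" strand with one "new" strand.
    """
    old_top, old_bottom = double_helix
    return [
        (old_top, build_daughter_strand(old_top)),
        (old_bottom, build_daughter_strand(old_bottom)),
    ]


print(replicate(("ATGC", "TACG")))
# [('ATGC', 'TACG'), ('TACG', 'ATGC')]
```

Both daughter helices are identical to the parent, which is why each copy keeps one original strand.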

Setting up the sequencing experiment

The Sanger method relies on a variation of the replication process described above to determine the sequence of nucleotides in a segment of DNA. Before Sanger sequencing can begin, however, researchers must first make many copies of, or amplify, the DNA segment they wish to sequence. This is done either by cloning the DNA or by using the polymerase chain reaction (PCR). Once the DNA has been amplified, it is heated so that the two strands separate, and a short synthetic primer is added to the mixture. The primer's sequence is complementary to a short stretch of the template next to the region to be sequenced, so the primer binds to the template at that spot and gives DNA polymerase a starting point. At this point, the template is exposed to a solution that contains DNA polymerase and all of the nucleotides required to synthesize the complementary DNA strand, along with one special ingredient.
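Primer binding follows the same pairing rules. In this minimal sketch, a primer "anneals" only if each of its bases pairs with the corresponding base at the start of the template; real primers bind antiparallel to a chosen site, and the sequences and function names here are made up for illustration.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base complement (strand direction ignored for simplicity)."""
    return "".join(PAIRS[b] for b in seq)


def primer_anneals(template: str, primer: str) -> bool:
    """True if every primer base pairs with the start of the template."""
    if len(primer) > len(template):
        return False
    return primer == complement(template[: len(primer)])


def extend_primer(template: str, primer: str) -> str:
    """Extend an annealed primer to the end of the template (dNTPs only)."""
    if not primer_anneals(template, primer):
        raise ValueError("primer does not match the start of the template")
    # The polymerase continues where the primer ends, copying the rest.
    return primer + complement(template[len(primer):])


template = "TACGGATC"  # made-up template sequence
primer = "ATG"         # made-up primer complementary to the first three bases
print(primer_anneals(template, primer))  # True
print(extend_primer(template, primer))   # ATGCCTAG
```

Without a matching primer the sketch refuses to extend, mirroring the fact that DNA polymerase needs a primer to start from.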

Adding ddNTPs

Figure 2: The four ddNTPs.

As described above, the next major step in the Sanger process is to expose the target sequence to DNA polymerase and large amounts of all four nucleotides. In their free form, nucleotides carry three phosphate groups and are formally called deoxynucleoside triphosphates, or dNTPs (where the "N" is a placeholder for A, T, G, or C). During the construction of a new DNA strand, the hydroxyl group (an oxygen atom bonded to a hydrogen atom) attached to the 3' carbon of the sugar of the last nucleotide in the strand forms a chemical bond with the phosphate group of the next dNTP. This bond joins the new nucleotide to the chain, so the strand grows one nucleotide at a time. In Sanger sequencing, however, a special type of "dummy" nucleotide is included along with the regular dNTPs. These special nucleotides are known as dideoxynucleoside triphosphates, or ddNTPs (Figure 2), and they lack the crucial 3' hydroxyl group on the sugar that dNTPs have. Therefore, whenever a ddNTP is added to a growing DNA strand, the strand has no hydroxyl group available to bond with the next nucleotide, and the DNA strand stops growing.
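The effect of a ddNTP on a single growing strand can be sketched as a loop that stops as soon as a terminator is added. The `dd_fraction` parameter is an assumption of this sketch: one fixed chance, at every position, that the incoming nucleotide is a ddNTP rather than a dNTP, regardless of which base is being added.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_one_strand(template: str, dd_fraction: float,
                          rng: random.Random) -> tuple[str, bool]:
    """Copy `template` until a ddNTP is (randomly) incorporated.

    dd_fraction: assumed chance that the nucleotide added at any position
    is a ddNTP rather than a dNTP. Returns (new_strand, terminated), where
    terminated is True if the last base added was a ddNTP.
    """
    strand = []
    for base in template:
        strand.append(PAIRS[base])  # polymerase adds the complementary base
        if rng.random() < dd_fraction:
            # A ddNTP has no 3' hydroxyl, so nothing can be joined to it.
            return "".join(strand), True
    return "".join(strand), False  # reached the end without a ddNTP


rng = random.Random(42)  # fixed seed so the run is repeatable
for _ in range(3):
    print(synthesize_one_strand("TACGGATC", dd_fraction=0.2, rng=rng))
```

Every result is a prefix of the full complementary strand; only where it stops varies from run to run.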

When researchers carry out the Sanger process, they are working with many copies of the template strand at once. The ddNTPs are supplied in much smaller amounts than the regular dNTPs, so on each copy DNA synthesis usually proceeds for some distance before a ddNTP happens to be added. Across all the copies, chain termination occurs at every position in the sequence, and the final result of the sequencing experiment is a collection of new DNA strands of varying lengths. Each of these strands ends in a ddNTP that indicates which base (A, T, G, or C) occupies that position in the new strand, and therefore which complementary base occurs at that position on the template strand (Figure 3).
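Repeating that random process over many template copies yields the collection of fragment lengths described above. This sketch (hypothetical function name, assumed termination probability, fixed random seed for repeatability) counts fragments by length and terminal base; copies that reach the end of the template without a ddNTP are not counted, since they carry no terminator.

```python
import random
from collections import Counter

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template: str, copies: int, dd_fraction: float,
                     seed: int = 1) -> Counter:
    """Count terminated fragments by (length, terminal ddNTP base)."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(copies):
        for position, base in enumerate(template, start=1):
            if rng.random() < dd_fraction:
                # ddNTP added here: the fragment is `position` bases long
                # and ends in the base complementary to the template.
                counts[(position, PAIRS[base])] += 1
                break
    return counts


counts = sanger_fragments("TACGGATC", copies=10_000, dd_fraction=0.1)
for (length, base), n in sorted(counts.items()):
    print(length, base, n)
```

With enough copies, every length from 1 to the template length shows up, and each length always carries the same terminal base, which is what makes the sequence readable.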

Figure 3: By adding together information about all of the truncated strands, researchers can determine the nucleotide sequence of the DNA target.

Reading the sequence: Now and then

When Sanger sequencing was first introduced, four separate reactions were run, each containing one type of ddNTP. The products of the four reactions were then separated by gel electrophoresis, a process that sorts DNA fragments by size, with each reaction loaded into its own lane of the gel. This enabled researchers to determine the lengths of the truncated strands in each sample. The lengths mattered because each truncated strand ends at the position where a ddNTP was added and halted DNA elongation: the length of a fragment gives the position of a base, and the lane it appears in identifies which base it is.
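Reading a four-lane gel amounts to merging the lanes and sorting the bands by length. In the sketch below, band lengths are assumed to be counted in nucleotides added after the primer, and the example lanes are invented; the sequence read off the gel is the new strand, so the template is its complement.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read a four-lane Sanger gel.

    lanes maps the ddNTP used in each reaction ("A", "C", "G", "T") to the
    lengths of the bands seen in that lane. Shorter fragments travel
    farther, so reading from the bottom of the gel upward means reading
    bands in order of increasing length.
    """
    band_at = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in band_at:
                raise ValueError(f"two bands at length {length}")
            band_at[length] = base
    return "".join(band_at[n] for n in sorted(band_at))


new_strand = read_gel({"A": [1, 6], "T": [2, 7], "G": [3, 8], "C": [4, 5]})
template = "".join(PAIRS[b] for b in new_strand)
print(new_strand, template)  # ATGCCATG TACGGTAC
```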

More recently, automation of the Sanger technique has made this process more efficient by combining all four ddNTP reactions in a single test tube. Each of the four ddNTPs in the tube is labeled with a fluorescent dye of a different color. Rather than being run on a gel and read manually, the reaction products are passed through a thin capillary tube filled with a gel-like matrix. Shorter fragments move through the tube faster, so the fragments reach the detector in order of size, and as each one passes, the sequencing machine records the color of its fluorescent label, which identifies the ddNTP at its end. Sequencing machines have vastly increased the speed and efficiency of DNA sequencing, and this technology continues to evolve rapidly.
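The automated, single-tube version replaces the four gel lanes with four dye colors. The dye-to-base table below is a hypothetical assignment (each instrument and chemistry defines its own), and detections are modeled as simple (fragment length, color) pairs rather than raw fluorescence signals.

```python
# Hypothetical dye-to-base assignment; real instruments define their own.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def read_capillary(detections: list[tuple[int, str]]) -> str:
    """Turn (fragment_length, dye) detections into a base sequence.

    Shorter fragments move through the capillary faster, so sorting by
    length reproduces the order in which the detector sees them.
    """
    ordered = sorted(detections)  # tuples sort by length first
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Detections listed in arbitrary order; sorting restores size order.
reads = [(3, "yellow"), (1, "green"), (4, "blue"), (2, "red")]
print(read_capillary(reads))  # ATGC
```

Because one tube and one color per base replace four lanes, a single pass through the capillary yields the whole read.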

How is DNA sequencing used by scientists?

In recent years, DNA sequencing technology has advanced many areas of science. For example, the field of functional genomics is concerned with figuring out what certain DNA sequences do, as well as which pieces of DNA code for proteins and which have important regulatory functions. An invaluable first step in making these determinations is learning the nucleotide sequences of the DNA segments under study. Another area of science that relies heavily on DNA sequencing is comparative genomics, in which researchers compare the genetic material of different organisms in order to learn about their evolutionary history and degree of relatedness. DNA sequencing has also aided complex disease research by allowing scientists to catalogue certain genetic variations between individuals that may influence their susceptibility to different conditions.

How can all people benefit from DNA sequencing?

At the individual level, biomedical research into the causes and course of common human diseases is poised to greatly improve health care. Applying DNA sequencing to identify disease-causing genetic variants is expected to improve and expand genetic testing and to support the development of more targeted, personalized drug therapies in the years to come. The benefits of DNA sequencing can already be seen in agriculture, for example in the production of disease-resistant plants and animals. In addition, microbial genome sequencing projects may someday lead to the development of new biofuels and pollutant-monitoring systems. DNA analysis techniques are also used in forensic science, providing crucial evidence in criminal cases. In the United States, for instance, the Federal Bureau of Investigation (FBI) funds and operates a national database containing the genetic profiles of known offenders that can be searched whenever DNA evidence is obtained at a crime scene. According to the FBI, as of 2008, this database held profiles of over 6.5 million offenders and had assisted in almost 81,000 investigations.