Sequencing Human Genome: the Contributions of Francis Collins and Craig Venter

By: Jill Adams, Ph.D. (Freelance science writer in Albany, NY) © 2008 Nature Education 
Citation: Adams, J. (2008) Sequencing human genome: the contributions of Francis Collins and Craig Venter. Nature Education 1(1):133
How did it become possible to sequence the 3 billion base pairs in the human genome? More than a quarter of a century’s worth of work from hundreds of scientists made such projects possible.

Before the middle of the twentieth century, the gene was an abstract concept thought to physically resemble a "bead on a string," and within the scientific community, it was accepted that each gene was associated with a single protein, enzyme, or metabolic disorder. However, this began to change during the 1950s with the birth of modern molecular genetics. In 1952, Alfred Hershey and Martha Chase proved that DNA was the molecule of heredity, and shortly thereafter, Watson, Crick, Franklin, and Wilkins solved the three-dimensional structure of DNA. By 1959, Jerome Lejeune had demonstrated that Down syndrome was linked to chromosomal abnormalities (Lejeune et al., 1959). Next, the 1961 discovery of mRNA (Jacob & Monod, 1964) and the 1966 cracking of the genetic code (Figure 1; Nirenberg et al., 1966) made it possible to predict protein sequences based on DNA sequence alone. Nonetheless, although it was well established by this time that DNA was the hereditary material and that each nucleus must contain the complete DNA required to instruct the chemical processes of an organism, reading individual gene sequences, let alone whole genomes, remained beyond the technical grasp of scientists.
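As a rough illustration of what the cracked genetic code made possible, the following Python sketch predicts a protein sequence from a DNA coding sequence using the standard codon table. The function name `translate` and the example sequence are hypothetical, chosen only for illustration; the sketch assumes the input is the coding strand, already in the correct reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence: ATG (Met), GCC (Ala), AAG (Lys), TAA (stop).
print(translate("ATGGCCAAGTAA"))  # MAK
```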

A large part of the reason scientists could not yet read gene sequences was that very few isolated sequences were available to read; furthermore, the tools required to identify, isolate, and manipulate desired stretches of DNA were just evolving. Then, during the late 1960s and early 1970s, the combined work of several groups of researchers culminated in the isolation of proteins from prokaryotes that cut DNA at specific sites, and in methods for splicing the cut DNA with DNA from other species (Meselson & Yuan, 1968; Jackson et al., 1972; Cohen et al., 1973). With these tools in place, the recombinant DNA age allowed scientists to start cloning genes en masse for the first time. Indeed, with the advent of Maxam-Gilbert DNA sequencing in the mid-1970s (Maxam & Gilbert, 1977), it became possible to read the entire sequence of a cloned gene, perhaps 1,000 to 30,000 base pairs long, with relative ease.

Collins and Other Researchers Master Gene Mapping

Thanks to these advances, mapping of important disease genes was all the rage by the 1980s, and Francis Collins was one of the masters of this process. Collins made a name for himself by discovering the location of three important disease genes—those responsible for cystic fibrosis, Duchenne muscular dystrophy, and Huntington's disease. The accomplishments were a result of both cutting-edge cloning techniques like chromosome jumping (Collins et al., 1987; Richards et al., 1988) and plain perseverance. Collins wasn't the only researcher actively "gene hunting" at this time, however; hundreds of other investigators were also racing to publish detailed descriptions of every new disease gene found.

During the 1980s, the importance of genes was obvious, but determining their location on chromosomes or their sequence of DNA nucleotides was laborious. Early studies of the genome were technically challenging and slow. Reagents were expensive, and the conditions for performing many reactions were temperamental. It therefore took several years to sequence single genes, and most genes were only partially cloned and described. Scientists had already reached the milestone of fully sequencing their first genome, that of the ΦX174 bacteriophage, whose 5,375 nucleotides had been determined in 1977 (Sanger et al., 1977a), but this endeavor proved much easier than sequencing the genomes of more complex life forms. Indeed, the prospect of sequencing the roughly 4.6 million base pairs of the E. coli genome or the 3 billion nucleotides of the human genome seemed close to impossible. For example, an article published in the New York Times in 1987 noted that only 500 human genes had been sequenced (Kanigel, 1987). At the time, that was thought to be about 1% of the total, and given the pace of discovery, it was believed that complete sequencing of the human genome would take at least 100 years.

In addition to questions about the technical challenges and costs associated with sequencing large genomes, a number of concerns about the scientific basis of these endeavors were also raised. Why spend the time, money, and resources to sequence the whole genome when only a small percentage of it was actually genes? With the huge scale of these projects, there was a logic to prioritizing certain tasks over others—specifically, the target sequencing of coding sequences (genes). Thus, instead of sequencing the raw genome, many researchers sought to study cDNA collections; these are DNA strands that are generated by collecting mRNA from a tissue, then converting it back to complementary DNA. Because cDNA starts as a message in a cell, it represents an actively expressed gene. Moreover, because cells behave differently in different tissues and at different developmental stages, specialized cDNA libraries are valuable tools for assessing what specific genes are at work in a cell at any given time. Scientists could therefore use these libraries to prioritize their sequencing in order to focus on coding sequences first.
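The conversion of a message back into complementary DNA can be sketched in a few lines of Python. This is a simplified model of base pairing only, not of the laboratory procedure; the function name `mrna_to_cdna` and the example sequence are hypothetical.

```python
# Base-pairing rules for copying an RNA template into DNA:
# each RNA base pairs with its complementary DNA base.
PAIRING = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mrna_to_cdna(mrna: str) -> str:
    """Return the first-strand cDNA for an mRNA, written 5' to 3'.

    The cDNA strand is antiparallel to its template, so the mRNA is
    read in reverse while each base is replaced by its partner.
    """
    return "".join(PAIRING[base] for base in reversed(mrna.upper()))


# Hypothetical six-base message, 5'-AUGGCC-3'.
print(mrna_to_cdna("AUGGCC"))  # GGCCAT
```

Because only expressed messages are copied, a collection of such strands reflects the genes active in the source tissue rather than the whole genome.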

At the same time, researchers were also working to identify many more polymorphic genetic markers to use as tools in gene mapping. Polymorphisms are the individual DNA base changes that make each of us unique at the level of DNA. The number of known human polymorphisms and microsatellite repeats increased to more than 2,000 by 1992, or about 1 per every 1.5 million bases across the 3-billion-base genome (Weissenbach et al., 1992). As researchers characterized more and more polymorphic markers, their chances of mapping a gene of interest to its chromosomal location increased dramatically.
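The average spacing between markers follows directly from the article's own round figures of about 3 billion bases and about 2,000 markers. The short Python sketch below works through that arithmetic; it assumes markers are spread evenly, which real markers are not, and the names used are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size used in this article
MARKER_COUNT = 2_000            # approximate number of markers known by 1992


def average_marker_spacing(genome_size_bp: int, marker_count: int) -> float:
    """Average number of bases per marker, assuming an even spread."""
    return genome_size_bp / marker_count


spacing = average_marker_spacing(GENOME_SIZE_BP, MARKER_COUNT)
print(f"{spacing:,.0f} bases per marker on average")  # 1,500,000 bases per marker on average
```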

Venter Combines Approaches to Make Sequencing Faster and Less Expensive

Thus, by the late 1980s, multiple approaches for sequencing DNA were in use, but cost and time still limited research. However, this all began to change with the work of National Institutes of Health (NIH) scientist J. Craig Venter. For several years, Venter had been using automated DNA sequencers to sequence portions of chromosomes associated with Huntington's disease and myotonic dystrophy (Adams et al., 1991, 1992). Next, Venter tapped collections of cDNA molecules made from brain tissues. Then, in a 1991 paper, he described how he harnessed the power of his high-tech equipment to sequence more than 600 expressed sequence tags (ESTs) from a brain cDNA collection, identifying about half of them as genes, far more than anyone else had reported in a single paper to date. Not only did Venter's paper make an impact, but so did his claim that his laboratory alone could sequence as many as 10,000 ESTs a year at the low cost of $0.12/base. The next year, in a second paper, Venter published the sequences of more than 2,000 genes, although some were incomplete. This brought the total to 2,500 genes sequenced in one laboratory, which was as many as had been sequenced in the entire world to that point (Figure 2).
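To put the quoted price in perspective, the Python sketch below scales $0.12/base to a single pass over 3 billion bases. This is a back-of-the-envelope extrapolation, not a figure from Venter's papers: it assumes the per-base cost for ESTs would hold unchanged for whole-genome sequencing and ignores repeated coverage, assembly, and overhead. The function name is hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size used in this article
CENTS_PER_BASE = 12             # the $0.12/base figure quoted above


def sequencing_cost_dollars(bases: int, cents_per_base: int) -> float:
    """Cost in dollars of reading `bases` once at a flat per-base price.

    Working in whole cents avoids floating-point rounding on the price.
    """
    return bases * cents_per_base / 100


cost = sequencing_cost_dollars(GENOME_SIZE_BP, CENTS_PER_BASE)
print(f"${cost:,.0f}")  # $360,000,000
```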

Many scientists spoke out in criticism of Venter's brash approach. They noted that by sequencing ESTs, Venter was missing promoter sequences and other sites on DNA that were important for the regulation of gene expression. Furthermore, many critics argued that a focus on cheap volume was no substitute for careful, painstaking science. However, Venter's speed also spurred other groups—namely, the NIH effort led by James Watson—to step up their efforts to finish the Human Genome Project sooner.

In 1992, Venter left the NIH and, with the help of a venture capitalist, started a nonprofit research institute at which he quickly set up 30 automated sequencers. Venter's aim in doing so was to complete the sequencing of the human genome faster than the government-backed ("public") effort. This competition would later culminate in the simultaneous publication of the draft human genome sequence by both public and private efforts, ahead of schedule and below budget.

The events from the discovery of DNA's structure and its role as the hereditary molecule up through Venter's high-throughput EST experiments roughly delimit what is now known as the pregenomic era of molecular biology. The molecular tools and methods developed during this era were essential to reaching the milestone of sequencing the entire human genome.

References and Recommended Reading


Adams, M. D., et al. Complementary DNA sequencing: "Expressed sequence tags" and the Human Genome Project. Science 252, 1651–1656 (1991)

Adams, M. D., et al. Sequence identification of 2,375 human brain genes. Nature 355, 632–634 (1992) doi:10.1038/355632a0

Cohen, S. N., et al. Construction of biologically functional bacterial plasmids in vitro. Proceedings of the National Academy of Sciences 70, 3240–3244 (1973)

Collins, F. S., et al. Construction of a general human chromosome jumping library, with application to cystic fibrosis. Science 235, 1046–1049 (1987)

Dulbecco, R. A turning point in cancer research: Sequencing the human genome. Science 231, 1055–1056 (1986) doi:10.1126/science.3945817

Jackson, D. A., et al. Biochemical method for inserting new genetic information into DNA of simian virus 40: Circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proceedings of the National Academy of Sciences 69, 2904–2909 (1972)

Jacob, F., & Monod, J. Biochemical and genetic mechanisms of regulation in the bacterial cell. Bulletin de Societe Chimique de France 46, 1499–1532 (1964)

Kanigel, R. The genome project. New York Times, 13 December (1987)

Lejeune, J., et al. Mongolism: A chromosomal disease (trisomy). Bulletin de l'Academie Nationale de Medecine 143, 256–265 (1959)

Maxam, A., & Gilbert, W. A new method of sequencing DNA. Proceedings of the National Academy of Sciences 74, 560–564 (1977)

Meselson, M., & Yuan, R. DNA restriction enzyme from E. coli. Nature 217, 1110–1114 (1968)

Nirenberg, M. W., et al. The RNA code and protein synthesis. Cold Spring Harbor Symposia on Quantitative Biology 31, 11–24 (1966)

Richards, J. E., et al. Chromosome jumping from D4S10 (G8) toward the Huntington disease gene. Proceedings of the National Academy of Sciences 85, 6437–6441 (1988)

Sanger, F., et al. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265, 687–695 (1977a)

Sanger, F., et al. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74, 5463–5467 (1977b)

Weissenbach, J., et al. A second-generation linkage map of the human genome. Nature 359, 794–801 (1992) doi:10.1038/359794a0

Davies, K. Cracking the Genome: Inside the Race to Unlock Human DNA (New York, Free Press, 2001)
