Collection

40 years of Sanger sequencing

DNA sequencing has a remarkable history, in terms of inception and evolution of the technologies themselves, as well as the breadth and scope of problems to which they have been applied. This Nature collection celebrates the 40th anniversary of the Sanger method for DNA sequencing, the most widely used sequencing method, pioneered by Fred Sanger and his team in 1977.

In a Review and accompanying Milestones, Jay Shendure, Shankar Balasubramanian, George M. Church, Walter Gilbert, Jane Rogers, Jeffery A. Schloss and Robert H. Waterston review the evolution of sequencing technologies over the past 40 years. The Milestones list key advances in methods development, computational analyses and applications of genome sequencing. We also highlight a selection of key publications from these Milestones that appeared in Nature journals in the Methods, Genomes and Applications sections.

Accompanying news and commentary in Nature bring further perspectives on this 40th anniversary of Sanger sequencing. In a Commentary, Eric Green, Eddy Rubin and Maynard Olson share perspectives on the future of sequencing over the next 40 years. A Technology Feature explores recent progress in one emerging method, nanopore sequencing, which shows potential to upend the DNA sequencing market. A News Feature provides context on genomics applications in direct-to-consumer genetic testing.

- Orli Bahcall, Senior Editor, Nature

LISTEN: Nature Podcast with NHGRI Director Eric Green on how DNA sequencing has transformed biology, and what might still be to come.

40-year Perspectives

This year marks the 40th anniversary of the Sanger method for DNA sequencing, the most widely used sequencing method, pioneered by Fred Sanger and his team in 1977. Jay Shendure and colleagues review the evolution of sequencing technologies since their inception, highlighting the major milestones in the development, analyses and applications of genome sequencing over the past 40 years. Despite multiple technological revolutions and growth in scale, the authors see DNA sequencing as a relatively nascent technology in the grand scheme of scientific history. They review current emerging applications and discuss the continued evolution and future of DNA sequencing from population-scale resequencing to networks of portable sensors used for real-time monitoring in environmental settings.

Review Article | Nature

Eric D. Green, Edward M. Rubin and Maynard V. Olson speculate on the next forty years of DNA sequencing applications, from policing to data storage.

Comment | Nature

Offering long reads and rapidly improving accuracy, nanopore sequencing has the potential to upend the DNA sequencing market.

Technology Feature | Nature

Methods

The race is on for a big prize: the job of providing the world's DNA sequencing laboratories with the successor to the ‘Sanger-based’ technology that gave us the first wave of genome sequences. One technology in the frame is that produced by 454 Life Sciences Corporation of Branford, Connecticut. Today's technology reads 67,000 base pairs per hour; this new approach is 100 times faster, reading 6 million base pairs per hour. The improved performance results from using picolitre-sized chemical reactors, enhanced light-emitting sequencing chemistries and complex informatics. Further miniaturization of the system is planned. Such leaps in technology may one day make it possible to analyse an individual's genome before designing therapy: the ultimate in personalized medicine.

Article | Open Access | Nature
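
As a rough check on the throughput figures quoted in the summary above, the short sketch below computes the implied speed-up and the time a single pass over a genome would take at each rate. The 3-billion-base-pair genome size and the single-pass (1x) reading are illustrative assumptions, not figures from the article, and the variable and function names are hypothetical.

```python
# Throughput figures quoted in the summary (base pairs per hour).
CURRENT_BP_PER_HOUR = 67_000      # "today's technology"
NEW_BP_PER_HOUR = 6_000_000       # the new approach

# Assumption for illustration only: a genome of 3 billion base pairs,
# read exactly once (real projects read each base many times over).
GENOME_BP = 3_000_000_000


def hours_to_read(genome_bp: int, bp_per_hour: int) -> float:
    """Return the hours needed for one pass over the genome at a given rate."""
    return genome_bp / bp_per_hour


speedup = NEW_BP_PER_HOUR / CURRENT_BP_PER_HOUR
current_hours = hours_to_read(GENOME_BP, CURRENT_BP_PER_HOUR)
new_hours = hours_to_read(GENOME_BP, NEW_BP_PER_HOUR)

print(f"Speed-up: about {speedup:.0f}-fold")
print(f"One pass at the current rate: about {current_hours / 24:.0f} days")
print(f"One pass at the new rate: about {new_hours / 24:.0f} days")
```

On the quoted numbers the ratio comes out a little under 90, so "100 times faster" is best read as a rounded figure.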

The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.

Article | Open Access | Nature

A long-held goal in sequencing has been to use a voltage-biased nanoscale pore in a membrane to measure the passage of a linear, single-stranded (ss) DNA or RNA molecule through that pore. With the development of enzyme-based methods that ratchet polynucleotides through the nanopore, nucleobase-by-nucleobase, measurements of changes in the current through the pore can now be decoded into a DNA sequence using an algorithm. In this Historical Perspective, we describe the key steps in nanopore strand-sequencing, from its earliest conceptualization more than 25 years ago to its recent commercialization and application.

Perspective | Nature Biotechnology

Polymerase kinetics observed during single-molecule, real-time sequencing depend on the methylation status of the DNA template. Measurement of kinetic parameters such as interpulse duration and pulse width allows the identification of methylated adenosine in Escherichia coli and the distinction between 5-methylcytosine and 5-hydroxymethylcytosine in synthetic templates.

Article | Nature Methods

Progress towards cheaper and more compact DNA sequencing devices is limited by a number of factors, including the need for imaging technology. A new DNA sequencing technology that does away with optical readout, instead gathering sequence data by directly sensing hydrogen ions produced by template-directed DNA synthesis, offers a route to low cost and scalable sequencing on a massively parallel semiconductor-sensing device or ion chip. The reactions are performed using all natural nucleotides, and the individual ion-sensitive chips are disposable and inexpensive. The system has been used to sequence three bacterial genomes and a human genome: that of Gordon Moore of Moore's law fame.

Article | Open Access | Nature

Protein nanopores are being developed as sensors that could perform rapid, electronic sequencing of long single molecules of DNA. Manrao et al. report the first demonstration of single nucleotide–resolution current traces from a nanopore, and show that these data can be mapped to known DNA sequences.

Letter | Nature Biotechnology

This article reviews the use of quantum tunnelling for sequencing DNA, RNA and peptides, highlighting the potential advantages of the approach and the significant technical challenges that must be addressed to deliver practical quantum sequencing devices.

Review Article | Nature Nanotechnology

Genomes

The international collaboration IRGSP has published a detailed analysis of the rice genome, with the finished quality sequence 95% complete. The ordered nature of this draft makes it possible to look at important aspects of the genome structure such as organelle insertions and duplications. The number of transposable elements — higher than predicted — and their composition help to explain how the genomes of other cereal grasses have been derived from rice. The data are publicly available and the IRGSP provides tools for analysis on its websites.

Article | Nature

The first genomic characterization of the HeLa cancer cell line, the longest-serving and arguably most commonly used human cell line in biomedical research, reveals a genome that is surprisingly stable with respect to both point-mutation and copy-number alterations. The point-mutation rate may be no higher than the somatic mutation rate of normal tissue, and very few copy-number alterations distinguish the genomes of different HeLa strains that were split from one another in the mid-1950s. The authors examine the relationship between gene dosage and expression by integrating several data sets, including those from the ENCODE project, and find strong activation of the MYC proto-oncogene by the human papilloma virus type 18 (HPV-18) integration at chromosome 8q24.21.

Letter | Nature

The genome of the zebrafish, a key model organism for the study of development and human disease, has now been sequenced and published as a well-annotated reference genome. Zebrafish turns out to have the largest gene set of any vertebrate so far sequenced, and few pseudogenes. Importantly for disease studies, comparison between human and zebrafish sequences reveals that 70% of human genes have at least one obvious zebrafish orthologue. A second paper reports on an ongoing effort to identify and phenotype disruptive mutations in every zebrafish protein-coding gene. The project uses the reference genome sequence along with high-throughput sequencing and efficient chemical mutagenesis. Its initial results cover 38% of all known protein-coding genes, and the authors describe the phenotypic consequences of more than 1,000 alleles. The long-term goal is the creation of a knockout allele in every protein-coding gene in the zebrafish genome. All mutant alleles and data are freely available at go.nature.com/en6mos.

Letter | Open Access | Nature

Xenopus laevis, also known as the African clawed frog or platanna, is an important model organism that is used in the study of vertebrate cell and developmental biology. It is a palaeotetraploid, the product of genome duplications that occurred many millions of years ago. This makes X. laevis ideal for the study of polyploidy, but has greatly complicated genome sequencing. Here an international research collaboration reports the X. laevis genome sequence and compares it to that of the related X. tropicalis. Their analyses confirm that X. laevis is an allotetraploid and distinguish two subgenomes that evolved asymmetrically: one often retained the ancestral state, whereas the other was subject to gene loss, deletion, rearrangement and reduced expression. The two diploid progenitor species diverged about 34 million years ago, combining to form an allotetraploid about 18 million years ago.

Article | Open Access | Nature

Applications

The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.

Article | Open Access | Nature

Next-generation sequencing technologies are revolutionizing human genomics, promising to yield draft genomes cheaply and quickly. One such technology has now been used to analyse much of the genetic code of a single individual — who happens to be James D. Watson. The procedure, which involves no cloning of the genomic DNA, makes use of the latest 454 parallel sequencing instrument. The sequence cost less than US$1 million (and a mere two months) to produce, compared to the approximately US$100 million reported for sequencing Craig Venter's genome by traditional methods. Still a major undertaking, but another step towards the goal of 'personalized genomes' and 'personalized medicine'.

Letter | Open Access | Nature

The technologies that made it possible to characterize individual African and Chinese genomes have broad application in the biomedical field. A demonstration of what can be achieved in a medical context is the first comprehensive sequence of an individual cancer genome, for a patient with acute myeloid leukaemia. By comparing DNA from cancer and normal tissue from the same individual, ten mutations of possible relevance for pathogenesis were identified. As well as pointing to genes that may respond to targeted therapy, this work is a step towards the long-term goal of establishing the contextual relevance of such mutants, a process that will involve the analysis of many more personal genomes.

Article | Open Access | Nature

Application of next-generation sequencing using the ABI SOLiD technology to mammalian transcriptome analysis enabled a survey of the content, the complexity and the developmental dynamics of the embryonic stem cell transcriptome in the mouse. Also in this issue, Mortazavi et al. report Illumina technology–based RNA-Seq analysis of the mouse transcriptome in three different tissues.

Article | Nature Methods

The mouse transcriptome in three tissue types has been analyzed using Illumina next-generation sequencing technology. This quantitative RNA-Seq methodology has been used for expression analysis and splice isoform discovery and to confirm or extend reference gene models. Also in this issue, another paper reports application of the ABI SOLiD technology to sequence the transcriptome in mouse embryonic stem cells.

Article | Nature Methods

DNA sequencing costs have fallen dramatically in recent years, but they are still too high for whole-genome sequencing to be used routinely to identify rare and novel variants in large cohorts. Here Ng et al. demonstrate that targeted capture and massively parallel sequencing could be a cost-effective, reproducible, and robust strategy for the sensitive and specific identification of variants causing protein-coding changes in individual human genomes. Using this 'second generation' approach to sequencing they determine 307 megabases across the exomes (the protein-coding regions of the genome) of 12 individuals. Freeman–Sheldon syndrome is used as a proof-of-concept to show that candidate genes for monogenic disorders can be identified by exome sequencing of a small number of unrelated, affected individuals.

Letter | Nature

This issue of Nature contains the first publication from The 1000 Genomes Project, an international collaboration that will produce an extensive public catalogue of human genetic variation. The plan, in fact, is to sequence about 2,000 unidentified individuals from 20 populations around the world. This first paper presents the results from the project's pilot phase, testing three different strategies for genome-wide sequencing with high-throughput platforms: low-coverage whole-genome sequencing of 179 individuals in three population groups, high-coverage sequencing of two mother–father–child trios, and exon-targeted sequencing of 697 individuals from seven populations.

Article | Open Access | Nature

The genome of the giant panda — specifically of the female Beijing Olympics mascot Jingjing — has been determined using short-read sequencing technology, a first for such a complex genome. It consists of some 2.4 billion DNA base pairs, compared to 3 billion in humans, and contains around 21,000 protein-encoding genes, similar to the human genome. Genomic diversity reflected in the sequence is high, raising hopes that despite a population of only about 2,500, conservation efforts can keep the species from extinction. Intriguingly, the panda appears to have all the genes needed for a carnivorous digestive system but lacks digestive cellulase genes. It may therefore depend on its gut microbiome to handle its famously limited bamboo diet. Taste may be a diet-limiting factor: loss of function of the T1R1 gene means that pandas may not experience the umami taste associated with high-protein foods. Technical aspects of this work pave the way for the use of next-generation sequencing for rapid de novo assembly of large eukaryotic genomes.

Article | Open Access | Nature

The two copies of each chromosome in a diploid organism may contain different patterns of genetic variants. Fan et al. describe a microfluidic device capable of isolating each of the sister chromatids from single cells, allowing whole-genome haplotyping by sequencing and arrays.

Article | Nature Biotechnology

Sequencing a human genome using next-generation methods does not distinguish between the two copies of each chromosome. Kitzman et al. determine a haplotype-resolved genome sequence by efficiently constructing and sequencing long-insert clones that cover the diploid genome with a low likelihood of overlap.

Letter | Nature Biotechnology

The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine-mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low-coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Article | Open Access | Nature

Jeong-Sun Seo and colleagues report de novo assembly and phasing of the genome of an individual from Korea using a combination of PacBio long-read sequencing, Illumina short-read sequencing, 10X Genomics linked reads, bacterial artificial chromosome (BAC) sequencing and BioNano Genomics optical mapping. This provides a useful population-specific reference genome and represents the most contiguous human genome assembly to date. The authors use this to close gaps in the human reference genome and map structural variation.

Letter | Open Access | Nature

As part of the Exome Aggregation Consortium (ExAC) project, Daniel MacArthur and colleagues report on the generation and analysis of high-quality exome sequencing data from 60,706 individuals of diverse ancestry. This provides the most comprehensive catalogue of human protein-coding genetic variation to date, yielding unprecedented resolution for the analysis of very rare variants across multiple human populations. The catalogue is freely accessible and provides a critical reference panel for the clinical interpretation of genetic variants and the discovery of disease-related genes.

Article | Open Access | Nature

As part of the Exome Aggregation Consortium (ExAC) project, Daniel MacArthur and colleagues report on the generation and analysis of high-quality exome sequencing data from 60,706 individuals of diverse ancestry. This provides the most comprehensive catalogue of human protein-coding genetic variation to date, yielding unprecedented resolution for the analysis of very rare variants across multiple human populations. The catalogue is freely accessible and provides a critical reference panel for the clinical interpretation of genetic variants and the discovery of disease-related genes.

News & Views | Nature