
Genomics

Massively parallel sequencing

A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours. The technique may provide an alternative approach to DNA sequencing.

Since the publication of the first complete genome sequence of a living organism1 in 1995, the field of genomics has changed dramatically. Fuelled by innovations in high-throughput DNA sequencing, high-performance computing and bioinformatics, genomic science has expanded substantially and the rate of genomic discovery has grown exponentially. To date, the genomes of more than 300 organisms have been sequenced and analysed, including those of most major human pathogens, diverse microbes — and, of course, our own genome2,3. These advances have profoundly altered the landscapes of biological science and medicine. In this issue, Rothberg and colleagues (page 376)4 describe a sequencing system that offers a much higher throughput than the current state-of-the-art methods. The system has some limitations to overcome before it can be used for all sequencing applications, but it is nonetheless one of the most promising sequencing technologies to have emerged in recent years.

For more than a decade, Sanger sequencing5 and fluorescence-based electrophoresis technologies6 have dominated the DNA sequencing field. Continued improvements in these techniques and in instrumentation, paired with advances in computing and informatics, have reduced the cost of sequencing by roughly two orders of magnitude and transformed genome projects from decade-long endeavours to projects of mere months (for mammalian-sized genomes), or even weeks (for microbial genomes). However, it still costs an estimated US$10 million to US$25 million to sequence a single human genome7 and $20,000–$50,000 to sequence a microbial genome. Only a handful of large genome centres worldwide have the resources and technical expertise to handle the sequencing of a mammalian-sized genome, perform large-scale sequencing of multiple organisms or conduct the resequencing of large numbers of genes. To ensure continued growth of genomic science and to enable more labs to become involved in DNA sequencing, new approaches must decrease the cost and increase the throughput of sequencing significantly, while maintaining the high quality of data produced by the current approach.

Rothberg and colleagues4 have developed a highly parallel system capable of sequencing 25 million bases in a four-hour period — about 100 times faster than the current state-of-the-art Sanger sequencing and capillary-based electrophoresis platform. The method could potentially allow one individual to prepare and sequence an entire genome in a few days (Fig. 1). The sequencer itself, equipped with a simple detection device and liquid delivery system, and housed in a casing roughly the size of a microwave oven, is actually relatively low-tech. The complexity of the system lies primarily in the sample preparation and in the microfabricated, massively parallel platform, which contains 1.6 million picolitre-sized reactors in a 6.4-cm² slide.
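
For scale, the reported figures translate directly into per-hour throughput; the Sanger rate below is simply what the "about 100 times faster" comparison implies, not an independently stated number:

```python
# Back-of-the-envelope throughput implied by the reported figures.
new_rate = 25_000_000 / 4      # bases per hour: 25 Mb in four hours
sanger_rate = new_rate / 100   # implied by the ~100-fold speed advantage

print(f"Massively parallel system: {new_rate:,.0f} bases/hour")
print(f"Sanger/capillary platform: {sanger_rate:,.0f} bases/hour (implied)")
```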

Figure 1: Speeding up sequencing.

Flow diagrams for a, traditional microlitre-scale Sanger DNA sequencing and electrophoresis, and b, the massively parallel picolitre-scale sequencing developed by Rothberg et al.4. The traditional microlitre-scale approach requires a longer processing time per production cycle, substantially more support equipment, a larger facility and more labour than the picolitre-scale approach.

Sample preparation starts with fragmentation of the genomic DNA, followed by the attachment of adaptor sequences to the ends of the DNA pieces. The adaptors allow the DNA fragments to bind to tiny beads (around 28 µm in diameter). This is done under conditions that allow only one piece of DNA to bind to each bead. The beads are encased in droplets of oil that contain all of the reactants needed to amplify the DNA using a standard tool called the polymerase chain reaction. The oil droplets form part of an emulsion so that each bead is kept apart from its neighbour, ensuring the amplification is uncontaminated. Each bead ends up with roughly 10 million copies of its initial DNA fragment.
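
The one-fragment-per-bead condition is the kind achieved by limiting dilution, in which case the number of fragments landing on a bead follows Poisson statistics. That framing is our gloss rather than a detail given in the article; a minimal sketch:

```python
import math

def poisson_occupancy(mean_fragments_per_bead: float) -> dict:
    """Probabilities that a bead captures 0, 1, or >1 fragments when
    fragments are distributed at random (Poisson) among the beads."""
    lam = mean_fragments_per_bead
    p0 = math.exp(-lam)          # empty bead
    p1 = lam * math.exp(-lam)    # exactly one fragment: a usable, clonal bead
    return {"empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}

for lam in (0.1, 0.5, 1.0):
    occ = poisson_occupancy(lam)
    clonal = occ["single"] / (1.0 - occ["empty"])   # of occupied beads only
    print(f"mean {lam:.1f}: single {occ['single']:.3f}, "
          f"clonal fraction of occupied beads {clonal:.2f}")
```

Keeping the mean low wastes beads, as most stay empty, but it ensures that almost every amplified bead carries copies of a single fragment.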

To perform the sequencing reaction, the DNA-template-carrying beads are loaded into the picolitre reactor wells — each well having space for just one bead. The technique uses a sequencing-by-synthesis8 method developed by Uhlen and colleagues, in which DNA complementary to each template strand is synthesized. The nucleotide bases used for sequencing release a chemical group as the base forms a bond with the growing DNA chain, and this group drives a light-emitting reaction in the presence of specific enzymes and luciferin. Sequential washes of each of the four possible nucleotides are run over the plate, and a detector senses which of the wells emit light with each wash to determine the sequence of the growing strand.
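
A toy base-caller makes the read-out concrete: each wash extends the strand through any run of the delivered base, and the light emitted scales with the number of bases incorporated in that flow. The flow order and noise-free signals below are hypothetical simplifications, not the published chemistry:

```python
# Toy model of flow-based sequencing-by-synthesis. A homopolymer of
# length n incorporates n bases in one wash and emits an n-fold signal.

FLOW_ORDER = "TACG"  # hypothetical cyclic wash order

def simulate_flows(sequence: str, n_flows: int = 80) -> list[int]:
    """Light signal (number of bases incorporated) for each wash."""
    signals, pos = [], 0
    for i in range(n_flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        n = 0
        while pos < len(sequence) and sequence[pos] == base:
            n += 1
            pos += 1
        signals.append(n)
    return signals

def call_bases(signals: list[int]) -> str:
    """Reconstruct the sequence from the per-flow signal intensities."""
    return "".join(FLOW_ORDER[i % len(FLOW_ORDER)] * n
                   for i, n in enumerate(signals))

seq = "TTTAGGC"  # the strand being synthesized, for illustration
assert call_bases(simulate_flows(seq)) == seq
```

This n-fold flash for an n-base run is also why accuracy suffers in homopolymer stretches, as noted below: telling seven incorporations from eight on light intensity alone is error-prone.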

This new system shows great promise in several sequencing applications, including resequencing and de novo sequencing of smaller bacterial and viral genomes. It could potentially allow research groups with limited resources to enter the field of large-scale DNA sequencing and genomic research, as it provides a technology that is inexpensive and easy to implement and maintain. However, this technology cannot yet replace the Sanger sequencing approach for some of the more demanding applications, such as sequencing a mammalian genome, as it has several limitations.

First, the technique can only read comparatively short lengths of DNA, averaging 80–120 bases per read, which is approximately a tenth of the read-lengths possible using Sanger sequencing. This means not only that more reads must be done to cover the same sequence, but also that stitching the results together into longer genomic sequences is a lot more complicated. This is particularly true when dealing with genomes containing long repetitive sequences.
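
A toy example, with invented genomes, shows why: two genomes that differ only in the order of the unique segments lying between repeat copies generate exactly the same set of short reads, so no assembler can distinguish them without longer reads or extra linking information.

```python
def reads_of(genome: str, read_len: int) -> set[str]:
    """The set of all error-free fixed-length reads from a genome."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

R = "AT" * 60                       # a 120-base repeat
g1 = R + "GGG" + R + "TTT" + R
g2 = R + "TTT" + R + "GGG" + R      # unique segments swapped

# 100-base reads never span the repeat: the two genomes yield
# exactly the same reads and cannot be told apart.
assert reads_of(g1, 100) == reads_of(g2, 100)

# 200-base (Sanger-like) reads bridge the repeat and resolve the order.
assert reads_of(g1, 200) != reads_of(g2, 200)
```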

Second, the accuracy of each individual read is not as good as with Sanger sequencing — particularly in genomic regions in which single bases are constantly repeated. Third, because the DNA ‘library’ is currently prepared in a single-stranded format, unlike the double-stranded inserts of DNA libraries used for Sanger sequencing, the technique cannot generate paired-end reads for each DNA fragment. The paired-end information is crucial for assembling and orientating the individual sequence reads into a complete genomic map for de novo sequencing applications. Finally, the sample preparation and amplification processes are still quite complex and will require automation and/or simplification.
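
Continuing the toy example above, a sketch of what paired ends buy (strand orientation is ignored for simplicity): reads individually too short to cross a repeat can still bridge it when they come in pairs separated by a known fragment length.

```python
def mate_pairs(genome: str, read_len: int, insert: int) -> set[tuple[str, str]]:
    """Error-free read pairs from the two ends of insert-sized fragments."""
    return {(genome[i:i + read_len],
             genome[i + insert - read_len:i + insert])
            for i in range(len(genome) - insert + 1)}

R = "AT" * 60                       # the same two genomes as above
g1 = R + "GGG" + R + "TTT" + R
g2 = R + "TTT" + R + "GGG" + R

# Single 100-base reads cannot distinguish g1 from g2, but pairs of
# 100-base reads from 250-base fragments straddle the repeat and can.
assert mate_pairs(g1, 100, 250) != mate_pairs(g2, 100, 250)
```

This linking information is exactly what the current single-stranded library format cannot yet supply.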

Church and colleagues9 also recently hit upon the idea of using massively parallel reactions to speed up sequencing, although their method is still only at the proof-of-principle stage rather than being a full production system. They use a similar principle to Rothberg and colleagues4, that is, sequencing-by-synthesis on a solid support. However, the two approaches diverge in terms of library construction, sequencing chemistry, signal detection and array platform. These differences greatly affect the characteristics and reproducibility of the data, as well as the scalability of the platform. For example, Church and colleagues' method can read paired-end sequences; however, its average read-lengths are approximately a fifth of those generated by Rothberg and colleagues' system. These differences are key factors in determining the sequencing application for which each technique might be most suited.

It may be years before Rothberg and colleagues' system, or other similar approaches9,10, can tackle all three billion letters of the human genome with the same reliability and accuracy as current methods. Nevertheless, it looks extremely promising, and it is certainly one of the most significant sequencing technologies under development.

References

1. Venter, J. C. et al. Science 269, 496–512 (1995).

2. Venter, J. C. et al. Science 291, 1304–1351 (2001).

3. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).

4. Margulies, M. et al. Nature 437, 376–380 (2005).

5. Sanger, F., Nicklen, S. & Coulson, A. R. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977).

6. Prober, J. M. et al. Science 238, 336–341 (1987).

7. NIH News Release http://www.genome.gov/12513210 (2004).

8. Nyren, P., Pettersson, B. & Uhlen, M. Anal. Biochem. 208, 171–175 (1993).

9. Shendure, J. et al. Science advance online publication doi:10.1126/science.1117389 (2005).

10. Quake, S. R. et al. Proc. Natl Acad. Sci. USA 100, 3960–3964 (2003).


Cite this article

Rogers, Y.-H. & Venter, J. C. Massively parallel sequencing. Nature 437, 326–327 (2005). https://doi.org/10.1038/437326a
