
Genomics

# Massively parallel sequencing

A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours. The technique may provide an alternative approach to DNA sequencing.

Since the publication of the first complete genome sequence of a living organism¹ in 1995, the field of genomics has changed dramatically. Fuelled by innovations in high-throughput DNA sequencing, high-performance computing and bioinformatics, genomic science has expanded substantially and the rate of genomic discovery has grown exponentially. To date, the genomes of more than 300 organisms have been sequenced and analysed, including those of most major human pathogens, diverse microbes and, of course, our own genome²,³. These advances have profoundly altered the landscapes of biological science and medicine. In this issue, Rothberg and colleagues (page 376)⁴ describe a sequencing system that offers a much higher throughput than the current state-of-the-art methods. The system has some limitations to overcome before it can be used for all sequencing applications, but it is nonetheless one of the most promising sequencing technologies to have emerged in recent years.

For more than a decade, Sanger sequencing⁵ and fluorescence-based electrophoresis technologies⁶ have dominated the DNA sequencing field. Continued improvements in these techniques and in instrumentation, paired with advances in computing and informatics, have reduced the cost of sequencing by roughly two orders of magnitude and transformed genome projects from decade-long endeavours to projects of mere months (for mammalian-sized genomes), or even weeks (for microbial genomes). However, it still costs an estimated US$10 million to US$25 million to sequence a single human genome⁷ and US$20,000 to US$50,000 to sequence a microbial genome. Only a handful of large genome centres worldwide have the resources and technical expertise to handle the sequencing of a mammalian-sized genome, perform large-scale sequencing of multiple organisms or conduct the resequencing of large numbers of genes. To ensure continued growth of genomic science and to enable more labs to become involved in DNA sequencing, new approaches must decrease the cost and increase the throughput of sequencing significantly, while maintaining the high quality of data produced by the current approach.

Rothberg and colleagues⁴ have developed a highly parallel system capable of sequencing 25 million bases in a four-hour period, about 100 times faster than the current state-of-the-art Sanger sequencing and capillary-based electrophoresis platform. The method could potentially allow one individual to prepare and sequence an entire genome in a few days (Fig. 1). The sequencer itself, equipped with a simple detection device and liquid delivery system, and housed in a casing roughly the size of a microwave oven, is relatively low-tech. The complexity of the system lies primarily in the sample preparation and in the microfabricated, massively parallel platform, which contains 1.6 million picolitre-sized reactors in a 6.4-cm² slide.
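The headline figures in this paragraph can be turned into rates with a little arithmetic. The sketch below uses only the numbers quoted above (25 million bases, four hours, 1.6 million reactors, 6.4 cm², "about 100 times faster"). The implied Sanger rate is derived from the quoted speed-up rather than stated in the text, so it should be read as a rough per-instrument estimate; the function and constant names are illustrative.

```python
# Figures quoted in the article.
BASES_PER_RUN = 25_000_000      # bases read in one run
RUN_HOURS = 4                   # duration of one run
REACTOR_WELLS = 1_600_000       # picolitre-sized reactors on the slide
SLIDE_AREA_CM2 = 6.4            # slide area in square centimetres
SPEEDUP_OVER_SANGER = 100       # "about 100 times faster" (approximate)


def bases_per_hour(bases: int, hours: float) -> float:
    """Average number of bases read per hour of instrument time."""
    return bases / hours


def wells_per_cm2(wells: int, area_cm2: float) -> float:
    """Average reactor density on the slide."""
    return wells / area_cm2


new_rate = bases_per_hour(BASES_PER_RUN, RUN_HOURS)
# Derived, not quoted: the rate implied for the Sanger/capillary platform.
implied_sanger_rate = new_rate / SPEEDUP_OVER_SANGER
density = wells_per_cm2(REACTOR_WELLS, SLIDE_AREA_CM2)

print(f"New system:       {new_rate:,.0f} bases per hour")
print(f"Implied Sanger:   {implied_sanger_rate:,.0f} bases per hour")
print(f"Reactor density:  {density:,.0f} wells per cm^2")
```

On these figures the new system reads about 6.25 million bases per hour, and the slide carries roughly 250,000 reactors per square centimetre.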

Sample preparation starts with fragmentation of the genomic DNA, followed by the attachment of adaptor sequences to the ends of the DNA pieces. The adaptors allow the DNA fragments to bind to tiny beads (around 28 µm in diameter). This is done under conditions that allow only one piece of DNA to bind to each bead. The beads are encased in droplets of oil that contain all of the reactants needed to amplify the DNA using a standard tool called the polymerase chain reaction. The oil droplets form part of an emulsion so that each bead is kept apart from its neighbour, ensuring the amplification is uncontaminated. Each bead ends up with roughly 10 million copies of its initial DNA fragment.

To perform the sequencing reaction, the DNA-template-carrying beads are loaded into the picolitre reactor wells, each well having space for just one bead. The technique uses a sequencing-by-synthesis⁸ method developed by Uhlen and colleagues, in which DNA complementary to each template strand is synthesized. The nucleotide bases used for sequencing release a chemical group as the base forms a bond with the growing DNA chain, and this group drives a light-emitting reaction in the presence of specific enzymes and luciferin. Sequential washes of each of the four possible nucleotides are run over the plate, and a detector senses which of the wells emit light with each wash to determine the sequence of the growing strand.
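The wash-and-detect cycle described above can be illustrated with a toy model of a single well. This is a minimal sketch under stated assumptions, not the authors' base-calling method: it assumes the four nucleotides are washed in a fixed repeating order (`FLOW_ORDER` is a hypothetical choice), and that the light detected in a wash is proportional to the number of bases incorporated in that wash. The function names are illustrative.

```python
from itertools import cycle, islice

# Assumed wash order; the article only says the four nucleotides are
# washed over the plate sequentially.
FLOW_ORDER = "TACG"


def simulate_flows(read, flow_order=FLOW_ORDER, max_flows=1000):
    """Return idealised (nucleotide, signal) pairs for one well.

    `read` is the strand being synthesised. Each wash extends the strand
    by every consecutive base that matches the washed nucleotide, and the
    signal is assumed to equal the number of bases added (0 = no light).
    """
    signals = []
    pos = 0
    for nucleotide in islice(cycle(flow_order), max_flows):
        if pos >= len(read):
            break
        added = 0
        while pos < len(read) and read[pos] == nucleotide:
            added += 1
            pos += 1
        signals.append((nucleotide, added))
    return signals


def decode_flows(signals):
    """Rebuild the synthesised strand from per-wash signals.

    Each signal is rounded to the nearest whole number of bases, which is
    where a noisy reading of a long single-base run could go wrong.
    """
    return "".join(nuc * round(signal) for nuc, signal in signals)


flows = simulate_flows("TTACGGA")
print(flows)
print(decode_flows(flows))
```

In this model a run of identical bases is read in a single wash, so the base caller has to infer the run length from signal strength alone; rounding a noisy signal for a long run to the wrong integer inserts or deletes a base.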

This new system shows great promise in several sequencing applications, including resequencing and de novo sequencing of smaller bacterial and viral genomes. It could potentially allow research groups with limited resources to enter the field of large-scale DNA sequencing and genomic research, as it provides a technology that is inexpensive and easy to implement and maintain. However, this technology cannot yet replace the Sanger sequencing approach for some of the more demanding applications, such as sequencing a mammalian genome, as it has several limitations.

First, the technique can only read comparatively short lengths of DNA, averaging 80–120 bases per read, which is approximately a tenth of the read-lengths possible using Sanger sequencing. This means not only that more reads must be done to cover the same sequence, but also that stitching the results together into longer genomic sequences is a lot more complicated. This is particularly true when dealing with genomes containing long repetitive sequences.
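The effect of read length on the number of reads can be sketched with a simple estimate: the reads required are roughly the genome size multiplied by the desired coverage depth, divided by the average read length. The coverage depth of 10 and the read lengths of 100 and 1,000 bases below are illustrative assumptions (the text gives 80 to 120 bases and "approximately a tenth" of Sanger read-lengths), and the estimate ignores overlap requirements and uneven sampling.

```python
import math


def reads_needed(genome_size: int, read_length: int, coverage: int) -> int:
    """Estimate the reads required to sample a genome `coverage` times over.

    Simplified model: total bases needed (genome_size * coverage) divided
    by the average read length, rounded up to a whole read.
    """
    if genome_size <= 0 or read_length <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(genome_size * coverage / read_length)


GENOME_SIZE = 25_000_000   # a 25-million-base genome, as in the article
COVERAGE = 10              # assumed coverage depth (hypothetical)

short_reads = reads_needed(GENOME_SIZE, read_length=100, coverage=COVERAGE)
long_reads = reads_needed(GENOME_SIZE, read_length=1_000, coverage=COVERAGE)

print(f"100-base reads:   {short_reads:,}")
print(f"1,000-base reads: {long_reads:,}")
```

Under these assumptions, reads a tenth as long mean ten times as many reads to assemble, each carrying less context for resolving repeats.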

Second, the accuracy of each individual read is not as good as with Sanger sequencing, particularly in genomic regions in which the same base is repeated several times in a row (homopolymer runs). Third, because the DNA 'library' is currently prepared in a single-stranded format, unlike the double-stranded inserts of DNA libraries used for Sanger sequencing, the technique cannot generate paired-end reads for each DNA fragment. The paired-end information is crucial for assembling and orientating the individual sequence reads into a complete genomic map for de novo sequencing applications. Finally, the sample preparation and amplification processes are still quite complex and will require automation, simplification or both.

Church and colleagues⁹ also recently hit upon the idea of using massively parallel reactions to speed up sequencing, although their method is still only at the proof-of-principle stage rather than being a full production system. They use a similar principle to Rothberg and colleagues⁴, that is, sequencing-by-synthesis on a solid support. However, the two approaches diverge in terms of library construction, sequencing chemistry, signal detection and array platform. These differences greatly affect the characteristics and reproducibility of the data, as well as the scalability of the platform. For example, Church and colleagues' method can read paired-end sequences; however, its average read-lengths are approximately a fifth of those generated by Rothberg and colleagues' system. These differences are key factors in determining the sequencing application for which each technique might be most suited.

It may be years before Rothberg and colleagues' system, or other similar approaches⁹,¹⁰, can tackle all three billion letters of the human genome with the same reliability and accuracy as current methods. Nevertheless, it looks extremely promising, and it is certainly one of the most significant sequencing technologies under development.

## References

1. Venter, J. C. et al. Science 269, 496–512 (1995).
2. Venter, J. C. et al. Science 291, 1304–1351 (2001).
3. International Human Genome Mapping Consortium. Nature 409, 860–921 (2001).
4. Margulies, M. et al. Nature 437, 376–380 (2005).
5. Sanger, F., Nicklen, S. & Coulson, A. R. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977).
6. Prober, J. M. et al. Science 238, 336–341 (1987).
7. NIH News Release http://www.genome.gov/12513210 (2004).
8. Nyren, P., Pettersson, B. & Uhlen, M. Anal. Biochem. 208, 171–175 (1993).
9. Shendure, J. et al. Science advance online publication doi:10.1126/science.1117389 (2005).
10. Quake, S. R. et al. Proc. Natl Acad. Sci. USA 100, 3960–3964 (2003).

Rogers, YH., Venter, J. Massively parallel sequencing. Nature 437, 326–327 (2005). https://doi.org/10.1038/437326a
