Accurate whole human genome sequencing using reversible terminator chemistry

Bentley, David R.; Balasubramanian, Shankar; Swerdlow, Harold P.; Smith, Geoffrey P.; Milton, John; Brown, Clive G.; Hall, Kevin P.; Evers, Dirk J.; Barnes, Colin L.; Bignell, Helen R.; Boutell, Jonathan M.; Bryant, Jason; Carter, Richard J.; Keira Cheetham, R.; Cox, Anthony J.; Ellis, Darren J.; Flatbush, Michael R.; Gormley, Niall A.; Humphray, Sean J.; Irving, Leslie J.; Karbelashvili, Mirian S.; Kirk, Scott M.; Li, Heng; Liu, Xiaohai; Maisinger, Klaus S.; Murray, Lisa J.; Obradovic, Bojan; Ost, Tobias; Parkinson, Michael L.; Pratt, Mark R.; Rasolonjatovo, Isabelle M. J.; Reed, Mark T.; Rigatti, Roberto; Rodighiero, Chiara; Ross, Mark T.; Sabot, Andrea; Sankar, Subramanian V.; Scally, Aylwyn; Schroth, Gary P.; Smith, Mark E.; Smith, Vincent P.; Spiridou, Anastassia; Torrance, Peta E.; Tzonev, Svilen S.; Vermaas, Eric H.; Walter, Klaudia; Wu, Xiaolin; Zhang, Lu; Alam, Mohammed D.; Anastasi, Carole; Aniebo, Ify C.; Bailey, David M. D.; Bancarz, Iain R.; Banerjee, Saibal; Barbour, Selena G.; Baybayan, Primo A.; Benoit, Vincent A.; Benson, Kevin F.; Bevis, Claire; Black, Phillip J.; Boodhun, Asha; Brennan, Joe S.; Bridgham, John A.; Brown, Rob C.; Brown, Andrew A.; Buermann, Dale H.; Bundu, Abass A.; Burrows, James C.; Carter, Nigel P.; Castillo, Nestor; Chiara E. Catenazzi, Maria; Chang, Simon; Neil Cooley, R.; Crake, Natasha R.; Dada, Olubunmi O.; Diakoumakos, Konstantinos D.; Dominguez-Fernandez, Belen; Earnshaw, David J.; Egbujor, Ugonna C.; Elmore, David W.; Etchin, Sergey S.; Ewan, Mark R.; Fedurco, Milan; Fraser, Louise J.; Fuentes Fajardo, Karin V.; Scott Furey, W.; George, David; Gietzen, Kimberley J.; Goddard, Colin P.; Golda, George S.; Granieri, Philip A.; Green, David E.; Gustafson, David L.; Hansen, Nancy F.; Harnish, Kevin; Haudenschild, Christian D.; Heyer, Narinder I.; Hims, Matthew M.; Ho, Johnny T.; Horgan, Adrian M.; Hoschler, Katya; Hurwitz, Steve; Ivanov, Denis V.; Johnson, Maria Q.; James, Terena; Huw Jones, T. 
A.; Kang, Gyoung-Dong; Kerelska, Tzvetana H.; Kersey, Alan D.; Khrebtukova, Irina; Kindwall, Alex P.; Kingsbury, Zoya; Kokko-Gonzales, Paula I.; Kumar, Anil; Laurent, Marc A.; Lawley, Cynthia T.; Lee, Sarah E.; Lee, Xavier; Liao, Arnold K.; Loch, Jennifer A.; Lok, Mitch; Luo, Shujun; Mammen, Radhika M.; Martin, John W.; McCauley, Patrick G.; McNitt, Paul; Mehta, Parul; Moon, Keith W.; Mullens, Joe W.; Newington, Taksina; Ning, Zemin; Ling Ng, Bee; Novo, Sonia M.; O’Neill, Michael J.; Osborne, Mark A.; Osnowski, Andrew; Ostadan, Omead; Paraschos, Lambros L.; Pickering, Lea; Pike, Andrew C.; Pike, Alger C.; Chris Pinkard, D.; Pliskin, Daniel P.; Podhasky, Joe; Quijano, Victor J.; Raczy, Come; Rae, Vicki H.; Rawlings, Stephen R.; Chiva Rodriguez, Ana; Roe, Phyllida M.; Rogers, John; Rogert Bacigalupo, Maria C.; Romanov, Nikolai; Romieu, Anthony; Roth, Rithy K.; Rourke, Natalie J.; Ruediger, Silke T.; Rusman, Eli; Sanches-Kuiper, Raquel M.; Schenker, Martin R.; Seoane, Josefina M.; Shaw, Richard J.; Shiver, Mitch K.; Short, Steven W.; Sizto, Ning L.; Sluis, Johannes P.; Smith, Melanie A.; Ernest Sohna Sohna, Jean; Spence, Eric J.; Stevens, Kim; Sutton, Neil; Szajkowski, Lukasz; Tregidgo, Carolyn L.; Turcatti, Gerardo; vandeVondele, Stephanie; Verhovsky, Yuli; Virk, Selene M.; Wakelin, Suzanne; Walcott, Gregory C.; Wang, Jingwen; Worsley, Graham J.; Yan, Juying; Yau, Ling; Zuerlein, Mike; Rogers, Jane; Mullikin, James C.; Hurles, Matthew E.; McCooke, Nick J.; West, John S.; Oaks, Frank L.; Lundberg, Peter L.; Klenerman, David; Durbin, Richard; Smith, Anthony J.

doi:10.1038/nature07517

Published: 06 November 2008

Nature volume 456, pages 53–59 (2008)

Abstract

DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400–800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30× average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

Main

DNA sequencing yields an unrivalled resource of genetic information. We can characterize individual genomes, transcriptional states and genetic variation in populations and disease. Until recently, the scope of sequencing projects was limited by the cost and throughput of Sanger sequencing. The raw data for the three billion base (3 gigabase (Gb)) human genome sequence, completed in 2004 (ref. 1), was generated over several years for ∼$300 million using several hundred capillary sequencers. More recently an individual human genome sequence has been determined for ∼$10 million by capillary sequencing². Several new approaches at varying stages of development aim to increase sequencing throughput and reduce cost^3,4,5,6. They increase parallelization markedly by imaging many DNA molecules simultaneously. One instrument run produces typically thousands or millions of sequences that are shorter than capillary reads. Another human genome sequence was recently determined using one of these approaches⁷. However, much bigger improvements are necessary to enable routine whole human genome sequencing in genetic research.

We describe a massively parallel synthetic sequencing approach that transforms our ability to use DNA and RNA sequence information in biological systems. We demonstrate utility by re-sequencing an individual human genome to high accuracy. Our approach delivers data at very high throughput and low cost, and enables extraction of genetic information of high biological value, including single-nucleotide polymorphisms (SNPs) and structural variants.

DNA sequencing using reversible terminators

We generated high-density single-molecule arrays of genomic DNA fragments attached to the surface of the reaction chamber (the flow cell) and used isothermal ‘bridging’ amplification to form DNA ‘clusters’ from each fragment. We made the DNA in each cluster single-stranded and added a universal primer for sequencing. For paired read sequencing, we then converted the templates to double-stranded DNA and removed the original strands, leaving the complementary strand as template for the second sequencing reaction (Fig. 1a–c). To obtain paired reads separated by larger distances, we circularized DNA fragments of the required length (for example, 2 ± 0.2 kb) and obtained short junction fragments for paired end sequencing (Fig. 1d).

We sequenced DNA templates by repeated cycles of polymerase-directed single base extension. To ensure base-by-base nucleotide incorporation in a stepwise manner, we used a set of four reversible terminators, 3′-O-azidomethyl 2′-deoxynucleoside triphosphates (A, C, G and T), each labelled with a different removable fluorophore (Supplementary Fig. 1a)⁸. The use of 3′-modified nucleotides allowed the incorporation to be driven essentially to completion without risk of over-incorporation. It also enabled addition of all four nucleotides simultaneously rather than sequentially, minimizing risk of misincorporation. We engineered the active site of 9°N DNA polymerase to improve the efficiency of incorporation of these unnatural nucleotides⁹. After each cycle of incorporation, we determined the identity of the inserted base by laser-induced excitation of the fluorophores and imaging. We added tris(2-carboxyethyl)phosphine (TCEP) to remove the fluorescent dye and side arm from a linker attached to the base and simultaneously regenerate a 3′ hydroxyl group ready for the next cycle of nucleotide addition (Supplementary Fig. 1b). The Genome Analyzer (GA1) was designed to perform multiple cycles of sequencing chemistry and imaging to collect the sequence data automatically from each cluster on the surface of each lane of an eight-lane flow cell (Supplementary Fig. 2).

To determine the sequence from each cluster, we quantified the fluorescent signal from each cycle and applied a base-calling algorithm. We defined a quality (Q) value for each base call (scaled as by the phred algorithm¹⁰) that represents the likelihood of each call being correct (Supplementary Fig. 3). We used the Q-values in subsequent analyses to weight the contribution of each base to sequence alignment and detection of sequence variants (for example, SNP calling). We discarded all reads from mixed clusters and used the remaining ‘purity filtered’ reads for analysis. Typically we generated 1–2 Gb of high-quality purity filtered sequence per flow cell from ∼30–60-million single 35-base reads, or 2–4 Gb in a paired read experiment (Supplementary Table 1).
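The phred scaling referred to above relates a quality value Q to the estimated probability p that a base call is wrong, through Q = -10 log10(p). The short Python sketch below shows the conversion in both directions. The function names are illustrative only and are not part of the Genome Analyzer software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with phred-scaled quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        # Print each quality value next to its implied error probability.
        print(q, phred_to_error_prob(q))
```

On this scale a Q30 base, the threshold used later for consensus calling, corresponds to an estimated error probability of about 1 in 1,000.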

To demonstrate accurate sequencing of human DNA, we sequenced a human bacterial artificial chromosome (BAC) clone (bCX98J21) that contained 162,752 bp of the major histocompatibility complex on human chromosome 6 (accession AL662825.4, previously determined using capillary sequencing by the Wellcome Trust Sanger Institute). We developed a fast global alignment algorithm ELAND that aligns a read to the reference only if the read can be assigned a unique position with 0, 1 or 2 differences. We collected 0.17 Gb of aligned data for the BAC from one lane of a flow cell. Approximately 90% of the 35-base reads matched perfectly to the reference, demonstrating high raw read accuracy (Supplementary Fig. 4). To examine consensus coverage and accuracy, we used 5 Mb of 35-base purity filtered reads (30-fold average input depth of the BAC) and obtained 99.96% coverage of the reference. There was one consensus miscall, at a position of very low coverage (just above our cutoff threshold), yielding an overall consensus accuracy of >99.999%.
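The acceptance rule described for ELAND (a unique position with 0, 1 or 2 differences) can be illustrated with a brute-force scan. This is a sketch of the rule only, not of ELAND's actual algorithm, which is not described here. It assumes that "unique" means a single best-scoring placement, it considers the forward strand only, and all function names are hypothetical.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_alignment(read: str, reference: str,
                     max_diff: int = 2) -> Optional[Tuple[int, int]]:
    """Return (position, differences) when the read has exactly one best
    placement with at most max_diff mismatches; otherwise return None."""
    best = None        # best (position, differences) seen so far
    best_count = 0     # number of placements sharing the best score
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d > max_diff:
            continue
        if best is None or d < best[1]:
            best, best_count = (pos, d), 1
        elif d == best[1]:
            best_count += 1
    return best if best_count == 1 else None


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    print(unique_alignment("TTGAC", reference))  # exact, single placement
    print(unique_alignment("TTGTC", reference))  # one mismatch, single placement
    print(unique_alignment("ACGT", reference))   # exact repeat, rejected
```

A linear scan like this is far too slow for a genome-sized reference; it is meant only to make the uniqueness criterion concrete.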

Detecting genetic variation of the human X chromosome

For an initial study of genetic variation, we sequenced flow-sorted X chromosomes of a Caucasian female (sample NA07340 originating from the Centre d’Etude du Polymorphisme Humain (CEPH)). We generated 278-million paired 30–35-bp purity filtered reads and aligned them to the human genome reference sequence. We carried out separate analyses of the data using two alignment algorithms: ELAND (see above) or MAQ (Mapping and Assembly with Qualities)¹¹. Both algorithms place each read pair where it best matches the reference and assign a confidence score to the alignment. In cases where a read has two or more equally likely positions (that is, in an exact repeat), MAQ randomly assigns the read pair to one position and assigns a zero alignment quality score (these reads are excluded from SNP analysis). ELAND rejects all non-unique alignments, which are mostly in recently inserted retrotransposons (see Supplementary Fig. 5). MAQ therefore provides an opportunity to assess the properties of a data set aligned to the entire reference, whereas ELAND effectively excludes ambiguities from the short read alignment before further analysis.

We obtained comprehensive coverage of the X chromosome from both analyses. With MAQ, 204 million reads aligned to 99.94% of the X chromosome at an average depth of 43×. With ELAND, 192 million reads covered 91% of the reference sequence, showing what can be covered by unique best alignments. These results were obtained after excluding reads aligning to non-X sequence (impurities of flow sorting) and apparently duplicated read pairs (Supplementary Table 2). We reasoned that these duplicates (∼10% of the total) arose during initial sample amplification.
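The filtering described above (discarding off-target alignments and apparently duplicated read pairs) can be sketched as follows. The text does not state how duplicates were recognized. The sketch assumes a common convention in which pairs whose two reads share identical start coordinates and strand are treated as duplicates and the pair with the highest alignment score is kept. The record layout and function name are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    chrom: str        # reference sequence the pair aligned to
    start1: int       # leftmost aligned position of read 1
    start2: int       # leftmost aligned position of read 2
    strand1: str      # '+' or '-' for read 1
    map_quality: int  # alignment confidence score


def filter_pairs(pairs: Iterable[AlignedPair], target: str = "X") -> List[AlignedPair]:
    """Keep on-target pairs, retaining one representative per duplicate group."""
    best: Dict[Tuple[int, int, str], AlignedPair] = {}
    for pair in pairs:
        if pair.chrom != target:
            continue  # impurity of flow sorting: aligned to non-target sequence
        key = (pair.start1, pair.start2, pair.strand1)
        if key not in best or pair.map_quality > best[key].map_quality:
            best[key] = pair
    return list(best.values())


pairs = [
    AlignedPair("X", 100, 290, "+", 60),
    AlignedPair("X", 100, 290, "+", 40),  # same coordinates: treated as a duplicate
    AlignedPair("7", 100, 290, "+", 60),  # aligned off-target
    AlignedPair("X", 105, 300, "-", 55),
]
kept = filter_pairs(pairs)

if __name__ == "__main__":
    print(len(kept), "pairs kept of", len(pairs))
```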

The sampling of sequence fragments from the X chromosome is close to random. This is evident from the distribution of mapped read depth in the MAQ alignment in regions where the reference is unique (Fig. 2a): the variance of this distribution is only 2.26 times that of a Poisson distribution (the theoretical minimum). Half of this excess variance can be accounted for by a dependence on G+C content. However, the average mapped read depth only falls below 10× in regions with G+C content less than 4% or greater than 76%, comprising in total just 1% of unique chromosome sequence and 3% of coding sequence (Fig. 2b).
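The comparison with a Poisson distribution rests on the fact that a Poisson variable has variance equal to its mean, so the ratio of variance to mean is 1 for ideal random sampling. A minimal sketch of that ratio is given below. The function name and the toy depth values are illustrative, and the population variance is used as a simplifying assumption.

```python
from statistics import mean, pvariance
from typing import Sequence


def dispersion_index(depths: Sequence[int]) -> float:
    """Variance-to-mean ratio of per-position read depth.

    A value of 1 corresponds to Poisson sampling; larger values indicate
    over-dispersion.
    """
    m = mean(depths)
    return pvariance(depths, m) / m


if __name__ == "__main__":
    toy_depths = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
    print(dispersion_index(toy_depths))
```

Under this definition, the figure quoted above corresponds to a dispersion index of 2.26 for unique regions of the X chromosome.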

We identified 92,485 candidate SNPs in the X chromosome using ELAND (Supplementary Fig. 6). Most calls (85%) match previous entries in the public database dbSNP. Heterozygosity (π) in this data set is 4.3 × 10^-4 (that is, one substitution per 2.3 kb), close to a previously published X chromosome estimate (4.7 × 10^-4)¹². Using MAQ we obtained 104,567 SNPs, most of which were common to the results of the ELAND analysis. The differences between the two sets of SNP calls are largely the consequence of different properties of the alignments as described earlier. For example, most of the SNPs found only by the MAQ-based analysis were at positions of low or zero sequence depth in the ELAND alignment (Supplementary Fig. 6c).

We assessed accuracy and completeness of SNP calling by comparison to genotypes obtained for this individual using the Illumina HumanHap550 BeadChip (HM550). The sequence data covered >99.8% of the 13,604 genotyped positions and we found excellent agreement between sequence-based SNP calls and genotyping data (99.52% or 99.99% using ELAND or MAQ, respectively; Supplementary Table 3). There was complete concordance of all homozygous calls and a low level of ‘under-calling’ from the sequence data (denoted as ‘GT>Seq’ in Table 1) at a small number of the heterozygous sites, caused by inadequate sampling of one of the two alleles. The depth of input sequence influences the coverage and accuracy of SNP calling. We found that reducing the read depth to 15× still gives 97% coverage of genotype positions and only 1.27% of the heterozygous sites are under-called. We observed no other types of disagreement at any input depth (Supplementary Fig. 7).
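The categories used when comparing sequence-based calls with array genotypes can be made concrete with a small classifier. This sketch assumes each call is given as a two-letter string of alleles, and it interprets 'GT>Seq' as the genotype showing an allele that the sequence call lacks and 'Seq>GT' as the reverse. The function name is hypothetical.

```python
def compare_call(genotype: str, sequence: str) -> str:
    """Classify one locus given two diploid calls such as 'AG' or 'AA'."""
    gt, seq = frozenset(genotype), frozenset(sequence)
    if gt == seq:
        return "concordant"
    if gt > seq:   # genotype is heterozygous, sequence saw only one allele
        return "GT>Seq"
    if seq > gt:   # sequence is heterozygous, genotype reported one allele
        return "Seq>GT"
    return "other"  # for example, different homozygous alleles


if __name__ == "__main__":
    for gt, seq in [("AG", "GA"), ("AG", "AA"), ("AA", "AG"), ("AA", "CC")]:
        print(gt, seq, compare_call(gt, seq))
```

Allele order within a call is ignored, so 'AG' and 'GA' count as the same genotype.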

Table 1 Comparison of SNP calls made from sequence versus genotype data for the human genome (NA18507) and X chromosome (NA07340)

We detected structural variants (defined as any variant other than a single base substitution) as follows. We found 9,747 short insertions/deletions (‘short indels’; defined here as less than the length of the read) by performing a gapped alignment of individual reads (Supplementary Fig. 8). We identified larger indels based on read depth and/or anomalous read pair spacing, similar to previous approaches^13,14,15. We detected 115 indels in total, 77 of which were visible from anomalous read-pair spacing (see Supplementary Tables 4 and 5). We developed Resembl, an extension to the Ensembl browser¹⁶, to view all variants (Supplementary Fig. 9). Inversions can be detected when the orientation of one read in a pair is reversed (for example, see Supplementary Fig. 10). In general, inversions occur as the result of non-allelic homologous recombination, and are therefore flanked by repetitive sequence that can compromise alignments. We found partial evidence for other inversion events, but characterization of inversions from short read data is complex because of the repeats and requires further development.
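The logic of detecting larger variants from anomalous read pairs can be sketched as a simple per-pair classification. The sketch makes several assumptions that are not taken from the text: normal pairs align to opposite strands, a pair counts as anomalous when its spacing lies more than three standard deviations from the library mean, and the labels and function name are hypothetical. In practice, a call would require several independent pairs supporting the same event.

```python
def classify_pair(spacing: int, same_strand: bool,
                  expected: float, sd: float, n_sd: float = 3.0) -> str:
    """Label one aligned read pair by comparing it with the library's insert size.

    spacing      distance between the two reads on the reference
    same_strand  True when both reads align to the same strand
    expected, sd mean and standard deviation of the library insert size
    """
    if same_strand:
        return "inversion_candidate"   # one read has a reversed orientation
    if spacing > expected + n_sd * sd:
        return "deletion_candidate"    # reference holds sequence absent from the sample
    if spacing < expected - n_sd * sd:
        return "insertion_candidate"   # sample holds sequence absent from the reference
    return "normal"


if __name__ == "__main__":
    # Illustrative short-insert library: 200 bp mean, 20 bp standard deviation.
    for spacing, same in [(205, False), (1500, False), (60, False), (210, True)]:
        print(spacing, same, classify_pair(spacing, same, expected=200, sd=20))
```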

Sequencing and analysis of a whole human genome

Our X chromosome study enabled us to develop an integrated set of methods for rapid sequencing and analysis of whole human genomes. We sequenced the genome of a male Yoruba from Ibadan, Nigeria (YRI, sample NA18507). This sample was originally collected for the HapMap project^17,18 through a process of community engagement and informed consent¹⁹ and has also been studied in other projects^20,21. We were therefore able to compare our results with publicly available data from the same sample. We constructed two libraries: one of short inserts (∼200 bp) with similar properties to the previous X chromosome library and one from long fragments (∼2 kb) to provide longer-range read-pair information (see Supplementary Fig. 11 for size distributions). We generated 135 Gb of sequence (∼4 billion paired 35-base reads; see Supplementary Table 6) over a period of 8 weeks (December 2007 to January 2008) on six GA1 instruments averaging 3.3 Gb per production run (see Supplementary Table 1 for example). The approximate consumables cost (based on full list price of reagents) was $250,000. We aligned 97% of the reads using MAQ and found that 99.9% of the human reference (NCBI build 36.1) was covered with one or more reads at an average of 40.6-fold depth. Using ELAND, we aligned 91% of the reads over 93% of the reference sequence at sufficient depth to call a strong consensus (>three Q30 bases). The distribution of mapped read depth was close to random, with slight over-dispersion as seen for the X chromosome data. We observed comprehensive representation across a wide range of G+C content, dropping only at the very extreme ends, but with a different pattern of distribution compared to the X chromosome (see Supplementary Fig. 12).
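The production figures quoted above can be combined in a few lines of arithmetic. The sketch below derives the implied number of production runs and the consumables cost per gigabase, and gives a rough upper bound on mapped depth. The 3-Gb genome size is a rounded assumption taken from the introduction, and the depth estimate ignores duplicate removal and filtering, so it is not expected to reproduce the reported 40.6-fold value.

```python
total_gb = 135.0             # sequence generated for NA18507
gb_per_run = 3.3             # average yield per production run
consumables_usd = 250_000    # approximate consumables cost at list price
fraction_aligned_maq = 0.97  # fraction of reads aligned with MAQ
genome_gb = 3.0              # assumption: rounded human genome size

runs = total_gb / gb_per_run
cost_per_gb = consumables_usd / total_gb
depth_upper_bound = total_gb * fraction_aligned_maq / genome_gb

if __name__ == "__main__":
    print(f"implied production runs: {runs:.1f}")
    print(f"consumables cost per Gb: ${cost_per_gb:,.0f}")
    print(f"rough upper bound on mapped depth: {depth_upper_bound:.1f}x")
```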

We identified ∼4 million SNPs, with 74% matching previous entries in dbSNP (Fig. 3). We found excellent agreement of our SNP calls with genotyping results: sequence-based SNP calls covered almost all of the 552,710 loci of HM550, with >99.5% concordance of sequencing versus genotyping calls (Table 1 and Supplementary Table 7a). The few disagreements were mostly under-calls of heterozygous positions (GT>Seq) in areas of low sequence depth, providing us with a false-negative rate of <0.35% from the ELAND analysis (see Table 1). The other disagreements (0.09% of all genotypes) included errors in genotyping plus apparent tri-allelic SNPs (Supplementary Table 7a). The main cause of genotype error (0.05% of all genotypes) is the existence of a second ‘hidden’ SNP close to the assayed locus that disrupts the genotyping assay, leading to loss of one allele and an erroneous homozygous genotype (Supplementary Figs 13 and 14).

Figure 3: SNPs identified in the human genome sequence of NA18507.

To examine the accuracy of SNP calling in more detail, we compared our sequence-based SNP calls with 3.7 million genotypes (HM-All) generated for this sample during the HapMap project (Table 1 and Supplementary Table 7b)¹⁸ and found excellent concordance between the data sets. Disagreements included sequence-based under-calls of heterozygous positions in regions of low read depth. The slightly higher level of other disagreements (0.76%) seen in this analysis compared to that of the HM550 data (0.09%) is in line with the higher level of underlying genotype error rate of 0.7% for the HapMap data¹⁸. To refine this analysis further, we generated a set of 530,750 very high confidence reference genotypes comprising concordant calls in both the HM550 and HM-All genotype data sets. Comparing the results of the MAQ analysis to this high confidence set (see Table 1), we found 130 heterozygote under-calls GT>Seq (that is, a false-negative rate of 0.025%). There were also 130 heterozygote over-calls Seq>GT, but most of these are probably genotype errors as 82 have a nearby ‘hidden’ SNP and 3 have a nearby indel. A further 41 are tri-allelic loci, leaving at most 4 potential wrong calls by sequencing (that is, false-positive rate of 4 per 529,589 positions). Finally we selected a subset of novel SNP calls from the sequence data and tested them by genotyping. We found 96.1% agreement between sequence and genotype calls (Supplementary Table 8). However, the 47 disagreements included 10 correct sequencing calls (genotyping under-calls owing to hidden SNPs) and 7 sequencing under-calls. On this basis, therefore, the false-positive discovery rate for the one million novel SNPs is 2.5% (30 out of 1,206). For the entire data set of four million SNPs detected in this analysis, the false-positive and -negative rates both average <1%.
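The error-rate bookkeeping in the preceding paragraph can be checked with the counts given there. The sketch below uses only those counts; the variable names are illustrative.

```python
# Heterozygote over-calls (Seq>GT) against the high-confidence genotype set.
over_calls = 130
near_hidden_snp = 82   # probable genotype errors caused by a nearby hidden SNP
near_indel = 3         # probable genotype errors caused by a nearby indel
tri_allelic = 41       # tri-allelic loci
potential_wrong_calls = over_calls - near_hidden_snp - near_indel - tri_allelic

# Validation of novel SNP calls by genotyping.
tested = 1206
disagreements = 47
correct_sequencing_calls = 10   # genotyping under-calls owing to hidden SNPs
sequencing_under_calls = 7
false_positives = disagreements - correct_sequencing_calls - sequencing_under_calls

agreement_pct = 100 * (tested - disagreements) / tested
false_positive_pct = 100 * false_positives / tested

if __name__ == "__main__":
    print("potential wrong sequencing calls:", potential_wrong_calls)
    print(f"agreement with genotyping: {agreement_pct:.1f}%")
    print(f"false-positive discovery rate: {false_positive_pct:.1f}%")
```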

This genome from a Yoruba individual contains significantly more polymorphism than a genome of European descent. The autosomal heterozygosity (π) of NA18507 is 9.94 × 10^-4 (1 SNP per 1,006 bp), higher than previous values for Caucasians (7.6 × 10^-4, ref. 12). Heterozygosity in the pseudoautosomal region 1 (PAR1) is substantially higher (1.92 × 10^-3) than the autosomal value. PAR1 (2.7 Mb) at the tip of the short arm of chromosomes X and Y undergoes obligatory recombination in male meiosis, which is equivalent to 20× the autosome average. This illustrates a clear correlation between recombination and nucleotide diversity. By contrast, the 0.33-Mb PAR2 region has a much lower recombination rate than PAR1; we observed that heterozygosity in PAR2 is identical to that of the autosomes in NA18507. Heterozygosity in coding regions is lower (0.54 × 10^-3) than the total autosome average, consistent with the model that some coding changes are deleterious and are lost as the result of natural selection²². Nevertheless, the 26,140 coding SNPs (Supplementary Fig. 15) include 5,361 non-conservative amino acid substitutions plus 153 premature termination codons (Supplementary Table 9), many of which are expected to affect protein function.
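Heterozygosity values are easier to compare when expressed as the average number of bases per heterozygous site, which is simply 1/π. The sketch below applies this to the values quoted in this article. The dictionary labels and function name are illustrative.

```python
HETEROZYGOSITY = {
    "X chromosome (NA07340)": 4.3e-4,
    "autosomes (NA18507)": 9.94e-4,
    "PAR1 (NA18507)": 1.92e-3,
    "coding regions (NA18507)": 0.54e-3,
}


def bases_per_snp(pi: float) -> float:
    """Average spacing, in bases, between heterozygous sites for heterozygosity pi."""
    return 1.0 / pi


if __name__ == "__main__":
    autosomal = HETEROZYGOSITY["autosomes (NA18507)"]
    for region, pi in HETEROZYGOSITY.items():
        # Report the spacing and the ratio to the NA18507 autosomal value.
        print(f"{region}: one per {bases_per_snp(pi):,.0f} bp, "
              f"{pi / autosomal:.2f}x autosomal")
```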

We performed a genome-wide survey of structural variation in this individual and found excellent correlation with variants that had been reported in previous studies, as well as detecting many new variants. We found 0.4 million short indels (1–16 bp; Supplementary Fig. 16), most of which are length polymorphisms in homopolymeric tracts of A or T. Half of these events are corroborated by entries in dbSNP, and 95 of 100 examined were present in amplicons sequenced from this individual in ENCODE regions, confirming the high specificity of this method of short indel detection. For larger structural variants (detected by anomalously spaced paired ends) we found that some were detected by both long and short insert data sets (Supplementary Fig. 17a), but most were unique to one or other data set. We observed two reasons for this: first, small events (<400 bp) are within the normal size variance of the long insert data; second, nearby repetitive structures can prevent unique alignment of read pairs (see Supplementary Fig. 17b, c). In some cases, the high resolution of the short insert data permits detection of additional complexity in a structural rearrangement that is not revealed by the long insert data. For example, where the long insert data indicate a 1.3-kb deletion in NA18507 relative to the reference, the short insert data reveal an inversion accompanied by deletions at both breakpoints (Fig. 4). We carried out de novo assembly of reads in this region and constructed a single contig that defines the exact structure of the rearrangement (data not shown).

Figure 4: Homozygous complex rearrangement detected by anomalous paired reads.

We discovered 5,704 structural variants ranging from 50 bp to >35 kb where there is sequence absent from the genome of NA18507 compared to the reference genome. We observed a steadily decreasing number of events of this type with increasing size, except for two peaks (Supplementary Fig. 18). Most of the events represented by the large peak at 300–350 bp contain a sequence of the AluY family. This is consistent with insertion of short interspersed nuclear elements (SINEs) that are present in the reference genome but missing from the genome of NA18507. Similarly, the second, smaller peak at 6–7 kb is the consequence of insertion of the long interspersed nuclear element (LINE) L1 Homo sapiens (L1Hs) in many cases. We found good correspondence between our results and the data of ref. 23, which reported 148 deletions of <100 kb in this individual on the basis of abnormal fosmid paired-end spacing. We found supporting evidence for 111 of these events. We detected a further 2,345 indels in the range 60–160 bp which are sequences present in the genome of NA18507 and absent from the reference genome (Supplementary Fig. 19). One example is shown in Supplementary Fig. 20. The ‘singleton’ reads on either side of the event, which have partners that do not align to the reference, form part of a de novo assembly that precisely defines the novel sequence and breakpoint (Supplementary Fig. 21).

Effect of sequence depth on coverage and accuracy

We investigated the impact of varying input read depth (and hence cost) on SNP calling using chromosome 2 as a model. SNP discovery increases with increasing depth: essentially all homozygous positions are detected at 15×, whereas heterozygous positions accumulate more gradually to 33× (Fig. 5a). This effect is influenced by the stringency of the SNP caller. To call each allele in this analysis we required the equivalent of two high-quality Q30 bases (as opposed to three used in full depth analyses). Homozygotes could be detected at read depth of 2× or higher, whereas heterozygote detection required at least double this depth for sampling of both alleles. Missing calls (not covered by sequence) and discordances between sequence-based SNP calls and genotype loci (mostly under-calls of heterozygotes due to low depth) progressively reduced with increasing depth (Fig. 5b). We observed very few other types of discordance at any depth; many of these are genotyping errors as described above.
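A simple binomial model illustrates why heterozygote detection needs more depth than homozygote detection. The sketch assumes that each read covering a heterozygous site samples either allele with probability 0.5, that depth is fixed at the site, and that every observation is a high-quality base. These are simplifying assumptions and not the SNP caller used in this study. The function name is hypothetical.

```python
from math import comb


def prob_both_alleles_seen(depth: int, min_obs: int = 2) -> float:
    """Probability that each of two alleles is observed at least min_obs times
    among `depth` reads, with each read sampling an allele with probability 0.5."""
    if depth < 2 * min_obs:
        return 0.0
    favourable = sum(comb(depth, i) for i in range(min_obs, depth - min_obs + 1))
    return favourable / 2 ** depth


if __name__ == "__main__":
    for depth in (4, 8, 15, 33):
        print(depth, prob_both_alleles_seen(depth))
```

With two observations required per allele, the model gives zero probability below a depth of 4, in line with the statement that heterozygote detection requires at least double the depth needed for homozygotes.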

Concluding remarks

Reversible terminator chemistry is a defining feature of this sequencing approach, enabling each cycle to be driven to completion while minimizing misincorporation. The result is a system that generates accurate data at very high throughput and low cost. We determined an accurate whole human genome sequence in 8 weeks to an average depth of ∼40×. We built a consensus sequence, optimized methods for analysis, assessed accuracy and characterized the genetic variation of this individual in detail.

We assessed accuracy relative to genotype data over the entire fraction of the human sequence where SNP calling was possible (>90%). We established very low false-positive and -negative rates for the ∼four million SNPs detected (<1% over-calls and under-calls). This compares favourably with previous individual genome analyses which reported a 24% under-calling of heterozygous positions^2,7.

Paired reads were very powerful in all areas of the analysis. They provided very accurate read alignment and thus improved the accuracy and coverage of consensus sequence and SNP calling. They were essential for developing our short indel caller, and for detecting larger structural variants. Our short-insert paired-read data set introduced a new level of resolution in structural variation detection, revealing thousands of variants in a size range not characterized previously. In some cases we determined the exact sequence of structural variants by de novo assembly from the same paired-read data set. Interpreting events that are embedded in repetitive sequence tracts will require further work.

Massively parallel sequencing technology makes it feasible to consider whole human genome sequencing as a clinical tool in the near future. Characterizing multiple individual genomes will enable us to unravel the complexities of human variation in cancer and other diseases and will pave the way for the use of personal genome sequences in medicine and healthcare. Accuracy of personal genetic information from sequence will be critical for life-changing decisions.

In addition to the large-scale genomic projects exemplified by the present study and others^{15,24,25,26}, the system described here is being used to explore biological phenomena in unprecedented detail, including transcriptional activity, mechanisms of gene regulation and epigenetic modification of DNA and chromatin^{27,28,29,30,31,32}. In the future, DNA sequencing will be the central tool for unravelling how genetic information is used in living processes.

Methods Summary

DNA and sequencing

DNA samples (NA07340 and NA18507) and cell line (GM07340) were obtained from Coriell Repositories. DNA samples were genotyped on the HM550 array and the results compared to publicly available data to confirm their identity before use. Methods for DNA manipulation, including sample preparation, formation of single-molecule arrays, cluster growth and sequencing were all developed during this study and formed the basis for the standard protocols now available from Illumina, Inc. All sequencing was performed on Illumina GA1s equipped with a one-megapixel camera. All purity filtered read data are available for download from the Short Read Archive at NCBI or from the European Short Read Archive (ERA) at the EBI.

Analysis software

Image analysis software and the ELAND aligner are provided as part of the Genome Analyzer analysis software. SNP and structural variant detectors will be available as future upgrades of the analysis pipeline. The Resembl extension to Ensembl is available on request. The MAQ (Mapping and Assembly with Qualities) aligner is freely available for download from http://maq.sourceforge.net.

Data access

Sequence data for NA18507 are freely available from the NCBI short read archive, accession SRA000271 (ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/SRA000271). X chromosome data are freely available from ERA, accession ERA000035. Links to Resembl displays for chromosome X and human data, plus information on other available data, are provided at http://www.illumina.com/HumanGenome.

See Supplementary Methods for a detailed Methods section.

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004)
2. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007)
3. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005)
4. Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728–1732 (2005)
5. Harris, T. D. et al. Single-molecule DNA sequencing of a viral genome. Science 320, 106–109 (2008)
6. Lundquist, P. M. et al. Parallel confocal detection of single molecules in real time. Opt. Lett. 33, 1026–1028 (2008)
7. Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008)
8. Milton, J. et al. Modified nucleotides. World Intellectual Property Organization WO/2004/018497. (2004)
9. Smith, G. P. et al. Modified polymerases for improved incorporation of nucleotide analogues. World Intellectual Property Organization WO/2005/024010. (2005)
10. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998)
11. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. doi: 10.1101/gr.078212.108 (25 September 2008)
12. The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001)
13. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nature Genet. 37, 727–732 (2005)
14. Korbel, J. O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007)
15. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008)
16. Hubbard, T. et al. The Ensembl genome database project. Nucleic Acids Res. 30, 38–41 (2002)
17. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005)
18. The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007)
19. The International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003)
20. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007)
21. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006)
22. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999)
23. Kidd, J. M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008)
24. Hillier, L. W. et al. Whole-genome sequencing and variant discovery in C. elegans. Nature Methods 5, 183–188 (2008)
25. Hodges, E. et al. Genome-wide in situ exon capture for selective resequencing. Nature Genet. 39, 1522–1527 (2007)
26. Porreca, G. J. et al. Multiplex amplification of large sets of human exons. Nature Methods 4, 931–936 (2007)
27. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007)
28. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007)
29. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
30. Boyle, A. P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008)
31. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008)
32. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 585–587 (2008)
33. Fedurco, M., Romieu, A., Williams, S., Lawrence, I. & Turcatti, G. BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res. 34, e22 (2006)

Acknowledgements

The authors acknowledge the advice of A. Williamson, T. Rink, S. Benkovic, J. Berriman, J. Todd, R. Waterston, S. Eletr, W. Jack, M. Cooper, T. Brown, C. Reece and R. Cook during this work; E. Margulies for assistance with data analysis; M. Shumway for assistance with data submission; and the contributions of the administrative and support staff at all the institutions. This research was supported in part by The Wellcome Trust (to H.L., A.Sc., K.W., N.P.C., B.N.L., J.R., M.E.H. and R.D.), the Biotechnology and Biological Sciences Research Council (BBSRC) (to S.B. and D.K.), the BBSRC Applied Genomics LINK Programme (to A.Sp. and C.L.B.) and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (to N.F.H. and J.C.M.). S. Balasubramanian and D. Klenerman are inventors and founders of Solexa Ltd.

Author information

Harold P. Swerdlow, John Milton, Clive G. Brown, Jane Rogers & Nick J. McCooke
Present addresses: The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK (H.P.S.); Oxford Nanopore Technologies, Begbroke Science Park, Sandy Lane, Kidlington OX5 1PF, UK (J.M., C.G.B.); BBSRC Genome Analysis Centre, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK (J.R.); Pronota, NV, VIB Bio-Incubator, Technologiepark 4, B-9052 Zwijnaarde/Ghent, Belgium (N.J.M.).

Authors and Affiliations

Illumina Cambridge Ltd. (Formerly Solexa Ltd), Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex CB10 1XL, UK
David R. Bentley, Harold P. Swerdlow, Geoffrey P. Smith, John Milton, Clive G. Brown, Kevin P. Hall, Dirk J. Evers, Colin L. Barnes, Helen R. Bignell, Jonathan M. Boutell, Jason Bryant, Richard J. Carter, R. Keira Cheetham, Anthony J. Cox, Darren J. Ellis, Niall A. Gormley, Sean J. Humphray, Leslie J. Irving, Xiaohai Liu, Klaus S. Maisinger, Lisa J. Murray, Bojan Obradovic, Tobias Ost, Michael L. Parkinson, Isabelle M. J. Rasolonjatovo, Roberto Rigatti, Chiara Rodighiero, Mark T. Ross, Andrea Sabot, Mark E. Smith, Vincent P. Smith, Anastassia Spiridou, Peta E. Torrance, Xiaolin Wu, Carole Anastasi, Ify C. Aniebo, David M. D. Bailey, Iain R. Bancarz, Selena G. Barbour, Vincent A. Benoit, Kevin F. Benson, Claire Bevis, Phillip J. Black, Asha Boodhun, Joe S. Brennan, Rob C. Brown, Andrew A. Brown, Abass A. Bundu, Maria Chiara E. Catenazzi, R. Neil Cooley, Natasha R. Crake, Olubunmi O. Dada, Konstantinos D. Diakoumakos, Belen Dominguez-Fernandez, David J. Earnshaw, Ugonna C. Egbujor, Louise J. Fraser, Karin V. Fuentes Fajardo, Colin P. Goddard, David E. Green, Kevin Harnish, Narinder I. Heyer, Matthew M. Hims, Adrian M. Horgan, Katya Hoschler, Terena James, T. A. Huw Jones, Gyoung-Dong Kang, Alan D. Kersey, Zoya Kingsbury, Paula I. Kokko-Gonzales, Anil Kumar, Sarah E. Lee, Jennifer A. Loch, Radhika M. Mammen, Patrick G. McCauley, Parul Mehta, Taksina Newington, Sonia M. Novo, Mark A. Osborne, Andrew Osnowski, Lea Pickering, Andrew C. Pike, Come Raczy, Vicki H. Rae, Stephen R. Rawlings, Ana Chiva Rodriguez, Phyllida M. Roe, John Rogers, Maria C. Rogert Bacigalupo, Nikolai Romanov, Natalie J. Rourke, Silke T. Ruediger, Raquel M. Sanches-Kuiper, Martin R. Schenker, Richard J. Shaw, Melanie A. Smith, Jean Ernest Sohna Sohna, Kim Stevens, Neil Sutton, Lukasz Szajkowski, Carolyn L. Tregidgo, Stephanie vandeVondele, Jingwen Wang, Graham J. Worsley, Nick J. McCooke & Anthony J. Smith
Department of Chemistry, University of Cambridge, The University Chemical Laboratory, Lensfield Road, Cambridge CB2 1EW, UK
Shankar Balasubramanian, Colin L. Barnes, Xiaohai Liu, David J. Earnshaw, W. Scott Furey, Mark A. Osborne & David Klenerman
Illumina Hayward (Formerly Solexa Inc.), 23851 Industrial Boulevard, Hayward, California 94343, USA
Michael R. Flatbush, Mirian S. Karbelashvili, Scott M. Kirk, Mark R. Pratt, Mark T. Reed, Subramanian V. Sankar, Gary P. Schroth, Svilen S. Tzonev, Eric H. Vermaas, Lu Zhang, Mohammed D. Alam, Saibal Banerjee, Primo A. Baybayan, John A. Bridgham, Dale H. Buermann, James C. Burrows, Nestor Castillo, Simon Chang, David W. Elmore, Sergey S. Etchin, Mark R. Ewan, David George, George S. Golda, Philip A. Granieri, David L. Gustafson, Christian D. Haudenschild, Johnny T. Ho, Steve Hurwitz, Denis V. Ivanov, Maria Q. Johnson, Tzvetana H. Kerelska, Irina Khrebtukova, Alex P. Kindwall, Xavier Lee, Arnold K. Liao, Mitch Lok, Shujun Luo, John W. Martin, Paul McNitt, Keith W. Moon, Joe W. Mullens, Michael J. O’Neill, Omead Ostadan, Lambros L. Paraschos, Alger C. Pike, D. Chris Pinkard, Daniel P. Pliskin, Joe Podhasky, Victor J. Quijano, Rithy K. Roth, Eli Rusman, Josefina M. Seoane, Mitch K. Shiver, Steven W. Short, Ning L. Sizto, Johannes P. Sluis, Eric J. Spence, Yuli Verhovsky, Selene M. Virk, Suzanne Wakelin, Gregory C. Walcott, Juying Yan, Ling Yau, Mike Zuerlein, John S. West, Frank L. Oaks & Peter L. Lundberg
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
Heng Li, Aylwyn Scally, Klaudia Walter, Nigel P. Carter, Zemin Ning, Bee Ling Ng, Jane Rogers, Matthew E. Hurles & Richard Durbin
Manteia Predictive Medicine S.A., Zone Industrielle, Coinsins, CH-1267, Switzerland
Milan Fedurco, Anthony Romieu & Gerardo Turcatti
Illumina Inc., Corporate Headquarters, 9883 Towne Centre Drive, San Diego, California 92121, USA
Kimberley J. Gietzen, Marc A. Laurent, Cynthia T. Lawley & Omead Ostadan
National Human Genome Research Institute, National Institutes of Health, 41 Center Drive, MSC 2132, 9000 Rockville Pike, Bethesda, Maryland 20892-2132, USA
Nancy F. Hansen & James C. Mullikin

Authors

David R. Bentley
Shankar Balasubramanian
Harold P. Swerdlow
Geoffrey P. Smith
John Milton
Clive G. Brown
Kevin P. Hall
Dirk J. Evers
Colin L. Barnes
Helen R. Bignell
Jonathan M. Boutell
Jason Bryant
Richard J. Carter
R. Keira Cheetham
Anthony J. Cox
Darren J. Ellis
Michael R. Flatbush
Niall A. Gormley
Sean J. Humphray
Leslie J. Irving
Mirian S. Karbelashvili
Scott M. Kirk
Heng Li
Xiaohai Liu
Klaus S. Maisinger
Lisa J. Murray
Bojan Obradovic
Tobias Ost
Michael L. Parkinson
Mark R. Pratt
Isabelle M. J. Rasolonjatovo
Mark T. Reed
Roberto Rigatti
Chiara Rodighiero
Mark T. Ross
Andrea Sabot
Subramanian V. Sankar
Aylwyn Scally
Gary P. Schroth
Mark E. Smith
Vincent P. Smith
Anastassia Spiridou
Peta E. Torrance
Svilen S. Tzonev
Eric H. Vermaas
Klaudia Walter
Xiaolin Wu
Lu Zhang
Mohammed D. Alam
Carole Anastasi
Ify C. Aniebo
David M. D. Bailey
Iain R. Bancarz
Saibal Banerjee
Selena G. Barbour
Primo A. Baybayan
Vincent A. Benoit
Kevin F. Benson
Claire Bevis
Phillip J. Black
Asha Boodhun
Joe S. Brennan
John A. Bridgham
Rob C. Brown
Andrew A. Brown
Dale H. Buermann
Abass A. Bundu
James C. Burrows
Nigel P. Carter
Nestor Castillo
Maria Chiara E. Catenazzi
Simon Chang
R. Neil Cooley
Natasha R. Crake
Olubunmi O. Dada
Konstantinos D. Diakoumakos
Belen Dominguez-Fernandez
David J. Earnshaw
Ugonna C. Egbujor
David W. Elmore
Sergey S. Etchin
Mark R. Ewan
Milan Fedurco
Louise J. Fraser
Karin V. Fuentes Fajardo
W. Scott Furey
David George
Kimberley J. Gietzen
Colin P. Goddard
George S. Golda
Philip A. Granieri
David E. Green
David L. Gustafson
Nancy F. Hansen
Kevin Harnish
Christian D. Haudenschild
Narinder I. Heyer
Matthew M. Hims
Johnny T. Ho
Adrian M. Horgan
Katya Hoschler
Steve Hurwitz
Denis V. Ivanov
Maria Q. Johnson
Terena James
T. A. Huw Jones
Gyoung-Dong Kang
Tzvetana H. Kerelska
Alan D. Kersey
Irina Khrebtukova
Alex P. Kindwall
Zoya Kingsbury
Paula I. Kokko-Gonzales
Anil Kumar
Marc A. Laurent
Cynthia T. Lawley
Sarah E. Lee
Xavier Lee
Arnold K. Liao
Jennifer A. Loch
Mitch Lok
Shujun Luo
Radhika M. Mammen
John W. Martin
Patrick G. McCauley
Paul McNitt
Parul Mehta
Keith W. Moon
Joe W. Mullens
Taksina Newington
Zemin Ning
Bee Ling Ng
Sonia M. Novo
Michael J. O’Neill
Mark A. Osborne
Andrew Osnowski
Omead Ostadan
Lambros L. Paraschos
Lea Pickering
Andrew C. Pike
Alger C. Pike
D. Chris Pinkard
Daniel P. Pliskin
Joe Podhasky
Victor J. Quijano
Come Raczy
Vicki H. Rae
Stephen R. Rawlings
Ana Chiva Rodriguez
Phyllida M. Roe
John Rogers
Maria C. Rogert Bacigalupo
Nikolai Romanov
Anthony Romieu
Rithy K. Roth
Natalie J. Rourke
Silke T. Ruediger
Eli Rusman
Raquel M. Sanches-Kuiper
Martin R. Schenker
Josefina M. Seoane
Richard J. Shaw
Mitch K. Shiver
Steven W. Short
Ning L. Sizto
Johannes P. Sluis
Melanie A. Smith
Jean Ernest Sohna Sohna
Eric J. Spence
Kim Stevens
Neil Sutton
Lukasz Szajkowski
Carolyn L. Tregidgo
Gerardo Turcatti
Stephanie vandeVondele
Yuli Verhovsky
Selene M. Virk
Suzanne Wakelin
Gregory C. Walcott
Jingwen Wang
Graham J. Worsley
Juying Yan
Ling Yau
Mike Zuerlein
Jane Rogers
James C. Mullikin
Matthew E. Hurles
Nick J. McCooke
John S. West
Frank L. Oaks
Peter L. Lundberg
David Klenerman
Richard Durbin
Anthony J. Smith

Corresponding author

Correspondence to David R. Bentley.

Ethics declarations

Competing interests

All authors at Illumina (see affiliations) are employees of Illumina Inc., a public company that develops and markets systems for genetic analysis.

Supplementary information

This file contains Supplementary Methods, Supplementary Figures S1-S21 with Legends and Supplementary Tables S1-S9. (PDF 2802 kb)

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence (http://creativecommons.org/licenses/by-nc-sa/3.0/), which permits distribution and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation, and derivative works must be licensed under the same or a similar licence.

About this article

Cite this article

Bentley, D., Balasubramanian, S., Swerdlow, H. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008). https://doi.org/10.1038/nature07517

Received: 24 June 2008
Accepted: 02 October 2008
Issue Date: 06 November 2008
DOI: https://doi.org/10.1038/nature07517
