Performance comparison of whole-genome sequencing platforms

Journal name: Nature Biotechnology
Volume: 30
Pages: 78–82
DOI: 10.1038/nbt.2065

Abstract

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ~76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ~3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.
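The concordance and sensitivity figures quoted above can be illustrated with a minimal sketch. It assumes each call set is reduced to a set of (chromosome, position, reference allele, alternate allele) records, that concordance is the share of the union of calls found by both platforms, and that sensitivity is the share of array-confirmed variants recovered by sequencing. The names `SNV`, `concordance` and `sensitivity` are hypothetical and are not taken from the study's pipeline.

```python
from typing import NamedTuple


class SNV(NamedTuple):
    """One single-nucleotide variant call (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a: set[SNV], calls_b: set[SNV]) -> dict[str, float]:
    """Intersect two call sets and summarise shared and platform-specific calls."""
    union = calls_a | calls_b
    shared = calls_a & calls_b
    pct = 100.0 * len(shared) / len(union) if union else 0.0
    return {
        "union": len(union),                  # unique calls across both platforms
        "concordant": len(shared),            # calls made by both platforms
        "a_specific": len(calls_a - calls_b),
        "b_specific": len(calls_b - calls_a),
        "concordance_pct": pct,               # assumption: shared / union
    }


def sensitivity(calls: set[SNV], array_variants: set[SNV]) -> float:
    """Fraction of array-confirmed variants that the sequencing calls recover."""
    if not array_variants:
        raise ValueError("array_variants must not be empty")
    return len(calls & array_variants) / len(array_variants)
```

Keying each call on position and alleles means that two platforms reporting different alternate alleles at the same site count as discordant; a looser, position-only comparison would be a different design choice.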

Figures

  1. Genome coverage at different read depths.

    (a) Percentage of genome covered by different read depths in different platforms. (b) Histogram of genome coverage at different read depths.

  2. SNV detection and intersection.

    (a) SNVs detected from the PBMC and saliva samples in each platform were combined. The unions of SNVs in each platform were then intersected. Sensitivity was measured against the Illumina Omni array. Ti/Tv is the transition-to-transversion ratio. The known and novel counts were based on dbSNP. 'Sanger' and 'validated' represent validation by Sanger sequencing and Illumina sequencing (with Agilent target enrichment capture), respectively. (b) Comparing platform-specific SNVs to non-SNV calls in another platform. IL, Illumina; CG, Complete Genomics.

  3. SNV association with different genomic elements.

    (a) Gene elements: UTR, exonic, intronic and intergenic regions. Inset: number of SNVs associated with UTR5, UTR3 and exonic regions. (b) Gene elements: splicing sites, noncoding RNA and upstream/downstream (<1 kb) regions of genes. (c) Repetitive elements: centromere, telomere, tRNA and rRNA. (d) Repetitive elements: L1, Alu, simple repeat and low-complexity repeat. (e) SNV frequency at different chromosomal locations. Tracks from outer to inner: SNV frequency for Illumina (IL), Complete Genomics (CG), concordant, IL-specific and CG-specific calls. Outermost: chromosome ideogram.

  4. Indel detection and intersection.

    (a) Indels detected from the PBMC and saliva samples in each platform were combined. The unions of indels in each platform were then intersected. Note: 5,668 IL and 8,415 CG indels were removed after 5-bp window merging. (b) Indel size distribution. Negative sizes represent deletions and positive sizes represent insertions.
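The window merging mentioned in the Figure 4 caption is not described on this page, so the following is only a sketch of one plausible reading: indel calls on the same chromosome that start within 5 bases of an already kept call are collapsed into that call. The `Indel` record, the `merge_within_window` function and the rule of keeping the leftmost call are all assumptions made for illustration. The sign convention for `size` follows the caption (negative for deletions, positive for insertions).

```python
from typing import Iterable, NamedTuple


class Indel(NamedTuple):
    """One indel call (hypothetical record layout)."""
    chrom: str
    pos: int
    size: int  # negative = deletion, positive = insertion


def merge_within_window(indels: Iterable[Indel], window: int = 5) -> list[Indel]:
    """Keep the leftmost call of each cluster; drop calls within `window` bases of it."""
    kept: list[Indel] = []
    # Sorting by (chrom, pos, size) puts nearby calls next to each other.
    for indel in sorted(indels):
        last = kept[-1] if kept else None
        if (
            last is not None
            and last.chrom == indel.chrom
            and indel.pos - last.pos <= window
        ):
            continue  # assumption: treated as the same event as the kept call
        kept.append(indel)
    return kept
```

Distances are measured from the last kept call rather than the last seen call, so a long chain of closely spaced indels is not collapsed into a single event.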

Accession codes

Referenced accessions

Sequence Read Archive: SRA045736

Change history

Corrected online 07 June 2012
In the version of this article initially published, the accession code to obtain raw sequence data was given as SRA045736.2; the correct code is SRA045736. The error has been corrected in the HTML and PDF versions of the article.

References

  1. Ajay, S.S., Parker, S.C., Ozel Abaan, H., Fuentes Fajardo, K.V. & Margulies, E.H. Accurate and comprehensive sequencing of personal genomes. Genome Res. 21, 1498–1505 (2011).
  2. Ashley, E.A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525–1535 (2010).
  3. Wheeler, D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008).
  4. McKernan, K.J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 19, 1527–1541 (2009).
  5. Roach, J.C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
  6. Pushkarev, D., Neff, N. & Quake, S. Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847–852 (2009).
  7. Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).
  8. Snyder, M., Du, J. & Gerstein, M. Personal genome sequencing: current approaches and challenges. Genes Dev. 24, 423–431 (2010).
  9. Rios, J., Stein, E., Shendure, J., Hobbs, H.H. & Cohen, J.C. Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia. Hum. Mol. Genet. 19, 4313–4318 (2010).
  10. Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473–477 (2010).
  11. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  12. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  13. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  14. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  15. Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
  16. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
  17. Chen, R., Davydov, E.V., Sirota, M. & Butte, A.J. Non-synonymous and synonymous coding SNPs show similar likelihood and effect size of human disease association. PLoS ONE 5, e13574 (2010).
  18. Kaur, I. et al. Variants in the 10q26 gene cluster (LOC387715 and HTRA1) exhibit enhanced risk of age-related macular degeneration along with CFH in Indian patients. Invest. Ophthalmol. Vis. Sci. 49, 1771–1776 (2008).
  19. Tam, P.O. et al. HTRA1 variants in exudative age-related macular degeneration and interactions with smoking and CFH. Invest. Ophthalmol. Vis. Sci. 49, 2357–2365 (2008).
  20. Yamaguchi, H. et al. Mutations in TERT, the gene for telomerase reverse transcriptase, in aplastic anemia. N. Engl. J. Med. 352, 1413–1424 (2005).
  21. Albers, C.A. et al. Dindel: Accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
  22. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  23. Clark, M.J. et al. Performance comparison of exome DNA sequencing technologies. Nat. Biotechnol. 29, 908–914 (2011).

Author information

Affiliations

  1. Department of Genetics, Stanford University, Stanford, California, USA.

    • Hugo Y K Lam,
    • Michael J Clark,
    • Rui Chen,
    • Maeve O'Huallachain &
    • Michael Snyder
  2. Division of Systems Medicine, Department of Pediatrics, Stanford University, Stanford, California, USA.

    • Rong Chen &
    • Atul J Butte
  3. Department of Medicine, Stanford University, Stanford, California, USA.

    • Georges Natsoulis &
    • Hanlee P Ji
  4. Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, USA.

    • Frederick E Dewey &
    • Euan A Ashley
  5. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.

    • Lukas Habegger &
    • Mark B Gerstein
  6. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA.

    • Mark B Gerstein
  7. Department of Computer Science, Yale University, New Haven, Connecticut, USA.

    • Mark B Gerstein
  8. Present address: Personalis, Inc., Palo Alto, California, USA.

    • Hugo Y K Lam &
    • Rong Chen

Contributions

H.Y.K.L. and M.J.C. did the analysis. G.N. and L.H. assisted in the analysis. Rui C. did DNA sequencing. Rong C. did the disease-association study. Rui C. and M.O'H. did the validation experiments. H.Y.K.L., F.E.D., E.A.A., M.B.G., A.J.B., H.P.J. and M.S. coordinated the analysis and revised the manuscript. H.Y.K.L., M.J.C. and M.S. wrote the manuscript.

Competing financial interests

M.S. is a scientific advisory board member for Genapsys, Inc.; a scientific advisory board member and cofounder of Personalis, Inc.; and a consultant for Illumina.

Supplementary information

PDF files

  1. Supplementary Text and Figures (1 MB)

    Supplementary Figures 1 and 2

Excel files

  1. Supplementary Table 1 (45 KB)

    Disease association of all platform-specific SNPs.