Performance comparison of exome DNA sequencing technologies

Abstract

Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

Figure 1: Exome enrichment designs include different biochemical methods, bait lengths, quantity and overlap of baits and number of bases targeted.
Figure 2: Efficiency trends by platform.
Figure 3: Off-target enrichment and GC bias.
Figure 4: SNV trends by platform.
Figure 5: Sensitivity toward indels compared between each platform at increasing read counts.
Figure 6: SNVs detected uniquely by exome sequencing or WGS, but not both.

Accession codes

Sequence Read Archive

Acknowledgements

We thank P. LaCroute for assistance with data processing and analysis, and A. Boyle and Y. Cheng for consulting on data analysis and display methods. We thank representatives from Agilent, Illumina and Nimblegen for their support and feedback as we performed these tests. We also thank the Hewlett Packard Foundation and Lucile Packard Foundation for Children's Health for supporting the creation of our disease/trait SNP database. This work was supported by the US National Institutes of Health grant no. HG002357.

Author information

Contributions

M.S. and R.C. conceived and planned the study. R.C. performed the experiments. G.E. provided sequencing services. M.J.C. conducted the data analysis. R.C. and M.S. both contributed to the data analysis and discussion. H.Y.K.L. and M.J.C. analyzed the whole genome data. K.J.K., R.C. and A.J.B. created the disease/trait SNP database and analyzed our data against it. M.J.C., R.C. and M.S. prepared the manuscript.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–5 and Supplementary Figures 1–3 (PDF 1279 kb)

Supplementary Data 1.

SNVs detected by exome-seq with three platforms. (ZIP 5966 kb)

Supplementary Data 2.

Indels detected by exome-seq with three platforms. (ZIP 761 kb)

About this article

Cite this article

Clark, M., Chen, R., Lam, H. et al. Performance comparison of exome DNA sequencing technologies. Nat Biotechnol 29, 908–914 (2011). https://doi.org/10.1038/nbt.1975

  • DOI: https://doi.org/10.1038/nbt.1975
