Performance comparison of exome DNA sequencing technologies

Clark, Michael J; Chen, Rui; Lam, Hugo Y K; Karczewski, Konrad J; Chen, Rong; Euskirchen, Ghia; Butte, Atul J; Snyder, Michael

doi:10.1038/nbt.1975

Analysis
Published: 25 September 2011

Nature Biotechnology volume 29, pages 908–914 (2011)

Abstract

Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

**Figure 1: Exome enrichment designs include different biochemical methods, bait lengths, quantity and overlap of baits and number of bases targeted.**

**Figure 2: Efficiency trends by platform.**

**Figure 3: Off-target enrichment and GC bias.**

**Figure 5: Sensitivity toward indels compared between each platform at increasing read counts.**

**Figure 6: SNVs detected uniquely by exome sequencing or WGS, but not both.**

Accession codes

Sequence Read Archive: SRA040093

Acknowledgements

We thank P. LaCroute for assistance with data processing and analysis. We thank A. Boyle and Y. Cheng for consulting on data analysis and display methods. We thank representatives from Agilent, Illumina and Nimblegen for their support and feedback as we performed these tests. We also thank the Hewlett Packard Foundation and Lucile Packard Foundation for Children's Health for supporting the creation of our disease/trait SNP database. This work was supported by the US National Institutes of Health grant no. HG002357.

Author information

Michael J Clark and Rui Chen: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
Michael J Clark, Rui Chen, Hugo Y K Lam, Konrad J Karczewski, Ghia Euskirchen & Michael Snyder
Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, USA
Rong Chen & Atul J Butte
Center for Genomics and Personalized Medicine, Stanford University, Stanford, California, USA
Ghia Euskirchen & Michael Snyder

Contributions

M.S. and R.C. conceived and planned the study. R.C. performed the experiments. G.E. provided sequencing services. M.J.C. conducted the data analysis. R.C. and M.S. both contributed to the data analysis and discussion. H.Y.K.L. and M.J.C. analyzed the whole genome data. K.J.K., R.C. and A.J.B. created the disease/trait SNP database and analyzed our data against it. M.J.C., R.C. and M.S. prepared the manuscript.

Corresponding author

Correspondence to Michael Snyder.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Clark, M., Chen, R., Lam, H. et al. Performance comparison of exome DNA sequencing technologies. Nat Biotechnol 29, 908–914 (2011). https://doi.org/10.1038/nbt.1975

Received: 16 May 2011
Accepted: 18 August 2011
Published: 25 September 2011
Issue Date: October 2011
DOI: https://doi.org/10.1038/nbt.1975
