Abstract
SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
Acknowledgements
The authors thank A. Abyzov for helpful discussions about CNVnator. This work was supported by US National Institutes of Health (NIH) training grant T32 GM007267 (C.C.), NIH NHGRI grant R01HG006693 (A.R.Q.), and NIH NHGRI center grant U54 HG003079, NIH New Innovator Award DP2OD006493-01 and a Burroughs Wellcome Fund Career Award (I.M.H.).
Author information
Contributions
C.C. wrote SpeedSeq and analyzed the data. R.M.L. advised on LUMPY implementation, G.G.F. contributed SAMBLASTER features, M.R.L. assisted with cloud implementation and D.B.R. parallelized CNVnator. E.P.G. and G.T.M. advised on implementing FreeBayes. A.R.Q. contributed GEMINI features and advised on software design. I.M.H. conceived and designed the study. C.C. and I.M.H. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Germline SNV detection performance
Receiver operating characteristic curves comparing the performance of three variant callers over the Omni microarray truth set (N=689,788).
Supplementary Figure 2 Somatic SNV detection performance for low-frequency variants in a simulated tumor-normal pair
(a) Somatic variants in the simulated 50X tumor dataset (a mixture of 11 grandchildren from the CEPH 1463 pedigree) exhibit a range of variant allele frequencies in accordance with the expected binomial distribution. (b) Sensitivity and (c) precision over the range of variant allele frequencies at the quality thresholds of open circles in Fig. 2d.
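As an illustration of the binomial expectation mentioned in panel (a), the following minimal sketch computes the distribution of observed variant allele frequencies at a single site. It assumes, for illustration only, that the 11 diploid genomes are mixed in equal proportions and that the variant is heterozygous in exactly one of them, giving a true allele fraction of 1/22; neither assumption is stated in the caption. The function names are hypothetical and are not part of SpeedSeq.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of observing k variant reads among n reads at allele fraction p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def vaf_distribution(depth, true_fraction):
    """Return (observed VAF, probability) pairs for every possible variant read count."""
    return [
        (k / depth, binomial_pmf(k, depth, true_fraction))
        for k in range(depth + 1)
    ]


# Assumed scenario: 50 reads at the site, variant carried on 1 of 22 haplotypes.
distribution = vaf_distribution(depth=50, true_fraction=1 / 22)
prob_unobserved = distribution[0][1]  # chance that no read carries the variant
```

Under these assumptions, `prob_unobserved` is the probability that the variant is absent from all 50 reads, which bounds the sensitivity any caller can achieve at that allele fraction.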
Supplementary Figure 3 Structural variant validation by long reads and 1000 Genomes Project data
SpeedSeq reported 6,696 structural variants (SVs) in the 50X NA12878 human dataset. The subsets of SVs with read-depth support from CNVnator (red) and with both paired-end and split-read support from LUMPY (blue) are displayed alongside the full set of reported variants (black) in each plot. Gray hashed lines denote the validation rate of 100 random permutations of the data. (a) Validation rate using deep (30X) long-read data from Pacific Biosciences or Illumina Moleculo technologies at different quality thresholds. (b) Validation rate of the subset of 3,438 deletions reported by SpeedSeq against deletions reported in the Pilot or Phase 1 callsets of the 1000 Genomes Project. (c) The number of SVs meeting each quality threshold and evidence type.
Supplementary Figure 4 CEPH 1463 family pedigree
Structure of the three-generation CEPH 1463 family pedigree used in evaluations of somatic variant detection, de novo variant detection, and structural variant detection.
Supplementary Figure 5 Construction of depth-based excluded regions and parallelization strategy
(a) Histogram of the aggregate coverage depth over the mappable genome for 17 whole-genome datasets and one replicate sample from the Illumina Platinum Genomes project, with a red vertical line denoting the coverage depth threshold for exclusion from SpeedSeq analysis. (b) The depth-based binning strategy whereby 34,123 static regions containing approximately equal numbers of reads are processed with a parallel implementation of FreeBayes. A horizontal red line shows the maximum coverage depth for a region to be processed by SpeedSeq.
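The binning idea in panel (b) can be sketched as follows. This is a simplified illustration, not the SpeedSeq implementation: it assumes the genome has already been summarized as read counts in fixed-size windows, drops windows whose count exceeds a threshold, and greedily groups the remaining windows into regions holding roughly equal numbers of reads. The function name, parameters and window representation are hypothetical.

```python
def equal_read_regions(window_counts, n_regions, max_count):
    """Group window indices into regions with roughly equal read totals.

    Windows with more than max_count reads are excluded before binning,
    so a region may skip over an excluded window.
    """
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")

    # Keep only windows at or below the depth threshold.
    kept = [(i, c) for i, c in enumerate(window_counts) if c <= max_count]
    total = sum(c for _, c in kept)
    target = total / n_regions  # ideal number of reads per region

    regions, current, running = [], [], 0
    for index, count in kept:
        current.append(index)
        running += count
        # Close the region once cumulative reads reach the next multiple
        # of the target, leaving the final region to absorb the remainder.
        if running >= target * (len(regions) + 1) and len(regions) < n_regions - 1:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions


# Toy input: nine windows, one of which (500 reads) exceeds the threshold.
regions = equal_read_regions([10, 10, 500, 5, 5, 10, 20, 0, 20], n_regions=2, max_count=100)
```

Each resulting region could then be handed to a separate worker process, so that workers finish at about the same time rather than waiting on a few read-dense regions.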
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5, Supplementary Tables 1–5 and Supplementary Notes 1–4 (PDF 440 kb)
Supplementary Software
SpeedSeq v0.0.3a (ZIP 16441 kb)
About this article
Cite this article
Chiang, C., Layer, R., Faust, G. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 12, 966–968 (2015). https://doi.org/10.1038/nmeth.3505
DOI: https://doi.org/10.1038/nmeth.3505