SpeedSeq: ultra-fast personal genome analysis and interpretation

Abstract

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.

Figure 1: SpeedSeq workflow.
Figure 2: Case study in a tumor-normal pair.

Accession codes

Primary accessions

European Nucleotide Archive

Acknowledgements

The authors thank A. Abyzov for helpful discussions about CNVnator. This work was supported by US National Institutes of Health (NIH) training grant T32 GM007267 (C.C.), NIH NHGRI grant R01HG006693 (A.R.Q.), and NIH NHGRI center grant U54 HG003079, NIH New Innovator Award DP2OD006493-01 and a Burroughs Wellcome Fund Career Award (I.M.H.).

Author information

Contributions

C.C. wrote SpeedSeq and analyzed the data. R.M.L. advised on LUMPY implementation, G.G.F. contributed SAMBLASTER features, M.R.L. assisted with cloud implementation and D.B.R. parallelized CNVnator. E.P.G. and G.T.M. advised on implementing FreeBayes. A.R.Q. contributed GEMINI features and advised on software design. I.M.H. conceived and designed the study. C.C. and I.M.H. wrote the manuscript.

Corresponding author

Correspondence to Ira M Hall.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Germline SNV detection performance

Receiver operating characteristic curves comparing the performance of three variant callers over the Omni microarray truth set (N=689,788).
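As a rough illustration of how such a curve can be traced, the sketch below ranks calls by quality score and records one point per distinct threshold. It assumes every truth-set site is labeled as variant or non-variant and that each call carries a single quality score; the function name `roc_points` and its arguments are hypothetical and are not part of SpeedSeq.

```python
from typing import List, Tuple


def roc_points(
    calls: List[Tuple[float, bool]], n_true: int, n_false: int
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    calls   -- (quality, is_true_variant) for every call at a truth-set site
    n_true  -- number of variant sites in the truth set (must be > 0)
    n_false -- number of non-variant sites in the truth set (must be > 0)
    """
    # Highest quality first, so each step lowers the quality threshold.
    ranked = sorted(calls, key=lambda c: c[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (qual, is_true) in enumerate(ranked):
        if is_true:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last call sharing this quality value.
        if i + 1 == len(ranked) or ranked[i + 1][0] != qual:
            points.append((fp / n_false, tp / n_true))
    return points


if __name__ == "__main__":
    toy_calls = [(50, True), (40, True), (40, False), (10, False)]
    for fpr, tpr in roc_points(toy_calls, n_true=4, n_false=4):
        print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Calls tied at the same quality are grouped into a single point so that the curve does not depend on the order of tied calls.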

Supplementary Figure 2 Somatic SNV detection performance of low frequency variants in a simulated tumor-normal pair

(a) Somatic variants in the simulated 50X tumor dataset (a mixture of 11 grandchildren from the CEPH 1463 pedigree) exhibit a range of variant allele frequencies in accordance with the expected binomial distribution. (b) Sensitivity and (c) precision over the range of variant allele frequencies at the quality thresholds of open circles in Fig. 2d.
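The binomial expectation in panel (a) can be sketched numerically. The example below assumes, purely for illustration, that the 11 genomes contribute equally to the mixture and that a simulated somatic variant is heterozygous in some number of them, so a variant carried by one grandchild has an expected allele frequency of 1/22. These mixing assumptions and the function names are hypothetical and are not taken from the paper.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k alternate reads out of n at allele fraction p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_vaf(carriers: int, genomes: int = 11) -> float:
    """Expected allele frequency for a variant heterozygous in `carriers`
    of `genomes` equally mixed diploid genomes (an assumed mixing model)."""
    return carriers / (2 * genomes)


def prob_at_least(min_alt: int, depth: int, p: float) -> float:
    """Probability of observing at least min_alt alternate reads at this depth."""
    return sum(binom_pmf(k, depth, p) for k in range(min_alt, depth + 1))


if __name__ == "__main__":
    depth = 50  # nominal coverage of the simulated tumor
    for carriers in (1, 3, 6, 11):
        p = expected_vaf(carriers)
        print(f"carriers={carriers:2d} expected VAF={p:.3f} "
              f"P(>=3 alt reads)={prob_at_least(3, depth, p):.3f}")
```

Under this model the observed allele frequency at a site is the alternate read count divided by the local depth, so low-frequency variants are often represented by only a handful of supporting reads.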

Supplementary Figure 3 Structural variant validation by long-reads and 1000 Genomes Project data

SpeedSeq reported 6,696 structural variants (SVs) in the 50X NA12878 human dataset. The subsets of SVs with read-depth support from CNVnator (red) and with both paired-end and split-read support from LUMPY (blue) are displayed alongside the full set of reported variants (black) in each plot. Gray hashed lines denote the validation rate of 100 random permutations of the data. (a) Validation rate using deep (30X) long-read data from Pacific Biosciences or Illumina Moleculo technologies at different quality thresholds. (b) Validation rate of the subset of 3,438 deletions reported by SpeedSeq against deletions reported in the Pilot or Phase 1 callsets of the 1000 Genomes Project. (c) The number of SVs meeting each quality threshold and evidence type.
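The comparison against random permutations can be illustrated with a small sketch. It assumes that a call counts as validated when it overlaps any truth interval and that a permutation moves each call to a uniformly random position on the same chromosome while preserving its length. Both choices are assumptions made for illustration, since the legend does not specify the overlap criterion or the permutation scheme, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def validation_rate(calls: List[Interval], truth: List[Interval]) -> float:
    """Fraction of calls that overlap at least one truth interval."""
    if not calls:
        return 0.0
    hits = sum(any(overlaps(c, t) for t in truth) for c in calls)
    return hits / len(calls)


def permuted_rates(
    calls: List[Interval],
    truth: List[Interval],
    chrom_len: int,
    n_perm: int = 100,
    seed: int = 0,
) -> List[float]:
    """Validation rates after randomly relocating every call n_perm times.

    Every call is assumed to be no longer than chrom_len.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        shuffled = []
        for start, end in calls:
            length = end - start
            new_start = rng.randint(0, chrom_len - length)
            shuffled.append((new_start, new_start + length))
        rates.append(validation_rate(shuffled, truth))
    return rates


if __name__ == "__main__":
    calls = [(100, 200), (500, 600), (900, 950)]
    truth = [(150, 250), (580, 590)]
    observed = validation_rate(calls, truth)
    null = permuted_rates(calls, truth, chrom_len=10_000)
    print(f"observed={observed:.2f} mean permuted={sum(null) / len(null):.2f}")
```

An observed rate well above the permuted rates suggests that the overlap with the truth set is not explained by chance placement alone.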

Supplementary Figure 4 CEPH 1463 family pedigree

Structure of the three-generation CEPH 1463 family pedigree used in evaluations of somatic variant detection, de novo variant detection, and structural variant detection.

Supplementary Figure 5 Construction of depth-based excluded regions and parallelization strategy

(a) Histogram of the aggregate coverage depth over the mappable genome for 17 whole genome datasets and one replicate sample from the Illumina Platinum Genomes project, with a red vertical line denoting the coverage depth threshold for exclusion from SpeedSeq analysis. (b) The depth-based binning strategy whereby 34,123 static regions containing approximately equal numbers of reads are processed with a parallel implementation of FreeBayes. A horizontal red line shows the maximum coverage depth for a region to be processed by SpeedSeq.
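One way to picture the binning strategy in panel (b) is the sketch below, which walks along fixed-size windows, skips windows whose read count exceeds an exclusion threshold, and closes a region once it has accumulated a target number of reads. The window-based input, the thresholds and the function name `equal_read_regions` are illustrative assumptions, not SpeedSeq's actual implementation.

```python
from typing import List, Tuple


def equal_read_regions(
    window_reads: List[int],
    window_size: int,
    target_reads: int,
    max_reads_per_window: int,
) -> List[Tuple[int, int]]:
    """Group consecutive windows into half-open regions of roughly equal read count.

    Windows with more than max_reads_per_window reads are excluded and
    also terminate any region that is currently open.
    """
    regions = []
    start = None  # start coordinate of the region being built
    acc = 0       # reads accumulated in the region being built
    for i, n in enumerate(window_reads):
        pos = i * window_size
        if n > max_reads_per_window:
            # Excluded window: close the open region just before it.
            if start is not None:
                regions.append((start, pos))
                start, acc = None, 0
            continue
        if start is None:
            start = pos
        acc += n
        if acc >= target_reads:
            regions.append((start, pos + window_size))
            start, acc = None, 0
    if start is not None:
        # Flush the final, possibly under-filled, region.
        regions.append((start, len(window_reads) * window_size))
    return regions


if __name__ == "__main__":
    reads = [10, 10, 10, 500, 5, 5, 20]
    print(equal_read_regions(reads, window_size=100, target_reads=20,
                             max_reads_per_window=100))
```

Because each region holds a similar number of reads, the regions can be handed to parallel workers with roughly balanced run times, which is the motivation the legend gives for binning by depth rather than by length.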

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1–5 and Supplementary Notes 1–4 (PDF 440 kb)

Supplementary Software

SpeedSeq v0.0.3a (ZIP 16441 kb)

About this article

Cite this article

Chiang, C., Layer, R., Faust, G. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 12, 966–968 (2015). https://doi.org/10.1038/nmeth.3505

  • DOI: https://doi.org/10.1038/nmeth.3505
