SpeedSeq: ultra-fast personal genome analysis and interpretation

Chiang, Colby; Layer, Ryan M; Faust, Gregory G; Lindberg, Michael R; Rose, David B; Garrison, Erik P; Marth, Gabor T; Quinlan, Aaron R; Hall, Ira M

doi:10.1038/nmeth.3505

Brief Communication
Published: 10 August 2015

Colby Chiang^1,2,
Ryan M Layer^3,4,
Gregory G Faust^2 (ORCID: orcid.org/0000-0002-8233-9408),
Michael R Lindberg^2,
David B Rose^2,
Erik P Garrison^5,
Gabor T Marth^3,4,
Aaron R Quinlan^3,4 &
Ira M Hall^1,6

Nature Methods volume 12, pages 966–968 (2015)

Abstract

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.

**Figure 2: Case study in a tumor-normal pair.**

Accession codes

Primary accessions

European Nucleotide Archive

ERP001960

Acknowledgements

The authors thank A. Abyzov for helpful discussions about CNVnator. This work was supported by US National Institutes of Health (NIH) training grant T32 GM007267 (C.C.), NIH NHGRI grant R01HG006693 (A.R.Q.), and NIH NHGRI center grant U54 HG003079, NIH New Innovator Award DP2OD006493-01 and a Burroughs Wellcome Fund Career Award (I.M.H.).

Author information

Authors and Affiliations

McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
Colby Chiang & Ira M Hall
Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, USA
Colby Chiang, Gregory G Faust, Michael R Lindberg & David B Rose
Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, USA
Ryan M Layer, Gabor T Marth & Aaron R Quinlan
Utah Science Technology and Research (USTAR) Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, Utah, USA
Ryan M Layer, Gabor T Marth & Aaron R Quinlan
Wellcome Trust Sanger Institute, Hinxton, UK
Erik P Garrison
Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
Ira M Hall

Contributions

C.C. wrote SpeedSeq and analyzed the data. R.M.L. advised on LUMPY implementation, G.G.F. contributed SAMBLASTER features, M.R.L. assisted with cloud implementation and D.B.R. parallelized CNVnator. E.P.G. and G.T.M. advised on implementing FreeBayes. A.R.Q. contributed GEMINI features and advised on software design. I.M.H. conceived and designed the study. C.C. and I.M.H. wrote the manuscript.

Corresponding author

Correspondence to Ira M Hall.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Germline SNV detection performance

Receiver operating characteristic curves comparing the performance of three variant callers over the Omni microarray truth set (N=689,788).

Supplementary Figure 2 Somatic SNV detection performance of low frequency variants in a simulated tumor-normal pair

(a) Somatic variants in the simulated 50X tumor dataset (a mixture of 11 grandchildren from the CEPH 1463 pedigree) exhibit a range of variant allele frequencies in accordance with the expected binomial distribution. (b) Sensitivity and (c) precision over the range of variant allele frequencies at the quality thresholds of open circles in Fig. 2d.
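The binomial expectation mentioned in panel (a) can be illustrated with a short calculation. The following is a minimal sketch, not the authors' simulation code: it assumes the 11 grandchildren contribute equally to the mixture and considers a heterozygous variant private to a single grandchild, which gives an expected allele fraction of 1/22. Neither assumption is stated in the caption, and the function names are hypothetical.

```python
from math import comb
from typing import Dict


def binomial_pmf(k: int, depth: int, allele_fraction: float) -> float:
    """Probability of observing exactly k variant reads out of `depth` reads."""
    return comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)


def expected_vaf_distribution(depth: int, allele_fraction: float) -> Dict[float, float]:
    """Map each observable variant allele frequency (k / depth) to its probability."""
    return {k / depth: binomial_pmf(k, depth, allele_fraction) for k in range(depth + 1)}


# Assumptions for illustration only (not taken from the paper):
# equal mixing of 11 diploid genomes, variant heterozygous in exactly one of them.
DEPTH = 50
ALLELE_FRACTION = 1 / 22

dist = expected_vaf_distribution(DEPTH, ALLELE_FRACTION)

# Probability that no read at all carries the variant at this depth.
prob_unobserved = binomial_pmf(0, DEPTH, ALLELE_FRACTION)

# Mean of the observed VAF distribution; equals the true allele fraction.
mean_vaf = sum(vaf * prob for vaf, prob in dist.items())

print(f"P(no variant reads) = {prob_unobserved:.3f}")
print(f"mean observed VAF   = {mean_vaf:.4f}")
```

Under these assumptions, the spread of observed frequencies around the expected value comes purely from read sampling at a fixed depth; real data would add variation in local coverage and mixing proportions.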

Supplementary Figure 3 Structural variant validation by long-reads and 1000 Genomes Project data

SpeedSeq reported 6,696 structural variants (SVs) in the 50X NA12878 human dataset. The subsets of SVs with read-depth support from CNVnator (red) and with both paired-end and split-read support from LUMPY (blue) are displayed alongside the full set of reported variants (black) in each plot. Gray hashed lines denote the validation rate of 100 random permutations of the data. (a) Validation rate using deep (30X) long-read data from Pacific Biosciences or Illumina Moleculo technologies at different quality thresholds. (b) Validation rate of the subset of 3,438 deletions reported by SpeedSeq against deletions reported in the Pilot or Phase 1 callsets of the 1000 Genomes Project. (c) The number of SVs meeting each quality threshold and evidence type.

Supplementary Figure 4 CEPH 1463 family pedigree

Structure of the three-generation CEPH 1463 family pedigree used in evaluations of somatic variant detection, de novo variant detection, and structural variant detection.

Supplementary Figure 5 Construction of depth-based excluded regions and parallelization strategy

(a) Histogram of the aggregate coverage depth over the mappable genome for 17 whole genome datasets and one replicate sample from the Illumina Platinum Genomes project, with a red vertical line denoting the coverage depth threshold for exclusion from SpeedSeq analysis. (b) The depth-based binning strategy whereby 34,123 static regions containing approximately equal numbers of reads are processed with a parallel implementation of FreeBayes. A horizontal red line shows the maximum coverage depth for a region to be processed by SpeedSeq.
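The idea behind panel (b), splitting the genome into regions that hold roughly equal numbers of reads so that parallel workers finish at similar times, can be sketched as follows. This is an illustrative greedy partition under stated assumptions, not SpeedSeq's actual implementation: the per-window read counts, the window size, and the names `equal_read_regions` and `process_in_parallel` are all hypothetical, and the worker passed in stands in for a per-region variant-calling job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = Tuple[int, int]  # half-open interval [start, end)
T = TypeVar("T")


def equal_read_regions(
    window_counts: Sequence[int], window_size: int, n_regions: int
) -> List[Region]:
    """Greedily merge consecutive fixed-size windows into regions whose
    read counts are each close to total_reads / n_regions."""
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    target = sum(window_counts) / n_regions
    genome_end = len(window_counts) * window_size
    regions: List[Region] = []
    start, accumulated = 0, 0
    for i, count in enumerate(window_counts):
        accumulated += count
        end = (i + 1) * window_size
        # Close the region once it reaches the target, but leave the
        # remainder of the genome for the final region.
        if accumulated >= target and len(regions) < n_regions - 1:
            regions.append((start, end))
            start, accumulated = end, 0
    if start < genome_end:
        regions.append((start, genome_end))
    return regions


def process_in_parallel(
    regions: Sequence[Region], worker: Callable[[Region], T], max_workers: int = 4
) -> List[T]:
    """Apply `worker` to every region concurrently, preserving region order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, regions))


# Toy input: read counts for eight 100-bp windows (made-up numbers).
counts = [30, 10, 20, 20, 5, 35, 40, 0]
regions = equal_read_regions(counts, window_size=100, n_regions=4)

# Stand-in worker that only reports region length; a real pipeline would
# launch a variant caller on each region instead.
lengths = process_in_parallel(regions, lambda region: region[1] - region[0])
print(regions, lengths)
```

Because the regions are described in the caption as static, a partition like this would be computed once from aggregate coverage and then reused across samples, rather than recomputed for each dataset.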

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Tables 1–5 and Supplementary Notes 1–4 (PDF 440 kb)

Supplementary Software

SpeedSeq v0.0.3a (ZIP 16441 kb)

About this article

Cite this article

Chiang, C., Layer, R., Faust, G. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 12, 966–968 (2015). https://doi.org/10.1038/nmeth.3505

Received: 12 November 2014
Accepted: 28 May 2015
Published: 10 August 2015
Issue Date: October 2015
DOI: https://doi.org/10.1038/nmeth.3505
