Abstract
BLAST is arguably the single most important piece of software ever written for the biological sciences. It is the core of most bioinformatics workflows, being a critical component of genome homology searches and annotation. It has influenced the landscape of biology by aiding in everything from functional characterization of genes to pathogen detection to the development of novel vaccines. While BLAST is very popular, it is also often one of the most computationally intensive parts of bioinformatics analysis. In our workflows, BLAST typically takes the majority of CPU time, and we need to parallelize to finish in a reasonable time frame. Waiting for BLAST to finish without any clue of how long it's going to take is kind of depressing, and you could waste a day of work trying to run a job that would never finish. If you feel the same way we do, then check out Cunningham, a tool we designed to estimate BLAST runtimes for shotgun sequence datasets using sequence composition statistics. We've trained its models on real metagenomic sequence data using the Amazon EC2 cloud, and it will provide a relatively quick estimate for datasets with up to tens of millions of sequences. It's not perfect, but it'll give you at least some idea of expected runtime, how large a cluster you're going to need, how much you'll need to partition your data, etc. We use it all the time now, so we hope it'll be useful to someone else out there. Cunningham has been implemented in CloVR for efficient autoscaling in the cloud and is freely available at http://clovr.org.
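To make the idea concrete, here is a minimal sketch of runtime estimation from sequence composition statistics, in the spirit of what the abstract describes. The feature set (sequence count, total residues, mean length, GC fraction) and the linear model coefficients below are illustrative assumptions, not Cunningham's published model; a real estimator would fit its coefficients on benchmark BLAST runs.

```python
# Hypothetical sketch: estimate BLAST runtime from sequence composition
# statistics. All coefficients are made up for illustration and are NOT
# the model used by Cunningham.

def composition_features(sequences):
    """Summarize a shotgun dataset: sequence count, total residues,
    mean length, and overall GC fraction."""
    n = len(sequences)
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {
        "n_seqs": n,
        "total_residues": total,
        "mean_length": total / n if n else 0.0,
        "gc_fraction": gc / total if total else 0.0,
    }

def estimate_runtime_seconds(features, coef=None):
    """Linear model: runtime ~ intercept + sum(coef_i * feature_i).
    Coefficients here are placeholders; in practice they would be
    fitted against timed BLAST jobs on known hardware."""
    coef = coef or {"intercept": 5.0, "total_residues": 0.002, "n_seqs": 0.01}
    est = coef["intercept"]
    est += coef["total_residues"] * features["total_residues"]
    est += coef["n_seqs"] * features["n_seqs"]
    return est

# Toy usage: three short reads.
reads = ["ACGTACGTGG", "TTGCAACGTT", "GGGCCCATAT"]
feats = composition_features(reads)
print(round(estimate_runtime_seconds(feats), 3))
```

An estimate like this is what lets a pipeline decide, before launching anything, how many nodes to request and how finely to partition the input.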
Cite this article
White, J., Matalka, M., Fricke, W. et al. Cunningham: a BLAST Runtime Estimator. Nat Prec (2011). https://doi.org/10.1038/npre.2011.5593.1