  • Commentary

Cloud computing and the DNA data race

Given the accumulation of DNA sequence data sets at ever-faster rates, what are the key factors you should consider when using distributed and multicore computing systems for analysis?

Figure 1: Map-shuffle-scan framework used by Crossbow.
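
As a rough illustration of the map-shuffle-scan pattern named in the caption, the toy sketch below runs the three stages in a single process. It is not Crossbow's implementation: the exact-match "aligner", the bin width, the per-position depth tally, and all function names are assumptions made for this example only.

```python
from collections import defaultdict

BIN_SIZE = 4  # hypothetical partition width, in bases (assumption)


def map_reads(reads, reference):
    """Map: handle each read independently; emit (bin, (position, read))."""
    for read in reads:
        pos = reference.find(read)  # toy exact-match stand-in for an aligner
        if pos >= 0:  # unaligned reads are dropped
            yield pos // BIN_SIZE, (pos, read)


def shuffle(pairs):
    """Shuffle: group records by bin key and sort each group by position."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return {key: sorted(values) for key, values in sorted(bins.items())}


def scan(bins):
    """Scan: walk each bin in sorted order and tally depth per position."""
    depth = defaultdict(int)
    for alignments in bins.values():
        for pos, read in alignments:
            for offset in range(len(read)):
                depth[pos + offset] += 1
    return dict(depth)


reference = "ACGTACGGTCA"
reads = ["ACGT", "GTAC", "GGTC", "TTTT"]
binned = shuffle(map_reads(reads, reference))
coverage = scan(binned)
```

In a distributed setting, the map and scan stages could each run in parallel across machines, with the shuffle moving records between them; here everything runs sequentially to keep the data flow visible.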



The authors were supported in part by US National Science Foundation grant IIS-0844494 and by US National Institutes of Health grant R01-LM006845.

Author information


Corresponding author

Correspondence to Michael C Schatz.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Schatz, M., Langmead, B. & Salzberg, S. Cloud computing and the DNA data race. Nat Biotechnol 28, 691–693 (2010).
