
  • Correspondence

Toil enables reproducible, open source, big biomedical data analyses


Figure 1: RNA-seq pipeline and expression concordance.
Figure 2: Costs and core usage.




This work was supported by (BD2K) the National Human Genome Research Institute of the National Institutes of Health award no. 5U54HG007990 and (Cloud Pilot) the National Cancer Institute of the National Institutes of Health under the Broad Institute subaward no. 5417071-5500000716. The UCSC Genome Browser work was supported by the NHGRI award 5U41HG002371 (Corporate Sponsors). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or our corporate sponsors.

Author information


Corresponding author

Correspondence to Benedict Paten.

Ethics declarations

Competing interests

The authors received support from AWS, Microsoft, and Google.

Supplementary information

Supplementary Figures and Texts

Supplementary Notes 1–9, Supplementary Figures 1–4 and Supplementary Table 1 (PDF 2647 kb)


About this article


Cite this article

Vivian, J., Rao, A., Nothaft, F. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35, 314–316 (2017).
