Toil enables reproducible, open source, big biomedical data analyses

Figure 1: RNA-seq pipeline and expression concordance.
Figure 2: Costs and core usage.

Acknowledgements

This work was supported by the National Human Genome Research Institute of the National Institutes of Health under award no. 5U54HG007990 (BD2K) and by the National Cancer Institute of the National Institutes of Health under Broad Institute subaward no. 5417071-5500000716 (Cloud Pilot). The UCSC Genome Browser work was supported by NHGRI award 5U41HG002371 and by corporate sponsors. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or our corporate sponsors.

Author information

Correspondence to Benedict Paten.

Ethics declarations

Competing interests

The authors received support from AWS, Microsoft, and Google.

Supplementary information

Supplementary Figures and Texts

Supplementary Notes 1–9, Supplementary Figures 1–4, Supplementary Table 1 (PDF 2647 kb)

About this article

Cite this article

Vivian, J., Rao, A., Nothaft, F. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35, 314–316 (2017). https://doi.org/10.1038/nbt.3772
