Perspective

Our path to better science in less time using open data science tools

  • Nature Ecology & Evolution 1, Article number: 0160 (2017)
  • doi:10.1038/s41559-017-0160

Abstract

Reproducibility has long been a tenet of science but has been challenging to achieve—we learned this the hard way when our old approaches proved inadequate to efficiently reproduce our own work. Here we describe how several free software tools have fundamentally upgraded our approach to collaborative research, making our entire workflow more transparent and streamlined. By describing specific tools and how we incrementally began using them for the Ocean Health Index project, we hope to encourage others in the scientific community to do the same—so we can all produce better science in less time.


Acknowledgements

The Ocean Health Index is a collaboration between Conservation International and the National Center for Ecological Analysis and Synthesis at the University of California at Santa Barbara. We thank J. Polsenberg, S. Katona, E. Pacheco and L. Mosher, our partners at Conservation International. We thank all past contributors and funders that have supported the Ocean Health Index, including B. Wrigley, H. Wrigley and The Pacific Life Foundation. We also thank all the individuals and groups that make their data, tools and tutorials freely available to others. Finally, we thank H. Wickham, K. Ram, K. Woo and M. Schildhauer for friendly review of the developing manuscript. See http://ohi-science.org/betterscienceinlesstime as an example of a website built with RMarkdown and the RStudio–GitHub workflow, and for links and resources referenced in the paper.

Author information

Affiliations

  1. National Center for Ecological Analysis and Synthesis, University of California at Santa Barbara, Santa Barbara, California 93101, USA.

    • Julia S. Stewart Lowndes
    • Courtney Scarborough
    • Jamie C. Afflerbach
    • Melanie R. Frazier
    • Casey C. O’Hara
    • Ning Jiang
    • Benjamin S. Halpern
  2. EcoQuants.com, Santa Barbara, California 93103, USA.

    • Benjamin D. Best
  3. Bren School for Environmental Science and Management, University of California, Santa Barbara, California 93177, USA.

    • Benjamin S. Halpern
  4. Silwood Park Campus, Imperial College London, Ascot SL5 7PY, UK.

    • Benjamin S. Halpern

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Julia S. Stewart Lowndes.