Since water is a common good, the outcome of water-related research should be accessible to everyone. Because Open Science is more than just open access research articles, journals must work with the research community to enable fully open and FAIR science.
Water is the basis of life on Earth. Water covers approximately 71% of the Earth’s surface, but only about 2.5% of all water is fresh water1,2. Society relies on the availability of adequate quantity and quality of water for drinking, hygiene, growing food, transport, regulating microclimate and maintaining an enjoyable environment. At the same time, the water cycle (evapo–transpiration, drainage, evaporation from water bodies, cloud formation and precipitation) makes water a global, common good, as the water availability in one place is strongly influenced by the land use in another. The effect of fossil fuel burning on climate and the global re-distribution of water is a prominent example of this influence3. Global pollution of water with persistent (and non-persistent) chemicals is also becoming increasingly problematic; for example, per- and polyfluoroalkyl substances (PFAS) are now detected in rainwater above safe limits4, leading to increasing calls for action as ‘there is no effective dilution for persistent global pollution’5.
The importance of water for society and the global relevance of water research mean that such research needs to be freely accessible and re-usable globally for everybody, i.e., without the need for paid licenses to view and re-use the publications or to use the data and related code employed in the research. Due to its global societal relevance, there is a particular onus on water research to be easily traceable and reproducible by a wide range of stakeholders.
Water research should be accessible to everybody
Opening science opens worlds of opportunities for greater societal gain6, especially in the dissemination of research and knowledge to those communities most affected by changes in water quality, quantity, and accessibility. One prominent example in environmental chemistry is the recent discovery of 6PPD-quinone, a transformation product of tire rubber particles responsible for the death of coho salmon as a result of road runoff in storm events7; this discovery has since triggered extensive research into the influence of tire wear on the environment. A second example is the identification of the cyanotoxin responsible for eagle deaths8, a mystery which took 25 years to solve. There are already countless examples of extreme flood events, droughts, extensive fish kills, and surface waters being declared unfit for human consumption due to various combinations of natural phenomena and complex contamination events, exemplified by the recent event in the Oder River9, which has still not been clarified.
The need for rapid, open dissemination of findings is ever increasing to allow for large collaborative efforts such as the development of Early Warning Systems (EWS) for the preservation of wildlife, the human population and water resources. EWS are being developed in several areas, examples including the NormanEWS initiative10, one of the stimulating initiatives for chemical EWS developments within the European Partnership for the Assessment of Risks from Chemicals (PARC)11,12. The Environment Agency in England has also set up a national-scale Prioritisation and Early Warning System (PEWS) for contaminants of emerging concern (CECs)13. Similarly, flood awareness systems (FAS) are being developed on the European (EFAS)14 and Global (GloFAS) level15. However, reliable flood forecasting relies heavily on real-time sharing of highly resolved meteorological and satellite data (see e.g. ref. 16). The Climate Risk and Early Warning Systems (CREWS) initiative of the United Nations (UN) is operating in 19 countries in Africa and the Pacific most prone to tropical cyclones and floods, including Least Developed Countries (LDCs) and Small Island Developing States (SIDS), with rollouts planned into further countries in Africa and Asia17. Besides immediate catastrophic events, water resources are subject to slow and persistent trends, whose discovery is only possible with free access to long time series of hydrological data across the globe. For example, groundwater recharge time scales vary globally between centuries and millennia, with the longest time scales found in arid systems18, meaning that over-exploitation of groundwater may be both hardest to detect and most difficult to undo in systems that most heavily rely on it. Thus, large scale problems require global efforts, and all these collaborative efforts will rely on more Open Science.
Open Science goes beyond Open Access publishing and FAIR data
While much focus in recent years has been put on Open Access publishing, this is only a small part of Open Science. According to the 2015 FOSTER taxonomy19, Open Science integrates Open Access, Open Data, Open Source, and Open Reproducible Research (all of which we will touch on here, see Fig. 1a), while UNESCO and others have extended this further (e.g., ref. 6). Open Data is commonly associated with the ‘FAIR Principles’20, which describe how to make data findable, accessible, interoperable, and reusable. The FAIR principles were introduced in 2016 and provide vital guidance that can be applied irrespective of whether the data itself is strictly open or not21. Note that the FAIR principles do not enforce Open Access, i.e., FAIR data is not automatically Open Data. Conversely, Open Data that is neither FAIR nor managed (see Fig. 1b) can easily be useless data. Thus, the combination of Open and FAIR data is extremely important. However, even Open Access publishing combined with Open and FAIR data does not necessarily make the research reproducible and re-usable, as discussed further below.
Open Access is the subset of Open Science that includes principles and practices for distributing research outputs online, free of cost or other access barriers22,23. This includes for instance Open Access publications (e.g., the dissemination of research as so-called Green, Gold or Diamond Open Access) or the use of preprint servers to access earlier versions of research articles.
Open Data refers to the availability of the data behind the published research, typically hosted in either institutional or domain-specific data repositories (e.g., HydroShare for hydrological data24), or generic repositories such as Zenodo or FigShare. For Open Access publications and Open Data, appropriate license conditions should be stipulated, so that the conditions of re-use are clear. Creative Commons (CC) licenses are commonly used, with CC0 (public domain) and CC-BY (re-use with attribution) being the most permissive. Other restrictions on CC licenses can cause problems for downstream use. For instance, the ‘ND’ (no derivatives) clause forbids re-use for derivative works, i.e., any actual re-use other than re-distribution of the original work, while ‘NC’ (non-commercial use only) can prevent commercial companies (e.g., instrument vendors) from integrating Open Data into vendor-provided instrument libraries that could be used by researchers. The ‘SA’ (share-alike) clause can enforce a license on downstream users that they may not be able to comply with, thus preventing integration of Open Data in other open projects (due to incompatible licenses). While Open Data is an important starting point, without the availability of appropriate metadata and sufficient FAIRness to make the data findable, accessible, re-usable and interoperable, Open Data alone is only of limited use. In the era of ‘big data’, it is now relatively easy to create a quick dump of data, but curation and FAIRification of data requires a concerted effort, which may necessitate either incentives (carrot) or mandates (stick). The Global Natural Products Social Molecular Networking (GNPS) ecosystem25 is a prime example of incentivising Open Data sharing. Starting primarily as a mass spectral data repository for metabolomics, the developers have consistently added features and functionality over the years to add value to the repository and increase motivation for deposition. For example, the Mass Spectrometry Search Tool (MASST)26 has enabled discovery of the neurotoxin domoic acid and analogues within marine samples and food such as ocean-caught mackerel.
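The license clauses discussed above can be illustrated with a short Python sketch. It only flags the ‘ND’, ‘NC’ and ‘SA’ clauses in a license identifier; the function name `reuse_barriers` and the wording of the notes are assumptions made for illustration, not part of any official Creative Commons tool, and the output is no substitute for reading the license text.

```python
# Hypothetical helper: flag Creative Commons clauses that restrict re-use.
# The notes paraphrase the discussion in the text; they are not legal advice.
CLAUSE_NOTES = {
    "ND": "no derivatives: only re-distribution of the original work is allowed",
    "NC": "non-commercial: commercial re-use (e.g., vendor libraries) is blocked",
    "SA": "share-alike: downstream works must adopt a compatible license",
}


def reuse_barriers(license_id: str) -> list[str]:
    """Return notes on restrictive clauses in an identifier such as 'CC-BY-NC-SA'."""
    # Split the identifier into its parts, e.g. ['CC', 'BY', 'NC', 'SA'].
    parts = license_id.upper().replace("_", "-").split("-")
    return [CLAUSE_NOTES[part] for part in parts if part in CLAUSE_NOTES]


if __name__ == "__main__":
    for lic in ["CC0", "CC-BY", "CC-BY-NC", "CC-BY-NC-ND", "CC-BY-SA"]:
        barriers = reuse_barriers(lic)
        print(lic, "->", barriers if barriers else "no restrictive clauses")
```

Run over a list of candidate licenses, a check like this makes it visible at a glance that only CC0 and CC-BY come without additional restrictions on downstream use.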
Open Source software and code refer to the public availability of source code27, i.e., sets of computer instructions ranging from data processing scripts and algorithms to full-blown numerical models, desktop applications, or even operating systems. The purpose of open source is to provide transparency, and most importantly, re-usability and adaptability of the code, with a common aim of collaborative development. Licenses for Open Source works are generally designed to explicitly cover code sharing, thus Open Source licenses are generally preferred over CC licenses for code, with common examples including GPL, Apache and MIT27. Suitable code repositories with version control and issue tracking are indispensable for collaborative open source developments, with common platforms including GitHub, GitLab, Bitbucket and more. For all three above-mentioned aspects of Open Science, i.e., Open Access, Open Data and Open Source, the generation of permanent identifiers such as a Digital Object Identifier (DOI)28 is an integral aspect of FAIR and vital to preserve the discoverability and lifetime of such projects.
Finally, Open Reproducible Research is a culmination of all three aspects above. With systems such as RMarkdown and Jupyter Notebooks, it is now possible to have fully compilable research outputs and reproducible manuscripts. The Journal of Open Source Software even accepts submissions as GitHub pull requests and compiles the entire submission on their system; one example relevant to water research is patRoon 2.0 (ref. 29). The ‘open-source knowledge infrastructure for collaborative and reproducible data science’ Renku facilitates traceability and reproducibility of complex workflows involving networks of interconnected code, data and figure files. It does so by automatic provenance tracking of output files and the creation of a version-controlled git repository containing all information, including the computational environment.
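To make the idea of provenance tracking more concrete, the following Python sketch records checksums of input and output files together with the command and Python version used. It is a minimal illustration of the general principle only and does not reflect how Renku or any other platform is implemented; the file names, the command string and the discharge values are hypothetical.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs: dict[str, bytes],
                      outputs: dict[str, bytes],
                      command: str) -> dict:
    """Link outputs to the exact inputs, command and environment that produced them."""
    return {
        "command": command,
        "python_version": platform.python_version(),
        "inputs": {name: sha256_of(data) for name, data in sorted(inputs.items())},
        "outputs": {name: sha256_of(data) for name, data in sorted(outputs.items())},
    }


# Hypothetical input data (illustrative values, not real measurements).
raw = b"station,discharge_m3_s\nA,1.0\nB,3.0\n"

# A trivial "analysis" step: the mean of the second column.
values = [float(line.split(",")[1]) for line in raw.decode().splitlines()[1:]]
result = f"mean_discharge_m3_s\n{sum(values) / len(values)}\n".encode()

record = provenance_record(
    inputs={"discharge.csv": raw},
    outputs={"summary.csv": result},
    command="python summarise.py",  # hypothetical script name
)
print(json.dumps(record, indent=2))
```

If an input file changes, its checksum changes as well, so a reader can tell whether a published output still corresponds to the data and code that accompany it.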
How scientists and publishers can strengthen Open Science in water research
While the facilities and infrastructure available to perform Open Science are increasingly available, fully Open Science requires a substantial additional effort beyond the generation of manuscripts and data visualisations. A study in 2019 revealed that out of 360 randomly sampled hydrology papers published in 2017, only 4 (about 1%) were fully reproducible. Articles were considered fully reproducible if the results published in the paper could be reproduced by readers based on the accompanying directions, code and data accessible online. Half of the articles already failed at the general data availability check, whereas most of the others had incomplete supporting information to enable reproduction of results30. To improve on this dire situation, the authors created a survey template to facilitate reproducibility assessments of studies for authors, journals and funders/institutions. More recently, a group of early to mid-career researchers published a practical guideline to Open Science for hydrologists, including approaches for sharing code and documentation and choosing appropriate licenses for facilitating re-usability of research artifacts31.
Several simple steps can be taken to support Open Science using existing systems which, over time, will set the basis for successful Open Science efforts to become the ‘new normal’. The setting of open, community-endorsed standards is a key step for every field, with examples including the open mzML standard for raw mass spectrometry data32, NetCDF as an open standard for complex data in hydrology33, the International Chemical Identifier (InChI) in chemistry34, or even the simplicity of the comma- or tab-separated values (CSV, TSV) formats for exchanging data rather than proprietary Excel (XLS, XLSX) formats. The provision of templates can also help guide researchers in data sharing in specific domains, as recently discussed for chemistry35 and exposomics23, since the use of standardized headers and simple, interoperable formats can greatly enhance re-use of the data and integration into large knowledge bases. Finally, clear article quality criteria focusing on easily verifiable reproducibility and re-usability of research, and associated highlights of articles meeting such criteria, could provide the right combination of facilitating and incentivising Open Science.
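As a small illustration of standardized headers in a simple, interoperable format, the Python sketch below writes a table to CSV and reads it back using only the standard library. The column names (with units embedded in the header), the station identifiers and the values are hypothetical and do not follow any particular community template.

```python
import csv
import io

# Hypothetical standardized header: the unit is part of the column name.
FIELDS = ["station_id", "timestamp_utc", "discharge_m3_s"]

# Illustrative values only, not real measurements.
rows = [
    {"station_id": "ST001", "timestamp_utc": "2022-11-11T00:00:00Z", "discharge_m3_s": 12.4},
    {"station_id": "ST002", "timestamp_utc": "2022-11-11T00:00:00Z", "discharge_m3_s": 3.1},
]


def to_csv(records: list[dict], fields: list[str]) -> str:
    """Serialise records to CSV text with a fixed, documented header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text: str, fields: list[str]) -> list[dict]:
    """Parse CSV text, rejecting files whose header differs from the template."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    if reader.fieldnames != fields:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    # CSV stores everything as text, so the numeric column is converted back.
    return [{**row, "discharge_m3_s": float(row["discharge_m3_s"])} for row in reader]


text = to_csv(rows, FIELDS)
print(text)
```

Because the header is checked on reading, files that deviate from the agreed template are caught early, which is the kind of consistency that makes integration into larger knowledge bases easier.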
As discussed above, the need for rapid, open dissemination of findings is ever increasing to ensure the success of large collaborative efforts to preserve wildlife, the human population and water resources in a rapidly changing environment. Since water is a common good, we hope that authors and editors alike will join us in this quest for sustaining and supporting Open Science in water research. Together, many seemingly small steps towards Open Science in water research have the potential to create a world of difference.
Gleick, P. H. Water in Crisis: a Guide to the World’s Fresh Water Resources (Oxford University Press, 1993).
USGS. How Much Water is There on Earth? https://www.usgs.gov/special-topics/water-science-school/science/how-much-water-there-earth (2022).
IPCC Climate change 2008: Climate change and water (eds Bates, B. et al.) (IPCC Secretariat, Geneva, 2008).
Cousins, I. T. et al. Environ. Sci. Technol. 56, 11172–11179 (2022).
Arp, H. P. H. Towards reducing pollution of PMT/vPvM substances to protect water resources. SETAC Europe annual meeting - Zenodo https://doi.org/10.5281/zenodo.6566860 (2022).
UNESCO Recommendation on Open Science (UNESCO, Paris, 2021); https://unesdoc.unesco.org/ark:/48223/pf0000379949.locale=en
Tian, Z. et al. Science 371, 185–189 (2021).
Breinlinger, S. et al. Science 371, 9050 (2021).
Braun, S. Mysterious mass fish kill in Oder River expands. Deutsche Welle (DW) https://www.dw.com/en/mysterious-mass-fish-kill-in-oder-river-expands-downstream/a-62784099 (2022).
Alygizakis, N., Samanipour, S. & Thomas, K. S12, NORMANEWS, NormaNEWS for Retrospective Screening of New Emerging Contaminants (Zenodo, 2017); https://doi.org/10.5281/zenodo.2623816
Anses - French Agency for Food, Environmental and Occupational Health & Safety. European Partnership for the Assessment of Risks from Chemicals (PARC) https://www.anses.fr/en/content/european-partnership-assessment-risks-chemicals-parc (2022).
Dulio, V. et al. Environ. Sci. Eur. 32, 100 (2020).
Sims, K. Chemicals of Concern: a Prioritisation and Early Warning System for England (Royal Society of Chemistry, 2022); https://www.envchemgroup.com/eb-35-chemical-of-concern.html
European Flood Awareness System. Copernicus Emergency Management System (CEMS) https://www.efas.eu/en (2022).
Global Flood Awareness System – global ensemble streamflow forecasting and flood forecasting. Copernicus Emergency Management System (CEMS) https://www.globalfloods.eu/ (2022).
Di Mauro, C. et al. Hydrol. Earth Syst. Sci. 25, 4081–4097 (2021).
UN. Early Warning Systems https://www.un.org/en/climatechange/climate-solutions/early-warning-systems (2022).
Cuthbert, M. O. et al. Nat. Clim Change 9, 137–141 (2019).
Pontika, N., Knoth, P., Cancellieri, M. & Pearce, S. Fostering open science to research using a taxonomy and an eLearning portal. In Proc. 15th International Conference on Knowledge Technologies and Data-driven Business 1–8 (ACM, 2015).
FAIR Principles. Go FAIR https://www.go-fair.org/fair-principles/ (2021).
Wilkinson, M. D. et al. Scientific Data 3, 1–9 (2016).
Suber, P. Open Access Overview (definition, introduction) http://legacy.earlham.edu/~peters/fos/overview.htm (2015).
Schymanski, E. L. & Bolton, E. E. Exposome 2, osab006 (2022).
Horsburgh, J. S. et al. J. Am. Water Resour. Assoc. 52, 873–889 (2016).
Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).
Wang, M. et al. Nat. Biotechnol. 38, 23–26 (2020).
Open Source Software Foundation. Open Source Software Licenses https://opensource.org/licenses (2022).
International DOI Foundation. Frequently Asked Questions about the DOI® System https://www.doi.org/faq.html (2021).
Helmus, R. et al. J. Open Source Soft 7, 4029 (2022).
Stagge, J. H. et al. Sci. Data 6, 190030 (2019).
Hall, C. A. et al. Hydrol. Earth Syst. Sci. 26, 647–664 (2022).
Chambers, M. C. et al. Nat. Biotechnol. 30, 918–920 (2012).
Rew, R. et al. Network Common Data Form (NetCDF) (Unidata, accessed 11 Nov 2022); http://www.unidata.ucar.edu/software/netcdf/.
Heller, S. et al. J. Cheminform. 5, 1–9 (2013).
Schymanski, E. L. & Bolton, E. E. J. Cheminform. 13, 50 (2021).
Universiteit Gent. FAIR Data https://www.ugent.be/en/research/datamanagement/after-research/fair-data.htm (2022).
We would like to acknowledge all our colleagues and collaborators who have collectively helped stimulate many of the thoughts and reflections contained in this article. Specifically, E.L.S. wishes to thank Mingxun Wang and Pieter Dorrestein (both GNPS) for providing the MASST example to add a nice carrot to this article. S.J.S. acknowledges help from Remko Nijzink and the Swiss Data Science Center in exploring the Renku platform. E.L.S. and S.J.S. both acknowledge financial support from the Luxembourg National Research Fund (FNR) ATTRACT programme for projects A18/BM/12341006 and A16/SR/11254288, respectively.
The authors declare no competing interests.
Cite this article
Schymanski, E.L., Schymanski, S.J. Water science must be Open Science. Nat Water 1, 4–6 (2023). https://doi.org/10.1038/s44221-022-00014-z