A vision for FAIR ocean data products

The ocean is mitigating global warming by absorbing large amounts of excess carbon dioxide from human activities. To quantify and monitor the ocean carbon sink, we need a state-of-the-art data resource that makes data submission and retrieval machine-compatible and efficient.

natural variability in these seawater properties across a range of spatial and temporal scales. Accurately quantifying a small change against a large and variable background requires precise and accurate measurements made over decades.
The GLobal Ocean Data Analysis Project (GLODAP) 4,5, initiated in 2004 and subsequently updated 6-8, has been instrumental in delivering carbon-relevant interior ocean data that support well-quantified estimates of the ocean carbon sink. The project delivers near-global data coverage, standardized quality control procedures, a high degree of internal consistency, common data formats, and open and free access to the available data. Compared to its first version, the GLODAP data inventory has more than tripled in size (Fig. 1).
To continue serving its purpose, GLODAP needs to streamline and automate both its data ingestion and its data extraction systems. To reduce routine manual work and the potential for errors, data submission workflows must become uniform, semi-automated, and compatible with machine-learning techniques for quality control. The data extraction system also needs to support a wider range of filters so that users can fine-tune their requests.

Global ocean carbon data
Faced with the challenge of quantifying the ocean's storage of anthropogenic carbon, the ocean community began to systematically measure marine inorganic carbon concentrations in the 1970s and 1980s 4. These efforts ramped up significantly during the World Ocean Circulation Experiment and the Joint Global Ocean Flux Study (WOCE/JGOFS) in the 1990s, and have since been continued along selected WOCE lines in the repeat hydrographic programs, including the Global Ocean Ship-based Hydrographic Investigations Program (GO-SHIP) 9.
The primary focus of GLODAP is synthesizing seawater inorganic carbon chemistry data from these global cruise campaigns. However, data for ocean hydrography, dissolved oxygen, transient tracers, inorganic nutrients, and a range of other variables are included to facilitate interpretation. A unique feature of GLODAP is the addition of several layers of quality control and adjustments conducted to minimize inconsistencies and biases in the data 10, using a range of tools such as comparison of deep water values at nearby locations. GLODAP offers uniform data at three levels: (1) data from individual cruises in a uniform format with coherent quality control and unit conversion applied, (2) a bias-adjusted data product, and (3) a global 1° × 1° mapped climatology 11.
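The idea of comparing deep water values at nearby locations can be illustrated with a minimal sketch. It is not the GLODAP procedure itself: the `Sample` fields, the 2000 m depth cutoff, the 200 km search radius, and the plain mean difference are all assumptions chosen for illustration, and an operational comparison would work with full profiles rather than simple averages.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from statistics import mean


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout, not a GLODAP format.
    cruise: str
    lat: float
    lon: float
    depth_m: float
    value: float  # e.g. dissolved inorganic carbon in umol/kg


def distance_km(a: Sample, b: Sample) -> float:
    """Great-circle (haversine) distance between two samples."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def deep_offset(samples, cruise_a, cruise_b, min_depth_m=2000.0, max_km=200.0):
    """Mean deep value of cruise_a minus mean deep value of cruise_b.

    Only samples at or below min_depth_m that have a deep counterpart from
    the other cruise within max_km are used. Both thresholds are
    illustrative assumptions. Returns None when the cruises do not overlap.
    """
    deep_a = [s for s in samples if s.cruise == cruise_a and s.depth_m >= min_depth_m]
    deep_b = [s for s in samples if s.cruise == cruise_b and s.depth_m >= min_depth_m]
    near_a = [a for a in deep_a if any(distance_km(a, b) <= max_km for b in deep_b)]
    near_b = [b for b in deep_b if any(distance_km(a, b) <= max_km for a in deep_a)]
    if not near_a or not near_b:
        return None
    return mean(s.value for s in near_a) - mean(s.value for s in near_b)
```

A persistent nonzero offset in deep water, where natural variability is small, points to a possible measurement bias in one of the cruises, which an expert can then decide whether to adjust.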
The GLODAP data product has supported more than 2000 articles (and counting) since the year 2000, evidencing its extensive use by the scientific community and the trust placed in it. Seminal contributions on the oceanic anthropogenic carbon content and its temporal evolution would not have been possible without GLODAP 2,12,13. The knowledge from these studies informs, for instance, the Intergovernmental Panel on Climate Change (IPCC) assessments, and the Sustainable Development Goals (SDG) of the UN Agenda 2030 and the Global Climate Observing System (GCOS) indicators on ocean acidification. GLODAP is also an essential reference data set for autonomous observing networks, such as Biogeochemical-Argo: "The long-term success of a global chemical sensor observing system will depend on support from an ongoing, shipboard hydrographic program to produce a high-quality data set for deep waters at the global scale." 14
With the growth in the amount of data, the ongoing need to provide information on the ocean carbon sink to inform global carbon emission-reduction efforts, and the emerging need to monitor impacts of initiatives in geoengineering and sustainable use of the oceans, the importance of GLODAP will only increase. However, despite receiving short-term funds from a range of projects, GLODAP is a largely unfunded community effort organized and executed by the GLODAP team. Such a situation is unsustainable, and there is a significant risk that the effort will diminish or disappear in the next few years. Building and supporting infrastructure will be critical to ensure that GLODAP continues to provide a valuable service to the global community.

Improved efficiency and service
The current GLODAP workflow requires substantial manual work that necessitates dedicated time from, and funding for, data experts, and that introduces opportunities for data handling errors. GLODAP has matured over the last decade with a set of well-documented protocols and development of dedicated software, as well as a backbone of data management support. However, the GLODAP team now strives for advancements on both data input and output, toward a semi-automated system that will reduce the manual work intensity and associated errors.
First, the team aims to implement a uniform, semi-automatic, and standards-compliant data ingestion system that will facilitate the data submission and quality control procedures. This will enable direct interaction with data providers, leading to improvements in data handling, data quality control, and documentation. The envisaged changes will also enable rapid application of novel quality control approaches using machine-learning techniques.
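As a rough illustration of what a uniform, semi-automatic ingestion check might look like, the sketch below screens submitted rows before any expert review. The column names in `REQUIRED_COLUMNS` and the limits in `RANGES` are hypothetical and do not describe an actual GLODAP submission format.

```python
# Hypothetical column names and plausibility limits, for illustration only.
REQUIRED_COLUMNS = ("cruise", "station", "latitude", "longitude", "depth_m")
RANGES = {
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
    "depth_m": (0.0, 12000.0),
}


def validate_submission(rows):
    """Return a list of readable issues; an empty list means all checks passed."""
    issues = []
    for i, row in enumerate(rows, start=1):
        # Report missing fields first and skip range checks for that row.
        missing = [c for c in REQUIRED_COLUMNS if row.get(c) in (None, "")]
        if missing:
            issues.append(f"row {i}: missing {', '.join(missing)}")
            continue
        for name, (low, high) in RANGES.items():
            try:
                value = float(row[name])
            except (TypeError, ValueError):
                issues.append(f"row {i}: {name} is not numeric ({row[name]!r})")
                continue
            if not low <= value <= high:
                issues.append(f"row {i}: {name}={value} outside [{low}, {high}]")
    return issues
```

Returning the issues to the data provider at submission time, rather than discovering them later, is one way such a system could support the direct interaction described above.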
Second, we want to upgrade to a versatile data extraction system. Such a system will provide more flexibility and options to users, such as requesting output with originally submitted data (without adjustments), or only subsets of the data in various formats.
These upgrades will streamline repository workflows to ensure the data products are FAIR (findable, accessible, interoperable, and reusable) 15, while reducing the burden of data management on scientists. Nevertheless, there will remain a need for experts to spend time on quality control and internal consistency adjustments.
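The kind of request such an extraction system could serve can be sketched as follows. The `Record` layout, the `extract` and `to_csv` helpers, and their parameters are hypothetical names introduced here for illustration; they do not describe an existing GLODAP interface.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical layout holding both the submitted and the adjusted value.
    cruise: str
    lat: float
    lon: float
    depth_m: float
    variable: str
    original: float  # value as submitted
    adjusted: float  # value after bias adjustment


def extract(records, variables=None, lat_range=(-90.0, 90.0),
            depth_range=(0.0, 12000.0), use_adjusted=True):
    """Return the subset of records matching the filters as plain dicts."""
    rows = []
    for r in records:
        if variables is not None and r.variable not in variables:
            continue
        if not lat_range[0] <= r.lat <= lat_range[1]:
            continue
        if not depth_range[0] <= r.depth_m <= depth_range[1]:
            continue
        rows.append({
            "cruise": r.cruise, "lat": r.lat, "lon": r.lon,
            "depth_m": r.depth_m, "variable": r.variable,
            "value": r.adjusted if use_adjusted else r.original,
        })
    return rows


def to_csv(rows):
    """Serialize extracted rows as CSV text (one of several possible formats)."""
    buf = io.StringIO()
    fields = ["cruise", "lat", "lon", "depth_m", "variable", "value"]
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the submitted and adjusted values side by side in each record is a design choice that lets the same filter serve both users who want the bias-adjusted product and users who want the data as originally submitted.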

Branch out to keep data accessible
We expect that the improvements will encourage submission of data through building a community of data providers, and will simplify and streamline the process of providing regular updates of the GLODAP products. At the same time, access to GLODAP data will increase. Workflow improvements would allow for enhanced data access systems supporting machine-to-machine services, and better integrated data visualization products 16,17.
The GO-SHIP repeat hydrography effort currently provides the backbone of GLODAP thanks to its high data quality and rapid availability. However, many other datasets reach GLODAP through the extensive network of the GLODAP team; some of these datasets will be functionally lost if not collated by GLODAP. An automated system can help rescue these data for reuse by providing a streamlined process for scientists to submit data and metadata, and for users to access and visualize the data.
Upgrades of GLODAP will benefit from the data system that has already been developed for the Surface Ocean CO2 Atlas (SOCAT) 18. SOCAT successfully streamlined data submission, quality control, and release of an annual synthesis product, but faces the same resourcing challenges as GLODAP to sustain regular updates. Leveraging an existing and proven workflow would significantly reduce both the cost and the labor of developing a similar system for GLODAP.

An investment for the planet's future
GLODAP needs continued support from the scientific community, but also needs support from funding agencies and stakeholders. Without updated infrastructure and adequate sustained resourcing in place, it may not be possible to maintain GLODAP services on a regular basis.
While the ocean currently takes up about 2.6 Gt of anthropogenic carbon annually, we must understand the evolution, efficiency, and regional patterns of the ocean carbon sink if we want to be able to predict the climate effect of future emissions, as well as to quantify and assess mitigation efforts. Furthermore, human activities affect ocean biogeochemistry in other ways as well, such as deoxygenation 19, changes in nutrient supply 20, and ocean acidification 21, issues that all need high-quality, consistent ocean biogeochemical data to quantify trends and variability.
Co-located high-quality measurements of physical and biogeochemical parameters that allow for the separation of natural variability from anthropogenic changes, as delivered by GLODAP, are a key component of monitoring, understanding, and mitigating the human influence on the Earth's climate.