Human activities such as fossil fuel combustion, land-use change, and cement production increased the atmospheric carbon dioxide (CO2) concentration to 418 ppm in April 2021, almost 50% higher than at the beginning of the industrial age. The greenhouse effect of atmospheric CO2 and other gases has led to significant warming and increased stratification of the ocean, with consequences for marine ecosystems and the services they provide. Notably, atmospheric CO2 concentrations would now be around 76 ppm higher than current levels1 had the ocean not taken up a significant fraction of our emissions2.

The ocean is one of the largest carbon pools on the planet, second only to the Earth's crust. It contains about 38,000 gigatonnes of carbon (GtC), dwarfing the cumulative CO2 emissions since the Industrial Revolution from fossil fuel combustion (about 440 GtC to 2019) and land-use change (about 210 GtC)1. Relative to this large and variable reservoir, the rate at which anthropogenic CO2 emissions add carbon to the surface ocean, about 1 µmol kg−1 year−1, is much smaller than the natural variations in dissolved inorganic carbon content, which span about 500 µmol kg−1 regionally and 100 µmol kg−1 seasonally3. Any emission-driven trends in ocean carbon concentrations or changes in biogeochemical cycles are therefore expressed amid large natural variability in these seawater properties across a range of spatial and temporal scales. Accurately quantifying a small change against a large and variable background requires precise and accurate measurements made over decades.
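The need for long records can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes one observation per year, a purely linear trend, and independent residual scatter σ left over after known variability has been removed; under those assumptions the standard error of an ordinary least-squares slope fitted over n years is σ·sqrt(12 / (n(n² − 1))). The function names and σ values are illustrative assumptions, not GLODAP parameters, and real ocean time series have autocorrelated residuals, which lengthen the required record further.

```python
import math


def slope_standard_error(sigma: float, n_years: int) -> float:
    """Standard error of an OLS trend (units per year) fitted to one value per
    year over n_years consecutive years, assuming independent residuals with
    standard deviation sigma."""
    if n_years < 2:
        raise ValueError("need at least two years of data")
    # Sum of squared deviations of t = 1..n from their mean equals n(n^2 - 1)/12
    sxx = n_years * (n_years**2 - 1) / 12.0
    return sigma / math.sqrt(sxx)


def years_to_detect(trend: float, sigma: float, z: float = 2.0,
                    max_years: int = 500) -> int:
    """Shortest record length (years) for which |trend| / standard error >= z."""
    for n in range(2, max_years + 1):
        if abs(trend) / slope_standard_error(sigma, n) >= z:
            return n
    raise ValueError("trend not detectable within max_years")


if __name__ == "__main__":
    # ~1 umol/kg/yr anthropogenic DIC increase (from the text) against
    # hypothetical residual scatter levels in umol/kg
    for sigma in (5.0, 20.0, 100.0):
        print(f"sigma={sigma:>5}: {years_to_detect(1.0, sigma)} years")
```

Under these simplified assumptions, detecting a 1 µmol kg−1 year−1 trend at about two standard errors takes 11 years with σ = 5 µmol kg−1, 27 years with σ = 20 µmol kg−1, and 79 years if the residual scatter were as large as the 100 µmol kg−1 seasonal range. Precise measurements and careful separation of natural variability therefore matter as much as record length.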

The GLobal Ocean Data Analysis Project (GLODAP)4,5, initiated in 2004 and subsequently updated6,7,8, has been instrumental in delivering the carbon-relevant interior ocean data that underpin well-quantified estimates of the ocean carbon sink. The project delivers near-global data coverage, standardized quality control procedures, a high degree of internal consistency, common data formats, and free and open access to the available data. Since its first version, the GLODAP data inventory has more than tripled in size (Fig. 1).

Fig. 1: Key outputs and metrics of GLODAP.

a Interior ocean concentration of anthropogenic carbon along the section indicated by the black line in panel (b). b Integrated column inventory of anthropogenic carbon22. Both panels are based on transient tracer data and the Transit Time Distribution method for calculating anthropogenic carbon content23. c Cumulative number of samples in GLODAPv2.2020 over time.

To continue serving its purpose, GLODAP needs to make both its data ingestion and its data extraction systems more streamlined and automated. To reduce routine manual work and the potential for errors, data submission workflows must become uniform, semi-automated, and compatible with machine-learning techniques for quality control. The data extraction system also needs to support a wider range of filters so that users can fine-tune their requests.

Global ocean carbon data

Faced with the challenge of quantifying the ocean's storage of anthropogenic carbon, the ocean community began systematically measuring marine inorganic carbon concentrations in the 1970s and 1980s4. These efforts ramped up significantly during the World Ocean Circulation Experiment and the Joint Global Ocean Flux Study (WOCE/JGOFS) in the 1990s, and have since continued along selected WOCE lines in repeat hydrography programs, including the Global Ocean Ship-based Hydrographic Investigations Program (GO-SHIP)9.

The primary focus of GLODAP is synthesizing seawater inorganic carbon chemistry data from these global cruise campaigns. Data for hydrography, dissolved oxygen, transient tracers, inorganic nutrients, and a range of other variables are also included to facilitate interpretation. A unique feature of GLODAP is the several layers of quality control and adjustment applied to minimize inconsistencies and biases in the data10, using tools such as comparison of deep-water values at nearby locations. GLODAP offers uniform data at three levels: (1) data from individual cruises in a uniform format, with coherent quality control and unit conversion applied; (2) a bias-adjusted data product; and (3) a global 1° × 1° mapped climatology11.
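A comparison of deep-water values at nearby locations (often called a crossover analysis) can be sketched as follows. This is a simplified illustration, not GLODAP's own software or procedure: it pairs stations from two cruises that lie within a hypothetical distance `max_km`, linearly interpolates both profiles onto a common depth grid below a hypothetical `min_depth` (deep waters change slowly, so offsets there mostly reflect measurement bias), and averages the differences. The `Station` structure and all thresholds are assumptions.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Station:
    lat: float
    lon: float
    depths: list[float]   # m, strictly increasing (hypothetical layout)
    values: list[float]   # e.g. DIC in umol/kg, same length as depths


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def interpolate(z, depths, values):
    """Linearly interpolate values(depths) at depth z; None outside the profile."""
    if z < depths[0] or z > depths[-1]:
        return None
    for i in range(1, len(depths)):
        if z <= depths[i]:
            z0, z1 = depths[i - 1], depths[i]
            v0, v1 = values[i - 1], values[i]
            if z1 == z0:
                return v0
            return v0 + (v1 - v0) * (z - z0) / (z1 - z0)
    return values[-1]  # single-sample profile with z == depths[0]


def crossover_offset(cruise_a, cruise_b, max_km=200.0, min_depth=1500.0, step=100.0):
    """Mean (A - B) deep-water difference over nearby station pairs, or None."""
    pair_means = []
    for a in cruise_a:
        for b in cruise_b:
            if haversine_km(a.lat, a.lon, b.lat, b.lon) > max_km:
                continue  # stations too far apart to compare
            top = max(min_depth, a.depths[0], b.depths[0])
            bottom = min(a.depths[-1], b.depths[-1])
            diffs, z = [], top
            while z <= bottom:
                va = interpolate(z, a.depths, a.values)
                vb = interpolate(z, b.depths, b.values)
                if va is not None and vb is not None:
                    diffs.append(va - vb)
                z += step
            if diffs:
                pair_means.append(mean(diffs))
    return mean(pair_means) if pair_means else None
```

In a scheme like this, the resulting offset would be compared against an acceptance threshold for each variable, and any proposed adjustment would still be reviewed by experts before being applied.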

The GLODAP data product has supported more than 2000 articles (and counting) since 2000, evidence of how extensively the scientific community uses and trusts it. Seminal studies of the ocean's anthropogenic carbon content and its temporal evolution would not have been possible without GLODAP2,12,13. The knowledge from these studies informs, for instance, the Intergovernmental Panel on Climate Change (IPCC) assessments, the Sustainable Development Goals (SDGs) of the UN Agenda 2030, and the Global Climate Observing System (GCOS) indicators on ocean acidification. GLODAP is also an essential reference data set for autonomous observing networks such as Biogeochemical-Argo: "The long-term success of a global chemical sensor observing system will depend on support from an ongoing, shipboard hydrographic program to produce a high-quality data set for deep waters at the global scale."14.

With the growing volume of data, the ongoing need for information on the ocean carbon sink to inform global emission-reduction efforts, and the emerging need to monitor the impacts of geoengineering and sustainable ocean-use initiatives, the importance of GLODAP will only increase. However, despite short-term funding from a range of projects, GLODAP is a largely unfunded community effort organized and executed by the GLODAP team. This situation is unsustainable, and there is a significant risk that the effort will diminish or disappear in the next few years. Building and supporting dedicated infrastructure will be critical to ensuring that GLODAP continues to provide a valuable service to the global community.

Improved efficiency and service

The current GLODAP workflow involves substantial manual work, which requires dedicated time from (and funding for) data experts and creates opportunities for data-handling errors. Over the last decade GLODAP has matured, with a set of well-documented protocols, dedicated software, and a backbone of data management support. The GLODAP team now aims to advance both data input and output, moving toward a semi-automated system that will reduce the manual workload and associated errors.

First, the team aims to implement a uniform, semi-automatic, and standards-compliant data ingestion system that will facilitate the data submission and quality control procedures. This will enable direct interaction with data providers, leading to improvements in data handling, data quality control, and documentation. The envisaged changes will also enable rapid application of novel quality control approaches using machine-learning techniques.
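A very simple form of automated quality control at ingestion is a plausibility check that assigns quality flags. The sketch below assumes WOCE-style flags (2 = acceptable, 3 = questionable, 4 = bad, 9 = no data) and uses hypothetical range limits chosen only for illustration; they are not GLODAP's criteria. The machine-learning approaches the team envisages would replace or complement fixed rules like these.

```python
import math

# WOCE-style quality flags (assumed convention for this sketch)
GOOD, QUESTIONABLE, BAD, MISSING = 2, 3, 4, 9

# Hypothetical plausibility limits, illustration only:
# (hard_min, soft_min, soft_max, hard_max) in umol/kg
LIMITS = {
    "tco2": (1700.0, 1900.0, 2400.0, 2600.0),
    "talk": (1900.0, 2150.0, 2450.0, 2700.0),
}


def flag_value(variable, value):
    """Assign a flag from fixed range checks: outside hard limits -> bad,
    outside soft limits -> questionable, otherwise acceptable."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    hard_min, soft_min, soft_max, hard_max = LIMITS[variable]
    if value < hard_min or value > hard_max:
        return BAD
    if value < soft_min or value > soft_max:
        return QUESTIONABLE
    return GOOD


def flag_records(records):
    """Return copies of the records with a '<variable>_flag' entry added
    for every variable listed in LIMITS."""
    flagged = []
    for rec in records:
        out = dict(rec)
        for var in LIMITS:
            out[f"{var}_flag"] = flag_value(var, rec.get(var))
        flagged.append(out)
    return flagged
```

Automatically assigned flags of this kind would serve as a first pass that data experts then confirm or override, rather than as a final verdict on data quality.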

Second, we want to upgrade to a versatile data extraction system. Such a system will give users more flexibility and options, such as requesting the originally submitted data (without adjustments) or only subsets of the data, in various formats.
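The kind of filtering described here can be illustrated with a small extraction function. The record layout (keys such as `tco2`, `tco2_submitted`, and `tco2_flag`), the `Request` fields, and the flag convention are all hypothetical; the sketch only shows how a request could select a region and depth range, choose between adjusted and as-submitted values, mask values not flagged as acceptable, and serialize the result to CSV.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Request:
    lat_range: tuple[float, float]
    lon_range: tuple[float, float]
    depth_range: tuple[float, float]
    variables: list[str]
    adjusted: bool = True    # False -> return values as originally submitted
    good_only: bool = True   # mask values whose flag is not 2 ("acceptable")


def extract(records, req):
    """Filter hypothetical sample records according to a Request."""
    rows = []
    for r in records:
        if not (req.lat_range[0] <= r["lat"] <= req.lat_range[1]
                and req.lon_range[0] <= r["lon"] <= req.lon_range[1]
                and req.depth_range[0] <= r["depth"] <= req.depth_range[1]):
            continue  # outside the requested region or depth range
        row = {"cruise": r["cruise"], "lat": r["lat"], "lon": r["lon"], "depth": r["depth"]}
        for var in req.variables:
            key = var if req.adjusted else f"{var}_submitted"
            value = r.get(key)
            if req.good_only and r.get(f"{var}_flag") != 2:
                value = None  # masked: not flagged as acceptable
            row[var] = value
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize extracted rows to CSV text; None becomes an empty field."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the request as a plain data object makes it easy to expose the same filters through a web form or a machine-to-machine interface.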

These upgrades will streamline repository workflows to ensure that the data products are FAIR (findable, accessible, interoperable, and reusable)15, while reducing the burden of data management on scientists. Nevertheless, experts will still need to spend time on quality control and internal consistency adjustments.

Branch out to keep data accessible

We expect the improvements to encourage data submission by building a community of data providers, and to simplify and streamline regular updates of the GLODAP products. At the same time, access to GLODAP data will increase: workflow improvements will allow for enhanced data access systems supporting machine-to-machine services, and for better-integrated data visualization products16,17.

The GO-SHIP repeat hydrography effort currently provides the backbone of GLODAP thanks to its high data quality and rapid data availability. However, many other datasets reach GLODAP through the GLODAP team's extensive network, and some of these datasets would be functionally lost if GLODAP did not collate them. An automated system can help rescue these data for reuse by providing a streamlined process for scientists to submit data and metadata, and for users to access and visualize the data.

Upgrades of GLODAP will benefit from the data system already developed for the Surface Ocean CO2 Atlas (SOCAT)18. SOCAT has successfully streamlined data submission, quality control, and the release of an annual synthesis product, but faces the same resourcing challenges as GLODAP in sustaining regular updates. Leveraging an existing, proven workflow will substantially reduce both the cost and the labor of developing a similar system for GLODAP.

An investment for the planet’s future

GLODAP needs continued support not only from the scientific community but also from funding agencies and stakeholders. Without updated infrastructure and adequate, sustained resourcing, it may not be possible to maintain GLODAP services on a regular basis.

While the ocean currently takes up about 2.6 GtC of anthropogenic carbon annually, we must understand the evolution, efficiency, and regional patterns of the ocean carbon sink if we are to predict the climate effects of future emissions and to quantify and assess mitigation efforts. Human activities also affect ocean biogeochemistry in other ways, such as deoxygenation19, changes in nutrient supply20, and ocean acidification21; quantifying the trends and variability of all of these requires high-quality, consistent ocean biogeochemical data.

Co-located, high-quality measurements of physical and biogeochemical parameters that allow natural variability to be separated from anthropogenic change, such as those delivered by GLODAP, are a key component of monitoring, understanding, and mitigating the human influence on the Earth's climate.