Abstract
Mapped monthly data products of surface ocean acidification indicators from 1998 to 2022 on a 0.25° by 0.25° spatial grid have been developed for eleven U.S. large marine ecosystems (LMEs). The data products were constructed using observations from the Surface Ocean CO2 Atlas, co-located surface ocean properties, and two types of machine learning algorithms: Gaussian mixture models to organize LMEs into clusters of similar environmental variability and random forest regressions (RFRs) that were trained and applied within each cluster to spatiotemporally interpolate the observational data. The data products, called RFR-LMEs, have been averaged into regional timeseries to summarize the status of ocean acidification in U.S. coastal waters, showing a domain-wide carbon dioxide partial pressure increase of 1.4 ± 0.4 μatm yr−1 and pH decrease of 0.0014 ± 0.0004 yr−1. RFR-LMEs have been evaluated via comparisons to discrete shipboard data, fixed timeseries, and other mapped surface ocean carbon chemistry data products. Regionally averaged timeseries of RFR-LME indicators are provided online through the NOAA National Marine Ecosystem Status web portal.
Background & Summary
The accumulation of carbon dioxide (CO2) in the atmosphere as a result of human activities, and the uptake of ~25% of anthropogenic CO2 by the ocean1,2, has led to increasing acidity of ocean waters of about −0.016 pH units per decade on a global scale since the 1980s3,4,5,6. This ocean acidification (OA) signal is measurable at time series sites7,8, observed in mapped data products of CO2 partial pressure6,9,10,11,12, captured by decadal repeat hydrographic cruises13,14, and simulated by ocean models15 and coupled Earth system models5,16,17. Superimposed on steady increases in accumulated anthropogenic carbon (Cant) and decreases in ocean pH, however, are various modes of temporal (e.g., diurnal, seasonal, interannual) and spatial (e.g., latitudinal, nearshore–offshore) variability, which are particularly pronounced in coastal ecosystems. This variability in coastal OA brings unique impacts to marine organisms that reside in coastal zones and are vulnerable to corrosive waters18.
Large marine ecosystems (LMEs) are ocean regions that border coastlines and are characterized by distinct bathymetry, hydrography, productivity, and trophic structure19. LMEs encompass estuaries and river mouths, nearshore coastal zones, continental shelves, and the outer margins of ocean current systems. Typically, the offshore boundary of an LME extends to the continental shelf break or to the seaward edge of a current system. Due to their coastal proximity, LMEs tend to be natural hotspots of variability in carbon cycling and rapid exchange between carbon pools. For example, intense surface primary productivity in the coastal ocean is fueled by nutrients from river input, atmospheric deposition, and coastal upwelling18; sinking organic matter from surface production leads to intense respiration throughout the water column and at the seafloor20; and high rates of sedimentation are observed in LMEs from both biogenic and lithogenic inputs21.
Ongoing anthropogenic climate drivers coupled with the natural processes occurring in coastal ecosystems make it challenging to attribute modes of OA variability to the appropriate driving mechanisms. For example, anthropogenic eutrophication from freshwater runoff and atmospheric pollution can augment natural nutrient inputs, leading to even greater net primary production in coastal surface waters and greater respiration in subsurface waters. Whereas the direct effect of CO2 uptake by primary producers mitigates OA at the surface, highly respired subsurface waters can be laterally transported and upwelled onto the continental shelf, leading to enhanced OA in the surface waters there18,22. These and other OA-modulating processes differ across ecosystems23, but their impacts are frequently correlated with environmental driver variables such as sea surface height, temperature, salinity, and chlorophyll-a concentration. These correlations allow OA metrics to be reconstructed from measurements and data products that are available at high spatial and temporal resolution11,24,25.
The data product described here is based on direct observations, which are used to reconstruct a recent history of surface ocean OA indicators at monthly, 0.25° resolution in U.S. LMEs. Observations are from a publicly available, annually updated database of surface CO2 observations: the Surface Ocean CO2 Atlas (SOCAT)26. SOCAT is an international data synthesis effort that has facilitated the production of global surface CO2 flux maps27 that contribute data-constrained estimates of the ocean CO2 sink in the Global Carbon Budget28. We also rely on publicly available satellite-derived surface ocean properties and data reanalysis products to leverage the predictive power of environmental variables for upscaling SOCAT observations across U.S. LMEs. This kind of spatiotemporal upscaling has historically been accomplished using statistical interpolations29,30, multiple linear regressions31, and machine learning approaches9,24,32,33. We build upon the approach of Sharp et al.24 — who presented a monthly surface ocean CO2 partial pressure (pCO2) mapped product for the California Current System region called RFR-CCS — to train random forest regression (RFR) algorithms to predict surface CO2 fugacity (fCO2) from environmental variables that can be derived with spatial and temporal continuity across U.S. LMEs.
We advance the Sharp et al.24 approach by first clustering each LME into sub-regions with similar environmental variability using Gaussian mixture modelling. In addition to fCO2, we predict surface total alkalinity (and nutrients) from empirical property estimation algorithms that have been validated and published34. We use fCO2 and total alkalinity (AT) to compute eight additional OA indicators — partial pressure of CO2, total dissolved inorganic carbon, pH on the total scale, hydrogen ion amount content, carbonate ion amount content, saturation states for aragonite and calcite, and the Revelle factor — to produce monthly data products over 1998–2022 on a 0.25° × 0.25° resolution grid. We refer to these data products as RFR-LMEs35, which are freely available online and will be updated annually. Throughout this paper, we will use the term “mapped data products” to describe RFR-LMEs; “mapping” refers to the reconstruction of OA indicators on monthly, spatially continuous grids via the two-step approach of clustering on regional variability and applying trained RFR algorithms to gridded predictor variables.
This work was partially motivated by a partnership with the National Oceanic and Atmospheric Administration (NOAA) Ecosystem Indicators Working Group (EIWG), who manage the National Marine Ecosystem Status (NaMES) website (https://ecowatch.noaa.gov). The NaMES website was created to provide an at-a-glance overview of conditions in U.S. LMEs. These conditions are presented as indicators, which are quantitative and/or qualitative measures of key components of the ecosystem and span the following categories: climatological (e.g., El Niño Southern Oscillation index), physical–chemical (e.g., sea surface temperature), biological (e.g., chlorophyll-a concentration), and human dimensions (e.g., coastal county population). Indicator datasets are used by many NOAA stakeholders, such as fisheries managers, to monitor their ecosystems of interest and to assess the potential for future changes. Indicators included on the NaMES website must be theoretically sound, have demonstrable importance to the system, be relevant and understandable, show sensitivity to environmental variability or policy actions, and complement other indicators that are already served. This paper will describe the theoretical basis of RFR-LMEs and their relevance to their respective ecosystems, to justify the use of RFR-LMEs as NaMES indicators of ocean acidification. The NaMES requirements also state that the data used to develop ecosystem indicators should be publicly available, quantitative, directly measurable, and updated on a regular basis; they stipulate that data should have adequate spatial coverage and that the time-series duration should be greater than 10 years and expected to continue for the foreseeable future. Because RFR-LMEs fit these requirements, we aggregate three mapped OA indicators from RFR-LMEs into monthly and annual regional averages of those indicators (and their uncertainties).
Timeseries of the selected OA indicators (pCO2, pH on the total scale, and aragonite saturation state) are available on the NaMES website and, like the RFR-LME mapped data products, will be updated annually.
Methods
An overview of the methodological procedure to create RFR-LMEs is provided in Fig. 1. First, data were obtained from a variety of sources and bin-averaged or interpolated onto a consistent grid. Then, within each LME, a two-step cluster–regression strategy was employed. In the first step, spatial clusters were created using Gaussian mixture models (GMMs) based on variability in environmental predictors. In the second step, random forest regression (RFR) algorithms were trained for each cluster using fCO2(SOCAT) as the target variable and co-located environmental variables as predictors. These algorithms were then applied to gridded (0.25° × 0.25°) monthly environmental predictor fields to create monthly RFR-LME mapped data products of sea surface CO2 fugacity (fCO2(RFR-LME)). Applying GMMs on surface data to first divide each LME into subregions reduces the burden on the RFRs to represent many different regimes of dynamic variability at once. Therefore, the RFR algorithms are able to reconstruct sea surface fCO2 more accurately than if all data points from the entire LME were included in the algorithm training36. To create RFR-LMEs for the other indicators, sea surface total alkalinity and nutrient values were estimated, and carbonate system calculations were performed. Uncertainties were propagated through these calculations to obtain uncertainty estimates for each RFR-LME. Finally, RFR-LMEs were evaluated against independent datasets.
Data sources
Surface ocean fCO2 observations were downloaded from the Surface Ocean CO2 Atlas Version 2023 (SOCATv2023; https://doi.org/10.25921/r7xa-bt92)37 in a large quadrangle surrounding North America and U.S. Pacific Islands with the following coordinates: 18°S to 82°N, 140°E to 58°W (Fig. 2). These observations were filtered by year (1998–2022), dataset flag (A, B, C, or D), and quality flag (q.f. = 2, good data), and binned into 0.25 degrees latitude by 0.25 degrees longitude monthly grid cells using platform-weighted averages. A spatial resolution of 0.25° × 0.25° was chosen largely for coherence with the majority of available predictor datasets. Platform-weighted averages mean that, within each latitude by longitude by month bin, a platform-specific (e.g., ship-only, mooring-only) average was first calculated, then an average was taken of those averages (if more than one platform was represented within the cell). This was done to mitigate unwanted biases toward high-resolution measurement systems. For validation exercises, this binning process was also repeated with only moored buoy observations and with a dataset that excluded moored buoy observations.
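The platform-weighted averaging described above can be illustrated with a minimal Python sketch. The record field names (`lat`, `lon`, `year`, `month`, `platform`, `fco2`) and the floor-based cell indexing are assumptions made for illustration; they are not the SOCAT file format or the code used to build RFR-LMEs.

```python
import math
from collections import defaultdict
from statistics import mean


def platform_weighted_bins(observations, resolution=0.25):
    """Bin observations into (lat, lon, year, month) cells using platform-weighted means.

    `observations` is an iterable of dicts with the hypothetical keys
    'lat', 'lon', 'year', 'month', 'platform', and 'fco2'.
    """
    # Group values by grid cell, then by platform within each cell.
    cells = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        i = math.floor(obs["lat"] / resolution)
        j = math.floor(obs["lon"] / resolution)
        key = (i, j, obs["year"], obs["month"])
        cells[key][obs["platform"]].append(obs["fco2"])

    # Average within each platform first, then average the platform means,
    # so a high-frequency platform cannot dominate the cell value.
    return {
        key: mean(mean(values) for values in by_platform.values())
        for key, by_platform in cells.items()
    }
```

In this sketch, a cell holding four mooring values of 400 μatm and one ship value of 300 μatm yields 350 μatm, whereas a simple mean of all five values would give 380 μatm.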
Binned observations were grouped into eleven LMEs defined according to the United States Exclusive Economic Zone (EEZ), in accordance with the practice of the NOAA EIWG (Table 1; Fig. 2). Platform-weighted fCO2 from SOCATv2023 observations (fCO2(SOCAT)) in each of these grid cells over time shows large-scale patterns of spatial variability (Fig. 2a), such as relatively high fCO2(SOCAT) at the equator and relatively low fCO2(SOCAT) surrounding Alaska compared to the region as a whole, and of temporal variability (Fig. 2b), such as relatively high standard deviation in fCO2(SOCAT) observations surrounding Alaska and near the coastlines of the continental U.S. compared to the relatively low standard deviation in these observations around the Pacific Islands. The distribution of the total number of months sampled within each 0.25° × 0.25° grid cell and the number of months of the year sampled at least once across the full dataset (1998–2022) within each grid cell reveal consistent patterns (Fig. 2c,d). The Northeast U.S. has especially high observational coverage (9.5% of all 0.25° × 0.25° monthly grid cells covered and 63.6% of seasonally binned 0.25° × 0.25° grid cells covered); the Southeast U.S. (4.0% total, 50.0% seasonal), Gulf of Mexico (5.2% total, 53.1% seasonal), Caribbean Sea (6.6% total, 40.2% seasonal), and California Current System (3.0% total, 42.1% seasonal) have moderately high observational coverage (Table 1). Observational coverage generally decreases farther offshore.
Next, gridded fields of satellite, reanalysis, and in situ observational products were downloaded from the sources detailed in Table 2. When applicable, these fields were re-gridded using standard interpolation functions to match the resolution and/or central grid cell positions of the binned fCO2(SOCAT) observations. In many cases, multiple datasets could be chosen, but preference was given to those that were provided at 0.25° resolution and that covered the relevant time and space. Rigorous comparison between different input datasets is planned for future development of RFR-LMEs as they are prepared for dynamic, operational production. Sea surface temperature (SST; Fig. 2e) and ice concentration were obtained from the NOAA Optimum Interpolation Sea Surface Temperature version 2 (OISSTv2) product at daily, 0.25° × 0.25° resolution38; values were averaged to monthly resolution. Sea surface salinity (SSS; Fig. 2f) and mixed layer depth (MLD) were obtained from the Copernicus Marine Environment Monitoring Service (CMEMS) Global Ocean Ensemble Physics Reanalysis (GLORYS) product at monthly, 0.25° resolution39. Sea surface height (SSH) was obtained from the CMEMS satellite gridded product, which is produced at monthly, 0.25° resolution by optimal interpolation of along-track measurements from available altimeter missions40. Sea surface chlorophyll (CHL) was obtained from the National Aeronautics and Space Administration (NASA) Ocean Colour Level-3 Mapped Chlorophyll Data product at monthly, 1/12° resolution and re-gridded to 0.25° resolution41,42. One-dimensional, linear interpolation was used within each grid cell to fill gaps in the chlorophyll dataset. Wind speed (Fig. 2g) was obtained from the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis for the global climate and weather (ERA5) at monthly, 0.25° resolution43. Bathymetry (Z, Fig. 2h) was obtained from the ETOPOv2022 Global Relief Model at 1/60° resolution and re-gridded to 0.25° resolution44. Sea level pressure (SLP) was obtained from the National Centers for Environmental Prediction/Department of Energy (NCEP/DOE) Reanalysis II model at monthly, 2.5° resolution and interpolated to 0.25° resolution45. Atmospheric pCO2 was obtained from the NOAA Marine Boundary Layer (MBL) product at weekly resolution and varying latitudinal resolution and was re-gridded to monthly, 0.25° resolution46. Binned observations of fCO2(SOCAT) were co-located in both time and space with the gridded predictors in preparation for algorithm training.
As one of several validation exercises, the pCO2 reconstructions from our method were compared to other mapped pCO2 data products downloaded from SeaFlux (v2021.04)47, which is an ensemble of six surface pCO2 products that enables users to calculate air–sea CO2 flux consistently across the global ocean27. SeaFlux harmonizes six data-based pCO2 products: CMEMS-FFNN9,48, MPI-SOMFFN33,49, and NIES-FNN50, which are each constructed using neural networks; JENA-MLS30, which is constructed based on a mixed layer scheme; JMA-MLR10, which is constructed using multiple linear regressions; and CSIR-ML636, which is constructed using an ensemble of multiple machine-learning techniques. In an additional exercise, RFR-LME mapped data products were evaluated through comparisons to co-located, independently calculated OA indicators from research cruises included in the Global Ocean Data Analysis Project database (GLODAPv2.2022)51 and the Coastal Ocean Data Analysis Project – North America database (CODAP-NA)52. Measurements of SST, SSS, AT, CT, and nutrients were obtained from the GLODAP and CODAP-NA databases, then filtered to retain only observations with good quality flags for each of those variables that were collected at a depth of 10 meters or less.
Spatial clustering
Three clustering methods were tested: self-organizing mapping, a neural-network-based method of producing a low-dimensional representation of a set of input data; k-means clustering, an iterative method that optimizes a defined number of centroids by minimizing the in-cluster distances from the centroid for a multidimensional dataset; and Gaussian mixture modelling (GMM)53, a method of clustering that assumes a multidimensional dataset is represented by a mixture of several Gaussian distributions with different properties. Of these methods, preliminary testing suggested GMM provided the best results in terms of k-fold cross-validated root-mean-square error (RMSE) in fCO2 (described in the following section) after RFRs were fit for each cluster. In addition, GMM clustering affords the benefit of providing probabilities that each spatiotemporal grid cell belongs within a given cluster instead of simply providing the cluster assignment to each grid cell as done in the other two clustering methods. These probabilities are used in our method to mitigate discontinuities at boundaries between clusters.
Variability (defined as the standard deviation within a spatial grid cell over time) in SLP, SST, and CHL were used as feature sets to form clusters in most LMEs; CHL was replaced with wind speed in two LMEs (BS and NBCS) due to insufficient CHL observations at high latitudes. The decisions to cluster based on variability over time instead of monthly values and to use the specified sets of variables were based on initial testing and optimization in terms of k-fold cross-validated RMSE in fCO2 (not shown). Future development of RFR-LMEs may continue to explore alternative clustering strategies.
GMM models with full, unshared covariance matrices were created using the MATLAB “fitgmdist” function. Full covariance matrices were used for GMM based on the a priori assumption that some of the predictor variables were correlated due to the nature of oceanographic environmental variables. Covariance matrices for GMM were unshared based on the a priori assumption that each spatial cluster had its own, different covariance matrix. The number of components (i.e., clusters; N in Table 1) was optimized, primarily by minimizing the k-fold cross-validated RMSE in fCO2, but also taking into account the Bayesian information criterion — a measure of model fit that includes a penalty for the number of clusters — and silhouette score — a measure of the accuracy of the clustering technique that is calculated by comparing each point’s similarity to the other points in its assigned cluster to how dissimilar it is to the points in the next nearest cluster (Fig. 3).
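The silhouette score described above can be illustrated with a short Python sketch. It assumes each grid cell has been given a single hard label (for example, its highest-probability GMM component) and uses Euclidean distance in feature space. Both choices are assumptions of this sketch, which is not the MATLAB routine used in this study.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for points (tuples of features) with hard labels.

    Requires at least two distinct labels.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to a neighboring cluster than to their own.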
Machine learning regressions
Once the numbers of spatial clusters were determined for each LME, random forest regressions (RFRs)54 were trained for each cluster within each LME using binned fCO2(SOCAT) as a target variable and each of the co-located gridded variables listed in Table 2 along with longitude (degrees east with a 0° to 360° convention), latitude, distance from the coast, month of the year (sine- and cosine-transformed to maintain cyclicity throughout the year and predictability within each month), and year as predictors. These variables were found to be useful predictors of fCO2 by Sharp et al.24.
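The sine and cosine transformation of month can be sketched as follows. The phase convention (January mapped to an angle of zero) is an assumption of this sketch; any fixed offset preserves the cyclic property.

```python
import math


def encode_month(month):
    """Return a (sine, cosine) pair for a month number from 1 to 12."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)
```

With this encoding, December and January are as close to one another in predictor space as January and February, and each month still maps to a unique pair of values.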
RFRs are a collection of regression “trees”, each of which is trained with a bootstrapped subset of the dataset. Each tree aims to generate a representation of the relationship between the predictor variables and the target variable for its bootstrapped subset of the data. This is done by splitting the data into a series of “branches” based on the predictors. At each branch point, only a random subset of the predictor variables is made available to the algorithm. The algorithm then optimally selects a predictor dataset and a specific value from that dataset on which to split the dataset into two additional branches/groups with the lowest possible within-group fCO2(SOCAT) variance. This continues until the branches become “leaves”, which means they are no longer split, either due to reaching a defined minimum leaf size or a certain criterion (e.g., variance of the remaining fCO2(SOCAT) observations). The use of an ensemble of regression trees constitutes the “forest” aspect of an RFR. The “randomness” aspect of the forest is due to the fact that each tree is constructed with different subsets of the full dataset and that different subsets of the predictors are available at each branch point, making it possible for each tree to provide a slightly different empirical regression for the dataset. New predictor data can be passed through each tree in the ensemble of a trained RFR, and an average of the values output from each tree is the fCO2 prediction (fCO2(RFR-LME)).
For each cluster, all grid cells with a GMM probability of greater than 10% for that cluster were used to train an RFR using the MATLAB “TreeBagger” function. This means that many grid cells on the geographic boundary between two or more clusters may have been used to train multiple RFRs. The number of trees used for each RFR was set to 1000, which was confirmed to be sufficient through visual inspection of the out-of-bag RMSE with respect to the number of trees (not shown). The minimum leaf size was set to three based on k-fold cross-validation testing, and the number of predictors used for each decision split was set to 6 (equal to the total number of predictors divided by three and rounded up to the nearest whole number).
To create an RFR-LME map of fCO2 for each LME, all the gridded predictor variables (0.25° × 0.25°, monthly) within the LME were run through each cluster-specific RFR. This produced N fCO2(RFR-LME) maps for each LME, where N is equal to the number of clusters. These maps were then merged as weighted average fCO2(RFR-LME) maps using the GMM probabilities as weights, which helped to smooth out discontinuities between clusters. Lastly, RFR-LME maps of fCO2 were converted to maps of pCO2 (pCO2(RFR-LME)) using SST and SLP55.
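The selection of training cells and the probability-weighted merge can be illustrated with a minimal Python sketch. The function names and the list-of-lists layout (one row per grid cell, one column per cluster) are hypothetical. Whether the merge uses all cluster probabilities or only those above the training threshold is not specified here, so this sketch assumes that all of them are used.

```python
def training_cells(probabilities, cluster, threshold=0.10):
    """Indices of grid cells whose GMM probability for `cluster` exceeds the threshold."""
    return [i for i, probs in enumerate(probabilities) if probs[cluster] > threshold]


def merge_cluster_maps(predictions, probabilities):
    """Probability-weighted average of cluster-specific predictions for each grid cell.

    predictions[i][c] is the prediction from the cluster-c regression at cell i;
    probabilities[i][c] is the GMM probability that cell i belongs to cluster c.
    """
    merged = []
    for preds, probs in zip(predictions, probabilities):
        total = sum(probs)
        merged.append(sum(w * x for w, x in zip(probs, preds)) / total)
    return merged
```

A cell with equal probability of belonging to two clusters contributes to the training of both regressions and receives the midpoint of their two predictions, which is how the weighting smooths the transition across a cluster boundary.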
Cross-validation was used to evaluate the skill of the fCO2(RFR-LME) estimates in each cluster and overall in each LME. This k-fold cross-validation was performed by sequentially withholding subsets of 20% of data, training versions of RFR algorithms with the remaining 80% of data, then, for each data point in the validation dataset, comparing the fCO2 obtained using the k-fold cross-validation algorithms (fCO2(RFR-LME-kFold)) to the observed fCO2(SOCAT) value. This procedure was repeated five times for each LME so all data points were included in the validation data once, producing ΔfCO2 values for each data point.
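The five-fold procedure can be sketched generically in Python. The `fit` and `predict` callables stand in for the regression model. The random assignment of points to folds and the sign convention for the residual (prediction minus observation) are assumptions of this sketch.

```python
import random


def k_fold_residuals(x, y, fit, predict, k=5, seed=0):
    """Return one out-of-fold residual (prediction minus observation) per data point."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets of about 1/k each

    residuals = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            residuals[i] = predict(model, x[i]) - y[i]
    return residuals


def rmse(residuals):
    """Root-mean-square error of a list of residuals."""
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5
```

Because the folds are disjoint and together cover the dataset, every observation is withheld exactly once and therefore receives exactly one residual.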
Alkalinity and nutrient estimation
Sea surface total alkalinity (AT), phosphate (PO4), and silicate (Si(OH)4) were estimated from gridded monthly fields of SSS and SST using Empirical Seawater Property Estimation Routines (ESPERs)34. ESPERs consist of both locally interpolated multiple linear regressions (ESPER-LIR) and feed-forward neural networks (ESPER-NN) trained to estimate seawater properties from a given set of input properties. Though ESPERs are global in nature, the regionally tuned ESPER-LIR coefficients and spatial coordinate predictors in ESPER-NNs mean that ESPERs function similarly to regional property estimation algorithms. ESPERs also provide the benefit of estimating uncertainty corresponding to each predicted value, allowing for the propagation of those uncertainties through downstream computations. The ESPER-Mixed routine (an average of both the ESPER-LIR and ESPER-NN approaches) was used for this study, due to assessment statistics that have indicated a lower global RMSE for the ESPER-Mixed approach (e.g., a global average RMSE of 3.7 μmol kg−1 for AT) compared to ESPER-LIR (4.0 μmol kg−1) and ESPER-NN (4.1 μmol kg−1) when producing property estimates from SSS and SST34.
Carbonate system calculations
CO2 system calculations were performed using CO2SYSv3 for MATLAB56 to determine additional ocean acidification (OA) indicators: dissolved inorganic carbon (CT(RFR-LME)), pH on the total scale (pHT(RFR-LME)), total hydrogen ion amount content ([H+]T(RFR-LME)), total carbonate ion amount content ([CO32−]T(RFR-LME)), saturation states for aragonite (Ωar(RFR-LME)) and calcite (Ωca(RFR-LME)), and Revelle factor (RF(RFR-LME)). These calculations were performed using well established thermodynamic equations describing the chemistry of carbon dioxide in seawater57,58. Input parameters to these equations were fCO2(RFR-LME), along with ESPER-estimated AT (AT(ESPER)), phosphate (PO4(ESPER)), and silicate (Si(OH)4(ESPER)). Carbonic acid dissociation constants from Lueker et al.59, the boric acid dissociation constant from Dickson60, the total boron to salinity ratio from Lee et al.61, the dissociation constant of water from Dickson62, and the hydrofluoric acid dissociation constant from Perez and Fraga63 were used in CO2 system calculations. Uncertainties were propagated through these calculations (see following section).
Uncertainty estimation
Uncertainties in RFR-LME maps of fCO2 were evaluated based on the previously described k-fold cross-validation approach. First, spatially gridded absolute values of ΔfCO2 from k-fold cross-validation were low-pass filtered (using 0.5° × 0.5° windows) two times in each LME to begin to fill nearby grid cells with uncertainty values. Then, nearest-neighbor interpolation was used to fill any remaining empty grid cells with data-based, spatially scaled uncertainty values (\({{\rm E}}_{{fCO}2(s)}\)). This approach only assesses the strength of the fit for the available training data. It is therefore prudent to assign greater uncertainties for periods and regions where training data are less abundant or absent. For this reason, the \({{\rm E}}_{{fCO}2(s)}\) values were further scaled over time by calculating two scaling factors specific to each LME, one representing the seasonal data coverage (using 3-month running means of the relative data coverage across the seasonal cycle) and another representing the relative annual data coverage (using 5-year running means of the relative data coverage across the timeseries).
The seasonal scaling factor (\({{\rm{\varepsilon }}}_{{seas}.}\)) was calculated as:
where my is the numbered month of the year (1–12), myref is the reference month of the year for each time step (1–12), nobs(my) is the number of grid cells with observations in the corresponding month of the year, ntot(my) is the total number of available grid cells in the corresponding month of the year, and MY is the total number of months considered within the window for each time step. Because January (1) comes after December (12), myref − 1 = 12 when myref = 1 and myref + 1 = 1 when myref = 12. The long-term scaling factor (\({{\rm{\varepsilon }}}_{{ann}.}\)) was calculated as:
where ms is the numbered month in the full series (1–300), msref is the reference month in the series for each time step (1–300), nobs(ms) is the number of grid cells with observations in the corresponding month of the series, ntot(ms) is the total number of available grid cells in the corresponding month, and MS is the total number of months considered within the window for each time step. Fewer months were considered within each window near the beginning and end of the time series. Finally, the estimated uncertainty of fCO2(RFR-LME) scaled spatially and temporally (i.e., seasonally and annually) was calculated as:
The window sizes of the scalers were selected to balance data coverage in each time window with realistic periods of time over which observational data may exhibit serial correlations.
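The running means of relative data coverage that underlie the two scaling factors can be sketched in Python as follows. The sketch computes only the windowed coverage (observed grid cells divided by available grid cells); how that coverage is converted into the scaling factors is not reproduced here. The centered long-term window with a half-width of 30 months is an assumption standing in for the 5-year window.

```python
def seasonal_coverage(n_obs, n_tot):
    """3-month circular running mean of relative coverage for months 1-12.

    n_obs and n_tot are 12-element sequences indexed from January to December.
    """
    ratio = [o / t for o, t in zip(n_obs, n_tot)]
    out = []
    for ref in range(12):
        # Wrap around the year so December and January are neighbors.
        window = [(ref - 1) % 12, ref, (ref + 1) % 12]
        out.append(sum(ratio[m] for m in window) / len(window))
    return out


def long_term_coverage(n_obs, n_tot, half_width=30):
    """Centered running mean of relative coverage, truncated at the series ends."""
    ratio = [o / t for o, t in zip(n_obs, n_tot)]
    n = len(ratio)
    out = []
    for ref in range(n):
        lo = max(0, ref - half_width)
        hi = min(n, ref + half_width + 1)
        out.append(sum(ratio[lo:hi]) / (hi - lo))  # fewer months near the ends
    return out
```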
Uncertainties in ESPER-estimated AT and nutrients were provided by the ESPER algorithms, which estimate uncertainty using a polynomial fit to salinity and depth. The ESPER algorithms are less skillful in the surface ocean where we use them than they are globally across all depths, and the uncertainty estimates are correspondingly greater at shallow depths.
The uncertainty estimates were propagated along with standard estimated total uncertainties in carbonate system constants (see Table 1 in Orr et al.64) to calculate uncertainty in mapped OA indicators. Gaussian uncertainty propagation was employed, using CO2SYSv3 for MATLAB56, which is based on uncertainty propagation code introduced in CO2SYSv2 by Orr et al.64.
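First-order Gaussian uncertainty propagation can be illustrated with a generic Python sketch. It assumes uncorrelated inputs and estimates the partial derivatives by central finite differences. It is not the CO2SYSv3 error routine, and the function and parameter names are hypothetical.

```python
import math


def propagate_uncertainty(func, values, sigmas, rel_step=1e-6):
    """Standard uncertainty of func(*values), assuming uncorrelated inputs.

    Combines (d func / d x_i * sigma_i) in quadrature, with each partial
    derivative estimated by a central finite difference.
    """
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, sigmas)):
        h = abs(v) * rel_step or rel_step  # fall back to rel_step when v is zero
        up = list(values)
        up[i] = v + h
        down = list(values)
        down[i] = v - h
        derivative = (func(*up) - func(*down)) / (2.0 * h)
        variance += (derivative * s) ** 2
    return math.sqrt(variance)
```

For a product of two inputs, 2 ± 0.1 and 3 ± 0.2, the sketch gives an uncertainty of 0.5, which matches the analytical result sqrt((3 × 0.1)² + (2 × 0.2)²).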
Validation and evaluation
The skill of the RFR-LME maps was evaluated through comparisons with co-located OA indicators independently calculated from the ship-based GLODAPv2.2022 and CODAP-NA measurements described above. OA indicators were computed at in situ temperature from the AT and CT observations using CO2SYSv3 for MATLAB56 and the same equilibrium constants as before. Although the GLODAPv2.2022 and CODAP-NA databases also include pHT and pCO2 measurements, they are not as widespread as AT and CT measurements, so we chose to calculate all indicators from AT and CT for evaluation. Each observation was then co-located with the corresponding RFR-LME grid cell and compared.
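Co-locating a discrete observation with a monthly 0.25° grid cell can be sketched in Python as follows. The grid origin (90°S, 0°E), the 0° to 360° longitude convention, and the function name are assumptions of this sketch and do not describe the layout of the published NetCDF files.

```python
import math


def grid_index(lat, lon, year, month, resolution=0.25, start_year=1998):
    """Return (row, column, time step) of the grid cell containing an observation."""
    lon360 = lon % 360.0  # convert -180..180 longitudes to 0..360
    row = math.floor((lat + 90.0) / resolution)
    col = math.floor(lon360 / resolution)
    step = (year - start_year) * 12 + (month - 1)  # zero-based month in the series
    return row, col, step
```

Under these assumptions, an observation from December 2022 falls in time step 299, the last of the 300 monthly steps spanning 1998–2022.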
In addition, RFR-LME maps were compared to global mapped data products of sea surface pCO2 obtained from SeaFlux (v2021.04)47. Long-term averages of pCO2 from RFR-LME maps and SeaFlux maps were computed across the overlapping time periods of both products (i.e., 1998–2019). Mapped differences between RFR-LME and each SeaFlux ensemble member, as well as an average across the ensemble, were computed and compared.
Finally, observations of pCO2 at fixed buoy locations were compared to pCO2 from RFR-LME data products at grid cells corresponding to those moored buoy observations. For this exercise, special-case RFR-LME maps were created by training RFRs on gridded fCO2(SOCAT) data with buoy observations excluded, then using those algorithms to construct the maps. Comparing pCO2 mapped from datasets both with and without the underlying buoy observations allowed for evaluation of the influence that those seasonally resolved observations have on the fidelity of the pCO2 reconstruction. pCO2 values extracted from the mapped SeaFlux datasets were also included in this comparison, allowing for separate evaluation of how the LME-scale, 0.25° × 0.25° monthly reconstructions compare to global 1° × 1° monthly reconstructions.
Data Records
RFR-LME maps can be accessed through the NOAA National Centers for Environmental Information (NCEI) via the Ocean Carbon and Acidification Data System (OCADS; https://doi.org/10.25921/h8vw-e872)35. The dataset is available in NetCDF format on 0.25° × 0.25° spatial grids at monthly timesteps. Each mapped OA indicator and its uncertainty are provided in a separate NetCDF file, along with a reference grid that indicates to which LME each spatial grid cell belongs. Additionally, regional timeseries for CO2 partial pressure, calcium carbonate saturation state, and pH are displayed at the NOAA National Marine Ecosystem Status website (https://ecowatch.noaa.gov). Average values, trends, seasonal amplitudes, and uncertainty estimates of ocean acidification indicators from RFR-LMEs vary considerably among the regions (Tables 3–6).
Long-term means (Table 3; Fig. 4) allow for the description of LME-scale patterns in surface ocean carbonate chemistry. Tropical LMEs (PI and CS) are characterized most notably by high carbonate ion parameters ([CO32−]T, Ωar, and Ωca) and low RF values. Within this pair, the CS can be described as more acidified (higher pCO2 and lower pHT) but better buffered (lower RF and higher AT/CT ratio). Subtropical Atlantic LMEs (GM and SE) also have high carbonate ion parameters ([CO32−]T, Ωar, and Ωca) and low RF values. Compared to the tropical LMEs, however, subtropical Atlantic LMEs have higher CT and AT values, although AT/CT ratios and therefore RFs are similar between the two groups. Temperate and subarctic coastal LMEs (CCS, GA, and NE) can generally be considered intermediate in all parameters: pCO2, pH, carbonate ion parameters, CT, AT, and RF. Within the group, the GA has the highest RF and lowest carbonate ion parameters, the NE has the lowest RF and highest carbonate ion parameters, and the CCS is between the two. Subarctic North Pacific LMEs (AI and EBS) are characterized by high CT, pCO2, and RF and by low pHT and carbonate ion parameters. Arctic LMEs (NBCS and BS) are characterized by high pHT and RF and by low AT, pCO2, and carbonate ion parameters.
Spatial variability in OA indicators is evident within each LME and throughout the seasonal cycle (Fig. 5). For example, the CCS develops a strong dipole in the summer (June/July/August; Fig. 5c), with low CT off the coast in the northern CCS and high CT off the coast in the central CCS. This dipole becomes much weaker in the winter (December/January/February; Fig. 5a). Similarly, relatively low CT occurs off the coast in the northern NE region in the summer but disappears in the winter (Fig. 5c). The southern continental Alaskan coastline exhibits low CT, especially nearshore in the summer, whereas the northern Alaskan coastline is relatively higher in CT than nearby offshore waters in the Arctic Ocean. A band of relatively low CT is evident from about 10° to 20° N in the PI region, between higher CT in the equatorial Pacific and North Pacific subtropical gyre, a feature that has appeared in other sea surface CT data products6.
Mapped indicator uncertainties (see Fig. 6) are served alongside RFR-LME maps35, providing a resource for evaluating uncertainty in OA indicator values at a given location. Area-weighted mean u[pCO2(RFR-LME)] was 12.0 μatm across the entire domain, u[pHT(RFR-LME)] was 0.015, and u[Ωar(RFR-LME)] was 0.18. These domain-wide means are influenced by the large area and low uncertainties in the Pacific Islands region; individual LME uncertainties, particularly in Arctic and subarctic LMEs, may be considerably larger. Spatial patterns of uncertainties also differ for different OA indicators. For example, u[Ωar(RFR-LME)] tends to be relatively high in the tropical LMEs (Fig. 6d), where Ωar(RFR-LME) is also high (Fig. 4e); on the other hand, u[pCO2(RFR-LME)] is extremely low in the tropical LMEs (Fig. 6b).
Uncertainty values reflect not only uncertainty in the RFR predictions, but also uncertainty introduced by interpolating over spatial and temporal gaps in observational coverage. Average uncertainty values for each LME are presented alongside OA indicator timeseries on the NOAA NaMES website. Importantly, the uncertainty values provided in Table 6 and on the NaMES website represent weighted means of grid-cell-level uncertainties rather than uncertainties of the region-wide averages. Uncertainties of the region-wide averages may be smaller, because some errors cancel under areal averaging, or larger, because our spatiotemporal scaling approach may not adequately represent uncertainties in under-sampled times and locations.
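As an illustration of how a weighted mean of grid-cell-level values can be formed, the following minimal sketch weights each cell by the cosine of its latitude, which is proportional to cell area on a regular latitude-longitude grid. The weighting scheme, the function name area_weighted_mean, and the example values are assumptions made for illustration; they are not taken from the RFR-LME code or data.

```python
import math

def area_weighted_mean(lats_deg, values):
    """Mean of grid-cell values weighted by cos(latitude).

    Assumption: cells lie on a regular latitude-longitude grid, so cell area
    is proportional to the cosine of the cell-center latitude. Cells whose
    value is None (e.g., land or ice-covered cells) are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, val in zip(lats_deg, values):
        if val is None:
            continue
        w = math.cos(math.radians(lat))
        weighted_sum += w * val
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("no valid grid cells")
    return weighted_sum / weight_total

# Hypothetical grid-cell uncertainties (uatm) at three cell-center latitudes
lats = [0.125, 30.125, 60.125]
u_pco2 = [5.0, 10.0, 30.0]

weighted = area_weighted_mean(lats, u_pco2)
unweighted = sum(u_pco2) / len(u_pco2)
print(f"area-weighted: {weighted:.1f}, unweighted: {unweighted:.1f}")
```

In this toy case the low-latitude cell carries the most weight, so the weighted mean falls below the simple mean, which mirrors the way a large, low-uncertainty tropical region can pull a domain-wide mean downward.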
Technical Validation
Data-based validation
A k-fold cross-validation approach was used to assess the skill of the fCO2 estimates and subsequent OA indicator calculations. Region-wide error statistics for each of the eleven LMEs (before the spatial and temporal scaling) indicate that fCO2(RFR-LME-kFold) values are centered around the measured values of fCO2(SOCAT) (mean and median errors are all close to zero) and tend to correlate closely with them (nine of the eleven R2 values are 0.8 or greater; Table 7). Root mean square errors (RMSEs) are generally about three times larger than median absolute errors, indicating error populations with long tails of a few particularly large errors. When viewed spatially (Fig. 6a), absolute differences (|ΔfCO2|) between fCO2(RFR-LME-kFold) and fCO2(SOCAT) are greatest near the coast and in the North Pacific and Arctic, and smallest in the open ocean and in the tropics and subtropics. High |ΔfCO2| values tend to correlate with areas of high background variability in fCO2(SOCAT) (Fig. 2b), emphasizing that the RFR algorithms may struggle to capture extreme values, which is consistent with the aforementioned long-tailed error populations.
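The error statistics named above can be sketched as follows. This is a minimal illustration rather than the validation code used for RFR-LME: the function name error_stats and the example values are hypothetical, and R2 is assumed here to be the squared Pearson correlation between predicted and observed values.

```python
import math
import statistics

def error_stats(predicted, observed):
    """Summary statistics for prediction errors (predicted minus observed)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)

    # Squared Pearson correlation (assumed definition of R2 for this sketch)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)

    return {
        "mean_error": sum(errors) / n,
        "median_error": statistics.median(errors),
        "rmse": math.sqrt(sum(e ** 2 for e in errors) / n),
        "median_abs_error": statistics.median(abs_errors),
        "r2": cov ** 2 / (var_p * var_o),
    }

# Hypothetical fCO2 values (uatm); the last prediction has one large error
observed = [350.0, 360.0, 370.0, 380.0, 390.0, 400.0, 410.0, 420.0]
predicted = [351.0, 359.0, 372.0, 378.0, 391.0, 399.0, 410.0, 460.0]

stats = error_stats(predicted, observed)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A single large error barely moves the median absolute error but inflates the RMSE, which is why a large RMSE-to-median-absolute-error ratio points to a long-tailed error population.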
Comparison to global trends
RFR-LME indicator timeseries (1998–2022) represent spatially weighted annual averages of OA indicators computed from RFR-LME maps. Increasing pCO2(RFR-LME) and decreasing pHT(RFR-LME) are observed in each LME (Figs. 7, 8); these trends are strongly influenced by anthropogenic CO2 uptake and amplified by ocean warming (Table 8). Ωar(RFR-LME) decreases in many (but not all) LMEs over 1998–2022 (Fig. 9), as Ωar decline is driven by anthropogenic CO2 uptake as well, but moderated by ocean warming and also influenced by changes in SSS (Table 8). Trends in OA indicators across U.S. LMEs (Table 4) can be compared with global trends of about +1.5 μatm yr–1 for pCO2 (+0.3 to +1.8 μatm yr–1 for RFR-LMEs), +0.9 μmol kg–1 yr–1 for CT (–0.2 to +1.0 μmol kg–1 yr–1 for RFR-LMEs), –1.7·10–3 units yr–1 for pHT (–1.8·10–3 to –0.2·10–3 yr–1 for RFR-LMEs), and –7.0·10–3 yr–1 for Ωar1,4,6,10 (–7.3·10–3 to +2.6·10–3 yr–1 for RFR-LMEs).
It is important to note that, for some of the Arctic and subarctic LMEs that are characterized by high seasonal ice coverage, these trends are driven primarily by summertime OA indicator values (see inset plots in Figs. 7–9). This limitation, along with the fact that these timeseries are relatively short (25 years) and regionally limited, can explain divergence in some specific cases from the global trends.
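To make the notion of a trend in an annual timeseries concrete, the sketch below fits an ordinary least-squares line to annual means. The fitting method used for the RFR-LME trends is not restated in this section, so least squares is an assumption here, and the function name linear_trend and the synthetic series are purely illustrative.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic annual-mean pCO2 (uatm): a 1.5 uatm/yr rise plus an
# alternating +/- 2 uatm perturbation standing in for interannual noise
years = list(range(1998, 2023))
pco2 = [360.0 + 1.5 * (y - 1998) + (2.0 if y % 2 == 0 else -2.0) for y in years]

slope, intercept = linear_trend(years, pco2)
print(f"trend: {slope:.2f} uatm per year")
```

For ice-covered regions, the same fit applied to a series built mostly from summertime values would describe the summertime trend rather than the full annual trend, which is the caveat raised above.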
Comparison to discrete shipboard data
The RFR-LME fields presented in this work are constructed using surface CO2 measurements from shipboard flow-through analyzers. This automated observational approach allows for the collection of high spatial and temporal resolution observations of surface ocean carbonate chemistry. Discrete bottle measurements of carbonate chemistry parameters represent another approach for monitoring ocean acidification. The discrete approach allows for high-quality observations throughout the water column. Here we take near-surface discrete bottle measurements of AT and CT from GLODAPv2.202251 and CODAP-NA52, use those measurements to calculate OA indicators, and compare those calculated values with mapped surface OA indicators from RFR-LMEs.
RFR-LME indicator values are generally in good agreement with calculations from discrete bottle measurements (Fig. 10). Compared to the k-fold-validation-based uncertainty estimates (Table 6), a greater spread (i.e., larger IQRs) in the differences between GLODAP/CODAP and RFR-LME values is expected in this exercise for two reasons. First, uncertainty stemming from CO2 system calculations will contribute to the spread (e.g., Orr et al.64), since GLODAP/CODAP indicator values are calculated from AT and CT and RFR-LME indicator values are calculated from fCO2 and AT. As an example, average propagated uncertainties for GLODAP/CODAP calculations using standard measurement errors for AT and CT (2 μmol kg−1 for both) and for equilibrium constants64 were calculated as 12.0 μatm for pCO2, 0.014 for pHT, and 0.11 for Ωar. Second, the two datasets are fundamentally different in their spatiotemporal resolution. RFR-LME grid cells represent averages for large swaths of the surface ocean over a monthly timestep, whereas shipboard measurements are appropriate for a distinct point in space at a distinct time. This spatiotemporal mismatch is especially noteworthy in the coastal ocean where diurnal and other sub-monthly modes of variability operate over spatial scales much finer than 0.25 degrees of latitude or longitude. The calculations from bottle measurements also tend to indicate higher pCO2 and therefore lower pHT and Ωar. These offsets between the two datasets may be partly related to inconsistencies in carbonate chemistry calculations, whereby calculations from AT and CT at most surface conditions tend to produce lower pHT (and higher pCO2) values than corresponding measurements of those properties65,66.
Comparison to moored buoy time series data
Timeseries of pCO2 from fixed grid cells of RFR-LME maps and RFR-LME maps constructed without moored buoy observations (RFR-LME-NM) were compared to pCO2 observations at fixed buoy locations that were extracted from the SOCATv2023 database and aggregated in monthly bins. This provides a test of the capacity of RFR-LMEs to reproduce monthly variability in validation measurements that were withheld from training, and can be considered an assessment of the RFR-LME skill with monthly variability generally. Mapped global data products of surface ocean carbonate chemistry obtained from SeaFlux27,47 were also compared to the moored buoy observations.
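Aggregating high-frequency buoy observations into monthly bins can be sketched as below. The use of a simple arithmetic mean within each bin is an assumption, since the aggregation statistic is not specified here, and the function name monthly_bin_means and the sample observations are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

def monthly_bin_means(times, values):
    """Average observations within each (year, month) bin.

    Returns a dict keyed by (year, month), ordered chronologically.
    """
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[(t.year, t.month)].append(v)
    return {key: fmean(vals) for key, vals in sorted(bins.items())}

# Hypothetical buoy pCO2 observations (uatm)
times = [datetime(2015, 1, 3), datetime(2015, 1, 20), datetime(2015, 2, 10)]
values = [400.0, 410.0, 390.0]

monthly = monthly_bin_means(times, values)
for (year, month), mean_pco2 in monthly.items():
    print(f"{year}-{month:02d}: {mean_pco2:.1f}")
```

Each monthly bin can then be paired with the value of the mapped product at the grid cell containing the buoy for the same year and month.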
Differences between moored buoy observations and mapped products (Table 9; Fig. 11) suggest that timeseries extracted from our regionally focused RFR-LME maps more meaningfully reflect observed pCO2 than those from mapped global products. Like RFR-LME, most of these alternative products were trained from versions of SOCAT that include the buoy observations. The average median ( ± IQR) ΔpCO2 (pCO2(moor.) – pCO2(grid)) was 0.1 ± 23.6 μatm for RFR-LME and increased to −13.0 ± 49.2 μatm for the RFR-LME-NM product, which excluded these observations from the training data. These increased error statistics emphasize the value of moored buoy observations for the surface CO2 observing system. Still, all but one (JENA-MLS; ΔpCO2 = −2.2 ± 48.7) of the mapped data products from SeaFlux exhibited more variability in their differences from buoy observations than even the version of RFR-LME constructed without moored buoy observations. JENA-MLS may perform better at representing pCO2 at these mooring sites because it explicitly models mixed layer fluxes and processes rather than relying on empirical relationships learned from large sets of data.
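The median and interquartile range (IQR) of buoy-minus-grid differences can be computed as in the following minimal sketch. The percentile convention (linear interpolation between order statistics) is an assumption, as are the function names and the example values; other conventions give slightly different IQRs for small samples.

```python
import statistics

def percentile(sorted_vals, q):
    """Percentile (q in [0, 100]) of pre-sorted values, linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + frac * (sorted_vals[upper] - sorted_vals[lower])

def median_and_iqr(buoy, gridded):
    """Median and IQR of buoy minus gridded values, skipping missing pairs."""
    diffs = sorted(
        b - g for b, g in zip(buoy, gridded) if b is not None and g is not None
    )
    med = statistics.median(diffs)
    iqr = percentile(diffs, 75) - percentile(diffs, 25)
    return med, iqr

# Hypothetical monthly pCO2 values (uatm) at one mooring site
buoy = [400.0, 410.0, 420.0, 430.0, 440.0, None]
grid = [398.0, 412.0, 415.0, 431.0, 450.0, 405.0]

med, iqr = median_and_iqr(buoy, grid)
print(f"median difference: {med:.1f} uatm, IQR: {iqr:.1f} uatm")
```

A median near zero with a small IQR indicates that a mapped product tracks the buoy record closely, whereas a wider IQR indicates more scatter in the monthly differences.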
Individual timeseries from moored buoy sites (Fig. 12) emphasize the significant seasonal and interannual variability in buoy pCO2 observations (black dots), even when aggregated in monthly bins, and the challenge for mapped products (colored lines) to accurately capture each of those variations at a local scale. The performance of the regional RFR-LME maps compared to the global mapped products reinforces the notion that locally specific relationships captured by training machine learning algorithms at the scale of objectively defined clusters within LMEs can resolve fine-scale variations in ocean biogeochemistry more effectively than global-scale algorithms, even though those global-scale algorithms are trained with a larger amount of data24,67. Positive trends in pCO2 superimposed upon seasonal variations are visible in both the moored buoy observations and mapped data product timeseries (Fig. 12).
Comparison to mapped data products
Finally, RFR-LME surface pCO2 was compared directly to the six global-scale mapped products of pCO2 from SeaFlux across the overlapping interval between them (1998–2019). Maps of average surface pCO2 display similar patterns across all six SeaFlux products, but differences between those products and RFR-LME (ΔpCO2 = pCO2(SeaFlux) − pCO2(RFR-LME)) reveal subtle regional differences (Fig. 13). SeaFlux provides a pCO2 filler field derived from Landschützer et al.68 to fill spatial gaps in global surface products; this gap filler is not used to produce the difference maps displayed in Fig. 13. However, for spatial consistency, it is used to calculate the averages and standard deviations of the differences for each data product shown in Fig. 13.
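The difference calculation ΔpCO2 = pCO2(SeaFlux) − pCO2(RFR-LME) for individual products and for an ensemble average can be sketched as follows. The sketch assumes that all maps have already been averaged over the common period and placed on a common grid, and it represents each map as a flat list of grid cells with None for missing values. The product names, function names, and values are hypothetical.

```python
def difference_map(product, reference):
    """Cell-by-cell product minus reference; None where either is missing."""
    return [
        None if p is None or r is None else p - r
        for p, r in zip(product, reference)
    ]

def ensemble_mean_map(maps):
    """Cell-by-cell mean across maps, ignoring missing cells."""
    out = []
    for cells in zip(*maps):
        valid = [c for c in cells if c is not None]
        out.append(sum(valid) / len(valid) if valid else None)
    return out

# Hypothetical long-term mean pCO2 (uatm) on a shared four-cell grid
rfr_lme = [380.0, 400.0, None, 360.0]
seaflux = {
    "product_a": [382.0, 395.0, 370.0, None],
    "product_b": [378.0, 401.0, 372.0, 364.0],
}

# Difference for each ensemble member, then for the ensemble average
member_diffs = {name: difference_map(m, rfr_lme) for name, m in seaflux.items()}
ensemble_diff = difference_map(ensemble_mean_map(list(seaflux.values())), rfr_lme)
print(member_diffs)
print(ensemble_diff)
```

Whether gaps in an individual product are left empty or filled before differencing changes which cells contribute to summary statistics, which is why the treatment of the gap filler matters for spatial consistency.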
In the tropical Pacific, RFR-LME maps agreed well with all products except NIES-FNN, which shows a prevailing negative bias. In the Atlantic, RFR-LME maps generally agreed well with the SeaFlux products, with visible biases in the Mississippi plume (CSIR-ML6), Georges Bank (JMA-MLR), the Caribbean (JENA-MLS), and throughout the region (NIES-FNN). Coastal negative biases are visible for most products in the central CCS region, and coastal positive biases are visible in the northern CCS region. Both positive and negative biases occur in the regions surrounding Alaska, where low observational density likely leads to significant diversity in pCO2 estimates among the gap-filling approaches.
Despite these regional discrepancies with some individual products, the median (±1 IQR) ΔpCO2 for the ensemble average of all six SeaFlux products is 0.8 ± 16.6 μatm. This indicates that RFR-LME, which represents local-scale temporal variability in surface pCO2 more effectively than global products (Table 9; Fig. 11), agrees at broad scales with observation-based products that are well accepted and widely used by community-wide synthesis efforts such as the Global Carbon Budget2 and REgional Carbon Cycle Assessment and Processes Project (RECCAP2)69.
Code availability
Code for accessing and processing the data discussed in this study is freely available on GitHub (https://github.com/jonathansharp/US-RFR-LMEs). Code was written in MATLAB version R2022a. Parameters used to generate and validate the current dataset are described throughout the Methods section and are listed in Table 2.
References
IPCC et al. Changing Ocean, Marine Ecosystems, and Dependent Communities. IPCC Special Report on the Ocean and Cryosphere in a Changing Climate 447–588 (2019).
Friedlingstein, P. et al. Global Carbon Budget 2023. Earth System Science Data 15, 5301–5369 (2023).
Feely, R. A. et al. Acidification of the Global Surface Ocean: What We Have Learned from Observations. Oceanography 36, 120–129 (2023).
Ma, D., Gregor, L. & Gruber, N. Four Decades of Trends and Drivers of Global Surface Ocean Acidification. Global Biogeochemical Cycles 37, e2023GB007765 (2023).
Jiang, L.-Q. et al. Global Surface Ocean Acidification Indicators From 1750 to 2100. Journal of Advances in Modeling Earth Systems 15, e2022MS003563 (2023).
Gregor, L. & Gruber, N. OceanSODA-ETHZ: a global gridded data set of the surface ocean carbonate system for seasonal to decadal studies of ocean acidification. Earth System Science Data 13, 777–808 (2021).
Sutton, A. J. et al. Autonomous seawater pCO2 and pH time series from 40 surface buoys and the emergence of anthropogenic trends. Earth System Science Data 11, 421–439 (2019).
Bates, N. et al. A time-series view of changing ocean chemistry due to ocean uptake of anthropogenic CO2 and ocean acidification. Oceanography 27, 126–141 (2014).
Denvil-Sommer, A., Gehlen, M., Vrac, M. & Mejia, C. LSCE-FFNN-v1: A two-step neural network model for the reconstruction of surface ocean pCO2 over the global ocean. Geoscientific Model Development 12, 2091–2105 (2019).
Iida, Y., Takatani, Y., Kojima, A. & Ishii, M. Global trends of ocean CO2 sink and ocean acidification: an observation-based reconstruction of surface ocean inorganic carbon variables. J Oceanogr 77, 323–358 (2021).
Laruelle, G. G. et al. Global high-resolution monthly pCO2 climatology for the coastal ocean derived from neural network interpolation. Biogeosciences 14, 4545–4561 (2017).
Roobaert, A., Regnier, P., Landschützer, P. & Laruelle, G. G. A novel sea surface pCO2-product for the global coastal ocean resolving trends over the 1982–2020 period. Earth System Science Data Discussions 1–32, https://doi.org/10.5194/essd-2023-228 (2023).
Byrne, R. H., Mecking, S., Feely, R. A. & Liu, X. Direct observations of basin-wide acidification of the North Pacific Ocean. Geophysical Research Letters 37, 1–5 (2010).
Carter, B. R. et al. Pacific Anthropogenic Carbon Between 1991 and 2017. Global Biogeochemical Cycles 33, 597–617 (2019).
Caldeira, K. & Wickett, M. E. Anthropogenic carbon and ocean pH. Nature 425, 365 (2003).
Bopp, L. et al. Multiple stressors of ocean ecosystems in the 21st century: Projections with CMIP5 models. Biogeosciences 10, 6225–6245 (2013).
Kwiatkowski, L. et al. Twenty-first century ocean warming, acidification, deoxygenation, and upper-ocean nutrient and primary production decline from CMIP6 model projections. Biogeosciences 17, 3439–3470 (2020).
Feely, R. A. et al. Chemical and biological impacts of ocean acidification along the west coast of North America. Estuarine, Coastal and Shelf Science 183, 260–270 (2016).
National Marine Fisheries Service. Large marine ecosystems of the world: an annotated bibliography. https://doi.org/10.7289/V5/TM-NMFS-F/SPO-167 (2016).
Dai, M. et al. Carbon Fluxes in the Coastal Ocean: Synthesis, Boundary Processes, and Future Trends. Annu. Rev. Earth Planet. Sci. 50, 593–626 (2022).
Mackenzie, F., Andersson, A., Lerman, A. & Ver, L. Boundary exchanges in the global coastal margin: implications for the organic and inorganic carbon cycles. The sea 13, 193–225 (2005).
Hauri, C. et al. Spatiotemporal variability and long-term trends of ocean acidification in the California Current System. Biogeosciences 10, 193–216 (2013).
Laruelle, G. G., Lauerwald, R., Pfeil, B. & Regnier, P. Regionalized global budget of the CO2 exchange at the air-water interface in continental shelf seas. Global Biogeochemical Cycles 28, 1199–1214 (2014).
Sharp, J. D., Fassbender, A. J., Carter, B. R., Lavin, P. D. & Sutton, A. J. A monthly surface pCO2 product for the California Current Large Marine Ecosystem. Earth System Science Data 14, 2081–2108 (2022).
Chau, T.-T.-T., Gehlen, M., Metzl, N. & Chevallier, F. CMEMS-LSCE: A global 0.25-degree, monthly reconstruction of the surface ocean carbonate system. Earth System Science Data Discussions 1–52, https://doi.org/10.5194/essd-2023-146 (2023).
Bakker, D. C. E. et al. A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT). Earth System Science Data 8, 383–413 (2016).
Fay, A. R. et al. SeaFlux: harmonization of air–sea CO2 fluxes from surface pCO2 data products using a standardized approach. Earth System Science Data 13, 4693–4710 (2021).
Friedlingstein, P. et al. Global Carbon Budget 2022. Earth System Science Data 14, 4811–4900 (2022).
Takahashi, T. et al. Climatological mean and decadal change in surface ocean pCO2, and net sea–air CO2 flux over the global oceans. Deep-Sea Research Part II 56, 24 (2009).
Rödenbeck, C. et al. Interannual sea-air CO2 flux variability from an observation-driven ocean mixed-layer scheme. Biogeosciences 11, 4599–4613 (2014).
Iida, Y. et al. Trends in pCO2 and sea–air CO2 flux over the global open oceans for the last two decades. Journal of Oceanography 71, 637–661 (2015).
Landschützer, P. et al. A neural network-based estimate of the seasonal to inter-annual variability of the Atlantic Ocean carbon sink. Biogeosciences 10, 7793–7815 (2013).
Landschützer, P., Gruber, N., Bakker, D. C. E. & Schuster, U. Recent variability of the global ocean carbon sink. Global Biogeochemical Cycles 28, 927–949 (2014).
Carter, B. R. et al. New and updated global empirical seawater property estimation routines. Limnology and Oceanography: Methods https://doi.org/10.1002/lom3.10461 (2021).
Sharp, J. D. et al. RFR-LME Ocean Acidification Indicators from 1998 to 2022 (NCEI Accession 0287551). https://doi.org/10.25921/H8VW-E872 (2024).
Gregor, L., Lebehot, A. D., Kok, S. & Scheel Monteiro, P. M. A comparative assessment of the uncertainties of global surface ocean CO2 estimates using a machine-learning ensemble (CSIR-ML6 version 2019a)-Have we hit the wall? Geoscientific Model Development 12, 5113–5136 (2019).
National Centers for Environmental Information. Surface Ocean CO2 Atlas Database Version 2023 (SOCATv2023) (NCEI Accession 0278913).
Huang, B. et al. Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version 2.1. Journal of Climate 34, 2923–2939 (2021).
European Union - Copernicus Marine Service. Global Ocean Ensemble Physics Reanalysis. Mercator Ocean International https://doi.org/10.48670/MOI-00024 (2019).
European Union - Copernicus Marine Service. Global Ocean Gridded L4 Sea Surface Heights And Derived Variables Reprocessed (1993-Ongoing). Mercator Ocean International https://doi.org/10.48670/MOI-00148 (2021).
NASA Ocean Biology Processing Group. Aqua MODIS Level 3 Mapped Chlorophyll Data, Version R2022.0. NASA Ocean Biology Distributed Active Archive Center https://doi.org/10.5067/AQUA/MODIS/L3M/CHL/2022 (2022).
NASA Ocean Biology Processing Group. OrbView-2 SeaWiFS Global Mapped Chlorophyll (CHL) Data, Version R2022.0. https://doi.org/10.5067/ORBVIEW-2/SEAWIFS/L3M/CHL/2022 (2022).
Copernicus Climate Change Service. ERA5 monthly averaged data on single levels from 1979 to present. ECMWF https://doi.org/10.24381/CDS.F17050D7 (2019).
NOAA National Centers for Environmental Information. ETOPO 2022 15 Arc-Second Global Relief Model. NOAA National Centers for Environmental Information https://doi.org/10.25921/FD45-GT74 (2022).
Kanamitsu, M. et al. NCEP–DOE AMIP-II Reanalysis (R-2). Bull. Amer. Meteor. Soc. 83, 1631–1644 (2002).
NOAA ESRL GML CCGG Group. Earth System Research Laboratory Carbon Cycle and Greenhouse Gases Group Flask-Air Sample Measurements of CO2 at Global and Regional Background Sites, 1967-Present. NOAA ESRL GML CCGG Group https://doi.org/10.15138/WKGJ-F215 (2019).
Gregor, L. et al. SeaFlux v2023: harmonised sea-air CO2 fluxes from surface pCO2 data products using a standardised approach. Zenodo https://doi.org/10.5281/ZENODO.8280457 (2023).
Chau, T., Gehlen, M. & Chevallier, F. Global Ocean Surface Carbon Product MULTIOBS_GLO_BIO_CARBON_SURFACE_REP_015_008. Update 2, 09 (2020).
Jersild, A., Landschützer, P., Gruber, N. & Bakker, D. C. E. An observation-based global monthly gridded sea surface pCO2 and air-sea CO2 flux product from 1982 onward and its monthly climatology (NCEI Accession 0160558). NOAA National Centers for Environmental Information https://doi.org/10.7289/V5Z899N6 (2017).
Zeng, J., Nojiri, Y., Landschützer, P., Telszewski, M. & Nakaoka, S. A global surface ocean fCO2 climatology based on a feed-forward neural network. Journal of Atmospheric and Oceanic Technology 31, 1838–1849 (2014).
Lauvset, S. K. et al. GLODAPv2.2022: the latest version of the global interior ocean biogeochemical data product. Earth Syst. Sci. Data 14, 5543–5572 (2022).
Jiang, L. et al. Coastal Ocean Data Analysis Product in North America (CODAP-NA) – an internally consistent data product for discrete inorganic carbon, oxygen, and nutrients on the North American ocean margins. 1–23 (2021).
McLachlan, G. J. & Peel, D. Finite Mixture Models. (Wiley, New York, 2000).
Breiman, L. Random forests. Machine learning 45, 5–32 (2001).
Weiss, R. F. Carbon dioxide in water and seawater: the solubility of a non-ideal gas. Marine Chemistry 2, 203–215 (1974).
Sharp, J. D. et al. CO2SYSv3 for MATLAB. Zenodo https://doi.org/10.5281/ZENODO.3950562 (2023).
Dickson, A. G., Sabine, C. L. & Christian, J. R. Guide to Best Practices for Ocean CO2 Measurements. PICES Special Publication 3 (North Pacific Marine Science Organization, Sidney, B.C., Canada, 2007).
Lewis, E. & Wallace, D. W. R. CO2SYS-Program developed for the CO2 system calculations. Carbon Dioxide Information Analysis Center Report ORNL/CDIAC-105 (1998).
Lueker, T. J., Dickson, A. G. & Keeling, C. D. Ocean pCO2 calculated from dissolved inorganic carbon, alkalinity, and equations for K1 and K2: validation based on laboratory measurements of CO2 in gas and seawater at equilibrium. Marine Chemistry 70, 105–119 (2000).
Dickson, A. G. Thermodynamics of the dissociation of boric acid in synthetic seawater from 273.15 to 318.15 K. Deep Sea Research Part A, Oceanographic Research Papers 37, 755–766 (1990).
Lee, K. et al. The universal ratio of boron to chlorinity for the North Pacific and North Atlantic oceans. Geochimica et Cosmochimica Acta 74, 1801–1811 (2010).
Dickson, A. G. Standard potential of the reaction: AgCl(s) + 1/2H2(g) = Ag(s) + HCl(aq), and the standard acidity constant of the ion HSO4− in synthetic sea water from 273.15 to 318.15 K. The Journal of Chemical Thermodynamics 22, 113–127 (1990).
Perez, F. F. & Fraga, F. Association constant of fluoride and hydrogen ions in seawater. Marine Chemistry 21, 161–168 (1987).
Orr, J. C., Epitalon, J., Dickson, A. G. & Gattuso, J. Routine uncertainty propagation for the marine carbon dioxide system. Marine Chemistry 207, 84–107 (2018).
Fong, M. B. & Dickson, A. G. Insights from GO-SHIP hydrography data into the thermodynamic consistency of CO2 system measurements in seawater. Marine Chemistry 211, 52–63 (2019).
García-Ibáñez, M. I. et al. Gaining insights into the seawater carbonate system using discrete fCO2 measurements. Marine Chemistry 245, 104150 (2022).
Duke, P. J. et al. Estimating marine carbon uptake in the northeast Pacific using a neural network approach. Biogeosciences 20, 3919–3941 (2023).
Landschützer, P., Laruelle, G. G., Roobaert, A. & Regnier, P. A combined global ocean pCO2 climatology combining open ocean and coastal areas (NCEI Accession 0209633). NOAA National Centers for Environmental Information https://doi.org/10.25921/QB25-F418 (2020).
Ciais, P. et al. Definitions and methods to estimate regional land carbon fluxes for the second phase of the REgional Carbon Cycle Assessment and Processes Project (RECCAP-2). Geoscientific Model Development 15, 1289–1316 (2022).
Acknowledgements
This research was supported by the NOAA Ocean Acidification Program (OAP; https://ror.org/02bfn4816) under the project “Temporal changes of ocean acidification indicators in the U.S. large marine ecosystems (LMEs) - an operational data product at NOAA/NCEI,” project ID 21925, as part of the initiative to bolster NOAA’s National Marine Ecosystem Status effort. Additional funding for JDS and BRC was from the Cooperative Institute for Climate, Ocean, & Ecosystem Studies (CICOES) under NOAA cooperative agreement no. NA20OAR4320271. Additional funding for L-QJ, PDL, and HY was from NOAA National Centers for Environmental Information (NCEI) through a NOAA Cooperative Institute for Satellite Earth System Studies (CISESS) Grant (NA19NES4320002) at the Earth System Science Interdisciplinary Center (ESSIC), University of Maryland. BRC also thanks the Carbon Data Management and Synthesis Grant from NOAA’s Global Ocean Monitoring and Observation division (GOMO), Fund Ref. No. 100007298 for supporting his contributions to this project. This is CICOES contribution no. 2024-1342 and PMEL contribution no. 5593. The Surface Ocean CO2 Atlas (SOCAT) is an international effort, endorsed by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS) and the Integrated Marine Biosphere Research (IMBeR) program, to deliver a uniformly quality-controlled surface ocean CO2 database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT. NOAA OISST V2 High Resolution Dataset data and NCEP/DOE Reanalysis II data were provided by the NOAA PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov. This study has been conducted using E.U. Copernicus Marine Service Information; https://doi.org/10.48670/moi-00024, https://doi.org/10.48670/moi-00048, https://doi.org/10.24381/cds.f17050d7.
Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. NASA sea surface chlorophyll data were provided by the NASA Ocean Biology Processing Group (OBPG) and the NASA Ocean Biology Distributed Active Archive Center (OB.DAAC). Atmospheric CO2 data were prepared and provided by the NOAA GML Carbon Cycle Group. We thank the NOAA Ecosystem Indicators Working Group members, including Willem Klajbor (NOAA/AOML), Chris Kelbe (NOAA/AOML), Erica Towle (NOAA/Coral Reef Conservation Program), and Xiao Liu (NOAA/GFDL), as well as Kaitlin Goldsmith (NOAA/OAP) and other stakeholders, for providing comments that helped improve the web interface. We thank Tim Boyer for leading the proposal securing funding for this work and for providing comments and feedback on the manuscript. Zachary Strasberg (University of New Mexico), a summer intern with NOAA OAP, performed computational simulations that informed our error estimates, the results of which may be incorporated into future releases of this data product.
Author information
Contributions
JDS contributed to the proposal securing funding for this work and led the coding, figure generation, and writing efforts. L-QJ drafted the initial version of the proposal securing funding for this work, managed the overall project and the budget at CISESS (UMD), and provided comments and feedback on the manuscript. BRC contributed to the proposal securing funding for this work, managed the project at UW CICOES, and provided comments and feedback on the manuscript. PDL contributed to the proposal securing funding for this work, assisted with the machine learning model selection and tuning, and provided comments and feedback on the manuscript. HY provided feedback on the code, provided comments and feedback on the manuscript, and has led the transfer of code to a cloud environment in preparation of automated creation of the OA indicators. SLC contributed to the proposal securing funding for this work, interfaced with the NOAA Ecosystem Indicators Working Group, and provided comments and feedback on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sharp, J.D., Jiang, LQ., Carter, B.R. et al. A mapped dataset of surface ocean acidification indicators in large marine ecosystems of the United States. Sci Data 11, 715 (2024). https://doi.org/10.1038/s41597-024-03530-7
DOI: https://doi.org/10.1038/s41597-024-03530-7