Introduction

The ocean is an important sink of anthropogenic CO2, having absorbed 23% of all man-made emissions in 2019 (ref. 1), with the Southern Ocean south of 35°S accounting for around 40% of this global ocean sink2. This outsized role in the climate system means that understanding the Southern Ocean carbon sink and its variability is critical for climate assessments and for the Global Carbon Budget1 that contributes to the Intergovernmental Panel on Climate Change report3. Estimates of air-sea CO2 fluxes from models, observation-based data products and observations show the largest disagreement in the Southern Ocean, in particular with regard to low-frequency variations4. Given the lack of observational coverage in this remote and harsh ocean region, this disagreement does not come as a surprise. The Surface Ocean CO2 Atlas (SOCAT)5 is the largest compilation of surface CO2 observations, but large areas of the Southern Ocean remain sparsely sampled or unsampled in winter, particularly at higher latitudes (Fig. 1a and S1). To fill this data void, ocean carbon sink estimates are produced by interpolating in-situ measurements using a variety of techniques, including state-of-the-art machine learning methods6,7,8,9, and air-sea CO2 fluxes are then computed by combining the mapped values with atmospheric CO2 data using a simple bulk gas transfer parameterisation10. Despite recent advances in gap-filling methods, however, data sparsity remains among the largest sources of uncertainty in the flux reconstructions, particularly in the Southern Ocean4.

Fig. 1: Southern Ocean fCO2 data coverage.

a Colour shows wintertime (June–September) observations of sea surface fCO2 (in microatmospheres, µatm) from SOCAT south of 35°S from 2004 to 2018, binned and averaged onto a 4° latitude by 6° longitude grid. b As in a, but with the addition of wintertime pseudo fCO2 observations constructed for the same period. Light grey shading shows the coverage of all non-wintertime SOCAT fCO2 observations for the same years on the same grid, and the black dots are the locations of the pseudo observations on the 1° × 1° grid used to train the SOM-FFN. The black lines show the mean position of the polar front from a published product62.

Since 2014, driven by the need for year-round observations to monitor this essential carbon sink, the observational coverage of the Southern Ocean has improved rapidly thanks to the advent of autonomous platforms including floats11 and uncrewed surface vehicles12. Notwithstanding the substantial improvement in coverage they bring, these new observations cannot address historical sparsity and are therefore limited in their ability to constrain estimates of past multiyear variability. To address this, a recent study13 used summertime subsurface observations of dissolved inorganic carbon (DIC) south of the Antarctic Polar Front (APF) from the Global Ocean Data Analysis Project (GLODAP)14,15 to extrapolate ‘pseudo observations’ of wintertime surface pCO2, boosting the coverage from 2004 onwards. That study found that the pseudo observations increased winter outgassing, but its method did not reproduce the strong reversal of the trend from an increasing to a decreasing sink around 2011 suggested by other studies7,16. In this work, we build on this novel constraint in two ways: first, by switching from a simple multiple linear regression (MLR) for gap-filling the data to a more sophisticated neural network-based approach17; and second, by increasing the number of pseudo observations from 760 to 798 using a more recent version of GLODAP18. We further validate the full set of pseudo observations against a data-assimilating biogeochemical ocean model19 and combine our mapped fCO2 product with a gas transfer parameterisation10 to produce air–sea CO2 flux estimates for the Southern Ocean from 1993 to 2018 (see Methods section).

When binned monthly onto a 4° latitude by 6° longitude grid (roughly equivalent to the 400 km decorrelation length scale of pCO2; ref. 20), our pseudo observations increase wintertime (June–September) coverage south of the APF by 68% for the period 2004–2018 compared with using only the direct observations compiled in the SOCAT database5 (see Fig. 1). However, the pseudo observations are assumed to represent conditions in September (see Methods section), so the majority of them lie in locations that would be under sea ice, which diminishes their direct influence on the air-sea flux. Instead, it is the way the pseudo observations modify the relationships between driver variables and surface fCO2 established in the gap-filling step that refines our estimate of the carbon sink. An illustration of the differences between the estimate of surface CO2 from this study and an estimate using the same set of pseudo observations combined with the previously employed MLR gap-filling method13 is shown in Supplementary Fig. S1.
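The coverage statistic above can be illustrated with a short sketch. It assumes observations arrive as hypothetical `(year, month, lat, lon)` tuples, counts the distinct monthly 4° × 6° cells occupied in June–September with and without pseudo observations, and reports the percentage increase. The restriction to cells south of the APF is omitted for brevity, and all function names are illustrative rather than taken from the study.

```python
import math

WINTER_MONTHS = (6, 7, 8, 9)  # June-September


def occupied_cells(obs, dlat=4.0, dlon=6.0):
    """Distinct (year, month, lat-bin, lon-bin) cells holding at least one observation."""
    cells = set()
    for year, month, lat, lon in obs:
        lat_bin = math.floor(lat / dlat)
        lon_bin = math.floor((lon % 360.0) / dlon)  # wrap longitudes into [0, 360)
        cells.add((year, month, lat_bin, lon_bin))
    return cells


def winter_only(obs, months=WINTER_MONTHS):
    """Keep only observations from the winter months."""
    return [o for o in obs if o[1] in months]


def coverage_increase(direct, pseudo):
    """Percentage increase in occupied winter cells when pseudo observations are added."""
    base = occupied_cells(winter_only(direct))
    combined = occupied_cells(winter_only(direct) + winter_only(pseudo))
    return 100.0 * (len(combined) - len(base)) / len(base)


# Hypothetical toy data: two occupied direct winter cells, one new cell from pseudo obs.
direct = [(2010, 7, -60.5, 10.0), (2010, 7, -61.0, 11.0),
          (2010, 8, -65.0, 100.0), (2010, 1, -62.0, 50.0)]
pseudo = [(2010, 9, -68.0, 200.0), (2010, 7, -60.2, 9.0)]
print(coverage_increase(direct, pseudo))  # 50.0
```

Counting occupied cells rather than raw observations means that several measurements in the same cell and month add no extra coverage, which is why the second pseudo observation in the toy data does not change the result.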

Furthermore, recent research21 highlighted the need to correct the fCO2 data for temperature gradients in the atmosphere-ocean mass boundary layer (MBL) before calculating the air–sea flux. This correction altered the 1992–2018 global mean ocean CO2 sink by 0.8–0.9 PgC yr−1 (ref. 22), yet no regional correction for the Southern Ocean exists. Here we apply a similar temperature correction and compare our estimate of the Southern Ocean carbon sink with results from an atmospheric inversion that combines atmospheric CO2 observations from surface sites with model simulations of atmospheric CO2 transport23 (see Methods section).

Results

Surface pCO2 south of the Antarctic Polar Front

Figure 2 shows the mean seasonal cycle and annual means of surface pCO2 south of the APF, mapped using a Self-Organising Map Feed-Forward Neural Network technique (SOM-FFN; see Methods section). The pseudo observations reduce the winter peak of the seasonal cycle of pCO2 in the region: the phase of the seasonal cycle is unaffected by their inclusion, but its magnitude reduces from 47 µatm to 40 µatm. The seasonal cycle of pCO2 is anticorrelated with temperature and instead most closely resembles the cycles of the non-thermal drivers, mixed layer depth and sea surface salinity, which peak in September and October (see Supplementary Fig. S2). This suggests that the seasonal cycle of pCO2 in this region is DIC-driven: deeper wintertime mixed layers entrain DIC-rich waters from below, causing an outgassing tendency that overcomes the uptake tendency caused by surface cooling.
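The diagnostics used in this paragraph (the magnitude and phase of a mean seasonal cycle, and its correlation with a driver variable) can be sketched as follows. This is an illustrative sketch on synthetic monthly series, not the analysis code of the study; the magnitude is assumed here to be the peak-to-trough range of the monthly climatology, and all function names are hypothetical.

```python
import math
from statistics import mean


def climatology(series):
    """Mean seasonal cycle (Jan..Dec) from (year, month, value) tuples."""
    by_month = {m: [] for m in range(1, 13)}
    for _, month, value in series:
        by_month[month].append(value)
    return [mean(by_month[m]) for m in range(1, 13)]


def magnitude(clim):
    """Peak-to-trough range of a 12-month climatology."""
    return max(clim) - min(clim)


def peak_month(clim):
    """Month of the maximum (1 = January)."""
    return clim.index(max(clim)) + 1


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def synthetic(base, amp, years=(2010, 2011), peak=9):
    """Hypothetical cosine seasonal cycle peaking in month `peak`."""
    return [(y, m, base + amp * math.cos(2 * math.pi * (m - peak) / 12))
            for y in years for m in range(1, 13)]


pco2 = climatology(synthetic(400.0, 20.0))
mld = climatology(synthetic(150.0, 80.0))
sst = climatology(synthetic(1.0, -1.5))  # coldest when pCO2 peaks
print(round(magnitude(pco2), 6), peak_month(pco2))  # 40.0 9
print(round(pearson(pco2, mld), 6), round(pearson(pco2, sst), 6))  # 1.0 -1.0
```

In this toy setting, a September pCO2 peak that tracks mixed layer depth (r = 1) while opposing temperature (r = −1) is the signature of a non-thermally driven cycle described above.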

Fig. 2: Mean pCO2 from SOM-FFN reconstructions.

a 1993–2018 mean seasonal cycle and b annual mean surface pCO2 (in microatmospheres, µatm) south of the Antarctic Polar Front from the SOM-FFN with and without pseudo observations in the training data (solid lines). Each line represents an ensemble mean, and the shaded areas are the 1-σ uncertainties (see Methods section). The black dashed line shows the annual mean atmospheric pCO2 above the same ocean area, calculated from the NOAA ESRL product50,51.

The long-term trend follows the atmospheric pCO2 trend (Fig. 2b), but with strong variability, particularly in the 2000s, that is not reduced by the inclusion of pseudo observations. From 2007 onwards, the pseudo observations cause only a slight divergence between the two solid lines and a slightly weaker upward trend, such that the surface pCO2 estimate does not keep pace with the atmosphere.

Air-sea CO2 fluxes south of the Antarctic Polar Front

The reduction of the winter peak of the seasonal cycle due to the pseudo observations seen in Fig. 2a would on its own tend to increase the estimated CO2 uptake by the ocean. However, the reduction in pCO2 occurs under sea ice (see Supplementary Fig. S3), so this signal contributes little to the air-sea flux; away from sea ice, the pseudo observations tend to increase pCO2, and this effect narrowly dominates the air–sea flux south of the APF (Figs. S3 and 3a). The peak winter outgassing increases from 0.09 PgC yr−1 to 0.12 PgC yr−1 and shifts from July to September, a shift we attribute to the data interpolation method; meanwhile, the magnitude of the flux seasonal cycle increases from 0.66 to 0.69 PgC yr−1.

The fluxes in Fig. 3 with and without the pseudo observations are indistinguishable within the uncertainties, but the central estimate of the annual mean flux is shifted slightly upwards in the period 2002 to 2011. The pseudo observations begin in 2004 because of how they are calculated (see Methods section), so their influence diminishes before then; after 2011, the relative increase in coverage they provide also declines because of improvements in the coverage of direct fCO2 measurements (see Supplementary Fig. S4). The pseudo observations have only a slight impact on the multiyear variability of the fluxes over the period they cover: both lines in Fig. 3b show an increase in the sink from 2004 to 2011 followed by a stagnation. Without the pseudo observations, the sink trend south of the APF is −0.27 PgC yr−1 decade−1 from 2004 to 2011 and −0.04 PgC yr−1 decade−1 from 2011 to 2018. When the pseudo observations are added, the trends over the same two periods are −0.33 PgC yr−1 decade−1 and −0.06 PgC yr−1 decade−1, respectively (linear trends are shown as dashed lines in Fig. 3). For the period from 2004, when the pseudo observations begin, to 2011, when their influence largely vanishes, they reduce the mean sink by 34%, from 0.15 PgC yr−1 to 0.10 PgC yr−1.
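The per-period trends and percentage changes quoted here can be obtained from an ordinary least-squares slope scaled to a decade. The sketch below assumes annual mean fluxes (PgC yr−1) keyed by year and inclusive period endpoints, so that 2011 belongs to both periods as in Fig. 3b; the flux series is synthetic and the function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def trend_per_decade(flux_by_year, start, end):
    """Linear trend over [start, end] inclusive, in flux units per decade."""
    years = [y for y in sorted(flux_by_year) if start <= y <= end]
    return 10.0 * ols_slope(years, [flux_by_year[y] for y in years])


def percent_change(old, new):
    """Relative change in percent (negative for a sink: magnitude shrinks)."""
    return 100.0 * (new - old) / old


# Synthetic series: sink strengthens by 0.03 PgC/yr each year until 2011, then flat.
flux = {y: -0.03 * (min(y, 2011) - 2004) for y in range(2004, 2019)}
print(round(trend_per_decade(flux, 2004, 2011), 3))  # -0.3
print(abs(trend_per_decade(flux, 2011, 2018)) < 1e-9)  # True
print(round(percent_change(-1.21, -1.11), 1))  # -8.3, i.e. the ~8% reduction south of 35°S
```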

Fig. 3: Air-sea CO2 fluxes south of the Antarctic Polar Front.

a 1993–2018 mean seasonal cycle and b annual mean air-sea CO2 fluxes (negative into the ocean) south of the Antarctic Polar Front based on sea surface fCO2 from the SOM-FFN with and without pseudo observations in the training data. Values are in petagrams of carbon per year (PgC yr−1). Each solid line represents an ensemble mean, and the shaded areas are the 1-σ uncertainties (see Methods section). The dashed lines in b are linear fits for the 2004–2011 and 2011–2018 periods.

Spatial patterns of air-sea CO2 flux trends

The decadal trends shown in Fig. 3b support the view of a weakening sink in the 1990s24, followed by a reinvigoration in the 2000s25 and a subsequent stagnation in at least the early part of the 2010s26. In Fig. 4 we examine the spatial variation of the decadal trends from 2004 onwards and the impact of the pseudo observations on those trends. While the direction and spatial patterns of the trends are largely unaltered, the pseudo observations refine them regionally. In 2004–2011 (Fig. 4a, b), the trend is towards increasing uptake almost everywhere across the Southern Ocean, and the pseudo observations strengthen the trend south of the zonal wind maximum in all sectors, with the largest effect in the Pacific (Fig. 4c). In 2011–2018, the trend is patchier, with areas of increasing uptake and areas of increasing outgassing. The pseudo observations strengthen an outgassing trend at lower latitudes in the Pacific and weaken an outgassing trend in a region just south of the zonal wind maximum near 20°E (Fig. 4e, f).

Fig. 4: Mapped CO2 flux trends.

Colours show air–sea CO2 flux trends (negative is increasing uptake) and arrows show wind trends, for the 2004–2011 period (top row) and 2011–2018 period (bottom row). Values are in moles of carbon per square metre per year per decade (mol C m−2 yr−1 decade−1). Panels a and d show CO2 flux trends when pseudo observations are not used, panels b and e show the trends when the pseudo observations are included, and panels c and f show the difference in CO2 flux trends due to the pseudo observations (note the different colour scale for the difference plots). The solid green line on all panels shows the latitude of the 2004–2018 mean zonal wind maximum based on CCMP winds46, and the black dashed lines mark ocean basin boundaries.

The weakening of the Southern Ocean carbon sink in the 1990s has been explained by a southward shift of the westerly winds associated with an increasingly positive Southern Annular Mode index, which caused increased upwelling of natural DIC and consequent outgassing of CO2 (ref. 27). The reinvigoration from 2002 to 2011 has been attributed to a zonally asymmetric atmospheric circulation that drove contrasting patterns of wind and SST change in the different ocean basins25,26, and the subsequent stagnation to regional shifts in sea level pressure linked to its zonal wavenumber 3 component and related changes in winds26. Other explanations include changes in external forcing28 and changes in the strength of the upper ocean overturning circulation29. We are unable to elaborate further on the causes of the decadal variations here; however, we can speculate as to whether the addition of our pseudo observations supports the previously suggested mechanisms. For example, the enhanced 2004 to 2011 uptake trend at high latitudes (Fig. 4c) coincides with a weakening trend in the westerly winds that would likely be associated with changes in the physical circulation. This appears consistent with suggestions that winter variability driven by changes in mixing and stratification controls the longer-term variability of the Southern Ocean carbon sink30, and is also supported by our finding that the seasonal cycle of pCO2 in the region closely resembles that of the mixed layer depth (Supplementary Fig. S2).

Comparison with atmospheric inversions

We have so far examined the mean fluxes for the region of the Southern Ocean south of the APF, since this is where our pseudo observations add information to aid in estimating the carbon sink. To place the results in context, Fig. 5 shows the sink for the whole Southern Ocean south of 35°S. The results are similar to those for the high latitudes in Fig. 3, with the pseudo observations causing a small reduction in uptake over the winter period (Fig. 5a) and in the mid-to-late 2000s (Fig. 5b). The central estimate of the 2004 to 2011 mean sink reduces by 8%, from −1.21 PgC yr−1 without the pseudo observations to −1.11 PgC yr−1 when they are included.

Fig. 5: Air-sea CO2 fluxes for the Southern Ocean.

a 2000–2017 mean seasonal cycle and b annual mean air-sea CO2 fluxes (negative into the ocean) for the Southern Ocean south of 35°S based on sea surface fCO2 from the SOM-FFN with and without pseudo observations. Values are in petagrams of carbon per year (PgC yr−1). Each line represents an ensemble mean, and the shaded area is the 1-σ uncertainty on the SOM-FFN result with pseudo observations (see Methods). Also plotted are the same quantities derived from an atmospheric inversion using two different priors (dashed lines; see Methods section). Note that, unlike in Fig. 3, the mean seasonal cycle is plotted for a shorter time period, for consistency with the atmospheric inversions.

Corrections applied to account for MBL effects increased the sink south of 35°S by ~0.3 PgC yr−1 compared with fluxes computed without applying the corrections to the fCO2 data that drive the SOM-FFN mapping (see Supplementary Fig. S5); this offset is in line with a previous estimate for the global ocean21. Combining the reduction due to the pseudo observations with the increase due to the MBL corrections gives an estimated 2004–2018 mean sink of −0.16 ± 0.07 PgC yr−1 south of the APF and −1.27 ± 0.23 PgC yr−1 south of 35°S. Our results are broadly consistent with two atmospheric inversions (Fig. 5, dashed lines), which are based on a set of independent measurements, and with a recent study that estimated the Southern Ocean sink using aircraft data31 (see Supplementary Fig. S6). The MBL correction is crucial to bringing our results in line with the atmospheric estimates: the pseudo observations improve the agreement with only one of the atmospheric inversions in Fig. 5b and make almost no difference to the comparison with aircraft data in Supplementary Fig. S6.

Discussion

We have presented an estimate of the Southern Ocean CO2 sink that applies a sophisticated pCO2 gap-filling technique to a dataset with boosted wintertime coverage in the form of pseudo observations extrapolated from subsurface summertime observations. Our work builds on an earlier study13 that sought to estimate the carbon sink and its variability south of the APF with the benefit of pseudo observations constructed using the same methods, but in which the necessary step of gap-filling ocean surface CO2 data was achieved using a multiple linear regression rather than the more sophisticated SOM-FFN machine learning method used here. The earlier study estimated the long-term mean sink from 2004 to 2017 as −0.02 ± 0.02 PgC yr−1 south of the APF. After accounting for a downward shift of 0.10 PgC yr−1 for that region due to the MBL corrections applied in this study, our estimate of −0.16 ± 0.07 PgC yr−1 agrees well with the earlier result. Qualitatively, the variability over the same period is also quite similar between the two studies: when split at 2011 and run separately over the two time periods, the MLR showed an increasing sink from 2004 to 2011 followed by a flattening or slight decrease (their Fig. D1). It also showed, as we do here (Fig. 3b), that the introduction of pseudo observations increases the downward trend over the 2004–2011 period. Quantitatively, the increase in the sink of 0.32 PgC yr−1 from the MLR implies a trend of ~−0.46 PgC yr−1 decade−1, which is larger in magnitude than our trend of −0.33 PgC yr−1 decade−1 using the SOM-FFN. The difference could be explained by a number of factors, including the different gap-filling methods, the longer training dataset used in the current study (1993–2018 compared with 2004–2017), a larger set of driver variables (see Table 1; the MLR study used only temperature, salinity, mixed layer depth and atmospheric CO2 concentration), and, perhaps least significantly, the additional pCO2 data and pseudo observations used here.
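Assuming the MLR's 0.32 PgC yr−1 increase in the sink is spread evenly over the seven years from 2004 to 2011 (a larger sink is a more negative flux), the implied decadal trend follows from simple arithmetic:

$$\frac{-0.32\ \mathrm{PgC\,yr^{-1}}}{\left(2011-2004\right)\ \mathrm{yr}}\times 10\ \mathrm{yr\,decade^{-1}}\approx -0.46\ \mathrm{PgC\,yr^{-1}\,decade^{-1}}$$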

One potential caveat introduced by the pseudo observations, discussed in detail in previous work13, is the possibility of a bias in the estimated air-sea CO2 fluxes. We therefore tested for bias in the pseudo observations by validating them against a data-assimilating biogeochemical ocean model (see Methods section) and found it to be negligible, confirming an earlier validation with direct winter observations13.

We have found that the incorporation of pseudo observations that increase winter data coverage at high latitudes from 2004 onwards does not significantly alter the Southern Ocean carbon sink variability over the reconstructed period, suggesting it is relatively well constrained by sparse and seasonally-biased observations using this method. The pseudo observations influence our estimates of the air-sea flux pre-2004 by modifying relationships between fCO2 and its driver variables in the SOM-FFN reconstruction, but their effect diminishes with time and disappears by the year 2001 (Figs. 3b and 5b). Our conclusions about the robustness of multiyear variability to additional data coverage are therefore limited to the most recent period encompassing the reinvigoration and subsequent stagnation of the sink. A definitively constrained estimate of the variability in the most data-sparse period before 2000 remains elusive.

While introducing the additional observational constraints lowers the air-sea CO2 uptake in the Southern Ocean, this reduction is more than compensated by corrections for mass boundary layer gradients in temperature and salinity. These corrections increased the estimated Southern Ocean carbon sink south of 35°S by ~0.3 PgC yr−1, more than offsetting the lower uptake induced by the wintertime observations and bringing the estimate in line with atmospheric inversions. Our results confirm earlier conclusions21 that these corrections are important for understanding the ocean carbon sink.

Methods

Calculation and validation of pseudo observations

The construction of pseudo observations of wintertime surface pCO2 is described in detail in section 2.1 of an earlier study13; here we summarise the method and any differences from that paper. Using data from the Global Ocean Data Analysis Project (GLODAP)18, we identify a Temperature Minimum Layer in summertime temperature profiles south of the Antarctic Polar Front (APF). Each such layer is assumed to have preserved the properties of the surface water at the same latitude and longitude in the previous September. We then adjust the concentration of dissolved inorganic carbon (DIC) in the Temperature Minimum Layer for biological activity that has occurred in the water mass since the previous winter, using the apparent oxygen utilisation (AOU) and a Redfield ratio RRC:O, to obtain an estimate of the wintertime surface DIC concentration. We also apply two corrections to the AOU value to account for oxygen undersaturation when the Temperature Minimum Layer water was last in contact with the atmosphere. The first is a uniform correction of −13.5 µmol kg−1, applied away from sea ice, derived from the mean oxygen undersaturation in the top 10 m of the region between the APF and the sea ice, as determined from SOCCOM float data11,32. The second correction, applied in sea ice, takes the form AOUcorr = AOU(1 − a·Cice^b), in which Cice is the sea ice concentration and a and b are parameters optimised during the validation phase (see next paragraph). We then combine the extrapolated wintertime surface DIC estimate with an estimate of total alkalinity derived from a locally interpolated alkalinity regression for global alkalinity estimation (LIAR)33, with surface temperature and salinity from a gridded Argo product34, and with other tracers from GLODAP, to calculate pCO2 using the MATLAB CO2SYS software35. The Argo product used in the construction of the pseudo observations spans the years 2004–2018, and consequently this period is the focus of our study.
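A minimal sketch of the two AOU corrections and the biological DIC adjustment is given below. The text does not spell out how the corrected AOU enters the DIC adjustment, so the sketch assumes DICwinter = DICTminL − (106/138) × AOUcorr, treats any nonzero Cice as "in sea ice", and applies the −13.5 µmol kg−1 open-water correction additively. The defaults a = 0.8 and b = 0.1 are the optimised values reported in the validation below; the function names are hypothetical.

```python
C_PER_O2 = 106.0 / 138.0           # mol C per mol O2 from RRC:O = 106:-138 (magnitude only)
OPEN_WATER_AOU_CORRECTION = -13.5  # µmol/kg, applied away from sea ice


def corrected_aou(aou, c_ice, a=0.8, b=0.1):
    """AOU corrected for O2 undersaturation at last atmospheric contact (µmol/kg)."""
    if c_ice > 0.0:
        # In sea ice: AOUcorr = AOU * (1 - a * Cice**b)
        return aou * (1.0 - a * c_ice ** b)
    # Away from sea ice: uniform offset (assumed additive)
    return aou + OPEN_WATER_AOU_CORRECTION


def winter_surface_dic(dic_tml, aou, c_ice, a=0.8, b=0.1):
    """Remove DIC added by remineralisation since winter (assumed Redfield scaling)."""
    return dic_tml - C_PER_O2 * corrected_aou(aou, c_ice, a, b)


print(round(winter_surface_dic(2200.0, 50.0, 0.0), 2))  # 2171.96, open water
print(round(winter_surface_dic(2200.0, 50.0, 1.0), 2))  # 2192.32, full ice cover
```

Under full ice cover the power-law correction removes most of the AOU signal, so the inferred winter DIC stays closer to the Temperature Minimum Layer value than in open water.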

To establish optimal parameters a and b for the AOU correction in sea ice, and to further validate the method of calculating pseudo observations beyond what was carried out previously13, we use model output from a data-assimilating ocean state estimate with biogeochemistry, B-SOSE19. We use the model output to create pseudo observations of DIC in exactly the same manner as with the observations, except that the model pseudo observations cover the entire region south of the APF for the years 2008–2011 (the limited time range reflects the packaging of the downloaded data and was chosen for convenience; given the large number of model points available for validation, it is unlikely to be a limitation). This produces 6791 pseudo observations, which we compare with their wintertime equivalents at the surface in the same latitude/longitude grid cell in the previous September. We repeat the calculation for a range of values of a and b and for two alternative values of RRC:O, each time calculating the RMSE between the set of model pseudo observations and their wintertime equivalents. The optimised parameters are a = 0.8, b = 0.1 and RRC:O = 106:−138, giving an RMSE of 11.4 µmol kg−1 and a bias of 0.49 µmol kg−1. By comparison, the validation carried out in the earlier paper using a small number of real-world observations13 gave a = 0.9, b = 0.3 and the same RRC:O, with an RMSE of 13.9 µmol kg−1 and a bias of 4.1 µmol kg−1. In addition, we tested the effect of fixing RRC:O to 106:−150, equivalent to the fixed values used by B-SOSE. This gives optimised values of a = 0.8 and b = 0.2, and the resulting set of pseudo observations produces mean seasonal cycles and annual mean trends of pCO2 that are indistinguishable from those presented in Fig. 2.
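The parameter optimisation amounts to a grid search that minimises RMSE. The sketch below assumes a 0.1-spaced grid for a and b (the actual grid is not stated), applies the assumed DIC adjustment DICTminL − (106/138) × AOUcorr to in-ice points only, and replaces the B-SOSE comparison with a synthetic "truth" generated from known parameters, so it only demonstrates that the search recovers them. All names are hypothetical.

```python
import itertools
import math

C_PER_O2 = 106.0 / 138.0  # RRC:O = 106:-138, held fixed in this sketch


def winter_dic_in_ice(dic_tml, aou, c_ice, a, b):
    """Assumed DIC adjustment with the in-ice AOU correction AOU * (1 - a * Cice**b)."""
    return dic_tml - C_PER_O2 * aou * (1.0 - a * c_ice ** b)


def rmse_and_bias(pred, truth):
    diffs = [p - t for p, t in zip(pred, truth)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse, sum(diffs) / len(diffs)


def grid_search(samples, truth, grid=None):
    """Return (rmse, bias, a, b) minimising RMSE over the (a, b) grid."""
    grid = grid or [round(0.1 * i, 1) for i in range(1, 11)]
    best = None
    for a, b in itertools.product(grid, grid):
        pred = [winter_dic_in_ice(d, o, c, a, b) for d, o, c in samples]
        rmse, bias = rmse_and_bias(pred, truth)
        if best is None or rmse < best[0]:
            best = (rmse, bias, a, b)
    return best


# Synthetic (DIC, AOU, Cice) samples; "true" winter DIC generated with a = 0.8, b = 0.1.
samples = [(2200.0 + 5 * i, 40.0 + 3 * i, c)
           for i, c in enumerate([0.1, 0.3, 0.5, 0.7, 0.9])]
truth = [winter_dic_in_ice(d, o, c, 0.8, 0.1) for d, o, c in samples]
rmse, bias, a, b = grid_search(samples, truth)
print(a, b, rmse)  # 0.8 0.1 0.0
```

Spreading the samples over several ice concentrations matters: with a single value of Cice, many (a, b) pairs would fit equally well and the exponent b could not be identified.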

Having established a, b and RRC:O, we then run a 100-member Monte Carlo ensemble to create pseudo observations of surface pCO2 while propagating uncertainties in temperature, salinity, AOU and other parameters described previously13 through the calculation. Next, we take the ensemble mean to produce a set of 798 pseudo observations of wintertime surface pCO2 (increased from 760 in the earlier study). We also calculate additional surface pCO2 values from wintertime surface GLODAP observations, as in the earlier study13. We then convert the pCO2 values to fCO2 using the formula

$${{fCO}}_{2}={{pCO}}_{2}\,{\exp }\left[\frac{\left(B+2\delta \right)\times {10}^{-6}\,P}{R\,T}\right]$$
(1)

where B = −1636.75 + 12.0408 T − 3.27957 × 10−2 T² + 3.16528 × 10−5 T³ and δ = 57.7 − 0.118 T are virial coefficients in cm3 mol−1 (the factor 10−6 converts them to m3 mol−1), T is the wintertime surface temperature (in kelvin) from the Argo product associated with the pseudo observation, P is the atmospheric pressure of 1013.25 mbar expressed in pascals (101,325 Pa), and R = 8.31447 J mol−1 K−1 is the gas constant. Finally, we combine the GLODAP-derived and pseudo observations of fCO2 with the SOCAT 1° × 1° monthly gridded product for September of the year to which the pseudo observations correspond, by taking the mean of any GLODAP-derived or pseudo observations within a grid cell together with the SOCAT fCO2 value for that grid cell, if there is one.
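The pCO2-to-fCO2 conversion can be sketched in a few lines. This sketch implements the standard virial form fCO2 = pCO2 exp[(B + 2δ) × 10−6 P / (RT)], with B and δ as given above (they match the widely used Weiss (1974) expressions), T in kelvin, and the 1013.25 mbar pressure expressed in pascals so that it is consistent with the 10−6 conversion from cm3 mol−1 to m3 mol−1. The function names are hypothetical.

```python
import math

R = 8.31447       # gas constant, J mol-1 K-1
P_PA = 101325.0   # 1013.25 mbar expressed in Pa


def virial_b(t_k):
    """First virial coefficient of CO2, cm3 mol-1 (T in kelvin)."""
    return (-1636.75 + 12.0408 * t_k - 3.27957e-2 * t_k ** 2
            + 3.16528e-5 * t_k ** 3)


def pco2_to_fco2(pco2, t_k, p_pa=P_PA):
    """Convert pCO2 to fCO2 (same units as pco2) with the virial correction."""
    delta = 57.7 - 0.118 * t_k  # cross virial coefficient, cm3 mol-1
    exponent = (virial_b(t_k) + 2.0 * delta) * 1e-6 * p_pa / (R * t_k)
    return pco2 * math.exp(exponent)


# A 400 µatm pseudo observation at 0 °C: fCO2 is roughly 0.4% lower than pCO2.
print(round(pco2_to_fco2(400.0, 273.15), 2))
```

Passing temperatures in °C, or pressure in mbar, would silently shrink or distort the correction, which is why the units are fixed in the constants and docstrings.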

Surface fCO2 correction for surface temperature gradients

The SOCAT gridded fCO2 values have been corrected for the temperature difference between the depth of the observations, several metres below the surface, and the base of the surface mass boundary layer (MBL), as in a recent study21. The temperature at the base of the MBL is obtained from the NOAA Optimum Interpolation (OI) product v2 (ref. 36). We also correct our GLODAP-derived and pseudo observations of pCO2 to the NOAA OI temperatures, following earlier work37, using the formula:

$${{{pCO}}_{2}}^{{corr}}={{pCO}}_{2}{\exp }\left[0.0433\left({T}^{{OI}}-T\right)-4.35\times {10}^{-5}\left({{T}^{{OI}}}^{2}-{T}^{2}\right)\right]$$
(2)

where TOI is the NOAA OI product temperature at the latitude, longitude and month corresponding to the pCO2 value, T is the temperature associated with the uncorrected pCO2 value, and both temperatures are in °C.
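The temperature correction is a one-line exponential adjustment. A minimal sketch, assuming both temperatures are in °C as in the form commonly attributed to Takahashi et al. (1993), from which the 0.0433 and 4.35 × 10−5 coefficients come; the function name is hypothetical.

```python
import math


def correct_pco2_to_oi(pco2, t_obs_c, t_oi_c):
    """Adjust pCO2 from temperature t_obs_c to the NOAA OI temperature t_oi_c (both °C)."""
    linear = 0.0433 * (t_oi_c - t_obs_c)
    quadratic = 4.35e-5 * (t_oi_c ** 2 - t_obs_c ** 2)
    return pco2 * math.exp(linear - quadratic)


# Warming a 400 µatm sample from 0 °C to 1 °C raises pCO2 by roughly 4.4%.
print(round(correct_pco2_to_oi(400.0, 0.0, 1.0), 1))  # 417.7
print(correct_pco2_to_oi(400.0, 1.5, 1.5))            # 400.0, no temperature difference
```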

Gap-filling and mapping surface fCO2

In the original pseudo observations study13, a multiple linear regression (MLR) was used to gap-fill a combination of SOCAT, GLODAP-derived and pseudo observations of surface pCO2 and then map them for the Southern Ocean in order to calculate air-sea CO2 fluxes13. Because the MLR is comparatively inexpensive computationally, this allowed Monte Carlo ensembles of the fluxes to be run to explore, in particular, the uncertainties due to the pseudo observation calculation. However, the MLR is limited in its ability to fit the CO2 data, which in general have nonlinear relationships with their driver variables, and in particular it was unable to reproduce the interannual variability captured by other studies. Here we employ a Self-Organising Map Feed-Forward Neural Network (SOM-FFN) method, applied to a combination of SOCAT, GLODAP-derived and pseudo fCO2 values constructed as described above, to produce time-varying maps of surface fCO2 for the Southern Ocean south of 35°S from 1993 to 2018, from which we calculate air-sea CO2 fluxes. The SOM-FFN method has been described in detail previously7,38,39; here we describe only the differences in this work.

The 1993–2018 period for which we have run the SOM-FFN is motivated by the use of a time-varying mixed layer depth product that starts in 1993 (refs. 40,41) and by the inclusion of the pseudo observations, which have been produced for the period 2004–2018. While we present results for 1993–2018, we focus our analysis on the later period covered by the pseudo observations. The use of interannually varying mixed layer depths is an important distinction from previous work39, which used a climatology; the other datasets that we use to drive the SOM-FFN are summarised in Table 1. Note also that we use the NOAA OI sea surface temperature (SST) product for the boundary layer temperature correction, for driving the SOM-FFN, and for calculating the air-sea flux (see next subsection).

Table 1 Sources of data.

We carry out ensembles of SOM-FFN runs, with four members each, to explore the sensitivity of the method to certain aspects of the setup. As driver variables we use either only data south of 35°S or data south of 10°S. The latter latitude range gives the SOM-FFN more data to draw from, particularly benefitting gap-filling in the mid-latitude southeast Pacific, where fCO2 coverage is especially sparse, while excluding data from near the equator that were found to cause problems for the regressions. We did not include runs using the global datasets because these require significantly more computation time, and their results may be unduly influenced by data from regions remote from the Southern Ocean. We also test two methods of organising the data into biogeochemical provinces (known as biomes) in the first stage of the SOM-FFN: the first uses a time-varying Self-Organising Map (the ‘SOM’ part) employed in earlier studies7,39,42, and the second uses previously published biomes43,44 that are fixed in time (see Supplementary Fig. S7). We then take an ensemble mean of the mapped fCO2 from the four combinations of these configurations (south of 35°S vs south of 10°S data, and fixed vs SOM biomes) to calculate air-sea CO2 fluxes. The individual ensemble members produce a spread in the flux estimates, but it is significantly smaller than the flux uncertainties calculated as described in the next section, and it is not included in our results. We then repeat the ensembles with and without the pseudo observations included in the data used to drive the SOM-FFN mapping. A final distinction between the SOM-FFN setup in our study and earlier work7,38,39 is that we do not use the pCO2 climatology of Takahashi45 as a training variable for the SOM, so that our results remain independent of the gap-filling used to produce that climatology.
The RMSE between the mapped SOM-FFN fCO2 and the equivalent values from the gridded SOCAT product at the same latitude, longitude and month, for the 4 runs including pseudo observations ranges from 17.63 µatm to 18.77 µatm. In general, the RMSE for the region worsened in tests where the Takahashi data were used in the SOM phase.
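The four-member ensemble mean and the RMSE against gridded SOCAT can be sketched as below. The SOM-FFN itself is not reproduced: each configuration's mapped fCO2 field is replaced by a hypothetical list of cell values, empty SOCAT cells are marked with `None` and skipped, and the configuration labels and function names are illustrative only.

```python
import itertools
import math

LATITUDE_DOMAINS = ("south_of_35S", "south_of_10S")
BIOME_SCHEMES = ("fixed", "som")


def ensemble_mean(fields):
    """Cell-by-cell mean over equally sized mapped fCO2 fields (µatm)."""
    fields = list(fields)
    return [sum(cell) / len(fields) for cell in zip(*fields)]


def rmse_vs_socat(mapped, socat):
    """RMSE over cells where gridded SOCAT has a value (None marks an empty cell)."""
    pairs = [(m, s) for m, s in zip(mapped, socat) if s is not None]
    return math.sqrt(sum((m - s) ** 2 for m, s in pairs) / len(pairs))


# Hypothetical mapped fields for three grid cells, one field per configuration.
configs = list(itertools.product(LATITUDE_DOMAINS, BIOME_SCHEMES))
runs = {cfg: [380.0 + i, 390.0 + i, 400.0 + i] for i, cfg in enumerate(configs)}
mean_field = ensemble_mean(runs.values())
print(mean_field)                                                 # [381.5, 391.5, 401.5]
print(round(rmse_vs_socat(mean_field, [380.0, None, 404.5]), 3))  # 2.372
```

Comparing only co-located cells keeps the RMSE a measure of fit to actual observations rather than to the gap-filled values.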

Calculation of air–sea fluxes and uncertainties

Following a recent study21, we use the following equation to calculate the air-sea flux of CO2 (values are positive from sea to air):

$${F}_{{{{{{{{\mathrm{CO}}}}}}}}_{2}}=(1-{C}_{{ice}})k\left({C}_{{{{SW}}}}-{C}_{{{{ATM}}}}\right)$$
(3)

where Cice is the sea ice concentration, k is the gas transfer velocity, and CSW and CATM are the concentrations of dissolved CO2 at the base of the MBL and at the air-sea interface, respectively. The gas transfer velocity is a function of wind speed from the CCMP wind product46,47,48 and SST from NOAA OI v2 (ref. 36), and is calculated according to an established method10. The CCMP winds have 0.25° and 6-hourly resolution and are interpolated onto a 1° grid to match the other inputs to the gas transfer calculation. Their temporal resolution is limited by satellite sampling; the 6-hourly wind speeds are squared before being averaged to monthly values. CSW is calculated as αSW fCO2-SW, where αSW is the solubility of CO2 at the temperature and salinity at the base of the MBL, following an established method49 and using the data products outlined in Table 1, and fCO2-SW is the fCO2 value, also at the base of the MBL. Similarly, CATM is calculated as αAS fCO2-ATM, where in this case αAS is the solubility at the air-sea interface and fCO2-ATM is the atmospheric fCO2, calculated from the NOAA ESRL xCO2 product50,51 according to an established method52 using NOAA OI SST (ref. 36) and NCEP sea level atmospheric pressure53. We also adjust the surface temperature and salinity used to calculate αAS and k by −0.17 °C and +0.1 PSU, respectively, to approximate the conditions at the air-sea interface.
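The bulk flux and the wind-averaging step can be sketched as follows. The text cites an established method10 for k without giving its form, so this sketch assumes a quadratic wind-speed dependence of the Wanninkhof (2014) type, k = 0.251 ⟨U²⟩ (Sc/660)^−0.5 cm h−1, with that paper's seawater Schmidt-number polynomial; if the cited method differs, only `gas_transfer_velocity` and `schmidt_co2` would change. Solubilities are passed in rather than computed, seawater density is fixed at 1025 kg m−3, and all names are hypothetical.

```python
CM_PER_H_TO_M_PER_YR = 0.01 * 24.0 * 365.25


def mean_squared_wind(winds_6h):
    """Monthly mean of squared 6-hourly wind speeds (m2 s-2): square first, then average."""
    return sum(u * u for u in winds_6h) / len(winds_6h)


def schmidt_co2(t_c):
    """Schmidt number of CO2 in seawater, T in °C (assumed Wanninkhof 2014 fit)."""
    return (2116.8 - 136.25 * t_c + 4.7353 * t_c ** 2
            - 0.092307 * t_c ** 3 + 0.0007555 * t_c ** 4)


def gas_transfer_velocity(mean_u2, t_c):
    """Assumed quadratic parameterisation, returned in m yr-1."""
    k_cm_h = 0.251 * mean_u2 * (schmidt_co2(t_c) / 660.0) ** -0.5
    return k_cm_h * CM_PER_H_TO_M_PER_YR


def co2_flux(c_ice, k, alpha_sw, fco2_sw, alpha_as, fco2_atm, rho=1025.0):
    """F = (1 - Cice) k (CSW - CATM) in mol C m-2 yr-1, positive sea-to-air.

    alpha_* in mol kg-1 atm-1, fCO2 in µatm; rho (kg m-3) converts to mol m-3.
    """
    c_sw = alpha_sw * fco2_sw * 1e-6 * rho
    c_atm = alpha_as * fco2_atm * 1e-6 * rho
    return (1.0 - c_ice) * k * (c_sw - c_atm)


print(mean_squared_wind([5.0, 15.0]))  # 125.0, versus (mean U)**2 = 100.0
k = gas_transfer_velocity(100.0, 20.0)  # 10 m s-1 winds at 20 °C
print(round(co2_flux(0.0, k, 0.04, 420.0, 0.04, 400.0), 2))  # positive: outgassing
print(co2_flux(1.0, k, 0.04, 420.0, 0.04, 400.0))            # 0.0 under full ice cover
```

Squaring before averaging matters because the assumed k grows with U²: two winds of 5 and 15 m s−1 give a mean square of 125 m2 s−2, 25% more than the square of their 10 m s−1 mean.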

The three main sources of uncertainty in the calculation of air–sea CO2 fluxes are: the uncertainty from gridding SOCAT fCO2 observations onto the 1° × 1° grid, the uncertainty from the SOM-FFN mapping, and the uncertainty in the gas transfer velocity. We calculate the gridding error following earlier work on the SOM-FFN method7, obtaining errgrid = 0.27 µatm for the Southern Ocean, and the mapping error following another SOM-FFN study54, obtaining errmap = 2.00 µatm. Combined in quadrature, these two uncertainties give an overall fCO2 uncertainty of 2.02 µatm. We then add this uniform value to, and subtract it from, the ensemble mean fCO2 to obtain upper and lower bounds on the full fCO2 field, and combine each bound with a central estimate of the gas transfer velocity, producing two estimates of the air–sea flux. We take the flux uncertainty due to the fCO2 uncertainty as the difference between the upper-bound and central flux estimates; this equals the difference between the central and lower-bound estimates, because the flux is linear in fCO2 and the perturbation is symmetric. We further combine the central estimate of the fCO2 field with upper and lower bounds on k, assuming a 10% uncertainty in k as suggested by a recent review55, producing two more estimates of the air–sea flux, and take the flux uncertainty due to k as the difference between this upper-bound estimate and the central flux estimate. Finally, we combine the relative uncertainties in the flux field due to the fCO2 and k uncertainties to produce the total flux uncertainty.
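The uncertainty recipe can be expressed compactly. Combining the gridding and mapping errors in quadrature reproduces the 2.02 µatm quoted above. How the fCO2- and k-related flux uncertainties are combined in the final step is not specified, so this sketch assumes quadrature there as well, and it uses a hypothetical linear stand-in for the bulk flux formula rather than the full calculation.

```python
import math

ERR_GRID = 0.27   # gridding error, µatm
ERR_MAP = 2.00    # mapping error, µatm
REL_ERR_K = 0.10  # 10% uncertainty on k


def fco2_error(err_grid=ERR_GRID, err_map=ERR_MAP):
    """Combine gridding and mapping errors in quadrature (µatm)."""
    return math.hypot(err_grid, err_map)


def flux_uncertainty(flux_fn, fco2, k, err_fco2, rel_err_k=REL_ERR_K):
    """Perturb fCO2 and k separately; combine the flux differences in quadrature (assumed)."""
    central = flux_fn(fco2, k)
    sigma_fco2 = abs(flux_fn(fco2 + err_fco2, k) - central)
    sigma_k = abs(flux_fn(fco2, k * (1.0 + rel_err_k)) - central)
    return math.hypot(sigma_fco2, sigma_k)


def toy_flux(fco2, k, fco2_atm=400.0, alpha_rho=4.1e-5):
    """Hypothetical linear flux: k in m yr-1, alpha_rho in mol m-3 µatm-1."""
    return k * alpha_rho * (fco2 - fco2_atm)


err = fco2_error()
print(round(err, 2))  # 2.02
print(round(flux_uncertainty(toy_flux, 420.0, 2000.0, err), 3))  # 0.233
```

Because `toy_flux` is linear in fCO2, perturbing upwards or downwards by the same amount changes the flux by the same magnitude, which is why a single one-sided difference suffices.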

Atmospheric inversion

The atmospheric inverse CO2 flux estimates used in this analysis follow previously published methods23. The methodology employs the GEOS-Chem atmospheric transport model in combination with the Localized Ensemble Transform Kalman Filter (LETKF) data assimilation system and atmospheric CO2 observations from the NOAA-ESRL surface network50. We use optimized monthly fluxes from the combined GEOS-Chem-LETKF system at a spatial resolution of 2° latitude by 2.5° longitude. For the comparisons in this study, we use posterior flux estimates for the Southern Ocean derived from two alternative representations of the prior ocean flux21,39. Further details on the GEOS-Chem-LETKF system are available in the related methods paper23.