## Abstract

The global fraction of anthropogenically emitted carbon dioxide (CO_{2}) that stays in the atmosphere, the CO_{2} airborne fraction, has been fluctuating around a constant value over the period 1959 to 2022. The consensus estimate of the airborne fraction is around 44%. In this study, we show that the conventional estimator of the airborne fraction, based on a ratio of changes in atmospheric CO_{2} concentrations and CO_{2} emissions, suffers from a number of statistical deficiencies. We propose an alternative regression-based estimator of the airborne fraction that does not suffer from these deficiencies. Our empirical analysis leads to an estimate of the airborne fraction over 1959–2022 of 47.0% (± 1.1%; 1*σ*), implying a higher, and better constrained, estimate than the current consensus. Using climate model output, we show that a regression-based approach provides sensible estimates of the airborne fraction, also in future scenarios where emissions are at or near zero.

## Introduction

The amount of anthropogenically emitted carbon dioxide (CO_{2}) that stays in the atmosphere, the so-called airborne fraction (AF), is an important quantity for the study of CO_{2} absorption in the carbon cycle of the Earth system^{1,2,3}. In the literature, it has been investigated and debated whether the AF has increased, decreased, or remained constant over the period from 1959 to today, during which atmospheric measurements of CO_{2} concentrations have been available. Earlier studies found evidence of an increasing AF^{4,5,6}, even though measurement and estimation uncertainty make these findings statistically dubious^{7,8}. Later studies suggest that the AF has remained constant at around 44%, and this has become the consensus view^{9,10,11,12}. Raupach^{13} shows that the AF is given by a constant in a system where emissions follow an exponential trajectory, and the sink uptake is linear in atmospheric CO_{2} concentrations. Bennedsen et al.^{14} formalize such a system statistically, also allowing for linear growth of emissions on the more recent sample, and report a point estimate of the AF of 0.44.

Previous studies have analyzed the AF as the ratio of yearly changes in atmospheric CO_{2} concentrations (*G*_{t}, numerator) and anthropogenic CO_{2} emissions (*E*_{t}, denominator)^{4,5,6,7,8,9,10,15,16,17,18}. An alternative to this approach is to consider the cumulative airborne fraction (CAF)^{19,20,21}, but since this approach is less commonly used in the literature and is less amenable to statistical analysis, we only briefly address it here. Instead, we follow the main body of the literature and adopt the conventional estimator of the AF, which is defined as the sample mean of the yearly ratio *G*_{t}/*E*_{t}. This ratio-based estimator suffers from a number of statistical deficiencies due to its definition as the ratio of two stochastic processes. A particular concern is the presence of trends in the time series of yearly changes of atmospheric concentrations *G*_{t} and of yearly emissions, *E*_{t}, which prohibits the application of a central limit theorem for the ratio-based estimator. Therefore, its limiting distribution may be non-Gaussian, unless a separate assumption of Gaussianity is imposed. Hence, confidence intervals and *p*-values for test statistics based on the Gaussian distribution may not be valid for the conventional ratio-based estimator. Another concern is the denominator in the ratio, *E*_{t}, if it has a positive probability density at zero. In this case, the ratio-based estimator does not possess any moments, such that, for instance, the mean and variance of the estimator do not exist. Although the issue of *E*_{t} being at or close to zero is of no concern during the historical period 1959–2022, it becomes important in future scenarios where CO_{2} emissions decrease, including scenarios consistent with “net-zero” CO_{2} emissions, which is a committed goal of the international community^{22,23}. 
Future scenarios will likely result in a non-constant AF, either because of emissions trajectories departing from exponential growth or because of changing dynamics in the carbon sinks due to, e.g., saturation^{24,25} or climate feedback effects^{26}. These departures can lead to a non-linear relationship between sink activity and atmospheric concentrations. Climate models have shown that the AF tends to increase in future high-emission scenarios and to decrease in low-emission scenarios^{19}. Hence, analyzing the future AF as implied by output from climate models necessitates an approach that can accommodate a time-varying AF, also in cases when emissions are at or near zero. Such challenging issues with the conventional ratio-based analysis of the AF, when emissions decrease towards zero, have recently been encountered in a study of the future AF implied by output from a climate model^{18}.

In this work, we propose a regression-based approach to estimating the CO_{2} airborne fraction and show that it is statistically superior to the conventional ratio-based approach. We first show that the time series of yearly changes in atmospheric CO_{2} concentrations, *G*_{t}, cointegrates with anthropogenic CO_{2} emissions, *E*_{t}, over the period 1959–2022. This implies a statistically constant AF for the historical sample. On the basis of cointegration between *G*_{t} and *E*_{t}, we prove a number of theoretical results concerning the ratio-based and regression-based estimators of the AF. These results formally establish the statistical deficiencies of the ratio-based estimator mentioned above and show that the regression-based estimator does not suffer from these defects. The theoretical analysis also shows that, under mild assumptions, the regression-based estimator converges at the fast rate of *T*^{3/2}, where *T* is the sample size, compared to the slower rate of *T* for the ratio-based estimator.

We apply the ratio-based and regression-based estimators of the AF to yearly data from the Global Carbon Project^{27} over the period 1959–2022, and we find that the regression-based estimator improves precision compared with the ratio-based estimator, in line with the theoretical results. Our best estimate of the AF over the period 1959–2022 is 47.0% with an associated standard error of 1.1%, which leads to a 95% confidence interval of [44.9%, 49.0%] for the AF. Using output from the reduced-complexity climate model MAGICC^{28}, we illustrate the challenges in applying the ratio-based estimator of the AF to future low-emission scenarios. We further show that a regression-based approach can address these difficulties. When the underlying regression model can accommodate a time-varying AF, we can adopt the Kalman filter to analyze the dynamics of the AF in future scenarios output from climate models, including those compatible with “net-zero” or “net-negative” emissions goals.

## Results

### Atmospheric changes, emissions, and cointegration

Figure 1a, b show yearly changes in atmospheric CO_{2} (*G*_{t}) and yearly CO_{2} emissions from anthropogenic sources (*E*_{t}), respectively. The black line in Fig. 1c shows the ratio of these two variables, *G*_{t}/*E*_{t}. Data are obtained from the Global Carbon Project and cover the period 1959–2022 (Methods). The most conspicuous feature of the two time series, *G*_{t} and *E*_{t}, is that they exhibit upward trends. Trending behavior is indicative of time series being non-stationary. A simple least-squares statistical analysis of the bivariate system (*G*_{t}, *E*_{t}), where the non-stationarity of a time series is not accounted for, can yield invalid inference and should be avoided^{29}. However, the notion of cointegration (see, for example, Chapter 19 in Hamilton^{30} for a textbook treatment) allows us to keep working with the trending time series *G*_{t} and *E*_{t} while still obtaining valid statistical inference. Cointegration methods have been applied in earlier climate studies^{31,32}. Informally, two time series are cointegrated if they share a common trend. Formally, the time series *G*_{t} and *E*_{t} are said to be cointegrated when both *G*_{t} and *E*_{t} are non-stationary, and the error term *u*_{t} in the regression equation *G*_{t} = *αE*_{t} + *u*_{t} is stationary. We adopt the Dickey-Fuller test^{33} to determine whether a time series is non-stationary. The test statistic is for the null hypothesis of a unit root, that is, of having a unit value for the autoregressive dependence of *G*_{t} on its lagged value *G*_{t−1} (and *E*_{t} on *E*_{t−1}). The results from this test strongly suggest that both time series are non-stationary (Supplementary Table 1).
The null hypothesis of no cointegration (that is, *u*_{t} is non-stationary) can be tested formally using the Engle-Granger test^{34}, which is a Dickey-Fuller test on the residuals in the regression *G*_{t} = *αE*_{t} + *u*_{t}, adjusted for the fact that these residuals are not observed but must be estimated. The null hypothesis of a unit root in *u*_{t} is firmly rejected (Supplementary Table 1), and we can, therefore, conclude that the two yearly time series *G*_{t} and *E*_{t} are cointegrated.
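
The two-step logic of the Engle-Granger procedure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic stand-ins (not the Global Carbon Project series), the Dickey-Fuller regression is the simplest variant without constant or lagged differences, and the helper names `ols_through_origin` and `dickey_fuller_t` are hypothetical. The sketch only computes test statistics; judging significance requires the appropriate Dickey-Fuller or Engle-Granger critical values, which are not reproduced here.

```python
import math
import random


def ols_through_origin(x, y):
    """Least-squares slope for y = a * x + u (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def dickey_fuller_t(series):
    """t-statistic of rho in  d(s_t) = rho * s_{t-1} + e_t  (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    rho = ols_through_origin(lagged, diffs)
    resid = [d - rho * l for d, l in zip(diffs, lagged)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    se = math.sqrt(sigma2 / sum(l * l for l in lagged))
    return rho / se


# Synthetic stand-in data (assumed values, chosen only for illustration).
rng = random.Random(0)
T, alpha_true = 64, 0.45
E, level = [], 4.0
for _ in range(T):
    level += 0.1 + rng.gauss(0.0, 0.05)  # random walk with drift
    E.append(level)
G = [alpha_true * e + rng.gauss(0.0, 0.5) for e in E]

# Step 1: cointegrating regression G_t = alpha * E_t + u_t.
alpha_hat = ols_through_origin(E, G)
u_hat = [g - alpha_hat * e for g, e in zip(G, E)]

# Step 2: Dickey-Fuller statistic on the estimated residuals.
t_E = dickey_fuller_t(E)      # trending series: no evidence against a unit root
t_u = dickey_fuller_t(u_hat)  # residuals: strongly negative statistic
print(f"alpha_hat={alpha_hat:.3f}  DF(E)={t_E:.2f}  DF(u_hat)={t_u:.2f}")
```

A strongly negative statistic for the residuals, together with no rejection for the individual series, is the pattern that indicates cointegration.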

The cointegration analysis supports the hypothesis that the AF parameter *α* is constant during the period studied here (1959–2022). If the parameter *α* was changing in a specific direction, this would introduce a trend in the estimated residuals *u*_{t}. The result from the Engle-Granger test shows that a trend is not present. This is confirmed graphically by the blue line in Fig. 1f and is in line with recent studies^{9,10,11,12}. A Jarque-Bera test^{35} for normality of the estimated residuals for *u*_{t} results in a *p*-value of 24%, implying that we cannot reject the null of *u*_{t} having a Gaussian distribution.
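
For readers unfamiliar with the Jarque-Bera statistic, the following Python sketch shows how it combines sample skewness and kurtosis. It is an illustration under assumptions: the function name `jarque_bera` is hypothetical, and the *p*-value uses the asymptotic chi-square distribution with two degrees of freedom (whose survival function is exp(−x/2)), which may differ from the finite-sample *p*-value reported in the paper.

```python
import math


def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square with 2 d.o.f.
    return jb, p_value


# Toy residuals (assumed values); in practice pass the estimated residuals.
jb, p = jarque_bera([-1.0, 0.0, 1.0])
print(jb, p)
```

A large *p*-value means the null hypothesis of Gaussian residuals is not rejected.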

### Statistical properties of the ratio-based and regression-based AF estimators

We consider two approaches to estimating the AF *α*. These are the conventional ratio-based estimator, using equation \({G}_{t}/{E}_{t}=\alpha+{u}_{t}^{(1)}\), and the regression-based estimator, using equation \({G}_{t}=\alpha \,{E}_{t}+{u}_{t}^{(2)}\). Both estimators can be implemented using least-squares regression (Methods). In the case of independent data, it is known that the regression-based estimator is efficient, i.e., it has lower estimation uncertainty compared to, for example, the ratio-based estimator^{36,37}. We study this property for the case of cointegrated non-stationary time series data, which the cointegration analysis summarized above shows is the relevant case for the AF.

We derive the asymptotic properties, that is, consistency and asymptotic normality, for both the ratio-based and the regression-based estimators (Supplementary Methods). These properties depend on the dynamics of CO_{2} emissions, which are well-described by a random walk process with drift over the sample period 1959–2022, i.e., *E*_{t} = *E*_{0} + *bt* + *x*_{t}, with initial value *E*_{0}, drift coefficient *b* > 0, and random walk process *x*_{t}, for *t* = 1, …, *T*, where *T* is the sample size (Supplementary Methods). We show that the regression-based estimator converges to the data-generating AF *α* at rate *T*^{3/2} (Supplementary Prop. 1). The ratio-based estimator, on the other hand, also converges to the data-generating AF *α* but at the slower rate *T* (Supplementary Prop. 2). This implies that the estimation uncertainty of the regression-based estimator will decrease faster with increasing sample size than that of the ratio-based estimator, as is the case for independent data.
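
The efficiency comparison can be made concrete with a small Monte Carlo experiment. The Python sketch below is not the simulation study from the Supplementary Methods; it is a minimal illustration in which all parameter values (`alpha`, `E0`, `b`, `sd_x`, `sd_u`) and the function name `simulate_once` are assumptions, and the regression error is taken to be independent Gaussian noise.

```python
import random
import statistics


def simulate_once(rng, T=64, alpha=0.45, E0=4.0, b=0.1, sd_x=0.05, sd_u=0.5):
    """Draw one sample from E_t = E0 + b*t + x_t, G_t = alpha*E_t + u_t
    and return (ratio-based estimate, regression-based estimate)."""
    x, E, G = 0.0, [], []
    for t in range(1, T + 1):
        x += rng.gauss(0.0, sd_x)  # random walk component of emissions
        e = E0 + b * t + x
        E.append(e)
        G.append(alpha * e + rng.gauss(0.0, sd_u))
    ratio_hat = sum(g / e for g, e in zip(G, E)) / T
    reg_hat = sum(g * e for g, e in zip(G, E)) / sum(e * e for e in E)
    return ratio_hat, reg_hat


rng = random.Random(1)
draws = [simulate_once(rng) for _ in range(2000)]
sd_ratio = statistics.stdev([d[0] for d in draws])
sd_reg = statistics.stdev([d[1] for d in draws])
print(f"sd ratio-based: {sd_ratio:.5f}   sd regression-based: {sd_reg:.5f}")
```

Under these assumptions, the regression-based estimates are less dispersed across replications than the ratio-based ones. Increasing `T` in the sketch shows the gap widening with sample size.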

If the process *E*_{t} has positive probability density at zero, then the ratio-based estimator does not have a finite mean or variance (Supplementary Prop. 3). This follows directly from the definition of the ratio-based estimator as the sample mean of *G*_{t}/*E*_{t}: if values *E*_{t} = 0 have positive probability in the sample space, then the ratio *G*_{t}/*E*_{t} is not integrable on that space.

The model assumption *E*_{t} = *E*_{0} + *bt* + *x*_{t} for CO_{2} emissions, where *x*_{t} is a random walk process with, for example, Gaussian increments, implies a positive probability density at *E*_{t} = 0. However, for the sample period 1959–2022, the trend terms *E*_{0} + *bt* of *E*_{t} are much larger in magnitude than the random walk term *x*_{t}, and hence it is not unreasonable to assume that *x*_{t} = 0 for theoretical purposes. In this case, the ratio-based estimator has standard statistical properties. In particular, it is an unbiased estimator of *α*, that is, the mean of the estimator equals *α*, and the variance of the estimator has a simple expression that can easily be estimated. However, we show that even in this case, a central limit theorem does not hold in general (Supplementary Prop. 4(i)). The ratio-based estimator has a limiting Gaussian distribution only if we additionally assume that *u*_{t} is Gaussian (Supplementary Prop. 4(ii)). In contrast, the regression-based estimator follows a central limit theorem with a limiting Gaussian distribution, and the derivation does not require this additional assumption (Supplementary Prop. 1).

Although the theoretical results show that the regression-based estimator is asymptotically, i.e., for sufficiently large sample sizes, more precise than the ratio-based estimator, it is an empirical question of which estimator is more precise in finite samples. Next, we estimate the variances of the ratio-based and the regression-based estimators on the historical sample and compare their magnitudes.

### Estimating the airborne fraction over 1959–2022

We use time series data on yearly changes in atmospheric CO_{2} (*G*_{t}), yearly CO_{2} emissions from fossil fuels (\({E}_{t}^{FF}\)), and yearly CO_{2} emissions from land-use and land cover change (\({E}_{t}^{LULCC}\)), for the sample 1959–2022. Total anthropogenic CO_{2} emissions are then \({E}_{t}= {E}_{t}^{FF}+{E}_{t}^{LULCC}\). The data series are measured in gigatonnes of carbon per year (GtC/yr), obtained from the Global Carbon Project (Methods) and presented in Fig. 1a, b.

The ratio-based estimate (\({\hat{\alpha }}_{1}\)) and the regression-based estimate (\({\hat{\alpha }}_{2}\)) are obtained from least-squares regressions applied to the equations \({G}_{t}/{E}_{t}=\alpha+{u}_{t}^{(1)}\) and \({G}_{t}=\alpha \,{E}_{t}+{u}_{t}^{(2)}\), respectively (Methods). The fits are shown in Fig. 1c, e, and the associated estimated residuals \({\hat{u}}_{t}\) are shown in Fig. 1d, f, all as blue lines. To account for possible serial correlation and heteroskedasticity in the model errors *u*_{t}, we calculate standard errors using a heteroskedasticity and autocorrelation consistent (HAC) estimator^{38}. The results are displayed in the first two columns of Table 1. The estimates largely agree on the magnitude of the AF, \({\hat{\alpha }}_{1}=43.86\%\) and \({\hat{\alpha }}_{2}=44.78\%\). However, the standard error of \({\hat{\alpha }}_{2}\) is 11% lower than the standard error of \({\hat{\alpha }}_{1}\), showing that the faster convergence rate of this estimator (*T*^{3/2} versus *T*) outweighs the fact that the error process \({u}_{t}^{(2)}\) of the regression-based model has a larger variance than the error process \({u}_{t}^{(1)}\) of the ratio-based model. In particular, the estimated standard deviations (SDs) of these model errors are \(\widehat{SD}({u}_{t}^{(1)})=0.13\) and \(\widehat{SD}({u}_{t}^{(2)})=0.91\). The discrepancy is due to the different nature of the two models where \({u}_{t}^{(1)}={u}_{t}^{(2)}/{E}_{t}\), with *E*_{t} ≫ 1 in the sample 1959–2022.
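
As an illustration of how a HAC standard error can be computed for the regression-based model, the Python sketch below uses a Bartlett-kernel (Newey-West-type) estimator. This is an assumption: the kernel, the lag truncation `max_lag`, and the function name `hac_se_through_origin` are choices made for this sketch and are not taken from the paper. The toy numbers are invented.

```python
import math


def hac_se_through_origin(x, y, max_lag):
    """Slope and Bartlett-kernel HAC standard error for y = a * x + u (no intercept)."""
    sxx = sum(v * v for v in x)
    a_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Scores x_t * u_hat_t, whose long-run variance drives the slope variance.
    scores = [xi * (yi - a_hat * xi) for xi, yi in zip(x, y)]
    long_run = sum(s * s for s in scores)  # lag-0 (heteroskedasticity-robust) term
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)  # Bartlett weights decline linearly
        gamma = sum(scores[t] * scores[t - lag] for t in range(lag, len(scores)))
        long_run += 2.0 * weight * gamma
    return a_hat, math.sqrt(long_run) / sxx


# Toy data (assumed values): x plays the role of E_t, y the role of G_t.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 1.1, 1.4, 2.1]
a_hat, se_lag0 = hac_se_through_origin(x, y, max_lag=0)
_, se_lag1 = hac_se_through_origin(x, y, max_lag=1)
print(a_hat, se_lag0, se_lag1)
```

With `max_lag=0` the formula reduces to a heteroskedasticity-robust standard error; positive lags add the autocorrelation correction.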

By introducing covariates in the least-squares regressions, we can reduce the variance of the error processes *u*_{t} and thus achieve more precise estimates of the AF *α*. For example, it is common practice in the literature to control for the effects of the El Niño-Southern Oscillation (ENSO) and volcanic activity (VAI)^{5,17}. We follow this approach here (see Methods). The estimation results for the ratio-based estimator (\({\hat{\alpha }}_{3}\)) and the regression-based estimator (\({\hat{\alpha }}_{4}\)) of the AF when ENSO and VAI are included as covariates are presented in the third and fourth columns of Table 1. The corresponding regression fits are presented in Fig. 1c, e, and their associated estimated residuals \({\hat{u}}_{t}\) in Fig. 1d, f, all as red dashed lines. The results show that controlling for the effects of ENSO and volcanic activity increases the estimate of the AF considerably, resulting in \({\hat{\alpha }}_{3}=47.16\%\) and \({\hat{\alpha }}_{4}=46.97\%\). The estimates of the standard deviations of the error terms (\(\widehat{SD}({u}_{t}^{(3)})=0.09\) and \(\widehat{SD}({u}_{t}^{(4)})=0.63\)) decrease substantially compared to the models without covariates, indicating that the covariates ENSO and VAI explain much variation in the data. This is corroborated by the coefficient of determination (*R*^{2}) values reported in Table 1. We note that the *R*^{2} for the model in the first column equals zero by construction since this model only features an intercept. The decreased variances of the residuals from the models including covariates imply that their estimates of the constant AF *α* are more precise. The standard error of the regression-based estimate including covariates (\({\hat{\alpha }}_{4}\)) is approximately 16% lower than that of the ratio-based estimate including covariates (\({\hat{\alpha }}_{3}\)) and approximately 34% lower than that of the conventional ratio-based estimate excluding covariates (\({\hat{\alpha }}_{1}\)).
Our preferred estimate, obtained from the regression-based estimator including covariates (\({\hat{\alpha }}_{4}\)), results in an AF of 47.0% (± 1.1%; 1*σ*) with an associated 95% confidence interval of [44.9%, 49.0%]. The slightly increased AF estimates for the models with ENSO and VAI, compared to the models without covariates, confirm a similar finding in Betts et al.^{39}.

The variability of the differences in CO_{2} emissions increased in the early 1990s (Supplementary Fig. 1). This is most likely due to increased variability of emission estimates from land-use and land-cover change starting in the early 1990s (Supplementary Methods and Supplementary Fig. 1). As a robustness check, the right panel of Table 1 presents the results for the more recent subsample 1992–2022; Supplementary Table 1 contains the corresponding coefficient estimates for ENSO and VAI. Our conclusions for the full sample are corroborated by the results for the recent sample. All estimates of the AF *α* from the subsample are within the respective confidence bands of the estimates from the full sample, while the reductions in uncertainty from including the covariates and from using the regression-based estimator are similar.

### Estimating the airborne fraction over 2023–2100

The approximate constancy of the AF over the historical period 1959–2022, as documented in the literature and confirmed by the cointegration analysis in this study, can be understood as the result of a near-exponential growth in emissions and an approximately linear response of the carbon sinks to atmospheric concentrations^{9}. In scenarios describing the future, for example, when emissions are declining, the AF is expected to depart from constancy and may vary over time^{18,19}. This motivates the specification of a time-varying AF *α* = *α*_{t}, with *α*_{t} denoting the AF in year *t*, i.e., the fraction of emissions (*E*_{t}) added to the atmosphere (*G*_{t}) in year *t*. This time-varying AF may also be estimated using a ratio-based approach and a regression-based approach (Methods).

To study the performance of the ratio-based (\({\hat{\alpha }}_{1,t}\)) and the regression-based (\({\hat{\alpha }}_{2,t}\)) estimators in situations where the AF is changing over time, we apply the two estimators to output from the MAGICC reduced-complexity climate model^{28}. We let MAGICC produce future trajectories of *G*_{t} and *E*_{t} for *t* = 2023, 2024, …, 2100 according to the Shared Socioeconomic Pathways (SSPs)^{40}. Here we present the results from the so-called SSP1-2.6 scenario, which is a high mitigation scenario consistent with a forcing level of 2.6 Wm^{−2} in the year 2100^{40}. Results obtained from other SSP scenarios are similar to those reported below (Supplementary Figs. 5–9). Since MAGICC is a deterministic model without a stochastic representation of the climate variables, the trajectories of *G*_{t} and *E*_{t} generated by MAGICC are very smooth. To obtain output that resembles climate data, we perturb the trajectories of *G*_{t} and *E*_{t} by zero-mean Gaussian noise, where we set the variances equal to estimates obtained on the historical data. These simulated trajectories, together with the original output from MAGICC and the historical Global Carbon Project data 1959–2022, are shown in panels a and b of Fig. 2. Panel c presents the historical ratio *G*_{t}/*E*_{t} over 1959–2022 and the ratio-based and regression-based estimates \({\hat{\alpha }}_{t}\) of the time-varying airborne fraction over 2023–2100.

The ratio-based estimate \({\hat{\alpha }}_{1,t}\) (blue) is a very noisy series, especially when *E*_{t} ≈ 0. In contrast, the regression-based estimate \({\hat{\alpha }}_{2,t}\) (red), obtained from the Kalman filter (Methods), evolves over time in a stable fashion and shows sensible AF estimates, also when *E*_{t} ≈ 0. A further benefit of the regression-based method is the availability of confidence intervals for \({\hat{\alpha }}_{2,t}\) (shaded red area), which are not immediately available for the ratio-based estimator \({\hat{\alpha }}_{1,t}\). Finally, the covariates for El Niño and volcanic activity can readily be incorporated into the regression-based framework with a time-varying AF *α*_{t}.

In the SSP1-2.6 scenario studied here, the regression-based AF estimate \({\hat{\alpha }}_{2,t}\) remains roughly constant until 2050, after which it gradually declines toward zero. In 2060, the atmospheric changes turn negative, resulting in a negative estimate of the AF, meaning that the sink uptake exceeds the emissions. In 2077, the emissions turn negative as well, causing a switch to a positive estimate of the AF. The estimates of the AF exceed one from 2077 onwards, indicating that the sinks continue to absorb CO_{2} even in this regime with highly negative emissions. These findings can be contrasted with the analysis of the SSP1-1.9 scenario (Supplementary Fig. 5), which has a similar trajectory for the regression-based AF estimates \({\hat{\alpha }}_{2,t}\) as the SSP1-2.6 scenario. However, from 2080 onwards, the SSP1-1.9 scenario has \({\hat{\alpha }}_{2,t} \, < \, 1\), implying that the sinks turn into carbon sources (releasing more carbon dioxide than they absorb).

## Discussion

Our empirical findings present a slightly higher AF than the consensus estimate of 44%^{11} and the cumulative airborne fraction CAF_{t} = 44.4% obtained from the Global Carbon Project data (Supplementary Methods). The regression-based estimate of the AF, using the 1959–2022 sample of the Global Carbon Project data and controlling for El Niño and volcanic activity, is 47.0% (± 1.1%; 1*σ*), with a 95% confidence interval of [44.9%, 49.0%].

When El Niño and volcanic activity are excluded from the analysis, the estimate is 44.8% (± 1.4%; 1*σ*), which is more in line with the commonly reported results. When we apply the same analysis to two alternative data sets, we obtain slightly higher estimates of the AF than those from the Global Carbon Project (Supplementary Table 2). The more recent 1992–2022 subsample yields an AF estimate of approximately 46% (± 1.0%; 1*σ*) for the Global Carbon Project data (Table 1) and slightly higher estimates for the two alternative data sets (Supplementary Table 3). To account for possible measurement errors in *G*_{t} and *E*_{t}, we report Deming regressions^{41}, which are in line with the results reported so far (Supplementary Table 4). Therefore, we may conclude that measurement error is not driving our results.

To summarize the theoretical findings in our study, we conclude that the ratio-based estimator of the AF suffers from three main shortcomings. First, due to its definition as the ratio of changes in atmospheric concentrations to emissions, means and variances do not exist if zero emissions are possible. While this is of no concern on the historical sample, it is important when analyzing the AF on net-zero emissions scenarios. Studies of the past AF^{4,5,6,7,8,9,10,15,16,17} are most likely not influenced to any substantial degree by this issue, but studies of future low-emission scenarios are affected^{18}. Second, stronger assumptions of Gaussianity on the distribution of the error process are necessary, compared to the case of the regression-based estimator, if a central limit theorem is to be invoked to compute confidence intervals and *p*-values based on the Gaussian distribution. Alternative methods to compute confidence intervals and *p*-values, such as the bootstrap^{42}, can also be used for this purpose. Again, studies on historical data are most likely not strongly affected by our findings, as Fig. 1d suggests Gaussian residuals. Third, the ratio-based estimator converges to the data-generating AF at a slower rate than the regression-based estimator, even if zero emissions are ruled out, and errors are assumed to be normal. Both estimators converge faster than the common \(\sqrt{T}\) rate due to the non-stationarity of the two yearly time series variables, emissions (*E*_{t}) and changes in atmospheric concentrations (*G*_{t}), and to their cointegration. The ratio-based estimator converges at rate *T* and the regression-based estimator at rate *T*^{3/2}.

The preferred regression-based estimator has standard statistical properties: its first and second moments exist, it is defined for zero emissions, and it converges to the data-generating AF at a fast rate. A central limit theorem applies without assuming the Gaussianity of the regression error, and confidence intervals and *p*-values can be computed in the usual way. Based on theoretical arguments, on a simulation study (Supplementary Methods and Supplementary Fig. 4), and on a historical sample of yearly data, we have shown that the regression-based estimator exhibits lower estimation uncertainty compared to the ratio-based estimator. Finally, we have argued that the regression-based estimator can readily be generalized to a time-varying AF specification with its estimation done by the Kalman filter and smoother.

Table 2 summarizes the statistical tests performed in this study. The main empirical findings from these tests are: (1) emissions and changes in atmospheric concentrations are trending over the historical sample 1959–2022, (2) emissions and changes in atmospheric concentrations cointegrate on the historical sample 1959–2022 with a constant regression coefficient, motivating the model choice for our theoretical studies, (3) the regression errors appear Gaussian, (4) the findings of this paper are qualitatively the same on a subsample of the last 31 years, and (5) measurement error in the two time series is not driving the results.

The main advantage of the regression-based approach for a historical sample analysis is increased precision. In future projections with emissions approaching zero, it remains valid, in contrast to a ratio-based approach. To illustrate this feature, we have simulated trajectories for emissions and changes in atmospheric concentrations over the period 2023–2100, which are consistent with SSP scenarios using the MAGICC reduced-complexity climate model. We regard these analyses as a first step and consider the use of climate projections from the Coupled Model Intercomparison Project (CMIP) as the next step in our research agenda.

## Methods

### Data used in the study

The time series data from the Global Carbon Project are available at https://www.icos-cp.eu/science-and-impact/global-carbon-budget/2023 (last accessed June 17, 2024). The variable \({E}_{t}^{FF}\) includes the cement carbonation sink, as described in Friedlingstein et al.^{27}. VAI data for volcanic activity are obtained from Ammann et al.^{43}. ENSO data are constructed from the Niño 3 SST Index of the National Oceanic and Atmospheric Administration (NOAA), available at https://psl.noaa.gov/gcos_wgsp/Timeseries/Data/nino3.long.anom.data (last accessed June 17, 2024). Specifically, we have converted monthly ENSO data into a yearly time series of September-August ENSO means^{44}. This 4-month lag provides the best fit between *G*_{t} and *ENSO*_{t}. The slight trend in yearly ENSO data is removed by taking deviations from a fitted linear trend so that it has no impact on the AF estimates. The data are shown in Supplementary Fig. 2. The data for the SSP scenarios can be run in MAGICC in a web browser via https://live.magicc.org/scenarios/bced417f-0f7f-4bb7-8359-792a0a8b0368/overview (last accessed June 17, 2024).
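
The construction of the yearly ENSO covariate can be sketched in Python as follows. The sketch rests on assumptions that are not confirmed by the text: the monthly data are held in a dictionary keyed by `(year, month)`, the September-August mean is assigned to the calendar year in which the window ends, and the function names `sep_aug_means` and `detrend_linear` are hypothetical. Parsing of the NOAA file is not shown.

```python
def sep_aug_means(monthly):
    """monthly: dict {(year, month): value}.
    Returns {year: mean over September(year - 1) .. August(year)};
    years with an incomplete 12-month window are skipped."""
    years = sorted({y for (y, _m) in monthly})
    out = {}
    for y in years:
        window = [(y - 1, m) for m in range(9, 13)] + [(y, m) for m in range(1, 9)]
        if all(key in monthly for key in window):
            out[y] = sum(monthly[key] for key in window) / 12.0
    return out


def detrend_linear(values):
    """Deviations from an OLS fit of values on a constant and a linear time index."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    slope = (
        sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


# Toy monthly data (assumed values): the value equals the month number.
toy = {(y, m): float(m) for y in (2000, 2001) for m in range(1, 13)}
yearly = sep_aug_means(toy)
print(yearly)
print(detrend_linear([1.0, 3.0, 5.0, 7.0]))
```

Detrending leaves a covariate with zero mean and no linear trend, so that it cannot absorb part of the trend shared by *G*_{t} and *E*_{t}.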

### Ratio-based and regression-based estimators of a constant airborne fraction

The ratio-based approach to estimating the AF takes its departure from the statistical model given by

\[
\frac{{G}_{t}}{{E}_{t}}=\alpha+{u}_{t}^{(1)}, \tag{1}
\]

where *G*_{t} are the yearly changes in atmospheric concentrations of CO_{2}, *E*_{t} are yearly CO_{2} emissions, the constant parameter *α* is the AF, and the disturbance \({u}_{t}^{(1)}\) is modeled as a zero-mean error process, for *t* = 1, 2, …, *T*, with *T* denoting the number of yearly observations in the sample. The disturbance \({u}_{t}^{(1)}\) captures deviations of the data *G*_{t}/*E*_{t} from the constant value *α* due to measurement errors and internal variability of the climate system. For the statistical model (1), it is straightforward to estimate the AF parameter *α* using the sample mean of the data *G*_{t}/*E*_{t}, yielding the ratio-based estimator given by

\[
{\hat{\alpha }}_{1}=\frac{1}{T}\sum_{t=1}^{T}\frac{{G}_{t}}{{E}_{t}}.
\]

The model in equation (1) expresses that, on average, the fraction *α* of emissions *E*_{t} remains in the atmosphere, resulting in atmospheric concentrations increasing by the amount *G*_{t}. An alternative way to express this association between *G*_{t} and *E*_{t} is through the model formulation

\[
{G}_{t}=\alpha \,{E}_{t}+{u}_{t}^{(2)}, \tag{2}
\]

for *t* = 1, 2, …, *T*, where the disturbance \({u}_{t}^{(2)}\) is a zero-mean error process. A model closely related to (2) has previously been used to reconstruct and predict CO_{2} growth rates^{39,45}. The relationship between the disturbances in equations (1) and (2) is given by \({u}_{t}^{(1)}={u}_{t}^{(2)}\,/\,{E}_{t}\). Cointegration of *G*_{t} and *E*_{t} implies that \({u}_{t}^{(2)}\) is a stationary process. Then, the parameter *α* can be estimated directly using a simple least-squares calculation, yielding the regression-based estimator as given by

\[
{\hat{\alpha }}_{2}=\frac{\sum_{t=1}^{T}{G}_{t}{E}_{t}}{\sum_{t=1}^{T}{E}_{t}^{2}}.
\]
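
The two constant-AF estimators (the sample mean of the yearly ratios and the least-squares slope through the origin) can be written in a few lines of Python. The function names and the toy numbers are assumptions made for illustration. The sketch also checks a simple algebraic identity: the regression-based estimate equals a weighted mean of the yearly ratios *G*_{t}/*E*_{t} with weights proportional to the squared emissions, so years with large emissions count more.

```python
def ratio_estimator(G, E):
    """Sample mean of the yearly ratios G_t / E_t."""
    return sum(g / e for g, e in zip(G, E)) / len(G)


def regression_estimator(G, E):
    """Least-squares slope of G_t on E_t without intercept."""
    return sum(g * e for g, e in zip(G, E)) / sum(e * e for e in E)


# Toy data (assumed values, in arbitrary units).
E = [4.0, 6.0, 8.0, 10.0]
G = [2.2, 2.6, 3.8, 4.4]

alpha_ratio = ratio_estimator(G, E)
alpha_reg = regression_estimator(G, E)

# Same regression estimate, written as an E_t^2-weighted mean of the ratios.
total = sum(e * e for e in E)
alpha_weighted = sum((e * e / total) * (g / e) for g, e in zip(G, E))

print(alpha_ratio, alpha_reg, alpha_weighted)
```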

### Including covariates

We may control for the effects of El Niño (*ENSO*_{t}) and volcanic activity (*VAI*_{t}) by introducing pertinent data into the models (1) and (2). We thus consider the models

\[
\frac{{G}_{t}}{{E}_{t}}=\alpha+{\tilde{\gamma }}_{1}\,ENS{O}_{t}+{\tilde{\gamma }}_{2}\,VA{I}_{t}+{u}_{t}^{(3)}, \tag{3}
\]

\[
{G}_{t}=\alpha \,{E}_{t}+{\gamma }_{1}\,ENS{O}_{t}+{\gamma }_{2}\,VA{I}_{t}+{u}_{t}^{(4)}, \tag{4}
\]

for *t* = 1, 2, …, *T*, where \({\tilde{\gamma }}_{i}\) and *γ*_{i}, for *i* = 1, 2, are regression coefficients, and the model errors \({u}_{t}^{(j)}\) follow a zero-mean error process, for *j* = 3, 4. For both models, the coefficients can be estimated using least-squares regression. We let \({\hat{\alpha }}_{3}\) and \({\hat{\alpha }}_{4}\) denote the least-squares estimators of *α* from models (3) and (4), respectively. The estimation results for the ENSO and VAI coefficients *γ*_{i} and \({\tilde{\gamma }}_{i}\), for *i* = 1, 2, are reported in Supplementary Table 2.
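
A minimal Python sketch of the least-squares fits with covariates is given below. It solves the normal equations directly with a small Gaussian-elimination routine so that it needs no external libraries. The helper names `solve_linear` and `ols`, the covariate values, and the coefficients used to generate the toy data are all assumptions for illustration, not estimates from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(columns, y):
    """Least squares of y on the given regressor columns (no implicit intercept)."""
    k = len(columns)
    XtX = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    Xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    return solve_linear(XtX, Xty)


# Noise-free toy data generated with assumed coefficients 0.47, 0.8 and -1.5.
E = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ENSO = [0.5, -0.3, 0.1, -0.6, 0.4, -0.1]
VAI = [0.0, 0.0, 1.0, 0.2, 0.0, 0.0]
G = [0.47 * e + 0.8 * n - 1.5 * v for e, n, v in zip(E, ENSO, VAI)]

# Regression-based model with covariates: G_t on E_t, ENSO_t and VAI_t.
alpha4, gamma1, gamma2 = ols([E, ENSO, VAI], G)

# Ratio-based model with covariates: G_t / E_t on a constant, ENSO_t and VAI_t.
ratio = [g / e for g, e in zip(G, E)]
alpha3, gamma1_tilde, gamma2_tilde = ols([[1.0] * len(E), ENSO, VAI], ratio)

print(alpha4, gamma1, gamma2)
print(alpha3, gamma1_tilde, gamma2_tilde)
```

Because the toy data are generated from the regression-based form without noise, that fit recovers the assumed coefficients exactly, whereas the ratio-based fit does not.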

### Ratio-based and regression-based estimators of a time-varying airborne fraction

In the case of a time-varying AF *α*_{t}, the ratio-based model can be written as

\[
\frac{{G}_{t}}{{E}_{t}}={\alpha }_{t}+{u}_{t}^{(1)},
\]

where *α*_{t} is a yearly time-varying coefficient, typically specified as a random walk process. The ratio *G*_{t}/*E*_{t} may be used to track the amount of emitted CO_{2} that remains airborne, and hence, the estimate \({\hat{\alpha }}_{1,t}={G}_{t}/{E}_{t}\) can be regarded as an appropriate but very noisy time-varying AF estimate. A possible way to reduce the noise in this AF estimate is to apply a local smoothing operation, e.g., a two-sided moving average filter. Filtered or not, the variability of the ratio *G*_{t}/*E*_{t} will be amplified when future emissions *E*_{t} start to approach zero. Another possible solution for noise reduction previously suggested in the literature is to use the cumulative AF (CAF) in place of the yearly AF^{19,20,21}. However, due to its cumulative nature, the CAF can be slow to detect changes in the behavior of the carbon sinks, making it less useful for the purpose of analyzing a time-varying AF (Supplementary Fig. 3).
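
The amplification of noise in the yearly ratio as emissions approach zero can be seen in a small deterministic Python example. All numbers are invented for illustration: the true AF is fixed at 0.5 and a constant disturbance of 0.05 is added to every *G*_{t}. The cumulative airborne fraction is computed here as the ratio of cumulated *G*_{t} to cumulated *E*_{t}, which is an assumption about its definition (the paper defines it in the Supplementary Methods).

```python
alpha_true = 0.5
noise = 0.05  # same small disturbance in every year (assumed)
E = [8.0, 4.0, 1.0, 0.1, 0.01, -0.01]  # emissions declining through zero
G = [alpha_true * e + noise for e in E]

# Yearly ratio-based estimate: the disturbance is divided by E_t.
ratios = [g / e for g, e in zip(G, E)]

# Cumulative airborne fraction: ratio of running sums.
caf, cum_g, cum_e = [], 0.0, 0.0
for g, e in zip(G, E):
    cum_g += g
    cum_e += e
    caf.append(cum_g / cum_e)

for e, r, c in zip(E, ratios, caf):
    print(f"E={e:6.2f}  G/E={r:8.4f}  CAF={c:7.4f}")
```

The yearly ratio moves far away from 0.5 and even changes sign once *E*_{t} is close to zero, while the CAF barely reacts, which illustrates both the instability of the former and the inertia of the latter.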

With a time-varying AF *α*_{t}, the regression-based model becomes

$${G}_{t}={\alpha }_{t}{E}_{t}+{u}_{t}^{(2)}. \tag{5}$$

A versatile way of treating such a time-varying regression model is to assume random walk dynamics for *α*_{t} and estimate it by means of a recursive regression filter, such as the Kalman filter and smoother^{46}. This approach yields the minimum mean-squared error estimator \({\hat{\alpha }}_{2,t}\), and it does not suffer from the deficiencies of the time-varying ratio-based estimator \({\hat{\alpha }}_{1,t}\). In the regression-based model (5), *α*_{t} is multiplied by *E*_{t}. When emissions turn negative for the first time in some year *t*, i.e., *E*_{t} < 0, we let *α*_{t} be reflected around one. For this purpose, we adjust the random walk specification for *α*_{t} at year *t* with a one-time instantaneous shift in *α*_{t} when *E*_{t} < 0 for the first time. Further motivation and technical detail on this procedure can be found in the Supplementary Methods.
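A minimal sketch of the filtering recursion for a regression with a random-walk coefficient is given below, assuming known disturbance variances and a large initial state variance as a stand-in for a diffuse prior. It covers the forward filter only: the smoother, the estimation of the variances, and the reflection of *α*_{t} when emissions turn negative are omitted. The function name, variance values, and synthetic data are hypothetical.

```python
import numpy as np

def kalman_filter_tv_slope(G, E, var_obs, var_state, a0=0.0, p0=1e6):
    """Filtered estimates of alpha_t in
         G_t         = alpha_t * E_t + u_t,    Var(u_t)   = var_obs
         alpha_{t+1} = alpha_t + eta_t,        Var(eta_t) = var_state
    Returns the filtered means and variances of alpha_t.
    """
    G, E = np.asarray(G, float), np.asarray(E, float)
    a, p = a0, p0                        # predicted state mean and variance
    means = np.empty(len(G))
    variances = np.empty(len(G))
    for t in range(len(G)):
        v = G[t] - E[t] * a              # one-step prediction error
        f = E[t] ** 2 * p + var_obs      # prediction error variance
        k = p * E[t] / f                 # Kalman gain
        a = a + k * v                    # filtered mean of alpha_t
        p = p * var_obs / f              # filtered variance (stable form)
        means[t], variances[t] = a, p
        p = p + var_state                # random-walk prediction step
    return means, variances

# Synthetic illustration only: hypothetical series and variances.
rng = np.random.default_rng(3)
T = 64
E = np.linspace(4.0, 11.0, T)
G = 0.47 * E + rng.normal(0.0, 0.8, T)
filtered, filt_var = kalman_filter_tv_slope(G, E, var_obs=0.64, var_state=1e-4)
```

With `var_state=0` the recursion reduces to recursive least squares, so the final filtered value approaches the constant-coefficient regression estimate.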

## Data availability

Data used in this study are publicly available at https://zenodo.org/records/13767769^{47}. Source data are also provided with this paper.

## Code availability

MATLAB code for replication of all results in the main paper and Supplementary Information can be found at https://zenodo.org/records/13767769^{47}.

## References

1. Bacastow, R. & Keeling, C. D. Atmospheric carbon dioxide and radiocarbon in the natural cycle: II. Changes from A.D. 1700 to 2070 as deduced from a geochemical model. *Brookhaven Symp. Biol.* **24**, 86–135 (1973).
2. Siegenthaler, U. & Oeschger, H. Predicting future atmospheric carbon dioxide levels. *Science* **199**, 388–395 (1978).
3. Gloor, M., Sarmiento, J. L. & Gruber, N. What can be learned about carbon cycle climate feedbacks from the CO_{2} airborne fraction? *Atmos. Chem. Phys.* **10**, 7739–7751 (2010).
4. Canadell, J. G. et al. Contributions to accelerating atmospheric CO_{2} growth from economic activity, carbon intensity, and efficiency of natural sinks. *Proc. Natl. Acad. Sci. USA* **104**, 18866–18870 (2007).
5. Raupach, M. R., Canadell, J. G. & Le Quéré, C. Anthropogenic and biophysical contributions to increasing atmospheric CO_{2} growth rate and airborne fraction. *Biogeosciences* **5**, 1601–1613 (2008).
6. Le Quéré, C. et al. Trends in the sources and sinks of carbon dioxide. *Nat. Geosci.* **2**, 831–836 (2009).
7. Knorr, W. Is the airborne fraction of anthropogenic CO_{2} emissions increasing? *Geophys. Res. Lett.* **36**, https://doi.org/10.1029/2009GL040613 (2009).
8. Ballantyne, A. P. et al. Audit of the global carbon budget: estimate errors and their impact on uptake uncertainty. *Biogeosciences* **12**, 2565–2584 (2015).
9. Raupach, M. R. et al. The declining uptake rate of atmospheric CO_{2} by land and ocean sinks. *Biogeosciences* **11**, 3453–3475 (2014).
10. Bennedsen, M., Hillebrand, E. & Koopman, S. J. Trend analysis of the airborne fraction and sink rate of anthropogenically released CO_{2}. *Biogeosciences* **16**, 3651–3663 (2019).
11. Canadell, J. G. et al. Global carbon and other biogeochemical cycles and feedbacks. In *Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change* (Cambridge Univ. Press, 2021).
12. Bennedsen, M., Hillebrand, E. & Koopman, S. J. On the evidence of a trend in the CO_{2} airborne fraction. *Nature* **616**, E1–E3 (2023).
13. Raupach, M. R. The exponential eigenmodes of the carbon-climate system, and their implications for ratios of responses to forcings. *Earth Syst. Dynam.* **4**, 31–49 (2013).
14. Bennedsen, M., Hillebrand, E. & Koopman, S. J. A multivariate dynamic statistical model of the global carbon budget 1959–2020. *J. R. Stat. Soc. A* **186**, 20–42 (2023).
15. Ballantyne, A. P. et al. Increase in observed net carbon dioxide uptake by land and oceans during the past 50 years. *Nature* **488**, 70–72 (2012).
16. Keenan, T. F. et al. Recent pause in the growth rate of atmospheric CO_{2} due to enhanced terrestrial carbon uptake. *Nat. Commun.* **7**, 1–10 (2016).
17. van Marle, M. J. E. et al. New land-use-change emissions suggest a declining CO_{2} airborne fraction. *Nature* **603**, 450–454 (2022).
18. Pressburger, L. et al. Quantifying airborne fraction trends and the destination of anthropogenic CO_{2} by tracking carbon flows in a simple climate model. *Environ. Res. Lett.* **18**, 5 (2023).
19. Jones, C. et al. Twenty-first-century compatible CO_{2} emissions and airborne fraction simulated by CMIP5 Earth system models under four representative concentration pathways. *J. Clim.* **26**, 4398–4413 (2013).
20. Jones, C. D. et al. Simulating the Earth system response to negative emissions. *Environ. Res. Lett.* **11**, 1–11 (2016).
21. Liddicoat, S. K. et al. Compatible fossil fuel CO_{2} emissions in the CMIP6 Earth system models’ historical and shared socioeconomic pathway experiments of the twenty-first century. *J. Clim.* **34**, 2853–2875 (2021).
22. Rockström, J. et al. A roadmap for rapid decarbonization. *Science* **355**, 1269–1271 (2017).
23. Riahi, K. et al. Mitigation pathways compatible with long-term goals. In *IPCC, 2022: Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change* (Cambridge Univ. Press, 2022).
24. Le Quéré, C. et al. Saturation of the southern ocean CO_{2} sink due to recent climate change. *Science* **316**, 1735–1738 (2007).
25. Canadell, J. G. et al. Saturation of the terrestrial carbon sink. In *Terrestrial Ecosystems in a Changing World*, 59–78 (Springer, 2007).
26. Friedlingstein, P. Carbon cycle feedbacks and future climate change. *Philos. Trans. R. Soc. A* **373**, 20140421 (2015).
27. Friedlingstein, P. et al. Global carbon budget 2023. *Earth Syst. Sci. Data* **15**, 5301–5369 (2023).
28. Meinshausen, M., Raper, S. C. B. & Wigley, T. M. L. Emulating coupled atmosphere-ocean and carbon cycle models with a simpler model, MAGICC6 – Part 1: Model description and calibration. *Atmos. Chem. Phys.* **11**, 1417–1456 (2011).
29. Granger, C. W. J. & Newbold, P. Spurious regression in econometrics. *J. Econom.* **2**, 111–120 (1974).
30. Hamilton, J. D. *Time Series Analysis* (Princeton University Press, 1994).
31. Kaufmann, R. K. & Stern, D. I. Cointegration analysis of hemispheric temperature relations. *J. Geophys. Res. Atmos.* **107**, D2 (2002).
32. Schmith, T., Johansen, S. & Thejll, P. Statistical analysis of global surface temperature and sea level using cointegration methods. *J. Clim.* **25**, 7822–7833 (2012).
33. Dickey, D. A. & Fuller, W. A. Distribution of the estimators for autoregressive time series with a unit root. *J. Am. Stat. Assoc.* **74**, 427–431 (1979).
34. Engle, R. F. & Granger, C. W. J. Co-integration and error correction: Representation, estimation, and testing. *Econometrica* **55**, 251–276 (1987).
35. Jarque, C. M. & Bera, A. K. A test for normality of observations and regression residuals. *Int. Stat. Rev.* **2**, 163–172 (1987).
36. Cochran, W. G. *Sampling Techniques* (Wiley, 3rd edn, 1977).
37. Deng, L.-Y. & Wu, C. F. J. Estimation of variance of the regression estimator. *J. Am. Stat. Assoc.* **82**, 568–576 (1987).
38. Newey, W. K. & West, K. D. A simple, positive semi-definite, heteroskedasticity and autocorrelation-consistent covariance matrix. *Econometrica* **55**, 703–708 (1987).
39. Betts, R. A. et al. El Niño and a record CO_{2} rise. *Nat. Clim. Change* **6**, 806–810 (2016).
40. Riahi, K. et al. The Shared Socioeconomic Pathways and their energy, land use, and greenhouse gas emissions implications: An overview. *Glob. Environ. Change* **42**, 153–168 (2017).
41. Deming, W. E. *Statistical Adjustments of Data* (Wiley, 1943).
42. Efron, B. & Tibshirani, R. *An Introduction to the Bootstrap* (Chapman & Hall/CRC, 1993).
43. Ammann, C. M., Meehl, G. A., Washington, W. M. & Zender, C. S. A monthly and latitudinally varying volcanic forcing dataset in simulations of 20th century climate. *Geophys. Res. Lett.* **30**, https://doi.org/10.1029/2003GL016875 (2003).
44. Jones, C. D. et al. The carbon cycle response to ENSO: A coupled climate–carbon cycle model study. *J. Clim.* **14**, 4113–4129 (2001).
45. Jones, C. D. & Cox, P. M. On the significance of atmospheric CO_{2} growth rate anomalies in 2002–2003. *Geophys. Res. Lett.* **32**, 14 (2005).
46. Durbin, J. & Koopman, S. J. *Time Series Analysis by State Space Methods* (Oxford Univ. Press, 2nd edn, 2012).
47. Bennedsen, M., Hillebrand, E. & Koopman, S. J. A regression-based approach to the CO_{2} airborne fraction. Zenodo, https://doi.org/10.5281/zenodo.13767769 (2024).

## Acknowledgements

We thank Morten Ø. Nielsen for helpful discussions regarding the convergence of stochastic processes. This work was supported by the Independent Research Fund Denmark (grant 0219-00001B to M.B.).

## Author information

### Contributions

M.B., E.H., and S.J.K. contributed equally to the paper.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Peer review

### Peer review information

*Nature Communications* thanks Chris Jones and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

## About this article

### Cite this article

Bennedsen, M., Hillebrand, E. & Koopman, S.J. A regression-based approach to the CO_{2} airborne fraction.
*Nat Commun* **15**, 8507 (2024). https://doi.org/10.1038/s41467-024-52728-1

DOI: https://doi.org/10.1038/s41467-024-52728-1
