Robust estimation of the carbon dioxide airborne fraction under measurement errors

Authors
Affiliation

Aalborg University

Charisios Grivas

Aalborg University

Abstract

This paper discusses the effect of measurement errors on the estimation of the carbon dioxide (CO_2) airborne fraction. We are the first to present regression-based estimates and standard errors that are robust to measurement errors for the extended model, the preferred specification for estimating the CO_2 airborne fraction. To achieve this goal, we add to the literature in three ways: i) we generalise Deming regression to handle multiple variables; ii) we introduce a bootstrap approach to construct confidence intervals for Deming regression in both univariate and multivariate settings; and iii) we propose estimating the airborne fraction using instrumental variables (IV), exploiting the variation in additional emissions measurements to obtain consistent estimates that are robust to measurement errors. IV estimates of the airborne fraction are 44.8% (± 1.4%; 1\sigma) for the simple specification and 47.3% (± 1.1%; 1\sigma) for the extended specification. We show that these estimates are not statistically different from the ordinary least squares (OLS) estimates, while being robust to measurement errors without relying on additional assumptions. In contrast, the OLS estimates fall outside the confidence intervals of the Deming regression estimates for some values of the measurement error variance ratio.

Keywords

airborne fraction, measurement errors, multivariate Deming regression, Deming regression inference, instrumental variables

1 CO_2 airborne fraction

The CO_2 airborne fraction is the portion of anthropogenic CO_2 emissions that remains in the atmosphere. It is an important quantity in the carbon cycle and plays a critical role in assessing how human activity influences the climate system. Consequently, it attracts significant scientific interest. Previous studies on its estimation include Canadell et al. (2007, 2023), Le Quéré et al. (2009), Raupach et al. (2014), and Bennedsen, Hillebrand, and Koopman (2019, 2023). Recently, Bennedsen, Hillebrand, and Koopman (2024) proposed a regression-based approach using ordinary least squares (OLS) to estimate the CO_2 airborne fraction. Their analysis shows that, under mild assumptions, regression has superior statistical properties compared with the conventional approach of computing the airborne fraction as a ratio. The authors report a point estimate of the airborne fraction over 1959–2022 of 47.4% (± 1.1%; 1\sigma) for their preferred specification, which includes additional covariates.

A statistical challenge not fully explored by Bennedsen, Hillebrand, and Koopman (2024) is the impact of measurement errors on OLS estimation. We show that measurement errors in emissions can bias OLS estimates of the CO_2 airborne fraction. To address this concern, Bennedsen, Hillebrand, and Koopman (2024) used Deming regression (Deming 1943) to obtain estimates robust to measurement errors. However, Deming regression relies on strong assumptions and presents several computational challenges.

On the one hand, Deming regression requires knowledge of the ratio of the measurement error variances. In fields such as chemistry, researchers may have a good understanding of this ratio, but such knowledge is rarely available for climate data. For example, Bennedsen, Hillebrand, and Koopman (2024) considered five different values for this parameter.

On the other hand, Deming regression assumes that the measurement errors follow a Gaussian distribution. This assumption is at odds with one of the main statistical properties of the OLS estimator advocated by Bennedsen, Hillebrand, and Koopman (2024). The authors motivated the use of OLS by the fact that the estimator satisfies a central limit theorem with a limiting Gaussian distribution, and the derivation does not require the error term to be Gaussian. This property is not shared by the classical ratio-based method, for which Gaussianity must be assumed for the estimator to have a Gaussian distribution.

Moreover, Deming regression presents additional computational challenges. First, closed-form expressions for the estimates in the multivariate setting are not available. This point is highly relevant because the preferred specification of Bennedsen, Hillebrand, and Koopman (2024) for the CO_2 airborne fraction includes additional covariates in a multivariate specification. Second, standard errors and confidence intervals for Deming regression estimates are not available in closed form. The authors report neither for their Deming regression estimates.

Motivated by these limitations, this article contributes to the existing literature in several ways. First, it extends Deming regression to address its computational challenges: we introduce a straightforward approach to obtain estimates in the multivariate case and propose using the bootstrap to compute standard errors and confidence intervals. Furthermore, we propose using instrumental variables (IV) (Reiersøl 1941; Durbin 1954; Sargan 1958) to estimate the CO_2 airborne fraction. IV is robust to measurement errors and has a limiting Gaussian distribution without relying on a Gaussian assumption for the errors or on knowledge of their variances. In addition, IV extends easily to the multivariate setting, and its standard errors and confidence intervals can be obtained analytically. Hence, IV addresses all of the technical challenges of Deming regression and is our preferred method for estimating the CO_2 airborne fraction.

Our point estimate of the CO_2 airborne fraction is 44.8%(± 1.4%; 1\sigma) for the simple specification, and 47.3%(± 1.1%; 1\sigma) for the extended specification. We show that these estimates are not statistically different from the OLS estimates while being robust to measurement errors without relying on additional assumptions. Notably, given the discussion above on the Deming regression, this article is the first to provide estimates robust to measurement errors for the extended specification with additional covariates.

2 Data

Figure 1 shows the data used in this study. All series cover the period 1959–2022 at an annual frequency.

Figure 1: Data used in this study. Top left: atmospheric CO_2 concentration. Top right: CO_2 emissions from anthropogenic sources. Bottom left: volcanic activity index. Bottom right: El Niño Southern Oscillation (ENSO) index and detrended ENSO index. Data sources are described in the text.

Data for atmospheric CO_2 [top left] and CO_2 emissions from anthropogenic sources [top right, solid blue] were obtained from the Global Carbon Project (Friedlingstein et al. 2023). Data for volcanic activity (VAI) [bottom left] were obtained from Ammann et al. (2003). El Niño Southern Oscillation (ENSO) data [bottom right, solid purple] were constructed by Bennedsen, Hillebrand, and Koopman (2024) from the El Niño 3 SST Index of the National Oceanic and Atmospheric Administration.

Note that Bennedsen, Hillebrand, and Koopman (2024) detrended the ENSO data [bottom right, dashed orange]. However, the trend is not statistically significant (p-value = 0.5464), so we use the original data for this study. Nevertheless, results using the detrended ENSO data are quite similar and can be recovered using the code in the supplementary material.

Additional data sources for emissions [top right, dashed orange and dotted green] are described in Section 3.2.

3 Methods

3.1 Linear models

Bennedsen, Hillebrand, and Koopman (2024) proposed estimating the airborne fraction by OLS in the linear model

G_{t}=\alpha E_{t}+u_{t}, \tag{1}

where \alpha is the CO_2 airborne fraction to be estimated, G_t are the changes in atmospheric CO_2, and E_t are anthropogenic CO_2 emissions. The error u_t is assumed to be a zero-mean process.

They also considered an extended specification that includes additional covariates to reduce the variance of the error process. The preferred specification of Bennedsen, Hillebrand, and Koopman (2024) controls for the effects of the El Niño Southern Oscillation and volcanic activity. Their extended specification is thus given by

G_{t}=\alpha E_{t} + \gamma_1 ENSO_t + \gamma_2 VAI_t +u_{t}, \tag{2}

where ENSO_t is the El Niño Southern Oscillation index, VAI_t is the volcanic activity index, and G_t, E_t, and u_t are as before.

For the rest of this paper, we will refer to Equation 1 and Equation 2 as the simple specification and extended specification, respectively.
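As a minimal sketch of how the two specifications can be fitted by OLS, the Python/NumPy snippet below estimates Equation 1 and Equation 2 without an intercept, as they are written above. The series `E`, `enso`, `vai`, and `G` and all parameter values are hypothetical synthetic stand-ins, not the Global Carbon Project data, and the helper `ols` is an illustrative name rather than part of the authors' Julia code.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on the columns of X (no intercept is added)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical annual series standing in for 1959-2022 (T = 64)
rng = np.random.default_rng(0)
T = 64
E = np.linspace(3.0, 11.0, T) + rng.normal(0.0, 0.3, T)  # emissions
enso = rng.normal(0.0, 1.0, T)                            # ENSO index
vai = np.abs(rng.normal(0.0, 0.05, T))                    # volcanic activity index
G = 0.45 * E + 0.3 * enso - 2.0 * vai + rng.normal(0.0, 0.5, T)

alpha_simple = ols(G, E)[0]                                          # Equation 1
alpha_ext, gamma1, gamma2 = ols(G, np.column_stack([E, enso, vai]))  # Equation 2
```

Both estimates land near the hypothetical value 0.45; the extended fit also returns estimates of \gamma_1 and \gamma_2.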

3.2 Measurement errors

The implications of measurement error can generally be divided into two cases according to their severity. The first case involves measurement error only in the dependent variable (G_t in Equation 1 and Equation 2). The second, which is of primary interest, involves measurement error in the independent variable (E_t in Equation 1 and Equation 2). As shown below, measurement error in the dependent variable does not bias OLS, whereas measurement error in the independent variable can have severe implications.

We begin with the case where the dependent variable is the only variable measured with error. In Equation 1, assume that the changes in atmospheric CO_2 are measured with error, so that instead of the true series G_t we observe G^{\star}_t, with G_t=G^{\star}_t+\omega_t, where \omega_t is the measurement error. Substituting and rearranging terms, we obtain

G^{\star}_{t}=\alpha E_{t}+u_{t}-\omega_t, \tag{3}

which can be estimated by OLS provided that the population orthogonality condition \mathbb{E}[E^\top(u-\omega)]=0 holds. Since the composite error u_t-\omega_t satisfies the same assumptions as u_t, OLS applied to Equation 3 still yields unbiased estimates. The only drawback is that the residuals have a larger variance because of the additional error term, compared with estimating Equation 1 using the true atmospheric CO_2 data.
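A small Monte Carlo sketch can illustrate this point. The Python/NumPy snippet below, using hypothetical parameter values and synthetic data, adds a classical error \omega_t to the dependent variable and compares the no-intercept OLS slope and residual variance with and without it. The variable names `G_true` and `G_obs` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, alpha = 100_000, 0.45                    # hypothetical values

E = rng.uniform(3.0, 11.0, T)               # emissions, measured without error
G_true = alpha * E + rng.normal(0.0, 0.5, T)
G_obs = G_true + rng.normal(0.0, 1.0, T)    # adds measurement error omega_t

def ols_slope(y, x):
    """No-intercept OLS slope, as in Equation 1."""
    return (x @ y) / (x @ x)

def resid_var(y, x):
    """Variance of the OLS residuals."""
    return np.var(y - ols_slope(y, x) * x)

a_true, a_obs = ols_slope(G_true, E), ols_slope(G_obs, E)
v_true, v_obs = resid_var(G_true, E), resid_var(G_obs, E)
```

Both slopes stay close to \alpha, while the residual variance rises by roughly the variance of \omega_t.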

Next, we focus on the case of measurement error in the explanatory variable. One problem with estimating the airborne fraction is that emissions data are potentially subject to measurement errors, particularly in the early years. Emissions are computed as E_t = E_t^{FF}+E_t^{LULCC}, where E_t^{FF} are fossil fuel emissions and E_t^{LULCC} are emissions from land-use and land-cover change (LULCC).

LULCC is measured using different methods. Figure 2 shows three different measurements for LULCC. The GCP LULCC data [solid blue] are from the Global Carbon Project (Friedlingstein et al. 2023), the H&C LULCC data [dashed orange] are from Houghton and Castanho (2022), and the vMa LULCC data [dotted green] are from Marle et al. (2022).

Figure 2: Different land-use and land-cover change (LULCC) datasets.

Measurement errors in LULCC translate into measurement errors in anthropogenic CO_2 emissions. Figure 1 shows the emissions measurements implied by each LULCC dataset. These differences suggest that emissions are measured with non-negligible uncertainty. We now show how measurement errors in emissions bias the OLS estimate of the CO_2 airborne fraction.

Assume that we do not observe the true emissions but rather a noisy version of them. That is, we observe E_{1,t} = E_t^* + \eta_t, where E_t^* are the true emissions and \eta_t is the measurement error, which we assume to have mean zero and variance \sigma^2_\eta. Estimating the CO_2 airborne fraction from noisy emissions data using OLS, we have:

\begin{aligned} \hat{\alpha} & = (E_1^\top E_1)^{-1}(E_1^\top G) = \frac{\sum_{t=1}^T (E_{t}^*G_t + \eta_t G_t)}{\sum_{t=1}^T (E_{t}^{*2}+2E_{t}^*\eta_t+\eta_t^2)}\\ &\rightarrow \frac{\frac{1}{T}\sum_{t=1}^T E_{t}^*G_t}{\frac{1}{T}\sum_{t=1}^T E_{t}^{*2}+\sigma_\eta^2} = \alpha \times \left(\frac{\frac{1}{T}\sum_{t=1}^T E_{t}^{*2}}{\frac{1}{T}\sum_{t=1}^T E_{t}^{*2}+\sigma_\eta^2}\right), \end{aligned} \tag{4}

where \alpha is the true value of the CO_2 airborne fraction. The arrow on the second line denotes convergence as the sample size T increases. Hence, the estimator does not converge to the true airborne fraction unless \sigma_\eta^2=0, that is, unless there are no measurement errors. Furthermore, assuming that the measurement errors are uncorrelated with the variables of interest, Equation 4 shows that the estimator is biased downward (towards zero). The bias increases with the variance of the measurement error, which is unknown.
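The limit in Equation 4 can be checked numerically. The Python/NumPy sketch below draws hypothetical true emissions E^*_t uniformly on [3, 11], adds measurement error with \sigma_\eta = 2, and compares the OLS estimate with the attenuated limit \alpha \cdot m_2/(m_2+\sigma_\eta^2), where m_2 is the sample mean of E_t^{*2}. All values are illustrative assumptions, not calibrated to the emissions data.

```python
import numpy as np

rng = np.random.default_rng(2)
T, alpha, sigma_eta = 200_000, 0.45, 2.0        # hypothetical values

E_star = rng.uniform(3.0, 11.0, T)              # true emissions E*_t
G = alpha * E_star + rng.normal(0.0, 0.5, T)
E1 = E_star + rng.normal(0.0, sigma_eta, T)     # observed E_{1,t} = E*_t + eta_t

alpha_ols = (E1 @ G) / (E1 @ E1)                # OLS on noisy emissions
m2 = np.mean(E_star**2)                         # (1/T) sum of E*_t^2
alpha_limit = alpha * m2 / (m2 + sigma_eta**2)  # limit in Equation 4
```

With these values the attenuation factor is roughly 54.3/58.3 ≈ 0.93, so OLS settles near 0.42 rather than 0.45.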

3.3 Deming regression

To alleviate potential bias, Bennedsen, Hillebrand, and Koopman (2024) used Deming regression (Deming 1943). As discussed in Section 1, Deming regression requires knowledge of the ratio of measurement error variances and the assumption that the errors are Gaussian.

Formally, assume that G_{t} and E_{t} are noisy measurements of the variables G^{\star}_{t} and E^{\star}_{t}, respectively, such that,

G_{t}=\alpha E^{\star}_{t} + \epsilon_{G,t}, \quad \epsilon_{G,t}\overset{iid}{\sim} \mathbb{N}(0,\sigma^{2}_{G}), \tag{5}
E_{t}=E^{\star}_{t}+\epsilon_{E,t}, \quad \epsilon_{E,t}\overset{iid}{\sim} \mathbb{N}(0,\sigma^{2}_{E}), \tag{6}

where iid stands for independent and identically distributed, \mathbb{N} denotes the Gaussian distribution, and \epsilon_{G,t} and \epsilon_{E,t} are assumed independent of one another. Let \delta=\sigma^{2}_{G}/\sigma^{2}_{E} be the ratio of the two measurement error variances. The Deming regression estimate of \alpha in the system defined by Equation 5 and Equation 6 is obtained by maximum likelihood and is given by

\hat{\alpha}_{Deming}=\frac{G^\top G-\delta E^\top E+\sqrt{(G^\top G-\delta E^\top E)^{2}+4\delta (E^\top G)^2}}{2E^\top G}, \tag{7}

which shows that the estimate directly depends on the chosen value for \delta.
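Equation 7 translates directly into code. The Python/NumPy function below is a minimal sketch of it for the no-intercept model; the name `deming_slope` and the synthetic data are illustrative assumptions. The example evaluates the estimate for the five values of \delta used later in the results.

```python
import numpy as np

def deming_slope(G, E, delta):
    """Deming estimate of alpha from Equation 7 (model without intercept).

    delta = sigma_G^2 / sigma_E^2 is the assumed ratio of measurement-error
    variances; it is not estimated and must be supplied by the user.
    """
    G, E = np.asarray(G, dtype=float), np.asarray(E, dtype=float)
    Sgg, See, Seg = G @ G, E @ E, E @ G
    d = Sgg - delta * See
    return (d + np.sqrt(d**2 + 4.0 * delta * Seg**2)) / (2.0 * Seg)

# Hypothetical noisy data: the estimate changes with the assumed delta
rng = np.random.default_rng(4)
E_star = rng.uniform(3.0, 11.0, 64)
G = 0.45 * E_star + rng.normal(0.0, 0.3, 64)
E = E_star + rng.normal(0.0, 0.5, 64)
estimates = {delta: deming_slope(G, E, delta) for delta in (0.2, 0.5, 1, 2, 5)}
alpha_ols = (E @ G) / (E @ E)
```

As \delta grows (errors in E become negligible relative to those in G), the estimate decreases towards the OLS slope E^\top G / E^\top E; as \delta shrinks, it moves towards the reverse-regression slope G^\top G / E^\top G.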

In addition to these theoretical hurdles, Deming regression presents some computational shortcomings. Estimates in the multivariate setting are not available in closed form. Furthermore, standard errors and confidence intervals for its estimates are not available analytically, even in the univariate specification. In the following, we develop methods to address these challenges.

3.3.1 Deming regression multivariate case

To the best of our knowledge, no closed-form solution has been developed for the multivariate setting. For instance, Bennedsen, Hillebrand, and Koopman (2024) did not include Deming regression results for their preferred specification given by Equation 2. Therefore, their analysis lacks results that are robust to measurement errors in the preferred specification. One aim of this paper is to address this shortcoming.

Consider the extended specification in Equation 2 where G_t and E_t are noisy measurements. We are interested in the system given by,

G_{t} =\alpha E^{\star}_{t} + \gamma_1 ENSO_t + \gamma_2 VAI_t +\epsilon_{G,t}, \quad \epsilon_{G,t}\overset{iid}{\sim} \mathbb{N}(0,\sigma^{2}_{G}), \tag{8}
E_{t}=E^{\star}_{t}+\epsilon_{E,t}, \quad \epsilon_{E,t}\overset{iid}{\sim} \mathbb{N}(0,\sigma^{2}_{E}), \tag{9}

where all terms are as before.

In the supplementary material, we show that the Deming regression estimate for a multivariate system as in Equation 8 and Equation 9 can be recovered from the univariate Deming regression using the Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963).

In the context of the CO_2 airborne fraction, the FWL theorem guarantees that the CO_2 airborne fraction estimator in the preferred specification defined in Equation 2 is the same as the airborne fraction estimator in the following specification:

(\mathbb{I}-P_W)G_t = \alpha (\mathbb{I}-P_W)E_t + (\mathbb{I}-P_W)u_{t}, \tag{10}

where \mathbb{I} is the identity matrix of dimension equal to the sample size, W is the matrix whose t-th row is W_t = [ENSO_t,\ VAI_t] and which therefore contains the additional covariates, and P_W = W(W^\top W)^{-1}W^\top is the projection matrix onto the column space of the additional covariates. That is, the FWL theorem shows that we can estimate the CO_2 airborne fraction using the residuals from regressing the changes in atmospheric CO_2 and the CO_2 emissions on the El Niño index and the volcanic activity index.

Hence, estimating Deming regression in the multivariate system, Equation 8 and Equation 9, is equivalent to using Deming regression in the system defined by Equation 10 and Equation 9. This novel theoretical result allows us to obtain Deming regression estimates robust to measurement errors for the preferred specification.

Note, however, that the Deming regression assumptions of Gaussian measurement errors and a known ratio of their variances are still required.
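The partialling-out step described above can be sketched in Python with NumPy. This is a minimal illustration on synthetic data, not the authors' Julia implementation: `deming_slope`, `residualize`, and `deming_extended` are hypothetical helper names, and \delta must still be chosen by the user. The residualisation computes (\mathbb{I}-P_W)v through a least-squares fit rather than forming the T \times T matrix P_W.

```python
import numpy as np

def deming_slope(G, E, delta):
    """Univariate Deming estimate (Equation 7), no intercept."""
    Sgg, See, Seg = G @ G, E @ E, E @ G
    d = Sgg - delta * See
    return (d + np.sqrt(d**2 + 4.0 * delta * Seg**2)) / (2.0 * Seg)

def residualize(v, W):
    """Return (I - P_W) v without forming the T x T projection matrix."""
    coef, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ coef

def deming_extended(G, E, W, delta):
    """Deming estimate of alpha in Equation 8 via the partialled-out system (Equation 10)."""
    return deming_slope(residualize(G, W), residualize(E, W), delta)

# Hypothetical data with two additional covariates in W
rng = np.random.default_rng(5)
T = 64
W = np.column_stack([rng.normal(0.0, 1.0, T), np.abs(rng.normal(0.0, 0.05, T))])
E_star = rng.uniform(3.0, 11.0, T)
G = 0.47 * E_star + W @ np.array([0.3, -2.0]) + rng.normal(0.0, 0.3, T)
E = E_star + rng.normal(0.0, 0.5, T)

alpha_deming = deming_extended(G, E, W, delta=1.0)
```

A quick sanity check of the partialling-out is that, for OLS, the slope on the residualised series reproduces the coefficient on E from the full multivariate regression, which is the FWL theorem itself.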

3.3.2 Deming regression inference

The bootstrap (Efron 1992) has become a major tool for approximating the sampling distribution and variance of complex statistics, owing to its ability to estimate such distributions when analytical solutions are unavailable. In addition, bootstrap methods often yield more accurate results than standard methods that rely on asymptotics, and they are therefore commonly used to improve the accuracy of confidence intervals.

For ease of exposition, we show how to employ a model-based bootstrap approach to calculate the confidence intervals of the Deming regression estimate, \hat{\alpha}_{Deming}, in the simple specification. The steps for the multivariate specification are analogous. The algorithm proceeds as follows:

  1. Estimate Equation 1 using the Deming regression formula shown in Equation 7 and obtain the residuals \hat{u}_{t} for t=1,\dots, T based on \hat{\alpha}_{Deming}. Let \tilde{u}_{t}=\hat{u}_{t}-\frac{1}{T}\sum_{t=1}^{T}\hat{u}_{t} be the recentred residuals.
  2. Sample the recentred residuals \tilde{u}_{t} randomly with replacement to generate the bootstrap pseudo-residuals \tilde{u}^{\ast}_{t}, and generate pseudo-data for G using the equation G^{\ast}_{t}=\hat{\alpha}_{Deming} E_{t}+\tilde{u}^{\ast}_{t}. \tag{11}
  3. Repeat the previous step B times, with B sufficiently large, re-estimating \alpha by Deming regression (Equation 7) on each pseudo-sample (G^{\ast}_{t}, E_{t}) to obtain independent copies \hat{\alpha}^{\ast}_{Deming,1},\dots,\hat{\alpha}^{\ast}_{Deming,B}.
  4. Calculate \mathrm{s.e.}(\hat{\alpha}_{Deming})=\sqrt{\frac{1}{B-1}\sum_{i=1}^{B}(\hat{\alpha}^{\ast}_{Deming,i}-\bar{\hat{\alpha}}^{\ast}_{Deming})^2}, where \bar{\hat{\alpha}}^{\ast}_{Deming}=\frac{1}{B}\sum_{i=1}^{B}\hat{\alpha}^{\ast}_{Deming,i}. The (1-\mathit{\alpha}) confidence interval is then obtained as \left[\hat{\alpha}_{Deming}- q^{\ast}(1-\mathit{\alpha}/2)\ \mathrm{s.e.}(\hat{\alpha}_{Deming}),\ \hat{\alpha}_{Deming}- q^{\ast}(\mathit{\alpha}/2)\ \mathrm{s.e.}(\hat{\alpha}_{Deming})\right], where \mathit{\alpha} is the significance level and q^{\ast}(1-\mathit{\alpha}/2) and q^{\ast}(\mathit{\alpha}/2) denote the 1-\mathit{\alpha}/2 and \mathit{\alpha}/2 percentiles of the standardised bootstrap estimates (\hat{\alpha}^{\ast}_{Deming,i}-\hat{\alpha}_{Deming})/\mathrm{s.e.}(\hat{\alpha}_{Deming}).

We use this bootstrap algorithm to obtain standard errors and confidence intervals for all Deming regression estimates considered in this study.
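The four steps can be sketched in Python/NumPy as follows. This is an illustrative sketch on synthetic data, not the code used for the tables: `bootstrap_deming` is a hypothetical name, B = 999 is chosen for speed, and the interval is built from the percentiles of the standardised bootstrap estimates (\hat{\alpha}^{\ast}_{Deming,i}-\hat{\alpha}_{Deming})/\mathrm{s.e.}(\hat{\alpha}_{Deming}), which is our assumed reading of step 4.

```python
import numpy as np

def deming_slope(G, E, delta):
    """Univariate Deming estimate (Equation 7), no intercept."""
    Sgg, See, Seg = G @ G, E @ E, E @ G
    d = Sgg - delta * See
    return (d + np.sqrt(d**2 + 4.0 * delta * Seg**2)) / (2.0 * Seg)

def bootstrap_deming(G, E, delta, B=999, level=0.95, seed=0):
    """Residual (model-based) bootstrap for the Deming estimate, following steps 1-4."""
    rng = np.random.default_rng(seed)
    a_hat = deming_slope(G, E, delta)
    u = G - a_hat * E                      # step 1: residuals
    u_tilde = u - u.mean()                 #         recentred residuals
    draws = np.empty(B)
    for b in range(B):
        u_star = rng.choice(u_tilde, size=u_tilde.size, replace=True)  # step 2
        G_star = a_hat * E + u_star                                     # Equation 11
        draws[b] = deming_slope(G_star, E, delta)                       # step 3
    se = draws.std(ddof=1)                                              # step 4
    z = (draws - a_hat) / se               # standardised bootstrap estimates
    q_lo, q_hi = np.quantile(z, [(1 - level) / 2, (1 + level) / 2])
    return a_hat, se, (a_hat - q_hi * se, a_hat - q_lo * se)

# Hypothetical data
rng = np.random.default_rng(6)
E_star = rng.uniform(3.0, 11.0, 64)
G = 0.45 * E_star + rng.normal(0.0, 0.5, 64)
E = E_star + rng.normal(0.0, 0.3, 64)
a_hat, se, ci = bootstrap_deming(G, E, delta=1.0, B=999)
```

The same routine applies to the extended specification if `G` and `E` are first replaced by their residuals after partialling out the additional covariates.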

3.4 Instrumental variable regression

This article proposes estimating the airborne fraction using instrumental variables to obtain robust estimates without assumptions beyond those of OLS. That is, IV is a robust method that requires neither the Gaussian assumption nor knowledge of the measurement error variances. The trade-off is that IV requires an instrument that is correlated with the emissions variable but uncorrelated with its measurement error. Identifying instruments is a challenging task in most settings. However, in the context of CO_2 airborne fraction estimation, we propose using the emissions measurements based on the different LULCC datasets as instruments. We show that these different measurements can be used to estimate the CO_2 airborne fraction consistently, even if all of these data sources are subject to measurement error.

Consider an additional emissions measurement, E_{2,t} = E_t^* + \kappa_t, where \kappa_t is a measurement error uncorrelated with \eta_t. Since the LULCC datasets come from different sources, it is reasonable to presume that their measurement errors are mutually uncorrelated. Using E_{2,t} as an instrument for E_{1,t}, the IV estimate of the airborne fraction is:

\begin{aligned} \hat{\alpha}_{IV} &= (E_2^\top E_1)^{-1}(E_2^\top G) = \frac{\sum_{t=1}^T E_{2,t}G_t}{\sum_{t=1}^T E_{2,t}E_{1,t}} \\ &= \frac{\sum_{t=1}^T (E_t^*G_t + \kappa_t G_t)}{\sum_{t=1}^T (E_t^{*2}+E_t^*(\eta_t+\kappa_t)+\eta_t\kappa_t)}\rightarrow\alpha. \end{aligned} \tag{12}

Hence, \hat{\alpha}_{IV} converges to the true airborne fraction despite the measurement error in the emissions variable; that is, it is consistent. Moreover, IV satisfies a central limit theorem with a limiting Gaussian distribution, even if the error terms are not Gaussian. This last property is shared by OLS; in fact, the proof for the IV case closely follows that of the OLS case.
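The contrast between Equation 4 and Equation 12 can be reproduced in a short simulation. The Python/NumPy sketch below uses hypothetical parameter values: two noisy emissions measurements share the same true series E^*_t but have independent errors, and the second one instruments the first.

```python
import numpy as np

rng = np.random.default_rng(3)
T, alpha = 200_000, 0.45                     # hypothetical values

E_star = rng.uniform(3.0, 11.0, T)           # true emissions E*_t
G = alpha * E_star + rng.normal(0.0, 0.5, T)
E1 = E_star + rng.normal(0.0, 2.0, T)        # E_{1,t} = E*_t + eta_t
E2 = E_star + rng.normal(0.0, 1.5, T)        # E_{2,t} = E*_t + kappa_t, kappa independent of eta

alpha_ols = (E1 @ G) / (E1 @ E1)             # attenuated, as in Equation 4
alpha_iv = (E2 @ G) / (E2 @ E1)              # Equation 12
```

OLS settles near 0.42, while the IV ratio stays close to 0.45 even though the instrument is itself measured with error.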

The instrumental variable estimator can be extended to the case where we have more than one instrument. In this case, we can use the generalised instrumental variable estimator (GIVE). Furthermore, the estimator can be easily applied to the extended model specification, which is the preferred specification in Bennedsen, Hillebrand, and Koopman (2024). Let X be the matrix of regressors that includes emissions and additional covariates and let Z denote the matrix of instruments. Then, the GIVE is computed as:

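\hat{\alpha}_{GIVE} = (X^\top P_Z X)^{-1}(X^\top P_Z G), \tag{13}

where P_Z = Z(Z^\top Z)^{-1}Z^\top is the projection matrix onto the space spanned by the instruments. In the extended specification, \hat{\alpha}_{GIVE} is the vector of estimates of (\alpha, \gamma_1, \gamma_2), and Z contains the additional covariates, which serve as their own instruments, together with the alternative emissions measurements.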

Like IV, the GIVE is consistent in the presence of measurement errors that are uncorrelated with the variables in the model, and it has a limiting Gaussian distribution. What is more, the variance of the GIVE has a closed-form expression given by

\mathrm{Var}\left[ {\hat{\alpha}_{GIVE}} \right] = \hat{\sigma}^2 (X^\top P_Z X)^{-1}, \tag{14}

where \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^T (G_t - X_t\hat{\alpha}_{GIVE})^2 and X_t denotes the t-th row of X.
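Equations 13 and 14 can be sketched in a few lines of Python/NumPy. The function name `give`, the synthetic data, and the choice of two extra noisy emissions measurements as instruments are illustrative assumptions; the covariates ENSO and VAI enter Z as their own instruments. P_Z X is obtained from a least-squares fit of X on Z instead of forming the T \times T matrix P_Z.

```python
import numpy as np

def give(G, X, Z):
    """GIVE estimates (Equation 13) and covariance matrix (Equation 14)."""
    # P_Z X via a least-squares fit of X on Z, avoiding the T x T matrix P_Z
    PzX = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    A = X.T @ PzX                                # X' P_Z X
    coef = np.linalg.solve(A, PzX.T @ G)         # (X' P_Z X)^{-1} X' P_Z G
    resid = G - X @ coef
    sigma2 = resid @ resid / len(G)              # sigma-hat^2 with divisor T
    return coef, sigma2 * np.linalg.inv(A)

# Hypothetical data: two extra noisy emissions measurements serve as instruments
rng = np.random.default_rng(7)
T = 50_000
E_star = rng.uniform(3.0, 11.0, T)
enso = rng.normal(0.0, 1.0, T)
vai = np.abs(rng.normal(0.0, 0.05, T))
G = 0.47 * E_star + 0.3 * enso - 2.0 * vai + rng.normal(0.0, 0.5, T)
E1, E2, E3 = (E_star + rng.normal(0.0, s, T) for s in (2.0, 1.5, 1.0))

X = np.column_stack([E1, enso, vai])            # regressors
Z = np.column_stack([E2, E3, enso, vai])        # covariates instrument themselves
coef, cov = give(G, X, Z)
alpha_give, se_alpha = coef[0], np.sqrt(cov[0, 0])
```

Setting Z = X reduces the estimator to OLS, which is a convenient check of the implementation.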

For a textbook treatment of IV and GIVE, see Davidson and MacKinnon (2004). Given all its statistical and computational advantages, GIVE is our preferred estimation method. To simplify notation, we denote both IV and GIVE as IV for the rest of this article.

4 Results

Table 1 shows the estimates of the CO_2 airborne fraction for the full sample (1959–2022) using the LULCC measurements as instruments. Three cases are considered: i) using H&C LULCC as an instrument, ii) using vMa LULCC as an instrument, and iii) using both H&C and vMa LULCC as instruments. The results for the simple specification are presented in panel (a), and those for the extended specification in panel (b). As discussed in Section 2, we use the ENSO data without detrending, since there is no statistical evidence of a linear trend; the results using the detrended ENSO data are quite similar. Table 1 also presents the OLS and Deming regression results for both specifications.

Table 1: Least-squares and instrumental variables estimates of the simple specification (panel a) and the extended specification including additional covariates (panel b) for the full sample (1959–2022). The 95% confidence intervals for the OLS and IV regressions are based on the Gaussian distribution. The Deming regression estimates were obtained using \delta \in \{0.2, 0.5, 1, 2, 5\}. Standard errors and confidence intervals for Deming regression are computed using 9999 bootstrap replications.
(a) Full sample
Simple specification Est. S.e. Confidence int.
OLS regression 0.4478 0.0142 [0.4199, 0.4757]
IV reg. (H&C) 0.4479 0.0143 [0.4200, 0.4758]
IV reg. (vMa) 0.4482 0.0143 [0.4202, 0.4761]
IV reg. (H&C-vMa) 0.4476 0.0142 [0.4197, 0.4756]
Deming reg. (\delta = 0.2) 0.4623 0.0149 [0.4548, 0.4690]
Deming reg. (\delta = 0.5) 0.4561 0.0146 [0.4489, 0.4624]
Deming reg. (\delta = 1) 0.4526 0.0146 [0.4455, 0.4589]
Deming reg. (\delta = 2) 0.4504 0.0145 [0.4434, 0.4566]
Deming reg. (\delta = 5) 0.4489 0.0142 [0.4421, 0.4549]
(b) Additional covariates, full sample
Extended specification Est. S.e. Confidence int.
OLS regression 0.4735 0.0108 [0.4522, 0.4947]
IV reg. (H&C) 0.4727 0.0108 [0.4514, 0.4939]
IV reg. (vMa) 0.4723 0.0109 [0.4511, 0.4936]
IV reg. (H&C-vMa) 0.4730 0.0109 [0.4515, 0.4944]
Deming reg. (\delta = 0.2) 0.4815 0.0107 [0.4760, 0.4865]
Deming reg. (\delta = 0.5) 0.4782 0.0107 [0.4728, 0.4831]
Deming reg. (\delta = 1) 0.4762 0.0106 [0.4698, 0.4811]
Deming reg. (\delta = 2) 0.4750 0.0105 [0.4698, 0.4798]
Deming reg. (\delta = 5) 0.4741 0.0107 [0.4688, 0.4789]

Several remarks are in order regarding the results for the simple specification (Equation 1) in Table 1 (a). First, the estimates are centred around 45%. Second, the differences between the OLS and IV estimates are small: the OLS estimate is 44.78%, while the IV estimates range from 44.76% to 44.82%. Additional tests in the supplementary material show that these estimates are not statistically different (p-values between 0.2672 and 0.8489). In contrast, the Deming regression estimates, which Bennedsen, Hillebrand, and Koopman (2024) used as a procedure robust to measurement error, vary from 44.89% to 46.23% depending on the selected measurement error variance ratio (\delta in Equation 7).

Next, Table 1 (b) reports the results for the extended specification (Equation 2), which includes ENSO and VAI. Note that Bennedsen, Hillebrand, and Koopman (2024) do not report Deming regression results for this specification. The OLS estimate is 47.35%, while the IV estimates are 47.27%, 47.23%, and 47.30%, depending on the instruments used. In contrast, the Deming estimates range from 47.41% to 48.15%. Hence, the Deming regression estimates show more variability, owing to their dependence on the unknown measurement error variance ratio.

An important contribution of these results is that we can now quantify the uncertainty of the Deming regression estimates. Notably, the OLS estimates fall outside the confidence intervals of the Deming regression estimates for \delta=0.2 and \delta=0.5 in the simple specification, and for \delta=0.2 in the extended specification. In contrast, the OLS estimate always falls within the confidence intervals of the IV estimates. These results show the sensitivity of Deming regression to the choice of \delta and support our preference for IV estimation.

Table 2 reports the results for the subsample starting in 1992 as a robustness check. In general, the smaller sample size leads to larger standard errors.

Table 2: Least-squares and instrumental variables estimates of the simple specification (panel a) and the extended specification including additional covariates (panel b) for the recent subsample (1992–2022). The 95% confidence intervals for the OLS and IV regressions are based on the Gaussian distribution. The Deming regression estimates were obtained using \delta \in \{0.2, 0.5, 1, 2, 5\}. Standard errors and confidence intervals for Deming regression are computed using 9999 bootstrap replications.
(a) Recent sample
Simple specification Est. S.e. Confidence int.
OLS regression 0.4497 0.0173 [0.4157, 0.4836]
IV reg. (H&C) 0.4496 0.0173 [0.4157, 0.4836]
IV reg. (vMa) 0.4502 0.0245 [0.4022, 0.4982]
IV reg. (H&C-vMa) 0.4495 0.0173 [0.4156, 0.4834]
Deming reg. (\delta = 0.2) 0.4598 0.0176 [0.4509, 0.4675]
Deming reg. (\delta = 0.5) 0.4555 0.0175 [0.4468, 0.4630]
Deming reg. (\delta = 1) 0.4531 0.0172 [0.4446, 0.4603]
Deming reg. (\delta = 2) 0.4515 0.0172 [0.4431, 0.4587]
Deming reg. (\delta = 5) 0.4504 0.0171 [0.4422, 0.4576]
(b) Additional covariates, recent sample
Extended specification Est. S.e. Confidence int.
OLS regression 0.4622 0.0112 [0.4402, 0.4843]
IV reg. (H&C) 0.4622 0.0112 [0.4402, 0.4842]
IV reg. (vMa) 0.4623 0.0112 [0.4402, 0.4843]
IV reg. (H&C-vMa) 0.4622 0.0112 [0.4401, 0.4842]
Deming reg. (\delta = 0.2) 0.4662 0.0105 [0.4611, 0.4709]
Deming reg. (\delta = 0.5) 0.4645 0.0106 [0.4594, 0.4692]
Deming reg. (\delta = 1) 0.4636 0.0107 [0.4584, 0.4683]
Deming reg. (\delta = 2) 0.4630 0.0107 [0.4578, 0.4677]
Deming reg. (\delta = 5) 0.4625 0.0106 [0.4574, 0.4672]

For the simple specification, the results shown in Table 2 (a) closely align with those for the full sample: the OLS and IV estimates are similar. Deming regression produces higher estimates that decrease as \delta increases. As in the full sample, the OLS estimate falls outside the confidence interval of the Deming regression for \delta=0.2.

The extended specification for the subsample, shown in Table 2 (b), yields lower estimates than for the full sample. The OLS estimate is 46.22%, while the IV estimates are 46.22% and 46.23%, depending on the instruments used. The Deming regression estimates range from 46.25% to 46.62%. In this case, in part due to the larger standard errors, the OLS estimate falls within the confidence intervals of the Deming regression.

5 Discussion

Estimates in the literature suggest that the CO_2 airborne fraction has fluctuated around a constant value over the period 1959–2022. The consensus estimate of the CO_2 airborne fraction is around 44%. Bennedsen, Hillebrand, and Koopman (2024), using a regression-based approach in an extended specification, found this parameter to be around 47%. In this article, we examine the impact of measurement errors on the estimation of the CO_2 airborne fraction. Anticipating possible bias from measurement errors, Bennedsen, Hillebrand, and Koopman (2024) used Deming regression. However, they applied Deming regression only in the simple specification and computed no standard errors or confidence intervals. Furthermore, Deming regression relies on strong assumptions about the measurement errors.

To alleviate these shortcomings, we develop a method to estimate Deming regression in the extended specification. Furthermore, we show how to use the bootstrap to obtain standard errors and confidence intervals. We are the first to obtain estimates, standard errors, and confidence intervals for the multivariate Deming regression specification.

Given the strong assumptions required for Deming regression, we propose using IV and show how to use different emissions measurements as instruments. Our results show that IV provides estimates of the CO_2 airborne fraction that are close to the OLS estimates. However, unlike OLS, IV remains consistent in the presence of measurement errors. Moreover, in contrast to Deming regression, IV has closed-form expressions for its standard errors. Hence, our results add robustness to the estimation of the CO_2 airborne fraction in the presence of measurement errors.

Reproducibility statement

All the code and data used in this analysis can be accessed in the corresponding author’s GitHub repository at: github.com/everval/Robust-CO2-Estimation-Supplementary.

The authors believe that open source code is essential for inclusive research, particularly in the context of climate change, which is a global challenge that affects all countries, especially developing countries. Hence, all the results presented in this paper are obtained using Julia (Bezanson et al. 2017), a free open source programming language.

Acknowledgements

The authors thank Mikkel Bennedsen, Eric Hillebrand, and Olivia Kvist for useful comments and suggestions to an earlier draft.

References

Ammann, Caspar M, Gerald A Meehl, Warren M Washington, and Charles S Zender. 2003. “A Monthly and Latitudinally Varying Volcanic Forcing Dataset in Simulations of 20th Century Climate.” Geophysical Research Letters 30 (12).
Bennedsen, Mikkel, Eric Hillebrand, and Siem Jan Koopman. 2019. “Trend Analysis of the Airborne Fraction and Sink Rate of Anthropogenically Released CO2.” Biogeosciences 16 (18): 3651–63.
———. 2023. “On the Evidence of a Trend in the CO2 Airborne Fraction.” Nature 616 (7956): E1–3.
———. 2024. “A Regression-Based Approach to the CO2 Airborne Fraction.” Nature Communications 15 (1): 8507. https://doi.org/10.1038/s41467-024-52728-1.
Bezanson, Jeff et al. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98. https://doi.org/10.1137/141000671.
Canadell, Josep G et al. 2007. “Contributions to Accelerating Atmospheric CO2 Growth from Economic Activity, Carbon Intensity, and Efficiency of Natural Sinks.” Proceedings of the National Academy of Sciences 104 (47): 18866–70.
Canadell, Josep G, Pedro MS Monteiro, Marcos H Costa, et al. 2023. “Intergovernmental Panel on Climate Change (IPCC). Global Carbon and Other Biogeochemical Cycles and Feedbacks.” In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, 673–816. Cambridge University Press.
Davidson, R, and James G MacKinnon. 2004. Econometric Theory and Methods. Oxford University Press.
Deming, William Edwards. 1943. Statistical Adjustment of Data. Wiley.
Durbin, James. 1954. “Errors in Variables.” Revue de l’institut International de Statistique, 23–32.
Efron, Bradley. 1992. “Bootstrap Methods: Another Look at the Jackknife.” In Breakthroughs in Statistics: Methodology and Distribution, 569–93. Springer.
Friedlingstein, Pierre, Michael O’Sullivan, Matthew W Jones, et al. 2023. “Global Carbon Budget 2023.” Earth System Science Data 15 (12): 5301–69.
Frisch, Ragnar, and Frederick V Waugh. 1933. “Partial Time Regressions as Compared with Individual Trends.” Econometrica, 387–401.
Houghton, Richard A, and Andrea Castanho. 2022. “Annual Emissions of Carbon from Land Use, Land-Use Change, and Forestry 1850–2020.” Earth System Science Data Discussions 2022: 1–36.
Le Quéré, C. et al. 2009. “Trends in the Sources and Sinks of Carbon Dioxide.” Nature Geoscience 2 (12): 831–36.
Lovell, Michael C. 1963. “Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis.” Journal of the American Statistical Association 58 (304): 993–1010.
Marle, Margreet JE van, Dave van Wees, Richard A Houghton, Robert D Field, Jan Verbesselt, and Guido R van der Werf. 2022. “RETRACTED ARTICLE: New Land-Use-Change Emissions Indicate a Declining CO2 Airborne Fraction.” Nature 603 (7901): 450–54.
Raupach, Michael R et al. 2014. “The Declining Uptake Rate of Atmospheric CO2 by Land and Ocean Sinks.” Biogeosciences 11 (13): 3453–75.
Reiersøl, Olav. 1941. “Confluence Analysis by Means of Lag Moments and Other Methods of Confluence Analysis.” Econometrica, 1–24.
Sargan, John D. 1958. “The Estimation of Economic Relationships Using Instrumental Variables.” Econometrica, 393–415.