Note

This is a notebook accompanying the paper Robust Estimation of Carbon Dioxide Airborne Fraction Under Measurement Errors. The notebook includes the proofs of the theoretical results presented in the paper. Moreover, the notebook develops the code used to estimate the airborne fraction using instrumental variables, a robust method to correct for measurement errors. We also extend Deming regression to estimate the airborne fraction and its uncertainty using the bootstrap method. The notebook includes the code to reproduce all figures and tables in the paper.

The up-to-date version of this notebook is available at https://github.com/everval/Robust-CO2-Estimation-Supplementary.

Deming regression with additional covariates

Deming regression is a generalisation of OLS that accounts for measurement errors in the independent variable and the dependent variable. Let G_t, and E_t be the noisy atmospheric growth and emissions variables measured with errors.

Deming regression poses the following model:

E_t = E_t^* + \eta_t,

G_t = \alpha E_t^* + \omega_t,

where E_t^* are the true emissions, and \eta_t and \omega_t are the measurement errors. Assuming that the measurement errors are Gaussian with variances given by \sigma_{\eta}^2, and \delta\sigma_{\eta}^2, respectively, the Deming regression estimator is given as the maximum likelihood estimator. Note that the notation implies that the ratio of the variances of the measurement errors is \delta.

In this section, we extend the Deming regression to include additional covariates. We consider the following model:

E_t = E_t^* + \eta_t, \tag{1}

G_t = \alpha E_t + Z_t\gamma + \omega_t, \tag{2}

where Z_t is a row vector of additional covariates at time t, \gamma is a vector of coefficients, and the rest of the notation is as before. In this paper, we consider the El Niño index and the volcanic activity index as additional covariates, but the model can be extended to include other covariates. Assuming that the measurement errors are Gaussian, the Deming regression estimator is obtained as maximum likelihood estimator.

Frisch-Waugh-Lovell theorem

In the context of the airborne fraction, the Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963) can be used to estimate the Deming regression in the preferred specification including the El Niño index and the volcanic activity index. The FWL theorem guarantees that the airborne fraction estimator, \hat{\alpha}, in the preferred specification given by:

G_t = \alpha E_t + \gamma_1 ENSO_t + \gamma_2 VAI_t + u_{t}

is the same as the airborne fraction estimator in the following specification:

(\mathbb{I}-P_Z)G_t = \alpha (\mathbb{I}-P_Z)E_t + (\mathbb{I}-P_Z)u_{t}, \tag{3}

where Z_t = [ENSO_t,\ VAI_t], and P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z. Hence, we can estimate the airborne fraction using the residuals from regressing the atmospheric CO_2 concentration and emissions on the El Niño index and the volcanic activity index.

In the following, we show that estimating Equation 3 using Deming regression is equivalent to estimating the model specified by Equation 1 and Equation 2.

We have the following theorem:

Theorem 1 (Deming regression with additional covariates) The Deming regression estimator in the model specified by Equation 1 and Equation 2 is equivalent to the Deming regression in the model specified by Equation 1 and Equation 3. That is, the model where the FWL theorem has been applied.

Proof. The likelihood function for the model specified by Equation 1 and Equation 2 is given by:

L = \prod_{t=1}^T \frac{1}{\sqrt{2\pi\sigma_{\eta}^2}}\exp\left(-\frac{1}{2\sigma_{\eta}^2}\left(E_t - E_t^*\right)^2\right)\frac{1}{\sqrt{2\pi\lambda\sigma_{\eta}^2}}\exp\left(-\frac{1}{2\lambda\sigma_{\eta}^2}\left(G_t - \alpha E_t - Z_t\gamma\right)^2\right).

And the log-likelihood function is given by:

\begin{aligned} \mathcal{L} = \log L &= -\frac{T}{2}\log(2\pi\sigma_{\eta}^2) - \frac{1}{2\sigma_{\eta}^2}\sum_{t=1}^T\left(E_t - E_t^*\right)^2 \\ &- \frac{T}{2}\log(2\pi\lambda\sigma_{\eta}^2) - \frac{1}{2\lambda\sigma_{\eta}^2}\sum_{t=1}^T\left(G_t - \alpha E_t - Z_t\gamma\right)^2. \end{aligned}

Differentiating the log-likelihood and setting the derivatives to zero, we obtain the Deming regression estimator.

In the following, we solve for \gamma and replace it in the equations for \alpha and E_t^* to show that they solve the Deming regression equations in the model with additional covariates where the FWL theorem has been applied; that is, the model specified by Equation 1 and Equation 3.

Solving for \gamma

Rewriting the log-likelihood function using matrix notation, we have:

\begin{aligned} \mathcal{L} &= -\frac{T}{2}\log(2\pi\sigma_{\eta}^2) - \frac{1}{2\sigma_{\eta}^2}\left(E - E^*\right)'(E - E^*) \\ &- \frac{T}{2}\log(2\pi\lambda\sigma_{\eta}^2) - \frac{1}{2\lambda\sigma_{\eta}^2}\left(G - \alpha E - Z\gamma\right)'(G - \alpha E - Z\gamma). \end{aligned}

Hence, the derivative of the log-likelihood with respect to \gamma is given by:

\frac{\partial \mathcal{L}}{\partial\gamma} = \frac{1}{2\lambda\sigma_{\eta}^2}\left(2Z'G-2\alpha Z'E^* - 2 Z'Z\gamma \right)

Equating the derivative to zero, we obtain:

Z'G - \alpha Z'E^* - Z'Z\gamma = 0,

which implies that:

\hat{\gamma} = (Z'Z)^{-1}Z'(G - \alpha E^*). \tag{4}

Solving for \alpha

The derivative of the log-likelihood with respect to \alpha is given by:

\frac{\partial \mathcal{L}}{\partial\alpha} = \frac{1}{2\lambda\sigma_{\eta}^2}\left(2E^{*'}G - 2\alpha E^{*'}E^* - 2 E^{*'} Z\gamma \right)

Equating the derivative to zero, we obtain:

E^{*'}\left(G - \alpha E^* - Z\gamma \right) = 0,

Replacing \gamma from Equation 4, we obtain:

E^{*'}\left( (\mathbb{I} - Z(Z'Z)^{-1}Z' ) (G - \alpha E^*) \right) = 0,

which implies that \hat{\alpha} solves:

E^{*'}\left( (\mathbb{I} - P_Z)G - \alpha (\mathbb{I} - P_Z)E^* \right) = 0, \tag{5}

where P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z.

Solving for E^*

The derivative of the log-likelihood with respect to E^*_t is given by:

\frac{\partial \mathcal{L}}{\partial E^*_t} = -\frac{2}{2\sigma_{\eta}^2}\left(E_t - E^*_t\right) - \frac{2\alpha}{2\lambda\sigma_{\eta}^2}\left(G_t -\alpha E_t^* - Z_t\gamma\right)

Equating the derivative to zero, we obtain:

\lambda (E_t - E^*_t) + \alpha\left(G_t - \alpha E_t^* - Z_t\gamma\right) = 0.

Replacing \gamma from Equation 4, we obtain:

\lambda (E_t - E^*_t) + \alpha\left( (\mathbb{I} - P_Z) G_t - \alpha (\mathbb{I} - P_Z) E_t^* \right) = 0, \tag{6}

where P_Z is as before.

Comparison with the FWL theorem and additional covariates

Solving the Deming regression with additional covariates implies that estimates for \alpha and E^* are obtained as the solutions to Equation 5 and Equation 6. Analogous steps to the ones above show that these equations are the same as the equations obtained from solving the Deming regression in the model specified by Equation 1 and Equation 3. This implies that the Deming regression with additional covariates is equivalent to the Deming regression in the model specified by Equation 3, where the FWL theorem has been applied. \square

Reproducing the results

Setup

This notebook is written in Julia and uses the following packages:

DataFrames for data manipulation
XLSX for reading data from an Excel file
Plots
Statistics
Distributions

All packages are available in the Julia registry and can be installed using the Julia package manager with the following command:

using Pkg
Pkg.add("DataFrames", "XLSX", "Plots", "Statistics", "Distributions")

In the following, we load a proejct environment that contains the necessary packages. This step is not required if the packages are already installed in the current environment.

In [50]:

using Pkg
Pkg.activate(pwd())
Pkg.instantiate()

  Activating project at `~/Library/CloudStorage/OneDrive-AalborgUniversitet/Research/CLIMATE/AirborneFraction/QuartoVersion`

Airborne fraction

The airborne fraction is the fraction of CO_2 emissions that remain in the atmosphere. It is a key parameter in the carbon cycle and is used to estimate the impact of human activities on the climate system. The airborne fraction is defined as the ratio of the increase in atmospheric CO_2 concentration to the total CO_2 emissions.

Data

We load the data, which is neatly collected in an Excel file in the author’s GitHub repository at the following link.

To ease things up, we have downloaded the data directly from the repository and saved it in the file AF_data.xlsx in the local folder.

In [51]:

using DataFrames, XLSX

path = "AF_data.xlsx"

data = DataFrame(XLSX.readtable(path, "Data"))

year = data[!, 1];
fossilfuels = Vector{Float64}(data[!, 4]);
lulcc = Vector{Float64}(data[!, 6]);
emissions = fossilfuels .+ lulcc;
coverage = Vector{Float64}(data[!, 5]);
VAI = Vector{Float64}(data[!, 9]);
ENSO = Vector{Float64}(data[!, 10]);
E = emissions;
G = coverage;

# Other metrics
lulcc₂ = Vector{Float64}(data[!, 7]);
lulcc₃ = Vector{Float64}(data[!, 8]);

E₂ = fossilfuels .+ lulcc₂;
E₃ = fossilfuels .+ lulcc₃;

Plotting the data

In [52]:

using Plots

l = @layout [a b; c d]
p1 = plot(year, G, label="Atmospheric concentration", xlabel="Year", ylabel="GtC/yr", title="", style=:solid, linewidth=2, color=1)
p2 = plot(year, E, label="Emissions", xlabel="Year", ylabel="GtC/yr", title="", style=:dash, linewidth=2, color=2)
p3= plot(year, VAI, label="Volcanic activity index (VAI)", xlabel="Year", ylabel="", title="", style=:dot, linewidth=2, color=3)
p4 = plot(year, ENSO, label="El Niño southern oscillation (ENSO)", xlabel="Year", ylabel="", title="", style=:dashdot, linewidth=2, color=4)
all_plot = plot(p1, p2, p3, p4, layout = l, fontfamily="Computer Modern", legendfontsize=10, tickfontsize=10, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=10, xlabelfontsize=10, titlefontsize=12)

display(all_plot)

Figure 1: Plots of the variables of interest

In [53]:

using Plots

p1 = plot(year, G, label="Atmospheric concentration", xlabel="Year", ylabel="GtC/yr", title="", style=:solid, linewidth=2, color=1)
plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16)
display(p1)

p3 = plot(year, VAI, label="Volcanic activity index (VAI)", xlabel="Year", ylabel="", title="", style=:dot, linewidth=2, color=3)
plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16)
display(p3)

CO_2 atmospheric concentration [top] and volcanic activity index [bottom]

Unit root tests

We first test for the presence of a unit root in the time series of emissions and atmospheric concentrations. We use the augmented Dickey-Fuller test with a constant and a trend. The null hypothesis is that the time series has a unit root. The test is implemented in the HypothesisTests package in Julia.

In [54]:

using HypothesisTests

resultsdf = DataFrame("Variable" => String[], "Model" => String[], "L = 0" => Float64[], "L = 1" => Float64[], "L = 2" => Float64[], "L = 3" => Float64[], "L = 4" => Float64[], "L = 5" => Float64[])

for variable in [:G, :E, :E₂, :E₃]
    for model in [:none, :constant, :trend]
        fila = zeros(6)
        for lags in 0:5
            τ = ADFTest(eval(variable), model, lags)
            fila[lags+1] = pvalue(τ)
        end
        push!(resultsdf, [titlecase(string(variable)), titlecase(string(model)), fila...])
    end
end

resultsdf

12×8 DataFrame

Row	Variable	Model	L = 0	L = 1	L = 2	L = 3	L = 4	L = 5
	String	String	Float64	Float64	Float64	Float64	Float64	Float64
1	G	None	0.235183	0.520472	0.695203	0.803914	0.84451	0.85085
2	G	Constant	0.00180684	0.0718307	0.299885	0.444168	0.465884	0.386677
3	G	Trend	2.4945e-9	7.17214e-6	0.0011416	0.0129135	0.016052	0.00266726
4	E	None	1.0	1.0	0.999998	0.999812	0.997157	0.99959
5	E	Constant	0.953474	0.939141	0.91181	0.862177	0.828175	0.896349
6	E	Trend	0.0834996	0.202047	0.290654	0.313837	0.213148	0.474568
7	E₂	None	1.0	1.0	0.999995	0.999095	0.988348	0.996314
8	E₂	Constant	0.895398	0.909992	0.892363	0.876604	0.859424	0.887444
9	E₂	Trend	0.254049	0.450733	0.454151	0.356734	0.162103	0.385615
10	E₃	None	0.998787	0.999994	1.0	0.99965	0.996771	0.99911
11	E₃	Constant	0.873234	0.904763	0.90761	0.877207	0.890759	0.894926
12	E₃	Trend	0.00540916	0.134192	0.512566	0.354092	0.249755	0.525994

Linear regression

Bennedsen, Hillebrand, and Koopman (2024) suggested to estimate the airborne fraction by linear regression. They propose to use the following specification:

G_t = \alpha E_t + \epsilon_t,

and estimate \alpha, the airborne fraction, using ordinary least squares (OLS). They argue that this approach provides better statistical properties. Among them, the OLS estimator is super-consistent, meaning that it converges to the true value at a faster rate than the classic estimator. They also show that the estimator has lower variance and it is asymptotically normal.

To contrast the results, we first replicate the main results of Bennedsen, Hillebrand, and Koopman (2024). The authors considered a simple specification of the model, where the emissions variable is the only regressor, and an extended model that includes additional covariates.

Simple specification of the model

In [55]:

α₂ = (E'E) \ (E'G)

rss₂ = sum((G - α₂ * E) .^ 2)
σ²₂ = rss₂ / (length(G) - 1)
sd₍α₂₎ = sqrt(σ²₂ / (E'E))

α₂, sd₍α₂₎

(0.44779188441445344, 0.014241317441433234)

Extended model

Additional covariates, controlling for the El Niño Southern Oscillation (ENSO) and volcanic activity index (VAI). This is the preferred specification by Bennedsen, Hillebrand, and Koopman (2024).

Detrending ENSO

Note that Bennedsen, Hillebrand, and Koopman (2024) first detrended the ENSO data using a linear trend. We analyse first if the detrending is necessary.

We plot the ENSO data and the detrended ENSO data.

In [56]:

T = length(ENSO)

Xₜ = [ones(T) collect(1:T)]

ρ = (Xₜ'Xₜ) \ (Xₜ'ENSO)

ENSOᵨ = ENSO - Xₜ * ρ    

p5 = plot(year, [ENSOᵨ ENSO], label=["Detrended ENSO" "ENSO"], xlabel="Year", ylabel="Unitless", title="El Niño southern oscillation", linewidth = 2, style = [:solid :dash :dot])
plot!( fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend = :topleft)
display(p5)

ENSO data and detrended ENSO data

Moreover, we make the hypothesis test of the presence of a linear trend in the ENSO data. The null hypothesis is that there is no linear trend in the data.

In [57]:

using Distributions

resᵨ = ENSO - Xₜ * ρ
σ²ᵨ = sum(resᵨ.^2) / (T - 2)
Var₍ᵨ₎ = σ²ᵨ * inv(Xₜ'Xₜ) 

t1ᵨ = ρ[1] / sqrt( Var₍ᵨ₎[1,1] )
pval1ᵨ = 2 * (1 - cdf(TDist(T-2), abs(t1ᵨ)))

t2ᵨ = ρ[2] / sqrt( Var₍ᵨ₎[2,2] )
pval2ᵨ = 2 * (1 - cdf(TDist(T-2), abs(t2ᵨ)))

[[ρ[1] sqrt(Var₍ᵨ₎[1,1]) t1ᵨ pval1ᵨ]; ρ[2] sqrt(Var₍ᵨ₎[2,2]) t2ᵨ pval2ᵨ]

2×4 Matrix{Float64}:
 -0.144369    0.158836    -0.908919  0.366913
  0.00264145  0.00424887   0.621684  0.536429

Note that the p-values are large, so we fail to reject the null hypothesis. This means that there is no evidence of a linear trend in the ENSO data. Hence, we continue the analysis without detrending the ENSO data.

Estimation of the extended model

In [58]:

Xₑ = [E ENSO VAI]
αₑ = (Xₑ'Xₑ) \ (Xₑ'G)

rssₑ = sum((G - Xₑ * αₑ) .^ 2)
σ²ₑ = rssₑ / (length(G) - 3)
var₍αₑ₎ = σ²ₑ * inv(Xₑ'Xₑ)

[ αₑ, sqrt.([var₍αₑ₎[j, j] for j = 1:3]),   αₑ./ sqrt.([var₍αₑ₎[j, j] for j = 1:3]) ]

3-element Vector{Vector{Float64}}:
 [0.4734551237192292, 0.967254219300192, -14.154904191057945]
 [0.010839839871941388, 0.13273608839500087, 2.6969774873983585]
 [43.677317129448944, 7.287047787801323, -5.248432460855461]

In [59]:

tstats = αₑ./ sqrt.([var₍αₑ₎[j, j] for j = 1:3]) 
1- cdf(TDist(T-3), abs(tstats[3]))

1.0227639827276036e-6

Note that the estimate using the ENSO data without detrending is slighly larger than the estimate using the detrended ENSO data. Nonetheless, the difference is small and the estimates are very close.

R-squared and adjusted R-squared

We calculate the R-squared and adjusted R-squared for the extended model and compare them with the simple model.

R-squared is not a good measure of goodness-of-fit for nested models given that it never decreases and most likely increases with the number of regressors. The adjusted R-squared corrects this issue by penalizing the inclusion of additional regressors (Davidson and MacKinnon 2004).

In [60]:

tssₑ = sum(( G .- mean(G)).^2 )

R²₂ = 1 - rss₂ / tssₑ
R²ₑ = 1 - rssₑ / tssₑ


adjR²₂ = 1 - (rss₂ / (T - 1)) / (tssₑ / (T - 1))
adjR²ₑ = 1 - (rssₑ / (T - 3)) / (tssₑ / (T - 1))

R²₂, R²ₑ, adjR²₂, adjR²ₑ

(0.5862507041876346, 0.8012896668141006, 0.5862507041876346, 0.7947745739227596)

The R-squared and adjusted R-squared are higher for the extended model, suggesting that the additional covariates improve the fit of the model.

Measurement error and bias

Anthropogenic CO_2 emissions are given by E_t = E_t^{FF}+E_t^{LULCC}, where E_t^{FF} is the emissions from fossil fuels and E_t^{LULCC} is the emissions from land-use and land-cover changes (LULCC). The uncertainty in measurements of the airborne fraction stems in large part from uncertainties in the magnitude of LULCC emissions (Bennedsen, Hillebrand, and Koopman 2024). This suggesst that LULCC emissions are subject to measurement error.

Plotting LULCCs

Figure 2 shows three different measurements of the LULCC variable. The GCP LULCC data are from the Global Carbon Project (Friedlingstein et al. 2023), the H&C LULCC data are from Houghton and Castanho (2022), and the vMa LULCC data are from Marle et al. (2022).

In [61]:

using Plots

plot_lulcc = plot(year, [lulcc lulcc₂ lulcc₃], label=["GCP" "H&N" "vMa"], xlabel="Year", ylabel="CO2", title="Land-use and land-cover change measures", style=[:solid :dash :dot], linewidth=2)
plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend=:topright)

display(plot_lulcc)

Figure 2: Different land-use and land-cover change (LULCC) datasets.

Plotting emissions

Figure 3 shows the emissions variable using the different LULCC measurements. The emissions variable is the sum of the fossil fuels and LULCC emissions.

In [62]:

plot_emisions = plot(year, [E E₂ E₃], label=["Emissions (GCP LULCC)" "Emissions (H&N LULCC)" "Emissions (vMa LULCC)"], xlabel="Year", ylabel="CO2", title="Emission measures", style=[:solid :dash :dot], linewidth=2)
plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend=:topleft)

display(plot_emisions)

Figure 3: Emission measures using different land use and land cover change (LULCC) datasets.

Bias due to measurement errors

We show that measurement errors in the emissions variable can bias the estimates of the airborne fraction. Assume that we do not observe the true emissions but a noisy version of it. That is, we observe E_t = E_t^* + \eta_t, where E_t^* is the true emissions and \eta_t is the measurement error, which we assume has mean zero and variance \sigma^2_\eta. Estimating the airborne fraction using the noisy emissions by OLS:

\hat{\alpha}_{ME} = \frac{\sum_{t=1}^T E_t G_t}{\sum_{t=1}^T E_t^2} = \frac{\sum_{t=1}^T (E_t^*G_t + \eta_t G_t)}{\sum_{t=1}^T (E_t^{*2}+2E_t^*\eta_t+\eta_t^2)}\rightarrow \frac{\frac{1}{T}\sum_{t=1}^T E_t^*G_t}{\frac{1}{T}\sum_{t=1}^T E_t^{*2}+\sigma_\eta^2},

which shows that the OLS estimator is biased downwards. The bias increases with the variance of the measurement error, which is unknown.

To correct the bias, we can estimate the airborne fraction using instrumental variables. Unlike Deming regression, instrumental variables do not require the variance of the measurement error to be known, nor assuming that they are normally distributed.

Instrumental variables

To use instrumental variables, we need a variable that is correlated with the emissions but uncorrelated with the measurement error. This variable is called an instrument.

There are several measurements of the land-use and land-coverage changes (LULCC) variable Figure 2, which forms part of the emissions measurement Figure 3. Even under the assumption that all of these different measurements are subject to measurement error, we can use them as instruments to correct the bias in the estimate of the airborne fraction.

Consider a second emissions measurement, E_{2,t} = E_t^* + \omega_t, where \omega_t is the measurement error in the second emissions measurement. We assume that \omega_t is independent of \eta_t since the two measurements are performed independently. We can use the second emissions measurement as an instrument to estimate the airborne fraction. The instrument is correlated with the emissions variable given that they share the same true emissions; but, by construction, uncorrelated with the measurement error in the emissions variable.

Consider the following estimator for the airborne fraction:

\hat{\alpha}_{IV} = \frac{\sum_{t=1}^T E_{2,t} G_t}{\sum_{t=1}^T E_{2,t}E_{t}} = \frac{\sum_{t=1}^T (E_t^*G_t + \omega_t G_t)}{\sum_{t=1}^T (E_t^{*2}+E_t^*\eta_t+E_t^*\omega_t+\eta_t\omega_t)} \rightarrow \frac{\frac{1}{T}\sum_{t=1}^T E_t^*G_t}{\frac{1}{T}\sum_{t=1}^T E_t^{*2}} = \hat{\alpha}, \tag{7}

where \hat{\alpha} is the estimator without measurement errors. Hence, Equation 7 shows that the instrumental variables estimator is unbiased. Moreover, the estimator is consistent and asymptotically normal, regardless of the distribution of the measurement errors.

Note that the theoretical properties of the instrumental variables estimator are a direct consequence of the additive nature of the measurement errors. The order of probability of the measurement errors and the variables multiplied by them is lower than the order of the variables themselves. For a textbook treatment on orders of probability, see Hamilton (1994).

Depending on which variable is selected as the single instrument, two different estimators can be obtained. Later, we will extend the analysis to include both instruments simultaneously.

In [63]:

# H&N LULCC
α₍ₕₙ₎ = (E₂'E) \ (E₂'G)

# vMa LULCC
α₍ᵥₘₐ₎ = (E₃'E) \ (E₃'G)

α₍ₕₙ₎, α₍ᵥₘₐ₎

(0.4478895029464986, 0.44815234697503475)

In contrast to the Deming regression, there is a closed-form expression to compute the standard error of the instrumental variables estimator.

It is given by:

\widehat{\text{Var}}(\hat{\alpha}_{IV}) = \hat{\sigma}^2_{iv} \left(\sum_{t=1}^T E_{2,t} E_t\right) \left(\sum_{t=1}^T E_{2,t} E_{2,t}\right)^{-1} \left(\sum_{t=1}^T E_{2,t} E_t\right), \tag{8}

where \hat{\sigma}^2_{IV} = \frac{1}{T-1}\sum_{t=1^T}(G_t-\hat{\alpha}_{IV}E_t)^2 is the estimator of the variance of the residuals.

In [64]:

# H&N LULCC
rss₍ₕₙ₎ = sum((G - α₍ₕₙ₎ * E) .^ 2)
σ²₍ₕₙ₎ = rss₍ₕₙ₎ / (length(G) - 1)
sd₍α₍ₕₙ₎₎ = sqrt(σ²₍ₕₙ₎ / (E₂'E) * (E₂'E₂) / (E'E₂))

# vMa LULCC
rss₍ᵥₘₐ₎ = sum((G - α₍ᵥₘₐ₎ * E) .^ 2)
σ²₍ᵥₘₐ₎ = rss₍ᵥₘₐ₎ / (length(G) - 1)
sd₍α₍ᵥₘₐ₎₎ = sqrt(σ²₍ᵥₘₐ₎ / (E₃'E) * (E₃'E₃) / (E'E₃))

sd₍α₍ₕₙ₎₎, sd₍α₍ᵥₘₐ₎₎

(0.014250528339527469, 0.014259657933478448)

Estimates and standard errors.

In [65]:

[α₍ₕₙ₎ α₍ᵥₘₐ₎; sd₍α₍ₕₙ₎₎ sd₍α₍ᵥₘₐ₎₎]

2×2 Matrix{Float64}:
 0.44789    0.448152
 0.0142505  0.0142597

R-squared and adjusted R-squared.

In [66]:

R²₍ₕₙ₎ = 1 - rss₍ₕₙ₎ / tssₑ
R²₍ᵥₘₐ₎ = 1 - rss₍ᵥₘₐ₎ / tssₑ

0.586246496759459

Generalised instrumental variables

Instrumental variables can be extended to simultaneously use more than one instrument for each variable with measurement error. The estimator is denoted as the generalised instrumental variables (GIV) and it is given by:

\hat{\alpha}_{GIV} = (\tilde{E}'\tilde{E})^{-1}\tilde{E}' G,

where \tilde{E}_t is the fitted value from the following regression:

E_t = \beta_1 E_{2,t} + \beta_2 E_{3,t} + \epsilon_t,

where E_{2,t} = E_t^* + \omega_t and E_{3,t} = E_t^* + \zeta_t are the second and third emissions measurements, respectively. The coefficients \beta_1 and \beta_2 are estimated by linear regression.

Moreover, the variance of the GIV estimator is given by:

\widehat{\text{Var}}(\hat{\alpha}_{GIV}) = \hat{\sigma}^2_{GIV} (\tilde{E}'\tilde{E})^{-1},

where \hat{\sigma}^2_{GIV} = \frac{1}{T-1}\sum_{t=1}^T(G_t-\hat{\alpha}_{GIV}E_t)^2 is the estimator of the variance of the residuals.

In [67]:

X = E
W = [E₂ E₃]
PW = W * ((W' * W) \ W')

αᵢ = (X' * PW * X) \ (X' * PW * G)

rssᵢ = sum((G - X * αᵢ) .^ 2)
σ²ᵢ = rssᵢ / (length(G) - 1)
var₍αᵢ₎ = σ²ᵢ * inv(X' * PW * X)
sd₍αᵢ₎ = sqrt(var₍αᵢ₎)

αᵢ, sd₍αᵢ₎

(0.44763765543651457, 0.014247682195423763)

R-squared.

In [68]:

R²ᵢ = 1 - rssᵢ / tssₑ

0.5862499339435967

Tests for IV

Having more than one instrument further allows us to test the validity of the instruments. Two common tests for GIV are Sargan’s instrument validity test and the Hausman’s overidentification test.

Sargan test

Sargan’s test is a test to determine if the instruments are correlated with the endogenous variable.

In [69]:

using Statistics
γ = (W' * W) \ (W' * X)
rssₐ = sum((X - W * γ) .^ 2)

γ₂ = (E₂'E₂ ) \ (E₂'X)
rssi2 = sum((X - E₂ * γ₂) .^ 2)

γ₃ = (E₃'E₃ ) \ (E₃'X)
rssi3 = sum((X - E₃ * γ₃) .^ 2)

tsse = sum((X .- mean(X)).^2)

F₂ = ( (tsse-rssi2) / 1) / (rssi2 / (T - 1))
F₃ = ( (tsse-rssi3) / 1) / (rssi3 / (T - 1))
Fₐ = ( (tsse-rssₐ) / 1) / (rssₐ / (T - 1))

[ F₂ F₃ Fₐ]

1×3 Matrix{Float64}:
 3583.77  1776.43  5220.89

In [70]:

using Distributions
1 - cdf(Chisq(1), ȷ)

UndefVarError: UndefVarError: `ȷ` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
UndefVarError: `ȷ` not defined in `Main`

Suggestion: check for spelling errors or missing imports.



Stacktrace:

 [1] top-level scope

   @ ~/Library/CloudStorage/OneDrive-AalborgUniversitet/Research/CLIMATE/AirborneFraction/QuartoVersion/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_Y102sZmlsZQ==.jl:2

Hausman test

Hausman’s test is a test to determine if there is a systematic difference between the instrumental variables and the OLS estimates.

In [71]:

Hᵢ = (αᵢ - α₂)' * ((sd₍αᵢ₎^2 - sd₍α₂₎^2) \ (αᵢ - α₂))
H₍ₕₙ₎ = (α₍ₕₙ₎ - α₂)' * ((sd₍α₍ₕₙ₎₎^2 - sd₍α₂₎^2) \ (α₍ₕₙ₎ - α₂))
H₍ᵥₘₐ₎ = (α₍ᵥₘₐ₎ - α₂)' * ((sd₍α₍ᵥₘₐ₎₎^2 - sd₍α₂₎^2) \ (α₍ᵥₘₐ₎ - α₂)) 

[1 - cdf(Chisq(1), Hᵢ) 1 - cdf(Chisq(1), H₍ₕₙ₎) 1 - cdf(Chisq(1), H₍ᵥₘₐ₎)]

1×3 Matrix{Float64}:
 0.71721  0.848874  0.618083

Extended model

We consider adding additional covariates to the model. In particular, we consider adding ENSO (El Niño Southern Oscillation) and VAI (volcanic activity index) as covariates. These variables are known to affect the carbon cycle and can potentially influence the airborne fraction. Note that these variables were not considered in the Deming regression analysis by Bennedsen, Hillebrand, and Koopman (2024).

We estimate the following model:

G_t = \alpha E_t + \gamma_1 ENSO_t + \gamma_2 VAI_t + \epsilon_t,

where ENSO_t and VAI_t are the El Niño Southern Oscillation and volcanic activity index at time t, respectively.

Estimating the extended model using instrumental variables is straightforward. We can include the additional covariates in the regression and instrument the emissions variable.

Standard errors for the extended model using instrumental variables are also straightforward.

In [72]:

Xₑ = [E ENSO VAI]

# H&N LULCC with ENSO and VAI
Wₕₙ = [E₂ ENSO VAI]
PWₕₙ = Wₕₙ * ((Wₕₙ' * Wₕₙ) \ Wₕₙ')

α₍ₕₙₑ₎ = (Xₑ' * PWₕₙ * Xₑ) \ (Xₑ' * PWₕₙ * G)
rss₍ₕₙₑ₎ = sum((G - Xₑ * α₍ₕₙₑ₎) .^ 2)
σ²₍ₕₙₑ₎ = rss₍ₕₙₑ₎ / (length(G) - 3)
var₍α₍ₕₙₑ₎₎ = σ²₍ₕₙₑ₎ * inv(Xₑ'PWₕₙ * Xₑ)
sd₍α₍ₕₙₑ₎₎ = sqrt.([var₍α₍ₕₙₑ₎₎[j, j] for j = 1:3])

# vMa LULCC with ENSO and VAI
Wᵥₘₐ = [E₃ ENSO VAI]
PWᵥₘₐ = Wᵥₘₐ * ((Wᵥₘₐ' * Wᵥₘₐ) \ Wᵥₘₐ')

α₍ᵥₘₐₑ₎ = (Xₑ' * PWᵥₘₐ * Xₑ) \ (Xₑ' * PWᵥₘₐ * G)
rss₍ᵥₘₐₑ₎ = sum((G - Xₑ * α₍ᵥₘₐₑ₎) .^ 2)
σ²₍ᵥₘₐₑ₎ = rss₍ᵥₘₐₑ₎ / (length(G) - 3)
var₍α₍ᵥₘₐₑ₎₎ = σ²₍ᵥₘₐₑ₎ * inv(Xₑ'PWᵥₘₐ * Xₑ)
sd₍α₍ᵥₘₐₑ₎₎ = sqrt.([var₍α₍ᵥₘₐₑ₎₎[j, j] for j = 1:3])

# GIV with ENSO and VAI
Wₑ = [E₂ E₃ ENSO VAI]
PWₑ = Wₑ * ((Wₑ' * Wₑ) \ Wₑ')

α₍ᵢₑ₎ = (Xₑ' * PWₑ * Xₑ) \ (Xₑ' * PWₑ * G)
res₍ᵢₑ₎ = G - Xₑ * α₍ᵢₑ₎
rss₍ᵢₑ₎ = sum(res₍ᵢₑ₎ .^ 2)
σ²₍ᵢₑ₎ = sum(res₍ᵢₑ₎ .^ 2) / (length(G) - 4)
var₍α₍ᵢₑ₎₎ = σ²₍ᵢₑ₎ * inv(Xₑ' * PWₑ * Xₑ)
sd₍α₍ᵢₑ₎₎ = sqrt.([var₍α₍ᵢₑ₎₎[j, j] for j = 1:3])

[α₍ₕₙₑ₎ α₍ᵥₘₐₑ₎ α₍ᵢₑ₎; sd₍α₍ₕₙₑ₎₎ sd₍α₍ᵥₘₐₑ₎₎ sd₍α₍ᵢₑ₎₎]

6×3 Matrix{Float64}:
   0.472673     0.472328     0.472978
   0.96578      0.965129     0.966355
 -14.0822     -14.0501     -14.1106
   0.0108476    0.0108547    0.0109354
   0.132744     0.132752     0.133841
   2.69734      2.6977       2.71959

R-squared.

In [73]:

R²₍ₕₙₑ₎ = 1 - rss₍ₕₙₑ₎ / tssₑ
R²₍ᵥₘₐₑ₎ = 1 - rss₍ᵥₘₐₑ₎ / tssₑ
R²₍ᵢₑ₎ = 1 - rss₍ᵢₑ₎ / tssₑ

[ R²₍ₕₙₑ₎, R²₍ᵥₘₐₑ₎, R²₍ᵢₑ₎ ]

3-element Vector{Float64}:
 0.8012727178955843
 0.8012544450214332
 0.8012833537383226

Instruments Tests

Sargan and Hausman tests for the extended model.

In [74]:

γₑ = (Wₑ' * Wₑ) \ (Wₑ' * Xₑ)
rss₍ᵢₑ₎ = sum((Xₑ - Wₑ * γₑ) .^ 2)

R²₍ᵢₑ₎ = 1 - rss₍ᵢₑ₎ / (sum((Xₑ .- mean(Xₑ)) .^ 2))

ȷₑ = length(X) * R²₍ᵢₑ₎

1 - cdf(Chisq(3), ȷₑ)

8.526512829121202e-14

In [75]:

Hₑ = (α₍ᵢₑ₎ - αₑ)' * ((var₍α₍ᵢₑ₎₎ - var₍αₑ₎ ) \ (α₍ᵢₑ₎ - αₑ))
H₍ₕₙₑ₎ = (α₍ₕₙₑ₎ - αₑ)' * ((var₍α₍ₕₙₑ₎₎ - var₍αₑ₎ ) \ (α₍ₕₙₑ₎ - αₑ))
H₍ᵥₘₐₑ₎ = (α₍ᵥₘₐₑ₎ - αₑ)' * ((var₍α₍ᵥₘₐₑ₎₎ - var₍αₑ₎) \ (α₍ᵥₘₐₑ₎ - αₑ))

[1 - cdf(Chisq(3), Hₑ) 1 - cdf(Chisq(3), H₍ₕₙₑ₎) 1 - cdf(Chisq(3), H₍ᵥₘₐₑ₎)]

1×3 Matrix{Float64}:
 0.990682  0.301239  0.267212

Recent subsample

Given the variability of the LULCC measurements at the beginning of the series, we consider a recent subsample of the data. We consider the data from 1992 and estimate the airborne fraction using the new approach.

Getting subsample data.

In [76]:

E92 = E[year.>=1992];
G92 = G[year.>=1992];
E92₂ = E₂[year.>=1992];
E92₃ = E₃[year.>=1992];
VAI92 = VAI[year.>=1992];
ENSO92 = ENSO[year.>=1992];

New approach for the recent subsample.

In [77]:

tss92ₑ = sum(( G92 .- mean(G92)).^2 )
α92₂ = (E92'E92₂) \ (E92₂'G92)

rss92₂ = sum((G92 - α92₂ * E92₂) .^ 2)
σ²92₂ = rss92₂ / (length(G92) - 1)
sd₍α92₂₎ = sqrt(σ²92₂ / (E92₂'E92₂))

α92₂, sd₍α92₂₎

(0.4496265998122475, 0.01847550703053799)

R-squared.

In [78]:

R²92₂ = 1 - rss92₂ / tss92ₑ

0.32571313158037896

Instrumental variables for the recent subsample.

In [79]:

X92 = E92
W92 = [E92₂ E92₃]
PW92 = W92 * ((W92' * W92) \ W92')

# H&N LULCC
α92₍ₕₙ₎ = (E92₂'E92) \ (E92₂'G92)
rss92₍ₕₙ₎ = sum((G92 - α92₍ₕₙ₎ * E92) .^ 2)
σ²92₍ₕₙ₎ = rss92₍ₕₙ₎ / (length(G92) - 1)
sd₍α92₍ₕₙ₎₎ = sqrt(σ²92₍ₕₙ₎ / (E92₂'E92) * (E92₂'E92₂) / (E92'E92₂))

# vMa LULCC
α92₍ᵥₘₐ₎ = (E92₃'E92) \ (E92₃'G92)
rss92₍ᵥₘₐ₎ = sum((G92 - α92₍ᵥₘₐ₎ * E92) .^ 2)
σ²92₍ᵥₘₐ₎ = rss₍ᵥₘₐ₎ / (length(G92) - 1)
sd₍α92₍ᵥₘₐ₎₎ = sqrt(σ²92₍ᵥₘₐ₎ / (E92₃'E92) * (E92₃'E92₃) / (E92'E92₃))

# GIV
α92ᵢ = (X92' * PW92 * X92) \ (X92' * PW92 * G92)
rss92ᵢ = sum((G92 - X92 * α92ᵢ) .^ 2)
σ²92ᵢ = rss92ᵢ / (length(G92) - 1)
sd₍α92ᵢ₎ = sqrt(σ²92ᵢ * inv(X92' * PW92 * X92))

[α92₍ₕₙ₎ α92₍ᵥₘₐ₎ α92ᵢ; sd₍α92₍ₕₙ₎₎ sd₍α92₍ᵥₘₐ₎₎ sd₍α92ᵢ₎]

2×3 Matrix{Float64}:
 0.449627   0.450219   0.449493
 0.0173104  0.0244938  0.0173103

R-squared.

In [80]:

R²92₍ₕₙ₎ = 1 - rss92₍ₕₙ₎ / tss92ₑ
R²92₍ᵥₘₐ₎ = 1 - rss92₍ᵥₘₐ₎ / tss92ₑ
R²92ᵢ = 1 - rss92ᵢ / tss92ₑ

[ R²92₍ₕₙ₎, R²92₍ᵥₘₐ₎, R²92ᵢ ]

3-element Vector{Float64}:
 0.35575764016694966
 0.35573622626933465
 0.3557555265062794

Test for weak instruments.

In [81]:

γ92 = (W92' * W92) \ (W92' * X92)
rss92ₐ = sum((X92 - W92 * γ92) .^ 2)

γ92₂ = (E92₂'E92₂ ) \ (E92₂'X92)
rssi922 = sum((X92 - E92₂ * γ92₂) .^ 2)

γ92₃ = (E92₃'E92₃ ) \ (E92₃'X92)
rssi923 = sum((X92 - E92₃ * γ92₃) .^ 2)

tsse92 = sum((X92 .- mean(X92)).^2)

F92₂ = ( (tsse92-rssi922) / 1) / (rssi922 / (length(G92) - 1))
F92₃ = ( (tsse92-rssi923) / 1) / (rssi923 / (length(G92) - 1))
F92ₐ = ( (tsse92-rss92ₐ) / 1) / (rss92ₐ / (length(G92) - 1))

[ F92₂ F92₃ F92ₐ]

1×3 Matrix{Float64}:
 3024.22  861.035  3308.11

Extended model for the recent subsample.

In [82]:

X92ₑ = [E92 ENSO92 VAI92]

α92ₑ = (X92ₑ'X92ₑ) \ (X92ₑ'G92)
rss92ₑ = sum((G92 - X92ₑ * α92ₑ) .^ 2)
σ²92ₑ = rss92ₑ / (length(G92) - 3)
var₍α92ₑ₎ = σ²92ₑ * inv(X92ₑ'X92ₑ)

[ α92ₑ, sqrt.([var₍α92ₑ₎[j, j] for j = 1:3]) ]

2-element Vector{Vector{Float64}}:
 [0.4622219656227806, 1.0237798088559882, -17.167003554558086]
 [0.011245930036851546, 0.17229498857880865, 3.660364706338823]

R-squared.

In [83]:

R²92ₑ = 1 - rss92ₑ / tss92ₑ

0.7593843710837234

Instrumental variables for the recent subsample and extended model.

In [84]:

X92ₑ = [E92 ENSO92 VAI92]

# H&N LULCC
W92₍ₕₙₑ₎ = [E92₂ ENSO92 VAI92]
PW92₍ₕₙₑ₎ = W92₍ₕₙₑ₎ * ((W92₍ₕₙₑ₎' * W92₍ₕₙₑ₎) \ W92₍ₕₙₑ₎')

α92₍ₕₙₑ₎ = (X92ₑ' * PW92₍ₕₙₑ₎ * X92ₑ) \ (X92ₑ' * PW92₍ₕₙₑ₎ * G92)
rss92₍ₕₙₑ₎ = sum((G92 - X92ₑ * α92₍ₕₙₑ₎) .^ 2)
σ²92₍ₕₙₑ₎ = rss92₍ₕₙₑ₎ / (length(G92) - 3)
var₍α92₍ₕₙₑ₎₎ = σ²92₍ₕₙₑ₎ * inv(X92ₑ' * PW92₍ₕₙₑ₎ * X92ₑ)
sd₍α92₍ₕₙₑ₎₎ = sqrt.([var₍α92₍ₕₙₑ₎₎[j, j] for j = 1:3])

# vMa LULCC
W92₍ᵥₘₐₑ₎ = [E92₃ ENSO92 VAI92]
PW92₍ᵥₘₐₑ₎ = W92₍ᵥₘₐₑ₎ * ((W92₍ᵥₘₐₑ₎' * W92₍ᵥₘₐₑ₎) \ W92₍ᵥₘₐₑ₎')

α92₍ᵥₘₐₑ₎ = (X92ₑ' * PW92₍ᵥₘₐₑ₎ * X92ₑ) \ (X92ₑ' * PW92₍ᵥₘₐₑ₎ * G92)
rss92₍ᵥₘₐₑ₎ = sum((G92 - X92ₑ * α92₍ᵥₘₐₑ₎) .^ 2)
σ²92₍ᵥₘₐₑ₎ = rss92₍ᵥₘₐₑ₎ / (length(G92) - 3)
var₍α92₍ᵥₘₐₑ₎₎ = σ²92₍ᵥₘₐₑ₎ * inv(X92ₑ' * PW92₍ᵥₘₐₑ₎ * X92ₑ)
sd₍α92₍ᵥₘₐₑ₎₎ = sqrt.([var₍α92₍ᵥₘₐₑ₎₎[j, j] for j = 1:3])

# GIV
W92ₑ = [E92₂ E92₃ ENSO92 VAI92]
PW92ₑ = W92ₑ * ((W92ₑ' * W92ₑ) \ W92ₑ')

α92₍ᵢₑ₎ = (X92ₑ' * PW92ₑ * X92ₑ) \ (X92ₑ' * PW92ₑ * G92)
rss92₍ᵢₑ₎ = sum((G92 - X92ₑ * α92₍ᵢₑ₎) .^ 2)
σ²92₍ᵢₑ₎ = rss92₍ᵢₑ₎ / (length(G92) - 3)
var92₍α₍ᵢₑ₎₎ = σ²92₍ᵢₑ₎ * inv(X92ₑ' * PW92ₑ * X92ₑ)
sd₍α92₍ᵢₑ₎₎ = sqrt.([var92₍α₍ᵢₑ₎₎[j, j] for j = 1:3])

# All results
[α92₍ₕₙₑ₎ sd₍α92₍ₕₙₑ₎₎; α92₍ᵥₘₐₑ₎ sd₍α92₍ᵥₘₐₑ₎₎; α92₍ᵢₑ₎ sd₍α92₍ᵢₑ₎₎]

9×2 Matrix{Float64}:
   0.462196  0.0112469
   1.02376   0.172295
 -17.1651    3.66038
   0.462269  0.0112487
   1.02382   0.172295
 -17.1705    3.66041
   0.462177  0.0112468
   1.02374   0.172295
 -17.1637    3.66038

R-squared.

In [85]:

R²92₍ₕₙₑ₎ = 1 - rss92₍ₕₙₑ₎ / tss92ₑ
R²92₍ᵥₘₐₑ₎ = 1 - rss92₍ᵥₘₐₑ₎ / tss92ₑ
R²92₍ᵢₑ₎ = 1 - rss92₍ᵢₑ₎ / tss92ₑ

[ R²92₍ₕₙₑ₎, R²92₍ᵥₘₐₑ₎, R²92₍ᵢₑ₎ ]

3-element Vector{Float64}:
 0.7593843259463682
 0.759384221134288
 0.7593842324071447

Alternative datasets

Similar to Bennedsen, Hillebrand, and Koopman (2024), we consider the specifications of the model using the H&N and vMa LULCC measurements. We estimate the airborne fraction using the new approach and instrumental variables.

H&N LULCC

Specifying the model using the H&N LULCC measurements and using GCP LULCC and vMA LULCC as instruments.

In [86]:

α_hn_gcp = (E'E₂) \ (E'G)
rss_hn_gcp = sum((G - α_hn_gcp * E₂) .^ 2)
σ²_hn_gcp = rss_hn_gcp / (length(G) - 1)
sd₍α_hn_gcp₎ = sqrt(σ²_hn_gcp * (E'E₂) / (E'E) * (E'E₂))

α_hn_vma = (E₃'E₂) \ (E₃'G)
rss_hn_vma = sum((G - α_hn_vma * E₂) .^ 2)
σ²_hn_vma = rss_hn_vma / (length(G) - 1)
sd₍α_hn_vma₎ = sqrt(σ²_hn_vma * (E₃'E₂) / (E₃'E₃) * (E₃'E₂))

[α_hn_gcp α_hn_vma; sd₍α_hn_gcp₎ sd₍α_hn_vma₎]

2×2 Matrix{Float64}:
  0.47605   0.475619
 54.9119   54.9349

vMa LULCC

Specifying the model using the vMa LULCC measurements and using GCP LULCC and H&N LULCC as instruments.

In [87]:

α_vma_gcp = (E'E₃) \ (E'G)
rss_vma_gcp = sum((G - α_vma_gcp * E₃) .^ 2)
σ²_vma_gcp = rss_vma_gcp / (length(G) - 1)
sd₍α_vma_gcp₎ = sqrt(σ²_vma_gcp * (E'E₃) / (E'E) * (E'E₃))

α_vma_hn = (E₂'E₃) \ (E₂'G)
rss_vma_hn = sum((G - α_vma_hn * E₃) .^ 2)
σ²_vma_hn = rss_vma_hn / (length(G) - 1)
sd₍α_vma_hn₎ = sqrt(σ²_vma_hn * (E₂'E₃) / (E₂'E₂) * (E₂'E₃))

[α_vma_gcp α_vma_hn; sd₍α_vma_gcp₎ sd₍α_vma_hn₎]

2×2 Matrix{Float64}:
  0.491074   0.490342
 53.2731    53.3285

Deming with FWL theorem

The theoretical development of the Deming regression based on the Frisch-Waugh-Lovell theorem is presented in Theorem 1. The theorem states that the OLS estimator of the airborne fraction in the preferred specification can be obtained by regressing the residuals of the emissions variable from the covariates on the residuals of the airborne fraction from the covariates.

First, we use the Frisch-Waugh-Lovell theorem in the preferred specification of the model by Bennedsen, Hillebrand, and Koopman (2024).

In [88]:

AX = [ENSO VAI]
coefs1 = (AX'AX) \ (AX'G)
resₐ = G - AX * coefs1

coefs2 = (AX'AX) \ (AX'E)
resₑ = E - AX * coefs2

α₋ = (resₑ'resₑ) \ (resₑ'resₐ)

0.4734551237192293

Above, we also compute the airbone fraction in the preferred specification of the model to show that it is identical to the OLS estimator.

Deming regression standard errors

There is no closed-form expression to compute the standard errors of the Deming regression. Hence, we propose to use the bootstrap method to estimate the standard errors and confidence intervals.

First proposed by Efron (1992), bootstrap has become a major tool for approximating sampling distributions and variance of complex statistics. This is perhaps not surprising in view of its ability to estimate distributions for statistics even when analytical solutions are unavailable. In addition, bootstrap methods often yield more accurate results than standard methods. Similarly, in the context of confidence intervals, bootstrap has been often employed as a means for improving upon the accuracy of standard intervals (DiCiccio and Efron 1996).

We show how to employ a form of model-based bootstrap approach to calculate the confidence intervals of the Deming regression esstimate \hat{\alpha}_{Deming} in the simple specification. The algorithm proceeds as follows:

Estimate the equation G_{t}=\alpha E+u_{t} using Deming regression to obtain \hat{\alpha}_{Deming} and recover the residuals \hat{u}_{t} for t=1,\dots, T based on \hat{\alpha}_{Deming}. Let \tilde{u}_{t}=\hat{u}_{t}-\frac{1}{T}\sum_{t=1}^{T}\hat{u}_{t} be the recentered residuals.
Sample randomly (with replacement) the residuals \tilde{u}_{t} to create the bootstrap pseudo-residuals \tilde{u}^{\ast}_{t}. Create pseudo-data in the G domain by using recursively the following equation: G^{\ast}_{t}=\hat{\alpha}_{Deming} E_{t}+\tilde{u}^{\ast}_{t}. \tag{9}
Repeat the previous step B times(with B sufficiently large), and generate independent copies \hat{\alpha}^{\ast}_{Deming,1}\dots,\hat{\alpha}^{\ast}_{Deming,B} based on Equation 9.
Calculate s.e(\hat{\alpha}_{Deming})=\sqrt{\frac{1}{B-1}\sum_{i=1}^{B}(\hat{\alpha}^{\ast}_{Deming,i}-\bar{\hat{\alpha}}^{\ast}_{Deming})^2} where \bar{\hat{\alpha}}^{\ast}_{Deming}=\frac{1}{B}\sum_{i=1}^{B}\hat{\alpha}^{\ast}_{Deming,i}.
The confidence intervals are then obtained as: \left[\hat{\alpha}_{Deming}- q^{\ast}(1-\mathit{\alpha}/2)\ s.e(\hat{\alpha}_{Deming}),\hat{\alpha}_{Deming}+ q^{\ast}(\mathit{\alpha}/2)\ s.e(\hat{\alpha}_{Deming})\right], where q^{\ast}(1-\mathit{\alpha}/2) and q^{\ast}(\mathit{\alpha}/2) denote the 1-\mathit{\alpha}/2 and \mathit{\alpha}/2 percentiles of \hat{\alpha}^{\ast}_{Deming}.

The bootstrap procedure does not assume a Normal distribution and minimises computation error compared to the jackknife.

We write a function to compute the Deming regression standard errors.

In [89]:

Code

function Deming_se_ConfI(y::Array{Float64}, x::Array{Float64}, δ::Float64, B::Int64, a::Float64)
    ######################################################################
    # Function Arguments
    # y,x: dependent and independent variables
    # δ: ratio between the measurement error variances assumed known
    # B: number of bootstrap replications e.g. 9999
    # a: significance level chosen by the researcher e.g. 0.05
    # Note: Work in progress shows that bootstrapping can also be used to correct the coefficient  
    # for small sample bias. Not pursued here.
    #
    # Function Output
    # alpha_dem: Deming regression coefficient
    # se: standard error of the Deming regression coefficient
    # lower: lower bound of the confidence interval
    # upper: upper bound of the confidence interval
    ######################################################################
    T = length(y)
    y_boot = zeros(T, 1)
    alpha_boot = zeros(B, 1)
    alpha_boot = vec(zeros(B, 1))
    M₍xx₎ = x'x
    M₍xy₎ = x'y
    M₍yy₎ = y'y
    alpha_dem = (M₍yy₎ - δ * M₍xx₎ + sqrt((M₍yy₎ - δ * M₍xx₎)^2 + 4 * δ * M₍xy₎^2)) / (2 * M₍xy₎)
    resid = y - alpha_dem * x
    resid = resid .- mean(resid) #recenter residuals
    #Bootstrap
    for i = 1:B
        y_boot = alpha_dem * x + sample(resid, T, replace=true)
        M₍xx₎_boot = x'x
        M₍xy₎_boot = x'y_boot
        M₍yy₎_boot = y_boot'y_boot
        alpha_boot[i] = (M₍yy₎_boot - δ * M₍xx₎_boot + sqrt((M₍yy₎_boot - δ * M₍xx₎_boot)^2 + 4 * δ * M₍xy₎_boot^2)) / (2 * M₍xy₎_boot)
    end
    se = sqrt(sum((alpha_boot .- mean(alpha_boot)) .^ 2) / (B - 1)) # Calculate standard errors
    lower = quantile(alpha_boot, 1 - a / 2)
    upper = quantile(alpha_boot, a / 2)

    # R-squared of the regression
    R² = 1 - sum(resid.^2) / sum((y .- mean(y)).^2)

    # Generalized R-squared
    X̂ = x .+ (δ*alpha_dem)/(1+δ*alpha_dem^2) * resid
    Ŷ = y .- resid/(1+δ*alpha_dem^2)

    SSTx = sum((x .- mean(x)).^2)
    SSTy = sum((y .- mean(y)).^2)
    
    SSRx = sum((X̂ .- x).^2)
    SSRy = sum((Ŷ .- y).^2)
    
    SSg = SSRx+δ*SSRy
    R²x = 1 - SSg/SSTx
    R²y = 1 - SSg/(δ*SSTy)
    
    R²g = minimum([R²x, R²y])

    return alpha_dem, se, alpha_dem - lower * se, alpha_dem + upper * se, R², R²g
end

Deming_se_ConfI (generic function with 1 method)

We use the function to compute the estimates, standard errors, and confidence intervals of the airborne fraction in the preferred specification of the model. We do this by calling the function on the residuals of the emissions variable from the covariates and the residuals of the airborne fraction from the covariates.

We use 9,999 bootstrap replications to estimate the standard errors and confidence intervals.

In [90]:

δₑ = zeros(7, 5)
δₑ[1, :] = [0.2 0.5 1 2 5]

for ii = 1:5
    thisdelta = Deming_se_ConfI(resₐ, resₑ, δₑ[1, ii], 9999, 0.05)
    δₑ[2, ii] = thisdelta[1]
    δₑ[3, ii] = thisdelta[2]
    δₑ[4, ii] = thisdelta[3]
    δₑ[5, ii] = thisdelta[4]
    δₑ[6, ii] = thisdelta[5]
    δₑ[7, ii] = thisdelta[6]
end

display(δₑ)

7×5 Matrix{Float64}:
 0.2        0.5        1.0        2.0        5.0
 0.481519   0.478173   0.476241   0.474985   0.474106
 0.0105829  0.0106705  0.0106833  0.0104231  0.0106555
 0.476119   0.472795   0.470896   0.469805   0.468826
 0.486481   0.483103   0.481138   0.479739   0.47894
 0.894794   0.895387   0.895686   0.895863   0.895979
 0.899457   0.90612    0.914971   0.928242   0.94247

Deming regression in the simple specification

For completeness, we also compute the Deming regression in the simple specification of the model. The standard errors and confidence intervals were not computed in the paper by Bennedsen, Hillebrand, and Koopman (2024).

In [91]:

δₛ = zeros(7, 5)
δₛ[1, :] = [0.2 0.5 1 2 5]

for ii = 1:5
    thisdelta = Deming_se_ConfI(G, E, δₑ[1, ii], 9999, 0.05)
    δₛ[2, ii] = thisdelta[1]
    δₛ[3, ii] = thisdelta[2]
    δₛ[4, ii] = thisdelta[3]
    δₛ[5, ii] = thisdelta[4]
    δₛ[6, ii] = thisdelta[5]
    δₛ[7, ii] = thisdelta[6]
end

display(δₛ)

7×5 Matrix{Float64}:
 0.2        0.5        1.0        2.0        5.0
 0.462305   0.456067   0.4526     0.450406   0.448895
 0.0151941  0.0148343  0.0144814  0.0143655  0.0141124
 0.454604   0.448749   0.445559   0.443485   0.442151
 0.469104   0.462539   0.458825   0.456516   0.454865
 0.589449   0.588382   0.587709   0.587252   0.586924
 0.606278   0.627158   0.657806   0.706382   0.575025

Deming regression in the recent subsample

Simple specification

In [92]:

δ₉₂ = zeros(7, 5)
δ₉₂[1, :] = [0.2 0.5 1 2 5]

for ii = 1:5
    thisdelta = Deming_se_ConfI(G92, E92, δ₉₂[1, ii], 9999, 0.05)
    δ₉₂[2, ii] = thisdelta[1]
    δ₉₂[3, ii] = thisdelta[2]
    δ₉₂[4, ii] = thisdelta[3]
    δ₉₂[5, ii] = thisdelta[4]
    δ₉₂[6, ii] = thisdelta[5]
    δ₉₂[7, ii] = thisdelta[6]
end

display(δ₉₂)

7×5 Matrix{Float64}:
 0.2        0.5        1.0        2.0         5.0
 0.459831   0.455479   0.453053   0.451512    0.450449
 0.0173767  0.0173344  0.0172303  0.0173461   0.0169984
 0.451071   0.446888   0.444603   0.443049    0.442209
 0.467412   0.462886   0.460336   0.458803    0.45757
 0.358613   0.357528   0.356905   0.356502    0.35622
 0.384636   0.417909   0.466425   0.215276   -0.371494

Extended model

FWL theorem in the extended model and recent subsample.

In [93]:

AX92 = [ENSO92 VAI92]
coefs921 = (AX92'AX92) \ (AX92'G92)
res92ₐ = G92 - AX92 * coefs921

coefs922 = (AX92'AX92) \ (AX92'E92)
res92ₑ = E92 - AX92 * coefs922;

Deming regression standard errors in the extended model and recent subsample.

In [94]:

δ₉₂ₑ = zeros(7, 5)
δ₉₂ₑ[1, :] = [0.2 0.5 1 2 5]

for ii = 1:5
    thisdelta = Deming_se_ConfI(res92ₐ, res92ₑ, δ₉₂ₑ[1, ii], 9999, 0.05)
    δ₉₂ₑ[2, ii] = thisdelta[1]
    δ₉₂ₑ[3, ii] = thisdelta[2]
    δ₉₂ₑ[4, ii] = thisdelta[3]
    δ₉₂ₑ[5, ii] = thisdelta[4]
    δ₉₂ₑ[6, ii] = thisdelta[5]
    δ₉₂ₑ[7, ii] = thisdelta[6]
end

display(δ₉₂ₑ)

7×5 Matrix{Float64}:
 0.2        0.5        1.0        2.0        5.0
 0.466195   0.464524   0.463574   0.462962   0.462536
 0.0106297  0.0105658  0.0107814  0.0107149  0.0107295
 0.460982   0.459376   0.458339   0.457768   0.457348
 0.470965   0.46923    0.468352   0.467705   0.467274
 0.846761   0.846888   0.846951   0.846987   0.84701
 0.853144   0.861799   0.874023   0.892898   0.90928

Summary of results

The table below shows the estimates of the airborne fraction using the different methods. The table includes the estimates, standard errors, and confidence intervals.

Table of results full sample

In [95]:

Code

results_analysis = DataFrame("Model" => String[], "Estimate" => Float64[], "Std. Error" => Float64[], "Confidence Int." => Vector{Float64}[], "R²" => Float64[])

nd = 4;

push!(results_analysis, ["OLS Regression", α₂, sd₍α₂₎, round.([α₂ - 1.96 * sd₍α₂₎, α₂ + 1.96 * sd₍α₂₎], digits=nd), R²₂])
push!(results_analysis, ["OLS Regression with ENSO and VAI", αₑ[1], sqrt(var₍αₑ₎[1,1]), round.([αₑ[1] - 1.96 * var₍αₑ₎[1,1], αₑ[1] + 1.96 * var₍αₑ₎[1,1]], digits=nd), R²ₑ])
push!(results_analysis, ["IV Regression (H&N LULCC)", α₍ₕₙ₎, sd₍α₍ₕₙ₎₎, round.([α₍ₕₙ₎ - 1.96 * sd₍α₍ₕₙ₎₎, α₍ₕₙ₎ + 1.96 * sd₍α₍ₕₙ₎₎], digits=nd), R²₍ₕₙ₎])
push!(results_analysis, ["IV Regression (vMA LULCC)", α₍ᵥₘₐ₎, sd₍α₍ᵥₘₐ₎₎, round.([α₍ᵥₘₐ₎ - 1.96 * sd₍α₍ᵥₘₐ₎₎, α₍ᵥₘₐ₎ + 1.96 * sd₍α₍ᵥₘₐ₎₎], digits=nd), R²₍ᵥₘₐ₎])
push!(results_analysis, ["GIV Regression", αᵢ, sd₍αᵢ₎, round.([αᵢ - 1.96 * sd₍αᵢ₎, αᵢ + 1.96 * sd₍αᵢ₎], digits=nd), R²ᵢ])
push!(results_analysis, ["IV Regression (H&N LULCC) with ENSO and VAI", α₍ₕₙₑ₎[1], sd₍α₍ₕₙₑ₎₎[1], round.([α₍ₕₙₑ₎[1] - 1.96 * sd₍α₍ₕₙₑ₎₎[1], α₍ₕₙₑ₎[1] + 1.96 * sd₍α₍ₕₙₑ₎₎[1]], digits=nd), R²₍ₕₙₑ₎])
push!(results_analysis, ["IV Regression (vMA LULCC) with ENSO and VAI", α₍ᵥₘₐₑ₎[1], sd₍α₍ᵥₘₐₑ₎₎[1], round.([α₍ᵥₘₐₑ₎[1] - 1.96 * sd₍α₍ᵥₘₐₑ₎₎[1], α₍ᵥₘₐₑ₎[1] + 1.96 * sd₍α₍ᵥₘₐₑ₎₎[1]], digits=nd), R²₍ᵥₘₐₑ₎])
push!(results_analysis, ["GIV Regression with ENSO and VAI", α₍ᵢₑ₎[1], sqrt(var₍α₍ᵢₑ₎₎[1, 1]), round.([α₍ᵢₑ₎[1] - 1.96 * sqrt(var₍α₍ᵢₑ₎₎[1, 1]), α₍ᵢₑ₎[1] + 1.96 * sqrt(var₍α₍ᵢₑ₎₎[1, 1])], digits=nd), R²₍ᵢₑ₎])
push!(results_analysis, ["Deming regression (δ=0.2)", δₛ[2, 1], δₛ[3, 1], round.([δₛ[4, 1], δₛ[5, 1]], digits=nd), δₛ[6, 1]])
push!(results_analysis, ["Deming regression (δ=0.5)", δₛ[2, 2], δₛ[3, 2], round.([δₛ[4, 2], δₛ[5, 2]], digits=nd), δₛ[6, 2]])
push!(results_analysis, ["Deming regression (δ=1)", δₛ[2, 3], δₛ[3, 3], round.([δₛ[4, 3], δₛ[5, 3]], digits=nd), δₛ[6, 3]])
push!(results_analysis, ["Deming regression (δ=2)", δₛ[2, 4], δₛ[3, 4], round.([δₛ[4, 4], δₛ[5, 4]], digits=nd), δₛ[6, 4]])
push!(results_analysis, ["Deming regression (δ=5)", δₛ[2, 5], δₛ[3, 5], round.([δₛ[4, 5], δₛ[5, 5]], digits=nd), δₛ[6, 5]])
push!(results_analysis, ["Deming FWL regression (δ=0.2)", δₑ[2, 1], δₑ[3, 1], round.([δₑ[4, 1], δₑ[5, 1]], digits=nd), δₑ[6, 1]])
push!(results_analysis, ["Deming FWL regression (δ=0.5)", δₑ[2, 2], δₑ[3, 2], round.([δₑ[4, 2], δₑ[5, 2]], digits=nd), δₑ[6, 2]])
push!(results_analysis, ["Deming FWL regression (δ=1)", δₑ[2, 3], δₑ[3, 3], round.([δₑ[4, 3], δₑ[5, 3]], digits=nd), δₑ[6, 3]])
push!(results_analysis, ["Deming FWL regression (δ=2)", δₑ[2, 4], δₑ[3, 4], round.([δₑ[4, 4], δₑ[5, 4]], digits=nd), δₑ[6, 4]])
push!(results_analysis, ["Deming FWL regression (δ=5)", δₑ[2, 5], δₑ[3, 5], round.([δₑ[4, 5], δₑ[5, 5]], digits=nd), δₑ[6, 5]])
display(results_analysis)

18×5 DataFrame

Row	Model	Estimate	Std. Error	Confidence Int.	R²
	String	Float64	Float64	Array…	Float64
1	OLS Regression	0.447792	0.0142413	[0.4199, 0.4757]	0.586251
2	OLS Regression with ENSO and VAI	0.473455	0.0108398	[0.4732, 0.4737]	0.80129
3	IV Regression (H&N LULCC)	0.44789	0.0142505	[0.42, 0.4758]	0.58625
4	IV Regression (vMA LULCC)	0.448152	0.0142597	[0.4202, 0.4761]	0.586246
5	GIV Regression	0.447638	0.0142477	[0.4197, 0.4756]	0.58625
6	IV Regression (H&N LULCC) with ENSO and VAI	0.472673	0.0108476	[0.4514, 0.4939]	0.801273
7	IV Regression (vMA LULCC) with ENSO and VAI	0.472328	0.0108547	[0.4511, 0.4936]	0.801254
8	GIV Regression with ENSO and VAI	0.472978	0.0109354	[0.4515, 0.4944]	0.998788
9	Deming regression (δ=0.2)	0.462305	0.0151941	[0.4546, 0.4691]	0.589449
10	Deming regression (δ=0.5)	0.456067	0.0148343	[0.4487, 0.4625]	0.588382
11	Deming regression (δ=1)	0.4526	0.0144814	[0.4456, 0.4588]	0.587709
12	Deming regression (δ=2)	0.450406	0.0143655	[0.4435, 0.4565]	0.587252
13	Deming regression (δ=5)	0.448895	0.0141124	[0.4422, 0.4549]	0.586924
14	Deming FWL regression (δ=0.2)	0.481519	0.0105829	[0.4761, 0.4865]	0.894794
15	Deming FWL regression (δ=0.5)	0.478173	0.0106705	[0.4728, 0.4831]	0.895387
16	Deming FWL regression (δ=1)	0.476241	0.0106833	[0.4709, 0.4811]	0.895686
17	Deming FWL regression (δ=2)	0.474985	0.0104231	[0.4698, 0.4797]	0.895863
18	Deming FWL regression (δ=5)	0.474106	0.0106555	[0.4688, 0.4789]	0.895979

Table of results recent subsample

In [96]:

Code

results_analysis = DataFrame("Model" => String[], "Estimate" => Float64[], "Std. Error" => Float64[], "Confidence Int." => Vector{Float64}[], "R²" => Float64[])

nd = 4;

push!(results_analysis, ["Regression from 1992", α92₂, sd₍α92₂₎, round.([α92₂ - 1.96 * sd₍α92₂₎, α92₂ + 1.96 * sd₍α92₂₎], digits=nd), R²92₂]) 
push!(results_analysis, ["Regression from 1992 with ENSO and VAI", α92ₑ[1], sqrt(var₍α92ₑ₎[1,1]), round.([α92ₑ[1] - 1.96 * var₍α92ₑ₎[1,1], α92ₑ[1] + 1.96 * var₍α92ₑ₎[1,1]], digits=nd), R²92ₑ])
push!(results_analysis, ["IV Regression (H&N LULCC) from 1992", α92₍ₕₙ₎, sd₍α92₍ₕₙ₎₎, round.([α92₍ₕₙ₎ - 1.96 * sd₍α92₍ₕₙ₎₎, α92₍ₕₙ₎ + 1.96 * sd₍α92₍ₕₙ₎₎], digits=nd), R²92₍ₕₙ₎])
push!(results_analysis, ["IV Regression (vMA LULCC) from 1992", α92₍ᵥₘₐ₎, sd₍α92₍ᵥₘₐ₎₎, round.([α92₍ᵥₘₐ₎ - 1.96 * sd₍α92₍ᵥₘₐ₎₎, α92₍ᵥₘₐ₎ + 1.96 * sd₍α92₍ᵥₘₐ₎₎], digits=nd), R²92₍ᵥₘₐ₎])
push!(results_analysis, ["GIV Regression from 1992", α92ᵢ, sd₍α92ᵢ₎, round.([α92ᵢ - 1.96 * sd₍α92ᵢ₎, α92ᵢ + 1.96 * sd₍α92ᵢ₎], digits=nd), R²92ᵢ])
push!(results_analysis, ["IV Regression (H&N LULCC) from 1992 with ENSO and VAI", α92₍ₕₙₑ₎[1], sd₍α92₍ₕₙₑ₎₎[1], round.([α92₍ₕₙₑ₎[1] - 1.96 * sd₍α92₍ₕₙₑ₎₎[1], α92₍ₕₙₑ₎[1] + 1.96 * sd₍α92₍ₕₙₑ₎₎[1]], digits=nd), R²92₍ₕₙₑ₎])
push!(results_analysis, ["IV Regression (vMA LULCC) from 1992 with ENSO and VAI", α92₍ᵥₘₐₑ₎[1], sd₍α92₍ᵥₘₐₑ₎₎[1], round.([α92₍ᵥₘₐₑ₎[1] - 1.96 * sd₍α92₍ᵥₘₐₑ₎₎[1], α92₍ᵥₘₐₑ₎[1] + 1.96 * sd₍α92₍ᵥₘₐₑ₎₎[1]], digits=nd), R²92₍ᵥₘₐₑ₎])
push!(results_analysis, ["GIV Regression from 1992 with ENSO and VAI", α92₍ᵢₑ₎[1], sqrt(var92₍α₍ᵢₑ₎₎[1, 1]), round.([α92₍ᵢₑ₎[1] - 1.96 * sqrt(var92₍α₍ᵢₑ₎₎[1, 1]), α92₍ᵢₑ₎[1] + 1.96 * sqrt(var92₍α₍ᵢₑ₎₎[1, 1])], digits=nd), R²92₍ᵢₑ₎])
push!(results_analysis, ["Deming regression from 1992 (δ=0.2)", δ₉₂[2, 1], δ₉₂[3, 1], round.([δ₉₂[4, 1], δ₉₂[5, 1]], digits=nd), δ₉₂[6, 1]])
push!(results_analysis, ["Deming regression from 1992 (δ=0.5)", δ₉₂[2, 2], δ₉₂[3, 2], round.([δ₉₂[4, 2], δ₉₂[5, 2]], digits=nd), δ₉₂[6, 2]])
push!(results_analysis, ["Deming regression from 1992 (δ=1)", δ₉₂[2, 3], δ₉₂[3, 3], round.([δ₉₂[4, 3], δ₉₂[5, 3]], digits=nd), δ₉₂[6, 3]])
push!(results_analysis, ["Deming regression from 1992 (δ=2)", δ₉₂[2, 4], δ₉₂[3, 4], round.([δ₉₂[4, 4], δ₉₂[5, 4]], digits=nd), δ₉₂[6, 4]])
push!(results_analysis, ["Deming regression from 1992 (δ=5)", δ₉₂[2, 5], δ₉₂[3, 5], round.([δ₉₂[4, 5], δ₉₂[5, 5]], digits=nd), δ₉₂[6, 5]])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=0.2)", δ₉₂ₑ[2, 1], δ₉₂ₑ[3, 1], round.([δ₉₂ₑ[4, 1], δ₉₂ₑ[5, 1]], digits=nd), δ₉₂ₑ[6, 1]])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=0.5)", δ₉₂ₑ[2, 2], δ₉₂ₑ[3, 2], round.([δ₉₂ₑ[4, 2], δ₉₂ₑ[5, 2]], digits=nd), δ₉₂ₑ[6, 2]])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=1)", δ₉₂ₑ[2, 3], δ₉₂ₑ[3, 3], round.([δ₉₂ₑ[4, 3], δ₉₂ₑ[5, 3]], digits=nd), δ₉₂ₑ[6, 3]])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=2)", δ₉₂ₑ[2, 4], δ₉₂ₑ[3, 4], round.([δ₉₂ₑ[4, 4], δ₉₂ₑ[5, 4]], digits=nd), δ₉₂ₑ[6, 4]])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=5)", δ₉₂ₑ[2, 5], δ₉₂ₑ[3, 5], round.([δ₉₂ₑ[4, 5], δ₉₂ₑ[5, 5]], digits=nd), δ₉₂ₑ[6, 5]])
results_analysis.Estimate = round.(results_analysis.Estimate, digits=nd)
results_analysis."Std. Error" = round.(results_analysis."Std. Error", digits=nd)
display(results_analysis)

18×5 DataFrame

Row	Model	Estimate	Std. Error	Confidence Int.	R²
	String	Float64	Float64	Array…	Float64
1	Regression from 1992	0.4496	0.0185	[0.4134, 0.4858]	0.325713
2	Regression from 1992 with ENSO and VAI	0.4622	0.0112	[0.462, 0.4625]	0.759384
3	IV Regression (H&N LULCC) from 1992	0.4496	0.0173	[0.4157, 0.4836]	0.355758
4	IV Regression (vMA LULCC) from 1992	0.4502	0.0245	[0.4022, 0.4982]	0.355736
5	GIV Regression from 1992	0.4495	0.0173	[0.4156, 0.4834]	0.355756
6	IV Regression (H&N LULCC) from 1992 with ENSO and VAI	0.4622	0.0112	[0.4402, 0.4842]	0.759384
7	IV Regression (vMA LULCC) from 1992 with ENSO and VAI	0.4623	0.0112	[0.4402, 0.4843]	0.759384
8	GIV Regression from 1992 with ENSO and VAI	0.4622	0.0112	[0.4401, 0.4842]	0.759384
9	Deming regression from 1992 (δ=0.2)	0.4598	0.0174	[0.4511, 0.4674]	0.358613
10	Deming regression from 1992 (δ=0.5)	0.4555	0.0173	[0.4469, 0.4629]	0.357528
11	Deming regression from 1992 (δ=1)	0.4531	0.0172	[0.4446, 0.4603]	0.356905
12	Deming regression from 1992 (δ=2)	0.4515	0.0173	[0.443, 0.4588]	0.356502
13	Deming regression from 1992 (δ=5)	0.4504	0.017	[0.4422, 0.4576]	0.35622
14	Deming FWL regression from 1992 (δ=0.2)	0.4662	0.0106	[0.461, 0.471]	0.846761
15	Deming FWL regression from 1992 (δ=0.5)	0.4645	0.0106	[0.4594, 0.4692]	0.846888
16	Deming FWL regression from 1992 (δ=1)	0.4636	0.0108	[0.4583, 0.4684]	0.846951
17	Deming FWL regression from 1992 (δ=2)	0.463	0.0107	[0.4578, 0.4677]	0.846987
18	Deming FWL regression from 1992 (δ=5)	0.4625	0.0107	[0.4573, 0.4673]	0.84701

References

Bennedsen, Mikkel, Eric Hillebrand, and Siem Jan Koopman. 2024. “A Regression-Based Approach to the CO2 Airborne Fraction.” Nature Communications 15 (1): 8507. https://doi.org/10.1038/s41467-024-52728-1.

Davidson, R, and James G MacKinnon. 2004. Econometric Theory and Methods. Oxford University Press.

DiCiccio, Thomas J, and Bradley Efron. 1996. “Bootstrap Confidence Intervals.” Statistical Science 11 (3): 189–228.

Efron, Bradley. 1992. “Bootstrap Methods: Another Look at the Jackknife.” In Breakthroughs in Statistics: Methodology and Distribution, 569–93. Springer.

Friedlingstein, Pierre, Michael O’sullivan, Matthew W Jones, et al. 2023. “Global Carbon Budget 2023.” Earth System Science Data 15 (12): 5301–69.

Frisch, Ragnar, and Frederick V Waugh. 1933. “Partial Time Regressions as Compared with Individual Trends.” Econometrica, 387–401.

Hamilton, James D. 1994. Time Series Analysis. 1st ed. Princeton University Press.

Houghton, Richard A, and Andrea Castanho. 2022. “Annual Emissions of Carbon from Land Use, Land-Use Change, and Forestry 1850–2020.” Earth System Science Data Discussions 2022: 1–36.

Lovell, Michael C. 1963. “Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis.” Journal of the American Statistical Association 58 (304): 993–1010.

Marle, Margreet JE van, Dave van Wees, Richard A Houghton, Robert D Field, Jan Verbesselt, and Guido R van der Werf. 2022. “RETRACTED ARTICLE: New Land-Use-Change Emissions Indicate a Declining CO2 Airborne Fraction.” Nature 603 (7901): 450–54.

Supplementary information

Deming regression with additional covariates

Frisch-Waugh-Lovell theorem

Solving for \gamma

Solving for \alpha

Solving for E^*

Comparison with the FWL theorem and additional covariates

Reproducing the results

Setup

Airborne fraction

Data

Plotting the data

Unit root tests

Linear regression

Simple specification of the model

Extended model

Detrending ENSO

Estimation of the extended model

R-squared and adjusted R-squared

Measurement error and bias

Plotting LULCCs

Plotting emissions

Bias due to measurement errors

Instrumental variables

Generalised instrumental variables

Tests for IV

Sargan test

Hausman test

Extended model

Instruments Tests

Recent subsample

Alternative datasets

H&N LULCC

vMa LULCC

Deming with FWL theorem

Deming regression standard errors

Deming regression in the simple specification

Deming regression in the recent subsample

Simple specification

Extended model

Summary of results

Table of results full sample

Table of results recent subsample

References