using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Activating project at `c:\Users\eduar\OneDrive - Aalborg Universitet\Research\CLIMATE\AirborneFraction\QuartoVersion`
This is a notebook accompanying the paper Robust Estimation of Carbon Dioxide Airborne Fraction Under Measurement Errors. The notebook includes the proofs of the theoretical results presented in the paper. Moreover, the notebook develops the code used to estimate the airborne fraction using instrumental variables, a robust method to correct for measurement errors. We also extend Deming regression to estimate the airborne fraction and its uncertainty using the bootstrap method. The notebook includes the code to reproduce all figures and tables in the paper.
The up-to-date version of this notebook is available at https://github.com/everval/Robust-CO2-Estimation-Supplementary.
Deming regression is a generalisation of OLS that accounts for measurement errors in the independent variable and the dependent variable. Let G_t, and E_t be the noisy atmospheric growth and emissions variables measured with errors.
Deming regression poses the following model:
E_t = E_t^* + \eta_t,
G_t = \alpha E_t^* + \omega_t,
where E_t^* are the true emissions, and \eta_t and \omega_t are the measurement errors. Assuming that the measurement errors are Gaussian with variances given by \sigma_{\eta}^2, and \delta\sigma_{\eta}^2, respectively, the Deming regression estimator is given as the maximum likelihood estimator. Note that the notation implies that the ratio of the variances of the measurement errors is \delta.
In this section, we extend the Deming regression to include additional covariates. We consider the following model:
E_t = E_t^* + \eta_t, \tag{1}
G_t = \alpha E_t + Z_t\gamma + \omega_t, \tag{2}
where Z_t is a row vector of additional covariates at time t, \gamma is a vector of coefficients, and the rest of the notation is as before. In this paper, we consider the El Niño index and the volcanic activity index as additional covariates, but the model can be extended to include other covariates. Assuming that the measurement errors are Gaussian, the Deming regression estimator is obtained as maximum likelihood estimator.
In the context of the airborne fraction, the Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963) can be used to estimate the Deming regression in the preferred specification including the El Niño index and the volcanic activity index. The FWL theorem guarantees that the airborne fraction estimator, \hat{\alpha}, in the preferred specification given by:
G_t = \alpha E_t + \gamma_1 ENSO_t + \gamma_2 VAI_t + u_{t}
is the same as the airborne fraction estimator in the following specification:
(\mathbb{I}-P_Z)G_t = \alpha (\mathbb{I}-P_Z)E_t + (\mathbb{I}-P_Z)u_{t}, \tag{3}
where Z_t = [ENSO_t,\ VAI_t], and P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z. Hence, we can estimate the airborne fraction using the residuals from regressing the atmospheric CO_2 concentration and emissions on the El Niño index and the volcanic activity index.
In the following, we show that estimating Equation 3 using Deming regression is equivalent to estimating the model specified by Equation 1 and Equation 2.
We have the following theorem:
Theorem 1 (Deming regression with additional covariates) The Deming regression estimator in the model specified by Equation 1 and Equation 2 is equivalent to the Deming regression in the model specified by Equation 1 and Equation 3. That is, the model where the FWL theorem has been applied.
Proof. The likelihood function for the model specified by Equation 1 and Equation 2 is given by:
L = \prod_{t=1}^T \frac{1}{\sqrt{2\pi\sigma_{\eta}^2}}\exp\left(-\frac{1}{2\sigma_{\eta}^2}\left(E_t - E_t^*\right)^2\right)\frac{1}{\sqrt{2\pi\lambda\sigma_{\eta}^2}}\exp\left(-\frac{1}{2\lambda\sigma_{\eta}^2}\left(G_t - \alpha E_t - Z_t\gamma\right)^2\right).
And the log-likelihood function is given by:
\begin{aligned} \mathcal{L} = \log L &= -\frac{T}{2}\log(2\pi\sigma_{\eta}^2) - \frac{1}{2\sigma_{\eta}^2}\sum_{t=1}^T\left(E_t - E_t^*\right)^2 \\ &- \frac{T}{2}\log(2\pi\lambda\sigma_{\eta}^2) - \frac{1}{2\lambda\sigma_{\eta}^2}\sum_{t=1}^T\left(G_t - \alpha E_t - Z_t\gamma\right)^2. \end{aligned}
Differentiating the log-likelihood and setting the derivatives to zero, we obtain the Deming regression estimator.
In the following, we solve for \gamma and replace it in the equations for \alpha and E_t^* to show that they solve the Deming regression equations in the model with additional covariates where the FWL theorem has been applied; that is, the model specified by Equation 1 and Equation 3.
Rewriting the log-likelihood function using matrix notation, we have:
\begin{aligned} \mathcal{L} &= -\frac{T}{2}\log(2\pi\sigma_{\eta}^2) - \frac{1}{2\sigma_{\eta}^2}\left(E - E^*\right)'(E - E^*) \\ &- \frac{T}{2}\log(2\pi\lambda\sigma_{\eta}^2) - \frac{1}{2\lambda\sigma_{\eta}^2}\left(G - \alpha E - Z\gamma\right)'(G - \alpha E - Z\gamma). \end{aligned}
Hence, the derivative of the log-likelihood with respect to \gamma is given by:
\frac{\partial \mathcal{L}}{\partial\gamma} = \frac{1}{2\lambda\sigma_{\eta}^2}\left(2Z'G-2\alpha Z'E^* - 2 Z'Z\gamma \right)
Equating the derivative to zero, we obtain:
Z'G - \alpha Z'E^* - Z'Z\gamma = 0,
which implies that:
\hat{\gamma} = (Z'Z)^{-1}Z'(G - \alpha E^*). \tag{4}
The derivative of the log-likelihood with respect to \alpha is given by:
\frac{\partial \mathcal{L}}{\partial\alpha} = \frac{1}{2\lambda\sigma_{\eta}^2}\left(2E^{*'}G - 2\alpha E^{*'}E^* - 2 E^{*'} Z\gamma \right)
Equating the derivative to zero, we obtain:
E^{*'}\left(G - \alpha E^* - Z\gamma \right) = 0,
Replacing \gamma from Equation 4, we obtain:
E^{*'}\left( (\mathbb{I} - Z(Z'Z)^{-1}Z' ) (G - \alpha E^*) \right) = 0,
which implies that \hat{\alpha} solves:
E^{*'}\left( (\mathbb{I} - P_Z)G - \alpha (\mathbb{I} - P_Z)E^* \right) = 0, \tag{5}
where P_Z = Z(Z'Z)^{-1}Z' is the projection matrix onto the column space of Z.
The derivative of the log-likelihood with respect to E^*_t is given by:
\frac{\partial \mathcal{L}}{\partial E^*_t} = -\frac{2}{2\sigma_{\eta}^2}\left(E_t - E^*_t\right) - \frac{2\alpha}{2\lambda\sigma_{\eta}^2}\left(G_t -\alpha E_t^* - Z_t\gamma\right)
Equating the derivative to zero, we obtain:
\lambda (E_t - E^*_t) + \alpha\left(G_t - \alpha E_t^* - Z_t\gamma\right) = 0.
Replacing \gamma from Equation 4, we obtain:
\lambda (E_t - E^*_t) + \alpha\left( (\mathbb{I} - P_Z) G_t - \alpha (\mathbb{I} - P_Z) E_t^* \right) = 0, \tag{6}
where P_Z is as before.
Solving the Deming regression with additional covariates implies that estimates for \alpha and E^* are obtained as the solutions to Equation 5 and Equation 6. Analogous steps to the ones above show that these equations are the same as the equations obtained from solving the Deming regression in the model specified by Equation 1 and Equation 3. This implies that the Deming regression with additional covariates is equivalent to the Deming regression in the model specified by Equation 3, where the FWL theorem has been applied. \square
This notebook is written in Julia and uses the following packages:
DataFrames
for data manipulationXLSX
for reading data from an Excel filePlots
Statistics
Distributions
All packages are available in the Julia registry and can be installed using the Julia package manager with the following command:
using Pkg
Pkg.add("DataFrames", "XLSX", "Plots", "Statistics", "Distributions")
In the following, we load a proejct environment that contains the necessary packages. This step is not required if the packages are already installed in the current environment.
In [47]:
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Activating project at `c:\Users\eduar\OneDrive - Aalborg Universitet\Research\CLIMATE\AirborneFraction\QuartoVersion`
The airborne fraction is the fraction of CO_2 emissions that remain in the atmosphere. It is a key parameter in the carbon cycle and is used to estimate the impact of human activities on the climate system. The airborne fraction is defined as the ratio of the increase in atmospheric CO_2 concentration to the total CO_2 emissions.
We load the data, which is neatly collected in an Excel file in the author’s GitHub repository at the following link.
To ease things up, we have downloaded the data directly from the repository and saved it in the file AF_data.xlsx
in the local folder.
In [48]:
using DataFrames, XLSX
= "AF_data.xlsx"
path
= DataFrame(XLSX.readtable(path, "Data"))
data
= data[!, 1];
year = Vector{Float64}(data[!, 4]);
fossilfuels = Vector{Float64}(data[!, 6]);
lulcc = fossilfuels .+ lulcc;
emissions = Vector{Float64}(data[!, 5]);
coverage = Vector{Float64}(data[!, 9]);
VAI = Vector{Float64}(data[!, 10]);
ENSO = emissions;
E = coverage; G
In [49]:
using Plots
= @layout [a b; c d]
l = plot(year, G, label="Atmospheric concentration", xlabel="Year", ylabel="GtC/yr", title="", style=:solid, linewidth=2, color=1)
p1 = plot(year, E, label="Emissions", xlabel="Year", ylabel="GtC/yr", title="", style=:dash, linewidth=2, color=2)
p2 = plot(year, VAI, label="Volcanic activity index (VAI)", xlabel="Year", ylabel="", title="", style=:dot, linewidth=2, color=3)
p3= plot(year, ENSO, label="El Niño southern oscillation (ENSO)", xlabel="Year", ylabel="", title="", style=:dashdot, linewidth=2, color=4)
p4 = plot(p1, p2, p3, p4, layout = l, fontfamily="Computer Modern", legendfontsize=10, tickfontsize=10, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=10, xlabelfontsize=10, titlefontsize=12)
all_plot
display(all_plot)
In [50]:
using Plots
= plot(year, G, label="Atmospheric concentration", xlabel="Year", ylabel="GtC/yr", title="", style=:solid, linewidth=2, color=1)
p1 plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16)
display(p1)
= plot(year, VAI, label="Volcanic activity index (VAI)", xlabel="Year", ylabel="", title="", style=:dot, linewidth=2, color=3)
p3 plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16)
display(p3)
CO_2 atmospheric concentration [top] and volcanic activity index [bottom]
Bennedsen, Hillebrand, and Koopman (2024) suggested to estimate the airborne fraction by linear regression. They propose to use the following specification:
G_t = \alpha E_t + \epsilon_t,
and estimate \alpha, the airborne fraction, using ordinary least squares (OLS). They argue that this approach provides better statistical properties. Among them, the OLS estimator is super-consistent, meaning that it converges to the true value at a faster rate than the classic estimator. They also show that the estimator has lower variance and it is asymptotically normal.
To contrast the results, we first replicate the main results of Bennedsen, Hillebrand, and Koopman (2024). The authors considered a simple specification of the model, where the emissions variable is the only regressor, and an extended model that includes additional covariates.
In [51]:
= (E'E) \ (E'G)
α₂
= sum((G - α₂ * E) .^ 2)
rss₂ = rss₂ / (length(G) - 1)
σ²₂ = sqrt(σ²₂ / (E'E))
sd₍α₂₎
α₂, sd₍α₂₎
(0.4477918844144535, 0.014241317441433234)
Additional covariates, controlling for the El Niño Southern Oscillation (ENSO) and volcanic activity index (VAI). This is the preferred specification by Bennedsen, Hillebrand, and Koopman (2024).
Note that Bennedsen, Hillebrand, and Koopman (2024) first detrended the ENSO data using a linear trend. We analyse first if the detrending is necessary.
We plot the ENSO data and the detrended ENSO data.
In [52]:
= length(ENSO)
T
= [ones(T) collect(1:T)]
Xₜ
= (Xₜ'Xₜ) \ (Xₜ'ENSO)
ρ
= ENSO - Xₜ * ρ
ENSOᵨ
= plot(year, [ENSOᵨ ENSO], label=["Detrended ENSO" "ENSO"], xlabel="Year", ylabel="Unitless", title="El Niño southern oscillation", linewidth = 2, style = [:solid :dash :dot])
p5 plot!( fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend = :topleft)
display(p5)
ENSO data and detrended ENSO data
Moreover, we make the hypothesis test of the presence of a linear trend in the ENSO data. The null hypothesis is that there is no linear trend in the data.
In [53]:
using Distributions
= ENSO - Xₜ * ρ
resᵨ = sum(resᵨ.^2) / (T - 2)
σ²ᵨ = σ²ᵨ * inv(Xₜ'Xₜ)
Var₍ᵨ₎
= ρ[1] / sqrt( Var₍ᵨ₎[1,1] )
t1ᵨ = 2 * (1 - cdf(TDist(T-2), abs(t1ᵨ)))
pval1ᵨ
= ρ[2] / sqrt( Var₍ᵨ₎[2,2] )
t2ᵨ = 2 * (1 - cdf(TDist(T-2), abs(t2ᵨ)))
pval2ᵨ
1] sqrt(Var₍ᵨ₎[1,1]) t1ᵨ pval1ᵨ]; ρ[2] sqrt(Var₍ᵨ₎[2,2]) t2ᵨ pval2ᵨ] [[ρ[
2×4 Matrix{Float64}:
-0.144369 0.158836 -0.908919 0.366913
0.00264145 0.00424887 0.621684 0.536429
Note that the p-values are large, so we fail to reject the null hypothesis. This means that there is no evidence of a linear trend in the ENSO data. Hence, we continue the analysis without detrending the ENSO data.
In [54]:
= [E ENSO VAI]
Xₑ = (Xₑ'Xₑ) \ (Xₑ'G)
αₑ
= sum((G - Xₑ * αₑ) .^ 2)
rssₑ = rssₑ / (length(G) - 3)
σ²ₑ = σ²ₑ * inv(Xₑ'Xₑ)
var₍αₑ₎
sqrt.([var₍αₑ₎[j, j] for j = 1:3]), αₑ./ sqrt.([var₍αₑ₎[j, j] for j = 1:3]) ] [ αₑ,
3-element Vector{Vector{Float64}}:
[0.4734551237192292, 0.9672542193001916, -14.154904191057943]
[0.010839839871941386, 0.13273608839500084, 2.696977487398358]
[43.67731712944895, 7.287047787801322, -5.248432460855462]
In [55]:
= αₑ./ sqrt.([var₍αₑ₎[j, j] for j = 1:3])
tstats 1- cdf(TDist(T-3), abs(tstats[3]))
1.0227639827276036e-6
Note that the estimate using the ENSO data without detrending is slighly larger than the estimate using the detrended ENSO data. Nonetheless, the difference is small and the estimates are very close.
We calculate the R-squared and adjusted R-squared for the extended model and compare them with the simple model.
R-squared is not a good measure of goodness-of-fit for nested models given that it never decreases and most likely increases with the number of regressors. The adjusted R-squared corrects this issue by penalizing the inclusion of additional regressors (Davidson and MacKinnon 2004).
In [56]:
= sum(( G .- mean(G)).^2 )
tssₑ
= 1 - rss₂ / tssₑ
R²₂ = 1 - rssₑ / tssₑ
R²ₑ
= 1 - (rss₂ / (T - 1)) / (tssₑ / (T - 1))
adjR²₂ = 1 - (rssₑ / (T - 3)) / (tssₑ / (T - 1))
adjR²ₑ
R²₂, R²ₑ, adjR²₂, adjR²ₑ
(0.5862507041876346, 0.8012896668141007, 0.5862507041876346, 0.7947745739227597)
The R-squared and adjusted R-squared are higher for the extended model, suggesting that the additional covariates improve the fit of the model.
Anthropogenic CO_2 emissions are given by E_t = E_t^{FF}+E_t^{LULCC}, where E_t^{FF} is the emissions from fossil fuels and E_t^{LULCC} is the emissions from land-use and land-cover changes (LULCC). The uncertainty in measurements of the airborne fraction stems in large part from uncertainties in the magnitude of LULCC emissions (Bennedsen, Hillebrand, and Koopman 2024). This suggesst that LULCC emissions are subject to measurement error.
Figure 2 shows three different measurements of the LULCC variable. The GCP LULCC data are from the Global Carbon Project (Friedlingstein et al. 2023), the H&C LULCC data are from Houghton and Castanho (2022), and the vMa LULCC data are from Marle et al. (2022).
In [57]:
using Plots
= Vector{Float64}(data[!, 7]);
lulcc₂ = Vector{Float64}(data[!, 8]);
lulcc₃
= fossilfuels .+ lulcc₂;
E₂ = fossilfuels .+ lulcc₃;
E₃
= plot(year, [lulcc lulcc₂ lulcc₃], label=["GCP" "H&N" "vMa"], xlabel="Year", ylabel="CO2", title="Land-use and land-cover change measures", style=[:solid :dash :dot], linewidth=2)
plot_lulcc plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend=:topright)
display(plot_lulcc)
Figure 3 shows the emissions variable using the different LULCC measurements. The emissions variable is the sum of the fossil fuels and LULCC emissions.
In [58]:
= plot(year, [E E₂ E₃], label=["Emissions (GCP LULCC)" "Emissions (H&N LULCC)" "Emissions (vMa LULCC)"], xlabel="Year", ylabel="CO2", title="Emission measures", style=[:solid :dash :dot], linewidth=2)
plot_emisions plot!(fontfamily="Computer Modern", legendfontsize=12, tickfontsize=14, titlefontfamily="Computer Modern", legendfontfamily="Computer Modern", tickfontfamily="Computer Modern", ylabelfontsize=14, xlabelfontsize=14, titlefontsize=16, legend=:topleft)
display(plot_emisions)
We show that measurement errors in the emissions variable can bias the estimates of the airborne fraction. Assume that we do not observe the true emissions but a noisy version of it. That is, we observe E_t = E_t^* + \eta_t, where E_t^* is the true emissions and \eta_t is the measurement error, which we assume has mean zero and variance \sigma^2_\eta. Estimating the airborne fraction using the noisy emissions by OLS:
\hat{\alpha}_{ME} = \frac{\sum_{t=1}^T E_t G_t}{\sum_{t=1}^T E_t^2} = \frac{\sum_{t=1}^T (E_t^*G_t + \eta_t G_t)}{\sum_{t=1}^T (E_t^{*2}+2E_t^*\eta_t+\eta_t^2)}\rightarrow \frac{\frac{1}{T}\sum_{t=1}^T E_t^*G_t}{\frac{1}{T}\sum_{t=1}^T E_t^{*2}+\sigma_\eta^2},
which shows that the OLS estimator is biased downwards. The bias increases with the variance of the measurement error, which is unknown.
To correct the bias, we can estimate the airborne fraction using instrumental variables. Unlike Deming regression, instrumental variables do not require the variance of the measurement error to be known, nor assuming that they are normally distributed.
To use instrumental variables, we need a variable that is correlated with the emissions but uncorrelated with the measurement error. This variable is called an instrument.
There are several measurements of the land-use and land-coverage changes (LULCC) variable Figure 2, which forms part of the emissions measurement Figure 3. Even under the assumption that all of these different measurements are subject to measurement error, we can use them as instruments to correct the bias in the estimate of the airborne fraction.
Consider a second emissions measurement, E_{2,t} = E_t^* + \omega_t, where \omega_t is the measurement error in the second emissions measurement. We assume that \omega_t is independent of \eta_t since the two measurements are performed independently. We can use the second emissions measurement as an instrument to estimate the airborne fraction. The instrument is correlated with the emissions variable given that they share the same true emissions; but, by construction, uncorrelated with the measurement error in the emissions variable.
Consider the following estimator for the airborne fraction:
\hat{\alpha}_{IV} = \frac{\sum_{t=1}^T E_{2,t} G_t}{\sum_{t=1}^T E_{2,t}E_{t}} = \frac{\sum_{t=1}^T (E_t^*G_t + \omega_t G_t)}{\sum_{t=1}^T (E_t^{*2}+E_t^*\eta_t+E_t^*\omega_t+\eta_t\omega_t)} \rightarrow \frac{\frac{1}{T}\sum_{t=1}^T E_t^*G_t}{\frac{1}{T}\sum_{t=1}^T E_t^{*2}} = \hat{\alpha}, \tag{7}
where \hat{\alpha} is the estimator without measurement errors. Hence, Equation 7 shows that the instrumental variables estimator is unbiased. Moreover, the estimator is consistent and asymptotically normal, regardless of the distribution of the measurement errors.
Note that the theoretical properties of the instrumental variables estimator are a direct consequence of the additive nature of the measurement errors. The order of probability of the measurement errors and the variables multiplied by them is lower than the order of the variables themselves. For a textbook treatment on orders of probability, see Hamilton (1994).
Depending on which variable is selected as the single instrument, two different estimators can be obtained. Later, we will extend the analysis to include both instruments simultaneously.
In [59]:
# H&N LULCC
= (E₂'E) \ (E₂'G)
α₍ₕₙ₎
# vMa LULCC
= (E₃'E) \ (E₃'G)
α₍ᵥₘₐ₎
α₍ₕₙ₎, α₍ᵥₘₐ₎
(0.4478895029464986, 0.4481523469750347)
In contrast to the Deming regression, there is a closed-form expression to compute the standard error of the instrumental variables estimator.
It is given by:
\widehat{\text{Var}}(\hat{\alpha}_{IV}) = \hat{\sigma}^2_{iv} \left(\sum_{t=1}^T E_{2,t} E_t\right) \left(\sum_{t=1}^T E_{2,t} E_{2,t}\right)^{-1} \left(\sum_{t=1}^T E_{2,t} E_t\right), \tag{8}
where \hat{\sigma}^2_{IV} = \frac{1}{T-1}\sum_{t=1^T}(G_t-\hat{\alpha}_{IV}E_t)^2 is the estimator of the variance of the residuals.
In [60]:
# H&N LULCC
= sum((G - α₍ₕₙ₎ * E) .^ 2)
rss₍ₕₙ₎ = rss₍ₕₙ₎ / (length(G) - 1)
σ²₍ₕₙ₎ = sqrt(σ²₍ₕₙ₎ / (E₂'E) * (E₂'E₂) / (E'E₂))
sd₍α₍ₕₙ₎₎
# vMa LULCC
= sum((G - α₍ᵥₘₐ₎ * E) .^ 2)
rss₍ᵥₘₐ₎ = rss₍ᵥₘₐ₎ / (length(G) - 1)
σ²₍ᵥₘₐ₎ = sqrt(σ²₍ᵥₘₐ₎ / (E₃'E) * (E₃'E₃) / (E'E₃))
sd₍α₍ᵥₘₐ₎₎
sd₍α₍ₕₙ₎₎, sd₍α₍ᵥₘₐ₎₎
(0.014250528339527467, 0.014259657933478448)
Estimates and standard errors.
In [61]:
[α₍ₕₙ₎ α₍ᵥₘₐ₎; sd₍α₍ₕₙ₎₎ sd₍α₍ᵥₘₐ₎₎]
2×2 Matrix{Float64}:
0.44789 0.448152
0.0142505 0.0142597
Instrumental variables can be extended to simultaneously use more than one instrument for each variable with measurement error. The estimator is denoted as the generalised instrumental variables (GIV) and it is given by:
\hat{\alpha}_{GIV} = (\tilde{E}'\tilde{E})^{-1}\tilde{E}' G,
where \tilde{E}_t is the fitted value from the following regression:
E_t = \beta_1 E_{2,t} + \beta_2 E_{3,t} + \epsilon_t,
where E_{2,t} = E_t^* + \omega_t and E_{3,t} = E_t^* + \zeta_t are the second and third emissions measurements, respectively. The coefficients \beta_1 and \beta_2 are estimated by linear regression.
Moreover, the variance of the GIV estimator is given by:
\widehat{\text{Var}}(\hat{\alpha}_{GIV}) = \hat{\sigma}^2_{GIV} (\tilde{E}'\tilde{E})^{-1},
where \hat{\sigma}^2_{GIV} = \frac{1}{T-1}\sum_{t=1}^T(G_t-\hat{\alpha}_{GIV}E_t)^2 is the estimator of the variance of the residuals.
In [62]:
= E
X = [E₂ E₃]
W = W * ((W' * W) \ W')
PW
= (X' * PW * X) \ (X' * PW * G)
αᵢ
= sum((G - X * αᵢ) .^ 2)
rssᵢ = rssᵢ / (length(G) - 1)
σ²ᵢ = σ²ᵢ * inv(X' * PW * X)
var₍αᵢ₎ = sqrt(var₍αᵢ₎)
sd₍αᵢ₎
αᵢ, sd₍αᵢ₎
(0.44763765543651446, 0.01424768219542376)
Having more than one instrument further allows us to test the validity of the instruments. Two common tests for GIV are Sargan’s instrument validity test and the Hausman’s overidentification test.
Sargan’s test is a test to determine if the instruments are correlated with the endogenous variable.
In [63]:
using Statistics
= (W' * W) \ (W' * X)
γ = sum((X - W * γ) .^ 2)
rssₐ
= 1 - rssₐ / (sum((X .- mean(X)) .^ 2))
R²ᵢ
= length(X) * R²ᵢ ȷ
63.23692611848501
In [64]:
using Distributions
1 - cdf(Chisq(1), ȷ)
1.887379141862766e-15
Hausman’s test is a test to determine if there is a systematic difference between the instrumental variables and the OLS estimates.
In [65]:
= (αᵢ - α₂)' * ((sd₍αᵢ₎^2 - sd₍α₂₎^2) \ (αᵢ - α₂))
Hᵢ = (α₍ₕₙ₎ - α₂)' * ((sd₍α₍ₕₙ₎₎^2 - sd₍α₂₎^2) \ (α₍ₕₙ₎ - α₂))
H₍ₕₙ₎ = (α₍ᵥₘₐ₎ - α₂)' * ((sd₍α₍ᵥₘₐ₎₎^2 - sd₍α₂₎^2) \ (α₍ᵥₘₐ₎ - α₂))
H₍ᵥₘₐ₎
1 - cdf(Chisq(1), Hᵢ) 1 - cdf(Chisq(1), H₍ₕₙ₎) 1 - cdf(Chisq(1), H₍ᵥₘₐ₎)] [
1×3 Matrix{Float64}:
0.71721 0.848874 0.618083
We consider adding additional covariates to the model. In particular, we consider adding ENSO (El Niño Southern Oscillation) and VAI (volcanic activity index) as covariates. These variables are known to affect the carbon cycle and can potentially influence the airborne fraction. Note that these variables were not considered in the Deming regression analysis by Bennedsen, Hillebrand, and Koopman (2024).
We estimate the following model:
G_t = \alpha E_t + \gamma_1 ENSO_t + \gamma_2 VAI_t + \epsilon_t,
where ENSO_t and VAI_t are the El Niño Southern Oscillation and volcanic activity index at time t, respectively.
Estimating the extended model using instrumental variables is straightforward. We can include the additional covariates in the regression and instrument the emissions variable.
Standard errors for the extended model using instrumental variables are also straightforward.
In [66]:
= [E ENSO VAI]
Xₑ
# H&N LULCC with ENSO and VAI
= [E₂ ENSO VAI]
Wₕₙ = Wₕₙ * ((Wₕₙ' * Wₕₙ) \ Wₕₙ')
PWₕₙ
= (Xₑ' * PWₕₙ * Xₑ) \ (Xₑ' * PWₕₙ * G)
α₍ₕₙₑ₎ = sum((G - Xₑ * α₍ₕₙₑ₎) .^ 2)
rss₍ₕₙₑ₎ = rss₍ₕₙₑ₎ / (length(G) - 3)
σ²₍ₕₙₑ₎ = σ²₍ₕₙₑ₎ * inv(Xₑ'PWₕₙ * Xₑ)
var₍α₍ₕₙₑ₎₎ = sqrt.([var₍α₍ₕₙₑ₎₎[j, j] for j = 1:3])
sd₍α₍ₕₙₑ₎₎
# vMa LULCC with ENSO and VAI
= [E₃ ENSO VAI]
Wᵥₘₐ = Wᵥₘₐ * ((Wᵥₘₐ' * Wᵥₘₐ) \ Wᵥₘₐ')
PWᵥₘₐ
= (Xₑ' * PWᵥₘₐ * Xₑ) \ (Xₑ' * PWᵥₘₐ * G)
α₍ᵥₘₐₑ₎ = sum((G - Xₑ * α₍ᵥₘₐₑ₎) .^ 2)
rss₍ᵥₘₐₑ₎ = rss₍ᵥₘₐₑ₎ / (length(G) - 3)
σ²₍ᵥₘₐₑ₎ = σ²₍ᵥₘₐₑ₎ * inv(Xₑ'PWᵥₘₐ * Xₑ)
var₍α₍ᵥₘₐₑ₎₎ = sqrt.([var₍α₍ᵥₘₐₑ₎₎[j, j] for j = 1:3])
sd₍α₍ᵥₘₐₑ₎₎
# GIV with ENSO and VAI
= [E₂ E₃ ENSO VAI]
Wₑ = Wₑ * ((Wₑ' * Wₑ) \ Wₑ')
PWₑ
= (Xₑ' * PWₑ * Xₑ) \ (Xₑ' * PWₑ * G)
α₍ᵢₑ₎ = G - Xₑ * α₍ᵢₑ₎
res₍ᵢₑ₎ = sum(res₍ᵢₑ₎ .^ 2) / (length(G) - 4)
σ²₍ᵢₑ₎ = σ²₍ᵢₑ₎ * inv(Xₑ' * PWₑ * Xₑ)
var₍α₍ᵢₑ₎₎ = sqrt.([var₍α₍ᵢₑ₎₎[j, j] for j = 1:3])
sd₍α₍ᵢₑ₎₎
[α₍ₕₙₑ₎ α₍ᵥₘₐₑ₎ α₍ᵢₑ₎; sd₍α₍ₕₙₑ₎₎ sd₍α₍ᵥₘₐₑ₎₎ sd₍α₍ᵢₑ₎₎]
6×3 Matrix{Float64}:
0.472673 0.472328 0.472978
0.96578 0.965129 0.966355
-14.0822 -14.0501 -14.1106
0.0108476 0.0108547 0.0109354
0.132744 0.132752 0.133841
2.69734 2.6977 2.71959
Sargan and Hausman tests for the extended model.
In [67]:
= (Wₑ' * Wₑ) \ (Wₑ' * Xₑ)
γₑ = sum((Xₑ - Wₑ * γₑ) .^ 2)
rss₍ᵢₑ₎
= 1 - rss₍ᵢₑ₎ / (sum((Xₑ .- mean(Xₑ)) .^ 2))
R²₍ᵢₑ₎
= length(X) * R²₍ᵢₑ₎
ȷₑ
1 - cdf(Chisq(3), ȷₑ)
8.526512829121202e-14
In [68]:
= (α₍ᵢₑ₎ - αₑ)' * ((var₍α₍ᵢₑ₎₎ - var₍αₑ₎ ) \ (α₍ᵢₑ₎ - αₑ))
Hₑ = (α₍ₕₙₑ₎ - αₑ)' * ((var₍α₍ₕₙₑ₎₎ - var₍αₑ₎ ) \ (α₍ₕₙₑ₎ - αₑ))
H₍ₕₙₑ₎ = (α₍ᵥₘₐₑ₎ - αₑ)' * ((var₍α₍ᵥₘₐₑ₎₎ - var₍αₑ₎) \ (α₍ᵥₘₐₑ₎ - αₑ))
H₍ᵥₘₐₑ₎
1 - cdf(Chisq(3), Hₑ) 1 - cdf(Chisq(3), H₍ₕₙₑ₎) 1 - cdf(Chisq(3), H₍ᵥₘₐₑ₎)] [
1×3 Matrix{Float64}:
0.990682 0.301239 0.267212
Given the variability of the LULCC measurements at the beginning of the series, we consider a recent subsample of the data. We consider the data from 1992 and estimate the airborne fraction using the new approach.
Getting subsample data.
In [69]:
= E[year.>=1992];
E92 = G[year.>=1992];
G92 = E₂[year.>=1992];
E92₂ = E₃[year.>=1992];
E92₃ = VAI[year.>=1992];
VAI92 = ENSO[year.>=1992]; ENSO92
New approach for the recent subsample.
In [70]:
92₂ = (E92'E92₂) \ (E92₂'G92)
α
= sum((G92 - α92₂ * E92₂) .^ 2)
rss92₂ 92₂ = rss92₂ / (length(G92) - 1)
σ²92₂₎ = sqrt(σ²92₂ / (E92₂'E92₂))
sd₍α
92₂, sd₍α92₂₎ α
(0.4496265998122475, 0.018475507030537987)
Instrumental variables for the recent subsample.
In [71]:
= E92
X92 = [E92₂ E92₃]
W92 = W92 * ((W92' * W92) \ W92')
PW92
# H&N LULCC
92₍ₕₙ₎ = (E92₂'E92) \ (E92₂'G92)
α= sum((G92 - α92₍ₕₙ₎ * E92) .^ 2)
rss92₍ₕₙ₎ 92₍ₕₙ₎ = rss92₍ₕₙ₎ / (length(G92) - 1)
σ²92₍ₕₙ₎₎ = sqrt(σ²92₍ₕₙ₎ / (E92₂'E92) * (E92₂'E92₂) / (E92'E92₂))
sd₍α
# vMa LULCC
92₍ᵥₘₐ₎ = (E92₃'E92) \ (E92₃'G92)
α= sum((G92 - α92₍ᵥₘₐ₎ * E92) .^ 2)
rss92₍ᵥₘₐ₎ 92₍ᵥₘₐ₎ = rss₍ᵥₘₐ₎ / (length(G92) - 1)
σ²92₍ᵥₘₐ₎₎ = sqrt(σ²92₍ᵥₘₐ₎ / (E92₃'E92) * (E92₃'E92₃) / (E92'E92₃))
sd₍α
# GIV
92ᵢ = (X92' * PW92 * X92) \ (X92' * PW92 * G92)
α= sum((G92 - X92 * α92ᵢ) .^ 2)
rss92ᵢ 92ᵢ = rss92ᵢ / (length(G92) - 1)
σ²92ᵢ₎ = sqrt(σ²92ᵢ * inv(X92' * PW92 * X92))
sd₍α
92₍ₕₙ₎ α92₍ᵥₘₐ₎ α92ᵢ; sd₍α92₍ₕₙ₎₎ sd₍α92₍ᵥₘₐ₎₎ sd₍α92ᵢ₎] [α
2×3 Matrix{Float64}:
0.449627 0.450219 0.449493
0.0173104 0.0244938 0.0173103
Extended model for the recent subsample.
In [72]:
= [E92 ENSO92 VAI92]
X92ₑ
92ₑ = (X92ₑ'X92ₑ) \ (X92ₑ'G92)
α= sum((G92 - X92ₑ * α92ₑ) .^ 2)
rss92ₑ 92ₑ = rss92ₑ / (length(G92) - 3)
σ²92ₑ₎ = σ²92ₑ * inv(X92ₑ'X92ₑ)
var₍α
92ₑ, sqrt.([var₍α92ₑ₎[j, j] for j = 1:3]) ] [ α
2-element Vector{Vector{Float64}}:
[0.4622219656227806, 1.023779808855988, -17.167003554558082]
[0.011245930036851547, 0.17229498857880868, 3.6603647063388234]
Instrumental variables for the recent subsample and extended model.
In [73]:
= [E92 ENSO92 VAI92]
X92ₑ
# H&N LULCC
= [E92₂ ENSO92 VAI92]
W92₍ₕₙₑ₎ = W92₍ₕₙₑ₎ * ((W92₍ₕₙₑ₎' * W92₍ₕₙₑ₎) \ W92₍ₕₙₑ₎')
PW92₍ₕₙₑ₎
92₍ₕₙₑ₎ = (X92ₑ' * PW92₍ₕₙₑ₎ * X92ₑ) \ (X92ₑ' * PW92₍ₕₙₑ₎ * G92)
α= sum((G92 - X92ₑ * α92₍ₕₙₑ₎) .^ 2)
rss92₍ₕₙₑ₎ 92₍ₕₙₑ₎ = rss92₍ₕₙₑ₎ / (length(G92) - 3)
σ²92₍ₕₙₑ₎₎ = σ²92₍ₕₙₑ₎ * inv(X92ₑ' * PW92₍ₕₙₑ₎ * X92ₑ)
var₍α92₍ₕₙₑ₎₎ = sqrt.([var₍α92₍ₕₙₑ₎₎[j, j] for j = 1:3])
sd₍α
# vMa LULCC
= [E92₃ ENSO92 VAI92]
W92₍ᵥₘₐₑ₎ = W92₍ᵥₘₐₑ₎ * ((W92₍ᵥₘₐₑ₎' * W92₍ᵥₘₐₑ₎) \ W92₍ᵥₘₐₑ₎')
PW92₍ᵥₘₐₑ₎
92₍ᵥₘₐₑ₎ = (X92ₑ' * PW92₍ᵥₘₐₑ₎ * X92ₑ) \ (X92ₑ' * PW92₍ᵥₘₐₑ₎ * G92)
α= sum((G92 - X92ₑ * α92₍ᵥₘₐₑ₎) .^ 2)
rss92₍ᵥₘₐₑ₎ 92₍ᵥₘₐₑ₎ = rss92₍ᵥₘₐₑ₎ / (length(G92) - 3)
σ²92₍ᵥₘₐₑ₎₎ = σ²92₍ᵥₘₐₑ₎ * inv(X92ₑ' * PW92₍ᵥₘₐₑ₎ * X92ₑ)
var₍α92₍ᵥₘₐₑ₎₎ = sqrt.([var₍α92₍ᵥₘₐₑ₎₎[j, j] for j = 1:3])
sd₍α
# GIV
= [E92₂ E92₃ ENSO92 VAI92]
W92ₑ = W92ₑ * ((W92ₑ' * W92ₑ) \ W92ₑ')
PW92ₑ
92₍ᵢₑ₎ = (X92ₑ' * PW92ₑ * X92ₑ) \ (X92ₑ' * PW92ₑ * G92)
α= sum((G92 - X92ₑ * α92₍ᵢₑ₎) .^ 2)
rss92₍ᵢₑ₎ 92₍ᵢₑ₎ = rss92₍ᵢₑ₎ / (length(G92) - 3)
σ²= σ²92₍ᵢₑ₎ * inv(X92ₑ' * PW92ₑ * X92ₑ)
var92₍α₍ᵢₑ₎₎ 92₍ᵢₑ₎₎ = sqrt.([var92₍α₍ᵢₑ₎₎[j, j] for j = 1:3])
sd₍α
# All results
92₍ₕₙₑ₎ sd₍α92₍ₕₙₑ₎₎; α92₍ᵥₘₐₑ₎ sd₍α92₍ᵥₘₐₑ₎₎; α92₍ᵢₑ₎ sd₍α92₍ᵢₑ₎₎] [α
9×2 Matrix{Float64}:
0.462196 0.0112469
1.02376 0.172295
-17.1651 3.66038
0.462269 0.0112487
1.02382 0.172295
-17.1705 3.66041
0.462177 0.0112468
1.02374 0.172295
-17.1637 3.66038
Similar to Bennedsen, Hillebrand, and Koopman (2024), we consider the specifications of the model using the H&N and vMa LULCC measurements. We estimate the airborne fraction using the new approach and instrumental variables.
Specifying the model using the H&N LULCC measurements and using GCP LULCC and vMA LULCC as instruments.
In [74]:
= (E'E₂) \ (E'G)
α_hn_gcp = sum((G - α_hn_gcp * E₂) .^ 2)
rss_hn_gcp = rss_hn_gcp / (length(G) - 1)
σ²_hn_gcp = sqrt(σ²_hn_gcp * (E'E₂) / (E'E) * (E'E₂))
sd₍α_hn_gcp₎
= (E₃'E₂) \ (E₃'G)
α_hn_vma = sum((G - α_hn_vma * E₂) .^ 2)
rss_hn_vma = rss_hn_vma / (length(G) - 1)
σ²_hn_vma = sqrt(σ²_hn_vma * (E₃'E₂) / (E₃'E₃) * (E₃'E₂))
sd₍α_hn_vma₎
[α_hn_gcp α_hn_vma; sd₍α_hn_gcp₎ sd₍α_hn_vma₎]
2×2 Matrix{Float64}:
0.47605 0.475619
54.9119 54.9349
Specifying the model using the vMa LULCC measurements and using GCP LULCC and H&N LULCC as instruments.
In [75]:
= (E'E₃) \ (E'G)
α_vma_gcp = sum((G - α_vma_gcp * E₃) .^ 2)
rss_vma_gcp = rss_vma_gcp / (length(G) - 1)
σ²_vma_gcp = sqrt(σ²_vma_gcp * (E'E₃) / (E'E) * (E'E₃))
sd₍α_vma_gcp₎
= (E₂'E₃) \ (E₂'G)
α_vma_hn = sum((G - α_vma_hn * E₃) .^ 2)
rss_vma_hn = rss_vma_hn / (length(G) - 1)
σ²_vma_hn = sqrt(σ²_vma_hn * (E₂'E₃) / (E₂'E₂) * (E₂'E₃))
sd₍α_vma_hn₎
[α_vma_gcp α_vma_hn; sd₍α_vma_gcp₎ sd₍α_vma_hn₎]
2×2 Matrix{Float64}:
0.491074 0.490342
53.2731 53.3285
The theoretical development of the Deming regression based on the Frisch-Waugh-Lovell theorem is presented in Theorem 1. The theorem states that the OLS estimator of the airborne fraction in the preferred specification can be obtained by regressing the residuals of the emissions variable from the covariates on the residuals of the airborne fraction from the covariates.
First, we use the Frisch-Waugh-Lovell theorem in the preferred specification of the model by Bennedsen, Hillebrand, and Koopman (2024).
In [76]:
= [ENSO VAI]
AX = (AX'AX) \ (AX'G)
coefs1 = G - AX * coefs1
resₐ
= (AX'AX) \ (AX'E)
coefs2 = E - AX * coefs2
resₑ
= (resₑ'resₑ) \ (resₑ'resₐ) α₋
0.4734551237192293
Above, we also compute the airbone fraction in the preferred specification of the model to show that it is identical to the OLS estimator.
There is no closed-form expression to compute the standard errors of the Deming regression. Hence, we propose to use the bootstrap method to estimate the standard errors and confidence intervals.
First proposed by Efron (1992), bootstrap has become a major tool for approximating sampling distributions and variance of complex statistics. This is perhaps not surprising in view of its ability to estimate distributions for statistics even when analytical solutions are unavailable. In addition, bootstrap methods often yield more accurate results than standard methods. Similarly, in the context of confidence intervals, bootstrap has been often employed as a means for improving upon the accuracy of standard intervals (DiCiccio and Efron 1996).
We show how to employ a form of model-based bootstrap approach to calculate the confidence intervals of the Deming regression esstimate \hat{\alpha}_{Deming} in the simple specification. The algorithm proceeds as follows:
The bootstrap procedure does not assume a Normal distribution and minimises computation error compared to the jackknife.
We write a function to compute the Deming regression standard errors.
In [77]:
function Deming_se_ConfI(y::Array{Float64}, x::Array{Float64}, δ::Float64, B::Int64, a::Float64)
######################################################################
# Function Arguments
# y,x: dependent and independent variables
# δ: ratio between the measurement error variances assumed known
# B: number of bootstrap replications e.g. 9999
# a: significance level chosen by the researcher e.g. 0.05
# Note: Work in progress shows that bootstrapping can also be used to correct the coefficient
# for small sample bias. Not pursued here.
######################################################################
= length(y)
T = zeros(T, 1)
y_boot = zeros(B, 1)
alpha_boot = vec(zeros(B, 1))
alpha_boot = x'x
M₍xx₎ = x'y
M₍xy₎ = y'y
M₍yy₎ = (M₍yy₎ - δ * M₍xx₎ + sqrt((M₍yy₎ - δ * M₍xx₎)^2 + 4 * δ * M₍xy₎^2)) / (2 * M₍xy₎)
alpha_dem = y - alpha_dem * x
resid = resid .- mean(resid) #recenter residuals
resid #Bootstrap
for i = 1:B
= alpha_dem * x + sample(resid, T, replace=true)
y_boot = x'x
M₍xx₎_boot = x'y_boot
M₍xy₎_boot = y_boot'y_boot
M₍yy₎_boot = (M₍yy₎_boot - δ * M₍xx₎_boot + sqrt((M₍yy₎_boot - δ * M₍xx₎_boot)^2 + 4 * δ * M₍xy₎_boot^2)) / (2 * M₍xy₎_boot)
alpha_boot[i] end
= sqrt(sum((alpha_boot .- mean(alpha_boot)) .^ 2) / (B - 1)) # Calculate standard errors
se = quantile(alpha_boot, 1 - a / 2)
lower = quantile(alpha_boot, a / 2)
upper return alpha_dem, se, alpha_dem - lower * se, alpha_dem + upper * se
end
Deming_se_ConfI (generic function with 1 method)
We use the function to compute the estimates, standard errors, and confidence intervals of the airborne fraction in the preferred specification of the model. We do this by calling the function on the residuals of the emissions variable from the covariates and the residuals of the airborne fraction from the covariates.
We use 9,999 bootstrap replications to estimate the standard errors and confidence intervals.
In [78]:
= zeros(5, 5)
δₑ 1, :] = [0.2 0.5 1 2 5]
δₑ[
for ii = 1:5
= Deming_se_ConfI(resₐ, resₑ, δₑ[1, ii], 9999, 0.05)
thisdelta 2, ii] = thisdelta[1]
δₑ[3, ii] = thisdelta[2]
δₑ[4, ii] = thisdelta[3]
δₑ[5, ii] = thisdelta[4]
δₑ[end
display(δₑ)
5×5 Matrix{Float64}:
0.2 0.5 1.0 2.0 5.0
0.481519 0.478173 0.476241 0.474985 0.474106
0.0107142 0.0108849 0.0107602 0.0106082 0.0106522
0.476053 0.472685 0.47086 0.469712 0.468827
0.486539 0.483199 0.481169 0.479819 0.478939
For completeness, we also compute the Deming regression in the simple specification of the model. The standard errors and confidence intervals were not computed in the paper by Bennedsen, Hillebrand, and Koopman (2024).
In [79]:
= zeros(5, 5)
δₛ 1, :] = [0.2 0.5 1 2 5]
δₛ[
for ii = 1:5
= Deming_se_ConfI(G, E, δₑ[1, ii], 9999, 0.05)
thisdelta 2, ii] = thisdelta[1]
δₛ[3, ii] = thisdelta[2]
δₛ[4, ii] = thisdelta[3]
δₛ[5, ii] = thisdelta[4]
δₛ[end
display(δₛ)
5×5 Matrix{Float64}:
0.2 0.5 1.0 2.0 5.0
0.462305 0.456067 0.4526 0.450406 0.448895
0.0150569 0.0145454 0.0143868 0.0142225 0.0141223
0.454681 0.448906 0.445603 0.443564 0.442136
0.469032 0.462406 0.458782 0.456464 0.454866
In [80]:
= zeros(5, 5)
δ₉₂ 1, :] = [0.2 0.5 1 2 5]
δ₉₂[
for ii = 1:5
= Deming_se_ConfI(G92, E92, δ₉₂[1, ii], 9999, 0.05)
thisdelta 2, ii] = thisdelta[1]
δ₉₂[3, ii] = thisdelta[2]
δ₉₂[4, ii] = thisdelta[3]
δ₉₂[5, ii] = thisdelta[4]
δ₉₂[end
display(δ₉₂)
5×5 Matrix{Float64}:
0.2 0.5 1.0 2.0 5.0
0.459831 0.455479 0.453053 0.451512 0.450449
0.0174315 0.0174551 0.0174061 0.0173577 0.0170645
0.451049 0.446838 0.444517 0.443042 0.442163
0.467417 0.462935 0.4604 0.458793 0.457595
FWL theorem in the extended model and recent subsample.
In [81]:
= [ENSO92 VAI92]
AX92 = (AX92'AX92) \ (AX92'G92)
coefs921 = G92 - AX92 * coefs921
res92ₐ
= (AX92'AX92) \ (AX92'E92)
coefs922 = E92 - AX92 * coefs922; res92ₑ
Deming regression standard errors in the extended model and recent subsample.
In [82]:
= zeros(5, 5)
δ₉₂ₑ 1, :] = [0.2 0.5 1 2 5]
δ₉₂ₑ[
for ii = 1:5
= Deming_se_ConfI(res92ₐ, res92ₑ, δ₉₂ₑ[1, ii], 9999, 0.05)
thisdelta 2, ii] = thisdelta[1]
δ₉₂ₑ[3, ii] = thisdelta[2]
δ₉₂ₑ[4, ii] = thisdelta[3]
δ₉₂ₑ[5, ii] = thisdelta[4]
δ₉₂ₑ[end
display(δ₉₂ₑ)
5×5 Matrix{Float64}:
0.2 0.5 1.0 2.0 5.0
0.466195 0.464524 0.463574 0.462962 0.462536
0.0104772 0.0106106 0.0106332 0.0107953 0.0106563
0.461056 0.459356 0.458416 0.457733 0.457382
0.470904 0.46925 0.468293 0.467737 0.467241
The table below shows the estimates of the airborne fraction using the different methods. The table includes the estimates, standard errors, and confidence intervals.
In [83]:
= DataFrame("Model" => String[], "Estimate" => Float64[], "Std. Error" => Float64[], "Confidence Int." => Vector{Float64}[])
results_analysis
= 4;
nd
push!(results_analysis, ["OLS Regression", α₂, sd₍α₂₎, round.([α₂ - 1.96 * sd₍α₂₎, α₂ + 1.96 * sd₍α₂₎], digits=nd)])
push!(results_analysis, ["OLS Regression with ENSO and VAI", αₑ[1], var₍αₑ₎[1,1], round.([αₑ[1] - 1.96 * var₍αₑ₎[1,1], αₑ[1] + 1.96 * var₍αₑ₎[1,1]], digits=nd)])
push!(results_analysis, ["IV Regression (H&N LULCC)", α₍ₕₙ₎, sd₍α₍ₕₙ₎₎, round.([α₍ₕₙ₎ - 1.96 * sd₍α₍ₕₙ₎₎, α₍ₕₙ₎ + 1.96 * sd₍α₍ₕₙ₎₎], digits=nd)])
push!(results_analysis, ["IV Regression (vMA LULCC)", α₍ᵥₘₐ₎, sd₍α₍ᵥₘₐ₎₎, round.([α₍ᵥₘₐ₎ - 1.96 * sd₍α₍ᵥₘₐ₎₎, α₍ᵥₘₐ₎ + 1.96 * sd₍α₍ᵥₘₐ₎₎], digits=nd)])
push!(results_analysis, ["GIV Regression", αᵢ, sd₍αᵢ₎, round.([αᵢ - 1.96 * sd₍αᵢ₎, αᵢ + 1.96 * sd₍αᵢ₎], digits=nd)])
push!(results_analysis, ["IV Regression (H&N LULCC) with ENSO and VAI", α₍ₕₙₑ₎[1], sd₍α₍ₕₙₑ₎₎[1], round.([α₍ₕₙₑ₎[1] - 1.96 * sd₍α₍ₕₙₑ₎₎[1], α₍ₕₙₑ₎[1] + 1.96 * sd₍α₍ₕₙₑ₎₎[1]], digits=nd)])
push!(results_analysis, ["IV Regression (vMA LULCC) with ENSO and VAI", α₍ᵥₘₐₑ₎[1], sd₍α₍ᵥₘₐₑ₎₎[1], round.([α₍ᵥₘₐₑ₎[1] - 1.96 * sd₍α₍ᵥₘₐₑ₎₎[1], α₍ᵥₘₐₑ₎[1] + 1.96 * sd₍α₍ᵥₘₐₑ₎₎[1]], digits=nd)])
push!(results_analysis, ["GIV Regression with ENSO and VAI", α₍ᵢₑ₎[1], sqrt(var₍α₍ᵢₑ₎₎[1, 1]), round.([α₍ᵢₑ₎[1] - 1.96 * sqrt(var₍α₍ᵢₑ₎₎[1, 1]), α₍ᵢₑ₎[1] + 1.96 * sqrt(var₍α₍ᵢₑ₎₎[1, 1])], digits=nd)])
push!(results_analysis, ["Deming regression (δ=0.2)", δₛ[2, 1], δₛ[3, 1], round.([δₛ[4, 1], δₛ[5, 1]], digits=nd)])
push!(results_analysis, ["Deming regression (δ=0.5)", δₛ[2, 2], δₛ[3, 2], round.([δₛ[4, 2], δₛ[5, 2]], digits=nd)])
push!(results_analysis, ["Deming regression (δ=1)", δₛ[2, 3], δₛ[3, 3], round.([δₛ[4, 3], δₛ[5, 3]], digits=nd)])
push!(results_analysis, ["Deming regression (δ=2)", δₛ[2, 4], δₛ[3, 4], round.([δₛ[4, 4], δₛ[5, 4]], digits=nd)])
push!(results_analysis, ["Deming regression (δ=5)", δₛ[2, 5], δₛ[3, 5], round.([δₛ[4, 5], δₛ[5, 5]], digits=nd)])
push!(results_analysis, ["Deming FWL regression (δ=0.2)", δₑ[2, 1], δₑ[3, 1], round.([δₑ[4, 1], δₑ[5, 1]], digits=nd)])
push!(results_analysis, ["Deming FWL regression (δ=0.5)", δₑ[2, 2], δₑ[3, 2], round.([δₑ[4, 2], δₑ[5, 2]], digits=nd)])
push!(results_analysis, ["Deming FWL regression (δ=1)", δₑ[2, 3], δₑ[3, 3], round.([δₑ[4, 3], δₑ[5, 3]], digits=nd)])
push!(results_analysis, ["Deming FWL regression (δ=2)", δₑ[2, 4], δₑ[3, 4], round.([δₑ[4, 4], δₑ[5, 4]], digits=nd)])
push!(results_analysis, ["Deming FWL regression (δ=5)", δₑ[2, 5], δₑ[3, 5], round.([δₑ[4, 5], δₑ[5, 5]], digits=nd)])
display(results_analysis)
Row | Model | Estimate | Std. Error | Confidence Int. |
---|---|---|---|---|
String | Float64 | Float64 | Array… | |
1 | OLS Regression | 0.447792 | 0.0142413 | [0.4199, 0.4757] |
2 | OLS Regression with ENSO and VAI | 0.473455 | 0.000117502 | [0.4732, 0.4737] |
3 | IV Regression (H&N LULCC) | 0.44789 | 0.0142505 | [0.42, 0.4758] |
4 | IV Regression (vMA LULCC) | 0.448152 | 0.0142597 | [0.4202, 0.4761] |
5 | GIV Regression | 0.447638 | 0.0142477 | [0.4197, 0.4756] |
6 | IV Regression (H&N LULCC) with ENSO and VAI | 0.472673 | 0.0108476 | [0.4514, 0.4939] |
7 | IV Regression (vMA LULCC) with ENSO and VAI | 0.472328 | 0.0108547 | [0.4511, 0.4936] |
8 | GIV Regression with ENSO and VAI | 0.472978 | 0.0109354 | [0.4515, 0.4944] |
9 | Deming regression (δ=0.2) | 0.462305 | 0.0150569 | [0.4547, 0.469] |
10 | Deming regression (δ=0.5) | 0.456067 | 0.0145454 | [0.4489, 0.4624] |
11 | Deming regression (δ=1) | 0.4526 | 0.0143868 | [0.4456, 0.4588] |
12 | Deming regression (δ=2) | 0.450406 | 0.0142225 | [0.4436, 0.4565] |
13 | Deming regression (δ=5) | 0.448895 | 0.0141223 | [0.4421, 0.4549] |
14 | Deming FWL regression (δ=0.2) | 0.481519 | 0.0107142 | [0.4761, 0.4865] |
15 | Deming FWL regression (δ=0.5) | 0.478173 | 0.0108849 | [0.4727, 0.4832] |
16 | Deming FWL regression (δ=1) | 0.476241 | 0.0107602 | [0.4709, 0.4812] |
17 | Deming FWL regression (δ=2) | 0.474985 | 0.0106082 | [0.4697, 0.4798] |
18 | Deming FWL regression (δ=5) | 0.474106 | 0.0106522 | [0.4688, 0.4789] |
In [84]:
= DataFrame("Model" => String[], "Estimate" => Float64[], "Std. Error" => Float64[], "Confidence Int." => Vector{Float64}[])
results_analysis
= 4;
nd
push!(results_analysis, ["Regression from 1992", α92₂, sd₍α92₂₎, round.([α92₂ - 1.96 * sd₍α92₂₎, α92₂ + 1.96 * sd₍α92₂₎], digits=nd)])
push!(results_analysis, ["Regression from 1992 with ENSO and VAI", α92ₑ[1], var₍α92ₑ₎[1,1], round.([α92ₑ[1] - 1.96 * var₍α92ₑ₎[1,1], α92ₑ[1] + 1.96 * var₍α92ₑ₎[1,1]], digits=nd)])
push!(results_analysis, ["IV Regression (H&N LULCC) from 1992", α92₍ₕₙ₎, sd₍α92₍ₕₙ₎₎, round.([α92₍ₕₙ₎ - 1.96 * sd₍α92₍ₕₙ₎₎, α92₍ₕₙ₎ + 1.96 * sd₍α92₍ₕₙ₎₎], digits=nd)])
push!(results_analysis, ["IV Regression (vMA LULCC) from 1992", α92₍ᵥₘₐ₎, sd₍α92₍ᵥₘₐ₎₎, round.([α92₍ᵥₘₐ₎ - 1.96 * sd₍α92₍ᵥₘₐ₎₎, α92₍ᵥₘₐ₎ + 1.96 * sd₍α92₍ᵥₘₐ₎₎], digits=nd)])
push!(results_analysis, ["GIV Regression from 1992", α92ᵢ, sd₍α92ᵢ₎, round.([α92ᵢ - 1.96 * sd₍α92ᵢ₎, α92ᵢ + 1.96 * sd₍α92ᵢ₎], digits=nd)])
push!(results_analysis, ["IV Regression (H&N LULCC) from 1992 with ENSO and VAI", α92₍ₕₙₑ₎[1], sd₍α92₍ₕₙₑ₎₎[1], round.([α92₍ₕₙₑ₎[1] - 1.96 * sd₍α92₍ₕₙₑ₎₎[1], α92₍ₕₙₑ₎[1] + 1.96 * sd₍α92₍ₕₙₑ₎₎[1]], digits=nd)])
push!(results_analysis, ["IV Regression (vMA LULCC) from 1992 with ENSO and VAI", α92₍ᵥₘₐₑ₎[1], sd₍α92₍ᵥₘₐₑ₎₎[1], round.([α92₍ᵥₘₐₑ₎[1] - 1.96 * sd₍α92₍ᵥₘₐₑ₎₎[1], α92₍ᵥₘₐₑ₎[1] + 1.96 * sd₍α92₍ᵥₘₐₑ₎₎[1]], digits=nd)])
push!(results_analysis, ["GIV Regression from 1992 with ENSO and VAI", α92₍ᵢₑ₎[1], sqrt(var92₍α₍ᵢₑ₎₎[1, 1]), round.([α92₍ᵢₑ₎[1] - 1.96 * sqrt(var92₍α₍ᵢₑ₎₎[1, 1]), α92₍ᵢₑ₎[1] + 1.96 * sqrt(var92₍α₍ᵢₑ₎₎[1, 1])], digits=nd)])
push!(results_analysis, ["Deming regression from 1992 (δ=0.2)", δ₉₂[2, 1], δ₉₂[3, 1], round.([δ₉₂[4, 1], δ₉₂[5, 1]], digits=nd)])
push!(results_analysis, ["Deming regression from 1992 (δ=0.5)", δ₉₂[2, 2], δ₉₂[3, 2], round.([δ₉₂[4, 2], δ₉₂[5, 2]], digits=nd)])
push!(results_analysis, ["Deming regression from 1992 (δ=1)", δ₉₂[2, 3], δ₉₂[3, 3], round.([δ₉₂[4, 3], δ₉₂[5, 3]], digits=nd)])
push!(results_analysis, ["Deming regression from 1992 (δ=2)", δ₉₂[2, 4], δ₉₂[3, 4], round.([δ₉₂[4, 4], δ₉₂[5, 4]], digits=nd)])
push!(results_analysis, ["Deming regression from 1992 (δ=5)", δ₉₂[2, 5], δ₉₂[3, 5], round.([δ₉₂[4, 5], δ₉₂[5, 5]], digits=nd)])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=0.2)", δ₉₂ₑ[2, 1], δ₉₂ₑ[3, 1], round.([δ₉₂ₑ[4, 1], δ₉₂ₑ[5, 1]], digits=nd)])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=0.5)", δ₉₂ₑ[2, 2], δ₉₂ₑ[3, 2], round.([δ₉₂ₑ[4, 2], δ₉₂ₑ[5, 2]], digits=nd)])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=1)", δ₉₂ₑ[2, 3], δ₉₂ₑ[3, 3], round.([δ₉₂ₑ[4, 3], δ₉₂ₑ[5, 3]], digits=nd)])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=2)", δ₉₂ₑ[2, 4], δ₉₂ₑ[3, 4], round.([δ₉₂ₑ[4, 4], δ₉₂ₑ[5, 4]], digits=nd)])
push!(results_analysis, ["Deming FWL regression from 1992 (δ=5)", δ₉₂ₑ[2, 5], δ₉₂ₑ[3, 5], round.([δ₉₂ₑ[4, 5], δ₉₂ₑ[5, 5]], digits=nd)])
= round.(results_analysis.Estimate, digits=nd)
results_analysis.Estimate "Std. Error" = round.(results_analysis."Std. Error", digits=nd)
results_analysis.display(results_analysis)
Row | Model | Estimate | Std. Error | Confidence Int. |
---|---|---|---|---|
String | Float64 | Float64 | Array… | |
1 | Regression from 1992 | 0.4496 | 0.0185 | [0.4134, 0.4858] |
2 | Regression from 1992 with ENSO and VAI | 0.4622 | 0.0001 | [0.462, 0.4625] |
3 | IV Regression (H&N LULCC) from 1992 | 0.4496 | 0.0173 | [0.4157, 0.4836] |
4 | IV Regression (vMA LULCC) from 1992 | 0.4502 | 0.0245 | [0.4022, 0.4982] |
5 | GIV Regression from 1992 | 0.4495 | 0.0173 | [0.4156, 0.4834] |
6 | IV Regression (H&N LULCC) from 1992 with ENSO and VAI | 0.4622 | 0.0112 | [0.4402, 0.4842] |
7 | IV Regression (vMA LULCC) from 1992 with ENSO and VAI | 0.4623 | 0.0112 | [0.4402, 0.4843] |
8 | GIV Regression from 1992 with ENSO and VAI | 0.4622 | 0.0112 | [0.4401, 0.4842] |
9 | Deming regression from 1992 (δ=0.2) | 0.4598 | 0.0174 | [0.451, 0.4674] |
10 | Deming regression from 1992 (δ=0.5) | 0.4555 | 0.0175 | [0.4468, 0.4629] |
11 | Deming regression from 1992 (δ=1) | 0.4531 | 0.0174 | [0.4445, 0.4604] |
12 | Deming regression from 1992 (δ=2) | 0.4515 | 0.0174 | [0.443, 0.4588] |
13 | Deming regression from 1992 (δ=5) | 0.4504 | 0.0171 | [0.4422, 0.4576] |
14 | Deming FWL regression from 1992 (δ=0.2) | 0.4662 | 0.0105 | [0.4611, 0.4709] |
15 | Deming FWL regression from 1992 (δ=0.5) | 0.4645 | 0.0106 | [0.4594, 0.4693] |
16 | Deming FWL regression from 1992 (δ=1) | 0.4636 | 0.0106 | [0.4584, 0.4683] |
17 | Deming FWL regression from 1992 (δ=2) | 0.463 | 0.0108 | [0.4577, 0.4677] |
18 | Deming FWL regression from 1992 (δ=5) | 0.4625 | 0.0107 | [0.4574, 0.4672] |