using DataFrames, XLSX, LinearAlgebra
= "AF_data.xlsx"
path
= DataFrame(XLSX.readtable(path, "Data"))
data
= data[!, 1];
year = Vector{Float64}(data[!, 4]);
fossilfuels = Vector{Float64}(data[!, 6]);
lulcc = fossilfuels .+ lulcc;
emissions = Vector{Float64}(data[!, 5]);
coverage
= Vector{Float64}(data[!,9]);
VAI = Vector{Float64}(data[!, 10]);
ENSO = emissions;
E = coverage; G
Replication Noteboook
A regression-based approach to the CO\(_2\) airborne fraction
The following is a replication notebook of the paper by Bennedsen, Hillebrand, and Koopman (2024). The replication is done in Julia using Quarto as part of the paper “On measurements errors in CO\(_2\) airborne fraction estimates” by J. Eduardo Vera-Valdés and Charisios Grivas. This notebook contains no new results.
Notebook setup
This notebook is written in Julia and uses the following packages:
DataFrames
for data manipulationXLSX
for reading data from an Excel filePlots
LinearAlgebra
Statsistics
HypothesisTests
All packages are available in the Julia registry and can be installed using the Julia package manager with the following command:
using Pkg
Pkg.add("DataFrames", "XLSX", "Plots", "LinearAlgebra", "Statistics", "HypothesisTests")
In the following, we load a project environment that contains the necessary packages. This step is not required if the packages are already installed in the current environment.
Airborne fraction
The airborne fraction is the fraction of CO\(_2\) emissions that remain in the atmosphere. It is a key parameter in the carbon cycle and is used to estimate the impact of human activities on the climate system. The airborne fraction is defined as the ratio of the increase in atmospheric CO\(_2\) concentration to the total CO\(_2\) emissions. It is usually expressed as a percentage.
Data
We load the data from an Excel file and plot the CO\(_2\) emissions and the atmospheric CO\(_2\) concentration over time.
The data is neatly collected in an Excel file in the author’s GitHub repository at the following link.
To ease things up, we have downloaded the data directly from the repository and saved it in the file AF_data.xlsx
in the local folder.
We can read the data using the XLSX.jl
package, convert it to a data frame using DataFrames.jl
. We then recover the year, emissions, and coverage variables. Note that emissions are defined as the sum of fossil fuels (FF) and land-use and land-coverage changes (LULCC).
Note that atmospheric concentration growth, denoted G
from hereinafter, is transformed into a vector of Float64
at definition to avoid type issues. Emissions, the sum of fossil fuels (fossilfuels
) and land-use and land-coverage changes (lulcc
), are denoted by E
.
Once loaded, we can plot the data using the Plots.jl
package.
Classic estimation
Commonly, the airborne fraction is estimated using the following formula:
\[\frac{G_t}{E_t} = \alpha + \epsilon_t\]
where \(G_t\) is the atmospheric CO\(_2\) concentration at time \(t\), \(E_t\) is the total CO\(_2\) emissions at time \(t\), and \(\epsilon_t\) is the error term that captures the natural variability in the carbon cycle. The parameter \(\alpha\) is the airborne fraction.
In practice, we estimate \(\alpha\) by taking the mean of the ratio of the coverage to the emissions.
using Statistics
= mean(G ./ E) α₁
0.43856861803874964
This value is the estimated airborne fraction, which is the consensus value in the literature.
We plot the yearly airborne fraction and the estimated mean airborne fraction.
A new approach
Recently, Bennedsen, Hillebrand, and Koopman (2024) has suggested a new approach to estimate the airborne fraction. They propose to use the following formula:
\[G_t = \alpha E_t + \epsilon_t,\]
and estimate \(\alpha\), the airborne fraction, using ordinary least squares (OLS). They argue that this approach provides better statistical properties. Among them, the OLS estimator is super-consistent, meaning that it converges to the true value at a faster rate than the standard OLS estimator. They also show that the estimator has lower variance and is asymptotically normal.
The new approach relies on estimating the cointegration relationship between the emissions and the coverage using OLS. As for all cointegration analyses, as a first step, we need to check if the variables are integrated of the same order. We can do this by testing for the presence of a unit root in the series.
Unit root test
We use the Augmented Dickey-Fuller (ADF) (Dickey and Fuller 1979) test to test for the presence of a unit root in the series. The null hypothesis is that the series has a unit root, while the alternative hypothesis is that the series is stationary.
In Julia, we can use the ADFTest
function from the HypothesisTests.jl
package to perform the test.
As a demonstration, we test the emissions series for the presence of a unit root in a model with a trend and two lags.
using HypothesisTests
= ADFTest(E, :trend, 2) τᵉₜ
Augmented Dickey-Fuller unit root test
--------------------------------------
Population details:
parameter of interest: coefficient on lagged non-differenced variable
value under h_0: 0
point estimate: -0.262114
Test summary:
outcome with 95% confidence: fail to reject h_0
p-value: 0.2907
Details:
sample size in regression: 61
number of lags: 2
ADF statistic: -2.57687
Critical values at 1%, 5%, and 10%: [-4.10768 -3.48147 -3.16849]
In this case, the null hypothesis of a unit root in the emissions series is not rejected.
We can perform the same test for the coverage series while considering different combinations of models and lags.
# Dataframe to store the results
= DataFrame("Variable" => String[], "Model" => String[], "L = 0" => Float64[], "L = 1" => Float64[], "L = 2" => Float64[], "L = 3" => Float64[], "L = 4" => Float64[], "L = 5" => Float64[])
resultsdf
for variable in [:E, :G]
for model in [:none, :constant, :trend]
= zeros(6)
fila for lags in 0:5
= ADFTest(eval(variable), model, lags)
τ +1] = pvalue(τ)
fila[lagsend
push!(resultsdf, [titlecase(string(variable)), titlecase(string(model)), fila...])
end
end
resultsdf
Row | Variable | Model | L = 0 | L = 1 | L = 2 | L = 3 | L = 4 | L = 5 |
---|---|---|---|---|---|---|---|---|
String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | E | None | 1.0 | 1.0 | 0.999998 | 0.999812 | 0.997157 | 0.99959 |
2 | E | Constant | 0.953474 | 0.939141 | 0.91181 | 0.862177 | 0.828175 | 0.896349 |
3 | E | Trend | 0.0834996 | 0.202047 | 0.290654 | 0.313837 | 0.213148 | 0.474568 |
4 | G | None | 0.235183 | 0.520472 | 0.695203 | 0.803914 | 0.84451 | 0.85085 |
5 | G | Constant | 0.00180684 | 0.0718307 | 0.299885 | 0.444168 | 0.465884 | 0.386677 |
6 | G | Trend | 2.4945e-9 | 7.17214e-6 | 0.0011416 | 0.0129135 | 0.016052 | 0.00266726 |
Based on these results, we cannot reject the null hypothesis of a unit root in the emissions
series for all models and lags. For the coverage
series, the tests reject except for the model with a trend. This suggests that both series seem stationary.
Cointegration test
We can test for cointegration between the emissions and the coverage using the Engle and Granger (1987) test. The null hypothesis is that there is no cointegration relationship between the series, while the alternative hypothesis is that there is a cointegration relationship.
To test for cointegration, we first estimate the OLS regression of the coverage on the emissions. We then test the residuals for a unit root using the ADF test. Note that the residuals should be stationary if there is a cointegration relationship between the series.
= (E'E) \ (E'G) α₂
0.4477918844144535
The estimated airborne fraction is slightly larger than the classical estimate.
To test if there is a cointegration relationship, and thus that we have a valid estimate, we must recover the residuals and test them for a unit root.
= G - α₂ * E
res₂
= ADFTest(res₂, :none, 0) τᵣ
Augmented Dickey-Fuller unit root test
--------------------------------------
Population details:
parameter of interest: coefficient on lagged non-differenced variable
value under h_0: 0
point estimate: -0.963507
Test summary:
outcome with 95% confidence: reject h_0
p-value: <1e-11
Details:
sample size in regression: 63
number of lags: 0
ADF statistic: -7.58332
Critical values at 1%, 5%, and 10%: [-2.60156 -1.9459 -1.61324]
The test statistic has to be compared against the critical values generated by MacKinnon (2010). We can reject the null hypothesis of a unit root in the residuals, which suggests that there is a cointegration relationship between the emissions and the coverage and the estimate is valid.
Standard errors
We compute the standard errors of the estimates of the airborne fraction using the formula:
\[SD(\hat{\alpha}) = \sqrt{ \hat{\sigma}^2_\epsilon (E'E)^{-1} },\]
where \(\hat{\sigma}^2_\epsilon = \sum\hat{\epsilon}^2/N\) is the estimate of the variance of the error term and \(\hat{\epsilon}_t\) are the residuals from the regression.
= sum((G - α₂ * E).^2)
rss₂ = rss₂ / (length(G) - 1)
σ²₂
= sqrt( σ²₂ / (E'E) ) sd₍α₂₎
0.014241317441433234
Additional covariates
We consider adding additional covariates to the model. In particular, we consider adding \(ENSO\) (El Niño Southern Oscillation) and \(VAI\) (volcanic activity index) as covariates. These variables are known to affect the carbon cycle and can potentially influence the airborne fraction.
We estimate the following model:
\[G_t = \alpha E_t + \gamma_1 ENSO_t + \gamma_2 VAI_t + \epsilon_t,\]
where \(ENSO_t\) and \(VAI_t\) are the El Niño Southern Oscillation and volcanic activity index at time \(t\), respectively.
Note that the authors first detrend the ENSO series before estimating the model. We can do this by regressing the series on a time trend and taking the residuals.
= length(ENSO)
T = [ones(T) collect(1:T)]
Xₜ = (Xₜ'Xₜ) \ (Xₜ'ENSO)
ρ = ENSO - Xₜ * ρ; ENSOᵨ
We estimate the extended model using OLS.
= [E ENSOᵨ VAI]
Xₑ
= (Xₑ'Xₑ) \ (Xₑ'G) αₑ
3-element Vector{Float64}:
0.4696645797106208
1.0024516231210219
-15.1482109617327
Standard errors are computed with the multivariate version of the variance formula:
\[Var(\hat{\alpha}) = \hat{\sigma}^2_\epsilon(X'X)^{-1},\]
where \(X\) is the matrix of all regressors.
= sum((G - Xₑ*αₑ).^2)
rssₑ = rssₑ / (length(G) - 3)
σ²ₑ = σ²ₑ * inv(Xₑ'Xₑ)
var₍αₑ₎ sqrt.(var₍αₑ₎[1,1])
0.01058726080675776
Recent subsample
Given the variability of the LULCC measurements at the beginning of the series, we consider a recent subsample of the data. We consider the data from 1992 and estimate the airborne fraction using the new approach.
Getting subsample data.
= E[year .>= 1992];
E92 = G[year .>= 1992];
G92 = VAI[year .>= 1992];
VAI92 92 = ENSOᵨ[year .>= 1992]; ENSOᵨ
Estimating in the subsample.
Simple specification.
92₂ = (E92'E92) \ (E92'G92)
α= sum((G92 - α92₂ * E92).^2)
rss92₂ 92²₂ = rss92₂ / (length(G92) - 1)
σ92₂₎ = sqrt( σ92²₂ / (E92'E92) )
sd₍α
92₂ sd₍α92₂₎] [α
1×2 Matrix{Float64}:
0.44967 0.017309
= [E92 ENSOᵨ92 VAI92]
X92ₑ 92ₑ = (X92ₑ'X92ₑ) \ (X92ₑ'G92)
α
= sum((G92 - X92ₑ*α92ₑ).^2)
rss92ₑ 92²ₑ = rss92ₑ / (length(G92) - 3)
σ92ₑ₎ = σ92²ₑ * inv(X92ₑ'X92ₑ)
var₍α
92ₑ var₍α92ₑ₎] [α
3×4 Matrix{Float64}:
0.461253 0.000126359 9.12279e-5 -0.0092892
1.02214 9.12279e-5 0.0296269 -0.157186
-17.5878 -0.0092892 -0.157186 13.5323
Other datasets
We can also test the new approach on other datasets. We can use the same methodology to estimate the airborne fraction. The preferred data for the LULCC emissions are from the Global Carbon Project (Friedlingstein et al. 2023). However, we can also use data from Houghton and Castanho (2022) and Marle et al. (2022).
= Vector{Float64}(data[!, 7]);
lulcc₂ = Vector{Float64}(data[!, 8]);
lulcc₃
= fossilfuels .+ lulcc₂;
E₂ = fossilfuels .+ lulcc₃;
E₃
= (E₂'E₂) \ (E₂'G)
α₆ = sum((G - α₆ * E₂).^2)
rss₆ = rss₆ / (length(G) - 1)
σ²₆ = sqrt( σ²₆ / (E₂'E₂) )
sd₍α₆₎
= (E₃'E₃) \ (E₃'G)
α₇ = sum((G - α₇ * E₃).^2)
rss₇ = rss₇ / (length(G) - 1)
σ²₇ = sqrt( σ²₇ / (E₃'E₃) )
sd₍α₇₎
[α₆ sd₍α₆₎; α₇ sd₍α₇₎]
2×2 Matrix{Float64}:
0.475539 0.0152315
0.490211 0.0157142
Deming regression estimator
We can also estimate the airborne fraction using Deming regression (Deming 1943). Deming regression is a method for estimating the parameters of a linear regression model when both the dependent and independent variables are subject to measurement error. The Deming regression assumes that the measurement errors in the dependent and independent variables are normally distributed with known variances.
The Deming regression estimator is given by:
\[ \hat{\alpha}_{Deming} = \frac{M_{GG}-\delta M_{EE} + \sqrt{(M_{GG}-\delta M_{EE})^2+4\delta M^2_{EG}}}{2M_{EG}}, \]
where
\[M_{GG} = \frac{1}{T}\sum_{t=1}^T G_t^2,\] \[M_{EE} = \frac{1}{T}\sum_{t=1}^T E_t^2,\] \[M_{EG} = \frac{1}{T}\sum_{t=1}^T E_t G_t,\]
and \(\delta = \frac{Variance(\omega_{G,t})}{Variance(\eta_{E,t})}\) is the ratio of the variance of the measurement error in the coverage to the variance of the measurement error in emissions.
Several values for \(\delta\) are tried, given that the true value is unknown.
= E'E
M₍ₑₑ₎ = E'G
M₍ₑₐ₎ = G'G
M₍ₐₐ₎
= zeros(2, 5)
δ 1, :] = [0.2 0.5 1 2 5]
δ[
for ii = 1:5
2, ii] = ( M₍ₐₐ₎ - δ[1, ii] * M₍ₑₑ₎ + sqrt( (M₍ₐₐ₎ - δ[1, ii] * M₍ₑₑ₎)^2 + 4 * δ[1, ii] * M₍ₑₐ₎^2 ) ) / (2 * M₍ₑₐ₎)
δ[ end
display(δ)
2×5 Matrix{Float64}:
0.2 0.5 1.0 2.0 5.0
0.462305 0.456067 0.4526 0.450406 0.448895
Other drawbacks of the Deming regression is that there is no closed form expression to obtain the standard errors. It also cannot easily handle additional regressors like in the preferred specification. We solve these issues in the paper: On measurements errors in CO\(_2\) airborne fraction estimates by J. Eduardo Vera-Valdés and Charisios Grivas.
Summary of results
= DataFrame("Model" => String[], "Estimate" => Float64[], "Std. Error" => Float64[], "Confidence Int." => Vector{Float64}[] )
results_replication = 4;
nd
push!(results_replication, ["Regression", α₂, sd₍α₂₎, round.([α₂ - 1.96 * sd₍α₂₎, α₂ + 1.96 * sd₍α₂₎], digits=nd) ] )
push!(results_replication, ["Regression with ENSO and VAI", αₑ[1], sqrt(var₍αₑ₎[1,1]), round.([αₑ[1] - 1.96 * sqrt(var₍αₑ₎[1,1]), αₑ[1] + 1.96 * sqrt(var₍αₑ₎[1,1])], digits=nd) ] )
push!(results_replication, ["Regression from 1992", α92₂, sd₍α92₂₎, round.([α92₂ - 1.96 * sd₍α92₂₎, α92₂ + 1.96 * sd₍α92₂₎], digits=nd) ] )
push!(results_replication, ["Regression from 1992 with ENSO and VAI", α92ₑ[1], sqrt(var₍α92ₑ₎[1,1]), round.([α92ₑ[1] - 1.96 * sqrt(var₍α92ₑ₎[1,1]), α92ₑ[1] + 1.96 * sqrt(var₍α92ₑ₎[1,1])], digits=nd) ] )
push!(results_replication, ["Regression with LULCC (H&N)", α₆, sd₍α₆₎, round.([α₆ - 1.96 * sd₍α₆₎, α₆ + 1.96 * sd₍α₆₎], digits=nd) ] )
push!(results_replication, ["Regression with LULCC (vMa)", α₇, sd₍α₇₎, round.([α₇ - 1.96 * sd₍α₇₎, α₇ + 1.96 * sd₍α₇₎], digits=nd) ] )
push!(results_replication, ["Deming regression (δ=0.2)", δ[2, 1], NaN, [NaN] ] )
push!(results_replication, ["Deming regression (δ=0.5)", δ[2, 2], NaN, [NaN] ] )
push!(results_replication, ["Deming regression (δ=1)", δ[2, 3], NaN, [NaN] ] )
push!(results_replication, ["Deming regression (δ=2)", δ[2, 4], NaN, [NaN] ] )
push!(results_replication, ["Deming regression (δ=5)", δ[2, 5], NaN, [NaN] ] )
= round.(results_replication.Estimate, digits=nd)
results_replication.Estimate "Std. Error" = round.(results_replication."Std. Error", digits=nd)
results_replication.display(results_replication)
Row | Model | Estimate | Std. Error | Confidence Int. |
---|---|---|---|---|
String | Float64 | Float64 | Array… | |
1 | Regression | 0.4478 | 0.0142 | [0.4199, 0.4757] |
2 | Regression with ENSO and VAI | 0.4697 | 0.0106 | [0.4489, 0.4904] |
3 | Regression from 1992 | 0.4497 | 0.0173 | [0.4157, 0.4836] |
4 | Regression from 1992 with ENSO and VAI | 0.4613 | 0.0112 | [0.4392, 0.4833] |
5 | Regression with LULCC (H&N) | 0.4755 | 0.0152 | [0.4457, 0.5054] |
6 | Regression with LULCC (vMa) | 0.4902 | 0.0157 | [0.4594, 0.521] |
7 | Deming regression (δ=0.2) | 0.4623 | NaN | [NaN] |
8 | Deming regression (δ=0.5) | 0.4561 | NaN | [NaN] |
9 | Deming regression (δ=1) | 0.4526 | NaN | [NaN] |
10 | Deming regression (δ=2) | 0.4504 | NaN | [NaN] |
11 | Deming regression (δ=5) | 0.4489 | NaN | [NaN] |
Note that Bennedsen, Hillebrand, and Koopman (2024) considered heteroscedasticity and autocorrelation robust standard errors. Nonetheless, their selected bandwidth is quite small, so that they are almost identical to the OLS standard errors. We report here the latter for simplicity.