Econometrics
The OLS solution is given by $\hat{\beta} = (X'X)^{-1}X'Y$.
It maps $Y$ into a vector of fitted values $X\hat{\beta} := P_X Y$ and a vector of residuals $\hat{U} := M_X Y$, where $P_X = X(X'X)^{-1}X'$ and $M_X = I - P_X$.
It is easy to show that $P_X$ and $M_X$ are projection matrices, and that $P_X X = X$ and $M_X X = 0$.
Moreover, they are complementary, since $P_X M_X = 0$.
By Pythagoras' Theorem, $\|P_X Y\|^2 \leq \|Y\|^2$.
Moreover, the OLS residuals are orthogonal to all the regressors: $X'\hat{U} = X'M_X Y = 0$.
In particular, if the regressors include a constant, then the residuals sum to zero, $\sum_{t=1}^{N} \hat{U}_t = 0$.
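A minimal sketch in Julia checking these projection-matrix properties numerically; the data, dimensions, and coefficient values are simulated here purely for illustration.
using LinearAlgebra, Random
Random.seed!(1)
N, k = 50, 3
X = [ones(N) randn(N, k - 1)]        # simulated regressors, including a constant
Y = X * [1.0, 2.0, -0.5] + randn(N)  # simulated regressand
P = X * ((X' * X) \ X')              # P_X
M = I - P                            # M_X
Uhat = M * Y                         # OLS residuals
@show maximum(abs, P * X - X)        # P_X X = X   (≈ 0)
@show maximum(abs, M * X)            # M_X X = 0   (≈ 0)
@show maximum(abs, P * M)            # P_X M_X = 0 (≈ 0)
@show norm(P * Y)^2 ≤ norm(Y)^2      # Pythagoras' Theorem
@show abs(sum(Uhat))                 # residuals sum to ≈ 0 (constant included)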
We are interested in analyzing the effect that partitioning the regressors has on the estimators.
Assume we split the regressors into two groups, $X = [X_1 \; X_2]$, so that $Y = X_1\beta_1 + X_2\beta_2 + U$.
In general, the OLS estimator of $\beta_1$ depends on $X_2$. (More in the next lecture.)
In the special case that $X_1$ is orthogonal to $X_2$, we obtain the same OLS estimate for $\beta_1$ using the complete specification as using the reduced specification: $Y = X_1\beta_1 + V$.
Analogously, we obtain the same estimate for $\beta_2$ if we remove $X_1$ from the regression.
In the general case, the two regressions $Y = X_1\beta_1 + M_{X_1}X_2\beta_2 + U$ and $Y = M_{X_1}X_2\beta_2 + V$ yield identical estimates for $\beta_2$.
Nonetheless, they do not yield the same residuals.
To recover the same residuals, we need to purge $Y$ of the effect of $X_1$.
Theorem (Frisch-Waugh-Lovell)
The OLS estimates of $\beta_2$ in the regressions $Y = X_1\beta_1 + X_2\beta_2 + U$ and $M_{X_1}Y = M_{X_1}X_2\beta_2 + U$ are numerically identical.
Moreover, the residuals in both regressions are numerically identical.
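A minimal sketch verifying both claims of the theorem numerically; the data and coefficient values are simulated here for illustration only.
using LinearAlgebra, Random
Random.seed!(3)
N = 100
X₁ = [ones(N) randn(N)]                     # first block of regressors
X₂ = randn(N, 2)                            # second block of regressors
Y = X₁ * [1.0, -1.0] + X₂ * [0.5, 2.0] + randn(N)   # simulated regressand
X = [X₁ X₂]
beta_full = (X' * X) \ (X' * Y)             # full regression
resid_full = Y - X * beta_full
M₁ = I - X₁ * ((X₁' * X₁) \ X₁')            # M_{X₁}
MY = M₁ * Y                                 # Y purged of X₁
MX₂ = M₁ * X₂                               # X₂ purged of X₁
beta₂ = (MX₂' * MX₂) \ (MX₂' * MY)          # FWL regression
resid_fwl = MY - MX₂ * beta₂
@show beta_full[3:4] ≈ beta₂                # identical estimates of β₂
@show resid_full ≈ resid_fwl                # identical residuals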
Suppose the regression includes a constant via $\iota$, a vector of ones: $Y = \iota\beta_0 + X\beta_1 + U$.
The FWL theorem shows that the estimator for $\beta_1$ is the same if we instead run the regression $M_\iota Y = M_\iota X\beta_1 + U$.
That is, we obtain the same estimates in a regression with a constant on raw data as in a regression with no constant on demeaned data.
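A minimal sketch of this equivalence, with simulated data assumed for illustration.
using Statistics, Random
Random.seed!(2)
N = 200
x = randn(N)                         # simulated regressor
y = 1.0 .+ 2.0 .* x .+ randn(N)      # simulated regressand with an intercept
X = [ones(N) x]
b_raw = (X' * X) \ (X' * y)          # [β̂₀, β̂₁] with a constant on raw data
xd = x .- mean(x)                    # demeaned regressor (M_ι x)
yd = y .- mean(y)                    # demeaned regressand (M_ι y)
b_dem = (xd' * xd) \ (xd' * yd)      # slope with no constant on demeaned data
@show b_raw[2] ≈ b_dem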
Some variables commonly used contain time trends.
We may consider a regression like $Y = \alpha_0\iota + \alpha_1 T + X\beta + U$, where $T' = [1, 2, \cdots, N]$ captures the time trend.
The FWL theorem shows that we obtain the same estimates if we instead run the regression using detrended data.
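A minimal sketch, with simulated trending data assumed for illustration, checking that the estimate of $\beta$ is the same on raw data with the trend included as on detrended data.
using LinearAlgebra, Random
Random.seed!(4)
N = 120
T = collect(1.0:N)                        # time trend
x = 0.05 .* T .+ randn(N)                 # simulated trending regressor
Y = 2.0 .+ 0.1 .* T .+ 1.5 .* x .+ randn(N)   # simulated regressand
Z = [ones(N) T]                           # constant and trend
Mz = I - Z * ((Z' * Z) \ Z')              # detrending projection
W = [ones(N) T x]                         # full set of regressors
beta_raw = (W' * W) \ (W' * Y)            # regression on raw data
beta_det = ((Mz * x)' * (Mz * x)) \ ((Mz * x)' * (Mz * Y))   # regression on detrended data
@show beta_raw[3] ≈ beta_det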
Alternatively, some variables may show a seasonal behavior.
We can model seasonality using seasonal dummy variables, $Y = \alpha_1 s_1 + \alpha_2 s_2 + \alpha_3 s_3 + \alpha_4 s_4 + X\beta + U$, where the $s_i$ are the seasonal dummy variables.
The FWL theorem tells us that we can estimate β using deseasonalized data.
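Similarly, a minimal sketch, with simulated quarterly data assumed for illustration, showing that the estimate of $\beta$ is the same whether we include the seasonal dummies or first deseasonalize the data.
using LinearAlgebra, Random
Random.seed!(6)
N = 80                                          # 20 years of simulated quarterly data
q = repeat(1:4, N ÷ 4)                          # quarter indicator
S = Float64.(hcat([q .== j for j in 1:4]...))   # seasonal dummies s₁, …, s₄
x = randn(N)                                    # simulated regressor
Y = S * [1.0, 2.0, 0.5, -1.0] .+ 1.5 .* x .+ randn(N)   # simulated seasonal regressand
Ms = I - S * ((S' * S) \ S')                    # deseasonalizing projection
beta_dummies = ([S x]' * [S x]) \ ([S x]' * Y)                 # with dummies
beta_deseas = ((Ms * x)' * (Ms * x)) \ ((Ms * x)' * (Ms * Y))  # on deseasonalized data
@show beta_dummies[5] ≈ beta_deseas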
Definition
The coefficient of determination, or (uncentered) $R^2$, is defined as $R^2 = \frac{\|P_X Y\|^2}{\|Y\|^2}$, which lies between 0 and 1.
Nonetheless, the (uncentered) $R^2$ is not invariant to translations.
Consider $\tilde{Y} := Y + \alpha\iota$ in a regression where $X$ includes a constant; then $R^2 = \frac{\|P_X Y + \alpha\iota\|^2}{\|Y + \alpha\iota\|^2}$.
By choosing $\alpha$ large enough, we can make $R^2$ as close to 1 as we want.
To avoid this problem (for regressions that include a constant) the FWL theorem tells us that we can demean the series without changing the estimates or residuals.
This gives rise to the (centered) $R^2$, defined as $R^2 = \frac{\|P_X M_\iota Y\|^2}{\|M_\iota Y\|^2}$, which is unaffected by translations.
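A minimal sketch, with simulated data assumed for illustration, comparing the two measures under a translation of $Y$.
using LinearAlgebra, Statistics, Random
Random.seed!(5)
N = 100
x = randn(N)
X = [ones(N) x]                            # regressors, including a constant
Y = 1.0 .+ 0.3 .* x .+ randn(N)            # simulated regressand
P = X * ((X' * X) \ X')                    # P_X
R2u(y) = norm(P * y)^2 / norm(y)^2                          # uncentered R²
R2c(y) = norm(P * (y .- mean(y)))^2 / norm(y .- mean(y))^2  # centered R²
@show R2u(Y), R2u(Y .+ 100)                # inflated by the translation
@show R2c(Y), R2c(Y .+ 100)                # unaffected by the translation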
Theorem
Under correct specification, exogenous regressors, and homoskedastic, non-autocorrelated, normally distributed errors, the OLS estimator follows a normal distribution with mean $\beta$ and variance $\sigma^2(X'X)^{-1}$: $\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$.
Theorem (Gauss-Markov)
In a regression under correct specification, with exogenous regressors and homoskedastic, non-autocorrelated errors, the OLS estimator is at least as efficient as any other linear unbiased estimator.
In other words, the OLS estimator is the Best Linear Unbiased Estimator (BLUE).
To show that OLS is unbiased, we require $E[(X'X)^{-1}X'U] = 0$.
There are two possibilities: either the regressors are non-stochastic, or they are stochastic but exogenous, $E(U \mid X) = 0$.
Exogeneity imposes that the errors are unrelated to all (past, current, and future) values of the regressors.
For instance, $\hat{\beta}$ is biased under the weaker assumption of predetermined regressors, $E(U_t \mid X_t) = 0$.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                    # number of Monte Carlo replications
N = 5000                                   # sample size
beta = zeros(R)
for ii in 1:R
    V = rand(Normal(0, 1), N + 1)          # error term
    Y = zeros(N + 1)
    Y[1] = V[1]                            # first observation
    # AR(1) process: the regressor (the lagged regressand) is predetermined but not exogenous
    for jj in 2:N+1
        Y[jj] = 0.5 * Y[jj-1] + V[jj]      # regressand
    end
    Y₁ = Y[2:(N+1)]
    Y₀ = Y[1:N]
    beta[ii] = (Y₀' * Y₀) \ (Y₀' * Y₁)     # OLS estimator
end
theme(:ggplot2)
boxplot(beta, label="Estimated coefficient", orientation=:horizontal)
vline!([0.5], label="True value", color="red", lw=3)
Consistency requires:
$\mathrm{plim}_{n\to\infty} \frac{1}{n} X'U = 0$, i.e., $X'U$ vanishes after standardization by $n$.
$\mathrm{plim}_{n\to\infty} \left(\frac{1}{n} X'X\right)^{-1}$ exists and is finite.
Using the law of large numbers, we can show that predeterminedness implies $\mathrm{plim}_{n\to\infty} \frac{1}{n} X'U = 0$.
The proof relies on the law of large numbers. Thus, errors cannot be too correlated or have unbounded variances.
Another application of the law of large numbers shows that $\mathrm{plim}_{n\to\infty} \left(\frac{1}{n} X'X\right)^{-1} = S_{XX} < \infty$.
Thus, the regressors must not be too correlated, and their variances must not increase without bound.
OLS is consistent under predetermined regressors.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                    # number of Monte Carlo replications
Ns = [10; 30; 50; 100; 200; 300; 400; 500] # sample sizes
nn = length(Ns)
beta = zeros(R, nn)
for kk in 1:nn
    N = Ns[kk]
    for ii in 1:R
        V = rand(Normal(0, 1), N + 1)      # error term
        Y = zeros(N + 1)
        Y[1] = V[1]                        # first observation
        for jj in 2:N+1
            Y[jj] = 0.5 * Y[jj-1] + V[jj]  # regressand
        end
        Y₁ = Y[2:(N+1)]
        Y₀ = Y[1:N]
        beta[ii, kk] = (Y₀' * Y₀) \ (Y₀' * Y₁)  # OLS estimator
    end
end
theme(:ggplot2)
plot(Ns, mean(beta, dims=1)', ylims=[0.44, 0.51], line=(:steppre, :dot, 0.5, 4, :blue), label="Estimated beta")
hline!([0.5], label="True value", color="red", ls=:dash, lw=1)
We measure the OLS precision by its covariance matrix under the assumptions on the error term’s second moments.
Assuming homoskedasticity and no autocorrelation, we have $\mathrm{Var}(\hat{\beta}) = \sigma^2(X'X)^{-1}$.
This variance depends on the variance of the error term, the sample size, and the relationship among the regressors.
Dependence on the variance of the error term is straightforward.
The dependence on the sample size can be seen if we write $\mathrm{Var}(\hat{\beta}) = \sigma^2(X'X)^{-1} = \left(\frac{1}{n}\sigma^2\right)\left(\frac{1}{n}X'X\right)^{-1}$, assuming, as before, that $\mathrm{plim}_{n\to\infty}\left(\frac{1}{n}X'X\right)^{-1} = S_{XX}$.
Consider the regression $Y = X_1\beta_1 + x_2\beta_2 + U$, where $X = [X_1 \; x_2]$ and $x_2$ is a column vector.
From the FWL theorem, $\hat{\beta}_2$ is the same as the estimate from $M_{X_1} Y = M_{X_1} x_2\beta_2 + V$.
Thus, $\mathrm{Var}(\hat{\beta}_2) = \sigma^2 / (x_2' M_{X_1} x_2)$, which shows its dependence on the relationship among the regressors.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 1000                                   # number of Monte Carlo replications
N = 100                                    # sample size
beta = zeros(R, 2)
for ii in 1:R
    U = rand(Normal(0, 1), N)              # error term
    # Uncorrelated regressors
    X₁ = rand(Normal(0, 1), N)             # regressor 1
    X₂ = rand(Normal(0, 1), N)             # regressor 2
    Y = X₁ + X₂ + U                        # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)            # OLS estimator
    beta[ii, 1] = betas[1]
    # Correlated regressors
    X₁ = rand(Normal(0, 1), N)             # regressor 1
    X₂ = 0.5 * X₁ + rand(Normal(0, 1), N)  # regressor 2, correlated with regressor 1
    Y = X₁ + X₂ + U                        # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)            # OLS estimator
    beta[ii, 2] = betas[1]
end
theme(:ggplot2)
histogram(beta, label=["Estimated beta (Uncorrelated)" "Estimated beta (Correlated)"], legend=:topleft, fillalpha=0.25, normalize=true)
x = range(0.5, stop=1.5, length=1000)
# approximate theoretical density of β̂₁ in the uncorrelated design
plot!(x, pdf.(Normal(1, 1 / sqrt(N)), x), label="Normal density (uncorrelated case)", color="red", ls=:dash, lw=2)