Econometrics
The OLS solution is given by \(\hat{\beta}=(X'X)^{-1}X'Y\).
It maps \(Y\) into a vector of fitted values \(X\hat{\beta}:=P_XY\) and a vector of residuals \(\hat{U}:=M_X Y\) using: \[ P_X = X(X'X)^{-1}X', \ \ \ \text{and}\ \ \ \ \ M_X = I-P_X. \]
It is easy to show that \(P_X\) and \(M_X\) are projection matrices, and that \(P_XX = X\), and \(M_XX = 0\).
Moreover, they are complementary since \(P_XM_X = 0\).
By Pythagoras’ Theorem, since \(||Y||^2 = ||P_XY||^2 + ||M_XY||^2\), we have \[ ||P_XY||^2 \leq ||Y||^2.\]
Moreover, the OLS residuals are orthogonal to all the regressors. \[X'\hat{U} = X'M_XY = 0.\]
In particular, if the regressors include a constant, then the residuals sum to zero, \(\sum_{t=1}^N \hat{U}_t = 0\).
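As a quick numerical check (a sketch with simulated data, not part of the original derivation), the following Julia snippet constructs \(P_X\) and \(M_X\) and verifies the properties above.
using Random, LinearAlgebra
Random.seed!(1)
N = 50
X = [ones(N) randn(N, 2)]           # regressors, including a constant
Y = X * [1.0; 0.5; -0.3] + randn(N) # simulated regressand
P = X * inv(X' * X) * X'            # P_X: projection onto the columns of X
M = I - P                           # M_X: complementary projection
Uhat = M * Y                        # OLS residuals
maximum(abs.(P * X - X))            # ≈ 0, so P_X X = X
maximum(abs.(M * X))                # ≈ 0, so M_X X = 0
maximum(abs.(X' * Uhat))            # ≈ 0, residuals orthogonal to the regressors
sum(Uhat)                           # ≈ 0, residuals sum to zero (constant included)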
We are interested in analyzing the effect that partitioning the regressors has on the estimators.
Suppose we break the regressors into two groups, \(X = [X_1\ X_2]\), so that \[Y = X_1\beta_1 + X_2\beta_2 + U.\]
In general, the OLS estimator of \(\beta_1\) depends on \(X_2\). (More in the next lecture.)
In the special case that \(X_1\) is orthogonal to \(X_2\), we obtain the same OLS estimate for \(\beta_1\) from the complete specification as from the reduced specification: \[Y = X_1\beta_1 + V.\]
Analogously, we obtain the same estimate for \(\beta_2\) if we remove \(X_1\) from the regression.
In the general case, the two regressions \[Y = X_1\beta_1 + M_{X_1}X_2\beta_2 + U,\] and \[Y = M_{X_1}X_2\beta_2 + V,\] yield identical estimates for \(\beta_2\).
Nonetheless, they do not yield the same residuals.
To recover the same residuals we also need to purge \(Y\) of the effect of \(X_1\).
Theorem (Frisch-Waugh-Lovell)
The OLS estimates of \(\beta_2\) in the regressions \[Y = X_1\beta_1 + X_2\beta_2 + U,\] and \[M_{X_1}Y = M_{X_1}X_2\beta_2 + U,\] are numerically identical.
Moreover, the residuals in both regressions are numerically identical.
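As a numerical illustration of the theorem (a sketch with simulated data; the coefficients and sample size are arbitrary), both regressions below return the same estimate of \(\beta_2\) and the same residuals.
using Random, LinearAlgebra
Random.seed!(2)
N = 200
X₁ = [ones(N) randn(N)]                   # first block of regressors (with constant)
X₂ = randn(N, 2)                          # second block of regressors
Y = X₁ * [1.0; 0.5] + X₂ * [2.0; -1.0] + randn(N)
X = [X₁ X₂]
b_full = (X' * X) \ (X' * Y)              # OLS on the full specification
M₁ = I - X₁ * inv(X₁' * X₁) * X₁'         # annihilator of X₁
Z = M₁ * X₂                               # X₂ purged of X₁
b₂ = (Z' * Z) \ (Z' * (M₁ * Y))           # OLS on the FWL regression
b_full[3:4] ≈ b₂                          # true: identical estimates of β₂
(Y - X * b_full) ≈ (M₁ * Y - Z * b₂)      # true: identical residuals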
Suppose the regression includes a constant through \(\iota\), a vector of ones: \[Y = \iota\beta_0 + X\beta_1 + U.\]
The FWL theorem shows that the estimator for \(\beta_1\) is the same if we instead run the regression \[M_{\iota}Y = M_{\iota}X\beta_1 + U.\]
That is, a regression with a constant on raw data yields the same estimates as a regression without a constant on demeaned data.
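For instance (a minimal sketch with simulated data), the slope coefficient from a regression with a constant on raw data coincides with the one from a regression without a constant on demeaned data.
using Random, Statistics
Random.seed!(3)
N = 100
x = randn(N)
Y = 1.0 .+ 0.5 .* x + randn(N)
X = [ones(N) x]
b = (X' * X) \ (X' * Y)       # regression with a constant on raw data
Yd = Y .- mean(Y)             # demeaned regressand
xd = x .- mean(x)             # demeaned regressor
b1 = (xd' * xd) \ (xd' * Yd)  # regression without a constant on demeaned data
b[2] ≈ b1                     # true: identical slope estimates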
Some commonly used variables contain time trends.
We may consider a regression like \[Y = \alpha_0 \iota + \alpha_1 T + X\beta + U,\] where \(T' = [1,2,\cdots,N]\) to capture the time trend.
The FWL theorem shows that we obtain the same estimates if we instead run the regression using detrended data.
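As an illustration (a sketch, assuming a simple linear trend in both \(Y\) and the regressor), the estimate of \(\beta\) is the same whether we include the trend or detrend the data first.
using Random, LinearAlgebra
Random.seed!(4)
N = 120
T = collect(1.0:N)                      # linear time trend
x = 0.02 .* T + randn(N)                # trending regressor
Y = 1.0 .+ 0.05 .* T + 0.5 .* x + randn(N)
W = [ones(N) T]                         # constant and trend
Mw = I - W * inv(W' * W) * W'           # detrending projection
Z = [W x]
b_full = (Z' * Z) \ (Z' * Y)            # regression including the trend
b_detr = ((Mw * x)' * (Mw * x)) \ ((Mw * x)' * (Mw * Y)) # regression on detrended data
b_full[3] ≈ b_detr                      # true: same estimate of β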
Alternatively, some variables may exhibit seasonal behavior.
We can model seasonality using seasonal dummy variables \[Y = \alpha_1 s_1 + \alpha_2 s_2 + \alpha_3 s_3 + \alpha_4 s_4 + X\beta + U,\] where \(s_i\) are the seasonal dummy variables.
The FWL theorem tells us that we can estimate \(\beta\) using deseasonalized data.
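Similarly (a sketch with simulated quarterly data), regressing out the quarterly dummies first and using the deseasonalized series yields the same estimate of \(\beta\).
using Random, LinearAlgebra
Random.seed!(5)
N = 200                                          # 50 years of quarterly data
q = repeat(1:4, N ÷ 4)                           # quarter of each observation
S = Float64.([q .== 1 q .== 2 q .== 3 q .== 4])  # seasonal dummy variables
x = randn(N) .+ 0.5 .* (q .== 4)                 # regressor with a seasonal component
Y = S * [1.0; 2.0; 1.5; 3.0] + 0.5 .* x + randn(N)
Ms = I - S * inv(S' * S) * S'                    # deseasonalizing projection
Z = [S x]
b_full = (Z' * Z) \ (Z' * Y)                     # regression with seasonal dummies
b_des = ((Ms * x)' * (Ms * x)) \ ((Ms * x)' * (Ms * Y)) # regression on deseasonalized data
b_full[5] ≈ b_des                                # true: same estimate of β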
Definition
The coefficient of determination, or (uncentered) \(R^2\), is defined as \[R^2 = \frac{||P_XY||^2}{||Y||^2},\] which lies between 0 and 1.
Nonetheless, the (uncentered) \(R^2\) is not invariant to translations.
Consider \(\tilde{Y} := Y + \alpha \iota\) in a regression where \(X\) includes a constant, then \[R^2 = \frac{||P_XY + \alpha \iota||^2}{||Y + \alpha\iota||^2}.\]
By choosing \(\alpha\), we can make \(R^2\) as close to 1 as we want.
To avoid this problem (for regressions that include a constant) the FWL theorem tells us that we can demean the series without changing the estimates or residuals.
This gives rise to the (centered) \(R^2\) defined as \[R^2 = \frac{||P_XM_\iota Y||^2}{||M_\iota Y||^2},\] which is unaffected by translations.
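The following sketch (simulated data; the helper functions are my own) computes both versions and shows that shifting \(Y\) by a constant pushes the uncentered \(R^2\) toward 1 but leaves the centered \(R^2\) unchanged.
using Random, LinearAlgebra, Statistics
Random.seed!(6)
N = 100
X = [ones(N) randn(N, 2)]                # regressors, including a constant
Y = X * [1.0; 0.5; -0.5] + randn(N)
P = X * inv(X' * X) * X'                 # projection onto the regressors
R2u(y) = norm(P * y)^2 / norm(y)^2       # uncentered R²
R2c(y) = norm(P * (y .- mean(y)))^2 / norm(y .- mean(y))^2 # centered R²
R2u(Y), R2c(Y)                           # both R² for the original regressand
R2u(Y .+ 100), R2c(Y .+ 100)             # uncentered ≈ 1; centered unchanged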
Theorem
Under correct specification, exogenous regressors, and homoskedastic, serially uncorrelated, normally distributed errors, the OLS estimator follows a normal distribution with mean \(\beta\) and variance \(\sigma^2(X'X)^{-1}\): \[\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1}).\]
Theorem (Gauss-Markov)
In a regression with correct specification, exogenous regressors, and homoskedastic, serially uncorrelated errors, the OLS estimator is at least as efficient as any other linear unbiased estimator.
In other words, the OLS estimator is the Best Linear Unbiased Estimator (BLUE).
Substituting \(Y = X\beta + U\) into the OLS formula gives \(\hat{\beta} = \beta + (X'X)^{-1}X'U\); hence, to show that OLS is unbiased we require \[E[(X'X)^{-1}X'U] = 0.\]
There are two possibilities: either the regressors are non-stochastic, or they are exogenous, \(E(U\mid X)=0\).
Exogeneity imposes that the errors are unrelated to past, present, and future values of the regressors.
In contrast, \(\hat{\beta}\) is biased under the weaker assumption of predetermined regressors, \(E(U_t\mid X_t)=0\), as the following simulation of an AR(1) model with coefficient 0.5 illustrates.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                    # number of replications
N = 5000                                   # sample size
beta = zeros(R, 1)
for ii in 1:R
    V = rand(Normal(0, 1), N + 1)          # error term
    Y = zeros(N + 1, 1)
    Y[1] = V[1]                            # first observation
    for jj = 2:N+1
        Y[jj] = 0.5 * Y[jj-1] + V[jj]      # AR(1) regressand
    end
    Y₁ = Y[2:(N+1)]                        # regressand
    Y₀ = Y[1:N]                            # lagged (predetermined) regressor
    beta[ii] = (Y₀' * Y₀) \ (Y₀' * Y₁)     # OLS estimator
end
theme(:ggplot2)
boxplot(beta, label="Estimated coefficient", orientation=:horizontal)
vline!([0.5], label="True value", color="red", lw=3)
Consistency requires two conditions:
\(plim_{n\to\infty} \frac{1}{n}X'U = 0\), and
\(plim_{n\to\infty} \left(\frac{1}{n}X'X\right)^{-1}\) exists and is finite.
Using the law of large numbers, we can show that predeterminedness implies \(plim_{n\to\infty} \frac{1}{n}X'U=0.\)
The proof relies on the law of large numbers. Thus, errors cannot be too correlated or have unbounded variances.
Another application of the law of large numbers shows that \(plim_{n\to\infty} \left(\frac{1}{n}X'X\right)^{-1}=S_{XX}<\infty.\)
This requires that the regressors not be too correlated and that their variances not grow without bound.
Hence, OLS is consistent under predetermined regressors, as the following simulation illustrates for increasing sample sizes.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                        # number of replications
Ns = [10; 30; 50; 100; 200; 300; 400; 500]     # sample sizes
nn = length(Ns)
beta = zeros(R, nn)
for kk = 1:nn
    N = Ns[kk]
    for ii = 1:R
        V = rand(Normal(0, 1), N + 1)          # error term
        Y = zeros(N + 1, 1)
        Y[1] = V[1]                            # first observation
        for jj = 2:N+1
            Y[jj] = 0.5 * Y[jj-1] + V[jj]      # AR(1) regressand
        end
        Y₁ = Y[2:(N+1)]                        # regressand
        Y₀ = Y[1:N]                            # lagged (predetermined) regressor
        beta[ii, kk] = (Y₀' * Y₀) \ (Y₀' * Y₁) # OLS estimator
    end
end
theme(:ggplot2)
plot(Ns, mean(beta, dims=1)', ylims=[0.44, 0.51], line=(:steppre, :dot, 0.5, 4, :blue), label="Estimated beta")
hline!([0.5], label="True value", color="red", ls=:dash, lw=1)
We measure the precision of OLS by its covariance matrix, derived under assumptions on the error term's second moments.
Assuming homoskedasticity and no autocorrelation, we have: \[Var(\hat{\beta}) = \sigma^2(X'X)^{-1}.\]
It can be shown that this variance depends on the variance of the error term, the sample size, and the relationship among the regressors.
Dependence on the variance of the error term is straightforward.
The dependence on the sample size can be seen if we write \[Var(\hat{\beta}) = \sigma^2(X'X)^{-1} = \left(\frac{1}{n}\sigma^2\right)\left(\frac{1}{n}X'X\right)^{-1},\] assuming, as before, that \(plim_{n\to\infty} \left(\frac{1}{n}X'X\right)^{-1}=S_{XX}.\)
Consider the regression \[Y = X_1\beta_1+x_2\beta_2+U,\] where \(X=[X_1,\ x_2]\) and \(x_2\) is a single column vector.
From the FWL theorem, \(\hat{\beta}_2\) is the same as the estimate from \[M_{X_1}Y = M_{X_1}x_2\beta_2+V.\]
Thus, \(Var(\hat{\beta}_2) = \sigma^2/(x_2'M_{X_1}x_2),\) which shows its dependence on the relationship between \(x_2\) and the remaining regressors.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 1000                                       # number of replications
N = 100                                        # sample size
beta = zeros(R, 2)
for ii = 1:R
    U = rand(Normal(0, 1), N)                  # error term
    # Uncorrelated regressors
    X₁ = rand(Normal(0, 1), N)                 # regressor 1
    X₂ = rand(Normal(0, 1), N)                 # regressor 2
    Y = X₁ + X₂ + U                            # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)                # OLS estimator
    beta[ii, 1] = betas[1]
    # Correlated regressors
    X₁ = rand(Normal(0, 1), N)                 # regressor 1
    X₂ = 0.5 * X₁ + rand(Normal(0, 1), N)      # regressor 2
    Y = X₁ + X₂ + U                            # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)                # OLS estimator
    beta[ii, 2] = betas[1]
end
theme(:ggplot2)
histogram(beta, label=["Estimated beta (Uncorrelated)" "Estimated beta (Correlated)"], legend=:topleft, fillalpha=0.25, normalize=true)
x = range(0.5, stop=1.5, length=1000)
plot!(x, pdf.(Normal(1, 1/sqrt(N)), x), label="Normal density", color="red", ls=:dash, lw=2)