Econometrics
The OLS solution is given by $\hat{\beta} = (X'X)^{-1}X'Y$.
It maps $Y$ into a vector of fitted values $X\hat{\beta} := P_X Y$ and a vector of residuals $\hat{U} := M_X Y$, where $P_X = X(X'X)^{-1}X'$ and $M_X = I - P_X$.
It is easy to show that $P_X$ and $M_X$ are projection matrices, and that $P_X X = X$ and $M_X X = 0$.
Moreover, they are complementary, since $P_X M_X = 0$.
By Pythagoras' Theorem, $\|P_X Y\|^2 \leq \|Y\|^2$.
Moreover, the OLS residuals are orthogonal to all the regressors: $X'\hat{U} = X'M_X Y = 0$.
In particular, if the regressors include a constant, then the residuals sum to zero, $\sum_{t=1}^{N} \hat{U}_t = 0$.
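A minimal sketch in Julia checking these projection-matrix properties numerically; the data, dimensions, and coefficient values are simulated here purely for illustration.
using LinearAlgebra, Random
Random.seed!(1)
N, k = 50, 3
X = [ones(N) randn(N, k - 1)]        # simulated regressors, including a constant
Y = X * [1.0, 2.0, -0.5] + randn(N)  # simulated regressand
P = X * ((X' * X) \ X')              # P_X
M = I - P                            # M_X
Uhat = M * Y                         # OLS residuals
@show maximum(abs, P * X - X)        # P_X X = X   (≈ 0)
@show maximum(abs, M * X)            # M_X X = 0   (≈ 0)
@show maximum(abs, P * M)            # P_X M_X = 0 (≈ 0)
@show norm(P * Y)^2 ≤ norm(Y)^2      # Pythagoras' Theorem
@show abs(sum(Uhat))                 # residuals sum to ≈ 0 (constant included)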
We are interested in analyzing the effect that partitioning the regressors has on the estimators.
Assume we split the regressors into two groups, $X = [X_1 \; X_2]$, so that $Y = X_1\beta_1 + X_2\beta_2 + U$.
In general, the OLS estimator of $\beta_1$ depends on $X_2$. (More in the next lecture.)
In the special case that $X_1$ is orthogonal to $X_2$, we obtain the same OLS estimate for $\beta_1$ using the complete specification as using the reduced specification: $Y = X_1\beta_1 + V$.
Analogously, we obtain the same estimate for $\beta_2$ if we remove $X_1$ from the regression.
In the general case, the two regressions $Y = X_1\beta_1 + M_{X_1}X_2\beta_2 + U$ and $Y = M_{X_1}X_2\beta_2 + V$ yield identical estimates for $\beta_2$.
Nonetheless, they do not yield the same residuals.
To recover the same residuals, we need to purge $Y$ of the effect of $X_1$.
Theorem (Frisch-Waugh-Lovell)
The OLS estimates of $\beta_2$ in the regressions $Y = X_1\beta_1 + X_2\beta_2 + U$ and $M_{X_1}Y = M_{X_1}X_2\beta_2 + U$ are numerically identical.
Moreover, the residuals in both regressions are numerically identical.
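A minimal sketch verifying both claims of the theorem numerically; the data and coefficient values are simulated here for illustration only.
using LinearAlgebra, Random
Random.seed!(3)
N = 100
X₁ = [ones(N) randn(N)]                     # first block of regressors
X₂ = randn(N, 2)                            # second block of regressors
Y = X₁ * [1.0, -1.0] + X₂ * [0.5, 2.0] + randn(N)   # simulated regressand
X = [X₁ X₂]
beta_full = (X' * X) \ (X' * Y)             # full regression
resid_full = Y - X * beta_full
M₁ = I - X₁ * ((X₁' * X₁) \ X₁')            # M_{X₁}
MY = M₁ * Y                                 # Y purged of X₁
MX₂ = M₁ * X₂                               # X₂ purged of X₁
beta₂ = (MX₂' * MX₂) \ (MX₂' * MY)          # FWL regression
resid_fwl = MY - MX₂ * beta₂
@show beta_full[3:4] ≈ beta₂                # identical estimates of β₂
@show resid_full ≈ resid_fwl                # identical residuals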
Suppose the regression includes a constant via $\iota$, a vector of ones: $Y = \iota\beta_0 + X\beta_1 + U$.
The FWL theorem shows that the estimator for $\beta_1$ is the same if we instead run the regression $M_\iota Y = M_\iota X\beta_1 + U$.
That is, we obtain the same estimates in a regression with a constant on raw data as in a regression with no constant on demeaned data.
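A minimal sketch of this equivalence, with simulated data assumed for illustration.
using Statistics, Random
Random.seed!(2)
N = 200
x = randn(N)                         # simulated regressor
y = 1.0 .+ 2.0 .* x .+ randn(N)      # simulated regressand with an intercept
X = [ones(N) x]
b_raw = (X' * X) \ (X' * y)          # [β̂₀, β̂₁] with a constant on raw data
xd = x .- mean(x)                    # demeaned regressor (M_ι x)
yd = y .- mean(y)                    # demeaned regressand (M_ι y)
b_dem = (xd' * xd) \ (xd' * yd)      # slope with no constant on demeaned data
@show b_raw[2] ≈ b_dem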
Some variables commonly used contain time trends.
We may consider a regression like $Y = \alpha_0\iota + \alpha_1 T + X\beta + U$, where $T' = [1, 2, \cdots, N]$ captures the time trend.
The FWL theorem shows that we obtain the same estimates if we instead run the regression using detrended data.
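A minimal sketch, with simulated trending data assumed for illustration, checking that the estimate of $\beta$ is the same on raw data with the trend included as on detrended data.
using LinearAlgebra, Random
Random.seed!(4)
N = 120
T = collect(1.0:N)                        # time trend
x = 0.05 .* T .+ randn(N)                 # simulated trending regressor
Y = 2.0 .+ 0.1 .* T .+ 1.5 .* x .+ randn(N)   # simulated regressand
Z = [ones(N) T]                           # constant and trend
Mz = I - Z * ((Z' * Z) \ Z')              # detrending projection
W = [ones(N) T x]                         # full set of regressors
beta_raw = (W' * W) \ (W' * Y)            # regression on raw data
beta_det = ((Mz * x)' * (Mz * x)) \ ((Mz * x)' * (Mz * Y))   # regression on detrended data
@show beta_raw[3] ≈ beta_det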
Alternatively, some variables may show a seasonal behavior.
We can model seasonality using seasonal dummy variables, $Y = \alpha_1 s_1 + \alpha_2 s_2 + \alpha_3 s_3 + \alpha_4 s_4 + X\beta + U$, where the $s_i$ are the seasonal dummy variables.
The FWL theorem tells us that we can estimate β using deseasonalized data.
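Similarly, a minimal sketch, with simulated quarterly data assumed for illustration, showing that the estimate of $\beta$ is the same whether we include the seasonal dummies or first deseasonalize the data.
using LinearAlgebra, Random
Random.seed!(6)
N = 80                                          # 20 years of simulated quarterly data
q = repeat(1:4, N ÷ 4)                          # quarter indicator
S = Float64.(hcat([q .== j for j in 1:4]...))   # seasonal dummies s₁, …, s₄
x = randn(N)                                    # simulated regressor
Y = S * [1.0, 2.0, 0.5, -1.0] .+ 1.5 .* x .+ randn(N)   # simulated seasonal regressand
Ms = I - S * ((S' * S) \ S')                    # deseasonalizing projection
beta_dummies = ([S x]' * [S x]) \ ([S x]' * Y)                 # with dummies
beta_deseas = ((Ms * x)' * (Ms * x)) \ ((Ms * x)' * (Ms * Y))  # on deseasonalized data
@show beta_dummies[5] ≈ beta_deseas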
Definition
The coefficient of determination, or (uncentered) $R^2$, is defined as $R^2 = \frac{\|P_X Y\|^2}{\|Y\|^2}$, which lies between 0 and 1.
Nonetheless, the (uncentered) $R^2$ is not invariant to translations.
Consider $\tilde{Y} := Y + \alpha\iota$ in a regression where $X$ includes a constant; then $R^2 = \frac{\|P_X Y + \alpha\iota\|^2}{\|Y + \alpha\iota\|^2}$.
By choosing $\alpha$ large enough, we can make $R^2$ as close to 1 as we want.
To avoid this problem (for regressions that include a constant) the FWL theorem tells us that we can demean the series without changing the estimates or residuals.
This gives rise to the (centered) $R^2$, defined as $R^2 = \frac{\|P_X M_\iota Y\|^2}{\|M_\iota Y\|^2}$, which is unaffected by translations.
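A minimal sketch, with simulated data assumed for illustration, comparing the two measures under a translation of $Y$.
using LinearAlgebra, Statistics, Random
Random.seed!(5)
N = 100
x = randn(N)
X = [ones(N) x]                            # regressors, including a constant
Y = 1.0 .+ 0.3 .* x .+ randn(N)            # simulated regressand
P = X * ((X' * X) \ X')                    # P_X
R2u(y) = norm(P * y)^2 / norm(y)^2                          # uncentered R²
R2c(y) = norm(P * (y .- mean(y)))^2 / norm(y .- mean(y))^2  # centered R²
@show R2u(Y), R2u(Y .+ 100)                # inflated by the translation
@show R2c(Y), R2c(Y .+ 100)                # unaffected by the translation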
Theorem
Under correct specification, exogenous regressors, and homoskedastic, non-autocorrelated, normally distributed errors, the OLS estimator follows a normal distribution with mean $\beta$ and variance $\sigma^2(X'X)^{-1}$: $\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$.
Theorem (Gauss-Markov)
In a regression under correct specification, with exogenous regressors and homoskedastic, non-autocorrelated errors, the OLS estimator is at least as efficient as any other linear unbiased estimator.
In other words, the OLS estimator is the Best Linear Unbiased Estimator (BLUE).
To show that OLS is unbiased, we require $E[(X'X)^{-1}X'U] = 0$.
There are two possibilities: either the regressors are non-stochastic, or they are stochastic but exogenous, $E(U \mid X) = 0$.
Exogeneity imposes that the errors are unrelated to all (past, current, and future) values of the regressors.
For instance, $\hat{\beta}$ is biased under the weaker assumption of predetermined regressors, $E(U_t \mid X_t) = 0$.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                    # number of Monte Carlo replications
N = 5000                                   # sample size
beta = zeros(R)
for ii in 1:R
    V = rand(Normal(0, 1), N + 1)          # error term
    Y = zeros(N + 1)
    Y[1] = V[1]                            # first observation
    # AR(1) process: the regressor (the lagged regressand) is predetermined but not exogenous
    for jj in 2:N+1
        Y[jj] = 0.5 * Y[jj-1] + V[jj]      # regressand
    end
    Y₁ = Y[2:(N+1)]
    Y₀ = Y[1:N]
    beta[ii] = (Y₀' * Y₀) \ (Y₀' * Y₁)     # OLS estimator
end
theme(:ggplot2)
boxplot(beta, label="Estimated coefficient", orientation=:horizontal)
vline!([0.5], label="True value", color="red", lw=3)
Consistency requires:
$\mathrm{plim}_{n\to\infty} \frac{1}{n} X'U = 0$, i.e., $X'U$ vanishes after standardization by $n$.
$\mathrm{plim}_{n\to\infty} \left(\frac{1}{n} X'X\right)^{-1}$ exists and is finite.
Using the law of large numbers, we can show that predeterminedness implies $\mathrm{plim}_{n\to\infty} \frac{1}{n} X'U = 0$.
The proof relies on the law of large numbers. Thus, errors cannot be too correlated or have unbounded variances.
Another application of the law of large numbers shows that $\mathrm{plim}_{n\to\infty} \left(\frac{1}{n} X'X\right)^{-1} = S_{XX} < \infty$.
Thus, the regressors must not be too correlated, and their variances must not increase without bound.
OLS is consistent under predetermined regressors.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 100                                    # number of Monte Carlo replications
Ns = [10; 30; 50; 100; 200; 300; 400; 500] # sample sizes
nn = length(Ns)
beta = zeros(R, nn)
for kk in 1:nn
    N = Ns[kk]
    for ii in 1:R
        V = rand(Normal(0, 1), N + 1)      # error term
        Y = zeros(N + 1)
        Y[1] = V[1]                        # first observation
        for jj in 2:N+1
            Y[jj] = 0.5 * Y[jj-1] + V[jj]  # regressand
        end
        Y₁ = Y[2:(N+1)]
        Y₀ = Y[1:N]
        beta[ii, kk] = (Y₀' * Y₀) \ (Y₀' * Y₁)  # OLS estimator
    end
end
theme(:ggplot2)
plot(Ns, mean(beta, dims=1)', ylims=[0.44, 0.51], line=(:steppre, :dot, 0.5, 4, :blue), label="Estimated beta")
hline!([0.5], label="True value", color="red", ls=:dash, lw=1)
We measure the OLS precision by its covariance matrix under the assumptions on the error term’s second moments.
Assuming homoskedasticity and no autocorrelation, we have $\mathrm{Var}(\hat{\beta}) = \sigma^2(X'X)^{-1}$.
This variance depends on the variance of the error term, the sample size, and the relationship among the regressors.
Dependence on the variance of the error term is straightforward.
The dependence on the sample size can be seen if we write $\mathrm{Var}(\hat{\beta}) = \sigma^2(X'X)^{-1} = \left(\frac{1}{n}\sigma^2\right)\left(\frac{1}{n}X'X\right)^{-1}$, assuming, as before, that $\mathrm{plim}_{n\to\infty}\left(\frac{1}{n}X'X\right)^{-1} = S_{XX}$.
Consider the regression $Y = X_1\beta_1 + x_2\beta_2 + U$, where $X = [X_1 \; x_2]$ and $x_2$ is a column vector.
From the FWL theorem, $\hat{\beta}_2$ is the same as the estimate from $M_{X_1} Y = M_{X_1} x_2\beta_2 + V$.
Thus, $\mathrm{Var}(\hat{\beta}_2) = \sigma^2 / (x_2' M_{X_1} x_2)$, which shows its dependence on the relationship among the regressors.
using StatsPlots, Distributions, Random
Random.seed!(123)
R = 1000                                   # number of Monte Carlo replications
N = 100                                    # sample size
beta = zeros(R, 2)
for ii in 1:R
    U = rand(Normal(0, 1), N)              # error term
    # Uncorrelated regressors
    X₁ = rand(Normal(0, 1), N)             # regressor 1
    X₂ = rand(Normal(0, 1), N)             # regressor 2
    Y = X₁ + X₂ + U                        # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)            # OLS estimator
    beta[ii, 1] = betas[1]
    # Correlated regressors
    X₁ = rand(Normal(0, 1), N)             # regressor 1
    X₂ = 0.5 * X₁ + rand(Normal(0, 1), N)  # regressor 2, correlated with regressor 1
    Y = X₁ + X₂ + U                        # regressand
    X = [X₁ X₂]
    betas = (X' * X) \ (X' * Y)            # OLS estimator
    beta[ii, 2] = betas[1]
end
theme(:ggplot2)
histogram(beta, label=["Estimated beta (Uncorrelated)" "Estimated beta (Correlated)"], legend=:topleft, fillalpha=0.25, normalize=true)
x = range(0.5, stop=1.5, length=1000)
# approximate theoretical density of β̂₁ in the uncorrelated design
plot!(x, pdf.(Normal(1, 1 / sqrt(N)), x), label="Normal density (uncorrelated case)", color="red", ls=:dash, lw=2)