Time Series
Department of Mathematical Sciences, Aalborg University
There has been a huge increase in the amount of data available.
This has led to the development of new techniques to analyze and extract information from data.
The dynamics of the data can be complex, but the data often have an underlying low-dimensional structure that simpler representations can exploit.
Today, we will discuss some of these techniques: principal component analysis and factor models.
Principal component analysis (PCA) is a technique used to reduce the dimensionality of a dataset.
It is based on the idea of finding the directions in which the data has the largest variance.
These directions, called principal components, can be used to capture most of the information in the data with fewer variables.
Let \(X_1, X_2,\ldots, X_p\) be a set of variables, each centered at zero.
The first principal component is the normalized linear combination of the features \[Z_1 = \phi_{11}X_1 +\phi_{21}X_2 +\cdots+\phi_{p1}X_p,\] that has the largest variance.
In matrix form, \(Z_1 = \mathbb{X}\Phi_1\), where \(\mathbb{X}\) is the \(n\times p\) data matrix with columns \(X_1,\ldots,X_p\) and \(\Phi_1 = (\phi_{11},\ldots,\phi_{p1})'\) is the vector of loadings.
The variance of the first principal component is given by \[Var(Z_1) = \frac{1}{n}\Phi_1'\mathbb{X}'\mathbb{X}\Phi_1.\]
Hence, the first principal component solves \[\max_{\Phi_1} \Phi_1'\mathbb{X}'\mathbb{X}\Phi_1,\ \ \ \text{subject to} \ \ \ \Phi_1'\Phi_1 = 1.\]
The associated Lagrangian is \[\mathcal{L}(\Phi_1,\lambda_1) = \Phi_1'\mathbb{X}'\mathbb{X}\Phi_1 - \lambda_1(\Phi_1'\Phi_1-1).\]
The first order conditions are given by \[\begin{align} \frac{\partial \mathcal{L}}{\partial \Phi_1} &= 2\mathbb{X}'\mathbb{X}\Phi_1 - 2\lambda_1\Phi_1 = 0,\\ \frac{\partial \mathcal{L}}{\partial \lambda_1} &= \Phi_1'\Phi_1-1=0. \end{align}\]
From the first equation, \(\mathbb{X}'\mathbb{X}\Phi_1 = \lambda_1\Phi_1\), so \(\Phi_1\) is an eigenvector of \(\mathbb{X}'\mathbb{X}\). Since the objective then equals \(\Phi_1'\mathbb{X}'\mathbb{X}\Phi_1 = \lambda_1\), the first principal component corresponds to the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the largest eigenvalue, \(\lambda_1\).
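A minimal numerical sketch of this result (data and dimensions are illustrative): the first loading vector is recovered as the top eigenvector of \(\mathbb{X}'\mathbb{X}\), and the variance of the resulting component equals \(\lambda_1/n\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
# Illustrative data with correlated columns, centered at zero.
X = rng.normal(size=(n, p)) @ np.array([[2.0, 0.3, 0.1],
                                        [0.0, 1.0, 0.2],
                                        [0.0, 0.0, 0.5]])
X = X - X.mean(axis=0)

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
phi1 = eigvecs[:, -1]        # eigenvector with the largest eigenvalue
Z1 = X @ phi1                # first principal component scores

# Var(Z1) = (1/n) phi1' X'X phi1 = lambda_1 / n; the two values match.
print(np.var(Z1), eigvals[-1] / n)
```

Because `phi1` is returned with unit norm, the constraint \(\Phi_1'\Phi_1 = 1\) holds automatically.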
The second principal component is the normalized linear combination of the features that has the largest variance among all combinations uncorrelated with \(Z_1\). That is, the second principal component solves \[\max_{\Phi_2} \Phi_2'\mathbb{X}'\mathbb{X}\Phi_2,\ \ \text{subject to} \ \Phi_2'\Phi_2 = 1, \ \Phi_2'\Phi_1 = 0.\]
Similar derivations as before show that the second principal component is the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the second largest eigenvalue, \(\lambda_2\).
Moreover, the eigenvectors of \(\mathbb{X}'\mathbb{X}\) are orthogonal.
Theorem (Eigenvectors of symmetric matrices).
Let \(A\) be a symmetric matrix. Then, the eigenvectors of \(A\) associated with different eigenvalues are orthogonal.
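A quick numerical illustration of the theorem (the matrix is illustrative): for a symmetric matrix, the eigenvectors returned by an eigendecomposition are mutually orthogonal, so their Gram matrix is the identity.

```python
import numpy as np

# A symmetric matrix with distinct eigenvalues (chosen for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(A)   # eigh is designed for symmetric matrices

# Pairwise inner products of the unit eigenvectors: the identity matrix.
gram = eigvecs.T @ eigvecs
print(np.allclose(gram, np.eye(3)))    # True
```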
The \(k\)-th principal component is the normalized linear combination of the features that has the \(k\)-th largest variance and is uncorrelated with the previous \(k-1\) principal components.
The \(k\)-th principal component is the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the \(k\)-th largest eigenvalue.
The eigenvectors of \(\mathbb{X}'\mathbb{X}\) are orthogonal.
Since the variables are centered, \(\frac{1}{n}\mathbb{X}'\mathbb{X}\) is the sample covariance matrix of the data, so the principal component loadings are equivalently the eigenvectors of the covariance matrix.
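As a numerical check (data are illustrative), the eigenvectors of \(\mathbb{X}'\mathbb{X}\) and of the sample covariance matrix coincide, since the two matrices differ only by the scalar factor \(1/n\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X = X - X.mean(axis=0)     # center each variable at zero
n = X.shape[0]

_, V_gram = np.linalg.eigh(X.T @ X)        # eigenvectors of X'X
_, V_cov = np.linalg.eigh((X.T @ X) / n)   # eigenvectors of the covariance matrix

# Same eigenvectors up to sign, so the loadings coincide.
print(np.allclose(np.abs(V_gram), np.abs(V_cov)))
```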
The scree plot shows the proportion of variance explained by each principal component.
It is used to determine the number of principal components needed to capture most of the information in the data.
The proportion of variance explained (PVE) by the \(k\)-th principal component is defined as \[\frac{\lambda_k}{\sum_{j=1}^p\lambda_j}.\]
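The quantities plotted in a scree plot can be computed directly from the eigenvalues of \(\mathbb{X}'\mathbb{X}\), as in this sketch (data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
X = X - X.mean(axis=0)     # center each variable at zero

# Eigenvalues of X'X, sorted from largest to smallest.
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]

# Proportion of variance explained by each component.
pve = eigvals / eigvals.sum()

print(pve)               # values the scree plot displays; they sum to one
print(np.cumsum(pve))    # cumulative PVE, used to choose the number of components
```

A common rule of thumb is to keep components up to the "elbow" of the scree plot, or enough components for the cumulative PVE to reach a chosen threshold.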