Factor Models

Time Series

Department of Mathematical Sciences, Aalborg University

Factor Models and Principal Component Analysis

Big data

  • There has been a huge increase in the amount of data available.

  • This has led to the development of new techniques to analyze and extract information from data.

  • The dynamics of the data can be complex, but it is often the case that the data has some underlying structure.

  • Today, we will discuss some of these techniques: principal component analysis and factor models.

Principal Component Analysis

Principal component analysis

  • Principal component analysis (PCA) is a technique used to reduce the dimensionality of a dataset.

  • It is based on the idea of finding the directions in which the data has the largest variance.

  • These directions, called principal components, can be used to capture most of the information in the data with fewer variables.

Principal component analysis

  • Let \(X_1, X_2,\cdots, X_p\) be a set of variables centered at zero.

  • The first principal component is the normalized linear combination of the features \[Z_1 = \phi_{11}X_1 +\phi_{21}X_2 +\cdots+\phi_{p1}X_p,\] that has the largest variance.

  • That is, \(Z_1\) solves the optimization problem \[\max_{\phi_{11},\phi_{21},\cdots,\phi_{p1}}Var(Z_1), \ \ \ \ \text{subject to}\ \ \ \ \sum_{j=1}^p\phi_{j1}^2 =1.\]

Principal component analysis

  • In matrix form, \[\mathbb{X} = \begin{bmatrix}x_{1,1} &x_{1,2} &\cdots &x_{1,p}\\ x_{2,1} &x_{2,2} &\cdots &x_{2,p}\\\vdots &\vdots &\ddots &\vdots \\ x_{n,1} &x_{n,2} &\cdots &x_{n,p}\end{bmatrix}\ \ \ \ \Phi_1 = \begin{bmatrix} \phi_{1,1} \\ \phi_{2,1}\\\vdots\\\phi_{p,1} \end{bmatrix},\]

\(Z_1\) can be written as \(Z_1 = \mathbb{X}\Phi_1\).

  • Its variance is given by \[Var(Z_1) = \Phi_1'Var(\mathbb{X})\Phi_1.\]

Principal component analysis

  • The variance of the first principal component is given by \[Var(Z_1) = \frac{1}{n}\Phi_1'\mathbb{X}'\mathbb{X}\Phi_1.\]

  • Hence, the first principal component solves \[\max_{\Phi_1} \Phi_1'\mathbb{X}'\mathbb{X}\Phi_1,\ \ \ \text{subject to} \ \ \ \Phi_1'\Phi_1 = 1.\]

Principal component analysis

  • The Lagrangian of the problem is given by

\[\mathcal{L}(\Phi_1,\lambda_1) = \Phi_1'\mathbb{X}'\mathbb{X}\Phi_1 - \lambda_1(\Phi_1'\Phi_1-1).\]

  • The first order conditions are given by \[\begin{align} \frac{\partial \mathcal{L}}{\partial \Phi_1} &= 2\mathbb{X}'\mathbb{X}\Phi_1 - 2\lambda_1\Phi_1 = 0,\\ \frac{\partial \mathcal{L}}{\partial \lambda_1} &= \Phi_1'\Phi_1-1=0. \end{align}\]

  • The first equation gives \(\mathbb{X}'\mathbb{X}\Phi_1 = \lambda_1\Phi_1\), so \(\Phi_1\) is an eigenvector of \(\mathbb{X}'\mathbb{X}\). Since then \(Var(Z_1) = \frac{1}{n}\Phi_1'\mathbb{X}'\mathbb{X}\Phi_1 = \frac{\lambda_1}{n}\), the first principal component is the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the largest eigenvalue, \(\lambda_1\), as the numerical check below confirms.
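A minimal numerical check of this result (simulated data; all names are illustrative), comparing the leading eigenvector of \(\mathbb{X}'\mathbb{X}\) with the first loading vector returned by prcomp:

Code
set.seed(1)
X = scale(matrix(rnorm(200 * 3), 200, 3) %*% matrix(runif(9), 3, 3))

# Leading eigenvector of X'X ...
phi1 = eigen(crossprod(X), symmetric = TRUE)$vectors[, 1]

# ... agrees with the first loading vector from prcomp (up to sign)
cbind(eigen = phi1, prcomp = prcomp(X)$rotation[, 1])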

Principal component analysis

  • The second principal component is the normalized linear combination of the features that has the second largest variance and is uncorrelated with the first.
  • That is, the second principal component solves \[\max_{\Phi_2} \Phi_2'\mathbb{X}'\mathbb{X}\Phi_2,\ \ \text{subject to} \ \Phi_2'\Phi_2 = 1, \ \Phi_2'\Phi_1 = 0.\]

  • Similar derivations as before show that the second principal component is the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the second largest eigenvalue, \(\lambda_2\).

  • Moreover, the eigenvectors of \(\mathbb{X}'\mathbb{X}\) are orthogonal.

Principal component analysis

Theorem (Eigenvectors of symmetric matrices).

Let \(A\) be a symmetric matrix. Then, eigenvectors of \(A\) associated with different eigenvalues are orthogonal.
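A one-line proof sketch: if \(Av_1=\lambda_1 v_1\), \(Av_2 = \lambda_2 v_2\) and \(\lambda_1\neq\lambda_2\), then by symmetry \[\lambda_1 v_1'v_2 = (Av_1)'v_2 = v_1'Av_2 = \lambda_2 v_1'v_2,\] so \((\lambda_1-\lambda_2)\,v_1'v_2 = 0\), which forces \(v_1'v_2 = 0\).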

Principal component analysis

  • The \(k\)-th principal component is the normalized linear combination of the features that has the \(k\)-th largest variance and is uncorrelated with the previous \(k-1\) principal components.

  • The \(k\)-th principal component is the eigenvector of \(\mathbb{X}'\mathbb{X}\) with the \(k\)-th largest eigenvalue.

  • The eigenvectors of \(\mathbb{X}'\mathbb{X}\) are orthogonal.

  • Equivalently, the principal component loading vectors are the eigenvectors of the (sample) covariance matrix of the data, and the principal components are the corresponding projections of the data.

Principal component analysis

Code
set.seed(614)
many = 500
x = rnorm(many, 0, 1)
y = 3*x + rnorm(many, 0, 1)   # y is strongly correlated with x

X = scale(cbind(y, x))        # center and standardize both variables
plot(X, xlab = "", ylab = "", pch = 1)
legend(-3, 2.5, c("Original data"), pch = c(1), col = c(1))

Principal component analysis

Code
pr.out = prcomp(X)
NX = X %*% pr.out$rotation[, 1]        # scores on the first principal component
NXY = NX %*% t(pr.out$rotation[, 1])   # project the scores back to the original coordinates
plot(X, xlab = "", ylab = "", pch = 1)
points(NXY, pch = 18, col = 2, cex = 1.2)
legend(-3, 2.5, c("Original data", "First Principal Component"), pch = c(1, 18), col = c(1, 2))

Principal component analysis

Scree plot

  • The scree plot shows the proportion of variance explained by each principal component.

  • It is used to determine the number of principal components needed to capture most of the information in the data.

  • The height of the \(k\)-th point is the proportion of variance explained by the \(k\)-th principal component, \[\frac{\lambda_k}{\sum_{j=1}^p\lambda_j},\] where \(\lambda_k\) is the \(k\)-th largest eigenvalue; a sketch of the computation follows.
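A minimal sketch of the scree plot in R, reusing pr.out from the example above:

Code
# Proportion of variance explained by each principal component
pve = pr.out$sdev^2 / sum(pr.out$sdev^2)
plot(pve, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance explained", ylim = c(0, 1))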

Principal component analysis

Scree plot

[Scree plot figures]

Factor Models

Factor models

  • Factor models are used to describe the relationship between a set of variables and a smaller set of unobservable factors.

  • The factors are assumed to capture the common information in the data.

  • Different factor models can be defined depending on the assumptions made on the factors.

Factor models

  • The factor model is given by \[X_t = \mu + B F_t + \epsilon_t,\] where

    • \(X_t\) is a \(p\times 1\) vector of observable variables,
    • \(B\) is a \(p\times k\) matrix of factor loadings (typically \(k\ll p\)),
    • \(F_t\) is a \(k\times 1\) vector of (latent or unobservable) factors,
    • \(\epsilon_t\) is a \(p\times 1\) vector of idiosyncratic errors.
  • Note that the factors are the same for all variables, as the simulation sketch below illustrates.
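To make the dimensions concrete, a minimal simulation sketch with illustrative values \(p = 10\) and \(k = 2\) (all names are hypothetical):

Code
set.seed(42)
n = 200; p = 10; k = 2
Fa  = matrix(rnorm(n * k), n, k)             # latent factors F_t, one row per t
B   = matrix(runif(p * k, -1, 1), p, k)      # factor loadings
eps = matrix(rnorm(n * p, sd = 0.5), n, p)   # idiosyncratic errors
X   = Fa %*% t(B) + eps                      # X_t = B F_t + eps_t (mu = 0)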

Factor models

Assumptions

  • The idiosyncratic errors are assumed to be

    • mean zero: \(E(\epsilon_t) = 0\),
    • uncorrelated across variables: \(E(\epsilon_t\epsilon_t') = \Psi\), where \(\Psi\) is a \(p\times p\) diagonal matrix,
    • uncorrelated with the factors: \(Cov(F_t,\epsilon_t) = 0\).
  • The variance matrix for the factors is given by: \(Var(F_t) = \Sigma_k\), a \(k\times k\) matrix.

Factor models

Known factors

  • If the factors are known, the factor model can be estimated by OLS in a time series regression.

  • For the \(j\)-th variable, the factor model is given by \[X_{j,t} = \mu_j + \beta_j' F_t + \epsilon_{j,t},\] where \(\beta_j'\) is the \(j\)-th row of \(B\) and \(F_t\) is known and the same for all variables.

  • Stacking the \(n\) observations, with \(F\) the \(n\times k\) matrix of factors and \(X_j\) the vector of observations on variable \(j\) (both demeaned), the OLS estimator is \[\hat{\beta}_j = (F' F)^{-1} F' X_{j},\] as in the sketch below.
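A minimal sketch, reusing the simulated X, Fa, and B from the sketch above: with known factors, regressing each column of X on the factors recovers the loadings.

Code
# OLS of every variable on the known factors; slope coefficients estimate B
Bhat = t(coef(lm(X ~ Fa))[-1, ])                  # drop the intercept row; p x k
round(cbind(true = B[, 1], ols = Bhat[, 1]), 2)   # compare first loading column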

Factor models

Known factors

  • The variance of the \(j\)-th variable is given by \[Var(X_{j,t}) = \beta_j'\Sigma_k\beta_j+\psi_j,\] where \(\psi_j\) is the \(j\)-th diagonal element of \(\Psi\).

  • For all variables jointly, the covariance matrix is given by \[Var(X_t) = B\Sigma_k B' + \Psi.\]

  • The power of the factor model is that it reduces the dimensionality of the problem by capturing the common information in the data, as the parameter count below illustrates.
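A back-of-the-envelope count with illustrative numbers: an unrestricted \(p\times p\) covariance matrix has \(p(p+1)/2\) free parameters, while the factor structure (with \(\Sigma_k = I\)) needs only \(pk\) loadings and \(p\) idiosyncratic variances. For \(p=100\) and \(k=3\), \[\frac{p(p+1)}{2} = 5050 \qquad \text{versus} \qquad pk + p = 300 + 100 = 400.\]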

Factor models

Unknown factors

  • If the factors are unknown, the factors as well as the loadings are estimated.

  • The model is given by \[X_t = \mu + B F_t + \epsilon_t,\] where only \(X_t\) is observed, so the factors and the loadings cannot be jointly estimated by OLS.

  • Note that the factors and the loadings are not separately identified (see the rotation argument below).

  • We can identify the factors by imposing additional restrictions.
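The identification problem in one line: for any invertible \(k\times k\) matrix \(H\), \[BF_t = (BH)(H^{-1}F_t),\] so \((B, F_t)\) and \((BH, H^{-1}F_t)\) generate exactly the same \(X_t\), and the data cannot distinguish between them. Restrictions such as \(\Sigma_k = I\) (next slide) pin down a particular rotation.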

Factor models

Unknown factors

  • In terms of variances, the model implies \[Var(X_t) = B\Sigma_k B' + \Psi,\] in which the factors themselves no longer appear.

  • This suggests an identification strategy: choose the rotation of the factors that simplifies the variance structure by setting \(\Sigma_k=I\), so that \(Var(X_t) = BB' + \Psi\).

  • Additional restrictions can be imposed on \(B\) to identify the loadings.

Factor models

Unknown factors estimation

  • Estimation of the factors and the loadings can be done by maximum likelihood.

  • Assuming the idiosyncratic errors and the factors are normally distributed, the likelihood function is given by \[L = \prod_{t=1}^n \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(X_t-\mu)'\Sigma^{-1}(X_t-\mu)\right\},\] where \(\Sigma = BB' + \Psi\) under the normalization \(\Sigma_k = I\).
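A minimal sketch of this log-likelihood in R, assuming the mvtnorm package is available (the function and its arguments are illustrative; in practice the maximization is done numerically, as discussed next):

Code
library(mvtnorm)

# Gaussian log-likelihood of the factor model with Sigma = B B' + Psi
loglik = function(B, psi, X, mu = colMeans(X)) {
  Sigma = B %*% t(B) + diag(psi)
  sum(dmvnorm(X, mean = mu, sigma = Sigma, log = TRUE))
}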

Factor models

Unknown factors estimation

  • Estimation is done numerically by maximizing the likelihood function using the Expectation-Maximization (EM) algorithm.

  • The EM algorithm is an iterative procedure that alternates between the E-step, where the expected value of the log-likelihood is computed, and the M-step, where the parameters are updated.

  • Once the loadings are estimated, the factors can be estimated by regression methods, e.g. OLS of the data on the estimated loadings, or the "regression" scores sketched below.
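A minimal sketch of such regression-type scores (Thomson's method), computed from estimated loadings and uniquenesses; factanal's scores = "regression" applies the same idea to standardized data:

Code
# Regression scores: Fhat_t = B' Sigma^{-1} (x_t - xbar), with Sigma = B B' + Psi
reg_scores = function(B, psi, X) {
  Sigma = B %*% t(B) + diag(psi)
  Xc = scale(X, center = TRUE, scale = FALSE)   # demean the data
  Xc %*% solve(Sigma) %*% B                     # n x k matrix of scores
}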

Factor models

Code
# Fit a 3-factor model to the stocks data by maximum likelihood
factors = factanal(stocks, factors = 3, rotation = "none", scores = "regression")
factors

Call:
factanal(x = stocks, factors = 3, scores = "regression", rotation = "none")

Uniquenesses:
 MSFT   BMY   XOM   FDX   MDT  ROST   SLB   UTX  SBUX    GS 
0.244 0.532 0.038 0.044 0.182 0.023 0.077 0.035 0.032 0.134 

Loadings:
     Factor1 Factor2 Factor3
MSFT  0.706   0.490   0.134 
BMY   0.459   0.489  -0.138 
XOM   0.819  -0.532         
FDX   0.942   0.120   0.233 
MDT   0.762   0.410   0.263 
ROST  0.948   0.248  -0.128 
SLB   0.762  -0.584         
UTX   0.969  -0.164         
SBUX  0.955   0.229         
GS    0.793  -0.239   0.424 

               Factor1 Factor2 Factor3
SS loadings      6.807   1.484   0.370
Proportion Var   0.681   0.148   0.037
Cumulative Var   0.681   0.829   0.866

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 7039.12 on 18 degrees of freedom.
The p-value is 0 

Factor models

Comparison