Long Memory Models

Time Series

Department of Mathematical Sciences, Aalborg University

Long Memory

Long memory

  • Long memory, or long-range dependence, in time series analysis deals with the notion that data may have a strong dependence on past values.

  • In particular, the autocorrelation function of the data may decay more slowly than the exponential rate that ARMA models can capture.

  • Long memory models are used in climate, finance, biology, economics, and many other fields.

  • Today we will discuss the concept of long memory, why it may occur, and how to model it.

Long memory

The Nile River annual minima series is the classical example.

#| echo: true
#| code-fold: true
#| cache: true
using Pkg; Pkg.activate(pwd())   # activate the course environment
using LongMemory                 # data, plotting, and estimation tools
NileDataPlot()                   # plot the Nile River data

Long memory

#| echo: true
#| code-fold: true
#| cache: true
using Plots
autocorrelation_plot(NileData().NileMin, 50)                          # sample ACF of the Nile minima
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)  # AR(1) ACF, α = 0.4
title!("AR(1) with α=0.4")

Long memory

#| echo: true
#| code-fold: true
#| cache: true
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
title!("AR(1) with α=0.8")

Long memory

#| echo: true
#| code-fold: true
#| cache: true
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
plot!(1:51, [0.9^i for i in 0:50], linewidth = 3, linestyle = :dashdot)
title!("AR(1) with α=0.9")

Long memory

Definition

We say that a time series \(x_t\) has long memory if: \[\begin{equation}\label{def:cov} \gamma_x(k) \approx C_x k^{2d-1}\quad \text{as}\quad k\to\infty, \end{equation}\] where \(\gamma_x(k)\) is the autocovariance function and \(C_x\in\mathbb{R}\).

Above, \(g(x)\approx h(x)\) as \(x\to x_0\) means that \(g(x)/h(x)\) converges to \(1\) as \(x\) tends to \(x_0\).

Long memory examples

Long memory origins

Reasons for long memory in time series include:

  • The series is the result of cross-sectional aggregation; see Granger (1980) and Haldrup and Vera‐Valdés (2017).

  • The series is the result of shocks of stochastic duration; see Parke (1999).

  • The series was fractionally differenced; see Granger and Joyeux (1980) and Hosking (1981).

Fractional Differencing

Fractional differencing

  • Fractional differencing is the most common approach to modelling long memory in time series.

  • It is based on the idea of extending the concept of differencing to non-integer orders.

  • It was first proposed by Granger and Joyeux (1980) and Hosking (1981).

Fractional differencing

Definition

The series \(x_t\) is said to be integrated of order \(d\), denoted \(I(d)\), if \[ (1-L)^dx_t = \varepsilon_t,\] where \(L\) is the lag operator, and \(\varepsilon_t\) is a white noise process with variance \(\sigma^2\).

  • \((1-L)^d\) is called the fractional difference operator.

  • The white noise property of \(\varepsilon_t\) can be relaxed to allow for more general structures.

Fractional differencing

  • The fractional difference operator is decomposed using the binomial expansion:

\[(1-L)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-L)^k,\]

where \(\binom{d}{k}\) denotes the (generalized) binomial coefficient.
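
As a quick illustration, here is a minimal Julia sketch of these weights (the helper name `frac_diff_weights` is ours, not a LongMemory.jl function); the standard recursion avoids evaluating gamma functions directly:

#| echo: true
#| code-fold: true
# Weights of (1-L)^d: the coefficient of L^k is (-1)^k binom(d, k),
# computed via the recursion w_0 = 1, w_k = w_{k-1} (k - 1 - d) / k.
function frac_diff_weights(d::Real, n::Int)
    w = ones(n)                 # w[k+1] stores w_k
    for k in 1:n-1
        w[k+1] = w[k] * (k - 1 - d) / k
    end
    return w
end

frac_diff_weights(0.4, 5)       # [1.0, -0.4, -0.12, -0.064, -0.0416]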

Fractional differencing

  • The series can be written as

\[x_t = \sum_{k=0}^{\infty} \pi_k \varepsilon_{t-k},\]

where \(\pi_k=\frac{(-1)^k\Gamma(1-d)}{\Gamma(k+1)\Gamma(1-k-d)}\), and \(\Gamma()\) is the gamma function.

  • The coefficients of this representation decay hyperbolically, in contrast to the exponential decay implied by ARMA models; a simulation sketch follows below.
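
As a rough sketch (our own helper, truncating the MA(\(\infty\)) sum at the start of the sample; LongMemory.jl ships proper generators), one can simulate such a series as follows:

#| echo: true
#| code-fold: true
# Fractionally integrate a shock series via the truncated MA(∞) sum
# x_t = Σ_{k<t} ψ_k ε_{t-k}, with ψ_k = ψ_{k-1} (k - 1 + d) / k.
using Random

function frac_integrate(ε::AbstractVector, d::Real)
    n = length(ε)
    ψ = ones(n)                            # ψ[k+1] = Γ(k+d) / (Γ(d) Γ(k+1))
    for k in 1:n-1
        ψ[k+1] = ψ[k] * (k - 1 + d) / k
    end
    return [sum(ψ[1:t] .* ε[t:-1:1]) for t in 1:n]
end

Random.seed!(42)
x = frac_integrate(randn(500), 0.34)       # an FI(0.34) sample path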

Fractional differencing

  • The autocovariance function, \(\gamma(k)\), is given by: \[\gamma(k) = \sigma^2\frac{(-1)^k\Gamma(1-2d)}{\Gamma(1+k-d)\Gamma(1-k-d)}.\]
  • In turn, the autocorrelation function is given by \[\begin{align} \rho(k) &= \frac{(-1)^k\Gamma^2(1-d)}{\Gamma(1+k-d)\Gamma(1-k-d)}\\ &=\frac{\Gamma(1-d)\Gamma(k+d)}{\Gamma(d)\Gamma(1+k-d)}. \end{align}\]

  • Asymptotically, Stirling’s approximation shows that \(\gamma(k)\approx C_\gamma k^{2d-1}\) and \(\rho(k)\approx C_\rho k^{2d-1}\) as \(k\to\infty\), matching the definition of long memory above; a numerical check follows below.
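
A small numerical check (ours, not from the package): compute the exact autocorrelations recursively and compare them against the power law.

#| echo: true
#| code-fold: true
# Exact FI(d) autocorrelations ρ(k) = Γ(1-d)Γ(k+d) / (Γ(d)Γ(1+k-d)),
# computed via the recursion ρ(k) = ρ(k-1) (k - 1 + d) / (k - d).
d = 0.34
ρ = ones(51)
for k in 1:50
    ρ[k+1] = ρ[k] * (k - 1 + d) / (k - d)
end
ρ[51] / 50^(2d - 1)    # the ratio ρ(k) / k^(2d-1) is roughly constant for large k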

Fractional differencing

#| echo: true
#| code-fold: true
#| cache: true

autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
plot!(1:51, [0.9^i for i in 0:50], linewidth = 3, linestyle = :dashdot)
plot!(1:51, fi_cor_vals(51, 0.34), linewidth = 3, linestyle = :solid)   # FI(d) ACF
title!("I(d) [fractional differencing] with d=0.34")

Long Memory Estimation

Long memory estimation

  • There are several methods to estimate the fractional differencing parameter \(d\).

  • The most common are the semi-parametric methods, such as the log-periodogram estimator, and the parametric methods, such as the maximum likelihood estimator.

  • Heuristic methods, such as the log-variance plot, are also used.

Log-variance plot

  • Consider the variance of the sample mean given by

\[\text{Var}(\bar{x}_n) = \frac{1}{n}\left[\gamma(0) + 2\sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)\gamma(k)\right].\]

  • If the second term collapses asymptotically, as with exponential decay, then the variance of the sample mean vanishes at the usual \(1/n\) rate.

  • In contrast, under slower decay the variance of the sample mean may vanish more slowly, or not at all; the rate depends on the decay of the autocovariance function, that is, on \(d\). A numerical sketch follows below.
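
To make the contrast concrete, a minimal sketch (ours) evaluating the exact formula under the two decay patterns:

#| echo: true
#| code-fold: true
# Exact variance of the sample mean from a given autocovariance function.
var_mean(γ, n) = (γ(0) + 2sum((1 - k / n) * γ(k) for k in 1:n-1)) / n

γ_short(k) = 0.8^k                                # exponential decay
γ_long(k) = k == 0 ? 1.0 : k^(2 * 0.34 - 1)       # hyperbolic decay, d = 0.34

var_mean(γ_short, 1_000)    # ≈ 9/1000: the usual 1/n rate
var_mean(γ_long, 1_000)     # vanishes much more slowly, ≈ C n^(2d-1)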

Log-variance plot

  • For long memory series, the following relationship holds: \[\text{Var}\left(\bar{x}_n\right) \approx C_v n^{2d-1}.\]

  • This suggests a way to detect long memory: plot the log-variance of the sample mean against the log of the sample size.

  • If the series has long memory, the log-variance plot should be a straight line with slope \(2d-1\).

  • On the other hand, for short memory series, the log-variance plot should be a straight line with slope \(-1\). A from-scratch sketch of the idea follows below.
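
A minimal from-scratch version (our own helper; `log_variance_plot` in LongMemory.jl implements this properly), treating block means over blocks of size \(m\) as sample means of samples of size \(m\):

#| echo: true
#| code-fold: true
# Compute block means for growing block sizes m, regress the log of their
# variance on log(m); the OLS slope estimates 2d - 1.
function log_variance_slope(x::AbstractVector; sizes = 2 .^ (1:7))
    lv = Float64[]
    for m in sizes
        nb = length(x) ÷ m
        means = [sum(x[(i-1)*m+1:i*m]) / m for i in 1:nb]
        push!(lv, log(sum(abs2, means .- sum(means) / nb) / nb))
    end
    lm = log.(sizes)
    return sum((lm .- sum(lm) / length(lm)) .* (lv .- sum(lv) / length(lv))) /
           sum(abs2, lm .- sum(lm) / length(lm))
end

# Recover d from the slope: d̂ = (slope + 1) / 2.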

Log-variance plot

#| echo: true
#| code-fold: true
#| cache: true

log_variance_plot(NileData().NileMin; m = 300, slope = true, slope2 = true)

Log-variance plot properties

  • It is a simple, heuristic method for gauging the presence of long memory.

  • It is not a formal test, but it is a useful exploratory tool.

  • It is not a consistent estimator of \(d\).

Maximum likelihood estimator

  • The MLE is a parametric estimator based on fitting the autocovariance function of the time series.

  • Let \(X=[x_0,\ldots,x_{T-1}]^\top\) be a sample of size \(T\) of a fractionally differenced time series, and let \(\theta = [d,\sigma^2]^\top\).

  • Under the assumption that \(\varepsilon_t\) follows a normal distribution, \(X\) follows a normal distribution.

Maximum likelihood estimator

  • The probability density of \(X\) is given by: \[ f(X|\theta) = (2\pi)^{-T/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}X^\top\Sigma^{-1}X\right),\] where \(\Sigma\) is the covariance matrix defined as: \[\Sigma = [\gamma(|j-k|)].\]
  • We estimate by maximising the log-likelihood: \[\hat{\theta} = \arg\max_{\theta} \log f(X|\theta).\] A sketch of the likelihood evaluation follows below.
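
A minimal sketch of evaluating this log-likelihood for a given \((d,\sigma^2)\), valid for \(d<1/2\) (our code; a practical implementation would use Sowell’s (1992) refinements, and the outer maximisation would use a numerical optimiser such as Optim.jl, which we assume rather than demonstrate):

#| echo: true
#| code-fold: true
using LinearAlgebra, SpecialFunctions

# Gaussian log-likelihood of an FI(d) sample with Toeplitz covariance
# Σ = [γ(|j-k|)]: γ(0) = σ² Γ(1-2d)/Γ(1-d)², γ(k) = γ(k-1)(k-1+d)/(k-d).
function fi_loglik(X::AbstractVector, d::Real, σ²::Real)
    T = length(X)
    γ = zeros(T)
    γ[1] = σ² * gamma(1 - 2d) / gamma(1 - d)^2
    for k in 1:T-1
        γ[k+1] = γ[k] * (k - 1 + d) / (k - d)
    end
    Σ = [γ[abs(j - k) + 1] for j in 1:T, k in 1:T]
    C = cholesky(Symmetric(Σ))
    # log-determinant via the Cholesky factor: log|Σ| = 2 Σ log L_ii
    return -T / 2 * log(2π) - sum(log, diag(C.L)) - dot(X, C \ X) / 2
end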

MLE properties

  • Under correct specification, the MLE is consistent and asymptotically normally distributed.

  • Under additional regularity conditions, it is asymptotically normally distributed even under misspecification of the error distribution.

  • It is computationally intensive; Sowell (1992) suggested several computational improvements.

  • It depends on the correct specification of the autocovariance function.

Log-periodogram estimators

  • The log-periodogram estimator is a semiparametric estimator in the frequency domain.

  • It was originally proposed by Geweke and Porter-Hudak (1983); hence it is also known as the Geweke–Porter-Hudak (GPH) estimator.

  • Alternative versions are known as Whittle estimators.

Frequency domain

  • The frequency domain refers to the analysis of series (or signals) with respect to frequency rather than time.

  • The idea is that any time series can be decomposed into a sum of sinusoidal functions of different frequencies.

  • A time series \(x_t\) can be written as:

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t,\]

where \(a_j\) and \(b_j\) are the amplitudes of the sinusoidal functions, \(\lambda_j\) are the frequencies, and \(\varepsilon_t\) is the error term.

Frequency domain

The time series shown next is given by \[x_t = x_{1,t}+x_{2,t}+x_{3,t},\]

for \(t=1,\cdots,100\), where

\[\begin{align} x_{1,t} &= 3\cos\left(2\pi\frac{t}{100}\right) + 4\sin\left(2\pi\frac{t}{100}\right),\\ x_{2,t} &= 4\cos\left(2\pi\frac{5t}{100}\right) + 2\sin\left(2\pi\frac{5t}{100}\right),\\ x_{3,t} &= \cos\left(2\pi\frac{40t}{100}\right) + \sin\left(2\pi\frac{40t}{100}\right). \end{align}\]

Frequency domain

#| cache: true

t = 1:100
# Sinusoidal components with 1, 5, and 40 cycles per 100 observations.
x1 = 3cos.(2π * t * 1/100) + 4sin.(2π * t * 1/100)
x2 = 4cos.(2π * t * 5/100) + 2sin.(2π * t * 5/100)
x3 = cos.(2π * t * 40/100) + sin.(2π * t * 40/100)

x = x1 + x2 + x3

p1 = plot(t, x1, label = "Component 1", legend = :bottomleft)
p2 = plot(t, x2, label = "Component 2", legend = :topleft)
p3 = plot(t, x3, label = "Component 3", legend = :topleft)
p4 = plot(t, x, label = "Time series", legend = :bottomleft)
plot(p1, p2, p3, p4, layout = (2,2))

Frequency domain

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t\]

  • The contribution of frequency \(\lambda_j\) is \(a_j^2+b_j^2\).

  • This measures the sample variance at each frequency component, and thus estimates \(\sigma_j^2\), the variance contribution of frequency \(\lambda_j\).

  • Using Euler’s formula, these are computed by the periodogram, \(I_x()\), given by

\[I_x(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^n x_t e^{-i\lambda_j t}\right|^2.\]
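
A minimal sketch of computing it with the fast Fourier transform (the helper name `periodogram` is ours; FFTW.jl is assumed):

#| echo: true
#| code-fold: true
using FFTW

# Periodogram at the Fourier frequencies λ_j = 2πj/n, j = 1, …, n ÷ 2.
function periodogram(x::AbstractVector)
    n = length(x)
    I = abs2.(fft(x)) ./ (2π * n)
    λ = 2π .* (1:n÷2) ./ n
    return λ, I[2:n÷2+1]      # drop the zero frequency
end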

Log-periodogram estimator

  • The low-frequency components of the periodogram are those associated with the longer cycles.

  • For long memory series, the periodogram satisfies:

\[I_x(\lambda)\approx C_f\lambda^{-2d}\quad \text{as}\quad \lambda\to 0.\]

  • Hence, Geweke and Porter-Hudak proposed to estimate \(d\) by fitting a line to the log-periodogram close to the zero frequency.

Log-periodogram estimator

  • The log-periodogram regression is given by:

\[\log(I(\lambda_k)) = c-2d \log(\lambda_k)+u_k,\quad k = 1,\cdots,m,\]

where \(u_k\) is the error term, and \(m\) is a bandwidth parameter that determines the number of frequencies used for estimation.

  • The estimate of \(d\) is retrieved from the slope of the regression line.
  • The bandwidth parameter \(m\) requires a trade-off between bias and variance; a minimal sketch of the estimator follows below.
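
A compact sketch of the regression (our code, reusing the `periodogram` helper defined earlier; \(m = \lfloor\sqrt{T}\rfloor\) is one common default):

#| echo: true
#| code-fold: true
# GPH estimator: regress log I(λ_k) on -2 log(λ_k) over the first m
# Fourier frequencies; the OLS slope is the estimate of d.
function gph(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)              # helper from the previous sketch
    y = log.(I[1:m])
    z = -2 .* log.(λ[1:m])
    z̄, ȳ = sum(z) / m, sum(y) / m
    return sum((z .- z̄) .* (y .- ȳ)) / sum(abs2, z .- z̄)
end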

Log-periodogram estimator

#| echo: true
#| code-fold: true
#| cache: true

periodogram_plot(NileData().NileMin; slope = true)

Log-periodogram estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is computationally less intensive than the MLE.

  • It is robust to the specification of the short memory components.

  • It is not as efficient as the MLE.

  • The bandwidth parameter \(m\) has to be chosen carefully: most practitioners use \(m\approx T^{1/2}\), while some suggest \(m\approx T^{4/5}\), where \(T\) is the sample size.

Whittle estimator

  • An alternative semiparametric formulation was developed by Künsch (1987).

  • The author proposed to estimate the parameter as the minimiser of the local Whittle likelihood function given by \[R(d) = \log\left(\frac{1}{m}\sum_{k=1}^{m}\lambda_k^{2d}I(\lambda_k)\right)-\frac{2d}{m}\sum_{k=1}^{m}\log(\lambda_k),\] where \(m\) is the bandwidth parameter.
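
A minimal sketch (ours) that minimises \(R(d)\) by grid search rather than a proper numerical optimiser, again reusing the `periodogram` helper from before:

#| echo: true
#| code-fold: true
# Local Whittle estimator: minimise R(d) over a grid of candidate values.
function local_whittle(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)
    λm, Im = λ[1:m], I[1:m]
    R(d) = log(sum(λm .^ (2d) .* Im) / m) - 2d * sum(log, λm) / m
    grid = -0.49:0.001:0.99
    return grid[argmin(R.(grid))]
end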

Whittle estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is more efficient than the log-periodogram estimator in the sense that it has a smaller variance.

  • In contrast to log-periodogram regression, the Whittle estimator requires numerical optimisation for estimation.

  • The bandwidth parameter \(m\) has to be chosen carefully.

Long Memory Models

ARFIMA model

  • The ARFIMA model is a generalization of the ARIMA model that includes fractional differencing.

  • The ARFIMA model is defined by the following equation: \[\phi(L)(1-L)^d x_t = \theta(L)\varepsilon_t,\]

where \(\phi(L)\) and \(\theta(L)\) are the autoregressive and moving average polynomials, respectively.
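
For instance, an ARFIMA(1, d, 0) path can be sketched by drawing an AR(1) series and fractionally integrating it with the `frac_integrate` helper from an earlier sketch (our construction, not a LongMemory.jl routine):

#| echo: true
#| code-fold: true
# ARFIMA(1, d, 0): generate u_t = φ u_{t-1} + ε_t, then x_t = (1-L)^{-d} u_t.
using Random
Random.seed!(1)

φ, d, T = 0.5, 0.3, 500
u = zeros(T)
for t in 2:T
    u[t] = φ * u[t-1] + randn()
end
x = frac_integrate(u, d)      # frac_integrate defined in an earlier sketch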

FIGARCH model

  • The FIGARCH model is a generalization of the GARCH model that includes fractional differencing.

  • The FIGARCH model is defined by: \[[1-\alpha(L)-\beta(L)](1-L)^d\varepsilon_t^2=\alpha_0+[1-\beta(L)]\nu_t,\]

where \(\alpha(L)\) and \(\beta(L)\) are lag polynomials and \(\nu_t = \varepsilon_t^2-\sigma_t^2\) is the innovation in the squared shocks.

References

Geweke, John, and Susan Porter-Hudak. 1983. “The Estimation and Application of Long Memory Time Series Models.” Journal of Time Series Analysis 4: 221–38. https://doi.org/10.1111/j.1467-9892.1983.tb00371.x.
Granger, Clive W. J. 1980. “Long Memory Relationships and the Aggregation of Dynamic Models.” Journal of Econometrics 14: 227–38. https://doi.org/10.1016/0304-4076(80)90092-5.
Granger, Clive W. J., and R. Joyeux. 1980. “An Introduction to Long Memory Time Series Models and Fractional Differencing.” Journal of Time Series Analysis 1: 15–29. https://doi.org/10.1111/j.1467-9892.1980.tb00297.x.
Haldrup, Niels, and J. Eduardo Vera‐Valdés. 2017. “Long Memory, Fractional Integration, and Cross-Sectional Aggregation.” Journal of Econometrics 199: 1–11. https://doi.org/10.1016/j.jeconom.2017.03.001.
Hosking, J. R. M. 1981. “Fractional Differencing.” Biometrika 68: 165–76. https://doi.org/10.1093/biomet/68.1.165.
Künsch, Hans. 1987. “Statistical Aspects of Self-Similar Processes.” In Proceedings of the First World Congress of the Bernoulli Society, 1: 67–74.
Parke, W. R. 1999. “What Is Fractional Integration?” Review of Economics and Statistics 81: 632–38. https://doi.org/10.1162/003465399558490.
Sowell, Fallaw. 1992. “Maximum Likelihood Estimation of Stationary Univariate Fractionally Integrated Time Series Models.” Journal of Econometrics 53: 165–88. https://doi.org/10.1016/0304-4076(92)90084-5.