Long Memory Models

Time Series

Department of Mathematical Sciences, Aalborg University

Long Memory

Long memory

  • Long memory, or long-range dependence, in time series analysis deals with the notion that data may have a strong dependence on past values.

  • In particular, the autocorrelation function of the data may decay more slowly than ARMA models can capture.

  • Long memory models are used in climate, finance, biology, economics, and many other fields.

  • Today we will discuss the concept of long memory, why it may occur, and how to model it.

Long memory

The annual flows of the Nile River are the classical example.

Code
using LongMemory
NileDataPlot()

Long memory

Code
using Plots
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50],linewidth=3,linestyle=:dash)
title!("AR(1) with α=0.4")

Long memory

Code
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50],linewidth=3,linestyle=:dash)
plot!(1:51, [0.8^i for i in 0:50],linewidth=3,linestyle=:dot)
title!("AR(1) with α=0.8")

Long memory

Code
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50],linewidth=3,linestyle=:dash)
plot!(1:51, [0.8^i for i in 0:50],linewidth=3,linestyle=:dot)
plot!(1:51, [0.9^i for i in 0:50],linewidth=3,linestyle=:dashdot)
title!("AR(1) with α=0.9")

Long memory

Definition

We say that a time series \(x_t\) has long memory if: \[\begin{equation}\label{def:cov} \gamma_x(k) \approx C_x k^{2d-1}\quad \text{as}\quad k\to\infty, \end{equation}\] where \(\gamma_x(k)\) is the autocovariance function and \(C_x\in\mathbb{R}\).

Above, \(g(x)\approx h(x)\) as \(x\to x_0\) means that \(g(x)/h(x)\) converges to \(1\) as \(x\) tends to \(x_0\).
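
To get a feel for how slowly a hyperbolic rate decays compared with the geometric rate of an ARMA model, the sketch below (plain Julia with Plots; the value of \(d\) and the AR coefficient are purely illustrative) plots the two rates side by side.

Code
using Plots
d  = 0.34                               # illustrative long memory parameter
ks = 1:100                              # lags
hyperbolic = [k^(2d - 1) for k in ks]   # long memory rate k^(2d-1)
geometric  = [0.8^k for k in ks]        # AR(1)-type geometric decay
plot(ks, hyperbolic, linewidth = 3, label = "k^(2d-1), d = 0.34")
plot!(ks, geometric, linewidth = 3, linestyle = :dash, label = "0.8^k")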

Long memory examples

Long memory origins

Reasons for long memory in time series include:

  • The series is a result of cross-sectional aggregation; see Granger (1980) and Haldrup and Vera‐Valdés (2017).

  • The series is a result of shocks of stochastic duration; see Parke (1999).

  • The series was fractionally differenced; see Granger and Joyeux (1980) and Hosking (1981).

Fractional Differencing

Fractional differencing

  • It is the most common approach to model long memory in time series.

  • It is based on the idea of extending the concept of differencing to non-integer orders.

  • It was first proposed by Granger and Joyeux (1980) and Hosking (1981).

Fractional differencing

Definition

The series \(x_t\) is said to be integrated of order \(d\), denoted \(I(d)\), if \[ (1-L)^dx_t = \varepsilon_t,\] where \(L\) is the lag operator, and \(\varepsilon_t\) is a white noise process with variance \(\sigma^2\).

  • \((1-L)^d\) is called the fractional difference operator.

  • The white noise property of \(\varepsilon_t\) can be relaxed to allow for more general structures.

Fractional differencing

  • The fractional difference operator is decomposed using the binomial expansion:

\[(1-L)^d = \sum_{k=0}^{\infty} \begin{pmatrix} d \\ k \end{pmatrix} (-L)^k,\]

where \(\begin{pmatrix} d \\ k \end{pmatrix}\) are the (generalized) binomial coefficients.

Fractional differencing

  • The series can be written as

\[x_t = \sum_{k=0}^{\infty} \pi_k \varepsilon_{t-k},\]

where \(\pi_k=\frac{(-1)^k\Gamma(1-d)}{\Gamma(k+1)\Gamma(1-k-d)}\), and \(\Gamma()\) is the gamma function.

  • The representation has some unique properties compared to the ARMA model.
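
As a complement, the coefficients \(\pi_k\) can be computed without evaluating the gamma function at negative arguments by using the recursion \(\pi_0=1\), \(\pi_k=\pi_{k-1}(k-1+d)/k\), which is algebraically equivalent to the expression above. A minimal sketch follows; the function name and truncation lag are illustrative, not part of LongMemory.jl.

Code
function fi_ma_coefficients(d::Real, K::Integer)
    coeffs = ones(K + 1)          # coeffs[k+1] stores π_k, starting from π_0 = 1
    for k in 1:K
        coeffs[k + 1] = coeffs[k] * (k - 1 + d) / k
    end
    return coeffs
end
fi_ma_coefficients(0.34, 5)       # π_0, …, π_5 for d = 0.34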

Fractional differencing

  • The autocovariance function, \(\gamma(k)\), is given by: \[\gamma(k) = \sigma^2\frac{(-1)^k\Gamma(1-2d)}{\Gamma(1+k-d)\Gamma(1-k-d)}.\]
  • In turn, the autocorrelation function is given by \[\begin{align} \rho(k) &= \frac{(-1)^k\Gamma^2(1-d)}{\Gamma(1+k-d)\Gamma(1-k-d)}\\ &=\frac{\Gamma(1-d)\Gamma(k+d)}{\Gamma(d)\Gamma(1+k-d)}. \end{align}\]

  • Asymptotically, Stirling’s approximation shows that \(\gamma(k)\approx C_\gamma k^{2d-1}\) and \(\rho(k)\approx C_\rho k^{2d-1}\) as \(k\to\infty\), in line with the long memory definition; the sketch below checks the rate numerically.
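
The sketch below evaluates the closed-form autocorrelation and compares it with the asymptotic rate \(k^{2d-1}\); it uses the gamma function from SpecialFunctions.jl, and the lags and the value of \(d\) are illustrative.

Code
using SpecialFunctions
d = 0.34
rho(k) = gamma(1 - d) * gamma(k + d) / (gamma(d) * gamma(1 + k - d))   # exact ρ(k)
[rho(k) / k^(2d - 1) for k in (10, 50, 100)]   # ratios settle towards a constant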

Fractional differencing

Code
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50],linewidth=3,linestyle=:dash)
plot!(1:51, [0.8^i for i in 0:50],linewidth=3,linestyle=:dot)
plot!(1:51, [0.9^i for i in 0:50],linewidth=3,linestyle=:dashdot)
plot!(1:51, fi_cor_vals( 51, 0.34 ), linewidth = 3, line = :solid)
title!("I(d) [fractional differencing] with d=0.34")

Long Memory Estimation

Long memory estimation

  • There are several methods to estimate the fractional differencing parameter \(d\).

  • The most common are the semi-parametric methods, such as the log-periodogram estimator, and the parametric methods, such as the maximum likelihood estimator.

  • Heuristic methods, such as the log-variance plot, are also used.

Log-variance plot

  • Consider the variance of the sample mean given by

\[\text{Var}(\bar{x}_n) = \frac{\gamma(0)}{n} + \frac{2}{n}\sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)\gamma(k).\]

  • If the autocovariances are summable, as with exponential decay, the variance of the sample mean vanishes at the rate \(1/n\).

  • For long memory series, in contrast, the variance of the sample mean may vanish more slowly, or not at all, depending on the rate of decay of the autocovariance function; that is, depending on \(d\).

Log-variance plot

  • For long memory series, the following relationship holds: \[\text{Var}\left(\bar{x}_n\right) \approx C_v n^{2d-1}.\]

  • This suggests a way to check for long memory: plot the log-variance of the sample mean against the log of the sample size.

  • If the series has long memory, the log-variance plot should be approximately a straight line with slope \(2d-1\).

  • For short memory series, on the other hand, the log-variance plot should be approximately a straight line with slope \(-1\); a sketch of one such construction follows.
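
A minimal sketch of one common way to construct such a plot: compute the variance of non-overlapping block means for a range of block sizes and fit a line on the log-log scale. This is only an illustration (the helper name and block sizes are made up, and it is not necessarily the exact construction used by log_variance_plot below); white noise is used, so the slope should be close to \(-1\).

Code
using Statistics, Plots
function log_variance_points(x::AbstractVector, block_sizes)
    ns, vs = Int[], Float64[]
    for n in block_sizes
        nblocks = div(length(x), n)
        nblocks < 2 && continue                                  # need at least two blocks
        block_means = [mean(x[(i - 1) * n + 1:i * n]) for i in 1:nblocks]
        push!(ns, n)
        push!(vs, var(block_means))
    end
    return ns, vs
end
x = randn(1_000)                                                 # short memory benchmark
ns, vs = log_variance_points(x, 2 .^ (1:8))
slope = cov(log.(ns), log.(vs)) / var(log.(ns))                  # OLS slope, ≈ -1 here
scatter(log.(ns), log.(vs), xlabel = "log n", ylabel = "log variance of block means", label = "")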

Log-variance plot

Code
log_variance_plot( NileData().NileMin; m=300, slope = true, slope2 = true )

Log-variance plot properties

  • It is a simple, heuristic way to check for the presence of long memory.

  • It is not a formal test, but it is a useful exploratory tool.

  • It is not a consistent estimator of \(d\).

Maximum likelihood estimator

  • The MLE is a parametric estimator based on the Gaussian likelihood implied by the autocovariance function of the time series.

  • Let \(X=[x_0,\cdots,x_{T-1}]'\) be a sample of size \(T\) of a fractionally differenced time series, and let \(\theta = [d,\sigma^2]'\).

  • Under the assumption that \(\varepsilon_t\) is Gaussian, \(X\) follows a multivariate normal distribution.

Maximum likelihood estimator

  • The probability density of \(X\) is given by: \[ f(X|\theta) = (2\pi)^{-T/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}X^\top\Sigma^{-1}X\right),\] where \(\Sigma\) is the covariance matrix defined as: \[\Sigma = [\gamma(|j-k|)].\]
  • We estimate \(\theta\) by maximising the log-likelihood: \[\hat{\theta} = \arg\max_{\theta} \log f(X|\theta).\] A sketch of this maximisation follows.
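
Below is a sketch of this maximisation for the pure fractional model with zero mean, concentrating the scale parameter out of the likelihood so that only \(d\) is optimised numerically. The autocorrelation recursion follows from the formulas above; the use of Optim.jl, the parameter bounds, and the function names are illustrative, and the \(O(T^3)\) Cholesky factorisation of the full covariance matrix is what makes the estimator computationally demanding.

Code
using LinearAlgebra, Optim
function fi_acf(d, K)
    rho = ones(K + 1)
    for k in 1:K
        rho[k + 1] = rho[k] * (k - 1 + d) / (k - d)       # ρ(k) from the closed form
    end
    return rho
end
function profile_loglik(d, x)
    T = length(x)
    rho = fi_acf(d, T - 1)
    R = [rho[abs(j - k) + 1] for j in 1:T, k in 1:T]      # Toeplitz correlation matrix
    F = cholesky(Symmetric(R))
    z = F.L \ x
    s2 = dot(z, z) / T                                    # profiled-out scale parameter
    return -0.5 * (T * log(s2) + 2 * sum(log, diag(F.L)))
end
# Hypothetical usage with a demeaned data vector `x`:
# res = optimize(d -> -profile_loglik(d, x), 0.01, 0.49)
# d_hat = Optim.minimizer(res)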

MLE properties

  • Under correct specification, the MLE is consistent and asymptotically normally distributed.

  • Under additional regularity conditions, it is asymptotically normally distributed even under misspecification of the error distribution.

  • It is computationally intensive; Sowell (1992) suggested several computational improvements.

  • It depends on the correct specification of the autocovariance function.

Log-periodogram estimators

  • The log-periodogram estimator is a semiparametric estimator in the frequency domain.

  • It was originally proposed by Geweke and Porter-Hudak (1983), hence it is also known as the Geweke-Porter-Hudak estimator or GPH estimator.

  • Alternative versions are known as Whittle estimators.

Frequency domain

  • The frequency domain refers to the analysis of series (or signals) with respect to frequency rather than time.

  • The idea is that any time series can be decomposed into a sum of sinusoidal functions of different frequencies.

  • A time series \(x_t\) can be written as:

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t,\]

where \(a_j\) and \(b_j\) are the amplitudes of the sinusoidal functions, \(\lambda_j\) are the frequencies, and \(\varepsilon_t\) is the error term.

Frequency domain

The time series shown next is given by \[x_t = x_{1,t}+x_{2,t}+x_{3,t},\]

for \(t=1,\cdots,100\), where

\[\begin{align} x_{1,t} &= 3\cos\left(2\pi\frac{t}{100}\right) + 4\sin\left(2\pi\frac{t}{100}\right),\\ x_{2,t} &= 4\cos\left(2\pi\frac{5t}{100}\right) + 2\sin\left(2\pi\frac{5t}{100}\right),\\ x_{3,t} &= \cos\left(2\pi\frac{40t}{100}\right) + \sin\left(2\pi\frac{40t}{100}\right). \end{align}\]
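
A short sketch of how this example series can be constructed and plotted in Julia (using Plots):

Code
using Plots
t = 1:100
x1 = 3 .* cos.(2π .* t ./ 100) .+ 4 .* sin.(2π .* t ./ 100)              # one cycle
x2 = 4 .* cos.(2π .* 5 .* t ./ 100) .+ 2 .* sin.(2π .* 5 .* t ./ 100)    # five cycles
x3 = cos.(2π .* 40 .* t ./ 100) .+ sin.(2π .* 40 .* t ./ 100)            # forty cycles
x = x1 .+ x2 .+ x3
plot(t, x, label = "x_t", linewidth = 2)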

Frequency domain

Frequency domain

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t\]

  • The contribution of frequency \(\lambda_j\) is \(a_j^2+b_j^2\).

  • This measures the contribution of frequency \(\lambda_j\) to the sample variance; it can be read as an estimate of the variance attributable to that frequency component.

  • Using Euler’s formula, these are computed by the periodogram, \(I_x(\cdot)\), given by

\[I_x(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^n x_t e^{-i\lambda_j t}\right|^2.\]
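
A minimal sketch of how the periodogram at the Fourier frequencies \(\lambda_j = 2\pi j/n\) can be computed with the FFT (using FFTW.jl; the series is demeaned first, and this is a plain illustration rather than the implementation used by LongMemory.jl).

Code
using FFTW
function periodogram(x::AbstractVector)
    n = length(x)
    dft = fft(x .- sum(x) / n)              # DFT of the demeaned series
    m = div(n, 2)
    lambdas = 2π .* (1:m) ./ n              # Fourier frequencies, zero frequency excluded
    I = abs2.(dft[2:m + 1]) ./ (2π * n)     # I(λ_j) = |Σ x_t exp(-iλ_j t)|² / (2πn)
    return lambdas, I
end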

Log-periodogram estimator

  • The low-frequency components of the periodogram are those associated with the longer cycles.

  • For long memory series, the periodogram satisfies:

\[I_x(\lambda)\approx C_f\lambda^{-2d}\quad \text{as}\quad \lambda\to 0.\]

  • Hence, Geweke and Porter-Hudak proposed to estimate \(d\) by fitting a line to the log-periodogram close to the zero frequency.

Log-periodogram estimator

  • The log-periodogram regression is given by:

\[\log(I(\lambda_k)) = c-2d \log(\lambda_k)+u_k,\quad k = 1,\cdots,m,\]

where \(u_k\) is the error term, and \(m\) is a bandwidth parameter that determines the number of frequencies used for estimation.

  • The estimate of \(d\) is retrieved from the slope of the regression line.
  • The choice of the bandwidth parameter \(m\) involves a trade-off between bias and variance; a sketch of the regression follows.
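
A sketch of the regression as written on the slide, regressing on \(\log\lambda_k\) (the original GPH paper uses \(\log(4\sin^2(\lambda_k/2))\) as the regressor). The function name and the default bandwidth \(m\approx T^{1/2}\) are illustrative.

Code
using FFTW, Statistics
function gph(x::AbstractVector; m::Integer = floor(Int, sqrt(length(x))))
    n = length(x)
    dft = fft(x .- sum(x) / n)
    lambdas = 2π .* (1:m) ./ n
    I = abs2.(dft[2:m + 1]) ./ (2π * n)     # periodogram at the first m Fourier frequencies
    z, y = log.(lambdas), log.(I)
    slope = cov(z, y) / var(z)              # OLS slope of the log-periodogram regression
    return -slope / 2                       # estimate of d
end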

Log-periodogram estimator

Code
periodogram_plot( NileData().NileMin; slope = true)

Log-periodogram estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is computationally less intensive than the MLE.

  • It is robust to misspecification of the short memory components.

  • It is not as efficient as the MLE.

  • The bandwidth parameter \(m\) has to be chosen carefully: most practitioners use \(m\approx T^{1/2}\), while some suggest \(m\approx T^{4/5}\), where \(T\) is the sample size.

Whittle estimator

  • An alternative semiparametric formulation was developed by Künsch (1987).

  • The author proposed to estimate the parameter as the minimiser of the local Whittle likelihood function given by \[R(d) = \log\left(\frac{1}{m}\sum_{k=1}^{m}\lambda_k^{2d}I(\lambda_k)\right)-\frac{2d}{m}\sum_{k=1}^{m}\log(\lambda_k),\] where \(m\) is the bandwidth parameter; a sketch of the minimisation follows.
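
A sketch of the minimisation, computing the periodogram via the FFT and minimising \(R(d)\) over \(d\) with a one-dimensional optimiser from Optim.jl; the function name, default bandwidth, and parameter bounds are illustrative.

Code
using FFTW, Optim
function local_whittle(x::AbstractVector; m::Integer = floor(Int, sqrt(length(x))))
    n = length(x)
    dft = fft(x .- sum(x) / n)
    lambdas = 2π .* (1:m) ./ n
    I = abs2.(dft[2:m + 1]) ./ (2π * n)     # periodogram at the first m Fourier frequencies
    R(d) = log(sum(lambdas .^ (2d) .* I) / m) - (2d / m) * sum(log.(lambdas))
    res = optimize(R, -0.49, 0.99)          # numerical minimisation over d
    return Optim.minimizer(res)
end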

Whittle estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is more efficient than the log-periodogram estimator in the sense that it has a smaller variance.

  • In contrast to log-periodogram regression, the Whittle estimator requires numerical optimisation for estimation.

  • The bandwidth parameter \(m\) has to be chosen carefully.

Long Memory Models

ARFIMA model

  • The ARFIMA model is a generalization of the ARIMA model that includes fractional differencing.

  • The ARFIMA model is defined by the following equation: \[\phi(L)(1-L)^d x_t = \theta(L)\varepsilon_t,\]

where \(\phi(L)\) and \(\theta(L)\) are the autoregressive and moving average polynomials, respectively.
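
A minimal simulation sketch for the ARFIMA(0, d, 0) special case, obtained by truncating the MA(\(\infty\)) representation \(x_t=\sum_k \pi_k\varepsilon_{t-k}\); the truncation lag, sample size, and function name are illustrative, and this is not the generator provided by LongMemory.jl.

Code
function arfima_0d0(T::Integer, d::Real; K::Integer = 1_000)
    coeffs = ones(K + 1)
    for k in 1:K
        coeffs[k + 1] = coeffs[k] * (k - 1 + d) / k       # π_k recursion
    end
    noise = randn(T + K)                                  # white noise, including burn-in
    return [sum(coeffs[k + 1] * noise[t + K - k] for k in 0:K) for t in 1:T]
end
x = arfima_0d0(500, 0.34)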

FIGARCH model

  • The FIGARCH model is a generalization of the GARCH model that includes fractional differencing.

  • The FIGARCH model is defined by: \[[1-\alpha(L)-\beta(L)](1-L)^d\varepsilon_t^2=\alpha_0+[1-\beta(L)]\nu_t,\]

where \(\alpha(L)\) and \(\beta(L)\) are lag polynomials, and \(\nu_t=\varepsilon_t^2-\sigma_t^2\).

References

Geweke, John, and Susan Porter-Hudak. 1983. “The Estimation and Application of Long Memory Time Series Models.” Journal of Time Series Analysis 4: 221–38. https://doi.org/10.1111/j.1467-9892.1983.tb00371.x.
Granger, Clive W. J. 1980. “Long Memory Relationships and the Aggregation of Dynamic Models.” Journal of Econometrics 14: 227–38. https://doi.org/10.1016/0304-4076(80)90092-5.
Granger, Clive W. J., and Roger Joyeux. 1980. “An Introduction to Long Memory Time Series Models and Fractional Differencing.” Journal of Time Series Analysis 1: 15–29. https://doi.org/10.1111/j.1467-9892.1980.tb00297.x.
Haldrup, Niels, and J. Eduardo Vera‐Valdés. 2017. “Long Memory, Fractional Integration, and Cross-Sectional Aggregation.” Journal of Econometrics 199: 1–11. https://doi.org/10.1016/j.jeconom.2017.03.001.
Hosking, J. R. M. 1981. “Fractional Differencing.” Biometrika 68: 165–76. https://doi.org/10.1093/biomet/68.1.165.
Künsch, Hans R. 1987. “Statistical Aspects of Self-Similar Processes.” Proceedings of the First World Congress of the Bernoulli Society 1: 67–74.
Parke, William R. 1999. “What Is Fractional Integration?” Review of Economics and Statistics 81: 632–38. https://doi.org/10.1162/003465399558490.
Sowell, Fallaw. 1992. “Maximum Likelihood Estimation of Stationary Univariate Fractionally Integrated Time Series Models.” Journal of Econometrics 53: 165–88. https://doi.org/10.1016/0304-4076(92)90084-5.