Long Memory Models

Time Series

Department of Mathematical Sciences, Aalborg University

Long Memory

Long memory

  • Long memory, or long-range dependence, in time series analysis deals with the notion that data may have a strong dependence on past values.

  • In particular, the autocorrelation function of the data may decay more slowly than the exponential rate that ARMA models can capture.

  • Long memory models are used in climate, finance, biology, economics, and many other fields.

  • Today we will discuss the concept of long memory, why it may occur, and how to model it.

Long memory

The Nile River annual minima series is the classical example.

#| echo: true
#| code-fold: true
#| cache: true
using Pkg; Pkg.activate(pwd())   # activate the course environment
using LongMemory                 # data, plotting, and estimation tools
NileDataPlot()                   # plot the Nile River data

Long memory

#| echo: true
#| code-fold: true
#| cache: true
using Plots
autocorrelation_plot(NileData().NileMin, 50)                          # sample ACF of the Nile minima
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)  # AR(1) ACF, α = 0.4
title!("AR(1) with α=0.4")

Long memory

#| echo: true
#| code-fold: true
#| cache: true
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
title!("AR(1) with α=0.8")

Long memory

#| echo: true
#| code-fold: true
#| cache: true
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
plot!(1:51, [0.9^i for i in 0:50], linewidth = 3, linestyle = :dashdot)
title!("AR(1) with α=0.9")

Long memory

Definition

We say that a time series \(x_t\) has long memory if: \[\begin{equation}\label{def:cov} \gamma_x(k) \approx C_x k^{2d-1}\quad \text{as}\quad k\to\infty, \end{equation}\] where \(\gamma_x(k)\) is the autocovariance function and \(C_x\in\mathbb{R}\).

Above, \(g(x)\approx h(x)\) as \(x\to x_0\) means that \(g(x)/h(x)\) converges to \(1\) as \(x\) tends to \(x_0\).

Long memory examples

Long memory origins

Reasons for long memory in time series include:

  • The series is the result of cross-sectional aggregation; see Granger (1980) and Haldrup and Vera‐Valdés (2017).

  • The series is the result of shocks of stochastic duration; see Parke (1999).

  • The series was fractionally differenced; see Granger and Joyeux (1980) and Hosking (1981).

Fractional Differencing

Fractional differencing

  • Fractional differencing is the most common approach to modelling long memory in time series.

  • It is based on the idea of extending the concept of differencing to non-integer orders.

  • It was first proposed by Granger and Joyeux (1980) and Hosking (1981).

Fractional differencing

Definition

The series \(x_t\) is said to be integrated of order \(d\), denoted \(I(d)\), if \[ (1-L)^dx_t = \varepsilon_t,\] where \(L\) is the lag operator, and \(\varepsilon_t\) is a white noise process with variance \(\sigma^2\).

  • \((1-L)^d\) is called the fractional difference operator.

  • The white noise property of \(\varepsilon_t\) can be relaxed to allow for more general structures.

Fractional differencing

  • The fractional difference operator is decomposed using the binomial expansion:

\[(1-L)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-L)^k,\]

where \(\binom{d}{k}\) denotes the (generalized) binomial coefficient.
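
As a quick illustration, here is a minimal Julia sketch of these weights (the helper name `frac_diff_weights` is ours, not a LongMemory.jl function); the standard recursion avoids evaluating gamma functions directly:

#| echo: true
#| code-fold: true
# Weights of (1-L)^d: the coefficient of L^k is (-1)^k binom(d, k),
# computed via the recursion w_0 = 1, w_k = w_{k-1} (k - 1 - d) / k.
function frac_diff_weights(d::Real, n::Int)
    w = ones(n)                 # w[k+1] stores w_k
    for k in 1:n-1
        w[k+1] = w[k] * (k - 1 - d) / k
    end
    return w
end

frac_diff_weights(0.4, 5)       # [1.0, -0.4, -0.12, -0.064, -0.0416]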

Fractional differencing

  • The series can be written as

\[x_t = \sum_{k=0}^{\infty} \pi_k \varepsilon_{t-k},\]

where \(\pi_k=\frac{(-1)^k\Gamma(1-d)}{\Gamma(k+1)\Gamma(1-k-d)}\), and \(\Gamma()\) is the gamma function.

  • The coefficients of this representation decay hyperbolically, in contrast to the exponential decay implied by ARMA models; a simulation sketch follows below.
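
As a rough sketch (our own helper, truncating the MA(\(\infty\)) sum at the start of the sample; LongMemory.jl ships proper generators), one can simulate such a series as follows:

#| echo: true
#| code-fold: true
# Fractionally integrate a shock series via the truncated MA(∞) sum
# x_t = Σ_{k<t} ψ_k ε_{t-k}, with ψ_k = ψ_{k-1} (k - 1 + d) / k.
using Random

function frac_integrate(ε::AbstractVector, d::Real)
    n = length(ε)
    ψ = ones(n)                            # ψ[k+1] = Γ(k+d) / (Γ(d) Γ(k+1))
    for k in 1:n-1
        ψ[k+1] = ψ[k] * (k - 1 + d) / k
    end
    return [sum(ψ[1:t] .* ε[t:-1:1]) for t in 1:n]
end

Random.seed!(42)
x = frac_integrate(randn(500), 0.34)       # an FI(0.34) sample path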

Fractional differencing

  • The autocovariance function, \(\gamma(k)\), is given by: \[\gamma(k) = \sigma^2\frac{(-1)^k\Gamma(1-2d)}{\Gamma(1+k-d)\Gamma(1-k-d)}.\]
  • In turn, the autocorrelation function is given by \[\begin{align} \rho(k) &= \frac{(-1)^k\Gamma^2(1-d)}{\Gamma(1+k-d)\Gamma(1-k-d)}\\ &=\frac{\Gamma(1-d)\Gamma(k+d)}{\Gamma(d)\Gamma(1+k-d)}. \end{align}\]

  • Asymptotically, Stirling’s approximation shows that \(\gamma(k)\approx C_\gamma k^{2d-1}\) and \(\rho(k)\approx C_\rho k^{2d-1}\) as \(k\to\infty\), matching the definition of long memory above; a numerical check follows below.
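
A small numerical check (ours, not from the package): compute the exact autocorrelations recursively and compare them against the power law.

#| echo: true
#| code-fold: true
# Exact FI(d) autocorrelations ρ(k) = Γ(1-d)Γ(k+d) / (Γ(d)Γ(1+k-d)),
# computed via the recursion ρ(k) = ρ(k-1) (k - 1 + d) / (k - d).
d = 0.34
ρ = ones(51)
for k in 1:50
    ρ[k+1] = ρ[k] * (k - 1 + d) / (k - d)
end
ρ[51] / 50^(2d - 1)    # the ratio ρ(k) / k^(2d-1) is roughly constant for large k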

Fractional differencing

#| echo: true
#| code-fold: true
#| cache: true

autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
plot!(1:51, [0.9^i for i in 0:50], linewidth = 3, linestyle = :dashdot)
plot!(1:51, fi_cor_vals(51, 0.34), linewidth = 3, linestyle = :solid)   # FI(d) ACF
title!("I(d) [fractional differencing] with d=0.34")

Long Memory Estimation

Long memory estimation

  • There are several methods to estimate the fractional differencing parameter \(d\).

  • The most common are the semi-parametric methods, such as the log-periodogram estimator, and the parametric methods, such as the maximum likelihood estimator.

  • Heuristic methods, such as the log-variance plot, are also used.

Log-variance plot

  • Consider the variance of the sample mean given by

\[\text{Var}(\bar{x}_n) = \frac{1}{n}\left[\gamma(0) + 2\sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)\gamma(k)\right].\]

  • If the second term collapses asymptotically, as with exponential decay, then the variance of the sample mean vanishes at the usual \(1/n\) rate.

  • In contrast, under slower decay the variance of the sample mean may vanish more slowly, or not at all; the rate depends on the decay of the autocovariance function, that is, on \(d\). A numerical sketch follows below.
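
To make the contrast concrete, a minimal sketch (ours) evaluating the exact formula under the two decay patterns:

#| echo: true
#| code-fold: true
# Exact variance of the sample mean from a given autocovariance function.
var_mean(γ, n) = (γ(0) + 2sum((1 - k / n) * γ(k) for k in 1:n-1)) / n

γ_short(k) = 0.8^k                                # exponential decay
γ_long(k) = k == 0 ? 1.0 : k^(2 * 0.34 - 1)       # hyperbolic decay, d = 0.34

var_mean(γ_short, 1_000)    # ≈ 9/1000: the usual 1/n rate
var_mean(γ_long, 1_000)     # vanishes much more slowly, ≈ C n^(2d-1)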

Log-variance plot

  • For long memory series, the following relationship holds: \[\text{Var}\left(\bar{x}_n\right) \approx C_v n^{2d-1}.\]

  • This suggests a way to detect long memory: plot the log-variance of the sample mean against the log of the sample size.

  • If the series has long memory, the log-variance plot should be a straight line with slope \(2d-1\).

  • On the other hand, for short memory series, the log-variance plot should be a straight line with slope \(-1\). A from-scratch sketch of the idea follows below.
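
A minimal from-scratch version (our own helper; `log_variance_plot` in LongMemory.jl implements this properly), treating block means over blocks of size \(m\) as sample means of samples of size \(m\):

#| echo: true
#| code-fold: true
# Compute block means for growing block sizes m, regress the log of their
# variance on log(m); the OLS slope estimates 2d - 1.
function log_variance_slope(x::AbstractVector; sizes = 2 .^ (1:7))
    lv = Float64[]
    for m in sizes
        nb = length(x) ÷ m
        means = [sum(x[(i-1)*m+1:i*m]) / m for i in 1:nb]
        push!(lv, log(sum(abs2, means .- sum(means) / nb) / nb))
    end
    lm = log.(sizes)
    return sum((lm .- sum(lm) / length(lm)) .* (lv .- sum(lv) / length(lv))) /
           sum(abs2, lm .- sum(lm) / length(lm))
end

# Recover d from the slope: d̂ = (slope + 1) / 2.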

Log-variance plot

#| echo: true
#| code-fold: true
#| cache: true

log_variance_plot(NileData().NileMin; m = 300, slope = true, slope2 = true)

Log-variance plot properties

  • It is a simple, heuristic method for gauging the presence of long memory.

  • It is not a formal test, but it is a useful exploratory tool.

  • It is not a consistent estimator of \(d\).

Maximum likelihood estimator

  • The MLE is a parametric estimator based on fitting the autocovariance function of the time series.

  • Let \(X=[x_0,\ldots,x_{T-1}]^\top\) be a sample of size \(T\) of a fractionally differenced time series, and let \(\theta = [d,\sigma^2]^\top\).

  • Under the assumption that \(\varepsilon_t\) follows a normal distribution, \(X\) follows a normal distribution.

Maximum likelihood estimator

  • The probability density of \(X\) is given by: \[ f(X|\theta) = (2\pi)^{-T/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}X^\top\Sigma^{-1}X\right),\] where \(\Sigma\) is the covariance matrix defined as: \[\Sigma = [\gamma(|j-k|)].\]
  • We estimate by maximising the log-likelihood: \[\hat{\theta} = \arg\max_{\theta} \log f(X|\theta).\] A sketch of the likelihood evaluation follows below.
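
A minimal sketch of evaluating this log-likelihood for a given \((d,\sigma^2)\), valid for \(d<1/2\) (our code; a practical implementation would use Sowell’s (1992) refinements, and the outer maximisation would use a numerical optimiser such as Optim.jl, which we assume rather than demonstrate):

#| echo: true
#| code-fold: true
using LinearAlgebra, SpecialFunctions

# Gaussian log-likelihood of an FI(d) sample with Toeplitz covariance
# Σ = [γ(|j-k|)]: γ(0) = σ² Γ(1-2d)/Γ(1-d)², γ(k) = γ(k-1)(k-1+d)/(k-d).
function fi_loglik(X::AbstractVector, d::Real, σ²::Real)
    T = length(X)
    γ = zeros(T)
    γ[1] = σ² * gamma(1 - 2d) / gamma(1 - d)^2
    for k in 1:T-1
        γ[k+1] = γ[k] * (k - 1 + d) / (k - d)
    end
    Σ = [γ[abs(j - k) + 1] for j in 1:T, k in 1:T]
    C = cholesky(Symmetric(Σ))
    # log-determinant via the Cholesky factor: log|Σ| = 2 Σ log L_ii
    return -T / 2 * log(2π) - sum(log, diag(C.L)) - dot(X, C \ X) / 2
end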

MLE properties

  • Under correct specification, the MLE is consistent and asymptotically normally distributed.

  • Under additional regularity conditions, it is asymptotically normally distributed even under misspecification of the error distribution.

  • It is computationally intensive; Sowell (1992) suggested several computational improvements.

  • It depends on the correct specification of the autocovariance function.

Log-periodogram estimators

  • The log-periodogram estimator is a semiparametric estimator in the frequency domain.

  • It was originally proposed by Geweke and Porter-Hudak (1983); hence it is also known as the Geweke–Porter-Hudak (GPH) estimator.

  • Alternative versions are known as Whittle estimators.

Frequency domain

  • The frequency domain refers to the analysis of series (or signals) with respect to frequency rather than time.

  • The idea is that any time series can be decomposed into a sum of sinusoidal functions of different frequencies.

  • A time series \(x_t\) can be written as:

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t,\]

where \(a_j\) and \(b_j\) are the amplitudes of the sinusoidal functions, \(\lambda_j\) are the frequencies, and \(\varepsilon_t\) is the error term.

Frequency domain

The time series shown next is given by \[x_t = x_{1,t}+x_{2,t}+x_{3,t},\]

for \(t=1,\cdots,100\), where

\[\begin{align} x_{1,t} &= 3\cos\left(2\pi\frac{t}{100}\right) + 4\sin\left(2\pi\frac{t}{100}\right),\\ x_{2,t} &= 4\cos\left(2\pi\frac{5t}{100}\right) + 2\sin\left(2\pi\frac{5t}{100}\right),\\ x_{3,t} &= \cos\left(2\pi\frac{40t}{100}\right) + \sin\left(2\pi\frac{40t}{100}\right). \end{align}\]

Frequency domain

#| cache: true

t = 1:100
# Sinusoidal components with 1, 5, and 40 cycles per 100 observations.
x1 = 3cos.(2π * t * 1/100) + 4sin.(2π * t * 1/100)
x2 = 4cos.(2π * t * 5/100) + 2sin.(2π * t * 5/100)
x3 = cos.(2π * t * 40/100) + sin.(2π * t * 40/100)

x = x1 + x2 + x3

p1 = plot(t, x1, label = "Component 1", legend = :bottomleft)
p2 = plot(t, x2, label = "Component 2", legend = :topleft)
p3 = plot(t, x3, label = "Component 3", legend = :topleft)
p4 = plot(t, x, label = "Time series", legend = :bottomleft)
plot(p1, p2, p3, p4, layout = (2,2))

Frequency domain

\[x_t = \sum_{j=1}^n \left[a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right] + \varepsilon_t\]

  • The contribution of frequency \(\lambda_j\) is \(a_j^2+b_j^2\).

  • This measures the sample variance at each frequency component, and thus estimates \(\sigma_j^2\), the variance contribution of frequency \(\lambda_j\).

  • Using Euler’s formula, these are computed by the periodogram, \(I_x()\), given by

\[I_x(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^n x_t e^{-i\lambda_j t}\right|^2.\]
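
A minimal sketch of computing it with the fast Fourier transform (the helper name `periodogram` is ours; FFTW.jl is assumed):

#| echo: true
#| code-fold: true
using FFTW

# Periodogram at the Fourier frequencies λ_j = 2πj/n, j = 1, …, n ÷ 2.
function periodogram(x::AbstractVector)
    n = length(x)
    I = abs2.(fft(x)) ./ (2π * n)
    λ = 2π .* (1:n÷2) ./ n
    return λ, I[2:n÷2+1]      # drop the zero frequency
end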

Log-periodogram estimator

  • The low-frequency components of the periodogram are those associated with the longer cycles.

  • For long memory series, the periodogram satisfies:

\[I_x(\lambda)\approx C_f\lambda^{-2d}\quad \text{as}\quad \lambda\to 0.\]

  • Hence, Geweke and Porter-Hudak proposed to estimate \(d\) by fitting a line to the log-periodogram close to the zero frequency.

Log-periodogram estimator

  • The log-periodogram regression is given by:

\[\log(I(\lambda_k)) = c-2d \log(\lambda_k)+u_k,\quad k = 1,\cdots,m,\]

where \(u_k\) is the error term, and \(m\) is a bandwidth parameter that determines the number of frequencies used for estimation.

  • The estimate of \(d\) is retrieved from the slope of the regression line.
  • The bandwidth parameter \(m\) requires a trade-off between bias and variance; a minimal sketch of the estimator follows below.
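
A compact sketch of the regression (our code, reusing the `periodogram` helper defined earlier; \(m = \lfloor\sqrt{T}\rfloor\) is one common default):

#| echo: true
#| code-fold: true
# GPH estimator: regress log I(λ_k) on -2 log(λ_k) over the first m
# Fourier frequencies; the OLS slope is the estimate of d.
function gph(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)              # helper from the previous sketch
    y = log.(I[1:m])
    z = -2 .* log.(λ[1:m])
    z̄, ȳ = sum(z) / m, sum(y) / m
    return sum((z .- z̄) .* (y .- ȳ)) / sum(abs2, z .- z̄)
end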

Log-periodogram estimator

#| echo: true
#| code-fold: true
#| cache: true

periodogram_plot(NileData().NileMin; slope = true)

Log-periodogram estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is computationally less intensive than the MLE.

  • It is robust to the specification of the short memory components.

  • It is not as efficient as the MLE.

  • The bandwidth parameter \(m\) has to be chosen carefully: most practitioners use \(m\approx T^{1/2}\), while some suggest \(m\approx T^{4/5}\), where \(T\) is the sample size.

Whittle estimator

  • An alternative semiparametric formulation was developed by Künsch (1987).

  • The author proposed to estimate the parameter as the minimiser of the local Whittle likelihood function given by \[R(d) = \log\left(\frac{1}{m}\sum_{k=1}^{m}\lambda_k^{2d}I(\lambda_k)\right)-\frac{2d}{m}\sum_{k=1}^{m}\log(\lambda_k),\] where \(m\) is the bandwidth parameter.
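
A minimal sketch (ours) that minimises \(R(d)\) by grid search rather than a proper numerical optimiser, again reusing the `periodogram` helper from before:

#| echo: true
#| code-fold: true
# Local Whittle estimator: minimise R(d) over a grid of candidate values.
function local_whittle(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)
    λm, Im = λ[1:m], I[1:m]
    R(d) = log(sum(λm .^ (2d) .* Im) / m) - 2d * sum(log, λm) / m
    grid = -0.49:0.001:0.99
    return grid[argmin(R.(grid))]
end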

Whittle estimator properties

  • It is a consistent estimator of \(d\).

  • It is asymptotically normally distributed.

  • It is more efficient than the log-periodogram estimator in the sense that it has a smaller variance.

  • In contrast to log-periodogram regression, the Whittle estimator requires numerical optimisation for estimation.

  • The bandwidth parameter \(m\) has to be chosen carefully.

Long Memory Models

ARFIMA model

  • The ARFIMA model is a generalization of the ARIMA model that includes fractional differencing.

  • The ARFIMA model is defined by the following equation: \[\phi(L)(1-L)^d x_t = \theta(L)\varepsilon_t,\]

where \(\phi(L)\) and \(\theta(L)\) are the autoregressive and moving average polynomials, respectively.
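
For instance, an ARFIMA(1, d, 0) path can be sketched by drawing an AR(1) series and fractionally integrating it with the `frac_integrate` helper from an earlier sketch (our construction, not a LongMemory.jl routine):

#| echo: true
#| code-fold: true
# ARFIMA(1, d, 0): generate u_t = φ u_{t-1} + ε_t, then x_t = (1-L)^{-d} u_t.
using Random
Random.seed!(1)

φ, d, T = 0.5, 0.3, 500
u = zeros(T)
for t in 2:T
    u[t] = φ * u[t-1] + randn()
end
x = frac_integrate(u, d)      # frac_integrate defined in an earlier sketch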

FIGARCH model

  • The FIGARCH model is a generalization of the GARCH model that includes fractional differencing.

  • The FIGARCH model is defined by: \[[1-\alpha(L)-\beta(L)](1-L)^d\varepsilon_t^2=\alpha_0+[1-\beta(L)]\nu_t,\]

where \(\alpha(L)\) and \(\beta(L)\) are lag polynomials and \(\nu_t = \varepsilon_t^2-\sigma_t^2\) is the innovation in the squared shocks.

References

Geweke, John, and Susan Porter-Hudak. 1983. “The Estimation and Application of Long Memory Time Series Models.” Journal of Time Series Analysis 4: 221–38. https://doi.org/10.1111/j.1467-9892.1983.tb00371.x.
Granger, Clive W. J. 1980. “Long Memory Relationships and the Aggregation of Dynamic Models.” Journal of Econometrics 14: 227–38. https://doi.org/10.1016/0304-4076(80)90092-5.
Granger, Clive W. J., and R. Joyeux. 1980. “An Introduction to Long Memory Time Series Models and Fractional Differencing.” Journal of Time Series Analysis 1: 15–29. https://doi.org/10.1111/j.1467-9892.1980.tb00297.x.
Haldrup, Niels, and J. Eduardo Vera‐Valdés. 2017. “Long Memory, Fractional Integration, and Cross-Sectional Aggregation.” Journal of Econometrics 199: 1–11. https://doi.org/10.1016/j.jeconom.2017.03.001.
Hosking, J. R. M. 1981. “Fractional Differencing.” Biometrika 68: 165–76. https://doi.org/10.1093/biomet/68.1.165.
Künsch, Hans. 1987. “Statistical Aspects of Self-Similar Processes.” In Proceedings of the First World Congress of the Bernoulli Society, 1: 67–74.
Parke, W. R. 1999. “What Is Fractional Integration?” Review of Economics and Statistics 81: 632–38. https://doi.org/10.1162/003465399558490.
Sowell, Fallaw. 1992. “Maximum Likelihood Estimation of Stationary Univariate Fractionally Integrated Time Series Models.” Journal of Econometrics 53: 165–88. https://doi.org/10.1016/0304-4076(92)90084-5.