Time Series
Department of Mathematical Sciences, Aalborg University
Long memory, or long-range dependence, in time series analysis deals with the notion that data may have a strong dependence on past values.
In particular, the autocorrelation function of the data may decay more slowly than the exponential rate that ARMA models can capture.
Long memory models are used in climate, finance, biology, economics, and many other fields.
Today we will discuss the concept of long memory, why it may occur, and how to model it.
The classical example is the series of annual flows of the Nile River.
Definition
We say that a time series \(x_t\) has long memory if: \[\begin{equation}\label{def:cov} \gamma_x(k) \approx C_x k^{2d-1}\quad \text{as}\quad k\to\infty, \end{equation}\] where \(\gamma_x(k)\) is the autocovariance function, \(C_x>0\), and \(d\in(0,1/2)\).
Above, \(g(x)\approx h(x)\) as \(x\to x_0\) means that \(g(x)/h(x)\) converges to \(1\) as \(x\) tends to \(x_0\).
Examples include:
The Nile River annual flows.
Stock volatilities; for example, the Chicago Board Options Exchange Volatility Index (VIX).
Temperature series.
Network traffic.
Long memory can arise for several reasons; classical explanations include the cross-sectional aggregation of many short-memory processes and the presence of structural breaks or regime changes.
Definition
The series \(x_t\) is said to be integrated of order \(d\), denoted \(I(d)\), if \[ (1-L)^dx_t = \varepsilon_t,\] where \(L\) is the lag operator, and \(\varepsilon_t\) is a white noise process with variance \(\sigma^2\).
\((1-L)^d\) is called the fractional difference operator.
The white noise property of \(\varepsilon_t\) can be relaxed to allow for more general structures. For \(d\in(0,1/2)\), the process is stationary and exhibits long memory in the sense of the definition above.
\[(1-L)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-L)^k,\]
where \(\binom{d}{k}\) are the (generalized) binomial coefficients.
Inverting the fractional difference operator yields the MA(\(\infty\)) representation \[x_t = \sum_{k=0}^{\infty} \pi_k \varepsilon_{t-k},\]
where \(\pi_k=\frac{(-1)^k\Gamma(1-d)}{\Gamma(k+1)\Gamma(1-k-d)}=\frac{\Gamma(k+d)}{\Gamma(d)\Gamma(k+1)}\), and \(\Gamma(\cdot)\) is the gamma function.
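For computation, the coefficients satisfy the simple ratio \(\pi_k/\pi_{k-1}=(k-1+d)/k\), which avoids evaluating gamma functions that overflow for moderate \(k\). A minimal Julia sketch (the helper name `fi_weights` is ours, not from any package):

```julia
# First n coefficients π_0, …, π_{n-1} of the MA(∞) representation of (1-L)^{-d},
# computed recursively via π_k = π_{k-1} * (k - 1 + d) / k with π_0 = 1.
function fi_weights(d::Real, n::Int)
    w = ones(n)               # w[k+1] stores π_k
    for k in 1:n-1
        w[k+1] = w[k] * (k - 1 + d) / k
    end
    return w
end

fi_weights(0.34, 5)           # π_0, …, π_4 for d = 0.34
```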
In turn, the autocorrelation function is given by \[\begin{align} \rho(k) &= \frac{(-1)^k\Gamma^2(1-d)}{\Gamma(1+k-d)\Gamma(1-k-d)}\\ &=\frac{\Gamma(1-d)\Gamma(k+d)}{\Gamma(d)\Gamma(1+k-d)}. \end{align}\]
Asymptotically, Stirling's approximation shows that \(\rho(k)\approx \frac{\Gamma(1-d)}{\Gamma(d)}k^{2d-1}\) as \(k\to\infty\), so both \(\gamma(k)\) and \(\rho(k)\) decay hyperbolically at rate \(k^{2d-1}\), as required by the definition of long memory.
```julia
using Plots
# `NileData`, `autocorrelation_plot`, and `fi_cor_vals` are assumed to be
# provided by the course environment (e.g. the LongMemory.jl package):
# sample ACF of the Nile minima against three exponential (short memory)
# decays and the hyperbolic decay of an I(d) process with d = 0.34.
autocorrelation_plot(NileData().NileMin, 50)
plot!(1:51, [0.4^i for i in 0:50], linewidth = 3, linestyle = :dash)
plot!(1:51, [0.8^i for i in 0:50], linewidth = 3, linestyle = :dot)
plot!(1:51, [0.9^i for i in 0:50], linewidth = 3, linestyle = :dashdot)
plot!(1:51, fi_cor_vals(51, 0.34), linewidth = 3, linestyle = :solid)
title!("I(d) [fractional differencing] with d=0.34")
```
There are several methods to estimate the fractional differencing parameter \(d\).
The most common are the semi-parametric methods, such as the log-periodogram estimator, and the parametric methods, such as the maximum likelihood estimator.
Heuristic methods, such as the log-variance plot, are also used.
The method builds on the variance of the sample mean: \[\text{Var}(\bar{x}_n) = \frac{\gamma(0)}{n} + \frac{2}{n}\sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)\gamma(k).\]
If the second term vanishes asymptotically, as it does when the autocovariances decay exponentially and are therefore summable, the variance of the sample mean goes to zero at the usual rate \(1/n\).
By contrast, when the autocovariance function decays hyperbolically, the sum of autocovariances may diverge, and the rate at which the variance of the sample mean vanishes depends on \(d\).
For long memory series, the following relationship holds: \[\text{Var}\left(\bar{x}_n\right) \approx C_v n^{2d-1}.\]
This suggests a way to check whether the series has long memory: plot the log-variance of the sample mean against the log of the sample size.
If the series has long memory, the log-variance plot should be a straight line with slope \(2d-1\).
On the other hand, for short memory series, the log-variance plot should be a straight line with slope \(-1\).
It is a simple heuristic: not a formal test, but a useful exploratory tool for detecting long memory.
It is not a consistent estimator of \(d\).
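As an illustration, here is a minimal log-variance plot in Julia; the geometric grid of block sizes and the OLS slope fit are illustrative choices, and `log_variance_plot` is our own name:

```julia
using Statistics, Plots

# Variance of non-overlapping block means versus block size, on log-log scale.
# The fitted slope is roughly 2d - 1 under long memory and -1 under short memory.
function log_variance_plot(x::AbstractVector)
    sizes = unique(round.(Int, exp.(range(log(2), log(length(x) ÷ 4), length = 20))))
    vars = [var([mean(view(x, (i-1)*m+1:i*m)) for i in 1:length(x) ÷ m]) for m in sizes]
    X = [ones(length(sizes)) log.(sizes)]
    slope = (X \ log.(vars))[2]           # OLS slope of the log-log relationship
    scatter(log.(sizes), log.(vars), label = "",
            xlabel = "log block size", ylabel = "log variance of block mean",
            title = "slope ≈ $(round(slope, digits = 2)), implied d ≈ $(round((slope + 1) / 2, digits = 2))")
end
```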
The MLE is a parametric estimator obtained by maximising the likelihood implied by the autocovariance function of the time series.
Let \(X=[x_0,\cdots,x_{T-1}]'\) be a sample of size \(T\) of a fractionally differenced time series, and let \(\theta = [d,\sigma^2]'\).
Under the assumption that \(\varepsilon_t\) follows a normal distribution, \(X\) follows a multivariate normal distribution with mean zero and covariance matrix determined by \(\theta\).
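For a zero-mean Gaussian series this is the standard multivariate normal log-likelihood; writing \(\Sigma(\theta)\) for the \(T\times T\) covariance matrix with entries \(\gamma(|i-j|)\), \[\log L(\theta) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log\left|\Sigma(\theta)\right| - \frac{1}{2}X'\Sigma(\theta)^{-1}X,\] and the MLE \(\hat{\theta}\) maximises this expression numerically.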
Under correct specification, MLE is consistent and normally distributed.
Under additional regularity conditions, it is asymptotically normally distributed even under misspecification of the error distribution.
It is computationally intensive; Sowell (1992) suggested several computational improvements.
It depends on the correct specification of the autocovariance function.
The log-periodogram estimator is a semiparametric estimator in the frequency domain.
It was originally proposed by Geweke and Porter-Hudak (1983), hence it is also known as the Geweke-Porter-Hudak estimator or GPH estimator.
Alternative versions are known as Whittle estimators.
The frequency domain refers to the analysis of series (or signals) with respect to frequency rather than time.
The idea is that any time series can be decomposed into a sum of sinusoidal functions of different frequencies.
\[x_t = \sum_{j=1}^n \left(a_j\cos(2\pi \lambda_j t) + b_j\sin(2\pi \lambda_j t)\right) + \varepsilon_t,\]
where \(a_j\) and \(b_j\) are the amplitudes of the sinusoidal functions, \(\lambda_j\) are the frequencies, and \(\varepsilon_t\) is the error term.
The following time series is given by \[x_t = x_{1,t}+x_{2,t}+x_{3,t},\]
for \(t=1,\cdots,100\), where
\[\begin{align} x_{1,t} &= 3\cos\left(2\pi\frac{t}{100}\right) + 4\sin\left(2\pi\frac{t}{100}\right),\\ x_{2,t} &= 4\cos\left(2\pi\frac{5t}{100}\right) + 2\sin\left(2\pi\frac{5t}{100}\right),\\ x_{3,t} &= \cos\left(2\pi\frac{40t}{100}\right) + \sin\left(2\pi\frac{40t}{100}\right). \end{align}\]
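The example is straightforward to reproduce; a short sketch using Plots.jl:

```julia
using Plots

t = 1:100
x1 = 3cos.(2π * t / 100) .+ 4sin.(2π * t / 100)      # one cycle over the sample
x2 = 4cos.(2π * 5t / 100) .+ 2sin.(2π * 5t / 100)    # five cycles
x3 = cos.(2π * 40t / 100) .+ sin.(2π * 40t / 100)    # forty cycles
x = x1 .+ x2 .+ x3
plot(t, x, label = "x_t", xlabel = "t")
```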
In this decomposition, the contribution of frequency \(\lambda_j\) to the variance of the series is \(a_j^2+b_j^2\).
The periodogram measures the sample variance attributable to each frequency component and thus estimates this contribution at each frequency \(\lambda_j\).
It is computed as \[I_x(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^n x_t e^{-i\lambda_j t}\right|^2.\]
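A direct FFT implementation as a sketch (using FFTW.jl; the helper name `periodogram` is ours, and demeaning the series first is a practical choice so the zero frequency does not dominate):

```julia
using FFTW, Statistics

# Periodogram at the Fourier frequencies λ_j = 2πj/n, j = 1, …, ⌊n/2⌋
function periodogram(x::AbstractVector)
    n = length(x)
    I = abs2.(fft(x .- mean(x))) ./ (2π * n)
    λ = 2π .* (1:n÷2) ./ n
    return λ, I[2:n÷2+1]      # I[j+1] corresponds to frequency 2πj/n
end
```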
The low-frequency components of the periodogram are those associated with the longer cycles.
For long memory series, the periodogram satisfies:
\[I_x(\lambda)\approx C_f\lambda^{-2d}\quad \text{as}\quad \lambda\to 0.\]
Taking logarithms suggests the linear regression \[\log(I(\lambda_k)) = c-2d \log(\lambda_k)+u_k,\quad k = 1,\cdots,m,\]
where \(u_k\) is the error term, and \(m\) is a bandwidth parameter that determines the number of frequencies used for estimation.
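The estimator then amounts to a single OLS regression. A sketch reusing the `periodogram` helper above, with the common default bandwidth \(m=\lfloor T^{1/2}\rfloor\) (the name `gph_estimate` is ours):

```julia
# Log-periodogram (GPH) estimator of d
function gph_estimate(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)
    y = log.(I[1:m])
    X = [ones(m) (-2 .* log.(λ[1:m]))]
    β = X \ y                 # OLS of log I(λ_k) on a constant and -2 log(λ_k)
    return β[2]               # the slope coefficient estimates d
end
```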
It is a consistent estimator of \(d\).
It is asymptotically normally distributed.
It is computationally less intensive than the MLE.
It is robust to the specification of the short memory components.
It is not as efficient as the MLE.
The bandwidth parameter \(m\) has to be chosen carefully: most practitioners use \(m\approx T^{1/2}\), while some suggest \(m\approx T^{4/5}\), where \(T\) is the sample size.
An alternative semiparametric formulation was developed by Künsch (1987).
The author proposed to estimate the parameter as the minimiser of the local Whittle likelihood function given by \[R(d) = \log\left(\frac{1}{m}\sum_{k=1}^{m}\lambda_k^{2d}I(\lambda_k)\right)-\frac{2d}{m}\sum_{k=1}^{m}\log(\lambda_k),\] where \(m\) is the bandwidth parameter.
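The objective is easy to code; a sketch minimising \(R(d)\) over a grid for simplicity (in practice a univariate optimiser such as Brent's method would be used), reusing the `periodogram` helper above:

```julia
# Local Whittle estimator of d over the stationary-invertible range
function local_whittle(x::AbstractVector; m::Int = floor(Int, sqrt(length(x))))
    λ, I = periodogram(x)
    R(d) = log(sum(λ[1:m] .^ (2d) .* I[1:m]) / m) - 2d * mean(log.(λ[1:m]))
    grid = -0.49:0.001:0.49
    return grid[argmin(R.(grid))]
end
```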
It is a consistent estimator of \(d\).
It is asymptotically normally distributed.
It is more efficient than the log-periodogram estimator in the sense that it has a smaller variance.
In contrast to log-periodogram regression, the Whittle estimator requires numerical optimisation for estimation.
The bandwidth parameter \(m\) has to be chosen carefully.
The ARFIMA model is a generalization of the ARIMA model that includes fractional differencing.
The ARFIMA model is defined by the following equation: \[\phi(L)(1-L)^d x_t = \theta(L)\varepsilon_t,\]
where \(\phi(L)\) and \(\theta(L)\) are the autoregressive and moving average polynomials, respectively.
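As an end-to-end illustration, an ARFIMA(0, d, 0) series can be simulated from the truncated MA(\(\infty\)) representation and \(d\) recovered with the semiparametric estimators sketched earlier; the truncation lag and seed below are arbitrary choices:

```julia
using Random

Random.seed!(1234)
n, d, q = 5_000, 0.3, 1_000
ε = randn(n + q)                                   # white noise with burn-in
w = fi_weights(d, q)                               # truncated MA(∞) weights
x = [sum(w .* view(ε, t:-1:t-q+1)) for t in q:(n+q-1)]
gph_estimate(x), local_whittle(x)                  # both should be near 0.3
```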
The FIGARCH model is a generalization of the GARCH model that includes fractional differencing.
The FIGARCH model is defined by: \[[1-\alpha(L)-\beta(L)](1-L)^d\varepsilon_t^2=\alpha_0+[1-\beta(L)]\nu_t,\]
where \(\alpha(L)\) and \(\beta(L)\) are lag polynomials, and \(\nu_t=\varepsilon_t^2-\sigma_t^2\) is the innovation in the squared errors.