
5.5 Distributional assumptions in pure additive ADAM

While the conventional ETS assumes that the error term follows the Normal distribution, ADAM ETS introduces some flexibility, implementing the following options for the error term distribution in the additive error models:

  1. Normal: \(\epsilon_t \sim \mathcal{N}(0, \sigma^2)\), meaning that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{N}(\mu_{y,t}, \sigma^2)\);
  2. Laplace: \(\epsilon_t \sim \mathcal{L}(0, s)\), so that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{L}(\mu_{y,t}, s)\);
  3. Generalised Normal: \(\epsilon_t \sim \mathcal{GN}(0, s, \beta)\), leading to \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{GN}(\mu_{y,t}, s, \beta)\);
  4. S (special case of \(\mathcal{GN}\) with \(\beta=0.5\)): \(\epsilon_t \sim \mathcal{S}(0, s)\), implying that \(y_t = \mu_{y,t} + \epsilon_t \sim \mathcal{S}(\mu_{y,t}, s)\),

where \(\mu_{y,t} = \mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}}\) is the one-step-ahead point forecast.
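The four options above are closely related: the Generalised Normal distribution nests the other three. In the parametrisation consistent with the scale formulae used in this section, its density is \[ f(\epsilon_t) = \frac{\beta}{2 s \Gamma(1/\beta)} \exp \left( -\left( \frac{|\epsilon_t|}{s} \right)^{\beta} \right) , \] so that \(\beta=2\) recovers the Normal distribution (with \(s = \sigma \sqrt{2}\)), \(\beta=1\) gives the Laplace, and \(\beta=0.5\) gives the S distribution. Note that the conventional scale of S corresponds to the square root of the respective GN scale, which is why its scale formula below has a different form.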

The conditional moments and stability/forecastability conditions do not change for the model with these assumptions. The main elements that change are the scale and, as a result, the width of prediction intervals. Given that the scales of these distributions can be expressed via the variance of the error term, one can calculate the conditional variance as discussed in Section 5.3 and then use it to obtain the respective scales. Having the scales, it becomes straightforward to calculate the needed quantiles for the prediction intervals. Here are the formulae for the scales of the distributions mentioned above (a worked example follows the list):

  1. Normal: the scale equals the variance, \(\sigma^2_h\);
  2. Laplace: \(s_h = \sigma_h \sqrt{\frac{1}{2}}\);
  3. Generalised Normal: \(s_h = \sigma_h \sqrt{\frac{\Gamma(1/\beta)}{\Gamma(3/\beta)}}\);
  4. S: \(s_h = \sqrt{\sigma_h}\sqrt[4]{\frac{1}{120}}\).
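To make these formulae concrete, here is a minimal sketch in Python (with hypothetical values for the conditional mean and variance, which in practice come from the recursions discussed in Section 5.3), converting the conditional variance into the scale of each distribution and then into the bounds of a 95% prediction interval via the respective quantile functions. SciPy does not implement the S distribution separately, so it is expressed through `gennorm` with \(\beta=0.5\), whose scale corresponds to \(s_h^2\) in the parametrisation above:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

mu_h = 100.0     # conditional h-steps-ahead mean (hypothetical value)
sigma2_h = 25.0  # conditional variance from Section 5.3 (hypothetical value)
sigma_h = np.sqrt(sigma2_h)
q = [0.025, 0.975]  # quantiles for a 95% prediction interval
beta = 1.5          # an example shape parameter of the Generalised Normal

intervals = {
    "Normal": stats.norm.ppf(q, loc=mu_h, scale=sigma_h),
    "Laplace": stats.laplace.ppf(q, loc=mu_h, scale=sigma_h * np.sqrt(1 / 2)),
    "Generalised Normal": stats.gennorm.ppf(
        q, beta, loc=mu_h,
        scale=sigma_h * np.sqrt(Gamma(1 / beta) / Gamma(3 / beta))),
    # S is GN with beta=0.5; SciPy's scale corresponds to s_h^2 here
    "S": stats.gennorm.ppf(
        q, 0.5, loc=mu_h,
        scale=(np.sqrt(sigma_h) * (1 / 120) ** 0.25) ** 2),
}

for name, bounds in intervals.items():
    print(f"{name:>18}: [{bounds[0]:.2f}, {bounds[1]:.2f}]")
```

The heavier-tailed distributions (Laplace, S) produce wider intervals for the same conditional variance, which is the practical consequence of changing the distributional assumption.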

The estimation of the pure additive ADAM can be done via the maximisation of the likelihood of the assumed distribution (see Section 11.1), which in some cases coincides with the minimisation of popular loss functions (e.g. the Normal distribution and MSE, or Laplace and MAE).
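To see this connection in the Normal case, substitute the maximum likelihood estimate of the variance, \(\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^T e_t^2\) (i.e. MSE), into the Normal log-likelihood, which gives the concentrated log-likelihood: \[ \ell = -\frac{T}{2} \left( \log(2\pi) + \log \hat{\sigma}^2 + 1 \right) . \] Maximising \(\ell\) is then equivalent to minimising MSE. An analogous substitution of \(\hat{s} = \frac{1}{T} \sum_{t=1}^T |e_t|\) (i.e. MAE) into the Laplace log-likelihood leads to the equivalence with MAE.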

In addition, the following more exotic options for the additive error models are available in ADAM (a short verification sketch follows the list):

  1. Log-Normal: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \text{log}\mathcal{N}\left(-\frac{\sigma^2}{2}, \sigma^2\right)\), implying that \(y_t = \mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) = \mu_{y,t} + \epsilon_t \sim \text{log}\mathcal{N}\left(\log\mu_{y,t} -\frac{\sigma^2}{2}, \sigma^2\right)\). Here, \(\sigma^2\) is the variance of the error term in logarithms and the \(-\frac{\sigma^2}{2}\) appears due to the restriction \(\text{E}(\epsilon_t)=0\);
  2. Inverse Gaussian: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}(1, \sigma^2)\) with \(y_t=\mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{IG}\left(\mu_{y,t}, \frac{\sigma^2}{\mu_{y,t}}\right)\);
  3. Gamma: \(\left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{\Gamma}(\sigma^{-2}, \sigma^2)\), so that \(y_t = \mu_{y,t} \left(1+\frac{\epsilon_t}{\mu_{y,t}} \right) \sim \mathcal{\Gamma}(\sigma^{-2}, \sigma^2 \mu_{y,t})\).
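As a quick check of these parametrisations, here is a minimal sketch in Python, assuming SciPy's parametrisations of the three distributions (which differ from the ones above; the translation is noted in the comments). It verifies that each multiplier \(\left(1+\frac{\epsilon_t}{\mu_{y,t}}\right)\) has unit mean, which corresponds to the restriction \(\text{E}(\epsilon_t)=0\):

```python
import numpy as np
from scipy import stats

sigma2 = 0.05  # variance of the error term (hypothetical value)

# The multiplier (1 + eps_t / mu_{y,t}) under each assumption,
# parametrised so that its expectation equals one
multiplier = {
    # SciPy's lognorm(s, scale) is exp(N(log(scale), s^2))
    "Log-Normal": stats.lognorm(s=np.sqrt(sigma2), scale=np.exp(-sigma2 / 2)),
    # SciPy's invgauss(mu, scale) has mean mu*scale and variance mu^3*scale^2
    "Inverse Gaussian": stats.invgauss(sigma2, scale=1 / sigma2),
    # Shape-scale Gamma with shape 1/sigma2 and scale sigma2
    "Gamma": stats.gamma(a=1 / sigma2, scale=sigma2),
}

for name, d in multiplier.items():
    # Means are exactly 1; variances equal sigma2, except for the
    # Log-Normal, where the variance is exp(sigma2) - 1
    print(f"{name:>16}: mean = {d.mean():.4f}, variance = {d.var():.4f}")
```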

These distributions become applicable because the original pure additive model (5.5) can be reformulated into: \[\begin{equation} \begin{aligned} {y}_{t} = &\mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}}\left(1 + \frac{\epsilon_t}{\mathbf{w}^\prime \mathbf{v}_{t-\boldsymbol{l}}}\right) \\ \mathbf{v}_{t} = &\mathbf{F} \mathbf{v}_{t-\boldsymbol{l}} + \mathbf{g} \epsilon_t \end{aligned}. \tag{5.23} \end{equation}\] The connection between the two formulations becomes apparent when opening the brackets in the measurement equation of (5.23). Note that in this case the model assumes that the data is strictly positive; while it might be possible to fit the model to data containing negative values, the calculation of the scale and the likelihood might become impossible. Using alternative losses (e.g. MSE) is a potential solution in this case.
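The equivalence of the two formulations is easy to verify numerically. Below is a minimal sketch for the local level model, ETS(A,N,N), in which \(\mathbf{w}=\mathbf{F}=1\) and \(\mathbf{g}=\alpha\), using hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Local level model ETS(A,N,N): w = F = 1, g = alpha (hypothetical values)
alpha, level, sigma = 0.3, 100.0, 5.0

for t in range(10):
    mu = level                              # w' v_{t-l}
    eps = rng.normal(0, sigma)
    y_additive = mu + eps                   # original formulation (5.5)
    y_multiplicative = mu * (1 + eps / mu)  # reformulated form (5.23)
    assert np.isclose(y_additive, y_multiplicative)
    level = level + alpha * eps             # transition equation
print("Both formulations produce identical observations.")
```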