
10.5 Dealing with categorical variables in ADAMX

When dealing with categorical variables in a regression context, they are typically expanded into a set of dummy variables. So, for example, a variable “promotions” that can be “light”, “medium”, or “heavy” for different observations \(t\) would be expanded into three dummy variables, promoLight, promoMedium, and promoHeavy, each of which is equal to one when the respective promotion type happens and to zero otherwise. When including these variables in the model, we would typically drop one of them (sometimes called the pivot variable) and end up with a model containing two dummy variables of the type:
\[\begin{equation}
y_t = a_0 + a_1 x_{1,t} + \dots + a_n x_{n,t} + d_1 promoLight_t + d_2 promoMedium_t + \epsilon_t ,
\tag{10.31}
\end{equation}\]
where \(d_i\) is the parameter for the \(i\)-th dummy variable. The same procedure can be applied in the context of ADAMX. The logic is exactly the same for ADAMX{S}, but when it comes to the dynamic model, the parameters have time indices, and there are different ways of formulating the model. Here is the first one:
\[\begin{equation}
\begin{aligned}
& y_{t} = & a_{0,t-1} + a_{1,t-1} x_{1,t} + \dots + a_{n,t-1} x_{n,t} + d_{1,t-1} promoLight_t + \\
& & d_{2,t-1} promoMedium_t + \epsilon_t \\
& a_{i,t} = & a_{i,t-1} + \left \lbrace \begin{aligned} &\delta_i \frac{\epsilon_t}{x_{i,t}} \text{ for each } i \in \{1, \dots, n\}, \text{ if } x_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. \\
& d_{1,t} = & d_{1,t-1} + \left \lbrace \begin{aligned} &\delta_{n+1} \epsilon_t, \text{ if } promoLight_t\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. \\
& d_{2,t} = & d_{2,t-1} + \left \lbrace \begin{aligned} &\delta_{n+2} \epsilon_t, \text{ if } promoMedium_t\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right.
\end{aligned} .
\tag{10.32}
\end{equation}\]
Here we assume that each category of the promotion variable changes over time on its own, with its own smoothing parameter (\(\delta_{n+1}\) and \(\delta_{n+2}\) respectively). Alternatively, we can assume that they share the same smoothing parameter, implying that the parameters change in a similar way across the different categories of the variable:
\[\begin{equation}
\begin{aligned}
& d_{1,t} = d_{1,t-1} + \left \lbrace \begin{aligned} &\delta_{n+1} \epsilon_t, \text{ if } promoLight_t\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right. \\
& d_{2,t} = d_{2,t-1} + \left \lbrace \begin{aligned} &\delta_{n+1} \epsilon_t, \text{ if } promoMedium_t\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right.
\end{aligned} .
\tag{10.33}
\end{equation}\]
The rationale for such a restriction is that we might expect the adaptation mechanism to apply to the promotion variable as a whole rather than to its specific values. Indeed, in the example above, the variable of interest is promo, not promoLight or promoMedium separately. Imposing this restriction reduces the number of estimated parameters and might simplify the estimation.
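To see how this could look in practice, here is a minimal sketch in R, assuming the adam() function from the smooth package with its formula and regressors arguments; the artificial data, variable names, and settings are purely illustrative:

```r
# A sketch with artificial data: sales driven by a three-level promotion variable
library(smooth)

set.seed(41)
promo <- factor(sample(c("light", "medium", "heavy"), 120, replace=TRUE),
                levels=c("light", "medium", "heavy"))
salesData <- data.frame(sales = 200 + 20*(promo=="medium") + 50*(promo=="heavy") +
                                rnorm(120, 0, 5),
                        promotion = promo)

# ADAMX{S}: the promotion dummies enter the model with fixed parameters
adamXStatic <- adam(salesData, "ANN", formula=sales~promotion, regressors="use")

# ADAMX{D}: the parameters of the dummies adapt over time
# (the dynamic mechanism discussed above)
adamXDynamic <- adam(salesData, "ANN", formula=sales~promotion, regressors="adapt")
```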

The mechanism (10.33) also becomes useful in connecting the ETSX and the conventional seasonal ETS models. Consider an example of quarterly data with no trend and a categorical variable quarterOfYear, which can be First, Second, Third, or Fourth, depending on the specific observation. For convenience, I will denote the parameters of the dummy variables created from this categorical variable as \(s_{1,t}, s_{2,t}, s_{3,t}, \text{ and } s_{4,t}\). Based on (10.33), the model can then be formulated as:
\[\begin{equation}
\begin{aligned}
y_{t} = & l_{t-1} + s_{1,t-1} quarterOfYear_{1,t} + s_{2,t-1} quarterOfYear_{2,t} + \\
& s_{3,t-1} quarterOfYear_{3,t} + s_{4,t-1} quarterOfYear_{4,t} + \epsilon_t \\
l_t = & l_{t-1} + \alpha \epsilon_t \\
s_{i,t} = & s_{i,t-1} + \left \lbrace \begin{aligned} &\delta \epsilon_t \text{ for each } i \in \{1, \dots, 4\}, \text{ if } quarterOfYear_{i,t}\neq 0 \\ &0 \text{ otherwise } \end{aligned} \right.
\end{aligned} .
\tag{10.34}
\end{equation}\]
We intentionally include all four dummy variables in (10.34) in order to separate the seasonal effect from the level component. While in classical regression this does not make much sense, in the ETSX we avoid the dummy variables trap due to the dynamic update of components and/or parameters (see the discussion in Section 14.9). Having done that, we have just formulated the conventional ETS(A,N,A) model using a set of dummy variables and one smoothing parameter, the difference being that the conventional model relies on the lagged seasonal component instead of the dummy variables:
\[\begin{equation}
\begin{aligned}
& y_{t} = l_{t-1} + s_{t-4} + \epsilon_t \\
& l_t = l_{t-1} + \alpha \epsilon_t \\
& s_t = s_{t-4} + \gamma \epsilon_t
\end{aligned} .
\tag{10.35}
\end{equation}\]
This comparison shows, on the one hand, that the mechanism of ADAMX{D} is natural for ADAM and, on the other hand, that using the same smoothing parameter for different values of a categorical variable can be a reasonable idea, especially when we can assume that all categories of the variable should evolve together.
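This equivalence can also be checked numerically. Below is a minimal, self-contained base R sketch (the function names and values are made up for illustration) that applies the recursions (10.34) and (10.35) to the same quarterly series with matching initial values, \(\alpha\), and \(\delta=\gamma\), and shows that they produce identical fitted values:

```r
# Recursion (10.34): level plus four quarterly dummies with one smoothing parameter
dummyETSX <- function(y, level0, seasonal0, alpha, delta){
  level <- level0
  s <- seasonal0                    # s[i] is the parameter of the i-th quarter dummy
  fitted <- numeric(length(y))
  for(t in seq_along(y)){
    i <- (t - 1) %% 4 + 1           # active quarter at time t (series starts in Q1)
    fitted[t] <- level + s[i]       # l_{t-1} + s_{i,t-1}
    e <- y[t] - fitted[t]
    level <- level + alpha * e      # l_t = l_{t-1} + alpha * e_t
    s[i] <- s[i] + delta * e        # only the active dummy's parameter is updated
  }
  fitted
}

# Recursion (10.35): conventional ETS(A,N,A) with the lag-4 seasonal component
etsANA <- function(y, level0, seasonal0, alpha, gamma){
  level <- level0
  s <- c(seasonal0, numeric(length(y)))  # s[t] holds s_{t-4}; s[t+4] will hold s_t
  fitted <- numeric(length(y))
  for(t in seq_along(y)){
    fitted[t] <- level + s[t]       # l_{t-1} + s_{t-4}
    e <- y[t] - fitted[t]
    level <- level + alpha * e
    s[t + 4] <- s[t] + gamma * e    # s_t = s_{t-4} + gamma * e_t
  }
  fitted
}

set.seed(41)
y <- 100 + rep(c(10, -5, 3, -8), 10) + rnorm(40, 0, 2)
all.equal(dummyETSX(y, 100, c(10, -5, 3, -8), alpha=0.3, delta=0.1),
          etsANA(y, 100, c(10, -5, 3, -8), alpha=0.3, gamma=0.1))
# TRUE: the two formulations produce identical fitted values
```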