
1.4 Models, methods, and typical assumptions

While we do not aim to cover models, methods, and the typical assumptions of statistical models in full detail, we need to introduce several important definitions to clarify what we will discuss later in this monograph. For a more detailed discussion, see Chapters 1 and 15 of Svetunkov (2022).

Chatfield et al. (2001) were the first to discuss the distinction between a forecasting model and a forecasting method, although the two are not thoroughly defined in their paper: “method, meaning a computational procedure for producing forecasts”, and “a model, meaning a mathematical representation of reality”. I think it is important to make a proper distinction between the two.

The Cambridge Dictionary (Dictionary, 2021) defines a method as a particular way of doing something. So, a method does not necessarily explain the structure of a phenomenon or how its components or variables interact with each other; it only describes how a value (for example, a point forecast) is produced. In our context, a forecasting method would be a formula that generates point forecasts based on some parameters and the available data. It would not explain what structure underlies the data.

A statistical model, on the other hand, is a “mathematical representation of a real phenomenon with a complete specification of distribution and parameters” (Svetunkov and Boylan, 2023a). It explains what happens inside the data, reveals the structure, and shows how the random variables interact with the structure.

While discussing statistical models, we should also define the true model. It is “the idealistic statistical model that is correctly specified (has all the necessary components in the correct form), and applied to the data in population” (Svetunkov, 2022). Some statisticians also use the term Data Generating Process (DGP) as a synonym for the true model. However, we need to distinguish between the two terms, as DGP implies that the data is somehow generated using a mathematical formula. In real life, the data is never generated from any function; it comes from a measurement of a complex process, influenced by many factors (e.g. the behaviour of a group of customers based on their individual preferences and mental states). The DGP is useful when we want to conduct experiments on simulated data in a controlled environment, but it is not helpful when applying models to the data. Finally, the true model is an abstract notion because it is never known or reachable (e.g. we do not always have all the necessary variables). But it is still a useful one, as it allows us to see what would happen in theory if we knew the model and, more importantly, what would happen if the model we used was wrong (which is always the case in real life).
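To make the idea of a DGP more tangible, here is a minimal sketch in R of how one could generate artificial data for a controlled experiment. The specific structure (a constant level of 100) and the Normal noise with a standard deviation of 10 are arbitrary choices made purely for this illustration.

```r
# A DGP used in a controlled experiment: the data is generated from a
# known structure with Normal noise, something that never happens with
# real data, but is convenient for studying how models behave when the
# truth is known.
set.seed(41)
obsInSample <- 120
mu <- 100                                     # the known structure (a constant level)
y <- mu + rnorm(obsInSample, mean=0, sd=10)   # the artificially generated series

# Because we generated the data ourselves, we can compare an estimate
# of the structure with its true value:
mean(y)    # should be close to 100
```

With real data, such a comparison is impossible, because the true structure is never known.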

Related to this definition is the estimated or applied model, which is the statistical model that is applied to the available sample of data. This model will almost always be wrong because, even if we knew the specification of the true model for some mysterious reason, we would still need to estimate it on our data. In this case, the estimates of parameters would differ from those in the population, and thus the model would still be wrong.

Mathematically, in the simplest case the true model can be written as: \[\begin{equation} y_t = \mu_{y,t} + \epsilon_t, \tag{1.4} \end{equation}\] where \(y_t\) is the actual observed value, \(\mu_{y,t}\) is the structure, and \(\epsilon_t\) is the true noise. If we manage to capture the structure correctly, the model applied to the sample of data would be written as: \[\begin{equation} y_t = \hat{\mu}_{y,t} + e_t, \tag{1.5} \end{equation}\] where \(\hat{\mu}_{y,t}\) is the estimate of the structure \(\mu_{y,t}\) and \(e_t\) is the estimate of the noise \(\epsilon_t\) (also known as the “residuals”). Even if the structure is captured correctly, there would still be a difference between (1.4) and (1.5) because the latter is estimated on a sample of data. However, if the sample size increases and we use an adequate estimation procedure, then due to the Central Limit Theorem (see Chapter 6 of Svetunkov, 2022), the distance between the two models will decrease, and asymptotically (with the increase of the sample size) \(e_t\) will converge to \(\epsilon_t\). This does not happen automatically: several assumptions must hold for it to happen.
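As a rough illustration of this convergence, the following sketch simulates data from a known linear structure, fits a correctly specified regression via lm(), and measures how far the residuals \(e_t\) are from the true noise \(\epsilon_t\) for increasing sample sizes. The coefficients (50 and 1.5) and the noise standard deviation (5) are arbitrary values chosen for the example.

```r
# The applied model (1.5) approaches the true model (1.4) when the
# structure is correctly captured and the sample size grows.
set.seed(41)
for(obsInSample in c(50, 500, 5000)){
    x <- rnorm(obsInSample, 10, 2)
    epsilon <- rnorm(obsInSample, 0, 5)     # the true noise
    y <- 50 + 1.5*x + epsilon               # the true model (1.4)
    fit <- lm(y ~ x)                        # the applied model (1.5)
    e <- residuals(fit)                     # the estimate of the noise
    # The average squared distance between e_t and epsilon_t shrinks
    # as the sample size increases:
    cat("n =", obsInSample, "- mean squared distance:",
        round(mean((e - epsilon)^2), 4), "\n")
}
```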

1.4.1 Assumptions of statistical models

Very roughly, the typical assumptions of statistical models can be split into the following categories (Svetunkov, 2022):

  1. Model is correctly specified:
    1. We have not omitted important variables in the model (underfitting the data);
    2. We do not have redundant variables in the model (overfitting the data);
    3. The necessary transformations of the variables are applied;
    4. We do not have outliers in the model;
  2. Errors are independent and identically distributed (i.i.d.):
    1. There is no autocorrelation in the residuals;
    2. The residuals are homoscedastic (i.e. have constant variance);
    3. The expectation of the residuals is zero, irrespective of the values of the explanatory variables;
    4. The residuals follow the assumed distribution;
    5. More generally speaking, the distribution of residuals does not change over time;
  3. The explanatory variables are not correlated with anything but the response variable:
    1. No multicollinearity;
    2. No endogeneity.

Remark. The third group above relates more to the assumptions of model estimation than to the model itself, but it is useful to keep it in mind during the model-building process.

Many of these assumptions come from the idea that we have correctly captured the structure, meaning that we have not omitted any essential variables, have not included redundant ones, and have transformed all the variables correctly (e.g. taking logarithms where needed). If all these assumptions hold, then we would expect the applied model to converge to the true one as the sample size increases. If some of them do not hold, then the point forecasts from our model might be biased, or we might end up producing wider (or narrower) prediction intervals than expected.
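To give a flavour of how some of the assumptions from the second group could be inspected in practice, here is a minimal sketch relying on a simple regression fitted with lm() and on standard R diagnostic tools; the simulated data, the model, and the specific checks are used purely for illustration.

```r
# Basic residual diagnostics for a simple regression on simulated data.
# The true structure (20 + 2*x) and the noise standard deviation (3)
# are arbitrary values used only for this illustration.
set.seed(41)
x <- rnorm(200, 10, 2)
y <- 20 + 2*x + rnorm(200, 0, 3)
fit <- lm(y ~ x)
e <- residuals(fit)

mean(e)                  # should be close to zero (zero expectation of residuals)
acf(e)                   # check for autocorrelation in the residuals
plot(fitted(fit), e)     # a funnel shape would hint at heteroscedasticity
qqnorm(e); qqline(e)     # compare residuals with the assumed Normal distribution
```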

These assumptions, together with their implications, are discussed in detail using the example of multiple regression in Chapter 15 of Svetunkov (2022). The diagnostics of dynamic models based on these assumptions are discussed in Chapter 14 of this monograph.

References

• Chatfield, C., Koehler, A.B., Ord, J.K., Snyder, R.D., 2001. A New Look at Models for Exponential Smoothing. Journal of the Royal Statistical Society, Series D (The Statistician). 50, 147–159. https://www.jstor.org/stable/2681090
• Dictionary, 2021. Method. https://dictionary.cambridge.org/dictionary/english/method version: 2021-09-02
• Svetunkov, I., 2022. Statistics for business analytics. https://openforecast.org/sba/ version: 31.10.2022
• Svetunkov, I., Boylan, J.E., 2023a. iETS: State Space Model for Intermittent Demand Forecasting. International Journal of Production Economics. 109013. https://doi.org/10.1016/j.ijpe.2023.109013