Time Series: Seasonal Autoregressive Integrated Moving Average (SARIMA)

What is Time Series Analysis?

Time series analysis examines historical data to identify underlying trends and seasonal fluctuations. The term "trend" describes the overarching direction in which the data is moving, either upward or downward. Seasonal variations are the regular fluctuations in the data that recur at fixed intervals. These might be weekly, with some days consistently showing higher or lower sales than others, or monthly or quarterly. Trends and seasonal fluctuations can be very helpful when budgeting, forecasting, and making predictions about the future. Various techniques and models are commonly used to analyze and forecast time series data; one of the most prominent is the Seasonal Autoregressive Integrated Moving Average (SARIMA) model.

Seasonal Autoregressive Integrated Moving Average (SARIMA):

Seasonal Autoregressive Integrated Moving Average (SARIMA) is a statistical time series model used to forecast future values of a series based on historical observations. It is an extension of the ARIMA (Autoregressive Integrated Moving Average) model developed to account for seasonality in time series data. Seasonality is a recurring pattern in a time series that appears at regular intervals, such as weekly, monthly, or yearly. To predict seasonal data correctly, this structure must be incorporated into the statistical model, and SARIMA does this by combining autoregression, differencing, and moving averages with their seasonal counterparts.


Imagine you have a set of data that shows how a certain value changes over time. For example, let's say you have data on monthly sales of a product. SARIMA is a statistical model that helps you make predictions about how this value will change in the future based on patterns it has shown in the past. Now, one important thing to consider is that some values tend to repeat in a cyclical manner. For instance, if you're looking at monthly sales, you might notice that sales are usually higher during certain months of the year, like around the holidays. This repeating pattern is called seasonality.

SARIMA takes this seasonality into account when making predictions. It does this by combining three main components: autoregression, differencing, and moving averages. The autoregression component examines the relationship between past values and future values; it looks at how previous observations of the data help predict what will happen next. The differencing component looks at the change between the current value and a previous value. It removes trends that are not related to seasonality, which is useful when the data shows a long-term increase or decrease over time. The moving average component models the relationship between the current value and past forecast errors (residuals), which helps capture short-term fluctuations that the other components miss.
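To make the differencing idea concrete, here is a minimal sketch in Python using pandas and NumPy. The monthly sales series is synthetic and purely illustrative, built only so the regular and seasonal differencing steps can be shown; the variable names (sales, first_diff, seasonal_diff) are ours, not part of any library.

import numpy as np
import pandas as pd

# Hypothetical monthly sales series with an upward trend and a yearly cycle,
# used only to illustrate the differencing step described above.
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=96, freq="MS")
sales = pd.Series(
    100 + 0.5 * np.arange(96)                       # long-term trend
    + 10 * np.sin(2 * np.pi * np.arange(96) / 12)   # yearly seasonal cycle
    + rng.normal(0, 2, 96),                         # noise
    index=months,
)

# First (non-seasonal) differencing removes the trend: y_t - y_{t-1}
first_diff = sales.diff().dropna()

# Seasonal differencing removes the yearly pattern: y_t - y_{t-12}
seasonal_diff = sales.diff(12).dropna()

print(first_diff.head())
print(seasonal_diff.head())

After differencing, the series should look roughly stationary (no persistent trend or repeating seasonal swing), which is the condition the autoregressive and moving average components rely on.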

Now, to specifically address seasonality, SARIMA includes additional seasonal hyperparameters: P (the order of the seasonal autoregressive component), Q (the order of the seasonal moving average component), D (the order of seasonal differencing), and m (the length of the seasonal period, for example 12 for monthly data with a yearly cycle). These help capture the specific seasonal patterns in the data. To use SARIMA effectively, you need to choose suitable values for these hyperparameters, typically through a combination of trial and error and statistical criteria such as the AIC computed from maximum likelihood fits. Once you have settled on the hyperparameters, you can fit the SARIMA model to your time series data: the model learns its coefficients from the historical data and can then be used to forecast future values.
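As a rough sketch of what fitting and forecasting looks like in practice, the snippet below uses the SARIMAX class from the statsmodels library on the synthetic sales series created above. The orders (1, 1, 1)(1, 1, 1, 12) are placeholders chosen for illustration, not values derived from a real analysis.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Illustrative orders only: (p, d, q) = (1, 1, 1) and (P, D, Q, m) = (1, 1, 1, 12)
# for monthly data with a yearly cycle. Real values should come from ACF/PACF
# analysis or an information-criterion search.
model = SARIMAX(sales, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)   # coefficients are estimated by maximum likelihood

print(result.summary())

# Forecast the next 24 months (two full seasonal cycles ahead)
forecast = result.get_forecast(steps=24)
print(forecast.predicted_mean)
print(forecast.conf_int())

Calling fit() estimates the coefficients by maximum likelihood, and get_forecast() returns both point forecasts and confidence intervals for the requested horizon.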

The great thing about SARIMA is that it's particularly useful when dealing with data that has repeating patterns, such as sales or weather data. It's also easy to use and understand, making it accessible to a wide range of users. SARIMA can even be adjusted to make predictions for several years ahead by taking into account the recurring seasonal patterns in the data.

Algorithms used for SARIMA:

Algorithms are typically used to estimate the model parameters and make predictions. Two commonly used approaches for SARIMA are:

1. Maximum Likelihood Estimation (MLE): MLE is a popular algorithm used to estimate the parameters of SARIMA models. The goal is to find the parameter values that maximize the likelihood of observing the given data. The algorithm iteratively adjusts the parameter values to improve the likelihood until convergence is achieved. This approach is widely used because it provides consistent and efficient estimates of the parameters.

2. Box-Jenkins Methodology: The Box-Jenkins methodology is a systematic approach to time series analysis and forecasting, which includes the estimation of SARIMA models. It involves three main steps: identification, estimation, and diagnostic checking.

·         Identification: In this step, the appropriate SARIMA model is identified by analyzing the autocorrelation and partial autocorrelation plots of the time series data. These plots help determine the order of autoregressive (AR), integrated (I), and moving average (MA) components, as well as the seasonal order.

·         Estimation: Once the model orders are determined, the parameters of the SARIMA model are estimated using MLE or other optimization techniques. The estimation algorithm seeks the parameter values that minimize the model's error or maximize the likelihood function.

·         Diagnostic Checking: After parameter estimation, the residuals (the differences between the observed and predicted values) are analyzed to ensure that the model adequately captures the underlying patterns and is a good fit for the data. Diagnostic checks include examining the autocorrelation of the residuals and performing statistical tests for randomness and independence.

These algorithms are implemented in various software packages and programming languages, such as R (using the `stats` package) and Python (using libraries like `statsmodels` or `pmdarima`). These tools provide functions and methods for model identification, parameter estimation, and diagnostic checking, making it easier to implement SARIMA models.
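As a small illustration of the diagnostic-checking step with statsmodels, the sketch below assumes the fitted result object from the earlier snippet. It inspects the autocorrelation of the residuals, runs a Ljung-Box test, and draws the standard residual plots.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox

# Residuals of an adequate model should behave like white noise
residuals = result.resid

# Visual check: autocorrelation of the residuals
plot_acf(residuals, lags=36)

# Ljung-Box test: large p-values suggest no leftover autocorrelation
print(acorr_ljungbox(residuals, lags=[12, 24]))

# statsmodels bundles several standard residual plots into one figure
result.plot_diagnostics(figsize=(10, 8))
plt.show()

If the residual ACF shows significant spikes or the Ljung-Box p-values are small, the model orders are usually revisited, which is exactly the iterative loop the Box-Jenkins methodology describes.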

Model Specification and Hyperparameter Selection in SARIMA:

The model specification refers to the process of determining the specific configuration of a SARIMA model, conventionally written SARIMA(p, d, q)(P, D, Q)m, where D is the seasonal differencing order and m is the length of the seasonal period. In practice this means selecting appropriate values for the hyperparameters p, d, q, P, and Q, described below (an ACF/PACF plotting sketch follows the list).

·         p (order of the autoregressive component): This hyperparameter determines the number of lagged observations (past values) that influence the current value in the autoregressive component. It represents the "AR" part of SARIMA. Selecting the value of p requires considering the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot of the time series data.

·         d (order of differencing): This hyperparameter determines the number of times the data needs to be differenced to achieve stationarity. Differencing helps remove trends and make the time series stationary, making it easier to model. The value of d can be determined by observing the trend and seasonality in the data.

·         q (order of the moving average component): This hyperparameter determines the number of lagged forecast errors (residuals) that are included in the moving average component. It represents the "MA" part of SARIMA. The selection of q can be guided by examining the ACF and PACF plots of the residuals.

·         P (order of the seasonal autoregressive component): This hyperparameter represents the seasonal counterpart of the autoregressive component. It captures the influence of lagged seasonal observations on the current value. The selection of P involves examining the ACF and PACF plots of the seasonally differenced series.

·         Q (order of the seasonal moving average component): This hyperparameter represents the seasonal counterpart of the moving average component. It captures the influence of lagged seasonal forecast errors on the current value. The selection of Q can be guided by examining the ACF and PACF plots of the seasonal residuals.
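A common way to apply the guidance above is to difference the series once regularly and once seasonally and then read the ACF and PACF plots of the result. The sketch below does this for the synthetic monthly sales series from earlier, assuming a seasonal period of 12; the cut-off points in the plots are only a heuristic starting point for p, q, P, and Q.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# One regular and one seasonal difference (d = 1, D = 1, m = 12)
stationary = sales.diff().diff(12).dropna()

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(stationary, lags=36, ax=axes[0])    # spikes at low lags suggest q; at lags 12, 24 suggest Q
plot_pacf(stationary, lags=36, ax=axes[1])   # spikes at low lags suggest p; at lags 12, 24 suggest P
plt.tight_layout()
plt.show()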

Hyperparameter selection is a crucial step in SARIMA modeling. There are a few common approaches for hyperparameter selection:

·         Manual Selection: Domain knowledge and understanding of the data can guide the selection of hyperparameters. By analyzing plots and examining the characteristics of the time series, an initial set of hyperparameters can be chosen and adjusted iteratively.

·         Information Criteria: Statistical methods such as the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) can be used to compare different SARIMA models. These criteria balance the model's goodness of fit with its complexity, helping to select the most appropriate set of hyperparameters.

·         Grid Search: A systematic approach involves testing different combinations of hyperparameters using a grid search. This involves specifying a range of values for each hyperparameter and evaluating multiple models to find the set of hyperparameters that produces the best results based on a predefined criterion (e.g., minimum AIC), as sketched below.
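Here is a minimal grid-search sketch along the lines described above, again using the synthetic sales series and statsmodels. The search ranges are deliberately tiny to keep the example short; libraries such as pmdarima automate a similar search with their auto_arima function.

import itertools
import warnings
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Small illustrative grid; real searches may cover wider ranges.
p = d = q = range(0, 2)
P = D = Q = range(0, 2)
m = 12  # seasonal period for monthly data

best_aic, best_order, best_seasonal = float("inf"), None, None
warnings.filterwarnings("ignore")  # some candidate models may not converge cleanly

for order in itertools.product(p, d, q):
    for seasonal in itertools.product(P, D, Q):
        try:
            fit = SARIMAX(sales, order=order,
                          seasonal_order=seasonal + (m,)).fit(disp=False)
        except Exception:
            continue
        if fit.aic < best_aic:
            best_aic, best_order, best_seasonal = fit.aic, order, seasonal + (m,)

print(f"Best model: SARIMA{best_order}x{best_seasonal} with AIC = {best_aic:.2f}")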

Conclusion:

In conclusion, Seasonal Autoregressive Integrated Moving Average (SARIMA) models provide a powerful tool for time series analysis and forecasting. SARIMA models excel at capturing seasonal patterns and fluctuations in data, making them particularly useful for datasets with recurring patterns over time. They offer flexibility in modeling trend, seasonality, and noise components, allowing for accurate predictions and interpretation of model parameters.

Overall, SARIMA models offer accurate forecasting capabilities, interpretable parameters, and a well-established methodology for time series analysis. Their ability to handle seasonal patterns and their flexibility in modeling different components make them a valuable tool in the field of time series analysis.

