How can I select the best SARIMA model?
The aim of this note is to show, using real data, how to select the best SARIMA model for a given time series.
I won’t review the theory behind SARIMA models here. You can either come to my class to learn more about time series 😎😎😎 or read the classical time-series textbooks.
In this tutorial I will use the data departures, available in the fpp2 package, and more precisely its series of permanent departures (the first column). It’s about overseas departures from Australia: I will be using the monthly number of permanent departures from January 1976 to November 2016. The packages needed are fpp2, forecast, urca and ggplot2.
I start by importing the data from the fpp2 package and plotting it:
> library(fpp2)
> library(ggplot2)
> data("departures")
> x=departures[,1]
> x%>%autoplot()+theme_bw()+xlab("Years")+ ylab("Departures")
[Figure: time plot of the monthly permanent departures from Australia]
I will use the auto.arima command, from the forecast package, to check quickly whether this kind of data can be fitted with SARIMA models. If the estimated AIC is non-negative and the log likelihood is negative, we can conclude that SARIMA models can be a good fit for this data. I will also use the auto.arima command to get an idea about the values of \(d\), \(D\) and \(T\).
> library(forecast)
> m0<-auto.arima(x)
> m0
## Series: x
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.5216 -0.4012
## s.e. 0.0436 0.0412
##
## sigma^2 estimated as 0.1002: log likelihood=-130.45
## AIC=266.9 AICc=266.95 BIC=279.46
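As a quick check of this heuristic (a small sketch, not in the original post), both quantities can be read directly from the fitted object:
> # Sketch: read the information criteria off the auto.arima fit.
> m0$aic      # about 266.9, non-negative
> m0$loglik   # about -130.45, negative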
I first notice that I can keep working with SARIMA models for this data. However, I won’t use the model estimated by auto.arima, because I’m not sure that auto.arima selects the best model. Instead, I will difference the series until I obtain stationarity. Once this step is done, I will run a selection procedure that provides the best model fitting my data, which I can then use to make predictions.
I will first plot the process \((1-B)(1-B^{12})X_t\) obtained by differencing my original time series \(X_t\).
> library(latex2exp)
> x%>%diff(12)%>%diff(1)%>%autoplot()+theme_bw()+ylab(TeX("$(1-B)(1-B^{12})X_t$"))+xlab("")
[Figure: time plot of the differenced series \((1-B)(1-B^{12})X_t\)]
I can see from the previous graph that \((1-B)(1-B^{12})X_t\) behaves like a stationary process. I will confirm this with the ADF test and the KPSS test, which can be performed with the commands ur.df and ur.kpss respectively.
Since the previous graph shows that the time series \((1-B)(1-B^{12})X_t\) has no trend and no drift, I will choose type="none" for the ADF test and type="mu" for the KPSS test.
> library(urca)
> t1=x%>%diff(12)%>%diff(1)%>%ur.df(lags = 6,type="none")
> summary(t1)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.97024 -0.18372 0.00008 0.17017 1.33122
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -2.38348 0.20765 -11.478 < 2e-16 ***
## z.diff.lag1 0.86632 0.19067 4.544 7.04e-06 ***
## z.diff.lag2 0.64170 0.16989 3.777 0.000179 ***
## z.diff.lag3 0.55904 0.14808 3.775 0.000180 ***
## z.diff.lag4 0.47815 0.12148 3.936 9.54e-05 ***
## z.diff.lag5 0.27783 0.08903 3.121 0.001915 **
## z.diff.lag6 0.11445 0.04853 2.359 0.018753 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3351 on 471 degrees of freedom
## Multiple R-squared: 0.7267, Adjusted R-squared: 0.7226
## F-statistic: 178.9 on 7 and 471 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -11.4782
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
First, I notice that the regression in the ADF test is significant: a small p-value, an \(R^2\) close to 1, and significant coefficients. The value of the test statistic is -11.4782, which is much smaller than the critical value at the 1% level. The p-value of the hypothesis test \[H_0\,:\,\phi=0\;\mbox{ vs }\;H_1\,:\,\phi<0\] is therefore smaller than 1%. Hence \(H_0\) is rejected and I can conclude that \((1-B)(1-B^{12})X_t\) can be considered a stationary process.
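If you prefer to extract these numbers programmatically rather than read them from the summary, the ur.df object stores them in slots (a minimal sketch using the t1 object above):
> # Sketch: compare the ADF statistic with the 1% critical value.
> t1@teststat[1]                            # -11.4782
> t1@cval["tau1","1pct"]                    # -2.58
> t1@teststat[1] < t1@cval["tau1","1pct"]   # TRUE, so H0 is rejected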
Let’s now perform a KPSS test
> library(urca)
> t2=x%>%diff(12)%>%diff(1)%>%ur.kpss(type="mu")
> summary(t2)
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0221
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The value of the test statistic is 0.0221, smaller than the critical value at the 10% significance level, so the null hypothesis of stationarity is not rejected. We can again conclude that \((1-B)(1-B^{12})X_t\) can be considered a stationary process.
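The same programmatic check works for the KPSS object (a sketch; recall that for KPSS the null hypothesis is stationarity):
> # Sketch: the KPSS statistic stays below the 10% critical value,
> # so the stationarity hypothesis is not rejected.
> t2@teststat                        # 0.0221
> t2@teststat < t2@cval[1,"10pct"]   # TRUE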
I can then conclude that \(d=D=1\) and \(T=12\). Let’s notice that \(T=12\) is expected since we’re dealing with monthly data.
In the next step I will determine the possible values of \(p\), \(P\), \(q\) and \(Q\): I will use the ACF to determine the couples \((q,Q)\) and the PACF to determine the couples \((p,P)\).
> x%>%diff(12)%>%diff(1)%>%ggAcf()+theme_bw()
[Figure: ACF of the differenced series]
I can deduce from the previous graph that the autocorrelations are essentially zero from lag 14 onward, decreasing to zero before that. I will therefore impose \(q+12Q\leq 13\). Let’s recall that \(q+12Q\) is the degree of the MA operator.
Let’s now draw the PACF
> x%>%diff(12)%>%diff(1)%>%ggPacf()+theme_bw()
[Figure: PACF of the differenced series]
The PACF leads to the same conclusion, and I will choose the couples \((p,P)\) such that \(p+12P\leq 13\).
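As a cross-check (hypothetical, not part of the original code), we can enumerate the couples satisfying this constraint and verify that there are 16 of them, which explains the 256 candidate models built below:
> # Sketch: couples (q,Q) with q + 12*Q <= 13; the count is the same for (p,P).
> couples <- expand.grid(q=0:13, Q=0:1)
> couples <- couples[couples$q + 12*couples$Q <= 13,]
> nrow(couples)   # 16, hence 16*16 = 256 models to estimate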
I will now start the selection procedure. I will first construct a matrix with the possible values of \(p\), \(d\), \(q\), \(P\), \(D\), \(Q\) and \(T\), where each row corresponds to one candidate model. In total we have 256 models to estimate. It’s a lot, but it only takes about 10 to 15 minutes to get all the results. I will then eliminate the models whose estimation failed (missing AIC) and the models whose residuals are not white noise.
Construction of the matrix of the parameters \(p\), \(d\), \(q\), \(P\), \(D\), \(Q\) and \(T\):
> qQ=list()
> for(i in 1:14) qQ[[i]]=c(i-1,0)
> qQ[[15]]=c(0,1)
> qQ[[16]]=c(1,1)
> pP=qQ
>
> dt_params=c()
> for(i in 1:16){
+ for(j in 1:16){
+ temp=c(pP[[i]][1],1,qQ[[j]][1],pP[[i]][2],1,
+ qQ[[j]][2],12)
+ dt_params=rbind(temp,dt_params)
+ }
+ }
> colnames(dt_params)=c("p","d","q","P","D","Q","T")
> rownames(dt_params)=1:256
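A quick sanity check (not in the original) confirms that the grid has one row per candidate model:
> # Sketch: 256 rows, one per combination (p,d,q)(P,D,Q)[T].
> dim(dt_params)   # 256 rows, 7 columns
> head(dt_params)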
Note the use of try in the loop: I didn’t want the loop to stop when the estimation algorithm fails to converge; in that case the corresponding entry simply stays empty.
> models=vector("list",256)
> for(i in 1:256){
+ try(models[[i]]<-Arima(x,order = dt_params[i,1:3],
+ seasonal = list(order=dt_params[i,4:6],period=12),
+ lambda = NULL))
+ }
I will now check whether the residuals of each estimated model are white noise, using the command Box.test.2 from the caschrono package.
> library(caschrono)
> aa=rep(NA,256)
> for(i in 1:256){
+ if(length(models[[i]]$residuals)>1){
+ a=Box.test.2(x = models[[i]]$residuals,nlag = 10,type = "Box-Pierce")
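+ # z equals 1 only when no Box-Pierce p-value up to lag 10 falls
+ # below 0.05, i.e. the residuals can be considered white noise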
+ z=prod(1-(a[,2]<.05))
+ if(z==1) aa[i]="y"
+ else aa[i]="n"
+ }
+
+ }
> dt_params2=data.frame(dt_params)
> dt_params2$residuals=aa
Hence the object aa tells us, for each model, whether the white-noise test was passed or not.
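A quick tally (a hypothetical addition) shows how the 256 fits split between models passing the white-noise test ("y"), failing it ("n"), and failed estimations (NA):
> # Sketch: count the outcomes of the white-noise check.
> table(aa, useNA = "ifany")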
I will now create a vector aic that contains the AIC of the estimated models
> aic=rep(NA,256)
> model_names=rep(NA,256)
> for(i in 1:256){
+ if(length(models[[i]]$aic)>0){
+ aic[i]=models[[i]]$aic
+ model_names[i]=as.character(models[[i]])
+ }
+ }
> dt_params2$aic=aic
> dt_params2$model=model_names
> library(DT)
> dt_params2$aic=round(dt_params2$aic,4)
> dt_params2=na.omit(dt_params2)
> datatable(dt_params2,rownames = F)
Let’s notice that several models have a smaller AIC than the one selected by auto.arima. We will then select the models with AIC smaller than 260 and display them all.
> i=as.numeric(rownames(dt_params2)[which(dt_params2$aic<260)])
> res=sapply(i, function(x)as.character(models[[x]]))
> res
## [1] "ARIMA(13,1,11)(0,1,0)[12]" "ARIMA(12,1,1)(0,1,1)[12]"
## [3] "ARIMA(12,1,11)(0,1,0)[12]" "ARIMA(10,1,13)(0,1,0)[12]"
## [5] "ARIMA(8,1,13)(0,1,0)[12]" "ARIMA(8,1,12)(0,1,0)[12]"
## [7] "ARIMA(7,1,13)(0,1,0)[12]" "ARIMA(7,1,12)(0,1,0)[12]"
## [9] "ARIMA(6,1,1)(0,1,1)[12]" "ARIMA(6,1,13)(0,1,0)[12]"
## [11] "ARIMA(5,1,1)(0,1,1)[12]" "ARIMA(5,1,13)(0,1,0)[12]"
## [13] "ARIMA(4,1,1)(0,1,1)[12]" "ARIMA(4,1,13)(0,1,0)[12]"
## [15] "ARIMA(3,1,1)(0,1,1)[12]" "ARIMA(2,1,1)(0,1,1)[12]"
I will now check the significance of the estimated coefficients using the command t_stat from the caschrono package, which performs a \(t\)-test for each coefficient.
> x_test=sapply(i, function(x)t_stat(models[[x]]))
> bb=rep(NA,16)
> for(j in 1:16){
+ temp=t(x_test[[j]])[,2]
+ z=prod((temp<.05))
+ bb[j]=z
+ }
> bb
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Only the last two entries of bb are equal to 1, so only ARIMA(3,1,1)(0,1,1)[12] and ARIMA(2,1,1)(0,1,1)[12] have all their coefficients significant. Let’s display these two models.
> models[[i[length(i)-1]]]
## Series: x
## ARIMA(3,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sma1
## 0.4400 0.2009 0.1103 -0.9706 -0.3790
## s.e. 0.0481 0.0504 0.0489 0.0162 0.0431
##
## sigma^2 estimated as 0.0973: log likelihood=-122.33
## AIC=256.66 AICc=256.84 BIC=281.77
> models[[i[length(i)]]]
## Series: x
## ARIMA(2,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sma1
## 0.4580 0.2415 -0.9636 -0.3678
## s.e. 0.0484 0.0478 0.0184 0.0433
##
## sigma^2 estimated as 0.09814: log likelihood=-124.87
## AIC=259.74 AICc=259.87 BIC=280.66
> as.character(models[[i[length(i)-1]]])
## [1] "ARIMA(3,1,1)(0,1,1)[12]"
I finally keep ARIMA(3,1,1)(0,1,1)[12], the model with the smaller AIC of the two. I fit it on a training sample ending in 2009 and plot the fitted values together with the forecasts:
> x_tr <- window(x,end=2009)
> fit <- Arima(x_tr,order = c(3,1,1),
+ seasonal = list(order=c(0,1,1),period=12),
+ lambda = NULL)
> f_fit<-forecast(fit)
>
> autoplot(x_tr, series="Data") +
+ autolayer(fit$fitted, series="SARIMA(3,1,1)(0,1,1)[12]") +
+ autolayer(f_fit, series="Prediction") +
+ xlab("Year") + ylab("Departures") + ggtitle("Permanent Departures") + theme_bw()+theme(legend.title = element_blank(),legend.position = "bottom")
[Figure: training data, fitted values and forecasts from the SARIMA(3,1,1)(0,1,1)[12] model]
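As a possible extension (not in the original post), the forecasts could be compared with the held-out observations using forecast’s accuracy function; here x_te is a hypothetical test window starting right after the training sample:
> # Hypothetical extension: measure forecast accuracy on held-out data.
> x_te <- window(x, start = c(2009, 2))
> accuracy(f_fit, x_te)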