How can I select the best SARIMA model
The aim of this note is to show, using a real data, how to select the best a SARIMA model for a given time series.
I won’t remind here the theory behind SARIMA models. You can either come to my class to learn more about time series 😎😎😎 or read the following books
I will use in this tutorial
departures
available in fpp2
package and I will the series permanent
. It’s about Overseas departures from Australia and I will be using the monthly number permanent departures from January 1976 - November 2016.fpp2
, forecast
, urca
and ggplot2
I start then by importing the data from fpp2
package and representing it in a graph
> library(fpp2)
> library(ggplot2)
> data("departures")
> x=departures[,1]
> x%>%autoplot()+theme_bw()+xlab("Years")+ ylab("Departures")
auto.arima
, from forecast
package, command to check quickly if this kind of data can be fitted using SARIMA models. I the estimated AIC is non-negative and if the log likelihood is negative we can conclude that SARIMA models can be good for this data. I will also use auto.sarima
command to get an idea about the values of \(d\), \(D\) and \(T\).> library(forecast)
> m0<-auto.arima(x)
> m0
## Series: x
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.5216 -0.4012
## s.e. 0.0436 0.0412
##
## sigma^2 estimated as 0.1002: log likelihood=-130.45
## AIC=266.9 AICc=266.95 BIC=279.46
I can first notice that I can continue using SARIMA models that fits with this data. I won’t use the previous estimated model by auto.arima
because I’m not sure that I’m selecting the best model using the auto.arima
command. I will differentiate the series until I obtain stationary. Once this step is done, I will run a selection procedure that we provide the best model adjusting my data that I can use it to make predictions.
I will first represent the process \((1-B)(1-B^12)X_t\) obtained by differentiating my original time series \(X_t\).
> library(latex2exp)
> x%>%diff(12)%>%diff(1)%>%autoplot()+theme_bw()+ylab(TeX("$(1-B)(1-B^2)X_t$}$"))+xlab("")
I can notice from the previous graph that \((1-B)(1-B^12)X_t\) has a stationary behavior. I will confirm this using ADF test and KPSS test too. Both tests can be performed using respectively the commands ur.df
and ur.kpss
.
Since we notice in the previous graph that the time series \((1-B)(1-B^12)X_t\) has no trend and no drift. I will then choose the type="none"
for the ADF test and the type=mu
for the KPSS test.
> library(urca)
> t1=x%>%diff(12)%>%diff(1)%>%ur.df(lags = 6,type="none")
> summary(t1)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.97024 -0.18372 0.00008 0.17017 1.33122
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -2.38348 0.20765 -11.478 < 2e-16 ***
## z.diff.lag1 0.86632 0.19067 4.544 7.04e-06 ***
## z.diff.lag2 0.64170 0.16989 3.777 0.000179 ***
## z.diff.lag3 0.55904 0.14808 3.775 0.000180 ***
## z.diff.lag4 0.47815 0.12148 3.936 9.54e-05 ***
## z.diff.lag5 0.27783 0.08903 3.121 0.001915 **
## z.diff.lag6 0.11445 0.04853 2.359 0.018753 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3351 on 471 degrees of freedom
## Multiple R-squared: 0.7267, Adjusted R-squared: 0.7226
## F-statistic: 178.9 on 7 and 471 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -11.4782
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
First, I notice that the regression in the ADF test is significant ; a small p-value, an \(R^2\) close to 1 and the coefficients are significant. The value of test-statistic is -11.4782. It’s very smaller that the critical value corresponding to the p-value 1%. Then the p-value of the hypothesis test: \[H_0\,:\, \phi=0\;\mbox{ vs }\;$H_1\,:\, \phi<0\] Hence \(H_0\) is rejected and then I can conclude that \((1-B)(1-B^12)X_t\) can be considered as a stationary process.
Let’s now perform a KPSS test
> library(urca)
> t2=x%>%diff(12)%>%diff(1)%>%ur.kpss(type="mu")
> summary(t2)
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0221
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The value of the test-statistic is .0221 smaller than the critical value corresponding to the p-value 10%. We can also conclude that \((1-B)(1-B^12)X_t\) can be considered as a stationary process.
I can then conclude that \(d=D=1\) and \(T=12\). Let’s notice that \(T=12\) is expected since we’re dealing with monthly data.
In the next step I will determine the possible values of \(p\), \(P\), \(q\) and \(Q\). I will ACF to determine the couples \((q,Q)\) and I will use PACF to determine the couples \((p,P)\).
> x%>%diff(12)%>%diff(1)%>%ggAcf()+theme_bw()
I can deduce from the previous graph that the ACF are equal to zero from the order 14 and these ACF are decreasing to zero. Then \(q+12Q\leq 13\). Let’s recall that \(q+12Q\) is the degree of the MA operator.
Let’s now draw the PACF
> x%>%diff(12)%>%diff(1)%>%ggPacf()+theme_bw()
I will then have the same conclusion as in the ACF and I will choose the couples \(p+12P\) such that \(p+12P\leq 13\).
I will now start the selection procedure. I will first construct a matrix with the possibles values of \(p\), \(d\), \(q\), \(P\), \(D\), \(Q\) and \(T\). Each row corresponds to each case. In the total we have to estimate 256. It’s a lot but it just takes about 10 to 15 minutes to get all results. I will then eliminate the models with negative AIC and then I will eliminate the models that generate not white noise residuals.
Construction of the matrix of the parameters \(p\), \(d\), \(q\), \(P\), \(D\), \(Q\) and \(T\):
> qQ=list()
> for(i in 1:14) qQ[[i]]=c(i-1,0)
> qQ[[15]]=c(0,1)
> qQ[[16]]=c(1,1)
> pP=qQ
>
> dt_params=c()
> for(i in 1:16){
+ for(j in 1:16){
+ temp=c(pP[[i]][1],1,qQ[[j]][1],pP[[i]][2],1,
+ qQ[[j]][2],12)
+ dt_params=rbind(temp,dt_params)
+ }
+ }
> colnames(dt_params)=c("p","d","q","P","D","Q","T")
> rownames(dt_params)=1:256
try
in the loop. Indeed I didn’t want that the loops stops when the estimation algorithm doesn’t converge. It will just provide a void object.> models=vector("list",256)
> for(i in 1:256){
+ try(models[[i]]<-Arima(x,order = dt_params[i,1:3],
+ seasonal = list(order=dt_params[i,4:6],period=12),
+ lambda = NULL))
+ }
Box.test.2
from caschrono
package.> library(caschrono)
> aa=rep(NA,256)
> for(i in 1:256){
+ if(length(models[[i]]$residuals)>1){
+ a=Box.test.2(x = models[[i]]$residuals,nlag = 10,type = "Box-Pierce")
+ z=prod(1-(a[,2]<.05))
+ if(z==1) aa[i]="y"
+ else aa[i]="n"
+ }
+
+ }
> dt_params2=data.frame(dt_params)
> dt_params2$residuals=aa
Hence the object aa
will tell us if the white-noise test was successful or not.
I will now create a vector aic
that contains the AIC of the estimated models
> aic=rep(NA,256)
> model_names=rep(NA,256)
> for(i in 1:256){
+ if(length(models[[i]]$aic)>0){
+ aic[i]=models[[i]]$aic
+ model_names[i]=as.character(models[[i]])
+ }
+ }
> dt_params2$aic=aic
> dt_params2$model=model_names
> library(DT)
> dt_params2$aic=round(dt_params2$aic,4)
> dt_params2=na.omit(dt_params2)
> datatable(dt_params2,rownames = F)
auto.arima
. We will then select the models with AIC smaller than 260 and we display them all.> i=as.numeric(rownames(dt_params2)[which(dt_params2$aic<260)])
> res=sapply(i, function(x)as.character(models[[x]]))
> res
## [1] "ARIMA(13,1,11)(0,1,0)[12]" "ARIMA(12,1,1)(0,1,1)[12]"
## [3] "ARIMA(12,1,11)(0,1,0)[12]" "ARIMA(10,1,13)(0,1,0)[12]"
## [5] "ARIMA(8,1,13)(0,1,0)[12]" "ARIMA(8,1,12)(0,1,0)[12]"
## [7] "ARIMA(7,1,13)(0,1,0)[12]" "ARIMA(7,1,12)(0,1,0)[12]"
## [9] "ARIMA(6,1,1)(0,1,1)[12]" "ARIMA(6,1,13)(0,1,0)[12]"
## [11] "ARIMA(5,1,1)(0,1,1)[12]" "ARIMA(5,1,13)(0,1,0)[12]"
## [13] "ARIMA(4,1,1)(0,1,1)[12]" "ARIMA(4,1,13)(0,1,0)[12]"
## [15] "ARIMA(3,1,1)(0,1,1)[12]" "ARIMA(2,1,1)(0,1,1)[12]"
t_stat
from cashchrono
package that performs a \(t-\)test for each coefficient.> x_test=sapply(i, function(x)t_stat(models[[x]]))
> bb=rep(NA,16)
> for(j in 1:16){
+ temp=t(x_test[[j]])[,2]
+ z=prod((temp<.05))
+ bb[j]=z
+ }
> bb
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
> models[[i[length(i)-1]]]
## Series: x
## ARIMA(3,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ar3 ma1 sma1
## 0.4400 0.2009 0.1103 -0.9706 -0.3790
## s.e. 0.0481 0.0504 0.0489 0.0162 0.0431
##
## sigma^2 estimated as 0.0973: log likelihood=-122.33
## AIC=256.66 AICc=256.84 BIC=281.77
> models[[i[length(i)]]]
## Series: x
## ARIMA(2,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sma1
## 0.4580 0.2415 -0.9636 -0.3678
## s.e. 0.0484 0.0478 0.0184 0.0433
##
## sigma^2 estimated as 0.09814: log likelihood=-124.87
## AIC=259.74 AICc=259.87 BIC=280.66
> as.character(models[[i[length(i)-1]]])
## [1] "ARIMA(3,1,1)(0,1,1)[12]"
> x_tr <- window(x,end=2009)
> fit <- Arima(x_tr,order = c(3,1,1),
+ seasonal = list(order=c(0,1,1),period=12),
+ lambda = NULL)
> f_fit<-forecast(fit)
>
> autoplot(x_tr, series="Data") +
+ autolayer(fit$fitted, series="SARIMA(3,1,1)(0,1,1)[12]") +
+ autolayer(f_fit, series="Prediction") +
+ xlab("Year") + ylab("Departures") + ggtitle("Permanent Departures") + theme_bw()+theme(legend.title = element_blank(),legend.position = "bottom")