Consider the data set of annual counts of British coal mining disasters from 1851 to 1962.
> "CoalDisast" <-
+ structure(list(Year = as.integer(c(1851, 1852, 1853, 1854, 1855,
+ 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866,
+ 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877,
+ 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888,
+ 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899,
+ 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910,
+ 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921,
+ 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932,
+ 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943,
+ 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954,
+ 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962)), Count = c(4,
+ 5, 4, 1, 0, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6, 3, 3, 5, 4, 5, 3,
+ 1, 4, 4, 1, 5, 5, 3, 4, 2, 5, 2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1,
+ 1, 1, 3, 0, 0, 1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
+ 0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2, 3, 3, 1, 1, 2,
+ 1, 1, 1, 1, 2, 4, 2, 0, 0, 0, 1, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0,
+ 1, 0, 0, 1, 0, 1)), .Names = c("Year", "Count"), row.names = c("1",
+ "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
+ "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
+ "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
+ "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
+ "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
+ "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
+ "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
+ "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
+ "91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
+ "101", "102", "103", "104", "105", "106", "107", "108", "109",
+ "110", "111", "112"), class = "data.frame")
> head(CoalDisast)
## Year Count
## 1 1851 4
## 2 1852 5
## 3 1853 4
## 4 1854 1
## 5 1855 0
## 6 1856 4
library(ggplot2)
p<-ggplot(CoalDisast,aes(x=Year,y=Count))+geom_line()+theme_bw()
p
It suggests that the rate of accidents decreased sometime around 1900.
So let us define the statistical model that may fit to the data and let us give a Bayesian estimation of the parameters.
Let us denote by \(\tau\) the change-point time and let us assume that before \(\tau\) the number of disasters follows a Poisson distribution with parameter \(\lambda_1\) and it follows after \(\tau\) also a Poisson distribution with parameter \(\lambda_2\). We will expect then to have \(\lambda_2< \lambda_1\). We may consider later more than one changepoint.
We will first assume a non-informative prior distribution on the vector of parameters \(\theta=(\lambda_1,\lambda_2,\tau)\): \[\pi(\lambda_1,\lambda_2,\tau)\propto \mathbb{1}_{\mathbb{R}^2}(\lambda_1,\lambda_2)\times\mathbb{1}_{\{1,\ldots,N\}}(\tau)\] where \(N\) is the number of years of the study.
Hence the BUGS code of the model is:
Let us notice that this version uses a slightly different parametrization, with the first element of b
as \(\log(\lambda_1)\) and the second as \(\log(\lambda_2)-\log(\lambda_1)\).
R
implementationWe will run the BUGS code using OpenBUGS
and the R
package R2OpenBUGS
:
sink('exChangePointPoiss.txt')
cat("
model {
for (year in 1:N) {
D[year] ~ dpois(mu[year])
log(mu[year]) <- b[1] + step(year - changeyear) * b[2]
}
for (j in 1:2) { b[j] ~ dnorm(0.0, 1.0E-6) }
changeyear ~ dunif(1,N)
}
",fill=T)
sink()
## Data
dt<-list(D<-CoalDisast$Count,N=nrow(CoalDisast))
## Initials
inits=function(){
inits=list(b=rnorm(2),changeyear=sample(size = 1,x = 1:nrow(CoalDisast)))
}
## Params
params<-c("b","changeyear")
## Running the MC
library(R2OpenBUGS)
filename<-'exChangePointPoiss.txt'
outCPPoiss <-bugs(dt,inits,params,filename,codaPkg=F,
n.thin =1, n.iter=10000,debug=F,
n.chains = 3,working.directory=getwd())
outCPPoiss
Or if you work with MAC OS.
OpenBUGS.pgm="/Users/dhafermalouche/.wine/drive_c/Program Files (x86)/OpenBUGS/OpenBUGS323/OpenBUGS.exe"
outCPPoiss <-bugs(dt,inits,params,filename,codaPkg=F,
n.thin =1, n.iter=10000,debug=F,
n.chains = 3,working.directory=getwd(),
OpenBUGS.pgm=OpenBUGS.pgm, useWINE=T)
## Inference for Bugs model at "exChangePointPoiss.txt",
## Current: 3 chains, each with 10000 iterations (first 5000 discarded)
## Cumulative: n.sims = 15000 iterations saved
## mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff
## b[1] 1.1 0.1 0.9 1.1 1.1 1.2 1.3 1 15000
## b[2] -1.2 0.2 -1.5 -1.3 -1.2 -1.1 -0.9 1 15000
## changeyear 40.5 2.5 36.1 39.1 40.7 41.8 46.4 1 15000
## deviance 341.2 2.6 337.9 339.3 340.6 342.4 347.9 1 12000
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = Dbar-Dhat)
## pD = 3.0 and DIC = 344.2
## DIC is an estimate of expected predictive error (lower deviance is better).
Let us make a diagnostic of the convergence of the Markov Chains:
> dim(outCPPoiss$sims.array)
## [1] 5000 3 4
> library(coda)
> b1=mcmc(outCPPoiss$sims.array[,,1])
> dim(b1)
## [1] 5000 3
> matplot(b1,col=c("red","green","blue"),type="l",ylab=expression(b_1))
> b2=mcmc(outCPPoiss$sims.array[,,2])
> dim(b2)
## [1] 5000 3
> matplot(b2,col=c("red","green","blue"),type="l",ylab=expression(b_2))
> changeyear=mcmc(outCPPoiss$sims.array[,,3])
> dim(changeyear)
## [1] 5000 3
> matplot(changeyear,col=c("red","green","blue"),type="l",ylab="changeyear")
> gelman.diag(list(b1[,1],b1[,2],b1[,3]))
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## [1,] 1 1
> gelman.plot(list(changeyear[,1],changeyear[,2],changeyear[,3]))
> gelman.diag(list(changeyear[,1],changeyear[,2],changeyear[,3]))
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## [1,] 1 1
> gelman.plot(list(changeyear[,1],changeyear[,2],changeyear[,3]))
> gelman.diag(list(changeyear[,1],changeyear[,2],changeyear[,3]))
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## [1,] 1 1
> gelman.plot(list(b1[,1],b1[,2],b1[,3]))
> autocorr.diag(b1)
## [,1] [,2] [,3]
## Lag 0 1.000000000 1.000000000 1.000000000
## Lag 1 0.257092833 0.242148271 0.255831639
## Lag 5 -0.007711446 0.003761342 0.017163560
## Lag 10 0.017128683 -0.021708080 0.003541084
## Lag 50 0.010435906 0.007931565 -0.018354853
> autocorr.plot(b1)