Metropolis Hasting Algorithm

Why the Metropolis Hasting Algorithm?

This algorithm is used to simulate probability distributions \(p\) from another probability distribution \(g\) easier to simulate. Usually \(p\) is called the target distribution and \(g\) is the proposal.
Indeed the MH-algorithm runs a Markov Chain supposed to converge to the target distribution \(p\).

How the MH-algorithm works?

Let \(\theta^{(t)}\) be the current draw from \(p(\theta)\)
The Metropolis-Hastings algorithm performs the following
1. Draw \(\theta^\ast\) from \(g(\theta\mid \theta^{(t)})\)
2. Accept \(\theta^{(t+1)}=\theta^\ast\) with the probability \(\min\left\{1,r\right\}\) where \[r=\displaystyle\frac{p(\theta^\ast)g(\theta^{(t)}\mid\theta^\ast)}{p(\theta^{(t)})g(\theta^\ast\mid\theta^{(t)} )}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\)
Accepting with the probability \(\min\left\{1,r\right\}\) means that we will be drawing \(x\) according to a uniform distribution on \((0,1)\) and if \(x<\min\left\{1,r\right\}\), then \(\theta^\ast\) is accepted otherwise it’s not.

The MH-algorithm in the Bayesian context

We know that the posterior distribution can be written as follows \[p(\theta\mid y)\propto L(y\mid \theta )p(\theta)\] where \(L\) is the likelihood and \(p\) is prior distribution and we are usually not interesting in computing the normalization constant \[C(y)=\displaystyle\int L(y\mid \theta )\times p(\theta) d\theta\] We can then use the MH-algorithm to simulate \(p(\theta\mid y)\) by only using \(q(\theta\mid y)=L(y\mid \theta )\times p(\theta)\) and a proposal distribution \(g(\theta_1\mid \theta_2,\,y)\) as follows

Draw \(\theta^\ast\) from \(g(\theta\mid\theta^{(t)},y)\)
Accept \(\theta^{(t+1)}=\theta^\ast\) with the probability \(\min\left\{1,r\right\}\) where \[r=\displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\)

Example : Normal Bayesian Model with Cauchy prior distribution

Suppose \(y\sim\mathcal{N}(\theta,1)\), then \[L(y\mid \theta) \propto \exp\left\{(y-\theta)^2/2\right\}\]
Let’s consider a Cauchy distribution as a prior distribution of \(\theta\): \[p(\theta)\propto \displaystyle\frac{1}{1+\theta^2}\]
Then the posterior distribution of \(\theta\) can be expressed as follows

\[ p(\theta\mid y)\propto \displaystyle\frac{\exp\left\{(y-\theta)^2/2\right\}}{1+\theta^2}\]

Let’s consider as proposal distribution \(g\) a Gaussian distribution \(\mathcal{N}(y,1)\) \[g(\theta\mid y)\propto \exp\left\{(y-\theta)^2/2\right\}\]
The acceptance probability \(r\) is then \[\begin{array}{rcl} r&=& \displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)} \\ &=& \displaystyle\frac{\displaystyle\frac{\exp\left\{(y-\theta^\ast)^2/2\right\}}{1+(\theta^\ast)^2}\times \exp\left\{(y-\theta^{(t)})^2/2\right\}}{\displaystyle\frac{\exp\left\{(y-\theta^{(t)})^2/2\right\}}{1+(\theta^{(t)})^2}\times \exp\left\{(y-\theta^\ast)^2/2\right\}}\\ & & \\ &=& \displaystyle\frac{1+(\theta^{(t)})^2}{1+(\theta^\ast)^2} \end{array}\]
The R code of this MH-algorihm

> n=1000  ## The sample size
> y=10.3 ## The observed y
> samp=rep(NA,n) ## The vector that will contain the sample
> samp[1]=4 ## Intial value
> r_accep=function(theta_star,theta_t){
+   (1+theta_t^2)/(1+theta_star^2)
+ }
> for(i in 2:n){
+   proposed=rnorm(1,y,1)
+   r=r_accep(proposed,samp[i-1])
+   u=runif(1)
+   if(u<r) samp[i]=proposed
+    else samp[i]=samp[i-1] 
+ }

Why the Metropolis Hasting Algorithm?

How the MH-algorithm works?

The MH-algorithm in the Bayesian context

Example : Normal Bayesian Model with Cauchy prior distribution

Links to more tutorials on MH-algoithm