The Metropolis-Hastings Algorithm

Why the Metropolis-Hastings algorithm?

  • This algorithm is used to simulate from a probability distribution \(p\) with the help of another probability distribution \(g\) that is easier to simulate from. Usually \(p\) is called the target distribution and \(g\) the proposal distribution.

  • To do so, the MH-algorithm runs a Markov chain that is designed to converge to the target distribution \(p\).

How does the MH-algorithm work?

  • Let \(\theta^{(t)}\) be the current draw from \(p(\theta)\)

  • The Metropolis-Hastings algorithm performs the following steps:

    1. Draw \(\theta^\ast\) from \(g(\theta\mid \theta^{(t)})\)
    2. Accept \(\theta^{(t+1)}=\theta^\ast\) with probability \(\min\left\{1,r\right\}\), where \[r=\displaystyle\frac{p(\theta^\ast)g(\theta^{(t)}\mid\theta^\ast)}{p(\theta^{(t)})g(\theta^\ast\mid\theta^{(t)})}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\)
  • Accepting with probability \(\min\left\{1,r\right\}\) means that we draw \(x\) from a uniform distribution on \((0,1)\): if \(x<\min\left\{1,r\right\}\), then \(\theta^\ast\) is accepted; otherwise it is rejected. A generic R sketch of these two steps is given below.
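
Before specializing to the Bayesian setting, the two steps above can be written as a short generic R function. This is only a sketch: the names mh_sample, g_draw and g_dens are illustrative and not part of the original notes; target is the (possibly unnormalized) density \(p\), g_draw(theta_t) draws from \(g(\cdot\mid\theta_t)\), and g_dens(a, b) evaluates \(g(a\mid b)\).

mh_sample <- function(n, theta0, target, g_draw, g_dens) {
  samp <- rep(NA, n)
  samp[1] <- theta0
  for (t in 2:n) {
    theta_star <- g_draw(samp[t - 1])   ## Step 1: propose theta* from g(. | theta_t)
    r <- (target(theta_star) * g_dens(samp[t - 1], theta_star)) /
      (target(samp[t - 1]) * g_dens(theta_star, samp[t - 1]))
    if (runif(1) < r) {                 ## Step 2: accept with probability min(1, r)
      samp[t] <- theta_star
    } else {
      samp[t] <- samp[t - 1]            ## Rejected: keep the current value
    }
  }
  samp
}

## Illustrative use: standard normal target with a random-walk proposal
draws <- mh_sample(1000, 0,
                   target = function(th) exp(-th^2 / 2),
                   g_draw = function(th) rnorm(1, th, 1),
                   g_dens = function(a, b) dnorm(a, b, 1))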

The MH-algorithm in the Bayesian context

We know that the posterior distribution can be written as \[p(\theta\mid y)\propto L(y\mid \theta )p(\theta)\] where \(L\) is the likelihood and \(p\) is the prior distribution. We are usually not interested in computing the normalization constant \[C(y)=\displaystyle\int L(y\mid \theta )\times p(\theta)\, d\theta\] We can then use the MH-algorithm to simulate from \(p(\theta\mid y)\) using only \(q(\theta\mid y)=L(y\mid \theta )\times p(\theta)\) and a proposal distribution \(g(\theta_1\mid \theta_2,\,y)\), as follows

  1. Draw \(\theta^\ast\) from \(g(\theta\mid\theta^{(t)},y)\)
  2. Accept \(\theta^{(t+1)}=\theta^\ast\) with probability \(\min\left\{1,r\right\}\), where \[r=\displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)},y)}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\). In practice \(r\) is often computed on the log scale, as sketched below.
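
Since the likelihood in \(q(\theta\mid y)=L(y\mid \theta)\times p(\theta)\) can underflow numerically for even moderately sized data sets, the ratio \(r\) is usually evaluated on the log scale. A minimal sketch in R, where loglik, logprior and log_g are illustrative placeholders for the model's log-likelihood, log-prior and log-proposal density (log_g(a, b, y) stands for \(\log g(a\mid b,y)\)):

log_r <- function(theta_star, theta_t, y, loglik, logprior, log_g) {
  ## log r = log q(theta* | y) + log g(theta_t | theta*, y)
  ##       - log q(theta_t | y) - log g(theta* | theta_t, y)
  (loglik(y, theta_star) + logprior(theta_star) + log_g(theta_t, theta_star, y)) -
    (loglik(y, theta_t) + logprior(theta_t) + log_g(theta_star, theta_t, y))
}
## Accept theta* when log(runif(1)) < log_r(...)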

Example: Normal Bayesian model with a Cauchy prior distribution

  • Suppose \(y\sim\mathcal{N}(\theta,1)\); then \[L(y\mid \theta) \propto \exp\left\{-(y-\theta)^2/2\right\}\]

  • Let’s consider a Cauchy distribution as the prior distribution for \(\theta\): \[p(\theta)\propto \displaystyle\frac{1}{1+\theta^2}\]

  • Then the posterior distribution of \(\theta\) can be expressed as follows

\[ p(\theta\mid y)\propto \displaystyle\frac{\exp\left\{-(y-\theta)^2/2\right\}}{1+\theta^2}\]

  • Let’s consider as proposal distribution \(g\) a Gaussian distribution \(\mathcal{N}(y,1)\): \[g(\theta\mid y)\propto \exp\left\{-(y-\theta)^2/2\right\}\] Note that this proposal does not depend on the current state \(\theta^{(t)}\) (an independence proposal), so \(g(\theta^{(t)}\mid \theta^\ast,y)=g(\theta^{(t)}\mid y)\).

  • The acceptance ratio \(r\) is then \[\begin{array}{rcl} r&=& \displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)} \\ &=& \displaystyle\frac{\displaystyle\frac{\exp\left\{-(y-\theta^\ast)^2/2\right\}}{1+(\theta^\ast)^2}\times \exp\left\{-(y-\theta^{(t)})^2/2\right\}}{\displaystyle\frac{\exp\left\{-(y-\theta^{(t)})^2/2\right\}}{1+(\theta^{(t)})^2}\times \exp\left\{-(y-\theta^\ast)^2/2\right\}}\\ & & \\ &=& \displaystyle\frac{1+(\theta^{(t)})^2}{1+(\theta^\ast)^2} \end{array}\]

  • The R code for this MH-algorithm:

n <- 1000               ## The sample size
y <- 10.3               ## The observed y
samp <- rep(NA, n)      ## The vector that will contain the sample
samp[1] <- 4            ## Initial value
r_accep <- function(theta_star, theta_t) {
  ## Acceptance ratio r = (1 + theta_t^2) / (1 + theta_star^2)
  (1 + theta_t^2) / (1 + theta_star^2)
}
for (i in 2:n) {
  proposed <- rnorm(1, y, 1)   ## Draw theta* from the proposal N(y, 1)
  r <- r_accep(proposed, samp[i - 1])
  u <- runif(1)
  if (u < r) {
    samp[i] <- proposed        ## Accept the proposal
  } else {
    samp[i] <- samp[i - 1]     ## Reject: keep the current value
  }
}
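
After running the chain it is common to discard an initial burn-in period and then summarize the retained draws. The snippet below is illustrative (the burn-in length of 100 is an arbitrary choice, not from the original notes):

burn <- 100                   ## Illustrative burn-in length
post <- samp[(burn + 1):n]    ## Retained posterior draws
mean(post)                    ## Posterior mean of theta
hist(post, breaks = 30)       ## Shape of the posterior
mean(diff(samp) != 0)         ## Empirical acceptance rate of the chain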