Metropolis-Hastings Algorithm

## Why the Metropolis-Hastings Algorithm?

• This algorithm is used to simulate draws from a probability distribution $$p$$ using another probability distribution $$g$$ that is easier to simulate from. Usually $$p$$ is called the target distribution and $$g$$ the proposal.

• To do so, the MH-algorithm runs a Markov chain whose stationary distribution is the target distribution $$p$$, so the chain's draws converge in distribution to $$p$$.

## How does the MH-algorithm work?

• Let $$\theta^{(t)}$$ be the current state of the chain

• The Metropolis-Hastings algorithm performs the following steps

1. Draw $$\theta^\ast$$ from $$g(\theta\mid \theta^{(t)})$$
2. Accept $$\theta^{(t+1)}=\theta^\ast$$ with probability $$\min\left\{1,r\right\}$$ where $r=\displaystyle\frac{p(\theta^\ast)g(\theta^{(t)}\mid\theta^\ast)}{p(\theta^{(t)})g(\theta^\ast\mid\theta^{(t)} )}$ Otherwise, set $$\theta^{(t+1)}=\theta^{(t)}$$
• Accepting with probability $$\min\left\{1,r\right\}$$ means that we draw $$u$$ from a uniform distribution on $$(0,1)$$; if $$u<\min\left\{1,r\right\}$$, then $$\theta^\ast$$ is accepted, otherwise it is rejected.
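The update above can be sketched as a small R function. This is a minimal sketch; `mh_step`, `g_draw`, and `g_density` are illustrative names, not from the text:

```r
## One Metropolis-Hastings update.
##   p         : target density (possibly unnormalized)
##   g_draw    : function drawing theta* from g(. | theta_t)
##   g_density : function evaluating g(a | b)
mh_step <- function(theta_t, p, g_draw, g_density) {
  theta_star <- g_draw(theta_t)
  r <- (p(theta_star) * g_density(theta_t, theta_star)) /
       (p(theta_t) * g_density(theta_star, theta_t))
  if (runif(1) < min(1, r)) theta_star else theta_t
}

## Example: standard normal target with a random-walk proposal N(theta_t, 1)
mh_step(0, p = dnorm,
        g_draw = function(t) rnorm(1, t, 1),
        g_density = function(a, b) dnorm(a, b, 1))
```

Iterating `mh_step` from any starting point produces a chain whose draws eventually look like draws from `p`.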

## The MH-algorithm in the Bayesian context

We know that the posterior distribution can be written as $p(\theta\mid y)\propto L(y\mid \theta )p(\theta)$ where $$L$$ is the likelihood and $$p$$ is the prior distribution. We are usually not interested in computing the normalization constant $C(y)=\displaystyle\int L(y\mid \theta )\times p(\theta)\, d\theta$ We can therefore use the MH-algorithm to simulate from $$p(\theta\mid y)$$ using only the unnormalized density $$q(\theta\mid y)=L(y\mid \theta )\times p(\theta)$$ and a proposal distribution $$g(\theta_1\mid \theta_2,\,y)$$ as follows

1. Draw $$\theta^\ast$$ from $$g(\theta\mid\theta^{(t)},y)$$
2. Accept $$\theta^{(t+1)}=\theta^\ast$$ with probability $$\min\left\{1,r\right\}$$ where $r=\displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)}$ Otherwise, set $$\theta^{(t+1)}=\theta^{(t)}$$
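In practice $$q(\theta\mid y)$$ is often evaluated on the log scale, since a likelihood that is a product of many terms can underflow to zero in floating point. A sketch under that assumption, using a symmetric random-walk proposal so that the $$g$$ terms in $$r$$ cancel (the function names are illustrative):

```r
## Metropolis sampling on the log scale.
## log_q: log L(y | theta) + log p(theta), up to an additive constant.
## The random-walk proposal rnorm(1, current, step) is symmetric,
## so the g terms cancel and log r = log_q(prop) - log_q(current).
mh_sample <- function(log_q, theta0, n, step = 1) {
  samp <- numeric(n)
  samp[1] <- theta0
  for (i in 2:n) {
    prop <- rnorm(1, samp[i - 1], step)
    log_r <- log_q(prop) - log_q(samp[i - 1])
    samp[i] <- if (log(runif(1)) < log_r) prop else samp[i - 1]
  }
  samp
}

## Example: standard normal target, log_q(theta) = -theta^2 / 2
draws <- mh_sample(function(theta) -theta^2 / 2, theta0 = 0, n = 5000)
```

Comparing `log(runif(1))` with `log_r` is equivalent to comparing a uniform draw with $$\min\{1,r\}$$, but avoids exponentiating very negative log ratios.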

## Example: Normal Bayesian Model with Cauchy Prior Distribution

• Suppose $$y\sim\mathcal{N}(\theta,1)$$; then $L(y\mid \theta) \propto \exp\left\{-(y-\theta)^2/2\right\}$

• Let’s consider a Cauchy distribution as a prior distribution of $$\theta$$: $p(\theta)\propto \displaystyle\frac{1}{1+\theta^2}$

• Then the posterior distribution of $$\theta$$ can be expressed as follows

$p(\theta\mid y)\propto \displaystyle\frac{\exp\left\{-(y-\theta)^2/2\right\}}{1+\theta^2}$

• Let’s consider as proposal distribution $$g$$ the Gaussian distribution $$\mathcal{N}(y,1)$$, which does not depend on the current state $$\theta^{(t)}$$: $g(\theta\mid y)\propto \exp\left\{-(y-\theta)^2/2\right\}$

• The acceptance ratio $$r$$ is then $\begin{array}{rcl} r&=& \displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)} \\ &=& \displaystyle\frac{\displaystyle\frac{\exp\left\{-(y-\theta^\ast)^2/2\right\}}{1+(\theta^\ast)^2}\times \exp\left\{-(y-\theta^{(t)})^2/2\right\}}{\displaystyle\frac{\exp\left\{-(y-\theta^{(t)})^2/2\right\}}{1+(\theta^{(t)})^2}\times \exp\left\{-(y-\theta^\ast)^2/2\right\}}\\ & & \\ &=& \displaystyle\frac{1+(\theta^{(t)})^2}{1+(\theta^\ast)^2} \end{array}$
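The cancellation in the last step can be checked numerically. This is a quick sanity check; the values of $$y$$, $$\theta^{(t)}$$ and $$\theta^\ast$$ are arbitrary:

```r
## Check that r reduces to (1 + theta_t^2) / (1 + theta_star^2);
## y, theta_t and theta_star are arbitrary illustrative values.
y <- 10.3; theta_t <- 9.5; theta_star <- 11.2
q <- function(theta) exp(-(y - theta)^2 / 2) / (1 + theta^2)  ## unnormalized posterior
g <- function(theta) exp(-(y - theta)^2 / 2)                  ## proposal N(y, 1), up to a constant
r_full   <- (q(theta_star) * g(theta_t)) / (q(theta_t) * g(theta_star))
r_simple <- (1 + theta_t^2) / (1 + theta_star^2)
all.equal(r_full, r_simple)  ## TRUE
```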

• The R code for this MH-algorithm:

```r
n <- 1000           ## The sample size
y <- 10.3           ## The observed y
samp <- rep(NA, n)  ## The vector that will contain the sample
samp[1] <- 4        ## Initial value
r_accep <- function(theta_star, theta_t) {
  (1 + theta_t^2) / (1 + theta_star^2)
}
for (i in 2:n) {
  proposed <- rnorm(1, y, 1)  ## Draw theta* from the proposal N(y, 1)
  r <- r_accep(proposed, samp[i - 1])
  u <- runif(1)
  samp[i] <- if (u < r) proposed else samp[i - 1]
}
```