This algorithm is used to simulate draws from a probability distribution \(p\) by using another probability distribution \(g\) that is easier to simulate. Usually \(p\) is called the target distribution and \(g\) the proposal.

Indeed, the MH algorithm runs a Markov chain that is designed to converge to the target distribution \(p\).

Let \(\theta^{(t)}\) be the current draw from \(p(\theta)\).

The Metropolis-Hastings algorithm performs the following steps:

- Draw \(\theta^\ast\) from \(g(\theta\mid \theta^{(t)})\)
- Accept \(\theta^{(t+1)}=\theta^\ast\) with probability \(\min\left\{1,r\right\}\) where \[r=\displaystyle\frac{p(\theta^\ast)g(\theta^{(t)}\mid\theta^\ast)}{p(\theta^{(t)})g(\theta^\ast\mid\theta^{(t)} )}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\)

Accepting with probability \(\min\left\{1,r\right\}\) means drawing \(x\) from a uniform distribution on \((0,1)\): if \(x<\min\left\{1,r\right\}\), then \(\theta^\ast\) is accepted; otherwise it is rejected.
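In `R`, this accept/reject step can be sketched as a small helper function; the name `accept_step` is illustrative, not from the original text:

```r
# A minimal sketch of the accept/reject step:
# draw u ~ Uniform(0, 1) and accept theta_star if u < min(1, r).
accept_step <- function(theta_t, theta_star, r) {
  u <- runif(1)
  if (u < min(1, r)) theta_star else theta_t
}

# If r >= 1 the proposal is always accepted:
accept_step(0, 2, r = 1.5)  # returns 2
```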

We know that the posterior distribution can be written as follows \[p(\theta\mid y)\propto L(y\mid \theta )p(\theta)\] where \(L\) is the likelihood and \(p\) is the prior distribution. We are usually not interested in computing the normalization constant \[C(y)=\displaystyle\int L(y\mid \theta )\times p(\theta) d\theta\] We can then use the MH algorithm to simulate from \(p(\theta\mid y)\) using only \(q(\theta\mid y)=L(y\mid \theta )\times p(\theta)\) and a proposal distribution \(g(\theta_1\mid \theta_2,\,y)\) as follows:

- Draw \(\theta^\ast\) from \(g(\theta\mid\theta^{(t)},y)\)
- Accept \(\theta^{(t+1)}=\theta^\ast\) with probability \(\min\left\{1,r\right\}\) where \[r=\displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)}\] Otherwise set \(\theta^{(t+1)}=\theta^{(t)}\)
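The two steps above can be collected into a generic `R` function. This is an illustrative sketch (the names `q_dens`, `g_samp`, and `g_dens` are assumptions, not from the original text):

```r
# Generic Metropolis-Hastings sketch.
# q_dens(theta): unnormalized target density q(theta | y)
# g_samp(cur):   draws a proposal theta* given the current state
# g_dens(a, b):  proposal density g(a | b)
mh_sample <- function(n, init, q_dens, g_samp, g_dens) {
  samp <- numeric(n)
  samp[1] <- init
  for (i in 2:n) {
    cur  <- samp[i - 1]
    prop <- g_samp(cur)
    r <- (q_dens(prop) * g_dens(cur, prop)) /
         (q_dens(cur)  * g_dens(prop, cur))
    samp[i] <- if (runif(1) < min(1, r)) prop else cur
  }
  samp
}
```

For instance, a random-walk proposal with a standard normal target would be `mh_sample(2000, 0, function(th) exp(-th^2 / 2), function(cur) rnorm(1, cur, 1), function(a, b) dnorm(a, b, 1))`.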

Suppose \(y\sim\mathcal{N}(\theta,1)\); then \[L(y\mid \theta) \propto \exp\left\{-(y-\theta)^2/2\right\}\]

Let’s consider a Cauchy distribution as a prior distribution of \(\theta\): \[p(\theta)\propto \displaystyle\frac{1}{1+\theta^2}\]

Then the posterior distribution of \(\theta\) can be expressed as follows

\[ p(\theta\mid y)\propto \displaystyle\frac{\exp\left\{-(y-\theta)^2/2\right\}}{1+\theta^2}\]

Let’s consider as proposal distribution \(g\) a Gaussian distribution \(\mathcal{N}(y,1)\), which does not depend on the current state: \[g(\theta\mid y)\propto \exp\left\{-(y-\theta)^2/2\right\}\]

Since the proposal does not depend on the current state, \(g(\theta_1\mid\theta_2,y)=g(\theta_1\mid y)\), and the acceptance ratio \(r\) is then \[\begin{array}{rcl} r&=& \displaystyle\frac{q(\theta^\ast\mid y)g(\theta^{(t)}\mid \theta^\ast,y)}{q(\theta^{(t)}\mid y)g(\theta^\ast\mid \theta^{(t)} ,y)} \\ &=& \displaystyle\frac{\displaystyle\frac{\exp\left\{-(y-\theta^\ast)^2/2\right\}}{1+(\theta^\ast)^2}\times \exp\left\{-(y-\theta^{(t)})^2/2\right\}}{\displaystyle\frac{\exp\left\{-(y-\theta^{(t)})^2/2\right\}}{1+(\theta^{(t)})^2}\times \exp\left\{-(y-\theta^\ast)^2/2\right\}}\\ & & \\ &=& \displaystyle\frac{1+(\theta^{(t)})^2}{1+(\theta^\ast)^2} \end{array}\]
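The simplification can be checked numerically; `q` and `g` below implement the unnormalized posterior and the proposal density above, and the example values of `theta_t` and `theta_star` are arbitrary:

```r
# Unnormalized posterior q(theta | y) and proposal density g(theta | y)
q <- function(theta, y) exp(-(y - theta)^2 / 2) / (1 + theta^2)
g <- function(theta, y) exp(-(y - theta)^2 / 2)

y <- 10.3; theta_t <- 4; theta_star <- 9.7
r_full   <- (q(theta_star, y) * g(theta_t, y)) /
            (q(theta_t, y)    * g(theta_star, y))
r_simple <- (1 + theta_t^2) / (1 + theta_star^2)
all.equal(r_full, r_simple)  # TRUE: the exponential factors cancel
```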

The `R` code of this MH-algorithm:

```r
n <- 1000            ## The sample size
y <- 10.3            ## The observed y
samp <- rep(NA, n)   ## The vector that will contain the sample
samp[1] <- 4         ## Initial value

## Acceptance ratio r, using the simplified form derived above
r_accep <- function(theta_star, theta_t) {
  (1 + theta_t^2) / (1 + theta_star^2)
}

for (i in 2:n) {
  proposed <- rnorm(1, y, 1)           ## Draw from the proposal N(y, 1)
  r <- r_accep(proposed, samp[i - 1])
  u <- runif(1)
  if (u < r) {
    samp[i] <- proposed
  } else {
    samp[i] <- samp[i - 1]
  }
}
```
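A quick way to check the sampler is to look at the acceptance rate and the posterior mean. The diagnostic below is an addition (it re-runs the loop with a fixed seed so the block is self-contained):

```r
set.seed(42)
n <- 1000; y <- 10.3
samp <- rep(NA, n); samp[1] <- 4
for (i in 2:n) {
  proposed <- rnorm(1, y, 1)
  r <- (1 + samp[i - 1]^2) / (1 + proposed^2)
  samp[i] <- if (runif(1) < r) proposed else samp[i - 1]
}

## Fraction of moves accepted (consecutive equal draws indicate rejections)
mean(diff(samp) != 0)

## Posterior mean of theta, close to y since the prior is weak
mean(samp)
```

For a chain this short, inspecting `hist(samp)` or a trace plot `plot(samp, type = "l")` is also a useful sanity check.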