Sampling Multivariate Gaussian distribution with a Gibbs Sampler

Computing conditional distributions

Let’s consider the following matrix \(\Sigma\)

\[\Sigma=\left(\begin{array}{ccc} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{array}\right)\]

Sigma=matrix(c(2,1,0,1,2,1,0,1,2),3,3)
Sigma

##      [,1] [,2] [,3]
## [1,]    2    1    0
## [2,]    1    2    1
## [3,]    0    1    2

eigen(Sigma)$values

## [1] 3.4142136 2.0000000 0.5857864

\(\Sigma\) is then definite positive definite matrix and our aim is to simulate the mulitivariate Gaussian distribution with zero mean and covariance matrix \(\Sigma\)

\[(\theta_1,\theta_2,\theta_3)\sim\mathcal{N}_3(0,\Sigma)\]

We will use here the Gibbs sampler and we need then to compute the conditional distributions : \(\theta_1\mid\theta_2,\theta_3\), \(\theta_2\mid\theta_1,\theta_3\) and \(\theta_3\mid\theta_1,\theta_2\).

Let us recall the famous Theorem on the conditional distribution of Multivariate Gaussian Distributions:

Theorem Let \(X\sim\mathcal{N}(\mu,\Sigma)\) be a Gaussian random vector in \(\mathbb{R}^p\). \(A\subset \{1,\ldots,p\}\), \(X_A\) be subvector of \(X\) indexed by \(A\), and \(X_{\overline{A}}\) be subvector of \(X\) indexed by \(\overline{A}=\{1,\ldots,p\}\setminus A\). Then, if we decompose \(\Sigma\) and \(\mu\) as follows \[\mu=\left(\begin{array}{c}\mu_A\\ \mu_{\overline{A}}\end{array}\right)\] and \[ \Sigma=\left(\begin{array}{cc} \Sigma_A & \Sigma_{A \overline{A}} \\ \Sigma_{\overline{A}A} & \Sigma_{\overline{A}} \\ \end{array}\right)\] the random vector \(X_A\mid X_{\overline{A}}=x_{\overline{A}}\) has a Gaussian distribution such that \[\left(X_A\mid X_{\overline{A}}=x_{\overline{A}}\right)\sim \mathcal{N}(\mu_{A\mid \overline{A}},\Sigma_{A\mid \overline{A}})\] where

\[\mu_{A\mid \overline{A}}=\mu_A+ \Sigma_{A \overline{A}}\Sigma_{\overline{A}}^{-1}(x_{\overline{A}}-\mu_{\overline{A}})\]

and \[ \Sigma_{A\mid \overline{A}}=\Sigma_A- \Sigma_{A \overline{A}} \Sigma_{\overline{A}}^{-1} \Sigma_{\overline{A}A}.\]

Let us then applied the previous theorem to compute the conditional distribution of \(\theta_1\mid \theta_2,\theta_3\). Let us consider \(A=\{1\}\) and \(\overline{A}=\{2,3\}\). Then

> sigmaAAb=2-t(c(1,0))%*%solve(matrix(c(2,1,1,2),2,2))%*%c(1,0)
> sigmaAAb

##          [,1]
## [1,] 1.333333

> t(c(1,0))%*%solve(matrix(c(2,1,1,2),2,2))

##           [,1]       [,2]
## [1,] 0.6666667 -0.3333333

Then \[\mu_{1\mid 2,3}=.6666\theta_2-.3333 \theta_3\] Finaly we conclude that \[\theta_1\mid \theta_2,\theta_3\sim\mathcal{N}(.6666\,\theta_2-.3333\, \theta_3,\sqrt{1.333333})\]

Let’s do the same for \(\theta_2\mid \theta_1,\theta_3\) and \(\theta_3\mid \theta_1,\theta_2\)

for \(\theta_2\mid \theta_1,\theta_3\), \(A=\{2\}\) and \(\overline{A}=\{1,3\}\).

> sigmaAAb=2-t(c(1,1))%*%solve(matrix(c(2,0,0,2),2,2))%*%c(1,1)
> sigmaAAb

##      [,1]
## [1,]    1

> t(c(1,1))%*%solve(matrix(c(2,0,0,2),2,2))

##      [,1] [,2]
## [1,]  0.5  0.5

Then \[\mu_{2\mid 1,3}=.5\theta_1+.5 \theta_3\] Finaly we conclude that \[\theta_2\mid \theta_1,\theta_3\sim\mathcal{N}(.5\,\theta_1+.5\, \theta_3,1)\]

for \(\theta_3\mid \theta_1,\theta_2\), \(A=\{3\}\) and \(\overline{A}=\{1,2\}\).

> sigmaAAb=2-t(c(0,1))%*%solve(matrix(c(2,1,1,2),2,2))%*%c(0,1)
> sigmaAAb

##          [,1]
## [1,] 1.333333

> t(c(0,1))%*%solve(matrix(c(2,1,1,2),2,2))

##            [,1]      [,2]
## [1,] -0.3333333 0.6666667

Then \[\mu_{3\mid 1,2}=-.3333 \,\theta_1+ .6666\, \theta_2\] Finaly we conclude that \[\theta_3\mid \theta_1,\theta_2\sim\mathcal{N}(-.3333 \,\theta_1+ .6666\, \theta_2,\sqrt{1.333333})\]

`R` implementation

Here it’s the R code that will allow use to obtain a sample of \((\theta_1,\theta_2,\theta_3)\) using the Gibbs sampler:

> set.seed(55679)
> theta0=rnorm(3)
> N=1000
> theta=matrix(NA,ncol = 3,nrow = N)
> theta=rbind(theta0,theta)
> for(i in 1:N){
+   theta1<-rnorm(1,mean = .6666667*theta[i,2]-.33333*theta[i,3],sd = sqrt(1.3333))
+   theta2<-rnorm(1,mean = .5*theta1+.5*theta[i,3],sd = 1)
+   theta3<-rnorm(1,mean = -.33333*theta1+.66666*theta2,sd = sqrt(1.3333))
+   theta[i+1,]=c(theta1,theta2,theta3)
+ }

Let’s check some characteristics of the distribution of the sample

> colnames(theta)=c("theta1","theta2","theta3")
> theta=as.data.frame(theta[501:N,])
> m1=lm(theta1~theta2+theta3,data=theta)
> summary(m1)

## 
## Call:
## lm(formula = theta1 ~ theta2 + theta3, data = theta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3642 -0.7880  0.0400  0.6766  3.8589 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02611    0.05199   0.502    0.616    
## theta2       0.62595    0.04470  14.002   <2e-16 ***
## theta3      -0.37615    0.04263  -8.823   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.16 on 497 degrees of freedom
## Multiple R-squared:  0.287,  Adjusted R-squared:  0.2841 
## F-statistic:   100 on 2 and 497 DF,  p-value: < 2.2e-16

> m1$coef

## (Intercept)      theta2      theta3 
##  0.02611218  0.62594589 -0.37614945

> m2=lm(theta2~theta1+theta3,data=theta)
> summary(m2)

## 
## Call:
## lm(formula = theta2 ~ theta1 + theta3, data = theta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5278 -0.6562  0.0188  0.6822  2.3690 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.002279   0.044186  -0.052    0.959    
## theta1       0.451920   0.032276  14.002   <2e-16 ***
## theta3       0.516927   0.031307  16.511   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9856 on 497 degrees of freedom
## Multiple R-squared:  0.4674, Adjusted R-squared:  0.4653 
## F-statistic: 218.1 on 2 and 497 DF,  p-value: < 2.2e-16

> m2$coef

##  (Intercept)       theta1       theta3 
## -0.002279313  0.451920201  0.516927367

> m3=lm(theta3~theta1+theta2,data=theta)
> summary(m3)

## 
## Call:
## lm(formula = theta3 ~ theta1 + theta2, data = theta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5586 -0.7140  0.0092  0.7020  3.6074 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.05641    0.05081  -1.110    0.267    
## theta1      -0.36001    0.04080  -8.823   <2e-16 ***
## theta2       0.68526    0.04150  16.511   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.135 on 497 degrees of freedom
## Multiple R-squared:  0.3579, Adjusted R-squared:  0.3553 
## F-statistic: 138.5 on 2 and 497 DF,  p-value: < 2.2e-16

> m3$coef

## (Intercept)      theta1      theta2 
## -0.05640736 -0.36000971  0.68526468

Computing conditional distributions

R implementation

`R` implementation