
 

This builds on the NNGP implementation in Analysis 2, adding a NIMBLE sampler to perform conjugate (Gibbs) updating of \(\beta\), the mean regression coefficients of the Gaussian Process.

Code for the Gibbs sampler and the NNGP can be viewed here: nngp4.R.

The sampler_NNGP_beta_gibbs follows Algorithm 3 of the Finley (2017) paper. However, I believe there’s a typo in this line of Algorithm 3:

B[1,j] = qf(X[,i],X[,j],A,D) + F[1, j]

where I believe the 1’s should be i’s.
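For reference, here’s a minimal dense-matrix sketch of what the conjugate update computes, assuming a multivariate Normal prior on beta with mean mu_beta and precision Q_beta, and with Cinv standing in for the precision of z given its mean. This is illustrative only; the actual sampler follows Algorithm 3 and accumulates these quantities from the quadratic forms qf(., ., A, D), never forming a dense inverse covariance.

## one conjugate (Gibbs) draw of beta, dense-matrix version (illustrative only):
##   beta | z, . ~ N(m, Q^-1), where
##   Q = Q_beta + t(X) %*% Cinv %*% X
##   m = solve(Q, Q_beta %*% mu_beta + t(X) %*% Cinv %*% z)
beta_gibbs_dense <- function(z, X, Cinv, mu_beta, Q_beta) {
    Q <- Q_beta + t(X) %*% Cinv %*% X               ## full conditional precision
    b <- Q_beta %*% mu_beta + t(X) %*% Cinv %*% z   ## unnormalized full conditional mean
    m <- solve(Q, b)                                ## full conditional mean
    drop(m + backsolve(chol(Q), rnorm(length(m))))  ## one draw from N(m, Q^-1)
}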

 

library(nimble)
library(MASS)
library(coda)
library(basicMCMCplots)   ## provides chainsPlot(), used below

nimbleOptions(verbose = FALSE)

source('../code/nngp4.R')

 

We’ll look at two models of different sizes.

 

Medium Model

The first has \(N=2000\) observed locations, and uses \(10\) regression coefficients for the mean.

N <- 2000   ## number of observed locations
k <- 15     ## number of nearest neighbors to use
nx <- 10    ## number of parameters in mean regression, including intercept (>= 2)

sigma <- 3
tau <- 0.3
psi <- 20

set.seed(0)
(beta <- rnorm(nx, sd = 2))
##  [1]  2.52590857 -0.65246672  2.65959853  2.54485864  0.82928287
##  [6] -3.07990008 -1.85713407 -0.58944089 -0.01153435  4.80930678
X <- array(0, c(N,nx))
X[,1] <- 1
for(i in 2:nx)   X[,i] <- rnorm(N)
mu <- X %*% beta

s <- cbind(runif(N,0,200), runif(N,0,200))
dst <- unname(as.matrix(dist(s)))
C <- isoExpCov(dst, tau, sigma, psi)

z <- MASS::mvrnorm(1, mu, C)

input <- list(s=s, dst=dst, z=z)

nID <- determineNeighbors(input$dst, k)

Note that this conjugate Gibbs sampling requires that a multivariate Normal prior be used for beta.

## NNGP with mean regression
code_NNGP <- nimbleCode({
    beta[1:nx] ~ dmnorm(mu_beta[1:nx], prec = prec_beta[1:nx,1:nx])
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:N] <- X[1:N,1:nx] %*% beta[1:nx]
    AD[1:N,1:(k+1)] <- calculateAD(dst[1:N,1:N], nID[1:N,1:k], N, k, tau, sigma, psi)
    z[1:N] ~ dmnorm_nn2(mean[1:N], AD[1:N,1:(k+1)], nID[1:N,1:k], N, k)
})

constants <- list(nx = nx, N = N, k = k, dst = input$dst, nID = nID,
                  X = X, mu_beta = rep(0,nx), prec_beta = diag(nx)/10000)
data <- list(z = input$z)
inits <- list(beta = rep(0,nx), tau = 1, sigma = 1, psi = 1)


niter <- 10000


Rmodel <- nimbleModel(code_NNGP, constants, data, inits)

conf <- configureMCMC(Rmodel)
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] RW_block sampler: beta[1:10]

We’ll remove the RW_block sampler that’s placed by default on beta, and instead assign our NNGP_beta_gibbs sampler.

conf$removeSamplers('beta')
conf$addSampler('beta', 'NNGP_beta_gibbs')
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] NNGP_beta_gibbs sampler: beta
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

set.seed(0)
time_gibbs <- system.time(samples_gibbs <- runMCMC(Cmcmc, niter))

We’ll compare this sampling against scalar Metropolis-Hastings updating of each individual beta[i] coefficient.

Rmodel <- nimbleModel(code_NNGP, constants, data, inits)

conf <- configureMCMC(Rmodel, multivariateNodesAsScalars = TRUE)
conf$printSamplers()
## [1]  RW sampler: tau
## [2]  RW sampler: sigma
## [3]  RW sampler: psi
## [4]  RW sampler: beta[1]
## [5]  RW sampler: beta[2]
## [6]  RW sampler: beta[3]
## [7]  RW sampler: beta[4]
## [8]  RW sampler: beta[5]
## [9]  RW sampler: beta[6]
## [10] RW sampler: beta[7]
## [11] RW sampler: beta[8]
## [12] RW sampler: beta[9]
## [13] RW sampler: beta[10]
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

set.seed(0)
time_MH <- system.time(samples_MH <- runMCMC(Cmcmc, niter))

 

Effective Sample Size

The conjugate updates to beta yield ESS \(\approx\) the number of MCMC samples, as we’d expect.

What we wouldn’t expect is that the mixing for psi and sigma is actually worse. I’m not sure why this is.

ess_MH <- apply(samples_MH, 2, effectiveSize)
round(ess_MH)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##     1906      710     5857     2222     1493     1327     2143     1003 
##  beta[9] beta[10]      psi    sigma      tau 
##     2373      537       29       26      580
ess_gibbs <- apply(samples_gibbs, 2, effectiveSize)
round(ess_gibbs)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##     9796    10000     9950    10000    10000     9198     7979     7973 
##  beta[9] beta[10]      psi    sigma      tau 
##    10000     8947       18       18      692
round(ess_gibbs / ess_MH, 1)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##      5.1     14.1      1.7      4.5      6.7      6.9      3.7      7.9 
##  beta[9] beta[10]      psi    sigma      tau 
##      4.2     16.7      0.6      0.7      1.2

 

Runtime

In terms of raw runtime, the Gibbs updating of beta takes about twice as long as MH updating.

## runtimes in minutes
time_MH[3] / 60
##  elapsed 
## 30.33187
## runtimes in minutes
time_gibbs[3] / 60
##  elapsed 
## 65.88548
round(time_gibbs[3] / time_MH[3], 1)
## elapsed 
##     2.2

 

Efficiency
eff_MH <- ess_MH / time_MH[3]
eff_gibbs <- ess_gibbs / time_gibbs[3]


eff <- round(eff_gibbs / eff_MH, 1)
eff
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##      2.4      6.5      0.8      2.1      3.1      3.2      1.7      3.7 
##  beta[9] beta[10]      psi    sigma      tau 
##      1.9      7.7      0.3      0.3      0.5

The sampling of the beta[i] terms is generally two or more times as efficient under the Gibbs sampling. However, the covariance parameters are sampled less efficiently.
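One quick way to summarize these ratios, separating the beta[i] terms from the covariance parameters (betaNames is just a helper defined here):

## split the efficiency ratios into the beta terms and the covariance parameters
betaNames <- grep('^beta', names(eff), value = TRUE)
summary(eff[betaNames])          ## spread of the ratios across beta[1:10]
eff[c('psi', 'sigma', 'tau')]    ## ratios for the covariance parameters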

 

Posterior Distribution

We’ll take a quick look at the posterior statistics, to make sure the results look ok.

samplesList <- list(MH = samples_MH, gibbs = samples_gibbs)

chainsPlot(samplesList, nrows=1)
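As a simple numerical cross-check (assuming, as here, that both samples matrices use the same column naming), we can also compare the posterior means of the beta coefficients from each chain against the true values used to simulate the data:

## posterior means of beta from each chain, alongside the simulated true values
betaCols <- grep('^beta', colnames(samples_gibbs), value = TRUE)
round(cbind(true  = beta,
            MH    = colMeans(samples_MH[, betaCols]),
            gibbs = colMeans(samples_gibbs[, betaCols])), 2)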

 

Large Model

We’ll also look at a larger model, to see how these results scale.

This time we use \(N=6000\) observed locations and \(50\) coefficients for the mean.

N <- 6000   ## number of observed locations
k <- 15     ## number of nearest neighbors to use
nx <- 50    ## number of parameters in mean regression, including intercept (>= 2)

sigma <- 3
tau <- 0.3
psi <- 20

set.seed(0)
(beta <- rnorm(nx, sd = 2))
##  [1]  2.52590857 -0.65246672  2.65959853  2.54485864  0.82928287
##  [6] -3.07990008 -1.85713407 -0.58944089 -0.01153435  4.80930678
## [11]  1.52718692 -1.59801850 -2.29531402 -0.57892315 -0.59843024
## [16] -0.82302167  0.50444690 -1.78384225  0.87136660 -2.47507684
## [21] -0.44853577  0.75479129  0.26667272  1.60837902 -0.11421355
## [26]  1.00721594  2.17153872 -1.38190768 -2.56919871  0.09345234
## [31] -0.47141311 -1.08577651 -0.86662063 -1.29894329  1.45350149
## [36]  2.30382351  1.98432073 -0.85902622  2.47660820 -0.55869256
## [41]  3.51580618  1.12149218 -0.90556795 -1.66408659 -2.33314109
## [46] -2.13118116 -3.12756410  2.31307399  1.66409426 -0.45465738
X <- array(0, c(N,nx))
X[,1] <- 1
for(i in 2:nx)   X[,i] <- rnorm(N)
mu <- X %*% beta

s <- cbind(runif(N,0,200), runif(N,0,200))
dst <- unname(as.matrix(dist(s)))
C <- isoExpCov(dst, tau, sigma, psi)

z <- MASS::mvrnorm(1, mu, C)

input <- list(s=s, dst=dst, z=z)

nID <- determineNeighbors(input$dst, k)


constants <- list(nx = nx, N = N, k = k, dst = input$dst, nID = nID,
                  X = X, mu_beta = rep(0,nx), prec_beta = diag(nx)/10000)
data <- list(z = input$z)
inits <- list(beta = rep(0,nx), tau = 1, sigma = 1, psi = 1)

niter <- 5000    ## decrease to 5000 iterations

Rmodel <- nimbleModel(code_NNGP, constants, data, inits)

conf <- configureMCMC(Rmodel)

Again, remove the RW_block sampler that’s placed by default on beta, and instead assign our NNGP_beta_gibbs sampler.

conf$removeSamplers('beta')
conf$addSampler('beta', 'NNGP_beta_gibbs')
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] NNGP_beta_gibbs sampler: beta
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

set.seed(0)
time_gibbs <- system.time(samples_gibbs <- runMCMC(Cmcmc, niter))

Compare this sampling against scalar Metropolis-Hastings updating of each individual beta[i] coefficient.

Rmodel <- nimbleModel(code_NNGP, constants, data, inits)

conf <- configureMCMC(Rmodel, multivariateNodesAsScalars = TRUE)

Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

set.seed(0)
time_MH <- system.time(samples_MH <- runMCMC(Cmcmc, niter))

 

Effective Sample Size
ess_MH <- apply(samples_MH, 2, effectiveSize)
round(ess_MH)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##      955     3228      507      977      617     1618     4579     1054 
##  beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16] 
##      467      371     1647     2075      551     1038     1084      854 
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24] 
##      877      634      603     1289      533      655      163      396 
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32] 
##      904      477      947      645     3764      357      558      621 
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40] 
##      808      244     1602     1160      777      177      426     1930 
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48] 
##      680      842      403      899      465      740     1856     2730 
## beta[49] beta[50]      psi    sigma      tau 
##      413      190        3        3      162
ess_gibbs <- apply(samples_gibbs, 2, effectiveSize)
round(ess_gibbs)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##     5000     5000     5000     2976     5000     4544     5000     5000 
##  beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16] 
##     5074     4767     5000     5000     5000     3625     5254     6056 
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24] 
##     5260     4606     4412     5016     5000     4512     4235     4625 
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32] 
##     5000     3699     5000     2177     5000     5000     5000     5000 
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40] 
##     5000     5000     5000     4178     5000     2899     5000     5259 
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48] 
##     3170     3578     5000     4324     5000     5000     5000     5000 
## beta[49] beta[50]      psi    sigma      tau 
##     4771     5000       16       17      669

Pleasingly, for a model of this scale, the raw ESS using the conjugate Gibbs updating is several times higher for all sampled parameters, including the covariance parameters.

round(ess_gibbs / ess_MH, 1)
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##      5.2      1.5      9.9      3.0      8.1      2.8      1.1      4.7 
##  beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16] 
##     10.9     12.9      3.0      2.4      9.1      3.5      4.8      7.1 
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24] 
##      6.0      7.3      7.3      3.9      9.4      6.9     26.0     11.7 
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32] 
##      5.5      7.8      5.3      3.4      1.3     14.0      9.0      8.0 
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40] 
##      6.2     20.5      3.1      3.6      6.4     16.4     11.7      2.7 
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48] 
##      4.7      4.2     12.4      4.8     10.7      6.8      2.7      1.8 
## beta[49] beta[50]      psi    sigma      tau 
##     11.6     26.3      5.1      6.2      4.1

 

Runtime

Runtime is now on the scale of hours. The conjugate Gibbs sampling takes about 3 times as long as scalar MH updating.

## runtimes in hours
time_MH[3] / 60 / 60
##  elapsed 
## 3.474308
## runtimes in hours
time_gibbs[3] / 60 / 60
##  elapsed 
## 10.95891
round(time_gibbs[3] / time_MH[3], 1)
## elapsed 
##     3.2

 

Efficiency

Well, at the end of the day, the overall efficiencies are broadly similar under the two sampling methods, with the Gibbs sampling somewhat ahead for most of the beta[i] terms and for the covariance parameters.

eff_MH <- ess_MH / time_MH[3]
eff_gibbs <- ess_gibbs / time_gibbs[3]


eff <- round(eff_gibbs / eff_MH, 1)
eff
##  beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]  beta[8] 
##      1.7      0.5      3.1      1.0      2.6      0.9      0.3      1.5 
##  beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16] 
##      3.4      4.1      1.0      0.8      2.9      1.1      1.5      2.2 
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24] 
##      1.9      2.3      2.3      1.2      3.0      2.2      8.2      3.7 
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32] 
##      1.8      2.5      1.7      1.1      0.4      4.4      2.8      2.6 
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40] 
##      2.0      6.5      1.0      1.1      2.0      5.2      3.7      0.9 
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48] 
##      1.5      1.3      3.9      1.5      3.4      2.1      0.9      0.6 
## beta[49] beta[50]      psi    sigma      tau 
##      3.7      8.3      1.6      2.0      1.3

 

Posterior Distribution

Once again, check the posterior distributions.

samplesList <- list(MH = samples_MH, gibbs = samples_gibbs)

chainsPlot(samplesList, c('beta','sigma','tau'), buffer.right=8, jitter=20, nrows=1)

 

Conclusions

We’ve compared scalar MH updating against conjugate (Gibbs) sampling for the regression parameters in the mean.

After taking into account runtime (which is much longer for the conjugate sampler), the final sampling efficiencies are reasonably similar under each method, even for the large model.

One thing that might be worth investigating further is the case where the beta[i] coefficients are themselves correlated in the posterior. In that case, the MH sampling will struggle a lot more to achieve a high ESS, while the conjugate Gibbs sampling will do just fine. So by constructing such a model, I think we could find cases where the conjugate sampling is much better. This would be done by introducing some degree of collinearity into the covariates (columns of the design matrix X), as sketched below. In the design above, the covariates are entirely independent of one another. I have not experimented with this approach, but if we’re going to promote or use this method, then it’s probably worth doing.
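For example, a minimal sketch of how collinearity could be introduced into the simulated covariates (purely illustrative; the AR(1)-style correlation structure and the value 0.8 are arbitrary choices, and N, nx, and beta are reused from above):

## replace the independent covariates with correlated ones
rho <- 0.8                                            ## pairwise correlation strength
Sigma_X <- rho^abs(outer(1:(nx-1), 1:(nx-1), '-'))    ## AR(1)-style correlation matrix
X[, 2:nx] <- MASS::mvrnorm(N, mu = rep(0, nx-1), Sigma = Sigma_X)
mu <- X %*% beta                                      ## then regenerate C and z as before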