This builds on the NNGP implementation in Analysis 2, adding a NIMBLE sampler to perform conjugate (Gibbs) updating of \(\beta\), the mean regression coefficients of the Gaussian Process.
Code for the Gibbs sampler and the NNGP can be viewed here: nngp4.R.
The sampler_NNGP_beta_gibbs sampler follows Algorithm 3 of the Finley et al. (2017) paper. However, I believe there’s a typo in this line of Algorithm 3:
B[1,j] = qf(X[,i],X[,j],A,D) + F[1, j]
where I believe the 1’s should be i’s.
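That is, I believe the corrected line should read:
B[i,j] = qf(X[,i],X[,j],A,D) + F[i,j]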
library(nimble)
library(MASS)
library(coda)
library(basicMCMCplots)   ## for chainsPlot(); assuming it isn't loaded by nngp4.R
nimbleOptions(verbose = FALSE)
source('../code/nngp4.R')
We’ll look at two different sized models.
The first has \(N=2000\) observed locations, and uses \(10\) regression coefficients for the mean.
N <- 2000 ## number of observed locations
k <- 15 ## number of nearest neighbors to use
nx <- 10 ## number of parameters in mean regression, including intercept (>= 2)
sigma <- 3
tau <- 0.3
psi <- 20
set.seed(0)
(beta <- rnorm(nx, sd = 2))
## [1] 2.52590857 -0.65246672 2.65959853 2.54485864 0.82928287
## [6] -3.07990008 -1.85713407 -0.58944089 -0.01153435 4.80930678
X <- array(0, c(N,nx))
X[,1] <- 1
for(i in 2:nx) X[,i] <- rnorm(N)
mu <- X %*% beta
s <- cbind(runif(N,0,200), runif(N,0,200))
dst <- unname(as.matrix(dist(s)))
C <- isoExpCov(dst, tau, sigma, psi)
z <- MASS::mvrnorm(1, mu, C)
input <- list(s=s, dst=dst, z=z)
nID <- determineNeighbors(input$dst, k)
Note that this conjugate Gibbs sampling of beta requires a multivariate Normal prior to be used for beta.
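As a reminder of where the conjugacy comes from, here is a minimal sketch of the standard conjugate update for regression coefficients under a Gaussian likelihood, written for a dense covariance matrix C. This is for reference only: the actual NNGP_beta_gibbs sampler works with the sparse NNGP factors following Algorithm 3, and the function and variable names below are purely illustrative.
## Reference sketch: one conjugate draw of beta under a dense covariance C,
## with prior beta ~ N(mu_beta, solve(prec_beta)) and data z ~ N(X %*% beta, C).
## The NNGP sampler avoids ever forming or inverting the dense C.
beta_conjugate_draw <- function(X, z, C, mu_beta, prec_beta) {
    Cinv <- solve(C)
    Q <- prec_beta + t(X) %*% Cinv %*% X                        ## posterior precision
    V <- solve(Q)                                               ## posterior covariance
    m <- V %*% (prec_beta %*% mu_beta + t(X) %*% Cinv %*% z)    ## posterior mean
    drop(m + t(chol(V)) %*% rnorm(ncol(X)))                     ## one draw from N(m, V)
}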
## NNGP with mean regression
code_NNGP <- nimbleCode({
beta[1:nx] ~ dmnorm(mu_beta[1:nx], prec = prec_beta[1:nx,1:nx])
tau ~ dunif(0, 10000)
sigma ~ dunif(0, 10000)
psi ~ dunif(0, 10000)
mean[1:N] <- X[1:N,1:nx] %*% beta[1:nx]
AD[1:N,1:(k+1)] <- calculateAD(dst[1:N,1:N], nID[1:N,1:k], N, k, tau, sigma, psi)
z[1:N] ~ dmnorm_nn2(mean[1:N], AD[1:N,1:(k+1)], nID[1:N,1:k], N, k)
})
constants <- list(nx = nx, N = N, k = k, dst = input$dst, nID = nID,
X = X, mu_beta = rep(0,nx), prec_beta = diag(nx)/10000)
data <- list(z = input$z)
inits <- list(beta = rep(0,nx), tau = 1, sigma = 1, psi = 1)
niter <- 10000
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
conf <- configureMCMC(Rmodel)
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] RW_block sampler: beta[1:10]
We’ll remove the RW_block sampler that’s placed by default on beta, and instead assign our NNGP_beta_gibbs sampler.
conf$removeSamplers('beta')
conf$addSampler('beta', 'NNGP_beta_gibbs')
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] NNGP_beta_gibbs sampler: beta
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
set.seed(0)
time_gibbs <- system.time(samples_gibbs <- runMCMC(Cmcmc, niter))
We’ll compare this sampling against scalar Metropolis-Hastings updating of each individual beta[i] coefficient.
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
conf <- configureMCMC(Rmodel, multivariateNodesAsScalars = TRUE)
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] RW sampler: beta[1]
## [5] RW sampler: beta[2]
## [6] RW sampler: beta[3]
## [7] RW sampler: beta[4]
## [8] RW sampler: beta[5]
## [9] RW sampler: beta[6]
## [10] RW sampler: beta[7]
## [11] RW sampler: beta[8]
## [12] RW sampler: beta[9]
## [13] RW sampler: beta[10]
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
set.seed(0)
time_MH <- system.time(samples_MH <- runMCMC(Cmcmc, niter))
The conjugate updates to beta yield ESS \(\approx\) the number of MCMC samples, as we’d expect. What we wouldn’t expect is that the mixing for psi and sigma is actually worse; I’m not sure why this is.
ess_MH <- apply(samples_MH, 2, effectiveSize)
round(ess_MH)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 1906 710 5857 2222 1493 1327 2143 1003
## beta[9] beta[10] psi sigma tau
## 2373 537 29 26 580
ess_gibbs <- apply(samples_gibbs, 2, effectiveSize)
round(ess_gibbs)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 9796 10000 9950 10000 10000 9198 7979 7973
## beta[9] beta[10] psi sigma tau
## 10000 8947 18 18 692
round(ess_gibbs / ess_MH, 1)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 5.1 14.1 1.7 4.5 6.7 6.9 3.7 7.9
## beta[9] beta[10] psi sigma tau
## 4.2 16.7 0.6 0.7 1.2
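One way to dig into the poorer mixing of psi and sigma under the Gibbs scheme would be to compare their trace plots under the two samplers. A quick sketch (plots not shown here):
## compare the mixing of psi under the two sampling schemes
par(mfrow = c(2, 1))
plot(samples_MH[, 'psi'],    type = 'l', ylab = 'psi', main = 'scalar MH')
plot(samples_gibbs[, 'psi'], type = 'l', ylab = 'psi', main = 'conjugate Gibbs for beta')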
In terms of raw runtime, the Gibbs updating of beta takes about twice as long as the scalar MH updating.
## runtimes in minutes
time_MH[3] / 60
## elapsed
## 30.33187
## runtimes in minutes
time_gibbs[3] / 60
## elapsed
## 65.88548
round(time_gibbs[3] / time_MH[3], 1)
## elapsed
## 2.2
eff_MH <- ess_MH / time_MH[3]
eff_gibbs <- ess_gibbs / time_gibbs[3]
eff <- round(eff_gibbs / eff_MH, 1)
eff
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 2.4 6.5 0.8 2.1 3.1 3.2 1.7 3.7
## beta[9] beta[10] psi sigma tau
## 1.9 7.7 0.3 0.3 0.5
Sampling of the beta[i] terms is generally two or more times as efficient with the Gibbs sampling. However, the covariance parameters are sampled less efficiently.
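As a compact summary of the per-coefficient gains, one could look at the distribution of the efficiency ratios across only the beta[i] terms (output omitted):
## distribution of Gibbs-vs-MH efficiency ratios across the beta[i] terms
summary(eff[grep('^beta', names(eff))])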
We’ll take a quick look at the posterior statistics, to make sure the results look ok.
samplesList <- list(MH = samples_MH, gibbs = samples_gibbs)
chainsPlot(samplesList, nrows=1)
We’ll also look at a larger model, to see how these results scale. This time we use \(N=6000\) observed locations, and \(50\) regression coefficients for the mean.
N <- 6000 ## number of observed locations
k <- 15 ## number of nearest neighbors to use
nx <- 50 ## number of parameters in mean regression, including intercept (>= 2)
sigma <- 3
tau <- 0.3
psi <- 20
set.seed(0)
(beta <- rnorm(nx, sd = 2))
## [1] 2.52590857 -0.65246672 2.65959853 2.54485864 0.82928287
## [6] -3.07990008 -1.85713407 -0.58944089 -0.01153435 4.80930678
## [11] 1.52718692 -1.59801850 -2.29531402 -0.57892315 -0.59843024
## [16] -0.82302167 0.50444690 -1.78384225 0.87136660 -2.47507684
## [21] -0.44853577 0.75479129 0.26667272 1.60837902 -0.11421355
## [26] 1.00721594 2.17153872 -1.38190768 -2.56919871 0.09345234
## [31] -0.47141311 -1.08577651 -0.86662063 -1.29894329 1.45350149
## [36] 2.30382351 1.98432073 -0.85902622 2.47660820 -0.55869256
## [41] 3.51580618 1.12149218 -0.90556795 -1.66408659 -2.33314109
## [46] -2.13118116 -3.12756410 2.31307399 1.66409426 -0.45465738
X <- array(0, c(N,nx))
X[,1] <- 1
for(i in 2:nx) X[,i] <- rnorm(N)
mu <- X %*% beta
s <- cbind(runif(N,0,200), runif(N,0,200))
dst <- unname(as.matrix(dist(s)))
C <- isoExpCov(dst, tau, sigma, psi)
z <- MASS::mvrnorm(1, mu, C)
input <- list(s=s, dst=dst, z=z)
nID <- determineNeighbors(input$dst, k)
constants <- list(nx = nx, N = N, k = k, dst = input$dst, nID = nID,
X = X, mu_beta = rep(0,nx), prec_beta = diag(nx)/10000)
data <- list(z = input$z)
inits <- list(beta = rep(0,nx), tau = 1, sigma = 1, psi = 1)
niter <- 5000 ## decrease to 5000 iterations
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
conf <- configureMCMC(Rmodel)
Again, we remove the RW_block sampler that’s placed by default on beta, and instead assign our NNGP_beta_gibbs sampler.
conf$removeSamplers('beta')
conf$addSampler('beta', 'NNGP_beta_gibbs')
conf$printSamplers()
## [1] RW sampler: tau
## [2] RW sampler: sigma
## [3] RW sampler: psi
## [4] NNGP_beta_gibbs sampler: beta
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
set.seed(0)
time_gibbs <- system.time(samples_gibbs <- runMCMC(Cmcmc, niter))
We again compare this sampling against scalar Metropolis-Hastings updating of each individual beta[i] coefficient.
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
conf <- configureMCMC(Rmodel, multivariateNodesAsScalars = TRUE)
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)
set.seed(0)
time_MH <- system.time(samples_MH <- runMCMC(Cmcmc, niter))
ess_MH <- apply(samples_MH, 2, effectiveSize)
round(ess_MH)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 955 3228 507 977 617 1618 4579 1054
## beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16]
## 467 371 1647 2075 551 1038 1084 854
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24]
## 877 634 603 1289 533 655 163 396
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32]
## 904 477 947 645 3764 357 558 621
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40]
## 808 244 1602 1160 777 177 426 1930
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48]
## 680 842 403 899 465 740 1856 2730
## beta[49] beta[50] psi sigma tau
## 413 190 3 3 162
ess_gibbs <- apply(samples_gibbs, 2, effectiveSize)
round(ess_gibbs)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 5000 5000 5000 2976 5000 4544 5000 5000
## beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16]
## 5074 4767 5000 5000 5000 3625 5254 6056
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24]
## 5260 4606 4412 5016 5000 4512 4235 4625
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32]
## 5000 3699 5000 2177 5000 5000 5000 5000
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40]
## 5000 5000 5000 4178 5000 2899 5000 5259
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48]
## 3170 3578 5000 4324 5000 5000 5000 5000
## beta[49] beta[50] psi sigma tau
## 4771 5000 16 17 669
Pleasingly, for a model of this scale, the raw ESS using the conjugate Gibbs updating is higher for every sampled parameter, including the covariance parameters, and several times higher for most.
round(ess_gibbs / ess_MH, 1)
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 5.2 1.5 9.9 3.0 8.1 2.8 1.1 4.7
## beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16]
## 10.9 12.9 3.0 2.4 9.1 3.5 4.8 7.1
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24]
## 6.0 7.3 7.3 3.9 9.4 6.9 26.0 11.7
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32]
## 5.5 7.8 5.3 3.4 1.3 14.0 9.0 8.0
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40]
## 6.2 20.5 3.1 3.6 6.4 16.4 11.7 2.7
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48]
## 4.7 4.2 12.4 4.8 10.7 6.8 2.7 1.8
## beta[49] beta[50] psi sigma tau
## 11.6 26.3 5.1 6.2 4.1
Runtime is now on the scale of hours. The conjugate Gibbs sampling takes about 3 times as long as scalar MH updating.
## runtimes in hours
time_MH[3] / 60 / 60
## elapsed
## 3.474308
## runtimes in hours
time_gibbs[3] / 60 / 60
## elapsed
## 10.95891
round(time_gibbs[3] / time_MH[3], 1)
## elapsed
## 3.2
Well, at the end of the day, the efficiencies are similar under each sampling method.
eff_MH <- ess_MH / time_MH[3]
eff_gibbs <- ess_gibbs / time_gibbs[3]
eff <- round(eff_gibbs / eff_MH, 1)
eff
## beta[1] beta[2] beta[3] beta[4] beta[5] beta[6] beta[7] beta[8]
## 1.7 0.5 3.1 1.0 2.6 0.9 0.3 1.5
## beta[9] beta[10] beta[11] beta[12] beta[13] beta[14] beta[15] beta[16]
## 3.4 4.1 1.0 0.8 2.9 1.1 1.5 2.2
## beta[17] beta[18] beta[19] beta[20] beta[21] beta[22] beta[23] beta[24]
## 1.9 2.3 2.3 1.2 3.0 2.2 8.2 3.7
## beta[25] beta[26] beta[27] beta[28] beta[29] beta[30] beta[31] beta[32]
## 1.8 2.5 1.7 1.1 0.4 4.4 2.8 2.6
## beta[33] beta[34] beta[35] beta[36] beta[37] beta[38] beta[39] beta[40]
## 2.0 6.5 1.0 1.1 2.0 5.2 3.7 0.9
## beta[41] beta[42] beta[43] beta[44] beta[45] beta[46] beta[47] beta[48]
## 1.5 1.3 3.9 1.5 3.4 2.1 0.9 0.6
## beta[49] beta[50] psi sigma tau
## 3.7 8.3 1.6 2.0 1.3
Once again, check the posterior distributions.
samplesList <- list(MH = samples_MH, gibbs = samples_gibbs)
chainsPlot(samplesList, c('beta','sigma','tau'), buffer.right=8, jitter=20, nrows=1)
We’ve compared scalar MH updating against conjugate (Gibbs) sampling for the regression parameters in the mean.
After taking into account runtime (which is much larger for the conjugate sampler), the final sampling efficiencies are reasonably similar under each method, even for the large model.
One thing that might be worth investigating further is the case where the beta[i] coefficients are themselves correlated in the posterior. In that case, the scalar MH sampling will struggle much more to achieve a high ESS, while the conjugate Gibbs sampling will do just fine, so by constructing such a model I think we could find cases where the conjugate sampling is much better. This could be done by introducing some degree of collinearity into the covariates (the columns of the design matrix X); in the design above, the covariates are entirely independent of one another. I have not experimented with this, but if we’re going to promote or use this method, it’s probably worth doing.
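For example, a simple way to induce such correlation would be to mix a shared latent factor into each covariate when simulating X. A sketch, reusing N and nx from above, with rho an illustrative knob controlling the degree of collinearity:
## Sketch: generate collinear covariates by mixing in a shared latent factor.
## rho = 0 reproduces the independent design used above.
rho <- 0.8
shared <- rnorm(N)
X <- array(0, c(N, nx))
X[,1] <- 1
for(i in 2:nx) X[,i] <- rho * shared + sqrt(1 - rho^2) * rnorm(N)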