library(nimble)
library(MASS)
library(coda)
library(basicMCMCplots)   ## assumed here to be the source of chainsPlot(), used below
nimbleOptions(verbose = FALSE)
The file code/nngp.R contains the code implementing the NNGP calculations in NIMBLE. The code is available here: nngp.R. It's worth looking over this file to get an idea of what's going on.
source('../code/nngp.R')
Now we'll look at models on three different scales, and compare the posterior results and MCMC efficiencies of both NNGP and full GP models.
First, generate a very small dataset, with N=6 observed spatial locations, and k=3 nearest neighbors.
N <- 6
k <- 3
## uses default values: sigma = tau = psi = 1
## returns a list containing response (z), locations (s), and distance matrix (dst)
input <- generateData(N, seed = 0)
input
## $s
## [,1] [,2]
## [1,] 0.8966972 0.89838968
## [2,] 0.2655087 0.94467527
## [3,] 0.3721239 0.66079779
## [4,] 0.5728534 0.62911404
## [5,] 0.9082078 0.06178627
## [6,] 0.2016819 0.20597457
##
## $dst
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.0000000 0.6328833 0.5758707 0.4211700 0.8366826 0.9810632
## [2,] 0.6328833 0.0000000 0.3032379 0.4404993 1.0920417 0.7414530
## [3,] 0.5758707 0.3032379 0.0000000 0.2032146 0.8038661 0.4857104
## [4,] 0.4211700 0.4404993 0.2032146 0.0000000 0.6590322 0.5628634
## [5,] 0.8366826 1.0920417 0.8038661 0.6590322 0.0000000 0.7210888
## [6,] 0.9810632 0.7414530 0.4857104 0.5628634 0.7210888 0.0000000
##
## $z
## [1] -0.5931648 2.7314847 0.2834293 0.7694407 2.2904973 -0.3163101
## matrix of "neighbor IDs", with one row per location and k columns.
## note the first k rows are left as NA, as
## these would contain *fewer* than k neighbor indices, specifically:
## row 1: [nothing]
## row 2: 1
## row 3: 1 2
nID <- determineNeighbors(input$dst, k)
nID
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
## [3,] NA NA NA
## [4,] 1 2 3
## [5,] 4 3 1
## [6,] 3 4 5
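As an aside, here's a minimal sketch of the neighbor-selection logic this output suggests (an assumption about what determineNeighbors() does; the actual implementation in nngp.R may differ): for each location i > k, take the indices of the k nearest locations among those with a smaller index.
## sketch only (assumed logic, not the actual determineNeighbors()):
## row i holds the indices of the k nearest locations among 1, ..., i-1
determineNeighborsSketch <- function(dst, k) {
    N <- nrow(dst)
    nID <- matrix(NA, nrow = N, ncol = k)
    for(i in (k+1):N)
        nID[i, ] <- order(dst[i, 1:(i-1)])[1:k]
    nID
}
## on this small example, it reproduces the matrix above:
## determineNeighborsSketch(input$dst, k)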
## calculation of CtildeInverse, the sparse precision matrix
## (just to test things, and take a look)
C <- diag(N) + isoExpCov(input$dst, 1, 1)
A <- calculateA(C, nID, N, k)
D <- calculateD(C, A, nID, N, k)
CtildeInverse <- t(diag(N)-A) %*% diag(1/D) %*% (diag(N)-A)
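For reference, isoExpCov() produces an isotropic exponential covariance over the distance matrix. A minimal sketch of one common parameterization follows; this is an assumption only, and the actual function in nngp.R may differ (e.g., using sigma rather than sigma^2, or exp(-psi*d)).
## assumed parameterization only -- see nngp.R for the real isoExpCov()
isoExpCovSketch <- function(dst, sigma, psi)  sigma^2 * exp(-dst / psi)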
## examine sparsity
(CtildeInverse == 0)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE FALSE TRUE TRUE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE TRUE FALSE FALSE FALSE FALSE
## [6,] TRUE TRUE FALSE FALSE FALSE FALSE
## does Ctilde approximate C (the true covariance)?
round(C - solve(CtildeInverse), 3)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.000 0.000 0 0 0.000 0.086
## [2,] 0.000 0.000 0 0 0.071 0.179
## [3,] 0.000 0.000 0 0 0.000 0.000
## [4,] 0.000 0.000 0 0 0.000 0.000
## [5,] 0.000 0.071 0 0 0.000 0.000
## [6,] 0.086 0.179 0 0 0.000 0.000
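As a further check (not part of the original analysis), we can compare the exact Gaussian log-density of z under C with the log-density under the NNGP approximation, using only dense linear algebra and taking the mean to be zero for the comparison. The two values should be close but not identical.
## mean-zero Gaussian log-density computed from a precision matrix Q
gaussLogDens <- function(z, Q) {
    as.numeric(0.5 * determinant(Q, logarithm = TRUE)$modulus -
               0.5 * t(z) %*% Q %*% z -
               0.5 * length(z) * log(2*pi))
}
gaussLogDens(input$z, solve(C))        ## exact, from the full covariance C
gaussLogDens(input$z, CtildeInverse)   ## NNGP approximation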
Next, fit NIMBLE models to this small dataset. We'll fit two models, NNGP and full GP, and compare results:
## compilation flag: set to TRUE to compile and run the MCMC below (FALSE speeds up testing of the model-building code)
compile <- FALSE
## NNGP
code_NNGP <- nimbleCode({
mu ~ dnorm(0, sd = 10000)
tau ~ dunif(0, 10000)
sigma ~ dunif(0, 10000)
psi ~ dunif(0, 10000)
mean[1:N] <- ones[1:N] * mu
C[1:N,1:N] <- tau^2 * diag(N) + isoExpCov(dst[1:N,1:N], sigma, psi)
A[1:N,1:N] <- calculateA(C[1:N,1:N], nID[1:N,1:k], N, k)
D[1:N] <- calculateD(C[1:N,1:N], A[1:N,1:N], nID[1:N,1:k], N, k)
z[1:N] ~ dmnorm_nn(mean[1:N], A[1:N,1:N], D[1:N], nID[1:N,1:k], N, k)
})
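For intuition, here's a sketch of the density that dmnorm_nn presumably evaluates (an assumption based on how A and D are constructed above; the actual nimbleFunction in nngp.R may differ in details, e.g. in how the first k locations are handled): the joint density factors into univariate conditionals, each location given its (at most k) previously-indexed neighbors, which avoids ever factorizing the full N x N covariance.
## sketch of the assumed NNGP density:
##   z[i] | z[nbrs] ~ N(mean[i] + A[i, nbrs] %*% (z[nbrs] - mean[nbrs]), D[i])
## for i <= k the neighbor set is assumed to be all earlier locations
dmnormNNSketch <- function(z, mean, A, D, nID, N, k) {
    lp <- dnorm(z[1], mean[1], sqrt(D[1]), log = TRUE)
    for(i in 2:N) {
        nbrs <- if(i <= k) 1:(i-1) else nID[i, ]
        condMean <- mean[i] + sum(A[i, nbrs] * (z[nbrs] - mean[nbrs]))
        lp <- lp + dnorm(z[i], condMean, sqrt(D[i]), log = TRUE)
    }
    lp
}
## with the small example's A, D, and nID from above (mean taken as zero),
## this should agree with the dense calculation based on CtildeInverse:
## dmnormNNSketch(input$z, rep(0, N), A, D, nID, N, k)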
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
Rmodel$calculate() ## -48.95777
## [1] -48.95777
conf <- configureMCMC(Rmodel)
conf$printSamplers()
## [1] RW sampler: mu
## [2] RW sampler: tau
## [3] RW sampler: sigma
## [4] RW sampler: psi
Rmcmc <- buildMCMC(conf)
if(compile) {
Cmodel <- compileNimble(Rmodel)
Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
}
## full GP
code_GP <- nimbleCode({
mu ~ dnorm(0, sd = 10000)
tau ~ dunif(0, 10000)
sigma ~ dunif(0, 10000)
psi ~ dunif(0, 10000)
mean[1:N] <- ones[1:N] * mu
C[1:N,1:N] <- tau^2 * diag(N) + isoExpCov(dst[1:N,1:N], sigma, psi)
z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
})
Rmodel <- nimbleModel(code_GP, constants, data, inits)
Rmodel$calculate() ## -48.92551
## [1] -48.92551
conf <- configureMCMC(Rmodel)
conf$printSamplers()
## [1] RW sampler: mu
## [2] RW sampler: tau
## [3] RW sampler: sigma
## [4] RW sampler: psi
Rmcmc <- buildMCMC(conf)
if(compile) {
Cmodel <- compileNimble(Rmodel)
Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}
niter <- 100000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))
## warning: value of right hand side only node not initialized
## user system elapsed
## 3.157 0.023 3.187
set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))
## user system elapsed
## 1.365 0.022 1.399
## summary of the NNGP samples
samplesSummary(samples_NNGP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu 2.094706 1.082889 231.543340 -462.38262495 467.888255
## psi 5961.676452 6278.196730 2675.736885 658.55067492 9847.259393
## sigma 175.508002 139.877666 165.837622 6.35575582 586.079752
## tau 1.562362 1.274304 1.384633 0.06995716 5.110817
## summary of the full GP samples
samplesSummary(samples_GP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu -0.189069 0.8119596 209.596021 -440.60098167 423.72431
## psi 5975.479032 6306.6749057 2644.743897 713.26464054 9830.06770
## sigma 157.023183 122.4253598 144.046858 4.75236899 531.67099
## tau 1.674113 1.4242125 1.291536 0.09814136 4.98887
## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
Well, the NNGP and full GP posteriors agree very nicely with each other, but neither agrees (at all!) with the original data-generating parameter values, σ=τ=ψ=1. That's probably fine, though: with so few data points on such a small spatial domain (the unit square), we'd expect poor inferences.
Next we'll try a larger model (more observed spatial locations) with different parameter values, and see whether the inferences improve.
Let's use N=500 observed spatial locations, with k=30 nearest neighbors. We'll also change the data-generating values of σ, τ, and ψ.
One other change to note: the smax argument to generateData() specifies the size of the spatial domain, i.e., the x and y coordinates of each spatial location are drawn from a Uniform(0, smax) distribution. We'll increase the size of the spatial domain, allowing for better inference of the process parameters.
N <- 500
k <- 30
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=200, seed = 0)
nID <- determineNeighbors(input$dst, k)
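A couple of quick sanity checks (not part of the original analysis): the coordinates should now span roughly (0, smax) = (0, 200), and every neighbor index should point to an earlier location.
range(input$s)                      ## roughly 0 to 200
all(nID[(k+1):N, ] < (k+1):N)       ## each neighbor precedes its location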
Once again, fit two NIMBLE models to this larger dataset.
## compilation flag: set to TRUE to compile and run the MCMC below (FALSE speeds up testing)
compile <- FALSE
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
Rmodel$calculate() ## -1860.234
## [1] -1860.234
conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
if(compile) {
Cmodel <- compileNimble(Rmodel)
Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
}
Rmodel <- nimbleModel(code_GP, constants, data, inits)
Rmodel$calculate() ## -1860.234
## [1] -1860.234
conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
if(compile) {
Cmodel <- compileNimble(Rmodel)
Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}
niter <- 1000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))
## warning: value of right hand side only node not initialized
## user system elapsed
## 62.582 0.329 64.007
set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))
## user system elapsed
## 36.358 0.174 36.610
## summary of the NNGP samples
samplesSummary(samples_NNGP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu 0.9313985 0.7740421 1.1128774 -1.05639275 3.4354777
## psi 35.9365518 35.8981841 12.8791162 11.47730365 59.7584994
## sigma 3.4496640 3.4349119 0.5161769 2.55954590 4.3412616
## tau 0.6730062 0.6978732 0.1886914 0.08115123 0.9292961
## summary of the full GP samples
samplesSummary(samples_GP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu 0.7475644 0.6928317 1.0842367 -1.25378452 3.0541685
## psi 34.6954020 33.1688094 14.1310722 11.47730365 63.6774312
## sigma 3.3946645 3.3018767 0.5549213 2.55954590 4.5449963
## tau 0.6629407 0.6837470 0.1913380 0.08115123 0.9213892
## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
With the locations spread out a little more on a larger spatial domain (smax = 200), the posteriors now look reasonable. I'm pretty happy with this.
The issue, still, is that the full GP runs faster than the NNGP! That's not the idea. So we'll try one larger model yet, in the hope that the NNGP comes out faster (while still giving good inferences).
Finally, we'll use N=2000 observed spatial locations, still with k=30 nearest neighbors.
We'll use the same values of σ, τ, and ψ as in the medium (N=500) model.
N <- 2000
k <- 30
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=200, seed = 0)
nID <- determineNeighbors(input$dst, k)
## compilation flag: set to TRUE to compile and run the MCMC below (FALSE speeds up testing)
compile <- FALSE
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
if(compile) {
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
Rmodel$calculate() ## -6177.374
conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
Rmodel <- nimbleModel(code_GP, constants, data, inits)
Rmodel$calculate() ## -6177.374
conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}
niter <- 1000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))
## warning: value of right hand side only node not initialized
## user system elapsed
## 601.308 151.758 755.933
set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))
## user system elapsed
## 1188.616 97.697 1302.240
## summary of the NNGP samples
samplesSummary(samples_NNGP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu -0.3904708 -0.349897 0.7146794 -1.9633331 1.0478482
## psi 21.6081871 22.817767 5.8745989 10.4968802 30.3245971
## sigma 2.9680546 3.067778 0.4015883 2.1737222 3.5453830
## tau 0.2856242 0.284487 0.1103727 0.1529693 0.3754883
## summary of the full GP samples
samplesSummary(samples_GP)
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu -0.5243473 -0.5177156 1.0544506 -2.7765457 1.6865939
## psi 29.6612137 31.1243945 12.2875024 12.0987139 51.4925509
## sigma 3.3934552 3.5947894 0.7123488 2.3180169 4.4115002
## tau 0.3048939 0.3082842 0.1049524 0.1841297 0.4033227
## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)
chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)
Now that the NNGP comes out faster, let's compare the effective sample size (ESS) and the efficiency (ESS/runtime) of each model.
Raw effective sample size:
apply(samples_NNGP, 2, effectiveSize)
## mu psi sigma tau
## 166.043352 5.508811 4.877045 172.974832
apply(samples_GP, 2, effectiveSize)
## mu psi sigma tau
## 270.877761 2.219826 2.466830 188.833840
Efficiency (number of effectively independent posterior samples generated per second of algorithm runtime):
apply(samples_NNGP, 2, effectiveSize) / 751
## mu psi sigma tau
## 0.221096341 0.007335301 0.006494068 0.230326008
apply(samples_GP, 2, effectiveSize) / 1438
## mu psi sigma tau
## 0.188371183 0.001543690 0.001715459 0.131316996
Finally, the ratio of efficiencies, NNGP/GP: that is, how many times more efficient is the NNGP relative to the full GP model?
round((apply(samples_NNGP, 2, effectiveSize) / 751) / (apply(samples_GP, 2, effectiveSize) / 1438), 2)
## mu psi sigma tau
## 1.17 4.75 3.79 1.75
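A small refinement worth noting (a sketch of an assumed alternative, not how the results above were produced): store the timings returned by system.time() and use them directly, rather than hardcoding the elapsed seconds (751 and 1438) in the efficiency calculations.
## sketch: capture runtimes, then compute efficiencies and their ratio
time_NNGP <- system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))
time_GP   <- system.time(samples_GP   <- runMCMC(Cmcmc_GP,   niter, progressBar = FALSE))
eff_NNGP <- apply(samples_NNGP, 2, effectiveSize) / time_NNGP['elapsed']
eff_GP   <- apply(samples_GP,   2, effectiveSize) / time_GP['elapsed']
round(eff_NNGP / eff_GP, 2)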
We've looked at NNGP and full GP process models, applied at three different scales (N = 6, 500, and 2000). In each case the inferences from the two models agree, giving confidence in the implementation of the NNGP calculations. Note that the posteriors for the smallest model (N=6) are extremely broad and inconclusive, which is to be expected with so few data points.
For the two smaller models (N=6 and N=500) the full GP model is actually faster. However, at the scale of N=2000 observations, the NNGP model runs in roughly 60% of the time of the full GP (about 756 vs. 1302 elapsed seconds above), and we'd expect this gap to grow rapidly with larger values of N. It would be interesting to look at curves comparing the MCMC runtimes of each model as a function of N; a sketch of one way to set that up follows.
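The sketch below uses only functions already used above; the choice of N values and the number of timing iterations is purely illustrative, and compiling each model will itself take a while.
## sketch: time a short compiled MCMC run for one model code and one value of N
timeOneModel <- function(code, N, k = 30, nTimingIter = 100) {
    input <- generateData(N, sigma = 3, tau = .3, psi = 20, smax = 200, seed = 0)
    nID <- determineNeighbors(input$dst, k)
    constants <- list(N = N, k = k, ones = rep(1, N), dst = input$dst, nID = nID)
    data <- list(z = input$z)
    inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
    Rmodel <- nimbleModel(code, constants, data, inits)
    Cmodel <- compileNimble(Rmodel)
    Cmcmc <- compileNimble(buildMCMC(configureMCMC(Rmodel)), project = Rmodel)
    as.numeric(system.time(runMCMC(Cmcmc, nTimingIter, progressBar = FALSE))['elapsed'])
}
Nvals <- c(500, 1000, 2000, 4000)   ## illustrative choices
times_NNGP <- sapply(Nvals, function(N) timeOneModel(code_NNGP, N))
times_GP   <- sapply(Nvals, function(N) timeOneModel(code_GP, N))
plot(Nvals, times_GP, type = 'b', xlab = 'N',
     ylab = 'seconds per 100 MCMC iterations',
     ylim = range(c(times_GP, times_NNGP)))
lines(Nvals, times_NNGP, type = 'b', col = 'red')
legend('topleft', legend = c('full GP', 'NNGP'), col = c('black', 'red'), lty = 1)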
More important than the raw runtime, we looked at the efficiency of each model: the number of effectively independent posterior samples generated per second of MCMC runtime. Looking only at the largest model, the covariance parameters (σ, τ, and ψ) were sampled roughly 2 to 5 times more efficiently under the NNGP than under the full GP model. For the process mean μ, which I wouldn't expect to behave differently between the NNGP and GP models, the efficiencies are similar. I'd expect the gains in efficiency to increase further with larger values of N, although I haven't tried anything larger yet.
There’s certainly a trade-off in selecting the number of nearest-neighbors (k) for a given value of N, which we haven’t explored at all. Mark, is there any literature or guidance on selecting a “good” value of k?
This seems like a promising starting point for more complex models. Mark, any thoughts on a larger (perhaps real) dataset, ideally with known results, that we might apply this to?