Back to Index
library(nimble)
library(MASS)
library(coda)
nimbleOptions(verbose = FALSE)The file code/nngp.R contains code implementing NNGP calculations in NIMBLE.
The code is available here: nngp.R. It’s probably worth looking over this code file, to get an idea of what’s going on.
source('../code/nngp.R')Now we’ll look at models on three different scales, and compare posterior results and MCMC efficienies of both NNGP and full GP models.
First, generate a very small dataset, with \(N=6\) observed spatial locations, and \(k=3\) nearest neighbors.
N <- 6
k <- 3
 
## uses default values: sigma = tau = psi = 1
## returns a list containing response (z), locations (s), and distance matrix (dst)
input <- generateData(N, seed = 0)
input## $s
##           [,1]       [,2]
## [1,] 0.8966972 0.89838968
## [2,] 0.2655087 0.94467527
## [3,] 0.3721239 0.66079779
## [4,] 0.5728534 0.62911404
## [5,] 0.9082078 0.06178627
## [6,] 0.2016819 0.20597457
## 
## $dst
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.0000000 0.6328833 0.5758707 0.4211700 0.8366826 0.9810632
## [2,] 0.6328833 0.0000000 0.3032379 0.4404993 1.0920417 0.7414530
## [3,] 0.5758707 0.3032379 0.0000000 0.2032146 0.8038661 0.4857104
## [4,] 0.4211700 0.4404993 0.2032146 0.0000000 0.6590322 0.5628634
## [5,] 0.8366826 1.0920417 0.8038661 0.6590322 0.0000000 0.7210888
## [6,] 0.9810632 0.7414530 0.4857104 0.5628634 0.7210888 0.0000000
## 
## $z
## [1] -0.5931648  2.7314847  0.2834293  0.7694407  2.2904973 -0.3163101## list of the "neighbor IDs".
## note the first k rows are left as NA, as
## these would contain *fewer* than k neighbor indices, specifically:
## row 1: [nothing]
## row 2: 1
## row 3: 1 2
nID <- determineNeighbors(input$dst, k)
nID##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
## [3,]   NA   NA   NA
## [4,]    1    2    3
## [5,]    4    3    1
## [6,]    3    4    5## calculation of CtildeInverse, the sparse precision matrix
## (just to test things, and take a look)
C <- diag(N) + isoExpCov(input$dst, 1, 1)
A <- calculateA(C, nID, N, k)
D <- calculateD(C, A, nID, N, k)
CtildeInverse <- t(diag(N)-A) %*% diag(1/D) %*% (diag(N)-A)
## examine sparsity
(CtildeInverse == 0)##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,] FALSE FALSE FALSE FALSE FALSE  TRUE
## [2,] FALSE FALSE FALSE FALSE  TRUE  TRUE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE  TRUE FALSE FALSE FALSE FALSE
## [6,]  TRUE  TRUE FALSE FALSE FALSE FALSE## does Ctilde approximate C (the true covariance)?
round(C - solve(CtildeInverse), 3)##       [,1]  [,2] [,3] [,4]  [,5]  [,6]
## [1,] 0.000 0.000    0    0 0.000 0.086
## [2,] 0.000 0.000    0    0 0.071 0.179
## [3,] 0.000 0.000    0    0 0.000 0.000
## [4,] 0.000 0.000    0    0 0.000 0.000
## [5,] 0.000 0.071    0    0 0.000 0.000
## [6,] 0.086 0.179    0    0 0.000 0.000Fit a NIMBLE model to this small dataset.
We’ll fit two models, and compare results:
## used to speed up testing
compile <- FALSE
## NNGP
code_NNGP <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:N] <- ones[1:N] * mu
    C[1:N,1:N] <- tau^2 * diag(N) + isoExpCov(dst[1:N,1:N], sigma, psi)
    A[1:N,1:N] <- calculateA(C[1:N,1:N], nID[1:N,1:k], N, k)
    D[1:N] <- calculateD(C[1:N,1:N], A[1:N,1:N], nID[1:N,1:k], N, k)
    z[1:N] ~ dmnorm_nn(mean[1:N], A[1:N,1:N], D[1:N], nID[1:N,1:k], N, k)
})
 
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
 
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
Rmodel$calculate()   ## -48.95777## [1] -48.95777conf <- configureMCMC(Rmodel)
conf$printSamplers()## [1] RW sampler: mu
## [2] RW sampler: tau
## [3] RW sampler: sigma
## [4] RW sampler: psiRmcmc <- buildMCMC(conf)
if(compile) {
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
}
## full GP
code_GP <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:N] <- ones[1:N] * mu
    C[1:N,1:N] <- tau^2 * diag(N) + isoExpCov(dst[1:N,1:N], sigma, psi)
    z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
})
 
Rmodel <- nimbleModel(code_GP, constants, data, inits)
Rmodel$calculate()   ## -48.92551## [1] -48.92551conf <- configureMCMC(Rmodel)
conf$printSamplers()## [1] RW sampler: mu
## [2] RW sampler: tau
## [3] RW sampler: sigma
## [4] RW sampler: psiRmcmc <- buildMCMC(conf)
if(compile) {
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}niter <- 100000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))## warning: value of right hand side only node not initialized##    user  system elapsed 
##   3.157   0.023   3.187set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))##    user  system elapsed 
##   1.365   0.022   1.399## summary of the NNGP samples
samplesSummary(samples_NNGP)##              Mean      Median     St.Dev.     95%CI_low   95%CI_upp
## mu       2.094706    1.082889  231.543340 -462.38262495  467.888255
## psi   5961.676452 6278.196730 2675.736885  658.55067492 9847.259393
## sigma  175.508002  139.877666  165.837622    6.35575582  586.079752
## tau      1.562362    1.274304    1.384633    0.06995716    5.110817## summary of the full GP samples
samplesSummary(samples_GP)##              Mean       Median     St.Dev.     95%CI_low  95%CI_upp
## mu      -0.189069    0.8119596  209.596021 -440.60098167  423.72431
## psi   5975.479032 6306.6749057 2644.743897  713.26464054 9830.06770
## sigma  157.023183  122.4253598  144.046858    4.75236899  531.67099
## tau      1.674113    1.4242125    1.291536    0.09814136    4.98887## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)Well, the NNGP and GP models agree very nicely with each other, but the posteriors don’t agree well (at all!) with the original data-generating parameter values. Specifically, \(\sigma = \tau = \psi = 1\). This is probably fine, though, with so few data points on such a small spatial scale (on the unit square), giving poor inferences.
Next we try a larger model (more observed spatial locations), using different parameter values, and see
Let’s try \(N=500\) observed spatial locations, with \(k=30\) nearest-neighbors.
We’ll also change the data-generating values of \(\sigma\), \(\tau\), and \(\psi\).
Also, another change to note: smax argument to generateData() specifies the size of the spatial grid, i.e. the \(x\) and \(y\) coordinates of each spatial location are drawn from a Uniform(0, smax) distribution. We’ll increase the size of the spatial grid, allowing for better inferences of the process parameters.
N <- 500
k <- 30
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=200, seed = 0)
nID <- determineNeighbors(input$dst, k)Once again, fit two NIMBLE models to this larger dataset.
## used to speed up testing
compile <- FALSE
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
 
Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
Rmodel$calculate()   ## -1860.234## [1] -1860.234conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
if(compile) {
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
}
Rmodel <- nimbleModel(code_GP, constants, data, inits)
Rmodel$calculate()   ## -1860.234## [1] -1860.234conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
if(compile) {
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}niter <- 1000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))## warning: value of right hand side only node not initialized##    user  system elapsed 
##  62.582   0.329  64.007set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))##    user  system elapsed 
##  36.358   0.174  36.610## summary of the NNGP samples
samplesSummary(samples_NNGP)##             Mean     Median    St.Dev.   95%CI_low  95%CI_upp
## mu     0.9313985  0.7740421  1.1128774 -1.05639275  3.4354777
## psi   35.9365518 35.8981841 12.8791162 11.47730365 59.7584994
## sigma  3.4496640  3.4349119  0.5161769  2.55954590  4.3412616
## tau    0.6730062  0.6978732  0.1886914  0.08115123  0.9292961## summary of the full GP samples
samplesSummary(samples_GP)##             Mean     Median    St.Dev.   95%CI_low  95%CI_upp
## mu     0.7475644  0.6928317  1.0842367 -1.25378452  3.0541685
## psi   34.6954020 33.1688094 14.1310722 11.47730365 63.6774312
## sigma  3.3946645  3.3018767  0.5549213  2.55954590  4.5449963
## tau    0.6629407  0.6837470  0.1913380  0.08115123  0.9213892## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)With the locations spread out a little more on a larger spatial grid (smax = 200), the posteriors now look reasonable. I’m pretty happy with this.
The issue, still, is that the full GP runs faster than the NNGP! That’s not the idea…. So, we’ll try one larger model yet, in hopes of the NNGP being faster (and having good inferences).
Finally, using \(N=2000\) observed spatial locations, still with \(k=30\) nearest-neighbors.
We’ll use the same values of \(\sigma\), \(\tau\), and \(\psi\) as the Medium model.
N <- 2000
k <- 30
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=200, seed = 0)
nID <- determineNeighbors(input$dst, k)## used to speed up testing
compile <- FALSE
constants <- list(N = N, k = k, ones = rep(1,N), dst = input$dst, nID = nID)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
if(compile) {
    Rmodel <- nimbleModel(code_NNGP, constants, data, inits)
    Rmodel$calculate()   ## -6177.374
    
    conf <- configureMCMC(Rmodel)
    Rmcmc <- buildMCMC(conf)
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_NNGP <- compileNimble(Rmcmc, project = Rmodel)
    Rmodel <- nimbleModel(code_GP, constants, data, inits)
    Rmodel$calculate()   ## -6177.374
    conf <- configureMCMC(Rmodel)
    Rmcmc <- buildMCMC(conf)
    Cmodel <- compileNimble(Rmodel)
    Cmcmc_GP <- compileNimble(Rmcmc, project = Rmodel)
}niter <- 1000
set.seed(0)
system.time(samples_NNGP <- runMCMC(Cmcmc_NNGP, niter, progressBar = FALSE))## warning: value of right hand side only node not initialized##    user  system elapsed 
## 601.308 151.758 755.933set.seed(0)
system.time(samples_GP <- runMCMC(Cmcmc_GP, niter, progressBar = FALSE))##     user   system  elapsed 
## 1188.616   97.697 1302.240## summary of the NNGP samples
samplesSummary(samples_NNGP)##             Mean    Median   St.Dev.  95%CI_low  95%CI_upp
## mu    -0.3904708 -0.349897 0.7146794 -1.9633331  1.0478482
## psi   21.6081871 22.817767 5.8745989 10.4968802 30.3245971
## sigma  2.9680546  3.067778 0.4015883  2.1737222  3.5453830
## tau    0.2856242  0.284487 0.1103727  0.1529693  0.3754883## summary of the full GP samples
samplesSummary(samples_GP)##             Mean     Median    St.Dev.  95%CI_low  95%CI_upp
## mu    -0.5243473 -0.5177156  1.0544506 -2.7765457  1.6865939
## psi   29.6612137 31.1243945 12.2875024 12.0987139 51.4925509
## sigma  3.3934552  3.5947894  0.7123488  2.3180169  4.4115002
## tau    0.3048939  0.3082842  0.1049524  0.1841297  0.4033227## quick graphical look at posterior medians and CIs
samplesList <- list(NNGP = samples_NNGP, GP = samples_GP)
chainsPlot(samplesList, 'psi', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, c('mu', 'sigma'), nrows=1, jitter=.1, buffer.left=.3, buffer.right=.3)chainsPlot(samplesList, 'tau', nrows=1, jitter=.03, buffer.left=.3, buffer.right=.3)Now that the NNGP was faster, let’s compare the effective sample size (ESS), and the efficiency (ESS/runtime) of each model.
Raw effective sample size:
apply(samples_NNGP, 2, effectiveSize)##         mu        psi      sigma        tau 
## 166.043352   5.508811   4.877045 172.974832apply(samples_GP, 2, effectiveSize)##         mu        psi      sigma        tau 
## 270.877761   2.219826   2.466830 188.833840Efficiency (number of effectively independent posterior samples generated per second of algorithm runtime):
apply(samples_NNGP, 2, effectiveSize) / 751##          mu         psi       sigma         tau 
## 0.221096341 0.007335301 0.006494068 0.230326008apply(samples_GP, 2, effectiveSize) / 1438##          mu         psi       sigma         tau 
## 0.188371183 0.001543690 0.001715459 0.131316996Finally, the ratio of efficienies of NNGP/NP. That is, how many times more efficient is the NNGP relative to the full GP model?
round((apply(samples_NNGP, 2, effectiveSize) / 751) / (apply(samples_GP, 2, effectiveSize) / 1438), 2)##    mu   psi sigma   tau 
##  1.17  4.75  3.79  1.75
We’ve looked at NNGP full GP process models, applied to models of 3 different scales (\(N = 6, 500, 2000\)). In each case inferenes from the two models agree, giving confidence in the implementation of the NNGP calculations. Note that the posteriors for the smallest model (\(N=6\)) are extremely broad and inconclusive, which is probably consistent with so few data points.
For the smaller two models (\(N = 6\) and \(500\)) the GP model is faster. However, at the scale of \(N = 2000\) observations, the NNGP model takes less than half the time of the full GP, and we’d expect this difference to increase rapidly with larger values of \(N\). It would be interesting to look at curves comparing the MCMC runtimes of each model, as a function of \(N\).
More important the raw runtime, we looked at the efficiency of each model, in terms of effectively independent posterior samples generated per second of MCMC runtime. Only looking at the largest model, the GP covariance parameters (\(\sigma\), \(\tau\) and \(\psi\)) were sampled between 2 and 5 times more efficiently using the NNGP, relative to the full GP model. For the process mean \(\mu\), which I wouldn’t expect to be different between the NNGP and GP models, the efficiency is similar. I would expect gains in efficiency to increase further with larger values of \(N\), although I haven’t tried anything larger yet.
There’s certainly a trade-off in selecting the number of nearest-neighbors (\(k\)) for a given value of \(N\), which we haven’t explored at all. Mark, is there any literature or guidance on selecting a “good” value of \(k\)?
This seems like a promising starting point for more complex models. Mark, any thoughts on a larger (perhaps real) dataset, ideally with known results, that we might apply this to?