library(nimble)
library(coda)
nimbleOptions(verbose = FALSE)
source('../bayes_nsgp/source_nimble.R')
source('../code/nngp4.R')
We’ll use the same precipitation data and model that you used in the FNS_M2_implementation
example.
COprecip <- read.csv('../fullGP_nonstationary/Final_data.csv')
coords <- as.matrix(COprecip[,c('Longitude', 'Latitude')])
dist_list <- ns_dist(coords)   # componentwise squared distances and cross-distances for the anisotropic covariance
Xmat <- unname(lm(logPrecip ~ Zelevation*Zslope10, x = TRUE, data = COprecip)$x)   # design matrix: intercept, elevation, slope, and their interaction
p <- dim(Xmat)[2]
z <- COprecip$logPrecip
constants <- list(X_tau = Xmat, X_sigma = Xmat, X_Sigma = Xmat, X_mu = Xmat,
p_tau = p, p_sigma = p, p_Sigma = p, p_mu = p,
dist1_sq = dist_list$dist1_sq, dist2_sq = dist_list$dist2_sq, dist12 = dist_list$dist12)
niter <- 1000
First, we’ll use the Full GP likelihood.
Rmodel <- nsgp_model(likelihood = 'fullGP', constants = constants, z = z)
system.time(samples <- nimbleMCMC(model = Rmodel, niter = niter, progressBar = FALSE, setSeed = TRUE))
## user system elapsed
## 193.528 2.206 197.386
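As a quick sanity check on the output (a small sketch; with these arguments, nimbleMCMC returns a matrix of posterior draws), the samples can be summarized via coda:
summary(as.mcmc(samples))   # posterior means, SDs, and quantiles per parameter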
samplesList <- list(fullGP = samples)
Next, we’ll use the NNGP likelihood approximation, using \(k=15\) nearest neighbors.
k <- 15
dst <- unname(as.matrix(dist(coords)))   # dense matrix of pairwise Euclidean distances between sites
nID <- determineNeighbors(dst, k)        # neighbor index sets defining the NNGP conditioning structure
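# Optional sanity check (assuming determineNeighbors() returns one row of
# neighbor indices per location):
dim(nID)
head(nID)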
Rmodel <- nsgp_model(likelihood = 'NNGP', constants = constants, z = z, k = k, nID = nID)
system.time(samples <- nimbleMCMC(model = Rmodel, niter = niter, progressBar = FALSE, setSeed = TRUE))
## user system elapsed
## 215.647 1.169 217.800
samplesList$NNGP <- samples
The posterior distributions show essentially perfect agreement between the fullGP and NNGP likelihoods.
chainsPlot(samplesList)
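Beyond the visual check, here is a quick numeric comparison of the posterior means (a small sketch; both elements of samplesList are matrices of draws with one column per parameter):
round(cbind(fullGP = colMeans(samplesList$fullGP),
NNGP = colMeans(samplesList$NNGP)), 3)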
Effective sample sizes, however, are noticeably lower under the NNGP likelihood than under the full GP for this run.
do.call('cbind', lapply(samplesList, function(x) apply(x, 2, effectiveSize)))
## fullGP NNGP
## alpha[1] 76.76478 16.39946
## alpha[2] 67.27624 35.43334
## alpha[3] 108.17518 60.20091
## alpha[4] 151.36318 52.22122
## beta[1] 46.15630 22.44975
## beta[2] 104.86872 83.12268
## beta[3] 160.78625 26.28199
## beta[4] 124.51320 61.09206
## delta[1] 137.87157 36.64646
## delta[2] 138.38503 68.34985
## delta[3] 179.75589 81.25906
## delta[4] 143.96016 34.97568
## gamma1[1] 196.47077 82.01591
## gamma1[2] 209.14354 29.60924
## gamma1[3] 211.17410 77.49607
## gamma1[4] 207.97081 44.28948
## gamma2[1] 279.74353 18.61498
## gamma2[2] 151.25245 99.87818
## gamma2[3] 216.82321 47.50818
## gamma2[4] 225.74655 62.42767
## psi11 108.17428 25.63554
## psi22 159.83727 56.50624
## rho 212.84515 62.13015
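To quantify that difference, here is the per-parameter ratio of effective sample sizes (a small sketch; ess is just the table above, saved to a name of my choosing):
ess <- do.call('cbind', lapply(samplesList, function(x) apply(x, 2, effectiveSize)))
summary(ess[, 'NNGP'] / ess[, 'fullGP'])   # ESS ratio, NNGP relative to fullGP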
The calculateAD_ns() function now uses only sub-blocks of the full \(N \times N\) covariance matrix, and gives the same results as before.
For a model of this size (\(N=217\)), we don’t see any computational savings from using the NNGP likelihood.
However, I’m very confident that for larger values of \(N\) (say, in the 1000s or larger) we’ll observe huge computational savings from the NNGP likelihood relative to the full GP: the full GP likelihood requires an \(O(N^3)\) Cholesky factorization of the dense covariance matrix, whereas the NNGP likelihood only requires \(N\) factorizations of \((k+1) \times (k+1)\) matrices, i.e., \(O(Nk^3)\) work.
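A toy illustration of that scaling argument (not the package code; the matrices and sizes here are invented purely to show the cost difference):
N <- 2000; k <- 15
A <- diag(N) + 0.1                      # dense SPD stand-in for the full covariance matrix
system.time(chol(A))                    # full GP: one O(N^3) Cholesky factorization
B <- diag(k + 1) + 0.1                  # small stand-in for a conditioning-set covariance
system.time(for (i in 1:N) chol(B))     # NNGP: N Choleskys of (k+1) x (k+1) matrices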
This serves as a proof of concept for how this could all work.
A few changes you might want to look at:
- mu, the GP process mean, ... or as a constants list argument.
- Also, we can talk about incorporating the Gibbs sampler for the beta terms in the process mean.