Analysis 3 looks at making efficient predictions from full GP models.
library(nimble)
library(MASS)
library(coda)
library(ggplot2)
library(RColorBrewer)
nimbleOptions(verbose = FALSE)
The isoExpCov function is modified to also create non-square, off-diagonal blocks of the full covariance matrix. This is done with the argument mainDiag = 0, and will help avoid unnecessary calculations when making GP predictions.
isoExpCov <- nimbleFunction(
    run = function(dst = double(2), tau = double(), sigma = double(), psi = double(),
                   mainDiag = double(0, default = 1)) {
        returnType(double(2))
        ## off-diagonal block: no nugget term
        if(!mainDiag) return(sigma^2 * exp(-dst/psi))
        ## main-diagonal block: add the tau^2 nugget on the diagonal
        N <- dim(dst)[1]
        return(tau^2 * diag(N) + sigma^2 * exp(-dst/psi))
    }
)
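As a quick illustration (with made-up parameter values; an uncompiled nimbleFunction can be called directly in R), a main-diagonal block includes the tau^2 nugget on its diagonal, while an off-diagonal block does not:
d <- matrix(c(0, 1, 1, 0), 2, 2)                         ## toy 2 x 2 distance matrix
isoExpCov(d, tau = 1, sigma = 2, psi = 5)                ## main-diagonal block: nugget included
isoExpCov(d, tau = 1, sigma = 2, psi = 5, mainDiag = 0)  ## off-diagonal block: no nugget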
The generateData function is the same, creating data for \(N\) spatial locations in 2-space, using a fixed mean, no spatial covariates, and a latent Gaussian process random effect with a stationary isotropic exponential covariance. It returns a list of the response (z), spatial locations (s), and distance matrix (dst).
generateData <- function(N, mu=0, tau=1, sigma=1, psi=1, smax=1, seed) {
    require(MASS)
    if(!missing(seed)) set.seed(seed)
    s <- cbind(runif(N,0,smax), runif(N,0,smax))   ## uniform locations on [0,smax]^2
    dst <- unname(as.matrix(dist(s)))              ## NxN distance matrix
    C <- isoExpCov(dst, tau, sigma, psi)           ## full covariance, with nugget
    z <- MASS::mvrnorm(1, rep(mu,N), C)            ## one realization of the GP
    return(list(s=s, dst=dst, z=z))
}
We’ll create data for a reasonably small GP model, and look at posterior predictions at new points.
\(N\) will still represent the number of observed locations, and \(p\) is the number of prediction points. We’ll use \(N=5\) and \(p=5\) to first test the code.
In this first implementation, we’ll generate all of the following in the model: meanpred, Cpred, and zpred. The Monte Carlo average of \(\mu\) does, as it must, provide the mean of the \(Z^{*}\) predictions, so if we’re only interested in the mean predictions, we don’t need to generate realizations of \(Z^{*}\). The same goes for the variance of the predictions: rather than generating realizations of \(Z^{*}\), we need only the main diagonal elements of \(\Omega\) (combined, by the law of total variance, with the Monte Carlo variance of the conditional means).
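For reference, these are the standard conditional-normal (kriging) formulas that the model code below implements; writing \(C_{NN}\), \(C_{pN}\), and \(C_{pp}\) for the observed-observed, prediction-observed, and prediction-prediction blocks of the full covariance (this block notation is mine, introduced just for this note):
\[
\mu^{*} = \mu \mathbf{1}_p + C_{pN} C_{NN}^{-1} (z - \mu \mathbf{1}_N), \qquad
\Omega = C_{pp} - C_{pN} C_{NN}^{-1} C_{Np}.
\]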
N <- 5
p <- 5
Generate data for all \(N+p\) locations, but we’ll ignore the last \(p\) response values entirely; we’ll be doing prediction at those points. We will, however, make use of the \((N+p) \times (N+p)\) distance matrix (in this implementation).
input <- generateData(N+p, sigma=3, tau=.3, psi=20, smax=100, seed = 0)
lapply(input, function(x) round(x,2))
## $s
## [,1] [,2]
## [1,] 89.67 6.18
## [2,] 26.55 20.60
## [3,] 37.21 17.66
## [4,] 57.29 68.70
## [5,] 90.82 38.41
## [6,] 20.17 76.98
## [7,] 89.84 49.77
## [8,] 94.47 71.76
## [9,] 66.08 99.19
## [10,] 62.91 38.00
##
## $dst
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.00 64.74 53.70 70.41 32.25 99.22 43.59 65.76 95.96 41.58
## [2,] 64.74 0.00 11.06 57.08 66.69 56.75 69.69 85.03 87.97 40.31
## [3,] 53.70 11.06 0.00 54.85 57.49 61.73 61.65 78.78 86.49 32.78
## [4,] 70.41 57.08 54.85 0.00 45.19 38.03 37.66 37.31 31.73 31.21
## [5,] 32.25 66.69 57.49 45.19 0.00 80.50 11.40 33.55 65.62 27.91
## [6,] 99.22 56.75 61.73 38.03 80.50 0.00 74.80 74.48 51.00 57.85
## [7,] 43.59 69.69 61.65 37.66 11.40 74.80 0.00 22.47 54.84 29.39
## [8,] 65.76 85.03 78.78 37.31 33.55 74.48 22.47 0.00 39.47 46.21
## [9,] 95.96 87.97 86.49 31.73 65.62 51.00 54.84 39.47 0.00 61.27
## [10,] 41.58 40.31 32.78 31.21 27.91 57.85 29.39 46.21 61.27 0.00
##
## $z
## [1] -3.40 -2.01 -3.61 2.63 0.29 0.28 -2.55 -0.40 1.10 -2.51
constants <- list(N = N, p = p, ones = rep(1,N+p), dst = input$dst)
Note that only the first \(N\) responses are passed in as data. The other \(p\) were generated only for the purpose of making predictions; we’ll use their locations and the full distance matrix.
data <- list(z = input$z[1:N]) ## only use first N values as observations
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1, zpred = rep(0,p))
It’s worth reading and thinking about the comments in the code below. Understanding this model will make the second model (code_GPpred2, below) easier to follow.
code_GPpred1 <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:(N+p)] <- ones[1:(N+p)] * mu
    ## full (N+p) x (N+p) covariance matrix:
    C[1:(N+p),1:(N+p)] <- isoExpCov(dst[1:(N+p),1:(N+p)], tau, sigma, psi)
    ## likelihood:
    z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
    ## matrix of regression coefficients:
    regc[1:p,1:N] <- C[(N+1):(N+p),1:N] %*% inverse(C[1:N,1:N])
    ## conditional mean of response at prediction points:
    meanpred[1:p] <- mean[(N+1):(N+p)] + (regc[1:p,1:N] %*% (z[1:N]-mean[1:N]))[1:p,1]
    ## conditional covariance of response at prediction points:
    Cpred[1:p,1:p] <- C[(N+1):(N+p),(N+1):(N+p)] - regc[1:p,1:N] %*% C[1:N,(N+1):(N+p)]
    ## posterior predictions:
    zpred[1:p] ~ dmnorm(meanpred[1:p], cov = Cpred[1:p,1:p])
})
niter <- 100000
t <- system.time(
    samples <- nimbleMCMC(code_GPpred1, constants, data, inits,
                          monitors = c('mu','tau','sigma','psi','meanpred','Cpred','zpred'),
                          niter = niter, setSeed = TRUE, progressBar = TRUE)
)
t
## user system elapsed
## 31.914 1.202 33.482
The samples of the conditional mean and covariance, and of the posterior response predictions, are stored in a somewhat inconvenient format in the matrix of samples, but we can pull them out.
postMean <- apply(samples, 2, mean)
round(postMean, 2)
## Cpred[1, 1] Cpred[2, 1] Cpred[3, 1] Cpred[4, 1] Cpred[5, 1] Cpred[1, 2]
## 73.84 3.58 7.31 21.24 4.41 3.58
## Cpred[2, 2] Cpred[3, 2] Cpred[4, 2] Cpred[5, 2] Cpred[1, 3] Cpred[2, 3]
## 43.48 16.77 9.58 6.56 7.31 16.77
## Cpred[3, 3] Cpred[4, 3] Cpred[5, 3] Cpred[1, 4] Cpred[2, 4] Cpred[3, 4]
## 63.23 23.44 4.64 21.24 9.58 23.44
## Cpred[4, 4] Cpred[5, 4] Cpred[1, 5] Cpred[2, 5] Cpred[3, 5] Cpred[4, 5]
## 74.99 3.34 4.41 6.56 4.64 3.34
## Cpred[5, 5] meanpred[1] meanpred[2] meanpred[3] meanpred[4] meanpred[5]
## 45.26 0.39 0.12 0.71 1.28 -0.74
## mu psi sigma tau zpred[1] zpred[2]
## -0.82 5920.94 42.31 3.15 0.40 0.12
## zpred[3] zpred[4] zpred[5]
## 0.68 1.26 -0.75
Here’s the posterior mean of \(\mu\), the conditional mean of the response at the \(p\) prediction points (conditional on the \(N\) observed locations, and Monte Carlo averaged over the posterior distribution of the unknown covariance parameters).
meanpred <- unname(postMean[grep('meanpred', names(postMean), value = TRUE)])
meanpred
## [1] 0.3930979 0.1165444 0.7072425 1.2840143 -0.7407922
And, in this case, we generated realizations of \(Z^{*}\). Here’s the mean of those predictions. Note it’s (within simulation error) the same as above.
zpred <- unname(postMean[grep('zpred', names(postMean), value = TRUE)])
zpred
## [1] 0.3956538 0.1152484 0.6818702 1.2637375 -0.7540322
Now consider the variance of these predictions. In the MCMC, we generated Cpred, or \(\Omega\), the (conditional) covariance of the response at the prediction points. Here’s the posterior mean of the conditional covariance and, more importantly, its main diagonal elements.
Cpred <- array(postMean[grep('Cpred', names(postMean), value = TRUE)], c(p,p))
Cpred
## [,1] [,2] [,3] [,4] [,5]
## [1,] 73.841578 3.581254 7.305497 21.244479 4.405814
## [2,] 3.581254 43.476938 16.768007 9.583171 6.563217
## [3,] 7.305497 16.768007 63.228807 23.442256 4.644436
## [4,] 21.244479 9.583171 23.442256 74.987104 3.343346
## [5,] 4.405814 6.563217 4.644436 3.343346 45.260444
diag(Cpred)
## [1] 73.84158 43.47694 63.22881 74.98710 45.26044
And here’s the variance of the realizations of \(Z^{*}\), which is close to the diagonal above. (It’s not exactly equal: by the law of total variance, the predictive variance is the posterior mean of diag(\(\Omega\)) plus the posterior variance of meanpred, and the second term is small here.)
postVar <- apply(samples, 2, var)
unname(postVar[grep('zpred', names(postVar), value = TRUE)])
## [1] 76.02885 43.99284 65.18167 79.61679 44.63734
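As a quick check of that decomposition (assuming the sample column names follow the 'Cpred[i, j]' format shown above), adding the posterior mean of each diagonal element of Cpred to the posterior variance of the corresponding meanpred should reproduce the zpred variances, up to simulation error:
CpredDiagCols <- paste0('Cpred[', 1:p, ', ', 1:p, ']')   ## diagonal columns of Cpred
meanpredCols <- grep('meanpred', colnames(samples), value = TRUE)
round(unname(apply(samples[, CpredDiagCols], 2, mean) +
             apply(samples[, meanpredCols], 2, var)), 2)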
So, the point of all this: we don’t need to actually generate the \(Z^{*}\) predictions (as in equation (14) of your paper); instead, calculating the conditional mean (\(\mu\)) and covariance (\(\Omega\)) of the posterior predictive distribution is sufficient to give us a handle on the properties of the predictive locations, at least if we’re only interested in the posterior mean and variance (and covariances) of the predicted responses.
Does that sound right to you?
Now we’ll look at a more realistic model. Use \(N=50\) observed locations, and we’ll do prediction on a grid.
First generate the \(N=50\) observed points:
N <- 50
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=100, seed = 0)
Now the predictive locations, called spred. This is a \(50 \times 50\) grid of points, for \(p=2500\) prediction points.
locs <- seq(0, 100, length = 50)
spred <- as.matrix(expand.grid(locs, locs))
p <- dim(spred)[1]
p
## [1] 2500
We can no longer get away with just using dist to calculate the distance matrix, since we’ll need a non-square distance matrix between the \(N\) observed points and the \(p\) prediction points. So we write our own, to handle x and y of different lengths:
distMatrix <- function(x, y) {
    dst <- array(NA, c(dim(x)[1], dim(y)[1]))
    for(i in 1:dim(x)[1])
        for(j in 1:dim(y)[1])
            dst[i,j] <- sqrt(sum((x[i,] - y[j,])^2))
    return(dst)
}
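For large \(p\), the double loop above is slow in R. A vectorized alternative, sketched here using the identity \(\lVert x - y \rVert^2 = \lVert x \rVert^2 + \lVert y \rVert^2 - 2\, x \cdot y\) (distMatrix2 is a hypothetical name, not used below):
distMatrix2 <- function(x, y) {
    ## squared distances via the expansion, then clip tiny negatives from rounding
    d2 <- outer(rowSums(x^2), rowSums(y^2), '+') - 2 * tcrossprod(x, y)
    sqrt(pmax(d2, 0))
}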
input$dst already contains the distance matrix of the \(N\) observed locations with themselves. dstpN is the non-square distance matrix between the predictive and observed locations; we’ll need this for the conditional calculations. dstpp is the distance matrix of the \(p\) predictive points with themselves; this is needed for the calculation of the conditional covariance, \(\Omega\).
Note, in the implementation below, the calculations of the conditional covariance are commented out, for speed, but they can easily be added back in.
dstpN <- distMatrix(spred, input$s)
dstpp <- unname(as.matrix(dist(spred)))
constants <- list(N = N, p = p, ones = rep(1,N+p), dstNN = input$dst, dstpN = dstpN, dstpp = dstpp)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
Again, in this code we don’t generate realizations of \(Z^{*}\), nor the conditional covariance. We only generate the mean at each prediction point, \(\mu\), or meanpred below. But uncommenting the second- and third-to-last lines would generate the conditional covariance (\(\Omega\), or Cpred), and the final line would generate realizations of the predictive response. Adding either of these will slow things down, of course.
code_GPpred2 <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:(N+p)] <- ones[1:(N+p)] * mu
    ## covariance of the observed locations, with nugget:
    C[1:N,1:N] <- isoExpCov(dstNN[1:N,1:N], tau, sigma, psi)
    z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
    ## non-square prediction-by-observed block, no nugget:
    CpN[1:p,1:N] <- isoExpCov(dstpN[1:p,1:N], tau, sigma, psi, mainDiag = 0)
    regc[1:p,1:N] <- CpN[1:p,1:N] %*% inverse(C[1:N,1:N])
    meanpred[1:p] <- asCol(mean[(N+1):(N+p)]) + regc[1:p,1:N] %*% (z[1:N]-mean[1:N])
    ##Cpp[1:p,1:p] <- isoExpCov(dstpp[1:p,1:p], tau, sigma, psi)
    ##Cpred[1:p,1:p] <- Cpp[1:p,1:p] - regc[1:p,1:N] %*% t(CpN[1:p,1:N])
    ##zpred[1:p] ~ dmnorm(meanpred[1:p], cov = Cpred[1:p,1:p])
})
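If only the predictive variances are needed, even the commented-out \(p \times p\) \(\Omega\) could be avoided. Since \(\Omega_{ii} = (\tau^2 + \sigma^2) - \sum_j \mathrm{regc}_{ij}\, C_{pN,ij}\), a small nimbleFunction could compute just the diagonal. A sketch (condVarDiag is a hypothetical helper, not used in this analysis):
condVarDiag <- nimbleFunction(
    run = function(regc = double(2), CpN = double(2), tau = double(), sigma = double()) {
        returnType(double(1))
        p <- dim(regc)[1]
        out <- numeric(p)
        for(i in 1:p)
            ## Omega[i,i] = Cpp[i,i] - regc[i,] %*% CpN[i,], with Cpp[i,i] = tau^2 + sigma^2
            out[i] <- tau^2 + sigma^2 - sum(regc[i,] * CpN[i,])
        return(out)
    }
)
## in the model: varpred[1:p] <- condVarDiag(regc[1:p,1:N], CpN[1:p,1:N], tau, sigma)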
niter <- 2000
t <- system.time(
    samples <- nimbleMCMC(code_GPpred2, constants, data, inits,
                          monitors = c('mu','tau','sigma','psi','meanpred'),
                          niter = niter, setSeed = TRUE, progressBar = TRUE)
)
Runs quite fast! Especially (I’d say) for \(p=2500\) prediction points.
t
## user system elapsed
## 47.060 0.933 48.178
samplesSummary(samples[,c('mu','tau','sigma','psi')])
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu -0.2460074 0.4161332 23.5824609 -67.16528157 45.720378
## tau 0.4775732 0.4372196 0.2919318 0.02565801 1.089049
## sigma 19.9994987 13.0567899 15.9771950 2.32237761 50.767192
## psi 2626.5520763 729.1386029 3168.5838565 16.05516215 9539.508020
Finally, we’ll look at a plot of the observed locations (large circles), and the posterior prediction mean on the grid of prediction locations.
postMean <- apply(samples, 2, mean)
meanpred <- unname(postMean[grep('meanpred', names(postMean), value = TRUE)])
ggplot(data.frame(s1 = input$s[,1], s2 = input$s[,2], z = input$z),
       aes(x = s1, y = s2, color = z)) +
    coord_fixed() + theme_bw() +
    scale_color_gradientn(colors = brewer.pal(9, 'Spectral')) +
    geom_point(size = 3) +
    geom_point(data = data.frame(s1 = spred[,1], s2 = spred[,2], z = meanpred), shape = 4)
Looks good!
And I think all of this is done about as efficiently as possible; I don’t believe there are any unnecessary calculations going on anywhere.