Analysis 3 looks at making efficient predictions from full GP models.
library(nimble)
library(MASS)
library(coda)
library(ggplot2)
library(RColorBrewer)
nimbleOptions(verbose = FALSE)
The isoExpCov function is modified to also create non-square, off-diagonal blocks of the full covariance matrix. This is done with the argument mainDiag = 0, and will help avoid unnecessary calculations when making GP predictions.
isoExpCov <- nimbleFunction(
    run = function(dst = double(2), tau = double(), sigma = double(), psi = double(),
                   mainDiag = double(0, default = 1)) {
        returnType(double(2))
        ## off-diagonal block: no nugget term
        if(!mainDiag) return(sigma^2 * exp(-dst/psi))
        ## main-diagonal block: add the tau^2 nugget on the diagonal
        N <- dim(dst)[1]
        return(tau^2 * diag(N) + sigma^2 * exp(-dst/psi))
    }
)
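As a quick illustration (with made-up parameter values; an uncompiled nimbleFunction can be called directly in R), a main-diagonal block includes the tau^2 nugget on its diagonal, while an off-diagonal block does not:
d <- matrix(c(0, 1, 1, 0), 2, 2)                         ## toy 2 x 2 distance matrix
isoExpCov(d, tau = 1, sigma = 2, psi = 5)                ## main-diagonal block: nugget included
isoExpCov(d, tau = 1, sigma = 2, psi = 5, mainDiag = 0)  ## off-diagonal block: no nugget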
The generateData function is the same, creating data for \(N\) spatial locations in 2-space, using a fixed mean, no spatial covariates, and a latent Gaussian process random effect with a stationary isotropic exponential covariance. It returns a list of the response (z), spatial locations (s), and distance matrix (dst).
generateData <- function(N, mu=0, tau=1, sigma=1, psi=1, smax=1, seed) {
    require(MASS)
    if(!missing(seed)) set.seed(seed)
    s <- cbind(runif(N,0,smax), runif(N,0,smax))   ## uniform locations on [0,smax]^2
    dst <- unname(as.matrix(dist(s)))              ## NxN distance matrix
    C <- isoExpCov(dst, tau, sigma, psi)           ## full covariance, with nugget
    z <- MASS::mvrnorm(1, rep(mu,N), C)            ## one realization of the GP
    return(list(s=s, dst=dst, z=z))
}
We’ll create data for a reasonably small GP model, and look at posterior predictions at new points.
\(N\) will still represent the number of observed locations, and \(p\) is the number of prediction points. We’ll use \(N=5\) and \(p=5\) to first test the code.
In this first implementation, we’ll generate all of the following in the model: meanpred, Cpred, and zpred. The Monte Carlo average of \(\mu\) does, as it must, provide the mean of the \(Z^{*}\) predictions, so if we’re only interested in the mean predictions, we don’t need to generate realizations of \(Z^{*}\). The same goes for the variance of the predictions: rather than generating realizations of \(Z^{*}\), we need only the main diagonal elements of \(\Omega\) (combined, by the law of total variance, with the Monte Carlo variance of the conditional means).
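For reference, these are the standard conditional-normal (kriging) formulas that the model code below implements; writing \(C_{NN}\), \(C_{pN}\), and \(C_{pp}\) for the observed-observed, prediction-observed, and prediction-prediction blocks of the full covariance (this block notation is mine, introduced just for this note):
\[
\mu^{*} = \mu \mathbf{1}_p + C_{pN} C_{NN}^{-1} (z - \mu \mathbf{1}_N), \qquad
\Omega = C_{pp} - C_{pN} C_{NN}^{-1} C_{Np}.
\]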
N <- 5
p <- 5
Generate data for all \(N+p\) locations, but we’ll ignore the last \(p\) response values entirely; we’ll be doing prediction at those points. We will, however, make use of the \((N+p) \times (N+p)\) distance matrix (in this implementation).
input <- generateData(N+p, sigma=3, tau=.3, psi=20, smax=100, seed = 0)
lapply(input, function(x) round(x,2))
## $s
## [,1] [,2]
## [1,] 89.67 6.18
## [2,] 26.55 20.60
## [3,] 37.21 17.66
## [4,] 57.29 68.70
## [5,] 90.82 38.41
## [6,] 20.17 76.98
## [7,] 89.84 49.77
## [8,] 94.47 71.76
## [9,] 66.08 99.19
## [10,] 62.91 38.00
##
## $dst
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.00 64.74 53.70 70.41 32.25 99.22 43.59 65.76 95.96 41.58
## [2,] 64.74 0.00 11.06 57.08 66.69 56.75 69.69 85.03 87.97 40.31
## [3,] 53.70 11.06 0.00 54.85 57.49 61.73 61.65 78.78 86.49 32.78
## [4,] 70.41 57.08 54.85 0.00 45.19 38.03 37.66 37.31 31.73 31.21
## [5,] 32.25 66.69 57.49 45.19 0.00 80.50 11.40 33.55 65.62 27.91
## [6,] 99.22 56.75 61.73 38.03 80.50 0.00 74.80 74.48 51.00 57.85
## [7,] 43.59 69.69 61.65 37.66 11.40 74.80 0.00 22.47 54.84 29.39
## [8,] 65.76 85.03 78.78 37.31 33.55 74.48 22.47 0.00 39.47 46.21
## [9,] 95.96 87.97 86.49 31.73 65.62 51.00 54.84 39.47 0.00 61.27
## [10,] 41.58 40.31 32.78 31.21 27.91 57.85 29.39 46.21 61.27 0.00
##
## $z
## [1] -3.40 -2.01 -3.61 2.63 0.29 0.28 -2.55 -0.40 1.10 -2.51
constants <- list(N = N, p = p, ones = rep(1,N+p), dst = input$dst)
Note that only the first \(N\) responses are passed in as data. The other \(p\) were generated only for the purpose of making predictions; we’ll use their locations and the full distance matrix.
data <- list(z = input$z[1:N]) ## only use first N values as observations
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1, zpred = rep(0,p))
It’s worth reading and thinking about the comments in the code below. Understanding this model will make the second model (code_GPpred2, below) easier to follow.
code_GPpred1 <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:(N+p)] <- ones[1:(N+p)] * mu
    ## full (N+p) x (N+p) covariance matrix:
    C[1:(N+p),1:(N+p)] <- isoExpCov(dst[1:(N+p),1:(N+p)], tau, sigma, psi)
    ## likelihood:
    z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
    ## matrix of regression coefficients:
    regc[1:p,1:N] <- C[(N+1):(N+p),1:N] %*% inverse(C[1:N,1:N])
    ## conditional mean of response at prediction points:
    meanpred[1:p] <- mean[(N+1):(N+p)] + (regc[1:p,1:N] %*% (z[1:N]-mean[1:N]))[1:p,1]
    ## conditional covariance of response at prediction points:
    Cpred[1:p,1:p] <- C[(N+1):(N+p),(N+1):(N+p)] - regc[1:p,1:N] %*% C[1:N,(N+1):(N+p)]
    ## posterior predictions:
    zpred[1:p] ~ dmnorm(meanpred[1:p], cov = Cpred[1:p,1:p])
})
niter <- 100000
t <- system.time(
    samples <- nimbleMCMC(code_GPpred1, constants, data, inits,
                          monitors = c('mu','tau','sigma','psi','meanpred','Cpred','zpred'),
                          niter = niter, setSeed = TRUE, progressBar = TRUE)
)
t
## user system elapsed
## 31.914 1.202 33.482
The samples of the conditional mean and covariance, and of the posterior response predictions, are stored in a somewhat inconvenient format in the matrix of samples, but we can pull them out.
postMean <- apply(samples, 2, mean)
round(postMean, 2)
## Cpred[1, 1] Cpred[2, 1] Cpred[3, 1] Cpred[4, 1] Cpred[5, 1] Cpred[1, 2]
## 73.84 3.58 7.31 21.24 4.41 3.58
## Cpred[2, 2] Cpred[3, 2] Cpred[4, 2] Cpred[5, 2] Cpred[1, 3] Cpred[2, 3]
## 43.48 16.77 9.58 6.56 7.31 16.77
## Cpred[3, 3] Cpred[4, 3] Cpred[5, 3] Cpred[1, 4] Cpred[2, 4] Cpred[3, 4]
## 63.23 23.44 4.64 21.24 9.58 23.44
## Cpred[4, 4] Cpred[5, 4] Cpred[1, 5] Cpred[2, 5] Cpred[3, 5] Cpred[4, 5]
## 74.99 3.34 4.41 6.56 4.64 3.34
## Cpred[5, 5] meanpred[1] meanpred[2] meanpred[3] meanpred[4] meanpred[5]
## 45.26 0.39 0.12 0.71 1.28 -0.74
## mu psi sigma tau zpred[1] zpred[2]
## -0.82 5920.94 42.31 3.15 0.40 0.12
## zpred[3] zpred[4] zpred[5]
## 0.68 1.26 -0.75
Here’s the posterior mean of \(\mu\), the conditional mean of the response at the \(p\) prediction points (conditional on the \(N\) observed locations, and Monte Carlo averaged over the posterior distribution of the unknown covariance parameters).
meanpred <- unname(postMean[grep('meanpred', names(postMean), value = TRUE)])
meanpred
## [1] 0.3930979 0.1165444 0.7072425 1.2840143 -0.7407922
And, in this case, we generated realizations of \(Z^{*}\). Here’s the mean of those predictions. Note it’s (within simulation error) the same as above.
zpred <- unname(postMean[grep('zpred', names(postMean), value = TRUE)])
zpred
## [1] 0.3956538 0.1152484 0.6818702 1.2637375 -0.7540322
Now consider the variance of these predictions. In the MCMC, we generated Cpred, or \(\Omega\), the (conditional) covariance of the response at the prediction points. Here’s the posterior mean of the conditional covariance and, more importantly, its main diagonal elements.
Cpred <- array(postMean[grep('Cpred', names(postMean), value = TRUE)], c(p,p))
Cpred
## [,1] [,2] [,3] [,4] [,5]
## [1,] 73.841578 3.581254 7.305497 21.244479 4.405814
## [2,] 3.581254 43.476938 16.768007 9.583171 6.563217
## [3,] 7.305497 16.768007 63.228807 23.442256 4.644436
## [4,] 21.244479 9.583171 23.442256 74.987104 3.343346
## [5,] 4.405814 6.563217 4.644436 3.343346 45.260444
diag(Cpred)
## [1] 73.84158 43.47694 63.22881 74.98710 45.26044
And here’s the variance of the realizations of \(Z^{*}\), which is close to the diagonal above. (It’s not exactly equal: by the law of total variance, the predictive variance is the posterior mean of diag(\(\Omega\)) plus the posterior variance of meanpred, and the second term is small here.)
postVar <- apply(samples, 2, var)
unname(postVar[grep('zpred', names(postVar), value = TRUE)])
## [1] 76.02885 43.99284 65.18167 79.61679 44.63734
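As a quick check of that decomposition (assuming the sample column names follow the 'Cpred[i, j]' format shown above), adding the posterior mean of each diagonal element of Cpred to the posterior variance of the corresponding meanpred should reproduce the zpred variances, up to simulation error:
CpredDiagCols <- paste0('Cpred[', 1:p, ', ', 1:p, ']')   ## diagonal columns of Cpred
meanpredCols <- grep('meanpred', colnames(samples), value = TRUE)
round(unname(apply(samples[, CpredDiagCols], 2, mean) +
             apply(samples[, meanpredCols], 2, var)), 2)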
So, the point of all this: we don’t need to actually generate the \(Z^{*}\) predictions (as in equation (14) of your paper); instead, calculating the conditional mean (\(\mu\)) and covariance (\(\Omega\)) of the posterior predictive distribution is sufficient to give us a handle on the properties of the predictive locations, at least if we’re only interested in the posterior mean and variance (and covariances) of the predicted responses.
Does that sound right to you?
Now we’ll look at a more realistic model. Use \(N=50\) observed locations, and we’ll do prediction on a grid.
First generate the \(N=50\) observed points:
N <- 50
input <- generateData(N, sigma=3, tau=.3, psi=20, smax=100, seed = 0)
Now the predictive locations, called spred. This is a \(50 \times 50\) grid of points, for \(p=2500\) prediction points.
locs <- seq(0, 100, length = 50)
spred <- as.matrix(expand.grid(locs, locs))
p <- dim(spred)[1]
p
## [1] 2500
We can no longer get away with just using dist to calculate the distance matrix, since we’ll need a non-square distance matrix between the \(N\) observed points and the \(p\) prediction points. So we write our own, to handle x and y of different lengths:
distMatrix <- function(x, y) {
    dst <- array(NA, c(dim(x)[1], dim(y)[1]))
    for(i in 1:dim(x)[1])
        for(j in 1:dim(y)[1])
            dst[i,j] <- sqrt(sum((x[i,] - y[j,])^2))
    return(dst)
}
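For large \(p\), the double loop above is slow in R. A vectorized alternative, sketched here using the identity \(\lVert x - y \rVert^2 = \lVert x \rVert^2 + \lVert y \rVert^2 - 2\, x \cdot y\) (distMatrix2 is a hypothetical name, not used below):
distMatrix2 <- function(x, y) {
    ## squared distances via the expansion, then clip tiny negatives from rounding
    d2 <- outer(rowSums(x^2), rowSums(y^2), '+') - 2 * tcrossprod(x, y)
    sqrt(pmax(d2, 0))
}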
input$dst already contains the distance matrix of the \(N\) observed locations with themselves. dstpN is the non-square distance matrix between the predictive and observed locations; we’ll need this for the conditional calculations. dstpp is the distance matrix of the \(p\) predictive points with themselves; this is needed for the calculation of the conditional covariance, \(\Omega\).
Note, in the implementation below, the calculations of the conditional covariance are commented out, for speed, but they can easily be added back in.
dstpN <- distMatrix(spred, input$s)
dstpp <- unname(as.matrix(dist(spred)))
constants <- list(N = N, p = p, ones = rep(1,N+p), dstNN = input$dst, dstpN = dstpN, dstpp = dstpp)
data <- list(z = input$z)
inits <- list(mu = 0, tau = 1, sigma = 1, psi = 1)
Again, in this code we don’t generate realizations of \(Z^{*}\), nor the conditional covariance. We only generate the mean at each prediction point, \(\mu\), or meanpred below. But uncommenting the second- and third-to-last lines would generate the conditional covariance (\(\Omega\), or Cpred), and the final line would generate realizations of the predictive response. Adding either of these will slow things down, of course.
code_GPpred2 <- nimbleCode({
    mu ~ dnorm(0, sd = 10000)
    tau ~ dunif(0, 10000)
    sigma ~ dunif(0, 10000)
    psi ~ dunif(0, 10000)
    mean[1:(N+p)] <- ones[1:(N+p)] * mu
    ## covariance of the observed locations, with nugget:
    C[1:N,1:N] <- isoExpCov(dstNN[1:N,1:N], tau, sigma, psi)
    z[1:N] ~ dmnorm(mean[1:N], cov = C[1:N,1:N])
    ## non-square prediction-by-observed block, no nugget:
    CpN[1:p,1:N] <- isoExpCov(dstpN[1:p,1:N], tau, sigma, psi, mainDiag = 0)
    regc[1:p,1:N] <- CpN[1:p,1:N] %*% inverse(C[1:N,1:N])
    meanpred[1:p] <- asCol(mean[(N+1):(N+p)]) + regc[1:p,1:N] %*% (z[1:N]-mean[1:N])
    ##Cpp[1:p,1:p] <- isoExpCov(dstpp[1:p,1:p], tau, sigma, psi)
    ##Cpred[1:p,1:p] <- Cpp[1:p,1:p] - regc[1:p,1:N] %*% t(CpN[1:p,1:N])
    ##zpred[1:p] ~ dmnorm(meanpred[1:p], cov = Cpred[1:p,1:p])
})
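If only the predictive variances are needed, even the commented-out \(p \times p\) \(\Omega\) could be avoided. Since \(\Omega_{ii} = (\tau^2 + \sigma^2) - \sum_j \mathrm{regc}_{ij}\, C_{pN,ij}\), a small nimbleFunction could compute just the diagonal. A sketch (condVarDiag is a hypothetical helper, not used in this analysis):
condVarDiag <- nimbleFunction(
    run = function(regc = double(2), CpN = double(2), tau = double(), sigma = double()) {
        returnType(double(1))
        p <- dim(regc)[1]
        out <- numeric(p)
        for(i in 1:p)
            ## Omega[i,i] = Cpp[i,i] - regc[i,] %*% CpN[i,], with Cpp[i,i] = tau^2 + sigma^2
            out[i] <- tau^2 + sigma^2 - sum(regc[i,] * CpN[i,])
        return(out)
    }
)
## in the model: varpred[1:p] <- condVarDiag(regc[1:p,1:N], CpN[1:p,1:N], tau, sigma)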
niter <- 2000
t <- system.time(
    samples <- nimbleMCMC(code_GPpred2, constants, data, inits,
                          monitors = c('mu','tau','sigma','psi','meanpred'),
                          niter = niter, setSeed = TRUE, progressBar = TRUE)
)
Runs quite fast! Especially (I’d say) for \(p=2500\) prediction points.
t
## user system elapsed
## 47.060 0.933 48.178
samplesSummary(samples[,c('mu','tau','sigma','psi')])
## Mean Median St.Dev. 95%CI_low 95%CI_upp
## mu -0.2460074 0.4161332 23.5824609 -67.16528157 45.720378
## tau 0.4775732 0.4372196 0.2919318 0.02565801 1.089049
## sigma 19.9994987 13.0567899 15.9771950 2.32237761 50.767192
## psi 2626.5520763 729.1386029 3168.5838565 16.05516215 9539.508020
Finally, we’ll look at a plot of the observed locations (large circles), and the posterior prediction mean on the grid of prediction locations.
postMean <- apply(samples, 2, mean)
meanpred <- unname(postMean[grep('meanpred', names(postMean), value = TRUE)])
ggplot(data.frame(s1 = input$s[,1], s2 = input$s[,2], z = input$z),
       aes(x = s1, y = s2, color = z)) +
    coord_fixed() + theme_bw() +
    scale_color_gradientn(colors = brewer.pal(9, 'Spectral')) +
    geom_point(size = 3) +
    geom_point(data = data.frame(s1 = spred[,1], s2 = spred[,2], z = meanpred), shape = 4)
Looks good!
And I think all of this is done about as efficiently as possible; I don’t believe there are any unnecessary calculations going on anywhere.