NIMBLE implementation of FNS-M2 from Risser and Calder (2015)

Introduction

In this document, we reproduce an analysis of the precipitation data set used by Paciorek and Schervish (2006) from Colorado, a state in the western United States of America. The data used here consist of monthly precipitation recorded at each of approximately 400 weather stations in Colorado; we specifically analyze annual precipitation from 1981, as this year has the most stations (217) without missing monthly values. Annual precipitation totals are given in millimeters, and we analyze the log of total precipitation to make the Gaussian process assumption more appropriate.

The specific analysis is an illustration of the regression-based nonstationary covariance function from Risser and Calder (2015). In the original implementation, covariate information can be included in three parts of the nonstationary model: the mean function, the spatial variance function, and the kernel matrix function. Risser and Calder (2015) consider elevation as well as a covariate that describes change in elevation (“slope”), measured as a west-to-east gradient. Here, we reproduce the FNS-M2 implementation, which includes the main effects of elevation and slope, as well as their interaction, in each of the mean, variance, and kernel matrix functions. Following Risser and Calder (2015), elevation and slope measurements are standardized in order to put the coefficient estimates on a similar scale.

We now re-state the statistical model and nonstationary covariance function. Next, we describe the steps used for all matrix calculations in the covariance function, and then proceed with the implementation in NIMBLE.

GP spatial statistical model and nonstationary covariance function

A general modeling framework for a univariate spatial Gaussian process can be defined as follows via a linear mixed model. Define \(\{ Z({\bf s}) : {\bf s} \in G\}\) to be the observed value of a spatial process over some subset \(G \subset \mathcal{R}^d\), with \(d\geq 1\). The mixed model can be written as \[ Z({\bf s}) = {\bf x}({\bf s})^\top \boldsymbol{\beta} + Y({\bf s}) + \varepsilon({\bf s}), \] where \(E[Z({\bf s})] = {\bf x}({\bf s})^\top\boldsymbol{\beta}\) is a linear (deterministic) mean function in a set of \(p\) covariates (including an intercept), \(Y(\cdot)\) is a mean-zero spatial random effect, and \(\varepsilon(\cdot)\) is a stochastic component that represents measurement error or microscale variability and is independent and identically distributed as \({N}(0, \tau^2)\) such that \(\varepsilon(\cdot)\) and \(Y(\cdot)\) are independent. The spatial random effect is modeled as a parametric Gaussian process, denoted \(Y(\cdot) \sim GP\big(0, C(.,.; \boldsymbol{\theta})\big)\), where the covariance function depends on a vector of parameters \(\boldsymbol{\theta}\) and describes the covariance between the process \(Y(\cdot)\) as \(C_Y({\bf s}, {\bf s}'; \boldsymbol{\theta}) \equiv Cov\big(Y({\bf s}), Y({\bf s}') \big)\), for all \({\bf s}, {\bf s}' \in G\).

For a fixed, finite set of \(n\) spatial locations \(\{{\bf s}_1, ... , {\bf s}_n\}\in G\), the random (observed) vector \({\bf Z} = \big( Z({\bf s}_1), ... , Z({\bf s}_n) \big)^\top\) has a multivariate Gaussian distribution \(p({\bf Z} | {\bf Y}, \boldsymbol{\beta}, \tau^2 ) = {N}_n\big({\bf X}\boldsymbol{\beta} + {\bf Y}, \tau^2{\bf I}\big)\), where \({\bf X} = ({\bf x}({\bf s}_1)^\top, \dots, {\bf x}({\bf s}_n)^\top)^\top\). Conditional on the other parameters in the model, the process vector \({\bf Y} = \big( Y({\bf s}_1), ... , Y({\bf s}_n) \big)^\top\) is distributed as \(p({\bf Y} | \boldsymbol{\theta}) = {N}_n\big({\bf 0, \Omega}(\boldsymbol{\theta})\big)\), where the elements of \({\bf \Omega}(\boldsymbol{\theta})\) are \(\Omega_{ij}(\boldsymbol{\theta}) \equiv C_Y({\bf s}_i, {\bf s}_j; \boldsymbol{\theta})\). In practice, it is helpful to integrate over the process \(Y(\cdot)\) to arrive at the marginal distribution for \(Z(\cdot)\), which is \[ p({\bf Z} | \boldsymbol{\beta}, \tau^2, \boldsymbol{\theta}) = \int p({\bf Z} | {\bf Y}, \boldsymbol{\beta}, \tau^2 ) p({\bf Y} | \boldsymbol{\theta} ) d{\bf Y} = {N}_n\big({\bf X}\boldsymbol{\beta}, \tau^2{\bf I} + {\bf\Omega}(\boldsymbol{\theta})\big). \]

Following Paciorek and Schervish (2006), Risser and Calder (2015), and others, the parametric covariance function for the spatial random effect \(Y(\cdot)\) is \[ C_Y({\bf s}_i, {\bf s}_j; \boldsymbol{\theta}) = \sigma({\bf s}_i) \sigma({\bf s}_j) \frac{\left|{\bf \Sigma}({\bf s}_i)\right|^{1/4}\left|{\bf \Sigma}({\bf s}_j)\right|^{1/4}}{\left|\frac{{\bf \Sigma}({\bf s}_i) + {\bf \Sigma}({\bf s}_j)}{2} \right|^{1/2}} g\left(\sqrt{Q({\bf s}_i, {\bf s}_j)}\right), \] where \[ Q({\bf s}_i, {\bf s}_j) = ({\bf s}_i - {\bf s}_j)^\top \left(\frac{{\bf \Sigma}({\bf s}_i) + {\bf \Sigma}({\bf s}_j)}{2}\right)^{-1}({\bf s}_i - {\bf s}_j) \] is a squared Mahalanobis distance and \(g(\cdot)\) is any valid correlation function over \(\mathcal{R}^d\), \(d \geq 1\), e.g., the Matern, exponential, spherical, etc. Using the marginalized model, the covariance function of \(Z(\cdot)\) is \[ C_Z({\bf s}_i, {\bf s}_j; \boldsymbol{\theta}) = \tau^2 I_{\{i = j\}} + C_Y({\bf s}_i, {\bf s}_j; \boldsymbol{\theta}), \] which includes the measurement error or microscale variability \(\varepsilon(\cdot)\).

The parameters in the covariance function \(C_Z\) must be regularized somehow, as we require values of \(\sigma(\cdot)\) and \({\bf \Sigma}(\cdot)\) over an infinite-dimensional space (i.e., \(G\)). Using the intuition from mean regression, we can characterize changes in these covariance parameter “processes” in terms of relevant covariates after an appropriate transformation: \[ \begin{array}{l} \log \sigma({\bf s}) = {\bf x}({\bf s})^\top \boldsymbol{\alpha}\\[1ex] {\bf \Sigma}({\bf s}) = {\bf \Psi} + {\bf \Gamma} {\bf x}({\bf s}) {\bf x}({\bf s})^\top {\bf \Gamma}^\top \end{array} \] The regression model for the matrix process \({\bf \Sigma}(\cdot)\) comes from the covariance regression model of Niu and Hoff (2014). Note that we have used the same covariate vector for the mean and all covariance parameter processes, but in general this need not be the case.

Using this parameterization and in the usual case where the spatial domain is two-dimensional, i.e., \(d = 2\), the vector of covariance parameters is \[ \boldsymbol{\theta} = (\tau^2, \boldsymbol{\alpha}, {\bf \Psi}, {\bf \Gamma}), \] which involves \(1 + p + 3 + 2p = 4 + 3p\). For FNS-M2 where \(p = 4\) (i.e., looking at elevation, slope, and the elevation-slope interaction), this means there are 16 parameters to estimate to characterize the the covariance function \(C_Z\).

Matrix calculations for \(C_Y\)

For a fixed set of locations \(\{{\bf s}_1, ... , {\bf s}_n\}\) and conditional on \(\boldsymbol{\theta}\), we need a fast way to calculate the \(n\times n\) matrix \({\bf\Omega}(\boldsymbol{\theta})\). For now, we opt to keep everything in general terms, i.e., \(\sigma(\cdot)\) and \({\bf \Sigma}(\cdot)\), as we will eventually want to consider other models than the regression frameworks outlined above.

First, in the two-dimensional (i.e., \(d=2\)) case, we can re-write the generic Mahalanobis distance \[ Q({\bf s}, {\bf s}') = ({\bf s} - {\bf s}')^\top \left(\frac{{\bf \Sigma}({\bf s}) + {\bf \Sigma}({\bf s}')}{2}\right)^{-1}({\bf s} - {\bf s}') \] using \({\bf s} = (s_1, s_2)\), \({\bf s}' = (s'_1, s'_2)\), and with \[ {\bf \Sigma}({\bf s}) = \left[ \begin{array}{cc} \Sigma_{11}({\bf s}) & \Sigma_{12}({\bf s}) \\ \Sigma_{12}({\bf s}) & \Sigma_{22}({\bf s}) \end{array} \right] \] we can write \[ Q({\bf s}, {\bf s}') = a(s_1-s_1')^2 + 2b(s_1-s_1')(s_2-s_2') + c(s_2-s_2')^2, \] where \(a\), \(b\), and \(c\) are the closed-form elements of \(\left(\frac{{\bf \Sigma}({\bf s}) + {\bf \Sigma}({\bf s}')}{2}\right)^{-1}\), i.e., writing \[ \frac{{\bf \Sigma}({\bf s}) + {\bf \Sigma}({\bf s}')}{2} = \left[ \begin{array}{cc} d_{11} & d_{12} \\ d_{12} & d_{22} \end{array} \right], \] then \[ a = \frac{d_{22}}{d_{11}d_{22} - d_{12}^2}, \hskip2ex b = \frac{-d_{12}}{d_{11}d_{22} - d_{12}^2}, \hskip2ex c = \frac{d_{11}}{d_{11}d_{22} - d_{12}^2}. \]

Of course, all of the above values depend on \({\bf s}\) and \({\bf s}'\). To calculate the entire distance distance matrix, i.e., \(Q({\bf s}, {\bf s}')\) for all pairs of stations, we can convert the above calculation to involve only matrix operations. For the following, we write everything using R syntax.

First, we need matrices of \((s_1-s_1')^2\), \((s_1-s_1')(s_2-s_2')\), and \((s_2-s_2')^2\) for all pairs of stations. Define coords to be a \(n\times 2\) matrix of the coordinates \(\{{\bf s}_1, ... , {\bf s}_n\}\). The squared univariate distances are easily calculated using dist(), but we need to be careful about the the sign of the cross-distance matrix. The following code will calculate these distances correctly:

# Univariate distances
dists1 <- as.matrix(dist(coords[,1], upper = T, diag = T))
dists2 <- as.matrix(dist(coords[,2], upper = T, diag = T))

# Adjustment for the sign of the cross-distances
temp1 <- matrix(coords[,1], nrow = n, ncol = n, byrow = FALSE)
temp2 <- matrix(coords[,2], nrow = n, ncol = n, byrow = FALSE)

sgn_mat1 <- ( temp1 - t(temp1) >= 0 )
sgn_mat1[sgn_mat1 == FALSE] <- -1 
sgn_mat2 <- ( temp2 - t(temp2) >= 0 )
sgn_mat2[sgn_mat2 == FALSE] <- -1 

dist1_sq <- dists1^2
dist2_sq <- dists2^2
dist12 <- sgn_mat1*dists1*sgn_mat2*dists2

Note that for a fixed set of coordinates, we only need to calculate dist1_sq, dist2_sq, and dist12 once, and then pass these around as constants in the NIMBLE code.

Next, we can calculate matrix versions of the scaling matrix \(\left(\frac{{\bf \Sigma}({\bf s}) + {\bf \Sigma}({\bf s}')}{2}\right)^{-1}\). Define Sigma11 to be a vector \(\big(\Sigma_{11}({\bf s}_1), \dots, \Sigma_{11}({\bf s}_n)\big)\), and similarly Sigma22 and Sigma12, the distance matrix Dist.mat containing \(\sqrt{Q({\bf s}, {\bf s}')}\) for all pairs of stations can be calculated as follows:

# Matrix versions for all pairs
mat11_a <- matrix(Sigma11, nrow = N, ncol = N) 
mat22_a <- matrix(Sigma22, nrow = N, ncol = N)
mat12_a <- matrix(Sigma12, nrow = N, ncol = N)
mat11 <- 0.5*(mat11_a + t(mat11_a))
mat22 <- 0.5*(mat22_a + t(mat22_a))
mat12 <- 0.5*(mat12_a + t(mat12_a))

# Inverse elements 
det12 <- mat11*mat22 - mat12^2
inv11 <- mat22/det12      # above, this is denoted a
inv12 <- -mat12/det12     # above, this is denoted b
inv22 <- mat11/det12      # above, this is denoted c

# Combine with distances
Dist.mat <- sqrt( inv11*dist1_sq + 2*inv12*dist12 + inv22*dist2_sq )

The matrix Dist.mat can then be fed into any correlation function \(g(\cdot)\).

The second component of the covariance function calculation involves the scaling terms \(\left|{\bf \Sigma}({\bf s}_i)\right|^{1/4}\left|{\bf \Sigma}({\bf s}_j)\right|^{1/4}\Big/\left|\frac{{\bf \Sigma}({\bf s}_i) + {\bf \Sigma}({\bf s}_j)}{2} \right|^{1/2}\). Again, this can be done using matrix calculations and the same notation as above (using det12 as calculated before), yielding the matrix Scale.mat:

det1 <- Sigma11*Sigma22 - Sigma12^2 
Scale.mat <- diag(sqrt(sqrt(det1))) %*% sqrt(1/det12) %*% diag(sqrt(sqrt(det1)))

Finally, defining sigma_vec to be the vector \(\big( \sigma({\bf s}_1), \dots, \sigma({\bf s}_n) \big)\) and corFun to be a generic correlation function for \(g(\cdot)\) (e.g., the exponential exp(-dist)), the full covariance matrix \({\bf\Omega}(\boldsymbol{\theta})\) is calculated as:

Unscl.corr <- corFun(Dist.mat)
NS.corr <- Scale.mat*Unscl.corr
Omega <- diag(sigma_vec) %*% NS.corr %*% diag(sigma_vec)

No loops necessary!!

Implementation in NIMBLE

We now outline the implementation of all of the above in NIMBLE. The necessary functions are explicitly written out here, and will be sourced in via the fns_m2_source.R file. Note that these functions are generic for the covariance function \(C_Y\) (and hence \(C_Z\)) and do not depend on the regression-based formulation. Also, the code is broken up into several pieces in order to avoid unneccesary calculation of the various correlation/covariance matrices in the MCMC.

nimbleFunction to calculate the nonstationary exponential correlation matrix (outputs what I called NS.corr above)

NS_expCor <- nimbleFunction(     
  run = function( dist1_sq = double(2), dist2_sq = double(2), dist12 = double(2), 
                  Sigma11 = double(1), Sigma22 = double(1), Sigma12 = double(1) ) {
    returnType(double(2))
    
    # Calculate the scale matrix 
    det1 <- Sigma11*Sigma22 - Sigma12^2 
    mat11_a <- matrix(Sigma11, nrow = N, ncol = N) 
    mat22_a <- matrix(Sigma22, nrow = N, ncol = N)
    mat12_a <- matrix(Sigma12, nrow = N, ncol = N)
    mat11 <- 0.5*(mat11_a + t(mat11_a))
    mat22 <- 0.5*(mat22_a + t(mat22_a))
    mat12 <- 0.5*(mat12_a + t(mat12_a))
    det12 <- mat11*mat22 - mat12^2
    Scale.mat <- diag(sqrt(sqrt(det1))) %*% sqrt(1/det12) %*% diag(sqrt(sqrt(det1)))
    
    # Calculate the distance matrix
    inv11 <- mat22/det12
    inv22 <- mat11/det12
    inv12 <- -mat12/det12
    Dist.mat <- sqrt( inv11*dist1_sq + 2*inv12*dist12 + inv22*dist2_sq )
    
    # Combine 
    Unscl.corr <- exp(-Dist.mat) # Here is the exponential correlation
    NS_corr <- Scale.mat*Unscl.corr
    return(NS_corr)
    
  })

nimbleFunction to convert the nonstationary correlation matrix to a nonstationary covariance (realizing now that this might be unneccesary…)

cor2cov <- nimbleFunction(     
  run = function( NS_corr = double(2), sigma_vec = double(1) ) {
    returnType(double(2))
    return( diag(sigma_vec) %*% NS_corr %*% diag(sigma_vec) )
  })

Focusing in now on the regression-based formulation, the source code also includes a nimbleFunction for a Gibbs sampler for the mean coefficients. Risser and Calder (2015) use an independent Metropolis Hastings sampler (inverse-Wishart) for the baseline kernel matrix \({\bf \Psi}\) – this isn’t ideal for a general implementation (also, I don’t want to code up the inverse-Wishart or an independent MH sampler), so I’ll instead use somewhat informative priors for the components of \({\bf \Psi}\). As a result, the parameter estimates won’t be exactly the same.

knitr::opts_chunk$set(warning = FALSE)

# Libraries and source code
library(nimble)
library(MASS)
library(ggplot2)
library(RColorBrewer)
library(gridExtra)

source("fns_m2_source.R")

# Load data =====================================
COprecip <- read.csv("Final_data.csv")
coords <- as.matrix(COprecip[,c("Longitude", "Latitude")])
z <- COprecip$logPrecip
Xmat <- unname(lm(logPrecip ~ Zelevation*Zslope10, x = TRUE, data = COprecip)$x)
N <- nrow(COprecip)

# Distance matrices =============================
dists1 <- as.matrix(dist(coords[,1], upper = T, diag = T))
dists2 <- as.matrix(dist(coords[,2], upper = T, diag = T))

temp1 <- matrix(coords[,1], nrow = N, ncol = N, byrow = FALSE)
temp2 <- matrix(coords[,2], nrow = N, ncol = N, byrow = FALSE)
sgn_mat1 <- ( temp1 - t(temp1) >= 0 )
sgn_mat1[sgn_mat1 == FALSE] <- -1 
sgn_mat2 <- ( temp2 - t(temp2) >= 0 )
sgn_mat2[sgn_mat2 == FALSE] <- -1 

dist1_sq <- dists1^2
dist2_sq <- dists2^2
dist12 <- sgn_mat1*dists1*sgn_mat2*dists2

Take a quick look at the data and covariates:

For simplicity, I will use the parameterization \(\sigma^2_0 = \exp\{\alpha_0\}\). Also, following Risser and Calder (2015), I will put a prior on \(\rho = \psi_{12}/\sqrt{\psi_{11}\psi_{22}}\). Otherwise, the NIMBLE code for the model is as follows:

# Code
fns_m2_code <- nimbleCode({
  
  # Likelihood
  z[1:N] ~ dmnorm( mean = mn[1:N], cov = C[1:N,1:N] ) 
  
  # Mean 
  mn[1:N] <- Xmat[1:N,1:P] %*% beta[1:P]
  
  # Variance -- excluding the intercept (see sigmasq0)
  sigma_vec[1:N] <- exp(Xmat[1:N,2:P] %*% alpha[1:(P-1)])

  # Correlation 
  Cor[1:N,1:N] <- NS_expCor( dist1_sq[1:N,1:N], dist2_sq[1:N,1:N], dist12[1:N,1:N],
                             Sigma11[1:N], Sigma22[1:N], Sigma12[1:N] )
  Sigma11[1:N] <- psi11*ones[1:N] + (Xmat[1:N,1:P] %*% gamma1[1:P])^2
  Sigma12[1:N] <- rho*sqrt(psi11*psi22)*ones[1:N] + (Xmat[1:N,1:P]%*%gamma1[1:P])*(Xmat[1:N,1:P]%*%gamma2[1:P])
  Sigma22[1:N] <- psi22*ones[1:N] + (Xmat[1:N,1:P] %*% gamma2[1:P])^2

  # Covariance
  Cov_uns[1:N, 1:N] <- cor2cov( Cor[1:N, 1:N], sigma_vec[1:N] )
  Cov[1:N, 1:N] <- sigmasq0*Cov_uns[1:N,1:N]
  Nugg[1:N, 1:N] <- tausq*Imat[1:N, 1:N]
  C[1:N,1:N] <- Cov[1:N, 1:N] + Nugg[1:N, 1:N]
   
  # Hyperparameters/priors
  psi11 ~ dunif(0, 2)
  psi22 ~ dunif(0, 2)
  rho ~ dunif(-1, 1)
  tausq ~ dunif(0, 1000)
  sigmasq0 ~ dunif(0, 1000)
  for(j in 1:P){
    gamma1[j] ~ dnorm(0, sd = 10)
    gamma2[j] ~ dnorm(0, sd = 10)
    beta[j] ~ dnorm(0, sd = 100)
  }
  for(j in 1:(P-1)){
    alpha[j] ~ dnorm(0, sd = 10)
  }
  
})

# Initialize
fns_m2_model <- nimbleModel( 
  code = fns_m2_code, 
  constants = list(N = N, P = ncol(Xmat) ), 
  inits = list( beta = rep(0, ncol(Xmat)), psi11 = 0.5, psi22 = 0.5, rho = 0,
                gamma1 = rep(0, ncol(Xmat)), gamma2 = rep(0, ncol(Xmat)),
                alpha = rep(0, ncol(Xmat)-1), tausq = 0.1, sigmasq0 = 0.1,
                Sigma11 = rep(0.5,N), Sigma11 = rep(0.5,N), Sigma12 = rep(0,N),
                Cor = diag(N), Cov_uns = diag(N), Cov = diag(N), Nugg = diag(N), C = diag(N) ), 
  data = list(z = z, Xmat = Xmat, ones = rep(1,N), dist1_sq = dist1_sq, 
              dist2_sq = dist2_sq, dist12 = dist12, ones = rep(1,N), Imat = diag(N)) )

# Compile/configure MCMC
compl_fns_m2_model <- compileNimble( fns_m2_model )
conf_fns_m2_model <- configureMCMC( fns_m2_model )
conf_fns_m2_model$removeSamplers( c("psi11", "psi22", "rho") )
conf_fns_m2_model$addSampler( target = c("psi11", "psi22", "rho"), type = "RW_block" )
conf_fns_m2_model$removeSamplers( "beta[1:4]" )
conf_fns_m2_model$addSampler( target = "beta[1:4]", type = "betaGibbs" )
# # Look at configuration
# conf_fns_m2_model$getSamplers()
mcmc <- buildMCMC( conf_fns_m2_model )
Cmcmc <- compileNimble(mcmc, project = fns_m2_model )

Now, run the MCMC and extract posterior samples:

prt <- proc.time()
Cmcmc$run(20000)
ttm <- proc.time() - prt
samp <- as.matrix(Cmcmc$mvSamples)[10001:20000,]

# Total time (in minutes):
ttm/60

##      user    system   elapsed 
## 60.249433  1.052567 64.976700

Look at the results (trace plots suppressed). “True” parameter values are from the Environmetrics paper. Note: the Gamma parameter is only identifiable up to a sign – so, we condition the posterior samples on one of the parameters being positve.

ggplot(allResults, aes(x = Parameter, y = PostMean, ymin = CI_lb, ymax = CI_ub, color = Type)) +
    geom_pointrange(position = position_dodge(width = 0.2)) +
  theme(axis.text.x = element_text(angle=45, hjust=1) )

NIMBLE implementation of FNS-M2 from Risser and Calder (2015)

Mark Risser

1/25/2018

Introduction

GP spatial statistical model and nonstationary covariance function

Matrix calculations for \(C_Y\)

Implementation in NIMBLE