Document Detail

Diffusion-based spatial priors for imaging.
MedLine Citation:
PMID:  17869542     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
We describe a Bayesian scheme to analyze images, which uses spatial priors encoded by a diffusion kernel, based on a weighted graph Laplacian. This provides a general framework to formulate a spatial model, whose parameters can be optimized. The application we have in mind is a spatiotemporal model for imaging data. We illustrate the method on a random effects analysis of fMRI contrast images from multiple subjects; this simplifies exposition of the model and enables a clear description of its salient features. Typically, imaging data are smoothed using a fixed Gaussian kernel as a pre-processing step before applying a mass-univariate statistical model (e.g., a general linear model) to provide images of parameter estimates. An alternative is to include smoothness in a multivariate statistical model (Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J., 2005. Bayesian fMRI time series analysis with spatial priors. Neuroimage 24, 350-362). The advantage of the latter is that each parameter field is smoothed automatically, according to a measure of uncertainty, given the data. In this work, we investigate the use of diffusion kernels to encode spatial correlations among parameter estimates. Nonlinear diffusion has a long history in image processing; in particular, flows that depend on local image geometry (Romeny, B.M.T., 1994. Geometry-driven Diffusion in Computer Vision. Kluwer Academic Publishers) can be used as adaptive filters. This can furnish a non-stationary smoothing process that preserves features, which would otherwise be lost with a fixed Gaussian kernel. We describe a Bayesian framework that incorporates non-stationary, adaptive smoothing into a generative model to extract spatial features in parameter estimates. Critically, this means adaptive smoothing becomes an integral part of estimation and inference. We illustrate the method using synthetic and real fMRI data.
Authors:
L M Harrison; W Penny; J Ashburner; N Trujillo-Barreto; K J Friston
Publication Detail:
Type:  Journal Article; Research Support, Non-U.S. Gov't     Date:  2007-08-08
Journal Detail:
Title:  NeuroImage     Volume:  38     ISSN:  1053-8119     ISO Abbreviation:  Neuroimage     Publication Date:  2007 Dec 
Date Detail:
Created Date:  2007-11-02     Completed Date:  2008-01-08     Revised Date:  2013-06-06    
Medline Journal Info:
Nlm Unique ID:  9215515     Medline TA:  Neuroimage     Country:  United States    
Other Details:
Languages:  eng     Pagination:  677-95     Citation Subset:  IM    
Affiliation:
The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London, WC1N 3BG, UK. l.harrison@fil.ion.ucl.ac.uk
MeSH Terms
Descriptor/Qualifier:
Algorithms
Analysis of Variance
Bayes Theorem
Brain / anatomy & histology,  physiology
Diffusion Magnetic Resonance Imaging / statistics & numerical data*
Image Processing, Computer-Assisted / statistics & numerical data*
Least-Squares Analysis
Linear Models
Magnetic Resonance Imaging / statistics & numerical data
Models, Neurological
Normal Distribution
Population
Grant Support
ID/Acronym/Agency:
056750//Wellcome Trust
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

Full Text
Journal Information
Journal ID (nlm-ta): Neuroimage
ISSN: 1053-8119
ISSN: 1095-9572
Publisher: Academic Press
Article Information
© 2007 Elsevier Inc.
License: This document may be redistributed and reused, subject to certain conditions.
Received Day: 25 Month: 10 Year: 2006
Revision Received Day: 3 Month: 7 Year: 2007
Accepted Day: 16 Month: 7 Year: 2007
Print publication date: Month: 12 Year: 2007
Volume: 38 Issue: 4-3
First Page: 677 Last Page: 695
ID: 2643839
PubMed Id: 17869542
Publisher Id: YNIMG4793
DOI: 10.1016/j.neuroimage.2007.07.032

Diffusion-based spatial priors for imaging
L.M. Harrison a,* Email: l.harrison@fil.ion.ucl.ac.uk
W. Pennya
J. Ashburnera
N. Trujillo-Barretob
K.J. Fristona
aThe Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London, WC1N 3BG, UK
bCuban Neuroscience Centre, Havana, Cuba
*Corresponding author. Fax: +44 207 813 1445. l.harrison@fil.ion.ucl.ac.uk

Introduction

Functional MRI data are typically transformed to a three-dimensional regular grid of voxels in anatomical space, each containing a univariate time series of responses to experimental perturbation. The data are then used to invert a statistical model, e.g., a general linear model (GLM), after a number of pre-processing steps, which include spatial normalization and smoothing (i.e., convolving the data with a spatial kernel). In mass-univariate approaches (e.g., statistical parametric mapping), a statistical model is used to extract features from the smoothed data by treating each voxel as a separate observation. Model parameters, at each voxel, are estimated (Friston et al., 2002) and inference about these parameters proceeds using SPMs or posterior probability maps (Friston and Penny, 2003). Smoothing the data ensures that the maps of parameter estimates are also smooth. This can be viewed as enforcing a smoothness prior on the parameters. The current paper focuses on incorporating smoothness into the statistical model by making smoothness a hyperparameter of the model and estimating it using empirical Bayes. This optimizes the spatial dependencies among parameter estimates and has the potential to greatly enhance spatial feature detection.

Recently Penny et al. (2005) extended the use of shrinkage priors on parameter estimates (Penny et al., 2003), which assume spatial independence, to spatial priors in a statistical model of fMRI time series. They developed an efficient algorithm using a mean-field approximation within a variational Bayes framework. The result is a smoothing process that is incorporated into a generative model of the data, where each parameter is smoothed according to a measure of uncertainty in that parameter. The advantage of a mean-field approximation is that inversion of a requisite spatial precision matrix is avoided. The advantage of a Bayesian framework is that the evidence for different spatial priors can be compared (MacKay, 2003). Other Bayesian approaches to spatial priors in fMRI include those of Gossl et al. (2001); Woolrich et al. (2004); and more recently Flandin and Penny (2007).

There are two main departures from this previous work on spatiotemporal models in the current method. The first is that we use a Gaussian process prior (GPP) over parameter estimates. Spatial correlations are then encoded using a covariance matrix instead of precisions (cf. Penny et al., 2005). The second is that the covariance matrix is the Green's function of a diffusive process, i.e., a diffusion kernel, which encodes the solution of a diffusion equation involving a weighted graph Laplacian. This has the advantage of providing a full spatial covariance matrix and enables inference with regard to the spatial extent of activations. This is not possible using a mean-field approximation that factorizes the posterior distribution over voxels. The result is an adaptive smoothing that can be spatially non-stationary, depending on the data. This is achieved by allowing the local geometry of the parameter field to influence the diffusion kernel (smoothing operator). This is important as stationary smoothing reveals underlying spatial signal at the expense of blurring spatial features. Given the convoluted spatial structure of the cortex and patchy functional segregation, it is reasonable to expect variability in the gradient structure of a parameter field. The implication is that the local geometry of activations should be preserved. This can be achieved with a nonlinear smoothing process that adapts to local geometric "features". A disadvantage is the costly operation of evaluating matrix exponentials and inverting potentially large covariance matrices, which the mean-field approach avoids. However, many approximate methods exist (MacKay, 2003; Rasmussen and Williams, 2006) that can ameliorate this problem, e.g., sparse GPPs (see discussion and Quinonero-Candela and Rasmussen, 2005).

The paper is organized as follows. First, we discuss background and related approaches, before giving an outline of the theory of the method. We start with the model, which is a two-level general linear model (GLM) with matrix-variate density priors on the GLM parameters. We focus on reducing the model to the specification of covariance components, in particular the form of the covariance and its hyperparameters. We then look at the form of the spatial priors using graph Laplacians and diffusion kernels. We then describe the EM algorithm that is used to update the hyperparameters of covariance components, which embody empirical spatial priors. The edge-preserving quality of diffusion over a weighted graph is demonstrated using synthetic data and then applied to real fMRI data. The illustrations in this paper use 2D spatial images; however, the method can easily be extended to 3D, subject to the computational resources that would be necessary to analyze a volume of brain data. We perform a random effects (between subjects) analysis (Penny and Holmes, 2003) on a sample of contrast images from twelve subjects. This means that we consider a scalar field of parameter estimates encoding the population response. However, the nonlinear diffusion kernels described here can be extended to fields of vectors and matrices (ChefD'Hotel et al., 2004; Zhang and Hancock, 2006b). This paper concludes with comments on outstanding issues and future work.


Background

The current work draws on two main sources in the literature: diffusion-based methods in image processing and Gaussian process models (GPM). The image processing community has been using diffusion models for many years, e.g., for the restoration of noisy images (Knutsson et al., 1983). For overviews, from the perspective of scale-space theories, see Romeny (1994, 2003). These models rest on the diffusion equation, a nonlinear partial differential equation describing the density fluctuations in an ensemble undergoing diffusion, ∂_t ρ = ∇·(D(ρ)∇ρ), where ρ can be regarded as the density of the ensemble (e.g., image intensity) and D is the diffusion coefficient. Generally, the diffusion coefficient depends on the density; however, if D is a constant, the equation reduces to the "classical heat equation", ∂_t ρ = D∇²ρ, where ∇² ≡ ∇·∇ is the Laplacian operator (second-order spatial derivative). A typical use in image processing is to de-noise an image, where the noisy image is the initial condition, ρ(t = 0), and a smoothed, de-noised image is the result of integrating the heat equation to evaluate the diffused image at some time later, ρ(t). In particular, Perona and Malik (1990) used nonlinear diffusion models to preserve the edges of images using an image-dependent diffusion term, D = D(∇ρ). The dependence on this spatial gradient has the effect of reducing diffusion over regions with high gradient, i.e., edges. Later formulations of nonlinear diffusion methods include those of Alvarez et al. (1992) and Weickert (1996). Of particular relevance to the method presented here are graph-theoretic methods, which use graph Laplacians (Chung, 1991). These have been used recently to adaptively smooth scalar, vector and matrix-valued images (Zhang and Hancock, 2005). Graphical methods provide a general formulation on arbitrary graphs, which is easy to implement. There are also many useful graph-based algorithms in the literature, e.g., image processing on arbitrary graphs (Grady and Schwartz, 2003) and, more generally, graph partitioning to sparsify and solve large linear systems (Spielman and Teng, 2004).

Gaussian process models also have a long history. A Gaussian process prior (GPP) is a collection of random variables, any finite number of which have a joint Gaussian distribution (MacKay, 2003; Rasmussen and Williams, 2006). As such it is completely specified by a mean and covariance function. This is a very flexible prior as it is a prior over a function, which can be used to model general data, not just images. Given a function over space, this function is assumed to be a sample from a Gaussian random field specified by a mean and covariance, which can take many forms, as long as it is positive semi-definite.

Diffusion methods in image processing and covariance functions in GPMs furnish the basis of a spatial smoothing operator; however, the emphasis of each approach is different. One main difference is that a GPM is a statistical model from which inferences and predictions can be made (MacKay, 1998). The objective is not solely to smooth data, but to estimate an optimal smoothing operator, which is embedded in a model of how the data were generated. Graphical models in machine learning (Jordan, 1999) provide a general and easy formulation of statistical models. The similar benefits of graph-based diffusion methods in image processing further motivate the use of graph-theoretic approaches to represent and estimate statistical images, given functional brain data.

The relation between models of diffusion and GPPs is seen when considering a random variable as a diffusive process, which locally is a Gaussian process. We can see this by comparing the Green's function of the classical heat equation, used in early diffusion methods in image processing (Romeny, 1994), with the squared exponential (SE) covariance function used in GPMs (Rasmussen and Williams, 2006). In two dimensions, (u_k, u_l), where subscripts indicate location in the domain and D is a scalar:

(1)
φ̇ = D∇²φ
φ(t + τ) = K(τ)φ(t)
K(u_k, u_l; τ) = (1/(4πDτ)) exp(−(u_k − u_l)ᵀ(u_k − u_l)/(4Dτ))

where K(τ) is the Green's function (solution) of the diffusion equation that represents the evolution of a solution over time. The first line is the special case of a constant diffusion coefficient. The solution of this equation is given in the second and third lines, where the image at time t, φ(t), is propagated to t + τ by convolution with the Green's function, or practically by the matrix-vector product using the matrix exponential of the scaled discrete Laplacian. This Green's function is Gaussian with variance 2Dτ, meaning that the image at t + τ is a smoothed version of φ(t). This is shown explicitly in the last line¹, which has the same form as the SE covariance function, given below, where the squared characteristic length scale is ℓ² = 2Dτ. Typically, a GPP has an additional scale hyperparameter to give

(2)
K(u_k, u_l; θ) = λ exp(−(u_k − u_l)ᵀ(u_k − u_l)/(2ℓ²))

where θ = (λ, ℓ). A zero-mean GPP is then specified, at a set of locations, by the multivariate density φ ∼ N(0, K) (Rasmussen and Williams, 2006). In what follows, we use a diffusion kernel as the covariance of a GPP. This is a spatial prior on model parameter images.
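To make this correspondence concrete, the following sketch (our illustration, not the authors' code; the grid size, D and τ are arbitrary choices) compares a row of the matrix exponential of a discrete 1D chain Laplacian with the continuous Gaussian Green's function of Eq. (1):

import numpy as np
from scipy.linalg import expm

n, D, tau = 201, 1.0, 4.0
u = np.arange(n, dtype=float)                 # 1D voxel coordinates, unit spacing

# discrete Laplacian on a 1D chain (second-difference matrix)
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

K = expm(-D * tau * L)                        # Green's function of the heat equation
row = K[n // 2]                               # kernel centred mid-grid

# continuous Green's function: Gaussian with variance 2*D*tau
g = np.exp(-(u - u[n // 2]) ** 2 / (4 * D * tau)) / np.sqrt(4 * np.pi * D * tau)

print(np.max(np.abs(row - g)))                # approximately equal, away from boundaries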

There are a number of papers applying methods from image processing to anatomical and functional brain images. These include those of Gerig et al. (1992), who applied nonlinear diffusion methods to MRI data, and Chung et al. (2003), who used the Laplace–Beltrami operator (a generalization of the heat equation to a Riemannian manifold) in a statistical approach to deformation-based morphometry. Nonlinear diffusion methods have been used to adaptively smooth functional images (Kim and Cho, 2002; Kim et al., 2005). Other approaches to adaptive analysis of fMRI include those of Cosman et al. (2004); Friman et al. (2003); and Teo et al. (1997). Graph-theoretic approaches to image processing have been used to regularize diffusion tensor images (DTI) (Zhang and Hancock, 2006a). These authors used a weighted graph Laplacian, which is a discrete analogue of the Laplace–Beltrami operator, to adaptively smooth over a field of diffusion tensors, thereby preserving boundaries between regions, e.g., white matter tracts and grey matter.

The contribution of our work is to combine graph-theoretic methods from image processing and Gaussian process models from machine learning to provide a spatial model of fMRI data. We are essentially constructing a manifold out of the parameter estimates of a linear model of fMRI data and performing isotropic diffusion on the induced manifold, which is anisotropic from the perspective of the domain. In other words, the diffusion is isotropic on the sub-manifold (that represents the surface of the image) embedded in anatomical-feature space (see Fig. 1), which is anisotropic in anatomical space. This is somewhat related to the random field approach (Worsley et al., 1999), where isotropic smoothing is attained by smoothing along an induced manifold. In our application we use anisotropic diffusion as an empirical spatial prior in a Bayesian setting.


The model

In this section, we formulate a two-level GLM in terms of matrix-variate normal densities (Gupta and Nagar, 2000). Our focus is the formulation of this as a multivariate normal model, with emphasis on covariance components and their hyperparameters. We start with a linear model, under Gaussian assumptions, of the form

(3)
Y = Xβ + ε1          p(Y, β|X) = p(Y|X, β)p(β)
β = ε2               p(Y|X, β) = MN(Xβ, S1 ⊗ K1)
εi ∼ MN(0, Si ⊗ Ki)  p(β) = MN(0, S2 ⊗ K2)

where the left-hand expressions specify a hierarchical linear model and the right-hand expressions define the implicit generative density in terms of a likelihood, p(Y|X, β), and prior, p(β). MN stands for matrix-normal, where the density on a matrix A ∈ ℝ^(r×c), A ∼ MN(M, S ⊗ K), has a mean, M, of size r × c, with covariances, K and S, of size c × c and r × r, that encode covariance between columns and rows respectively². Here, Y is a T × N data matrix and X is a T × P design matrix with an associated unknown P × N parameter matrix β.

(4)
y = Zw + e1
w = e2
ei ∼ N(0, Σi)

where y = vec(Y), Z = I_N ⊗ X, w = vec(β), ei = vec(εi) and Σi = Ki ⊗ Si. ⊗ is the Kronecker product of two matrices and I_N is the identity matrix of size N. The unknown covariances Σ(θ)1 and Σ(θ)2 depend on hyperparameters, θ. The model parameters and hyperparameters are estimated using expectation maximization (EM) by maximizing the log-marginal likelihood

(5)
ln p(y|θ) ≥ F = −½(ln|Σ| + yᵀΣ⁻¹y + TN ln 2π)
Σ(θ) = Σ1 + ZΣ2Zᵀ

with respect to the parameters in the E-Step and the covariance hyperparameters in the M-Step. Here, Σ(θ) represents the covariance of the data induced by both levels of the model. The model inversion with EM will be described later (see also Appendices I and II). First, we look at the hyperparameterization of the spatial covariances and the specific forms of K(θ) entailed by Σi = Ki ⊗ Si.
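As a concrete illustration of Eqs. (4) and (5), the following toy sketch (ours, not the authors' code; the sizes and hyperparameter values are arbitrary) assembles the vectorized model and evaluates the log-marginal likelihood:

import numpy as np

T, N, P = 8, 6, 1                      # scans, voxels, regressors (toy sizes)
rng = np.random.default_rng(0)
X = np.ones((T, P))                    # single-sample design, as used later
Y = rng.normal(size=(T, N))

Z = np.kron(np.eye(N), X)              # Z = I_N (x) X
y = Y.reshape(-1, order='F')           # y = vec(Y), column-major stacking

K1, S1 = 1.0 * np.eye(N), np.eye(T)    # noise: K1 = theta1*I_N, S1 = I_T
K2, S2 = 2.0 * np.eye(N), np.eye(P)    # prior on w (here a global shrinkage prior)
Sigma1, Sigma2 = np.kron(K1, S1), np.kron(K2, S2)

Sigma = Sigma1 + Z @ Sigma2 @ Z.T      # covariance of the data, Eq. (5)
sign, logdet = np.linalg.slogdet(Sigma)
F = -0.5 * (logdet + y @ np.linalg.solve(Sigma, y) + T * N * np.log(2 * np.pi))
print(F)                               # log-marginal likelihood bound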


The priors

In the previous section, we reduced the problem of specifying a linear empirical Bayesian model to the specification of prior covariance components for noise and signal. In this section, we introduce diffusion-based spatial priors and consider adaptive priors that are functions of the parameters. In brief, we will assume the error or noise covariance is spatially unstructured; i.e., Σ1 = K1 ⊗ S1, where K(θ)1 = θ1·I_N and S1 = I_T. This means that θ1 is the error variance. For simplicity, we will assume that this is fixed and the same for all voxels; however, it is easy to specify a component for each voxel, as in conventional mass-univariate analyses.

For the signal, we adopt an adaptive prior using a non-stationary diffusion kernel, which is based on a weighted graph Laplacian (see Chung, 1991 and the next section), L(μ), which is a function of the conditional expectation of the parameters, μ = ⟨w⟩

(6)
K(θ)2 = θ2·K_D
K_D = exp(−L(μ)τ)

This means the hyperparameters comprise θ = {θ1, θ2, τ}, where the first hyperparameter controls a stationary, independent and identically distributed (i.i.d.) noise component, the second the amplitude of the parameter image and the third its dispersion. The matrix L in Eq. (6) is a weighted graph Laplacian, which is a discrete analogue of the Laplace–Beltrami operator used to model diffusion processes on a Riemannian manifold. The solution of the heat equation is³


(7)
φ̇ = −L(φ)φ
φ(t + τ) = K(φ(t), τ)φ(t)
K(φ(t), τ) = exp(−L(φ)τ)

The diffusion kernel, K = exp(−Lτ), is the local solution to the heat equation on a graph and corresponds to a symmetric diffusion kernel that encodes the dispersion of φ over a period τ. The diffusion kernel also evolves according to the heat equation

(8)
K̇ = −LK
K(t + τ) = exp(−Lτ)K(t)

We use this diffusion kernel as the covariance matrix of a GPM. Generally, the Laplacian is a function, L(μ^(m)), of the current image (of parameter expectations), where the superscript indicates the mth iteration. In this situation, the Green's function is a composition of local solutions.

(9)
K(θ)2 = θ2·K_D
K_D^(m) = P^(m)P^(m−1)⋯P^(1) = P^(m)K_D^(m−1)
P^(m) = exp(−τ^(m)L^(m))

Updating K_D^(m) requires computation of the current Laplacian L^(m) and a matrix multiplication, both of which are costly operations on large matrices. However, if the Laplacian is approximately constant then K_D^(m) can be evaluated much more simply


(10)
K_D^(m) = P^(m)P^(m−1)⋯P^(1) ≈ exp(−(τ^(m) + τ^(m−1) + ⋯ + τ^(1))L) = exp(−Lτ)

This approximation retains the edge-preserving character of the diffusive flow, without incurring the computational cost of re-evaluating the Laplacian. Our experience is that weighted graph Laplacians based on the ordinary least squares (OLS) estimate, μ_ols, give very reasonable results. However, the need to update the Laplacian may arise when the OLS parameter estimate is very noisy. All anisotropic Laplacian priors in this paper are based on μ_ols; however, we have included update equations based on the Baker–Campbell–Hausdorff formula in Appendix IV.

In summary, the covariance components and their derivatives are:


(11)
K1 = θ1·I_N
K2 = θ2·exp(−L(μ_ols)τ)
∂K1/∂θ1 = I_N
∂K2/∂θ2 = K_D
∂K2/∂τ = −θ2·L·K_D

where their hyperparameters θ = {θ1, θ2, τ} are optimized to ensure an optimal balance between signal and noise and that the parameter estimates have an optimal, non-stationary and non-isotropic smoothness encoded by the spatial covariance, K2. In the next section, we review graph Laplacians and the diffusion model in more detail and then conclude with a summary of the EM scheme used for optimization.
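A minimal sketch (ours) of the components in Eq. (11), assuming a precomputed graph Laplacian L (its construction is described in the next section):

import numpy as np
from scipy.linalg import expm

def covariance_components(L, theta1, theta2, tau):
    """Covariance components of Eq. (11) and their derivatives."""
    N = L.shape[0]
    KD = expm(-L * tau)                # diffusion kernel K_D = exp(-L*tau)
    K1 = theta1 * np.eye(N)            # i.i.d. noise component
    K2 = theta2 * KD                   # diffusion-based spatial prior
    dK1_dtheta1 = np.eye(N)
    dK2_dtheta2 = KD
    dK2_dtau = -theta2 * L @ KD        # derivative w.r.t. the dispersion tau
    return K1, K2, dK1_dtheta1, dK2_dtheta2, dK2_dtau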


Diffusion on graphs

In this section, we describe diffusion on graphs and illustrate how this furnishes useful spatial priors for parameters at each voxel. This formulation is useful as it is easily extended to vector and matrix-valued images, which will be necessary when modeling a general vector-field of parameter estimates, e.g., when the number of columns of the design matrix is greater than one. We start with some basic graph theory and then discuss diffusion in terms of graph Laplacians. The end point of this treatment is the form of the diffusion kernel, KD, used in the spatial prior of the previous section. We will see that this is a function of the parameters that enables the prior smoothness to adapt locally to non-stationary features in the image of parameter estimates.

GLM parameters as a function on a graph

We consider the parameter estimates as a function on a graph, Γ, with vertices, edges and weights, Γ = (V, E, W). The vertices are indexed 1 to N and pairs are connected by edges, E_kn, where k, n ∈ V. If two vertices are connected, i.e., are neighbors, we write k ∼ n. Consider a regular 2D mesh with spatial coordinates u1 and u2. Fig. 1a shows a surface plot of OLS parameter estimates of synthetic data described later (see Fig. 4) to illustrate the construction of a weighted graph Laplacian. To simplify the discussion we concentrate on a small region over a 3 × 3 grid, or stencil (see inset). Pairs of numbers u1, u2 indicate a vertex or pixel location, where each number corresponds to a spatial dimension. The function has a value at each pixel (voxel if in 3D anatomical space) given by its parameter estimate μ(u), so that three numbers locate a pixel at which a parameter has a specific value: u1, u2, μ(u1, u2). These are coordinates of the parameter estimate at a pixel in Euclidean space, ℝ³, which decomposes into "anatomical" and "feature" space coordinates (lower right of Fig. 1a). In this case these have dimensions 2 and 1 respectively. The 2D image is considered as a 2D sub-manifold of this 3D embedding space (Sochen et al., 1998), which provides a general framework that is easily extended to 3D anatomical space and feature dimensions greater than one. We represent the kth pixel by v_k. The distance between two pixels is taken as the shortest distance along the 2D sub-manifold of parameter estimates embedded in ℝ³. This is a geodesic distance between points on the sub-manifold, ds(v_k, v_n), shown schematically in Fig. 1c between neighboring pixels. The shortest distance is easy to compute for direct neighbors (example shown in red); however, if the stencil were larger then fast marching algorithms (Sethian, 1999) may be used to compute the shortest path between two points on the sub-manifold. Note that the displacement along the feature coordinates is scaled by a, such that if a = 0, then ds reduces to distance on the 2D domain and is no longer a function of image intensity (see subsection on special cases). The construction of a weighted graph Laplacian starts by specifying weights of edges between vertices, w_kn. These are a function of the geodesic distance, ds(v_k, v_n), and are important for specifying non-stationary diffusion. This is shown in Fig. 1b for the 3 × 3 stencil in Fig. 1a.

Graph Laplacian

As mentioned above, a graph is composed of vertices, edges and weights. Neighboring vertices are encoded by the adjacency matrix, A, with elements

(12)
a_kn = 1 for k ∼ n, and 0 otherwise

Weights make up a weight matrix, W, with elements

(13)
w_kn = exp(−ds(v_k, v_n)²/κ) for k ∼ n, and 0 otherwise

The un-normalized Laplacian of Γ is L = D − W, where D is a diagonal matrix with elements D_kk = Σ_n w_kn, which is the degree of the kth vertex. The graph Laplacian is sometimes called the admittance or Kirchhoff matrix. The weights w_kn ∈ [0, 1] encode the relationship between neighboring pixels and are symmetric; i.e., w_kn = w_nk. They play the role of conductivities, where a large value enables flow between pixels. κ is a constant that controls the velocity of diffusion, which we set to one.

The weights are a function of the distance, ds(v_k, v_n), on the surface of the function, μ(u), between vertices v_k and v_n. It is this distance that defines the nature of diffusion generated by the weighted graph Laplacian. In brief, we will define this distance to make the diffusion isotropic on the surface of the parameter image. This means, when looked at from above, that the diffusion will appear less marked when the spatial gradients of parameters are high. In other words, diffusion will be attenuated at the edges of regions with high parameter values. More formally, we specify the distance by choosing a map, ψ, from the surface of the function μ(u) to an embedding space, the Euclidean space ℝ³. Each space has a manifold and metric, (M, g) and (N, h), respectively (see Appendix III for more details and a heuristic explanation).

(14)
ψ: M → N
ψ: u ↦ (ψ1(u), ψ2(u), ψ3(u)) = (u1, u2, μ(u1, u2))

Choosing a metric, H, of the embedding space (see below) and computing the Jacobian, J, we can calculate the induced metric, G, on μ(u) (Sochen et al., 1998). In matrix form, these are

(15)
H = diag(a1, a2, a)
J = ∂ψ/∂u = [1, 0; 0, 1; μ_u1, μ_u2]

where a1 and a2 are the relative scales of the anatomical dimensions and a is the feature scale, and derivatives are with respect to physical space, i.e., μ_x ≡ ∂μ/∂x, which are computed using central differences. The induced metric is then

(16)
G = JᵀHJ = [a1 + a·μ_u1², a·μ_u1·μ_u2; a·μ_u1·μ_u2, a2 + a·μ_u2²]

which is used to calculate distance

(17)
ds² = duᵀG du
where du = (du1, du2)ᵀ. Due to the dependence of the graph Laplacian on the parameters, we write the Laplacian as L = L(μ_ols), where the Laplacian is computed using the OLS estimate, μ_ols. As this depends on geodesic distances on the embedded sub-manifold of an image, we call it a geodesic graph Laplacian (GGL). If a = 0 then the Laplacian is based on Euclidean distance in anatomical space. We refer to this as a Euclidean graph Laplacian (EGL) (see also the subsection below on special cases). Note that we have chosen the embedding coordinates and the embedding space metric. This is one of the advantages of a geometric formulation, as we could have chosen a non-Euclidean anatomical space, e.g., spherical coordinates to model diffusion on the surface of a sphere (see Sochen et al., 2003; Sochen and Zeevi, 1998). The diffusion kernel can be computed efficiently using an eigenvalue decomposition of the Laplacian.

(18)
L = VΛVᵀ
diag(Λ) = (λ0, λ1, …, λ_(N−1)), λi ≥ 0
K = V exp(−Λτ)Vᵀ

This is a standard method for computing the matrix exponential (Moler and Van Loan, 2003) with the added benefit that knowing the eigensystem simplifies many computations in the algorithm. See Appendix IV for more details.
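The following sketch (our own minimal implementation, not the authors' code; the helper names, and the local chord approximation to the geodesic distance between direct neighbors, are our assumptions) assembles a geodesic graph Laplacian from an image of OLS estimates per Eqs. (12), (13) and (17), and computes the diffusion kernel via the eigendecomposition of Eq. (18):

import numpy as np

def geodesic_graph_laplacian(mu, a=1.0, kappa=1.0):
    """mu: 2D image of parameter estimates; a = 0 gives the Euclidean graph Laplacian (EGL)."""
    n1, n2 = mu.shape
    N = n1 * n2
    idx = np.arange(N).reshape(n1, n2)
    W = np.zeros((N, N))
    # 8-nearest-neighbour (3 x 3) stencil
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for i in range(n1):
        for j in range(n2):
            for di, dj in offsets:
                k, l = i + di, j + dj
                if 0 <= k < n1 and 0 <= l < n2:
                    # chord approximation to the geodesic distance between direct
                    # neighbours: ds^2 = du'du + a*(d mu)^2, cf. Eq. (17)
                    ds2 = di ** 2 + dj ** 2 + a * (mu[k, l] - mu[i, j]) ** 2
                    W[idx[i, j], idx[k, l]] = np.exp(-ds2 / kappa)   # Eq. (13)
    D = np.diag(W.sum(axis=1))         # degree matrix
    return D - W                       # un-normalised weighted graph Laplacian

def diffusion_kernel(L, tau):
    """Eq. (18): K = V exp(-Lambda*tau) V' from one symmetric eigendecomposition."""
    lam, V = np.linalg.eigh(L)         # eigenvalues lam_i >= 0 (L is PSD)
    return V @ np.diag(np.exp(-lam * tau)) @ V.T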


Expectation maximization

Inversion of the multivariate normal model in Eq. (4) is straightforward and can be formulated in terms of expectation maximization (EM). EM entails the iterative application of an E-Step and M-Step (see Appendix I and Friston et al., 2002, 2007 for details). The E-Step evaluates the conditional density of the parameters in terms of their expectation and precision (i.e., inverse variance), p(w|y, θ) = N(μ, Π⁻¹), where

(19)
E-Step:
μ = Π⁻¹ZᵀΣ1⁻¹y
Π = Σ2⁻¹ + ZᵀΣ1⁻¹Z

The unknown covariances Σ(θ)i = Ki ⊗ Si are functions of covariance hyperparameters, θ. These are estimated by maximizing the log-marginal likelihood, ln p(y|θ), in an M-Step. This involves updating the hyperparameters by maximizing a bound on the log-marginal likelihood or log-evidence

(20)
ln p(y|θ) ≥ F = −½(ln|Σ| + yᵀΣ⁻¹y + TN ln 2π)

We update the hyperparameters (indexed by subscripts) using a Fisher-scoring scheme⁴, where Δθ represents the incremental change of θ


(21)
M-Step:
Δθ = I(θ)⁻¹∇_θF
∂F/∂θk = −½tr(Σ⁻¹ ∂Σ/∂θk) + ½yᵀΣ⁻¹(∂Σ/∂θk)Σ⁻¹y
I_kl = −⟨∂²F/∂θk∂θl⟩ = ½tr(Σ⁻¹(∂Σ/∂θk)Σ⁻¹(∂Σ/∂θl))

I(θ) is the expected information matrix (see Wand, 2002), with elements I_kl, where the expectation, ⟨·⟩, is over the marginal likelihood of the data, ∇_θF is the score, i.e., a vector of gradients (kth element given by ∂F/∂θk) with respect to the hyperparameters, and Σ is the current maximum likelihood estimate of the data covariance (see Appendix I). In the examples below, we fix S1 = I_T and S2 = 1; this means the only unknown covariances are K(θ)i. This scheme is formally identical to classical restricted maximum likelihood (ReML) (see Friston et al., 2007).

In summary, to invert our model we simply specify the covariances K(θ)i and their derivatives, ∂K/∂θi. These enter an M-Step to provide ReML or ML-II estimates of covariance hyperparameters. K(θ)i are then used in the E-Step to provide the conditional density of the parameters. E- and M-Steps are iterated until convergence, after which the objective function for the M-Step can be used as an approximation to the model's log-evidence. This quantity is useful in model comparison and selection, as we will see later when comparing models based on different spatial priors. A minimal numerical sketch of this loop is given below.
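The sketch (our own dense, un-optimized illustration of Eqs. (19)-(21) for the single-sample design used later; it omits the log-transform of the hyperparameters described in Appendix B and the factorizations of Appendix A, so it is not the authors' implementation):

import numpy as np
from scipy.linalg import expm

def em_spatial_prior(Y, L, theta, n_iter=16):
    """Y: T-by-N data; L: N-by-N graph Laplacian; theta = [theta1, theta2, tau]."""
    theta = np.asarray(theta, dtype=float)
    T, N = Y.shape
    X = np.ones((T, 1))                       # single-sample design, P = 1
    Z = np.kron(np.eye(N), X)                 # Z = I_N (x) X
    y = Y.reshape(-1, order='F')              # y = vec(Y)
    for _ in range(n_iter):
        th1, th2, tau = theta
        KD = expm(-L * tau)
        Sigma = th1 * np.eye(T * N) + Z @ (th2 * KD) @ Z.T   # Eq. (5)
        Si = np.linalg.inv(Sigma)
        dS = [np.eye(T * N),                  # dSigma/dtheta1
              Z @ KD @ Z.T,                   # dSigma/dtheta2
              Z @ (-th2 * L @ KD) @ Z.T]      # dSigma/dtau
        grad = np.array([-0.5 * np.trace(Si @ d) + 0.5 * y @ Si @ d @ Si @ y
                         for d in dS])        # score, Eq. (21)
        info = np.array([[0.5 * np.trace(Si @ a @ Si @ b) for b in dS]
                         for a in dS])        # expected information, Eq. (21)
        theta = theta + np.linalg.solve(info, grad)   # Fisher-scoring update
    # E-Step, Eq. (19), with S1 = I_T and S2 = 1
    th1, th2, tau = theta
    Pi = np.linalg.inv(th2 * expm(-L * tau)) + Z.T @ Z / th1
    mu = np.linalg.solve(Pi, Z.T @ y / th1)
    return mu, theta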

We now have all the components of a generative model (shown schematically in Fig. 2) that, when inverted, furnishes parameter estimates that are adaptively smooth, with edge preserving characteristics. Furthermore, this smoothing is chosen automatically and optimizes the evidence of the model. Before applying this scheme to synthetic and real data we will consider some special cases that will be compared in the final section.


Special cases
Linear diffusion: a = 0

If we set the scale of the parameter dimension to a = 0 (see Eq. (16) and Fig. 1c) we recover linear diffusion. The Laplacian (EGL) is now independent of μ. In this case edges are not preserved by the smoothness prior. Although these kernels are not the focus of this paper, they are still useful and, as we demonstrate later, produce compelling results compared to non-spatial priors.

Global shrinkage priors: K_D = I

If we remove diffusion by setting the Laplacian to zero, then K2 = θ2·I_N. This corresponds to global or spatially independent (shrinkage) priors (GSP) of the sort used in early posterior probability maps using empirical Bayes (Friston and Penny, 2003). Here, we use the variability of parameter estimates over pixels to shrink their estimates appropriately and provide a posterior or conditional density.

Ordinary least squares estimate: K2 = 0

The OLS estimate obtains when we remove the empirical priors completely by setting K2 = 0.
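Expressed in terms of the sketch functions above (our naming, assuming an image of OLS estimates mu_ols), the three special cases correspond to simple settings:

import numpy as np

# hypothetical image of OLS estimates, for illustration only
mu_ols = np.random.default_rng(0).normal(size=(16, 16))

L_ggl = geodesic_graph_laplacian(mu_ols, a=1.0)   # adaptive, edge-preserving prior (GGL)
L_egl = geodesic_graph_laplacian(mu_ols, a=0.0)   # linear diffusion (EGL)
L_gsp = np.zeros_like(L_ggl)                      # K_D = exp(0) = I: global shrinkage (GSP)
# OLS: remove the second level entirely, i.e. K2 = 0 (no empirical prior)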


Synthetic data: de-noising an image

In this section, we apply the algorithm to one image, i.e., T = 1, to demonstrate edge preservation, provide some intuition as to how this is achieved using a diffusion kernel based on a GGL, and compare it with an EGL. We use the synthetic data shown in Fig. 3. The model is


(22)
y = w + e1
w = e2

The central panel of Fig. 3a contains a binary image of a 2D closed curve (with values equal to one within and on the curve) on a circular background of zeros. Gaussian noise has been added, which we will try to remove using a GPM based on a diffusion kernel. The left panel shows the conditional expectation or mean using a diffusion kernel from an EGL, while the GGL is shown on the right. Below each image is a color bar, which encodes the grey-scale of each pixel in the image. It is immediately obvious that smoothing with an EGL removes noise at the expense of blurring the image, while the GGL preserves edges. This is reflected in the values of log-marginal likelihood achieved for each prior (see Table 1). We will discuss the ratio of these values in the next section when we consider model comparison.

Fig. 3b shows contours of local diffusion kernels at three locations in the image, where the local diffusion kernel at the kth pixel is a 2D image reconstructed from the appropriate row of K2. These are superimposed on the smoothed image. Locations include (i) within the perimeter of the object, (ii) near the edge inside the perimeter of the object and (iii) outside the perimeter of the object. The EGL is on the left, where local kernels are the same throughout the image. This is not the case for the GGL on the right. The EGL smoothes the image isotropically, i.e., without a preferred direction. On the right, local kernels within and outside the perimeter of the object are different, depending on their relation to the edge of the central object. The contours of kernels within the perimeter spread into the interior of the object, but stop abruptly at the edge, encoding the topology of the surface, much like the contours on a geographic map. As a result the image is smoothed such that noise is removed, but not at the expense of over-smoothing the edges of signal. This is shown in Fig. 3c for a cross-section at the level indicated by a dotted line in Fig. 3b. This shows the original binary image, the noisy image, and smoothing with the EGL and GGL.

Further intuition comes from considering the induced metric tensor, G: consider the square root of its determinant, √det(G), which is the ratio of surface and domain areas. An area element on the surface of μ(u) is dA_S = √det(G)·du1du2, while one on the domain is dA_D = du1du2. This gives a ratio dA_S/dA_D = √det(G), which can be calculated at each location of the image. This is referred to as the magnification factor (Bishop, 1999). It provides a scalar value at each pixel that represents a salient difference between the sub-manifold of the function and the flat surface of the domain. This is shown in Fig. 3d. Flat regions have a ratio of about one, while at edges it is greater than unity. High values correspond to locations where the distance on μ(u) between adjacent pixels (see Fig. 1) is large; i.e., at an edge, where gradients are large. This results in a small weight across the edge connecting these pixels and reduced flow. The effect is that regions with large gradients have less smoothing. As large gradients are a feature of edges, this means that they are preserved. To highlight the anisotropic nature of the ensuing diffusion, we have superimposed ellipses representing the orientation and magnitude (eigenvectors and eigenvalues respectively) of G at a selection of different locations. Red and blue ellipses represent √det(G) greater and lower than 2, respectively. It can be seen that the metric tensor is aligned with the edge of the central figure and isotropic elsewhere. This leads to preferential diffusion within the image and edge preservation.
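A short sketch (ours) of the magnification factor, computed pixel-wise from central-difference gradients per Eq. (16), with unit anatomical scales assumed:

import numpy as np

def magnification_factor(mu, a=1.0, a1=1.0, a2=1.0):
    """sqrt(det(G)) of the induced metric in Eq. (16); ~1 in flat regions, > 1 at edges."""
    g1, g2 = np.gradient(mu)                 # central differences
    det_G = (a1 + a * g1 ** 2) * (a2 + a * g2 ** 2) - (a * g1 * g2) ** 2
    return np.sqrt(det_G)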

Lastly, we have included a representation of a global property of the graph Laplacian using the graph plot routine in Matlab (gplot.m) of the second and third eigenvectors (of the Laplacian) in Figs. 3e and f. This is a standard representation of similarity between vertices on a graph. The second eigenvector is known as the Fiedler vector (Chung, 1991) and is used in graph-partitioning algorithms. The EGL is regular, whereas the GGL illustrates partitioning of the image into two distinct parts; the central region, which represents the central object in the image of Fig. 3a, while the background is represented by the periphery of the plot.


Evaluations

In this section, we compare the performance of three different Gaussian process priors used to model the same data. These were global shrinkage priors (GSP) and diffusion kernels from Euclidean (EGL) and geodesic graph Laplacians (GGL).

Synthetic data: random effects analysis

The next simulation is similar to the above, except now we have twelve samples of the image. Their purpose is to demonstrate posterior probability maps (PPMs) and model selection. The samples, original image, OLS estimate and estimated posterior means are shown in Figs. 4a–c. Fig. 4d compares PPMs using the three different priors. The first observation is the enhanced detection of signal with the EGL and GGL compared to the GSP. The second is the edge-preserving nature of the GGL. The evidence for each model is shown in Table 2. As expected, given data with edges, the GGL attains the highest evidence. The log-marginal likelihood ratio (natural logarithm) for the EGL and GGL was 146, which is very strong evidence in favor of the GGL. A direct comparison of the log-marginal likelihood is possible as the number of hyperparameters is equal for the EGL and GGL. Additional penalty terms can be included for model comparison based on the number of hyperparameters used in a model and their uncertainty (Bishop, 1995). Details of these additional terms are included in Appendix II.

Real data: random effects analysis

fMRI data collected from twelve subjects during a study of the visual motion system (Harrison et al., 2007) were used for our comparative analyses. The study had a 2 × 2 factorial design with motion type (coherent or incoherent) and motion speed as the two factors. Single-subject analyses were performed, with no smoothing, using SPM2 (http://www.fil.ion.ucl.ac.uk/spm) to generate contrast images of the main effect of coherence. Images (one slice) of the twelve contrast images are shown in Fig. 5a. These constitute the data, Y, and the design matrix, X = 1, was a column of ones, implementing a single-sample t-test. The aim was to estimate μ(u), the conditional expectation of the main effect of coherent motion as a function of position in the brain. We calculated μ(u) under the different priors above.

For demonstration purposes we selected a slice of the whole brain volume, which contained punctate responses from the bilateral posterior cingulate gyri (pCG). The conditional expectations under the Laplacian priors are shown in Fig. 5b for the EGL and GGL. Although regions of high or low parameter estimates can be distinguished in the EGL analysis (left panel), the borders are difficult to make out. The posterior mean of the GGL is different, with well-delineated borders between regions of high and low coefficients, e.g., the increased response in pCG. Activated regions are no longer "blobs". This feature is easily seen in Fig. 5c with a contour plot of a local kernel superimposed.

Posterior probability maps (Friston and Penny, 2003) of full slices for the EGL and GGL are shown in Fig. 5d, with thresholds p(w > 0.5) > 0.95, and close-ups of all three priors in Fig. 5e. The odd one out is the model with global shrinkage priors, which had only one positive region within the whole slice. Surface plots are shown in Figs. 5f–h and graph embeddings in Figs. 5i and j. Note the vertical axes of the surface plots, which show the largest magnitude for the GGL and the degree of shrinkage with the GSP.

The log-marginal likelihood for each model is shown in Table 2. The highest log-evidence was for the GGL. The difference in log-evidence for the geodesic and Euclidean Laplacian priors was 260. This represents very strong evidence that the data were generated from a field with spatial covariance given by the GGL rather than the EGL. This sort of model comparison suggests that the data support the use of adaptive diffusion kernels when modeling the spatial covariance of activation effects.


Discussion

We have outlined a Bayesian scheme to estimate the optimal smoothing of conditional parameter estimates using a Gaussian process model. In this model, the prior covariance uses a diffusion kernel generated by a weighted graph Laplacian.

There are many issues to consider. We have only demonstrated the model using scalar-valued parameter images. A typical GLM of single-subject data has a vector of parameters of size P × 1 at each voxel, with a covariance matrix of size P × P, where the design matrix has P columns. This means that there is a vector-field of regression coefficients over a brain volume. The weights of a GGL can be a function of scalars, vectors or matrices, which makes it very flexible. For example, a GGL based on the distance between parameter vectors at different voxels is easily implemented using the scheme presented here. More complex spaces, such as a field of symmetric positive definite (SPD) matrices, used to regularize diffusion tensor images (DTI) (Chefd'hotel et al., 2002; Tschumperle and Deriche, 2003; Zhang and Hancock, 2006a), require methods from Lie group analysis, where an SPD matrix is represented as a sub-manifold of ℝ^(P²). Matrices can be represented by vectors and probabilities over such a space can be represented by Gaussian densities (Begelfor and Werman, 2005), which suggests the possibility of using a Gaussian process prior over a spatial distribution of SPD matrices, or a Lie–Gaussian process prior. We have also considered only the simplest noise model in this paper; however, noise models that vary over space, i.e., a heteroscedastic noise process, are also easily formulated using Gaussian process priors (see Chapter 5 of Rasmussen and Williams, 2006; Goldberg et al., 1998; Kersting et al., 2007). A possible use in fMRI is a GPP over autoregressive model coefficients in single-subject data-sets, following Penny et al. (2007).

A major computational issue is the time needed to compute the eigensystem of the Laplacian, from which the matrix exponential, inverse, trace and determinant can be computed. The computational complexity scales with N³, which is an issue for large data-sets. We have made no attempt to address this issue here, as our focus was on the combination of graph-theoretic approaches to image processing and spatial GPMs. The time taken to process the 3319 voxels in the random-effects analysis above was about 20 min on a standard personal computer. This has to be reduced, especially if a whole volume is to be processed. An advantage of a geometric formulation of the Laplacian is that 2D coordinates of the cortical surface can be used as the anatomical space, which suggests using a cortical mesh similar to that used in MEG source reconstruction. The cortical mesh is constructed from anatomical MRI and contains subject-specific anatomical information. A GPP based on such a diffusion kernel provides a way to formulate not only anatomically, but also functionally, informed basis functions, thereby extending the work of Kiebel et al. (2000).

There is a growing literature on sparse GPPs for regression (Lawrence, 2006; Quinonero-Candela and Rasmussen, 2005) used to formulate an approximate instead of a full GPP for use on large data-sets. Alternatively, multi-grid methods may be used (Larin, 2004) or we can utilize the graphical structure of the model and apply graph-theoretic methods to optimally partition a graph into sub-graphs or nested graphs (Tolliver et al., 2005). Recently, the computationally efficient properties of wavelets have been used to adaptively smooth fMRI parameter images (Flandin and Penny, 2007). Diffusion wavelets are an established method for fast implementation of general diffusive processes (Coifman and Maggioni, 2006; Maggioni and Mahadevan, 2006) and suggest an alternative implementation of the current method.

The issue of inverting large matrices is avoided by using the mean-field approximation of Penny et al. (2005). The spatial precision matrix they used is equivalent to the Euclidean graph Laplacian used here. This encodes local information, given by the neighborhood of a vertex, which they use in a variational Bayes scheme to estimate scaling parameters for each regressor, given data. The spatial covariance matrix can be calculated from the matrix exponential of this, which requires consideration when dealing with large data-sets as outlined above. What we get in return is a full spatial covariance that encodes global properties of the graph and the possibility of multivariate statistics over anatomical space.

As we focused on second level (between-subject) analyses the issue of temporal dynamics did not arise. However, for single-subject data this is not the case. A sensible approach would be to use a GLM to summarize temporal features in the signal and adaptively smooth over the vector-field of GLM regression coefficients, as mentioned above. Alternatively, a Kalman filter approach could be used; however, this may be more appropriate for EEG/MEG. The resulting algorithm would have a GPP for spatial signal with temporal covariance constrained by the Kalman filter.

The application of the current method to random-effects analysis was for demonstration purposes only. The method may also be useful in modeling functional and structural data from lesion studies, retinotopic mapping in fMRI, high-resolution fMRI and diffusion tensor imaging (Faugeras et al., 2004; Zhang and Hancock, 2006a).


Appendix A  Linear algebra for the EM scheme

This appendix provides notes on the linear algebra used to compute the gradients and curvatures necessary for the EM scheme in the main text. They are not necessary to understand the results presented above but help optimize implementation.

We require the bound on the log-marginal likelihood, ln p(y|θ),

(A.1)
F = −½(ln|Σ| + yᵀΣ⁻¹y + TN ln 2π)
Σ(θ) = Σ1 + ZΣ2Zᵀ
Σi = Ki ⊗ Si
and its derivatives. The first term of A.1 is

(A.2)
ln|Σ| = ln|Σ1| + ln|Σ2| + ln|Σ2⁻¹ + ZᵀΣ1⁻¹Z| = ln|Σ1| + ln|Σ2| + ln|Π|
see Appendix of Rasmussen and Williams (2006). This can be reduced further using

(A.3)
ln|Σi| = ln|Ki ⊗ Si| = rank(Si)·ln|Ki| + rank(Ki)·ln|Si|
The second term of A.1 is
(A.4)
yᵀΣ⁻¹y = tr(YᵀA)
A = S1⁻¹ ε̂1 K1⁻¹

where we have used vec(A)ᵀvec(B) = tr(AᵀB) and ε̂1 = Y − Xβ̂ is the matrix of prediction errors, where μ = vec(β̂) (see Eq. (19)).

The conditional precision is

Π = ZᵀΣ1⁻¹Z + Σ2⁻¹ = K1⁻¹ ⊗ XᵀS1⁻¹X + K2⁻¹ ⊗ S2⁻¹

Given that the number of columns, P, of the design matrix used in this paper is one (i.e., a scalar field of parameter estimates) and K2 includes a scale term, θ2, we can assume XᵀS1⁻¹X = S2⁻¹ (note that this is not the case in general). The conditional precision then factorizes and we avoid computing the Kronecker tensor products implicit in the EM scheme. The precision then simplifies to


(A.5)
Π = K̃⁻¹ ⊗ S̃⁻¹
K̃⁻¹ = K1⁻¹ + K2⁻¹
S̃⁻¹ = S2⁻¹
However, more generally, i.e., P > 1,

(A.6)
Π⁻¹ = Ω⁻¹ − Ω⁻¹V2(D2 + V2ᵀΩ⁻¹V2)⁻¹V2ᵀΩ⁻¹
Ω⁻¹ = K1 ⊗ (XᵀS1⁻¹X)⁻¹
V2 = V_K2 ⊗ V_S2
D2 = D_K2 ⊗ D_S2
where we have used the matrix inversion lemma (see below) and the eigenvalue decomposition Σ2 = V2D2V2⁻¹, and

(A.7)
C = A ⊗ B = (V_A ⊗ V_B)(D_A ⊗ D_B)(V_A ⊗ V_B)⁻¹

where V and D are eigenvectors and eigenvalues respectively of matrices A and B and, in this case, V⁻¹ = Vᵀ. Using S1 = I_T and S2 = 1, the conditional mean decomposes into the OLS estimate and a shrinkage term due to the priors.

(A.8)
μ = μ_ols − Pμ_ols
P = Ω⁻¹V2(D2 + V2ᵀΩ⁻¹V2)⁻¹V2ᵀ
μ_ols = vec((XᵀX)⁻¹XᵀY)
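As a numerical sanity check (ours; toy sizes and an arbitrary SPD prior) that the conditional mean of Eq. (19) shrinks the OLS estimate as in Eq. (A.8), for the special case S1 = I_T, S2 = 1 and P = 1:

import numpy as np

T, N = 8, 5
rng = np.random.default_rng(1)
X = np.ones((T, 1))
Y = rng.normal(size=(T, N))
K1 = 0.5 * np.eye(N)                               # noise covariance over voxels
A = np.eye(N) + 0.1 * rng.normal(size=(N, N))
K2 = A @ A.T                                       # any SPD spatial prior covariance

mu_ols = np.linalg.solve(X.T @ X, X.T @ Y).ravel() # vec((X'X)^-1 X'Y), P = 1
# conditional precision Pi = K1^-1 (x) X'X + K2^-1 and mean mu = Pi^-1 Z' Sigma1^-1 y
Pi = np.kron(np.linalg.inv(K1), X.T @ X) + np.linalg.inv(K2)
mu = np.linalg.solve(Pi, np.kron(np.linalg.inv(K1), X.T) @ Y.reshape(-1, order='F'))
print(np.linalg.norm(mu - mu_ols))                 # nonzero: the shrinkage term P*mu_ols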

To compute the derivatives required for the M-Step, we use the matrix inversion lemma (MIL)


(A.9)
(Σ1 + ZΣ2Zᵀ)⁻¹ = Σ1⁻¹ − Σ1⁻¹Z(Σ2⁻¹ + ZᵀΣ1⁻¹Z)⁻¹ZᵀΣ1⁻¹ = Σ1⁻¹ − Σ1⁻¹ZΠ⁻¹ZᵀΣ1⁻¹
and standard results for Kronecker tensor products to show the score and expected information reduce to

(A.10)
∂F/∂θk = −½(tr(Ak)tr(Bk) − tr(Ck)tr(Dk) − tr(Ãk Aᵀ B̃kᵀ A))
I_kl = ½tr((Ak ⊗ Bk − Ck ⊗ Dk)(Al ⊗ Bl − Cl ⊗ Dl))

The second line can be simplified further using tr(A ⊗ B) = tr(A)tr(B). Expressions for Ãk, B̃k, Ak, Bk, Ck and Dk are given in Table 3, where we have used XᵀS1⁻¹X = S2⁻¹ (special case where P = 1).


Appendix B  Computing the model evidence; accuracy and complexity

Given A.2 and A.4 we can write the bound on the log-marginal likelihood as

(A.11)
F = −½(ln|Σ1| + ln|Σ2| + ln|Π| + tr(YᵀA) + TN ln 2π)

After convergence of the EM scheme, this bound can be used as an approximation to the log-evidence, which can be expressed in terms of accuracy and complexity terms,


(A.12)
F = −½(ln|Σ1| + ê1ᵀΣ1⁻¹ê1 + TN ln 2π + ln|Σ2| + ln|Π| + μᵀΣ2⁻¹μ)

where ê1 = vec(ε̂1). The first three terms of A.12 represent the accuracy and the remaining terms the complexity. To see the equivalence of A.11 and A.12 more clearly,

yᵀΣ⁻¹y = yᵀΣ1⁻¹y − μᵀΠμ
= yᵀΣ1⁻¹y − 2μᵀΠμ + μᵀΠμ
= yᵀΣ1⁻¹y − 2μᵀZᵀΣ1⁻¹y + μᵀZᵀΣ1⁻¹Zμ + μᵀΣ2⁻¹μ
= (y − Zμ)ᵀΣ1⁻¹(y − Zμ) + μᵀΣ2⁻¹μ
where the MIL and Eq. (19) are used in the first line and Eq. (19) again in the second. To see the decomposition into accuracy and complexity terms, consider the likelihood, prior and posterior of the parameters (p(y|w, θ), p(w) and q(w) respectively), and note that

(A.13)
F(q, θ) = ∫dw q(w) ln p(y|w, θ) − ∫dw q(w) ln(q(w)/p(w))
= ⟨L(w)⟩_q(w) − KL(q(w)||p(w))
⟨L(w)⟩_q(w) = −½(ln|Σ1| + ê1ᵀΣ1⁻¹ê1 + tr(ZᵀΣ1⁻¹ZΠ⁻¹))
KL(q||p) = ½(ln|Σ2Π| + μᵀΣ2⁻¹μ + tr(Σ2⁻¹Π⁻¹) − tr(I_PN))

where ⟨L(w)⟩_q(w) is the average log-likelihood under q(w) and the Kullback–Leibler (KL) divergence is a penalty term on the parameters. We have used standard results for Gaussian densities N(w; m, S)

(A.14)
∫dx (υ − Wx)ᵀΣ⁻¹(υ − Wx)N(m, S) = (υ − Wm)ᵀΣ⁻¹(υ − Wm) + tr(WᵀΣ⁻¹WS)
KL(q||p) = ½ln|Σ1Σ0⁻¹| + ½tr(Σ1⁻¹((μ0 − μ1)(μ0 − μ1)ᵀ + Σ0 − Σ1))

where the KL divergence is over two Gaussian densities, q(w) = N(μ0, Σ0) and p(w) = N(μ1, Σ1), i.e., the posterior and prior densities over the parameters.

In practice, optimization of the non-negative scale parameters in the M-Step uses the transform γi = ln θi. The derivatives in Table 3 are then ∂K/∂γi = θi·∂K/∂θi. Under this change of variables, the hyperparameters have non-informative log-normal hyper-priors. Uncertainty about the hyperparameters can be included in the log-evidence for any model m. For example, the approximate log-evidence including uncertainty of one hyperparameter is


(A.15)
ln p(y|m) ≈ ln p(y|θ, m) − ½ln(−∂²F/∂θ²) = ln p(y|θ, m) − ½ln I

where the second-order derivative is the expected information used in the M-Step. See Friston et al. (2007) for details.


Appendix C  Metrics on manifolds

The intuition behind the induced metric comes from considering Pythagoras' theorem. See Fig. 1c (inset).


(A.16)
ds² = du² + a·dμ² = (1 + a(dμ/du)²)du² = G·du²

More formally, consider a one-dimensional curve embedded in two-dimensional Euclidean space. A map from one manifold, (M, g), to another, (N, h), where G and H are the metrics associated with each respectively, is


(A.17)
ψ: M → N
ψ: u ↦ (ψ1(u), ψ2(u)) = (u, μ(u))

where u is a local coordinate on the curve and ψ1 and ψ2 are coordinates in the embedding space. A distance ds on the curve in terms of du is given by

(A.18)
H = [1, 0; 0, a]
J = ∂ψ/∂u = (1, μ_u)ᵀ
G = JᵀHJ = (1, μ_u)[1, 0; 0, a](1, μ_u)ᵀ = 1 + a·μ_u²
ds² = G·du²

where the relative scale between the domain and feature coordinates is a, and G is the induced metric, i.e., the metric on the curve.


Appendix D  Computing the graph Laplacian

We assemble the graph Laplacian using a 3 × 3 stencil with 8 nearest neighbors. See Fig. 1, which shows how the distances between function values (the parameter image) at different points on the embedded sub-manifold are represented by a graph with edge weights, w_kn. Computing the eigensystem of the graph Laplacian simplifies many computations; for example


(A.19)
L = VΛVᵀ
diag(Λ) = (λ0, λ1, …, λ_(N−1)), where λi ≥ 0
K = V exp(−Λτ)Vᵀ
K⁻¹ = V exp(Λτ)Vᵀ
|K| = ∏_i exp(−λi τ)
tr(K) = Σ_i exp(−λi τ)

The third line affords a way to compute the matrix exponential (Moler and Van Loan, 2003). The issue with updating the graph Laplacian at each iteration, based on the posterior mean of the parameters, is that it entails a composition of non-commuting matrices. An approximation is possible using a truncation of the Baker–Campbell–Hausdorff (BCH) formula (Rossmann, 2002).


(A.20)
K^(m) = exp(Q^(m))
K^(m+1) = exp(−L^(m)τ^(m+1))K^(m) = exp(Q^(m+1))
Q^(m+1) = Q^(m) + dQ^(m+1)
dQ^(m+1) ≈ −(L^(m)τ^(m+1) + ½[L^(m)τ^(m+1), Q^(m)])
where we have used the first three terms of the BCH formula;

(A.21)
C = log(exp(A)exp(B))
C ≈ A + B + ½[A, B]

[A, B] is the commutator of the operators A and B, given by [A, B] = AB − BA.
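A quick numerical check (ours) of the truncated BCH approximation in Eq. (A.21) for two small random, non-commuting matrices:

import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(2)
A = 0.1 * rng.normal(size=(4, 4))
B = 0.1 * rng.normal(size=(4, 4))

C_exact = logm(expm(A) @ expm(B))               # exact composition
C_bch = A + B + 0.5 * (A @ B - B @ A)           # first three BCH terms, Eq. (A.21)
print(np.linalg.norm(C_exact - C_bch))          # small (third order in A, B)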


Notes

1. Element-wise exponential, as opposed to the matrix exponential used in Eq. (6).

2. This means that the vectorized random matrix has a multivariate normal density, given by vec(Aᵀ) ∼ N(vec(Mᵀ), S ⊗ K) and vec(A) ∼ N(vec(M), K ⊗ S).

3. Minus sign used by convention.

4. This is equivalent to a Newton step, but using the expected curvature as opposed to the local curvature of the objective function.

Acknowledgment

The Wellcome Trust and British Council funded this work.


References
Alvarez, L., Lions, P.L., Morel, J.M., 1992. Image selective smoothing and edge-detection by nonlinear diffusion: 2. SIAM J. Numer. Anal. 29, 845–866.
Begelfor, E., Werman, M., 2005. How to put probabilities on homographies. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1666–1670. [pmid: 16238000]
Bishop, C., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Bishop, C.M., 1999. Latent variable models. In: Jordan, M.I. (Ed.), Learning in Graphical Models. MIT Press, Massachusetts, pp. 371–403.
Chefd'hotel, C., Tschumperle, D., Deriche, R., Faugeras, O., 2002. Constrained flows of matrix-valued functions: application to diffusion tensor regularization. Comput. Vis. - ECCV 2350 (Pt 1), 251–265.
ChefD'Hotel, C., Tschumperle, D., Deriche, R., Faugeras, O., 2004. Regularizing flows for constrained matrix-valued images. J. Math. Imaging Vis. 20, 147–162.
Chung, F., 1991. Spectral Graph Theory. American Mathematics Society, Providence, Rhode Island.
Chung, M.K., Worsley, K.J., Robbins, S., Paus, T., Taylor, J., Giedd, J.N., Rapoport, J.L., Evans, A.C., 2003. Deformation-based surface morphometry applied to gray matter deformation. NeuroImage 18, 198–213. [pmid: 12595176]
Coifman, R.R., Maggioni, M., 2006. Diffusion wavelets. Appl. Comput. Harmon. Anal. 21, 53–94.
Cosman, E.R., Fisher, J.W., Wells, W.M., 2004. Exact MAP activity detection in fMRI using a GLM with an Ising spatial prior. Med. Image Comput. Comput.-Assist. Interv. - MICCAI 3217 (Pt 2), 703–710 (Proceedings).
Faugeras, O., Adde, G., Charpiat, G., Chefd'Hotel, C., Clerc, M., Deneux, T., Deriche, R., Hermosillo, G., Keriven, R., Kornprobst, P., 2004. Variational, geometric, and statistical methods for modeling brain anatomy and function. NeuroImage 23, S46–S55. [pmid: 15501100]
Flandin, G., Penny, W.D., 2007. Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage 34, 1108–1125. [pmid: 17157034]
Friman, O., Borga, M., Lundberg, P., Knutsson, H., 2003. Adaptive analysis of fMRI data. NeuroImage 19, 837–845. [pmid: 12880812]
Friston, K.J., Penny, W., 2003. Posterior probability maps and SPMs. NeuroImage 19, 1240–1249. [pmid: 12880849]
Friston, K.J., Penny, W., Phillips, C., Kiebel, S., Hinton, G., Ashburner, J., 2002. Classical and Bayesian inference in neuroimaging: theory. NeuroImage 16, 465–483. [pmid: 12030832]
Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., Penny, W., 2007. Variational free energy and the Laplace approximation. NeuroImage 34, 220–234. [pmid: 17055746]
Gerig, G., Kubler, O., Kikinis, R., Jolesz, F., 1992. Nonlinear anisotropic filtering of MRI data. IEEE Trans. Med. Imag. 11, 221–232.
Goldberg, P.W., Williams, C.K.I., Bishop, C., 1998. Regression with input-dependent noise: a Gaussian process treatment. NIPS 10. MIT Press.
Gossl, C., Auer, D.P., Fahrmeir, L., 2001. Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics 57, 554–562. [pmid: 11414583]
Grady, L., Schwartz, E.L., 2003. The Graph Analysis Toolbox: Image Processing on Arbitrary Graphs. Boston University, Boston, MA.
Gupta, A.K., Nagar, D.K., 2000. Matrix Variate Distributions. Chapman and Hall/CRC, Boca Raton.
Harrison, L., Stephan, K.E., Rees, G., Friston, K., 2007. Extra-classical receptive field effects measured in striate cortex with fMRI. NeuroImage 34 (3), 1199–1208. [pmid: 17169579]
Harville, D., 1997. Matrix Algebra from a Statistician's Perspective. Springer Science+Business Media, New York.
Jordan, M.I. (Ed.), 1999. Learning in Graphical Models. The MIT Press.
Kersting, K., Plagemann, C., Pfaff, P., Burgard, W., 2007. Most likely heteroscedastic Gaussian process regression. International Conference on Machine Learning.
Kiebel, S.J., Goebel, R., Friston, K.J., 2000. Anatomically informed basis functions. NeuroImage 11, 656–667. [pmid: 10860794]
Kim, H.Y., Cho, Z.H., 2002. Robust anisotropic diffusion to produce clear statistical parametric map from noisy fMRI. Proceedings of the 15th Brazilian Symposium on Computer Graphics and Image Processing, pp. 11–17.
Kim, H.Y., Javier, G., Cho, Z.H., 2005. Robust anisotropic diffusion to produce enhanced statistical parametric map from noisy fMRI. Comput. Vis. Image Underst. 99, 435–452.
Knutsson, H.E., Wilson, R., Granlund, G.H., 1983. Anisotropic nonstationary image estimation and its applications: 1. Restoration of noisy images. IEEE Trans. Commun. 31, 388–397.
Larin, M., 2004. On a multigrid method for solving partial eigenproblems. Sib. J. Numer. Math. 7, 25–42.
Lawrence, N., 2006. Large scale learning with the Gaussian process latent variable model. Technical Report No. CS-06-05, University of Sheffield.
MacKay, D.J.C., 1998. Introduction to Gaussian processes. Neural Networks and Machine Learning. Springer, Berlin.
MacKay, D.J.C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge.
Maggioni, M., Mahadevan, S., 2006. A Multiscale Framework for Markov Decision Processes Using Diffusion Wavelets. University of Massachusetts.
Moler, C., Van Loan, C., 2003. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 45, 3–49.
Penny, W., Holmes, A., 2003. Random-effects analysis. In: Frackowiak, R., Friston, K., Frith, C., Dolan, R., Price, C., Zeki, S., Ashburner, J., Penny, W. (Eds.), Human Brain Function. Elsevier Science (USA), San Diego, California.
Penny, W., Kiebel, S., Friston, K., 2003. Variational Bayesian inference for fMRI time series. NeuroImage 19, 727–741. [pmid: 12880802]
Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J., 2005. Bayesian fMRI time series analysis with spatial priors. NeuroImage 24, 350–362. [pmid: 15627578]
Penny, W., Flandin, G., Trujillo-Barreto, N., 2007. Bayesian comparison of spatially regularised general linear models. Hum. Brain Mapp. 28, 275–293. [pmid: 17133400]
Perona, P., Malik, J., 1990. Scale-space and edge-detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639.
Quinonero-Candela, J.Q., Rasmussen, C.E., 2005. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959.
Rasmussen, C., Williams, C., 2006. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts.
Romeny, B.M.T., 1994. Geometry-driven Diffusion in Computer Vision. Kluwer Academic Publishers.
Romeny, B.M.T., 2003. Front-end Vision and Multi-Scale Image Analysis. Springer.
Rossmann, W., 2002. Lie Groups: An Introduction through Linear Groups. Oxford University Press, Oxford.
Sethian, J.A., 1999. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press, Cambridge.
Sochen, N., Zeevi, Y.Y., 1998. Representation of colored images by manifolds embedded in higher dimensional non-Euclidean space. International Conference on Image Processing.
Sochen, N., Kimmel, R., Malladi, R., 1998. A general framework for low level vision. IEEE Trans. Image Process. 7, 310–318. [pmid: 18276251]
Sochen, N., Deriche, R., Lucero-Lopez, P., 2003. The Beltrami flow over implicit manifolds. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV).
Spielman, D.A., Teng, S.H., 2004. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 81–90.
Teo, P.C., Sapiro, G., Wandell, B.A., 1997. Creating connected representations of cortical gray matter for functional MRI visualization. IEEE Trans. Med. Imag. 16, 852–863.
Tolliver, D., Baker, S., Collins, R., 2005. Multilevel Spectral Partitioning for Efficient Image Segmentation and Tracking.
Tschumperle, D., Deriche, R., 2003. DT-MRI images: estimation, regularization, and application. Comput. Aided Syst. Theor. - EUROCAST 2809, 530–541.
Wand, M.P., 2002. Vector differential calculus in statistics. Am. Stat. 56, 55–62.
Weickert, J., 1996. Anisotropic diffusion in image processing.
Woolrich, M.W., Jenkinson, M., Brady, J.M., Smith, S.M., 2004. Fully Bayesian spatio-temporal modeling of fMRI data. IEEE Trans. Med. Imag. 23, 213–231.
Worsley, K.J., Andermann, M., Koulis, T., MacDonald, D., Evans, A.C., 1999. Detecting changes in nonisotropic images. Hum. Brain Mapp. 8, 98–101. [pmid: 10524599]
Zhang, F., Hancock, E.R., 2005. Image scale-space from the heat kernel. Progr. Pattern Recogn. Image Anal. Applic. Proc. 3773, 181–192.
Zhang, F., Hancock, E.R., 2006a. Riemannian graph diffusion for DT-MRI regularization. Med. Image Comput. Comput.-Assist. Interv. - MICCAI 4191 (Pt 2), 234–242.
Zhang, F., Hancock, E.R., 2006b. Smoothing tensor-valued images using anisotropic geodesic diffusion. Struct. Syntact. Stat. Pattern Recogn. Proc. 4109, 83–91.

Article Categories:
  • Article

Keywords: Diffusion kernel, Weighted graph Laplacian, Spatial priors, Gaussian process model, fMRI, General linear model, Random effects analysis.
