Monday, April 5, 2010

AD Model Builder -- BUGS-like flexibility for maximum likelihood (and Bayesian) inference

I've been starting to use AD Model Builder these last few weeks in my research and am quite impressed.  ADMB is an ex-proprietary fourth generation programming language that allows statisticians to quickly build and fit complex statistical models.  For some reason, ADMB has mainly been popular only in fisheries research.  Now that it's gone open software with a BSD license, I believe ADMB is primed for adoption by a much wider audience.  The ability to fit random effects models by automatically computing the Laplace approximation to the integrated likelihood is just too convenient.  I'm debating working ADMB in as the new engine for unmarked to allow for faster model fits and random effects.

I'm too busy working on my research to flush out a full example for this post, but the documentation is quite extensive and there is an extremely helpful user community.  The general workflow is

  1. The user creates a "tempate" file and ADMB converts this into C++ code that uses a powerful combination of the included automatic differentiation and optimization libraries.
  2. This C++ program is then compiled into an executable that runs blazingly fast compared with either BUGS or JAGS, who I see as the main competitors.  
There are a number of R calling packages in the works that make this all more straightforward than this sounds.  And the Emacs package makes working with ADMB a real pleasure.  Thanks to Dave Fournier, the creator of ADMB!

Saturday, February 20, 2010

Easily and dramatically speed up R: install GotoBLAS2 on Ubuntu

The speed of computations in R is highly dependent on the underlying linear algebra routines that R is linked to.  This entry describes how I gained orders of magnitude speed increase for some matrix operations over the default R installation by installing and linking R against multi-threaded GotoBLAS2 thanks to the work of Kazushige Goto at the University of Texas at Austin.

At the core of matrix operations in R is a set of Fortran routines called BLAS (Basic Linear Algebra Routines).  Fortunately, modern versions of R are compiled against a shared BLAS.  What this means to us data analysts is that we can easily swap out one BLAS library for another by only altering symbolic links and copying a couple of files!  Although the R installation manual briefly mentions that this is possible, I haven't seen detailed Ubuntu-specific directions online.  And other similar tips from users describe compiling R from source.  Here's how to install GotoBLAS2 on Ubuntu 9.10 with R 2.10.  I haven't verified any other configuration.

  1. Visit the Texas Advanced Computing Center and follow the directions to download GotoBLAS2.
  2. Extract the files somewhere temporary.
  3. Follow the directions in the file 02QuickInstall.txt, which basically say to enter the directory in a shell and type make.
  4. Copy the resulting .so shared object library file (libgoto2_penrynp-r1.12.so on my machine) to /usr/lib
  5. The original BLAS library located there is called libblas.so.3gf.0.  Back this file up:  cp libblas.so.3gf.0  libblas.so.3gf.0.keep
  6. Make a symbolic link to the new BLAS library:
    ln -s libgoto2_penrynp-r1.12.so libblas.so.3gf.0
That's it!  Do some matrix operations and enjoy the speedup!

Tuesday, February 16, 2010

Resumebucket.com is unfriendly to document portability

Ok, this isn't stats or programming, but it's relevant to like-minded mathy folks.  I'm graduating soon and I'm applying for jobs.  The website resumebucket.com seems like a great way to get myself out there.  Naturally, being a programmer math geeky type, I've prepared my resume in beautiful LaTeX.  It turns out that resumebucket and a few other sites don't accept pdf files, but they do take Microsoft Word files!  Ugh... do they not know that the "p" stands for portable?

Priors for variance parameters in hierarchical models (think outside the manual)

When fitting Bayesian models, modelers are often choose a particular prior simply because they've seen it used plenty of times before or because it's conjugate and they love a satisfying computational convenience.  Well, when it comes to the common hierarchical linear model, this can be dangerous.  Consider the model,
\begin{gather} y_i \sim N(\mu_0 + \mu_i, \sigma^2) \\ \mu_i \sim N(0, \sigma^2_\mu). \end{gather}
The most common choice for a prior for $\sigma^2_\mu$ is inverse gamma with some small shape and rate parameters, say $InvGamma(\epsilon, \epsilon)$.  This a gives nice, conjugate conditional density for $\sigma^2_\mu$.  But I recently ran in to a situation where I was banging my head against a wall trying to figure out why problems were occurring elsewhere in the posterior sampling and thanks to the advice from one of my committee members, I tracked the problem down to this prior.

Even though a decent solution to this problem has been posted by Andrew Gelman at Columbia, it hasn't really made it into the mainstream yet.  His solution is not that complicated -- use a uniform prior on $\sigma_\mu$, the standard deviation.  Sure enough, I programmed this and my application worked like a charm.  I used Neal's slice sampler R code to implement the sampling.

Thursday, February 11, 2010

Speeding up slow spatial MCMC

This is an update to my earlier post about a slow MCMC sampler for fitting a spatial Bayesian model.  As it turns out, there is a simple way to speed up the sampler.  First, here's a quick review of a simplified version of the basic spatial regression model.  The response $\mathbf{y} = (y_1, \dots, y_n)' \sim \text{Gaussian}(X \beta, \Sigma) $ where $X$ is a design matrix, $ \beta $ are the regression coefficients, and $\Sigma$ is a covariance matrix that describes the pattern of spatial correlation.  Commonly, $\Sigma$ is generated by some parametric covariance model and the most common model to use is the Matern model.  According to the Matern correlation function, the correlation between location $i$ and $i'$ is
\[ \frac{\left(\| s_{i} - s_{i'} \| / \rho\right)^{\nu}}
{2^{\nu-1}\Gamma(\nu_{j})}
B_{\nu}(\| s_{i} - s_{i'} \| / \rho) \]
where $s_i$ is the location of site $i$, $B$ is the modified Bessel function of the third kind of order $\nu$.  So then it becomes necessary to estimate the correlation parameters $\nu$ and $\rho$.  It turns out that $\nu$ governs the smoothness of the spatial process and $\rho$ governs the range.

Really, $\nu$ and $\rho$ are nuisance parameters.  That is, we aren't interested in them, but we need to account for their presence in the model when estimating things we care about, like the coefficients $\beta$.  There has been some previous research that says that the MLE for the Matern correlation parameters is not consistent, but predictions using the estimated parameters is consistent.  In other words, those bottlenecking Bessel computations might not be necessary after all.

So a common approach to avoid so many Bessel computations and repeated matrix inversions (in the evaluation of the Gaussian pdf) is to use a discrete uniform prior for the correlation parameters instead of a continuous one.  I'm currently using a $ 10 \times 10 $ grid. Then, I just precompute all of the correlation matrices over the grid.  Then, my sampler only needs to compute a multivariate Gaussian density using the precomputed correlation matrix during each iteration -- much faster than slice sampler I was using before!  It looks like this is working pretty well so far.

Wednesday, February 10, 2010

More on the quest for math on Blogger

It looks like the MathJax folks have created the webmath holy grail. Notice that unlike image-based methods, their MathML approach scales correctly as the browser font size is altered.  Finally, I can get on to blogging about math and not about blogging about math.

Here are some examples from their page:
The Lorenz Equations
\[\begin{matrix}
\dot{x} & = & \sigma(y-x) \\
\dot{y} & = & \rho x - y - xz \\
\dot{z} & = & -\beta z + xy
\end{matrix} \]
The Cauchy-Schwarz Inequality
\[ \left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right) \]
A Cross Product Formula
\[\mathbf{V}_1 \times \mathbf{V}_2 = \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\ \frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0
\end{vmatrix} \]
The probability of getting \(k\) heads when flipping \(n\) coins is:
\[P(E) = {n \choose k} p^k (1-p)^{ n-k} \]
An Identity of Ramanujan
\[ \frac{1}{(\sqrt{\phi \sqrt{5}}-\phi) e^{\frac25 \pi}} =
1+\frac{e^{-2\pi}} {1+\frac{e^{-4\pi}} {1+\frac{e^{-6\pi}}
{1+\frac{e^{-8\pi}} {1+\ldots} } } } \]
A Rogers-Ramanujan Identity
\[ 1 + \frac{q^2}{(1-q)}+\frac{q^6}{(1-q)(1-q^2)}+\cdots =
\prod_{j=0}^{\infty}\frac{1}{(1-q^{5j+2})(1-q^{5j+3})},
\quad\quad \text{for} |q|<1. \]
Maxwell’s Equations
\[ \begin{aligned}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\ \nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0 \end{aligned}
\]

Never assume the source of slow code until profiling

I'm currently working on coding an MCMC sampler for a Bayesian spatial model in R.  As expected, the sampler is quite slow.  But, I was totally surprised about the the source of the bottleneck.  After profiling the code, it turned out the the sampler was spending about 50% of the time evaluating the modified Bessel function.  This function is used to compute the correlation between two locations according to a Matern correlation function.  Now, to figure out a trick around this.  Will I need to get into the R internals and find the slowdown for computing the Bessel function in C?  I'd rather find a way around this for time's sake.