Easier literate programming with R

April 2, 2012

I have been using Sweave for the past 5 or 6 years to process my R documents, and I have been quite happy with it. However, with the recent release of knitr (already adopted by UCLA Stat Computing and the Vanderbilt Biostatistics Wiki) and all of its nice enhancements, I really need to get more familiar with it.

In fact, there are a lot of goodies in Yihui Xie’s knitr, including automatic processing of graphics (no need to call print() to display a lattice object), local or global control of height/width for any figure, removal of R’s prompt (with R’s output nicely prefixed with comment markers), tidying and highlighting facilities, image cropping, and use of the framed or listings packages for embedding code chunks.

To make up for some of the features that Sweave lacks, I generally have to post-process my files using shell scripts or a custom Makefile. For example, I am currently teaching a course (in French) on introductory #rstats for biomedical research, and I provide a series of exercises written with Sweave. I can easily get my graphics to the desired size using a combination of Sweave’s Gin settings and lattice’s aspect= argument. However, the latter means I have to crop my images afterwards. Moreover, I need to “cache” some of the computations, and there’s no command-line argument for that unless you rely on pgfSweave. This leads to complicated stuff like:

$ R --no-save --no-restore -e "require(cacheSweave); setCacheDir('./cache'); \
  Sweave('hw-sols.rnw', driver=cacheSweaveDriver)"
$ R CMD Stangle hw-sols.rnw
$ ./hw_crop.sh
$ xelatex hw-sols.tex

Here, hw_crop.sh is a small Bash utility that calls TeX Live’s pdfcrop program:

#!/usr/bin/env bash
# Crop every file under figs/ in place; the glob and quotes keep names with spaces intact.
for i in figs/*; do pdfcrop --margins 5 "$i" "$i"; done

From what I’ve seen so far, transitioning to knitr looks simple, and some of the demos are really awesome. A lot of sample *.Rnw files are available in the package’s examples directory:

$ ls *.Rnw
knitr-beamer.Rnw      knitr-listings.Rnw    knitr-themes.Rnw
knitr-graphics.Rnw    knitr-manual.Rnw      knitr-twocolumn.Rnw
knitr-input-child.Rnw knitr-minimal.Rnw
knitr-input.Rnw       knitr-subfloats.Rnw

I wrote a small Bash script which basically takes care of the Rnw->pdf conversion, with either xelatex or pdflatex as a $\TeX$ backend. I will update it with more options (bibtex, batch mode, etc.) later. Consider the sample file below:

\documentclass[8pt,a4paper]{article}
\usepackage{blindtext}
\begin{document}
\blindtext[1]
<<setup,echo=FALSE,cache=FALSE>>=
options(width=85)
suppressPackageStartupMessages(library(ggplot2))
@ 
<<reg_demo,fig.cap='A sample demo',fig.width=5,fig.height=3,fig.pos='htbp'>>=
x <- runif(100)
y <- 1.2 + 0.8*x + rnorm(100)
dfrm <- data.frame(x, y)
summary(dfrm)
ggplot(data=dfrm, aes(x, y)) + geom_point() + stat_smooth(method="lm")
@ 
\end{document}

Then, knit it to get the following output (PDF):

$ knitr -ql knitr1.Rnw
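
The wrapper script itself is not shown here. As a minimal sketch of what such a helper might look like (the function names `tex_name` and `rnw2pdf` and the engine argument are assumptions for illustration, not the actual script or its `-ql` options), one could knit the file with `Rscript` and then hand the resulting `.tex` file to either LaTeX engine:

```shell
#!/usr/bin/env bash
# Hypothetical Rnw -> PDF helper; names are illustrative, not the script used above.
# Assumes the .Rnw file sits in the current directory, where knit() writes its output.

# Map an .Rnw (or .rnw) file name to the matching .tex file name.
tex_name() {
  local rnw="$1"
  printf '%s.tex\n' "${rnw%.[Rr]nw}"
}

# Knit the file with knitr, then compile it with the chosen engine.
rnw2pdf() {
  local rnw="$1"
  local engine="${2:-pdflatex}"   # default backend; xelatex is the alternative
  case "$engine" in
    pdflatex|xelatex) ;;
    *) echo "unsupported engine: $engine" >&2; return 2 ;;
  esac
  # Stop here if knitting fails, so LaTeX never runs on a stale .tex file.
  Rscript -e "library(knitr); knit('$rnw')" || return 1
  "$engine" "$(tex_name "$rnw")"
}
```

With R, knitr, and a TeX distribution installed, `rnw2pdf knitr1.Rnw xelatex` would knit and compile the sample file; leaving out the second argument falls back to pdflatex.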

Other interesting features are:

We can use Markdown directly (see also Knitr, Github, and a new phase for the lab notebook).
Snapshots from dynamic visualizations (e.g., rgl, ggobi, or Yihui’s animation package) can be embedded easily.

I wonder whether all this good stuff would just work out of the box with ConTeXt’s filter module.

The other Noweb-like system that I really want to try is Dexy. It also offers very nice rendering facilities and allows mixing several programming languages in the same file. Looking at some of the demonstrations and templates (especially Code Journal Markdown R and Tufte Latex Article R, http://www.dexy.it/docs/templates/tufte-latex-article-r/), I feel it might well serve as another great tool for creating my next course slides and handouts.

aliquote.org
