aliquote.org

Weaving Stata Documents

April 22, 2012

StatWeave has recently been updated and has become a powerful engine for weaving Stata documents.

StatWeave

The good news is that we can now use graphics commands with StatWeave. There’s a minimal working example in the test suite: Stata-test.swv. The StatWeave package offers some handy customizations, like code formatting (see \StataweaveOpts{}), and basically all we need to do is put our Stata code in a Statacode environment (for R, we would use an Rcode environment). As with Sweave, we can display or hide the code, and ask StatWeave to generate a figure, as shown in the following example:

\begin{Statacode}{fig, hide, height=4.5in, width=9in, dispw=4in}
predict g_hat
twoway (scatter gp100m disp) (line g_hat disp, sort), by(foreign)
\end{Statacode}

The StatWeave Users’ Manual has more information on running and customizing StatWeave. I think it should not be too difficult to create language-specific files for, e.g., Julia or gsl-shell.

ConTeXt filter

Nowadays, the ConTeXt filter module lets us call external programs, like R, Pandoc, or Asymptote, and insert their results into our TeX document. That’s really awesome because it means that we can build dynamic documents that stay in sync with accompanying code or simulations, à la Sweave. There are nice demos in the tests/ directory of the module’s GitHub repository.

I tested the R weaving option, and it works quite well, although I noticed two minor issues: (a) a proc.time() command is issued at the end of each R chunk, and (b) we have to explicitly save graphics before embedding them in our document. The first issue is easily solved by modifying the filtercommand:

filtercommand={R CMD BATCH -q --no-timing %
  --\externalfilterparameter{mode} %
  \externalfilterinputfile\space \externalfilteroutputfile}

Adding --no-timing ensures that R exits without printing the elapsed time. I added another option, --\externalfilterparameter{mode}, which lets us write things like the following:

\startR[mode=slave]
summary(x)
\stopR

to get only the results returned by R (well, it’s a bit crappy, but it works). The second issue could easily be solved by saving all graphics into a single PDF file and using the \externalfigure command with a page= option. This is what I use with LaTeX, and it works quite well. So, we could add something like this:

\startR[read=no]
pdf("figs.pdf")
\stopR

at the beginning of our document, and a dev.off() command at the end. This way, we just have to call \externalfigure, incrementing the page number after each call.

What about Stata? A basic filter would look like:

$ stata -q -b do \externalfilterinputfile

which tells Stata to process the \externalfilterinputfile do file in batch mode. Again, there are some caveats with this command: it leaves a bare Stata prompt (.) followed by an "end of do-file" line at the end of the Stata code chunk.
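Dropping those two trailing lines is a classic sed job. A minimal sketch, assuming the log always ends with exactly the bare prompt and the "end of do-file" line; the function name strip_stata_trailer and the sample log lines are hypothetical. It reads stdin and writes stdout, so it behaves the same with GNU and BSD sed (no in-place editing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the last two lines of a Stata batch log,
# assumed to be the bare '.' prompt and the 'end of do-file' line.
strip_stata_trailer() {
  # N appends the next line; on every line but the last, $!P prints the
  # oldest line and $!D drops it. On the last line, $d deletes the final pair.
  sed 'N;$!P;$!D;$d'
}

# Example on a made-up log fragment: prints the first two lines only.
printf '%s\n' '. summarize mpg' 'output line' '.' 'end of do-file' |
  strip_stata_trailer
```

The idiom keeps a two-line sliding window in the pattern space, which is why it can delete "the last two lines" without knowing the file length in advance.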

I wrote a small Bash script, available as a Gist, to post-process the log file Stata produces from a do file:

#!/usr/bin/env bash
# Post-process Stata do -> log file.
# We should ensure that we use the correct Stata program
# (e.g., might be stata, stata-mp, etc.). I'll put this here
# for later update.
STATA="$(command -v stata)"

# Process command line options
usage()
{
cat << EOF
Usage: $0 [-hst] file

This script asks Stata to process a do file and log its output.

OPTIONS:
  -h  --help    show this message
  -s  --slave   slave mode (discard Stata command)
  -t  --tidy    remove all empty lines
EOF
}

SLAVE=0
TIDY=0
# The "-:" entry lets getopts accept long options: --slave is read as
# option "-" with OPTARG="slave".
while getopts ":hst-:" opt; do
  case $opt in
    h)
      usage
      exit 0 ;;
    s)
      SLAVE=1 ;;
    t)
      TIDY=1 ;;
    -)
      case $OPTARG in
        help)
          usage
          exit 0 ;;
        slave)
          SLAVE=1 ;;
        tidy)
          TIDY=1 ;;
        *)
          usage
          exit 1 ;;
      esac ;;
    \?)
      usage
      exit 1 ;;
  esac
done
shift $((OPTIND-1))
[ $# -eq 1 ] || { usage; exit 1; }

# Process do file. Stata's batch mode writes <name>.log in the
# current directory, so drop any directory and the extension.
OUTFILE="$(basename "${1%.*}").log"
"$STATA" -q -b do "$1"

# Note: `sed -i ''` is BSD/macOS syntax; with GNU sed use `sed -i`.
# Delete last two lines ('.', 'end of do-file')
sed -i '' 'N;$!P;$!D;$d' "$OUTFILE"
# Filter graph export commands and squeeze runs of blank lines.
# Note that if the graphics file already exists, the resulting
# error message is not filtered.
sed -i '' '/^\. graph export/d' "$OUTFILE"
sed -i '' '/^(file/d' "$OUTFILE"
sed -i '' '/^$/N;/\n$/D' "$OUTFILE"
# Also delete first blank line (the log file is updated in place)
sed -i '' '1d' "$OUTFILE"
if [[ $SLAVE = 1 ]]; then
  sed -i '' '/^\./d' "$OUTFILE"
fi
if [[ $TIDY = 1 ]]; then
  sed -i '' '/^$/d' "$OUTFILE"
fi

It has a few options: keep only the results (i.e., remove Stata commands), and/or tidy up the log file by removing extra blank lines. If the do file includes -graph export- commands, they are removed as well. (Almost everything is done with sed.)
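To see what each filter does without running Stata, the line-filtering steps can be rewritten as a stdin-to-stdout pipeline. This is a sketch under assumptions: clean_stata_log is a hypothetical name, it covers only the graph export, blank-line, slave and tidy filters (not the first-line or trailer deletion), and the log lines in the example are made up.

```shell
#!/usr/bin/env bash
# Hypothetical stdin -> stdout version of the log filters, so each step can be
# tried on any text. Arguments mirror the flags: $1 = slave (0/1), $2 = tidy (0/1).
clean_stata_log() {
  local slave=${1:-0} tidy=${2:-0}
  # Drop echoed 'graph export' commands and their '(file ... written ...)' note.
  sed -e '/^\. graph export/d' -e '/^(file/d' |
    # Squeeze each run of consecutive blank lines down to a single blank line.
    sed '/^$/N;/\n$/D' |
    # Slave mode: drop echoed Stata commands (lines starting with '.').
    if [ "$slave" = 1 ]; then sed '/^\./d'; else cat; fi |
    # Tidy mode: drop every remaining blank line.
    if [ "$tidy" = 1 ]; then sed '/^$/d'; else cat; fi
}

# Example: slave + tidy on a made-up fragment; only result lines survive.
printf '%s\n' '. sysuse auto, clear' '(1978 Automobile Data)' '' '' \
  '. graph export mpg.eps' '(file mpg.eps written in EPS format)' \
  '. summarize mpg' 'mpg stats' | clean_stata_log 1 1
```

Keeping every step as a filter makes the behavior easy to check piece by piece; the in-place `sed -i` form is only needed when rewriting the log file itself.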

$ ctxstata -h
Usage: /usr/local/bin/ctxstata [-hst] file

This script asks Stata to process a do file and log its output.

OPTIONS:
  -h  --help    show this message
  -s  --slave   slave mode (discard Stata command)
  -t  --tidy    remove all empty lines
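The help text advertises both short and long forms of each flag, yet Bash's getopts only understands single-letter options. The script works around this with the "-:" entry in getopts ":hst-:": "--slave" is parsed as option "-" whose argument (OPTARG) is "slave". A stripped-down sketch of the trick; parse_flags is a hypothetical name that prints "SLAVE TIDY remaining-args".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of long-option parsing with getopts via the "-:" trick.
parse_flags() {
  local opt OPTIND=1 slave=0 tidy=0   # reset OPTIND so repeated calls work
  while getopts ":st-:" opt; do
    case $opt in
      s) slave=1 ;;
      t) tidy=1 ;;
      -) # long option: the text after "--" arrives in OPTARG
         case $OPTARG in
           slave) slave=1 ;;
           tidy)  tidy=1 ;;
           *) echo "unknown option --$OPTARG" >&2; return 1 ;;
         esac ;;
      \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))   # drop parsed options; keep the file name(s)
  echo "$slave $tidy $*"
}

parse_flags --slave -t auto.do   # prints: 1 1 auto.do
parse_flags -- auto.do           # prints: 0 0 auto.do
```

A bare "--" is treated by getopts as the end of options, so a filter command that expands to `ctxstata -- file` (for instance, if the option parameter is left empty) should still hand the file name through untouched.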

I defined the following filter:

\defineexternalfilter
  [Stata]
  [filtercommand={ctxstata --\externalfilterparameter{option} \externalfilterinputfile},
    output=\externalfilterbasefile.log,
    readcommand=\typefile,
    color=typecolor,
    cache=yes,
    label=yes,
    spacebefore=big,
    spaceafter=big,
    continue=yes]
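The output=\externalfilterbasefile.log key only picks up the right file if it matches the log name Stata writes in batch mode. A small sketch of that naming rule, assuming Stata names the log after the do file with its directory and last extension removed; stata_log_name and the sample file names (including the .tmp one) are hypothetical, not taken from the filter module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the log name Stata's batch mode is assumed to produce
# for a given input file (directory dropped, last extension swapped for .log).
stata_log_name() {
  local base=${1##*/}             # drop any leading directory
  printf '%s.log\n' "${base%.*}"  # strip only the last extension
}

stata_log_name auto.do                        # prints: auto.log
stata_log_name ./tmp/doc-temp-Stata-1.tmp     # prints: doc-temp-Stata-1.log
```

Using `%.*` (shortest suffix) rather than `%%.*` (longest suffix) matters for paths such as ./auto.do, where stripping from the first dot would leave an empty name.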

For an unknown reason, it works for printing Stata code and results, but it fails to render images. So, the following piece of code will not generate an EPS picture:

\startStata
sysuse auto, clear
summarize mpg
twoway scatter mpg weight, by(foreign, total)
graph export mpg.eps
\stopStata

However, processing a do file just happens to work:

\processStatafile[output=auto.log,option=tidy]{auto.do}

That’s puzzling, so I guess I’m just missing something obvious, or I need to investigate the filter module’s behavior further.
