R, pipes and Co.


The R language is changing rapidly. I am afraid I still teach R the way I learned and liked it 10 years ago (though I was already aware of replicate() long ago :-), although I try to keep up regularly with what's new on CRAN.

One question has been stuck in my head for two or three years now: should I just stop teaching lattice graphics and switch to ggplot2? If you are wondering why this causes me some trouble, it is simply because once students understand the advantage of using R formulae and the split-apply-combine strategy with aggregate() (and not plyr) for statistical modeling and data aggregation, they are almost done. The same R formulae can be used directly with, e.g., xyplot(), with minor variations for grouping or conditioning variables. Moreover, the same formulae are used in the wonderful Hmisc package (which is why, after all, I don't really need the plyr package).
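To make the point about formulae concrete, here is a minimal sketch using the built-in ToothGrowth data set (my choice for illustration; any data frame with a numeric response and a couple of factors would do). It assumes only base R and the lattice package, and shows one formula style serving aggregation, modeling and plotting in turn.

```r
library(lattice)  # recommended package, shipped with standard R installations

# Split-apply-combine in one call: mean tooth length by supplement and dose
agg <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(agg)

# The same response ~ predictor notation specifies a linear model...
fit <- lm(len ~ dose, data = ToothGrowth)
print(coef(fit))

# ...and a scatterplot with a regression line, conditioned on supplement type
p <- xyplot(len ~ dose | supp, data = ToothGrowth, type = c("p", "r"))
print(p)
```

Replacing `| supp` with `groups = supp` would overlay the two supplements in a single panel instead of drawing one panel per level.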

When I stumbled upon this nice tutorial, I could not help but think that lattice::xyplot already provides most of these functionalities (especially grouped regression lines) with very few options, e.g.

xyplot(lifeExp ~ gdpPercap, data = d, groups = continent, type = c("p", "r"),
       scales = list(x = list(log = 10)))

Of course, ggplot is great and Hadley's {d}plyr packages are really good, but it looks to me as if it were another R. Besides the good, the bad and the ugly of domain-specific languages vs. general-purpose programming languages, issues with naming conventions, different approaches to object-oriented programming, or some idiosyncrasies inherent to R itself and its community of developers, I feel like the R language has already known enough internal and external divisions, with people looking for or actively involved in alternative solutions, be it Python, Clojure, Lisp, pure C, Scala or Julia.

Although I am familiar with Unix pipes, I must admit I have only tried magrittr + dplyr, or even tidyr, very briefly, and I am far from mastering all the packages that are now part of what is sometimes called the Hadleyverse,(a) even if I have been using ggplot for a long time now. Anyway, I am not sure that the example about magrittr that is available on the RStudio Blog is really attractive for newcomers:

mae <- . %>% abs %>% mean(na.rm = TRUE)

This happens to be a way to express the following simple function: mae <- function(x) mean(abs(x), na.rm = TRUE). I agree that in some cases expressing R operations through pipes can be really fun (and probably more expressive), especially for data munging, but I feel like it often just obscures the language. However, I should note that the following piece of code from Kieran Healy looks clearer to me. At least I can understand this series of operations:

data.m %>% group_by(Product) %>% filter(Product=="iPad") %>% na.omit() %>% data.frame(.)
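For readers who want to check how both pipe examples map onto ordinary function calls, here is a small sketch. It assumes the magrittr and dplyr packages are installed; the contents of data.m and the Sales column are invented for illustration and are not the data used in the original post.

```r
library(magrittr)
library(dplyr)

# The functional sequence and the plain function give the same result
mae_pipe <- . %>% abs %>% mean(na.rm = TRUE)
mae_fun  <- function(x) mean(abs(x), na.rm = TRUE)
err <- c(-2, 1, NA, 3)
print(c(pipe = mae_pipe(err), fun = mae_fun(err)))

# Hypothetical stand-in for data.m (made-up values)
data.m <- data.frame(Product = c("iPad", "iPad", "iPhone", "Mac", "iPad"),
                     Sales = c(10, NA, 20, 5, 12),
                     stringsAsFactors = FALSE)

# Piped version: read left to right, one step at a time
piped <- data.m %>% group_by(Product) %>% filter(Product == "iPad") %>%
  na.omit() %>% data.frame(.)

# Nested version of the same steps: read from the inside out
nested <- data.frame(na.omit(filter(group_by(data.m, Product),
                                    Product == "iPad")))

print(piped)
print(nested)
```

The nested call has to be read from the innermost function outwards, which is probably why the piped form feels easier to follow here than in the mae example.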

Maybe I'm too old after all, and I should just try to use these new tools more regularly.


(a) See also: R: the good parts.

