Venn diagrams and SQL joins in R

2011-01-07

When browsing Twitter feeds yesterday, I noticed a post by J.D. Long (alias @CMastication) pointing to a nice way, by Jeff Atwood, of illustrating SQL join statements with Venn diagrams. So I wondered how it could be reproduced in R.

I initially thought of hacking the venneuler package. However, it turned out that I only needed a few things, so I just wrote a wrapper function that takes care of drawing two circles and shading the appropriate areas. The result looks like this:

[Figure: venn (the five Venn diagrams, (a) to (e), drawn in R)]

where the five illustrations correspond to the following R code:

# (a) Inner join
merge(tableA, tableB, by="name", all=FALSE)

# (b) Full outer join
merge(tableA, tableB, by="name", all=TRUE)

# (c) Left outer join
merge(tableA, tableB, by="name", all.x=TRUE)

# (d) Records in A, but not in B (keep the rows of the left join whose key has no match in B)
res <- merge(tableA, tableB, by="name", all.x=TRUE)
res[!res$name %in% tableB$name, ]
# or, for the key values only:
# setdiff(tableA$name, tableB$name)

# (e) Records in A or in B, but not in both (keys present in exactly one table)
res <- merge(tableA, tableB, by="name", all=TRUE)
res[xor(res$name %in% tableA$name, res$name %in% tableB$name), ]
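To check the five recipes on something concrete, here is a minimal sketch on two made-up tables. The contents of `tableA` and `tableB`, and their extra columns `idA` and `idB`, are hypothetical and only serve as an illustration. For (d) and (e) the sketch filters on key membership with `%in%` rather than on `NA` values, so it keeps working when the original tables already contain missing values.

```r
# Hypothetical toy tables; `name` is the join key, idA/idB are made-up payload columns.
tableA <- data.frame(name = c("Alice", "Bob", "Carol", "Dave"),
                     idA = 1:4, stringsAsFactors = FALSE)
tableB <- data.frame(name = c("Carol", "Dave", "Eve", "Frank"),
                     idB = 1:4, stringsAsFactors = FALSE)

# (a) Inner join: keys present in both tables
inner <- merge(tableA, tableB, by = "name")
# (b) Full outer join: every key, with NA where one side has no match
full <- merge(tableA, tableB, by = "name", all = TRUE)
# (c) Left outer join: every row of A, NA in B's columns when unmatched
left <- merge(tableA, tableB, by = "name", all.x = TRUE)
# (d) Rows of A whose key does not occur in B (an "anti join")
onlyA <- tableA[!tableA$name %in% tableB$name, ]
# (e) Keys found in exactly one of the two tables (symmetric difference)
xorAB <- full[xor(full$name %in% tableA$name, full$name %in% tableB$name), ]

print(inner)
print(xorAB)
```

Since `merge()` sorts the result on the key by default (`sort = TRUE`), `inner` holds Carol and Dave, while `xorAB` holds Alice, Bob, Eve and Frank.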

I am not really satisfied with that, and there is room for improvement, especially in the graphical output. Anyway, I will come back to it if I have time.
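For readers who want to play with the drawing part without fetching the Gist, here is a minimal base-graphics sketch of the same idea. It is not the wrapper from the Gist: the function names (`circle_xy`, `lens_xy`, `a_only_xy`, `venn_regions`, `draw_venn`) and the layout (two circles of radius `r` centred at (-d/2, 0) and (d/2, 0), assuming 0 < d < 2r) are hypothetical. Each shaded region is built as a polygon from two circular arcs that meet at the intersection points (0, ±h).

```r
# Hypothetical helper: points on a circle of radius r centred at (cx, 0).
circle_xy <- function(cx, r, theta) cbind(cx + r * cos(theta), r * sin(theta))

# Angle, seen from A's centre, of the upper intersection point (0, h).
# Assumes 0 < d < 2 * r so that the two circles overlap.
half_angle <- function(r, d) atan2(sqrt(r^2 - (d / 2)^2), d / 2)

# Lens A and B: arc of A lying inside B, then arc of B lying inside A
lens_xy <- function(r = 1, d = 1, n = 100) {
  a <- half_angle(r, d)
  rbind(circle_xy(-d / 2, r, seq(-a, a, length.out = n)),
        circle_xy( d / 2, r, seq(pi - a, pi + a, length.out = n)))
}

# Crescent A minus B: outer arc of A, then back along the arc of B inside A
a_only_xy <- function(r = 1, d = 1, n = 100) {
  a <- half_angle(r, d)
  rbind(circle_xy(-d / 2, r, seq(a, 2 * pi - a, length.out = n)),
        circle_xy( d / 2, r, seq(pi + a, pi - a, length.out = n)))
}

# List of polygons to shade for each join type, (a) to (e) in the post
venn_regions <- function(type = c("inner", "full", "left", "anti", "xor"),
                         r = 1, d = 1) {
  type <- match.arg(type)
  theta <- seq(0, 2 * pi, length.out = 200)
  A <- circle_xy(-d / 2, r, theta)
  B <- circle_xy( d / 2, r, theta)
  Aonly <- a_only_xy(r, d)
  Bonly <- cbind(-Aonly[, 1], Aonly[, 2])  # B minus A is the mirror image of A minus B
  switch(type,
         inner = list(lens_xy(r, d)),
         full  = list(A, B),
         left  = list(A),
         anti  = list(Aonly),
         xor   = list(Aonly, Bonly))
}

# Draw both circle outlines and shade the regions selected by `type`
draw_venn <- function(type = "inner", r = 1, d = 1, col = "grey75") {
  plot.new()
  plot.window(xlim = c(-d / 2 - r, d / 2 + r), ylim = c(-r, r), asp = 1)
  for (p in venn_regions(type, r, d)) polygon(p, col = col, border = NA)
  theta <- seq(0, 2 * pi, length.out = 200)
  lines(circle_xy(-d / 2, r, theta))
  lines(circle_xy( d / 2, r, theta))
  text(c(-d / 2 - r / 2, d / 2 + r / 2), 0, c("A", "B"))
  invisible(NULL)
}
```

For instance, `op <- par(mfrow = c(1, 5)); for (t in c("inner", "full", "left", "anti", "xor")) draw_venn(t); par(op)` lays out the five cases side by side. Building the regions explicitly from arcs keeps the shading exact, at the price of only handling two circles of equal size.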

The code is available as Gist 769392. Now, the point is that I still think MetaPost, or even Asymptote, would do a better job with such drawings.

So, here is the result with MetaPost (thanks to the venn.mp macro):

[Figure: venn2 (the same five diagrams drawn with MetaPost)]

It is just a matter of running

mptopdf venn_demo.mp

on the attached file, venn_demo.mp, to produce all five pictures.

And here is what it looks like using Asymptote (venn_demo.asy):

[Figure: venn3 (the same five diagrams drawn with Asymptote)]

Note

By the way, I switched back to Markdown for editing these posts, because Textile really sucks from time to time... Using Markdown with Textpattern is made possible thanks to PHP Markdown.

---

Articles with the same tag(s):

Multi-Group comparison in Partial Least Squares Path Models
Yet another gray theme for R base graphics
Writing a book
R, pipes and Co.
R Graphs Cookbook
Emacs Org-mode and literate programming
user2014
Reproducible research with R
Python for interactive scientific data visualization
Bar charts of counts or frequencies in Stata

---