Venn diagrams and SQL joins in R


While browsing Twitter feeds yesterday, I noticed a post by J.D. Long (alias @CMastication) pointing to Jeff Atwood's nice way of illustrating SQL join statements with Venn diagrams. So I wondered how it could be reproduced in R.

I initially thought of hacking the venneuler package. However, it turned out that I only needed a few things, so I just wrote a wrapper function that draws two circles and shades the appropriate areas. The results look like this:


where the five illustrations correspond to the following R code:

# (a) Inner join
merge(tableA, tableB, by="name", all=FALSE)

# (b) Full outer join
merge(tableA, tableB, by="name", all=TRUE)

# (c) Left outer join
merge(tableA, tableB, by="name", all.x=TRUE)

# (d) Records in A, but not in B
res <- merge(tableA, tableB, by="name", all.x=TRUE)
res[is.na(res[, 3]), ]  # here column 3 is tableB's first non-key column, NA when A has no match
# or, for the keys only:
# setdiff(tableA$name, tableB$name)

# (e) Records unique to A or B (not in both)
res <- merge(tableA, tableB, by="name", all=TRUE)
res[apply(res, 1, function(x) any(is.na(x))), ]  # keep rows with a missing match on either side
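The snippets above assume two data frames sharing a `name` column. To try them, here is a small self-contained sketch with hypothetical toy tables (the data are made up for illustration). It also shows a merge-free alternative for (d) and (e) based on `%in%` and `complete.cases()`, which is my own substitution rather than the code behind the figures.

```r
# Hypothetical toy tables: an id and a name in each
tableA <- data.frame(id = 1:4, name = c("Pirate", "Monkey", "Ninja", "Spaghetti"),
                     stringsAsFactors = FALSE)
tableB <- data.frame(id = 1:4, name = c("Rutabaga", "Pirate", "Darth Vader", "Ninja"),
                     stringsAsFactors = FALSE)

inner <- merge(tableA, tableB, by = "name", all = FALSE)   # (a) names in both tables
full  <- merge(tableA, tableB, by = "name", all = TRUE)    # (b) every name
left  <- merge(tableA, tableB, by = "name", all.x = TRUE)  # (c) every name in A

# (d) and (e) without relying on column positions
onlyA <- tableA[!tableA$name %in% tableB$name, ]           # in A, not in B
onlyB <- tableB[!tableB$name %in% tableA$name, ]           # in B, not in A
xorAB <- full[!complete.cases(full), ]                     # in exactly one table

print(inner)
print(xorAB)
```

With these tables, `inner` keeps Pirate and Ninja, `full` has six rows and `left` four. The `%in%` versions avoid building a merged table when only the records of one side are needed.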

I am not really satisfied with this, and there is room for improvement, especially in the graphical output. Anyway, I will come back to it when I have time.
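For readers who want to draw such panels without the Gist, here is a minimal sketch of a wrapper in base R. It is not the function from Gist 769392: the name `venn_shade()` and its arguments (`region`, `n`, `r`, `d`) are hypothetical. The shading is done by testing a grid of points for membership in each circle and drawing the result with `image()`, which is only one of several possible ways to fill the regions.

```r
# Hypothetical helper (not the Gist's code): shade one region of a two-set
# Venn diagram by testing grid points for membership in circles A and B.
venn_shade <- function(region = c("inner", "full", "left", "left_only", "xor"),
                       n = 300, r = 1, d = 0.6) {
  region <- match.arg(region)
  # Grid covering both circles, centred at (-d, 0) for A and (d, 0) for B
  x <- seq(-d - r, d + r, length.out = n)
  y <- seq(-r, r, length.out = n)
  g <- expand.grid(x = x, y = y)               # x varies fastest
  inA <- (g$x + d)^2 + g$y^2 <= r^2
  inB <- (g$x - d)^2 + g$y^2 <= r^2
  keep <- switch(region,
                 inner     = inA & inB,        # (a) inner join
                 full      = inA | inB,        # (b) full outer join
                 left      = inA,              # (c) left outer join
                 left_only = inA & !inB,       # (d) in A but not in B
                 xor       = xor(inA, inB))    # (e) unique to A or B
  z <- matrix(as.numeric(keep), nrow = n)      # rows index x, columns index y
  image(x, y, z, col = c("white", "grey70"), zlim = c(0, 1),
        asp = 1, axes = FALSE, xlab = "", ylab = "")
  theta <- seq(0, 2 * pi, length.out = 200)
  lines(-d + r * cos(theta), r * sin(theta))   # outline of A
  lines( d + r * cos(theta), r * sin(theta))   # outline of B
  invisible(keep)
}

venn_shade("left_only")
```

Because each region is a plain Boolean combination of `inA` and `inB`, each of the five joins maps to a single logical expression, which keeps the wrapper short.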

The code is available as Gist 769392. That said, I still think MetaPost, or even Asymptote, would do a better job for such drawings.

So, here is the mp code (thanks to the macro):


It is just a matter of running


on the attached file, to produce all five pictures.

And here is what it looks like using Asymptote (venn_demo.asy):



By the way, I switched back to Markdown for editing these posts, because Textile really sucks from time to time... Using Markdown with Textpattern is made possible by PHP Markdown.
