Some months ago, I played with Un*x command-line tools to parse my tweets fetched from BackupMyTweets. Here is a more elegant way to do so with R.

Well, the code is rather simple and most of what we need is already available through the twitteR package.

library(twitteR)
library(stringr)
my.tweets <- userTimeline("chlalanne", n=1000)

Suppose I want to display the frequency of tags I use in my messages:

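# "+" (rather than "*") requires at least one character after "#",
# so a lone "#" is not counted as a tag
find.tag <- function(x) unlist(str_extract_all(x$getText(), "#[A-Za-z0-9]+"))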

# a little test to see whether it works or not
# for (i in 1:20) cat(i, ":", find.tag(my.tweets[[i]]), "\n")
my.tags <- lapply(my.tweets, function(x) try(find.tag(x), silent=TRUE))
sort(table(unlist(my.tags)), decreasing=TRUE)
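
The extraction step can be tried offline, without querying Twitter. Here is a minimal sketch that applies the same idea to plain strings; the vector sample.texts is made up for illustration and is not real tweet data. The pattern uses "+", so that a bare "#" is not counted as a tag.

```r
library(stringr)

# Hypothetical sample texts standing in for tweet bodies (not real data)
sample.texts <- c("Trying out #rstats and #ggplot2 today",
                  "More #rstats notes",
                  "No tag here, just a lone # sign")

# Extract every "#" followed by at least one alphanumeric character
tags <- unlist(str_extract_all(sample.texts, "#[A-Za-z0-9]+"))

# Tabulate and sort by decreasing frequency
tag.freq <- sort(table(tags), decreasing = TRUE)
print(tag.freq)
```

Checking the pattern on a few hand-written strings like this is a cheap way to see what it does and does not match before running it over a full timeline.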

To get the number of records I have,

me <- getUser("@chlalanne")
me$statusesCount  # or statusesCount(me)

(It works without the @ too.)

We can make a quick and dirty word cloud as follows:

library(snippets)
wcl <- table(unlist(my.tags))
names(wcl) <- str_replace_all(names(wcl), "#", "")
cloud(wcl[wcl > 5])
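
To see what ends up in the cloud, the preparation steps can be checked on a small table first. This is only a sketch: the tag counts below are invented, and it stops before the plotting call, so it needs base R only.

```r
# Hypothetical tags, repeated to mimic usage counts (made-up numbers)
fake.tags <- rep(c("#rstats", "#python", "#latex"), times = c(12, 7, 3))
wcl <- table(fake.tags)

# Drop the leading "#" so that only the bare words are displayed
names(wcl) <- sub("^#", "", names(wcl))

# Keep only tags used more than 5 times, as in the cloud() call
frequent <- wcl[wcl > 5]
print(frequent)
```

The threshold of 5 is the one used above; with fewer tweets a lower cutoff may be needed to get anything to plot.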

Other random notes:

  • There’s also the possibility of using OAuth (see help(registerTwitterOAuth)), which I haven’t explored much yet.
  • The idea of using twitteR to update analysis status online was raised on Stack Overflow. (Note that the code there uses the older API, so commands like initSession() are no longer available.)
  • It would be even nicer to use the tm package with the public timeline or something similar.