< a quantity that can be divided into another a whole number of time />

Playing with TwitteR

June 25, 2011

Some months ago, I played with Un*x command-line tools to parse my tweets fetched from BackupMyTweets. Here is a more elegant way to do so with R.

Well, the code is rather simple and most of what we need is already available through the twitteR package:

library(twitteR)  # provides userTimeline() and getUser()
my.tweets <- userTimeline("chlalanne", n=1000)

Suppose I want to display the frequency of tags I use in my messages:

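library(stringr)  # provides str_extract_all() and str_replace_all()
# "+" rather than "*" so that a bare "#" is not counted as a tag
find.tag <- function(x) unlist(str_extract_all(x$getText(), "#[A-Za-z0-9]+"))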

# a little test to see whether it works or not
# for (i in 1:20) cat(i, ":", find.tag(my.tweets[[i]]), "\n")
my.tags <- lapply(my.tweets, function(x) try(find.tag(x), silent=TRUE))
sort(table(unlist(my.tags)), decreasing=TRUE)
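To try the tag-counting step without querying Twitter, here is a minimal sketch on made-up data. The `tweets` vector is hypothetical and stands in for what `x$getText()` returns, and base R's `regmatches()`/`gregexpr()` are swapped in for `str_extract_all()` so that no extra package is needed.

```r
# Hypothetical tweet texts (not real data) standing in for x$getText()
tweets <- c("Reading about #rstats and #ggplot2",
            "More #rstats fun with a # stray hash",
            "No tags here",
            "#rstats #twitter")

# Base-R stand-in for str_extract_all(): returns one character vector
# of matches per tweet (character(0) when a tweet has no tag)
find_tags <- function(txt) regmatches(txt, gregexpr("#[A-Za-z0-9]+", txt))

# Flatten the per-tweet matches and tabulate them, most frequent first
tags <- unlist(find_tags(tweets))
tag_freq <- sort(table(tags), decreasing = TRUE)
print(tag_freq)
```

The `+` quantifier requires at least one character after the `#`, so the stray hash in the second tweet is ignored.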

To get the total number of statuses (tweets) posted from my account:

me <- getUser("@chlalanne")
me$statusesCount  # or statusesCount(me)

(It works without the @ too.)

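We can make a quick and dirty word cloud as follows (note that cloud() is not part of twitteR or stringr, so it has to come from a separately loaded package):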

wcl <- table(unlist(my.tags))
names(wcl) <- str_replace_all(names(wcl), "#", "")
cloud(wcl[wcl > 5])
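The relabelling and filtering done before plotting can be checked on their own. This is only a sketch: the counts are invented, and base R's `gsub()` replaces `str_replace_all()` to keep it dependency-free.

```r
# Invented tag counts (not real data): 7 x #rstats, 6 x #python, 2 x #latex
wcl <- table(c(rep("#rstats", 7), rep("#python", 6), rep("#latex", 2)))

# Drop the leading "#" from the labels; on a one-dimensional table,
# names<- rewrites the single set of dimnames
names(wcl) <- gsub("#", "", names(wcl), fixed = TRUE)

# Keep only the tags used more than five times
frequent <- wcl[wcl > 5]
print(frequent)
```

Here `frequent` is what would be handed to the plotting function; the threshold of 5 is the one used above and can be adjusted to thin out or densify the cloud.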

Other random notes:

