Playing with Julia


Following my earlier post on Julia and gsl-shell, I decided to familiarise myself with Julia.

A lot of smart guys have recently blogged about Julia, see e.g. Julia, I love You by John Myles White. Even Doug Bates started to play with it, and in particular bring us with an easy way to call R's statistical distributions with Julia.

The installation went fine for me. I'm using XCode 4.3.2 on OS X 10.7.3, with command-line tools installed and 64-bit gfortran (first time I installed a different fortran from the one released on R Development Tools and Libraries). Of course, I forgot to ask for a parallel build, so it took far longer than expected. To get a working web REPL, I also needed to fetch lighttpd, which is basically as simple as typing make -C external install-lighttpd in julia/ root directory. Although the web REPL looks great, it has very limited plotting facilities at the moment, so I will stick with the console REPL. It is worth noting that during the first install all external dependencies will be downloaded and everything goes into a root folder. In the end, it amounts to about 1.3 Go on your hard drive, so you can safely delete everything but the external/root folder (and lighttpd.conf if you want to use the webserver).


Let's start with some very basic stuff. Note that I am using the REPL and for plotting purpose I choose the recently released Gaston interface to Gnuplot.(a) There's an embedded plotting library, Winston, but I got a lot of Cairo-related errors when trying it. To use R's probability distributions and RNGs, we need to have a working libRmath somewhere. I happened to compile it following instructions in the R Installation and Administration manual, as pointed out by Doug Bates. (We really just need to type make in the src/nmath/standalone subdirectory of R source tree.)

_jl_libRmath = dlopen("julia/libRmath")
x = -3:.1:3
y = dnorm(x, 0, 1)
a = Axes_conf()
a.title = "The gaussian density from R"
addcoords(x, y)

Here is what I got:

Nothing fabulous actually. But wait, we can do other funny things. First, let's implement a simplified t-test. There's already a bunch of handy functions in base/statistics.jl. Basically, we just need to compute a difference of means and a pooled estimate for the variance. Note that we assume equal variance and a two-sided test. I also added a switch-statement for the case of paired samples. This has been tested with the famous Student's sleep data set (student1908.csv).

function t_test(x, y, paired)
  nx = length(x)
  ny = length(y)
  dobs = mean(y) - mean(x)
  if paired == true && nx == ny
    df = nx - 1
    se = std(x-y) / sqrt(nx)
    df = nx + ny - 2
    se = sqrt(((nx-1) * std(x)^2 + (ny-1) * std(y)^2) / df)
  if paired == false 
    se = se * sqrt(1/nx + 1/ny)
  tobs = dobs / se
  pval = 2*(1 - pt(tobs, df))
  res = HashTable()
  res["stats"] = tobs; res["df"]=df; res["p_value"]=pval

dat = dlmread("student1908.csv")
t_test(dat[:,1], dat[:,2], false)  # p = 0.079
t_test(dat[:,1], dat[:,2], true)   # p = 0.003

Nothing really difficult there. Let's go for a rough PCA, on Fisher-Anderson iris.txt dataset. Julia already has some Basic Linear Algebra functions in base/linalg.jl (plus additional LAPACK bindings in linalg_lapack.jl, including SVD computation).

function scale(x)
  (x - mean(x)) /std(x)

function pca(a, standardize)
  if standardize == true
    for i = 1:size(a, 2)
      a[:,i] = scale(a[:,i])
  res = svd(a)
  sd = res[2] / sqrt(size(a, 1) - 1)
  rot = transpose(res[3])
  sd, rot

iris = dlmread("iris.txt")
pca(iris[:,1:4], true)

We can also compare results from an SVD and an eigenvalue decomposition:

n = 100
x1 = randn(n)
x2 = randn(n)
x3 = .6 * x1 + randn(n)
X = [x1 x2 x3]
for i = 1:size(X, 2)
  X[:,i] = scale(X[:,i])
amap(mean, X, 2)
r = 1/(n-1) * X' * X
res1 = pca(X, false)[1]
res2 = sortr(real(sqrt(eig(r)[1])))

The above code (it probably looks really ugly because these are my very first lines of Julia code!) reminds me of my 4-5 years of Matlab, but overall the documentatin points toward interesting features that I just nead to learn more.

As a conclusion of this 2-hour session, it is evident that I am missing a lot of things about Julia. There's a cor method that is supposed to work with one or two vectors, but also with a matrix, but it doesn't seem to work when I call it like cor(X). I also had some trouble with csv header which when present gave me an Any Array that I know nothing about. Well, in my defense I haven't finished reading the documentation set, and just started from scratch with a console REPL early in the morning.

A reply by Harlan Harris to a recent question on Cross Validated, Does Julia have any hope of sticking in the statistical community?, summarizes a lot of missing features as compared to R. But it still looks like a great project, and it's always interesting to follow the development of a new programming language from the start.


(a) I had to patch gaston.jl because it is actually setting a wxt terminal that I don't have. I replaced it by a standard x11 terminal.


Articles with the same tag(s):

Collecting email usage statistics from mu
Data science at the command-line
Interacting with Weka from Jython
CoffeeScript or how to avoid typing ugly Javascript code
Workflow for statistical data analysis
GSL Shell
Apple weekend miscellanies
Color schemes for Emacs and TeX
Compiling Gnuplot on OS X
Weaving scientific document