The MurmurHash algorithm belongs to non-cryptographic hash function and it is used for hash-based lookup, e.g. to convert a word $w$ to an integer $s \in [0, H)$, where $H$ is the table size. I came across this hash function by reading a recent publication on a new read mapper.1 It has been shown to have good properties in terms of speed and number of collisions compared to other hashing algoritms; see also Hash functions: An empirical comparison.
Although different implementations of Murmurhash3 are cited on the Wikipedia page (see also the
digest packages for Python/R), I couldn’t find one in Scheme. So here is a quick and dirty one in Chicken scheme, using its solid FFI facilities and some C code grabbed from the Wikipedia page. There’s also the digest R package (here’s the C code in question for the interested reader) and some C++ code available on smhasher on Github, but I haven’t tried it yet.
Please note that things have a little bit changed with Chicken 5, compared to this excellent blog post: A (mostly) comprehensive guide to calling C from Scheme and vice versa. In particular, to import the
foreign module, you now have to write
(import (chichen foreign)) instead of
(import foreign). Say you have this little snippet:
(import (chicken foreign)) (foreign-declare "#include <math.h>") (define sin-pi (foreign-lambda* double ((double x)) "double val = x * M_PI;" "C_return(sin(val));")) (display (sin-pi 0.5))
Then you can generate a binary for your platform using the Chicken compiler,
csc. The interactive Chicken REPL (
csi) won’t work with the
foreign import. First, I wanted to test the C code so I added a little header and print statement (murmur32.c). Let us check that it works:
$ cc murmu32.c -o murmur32 $ ./murmur32 1498610893
Here is what I got using R, which also implements MurmurHash3 for 32/128 bit (x86/x64):
> digest::digest("murmur", "murmur32", seed = 101, serialize = FALSE)  "5952fccd"
This also happens to be 1,498,610,893 in decimal notation.
Now, let’s write some Scheme wrapper to use this little C procedure, while trying to handle C types correctly: an
uint_8 is an unsigned char, or a foreign
blob in case it is just a pointer, while an
uint32_t corresponds to an unsigned int (4 bytes). And so, we get the following piece of code:
(import (chicken foreign) (chicken format)) #> extern uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed); <# (define m32 (foreign-lambda unsigned-int32 "murmur3_32" blob size_t unsigned-int32)) (printf "~A " (m32 "murmur" 6 101))
But first, we need to compile the core C functions (no need for a shared library, though):
$ cc -c lib-m32.c $ csc -o m32 lib-m32.o m32.scm
$ ./m32 1498610893