
On memoization

December 21, 2019

Memoization is a technique that keeps intermediate results in memory in order to speed up the execution of a computer program. In other words, we store the results of successive calls to a given function so that whenever it is called again with the same arguments we don’t need to re-evaluate it: we simply pick the value computed by a previous call. This is somewhat akin to using a cache to manage the state of a function or a stack of its returned values, or more generally to dynamic programming (we talked a bit about DP in the case of Motzkin numbers). This assumes that the function is pure, which is often the case in functional PLs. It is also related to the Y combinator, Y = λf · (λx · f (x x)) (λx · f (x x)), which lets us write a non-recursive version of a function, apply Y to it, and get a recursive function in turn.
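Before getting to combinators and library support, here is a minimal sketch of that caching idea in plain Racket. The names memoize/simple and slow-square are mine, not from any package: the wrapper keeps a hash table mapping argument lists to results, and only calls the underlying function on a cache miss.

#lang racket

;; a hand-rolled memoizer: results are cached in a hash table keyed by
;; the list of arguments; f is called only when the key is not yet present
(define (memoize/simple f)
  (define cache (make-hash))
  (lambda args
    (hash-ref! cache args
               (lambda () (apply f args)))))

;; toy example: the second call with the same argument is answered from
;; the cache instead of re-running the one-second computation
(define slow-square
  (memoize/simple (lambda (n) (sleep 1) (* n n))))

(slow-square 42)   ; takes about one second
(slow-square 42)   ; immediate, same result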

Even with pure functions (no side effects) and referential transparency (replacing an expression with its value does not change the behavior of the program), there are cases where memoization is not possible without resorting to higher-order functions, in particular when a recursive function has to perform memoized calls to itself (a sketch of this combination is given right after the next code block). Indeed, lambda calculus alone does not let us define recursive functions since there is no way to call an anonymous function (the f in λx · f (x x)) unless we give it a name. Hence the need for a combinator, i.e. another function without free variables, which allows a function to call itself without being named in the scope in which it is declared. In Racket, this is expressed as follows:

#lang lazy

(define Y (λ(f)((λ(x)(f (x x)))(λ(x)(f (x x))))))

(define fib
  (Y (λ(f) (λ(x) (if (<= x 1) 1 (+ (f (- x 1)) (f (- x 2))))))))
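The same machinery can be combined with memoization: write fib in open-recursion style, where the recursive call goes through an extra argument, and let a memoizing wrapper tie the knot. Here is a sketch in plain Racket; memoize/simple is the hash-table memoizer from the sketch above and fib/gen is my own name, but the construction is in the same spirit as the fix (memoize . fib) used in the Haskell version at the end of this post.

#lang racket

;; hash-table memoizer, as in the earlier sketch
(define (memoize/simple f)
  (define cache (make-hash))
  (lambda args
    (hash-ref! cache args
               (lambda () (apply f args)))))

;; fib in open-recursion style: self is the function used for recursive calls
(define (fib/gen self)
  (lambda (n)
    (if (<= n 1) 1 (+ (self (- n 1)) (self (- n 2))))))

;; tie the knot so that every recursive call goes through the cache
(define fib
  (memoize/simple (lambda (n) ((fib/gen fib) n))))

(fib 30)   ; => 1346269, computed with a linear number of calls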

You will find similar Y-combinator patterns for Python, Haskell, and even C in this Medium post. Do we need to write all this ourselves? Not really: there are Racket packages that provide memoization for both typed and untyped programs. Here is an example for the Fibonacci sequence using typed Racket:

#lang typed/racket

(require tmemoize)

(: fib (-> Integer Integer))
(define fib (memoize (lambda ([n : Integer])
                        (if (<= n 1) 1
                            (+ (fib (- n 1))
                               (fib (- n 2)))))))

Unfortunately, the tmemoize package seems no longer available, so here is a modified version using untyped Racket:

#lang racket

(require memoize)
(require racket/trace)

(define/memo (fib n)
  (if (<= n 1) 1 (+ (fib (- n 1)) (fib (- n 2)))))

(trace fib)
(fib 10)

Without memoization, the number of function calls should be 2F(n) − 1, where F(n) is the Fibonacci number corresponding to n (with F(0) = F(1) = 1, as in the code above); for n = 10 this gives 2 × 89 − 1 = 177. With memoization, each of fib(n) down to fib(2) issues exactly two calls, for a total of 2(n − 1) + 1 = 19 calls when n = 10. You can “trace” function calls, with or without memoization, using the racket/trace module, as illustrated in the above script. By running this script in a terminal (racket fib.rkt | grep fib | wc -l), you can easily check that there are indeed just 19 calls to fib when memoization is enabled, whereas there’s a total of 177 function calls, as expected, when it is not (just replace the define/memo form with a plain define).
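If you prefer not to grep the trace output, here is a quick cross-check of the 2F(n) − 1 formula using a mutable counter; the names fib/plain and n-calls are mine, and the function simply mirrors the non-memoized fib above.

#lang racket

;; plain, non-memoized fib, with a counter incremented on every call
(define n-calls 0)

(define (fib/plain n)
  (set! n-calls (add1 n-calls))
  (if (<= n 1) 1 (+ (fib/plain (- n 1)) (fib/plain (- n 2)))))

(fib/plain 10)   ; => 89
n-calls          ; => 177, i.e. 2 × 89 − 1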

A more interesting discussion is provided on JP’s blog. Another implementation is available on Henry Brooks' website. There’s also a crazy version on Coursera (starting at 3'50). Finally, there is a nice discussion of memoization, from which most of the code above is borrowed, on the Racket blog: Dynamic Programming versus Memoization.

Regarding Python, I didn’t invest much in the joblib module (which, AFAIK, is able to cache intermediate results); instead, I relied on the lru_cache decorator from the functools module:

import functools

@functools.lru_cache(None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

Similar tracing functionalities are available in Python, e.g. using the following snippet:

import sys

def tracefunc(frame, event, arg, indent=[0]):
    # the mutable default argument keeps the indentation level across calls
    if event == "call":
        indent[0] += 2
        print("-" * indent[0] + "> call function", frame.f_code.co_name)
    elif event == "return":
        print("<" + "-" * indent[0], "exit function", frame.f_code.co_name)
        indent[0] -= 2
    return tracefunc

Then you’ll just need to add sys.settrace(tracefunc) before your function call.

Haskell is actually quite good when it comes to keeping intermediate results in memory: you just have to give the value a name. In the case of Fibonacci numbers, this means keeping a (lazy) list of intermediate results, which stays in memory as long as it is referenced and is therefore not reclaimed by the garbage collector. The Haskell wiki has a nice application of memoization, although there are other examples available on SO:

import Data.Function (fix)

-- fib in open-recursion style: the first argument stands for the recursive call
fib :: (Int -> Integer) -> Int -> Integer
fib f 0 = 1
fib f 1 = 1
fib f n = f (n - 1) + f (n - 2)

-- this function takes care of memoization per se
memoize :: (Int -> a) -> (Int -> a)
memoize f = (map f [0 ..] !!)

-- and here is the memoized fib function
fibMemo :: Int -> Integer
fibMemo = fix (memoize . fib)

Integer is an arbitrary-precision integer type in Haskell, while Int corresponds to a traditional 32- or 64-bit integer.

See Also

» Insertion sort and Python FFI
» Racket FFI and C
» Prime permutations
» Motzkin numbers
» De Bruijn graph and genome assembly