List processing and consing

July 29, 2019

I just finished reading The Little Schemer, by Friedman and Felleisen. Here are some basic facts about very simple yet useful data structures: lists.

First of all, this is a very nice book for people interested in Lisp, Scheme, or functional programming to a larger extent. It is very different from SICP in that it focus on basic building blocks (car, cdr, cons, eq, etc.) and use a very pragmatic approach to understanding the structure and interpretation of forms and s-expr. The penultimate goal of this book (4th ed., after the original Little Lisper) is to learn to think in a functional way. The ten commandments are worth keeping in mind for that very specific purpose:

When recurring on a list of atoms, lat, ask two questions about it: (null? lat) and else. When recurring on a number, n, ask two questions about it: (zero? n) and else. When recurring on a list of s-expr, l, ask three questions about it: (null? l), (atom? (car l)), and else.
Use cons to build lists.
When building a list, describe the first typical element, and then cons it into the natural recursion.
Always change at least one argument while recurring. When recurring on a list of atoms, lat, use (cdr lat). When recurring on a number, n, use (sub1 n). And when recurring on a list of s-expr, l, use (car l) and (cdr l) if neither (null? l) nor (atom? (car l)) are true. It must be changed to be closer to termination. The changing argument must be tested in the termination condition: when using cdr, test termination with null?, and when using sub1, test termination with zero?.
When building a value with ÷, always use 0 for the value of the terminating line, for adding 0 does not change the value of an addition. When building a value with x, always use 1 for the value of the terminating line, for multiplying by 1 does not change the value of a multiplication. When building a value with cons, always consider () for the value of the terminating line.
Simplify only after the function is correct.
Recur on the subparts that are of the same nature: on the sublists of a list; on the subexpressions of an arithmetic expression.
Use help functions to abstract from representations.
Abstract common patterns with a new function.
Build functions to collect more than one value at a time.

Rules 1 and 4 mostly resemble safe guards we would use in any programming language, and the other rules also apply even in imperative languages. However, in this post I am only interested in rule 2: using cons to build list. LISP stands for LISt Processing, right? And indeed, lists and sequences are the most useful data structures in Lisp or Scheme dialects, inasmuch as we don’t need vectors. A cons cell is just a record holding two values, much like a pair as in (1 . 2). Note that it is different from (list 1 2), which in fact is composed of two cons cells, where the first cons cell contains 1 and a pointer to the second cons cell (same for the second cons, except that it includes nil to signal the end of the list). So, to add an element in front of a list, you will just be consing it as in the following example:

(cons 1 (list 2 3 4))

The car of the above expression is just 1, and its cdr is (2 3 4). And most operations on lists will rely on these ideas: add an element upfront, or inspect the very first item and keep the remaining elements for further processing. Clojure also comes with a cons function (and a conj, which acts differently whether items are grouped in a list, vector, set or map–the so-called concrete type). Racket provides custom accessors, like first and rest, and even last or append, in addition to all the above Lisp operators. And go read a previous post of mine to see an implementation of cons in Haskell.

Finally, I should note that Rainer Joswig provides a good overview of cons and memory allocation in Lisp, and why consing may add a bit of overhead to a program. And Paul Graham discusses extensively the use of cons vs. macros in his book On Lisp (I still have to manage to find some time to read it more seriously one day!).

In Python, lists are implemented as vectors, not as single linked lists (in this case, you would probably want to try a deque). However, it is quite easy to adapt the definition of cons, car and cdr discussed in the SICP (nicely typesetted PDF here) as closures:

(define (cons x y)
   (lambda (m) (m x y)))

(define (car z)
  (z (lambda (p q) p)))

(define (cdr z)
  (z (lambda (p q) q)))

See what’s going on? In this case, the definition of cons implies a closure holding two values (x and y). Whenever you call car or cdr with a “cons”, you are actually calling a function that selects either the first or the second member of the pair (p q). Translation to Python is straightforward, even if this does not please some PEP conventions (flake8 will report an E731 on the following piece of code):

def cons(x, y):
    return lambda m: m(x, y)

def car(z):
    return z(lambda p, q: p)

def cdr(z):
    return z(lambda p, q: q)

aliquote.org

List processing and consing

See Also