Yesterday I came across a nice post on the (Durstenfeld-)Fisher-Yates1 and Sattolo’s algorithms. Briefly, the Fisher-Yates shuffling algorithm is a robust and efficient technique to generate a random permutation of elements in a list, with uniform probability. The Sattolo’s algorithm relies on a similar approach but produces permutation consisting of a single cycle. Donald Knuth coined the former Algorithm P (TAOCP, vol. 2), and later acknowledged the original work of Fisher and Yates. If you want to get an idea of how this works, trust Mike Bostock for providing so much enlightening visualisation.
There’s a nice implementation available on Programming Praxis:2
(define (shuffle x) (do ((v (list->vector x)) (n (length x) (- n 1))) ((zero? n) (vector->list v)) (let* ((r (randint n)) (t (vector-ref v r))) (vector-set! v r (vector-ref v (- n 1))) (vector-set! v (- n 1) t))))
Reservoir sampling3 is another efficient technique for generating a random sample of a collection of elements, although in this case the sample is a subset of the whole population, and thus is not a single permutation like before. Usually, the population size is not known in advance and cannot fit in memory, such as when presented with streams of data. It is an elegant algorithm where we store the k elements to sample from a stream of unknown size in a reservoir of size k while we add the i-th element to the reservoir with a probability of k/i by replacing a randomly selected element in the reservoir. See this post for more details.