aliquote.org

Generating partitions

January 9, 2022

In a previous post we showed how to efficiently generate the power set of a given set of elements. Other than its relation to Binomial coefficient as documented on Wikipedia, I don’t know if they are of any use on statistics. We also mentioned the case of partitions, which are more interesting from a statistical point of view since they are at the heart of partitioning methods in cluster analysis. How do we generate all partitions? Don Knuth discusses two algorithms to generate integer (algorithm P) or set (algorithm H) partitions,1 but see also The Algorithm Design Manual (or this older course by Steven Skiena).

The main idea of algorithm H to generate set partition is as follows:

  1. Set a1an00,b1bn111,and m1.
  2. Visit the restricted growth string a1an,2 which represents a partition into m+[an=m] blocks. Then go to (4) if an=m.
  3. Set anan+1 and return to (2).
  4. Set jn1; then, while aj=bj, set jj1.
  5. Terminate if j=1. Otherwise set ajaj+1.
  6. Set mbj+[aj=bj] and jj+1. Then, while j<n, set aj leftarrow0, bjm, and jj+1. Finally set an0 and go back to (2).

An Haskell solution to multiset partitions is available in Generating Multiset Partitions, by Brent Yorgey.

What about the total number of partitions? The number of partition, p(n), can be formulated using recursion. Let p1(k)=1 (1kn), then

pm(k)={pm1(k)if k>n,pm1(k)+pm(km)if km>1,

and we evaluate pm(k) for 1mn and mkn. Note that p(n)=pn(n). Berstel et al. offer an implementation in Pascal to compute p(n) as α104+β, whose literal translation in C looks like this:3 (untested code)

#include <stdlib.h>

void npartitions(int n, int *alpha, int *beta) {
  int nmax = 100;
  int base = 10000;
  int a[nmax];
  int b[nmax];
  int k, m, s;

  for (k = 0; k < n; ++k) {
    a[k] = 0;
    b[k] = 1;
  }

  for (m = 2; m < n; ++m) {
    for (k = m; k < n; ++k) {
      s = b[k] + b[k - m];
      b[k] = s % base;
      div_t q = div(s, base);
      a[k] = a[k] + a[k - m] + q.quot;
    }
  }

  *alpha = a[n];
  *beta = b[n];
}

♪ Leisure • Take You Higher


  1. D. E. Knuth. The Art of Computer Programming, Vol. 4A: Combinatorial Algorithms. Addison-Wesley, 2011. ↩︎

  2. A restricted growth string is a string a1a2an of nonnegative integers in which we have a1=0 and aj+11+max(a1,,aj) for 1j<n↩︎

  3. J. Berstel, J.-E. Pin, and M. Pocchiola. Mathématiques et Informatique : Combinatoire et Arithmétique. Ediscience International, 1991. ↩︎

See Also

» Student's t distribution » QR factorization and linear regression » Bootstrap resampling in Lisp » Fisher-Yates shuffling » Statistical software evaluation