Folding in Parallel: The input sequence can be arbitrarily partitioned (and recursively sub-partitioned, if desired) among workers, which would run in parallel with no races, dependencies or even memory bank conflicts. Such embarrassing parallelism is ideal for multi-core, GPU or distributed processing.