Monday, March 26, 2012

Cliff Click's deep dive into Constant Propagation techniques

Cliff Click has published the third episode of his description of lattice-building techniques for constant propagation in optimizing compilers:

  1. Too Much Theory
  2. Too Much Theory, Part 2
  3. Too Much Theory, Part 3

The series starts off with a promise:

I’ve been wrestling with some tricky bugs and I’d thought I’d share the pain. For fun, I’m going to toss in evidence that All Your Existing Optimizing Compilers Are Broken, and least they are if they attempt a data-flow style Constant Propagation on a language with a class hierarchy and a distinguished NULL pointer (e.g. Java, C++).

The meat of the series is the explanation of the general approach to constant propagation, using a lattice data structure to describe various important classes of partially-known values.

The series closes by finally getting to the originally-promised bug:

The Constant Propagation algorithm relies on having a lattice – and lattices are defined as having a unique lower-bound (and unique upper-bound) for any two elements. This structure does NOT have a unique lower-bound, and so is no longer a lattice! For some particular Java program and some particularly bad luck with our CP algorithm – the CP algorithm can hang indefinitely, or miss some constants on some runs – but find those constants on other runs for the same Java program. Garbage-In Garbage-Out. CP is technically wedged.

It's heavy going, but Click does his typical superb job of description and explanation, and if you've spent much time studying compilers, you'll find this well worth your time. The articles are laden with references to background information, so if you find you want to learn more about Partial Evaluation, or Sparse Conditional Constant Propagation, or Constant Folding, you'll have plenty to read and pursue.

By the way, it appears that Click has left Azul Systems, and has founded a new startup: 0xdata. Big data systems are all the rage nowadays, and 0xdata seems to have all the right buzzwords:

H20 brings database like interactiveness to Hadoop. It can be installed as a standalone or on top of existing Hadoop installation. It leverages data in HDFS and other familiar components in the Hadoop ecosystem such as Hive and Pig.
Click's high end performance bona fides are as robust as they get, so it should be interesting to follow the 0xdata work...

No comments:

Post a Comment