Monday, August 31, 2009

Benchmarking Derby GROUP BY

I've been building a simple GROUP BY benchmark for Derby.

I've been prototyping a new GROUP BY implementation (DERBY-3002) that adds support for the new ROLLUP keyword. As part of this work, I need data on how the current Derby GROUP BY implementation performs relative to the proposed one.
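For readers who haven't met ROLLUP before: in standard SQL, GROUP BY ROLLUP(a, b) returns the usual (a, b) groups plus a subtotal row per value of a and one grand-total row. Here is a minimal in-memory Java sketch of that behavior. It only illustrates the semantics; it says nothing about how the DERBY-3002 prototype works, and the class, column names, and data are all made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RollupSketch {
    /** One input row: two grouping columns and a value to sum (hypothetical shape). */
    static final class Row {
        final String region;
        final String city;
        final int sales;

        Row(String region, String city, int sales) {
            this.region = region;
            this.city = city;
            this.sales = sales;
        }
    }

    /**
     * Emulates SELECT region, city, SUM(sales) ... GROUP BY ROLLUP(region, city).
     * The string "NULL" stands in for the SQL NULL that marks a rolled-up column.
     * Simplification: for empty input this returns no rows at all.
     */
    static Map<String, Integer> rollup(List<Row> rows) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Row r : rows) {
            // Each row contributes to one group per rollup level.
            totals.merge(r.region + "," + r.city, r.sales, Integer::sum); // (region, city)
            totals.merge(r.region + ",NULL", r.sales, Integer::sum);      // (region)
            totals.merge("NULL,NULL", r.sales, Integer::sum);             // grand total
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("East", "Boston", 10),
            new Row("East", "Boston", 5),
            new Row("East", "NYC", 20),
            new Row("West", "Seattle", 7));
        rollup(rows).forEach((key, sum) -> System.out.println(key + " -> " + sum));
    }
}
```

A plain GROUP BY would produce only the first of the three levels, which is why the cost of the extra levels is worth measuring.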

To get that performance data, I've been working on a simple GROUP BY benchmark.

Happily, Derby already has a quite sophisticated benchmarking infrastructure:
  • Two existing classes provide support for loading a Wisconsin benchmark schema scaled to an arbitrary size.
  • The perf.clients package provides general benchmarking capability, with generic classes to manage the overall benchmark.
  • A somewhat similar benchmark was written not too long ago to measure index join performance.
So, starting with that infrastructure, I've been writing a GROUP BY benchmark (DERBY-4363). My first implementation demonstrated that I could run GROUP BY statements in this simple harness. The next version needs to provide a richer set of statements, along with command-line arguments to pick the specific statement to run.
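To give a feel for what "pick the statement from the command line" could look like, here is a rough standalone sketch in plain JDBC. It does not use the perf.clients classes and is not the DERBY-4363 code. The class name, the statement names, and the choice of queries are all hypothetical, and I'm assuming the usual Wisconsin table and column names (tenktup1, unique1, two, four, ten).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupByStatementPicker {
    /** Hypothetical statement names mapped to the SQL they run. */
    static final Map<String, String> STATEMENTS = new LinkedHashMap<>();
    static {
        STATEMENTS.put("count-by-ten",
            "SELECT ten, COUNT(*) FROM tenktup1 GROUP BY ten");
        STATEMENTS.put("sum-by-two-four",
            "SELECT two, four, SUM(unique1) FROM tenktup1 GROUP BY two, four");
    }

    /** Looks up a statement by name, failing loudly on an unknown name. */
    static String pick(String name) {
        String sql = STATEMENTS.get(name);
        if (sql == null) {
            throw new IllegalArgumentException(
                "Unknown statement '" + name + "'; choose one of " + STATEMENTS.keySet());
        }
        return sql;
    }

    /** Runs the query the given number of times (must be > 0); returns mean nanoseconds per run. */
    static long timeQueryNanos(Connection conn, String sql, int iterations) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Drain the result set so the whole GROUP BY is evaluated.
                    }
                }
            }
            return (System.nanoTime() - start) / iterations;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 2) {
            System.err.println("usage: GroupByStatementPicker <statement-name> <jdbc-url>");
            System.err.println("statements: " + STATEMENTS.keySet());
            return;
        }
        String sql = pick(args[0]);
        // The JDBC URL is supplied by the caller; the database must already be loaded.
        try (Connection conn = DriverManager.getConnection(args[1])) {
            System.out.println(args[0] + ": " + timeQueryNanos(conn, sql, 10) + " ns/run");
        }
    }
}
```

Keeping the statements in a name-to-SQL map means adding a new query is a one-line change, and the same name can be run against both trunk and the patched build.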

Once I have a reasonable benchmark, which I hope will be this week, I'll use it to collect a set of performance numbers against the current Derby trunk and against the DERBY-3002 patch.

This will hopefully give us some hard data regarding the performance of the new GROUP BY algorithm.
