Wednesday, August 12, 2009

Time-based partitioning

Here's a nice short article from Josh Berkus on partitioning your data by time.

We do a lot of this at work, and spend a lot of time discussing how to do it well.

I think Josh hits all the major points:
  • Choose your partition size to work nicely with your retention period, so that reclaiming data becomes simply dropping tables.
  • Avoid having too many tables, because query processing will slow down dramatically.
  • Beware of the overhead and complexity of dynamically creating and dropping tables. It is expensive, it is low-concurrency, it requires high privileges in the database, and it requires that you have a good scheme for naming the tables to avoid confusion.
  • Be alert for opportunities to have time-varying detail, since older data generally can be retained with less instant-by-instant detail, but can instead be aggregated into larger time units. Of course, this aggregation is expensive and complicated, too.
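To make the first and third points concrete, here is a sketch of how monthly partitions might look in PostgreSQL's inheritance-based partitioning (the table and column names are hypothetical, and the details vary by database):

```sql
-- Parent table holds no rows itself; the monthly children carry the data.
CREATE TABLE measurements (
    ts    TIMESTAMPTZ NOT NULL,
    value DOUBLE PRECISION
);

-- One child per month, named so the covered range is obvious.
-- The CHECK constraint lets the planner skip irrelevant partitions
-- when constraint exclusion is enabled.
CREATE TABLE measurements_2009_08 (
    CHECK (ts >= '2009-08-01' AND ts < '2009-09-01')
) INHERITS (measurements);

-- Reclaiming an expired month is a single cheap DROP,
-- not a slow, high-concurrency-impact bulk DELETE:
DROP TABLE measurements_2009_05;
```

A predictable naming scheme like `measurements_YYYY_MM` is what makes the create/drop automation manageable; the scheduled job that creates next month's partition and drops the expired one still needs DDL privileges, which is part of the overhead Josh warns about.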
An interesting point that Josh doesn't discuss much is how to design the primary and alternate keys (and their corresponding indexes) for the data. It is tempting to make the timestamp field the primary key, and to use it for a so-called "clustered" index on those database systems which support one. However:
  • You need a way to ensure that your timestamp is unique, which may not match your application semantics depending on what sort of data you have, and
  • Not all database implementations recognize the special case of ever-increasing primary key insertions. If yours doesn't, every insert lands on the rightmost leaf page of the clustered index; when that page fills it is split 50/50, and the left half never receives another row. The result is a well-known physical storage problem: nearly every leaf page ends up about half full, and your index is roughly twice as big as it should be. (Implementations that do recognize the pattern simply start a fresh page at the right edge instead of splitting.)
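One common way around the uniqueness problem is a composite key: keep the timestamp first, so the index stays in time order, and add a column that breaks ties. A hypothetical sketch, again in PostgreSQL syntax:

```sql
-- Hypothetical event log: the timestamp leads the key, and a
-- sequence-assigned column guarantees uniqueness when two rows
-- share the same timestamp.
CREATE TABLE events (
    ts      TIMESTAMPTZ NOT NULL,
    seq     BIGSERIAL   NOT NULL,  -- tie-breaker for identical timestamps
    payload TEXT,
    CONSTRAINT pk_events PRIMARY KEY (ts, seq)
);
```

Note that this solves uniqueness but not the page-split behavior: the key is still ever-increasing, so the half-full-page caveat above still applies on databases that don't special-case it.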
All in all, a great summary by Josh, who is a blogger worth reading in the database world (he's been involved in Postgres for many years).
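The time-varying-detail point in the list above can be sketched as a roll-up: once a partition ages out of the detailed retention window, summarize it into coarser buckets before dropping it (hypothetical table names, PostgreSQL syntax):

```sql
-- Roll per-event detail up into hourly aggregates...
INSERT INTO measurements_hourly (hour, n_samples, avg_value)
SELECT date_trunc('hour', ts), count(*), avg(value)
FROM   measurements_2009_05
GROUP  BY date_trunc('hour', ts);

-- ...then reclaim the detailed data with a cheap drop.
DROP TABLE measurements_2009_05;
```

This is where the expense and complexity Josh mentions shows up: the aggregation scan is a heavy read of the whole partition, and the job has to be sequenced carefully so you never drop a partition that hasn't been rolled up yet.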

1 comment:

  1. clock_gettime(CLOCK_REALTIME) returns timestamps with nanosecond precision. I store them as:

    seconds BIGINT,   -- whole seconds since the Unix epoch
    nanos   INTEGER,  -- nanoseconds within the second (0 to 999999999)
    CONSTRAINT pk_myTable PRIMARY KEY (seconds, nanos)

    I would be very surprised to ever encounter a collision under this scheme. It would be as likely as having a UUID collision!