We do a lot of this at work, and spend a lot of time discussing how to do it well.
I think Josh hits all the major points:
- Choose your partition size to work nicely with your retention period, so that reclaiming data becomes simply dropping tables (see the partitioning sketch after this list).
- Avoid having too many tables, because query processing will slow down dramatically.
- Beware of the overhead and complexity of dynamically creating and dropping tables. It is expensive, it is low-concurrency, it requires high privileges in the database, and it requires that you have a good scheme for naming the tables to avoid confusion.
- Be alert for opportunities to have time-varying detail: older data can generally be kept with less instant-by-instant detail and instead be aggregated into larger time units (see the rollup sketch after this list). Of course, this aggregation is expensive and complicated, too.
- You need to have a way to ensure that your timestamp is unique, which may not match your application semantics depending on what sort of data you have, and
- Not all database implementations recognize the special case of "ever-increasing primary key insertions"; if your particular database doesn't, each full leaf page gets split down the middle even though new rows only ever land on the rightmost page, so the left halves never fill back up. The result is a well-known physical storage problem where each leaf page in your clustered index sits roughly half full, meaning that your index, and with a clustered index your table, is twice as big as it should be.
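To make the partitioning point concrete, here is a minimal sketch using PostgreSQL-style declarative range partitioning (an assumption on my part; the same idea works with manually managed per-period tables, and the table and column names are hypothetical). Once partitions line up with the retention period, reclaiming old data is a cheap DROP rather than a giant DELETE:

    -- Hypothetical event table, partitioned by month (PostgreSQL syntax assumed).
    CREATE TABLE events (
        event_time  TIMESTAMPTZ NOT NULL,
        payload     TEXT
    ) PARTITION BY RANGE (event_time);

    -- One partition per month; a 12-month retention period means roughly 12 of these.
    CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- Expiring the oldest month is then a quick metadata-only operation:
    DROP TABLE events_2024_01;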
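And for the time-varying detail point, a sketch of the kind of rollup involved, again in PostgreSQL-flavored SQL with hypothetical table names and an assumed 30-day cutoff: per-event rows older than the cutoff are summarized into hourly buckets before their detailed partitions are dropped:

    -- Hypothetical hourly rollup of old per-event detail (names and cutoff assumed).
    INSERT INTO events_hourly (bucket, event_count)
    SELECT date_trunc('hour', event_time) AS bucket,
           count(*)                       AS event_count
    FROM events
    WHERE event_time < now() - interval '30 days'
    GROUP BY 1;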
clock_gettime(CLOCK_REALTIME) returns timestamps with nanosecond precision. I store them as:
    seconds DATE,
    nanos LONG,
    CONSTRAINT pk_myTable PRIMARY KEY (seconds, nanos)
I would be very surprised to ever encounter a collision under this scheme; for a single writer whose clock actually advances between inserts, it's about as unlikely as a UUID collision.
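For what it's worth, here is a runnable variant of that scheme; the column types and the payload column are my assumptions (BIGINT epoch seconds plus BIGINT nanoseconds, straight from the tv_sec and tv_nsec fields that clock_gettime fills in):

    -- Sketch of the two-column timestamp key (types and payload column are assumptions).
    CREATE TABLE myTable (
        seconds  BIGINT NOT NULL,   -- struct timespec tv_sec
        nanos    BIGINT NOT NULL,   -- struct timespec tv_nsec, 0..999999999
        payload  VARCHAR(255),      -- hypothetical data column
        CONSTRAINT pk_myTable PRIMARY KEY (seconds, nanos)
    );

Because the (seconds, nanos) pairs from a single writer are ever-increasing, this is exactly the insertion pattern the last bullet above warns about, so it's worth checking how your database splits leaf pages under it.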