Monday, May 11, 2009

Design your Database

When we started designing the Build Farm implementation, one thing we agreed on almost immediately was that we wanted to explicitly and thoroughly design the database schema. We both had had frustrating encounters with the modern trend toward high-level Object-Relational Mapping systems, in which it seems that the typical approach is to ignore the underlying database, and spend all your energy designing the object layer.

I'm not trying to say that there is no point to designing the object layer; object modeling is quite important and necessary. But it seems as though, more and more, people don't want to talk about the database at all!
  • They don't want to talk about data types, or candidate keys, or normalization,
  • They don't want to talk about check constraints, or referential integrity, or update propagation,
  • They don't want to talk about index definitions, or access patterns, or query planning, or transaction design,
  • They don't want to talk about data lifetimes, or versioning, or auditing, or security control.
They don't seem to want to talk about the database at all, which I think is an unfortunate turn of events. Sure, modern database systems are complex and intricate, and have behaviors that take thorough and detailed study. But they are also extraordinarily powerful pieces of software, with amazing capabilities, and rich, mature functionality. They can be used well, or they can be used poorly.

In the Build Farm, we started our overall design by discussing some of the basic goals and requirements. Then, the next step we took was to talk about the data that we wanted to track. We discussed the data at a fairly abstract level, then we sat down at a keyboard and typed in a bunch of DDL: CREATE TABLE, CREATE INDEX, etc. This forced us to get specific about table names, and column names, and data types, and inter-table relationships, and so forth. Then we loaded up a little bit of sample data into those tables (by hand, using SQL Squirrel), and practiced writing some queries.

This is what people used to call "building the Data Dictionary". I don't know if anybody talks about a data dictionary anymore; I guess I'm old-fashioned.

We certainly made some mistakes when we implemented the database schema for the Build Farm. We've fixed some of those mistakes, and we'll need to fix more of them in the future.

But I firmly believe that the decision to design the database first was a major reason why the Build Farm's implementation was a success.

No comments:

Post a Comment