Monday, May 4, 2009

Build Farm Part 1: Concepts and Goals

At work, I spend a lot of time building and maintaining a tool we call the Build Farm.

The Build Farm is internally designed and developed, but it bears a lot of resemblance to tools such as Tinderbox or BuildBot, together with some aspects of tools such as CruiseControl, and even some aspects of test case management systems. I looked at several such tools when we were starting out, but our system is wholly home-grown and coded from scratch by us; this is both good and bad. It's good because we're intimately familiar with how and why it behaves as it does; it's bad because we probably could have taken an existing system and re-used it.

Anyway, the Build Farm's primary goals are:
  1. To manage and make effective use of a large (for us) pool of machines dedicated to building and testing our software. We currently have about 80 machines in the Build Farm; there are about 60 Windows machines, about 10 Linux machines, and about 10 Solaris machines. Work is scheduled both automatically and on-demand.
  2. To preserve and assist with the analysis of the results of building and testing the software. Build problems must be identified and resolved; test problems must be analyzed and investigated.
  3. To enable access to these tasks by users across the company, in all our worldwide offices, at any time 24/7.
Architecturally, the Build Farm consists of several primary pieces:
  1. Persistent storage for information about builds, tests, work requests, machines, schedules, results, etc. Our storage includes a large file server which serves both NFS (Linux, Solaris) clients and SMB (Windows) clients (via Samba), and is used to store build logs, test output, and other build-related data such as application server trace logs, test databases, etc.. Our storage also includes a relational database for structured data; we use Apache Derby.
  2. A web server which provides a simple UI for editing build configurations and for viewing build and test reports. The UI allows a variety of searching, as analyzing historical information and looking for trends and patterns is a very common use.
  3. A client program which is installed on each build machine. We try to keep the client program fairly simple, and concentrate most of the logic on the server. The client knows how to request work, how to spawn subprocesses to perform work, and how to capture output and route it to the file server for the central UI to analyse.
  4. The web server also provides a SOAP interface for use by the build machines. The most important and most frequently-issued SOAP request is: "get next job to perform".
  5. A source code control system which the build machines can use to fetch the code to build and test. We use Perforce.
I'll talk more, and in more detail, about the Build Farm in future postings. I've been thrilled with the success of this tool, and I'm quite grateful to the other members of my team for their ideas, their support, and their willingness to tolerate the ups-and-downs of an internally-developed tool like this.

1 comment: