- Accurately assign the right work to the right machines,
- While providing access to historical results for analysis.
Efficiently executing work is an often-discussed topic; you can find it covered under headings such as "Workload Management" and "Resource Management". It tends to be the province of software such as DBMSs, Application Servers, and Operating Systems. In other words, it is a really hard problem to solve well.
We did not want to write an operating system.
So, we made several simplifying assumptions:
- We can afford to over-provision our Build Farm. Machine resources are cheap enough nowadays, and our overall problem is small enough, that we can "throw hardware at the problem".
- We don't have to be perfect. We have to do a decent job of scheduling work, but we can afford to make the occasional sub-par decision, so long as in general all the necessary work is getting done in a timely fashion.
Priorities are just what you think they are: each job in the Build Farm is assigned a priority, which is a plain integer, and jobs with a higher priority are scheduled in preference to jobs with a lower priority.
Capabilities are an abstract concept which enables the Build Farm to perform match-making between waiting jobs and available machines. Jobs require certain capabilities, and machines provide certain capabilities. For a job to be scheduled on a machine, the capabilities the job requires must be a subset of the capabilities the machine provides.
So, for example, a particular job might specify that it requires the capabilities:
- JDK 1.5
- JBoss 4.3
- DB2 9.5
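The match-making check can be sketched in a few lines of Python. This is only an illustration, not the Build Farm's actual code: the function name `can_run`, the two example machines, and the extra capability "Ant 1.7" are all made up for this sketch. It also assumes that a machine providing exactly the required capabilities, and nothing more, still qualifies.

```python
def can_run(required, provided):
    """Return True if every capability the job requires is provided by the machine.

    Both arguments are sets of capability names (plain strings).
    """
    # Set comparison: `<=` is true when `required` is a subset of `provided`,
    # including the case where the two sets are equal.
    return required <= provided


# The job from the example above.
job_requires = {"JDK 1.5", "JBoss 4.3", "DB2 9.5"}

# Two hypothetical machines: one offers everything the job needs (plus more),
# the other is missing DB2.
machine_a = {"JDK 1.5", "JBoss 4.3", "DB2 9.5", "Ant 1.7"}
machine_b = {"JDK 1.5", "JBoss 4.3"}

print(can_run(job_requires, machine_a))
print(can_run(job_requires, machine_b))
```

Here the job could be scheduled on `machine_a` but not on `machine_b`. Treating capabilities as opaque strings keeps the check trivial; the scheduler never needs to know what "DB2 9.5" actually means.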
The bottom line is that we re-defined the Build Farm's core job-scheduling requirement to be:
- When a machine is available to do work, ask it to do the highest-priority job in the queue which this machine is capable of executing.
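A minimal sketch of that rule, again in Python and again under assumptions: the `Job` class, the `next_job_for` function, and the in-memory list used as the queue are hypothetical names chosen for illustration. The sketch assumes that a larger integer means a higher priority and that ties go to the job that has been waiting longest; the rule above does not say how ties are broken.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    name: str
    priority: int                       # assumed: larger number = higher priority
    requires: Set[str] = field(default_factory=set)


def next_job_for(machine_capabilities: Set[str], queue: List[Job]) -> Optional[Job]:
    """Remove and return the highest-priority queued job this machine can run.

    Returns None when no waiting job matches the machine's capabilities.
    """
    # Keep only the jobs whose required capabilities the machine provides.
    runnable = [job for job in queue if job.requires <= machine_capabilities]
    if not runnable:
        return None
    # max() returns the first maximal element, so among equal priorities
    # the job that appears earliest in the queue wins.
    best = max(runnable, key=lambda job: job.priority)
    queue.remove(best)
    return best


# Hypothetical queue and machine, for demonstration only.
queue = [
    Job("docs", 1),
    Job("db-tests", 5, {"JDK 1.5", "DB2 9.5"}),
    Job("unit-tests", 3, {"JDK 1.5"}),
]
machine = {"JDK 1.5"}

job = next_job_for(machine, queue)
print(job.name if job else "nothing to do")
```

Note that the highest-priority job overall ("db-tests") is skipped because this machine lacks DB2; it stays in the queue until a capable machine asks for work. A linear scan like this is not optimal, but it fits the simplifying assumptions above: the queue is small and the occasional sub-par decision is acceptable.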