Wednesday, July 21, 2010

Unix file locking does not implement fair scheduling

I've been whiling away an hour or two thinking about Unix file locks, and fair scheduling, and from what I can tell, Unix file locking does not implement fair scheduling. Below are some more words to explain what that means, in more detail.

Unix file locking is a fairly common set of APIs on most Unix and Unix-like systems to support what are called advisory locks:

UNIX does not automatically lock open files and programs. There are different kinds of file locking mechanisms available in different flavours of UNIX and many operating systems support more than one kind for compatibility. The two most common mechanisms are fcntl(2) and flock(2). Although some types of locks can be configured to be mandatory, file locks under UNIX are by default advisory. This means that cooperating processes may use locks to coordinate access to a file among themselves, but programs are also free to ignore locks and access the file in any way they choose to.

Unix file locking is available on a very broad set of Unix-like operating systems, including Solaris, Linux, FreeBSD, HPUX, etc.

Microsoft Windows offers a fairly similar set of APIs called LockFileEx and UnlockFileEx.

So, most operating systems of interest have these file locking APIs.

Fair Scheduling of lock requests is a term introduced 35 years ago (at least) by that titan of system programming, Jim Gray. In section 4.1 of the ultra-classic paper Granularity of Locks in a Shared Data Base (co-authored by Franco Putzolu and Ray Lorie), Gray writes:

The set of all requests for a particular resource are kept in a queue sorted by some fair scheduler. By "fair" we mean that no particular transaction will be delayed indefinitely. First-in first-out is the simplest fair scheduler and we adopt such a scheduler for this discussion modulo deadlock preemption decisions.

Let's try to illustrate fair scheduling with a short example:

Suppose that I have a set of processes (A, B, C, etc.) who are trying to co-operate by locking and unlocking a common file named file. The processes may lock the file in either shared (S) or exclusive (X) mode.

  1. First, A locks file in shared mode.

  2. Second, B locks file in shared mode. These first two locks are immediately granted, as they conflict with nothing.

  3. Third, C locks file in exclusive mode. This lock cannot be immediately granted, so C waits.

  4. Fourth, A unlocks file. At this point, B still holds file (S), and so C still waits.

  5. Fifth, D locks file in shared mode. Here, we have a scheduling decision.

On the one hand, D's lock request can be immediately granted, because it is compatible with all currently-granted locks. This scheduling regimen is sometimes called greedy scheduling, because it attempts to grant locks whenever possible. On the other hand, if D's lock request is immediately granted, then it has been given "unfair" access to the resource, because it made its request after C, and yet is being granted access before C. If, at this point, the locking implementation forces D to wait for its S lock until C has first been granted, then released, its X lock, then the lock manager is practicing fair scheduling, because it is ensuring that C is not delayed indefinitely. (The delay might be indefinite because S lock requests might keep coming along from various processes and if the arrival rate of the new lock requests is such that there is never a period that all the S lock requests are released, then the X lock request may never be granted.)

So, now it should be clear what I mean when I frame the question:

Does Unix file locking implement fair scheduling?

To try to answer the question, I did a simple experiment: I wrote a short C program (actually, one of my colleagues at my day job wrote it for me) which has a simple command-line interface and allows me to interactively lock and unlock files in shared and exclusive mode, and I fired up about half-a-dozen terminal windows, and I started experimenting.

Sure enough, on the systems that I tried:

  • Linux 2.6 (Ubuntu)

  • MacOS X 10.6 (Darwin)

  • FreeBSD 8.0

None of the systems appeared to offer fair scheduling. Late-arriving S lock requests were granted in preference to waiting X lock requests. In fact, on some of the systems, when multiple X lock requests were waiting, and all the S lock requests were released, the X request that was then granted was not always the first X that had made its request! So not only did I observe lock starvation by late-arriving S lock requests, I also observed lock starvation by late-arriving X lock requests.

Interestingly, there was one system that did implement fair scheduling:

  • Solaris 10

So it appears to be a bit of a mixed bag: some Unix-like systems implement fair scheduling, some do not, and, furthermore, for those that do not, some appear to implement FIFO scheduling of blocked requests, where others will grant an (apparently) arbitrary request from the pool of blocked requests when the last blocking lock is released.

You may be wondering why this matters?

Well, lock scheduling and, specifically, the fairness or unfairness of lock scheduling, is known to be quite closely related to a number of thorny and very annoying performance problems, including starvation and convoying.

Starvation is fairly easy to conceptualize: it is a situation where one request is not making any process because it is being starved of its resources. I tried to illustrate starvation in my example above, where the exclusive lock requester can wait an unexpectedly long time for a lock request to be granted, because the system is unfairly allowing shared lock requests to be granted in preference to it.

Lock convoys are a more complicated problem, and have to do with situations in which the system is fairly busy, and so there is a fair amount of contention for the locks, and additionally the processes are contending for CPU resources and are being time-sliced. In such a case, it is unfortunately quite common for the lock scheduling algorithms to interact very poorly with the time-slicing and context-switching algorithms, leading to a poorly-performing system which is called the "convoy" or "boxcar" effect. I'm not going to have time (nor do I have the brainpower) to describe it very well myself, so let me just point you to a bunch of well-written descriptions:

  • Joe Duffy's Weblog: "This wait, unfortunately, causes a domino effect from which the system will never recover."

  • Raymond Chen: "It's not how long you hold the lock, it's how often you ask for it."

  • Sue Loh: " If all threads at that priority level are contending over the object, what you see is very frequent context switching between them all."

  • Wikipedia: "The overhead of repeated context switches and underutilization of scheduling quanta degrade overall performance."

  • Larry Osterman: "You spend all your valuable CPU time executing context switches between the various threads and none of the CPU time is spent actually processing work items."

  • Rick Vicik: "when there are many threads, each one runs only briefly before waiting for a lock and by the time it runs again, its cache-state has been wiped out by the others.

  • Fred on Programming: "You can get this easily if you wake up a large number of threads with the same priority and have them all immediately try to get another lock."

Well, anyway, there you go: a short essay about Unix file locking, fair scheduling, and lots of pointers to fascinating discussions about lock convoys and alternate scheduling algorithms. What more could you want for a summer afternoon?!

1 comment:

  1. I found your article interesting. Alas, there are different IO schedulers in Linux and you didn't indicate which one was turned on for the filesystem you checked.

    An update that explores the different schedulers would be a welcome addition.