Wednesday, November 24, 2021

Old programmers and old bugs

Why do I find old bugs so interesting? I'm not sure, but I've always really enjoyed tracking down a bug that turns out to have been present in the code for many years.

In this particular case, the code was in a program which was doing some randomized testing. We didn't need ultra-fancy cryptographically-secure randomization here, we just wanted to spread the testing out in interesting ways. So the programmer kept it simple (which is good!):

java.util.Random random = new java.util.Random();

Now we can get a random number for use in our test, whenever we want, quite simply, by just calling random.nextInt(), as in:

int randNum = random.nextInt();

For this particular test program, however, it's much more convenient to work only with non-negative numbers. Every program is different, but this particular test program didn't handle negative numbers in its other sections of the code.

So that's simple; we'll just use the absolute value of the random integer, so we change the above line very slightly:

int randNum = java.lang.Math.abs(random.nextInt());

And all was well. The test program was put into regular use, running regularly in our Continuous Integration test system, doing a variety of randomized testing after every commit. Over seven years, many trillions of invocations of this code were made.

But very occasionally (twice in the last year, which was just often enough for our automated testing tool to notice a pattern), the test failed, printing an (unfortunately) obscure message:

Row not found:-2147483648

Initially, the problem was not given much notice, lost in the cacophony of day-to-day work, but when it happened a second time, I was intrigued! Bugs which happen only once are always interesting, but bugs which happen twice, but with 9 months between them, seem to raise my hackles in strange ways.

Glancing at it, I was struck by -2147483648. As every programmer knows, this is java.lang.Integer.MIN_VALUE; that is, it is the smallest number that can be stored in a 32-bit twos-complement signed integer.

Hmmm... I said to myself. Does this mean that Math.abs() returned a negative number?

Off I went to RTFM, where I found:

Note that if the argument is equal to the value of Integer.MIN_VALUE, the most negative representable int value, the result is that same value, which is negative.

Of course! The absolute value of -2147483648 is too large to be stored in a 32 bit integer.

And, unfortunately, instead of throwing an exception at this point, the original builders of Java, nearly 30 years ago, decided to simply have Math.abs(Integer.MIN_VALUE) return Integer.MIN_VALUE.

Well, at least they documented it.

It turns out that nine months or so works out to be just enough that a one-in-four-billion chance actually occurs. And it appears that we must have increased our overall testing frequency, because until recently we weren't running enough tests to hit this condition often enough for the automated test analyzer to have two occurrences in its historical database to enable it to see a pattern worth notifying us about.

Time was on our side.

The problem was right there, hiding in plain sight! And yet, none of us saw it. Isn't that interesting?

No comments:

Post a Comment