Monday, November 30, 2009

Finding what you're looking for by looking for everything else

At work, one of my responsibilities is to maintain a complicated object cache. Caches are a great mechanism for improving the performance of searches. However, they don't help searches for non-existent items; in fact, they often make such searches slower, because the code must first search the cache and fail to find the object, then search the underlying store and fail again. In this case, the cache has added overhead and contention without adding benefit.
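To make the double-miss cost concrete, here is a minimal sketch of a read-through cache. The class name `ReadThroughCache`, its two probe counters, and the `Map` standing in for the underlying store are all assumptions made up for illustration; this is not the cache I maintain at work.

```javascript
// Hypothetical read-through cache in front of a slower backing store.
class ReadThroughCache {
  constructor(store) {
    this.store = store;      // a Map standing in for the underlying store
    this.cache = new Map();  // objects we have already fetched
    this.cacheProbes = 0;    // how many times the cache was searched
    this.storeProbes = 0;    // how many times the store was searched
  }

  get(key) {
    this.cacheProbes++;
    if (this.cache.has(key)) {
      return this.cache.get(key);
    }
    // Cache miss: fall through to the underlying store.
    this.storeProbes++;
    if (this.store.has(key)) {
      const value = this.store.get(key);
      this.cache.set(key, value);
      return value;
    }
    // The object does not exist, so nothing is cached and the next
    // search for the same key repeats both failed lookups.
    return undefined;
  }
}

const cache = new ReadThroughCache(new Map([["widget", { id: 1 }]]));
cache.get("widget"); // cache miss, store hit: now cached
cache.get("widget"); // cache hit: the store is not touched
cache.get("gadget"); // cache miss, store miss
cache.get("gadget"); // the same two failed searches again
console.log(cache.cacheProbes, cache.storeProbes); // prints 4 3
```

The existing object costs one store search in total, while the missing object costs one store search every single time, plus the wasted cache search in front of it.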

In practice, we try to minimize the situations where clients search for objects that don't exist, but such situations still arise. So, every six months or so, somebody tries to figure out a way to improve the cache so that it can cache items that don't exist. The discussion usually ends quietly once my colleague Tom points out the obvious impossibility of enumerating the infinite space of objects that don't exist.

At any rate, I was reminded (a bit) of this by this cute post on The Daily WTF, in which the programmer makes the basic mistake of trying to enumerate all the unwanted data, rather than simply specifying the valid data.

Of course, not only is this technique a poor performer, but, perhaps more importantly, it is a classic source of security bugs, since the bad guys can always think of an extension that you left off your list. So, in this case as in so many others, the simplest code is not only the easiest to understand and the best performing, but the most secure as well.
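Here is a small sketch of the difference between the two approaches. The extension lists and function names are hypothetical, chosen only for illustration; they are not taken from the Daily WTF post.

```javascript
// Hypothetical allow-list: the only extensions this sketch accepts.
const ALLOWED_EXTENSIONS = new Set(["png", "jpg", "gif"]);

// Hypothetical deny-list, shown only for contrast; it can never be complete.
const BLOCKED_EXTENSIONS = new Set(["exe", "bat", "js"]);

// Returns the lower-cased extension, or "" when there is none.
function extensionOf(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) {
    return "";
  }
  return filename.slice(dot + 1).toLowerCase();
}

// Specify the valid data: anything not on the list is rejected.
function acceptedByAllowList(filename) {
  return ALLOWED_EXTENSIONS.has(extensionOf(filename));
}

// Enumerate the unwanted data: anything not on the list is accepted.
function acceptedByDenyList(filename) {
  return !BLOCKED_EXTENSIONS.has(extensionOf(filename));
}

console.log(acceptedByAllowList("photo.PNG"));   // true
console.log(acceptedByAllowList("payload.scr")); // false
console.log(acceptedByDenyList("payload.scr"));  // true: nobody thought of .scr
```

The allow-list version is one set lookup, and a forgotten entry fails closed rather than open.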

Saturday, November 28, 2009

Coders at Work: Guy Steele

Chapter 9 of Coders at Work contains Peter Seibel's interview of Guy Steele. Steele is another in the progression of language designers that Seibel chose to include in the book: Crockford, Eich, Bloch, Armstrong, Peyton Jones, and now Steele. Although programming language design is clearly one of the most important fields within Computer Science, I do wish Seibel had balanced his choices somewhat, to include more coders from other areas: operating systems, networking, databases, graphics, etc. Still, Steele is an interesting engineer and I quite enjoyed this interview.

Having heard from Zawinski about some of the later developments in the work on Emacs and Lisp, it is interesting to hear Steele talk about some of the very early work in this area:

One of the wonderful things about MIT was that there was a lot of code sitting around that was not kept under lock and key, written by pretty smart hackers. So I read the ITS operating system. I read the implementations of TECO and of Lisp. And the first pretty printer for Lisp, written by Bill Gosper. In fact I read them as a high-school student and then proceeded to replicate some of that in my 1130 implementation.

This practice of learning how to program by reading the programs of others was widespread. It is certainly how I learned to program. Although I think that computer science education has come a long way in 30 years, I think that the technique of reading code is still a wonderful way to learn how to program. If you don't like reading code, and don't develop a great deal of comfort with reading code, then you're not going to enjoy programming.

Steele talks about the need to have a wide variety of high quality code to read:

I would not have been able to implement Lisp for an 1130 without having had access to existing implementations of Lisp on another computer. I wouldn't have known what to do. That was an important part of my education. Part of the problem we face nowadays, now that software has become valuable and most software of any size is commercial, is that we don't have a lot of examples of good code to read. The open source movement has helped to rectify that to some extent. You can go in and read the source to Linux, if you want to.


I think that the open source movement is an excellent source of code to read; in addition to just reading the code, many open source projects have communities of programmers who love to talk about the code in great detail, so if you have questions about why the code was written the way it was, open source projects are usually very willing to discuss the reasoning behind the code.

In addition to early work on Lisp, Steele was also present for the invention of the Emacs editor, one of the most famous and longest-living programs in existence:

Then came the breakthrough. The suggestion was, we have this idea of taking a character and looking it up in a table and executing TECO commands. Why don't we apply that to real-time edit mode? So that every character you can type is used as a lookup character in this table. And the default table says, printing characters are self-inserting and control characters do these things. But let's just make it programmable and see what happens. And what immediately happened was four or five different bright people around MIT had their own ideas about what to do with that.

In retrospect, a WYSIWYG text-editing program seems so obvious, but somebody had to think of it for the first time, and to hear first hand from somebody who was actually part of that process is great!
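The idea Steele describes, a programmable table consulted for every character typed, can be sketched in a few lines. This is only an illustration of the table-lookup technique; the names `makeEditor`, `makeDefaultKeymap`, and `typeKeys` are hypothetical, and nothing here reflects how TECO or Emacs actually implemented it.

```javascript
// Hypothetical editor state: just a buffer of text.
function makeEditor() {
  return { text: "" };
}

// Default table: printing characters are self-inserting, and one
// illustrative control character (backspace) deletes the last character.
function makeDefaultKeymap() {
  const keymap = new Map();
  for (let code = 32; code < 127; code++) {
    const ch = String.fromCharCode(code);
    keymap.set(ch, (editor) => { editor.text += ch; });
  }
  keymap.set("\b", (editor) => { editor.text = editor.text.slice(0, -1); });
  return keymap;
}

// Every typed character is looked up in the table and its command is run.
function typeKeys(editor, keymap, keys) {
  for (const key of keys) {
    const command = keymap.get(key);
    if (command) {
      command(editor);
    }
  }
}

const editor = makeEditor();
const keymap = makeDefaultKeymap();
typeKeys(editor, keymap, "cat\bb");
console.log(editor.text); // cab

// "Let's just make it programmable": rebind a key to a new command.
keymap.set("!", (ed) => { ed.text = ed.text.toUpperCase(); });
typeKeys(editor, keymap, "!");
console.log(editor.text); // CAB
```

Because the table is ordinary data, anybody can replace an entry, which is presumably why several people at once could start building their own editors on top of it.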

My favorite part of the Steele interview, however, was this description of programming language design, which, again, sounds simple in retrospect, but really cuts directly to the core of what programming language design is trying to achieve:

I think it's important that a language be able to capture what the programmer wants to tell the computer, to be recorded and taken into account. Now different programmers have different styles and different ideas about what they want recorded. As I've progressed through my understanding of what ought to be recorded I think that we want to say a lot more about data structures, we want to say a lot more about their invariants. The kinds of things we capture in Javadoc are the kinds of things that ought to be told to a compiler. If it's worth telling another programmer, it's worth telling the compiler, I think.

Exactly, and double-exactly! Firstly, Steele is absolutely right that most programming languages concentrate far too much on helping programmers describe control flow and not enough on helping programmers describe data structures. Most, perhaps nearly all, of the bugs and mistakes that I work on have to do with confusion about data structures, not with confusion about control flow. When reading most programs, the control flow is simple and evident, but teasing out the behavior of the data structures is often horrendous.

And, secondly, I just love the way Steele distills it:

If it's worth telling another programmer, it's worth telling the compiler, I think.


Steele is apparently working on a new programming language, Fortress. It will be interesting to see how it turns out.

Monday, November 23, 2009

Coders at Work: Peter Norvig

Chapter 8 of Peter Seibel's Coders at Work contains his interview with Peter Norvig.

Norvig is the 2nd of 3 Google employees that Seibel interviews in his book, I believe. With 15 interviews in the book, is 20% too high a ratio for a single company to have? Perhaps, but there's no disputing that Google has a fantastic array of talent, and Norvig reputedly is one of the reasons why, so I'm pleased to see him included.

At this point in his career, Norvig is much more of a manager/executive than a coder, so his observations on coding are perhaps not as immediate as those of others that Seibel speaks to, but still, Norvig has some very serious programming chops. Joel Spolsky and Jeff Atwood told a story on their podcast recently about their DevDays conference, which had the unusual format of being a short conference held multiple times, successively, in various locations around the world. At each location, one of the events that they organized was a tutorial introduction to the programming language Python. Each tutorial was taught by a different instructor, chosen from the available instructors local to the conference location, and each time, Joel and Jeff suggested that the instructor could build the course around an in-depth analysis of Norvig's spelling checker. Ten of the 12 instructors apparently took this suggestion, and, each time, the technique of analyzing a single brilliantly-written piece of software was successful.

One of the things that must be fascinating about being at Google is the scope and scale of the problems. Norvig says:

At Google I think we run up against all these types of problems. There's constantly a scaling problem. If you look at where we are today and say, we'll build something that can handle ten times more than that, in a couple years you'll have exceeded that and you have to throw it out and start all over again. But you want to at least make the right choice for the operating conditions that you've chosen -- you'll work for a billion up to ten billion web pages or something. So what does that mean in terms of how you distribute it over multiple machines? What kind of traffic are you going to have going back and forth?

Although the number of such Internet-scale applications is increasing, there still aren't many places where programmers work on problems of this size.

Another interesting section of the interview (for me) involves Norvig's thoughts about software testing:

But then he never got anywhere. He had five different blog posts and in each one he wrote a bit more and wrote lots of tests but he never got anything working because he didn't know how to solve the problem.
...
I see tests more as a way of correcting errors rather than as a way of design. This extreme approach of saying, "Well, the first thing you do is write a test that says I get the right answer at the end," and then you run it and see that it fails, and then you say, "What do I need next?" -- that doesn't seem like the right way to design something to me.
...
You look at these test suites and they have assertEqual and assertNotEqual and assertTrue and so on. And that's useful but we also want to have assertAsFastAsPossible and assert over this large database of possible queries we get results whose precision value of such and such...
...
They should write lots of tests. They should think about different conditions. And I think you want to have more complex regression tests as well as the unit tests. And think about failure modes.


During this long section of the interview, which actually spans about 5 pages, Norvig appears to be making two basic points:

  • Testing is not design. Design is design.

  • Many test suites are overly simple. Testing needs to start simple, but it needs to be pursued to the point where it is deep and powerful and sophisticated.


I think these are excellent points. I wonder what Norvig would think of Derby's test harness, which has support functions that are much more sophisticated than assertEquals and the like: assertFullResultSet, assertSameContents, assertParameterTypes, etc.
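As a rough sketch of what a deeper assertion might look like, here is one that checks average precision over a set of labeled queries, in the spirit of Norvig's "assert over this large database of possible queries" remark. The helper names `precision` and `assertMeanPrecisionAtLeast`, and the toy search function, are hypothetical; none of this is Derby's or Google's code.

```javascript
// Fraction of the returned results that are actually relevant.
function precision(returned, relevant) {
  if (returned.length === 0) {
    return 0;
  }
  const relevantSet = new Set(relevant);
  const hits = returned.filter((doc) => relevantSet.has(doc)).length;
  return hits / returned.length;
}

// Hypothetical assertion: mean precision over all labeled queries must
// reach the threshold, otherwise the test fails with a descriptive error.
function assertMeanPrecisionAtLeast(search, labeledQueries, threshold) {
  if (labeledQueries.length === 0) {
    throw new Error("no labeled queries to evaluate");
  }
  let total = 0;
  for (const { query, relevant } of labeledQueries) {
    total += precision(search(query), relevant);
  }
  const mean = total / labeledQueries.length;
  if (mean < threshold) {
    throw new Error(`mean precision ${mean} is below ${threshold}`);
  }
  return mean;
}

// Toy search function standing in for the system under test.
const toyIndex = { cats: ["a", "b"], dogs: ["c"] };
const toySearch = (query) => toyIndex[query] || [];

const labeled = [
  { query: "cats", relevant: ["a"] }, // precision 0.5
  { query: "dogs", relevant: ["c"] }, // precision 1
];
console.log(assertMeanPrecisionAtLeast(toySearch, labeled, 0.7)); // 0.75
```

A single call like this exercises the whole query set, which is closer to a regression test than to an assertEqual on one value.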

I'm quite pleased that Seibel raises the often-controversial question of Google's "puzzle" style of interviewing, and Norvig's answer is quite interesting:

I don't think it's important whether people can solve the puzzles or not. I don't like the trick puzzle questions. I think it's important to put them in a technical situation and not just chitchat and get a feeling if they're a nice guy.
...
It's more you want to get a feeling for how this person thinks and how they work together, so do they know the basic ideas? Can they say, "Well, in order to solve this, I need to know A, B, and C," and they start putting it together. And I think you can demonstrate that while still failing on a puzzle. You can say, "Well, here's how I attack this puzzle. Well, I first think about this. Then I do that. Then I do that, but geez, here's this part I don't quite understand."
...
And then you really want to have people write code on the board if you're interviewing them for a coding job. Because some people have forgotten or didn't quite know and you could see that pretty quickly.

So the puzzle question, in Norvig's view, is just a way to force the interviewer and the interviewee to talk concretely about an actual problem, rather than retreating into the abstract.

Over the years (decades), I've interviewed at many interesting software companies, including both Microsoft and Google, and I think that this approach to the interviewing process is quite sensible. Although in my experience it was actually Microsoft, not Google, where I encountered the full-on trick puzzle interview process, I can see that, as a technique, it actually works very well. And, as the interviewee, I actually appreciate getting past the small talk and getting directly to the "code on a wall" portion of the interview.

I really enjoyed the Norvig interview.

Saturday, November 21, 2009

Chromium OS and custom hardware

I've been making my way, slowly, through the Chromium OS information. One of the things that surprised me was the notion that the OS requires custom hardware. Did Google make a mistake by doing this? This seems like it would reduce their potential user base and make it harder for casual users to experiment with the operating system. However, I see that there is already a growing list of hardware vendors and viable systems, so maybe (particularly since this is Google) there won't be a problem here, and support for Chromium OS will be commonly present in mainstream hardware.

Will a single physical machine be capable of running multiple OSes? (Say, Windows 7, Ubuntu, Chromium OS, etc.) Will virtualization technologies (VirtualBox, VMware, etc.) help here?

Wednesday, November 18, 2009

Coders at Work: Joe Armstrong, Simon Peyton Jones

Chapters 6 and 7 of Coders at Work are the interviews with Joe Armstrong and Simon Peyton Jones.

I hadn't heard of Joe Armstrong before, although I knew of Erlang. Armstrong is clearly the real deal, the sort of coder I can recognize almost instantly when I'm around them. It can be hard to read an interview with such a person, because when you see their words in print, you think, ouch!

I used to read programs and think, "Why are they writing it this way; this is very complicated," and I'd just rewrite them to simplify them. It used to strike me as strange that people wrote complicated programs. I could see how to do things in a few lines and they'd written tens of lines and I'd sort of wonder why they didn't see the simple way. I got quite good at that.

I know this feeling; I have this feeling all the time; I'm very familiar with this feeling.

The danger, of course, is that what you think of as "the simple way" may be perhaps too simple, or may miss some property of the problem under study which was apparent to the original author but not to the next reader. This urge to re-write is incredibly strong, though, and while I was reading the interview with Armstrong I was instantly reminded of a saying that was prevalent two decades ago, and which I think is attributed to Michael Stonebraker: "pissing on the code". Stonebraker was describing a behavior that he observed during the research projects at UC Berkeley, during which, from time to time, one student would leave the project and another student would join the project to replace the first. Inevitably, the new student would decide that the prior student's work wasn't up to par, and would embark on an effort to re-write the code, a cycle of revision which came to be compared to the way that dogs mark their territory.

As I was reading the Armstrong interview, I couldn't really decide if he was pulling our legs or not:

If you haven't got a directory system and you have to put all the files in one directory, you have to be fairly disciplined. If you haven't got a revision control system, you have to be fairly disciplined. Given that you apply that discipline to what you're doing it doesn't seem to me to be any better to have hierarchical file systems and revision control. They don't solve the fundamental problem of solving your problem. They probably make it easier for groups of people to work together. For individuals I don't see any difference.

Is he serious? He doesn't see any difference? It's hard to believe.

His primitivistic approach seems almost boundless:

I said, "What's that for? You don't use it." He said, "I know. Reserved for future expansion." So I removed that.

I would write a specific algorithm removing all things that were not necessary for this program. Whenever I got the program, it became shorter as it became more specific.

This, too, is an emotion I know well; people who know me know that I rail against the unneeded and unnecessary in software, as I find that complexity breeds failure; it results in lower developer productivity, and in lower performance, harder to use, buggier software. There's a famous story which is retold in one of the old books describing the joint development of OS/2 by Microsoft and IBM, about how the software management at IBM was obsessed with measuring productivity by counting the lines of code, and how the Microsoft engineers kept "messing up" the schedule by re-writing bits of IBM-written code in fewer lines, thus causing the graphs to show a negative slope and alarm bells to ring.

Many parts of the Armstrong interview definitely ring true, such as the observation that programming is a skill which can be improved by practice, almost to the point of being an addiction:

The really good programmers spend a lot of time programming. I haven't seen very good programmers who don't spend a lot of time programming. If I don't program for two or three days, I need to do it.

As well as his observation on the value of describing what a program is supposed to do:

I quite like a specification. I think it's unprofessional these people who say, "What does it do? Read the code." The code shows me what it does. It doesn't show me what it's supposed to do.

However I ended up feeling about the Armstrong interview, one thing is for sure: it was not boring!

I found the Peyton Jones interview much less gripping. Again, I had heard of Haskell, the language that Peyton Jones is associated with, but I hadn't heard much about Peyton Jones himself. I'd say that Peyton Jones is not really a coder; rather, he is a professor, a teacher, a researcher:

I write some code every day. It's not actually every day, but that's my mantra. I think there's this horrible danger that people who are any good at anything get promoted or become more important until they don't get to do the thing they're any good at anymore. So one of the things I like about working here and working in research generally is that I can still work on the compiler that I've been working on since 1990.
...
How much code do I write? Some days I spend the whole day programming, actually staring at code. Other days, none. So maybe, on average, a couple hours a day, certainly.

It's like he's a researcher who wants to be a coder, but also wants to be a researcher. But I suspect that what he really wants to be is a teacher:

I have a paper about how to write a good paper or give a good research talk and one of the high-order bits is, don't describe an artifact. An artifact is an implementation of an idea. What is the idea, the reusable brain-thing that you're trying to transfer into the minds of your listeners? There's something that's useful to them. It's the business of academics, I think, to abstract reusable ideas from concrete artifacts.

I think that's a great description; I suspect that if you flip it around, it's not a bad description of coders: "to implement concrete artifacts from abstract reusable ideas".

I went into both the Armstrong and Peyton Jones interviews thinking, "Oh, I know these guys! This is the guy that did language X; perhaps I will learn something about language X, or about why this guy's life led him to invent language X." Unfortunately, neither interview did that, perhaps because those stories have been told sufficiently many times elsewhere.

I'm still interested in Erlang, and in Haskell, and hopefully someday I will find the time to study them more. But these interviews were not the springboard to that activity.

Good rant on programming language design

I found this rant interesting.

It's rather strongly worded at times (it is a rant, after all), but I think the author makes some excellent points.

I felt his pain about the appearance of:


just another language written first for the compiler and only secondarily for the programmer --- and stuck in a 70s mindset* about the relationship of that programmer to the (digital) world within which they live and work


And I appreciated his observation that:

- programming languages are for *programmers* --- not compilers and compiler-writers
- until you make the everyday, "simple" things simple, it will continue to be a dark art practiced by fewer and fewer


Is it time for a Great New Programming Language? It's been 15 years since Java and JavaScript. What language will be that next language?

Tuesday, November 17, 2009

Valid rant on Google Closure? Or premature optimization?

I found this rant on Google Closure on a SitePoint blog.

Midway through the long, detailed article, we find this:

Closure Library contains plenty of bloopers that further reveal that its authors lack extensive experience with the finer points of JavaScript.

From string.js, line 97:

// We cast to String in case an argument
// is a Function. ...
var replacement =
String(arguments[i]).replace(...);

This code converts arguments[i] to a string object using the String conversion function. This is possibly the slowest way to perform such a conversion, although it would be the most obvious to many developers coming from other languages.

Much quicker is to add an empty string ("") to the value you wish to convert:

var replacement
= (arguments[i] + "").replace(...);




Now, it's quite possible that the author of the blog entry has a point, and the one technique is faster than the other.

However, unless this is in a very performance-critical section, I think that the loss of readability is substantial. It is much easier, particularly for the casual reader, to read the first form of the code (with the String() conversion function) than it is to read the second form.

I'm only a part-time JavaScript coder, but for the foreseeable future I intend to concentrate on writing clear and legible code, and place my bets on the compiler and VM authors improving their ability to optimize my code in such a way that hacks like this become less and less necessary.
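For what it's worth, here is a small sketch comparing the two forms side by side. It says nothing about speed; it only checks what each form returns. The wrapper names `viaStringFunction` and `viaConcatenation` are hypothetical, and the `odd` object is a contrived example.

```javascript
// The readable form: says what it means.
function viaStringFunction(value) {
  return String(value);
}

// The "quicker" form from the article: relies on + with an empty string.
function viaConcatenation(value) {
  return value + "";
}

// For everyday values (including a function, as in the quoted comment),
// both forms produce the same string.
const samples = [42, null, undefined, true, function f() {}];
const sameForSamples = samples.every(
  (value) => viaStringFunction(value) === viaConcatenation(value)
);
console.log(sameForSamples); // true

// But the two are not strictly interchangeable: String() prefers an
// object's toString(), while + consults valueOf() first.
const odd = {
  valueOf() { return 1; },
  toString() { return "two"; },
};
console.log(viaStringFunction(odd)); // two
console.log(viaConcatenation(odd));  // 1
```

So the concatenation trick is not just harder to read; in unusual cases it can quietly give a different answer, which is one more reason to prefer the explicit form unless a profiler says otherwise.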

Monday, November 16, 2009

Stack Overflow

I've been trying to learn how to use Stack Overflow.

And I've been unsuccessful.

I've had a very hard time finding interesting questions being discussed. I see a lot of questions that look like they might be interesting, but aren't phrased very well. I looked today, and there was a question that read:

My application takes a long time to shut down.


And I see a lot of other questions that don't seem interesting at all: "which language is better, C# or Visual Basic."

Also, I have a hard time figuring out what I can do on the site. I started out with a reputation value of 1, which provides me with a very limited range of actions I can perform on the site.

And I'm having a terrible time learning how to use the tagging system effectively. For example, I'd like to pay attention to questions involving Java JDBC database development, but there are just dozens of tags that are relevant to this area: "java", "database", "DBMS", "SQL", "jdbc", "derby", "apache-derby", etc. Why, there are 416 pages of tags on the web site, currently!

I think that Jeff Atwood and Joel Spolsky are really super smart, and I can tell that there is, potentially, a lot of value in the Stack Overflow concept, but if I can't figure out how to use it better, I'm not sure how much more I'm going to be able to get out of it.

Is there an "Idiot's guide to getting started with Stack Overflow" somewhere?

Saturday, November 14, 2009

Coders at Work: Joshua Bloch

Chapter 5 of Coders at Work is the interview with Joshua Bloch.

I'm very familiar with Bloch's work in the Java community, and I've read his online writings as well as several of his books, so the material in his interview was pretty familiar to me.

Several things about Bloch's comments struck me:

  • Bloch talks frequently about programming-as-writing, an important topic for me:

    Another is Elements of Style, which isn't even a programming book. You should read it for two reasons: The first is that a large part of every software engineer's job is writing prose. If you can't write precise, coherent, readable specs, nobody is going to be able to use your stuff. So anything that improves your prose style is good. The second reason is that most of the ideas in that book are also applicable to programs.
    ...
    Oh, one more book: Merriam-Webster's Collegiate Dictionary, 11th Edition. Never go anywhere without it. It's not something you actually read, but as I said, when you're writing programs you need to be able to name your identifiers well. And your prose has to be good.
    ...
    The older I get, the more I realize that it isn't just about making it work; it's about producing an artifact that is readable, maintainable, and efficient.

    The Elements of Style, and a dictionary: what inspired choices! I wish that more programmers would read these books! One of the reasons I work at my blog is to keep up with the practice of writing; like any skill, it requires constant practice.

  • I also very much liked Bloch's description of test-first API development, although he doesn't really call it "test-first"; instead he talks about needing to write use cases:

    Coming up with a good set of use cases is the most important thing you can do at this stage. Once you have that, you have a benchmark against which you can measure any possible solution. It's OK if you spend a lot of time getting it reasonably close to right, because if you get it wrong, you're already dead. The rest of the process will be an exercise in futility.
    ...
    The whole idea is to stay agile at this stage, to flesh out the API just enough that you can take the use cases and code them up with this nascent API to see if it's up to the task.
    ...
    In fact, write the code that uses the API before you even flesh out the spec, because otherwise you may be wasting your time writing detailed specs for something that's fundamentally broken.
    ...
    In a sense, what I'm talking about is test-first programming and refactoring applied to APIs. How do you test an API? You write use cases to it before you've implemented it. Although I can't run them, I am doing test-first programming: I'm testing the quality of the API, when I code up the use cases to see whether the API is up to the task.

    Having such a well-known and influential person as Bloch coming out so strongly in favor of test-first development is a wonderful thing, and I think he makes the case very persuasively.
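Bloch's use-case-first approach, as described above, might look something like this in miniature. The word-counting API here (`createCounter`, `add`, `mostCommon`) is entirely hypothetical and invented for the example; the point is only the order in which the two pieces get written.

```javascript
// Step 1: the use case, coded up before the API is fleshed out.
// Writing this first is the "test" of whether the API is pleasant to use.
function topWordUseCase(createCounter, lines) {
  const counter = createCounter();
  for (const line of lines) {
    for (const word of line.split(/\s+/).filter(Boolean)) {
      counter.add(word.toLowerCase());
    }
  }
  return counter.mostCommon(1)[0];
}

// Step 2: only once the use case reads well, the smallest implementation
// that satisfies it.
function createCounter() {
  const counts = new Map();
  return {
    // Record one occurrence of a word.
    add(word) {
      counts.set(word, (counts.get(word) || 0) + 1);
    },
    // Return the n most frequent words, most frequent first.
    mostCommon(n) {
      return [...counts.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, n)
        .map(([word, count]) => ({ word, count }));
    },
  };
}

const top = topWordUseCase(createCounter, ["the cat", "The dog"]);
console.log(top.word, top.count); // the 2
```

If the use case had turned out awkward to write, that would have been the moment to change the API, before any spec or implementation existed.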



Bloch is so closely identified with Java, and so deeply involved in its development, that it's hard to imagine him ever doing anything else. Seibel is interested in this question, too:

Seibel: Do you expect that you will change your primary language again in your career or do you think you'll be doing Java until you retire?

Bloch: I don't know. I sort of turned on a dime from C to Java. I programmed in C pretty much exclusively from the time I left grad school until 1996, and then Java exclusively until now. I could certainly see some circumstance under which I would change to another programming language. But I don't know what that language would be.


My own experience mirrors Bloch's almost exactly: I programmed in C from 1987 to 1994, then briefly studied C++ from 1994 to 1997, and have been programming in Java almost exclusively for the last dozen years.

I continue to study other languages and environments, but Java is so rich, and so powerful, and I feel so effective and capable in Java, that I haven't yet found that next great language that offers enough of an advantage to take me out of Java.

Blog spam

I've been hit by blog comment spam for the first time.

I have full comment moderation turned on, so the spam just shows up in the moderation queue, and it's easy to delete.

So long as it's only 1 or 2 spam comments per day, that is.

Groan.

Wednesday, November 11, 2009

Go: a new programming language

Rob Pike and Ken Thompson have announced (with Google's help) their new language Go.

Here's a quick summary, from the Language Design FAQ:


Go is an attempt to combine the ease of programming of an interpreted, dynamically typed language with the efficiency and safety of a statically typed, compiled language. It also aims to be modern, with support for networked and multicore computing. Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. These cannot be addressed well by libraries or tools; a new language was called for.


It's interesting that they are concerned with compilation speed, as modern compilers seem extraordinarily fast to me: I can build the entire Derby system soup-to-nuts in barely 2 minutes on a 5-year-old computer, and that includes building the entire enormous test suite as well as constructing the sample database.

They seem to spend most of their time comparing Go to C and C++; perhaps it's less compelling to somebody coming from a Java background?

Regardless, it definitely looks like it's worth learning more about.

And it's interesting that they are doing this using an open-source methodology.

Learning about Maven

I'm taking an opportunity to try to learn a little bit about Maven.

I actually started by first taking an opportunity to try to learn a little bit about jUDDI. I encountered a small problem with jUDDI, which was patched within hours by one of the jUDDI developers (thanks!).

But now I need to learn how to build jUDDI, in order to experiment with the patch. And jUDDI uses Maven as its build environment.

It's somewhat interesting that this is the first project I've worked with that has used Maven for its builds, since my understanding is that Maven is increasingly popular. But until now, I hadn't actually encountered it in my own usage.

I can build almost all of jUDDI using Maven. jUDDI has about half-a-dozen different subsystems, and I can build all but one of them individually. But when I go to build the "Tomcat package", I get a Maven build error:


[INFO] Failed to resolve artifact.

GroupId: org.apache.juddi
ArtifactId: juddi-parent
Version: 3.0.0.SNAPSHOT

Reason: Unable to download the artifact from any repository

org.apache.juddi:juddi-parent:pom:3.0.0.SNAPSHOT

from the specified remote repositories:
central (http://repo1.maven.org/maven2),
apache (http://people.apache.org/repo/m2-ibiblio-rsync-repository),
maven2-repository.dev.java.net (http://download.java.net/maven/2),
maven-repository.dev.java.net (http://download.java.net/maven/1)


So far, I haven't figured this problem out, except that I know:

  • I get a similar problem on both the trunk, and on the 3.0.0-tagged branch

  • The jUDDI developers don't get this problem when they build

  • The juddi-parent subsystem is built successfully, and exists correctly in my local repository (under the .m2 folder)



My current working theory is that the build scripts are expecting that Maven will fetch this already-built object from my local repository, but for some reason Maven is not looking in the local repository, and is only willing to look in remote repositories.

I've learned about running Maven with the -X flag, and I can see that, at what seems to be the critical point, Maven deliberates about where to look for the juddi-parent object:


[DEBUG] Retrieving parent-POM:
org.apache.juddi:juddi-parent:pom:3.0.0.SNAPSHOT
for project: org.apache.juddi.bootstrap:apache-tomcat:pom:6.0.20
from the repository.
[DEBUG] Skipping disabled repository central
[DEBUG] juddi-parent: using locally installed snapshot
[DEBUG] Trying repository maven2-repository.dev.java.net


It seems like "using locally installed snapshot" should mean that it found the built object in my .m2 local repository, but then why does it proceed to start looking out on the net?

The next step, I guess, is to learn more about Maven's behavior by reading through the Maven docs, and the jUDDI pom.xml files, and trying to correlate them to the output of the -X build.

Slow going, but that's how learning occurs.

Monday, November 9, 2009

Coders at Work: Brendan Eich

Chapter 4 of Coders at Work presents the interview with Brendan Eich.

Eich is the creator of JavaScript, as well as being the first implementer of the language. He continues to lead the JavaScript language design efforts, but in addition he is still active in implementation, including notably the recent ultra-high-performance JavaScript implementations in modern Mozilla products.

Eich's description of the invention of JavaScript is well-known, but it's still good to hear it again, from his perspective:

The immediate concern at Netscape was it must look like Java. People have done Algol-like syntaxes for Lisp but I didn't have time to take a Scheme core so I ended up doing it all directly and that meant I could make the same mistakes that others make.

...

But I didn't stick to Scheme and it was because of the rushing. I had too little time to actually think through some of the consequences of things I was doing. I was economizing on the number of objects that I was going to have to implement in the browser. So I made the global object be the window object, which is a source of unknown new name bindings and makes it impossible to make static judgments about free variables. So that was regrettable. Doug Crockford and other object-capabilities devotees are upset about the unwanted source of authority you get through the global object. That's a different way of saying the same thing. JavaScript has memory-safe references so we're close to where we want to be but there are these big blunders, these loopholes.


Beyond the technical details of JavaScript's development, Eich gives a very interesting account of the various social pressures that complicated the work on the language:

It was definitely a collaborative effort and in some ways a compromise because we were working with Adobe, who had done a derivative language called ActionScript. Their version 3 was the one that was influencing the fourth-edition proposals. And that was based on Waldemar Horwat's work on the original JavaScript 2/ECMAScript fourth-edition proposals in the late 90's, which got mothballed in 2003 when Netscape mostly got laid off and the Mozilla foundation was set up.
...
At the time there was a real risk politically that Microsoft was just not going to cooperate. They came back into ECMA after being asleep and coasting. The new guy, who was from Hyderabad, was very enthusiastic and said, "Yes, we will put the CLR into IE8 and JScript.net will be our new implementation of web JavaScript." But I think his enthusiasm went upstairs and then he got told, "No, that's not what we're doing." So it led to the great revolt and splitting the committee.
...
I think there's been kind of a Stockholm syndrome with JavaScript: "Oh, it only does what it does because Microsoft stopped letting it improve, so why should we want better syntax; it's actually a virtue to go lambda-code everything". But that Stockholm syndrome aside, and Microsoft stagnating the Web aside, language design can do well to take a kernel idea or two and push them hard.

Eich's interview rushes like a whirlwind. He clearly is such an intense and active thinker, and has so much that he wants to talk about, that there just isn't enough time or space in a few small pages to contain it all.

Luckily, there are a number of places on the web where you can find recorded lectures and writings that he has done so you can learn more, and in more detail. For example, here is a recent talk he gave at Yahoo.

As I was reading through the chapter, I found myself pausing about every other page to go chase a reference to various bits of knowledge I hadn't been aware of:

  • Hindley-Milner type inference

  • The Curry-Howard correspondence

  • Valgrind, Helgrind, Chronomancer, and Replay


It must be exhausting to share an office with Eich :)

And, as a person who loves to use a debugger while studying code, I was pleased to read that Eich shares my fondness for stepping through code in the debugger:

When I did JavaScript's regular expressions I was looking at Perl 4. I did step through it in the debugger, as well as read the code. And that gave me ideas; the implementation I did was similar. In this case the recursive backtracking nature of them was a little novel, so that I had to wrap my head around. It did help to just debug simple regular expressions, just to trace the execution. I know other programmers talk about this: you should step through code, you should understand what the dynamic state of the program looks like in various quick bird's-eye views or sanity checks, and I agree with that.

Seibel: Do you do that with your own code, even when you're not tracking down a bug?

Eich: Absolutely -- just sanity checks. I have plenty of assertions, so if those botch then I'll be in the debugger for sure. But sometimes you write code and you've got some clever bookkeeping scheme or other. And you test it and it seems to work until you step through it in the debugger. Particularly if there's a bit of cleverness that only kicks in when the stars and the moon align. Then you want to use a conditional break point or even a watch point, a data break point, and then you can actually catch it in the act and check that, yes, the planets are all aligned the way they should be and maybe test that you weren't living in optimistic pony land. You can actually look in the debugger, whereas in the source you're still in pony land. So that seems important; I still do it.


"Optimistic pony land" -- what a great expression! It captures perfectly that fantasy world that all programmers are living in when they first start writing some code, before they work slowly and thoroughly through the myriad of pesky details that are inherent in specifying actions to the detail that computers require.

Well, more thoughts will have to wait for later; I'm heading back to pony land :)

Saturday, November 7, 2009

Coders at Work: Douglas Crockford

Chapter Three of Peter Seibel's Coders at Work contains his interview with Douglas Crockford.

I find the inclusion of Crockford in the book a little odd, because I don't really see him as much of a coder. You can see this in the interview: Crockford doesn't spend much time talking about code review, or source code control, or test harnesses, or debuggers, or the other sorts of things that occupy most waking seconds of most coders. When Crockford does talk about these things, he talks about his work at Basic Four, when he was using a Z80, or he talks about ideas from Multics; this is all relevant, but it's 30+ years old at this point.

I would describe Crockford as a language designer, because the work that has (rightfully) brought him attention and renown is his work in transforming the image (and reality) of JavaScript into its current position as the most important programming language in the world. So for that reason alone I think he is worth including in the book.

When I started learning about JavaScript about 10 years ago, the world was full of books talking about "DHTML" and telling you how to paste 3 ugly lines of JavaScript into your HTML form element so that when you clicked the Submit button, the JavaScript would check that your userid was not blank. Now we have Google Maps, and Yahoo Mail, and the Netflix movie browser UI, etc.: example after example of elegant, powerful, full-featured applications written in JavaScript.

Furthermore, the juxtaposition of the Crockford interview with the Eich interview (next chapter) is quite entertaining, as it has been the back-and-forth interaction between these two that has brought JavaScript to where it is. For example, in this chapter we get to hear Crockford say:

I can appreciate Brendan Eich's position there because he did some brilliant work but he rushed it and he was mismanaged and so bad stuff got out. And he's been cursed and vilified for the last dozen years about how stupid he is and how stupid the language is and none of that's true. There's actually brilliance there and he's a brilliant guy. So he's now trying to vindicate himself and prove, I'm really a smart guy and I'm going to show it off with this language that has every good feature that I've ever seen and we're going to put them all together and it's going to work.

And, next chapter, we get to hear this part of the story from Eich's point of view. So well done, Peter Seibel!

I found myself reading this chapter with an on-off switch: I kind of skimmed through the parts where Crockford discusses his pre-JavaScript life, but when he talked about JavaScript, I found it much more interesting. He describes something that has (probably) happened to every experienced programmer who picked up JavaScript (certainly, it happened to me):

I understand why people are frustrated with the language. If you try to write in JavaScript as though it is Java, it'll keep biting you. I did this. One of the first things I did in the language was to figure out how to simulate something that looked sort of like a Java class, but at the edges it didn't work anything like it. And I would always eventually get pushed up against those edges and get hurt.

Eventually I figured out I just don't need these classes at all and then the language started working for me. Instead of fighting it, I found I was being empowered by it.

The key aspect of JavaScript which takes most Java programmers a long time to get past is the difference between abstraction-based-on-classification (Java) and abstraction-based-on-prototype (JavaScript), so it is here that I found Crockford's insights most fascinating:

Part of what makes programming difficult is most of the time we're doing stuff we've never done before. If it was stuff that had been done before we'd all be re-using something else. For most of what we do, we're doing something that we haven't done before. And doing things that you haven't done before is hard. It's a lot of fun but it's difficult. Particularly if you're using a classical methodology you're having to do classification on systems that you don't fully understand. And the likelihood that you're going to get the classification wrong is high.

Seibel: By "classical" you mean using classes.

Crockford: Right. I've found it's less of a problem in the prototypal world because you focus on the instances. If you can find one instance which is sort of typical of what the problem is, you're done. And generally you don't have to refactor those. But in a classical system you can't do that -- you're always working from the abstract back to the instance. And then making hierarchy out of that is really difficult to get right. So ultimately when you understand the problem better you have to go back and refactor it. But often that can have a huge impact on the code, particularly if the code's gotten big since you figured it out. So you don't.
...
I've become a really big fan of soft objects. In JavaScript, any object is whatever you say it is. That's alarming to people who come at it from a classical perspective because without a class, then what have you got? It turns out you just have what you need, and that's really useful. Adapting your objects ... the objects that you want is much more straightforward.


I think this may be one of the most insightful and brilliant distillations of everything that's right with object-oriented programming, and everything that's wrong with object-oriented programming, that I've ever read. I've been doing object-oriented programming for 15 years, and I haven't seen such a concise, precise, and accurate critique of its essence in a long time.

I hope that Crockford continues to find an audience, and I hope that he continues to work on improving JavaScript. Although he can be a prickly and sharp-tongued fellow, he has a lot of interesting thoughts on this subject, and, particularly given the importance of the subject, I hope he continues to keep attention focused on this topic for a long time.

Friday, November 6, 2009

Google have open-sourced their JavaScript library

Another powerful and sophisticated open source JavaScript library has joined the party.

Among the things that seem interesting about Google Closure are:

  • Their approach is to introduce an explicit compilation step, so that developers can feel confident in writing well-commented maintainable source, which is then compiled by the Closure compiler down to a smaller and tighter deployable JavaScript program

  • They've gone with a templating approach. Templating approaches seem to come and go, dating back to things like ASP and JSP last century, and probably older systems before that -- I think ColdFusion was a templating system.

  • Their infrastructure runs both server-side and client-side, and supports Java as the implementation language on the server-side (and of course JavaScript as the language on the client side).



Since this library is from Google, you can be sure that it is thorough, powerful, and sophisticated, and therefore worthy of study.

I guess that means I've got something else to learn about now!

Thursday, November 5, 2009

Ant integration with application servers is too hard

I spend way too much of my time at work wrestling with Ant scripts that try to integrate with application servers.

Some application servers do a better job of this than others, but overall the current state of the art in this area is still as described in the Ant manual:



<parallel>
<wlrun ... >
<sequential>
<sleep seconds="30"/>
<junit fork="true" forkmode="once" ... >
<wlstop/>
</sequential>
</parallel>

This example represents a typical pattern for testing a server application. In one thread the server is started (the <wlrun> task). The other thread consists of a three tasks which are performed in sequence. The <sleep> task is used to give the server time to come up. Another task which is capable of validating that the server is available could be used in place of the <sleep> task. The <junit> test harness then runs, again in its own JVM. Once the tests are complete, the server is stopped (using <wlstop> in this example), allowing both threads to complete. The <parallel> task will also complete at this time and the build will then continue.


Let's stop for a bit and critique this approach:

  • First, we have the <parallel> and <sequential> tasks, which are complicated and intricate. As the Ant manual itself says,

    Anyone trying to run large Ant task sequences in parallel ... is implicitly taking on the task of identifying and fixing all concurrency bugs [in] the tasks that they run. ... Accordingly, while this task has uses, it should be considered an advanced task ...

  • Secondly, consider the ugly <sleep> call: why did we have to sleep? How long do we need to sleep? What happens if we sleep for too long, or for not long enough? As the Ant manual notes, there are sometimes ways around this, but they require assistance from the application server.

  • Lastly, what happens when something fails? How do you ensure that, having started the application server, you can reliably shut it down? What happens if you try to shut it down, but it never actually started up? And so forth.
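
As a rough illustration of the "validate that the server is available" alternative to a fixed sleep, here is a minimal Java sketch that polls a TCP port until it accepts a connection or a deadline passes. The class name, method name, host, port, and timeouts are all hypothetical. It also assumes that a successful TCP connect is a good-enough readiness signal, which is often not true for an application server that is still deploying its applications.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Ant or of any application server.
class ServerReadiness {

    // Polls host:port until a TCP connect succeeds or timeoutMillis elapses.
    // Returns true if the port accepted a connection, false otherwise.
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try (Socket socket = new Socket()) {
                // Cap each attempt at 500 ms so a hung connect cannot block indefinitely.
                socket.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException notUpYet) {
                // Refused or timed out: nothing is listening yet, so retry.
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(100); // brief pause between attempts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder host, port, and timeout.
        boolean up = waitForPort("localhost", 8080, 2000);
        System.out.println(up ? "server is accepting connections" : "gave up waiting");
    }
}
```

Unlike the sleep, this returns as soon as the port opens, and it gives a definite answer when the server never comes up, so the caller can decide whether there is anything to shut down.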



Furthermore, there are many other interactions that one needs to have with an application server beyond just starting and stopping it:

  • What's the status of this application server? Is it up or down?

  • Deploy or undeploy an application to the server. Query the current version of a deployed application; re-deploy a different version of an application, either with or without stopping and re-starting the application and/or the server

  • Find out if the server has encountered any errors; capture the diagnostic error logs from the server.

  • Adjust the configuration of the server: give it different resources, change its operating parameters, etc.

  • Install or un-install an application server from scratch.


And many more.

All of these tasks are routine jobs that I'd like to be able to reliably automate, and over the years (decades!!) I have made some progress in doing so.

But still, all those hours spent in trying to write and maintain reliable Ant automation scripts for application server integration.

Is there a better way?

Tuesday, November 3, 2009

Coders at Work: Brad Fitzpatrick

Chapter Two of the mesmerizing Coders at Work is the interview with Brad Fitzpatrick.

I hadn't heard about Brad Fitzpatrick before I read the chapter, but I was quite familiar with some of his work. I have visited LiveJournal many times, and I have read a fair amount about memcached, and I've looked at a lot of the work that Six Apart have done.

Early in the interview, Fitzpatrick has this to say about testing:

I now maintain so much code, and there's other people working with it, if there's anything halfway clever at all, I just assume that somebody else is going to not understand some invariants I have. So basically anytime I do something clever, I make sure I have a test in there to break really loudly and to tell them that they messed up. I had to force a lot of people to write tests, mostly people who were working for me. I would write tests to guard against my own code breaking, and then once they wrote code, I was like, "Are you even sure that works? Write a test. Prove it to me." At a certain point, people realize, "Holy crap, it does pay off," especially maintenance costs later.


There's a lot of hard-earned experience in that quote.

But what really resonates with me is Fitzpatrick's instinct to trust nothing until it's been proven by fire, by writing and running tests and proving that it works. That sort of question-every-assumption attitude is crucial to the construction of really solid code.

I was also struck by the way that Fitzpatrick didn't get caught in the trap of over-specializing in any one area, but rather was fascinated by all sorts of different software in the complex systems that now exist:

When I was doing stuff on LiveJournal, I was thinking about things from JavaScript to how things were interacting in the kernel. I was reading Linux kernel code about epoll and I was like, "Well, what if we have all these long TCP connections and JavaScript is polling with these open TCP connections that are going to this load balancer?"


I'm rather a generalist myself, and find myself interested in all sorts of different software, from front-ends and UIs to middleware to networking protocols to file systems and database servers. I think there's lots to learn at all these levels, and lots to explore.

I liked Fitzpatrick's very practical suggestions about how to approach a new library of code:

First step, take a virgin tarball or check out from svn, and try to get the damn thing to build. Get over that hurdle.
...
Anyway, once you have one clean working build, kill it, and just make one damn change. Change the title bar to say, "Brad says, 'Hello world.'" Change something. Even if everything's ugly, just start making changes.
...
Then send out patches along the way. I find that's the best way to start a conversation.
...
When I fix a bug in their product the first thing I do is send them a patch in the mail and just say, "What do you guys think of this?"


I think this is a great approach, to get your hands dirty with real code, and then to start discussing actual code, and actual changes, with the community. I've seen this technique be very successful in the Derby community. Dan Debrunner, in particular, is a big supporter of the "let's talk about actual code" approach. It makes things concrete and specific, and it's remarkable how effective it can be to discuss a patch.

I was quite impressed with Fitzpatrick's approach to solving problems in code:

Thinking like a scientist; changing one thing at a time. Patience and trying to understand the root cause of things. Especially when you're debugging something or designing something that's not quite working. I've seen young programmers say, "Oh shit, it doesn't work," and then rewrite it all. Stop. Try to figure out what's going on. Learn how to write things incrementally so that at each stage you could verify it.

The curse of the voodoo change has a strong hold on people. I've seen many an otherwise great engineer say: "I don't understand what's wrong, but I know that if I just make this small change here, the problem goes away. And I'm tired, and I have other things I need to do, and I'm just going to do this and move on."

It's terribly seductive and Fitzpatrick is completely right: you have to be ever vigilant against this false solution, and force yourself to really do the job right.

JUnit, tearDown, and uncaught exceptions

I know this, I've known it for a while, and still I stumble over it.

I tend to generally declare my JUnit test cases as


public void testXXX()
throws Exception
{
}


Within my test case code, I then typically don't catch any exceptions, unless I'm specifically testing that a particular expected exception is thrown. Instead, I simply allow any unexpected exceptions to be thrown out of the test case, and caught by the JUnit infrastructure, which decides that a test case which terminated with an uncaught exception is an error, and JUnit reports the uncaught exception in the output.

However, there is a gotcha, and it involves the tearDown method. If a particular test suite uses the setUp/tearDown paradigm to perform common initialization and termination functions, then it is crucial that these methods themselves be cautious with respect to exceptions.

Because if:

  • your test case terminates with an uncaught exception,
  • and then your tearDown method terminates with an uncaught exception,

then it is the tearDown exception which "wins"; i.e., it is the exception which is shown in the JUnit output.

This can completely fool you into a wild goose chase of looking at the wrong problem.

In a way, I think this is a mistake in JUnit; I wish it reported the uncaught exception from the test case in preference to the uncaught exception from tearDown, or even better I wish it reported both exceptions.

But, for now, the best thing to do is to ensure that your tearDown methods are extremely careful, and never terminate with uncaught exceptions.
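
To make the masking concrete, here is a small self-contained Java sketch. It does not use JUnit at all; it assumes the runner wraps the test body in a try/finally that calls tearDown, which matches the behavior I described above. The class and method names are hypothetical, and the "quiet" wrapper is just one possible way to keep a cleanup failure from hiding the real one.

```java
// Hypothetical stand-in for a test runner; not JUnit code.
class TearDownMasking {

    // Runs the test, then tearDown in a finally block, and returns whatever
    // exception ends up being reported (or null if nothing was thrown).
    static Throwable runBare(Runnable test, Runnable tearDown) {
        try {
            try {
                test.run();
            } finally {
                // An exception thrown here replaces one thrown by test.run().
                tearDown.run();
            }
        } catch (Throwable reported) {
            return reported;
        }
        return null;
    }

    // Defensive wrapper: logs a cleanup failure instead of letting it escape.
    static Runnable quiet(Runnable cleanup) {
        return () -> {
            try {
                cleanup.run();
            } catch (RuntimeException e) {
                System.err.println("tearDown failed: " + e);
            }
        };
    }

    public static void main(String[] args) {
        Runnable failingTest = () -> { throw new IllegalStateException("the real bug"); };
        Runnable failingTearDown = () -> { throw new NullPointerException("cleanup tripped"); };

        // Reports the NullPointerException from tearDown; the real bug is hidden.
        System.out.println(runBare(failingTest, failingTearDown));

        // Reports the IllegalStateException from the test; cleanup failure is only logged.
        System.out.println(runBare(failingTest, quiet(failingTearDown)));
    }
}
```

The trade-off is that a genuine cleanup failure no longer fails the test, so the log line is the only evidence of it.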

Monday, November 2, 2009

The little stuff

I'm taking an opportunity to try to clean up some little stuff in Derby.

Over the last few years, the Derby open bug list has grown, as the community has been doing a great job of paying attention to problems that are found, but has been concentrating on the important problems first.

This triage of problems is necessary, and it is good to fix the important problems first, but after a while the smaller problems build up, and obscure the view, and it can be hard to distinguish between the big problems and the little ones.

So I'm spending some time just trying to clean up the little stuff, looking for simple, well-understood minor issues that I can go in and resolve.

There are so many open issues (1000+) that I can't possibly make a serious dent, but if I can do my part to move some of the minor issues out of the way, hopefully it will be easier for the community to find and focus on the big problems.

I'm doing a similar process at work. I've recently taken over maintenance duties for several large and complex libraries of code, and before I add any new functionality, I'm trying to work as hard as I can at resolving the existing bugs, making the existing test suites run reliably, and so forth.

This approach resembles the "broken windows" theory, which says that if you have a crime-plagued community, a useful way to start turning things around is just to fix the simple things that can be fixed, which demonstrates that you care about the situation. The theory says that the first step toward fixing things is to care about them, and that will in turn lead to improved behaviors and improved results.

In the software engineering world, the analogy is quite reasonable, I think: fixing bugs, improving the documentation, working on the test suites, etc. are all part of fixing the broken windows, and showing that you care about the software.

Sunday, November 1, 2009

Coders at Work: Jamie Zawinski

Like lots of people, I'm currently spending my free minutes chewing through Coders at Work.

I find that I have different reactions to each of the programmers whose interviews I've read. I agree with many of the things that each one has to say, but (so far at least) I also find things that I disagree with. Regardless, each interview has been fascinating: well-conducted, well-related. I don't know if I'll have the energy and interest to comment on each chapter; we'll have to see as I progress through the book.

The first interview is with Jamie Zawinski. I was somewhat familiar with Zawinski's work; in particular, I'm fond of repeating his great off-handed observation about regular expressions: "now you have two problems."

I strongly agree with Zawinski about the virtues of making releases and shipping product. I feel his pain when he says:

They were much more interested in abstract things than solving problems. I wanted to be doing something that I could point to and say, "Look, I made this neat thing."
...
In a lot of ways the code wasn't very good because it was done very fast. But it got the job done. We shipped -- that was the bottom line.
...
A month later two million people were running software I'd written. It was unbelievable.

I've never had anywhere close to that many people running my software, but I know the thrill, and it's very real.

I also agree with Zawinski about the similarity between programming and writing, and about how well-written software reads like well-written prose:

I've always seen much more in common with writing prose than math. It feels like you're writing a story and you're trying to express a concept to a very dumb person -- the computer -- who has a limited vocabulary. You've got this concept you want to express and limited tools to express it with. What words do you use and what does your introductory and summary statement look like?
...
You can have a piece of text describing something that describes it correctly or it can describe it well, with some flair. And the same thing's true of programs. It can get the job done or it can also make sense, be put together well.

I remember once, fairly early in my career, I was talking with another engineer who had taken a look at some code I had written, and he said: "it's interesting when I read your code. I feel like you're there with me, like I can hear you talking, like you're speaking directly to me through the code."

I think that in many ways, programs are not written so much for the computer, but for the other humans that come along later and read them, and I think that one mark of truly great software is that it reads well.

I find that I disagree with Zawinski in his approach to the more nitty-gritty aspects of programming, such as design, testing, etc. In his defense, he was under immense time pressure, but still:

When I'm just writing the first version of the program, I tend to put everything in one file. And then I start seeing structure in that file. Like there's this block of things that are pretty similar. That's a thousand lines now, so why don't I move that into another file. And the API sort of builds up organically that way. The design process is definitely an ongoing thing; you never know what the design is until the program is done. So I prefer to get my feet wet as early as possible; get something on the screen so I can look at it sideways.
...
There's bound to be stuff where this would have gone faster if we'd had unit tests or smaller modules or whatever. That all sounds great in principle. Given a leisurely development pace, that's certainly the way to go. But when you're looking at, "We've got to go from zero to done in six weeks," well, I can't do that unless I cut something out. And unit tests are not critical.
...
It's a matter of priorities. Are you trying to write good software or are you trying to be done by next week? You can't do both.

I was never at Netscape, and I don't know what the time pressures and environment were like, but this sort of approach to programming would drive me up the wall. Of course, the bottom line is: they built one of the most interesting pieces of software yet written by humans, and so who am I to criticize? But I don't think I could have stood by and participated in such a project. I'm just too methodical and detail-oriented, and too much a believer in the value of testing.

It sounds like Jamie Zawinski is a fascinating programmer, and it would have been interesting to work with him, particularly at Netscape; just not for very long.

I was a bit disappointed that Seibel didn't coax Zawinski into talking more about the transition to open source. It must have been a very interesting experience to be part of the creation of the Mozilla Foundation, and the transfer of the Netscape software and knowledge to the open source community, but either Seibel didn't ask much about the experience or Zawinski didn't have much to say about it.