Thursday, December 24, 2009

Coders at Work: Donald Knuth

At last I've reached the end of Peter Seibel's Coders at Work; chapter 15 contains his interview with Donald Knuth.

Knuth has always been more of an educator and an author than a coder, but he has some serious coding chops as well, having designed and implemented TeX and METAFONT as well as inventing the concept of literate programming in order to do so. Of course, he's best known for The Art of Computer Programming, but though I have those books on my shelf, it's interesting to me that my favorite Knuth book is actually Concrete Mathematics.

Seibel does his best to stick to his standard formula when interviewing Knuth: When did you learn to program, what was the hardest bug you ever fixed, how do you go about finding good programmers, how do you read code, etc. But I also enjoyed the parts of the interview where they stray from those topics into other areas.

For example, Knuth defends Literate Programming against one of its most common criticisms, that it's too wordy and ends up being redundant and repetitive:

The first rule of writing is to understand your audience -- the better you know your reader the better you can write. The second rule, for technical writing, is to say everything twice in complementary ways so that the person who's reading it has a chance to put the ideas into his or her brain in ways that reinforce each other.

So in technical writing usually there's redundancy. Things are said both formally and informally. Or you give a definition and then you say, "Therefore, such and such is true," which you can only understand if you've understood the definition.
...
So literate programming is based on this idea that the best way to communicate is to say things both informally and formally that are related.

I enjoy reading literate programming; I enjoy writing programs and their associated specifications/documentation/comments in as literate a fashion as I can accomplish. I think that Knuth's defense of literate programming holds water.

Another part of the interview that I found fascinating was this spirited attack on reusability, whether it comes from reusable subroutine libraries, object-oriented frameworks, or whatever:

People have this strange idea that we want to write our programs as worlds unto themselves so that everybody else can just set up a few parameters and our program will do it for them. So there'll be a few programmers in the world who write the libraries, and then there are people who write the user manuals for these libraries, and then there are people who apply these libraries and that's it.

The problem is that coding isn't fun if all you can do is call things out of a library, if you can't write the library yourself. If the job of coding is just to be finding the right combination of parameters, that does fairly obvious things, then who'd want to go into that as a career?

There's this overemphasis on reusable software where you never get to open up the box and see what's inside the box. It's nice to have these black boxes but, almost always, if you can look inside the box you can improve it and make it work better once you know what's inside the box. Instead people make these closed wrappers around everything and present the closure to the programmers of the world, and the programmers of the world aren't allowed to diddle with that. All they're able to do is assemble the parts.

I think this is Knuth-the-educator speaking. He doesn't want to see Computer Science degenerate into some sort of clerical and monotonous assembly task; he wants each successive generation of programmers to be standing on the shoulders of the ones before them, understanding what they did and why, and inventing the next version of programs.

Knuth returns to this topic later in the interview; it's clearly of tremendous importance to him:

[T]here's the change that I'm really worried about: that the way a lot of programming goes today isn't any fun because it's just plugging in magic incantations -- combine somebody else's software and start it up. It doesn't have much creativity. I'm worried that it's becoming too boring because you don't have a chance to do anything much new. Your kick comes out of seeing fun results coming out of the machine, but not the kind of kick that I always got by creating something new.


As an educator, Knuth realizes that this is an extremely challenging task, because students of computer science need to start at the beginning and learn the basics, not just assume the presence of vast libraries of existing code and go from there:

[M]y take on it is this: take a scientist in any field. The scientist gets older and says, "Oh, yes, some of the things that I've been doing have a really great payoff and other things, I'm not using anymore. I'm not going to have my students waste time on the stuff that doesn't make giant steps. I'm not going to talk about low-level stuff at all. These theoretical concepts are really so powerful -- that's the whole story. Forget about how I got to this point."

I think that's a fundamental error made by scientists in every field. They don't realize that when you're learning something you've got to see something at all levels. You've got to see the floor before you build the ceiling. That all goes into the brain and gets shoved down to the point where the older people forget that they needed it.


As I've said many times, I think that there is great potential for Open Source in education, for it provides a large body of existing software that is available for study, critique, and improvement.

As I've come to the end of the book, I can't close without including the most startling paragraph in the entire book, the one which must have made Seibel's jaw, and the jaw of every reader, drop to the ground with a thundering "thwack", as Knuth singles out for celebration and praise the single most abhorred and condemned feature that Computer Science has produced in its first half-century of existence:

To me one of the most important revolutions in programming languages was the use of pointers in the C language. When you have nontrivial data structures, you often need one part of the structure to point to another part, and people played around with different ways to put that into a higher-level language. Tony Hoare, for example, had a pretty nice clean system but the thing that the C language added -- which at first I thought was a big mistake and then it turned out I loved it -- was that when x is a pointer and then you say, x + 1, that doesn't mean one more byte after x but it means one more node after x, depending on what x points to: if it points to a big node, x + 1 jumps by a large amount; if x points to a small thing, x + 1 just moves a little. That, to me, is one of the most amazing improvements in notation.

And with that, Knuth joins Joel Spolsky and doubles the number of people on the planet who celebrate the C pointer feature.

I really enjoyed Coders at Work, as you can tell by the depth to which I worked through it. In the end, it probably wasn't worth this much time, but I certainly found lots of food for thought in every chapter. If you're at all interested in coding, and in the people who do and enjoy it, you'll probably find this book interesting, too.

2 comments:

  1. Thank you for this post.

    Knuth makes a good point with libraries becoming worlds unto themselves to be used via incantations.

  2. Are you kidding? Pointers are one of the main reasons why people use C. They are very versatile and powerful.