Thursday, December 24, 2009

JetBrains TeamCity

JetBrains have released the latest version of their TeamCity build-and-test automation software.

This is starting to become a crowded field, with open-source offerings such as Hudson, BuildBot, Continuum, CruiseControl, etc., and commercial systems such as Atlassian's Bamboo. TeamCity has been around for a while; its offering is both low-cost and powerful, and the JetBrains team have a reputation as providers of solid tools.

Among the things I like about the latest TeamCity software are the integration with Amazon's EC2 cloud computing infrastructure, since it makes no sense to build your own machine farm in this day and age, and the Java API for extensibility, since no system like this is ever deployed without a certain amount of customization.

Among the things I don't like about TeamCity are their lack of built-in Git support, and their "pre-tested commit" feature.

The whole pre-tested commit thing is a bit of philosophy. It's extremely seductive, and I can completely understand why people think they want it. But you don't want it, really. Laura Wingerd does a great job of explaining why in her book on Perforce, which I don't happen to have handy or I'd give you the quote verbatim.

But the nub of it is that you want to remove barriers to commit, not add them. Your overall philosophy should be:

  • Make building and testing easy

  • Make commit easy and encourage it

  • Depend on your version control system and your build tracking system to help you monitor and understand test failures and regressions, not to try to prevent them.

There are enough barriers to commit as it is; any continuous integration system should have as its first and most important requirement: make it easy to Check In Early, Check In Often.

At work, we're still fairly happy with our home-grown continuous integration system (the Build Farm), but I can see the writing on the wall: between turnkey continuous integration systems, and cloud computing, the build-and-test experience of the typical commercial software product development staff in 2010 is going to be far different than it was 5-10 years ago.

Coders at Work: Donald Knuth

At last I've reached the end of Peter Seibel's Coders at Work; chapter 15 contains his interview with Donald Knuth.

Knuth has always been more of an educator and an author than a coder, but he has some serious coding chops as well, having designed and implemented TeX and METAFONT, as well as inventing the concept of literate programming in order to do so. Of course, he's best known for The Art of Computer Programming, but though I have those books on my shelf, it's interesting to me that my favorite Knuth book is actually Concrete Mathematics.

Seibel does his best to stick to his standard formula when interviewing Knuth: When did you learn to program, what was the hardest bug you ever fixed, how do you go about finding good programmers, how do you read code, etc. But I also enjoyed the parts of the interview where they stray from those topics into other areas.

For example, Knuth defends Literate Programming against one of its most common criticisms, that it's too wordy and ends up being redundant and repetitive:

The first rule of writing is to understand your audience -- the better you know your reader the better you can write. The second rule, for technical writing, is to say everything twice in complementary ways so that the person who's reading it has a chance to put the ideas into his or her brain in ways that reinforce each other.

So in technical writing usually there's redundancy. Things are said both formally and informally. Or you give a definition and then you say, "Therefore, such and such is true," which you can only understand if you've understood the definition.
So literate programming is based on this idea that the best way to communicate is to say things both informally and formally that are related.

I enjoy reading literate programming; I enjoy writing programs and their associated specifications/documentation/comments in as literate a fashion as I can accomplish. I think that Knuth's defense of literate programming holds water.

Another part of the interview that I found fascinating was this spirited attack on reusability, whether it comes from reusable subroutine libraries, object-oriented frameworks, or whatever:

People have this strange idea that we want to write our programs as worlds unto themselves so that everybody else can just set up a few parameters and our program will do it for them. So there'll be a few programmers in the world who write the libraries, and then there are people who write the user manuals for these libraries, and then there are people who apply these libraries and that's it.

The problem is that coding isn't fun if all you can do is call things out of a library, if you can't write the library yourself. If the job of coding is just to be finding the right combination of parameters, that does fairly obvious things, then who'd want to go into that as a career?

There's this overemphasis on reusable software where you never get to open up the box and see what's inside the box. It's nice to have these black boxes but, almost always, if you can look inside the box you can improve it and make it work better once you know what's inside the box. Instead people make these closed wrappers around everything and present the closure to the programmers of the world, and the programmers of the world aren't allowed to diddle with that. All they're able to do is assemble the parts.

I think this is Knuth-the-educator speaking. He doesn't want to see Computer Science degenerate into some sort of clerical and monotonous assembly task; he wants each successive generation of programmers to be standing on the shoulders of the ones before them, understanding what they did and why, and inventing the next version of programs.

Knuth returns to this topic later in the interview; it's clearly of tremendous importance to him:

[T]here's the change that I'm really worried about: that the way a lot of programming goes today isn't any fun because it's just plugging in magic incantations -- combine somebody else's software and start it up. It doesn't have much creativity. I'm worried that it's becoming too boring because you don't have a chance to do anything much new. Your kick comes out of seeing fun results coming out of the machine, but not the kind of kick that I always got by creating something new.

As an educator, Knuth realizes that this is an extremely challenging task, because students of computer science need to start at the beginning and learn the basics, not just assume the presence of vast libraries of existing code and go from there:

[M]y take on it is this: take a scientist in any field. The scientist gets older and says, "Oh, yes, some of the things that I've been doing have a really great payoff and other things, I'm not using anymore. I'm not going to have my students waste time on the stuff that doesn't make giant steps. I'm not going to talk about low-level stuff at all. These theoretical concepts are really so powerful -- that's the whole story. Forget about how I got to this point."

I think that's a fundamental error made by scientists in every field. They don't realize that when you're learning something you've got to see something at all levels. You've got to see the floor before you build the ceiling. That all goes into the brain and gets shoved down to the point where the older people forget that they needed it.

As I've said many times, I think that there is great potential for Open Source in education, for it provides a large body of existing software that is available for study, critique, and improvement.

As I've come to the end of the book, I can't close without including the most startling paragraph in the entire book, the one which must have made Seibel's jaw, and the jaw of every reader, drop to the ground with a thundering "thwack", as Knuth singles out for celebration and praise the single most abhorred and condemned feature that Computer Science has produced in its first half-century of existence:

To me one of the most important revolutions in programming languages was the use of pointers in the C language. When you have nontrivial data structures, you often need one part of the structure to point to another part, and people played around with different ways to put that into a higher-level language. Tony Hoare, for example, had a pretty nice clean system but the thing that the C language added -- which at first I thought was a big mistake and then it turned out I loved it -- was that when x is a pointer and then you say, x + 1, that doesn't mean one more byte after x but it means one more node after x, depending on what x points to: if it points to a big node, x + 1 jumps by a large amount; if x points to a small thing, x + 1 just moves a little. That, to me, is one of the most amazing improvements in notation.

And with that, Knuth joins Joel Spolsky and doubles the number of people on the planet who celebrate the C pointer feature.

I really enjoyed Coders at Work, as you can tell by the depth to which I worked through it. In the end, it probably wasn't worth this much time, but I certainly found lots of food for thought in every chapter. If you're at all interested in coding, and in the people who do and enjoy it, you'll probably find this book interesting, too.

Tuesday, December 22, 2009

Language subsetting

On the Stack Overflow podcast recently, Joel and Jeff were discussing when and why programmers intentionally restrict themselves to only a subset of the functionality available in their programming language.

At first it seems like an odd behavior: if you have features available in your programming language, why would you not use them?

I think there are (at least) 4 reasons: 3 valid reasons and 1 bad reason. Here's my list:

  • Complexity. C++ is a perfect example of this. C++ is such an enormous language, with so many features, possibilities, and variations on ways of getting things done, that you can easily end up creating incomprehensible, illegible, unmaintainable programs. So pretty much every C++ organization I've ever worked with has decided to restrict itself to some simpler subset of the language's features.

  • Vendor portability. SQL is a perfect example of this. There are many different implementations of SQL: Oracle, DB2, SQL Server, Sybase, MySQL, Postgres, Ingres, Derby, etc., and each of them has implemented a different subset of SQL's features, often with slightly different semantics. If you are writing a database-related application, you often find yourself wanting to be careful about particular database behaviors that you are using, so that you can "port" your application from one database to another without too many problems.

  • Version compatibility. Java is a perfect example of this. Over the years, there have been multiple versions of Java, and later releases have introduced many new features. But if you write an application against a new release of Java, using the new release's features, your application probably won't run in an older release of Java. So if you are hoping for your application to be widely used, you are reluctant to use those latest features until they have found widespread deployment in the Java community. Currently, it's my sense of things that JDK 1.4 is still widely used, although most Java environments are now moving to JDK 1.5. JDK 1.6 is commonly used, but it's still somewhat surprising when you encounter a major Java deployment environment (application server, etc.) which has already moved to the JDK 1.6 level of support. So most large Java applications are only now moving from JDK 1.4 to JDK 1.5 as their base level of support. The current version of Ant, for example, still supports JDK 1.2!

  • Unfamiliarity. This is the bad reason for restricting yourself to a subset of your programming language's capabilities. Modern programming languages have an astounding number of features, and it can take a while to learn all these different features, and how to use them effectively. So many programmers, perhaps unconsciously, find themselves reluctant to use certain features: "yeah, I saw that in the manual, but I didn't understand what it was or how to use it, so I'm not using that feature". This is a shame: you owe it to yourself, each time you encounter such a situation, to seize the opportunity to learn something new, and stop and take the time to figure out what this feature is and how it works.

So, anyway, there you go, Jeff and Joel: ask a question (on your podcast) and people will give you their answers!

Friday, December 18, 2009

p4 shelve

Perforce version 2009.2 is now out in beta, and it contains a very interesting new feature: shelving.

Laura Wingerd gave a fairly high-level introduction to shelving in her blog post:

You can cache your modified files on the server, without having to check them in as a versioned change. For teams, shelving makes code reviews and code handoffs possible. Individuals can use shelving for multi-tasking, code shunting, incremental snapshots, and safety nets.

The new commands are p4 shelve and p4 unshelve, and the blog post explains a bit of the workflow involved in using these commands to accomplish the various new scenarios.

I think it's going to take a bit of time to become comfortable with the new commands and how to use them, but I'm looking forward to getting this feature installed and available so I can start to learn more about it!

Scam victims and software security

When both DailyDave and Bruce Schneier point to a paper, you can bet it's going to be very interesting. So if you are at all interested in software security, run, don't walk, to this paper by Stajano and Wilson: Understanding scam victims: seven principles for systems security.

The seven principles are psychological aspects of human behavior which provide vulnerabilities that scammers and other bad guys exploit:

  • Distraction: While you are distracted by what retains your interest, hustlers can do anything to you and you won't notice.

  • Social Compliance: Society trains people not to question authority. Hustlers exploit this "suspension of suspiciousness" to make you do what they want.

  • Herd: Even suspicious marks will let their guard down when everyone next to them appears to share the same risks. Safety in numbers? Not if they're all conspiring against you.

  • Dishonesty: Anything illegal that you do will be used against you by the fraudster, making it harder for you to seek help once you realize you've been had.

  • Deception: Things and people are not what they seem. Hustlers know how to manipulate you to make you believe that they are.

  • Need and Greed: Your needs and desires make you vulnerable. Once hustlers know what you really want, they can easily manipulate you.

  • Time: When you are under time pressure to make an important choice, you use a different decision strategy. Hustlers steer you towards a strategy involving less reasoning.

The paper draws on the scams staged for the BBC TV series The Real Hustle. It would be great if the BBC would release the episodes on DVD; I think I'd really enjoy watching them.

Wednesday, December 16, 2009

Coders at Work: Fran Allen, Bernie Cosell

I'm almost done with Peter Seibel's fascinating Coders at Work. Chapters 13 and 14 contain his interviews with Frances Allen and Bernie Cosell.

I didn't know much about Fran Allen, although I've certainly benefited from her work, as has anyone who has ever programmed a computer using a language other than assembler. The interview discusses much of her early work on developing the theory and practice of compilers and, in particular, of compiler optimization. The goal of her work is simple to state:

The user writes a sequential code in the language in a way that's natural for the application and then have the compiler do the optimization and mapping it to the machine and taking advantage of concurrency.

Allen's recollections were interesting because they go a long way back:

I'm not sure if this is accurate, but I used to believe that the first early work of symbolics for names of variables came from a man named Nat Rochester, on a very early IBM machine, the 701 around 1951. He was in charge of testing it and they wrote programs to test the machine. In the process of doing that, they introduced symbolic variables. Now, I've seen some other things since that make me believe that there were earlier ways of representing information symbolically. It emerged in the early 50's, I think, or maybe even in the 40's. One would have to go back and see exactly how things were expressed in the ENIAC, for one thing.

Anyone who can speak about their work in the days of the 701 and the ENIAC certainly is worth listening to!

The interview with Bernie Cosell is interesting because he's a networking guy, and I've always been fascinated with networking software and how it is developed. Cosell gets credit, along with Dave Walden and Will Crowther, for the early programming of the IMP, the Interface Message Processor, that was the key mechanism in the building of the first computer networks.

As Cosell tells the story, his arrival in the networking field was somewhat accidental, since he started out doing other work:

BBN was working on a project with Massachusetts General Hospital to experiment with automating hospitals and I got brought onto that project. I started out as an application programmer because that was all I was good for. I think I spent about three weeks as an application programmer. I quickly became a systems programmer, working on the libraries that they were using.
When my projects ran out, Frank [Heart] would figure out what I should work on next. ... Somehow, Frank had decided that I was to be the third guy on the IMP project.

Cosell's interview contains a number of great passages. I liked this description of a debugging session that he remembered "fondly":

... thousands and thousands of machine cycles later, the program crashed because some data structure was corrupt. But it turns out the data structure was used all the time, so we couldn't put in code that says, "Stop when it changes." So I thought about it for a while and eventually I put in this two- or three-stage patch that when this first thing happened, it enabled another patch that went through a different part of the code. When that happened, it enabled another patch to put in another thing. And then when it noticed something bad happening, it froze the system. I managed to figure how to delay it until the right time by doing a dynamic patching hack where one path through the code was patched dynamically to another piece of the code.

Nowadays, we programmers are spoiled with our powerful high-level programming languages. With Java's various features, such as the absence of a memory pointer type, bounds-checked arrays, immutable strings, automatic memory management, and so forth, we rarely experience such debugging scenarios. But Cosell's recollection brought back a fair number of memories from my own early days in programming, and it was certainly entertaining to read.

I also thought Cosell's description of the role of the design review was very good, and I wish more people had had his experience, so that they could appreciate the value of that process:

Another thing that Frank did, on other projects, was design reviews. He had the most scary design reviews and I actually carried that idea forward. People would quake in their boots at his design reviews.
The parts that you did absolutely fine hardly got a mention. We all said, "Oh." But the part that you were most uncomfortable with, we would focus in on. I know some people were terrified of it. The trouble is if you were an insecure programmer you assumed that this was an attack and that you have now been shown up as being incompetent, and life sucks for you.

The reality -- I got to be on the good side of the table occasionally -- was it wasn't. The design review was to help you get your program right. There's nothing we can do to help you for the parts that you got right and now what you've got is four of the brightest people at BBN helping you fix this part that you hadn't thought through. Tell us why you didn't think it through. Tell us what you were thinking. What did you get wrong? We have 15 minutes and we can help you.

That takes enough confidence in your skill as an engineer, to say, "Well that's wonderful. Here's my problem. I couldn't figure out how to do this and I was hoping you guys wouldn't notice so you'd give me an OK on the design review." The implicit answer was, "Of course you're going to get an OK on the design review because it looks OK. Let's fix that problem while we've got all the good guys here so you don't flounder with it for another week or two."

This is a wonderful description of a process which, if handled correctly, can be incredibly effective. I've personally seen it work that way, from both the giving and receiving end. I've also seen it, far too often, fail, to the extent that it seems that more and more often people don't even attempt design reviews anymore.

It's a shame that people haven't learned how to do a design review effectively, and it's a shame that software managers rarely seem to understand what a powerful tool it is that they aren't using. Perhaps more people will read the Cosell interview and will realize that they have a powerful process improvement available to them.

Fran Allen is the only female programmer interviewed in the book, which is a shame. It would be nice to have had more. Women have been involved in computing since the beginning (think Ada Lovelace, Grace Hopper, etc.). How about, say, Val Henson, or Pat Selinger, or Barbara Liskov?

One last chapter to go...

Tuesday, December 15, 2009

git-add --patch

I just recently found Ryan Tomayko's essay about using git-add --patch.

I love the essay; it's quite well written.

He makes two interesting and inter-related points in the essay:

  • git-add --patch allows you to solve a problem in a (fairly) easy way which is extremely hard to solve using other source code control tools and methodologies.

  • The flexibility and power of Git is integral to its philosophy, and you won't understand Git until you understand this philosophy.

From the essay:

The thing about Git is that it's oddly liberal with how and when you use it. Version control systems have traditionally required a lot of up-front planning followed by constant interaction to get changes to the right place at the right time and in the right order.
Git is quite different in this regard.
When I'm coding, I'm coding. Period. Version control -- out of my head. When I feel the need to organize code into logical pieces and write about it, I switch into version control mode and go at it.

It's a very interesting essay, both the concrete parts about how to use the low-level Git tools to accomplish some very specific tasks, and the more abstract sections about why the author feels that this tool supports his own personal software development process more effectively.

I think that there isn't enough discussion about the role of tools in the development process, and about how tools influence and guide the success or failure of a particular process. One of my favorite articles in this respect is Martin Fowler's essay on Continuous Integration. I'm pleased whenever I find articles discussing the interaction of tools and process, since more discussion can only help improve the situation with respect to software development processes and tools.

Monday, December 14, 2009

HTML 5 and multiple file uploads

Firefox 3.6 now supports multiple file input, as in:

<input type="file" multiple=""/>

This is a great bit of functionality, and it pretty much closes out a 10-year-old feature request for Firefox/Mozilla.

I know that for many years, the common solution to this was "use Flash". It's nice to see that features like this are making their way into the browser base.

Phenomenally complex security exploits

Secure software techniques have come a long way in the past decade, but it's important to understand that attacks against secure software have come a long way, as well. This wonderful essay by the ISS X-Force team at IBM gives some of the details behind the current state of the art of software vulnerability exploitation. In order to exploit the actual bug they had to work through multiple other steps first, including a technique they call "heap normalization", which involved inventing a pattern of leaking memory, then freeing the leaked memory, then leaking more, etc., in order to arrange the memory contents "just so".

Here's the conclusion; the whole paper is fun to read:

Although the time it took us to reach reliable exploitation neared several weeks it was worth the effort to prove a few things. Some people would have called this vulnerability "un-exploitable", which is obviously not the case. While others would have claimed remote code execution without actually showing it was possible. X-Force always demands that a working exploit be written for a code execution bug. This way, we never have to use the term "potential code execution". Finally we had to prove that the heap cache exploitation techniques were not just parlor tricks designed for a BlackHat talk, but a real-world technique that could leverage near impossible exploitation scenarios into internet victories.

Thursday, December 10, 2009

High end storage systems

For most of us, our principal exposure to computer storage systems involves the sorts of things we find on our personal computers:

  • Winchester-technology hard drives

  • USB-attached flash memory systems

  • writable optical (DVD/CD) media

But at the high end, where price is (mostly) no object, the state of the art in computer storage systems is advancing rapidly.

If you haven't been paying much attention to this area of computer technology, you owe it to yourself to have a look through this three-part series from Chuck Hollis at EMC.

A little sample to whet your appetite:

In this picture, we've got a pool of VMs, a pool of servers, a pool of paths, and a pool of different storage media types.

This sort of picture wants to be managed very differently than traditional server/fabric/storage approaches. It wants you to set policy, stand back -- and simply add more resources if and when they're needed.
In just a few short years, virtualization concepts have changed forever how we think about server environments: how we build them, and how we run them.

Earlier this fall I was talking with my friend Gil about a major e-commerce roll-out he's managing for a high-end retail enterprise, and he confirmed the impact that virtualization technologies are having on enterprise computing. There is a lot to learn, and lots of new opportunities are becoming available.

Wednesday, December 9, 2009

Coders at Work: Ken Thompson

Chapter 12 of Coders at Work contains Peter Seibel's interview with Ken Thompson.

In almost any book involving computer programmers, Ken Thompson would be the most famous, impressive, and respected name in the book. He's most famous for his work on Unix, but he's also quite well known for Plan 9, and for Multics, and for B, and for Belle, and for ed, and so on and so on. Many people feel that he gave the best Turing Award Lecture ever.

Here, however, we still have Chapter 15, the Knuth interview, to look forward to.

Still, the Thompson interview does not disappoint. Fairly early into the interview, it's obvious that we're listening to somebody who has programming in their blood:

I knew, in a deep sense, every line of code I ever wrote. I'd write a program during the day, and at night I'd sit there and walk through it line by line and find bugs. I'd go back the next day and, sure enough, it would be wrong.

If you've ever fixed a bug in the shower, or while riding your bike, or while playing basketball, you know instantly what Thompson means.

This ability to manipulate symbols in your mind is crucial to successful programming, and Thompson describes it well:

I can visualize the structure of programs and how things are efficient or inefficient based on those op codes, by seeing the bottom and imagining the hierarchy. And I can see the same thing with programs. If someone shows me library routines or basic bottom-level things, I can see how you can build that into different programs and what's missing -- the kinds of programs that would still be hard to write.

This ability to conceptualize a synthesis of various component parts into an integrated whole is the essence of what many people call "systems thinking" or "system design", or "software architecture". It's an extremely impressive skill when you see it done well.

Later in the interview Thompson tells a story that I, at least, had not heard before, about the very early design of Unix, and how he wasn't even intending to write an operating system:

A group of us sat down and talked about a file system.
So I went off and implemented this file system, strictly on a PDP-7. At some point I decided that I had to test it. So I wrote some load-generating stuff. But I was having trouble writing programs to drive the file system. You want something interactive.

Seibel: And you just wanted to play around with writing a file system? At that point you weren't planning to write an OS?

Thompson: No, it was just a file system.

Seibel: So you basically wrote an OS so you'd have a better environment to test your file system.

Thompson: Yes. Halfway through there that I realized it was a real time-sharing system.

I'm not really sure if I believe this story, but it was entertaining to read it.

Thompson describes his philosophy of identifying talented programmers:

It's just enthusiasm. You ask them what's the most interesting program they worked on. And then you get them to describe it and its algorithms and what's going on. If they can't withstand my questioning on their program, then they're not good. If I can attack them or find problems with their algorithms and their solutions and they can't defend it, being much more personally involved than I am, then no.
That's how I interview. I've been told that it's devastating to be on the receiving side of that.

I bet it is devastating!

I've been on both sides of such interviews. It's exhausting, although strangely enjoyable, to be on the receiving end of such questioning; it's exhausting, and often emotionally draining, to be the questioner. But it does seem to be a technique that works. Programming-in-the-large is all about finding groups of capable people who can communicate effectively about extraordinarily abstract topics, and I don't know any better way to do this than the one that Thompson describes.

It seems like Thompson has been drifting of late. He describes his work at Google:

Probably my job description -- whether I follow it or not, that's a different question -- would be just to find something to make life better. Or have some new idea of new stuff that replaces old stuff. Try to make it better. Whatever it is that's wrong, that takes time, that causes bugs.

In his defense, it can't be easy to join an organization such as Google and figure out how to effectively contribute.

A very interesting part of the Thompson interview comes near the end, when Seibel gets Thompson to talk about C++. After getting Thompson to describe the delicate working relationship he had with Stroustrup, Seibel gets Thompson to open up about the language itself:

It certainly has its good points. But by and large I think it's a bad language. It does a lot of things half well and it's just a garbage heap of ideas that are mutually exclusive. Everybody I know, whether it's personal or corporate, selects a subset and these subsets are different. So it's not a good language to transport an algorithm -- to say, "I wrote it; here, take it." It's way too big, way too complex. And it's obviously built by committee.

Thompson certainly lets us know how he feels!

But I spent nearly a decade working in C++, across several different companies and in several different domains, and the "subset" behavior that he describes is exactly true. No two uses of C++ ever seem to be the same, and that makes it strangely hard to move from one body of C++ to another: even though they are all, at some level, written in the same language, any two libraries of C++ code always seem to be written in two different languages.

I wish the Thompson interview had gone on and on, and I'm glad that Seibel was able to include him in the book.

IntelliJ IDEA version 9 is out of beta

JetBrains have released version 9 of their IntelliJ IDEA development environment.

I've been using the beta of version 9 and I like it very much. It does take a long time to start up, but once it is running it is very powerful and responsive.

Perhaps I have set up my project incorrectly, as it seems like IDEA wants to completely re-scan my rt.jar each time it starts up and opens the project. Is there something that I have misconfigured?

Time to go figure out how to upgrade my Beta version to the released version!

Monday, December 7, 2009

Coders at Work: Dan Ingalls and Peter Deutsch

Chapters 10 and 11 of Coders at Work contain Seibel's interviews with Dan Ingalls and Peter Deutsch, and represent Seibel's inclusion of the Smalltalk world.

Ingalls and Deutsch were both part of Alan Kay's historic team at Xerox PARC, and much of the two interviews consists of discussion of the work that went on at that time: its goals, techniques, and observations. Ingalls recalls the focus on educational software:

It was envisioned as a language for kids -- children of all ages -- to use Alan's original phrase. I think one of the things that helped that whole project, and it was a long-term project, was that it wasn't like we were out to do the world's best programming environment. We were out to build educational software, so a lot of the charter was more in the space of simplicity and modeling the real world.

One of the implementation techniques that Ingalls particularly remembers, in stark contrast to the multi-hour batch cycles that were then common, was the sense of immediacy you got from making a change and almost instantly being able to observe its effect:

For instance, our turnaround time for making changes in the system from the beginning was seconds, and subsecond fairly soon. It was just totally alive. And that's something which I, and a bunch of the other people, got a passion for. Let's build a system that's living like that. That's what Smalltalk became.

This goal of providing rapid feedback, interactive development, and immediate reflection of changes is indeed powerful. It led to the development of modern IDE software (Eclipse was built by the OTI team, who wanted to expand on their work building environments for Smalltalk systems), as well as to many software methodologies that favor iteration and short cycles, so that feedback and learning can occur. As the Agile Manifesto proposes:

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

My favorite part of the Ingalls interview was his description of the sort of person who makes the best programmer:

If you grow up in a family where when the cupboard door doesn't close right, somebody opens it up and looks at the hinge and sees that a screw is loose and therefore it's hanging this way vs. if they say, "Oh, the door doesn't work right; call somebody" -- there's a difference there. To me you don't need any involvement with computers to have that experience of what you see isn't right, what do you do? Inquire. Look. And then if you see the problem, how do you fix it?

In addition to this almost feverish curiosity, another major aspect of great programmers is their ability to think about problems abstractly. Deutsch recalls:

I've always been really comfortable in what I would call the symbolic world, the world of symbols. Symbols and their patterns have always just been the stuff I eat for lunch. And for a lot of people that's not the case.
The people who should be programming are the people who feel comfortable in the world of symbols. If you don't feel really pretty comfortable swimming around in that world, maybe programming isn't what you should be doing.

I thought it was very interesting to include Deutsch, as he's a notable person who was a Famous Computer Programmer for decades, then just one day got on his horse and rode out of town:

And I had this little epiphany that the reason that I was having trouble finding another software project to get excited about was not that I was having trouble finding a project. It was that I wasn't excited about software anymore. As crazy as it may seem now, a lot of my motivation for going into software in the first place was that I thought you could actually make the world a better place by doing it. I don't believe that anymore. Not really. Not in the same way.

Can software make the world a better place? I think it can, and after 30 years I'm not done trying. But I also believe you should follow your desires, and I'm pleased that Deutsch was able to recognize that his passion had moved elsewhere.

Thursday, December 3, 2009

The ongoing evolution of Java

Java continues to evolve. That's a good thing, of course, but it can be a little bit overwhelming trying to keep up, even in the focused areas of concern to me.

A brief set of capsule reports from my small area of Java-related interests:

  • JUnit has released version 4.8. This version adds the new annotations Category, IncludeCategory, and ExcludeCategory, as well as the Categories test runner. Meanwhile, I'm still stuck on version 3 of JUnit, as are many codebases that I know about. Is there an automated assistant which helps with the transition from JUnit 3.X to JUnit 4.X? How do others deal with this migration?

  • Mark Reinhold announced that JDK 1.7 has slipped to September 2010. However, along with this slippage comes increasing anticipation that some form of closures will make the 1.7 release. The proposed closures syntax looks pretty horrendous to me, and reminds me of ancient C-language function prototype syntax; I hope they can see their way to building something less grotesque.

  • Meanwhile, in more near-term JDK 1.7 news, apparently Escape Analysis is now enabled by default as part of 1.7 Milestone 5. Escape Analysis is a very powerful compiler optimization which can dramatically reduce object allocation under certain circumstances. If you want to learn more about it, here's a good place to start; that weblog posting contains a pointer to Brian Goetz's fascinating analysis of Java memory allocation from 2005.
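To make the escape analysis idea a bit more concrete, here is a small sketch of the kind of code it targets. The EscapeDemo and Point names are hypothetical, and the comments describe what a JIT with escape analysis may do, not what it is guaranteed to do: whether HotSpot actually eliminates the allocation depends on the JVM version and flags (comparing runs with -XX:+DoEscapeAnalysis and -XX:-DoEscapeAnalysis is one way to look for a difference).

```java
// A minimal sketch of code that is a candidate for escape analysis.
// Whether the JIT actually removes the allocation depends on the JVM
// version, compilation tier, and flags such as -XX:+DoEscapeAnalysis.
public class EscapeDemo {

    // Small immutable value type; a hypothetical example class.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int manhattanLength() {
            return Math.abs(x) + Math.abs(y);
        }
    }

    // Each Point is created, used, and dropped inside one loop iteration.
    // It is never stored in a field, returned, or handed to unknown code,
    // so it does not "escape" this method; a JIT with escape analysis may
    // replace it with two plain int locals (scalar replacement) and skip
    // the heap allocation entirely.
    static long sumOfLengths(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);
            total += p.manhattanLength();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfLengths(1000));
    }
}
```

Each Point contributes 2i, so sumOfLengths(n) returns n(n-1); the arithmetic is the same whether or not the allocation is optimized away, which is exactly the point: escape analysis changes the cost, not the result.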

Back in Derby land, we're still somewhat stuck on JDK 1.4. We only dropped support for JDK 1.3 about 18 months ago, so we tend to be fairly conservative and support the older Java implementations long after others have moved on.

But there seems to be some pressure building to consider moving to JDK 1.5 as a base level of support. I think it's increasingly hard to find significant Java deployment environments where JDK 1.5 is not the standard, so I don't think it will be that much longer before Derby makes the jump to assuming a base JDK 1.5 level.

For example, it appears that the new Android Java system ("Dalvik") works best with Java bytecodes that are built by a JDK 1.5 compiler, and the Dalvik VM tools issue warnings when pointed at the Derby jar files.
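Since Dalvik's complaint is about the level of the bytecodes, it can help to check what a given class file actually targets. Every class file begins with the magic number 0xCAFEBABE followed by a 2-byte minor version and a 2-byte major version; JDK 1.4 targets correspond to major version 48 and JDK 1.5 targets to 49. Below is a minimal sketch (the ClassVersion class and its method names are hypothetical) that reads that header. Note that the major version records the -target the compiler was given, which is not necessarily the release of the javac that produced it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersion {

    // Every class file starts with this magic number.
    private static final int MAGIC = 0xCAFEBABE;

    // Reads the class-file header and returns its major version.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != MAGIC) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();          // minor version (ignored here)
        return data.readUnsignedShort();   // major version
    }

    // Maps the major versions relevant to this discussion to JDK levels.
    static String jdkLevel(int major) {
        switch (major) {
            case 47: return "JDK 1.3";
            case 48: return "JDK 1.4";
            case 49: return "JDK 1.5";
            case 50: return "JDK 1.6";
            case 51: return "JDK 1.7";
            default: return "unknown (" + major + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // Header of a class file targeting JDK 1.5: magic, minor 0, major 49.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x31
        };
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println(major + " -> " + jdkLevel(major));
    }
}
```

To survey a real jar such as derby.jar, one could open it with java.util.jar.JarFile and pass the stream for each .class entry (from JarFile.getInputStream) to majorVersion.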

As the Red Queen said to Alice:

you have to run as fast as you can to stay where you are; if you want to go somewhere, you have to run twice as fast as that

Greg Kroah-Hartman and the Linux drivers project

There was an interesting interview with Greg Kroah-Hartman over at How Software Is Built.

Kroah-Hartman describes the role of trust in the open source Linux development process:

I maintain the subsystems such as USB, and I have people who I trust enough that if they send me a patch, I’ll take it, no questions asked. Because the most important thing is I know that they will still be around in case there’s a problem with it. [laughs]

And then I send stuff off to Linus. So, Linus trusts 10 to 15 people, and I trust 10 to 15 people. And I’m one of the subsystem maintainers. So, it’s a big, giant web of trust helping this go on.

Other people have described this behavior as "meritocracy", or as "reputation-based development". It's a fascinating look inside the social aspects of the software development process, and it's interesting how much of the overall development process involves non-technical topics:

Companies want to get the most value out of Linux, so I counsel them that they should drive the development of their driver and of Linux as a whole in the direction that they think makes the most sense. If they rely on Linux and feel that Linux is going to be part of their business, I think they should become involved so they can help change it for the better.

Kroah-Hartman talks about the increasing maturity of Linux:

we don’t gratuitously change things. A big part of what drives that change is that what Linux is being used for is evolving. We’re the only operating system in something like 85 percent of the world’s top 500 supercomputers today, and we’re also in the number-one-selling phone for the past year, by quantity.

It’s the same exact kernel, and the same exact code base, which is pretty amazing.

In fact, he goes so far as to make a rather bold prediction: Linux is now so well established that it cannot be displaced:

I just looked it up, and we add 11,000 lines, remove 5500 lines, and modify 2200 lines every single day.

People ask whether we can keep that up, and I have to tell you that every single year, I say there’s no way we can go any faster than this. And then we do. We keep growing, and I don’t see that slowing down at all anywhere.

I mean, the giant server guys love us, the embedded guys love us, and there are entire processor families that only run Linux, so they rely on us. The fact that we’re out there everywhere in the world these days is actually pretty scary from an engineering standpoint. And even at that rate of change, we maintain a stable kernel.

It’s something that no one company can keep up with. It would actually be impossible at this point to create an operating system to compete against us. You can’t sustain that rate of change on your own, which makes it interesting to consider what might come after Linux.

For my part, I think the only thing that’s going to come after Linux is Linux itself, because we keep changing so much. I don’t see how companies can really compete with that.
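Taking Kroah-Hartman's figures at face value, and assuming they hold on every calendar day (an assumption; the interview doesn't say how the daily numbers were averaged), the net growth of the kernel works out roughly as follows. The 2,200 modified lines change existing code and so don't affect the count.

```latex
\begin{aligned}
\text{net lines per day}  &= 11{,}000 - 5{,}500 = 5{,}500 \\
\text{net lines per year} &\approx 5{,}500 \times 365 = 2{,}007{,}500 \approx 2 \text{ million}
\end{aligned}
```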

I'm not sure I really believe it is "impossible"; perhaps this is one of those claims that people will laugh at, 50 years from now, but in my opinion the Linux work is astonishingly good. I run lots of different Linux distributions on a variety of machines and without exception they are solid, reliable, and impressively efficient.

It's a very interesting software development process and I enjoyed reading this interview and recommend it for those who are interested in how open source software development actually works.

Tuesday, December 1, 2009

Consistency in naming matters

If you have a naming convention for your code, stick with it.

If you don't have a naming convention for your code, establish one, and then stick with it.

You may not like the naming convention, but it doesn't matter. Even if you don't like it, stick with it. If you really don't like it, you can try to change the convention (which implies changing all the code to match it, of course), but don't just casually violate the naming convention.

It's just unbelievable how many hours of my life I've wasted dealing with situations where one programmer wrote:

String result = obj.getKeyByID(my_id);

while elsewhere, nearby in the code, a different programmer wrote:

Object obj = store.readById(my_id);

and thus, while working in that code, I casually wrote:

Object anotherObj = store.readByID(different_id);

Yes, I know, modern IDEs help you notice such simple problems rapidly.

But the whole world would be so much better if we just avoided such inconsistencies in the first place.
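One cheap way to keep such inconsistencies from creeping in is to check for them mechanically. The sketch below assumes a hypothetical convention in which acronyms are camel-cased like ordinary words (readById, never readByID) and flags identifiers that break it. The IdNamingCheck name and its regular expression are illustrative only, and the pattern is a deliberately rough heuristic; for example, it won't catch an identifier that begins with ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class IdNamingCheck {

    // Hypothetical convention: acronyms are camel-cased like words,
    // so "readById" is fine and "readByID" is a violation.
    // Rough heuristic: a lowercase letter immediately followed by "ID".
    private static final Pattern ALL_CAPS_ID = Pattern.compile("[a-z]ID");

    static boolean violates(String identifier) {
        return ALL_CAPS_ID.matcher(identifier).find();
    }

    // Returns the identifiers that break the convention, in input order.
    static List<String> violations(List<String> identifiers) {
        List<String> bad = new ArrayList<String>();
        for (String name : identifiers) {
            if (violates(name)) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("getKeyByID");
        names.add("readById");
        names.add("readByID");
        System.out.println(violations(names));
    }
}
```

Run against the three method names from the example above, this reports getKeyByID and readByID; which spelling a team picks matters much less than picking one and letting a tool hold everyone to it.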

Grumble grumble grumble