Journal of a Programmer: December 2011

Saturday, December 31, 2011

The Disneyland art of Claude Coats

I'm just old enough to remember growing up in Orange County (well, Whittier, but it was the same thing), in the early 1970's, just as Disneyland was building out and completing the most important of the signature rides: Pirates of the Caribbean, Submarine Voyage, and of course the Haunted Mansion.

So I loved this fabulous essay about the artist Claude Coats, who started his career in the movie-making side of Walt Disney Studios (Pinocchio, Fantasia), then made the transition to artist and set designer at Disneyland: Long-Forgotten: Claude Coats: The Art of Deception and the Deception of Art.

The essayist makes the point that the skill of the background painter is to create the world that the other artists will fill with music, animation, and story:

You the viewer are invited to imagine yourself on the other side of the frame (the opposite dynamic of Davis). You see yourself as the character in the landscape. Though never intended for public display, those sketches are among the most beautiful and seductive examples of Mansion art. Who wouldn't want go exploring in this?

I love the description of the Disneyland ride as a voyage through a painting:

As you look at some of those Coats backgrounds up above, like Gepetto's cottage and the Sorcerer's Apprentice interiors, you almost wish you could step into them and look around, so inviting are they. With Rainbow Caverns, Coats finally enabled you do just that: ride right through one of his moody, atmospheric paintings.

And of course, this is the basis of the oft-remarked "suspension of disbelief":

The whole drawing depicts a dissolve between there and here, inside and outside, human artifice and wild nature. This is not an exit point for characters stepping over into our presence; this is a place that invites you to enter.

The essay is filled with gorgeous sketches and paintings, so go have a look!

Wednesday, December 28, 2011

Prince of Persia: still alive after 25 years!

I've been having fun looking through the notes put together by "mrsid", a programmer who took up the challenge of re-implementing the classic Apple II game "Prince of Persia", by reverse-engineering a running copy of the game, while simultaneously reading Jordan Mechner's original diary and design notes:

In the meantime I found Jordan Mechner's blog. He had the courage and insight to post all of his old journals from the 1980s. He meticulously kept a log of his daily work. What a great read that was. Just a few days before I started looking for Prince of Persia information Jordan also posted this article on his blog. It contained a link to a PDF, which turned out to be the Prince of Persia source code documentation.
I was amazed. The source was lost on Apple II disks, but the document written just a few days after the release in 1989 was there, with all kinds of juicy little details about the graphics engine, the data structures, lists of images, and more. It was like someone had handed me the key to a long lost treasure.

Mrsid presents the notes as follows:

Hopefully there will be more essays posted in the future, as the complete notes are not yet available.

In the meantime, though, it's great fun to read Jordan's original notes, as well as mrsid's reverse engineering analysis, and it's also quite cool to see the discussion back-and-forth between the two of them in the comments on the blog.

I still vividly remember my eldest daughter playing Prince of Persia in the early 1990's on our fresh new Mac IIsi -- my, how she loved that game, and how we loved that computer!

Monday, December 26, 2011

It's not just a game ...

... it's a new way to celebrate the holidays.

Saturday, December 24, 2011

Sometimes I think I understand computers ... sometimes not

I spent 4 hours trying to set things up so that my Windows 7 laptop could print to a USB-attached printer on my Ubuntu Linux desktop.

Most of it went pretty easily: ensure that CUPS and Samba were installed and configured on the Linux machine, and verify that the Samba configuration allowed printer sharing.

But after that, no amount of fiddling with the Add Printer wizard on the Windows 7 machine succeeded.

Finally, this weird sequence worked:

Choose Add Printer
Choose Add a local printer. Ignore all the warnings about how you should only do this if you have a locally-attached, non-USB cabled printer. :)
Choose Create a new port.
Choose Local Port. Click Next.
When prompted to enter a port name, type in \\computername\printername

It is so weird that in order to print to a printer on another machine, you have to (a) tell Windows to define a locally-attached printer, and then (b) stuff the remote machine's network address into the 'local port' field.

But hey, it worked...

Great investigation of a Google synonym query

This in-depth exploration of an unexpected Google query result is fascinating.

But that’s the thing, what seems easy and straightforward to us is actually quite difficult for a machine.

Indeed.

DVCS and change authenticity

In the world of version control, distributed version control systems such as Git and Mercurial are all the rage.

These systems are indeed extremely powerful, but they all suffer from a fundamental issue, which is how the various nodes in the distributed system can establish the necessary trust to verify authenticity of push and pull requests.

(Disclosure: at my day job, we make a version control system, which has a centralized architecture and a wholly different trust and authentication mechanism. So I'm more than just an interested observer here.)

Now, this issue has been known and discussed for quite some time, but it has acquired greater urgency this fall after a fairly significant compromise of the main Linux kernel systems. As Jonathan Corbet notes in that article:

We are past the time where kernel developers are all able to identify each other. Locking down kernel.org to the inner core of the development community would not be a good thing; the site is there for the whole community. That means there needs to be a way to deal with mundane issues like lost credentials without actually knowing the people involved.

The emerging proposal to deal with this problem adds two new features to Git, signed commits and pulling signed tags, both of which are now operational in Git's development mainline.
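To make the signed-tag idea concrete, here is a toy Python sketch of what the integrator's check amounts to: the contributor signs a tag payload that names a commit, and the integrator merges only if the signature verifies against a key that is already in his or her keyring. Everything here is an assumption for illustration: real Git signed tags use GPG/OpenPGP, the payload format is a simplified stand-in for a tag object, and the textbook RSA with fixed, well-known primes is completely insecure. Names such as `accept_pull` and `TRUSTED_KEYS` are hypothetical.

```python
import hashlib

# TOY textbook RSA for illustration only: no padding, fixed (well-known) primes.
# Real Git signed tags use GPG/OpenPGP keys; nothing here is Git's actual format.

def make_keypair(p: int, q: int, e: int = 65537):
    """Return ((n, e), (n, d)) for distinct primes p and q."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)

def digest(payload: str, n: int) -> int:
    """SHA-256 of the payload, reduced into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(payload.encode()).digest(), "big") % n

def tag_payload(commit_id: str, tag_name: str, message: str) -> str:
    """Hypothetical, simplified stand-in for a tag object."""
    return f"object {commit_id}\ntag {tag_name}\n\n{message}"

def sign(payload: str, private_key) -> int:
    """Contributor side: sign the payload with the private exponent."""
    n, d = private_key
    return pow(digest(payload, n), d, n)

def verify(payload: str, signature: int, public_key) -> bool:
    """Integrator side: check the signature with the public exponent."""
    n, e = public_key
    return pow(signature, e, n) == digest(payload, n)

# The integrator's keyring: deciding who belongs in it is the hard, human problem.
TRUSTED_KEYS = {}

def accept_pull(signer: str, payload: str, signature: int) -> bool:
    """Merge only if the signer is already trusted AND the signature checks out."""
    key = TRUSTED_KEYS.get(signer)
    return key is not None and verify(payload, signature, key)

# Demo: 2**61 - 1 and 2**89 - 1 are Mersenne primes (far too small for real use).
public, private = make_keypair(2**61 - 1, 2**89 - 1)
TRUSTED_KEYS["rusty"] = public
payload = tag_payload("a1b2c3", "rusty-for-linus", "hypothetical pull request")
sig = sign(payload, private)
print(accept_pull("rusty", payload, sig))                # True
print(accept_pull("rusty", payload + " (edited)", sig))  # False: payload changed
print(accept_pull("mallory", payload, sig))              # False: unknown signer
```

Notice that the cryptography is the easy part; the hard question is how `TRUSTED_KEYS` gets populated and maintained as the chain of lieutenants grows, which is exactly the trust problem discussed here.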

I suspect that this problem is a deep and hard and fundamental one. It seems to me that the DVCS infrastructure is building a fairly complex mechanism: here's how Linus will use this technology to ensure the integrity of the Linux kernel, as described by Junio Hamano (the lead Git developer):

To make the whole merge fabric more trustworthy, the integration made by his lieutenants by pulling from their sub-lieutenants need to be made verifyable the same way, which would (1) make the number of signed tags even larger and (2) make it more likely somebody in the foodchain gets lazy and refuses to push out the signed tags after he or she used them for their own verification.

But reading this description, I'm instantly reminded of a very relevant observation made by Moxie Marlinspike in the context of the near-complete collapse of the SSL Certificate Authority chain of trust this spring:

Unfortunately the DNSSEC trust relationships depend on sketchy organizations and governments, just like the current CA system.
Worse, far from providing increased trust agility, DNSSEC-based systems actually provide reduced trust agility. As unrealistic as it might be, I or a browser vendor do at least have the option of removing VeriSign from the trusted CA database, even if it would break authenticity with some large percentage of sites. With DNSSEC, there is no action that I or a browser vendor could take which would change the fact that VeriSign controls the .com TLD.
If we sign up to trust these people, we're expecting them to willfully behave forever, without any incentives at all to keep them from misbehaving. The closer you look at this process, the more reminiscent it becomes. Sites create certificates, those certificates are signed by some marginal third party, and then clients have to accept those signatures without ever having the option to choose or revise who we trust. Sound familiar?

I'm not saying I have the answer; indeed, the very smartest programmers on the planet are struggling intensely with this problem. It's a very hard problem. As the researchers at the EFF recently noted:

As currently implemented, the Web's security protocols may be good enough to protect against attackers with limited time and motivation, but they are inadequate for a world in which geopolitical and business contests are increasingly being played out through attacks against the security of computer systems.

Returning to the world of DVCS for a moment: I've felt all along that the fundamental weakness of DVCS systems would turn out to be their weak authenticity guarantees; indeed, this is the core reason that organizations like Apache have been very reluctant to open their infrastructure up to DVCS-style source control, despite all its other advantages.

And it seems like the people who are trying to repair the Certificate Authority technology are also skeptical that a 100% distributed solution can be effective; as Adam Langley says:

We are also sacrificing decentralisation to make things easy on the server. As I've previously argued, decentralisation isn't all it's cracked up to be in most cases because 99.99% of people will never change any default settings, so we haven't given up much. Our design does imply a central set of trusted logs which is universally agreed. This saves the server from possibly having to fetch additional audit proofs at runtime, something which requires server code changes and possible network changes.

And the EFF's Sovereign Keys proposal has a similar semi-centralization aspect:

Master copies of the append-only data structure are kept on machines called "timeline servers". There is a small number, around 10-20, of these. The level of trust that must be placed in them is very low, because the Sovereign Key protocol is able to cryptographically verify the important functions they perform. Sovereign Keys are preserved so long as at least one server has remained good. For scalability, verification, and privacy purposes, lots of copies of the entire append-only timeline structure are stored on machines called "mirrors".
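Both of these designs lean on an append-only log that many parties can check. The core trick can be sketched in Python as a simple hash chain: each new head hashes the previous head together with the new entry, so a mirror that remembers an old head can detect a server that rewrites history. This is a deliberately simplified assumption, not either proposal's actual design (Certificate Transparency, for instance, uses Merkle trees so that audit proofs stay small); `TimelineLog`, `replay`, and `extends` are hypothetical names.

```python
import hashlib
from typing import List

GENESIS = "0" * 64  # hypothetical starting head for an empty log

def entry_hash(prev_head: str, entry: str) -> str:
    """The new head commits to the previous head and the new entry."""
    return hashlib.sha256((prev_head + "\n" + entry).encode()).hexdigest()

class TimelineLog:
    """Hypothetical append-only log kept by a server: a plain hash chain."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self._head = GENESIS

    def append(self, entry: str) -> str:
        self._head = entry_hash(self._head, entry)
        self.entries.append(entry)
        return self._head

    def head(self) -> str:
        return self._head

def replay(entries: List[str]) -> str:
    """Recompute a head from scratch, as an independent mirror would."""
    head = GENESIS
    for entry in entries:
        head = entry_hash(head, entry)
    return head

def extends(old_head: str, old_size: int, entries: List[str]) -> bool:
    """True if the first old_size entries still hash to a head seen earlier."""
    return old_size <= len(entries) and replay(entries[:old_size]) == old_head

# Demo: a mirror remembers the head after two entries.
log = TimelineLog()
log.append("example.com key=AAAA")
log.append("example.org key=BBBB")
seen_head, seen_size = log.head(), len(log.entries)

log.append("example.net key=CCCC")                 # honest growth
print(extends(seen_head, seen_size, log.entries))  # True

rewritten = ["example.com key=EVIL"] + log.entries[1:]  # a bad server rewrites history
print(extends(seen_head, seen_size, rewritten))         # False
```

This echoes the "at least one server has remained good" property: any single copy that remembers an old head is enough to expose a rewrite.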

With the new Git technology, as I understand it, the user who accepts a pull request from a remote repository now faces a new challenge:

The integrator will see the following in the editor when recording such a merge:

The one-liner merge title (e.g 'Merge tag rusty-for-linus of git://.../rusty.git/');

The message in the tag object (either annotated or signed). This is where the contributor tells the integrator what the purpose of the work contained in the history is, and helps the integrator describe the merge better;

The output of GPG verification of the signed tag object being merged. This is primarily to help the integrator validate the tag before he or she concludes the pull by making a commit, and is prefixed by '#', so that it will be stripped away when the message is actually recorded; and

The usual "merge summary log", if 'merge.log' is enabled.

This will be a challenging task to require of all developers in this chain of trust. Is it feasible? One thing is for sure: the Git team are to be commended for facing this problem head on, for discussing it openly, and for trying to push the problem forward. It is exciting to watch them struggle with the issues, and I've learned an immense amount from reading their discussions.

So I think it will be very interesting to see how the Git team fares with this problem, as they, too, have some wonderfully talented people at work on the problems.

Friday, December 23, 2011

Holiday weekend link dump

As always, apologies in advance for the link dump. It's just been so busy recently that I haven't found the time to explore things in detail.

Still, if you're looking for a few holiday-weekend things to read, try these:

If you haven't been paying much attention to the Carrier IQ controversy, Rich Kulawiec over at TechDirt has a great summary of what's been going on, with an amazing number of links to chase and study. As Kulawiec puts it:
Debate continues about whether Carrier's IQ is a rootkit and/or spyware. Some have observed that if it's a rootkit, it's a rather poorly-concealed one. But it's been made unkillable, and it harvests keystrokes -- two properties most often associated with malicious software. And there's no question that Carrier IQ really did attempt to suppress Eckhart's publication of his findings.
But even if we grant, for the purpose of argument, that it's not a rootkit and not spyware, it still has an impact on the aggregate system security of the phone: it provides a good deal of pre-existing functionality that any attacker can leverage. In other words, intruding malware doesn't need to implement the vast array of functions that Carrier IQ already has; it just has to activate and tap into them.
Many of us may be taking some holiday break, but the busy beavers at CalTrans are embarking on the final major step of the Bay Bridge reconstruction: threading the suspension cable over the top of the main bridge tower.
The cable is 2.6 feet in diameter and nearly a mile long. It weighs 5,291 tons, or nearly 10.6 million pounds, and is made up of 137 steel strands, each one composed of 127 steel wires.
The strands will go up to the top of the center tower and down to the San Francisco side of the span, where they will be looped underneath the deck of the bridge, then threaded back up to the tower and back down to the Oakland side of the bridge. There, crews will anchor the other end of the strands.

I found it interesting to read about how they finish the cable installation:

Ney said it will take a few months to complete the installation. Once all the strands are installed, crews will bind them together and coat them with zinc paste.
I'm familiar with the notion of sacrificial zinc anodes in sailboats, where the zinc corrodes in place of a more valuable metal part (such as your stainless steel propeller). Is the zinc paste on the cable used for the same purpose?
The bubble is back! Everywhere you look, there is article after article after article after article about the desperate competition for software engineers that's underway right now.
Of course, at least part of the problem is that, all too frequently, people who think they can program actually can't. Unlike many people, I'm not in a hurry to blame our education system for this. I think programming is very hard, and it doesn't surprise me that there's a high failure rate. Some would say that anybody can learn to program, but I think there's a real underlying talent at issue, and just as I would be a lousy lawyer or a lousy surgeon or a lousy soprano, some people will have more aptitude for writing software than others.
Anyway, I can attest to a fair amount of the insanity, though happily I'm pretty well insulated from it. But we have clearly entered into a very exciting new time in software, with a variety of hot technologies, such as cloud computing and mobile applications, providing the fuel. Though I think Marc Andreessen may be a bit too giddy about the prospects for it all, I have to agree with his assessment that:
Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern Internet, all of the technology required to transform industries through software finally works and can be widely delivered at global scale.

A large part of that transformational technology is, once again, being driven by Amazon. As Amazon continues releasing new features at breakneck speed, Wired magazine takes a step back and wonders what it means that Amazon Builds World’s Fastest Nonexistent Supercomputer. I remember my first introduction to system virtualization, when I got to use IBM's VM software back in the early 1980's; it definitely takes a while to get your head around what's really going on here!
Moving on to something entirely unrelated to software, in the world of professional football, the dispute between Uruguayan superstar Luis Suarez and French superstar Patrice Evra has been gathering a lot of attention. My friend Andrew has reprinted a well-phrased essay on the topic, which is well worth reading.
Professional sports is of course mostly entertainment, yet somehow it is more than that; it is undeniably one of the largest parts of modern life. On that note, a recent issue of The New Yorker carries a fine review of the life of Howard Cosell, who played a major part in the development of professional sports into one of America's major passions.
On Monday nights, Cosell called out players for their mistakes with orotund rhetoric and moral high dudgeon. Just as music fans used to go to the New York Philharmonic to watch Leonard Bernstein's gymnastics more than to hear, yet again, Beethoven's Fifth, people tuned in to hear -- and howl at -- Cosell. Even if you loathed him, his performance was what made Monday nights memorable.

And lest you think that this is just something minor, don't miss this wonderful article in The Economist: Little red card: Why China Fails at Football.
Solving the riddle of why Chinese football is so awful becomes, then, a subversive inquiry. It involves unravelling much of what might be wrong with China and its politics. Every Chinese citizen who cares about football participates in this subversion, each with some theory—blaming the schools, the scarcity of pitches, the state’s emphasis on individual over team sport, its ruthless treatment of athletes, the one-child policy, bribery and the corrosive influence of gambling. Most lead back to the same conclusion: the root cause is the system.
Is it sports, or is it life? It's definitely not "just" entertainment.
OK, getting back to things I understand better, it's been quite the year for Mozilla and Firefox. Ever since they changed their release process and their version numbering back in the spring, it's been a continual stream of Firefox releases, so it's no surprise that Firefox 11 is soon to be available, with yet more features and functionality. But the Firefox team are pushing beyond just making great browsers, branching into areas like web-centric operating systems, Internet identity management, and building entire applications in the browser, as David Ascher explains. Ascher's article notes that within the Mozilla Foundation, people are now thinking significantly "beyond the browser":
we’re now at a distinct point in the evolution of the web, and Mozilla has appropriately looked around, and broadened its reach. In particular, the browser isn’t the only strategic front in the struggle to promote and maintain people’s sovereignty over their online lives. There are now at least three other fronts where Mozilla is making significant investments of time, energy, passion, sweat & tears. They’re still in their infancy, but they’re important to understand if you want to understand Mozilla

Meanwhile, how is Mozilla handling this? Aren't they just a few open source hackers? How can they do all this? Well, as Kara Swisher points out, Mozilla has some pretty substantial financial backing:
Mozilla is set to announce that it has signed a new three-year agreement for Google to be the default search option in its Firefox browser.
It’s a critical renewal for the Silicon Valley software maker, since its earlier deal with the search giant has been a major source of revenue to date.

Meanwhile, what is all this doing to the life of the ordinary web developer? As Christian Heilmann observes, it brings not just excitement but also stress and discomfort; underlying it all is the fact that the web is no longer just a place for experimentation, but has become the production platform of our daily lives:
We thought we are on a good track there. Our jobs were much more defined, we got more respect in the market and were recognised as a profession. Before we started showing a structured approach and measurable successes with web technologies we were just “designers” or “HTML monkeys”.
Are you just feeling flat-out overwhelmed by all this new technology? Well, one wonderful thing is that the web also provides the technology to stay up to date:
MIT President Susan Hockfield said, “MIT has long believed that anyone in the world with the motivation and ability to engage MIT coursework should have the opportunity to attain the best MIT-based educational experience that Internet technology enables. OpenCourseWare’s great success signals high demand for MIT’s course content and propels us to advance beyond making content available. MIT now aspires to develop new approaches to online teaching.”
Now, if I can just find that free time that I misplaced...
Just because software is open source doesn't mean it can't fade away into the sunset. Which is a shame, because I was really hoping to get a distribution with Issue 46 fixed, since I hit it all the time! Yes, yes, I know, I should just download the source and build it. Or find a new debugger. Or something.
Forgive the breathless style, and read the well-written summary of the Buckshot Yankee incident at the Washington Post: Cyber-intruder sparks massive federal response — and debate over dealing with threats. As author Ellen Nakashima observes, we're still struggling with what we mean when we toss about terms like "cyber war" and the new Cyber Command unit:
“Cyber Command and [Strategic Command] were asking for way too much authority” by seeking permission to take “unilateral action . . . inside the United States,” said Gen. James E. Cartwright Jr., who retired as vice chairman of the Joint Chiefs in August.
Officials also debated how aggressive military commanders can be in defending their computer systems.
“You have the right of self-defense, but you don’t know how far you can carry it and under what circumstances, and in what places,” Cartwright said. “So for a commander who’s out there in a very ambiguous world looking for guidance, if somebody attacks them, are they supposed to run? Can they respond?”
Finally (since something has to go last), Brad Feld has ended the year by winding down the story of Dick and Jane's SayAhh startup in a most surprising fashion (well, to me, at least): SayAhh Has Shut Down.
If you weren't following the SayAhh series, Feld had been writing a series of articles about a hypothetical software startup, using them to illustrate many of the perils and complexities that can arise when trying to build a new company from scratch. I'm not sure if there's a clean index to all the articles he wrote, but you can start here for the first article, and then mostly follow along via his blog. I think it's great that he ended the series in such a realistic fashion, though I'm quite interested to see how his readership feels about that!

I hope your holidays are enjoyable, safe, and filled with family and friends.

Thursday, December 22, 2011

I found a new Derby bug!

It doesn't happen very often that I find a bug in Derby, so it's worth noting: https://issues.apache.org/jira/browse/DERBY-5554.

I'm just disconnected enough from day-to-day Derby development at this point that I don't immediately understand what the bug is.

Note that the crash is in a generated method:


Caused by: java.lang.NullPointerException
at org.apache.derby.exe.acf81e0010x0134x6972x0511x0000033820000.g0(Unknown Source)

Derby uses generated methods in its query execution for tasks such as projection and restriction of queries.

So, for example, the generated method above is probably implementing part of the "where" clause of the query that causes the crash.

Derby generated methods are constructed at runtime, by dynamically emitting Java bytecode in the class file format, and then dynamically loading that class into the Java runtime. It's quite clever, but quite tricky to diagnose, because it's hard to see the actual code that is being run.
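Derby emits JVM bytecode directly, which is hard to show in a few lines, so here is a rough Python analogue that swaps in a different technique: generate source text for a restriction (a simple AND of column comparisons) at runtime, compile it under a synthetic filename, and keep the generated text around for debugging. The names (`generate_restriction`, `generated_source`) and the condition format are hypothetical; this is not Derby's code or API. The first generated function is named `g0` only to echo the stack trace above.

```python
import traceback

# Hypothetical SQL-ish operator whitelist; unknown operators raise KeyError.
OPS = {"=": "==", "<>": "!=", "<": "<", ">": ">", "<=": "<=", ">=": ">="}
_counter = 0

def generate_restriction(conditions):
    """Build a row filter for [(column, op, literal), ...], AND-ed together.

    Like Derby, the function is created at runtime; unlike Derby, we generate
    Python source text and compile it instead of emitting bytecode directly.
    """
    global _counter
    name = f"g{_counter}"
    _counter += 1
    clauses = " and ".join(
        f"(row[{col!r}] {OPS[op]} {lit!r})" for col, op, lit in conditions
    ) or "True"
    source = f"def {name}(row):\n    return {clauses}\n"
    namespace = {}
    # The synthetic filename plays the role of Derby's "(Unknown Source)".
    exec(compile(source, f"<generated {name}>", "exec"), namespace)
    fn = namespace[name]
    fn.generated_source = source  # keep the text, like dumping the class file
    return fn

where = generate_restriction([("age", ">", 30), ("city", "=", "Oakland")])
rows = [{"age": 41, "city": "Oakland"}, {"age": 25, "city": "Oakland"}]
print([r for r in rows if where(r)])  # [{'age': 41, 'city': 'Oakland'}]

try:
    where({"age": None, "city": "Oakland"})  # None > 30 raises TypeError
except TypeError:
    traceback.print_exc()          # frame names "<generated g0>"; source line typically unavailable
    print(where.generated_source)  # the saved text is what you debug from
```

As with Derby's `(Unknown Source)`, the traceback points into the generated function but usually can't show the offending line, which is why saving the generated text, the analogue of dumping the class file, is so helpful.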

A long time ago, I tracked down some debugging tips for working on crashes in generated code, and collected them here: http://wiki.apache.org/db-derby/DumpClassFile.

It's late, and I'm tired (and suffering from a head cold), but if I get some time over the holiday weekend I'll try to look into this crash some more.

Of course, perhaps somebody like Knut Anders or Rick will have already figured the problem out by then :)

Wednesday, December 21, 2011

Teaching yourself to become a thinker

I finally got around to reading Bill Deresiewicz's fascinating lecture: Solitude and Leadership. It's over two years old now, but it has aged very well, so if you haven't yet seen it, I encourage you to wander over and give it a read.

Although Deresiewicz spends much of the lecture talking about leadership, that wasn't my favorite part of his talk. Rather, I particularly enjoyed his analysis of thinking. He proposes that modern civilization isn't doing enough to develop a culture of thinkers:

What we don’t have, in other words, are thinkers. People who can think for themselves. People who can formulate a new direction: for the country, for a corporation or a college, for the Army—a new way of doing things, a new way of looking at things.

What does Deresiewicz mean by a "thinker"? He gives as an example (he is speaking to a West Point audience in 2009, after all) General David Petraeus:

He has a Ph.D. from Princeton, but what makes him a thinker is not that he has a Ph.D. or that he went to Princeton or even that he taught at West Point. I can assure you from personal experience that there are a lot of highly educated people who don’t know how to think at all.
No, what makes him a thinker—and a leader—is precisely that he is able to think things through for himself. And because he can, he has the confidence, the courage, to argue for his ideas even when they aren’t popular. Even when they don’t please his superiors. Courage: there is physical courage, which you all possess in abundance, and then there is another kind of courage, moral courage, the courage to stand up for what you believe.

Admiring General Petraeus for his mental and moral courage is fair, but even more interesting to me is Deresiewicz's observation on what (I think) is the more important aspect of being a thinker: creativity and originality:

Thinking means concentrating on one thing long enough to develop an idea about it. Not learning other people’s ideas, or memorizing a body of information, however much those may sometimes be useful. Developing your own ideas. In short, thinking for yourself. You simply cannot do that in bursts of 20 seconds at a time, constantly interrupted by Facebook messages or Twitter tweets, or fiddling with your iPod, or watching something on YouTube.
I find for myself that my first thought is never my best thought. My first thought is always someone else’s; it’s always what I’ve already heard about the subject, always the conventional wisdom. It’s only by concentrating, sticking to the question, being patient, letting all the parts of my mind come into play, that I arrive at an original idea. By giving my brain a chance to make associations, draw connections, take me by surprise. And often even that idea doesn’t turn out to be very good. I need time to think about it, too, to make mistakes and recognize them, to make false starts and correct them, to outlast my impulses, to defeat my desire to declare the job done and move on to the next thing.

It's a wonderful observation, and it's so very, very true. Creativity, originality, and inspiration require patience, reflection, and concentration.

It brings to mind a wonderful lesson that my dear friend Neil Goodman taught me over twenty years ago, when I was still just learning to program and I was trying to understand the way that Neil approached a problem.

I was asking him how he knew when he was done with the design phase of his project, and ready to move on to the coding phase. Neil replied with an answer that was very evocative of Deresiewicz's advice to "outlast your impulses". Neil said (as best I remember):

Work on your design. At some point, you will think you are done, and you are ready, but you are not. You have to resist that feeling, and work on your design some more, and you will find more ways to improve it. Ask others to review it; re-read and re-consider it yourself. Again, you will think you are done, and you are ready, but you are not. You must resist, resist, resist! Force yourself to continue iterating on your design, paying attention to every part of it, over and over. Even if you feel that you can't possibly improve it any more, still you must return to it. Only then, will you reach the point when you are ready to write code.

Of course, for the full effect, you have to have Neil himself (in his earnest, impassioned, gentle-giant sort of way) deliver the message, but hopefully the point comes through.

Incidentally (perhaps Deresiewicz didn't get to pick his own title?), I don't think that solitude really captures the idea properly. Or maybe solitude is the right thing in a military context, but in the software engineering field, where I spend all my time, I don't think that solitude is either necessary or useful for original, creative thinking. You need to get feedback and reactions from others, and you can't do that without communicating, and without listening. But you do need to practice a number of activities that are certainly related to solitude: contemplation, reflection, consideration, etc. So if I had the chance to re-title his essay, I might suggest something more like "Leadership and the ability to think for yourself."

But that's quite a bit wordier :)

I've wandered rather far afield, but hopefully I've intrigued you enough with Deresiewicz's essay that you'll wander over and give it a read, and I hope you will find it time well spent.

Monday, December 19, 2011

Two command line shell tutorials

I just wouldn't be me if I didn't notice things like these and want to immediately post them to my blog...

Now get out there and open those terminal sessions!

Friday, December 16, 2011

Did Iran capture the drone by hacking its GPS?

There's a fascinating report in today's Christian Science Monitor speculating that Iran used vulnerabilities in the drone's GPS technology to simply convince the drone to land in Iran:

"GPS signals are weak and can be easily outpunched [overridden] by poorly controlled signals from television towers, devices such as laptops and MP3 players, or even mobile satellite services," Andrew Dempster, a professor from the University of New South Wales School of Surveying and Spatial Information Systems, told a March conference on GPS vulnerability in Australia.
"This is not only a significant hazard for military, industrial, and civilian transport and communication systems, but criminals have worked out how they can jam GPS," he says.
The US military has sought for years to fortify or find alternatives to the GPS system of satellites, which are used for both military and civilian purposes. In 2003, a “Vulnerability Assessment Team” at Los Alamos National Laboratory published research explaining how weak GPS signals were easily overwhelmed with a stronger local signal.
“A more pernicious attack involves feeding the GPS receiver fake GPS signals so that it believes it is located somewhere in space and time that it is not,” reads the Los Alamos report. “In a sophisticated spoofing attack, the adversary would send a false signal reporting the moving target’s true position and then gradually walk the target to a false position.”

Here's the link to the nearly decade-old Los Alamos National Laboratory report, GPS Spoofing Countermeasures, which in turn cites a number of other references worth reading.

Very interesting stuff.

Given that location-based devices have become so prevalent in our lives (smartphones, cars, etc.), it's interesting to contemplate how we might improve the reliability and trustworthiness of the location awareness of our automated assistants. The guys over at SpiderLabs Anterior had some great articles on this recently:

Thursday, December 15, 2011

Bowl Season is here!

College Football Bowl Game Season is upon us, filled with memorable events such as "The Famous Idaho Potato Bowl".

With 35 bowl games, you need to know what to watch, and what to miss.

So get prepared, and head on over to Yahoo! Sports's bowl game preview: Dashing through the bowls … and coaching moves.

You'll find useful advice such as this:

Fight Hunger Bowl (25)
Dec. 31
Illinois vs. UCLA

Who has momentum? This is the bowl where momentum goes to die. Bruins have lost three of their past four but still look great compared to an Illinois team that has lost six in a row.
Who has motivation? Tail-spinning teams playing for interim coaches make this a potential low-intensity debacle. But both have hired new coaches who presumably will be watching to see who wants to make an impression heading into 2012.
Who wins a mascot fight? Joe Bruin by default, since Chief Illiniwek was forcibly retired in 2007 and not replaced.
Dash fact: Illinois hasn’t scored more than 17 points in a game since Oct. 8.
Dash pick: UCLA 14, Illinois 10. It’s New Year’s Eve. Find something better to do with your time.

Intriguingly, the two best bowl games of the year look to involve Bay Area teams:

The Fiesta Bowl, on Jan 2, features Stanford and Oklahoma State
The Holiday Bowl, on Dec 28, features Texas vs. California

Both games should be well-matched, well-played, and quite fun to watch.

The bubble is back!

Wow! One billion dollars is a lot of money.

Will they use some of that to start addressing the widely-reported discord?

South America joins the Amazon cloud

Amazon Web Services is growing at an incredible rate. They just opened their eighth AWS region, in São Paulo, Brazil.

Here's a nifty interactive map of the AWS global infrastructure, so you can see the various regions and their information.

The U.S. West (Northern California) region is actually located just a few miles from my house. (I think that the city planners ran out of imagination, as it's at the corner of Investment Boulevard and Corporate Ave, not far from the Industrial Boulevard exit on the freeway.)

It's not much to look at, as a physical building; it's what's inside that counts!

Delightfully illustrated HTTP status codes

The HyperText Transfer Protocol (HTTP) conveys a lot of information in the "status code". Everyone is of course familiar with codes 404 ("Not found"), 200 ("OK"), and 500 ("Internal server error"), but there are dozens more codes, each with its own precise meaning.
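If the cats don't stick, the first digit is a handy mnemonic: it names the class of response. Here is a minimal Python sketch using the standard library's http.HTTPStatus enum; the describe helper and the CLASSES table are hypothetical names made up for this illustration.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class (hypothetical lookup table).
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)'.

    HTTPStatus(code) raises ValueError for codes the standard library doesn't know.
    """
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 404, 500):
    print(describe(code))
```

The enum only covers the codes your Python version ships with, so treat it as a quick reference rather than an authoritative registry.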

If you're having trouble remembering which code is which, or don't understand a particular code and would like a more "vivid" example, head over to Flickr, for this delightful set of HTTP Status Codes illustrated by cats.

This is definitely the geekiest humor of the year!

I particularly enjoy

but you should really view the entire set for the full effect (apologize to your co-workers in advance before laughing out loud!).

Wednesday, December 14, 2011

Raymond Chen skewers the executive re-org email

Delightful! Raymond Chen perfectly captures the classic executive reorg email.

The bit about auto-summarize at the end is pretty insightful, as well.

Tuesday, December 13, 2011

Looking for something to read?

It's wintertime and you're stuck indoors, so how about something to read?

Well, you're in luck: both Longform and GiveMeSomethingToRead are just out with their end-of-the-year lists:

I like both lists; there's lots of interesting material here.

There's a fair amount of overlap, naturally, but also each list has a few gems that the other list missed.

FWIW, my favorite out of both lists is My Summer at an Indian Call Center, from Mother Jones Magazine.

Update: Here's another set of lists!

Friday, December 9, 2011

The Popular Mechanics article on Flight 447 is enthralling

If you haven't yet had a chance to read the Popular Mechanics article about Air France flight 447, then stop what you are doing right now, sit down for 5 minutes, and read the article.

It is chock full of insight after insight after insight. Here's just one, picked almost at random:

While Bonin's behavior is irrational, it is not inexplicable. Intense psychological stress tends to shut down the part of the brain responsible for innovative, creative thought. Instead, we tend to revert to the familiar and the well-rehearsed. Though pilots are required to practice hand-flying their aircraft during all phases of flight as part of recurrent training, in their daily routine they do most of their hand-flying at low altitude—while taking off, landing, and maneuvering. It's not surprising, then, that amid the frightening disorientation of the thunderstorm, Bonin reverted to flying the plane as if it had been close to the ground, even though this response was totally ill-suited to the situation.

Although, as the article points out, there were multiple technical issues in play (weather, location, time of day, fatigue, etc.), in the end it boils down to human factors: training, experience, and, most of all, communication:

The men are utterly failing to engage in an important process known as crew resource management, or CRM. They are failing, essentially, to cooperate. It is not clear to either one of them who is responsible for what, and who is doing what.

Every sentence in this article is a nugget, with observations about human behavior, suggestions of areas for study and improvement, and, in the end, a realization that people are flawed and make mistakes, and what we need to do most of all is to think, talk, and help each other:

when trouble suddenly springs up and the computer decides that it can no longer cope—on a dark night, perhaps, in turbulence, far from land—the humans might find themselves with a very incomplete notion of what's going on. They'll wonder: What instruments are reliable, and which can't be trusted? What's the most pressing threat? What's going on?

Don't miss this incredible recap of the tragedy of Flight 447.

Thursday, December 8, 2011

Following up

I love follow-up. Follow-up is good: study something, then study the follow-ups, and you will learn more.

So, a few follow-ups:

I've been fascinated by the Air France Flight 447 investigation (background here and here), so if you were equally interested, don't miss this wonderful article in this month's Popular Mechanics: What Really Happened Aboard Air France 447:
Two years after the Airbus 330 plunged into the Atlantic Ocean, Air France 447's flight-data recorders finally turned up. The revelations from the pilot transcript paint a surprising picture of chaos in the cockpit, and confusion between the pilots that led to the crash.
I've been following Jim Gettys and his studies into the network queueing phenomenon known as "BufferBloat". If you've been paying attention to this, too, you won't want to miss this discussion in the ACM Queue column. Meanwhile, Patrick McManus, who is hard at work on the new SPDY networking protocol, has an essay on the topic in which he notes some recent published research, and worries that there is still more research needed:
A classic HTTP/1.x flow is pretty short - giving it a signal to backoff doesn't save you much - it has sent much of what it needs to send already anyhow. Unless you drop almost all of that flow from your buffers you haven't achieved much. Further, a loss event has a high chance of damaging the flow more seriously than you intended - dropping a SYN or the last packet of the data train is a packet that will have very slow retry timers, and short flows are comprised of high percentages of these kinds of packets.
Understanding TCP's behaviors is certainly complicated; I recently wrote about this at some length on the Perforce blog.
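The core of BufferBloat is simple arithmetic: a packet that arrives behind a full buffer must wait for everything ahead of it to drain at the link's rate. Here is a minimal Python sketch of that calculation; the 256 KiB buffer and the link speeds are illustrative assumptions, not figures from Gettys's measurements.

```python
def queueing_delay_seconds(buffered_bytes: int, link_bits_per_second: float) -> float:
    """Time a newly arriving packet waits behind `buffered_bytes` of queued data."""
    return buffered_bytes * 8 / link_bits_per_second


# Illustrative numbers only: the same 256 KiB buffer on links of different speeds.
for label, rate in [("1 Mbit/s uplink", 1e6), ("10 Mbit/s", 1e7), ("1 Gbit/s", 1e9)]:
    delay_ms = queueing_delay_seconds(256 * 1024, rate) * 1000
    print(f"{label:>16}: {delay_ms:8.1f} ms of added latency")
```

A buffer that costs about 2 ms on the gigabit link adds roughly two seconds on the 1 Mbit/s uplink, which is why oversized buffers in slow home equipment hurt so much.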

Problems such as managing the complexity of the modern systems that control airplanes, nuclear reactors, etc., or understanding the unexpected interactions of networking equipment across the planet, continue to be extremely hard. Only dedicated study over many years or decades is going to bring progress, so I'm pleased to note such progress when it occurs!

Wednesday, December 7, 2011

The astonishing sophistication of ATM skimmer criminals

On his superb blog, Brian Krebs has just posted the latest entry in his astonishing series of investigative articles reporting on modern ATM card skimming criminals, and the devices they use to capture ATM data from compromised ATM machines: Pro Grade (3D Printer-Made?) ATM Skimmer.

Looking at the backside of the device shows the true geek factor of this ATM skimmer. The fraudster who built this appears to have cannibalized parts from a video camera or perhaps a smartphone (possibly to enable the transmission of PIN entry video and stolen card data to the fraudster wirelessly via SMS or Bluetooth).

Everything Brian Krebs writes is worth reading, but I find these ATM articles to be just gripping. Modern life is so complex!

Tuesday, December 6, 2011

Really slow news day

One of the "AP Top Stories" this morning is headlined

Malia Obama, 13, is nearly as tall as her father.

Monday, December 5, 2011

Three practical articles, two practical books

Here's a nifty ad-hoc collection of some nice practical articles and books to keep you grounded and focused on what really matters:

First, a nice review of Robert C. ("Uncle Bob") Martin's The Clean Coder: 9 things I learned from reading The Clean Coder by Robert C. Martin, on how professional developers conduct themselves. I haven't read the book (yet), but the review makes me interested enough to keep this book on the list for when I'm next above water on my technical reading. An excerpt from Christoffer Pettersson's review:
As a professional developer, you should spend time caring for your profession. Just like in any other profession, practice gives performance, skill and experience.
It is your own responsibility to keep training yourself by reading, practicing and learning - actually anything that helps you grow as a software developer and helps you get on board with the constant industry changes.
Second, a nifty essay by Professor John Regehr about the ins and outs of "testcase reduction", one of those rarely-discussed but ultra-important skills that get far too little recognition. At my day job, I'm lucky to have a colleague who is just astonishingly good at testcase reduction; he just has the knack. An excerpt from Regehr's essay:
testcase reduction is both an art and a science. The science part is about 98% of the problem and we should be able to automate all of it. Creating a test case from scratch that triggers a given compiler bug is, for now at least, not only an art, but an art that has only a handful of practitioners.
Third, a nice article on Ned Batchelder's blog about "Maintenance Hatches", those special visibility hooks that let humans observe the operation of complex software in some high-level and comprehensible fashion, for diagnosis and support purposes. In my world, these hatches are typically trace logs, which cause the software to emit detailed information about its activity, and can be turned on and off, and ratcheted up to higher or lower levels, as needed. I like Batchelder's terminology, which conjures up the image of opening a cover to the machinery to observe it at work:
On a physical machine, you need to be able to get at the inner workings of the thing to observe it, fiddle with it, and so on. The same is true for your software.
Fourth, an interesting online book about the practicalities of building a modern web application: The Twelve-Factor App. I guess I was sold when I saw the first factor was to ensure you are using a source code control system (may I recommend one?). From the introduction to the book:
This document synthesizes all of our experience and observations on a wide variety of software-as-a-service apps in the wild. It is a triangulation on ideal practices for app development, paying particular attention to the dynamics of the organic growth of an app over time, the dynamics of collaboration between developers working on the app’s codebase, and avoiding the cost of software erosion.
Last, but far from least, check out this amazing online book about graphics programming: Learning Modern 3D Graphics Programming by Jason McKesson. From the introduction:
This book is intended to teach you how to be a graphics programmer. It is not aimed at any particular graphics field; it is designed to cover most of the basics of 3D rendering. So if you want to be a game developer, a CAD program designer, do some computer visualization, or any number of things, this book can still be an asset for you.
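Regehr's "science part" of testcase reduction is the bit that can be automated: repeatedly delete pieces of a failing input and keep each deletion that still triggers the bug. Below is a minimal Python sketch of a greedy chunk-deletion reducer, loosely in the spirit of delta debugging; it is not Regehr's tool, and reduce_testcase and the is_interesting predicate are hypothetical names. In practice is_interesting would run the compiler and check for the specific crash.

```python
from typing import Callable, List


def reduce_testcase(lines: List[str], is_interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily delete chunks of lines while the test case still triggers the bug."""
    assert is_interesting(lines), "the original test case must trigger the bug"
    chunk = max(1, len(lines) // 2)
    while chunk >= 1:
        progress = False
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]  # try deleting lines[i:i+chunk]
            if candidate and is_interesting(candidate):
                lines = candidate  # keep the smaller case; retry at the same index
                progress = True
            else:
                i += chunk  # this chunk is needed; move past it
        if not progress:
            chunk //= 2  # no deletion worked at this size, so try finer-grained chunks
    return lines


# Toy example: the "bug" reproduces whenever both 'x' and 'y' are present.
original = ["a", "x", "b", "y", "c"]
print(reduce_testcase(original, lambda ls: "x" in ls and "y" in ls))
```

Each accepted deletion strictly shrinks the input, so the loop always terminates; the hard, artful part Regehr describes (writing a good is_interesting check) is left to the caller.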

The Internet is a wonderful place; so much stuff to read and explore!
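Batchelder's maintenance hatch, in the trace-log form I described above, maps naturally onto a logger whose verbosity can be ratcheted up or down while the program runs. Here is a minimal Python sketch using the standard logging module; the logger name "myapp.sync" and the functions set_trace_level and sync_files are hypothetical.

```python
import logging

log = logging.getLogger("myapp.sync")  # hypothetical component name


def set_trace_level(level_name: str) -> None:
    """Open or close the hatch: accepts names like 'debug', 'info', 'warning'."""
    log.setLevel(level_name.upper())  # Logger.setLevel accepts level names as strings


def sync_files(paths):
    """Hypothetical operation instrumented with trace messages at two levels."""
    log.info("syncing %d files", len(paths))
    for p in paths:
        log.debug("examining %s", p)  # detail only visible with the hatch fully open
    return len(paths)


logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
set_trace_level("debug")    # hatch open: both info and debug lines appear
sync_files(["a.txt", "b.txt"])
set_trace_level("warning")  # hatch closed: routine trace lines are suppressed
sync_files(["c.txt"])
```

In a long-running service the call to set_trace_level would typically be wired to an admin command or a config reload rather than hard-coded.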

Saturday, December 3, 2011

Three papers on Bloom filters

If you should find yourself noticing that:

You're interested in these mysterious things called Bloom filters, that seem to be popping up over and over,
and you'd like to learn more about how Bloom filters work, but you're not sure where to start.

then you're roughly in the position I was in a little while ago, and maybe this article will be of interest to you.

After reading a dozen or so Bloom-filter-related papers spanning 40 years of interest in the subject, I've whittled down the list to three papers that I can recommend to anybody looking for a (fairly) quick and (somewhat) painless introduction to the world of Bloom filters:

The first paper is Burton Bloom's original description of the technique that he devised back in the late 1960's for building a new data structure for rapidly computing set membership. The paper is remarkably clear, even though in those days we had not yet settled on standard terminology for describing computer algorithms and their behaviors.

The most striking part of Bloom's work is this description of its behavior:

In addition to the two computational factors, reject time and space (i.e. hash area size), this paper considers a third computational factor, allowable fraction of errors. It will be shown that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing the reject time. In some practical applications, this reduction in hash area size may make the difference between maintaining the hash area in core, where it can be processed quickly, or having to put it on a slow access bulk storage device such as a disk.

This discussion of probabilistic, approximate answers to queries must have been truly startling and disturbing in the 1960's, when computer science was still very young: "falsely identified", "fraction of errors"? Horrors! Even nowadays, computer science has a hard time dealing with the notions of randomness and probabilistic computations, but we're getting better at reasoning about these things. Back then, Bloom felt the need to justify the approach by noting that:

In order to gain substantial reductions in hash area size, without introducing excessive reject times, the error-free performance associated with conventional methods is sacrificed. In application areas where error-free performance is a necessity, these new methods are not applicable.

In addition to suggesting the notion of a probabilistic solution to the set membership problem, the other important section of Bloom's original paper is his description of the Bloom filter's core algorithm:

Method 2 completely gets away from the conventional concept of organizing the hash area into cells. The hash area is considered as N individual addressable bits, with addresses 0 through N - 1. It is assumed that all bits in the hash area are first set to 0. Next, each message in the set to be stored is hash coded into a number of distinct bit addresses, say a1, a2, ..., ad. Finally, all d bits addressed by a1 through ad are set to 1.
To test a new message a sequence of d bit addresses, say a1', a2', ... ad', is generated in the same manner as for storing a message. If all d bits are 1, the new message is accepted. If any of these bits is zero, the message is rejected.
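Bloom's Method 2 translates almost line for line into code. Here is a minimal Python sketch, assuming salted SHA-256 digests as the d hash functions; Bloom's paper doesn't prescribe a hash function, and unlike his description this sketch doesn't force the d addresses to be distinct. The class and method names are my own.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A minimal sketch of Bloom's Method 2: N addressable bits, d addresses per message."""

    def __init__(self, n_bits: int, d: int) -> None:
        self.n_bits = n_bits
        self.d = d
        # All N bits of the hash area start out as 0.
        self.bits = bytearray((n_bits + 7) // 8)

    def _addresses(self, message: str) -> Iterator[int]:
        # Derive the d bit addresses a1..ad by hashing the message with d salts
        # (an assumption for this sketch; addresses may occasionally collide).
        for i in range(self.d):
            digest = hashlib.sha256(f"{i}:{message}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, message: str) -> None:
        # Storing a message: set all d addressed bits to 1.
        for a in self._addresses(message):
            self.bits[a // 8] |= 1 << (a % 8)

    def __contains__(self, message: str) -> bool:
        # Testing a message: accept if every addressed bit is 1, reject if any is 0.
        return all(self.bits[a // 8] & (1 << (a % 8)) for a in self._addresses(message))


bf = BloomFilter(n_bits=1024, d=3)
for word in ("apple", "banana", "cherry"):
    bf.add(word)
print("banana" in bf, "durian" in bf)
```

A stored message is always accepted; an unstored one is usually rejected, but it can be falsely accepted if its d bits happen to have been set by other messages, which is exactly the "allowable fraction of errors" Bloom describes.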

Bloom's work was, rather quietly, adopted in various areas, particularly in database systems. I'll return to that subject in a future posting, but for now, let's spin the clock ahead 25 years, to the mid-1990's, when the second paper, by a team working on Web proxy implementations, brought significant popularity and a new audience to the world of Bloom filters.

The Summary Cache paper describes an intriguing problem: suppose you want to implement a set of independent proxy caches, each one physically separate from the others, and you'd like a proxy that suffers a cache miss to be able to quickly decide whether the item is available in another proxy's cache:

ICP discovers cache hits in other proxies by having the proxy multicast a query message to the neighboring caches whenever a cache miss occurs. Suppose that N proxies [are] configured in a cache mesh. The average cache hit ratio is H. The average number of requests received by one cache is R. Each cache needs to handle (N - 1) * (1 - H) * R inquiries from neighboring caches. There are a total [of] N * (N - 1) * (1 - H) * R ICP inquiries. Thus, as the number of proxies increases, both the total communication and the total CPU processing overhead increase quadratically.
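To see the quadratic growth concretely, here is the paper's formula evaluated in a few lines of Python. The hit ratio (0.25) and request count (1,000 per cache) are illustrative assumptions, not numbers from the paper.

```python
from typing import Tuple


def icp_inquiries(n: int, h: float, r: float) -> Tuple[float, float]:
    """Return (inquiries each cache handles, total inquiries) under plain ICP.

    n: number of proxies in the mesh, h: average hit ratio, r: requests per cache.
    """
    # Each of the other n - 1 proxies misses on (1 - h) * r requests and
    # multicasts a query for every one of those misses to this cache.
    per_cache = (n - 1) * (1 - h) * r
    total = n * per_cache
    return per_cache, total


for n in (4, 8, 16):
    per_cache, total = icp_inquiries(n, h=0.25, r=1000)
    print(f"N={n:2d}: {per_cache:8.0f} per cache, {total:9.0f} total")
```

Going from 8 to 16 proxies takes the total from 42,000 to 180,000 inquiries, roughly a fourfold jump for a doubling of the mesh.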

How do they solve this problem? Well, it turns out that a Bloom filter is just the right tool for this:

We then propose a new cache sharing protocol called "summary cache." Under this protocol, each proxy keeps a compact summary of the cache directory of every other proxy. When a cache miss occurs, a proxy first probes all the summaries to see if the request might be a cache hit in other proxies, and sends a query messages [sic] only to those proxies whose summaries show promising results. The summaries do not need to be accurate at all times. If a request is not a cache hit when the summary indicates so (a false hit), the penalty is a wasted query message. If the request is a cache hit when the summary indicates otherwise (a false miss), the penalty is a higher miss ratio.
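The lookup side of summary cache can be sketched in a few lines: each peer's summary is a Bloom filter (here a Python int used as a bit array), and a proxy only sends queries to peers whose summary says "maybe". This is a hypothetical simplification, not the paper's protocol, which also has to handle how and when summaries are updated; the function names and the parameters m = 1024 bits and k = 4 hashes are assumptions.

```python
import hashlib
from typing import Dict, List, Set

M_BITS, K_HASHES = 1024, 4  # assumed summary size and number of hash functions


def bit_positions(url: str) -> List[int]:
    # k salted SHA-256 hashes, each reduced to a position in the m-bit summary.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode()).digest()[:8], "big") % M_BITS
        for i in range(K_HASHES)
    ]


def make_summary(cached_urls: Set[str]) -> int:
    """Build a peer's Bloom-filter summary of its cache directory."""
    summary = 0
    for url in cached_urls:
        for p in bit_positions(url):
            summary |= 1 << p
    return summary


def might_contain(summary: int, url: str) -> bool:
    return all((summary >> p) & 1 for p in bit_positions(url))


def peers_to_query(url: str, summaries: Dict[str, int]) -> List[str]:
    """On a local miss, pick only the peers whose summaries look promising."""
    return [peer for peer, s in summaries.items() if might_contain(s, url)]


summaries = {
    "proxyA": make_summary({"http://example.com/a", "http://example.com/b"}),
    "proxyB": make_summary({"http://example.com/c"}),
}
print(peers_to_query("http://example.com/a", summaries))
```

A false positive here just costs one wasted query message, and a stale summary can cause a false miss, matching the trade-off the paper describes.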

Bloom filters continued to spread in usage, and interesting varieties of Bloom filters started to emerge, such as Counting Bloom Filters, Compressed Bloom Filters, and Invertible Bloom Filters. In particular, Professor Michael Mitzenmacher of Harvard has spent many years collecting, studying, and improving upon our understanding of Bloom filters and their usage.
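To give a flavor of these variants, a counting Bloom filter replaces each bit with a small counter, so that items can be removed as well as added (something a plain Bloom filter can't do, since clearing a bit might erase evidence of another item). Here is a minimal Python sketch; the class name and the salted SHA-256 hashing are my own assumptions, and real implementations typically use small fixed-width counters (often 4 bits) rather than unbounded Python ints.

```python
import hashlib
from typing import List


class CountingBloomFilter:
    """Bloom filter variant with counters instead of bits, so items can be removed."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item: str) -> List[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counts[p] += 1

    def remove(self, item: str) -> None:
        # Only safe for items that were actually added: removing a never-added
        # item could decrement counters shared with other items and cause
        # false negatives. A zero counter proves the item isn't present.
        positions = self._positions(item)
        if not all(self.counts[p] > 0 for p in positions):
            raise KeyError(item)
        for p in positions:
            self.counts[p] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counts[p] > 0 for p in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("page-17")
print("page-17" in cbf)  # stored items are always reported present
cbf.remove("page-17")
print("page-17" in cbf)  # after removal its counters are back to zero
```

The price of deletion is space: each position now needs several bits instead of one.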

About 10 years ago, Mitzenmacher collaborated with Andrei Broder, who is now at Yahoo! Research but was then with Digital Equipment Corporation Research, to write the third paper, which is the best of the three papers I mention in this article. (Note that Broder was a co-author of the second paper as well.)

The Network Applications of Bloom Filters paper accomplishes two major tasks:

First, it provides a clear, modern, and complete description and analysis of the core Bloom filter algorithm, including most importantly the necessary mathematics to understand the probabilistic behaviors of the algorithm and how to adjust its various parameters.
Second, it provides a wide-ranging survey of a variety of different areas in which Bloom filters can be used, and have successfully been used.
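The heart of that mathematics is the standard approximation for the false positive rate of a filter with m bits, n items, and k hash functions, p ≈ (1 - e^(-kn/m))^k, which is minimized at k = (m/n) ln 2. Here is a small Python sketch of those formulas for choosing parameters; the function names are my own, and the results are approximations rather than exact probabilities.

```python
import math


def false_positive_rate(m_bits: int, n_items: int, k: int) -> float:
    """Standard approximation: p = (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n_items / m_bits)) ** k


def optimal_k(m_bits: int, n_items: int) -> int:
    """Number of hash functions minimizing p: k = (m/n) ln 2, rounded."""
    return max(1, round(m_bits / n_items * math.log(2)))


def bits_needed(n_items: int, target_fp: float) -> int:
    """Filter size for a target false positive rate (with optimal k): m = -n ln p / (ln 2)^2."""
    return math.ceil(-n_items * math.log(target_fp) / math.log(2) ** 2)


n = 1000
m = 8 * n  # 8 bits per item
k = optimal_k(m, n)
print(k, round(false_positive_rate(m, n, k), 4))
print(bits_needed(n, 0.01))  # roughly 9.6 bits per item for a 1% rate
```

With 8 bits per item the best choice is k = 6, giving a false positive rate of about 2%; every additional bit per item buys a further reduction, which is the knob the "Bloom filter principle" asks you to weigh.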

Most importantly, for people considering future use of Bloom filters, the paper notes that:

The theme unifying these diverse applications is that a Bloom filter offers a succinct way to represent a set or a list of items. There are many places in a network where one might like to keep or send a list, but a complete list requires too much space. A Bloom filter offers a representation that can dramatically reduce space, at the cost of introducing false positives. If false positives do not cause significant problems, the Bloom filter may provide improved performance. We call this the Bloom filter principle, and we repeat it for emphasis below.
The Bloom filter principle: Whenever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated.

So there you have it: three papers on Bloom filters. There is a lot more to talk about regarding Bloom filters, and hopefully I'll have the time to say more about these fascinating objects in the future. But this should be plenty to get you started.

If you only have time to read one paper on Bloom filters, read Broder and Mitzenmacher's Network Applications of Bloom Filters. If you have more time, also read the Summary Cache paper, and if you have gone that far I'm sure you'll take the time to dig up Bloom's original paper just for completeness (it's only 5 pages long, and once you've read Broder and Mitzenmacher, the original paper is easy to swallow).

Thursday, December 1, 2011

The Foundations discussions

There continues to be an active discussion over Mikeal Rogers's essay about Apache and git, which I wrote about a few days ago.

Here's (some of) what's been going on:

Simon Phipps wrote a widely read essay in Computerworld UK about the notion of an open source foundation, as distinct from open source infrastructure, and related the story of a project that suffered greatly because it hadn't established itself with the support of a larger entity:
the global library community embraced Koha and made it grow to significant richness. When the time came, the original developers were delighted to join a US company that was keen to invest in - and profit from - Koha. Everything was good until the point when that company decided that, to maximise their profit, they needed to exert more control over the activities of the community.
A detailed article at the Linux Weekly News website provides many more details of this story. Phipps's point is that some of these problems arose because the project's developers didn't engage in the open discussion of long-term project management that would have occurred had they hosted their project at one of the established Open Source foundations, such as Apache or the Software Freedom Conservancy.
Stephen O'Grady also wrote an essay on the difference between foundations and infrastructure acknowledging that "foundations who reject decentralized version control systems will fall behind", but further asserting that:
GitHub is a center of gravity with respect to development, but it is by design intensely non-prescriptive and inclusive, and thus home to projects of varying degrees of quality, maturity and seriousness.
[ ... ] GitHub, in other words, disavows responsibility for the projects hosted on the site. Foundations, conversely, explicitly assume it, hence their typically strict IP policies. These exclusive models offer a filter to volume inclusive models such as GitHub’s.
[ ... ] If you’re choosing between one project of indeterminate pedigree hosted at GitHub and an equivalent maintained by a foundation like Apache, the brand is likely to be a feature.
Mikeal Rogers, whose original essay kicked off the entire discussion, has since followed up with some subsequent thoughts about foundations and institutions:
Simon believes it is the job of an institution (in this case a foundation) to protect members from each other and from the outside world. In the case of legal liabilities this makes perfect sense. In the case of community participation this view has become detrimental.
If you believe, as I do, that we have undertaken a cultural shift in open source then you must re-examine the need for institutional governance of collaboration. If the values we once looked to institutions like Apache to enforce are now enforced within the culture by social contract then there is no need for an institution to be the arbiter of collaboration between members.
Ben Collins-Sussman, a longtime Apache member, chimes in with his thoughts on the value of the Apache Software Foundation, pointing to its explicit codification of "community":
the ASF requires that each community have a set of stewards (“committers”), which they call a “project management committee”; that communities use consensus-based discussions to resolve disputes; that they use a standardized voting system to resolve questions when discussion fails; that certain standards of humility and respect are used between members of a project, and so on. These cultural traditions are fantastic, and are the reason the ASF provides true long-term sustainability to open source projects.
Jim Jagielski, another longtime Apache member, adds his thoughts, observing that it is important to not get caught up in statistics about popularity, adoption rate, etc., but to concentrate on communities, culture, and communication aspects:
The ASF doesn't exist to be a "leader"; it doesn't exist to be a "voice of Open Source"; it doesn't exist to be cool, or hip, or the "place to be" or any of that.
[ ... ]
It exists to help build communities around those codebases, based on collaboration and consensus-based development, that are self-sustaining; communities that are a "success" measured by health and activity, not just mere numbers.
Ceki Gülcü, a longtime Open Source Java developer, observes that what one person sees as consensus and meritocratic collaboration, another might see as endless discussion and fruitless debate:
Apache projects cannot designate a formal project leader. Every committer has strictly equal rights independent of past or future contributions. This is intended to foster consensus building and collaboration ensuring projects' long term sustainability. Worthy goals indeed! However, one should not confuse intent with outcome. I should also observe that committer equality contradicts the notion of meritocracy which Apache misrepresents itself as.
As I have argued in the past, the lack of fair conflict resolution or timely decision making mechanisms constitute favorable terrain for endless arguments.

It seems to be a fairly fundamental debate: some believe that the open source foundations provide substantial benefit, others feel that they reflect a time that no longer exists, and are no longer necessary.

Overall, it's been a fascinating discussion, with lots of viewpoints from lots of different perspectives.

I'll continue to be interested to follow the debate.

Tools and Utilities for Windows

Scott Hanselman has posted a voluminous annotated list of the tools and utilities that he uses for developing software.

Most of these tools are specific to Windows 7, and more precisely to developing Web applications using Microsoft tools such as Visual Studio, C#, and .NET.

Still, it is a tremendous list, and if you're looking for a tool or utility for your personal development environment, there are a lot of references to chase here.