Journal of a Programmer: March 2012

Saturday, March 31, 2012

30 days has September, April, June, and November ...

... but it looks like it's already April 1 in Australia?

Friday, March 30, 2012

Online reviews

When it comes to online reviews, there are a variety of opinions, and you can pick and choose.

You could side with me: my feeling is that most of the people who are motivated enough to bother to review something online are those who have an axe to grind, so I immediately discount almost all negative reviews.
You could read the conclusions of some sophisticated and detailed research into the question of why reviews might decline due to various events.
Or you could just laugh, and say to yourself: once again, Randall Monroe nails it!

Thursday, March 29, 2012

I don't know what the game will be like ...

... but the video is certainly quite pretty.

It's not clear that anybody else knows what the game will be like, either.

Guess we'll just have to wait and see!

Wednesday, March 28, 2012

Every day you get in your life is precious

Don't waste a single one.

Tuesday, March 27, 2012

Last year, it took a full hour!

Google’s Massive Developer Event Sells Out in 20 Minutes.

Bill Slawski's Top Ten SEO Patents list

A few months back, I pointed to Bill Slawski's excellent series on SEO patents. Slawski has continued work on the series, and recently wound it up with a final 10th article.

Slawski takes a few moments to explain why he sees the study of patents as valuable, particularly in this particular sub-field of the computing industry, noting that patents "offer tantalizing hints of a jagged bigger picture, like picture puzzle pieces that don’t necessarily always fit together."

Patents aren't the only thing that Slawski writes about; his search engine observations are fascinating and informative. I particularly appreciate that he routinely illustrates his articles with detailed and compelling examples, as that makes them much easier to follow.

If you're interested in understanding how search engines work, Slawski's site is a great resource.

To tie up, here's the complete "Ten Most Important SEO Patents" series:

Part 1 – The Original PageRank Patent Application

Part 2 – The Original Historical Data Patent Filing and its Children

Part 3 – Classifying Web Blocks with Linguistic Features

Part 4 – PageRank Meets the Reasonable Surfer

Part 5 – Phrase Based Indexing

Part 6 – Named Entity Detection in Queries

Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables

Part 8 – Assigning Geographic Relevance to Web Pages

Part 9 – From Ten Blue Links to Blended and Universal Search

Part 10 – Just the Beginning

Monday, March 26, 2012

Cliff Click's deep dive into Constant Propagation techniques

Cliff Click has published the third episode of his description of lattice-building techniques for constant propagation in optimizing compilers:

The series starts off with a promise:

I’ve been wrestling with some tricky bugs and I’d thought I’d share the pain. For fun, I’m going to toss in evidence that All Your Existing Optimizing Compilers Are Broken, and least they are if they attempt a data-flow style Constant Propagation on a language with a class hierarchy and a distinguished NULL pointer (e.g. Java, C++).

The meat of the series is the explanation of the general approach to constant propagation, using a lattice data structure to describe various important classes of partially-known values.

The series closes by finally getting to the originally-promised bug:

The Constant Propagation algorithm relies on having a lattice – and lattices are defined as having a unique lower-bound (and unique upper-bound) for any two elements. This structure does NOT have a unique lower-bound, and so is no longer a lattice! For some particular Java program and some particularly bad luck with our CP algorithm – the CP algorithm can hang indefinitely, or miss some constants on some runs – but find those constants on other runs for the same Java program. Garbage-In Garbage-Out. CP is technically wedged.

It's heavy going, but Click does his typical superb job of description and explanation, and if you've spent much time studying compilers, you'll find this well worth your time. The articles are laden with references to background information, so if you find you want to learn more about Partial Evaluation, or Sparse Conditional Constant Propagation, or Constant Folding, you'll have plenty to read and pursue.

By the way, it appears that Click has left Azul Systems, and has founded a new startup: 0xdata. Big data systems are all the rage nowadays, and 0xdata seems to have all the right buzzwords:

H20 brings database like interactiveness to Hadoop. It can be installed as a standalone or on top of existing Hadoop installation. It leverages data in HDFS and other familiar components in the Hadoop ecosystem such as Hive and Pig.

Click's high end performance bona fides are as robust as they get, so it should be interesting to follow the 0xdata work...

A great example of the power of generators

Sophisticated programming language concepts are often hard to explain, and so it's great when you find something simple, clear, and immediately persuasive.

For example, generators are one of those things that are hard to grasp.

So I really enjoyed Ned Batchelder's elegant and simple example. It has just enough complexity to make the solution real-world, it is motivated by a common and realistic problem, and it demonstrates an elegant use of the feature.

As many of the commenters point out, there are other possible solutions, but that's not the point! The point is to illustrate the use of generators without descending into programming language theory or resorting to pages and pages of code.

Anyway, nice article, Ned!

I have no idea what it will be like to play ...

... but it sure looks like a beautiful game!

Sunday, March 25, 2012

I'm really enjoying Dan Boneh's online cryptography class

I've finished the second week of Professor Dan Boneh's online cryptography class, available at http://www.crypto-class.org.

The video lectures are exceptionally clear and well presented. I've found it best to watch them through twice: I watch an entire week's videos from start to finish once, then, later, I go back and re-watch each of the videos in sequence.

The problem sets have been well-constructed, and the first week's programming assignments were challenging, but solvable, and solving them gave me some great insights into the material. For the second week, there were no programming questions, just a problem set.

On my first attempt at the second week's problem set, I got one of the questions wrong, and decided to re-take that problem set. I was very impressed to see that, on the second attempt, the overall questions were essentially the same, but the candidate answers for the multiple-choice questions were different! Furthermore, in the cases where the candidate answers were the same, they were shuffled around, so I couldn't just remember that, e.g., "I picked the second choice on problem 3", but actually had to re-solve each question again. That made re-taking the problem set much more rewarding.

A friend asked me what I had learned so far. I think it's these two things:

There are good (strong, sound, legitimate) cryptographic techniques, and bad (weak, flawed, broken) cryptographic techniques, and it isn't trivial to tell the difference just by glancing at them. You really have to understand the concepts and do the analysis and work through the details in order to see.
Cryptography has a mystique of being unapproachable and complex, but if it is well-presented (as this material is), a dedicated student can comprehend it.

The third week of lectures is now online, so I know what I'll be doing next week!

Saturday, March 24, 2012

BATS IPO Algorithmic Snafu?

So, um, what happened with the BATS IPO?

Electronic stock exchange operator BATS Global Markets was forced to abort its US$80m IPO after embarrassing technical difficulties on its own exchange tarnished its planned debut.

It all happened so fast, it seems like nobody's really sure what happened.

"As a result of this systems issue, three erroneous trades occurred on the BATS BYX Exchange in Apple Inc. (APPL), one of which caused a volatility halt in that stock," the company said in a statement. "The erroneous trades were broken under BATS' clearly erroneous trade policy."

I guess there's erroneous, and then there's clearly erroneous?

What we know for sure is that something happened, and then we took action.

"In the wake of today's technical issues, which affected the trading of certain stocks, including that of BATS, we believe withdrawing the IPO is the appropriate action to take for our company and our shareholders," said Joe Ratterman, CEO of BATS Global Markets.

ZeroHedge says this is just another case where "an algo very obviously went insane".

We wish to thank BATS exchange (which is no longer content with busting the NBBOs in the US, it has now also decided to go after Chi-X in Europe) for allowing this vivid demonstration of a mini flash crash in action, in which there is absolutely no fat finger, no Greek parliament to blame the crash on, and is merely a function of a busted HFT algo which ends up making a mockery of the price discovery process.

The generally sober and sensible Felix Salmon tries to make sense of it all:

They’re not going to try that stunt again in a hurry; as finance professor James Angel told Bloomberg, this was “like seeing an airplane crash on takeoff”. On the maiden flight of a new airline. You can imagine how much appetite anybody would have to fly that airline thereafter.

And Bloomberg agrees, the way stocks trade in America doesn’t work

A Bats computer that matches orders in companies with ticker symbols in the A-BFZZZ range “encountered a software bug related to IPO auctions” at 10:45 a.m. New York time, the company said in an e-mailed statement explaining its trading problems. The glitch made existing customer orders for those symbols unavailable for trading, the document said.

Aw, what the heck, it's springtime, who cares if nobody understands how the financial system works anymore...

Friday, March 23, 2012

The complicated task of trying NOT to keep up with the times

Check out this wonderful essay by Princeton graduate student Benjamin Schmidt: The Foreign Language of 'Mad Men'.

Using modern linguistic analysis tools, such as Google's Ngram analysis, Schmidt studies the language used in shows such as Mad Men, trying to see how authentic the language is.

Mostly it is quite accurate, but there are slip ups, some of them glaring:

There are scores of idioms that are strikingly modern. "Feel good about," "match made in heaven," "tough act to follow," "make eye contact," "fantasize about"; all are at least tenfold more common today than in Mad Men's times. Any of these individually might be perfectly plausible; but for "feel good about," for example, to be said four separate times over the course of the show by several different characters is extraordinarily unlikely.

But other anachronisms are extremely subtle:

What seems to be the most ubiquitous mistake in Mad Men is so frequent as to be invisible: the phrase "I need to." Modern scripts set in 1960s, including Mad Men, use it constantly: it's about as frequent as everyday words like "good," "between," or "most." But to say "I need to" so much is a surprisingly modern practice: books, television shows, and movies from the 1960s use it at least ten times less often, and many never use it all. Sixties dialogue written back then used "ought to" far more often than modern imitators do.

It's a very interesting approach, combining computer analysis with old-fashioned linguistic study, and is a fun read.

Special bonus Mad Men link: if you're revving up for Season 5, take this quick 7 minute refresher course on the first 4 seasons, courtesy of the Slate Audio/Visual Club!

Wednesday, March 21, 2012

It's not just a game...

... it's industrial art.

Chjade84 convinced an Automated Laser Corporation 20 watt fiber laser to play "Still Alive," Jonathan Coulton's epic anthem for Valve's video-game Portal; as a lagniappe, the laser performs this feat while carving Valve's logo into a stainless steel plate.

Watch the video.

Tuesday, March 20, 2012

Internet scale, administrations, and operations

I happened across two different "Internet scale" articles over the weekend that are related and worth considering together.

The first one is a few years old, but it had been a while since I'd read it, and I happened to re-read it: On Designing and Deploying Internet-Scale Services. This paper was written by James Hamilton when he was part of the MSN and Windows Live teams at Microsoft, and in the paper he discusses a series of "lessons learned" about building and operating "Internet scale" systems.

The paper is rich with examples and details, rules of thumb, techniques, and approaches. Near the front of the paper, Hamilton distills three rules that he says he learned from Bill Hoffman:

Expect failures. A component may crash or be stopped at any time. Dependent components might fail or be stopped at any time. There will be network failures. Disks will run out of space. Handle all failures gracefully.

Keep things simple. Complexity breeds problems. Simple things are easier to get right. Avoid unnecessary dependencies. Installation should be simple. Failures on one server should have no impact on the rest of the data center.

Automate everything. People make mistakes. People need sleep. People forget things. Automated processes are testable, fixable, and therefore ultimately much more reliable. Automate wherever possible.

You can read more about Hoffman's approach to reliable Internet scale systems in this ACM Queue article.

These are three superb rules, even if they are hard to get right. Of course, the first step to getting them right is to get them out there, in front of everybody, to think about.

More recently, you don't want to miss this great article by Jay Kreps, an engineer on the LinkedIn team: Getting Real About Distributed System Reliability. Again, it's just a treasure trove of lessons learned and techniques for designing and building reliability into your systems from the beginning.

I have come around to the view that the real core difficulty of these systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. This is quite different from the view of unbreakable, self-healing, self-operating systems that I see being pitched by the more enthusiastic NoSQL hypesters.

Both articles remind me of the Chaos Monkey that the Netflix team use in their development process.

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.

As Hamilton's article points out, the emphasis on failure handling and on failure testing builds on the decades old work of Professors David Patterson and Armando Fox in their Recovery Oriented Computing and Crash-only Software efforts.

Crash-only programs crash safely and recover quickly. There is only one way to stop such software – by crashing it – and only one way to bring it up – by initiating recovery. Crash-only systems are built from crash-only components, and the use of transparent component-level retries hides intra-system component crashes from end users.

When these ideas were first being proposed they faced a certain amount of skepticism, but now, years later, they are accepted wisdom, and the stability and reliability of systems like Amazon, Netflix, and LinkedIn is testament to the fact that these techniques do in fact work.

Friday, March 16, 2012

Followup, followup, followup

I love a good follow-up. There's almost always more to a story than first meets the eye, and if the story is an important one, often there is much more to the story.

Here's some recent follow-ups that caught my interest:

Annie Lowrey, a reporter for Slate, got interested in the story of "Why the Lucky Stiff", and has written a fascinating follow-up on his story: Where's _why?
On Aug. 19, 2009, his personal site stopped loading. He stopped answering email. A public repository of his code disappeared. His Twitter account—gone. Hackety Hack—gone. Dozens of other projects—gone.
It's a fascinating story, and Lowrey does a great job of following it up and bringing it up to date.
Megan Garber, a reporter for The Atlantic, got interested in an epic post that appeared on Quora, and decided to learn more about what inspired and motivated Tim Morgan to write that post: The Story Behind That 9,000-Word Quora Post on Airplane Cockpits.
In response to the three-sentence Quora question, Morgan, a web developer and private pilot, proceeded to take Quora users -- and anyone else who happened to come across his post -- on an extended journey: through the mechanical workings of the airplane cockpit, through the intricate dance that takes place between pilot and plane, and, in general, through the small miracle of precision and power that makes us fragile little humans able to fly.
Garber's nice article tracks down Morgan and learns more about who he is and why he took the time to write such an interesting article.
Ira Glass, the host and producer of public radio's This American Life, has announced that the show has entirely retracted their episode covering Mike Daisey and the Apple/Foxconn story: Retraction
Regrettably, we have discovered that one of our most popular episodes was partially fabricated. This week, we devote the entire hour to detailing the errors in "Mr. Daisey Goes to the Apple Factory," Mike Daisey’s story about visiting Foxconn, an Apple supplier factory in China. Rob Schmitz, a reporter for Marketplace, raises doubts on much of Daisey's story.
Meanwhile, Daisey responds to the retraction:
What I do is not journalism. The tools of the theater are not the same as the tools of journalism. For this reason, I regret that I allowed THIS AMERICAN LIFE to air an excerpt from my monologue. THIS AMERICAN LIFE is essentially a journalistic - not a theatrical - enterprise, and as such it operates under a different set of rules and expectations.
The Apple/Foxconn story is obviously continuing to develop; I'm pleased that journalists around the world are going over it closely, and I'll be watching as the story develops further.

Perfectly named for your career

Names are wonderful, and sometimes they fascinate me. Are these two fellows just perfectly named for their careers?

Two days after special-teams star Blake Costanzo signed with the Bears, the 49ers signed his replacement Thursday in 10-year veteran Rock Cartwright, the NFL Network reported.

It's just like the game was screen-written, like I was following characters from a movie. Is Rock Cartwright the fourth son of Ben Cartwright, joining Adam, Hoss, and Little Joe (and wouldn't Hoss Cartwright be a perfectly named fullback?).

And how about Blake Costanzo? Surely he must be related to Tony Blundetto, or Sal "Big Pussy" Bonpensiero? (And who would want to tackle Big Pussy Bonpensiero in the open field, after all?

It does make me wonder if, in the wonderfully stage-managed world of professional sports, we're nearing the point where naming consultants will be providing athletes with advice on choosing their football names...

Thursday, March 15, 2012

rands on hacking

Michael Lopp, who writes as Rands in Repose is, I think, the greatest observer of the realities of life in the software profession. If you want to know what the software industry is like, his book is a great place to start.

The latest article on the Rands in Repose site, Hacking is Important, is a good one, for it's all about the birth of ideas and the death of organizations.

Rands is talking about the emergence of new ideas into an existing world:

I believe bringing anything new into the world is a disruptive act. By being novel and compelling, the new is likely to replace something else and that something else isn’t being replaced without a fight.

He describes the obstacles that the new has to overcome to displace the old:

Pick any company, read its history, and I’m pretty sure there will be a well-documented origin story that will define its beginning and involves someone building something new and possibly of unexpected value. What isn’t documented is the story of every moment before where everyone surrounding the hacker asked, “Why the hell are doing you that?”, “Why would you take the risk with so little reward?”, or “Why are you wasting your time?”

And he talks about companies that try to establish a culture that encourages the exploration of new ideas (e.g., Google's famed "20% time"):

Facebook’s letter documents its core values: focus on impact, move fast, be bold, be open, and build social value.

His basic suggestion is to "institutionalize" the practice of allowing talented, creative employees to freely investigate new ideas:

Someone, somewhere has a bright idea and a handful of talented engineers are whisked off to a different building behind a locked door. Their status is “elsewhere” and their project is “need to know.”

This is a well-known approach, perhaps most famous in its form as Kelly Johnson's Skunk Works team of World War II. The core idea is that these people can move faster and make more progress if they aren't burdened by all the bureaucratic obstacles that the company has accumulated over the years:

The folks who create process care about control, and they use politics to shape that control and to influence communications.

Of course, "process" is what you call it when you perceive it as an obstacle; "coordination" is what you call it when you perceive it as teamwork. Given the number of mistakes I've made in my career, I'm pretty solidly in the camp that prefers the support and infrastructure of a complete team; reviews catch bad ideas, design documents force you to think your work through, and testing exposes oversights and poor implementation.

One thing I think that Rands underestimates is the amount of ill-will and discord that such institutionalization creates. I've been on those hacking teams, and I've been not on those hacking teams, and I think that Rands soft-pedals it when he says:

Yes, there is internal jealousy about the teams performing the wizardry that resulted in products like the iPad, the iPhone, and AppleTV. There are people wondering, Why wasn’t I invited to the hacking? Yes, this did create some elitism, ...

In my experience, the amount of jealousy, elitism, favoritism, rivalry, and discord that this sort of technique creates is substantial, and hard to expunge once it's arrived. The loss of trust is severe, and can be debilitating, and the effects can last for years. The bitterness can be cancerous.

Still, the basic technique is proven and well-known: the Skunk Works team, of course, delivered success after success during their run, and the successes of Google's 20% projects are well-documented as well.

One thing about the Google 20% approach is that it's more egalitarian (at least as I understand it); everybody is allowed to try to develop their own revolutionary ideas. There are plenty of people who will tell you, though, that most 20% projects go nowhere, and those that succeed do so due to many non-technical factors.

I don't have a better suggestion than these, and clearly it is preferable to have your employees working on their new idea as part of your organization, rather than forcing them to quit and move elsewhere in order to work on something new.

But as you implement those hacking teams in your shop, be careful to ensure that you aren't just creating a culture of the "cowboys" who can do whatever they want versus the "good citizens" who keep the lights on and the streets clean.

Wednesday, March 14, 2012

Mapping the past

Imagine what California may have looked like, 200 years ago.

Tuesday, March 13, 2012

Mark your calendars!

Stephen Hawking to guest star on 'Big Bang Theory'

The Google I/O 2012 website is live

The information is still being filled in, but the Google I/O 2012 website is now live. Lots of organizational information is available, but I don't see the technical sessions agenda just yet.

As you'll recall, previous versions of this conference have been pretty hot topics.

I'll be keeping an eye on the conference site to see what's being discussed this year.

Software in the courts

Here's an interesting essay with a viewpoint on software patents that you don't often get.

I sat in a conference room with my co-founders and a couple of patent attorneys and told them what we’d created. They took notes and created nonsensical documents that I still can’t make sense of.

It would be hard to believe, if I hadn't heard this exact story, in almost identical words, from multiple different engineers that I trust deeply.

I thought I was giving them a shield, but turns out I gave them a missile with my name permanently engraved on it.

The only people getting rich off this are the lawyers.

Meanwhile, in other similar, but unrelated, software-in-the-courts news, Judge William Alsup just entered an order setting April 16, 2012 as the definitive trial date for Oracle v. Google at the United States District Court for the Northern District of California.

Monday, March 12, 2012

Great gaming post

This is the most epic game article you will read this year.

Warning: Some rather adult language in the article.

Sunday, March 11, 2012

Link dumping, March 2012 edition

Once again, it seems like there's way more going on than there's time to read.

But here's some quick link-dumping on some of the things that have been occupying me recently:

Two great posts from the Netflix team about the practical aspects of fault tolerance in distributed systems: part one, ant part two
Each service that’s wrapped by a circuit breaker implements a fallback using one of the following three approaches:
1. Custom fallback - in some cases a service’s client library provides a fallback method we can invoke, or in other cases we can use locally available data on an API server (eg, a cookie or local JVM cache) to generate a fallback response
2. Fail silent - in this case the fallback method simply returns a null value, which is useful if the data provided by the service being invoked is optional for the response that will be sent back to the requesting client
3. Fail fast - used in cases where the data is required or there’s no good fallback and results in a client getting a 5xx response. This can negatively affect the device UX, which is not ideal, but it keeps API servers healthy and allows the system to recover quickly when the failing service becomes available again.
Greg Kroah-Hartman re-caps the history of the 2.6.32 Linux kernel
It lived (in my hands) for 823 days (only the 2.6.16 kernel lived longer at 855 days). It ended up containing 3,349 patches that matched up with patches in Linus's kernel tree, the most of any stable kernel release by far (2.6.16 only had 991 patches, the next closest was 2.6.27 with 1,596 patches, 3.0 already has 1448 patches after 133 days, so it might end up being the largest eventually.)
The 2.6.32 kernel was the basis of all of the enterprise distros of the time, still running, and will still supported by the major enterprise Linux vendors for many years in the future, so it will live on.
Joshua Gans takes an independent look at the question of whether there is any substance to the government's investigation of e-book pricing collusion
Apple is not a book retailer. It doesn’t want to work out the price of books to consumers. It thinks publishers can do that. Now Amazon and other book retailers are, of course, in a different position. They might have information they can use there.
Jeff Atwood really nails it with a superb, link-rich, words-from-experience article on How to Hire a Programmer If your organization can follow even a fraction of this advice, you'll do much better at selecting new unknown talent. However, be warned: this is hard work, and Atwood's suggestions will take effort to implement. But it will be worth it, for programming talent is one of the hardest things to assess.
Meanwhile, in one of those places where they're trying to ready some of that new talent, Michael Mitzenmacher rather tongue-in-cheekily observes that it's getting harder to find a safe day to hold an exam, as there are too many parties on the agenda.
And Derek Jones suggests perhaps students of programming languages would be better served if they spent less time trying to invent new languages and instead worked on improving existing programming languages.
Lastly, there are just way too many interesting videos to watch:
- Andrew Tridgell reflects on the 20th anniversary of Samba
- Ralf Langner, the world's foremost expert on the fascinating Stuxnet computer virus, explains the technical details in his Stuxnet Deep Dive
- And all the sessions from this year's C++ Going Native conference are online

Darn these short weekends! I needed that extra hour! Well, as the Bare-Naked Ladies sing, "Who needs sleep?"

Thursday, March 8, 2012

Are some algorithms simply too hard to implement correctly?

I recently got around to reading a rather old paper: McKusick and Ganger: Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem.

The paper describes a sophisticated enhancement to the 4.4BSD FFS (Fast FileSystem) algorithms that not only improves performance dramatically, but also eliminates the need to run fsck after every system crash.

The core of the algorithm is to carefully track dependencies between different update requests that the filesystem makes:

With soft updates, the ﬁlesystem uses delayed writes (i.e., write-back caching) for metadata changes, tracks dependencies between updates, and enforces these dependencies at write-back time. Because most metadata blocks contain many pointers, cyclic dependencies occur frequently when dependencies are recorded only at the block level. Therefore, soft updates tracks dependencies on a per-pointer basis, which allows blocks to be written in any order. Any still-dependent updates in a metadata block are rolled-back before the block is written and rolled-forward afterwards. Thus, dependency cycles are eliminated as an issue. With soft updates, applications always see the most current copies of metadata blocks and the disk always sees copies that are consistent with its other contents.

It's one of those algorithms that's fairly easy to state but devilishly intricate to implement properly, and the McKusick and Ganger paper is all about implementation.

The core concept of dependency tracking and write ordering is very sound and reliable, and the resulting filesystem has seen enormous practical success. At my day job, we use a similar, though vastly simpler, dependency tracking and write ordering algorithm in the server database, and it, too, has been reliable and has seen enormous practical success.

However, as Val Aurora points out, it's an interesting observation that the basic soft updates algorithm is simultaneously well-known and infrequently used:

If a file system discussion goes on long enough, someone will bring up soft updates eventually, usually in the form of, "Duhhh, why are you Linux people so stupid? Just use soft updates, like BSD!" Generally, there will be no direct reply to this comment and the conversation will flow silently around it, like a stream around an inky black boulder. Why is this? Is it pure NIH (Not Invented Here) on the part of Linux developers (and Solaris and OS X and AIX and...) or is there something deeper going on? Why are soft updates so famous and yet so seldom implemented?

As Aurora notes, the details involved in the implementation turn out to be quite intricate:

I've read this paper at least 15 times, and each time I when get to page 7, I'm feeling pretty good and thinking, "Yeah, okay, I must be smarter now than the last time I read this because I'm getting it this time," - and then I turn to page 8 and my head explodes.

I understand the exploding head issue; the McKusick and Granger paper is dense and detailed. And yet, it seems to me to be almost perfectly written:

The concepts are explained clearly, accurately, and thoroughly.
The implementation complexities are covered
The paper describes the data structures as well as their use

It's really hard to imagine the Soft Updates algorithm being any more clearly described or documented; the paper is nearly ideal. Yet, as Aurora concludes:

when it come to implementation, only programmers with deep, encyclopedic knowledge of the file system's on-disk format can derive the correct ordering rules and construct the associated data structures and web of dependencies. The close coupling of on-disk data structures and their updates and the user-visible data structures and their updates results in weird, counter-intuitive behavior that must be covered up with additional code.
Overall, soft updates is a sophisticated, insightful, clever idea - and an evolutionary dead end. Journaling and copy-on-write systems are easier to implement, require less special-purpose code, and demand far less of the programmer writing to the interface.

How has that assessment fared over the three years since Aurora wrote her article? According to this article from last winter, Linux filesystem engineers remain convinced that Soft Updates are too complex to implement, and other techniques, such as the Journaling or copy-on-write approaches mentioned by Aurora, are still the preferred techniques. As Ted Ts'o observes

The *BSD's didn't get advanced features such as Extended Attribute until some 2 or 3 years after Linux. My theory why is that it required someone as smart as Kirk McKusick to be able to modify UFS with Soft Updates to add support for Extended Attributes and ACL's.

I am comfortable conceding that, if Ted Ts'o feels that a particular filesystem algorithm is too complex, it is too complex.

So it seems that the time for Soft Updates has come and gone. If nothing else, it is extremely informative to study this work, ant it is also interesting to reflect on the rise and fall of the Soft Updates algorithm.

What are other good examples of algorithms that arose, came to be considered to be state of the art, then faded as newer algorithms came along? It's clear that Cryptography is full of such algorithms, but Soft Updates are interesting in that they're from an area that is perhaps less well-known.

Tuesday, March 6, 2012

The Stanford online cryptography class is up and running!

The website for Professor Dan Boneh's online cryptography course is now operational, and the materials are starting to be posted!

Yay!

Hmmm, I guess I better start studying!

A fine use of the web

Here's a fun site: Fan Chants, as profiled in the New York Times.

Monday, March 5, 2012

RSA 2012 has come and gone

The annual RSA security conference has now come and gone. I didn't attend, but read some of the coverage.

This conference has become quite large and established, but it seems very commercial in tone, as opposed to some of the more hacker-oriented security conferences. You'll notice, looking at the list of speakers, that the talks are generally given by executives from large companies.

Which is not bad, but it means that you're getting a lot of corporate-speak, not a lot of cutting-edge research, as opposed to a conference such as SyScan, say.

Anyway, one interesting part of the RSA conference was the annual blogger awards, where I picked up a bunch of new blogs to follow, including

Ever more to read and learn!

Saturday, March 3, 2012

How will the field of software analysis emerge?

The software field is still very young; I was lucky enough to enter it when it was just beginning, and I'm still alive and actively practicing. I think there are very few fields of study which are so young and immature, and yet so incredibly active.

In most other fields, new students are introduced to the field through the study of classics. Literature students study Shakespeare; physics students cut their teeth on Newton's work; artists study daVinci; musicians learn to perform and comprehend Mozart.

The software field still seems to do very little of this. There is the occasional work such as Beautiful Code, or Coders at Work or the much earlier Literate Programming, but these works are few and far between, and I don't get the feeling that they are viewed as successful, or that they are finding enduring usage in students of software.

I expect that at, some point, software education will incorporate a greater "study of the classics" aspect, although clearly the field must still mature in order to even comprehend what "the classics" might be.

When that time comes, I believe that it will be incredibly valuable to have detailed version control histories of those great works, because, in clear distinction to the works I mentioned at the start (Shakespeare, Newton, etc.), great software is almost never a single-person effort, and almost never emerges fully-formed.

Great software is written by great teams, over years or decades, haltingly and painstakingly, with trial and error, bug and fix, initial raw idea and later refined variation.

I hope that tools like those we make at my day job will be able, not only to endure long enough to be part of this eventual emergence of software study as a field, but also to attract bodies of software that are maintained in that SCM system long enough to record all those design decisions and intermediate versions.

At work, we have the entire history of Perforce in Perforce itself, and I can crawl through the history of Perforce itself, going back 17 years, and see the evolution of the product and its concepts, read the ideas, both good and bad, of those who worked on the software before me, and incorporate all of that experience into my own work.

It's an incredibly valuable tool, and I'm not sure how many people realize how valuable this can be until it's too late. Most young students, I believe, get started in programming working on homework assignments and class projects. Their work goes from start to finish in a matter of days or weeks, and they rarely if ever return to it. I rather doubt that students bother checking their work into an SCM system, but I'd love to hear otherwise.

I don't have any clever way to close this post, I just wanted to record some ideas that I was thinking about as I was lifting weights the other day...

Friday, March 2, 2012

Satan's Trifecta

One of the eternal complexities of computing is how to handle dates and times. Practice makes perfect, so goes the saying; a corollary to that saying is that if you only do something rarely, you don't get very good at it. Leap years are a good example, as they only occur once every four years (or so), which is not very often in the greater scheme of things, and means that they nearly always give rise to bugs.

And, indeed, this week brought news that a leap year bug took down the Internet (well, not the entire Internet, but a fairly significant piece of it).

Figuring out how to accurately handle dates and times in a computer is no easy task. People have been struggling with this problem for decades, and are continuing to struggle with it. For a recent example of some detailed and thorough thinking about dates and times, you might want to check out Jon Skeet's Noda Time, a .NET version of the well-known Joda Time library. Joda Time is rich and detailed and has lots of great documentation; Noda Time has some nice documentation, too, such as this page on Core Concepts and this page on choosing the correct datatype.

Of course, these are relatively recent libraries, and build on the work of others over decades. For a superb and encyclopedic treatment of dates and times in computing, you'll want to visit Paul Eggert's master page. It's an incredible resource; you could spend months just reading through all the reference material collected from that site. To cite just a few, you'll find a link to Gilbert Healton's lengthy article: The Best of Dates, The Worst Of Dates, as well as the National Institute of Standards and Technology's gorgeous history of timekeeping: A Walk through Time.

Eggert is an Internet legend for collecting, analyzing, cataloging, and providing all this information. But we nearly lost it all last fall when a lawsuit caused Eggert and his colleague Arthur David Olson to take down their web databases. Happily, the Electronic Frontier Foundation stepped in, and provided legal counsel, so access to these resources remains available.

I tend to think that this is one of those areas where Open Source makes a lot of sense. These are terribly detailed and intricate libraries, and it's really to everyone's advantage to have solid, reliable, accurate, and dependable date/time computations in every piece of software they use, so to me it makes perfect sense for us all to collaborate on a single library of shared software for these datatypes.

So if you might have felt tempted to scoff at this week's Internet outages, wondering how we computer programmers could mess up such a simple concept, rest assured that it's much more complicated than you might originally think, and consider becoming familiar with the extensive and excellent work that has already taken place in this area.

And, of course, when you're done with that, take a break and watch a show!

Thursday, March 1, 2012

I love style guides

Style guides have a long and distinguished past. Classic examples include Strunk & White's The Elements of Style, and The Chicago Manual of Style (can't resist that chance to mention my Alma Mater...).

When programming languages started to become moderately sophisticated, style guides soon followed; Ben Schneiderman's The Elements of Fortran Style was one of the first (it's reached its 40th birthday!), followed 5 years later by Brian Kernighan's The Elements of Programming Style.

I wasn't ever a Fortran programmer (though I was once a PL/1 programmer), but I read both those books cover to cover, several times.

Anyway, I'm a junkie for style guides in general, and for programming language style guides in particular. Nowadays they are a dime a dozen, but I still perk up when I hear of a new one and I go check it out.

So I was intrigued to see, recently, that the engineering team at Twitter had shared their Scala programming guide: Effective Scala. I haven't had a chance to program in Scala, but it's a very interesting language and I got a better feel for the language by reading through the Twitter style guide.

While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.

I like the fact that the Twitter team recognize that programs are written to be read by human beings:

Scala provides many tools that enable succinct expression. Less typing is less reading, and less reading is often faster reading, and thus brevity enhances clarity. However brevity is a blunt tool that can also deliver the opposite effect: After correctness, think always of the reader.

You can never tell what will pop up when you read programming texts; there is always something new to learn, and it may be hiding around the next corner. In this case, I stumbled upon The Cake Pattern, which was a programming pattern I hadn't heard of before.

The neat thing is that we can here switch between different implementations of the services (if we had defined an interface trait and multiple implementations). But even more interestingly, we can create multiple “worlds” or “environments” by simply composing the traits in different combinations.

It's a nifty technique, though I haven't yet figured out why it's called "The Cake Pattern". I guess I'll go read the paper.

Thank you, Twitter engineering team, for sharing your style guide, it was most interesting!