Journal of a Programmer: November 2011

Wednesday, November 30, 2011

Wings

The following sounds like a description of an airplane, like something you might hear from somebody like Burt Rutan:

when we addressed the wing, we started with a complicated rule, to limit what a designer could do. We added more and more pieces as we thought of more and more outcomes, and we came to a point where it was so complicated—and it was still going to be hard to control, because the more rules you write the more loopholes you create – that we reverted to a simple principle. Limit the area very accurately, and make it a game of efficiency.

But it's not from Rutan at all; it's an excerpt from Wings, the Next Generation, an article discussing the sailboats to be used in next summer's America's Cup qualification matches.

Now, everybody knows that sails and airplane wings have a great deal in common, so it really isn't surprising that this sounds like aerospace design. However, as Paul Cayard notes in the article, the wings on a competition sailboat have a few special constraints:

the America’s Cup rules don’t allow stored power, so two of our eleven guys—we think, two—will be grinding a primary winch all the race long. Not to trim, but to maintain pressure in the hydraulic tank so that any time someone wants to open a hydraulic valve to trim the wing, there will be pressure to make that happen.

It will be fascinating to see these boats in person, racing on the bay, but I'm glad I won't have to be one of those grinders!

Tuesday, November 29, 2011

Apache, Subversion, and Git

Over the long weekend, a number of people seem to have picked up and commented on Mikeal Rogers's essay about Apache and its adoption of the source code control tool, Git. For example, Chris Aniszczyk pointed to the essay, and followed it up with some statistics and elaboration. Aniszczyk, in turn, points to a third essay (a year old), by Josh Berkus, describing the PostgreSQL community's migration to git, and a fourth web page describing the Eclipse community's migration to git. (Note: Both Eclipse and PostgreSQL migrated from CVS to git.)

I find the essays by Rogers and Aniszczyk quite puzzling, full of much heat and emotion, and I'm not sure what to take from them.

Rogers seems to start out on a solid footing:

For a moment, let's put the git part of GitHub on the back burner and talk about the hub.
On GitHub the language is not code, as it is often characterized, it is contribution. GitHub presents a person to person communication system for contributions. Documentation, issues, and of course code, travel between personal repositories.
The communication medium is the contribution itself. Its value, its merit, its intention, all laid naked for the world to see. There is no hierarchy or politic embedded in the system. The creator of a project has a clear first mover advantage but the possibility is always there for its position to be supplanted by a fork, creating a social imperative to manage contributions in a satisfactory manor [sic] to her community.

This is all well-written and clear, I think. But I don't understand how this is a critique of Apache. In my seven years of experience with the Derby project at Apache, this is exactly how an Apache software project works:

Issues are raised in the Apache issue-tracking system;
discussion is held in the issue comments and on mailing lists;
various contributors suggest ideas;
someone "with an itch to scratch" dives into the problem and constructs a patch;
the patch is proposed by attaching it to the issue-tracking system;
further discussion and testing occurs, now shaped by the concrete nature of the proposed patch;
a committer who becomes persuaded of the desirability of the patch commits it to the repository;
eventually a release occurs and the change becomes widely distributed.

This is the process as I have seen it and participated in it, since back in 2004, and, I believe, was how it was done for years before that.

So what, precisely, is it that Apache is failing at?

Here is where Rogers's essay seems to head into the wilderness, starting with this pronouncement:

Many of the social principles I described above are higher order manifestations of the design principles of git itself.
[ ... ]
The problem here is less about git and more about the chasm between Apache and the new culture of open source. There is a growing community of young new open source developers that Apache continues to distance itself from and as the ASF plants itself firmly in this position the growing community drifts farther away.

I don't understand this at all. What, precisely, is it that Apache is doing to distance itself from these developers, and what does this have to do with git?

Rogers offers as evidence this email thread (use the "next message by thread" links to read the thread), but from what I can tell, it seems like a very friendly, open, and productive discussion about the mechanics of using git to manage projects at Apache, with several commenters welcoming newcomers into the community and encouraging them to get involved.

This seems like the Apache way working successfully, from what I can tell.

Aniszczyk's follow-on essay, unfortunately, doesn't shed much additional light. He states that "what has been happening recently regarding the move to a distributed version control system is either pure politicking [sic] or negligence in my opinion."

So, again, what is it that he is specifically concerned about? Here, again, the essay appears to head into the wilderness. "Let's try to have some fun with statistics," says Aniszczyk, and he presents a series of charts and graphs showing that:

git is very popular
lots of job sites, such as LinkedIn, are advertising for developers who know git
There is no 3.

At this point, Aniszczyk says "I knew it was time to stop digging for statistics."

But again, I am confused about what he finds upsetting. The core message of his essay appears to be:

The first is simple and deals with my day job of facilitating open source efforts at Twitter. If you’re going to open source a new project, the fact that you simply have to use SVN at Apache is a huge detterent [sic] from even going that route.
[ ... ]
All I’m saying is that it took a lot of work to start the transition and the eclipse community hasn’t even fully completed it yet. Just ask the PostgreSQL community how quick it was moving to Git. The key point here is that you have to start the transition soon as it’s going to take awhile for you to implement the move (especially since Apache hosts a lot of projects).

Once again, I'm lost. Why, exactly, is it a huge deterrent to use svn? And why, exactly, does Apache need to convert its existing projects from svn to git? Just because LinkedIn is advertising more jobs that use git as a keyword? That doesn't seem like a valid reason, to me.

Note that, as I mentioned at the start of this article, the PostgreSQL team migrated from CVS to git, not from Subversion to git. I can completely understand this. The last time I used CVS was in 2001, 10 full years ago; even at that time, CVS had some severe technical shortcomings and there was sufficient benefit to switching that it was worth the effort. So I'm not at all surprised by the PostgreSQL community's decision. The article by Berkus, by the way, is definitely worth reading, full of wisdom about platform coverage, tool and infrastructure support, workflow design, etc.

So, to summarize (as I understand it):

PostgreSQL and Eclipse are migrating from CVS to git, successfully (although it is taking a significant amount of time and resources)
Apache is working to integrate git into its policies and infrastructure, but still uses Subversion as its primary scm system
Some people seem to feel like Apache is making the wrong decision about this

But what I don't understand, at the end of it all, is in what way this is opposed to "the Apache way". From everything I can see, the Apache way is alive and well in these discussions.

UPDATE: Thomas Koch, in the comments, provides a number of substantial, concrete examples in which git's powerful functionality can be very helpful. The most important one that Thomas provides, I think, is this:

It is much easier to make a proper integration between review systems, Jenkins and Jira, if the patch remains in the VCS as a branch instead of leaving it.

I completely agree. Working with patch files in isolation is substantially worse than making reference to a branched change that is under SCM control. Certainly in my work with Derby I have seen many a contributor make minor technical errors while manipulating a patch file, which on the whole just adds friction to the overall process. Good point, Thomas!

Monday, November 28, 2011

Burton Bloom and the now-forgotten Computer Usage Company

Burton Bloom's original paper on Bloom Filters is entitled Space/Time Trade-offs in Hash Coding with Allowable Errors, and his by-line is given as

Burton H. Bloom
Computer Usage Company, Newton Upper Falls, Massachusetts

with the additional parenthetical note that

Work on this paper was in part performed while the author was affiliated with Computer Corporation of America, Cambridge, Massachusetts.

Now, I'm quite familiar with Computer Corporation of America; I was an employee of theirs from 1985-1988, and I vividly remember my days working in the 4 Cambridge Center building.

But that was 15 years after Bloom's paper was published, and when I was there, I don't recall anything about "Computer Usage Company".

What was Computer Usage Company?

Unfortunately, a web search reveals only the slightest details:

There is, of course, a Wikipedia page
There is a Facebook page
and there is a short entry at the Computer History Museum website

But there is very little else. George Trimble's homepage no longer exists, and most of the links from the existing summary pages at Wikipedia and elsewhere point to articles in the IEEE Annals of the History of Computing, which (like Bloom's original paper at the ACM site) is protected behind a paywall and can't be read by commoners.

Computer Usage Company is credited with being "the world's first computer software company", but it seems on the verge of disappearing into dust. It's a shame; you'd think the software industry would work harder to keep information about these early pioneers alive.

I wonder if the IEEE keeps any statistics regarding how many people have actually paid the $30 to purchase this 20-year-old, five-page memoir? I would have been intrigued to read it; I might even have paid, say, $0.99 or something like that to get it on my Kindle. But thirty dollars?

Saturday, November 26, 2011

Ho-hum, just an 11-1 season

It's amazing to me that Stanford are, at this point, clinging to hopes for a BCS at-large bid. Should it really be this hard to get two Pac-12 teams into the BCS? I guess that the SEC are still hoping they will field 3 teams in the 10-team BCS schedule...

Quote-un-Quote: 50 interviews of indie game developers

Here's a great body of work: "Fifty Independent Videogame Developers; Fifty Interviews; Fifty Weeks".

The interviewer, who goes by the handle "moshboy", describes the intent of the project here:

all I wanted to do was get some words of insight out of a few independent videogame developers that weren’t known to put many of their own words ‘out there’. In the beginning, the idea was to interview those that had rarely or never been interviewed before.

His project succeeded, and produced a fascinating body of work:

Sometimes the quotes are a snapshot of a developer’s mindset from a certain time period, while most lean toward quoting some insight from their thoughts regarding videogame development.

The complete set of interviews is here.

Since I'm unfortunately not familiar with most of these developers, I found that a fun way to approach the work was just to scroll around in the list and randomly pick an interview.

Great job, moshboy, and thanks not only for embarking on the project and carrying it through, but for publishing the results for us all!

Friday, November 25, 2011

Distributed set difference computation using invertible Bloom filters

Recently I've been slowly but steadily working my way through a meaty but rewarding recent paper entitled: What's the Difference? Efficient Set Reconciliation without Prior Context.

The subject of the paper is straightforwardly expressed:

Both reconciliation and deduplication can be abstracted as the problem of efficiently computing the set difference between two sets stored at two nodes across a communication link. The set difference is the set of keys that are in one set but not the other. In reconciliation, the difference is used to compute the set union; in deduplication, it is used to compute the intersection. Efficiency is measured primarily by the bandwidth used (important when the two nodes are connected by a wide-area or mobile link), the latency in round-trip delays, and the computation used at the two hosts. We are particularly interested in optimizing the case when the set difference is small (e.g., the two nodes have almost the same set of routing updates to reconcile, or the two nodes have a large amount of duplicate data blocks) and when there is no prior communication or context between the two nodes.

The paper itself is well-written and clear, and certainly worth your time. It's been particularly rewarding for me because it's taken me down a path of investigating a lot of new algorithms that I hadn't previously been studying. My head is swimming with

Invertible Bloom Filters (a variation on counting Bloom filters, which in turn are a variation on basic Bloom filters, an algorithm that is now 40 years old!)
Tornado codes
Min-wise sketches
Characteristic Polynomial Interpolation
Approximate Reconciliation Trees

and many other related topics.

I hope to return to discussing a number of these sub-topics in later posts, whenever I find the time (heh heh). One of the things that's challenging about a lot of this work is that it's based on probabilistic algorithms, which take some getting used to. I first studied these sorts of algorithms as an undergraduate in the early 1980s, but they still throw me when I encounter them. When studying probabilistic algorithms, you often encounter sections like the following (from the current paper):

The corollary implies that in order to decode an IBF that uses 4 independent hash functions with high probability, then one needs an overhead of k + 1 = 5. In other words, one has to use 5d cells, where d is the set difference. Our experiments later, however, show that an overhead that is somewhat less than 2 suffices.

The question always arises: what happens to the algorithm in those cases where the probabilities fail, and the algorithm gives the wrong answer (a false positive, say)? I believe that, in general, you can often structure the overall computation so that in these cases the algorithm still gives the correct answer, but does more work. For example, in the deduplication scenario, you could perhaps structure things so that the set difference code (which is trying to compute the blocks that are identical in both datasets, so that they can be eliminated from one set as redundant and stored only in the other set) fails gracefully on a false positive. Here, when two blocks that are in fact distinct collide in the data structure and hence appear to be identical, the overall algorithm would need to treat them as distinct and retain them in both datasets.

That is, the algorithm could be designed so that it errs on the side of safety when the probabilities cause a false positive to be returned.
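To make the "err on the side of safety" idea concrete, here is a minimal sketch using a plain Bloom filter rather than the paper's data structure. Everything in it is an assumption made for illustration: the `BloomFilter` class, the SHA-256-based hashing, and the `is_duplicate` helper with its exact `stored_blocks` lookup are hypothetical names, not anything taken from the paper. The point is only that a "no" from the filter is definite, while a "maybe" gets confirmed by an exact comparison, so a false positive costs extra work but never causes a distinct block to be discarded.

```python
import hashlib


class BloomFilter:
    """A basic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits: int, k: int):
        self.bits = bits
        self.k = k
        self.array = bytearray(bits)  # one byte per bit, for clarity only

    def _positions(self, item: bytes):
        # Derive k positions by salting SHA-256 with the hash index
        # (an illustrative choice, not a tuned hash family).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.array[p] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.array[p] for p in self._positions(item))


def is_duplicate(block: bytes, seen: BloomFilter, stored_blocks: set) -> bool:
    """Hypothetical safe wrapper around the probabilistic test."""
    if not seen.might_contain(block):
        return False  # definite answer: never stored
    # "Maybe" answer: confirm exactly, so a false positive only costs a lookup.
    return block in stored_blocks
```

With a deliberately tiny filter (say 8 bits), `might_contain` returns true for many blocks that were never added, yet `is_duplicate` still never reports a block as redundant unless it really is stored.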

Alternatively, some probabilistic algorithms instead fail entirely with very low probability, but fail in such a way as to allow the higher-level code to either simply re-try the computation (if it involves random behaviors, then with high probability it will work the next time), or to vary the computation in some crucial aspect, to ensure that it will succeed (which is the case in this particular implementation).

Most treatments of probabilistic algorithms describe these details, but I still find it important to always keep them in my head, in order to satisfy myself that such a probabilistic algorithm is safe to deploy in practice.

Often, the issue in using probabilistic algorithms is to figure out how to set the parameters so that the algorithm performs well. In this particular case, the issue involves estimating the size of the set difference:

To efficiently size our IBF, the Strata Estimator provides an estimate for d. If the Strata Estimator over-estimates, the subsequent IBF will be unnecessarily large and waste bandwidth. However if the Strata Estimator under-estimates, then the subsequent IBF may not decode and cost an expensive transmission of a larger IBF. To prevent this, the values returned by the estimator should be scaled up so that under-estimation rarely occurs.

That is, in this particular usage of the probabilistic algorithms, the data structure itself (the Invertible Bloom Filter) is powerful enough that the code can detect when it fails to be decoded. Using a larger IBF solves that problem, but we don't want to use a wastefully large IBF, so the main effort of the paper involves techniques to compute the smallest IBF that is needed for a particular pair of sets to be diff'd.
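Here is a rough sketch of how I understand that detect-and-grow behavior, written as a toy Invertible Bloom Filter over 64-bit integer keys. It is not the paper's implementation: the class and function names, the use of SHA-256, the choice of 3 hash functions, and the simple "double the table and try again" loop (standing in for the Strata Estimator) are all my own assumptions for illustration. Both filters are also built locally here, whereas in the real protocol one of them would arrive over the network.

```python
import hashlib

CHECK_SALT = 255  # salt reserved for the per-key checksum (assumes k < 255)


def _hash(key: int, salt: int) -> int:
    """64-bit hash of a key (0 <= key < 2**64) under a one-byte salt."""
    data = bytes([salt]) + key.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class IBF:
    def __init__(self, cells: int, k: int = 3):
        self.k = k
        self.region = max(1, cells // k)  # one region per hash function,
        self.m = self.region * k          # so a key's k cells are distinct
        self.count = [0] * self.m
        self.key_sum = [0] * self.m
        self.hash_sum = [0] * self.m

    def _cells(self, key: int):
        return [i * self.region + _hash(key, i) % self.region
                for i in range(self.k)]

    def _update(self, key: int, delta: int) -> None:
        check = _hash(key, CHECK_SALT)
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= check

    def insert(self, key: int) -> None:
        self._update(key, 1)

    def subtract(self, other: "IBF") -> "IBF":
        """Cell-wise difference; keys present in both sets cancel out."""
        out = IBF(self.m, self.k)
        for i in range(self.m):
            out.count[i] = self.count[i] - other.count[i]
            out.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            out.hash_sum[i] = self.hash_sum[i] ^ other.hash_sum[i]
        return out

    def decode(self):
        """Peel "pure" cells until stuck. Mutates self; returns (ok, only_a, only_b)."""
        only_a, only_b = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                pure = (self.count[i] in (1, -1)
                        and self.hash_sum[i] == _hash(self.key_sum[i], CHECK_SALT))
                if pure:
                    key, sign = self.key_sum[i], self.count[i]
                    (only_a if sign == 1 else only_b).add(key)
                    self._update(key, -sign)  # remove the key from all its cells
                    progress = True
        # Decoding succeeded only if nothing is left in the table.
        ok = not (any(self.count) or any(self.key_sum) or any(self.hash_sum))
        return ok, only_a, only_b


def set_difference(a: set, b: set, estimate_d: int,
                   overhead: int = 2, max_cells: int = 1 << 20):
    """Try an IBF sized from the estimate; on decode failure, double and retry."""
    cells = max(3, overhead * estimate_d)
    while cells <= max_cells:
        fa, fb = IBF(cells), IBF(cells)
        for key in a:
            fa.insert(key)
        for key in b:
            fb.insert(key)
        ok, only_a, only_b = fa.subtract(fb).decode()
        if ok:
            return only_a, only_b
        cells *= 2  # the failure was detected, so pay for a larger table
    raise RuntimeError("difference did not decode within max_cells")
```

The important property is in `decode`: if the table is too small, peeling gets stuck with non-empty cells, and the caller learns that the answer is unavailable rather than receiving a wrong one. The cost of under-estimating is the retry, which is exactly why the paper cares so much about estimating the difference well.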

If you're interested in studying these sorts of algorithms, the paper is well-written and straightforward to follow, and contains an excellent reference section with plenty of information on the underlying work on which it is based.

Meanwhile, while wandering through Professor Eppstein's web site, I came across this nifty Wikipedia book on data structures that he put together as course material for a class. Great stuff!

Thursday, November 24, 2011

Stanford crypto class

I'm not sure how this will turn out, but I've signed up for Professor Dan Boneh's online Cryptography class, which starts this winter.

Wednesday, November 23, 2011

The recent events at U C Davis and U C Berkeley

I mostly avoid political topics on my blog, but the current events on the University of California campuses are very important and need more attention. Here is a superb essay by Professor Bob Ostertag of U.C. Davis about the events of the last week, and a follow-up essay discussing ongoing events.

Meanwhile, it's interesting that some of the most compelling and insightful commentary is being published outside the U.S., for example this column and this column in the Guardian.

I don't know what the answers are. But I do know that the debate is important, and I salute the Davis and Berkeley communities for not backing down from the questions, and for opening their minds to the need to hold that debate, now. Our universities, and our children, are our future.

Monday, November 21, 2011

Danah Boyd on privacy in an online world

It's somewhat of a shock to realize that it's been more than a decade since Scott McNealy made his famous pronouncement on online privacy:

You have zero privacy anyway. Get over it.

Well, people haven't actually just got over it. It's an important, complex, and intricate issue, and happily it is getting the sort of attention it needs.

So you should set aside a bit of time, and dig into some of the fascinating work that danah boyd has published recently, including:

A detailed analysis of the impact of the Children's Online Privacy Protection Act: “Why Parents Help Their Children Lie to Facebook About Age: Unintended Consequences of the ‘Children’s Online Privacy Protection Act’” in the online journal First Monday,
and her remarks prepared for the Wall Street Journal: Debating Privacy in a Networked World for the WSJ

Both articles are extremely interesting, well-written, and deeply and carefully considered. Here's an excerpt from the WSJ discussion:

The strategies that people use to assert privacy in social media are diverse and complex, but the most notable approach involves limiting access to meaning while making content publicly accessible. I’m in awe of the countless teens I’ve met who use song lyrics, pronouns, and community references to encode meaning into publicly accessible content. If you don’t know who the Lions are or don’t know what happened Friday night or don’t know why a reference to Rihanna’s latest hit might be funny, you can’t interpret the meaning of the message. This is privacy in action.

And here's an excerpt from the First Monday article:

Furthermore, many parents reported that they helped their children create their accounts. Among the 84 percent of parents who were aware when their child first created the account, 64 percent helped create the account. Among those who knew that their child joined below the age of 13 — even if the child is now older than 13 — over two–thirds (68 percent) indicated that they helped their child create the account. Of those with children who are currently under 13 and on Facebook, an even greater percentage of parents were aware at the time of account creation. In other words, the vast majority of parents whose children signed up underage were involved in the process and would have been notified that the minimum age was 13 during the account creation process.

As Joshua Gans notes in a great essay on Digitopoly, this is not an easy situation for a parent to be in, and the stakes are actually quite high:

And there are actually many reasons why I would want to allow her to do that. First and foremost, this is the opportunity for me to monitor her interactions on Facebook — requiring she be a friend at least for a few years. That allows me some access and the ability to educate. Second, all of her friends were on Facebook. This is where tween interactions occur. Finally, I actually think that it is the evolving means of communication between people. To cut off a child from that seems like cutting them off from the future.

I can entirely sympathise; my wife and I had similar deep discussions about these questions with our children (although at the time it was MySpace and AOL, not Facebook).

They are your kids; you know them best. In so many ways, Facebook is just another part of life that you can help them with, like all those other temptations of life (drugs, sex, etc.). Talk to them, tell them honestly and openly what the issues are, and why it matters. Keep an eye on what they are doing, and let them know you'll always be there for them.

There are no simple answers, but it's great that people like boyd and Gans are pressing the debate, raising awareness, and making us all think about what we want our modern online world to be like. Here's boyd again:

We must also switch the conversation from being about one of data collection to being one about data usage. This involves drawing on the language of abuse, violence, and victimization to think about what happens when people’s willingness to share is twisted to do them harm. Just as we have models for differentiating sex between consenting partners and rape, so too must we construct models that that separate usage that’s empowering and that which strips people of their freedoms and opportunities.

This isn't going to be easy, but it's hard to think of anything that is more important than the way in which people talk with each other.

So don't just "get over it". Think about it, research it, talk about it, and help ensure that the future turns out the way it should.

Following up on Jonathan's Card

This morning, the O'Reilly web site is running a condensed interview with Jonathan Stark, discussing, with the benefit of several months of hindsight, the intriguing "Jonathan's Card" events of the summer.

If you didn't pay much attention to Jonathan's Card as it was unfolding in real time, this is a good short introduction, with a summary of the events and some links to follow-up material.

Friday, November 18, 2011

The science of Maverick's

Here's a wonderful multi-media piece diving deep into the earth science behind the surfing marvel that is Maverick's. Enjoy!

Thursday, November 17, 2011

The Lewis Chessmen at NYC's Met

Here's a nice story in the New York Times about the Lewis Chessmen, a 1000-year-old set of carved walrus tusk chess pieces, on exhibit at the Metropolitan Museum of Art in New York City.

Too bad I'm on the wrong side of the country; I'd love to see these. Unfortunately, according to the Met website,

After the showing in New York, they will return to London.

I guess I'll just have to figure out a way to travel to see them in their permanent home at the British Museum!

Tuesday, November 15, 2011

Software Patents, Microsoft, Android, and Barnes & Noble

If you have any interest at all in the software industry, you'll be absolutely fascinated to read this detailed article at the Groklaw website about the legal dispute between Microsoft and Barnes & Noble over Android-related patents.

It is well-known that Microsoft claims that Android infringes on Microsoft's patents; Microsoft themselves explain this on their website, saying they "simply cannot ignore infringement of this scope and scale", and that:

The Microsoft-created features protected by the patents infringed by the Nook and Nook Color tablet are core to the user experience.

and

Our agreements ensure respect and reasonable compensation for Microsoft's inventions and patent portfolio. Equally important, they enable licensees to make use of our patented innovations on a long-term and stable basis.

However, what has never been known (until now) is precisely what those patented innovations are. As Mary-Jo Foley observed more than 6 months ago, Microsoft refuses to identify the patents, or to explain why it believes Android infringes upon them, unless a non-disclosure agreement is signed agreeing not to reveal that information.

Barnes & Noble apparently refused to sign that agreement, and instead found counsel to represent them, and now the information about the patents in question is no longer a secret.

According to the Barnes & Noble filings, the primary Microsoft patent that Android is claimed to infringe is a 16-year-old patent (U.S. Patent 5,778,372), which covers:

A method of remotely browsing an electronic document residing at a remote site on a computer network and specifying a background image which is to be displayed with the electronic document superimposed thereon comprising in response to a user's request to browse to the electronic document.

Apparently, changing the background on your screen when a document is displayed is patented.

I understand software quite well.

I don't understand law at all, and specifically I don't understand intellectual property law.

However, I find the Groklaw analysis of the Barnes & Noble v. Microsoft dispute absolutely fascinating.

Friday, November 11, 2011

Now THAT'S a data center!

Here's a fun story about the Switch Communications "SUPERNAP" data center in Nevada. Switch claims it is "the world's best data center" and they have the stats to justify their claim.

These Internet-scale datacenters have really taken off in recent years. Last month the Open Compute community held their second Open Compute Summit, and part of that effort was the establishment of a foundation to guide the work as it moves forward; read more about that effort here. I haven't seen too much technical information flowing from the Open Compute Summit, although James Hamilton of Amazon posted his slides online here.

Meanwhile (was this part of the summit, or independent?), the team at AnandTech have done some independent testing of the Open Compute server components; in their conclusion, they commend the Open Compute work as showing tremendous potential:

The Facebook Open Compute servers have made quite an impression on us. Remember, this is Facebook's first attempt to build a cloud server! This server uses very little power when running at low load (see our idle numbers) and offers slightly better performance while consuming less energy than one of the best general purpose servers on the market. The power supply power factor is also top notch, resulting in even more savings (e.g. power factoring correction) in the data center.
While it's possible to look at the Open Compute servers as a "Cloud only" solution, we imagine anyone with quite a few load-balanced web servers will be interested in the hardware. So far only Cloud / hyperscale data center oriented players like Rackspace have picked up the Open Compute idea, but a lot of other people could benefit from buying these kind of "keep it simple" servers in smaller quantities.

Lastly, since much of the activity in this area of computing has to do with power efficiency, let me draw your attention to this interesting work on power management in Android.

Cheaper, faster, and more power-efficient: the future of computing beckons!

Wednesday, November 9, 2011

Coarse language in professional writing

Scott Hanselman shares his feelings about the use of coarse language.

Zach Holman responds.

Ted Dziuba says the real issue is passion and honesty versus marketing and publishing.

I can see both sides. Nothing about the language that Holman uses (or that Heinemeier Hansson does, for that matter) gets under my skin; perhaps I'm just thicker-skinned than many.

But meanwhile I know many readers who are put off by such things.

And so in my own writing I do my best to avoid such.

But I agree that you should (a) write about what you care about, and (b) write in your own words, not in the words that you think others want you to speak.

I guess I'm not adding much to the conversation, but there you go: a few pointers to some interesting articles and a bit of an observation by me.

Tuesday, November 8, 2011

Systems Software

Here's a simple 3-step formula for determining if you are a systems software engineer (or are destined to be one):

Go to this year's agenda for the High Performance Transaction Systems workshop.
Scroll down the page, and see if you feel an irresistible urge to click on every single link, read every single slide of every single presentation, and follow every single reference.
There is no step 3.

Monday, November 7, 2011

24 hours at Fukushima

This month, the IEEE's Spectrum magazine publishes its special report, Fukushima and the Future of Nuclear Power. It's an immense and detailed report, with multiple articles, multi-media presentations, and lots of material to dig through.

A good place to start is the lead article, 24 Hours at Fukushima. This article summarizes the events of the critical first 24 hours after the earthquake, with a focus on specific events and actions that seem like they represent learning opportunities.

The article has all sorts of fascinating details about the events of that day, such as the fact that it was hard to bring emergency equipment to the site when the roads were full of evacuees headed away from the site; the observation that, after the earthquake but prior to the tsunami, an emergency cooling system was intentionally shut down because it was cooling the reactor too fast; the detail that, once power went out, electric security locks on building doors and fences first had to be broken before emergency equipment could be moved through them; and the observation that, when reactor 1 exploded, debris from the explosion ripped through the emergency backup power cable that had been installed to bring emergency power back to the plant.

And much, much more. There are so many small breakdowns, and decisions, and implications, that can be considered and thought about and studied.

The article notes that many of the subsequent problems arose from the fact that backup electrical power was lost, and could not be restored, and suggests several lessons that should be learned, including various ways to ensure that backup electrical power would be less likely to be lost, more likely to be subsequently restored, and perhaps even less likely to be needed; specifically, the article calls out 6 "lessons":

LESSON 1: Emergency generators should be installed at high elevations or in watertight chambers.
LESSON 2: If a cooling system is intended to operate without power, make sure all of its parts can be manipulated without power.
LESSON 3: Keep power trucks on or very close to the power plant site.
LESSON 4: Install independent and secure battery systems to power crucial instruments during emergencies.
LESSON 5: Ensure that catalytic hydrogen recombiners (power-free devices that turn dangerous hydrogen gas back into steam) are positioned at the tops of reactor buildings where gas would most likely collect.
LESSON 6: Install power-free filters on vent lines to remove radio-active materials and allow for venting that won't harm nearby residents.

Not all of these lessons seem self-evident to me; for example, it seems like the recommendation to store backup power trucks "very close to the power plant site" would simply have left those trucks vulnerable to the same event that took out the main building power systems. As we know, the 14-meter tsunami washed away much larger and more resilient structures than backup power trucks.

Still, the lessons seem well-meant and clearly point out starting points for the discussions to come about improvements and enhancements. I love lesson 2 in particular, as it points out one of those "obvious in hindsight" mistakes that clearly represents an opportunity for all operators in every such site to review their similar equipment and ensure that it doesn't suffer from the same design flaw.

Engineering is hard. Things happen that you didn't expect, and you have to study your mistakes, learn from them, explore alternatives, test systems, and revise, revise, revise. As the article notes, we've learned a great deal from the tragedy at Fukushima, and we need to continue to learn more.

I'm not a nuclear engineer, but I am immensely grateful for efforts such as this one, which help us interested lay-people come to grips with what happened, and why, and what it means, and how we can make it better in the future.

Certainly, it makes me more motivated to return to my own designs, and to study them, and test them, and continue to learn from my own failures and make my future work better. I recommend this article highly; I think you'll find it interesting and well worth your time.

Perforce 2011.1 is out!

Yay! The 2011.1 release of the Perforce server is ready! Read more about it at the Perforce site.

This is the second major release since I joined the team, and I'm quite excited about it. Although I played only a minor role in this release, I had a chance to get involved in many of the new features, and there are some really powerful enhancements in this release.

Engineers love releases (really, we do!): the whole point of writing software is to build something that gets used, and in order for it to get used, it has to get released. So even though a release is a whole lot of work, the result is that a new version of the product becomes available, and gets used, and that's always exciting!

So, congratulations, Perforce, on the 2011.1 release!

Sunday, November 6, 2011

Epic Win for Anonymous

I recently read Cole Stryker's Epic Win for Anonymous: How 4chan's army conquered the web. Well, that's somewhat over-stating the accomplishment: I skimmed it; I waltzed through it; I breezed over it.

This is the sort of book that takes you about 2 solid hours to read, if you try hard. And I'm not exactly sure why you would try hard, because it isn't really a book that rewards that. It is a very transparent book: it sets a simple goal, and it achieves it, completely:

If you've ever wondered, while browsing the web, "Why is this weird thing popular? Who cares about this stuff? How does this thing have so many views? Why do people waste their time with this? Where did it come from and where is it all going?" then read on.

Stryker's book succeeds: it helps you understand the concept of Internet memes; it shines a little light into the odd, strange corners of the Internet; it gives you some context for approaching some of the aspects of Internet life that probably seemed, if not downright horrifying, at least hard to comprehend:

What are memes?
Why is anonymity such a big deal on the Internet?
What are griefers, trolls, noobs?

If you've never heard of Anonymous, 4chan, lolcats, Rule 34, Star Wars Kid, or Encyclopedia Dramatica, then you should probably just pass this book by; its subject matter will be of no interest to you.

But if you've heard of those topics, yet been slightly intimidated, and slightly unsure of how to proceed, then you might find this book helpful: it de-mystifies many of those lesser-known areas of the Internet, sets them out in plain terms and simple descriptions, and gives you at least enough knowledge to decide for yourself whether you want to know more.

As I reflected on the book, and tried to understand what I had learned, and how to summarize it, I found myself drawn to a particular passage. Stryker is describing an old (1986) computer game called Habitat, which was an early investigation of human-versus-human gaming:

One contentious game play element in Habitat was "Player vs. Player" or "PvP" killing. Experienced players were able to handily murder noobs, which made the game less fun for everyone but those who'd been there the longest. In addition, the very concept of virtual murder was controversial. It didn't take long for trolls to start randomly killing other players as they wandered around the virtual town. But if the engineers were to disallow PvP killing entirely, they would rob players of the thrill of danger and the joys of conquest. The moderators held a poll, asking if killing should be allowed in Habitat. The results were split 50/50. So they compromised. Killing would be disallowed inside the carefully manicured urban areas, but the moment you left town and headed out into the frontier, you were announcing to other players that you were down to scrap, if need be. This clever solution pleased most players, and continues to be the standard for many massively multiplayer games.
So will the Internet continue to look. Those who value safety over freedom will hang out on Facebook and other proprietary communities and mobile apps walled off with identity authentication. And those willing to brave the jungles of the open Internet will continue to spend time in anonymous IRC channels and message boards like 4chan.

It's an interesting metaphor, and I think it's insightful. In a new world, it's important to have a discussion about rules. And to have that discussion, there has to be a certain amount of discussion about where (and when) the rules apply. As Stryker notes:

/b/ is significant because it's the only board on 4chan that has no rules (the only thing prohibited is committing or plotting actual crimes, the same rules that apply to any public forum on or offline).

Actually, as it turns out, there are more rules than these, but to a certain extent, in order to understand the rules, you have to be a member of the community.

The Internet is still young, and we are still learning how we want to behave in this new cyberspace. Places like 4chan, although almost certainly not your cup of tea, are still worth understanding and thinking about, and Stryker's book is a step toward opening the discussion and having those debates.

Four above 2800!

As Dylan McClain observes in a column in today's NY Times, the current FIDE chess rating list now includes four players with ratings of 2800 or higher:

Magnus Carlsen, rating 2826
Viswanathan Anand, rating 2811
Levon Aronian, rating 2802
Vladimir Kramnik, rating 2800

This is great news, and really emphasizes not just the excellence that is currently manifest in the top levels of chess, but also its growing spread, as the top list includes players from several countries not previously known for chess prowess (Carlsen is from Norway, Anand from India, and Wang Hao from China is in the top twenty as well).

But McClain goes on to claim:

But ratings inflation — caused in part by looser rules guiding them — makes it difficult to compare different eras.
The ratings system was actually never intended for such comparisons. It was created in 1960 by Arpad Elo, a physics professor, as a snapshot of each player’s ability and a tool for predicting games’ outcomes. The system has been tweaked over the years, but it has held up well.

McClain provides no evidence for this claim, which is a shame, as from what little I know, the evidence in fact shows entirely the opposite. As I described in a short blog post last summer, a fairly detailed study by students at the University at Buffalo recently concluded that

there has been little or no ‘inflation’ in ratings over time—if anything there has been deflation. This runs counter to conventional wisdom, but is predicted by population models on which rating systems have been based

Regardless of whether or not the ratings are being inflated, there is no doubt in my mind that today's chess players are playing some wonderful chess. As we look toward next year's World Chess Championship, there is lots of reason to be excited about the world of chess!
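
As an aside, for readers curious how a rating works as "a tool for predicting games' outcomes", here is a minimal sketch in Python. It assumes the commonly quoted logistic form of the Elo expected-score formula; the function name is my own, and official FIDE calculations may differ in detail, so treat it as an illustration rather than the federation's method.

```python
def expected_score(rating_a, rating_b):
    """Expected score for player A against player B (1 = win, 0.5 = draw).

    Uses the commonly quoted logistic Elo formula with a 400-point scale.
    This is an illustrative assumption, not FIDE's official procedure.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings taken from the list above.
    carlsen, kramnik = 2826, 2800
    print(f"Carlsen vs. Kramnik: {expected_score(carlsen, kramnik):.3f}")
    print(f"Kramnik vs. Carlsen: {expected_score(kramnik, carlsen):.3f}")
```

Under this formula, a 26-point gap gives the higher-rated player an expected score of only about 0.54 per game, which is one way to see how tightly packed the top of the list is.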

Friday, November 4, 2011

They're fish!

I really enjoyed this fabulous interview with William Gibson in The Paris Review.

Gibson, of course, is one of the greatest science fiction writers ever, the man who coined the term "cyberspace", who gave us (so far) nine spectacular novels, with hopefully more coming.

What will you learn if you go read the interview? Well, all sorts of things!

You'll learn about Gibson's fascinating writing techniques: never planning past the first sentence, constantly re-working and re-considering his story:
Every day, when I sit down with the manuscript, I start at page one and go through the whole thing, revising freely.
...
I think revision is hugely underrated. It is very seldom recognized as a place where the higher creativity can live, or where it can manifest. I think it was Yeats who said that literary revision was the only place in life where a man could truly improve himself.
letting the work flow from someplace hard to describe:
I’ve never had any direct fictional input, that I know of, from dreams, but when I’m working optimally I’m in the equivalent of an ongoing lucid dream. That gives me my story, but it also leaves me devoid of much theoretical or philosophical rationale for why the story winds up as it does on the page. The sort of narratives I don’t trust, as a reader, smell of homework.
You'll learn about Gibson's view on whether science fiction writers are writing about the future, the past, or the present:
Nobody can know the real future. And novels set in imaginary futures are necessarily about the moment in which they are written. As soon as a work is complete, it will begin to acquire a patina of anachronism. I know that from the moment I add the final period, the text is moving steadily forward into the real future.
...
all fiction is speculative, and all history, too—endlessly subject to revision.
You'll learn what Gibson, surprisingly, finds to be the technology that is most characteristic of the human species:
Cities look to me to be our most characteristic technology. We didn’t really get interesting as a species until we became able to do cities—that’s when it all got really diverse, because you can’t do cities without a substrate of other technologies. There’s a mathematics to it—a city can’t get over a certain size unless you can grow, gather, and store a certain amount of food in the vicinity. Then you can’t get any bigger unless you understand how to do sewage. If you don’t have efficient sewage technology the city gets to a certain size and everybody gets cholera.
You'll get some great anecdotes that will totally drop your jaw:
For years, I’d found myself telling interviewers and readers that I believed it was possible to write a novel set in the present that would have an effect very similar to the effect of novels I had set in imaginary futures. I think I said it so many times, and probably with such a pissy tone of exasperation, that I finally decided I had to call myself on it.
A friend knew a woman who was having old-fashioned electroshock therapy for depression. He’d pick her up at the clinic after the session and drive her not home but to a fish market. He’d lead her to the ice tables where the day’s catch was spread out, and he’d just stand there with her, and she’d look at the ice tables for a really long time with a blank, searching expression. Finally, she’d turn to him and say, “Wow, they’re fish, aren’t they!” After electroshock, she had this experience of unutterable, indescribable wonderment at seeing these things completely removed from all context of memory, and gradually her brain would come back together and say, Damn, they’re fish. That’s kind of what I do.

It's a thrilling roller-coaster of an interview, with so many choice bits that you'll find yourself returning to his ideas again and again.

Just as we do with his books!

Enjoy.

Thursday, November 3, 2011

Kindle Lending Library

Wow! This, by itself, is just about enough to make me go buy a Kindle.

Andreessen Horowitz bloggers

The blogging portfolio at the Andreessen Horowitz web site continues to grow, with Peter Levine the latest to start a blog.

I think that most of this blogging activity has been inspired by Ben Horowitz, who is by far the most active and best blogger of the bunch.

But the other blogs are worth reading too, and there are some additional materials on the site. If you're interested in the software industry, and more specifically in the VC-funded software startup industry, there is a lot here to explore.

I know from my own experience how much time and effort it takes to write a blog. I'm pleased that this group, who have lots of experience and knowledge, are investing the energy into sharing their thoughts and opinions; hopefully it will inspire others to do the same, and in the meantime it means more interesting essays to read!

To get you started, check out Ben's recent essay: Hiring Executives: If You’ve Never Done the Job, How Do You Hire Somebody Good?

Tuesday, November 1, 2011

This is how computer software evolves

Legacy software, so goes the old saying, is any software that actually works.

Of course, there is a fair nugget of truth to this aphorism, for once a system is running we start to be increasingly unwilling to change it.

Yet software is soft for a reason; it can be changed, and it can be improved.

It's always interesting to watch this process at work, for the improving of software can be a messy business, not just for technical reasons, but also because of social, cultural, and business reasons.

Today's exhibit: JavaScript, now venerable, though not so long ago it was the newest language on the Net. Upgrading the language has long been a source of strife and tension, and JavaScript has suffered its share of bumps and bruises along the way.

A major event recently was Google's decision to take a new direction with their proposed new Dart language. As Weiqi Gao observes, this has grown into quite the discussion, with some backers wanting to continue to improve JavaScript, while others feel that it's time to embark on designing a new language (such as Dart). Major discussion has ensued, and some of the emotions have been rather heated:

I respect the people involved and believe they’re for the most part making their own choices. But Dart and other unrelated Google agenda items do impose clear and significant opportunity costs on Google’s standards activities.

"Unrelated"? I think that is an overly strong critique. Surely Dart is strongly related.

What is the best way, then? Continue to improve the existing language, or work on building new languages? Can we really not do both? After all, Dart is not the only new language built on top of and closely related to JavaScript: consider CoffeeScript, for example. It is another new language which is designed for the web, and is implemented by being compiled into JavaScript. CoffeeScript is interesting; it pushes the envelope in other directions, such as borrowing the "whitespace is structure" paradigm from Python (a technique I've always found fascinating, even if I can barely manage to maintain decent indentation standards in my own code).

I think such experimentation with new languages is wonderful, which is why it's so great to see techniques such as Source Maps, which record how positions in the generated code correspond to positions in the original source, so that tools like IDEs and debuggers written for one language can be used with newer languages built atop it. Clever!
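
To give a flavor of how Source Maps carry that position information, here is a minimal sketch in Python of decoding one segment of the "mappings" string, which in the version 3 format is a run of Base64 VLQ numbers. The function name is my own, and the interpretation of the fields in the comments is my reading of the format, so take it as a sketch rather than a reference implementation.

```python
# Base64 alphabet used by the Source Map v3 "mappings" field.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq_segment(segment):
    """Decode one Base64 VLQ segment into a list of signed integers."""
    values = []
    accum = 0
    shift = 0
    for ch in segment:
        digit = B64.index(ch)
        accum |= (digit & 0x1F) << shift  # low 5 bits carry data
        if digit & 0x20:                  # continuation bit: more digits follow
            shift += 5
        else:
            # The lowest bit of the finished value is the sign.
            magnitude = accum >> 1
            values.append(-magnitude if accum & 1 else magnitude)
            accum = 0
            shift = 0
    return values


if __name__ == "__main__":
    # As I understand the format, the fields are deltas: generated column,
    # source file index, original line, original column.
    print(decode_vlq_segment("AAgBC"))
```

A debugger that understands the format can walk these deltas to translate a position in the compiled JavaScript back to a line and column in the CoffeeScript (or other) source.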

So, just as Weiqi Gao does, I applaud the invention of these new languages, and I hope the experimentation continues. And I applaud the improvement of compilers and VMs and language runtimes, to enable such language experimentation and innovation. Even though I continue to earn my living coding in Dennis Ritchie's good old C language, all these new ideas and new approaches help us all think about problems in new ways, and find better techniques to solve problems.

It is, as the kids say, all good.