Journal of a Programmer

Friday, February 28, 2014

Three years!

Oh, please please please please please please may it be so.

Names and values

One of the greatest educators in the Computer Science field, Niklaus Wirth, has reached his eightieth birthday, which was reason for a recent celebration.

Among many other things, Professor Wirth is the subject of a very old, very dumb, Computer Science joke; when asked how to pronounce his name, he was reported to have replied:

If you call me by name, it is VEERT.
If you call me by value, it is WORTH.

The joke of course refers to one of the basic techniques used in programming languages, which is exhaustively explained by this tedious Wikipedia page.

I rather prefer the joke.

Anyway, I was thinking recently about a completely different aspect of names versus values, which arises in the work I do at my day job.

When Version Control Systems keep track of objects, they pay particular attention to two aspects of the object:

The NAME of the object, e.g.
derby/trunk/java/engine/org/apache/derby/iapi/util/Operator.java

The CONTENT of the object, e.g.

package org.apache.derby.iapi.util;

/**
        Provides an interface for an operator that operates on a range of objects
        E.g in a cache.
*/
public interface Operator {

        /**
                Operate on an input object
        */
        public void operate(Object other);
}

The name of the object is an obvious way to give you a handle to the object, and many of the operations that the Version Control System provides involve using the name:

What version of Operator.java is the most recent one?
What version of Operator.java was used in release 10.8?
Who has Operator.java checked out?
Which branches are there of Operator.java?

But a really interesting aspect of Version Control Systems is that they are tools for helping you track and manage change.

And one of the things that can change about an object is its name.

That is, you can rename a file.

A lot of the original Version Control Systems, back in the 1970's and 1980's, didn't handle renaming a file very well. If you changed the name of a file, often the best you could do was to delete the file with the old name, and add the file back with the new name, leaving you with two different objects, and a gaping discontinuity in the history of the object.

And that's wrong, because there aren't two different objects; there is one object, and what you did was to change its name.

A high quality Version Control System takes care of these problems for you, and systems like git and Perforce have rich functionality in this area and have no problem understanding when a file is renamed.

But it's interesting that they do this in quite different ways.

Perforce maintains a database of names. Each file name is a separate entry in its database, and it records things such as when two names refer to the same file (that is, when you have made a branch of a file), or when a file's name was changed (that is, when you have renamed a file). At the very lowest levels of the database, in fact, these things are the same, for branching a file and renaming a file are the same action as regards the NAME of the file. The difference is that when you branch a file, you continue creating new versions of both files (the file with the old name, and the file with the new name), whereas when you rename a file, that is the last version of the file with the old name, and all future versions of that file are versions of the file with the new name. And, of course, sometimes you rename a file and then you rename it back to its old name. Perforce can track any of those cases in its database equally well.

The point is that, in Perforce, the "primary key" for the object is its name. The name of the object is the primary way that Perforce refers to the objects, and it's the way it organizes the objects internally in its repository. (Try poking around in the server's filesystem for a while and you'll see what I mean.)
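
To make the "branch and rename are the same action on the name" idea concrete, here is a toy model of a name-keyed database. It is only a sketch: the class and method names (NameDB, Revision, branch, rename, lineage) are my own invention for illustration, and this is not Perforce's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    content: bytes
    source: Optional[str] = None  # name this revision was copied from, if any


@dataclass
class NameDB:
    """Hypothetical name-keyed store: every file name is its own entry."""
    files: dict = field(default_factory=dict)  # name -> list of Revision
    closed: set = field(default_factory=set)   # names that accept no new revisions

    def submit(self, name: str, content: bytes) -> None:
        if name in self.closed:
            raise ValueError(f"{name} was renamed away; submit to its new name")
        self.files.setdefault(name, []).append(Revision(content))

    def _copy(self, old: str, new: str) -> None:
        # The shared step: record a new name that points back at the old one.
        head = self.files[old][-1]
        self.files.setdefault(new, []).append(Revision(head.content, source=old))

    def branch(self, old: str, new: str) -> None:
        # Both names stay open for future versions.
        self._copy(old, new)

    def rename(self, old: str, new: str) -> None:
        # Same copy, but the old name gets no further versions.
        self._copy(old, new)
        self.closed.add(old)
        self.closed.discard(new)  # renaming back reopens the original name

    def lineage(self, name: str) -> list:
        """Follow source links from a name back through its earlier names."""
        names = []
        while name is not None and name not in names:
            names.append(name)
            name = self.files[name][0].source
        return names


db = NameDB()
db.submit("main/Operator.java", b"v1")
db.branch("main/Operator.java", "rel/Operator.java")
db.rename("main/Operator.java", "main/Op.java")
print(db.lineage("main/Op.java"))
```

In this toy, the only difference between branch and rename is whether the old name is closed afterwards, which is the distinction described above.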

git takes a different approach, preferring to organize its database most naturally by creating a database of CONTENT. Quoting from Pro Git:

Git is a content-addressable filesystem. Great. What does that mean? It means that at the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
...
You can see a file in the objects directory. This is how Git stores the content initially — as a single file per piece of content, named with the SHA-1 checksum of the content and its header.
...
The next type you’ll look at is the tree object, which solves the problem of storing the filename and also allows you to store a group of files together. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more tree entries, each of which contains a SHA-1 pointer to a blob or subtree with its associated mode, type, and filename.
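
The "SHA-1 checksum of the content and its header" is easy to reproduce yourself. Here is a minimal sketch: the header for a blob is the word "blob", a space, the content length in bytes, and a NUL byte. The little in-memory dictionary store and the put function are my own illustration of the key-value idea, not how Git lays out its objects directory on disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 key Git assigns to a blob holding this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Toy content-addressable store: the key is derived from the value itself.
store = {}


def put(content: bytes) -> str:
    key = git_blob_id(content)
    store[key] = content
    return key


key = put(b"test content\n")
print(key)         # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(store[key])  # b'test content\n'
```

Storing the same bytes twice yields the same key, so identical content is only ever kept once, whatever it happens to be named.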

There's a fundamental duality here, which is at the heart of how both systems work. Objects have names, and objects have values, and sometimes you care about the name of the file, and sometimes you care about the content of the file.

Perforce tends to keep track of files by their name, which means that usually you start with the name of a file and then access its content, and occasionally Perforce has to go hunting through its database to find the other names of the file, such as when you are moving work from one branch to another and the file has been renamed in the one branch but not the other.

Git, on the other hand, tends to keep track of files by their content, which means that usually you start by knowing the SHA-1 of a file, which is how you access it, and if you want to know the name of the file Git will look that up for you. And if you want to rename a file, you don't have to tell Git that; it will just figure it out because it notices that the SHA-1 signature of the new object matches the SHA-1 signature of an object it already knows about.
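
Here is a rough sketch of how a rename can be inferred purely from matching content hashes between two snapshots. The function names are mine, and this covers only the exact-match case; Git's real rename detection also handles files that were renamed and edited at the same time by scoring similarity, which this sketch does not attempt.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict) -> dict:
    """Map each file name to the hash of its content."""
    return {name: blob_id(data) for name, data in files.items()}


def detect_exact_renames(old: dict, new: dict) -> list:
    """Pair each vanished name with an added name carrying the same hash."""
    removed = {name: oid for name, oid in old.items() if name not in new}
    added = {name: oid for name, oid in new.items() if name not in old}

    # Index the removed names by content hash.
    by_id = {}
    for name, oid in sorted(removed.items()):
        by_id.setdefault(oid, []).append(name)

    renames = []
    for new_name, oid in sorted(added.items()):
        candidates = by_id.get(oid)
        if candidates:
            renames.append((candidates.pop(0), new_name))
    return renames


before = snapshot({"util/Operator.java": b"interface Operator {}\n",
                   "README": b"hi\n"})
after = snapshot({"iapi/Operator.java": b"interface Operator {}\n",
                  "README": b"hi\n"})
print(detect_exact_renames(before, after))
# [('util/Operator.java', 'iapi/Operator.java')]
```

Nobody had to record that a rename happened; it falls out of comparing the two snapshots.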

This is part of the answer to some fundamental complaints about both systems. Since Perforce spends time maintaining a sophisticated database of file names, when you create a branch, that causes Perforce to create many new file names in its database (one new file name for each file in the branch, together with records which relate the old name to the new one), and thus in Perforce, "branches are expensive". For small branches, Perforce does this so fast that you never notice it, but for large branches you can detect the pause. (And if you add many thousands of branches with many millions of files in them, you may have to buy a bigger server.)

Meanwhile, since Git generally cares less about the name of the file than about its content, creating a branch is very cheap, because Git just has a single operation to perform, no matter how many files are in the branch, so in Git, "branches are cheap."
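
The cost difference is easy to see in a toy comparison. Both functions below are hypothetical models I made up for illustration (neither is the real Perforce or Git implementation): one writes a new name record per file, the other writes a single pointer to an existing snapshot.

```python
def branch_by_name(name_db: dict, src_prefix: str, dst_prefix: str) -> int:
    """Name-keyed model: one new record per file under the source prefix.

    Each new record maps the new name to the name it was branched from.
    Returns the number of records written.
    """
    new_records = {
        dst_prefix + name[len(src_prefix):]: name
        for name in name_db
        if name.startswith(src_prefix)
    }
    name_db.update(new_records)
    return len(new_records)


def branch_by_content(refs: dict, src: str, dst: str) -> int:
    """Content-keyed model: a branch is one more pointer to a snapshot id."""
    refs[dst] = refs[src]
    return 1


# 10,000 files on "trunk"; None means "not branched from anything".
names = {f"trunk/file{i}.java": None for i in range(10_000)}
print(branch_by_name(names, "trunk/", "branches/10.8/"))  # 10000 records

# The snapshot id here is a made-up placeholder string.
refs = {"trunk": "placeholder-snapshot-id"}
print(branch_by_content(refs, "trunk", "10.8"))           # 1 record
```

The name-keyed version does work proportional to the number of files; the content-keyed version does the same small amount of work regardless.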

But on the other hand, this means that Git has to frequently figure out the SHA-1 of objects, so it spends a lot of time reading files and computing their SHA-1 signatures. For small files, Git does this so fast that you never notice it, but for large files the scanning and cryptography are significant, so in Git, "tracking large files is expensive."

Whereas in Perforce, since it only looks at the content once, when you submit the file, and then just tracks the file by its name, Perforce handles files of any size easily, and tracks gigantic files containing hundreds of gigabytes of content just as fast as it tracks tiny little Operator.java.

These sorts of dualities arise all the time in Computer Science, which, like any engineering field, is full of tradeoffs.

If you make one thing cheap to compute and fast to look up, you tend to have to make a tradeoff, such that other things are harder to compute and slower to look up.

You can't say that one decision is unambiguously "better"; you have to think about the tradeoffs, and why they were made, and what they mean about how to use the system effectively.

Or, you can just think about Professor Wirth, and whether you should call him by name, or call him by value.

Wednesday, February 26, 2014

Extreme sailing at the highest, highest level

Six months have passed since The Miracle on San Francisco Bay.

There have been a few odds-and-ends stories here and there, but nothing that really dove into the details.

Now, along comes Kimball Livingston, the best living writer of the world of sailing, with all you could ever want to know about what really happened:

Burns and Speer: Secrets of the Comeback
Oracle studied how to retrim to add more load to the back of the wing. “The boat had lee helm,” Speer said. “You know that kills upwind speed. It was clear that we needed to retrim, so we raked the wing aft—and no, that didn’t work. It turned out that when we powered-off the upper elements—when we added twist aloft—the center of effort shifted down and forward. There was no relief in that. So instead we opened the slot. That gave us less lift on the main element and more lift on the flap [which funnels air aft]. Over the course of the regatta we increased the traveler load by 50 percent. That eliminated lee helm, helped the boat point, and simply made us faster upwind.
Secrets of the Comeback: Control Control Control
The essence of a feedback control system is the system works on the difference between what you want and what actually exists. This difference in control engineering parlance is called the error signal. It can be anything, as long as there is this difference operation is going on. The difference can be electrical, as when an electronic sensor provides a measurement and the subtraction is done in the control computer. But you can also set it up so the difference is done mechanically. This is what OTUSA did.
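
The "error signal" idea is simple enough to sketch in a few lines. This is a generic proportional controller with made-up numbers, chosen only to show the subtraction at the heart of the loop; it is not a model of what OTUSA actually built, which the article says was done mechanically.

```python
def simulate(setpoint: float, initial: float, gain: float, steps: int) -> list:
    """Repeatedly nudge a value toward the setpoint, in proportion to the error."""
    value = initial
    history = []
    for _ in range(steps):
        error = setpoint - value  # what you want minus what actually exists
        value += gain * error     # correct by a fraction of the error
        history.append(value)
    return history


trace = simulate(setpoint=10.0, initial=0.0, gain=0.5, steps=20)
print(trace[:3])  # [5.0, 7.5, 8.75]
```

Each pass halves the remaining error, so the value closes in on the target without ever needing to know anything except the difference.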

Modern sports, at the highest level, have become extraordinarily sophisticated, and with that sophistication comes extraordinary complexity.

Your race car may catch a virus.

Your speed skating aerodynamic suit may not perform as expected.

Your bobsleds may need to be adjusted to match the banking of the curves on the course.

It's always fun to get a peek behind the scenes to learn more about the underlying techniques that are used at the highest level of competition.

Now is art

Just look at it.

I love the way it depicts the mid-ocean gaps over the Atlantic (between Brazil and West Africa) and Pacific (between New Zealand and Africa) oceans.

And the colors are just right, and the shapes of the continents are beautiful, and even that snarky little "Rude to Call" seems to be just right.

I declare it to be art.

Sunday, February 23, 2014

EU IV 1650: I've become an overlord

A progress report from my Europa Universalis IV game.

I've reached the mid 1600's.

Sometime in the early 1600's, my country became advanced enough to begin exploring the larger world and founding colonies.

My first colony was in South East Asia, on the island of Sumatra I think, and I was pleased to see that I had not only established a new colony, but would be exporting "spices".

A few years of game time later, I founded a colony in Southern Africa, somewhere near modern-day Mozambique.

As my colony became established, the game presented me with a popup message, informing me that the exported resource of this colony would be: "slaves", and displaying an image of a manacle with a ball-and-chain.

The new colony immediately became one of the highest-valued provinces in my empire, but have I no control over this decision? Surely it's historically accurate, but it's also rather disturbing.

I can't help but think that the sudden onslaught of massive wars initiated by Russia and Austria over the next few years is somehow due to this event.

How does the game make these decisions and trigger these events to occur? It's fascinating to see how it immerses me in the history of the times, and in the complex emotions and social issues that were occurring then.

Obviously, I'm not the only person to find this a troubling question; just read some of the forums.

Or, better, have a read of the fine and thoughtful essay by April Daniels: Europa Universalis IV is The Best Genocide Simulator of The Year.

As Daniels writes:

Let’s be clear about one thing: in real life, the colonization of North America by European settlers was only possible because of the accompanying slow-motion genocide of the people who were already living here. The First Nations of the Americas did not have castles, or royal dynasties, or a continent-spanning church like the Europeans, but they did have a civilization.
They had politics, trade, cultural exchange, territorial disputes, and wars. They built cities and temples, domesticated animals, and mastered their environment just as thoroughly as any other people on the planet.
I knew going in that I’d playing a game about a topic that, in real life, is horrifying to my (white, privileged) progressive sensibilities. I thought I was prepared for it.
Then I actually saw how they treat the Americas.

The question is: is Europa Universalis IV a game about trying to understand history? About trying to explore what happened, in a moderately realistic fashion, with as much depth and sophistication as can be provided, while still keeping it fun enough to be a game?

Or is it a toolbox for trying to rewrite history, to change the outcome, to make a world that wasn't?

I think that the team at Paradox Interactive are pretty clear on what their goal was, and why they built the game. As they say in the initial Development Diary:

In all our games we aim to have believable mechanics. When playing a Grand Strategy game it should be about immersion and suspension of disbelief. You should feel like you are playing a country in the time period.
...
I'll try to clarify a confusion about sandbox, historical events and plausibility. Europa Universalis have always been about historically plausible outcomes, as I mentioned over six years ago , and EU4 is no different in that regard. No determenism or full sandbox will ever be in the EU series. In EU3 we scrapped historical events and added lots and lots of system and mechanics to create more plausible gameplay. While we are continuing on that concept and keep making more plausible mechanics, we are in EU4 doing something new...
We'e adding in Dynamic Historical Events. We'll have more of those than we had historical in EU2, and together with a fair amount of other planned features, this is creating an even more immersive type of gameplay, where countries feel far more unique than they did in any previous game in the series. A 'dynamic historical event', or DHE for short, is an event that has some rather rigid triggers that they feel plausible to happen with, ie, no Spanish Bankruptcy just because its a certain date, but events that tie into mechanics rather heavily.
The example I want to talk about is War of the Roses for England. At any point of time, before 1500, if England lacks an heir, then the chain for War of the Roses can start, which creates a lot of interesting situations for the player, as well as giving unique historical immersion.

It's clear that the development team set out with a goal, and I think it's pretty clear that they achieved it. Perhaps the best evidence of that successful result is the reaction it inspires in commentators like Daniels, who notes:

For a game about creating alternate histories, Europa Universalis IV has some very firm opinions about what should happen to the peoples living in the parts of the world that aren’t Europe. None of them good. I don’t mean to say that it endorses genocide, merely that it doesn’t question it. The game accepts it as natural, inevitable, and unworthy of comment.

The (human) history of our world is complex; you may spend your entire life studying it, and thinking about it, and trying to understand what it all means.

And, of course, it's happening all around us, right now.

It's important to keep in mind that Europa Universalis IV is a game, though set in a very specific time period in a very specific world. Colonialism occurred. Conquest occurred. Religious wars and rapacious exploitation occurred.

But I think it's possible to enjoy (to greatly enjoy, in fact!) playing the game, even while it does something that not every game does, nowadays:

It makes you stop and think.

Saturday, February 22, 2014

TPP

Wow, I've been busy recently! Haven't blogged in a week. Oh well.

I've got some stuff to write about, but first I have to at least briefly note this week's Internet rage: TPP, or Twitch Plays Pokemon.

If you're on the Internet, you already know about this, but if not, here's a quick guided tour:

Start with Max Woolf's nice article: Game Theory: How 70,000 Pokemon Players Sabotage Themselves
On the game live-streaming site Twitch.tv, one user simply known as “TwitchPlaysPokemon” setup a live-stream of Pokemon Red, but with a twist: all game commands, such as “up”,“down”,“left”,“right”,“b”,“a”,“select”, and “start”, would be input by typing the appropriate command into the livestream chat. At first, this seems like a crazy idea: if thousands of people are inputting commands at the same time, could we accomplish anything in the game?
Patricia Hernandez captures the spirit of the communal-performance-art aspects of the event: The Miraculous Progress of 'Twitch Plays Pokémon'
Amusingly, because of the delay between when users say something in chat and when it actually happens on-screen, it became easy to accidentally select the wrong things on menus. As a result, the character seems to compulsively check his item bag—specifically, he keeps trying to select the Helix fossil. You can't do anything with the Helix fossil until you bring it to Cinnabar Island in the games, at which point you can use it to revive Omanyte, an ancient fossil Pokemon. Still, the player "consults" the Helix fossil so much that people joke around about it as if it was a holy deity which the player uses for guidance, and that the real point of the entire thing isn't to beat the Elite Four, but rather to revive the old god, Omanyte.
Andrew Cunningham considers the meta-aspects of the spectacle: The bizarre, mind-numbing, mesmerizing beauty of “Twitch Plays Pokémon”
It could even be that Twitch Plays Pokémon is a bleak-but-perfect summary of the human condition—a group of people unified behind a common cause that struggles and fails to accomplish even the most basic tasks. We ostensibly want the same thing, yet we expend Herculean amounts of effort only to end up right back where we started—at best. And that's the case even without considering the people who are only out for themselves.
In any case, Twitch Plays Pokémon encapsulates the best and worst qualities of our user-driven, novelty-hungry age. Today's Internet has an extraordinary propensity for creating things that (1) grow quickly, virally, and organically through word of mouth, (2) provide hours of entertainment, and (3) waste days of peoples’ lives for no apparent purpose (see also: Flappy Bird).
Randall Munroe sympathizes: First Date
I sympathize with the TPP protagonist because I, too, have progressed through a surprising number of stages of life despite spending entire days stuck against simple obstacles.
And Patricia Hernandez returns to the story to explain meta-strategies, anarchy-versus-democracy, and other coordination techniques by which the "team" is making progress: How Players Actually Make Progress in 'Twitch Plays Pokémon'
The Nascar strategy pictured above, which is specific to a particular part of the game, is one of many strategies formulated by folks who are intent on making progress. Here are a couple of other popular strategies and visual arguments relating to game progression that have floated around the web

I have to say, the last two weeks at work have felt like a personal version of Twitch Plays Pokemon, but that's to be expected, given the project I'm currently embroiled in.

Maybe I need to invoke anarchy, or spam the B button.

Or maybe I'll just write some more tests and build another feature.

Saturday, February 15, 2014

On being critical, and the risk of burning out.

I enjoyed Josh Braegger's short essay: 5 Ways To Burn Out Programming.

I was particularly interested in this part:

It's really easy to fall into the "being critical" trap. It's easy to tell other people what the "wrong" choice is. I imagine it's because as software engineers, our job is to find faults in our applications and fix them. And if we don't find them, someone else finds them for us.
But I don't think we need to be negative about our job, decisions that are being made (even if it's not our decision) and what we're working on. Some of the best projects I've worked on worked out that way because we had a great, positive team. We enjoyed showing up every day to work, told each other when we did awesome things, held back heavy-handed criticism and phrased it in a productive manner.

I think this is a fine line, but absolutely one worth dwelling on.

On the one hand, the value of a workplace full of positive energy, respect, cooperation, and dignity is incalculable. I think it's well-put by the Perforce commandments:

P4 Commandments -- Values we work by

* We have high standards.
* We are straightforward.
* We rise to responsibility.
* We like work we can be proud of.
* We like to hear what we've done.
* We value both people skills and job skills.
* We treat each other with dignity and respect.
* We are one team. We are not in competition with each other.
* We talk and listen. We like feedback.
* We appreciate creative and practical solutions. There might be
  a better way.
* We appreciate people for who they are.
* Fun is always an option. It is not mandatory.
* These are the best years of our lives.

Yet, software engineering definitely attracts obsessive, detail-oriented perfectionists. If you aren't passionate about every little nook and cranny of your project, things fall apart quickly. Software is hard enough to build in the best of cases; if you don't obsess over it, if you don't demand the utmost from it, if you don't poke at it and tear it apart and rip into it, the result isn't worth squat.

But there's definitely a difference between being critical of the software, and being critical of the people. Two of the most challenging things you will face as a software engineer are:

Learning how to be critical of the work, without being critical of the worker.
Learning how to receive a criticism of your work, without taking it as a criticism of yourself.

I suspect that most people find that the second one is markedly harder than the first.

Some of the most exciting, enjoyable, and illuminating times in my software career have involved working with people who "don't suffer fools gladly." Ask a stupid question, and they'll let you know. Make a poor design choice, and they'll let you know. Phrase a line of code poorly, and they'll let you know.

But, time and again, these are the people I most value, and most seek out. My life is short; the amount remaining to learn is large; and, as my old friend Walt used to say, "if you aren't struggling, you aren't learning."

So, go ahead and be a perfectionist. Strive to write the best software you possibly can.

But don't think you're there yet. Did you just write some software? Well, there's a mistake in it. I guarantee it. Go back, and look at it again; write another test; show it to somebody else; run a checker over it. Be as critical as you possibly can about that code; just remember that the person who wrote it is a human being, and the code is the code.