Wednesday, June 30, 2010

Great "letter from the CEO" post

If this is for real (and I think it is), this may go down as the greatest "letter from the CEO to our employees" letter of all time!

Strange oil trading story

This story is bizarre. Here's the underlying FSA account, with all the details.

What the documents don't really describe is how the FSA got involved.

Why was only Perkins sanctioned? Did not PVM themselves bear some of the responsibility?

And what sort of financial system has the world built for itself, when:

Perkins' employer, PVM Oil Futures Ltd, did no proprietary trading, but in the early hours of the morning on Tuesday 30 June 2009, Perkins traded on ICE without any client authorisation. He traded in extremely high volume in the ICE August 2009 Brent contract and in doing so accumulated a long outright position in Brent in excess of 7,000 lots (representing over 7 million barrels of oil).


All I can think is: Yikers!

The home stretch

This is the first day in 3 full weeks that there isn't a World Cup game on the TV to watch. Time to get back to that "honey-do" list... but first, here are the eight teams left standing:


  1. Uruguay

  2. Ghana

  3. Argentina

  4. Germany

  5. Netherlands

  6. Paraguay

  7. Brazil

  8. Spain

Tuesday, June 29, 2010

How Big Brother knows where you are

Over at his blog, Matt Blaze has posted the PDF of his testimony before Congress regarding location-based technology in modern cell-phone systems. From the document:


As cellular carriers roll out better location technologies in the course of their business, the location information sent to law enforcement (as transmitted from the carrier's call database in (near) real time in response to a wiretap order) is becoming more and more precise. The current base station or sector ID paradigm is becoming less important to carriers, and as networks improve, sector data is increasingly being linked to or supplanted by an accurately calculated latitude and longitude of the customer's handsets.


The document ends with a chilling observation:

Cell phone location information is quietly and automatically calculated by the network, without unusual or overt intervention that might be detected by the subject. And the "tracking device" is now a benign object already carried by the target -- his or her own telephone.


None of this is new or surprising, I guess; it's all been covered on "24" for years now. But Professor Blaze puts it clearly and simply, and it's an interesting (and quick) read.

The Flash Crash and Quote Stuffing

There is a fascinating and detailed essay about the May 6th "Flash Crash" online at www.nanex.net.

You should read the entire detailed analysis at the Nanex site, but here are the two big items that jumped out at me:

  1. Inaccurate quotes from the NYSE led other trading systems to believe they were observing downward momentum that wasn't (initially) actually present; when those other trading systems acted on that inaccurate data, they in fact caused the behavior that they thought they were merely observing, i.e., a feedback loop (see the sketch after this list):

    quotes from NYSE began to queue, but because they were time stamped after exiting the queue, the delay was undetectable to systems processing those quotes. The delay was small enough to cause the NYSE bid to be just slightly higher than the lowest offer price from competing exchanges. This caused sell order flow to route to NYSE -- thus removing any buying power that existed on other exchanges. When these sell orders arrived at NYSE, the actual bid price was lower because new lower quotes were still waiting to exit a queue for dissemination.

    ...

    Because many of the stocks involved were high capitalization bellwether stocks and represented a wide range of industries, and because quotes and trades from the NYSE are given higher credibility in many HFT systems, when the results of these trades were published, the HFT systems detected the sudden price drop and automatically went short, betting on capturing the developing downward momentum. This caused a short term feed-back loop to develop and panic ensued.


  2. Certain trading systems exhibited strange and unexpected behavior for which there still does not appear to be a valid explanation:

    We decided to analyze a handful of these cases in detail and graphed the sequential bid/offers to better understand them. What we discovered was even more bizarre and can only be evidence of either faulty programming, a virus or a manipulative device aimed at overloading the quotation system.
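
To make the first point concrete, here is a minimal sketch, with entirely made-up venue behavior, prices, and delays, of how time-stamping a quote when it leaves a queue (rather than when it was generated) can make a stale NYSE bid look like the best price in the market and attract sell order flow:

import java.util.ArrayDeque;
import java.util.Deque;

// Entirely hypothetical sketch: a quote that sits in a queue is time-stamped
// only when it exits, so downstream systems can't see that it is stale.
public class StaleQuoteSketch {
    public static void main(String[] args) {
        // NYSE bids generated while the price is falling; the newest (lowest)
        // bids are still waiting at the back of the queue.
        Deque<Double> nyseQueue = new ArrayDeque<Double>();
        nyseQueue.add(40.00);   // oldest quote, generated before the decline
        nyseQueue.add(39.95);
        nyseQueue.add(39.90);   // newest quote: the real current NYSE bid

        double otherExchangeBestOffer = 39.95;  // undelayed quotes from other venues

        // The quote exiting the queue is stale, but it is stamped with the
        // current time, so a smart order router treats it as a live bid...
        double nyseApparentBid = nyseQueue.poll();
        if (nyseApparentBid > otherExchangeBestOffer) {
            System.out.printf("Routing sell flow to NYSE at apparent bid %.2f "
                    + "(best offer elsewhere is %.2f)%n",
                    nyseApparentBid, otherExchangeBestOffer);
        }

        // ...but by the time those sell orders arrive, the real NYSE bid is the
        // lower price still waiting in the queue.
        System.out.printf("Actual NYSE bid when the orders arrive: %.2f%n",
                nyseQueue.peekLast());
    }
}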



The quote stuffing analysis is quite disturbing and serious. Was this in fact a malicious event? I am inclined to think it was more likely a mistake. As a software engineer, my first reaction is to favor the "faulty programming" hypothesis.

When you look at the charts and graphs in the quote stuffing section of the analysis, you'll see that they certainly look like artificially-induced, synthetically-generated data patterns.

Like test data, that is.

At work, we have an ultra-high-volume testing tool that we call the "submit-a-tron", which is designed to stress-test our server by intentionally subjecting it to implausible and overwhelming activity. Once in a while, somebody accidentally mis-configures their test system and mistakenly points an instance of the submit-a-tron against a valid production server, which then has us all running around cleaning up the mess for a few days.

Perhaps the software development staff at one of these trading houses accidentally did something similar (that is, accidentally performed an internal stress test intended for use with internal testing systems while pointed at the real production stock exchange channels)?

An ode to Leo

Nice essay, Joe!

In the most likely scenario, all Argentina have to do at this point is, consecutively: beat Germany, beat Spain, and beat Brazil. In the next twelve days. If they do it, they'll deserve to discuss that run for decades. Can it be done?

Regarding instant replay in soccer

I'm with The Daily Mash (*) on this one.

(*) For those who aren't in the loop, The Daily Mash are like The Onion, but with a funny accent.

A classic paper on VMWare

Some papers are classics, and can be read over and over again.

I recently returned to a paper I'd read more than 5 years ago, Carl Waldspurger's Memory Resource Management in VMWare ESX Server, and was amazed at how fresh, readable, and fascinating it remains.

If you've never explored the internals of VMWare, if you aren't familiar with terms like ballooning, content-based virtual memory page sharing, or the idle memory tax, you'll want to read this paper.

Every time I (re)read about the idea of ballooning, I'm struck by how simple, yet perfectly appropriate, this idea is. The paper introduces the problem as follows:

In general, a meta-level page replacement policy must make relatively uninformed resource management decisions. The best information about which pages are least valuable is known only by the guest operating system within each VM. Although there is no shortage of clever page replacement algorithms, this is actually the crux of the problem. A sophisticated meta-level page replacement algorithm is likely to introduce performance anomalies due to unintended interactions with naive memory management policies in guest operating systems.

...

Suppose the meta-level policy selects a page to reclaim and pages it out. If the guest OS is under memory pressure, it may choose the very same page to write to its own virtual paging device. This will cause the page contents to be faulted in from the system paging device, only to be immediately written out to the virtual paging device.

...

A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service. It has no external interface within the guest, and communicates with ESX Server via a private channel. When the server wants to reclaim memory, it instructs the driver to "inflate" by allocating pinned physical pages within the VM, using appropriate native interfaces. Similarly, the server may "deflate" the balloon by instructing it to deallocate previously-allocated pages.


The balloon is a program that performs an incredibly useful function simply by allocating some memory when you ask it to, and then, later, deallocating that memory. It's brilliant!
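
As a rough illustration of how little the guest-side half of the idea has to do, here is a toy sketch. It is not VMware's driver (a real balloon pins guest-physical pages through native kernel interfaces and talks to ESX over a private channel); it just shows the "allocate on request, free on request" shape of it:

import java.util.ArrayDeque;
import java.util.Deque;

// Toy "balloon": it does nothing but hold memory when asked to inflate and
// release it when asked to deflate. (A real balloon driver pins guest-physical
// pages via native kernel interfaces; this is only a sketch.)
public class ToyBalloon {
    private static final int PAGE_SIZE = 4096;
    private final Deque<byte[]> heldPages = new ArrayDeque<byte[]>();

    // Hypervisor wants memory back: grab pages so the guest OS is forced to
    // reclaim its own least valuable pages elsewhere.
    public void inflate(int pages) {
        for (int i = 0; i < pages; i++) {
            heldPages.push(new byte[PAGE_SIZE]);
        }
    }

    // Memory pressure has eased: hand pages back to the guest OS.
    public void deflate(int pages) {
        for (int i = 0; i < pages && !heldPages.isEmpty(); i++) {
            heldPages.pop();
        }
    }

    public long heldBytes() {
        return (long) heldPages.size() * PAGE_SIZE;
    }
}

Inflating transfers the memory pressure into the guest, whose own paging policy then decides what to evict; deflating gives the memory back. That's the whole trick.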


Apparently, many of the ideas in this project came from experiences with the Disco project at Stanford in the mid-1990's, a project I hadn't paid much attention to at the time. I'll try to go track down some of those references and see what I learn from that.

Sunday, June 27, 2010

It would be a most appropriate name

Germany's World Cup goalkeeper is named Manuel Neuer, but the font on the back of the jersey was a bit unfamiliar to me and so I thought his name was Never. Since he has almost never allowed a goal in this tournament (perhaps with a spot of assistance from the linesman :) ), it seems like it would be a most appropriate name.

Can he continue to be Mr. Never against that phenomenal Argentine side (who received a spot of assistance of their own today; as they say, the good make their own luck)? One way or another, it promises to be a tremendous quarter-final!

Thursday, June 24, 2010

Distributed Data Structures

I'm trying to catch up with some of the fundamental ideas behind giant Internet systems such as Google, Amazon, Facebook, etc. Modern systems in these areas include things like: Map/Reduce, Hadoop, GFS, Project Voldemort, Dynamo, etc. So I'm slowly working my way through some of the underlying research papers on the subject.

One early paper in this area is: Scalable, Distributed Data Structures for Internet Service Construction. This project demonstrated many of the basic principles about distributed data structures:

In this paper, we bring scalable, available, and consistent data management capabilities to cluster platforms by designing and implementing a reusable, cluster-based storage layer, called a distributed data structure (DDS), specifically designed for the needs of Internet services. A DDS presents a conventional single site in-memory data structure interface to applications, and durably manages the data behind this interface by distributing and replicating it across the cluster.


The extremely important data structure that the paper discusses is the distributed hash table:

The API provides services with put(), get(), remove(), create(), and destroy() operations on hash tables. Each operation is atomic, and all services see the same coherent image of all existing hash tables through this API. Hash table names are strings, hash table keys are 64 bit integers, and hash table values are opaque byte arrays; operations affect hash table values in their entirety.
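
Rendered as an interface, the contract is strikingly small. This is my own sketch of the API as described, not code from the paper:

// A rough rendering of the DDS hash-table API as described in the paper:
// names are strings, keys are 64-bit integers, values are opaque byte arrays,
// and every operation is atomic. (Sketch only; not the paper's actual code.)
public interface DistributedHashTable {
    void create(String tableName);
    void destroy(String tableName);

    void put(String tableName, long key, byte[] value);   // replaces the value in its entirety
    byte[] get(String tableName, long key);                // null if the key is absent
    void remove(String tableName, long key);
}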


The basic implementation described in the paper uses a collection of "storage bricks", with a two-phase-commit update layered above the bricks, issuing updates to all replicas of a hash table partition in a consistent fashion. As we will see in later discussions, many other systems have relaxed these aspects of the implementation for a variety of reasons.
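
Here is a highly simplified sketch of the shape of that two-phase-commit update across the replicas of a partition. The Brick interface is my own invention, and the sketch ignores failures, timeouts, logging, and recovery entirely:

import java.util.List;

// Very simplified two-phase commit across the replica "bricks" of a partition.
// Real systems must also handle brick failures, timeouts, and recovery; this
// sketch shows only the happy path.
public class TwoPhaseCommitPut {
    public interface Brick {
        boolean prepare(long key, byte[] value);  // stage the update, vote yes/no
        void commit(long key);                    // make the staged update visible
        void abort(long key);                     // discard the staged update (no-op if never prepared)
    }

    public boolean put(List<Brick> replicas, long key, byte[] value) {
        // Phase 1: ask every replica to stage the update.
        for (Brick b : replicas) {
            if (!b.prepare(key, value)) {
                // Any "no" vote aborts everywhere, keeping the replicas consistent.
                replicas.forEach(r -> r.abort(key));
                return false;
            }
        }
        // Phase 2: all replicas voted yes; make the update visible everywhere.
        replicas.forEach(r -> r.commit(key));
        return true;
    }
}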

Perhaps the most interesting part of the paper is the metadata management, termed "metadata maps" in the paper:

The first map is called the data partitioning (DP) map. Given a hash table key, the DP map returns the name of the key's partition. The DP map thus controls the horizontal partitioning of data across the bricks.


Their implementation uses the trie data structure, one of the grand old ladies of search trees, to index the key space bit-by-bit from LSB to MSB:

the DP map is a trie over hash table keys; to find a key's partition, key bits are used to walk down the trie, starting from the least significant key bit until a leaf node is found. As the cluster grows, the DP trie subdivides in a "split" operation.
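
A conceptual sketch of that lookup (my own illustration, not the paper's code) might look like this: walk a binary trie using the key's bits, starting from the least significant bit, until you reach a leaf that names the partition:

// Conceptual sketch of a data-partitioning (DP) trie: internal nodes branch on
// successive key bits starting from the least significant bit; leaves name the
// partition holding that slice of the key space. (Illustration only.)
public class DpTrie {
    static class Node {
        String partition;   // non-null only at leaves
        Node zero, one;     // children for the next key bit
    }

    private final Node root;

    public DpTrie(Node root) {
        this.root = root;
    }

    public String partitionFor(long key) {
        Node node = root;
        int bit = 0;
        // Consume key bits from the LSB upward until a leaf is reached.
        while (node.partition == null) {
            boolean isSet = ((key >>> bit) & 1L) == 1L;
            node = isSet ? node.one : node.zero;
            bit++;
        }
        return node.partition;
    }
}

As the cluster grows, a leaf is "split" into an internal node with two children, doubling the number of partitions covering that slice of the key space.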


The paper continues with lots of discussion of implementation details, performance experiments, and a great section discussing how the distributed hash table ends up getting used in the building of higher-level services:

The hash table was a resounding success in simplifying the construction of interesting services, and these services inherited the scalability, availability, and data consistency of the hash table.


I think that the primary reason that this paper is still read today, a decade later, is that it is very practical, and quite concrete, and quite approachable, even if (as we'll see in future posts) many other systems have built upon this early work by violating most of its basic assumptions in order to see what sort of systems resulted.

This paper is a great "first paper" for those trying to learn about the construction of modern Internet-scale services, and the UC Berkeley Computer Science Ninja project was an important early gathering of researchers in this area. In the last decade, many of those early researchers have gone on to found research teams at other institutions to continue this work, and I'll return to some of those ideas later.

Wednesday, June 23, 2010

We're through to the second round!

Today's USA v. Algeria game was fascinating and exciting, and what a tremendous finish! Algeria played extremely well, and the US were very fortunate to avoid early disaster when a spectacular Algerian strike met the crossbar in the 6th minute.

Meanwhile, the Guardian reminds us that Algeria are famous for being the victims, nearly 30 years ago, of an incident which changed the entire format of the World Cup. The reason today's USA v. Algeria and England v. Slovenia games were played simultaneously is explained nicely in the Guardian article. If you've never heard of this event, this brief video does a nice job of capturing the flavor; it will all make sense once you've read the article.

Bravo to the teams which advanced today, and bravo as well to Algeria and Slovenia. This was no easy group; all 4 teams played superb football throughout.

Tuesday, June 22, 2010

The Two Escobars

Today is, more-or-less, the 16th anniversary of a very famous, and horribly tragic, event in sports: the "own goal" of Andres Escobar, the young superstar of the Colombia soccer team, against the United States in the 1994 World Cup.

Timed to coincide with the anniversary, ESPN Films have released The Two Escobars, a documentary about Andres Escobar and Pablo Escobar (who have the same family name but were not in any way related).

If you have time, I recommend the film. It is quite well made, very moving, and very interesting. It's not for children, though; it is an important movie about an important subject.

And, lest you think it is ancient history and not worth dwelling on, last weekend provided a frightening reminder that it is still all too relevant.

Monday, June 21, 2010

Happy Father's Day

I quite enjoyed this essay from Sunday's New York Times. You should read the entire thing, but to whet your appetite, a few sentences wholly out of context:

From his ship, he had once written her, "I like the morning watch best, in spite of having to get up at 3:15 a.m. to start it, for there is something very pleasant about starting the watch in total darkness -- often with no visibility whatever, and gradually, imperceptibly, have a little light steal over the ship, coming from no apparent source."


As the author so vividly concludes:

My comfortable present swung like a door giving on the past as I realized that this man had not been put here solely to buy me Good Humors and make sure that I got to bed on time.

The scorers have arrived!

Many have made note of the low-scoring nature of the World Cup matches so far. But this is often the case, early in the tournament; teams are initially tentative and cautious, and defensive preparations often carry the result in the early games.

But starting with Brazil's 3-1 thrashing of Cote d'Ivoire yesterday, and continuing with Portugal's 7-nil annihilation of North Korea this morning, it is clear that the offenses have arrived, and goals are now to be had. I could barely blink this morning between Portuguese goals.

So now, who next? Will Spain be the next side to break out in a big way? Or will it be Chile, one of my favorite squads, as they face the Swiss?

Tune in and see!

Quality Assurance in automated trading algorithms

Sunday's New York Times business section carries an article about a bug in an automated trading program. The article concerns the results of a specific family of quantitative funds:

Quantitative funds, after all, are the "black boxes" of investing -- portfolios run by managers who generally try to generate profit with computer algorithms that they don't share with outsiders, or even their own investors.

When you put your money into one of them, you are trusting not only that the overall strategy is sound, but also that its algorithms make sense and, furthermore, that they have been translated properly into computer code.


The article describes how AXA Rosenberg, the quantitative analysis firm in question, uncovered the problem, and links to several fascinating documents from AXA Rosenberg describing how the bug was discovered, and what its impacts were. From their April 15th letter:

We have been working on a number of projects -- in particular an insight into how to model rapidly emerging, transitory risk (our new "state contingent model"). In this process, we discovered the coding error in the scaling of the common-factor risks in the optimization process associated with an earlier upgrade of our risk model.


Modeling and simulating real-world systems inside the data structures of a computer program is an extremely powerful technique, used throughout the software industry. As the systems become more complex, and the models become more complex, verifying the accuracy of the models becomes simultaneously more important and harder.

Review and open critique are about the only way to catch such problems, I think; you need to exploit Linus's Law: "given enough eyeballs, all bugs are shallow". Of course, this is very hard to square with the culture of secrecy cultivated by the quant community; as seen above, these are "algorithms that they don't share with outsiders, or even their own investors".

So it is particularly interesting that the Times article goes on to discuss the question of whether it was possible to detect the presence of the bug, even without having direct access to the algorithms and to their risk models:

One question is whether AXA Rosenberg itself -- or the various mutual fund groups, financial advisers and consultants that have used its services -- monitored its operations with sufficient rigor.

The Times article notes an essay by a principal at another quant firm, which directly addresses this question. In the essay, Michael Markov and Kushal Kshirsagar say:

Because the “coding error” apparently impacted risk controls, we examined two basic risk measures that are routinely used by both fund managers and investors to evaluate and monitor investment products: Beta and Tracking Error.

Their article presents their analysis, with charts showing that the effects of the bug were visible, with the right data, but were not visible if the same analysis was performed with less detailed data.

It is worth stressing that such an apparent aberration in the fund’s risk profile could be most clearly seen using daily data. Unfortunately, investors typically use monthly data even though daily returns are now easily available from data providers (e.g. Lipper), public sources (Yahoo, Google, etc.), funds and custodians. When using monthly data, a longer history needs to be used to have sufficient observations to estimate the regression. This longer history may cloud, or, as in the case of the Laudus Rosenberg fund, completely transform the picture.
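
For reference, here is a rough sketch of the two measures they mention. This is my own illustration, not Markov and Kshirsagar's code: beta is computed as the slope of an ordinary least-squares regression of fund returns on benchmark returns, and tracking error as the standard deviation of the difference between the two. Whether an aberration shows up depends on whether you feed in daily or monthly return series:

// Rough sketch of the two risk measures discussed above. Beta is the OLS slope
// of fund returns regressed on benchmark returns; tracking error is the sample
// standard deviation of (fund - benchmark) returns. (Illustration only; no
// annualization, and the return series come from whatever data source you use.)
public class RiskMeasures {

    public static double beta(double[] fundReturns, double[] benchmarkReturns) {
        double meanF = mean(fundReturns);
        double meanB = mean(benchmarkReturns);
        double covariance = 0.0, variance = 0.0;
        for (int i = 0; i < fundReturns.length; i++) {
            covariance += (fundReturns[i] - meanF) * (benchmarkReturns[i] - meanB);
            variance += (benchmarkReturns[i] - meanB) * (benchmarkReturns[i] - meanB);
        }
        return covariance / variance;
    }

    public static double trackingError(double[] fundReturns, double[] benchmarkReturns) {
        double[] diff = new double[fundReturns.length];
        for (int i = 0; i < diff.length; i++) {
            diff[i] = fundReturns[i] - benchmarkReturns[i];
        }
        double meanDiff = mean(diff);
        double sumSq = 0.0;
        for (double d : diff) {
            sumSq += (d - meanDiff) * (d - meanDiff);
        }
        return Math.sqrt(sumSq / (diff.length - 1));
    }

    private static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }
}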


It seems unlikely that we will see the rise of open source quantitative analysis. Trading algorithms are likely to remain proprietary, protected by patents, trade secrets, and other similar legal constructs. Would the world be a better place if open source techniques were adopted in more arenas, besides the low level of operating systems and databases where they are currently successful? I'm not sure. But I do feel like the financial industry could do with a very large dose of "openness", and it would certainly be nice to feel that, in my lifetime, we will get to a point where these banks and traders stop holding the world hostage with their too-big-to-fail, too-complex-to-understand, too-buggy-to-operate-correctly computer-based financial products.

Saturday, June 19, 2010

Super Eagles, White Eagles, Black Stars, Green Dragons

I enjoyed this nice writeup, with references, tracing the nicknames of the various squads in the World Cup.

Go learn about General Xatruch, about King Alfonso, about the Fennec, and about the Guarani.

Can it really be true that nobody is quite sure why Richard Lionheart added the third lion to the coat of arms?

Thursday, June 17, 2010

Lunch with a view

I'm actually surprised this isn't more common. It seems like the ironworker equivalent of having the taco truck roll up to your site at 11:45 AM.

I suppose it means that the days of Charles Ebbets's classic photograph are over...

Wednesday, June 16, 2010

Looking for a simple introduction to DTrace

DTrace, if you didn't already know, is the kernel instrumentation package that (I think) was originally built for Sun's Solaris and has more recently been adopted by FreeBSD and possibly other operating systems.

I was trying to understand some NFS behaviors on a Solaris system, and I was thinking that maybe DTrace would help me, so I went looking around the net for simple introductory material such as "dummies guide to DTrace" or "DTrace for old dinosaur programmers like Bryan", but I failed to find anything helpful in my several hours of hunting and poking.

I found lots of references to the DTrace Toolkit.

And I found several massive books I could read.

But I didn't quite find the simple easy-to-approach guide that I was hoping for.

Any pointers to good basic material to help an old man like me get going with DTrace?

Meanwhile, I figured out that between 'truss' and 'snoop -v', I could get a fairly complete picture of what my NFS requests were doing, so for the time being it's not an emergency, but it's still one of those "things I wish I knew about and had in my toolkit", so I'll try to return to it at some point.

Tuesday, June 15, 2010

Rules are rules

The U.S. press attache in Pretoria keeps us up to date:

Papers reported the Nigerians were not allowed to take their green and white-colored live chickens into the game against Argentina in Johannesburg. (And as a result they lost? Hmmm.) The beleaguered spokesman of the South African Organizing Committee was forced to give a press conference on the 'no pets' rule

STM, GC, and other hot Java topics

I haven't spent much time blogging recently; all my spare cycles are spent keeping track of events overseas.

But I wanted to briefly take note of Cliff Click's wonderful review notes from his whirlwind conference tour in early June.

The first note is about Software Transactional Memory, and describes Dr. Click's visit to Portugal to meet with Dr. Cachopo's team at Lisbon Technical University. The work on JVSTM apparently much impressed Click, as he wrote:

This is the first time I've heard of an STM being used in a production setting - with success. The system is clearly alive and well, the STM plays a crucial role in both the performance and ease-of-maintenance in the system. I've talked with a number of people who tried really hard in the past to apply an STM to their particular situation, and for whatever the reason those systems eventually got declared failures.

...

My more dedicated readers can probably detect a shift in my attitude towards STMs here. To be fair, this Microsoft blog summarizes my attitude towards STMs for the past three years... and my opinion has mostly not changed. In short: highly negative, and lacking a killer app. I think Joao may have found the killer app for STMs.


Click's second note reviews his experiences at the combined ISMM/PLDI 2010 conference. As usual, his notes are very dense, but detailed enough to help the interested reader decide which talks/presentations are intriguing enough to be worth further study.

I'm very interested in high-end tools for debugging, tuning, and comprehending the behavior of the enormously complex software systems that are being built in Java nowadays, so I was particularly interested in:

If I find some time I'll try to look up more information about these projects.

Now, back to soccer... :)

Thursday, June 10, 2010

Ten Tactical Questions

I loved this posting by Tom Williams on his Football Further blog about the tactics questions that are on everyone's mind as we head into tomorrow's matches.

As Williams says in the introduction to his blog,

There is often little time for thinking about how the game works. Hence this blog. The purpose of Football Further is to understand football better. How do sides set themselves up tactically? Why do teams win and lose? What actually happens on the pitch?


Here are Williams's 10 questions; you'll want to read his post for his full thoughts and opinions about these topics:

  • 1. Will freshness or preparedness prevail in Group A?

  • 2. Will France's 4-3-3 work?

  • 3. Can Capello get the best out of Rooney?

  • 4. Will Maradona go with three at the back?

  • 5. Will Dunga's vision for Brazil be vindicated?

  • 6. How will Chile's 3-3-1-3 formation fare?

  • 7. Can Spain get the balance right?

  • 8. Will 4-2-3-1 continue to dominate?

  • 9. How many three-man defenses will we see?

  • 10. Will there be any innovation?



To his last question, I am certain the answer will be: yes!

Particularly here in the USA, where we have so little exposure to football at the highest levels, it's great to find people who are taking the time to try to educate the casual enthusiast (that is, me) about the game. The more you know, the more you can appreciate what is happening on the field.

Wednesday, June 9, 2010

Barely 36 hours to go until WC 2010!

... and the Boston.com Big Picture blog has some gorgeous pictures to get you ready!

Tuesday, June 8, 2010

Can you copyright a chess move? A chess game?

Somewhere in the intersection of Internet culture, intellectual property law, and the game of chess comes this interesting lawsuit in Germany over whether you can copyright the moves of a chess game.

In addition to the extensive details at the ChessVibes site, you might also find these essays by Mike Masnick interesting for his take on things.

Of course, most of the commenters seem to have little knowledge of the actual game of chess. Although chess seems like a finite game, and in some ways it is, the number of distinct possible chess games is so large that being able to copyright an entire chess game does not seem like a totally silly notion to me.

Personally, I have always felt that a chess game does in fact "belong" in some way to the two players who played it. Surely every chess player who's spent any time whatsoever on the game knows Morphy v. Duke of Brunswick, and has always credited Paul Morphy (now dead 125 years) with the creation of, and "rights to", the game, in whatever sense that means.

Goldman Sachs: Economics and the World Cup

OK, OK, maybe I'm just a little over the top with this World Cup thing.

But this report from Goldman Sachs is surprisingly interesting. Enough football to keep the economics from becoming boring, but enough real information so that you can justify reading yet more about the World Cup.

HFT and the May 6 volatility

It's now been more than a month since the stock markets went crazy on May 6, 2010, an event often referred to as the "Flash Crash". Various people have attempted to explain what happened during those 15 minutes; I recently read two such descriptions: in Newsweek, and in the Financial Times. Both were interesting perspectives, but neither seemed to be breaking ground with any new conclusions.

Newsweek points out that there is a lot of money at stake:

While most long-term investors lost their shirts during the Great Panic of 2008, high-frequency traders posted huge profits. "That was the Golden Goose era," says Narang, whose HFT shop launched in March 2009 and just finished its most profitable month.


And Newsweek also provides a nice history lesson, explaining some of what's been going on over the last decade:

You have to go back a decade, to the birth of HFT in September 2000. That month, then-SEC chairman Arthur Levitt, eager to push the market into the digital age, ordered exchanges to implement "decimalization" -- i.e., allowing stocks and options to be listed in one-cent increments rather than 12.5-cent ones.

...

But the combination of decimalization and the advent of new electronic exchanges where buyers and sellers could meet one another directly made life difficult for market makers. Some traditional Wall Street firms folded their market-making desks, while the survivors doubled down on technology and speed.

As a result, trade volume exploded. The average daily volume of equity shares traded in the U.S. zoomed from about 970 million shares in 1999 to 4.1 billion in 2005 to 9.8 billion last year.


And Newsweek also makes a very specific observation about the Flash Crash:

The sell-off gained speed as stop-loss orders were triggered once prices fell a certain amount, and many large institutional investors dumped stocks by the truckload.


It's interesting to me that, in many cases, these "stop-loss" orders may have actually created a loss. For example, suppose that you were holding some shares in Accenture, which you had purchased at $35 earlier in the year, and were feeling good about, since Accenture was trading over $40. Since you were worried that the stock might fall, you had placed a "stop-loss" order to sell at $30. Then, during the crazy 15 minutes, Accenture stock dropped from $40 all the way down to $0.01, then rebounded and closed at $39. But when your "stop-loss" order saw the price drop below $30, it issued an automated "sell", which was finally executed at, say, $20. You ended up selling a stock that was worth $40 at the start of the day, and $39 at the end of the day, for $20, thus creating a loss where none had actually existed.
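
Here is a tiny sketch of those mechanics, using made-up prices matching the example above: once the observed price touches the stop level, the order becomes a market sell and fills at whatever the next available price happens to be, however far below the stop that is:

// Sketch of the stop-loss behavior described above: when the observed price
// touches the stop level, the order becomes a market sell and fills at the next
// available price -- which, in a fast crash, can be far below the stop.
// The prices are made up to match the example in the text.
public class StopLossSketch {
    public static void main(String[] args) {
        double stopPrice = 30.00;
        double[] observedPrices = {40.00, 38.00, 31.00, 29.50, 20.00, 0.01, 39.00};

        boolean triggered = false;
        for (double price : observedPrices) {
            if (!triggered && price <= stopPrice) {
                triggered = true;   // stop touched: order becomes a market sell
                continue;           // the fill happens at the *next* print
            }
            if (triggered) {
                System.out.printf("Stop-loss filled at %.2f, far below the %.2f stop%n",
                        price, stopPrice);
                break;              // sold at 20.00, even though the close is 39.00
            }
        }
    }
}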

I believe that many of these trades were the ones that the exchanges tracked down and cancelled, which seems like a reasonable response, even though I still don't understand how the exchanges can arbitrarily cancel such trades. Although you lost big with your stop-loss sell order, the counterparty actually made money, so how was the exchange justified in taking their money away?

The Financial Times article focuses on an upcoming SEC event, scheduled to be held tomorrow I believe:

On Wednesday, the regulator is hosting a day-long Washington debate on the topic, involving many leading industry participants. "All market reform will be looked at through the prism of what happened on May 6," says William O'Brien, chief executive of DirectEdge, one of the four main public exchanges for US shares.


The article goes on to note that the markets can be, and are, quite mysterious:

"The real shocker is that it was nothing nefarious that caused the crash," says David Weild, senior advisor to Grant Thornton and former vice-chairman at Nasdaq. "It was acceptable investor behavior -- people trying to put on hedge transactions," he believes.


After 30 years of designing and building systems software, I'm quite accustomed to this sort of thing. It's almost routine to experience software which is running correctly, as designed, yet does something completely bizarre and unexpected in just the right circumstances. Relatively simple algorithms, when encoded into modern computers and executed zillions of times at lightning speed, can exhibit strikingly unusual behaviors. People who try to understand such events use terms like "emergent systems" or "unintended consequences" to describe what's going on, but what is really happening is just that our systems are more complex and intricate than we comprehend.

Still, just as computers have enabled us to create these systems, computers also enable us to work to tame them. So long as the systems are open to all, fairly and clearly operated, people will design software that works reliably and acceptably with them. The FT article makes the point that the regulation agencies have some technological catch-up to do:

Collecting data by fax may seem hopelessly 20th-century in an age when trading is conducted hundreds of times faster than the blink of an eye, writes Jeremy Grant.

But that is how Scott O'Malia, a commissioner at the US Commodity Futures Trading Commission, recently said his agency was still gathering certain kinds of information from traders.

If the federal government wants to help improve the markets, it should fund the regulatory agencies appropriately so that they can at least keep pace with the organizations they are attempting to oversee.


In a way, I find this discussion fairly similar to the debate going on over "Network Neutrality" and the Internet. The great success of the Internet is often, and correctly, attributed to something known as the End-to-End principle, which is that the Internet works best when it is simply a fair, open, and un-biased provider of basic services, and the usage and development of applications which take advantage of those services happens entirely at the "endpoints".

The debate over market operations, and HFT, seems to me metaphorically quite similar to the debates that occur in the Internet world involving traffic routing. Quite frequently, organizations will show up with an idea about how to "fix" the Internet by adding some new routing protocol, traffic prioritization scheme, or other in-network mechanism. But time and time again it has become clear that these methods actually worsen the problems they are intended to fix. Only the end-users of the Internet actually have the necessary information and tools to design applications that adapt properly and behave as expected under a variety of network conditions.

It seems counter-intuitive that the way to make a network, or market, more effective, fair, efficient, and reliable is to make it simpler; most people's intuition when something is not working is to add complexity, not to remove it. But the End-to-End principle has 40 years of success behind it, and is well worth study and appreciation.

Monday, June 7, 2010

Texas and Nebraska

As most of the world waits eagerly for Friday, when the World Cup begins, here in America we are watching the drama surrounding the college football national championship. Currently, this championship is determined using a mechanism that goes by the ungainly name of the Bowl Championship Series, a system which has been around since 1998 but really dates back more than 5 years farther, to the Bowl Coalition. The BCS has had plenty of critics, all the way up to and including President Obama, who announced in a 60 Minutes interview in November 2008 that he was going to "throw my weight around a little".

Nothing seemed to happen for a while after that; of course, the president was busy with the economy, foreign affairs, etc. But people definitely wondered about what he would do. Then, about 6 months ago, the administration actually did something, announcing that it would conduct antitrust reviews of the BCS contracts.

Over the last week, there has been lots of sudden activity, with various administrators of various conferences announcing, or leaking, lots of possible plans. Apparently, the whole thing boils down to Texas and Nebraska, with many scenarios showing the Big 12 collapsing, with half the teams, led by Texas, joining with the Pac-10, and the other half, led by Nebraska, heading for the Big-10.

If you're confused, and most of us are, about what's really going on here, all you really need to know is that college football in America is entertainment, and it's Big Money. How much money? Billions of dollars! The best person that I've seen for explaining this is Yahoo's Dan Wetzel: start here, and then read this for the more recent updates.

Saturday, June 5, 2010

Six days until WC 2010!

Everybody's getting ready, and looking forward to the festivities!

Of course, it's too bad that the Department of State is a bit confused about last century's history:

When the United States hosted the Cup in 1998, ...


Details, details. At least State have noticed that the world's most significant sporting event is just about ready to launch, so we'll give them some credit!

Friday, June 4, 2010

PG Forever

I don't get as much time as I want to play computer games, but I am quite a game lover. One of my favorite computer games was SSI's Panzer General, and its cousin Allied General. I felt those games struck the perfect balance between complexity and fun.

So I was interested to read on Greg Costikyan's PlayThisThing that somebody has done a "fan remake" of Panzer General, called PG Forever. There's a long and detailed description of the PG Forever work in this forum post. The game is apparently available from the developer's website.

If I get some time (hah!) I'll give the game a try. If you've tried it, and have feedback, let me know!

Tuesday, June 1, 2010

Database language wars and web browsers

The web browser community is preparing to re-live the database language wars of the 1980's. Nowadays it's hard to remember what the world was like in the 1980's before SQL was the standard and every vendor implemented at least some version of it. Back in the 1980's, each DBMS vendor tended to have its own language: DB2 and Oracle had SQL, but Ingres had QUEL, ADABAS had NATURAL, Model204 had User Language, IDMS had Online English (I think), Cincom had TOTAL, ADR DATACOM had their own language, and so forth.

It was, put simply, a mess.

Near the end of the 1980's it became clear that SQL had won, and all the other database vendors conceded, and built SQL language interfaces to their systems. All modern DBMS implementations use some form of SQL as their API, and most adhere quite closely to the SQL standards, though there continue to be too many standards and too many inconsistencies.

Now, along come the browser vendors, trying to figure out how to enable persistent local storage for applications that run inside the web browser.

One branch of this effort is Firefox 2's Local Storage, which has become Web Storage, a persistent key-value local storage system.

But local storage has very little in the way of data structuring and query structuring (structuring, of course, is the "S" in SQL...), so the clamor has arisen for providing some sort of "true" database storage layer for web browser-based applications. Why not use SQL? It's well-known, proven, powerful. Indeed, this was the direction that things seemed to be heading, as Safari, Google Gears, and other systems started providing SQL-style APIs for browser-based applications. But this approach did not find favor at Mozilla:

at Mozilla we've felt that exposing SQL directly to the web, with all of its incompatibilities and under-specification is likely to cause huge amounts of pain down the road for developers as incompatible versions emerge each with very different performance characteristics.


So what is Mozilla's alternate suggestion? Over the weekend, they set it out, in a pair of postings.

In the first article, they explain more about their concerns with SQL, and why they sought an alternative:

despite the ubiquity that SQL enjoys, there isn’t a single normative SQL standard that defines the technology
...
In order to really get Web SQL Database right, we’d have to first start with defining a meaningful subset of SQL for web applications. Why define a whole other language, when more elegant solutions exist within JavaScript itself?


I'm not really sure why Mozilla don't believe that a SQL standard exists and adequately defines the technology. SQL-92, after all, has been available for nearly 20 years, and is, if not perfect, surely one of the better-known standards in the computing world. In fact, the strongest argument for using SQL is precisely to avoid the mistake of "defin[ing] a whole other language".

But maybe I'm misunderstanding Mozilla's complaint. Perhaps it isn't that they are opposed to SQL, or to SQL-92. Perhaps it is that they are frustrated because SQLite, the current implementation of choice among other browser vendors, doesn't actually implement SQL-92, or any other well-known variant of the SQL standard, but instead implements its own generally-similar, but not actually standards-compliant, query language. If that's Mozilla's complaint, then I agree 100%; I too am frustrated by SQLite's co-opting of the SQL terminology without implementing a standard SQL system. Whatever SQLite is, it isn't SQL. If any group deserves some heat for defining "a whole other language", it should be SQLite, and I think it's a shame that their early success in getting vendors like Apple and Google to accept their technology has made it so hard to critique and improve their offering.

In their second article, the Mozilla team go on to describe their preferred alternative, which is to get behind the Indexed Database API, a new web browser standard coming out of the W3C. The Mozilla team say:

As a counter-point, both Microsoft and Mozilla have expressed interest in the indexed sequential storage API edited by Oracle's Nikunj Mehta as a more logical choice for indexed databases on the web than existing SQL API implementations. It's a pretty simple API that is low level and is targeted at library developers who can build database tools around it, including implementations like SQL or CouchDB.

The Mozilla team describe their experiences at a developer "summit", where

We watched as developers whiteboarded a simple BTree API that addressed their application storage needs, and this galvanized us to consider other options. We were resolved that using strings representing SQL commands lacked the elegance of a “web native” JavaScript API, and started looking at alternatives.
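
To give a feel for the kind of low-level, "library developers build on top of it" primitive being described, here is a conceptual sketch of a minimal ordered key/value store with range scans. It is written in Java purely as my own illustration; it is not the Indexed Database API, nor anything the working group has actually specified:

import java.util.Map;
import java.util.TreeMap;

// Conceptual illustration only (not the Indexed Database API, and not in
// JavaScript): a minimal ordered key/value store with range scans, the kind of
// low-level primitive on which a SQL engine or a CouchDB-style library could be
// layered.
public class TinyOrderedStore {
    private final TreeMap<String, byte[]> index = new TreeMap<String, byte[]>();

    public void put(String key, byte[] value) {
        index.put(key, value);
    }

    public byte[] get(String key) {
        return index.get(key);
    }

    public void remove(String key) {
        index.remove(key);
    }

    // Ordered range scan: the building block for indexes and query layers.
    public Iterable<Map.Entry<String, byte[]>> range(String fromKey, String toKey) {
        return index.subMap(fromKey, true, toKey, false).entrySet();
    }
}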


Well, good luck, browser world. I know from 30 years' experience that inventing database query languages is a hard business, even if everybody seems to want to do it. And I have a sinking feeling about what's going to happen when a group of people from Mozilla, Google, Microsoft, and Oracle get together to try to define the brave new query language for the brave new world of structured data storage in the browser. Hopefully I am being too pessimistic about this, and these folks will be successful, and their query language will be powerful, useful, and widely adopted.