Friday, August 28, 2015

GSOC 2015 wrap-up

The 11th year of Google Summer of Code comes to an end.

In April, we accepted 1,051 university students from 73 countries. These students wrote code for 137 mentoring organizations. We also had 1,918 mentors from 70 countries help them out.

Unfortunately, I haven't kept close enough records, but I believe this was my 7th year of participating in GSoC, although one year doesn't count because I messed up the paperwork.

This year, Abhinav Gupta and I worked together on some very interesting and complex projects in Derby. Notably:

  • We addressed several security vulnerabilities in the Derby XML processing logic, which could be exploited by malware to attempt information disclosure attacks on computers running Derby software. (DERBY-6807)
  • We re-designed and re-implemented the data structures and algorithms which Derby uses to track which columns in which database tables are referenced by the database triggers that are in effect. This became more complex recently when Derby added support for trigger "WHEN clauses", which introduce new opportunities for database table references to need to be resolved during trigger execution. (DERBY-6783)
  • We re-factored the DRDA exception-passing logic in the Derby client-server libraries so that client and server code could cooperate on the encoding of exceptions without needing to duplicate code in the underlying code base. As a result, we added the ability for Derby to throw Derby-specific sub-classes of SQLException with additional information available for user applications to access. (DERBY-6773)

I really enjoyed working with Abhinav on these projects, and I hope he enjoyed learning more about Derby and more about the Open Source development process.

And thanks again to Google for your support of this program.

Eleven years, wow!

Thursday, August 27, 2015

Writing a byte and then reading it back

The always-wonderful Raymond Chen posted a completely-brilliant description of an old bug:

There was a bug that said, "The splash screen for this MS-DOS game is all corrupted if you run it from a compressed volume."

What was the real problem? Well, Chen explains:

  • The Windows 95 I/O system assumed that if it wrote a byte, then it could read it back
    The optimization above relied on the property that writing a byte followed by reading the byte produces the byte originally written. But this doesn't work for video memory because of the weird way video memory works. The result was that when the decompression engine tried to read what it thought was the uncompressed data, it was actually asking the video controller to do some strange operations. The result was corrupted decompressed data, and corrupted video data.
  • What is the purpose of the bmPlanes member of the BITMAP structure?
    If you have 16 colors, then you need four bits per pixel. You would think that the encoding would be to have the each byte of video memory encode two pixels, one in the bottom four bits and one in the top four. But for technical reasons, the structure of video memory was not that simple.
  • Machine Organization I
    In EGA and VGA, the Graphics Controller (GC) manages transfers of data among the video memory, CPU registers and the latches.
  • ID3D12Resource::Map method
    your app must honor all restrictions that are associated with such memory


There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy

Tuesday, August 25, 2015

In which people discuss things I don't understand

If you read nothing else, don't miss Matt Buchanan's spectacular essay (first in the list)

  • The Uber Endgame
    One of the more subtle underlying issues with the rise of Uber is the company’s slow siphoning of the political will to fix existing—or build new—public transit infrastructure in major cities. In Affluence and Influence: Economic Inequality and Political Power in America, Princeton Professor of Politics Martin Gilens shows that—as he put it in an article with Northwestern Professor of Decision Making Benjamin Page—“economic elites and organized groups representing business interests have substantial independent impacts on U.S. government policy, while mass-based interest groups and average citizens have little or no independent influence.” As the wealthy—and, as the prices of Uber and Lyft fall, the slightly less so—essentially remove themselves from the problems of existing mass transit infrastructure with Uber and other services, the urgency to improve or add to it diminishes. The people left riding public transit become, increasingly, the ones with little or no political weight to demand improvements to the system.
  • Uber Is Laying The Groundwork For Perpetual Rides In San Francisco
    This week in San Francisco, Uber took a first step toward realizing the vision that Kalanick described. The ride-hail company began experimenting with a new ride option called Smart Routes. The idea is drivers will be able to both pick up and drop off passengers along a specific route, which in turn allows them to quickly pick up their next passenger. For now the company is experimenting with only two routes: Fillmore Street between Haight and Bay, and Valencia Street between 15th and 26th.
  • Uber will partner with University of Arizona for self-driving car research
    Uber is setting up a new self-driving car project at the University of Arizona, according to an email sent out today to university employees. The new project will focus on self-driving car technology, particularly the mapping and optics challenges involved in developing a fully autonomous vehicle. The news comes just months after a major hiring push for Uber's Pittsburgh center, which many complained had hired so many experts away from the local robotics lab that they had effectively gutted competing projects.
  • Platform Cooperativism : Nov 13-14, NYC
    On November 13 and 14, the New School in New York City will host a coming-out party for the cooperative Internet, built of platforms owned and governed by the people who rely on them. The program will include discussion sessions, screenings, monologues, legal hacks, workshops, and dialogues, as well as a showcase of projects, both conceptual and actual, under the purview of celebrity judges. We’ll learn from coders and worker cooperatives, scholars and designers. Together, we’ll put their lessons to work as we work toward usable apps and structural economic change.
  • How The Heavy Hand Of Government Stifles The On Demand Economy
    Ride hailing companies continue to face pressure from courts and politicians who say drivers should be treated as employees rather than independent contractors. Labor unions are pushing this view, while ignoring that many ride hailing drivers are drawn to the flexibility of being independent contractors. (Meanwhile, taxicab drivers in many cities are also considered independent contractors, a fact that is rarely mentioned in these debates.)

Monday, August 24, 2015

It's not just a game, ...

... it's the culmination of a 20-year effort to build a world-class software industry in Poland. Eurogamer chronicles the history for us:

  • Witcher dev making two "AAA+" games for 2014/15
    The Witcher 2 developer CD Projekt is making two blockbuster "AAA+" games for 2014/2015, the company has confirmed to Eurogamer.
  • Seeing Red: The story of CD Projekt
    But those days can wait, because next year Marcin Iwiński will be 40 years old, and it will be 20 years since his CD Projekt adventure began. He dared in those car parks all those years ago, and he has achieved so much. He did not do it alone, he is at pains to point out - at every twist and turn he had help, be it from Michal Kiciński or his brother Adam Kiciński, or Piotr Nielubowicz or Adam Badowski. Without them and many more he wouldn't be here today, sitting before me, wearing a blue hoodie and jeans and a relaxed stubbly smile, surrounded by a company not only continuing to set an example for Poland, but now the wider world as well.
  • The Witcher 3: The Skyrim debate, the game on PS4, nuggets of clarification and a whiff of multiplayer
    He pays attention to what The Witcher fans are saying, and the number-one concern he's seen about The Witcher 3 is that fans think the traditionally tight stories of the series will be sacrificed to fit an open world.

    Not so. "We don't want to make any compromises in storytelling," he told me. "We simply needed to come up with a larger-scale story. That's it. The world is bigger so we need to fill it with good stories.

  • Into the wild: inside The Witcher 3 launch
    As if the Polish Prime Minister wasn't enough, day two had begun with the CD Projekt board having presidential breakfast with Bronislaw Komorowski - remarkable, given that I don't believe Adam Badowski has slept just yet. He disappears for a lie down later when a procession of models and cosplayers from last night's festivities work their way through the office to the accompaniment of drums, delivering invitations to everyone for next week's party. Some 250 people, plus partners, will get together and celebrate their collective achievement. "This will be a time for emotions," Iwiński says.
  • The Witcher 3 was going to include ice-skating - but it was cut
    "At some point we had the idea of ice-skating," he said.

    "It was more than an idea - it was actually a prototype. You were going to ice-skate and fight..."

Sunday, August 23, 2015

I spent the whole weekend reading!

Actually, that's not true. I went to the beach with the cousins on Saturday afternoon.

  • Lessons Learned From Reading Post Mortems
    One of the things I find to be curious about these failure modes is that when I talked about what I found with other folks, at least one person told me that each process issue I found was obvious. But these “obvious” things still cause a lot of failures. In one case, someone told me that what I was telling them was obvious at pretty much the same time their company was having a global outage of a multi-billion dollar service, caused by the exact thing we were talking about. Just because something is obvious doesn’t mean it’s being done.
  • How does a relational database work
    In this simple example, I end up with many possibilities. But a real query can have other relational operators like OUTER JOIN, CROSS JOIN, GROUP BY, ORDER BY, PROJECTION, UNION, INTERSECT, DISTINCT … which means even more possibilities.

    So, how a database does it?

    Dynamic programming, greedy algorithm and heuristic

    A relational database tries the multiple approaches I’ve just said. The real job of an optimizer is to find a good solution on a limited amount of time.

    Most of the time an optimizer doesn’t find the best solution but a “good” one.

    For small queries, doing a brute force approach is possible. But there is a way to avoid unnecessary computations so that even medium queries can use the brute force approach. This is called dynamic programming.

  • bcachefs - a general purpose COW filesystem
    For those who haven't kept up with bcache, the bcache codebase has been evolving/metastasizing into a full blown, general purpose posix filesystem - a modern COW filesystem with checksumming, compression, multiple devices, caching, and eventually snapshots and all kinds of other nifty features.
  • The Programmer's Guide to bcache
    At a high level, bcache's btree is a copy on write b+ tree. The main difference between bcache's b+ tree and others is the nodes are very large (256k is typical) and log structured. Like other COW b+ trees, updating a node may require recursively rewriting every node up to the root; however, most updates (to both leaf nodes and interior nodes) can be done with only an append, until we've written to the full amount of space we originally reserved for the node.
  • Cake Technical Information
    Cake instead schedules packets based on time deficits. If no deficit exists when a packet is requested, it can be sent immediately. The transmit time of the following packet is then calculated, and until that time the shaper is placed in deficit mode. While in deficit mode, packets are scheduled using a watchdog timer whenever a request arrives too soon, and transmission times are calculated for a continuous packet train. This continues until the queue drains; if a packet is requested, but none are available and the next transmission time has been reached, the shaper returns to the quiescent state in which the next packet can be sent immediately.

    Deficit mode makes the burst size dependent only on hardware and kernel latency (including timer resolution), and minimises bursts without requiring manual tuning. Cake's shaper can therefore be set much closer to the actual link speed without jeopardising latency performance. Modern hardware can achieve sub-millisecond bursts in most cases.

  • The impact of fast networks on graph analytics, part 1
    tl;dr: A recent NSDI paper argued that data analytics stacks don’t get much faster at tasks like PageRank when given better networking, but this is likely just a property of the stack they evaluated (Spark and GraphX) rather than generally true. A different framework (timely dataflow) goes 6x faster than GraphX on a 1G network, which improves by 3x to 15-17x faster than GraphX on a 10G network.
  • Multi-million operations per second on a single Google Compute Engine instance
    Often times technology vendors advertise scale-out as a way to achieve high performance. It is a proven approach, but it is often used to mask single node inefficiencies. Without a well balanced system where CPU, memory, network, and local storage are properly balanced, this is simply what we call “throwing hardware at the problem”. Hardware that, virtual or not, customers pay for.

    To demonstrate this, we decided to check Helium’s performance on a single node on Google Cloud Platform with a workload similar to the one previously used to showcase Aerospike and Cassandra (200 byte objects and 100 million operations). With Cassandra, the data store contained 3 billion indices.

  • The most timeless songs, measured using play counts on Spotify
    Until recently, it was impossible to measure the popularity of older music. Billboard charts and album sales only tell us about a song’s popularity at the time of its release.

    But now we have Spotify, a buffet of all of music, new and old. Tracks with fewer plays are fading into obscurity. And those with more plays are remaining in the cultural ether.

  • Designing And Building Stockfighter, Our Programming Game
    Why? Well, a lot of the fun engineering problems in trading are caused by it actually not being reliably the case that you send in an order and it gets unproblematically matched at “the price.” Markets are distributed systems. The exchange’s view of reality and your trading system’s view of reality are, by necessity, separated by the great firewall known as “physics.” For maximum possible results, you have to be able to do things like accurately predict what the future state of the exchange is, because the order you’re composing right now will arrive in the future not the present, while being cognizant that your present view of the exchange’s state is actually the exchange’s past.

Saturday, August 22, 2015

Oh, my goodness! I thought August was when everybody took vacation and nothing got done...

  • Why is Bitcoin forking?
    The problem is that any change, no matter how obvious, can be nixed entirely if it becomes “controversial”, meaning another person with commit access objects. As there are five committers and many other non-committers who can also make changes “controversial” this is a recipe for deadlock. The fact that the block size was never meant to be permanent has ceased to matter: the fact that removing it is debated, is, by itself, enough to ensure it will not happen. Like a committee with no chairman, the meeting never ends. To quote the committer who has pushed hardest for stasis, “Bitcoin needs a leader like a fish needs a bicycle”.
  • The technology behind preview photos
    We started evaluating standard compression techniques to find the best way to compress this data to 200 bytes. Unfortunately, simply entropy encoding the image, with, say, zlib, gets you only a factor of 2. Still too big. We then evaluated a bunch of nonstandard techniques, but we decided it was better to leverage other code/libraries that we had. So, we looked at JPEG image encoding, which is a very popular image codec. Especially since our image is going to be blurred heavily on the client, and thus band-limiting our image data, JPEG should compress this image quite efficiently for our purposes. Unfortunately, the standard JPEG header is hundreds of bytes in size. In fact, the JPEG header alone is several times bigger than our entire 200-byte budget. However, excluding the JPEG header, the encoded data payload itself was approaching our 200 bytes. We just needed to figure out what to do about that pesky header!
  • Getting Garbage Collection for Free
    An example of this occurs when Chrome is showing an animation on a web page. The animation will update the screen at 60 FPS, giving Chrome around 16.6 ms of time to perform the update. As such, Chrome will start work on the current frame as soon as the previous frame has been displayed, performing input, animation and frame rendering tasks for this new frame. If Chrome completes all this work in less than 16.6 ms, then it has nothing else to do for the remaining time until it needs to start rendering the next frame. Chrome’s scheduler enables V8 to take advantage of this idle time period by scheduling special idle tasks when Chrome would otherwise be idle.
  • How the cloud will devour open source
    Take MySQL, for example. The database has changed hands a few times, with Sun acquiring MySQL AB in 2008, then Oracle picking up the asset through its acquisition of Sun the following year. But MySQL, Sun, and Oracle have collectively made a heck of a lot less -- orders of magnitude less -- by selling MySQL-related services than Amazon Web Services has made by selling MySQL ­as­ a­ service (that is, Relational Database Service).

    Nor will MySQL be the last open source project to be more heavily monetized by a cloud giant than by the original developers who brought it into the world.

  • The End of the Internet Dream
    What does it mean for companies to know everything about us, and for computer algorithms to make life and death decisions? Should we worry more about another terrorist attack in New York, or the ability of journalists and human rights workers around the world to keep working? How much free speech does a free society really need?

    How can we stop being afraid and start being sensible about risk? Technology has evolved into a Golden Age for Surveillance. Can technology now establish a balance of power between governments and the governed that would guard against social and political oppression? Given that decisions by private companies define individual rights and security, how can we act on that understanding in a way that protects the public interest and doesn’t squelch innovation? Whose responsibility is digital security? What is the future of the Dream of Internet Freedom?

  • Aggregation and the New Regulation
    It follows, then, that to the degree that governments answer to the people, effective control and regulation of these companies will be even more difficult than regulating the monopolies of old: that’s why Google got its first deal, and it’s why Uber was able to stare down de Blasio. What changed in Google’s case, though, was the Axel Springer article and the widespread attention it received. Similarly, while Amazon is not being accused of antitrust (for now anyways), at least in some small way the company was this weekend forced to respond in a way they usually avoid because of an article. Meanwhile, Uber, seemingly in a worse position politically, emerged from its crisis stronger than ever, confident in its ability to wield the collective influence of its customers to accomplish its political ends.
  • Science Isn’t Broken
    The p-value reveals almost nothing about the strength of the evidence, yet a p-value of 0.05 has become the ticket to get into many journals. “The dominant method used [to evaluate evidence] is the p-value,” said Michael Evans, a statistician at the University of Toronto, “and the p-value is well known not to work very well.”

    Scientists’ overreliance on p-values has led at least one journal to decide it has had enough of them. In February, Basic and Applied Social Psychology announced that it will no longer publish p-values. “We believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research,” the editors wrote in their announcement. Instead of p-values, the journal will require “strong descriptive statistics, including effect sizes.”

  • Sharp Regrets: Top 10 Worst C# Features
    When I was on the C# design team, several times a year we would have "meet the team" events at conferences, where we would take questions from C# enthusiasts. Probably the most common question we consistently got was "Are there any language design decisions that you now regret?" and my answer is "Good heavens, yes!"

    This article presents my "bottom 10" list of features in C# that I wish had been designed differently, with the lessons we can learn about language design from each decision.

  • Eve Version 0
    Version 0 contains a database, compiler, query runtime, data editor, and query editor. Basically, it's a database with an IDE. You can add data both manually or through importing a CSV and then you can create queries over that data using our visual query editor.


    Our original goal was to build a "better programming," one that enabled more people to build software. To that end we set out to find a simpler foundation, a language with few parts that could still produce everything from your vacation planner to machine learning algorithms. We ultimately found our answer in research out of the BOOM lab at Berkeley and took off trying to prove that with such a simple language you could still build real software. We've built compilers, editors, Turing machines, even a clone of Foursquare to prove that our strategy is workable

  • A Children’s Picture-book Introduction to Quantum Field Theory
    In this post I want to try and paint a picture of what it means to have a field that respects the laws of quantum mechanics. In a previous post, I introduced the idea of fields (and, in particular, the all-important electric field) by making an analogy with ripples on a pond or water spraying out from a hose. These images go surprisingly far in allowing one to understand how fields work, but they are ultimately limited in their correctness because the implied rules that govern them are completely classical. In order to really understand how nature works at its most basic level, one has to think about a field with quantum rules.
  • A Beginner’s Guide to Eigenvectors, PCA, Covariance and Entropy
    This post introduces eigenvectors and their relationship to matrices in plain language and without a great deal of math. It builds on those ideas to explain covariance, principal component analysis, and information entropy.
  • Ways to deal with the growing number of CS majors.
    Univ of MD at College Park will have 2100 students in the CS program next year. Thats... a lot! CS is up across the country which is mostly a good thing but does raise some logistical questions. How are your schools handling the increase in the number of CS students? Here are some options I've heard people use.
  • Say goodbye to the weirdest border dispute in the world
    The exchange between India and Bangladesh means that the world will not only lose one of its most unique borders, but it will also lose the only third-order enclave in the world – an enclave surrounded by an enclave surrounded by an enclave surrounded by another state.
  • New Growth in Temescal Alley Means Death for Polymorph Recording
    Set in a cluster of old storage lockers just one block off Telegraph Avenue, the tiny retail corridor feels like it could have sprung from a different century, yet it's also redolent of the handcrafted, pastoral ethos that's characterized new development in Oakland. Temescal Alley has been designated a hipster hotspot in the press and become a go-to destination for First Friday Art Murmur.
  • A Bolivian Subway in the Sky
    The city of La Paz, Bolivia, has long struggled with transportation issues. Steep terrain, high density, and narrow streets have resulted in years of traffic nightmares for fleets of minibuses and private taxis. In the past two years, the government has worked to alleviate this by building the largest urban cable-car system in the world. Currently La Paz has three urban ropeway lines in operation, stretching over 10 kilometers, with plans to triple the size of the network. The city recently announced six new lines, which will extend the aerial system to 30 kilometers and carry up to 27,000 passengers an hour.

Thursday, August 20, 2015

JavaScript is eating the world

OK, so it had been a while since I had restarted Firefox.

But, really, using 7 GB of physical memory?

When I had barely a dozen tabs open?

And closing all those tabs didn't release that memory?


Well, at least restarting it worked around whatever the problem was.

For now.

Restarting my web browser has become the new "when in doubt, use Control-Alt-Delete and reboot your PC".