Saturday, August 27, 2011

Microsoft does look at the WER data

Microsoft Windows systems provide a feature known as WER, which stands for Windows Error Reporting. This is a collection of technology which detects crashes on your PC, and attempts to gather information about what appears to have caused the crash, and to send that information to a central Microsoft server.

When WER intercepts a crash, it asks you for permission to send that crash data to Microsoft. You may have said yes when this happened; you may have said no. Regardless, you may have wondered where this information went, and what happened to it.

A recent paper presented by three Microsoft researchers discusses some of the things that Microsoft does with the data it gets: Cycles, Cells and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs.

In this particular case, the researchers were trying to understand what they could learn about the frequency and distribution of hardware problems on commodity machines by analyzing the WER data.

While doing the study, the researchers learned a number of things. For example:

our study has found a number of interesting results. For instance, even small degrees of overclocking significantly degrade machine reliability, and small degrees of underclocking improve reliability over running at rated speed. We also find that faster CPUs tend to become faulty more quickly than slower CPUs, and laptops have lower failure rates than desktops. Beyond our results, this study serves to inform the community that hardware faults on consumer machines are not rare, independent, or always transient.

I found the observed higher reliability of laptops over desktops surprising, and the researchers did, too:

Although one might expect the typically abusive environment of laptops to make them less reliable, Figure 7 shows the opposite. Laptops are between 25% and 60% less likely than desktop machines to crash from a hardware fault over the first 30 days of observed TACT. We hypothesize that the durability features built into laptops (such as motion-robust hard drives) make these machines more robust to failures in general.

Another conclusion is that while disks wear out, memory and CPUs generally don't wear out, or at least nowhere near as quickly. However, once memory fails, it continues to fail:

Almost 80% of machines that crashed more than once from a 1-bit DRAM failure had a recurrence at the same physical address.

It's a pretty interesting and pretty readable paper. It's nice to see that the WER data is actually doing some good, and thanks to Microsoft for sharing the results of their analysis.

It's not just a game

It becomes part of your imagination.

Friday, August 26, 2011

Ten years

Bruce Schneier asks the hard question.

(Furthermore, note that he only asks about the monetary aspect, not even venturing into the question of "how much has been affected in non-monetary changes to our society".)

Thursday, August 25, 2011

Backpacking 2011: Caribou Wilderness

This summer we went backpacking in the Caribou Wilderness, which is near Lassen National Park. Although I've been backpacking in Lassen Park many times, this was my first time in the Caribou Wilderness, and I really enjoyed it.

From BryanBackpacking2011

We got to the Hay Meadow trailhead early on Sunday morning. The trailhead is at 6,400 feet elevation in the Lassen National Forest, and to get there you need to drive about 12 miles of dirt road from Chester, CA; the last 2 miles are pretty rough but once again Mom's truck was up to the job!

On a summer weekend, all California wilderness sites are busy, and the trailhead contained:

  • Two families car-camping together with a 40-foot trailer

  • A group of 4 riders, 4 horses, and 6 dogs, heading out for their Sunday morning constitutional

  • A group of 4 day-hikers from Arizona

It took us a little while to get organized, and by then everyone else was already on the trail.

From BryanBackpacking2011

The southern portion of the Caribou Wilderness is a rolling plateau, thickly forested with lodgepole pine, red fir, and manzanita. Our trail led along a small stream, dried up by late August, through meadows, along ridges, past lovely little lakes nestled in the hollows. Although this land was heavily logged no more than 50 years ago, it has recovered well and it was as pristine and peaceful a wilderness as you could imagine.

From BryanBackpacking2011

After a pleasant day's hike, we set up camp near Posey Lake, a gorgeous lake at almost exactly 7,000 feet. Great campsites are plentiful near these lakes, and we found one with beautiful views of the lake. The skies were clear and we enjoyed warm campfires under the stars.

From BryanBackpacking2011

On our second day, we explored several of the neighboring lakes, including Long Lake, the largest lake in this area of the wilderness. After wandering along Long Lake's oddly shaped shoreline, we returned to camp for a rousing match of camp horseshoes.

The next day, we set out for a nearby peak. In these heavily forested areas, the summits are easy to find on the topo map, but hard to locate on foot; we made a few false starts before we locked in our destination, only to find when we got there that it was an enormous fractured mass of volcanic scree: sharp, unstable, and nearly impossible to climb. We ascended as far as we could up the shoulder and managed to get a few hundred feet above the valley floor; our reward was marvelous views to the west into the higher peaks of Lassen National Park.

From BryanBackpacking2011

Descending from the rock-pile after lunch, we made our way up the canyon on its far side in search of several unnamed lakes that were marked on the map. We soon decided that we had found Bear Canyon: the trees scarred by bears scratching for food, the scat on the valley floor, and several other unmistakable signs left us convinced that a bear had chosen this remote wilderness valley as its home.

From BryanBackpacking2011

Although the canyon was once heavily logged, and was, more recently, burned by a forest fire, the wilderness is quite healthy here, and there were tens of thousands of saplings rising from the valley floor among the burned tree trunks. Secure in the daylight hours, but sticking close together nonetheless, we made our way up through the canyon, visited the lakes, and returned to camp for a swim and a hearty dinner.

From BryanBackpacking2011

A few closing notes and observations:

  • The Red Cinder topo map is the correct one. Surprisingly, Amazon doesn't (yet?) seem to sell topo maps, but MapSport's service was great.

  • The best restaurant in Chester is the Red Onion Grill, next door to the bowling alley. It's rather pricey, but that was the best burger I've had in many months (Roger thought so too).

  • The Google SkyMap software is beautiful and really makes star-gazing fun (if you're not surrounded by lodgepole pines). However, it isn't able to find your location when you're in the wilderness, so be prepared to enter your latitude and longitude manually. I also found that the compass/accelerometer seemed to have trouble tracking my movements as I moved the phone around above my head; perhaps with practice I'd learn to control it better.

  • Roger's SteriPen worked well. Although the pre-filtering process was slow and tedious, the pen itself takes only 90 seconds or so to sterilize a one-quart bottle of water. Not only is it vastly less effort than working the old manual pumps, the pen also claims to eliminate a much broader set of threats; for example, it states that it kills Hepatitis C Virus in the water. Hopefully that isn't yet a major threat in hiking the Sierra Nevada mountains, but it's good to know.

  • The camp deer were incredibly unafraid of us. We had one and sometimes a pair of deer browsing contentedly just 25 feet from camp, and when I left the campfire for bed one night I was astonished when my flashlight illuminated a deer just 10 feet behind where I was sitting.

  • Lots of swifts and swallows during the evening insect feeding, but sadly no bats. Not sure what that means.

  • The prettiest lake, we agreed, was the middle of the Hidden Lakes, which is rock-rimmed and striking. But Posey Lake was secluded and offered great campsites, and was also warm enough for long comfortable swims.

  • Chuck Norris is said to maintain a vacation retreat in Chester. We didn't meet him at the supermarket, though.

If you've never been backpacking in the Caribou Wilderness, it has much to recommend it; if you go, let me know what you think!

Great "computer cleanup" checklist

I love this computer cleanup checklist from Scott Hanselman. Lots of great tips and pointers, and presented quite clearly.

Saturday, August 20, 2011

I'm heading out to the mountains...

I'll take some pictures, and reconnect with you all in a week.

In the meantime, before I go:

  1. If you don't know what the "Nym Wars" are, you should. "Nym" is "pseudonym", and the topic involves the implications of using pseudonyms on the Internet. Jamie Zawinski provides a great place to start learning.

  2. This month's Wired features an epic, thrilling, and fascinating look into the America's Cup. Excerpt:

    To gain maximum leverage they hang off the boat upside down, facing up, with their feet tangled in the netting and everything past their knees cantilevered over the side. The goal is not to bring our wayward hull back to the water but rather to bring it as close to the surface as possible without touching down. Flying the hull eliminates its drag. Flitting across the water, literally and figuratively on edge, the black carbon-fiber boat takes on a distinctly alien, insectoid grace.

  3. Mike Masnick rounds up the latest analysis of the insane world of software patents (the courts are now trying to decide what can fit "entirely in the human mind"); better, he provides a concrete proposal for how to fix the problem.

  4. A lot is coming out this week, but the U may have established a new low for college sports programs. Is this the straw that forces the nation to repair college athletics?

  5. Lastly, and rarely for me on politics, but since I respect the author(s), James Fallows says: If you read one 'serious' newspaper article this weekend...

    Translated: S&P downgraded U.S. debt because they concluded U.S. government was dysfunctional. There will be pressure to prove them wrong. Let's prove them right!

So there you go. Stay busy, work hard, I'll see you soon!

Friday, August 19, 2011

Constant updating

It seems like my automatic updates wizard finds things constantly, nowadays. My computers update themselves, my phone updates itself, my game console updates itself (and its games update themselves, too).

Firefox version 4 was out in the spring, then version 5 in early summer, and we just got version 6.

And Chrome, which seems to have its own slightly different updating methodology, keeps up as well.

It's not that the updating software is stupid; as Jeff Atwood points out, the vendors have put a lot of effort into this and the updates are reliable and fast.

But there are significant annoyances to the updating process:

  1. All too frequently, updates come with re-boots, and re-boots are quite disruptive. It's not just the 10 minutes that the reboot takes, it's that they disrupt the flow, that special arrangement of my computer and work environment that I've found, over the years, makes me extremely productive.

  2. The updaters also desensitize me to security questions. They're constantly asking me to type my password, and to approve changes to the system, and the point is that you want these sorts of requests to be rare, so that people actually take their time and think before they respond.

Oh, well. It's clearly vastly better than the bad old days, when we used to run unpatched versions of software for years without updates.

I just wish I could ratchet down the annoying-ness of the automatic updates process just a little bit.

Thursday, August 18, 2011

26 years!

And you're more beautiful every day :)

Wednesday, August 17, 2011


I particularly like the float-over tooltip of today's Pierre de Fermat Google Doodle.

Tuesday, August 16, 2011

Taking the utmost care over initializing a parameter

The Transmission Control Protocol (TCP) is one of the most-studied algorithms in all of computer science. It is now 30+ years old, and has been very carefully examined and incrementally improved over those decades.

So when somebody comes along and proposes a fundamental change, it's worth looking into that, and understanding what's going on.

Recently, Vern Paxson, Mark Allman, Jerry Chu, and Matt Sargent have proposed RFC 6298, which, at its core, advises implementers to change the initial value of a single TCP parameter, the retransmission timeout (RTO), from 3 seconds to 1 second:

Traditionally, TCP has used 3 seconds as the initial RTO [Bra89] [PA00]. This document calls for lowering this value to 1 second...

As Mark Allman observes in a separate document, while this might seem like a slight detail, it is in fact phenomenally important:

We note that research has shown the tension between responsiveness and correctness of TCP's RTO seems to be a fundamental tradeoff [AP99]. That is, making the RTO more aggressive (via the EWMA gains, lowering the minimum RTO, etc.) can reduce the time spent waiting on needed RTOs. However, at the same time such aggressiveness leads to more needless RTOs, as well. Therefore, being as aggressive as the guidelines sketched in the last section allow in any particular situation may not be the best course of action (e.g., because an RTO carries a requirement to slow down).

So what makes us think, 11 full years since the IETF last gave guidance on the setting of this initial parameter, that it would be best for the entire Internet if it were lowered from 3 to 1?

Well, here it is most illuminating to look at the slides that Jerry Chu presented to the IETF in the summer of 2009, documenting in great detail the findings of an immense survey that Google undertook about the performance behaviors of the modern global Internet. As Chu observes (in these notes):

There are a number of default TCP parameter settings, that, although conservative, have served us well over the years. We believe the time has come to tune some of the parameters to get more speed out of a much faster Internet than 10-20 years ago.

From our own measurement of world wide RTT distribution to Google servers we believe 3secs is too conservative, and like to propose it to be reduced to 1sec.

Why does it matter?

We have seen SYN-ACK retransmission rates upto a few percentage points to some of our servers. We also have indirect data showing the SYN (client side) retransmission to be non-negligible (~1.42% worldwide). At a rate > 1% a large RTO value can have a significant negative impact on the average end2end latency, hence the user experience. This is especially true for short connections, including much of the web traffic.

What's the downside?

For those users behind a slow (e.g., dialup, wireless) link, the RTT may still go up to > 1 sec. We believe a small amount of supriously retransmitted SYN/SYN-ACK packets should not be a cause for concern (e.g., inducing more congestion,...) In some rare case the TCP performance may be negatively affected by false congestion backoff, resulted from dupacks caused by multiple spuriously retransmitted SYN/SYN-ACK packets. We believe there are techniques to detect and mitigate these cases.
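To get a feel for why a retransmission rate above 1% matters, here is a rough back-of-the-envelope sketch in Python. It is my own simplification, not anything from Chu's notes: it assumes each SYN is lost independently with the quoted ~1.42% rate, that the timer doubles after each timeout, and the function name `expected_syn_delay` is made up for illustration.

```python
def expected_syn_delay(loss_rate: float, initial_rto: float, max_retries: int = 5) -> float:
    """Expected extra connection-setup delay, in seconds, caused by lost SYNs.

    Assumptions (mine, for illustration only): every SYN is lost
    independently with probability loss_rate, and the retransmission
    timer doubles after each timeout.
    """
    delay = 0.0
    waited = 0.0           # total time spent waiting after k timeouts
    rto = initial_rto
    for k in range(1, max_retries + 1):
        waited += rto
        # Probability that the first k SYNs were lost and the next one got through
        delay += (loss_rate ** k) * (1.0 - loss_rate) * waited
        rto *= 2.0
    return delay


if __name__ == "__main__":
    for rto in (3.0, 1.0):
        ms = expected_syn_delay(0.0142, rto) * 1000.0
        print(f"initial RTO {rto:.0f}s: about {ms:.1f} ms added per connection on average")
```

Under those assumptions the average penalty works out to roughly 44 ms per connection with a 3-second initial RTO versus roughly 15 ms with a 1-second one. The average hides the real story, of course: the unlucky 1.4% of connections each wait whole seconds, which is an eternity for a short web request.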

As RFC 6298 succinctly observes:

Choosing a reasonable initial RTO requires balancing two competing considerations:

1. The initial RTO should be sufficiently large to cover most of the end-to-end paths to avoid spurious retransmissions and their associated negative performance impact.

2. The initial RTO should be small enough to ensure a timely recovery from packet loss occurring before an RTT sample is taken.
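For readers who want to see where the initial value fits in, here is a minimal Python sketch of the timer computation as I understand it from RFC 6298: the initial RTO is used until the first round-trip sample arrives, after which a smoothed RTT and its variance take over. The class name, the 1 ms clock granularity, and the choice of exactly 60 seconds as the cap are my assumptions (the RFC allows any cap of at least 60 seconds); this is an illustration, not a real TCP stack.

```python
class RtoEstimator:
    """Sketch of the RFC 6298 retransmission timer calculation."""

    ALPHA = 1 / 8     # gain for the smoothed RTT
    BETA = 1 / 4      # gain for the RTT variance
    K = 4             # variance multiplier
    MIN_RTO = 1.0     # seconds; computed RTOs are rounded up to this
    MAX_RTO = 60.0    # seconds; assumed cap

    def __init__(self, initial_rto: float = 1.0, clock_granularity: float = 0.001):
        self.rto = initial_rto        # used until the first RTT sample
        self.g = clock_granularity    # assumed timer granularity, in seconds
        self.srtt = None
        self.rttvar = None

    def on_rtt_sample(self, r: float) -> float:
        """Update the estimate with a measured round-trip time r (seconds)."""
        if self.srtt is None:
            # First measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variance first, using the old smoothed RTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO), self.MAX_RTO)
        return self.rto

    def on_timeout(self) -> float:
        """Back off exponentially when the timer expires."""
        self.rto = min(self.rto * 2, self.MAX_RTO)
        return self.rto
```

A lost SYN is exactly the case where no sample exists yet, so `initial_rto` is the only thing standing between the sender and its first retransmission. That is why this one constant gets so much attention.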

I enjoy reading the IETF discussions because the team does such a wonderful job of documenting its work, sharing both the behind-the-scenes discussions and the final results, and making all of its findings open to all. I wish all technical societies did this; just imagine how much smarter we all could be if information like this were always shared so freely.

Extremely persistent cookies

Think you know how to clear your cookies when you are browsing?

Think you know all the ways that web sites track you and where your browsing takes you?

You haven't even begun to learn about the subject unless you read this marvelous article by Ashkan Soltani about how modern web sites use combinations of cookies, Flash Local Shared Objects, HTML5 local storage, stored JavaScript, and cache ETags to track your activity.
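Of those techniques, the cache ETag trick is the one I found least obvious, so here is a small Python sketch of the general idea. It is a toy simulation of ordinary HTTP cache revalidation, not code from the article, and the handler name `serve_tracking_resource` is hypothetical. The server hands out a unique ETag with a cacheable resource; when the browser later revalidates its cached copy, it sends that value back in `If-None-Match`, even if you have cleared your cookies in the meantime.

```python
import uuid


def serve_tracking_resource(request_headers: dict) -> tuple:
    """Hypothetical handler for a small cacheable resource used as a tracker.

    Returns (status_code, response_headers, visitor_id).
    """
    visitor_id = request_headers.get("If-None-Match")
    if visitor_id:
        # The browser is revalidating its cached copy and has handed back
        # the ETag it was given earlier, which doubles as an identifier.
        return 304, {"ETag": visitor_id}, visitor_id
    # First visit (or the cache was cleared): mint a fresh identifier.
    visitor_id = '"%s"' % uuid.uuid4().hex
    headers = {
        "ETag": visitor_id,
        "Cache-Control": "private, max-age=0, must-revalidate",
    }
    return 200, headers, visitor_id


if __name__ == "__main__":
    status, headers, first_id = serve_tracking_resource({})
    # The browser caches the response along with its ETag, then revalidates later.
    status, _, second_id = serve_tracking_resource({"If-None-Match": headers["ETag"]})
    print(status, first_id == second_id)
```

The identifier lives in the browser cache rather than the cookie jar, so clearing cookies alone does not get rid of it.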

Thorough, detailed, clear, well-presented. Just what you want from a technical article.

Monday, August 15, 2011


On the Internet, predictions are free, so here's mine:

This is actually good news for HTC, LG, Ericsson, Samsung, etc. Google has neither the interest nor the desire to be a consumer-facing device manufacturer; they are a software company and they know it. I predict that, within 18 months, MoMo will be spun back out as a smaller but still viable company; Google will retain the patent portfolio, some selection of the strongest software groups, the most research-oriented of the hardware teams, and little else; there will be a major Google office at 600 North U.S. Highway 45 in the Silicon Prairie, though :)

In the meantime, Google will use that stable of 24,000 cellphone-related patents to keep plenty of intellectual property attorneys and courtroom staff employed, and the end result will be a Supreme Court decision that will annihilate software patents (at least for the time being).

This is, I'm afraid, a flat-out disaster for that company that bought Java and decided it was useful only for litigation purposes, as well as for that most-valuable-company-on-the-planet further South, which was hoping that an early lead and a fanatical user base would enable it to keep prices sky-high and never need to open its mobile iPlatform.

As I said, predictions are free; mine is worth nothing. But if you wanted informed speculation, you'd have looked elsewhere, after all :)

Saturday, August 13, 2011

Family visit 2011

Just wanted to post this wonderful picture from our big family meet-up this summer:

Family Pic?

That's Danny and Emily on the left, the Amideis in the middle, Donna and I on the right, and Amelia on the far right.

Great picture!

Thursday, August 11, 2011

Big data, Bryan style

I had my first encounter with a terabyte dataset today.

At my day job, our product is used across an amazing range of problem sizes:

  • At the low end, many hundreds of thousands of individual engineers quite happily use their own personal server to manage their individual digital work, in complete isolation.

  • At the high end, a single server can manage immense quantities of data under simultaneous use from tens of thousands of users.

Recently, I had the opportunity to investigate a confusing behavior of the server, which only presented its symptoms under certain circumstances. Unfortunately, these circumstances weren't just "run these 3 steps", or "look at this one-page display of data", but rather, "it happens on our production server".

Happily, our user was able to arrange to share their dataset with us, so we embarked on an effort to connect me, the developer, with the symptoms of interest.

When the dataset arrived, I was rather astonished to discover that a 33 GB compressed file expanded to almost 1 TB in size! This was highly textual data, and also contained a fair amount of redundancy, so I had been expecting a high degree of compression, but my expectation was something around 12x, so I initially had confidently selected a 550 GB filesystem and issued the uncompress command there.

Imagine my surprise, when after 2 hours the filesystem filled up and the uncompress was aborted!

After a quick conference call, we realized that we needed something much closer to a terabyte in available space.
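The arithmetic I should have double-checked beforehand looks something like this in Python. The helper names are made up for this post, and "almost 1 TB" is rounded to 1000 GB, so treat the ratio as approximate.

```python
def expanded_size_gb(compressed_gb: float, ratio: float) -> float:
    """Estimated uncompressed size for an assumed compression ratio."""
    return compressed_gb * ratio


def observed_ratio(compressed_gb: float, expanded_gb: float) -> float:
    """Compression ratio actually achieved."""
    return expanded_gb / compressed_gb


if __name__ == "__main__":
    compressed = 33.0       # GB, size of the file as delivered
    filesystem = 550.0      # GB, the filesystem I picked first

    planned = expanded_size_gb(compressed, 12)      # my 12x guess
    print(f"expected {planned:.0f} GB, fits in {filesystem:.0f} GB: {planned <= filesystem}")

    actual = observed_ratio(compressed, 1000.0)     # "almost 1 TB", rounded
    print(f"observed ratio: about {actual:.0f}x")
```

At my guessed 12x, the data would have needed about 396 GB and fit comfortably; at the ratio it actually achieved, around 30x, a 550 GB filesystem never stood a chance.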

Happily, hardware vendors are working hard to keep up with this demand, and our benchmarking team happened to have a machine with a fresh new 4 TB hard drive, so I slipped a few promises their way and the deal was done :) A few hours later, the uncompress was once again running, and a mere 10 hours (!!) later I had my terabyte dataset to play with.

Yes, indeed, the age of big data is upon us. I know I'm still small-potatoes in this world of really big data (check the link out, those guys are amazing!), but I've crossed the 1 TB threshold, so, for me, I think that counts as starting to play with the big boys.

Plus, it's the first problem report I've encountered that I couldn't just drop on my awesome MacPro :)

It's much more than just a game

It's a flat-out obsession.

Monday, August 8, 2011

Dark Silicon will soon be a fact of life

An interesting article in the New York Times discusses the findings of a paper published recently by a team of scientists at the International Symposium on Computer Architecture, entitled: Dark Silicon and the End of Multicore Scaling.

The paper extrapolates several basic trends in computer architecture:

  • Device scaling model (DevM): area, frequency, and power requirements at future technology nodes through 2024.

  • Core scaling model (CorM): power/performance and area/performance single core Pareto frontiers derived from a large set of diverse microprocessor designs.

  • Multicore scaling model (CmpM): area, power and performance of any application for “any” chip topology for CPU-like and GPU-like multicore performance.

The point of this exercise is that, when they work out the numbers and projections carefully, it becomes clear that the power requirements of computer processor chips are growing faster than the ability of application software to use all those transistors effectively.
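To make that tension concrete, here is a toy Python sketch. It is emphatically not the paper's model: it simply combines a fixed power budget (which limits how many cores can be lit at once) with Amdahl's law (which limits how much those cores help a partly serial program). All the numbers in the example run are invented for illustration.

```python
def powered_cores(total_cores: int, watts_per_core: float, power_budget_watts: float) -> int:
    """How many cores can run at once under a fixed chip power budget."""
    return min(total_cores, int(power_budget_watts // watts_per_core))


def dark_fraction(total_cores: int, watts_per_core: float, power_budget_watts: float) -> float:
    """Fraction of the cores that must stay powered off ('dark')."""
    lit = powered_cores(total_cores, watts_per_core, power_budget_watts)
    return 1.0 - lit / total_cores


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    # Made-up chip: 64 cores at 2 W each, but only a 100 W budget.
    lit = powered_cores(64, 2.0, 100.0)
    print(f"lit cores: {lit}, dark fraction: {dark_fraction(64, 2.0, 100.0):.0%}")
    # Made-up workload: 90% of the run time parallelizes perfectly.
    print(f"speedup on lit cores: {amdahl_speedup(0.9, lit):.1f}x")
    print(f"speedup if all 64 could be lit: {amdahl_speedup(0.9, 64):.1f}x")
```

Even in this crude version, both limits bite at once: some of the silicon has to sit dark, and the cores that can be lit deliver far less than a linear speedup unless the software is almost entirely parallel.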

The result, the authors predict, is that we will be facing the practical end of Moore's law much sooner than others have suggested:

The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.

Since the essential problem is that hardware designs are moving to extreme multi-core techniques much faster than software has been able to adapt, the only solution that they can see is to re-double our efforts to improve our software techniques and learn to build software that can effectively use these highly parallel machines:

On the other hand, left to the multicore path, we may hit a “transistor utility economics” wall in as few as three to five years, at which point Moore’s Law may end, creating massive disruptions in our industry. Hitting a wall from one of these two directions appears inevitable. There is a silver lining for architects, however: At that point, the onus will be on computer architects–and computer architects only–to deliver performance and efficiency gains that can work across a wide range of problems. It promises to be an exciting time.

From my perspective, the challenge of building software that more effectively uses multi-core hardware is not insuperable. I think that many software authors have been thinking about this, but haven't been sufficiently motivated to do so because Moore's law just keeps on delivering.

It's not that we can't write efficient, power-sensitive, highly-parallel code, it's just that if we don't need to, then we won't, because we can spend that time adding more features and building more interesting types of applications.

When that changes, we'll change, too.

I can now return to all those other activities I've been deferring ...

... since (yes, I know I'm 3 months late) I've finished the game!

Can the PS3 version access user-generated levels, such as the summer map winners?

Constrained by draft means that you have the right of way

On the water, there are right-of-way rules that define who has to give way in a potential collision situation.

Here's a video showing what happens when you don't know (or follow) the rules properly.

At the start of the video, you can hear the super-tanker blasting its horn.

From what I can tell, everyone was on the windward rail and probably their view of the super-tanker was blocked by the giant pink spinnaker.

Still, I'm sure they heard the horn; I can testify from personal experience (Hi Tom!) that the horn is loud and very hard to miss.

I sure hope that guy that appears to slip off the side just before the sailboat slides around the prow of the tanker hung on and was pulled to safety...

Friday, August 5, 2011

Matt Welsh and his team dig deeply into web performance measurement

Matt Welsh has published an overview essay about the work his team is doing on mobile web performance. It's a great article, providing an overview of the research problems they're focused on, and giving pointers to additional material for further study.

As Welsh observes, measuring mobile web performance is tricky because the experience will:

depend on what the user is trying to do. Someone trying to check a sports score or weather report only needs limited information from the page they are trying to visit. Someone making a restaurant reservation or buying an airline ticket will require a confirmation that the action was complete before they are satisfied. In most cases, users are going to care most about the "main content" of a page and not things like ads and auxiliary material.

He then covers a variety of efforts that are underway to make these measurements more concrete and precise, including the tools at, with their thorough and detailed documentation, and this great presentation from an O'Reilly conference last spring.

I'm enough of a dinosaur to remember the "bad old days" of performance benchmarking, when the whole field was so volatile and commercialized and fraught with acrimony and hostility that Jim Gray had to publish his seminal database benchmarking paper anonymously.

It's great to see the Google team being so open with their work, sharing not only their results, but more importantly their techniques, methods, and reasoning. Thinking about performance is hard, and studying how a team goes about considering the problem (defining metrics, establishing benchmarks, compiling results, analyzing findings) is incredibly helpful for learning to become a better performance analyst in any area of computer science.

So thanks, Googlers, and please keep on sharing what you're finding!

Tuesday, August 2, 2011

Buttermilk fried chicken

For the last few years, my wife has taken to making her fried chicken by marinating the meat in buttermilk. Apparently, according to one of the chefs at Google, this was the way that Elvis Presley preferred his fried chicken, too!

And yes, she's been putting paprika and other pepper-y spices into her recipe for some time, too.

Perhaps this explains the soft spot she's always had in her heart for the King... :)

Monday, August 1, 2011

Hacker Newsletter

I recently subscribed to Kale Davis's interesting Hacker Newsletter, and so far I've been enjoying it very much.

The newsletter is, generally, a distillation of the week's activity on the wildly popular Hacker News website. It is nicely formatted, clearly presented, and the articles that Davis features have been intriguing and worth clicking on; I'd say that's about everything you could hope for from a newsletter such as this.

From what I can tell, Davis has been doing this for a year now, so I guess I wasn't paying much attention since I just came across the newsletter now. But better late than never.

Bryan sez: check it out!