Thursday, June 30, 2011

Technology marches on

Nowadays, even cables have computers in them!

"Unlike ordinary passive cables that can be used at lower data rates, the unprecedented speed of the new Thunderbolt technology places unique demands on the physical transmission media," according to Gennum's website. "The GN2033 provides the sophisticated signal boosting and detection functions required to transfer high-speed data without errors across inexpensive Thunderbolt copper cables."

As the Ars Technica review notes, this bears a striking resemblance to the early days of FireWire.

Jim Gray follow-up

I was rather startled to realize, the other day, that it's been 4.5 years since Jim Gray, the greatest computer programmer of our time, was lost at sea.

Recently, Professor Joe Hellerstein of Berkeley published a technical report looking back on the amateur search efforts that were organized in the days after Gray's disappearance: Searching for Jim Gray: A Technical Overview.

The report discusses the work of Gray's friends and co-workers, who hoped to aid the search:

an unprecedented civilian search-and-rescue exercise involving satellites, private planes, automated image analysis, ocean current simulations, and crowdsourced human computing, working in collaboration with the US Coast Guard.

They discovered that this was quite a bit harder than they had hoped:

As we learned, real-time search for boats at sea is not as simple as getting a satellite feed from a mapping service, or borrowing a private jet.

As if even those tasks were simple!

Moreover, once they did conduct their image analysis, they learned that:

Our conclusion is simply that the ocean surface is not only very large, but also very empty.

The report also contains an interesting appendix speculating as to why the EPIRB on Tenacious never sounded an alert, and wondering whether the government might at some point:

mandate the installation of maritime safety technology in a failsafe way, as we have done with other technologies like automobile airbags. It is both possible and inexpensive (relative to the cost of a boat) to require EPIRB-like technology to be integrated into boat construction.

The paper is quite interesting, and quite different from the typical computer science technical report. I enjoyed it; you probably will, too.

Tuesday, June 28, 2011

ICGA announcement about Rybka

According to ChessVibes, the International Computer Games Association, organizer of the World Computer Chess Championship, has issued its long-awaited report about Rybka, and has decided to disqualify the program and strip it of its victories.

The ChessVibes article includes almost a dozen detailed supporting documents, which it claims were produced by the ICGA as part of its investigations.

I don't really know what to make of all of this, but three facts are indisputable:

  1. Modern chess software is incredibly strong.

  2. Chess software can be an extremely effective training aid.

  3. Organizers of chess tournaments are increasingly worried about cheating.

SSD activity

All of a sudden, it seems like you can't go anywhere without bumping into a discussion of how SSDs are changing the world. Every conversation seems to be about IOPS, wear-leveling, non-rotational optimization, SSD benchmarking, or MTBF and duty cycles.

An abbreviated selection of some of the interesting recent activity:

  • Robin Harris digs into some of the details about block-level deduplication on SSD devices. Specifically, there is a discussion about the tradeoffs that storage vendors are making: "minimizing capacity use and maximizing data availability are conflicting goals", as Harris observes. The concern is that, if a storage device "de-duplicates" two blocks which the higher-level file system intentionally wrote redundantly, the underlying storage device may be undoing the file system's attempt to avoid a single point of failure. As David Rosenthal puts it:

    File systems write the same metadata to multiple logical blocks as a way of avoiding a single block failure causing massive, or in some cases total, loss of user data. An example is the superblock in the UFS (Unix file system). Suppose you have one of these SSDs with a UFS on it. Each of the multiple alternate logical locations for the superblock will be mapped to the same underlying physical block. If any of the bits in this physical block goes bad, the same bit will go bad in every alternate logical superblock.

    As several of the comments to Harris's article note, it's not clear that this interaction between the filesystem and the underlying storage system is unique to SSD; modern storage vendors have a variety of complex layers of logic which may or may not interact cleanly with the file system's algorithms. Meanwhile, Harris gets in touch with one of the storage vendors to discuss the details of error detection and error correction on SSD storage devices, and posts some interesting responses. One thing that is clear is that modern storage devices have become fantastically complex:

    Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block.

  • On his "Practical Cloud Computing" blog, Siddharth Anand of Netflix has collected a survey of recent information about work on enhancing the Cassandra system for the new storage hardware. In particular, a UK company named Acunu has been hard at work studying Cassandra's system-level behaviors, and, together with researchers from Google, Acunu is proposing a new data structure they call the Stratified B-tree.

  • On the High Scalability blog, Todd Hoff highlights a recent presentation from the O'Reilly Velocity Conference by Artur Bergman of Wikia, talking about their use of a high-end all-SSD storage system. Jeff Darcy responds with the position that RAM, SSD, and disk should be thought of as a storage hierarchy, and arranging the levels of storage carefully is likely to lead to better efficiency.

  • Over at AnandTech, the team digs into the details of Sony's latest Vaio with its screaming 512 GB SSDs.

The one thing that can't be missed is that the new SSD devices are fast, Fast, FAST! You don't have to look very hard to find people relating their amazing experiences with SSD storage:

  • Jeff Atwood says "SSDs are so scorching hot that I'm willing to put up with their craziness"

  • Robin Harris says "faster and cheaper SSDs are rewiring data center architectures"

  • Paul Randal reports that in his benchmarking, "the new v2.2 Fusion-io driver gives a 24% performance boost over the old v1.2 driver"

Here at my day job, our performance lab team recently did an in-depth study of modern server gear and concluded that SSD usage can bring dramatic performance improvements at a surprisingly low price point.

The bottom line is that if you haven't been paying much attention to SSD technology, this is the year when you need to start doing so.

Saturday, June 25, 2011

The lady doth protest too much, methinks

I'm sure that it all is true, but somehow it is ringing a little hollow.

Instant answers. New sources of knowledge. Powerful tools—all for free. In just 13 years we’ve built a model that has changed the way people find answers and helped businesses both large and small create jobs and connect with new customers.

And yet, the Wall Street Journal reports,

The FTC's preparations to subpoena Google are the first concrete signal that the agency's commissioners have decided there is enough evidence to move forward with a formal investigation. The probe is expected to take a year or more to unfold, and it won't necessarily lead to any charges.

Perhaps the big G feels a bit like they're getting it from all sides now.

Friday, June 24, 2011

Job Interview Questions

OK, if The Morning News hasn't finally put a stake in those awful "job interview questions" articles, nothing will. Marvelous!

Thursday, June 23, 2011

Deadlock awareness in application level protocols

TCP/IP and its associated sockets API is an amazing piece of work.

It is a connection-based, streaming, reliable, full-duplex protocol.

  • Connection-based means that you have a definite partner. You aren't just sending your data out randomly, you're sending it to the program on the other end of the connection.

  • Streaming means that your data is delivered in-order. Byte 1 will be delivered before byte 2, which in turn is delivered before byte 3.

  • Reliable, because the TCP/IP implementations take over a number of incredibly important details which each application would otherwise have to build on its own.

  • Full-duplex, meaning that both ends of the connection can be sending and/or receiving at the same time; the network does not mandate the rules about who can send or receive when.

Under the covers, TCP/IP takes the data that you give it, chops it up into packets, check-sums the data, addresses and routes it to its destination, acknowledges delivery, and automatically re-transmits lost or damaged packets.

It also automatically and adaptively detects congestion on the network, and controls the flow of data to ensure fair use of the network resources.

However, there is one thing that TCP/IP doesn't, and can't, provide: infinite capacity. There is some amount of sent-but-not-yet-received data which can be outstanding, but once you hit that limit, no more data may be sent.

Therefore, if you are not careful when you are designing your application-level protocol, it is possible to fall into a simple trap:

  1. On my end, I start sending data to you. I keep sending and sending, and at some point all my buffers fill up and so my write() call blocks, waiting for you to receive some of that data.

  2. However, on your end, you are doing the same thing: you keep sending and sending, and at some point all of your buffers fill up and so your write() call blocks, waiting for me to receive some of that data.

I'm never going to receive your data, because I'm blocked writing.

And you're never going to receive my data, because you're blocked writing.

This is commonly called a "write-write deadlock".

Unfortunately, this is the sort of error that you may not notice until your application has been in production for months or years, particularly if your local testing is performed on machines which are directly connected on local-area networks with large internal buffers.

There are various ways to avoid this problem, such as having programming conventions which ensure that each partner always reads data in preference to writing data, and, when reading data, programs always read as much data as is available before blocking or writing more data.

But the best way to avoid these sorts of problems is to be aware of them while designing and analyzing your application level protocols, so that you are careful not to build programs which fall into this trap.

Tuesday, June 21, 2011

Weekly's intro to stock and options

David Weekly has written an interesting short article entitled An Introduction to Stock & Options for the Tech Entrepreneur or Startup Employee, and has self-published it on Scribd and Amazon.

It won't instantly qualify you for an MBA program, nor, as he notes, should it be the basis of any legal decisions you make, but it's a clear, well-written, and thorough guide that will make you feel much less helpless and useless should you ever be lucky enough to be part of a venture capital-funded startup.

I wish I'd known all these things when I was 25, when it actually mattered. Of course, the bottom line is still that the investors and executives will be the ones that get rich, while you, the engineer, will do the hard work of designing and building the software, but at least you'll start to be able to understand why they are getting rich and why they keep nattering on about stuff that doesn't seem to matter instead of paying attention to running the company (which you foolishly thought was what they were supposed to be doing).

Heh. OK, Bryan. Down, boy. Pretend you've learned those lessons, and moved on.

Monday, June 20, 2011

Two great tastes that taste great together

It's Mathematics! It's Genealogy! What could be better?

Here's a not-so-random search result.

2011 USENIX Annual Technical Conference

The technical program information for the 2011 USENIX Annual Technical Conference appears to be online now. Although the videos are restricted to USENIX members, access to the other technical information is available to all -- kudos to USENIX for making this information available to the broader community!

This conference has a fairly broad scope, with activity in many of the currently hot areas of computing:

  • Virtualization

  • Cloud Computing

  • Storage systems

  • Security

  • and more.

Among the talks that immediately jumped out at me as intriguing were these:

  • vIC: Interrupt Coalescing for Virtual Machine Storage Device IO, Irfan Ahmad, Ajay Gulati, and Ali Mashtizadeh, VMware, Inc.

  • Taming the Flying Cable Monster: A Topology Design and Optimization Framework for Data-Center Networks, Jayaram Mudigonda, Praveen Yalagandula, and Jeffrey C. Mogul, HP Labs. I love the premise of this work:

    Imagine that you have been given the task to design a shipping-container cluster containing 1000 server-class computers. The container provides the necessary power and cooling, you have already chosen the servers, and now you must choose a network to connect the servers within the pod.

  • The Design and Evolution of Live Storage Migration in VMware ESX, Ali Mashtizadeh, Emré Celebi, Tal Garfinkel, and Min Cai, VMware, Inc. -- my co-worker Dave Ackerman was really impressed by this work by the VMWare team.

  • Building a High-performance Deduplication System, Fanglu Guo and Petros Efstathopoulos, Symantec Research Labs

  • Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm, Ashish Chawla, Benjamin Reed, Karl Juhnke, and Ghousuddin Syed, Yahoo! Inc.

  • TidyFS: A Simple and Small Distributed File System, Dennis Fetterly, Maya Haridasan, and Michael Isard, Microsoft Research, Silicon Valley; Swaminathan Sundararaman, University of Wisconsin, Madison

This is just a sampling of what was discussed at the conference. It looks like there was a whole section of the conference devoted to advanced techniques for capturing and replaying system activity under the control of debuggers; another intriguing section had to do with improved scheduling techniques for things like multicore systems, non-uniform memory systems, and GPU-rich systems.

It looks like it was a very interesting conference, and I'm looking forward to digging into it in more detail.

Big day today

I'm 0x32 today!

The weather is beautiful; as they say, this is a fine day on which to begin the rest of my life!

Sunday, June 19, 2011

Un-towning a town

The southern end of California's Central Valley has always been a hard-luck place. Now parts are on the verge of disappearing entirely.

Happy Father's Day

I've finished the first 20 Project Euler problems, listed in order of difficulty at the Project Euler website.

Five more and I get to my first "level", whatever that means...

Friday, June 17, 2011

Putting it all together

Put together a love of books, specifically A Game of Thrones, a love of food, and a reasonable aptitude for computers and blogging, and what do you get? A blog about preparing, serving, and enjoying the meals described in George R.R. Martin's books.

Yummmm... I'm ready for breakfast now!

Thursday, June 16, 2011

The Underground Economy of Fake Antivirus Software

Last night I made my way through the paper: The Underground Economy of Fake Antivirus Software, by a team of researchers from the University of California, Santa Barbara. Although any regular reader of Brian Krebs's weblog will find that they already know a lot of this material, the paper is well-written and fast-paced.

In fact (dare I say it? this is a refereed computer science paper, after all!), the paper is exciting and suspenseful, almost heart-racing.

Unless you've been catatonic for the last decade, you're undoubtedly familiar with the basics of these operations; like most IT professionals, you probably get called in about once a week to untangle your neighbor or other associate from the mess they've stepped in. The paper sums it up succinctly:

The most common form of scareware is fake antivirus (AV) software, also known as "rogue security software." More specifically, a fake AV program impersonates an antivirus scanner and displays misleading or fraudulent alerts in an attempt to dupe a victim into purchasing a license for a commercial version that is capable of removing nonexistent security threats.

So, what did the Santa Barbara team do? Well, they:

have been able to acquire backend servers for several multi-million dollar criminal operations selling fake AV products.


Since we have access to the servers used by these criminal organizations, we are able to directly analyze the tools that are used to create the fake AV products, including programs that assist perpetrators in controlling the malware's behavior and brand names, as well as custom packers that obfuscate the malware to evade detection by legitimate antivirus products.

And what is it that they learned? Well, (quoting again):

  • the modus operandi of the criminals

  • the amount of money involved

  • the victims who purchase the software

  • the affiliate networks that promote the campaigns

  • the flow of money from the victims' credit cards, to the payment processors, to the bank accounts controlled by the criminals.

That is, basically, everything.

As they put it:

This unprecedented access allowed us to obtain ground truth about the type and sophistication of the techniques used to lure victims into paying for scareware, as well as the amount of transactions performed, including refunds and chargebacks.

One of the most chilling sections of the paper is the part where the authors explore the fuzzy, vague, blurred line between modern organized crime, and the core operations of the modern Internet:

An interesting facet of fake AV sales is the process in which credit card transactions are handled. In particular, payment processors (also known as payment service providers) are an integral part of every sale. Without these processors, fake AV operations would not be able to accept credit card payments. This would make it not only harder for a victim to purchase the product (i.e., they would have to use an alternative form of payment, such as cash, check, or money order), but it would also likely raise red flags that the software may be fraudulent. Note that payment processors must maintain a degree of legitimacy, or they risk losing the ability to accept major credit cards.


Perhaps the most notorious payment service provider is Chronopay, which ... has long been associated with processing transactions for various forms of online criminal organizations ... [H]owever ... also provides legitimate services to large organizations such as [ an amazing list of top-shelf names follows ]

I'm not kidding about this paper. You'll think you're reading something from Ludlum or LeCarre, but you're not. This is real life, in the modern world, on the Internet.

Monday, June 13, 2011

Project Euler

Following a pointer from my mother (yes, indeed!), I've been reading through a fascinating article by James Somers: How I Failed, Failed, and Finally Succeeded at Learning How to Code published at The Atlantic's web site.

In the article Somers discusses his own personal history about learning how to program, and describes some of his early attempts at learning to program by reading programming books:

I imagined myself working montage-like through the book, smoothly accruing expertise one chapter at a time.

What happened instead is that I burned out after a week. The text itself was dense and unsmiling; the exercises were difficult. It was quite possibly the least fun I've ever had with a book, or, for that matter, with anything at all. I dropped it as quickly as I had picked it up.

I've read those books, plenty of them, and they are "dense and unsmiling" indeed. Mr. Somers, I feel your pain!

But then Somers got lucky, and stumbled onto an early version of Colin Hughes's wonderful website, Project Euler.

Somers goes on to describe the effect upon somebody who has an aptitude for programming, when they encounter the Project Euler problem set:

What's especially neat about it is that someone who has never programmed -- someone who doesn't even know what a program is -- can learn to write code that solves this problem in less than three hours. I've seen it happen. All it takes is a little hunger. You just have to want the answer.

It's a very insightful description, and quite accurate, I suspect. If you are a natural programmer, then looking at the problems on the Project Euler website will have an immediate effect on you. As soon as you see Problem 1, written almost a decade ago but still just as fresh now as it will be 100 years from now, you aren't going to be able to think about anything else until you go find a computer somewhere close to hand and bash it out:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int answer = 0;
    for (int i = 1; i < 1000; i++)
        if (i % 3 == 0 || i % 5 == 0)
            answer += i;

    printf("Answer to Project Euler problem 1: %d\n", answer);
    return 0;
}

Now, there's no magic here. Programming is hard work, and, like any well-developed skill, it takes practice, practice, and yet more practice (take it away, Abstruse Goose!). Puzzle-solving techniques like these can help you stay motivated, and can help you progress from the simple to the more complex, but it's up to you to persevere, and to sweat the details.

But as Somers notes, this is what learning should be like, and this is what the Web is supposed to be all about:

What you'll find there is something that educators, technologists and journalists have been talking about for decades. And for nine years it's been quietly thriving on this site. It's the global, distributed classroom, a nurturing community of self-motivated learners -- old, young, from more than two hundred countries -- all sharing in the pleasure of finding things out.

I heartily concur.

So thanks, Mom, for sending me the link, and thanks Colin Hughes for the fascinating website, and thanks James Somers for the thoughtful essay.

Now I'm off to work on that next problem...

Saturday, June 11, 2011

Software engineering in the language of lawyers

Have a look at the paragraph below:

A system comprising: units of a commodity that can be used by respective users in different locations, a user interface, which is part of each of the units of the commodity, configured to provide a medium for two-way local interaction between one of the users and the corresponding unit of the commodity, and further configured to elicit, from a user, information about the user's perception of the commodity, a memory within each of the units of the commodity capable of storing results of the two-way local interaction, the results including elicited information about user perception of the commodity, a communication element associated with each of the units of the commodity capable of carrying results of the two-way local interaction from each of the units of the commodity to a central location, and a component capable of managing the interactions of the users in different locations and collecting the results of the interactions at the central location.

This is the core paragraph of United States Patent 7,222,078, which was applied for by Daniel Abelow of Newton Massachusetts in December, 2003, and was granted by the Patent Office in May, 2007.

This patent is currently gathering a lot of attention.

But what language is the above paragraph written in? It is not the language of software engineers: it is complete gibberish to me; it might as well be written in Urdu for all the meaning I can get out of it. After 30 years of software engineering, I can sit down in front of nearly any description of a piece of software and within a few sentences I can grasp what the author is trying to describe, and compare and contrast it to other similar descriptions that were written by software engineers, for software engineers.

This description is written in the language of lawyers, and I have no idea what they are talking about. What is a "commodity", and what are "units of a commodity"? What is a "two-way local interaction"? And what sort of "memory" is it that needs to be "capable of storing results"? I know of no other sort of "memory" that is used in computer software.

Since no software engineers use language like this, there must have been a translation process:

  • Initially, the computer software in question was described by its engineers, in the language of software engineering, stating its design and how it worked.

  • Then, some person or persons must have translated this description into the language of lawyers, presumably because the United States Patent Office only accepts applications written in the language of lawyers.

  • Then, the legal description must have been re-translated back into the language of software engineering, so that experienced software professionals similar to myself could examine the description, understand whether it was meaningful and clear, and determine whether or not it was an invention worthy of a United States Patent.

I suppose that I am faced with two questions, one rather pragmatic, and one more philosophical:

  1. Is it possible to locate either or both of the descriptions of a patent which are written in the language of software engineers, so that I could read, e.g., the description of a patent such as United States Patent 7,222,078 in a language I can understand?

  2. Wouldn't the patent system work better if patents were applied for and granted in their actual language, rather than being translated into the language of lawyers? Since every language translation is fraught with ambiguities and errors, why do we force extra unnecessary translations such as these?

I guess I can take heart in the fact that others are as confused as I am. Strangely, Daniel Abelow's own website, which you might think would actually be written in the language of engineers, for engineers, given that it claims that

Abelow's independent inventions emerged from conceiving a new type of operating environment for individuals, corporations, and societies, to make self-determined improvements in their quality of life.

instead seems to be full of nothing but the same sort of gibberish.

Oh well.

As I said, if you do know of someplace where this patent (or indeed any software patents) are described in the actual language of software engineers, please do let me know.

2011 One Page Dungeon Contest winners announced

I'm several weeks late on this, but in case you hadn't noticed, this year's One Page Dungeon Contest has wrapped up, and the winning submissions are available on the wiki. There's a nice summary on Greg Costikyan's Play This Thing.

The One Page Dungeon Contest is a wonderful exercise that forces contributors to combine various aspects (design, story, technique) of the dungeon-maker's art into a single compact format. As with a number of creative efforts, the discipline of constraining yourself to a single page has a number of benefits; most importantly, it forces you to focus, simplify, and identify the essence of your idea.

As Einstein's rendering of William of Occam's principle has it: everything should be as simple as possible, but no simpler.

I find that this paring-down-to-the-essence technique works wonders in software engineering, too, by the way. When you are concentrating on a bit of code, the question you should always have in your mind is not "what can I add?", but rather "what can I take out?".

But back to dungeons.

I think that the best way to enjoy the One Page Dungeon Contest is to look through the winning submissions, find one which appeals to you, and then look at the bottom of the campaign wiki where the committee has cross-referenced the supporting essays, articles, and blog posts from the authors of the dungeons.

For example, here's the nifty Citadel of Evil, by Stuart Robertson. It's gorgeous, but what's even better is Stuart's blog post, where he goes into a nice explanation of the "pocketmod" format for packing a booklet into a single foldable page with some great references to other uses of that technique.

Similarly, Aaron Frost and Mundi King provide a nice explanation on their web site of the evolution of their design from an idea that was too large to fit down to the gorgeous final result.

Happy dungeon reading!

Mechanics Chess Club article

Here's a nifty short article about the Mechanics Institute Chess Club in San Francisco. Nice pictures, and a nice interview with John Donaldson.

Not much actual chess in the article though. For that, take yourself off to Romania, where the 5th edition of the Bazna King's Tournament gets going in about an hour. Sponsored by the Romanian natural gas company, RomGaz, the tournament takes place in this museum. The lineup is superb; hopefully the games will be superb as well!

Thursday, June 9, 2011

The Architecture of Open Source Applications

A group led by Greg Wilson and Amy Brown has put together Volume 1 of a new book called The Architecture of Open Source Applications.

Wilson was previously the editor of Beautiful Code, a book I found very interesting and stimulating. One of the chapters in Beautiful Code is written by the founders of my day job, but that's not the only, nor even the primary, reason that I enjoyed the book. It's just quite a good book.

The Architecture of Open Source Applications at first blush looks to be relatively similar: it is a collection of contributed essays by a variety of authors; the common thread is that each author is writing about a particular Open Source project. From the overview:

the authors of twenty-five open source applications explain how their software is structured, and why. What are each program's major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to this book provide unique insights into how they think.

The book looks quite intriguing, with a substantial list of well-known authors contributing their thoughts on important projects. Here's the table of contents:

  • Asterisk Russell Bryant

  • Audacity James Crook

  • The Bourne-Again Shell Chet Ramey

  • Berkeley DB Margo Seltzer and Keith Bostic

  • CMake Bill Hoffman and Kenneth Martin

  • Eclipse Kim Moir

  • Graphite Chris Davis

  • The Hadoop Distributed File System Robert Chansler, Hairong Kuang, Sanjay Radia, Konstantin Shvachko, and Suresh Srinivas

  • Continuous Integration C. Titus Brown and Rosangela Canino-Koning

  • Jitsi Emil Ivov

  • LLVM Chris Lattner

  • Mercurial Dirkjan Ochtman

  • The NoSQL Ecosystem Adam Marcus

  • Python Packaging Tarek Ziadé

  • Riak and Erlang/OTP Francesco Cesarini, Andy Gross, and Justin Sheehy

  • Selenium WebDriver Simon Stewart

  • Sendmail Eric Allman

  • SnowFlock Roy Bryant and Andrés Lagar-Cavilla

  • SocialCalc Audrey Tang

  • Telepathy Danielle Madeley

  • Thousand Parsec Alan Laudicina and Aaron Mavrinac

  • Violet Cay Horstmann

  • VisTrails Juliana Freire, David Koop, Emanuele Santos,
    Carlos Scheidegger, Claudio Silva, and Huy T. Vo

  • VTK Berk Geveci and Will Schroeder

  • Battle For Wesnoth Richard Shimooka and David White

At least 75% of these projects and authors are well-known to me; this is a very impressive list.

The book is available online under a Creative Commons license.

However, it only took me about 5 minutes of poking around before I made up my mind, and clicked through to buy a physical copy.

Hopefully it's at least as interesting as Beautiful Code was, and perhaps it will even be better?

Oh, and by the way, they're already planning a Volume 2, and Tiago Espinha, one of the Derby committers, has signed up to write a chapter about Derby! Yay!

Yusef Bey IV found guilty today

Read all about it here.

Tuesday, June 7, 2011


The Google/Oracle patent lawsuit

How high are the stakes in the Google/Oracle patent lawsuit over Java? As Florian Mueller points out, the stakes are very high indeed.

Brewster Kahle and the preservation of physical books

Thank goodness for Brewster Kahle and the Internet Archive for, if nothing else, raising the question of how to preserve our physical heritage.

The goal is to preserve one copy of every published work. The universe of unique titles has been estimated at close to one hundred million items. Many of these are rare or unique, so we do not expect most of these to come to the Internet Archive; they will instead remain in their current libraries. But the opportunity to preserve over ten million items is possible, so we have designed a system that will expand to this level. Ten million books is approximately the size of a world-class university library or public library, so we see this as a worthwhile goal. If we are successful, then this set of cultural materials will last for centuries and could be beneficial in ways that we cannot predict.

Even "centuries", of course, is not very long. But it's far, far better than doing nothing at all.

Saturday, June 4, 2011

Drobo FS first impressions

In our family data center, we've recently been upgrading from our previous file server solution, which was a 3Ware RAID card on the master Linux workstation, to a sparkling new Data Robotics (hmmm maybe now they just call themselves "Drobo"?) Drobo FS.

The box itself is gorgeous and simple. On the back, there are two connections: power, and Ethernet. On the front, the box communicates with lights. There are green, yellow, red, and blue lights, with a variety of meanings. Basically, green means the box is running and the disks are happy; blue tells you how full your system is (we're barely 10% used, though that will quickly grow as we migrate data to the box).

Inside the box, there are five slots, each of which takes a SATA 3.5 inch hard drive. Yes, that's the entire specification of the disk drive requirements:

Accommodates from one to five 3.5” SATA I / SATA II hard drives of any manufacturer, capacity,
spindle speed, and/or cache.

We populated ours with five new Seagate 1 TB drives; at $50 each, with Drobo's "double disk redundancy" enabled, $250 buys about 3 terabytes of fully protected high speed file server storage.

On your network, the Drobo makes itself available as a completely encapsulated Windows file server. Drobo provide a very simple "dashboard" management tool, with which you can connect to the device, configure user access, monitor system conditions, start/stop/restart the device, control network configuration, create and manage filesystems, etc.

By default, the machine grabs a network position using DHCP and then waits patiently for you to connect with the dashboard and start configuring shares.

Connecting to the Drobo from the 4 Windows machines was nearly instant; the Drobo is incredibly friendly to a small Windows network.

Connecting from the Linux box was a bit more challenging, though we soon worked it out. These are the essential steps (for our Ubuntu machine):

  • Ensure that you have the smbfs application suite installed on your Linux box to enable things like mount.cifs

  • Give the Drobo a static IP address on your network for ease of describing mount commands, or at least give it a DNS name that works from your other devices.

  • Using standard Linux commands, mount the Drobo file shares, and away you go!
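For a quick one-off test before making anything permanent, the mount can be done by hand. This is just a sketch: "my-drobo" and "thedroboshare" are placeholder names; substitute the hostname (or static IP) and share name from your own Drobo dashboard.

```shell
# Install the CIFS tools (this is the package that supplies mount.cifs
# on Ubuntu of this era)
sudo apt-get install smbfs

# Create a mountpoint and mount the share for a quick look around.
# Replace "my-drobo" and "thedroboshare" with your own server and
# share names; add credentials options if your share requires them.
sudo mkdir -p /mnt/drobo
sudo mount -t cifs //my-drobo/thedroboshare /mnt/drobo -o guest
```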

I got a little bit confused about how to tell the Linux box to mount and access the device as a non-root user; my first mount attempts resulted in mounting the Drobo so that only the root account on the Linux box could access it.

With a bit of experimentation, we discovered that the mount.cifs tool appears to use some combination of the information in the mount command and the ownership and permissions on the mountpoint to decide how to access the remote filesystem, so we did this:

  • mkdir /my/mount/point

  • chown normal /my/mount/point

  • chgrp normal /my/mount/point

  • in /etc/fstab, add a line for the share: //my-drobo/thedroboshare /my/mount/point cifs user=normal,uid=normaluid,gid=normalgid 0 0

where "normal" is the normal non-root userid we're interested in using from the Linux box.
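Putting those steps together, the whole recipe looks something like the following sketch. The same assumptions apply: "normal" is your everyday account, and the server and share names are placeholders for whatever your own setup uses.

```shell
# Create the mountpoint and hand it to the non-root user
sudo mkdir -p /my/mount/point
sudo chown normal:normal /my/mount/point

# Append a persistent entry to /etc/fstab so the share mounts at boot.
# uid= and gid= map the remote files onto the local "normal" account;
# id -u / id -g look up that account's numeric ids.
echo "//my-drobo/thedroboshare /my/mount/point cifs user=normal,uid=$(id -u normal),gid=$(id -g normal) 0 0" \
    | sudo tee -a /etc/fstab

# Mount everything in fstab that isn't already mounted
sudo mount -a
```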

Once we had that in place, the Linux machine seems just as happy with the Drobo as all the Windows machines, so hopefully we're off to the races.

I wonder how long it will take to go from 10% full to 75% full on the file server?

So far, my only complaint is that it seems to take about 5 minutes to go through a restart cycle. I'm not quite sure why it takes so long; is it doing a complete filesystem verification at startup?

Ars Technica have (as usual) a fantastic detailed two-part review of the Drobo FS, which I will now try to find the time to read thoroughly.

Colossal Cave

Here's a wonderful article about Colossal Cave, and remembering what it was like Back Then.

Don't miss all the great pointers, links, and side discussions in the comments, too!

What in the world?

It's the middle of June, and I'm walking around with my umbrella, because IT'S POURING RAIN OUTSIDE!!!

Friday, June 3, 2011

Thursday, June 2, 2011

My Droid is eating Gingerbread today!

So this morning I'm downloading and installing the latest Android update for my Droid X. This is apparently the "Gingerbread" update, and it contains many new features.

JetBrains updates their Perforce support in IntelliJ

Here's a nice short blog entry from the JetBrains team discussing some new features they've added to the Perforce support in IntelliJ 10.5.

P4CONFIG files are perhaps one of the lesser-known features of Perforce, but they are extremely widely used, so it is nice to see some simple but very useful features for improving their usability.
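For anyone who hasn't run across them: the P4CONFIG environment variable names a settings file (commonly .p4config) that Perforce looks for in the current directory and each parent, so per-tree settings override your global ones. A minimal sketch, where the server address, client, and user names are made up for illustration:

```shell
# In your shell startup, tell Perforce what the per-tree config file
# is called
export P4CONFIG=.p4config

# Then a .p4config at the root of each working tree might contain
# lines like these (all values here are hypothetical):
#   P4PORT=perforce.example.com:1666
#   P4CLIENT=my-workspace
#   P4USER=normal
```

With that in place, p4 commands run from anywhere under a working tree automatically pick up that tree's server, client, and user settings.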

Thanks, JetBrains!