If you were to summarize the last 15 years of my professional efforts, you'd probably end up with something like:
- Distributed systems
- Cloud computing
- Version control
So I was quite interested to stumble upon a research project named the Ori File System, and to read the team's summary of their early work: "Replication, History, and Grafting in the Ori File System."
As the authors describe, they started out thinking about how cloud computing has changed the needs of filesystem users, and ended up blending ideas from distributed systems and from version control to arrive at their project:
Ori is a file system that manages user data in a modern setting where users have multiple devices and wish to access files everywhere, synchronize data, recover from disk failure, access old versions, and share data. The key to satisfying these needs is keeping and replicating file system history across devices, which is now practical as storage space has outpaced both wide-area network (WAN) bandwidth and the size of managed data. Replication provides access to files from multiple devices. History provides synchronization and offline access. Replication and history together subsume backup by providing snapshots and avoiding any single point of failure.
The paper is fascinating, but rather chaotic: they cover a lot of ground, and do so in a hurry.
Of course, the ideas that they are building on are not new: distributed filesystems have been around for decades, version control has been around for decades, and even the idea of including version control in filesystems has been around for decades (Microsoft's Previous Versions feature, if I recall correctly, first appeared with Windows Server 2003 and later shipped on the desktop in Windows Vista).
They do have some interesting new ideas about how to "graft" filesystems together, and what that might mean:
Ori is designed to facilitate file sharing. Using a novel feature called grafts, one can copy a subtree of one file system to another file system in such a way as to preserve the file history and relationship of the two directories. Grafts can be explicitly re-synchronized in either direction, providing a facility similar to a distributed version control system (DVCS) such as Git. However, with one big difference: in a DVCS, one must decide ahead of time that a particular directory will be a repository; while in Ori, any directory can be grafted at any time. By grafting instead of copying, one can later determine whether one copy of a file contains all changes in another (a common question when files have been copied across file systems and edited in multiple places).
This idea of analyzing history and looking for evidence about the pedigree of a change is a classic version control topic, the sort of thing that practitioners spend all their waking hours contemplating. It's enormously powerful, but devilishly intricate in the details.
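To make the "does one copy contain all the changes in another" question concrete, here is a minimal sketch of the textbook version control answer: treat history as a graph of snapshots with parent links, and ask whether one snapshot is reachable from the other. This is not Ori's implementation; the `contains_all_changes` function, the `history` dictionary, and the single-letter snapshot names are all made up for illustration.

```python
from collections import deque


def contains_all_changes(parents, newer, older):
    """Return True if snapshot `older` is reachable from `newer` via parent links.

    `parents` maps each snapshot id to a list of its parent snapshot ids.
    If `older` is an ancestor of `newer` (or the same snapshot), then `newer`
    already includes every change recorded in `older`.
    """
    seen = set()
    queue = deque([newer])
    while queue:
        snapshot = queue.popleft()
        if snapshot == older:
            return True
        if snapshot in seen:
            continue
        seen.add(snapshot)
        queue.extend(parents.get(snapshot, ()))
    return False


# Hypothetical history: "a" is the shared starting point, "b" and "c" are
# independent edits made on two devices, and "d" merges both of them.
history = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
}

print(contains_all_changes(history, "d", "b"))  # the merge includes b's edits
print(contains_all_changes(history, "b", "c"))  # b and c diverged from each other
```

The intricate part in practice is everything this sketch leaves out: renames, partial copies, and histories that were never recorded in the first place.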
With a project like this, defined mostly as new ways to interconnect older ideas, it's all about the execution of the ideas.
The Ori team have decided to develop their implementation in the open, and have made the source code available from their project page.
It will be interesting to follow this project and see how it develops.