Monday, October 17, 2022

Scalar is now bundled with git 2.38

git 2.38 just came out. As usual, it contains a raft of new features, but in particular it introduces Scalar:

Git’s new built-in repository management tool, Scalar, attempts to solve that problem by curating and configuring a uniform set of features with the biggest impact on large repositories.

If you use git but don't know about Scalar, here's some background information:

  • Scaling Git (and some back story)
    We had an internal source control system called Source Depot that virtually everyone used in the early 2000’s. Over time, TFS and its Team Foundation Version Control solution won over much of the company but never made progress with the biggest teams – like Windows and Office.
  • The largest Git repo on the planet
    we did work in Git and GVFS to change many operations from being proportional to the number of files in the repo to instead be proportional to the number of files “read”. It turns out that, over time, engineers crawl across the code base and touch more and more stuff leading to a problem we call “over hydration”. Basically, you end up with a bunch of files that were touched at some point but aren’t really used any longer and certainly never modified. This leads to a gradual degradation in performance. Individuals can “clean up” their enlistment but that’s a hassle and people don’t, so the system gets slower and slower.

    That led us to embark upon another round of performance improvements we call “O(modified)” which changes the proportionality of many key commands to instead be proportional to the number of files I’ve modified (meaning I have current, uncommitted edits on).
  • Introducing Scalar: Git at scale for everyone
    Scalar is a .NET Core application with installers available for Windows and macOS. Scalar maximizes your Git command performance by setting recommended config values and running background maintenance. You can clone a repository using the GVFS protocol if your repository is hosted by Azure Repos. This is how we will support the next largest Git repository: Microsoft Office.

Note that the last of those articles is almost three years old, but progress hasn't stopped!

It appears that the next step has just happened:

The Story of Scalar
Sparse-checkout definitions are extremely generic. They include matching on file prefix, but also file suffix, or path substring, and any combination. For our target monorepo, we only needed directory matches. With that limited type of pattern in mind, we added a new mode to Git’s sparse-checkout feature: “cone mode” sparse-checkout. A quick prototype of cone mode sparse-checkout demonstrated that Git could reach similar performance as VFS for Git, especially when paired with the filesystem monitor hook. Our critical performance measurement was the git status command, and we were seeing performance within three or four seconds, which was close to the typical case in VFS for Git.

This was promising enough to move forward with a full prototype. We decided to make this a separate project from VFS for Git, so it needed its own name: Scalar.
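
To make the "directory matches only" idea a bit more concrete, here is a rough Python sketch of how I understand cone mode to select files: everything under a chosen directory is included recursively, and so are the files sitting directly in each of that directory's ancestors, up to and including the repository root. The function name `in_cone` and the sample paths are my own inventions for illustration. This is not Git's implementation, just a model of the matching rule.

```python
from pathlib import PurePosixPath


def in_cone(path: str, cone_dirs: list[str]) -> bool:
    """Return True if `path` would be present in a cone-mode checkout.

    `path` and `cone_dirs` are repository-relative, '/'-separated paths.
    This is a hypothetical helper modelling the rule, not Git's own code.
    """
    file_path = PurePosixPath(path)
    parent = file_path.parent
    for d in cone_dirs:
        cone = PurePosixPath(d)
        # Recursive match: the file lives somewhere below a cone directory.
        if cone in file_path.parents:
            return True
        # Ancestor match: the file sits directly in one of the cone
        # directory's ancestors (for example "src" when the cone is "src/app").
        if parent in cone.parents:
            return True
    # Files directly at the repository root are included in every cone.
    return parent == PurePosixPath(".")


if __name__ == "__main__":
    cone = ["src/app"]
    for candidate in ["src/app/main.c", "src/README.md", "README.md",
                      "src/lib/util.c", "docs/guide.md"]:
        print(candidate, in_cone(candidate, cone))
```

The point of restricting patterns this way is that the check needs only prefix comparisons on directory names, rather than evaluating arbitrary glob patterns against every path.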

Back in the day, I spent a full decade working on SCM systems, and I still enjoy geeking out by reading the latest news in the SCM world. This is impressive, amazing stuff, though its importance is hard to convey briefly.
