Journal of a Programmer: The end of SHA-1, one month later

As everyone already knows, the SHA-1 cryptographic hash function has been Shattered.

This is no particular surprise; as Bruce Schneier pointed out on his blog nearly five years ago, cryptography experts were already well aware that SHA-1 was vulnerable. Schneier quoted Jesse Walker as saying:

A collision attack is therefore well within the range of what an organized crime syndicate can practically budget by 2018, and a university research project by 2021.

Pretty good estimate, I'd say, Mr. Walker.

But what does this mean, in practice?

Perhaps the most visible impact is in the area of network security, where Google has been warning about problems for quite some time, and started putting those warnings into action last fall: SHA-1 Certificates in Chrome

To protect users from such attacks, Chrome will stop trusting certificates that use the SHA-1 algorithm, and visiting a site using such a certificate will result in an interstitial warning.

Other large internet sites have followed suit; kudos to them for doing so quickly and responsibly.

Another very interesting aspect of this hash collision arises in what are known as "content-addressable file systems", of which git is the best known. This is a very significant issue, as the Shattered web site points out:

It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one. An attacker could potentially selectively serve either repository to targeted users. This will require attackers to compute their own collision.

And it doesn't just affect git; Subversion is vulnerable, as is Mercurial.

People are right to be worried about this.

However, when it comes to the SCM issue, I think that the matter isn't completely cut-and-dried, for several reasons:

Firstly, we're talking about an issue in which an attacker deliberately constructs a collision, as opposed to an accidental collision. The use of SHA-1 identifiers for git objects remains a useful, practical, and trouble-free technique for allowing people to collaborate independently on common computer files without sharing a central server (the so-called DVCS paradigm). In the 12 years that git has been in use, across the trillions of git object SHAs that have been computed, nobody anywhere in the world has reported an accidental collision in practice.

Secondly, this resistance to accidental collisions is reinforced by the fact that git hashes more than just the file's content: every object is prefixed with its type (blob/tree/commit/tag) and its length, and commit objects also include ancillary data such as timestamps. I'm not saying this makes git any safer from a security point of view; after all, Google arranged for their two colliding PDF files to both be exactly 422,435 bytes long. But it does mean that the accidental collision risk is clearly quite small.

And, of course, for the attacker to actually supplant "a benign source code" with "a backdoored one," it is not enough to construct the alternate file (of identical length and identical SHA-1, but with evil content); that backdoored file also has to be valid source code. It is no easy task to satisfy this additional constraint, even for an attacker wealthy enough to spend "9,223,372,036,854,775,808 SHA1 computations". I'd imagine that the task gets somewhat easier as the source file gets larger. A certain amount of the backdoored file is necessarily consumed by the source code of the evil payload itself, so the attacker must fit the rubbish needed to make the SHA-1 values line up into the remainder of the file, and the smaller that remainder is, the harder it will be to generate a matching SHA-1, right? So it's one more reason to keep your individual source files small?
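
To make the point about what git actually hashes a bit more concrete, here is a minimal Python sketch of how a blob's ID is derived, together with a back-of-the-envelope birthday bound on accidental collisions. The function names are my own, the figure of one trillion objects is purely illustrative, and the bound assumes SHA-1 behaves like an ideal random function on non-adversarial input, which is precisely the assumption that fails against a deliberate attacker.

```python
import hashlib
from fractions import Fraction


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes a header ("<type> <length>\0") followed by the raw bytes,
    # not the bare file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def accidental_collision_bound(num_objects: int, hash_bits: int = 160) -> Fraction:
    """Upper bound on the chance that any two of num_objects share a hash.

    Union (birthday) bound: number of pairs divided by the size of the
    hash space. Assumes hashes are uniformly random, i.e. no attacker.
    """
    pairs = num_objects * (num_objects - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


if __name__ == "__main__":
    data = b"hello world\n"
    print("git blob id:", git_blob_id(data))
    print("bare SHA-1: ", hashlib.sha1(data).hexdigest())
    # One trillion objects is an illustrative assumption, not a measurement.
    print("collision bound:", float(accidental_collision_bound(10 ** 12)))
    # The attack cost quoted from the Shattered site is 2**63 computations.
    print("2**63 =", 2 ** 63)
```

For the same bytes, `git_blob_id` should agree with what `git hash-object` reports, while the bare SHA-1 of the content differs because of the header. The bound for an accidental collision is vanishingly small next to the 2**63 computations of the deliberate attack, which is the distinction the list above is drawing.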

The above was too many words; what I'm trying to point out is this:

With SSL/TLS, people use SHA-1 to provide security.

With git/Mercurial, people use SHA-1 to provide decentralized object identification workflows, for easier collaboration among trusted teams.

The crucial difference between using SHA-1 values to validate network security certificates and using them to assign source code file identifiers lies in the different ways that humans use these two systems.

That is, when you connect to a valuable web site using SSL/TLS, you are depending on that certificate's signature to establish trust in your mind between yourself and some remote network entity.

But when you share source code with your team, collaborating through a tool like Mercurial, Subversion, or git, there are, crucially, other trust relationships in effect between you and your fellow collaborators.

So, yes, be careful from whom you download a git repo full of source code that you intend to compile and run on your computer.

But wasn't that already true, long before SHA-1 was broken?

Tuesday, March 28, 2017
