Welcome To Ask or Share your Answers For Others

algorithm - How does Git create unique commit hashes, mainly the first few characters?



Replied Oct 24, 2021 by 深蓝 (71.8m points)

Git hashes the following information to produce a commit's SHA-1:

The commit's root tree (whose hash in turn depends on every subtree and blob beneath it)
The SHA-1 of the parent commit(s)
The author info (with timestamp)
The committer info (yes, author and committer are recorded separately!), also with timestamp
The commit message

(For the complete explanation, look here.)
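The list above can be made concrete with a short Python sketch. It assumes the standard object format (a `commit <size>\0` header followed by the serialized fields) and ignores signed commits and extra headers; `git_object_hash`, `commit_body`, and the identity strings are hypothetical names and values chosen for illustration. Changing any single field, even only the committer timestamp, produces a completely different SHA-1.

```python
from __future__ import annotations

import hashlib


def git_object_hash(obj_type: bytes, body: bytes) -> str:
    """SHA-1 of a Git object: a '<type> <size>\\0' header followed by the raw body."""
    header = obj_type + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str,
                committer: str, message: str) -> bytes:
    """Serialize the listed fields in the order a plain (unsigned) commit stores them."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, several for a merge
    lines.append(f"author {author}")           # "Name <email> <unix-seconds> <tz-offset>"
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()


# The empty tree has a well-known ID, so it makes a convenient stand-in tree.
EMPTY_TREE = git_object_hash(b"tree", b"")

# Hypothetical identity; the second differs only by a timestamp one second later.
ident = "Jane Doe <jane@example.com> 1635033600 +0000"
later = "Jane Doe <jane@example.com> 1635033601 +0000"

a = git_object_hash(b"commit", commit_body(EMPTY_TREE, [], ident, ident, "Initial commit\n"))
b = git_object_hash(b"commit", commit_body(EMPTY_TREE, [], ident, later, "Initial commit\n"))
print(a)
print(b)
print(a != b)  # True: same tree, author and message, different committer timestamp
```

With Git installed, `git cat-file -p HEAD` prints the real fields of a commit, which you can compare against this layout.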

Git does NOT guarantee that the first 4 characters will be unique. Chapter 7 of the Pro Git book says:

Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass --abbrev-commit to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous:
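The abbreviation rule quoted above (start at seven characters, lengthen until unambiguous) can be sketched as follows. This is a simplified illustration, not Git's implementation: `shortest_unique_prefix` is a hypothetical helper that does a linear scan over a list of object IDs, while Git searches its own object database, and its starting length is configurable through `core.abbrev`.

```python
from __future__ import annotations


def shortest_unique_prefix(full_id: str, all_ids: list[str], minimum: int = 7) -> str:
    """Shortest prefix of full_id (at least `minimum` chars) that no other ID shares."""
    others = [other for other in all_ids if other != full_id]
    for length in range(minimum, len(full_id) + 1):
        prefix = full_id[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return full_id  # fallback; not reached when all IDs are distinct 40-char strings


# Hypothetical 40-character object IDs: the first two share their first 7 characters.
ids = [
    "abc1234" + "0" * 33,
    "abc1234" + "f" * 33,
    "deadbee" + "0" * 33,
]
for object_id in ids:
    print(shortest_unique_prefix(object_id, ids))
# prints abc12340, abc1234f, deadbee (one per line)
```

Only the two colliding IDs get an eighth character; the third stays at the seven-character default.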

So Git simply makes the abbreviation as long as necessary to keep it unique. The book even notes:

Generally, eight to ten characters are more than enough to be unique within a project.

As an example, the Linux kernel, which is a pretty large project with over 450k commits and 3.6 million objects, has no two objects whose SHA-1s overlap more than the first 11 characters.

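So in practice Git does not make the first characters unique; it relies on the great improbability of two objects sharing the same first X characters of their SHA-1, and uses a longer abbreviation whenever they do.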
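To put that improbability in numbers, the birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * 16^k)) estimates the chance that at least two of n objects share their first k hex characters. This sketch assumes SHA-1 outputs behave like uniformly random values; `prefix_collision_probability` is a hypothetical helper, and the object count is the Linux figure quoted from the book.

```python
import math


def prefix_collision_probability(n_objects: int, hex_chars: int) -> float:
    """Birthday-bound estimate of P(some pair of n random SHA-1s shares the first hex_chars chars)."""
    prefixes = 16 ** hex_chars                  # distinct prefixes of that length
    pairs = n_objects * (n_objects - 1) / 2     # pairs of objects that could collide
    return -math.expm1(-pairs / prefixes)       # 1 - exp(-pairs / prefixes), accurate for small values


# Object count quoted for the Linux kernel in the Pro Git book.
for k in (7, 11, 12):
    print(k, round(prefix_collision_probability(3_600_000, k), 3))
```

Under this model a 7-character overlap is practically certain in a repository that size, some pair sharing 11 characters has a probability of roughly 0.31, and at 12 characters it drops to about 0.02, which is consistent with the book's observation that no two objects overlap beyond 11 characters.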


...