To elaborate a bit more about why the short hash is useful, and why you often don't need the long hash, it has to do with how Git stores things.
The object c26cf8af130955c5c67cfea96f9532680b963628 will be stored in one of two places. It could be in the file .git/objects/c2/6cf8af130955c5c67cfea96f9532680b963628. Note that the first two characters, c2, make up a directory and the rest is the filename. Since many filesystems don't perform well when there are too many files in one directory, this prevents any one directory from having too many files in it and keeps this little directory database efficient.
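As a minimal sketch of that layout, here is how the path for a loose object can be derived from its full hash. The function name `loose_object_path` is made up for illustration; it is not part of Git.

```python
from pathlib import Path


def loose_object_path(git_dir: str, object_id: str) -> Path:
    """Return where a loose object with this full ID would live (hypothetical helper)."""
    # The first two hex characters name the directory; the rest is the filename.
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


print(loose_object_path(".git", "c26cf8af130955c5c67cfea96f9532680b963628"))
```

This only computes the path. Whether the file actually exists depends on whether the object is still loose or has already been moved into a packfile.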
With just the short hash, c26cf8a, Git can do the equivalent of .git/objects/c2/6cf8a* and that's likely to be a single file. Since the objects are split into subdirectories, there aren't many filenames to look through to check whether there's more than one match.
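The prefix lookup can be sketched roughly like this. It is an illustration of the idea, not Git's actual code: `find_loose_objects` is a hypothetical helper, and it only looks at loose objects, ignoring anything stored in packfiles.

```python
from pathlib import Path


def find_loose_objects(git_dir: str, prefix: str) -> list[str]:
    """Return the full IDs of loose objects whose ID starts with `prefix`."""
    prefix = prefix.lower()
    if len(prefix) < 2:
        raise ValueError("need at least the two characters that name the directory")
    # Only one subdirectory has to be listed, not the whole object store.
    subdir = Path(git_dir) / "objects" / prefix[:2]
    if not subdir.is_dir():
        return []
    rest = prefix[2:]
    return sorted(
        prefix[:2] + entry.name
        for entry in subdir.iterdir()
        if entry.name.startswith(rest)
    )


matches = find_loose_objects(".git", "c26cf8a")
if len(matches) == 1:
    print("unique match:", matches[0])
elif matches:
    print("ambiguous prefix:", matches)
else:
    print("no loose object with that prefix")
```

If the list has exactly one entry, the short hash is unambiguous among the loose objects. More than one entry means a longer prefix is needed.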
c26cf8a alone contains enough possibilities, 16^7 or 2^28 or 268,435,456, that it's very unlikely another object in a typical repository will share that prefix.
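To put a rough number on "very unlikely", here is a small back-of-the-envelope calculation. It assumes hashes are spread uniformly over all possible prefixes, and the object count of 10,000 is just an illustrative figure, not something measured from a real repository.

```python
PREFIX_LENGTH = 7
possible_prefixes = 16 ** PREFIX_LENGTH  # 7 hex digits, 4 bits each
print(possible_prefixes)  # 268435456, the same as 2 ** 28


def chance_prefix_is_shared(other_objects: int) -> float:
    """Chance that at least one of `other_objects` has the same 7-character prefix."""
    # Assumes each object's prefix is an independent, uniformly random value.
    return 1 - (1 - 1 / possible_prefixes) ** other_objects


# Illustrative repository size; pick your own number.
print(chance_prefix_is_shared(10_000))
```

The chance grows with the number of objects, so very large repositories may need longer prefixes to stay unambiguous.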
Basically, Git uses the filesystem itself as a simple key/value store, and it can look up partial keys without having to scan the whole list of keys.
That's one way to store objects. More and more, Git stores its objects in packfiles. It's a very efficient way to store just the changes between files. From time to time, your Git repository will examine what's in .git/objects and store just the differences in .git/objects/pack/pack-<checksum>.
That's a binary format. I'm not going to get into it here, and I don't understand it myself anyway. :)