Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

git: why exactly is the claim "git is based on differences between files" wrong?

I know git add saves a new snapshot of a particular file. But I'm a bit confused about the term "snapshot". As I understood Git from these sources (e.g. that one or that one), a snapshot is actually just the difference from the last commit.

Quote from the second source:

it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.

That sounds to me exactly like the description of a system based on file differences -.-

EDIT:

To be a bit more specific:

I understand that if a blob isn't modified, its hash doesn't change, so the same blob is reused in subsequent commits. It also makes sense to me that Git can detect similarities between blobs and thereby eliminate redundancy.

=> Would it be equivalent to running, say, vimdiff (just to illustrate the concept) on a blob and saving the output in a new blob?

=> What does a changed blob look like when it has content in common with other blobs?


He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

Short answer: No, Git always records the entire file.

Longer answer: Okay, that's not quite the whole story. Logically, Git always records the entire file. In the storage backend, however, Git delta-compresses objects when it packs them (e.g. during git gc, or when building a pack for a push or fetch). The delta base can be a similar object from any file in any revision on any branch, not just the same file in the parent commit. Objects that have not been packed yet ("loose" objects) are stored whole, only zlib-compressed. And since the network protocol and the storage backend share the same format ("pack files"), you get the same efficiency for push and fetch.
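To illustrate the gap between the logical model and the storage layer, here is a toy copy/insert delta in Python. It is a simplified sketch: the op format and the function names make_delta and apply_delta are hypothetical, and this is not Git's actual pack-file delta encoding, although Git's deltas are likewise built from "copy from base" and "insert literal bytes" instructions.

```python
from difflib import SequenceMatcher

# Toy delta format (hypothetical, NOT Git's pack encoding): a list of
# ("copy", start, end) ranges taken from the base object, or
# ("insert", data) literal bytes that only exist in the target.
Delta = list[tuple]


def make_delta(base: bytes, target: bytes) -> Delta:
    """Describe `target` as copies from `base` plus inserted literal bytes."""
    ops: Delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": emit the new bytes literally
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: Delta) -> bytes:
    """Rebuild the complete target content from the base plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"line 1\nline 2\nline 3\n"
new = b"line 1\nline 2 (edited)\nline 3\n"
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)  # True
```

The delta never replaces the blob in the object model: when Git reads an object back, it applies the delta and hands you the full content, and the object ID is computed from that full content, not from the delta.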

However, it is important to remember that this is an internal implementation detail of the storage backend. It is not a part of the object model. The object model is that each commit contains the entire tree.
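A minimal sketch of that logical model in Python, assuming a simplified flat "tree" (a dict from file name to blob ID) and a hypothetical in-memory store instead of Git's real object database. Every snapshot names every file with a full-content blob; an unchanged file simply maps to the same blob ID, which is why it costs nothing extra. Only the blob hashing follows Git's SHA-1 blob format; everything else is simplified.

```python
import hashlib

# Hypothetical in-memory object store: object ID -> full content.
store: dict[str, bytes] = {}


def blob_id(content: bytes) -> str:
    """Git-style blob ID: SHA-1 of the header b'blob <size>\\0' plus the content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Record every file in full; return a flat name -> blob ID mapping."""
    tree = {}
    for name, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # identical content is stored only once
        tree[name] = oid
    return tree


v1 = snapshot({"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
v2 = snapshot({"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"})

print(v1["README"] == v2["README"])  # True: unchanged file, same blob
print(v1["main.c"] == v2["main.c"])  # False: changed file, new full blob
print(len(store))                    # 3 objects, not 4
```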

This is Git's object model:

  • blob: a bytestream. Basically, a file, but only its contents. It doesn't have a name. In this respect Git works like a Unix filesystem: files themselves don't have names; directories associate names with files.

  • tree: a flat(!!!) list of (mode, name, {tree|blob}) triples, i.e. each tree lists only its immediate entries. This is the equivalent of a Unix directory. It associates names and modes (mainly executable or not) with blobs or trees. Because an entry can itself be a tree, trees nest recursively.

  • commit: a pointer to a tree and pointers to zero, one, or more parent commits. It also records an author and a committer (each with a name, an email address, and a timestamp) and, most importantly, the commit message.

  • (local or "lightweight" tag): technically not a Git object, just a ref (a name under refs/tags/) pointing to a commit.

  • annotated tag: a real object containing a pointer to a commit (in principle any object), a tag name, the tagger, and an annotation message.

  • signed tag: not a separate object type, but an annotated tag whose message has a digital (e.g. GPG) signature appended. It neither wraps nor duplicates another tag object.

  • note: a piece of text that can be attached to any Git object (notes are stored as ordinary blobs in a separate notes ref, refs/notes/commits by default). This can be used to add arbitrary user-defined metadata to any Git object. For example: a CI server could attach code coverage results to commits; a bug tracker could attach ticket links to the commits that fix the bugs; a web server could attach MIME types to blobs; a release management system could attach go/no-go votes to annotated tags; and so on.
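To make the object model concrete, here is a sketch of how blob, tree, commit, and annotated tag objects are laid out and hashed in a SHA-1 repository (shown uncompressed; on disk Git additionally zlib-compresses loose objects, and SHA-256 repositories use a different hash). The helper names are made up for illustration and are not a Git API, and the author identity in the usage lines is invented. Notice that only the blob body holds file data: the tree body is names, modes, and binary IDs, and the commit and tag bodies are text metadata plus IDs.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """ID of a Git object in a SHA-1 repository: SHA-1 of '<type> <size>\\0' + body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def blob(content: bytes) -> tuple[str, bytes]:
    # A blob is only the bytes: no name, no mode, no pointers.
    return object_id("blob", content), content


def tree(entries: list[tuple[str, str, str]]) -> tuple[str, bytes]:
    """entries: (mode, name, hex ID) for the immediate children only (flat!)."""
    # Git sorts entries by name, comparing subtree names as if they ended in "/".
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return object_id("tree", body), body


def commit(tree_id: str, parents: list[str], author: str, committer: str,
           message: str) -> tuple[str, bytes]:
    """author/committer look like 'Name <email> <unix time> <tz offset>'."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or many parents
    lines += [f"author {author}", f"committer {committer}"]
    body = ("\n".join(lines) + "\n\n" + message).encode()
    return object_id("commit", body), body


def annotated_tag(target_id: str, target_type: str, name: str, tagger: str,
                  message: str) -> tuple[str, bytes]:
    # A signed tag is this same object with a signature block appended to the message.
    body = (f"object {target_id}\ntype {target_type}\ntag {name}\n"
            f"tagger {tagger}\n\n{message}").encode()
    return object_id("tag", body), body


# Usage: one file, one tree, one root commit, one tag.
readme_id, _ = blob(b"hello\n")
tree_id, _ = tree([("100644", "README", readme_id)])
who = "A U Thor <author@example.com> 1700000000 +0000"  # made-up identity
commit_id, _ = commit(tree_id, [], who, who, "Initial commit\n")
tag_id, _ = annotated_tag(commit_id, "commit", "v1.0", who, "First release\n")
print(blob(b"")[0])  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```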

Note that only blobs actually contain file data. Everything else is pointers plus metadata. And blobs don't have names: a blob's ID is computed solely from its content (plus a small type/size header), so as long as the content is the same, it is the same blob and exists only once in the object store. In fact, it has the same ID in every repository that uses the same hash algorithm, so in that sense it exists only once in the entire Git universe! For example, the FSF's GPL COPYING file, as long as you keep it unmodified, will be the exact same blob, even in totally unrelated repositories!
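A small sketch of that name independence, assuming nothing beyond Git's SHA-1 blob hashing: the "repositories" are plain dicts and the license text is a placeholder, not the actual COPYING file. The name a file is stored under never enters the hash, and a one-byte edit produces an entirely different blob. That is also what a "changed" blob looks like: simply another complete blob, with any sharing against the old version happening only in pack-file deltas.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Matches `git hash-object --stdin` for this content in a SHA-1 repository.
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


# Placeholder text standing in for a license file (not the real GPL text).
license_text = b"Placeholder license text.\n"

# Two unrelated "repositories" (plain dicts here) storing it under different names.
repo_a = {"COPYING": license_text}
repo_b = {"docs/LICENSE": license_text}

print(blob_id(repo_a["COPYING"]) == blob_id(repo_b["docs/LICENSE"]))  # True

# Changing a single byte yields an unrelated ID: a modified file is a new,
# complete blob, not a patch on top of the old one.
edited = license_text.replace(b".", b"!")
print(blob_id(edited) == blob_id(license_text))  # False
```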



...