Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

git fsck: how do --dangling, --unreachable and --lost-found differ?

I've recently found out about git fsck, but the linked answers and git help fsck list various alternative options, some of which seem to mean the same thing to an untrained eye. To use the tool well, I'd love to learn the difference between the commands below:

  • git fsck --dangling
  • git fsck --unreachable
  • git fsck --lost-found

Also, can or should they be used together in some combination, or is that better avoided?

(As a side note, I'm particularly interested in using this in git log -G$REGEX $(git fsck --something), to cast the net as wide as possible, in a faint hope of finding something I remember writing at some point, but that I can't find with a git log -G$REGEX -a.)

1 Reply

Part of the answer is in the git glossary, where we find this:

dangling object

An unreachable object which is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository.

Reachability, which the glossary also defines, is a basic concept in git's commit graph: we start with some external reference, such as a branch or tag name, to get starting points within the graph, then follow the outbound edges from each node to find all the other nodes.

(There's a glossary entry for ref but not for reference, but reference just has its regular dictionary meaning here.)

I think this is best explained illustratively, though. Suppose we have a commit DAG that looks like this:

     C--D--E      <-- branch-a
    /
A--B--F---G--H    <-- branch-b
    \    /
     I--J--K--L   <-- branch-c

Nodes always point left-ish, while possibly also pointing up or down, so node E, for instance, points back at D, which points at C, which points at B, which points at A. (A points nowhere: it is a root node.) Node G is a merge and points back at both F and J. Every node in this graph is reachable: we start from all the external references (branches), walk left-ish, and discover that nodes A through E are on branch-a; nodes A, B, F through H, and (via the merge at G) I and J are on branch-b; and so on. (Note that nodes A and B are on every branch. The fact that a node can be on many branches is one of the things that is a bit unusual about git. In Mercurial, for instance, each node is only ever on one branch. In this particular way, git's branches are fluid while Mercurial's are fixed.)
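To make the walk concrete, here is a minimal Python sketch of the example graph. It is only a model, under these assumptions: single letters stand in for commit hashes, PARENTS and BRANCHES are hypothetical names for the diagram's arrows and labels, and reflogs, tags and the index are ignored. The reachable function follows parent links from the branch tips, which is the same left-ish walk described above.

```python
# Model of the example DAG (hypothetical names; letters stand in for hashes).
# Each commit maps to its parents, i.e. the arrows pointing left-ish.
PARENTS = {
    "A": [],          # root commit: points nowhere
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["D"],
    "F": ["B"],
    "G": ["F", "J"],  # merge commit with two parents
    "H": ["G"],
    "I": ["B"],
    "J": ["I"],
    "K": ["J"],
    "L": ["K"],
}

# Branch names are the external references: name -> tip commit.
BRANCHES = {"branch-a": "E", "branch-b": "H", "branch-c": "L"}


def reachable(parents, tips):
    """Return the set of commits reachable by following parent links from tips."""
    seen = set()
    stack = list(tips)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


for name, tip in BRANCHES.items():
    print(name, "".join(sorted(reachable(PARENTS, [tip]))))
# branch-a ABCDE
# branch-b ABFGHIJ
# branch-c ABIJKL
```

Every letter appears in at least one line, so every node is reachable, and A and B appear in all three lines, matching the observation that they are on every branch.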

Now let's see what happens if we erase one of the branch labels. Let's peel off the branch-a label first.

Commit E no longer has anything pointing to it. It is unreachable and also, in git's terminology, dangling. Commit D has only commit E pointing to it. Since E is unreachable, D is also unreachable, but D is not dangling, because E points to D. C is in the same state as D. Node B, on the other hand, is still reachable: from branch-b by following H to G to F to B, or H to G to J to I to B, and from branch-c by following L to K to J to I to B.

Let's put the branch-a label back (so that C through E are reachable again) and peel off branch-c instead. This time L and K become unreachable. Node J remains reachable, though, by starting with branch-b and working from H to G to J. Of the K and L commits, only L is dangling, because L points to K.
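Both deletion experiments can be checked with the same kind of model. This is a sketch under the same assumptions (letters instead of hashes, hypothetical names, reflogs and the index ignored): an object is unreachable if no remaining tip reaches it, and dangling if, in addition, no object at all points to it.

```python
# Hypothetical model: letters stand in for commit hashes; reflogs are ignored.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F", "J"], "H": ["G"],
    "I": ["B"], "J": ["I"], "K": ["J"], "L": ["K"],
}
BRANCHES = {"branch-a": "E", "branch-b": "H", "branch-c": "L"}


def reachable(parents, tips):
    """Return the set of commits reachable by following parent links from tips."""
    seen = set()
    stack = list(tips)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def unreachable_and_dangling(parents, tips):
    """Split the objects no tip can reach into (unreachable, dangling) sets."""
    unreachable = set(parents) - reachable(parents, tips)
    # Everything that some object points to. A reachable object can only
    # point at reachable objects, so for an unreachable object this means
    # "pointed to by another unreachable object".
    pointed_to = {p for ps in parents.values() for p in ps}
    dangling = unreachable - pointed_to
    return unreachable, dangling


for removed in ("branch-a", "branch-c"):
    tips = [tip for name, tip in BRANCHES.items() if name != removed]
    unreachable, dangling = unreachable_and_dangling(PARENTS, tips)
    print(f"without {removed}: unreachable={sorted(unreachable)} dangling={sorted(dangling)}")
# without branch-a: unreachable=['C', 'D', 'E'] dangling=['E']
# without branch-c: unreachable=['K', 'L'] dangling=['L']
```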

When using git fsck, as I noted in another answer, --lost-found "resurrects" (some) dangling objects by writing their IDs (or, for blobs, their contents) into files under .git/lost-found/commit/ or .git/lost-found/other/, depending on type.

(Remember that commits point back to previous commits, while blobs are just text and never point to anything. You get dangling commits when you delete a branch, or when rebased-and-thus-abandoned commit chains lose their reflog reference, for instance, so they are pretty normal. You get dangling blobs when you git add a file's contents, then either git reset it or git add new contents without committing first, so dangling blobs are pretty normal. git fsck does not save dangling tree or tag objects. Normally there should be no dangling trees: tree objects can only point to more trees and to blobs, and any dangling tree should normally have been pointed-to by a commit; and you have to use git write-tree manually, and then never reference the tree, to get a dangling tree. I'm not sure why tag objects are not resurrected, since accidentally deleting the external reference for an annotated tag will result in dangling tag objects, and it might be nice to be able to get those back.)
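Saving only the dangling commits is enough because every unreachable commit sits behind some dangling one: follow inbound arrows backwards from an unreachable commit and every commit you meet is also unreachable (a reachable commit cannot point at an unreachable one), and since the graph is finite and acyclic the trail must end at a commit that nothing points to, that is, a dangling commit. The Python sketch below checks this on the example with both branch-a and branch-c deleted, using the dangling commits as new starting points, roughly what the IDs saved under .git/lost-found let you do by hand. Assumptions as before: letters stand in for hashes, the names are hypothetical, and reflogs and the index are ignored.

```python
# Hypothetical model: letters stand in for commit hashes; reflogs are ignored.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F", "J"], "H": ["G"],
    "I": ["B"], "J": ["I"], "K": ["J"], "L": ["K"],
}


def reachable(parents, tips):
    """Return the set of commits reachable by following parent links from tips."""
    seen = set()
    stack = list(tips)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


tips = ["H"]  # only branch-b is left; branch-a and branch-c were deleted
lost = set(PARENTS) - reachable(PARENTS, tips)
pointed_to = {p for ps in PARENTS.values() for p in ps}
dangling = lost - pointed_to

# Use the dangling commits as extra starting points, as if their IDs had
# been copied out of .git/lost-found/commit/.
recovered = reachable(PARENTS, dangling) & lost

print(sorted(lost), sorted(dangling), sorted(recovered))
# ['C', 'D', 'E', 'K', 'L'] ['E', 'L'] ['C', 'D', 'E', 'K', 'L']
```

The two dangling commits E and L are enough to get all five lost commits back. Note that the files under .git/lost-found are ordinary files, not refs, so to keep a recovered commit you would still create a branch or tag pointing at it.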

Summary: git fsck detection and restoration of dangling or unreferenced objects

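Unreachable objects are those not reachable from any external reference (principally branch and tag names, though there are others, like refs/stash, used by git stash; by default git fsck also uses the index and all reflog entries as starting points, the latter unless --no-reflogs is given). Dangling objects are a subset of unreachable objects: specifically, those with no inbound arcs at all (in graph-theoretic terms), not even from other unreachable objects. Plain git fsck reports the dangling ones (--dangling is the default, and --no-dangling suppresses them), while git fsck --unreachable reports every unreachable object.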

Adding the --lost-found flag will save the IDs of dangling commits, which gives you a starting point for those commits and hence for any additional unreachable commits behind them, and will decompress and write out the contents of all dangling blob objects.
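For the question's side note (feeding fsck's findings into git log -G), what matters is the shape of fsck's report: each line reads kind, object type, object ID, e.g. "dangling commit <id>" or "unreachable commit <id>". The Python sketch below extracts the commit IDs from such output. The function names are hypothetical, and it assumes the default untranslated English messages (hence LC_ALL=C) and plain 40- or 64-hex-digit IDs, i.e. no --name-objects.

```python
import os
import re
import subprocess
from typing import List

# Matches lines such as "dangling commit <id>" or "unreachable commit <id>".
# Assumes untranslated English output and SHA-1 (40) or SHA-256 (64) IDs.
_FSCK_COMMIT = re.compile(r"^(?:dangling|unreachable) commit ([0-9a-f]{40}|[0-9a-f]{64})$")


def commit_ids_from_fsck(output: str) -> List[str]:
    """Return commit IDs reported in git fsck output, skipping blobs, trees and tags."""
    ids = []
    for line in output.splitlines():
        match = _FSCK_COMMIT.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def fsck_commit_ids(repo: str = ".") -> List[str]:
    """Run git fsck --unreachable --no-reflogs in repo and return commit IDs.

    --no-reflogs also reports commits that only a reflog still mentions,
    which widens the net; LC_ALL=C keeps the messages in English.
    """
    result = subprocess.run(
        ["git", "fsck", "--unreachable", "--no-reflogs"],
        cwd=repo,
        env={**os.environ, "LC_ALL": "C"},
        capture_output=True,
        text=True,
        check=True,
    )
    return commit_ids_from_fsck(result.stdout)


sample = (
    "dangling blob fedcba9876543210fedcba9876543210fedcba98\n"
    "unreachable commit 0123456789abcdef0123456789abcdef01234567\n"
    "dangling commit 89abcdef0123456789abcdef0123456789abcdef\n"
)
print(commit_ids_from_fsck(sample))
# ['0123456789abcdef0123456789abcdef01234567', '89abcdef0123456789abcdef0123456789abcdef']
```

The resulting IDs could then be passed to git log as extra starting points alongside --all, for instance as the argument list ["git", "log", "-G", regex, "--all", *ids]; git log shows each commit only once even when several starting points share history.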
