Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

version control - git pull --rebase lost commits after coworker's git push --force

I thought I understood how git pull --rebase was working, but this example is confusing me. I would have guessed that the following two scenarios would produce identical results, but they differ.

First, the one that works.

# Dan and Brian start out at the same spot:
dan$ git rev-parse HEAD
067ab5e29670208e654c7cb00abf3de40ddcc556

brian$ git rev-parse HEAD
067ab5e29670208e654c7cb00abf3de40ddcc556

# Separately, each makes a commit (different files, no conflict)
dan$ echo 'bagels' >> favorite_foods.txt
dan$ git commit -am "Add bagels to favorite_foods.txt"

brian$ echo 'root beer' >> favorite_beverages.txt
brian$ git commit -am "I love root beer"

# Brian pushes first, then Dan runs `git pull --rebase`
brian$ git push

dan$ git pull --rebase
dan$ git log
commit 9e1140410af8f2c06f0188f2da16335ff3a6d04c
Author: Daniel
Date:   Wed Mar 1 09:31:41 2017 -0600

    Add bagels to favorite_foods.txt

commit 2f25b9a25923bc608b7fba3b4e66de9e97738763
Author: Brian
Date:   Wed Mar 1 09:47:09 2017 -0600

    I love root beer

commit 067ab5e29670208e654c7cb00abf3de40ddcc556
Author: Brian
Date:   Wed Mar 1 09:27:09 2017 -0600

    Shared history

So that works out fine. In the other scenario, imagine that Dan pushed, and then Brian (rudely) push --force'd over his commit. Now, when Dan runs git pull --rebase, his commit is gone.

# Dan and Brian start out at the same spot:
dan$ git rev-parse HEAD
067ab5e29670208e654c7cb00abf3de40ddcc556

brian$ git rev-parse HEAD
067ab5e29670208e654c7cb00abf3de40ddcc556

# Separately, each makes a commit (different files, no conflict)
dan$ echo 'bagels' >> favorite_foods.txt
dan$ git commit -am "Add bagels to favorite_foods.txt"

brian$ echo 'root beer' >> favorite_beverages.txt
brian$ git commit -am "I love root beer"

# THIS TIME, Dan pushes first, then Brian force pushes.
dan$ git push

brian$ git push --force

dan$ git pull --rebase
dan$ git log  # Notice, Dan's commit is gone!
commit 2f25b9a25923bc608b7fba3b4e66de9e97738763
Author: Brian
Date:   Wed Mar 1 09:47:09 2017 -0600

    I love root beer

commit 067ab5e29670208e654c7cb00abf3de40ddcc556
Author: Brian
Date:   Wed Mar 1 09:27:09 2017 -0600

    Shared history

The origin version of the branch had the same state after Brian's push --force as it did in the first scenario, so I expected the same behavior from git pull --rebase. I'm confused about why Dan's commit was lost.

I understand pull --rebase to say "take my local changes, and apply them after the remote ones". I don't expect the local changes to be thrown away. Also, if Dan had run git pull (with no --rebase), his commit would not have been lost.

So why does Dan lose his local commit when he runs git pull --rebase? The force push makes sense to me, but shouldn't that just leave the remote in the same state as if Brian had pushed first?

How am I thinking about this wrong?


1 Reply


TL;DR: it's the fork point code

You are getting the effect of git rebase --fork-point, which deliberately drops Dan's commit from your repository too. See also Git rebase - commit select in fork-point mode (although in my answer there I don't mention something I will here).

If you run the git rebase yourself, you choose whether --fork-point is used. The --fork-point option is used when:

  • you run git rebase with no <upstream> argument (the --fork-point is implied), or
  • you run git rebase --fork-point [<arguments>] <upstream>.

This means that to rebase on your upstream without having --fork-point applied, you should use:

git rebase @{u}

or:

git rebase --no-fork-point
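The scenario from the question can be replayed in a throwaway directory to check this. What follows is only a sketch: the directory names (seed, origin.git, dan, brian) and the "demo" identity are made up for illustration, and it assumes a Git recent enough to accept --no-fork-point.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Isolate the demo from any personal Git configuration (assumption: a scratch HOME is acceptable).
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Shared history, published through a bare "origin".
git init -q seed
(cd seed && echo pizza > favorite_foods.txt && echo water > favorite_beverages.txt &&
  git add . && git commit -q -m "Shared history")
git clone -q --bare seed origin.git
git clone -q origin.git dan
git clone -q origin.git brian

# Each makes one commit; Dan pushes first, then Brian force-pushes over it.
(cd dan && echo bagels >> favorite_foods.txt &&
  git commit -q -am "Add bagels to favorite_foods.txt" && git push -q origin HEAD)
(cd brian && echo 'root beer' >> favorite_beverages.txt &&
  git commit -q -am "I love root beer" && git push -q --force origin HEAD)

# Dan fetches, then rebases with fork-point mode switched off.
cd dan
git fetch -q
git rebase -q --no-fork-point
git log --format=%s
```

With fork-point mode off, Dan's "Add bagels" commit should be copied on top of Brian's commit rather than dropped, so the final log lists all three subjects.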

Some details are Git-version-dependent, as --fork-point became an option only in Git version 2.0 (but was secretly done by git pull ever since 1.6.4.1, with methods growing more complex until the whole --fork-point thing was invented).

Discussion

As you already know, git push --force rudely overwrites the branch pointer, dropping some existing commit(s). You expected, though, that your git pull --rebase would restore the dropped commit, since you already had it yourself. For naming convenience, let's use your naming, where Dan's commit gets dropped when Brian force-pushes. (As a mnemonic, let's say "Dan got Dropped".)

Sometimes it will! Sometimes, as long as your Git has Dan's commit in your repository, and you have Dan's commit in your history, Dan's commit will get restored when you rebase your commits. This includes the case where you are Dan. And yet, sometimes it won't, and this also includes the case where you are Dan. In other words, it's not based on who you are at all.

The complete answer is a bit complicated, and it's worth noting that this behavior is something you can control.

About git pull (don't use it)

First, let's make a brief note: git pull is, in essence, just git fetch followed by either git merge or git rebase.[1] You choose in advance which command to run, by supplying --rebase or setting a configuration entry, branch.<branch-name>.rebase. However, you can run git fetch yourself, and then run git merge or git rebase yourself, and if you do it this way, you gain access to additional options.[2]

The most important of these is the ability to inspect the result of the fetch before choosing your primary option (merge vs rebase). In other words, this gives you a chance to see that there was a commit dropped. Suppose you had run git fetch earlier and obtained Dan's commit, and then (with or without any intervening work, which may or may not have incorporated Dan's commit) you ran a second git fetch. You would see something like this:

 + 5122532...6f1308f pu         -> origin/pu  (forced update)

Note the "(forced update)" annotation: this is what tells you that Dan got Dropped. (The branch name used here is pu, which is one in the Git repo for Git that regularly gets force-updated; I just cut-and-pasted an actual git fetch output here.)
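If you would rather have a script notice this than read the fetch output, one option is to remember where the upstream pointed before the fetch and test whether that old tip is still an ancestor of the new one. A rough sketch, in which the repositories (seed, origin.git, me, other), the commit messages, and the "demo" identity are all hypothetical stand-ins:

```shell
set -e
work=$(mktemp -d) && cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
(cd seed && echo one > file.txt && git add . && git commit -q -m "Shared history")
git clone -q --bare seed origin.git
git clone -q origin.git me
git clone -q origin.git other

# Someone publishes a commit, and we fetch it.
(cd other && echo two >> file.txt &&
  git commit -q -am "Soon to be dropped" && git push -q origin HEAD)
cd me
git fetch -q

# The same person then rewinds and force-pushes a replacement.
(cd ../other && git reset -q --hard HEAD~1 && echo three >> file.txt &&
  git commit -q -am "Replacement" && git push -q --force origin HEAD)

# Remember the old upstream tip, fetch, and compare old against new.
old=$(git rev-parse '@{u}')
git fetch -q
new=$(git rev-parse '@{u}')
if git merge-base --is-ancestor "$old" "$new"; then
  status="fast-forward"
else
  status="forced update"
fi
echo "$status"
```

Here the old tip is no longer reachable from the new one, so the script reports a forced update; after an ordinary push it would report a fast-forward instead.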


[1] There are several niggling technical differences, especially in very old versions of Git (before 1.8.4). There is also, as I was recently reminded, one other special case, for a git pull in a repository that has no commits on the current branch (typically, into a new empty repository): here git pull invokes neither git merge nor git rebase, but rather runs git read-tree -m and, if that succeeds, sets the branch name itself.

[2] I think you can supply all the necessary arguments on the command line, but that's not what I mean. In particular, the ability to run other Git commands between the fetch and the second step is what we want.

Basics of git rebase

The main and most fundamental thing to know about git rebase is that it copies commits. The why is itself fundamental to Git: nothing (no one, and not Git itself) can change anything in a commit (or any other Git object), as the "true name" of a Git object is a cryptographic hash of its contents.[3] Hence if you take a commit out of the database, modify anything, even a single bit, and go to put the object back in, you get a new, different hash: a new and different commit. It can be extremely similar to the original, but if any bit of it is different in any way, it's a new, different commit.
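You can watch the "name is a hash of the contents" rule in action with git hash-object. This small sketch uses blob objects rather than commits, purely because they are easier to produce by hand; the variable names are arbitrary. The same rule applies to commits.

```shell
set -e
# Hash the same content twice, then a version with one character changed.
a=$(printf 'bagels\n' | git hash-object --stdin)
b=$(printf 'bagels\n' | git hash-object --stdin)
c=$(printf 'Bagels\n' | git hash-object --stdin)
echo "$a"
echo "$b"   # identical content, so the identical name
echo "$c"   # one character differs, so the name is entirely different
```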

To see how these copies work, draw at least part of the commit graph. The graph is just a series of commits, starting from the newest—or tip—commit, whose true-name hash ID is stored in the branch's name. We say that the name points to the commit:

            D   <-- master

The commit, which I've called D here, contains (as part of its hashed commit data) the hash ID of its parent commit, i.e., the commit that was the tip of the branch before we made D. So it "points to" its parent, and its parent points further back:

... <- C <- D   <-- master

The fact that the internal arrows are all backwards like this is usually not very important, so I tend to omit them here. When the one-letter names are not very important I just draw a round dot for each commit:

...--o--o   <-- branch

For branch to "branch off from" master, we should draw both branches:

A--B--C--D     <-- master
    \
     E--F--G   <-- branch

Note that commit E points back to commit B.

Now, if we want to re-base branch, so that it comes after commit D (which is now the tip of master), we need to copy commit E to a new commit E' that is "just as good as" E, except that it has D as its parent (and of course has a different snapshot as its source base as well):

           E'  <-- (temporary)
          /
A--B--C--D     <-- master
    \
     E--F--G   <-- branch

We must now repeat this with F and G, and when we are all done, make the name branch point to the last copy, G', abandoning the original chain in favor of the new one:

           E'-F'-G'  <-- branch
          /
A--B--C--D           <-- master
    \
     E--F--G         [abandoned]

This is what git rebase is all about: we pick out some set of commits to copy; we copy them to some new position, one at a time, in parent-first order (vs the more typical child-first backwards Git order); and then we re-point the branch label to the last-copied commit.
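To see the copying for yourself, you can build the graph drawn above in a scratch repository and rebase it. This is just a sketch: the commit() helper, the one-file-per-commit layout, and the "demo" identity are inventions for the demonstration.

```shell
set -e
work=$(mktemp -d) && cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# commit <name>: hypothetical helper that adds one file and commits it
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b branch        # branch forks off from B
commit E; commit F; commit G
git checkout -q master
commit C; commit D

old_tip=$(git rev-parse branch)
git rebase -q master branch      # copy E, F, G so that they come after D
new_tip=$(git rev-parse branch)

git log --format=%s branch       # G F E D C B A, one per line
git cat-file -t "$old_tip"       # the original G still exists, merely abandoned
```

The subjects are unchanged, but the tip of branch now has a different hash ID, and the original G is still in the object database under its old hash ID.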

Note that this works even for the null case. If the name branch points directly to B and we rebase it on master, we copy all zero commits that come after B, copying them to come after D. Then re-point the label branch to the last-copied commit, which is none, which means we re-point branch to commit D. It's perfectly normal, in Git, to have several branch names all pointing to the same commit. Git knows which branch you are on by reading .git/HEAD, which contains the name of the branch. The branch itself—some portion of the commit graph—is determined by the graph. This means the word "branch" is ambiguous: see What exactly do we mean by "branch"?
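The null case is easy to try as well. In this sketch (same caveats: the commit() helper and the "demo" identity are made up), branch starts out pointing directly at B, so the rebase has nothing to copy and simply re-points the label:

```shell
set -e
work=$(mktemp -d) && cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

# commit <name>: hypothetical helper that adds one file and commits it
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git branch branch                # branch points directly at B
commit C; commit D               # master moves on to D

git rebase -q master branch      # zero commits to copy
git rev-parse master branch      # prints the same hash ID twice
cat .git/HEAD                    # ref: refs/heads/branch
```

Both names now identify commit D, and .git/HEAD records that branch is the one you are on.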

Note also that commit A has no parents at all. It's the first commit in the repository: there was no previous commit. Commit A is therefore a root commit, which is just a fancy way to say "a commit with no parents". We can also have commits with two or more parents; these are merge commits. (I did not draw any here, though. It's often unwise to rebase branch chains that contain merges, since it's literally impossible to rebase a merge and git rebase has to re-perform the merge to approximate it. Normally git rebase just omits merges entirely, which causes other problems.)


[3] Obviously, by the Pigeonhole Principle, any hash that reduces a longer bit-string to a fixed-length k-bit key must necessarily have collisions on some inputs. A key requirement for a Git hash function is that it avoid accidental collisions. The "cryptographic" part is not really crucial to Git, it just makes it hard (but of course not impossible) for someone to deliberately cause a collision. Collisions cause Git to be unable to add new objects, so they are bad, but (aside from bugs in the implementation) they don't actually break Git itself, just the further usage of Git for your own data.


Determining what to copy

One problem with rebasing lies in identifying which commits to copy.

Most of the time, it seems easy enough: you want Git to copy your commits, and not someone else's. But that's not always true—in large, distributed environments, with administrators and managers and so on, sometimes it's appropriate for someone to rebase someone else's commits. In any case, this is not how Git does it in the first place. Instead, Git uses the graph.

Naming a commit—e.g., writing branch—tends to select not just that commit, but also that commit's parent commit, the parent's parent, and so on, all the way back to the root commit. (If there is a merge commit, we usually select all of its parent commits, and follow all of them back towards the root simultaneously. A graph can have more than one root, so this lets us select multiple strands going back to multiple roots, as well as branch-and-merge strands going back to a single root.) We call the set of all commits that we find, when starting from one commit and doing these parent traversals, the set of reachable commits.

For many purposes, including git rebase, we need to make this en-masse selection stop, and we use Git's fancy set operations to do that. If we write master..branch as a revision selector, this means: "All commits reachable from the tip of branch, except for any commits reachable from the tip of master." Look at this graph again:

A--B--C--D     <-- master
    \
     E--F--G   <-- branch

The commits reachable from branch are G, F, E, B, and A.
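The reachable sets, and the effect of the two-dot selector, can be checked against a scratch copy of this graph. As before, this is only a sketch, and the commit() helper and "demo" identity are hypothetical:

```shell
set -e
work=$(mktemp -d) && cd "$work"
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

# commit <name>: hypothetical helper that adds one file and commits it
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b branch        # branch forks off from B
commit E; commit F; commit G
git checkout -q master
commit C; commit D

git log --format=%s branch           # reachable from branch: G F E B A
git log --format=%s master           # reachable from master: D C B A
git log --format=%s master..branch   # the difference: G F E
```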

