Is there something fundamentally wrong in the way I am thinking about merge?
Yes:
... git diff src:somefile dst:somefile
...
This isn't what merge does. Merging a branch does not mean "make my files the same as theirs".
The commit graph
Let's start with a bit of commit-graph drawing:
... <- B <- C <- D <-- branch
Here, each uppercase letter stands in for a commit ID, i.e., one of those big ugly 40-character SHA-1 IDs like f931ca0... Each commit records a source tree (this is how, for instance, you can use <id>:path to see that file's contents as of that commit ID). It also records some parent commit ID(s): usually just one ID, but sometimes more than one (a "merge" commit) and rarely zero parent IDs (a "root" commit).
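To see the tree and parent records for yourself, something like the following should work. It is only a sketch in a throwaway repository: the file name, commit messages, and identity settings are made up for illustration.

```shell
set -eu
# Throwaway repository; nothing here touches an existing repo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"

echo "hello" > README.txt
git add README.txt
git commit -q -m "root commit"
echo "more" >> README.txt
git commit -q -am "second commit"

# The raw commit object: a "tree" line, then zero or more "parent" lines.
git cat-file -p HEAD

# <id>:path names the file as stored in that commit's tree.
first=$(git rev-parse HEAD~1)
old_contents=$(git show "$first:README.txt")
echo "$old_contents"

# Count the parent lines: none for the root commit, one for the second commit.
root_parents=$(git cat-file -p "$first" | grep -c '^parent ' || true)
head_parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)
echo "root has $root_parents parent(s), HEAD has $head_parents"
```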
The label branch is just the name of one of your branches; it contains commit ID D, i.e., the branch name "points to" the tip-most commit D on that branch. Commit D points back to C, which points back to B, and so on.
A graph becomes "branch-y" or "branched" (this term unfortunately collides with the same term we use to talk about the name of a branch) when several different commits point to the same parent. Remember that the arrows here all point left-ish, even if they point up-and-left or down-and-left; and since text-style graphs don't have room for arrows, we'll just draw a line like -, \, or / here:
    ... B - C - D   <-- br1
          \
            E - F   <-- br2
Here the tip of branch br1 is commit D, and the tip of branch br2 is commit F. You can ask git for the actual ID, the concrete 40-character thing, using git rev-parse:
$ git rev-parse br1
e59f6c2d348d465e3147b11098126d3965686098
for instance. Again, we say that br1 "points to" commit D, which points back to C, which points back to B, and so on. Meanwhile, br2 points to F, which points to E, which points back to B.
The commit that any given branch name points to is the "tip commit" of that branch. When you add a new commit to a branch, git makes the new commit store the previous tip's ID as its parent ID and then, once the new commit is finalized and safely in the repository, changes the ID stored in the branch name. That is, if you add a new commit to br1, the new commit G will point back to D, and git will make br1 point to G:
    ... B - C - D - G   <-- br1
          \
            E - F       <-- br2
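One way to watch the branch name move is to record the tip before and after a commit. This is a sketch in a throwaway repository; the file name and commit messages are made up, with "D" and "G" chosen only to echo the drawing.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git checkout -q -b br1

echo one > file.txt
git add file.txt
git commit -q -m "D"
old_tip=$(git rev-parse br1)        # the ID the name br1 holds right now

echo two >> file.txt
git commit -q -am "G"
new_tip=$(git rev-parse br1)        # br1 now holds the new commit's ID
parent_of_new=$(git rev-parse br1^) # ...and that commit points back to the old tip

echo "old tip:       $old_tip"
echo "new tip:       $new_tip"
echo "parent of new: $parent_of_new"
```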
No matter how many new tip commits we make over time, commit B remains especially interesting. In graph theory terms, the concept here is "reachability": commit B is (always) reachable from both branch-tips.
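Reachability can be checked directly with git merge-base. The sketch below rebuilds the drawing in a throwaway repository using empty commits; the commit messages are just the letters from the graph, and the identity settings are placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git checkout -q -b br1

git commit -q --allow-empty -m "B"
base=$(git rev-parse HEAD)
git branch br2                      # br2 starts at B as well

git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"
git checkout -q br2
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

# Draw the graph git actually stored.
git log --oneline --graph br1 br2

# The nearest commit reachable from both tips.
found=$(git merge-base br1 br2)

# --is-ancestor exits 0 when the first commit is reachable from the second.
if git merge-base --is-ancestor "$base" br1 &&
   git merge-base --is-ancestor "$base" br2; then
    reachable=yes
else
    reachable=no
fi
echo "merge base: $found (B reachable from both tips: $reachable)"
```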
Merge
So, here's what git merge does. You give it two branch tip IDs (one of these is the branch you're on now, the other is the one you name) and it:
1. identifies the nearest commit reachable from both branch tips (this is the "merge base");
2. does a diff from the merge-base to the current-branch tip;
3. does a diff from the merge-base to the other tip;
4. combines these two diffs to see where "both sides" made the same changes;
5. applies the changes found in step 3, without re-applying the combined changes found in step 4; and
6. if all goes well, makes a new commit with two parents, those two parents being both branch-tips.
Conflicts occur wherever the changes found in steps 2 and 3 affect the same region of the same file, but are not exactly the same (hence can't be subtracted away via step 4). If conflicts occur, git makes you resolve them; when you do your final git commit, this still makes the same kind of "merge commit".
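The first three steps can be reproduced by hand, which is a handy way to preview what a merge will work from. This is only a sketch: it shows the inputs to the merge, not how git combines them internally, and the repository, file names, and contents are made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git checkout -q -b br1

printf 'Hello\n' > README.txt
printf 'notes\n' > notes.txt
git add README.txt notes.txt
git commit -q -m "B"
git branch br2

printf 'Hello!\n' > README.txt
git commit -q -am "change README.txt on br1"

git checkout -q br2
printf 'notes\nmore notes\n' > notes.txt
git commit -q -am "change notes.txt on br2"
git checkout -q br1

# Step 1: find the merge base.
base=$(git merge-base br1 br2)

# Step 2: what changed from the base to the current branch tip.
git diff "$base" br1
ours_files=$(git diff --name-only "$base" br1)

# Step 3: what changed from the base to the other tip.
git diff "$base" br2
theirs_files=$(git diff --name-only "$base" br2)

echo "changed on br1: $ours_files"
echo "changed on br2: $theirs_files"
```

Here the two diffs touch different files, so a merge of these branches would have nothing to conflict over.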
Now, suppose that somewhere along the way in the B-C-D-G line, someone modified the path README.txt to include an exclamation point in the first line. Meanwhile, in the B-E-F line, nobody did this to README.txt. The final merge commit will retain the change made on br1, but if you compare br1:README.txt to br2:README.txt there will be differences: the first line in br2 won't have the exclamation point. That's because the changes being merged in (base-to-br2) don't change this; the changes being retained (base-to-br1) do. We don't want base-to-br1 changes to get removed; we only want changes that aren't already present (have not been made on "both sides") to be added.
In any case, let's draw in the final merge commit H:
    ... B - C - D - G - H   <-- br1
          \           /
            E ----- F       <-- br2
Now br1:README.txt has the ! but br2:README.txt doesn't.
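This outcome is easy to confirm in a throwaway repository. The sketch below assumes a made-up second file, notes.txt, to stand in for the work done on br2; the contents, commit messages, and identity settings are likewise illustrative only.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git checkout -q -b br1

printf 'Hello\n' > README.txt
printf 'notes\n' > notes.txt
git add README.txt notes.txt
git commit -q -m "B"
git branch br2

printf 'Hello!\n' > README.txt              # the exclamation point, on br1 only
git commit -q -am "G: add exclamation point"

git checkout -q br2
printf 'notes\nmore notes\n' > notes.txt    # unrelated work on br2
git commit -q -am "F: extend notes"

git checkout -q br1
git merge -q --no-edit br2                  # creates merge commit H on br1

br1_readme=$(git show br1:README.txt)
br2_readme=$(git show br2:README.txt)
br1_notes=$(git show br1:notes.txt)
second_parent=$(git rev-parse br1^2)        # H's second parent is the old br2 tip

echo "br1:README.txt -> $br1_readme"
echo "br2:README.txt -> $br2_readme"
```

After the merge, br1 has both the exclamation point and the extra notes, while br2 is untouched and still has neither the merge commit nor the README.txt change.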
In short, there's no reason to expect, after a merge, any two files to match in the new branch-tips. If all the files had to match exactly, you'd wipe out all the work on br1, replacing it with only the work on br2.