When should I use "git push --force-if-includes"

深蓝 · Answer 1 · 2021-10-06T19:32:34+0000

The --force-if-includes option is, as you've noted, new. If you've never needed it before, you don't need it now. So the shortest answer to "when should I use this" would be "never". The recommended answer, though, is (or will be, once it's proven?) "always". (I'm not yet convinced one way or the other, myself.)

A blanket "always" or "never" is not very useful, though. Let's look at where you might want to use it. It is, strictly speaking, never necessary, because all it does is modify --force-with-lease slightly. So if --force-if-includes is going to be used, we already have --force-with-lease in effect.¹ Before we look at --force-if-includes, we should cover how --force-with-lease actually works. What problem are we trying to solve? What are our "use cases" or "user stories" or whatever the latest buzzwords might be when someone is reading this later?

(Note: if you're already familiar with all of this, you can search for the next force-if-includes string to skip the next few sections, or just jump to the bottom and then scroll up to the section header.)

The fundamental problem we have here is one of atomicity. Git is, in the end, mostly—or at least significantly—a database, and any good database has four properties for which we have the mnemonic ACID: Atomicity, Consistency, Isolation, and Durability. Git doesn't exactly achieve any or all of these on its own: for instance, for the Durability property, it relies (at least partly) on the OS to provide it. But three of these—the C, I, and D ones—are local within a Git repository in the first place: if your computer crashes, your copy of the database may or may not be intact, recoverable, or whatever, depending on the state of your own hardware and OS.

Git is not, however, just a local database. It's a distributed one, distributed via replication, and its unit of atomicity—the commit—is spread out across multiple replications of the database. When we make a new commit locally, we can send it to some other copy or copies of the database, using git push. Those copies will try to provide their own ACID behavior, locally on those computers. But we'd like to preserve atomicity during the push itself.

We can get this in several ways. One way is to start with the idea that every commit has a globally (or universally) unique identifier: a GUID or UUID.² (I'll use the UUID form here.) I can safely give you a new commit I've made, one that you didn't have, as long as we both agree that it keeps the UUID I gave it.
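As footnote 2 says, these "UUIDs" are really Git's hash IDs, computed from each object's content. Here is a minimal sketch, assuming a classic SHA-1 repository (not the newer SHA-256 format): the ID is the SHA-1 of a short header, "<type> <size>" plus a NUL byte, followed by the raw content. The helper name git_object_id is hypothetical. A commit is hashed the same way, with type "commit" and the commit's text as its content.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute a Git-style SHA-1 object ID (hypothetical helper).

    Git hashes a header of "<type> <size>" and a NUL byte, then the content.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# The same content always yields the same ID, on any machine.
print(git_object_id("blob", b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the ID depends only on content, two Gits that build the same object independently arrive at the same ID without having to coordinate.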

But, while Git does use these UUIDs to find the commits, Git also needs to have a name for the commit—well, for the last commit in some chain. This guarantees that whoever is using the repository has a way to find the commit: the name finds the last one in some chain, from which we find all the earlier ones in the same chain.

If we both use the same name, we have a problem. Let's say we're using the name main to find commit b789abc, and they're using it to find the commit a123456.

The solution we use with git fetch here is simple: we assign a name to their Git repository, e.g., origin. Then, when we get some new commit(s) from them, we take their name—the one that finds the last of these commits in some chain, that is—and rename it. If they used the name main to find that tip commit, we rename that to origin/main. We create or update our own origin/main to remember their commits, and it does not mess with our own main.
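Here is a minimal sketch of that renaming, assuming the default fetch refspec +refs/heads/*:refs/remotes/origin/* that git clone sets up. The dictionaries standing in for each repository's names, and the function names, are hypothetical, and the sketch only handles branch names:

```python
from typing import Dict, Optional


def map_fetched_ref(their_ref: str, remote: str = "origin") -> Optional[str]:
    """Apply +refs/heads/*:refs/remotes/<remote>/* to a single name.

    Returns None for names this refspec does not cover (tags, for instance).
    """
    src_prefix = "refs/heads/"
    dst_prefix = f"refs/remotes/{remote}/"
    if not their_ref.startswith(src_prefix):
        return None
    return dst_prefix + their_ref[len(src_prefix):]


def fetch(their_refs: Dict[str, str], our_refs: Dict[str, str], remote: str = "origin") -> None:
    """Record their branch tips under remote-tracking names; our branches are untouched."""
    for name, commit in their_refs.items():
        tracking = map_fetched_ref(name, remote)
        if tracking is not None:
            # The leading "+" in the refspec means: overwrite even if not a fast-forward.
            our_refs[tracking] = commit


ours = {"refs/heads/main": "b789abc"}
theirs = {"refs/heads/main": "a123456"}
fetch(theirs, ours)
print(ours)
# {'refs/heads/main': 'b789abc', 'refs/remotes/origin/main': 'a123456'}
```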

But, when we're going the other way—pushing our commits to them—Git doesn't apply this idea. Instead, we ask them to update their main directly. We hand over commit b789abc for instance, and then ask them to set their main to b789abc. What they do, to make sure that they don't lose their a123456 commit, is make sure that a123456 is part of the history of our commit b789abc:

  ... <-a123456 <-b789abc   <--main

Since our main points to b789abc, and b789abc has a123456 as its parent, then having them update their main to point to b789abc is "safe". For this to really be safe, they have to atomically replace their main, but we just leave that up to them.
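Their "safe" test amounts to an ancestry check: can their current tip be reached by walking backwards from the commit we're asking them to point to? This is the same question git merge-base --is-ancestor answers. A minimal sketch, modeling the commit graph as a hypothetical dictionary from each hash ID to its parents' IDs:

```python
from typing import Dict, List


def is_ancestor(graph: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable by following parent links from `descendant`."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


# b789abc's parent is a123456, as in the drawing above.
graph = {"b789abc": ["a123456"], "a123456": []}
print(is_ancestor(graph, "a123456", "b789abc"))  # True: moving main is a fast-forward
print(is_ancestor(graph, "b789abc", "a123456"))  # False
```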

This method of adding commits to some remote Git repository works fine. What doesn't work is the case where we'd like to remove their a123456. We find there is something wrong or bad with a123456. Instead of making a simple correction, b789abc, that adds on to the branch, we make our b789abc so that it bypasses the bad commit:

... <-something <-a123456   <--main

becomes:

... <-something <-b789abc   <--main
               \
                a123456   ??? [no name, hence abandoned]

We then try to send this commit to them, and they reject our attempt with the gripe that it's not a "fast-forward". We add --force to tell them to do the replacement anyway, and—if we have appropriate permissions³—their Git obeys. This effectively drops the bad commit from their clone, just as we dropped it from ours.⁴
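Putting the fast-forward test and the --force override together, the receiving Git's decision can be sketched as below. This is a simplified model, not Git's code: it ignores the permission checks from footnote 3 as well as any hooks, and receive_update is a hypothetical name.

```python
from typing import Dict, List


def is_ancestor(graph: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return False


def receive_update(graph: Dict[str, List[str]], refs: Dict[str, str],
                   name: str, new: str, force: bool = False) -> str:
    """Decide whether to move `name` to `new` (hypothetical, simplified model)."""
    old = refs.get(name)
    fast_forward = old is None or is_ancestor(graph, old, new)
    if not fast_forward and not force:
        return "rejected (non-fast-forward)"
    refs[name] = new  # with force, the old tip may now be unreachable
    return "updated"


# b789abc bypasses a123456: both have `something` as their parent.
graph = {"something": [], "a123456": ["something"], "b789abc": ["something"]}
refs = {"refs/heads/main": "a123456"}
print(receive_update(graph, refs, "refs/heads/main", "b789abc"))              # rejected (non-fast-forward)
print(receive_update(graph, refs, "refs/heads/main", "b789abc", force=True))  # updated
```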

¹As the documentation you linked notes, --force-if-includes without --force-with-lease is just ignored. That is, --force-if-includes doesn't turn on --force-with-lease for you: you have to specify both.

²These are the hash IDs, and they need to be unique across all Gits that will ever meet and share IDs, but not across two Gits that never meet. There, we can safely have what I call "doppelgängers": commits or other internal objects with the same hash ID, but different content. Still, it's best to just make them truly unique.

³Git as it is, "out of the box", does not have this kind of permissions checking, but hosting providers like GitHub and Bitbucket add it, as part of their value-adding thing to convince us to use their hosting systems.

⁴The un-find-able commit doesn't actually go away right away. Instead, Git leaves this for a later housekeeping git gc operation. Also, dropping a commit from some name may still leave that commit reachable from other names, or via log entries that Git keeps for each name. If so, the commit will stick around longer, perhaps even forever.

So far so good, but ...

The concept of a force-push is fine as far as it goes, but that's not far enough. Suppose we have a repository, hosted somewhere (GitHub or whatever), that receives git push requests. Suppose further that we are not the only person / group doing pushes.

We git push some new commit, then discover it's bad and want to replace it with a new and improved commit immediately, so we take a few seconds or minutes—however long it takes to make the new improved commit—and get that in place and run git push --force. For concreteness, let's say this whole thing takes us one minute, or 60 seconds.

That's sixty seconds during which someone else might:⁵

fetch our bad commit from the hosting system;
add a new commit of their own; and
git push the result.

So at this point, we think the hosting system has:

...--F--G--H   <-- main

where commit H is bad and needs replacement with our new-and-improved H'. But in fact, they now have:

...--F--G--H--I   <-- main

where commit I is from this other faster committer. Meanwhile, we now have, in our repository, the sequence:

...--F--G--H'  <-- main
         \
          H   ???

where H is our bad commit, that we're about to replace. We now run git push --force and since we are allowed to force-push, the hosting provider Git accepts our new H' as the last commit in their main, so that they now have:

...--F--G--H'  <-- main
         \
          H--I   ???

The effect is that our git push --force removed not only our bad H, but their (presumably still good, or at least, wanted) I.
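To see exactly what that plain --force cost, compare the set of commits their main can reach before and after the push. A minimal sketch, using the letters from the drawings as stand-ins for real hash IDs and a dictionary from each commit to its parents (both hypothetical):

```python
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits found by starting at `tip` and following parent links."""
    found: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in found:
            found.add(commit)
            stack.extend(graph.get(commit, []))
    return found


# The hosting system's commits after the other committer pushed I,
# plus our new H' (whose parent is G).
graph = {
    "F": [],
    "G": ["F"],
    "H": ["G"],
    "I": ["H"],
    "H'": ["G"],
}
before = reachable(graph, "I")   # their main before our push
after = reachable(graph, "H'")   # their main after a plain --force
print(sorted(before - after))    # ['H', 'I']
```

Anything in that difference can no longer be found from their main. As footnote 4 notes, such commits aren't deleted immediately, and some other name or log entry may still find them, but main no longer does.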

⁵They might do this by rebasing a commit they'd already made, after finding their own git push blocked because they had based their commit on G originally. Their rebase automatically copied their new commit to the one we're calling I here, with no merge conflicts, enabling them to run git push in fewer seconds than it took us to make our fixed-up commit H'.

Enter --force-with-lease

The --force-with-lease option, which internally Git calls a "compare and swap", allows us to send a commit to some other Git, and then have them check that their branch name—whatever it is—contains the hash ID that we think it contains.

Let's add, to our drawing of our own repository, the origin/* names. Since we sent commit H to the hosting provider earlier, and they took it, we actually have this in our repository:

...--F--G--H'  <-- main
         \
          H   <-- origin/main

When we use git push --force-with-lease, we have the option of controlling this --force-with-lease completely and exactly. The complete syntax for doing this is:

git push --force-with-lease=refs/heads/main:<hash-of-H> origin <hash-of-H'>:refs/heads/main

That is, we'll:

send to origin commits ending with the one found via hash ID H';
ask them to update their name refs/heads/main (their main branch); and
ask them to force this update, but only if their refs/heads/main currently has in it the hash ID of commit H.
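To make the three parts of that command line easier to see, here is a hypothetical Python helper (force_with_lease_push is not part of Git or of any library) that assembles the same argument list. It only builds the command; it does not run Git.

```python
from typing import List


def force_with_lease_push(remote: str, branch: str, expected: str, new: str) -> List[str]:
    """Build an explicit compare-and-swap push command (hypothetical helper).

    expected: hash ID we believe their branch holds now (H in the text)
    new:      hash ID we want their branch to hold afterward (H' in the text)
    """
    ref = f"refs/heads/{branch}"
    return [
        "git", "push",
        f"--force-with-lease={ref}:{expected}",  # the "compare" half
        remote,
        f"{new}:{ref}",                          # the "swap" half, as a src:dst refspec
    ]


cmd = force_with_lease_push("origin", "main", "<hash-of-H>", "<hash-of-H'>")
print(" ".join(cmd))
# To actually run it with real hash IDs: subprocess.run(cmd, check=True)
```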

This gives us a chance to catch the case where some commit I has been added to their main. They, using the --force-with-lease=refs/heads/main:<hash> part, check their refs/heads/main. If it's not the given <hash>, they refuse the entire transaction, keeping their database intact: they retain commits I and H, and drop our new commit H' on the floor.⁶
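The check they run is a classic compare-and-swap. Here is a minimal sketch of it, with a plain dictionary standing in for their table of ref names and single letters standing in for hash IDs (both assumptions for illustration). It deliberately leaves out the locking that makes the real check-and-update atomic.

```python
from typing import Dict


def push_with_lease(refs: Dict[str, str], name: str, expected: str, new: str) -> bool:
    """Compare-and-swap on one ref (simplified model, no locking).

    Update `name` to `new` only if it still holds `expected`; otherwise
    refuse and leave `refs` untouched.
    """
    if refs.get(name) != expected:
        return False
    refs[name] = new
    return True


# Case 1: nobody else pushed, so their main still holds H.
quiet = {"refs/heads/main": "H"}
print(push_with_lease(quiet, "refs/heads/main", "H", "H'"))  # True; main now holds H'

# Case 2: someone added I on top of H before our push arrived.
busy = {"refs/heads/main": "I"}
print(push_with_lease(busy, "refs/heads/main", "H", "H'"))   # False; main still holds I
```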

The overall transaction—the forced-with-lease update of their main—has locking inserted so that if someone else is attempting to push some commit (perhaps I)
