Commit history in Git is nothing but commits.
No commit can ever be changed. So for anything to remove a big file from some existing commit, that thing (whether it's BFG, or git filter-branch, or git filter-repo, or whatever) is going to have to extract a "bad" commit, make some changes (e.g., remove the big file), and make a new and improved substitute commit.
The terrible part of this is that each subsequent commit encodes, in an unchangeable way, the raw hash ID of the bad commit. The immediate children of the bad commit encode it as their parent hash. So you—or the tool—must copy those commits to new-and-improved ones. What's improved about them is that they lack the big file and refer back to the replacement they just made for the initial bad commit.
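To see the parent hash sitting inside a commit, here is a minimal sketch that builds a throwaway repository (the file name and commit messages are made up for illustration) and prints the raw child commit with git cat-file -p:

```shell
#!/bin/sh
# Throwaway demo: a child commit records its parent's raw hash ID.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "parent"
parent=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -a -m "child"
child=$(git rev-parse HEAD)

# Raw commit object: tree, parent, author, committer, then the message.
git cat-file -p "$child"

# Pull out just the hash on the "parent" header line.
recorded_parent=$(git cat-file -p "$child" | sed -n 's/^parent //p')
echo "child records parent: $recorded_parent"
```

The parent line is part of the data that gets hashed to produce the child's own ID, which is why the child cannot simply be re-pointed at a different parent.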
Of course, their children encode their hash IDs as parent hash IDs, so now the tool must copy those commits. This repeats all the way up to the last commit in each branch, as identified by the branch name:
...--o--o--x--o--o--o   [old, bad version of branch]
         \
          ●--●--●--●   <-- branch
where x is the bad commit: x had to be copied to the first new-and-improved ●, but then all subsequent commits had to be copied too.
The copies, being different commits, have different hash IDs. Every clone must now abandon the "bad" commits (the x one and all its descendants) in favor of the new-and-improved ones.
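Here is a rough sketch of why a copy necessarily gets a new hash ID. It uses the plumbing command git commit-tree in a scratch repository, with the dates pinned so that the parent is the only thing that differs between the two "child" commits. The messages and file name are invented for the demo, and this is not how any of the rewriting tools work internally; it only illustrates the hashing behavior.

```shell
#!/bin/sh
# Throwaway demo: same tree, same message, different parent => different hash.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Pin the timestamps so nothing but the parent varies between commits.
export GIT_AUTHOR_DATE="2020-01-01 00:00:00 +0000"
export GIT_COMMITTER_DATE="2020-01-01 00:00:00 +0000"

echo hi > file.txt
git add file.txt
tree=$(git write-tree)

# Two stand-ins: the "bad" commit and its replacement.
old_parent=$(git commit-tree -m "bad commit" "$tree")
new_parent=$(git commit-tree -m "replacement commit" "$tree")

# The same child content, attached to each parent in turn.
child_of_old=$(git commit-tree -p "$old_parent" -m "child" "$tree")
child_of_new=$(git commit-tree -p "$new_parent" -m "child" "$tree")

# Rebuilding with identical inputs reproduces the identical hash.
child_again=$(git commit-tree -p "$old_parent" -m "child" "$tree")

echo "child of old parent: $child_of_old"
echo "child of new parent: $child_of_new"
echo "rebuilt, old parent: $child_again"
```

The first and third hashes match, while the second differs, even though the tree and message are the same in all three.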
All these repository-editing tools should strive to make minimal changes. The BFG is probably the fastest and most convenient to use, but git filter-branch can be told to copy only the bad commit and its descendants, and to use --index-filter, which is its fastest (still slow!) filter. To do this, use:
git filter-branch --index-filter <command> -- <hash>..branch1 <hash>..branch2 ...
where the <command> is an appropriate "git rm --cached --ignore-unmatch" command (be sure to quote the whole thing) and the <hash> and branch names specify which commits to copy. Remember that the A..B syntax means "don't look at commit A or earlier, but do look at commits B and earlier". So if commit x is, say, deadbeefbadf00d..., you'll want to use the hash of its parent as the limiter:
git filter-branch --index-filter "..." -- deadbeefbadf00d^..master
for instance (fill in the ... part with the right removal command).
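As an end-to-end sketch, the script below tries this on a throwaway repository. The file name big.bin, the commit messages, and the commit helper function are all made up for the demo. It assumes a Git recent enough to honor FILTER_BRANCH_SQUELCH_WARNING (used here only to skip filter-branch's advisory pause), and it forces the branch name to master so the example matches the command above.

```shell
#!/bin/sh
# Throwaway demo: rewrite only the bad commit and its descendants.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's advisory delay

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make the branch name predictable
git config user.name "Demo"
git config user.email "demo@example.com"

commit() {  # commit <file> <message>: add one small file and commit it
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit a.txt "first good commit"
commit b.txt "second good commit"
good_before=$(git rev-parse HEAD)

# The "bad" commit x: it adds big.bin alongside a legitimate file.
echo "pretend this is huge" > big.bin
echo "c" > c.txt
git add big.bin c.txt
git commit -q -m "bad commit adding big.bin"
bad=$(git rev-parse HEAD)

commit d.txt "descendant of the bad commit"
tip_before=$(git rev-parse HEAD)

# Preview which commits the range selects: x and everything after it.
to_copy=$(git rev-list --count "$bad^..master")
echo "commits to copy: $to_copy"

# Copy only those commits; x's parent is the limiter.
git filter-branch --index-filter \
    "git rm --cached --ignore-unmatch big.bin" -- "$bad^..master" >/dev/null

good_after=$(git rev-parse master~2)
tip_after=$(git rev-parse master)
echo "good commit kept: $good_before -> $good_after"
echo "tip replaced:     $tip_before -> $tip_after"
git ls-tree -r --name-only master
```

The commits before x keep their hash IDs because they were never copied, while x and its descendant get new ones. Note that filter-branch keeps the old tip under refs/original/, so the big file is still in the repository's object store until those refs and the reflogs are cleaned up.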
(Note: I have not actually used The BFG, but if it re-copies commits unnecessarily, that's really bad, and I bet it does not.)