First things first: you need a single repo that has all the available history.
Make a clone of the repo with the recent history. Add the repo with the old history as a remote. I recommend this clone be a "mirror" and that you finish by replacing your origin repo with this one. Alternatively, you can leave --mirror off, in which case you'll finish by pushing (possibly force-pushing, depending on which approach you use) all refs back to origin.
git clone --mirror url/of/current/repo
cd repo.git
git remote add history url/of/historical/repo
git fetch history
The next thing you need to do is figure out where you'll be splicing the history. The terminology to describe this is a bit fuzzy I think... what you want is to find the two commits that correspond to the most recent SVN revision for which both histories have a commit. For example, suppose your SVN repo contained versions 1, 2, 3, and 4. Now you have
Recent-History Repo
C --- D --- E --- F <--(master)
Old-History Repo
A --- B --- C' --- D'
where A represents version 1, B represents version 2, C and C' represent version 3, and D and D' represent version 4. E and F are work created after the original migration. So you want to splice the commits whose parent is D (E in this example) onto D'.
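One way to locate D and D' is to rely on both conversions producing identical trees for the same SVN revision. The following is only a sketch under that assumption: find_splice_candidates is a hypothetical helper name, and history/master is an assumed name for the old repo's main branch as fetched into the combined repo. It prints pairs of commits (recent one first, old one second) whose trees match, newest recent commit first.

```shell
# find_splice_candidates RECENT_REF OLD_REF   (hypothetical helper)
# Prints "<recent-commit> <old-commit>" for each commit on RECENT_REF whose
# tree is identical to the tree of some commit on OLD_REF.
# Assumed usage in the combined repo:
#   find_splice_candidates master history/master | head -n 1
find_splice_candidates() {
    {
        # Emit "old <tree> <commit>" lines first, then "new <tree> <commit>".
        git log --format='old %T %H' "$2" --
        git log --format='new %T %H' "$1" --
    } | awk '
        # Remember the newest old commit seen for each tree hash.
        $1 == "old" && !($2 in old) { old[$2] = $3 }
        # Report every recent commit whose tree also exists in the old history.
        $1 == "new" && ($2 in old)  { print $3, old[$2] }
    '
}
```

The first line of output is the candidate for D and D'. A coincidental match is possible (for instance after a revert), so it is worth inspecting the reported pair with git show before trusting it.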
Now, I can think of two approaches, each with pros and cons.
Rewriting The Recent History
IMO the best way if you can coordinate a cut-over of all developers to a new repo (meaning you arrange a time when they all agree that all outstanding work is pushed, so they discard their clones; then you do the conversion; then they all re-clone) is to (effectively) rebase the recent history onto the old history.
If there is really just a single branch, then you can literally use rebase
git rebase --onto D' D master
(where D and D' are replaced with the SHA IDs of the commits). Note that rebase needs a working tree, so this form requires a regular (non-mirror) clone.
More likely you have some branches and merges in the recent history; in that case a rebase operation will start becoming a problem very quickly. On the other hand, you can take advantage of the fact that D has the same tree as D', so a rebase and a re-parent are more or less equivalent.
So you can use git filter-branch with a --parent-filter to do the rewrite. Based on the examples in the docs at https://git-scm.com/docs/git-filter-branch you would do something like
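git filter-branch --parent-filter 'test $GIT_COMMIT = E && echo "-p D-prime" || cat' HEAD
(where E and D-prime are replaced with the SHA IDs of E and D'. The test matches E rather than D because the filter rewrites the parents of the commit it matches, and E is the commit that should end up with D' as its parent).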
This creates "backup" refs (under refs/original/) that you'll need to clean up. In the end you'll get
A --- B --- C' --- D' --- E' --- F' <--(master)
It's the fact that F was replaced by F' which creates the need for a hard cut-over (more or less).
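For the backup refs, git filter-branch stores the pre-rewrite tips under refs/original/. A minimal sketch of the cleanup, modeled on the checklist in the filter-branch docs, might look like this (delete_filter_branch_backups is a hypothetical helper name):

```shell
# delete_filter_branch_backups   (hypothetical helper)
# Removes every ref under refs/original/, i.e. the backups that
# git filter-branch leaves behind after a rewrite.
delete_filter_branch_backups() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
}
```

Only run it once you're satisfied with the rewritten history, since those refs are the easy way back to the pre-rewrite state.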
Now if you made a mirror clone back at step 1, you can consider wiping the reflog, dropping the remotes, and running gc, and then this is a new ready-to-use origin repo.
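Those three cleanup steps could be sketched roughly as follows. The function name finalize_repo is hypothetical, and the sketch assumes you really do want every configured remote removed from this repo:

```shell
# finalize_repo   (hypothetical helper)
# Drops all remotes, expires every reflog entry, and garbage-collects
# objects that are no longer reachable from any ref.
finalize_repo() {
    for remote in $(git remote); do
        git remote remove "$remote"
    done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

If the filter-branch backup refs are still present, the old commits stay reachable and gc will keep them, so clean those up first.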
If you made a regular clone, then you'll need to push -f all the refs to the origin, and this will likely leave behind some clutter on the origin repo.
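For the regular-clone case, the force-push might be sketched like this. The helper name push_rewritten_refs is hypothetical, and it assumes that branches and tags are the only refs you need to publish:

```shell
# push_rewritten_refs [REMOTE]   (hypothetical helper)
# Force-pushes every local branch, then every tag, to REMOTE
# (defaults to origin). --all and --tags are sent in separate pushes.
push_rewritten_refs() {
    remote=${1:-origin}
    git push --force --all "$remote"
    git push --force --tags "$remote"
}
```

Branches that exist only on the origin are not deleted by this, which is part of the clutter mentioned above.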
Using a "replacement commit"
The other option doesn't create a hard cut-over, but it leaves you with small headaches to deal with forever. You can use git replace. In your combined repo
git replace D D'
(with D and D' replaced by the SHA IDs of the commits, as before).
By default, when generating log output or whatever, if git finds D, it will substitute D' (and its history) in the output.
There are some known glitches. There may be unknown glitches. And by default the "replacement refs" (stored under refs/replace/) that make this all work aren't shared, so you have to push and fetch them deliberately.
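Sharing the replacement refs comes down to naming the refs/replace/ namespace explicitly in a refspec. A minimal sketch, with hypothetical helper names and origin assumed as the default remote:

```shell
# publish_replacements [REMOTE]   (hypothetical helper)
# Pushes all local replacement refs to REMOTE (defaults to origin).
publish_replacements() {
    git push "${1:-origin}" 'refs/replace/*:refs/replace/*'
}

# fetch_replacements [REMOTE]   (hypothetical helper)
# Fetches all replacement refs from REMOTE (defaults to origin).
fetch_replacements() {
    git fetch "${1:-origin}" 'refs/replace/*:refs/replace/*'
}
```

Each developer would need to run the fetch side (or add an equivalent fetch refspec to their remote configuration) before their clone shows the spliced history.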