github - git push is very slow for a huge repo

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am having the same issue as in "git push is very slow for a branch", but the answer there doesn't fit my situation.

I am working against a corporate GitHub with a very large repo. My process is as follows:

1) Pull from master

2) Create new branch

3) Commit

4) Push the branch to create a pull request.

When pushing the branch in step (4), Git wants to write over 1,000,000 objects, which take about 3 GB, even though my commit changed only one line.

If I create a branch with the same name as in (2) from the GitHub UI and then push into that branch, the push takes less than a second. The changes between master and my branch are very minor (no big files added or deleted).

What can I do to make Git push only the relevant data and not the entire repo?

Git on Windows ver 2.17.0

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:51:27+0000

You could try the same push with:

Git for Windows 2.21 or later
git config --global pack.useSparse true (I presented the pack.useSparse option here in March 2019)

This option comes from those patches and is implemented in commit d5d2e93, which includes the comment:

These improvements will have even larger benefits in the super-large Windows repository.

That should be interesting in your case.

See "Exploring new frontiers for Git push performance" by Derrick Stolee.

A git push would typically display something like:

$ git push origin topic
Enumerating objects: 3670, done.
Counting objects: 100% (2369/2369), done.
Delta compression using up to 8 threads
Compressing objects: 100% (546/546), done.
Writing objects: 100% (1378/1378), 468.06 KiB | 7.67 MiB/s, done.
Total 1378 (delta 1109), reused 1096 (delta 832)
remote: Resolving deltas: 100% (1109/1109), completed with 312 local objects.
To https://server.info/fake.git
* [new branch] topic -> topic

"Enumerating" means:

Git constructs a pack-file that contains the commit you are trying to push, as well as all commits, trees, and blobs (collectively, objects) that the server will need to understand that commit.
It finds a set of commits, trees, and blobs such that every reachable object is either in the set or known to be on the server.
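As a rough illustration of that definition, here is a minimal Python sketch over a toy object graph. The object names (c1, t1, b_readme and so on) and the helpers `reachable` and `objects_to_send` are made up for this example; real Git works on object IDs and is implemented in C.

```python
# Toy object graph: each object maps to the objects it references.
# Commits reference their root tree and parents; trees reference subtrees and blobs.
OBJECTS = {
    "c1": ["t1"],                    # commit already on the server
    "c2": ["t2", "c1"],              # new local commit, parent is c1
    "t1": ["b_readme", "t_src1"],
    "t2": ["b_readme", "t_src2"],    # README unchanged, src/ changed
    "t_src1": ["b_main1"],
    "t_src2": ["b_main2"],
    "b_readme": [],
    "b_main1": [],
    "b_main2": [],
}


def reachable(starts):
    """Return every object reachable from the given starting objects."""
    seen = set()
    stack = list(starts)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(OBJECTS[obj])
    return seen


def objects_to_send(want, have):
    """Objects reachable from `want` that are not reachable from `have`."""
    return reachable(want) - reachable(have)


to_send = objects_to_send(want=["c2"], have=["c1"])
print(sorted(to_send))
```

In this toy graph only c2, t2, t_src2 and b_main2 need to be sent. Note that this brute-force version walks everything reachable from both sides, which is exactly the cost that becomes a problem in a very large repository.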

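The goal is to find the right "frontier". Here, interesting commits are the ones being pushed that the server does not have yet, and uninteresting commits are the ones the server is known to have.

The uninteresting commits that are direct parents of interesting commits form the frontier.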
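The frontier definition can be sketched in a few lines of Python. This is only a toy model under assumed names: the commit graph `PARENTS` and the helpers `ancestors` and `frontier` are hypothetical and are not part of Git.

```python
# Toy commit graph: commit -> list of parent commits.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],          # tip of a branch the server already has
    "F": ["B"],          # tip of another branch the server already has
    "D": ["C"],          # new local commit
    "E": ["D", "F"],     # new local merge commit, the one being pushed
}


def ancestors(tips):
    """All commits reachable from the given tips, tips included."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def frontier(push_tips, server_tips):
    """Uninteresting commits that are direct parents of interesting commits."""
    uninteresting = ancestors(server_tips)
    interesting = ancestors(push_tips) - uninteresting
    return {
        parent
        for commit in interesting
        for parent in PARENTS[commit]
        if parent in uninteresting
    }


print(sorted(frontier(["E"], ["C", "F"])))
```

Here D and E are interesting, and the frontier is C and F: they are on the server and are direct parents of D and E. A and B are also uninteresting, but they are not part of the frontier.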

Old:

To determine which trees and blobs are interesting, the old algorithm first determined all uninteresting trees and blobs.

Starting at every uninteresting commit in the frontier, recursively walk from its root tree and mark all reachable trees and blobs as uninteresting. This walk skips trees that were already marked as uninteresting to avoid revisiting potentially large portions of the graph.
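A minimal Python sketch of that old walk, assuming a toy tree store: the `TREES` dictionary, the object names and the function `mark_uninteresting` are invented for illustration and do not mirror Git's actual C code.

```python
# Toy tree store: tree id -> {entry name: child id}.
# Any id that is not a key of TREES is treated as a blob.
TREES = {
    "root_old": {"A": "A1", "B": "B1", "README": "readme1"},
    "root_older": {"A": "A1", "B": "B0", "README": "readme1"},
    "A1": {"a.txt": "a1", "big": "big1"},
    "big1": {"x.bin": "x1", "y.bin": "y1"},
    "B1": {"b.txt": "b1"},
    "B0": {"b.txt": "b0"},
}


def mark_uninteresting(tree_id, marked, visits):
    """Recursively mark a tree and everything below it as uninteresting."""
    if tree_id in marked:
        return                      # already marked: skip the whole subtree
    marked.add(tree_id)
    visits.append(tree_id)
    for child in TREES[tree_id].values():
        if child in TREES:
            mark_uninteresting(child, marked, visits)
        else:
            marked.add(child)       # blob


marked, visits = set(), []
# Root trees of the two (hypothetical) frontier commits.
for frontier_root in ["root_old", "root_older"]:
    mark_uninteresting(frontier_root, marked, visits)

print(len(visits), "trees walked,", len(marked), "objects marked")
```

In this toy store the walk opens 6 trees and marks 12 objects. The shared tree A1 is walked only once, but every tree of the frontier commits is still opened, no matter how small the pushed change is.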

New:

The old algorithm is recursive: it takes a tree and runs the algorithm on all subtrees.

The new algorithm uses the paths to reduce the scope of the tree walk. It is also recursive, but it takes a set of trees.
As we start the algorithm, the set of trees contains the root trees for the uninteresting and the interesting commits.

The new tree walk recursively explores only the paths that contain both interesting and uninteresting trees.
In the example from the blog post, the trees at B have subtrees named F and G.
Both sets have interesting and uninteresting paths, so we recurse into each set. This continues into B/F and B/G. The B/F set will not recurse into B/F/M or B/F/N, and the B/G set will recurse into B/G/X but not B/G/Y.
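The path-based walk can be sketched in Python as follows. This is a toy model, not Git's implementation: the tree names B, F, G, M, N, X and Y follow the example above, while the `TREES` store, the object ids and the functions `sparse_walk` and `collect_interesting` are assumptions made for illustration.

```python
# Toy tree store: tree id -> {entry name: child id}. Other ids are blobs.
TREES = {
    # Root tree of the uninteresting (frontier) commit.
    "root_u": {"A": "A0", "B": "B_u"},
    "B_u": {"F": "F_u", "G": "G_u"},
    "F_u": {"M": "M0", "N": "N0", "f.txt": "f_u"},
    "G_u": {"X": "X_u", "Y": "Y0"},
    "X_u": {"x.txt": "x_u"},
    # Root tree of the interesting (pushed) commit.
    "root_i": {"A": "A0", "B": "B_i"},
    "B_i": {"F": "F_i", "G": "G_i"},
    "F_i": {"M": "M0", "N": "N0", "f.txt": "f_i"},
    "G_i": {"X": "X_i", "Y": "Y0"},
    "X_i": {"x.txt": "x_i"},
    # Trees shared by both commits (unchanged paths).
    "A0": {"huge": "H0"},
    "H0": {"h.bin": "h0"},
    "M0": {"m.txt": "m0"},
    "N0": {"n.txt": "n0"},
    "Y0": {"y.txt": "y0"},
}


def sparse_walk(tree_ids, uninteresting, path, walked):
    """Walk the set of trees found at one path; recurse only where both kinds meet."""
    has_unint = any(t in uninteresting for t in tree_ids)
    has_int = any(t not in uninteresting for t in tree_ids)
    if not (has_unint and has_int):
        return                              # nothing to separate at this path
    walked.append(path or "/")
    by_name = {}                            # entry name -> set of child tree ids
    for t in tree_ids:
        for name, child in TREES[t].items():
            if t in uninteresting:
                uninteresting.add(child)    # direct children of uninteresting trees
            if child in TREES:
                by_name.setdefault(name, set()).add(child)
    for name in sorted(by_name):
        sparse_walk(by_name[name], uninteresting, f"{path}/{name}", walked)


def collect_interesting(tree_id, uninteresting, out):
    """Gather trees and blobs to send, never descending into uninteresting trees."""
    if tree_id in uninteresting:
        return
    out.add(tree_id)
    for child in TREES[tree_id].values():
        if child in uninteresting:
            continue
        if child in TREES:
            collect_interesting(child, uninteresting, out)
        else:
            out.add(child)


uninteresting, walked, to_send = {"root_u"}, [], set()
sparse_walk({"root_u", "root_i"}, uninteresting, "", walked)
collect_interesting("root_i", uninteresting, to_send)
print(walked)
print(sorted(to_send))
```

In this model the walk only opens the paths /, /B, /B/F, /B/G and /B/G/X. The unchanged directory A is marked uninteresting at the root and is never opened, so the cost follows the number of changed paths rather than the size of the repository.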
