(Note: see also Understanding git rev-list, to understand how the code below works.)
You need to use the SHA-1 IDs being supplied on standard input:
while read oldsha newsha refname; do
    ... testing code goes here ...
done
The "testing code" then needs to look at one, two, or all three of these items, depending on the tests to be performed.
The value in $oldsha will be 40 0s if the reference name $refname is being proposed to be created. That is, $refname (typically something like refs/heads/master or refs/tags/v1.2, but any name in refs/ can appear: refs/notes/commits, for instance) does not exist now in the receiving repository, but will exist and will point to $newsha if you allow the change.
The value in $newsha will be 40 0s if the reference name $refname is being proposed to be deleted. That is, $refname does exist now and points to object $oldsha; if you allow the change, that reference name will be deleted.
The values of both will be nonzero if the reference name $refname is being proposed to be updated, i.e., it currently points to git object $oldsha, and if you allow the change, it will point to new object $newsha instead.
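The three cases above can be told apart with a small helper. This is only a sketch: the function name classify_update is made up for illustration, and it assumes 40-character SHA-1 IDs.

```shell
#!/bin/sh
# Assumes SHA-1 object IDs, so the "null" ID is 40 zeros.
NULL_SHA1="0000000000000000000000000000000000000000"

# Hypothetical helper: report what kind of change one stdin line proposes.
# $1 = old ID, $2 = new ID
classify_update() {
    case $1,$2 in
    "$NULL_SHA1",*) echo create;;   # ref does not exist yet
    *,"$NULL_SHA1") echo delete;;   # ref exists and would go away
    *)              echo update;;   # ref exists and would move
    esac
}

# Intended use inside the hook's read loop:
#   while read -r oldsha newsha refname; do
#       echo "$refname: $(classify_update "$oldsha" "$newsha")"
#   done
```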
If you just run git log or git show, git uses the SHA-1 it finds by running git rev-parse HEAD. In a typical receiving repository, HEAD is a symbolic reference pointing to refs/heads/master (the file HEAD literally contains the string ref: refs/heads/master), so you will see the top-most commit on branch master (as you observed).
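To see what a bare git log would start from in a given repository, something like the following may help. It is a small sketch: describe_head is a made-up name, and it assumes the repository has at least one commit.

```shell
#!/bin/sh
# Hypothetical helper: show where HEAD points and which commit that is.
describe_head() {
    # symbolic-ref fails (quietly, with -q) if HEAD is detached
    target=$(git symbolic-ref -q HEAD) || target="(detached)"
    echo "HEAD -> $target = $(git rev-parse HEAD)"
}
```

Run from inside the receiving repository, this prints the reference HEAD names and the commit ID that reference currently holds, which is the commit git log and git show use when given no arguments.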
You need to look specifically at any new objects coming in. How do you know which new objects are coming in? That depends on what's happening to the supplied $refname, and possibly other refnames as well.
If the refname is to be deleted, nothing new is coming in. Whether any underlying git objects will be deleted (garbage collected) depends on whether that refname is the "last" reference to those objects. For instance, suppose the entire standard input sequence consists of two directives:
- delete refs/heads/foo
- delete refs/tags/v1.1
Suppose further that refs/heads/foo (branch foo) points to commit F in this commit-graph diagram, and tag v1.1 points to annotated tag G:
A - B - C - D <-- refs/heads/master
E - F <-- refs/heads/foo
G <-- refs/tags/v1.1
Deleting branch foo is "safe" in that no commits will go away because annotated tag G will retain them, via the v1.1 tag.
Deleting tag v1.1 is "safe"(ish) in that no commits will go away because branch foo will retain them, via the refs/heads/foo reference. (The annotated tag object itself will go away. It's up to you whether to allow this.)
However, deleting both is not safe: commits E and F will become unreachable and will be collected. (It's up to you whether to allow this anyway.)
On the other hand, it's possible that along with those two directives, stdin contains a third directive:
- create refs/heads/foo2 pointing to commit H, with commit H pointing to commit G as its parent
in which case deleting foo is safe after all, as the new branch foo2 will retain commit H, which will retain commit G.
[Edit: on re-reading this now, I notice the glaring assumption that G is a commit object rather than a tag object. If we assume G is a commit object, the rest of the below is correct, but the above becomes at least a little wrong. However, the general idea, that the DAG is protected by having external references, is still right, and this should mostly make sense.]
Doing a complete analysis is tricky; it's often better to just do a piecewise analysis that allows "safe" operations (whatever you decide these are), and force users to push updates piecewise in a "safe" manner (create branch foo2 first, and only then delete branch foo as a separate push, for instance).
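One possible piecewise rule for deletes is "allow it only if every commit stays reachable from some other ref". A rough sketch follows. The name delete_is_safe is invented here, and the check looks at a single delete in isolation. It ignores other changes in the same push, and it ignores annotated tag objects themselves, since git rev-list without --objects lists only commits.

```shell
#!/bin/sh
# Hypothetical helper: succeed if deleting ref $1 (currently at ID $2)
# would leave all of its commits reachable from at least one other ref.
delete_is_safe() {
    refname=$1 oldsha=$2
    # Every existing ref except the one being deleted.
    others=$(git for-each-ref --format='%(refname)' |
        grep -v -x -F -e "$refname")
    # Commits reachable from the old ID but from none of the other refs.
    # $others is left unquoted on purpose: one argument per ref name.
    orphans=$(git rev-list "$oldsha" --not $others) || return 1
    [ -z "$orphans" ]
}
```

In the read loop, the delete branch of the case statement could then call delete_is_safe "$refname" "$oldsha" and reject the push when it fails.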
If you only want to look at new commits, then, for each reference update:
- If it's a delete, allow it (or use other rules).
- If it's a create or a modification, find commit objects it makes reachable that were not reachable before, and examine those commits.
In most "normal" pre-receive hooks you'd use the methods outlined below, but we have an alternative for this particular task.
There's a short-cut method for modifications that handles the most common, and usually most interesting, cases. Suppose someone proposes updating refs/heads/foo from 1234567... to 9876543..., and some objects in the range already existed: perhaps 1234567 is the ID of commit C and 9876543 is the ID of commit E:
A - B - C <-- refs/heads/foo
D - E <-- refs/heads/bar
in which case the commits to examine are D and E. This is also true if commits D and E have just been uploaded but have no references yet, i.e., the proposed update is to add D and E and the graph currently looks like this:
A - B - C <-- refs/heads/foo
D - E [no reference yet]
In either case, a simple:
git rev-list $oldsha..$newsha
produces the object IDs you should look at.
For new references, there's no short-cut. For instance, suppose we have the same five commits shown above, with the same refs/heads/foo but no refs/heads/bar, and the actual proposal is "create refs/heads/bar pointing to E". In this case, we should again look at commits D and E, but there's no obvious way to know about D.
The non-obvious way, which only works in some cases, is to find objects that will be reachable given the proposed creation, that are not currently reachable at all:
git rev-list $newsha --not --all
In this particular case, this will again produce the IDs for D and E.
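The two rev-list forms can be combined into one helper that picks the right one per proposed change. This is a sketch only: list_new_commits is a hypothetical name, and SHA-1 IDs (40 zeros for "no object") are assumed.

```shell
#!/bin/sh
# Assumes SHA-1 object IDs, so the "null" ID is 40 zeros.
NULL_SHA1="0000000000000000000000000000000000000000"

# Hypothetical helper: print the commit IDs one proposed change would add.
# $1 = old ID, $2 = new ID
list_new_commits() {
    oldsha=$1 newsha=$2
    case $oldsha,$newsha in
    *,"$NULL_SHA1")     # delete: nothing new is coming in
        ;;
    "$NULL_SHA1",*)     # create: everything not reachable from any ref
        git rev-list "$newsha" --not --all;;
    *)                  # update: the short-cut range
        git rev-list "$oldsha..$newsha";;
    esac
}
```

With the C, D, E example above, both list_new_commits "$NULL_SHA1" "$E_id" and list_new_commits "$C_id" "$E_id" would print the IDs of E and D, where E_id and C_id stand in for the real commit IDs.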
Now let's consider your particular case, where you want to look at all commits that are being proposed-to-be-added. Here's a way to handle this one.
For all proposed updates:
- If this one is a delete, we have some deletes.
- If this one is a create or update, we have some new commits; accumulate the new SHA.
If we have some deletes and we have accumulated some SHAs, reject the attempt: it's too hard. Make the user separate out the operations.
Otherwise, if we have no accumulated SHAs, we must just have deletes (or maybe nothing at all—should not happen, but harmless); allow this (exit 0).
Otherwise we must have some new SHA-1 values.
Using the proposed new SHAs as starting points, find all git objects that would be reachable, excluding all objects that are currently reachable under any name. These are all the new objects.
For each one that is a commit, examine it to see if it's forbidden. If so, reject the entire operation (even if some parts could succeed); as before, it's too hard to figure out, so make the user separate out the "good" operations from the "bad" ones.
If we get this far, everything is OK; permit the entire update.
In code form:
#! /bin/sh
# (untested)
NULL_SHA1="0000000000000000000000000000000000000000" # 40 0's
new_list=
any_deleted=false
# classify each proposed change and collect the new IDs
while read -r oldsha newsha refname; do
    case $oldsha,$newsha in
    *,$NULL_SHA1) # it's a delete
        any_deleted=true;;
    $NULL_SHA1,*) # it's a create
        new_list="$new_list $newsha";;
    *,*) # it's an update
        new_list="$new_list $newsha";;
    esac
done
$any_deleted && [ -n "$new_list" ] && {
    echo 'error: you are deleting some refs and creating/updating others'
    echo 'please split your push into separate operations'
    exit 1
}
[ -z "$new_list" ] && exit 0
# look at all new objects, and verify them
# let's write the verifier function, including a check_banned function...
check_banned() {
    if [ "$1" = root ]; then
        echo "################################################################"
        echo "Commits from $1 are not allowed"
        echo ... rest of message ...
        exit 1
    fi
}
check_commit() {
    check_banned "$(git log -1 --pretty=format:%an "$1")"
    check_banned "$(git log -1 --pretty=format:%cn "$1")"
}
# $new_list is deliberately unquoted: it holds several space-separated IDs.
# In most shells the loop below runs in a subshell (right side of a pipe),
# so "exit 1" in check_banned ends only the loop; its status still becomes
# the hook's exit status because the pipeline is the script's last command.
git rev-list $new_list --not --all |
while read -r sha1; do
    objtype=$(git cat-file -t "$sha1")
    case $objtype in
    commit) check_commit "$sha1";;
    esac
done