The build-heap algorithm starts at the midpoint and moves items down as required. Let's consider a heap of 127 items (7 levels). In the worst case:
64 nodes (the leaf level) don't move at all
32 nodes move down one level
16 nodes move down two levels
8 nodes move down three levels
4 nodes move down four levels
2 nodes move down five levels
1 node moves down six levels
So in the worst case you have
64*0 + 32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6
0 + 32 + 32 + 24 + 16 + 10 + 6 = 120 swaps
So in the worst case, build-heap makes fewer than N swaps.
Because build-heap requires that you swap an item with the smallest of its children, it requires two comparisons to initiate a swap: one to find the smallest of the two children, and one to determine if the node is larger and must be swapped.
The number of comparisons required to move a node is 2*(levels_moved+1)
, and no more than N/2 nodes will be moved.
The general case
We need to prove that the maximum number of comparisons is no more than 2N-2. As I noted above, it takes two comparisons to move a node one level. So if the number of levels moved is less than N (i.e. (N-1) or fewer), then the maximum number of comparisons cannot exceed 2N-2.
I use a full heap in the discussion below because it represents the worst case.
In a full heap of N items, there are (N+1)/2 nodes at the leaf level. (N+1)/4 at the next level up. (N+1)/8 at the next, etc. You end up with this:
(N+1)/2 nodes move 0 levels
(N+1)/4 nodes move 1 level
(N+1)/8 nodes move 2 levels
(N+1)/16 nodes move 3 levels
(N+1)/32 nodes move 4 levels
...
That gives us the series:
((N+1)/2)*0 + ((N+1)/4)*1 + ((N+1)/8)*2 + ((N+1)/16)*3 ...
Let's see what that does for heaps of different sizes:
heap size levels levels moved
1 1 0
3 2 1
7 3 2*1 + 1*2 = 4
15 4 4*1 + 2*2 + 1*3 = 11
31 5 8*1 + 4*2 + 2*3 + 1*4 = 26
63 6 16*1 + 8*2 + 4*3 + 2*4 + 1*5 = 57
127 7 32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6 = 120
....
I ran that for heaps up of up to 20 levels (size a million and change), and it holds true: the maximum number of levels moved for a full heap of N items is N-log2(N+1).
Taking the above series as an Arithetico-geometric Sequence, we compute the sum for log2(N + 1) - 1
terms, ignoring the first term as it is zero, to be equal to N - 1
. (Recall that a full binary tree has log2(N + 1)
levels)
This sum represents the total number of times a siftup operation was performed. The total number of comparisons thus required is 2N - 2
(since each sift up operation requires two comparisons). This is also the upper bound, since a full binary tree always represents the worst case for a given tree depth.