Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
409 views
in Technique[技术] by (71.8m points)

algorithm - Argument for O(1) average-case complexity of heap insertion

The claim on the Wikipedia page for binary heaps is that insertion is O(log n) in the worst case, but O(1) on average:

The number of operations required depends only on the number of levels the new element must rise to satisfy the heap property, thus the insertion operation has a worst-case time complexity of O(log n) but an average-case complexity of O(1).

The linked page attempts to justify this as follows:

However, on average, the newly inserted element does not travel very far up the tree. In particular, assuming a uniform distribution of keys, it has a one-half chance of being greater than its parent; it has a one-half chance of being greater than its grandparent given that it is greater than its parent; it has a one-half chance of being greater than its great-grandparent given that it is greater than its parent, and so on [...] so that in the average case insertion takes constant time

This is surely nonsense, though? It seems to me that if the tree were randomly ordered then there would be a 50/50 chance that a new element was greater than its parent; but that since, roughly speaking, the large elements sink to the bottom, the chances are much less than 50/50 as the heap grows.

Is that right?

It's been like that on Wikipedia for several months...

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is a much better reference for the claim that average time heap insertion is O(1): the 1991 paper "Average Case Analysis of Heap Building by Repeated Insertion" by Hayward & McDiarmid. (This paper is linked in what is currently reference 4 of the Wikipedia article.) That paper in turn references a 1975 paper by Porter & Simon, "Random insertion into a priority queue structure" which deals with a single insertion into a heap, and demonstrates that the average case is O(1).

Intuitively, the argument is simple. Half of the heap is a leaf, and the leaves tend to be larger. If we assume for a moment that leaves are the largest elements in the heap (rather than tending to be larger), then we could say that the probability that a new element will be a leaf -- i.e., that it is in the upper half of the range of values -- would be exactly 0.5. If the new element is not a leaf of the heap (also probability 0.5), we can repeat the process with the truncated heap consisting only of non-leaf nodes in the original heap, so the probability that the new element is at the second-lowest level will be half of what remains: 0.25. And thus the probability that it is at the third level would be 0.125, and so on. Then the expected number of levels we would have to search through would be 1*0.5 + 2*0.25 + 3*0.125..., which is 2.

Of course, the probability that the random new element is greater than a random second-level parent is not really 0.5; it's actually a little less. But, as long as it is bounded by a constant, the sum of the power series which computes the expected number of comparisons will still also be bounded by a constant. And it turns out that the constant is around 2.6.

Also see this useful answer that, while discussing complexities of heaps, and contrasting them with those of BST, gives a detailed graphical analysis of the constant average insertion time in heaps.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...