Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
958 views
in Technique[技术] by (71.8m points)

algorithm - How can a HashSet offer constant time add operation?

I was reading the javadocs on HashSet when I came across the interesting statement:

This class offers constant time performance for the basic operations (add, remove, contains and size)

This confuses me greatly, as I don't understand how one could possibly get constant time, O(1), performance for a comparison operation. Here are my thoughts:

If this is true, then no matter how much data I'm dumping into my HashSet, I will be able to access any element in constant time. That is, if I put 1 element in my HashSet, it will take the same amount of time to find it as if I had a googolplex of elements.

However, this wouldn't be possible if I had a constant number of buckets, or a consistent hash function, since for any fixed number of buckets, the number of elements in that bucket will grow linearly (albeit slowly, if the number is big enough) with the number of elements in the set.

Then, the only way for this to work is to have a changing hash function every time you insert an element (or every few times). A simple hash function that never any collisions would satisfy this need. One toy example for strings could be: Take the ASCII value of the strings and concatenate them together (because adding could result in a conflict).

However, this hash function, and any other hash function of this sort will likely fail for large enough strings or numbers etc. The number of buckets that you can form is immediately limited by the amount of stack/heap space you have, etc. Thus, skipping locations in memory can't be allowed indefinitely, so you'll eventually have to fill in the gaps.

But if at some point there's a recalculation of the hash function, this can only be as fast as finding a polynomial which passes through N points, or O(nlogn).

Thus arrives my confusion. While I will believe that the HashSet can access elements in O(n/B) time, where B is the number of buckets it has decided to use, I don't see how a HashSet could possibly perform add or get functions in O(1) time.

Note: This post and this post both don't address the concerns I listed..

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The number of buckets is dynamic, and is approximately ~2n, where n is the number of elements in the set.

Note that HashSet gives amortized and average time performance of O(1), not worst case. This means, we can suffer an O(n) operation from time to time.
So, when the bins are too packed up, we just create a new, bigger array, and copy the elements to it.
This costs n operations, and is done when number of elements in the set exceeds 2n/2=n, so it means, the average cost of this operation is bounded by n/n=1, which is a constant.

Additionally, the number of collisions a HashMap offers is also constant on average.

Assume you are adding an element x. The probability of h(x) to be filled up with one element is ~n/2n = 1/2. The probability of it being filled up with 3 elements, is ~(n/2n)^2 = 1/4 (for large values of n), and so on and so on.
This gives you an average running time of 1 + 1/2 + 1/4 + 1/8 + .... Since this sum converges to 2, it means this operation takes constant time on average.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...