hashtable - Why isn't modulus sufficient within a hash function for hash tables?


posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)


I often see or hear of modulus being used as a last step of hashing or after hashing. e.g. h(input)%N where h is the hash function and % is the modulus operator. If I am designing a hash table, and want to map a large set of keys to a smaller space of indices for the hash table, doesn't the modulus operator achieve that? Furthermore, if I wanted to randomize the distribution across those locations within the hash table, is the remainder generated by modulus not sufficient? What does the hashing function h provide on top of the modulus operator?

question from:https://stackoverflow.com/questions/66068727/why-isnt-modulus-sufficient-within-a-hash-function-for-hash-tables


1 Reply

深蓝 · Answer 1 · 2021-10-06T03:00:50+0000

I often see or hear of modulus being used as a last step of hashing or after hashing. e.g. h( input ) % N where h is the hash function and % is the modulus operator.

Indeed.

If I am designing a hash table, and want to map a large set of keys to a smaller space of indices for the hash table, doesn't the modulus operator achieve that?

Yes. That's precisely the purpose of the modulo operator here: to restrict values to the range of valid array indexes, 0 to N-1.

But you cannot use the modulo operator by itself, because it requires an integer operand. You cannot take the "modulo of a string over N" or the "modulo of an object-graph over N"[1].
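To illustrate the point above, here is a minimal sketch of the two-step h(input) % N pattern for string keys. FNV-1a is used only as one well-known example of h; the function names fnv1a_64 and bucket_index are hypothetical, chosen for this illustration, and a real hash table may use a different hash entirely.

```python
# Sketch only: FNV-1a stands in for "h"; any function that turns a key
# into an integer would fill the same role.
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF           # keep results within 64 bits


def fnv1a_64(key: str) -> int:
    """h: reduce a string (a non-integer key) to a 64-bit integer."""
    h = FNV_OFFSET_BASIS
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h


def bucket_index(key: str, num_buckets: int) -> int:
    """Modulus is the last step: it only restricts h(key) to 0..N-1."""
    return fnv1a_64(key) % num_buckets


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, "->", bucket_index(name, 16))
```

The modulus never sees the string itself; it only operates on the integer that h produced.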

Furthermore, if I wanted to randomize the distribution across those locations within the hash table, is the remainder generated by modulus not sufficient?

No, it is not. The modulo operator doesn't give you pseudorandom output, nor does it have any kind of avalanche effect, so similar input values produce similar outputs. That leads to clustering in your hashtable bins and to subpar performance due to the greatly increased likelihood of hash collisions, which then require slower collision-resolution techniques such as linear probing. With enough collisions, lookups degrade from O(1) toward O(n), which defeats the purpose of a hashtable.
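The clustering described above can be seen with a small experiment. This is a sketch under assumptions: the keys are integers that are all multiples of the table size, and the SplitMix64 mixing step stands in for a hash function with an avalanche effect. The helper name bucket_counts is hypothetical.

```python
from collections import Counter

MASK_64 = 0xFFFFFFFFFFFFFFFF


def splitmix64(x: int) -> int:
    """SplitMix64 mixing step, used here as an example of a scrambling h."""
    z = (x + 0x9E3779B97F4A7C15) & MASK_64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK_64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK_64
    return z ^ (z >> 31)


def bucket_counts(keys, num_buckets, h=lambda k: k):
    """Count how many keys land in each bucket under h(key) % num_buckets."""
    return Counter(h(k) % num_buckets for k in keys)


NUM_BUCKETS = 16
keys = range(0, 1600, NUM_BUCKETS)  # 100 keys, every one a multiple of 16

plain = bucket_counts(keys, NUM_BUCKETS)                # modulus only
mixed = bucket_counts(keys, NUM_BUCKETS, h=splitmix64)  # mix, then modulus

print("modulus only:", len(plain), "bucket(s) used")
print("mixed first: ", len(mixed), "bucket(s) used")
```

With modulus alone, every key is a multiple of 16, so all 100 keys fall into bucket 0. Mixing the key first breaks that regularity before the modulus is applied.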

What does the hashing function h provide on top of the modulus operator?

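The domain of h can be anything, including non-integer values such as strings or object graphs. h reduces any key to an integer, which the modulo operator can then map to an index. A well-designed h also scatters similar keys across the integer range, which modulus alone does not do.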

[1] Technically speaking, this is possible if you use the value of the memory address of an object (i.e. an object pointer), but that doesn't work if you have hashtable keys that don't use object identity, such as a stack-allocated object or custom struct.
