Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
563 views
in Technique[技术] by (71.8m points)

hash - Overriding Python's Hashing Function in Dictionary

I am trying to create a custom hash function for some object that I'll be hashing into a dictionary. The hashing function is unique (not the standard Python one). This is very important to me: to use the unique function. Each key's value is a list.

Assuming I override __hash__ and end up coming up with the right hash number for an object. Would:

dict = {}
dict[number_here] = value

hash the value into the position number number_here, or would it still be at the position that Python's hash table would compute for that number?

Printing dict only shows the items and not which position they're that. However, when I do hash(4), the result is 4. So I'm assuming this means integers are hashed to their respective locations?

Could someone please verify my findings or explain to me if I'm wrong?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The python dict implementation uses the hash value to both sparsely store values based on the key and to avoid collisions in that storage. It uses the result of hash() as a starting point, it is not the definitive position.

Thus, although hash(4) returns 4, the exact 'position' in the underlying C structure is also based on what other keys are already there, and how large the current table is. The python hash table is resized as needed (items added), for example.

Since a dict has no ordering, this is not something you need to worry about, or can hope to influence. If you need ordering in a dict, use the collections.OrderedDict() implementation instead, which keeps track of ordering separately.

The details of the python hash table implementation

You may want to read up on how hash tables work on Wikipedia; Python uses open addressing for it's implementation.

When selecting a slot in the table, the modulus of the hash value (an integer) and the current table size is taken, thus on a table of size 32, so the key 45, hash value 45 would initially be stored in slot 14.

If there is a collision (there already is something else stored in slot 14 and it's not the integer 45), then the slot value is perturbed until an empty slot is found or the same key is found instead. Perturbing is done with the formula:

perturb = slot = hash
while slot_is_full and item_in_slot_is_not_equal_to_key:
    slot = (5*slot) + 1 + perturb
    perturb >>= 5

So, when there is a collision, another slot is picked at progressively smaller steps until it scans the whole table. Note that the table will already have been resized to make space if necessary.

In order for this to work correctly, custom types need both a __hash__() method and need to implement __eq__() to determine if two instances represent the same key. Matching hash values is not enough. For the dict implementation to consider two instances to represent the exact same key, both their hash values must match, and they must return True for the == equality operator. Such objects are considered hashable.

(For Python 2.x, implementing the __cmp__() hook would do instead of implementing __eq__(); support for this has been removed in Python 3).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...