The python dict
implementation uses the hash value to both sparsely store values based on the key and to avoid collisions in that storage. It uses the result of hash()
as a starting point, it is not the definitive position.
Thus, although hash(4)
returns 4
, the exact 'position' in the underlying C structure is also based on what other keys are already there, and how large the current table is. The python hash table is resized as needed (items added), for example.
Since a dict has no ordering, this is not something you need to worry about, or can hope to influence. If you need ordering in a dict, use the collections.OrderedDict()
implementation instead, which keeps track of ordering separately.
The details of the python hash table implementation
You may want to read up on how hash tables work on Wikipedia; Python uses open addressing for it's implementation.
When selecting a slot in the table, the modulus of the hash value (an integer) and the current table size is taken, thus on a table of size 32, so the key 45
, hash value 45
would initially be stored in slot 14.
If there is a collision (there already is something else stored in slot 14 and it's not the integer 45
), then the slot value is perturbed until an empty slot is found or the same key is found instead. Perturbing is done with the formula:
perturb = slot = hash
while slot_is_full and item_in_slot_is_not_equal_to_key:
slot = (5*slot) + 1 + perturb
perturb >>= 5
So, when there is a collision, another slot is picked at progressively smaller steps until it scans the whole table. Note that the table will already have been resized to make space if necessary.
In order for this to work correctly, custom types need both a __hash__()
method and need to implement __eq__()
to determine if two instances represent the same key. Matching hash values is not enough. For the dict
implementation to consider two instances to represent the exact same key, both their hash values must match, and they must return True for the ==
equality operator. Such objects are considered hashable.
(For Python 2.x, implementing the __cmp__()
hook would do instead of implementing __eq__()
; support for this has been removed in Python 3).