Here is, in detail, how the hash code of a given key is calculated and then used by HashMap.
In the case of a String, it is calculated by String#hashCode(), which in JDK 6 is implemented as follows:
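
    public int hashCode() {
        int h = hash;                   // cached value; 0 means "not computed yet"
        int len = count;                // number of chars in this String
        if (h == 0 && len > 0) {
            int off = offset;           // start of this String inside value[]
            char val[] = value;
            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;                   // cache the result for later calls
        }
        return h;
    }
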
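This follows the formula given in the Javadoc, where s[i] is the i-th character of the string, n is its length, and ^ means exponentiation (evaluated with int arithmetic, so large values wrap around):

    hashcode = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]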
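To see that the loop in hashCode() above and the Javadoc formula really compute the same value, here is a small sketch that evaluates the formula both ways. The class and method names (StringHashFormula, byPowers, byHorner) are made up for this illustration; both methods rely on Java's int arithmetic wrapping on overflow, just like String#hashCode().

```java
// Two ways to evaluate s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
// Both use int arithmetic, so overflow wraps exactly as in String.hashCode().
class StringHashFormula {

    // Direct translation of the formula: each character times a power of 31.
    static int byPowers(String s) {
        int n = s.length();
        int h = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // builds 31^(n-1-i), wrapping on overflow
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    // Horner's rule: the same polynomial evaluated as ((s[0]*31 + s[1])*31 + ...),
    // which is the loop used in the hashCode() source.
    static int byHorner(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "Amit", "hello world"}) {
            System.out.println("'" + s + "': " + byPowers(s) + " " + byHorner(s) + " " + s.hashCode());
        }
    }
}
```

Horner's rule needs only one multiplication per character, which is why the real loop is written that way.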
One interesting thing to note about this implementation is that String actually caches its hash code. It can do this because String is immutable, so the hash code can never change after it has been computed.
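The same lazy-caching trick works for any immutable key class. Below is a hypothetical CachedKey (not a JDK class) that mimics the pattern: 0 marks "not computed yet", and the computations counter exists only to show that the work is done once. As with String, a key whose real hash happens to be 0 would be recomputed on every call.

```java
// Hypothetical immutable key that caches its hash code lazily, like String does.
final class CachedKey {
    private final String name; // immutable state the hash is derived from
    private int hash;          // cached hash code; 0 means "not computed yet"
    int computations;          // demo only: counts how often the hash is computed

    CachedKey(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            computations++;
            h = name.hashCode();
            hash = h; // safe to cache because name can never change
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey && ((CachedKey) o).name.equals(name);
    }

    public static void main(String[] args) {
        CachedKey k = new CachedKey("Amit");
        k.hashCode();
        k.hashCode();
        System.out.println(k.computations); // 1
    }
}
```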
If I calculate the hash code of the String "Amit", it yields this integer:

    System.out.println("Amit".hashCode());
    > 2044535

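To check that number by hand, here is a sketch that records the running value after each character ('A' = 65, 'm' = 109, 'i' = 105, 't' = 116). AmitTrace and steps are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Records the intermediate values of h = 31 * h + c for each character.
class AmitTrace {
    static List<Integer> steps(String s) {
        List<Integer> out = new ArrayList<>();
        int h = 0;
        for (char c : s.toCharArray()) {
            h = 31 * h + c;
            out.add(h);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(steps("Amit")); // [65, 2124, 65949, 2044535]
    }
}
```

So 65*31 + 109 = 2124, 2124*31 + 105 = 65949 and 65949*31 + 116 = 2044535.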
Let's walk through a simple put into a map, but first we have to look at how the map is built.
The most interesting fact about a Java HashMap is that it always has 2^n buckets, i.e. a power of two. If you create it with new HashMap(), the default number of buckets is 16, which is 2^4.
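HashMap also rounds a requested initial capacity up to the next power of two. The helper below is a hypothetical sketch of that rounding using a simple doubling loop, capped at 1 << 30 (HashMap's maximum capacity); it is not the JDK's own code.

```java
// Hypothetical helper: round a requested capacity up to the next power of two.
class PowerOfTwoBuckets {
    // Assumes 0 < requested <= 1 << 30.
    static int bucketCount(int requested) {
        if (requested <= 0 || requested > (1 << 30)) {
            throw new IllegalArgumentException("out of range: " + requested);
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1; // double until we reach or pass the request
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(bucketCount(16));  // 16 (2^4, the default)
        System.out.println(bucketCount(17));  // 32
        System.out.println(bucketCount(100)); // 128
    }
}
```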
When you put an entry into this map, it first gets the hash code of the key. Some fancy bit twiddling is then applied to that hash code so that poor hash functions (especially those whose values do not differ in the lower bits) don't "overload" a single bucket.
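In the JDK 6 HashMap source, that bit twiddling is the supplemental hash function reproduced below (later JDKs replaced it with the simpler h ^ (h >>> 16)). The sketch feeds it sixteen hash codes that differ only in their high bits: without spreading they all collide in bucket 0 of a 16-bucket table, with it they use all 16 buckets. SpreadDemo and bucketsUsed are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

// Effect of the JDK 6 supplemental hash on hash codes that differ only in high bits.
class SpreadDemo {
    // Supplemental hash from the JDK 6 HashMap source.
    static int jdk6Hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Number of distinct buckets used by the hash codes i << 16 (i = 0..15).
    static int bucketsUsed(boolean spread, int buckets) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16; // low 16 bits are always zero
            if (spread) {
                h = jdk6Hash(h);
            }
            used.add(h & (buckets - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("without spreading: " + bucketsUsed(false, 16)); // 1
        System.out.println("with spreading:    " + bucketsUsed(true, 16));  // 16
    }
}
```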
The function that is actually responsible for distributing your key to a bucket is the following:

    h & (length-1); // length is the current number of buckets, h the key's hash code after the bit twiddling

This only works for power-of-two bucket counts, because it uses a bitwise & instead of a modulo to map the key to a bucket: when length is 2^n, length-1 is a mask of the n lowest bits, so h & (length-1) gives the same result as the non-negative remainder of h divided by length.

"Amit" ends up in bucket index 10 because of the bit twiddling (the JDK 6/7 hash function). Without the bit twiddling it would go to bucket index 7, because 2044535 & 15 = 7.
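Here is the "Amit" example worked through in code, together with a check that the mask behaves like a modulo. It uses the JDK 6 supplemental hash; for comparison it also applies the JDK 8 style mix h ^ (h >>> 16), with which the same key lands in index 8, so the exact bucket depends on the JDK version. BucketIndex and its method names are illustrative.

```java
// Bucket index for "Amit" in a 16-bucket table, and the mask-vs-modulo property.
class BucketIndex {
    // Maps a hash to a bucket; only correct when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Supplemental hash from the JDK 6 HashMap source.
    static int jdk6Hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Simpler mix used by JDK 8 and later.
    static int jdk8Hash(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = "Amit".hashCode(); // 2044535
        System.out.println("no twiddling: " + indexFor(h, 16));           // 7
        System.out.println("JDK 6 hash:   " + indexFor(jdk6Hash(h), 16)); // 10
        System.out.println("JDK 8 hash:   " + indexFor(jdk8Hash(h), 16)); // 8

        // For a power-of-two length the mask equals Math.floorMod, even for a
        // negative hash, while % can return a negative number.
        int neg = -2044535;
        System.out.println(indexFor(neg, 16) + " " + Math.floorMod(neg, 16) + " " + (neg % 16)); // 9 9 -7

        // With 15 buckets the mask is 14 (binary 1110): bit 0 is always cleared,
        // so odd indexes would never be used.
        System.out.println(indexFor(7, 15)); // 6, although 7 % 15 == 7
    }
}
```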
Now that we have an index, we can find the bucket. If the bucket already contains entries, we have to iterate over them and, if we find an entry with an equal key, replace its value. If no matching entry is found in the linked list, the new entry is simply added at the beginning of the list.
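The put logic for a single bucket can be sketched as a singly linked list. This is a hypothetical, simplified Bucket class, not HashMap's real internals: it assumes non-null keys and skips the hash comparison the real code does before calling equals. New keys go to the head of the list, as in JDK 6/7; JDK 8 appends to the tail instead.

```java
// Hypothetical, simplified bucket: a singly linked list of key/value entries.
class Bucket<K, V> {
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Entry<K, V> head;

    // Returns the previous value for the key, or null if the key was new.
    // Assumes non-null keys.
    V put(K key, V value) {
        for (Entry<K, V> e = head; e != null; e = e.next) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value; // equal key found: replace only the value
                return old;
            }
        }
        head = new Entry<>(key, value, head); // not found: prepend a new entry
        return null;
    }

    public static void main(String[] args) {
        Bucket<String, Integer> b = new Bucket<>();
        b.put("Amit", 1);
        b.put("Raj", 2);
        System.out.println(b.put("Amit", 3)); // 1 (the old value that was replaced)
        System.out.println(b.head.key);       // Raj (the newest key sits at the head)
    }
}
```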
The next important thing in HashMap is resizing: if the size of the map exceeds a threshold (the current number of buckets times the load factor, in our case 16*0.75 = 12), the backing array is resized.
The new size is always twice the current number of buckets, which guarantees that it stays a power of two, so the function that finds the bucket keeps working.
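The following sketch simulates how the bucket count grows as entries are added, assuming the table doubles as soon as the size exceeds capacity * loadFactor. JDK versions differ slightly in exactly when they trigger the resize, so treat GrowthSimulation as an illustration rather than the HashMap code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation of HashMap growth under a load factor.
class GrowthSimulation {
    // Returns every capacity the table goes through while n entries are added.
    static List<Integer> capacities(int n, int initialCapacity, float loadFactor) {
        List<Integer> seen = new ArrayList<>();
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor); // 16 * 0.75 = 12
        seen.add(capacity);
        for (int size = 1; size <= n; size++) {
            if (size > threshold) {
                capacity *= 2; // doubling keeps the capacity a power of two
                threshold = (int) (capacity * loadFactor);
                seen.add(capacity);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(100, 16, 0.75f)); // [16, 32, 64, 128, 256]
    }
}
```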
Since the number of buckets changes, all current entries have to be rehashed into the new table.
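Because the bucket count doubles, rehashing has a neat property: the new index h & (2n-1) looks at exactly one more bit of the hash than the old index h & (n-1), so every entry either stays at its old index or moves to oldIndex + n. The sketch below shows this for a few keys using their raw hash codes (spreading omitted for simplicity); RehashSplit and newIndex are illustrative names. Every entry still has to be visited, which is where the cost comes from.

```java
// Where entries go when the table doubles from oldCapacity to 2 * oldCapacity.
class RehashSplit {
    static int newIndex(int h, int oldCapacity) {
        return h & (2 * oldCapacity - 1);
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        for (String key : new String[] {"Amit", "Raj", "hello"}) {
            int h = key.hashCode(); // raw hash; spreading omitted for simplicity
            int oldIndex = h & (oldCapacity - 1);
            int moved = newIndex(h, oldCapacity);
            // moved is always either oldIndex or oldIndex + oldCapacity
            System.out.println(key + ": " + oldIndex + " -> " + moved);
        }
    }
}
```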
This is quite costly, so if you know how many items you will store, you should initialize the HashMap with a large enough capacity (at least the expected count divided by the load factor) so it does not have to resize over and over.
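A small sketch of sizing the map up front, assuming the default load factor of 0.75; the helper initialCapacityFor is hypothetical. Passing the expected count itself is not quite enough: new HashMap<>(1000) gets 1024 buckets and a threshold of 768, so it would still resize once while 1000 entries are added. On Java 19 and later, HashMap.newHashMap(int) performs this calculation for you.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: pick an initial capacity so that `expected` entries stay
// at or below the resize threshold with the default load factor of 0.75.
class Presize {
    static int initialCapacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1000;
        int cap = initialCapacityFor(expected); // 1334; HashMap rounds this up to 2048 buckets
        Map<Integer, String> map = new HashMap<>(cap);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i); // threshold is 2048 * 0.75 = 1536, so the table is never doubled
        }
        System.out.println(cap + " " + map.size()); // 1334 1000
    }
}
```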