A hash table or dictionary is a data structure that stores key-value pairs. The advantage of the hash table is that given a key finding the corresponding value is pretty fast. Simplified, the time to find a key-value pair in the hash table does not depend on the size of the table. Compare that to storing the key-value pairs in a list or an array. To find a key-value pair you would have to search the list from the beginning until a matching key was found. The longer the list the more time it would take to find the key-value pair. Using big-O notation you can say that looking up a key in a hash table is of order O(1) while looking up a key in a list by using linear search is of order O(N) (simplified).
To insert a key-value pair in the hash table you will first have to compute the hash code of the key. In .NET all objects have a method named GetHashCode
that returns a hash code (32 bit integer) for that particular object. It is important that equal objects return the same hash code, but also very useful if different objects return different hash codes. Beware of the misconception that different objects cannot return the same hash code - they can, but it will result in a collision (see below).
As an example consider the hash codes of two strings:
"Boo" 0x598FD95A
"Foo" 0x598FD8DE
Even though the strings are very similar they have different hash codes.
I am simplifying things a bit here to focus on the important aspects of a hash table so for now let us say that Internally Dictionary<TKey, TValue>
stores the key-value pairs in an array. To locate the index in this array where the key-value pair will be stored you have to compute the hash code of the key modulo the size of the array. Assume the size of the array is 5:
Index("Boo") = 0x598FD95A % 5 = 4
Index("Foo") = 0x598FD8DE % 5 = 0
This leads to this internal hash table array:
+---+---------+
| 0 | "Foo" |
+---+---------+
| 1 | (empty) |
+---+---------+
| 2 | (empty) |
+---+---------+
| 3 | (empty) |
+---+---------+
| 4 | "Boo" |
+---+---------+
Looking up an entry in the hash table is very fast. You simply have to compute the hash code of the key modulo the size of the internal array and retrieve the string at that index.
Now consider the key "Zoo":
Index("Zoo") = 0x598FDC62 % 5 = 0
It has the same index as the key "Foo". This results in what is called a collision. A proper implementation of a hash table will have to handle collisions and there are different strategies for doing that. Also, as the internal array fills up there will be fewer and fewer empty elements in the array resulting in an increasing number of collisions. The load factor is the ratio between used elements and total elements in the internal array. In the example above the load factor is 2/5 = 0.4. Most hash table implementations will increase the size of the internal array when the load factor exceeds a certain threshold.
If you want to learn more about some of these concepts you will have to study some of the more comprehensive resources linked in other answers.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…