There are multiple ways of implementing set (and map) functionality, for example:
- tree-based approach (ordered traversal)
- hash-based approach (unordered traversal)
Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.
Beware of the advantages and disadvantages of hash-based vs. tree-based approaches.
You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:
- all hashables in a bucket have the same hash value
- a bucket can be implemented as a dynamic array or linked list of hashables
- a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
- one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)
With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash method, drastically reduce the probability of collision, achieving virtually constant-time performance.
You would have to implement:
- the hash function for the type being hashed
- an equality function for the type being used to test whether two hashables are equal or not
- the hash-set
contains
/insert
/remove
functionality.
You can also use open addressing as an alternative to maintaining and managing buckets.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…