Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
645 views
in Technique[技术] by (71.8m points)

c - Is it possible to create a Minimal Perfect Hash function without a separate lookup table for a small (<64) set of keys?

I recently read this article Throw away the keys: Easy, Minimal Perfect Hashing about generating a minimal perfect hash table for a known set of keys.

The article seems to assume that you need an intermediate table. Is there any other, simpler way to generate such a function if we assume that the set of keys is small (i.e. < 64).

In my case, I want to map a set of thread ID:s to a unique block of data within an array. The threads are started before the hash function is generated and stay constant during the running time of the program. The exact number of threads vary but stays fixed during the runtime of the program:

unsigned int thread_ids*;
unsigned int thread_count;
struct {
    /* Some thread specific data */
}* ThreadData;

int start_threads () {
    /* Code which starts the threads and allocates the threaddata. */
}

int f(thread_id) {
    /* return unique index into threadData */
}

int main() {
    thread_count = 64; /* This number will be small, e.g. < 64 */
    start_threads();
    ThreadData[f(thread_ids[0])]
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Yes, you can build a minimal perfect hash function (MPHF) at runtime. There are multiple algorithms you can use, but most of them are a bit complex to implement so I can't give you working sample code. Many are implemented in the cmph project.

The most simple one is probably BDZ. On a high level, lookup requires calculating 3 hash functions, and 3 memory accesses. If memory isn't an issue, you only need 2. It supports millions of keys. This algorithm requires a lookup table that is about 1.23 times the number of entries.

There are other algorithms, one I invented myself, the RecSplit algorithm (there's even a research paper now), and there is a C++ implementation, and Java right now. Basically, the algorithms finds a way to split the set into subsets (recursively), until the subset size is 1. You need to remember how you split. The most simple solution is in fact using a lookup table for "how you split", but the table is really small, possibly only 5 integers for 64 keys. The first one to divide into 4 subsets of 16, and 4 to map each subset to a number 0..15.

(I added a second answer if you don't strictly need a minimal perfect hash function, just a perfect hash function. Construction is simpler and lookup is a lot faster, but requires a larger array.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...