Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

string - Best data structure for implementing a dictionary?

What would be the best data structure to store all the words of a dictionary? The best I could think of was to use a HashMap, which will map to a HashTable. Basically, depending upon the first character, we will get the associated HashTable and then using this, we can add the words starting from that character. We'll then pick a good hash function based on the string.

Is there a better approach?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Depending on what you want to do, there are many good data structures.

If you just want to store the words and ask "is this word here or not?", a standard hash table with no other fancy machinery is a reasonable approach. If that word is list fixed in advance, consider using a perfect hash table to get excellent performance and space usage.

If you want to be able to check if a given prefix exists while supporting fast lookups, a trie is a good option, though it can be a bit space-inefficient. It also supports fast insertions or deletions. It also allows for iteration in alphabetical order, which hashing doesn't offer. This is essentially the structure you've described in your answer, but depending on the use case other representations of tries might be better.

If in addition to the above, you know for a fact that the word list is fixed, consider using a DAWG (directed acyclic word graph), which is essentially a minimum-state DFA for the language. It's substantially more compact than the trie, but supports many of the same operations.

If you want trie-like behavior but don't want to pay a huge space penalty, the ternary search tree is another viable option, as is the radix tree. These are very different structures, but can be much better than the trie in different circumstances.

If space is a concern but you want a trie, look into the succinct trie representation, which has slower lookups but just about theoretically optimal space usage. The link discusses how it's being used in JavaScript as an easy way to transmit a huge amount of data. An alternative compact representation is the double-array trie, though admittedly I know very little about it.

If you want to use the dictionary for operations like spell-checking where you need to find words similar to other words, the BK-tree is an excellent data structure to consider.

Hope this helps!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...