algorithm - Finding dictionary words

Question

Welcome To Ask or Share your Answers For Others

algorithm - Finding dictionary words

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

algorithm - Finding dictionary words

I have a lot of compound strings that are a combination of two or three English words.

    e.g. "Spicejet" is a combination of the words "spice" and "jet"

I need to separate these individual English words from such compound strings. My dictionary is going to consist of around 100000 words.

What would be the most efficient by which I can separate individual English words from such compound strings.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:46:37+0000

I'm not sure how much time or frequency you have to do this (is it a one-time operation? daily? weekly?) but you're obviously going to want a quick, weighted dictionary lookup.

You'll also want to have a conflict resolution mechanism, perhaps a side-queue to manually resolve conflicts on tuples that have multiple possible meanings.

I would look into Tries. Using one you can efficiently find (and weight) your prefixes, which are precisely what you will be looking for.

You'll have to build the Tries yourself from a good dictionary source, and weight the nodes on full words to provide yourself a good quality mechanism for reference.

Just brainstorming here, but if you know your dataset consists primarily of duplets or triplets, you could probably get away with multiple Trie lookups, for example looking up 'Spic' and then 'ejet' and then finding that both results have a low score, abandon into 'Spice' and 'Jet', where both Tries would yield a good combined result between the two.

Also I would consider utilizing frequency analysis on the most common prefixes up to an arbitrary or dynamic limit, e.g. filtering 'the' or 'un' or 'in' and weighting those accordingly.

Sounds like a fun problem, good luck!

Categories

algorithm - Finding dictionary words

algorithm - Finding dictionary words

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags