Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
221 views
in Technique[技术] by (71.8m points)

language agnostic - What algorithm gives suggestions in a spell checker?

What algorithm is typically used when implementing a spell checker that is accompanied with word suggestions?

At first I thought it might make sense to check each new word typed (if not found in the dictionary) against it's Levenshtein distance from every other word in the dictionary and returning the top results. However, this seems like it would be highly inefficient, having to evaluate the entire dictionary repeatedly.

How is this typically done?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is good essay by Peter Norvig how to implement a spelling corrector. It's basicly a brute force approach trying candidate strings with a given edit distance. (Here are some tips how you can improve the spelling corrector performance using a Bloom Filter and faster candidate hashing.)

The requirements for a spell checker are weaker. You have only to find out that a word is not in the dictionary. You can use a Bloom Filter to build a spell checker which consumes less memory. An ancient versions is decribed in Programming Pearls by Jon Bentley using 64kb for an english dictionary.

A BK-Tree is an alternative approach. A nice article is here.

Levenshstein distance is not exactly the right edit distance for a spell checker. It knows only insertion, deletion and substitution. Transposition is missing and produces 2 for a transposition of 1 character (it's 1 delete and 1 insertion). Damerau–Levenshtein distance is the right edit distance.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...