Levenshtein Distance Spelling Correction
Sat 18 January 2014
As I started looking into spelling correction, I stumbled on several interesting essays about it. Here is what I've gathered about Levenshtein distance spelling correction.
But first, we should cover the basics of Levenshtein distance (also called edit distance): the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another.
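As a concrete reference, here is a minimal sketch of the classic Wagner-Fischer dynamic-programming computation (the function name and example words are just illustrative):

```python
def levenshtein(a, b):
    """Wagner-Fischer edit distance, keeping only one DP row at a time."""
    prev = list(range(len(b) + 1))            # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,      # deletion
                            curr[j - 1] + 1,  # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```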
Peter Norvig's Algorithm
The origin of Levenshtein Distance Spelling Correction can be traced to Peter Norvig's famous essay, How to Write a Spelling Corrector.
In Peter Norvig's algorithm, the known words are stored in a Python dictionary (a hash table) for fast lookup, together with their corpus frequencies. Given a target word W (which is probably misspelled), he generates all spelling mutations within Levenshtein distance 2 of W. For every candidate C that also appears in the dictionary, he estimates the probability of C given the original word W and selects the C with the maximum probability.
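Here is a condensed sketch of that approach, lightly adapted from the essay (big.txt is the training corpus the essay uses; treat the exact file and names as illustrative):

```python
import re
from collections import Counter

# Word-frequency model built from a text corpus.
WORDS = Counter(re.findall(r'[a-z]+', open('big.txt').read().lower()))

def edits1(word):
    """All strings at edit distance 1 (deletes, transposes, replaces, inserts)."""
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    return {w for w in words if w in WORDS}

def correct(word):
    """Prefer known words at distance 0, then 1, then 2; break ties by frequency."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])
```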
Other Algorithms
Steve Hanov's post on the Levenshtein distance approach takes the target word, searches the whole dictionary, and returns the dictionary word with the smallest Levenshtein distance.
What's interesting is how the author improves on this. He figured that by storing the dictionary words in a trie, a great deal of the cost of calculating Levenshtein distance can be saved: words that share a prefix also share the corresponding rows of the dynamic-programming table, so those rows are computed once per prefix instead of once per word (see the sketch below). Further improvements include compressing the trie into a deterministic acyclic finite state automaton (DAFSA), also known as a directed acyclic word graph (DAWG), combined with a hash table to minimize memory usage.
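Here is a sketch in the spirit of that post, assuming a plain trie (the class and function names are my own). Each trie edge extends the previous DP row by one character, so shared prefixes are computed only once, and branches whose row minimum already exceeds the cost budget are pruned:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None            # full word stored at terminal nodes

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word

def search(root, word, max_cost):
    """Return (word, distance) pairs within max_cost of word."""
    results = []
    first_row = list(range(len(word) + 1))
    for ch, child in root.children.items():
        _walk(child, ch, word, first_row, results, max_cost)
    return results

def _walk(node, ch, word, prev_row, results, max_cost):
    # Extend the parent's DP row by one trie character.
    curr_row = [prev_row[0] + 1]
    for j in range(1, len(word) + 1):
        cost = 0 if word[j - 1] == ch else 1
        curr_row.append(min(curr_row[j - 1] + 1,
                            prev_row[j] + 1,
                            prev_row[j - 1] + cost))
    if node.word is not None and curr_row[-1] <= max_cost:
        results.append((node.word, curr_row[-1]))
    if min(curr_row) <= max_cost:   # prune: costs never decrease down a branch
        for next_ch, child in node.children.items():
            _walk(child, next_ch, word, curr_row, results, max_cost)
```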
The BK-Tree is a data structure for speeding up the dictionary search, commonly cited as reducing the search time from O(n) toward O(log n), although the actual savings depend on the distance threshold. Compared to the full scan above, a BK-Tree query uses the triangle inequality to skip entire subtrees, so it only has to examine a fraction of the dictionary.
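A minimal BK-Tree sketch, assuming the levenshtein function from the first snippet as the metric (the class and method names are my own):

```python
class BKTree:
    """Metric tree keyed by edit distance; the triangle inequality lets a
    query skip subtrees that cannot contain a close-enough word."""
    def __init__(self, distance_fn):
        self.distance = distance_fn
        self.root = None            # (word, {edge_distance: child})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d in node[1]:
                node = node[1][d]   # descend along the matching edge
            else:
                node[1][d] = (word, {})
                return

    def query(self, word, max_cost):
        """All (distance, word) pairs within max_cost of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_cost:
                results.append((d, node_word))
            # Only edges labeled within [d - max_cost, d + max_cost]
            # can lead to matches, by the triangle inequality.
            for edge, child in children.items():
                if d - max_cost <= edge <= d + max_cost:
                    stack.append(child)
        return results

tree = BKTree(levenshtein)
for w in ["hello", "help", "hell", "shell", "helot"]:
    tree.add(w)
print(tree.query("helo", 1))  # matches at distance 1: hello, help, hell, helot
```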
Here is an interesting Reddit discussion about this topic.
There is also a post about Lucene, whose fuzzy search is built on a Levenshtein automaton.