My Worldview has a Loose Tooth: A quick sketch of a spell checker

I was just poking around in some dusty corners of my hard drive, and I found this little gem. It's part of a writeup for a lab I had to do last semester in Data Structures and Algorithms. Haha, we had 2 hours to implement this in C++... yuck

Norvig's spelling checker makes use of two probability models. The first model describes how likely a particular word is in a given language. For example, in English, "the" is more likely than"theraputic." The other model describes how likely a given mistake is when typing some word (ie. "tha" is more likely than "thm" when trying to type "the").

So the first step in implementing this is to build up a hash table k which maps correctly spelled words to their probability. This is our language model. Next we need to generate the error model. We do this by finding the set of all correctly spelled words which are one or two "edits" away from the word w that we are given to check. If w is in k, then we're done. Otherwise, if the set of words one edit away from w is not empty, take the one with the highest probability. If that set is empty, check the set of words two edits away, and again return the word with highest probability. Finally, if all else fails, return w.

Wow, that was fun. (Note: sarchasm) If we weren't forced to do this in C++ it really would have been fun... I love lisp.

My Worldview has a Loose Tooth

Monday, February 18, 2008

A quick sketch of a spell checker

No comments: