I was just poking around in some dusty corners of my hard drive, and I found this little gem. It's part of a writeup for a lab I had to do last semester in Data Structures and Algorithms. Haha, we had 2 hours to implement this in C++... yuck
Norvig's spelling checker makes use of two probability models. The first model describes how likely a particular word is in a given language. For example, in English, "the" is more likely than"theraputic." The other model describes how likely a given mistake is when typing some word (ie. "tha" is more likely than "thm" when trying to type "the").Wow, that was fun. (Note: sarchasm) If we weren't forced to do this in C++ it really would have been fun... I love lisp.
So the first step in implementing this is to build up a hash table k which maps correctly spelled words to their probability. This is our language model. Next we need to generate the error model. We do this by finding the set of all correctly spelled words which are one or two "edits" away from the word w that we are given to check. If w is in k, then we're done. Otherwise, if the set of words one edit away from w is not empty, take the one with the highest probability. If that set is empty, check the set of words two edits away, and again return the word with highest probability. Finally, if all else fails, return w.
No comments:
Post a Comment