tech-userlevel archive


A spell corrector for apropos



Hi all,

While working on the apropos(1) project, I realised that if we are
making apropos(1) clever enough to support full text searches, we
should go one step further and provide spell checking and spelling
suggestions as well. The reason is that apropos only returns documents
which contain all the keywords in the query, so if you misspell even
one keyword you won't get any relevant results, and possibly no
results at all. Spelling suggestions are also a general expectation of
a full-fledged search tool.
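
To illustrate why one typo is enough to lose every result, here is a
minimal Python sketch of all-keywords matching. It is only a toy model
and not how apropos is implemented: the PAGES index and the search()
helper are hypothetical names made up for this example.

```python
# Hypothetical toy index: page name -> set of words found in that page.
PAGES = {
    "strcpy(3)": {"function", "copy", "strings"},
    "termcap(5)": {"terminal", "capability", "database"},
}

def search(query):
    """Return the pages that contain every keyword of the query."""
    keywords = query.lower().split()
    return sorted(name for name, words in PAGES.items()
                  if all(k in words for k in keywords))
```

With this index, search("copy strings") finds strcpy(3), while
search("copy stings") returns nothing, because no page contains the
misspelled keyword.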

I have referred to Prof. Peter Norvig's article [1] on spell
correctors and translated his Python implementation to C. Following
are some of the results:

$ ./apropos "funckiton for coping stings"
Did you mean "function for copying strings" ?
$ ./apropos "generat termcap databse"
Did you mean "generate termcap database" ?
$ ./apropos idcmp
Did you mean "icmp" ?
$ ./apropos "confguire kernal"
Did you mean "configure kernel" ?
$ ./apropos "packate fillter"
Did you mean "package filter" ?
$ ./apropos reeltek
Did you mean "realtek" ?
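
For readers who have not seen the article, the approach in [1] can be
sketched roughly as below. This is a small Python sketch in the spirit
of the article, not the C code on the branches: the WORDS table holds
made-up frequencies (a real corrector would count words from the man
pages), and suggest() is a hypothetical helper added for this example.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Assumed word frequencies, invented for this sketch.
WORDS = Counter({"function": 40, "for": 90, "copying": 12,
                 "strings": 25, "configure": 8, "kernel": 30})

def edits1(word):
    """All strings one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the candidates that appear in the dictionary."""
    return {w for w in words if w in WORDS}

def correct(word):
    """Prefer the word itself, then edit distance 1, then distance 2."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    # Among the candidates, pick the most frequent word.
    return max(candidates, key=lambda w: WORDS[w])

def suggest(query):
    """Correct each keyword of the query independently."""
    return " ".join(correct(w) for w in query.lower().split())
```

A word that is already in the dictionary is left alone, and a word
with no known candidate within two edits is returned unchanged.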

Screenshots of the CGI version of apropos running in the browser:

http://2.bp.blogspot.com/-Ql6yFBO4IsU/Tos03HPpnvI/AAAAAAAACEk/NCE7oItTdeo/s1600/foopen.png
http://1.bp.blogspot.com/-PVBOUigp7jU/Tos09-JjkCI/AAAAAAAACEo/X8JA6l5ks34/s1600/databoose.png
http://4.bp.blogspot.com/-1vuzW4uwmkU/Tos1CrurkQI/AAAAAAAACEs/hxi4ZnXOQ5U/s1600/dns.png
http://1.bp.blogspot.com/-5iyCVl6zQY0/Tos1HT_bp1I/AAAAAAAACEw/NREcz0zOSEk/s1600/reeltak.png

I have also put up a blog post with the implementation details:
http://abhinav-upadhyay.blogspot.com/2011/10/spell-corrector-for-apropos.html

This code is not in the master branch of apropos_replacement, as I am
not sure whether such a feature is desired in apropos. I had
mentioned this on my TODO list during the GSoC but ran out of time.
The code is on the demo-spell and exp-spell branches of the project
repository on GitHub [2].

References:
[1]: http://norvig.com/spell-correct.html
[2]: https://github.com/abhinav-upadhyay/apropos_replacement/

--
Thanks
Abhinav Upadhyay

