tech-userlevel archive


Re: A spell corrector for apropos



On Tue, Oct 04, 2011 at 11:39:30PM +0530, Abhinav Upadhyay wrote:
> Hi all,
> 
> While working on the apropos(1) project, I realised that when we are
> making apropos(1) clever enough to support full text searches for a
> better search experience, we also need to go one step further to
> provide spell checking and spell suggestions as well. The reason for
> this is that, if you misspell even one keyword in the query, you won't
> get any relevant results or possibly no results at all, because
> apropos searches for those documents which contain all the keywords
> mentioned in the query, and it is a general expectation to have such a
> feature with a full-fledged search tool.
> 
> I have referred to Prof. Peter Norvig's article [1] on spell
> correctors and translated his Python implementation to C. Following
> are some of the results:
> 
> $ ./apropos "funckiton for coping stings"
> Did you mean "function for copying strings" ?
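Norvig's article describes the approach roughly like this: generate every string within one edit (deletion, transposition, replacement, insertion) of the misspelled word, keep the ones that occur in the corpus, and pick the most frequent. If nothing is found, try two edits. The C sketch below is not Abhinav's code. The word list and counts are made up for illustration, and the names (`dict`, `frequency`, `edits1`, `correct`) are hypothetical. `edits1` only echoes Norvig's Python naming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXWORD 64

/* Hypothetical frequency table; a real corrector would build it
 * from the text of the installed manual pages. */
struct wordfreq {
	const char *word;
	int count;
};

static const struct wordfreq dict[] = {
	{ "function", 120 }, { "for", 300 }, { "copying", 15 },
	{ "strings", 40 }, { "string", 90 }, { "spelling", 3 },
};

static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";

/* Return the corpus count of w, or 0 if w is unknown. */
static int
frequency(const char *w)
{
	for (size_t i = 0; i < sizeof(dict) / sizeof(dict[0]); i++)
		if (strcmp(dict[i].word, w) == 0)
			return dict[i].count;
	return 0;
}

/* Most frequent known candidate seen so far. */
struct best {
	char word[MAXWORD];
	int count;
};

typedef void (*visit_fn)(const char *, void *);

/* Call visit() on every string one edit away from w:
 * deletions, transpositions, replacements and insertions. */
static void
edits1(const char *w, visit_fn visit, void *arg)
{
	char buf[MAXWORD];
	size_t n = strlen(w);

	assert(n + 2 <= MAXWORD);	/* room for one insertion plus NUL */
	for (size_t i = 0; i <= n; i++) {
		if (i < n) {		/* delete w[i] */
			memcpy(buf, w, i);
			strcpy(buf + i, w + i + 1);
			visit(buf, arg);
		}
		if (i + 1 < n) {	/* swap w[i] and w[i+1] */
			strcpy(buf, w);
			buf[i] = w[i + 1];
			buf[i + 1] = w[i];
			visit(buf, arg);
		}
		for (const char *c = alphabet; *c != '\0'; c++) {
			if (i < n) {	/* replace w[i] by *c */
				strcpy(buf, w);
				buf[i] = *c;
				visit(buf, arg);
			}
			/* insert *c before position i */
			memcpy(buf, w, i);
			buf[i] = *c;
			strcpy(buf + i + 1, w + i);
			visit(buf, arg);
		}
	}
}

static void
keep_best(const char *cand, void *arg)
{
	struct best *b = arg;
	int c = frequency(cand);

	if (c > b->count) {
		b->count = c;
		strcpy(b->word, cand);
	}
}

/* Visiting all edits of each one-edit string yields the two-edit set. */
static void
edits2_visit(const char *e1, void *arg)
{
	edits1(e1, keep_best, arg);
}

/* The word itself if known, else the most frequent known word one edit
 * away, else two edits away, else the word unchanged.  The result may
 * point to a static buffer that the next call overwrites. */
static const char *
correct(const char *w)
{
	static struct best b;

	if (frequency(w) > 0 || strlen(w) + 3 > MAXWORD)
		return w;	/* known, or too long for this sketch's buffers */
	b.count = 0;
	edits1(w, keep_best, &b);
	if (b.count == 0)
		edits1(w, edits2_visit, &b);
	return b.count > 0 ? b.word : w;
}
```

Applied word by word to the query above, `correct()` should map "stings" to "strings" (one insertion) and "funckiton" to "function" (a deletion plus a transposition). A word of length n has 54n+25 one-edit variants, so the two-edit pass examines on the order of (54n+25)^2 strings. That is cheap for query-sized input, but it is a reason to look up candidates in a hash table instead of the linear scan used here.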

I think that it's good to print the correction.  Since 9 times out of 10
the user is going to immediately run apropos(1) on the correction, you
should save them the trouble and run the corrected search, too. :-)

Do you correct words that appear in a manual page with negligible
probability (< .001% ??) or only words that are nowhere in the corpus?
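The question comes down to a choice between two policies. Here is a minimal sketch, assuming the corrector knows both the word's count and the total number of words in the corpus. The function name and parameters are hypothetical. A cutoff of 0.001% is 1e-5, so in a corpus of, say, 10 million words a word would need fewer than 100 occurrences to count as a likely misspelling.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a query word should be replaced by a suggestion.
 * count:  occurrences of the word in the manual-page corpus
 * total:  total number of word occurrences in the corpus (> 0)
 * cutoff: relative-frequency threshold; 0.0 means "only correct words
 *         that never occur", 1e-5 corresponds to 0.001%.
 */
static bool
should_correct(long count, long total, double cutoff)
{
	assert(total > 0);
	if (count == 0)
		return true;	/* absent from the corpus: always a candidate */
	return (double)count / (double)total < cutoff;
}
```

A non-zero cutoff can catch misspellings that happen to appear in a few pages. The cost is that it may also "correct" rare but legitimate terms.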

> Screenshots of the CGI version of apropos running in the browser:
> 
> http://2.bp.blogspot.com/-Ql6yFBO4IsU/Tos03HPpnvI/AAAAAAAACEk/NCE7oItTdeo/s1600/foopen.png
> http://1.bp.blogspot.com/-PVBOUigp7jU/Tos09-JjkCI/AAAAAAAACEo/X8JA6l5ks34/s1600/databoose.png
> http://4.bp.blogspot.com/-1vuzW4uwmkU/Tos1CrurkQI/AAAAAAAACEs/hxi4ZnXOQ5U/s1600/dns.png
> http://1.bp.blogspot.com/-5iyCVl6zQY0/Tos1HT_bp1I/AAAAAAAACEw/NREcz0zOSEk/s1600/reeltak.png

Nice!

Dave

-- 
David Young             OJC Technologies is now Pixo
dyoung%pixotech.com@localhost     Urbana, IL   (217) 344-0444 x24

