tech-userlevel archive


Re: A spell corrector for apropos



On Wed, Oct 05, 2011 at 03:34:04AM +0530, Abhinav Upadhyay wrote:
> On Wed, Oct 5, 2011 at 2:39 AM, Matthew Mondor 
> <mm_lists%pulsar-zone.net@localhost> wrote:
> > On Tue, 4 Oct 2011 23:39:30 +0530
> > Abhinav Upadhyay <er.abhinav.upadhyay%gmail.com@localhost> wrote:
> >
> >> I have referred to Prof. Peter Norvig's article [1] on spell
> >> correctors and translated his Python implementation to C. Following
> >> are some of the results:
> >>
> >> $ ./apropos "funckiton for coping stings"
> >> Did you mean "function for copying strings" ?
> >> [...]
> >
> > Very nice.
> 
> Thanks :-)
> 
> > If I understand, it would ask the question, although non-interactively,
> > followed with results for the original string (as opposed to results for
> > the corrected string)?
> 
> No, actually the reverse. This is how the spell corrector is
> implemented. If a word exists in the dictionary then the spell
> corrector assumes that the word is correctly spelled and does not
> bother with computations. So if apropos returns search results with
> the original query then it means that all the keywords were properly
> spelled and the spell corrector would be useless. If even one of the
> keywords was misspelled, apropos would not return any results and then
> the spell checker kicks in.

It sounds like you are producing the intersection of each keyword's
matching manual pages, so that if any keyword matches no pages, then you
get no results.

I think that a more useful result (and the kind of result that most
of us are used to) would be the union of the manual page sets.  The
relevance ranking will bring the results in the intersection near the
top.
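
To make the difference concrete, here is a minimal sketch in Python.  It
is not how apropos is implemented: the INDEX table, the page names, and
both function names are made up for illustration, and "relevance" is
simply the number of query terms a page matches.

```python
from collections import Counter

# Hypothetical toy index: keyword -> set of manual pages mentioning it.
INDEX = {
    "copy": {"strcpy(3)", "memcpy(3)", "cp(1)"},
    "string": {"strcpy(3)", "strlen(3)"},
}

def search_intersection(terms, index=INDEX):
    """Pages that match every term; one unknown term empties the result."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_union(terms, index=INDEX):
    """Pages that match any term, ranked by how many terms they match."""
    hits = Counter()
    for t in terms:
        for page in index.get(t, set()):
            hits[page] += 1
    # Sort by descending hit count, then by name for a stable order.
    return sorted(hits, key=lambda p: (-hits[p], p))
```

With the query ["copy", "string", "stings"], the intersection is empty
because "stings" matches nothing, while the union still returns pages
and puts the one matching both known terms first.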

If any word is unknown or very rare, then you can expand the terms
using spelling corrections.  Presumably the terms are weighted by the
relevance function.  Say that the search is "acp wake"; then you could
expand the query like this:

{term: "acpi", weight: P(acpi|acp)}
{term: "scp", weight: P(scp|acp)}
{term: "tcp", weight: P(tcp|acp)}
{term: "wake", weight: 1}
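
A rough sketch of such an expansion, in Python, with "acp" as the
unknown term.  Everything here is an assumption for illustration: the
FREQ table and its counts are invented, candidates are limited to words
one edit away (as in Norvig's article), and P(candidate|typed) is
approximated by the candidate's share of the total frequency of all
candidates.  It only expands unknown terms, not rare ones.

```python
import string

# Hypothetical term frequencies from an index of manual pages.
FREQ = {"acpi": 30, "scp": 10, "tcp": 60, "wake": 25}

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def expand_query(terms, freq=FREQ):
    """Replace each unknown term by its known one-edit neighbours."""
    expanded = []
    for term in terms:
        if term in freq:
            # Known word: keep it as typed, with full weight.
            expanded.append({"term": term, "weight": 1.0})
            continue
        candidates = sorted(c for c in edits1(term) if c in freq)
        total = sum(freq[c] for c in candidates)
        for c in candidates:
            # Crude stand-in for P(c | term): relative frequency.
            expanded.append({"term": c, "weight": freq[c] / total})
    return expanded
```

Calling expand_query(["acp", "wake"]) yields the terms "acpi", "scp",
"tcp" and "wake" in that order; the first three share a total weight
of 1, and "wake" keeps weight 1 on its own.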

Dave

-- 
David Young             OJC Technologies is now Pixo
dyoung%pixotech.com@localhost     Urbana, IL   (217) 344-0444 x24

