tech-userlevel archive

Re: A spell corrector for apropos

On Wed, Oct 5, 2011 at 6:03 AM, David Young wrote:
> On Wed, Oct 05, 2011 at 03:34:04AM +0530, Abhinav Upadhyay wrote:
>> No, actually the reverse. This is how the spell corrector is
>> implemented. If a word exists in the dictionary then the spell
>> corrector assumes that the word is correctly spelled and does not
>> bother with computations. So if apropos returns search results with
>> the original query then it means that all the keywords were properly
>> spelled and the spell corrector would be useless. If even one of the
>> keywords was misspelled, apropos would not return any results and then
>> the spell checker kicks in.
> It sounds like you are producing the intersection of each keyword's
> matching manual pages, so that if any keyword matches no pages, then you
> get no results.

Right, it returns the intersection of the sets of pages matched by
each keyword in the query, so a page must match every keyword.

> I think that a more useful result (and the kind of result that most
> of us are used to) would be the union of the manual page sets.  The
> relevance ranking will bring the results in the intersection near the
> top.

Yes. I can change the behaviour of the search to fetch the union of
the sets of matching documents. I will have to see whether this makes
the search better or degrades it.
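
To make the difference concrete, here is a rough sketch of the two
behaviours. It is only an illustration, not the apropos code: the index
contents, the page names and the function names are all made up, and
"relevance" is reduced to counting how many keywords a page matched.

```python
from collections import Counter

# Toy inverted index: keyword -> set of page names.
# Made-up data for illustration, not real index contents.
INDEX = {
    "acpi": {"acpi(4)", "acpibat(4)", "acpiwake(8)"},
    "wake": {"acpiwake(8)", "wakeup(9)"},
}

def search_intersection(index, keywords):
    """Pages matching every keyword; empty if any keyword matches nothing."""
    sets = [index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

def search_union(index, keywords):
    """Pages matching at least one keyword, best-matching pages first."""
    hits = Counter()
    for k in keywords:
        for page in index.get(k, set()):
            hits[page] += 1
    # Rank by number of matched keywords, then by name for a stable order.
    return sorted(hits, key=lambda p: (-hits[p], p))
```

With the misspelled query ["acp", "wake"] the intersection is empty,
while the union still returns the pages matching "wake". With the
correct query, the page matching both keywords is ranked first.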

> If any word is unknown or very rare, then you can expand the terms
> using spelling corrections.  Presumably the terms are weighted by the
> relevance function.  Say that the search is "acp wake", then you could
> expand the query like this:
> {term: "acpi", weight: P(acpi|acp)}
> {term: "scp", weight: P(scp|acp)}
> {term: "tcp", weight: P(tcp|acp)}
> {term: "wake", weight: 1}

This is a cool idea, but I think it might be quite slow in some cases:
there can be more than one misspelled keyword in the query, and each
of them might have 3-4 or more suggestions at edit distance 1 (edit
distance 2 is out of the question, it can be very slow). I am
skeptical, but I guess it is worth experimenting and seeing the
results. :)
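
For the record, a minimal sketch of what such an expansion could look
like. This is not the current implementation. The dictionary, its
frequencies and the function names are hypothetical, and the weight is
simply a candidate's share of the term frequency among the candidates,
used here as a stand-in for P(term|typed).

```python
import string

# Toy term frequencies standing in for the real dictionary (made-up numbers).
TERM_FREQ = {"acpi": 40, "scp": 25, "tcp": 120, "wake": 15}

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings at edit distance 1 (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def expand_query(keywords, term_freq):
    """Return (term, weight) pairs; known words keep weight 1.0."""
    expanded = []
    for word in keywords:
        if word in term_freq:
            expanded.append((word, 1.0))
            continue
        # Keep only the edits that are real dictionary words.
        candidates = sorted(c for c in edits1(word) if c in term_freq)
        total = sum(term_freq[c] for c in candidates)
        for c in candidates:
            # Assumed weighting: share of frequency among the candidates.
            expanded.append((c, term_freq[c] / total))
    return expanded
```

For a word of length n, edits1 generates 54n + 25 strings before
duplicates are removed, each needing a dictionary lookup. Edit distance
2 means applying edits1 again to every one of those strings, which is
where the cost blows up.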
