tech-userlevel archive


Re: A spell corrector for apropos



On Wed, Oct 5, 2011 at 1:06 AM, David Young <dyoung%pobox.com@localhost> wrote:
>> I have referred to Prof. Peter Norvig's article [1] on spell
>> correctors and translated his Python implementation to C. Following
>> are some of the results:
>>
>> $ ./apropos "funckiton for coping stings"
>> Did you mean "function for copying strings" ?
>
> I think that it's good to print the correction.  Since 9 times out of 10
> the user is going to immediately run apropos(1) on the correction, you
> should save them the trouble and run the corrected search, too. :-)
Yes, I suppose that makes sense. The only trouble is that the spelling
suggestions are sometimes inaccurate, simply because of the training
model used. If there are two or more matching words at edit distance 1,
it picks the word with the most occurrences in the corpus. In some rare
cases, though, the user really meant the less frequent keyword. For
instance:
$ ./apropos acp
Did you mean tcp ?

I was looking for acpi as the suggestion. But I guess this works most
of the time, and at the end apropos can warn that it suspected a
spelling error and searched for what it thought was the correct query
instead.
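
To make the selection rule above concrete, here is a minimal sketch in
Python, following the style of Norvig's article rather than the actual C
code. The word counts are invented for illustration, and the names
COUNTS, edits1 and suggest are assumptions, not identifiers from the
implementation.

```python
import string

# Hypothetical word counts, invented for illustration. A real table
# would be built from the words found in the installed man pages.
COUNTS = {"tcp": 120, "acpi": 40, "function": 300}


def edits1(word):
    """Return every string at edit distance 1 from word (after Norvig)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, counts=COUNTS):
    """Pick the most frequent known word at edit distance 1, or None."""
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return None
    return max(candidates, key=counts.get)


# Both "tcp" (one replacement) and "acpi" (one insertion) are at edit
# distance 1 from "acp"; the more frequent one wins.
print(suggest("acp"))  # prints: tcp
```

With these made-up counts, "acpi" is a valid candidate but loses to "tcp"
purely on frequency, which is the situation described above.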

There is of course lots of room for improving this behaviour :-)

> Do you correct words that appear in a manual page with negligible
> probability (< .001% ??) or only words that are nowhere in the corpus?

The latter. If there is a word in the dictionary with even a single
occurrence, it is counted as a viable candidate and returned to the
user, provided no other word is found at the same edit distance with a
larger number of occurrences (and thus greater probability).

Do you think correcting words with negligible probability would be a
good thing to do? I think Google exhibits this behaviour: it shows
results for the corrected spelling and asks the user whether (s)he
really meant to search for the keywords as originally spelled.
I believe this might improve the behaviour of the spell corrector in
the general case, but in the rare cases where the user really wanted to
search for a less used keyword (like the driver names in section 4
man pages) and apropos instead does this magic, it might be annoying.
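
The two policies discussed here can be compared with a small sketch. This
is only an illustration under assumptions: the function should_correct,
its min_share parameter and the counts below are hypothetical and do not
come from the actual implementation. A min_share of 0 models the current
behaviour (only unknown words are corrected), while a positive value
models the proposed one (rare known words are corrected too).

```python
def should_correct(word, counts, min_share=0.0):
    """Decide whether a query word should be treated as a misspelling.

    With min_share=0.0, only words absent from the corpus are flagged.
    With a positive min_share, known words whose share of all
    occurrences falls below that value are flagged as well.
    """
    count = counts.get(word, 0)
    if count == 0:
        return True  # nowhere in the corpus: always a candidate
    total = sum(counts.values())
    return count / total < min_share


# Hypothetical counts: a common keyword and a rarely mentioned driver name.
counts = {"tcp": 999_999, "wm": 1}

print(should_correct("wm", counts))                  # current policy
print(should_correct("wm", counts, min_share=1e-5))  # 0.001% threshold
```

Under the second policy a search for the rare driver name would be
rewritten, which is exactly the annoying case mentioned above.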


>> Screenshots of the CGI version of apropos running in the browser:
>>
>> http://2.bp.blogspot.com/-Ql6yFBO4IsU/Tos03HPpnvI/AAAAAAAACEk/NCE7oItTdeo/s1600/foopen.png
>> http://1.bp.blogspot.com/-PVBOUigp7jU/Tos09-JjkCI/AAAAAAAACEo/X8JA6l5ks34/s1600/databoose.png
>> http://4.bp.blogspot.com/-1vuzW4uwmkU/Tos1CrurkQI/AAAAAAAACEs/hxi4ZnXOQ5U/s1600/dns.png
>> http://1.bp.blogspot.com/-5iyCVl6zQY0/Tos1HT_bp1I/AAAAAAAACEw/NREcz0zOSEk/s1600/reeltak.png
>
> Nice!
Thanks :)

--
Abhinav

