Current-Users archive


Re: Potentially undesirable behavior with apropos(1)



Hi Thilo,

On Fri, Jul 8, 2016 at 3:48 PM, Thilo Jeremias <thilo%nispuk.com@localhost> wrote:
> Hi Abhinav,
>
>> On 8 Jul 2016, at 8:08 AM, Abhinav Upadhyay <er.abhinav.upadhyay%gmail.com@localhost> wrote:
>>
>> Hi Paul,
>>
>>> On Fri, Jul 8, 2016 at 7:24 AM, Paul Goyette <paul%whooppee.com@localhost> wrote:
>>> With a reasonably current 7.99.33 (less than a week old), I noticed that
>>> when I request
>>>
>>>        apropos kms
>>>
>>> (expecting to find man pages referencing "xxxdrmkms"), it seems to find a
>>> lot of entries for "km".  Is this intended?  None of the found entries has
>>> "kms", only "km".
>>>
>>> I really wasn't expecting to find anything about kilometers, or meta-keys,
>>> or Khmer (Cambodian language?)!
>>
>> This is one of the shortcomings of apropos(1) right now. While
>> indexing the man pages, the tokenizer stems the words being
>> indexed. Stemming essentially tries to reduce words to their root
>> forms, for example
>> running -> run
>> eating -> eat
>> eats -> eat
>> listened -> listen
> Is there a way to disable the stemming (preferably via config or the environment)?

No, and I don't think it's really a good idea to disable it. I tried
that while working on apropos(1) and the results were ugly :)

Stemming gets rid of a lot of noise in the results, so it is a really
good thing to have. Consider, for example, running the query
"list directories" and expecting to see "ls" in the output. Without
stemming, you would never see "ls" in the results, because its NAME
section says "list directory" and not "list directories".
That is just one example, but you get the idea. We just need to
handle the special cases where we don't want to stem :)
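
To make this concrete, here is a rough Python sketch. It is only an
illustration: toy_stem() is a made-up suffix stripper, not the stemmer
apropos(1) actually uses, and NO_STEM is a hypothetical exception list
showing one way the special cases could be handled.

```python
# Toy illustration only: NOT the stemmer or the matching logic of apropos(1).

NO_STEM = frozenset({"kms"})  # hypothetical list of words to leave unstemmed


def toy_stem(word, exceptions=frozenset()):
    """Strip a few common English suffixes (very crude)."""
    word = word.lower()
    if word in exceptions:
        return word
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # directories -> directory
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # undo consonant doubling: running -> runn -> run
        if len(stem) > 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]                # listened -> listen
    if word.endswith("s") and not word.endswith("ss") and len(word) > 2:
        return word[:-1]                # eats -> eat, but also kms -> km
    return word


def terms(text, stem=True, exceptions=frozenset()):
    """Split text on whitespace and optionally stem every token."""
    words = text.lower().split()
    if stem:
        return {toy_stem(w, exceptions) for w in words}
    return set(words)


def matches(query, description, stem=True, exceptions=frozenset()):
    """True if every query term occurs in the description."""
    return terms(query, stem, exceptions) <= terms(description, stem, exceptions)


name_line = "ls - list directory"

print(matches("list directories", name_line, stem=True))   # query matches
print(matches("list directories", name_line, stem=False))  # query misses
print(toy_stem("kms"), toy_stem("kms", NO_STEM))           # the "kms" problem
```

With stemming, "directories" and "directory" reduce to the same term,
so the query finds "ls". The same rule that helps there also turns
"kms" into "km", which is the behaviour Paul ran into; the exception
list is one possible way to keep such words intact.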

-
Abhinav

