tech-userlevel archive


Re: Suggestion: add a "no-stemming" option to apropos(1)?



On Thu, Jun 8, 2017 at 10:00 PM, Abhinav Upadhyay
<er.abhinav.upadhyay%gmail.com@localhost> wrote:
> On Thu, Jun 8, 2017 at 10:21 AM, Paul Goyette <paul%whooppee.com@localhost> wrote:
>> Don't get me wrong, I love apropos(1).  But...
>
> Thank you :)
>
>> I'm continually bitten by the "stemming" that occurs.  Today's example
>> is an attempt to find all the man pages that refer to file system lfs.
>> Using "apropos lfs" returns more than 120 entries, complete with their
>> associated context!  The vast majority of those entries are really for
>> "lf" (in either upper- or lower-case), and have no relation to lfs the
>> file system!
>>
>> Would it be unreasonable to add a no-stem option to apropos(1)?
>
> As Joerg said, as an option it would be impractical. The better
> approach would be to use a custom tokenizer which uses a blacklist of
> words that should not be stemmed.
>
> I was tinkering with it a few months back and even had it working as
> a proof of concept, but now it seems I can't find the copy of the work
> I did. I will try to redo it this weekend. Thanks for bringing it up :-)
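
For illustration only, the idea quoted above (a tokenizer that skips
stemming for words on a fixed list) can be sketched roughly as follows.
This is not the code from the commit referenced below: the NO_STEM
entries, the toy_stem() helper and the tokenize() function are all
hypothetical names, and the one-line "stemmer" is only a crude stand-in
for a real stemming algorithm.

```python
# Hypothetical sketch of a tokenizer with a "do not stem" list.
# Nothing here is taken from the actual apropos(1) sources.

# Assumed example entries: words that must be indexed verbatim.
NO_STEM = {"lfs", "nfs", "ffs"}


def toy_stem(word):
    """Crude stand-in for a real stemmer: drop a single trailing 's'."""
    if len(word) > 2 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(text):
    """Lower-case and split text, stemming every word not in NO_STEM."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")  # trim surrounding punctuation
        if not word:
            continue
        # Listed words bypass the stemmer, so "lfs" is never reduced to "lf".
        tokens.append(word if word in NO_STEM else toy_stem(word))
    return tokens
```

With a list like this, a query for "lfs" is compared against the token
"lfs" rather than "lf", while ordinary words are still stemmed as before.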

I have just committed this:
http://mail-index.netbsd.org/source-changes/2017/06/18/msg085477.html

Could you give it a go and let me know how it is now? :)

 (you will have to rebuild the database with `makemandb -f').

-
Abhinav

