tech-userlevel archive


Re: List of Keywords for apropos(1) Which Should Not be Stemmed



On Tue, Jul 12, 2016 at 6:24 AM, David Young <dyoung%pobox.com@localhost> wrote:
> On Mon, Jul 11, 2016 at 06:59:25PM +0530, Abhinav Upadhyay wrote:
>> But the downside is that technical keywords (e.g. kms, lfs, ffs), are
>> also stemmed down and stored (e.g. km, lf, ff) in the index. So if you
>> search for kms, you will see results for both kms and km.
>
> Interesting problem.
>
> I expect the set of documents that contain a word ("directories") and
> the set of documents containing its true stem ("directory") to overlap
> widely.  I also expect the set of documents that contain a word ("kms")
> and an incorrect stem ("km") to scarcely overlap.  Do the manual pages
> meet these expectations?  If so, then maybe you can decide whether or not
> to keep a stem by looking at the document-set overlap?

Yes, when the stem is incorrect, the overlap is usually small.
But the only way to find such cases is to compare the output of
apropos by hand, unless we have a pre-built list of expected
document sets to compare against. :)
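
For illustration only, the overlap check suggested above could be
sketched roughly as follows. This is not code from apropos(1): the
Jaccard ratio as the overlap measure, the 0.2 threshold, the function
names and the toy index are all assumptions made for the example, and
it presumes an unstemmed word-to-pages index is available to compare
against.

```python
# Sketch: decide whether to keep a stem by comparing document sets.
# The index contents, names and threshold below are hypothetical.

def overlap(docs_word, docs_stem):
    """Jaccard overlap of two document sets; 0.0 when both are empty."""
    union = docs_word | docs_stem
    if not union:
        return 0.0
    return len(docs_word & docs_stem) / len(union)


def keep_stem(index, word, stem, threshold=0.2):
    """Return True if pages containing `word` and `stem` overlap enough."""
    docs_word = index.get(word, set())
    docs_stem = index.get(stem, set())
    return overlap(docs_word, docs_stem) >= threshold


# Toy unstemmed index: word -> set of manual pages containing it.
index = {
    "directories": {"ls.1", "mkdir.1", "find.1", "rm.1"},
    "directory": {"ls.1", "mkdir.1", "find.1", "chdir.2"},
    "kms": {"drm.4", "radeon.4"},
    "km": {"units.1"},
}

print(keep_stem(index, "directories", "directory"))
print(keep_stem(index, "kms", "km"))
```

With this toy data the first pair shares most of its pages and the
second shares none, so only the first stem would be kept. A real
threshold would have to be tuned against the actual manual pages.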

-
Abhinav

