NetBSD-Bugs archive


Re: bin/46255: apropos(1) sometimes report unrelated responses



On Sun, Mar 25, 2012 at 8:00 AM,  <njoly%pasteur.fr@localhost> wrote:
>>Number:         46255
>>Category:       bin
>>Synopsis:       apropos(1) sometimes report unrelated results
>>Confidential:   no
>>Severity:       non-critical
>>Priority:       medium
>>Responsible:    bin-bug-people
>>State:          open
>>Class:          sw-bug
>>Submitter-Id:   net
>>Arrival-Date:   Sat Mar 24 23:00:00 +0000 2012
>>Originator:     Nicolas Joly
>>Release:        NetBSD 6.99.4
>>Organization:
> Institut Pasteur
>>Environment:
> System: NetBSD lanfeust.sis.pasteur.fr 6.99.4 NetBSD 6.99.4 (LANFEUST) #5: 
> Sat Mar 24 14:34:56 CET 2012 
> njoly%lanfeust.sis.pasteur.fr@localhost:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/LANFEUST
>  amd64
> Architecture: x86_64
> Machine: amd64
>>Description:
> Sometimes, apropos(1) returns unrelated results. For example, the `apropos lfs'
> command reports correct entries that include the searched word, but also some
> unrelated ones that match the word "LF".
>
> newfs_lfs(8)    construct a new LFS file system
> rump_lfs(8)     mount a lfs image with a userspace server
> scan_ffs(8)     find FFSv1/FFSv2/LFS partitions on a disk or file
> lfs_segclean(2) mark a segment clean
> mvme68k/lpt(4)  parallel port driver
> lfs_segwait(2)  wait until a segment is written
> x86/lpt(4)      Parallel port driver
> installboot(8)  install disk bootstrap software
> PCRE(3) - Perl-compatible regular expressions
> PCRE(3) - Perl-compatible regular expressions
>
> For the 10 results reported, 6 are correct and 4 are wrong (2 lpt and 2 PCRE).
>
> Things are worse for `apropos crs', which only reports pages containing the
> word "cr"; not a single "crs" result is found.
>
> njoly@lanfeust [~]> apropos -n 1000 crs | head
> mvme68k/lpt(4)  parallel port driver
> ...the driver. Minor Bit Function 128 Use the interruptless driver. (polling) 
> 64 Do not initialize the device on the port. 32 Automatic LF on CR. 16 Select 
> 1.6uS strobe pulse width (default is 6.4uS) pcc(4) , pcctwo(4)
> [...]
> njoly@lanfeust [~]> apropos -n 1000 crs | grep -ic crs

This is caused by the stemmer. It strips the suffix 's' from the end
of every token in an attempt to reduce tokens to their root word, so
the query "lfs" ends up being matched as "lf", and "crs" as "cr". That
is of course wrong for technical terms and abbreviations such as lfs.
I think the fix would require writing a custom tokenizer for SQLite's
FTS engine that does not stem such technical keywords, but that would
be a bit of an undertaking :)
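
To make the effect concrete, here is a rough sketch in Python. It is
not the code apropos(1) or SQLite actually runs: it only imitates the
plural-suffix step of a Porter-style stemmer, and the names
strip_plural_suffix, stems and matches are made up for illustration.
The sample strings are taken from the output quoted above.

```python
import re


def strip_plural_suffix(token: str) -> str:
    """Imitate only the plural-suffix step of a Porter-style stemmer."""
    if token.endswith("sses"):
        return token[:-2]  # "classes" -> "class"
    if token.endswith("ies"):
        return token[:-2]  # "ponies" -> "poni"
    if token.endswith("ss"):
        return token       # "class" is left alone
    if token.endswith("s"):
        return token[:-1]  # "lfs" -> "lf", "crs" -> "cr"
    return token


def stems(text: str) -> set:
    """Lower-case the text, split it into tokens and stem each one."""
    return {strip_plural_suffix(t) for t in re.findall(r"[a-z0-9]+", text.lower())}


def matches(query: str, text: str) -> bool:
    """A document matches when every stemmed query token occurs in it."""
    return stems(query) <= stems(text)


lpt_excerpt = "32 Automatic LF on CR."
print(strip_plural_suffix("lfs"))                          # lf
print(matches("lfs", "construct a new LFS file system"))   # True (wanted)
print(matches("lfs", lpt_excerpt))                         # True (unwanted)
print(matches("crs", lpt_excerpt))                         # True (unwanted)
```

Once "lfs" has been reduced to "lf", the lpt(4) text containing
"LF on CR" is indistinguishable from a real hit. A tokenizer that
skips stemming for such keywords would keep the two apart.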

On the other hand, since the new apropos(1) supports full text search,
I think you will get better mileage out of it by specifying more
detailed queries. It is hard to get 100% relevant results, but I hope
to improve it.

--
Abhinav

