tech-userlevel archive


Re: Support for boolean queries in apropos



On Mon, Mar 12, 2012 at 10:41:29PM +0530, Abhinav Upadhyay wrote:
> On Mon, Mar 12, 2012 at 2:18 AM, David Young <dyoung%pobox.com@localhost> 
> wrote:
> > I think that general boolean queries are known now to be less powerful
> > and useful than everyone supposed that they were 20 or 30 years ago, so
> > search engines either omit support for boolean operators or else do not
> > encourage the use of such operators.  Instead, usually you can specify
> > terms that MUST appear in the results, and terms that MUST NOT appear,
> > using either a simplified set of operators (+ and -) or an "Advanced"
> > search form that has fields for the MUST/MUST-NOT terms.  Is there any
> > reason to believe that boolean operators are more suitable for apropos
> > than for other search engines?
> 
> I agree with what Mouse said. I have not done much research on why Web
> search engines do not encourage the use of Boolean operators, but I
> don't think they really need them when their results are so accurate.
> With apropos, where search is still in its infancy, they are useful.
> As a concrete example from my personal experience, a few packages
> like Git, OpenSSL, and Perl sometimes come with so many man pages that
> they can literally start polluting the search results. If the user
> finds that the results are being unnecessarily cluttered by man pages
> from these packages, a Boolean query that excludes those pages from
> the search results would be handy. In the example I posted in my
> original email, a simple query "add new user" got cluttered by a few
> results from git and OpenSSH that I wasn't really concerned with, so
> a Boolean NOT operator for "git" and "ssh" eliminated the man pages
> from those packages.
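A minimal sketch of how a Boolean query with NOT could be evaluated, assuming a toy in-memory corpus rather than apropos's actual SQLite FTS index. The page names, their one-line descriptions, and the `Term`/`And`/`Or`/`Not`/`matches`/`search` names are hypothetical, invented only to illustrate the "add new user" NOT git NOT ssh case.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class Not:
    operand: "Query"


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Term, Not, And, Or]


def matches(query: Query, words: set) -> bool:
    """Recursively evaluate a Boolean query against a page's set of words."""
    if isinstance(query, Term):
        return query.word in words
    if isinstance(query, Not):
        return not matches(query.operand, words)
    if isinstance(query, And):
        return matches(query.left, words) and matches(query.right, words)
    if isinstance(query, Or):
        return matches(query.left, words) or matches(query.right, words)
    raise TypeError(f"unknown query node: {query!r}")


def search(query: Query, pages: dict) -> list:
    """Return the sorted names of pages whose description satisfies the query."""
    return sorted(
        name for name, text in pages.items()
        if matches(query, set(text.lower().split()))
    )


# Hypothetical one-line descriptions, made up for this example.
PAGES = {
    "useradd(8)": "add a new user to the system",
    "git-add(1)": "git add new file contents for the user",
    "ssh-add(1)": "ssh add new private key identities for the user",
}

# add AND new AND user
words_query = And(And(Term("add"), Term("new")), Term("user"))
# ... AND NOT git AND NOT ssh
query = And(words_query, And(Not(Term("git")), Not(Term("ssh"))))

print(search(query, PAGES))        # ['useradd(8)']
print(search(words_query, PAGES))  # ['git-add(1)', 'ssh-add(1)', 'useradd(8)']
```

Without the two NOT clauses, all three toy pages match, which is the kind of clutter described above.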

Is a boolean query really as handy as the minus operator?  It isn't as
succinct.

> Apart from that, having these capabilities available to the user
> wouldn't hurt.

You may have thought that no one could disagree with that, but I
do. :-) Any software capability comes with a cost to maintain and
support it.  If boolean queries are available, users may spend a lot of
time writing them when a query written with +/- operators would be more
succinct (thus faster to type) and accurate (that is, resembling their
intentions better).
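For comparison, a minimal sketch of the +/- style, again assuming a toy in-memory corpus rather than apropos's real index. Here `+word` must appear, `-word` must not appear, and bare words only count toward a simple match score; that scoring rule, the function names, and the page descriptions are all hypothetical. The exclusion from Abhinav's example becomes `add new user -git -ssh`, with no AND/NOT keywords to spell out.

```python
from typing import NamedTuple


class ParsedQuery(NamedTuple):
    must: frozenset      # terms prefixed with "+"
    must_not: frozenset  # terms prefixed with "-"
    should: frozenset    # bare terms: optional, used only for scoring


def parse(query: str) -> ParsedQuery:
    """Split a +/- query into required, forbidden, and optional terms."""
    must, must_not, should = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            must.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            must_not.add(token[1:])
        else:
            should.add(token)
    return ParsedQuery(frozenset(must), frozenset(must_not), frozenset(should))


def search(query: str, pages: dict) -> list:
    """Filter pages by MUST/MUST-NOT terms, then rank by optional-term hits."""
    q = parse(query)
    scored = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if not q.must <= words:    # a required term is missing
            continue
        if q.must_not & words:     # a forbidden term is present
            continue
        score = len(q.should & words)
        if q.must or score > 0:    # keep pages with at least some match
            scored.append((-score, name))
    # Highest score first; ties broken alphabetically by page name.
    return [name for _, name in sorted(scored)]


# Hypothetical one-line descriptions, made up for this example.
PAGES = {
    "useradd(8)": "add a new user to the system",
    "git-add(1)": "git add new file contents for the user",
    "ssh-add(1)": "ssh add new private key identities for the user",
}

print(search("add new user -git -ssh", PAGES))  # ['useradd(8)']
print(search("+git add", PAGES))                # ['git-add(1)']
```

Treating bare terms as optional ranking hints, rather than implicit ANDs, is one common design choice; an implementation could just as reasonably require them.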

> What I meant to say in that thread was that if term weights for all
> the terms in the corpus are pre-computed and stored in a database
> table, then a more sophisticated ranking scheme or algorithm can be
> used while ranking the search results. But storing the pre-computed
> weights in the database requires extra storage, which I don't think
> many people would welcome, so this approach is currently avoided. It
> would be more prudent to get support for storing term weights in the
> FTS index implemented in SQLite itself, as I think that would save a
> lot of disk space by avoiding duplication of data. (I am not sure
> whether that is possible or would be welcomed by the SQLite
> developers, though; I haven't talked to them.)
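To make the storage trade-off concrete, a minimal sketch of pre-computing one weight per (term, page) pair, using the common tf-idf variant tf × log(N/df). This weighting is an assumption for illustration, not necessarily what apropos uses, and the bytes-per-row figure (a 4-byte term id, a 4-byte page id, and an 8-byte double) is likewise assumed and ignores SQLite's record, page, and index overhead. What it does show is that such a table grows with the number of distinct (term, page) postings in the corpus, not just with the number of pages.

```python
import math
from collections import Counter

# Assumed row layout: 4-byte term id + 4-byte page id + 8-byte double weight.
# Real SQLite storage adds record headers, page slack, and index overhead.
BYTES_PER_ROW = 4 + 4 + 8


def tfidf_weights(pages: dict) -> dict:
    """Pre-compute a tf-idf weight for every (term, page) posting."""
    tokenized = {name: text.lower().split() for name, text in pages.items()}
    n_pages = len(tokenized)
    # Document frequency: how many pages contain each term at least once.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    weights = {}
    for name, words in tokenized.items():
        for term, count in Counter(words).items():
            tf = count / len(words)               # normalized term frequency
            idf = math.log(n_pages / df[term])    # inverse document frequency
            weights[(term, name)] = tf * idf
    return weights


def estimated_bytes(weights: dict, bytes_per_row: int = BYTES_PER_ROW) -> int:
    """Rough size of a weights table: one row per stored posting."""
    return len(weights) * bytes_per_row


# Hypothetical toy corpus.
pages = {
    "a": "add user",
    "b": "add key",
    "c": "remove user user",
}
weights = tfidf_weights(pages)
print(len(weights))              # 6 postings
print(estimated_bytes(weights))  # 96
```

Sizing this for a real man page collection would need the actual number of distinct (term, page) pairs in the corpus.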

Just how much extra storage are we talking about?  10 MB? 100 MB? 1 GB?

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981

