tech-userlevel archive


Re: GSoC Project Progress Update: Apropos Replacement

On Thu, Jun 23, 2011 at 3:35 AM, Jukka Ruohonen <> wrote:
> On Thu, Jun 23, 2011 at 02:08:59AM +0530, Abhinav Upadhyay wrote:

>> 2. A ranking function: A ranking function is very necessary, so that
>> Sqlite ranks and gives back the most useful results at the top. If you
>> try the apropos in the master branch and the one in search branch, you
>> will notice drastic difference in the quality of search results. But
>> even after this lots of effort is required to improve it.
> Besides the usual frequency, possible ranking scores (or "static weights")
> could involve the earlier mentioned .Nm and .Nd. Say, if a word "string"
> appears already in the title, it may be a better result than several
> appearances of the word "string" in the body of the text.

Yes. The current ranking function already gives a static weight
to each column:
name column      --> 1.50
name_desc column --> 1.25
desc column      --> 0.75
So after calculating the term frequency in a column, we multiply it by
the static weight of that column.
Beyond this, calculating the Inverse Document Frequency and using it
as an additional factor in the ranking should improve the results.
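
To make the idea concrete, here is a rough sketch in Python of how the
column weights and an IDF factor could combine into one score. It is only
an illustration and not the code in the search branch: the whitespace
tokenisation, the log(N / n_t) form of IDF, the function names and the
sample pages are all assumptions made up for this example.

```python
import math

# Static column weights quoted in the message above.
COLUMN_WEIGHTS = {"name": 1.50, "name_desc": 1.25, "desc": 0.75}

# Hypothetical sample pages, invented purely for illustration.
SAMPLE_DOCS = [
    {"name": "strcpy", "name_desc": "copy a string",
     "desc": "the strcpy function copies the string src to dst"},
    {"name": "string", "name_desc": "string specific functions",
     "desc": "the string functions manipulate strings"},
    {"name": "ls", "name_desc": "list directory contents",
     "desc": "for each operand ls displays its name"},
]


def tokens(doc, column):
    """Naive whitespace tokeniser (an assumption, not the real one)."""
    return doc.get(column, "").lower().split()


def weighted_tf(doc, term):
    """Per-column term counts, each multiplied by its column weight."""
    return sum(weight * tokens(doc, column).count(term)
               for column, weight in COLUMN_WEIGHTS.items())


def idf(docs, term):
    """log(N / n_t), where n_t is the number of pages containing the term."""
    n_t = sum(1 for doc in docs
              if any(term in tokens(doc, c) for c in COLUMN_WEIGHTS))
    return math.log(len(docs) / n_t) if n_t else 0.0


def rank(docs, query):
    """Return (score, name) pairs, best match first."""
    terms = query.lower().split()
    scored = [(sum(weighted_tf(doc, t) * idf(docs, t) for t in terms),
               doc["name"])
              for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, name in rank(SAMPLE_DOCS, "string"):
        print(f"{name}: {score:.3f}")
```

With these sample pages, a page that has "string" in its name column
ranks above one that mentions it only in name_desc and desc, which is the
effect the static weights are meant to have.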

>> - What are the most important things you would look for when
>> performing a search across man pages ?
> It may be difficult to say because we have never had a reasonable search
> utility for man pages ;-). But I think the examples you noted were pretty
> much spot on; from "how to add user" and "package installation" to "kernel
> memory" or "vnode locking".

Yes, although those were the queries that produced the best results.
It seems to me that the more elaborate the user is in specifying the
query, the better the results should be (the user still has to mention
the right keywords, of course). A single-keyword query might lead
nowhere. But then, we are at a very early stage of the project.

>> - Do you like the current results?
> Yes, the results were very reasonable already.

Thanks for liking it and taking time to provide feedback. I appreciate it. :-)

