tech-userlevel archive


Re: GSoC Project Progress Update: Apropos Replacement



On Thu, Jun 23, 2011 at 4:28 AM, Joerg Sonnenberger
<joerg%britannica.bec.de@localhost> wrote:
> On Thu, Jun 23, 2011 at 02:08:59AM +0530, Abhinav Upadhyay wrote:
>> 1. A Stopword filter: If we are doing full text search, then we also
>> expect users to enter natural-language queries made of ordinary English
>> words, so we need to filter the stopwords out of the user query in
>> order to get only those results which match the actual keywords in the
>> user query and not the stopwords.
>
> Be careful here. At least .Nm should *not* get filtered. Consider
> "apropos who"...
>
At the moment I have a static list of stopwords, but yes, I had not
considered cases like "apropos who" (although 'who' is not on the
list).

I have a pretty basic (and perhaps lame) approach in mind for this:

We first eliminate only the very obvious stopwords from the query (like
a, an, and, are, about, also, etc.).
After this, we run the query only against the name and name_desc columns.
Then we filter any remaining stopwords out of the query and search the
desc column.
In the end we take the union of all the results and rank them.
This does make things somewhat more complicated, though.
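The two-pass idea above could be sketched roughly like this. It is a minimal sketch under several assumptions, not the actual implementation: the in-memory `pages` records stand in for database rows with name, name_desc and desc columns; the stopword sets `OBVIOUS_STOPWORDS` and `FULL_STOPWORDS` (including putting 'who' in the larger list) are hypothetical; and the term-count scoring with double weight for name/name_desc hits is an assumed placeholder for a real ranking function.

```python
import re

# Hypothetical stopword lists: a, an, and, are, about, also come from the
# post; the extra entries ("the", "of", "to", ...) are assumptions.
OBVIOUS_STOPWORDS = {"a", "an", "and", "are", "about", "also", "the", "of", "to"}
# Larger list applied only before searching desc (entries are assumptions).
FULL_STOPWORDS = OBVIOUS_STOPWORDS | {"who", "what", "how", "which", "is", "for", "in"}


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def terms(query, stopwords):
    """Return the query words that are not in the given stopword set."""
    return [w for w in tokens(query) if w not in stopwords]


def matches(field, query_terms):
    """Count how many query terms occur as whole words in a field."""
    words = set(tokens(field))
    return sum(1 for t in query_terms if t in words)


def search(pages, query):
    scores = {}
    # Pass 1: drop only the obvious stopwords, search name and name_desc.
    q1 = terms(query, OBVIOUS_STOPWORDS)
    for p in pages:
        hits = matches(p["name"], q1) + matches(p["name_desc"], q1)
        if hits:
            # Assumed weighting: a hit in name/name_desc counts double.
            scores[p["name"]] = scores.get(p["name"], 0) + 2 * hits
    # Pass 2: drop the remaining stopwords too, search desc only.
    q2 = terms(query, FULL_STOPWORDS)
    for p in pages:
        hits = matches(p["desc"], q2)
        if hits:
            scores[p["name"]] = scores.get(p["name"], 0) + hits
    # Union of both passes (shared dict keys), ranked by score, then name.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Made-up sample records standing in for rows of the database.
pages = [
    {"name": "who", "name_desc": "display who is on the system",
     "desc": "The who utility displays a list of all users currently logged on."},
    {"name": "w", "name_desc": "who present and what they are doing",
     "desc": "The w utility prints a summary of the current activity on the system."},
    {"name": "ls", "name_desc": "list directory contents",
     "desc": "For each operand that names a file of a type other than directory, ls displays its name."},
]

print(search(pages, "who"))  # -> ['who', 'w']
print(search(pages, "list the directory contents"))  # -> ['ls', 'who']
```

With "apropos who", the second pass sees an empty query because 'who' is in the larger (assumed) list, but the first pass has already matched who and w on their name and name_desc columns, so the name is not lost. The union falls out of both passes adding to the same score table.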

