tech-userlevel archive


Fwd: GSoC Project: Replacement for Apropos



Hi,
My name is Abhinav Upadhyay, and I am a fourth-year Bachelor of
Technology student from India.

I was going through the list of available ideas on the NetBSD GSoC
page. The idea of enhancing (or replacing) the apropos command so
that it returns more useful and relevant results is an excellent
one. When I started learning *nix systems, I found the apropos
command somewhat useful, provided I searched with the right
*keywords*. Enhancing its search capabilities would be really
useful, and it might well become one of the commands users reach
for most often.

Regarding the implementation: I think parsing the man pages and
building an inverted index would be a good way to start. I have
experience working on a search engine, which I developed as one of
my undergraduate projects. I used the Apache Lucene library, which
provides many kinds of search optimization, including keyword weights.
If we can afford to use a high-level language like Python, this
library could be used, but it is still a very heavy library because of
its runtime requirements.
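
As a rough illustration of what I mean by an inverted index, here is a
minimal Python sketch. The sample pages, the tokenizer and the function
names are all made up for the example; a real indexer would parse the
actual man page sources and would need weighting on top of this.

```python
import re
from collections import defaultdict

# Hypothetical sample data: page name -> one-line description.
SAMPLE_PAGES = {
    "cp(1)": "copy files",
    "mv(1)": "move files",
    "ls(1)": "list directory contents",
}


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_inverted_index(pages):
    """Map every term to the set of pages in which it occurs."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the pages that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_inverted_index(SAMPLE_PAGES)
    print(sorted(search(idx, "files")))
```

The lookup is a plain set intersection, so it only answers "which pages
contain all of these words"; ordering the results by relevance is a
separate step.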

However, SQLite's full-text search support (FTS3/FTS4) looks like a
much more portable solution because of the small size of the SQLite
tools and library, and it offers a high level of query and
performance optimization as well. I think it would be interesting to
experiment with it, to see if we can add document comparison to it,
and to add some variant of the PageRank algorithm so that the most
relevant results come first. Another add-on would be a GUI browser
for searching the man pages.
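
To give an idea of how little code the SQLite route needs, here is a
small sketch using Python's sqlite3 module with an FTS4 virtual table.
It assumes the SQLite library behind the module was built with
FTS3/FTS4 enabled, which is not guaranteed everywhere. The table name,
column names and sample rows are hypothetical, and no ranking is done;
results are simply sorted by name.

```python
import sqlite3

# Hypothetical sample rows: (page name, one-line description).
SAMPLE_ROWS = [
    ("cp(1)", "copy files"),
    ("mv(1)", "move files"),
    ("ls(1)", "list directory contents"),
]


def build_index(rows):
    """Create an in-memory FTS4 table and load the given rows into it."""
    conn = sqlite3.connect(":memory:")
    # Assumption: this SQLite build has the FTS4 module compiled in.
    conn.execute("CREATE VIRTUAL TABLE manpages USING fts4(name, description)")
    conn.executemany(
        "INSERT INTO manpages (name, description) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def apropos(conn, query):
    """Return (name, description) rows matching the full-text query."""
    cursor = conn.execute(
        "SELECT name, description FROM manpages "
        "WHERE manpages MATCH ? ORDER BY name",
        (query,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_index(SAMPLE_ROWS)
    for name, description in apropos(db, "files"):
        print(name, "-", description)
```

A real tool would keep the database on disk rather than in memory and
would replace the ORDER BY name with some relevance ordering.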

I have a couple of questions, though:
1.) Which language would you prefer to use? I think Python would be a
good choice.
2.) If the user installs new system utilities or tools, and new man
pages are added for them, how would we add those documents to the
index? I think the indexer could be invoked automatically after the
new man pages are installed; it would run in the background and
index the new documents.
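
One possible way to handle question 2, sketched in Python: remember a
content hash for every page already indexed, and on each run index
only the pages whose hash is new or has changed. Everything here is an
assumption for illustration; the function names are made up, the pages
are passed in as a plain mapping instead of being read from disk, and
index_page stands in for whatever the real indexer would do.

```python
import hashlib


def digest(content):
    """Return a hex SHA-256 digest of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def pages_needing_index(current, indexed):
    """List the paths that are new or whose content has changed.

    current: mapping of path -> page content (bytes) found on the system.
    indexed: mapping of path -> digest recorded at the last indexing run.
    """
    return sorted(
        path
        for path, content in current.items()
        if indexed.get(path) != digest(content)
    )


def reindex(current, indexed, index_page):
    """Index only the stale pages and record their new digests."""
    stale = pages_needing_index(current, indexed)
    for path in stale:
        index_page(path, current[path])  # hypothetical indexing callback
        indexed[path] = digest(current[path])
    return stale


if __name__ == "__main__":
    seen = {}
    pages = {"man1/cp.1": b"copy files", "man1/mv.1": b"move files"}
    print(reindex(pages, seen, lambda path, content: None))
    pages["man1/ls.1"] = b"list directory contents"
    print(reindex(pages, seen, lambda path, content: None))
```

Hashing means every page is read on every run; comparing modification
times would be cheaper but could miss some changes. Pages that have
been removed are not handled in this sketch.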

I am really eager to work on this. I would also like to know whether
you require students to perform some kind of task so that you can
judge their capabilities and filter the applications.


Thanks.
Abhinav Upadhyay

