tech-userlevel archive


Re: Fwd: GSoC Project: Replacement for Apropos



On Mon, Mar 21, 2011 at 12:40:18AM +0530, Abhinav Upadhyay wrote:
> My name is Abhinav Upadhyay, and I am a 4th year student of Bachelor
> of Technology from India.

Welcome Abhinav!

> Regarding the implementation: I think, parsing the mandocs and
> building an inverted index, would be a good way to start. I have
> experience of working on a search engine, which I developed as one of
> my undergrad projects. I had used the Apache Lucene library that
> provides all kinds of search optimization including keyword weights.
> If we can afford to use a high level language like Python, this
> library could be used, but still it is a very heavy library because of
> the runtime requirements.

As Julio wrote, the target here is the base system and that effectively
means using C. External dependencies should be kept minimal. While
SQLite is currently not included, it is a very reasonable tool and
acceptable for this purpose. The main thing Lucene offers for this
purpose is a good stemming implementation.
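
To make the inverted-index idea concrete, here is a minimal sketch in
plain C with no external dependencies. It is only an illustration under
stated assumptions: the names (inv_index, index_add_page, index_lookup)
and the fixed-size limits are hypothetical, there is no stemming, and a
real implementation would more likely keep the data in SQLite than in
memory.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Arbitrary limits chosen for the sketch, not taken from any real tool. */
#define MAX_TERMS    256
#define MAX_POSTINGS 32
#define MAX_TERM_LEN 32

struct term_entry {
	char term[MAX_TERM_LEN];
	int  pages[MAX_POSTINGS];	/* ids of pages containing the term */
	int  npages;
};

struct inv_index {
	struct term_entry terms[MAX_TERMS];
	int nterms;
};

/* Find the entry for a term, or NULL if absent. Linear scan for brevity. */
static struct term_entry *
index_lookup(struct inv_index *idx, const char *term)
{
	assert(idx != NULL && term != NULL);
	for (int i = 0; i < idx->nterms; i++)
		if (strcmp(idx->terms[i].term, term) == 0)
			return &idx->terms[i];
	return NULL;
}

/* Record that page_id contains term; 0 on success, -1 if a limit is hit. */
static int
index_add_term(struct inv_index *idx, const char *term, int page_id)
{
	struct term_entry *e = index_lookup(idx, term);

	if (e == NULL) {
		if (idx->nterms == MAX_TERMS)
			return -1;
		e = &idx->terms[idx->nterms++];
		strncpy(e->term, term, MAX_TERM_LEN - 1);
		e->term[MAX_TERM_LEN - 1] = '\0';
		e->npages = 0;
	}
	for (int i = 0; i < e->npages; i++)
		if (e->pages[i] == page_id)
			return 0;		/* already recorded */
	if (e->npages == MAX_POSTINGS)
		return -1;
	e->pages[e->npages++] = page_id;
	return 0;
}

/* Split text into lowercase alphanumeric words and index each one. */
static int
index_add_page(struct inv_index *idx, int page_id, const char *text)
{
	char word[MAX_TERM_LEN];
	size_t len = 0;

	for (const char *p = text; ; p++) {
		if (*p != '\0' && isalnum((unsigned char)*p)) {
			/* Over-long words are silently truncated. */
			if (len < MAX_TERM_LEN - 1)
				word[len++] = (char)tolower((unsigned char)*p);
		} else {
			if (len > 0) {
				word[len] = '\0';
				if (index_add_term(idx, word, page_id) != 0)
					return -1;
				len = 0;
			}
			if (*p == '\0')
				break;
		}
	}
	return 0;
}
```

A caller would zero the structure, feed it the text of each manual page
with a numeric id, and then look terms up by their lowercase form. The
linear scan keeps the sketch short; a hash table or a database table
would replace it in anything beyond a prototype.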

> I had a couple of questions though:
> 1.) Which language would you prefer to use ? I think Python would be a
> good choice.

As written above, the real implementation should be in C. Using e.g.
Python for prototyping purposes is fine though.

> 2.) And if the user installs some new system utilities or tools, and
> new mandocs are added for them, then how would we add those documents
> to the index ? I think the indexer process could be called
> automatically after installing the new mandocs, which would run in the
> background and index the new documents.

FYI, call it either "manual pages" (man page for short) or refer to
man/mdoc documents (which focuses more on the format of the content).
Updating the index can be done either by invoking a helper program
directly from the installation process or as part of /etc/daily (aka the
handling of periodic jobs). It's good enough to provide something for
the latter.
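
As a rough illustration of the periodic variant, a helper run from
/etc/daily could compare each page's modification time with the time of
the last indexing run and reindex only what changed. The sketch below
assumes a hypothetical function name (page_needs_reindex) and assumes
the caller stores the timestamp of the previous run somewhere; neither
is part of any existing tool.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Returns 1 if the file at path was modified after last_indexed,
 * 0 if it was not, and -1 if the file cannot be examined.
 */
static int
page_needs_reindex(const char *path, time_t last_indexed)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) == -1)
		return -1;
	return st.st_mtime > last_indexed ? 1 : 0;
}
```

A page that returns -1 has most likely been removed, so the helper
would drop its entries from the index rather than treat that as an
error.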

> I am really eager to work on this. I also wanted to know if you
> require students to perform some kind of task to judge their
> capabilities and filter the applications ?

If you have already participated in Open Source projects, a reference
would be useful. Julio has provided the reference to the normal application
process already.

Joerg

