tech-userlevel archive


[GSoC 2011] [Status Report] Apropos Replacement

Hello NetBSD!!

The official coding period of GSoC 2011 has ended, so I am writing a final
status report on the progress of the project. I will try to summarise the
initial goals of the project and which of them have been achieved by this
deadline.

1. OBJECTIVE: The main objective of the project was to develop a replacement
tool for apropos(1) that provides a better search experience. We often
encounter a problem whose solution is readily available somewhere in some man
page, but due to the lack of a good search tool we either turn to Google or
seek the advice of an expert. The aim of this project was to develop such a
search tool, one that points the user towards the solution.


        The following goals were achieved:

        1. A utility for parsing and indexing the man pages (makemandb.c).
        2. A utility for searching the index thus created (apropos.c).
        3. A ranking algorithm to find more relevant results.
        4. A mechanism to update the index when new man pages are installed or
           existing ones are removed.
        5. Using the database to manage the man page aliases.
        6. A library-like interface to build applications on top of it.
        7. Documentation in the form of man pages.
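The report does not spell out the ranking formula. Purely as an illustration, here is a C sketch of one way to rank results when the index keeps a separate column per man page section (the column split credited to David Young later in this report): each page's score is a weighted sum of its per-column match counts. The column names, the weights, struct page_hits and the functions are all hypothetical and are not taken from makemandb or apropos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical column layout: one FTS column per man page section. */
enum column { COL_NAME, COL_DESC, COL_SYNOPSIS, COL_OTHER, NCOLUMNS };

/* Illustrative (assumed) weights: a hit in NAME counts the most. */
static const double col_weight[NCOLUMNS] = {
	[COL_NAME] = 10.0,
	[COL_DESC] = 4.0,
	[COL_SYNOPSIS] = 2.0,
	[COL_OTHER] = 1.0,
};

struct page_hits {
	const char *name;		/* page name, e.g. "useradd" */
	unsigned hits[NCOLUMNS];	/* query-term matches per column */
};

/* Score = sum over columns of (column weight * number of matches). */
double column_score(const struct page_hits *p)
{
	double score = 0.0;

	assert(p != NULL);
	for (size_t c = 0; c < NCOLUMNS; c++)
		score += col_weight[c] * p->hits[c];
	return score;
}

/* qsort(3) comparator that puts the highest score first. */
static int by_score_desc(const void *a, const void *b)
{
	double sa = column_score(a), sb = column_score(b);

	return (sa < sb) - (sa > sb);
}

/* Sort pages in place, most relevant first. */
void rank_pages(struct page_hits *pages, size_t n)
{
	qsort(pages, n, sizeof *pages, by_score_desc);
}
```

With these weights, a single match in a page's NAME line (score 10) outranks one match in the description plus three elsewhere in the body (score 7), which is the intuition behind weighting columns separately.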

        The following goals could not be completed:

        1. I proposed to provide line numbers or references to specific
           sections of the man pages in the search results, but at the time
           of implementation this did not seem trivial.
        2. A CGI-based interface: I did not have enough time left at the end
           to try this out. However, the groundwork for it has been done in
           the form of a library-like interface and a function,
           run_query_html(), which formats the search results as an HTML
           fragment. So it should be easy to write a CGI application to
           perform searches from a web browser.
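Since the CGI front end was left undone, here is a minimal C sketch of the first step such a program would need: extracting the search terms from the URL-encoded QUERY_STRING that a web server passes to a CGI program. The parameter name "q" and the function form_value() are assumptions for illustration only; the decoded query would then be handed to something like run_query_html().

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (isdigit(c))
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Copy the value of `key` from a URL-encoded query string such as
 * "q=add+a+new+user&sec=8" into dst, turning '+' into a space and
 * decoding %XX escapes. Returns 0 on success, or -1 if the key is
 * absent or dst is too small.
 */
int form_value(const char *qs, const char *key, char *dst, size_t dstlen)
{
	size_t klen = strlen(key);

	assert(dst != NULL && dstlen > 0);
	while (qs != NULL && *qs != '\0') {
		const char *end = strchr(qs, '&');
		size_t len = end ? (size_t)(end - qs) : strlen(qs);

		if (len > klen && strncmp(qs, key, klen) == 0 && qs[klen] == '=') {
			size_t o = 0;

			for (size_t i = klen + 1; i < len; i++) {
				int c = (unsigned char)qs[i];

				if (c == '+') {
					c = ' ';
				} else if (c == '%' && i + 2 < len &&
				    hexval((unsigned char)qs[i + 1]) >= 0 &&
				    hexval((unsigned char)qs[i + 2]) >= 0) {
					c = hexval((unsigned char)qs[i + 1]) * 16 +
					    hexval((unsigned char)qs[i + 2]);
					i += 2;
				}
				if (o + 1 >= dstlen)
					return -1;	/* no room for c plus NUL */
				dst[o++] = (char)c;
			}
			dst[o] = '\0';
			return 0;
		}
		qs = end ? end + 1 : NULL;	/* move to the next name=value pair */
	}
	return -1;
}
```

In a real CGI program the input would come from getenv("QUERY_STRING"), which may be NULL; form_value() treats a NULL string as "key not found".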


    There are two command-line utilities, 'makemandb' and 'apropos'. You first
    need to build the Full Text Search (FTS) index using makemandb(1), and then
    you can use apropos(1) (the one provided by this project) to perform
    searches.

    4.1 makemandb: Simply running makemandb will build the FTS index and report
        the number of pages indexed. Some pages may fail to be indexed along
        the way; these are reported by error messages on the screen, but there
        is nothing to worry about.

    NOTE: The default behavior of makemandb is to update the index
        incrementally. That is to say, it will add only those pages that are
        not already in the index, and it will remove from the index those
        pages that are no longer on the file system. Of course, if there is no
        existing index, it will build one from scratch.
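The incremental update boils down to comparing two sets of paths. Below is a minimal C sketch, assuming both lists are sorted with strcmp() and that a page present in both lists is left alone (a real implementation might also compare modification times or checksums). diff_pages() and its parameters are hypothetical names, not functions from makemandb.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Merge-walk two sorted path lists: `indexed` (already in the database)
 * and `ondisk` (found while walking the man directories). Paths only on
 * disk go to to_add; paths only in the index are stale and go to to_del.
 * to_add needs room for nd entries and to_del for ni entries.
 */
void diff_pages(const char *const *indexed, size_t ni,
    const char *const *ondisk, size_t nd,
    const char **to_add, size_t *nadd,
    const char **to_del, size_t *ndel)
{
	size_t i = 0, d = 0;

	assert(nadd != NULL && ndel != NULL);
	*nadd = *ndel = 0;
	while (i < ni || d < nd) {
		int cmp;

		if (i == ni)
			cmp = 1;	/* only on-disk entries remain */
		else if (d == nd)
			cmp = -1;	/* only indexed entries remain */
		else
			cmp = strcmp(indexed[i], ondisk[d]);

		if (cmp < 0)
			to_del[(*ndel)++] = indexed[i++];	/* removed from disk */
		else if (cmp > 0)
			to_add[(*nadd)++] = ondisk[d++];	/* newly installed */
		else {
			i++;	/* present in both: keep as is */
			d++;
		}
	}
}
```

Because both lists are sorted, one linear pass finds every addition and removal, instead of looking up each on-disk page in the index individually.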

        makemandb supports the following options:

        [-f]: Prune the existing index (if there is one) and rebuild the
        database from scratch.

        [-l]: Limit the indexing to only the NAME section of the man pages.
        This option can be used to mimic the behavior of the "classical
        apropos", although with improved search capabilities. It may be useful
        if you want to save a few MB of disk space.

        [-o]: Optimize the index. makemandb(1) will try to optimize the FTS
        index for faster search performance, and it will also optimize the
        storage of the data to reduce disk space usage.
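A sketch of how these three flags could be parsed with POSIX getopt(3). struct mkdb_opts and parse_mkdb_opts() are hypothetical names; the real makemandb.c may be organised differently.

```c
#define _POSIX_C_SOURCE 200809L	/* expose getopt() under strict C modes */
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

struct mkdb_opts {
	bool force;	/* -f: prune the existing index and rebuild */
	bool limit;	/* -l: index only the NAME section */
	bool optimize;	/* -o: optimize the FTS index and its storage */
};

/* Fill *o from argv. Returns 0 on success, -1 on an unknown option. */
int parse_mkdb_opts(int argc, char *argv[], struct mkdb_opts *o)
{
	int ch;

	assert(o != NULL);
	o->force = o->limit = o->optimize = false;
	optind = 1;	/* restart the scan at argv[1]; BSD also wants optreset = 1 */
	while ((ch = getopt(argc, argv, "flo")) != -1) {
		switch (ch) {
		case 'f':
			o->force = true;
			break;
		case 'l':
			o->limit = true;
			break;
		case 'o':
			o->optimize = true;
			break;
		default:	/* getopt() has already printed a diagnostic */
			return -1;
		}
	}
	return 0;
}
```

Flags can be combined, so `makemandb -fo` would rebuild from scratch and then optimize.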

        makemandb also builds and maintains an aliases table for managing the
        man page aliases, which are scattered throughout the file system in
        the form of symlinks or hard links. I have provided a patch to man.c
        so that man(1) looks up this table to identify the target page it
        needs to render. Thus, it should be possible to get rid of these
        symlinks and hard links.
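Conceptually, the aliases table lets man(1) map a requested name to the page it should actually render. The C sketch below stands in for the database table with a sorted in-memory array searched by bsearch(3). struct alias, resolve_alias() and the sample entries used in the note after the code are illustrative assumptions, not the real table schema or the man.c patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One alias -> target mapping; the array must be sorted by alias. */
struct alias {
	const char *alias;	/* name the user asked for */
	const char *target;	/* page that should be rendered */
};

/* bsearch(3) comparator: key is a plain string, elem an entry. */
static int alias_cmp(const void *key, const void *elem)
{
	const struct alias *a = elem;

	return strcmp(key, a->alias);
}

/* Return the target page for name, or name itself if it is not an alias. */
const char *resolve_alias(const struct alias *tab, size_t n, const char *name)
{
	assert(name != NULL);
	const struct alias *hit = bsearch(name, tab, n, sizeof *tab, alias_cmp);

	return hit != NULL ? hit->target : name;
}
```

With a hypothetical table such as { {"ex", "vi"}, {"view", "vi"} }, asking for "view" yields "vi", while a name that is not an alias, such as "ls", is returned unchanged, so a lookup like this can replace following a symlink or hard link on disk.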

    4.2 apropos: Once you have built the database, you can run apropos(1) and
        pass it a query to perform a search. For example:
        $ apropos "add a new user"

        apropos supports the following options:

        [-1234569]: You can pass section numbers as options to apropos, which
        makes it search only within the specified set of sections.

        [-p]: By default, apropos(1) displays the top 10 ranked results on
        stdout. If you would like to see more results, use -p: it makes
        apropos(1) display all the results and pipe them to a pager (more(1)).
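A sketch of the logic behind the section options and the default top-10 limit, assuming the ranked results are already in memory. apropos itself presumably applies the section restriction inside its database query, so struct result, section_bit() and filter_results() are hypothetical illustrations of the behaviour only.

```c
#include <assert.h>
#include <stddef.h>

/* One ranked search result (hypothetical layout). */
struct result {
	const char *name;
	int section;	/* 1..9 */
};

/* Bit n set means "search section n"; an empty mask means all sections. */
unsigned section_bit(int sec)
{
	assert(sec >= 1 && sec <= 9);
	return 1u << sec;
}

/*
 * Copy at most `limit` results whose section is selected by `mask` into
 * out, preserving rank order. limit == 0 means no limit (as with -p).
 * Returns the number of results copied.
 */
size_t filter_results(const struct result *in, size_t n, unsigned mask,
    size_t limit, struct result *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		if (limit != 0 && k == limit)
			break;
		if (mask == 0 || (mask & (1u << in[i].section)) != 0)
			out[k++] = in[i];
	}
	return k;
}
```

Each digit option would OR one section_bit() into the mask, so something like `apropos -1 -8 ...` would correspond to section_bit(1) | section_bit(8), and -p would pass a limit of 0.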

    4.3 Library: Besides the two command-line tools, I have also developed a
    very small library that allows building a search application on top of
    the FTS index built by makemandb. It has the following public functions:

    4.3.1 init_db(): Initializes a connection to the database. It takes care
    of registering some custom functions with the connection, and it will
    also create the database schema if the database file does not exist and
    the right flags were provided.

    4.3.2 run_query(): Runs a query as entered by the user and processes the
    rows obtained in a callback function (apropos.c uses it).

    4.3.3 run_query_html(): Similar to run_query(), but it formats the
    results as an HTML fragment. This can be used to build a CGI application
    to perform searches from a browser.

    4.3.4 run_query_pager(): Similar to run_query_html(), but it formats the
    results so that the matching text appears highlighted when piped to a
    pager. apropos.c uses it when the -p option is specified.

    4.3.5 close_db(): Closes the database connection and releases any
    resources.
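run_query_html() and run_query_pager() differ mainly in how they mark up the matched text in each result. The C sketch below shows two such formatters, assuming each snippet arrives with its matches delimited by \002 and \003 bytes (the actual delimiters are not given in this report). The HTML version escapes markup characters and wraps matches in <b>; the pager version uses the "c\bc" overstrike convention, which pagers such as less(1) display as bold. struct buf, format_html() and format_pager() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed delimiters around matched text in a snippet. */
#define MATCH_START '\002'
#define MATCH_END   '\003'

/* Minimal bounded string builder; silently truncates when full. */
struct buf {
	char data[512];
	size_t len;
};

static void put(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n >= sizeof b->data)
		n = sizeof b->data - 1 - b->len;
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/* HTML flavour: escape markup characters, wrap matches in <b>...</b>. */
void format_html(const char *snippet, struct buf *out)
{
	char ch[2] = { 0, 0 };

	assert(snippet != NULL && out != NULL);
	out->len = 0;
	out->data[0] = '\0';
	for (const char *p = snippet; *p != '\0'; p++) {
		switch (*p) {
		case MATCH_START: put(out, "<b>"); break;
		case MATCH_END:   put(out, "</b>"); break;
		case '&':         put(out, "&amp;"); break;
		case '<':         put(out, "&lt;"); break;
		case '>':         put(out, "&gt;"); break;
		case '"':         put(out, "&quot;"); break;
		default:          ch[0] = *p; put(out, ch); break;
		}
	}
}

/* Pager flavour: render each matched character as "c\bc" (bold). */
void format_pager(const char *snippet, struct buf *out)
{
	bool in_match = false;
	char ch[4] = { 0 };

	assert(snippet != NULL && out != NULL);
	out->len = 0;
	out->data[0] = '\0';
	for (const char *p = snippet; *p != '\0'; p++) {
		if (*p == MATCH_START) {
			in_match = true;
			continue;
		}
		if (*p == MATCH_END) {
			in_match = false;
			continue;
		}
		ch[0] = *p;
		if (in_match) {
			ch[1] = '\b';	/* backspace, then the same char again */
			ch[2] = *p;
			ch[3] = '\0';
		} else {
			ch[1] = '\0';
		}
		put(out, ch);
	}
}
```

Keeping the snippet format fixed and swapping only the formatter is one way a single query path could serve both apropos -p and a CGI front end.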

For more detailed documentation, you can read the individual man pages.

        The following are the requirements for building and running it on
        NetBSD:
    2.1 A -CURRENT version of NetBSD (or at least the -CURRENT man pages and
        the -CURRENT version of man(1)).
    2.2 libmandoc from mdocml.

        I have uploaded some screenshots of the output to my blog.

        I owe a big chunk of the success to my mentor Jörg Sonnenberger, who
        was there to answer my questions, offer advice and review the code. I
        have learned a great deal from him, and I am sure I have improved as a
        programmer. The best thing about working with him was that he never
        really disclosed the solution; instead he gently guided me towards it,
        so I never lost a learning opportunity :-)

        David Young also offered valuable guidance during the project. He
        provided some clever insights and tips to improve the search and the
        ranking of the results. My decision to decompose the database into
        more columns, one for each of the different sections of a man page,
        was based on his idea.

        Thanks as well to Kristaps Dzonsons, who is responsible for the mdocml
        project. He also reviewed the code related to parsing the pages and
        pointed out bugs in it. I implemented makemandb based on his utility
        "mandocdb", so that was also a huge help.

        Special thanks go to Thomas Klausner for reviewing the man pages I
        wrote and also for providing patches for the mistakes I had made in
        them. I must also thank Julio Merino, Jan Schaumann, Jukka Ruohonen, …
        for the interest they showed in the project and the help they offered
        throughout.

        And thanks to lots of other people in the community as well, whose
        names I forgot to mention. It was encouraging to see responses to each
        status report I posted, and that kept me excited.

        I thoroughly enjoyed working on this project. I would definitely like
        to continue working in the NetBSD community; in fact, I was discussing
        with Joerg some of the projects I could work on. I have an interest in
        systems programming but not enough knowledge, though I don't mind
        learning ;-)

Thanks for reading this far :-)
