tech-userlevel archive


[GSoC 2011] [Status Report] Apropos Replacement



Hello NetBSD!!

The official coding period of GSoC 2011 has ended, so this is my final status
report on the project. I will summarise the initial goals of the project and
which of them have been achieved by the deadline.

1. OBJECTIVE: The main objective of the project was to develop a replacement
for apropos(1) that provides a better search experience. We often face a
problem whose solution is documented in some man page, but for lack of a search
tool that can find it we either turn to Google or ask an expert. The aim of
this project was to build a search tool that points the user towards the
solution.

2. DELIVERABLES PROPOSED & DELIVERED:

        1. A utility for parsing and indexing the man pages. (makemandb.c)
        2. A utility for searching the index thus created. (apropos.c)
        3. A ranking algorithm to find more relevant results.
        4. A mechanism to update the index when new man pages are installed or
           old ones are removed.
        5. Using the database to manage the man page aliases.
        6. A library-like interface to build applications on top of it.
        7. Documentation in the form of man pages.
        
3. DELIVERABLES PROPOSED & NOT DELIVERED:

        1. I proposed to provide line numbers or references to specific
           sections of the man pages in the search results, but at the time of
           implementation this turned out not to be trivial.
        2. A CGI-based interface: I did not have enough time left at the end to
           try this out. The groundwork has been done, though, in the form of a
           library-like interface and a function run_query_html(), which
           provides the search results as an HTML fragment. So it should be
           straightforward to write a CGI application to perform searches from
           a web browser.

4. DETAILS ABOUT THE DELIVERABLES PRODUCED:

    There are two command line utilities, 'makemandb' and 'apropos'. You first
    need to build the Full Text Search (FTS) index using makemandb(1); then you
    can use apropos(1) (the one provided by this project) to perform searches.

    4.1 makemandb: Simply running makemandb will build the FTS index and tell
        you the number of pages indexed. Some pages might fail to get indexed;
        this is reported through error messages on the screen, but it is
        nothing to worry about.

    NOTE: The default behavior of makemandb is an incremental update. That is,
        it adds to the index only those pages which it did not have previously,
        and it removes from the index those pages which are no longer on the
        file system. Of course, if there is no existing index, it builds one
        from scratch.
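
To make the incremental behaviour concrete, here is a minimal sketch in C of one
way such an update could be planned. It is not taken from makemandb.c: the names
diff_pages(), struct index_diff and page_cb are made up for illustration, and it
assumes that both lists of page paths are already sorted with strcmp() ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the planned changes; not a makemandb type. */
struct index_diff {
    size_t to_add;     /* pages on disk that the index does not have yet */
    size_t to_remove;  /* pages in the index that are gone from disk */
};

/* Hypothetical callback invoked for every page to add or to remove. */
typedef void (*page_cb)(const char *path, void *cookie);

/*
 * Walk two sorted lists of page paths in lockstep. A path found only on
 * disk is reported through add_cb, a path found only in the index is
 * reported through remove_cb, and a path present in both is skipped.
 * Either callback may be NULL if only the counts are wanted.
 */
static struct index_diff
diff_pages(const char *const *on_disk, size_t ndisk,
    const char *const *indexed, size_t nindexed,
    page_cb add_cb, page_cb remove_cb, void *cookie)
{
    struct index_diff d = { 0, 0 };
    size_t i = 0, j = 0;

    assert(on_disk != NULL || ndisk == 0);
    assert(indexed != NULL || nindexed == 0);

    while (i < ndisk || j < nindexed) {
        int c;

        if (i == ndisk)
            c = 1;          /* only indexed entries are left */
        else if (j == nindexed)
            c = -1;         /* only on-disk entries are left */
        else
            c = strcmp(on_disk[i], indexed[j]);

        if (c == 0) {
            i++;            /* already indexed: nothing to do */
            j++;
        } else if (c < 0) {
            if (add_cb != NULL)
                add_cb(on_disk[i], cookie);
            d.to_add++;
            i++;
        } else {
            if (remove_cb != NULL)
                remove_cb(indexed[j], cookie);
            d.to_remove++;
            j++;
        }
    }
    return d;
}
```

With an empty list of indexed pages every on-disk page is reported as an
addition, which corresponds to the "build it from scratch" case described above.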

        makemandb supports the following options:

        [-f]: The option 'f' tells makemandb(1) to prune the existing index
        (if there is one) and rebuild the database from scratch.

        [-l]: The option 'l' tells makemandb(1) to limit the indexing to only
        the NAME section of the man pages. This option can be used to mimic the
        behavior of the "classical apropos", although with improved search
        capabilities. It might be useful if you want to save a few MB of disk
        space.

        [-o]: The option 'o' is for optimizing the index. makemandb(1) will try
        to optimize the FTS index for faster searches and to compact the stored
        data to reduce disk space usage.

        makemandb also builds and maintains an aliases table for managing the
        man page aliases, which are scattered through the file system in the
        form of symlinks or hardlinks. I have provided a patch to man.c so that
        man(1) looks up this table to identify the target page which it needs
        to render. Thus, it should be possible to get rid of these symlinks and
        hardlinks.
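
The alias lookup can be pictured as a small table keyed by alias name and
section. The sketch below is only an illustration in C under my own
assumptions: struct man_alias and resolve_alias() are hypothetical names, the
table is a plain in-memory array, and none of this is the actual schema or the
actual man.c patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory row of an aliases table. */
struct man_alias {
    const char *alias;    /* name the user typed, e.g. a linked page name */
    const char *section;  /* section the alias lives in */
    const char *target;   /* page that should actually be rendered */
};

/*
 * Return the target page for `name`. If `section` is NULL, any section
 * matches. A name that is not listed as an alias is returned unchanged,
 * because in that case it is assumed to be a real page.
 */
static const char *
resolve_alias(const struct man_alias *tab, size_t n,
    const char *name, const char *section)
{
    assert(name != NULL);
    assert(tab != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].alias, name) != 0)
            continue;
        if (section == NULL || strcmp(tab[i].section, section) == 0)
            return tab[i].target;
    }
    return name;
}
```

For example, with a sample row mapping "strlcat" in section "3" to "strlcpy",
resolve_alias(tab, 1, "strlcat", "3") yields "strlcpy", while a name that has
no row is handed back as it is.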

    4.2 apropos: Once you have built the database, you can run apropos(1) and
        pass it a query to do a search. For example:
        $ apropos "add a new user"

        apropos supports the following options:

        [-1234569]: You can pass section numbers as options to apropos, which
        makes it search only within the specified set of sections.

        [-p]: By default apropos(1) displays the top 10 ranked results on
        stdout. If you would like to see more results, use 'p'. It makes
        apropos(1) display all the results and pipe them to a pager (more(1)).
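
As an illustration of the section options, here is a hedged sketch in C of how
a string of section digits could be turned into a filter. The helper names
section_mask() and section_selected() are hypothetical and are not taken from
apropos.c; the accepted digits are simply the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build a bit mask from a string of section digits such as "15".
 * Bit N is set when section N was requested. Characters that are not
 * among the digits listed in the option summary are ignored.
 */
static unsigned int
section_mask(const char *flags)
{
    unsigned int mask = 0;

    assert(flags != NULL);
    for (const char *p = flags; *p != '\0'; p++) {
        if (strchr("1234569", *p) != NULL)
            mask |= 1u << (*p - '0');
    }
    return mask;
}

/*
 * Decide whether a page from `section` passes the filter. An empty mask
 * means that no section option was given, so every section is searched.
 */
static int
section_selected(unsigned int mask, int section)
{
    if (mask == 0)
        return 1;
    if (section < 1 || section > 9)
        return 0;
    return (int)((mask >> section) & 1u);
}
```

A search front end could compute the mask once from the command line and then
either skip rows whose section is not selected or translate the mask into a
condition in the query; which of the two apropos does is not stated here.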

5. OTHER DELIVERABLES:
    Besides the two command line tools, I have also developed a very small
    library that allows building a search application on top of the FTS index
    built by makemandb. It has the following public functions:

    5.1 init_db(): To initialize a connection to the database. It takes care of
    registering some custom functions with the connection, and it will also
    create the database schema in case the database file does not exist and
    you provided the right flags.

    5.2 run_query(): To run a query as entered by the user and process the rows
    obtained in a callback function (apropos.c uses it).

    5.3 run_query_html(): Similar to run_query(), but it formats the results
    obtained in the form of an HTML fragment. This can be used to build a CGI
    application to do searches from a browser.

    5.4 run_query_pager(): Similar to run_query_html(), but it formats the
    results so that the matching text appears highlighted when piped to a
    pager. apropos.c uses it when the -p option is specified.

    5.5 close_db(): To close the database connection and release any resources.
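
The callback style used by run_query() can be illustrated with a small,
self-contained sketch in C. Everything in it is hypothetical (struct
result_row, row_callback, for_each_result() and count_row()): it does not
reproduce the library's real signatures, and it walks an in-memory array
instead of querying the database.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape of one search result handed to the callback. */
struct result_row {
    const char *name;     /* page name */
    const char *section;  /* manual section */
    const char *snippet;  /* text around the match */
};

/* Hypothetical callback type: return non-zero to stop the iteration. */
typedef int (*row_callback)(void *cookie, const struct result_row *row);

/*
 * Hand rows to the callback in ranked order. A limit of 0 means "no
 * limit"; any other value stops after that many rows, similar to the
 * default of showing only the top results. Returns the number of rows
 * that the callback accepted.
 */
static size_t
for_each_result(const struct result_row *rows, size_t nrows, size_t limit,
    row_callback cb, void *cookie)
{
    size_t delivered = 0;

    assert(cb != NULL);
    assert(rows != NULL || nrows == 0);

    for (size_t i = 0; i < nrows; i++) {
        if (limit != 0 && delivered == limit)
            break;
        if (cb(cookie, &rows[i]) != 0)
            break;
        delivered++;
    }
    return delivered;
}

/* Example callback: count the rows seen, using the cookie as a counter. */
static int
count_row(void *cookie, const struct result_row *row)
{
    (void)row;
    (*(size_t *)cookie)++;
    return 0;
}
```

The cookie pointer is what lets one iteration routine serve different
front ends: a terminal client can pass a stream to print to, while a CGI
program could pass a buffer that collects an HTML fragment.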

For more detailed documentation, see the man pages of the individual
components.

6. REQUIREMENTS FOR BUILDING & RUNNING:
    The following are the requirements for building and running it on NetBSD:
    6.1 -CURRENT version of NetBSD (or at least -CURRENT man pages and -CURRENT
        version of man(1)).
    6.2 libmandoc from mdocml.

7. SCREENSHOTS:
        I uploaded some screenshots of the output on my blog. Here are the
        links:

        http://4.bp.blogspot.com/-q5uy81DqUmE/TlPFTdweyXI/AAAAAAAACDw/Du06YrCBnEQ/s1600/add-user.png
        http://3.bp.blogspot.com/-nj0SRZVZ0HU/TlPFc46KbrI/AAAAAAAACD0/D7vaaR4wuy0/s1600/password-hash.png
        http://3.bp.blogspot.com/-lt0chLf9TjU/TlPFmwLo1vI/AAAAAAAACD4/F_Xhen1L5Rw/s1600/psignal.png
        http://2.bp.blogspot.com/-VLnGy27-ecw/TlPF3zj40wI/AAAAAAAACD8/pWQqYHm1dZ8/s1600/log.png
        http://2.bp.blogspot.com/-HS7eDup9B-w/TlPGF4IH2aI/AAAAAAAACEA/oieShZiX_co/s1600/realtek.png


8. ACKNOWLEDGEMENTS:
        I owe a big chunk of the success to my mentor Jörg Sonnenberger, who
        was always there to answer my questions, offer advice and review the
        code. I have learnt a great deal from him and I am sure I have improved
        as a programmer. The best thing about working with him was that he
        never really disclosed the solution; instead he gently guided me
        towards it, so I never lost a learning opportunity :-)

        David Young also offered valuable guidance during the project. He
        provided some clever insights and tips to improve the search and the
        ranking of the results. Decomposing the database into more columns,
        based on the different sections of a man page, was his idea.

        Thanks as well to Kristaps Dzonsons, who is responsible for the mdocml
        project. He also reviewed the code related to parsing of the pages and
        pointed out bugs in it. I implemented makemandb based on his utility
        "mandocdb", so that was also a huge help.

        Special thanks go to Thomas Klausner for reviewing the man pages I
        wrote and also providing patches for the mistakes I had made in them.
        I must also thank Julio Merino, Jan Schaumann, Jukka Ruohonen and
        S.P.Zeidler for the interest they showed in the project and the help
        they offered throughout :-)

        And thanks to the many other people in the community whose names I
        forgot to mention. It was encouraging to see responses to each status
        report I made, and it kept me excited.
        
9. WHAT NEXT?

        I thoroughly enjoyed working on this project. I would definitely like
        to continue working in the NetBSD community; in fact, I have been
        discussing with Jörg some of the projects I could work on. I am
        interested in systems programming but do not have enough knowledge yet,
        and I don't mind learning ;-)


Thanks for reading this far :-)
--
Abhinav

http://abhinav-upadhyay.blogspot.com/2011/08/final-report-netbsd-gsoc-2011-apropos.html

