tech-userlevel archive


Re: Teaching Apropos to Rank



On 7 May 2016 at 20:56, Abhinav Upadhyay <er.abhinav.upadhyay%gmail.com@localhost> wrote:
> Hi All,
>
> From man-k.org I was able to create a small dataset of queries,
> results and their relevance scores. I am working on trying out some
> machine learning models to improve the ranking algorithm of
> apropos(1).
>
> Currently apropos has a weight for each section, such as NAME and
> DESCRIPTION, and it multiplies a match in a section by that section's
> weight. This is required because a match in one section (for example,
> NAME) is more relevant than a match in another (such as
> DESCRIPTION). I chose these weights arbitrarily, as I didn't have
> any way to learn their optimal values.
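
The per-section weighting described above can be sketched roughly as follows. This is a simplified illustration, not the apropos(1) implementation: it counts raw term occurrences, and the weight values, the page texts, and the names SECTION_WEIGHTS and score are all made up for the example.

```python
# Hypothetical weights; the actual values used by apropos(1) are not
# given in this thread.
SECTION_WEIGHTS = {"NAME": 2.0, "DESCRIPTION": 0.5}


def score(page_sections, query_terms, weights=SECTION_WEIGHTS):
    """Return the sum over sections of (section weight * match count)."""
    total = 0.0
    for section, text in page_sections.items():
        words = text.lower().split()
        # Count how often any query term occurs in this section.
        matches = sum(words.count(term.lower()) for term in query_terms)
        # Sections without a configured weight contribute nothing.
        total += weights.get(section, 0.0) * matches
    return total


# Invented page texts, for illustration only.
pages = {
    "fork(2)": {
        "NAME": "fork create a new process",
        "DESCRIPTION": "fork causes creation of a new process",
    },
    "rshd(8)": {
        "NAME": "rshd remote shell server",
        "DESCRIPTION": "the server will fork a child and fork again",
    },
}

# Rank pages for the query "fork", best score first.
ranked = sorted(pages, key=lambda p: score(pages[p], ["fork"]), reverse=True)
print(ranked)
```

With these made-up weights, a single match in NAME outweighs two matches in DESCRIPTION, which is the behaviour the weights are meant to produce.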
>
> I am trying out some machine learning techniques to learn these
> weights. The results so far have not been drastic, but they are
> definitely an improvement. Hopefully I will be able to get more
> concrete results soon. A small comparison of results between the old
> weights and the weights learned from machine learning is below.
>
> apropos -n 10 -C fork #old weights
> fork (2)  create a new process
> perlfork (1)      Perls fork() emulation
> cpu_lwp_fork (9)  finish a fork operation
> pthread_atfork (3)        register handlers to be called when process forks
> rlogind (8)       remote login server
> rshd (8)  remote shell server
> rexecd (8)        remote execution server
> script (1)        make typescript of terminal session
> moncontrol (3)    control execution profile
> vfork (2) spawn new process in a virtual memory efficient way
>
> apropos -n 10 -C fork #new weights
> fork (2) create a new process
> perlfork (1) Perls fork() emulation
> cpu_lwp_fork (9) finish a fork operation
> pthread_atfork (3) register handlers to be called when process forks
> vfork (2) spawn new process in a virtual memory efficient way
> clone (2) spawn new process with options <-- clone(2) appears in top 10
> daemon (3) run in the background
> script (1) make typescript of terminal session
> openpty (3) tty utility functions
> rlogind (8) remote login server
>
> clone(2) shows up, rshd(8) and rexecd(8) go away, rlogind(8) moves down.
>
>
> apropos -n 10 -C create new process #old weights
> init (8)  process control initialization
> fork (2)  create a new process
> fork1 (9) create a new process
> timer_create (2)  create a per-process timer
> getpgrp (2)       get process group
> supfilesrv (8)    sup server processes
> posix_spawn (3)   spawn a process
> master (8)        Postfix master process
> popen (3) process I/O
> _lwp_create (2)   create a new light-weight process
>
> apropos -n 10 -C create new process #new weights
> fork (2) create a new process <-- fork(2) is number 1
> fork1 (9) create a new process
> _lwp_create (2) create a new light-weight process
> pthread_create (3) create a new thread
> clone (2) spawn new process with options
> timer_create (2) create a per-process timer
> UI_new (3) New User Interface
> init (8) process control initialization
> posix_spawn (3) spawn a process
> master (8) Postfix master process
>
> fork(2) moves to number 1, init(8) drops to number 8, clone(2) appears, etc.
>
> I wrote a blog post about it:
> http://abhinav-upadhyay.blogspot.in/2016/05/teaching-apropos-to-rank-work-in.html
>
> The data is available here:
> https://github.com/abhinav-upadhyay/man-nlp-experiments/tree/master/data
>
> Let me know your thoughts or concerns :)

Very cool, definitely looking forward to seeing the final result merged
back into apropos(1) :)

As a possible future option, are you planning any special handling of
multi-word searches, e.g. heavier weighting when the query words appear
consecutively in the data?
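
To make the suggestion concrete, here is a minimal sketch of one way such a bonus could be computed. It is only an illustration of the idea, not anything apropos(1) does today; the function name consecutive_bonus and the bonus value are hypothetical.

```python
def consecutive_bonus(text, query_terms, bonus=1.0):
    """Add `bonus` each time two adjacent query words are also adjacent,
    in the same order, in `text`."""
    words = text.lower().split()
    terms = [t.lower() for t in query_terms]
    total = 0.0
    # Walk over each adjacent pair of query words...
    for first, second in zip(terms, terms[1:]):
        # ...and over each adjacent pair of words in the text.
        for a, b in zip(words, words[1:]):
            if (a, b) == (first, second):
                total += bonus
    return total


query = ["create", "new", "process"]
# "new process" is adjacent in the fork(2) description, so it earns a bonus.
print(consecutive_bonus("create a new process", query))
# No adjacent query words in the init(8) description.
print(consecutive_bonus("process control initialization", query))
```

A bonus like this would be added on top of the existing per-section score, so pages such as fork(2) get a small extra push for the "create new process" query.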

