tech-userlevel archive

Teaching Apropos to Rank



Hi All,

Using data from man-k.org, I have built a small dataset of queries,
results, and their relevance scores. I am trying out some machine
learning models to improve the ranking algorithm of apropos(1).

Currently apropos assigns a weight to each section of a man page, such
as NAME and DESCRIPTION, and multiplies a match in a section by that
section's weight. This is needed because a match in one section, for
example NAME, is more relevant than a match in another, such as
DESCRIPTION. I chose these weights arbitrarily, as I had no way to
learn their optimal values.
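
To make the weighting scheme concrete, here is a minimal Python sketch
of section-weighted scoring. It is not the apropos(1) implementation:
the weight values, the per-section match scores, and the names
SECTION_WEIGHTS, score, and rank are all hypothetical and chosen only
for illustration.

```python
# Hypothetical per-section weights; the values are made up for illustration
# and are not the ones apropos(1) actually uses.
SECTION_WEIGHTS = {"NAME": 2.0, "DESCRIPTION": 0.5}


def score(section_matches, weights=SECTION_WEIGHTS):
    """Sum each section's raw match score multiplied by that section's weight."""
    return sum(weights.get(sec, 0.0) * m for sec, m in section_matches.items())


def rank(pages, weights=SECTION_WEIGHTS):
    """Return page names ordered from highest to lowest weighted score."""
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


# Made-up raw match scores for the query "fork" in two pages.
pages = {
    "rlogind(8)": {"NAME": 0.0, "DESCRIPTION": 0.9},
    "fork(2)": {"NAME": 1.0, "DESCRIPTION": 0.4},
}
ranked = rank(pages)
print(ranked)
```

With these assumed numbers, the NAME match lifts fork(2) above
rlogind(8) even though rlogind(8) has the stronger DESCRIPTION match.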

I am trying out some machine learning techniques to learn these
weights. The results so far are not drastic, but they are definitely
an improvement. Hopefully I will have more concrete results soon. A
small comparison of results with the old weights and with the weights
learned through machine learning is below.

apropos -n 10 -C fork #old weights
fork (2)  create a new process
perlfork (1)      Perls fork() emulation
cpu_lwp_fork (9)  finish a fork operation
pthread_atfork (3)        register handlers to be called when process forks
rlogind (8)       remote login server
rshd (8)  remote shell server
rexecd (8)        remote execution server
script (1)        make typescript of terminal session
moncontrol (3)    control execution profile
vfork (2) spawn new process in a virtual memory efficient way

apropos -n 10 -C fork #new weights
fork (2) create a new process
perlfork (1) Perls fork() emulation
cpu_lwp_fork (9) finish a fork operation
pthread_atfork (3) register handlers to be called when process forks
vfork (2) spawn new process in a virtual memory efficient way
clone (2) spawn new process with options <-- clone(2) appears in top 10
daemon (3) run in the background
script (1) make typescript of terminal session
openpty (3) tty utility functions
rlogind (8) remote login server

clone(2) shows up, rshd(8) and rexecd(8) go away, rlogind(8) moves down.


apropos -n 10 -C create new process #old weights
init (8)  process control initialization
fork (2)  create a new process
fork1 (9) create a new process
timer_create (2)  create a per-process timer
getpgrp (2)       get process group
supfilesrv (8)    sup server processes
posix_spawn (3)   spawn a process
master (8)        Postfix master process
popen (3) process I/O
_lwp_create (2)   create a new light-weight process

apropos -n 10 -C create new process #new weights
fork (2) create a new process <-- fork(2) is number 1
fork1 (9) create a new process
_lwp_create (2) create a new light-weight process
pthread_create (3) create a new thread
clone (2) spawn new process with options
timer_create (2) create a per-process timer
UI_new (3) New User Interface
init (8) process control initialization
posix_spawn (3) spawn a process
master (8) Postfix master process

fork(2) moves to number 1, init(8) drops to number 8, clone(2) and
pthread_create(3) appear, etc.
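
As a rough illustration of what learning the section weights from
relevance data can look like, here is a small Python sketch that fits
one weight per section by gradient descent on squared error. This is
only one simple option and not necessarily the model used for the
results above. The function learn_weights and the toy samples are
hypothetical and are not taken from the man-k.org dataset.

```python
def learn_weights(samples, sections, lr=0.1, epochs=2000):
    """Fit one weight per section by gradient descent on mean squared error.

    samples is a list of (section_scores, relevance) pairs, where
    section_scores maps a section name to its raw match score.
    """
    weights = {s: 0.0 for s in sections}
    for _ in range(epochs):
        grad = {s: 0.0 for s in sections}
        for feats, relevance in samples:
            # Predicted relevance is the weighted sum of section scores.
            pred = sum(weights[s] * feats.get(s, 0.0) for s in sections)
            err = pred - relevance
            for s in sections:
                grad[s] += err * feats.get(s, 0.0)
        for s in sections:
            weights[s] -= lr * grad[s] / len(samples)
    return weights


# Made-up toy data: (raw match score per section, relevance label).
samples = [
    ({"NAME": 1.0, "DESCRIPTION": 0.0}, 3.0),
    ({"NAME": 0.0, "DESCRIPTION": 1.0}, 1.0),
    ({"NAME": 1.0, "DESCRIPTION": 1.0}, 4.0),
]
weights = learn_weights(samples, ["NAME", "DESCRIPTION"])
print(weights)
```

On this toy data the learned NAME weight comes out larger than the
DESCRIPTION weight, which matches the intuition that a NAME match is
more relevant.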

I wrote a blog post about it:
http://abhinav-upadhyay.blogspot.in/2016/05/teaching-apropos-to-rank-work-in.html

The data is available here:
https://github.com/abhinav-upadhyay/man-nlp-experiments/tree/master/data

Let me know your thoughts or concerns :)

--
Abhinav
