tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Perfect hash function generator



Hi all,
we have a number of places in our system where matching an input word
against a table using binary search.  An example is Roy's terminfo code.
This is suboptimal from a performance point of view.

I'd like to fix this issue in two steps. First, I'd like the integrate
the Jenkins hash function into libc. It is using only simple operations
and known to provide few collisions. It can be used with power-of-two
hash tables as well as modular hashes. As the man page states, the
prototype goes into string.h. The primary reason to make this libc and
not e.g. libutil is to allow all parts of the system to use it without
pulling in libutil. This applies e.g. to the new libterminfo.

The second part is the provided minimal perfect hash function generator.
The generated hash function is very simple, using two lookups, a Jenkins
hash and three unsigned constant divisions. Space usage is
c * n * ld n * x Bytes with:
        n = number of keys
        c >= 2
        x = 1 for cn < 256,
        x = 2 for 256 <= cn < 65536,
        x = 4 otherwise
The process itself is propabilistic and generally quite fast. To process all
non-comment lines of /etc/services on my laptop, it needs around 6s.
The output is much smaller than the function created by gperf, but doesn't
contain the check if key actually is valid. E.g. you still have to do a
strcmp after that. The generator is also much faster than gperf.

Comments?

Joerg

Attachment: nbperf.tar.gz
Description: application/tar-gz



Home | Main Index | Thread Index | Old Index