Subject: Re: more Q&D results [hash for use in the kernel]
To: ozan s. yigit <oz@zonzorp.canada.sun.com>
From: Andrea Cocito <acocito@lar.ieo.it>
List: tech-perform
Date: 11/29/2001 19:24:46
oz said:
>I use the McKenzie et. al measure for goodness, namely the ratio
>between the ideal number of probes to locate each word in the table
>vs the number of probes the function at hand generated, indicated as Rn
>in my table.
>
>ideal number of probes is W(W+N)/2N, where W is the number of words
>to hash, and N is the size of the hash table, obviously. :)

Well... I did not know about McKenzie ratio (that you called
Rn in your tests), but as a matter of fact we are doing the same
thing, unless I miss something:

      Rn = 1/score

The bigger my "score" is, the lower Rn is, the better the hash function
is.

While there.. I took ten minutes to remove the pipeline stall in
my code (I think compilers should do their -O2 job better :P), this
is the new toy, always faster than perl hash:

unsigned int
nemhash(unsigned char *str, size_t len)
{
	unsigned int    hash;

        hash = 0;
        while (len--)
		hash += (hash << 8) + *str++;

	return (hash + (hash << 8));
}

Tests here say:

[nemesi:~/Desktop/hashtest] blackye% cat filenames /usr/share/dict/words | ./hash s 32769
total items:  353407
hash dumb         score 5.318160e-02   time 0.285sec
hash fnv          score 9.174211e-01   time 0.427sec
hash lennart      score 9.160142e-01   time 0.383sec
hash crc          score 9.101253e-01   time 0.398sec
hash perl         score 8.923864e-01   time 0.355sec
hash mouse        score 6.621882e-01   time 0.421sec
hash nemesi       score 9.162064e-01   time 0.340sec
[nemesi:~/Desktop/hashtest] blackye% cat filenames /usr/share/dict/words | ./hash s 65536
total items:  353407
hash dumb         score 2.677079e-02   time 0.303sec
hash fnv          score 8.507784e-01   time 0.400sec
hash lennart      score 8.496121e-01   time 0.389sec
hash crc          score 4.150457e-01   time 0.411sec
hash perl         score 8.514676e-01   time 0.346sec
hash mouse        score 2.490107e-01   time 0.441sec
hash nemesi       score 7.981116e-01   time 0.334sec
[nemesi:~/Desktop/hashtest] blackye%

Ciao,

A.

-- 
Andrea Cocito
Director of the Biocomputing Research Unit
Department of Experimental Oncology
Europen Institute of Oncology
Via Ripamonti 435, Milano - Italy
Tel. +39-02-57489857
Fax. +39-02-57489851