tech-kern: Re: more Q&D results [hash for use in the kernel]

Subject: Re: more Q&D results [hash for use in the kernel]
To: Simon Burge <simonb@wasabisystems.com>
From: Andrea Cocito <acocito@lar.ieo.it>
List: tech-kern
Date: 11/29/2001 18:24:44
Simon Burge Said:
>I've done some more playing, and generated some filehandles (look at
>getfh(2) for a handy syscall).  Here's the outputs, sorted by time per
>hash:

Would you post a link to the source for this test and the data for the test
suite you used too ?

This is what I get with the previous code adding my function
(A PowerPC G4 @466 Mhz, compiled with -O2):

[nemesi:~/Desktop/hashtest] blackye% cat filenames /usr/share/dict/words | ./hash s 32769
total items:  353407
hash dumb         used  16990   max    625   time 0.291sec
hash fnv          used  32769   max     28   time 0.409sec
hash lennart      used  32768   max     28   time 0.385sec
hash crc          used  32769   max     35   time 0.394sec
hash perl         used  32084   max     27   time 0.346sec
hash mouse        used  32720   max     79   time 0.423sec
hash nemesi       used  32768   max     27   time 0.343sec

("filenames" the output of "find /" on my machine)

Function "nemesi" is what I posted previously:

unsigned int
nemhash(unsigned char *str, size_t len)
{
        unsigned int hash;

        hash = 0;
        while (len-- > 0) {
                hash += *str++;
                hash += (hash << 8);
        }

        return (hash);
}

I would also note that it is not the maximum lengts of the
worst chain what matters but the sum of squares of the lengths.

I mean: I consider more correct to evaluate the "good spreading"
of the hash function something like:

        /* long loops, nodes, perfect, tmp; */
        loops = 0;
        for (i = 0; i < hashsize; i++) {
                loops += (buckets[i] * buckets[i]);
        }

        tmp = items % hashsize;
        perfect =  (items / hashsize) * (items / hashsize) * (hashsize - tmp)
                 + (1 + (items / hashsize)) * (1 + (items / hashsize)) * (tmp);

        /* double score */
        score = ((double) perfect) / ((double) loops);

        printf("hash %-10s   score %e   time %ld.%03ldsec\n",
            name, score, (long)t3.tv_sec, (long)t3.tv_usec / 1000);


Where you get as score the ratio between the number of loops across
the linked lists to lookup once each element you hashed with this
function respect to what you would get with a "perfect" hash function
that stores exactly either "n" or "n+1" elements in each bucket.
(You won't get a score = 1 even with an ideal random generator,
the bigger the score is the better the function is)

With this test this is what I get (this time compiled with -O3
and running as root to have more data...):

[nemesi:~/Desktop/hashtest] blackye% cat filenames /usr/share/dict/words | ./hash s 32769
total items:  353407
hash dumb         score 5.318160e-02   time 0.289sec
hash fnv          score 9.174211e-01   time 0.408sec
hash lennart      score 9.160142e-01   time 0.396sec
hash crc          score 9.101253e-01   time 0.377sec
hash perl         score 8.923864e-01   time 0.355sec
hash mouse        score 6.621882e-01   time 0.422sec
hash nemesi       score 9.162064e-01   time 0.357sec
[nemesi:~/Desktop/hashtest] blackye% cat filenames /usr/share/dict/words | ./hash s 65536
total items:  353407
hash dumb         score 2.677079e-02   time 0.289sec
hash fnv          score 8.507784e-01   time 0.420sec
hash lennart      score 8.496121e-01   time 0.404sec
hash crc          score 4.150457e-01   time 0.382sec
hash perl         score 8.514676e-01   time 0.367sec
hash mouse        score 2.490107e-01   time 0.426sec
hash nemesi       score 7.981116e-01   time 0.365sec
[nemesi:~/Desktop/hashtest] blackye%

Looking at the asm cocan say that the perl hash tends to be slightly
faster with long sequences and on moderm machines (does one
multiplication by constant inside the loop and one more operation
outside the loop) while mine tends to be slightly faster on short
strings and on older archictectures (causes one more pipeline stall
in the inner loop, but does no multiplication).

The distribution is more or less as good for each as for the other.

Would you send me a link to the data suite you used for testing ?
I can test with several different platforms here (PPC G3 and G4,
Alpha 21164, alpha ev7, SGI, Intel, mc68k and others).

Ciao,

Andrea aka Nemesi