Subject: Re: more Q&D results [hash for use in the kernel]
To: None <tech-kern@netbsd.org, tech-perform@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/30/2001 02:15:06
"David Laight" wrote:
> How good are these hash functions on non-random binary data?
> Hashing random data is (should be) trivial.
I've done some more playing, and generated some filehandles (look at
getfh(2) for a handy syscall). Here's the outputs, sorted by time per
hash:
> ./hashtest b 262144 28 < fhlist.245
total items: 141431
hash used col/fr max avg s/dev hash time
torek 82948 48632 53 1.71 1.24 2.9861 us
dumb 538 3 563 262.88 166.49 3.7416 us
bob 104911 78337 54 1.35 0.91 3.8239 us
gawk 105737 79303 54 1.34 0.91 4.1064 us
lennart 56030 16700 54 2.52 1.71 4.4145 us
perlxor 87362 53325 53 1.62 1.15 4.5534 us
perl 95307 62480 54 1.48 1.01 4.5536 us
byacc 84908 51442 53 1.67 1.21 4.5555 us
bernstein 95312 62484 54 1.48 1.01 4.5828 us
tcl 62361 30442 53 2.27 1.88 4.7169 us
crc 45092 13904 54 3.14 2.59 5.3558 us
mouse 1604 62 225 88.17 70.49 5.3584 us
python 105190 78668 55 1.34 0.91 5.3858 us
pjw 12346 1615 67 11.46 11.30 5.8344 us
honeyman 84764 42034 54 1.67 1.07 6.1975 us
fnv 103685 76333 54 1.36 0.93 8.4586 us
> cat flist.[245] | ./hashtest s 262144
total items: 141592
hash used col/fr max avg s/dev hash time
torek 109576 82751 6 1.29 0.56 10.7280 us
bob 109438 82526 7 1.29 0.56 11.9131 us
dumb 9270 907 69 15.27 12.06 12.5749 us
gawk 109509 82703 6 1.29 0.56 13.0312 us
lennart 57976 16236 10 2.44 1.32 14.0417 us
perlxor 109434 82583 7 1.29 0.56 14.1793 us
perl 109456 82614 7 1.29 0.56 14.1799 us
byacc 109472 82622 6 1.29 0.56 14.1801 us
tcl 108727 81513 7 1.30 0.58 14.1818 us
bernstein 109425 82572 7 1.29 0.56 14.2112 us
crc 99124 72839 156 1.43 1.57 15.7868 us
mouse 106222 78926 11 1.33 0.66 15.7880 us
python 109373 82461 7 1.29 0.57 15.8151 us
pjw 86060 57144 90 1.65 1.43 17.1417 us
honeyman 102331 74604 129 1.38 1.18 17.4236 us
fnv 109542 82788 6 1.29 0.56 22.0791 us
Also interesting is the size of the functions (not including any data
needed by "lennart" and "honeyman") in bytes for a few architectures.
function sparc alpha ns32k
bernstein 76 68 52
bobhash 708 940 539
byacchash 68 56 46
crchash 80 88 61
dumbhash 52 48 38
fnv 100 92 47
gawkhash 332 396 290
honeyman 92 88 64
lennart 72 132 66
mousehash 80 92 71
perlhash 72 68 52
perlxorhash 72 72 52
pjwhash 84 100 69
pythonhash 92 116 66
tclhash 68 52 42
torekhash 268 300 210
"bob", perhaps the best function in terms of both speed and spread, is
also by far the largest. Is it's size a concern from a Icache polution
POV?
"torek" is the quickest, but is only good and not very good on binary
data. It does however have the obfuscation advantage of using Duff's
device :-)
So, I'm wondering if outright speed or hash distrubtion is the main
criteria for selecting an algorithm? Leaders appear to be:
size doesn't matter: bob
speed important: torek
distrubtion important: gawk
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD CDs, Support and Service: http://www.wasabisystems.com/