Subject: more Q&D results [hash for use in the kernel]
To: None <tech-kern@netbsd.org, tech-perform@netbsd.org>
From: ozan s. yigit <oz@zonzorp.canada.sun.com>
List: tech-kern
Date: 11/28/2001 13:55:30
i include two sets of results, consider them preliminary and also consider
reading a paper published long ago: Mckenzie et al. "Selecting a Hashing
Algorithm" [1]. the test harness that generates these results was written
for a follow-up paper i was once working on. [i guess I should dust that
up and finish it] the important thing in these results is the Rn, which
can be described as the "closeness" to a perfect function for that
particular set of data. 1 would be perfect scatter but of course
we will be happy with anything that is very close.
just for fun, these results are for all C indentifiers in X11R4
(i think; it was gathered a time ago) hashed to a) an 8K table, and
b) a prime table closest to 8K.
i added the "perl" hash function that was posted here; it is a variant
of the bernstein's k=33 hash, and the multiplication can be eliminated.
it seems to do better than FNV for anything i have so far tried. it can
be implemented with shifts and additions only, so does not need to pay
multiplication cost.
don't get hang-up in these results; i can tell you that bernstein and its
variants generally does well, along with few others. as i said before
FNV is nothing extra special, so far as i can tell from running these
tests since SP&E paper appeared over various types of string data.
more on this anon.
oz
---
[1] B. J. McKenzie, R. Harries, T. Bell.
Selecting a Hashing Algorithm
Software - Practice and Experience
20(2):209-224, Feb 1990.
ozan s. yigit staff engineer, sun microsystems/es
http://www.cs.yorku.ca/~oz ozan.yigit@sun.com || +1 [905] 415 2878
---
the pen is mightier than the sword if the sword is very short and the
pen is very sharp -- Terry Pratchett
--- 8192 ---
X11ID: 16125 words
hashtable size 8192
W(W+N)/2N = 23932
unused coll avg st lookup
function buckets free Rn chain dev time
thinking machines/wais 13.87% 14.10% 1.339 2.3 1.408 0.168
knuth/TAOCP 20.64% 14.83% 1.777 2.5 2.131 0.140
icon (k=1) 71.78% 2.51% 4.173 7.0 4.306 0.136
bernstein 14.37% 13.84% 1.347 2.3 1.424 0.160
phong vo (k=37) 13.68% 14.47% 1.335 2.3 1.400 0.136
larson/linear 13.98% 13.98% 1.338 2.3 1.405 0.146
oz/num (k=153) 14.23% 14.16% 1.344 2.3 1.419 0.161
bernstein (k=33) 13.53% 14.12% 1.326 2.3 1.380 0.161
phong vo 14.47% 13.63% 1.339 2.3 1.407 0.168
K&R/torek (k=31) 14.55% 13.67% 1.345 2.3 1.421 0.132
corbett/byacc 13.83% 14.02% 1.334 2.3 1.396 0.097
pcc/cpp/c++ (k=2) 20.14% 14.00% 1.590 2.5 1.857 0.104
gnu emacs 24.34% 14.24% 1.986 2.6 2.400 0.140
drdobbs/john boyer 14.47% 13.90% 1.344 2.3 1.418 0.237
danny sleator/parse 15.95% 13.64% 1.380 2.3 1.490 0.148
holzmann/protocols 16.21% 14.24% 1.413 2.3 1.554 0.116
perl 13.22% 14.16% 1.322 2.3 1.372 0.132
rayan/ssl/zmailer 25.51% 14.77% 1.951 2.6 2.357 0.262
oz/sdbm (k=65587) 13.39% 13.98% 1.326 2.3 1.380 0.173
ken thompson (dbm) 14.12% 13.84% 1.338 2.3 1.405 0.213
weinberger/dragon book 33.44% 11.90% 2.072 3.0 2.502 0.142
mersenne 13.71% 14.21% 1.334 2.3 1.396 0.280
pearson/cargill 14.84% 14.12% 1.357 2.3 1.444 0.096
knuth/graphbase (k=314159) 14.25% 13.68% 1.344 2.3 1.417 0.165
hanson/fraser/lcc 17.09% 14.32% 1.523 2.4 1.749 0.146
oz/sdbm (k=65599) 14.31% 13.85% 1.339 2.3 1.407 0.162
stu wecker/decus diff 14.28% 13.96% 1.340 2.3 1.409 0.130
gnu cpp (k=4) 36.21% 12.71% 3.203 3.1 3.588 0.146
gonnet/baeza-yates (k=131) 13.94% 13.79% 1.330 2.3 1.389 0.140
kim walden/depend 32.19% 15.60% 4.068 2.9 4.234 0.177
phong vo (k=129) 14.05% 14.20% 1.350 2.3 1.430 0.139
un*x refer 36.90% 7.28% 1.936 3.1 2.339 0.286
stevens/db 21.03% 13.62% 1.556 2.5 1.802 0.197
fowler/noll/vo 14.20% 14.21% 1.347 2.3 1.424 0.159
gcc front-end (k=613) 13.90% 14.13% 1.343 2.3 1.415 0.146
horspool/bsdbook (k=13753) 13.61% 13.74% 1.328 2.3 1.383 0.183
c++/chop 51.32% 2.31% 2.069 4.0 2.499 0.113
numerical recipes (ranqd1) 13.70% 14.18% 1.337 2.3 1.404 0.168
honeyman/pathalias 20.28% 15.47% 4.381 2.5 4.445 0.189
--- 8209 ---
X11ID: 16125 words
hashtable size 8209
W(W+N)/2N = 23899
unused coll avg st lookup
function buckets free Rn chain dev time
thinking machines/wais 14.08% 13.82% 1.339 2.3 1.404 0.173
knuth/TAOCP 14.28% 14.15% 1.344 2.3 1.416 0.135
icon (k=1) 71.84% 2.51% 4.179 7.0 4.303 0.136
bernstein 13.67% 14.10% 1.332 2.3 1.391 0.164
phong vo (k=37) 14.46% 13.49% 1.337 2.3 1.400 0.139
larson/linear 14.14% 14.00% 1.343 2.3 1.413 0.150
oz/num (k=153) 14.24% 13.98% 1.344 2.3 1.415 0.158
bernstein (k=33) 14.08% 14.02% 1.338 2.3 1.403 0.157
phong vo 14.34% 13.69% 1.338 2.3 1.402 0.167
K&R/torek (k=31) 13.75% 14.26% 1.339 2.3 1.404 0.132
pcc/cpp/c++ (k=2) 17.10% 13.59% 1.414 2.4 1.554 0.098
gnu emacs 14.39% 14.10% 1.346 2.3 1.419 0.130
drdobbs/john boyer 14.59% 13.84% 1.344 2.3 1.415 0.212
danny sleator/parse 14.35% 14.13% 1.345 2.3 1.417 0.145
holzmann/protocols 17.16% 13.99% 1.437 2.4 1.595 0.116
perl 14.11% 14.07% 1.337 2.3 1.401 0.132
rayan/ssl/zmailer 14.42% 14.07% 1.349 2.3 1.426 0.255
oz/sdbm (k=65587) 13.74% 14.02% 1.331 2.3 1.387 0.182
ken thompson (dbm) 13.36% 14.27% 1.327 2.3 1.380 0.211
weinberger/dragon book 13.74% 14.28% 1.336 2.3 1.398 0.130
mersenne 13.34% 14.07% 1.325 2.3 1.375 0.280
pearson/cargill 14.24% 13.98% 1.344 2.3 1.415 0.102
knuth/graphbase (k=314159) 14.01% 14.21% 1.336 2.3 1.398 0.168
hanson/fraser/lcc 14.37% 13.84% 1.342 2.3 1.412 0.161
oz/sdbm (k=65599) 13.90% 14.07% 1.339 2.3 1.405 0.163
stu wecker/decus diff 14.03% 14.18% 1.344 2.3 1.416 0.128
gnu cpp (k=4) 15.42% 14.31% 1.418 2.3 1.560 0.118
gonnet/baeza-yates (k=131) 13.86% 14.45% 1.338 2.3 1.404 0.140
kim walden/depend 25.65% 15.62% 2.492 2.6 2.948 0.149
phong vo (k=129) 13.57% 14.15% 1.330 2.3 1.386 0.136
un*x refer 13.64% 14.20% 1.330 2.3 1.386 0.271
stevens/db 20.70% 14.39% 1.559 2.5 1.805 0.201
fowler/noll/vo 14.00% 14.16% 1.342 2.3 1.411 0.159
gcc front-end (k=613) 13.83% 14.15% 1.334 2.3 1.394 0.143
horspool/bsdbook (k=13753) 14.00% 14.13% 1.340 2.3 1.408 0.187
c++/chop 51.42% 2.31% 2.071 4.0 2.498 0.124
numerical recipes (ranqd1) 13.95% 13.99% 1.339 2.3 1.405 0.164
honeyman/pathalias 13.95% 14.01% 1.331 2.3 1.389 0.146
--- end ---