Subject: Re: New hash algorithm - FNV - for use in the kernel
To: Luke Mewburn <lukem@wasabisystems.com>
From: ozan s. yigit <oz@zonzorp.canada.sun.com>
List: tech-kern
Date: 11/28/2001 11:42:41
| I recently added <sys/fnv_hash.h> (from FreeBSD), which provides an
| implementation of the Fowler / Noll / Vo (FNV) hash, as described at:
| http://www.isthe.com/chongo/tech/comp/fnv/
|
| I've done some initial investigation and testing of the effectiveness
| of this hash algorithm, and I'm getting similar improvements that
| Peter Wemm (of FreeBSD) found when he switched parts of FreeBSD over
| to using the FNV hash.
|
| [etc]
i'm joining this discussion a bit late, so sorry if i say anything that
has been said before. there are a number of very fast functions that do
as well or better than Fowler/Vo[1] in speed and distribution, at least
for basic string inputs, such as all filenames in netbsd /usr, using a
table size of 8k. [tests on a voyager running netbsd 1.5, which i happen
to have sitting next to me] here, i have dusted off a test harness i
use for non-cryptographic hash functions [but i rarely see one that is
worth testing.]
stdin: 13148 words
hashtable size 8192
W(W+N)/2N = 17125
unused coll avg st lookup
function buckets free Rn chain dev time
thinking machines/wais 20.59% 20.00% 1.393 2.0 1.282 0.267
knuth/TAOCP 28.53% 19.91% 1.946 2.2 1.989 0.274
icon (k=1) 66.02% 5.00% 3.656 4.7 3.333 0.314
bernstein 20.35% 20.19% 1.388 2.0 1.274 0.244
phong vo (k=37) 20.02% 20.35% 1.388 2.0 1.273 0.235
larson/linear 19.46% 20.35% 1.375 2.0 1.253 0.250
oz/num (k=153) 19.90% 20.44% 1.381 2.0 1.263 0.259
bernstein (k=33) 20.24% 19.97% 1.385 2.0 1.269 0.245
phong vo 19.59% 20.55% 1.383 2.0 1.265 0.296
K&R/torek (k=31) 20.06% 20.25% 1.385 2.0 1.269 0.222
corbett/byacc 19.71% 20.57% 1.386 2.0 1.270 0.194
pcc/cpp/c++ (k=2) 35.36% 17.31% 2.106 2.5 2.151 0.266
gnu emacs 29.16% 19.59% 1.868 2.3 1.905 0.272
drdobbs/john boyer 20.24% 19.67% 1.381 2.0 1.262 0.451
danny sleator/parse 26.16% 18.89% 1.577 2.2 1.553 0.256
holzmann/protocols 22.09% 19.54% 1.432 2.1 1.344 0.239
rayan/ssl/zmailer 48.14% 16.96% 6.274 3.1 4.696 0.827
oz/sdbm (k=65587) 19.67% 20.62% 1.383 2.0 1.265 0.284
ken thompson (dbm) 20.23% 20.00% 1.386 2.0 1.270 0.401
weinberger/dragon book 47.07% 13.55% 2.705 3.0 2.670 0.328
mersenne 20.07% 20.03% 1.380 2.0 1.260 0.578
pearson/cargill 20.59% 20.01% 1.398 2.0 1.290 0.191
knuth/graphbase (k=314159) 19.45% 20.59% 1.374 2.0 1.251 0.297
hanson/fraser/lcc 34.27% 17.77% 1.986 2.4 2.030 0.289
oz/sdbm (k=65599) 20.29% 20.06% 1.385 2.0 1.269 0.258
stu wecker/decus diff 19.89% 20.01% 1.380 2.0 1.260 0.258
gnu cpp (k=4) 51.88% 13.75% 12.322 3.3 6.880 1.261
gonnet/baeza-yates (k=131) 19.34% 20.88% 1.377 2.0 1.255 0.235
kim walden/depend 26.11% 19.19% 1.618 2.2 1.607 0.254
phong vo (k=129) 19.95% 20.40% 1.387 2.0 1.272 0.229
un*x refer 99.22% 0.00% 79.662 205.4 18.136 4.580
stevens/db 21.03% 19.88% 1.404 2.0 1.300 0.415
fowler/noll/vo 20.04% 20.30% 1.387 2.0 1.273 0.276
gcc front-end (k=613) 19.63% 20.51% 1.378 2.0 1.258 0.245
horspool/bsdbook (k=13753) 20.83% 19.51% 1.390 2.0 1.277 0.290
c++/chop 52.37% 3.98% 2.020 3.4 2.065 0.237
numerical recipes (ranqd1) 20.30% 19.89% 1.382 2.0 1.264 0.297
honeyman/pathalias 22.01% 20.02% 1.538 2.1 1.500 0.260
some of these functions should be familiar. note that fowler/noll/vo
is nothing special in this set of tests, and over the years i have been
collecting and running these tests, I have *never* had any good reason
to use it. it is just another better-than-average hash function, and
happens to have a lot of chongo drum beating[1]. [gee, i guess i should
start a web page and a mailing list for my own function, which appears
in many places, and has the claim to fame of having a prime number as
its multiplicative constant that i picked out of thin air, as it gave
excellent bit distribution in sdbm, and can be converted to use shifts
and addition:
foreach (c in str) hash = c + (hash << 6) + (hash << 16) - hash; ]
now, when it comes to binary data, things may differ somewhat, but i have
not really tested binary data with this gear, partly because i have no idea
what kind of binary data i may want to hash. if anyone can provide a
substential data file for the kind of things kernel needs to hash, I
would love to test it. I will also add the few new functions mentioned
here, eg. mouse's CRC variant etc. [i should add that this harness
really needs a rewrite, so don't ask for it yet; i know some of these
functions do better with pow2 tables, and others do well with primes.
my next harness will allow a better separation of the two kinds, so
that the evaluation could be cleaner]
i guess i should do a more comprehensive set of tests, with graphs to
show that people have many more choices than the best-advertised function
that is around.
more on this anon.
oz
---
[1] reminiscent of mark twain's observation about chickens screaming
as if they laid a bomb, not an egg...
---
ozan s. yigit staff engineer, sun microsystems/es
http://www.cs.yorku.ca/~oz ozan.yigit@sun.com || +1 [905] 415 2878
---
calvin: careful - we don't want to learn from this!
i support CAFE: www.eff.org/cafe