Subject: Re: New hash algorithm - FNV - for use in the kernel
To: Luke Mewburn <lukem@wasabisystems.com>
From: ozan s. yigit <oz@zonzorp.canada.sun.com>
List: tech-kern
Date: 11/28/2001 11:42:41
| I recently added <sys/fnv_hash.h> (from FreeBSD), which provides an
| implementation of the Fowler / Noll / Vo (FNV) hash, as described at:
| 	http://www.isthe.com/chongo/tech/comp/fnv/
| 
| I've done some initial investigation and testing of the effectiveness
| of this hash algorithm, and I'm getting similar improvements that
| Peter Wemm (of FreeBSD) found when he switched parts of FreeBSD over
| to using the FNV hash.
|
| [etc]

i'm joining this discussion a bit late, so sorry if i say anything that
has been said before. there are a number of very fast functions that do
as well or better than Fowler/Vo[1] in speed and distribution, at least
for basic string inputs, such as all filenames in netbsd /usr, using a
table size of 8k. [tests on a voyager running netbsd 1.5, which i happen
to have sitting next to me] here, i have dusted off a test harness i
use for non-cryptographic hash functions [but i rarely see one that is
worth testing.]

    stdin: 13148 words
    hashtable size 8192
    W(W+N)/2N = 17125
                                unused  coll          avg     st     lookup
    function 	               buckets  free    Rn   chain   dev    time
    thinking machines/wais      20.59%  20.00%  1.393  2.0   1.282   0.267
    knuth/TAOCP                 28.53%  19.91%  1.946  2.2   1.989   0.274
    icon (k=1)                  66.02%   5.00%  3.656  4.7   3.333   0.314
    bernstein                   20.35%  20.19%  1.388  2.0   1.274   0.244
    phong vo (k=37)             20.02%  20.35%  1.388  2.0   1.273   0.235
    larson/linear               19.46%  20.35%  1.375  2.0   1.253   0.250
    oz/num (k=153)              19.90%  20.44%  1.381  2.0   1.263   0.259
    bernstein (k=33)            20.24%  19.97%  1.385  2.0   1.269   0.245
    phong vo                    19.59%  20.55%  1.383  2.0   1.265   0.296
    K&R/torek (k=31)            20.06%  20.25%  1.385  2.0   1.269   0.222
    corbett/byacc               19.71%  20.57%  1.386  2.0   1.270   0.194
    pcc/cpp/c++ (k=2)           35.36%  17.31%  2.106  2.5   2.151   0.266
    gnu emacs                   29.16%  19.59%  1.868  2.3   1.905   0.272
    drdobbs/john boyer          20.24%  19.67%  1.381  2.0   1.262   0.451
    danny sleator/parse         26.16%  18.89%  1.577  2.2   1.553   0.256
    holzmann/protocols          22.09%  19.54%  1.432  2.1   1.344   0.239
    rayan/ssl/zmailer           48.14%  16.96%  6.274  3.1   4.696   0.827
    oz/sdbm (k=65587)           19.67%  20.62%  1.383  2.0   1.265   0.284
    ken thompson (dbm)          20.23%  20.00%  1.386  2.0   1.270   0.401
    weinberger/dragon book      47.07%  13.55%  2.705  3.0   2.670   0.328
    mersenne                    20.07%  20.03%  1.380  2.0   1.260   0.578
    pearson/cargill             20.59%  20.01%  1.398  2.0   1.290   0.191
    knuth/graphbase (k=314159)  19.45%  20.59%  1.374  2.0   1.251   0.297
    hanson/fraser/lcc           34.27%  17.77%  1.986  2.4   2.030   0.289
    oz/sdbm (k=65599)           20.29%  20.06%  1.385  2.0   1.269   0.258
    stu wecker/decus diff       19.89%  20.01%  1.380  2.0   1.260   0.258
    gnu cpp (k=4)               51.88%  13.75%  12.322  3.3   6.880   1.261
    gonnet/baeza-yates (k=131)  19.34%  20.88%  1.377  2.0   1.255   0.235
    kim walden/depend           26.11%  19.19%  1.618  2.2   1.607   0.254
    phong vo (k=129)            19.95%  20.40%  1.387  2.0   1.272   0.229
    un*x refer                  99.22%   0.00%  79.662  205.4  18.136   4.580
    stevens/db                  21.03%  19.88%  1.404  2.0   1.300   0.415
    fowler/noll/vo              20.04%  20.30%  1.387  2.0   1.273   0.276
    gcc front-end (k=613)       19.63%  20.51%  1.378  2.0   1.258   0.245
    horspool/bsdbook (k=13753)  20.83%  19.51%  1.390  2.0   1.277   0.290
    c++/chop                    52.37%   3.98%  2.020  3.4   2.065   0.237
    numerical recipes (ranqd1)  20.30%  19.89%  1.382  2.0   1.264   0.297
    honeyman/pathalias          22.01%  20.02%  1.538  2.1   1.500   0.260

some of these functions should be familiar. note that fowler/noll/vo
is nothing special in this set of tests, and over the years i have been
collecting and running these tests, I have *never* had any good reason
to use it. it is just another better-than-average hash function, and
happens to have a lot of chongo drum beating[1]. [gee, i guess i should
start a web page and a mailing list for my own function, which appears
in many places, and has the claim to fame of having a prime number as
its multiplicative constant that i picked out of thin air, as it gave
excellent bit distribution in sdbm, and can be converted to use shifts
and addition:
foreach (c in str) hash = c + (hash << 6) + (hash << 16) - hash; ]

now, when it comes to binary data, things may differ somewhat, but i have
not really tested binary data with this gear, partly because i have no idea
what kind of binary data i may want to hash. if anyone can provide a
substential data file for the kind of things kernel needs to hash, I
would love to test it. I will also add the few new functions mentioned
here, eg. mouse's CRC variant etc. [i should add that this harness
really needs a rewrite, so don't ask for it yet; i know some of these
functions do better with pow2 tables, and others do well with primes.
my next harness will allow a better separation of the two kinds, so
that the evaluation could be cleaner]

i guess i should do a more comprehensive set of tests, with graphs to
show that people have many more choices than the best-advertised function
that is around.

more on this anon.

oz
---
[1] reminiscent of mark twain's observation about chickens screaming
as if they laid a bomb, not an egg...
---
ozan s. yigit			staff engineer, sun microsystems/es
http://www.cs.yorku.ca/~oz	ozan.yigit@sun.com || +1 [905] 415 2878
---
calvin: careful - we don't want to learn from this!
i support CAFE: www.eff.org/cafe