Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <tech-perform@netbsd.org, tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/28/2001 15:44:26
Jason R Thorpe wrote:
> [ ... ] If the Perl hash function gives
> distribution that is as good (or maybe even better?) as FNV for
> a given application, then obviously the Perl hash routine is the
> better choice.
Running a couple of the hashes mentioned here of two different sorts
of data revail some interesting results. The tests use 65536 buckets
(hashfunction % 64k) and just count the number of used buckets and the
maximum count of each bucket.
The hash functions are:
- dumb
the current "add all the bytes together" function.
- fnv
from <sys/fnv_hash.h>
- lennart
the hash function Lennart Augustsson posted to these lists.
- crc
the crc hash function from
http://www.cs.hmc.edu/~geoff/classes/hmc.cs070.200109/homework10/hashfuncs.html
- perl
the perl hash function Tom Spindler posted to these lists.
First is 92378 file names, all starting with "/home/simonb/src":
total: 92378
dumb:
used: 7201
max: 63
fnv:
used: 49709
max: 9
lennart:
used: 49508
max: 8
crc:
used: 46967
max: 80
perl:
used: 49413
max: 9
Second is 32MB of output from /dev/urandom, hashed as 32byte
streams:
total: 1048576
dumb:
used: 3167
max: 1076
fnv:
used: 65536
max: 35
lennart:
used: 65536
max: 4132
crc:
used: 65536
max: 35
perl:
used: 65536
max: 36
From these two test cases, both FNV and perl look to be best. Given
6 "1"s in the FNV multiplacation constant and just 2 "1"s in the perl
multiplacation constant, I'd say the perl hash function is looking good.
It was interesting to note the "lennart" hash function didn't fare too
well for binary data, and the "crc" hash function didn't do too well
over string data.
Obviously more testing should be done first though...
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD CDs, Support and Service: http://www.wasabisystems.com/