Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <,>
From: Simon Burge <>
List: tech-perform
Date: 11/28/2001 15:44:26
Jason R Thorpe wrote:

> [ ... ]  If the Perl hash function gives
> distribution that is as good (or maybe even better?) as FNV for
> a given application, then obviously the Perl hash routine is the
> better choice.

Running a couple of the hashes mentioned here of two different sorts
of data revail some interesting results.  The tests use 65536 buckets
(hashfunction % 64k) and just count the number of used buckets and the
maximum count of each bucket.

The hash functions are:

 - dumb
	the current "add all the bytes together" function.
 - fnv
	from <sys/fnv_hash.h>
 - lennart
	the hash function Lennart Augustsson posted to these lists.
 - crc
	the crc hash function from
 - perl
	the perl hash function Tom Spindler posted to these lists.

First is 92378 file names, all starting with "/home/simonb/src":

	total:  92378
	  used: 7201
	  max:  63
	  used: 49709
	  max:  9
	  used: 49508
	  max:  8
	  used: 46967
	  max:  80
	  used: 49413
	  max:  9

Second is 32MB of output from /dev/urandom, hashed as 32byte

	total:  1048576
	  used: 3167
	  max:  1076
	  used: 65536
	  max:  35
	  used: 65536
	  max:  4132
	  used: 65536
	  max:  35
	  used: 65536
	  max:  36

From these two test cases, both FNV and perl look to be best.  Given
6 "1"s in the FNV multiplacation constant and just 2 "1"s in the perl
multiplacation constant, I'd say the perl hash function is looking good.

It was interesting to note the "lennart" hash function didn't fare too
well for binary data, and the "crc" hash function didn't do too well
over string data.

Obviously more testing should be done first though...

Simon Burge                            <>
NetBSD CDs, Support and Service: