Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <tech-perform@netbsd.org, tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/29/2001 14:44:26
Darren Reed wrote:

> how long does each of these take to compute 100,000 hashes ?

So I'd mostly finished modifying my program last night before Ozan
first posted (feh!).  I've had a couple of requests for the source
code - it's at:

	ftp://ftp.netbsd.org/pub/NetBSD/misc/simonb/hashtest.tar.gz

Note that I don't pretend it to be the model of clean programming,
it's one of those cobbled-together things without any real design
thought.  At least I made it compile with WARNS=2 before I tar'd it
up....

The usage (which I forgot to document anywhere near the source code) is:

	hashtest s <numbuckets>
	hashtest b <numbuckets> <datumsize>

for new-line terminated string data and binary data respectively, with
input from stdin.  Error checking is minimal :)

For my 90000-old file list on a pc164 (500MHz alpha 21164a) I see

	./hashtest s 32769 < flist.2
	total items:  92383
	hash dumb         used   7201   max     63   time 0.192sec
	hash fnv          used  30785   max     10   time 0.280sec
	hash lennart      used  30900   max     13   time 0.303sec
	hash crc          used  30783   max     12   time 0.281sec
	hash perl         used  30137   max     12   time 0.231sec
	hash mouse        used  30666   max     12   time 0.319sec

and for 1M 32-byte random binary data I see

	./hashtest b 65536 32 < rnd.2
	total items:  1048576
	hash dumb         used   3167   max   1076   time 1.242sec
	hash fnv          used  65536   max     35   time 1.400sec
	hash lennart      used  65536   max   4132   time 1.193sec
	hash crc          used  65536   max     35   time 1.393sec
	hash perl         used  65536   max     36   time 1.140sec
	hash mouse        used  65536   max     35   time 1.611sec

Note that I don't think the "lennart" test is right for binary data at
the moment.

For reference, I ran the string test on a pc532 (25MHz ns32532) with a
reduced filename list (otherwise it was swapping!):

	./hashtest s 65536 < flist.3
	total items:  40000
	hash dummy        used      1   max  40000   time  7.289sec
	hash dumb         used   6157   max     29   time  9.261sec
	hash fnv          used  29939   max      6   time 11.718sec
	hash lennart      used  29991   max      7   time  9.786sec
	hash crc          used  27899   max     31   time  9.168sec
	hash perl         used  29931   max      6   time  8.610sec
	hash mouse        used  29599   max      7   time  9.546sec

The "dummy" test is just to make sure all of the data is paged in.  The
MULi instruction on a '532 is slower than a "shift and add" that the
"perl" hash uses (and compiles to on the '532), but faster than the "six
shifts and adds" that the "fnv" hash would use (which gcc was smart
enough to use MULi for).


As for data sets, a "find -x / -print" list works just as well as my
~/src file list (I have /usr and /usr/pkg on my root fs), and the binary
data was the output of "dd if=/dev/urandom of=rnd.2 bs=32 count=1m"

Simon.
--
Simon Burge                            <simonb@wasabisystems.com>
NetBSD CDs, Support and Service:    http://www.wasabisystems.com/