Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <tech-perform@netbsd.org, tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/29/2001 14:44:26
Darren Reed wrote:
> how long does each of these take to compute 100,000 hashes ?
So I'd mostly finished modifying my program last night before Ozan
first posted (feh!). I've had a couple of requests for the source
code - it's at:
ftp://ftp.netbsd.org/pub/NetBSD/misc/simonb/hashtest.tar.gz
Note that I don't pretend it to be the model of clean programming,
it's one of those cobbled-together things without any real design
thought. At least I made it compile with WARNS=2 before I tar'd it
up....
The usage (which I forgot to document anywhere near the source code) is:
hashtest s <numbuckets>
hashtest b <numbuckets> <datumsize>
for new-line terminated string data and binary data respectively, with
input from stdin. Error checking is minimal :)
For my 90000-old file list on a pc164 (500MHz alpha 21164a) I see
./hashtest s 32769 < flist.2
total items: 92383
hash dumb used 7201 max 63 time 0.192sec
hash fnv used 30785 max 10 time 0.280sec
hash lennart used 30900 max 13 time 0.303sec
hash crc used 30783 max 12 time 0.281sec
hash perl used 30137 max 12 time 0.231sec
hash mouse used 30666 max 12 time 0.319sec
and for 1M 32-byte random binary data I see
./hashtest b 65536 32 < rnd.2
total items: 1048576
hash dumb used 3167 max 1076 time 1.242sec
hash fnv used 65536 max 35 time 1.400sec
hash lennart used 65536 max 4132 time 1.193sec
hash crc used 65536 max 35 time 1.393sec
hash perl used 65536 max 36 time 1.140sec
hash mouse used 65536 max 35 time 1.611sec
Note that I don't think the "lennart" test is right for binary data at
the moment.
For reference, I ran the string test on a pc532 (25MHz ns32532) with a
reduced filename list (otherwise it was swapping!):
./hashtest s 65536 < flist.3
total items: 40000
hash dummy used 1 max 40000 time 7.289sec
hash dumb used 6157 max 29 time 9.261sec
hash fnv used 29939 max 6 time 11.718sec
hash lennart used 29991 max 7 time 9.786sec
hash crc used 27899 max 31 time 9.168sec
hash perl used 29931 max 6 time 8.610sec
hash mouse used 29599 max 7 time 9.546sec
The "dummy" test is just to make sure all of the data is paged in. The
MULi instruction on a '532 is slower than a "shift and add" that the
"perl" hash uses (and compiles to on the '532), but faster than the "six
shifts and adds" that the "fnv" hash would use (which gcc was smart
enough to use MULi for).
As for data sets, a "find -x / -print" list works just as well as my
~/src file list (I have /usr and /usr/pkg on my root fs), and the binary
data was the output of "dd if=/dev/urandom of=rnd.2 bs=32 count=1m"
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD CDs, Support and Service: http://www.wasabisystems.com/