Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <tech-perform@netbsd.org, tech-kern@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: tech-kern
Date: 11/28/2001 17:13:49
der Mouse wrote:
> If you want something CRCish, I'd suggest something like
>
> unsigned int hash(const void *data, int len)
> {
> const unsigned char *dp;
> unsigned int v;
>
> v = ~0;
> for (dp=data;len>0;dp++,len--)
> { v ^= *dp;
> v = (v >> 1) ^ ((v & 1) ? 0xedb88320 : 0);
> }
> return(v);
> }
Adding this as "mouse" gets for filenames:
total: 92378
dumb: used: 7201 max: 63
fnv: used: 49709 max: 9
lennart: used: 49508 max: 8
crc: used: 46967 max: 80
perl: used: 49413 max: 9
mouse: used: 49137 max: 9
and binary data:
total: 1048576
dumb: used: 3167 max: 1076
fnv: used: 65536 max: 35
lennart: used: 65536 max: 4132
crc: used: 65536 max: 35
perl: used: 65536 max: 36
mouse: used: 65536 max: 35
> > crc:
> > used: 46967
> > max: 80
>
> Very interesting. The buckets-used value is good, but there's that
> high max. A histogram of the bucket loads would be intersting, and if
> there really is just one peak, it would be interesting to analyze the
> similarities among the strings that hit it.
Without analysis, >99% of entries in the string case were < 10, with the
last couple (sorted numerically) being:
3181: 26
13721: 26
41076: 26
16296: 27
11178: 31
24223: 31
39751: 32
21170: 77
18108: 80
Similarly, the "lennart" hash with binary data finishes off with
252: 42
3: 43
195: 44
76: 46
0: 4132
If I change it so that it doesn't stop on a zero byte (!), the figure
look a bit better, but not great:
lennart: used: 30316 max: 212
with the largest buckets being:
35668: 172
45852: 172
46032: 173
35654: 174
23575: 175
46020: 186
60616: 186
45824: 193
45969: 208
35681: 212
> > From these two test cases, both FNV and perl look to be best.
>
> Agreed, though I'd be interested to see similar results for, say, a
> data set consisting of several thousand unique filehandles sniffed from
> NFS traffic, since it seems that's one of the major applications. (At
> least, it's the one that got us onto this thread. :-)
I was lazy. If you want to shoot several thousand filehandles to
me, feel free :-)
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD CDs, Support and Service: http://www.wasabisystems.com/