Subject: Re: New hash algorithm - FNV - for use in the kernel
To: None <,>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 11/28/2001 00:20:34
>  - crc
> 	the crc hash function from

After reading it, I would hesitate to call that a CRC.  It may be
possible to look at it as a cRC with a sufficiently uninteresting
polynomial, but I'm not sure even of that.

If you want something CRCish, I'd suggest something like

unsigned int hash(const void *data, int len)
 const unsigned char *dp;
 unsigned int v;

 v = ~0;
 for (dp=data;len>0;dp++,len--)
  { v ^= *dp;
    v = (v >> 1) ^ ((v & 1) ? 0xedb88320 : 0);

which does actually bear a visible resemblance to a CRC computation.
(I'd have to think about it to be sure, but offhand I think it's
equivalent to XORing all bytes together, each offset by one bit from
the next, and doing a CRC of the resulting bitstring.  I'd suggest a
real CRC, except that they tend to use either lots of looping or
comparatively large tables - though you can CRC one byte with two
references to a 16-entry table, so it may not be too bad.)

> 	crc:
> 	  used: 46967
> 	  max:  80

Very interesting.  The buckets-used value is good, but there's that
high max.  A histogram of the bucket loads would be intersting, and if
there really is just one peak, it would be interesting to analyze the
similarities among the strings that hit it.

I'll have to do something similar myself and see if a "random" set of
pathnames from my system produces similar results.

> From these two test cases, both FNV and perl look to be best.

Agreed, though I'd be interested to see similar results for, say, a
data set consisting of several thousand unique filehandles sniffed from
NFS traffic, since it seems that's one of the major applications.  (At
least, it's the one that got us onto this thread. :-)

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B