Subject: Re: New hash algorithm - FNV - for use in the kernel
To: David Laight <David.Laight@btinternet.com>
From: Luke Mewburn <lukem@wasabisystems.com>
List: tech-perform
Date: 11/27/2001 22:17:13
On Tue, Nov 27, 2001 at 09:21:36AM -0000, David Laight wrote:
> > 1) vmstat -h nfsnodehash:
> >                     total     used     util      num  average  maximum
> > hash table        buckets  buckets        %    items    chain    chain
> > ----------------------------------------------------------------------
> > 
> >     with existing nfs_hash():
> > nfsnodehash         65536     1000     1.53    34688       34      105
> 
> The 1000 used buckets looks too much like a non-random number!
> Anyone know whether this is true?

I've run a bunch of other tests. It's just a random sample. Here's
the non FNV results from another machine on the same data set:
	nfsnodehash          8192      770     9.40     4077        5       17

Take a look at the existing implementation of nfs_hash(); it does not
appear to me that there's a deliberate part of the code that's trying
to specifically limit the hash distribution to around 1000. The code
currently just sums up all the bytes in the filehandle and returns
that as the hash value.


Also, I'm interested in some real-world tests of proposed improved
implementations. It's all well and good for people to fall into
the trap of making statements like ``oh, if you make this [non
trivial] change to the algorithm it will have this benefit'', but
unless an implementation is provided to be actually tested and
proven, the statements are generally all academic.

(Note; I'm not singling you out here; this is a general personal
observation on many such discussions that occur about proposed
enhancements to NetBSD).