Subject: Re: New hash algorithm - FNV - for use in the kernel
To: Luke Mewburn <tech-perform@netbsd.org>
From: David Laight <David.Laight@btinternet.com>
List: tech-kern
Date: 11/27/2001 09:21:36
> There is a potential issue however; FNV uses multiply (*) and xor (^),
> whereas a lot of the existing hashes on strings and buffers just use 
> addition (+).

Using multiply will (probably) stuff the performance of sytems that
don't have fast hardware multiply (eg sparc32).

> 1) vmstat -h nfsnodehash:
>                     total     used     util      num  average  maximum
> hash table        buckets  buckets        %    items    chain    chain
> ----------------------------------------------------------------------
> 
>     with existing nfs_hash():
> nfsnodehash         65536     1000     1.53    34688       34      105

The 1000 used buckets looks too much like a non-random number!
Anyone know whether this is true?
> 
>     using fnv_32_buf()
> nfsnodehash         65536    27054    41.28    34867        1        6

One problem, I have with these kernel hash tables, is that you end up
allocating a massive table - and require a good (computationally expensive)
hash function - to get good performance on large systems, wasting the space
on smaller systems.  Or generate a kernel configuration constant that is
never tuned for the (current) workload of the system.

It often isn't viable to make the hash table (and function) such that the
load factor is < 1 - so there are almost always chains.  Leaving your lookup
O(n) for large data sets.

Surely it would be better to find an ordering relation and put the data
into a balanced binary tree.  For large data sets this is O(log n).

Of course it is probably worth having a cache of recently acessed items
- but this can be much smaller and only needs a cheap hash function.

    David