Subject: Re: New hash algorithm - FNV - for use in the kernel
To: Luke Mewburn <tech-perform@netbsd.org>
From: David Laight <David.Laight@btinternet.com>
List: tech-kern
Date: 11/27/2001 09:21:36
> There is a potential issue however; FNV uses multiply (*) and xor (^),
> whereas a lot of the existing hashes on strings and buffers just use
> addition (+).
Using multiply will (probably) stuff the performance of sytems that
don't have fast hardware multiply (eg sparc32).
> 1) vmstat -h nfsnodehash:
> total used util num average maximum
> hash table buckets buckets % items chain chain
> ----------------------------------------------------------------------
>
> with existing nfs_hash():
> nfsnodehash 65536 1000 1.53 34688 34 105
The 1000 used buckets looks too much like a non-random number!
Anyone know whether this is true?
>
> using fnv_32_buf()
> nfsnodehash 65536 27054 41.28 34867 1 6
One problem, I have with these kernel hash tables, is that you end up
allocating a massive table - and require a good (computationally expensive)
hash function - to get good performance on large systems, wasting the space
on smaller systems. Or generate a kernel configuration constant that is
never tuned for the (current) workload of the system.
It often isn't viable to make the hash table (and function) such that the
load factor is < 1 - so there are almost always chains. Leaving your lookup
O(n) for large data sets.
Surely it would be better to find an ordering relation and put the data
into a balanced binary tree. For large data sets this is O(log n).
Of course it is probably worth having a cache of recently acessed items
- but this can be much smaller and only needs a cheap hash function.
David