tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: radix tree implementation for quota ?



On Mon, 29 Nov 2010 09:57:18 +0000
David Laight <david%l8s.co.uk@localhost> wrote:

> What about the sequence prime, prime*2, prime*3 etc.
> Unless you are going to search for a factor that works, no particular
> value is any better than any other if you allow pathaogical data
> to be selected.
>

Only that keys which are multiples of some even number occur much more 
frequently. Yes you need to have a knowledge of what type of data you are 
hashing, but in most cases hash tables work very well.
 
> Even searching for a factor may not help.
> I looked at the distribution for the elf symbol table in libc.
> The current 'first prime below the power of 2' was no different
> from the '2^n-1' value - except that it is a lot slower to compute.
> 

Yes when hashing strings, power of 2 hash tables work just as well. With 
integers you have to be more careful.

> I would always look for a deterministic algorythm. Hashing to linked
> lists is still o(n), with a hash table with 'h' entries it is just
> 'h' times faster, and then only if the linked lists are equal length.
> Trying to get hash chains of length 1 or 2 requires a lot of knowledge
> of the data in order to generate a suitable hash function.
> And then calculating the hash might be more expensive than a few
> comparisons.

If you are hashing integer values, you can tune the hash table so that every 
bucket will have no more than 1 entry. Usually you simply make hash table size 
a prime number.

As for calculating the hash, modern hardware has fast modulo instructions. A 
very simple test on Intel Atom D510 processor, seems to indicate a single 
'interger % prime' is roughly equal to two integer operations.


Home | Main Index | Thread Index | Old Index