tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: radix tree implementation for quota ?



On Mon, 29 Nov 2010 13:32:27 +0100
Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:

> On Mon, Nov 29, 2010 at 09:31:37AM +0000, Sad Clouds wrote:
> > Joerg I was specifically talking about particular sequences of numbers
> > that result in high collision rates, not random numbers. For example,
> > if you take the following:
> > 
> > 4, 8, 12, 16, 20 ...
> > 
> > Every integer is a multiple of 4. Can you give me a full example of a
> > hash function for power of 2 hash table, that will perform as well as
> > a simple:
> > 
> > interger % prime
> 
> I asked the Python Oracle for a 32bit number and got 3063404725. This
> results in hash(x,n) = ((3063404725ULL * x) >> 32) % n as function.
> For n=32 and 32 keys, I get 8 single collisions. For n=256 and 256 keys,
> I get 39 single collisions. Depending on the choosen random number, the
> collision rate will be higher or lower, but if it is too high, you can
> always just pick a different one and rehash. Theory says that the result
> is probalistically near a random distribution *independent* of the input.
> 
> Joerg

I'm sorry, but this hash function results in too many collisions for power 2 
hash table:

nbuckets = 33554432 /* power of 2 */
nitems = 18000000
hash_function = ((3063404725ULL * num) >> 32) % nbuckets

Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 5161419

Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0

Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 908960


This time with hash table that is prime:

nbuckets = 33554467 /* prime */
nitems = 18000000
hash_function = num % nbuckets

Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 0

Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0

Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 0

I've tried many different hash functions and from my experiments, for hashing 
integers, it's very hard to beat 'number % prime'. There is no magic number 
that you need to derive for every different set of integers, it just works for 
all sorts of integers and gives very good results.


Home | Main Index | Thread Index | Old Index