tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: radix tree implementation for quota ?
On Mon, 29 Nov 2010 13:32:27 +0100
Joerg Sonnenberger <joerg%britannica.bec.de@localhost> wrote:
> On Mon, Nov 29, 2010 at 09:31:37AM +0000, Sad Clouds wrote:
> > Joerg I was specifically talking about particular sequences of numbers
> > that result in high collision rates, not random numbers. For example,
> > if you take the following:
> >
> > 4, 8, 12, 16, 20 ...
> >
> > Every integer is a multiple of 4. Can you give me a full example of a
> > hash function for power of 2 hash table, that will perform as well as
> > a simple:
> >
> > interger % prime
>
> I asked the Python Oracle for a 32bit number and got 3063404725. This
> results in hash(x,n) = ((3063404725ULL * x) >> 32) % n as function.
> For n=32 and 32 keys, I get 8 single collisions. For n=256 and 256 keys,
> I get 39 single collisions. Depending on the choosen random number, the
> collision rate will be higher or lower, but if it is too high, you can
> always just pick a different one and rehash. Theory says that the result
> is probalistically near a random distribution *independent* of the input.
>
> Joerg
I'm sorry, but this hash function results in too many collisions for power 2
hash table:
nbuckets = 33554432 /* power of 2 */
nitems = 18000000
hash_function = ((3063404725ULL * num) >> 32) % nbuckets
Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 5161419
Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0
Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 908960
This time with hash table that is prime:
nbuckets = 33554467 /* prime */
nitems = 18000000
hash_function = num % nbuckets
Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 0
Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0
Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 0
I've tried many different hash functions and from my experiments, for hashing
integers, it's very hard to beat 'number % prime'. There is no magic number
that you need to derive for every different set of integers, it just works for
all sorts of integers and gives very good results.
Home |
Main Index |
Thread Index |
Old Index