[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: radix tree implementation for quota ?
On Tue, 30 Nov 2010 12:57:13 +0000
David Holland <dholland-tech%netbsd.org@localhost> wrote:
> On Mon, Nov 29, 2010 at 11:12:21AM -0500, der Mouse wrote:
> > Without any real data on what UID distribution looks like in practice,
> > we're all speculating in a vacuum here.
> Just for shits and giggles I ran this on a real password file with
> about 350 users that's had lots of churn since it was first
> established. Using Joerg's hash with a table of size 512 gave 90
> collisions; using x % 509 (the nearest available prime) gave 91.
> For comparison I tried some of the (more expensive) hashes from db4
> and got 111, 94, and 84.
> None of the hash functions generated significant hotspots. It seems
> like a nonissue.
> (Of course, hashing 350 things isn't exactly a big challenge...)
> David A. Holland
Well, if hash table is heavily used for looking up keys, you can reduce the
number of collisions even further. The trick is to use different hash
functions, which are very fast to compute:
static inline uint32_t hash_times33(uint32_t n)
return (n << 5) + n;
static inline uint32_t hash_times65(uint32_t n)
return (n << 6) + n;
static inline uint32_t hash_times129(uint32_t n)
return (n << 7) + n;
static inline uint32_t hash_times257(uint32_t n)
return (n << 8) + n;
static inline uint32_t hash_times513(uint32_t n)
return (n << 9) + n;
... and so on.
When you insert an item into hash table, you can use 5 different hashes. If you
get a collision, you try the next hash, until you hit empty bucket or you
reached the last hash function.
Hashing 10000 unique random integer keys, ranging from 0 to 100000 and hash
table with 16384 buckets:
Single times33 hash function:
Collisions = 2247
Five hash functions:
Collisions = 173
Ten hash functions:
Collisions = 54
Main Index |
Thread Index |