Subject: callout table callwheelsize
To: None <tech-kern@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: tech-kern
Date: 07/19/2002 18:24:18
I've noticed that the 'callwheelsize' value depends on the configured
MAXUSERS (via NPROC).

Now the callwheel[] is actually indexed by 'absolute time' (modulo
callwheelsize) so the distribution of callout entries in the
table is not at all random.

ISTM that the table size should depend on 'hz' instead.

On my i386 GENERIC kernel the table has 1024 entries and hz is 100.
This means that every callout entry is processed every 10 seconds
until it expires.

For large numbers of long callouts this is significantly worse
than O(n).  If hz were (say) 1000 this would be much more
noticable - especially since the table size wouldn't have changed.

Now I'm not exactly sure of the likely dustribution of callout
expiry times, but I expect many are below 10 seconds and the
others are relatively large.
(I need to reboot to run the kernel I just built that has a routine
to dump the call wheels to see what my desktop system has)

I was considering implementing a two level structure. The low
level 'wheel' would contain items that expire on the next pass.
The higher level one would have one element processed each time
entry 0 of the low level wheel is actioned and move those of
its items that will expire 'soon' into their correct location
in the low level wheel.

If the 1024 entry wheel were spilt into two 512 entry wheels, then a
callout entry would be looked at every 512*512 ticks (or about 43
minutes at 100Hz) until it gets near to its expiry time.

The only cost is that a callout entry has to be requeued at the
appropriate time.

Note that a multi-level structure could easily be done three 128
entry wheels would wrap at 5h50 at 100Hz, and require items to be
moved 2m44 and 1s28 before their expire time.

For large numbers of callouts this would be considerably better
that the current system (which is clearly much better that the
one it replaced).

Indeed a binary table would be O(log time) but probably less efficient.
(but quite fun to code up! and a pain to get right!!!)

Any comments?

	David

-- 
David Laight: david@l8s.co.uk