Subject: Efficient callout handling
To: None <tech-kern@NetBSD.ORG>
From: Charles M. Hannum <mycroft@mit.edu>
List: tech-kern
Date: 07/17/1996 17:36:13
It's been brought to my attention that more efficient callout handling
would be beneficial, not only for the current uses of callouts, but
even more so in cases where it might be preferable to have a *huge*
number of callouts at once.

One idea (from Greg) for implementing a more efficient callout
mechanism would be to store the entries in a heap.  This is an
interesting suggestion, and I explored it a bit today.  As it turns
out, though, it is actually difficult to implement a heap efficiently
in the kernel without some perverse hacks to the data structure.  By
the time you're done, it really doesn't look much better than a
balanced binary tree.  Also, insertions and deletions are still O(log
N), and it's arguable that one can do better.

I have, therefore, devised a different (and possibly more perverse
B-)) mechanism for keeping track of callouts.  It results in something
approximating O(1) insertion, deletion, and lookup time.

The idea is to use a sort of radix tree, but with a twist.  You have
HZ lists of callouts to execute within the current second (one per
tick), (say) 60 lists of callouts to execute within the next 60
seconds (one per second), 60 lists divided by minute for the next
hour, 24 lists divided by hour for the next day, and N lists divided
by day beyond that.  When a boundary at some level rolls over, you
promote the list that has just become current, re-sorting its entries
into the appropriate slots at the next finer level.
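
Roughly, in C, the layout and the promotion step might look something
like this.  (All of the names, the tick rate, the number of day slots,
and the use of TAILQs below are mine, purely for illustration; this is
a sketch, not a finished interface.)

#include <sys/queue.h>

#define WHEEL_HZ    100     /* assumed tick rate, for illustration */
#define WHEEL_DAYS  7       /* the "N lists divided by day" */

TAILQ_HEAD(callout_list, new_callout);      /* see queue(3) */

struct new_callout {
    TAILQ_ENTRY(new_callout) nc_link;   /* links us into one list */
    struct callout_list *nc_list;       /* which list we are on */
    long    nc_expire;                  /* absolute expiry, in ticks */
    void    (*nc_func)(void *);
    void    *nc_arg;
};

struct callout_wheel {
    struct callout_list cw_ticks[WHEEL_HZ]; /* current second */
    struct callout_list cw_secs[60];        /* next 60 seconds */
    struct callout_list cw_mins[60];        /* next 60 minutes */
    struct callout_list cw_hours[24];       /* next 24 hours */
    struct callout_list cw_days[WHEEL_DAYS];/* anything further out */
    long    cw_now;                         /* current time, in ticks */
    /* each list head must be TAILQ_INIT()ed at boot; omitted here */
};

/*
 * Pick the list an entry belongs on, based only on how far into the
 * future it expires: a few comparisons and one array index, so O(1).
 */
static struct callout_list *
wheel_bucket(struct callout_wheel *cw, long expire)
{
    long dt = expire - cw->cw_now;

    if (dt < WHEEL_HZ)
        return (&cw->cw_ticks[expire % WHEEL_HZ]);
    if (dt < 60L * WHEEL_HZ)
        return (&cw->cw_secs[(expire / WHEEL_HZ) % 60]);
    if (dt < 3600L * WHEEL_HZ)
        return (&cw->cw_mins[(expire / (60L * WHEEL_HZ)) % 60]);
    if (dt < 86400L * WHEEL_HZ)
        return (&cw->cw_hours[(expire / (3600L * WHEEL_HZ)) % 24]);
    return (&cw->cw_days[(expire / (86400L * WHEEL_HZ)) % WHEEL_DAYS]);
}

/* Put an entry on the right list, remembering which list that was. */
static void
wheel_insert(struct callout_wheel *cw, struct new_callout *nc)
{
    nc->nc_list = wheel_bucket(cw, nc->nc_expire);
    TAILQ_INSERT_TAIL(nc->nc_list, nc, nc_link);
}

/*
 * Run when a second rolls over: the seconds slot that has just become
 * current is emptied and its entries are re-sorted one level down,
 * into the per-tick lists.  The same promotion happens once a minute,
 * once an hour, and once a day at the coarser levels.
 */
static void
wheel_promote_second(struct callout_wheel *cw)
{
    struct callout_list *src = &cw->cw_secs[(cw->cw_now / WHEEL_HZ) % 60];
    struct new_callout *nc;

    while ((nc = TAILQ_FIRST(src)) != NULL) {
        TAILQ_REMOVE(src, nc, nc_link);
        wheel_insert(cw, nc);
    }
}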

In addition, rather than the current method of allocating callout
structures, you either allocate them separately, or allocate them
dynamically within the timeout() routine.  Like the original BCI
interface, untimeout() then takes a cookie that was returned by
timeout() to delete the callout, and does so in O(1) time.  This is
trivial to do, for example, by implementing the above-mentioned lists
as TAILQs or something similar.
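
Continuing the same sketch (again with made-up names, and with
malloc(3) standing in for whatever allocator the kernel would really
use), timeout() and untimeout() might then look like:

#include <stdlib.h>

/* The cookie is simply a pointer to the dynamically allocated entry. */
typedef struct new_callout *callout_cookie_t;

callout_cookie_t
new_timeout(struct callout_wheel *cw, void (*func)(void *), void *arg,
    long ticks)
{
    struct new_callout *nc;

    nc = malloc(sizeof(*nc));
    if (nc == NULL)
        return (NULL);
    nc->nc_func = func;
    nc->nc_arg = arg;
    nc->nc_expire = cw->cw_now + ticks;
    wheel_insert(cw, nc);       /* O(1): pick a bucket, tack it on */
    return (nc);
}

void
new_untimeout(callout_cookie_t cookie)
{
    /* Unlink from whichever list the entry currently sits on: O(1). */
    TAILQ_REMOVE(cookie->nc_list, cookie, nc_link);
    free(cookie);
}

Since the cookie names the entry directly, untimeout() never has to
walk a list looking for a matching function/argument pair the way the
traditional interface does.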

The overall time used for any given callout (whether it is eventually
deleted or executed) is thus O(log T), where T is how far into the
future it was scheduled: a callout is promoted at most once per level
as it moves toward the per-tick lists.  Since T is (especially when
you include things like TCP timers) almost always <= 60 seconds, the
constant factor is actually quite small.
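
To put rough numbers on that (using the illustrative layout sketched
above): a TCP retransmit timer set 45 seconds out starts on a seconds
list and is moved exactly once, into a per-tick list, before it fires;
a callout set 3 hours out is moved at most three times (hours ->
minutes -> seconds -> ticks) over its whole lifetime.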


Does anyone have suggestions on further refinements to this, or
perhaps better ways of doing it?