Subject: Re: Improved timer code
To: Dennis Ferguson <dennis@juniper.net>
From: Justin T. Gibbs <gibbs@plutotech.com>
List: tech-kern
Date: 10/24/1998 15:34:48
[Monster cc list trimmed down to just tech-kern]

>Be aware that the freebsd version of this changed the API to timeout()
>to return a handle, and untimeout() to take the handle as an argument,
>kind of like the SVR4 timeout().  This has some advantages, as it makes
>untimeout() a bit faster and eliminates the memory and operations that
>would otherwise be needed to maintain an untimeout() hash table, but
>changes this from a small hack in kern_clock.c to a big hack that requires
>modifications to every caller of untimeout().

Just FYI, FreeBSD is contemplating changing our API one more time to have
timeout take a callout handle.  This will allow us to get rid of the
static allocation of callout entries (which you could also do simply by
tacking on the use of a pool or zone allocator to the current FreeBSD
interface), but more importantly, will allow clients to perform upfront
allocation of callout resources rather than add complicated error recovery
for failed allocations in an interrupt context.

>There are also other ways at this that have slightly different tradeoffs.
>The callwheel code makes timeout() and untimeout() constant time, but
>does so at the cost of perhaps calling setsoftclock() more often than it 
>strictly needs to and by maybe adding O(n) operations to setsoftclock() to
>find guys who have timed out (though the O(n) should have a very small
>constant of proportionality, i.e. it is O(n) quite simple operations
>spread out over time).

softclock() has complexity O(hash-chain length) which means that you need
to take serveral factors into consideration to determine real cost:

How big is your hash table?
	The freeBSD table is a power of 2 size larger than the number
	of static callout entries allocated.  So we have lots of space
	(10 seconds of clock time or more) to distrubute entries.

What is your callout distribution?
	Timeout always specifies the callout time as a delta from the
	current time.  So, unless your application requires varying deltas
	or causes burst of callout activity for the same (tick & tick_mask),
	you are guaranteed good distribution.  The common users of this
	interface (device driver timeouts, process alarms, and (in an
	expanding fashion) networking code) have enough randomness in
	how work is queued and serviced to almost guarantee a good
	distribution.

How long do entries live in the wheel?
	The heaviest user of the timeout routines in FreeBSD is the SCSI
	layer.  The SCSI layer schedules entries that almost never expire,
	and are removed within a softclock tick or two of when they were
	first scheduled.  Because of the size of the callwheel, this ensures
	that these entries are never even encountered by softclock().

	If the entries you schedule are going to expire, so long as they
	are scheduled for a time <= sizeof callwheel / hz in the future,
	these entries will be encountered exactly once by softclock(),
	when the entry has expired.

	If you need additional 'scope' in the time wheel to avoid processing
	entries too many times, you can add hierarchical wheels as mentioned
	by the Varghese/Lauck SOSP paper.

What is your distribution of inserts, removals, and softclock runs?
	In event moderate disk I/O workloads I typically have 4-10
	insert/removals per softclock interval.  If this is typical
	of most workloads, then optimization of insert and removal
	(which often occur in a non-premptible interrupt context)
	is more imporant than optimizing softclock().

--
Justin