tech-kern: Frequent use of callout_stop() in kernel causes panic

Subject: Frequent use of callout_stop() in kernel causes panic
To: None <tech-kern@netbsd.org>
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
List: tech-kern
Date: 04/14/2003 03:10:46

	Hello folks.  I've found that using callout_stop() frequently in a
given routine causes corruption of the callwheel structure, resulting in a
kernel panic at some point.
	This is under NetBSD 1.6.1.
	I'm not sure exactly how it happens, as I don't see anything
obviously wrong.  Suppose you set up a callout function by calling
callout_init() and callout_reset().  Your callout structure then doesn't
get taken off of the execution queue before the function pointer in it is
nullified.
	This seems to happen in either of two cases: your callout function
doesn't call callout_reset() explicitly, or you call callout_stop() before
the timer fires to try to keep your function from being executed.
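
	For illustration only, here is a user-space sketch of the ordering
one would expect on the stop path: unlink the entry from its bucket first,
and only then clear the function pointer.  It is a toy model built on the
TAILQ macros from <sys/queue.h>, not the 1.6.1 kernel code; every toy_*
name and the "pending" field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Toy user-space model; NOT the NetBSD 1.6.1 kernel structures. */
struct toy_callout {
	TAILQ_ENTRY(toy_callout) link;	/* bucket linkage */
	void (*func)(void *);		/* NULL once stopped */
	void *arg;
	int pending;			/* nonzero while linked into a bucket */
};
TAILQ_HEAD(toy_bucket, toy_callout);

/* Placeholder handler used only to give entries a non-NULL function. */
static void
toy_noop(void *arg)
{
	(void)arg;
}

/* Queue a callout on a bucket (stands in for callout_reset()). */
static void
toy_reset(struct toy_bucket *b, struct toy_callout *c,
    void (*func)(void *), void *arg)
{
	assert(func != NULL);
	if (c->pending)
		TAILQ_REMOVE(b, c, link);	/* re-arming: unlink old entry */
	c->func = func;
	c->arg = arg;
	c->pending = 1;
	TAILQ_INSERT_TAIL(b, c, link);
}

/* Cancel a callout: unlink first, only then clear the function pointer. */
static void
toy_stop(struct toy_bucket *b, struct toy_callout *c)
{
	if (!c->pending)
		return;				/* not queued, nothing to do */
	TAILQ_REMOVE(b, c, link);
	c->pending = 0;
	c->func = NULL;
}

/* Count entries that are still queued but have a NULL function pointer. */
static int
toy_count_null(struct toy_bucket *b)
{
	struct toy_callout *c;
	int n = 0;

	TAILQ_FOREACH(c, b, link)
		if (c->func == NULL)
			n++;
	return n;
}
```

	In this model toy_count_null() should always return 0.  The
corruption described above would correspond to it returning nonzero, that
is, an entry with a NULL function still linked into a bucket.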
	Another side effect of this is that, if you do as I did and put a
diagnostic printf in softclock() to prevent a panic, you'll discover that
jobs you didn't want deleted are taken off of the execution queue.  I found
that schedcpu() regularly fell off the task list.  It turned out that
anyone who happened to be sharing a callwheel bucket with my ill-fated
function would get forgotten, even as my nullified callout stayed on the
queue.
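
	As a rough sketch of what such a diagnostic could look like without
losing the neighbours, the user-space toy below walks one bucket, saves the
next pointer before unlinking each entry, and reports a NULL function
instead of calling it.  This is an assumption about one possible approach,
not the actual softclock() code; the toy_* names are hypothetical, and the
model assumes handlers do not modify the bucket while it is being run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/queue.h>

/* Toy user-space model; NOT the NetBSD 1.6.1 kernel structures. */
struct toy_callout {
	TAILQ_ENTRY(toy_callout) link;	/* bucket linkage */
	void (*func)(void *);		/* NULL marks a bad (nullified) entry */
	void *arg;
};
TAILQ_HEAD(toy_bucket, toy_callout);

/* Example handler: increments the int that arg points to. */
static void
toy_bump(void *arg)
{
	(*(int *)arg)++;
}

/*
 * Unlink every entry in the bucket and call its handler.  Entries with a
 * NULL function are reported and skipped, but the walk continues so the
 * rest of the bucket is still serviced.  Returns the number of handlers
 * actually called.
 */
static int
toy_run_bucket(struct toy_bucket *b)
{
	struct toy_callout *c, *next;
	void (*func)(void *);
	int ran = 0;

	for (c = TAILQ_FIRST(b); c != NULL; c = next) {
		next = TAILQ_NEXT(c, link);	/* save before unlinking c */
		TAILQ_REMOVE(b, c, link);
		func = c->func;
		c->func = NULL;
		if (func == NULL) {
			fprintf(stderr, "toy_run_bucket: NULL func in %p\n",
			    (void *)c);
			continue;		/* skip it, keep the others */
		}
		func(c->arg);
		ran++;
	}
	return ran;
}
```

	The point of the saved next pointer is that skipping or removing the
bad entry does not cut the walk short, so entries queued behind it are
still run.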

	I can't tell, at the moment, if the problem is with the queue macros,
or with some locking in the callwheel algorithm.  Has anyone else seen this
behavior, or some strangeness with queues in other contexts?

(I've applied the patches from Havard's Kern/20390 bug, in case anyone
asks.)

	If anyone has seen this problem, or thinks they've seen something
related to it, I'd be interested to hear about it.
-Brian