Subject: Re: timedwork
To: Iain Hibbert <plunky@rya-online.net>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 01/20/2007 22:50:05
On Sat, Jan 20, 2007 at 10:17:39PM +0000, Iain Hibbert wrote:
> On Sat, 20 Jan 2007, Andrew Doran wrote:
>
> > > - s = splsched(); \
> > > + s = splclock(); \
> >
> > I'm not sure about this change. splsched() seems more appropriate to me as
> > it's a symbolic level.
>
> I'm easy on that, but it seemed to me that splclock() is more appropriate
> because the requirement is only to block hardclock(), which is IPL_CLOCK
> interrupt?
We also need to block any interrupt that could call e.g. callout_reset(),
which splsched() will do. splclock() is guaranteed only to block the clock
interrupt (although on a lot of systems, it will do much the same thing
as splsched()).
> > > error = ltsleep(q, PUSER - 1, "callout", 0, &callout_slock);
> >
> > Actually, thinking about it this could lead to deadlock. Since the lock
> > order is sched_lock -> callout_slock, entering tsleep() at this point
> > would acquire sched_lock while we already hold callout_slock.
> [..]
> > > wakeup(&timeout_work);
> >
> > There is the same lock order reversal here - callout_slock needs to be
> > dropped before calling wakeup().
>
> I'm confused now, I thought that was the point of the interlock that you
> could do that?
Generally yes but callout_slock is a special case, see below..
> What do you mean re the lock order?
By convention the lock order is sched_lock -> callout_slock, because
ltsleep() does something like:
SCHED_LOCK(s);
...
if (timeout)
callout_reset(...);
Your change does (in softclock, ignoring the callout thread):
CALLOUT_LOCK(s);
...
wakeup(...);
wakeup() acquires the sched_lock, which means the above makes the assumption
that the order is callout_slock -> sched_lock. There is the same problem
with the call to ltsleep(). It could result in a deadlock where you have:
CPU1: ltsleep() -> acquire sched_lock -> spin on callout_slock
CPU2: softclock() -> acquire callout_slock -> spin on sched_lock
Maybe another lock could be used to protect the timeout_work queue. I'm now
sure how that might look.
Thanks,
Andrew