Re: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Tue, 3 Dec 2013 13:04:30 +0100

On Tue, Dec 03, 2013 at 08:55:06AM +0700, Robert Elz wrote:
>     Date:        Mon, 2 Dec 2013 12:29:13 +0100
>     From:        Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>     Message-ID:  <20131202112913.GA892%asim.lip6.fr@localhost>
> 
>   | >     evtchn_do_event: handler 0xffffffff801f9c7f didn't lower ipl 8 7
> 
>   | Yes, is has been there for a long time, but we failed to find where is
>   | problem is exactly.
> 
> I didn't get quite as close as I'd hoped, and I probably haven't
> discovered anything that wasn't already known, but ...
> 
> For me at least, that message is triggered from statclock() called from
> hardclock() - when statclock() starts the ipl is 7, when it exits, it is 8.
> 
> But it's not statclock's fault ... the ipl is raised as a side effect or
> mutex_spin_enter - on the other hand, mutex_spin_exit only restores the
> ipl when the mutex being released is the last one held by the CPU.

Ha, good catch. I was looking for a missing splx() but didn't think
about spin mutexes, which also changes the IPL.

> 
> When statclock() - and hardclock() before it - is (or are) called, the
> cpu (apparently) already holds a (spin) mutex (the ci_mtx_count field of
> the cpu_info struct is -1).    Given that, and the way spin mutexes work,
> statclock() (and then hardclock()) must return with the ipl higher.
> 
> What I haven't found yet is where that extra spin mutex is grabbed - but
> it looks as if either there must be one, or the spin_mutex code has bugs
> (and the latter doesn't seem likely - it would need to be some kind of
> race against an interrupt handler, as this is a XEN Dom0 I am testing using,
> and consequently, acts as if there is just one cpu).   In any case, things
> are far too repeatable, for it to be some bizarre random timing event with
> an interrupt just at the wrong nanosecond...
> 
> Unless someone else suddenly realises what's up - even if this is really
> just SOP, that is, if there's some mutex that's meant to be held, and the
> problem is the diagnostic objecting to normal expected conditions, then
> the next time I get some time to look into this, I'll probably add some
> instrumentation to the spin mutex functions, to record from where, and
> which, mutexes are being locked, so when the (seemingly) spurious one is
> detected, we can find out where it came from.
> 
> At the minute I also can't explain how there is apparently a spin mutex
> still held, yet the IPL is not IPL_HIGH ... that also suggests some kind
> of problem I believe - when the first spin mutex is grabbed, the ipl
> should be being raised to IPL, and it should stay there until no more spin
> locks are held.   Yet I (repeatedly, and reliably) observed statclock()
> (and hardclock(), and other) functions being called with ipl lower than
> IPL_HIGH (7 rather than 8) while ci_mtx_count is indicating a mutex is
> held.    Weird.

I think what happens is:
- the kernel is running in a code portion protected by 
  a spin mutex registered at IPL_FOO, with IPL_FOO < IPL_SCHED
- a clock interrupt comes in, it' serviced as we're running < IPL_SCHED.
  We still hold the lock at IPL_FOO from the interruped code.
  Then we take a spin lock registered at IPL_HIGH. This raises the IPL to
  IPL_HIGH, and when we mutex_exit() this spin mutex, the IPL is not
  restored to IPL_SCHED because we're holding another mutex.

Now I guess it's done this way because it's allowed to release mutexes
in any order: we could
grab mutex1 at e.g. IPL_VM
grab mutex2 at IPL_HIGH
release mutex1
release mutex2

if "release mutex1" did restore IPL, we would be running at IPL_VM or
lower, while we would expext to still be at IPL_HIGH because we're
still holding mutex2.
At first glance I don't see an easy way to fix this, if it has to
be fixed.

I can see 2 problems with this
- minor performance issue, because interrupts are blocked more than
  necessery
- in this specific case, the kernel stays at IPL_HIGH when
  it should not. This means IPIs are blocked when they should not be.
  Could this cause a deadlock ?

Note that in the specific case of an interrupt handler returning at IPL_HIGH,
interrupt exit should lower the IPL anyway.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--

References:
- Re: xl or xm for xen ? (and more questions, maybe raidframe related)
  - From: Manuel Bouyer
- xl or xm for xen ? (and more questions, maybe raidframe related)
  - From: Robert Elz
- evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)
  - From: Robert Elz

Prev by Date: daily CVS update output
Next by Date: re: READ ME: updating mpc, mpfr, and eventually gmp
Previous by Thread: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)
Next by Thread: Re: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)
Indexes:

Home | Main Index | Thread Index | Old Index