Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)



    Date:        Tue, 3 Dec 2013 13:04:30 +0100
    From:        Manuel Bouyer <bouyer%antioche.eu.org@localhost>
    Message-ID:  <20131203120430.GA26926%asim.lip6.fr@localhost>

  | I think what happens is:
  | - the kernel is running in a code portion protected by 
  |   a spin mutex registered at IPL_FOO, with IPL_FOO < IPL_SCHED
  | - a clock interrupt comes in,

Yes, that would explain it all - I was looking for any cases where
mutexes aren't released, and hadn't found any (and it would have had to be
something rare or the forgotten mutex would cause a hang eventually).

This is a better explanation - it didn't occur to me as I had this
impression that spin mutexes would always raise the ipl to IPL_HIGH
(which didn't seem so bad, as they should be used and released very
quickly) but now you mention it, I did see the init code, and the
ipl value in the mutex itself...

  | Now I guess it's done this way because it's allowed to release mutexes
  | in any order:

Yes, that much I understood...   I'm not sure I agree with it as a
programming model (for spin mutexes - for other more general locking
the flexibility is probably required) - but in general, apart from
very special cases (mutex_spin_enter being the canonical case of course)
functions should not be returning with spin mutexes (they grabbed) still
held, it should be grab/work/release - if that were the model, the
issue of re-ordering the releases wouldn't ever arise ("work" - which might
also include interrupts firing - might grab a spin mutex itself, so nested
mutexes are clearly needed, only correctly nested ones probably ought to
be supported.)

In any case, since your analysis shows how this all happens without presuming
any bugs, I think we can assume that for this, there are no bugs.   All that
remains is to decide what to do with the printf (well, not much of a choice
there - just delete it, it is annoying and now useless...) and the forced
resetting (lowering) of ci_ilevel that is currently being done.   I don't
see that doing any harm, especially not in this case, but ...

Alternatively, revise the spin mutex definition, permit only correct
nesting, and then make mutex_spin_exit always restore the ipl (avoiding
testing would make it simpler, and faster...)    That's for those of you
who understand the design better to decide, not me.

kre



Home | Main Index | Thread Index | Old Index