Subject: Re: newlock
To: None <>
From: Lars Heidieker <>
List: tech-kern
Date: 09/05/2006 12:59:34

On 4 Sep 2006, at 23:05, Garrett D'Amore wrote:

> David Laight wrote:
>> On Mon, Sep 04, 2006 at 01:35:02PM -0700, Garrett D'Amore wrote:
>>> Masking interrupts doesn't work on very high end SMP hardware.
>>> You're still trying to design for systems with less than 8 CPUs.
>> You only need to mask it on the current cpu.
>> Which is all spl() has to do.
> Huh?  If I have user code entering the driver on processor 1, and the
> interrupt is being handled on processor 2, then I had better make sure
> that _somehow_ the data integrity is retained.  Simply masking
> interrupts on processor 1 isn't going to help processor 2.

Yes, I agree: the interrupt handling should be done under the cover of
the mutexes.

If you don't service interrupts via threads, you get the following.
(Having the disable_it and enable_it calls moved under the control of
the mutex would be nice, like Solaris does.)

You have cpu0 entering the driver from user mode:

driver_fromusermode() {
	disable_it()                    // just for the calling CPU
	mutex_enter(the_drivers_lock)   // take the spinning mutex

	// do whatever

	mutex_exit(the_drivers_lock)
	enable_it()
}

and some other cpu is handling the interrupt:

driver_isr() {
	mutex_enter(the_drivers_lock)   // take the spinning mutex

	// do whatever

	mutex_exit(the_drivers_lock)
}

You end up with one of the CPUs spinning while the other gets through
the critical section.

The only assumption needed is that the same interrupt does not fire
twice on the same CPU simultaneously.
(This should be handled by the CPU's interrupt controller.)

What you can't do (or at least what would not be clever) is use adaptive
mutexes without interrupt threads.
The mutex_enter from the ISR is not able to sleep, because interrupts
can't sleep. So if the code called from user mode sleeps while holding
the mutex, the ISR would have to wake up the mutex-holding thread, and
this can't be done on the CPU servicing the ISR (again, the ISR can't
sleep), so a CPU not servicing the ISR is needed. That would mean
initiating a context switch via xcall just to service an interrupt,
and that's not clever.
It would also make the mutex only half adaptive, as the mutex_enter
from the ISR would not be adaptive.
And this can lead to a deadlock:
cpu0 locked mtx0 from user mode and is now running an ISR that wants mtx1, while
cpu1 locked mtx1 from user mode and is now running an ISR that wants mtx0.
There is no way of waking up either mutex-holding thread to make progress.

Adaptive mutexes can still be used in code paths that don't take
mutexes also taken by ISRs.

And this is what happens in Solaris: they use adaptive mutexes for
interrupts below level 10, as those are serviced via threads
(with "lazy context switching" for performance reasons, where the
ithread does not block in the common case),
and they use spin mutexes for the higher interrupt levels. Those spin
mutexes boil down essentially to simple spin locks with automatic IPL
handling.

Attaching an IPL level to the mutex makes it easy to check that
interrupts up to this level are disabled on the local CPU.

The spin version of mutex_enter on Solaris sets the PIL and saves the
old PIL, and mutex_enter will only ever raise the PIL level:

mtx0 (pil-lvl: 10)
mtx1 (pil-lvl: 13)

	mutex_enter(mtx0)   // pil is now 10
	mutex_enter(mtx1)   // pil is now 13
	...
	mutex_exit(mtx1)    // pil back to 10
	mutex_exit(mtx0)    // pil back to the old level

Order is important here, as mtx1 knows the PIL level set by mtx0 as its
saved old level, so this order is OK. If we swap the exits we get:

	mutex_enter(mtx0)   // pil is now 10
	mutex_enter(mtx1)   // pil is now 13
	mutex_exit(mtx0)    // pil is now below 10, probably 0 (all interrupts enabled)

Now an interrupt with level 13 (for example) happens while we still
hold a spin lock with level-13 protection: the ISR tries to enter mtx1,
and this CPU is not doing anything more, it's just spinning => deadlock.

This is essentially the IPL-level passing Andrew favors, and it makes
total sense to me.

Lars Heidieker

---

Mystical explanations.
Mystical explanations are considered deep;
the truth is that they are not even superficial.
      -- Friedrich Nietzsche
