Subject: Re: SMP re-entrancy in kernel drivers/"bottom half?"
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Jason Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 12/17/2003 18:50:27


On Dec 17, 2003, at 2:58 PM, Jonathan Stone wrote:

> Elementary: we have to maintain the invariant ``at most one CPU at or
> above any given [hardware] priority level'' or we lose the
> synchronization semantics of SPLs (higher SPLs than the hypothetical
> SMP-safe interrupt-routine driver entrypoints).

I don't think that's the way we want to move the kernel, in general.  
There's also the question of what "above" is.  Technically, splnet is 
not "above" splbio, but it is allowed to be, by convention, in order to 
allow network devices to have better interrupt latency than disk 
controllers.

Think of it this way -- splbio and splnet lock two different sets of
data structures.  They are orthogonal, and there is no defined "locking 
order" for moving between them.

We currently have a small set of interrupt-frobbing-simplelocks in the 
kernel that are implemented in an ad hoc way:

	s = splfoo();
	simple_lock(&foo_slock);

	/* manipulate a data structure that foo_slock protects */

	simple_unlock(&foo_slock);
	splx(s);

This is all usually hidden inside of macros.

The logic goes this way:

	1. The data structure is actually protected by foo_slock.  It is not
	   actually protected by splfoo().

	2. Because foo_slock protects the data structure, that prevents other
	   CPUs from getting at the data structure while we have it.

	3. Because foo_slock can be acquired in interrupt context, we must
	   prevent *our* CPU from running that interrupt code path while we
	   acquire/hold the lock, otherwise deadlock could result.  Therefore,
	   we call splfoo() before we acquire the lock, and drop the IPL
	   again with splx() after we release it.
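
To make the ordering concrete, here is a minimal userland sketch of the
pattern and of the kind of macro that usually hides it.  Everything in it
is a stand-in, not the real kernel code: cur_ipl models a single CPU's
priority level as a plain int, simple_lock() is modeled with a C11
atomic_flag, and FOO_LOCK/FOO_UNLOCK, IPL_FOO, foo_count and
foo_increment() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins; these are NOT the real kernel primitives. */
#define IPL_NONE	0
#define IPL_FOO		3

static int cur_ipl = IPL_NONE;		/* models one CPU's priority level */
static atomic_flag foo_slock = ATOMIC_FLAG_INIT;
static int foo_count;			/* the data foo_slock protects */

/* Raise the modeled IPL to IPL_FOO (never lower it); return the old level. */
static int
splfoo(void)
{
	int s = cur_ipl;

	if (cur_ipl < IPL_FOO)
		cur_ipl = IPL_FOO;
	return s;
}

/* Restore a level previously returned by splfoo(). */
static void
splx(int s)
{
	cur_ipl = s;
}

/* Spin until the flag is ours. */
static void
simple_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		continue;
}

static void
simple_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Block the local interrupt path first, then take the lock. */
#define FOO_LOCK(s)	do { (s) = splfoo(); simple_lock(&foo_slock); } while (0)
/* Release in the reverse order: lock first, then the IPL. */
#define FOO_UNLOCK(s)	do { simple_unlock(&foo_slock); splx(s); } while (0)

static int
foo_increment(void)
{
	int s, v;

	FOO_LOCK(s);
	assert(cur_ipl >= IPL_FOO);	/* local interrupt path is blocked */
	v = ++foo_count;
	FOO_UNLOCK(s);
	return v;
}
```

The point of the ordering is step 3: if the lock were taken before the
IPL was raised, an interrupt arriving on the same CPU in that window
could spin forever on a lock its own CPU already holds.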

The branch I was working on basically implemented a couple of things, 
in the Solaris style:

	1. Adaptive and interrupt-safe mutexes.

	   An adaptive mutex is really more like lockmgr(LK_EXCLUSIVE) because
	   it is a sleeping mutex (the "adaptive" part is to spin if the thread
	   that holds the mutex is running on another CPU, in the hope that it
	   will be finished with it soon, thus saving at least 2 context
	   switches).

	   An interrupt-safe mutex is one that does precisely the same dance
	   that my ad hoc example above does.  These types of mutexes would
	   always spin, in our kernel.

	   Both of these mutexes are contained in a single kmutex_t and
	   mutex_*() API.

	2. Reader-writer locks.  Like lockmgr(LK_SHARED) and
	   lockmgr(LK_EXCLUSIVE) combined, but lighter-weight than the
	   lockmgr equivalents.

	3. Turnstiles.  The context switching primitive used to support the
	   implementation of mutexes and rwlocks.
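
As a rough illustration of item 1, the sketch below shows how one mutex
type might carry both behaviours behind one entry/exit pair.  It is a
userland model under loud assumptions, not the code from the branch and
not the Solaris API: the sk_* names are hypothetical, the IPL is a plain
int standing in for one CPU's priority level, and the adaptive case is
approximated by a bounded spin followed by sched_yield(), where a real
implementation would check whether the owner is running on another CPU
and otherwise sleep on a turnstile.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical names throughout; this only models the idea. */
enum sk_kind { SK_ADAPTIVE, SK_SPIN };

struct sk_mutex {
	atomic_flag	locked;
	enum sk_kind	kind;
	int		ipl;		/* level to raise to (SK_SPIN only) */
	int		saved_ipl;	/* level to restore (SK_SPIN only) */
};

static int sk_cur_ipl;			/* models one CPU's priority level */

static void
sk_mutex_init(struct sk_mutex *m, enum sk_kind kind, int ipl)
{
	atomic_flag_clear(&m->locked);
	m->kind = kind;
	m->ipl = ipl;
	m->saved_ipl = 0;
}

static void
sk_mutex_enter(struct sk_mutex *m)
{
	int s = sk_cur_ipl;
	int spins = 0;

	/* Interrupt-safe kind: raise the IPL before touching the lock. */
	if (m->kind == SK_SPIN && sk_cur_ipl < m->ipl)
		sk_cur_ipl = m->ipl;

	while (atomic_flag_test_and_set_explicit(&m->locked,
	    memory_order_acquire)) {
		/*
		 * Adaptive kind: spin briefly, then give up the CPU.
		 * sched_yield() stands in for blocking on a turnstile.
		 */
		if (m->kind == SK_ADAPTIVE && ++spins >= 100) {
			sched_yield();
			spins = 0;
		}
	}
	m->saved_ipl = s;		/* written only while holding the lock */
}

static void
sk_mutex_exit(struct sk_mutex *m)
{
	int s = m->saved_ipl;

	assert(m->kind == SK_ADAPTIVE || sk_cur_ipl >= m->ipl);
	atomic_flag_clear_explicit(&m->locked, memory_order_release);
	/* Interrupt-safe kind: drop the IPL only after releasing the lock. */
	if (m->kind == SK_SPIN)
		sk_cur_ipl = s;
}
```

The spin kind never yields, because it may be entered from interrupt
context where sleeping is not an option; only the adaptive kind is
allowed to give up the CPU.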

         -- Jason R. Thorpe <thorpej@wasabisystems.com>

