Subject: Re: SMP re-entrancy in kernel drivers/"bottom half?"
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Jason Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 12/17/2003 18:50:27
On Dec 17, 2003, at 2:58 PM, Jonathan Stone wrote:
> Elementary: we have to maintain the invariant ``at most one CPU at or
> above any given [hardware] priority level'' or we lose the
> synchronization semantics of SPLs (higher SPLs than the hypothetical
> SMP-safe interrupt-routine driver entrypoints).
I don't think that's the way we want to move the kernel, in general.
There's also the question of what "above" is. Technically, splnet is
not "above" splbio, but it is allowed to be, by convention, in order to
allow network devices to have better interrupt latency than disk
controllers.
Think of it this way: splbio and splnet lock two different sets of
data structures. They are orthogonal, and there is no defined "locking
order" for moving between them.
We currently have a small set of interrupt-frobbing-simplelocks in the
kernel that are implemented in an ad hoc way:
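	int s;

	s = splfoo();			/* block foo interrupts on this CPU */
	simple_lock(&foo_slock);	/* keep other CPUs out */
	/* manipulate a data structure that foo_slock protects */
	simple_unlock(&foo_slock);
	splx(s);			/* restore the previous IPL */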
This is usually all hidden inside macros.
The logic goes this way:
1. The data structure is protected by foo_slock, not by splfoo().
2. Holding foo_slock prevents other CPUs from getting at the data
structure while we have it.
3. Because foo_slock can also be acquired in interrupt context, we must
prevent *our* CPU from running that interrupt code path while we
acquire or hold the lock; otherwise deadlock could result (the
interrupt handler would spin forever on a lock that the code it
interrupted can never release). Therefore, we call splfoo() before
we acquire the lock, and call splx() to restore the previous IPL
after we release it.
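A minimal user-space sketch of that pattern, for illustration only. Everything in it is hypothetical: a single global curipl stands in for the per-CPU interrupt priority level, model_splfoo()/model_splx() stand in for splfoo()/splx(), a C11 atomic_flag stands in for a simple_lock, and FOO_LOCK()/FOO_UNLOCK() are made-up names for the kind of macros that usually hide the dance. The assert() in model_simple_lock() encodes rule 3: the IPL must already be raised when the lock is taken.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical IPL values and a single global IPL (a real kernel
 * keeps the current IPL per CPU). */
enum { IPL_NONE = 0, IPL_FOO = 5 };
static int curipl = IPL_NONE;

/* Stand-in for splfoo(): raise the IPL to at least IPL_FOO and
 * return the previous level. */
static int
model_splfoo(void)
{
	int s = curipl;

	if (curipl < IPL_FOO)
		curipl = IPL_FOO;
	return s;
}

/* Stand-in for splx(): restore a previously saved IPL. */
static void
model_splx(int s)
{
	curipl = s;
}

/* Stand-in for a simple_lock: a spin lock on a C11 atomic_flag. */
typedef struct {
	atomic_flag held;
} model_slock_t;

static void
model_simple_lock(model_slock_t *l)
{
	/* Rule 3: foo interrupts must already be blocked on this CPU,
	 * or a foo interrupt handler could spin on a lock we hold. */
	assert(curipl >= IPL_FOO);
	while (atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire))
		;	/* spin: another CPU holds the lock */
}

static void
model_simple_unlock(model_slock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static model_slock_t foo_slock = { ATOMIC_FLAG_INIT };
static int foo_count;		/* protected by foo_slock */

/* Hypothetical macros that hide the splfoo()/simple_lock() dance. */
#define FOO_LOCK(s)						\
	do {							\
		(s) = model_splfoo();				\
		model_simple_lock(&foo_slock);			\
	} while (0)
#define FOO_UNLOCK(s)						\
	do {							\
		model_simple_unlock(&foo_slock);		\
		model_splx(s);					\
	} while (0)

static void
foo_increment(void)
{
	int s;

	FOO_LOCK(s);
	foo_count++;
	FOO_UNLOCK(s);
}
```

Note the ordering: the IPL is raised before the lock is taken and restored only after it is released, so no foo interrupt can run on this CPU while foo_slock is held.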
The branch I was working on implemented a few things, in the
Solaris style:
1. Adaptive and interrupt-safe mutexes.
An adaptive mutex is really more like lockmgr(LK_EXCLUSIVE), because
it is a sleeping mutex (the "adaptive" part is that it spins if the
thread that holds the mutex is running on another CPU, in the hope that
it will be finished with it soon, thus saving at least 2 context
switches).
An interrupt-safe mutex is one that does precisely the same dance
as my ad hoc example above. In our kernel, these mutexes would
always spin.
Both kinds of mutex share a single kmutex_t type and mutex_*() API.
2. Reader-writer locks. These provide the semantics of lockmgr(LK_SHARED)
and lockmgr(LK_EXCLUSIVE) combined, but are lighter-weight than their
lockmgr equivalents.
3. Turnstiles. The context-switching primitive used to support the
implementation of mutexes and rwlocks.
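To make the adaptive/spin distinction concrete, here is a rough user-space model of one mutex type with both flavours. It rests on loudly stated assumptions: model_kmutex_t, model_mutex_init(), model_mutex_enter() and model_mutex_exit() are hypothetical names, not the branch's code; the owner_running flag stands in for asking the scheduler whether the holder is on another CPU (in this model only the owner updates it, whereas a real scheduler would clear it when the owner is switched out); and sched_yield() stands in for blocking on a turnstile.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single global IPL; a real kernel keeps it per CPU. */
static int curipl = 0;

typedef enum { MODEL_MUTEX_ADAPTIVE, MODEL_MUTEX_SPIN } model_mutex_type_t;

typedef struct {
	model_mutex_type_t type;
	atomic_bool locked;
	atomic_bool owner_running;	/* stand-in for "owner is on a CPU" */
	int ipl;			/* spin mutexes: IPL to raise to */
	int saved_ipl;			/* spin mutexes: IPL to restore */
} model_kmutex_t;

static void
model_mutex_init(model_kmutex_t *m, model_mutex_type_t type, int ipl)
{
	m->type = type;
	atomic_init(&m->locked, false);
	atomic_init(&m->owner_running, false);
	m->ipl = ipl;
	m->saved_ipl = 0;
}

static bool
model_try_acquire(model_kmutex_t *m)
{
	bool expected = false;

	return atomic_compare_exchange_strong_explicit(&m->locked,
	    &expected, true, memory_order_acquire, memory_order_relaxed);
}

static void
model_mutex_enter(model_kmutex_t *m)
{
	if (m->type == MODEL_MUTEX_SPIN) {
		/* Same dance as splfoo() + simple_lock(): raise IPL first. */
		int s = curipl;

		if (curipl < m->ipl)
			curipl = m->ipl;
		while (!model_try_acquire(m))
			;		/* spin mutexes always spin */
		m->saved_ipl = s;	/* safe: we own the lock now */
		return;
	}
	while (!model_try_acquire(m)) {
		if (atomic_load(&m->owner_running))
			continue;	/* owner should finish soon: spin */
		sched_yield();		/* stand-in for sleeping on a turnstile */
	}
	atomic_store(&m->owner_running, true);
}

static void
model_mutex_exit(model_kmutex_t *m)
{
	if (m->type == MODEL_MUTEX_SPIN) {
		int s = m->saved_ipl;	/* read before releasing the lock */

		atomic_store_explicit(&m->locked, false, memory_order_release);
		curipl = s;
		return;
	}
	atomic_store(&m->owner_running, false);
	atomic_store_explicit(&m->locked, false, memory_order_release);
}
```

The type field is what lets one structure and one enter/exit pair cover both cases; callers pick the flavour once, at initialization time.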
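Similarly, a lightweight reader-writer lock can be sketched as a single atomic word holding a reader count plus a writer bit. This is again a hypothetical illustration: model_krwlock_t and the model_rw_*() functions are invented names, sched_yield() stands in for blocking on a turnstile, and nothing here prevents writer starvation (real implementations typically add a "writers waiting" bit).

```c
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Top bit marks a writer; the remaining bits count readers. */
#define MODEL_RW_WRITER	((uintptr_t)1 << (sizeof(uintptr_t) * CHAR_BIT - 1))

typedef struct {
	_Atomic uintptr_t word;
} model_krwlock_t;

static void
model_rw_init(model_krwlock_t *rw)
{
	atomic_init(&rw->word, 0);
}

static bool
model_rw_tryenter_reader(model_krwlock_t *rw)
{
	uintptr_t old = atomic_load_explicit(&rw->word, memory_order_relaxed);

	if (old & MODEL_RW_WRITER)
		return false;		/* a writer holds it */
	return atomic_compare_exchange_strong_explicit(&rw->word, &old,
	    old + 1, memory_order_acquire, memory_order_relaxed);
}

static bool
model_rw_tryenter_writer(model_krwlock_t *rw)
{
	uintptr_t expected = 0;		/* no readers, no writer */

	return atomic_compare_exchange_strong_explicit(&rw->word, &expected,
	    MODEL_RW_WRITER, memory_order_acquire, memory_order_relaxed);
}

static void
model_rw_enter(model_krwlock_t *rw, bool writer)
{
	while (!(writer ? model_rw_tryenter_writer(rw)
	    : model_rw_tryenter_reader(rw)))
		sched_yield();		/* stand-in for sleeping on a turnstile */
}

static void
model_rw_exit(model_krwlock_t *rw)
{
	uintptr_t old = atomic_load_explicit(&rw->word, memory_order_relaxed);

	assert(old != 0);		/* caller must hold the lock */
	if (old & MODEL_RW_WRITER)
		atomic_store_explicit(&rw->word, 0, memory_order_release);
	else
		atomic_fetch_sub_explicit(&rw->word, 1, memory_order_release);
}
```

Because the whole lock state fits in one word, the uncontended paths are a single compare-and-swap, which is where the "lighter-weight than lockmgr" claim comes from in this sketch.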
-- Jason R. Thorpe <thorpej@wasabisystems.com>