Subject: Re: newlock
To: Andrew Doran <ad@NetBSD.org>
From: Jason Thorpe <email@example.com>
Date: 09/07/2006 13:23:51
On Sep 2, 2006, at 8:22 AM, Andrew Doran wrote:
> Hi all,
> I have been doing a bit of work on the newlock primatives that Jason
> Thorpe added a few years back. I'd appreciate some constructive review
> of the API before I start checking things in. I have put a couple of
> draft manual pages for the primatives here:
Couple of things out of the way first:
- Thank you very much for picking this up. It is sorely needed, and
I'm sorry that I haven't had the time to really run with it.
- I'm VERY sorry it has taken me so long to provide a real response.
And now, on to the real commentary :-)
First of all, I am assuming that you also picked up my turnstile
implementation and are going to bring it along with the rest of this
stuff? If so, good -- that's what I was hoping for, because I would
love to get rid of the *sleep interfaces completely, move everything
that needs that functionality to condition variables, and shed the
legacy sleep queues.
Secondly, I want to be very careful about straying from the Solaris
API on this stuff. Solaris's API is nice and minimalist, and they
have more than proven that additional baggage in the synchronization
APIs is not needed in order to get the job done.
==> Mutex comments
Overall, this looks great.
I do, however, think that we should NOT get the *_high() variants.
Instead, we should fix the code that needs to release spin locks in
the wrong order. This might mean that code delays the transition, but
I think it's better to force fixing it than it is to add a wart to the
API. Note that this might mean defining a strict hierarchy of
interrupt priority levels... right now, that hierarchy is loose,
except for a few special cases.
Perhaps we should add a MUTEX_DRIVER, like the Solaris API now has.
54 * MUTEX_DRIVER is always used by drivers. mutex_init()
converts this to
55 * either MUTEX_ADAPTIVE or MUTEX_SPIN depending on the
Solaris calls mutex_held() mutex_owned(). We should probably follow
Solaris in this regard.
Solaris also has a mutex_owner(). Probably not useful except for
debugging, but it's easy to implement, so we should probably just have
it in our API, too.
Please avoid changing the ABI if DIAGNOSTIC or LOCKDEBUG are defined.
I know I made this mistake in my first cut, but let's fix it now :-)
The documentation implies that holding an adaptive mutex across a
sleep is not allowed. Actually, it is allowed, but of course leads to
increased contention for the mutex. That said, some things might need
to hold other things off while it sleeps waiting for some condition,
so it should be officially allowed by the API.
==> Rwlock comments
I don't think we should have adaptive vs. spin-only rwlocks. Solaris
only has sleeper-style rwlocks, and we ourselves only have spinlockmgr
() for a few special cases.
Again, let's get rid of the *_high() versions. Do we really have
spinlockmgr()-locks now that are released out-of-order?
Solaris has only rw_enter() with a krw_t argument to indicate the
mode. We should stick with the Solaris API for this. Also, Solaris
only has a single rw_exit(), and we should follow that lead.
Please rename rw_try_enter() and rw_try_upgrade() to rw_tryenter() and
rw_tryupgrade(), to follow the Solaris API.
krwlock_op_t should probably be renamed to krw_t, but that is a nit,
and probably not that important (who actually declares a krw_t in any
of their own data structures? :-)
rw_downgrade() should not need the krwlock_op_t argument -- by
definition, you are always going from writer->reader, so the extra
argument is redundant.
Similarly, rw_tryupgrade() does not need the krwlock_op_t argument,
for the same reason (always reader->writer).
Add a rw_lock_held() which in Solaris is equivalent to (rw_read_held
(rw) || rw_write_held()).
Again, sleeping rwlocks are allowed to be held across sleeps, and the
documentation should reflect that.
> Note that in the non-DIAGNOSTIC / MULTIPROCESSOR case, the spin locks
> basically boil down to at most splraiseipl()/splx(). The adaptive
> remain, however. I don't see that as a big problem:
Not only is it not a problem, it's a requirement so that LKMs can work
in UP -or- MP kernels :-)