Subject: Re: newlock
To: Jason Thorpe <firstname.lastname@example.org>
From: Garrett D'Amore <email@example.com>
Date: 09/07/2006 14:07:58
Jason Thorpe wrote:
> On Sep 2, 2006, at 8:22 AM, Andrew Doran wrote:
>> Hi all,
>> I have been doing a bit of work on the newlock primatives that Jason
>> Thorpe added a few years back. I'd appreciate some constructive review
>> of the API before I start checking things in. I have put a couple of
>> draft manual pages for the primatives here:
> Couple of things out of the way first:
> - Thank you very much for picking this up. It is sorely needed, and
> I'm sorry that I haven't had the time to really run with it.
> - I'm VERY sorry it has taken me so long to provide a real response.
> And now, on to the real commentary :-)
> First of all, I am assuming that you also picked up my turnstile
> implementation and are going to bring it along with the rest of this
> stuff? If so, good -- that's what I was hoping for, because I would
> love to get rid of the *sleep interfaces completely, move everything
> that needs that functionality to condition variables, and shed the
> legacy sleep queues.
> Secondly, I want to be very careful about straying from the Solaris
> API on this stuff. Solaris's API is nice and minimalist, and they
> have more than proven that additional baggage in the synchronization
> APIs is not needed in order to get the job done.
I agree with Jason whole-heartedly here.
> ==> Mutex comments
> Overall, this looks great.
> I do, however, think that we should NOT get the *_high() variants.
> Instead, we should fix the code that needs to release spin locks in
> the wrong order. This might mean that code delays the transition, but
> I think it's better to force fixing it than it is to add a wart to the
> API. Note that this might mean defining a strict hierarchy of
> interrupt priority levels... right now, that hierarchy is loose,
> except for a few special cases.
I don't think it needs to be "strict", but we probably need to have at
least this prioritization:
certain devices that are above the scheduler (splhigh())
the scheduler (splclock())
devices below the scheduler (splbio(), splnet())
soft interrupts (splsoftxxx())
Further prioritization is probably not that useful.
> Perhaps we should add a MUTEX_DRIVER, like the Solaris API now has.
> OpenSolaris says:
> 54 * MUTEX_DRIVER is always used by drivers. mutex_init()
> converts this to
> 55 * either MUTEX_ADAPTIVE or MUTEX_SPIN depending on the iblock
> Solaris calls mutex_held() mutex_owned(). We should probably follow
> Solaris in this regard.
> Solaris also has a mutex_owner(). Probably not useful except for
> debugging, but it's easy to implement, so we should probably just have
> it in our API, too.
Use of mutex_owner/mutex_owned outside of ASSERT() is forbidden in
Solaris. I wouldn't worry too much about this -- but drivers thinking
they want to use these are probably suffering from poor design.
> Please avoid changing the ABI if DIAGNOSTIC or LOCKDEBUG are defined.
> I know I made this mistake in my first cut, but let's fix it now :-)
Yes, a single ABI that works all the time is much to be preferred,
especially for LKMs.
> The documentation implies that holding an adaptive mutex across a
> sleep is not allowed. Actually, it is allowed, but of course leads to
> increased contention for the mutex. That said, some things might need
> to hold other things off while it sleeps waiting for some condition,
> so it should be officially allowed by the API.
> ==> Rwlock comments
> I don't think we should have adaptive vs. spin-only rwlocks. Solaris
> only has sleeper-style rwlocks, and we ourselves only have
> spinlockmgr() for a few special cases.
> Again, let's get rid of the *_high() versions. Do we really have
> spinlockmgr()-locks now that are released out-of-order?
> Solaris has only rw_enter() with a krw_t argument to indicate the
> mode. We should stick with the Solaris API for this. Also, Solaris
> only has a single rw_exit(), and we should follow that lead.
> Please rename rw_try_enter() and rw_try_upgrade() to rw_tryenter() and
> rw_tryupgrade(), to follow the Solaris API.
> krwlock_op_t should probably be renamed to krw_t, but that is a nit,
> and probably not that important (who actually declares a krw_t in any
> of their own data structures? :-)
> rw_downgrade() should not need the krwlock_op_t argument -- by
> definition, you are always going from writer->reader, so the extra
> argument is redundant.
> Similarly, rw_tryupgrade() does not need the krwlock_op_t argument,
> for the same reason (always reader->writer).
> Add a rw_lock_held() which in Solaris is equivalent to
> (rw_read_held(rw) || rw_write_held()).
> Again, sleeping rwlocks are allowed to be held across sleeps, and the
> documentation should reflect that.
>> Note that in the non-DIAGNOSTIC / MULTIPROCESSOR case, the spin locks
>> basically boil down to at most splraiseipl()/splx(). The adaptive locks
>> remain, however. I don't see that as a big problem:
> Not only is it not a problem, it's a requirement so that LKMs can work
> in UP -or- MP kernels :-)
> -- thorpej
Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
Phone: 951 325-2134 Fax: 951 325-2191