Subject: Re: newlock
To: Jason Thorpe <>
From: Garrett D'Amore <>
List: tech-kern
Date: 09/07/2006 14:07:58
Jason Thorpe wrote:
> On Sep 2, 2006, at 8:22 AM, Andrew Doran wrote:
>> Hi all,
>> I have been doing a bit of work on the newlock primatives that Jason
>> Thorpe added a few years back. I'd appreciate some constructive review
>> of the API before I start checking things in. I have put a couple of
>> draft manual pages for the primatives here:
> Couple of things out of the way first:
> - Thank you very much for picking this up.  It is sorely needed, and
> I'm sorry that I haven't had the time to really run with it.
> - I'm VERY sorry it has taken me so long to provide a real response.
> And now, on to the real commentary :-)
> First of all, I am assuming that you also picked up my turnstile
> implementation and are going to bring it along with the rest of this
> stuff?  If so, good -- that's what I was hoping for, because I would
> love to get rid of the *sleep interfaces completely, move everything
> that needs that functionality to condition variables, and shed the
> legacy sleep queues.
> Secondly, I want to be very careful about straying from the Solaris
> API on this stuff.  Solaris's API is nice and minimalist, and they
> have more than proven that additional baggage in the synchronization
> APIs is not needed in order to get the job done.

I agree with Jason whole-heartedly here.

> ==> Mutex comments
> Overall, this looks great.
> I do, however, think that we should NOT get the *_high() variants. 
> Instead, we should fix the code that needs to release spin locks in
> the wrong order.  This might mean that code delays the transition, but
> I think it's better to force fixing it than it is to add a wart to the
> API.  Note that this might mean defining a strict hierarchy of
> interrupt priority levels... right now, that hierarchy is loose,
> except for a few special cases.

I don't think it needs to be "strict", but we probably need to have at
least this prioritization:

    certain devices that are above the scheduler (splhigh())
    the scheduler (splclock())
    devices below the scheduler (splbio(), splnet())
    soft interrupts (splsoftxxx())

Further prioritization is probably not that useful.
> Perhaps we should add a MUTEX_DRIVER, like the Solaris API now has. 
> OpenSolaris says:
>      54  * MUTEX_DRIVER is always used by drivers.  mutex_init()
> converts this to
>      55  * either MUTEX_ADAPTIVE or MUTEX_SPIN depending on the iblock
> cookie.
> Solaris calls mutex_held() mutex_owned().  We should probably follow
> Solaris in this regard.
> Solaris also has a mutex_owner().  Probably not useful except for
> debugging, but it's easy to implement, so we should probably just have
> it in our API, too.

Use of mutex_owner/mutex_owned outside of ASSERT() is forbidden in
Solaris.  I wouldn't worry too much about this -- but drivers thinking
they want to use these are probably suffering from poor design.

> Please avoid changing the ABI if DIAGNOSTIC or LOCKDEBUG are defined. 
> I know I made this mistake in my first cut, but let's fix it now :-)

Yes,  a single ABI that works all the time is much to be preferred,
especially for LKMs.

> The documentation implies that holding an adaptive mutex across a
> sleep is not allowed.  Actually, it is allowed, but of course leads to
> increased contention for the mutex.  That said, some things might need
> to hold other things off while it sleeps waiting for some condition,
> so it should be officially allowed by the API.
> ==> Rwlock comments
> I don't think we should have adaptive vs. spin-only rwlocks.  Solaris
> only has sleeper-style rwlocks, and we ourselves only have
> spinlockmgr() for a few special cases.
> Again, let's get rid of the *_high() versions.  Do we really have
> spinlockmgr()-locks now that are released out-of-order?
> Solaris has only rw_enter() with a krw_t argument to indicate the
> mode.  We should stick with the Solaris API for this.  Also, Solaris
> only has a single rw_exit(), and we should follow that lead.
> Please rename rw_try_enter() and rw_try_upgrade() to rw_tryenter() and
> rw_tryupgrade(), to follow the Solaris API.
> krwlock_op_t should probably be renamed to krw_t, but that is a nit,
> and probably not that important (who actually declares a krw_t in any
> of their own data structures? :-)
> rw_downgrade() should not need the krwlock_op_t argument -- by
> definition, you are always going from writer->reader, so the extra
> argument is redundant.
> Similarly, rw_tryupgrade() does not need the krwlock_op_t argument,
> for the same reason (always reader->writer).
> Add a rw_lock_held() which in Solaris is equivalent to
> (rw_read_held(rw) || rw_write_held()).
> Again, sleeping rwlocks are allowed to be held across sleeps, and the
> documentation should reflect that.
>> Note that in the non-DIAGNOSTIC / MULTIPROCESSOR case, the spin locks
>> basically boil down to at most splraiseipl()/splx(). The adaptive locks
>> remain, however. I don't see that as a big problem:
> Not only is it not a problem, it's a requirement so that LKMs can work
> in UP -or- MP kernels :-)


    -- Garrett
> -- thorpej

Garrett D'Amore, Principal Software Engineer
Tadpole Computer / Computing Technologies Division,
General Dynamics C4 Systems
Phone: 951 325-2134  Fax: 951 325-2191