tech-kern: Re: newlock

Subject: Re: newlock
To: Andrew Doran <ad@NetBSD.org>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/07/2006 13:23:51
On Sep 2, 2006, at 8:22 AM, Andrew Doran wrote:

> Hi all,
>
> I have been doing a bit of work on the newlock primatives that Jason
> Thorpe added a few years back. I'd appreciate some constructive review
> of the API before I start checking things in. I have put a couple of
> draft manual pages for the primatives here:
>
> 	ftp://ftp.netbsd.org/pub/NetBSD/misc/ad/newlock/mutex.txt
> 	ftp://ftp.netbsd.org/pub/NetBSD/misc/ad/newlock/rwlock.txt

Couple of things out of the way first:

- Thank you very much for picking this up.  It is sorely needed, and  
I'm sorry that I haven't had the time to really run with it.

- I'm VERY sorry it has taken me so long to provide a real response.

And now, on to the real commentary :-)

First of all, I am assuming that you also picked up my turnstile  
implementation and are going to bring it along with the rest of this  
stuff?  If so, good -- that's what I was hoping for, because I would  
love to get rid of the *sleep interfaces completely, move everything  
that needs that functionality to condition variables, and shed the  
legacy sleep queues.

Secondly, I want to be very careful about straying from the Solaris  
API on this stuff.  Solaris's API is nice and minimalist, and they  
have more than proven that additional baggage in the synchronization  
APIs is not needed in order to get the job done.

==> Mutex comments

Overall, this looks great.

I do, however, think that we should NOT get the *_high() variants.   
Instead, we should fix the code that needs to release spin locks in  
the wrong order.  This might mean that code delays the transition, but  
I think it's better to force fixing it than it is to add a wart to the  
API.  Note that this might mean defining a strict hierarchy of  
interrupt priority levels... right now, that hierarchy is loose,  
except for a few special cases.

Perhaps we should add a MUTEX_DRIVER, like the Solaris API now has.   
OpenSolaris says:

      54  * MUTEX_DRIVER is always used by drivers.  mutex_init()  
converts this to
      55  * either MUTEX_ADAPTIVE or MUTEX_SPIN depending on the  
iblock cookie.

Solaris calls mutex_held() mutex_owned().  We should probably follow  
Solaris in this regard.

Solaris also has a mutex_owner().  Probably not useful except for  
debugging, but it's easy to implement, so we should probably just have  
it in our API, too.

Please avoid changing the ABI if DIAGNOSTIC or LOCKDEBUG are defined.   
I know I made this mistake in my first cut, but let's fix it now :-)

The documentation implies that holding an adaptive mutex across a  
sleep is not allowed.  Actually, it is allowed, but of course leads to  
increased contention for the mutex.  That said, some things might need  
to hold other things off while it sleeps waiting for some condition,  
so it should be officially allowed by the API.

==> Rwlock comments

I don't think we should have adaptive vs. spin-only rwlocks.  Solaris  
only has sleeper-style rwlocks, and we ourselves only have spinlockmgr 
() for a few special cases.

Again, let's get rid of the *_high() versions.  Do we really have  
spinlockmgr()-locks now that are released out-of-order?

Solaris has only rw_enter() with a krw_t argument to indicate the  
mode.  We should stick with the Solaris API for this.  Also, Solaris  
only has a single rw_exit(), and we should follow that lead.

Please rename rw_try_enter() and rw_try_upgrade() to rw_tryenter() and  
rw_tryupgrade(), to follow the Solaris API.

krwlock_op_t should probably be renamed to krw_t, but that is a nit,  
and probably not that important (who actually declares a krw_t in any  
of their own data structures? :-)

rw_downgrade() should not need the krwlock_op_t argument -- by  
definition, you are always going from writer->reader, so the extra  
argument is redundant.

Similarly, rw_tryupgrade() does not need the krwlock_op_t argument,  
for the same reason (always reader->writer).

Add a rw_lock_held() which in Solaris is equivalent to (rw_read_held 
(rw) || rw_write_held()).

Again, sleeping rwlocks are allowed to be held across sleeps, and the  
documentation should reflect that.

> Note that in the non-DIAGNOSTIC / MULTIPROCESSOR case, the spin locks
> basically boil down to at most splraiseipl()/splx(). The adaptive  
> locks
> remain, however. I don't see that as a big problem:

Not only is it not a problem, it's a requirement so that LKMs can work  
in UP -or- MP kernels :-)

-- thorpej