Subject: Re: SMP API things, take 2
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Stefan Grefen <Stefan.Grefen@tantau.com>
List: tech-smp
Date: 07/30/1999 01:14:57
Jason Thorpe wrote:
> 
> On Fri, 30 Jul 1999 00:21:40 +0200
>  Stefan Grefen <Stefan.Grefen@tantau.com> wrote:
> 
>  > I checked the mail and can't find it (besides the 0/1 issue)
>  > On PA-Risc a lock must be cache-line sized. You don't want a lock per page
>  > or so. HPUX creates a hash table of physical locks. When a spinlock is
>  > initialized it is entered into the hash table (by address); multiple
>  > spinlocks share a physical lock. This reduces memory demand by an order
>  > of magnitude. Having a definite end to its lifetime helps a lot here
>  > (esp. if the lock is still locked when discarded).
> 
> Aha, I see what you are saying now.  So, in this case, locking one thing
> might lock N other things, too?  That's unfortunate; it greatly complicates
> the case where you want to lock multiple things (using an ordered locking
> protocol, of course :-)

That's the way HPUX does it. If I remember correctly, the xchg32 instruction
is the fastest atomic instruction on the PA-Risc (maybe on the V and X class
only), and can we afford 32 bytes for every simple lock (every page???)?
That's why HP came up with the hashed locks.
There are other architectures with restrictions on locks (anything that's
NUMA, probably) where your lock will be a pointer into memory that can hold
locks.

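Roughly, the scheme looks like this (a sketch with made-up names and sizes,
not the HPUX code; xchg32() here stands for whatever atomic-swap primitive
the port provides, and 32-byte cache lines are assumed):

/*
 * Sketch of the hashed-physical-lock idea.  A simple lock needs no
 * storage of its own beyond its address: lock/unlock hash that address
 * into a fixed table of cache-line-sized physical locks, so many
 * logical locks share one physical lock.
 */
extern unsigned int xchg32(volatile unsigned int *addr, unsigned int val);

#define CACHE_LINE	32		/* assumed line size */
#define NPLOCKS		128		/* must be a power of two */

struct physlock {
	volatile unsigned int	pl_word;	/* word the atomic op hits */
	char			pl_pad[CACHE_LINE - sizeof(unsigned int)];
};

static struct physlock physlocks[NPLOCKS]
    __attribute__((aligned(CACHE_LINE)));

/* Map a logical lock's address onto its shared physical lock. */
static struct physlock *
plock_for(volatile void *lockaddr)
{
	unsigned long h = (unsigned long)lockaddr >> 4;

	return &physlocks[h & (NPLOCKS - 1)];
}

void
hashed_lock(volatile void *lockaddr)
{
	struct physlock *pl = plock_for(lockaddr);

	while (xchg32(&pl->pl_word, 1) != 0)
		continue;		/* spin until the shared lock is free */
}

void
hashed_unlock(volatile void *lockaddr)
{
	plock_for(lockaddr)->pl_word = 0;
}

The complication Jason mentions is visible right in plock_for(): two
unrelated locks whose addresses collide end up serializing on the same
physical lock.
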
Putting it in place now is not much work (before the locks are everywhere).
It doesn't hurt performance (it's a NOP), and we may be glad it is in place
in the future.

> 
> My solution was to tell the compiler to align the lock_data (according
> to Bill Sommerfeld, the atomic operations will work if 16-byte aligned,
> so the compiler can do this with a simple __attribute__ directive).

Space is the issue, not alignment. For example, a vm_anon struct would
increase 5-fold in size; it is a lock and 4 ints or pointers.
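Roughly, the arithmetic looks like this (field names are made up, not the
real struct vm_anon; assuming 32-bit pointers and 32-byte cache lines):

/* Plain version: a lock word plus 4 ints/pointers, about 20 bytes. */
struct anon_plain {
	volatile int	 an_lck;
	int		 an_ref;
	void		*an_page;
	int		 an_swslot;
	void		*an_link;
};

/*
 * Giving the embedded lock a cache line of its own, and aligning the
 * struct so the guarantee holds in arrays, rounds sizeof() up to
 * 64 bytes here (more with larger lines); hence the several-fold growth.
 */
struct anon_padded {
	volatile int	 an_lck;
	char		 an_pad[32 - sizeof(int)];
	int		 an_ref;
	void		*an_page;
	int		 an_swslot;
	void		*an_link;
} __attribute__((aligned(32)));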

> 
>  > It is also a great help in debugging; without it the cpu-lock will go
>  > astray and it's hard to figure out where the locks are lost. This
>  > function would be a NOP in the normal case, but could complain if a lock
>  > is returned locked, or just account for the cpu lock count.
> 
> LOCKDEBUG already has checks for this; if you free an object with an
> embedded lock and it is already locked, it complains at you.

Not everything comes out of a pool. And simple_lock_destroy will be much
faster with LOCKDEBUG turned on than checking all locks when freeing memory.
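Something like this would be enough (a sketch, not existing NetBSD code; it
assumes the existing struct simplelock with its lock_data field, and that 0
means unlocked, which as noted is a per-port convention):

#if defined(LOCKDEBUG)
void	panic(const char *, ...);	/* the kernel's panic() */

void
simple_lock_destroy(struct simplelock *alp)
{
	/* Complain if the lock is being discarded while still held. */
	if (alp->lock_data != 0)
		panic("simple_lock_destroy: %p still locked", alp);
	/* ...and drop it from the LOCKDEBUG / hashed-lock tables here. */
}
#else
#define	simple_lock_destroy(alp)	/* NOP: compiles away entirely */
#endif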
  
> 
>         -- Jason R. Thorpe <thorpej@nas.nasa.gov>