tech-net: Re: mbuf external storage sharing

Subject: Re: mbuf external storage sharing
To: None <david@l8s.co.uk, martin@duskware.de>
From: List Mail User <track@Plectere.com>
List: tech-net
Date: 10/08/2004 08:54:18
	One (not widely) known algorithm uses the difference in size between
the largest and smallest atomically addressable units to do "small" numbers
of client spin-locks.  For thirty-two bit machines with byte addressing
this allows 4 processors (a number someone else called "large"), to easily
perform a very cheap spin-lock (from memory, the contention-free case was
about 16 instructions and 24 cycles - due to memory delays on a MIPS 3000
case SGI Iris).

	If this would be useful to anyone, I can probably dig up some
code (or better re-write from scratch since I may not have `rights' to
any old code I find).  Basically, its a variant of the "magic" value
algorithm where each processor is pre-assigned a single "small" addressable
unit -- You read the "large" unit, write the "small" unit and use a pseudo-
random backoff if a collision occurs;  If when reading the "large" unit,
the processor's specific "small" unit is the only non-zero value, the lock
has been acquired and of course you spin whenever the "large" unit is not
zero (there's a little more detail, but this should be enough for most
readers).  Of course, many machines have MD methods that are cheaper,
scale better or both (AFAIR, large SGI Iris' had hardware supported spin-lock
registers on the processor board and mapped into a special memory segment).

	Paul Shupak

P.S.  This obviously gives an 8 processor case for most 64-bit machines,
or even on many 32-bit machines with 64-bit memory if coded to use the
available 64-bit instructions (like on an x86 with floating point).

>From tech-net-owner-track=Plectere.com@NetBSD.org Fri Oct  8 02:53:38 2004
>Delivered-To: tech-net@netbsd.org
>Date: Fri, 8 Oct 2004 10:51:59 +0100
>From: David Laight <david@l8s.co.uk>
>To: Martin Husemann <martin@duskware.de>
>Cc: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>, jonathan@dsg.stanford.edu,
>        tech-net@NetBSD.org
>Subject: Re: mbuf external storage sharing
>Mail-Followup-To: Martin Husemann <martin@duskware.de>,
>	YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>, jonathan@dsg.stanford.edu,
>	tech-net@NetBSD.org
>References: <E1CFhKX-0003Qh-00@smeg.dsg.stanford.edu> <1097202573.157728.1552.nullmailer@yamt.dyndns.org> <20041008083015.GB26810@drowsy.duskware.de>
>Mime-Version: 1.0
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: inline
>In-Reply-To: <20041008083015.GB26810@drowsy.duskware.de>
>User-Agent: Mutt/1.4.2.1i
>Sender: tech-net-owner@NetBSD.org
>Precedence: list
>
>On Fri, Oct 08, 2004 at 10:30:15AM +0200, Martin Husemann wrote:
>> On Fri, Oct 08, 2004 at 11:29:33AM +0900, YAMAMOTO Takashi wrote:
>> > 	(it might be good to have MI-exposed atomic_ operations like
>> > 	freebsd and linux.  but it's a separate topic.)
>> 
>> Yes, in deed.
>> Are there any archs that can not easily provide macros for those, so 
>> we could copy the FreeBSD interface?
>> 
>> Otherwise we'd need a feature test macro and fall back to spinlocks,
>> for archs that don't provide the atomice macros.
>
>You can't do atomic increment on sparc, I don't think you can do it on
>ppc (also arm - but i've not seem MP arm).
>
>On architectures with an 'atomic exchange woth memory' you can:
>
>For accurate statistics (but not reference counts) you can use an
>'atomic exchange with memory' to write the new value, and repeat if the
>returned value has changed.
>
>For a reference counts you can use the exchange to write a 'magic' value
>(eg ~0) when reading the current value, then write back the new one.
>Loop and repeat if the read returns the 'magic value'.
>
>Note that these both (effectively) implement a spin-lock using the
>data area as a lock.
>
>For stats, per-processor counts are usually better.
>
>	David
>
>-- 
>David Laight: david@l8s.co.uk
>