Subject: changes for hppa pthreads
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 07/14/2004 09:26:14
hi folks,

we need a few changes to the MI libpthread and kernel code to allow for
a number of unique properties of the PA hardware and ABI:

 (1) PLABELs

    under this convention (as on AIX), C function pointers are indirect.
    the actual value put into a register is a pointer to 2 values:
    the real code address and a pointer to per-shared-object global data.
    this shows up in pthread__resolve_locks(), where we are comparing
    a saved PC value to a particular function pointer.  the current code
    does this with an integer comparison, but for PA we need it to be
    a comparison of function pointers.  gcc emits calls to a millicode
    function called "__canonicalize_funcptr_for_compare" which deals with
    either of the function pointers being PLABELs.
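    to make the difference concrete, here is a small C model of why the
    integer comparison breaks.  it is only a sketch under stated
    assumptions: struct ex_plabel, ex_canonicalize() and the other ex_*
    names are hypothetical, ex_canonicalize() only approximates what
    __canonicalize_funcptr_for_compare does, and using the value-2 bit as
    the "this is a PLABEL" marker is an assumption of the model, not a
    statement about the real ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* model of a PLABEL: the 2 words a PA function pointer refers to */
struct ex_plabel {
	uintptr_t code;		/* real code (entry) address */
	uintptr_t gp;		/* per-shared-object global data pointer */
};

/*
 * assumption for this model only: a function-pointer value with this
 * bit set is the address of a PLABEL rather than a code address.
 */
#define EX_PLABEL_TAG	((uintptr_t)2)

/* rough stand-in for __canonicalize_funcptr_for_compare */
static uintptr_t
ex_canonicalize(uintptr_t fp)
{
	if (fp & EX_PLABEL_TAG) {
		const struct ex_plabel *pl =
		    (const struct ex_plabel *)(fp & ~EX_PLABEL_TAG);
		return pl->code;
	}
	return fp;		/* already a plain code address */
}

/* naive check: integer comparison, wrong when fp is a PLABEL */
static bool
ex_pc_is_int(uintptr_t saved_pc, uintptr_t fp)
{
	return saved_pc == fp;
}

/*
 * canonicalizing check.  saved_pc is assumed to already be a plain
 * code address, so only the function pointer is canonicalized.
 */
static bool
ex_pc_is(uintptr_t saved_pc, uintptr_t fp)
{
	return saved_pc == ex_canonicalize(fp);
}
```

    with a PLABEL-style pointer, ex_pc_is_int() reports a mismatch even
    though the saved PC is the function's entry point, while ex_pc_is()
    matches.  in the real code the fix is just to compare values of
    function-pointer type and let gcc insert the canonicalization call.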

 (2) stack grows up

    on all other platforms the stack grows toward smaller addresses;
    on the PA it grows toward larger addresses.  I added a hook
    to expose the kernel STACK_* macros to userland so I could use these
    in the libpthread functions that muck with stacks.
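    as a rough illustration of what the stack-handling code has to get
    right, here is a sketch of the two pieces of arithmetic that flip on
    the PA: where a fresh stack starts, and how a block is carved off it.
    the ex_stack_* functions are hypothetical stand-ins for the kernel
    STACK_* macros (not their actual definitions), and the direction is a
    runtime flag only so both cases can be shown side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* initial stack pointer for a new thread stack [base, base + size) */
static char *
ex_stack_initial_sp(char *base, size_t size, bool grows_up)
{
	return grows_up ? base : base + size;
}

/*
 * allocate n bytes on the stack and return the start of the block.
 * growing down: move sp first, the block starts at the new sp.
 * growing up (PA): the block starts at the old sp, then sp moves past it.
 */
static char *
ex_stack_alloc(char **spp, size_t n, bool grows_up)
{
	char *block;

	if (grows_up) {
		block = *spp;
		*spp += n;
	} else {
		*spp -= n;
		block = *spp;
	}
	return block;
}
```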

 (3) spinlock values are reversed (0 is locked, non-zero is unlocked)

    the atomic instruction on the PA puts a 0 into a memory address
    while returning the old value.  this means that an uninitialized
    (zero-filled) spinlock is held instead of free, so we must be sure
    that *all* locks are initialized before being used (there were
    several global locks in libpthread that were not initialized).
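    a minimal model of the reversed convention, assuming C11 atomics:
    atomic_exchange() storing 0 stands in for the PA's load-and-clear
    instruction (ldcw), and the ex_spin_* names and the choice of 1 as
    the "free" value are hypothetical.  what it shows is that a
    zero-filled lock, such as an uninitialized global, can never be
    acquired until it is explicitly initialized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* PA convention: 0 means held, any non-zero value means free */
#define EX_LOCK_HELD	0
#define EX_LOCK_FREE	1

typedef atomic_int ex_spinlock_t;

/* must be called before first use: a zero-filled lock is already held */
static void
ex_spin_init(ex_spinlock_t *l)
{
	atomic_store(l, EX_LOCK_FREE);
}

/*
 * model of load-and-clear: store 0 and get the old value back.
 * the lock was acquired iff the old value was non-zero.
 */
static bool
ex_spin_try(ex_spinlock_t *l)
{
	return atomic_exchange(l, EX_LOCK_HELD) != EX_LOCK_HELD;
}

static void
ex_spin_unlock(ex_spinlock_t *l)
{
	atomic_store(l, EX_LOCK_FREE);
}
```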

 (4) spinlocks must be 16-byte aligned

    the atomic instruction on the PA additionally only works when the
    address it's used on is on a 16-byte boundary.  I haven't done
    anything about this yet, since RAS (restartable atomic sequences)
    is sufficient for the moment.

    the current definition of __cpu_simple_lock_t for PA tries to
    use the GCC "aligned" attribute in a typedef, but that doesn't
    actually have any effect.  to fix this, we'll eventually need to
    define this as an array of 4 ints (or a structure containing such
    an array) and then just use the element that's on the 16-byte boundary.
    I was thinking this would require adding some more macros like

	__SIMPLELOCK_SET_LOCKED()
	__SIMPLELOCK_SET_UNLOCKED()
	__SIMPLELOCK_ISLOCKED()

    and using those instead of having assumptions in MI code that
    __cpu_simple_lock_t is an integral type.  but that'll come later.
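    a sketch of the array-of-4-ints idea, assuming the structure is at
    least int (4-byte) aligned and that int is 4 bytes: under those
    assumptions exactly one of the four words lands on a 16-byte
    boundary, and the lock operations touch only that word.
    ex_cpu_simple_lock_t and the ex_simplelock_* functions are
    hypothetical stand-ins for the proposed __SIMPLELOCK_* macros, whose
    actual arguments haven't been decided; the 0/1 values follow the PA
    convention from (3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical layout: 4 ints, exactly one of them 16-byte aligned */
typedef struct {
	volatile int words[4];
} ex_cpu_simple_lock_t;

/* return the one word that sits on a 16-byte boundary */
static volatile int *
ex_lockword(ex_cpu_simple_lock_t *l)
{
	uintptr_t misalign = (uintptr_t)&l->words[0] & 15;
	size_t idx = ((16 - misalign) & 15) / sizeof(int);

	assert(misalign % sizeof(int) == 0);	/* needs int alignment */
	return &l->words[idx];
}

/* stand-ins for the proposed macros, using PA values (0 = locked) */
static void
ex_simplelock_set_locked(ex_cpu_simple_lock_t *l)
{
	*ex_lockword(l) = 0;
}

static void
ex_simplelock_set_unlocked(ex_cpu_simple_lock_t *l)
{
	*ex_lockword(l) = 1;
}

static bool
ex_simplelock_islocked(ex_cpu_simple_lock_t *l)
{
	return *ex_lockword(l) == 0;
}
```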


I've put the changes in ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/hp700/ :

	hppa-pthreads-2004071401.tgz
	diff.hppa-pthreads.lib.2004071401
	diff.hppa-pthreads.sys.2004071401

extract the tar file, apply both patches, and it should build.
the diffs still contain a little debug code (which will be removed)
and some other fixes for floating-point on PA-7300LC CPUs, which will
be split out and committed separately.

one thing I'd like to get more input on is what should go into
struct mcontext.  right now I've got it as:

	31 general registers (r0 is always 0)
	32 floating-point registers
	PSW (process status word)
	SAR (shift amount register, aka cr11)
	pcsqh, pcsqt, pcoqh, pcoqt (PC regs: space and offset, current and next)
	sr0 to sr4
	cr26 and cr27 (cr27 is the ABI's thread-local-storage register,
		       cr26 is also visible from user code but I don't know
		       if there's any convention for its use)

currently we ignore attempts to change the space registers (which works for
now since we only give each process permission to access its one space anyway),
but it seems worth including them in case we want to support multiple
spaces per process someday.
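for discussion, the field list above could translate into a C layout
roughly like the following.  this is a sketch only: the type and field
names, the ordering and the 32-bit register width (for hp700) are
assumptions, not the layout in the posted diffs.  the accessor shows one
consequence of leaving r0 out of the array.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ex_reg_t;	/* assumed 32-bit general registers */

typedef struct {
	ex_reg_t gregs[31];	/* r1..r31; r0 always reads 0, not stored */
	uint64_t fpregs[32];	/* 64-bit floating-point registers */
	ex_reg_t psw;		/* process status word */
	ex_reg_t sar;		/* shift amount register (cr11) */
	ex_reg_t pcsqh, pcsqt;	/* PC space queue: current, next */
	ex_reg_t pcoqh, pcoqt;	/* PC offset queue: current, next */
	ex_reg_t sr[5];		/* space registers sr0..sr4 */
	ex_reg_t cr26, cr27;	/* cr27 is the ABI's TLS register */
} ex_mcontext_t;

/* read general register n, supplying the hardwired zero for r0 */
static ex_reg_t
ex_mc_getreg(const ex_mcontext_t *mc, unsigned n)
{
	assert(n < 32);
	return n == 0 ? 0 : mc->gregs[n - 1];
}
```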

is there anything that should be added to or removed from mcontext?


other comments?  questions?  complaints?  if there are no objections
I'll commit this stuff this weekend.

-Chuck