I recently started hacking on NetBSD/amiga and have been working on a patch that I'd like to contribute to -current. I still need to wrap the code with an appropriate #ifdef and do some more testing, but I'd like to get some feedback and find out if there's an open PR for this issue, or if I should create a new PR for the patch. I've attached the current version with comments below. It's only beneficial for m68k systems with a hardware FPU. The problem is that -current has new TLS support which is currently implemented on m68k with a system call to _lwp_getprivate(). This causes a storm of system calls when pthreads are linked, as every call to pthread_self() results in a system call, including stdio calls, which are protected by POSIX mutexes. Another symptom is that ATF test cases mutex2 and mutex3 (in lib/libpthread/t_mutex) time out after 300 seconds. I patched lib/libc/arch/m68k/sys/__m68k_read_tp.S to cache the value returned from _lwp_getprivate in %fp7, with the high 32 bits set to indicate a signaling NaN with a specific bit pattern (the default FP register value is 0x7fffffff...). On entry to the function, %fp7 is pushed to the stack, and if the high long word matches the special bit pattern, the lower 32 bits are returned as the TLS pointer. If the most significant 16 bits of %fp7 are not 0x7fff, then %fp7 is presumed to be in use, and the value is not overwritten. It was also necessary to patch _lwp_makecontext() in lib/libc/arch/m68k/gen/_lwp.c to update %fp7 with the new TLS base when a new context is created, or else it would inherit the cached TLS base of the parent LWP. The FP registers are stored as extended precision in the uc_mcontext, so the magic bit pattern and TLS base are shifted 11 bits. The register is only modified if _UC_FPU is set in uc_flags and the current value of %fp7 is a signaling NaN. No kernel changes or other diffs are required. Because %fp7 is the lowest priority register for GCC, I could find very little usage in the NetBSD userland (at least from running "objdump -d" and grepping for "fp7"). I'm investigating whether GCC generated code is any different if I change %fp7 to reserved in FIXED_REGISTERS in gcc/config/m68k/m68k.h. At any rate, all existing code will continue to work due to the check for the special NaN bit pattern, unless there is some code out there that is specifically (ab)using NaN values in FP registers in the same way to hold integer values. My system is stable so far, and test cases mutex2 and mutex3 now pass (in 148 seconds, on a 40MHz 68040). The attached patch includes two other small and somewhat related changes: I commented out the SMT spinloop code in libpthread/pthread_lock.c (with #ifdef SMP), and changed "tas" to "bset #7" in sys/arch/m68k/include/lock.h. I'd like to see if all of the spinloop code can be commented out for non-SMP archs like m68k, and changed to sched_yield() or similar. The FreeBSD implementation of pthreads reads the "hw.ncpu" sysctl and only spins if ncpu > 1. So I think it'd be nice if NetBSD only included the spinloop code on archs that support SMP, and then only activated it when the CPU count is > 1. The pthread_lock.c modification seems the safest, as it already called sched_yield(), but there are other places in pthreads that only spinwait, so I need to do more testing to see how the behavior changes when sched_yield() is called in all places that would otherwise spin. Any suggestions for the best way to approach this set of changes would be appreciated. As for the "tas" instruction, the Amiga hardware reference manual warns that the special read-modify-write cycle it generates is unsafe on the Amiga platform. I haven't seen any indication of flakiness on my system, but it has the same effect as "bset #7" on non-SMP systems, and because NetBSD doesn't support SMP on m68k (AFAIK), it should be safest to change this for all m68k machines. There's a comment to this effect in external/gpl3/gcc/dist/libstdc++-v3/config/cpu/m68k/atomicity.h, where "tas" is only used when __mcpu32__ or __mcfisab__ or __mcfisac__ is defined, and "bset #7" is used otherwise. For the %fp7 changes, what would be an appropriate #ifdef guard for the new code? I'm thinking #ifndef SOFTFLOAT, but perhaps there's something better? I'm pretty sure that the software FP emulator will run it correctly, but I suspect it'd be slower than the current code that always calls _lwp_getprivate. At least for Amiga, Atari, and Mac68k, I'd assume that the vast majority of NetBSD systems (especially running -current) have an FPU, so I think it should be enabled by default for those platforms (and possibly all except for sun2)? Regards, Jake Hamby
Attachment:
use-fp7-for-tls-patch.diff
Description: Binary data