Subject: port-mips/25942: setcontext() causes kernel panic on MIPS
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <wileyc@rezrov.net>
List: netbsd-bugs
Date: 06/16/2004 15:36:38
>Number:         25942
>Category:       port-mips
>Synopsis:       setcontext() causes kernel panic on MIPS
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-mips-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 16 06:38:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Christopher SEKIYA
>Release:        NetBSD 2.0_BETA
>Organization:
	
>Environment:
	
	
System: NetBSD indigo.rezrov.net 2.0_BETA NetBSD 2.0_BETA (GENERIC32_IP2x) #0: Sun Jun 13 20:36:50 JST 2004 wileyc@izu:/usr/builder/sgimips-2.0/obj/sys/arch/sgimips/compile/GENERIC32_IP2x sgimips
Architecture: mipseb
Machine: sgimips
>Description:

On 23 March 2004, a change was committed to lib/libc/arch/mips/gen that
implements longjmp() on top of setcontext(), replacing the sigreturn()-based
scheme used in 1.6.

Unfortunately, this change causes kernel panics under various circumstances;
invoking csh as a login shell is the most visible case.  This happens under
both -current (as of 16 June) and 2.0_BETA.  On sgimips, the panic is:

Apr 22 13:28:07 mod80 login: ROOT LOGIN (root) ON console
bus error: cpu_stat 00000203 addr 0887fe40, gio_stat 00000000 addr 1fbc4003
panic: cache error @ EPC 0x882b0010 ErrCtl 0x3 CacheErr 0xa01934f3
panic: cache error @ EPC 0x882b579c ErrCtl 0x3 CacheErr 0xa03b5519
panic: cache error @ EPC 0x882b579c ErrCtl 0x3 CacheErr 0xa03b5519
Stopped in pid 305.1 (csh) at   0x882b1b18:     jr      ra
                bdslot: nop
db>

This is Not Good(tm).

Possible causes:

	* longjmp() is not restoring registers properly.
	* The rtld isn't doing fixups properly.  A statically linked csh does
	  not produce the panic.
	* Cache botch.  Apparently, an Alchemy Pb1500 doesn't have any problems
	  at all -- but its CPU is the only MIPS CPU supported by NetBSD that
	  has a fully coherent cache.
	* pmap botch.  Suggested by Christos.
	* Toolchain miscompilation.  Suggested by Nishimura-san.
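
One way to help separate the first hypothesis from the others might be to
exercise getcontext()/setcontext() directly, bypassing longjmp() and its
jmp_buf handling.  The following is only a sketch; the function name
setcontext_roundtrip() is made up for illustration, and it has not been
confirmed that this triggers (or fails to trigger) the panic.

```c
#include <ucontext.h>

/*
 * Hypothetical helper (not part of libc or of this report): resume a
 * saved context `iterations` times via setcontext().
 * Returns the number of resumes performed, or -1 if setcontext()
 * unexpectedly returned.
 */
int
setcontext_roundtrip(int iterations)
{
	ucontext_t uc;
	volatile int passes = 0;	/* volatile: must survive the resume */

	getcontext(&uc);
	/* Execution continues here after every successful setcontext(). */
	if (passes < iterations) {
		passes++;
		setcontext(&uc);
		return -1;		/* only reached if setcontext() failed */
	}
	return passes;
}
```

If a dynamically linked program calling this panics the machine as well, the
jmp_buf handling in longjmp() is less likely to be the culprit.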

>How-To-Repeat:

Invoke csh as a login shell.
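
A smaller test case than csh might be useful.  The sketch below simply
calls setjmp()/longjmp() in a loop; longjmp_stress() and jump_back() are
names invented here, and it is an untested assumption that this reproduces
the panic.  Since a statically linked csh does not panic, it should
presumably be built as a dynamically linked binary.

```c
#include <setjmp.h>

static jmp_buf env;

/* Jump back to the setjmp() in longjmp_stress(); never returns. */
static void
jump_back(void)
{
	longjmp(env, 1);
}

/*
 * Hypothetical helper: perform `iterations` setjmp()/longjmp() round
 * trips and return how many completed.
 */
int
longjmp_stress(int iterations)
{
	volatile int completed = 0;	/* volatile: read after longjmp() */

	while (completed < iterations) {
		if (setjmp(env) == 0)
			jump_back();	/* control comes back via setjmp() */
		completed++;
	}
	return completed;
}
```

A main() that calls longjmp_stress() with a large count and prints the result
is enough to drive it.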

>Fix:

I don't know exactly what's wrong with setcontext(), but backing out the
change (and adding COMPAT_16 to the kernel config) prevents the panics.

The patch I've been using to back out the change is:

Index: lib/libc/arch/mips/gen/Makefile.inc
===================================================================
RCS file: /cvsroot/src/lib/libc/arch/mips/gen/Makefile.inc,v
retrieving revision 1.24
diff -u -r1.24 Makefile.inc
--- lib/libc/arch/mips/gen/Makefile.inc	23 Mar 2004 12:31:52 -0000	1.24
+++ lib/libc/arch/mips/gen/Makefile.inc	13 Jun 2004 10:23:44 -0000
@@ -15,7 +15,7 @@
 SRCS+=	flt_rounds.c fpgetmask.c fpgetround.c fpgetsticky.c fpsetmask.c \
 	fpsetround.c fpsetsticky.c
 
-SRCS+=	setjmp.S __setjmp14.S __longjmp14.c
+SRCS+=	setjmp.S __setjmp14.S
 SRCS+=	_setjmp.S
 SRCS+=	sigsetjmp.S __sigsetjmp14.S
 SRCS+=	byte_swap_2.S byte_swap_4.S bswap64.c
Index: lib/libc/arch/mips/gen/__setjmp14.S
===================================================================
RCS file: /cvsroot/src/lib/libc/arch/mips/gen/__setjmp14.S,v
retrieving revision 1.10
diff -u -r1.10 __setjmp14.S
--- lib/libc/arch/mips/gen/__setjmp14.S	23 Mar 2004 02:21:49 -0000	1.10
+++ lib/libc/arch/mips/gen/__setjmp14.S	13 Jun 2004 10:23:45 -0000
@@ -130,6 +130,23 @@
 	move	v0, zero
 	j	ra
 	REG_EPILOGUE
+END(__setjmp14)
+
+LEAF(__longjmp14)
+#ifdef __ABICALLS__
+	.set	noreorder
+	.cpload	t9
+	.set	reorder
+	subu	sp, sp, 32
+	.cprestore 16
+#endif
+	REG_PROLOGUE
+	/* save return value in sc_regs[_R_V0] */
+	REG_S	a1,(_OFFSETOF_SC_REGS + _R_V0 * SZREG)(a0)
+	REG_EPILOGUE
+	li	v0, SYS_compat_16___sigreturn14
+	syscall
 botch:
+	jal	_C_LABEL(longjmperror)
 	jal	_C_LABEL(abort)
-END(__setjmp14)
+END(__longjmp14)
>Release-Note:
>Audit-Trail:
>Unformatted: