Subject: kern/17150: userland program make sparc64 fall over
To: None <gnats-bugs@gnats.netbsd.org>
From: None <lha@stacken.kth.se>
List: netbsd-bugs
Date: 06/03/2002 03:28:32
>Number:         17150
>Category:       kern
>Synopsis:       userland program make sparc64 fall over
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 02 18:29:00 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Love
>Release:        NetBSD 1.5ZC
>Organization:
	Stacken Computer Club
>Environment:
System: NetBSD nutcracker.stacken.kth.se 1.5ZC NetBSD 1.5ZC (NUTCRACKER) #18: Wed May 29 13:16:51 CEST 2002 lha@nutcracker.stacken.kth.se:/usr/src/sys/arch/i386/compile/NUTCRACKER i386
Architecture: i386
Machine: i386
>Description:

arla is an AFS implementation that uses a package called lwp
for context switching. Since I got/stole/borrowed an UltraSparc,
I thought I should try it out.

Now we didn't support the sparcv6 for netbsd (just linux and
solaris), so it should be a simple hack to make it support
netbsd too.

The code we inherited was wrong; for example, it aligns the stack to
the wrong boundary (linux programs never noticed since they ran in 32
bit mode, at least when I tried it 2 years ago). Now, I'm a lousy
assembler programmer and got it wrong too, and two wrongs don't make a
right.

So,

	# pwd 
	/sources/arla-obj
	# cd lwp    
	# ./testlwp
	usage: ./testlwp cmd ...
	Where cmd is one of:
	pc              Producer Consumer test
	sleep           Sleeptest
	selectconsumer  Select consumer
	selectproducer  (special case, just print a string on stdout repeatally)
	cancel          Test iomgr cancel
	deadlock-write  deadlockdetection
	deadlock-read   deadlockdetection
	deadlock-read2  deadlockdetection
	overrun-stack   over run the stack
	underrun-stack  under run the stack
	version         Print version
	Use several of these tests together to test their interopability
	# ./testlwp pc
	starting LWkdb breakpoint at 10086bc
	1 tt=30 tstate=4411080405 tpc=0x1001498 tnpc=0x100149c
	2 tt=30 tstate=82000603 tpc=0x1298d30 tnpc=0x1298d34
	Stopped in pid 272 (testlwp) at winfixspill+0x1c8:      nop
	db> trace
	end(trap type 0x34: pc=100ab30 npc=100ab34 pstate=800016<PEF,PRIV,IE>
	kernel trap 34: mem address not aligned
	Type  'go' to resume
	ok go
	Faulted in DDB; continuing...
	db> reboot 100
	syncing disks... P support
	startin9 g I8 OMGR support
	3 done
	Frame pointer is at 0x1c08e01
	Call traceback:
	12b9ba4(0, 1, 1819400, 180c800, 1839e00, 180c800, 1c08ec1) fp = 1c08ec1
	1136c34(100, 0, 1839de0, 1839c00, 13413e0, 187a800, 1c08f81) fp = 1c08f81
	1136890(10086c0, 0, ffffffffffffffff, 1c09920, 1136bec, 187a960, 1c09051) fp = 1c09051
	11363c8(180c9a8, 0, 1, f, f005b2f8, 0, 1c091b1) fp = 1c091b1
	113ad24(10086c0, 10086c0, 187a800, 90d5c20, 1298d30, 1298d34, 1c09291) fp = 1c09291
	12c47c4(0, 0, 0, 0, 30, 1298d34, 1c09361) fp = 1c09361
	12c1c9c(101, 1c09e20, 90d5e6b, 0, 4002d, 0, 1c09421) fp = 1c09421
	1008e40(1c09e20, 101, 10086bc, 140414, 1066f8, 0, 1c09571) fp = 1c09571
	107248(18050e8, 6, 7, 0, 1093c0, 2095a0, 1c09751) fp = 1c09751
	40203654(40210000, 2d0, 2d0, 3c, 0, 0, 40230a2f) fp = 40230a2f
	
	dumping to dev 7,9 offset 262253
	dump starting dump, blkno 262256
	panic: dma0: cannot allocate DVMA address
	kdb breakpoint at 12c4954
	Stopped in pid 272 (testlwp) at cpu_Debugger+0x4:       nop


The interesting functions are savecontext() and returnto(),
but that's not where the crash happens; it happens later.

	(gdb) file testlwp
	Reading symbols from testlwp...done.
	(gdb) disas savecontext
	Dump of assembler code for function savecontext:
	0x105b00 <savecontext>: save  %sp, -192, %sp
	0x105b04 <savecontext+4>:       ta  3
	0x105b08 <savecontext+8>:       sethi  %hi(0), %l0
	0x105b0c <savecontext+12>:      mov  %l0, %l0   ! 0x0
	0x105b10 <savecontext+16>:      sethi  %hi(0x209400), %g1
	0x105b14 <savecontext+20>:      or  %g1, 0x1b0, %g1     ! 0x2095b0 <PRE_Block>
	0x105b18 <savecontext+24>:      sllx  %l0, 0x20, %l0
	0x105b1c <savecontext+28>:      or  %l0, %g1, %l0
	0x105b20 <savecontext+32>:      mov  1, %l1
	0x105b24 <savecontext+36>:      stb  %l1, [ %l0 ]
	0x105b28 <savecontext+40>:      stx  %fp, [ %i1 ]
	0x105b2c <savecontext+44>:      stx  %g1, [ %i1 + 8 ]
	0x105b30 <savecontext+48>:      stx  %g2, [ %i1 + 0x10 ]
	0x105b34 <savecontext+52>:      stx  %g3, [ %i1 + 0x18 ]
	0x105b38 <savecontext+56>:      stx  %g4, [ %i1 + 0x20 ]
	0x105b3c <savecontext+60>:      stx  %g5, [ %i1 + 0x28 ]
	0x105b40 <savecontext+64>:      stx  %g6, [ %i1 + 0x30 ]
	0x105b44 <savecontext+68>:      stx  %g7, [ %i1 + 0x38 ]
	0x105b48 <savecontext+72>:      rd  %y, %g1
	0x105b4c <savecontext+76>:      stx  %g1, [ %i1 + 0x40 ]
	0x105b50 <savecontext+80>:      cmp  %i2, 0
	0x105b54 <savecontext+84>:      be,a   0x105b70 <L1>
	0x105b58 <savecontext+88>:      nop 
	0x105b5c <savecontext+92>:      restore 
	0x105b60 <savecontext+96>:      add  %o2, 7, %o2
	0x105b64 <savecontext+100>:     and  %o2, -8, %o2
	0x105b68 <savecontext+104>:     call  %o0
	0x105b6c <savecontext+108>:     sub  %o2, 0xc1, %sp
	End of assembler dump.
	(gdb) disas returnto
	Dump of assembler code for function returnto:
	0x105b78 <returnto>:    ta  3
	0x105b7c <returnto+4>:  ldx  [ %o0 ], %g1
	0x105b80 <returnto+8>:  sub  %g1, 0xc0, %fp
	0x105b84 <returnto+12>: sub  %fp, 0xc0, %sp
	0x105b88 <returnto+16>: ldx  [ %o0 + 0x40 ], %g1
	0x105b8c <returnto+20>: mov  %g1, %y
	0x105b90 <returnto+24>: ldx  [ %o0 + 8 ], %g1
	0x105b94 <returnto+28>: ldx  [ %o0 + 0x10 ], %g2
	0x105b98 <returnto+32>: ldx  [ %o0 + 0x18 ], %g3
	0x105b9c <returnto+36>: ldx  [ %o0 + 0x20 ], %g4
	0x105ba0 <returnto+40>: ldx  [ %o0 + 0x28 ], %g5
	0x105ba4 <returnto+44>: ldx  [ %o0 + 0x30 ], %g6
	0x105ba8 <returnto+48>: ldx  [ %o0 + 0x38 ], %g7
	0x105bac <returnto+52>: sethi  %hi(0), %l0
	0x105bb0 <returnto+56>: mov  %l0, %l0   ! 0x0
	0x105bb4 <returnto+60>: sethi  %hi(0x209400), %g1
	0x105bb8 <returnto+64>: or  %g1, 0x1b0, %g1     ! 0x2095b0 <PRE_Block>
	0x105bbc <returnto+68>: sllx  %l0, 0x20, %l0
	0x105bc0 <returnto+72>: or  %l0, %g1, %l0
	0x105bc4 <returnto+76>: clr  %l1
	0x105bc8 <returnto+80>: stb  %l1, [ %l0 ]
	0x105bcc <returnto+84>: restore 
	0x105bd0 <returnto+88>: restore 
	0x105bd4 <returnto+92>: retl 
	0x105bd8 <returnto+96>: nop 
	End of assembler dump.

It started to crash when I changed

	0x105b6c <savecontext+108>:     sub  %o2, 0xc0, %sp
to

	0x105b6c <savecontext+108>:     sub  %o2, 0xc1, %sp


	db> ps
	 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
	>218              216        218          0 7  0x5806          testlwp
	 216              210        216          0 3  0x4086              gdb    wait
	 210                1        210          0 3  0x4086              csh   pause
	 203                1        203          0 3    0x84            inetd   pause
	 194                1        194          0 3    0x84             sshd  select
	 132                1        132          0 3    0x84        mount_mfs  mfsidl
	 102                1        102          0 2    0x84          syslogd
	 85                 1         85          0 3    0x84         dhclient  select
	 6                  0          0          0 3 0x20204         aiodoned aiodone
	 5                  0          0          0 3 0x20204          ioflush  syncer
	 4                  0          0          0 3 0x20204           reaper  reaper
	 3                  0          0          0 3 0x20204       pagedaemon pgdaemo
	 2                  0          0          0 3 0x20204         scsibus0  sccomp
	 1                  0          1          0 3  0x4084             init    wait
	 0                 -1          0          0 3 0x20204          swapper schedul
	db> trace/t 0t218
	trace: pid 218 at 0x92dd331
	issignal(5, 5, 0, 9982008206, 0, 90d5de0) at issignal+0x198
	trap(92dded0, 1874800, 105b08, 1899400, 0, 0) at trap+0x6e4
	Lslowtrap_reenter(1, 2, 20, 22210b, ffffffffffffffff, 0) at Lslowtrap_reenter+0x70
	db> c
	panic: winfault: double invalid window at 0x3ff, nsaved=7
	kdb breakpoint at 12c4954
	1 tt=30 tstate=4411080403 tpc=0x1001498 tnpc=0x100149c
	2 tt=30 tstate=82000601 tpc=0x1298d30 tnpc=0x1298d34
	Stopped in pid 218 (testlwp) at cpu_Debugger+0x4:       nop
	db> c
	syncing disks... 
	SIR Reset
	
	Watchdog Reset,  Rebooting.
	Resetting ... 

I'll keep my build/source-tree for a couple of days (and after that
until I run out of diskspace).

Rather than just guessing, I thought I would report this now. I've fixed my
original problem the right way, so now my netbsd/sparc64 is
running. Still, it would be great if my sparc didn't crash when I did
stupid things to it.

>How-To-Repeat:

	ftp http://www.e.kth.se/~lha/testlwp
	chmod +x testlwp
	./testlwp pc

>Fix:

	Dunno
>Release-Note:
>Audit-Trail:
>Unformatted: