Subject: kern/27023: kernel crashes in LWP code from userland
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <reinoud@netbsd.org>
List: netbsd-bugs
Date: 09/24/2004 13:45:37
>Number:         27023
>Category:       kern
>Synopsis:       kernel crashes in LWP code from userland
>Confidential:   yes
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Sep 24 11:46:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator:     Reinoud Zandijk
>Release:        NetBSD 2.0G
>Organization:
NetBSD
	
>Environment:
        - NetBSD/sparc 2-0 installation (200409050000)
        - NetBSD/sparc kernel 2.0G
        - Sun SPARCclassic
        - mysql-server/client 3.x
        - dspam with mysql driver
	
System: NetBSD rangerover 2.0G NetBSD 2.0G (GENERIC) #0: Thu Aug 26 12:02:22 CEST 2004 imago@rangerover:/usr/sources/cvs.netbsd.org/src/sys/arch/sparc/compile/GENERIC sparc
Architecture: sparc
Machine: sparc

>Description:

When running the dspam tools, dspam connects to the mysql server through the
mysql socket, and this ran fine. At one point I started a second process, but
since it had connection problems to the server, I temporarily suspended the
dspam process, as it might have been consuming all of the precious little
processor time. The connection still didn't come up. I then resumed the dspam
processes, but they seemed stuck again.

It turned out that a mysqld process was waiting on `sigwait' (according to
`top'). Thinking that it might be `convinced' to run again, I sent it a
`CONT' signal. That didn't help. I then decided to send it a `TERM' (9)
to get rid of it, and that panicked the machine.
                                                                                                                                       
The last `top' output I saw had the entry:
  429 mysql    -18    4    18M 8824K anonget2 124:13  0.00%  0.00% <mysqld>

The last messages the kernel logged:
                                                                                                                                       
Sep 24 04:03:38 rangerover syslogd: restart
Sep 24 04:03:38 rangerover /netbsd: data fault: pc=0xf018deb4 addr=0x24 sfsr=326 <PERR=0,LVL=3,AT=1,FT=1,FAV,OW>
Sep 24 04:03:38 rangerover /netbsd: panic: kernel fault
Sep 24 04:03:39 rangerover /netbsd: syncing disks... stopping on keyboard abort
Sep 24 04:03:39 rangerover /netbsd: panic: PROM sync command
Sep 24 04:03:39 rangerover /netbsd: Frame pointer is at 0xf0326000
Sep 24 04:03:39 rangerover /netbsd: Call traceback:

.... Depending on the installation, it looks like an unprivileged user can
cause this crash.


Disassembly:
Dump of assembler code for function lwp_continue:
0xf018de7c <lwp_continue>:      save  %sp, -104, %sp
0xf018de80 <lwp_continue+4>:    sethi  %hi(0xf0331800), %o0
0xf018de84 <lwp_continue+8>:    ld  [ %o0 + 0x304 ], %o1    ! 0xf0331b04 <lwp_debug>
0xf018de88 <lwp_continue+12>:   cmp  %o1, 0
0xf018de8c <lwp_continue+16>:   be  0xf018deb4 <lwp_continue+56>
0xf018de90 <lwp_continue+20>:   sethi  %hi(0xf02e6c00), %o0
0xf018de94 <lwp_continue+24>:   ld  [ %i0 + 0x10 ], %o3
0xf018de98 <lwp_continue+28>:   ld  [ %o3 + 0x34 ], %o1
0xf018de9c <lwp_continue+32>:   or  %o0, 0x3a0, %o0
0xf018dea0 <lwp_continue+36>:   ld  [ %i0 + 0x28 ], %o2
0xf018dea4 <lwp_continue+40>:   add  %o3, 0x159, %o3
0xf018dea8 <lwp_continue+44>:   ld  [ %i0 + 0x24 ], %o4
0xf018deac <lwp_continue+48>:   call  0xf01ae280 <printf>
0xf018deb0 <lwp_continue+52>:   ld  [ %i0 + 0x34 ], %o5

0xf018deb4 <lwp_continue+56>:   ld  [ %i0 + 0x24 ], %o0

0xf018deb8 <lwp_continue+60>:   cmp  %o0, 8
0xf018debc <lwp_continue+64>:   bne  0xf018dee4 <lwp_continue+104>
0xf018dec0 <lwp_continue+68>:   nop 
0xf018dec4 <lwp_continue+72>:   ld  [ %i0 + 0x34 ], %o0
0xf018dec8 <lwp_continue+76>:   cmp  %o0, 0
0xf018decc <lwp_continue+80>:   bne  0xf018dee0 <lwp_continue+100>
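
Reading the dump against the fault message: the reported pc=0xf018deb4 is
the load at lwp_continue+56, `ld [ %i0 + 0x24 ], %o0', and the fault address
is 0x24, so %i0 (the function's first argument) appears to have been 0, i.e.
lwp_continue was probably called with a NULL pointer. Below is a minimal
userland sketch of that reading in C. It is not the kernel source:
struct lwp_model, its field names (guessed from the printf arguments),
fault_address and the continue_path enum are hypothetical. Only the offsets
0x24 and 0x34 and the constant 8 are taken from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure passed in %i0.  Only the
 * offsets read before lwp_continue+80 are modelled; the field names
 * are assumptions.
 */
struct lwp_model {
	uint8_t		pad0[0x24];
	int32_t		l_stat;		/* loaded at +56, compared with 8 */
	uint8_t		pad1[0x34 - 0x28];
	uint32_t	l_wchan;	/* loaded at +72, compared with 0 */
};

static_assert(offsetof(struct lwp_model, l_stat) == 0x24, "l_stat offset");
static_assert(offsetof(struct lwp_model, l_wchan) == 0x34, "l_wchan offset");

/* Address touched by the load at lwp_continue+56 for a given %i0. */
uintptr_t
fault_address(uintptr_t lwp_ptr)
{
	return lwp_ptr + offsetof(struct lwp_model, l_stat);
}

enum continue_path {
	PATH_NULL_LWP,		/* no such test appears in the dump */
	PATH_NOT_SUSPENDED,	/* l_stat != 8: branch to +104 */
	PATH_WCHAN_SET,		/* l_wchan != 0: branch to +100 */
	PATH_WCHAN_ZERO		/* falls through after +80 */
};

/* Reports which branch of the dumped code a given argument would take. */
enum continue_path
lwp_continue_path(const struct lwp_model *l)
{
	if (l == NULL)
		return PATH_NULL_LWP;	/* the dumped code would fault here */
	if (l->l_stat != 8)
		return PATH_NOT_SUSPENDED;
	if (l->l_wchan != 0)
		return PATH_WCHAN_SET;
	return PATH_WCHAN_ZERO;
}
```

With this layout, fault_address(0) gives 0x24, which matches addr=0x24 in
the log. The dump ends at +80, so what each branch then does is not shown,
and whether a NULL test belongs in lwp_continue or in its caller cannot be
told from this dump alone.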


>How-To-Repeat:

Follow the steps in the description above.
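
The signal sequence from the description can be sketched in C as below.
This is only an illustration under assumptions: it uses a forked child that
idles in pause() instead of the threaded mysqld, so it is not expected to
reproduce the panic by itself; stop_cont_term_child is a hypothetical name;
and it sends SIGTERM as named in the description, although the number given
there (9) is SIGKILL.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork an idle child, then suspend it, resume it and terminate it.
 * Returns the signal that terminated the child, or -1 on error.
 */
int
stop_cont_term_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();	/* child: idle until a signal kills it */
	}

	/* Suspend the child and wait until it is reported as stopped. */
	if (kill(pid, SIGSTOP) == -1 ||
	    waitpid(pid, &status, WUNTRACED) == -1 || !WIFSTOPPED(status))
		goto fail;

	/* Resume it, then ask it to terminate. */
	if (kill(pid, SIGCONT) == -1 || kill(pid, SIGTERM) == -1)
		goto fail;

	if (waitpid(pid, &status, 0) == -1)
		goto fail;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;

fail:
	/* Clean up the child so no stray process is left behind. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}
```

On a healthy kernel the function should return SIGTERM. On the affected
system the interesting case is the same sequence sent to a multi-threaded
process such as mysqld, which this sketch does not create.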

>Fix:
Update or downgrade the kernel? Maybe it does not show up on the 2-0 release?
>Release-Note:
>Audit-Trail:
>Unformatted: