current-users: Re: panic: kernel diagnostic assertion "lwp_locked(l, &l->l_cpu->ci_schedstate.spc

Subject: Re: panic: kernel diagnostic assertion "lwp_locked(l, &l->l_cpu->ci_schedstate.spc_lwplock)" failed
To: Nicolas Joly <njoly@pasteur.fr>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 05/31/2007 10:06:54

Hi Nicolas,

On Thu, May 31, 2007 at 12:31:50AM +0200, Nicolas Joly wrote:

> I just encountered a kernel diagnostic assertion panic, while trying
> to debug a fork()/clone() problem on my -current NetBSD/amd64 under
> compat_linux.
> 
> panic: kernel diagnostic assertion "lwp_locked(l, &l->l_cpu->ci_schedstate.spc_lwplock)" failed: file "/local/src/NetBSD/src/sys/kern/kern_synch.c", line 559
> 
> I was able to reproduce it with a native application too. Run a
> process that calls fork(), stop the child after the fork with `sysctl
> -w proc.<pid>.stopfork=1', and finally attach gdb to the stopped child.
> 
> njoly@lanfeust [tmp/syscall]> uname -a
> NetBSD lanfeust.sis.pasteur.fr 4.99.20 NetBSD 4.99.20 (LANFEUST_DEVEL) #20: Wed May 30 18:51:24 CEST 2007  njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj/amd64/sys/arch/amd64/compile/LANFEUST_DEVEL amd64
> 
> njoly@lanfeust [tmp/syscall]> file fork
> fork: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for NetBSD 4.99.20, dynamically linked (uses shared libs), not stripped
> njoly@lanfeust [tmp/syscall]> ./fork
> father pid 1722
> child pid 746
> 
> njoly@lanfeust [tmp/syscall]> sysctl -w proc.1722.stopfork=1
> proc.1722.stopfork: 0 -> 1
> njoly@lanfeust [tmp/syscall]> ps
>  PID TTY   STAT    TIME COMMAND
>  746 ttyp1 T+   0:00.00 ./fork 
> 1722 ttyp1 S+   0:00.00 ./fork 
> njoly@lanfeust [tmp/syscall]> gdb ./fork 746
> GNU gdb 6.5
> [...]
> This GDB was configured as "x86_64--netbsd"...
> Attaching to program: /home/njoly/tmp/syscall/fork, process 746
> [KERNEL PANIC]
> 
> panic: kernel diagnostic assertion "lwp_locked(l, &l->l_cpu->ci_schedstate.spc_lwplock)" failed: file "/local/src/NetBSD/src/sys/kern/kern_synch.c", line 559
> Stopped in pid 1310.1 (gdb) at  netbsd:cpu_Debugger+0x5:        leave
> db{0}> mach cpu 0
> using CPU 0
> db{0}> bt
> cpu_Debugger() at netbsd:cpu_Debugger+0x5
> panic() at netbsd:panic+0x1fc
> __assert() at netbsd:__assert+0x21
> setrunnable() at netbsd:setrunnable+0x2a7
> proc_unstop() at netbsd:proc_unstop+0xfe
> sys_ptrace() at netbsd:sys_ptrace+0xb68
> syscall_plain() at netbsd:syscall_plain+0x1cf
> uvm_fault(0xffff80004cab0250, 0x0, 1) -> e
> kernel: page fault trap, code=0
> Faulted in DDB; continuing...
> db{0}> mach cpu 1
> using CPU 1
> db{0}> bt
> spllower() at netbsd:spllower+0x2e
> syscall_plain() at netbsd:syscall_plain+0x1c1
> uvm_fault(0xffff80004cab0250, 0x0, 1) -> e
> kernel: page fault trap, code=0
> Faulted in DDB; continuing...
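
The thread does not include the source of the `./fork` program used above. As a rough, hypothetical stand-in (not Nicolas's actual test case), the Python sketch below forks once and prints PIDs in the same `father pid` / `child pid` format. The `fork_and_report` name and its `delay` parameter are assumptions for illustration: the delay leaves a window to run `sysctl -w proc.<pid>.stopfork=1` against the parent before it forks, so that, per the report, the child is stopped after the fork and can then be attached with gdb.

```python
import os
import time


def fork_and_report(delay: float = 0.0) -> tuple[int, int]:
    """Hypothetical stand-in for the ./fork test program in the report.

    Prints the parent's PID, optionally sleeps `delay` seconds (an
    assumption: a window for `sysctl -w proc.<pid>.stopfork=1`), then
    forks once. Returns (child_pid, child_exit_status) in the parent.
    """
    # Flush before fork() so buffered output is not duplicated in the child.
    print(f"father pid {os.getpid()}", flush=True)
    if delay > 0:
        time.sleep(delay)

    pid = os.fork()
    if pid == 0:
        # Child: per the report, with stopfork set on the parent the child
        # is stopped after fork() and only gets here once it is continued.
        os._exit(0)

    # Parent: report the child's PID from fork()'s return value, so the
    # PID to attach gdb to is known even while the child is stopped.
    print(f"child pid {pid}", flush=True)

    # Blocks for as long as the child stays stopped (no WUNTRACED).
    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return pid, exit_status
```

For example, `fork_and_report(10.0)` gives ten seconds to set `stopfork` on the printed parent PID; the parent then blocks in `waitpid()` for as long as the child stays stopped. Printing the child PID from the parent side means the PID to attach to is known even though a stopped child never gets to run its own code.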

Thanks for the report & test case. This appears to be a different
manifestation of the problem recorded in kern/36398 (kernel diagnostic
assertion while running a native Java). I haven't had the time to fix
it yet, but will be looking into it in the next couple of days.

Andrew