Subject: Re: assert_sleepable: NULL curlwp
To: Patrick Welche <prlw1@newn.cam.ac.uk>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 03/14/2007 19:50:34
> With Mar 14 14:53 code, 4.99.15/i386, I just got:
>
> (gdb) bt
> #0  0xc0369d9b in cpu_reboot (howto=0, bootstr=0x0)
>     at ../../../../arch/i386/i386/machdep.c:870
> #1  0xc01a00b0 in db_sync_cmd (addr=0, have_addr=false, count=-1067916251,
>     modif=0xcad388b8 "#X$Xc") at ../../../../ddb/db_command.c:838
> #2  0xc01a05a0 in db_command (last_cmdp=0xc0577ebc, cmd_table=0x0)
>     at ../../../../ddb/db_command.c:511
> #3  0xc01a0958 in db_command_loop () at ../../../../ddb/db_command.c:299
> #4  0xc01a3737 in db_trap (type=1, code=0)
>     at ../../../../ddb/db_trap.c:101
> #5  0xc0365ea6 in kdb_trap (type=1, code=0, regs=0xcad38ae0)
>     at ../../../../arch/i386/i386/db_interface.c:226
> #6  0xc0373815 in trap (frame=0xcad38ae0)
>     at ../../../../arch/i386/i386/trap.c:308
> #7  0xc0102f7d in calltrap ()
> #8  0xc0365d24 in cpu_Debugger () at ./machine/cpufunc.h:332
> #9  0xc02f4ca7 in panic (fmt=0xc0513a04 "assert_sleepable: NULL curlwp")
>     at ../../../../kern/subr_prf.c:243
> #10 0xc02c98ee in assert_sleepable (interlock=0xd,
>     msg=0x1 <Address 0x1 out of bounds>)
>     at ../../../../kern/kern_lock.c:1481
> #11 0xc03314e1 in _fstrans_start (mp=0xc142a000, lock_type=FSTRANS_SHARED,
>     wait=1) at ../../../../kern/vfs_trans.c:173
> #12 0xc025ce9e in ffs_sync (mp=0xc142a000, waitfor=2, cred=0xcad00ee0,
>     l=0xc05bcfc0) at ../../../../ufs/ffs/ffs_vfsops.c:1321
> #13 0xc032fec3 in sys_sync (l=0xc05bcfc0, v=0x0, retval=0x0)
>     at ../../../../kern/vfs_syscalls.c:718
> #14 0xc0328ad2 in vfs_shutdown () at ../../../../kern/vfs_subr.c:2224
> #15 0xc0369e24 in cpu_reboot (howto=256, bootstr=0x0)
>     at ../../../../arch/i386/i386/machdep.c:856
> #16 0xc01a00b0 in db_sync_cmd (addr=-1070179036, have_addr=false,
>     count=-1067916251, modif=0xcad38ca4 "#X$Xc")
>     at ../../../../ddb/db_command.c:838
> #17 0xc01a05a0 in db_command (last_cmdp=0xc0577ebc, cmd_table=0x0)
>     at ../../../../ddb/db_command.c:511
> #18 0xc01a0958 in db_command_loop () at ../../../../ddb/db_command.c:299
> #19 0xc01a3737 in db_trap (type=1, code=0)
>     at ../../../../ddb/db_trap.c:101
> #20 0xc0365ea6 in kdb_trap (type=1, code=0, regs=0xcad38ecc)
>     at ../../../../arch/i386/i386/db_interface.c:226
> #21 0xc0373815 in trap (frame=0xcad38ecc)
>     at ../../../../arch/i386/i386/trap.c:308
> #22 0xc0102f7d in calltrap ()
> #23 0xc0365d24 in cpu_Debugger () at ./machine/cpufunc.h:332
> #24 0xc03cdd95 in wskbd_translate (id=0xc05a5ee0, type=2,
>     value=<value optimized out>) at ../../../../dev/wscons/wskbd.c:1505
> #25 0xc03cdf8e in wskbd_input (dev=0xc124b200, type=2, value=1)
>     at ../../../../dev/wscons/wskbd.c:625
> #26 0xc03d1e41 in pckbd_input (vsc=0xc12fba00, data=1)
>     at ../../../../dev/pckbport/pckbd.c:595
> #27 0xc01e3454 in pckbcintr (vsc=0xc12f8080)
>     at ../../../../dev/ic/pckbc.c:640
> #28 0xc01011a8 in Xintr_legacy1 ()

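panic/sync don't work as well as they could: here the sync issued from
ddb went through vfs_shutdown() and ffs_sync() into _fstrans_start(),
where assert_sleepable() panicked in turn.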

> I'm quite surprised. Essentially, I could e.g. ssh in, get a password
> prompt, then hang. It felt like when vnodes get locked. I could break
> into ddb. Plenty of processes were waiting in turnstile?

Is that from ps/l? If it happens again, could you get a backtrace from
one of them with t/a, or from any LWPs sitting in vnlock?

> # ps -M netbsd.0.core
> ps: can't read pgrp at 0x0: Undefined error: 0

Is the kernel built with LOCKDEBUG?

Cheers,
Andrew