Subject: softdep-related panic two days in a row, 2.0_BETA/i386
To: None <current-users@netbsd.org>
From: Jeff Rizzo <riz@redcrowgroup.com>
List: current-users
Date: 07/13/2004 10:08:21
After a number of months running smoothly (in relation to softdeps, anyway),
I just got a softdep-related panic for the second day in a row with
a similar workload on the machine (./build.sh for -current with -j4).

panic: allocdirect_merge: ob 0 != nb 12663744 || lbn 1 >= 12 ||
osize 0 != nsize 16384
Stopped in pid 20708.1 (nbmakeinfo) at  netbsd:breakpoint+0x4:  leave
db{0}> trace
breakpoint(c057d3f8,d03b98c8,d03b98bc,c037cac2,d03b98c8) at netbsd:breakpoint+0x4
cpu_Debugger(d03b98c8,100,d03b990c,c02f1b00,c0573f00) at netbsd:cpu_Debugger+0xb
panic(c0573f00,0,0,c13bc0,0) at netbsd:panic+0xd4
allocdirect_merge(d13767dc,d16070c4,d1607e84,c037a965,c0618b70) at netbsd:allocdirect_merge+0x88
merge_inode_lists(d1376798,2f3b4f,1,d03b99bc,2f3b4f) at netbsd:merge_inode_lists+0x105
softdep_setup_freeblocks(e1f928f0,0,0,0,e1f932dc) at netbsd:softdep_setup_freeblocks+0x6a3
ffs_truncate(d03b9d14,0,d03b9d1c,c02fa0e6,c0580da0) at netbsd:ffs_truncate+0xc80
VOP_TRUNCATE(e1f932dc,0,0,0,ffffffff) at netbsd:VOP_TRUNCATE+0x5b
ufs_inactive(d03b9db4,d03b9dc4,d03b9dcc,c03adf38,c05809e0) at netbsd:ufs_inactive+0x115
VOP_INACTIVE(e1f932dc,ce27c4d0,d03b9dec,c03aad80,0) at netbsd:VOP_INACTIVE+0x37
vput(e1f932dc,2,c2917980,ce27c4d0,c2aa1c68) at netbsd:vput+0x122
vn_close(e1f932dc,2,c2917980,ce27c4d0,ce480414) at netbsd:vn_close+0x4d
vn_closefile(d1475f98,ce27c4d0,800c,0,0) at netbsd:vn_closefile+0x20
closef(d1475f98,ce27c4d0,d03b9ebc,c035ad66,ce27c520) at netbsd:closef+0x1d9
fdfree(ce27c4d0,0,d03b9f0c,c035565c,0) at netbsd:fdfree+0xb0
exit1(ce2185b4,100,0,c0440cd2,0) at netbsd:exit1+0x2bb
sys_exit(ce2185b4,d03b9f64,d03b9f5c,c044079a,fffffffe) at netbsd:sys_exit+0x41
syscall_plain() at netbsd:syscall_plain+0x12a
--- syscall (number 1) ---
0x4811806b:
db{0}> mach cpu 1
db{0}> trace
acquire(c065ba60,d03b3e88,400000,0,600) at netbsd:acquire+0x42
lockmgr(c065ba60,400002,0,c0441bd1,c288c800) at netbsd:lockmgr+0x614
_kernel_proc_lock(d14ddf00,4865c000,d03b3f9c,c0441607,106) at netbsd:_kernel_proc_lock+0x1a
trap() at netbsd:trap+0x7bd
--- trap (number 6) ---
0x81bfc98:
db{0}> 
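
Reading the panic message literally, only the first and third comparisons
hold for the values printed. Here is a minimal sketch in Python that pulls
the numbers out of the message and evaluates each comparison as printed.
The regular expression and the function name are my own illustration, not
anything from the kernel, and the sketch makes no claim about what ob, nb,
or the limit of 12 mean inside the softdep code.

```python
import re

# The panic text from the console, with the two printed lines joined.
PANIC = ("panic: allocdirect_merge: ob 0 != nb 12663744 || lbn 1 >= 12 || "
         "osize 0 != nsize 16384")

# Pattern mirrors the message text exactly as it appeared on the console.
PATTERN = re.compile(
    r"ob (\d+) != nb (\d+) \|\| lbn (\d+) >= (\d+) \|\| "
    r"osize (\d+) != nsize (\d+)"
)


def failed_clauses(message):
    """Return the printed comparisons that evaluate true for the values shown."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("not an allocdirect_merge panic line")
    ob, nb, lbn, limit, osize, nsize = (int(g) for g in m.groups())
    checks = {
        "ob != nb": ob != nb,
        "lbn >= limit": lbn >= limit,
        "osize != nsize": osize != nsize,
    }
    return [name for name, hit in checks.items() if hit]


print(failed_clauses(PANIC))
```

So the block-number and size comparisons are the ones that tripped, while
the lbn comparison (1 against 12) did not.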

The only thing particularly different from GENERIC.MP about the
kernel I'm running is that NEW_BUFQ_STRATEGY is enabled.
A quick glance over the PR database doesn't show any open PRs that
mention this panic, so I figure I should file one.  Before I do, does
anyone have suggestions about particulars to include, or about things
to look at while the machine is sitting at the db{0}> prompt?  I've been
100% unsuccessful at getting a crash dump out of this machine because
it hangs when I try.  :(

Suggestions welcome;  if I don't hear anything in an hour or two,
I'll just go ahead and submit what I've got in the interest of getting
this machine running again.

Thanks!

+j
-- 
Jeff Rizzo                                         http://www.redcrowgroup.com/