port-sparc64: Re: 4.99.16 (-current) panic

Subject: Re: 4.99.16 (-current) panic
To: Gert Doering <gert@greenie.muc.de>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: port-sparc64
Date: 04/09/2007 19:38:06

On Mon, Apr 09, 2007 at 07:06:31PM +0200, Gert Doering wrote:
> Hi,
> 
> On Mon, Apr 09, 2007 at 05:40:21PM +0200, Juergen Hannken-Illjes wrote:
> > > 334                     dp = cwdi->cwdi_cdir;
> > > 335                     VREF(dp);
> > 
> > Strange, this means your current working dir is a vnode with `v_usecount == 0'.
> >
> > From a quick grep through kern/vfs_syscalls.c this should never happen.
> >
> > Are you using systrace?
> 
> The kernel has been compiled with "options SYSTRACE" (because I planned
> to eventually use it), but I haven't yet actually used it for anything.
> 
> > Could you print the vnode from ddb if your machine crashes again?
> 
> I think I have found a reproducible way to make the machine crash - so 
> I followed that theory, and got a slightly differently looking trace.
> It's not "namei()" this time, but "cwdinit()" - but still the same
> panic message:
> 
> panic: vref used where vget required, vp 0xca217c0
> cpu0: kdb breakpoint at 12dc3c0
> Stopped in pid 96.1 (ksh) at    netbsd:cpu_Debugger+0x4:        nop
> vref(ca217c0, be8fae0, 80, cc5a440, 0, cc5a43c) at netbsd:vref+0x28
> cwdinit(be8fae0, ce6ce00, ca2a5a0, ce6ce00, 0, 183f400) at netbsd:cwdinit+0x34
> fork1(cb8b8c0, 0, 14, 0, cbffe00, 0) at netbsd:fork1+0x674
> sys_fork(cb8b8c0, cbffdc0, cbffe00, 0, badcafe, badcafe) at netbsd:sys_fork+0x24
> syscall_plain(cbffed0, 0, 405346c0, 405346c4, 2, 405346c0) at netbsd:syscall_plain+0x130
> ?(0, 0, 12ea0b, ffffffffffffc530, badcafe, badcafe) at 0x10092cc
> db> show vnode /f 0xca217c0
> OBJECT 0xca217c0: locked=0, pgops=0x180fac8, npages=0, refs=-1
>   PAGES <pg,offset>:
>  
> VNODE flags 80<LOCKSWORK>
> mp 0x2b0d000 numoutput 0 size 0x200
> data 0xca1ce60 usecount -1 writecount 0 holdcnt 0 numoutput 0
> tag VT_UFS(1) type VDIR(2) mount 0x2b0d000 typedata 0x0
> clean bufs:
> dirty bufs:
> 
> 
> (I hope this is the "right" hex number to put into "show vnode /f",
> but "usecount -1" definitely looks like something that would offend
> vref()).
> 
> The line of code in "cwdinit+0x34" is:
> 
> (gdb) list *(cwdinit+0x34)
> 0x11df7f4 is in cwdinit (/home/src-current/sys/kern/kern_descrip.c:1059).
> 1054            cwdi = pool_get(&cwdi_pool, PR_WAITOK);
> 1055    
> 1056            simple_lock_init(&cwdi->cwdi_slock);
> 1057            cwdi->cwdi_cdir = p->p_cwdi->cwdi_cdir;
> 1058            if (cwdi->cwdi_cdir)
> 1059                    VREF(cwdi->cwdi_cdir);
> 1060            cwdi->cwdi_rdir = p->p_cwdi->cwdi_rdir;
> 1061            if (cwdi->cwdi_rdir)
> 1062                    VREF(cwdi->cwdi_rdir);
> 1063            cwdi->cwdi_cmask =  p->p_cwdi->cwdi_cmask;
> 
> - so it's "cwdi_cdir" again.
> 
> 
> > Which process/command does the lookup?
> 
> uux, uustat, this time: ksh/ls.
> 
> The common pattern seems to be "/var/spool/uucp", which is NFS-exported
> to another NetBSD machine.
> 
> The first couple of crashes were triggered by "uux", the second to last
> crash was triggered by sending a mail to a uucp-connected host (the
> offending process being "uustat" this time), and the last crash was
> triggered by doing:
> 
>   cd /var/spool/uucp
>   ls
> 
> - so it looks like "something with the root inode of a NFS-exported
> file system".  Maybe coupled with "a COMPAT_20 binary"?

It may be NFS-exported, but it is not a root inode.
Which binaries are COMPAT_20? ksh / ls or uu??

Whenever `v_usecount' gets decremented there should be a DIAGNOSTIC check.

Could you add `options DIAGNOSTIC' to your kernel config and try again?
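
To illustrate why the extra check should help, here is a minimal user-space
sketch of the idea. It is not the kernel code: `toy_vnode', `toy_vref' and
`toy_vrele' are made-up names, and the `diagnostic' argument merely stands in
for a kernel built with `options DIAGNOSTIC'. Without the check, one release
too many goes unnoticed and the count silently becomes -1; the problem only
surfaces later, when some unrelated caller tries to take a new reference.
With the check, the bad release itself is reported.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct vnode (illustration only). */
struct toy_vnode {
	int v_usecount;		/* number of active references */
};

/*
 * Take an additional reference.  Only valid if the caller's vnode is
 * already referenced; returns -1 where the real kernel would panic.
 */
static int
toy_vref(struct toy_vnode *vp)
{
	if (vp->v_usecount <= 0) {
		fprintf(stderr, "vref used where vget required, vp %p\n",
		    (void *)vp);
		return -1;
	}
	vp->v_usecount++;
	return 0;
}

/*
 * Drop a reference.  With `diagnostic' set, an over-release is reported
 * at the point where it happens instead of corrupting the count.
 */
static int
toy_vrele(struct toy_vnode *vp, int diagnostic)
{
	if (diagnostic && vp->v_usecount <= 0) {
		fprintf(stderr, "toy_vrele: bad ref count, vp %p\n",
		    (void *)vp);
		return -1;
	}
	vp->v_usecount--;
	return 0;
}
```

Starting from a count of 1 and releasing twice without the check leaves the
count at -1, which matches the `usecount -1' in the ddb output above; the
next `toy_vref' then fails. With the check enabled, the second release is
the call that gets flagged, which is much closer to the actual culprit.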

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)