port-sparc64: Re: 4.99.16 (-current) panic

Subject: Re: 4.99.16 (-current) panic
To: Gert Doering <gert@greenie.muc.de>
From: Gert Doering <gert@greenie.muc.de>
List: port-sparc64
Date: 04/09/2007 19:06:31
Hi,

On Mon, Apr 09, 2007 at 05:40:21PM +0200, Juergen Hannken-Illjes wrote:
> > 334                     dp = cwdi->cwdi_cdir;
> > 335                     VREF(dp);
> 
> Strange, this means your current working dir is a vnode with `v_usecount == 0'.
>
> From a quick grep through kern/vfs_syscalls.c this should never happen.
>
> Are you using systrace?

The kernel has been compiled with "options SYSTRACE" (because I planned
to eventually use it), but I haven't yet actually used it for anything.

> Could you print the vnode from ddb if your machine crashes again?

I think I have found a reproducible way to make the machine crash - so 
I followed that theory, and got a slightly different-looking trace.
It's not "namei()" this time, but "cwdinit()" - but still the same
panic message:

panic: vref used where vget required, vp 0xca217c0
cpu0: kdb breakpoint at 12dc3c0
Stopped in pid 96.1 (ksh) at    netbsd:cpu_Debugger+0x4:        nop
vref(ca217c0, be8fae0, 80, cc5a440, 0, cc5a43c) at netbsd:vref+0x28
cwdinit(be8fae0, ce6ce00, ca2a5a0, ce6ce00, 0, 183f400) at netbsd:cwdinit+0x34
fork1(cb8b8c0, 0, 14, 0, cbffe00, 0) at netbsd:fork1+0x674
sys_fork(cb8b8c0, cbffdc0, cbffe00, 0, badcafe, badcafe) at netbsd:sys_fork+0x24
syscall_plain(cbffed0, 0, 405346c0, 405346c4, 2, 405346c0) at netbsd:syscall_plain+0x130
?(0, 0, 12ea0b, ffffffffffffc530, badcafe, badcafe) at 0x10092cc
db> show vnode /f 0xca217c0
OBJECT 0xca217c0: locked=0, pgops=0x180fac8, npages=0, refs=-1
  PAGES <pg,offset>:
 
VNODE flags 80<LOCKSWORK>
mp 0x2b0d000 numoutput 0 size 0x200
data 0xca1ce60 usecount -1 writecount 0 holdcnt 0 numoutput 0
tag VT_UFS(1) type VDIR(2) mount 0x2b0d000 typedata 0x0
clean bufs:
dirty bufs:


(I hope this is the "right" hex number to put into "show vnode /f",
but "usecount -1" definitely looks like something that would offend
vref()).
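
To illustrate why a use count of -1 would trip that check, here is a toy
model in plain C. It is not the kernel code: "toy_vnode", "toy_vref" and
"toy_vrele" are made-up names, and I am assuming that the check in vref()
is of the form "usecount <= 0", which is what the panic message suggests.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct vnode; only the use count is modelled. */
struct toy_vnode {
	int usecount;		/* plays the role of v_usecount */
};

/*
 * Take an additional reference. This is only legal if somebody already
 * holds one. Returns 0 on success, -1 where the real kernel would panic.
 * (Assumption: the real check is "usecount <= 0".)
 */
int
toy_vref(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->usecount <= 0) {
		fprintf(stderr, "vref used where vget required, vp %p\n",
		    (void *)vp);
		return -1;
	}
	vp->usecount++;
	return 0;
}

/*
 * Drop one reference. Deliberately has no underflow check, so that one
 * release too many leaves the count negative, as seen in the ddb output.
 */
void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp != NULL);
	vp->usecount--;
}
```

In this model, a vnode that is released once more than it was referenced
ends up with a negative count, and the next toy_vref() on it fails. That
would match a cwdi_cdir pointer surviving after its reference was dropped
somewhere else, but the model says nothing about where that happens.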

The line of code in "cwdinit+0x34" is:

(gdb) list *(cwdinit+0x34)
0x11df7f4 is in cwdinit (/home/src-current/sys/kern/kern_descrip.c:1059).
1054            cwdi = pool_get(&cwdi_pool, PR_WAITOK);
1055    
1056            simple_lock_init(&cwdi->cwdi_slock);
1057            cwdi->cwdi_cdir = p->p_cwdi->cwdi_cdir;
1058            if (cwdi->cwdi_cdir)
1059                    VREF(cwdi->cwdi_cdir);
1060            cwdi->cwdi_rdir = p->p_cwdi->cwdi_rdir;
1061            if (cwdi->cwdi_rdir)
1062                    VREF(cwdi->cwdi_rdir);
1063            cwdi->cwdi_cmask =  p->p_cwdi->cwdi_cmask;

- so it's "cwdi_cdir" again.


> Which process/command does the lookup?

Earlier it was uux and uustat; this time it was ksh/ls.

The common pattern seems to be "/var/spool/uucp", which is NFS-exported
to another NetBSD machine.

The first couple of crashes were triggered by "uux", the second to last
crash was triggered by sending a mail to a uucp-connected host (the
offending process being "uustat" this time), and the last crash was
triggered by doing:

  cd /var/spool/uucp
  ls

- so it looks like "something with the root inode of an NFS-exported
file system".  Maybe coupled with "a COMPAT_20 binary"?

gert
-- 
USENET is *not* the non-clickable part of WWW!
                                                           //www.muc.de/~gert/
Gert Doering - Munich, Germany                             gert@greenie.muc.de
fax: +49-89-35655025                        gert@net.informatik.tu-muenchen.de