Subject: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 03/14/2003 16:01:42
I am running recent netbsd-1-6-1, but the code in question is the same
in -current.

My environment:
  1.6.1_RC1ish on i386
  Coda, approximately 5.3.20
  cfs 1.4.1
  emacs

I have ciphertext in coda, in /coda/home/gdt/secret-foo.
This is attached as gdt-foo, and shows up in /crypt/gdt-foo, which is
the NFS mount to cfsd.

Running emacs and editing a file causes symlinks to be created to
indicate that the file is being edited (the filename is
.#<file-being-edited>, and the target is bogus (user/host/pid)).

When I use emacs in cfs (no coda - ciphertext on plain ffs on local
disk), I get occasional failures to write autosave files.  Later,
saving works, and the system stays up.  I really don't know what's up
here.

Emacs in coda, without cfs, ends up with emacs stuck in state R.  I turned
off SIGIO and it then worked normally, or at least pretty much ok.
There is almost certainly something wrong in the coda kernel code with
signal handling.

When editing files in cfs-in-coda, creating a symlink causes cfs to
create a symlink in coda, as well as the symlink used to store the IV.
This sometimes works, but I can reliably panic the system by typing,
saving, and repeating.

I set up kgdb, and got a panic in the symlink system call, trying to
link .pvect_4be81305e7f14704 to 6b490fd3.  After reading man pages and
code, I think that the problem is the end of lookup, where the code at
bad2: releases ni_dvp, and then falls through to releasing dp.  In my
case (likely odd due to coda weirdness), these are the *same* vnode,
and the second call (to vput) panics since the reference has already
been released.

bad2:
	if ((cnp->cn_flags & LOCKPARENT) && (cnp->cn_flags & ISLASTCN) &&
			((cnp->cn_flags & PDIRUNLOCK) == 0))
		VOP_UNLOCK(ndp->ni_dvp, 0);
	vrele(ndp->ni_dvp);
bad:
	if (dpunlocked)
		vrele(dp);
	else
		vput(dp);
	ndp->ni_vp = NULL;
	return (error);
}

Note that the coda symlink call is a bit odd; it calls venus to make
the symlink and then does a lookup to get the symlink to return.  This
means if the lookup fails for some reason, symlink can return failure
even though the symlink was made.  However, that beats a panic in
lookup() - if lookup failing is supposed to be fatal, coda_symlink
should panic explicitly.

So, perhaps the code at bad: can decline to release if ndp->ni_dvp ==
dp, but that seems perhaps incomplete - I have not grokked the rules
about which vnodes are locked when in these call paths.  Perhaps these
variables being equal is a sign of a larger problem.
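
To make the reference accounting concrete, here is a small user-space
sketch of the bad2:/bad: exit path.  It is a model, not kernel code:
struct model_vnode, model_release() and model_lookup_error_exit() are
made-up names, locking is ignored entirely, and I am assuming that the
caller holds only one reference when ni_dvp and dp are the same vnode.
The guard flag corresponds to the "decline to release if ndp->ni_dvp ==
dp" idea above; whether that is the right fix in the real lookup() is
exactly what I am unsure about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnode: only the use count is modeled. */
struct model_vnode {
	int usecount;	/* references currently held */
	bool underflow;	/* set where the real vput()/vrele() would panic */
};

/* Drop one reference; record an underflow instead of panicking. */
void
model_release(struct model_vnode *vp)
{
	assert(vp != NULL);
	if (vp->usecount <= 0) {
		vp->underflow = true;
		return;
	}
	vp->usecount--;
}

/*
 * Model of the error exit in lookup(), reference counts only.
 * With guard == false this mirrors the quoted code: release ni_dvp,
 * then fall through and release dp.  With guard == true the second
 * release is skipped when both pointers name the same vnode.
 */
void
model_lookup_error_exit(struct model_vnode *ni_dvp, struct model_vnode *dp,
    bool guard)
{
	/* bad2: */
	model_release(ni_dvp);		/* stands in for vrele(ndp->ni_dvp) */
	/* bad: */
	if (guard && ni_dvp == dp)
		return;			/* the single reference is already gone */
	model_release(dp);		/* stands in for vrele(dp) or vput(dp) */
}
```

Calling model_lookup_error_exit(&v, &v, false) on a vnode with
usecount 1 sets v.underflow, which corresponds to the "vput: ref cnt"
panic in my backtrace.  With guard set to true the count ends at 0
without underflow, and two distinct vnodes are each released once.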

My backtrace:

(gdb) bt
#0  kgdb_connect (verbose=0)
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/kgdb_machdep.c:258
#1  0xc026d638 in kgdb_panic ()
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/kgdb_machdep.c:273
#2  0xc01bc1a4 in panic (fmt=0xc0319ca1 "vput: ref cnt")
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/subr_prf.c:229
#3  0xc01d6cdd in vput (vp=0xcf55ed34)
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_subr.c:1213
#4  0xc01d57cd in lookup (ndp=0xcf511dc8)
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_lookup.c:650
#5  0xc013524d in coda_symlink (v=0xcf511e38)
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../coda/coda_vnops.c:1656
#6  0xc01dda3e in VOP_SYMLINK (dvp=0xcf55ed34, vpp=0xcf511ea4, cnp=0xcf511eb8, 
    vap=0xcf511edc, target=0xcf293400 "6b490fd3")
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vnode_if.c:899
#7  0xc01da8a9 in sys_symlink (p=0xcf2bead0, v=0xcf511f80, retval=0xcf511f78)
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_syscalls.c:1521
#8  0xc0275a97 in syscall_plain (frame={tf_gs = 31, tf_fs = -1078001633, 
      tf_es = -1078001633, tf_ds = 1208942623, tf_edi = -1077954368, 
      tf_esi = -1077951284, tf_ebp = -1077950256, tf_ebx = -1077954380, 
      tf_edx = 0, tf_ecx = 1208925264, tf_eax = 57, tf_trapno = 3, tf_err = 2, 
      tf_eip = 1208484243, tf_cs = 23, tf_eflags = 663, tf_esp = -1077954428, 
      tf_ss = 31, tf_vm86_es = 0, tf_vm86_ds = 0, tf_vm86_fs = 0, 
      tf_vm86_gs = 0})
    at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/syscall.c:140
#9  0xc0100d42 in syscall1 ()
#10 0x804d013 in ?? ()
#11 0x804b03e in ?? ()
#12 0x480b7751 in ?? ()
#13 0x480b75d4 in ?? ()
#14 0x4808742c in ?? ()
#15 0x8049c50 in ?? ()
#16 0x80497d0 in ?? ()
(gdb)