Subject: Re: kern/30831
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Antti Kantee <pooka@cs.hut.fi>
List: netbsd-bugs
Date: 04/02/2007 20:10:20
The following reply was made to PR kern/30831; it has been noted by GNATS.

From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@NetBSD.org
Cc: deadbug@gmail.com, wrstuden@netbsd.org, chs@netbsd.org
Subject: Re: kern/30831
Date: Mon, 2 Apr 2007 23:07:13 +0300

 Ok, here's my theory.
 
 On Mon Apr 02 2007 at 15:10:07 +0000, Patrick Welche wrote:
 >  (gdb) frame 8
 >  #8  0xc0216ec2 in smbfs_sync (mp=0xc14ca000, waitfor=3, cred=0xcad40ee0, 
 >      l=0xcad4f7a0) at ../../../../fs/smbfs/smbfs_vfsops.c:460
 >  460                     if ((vp->v_type == VNON || (np->n_flag & NMODIFIED) == 0) &&
 >  (gdb) print *vp
 >  $1 = {v_uobj = {vmobjlock = {lock_data = 1 '\001', lock_pad = "\000\000", 
 >        lock_file = 0xc056aa8c "../../../../fs/smbfs/smbfs_vfsops.c", 
 >        unlock_file = 0xc0577f7c "../../../../miscfs/genfs/genfs_vnops.c", 
 >        lock_line = 457, unlock_line = 1081, list = {tqe_next = 0x0, 
 >          tqe_prev = 0xc05a29d0}, lock_holder = 0}, pgops = 0xc059e9ec, memq = {
 >        tqh_first = 0x0, tqh_last = 0xcf110988}, uo_npages = 0, uo_refs = 0}, 
 >    v_size = 0, v_flag = 256, v_numoutput = 0, v_writecount = 0, v_holdcnt = 0, 
 v_flag = 256 = VXLOCK
 
 >    v_mount = 0xc14ca000, v_op = 0xc118a400, v_freelist = {
 >      tqe_next = 0xcd5a9ae4, tqe_prev = 0xdeadb}, v_mntvnodes = {
 
 note tqe_prev = 0xdeadb, the marker left on a vnode taken off the free list for recycling
 
 >      tqe_next = 0xcc1e7108, tqe_prev = 0xcc01b34c}, v_cleanblkhd = {
 >      lh_first = 0x0}, v_dirtyblkhd = {lh_first = 0x0}, v_synclist_slot = 0, 
 >    v_synclist = {tqe_next = 0x0, tqe_prev = 0x0}, v_dnclist = {lh_first = 0x0}, 
 >    v_nclist = {lh_first = 0x0}, v_un = {vu_mountedhere = 0x0, vu_socket = 0x0, 
 >      vu_specinfo = 0x0, vu_fifoinfo = 0x0, vu_ractx = 0x0}, v_type = VDIR, 
 >    v_tag = VT_SMBFS, v_lock = {lk_interlock = {lock_data = 0 '\0', 
 
 v_tag = VT_SMBFS
 
 >        lock_pad = "\000\000", 
 >        lock_file = 0xc0534a4f "../../../../kern/kern_lock.c", 
 >        unlock_file = 0xc0534a4f "../../../../kern/kern_lock.c", 
 >        lock_line = 626, unlock_line = 977, list = {tqe_next = 0x0, 
 >          tqe_prev = 0x0}, lock_holder = 4294967295}, lk_flags = 32768, 
 >      lk_sharecount = 0, lk_exclusivecount = 0, lk_recurselevel = 0, 
 >      lk_waitcount = 0, lk_wmesg = 0xc053825b "vnlock", lk_un = {lk_un_sleep = {
 >          lk_sleep_lockholder = -1, lk_sleep_locklwp = 0, lk_sleep_prio = 20, 
 >          lk_sleep_timo = 0, lk_newlock = 0x0}, lk_un_spin = {
 >          lk_spin_cpu = 4294967295, lk_spin_list = {tqe_next = 0x0, 
 >            tqe_prev = 0x14}}}, 
 >      lk_lock_file = 0xc0577f7c "../../../../miscfs/genfs/genfs_vnops.c", 
 >      lk_unlock_file = 0xc0577f7c "../../../../miscfs/genfs/genfs_vnops.c", 
 >      lk_lock_line = 298, lk_unlock_line = 314}, v_vnlock = 0xcf1109f0, 
 >    v_data = 0x0, v_klist = {slh_first = 0x0}}
 
 v_data = NULL
 
 So, this vnode is clearly being recycled.  Now, I can't find anything
 that would protect a vnode between vfs_sync (which only takes the
 interlock) and VOP_RECLAIM (which is called unlocked).  And it seems
 that VOP_RECLAIM in smbfs can block *after* setting v_data to NULL and
 otherwise tearing down the vnode:
 
         vp->v_data = NULL;
         smbfs_hash_unlock(smp);
         if (np->n_name)
                 smbfs_name_free(np->n_name);
         pool_put(&smbfs_node_pool, np);
         if (dvp) {
                 vrele(dvp);
                 /*
                  * Indicate that we released something; see comment
                  * in smbfs_unmount().
                  */
                 smp->sm_didrele = 1;
         }
         return 0;
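 To make the ordering concrete, here is a small user-space model in C. All names are hypothetical stand-ins, not NetBSD code, and only the fields visible in the dump are kept. It assumes the recycling path has already set VXLOCK and the 0xdeadb free-list marker before reclaim runs, as the dump suggests. The reclaim tail clears v_data and then may sleep in vrele(dvp); the model calls a hook at that point to stand in for whatever runs while reclaim sleeps, and records what that code would see.

 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>

 /* Hypothetical, simplified stand-ins; not the real struct vnode. */
 #define MODEL_VXLOCK 0x0100              /* v_flag value 256 in the dump */
 #define MODEL_DEADB  ((uintptr_t)0xdeadb) /* free-list marker in the dump */

 struct model_node { int n_flag; };

 struct model_vnode {
     int v_flag;
     uintptr_t freelist_prev;   /* stands in for v_freelist.tqe_prev */
     struct model_node *v_data;
 };

 /* Invoked at the point where reclaim may sleep (vrele(dvp)). */
 typedef void (*model_sleep_hook)(struct model_vnode *vp);

 static void model_reclaim(struct model_vnode *vp, model_sleep_hook during_sleep)
 {
     /* Assumed prior state: vnode pulled off the free list and
      * marked as being cleaned before reclaim is called. */
     vp->freelist_prev = MODEL_DEADB;
     vp->v_flag |= MODEL_VXLOCK;

     /* Reclaim tail as quoted: detach the node first... */
     vp->v_data = NULL;
     /* (the model does not free the node; the kernel does pool_put) */

     /* ...then release the parent, which may block. Anything that
      * runs now sees the half-reclaimed vnode. */
     if (during_sleep != NULL)
         during_sleep(vp);

     /* The cleaning code clears the flag once reclaim returns. */
     vp->v_flag &= ~MODEL_VXLOCK;
 }

 static struct model_vnode observed; /* snapshot taken during the sleep */
 static int hook_ran;

 static void snapshot_hook(struct model_vnode *vp)
 {
     observed = *vp;
     hook_ran = 1;
 }
 ```

 Calling model_reclaim(&vn, snapshot_hook) leaves a snapshot with v_data == NULL, VXLOCK set and the 0xdeadb marker, which is the same state as the vnode in the dump.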
 
 vrele() calls VOP_INACTIVE, which calls all kinds of nasty stuff that
 can sleep.  So, my theory is that the vnode is being reclaimed and
 sleeping on the final leg, and in comes sync(), which goes through all
 the vnodes and meets the half-reclaimed one.  Other file systems get
 lucky because they don't sleep in reclaim.  Running without the
 biglock is going to be yummy.
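 A sketch of how a sync pass trips over such a vnode, again as a simplified user-space model with hypothetical names. m_needs_sync() mirrors the shape of the test at smbfs_vfsops.c:460, which reads n_flag through v_data, so for a VDIR vnode with v_data == NULL it would dereference NULL. The guard flag shows one hypothetical way a walker could skip dying vnodes; it is only an illustration, not a proposed fix, since a check like this may not be sufficient without the right locking.

 ```c
 #include <assert.h>
 #include <stddef.h>

 /* Hypothetical minimal model; not the real NetBSD structures. */
 #define M_VXLOCK    0x0100
 #define M_NMODIFIED 0x0001
 enum m_vtype { M_VNON, M_VREG, M_VDIR };

 struct m_node { int n_flag; };
 struct m_vnode {
     enum m_vtype v_type;
     int v_flag;
     struct m_node *v_data;
     struct m_vnode *next;      /* stands in for the v_mntvnodes list */
 };

 /* Nonzero if the vnode needs flushing. Like the line-460 test it reads
  * n_flag through v_data unless v_type is VNON. */
 static int m_needs_sync(const struct m_vnode *vp)
 {
     const struct m_node *np = vp->v_data;
     return !(vp->v_type == M_VNON || (np->n_flag & M_NMODIFIED) == 0);
 }

 /* Walk the vnode list as a sync pass would. Returns the number of
  * vnodes that would be flushed, or -1 where the unguarded walk would
  * have dereferenced a NULL v_data. */
 static int m_sync_walk(struct m_vnode *head, int guard)
 {
     int flushed = 0;
     for (struct m_vnode *vp = head; vp != NULL; vp = vp->next) {
         int dying = (vp->v_flag & M_VXLOCK) || vp->v_data == NULL;
         if (guard && dying)
             continue;          /* hypothetical: skip half-reclaimed */
         if (vp->v_type != M_VNON && vp->v_data == NULL)
             return -1;         /* kernel would fault here */
         if (m_needs_sync(vp))
             flushed++;
     }
     return flushed;
 }
 ```

 With a list of one dirty file, one clean file and one half-reclaimed VDIR vnode, the unguarded walk hits the NULL v_data, while the guarded walk skips it and flushes only the dirty file.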
 
 Bill, Chuck: This sounds a bit scary, but does it make any sense at all
 or did I miss something obvious?
 
 Patrick: can you get a ps listing out of the kernel?  Anything sleeping
 with the wait channel smb* (probably smbirq, although I'm not familiar
 with the smb code)?
 
 -- 
 Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
 http://www.iki.fi/pooka/                          http://www.NetBSD.org/
     "the most indispensable quality of a cook is exactness"