Subject: Re: kern/30831
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Antti Kantee <pooka@cs.hut.fi>
List: netbsd-bugs
Date: 04/03/2007 11:20:02
The following reply was made to PR kern/30831; it has been noted by GNATS.

From: Antti Kantee <pooka@cs.hut.fi>
To: Patrick Welche <prlw1@newn.cam.ac.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/30831
Date: Tue, 3 Apr 2007 14:16:49 +0300

 On Tue Apr 03 2007 at 12:00:06 +0100, Patrick Welche wrote:
 > On Mon, Apr 02, 2007 at 11:14:42PM +0300, Antti Kantee wrote:
 > >  Patrick: can you get a ps listing out of the kernel?  Anything sleeping
 > >  with the wait channel smb* (probably smbirq, although I'm not familiar
 > >  with the smb code)?
 > 
 > Sadly no:
 > 
 > # ps -M netbsd.1.core
 > ps: can't read pgrp at 0x0: Undefined error: 0
 > 
 > (and my ps/l didn't work because I didn't sync (reboot 0x104))
 
 Maybe try xps from /sys/gdbscripts/xps instead?
 
 > Nice point: "Other file systems get lucky because they don't sleep in reclaim."
 > 
 > I was playing spot-the-difference with ffs and didn't spot one...
 
 When smbfs does vrele() for the parent directory, it may end up
 contacting the server.  There is a definite time window between issuing
 the request to the server and getting the response back.  During this
 window the vnode being reclaimed is only partially torn down (its smbnode
 may already be gone), yet other code walking the mount's vnode list can
 still find it.
 
 Local file systems don't have this problem because they have no network
 delay.  I am not sure whether disk I/O delay could open a similar window
 for them.  Also, they seem to perform the reclaim steps in a slightly
 different order and call vrele() earlier.
 
 If the problem is easy to reproduce, please check whether this
 patch/hack makes it go away:
 
 Index: smbfs_vfsops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/fs/smbfs/smbfs_vfsops.c,v
 retrieving revision 1.63
 diff -u -r1.63 smbfs_vfsops.c
 --- smbfs_vfsops.c      12 Mar 2007 18:18:32 -0000      1.63
 +++ smbfs_vfsops.c      3 Apr 2007 11:15:32 -0000
 @@ -456,7 +456,13 @@
                         goto loop;
                 simple_lock(&vp->v_interlock);
                 nvp = TAILQ_NEXT(vp, v_mntvnodes);
 +
                 np = VTOSMB(vp);
 +               if (np == NULL) {
 +                       simple_unlock(&vp->v_interlock);
 +                       continue;
 +               }
 +
                 if ((vp->v_type == VNON || (np->n_flag & NMODIFIED) == 0) &&
                     LIST_EMPTY(&vp->v_dirtyblkhd) &&
                      vp->v_uobj.uo_npages == 0) {
 
 
 -- 
 Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
 http://www.iki.fi/pooka/                          http://www.NetBSD.org/
     "the most indispensable quality of a cook is exactitude"