Subject: Re: More data. Re: kernel panic in nfs_reclaim (kern/17107)
To: Artem Belevich <art@riverstonenet.com>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 10/02/2002 15:12:04
On Oct 2, 11:23am, art@riverstonenet.com (Artem Belevich) wrote:
-- Subject: More data. Re: kernel panic in nfs_reclaim (kern/17107)

Ok, we are narrowing this down. Something that unmounts the filesystem
does not pay attention to flushing the vnode. Could that be happening
*during* the unmount? I.e., could we be calling nfs_reclaim while
nfs_unmount is in progress? Does the following help?

christos

Index: nfs_node.c
===================================================================
RCS file: /cvsroot/syssrc/sys/nfs/nfs_node.c,v
retrieving revision 1.53
diff -u -u -r1.53 nfs_node.c
--- nfs_node.c	2002/03/16 23:05:25	1.53
+++ nfs_node.c	2002/10/02 19:11:19
@@ -272,7 +272,8 @@
 	} */ *ap = v;
 	struct vnode *vp = ap->a_vp;
 	struct nfsnode *np = VTONFS(vp);
-	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
+	extern struct simplelock mntvnode_slock;
+	struct nfsmount *nmp;
 
 	if (prtactive && vp->v_usecount != 0)
 		vprint("nfs_reclaim: pushing active", vp);
@@ -282,9 +283,12 @@
 	/*
 	 * For nqnfs, take it off the timer queue as required.
 	 */
+	simple_lock(&mntvnode_slock);
+	nmp = VFSTONFS(vp->v_mount);
 	if ((nmp->nm_flag & NFSMNT_NQNFS) && np->n_timer.cqe_next != 0) {
 		CIRCLEQ_REMOVE(&nmp->nm_timerhead, np, n_timer);
 	}
+	simple_unlock(&mntvnode_slock);
 
 	/*
 	 * Free up any directory cookie structures and
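
For illustration only, here is a standalone userland sketch of the kind of
window the patch above is trying to close, assuming (as the patch does) that
the unmount path and the reclaim path can be serialized on a single lock.
The struct, the thread names and the 0x1 flag are all made up for the
example; this is NOT NetBSD kernel code, and the real change relies on
mntvnode_slock rather than a NULL check.

/*
 * Illustrative userland sketch only -- not NetBSD kernel code.
 * unmount_thread() plays the role of the unmount path tearing down and
 * freeing the per-mount data; reclaim_thread() plays the role of
 * nfs_reclaim() looking at that data through a pointer obtained earlier.
 * Without the lock in reclaim_thread() it could dereference freed memory,
 * which is the pattern the mntvnode_slock change above tries to avoid.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_nfsmount {
	int nm_flag;			/* stands in for nfsmount.nm_flag */
};

static pthread_mutex_t mnt_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fake_nfsmount *mnt;	/* stands in for the per-mount data */

static void *
unmount_thread(void *arg)
{
	/* tear down and free the mount, as an unmount would */
	pthread_mutex_lock(&mnt_lock);
	free(mnt);
	mnt = NULL;
	pthread_mutex_unlock(&mnt_lock);
	return NULL;
}

static void *
reclaim_thread(void *arg)
{
	/* only look at the mount while holding the same lock */
	pthread_mutex_lock(&mnt_lock);
	if (mnt != NULL && (mnt->nm_flag & 0x1))
		printf("mount still live, flag set\n");
	pthread_mutex_unlock(&mnt_lock);
	return NULL;
}

int
main(void)
{
	pthread_t ta, tb;

	mnt = calloc(1, sizeof(*mnt));
	mnt->nm_flag = 0x1;
	pthread_create(&ta, NULL, reclaim_thread, NULL);
	pthread_create(&tb, NULL, unmount_thread, NULL);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	return 0;
}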

christos

| I got the panic tonight and I still have the machine in DDB.  I think I
| can keep it this way for a couple more hours. So if somebody would like
| to get more info from DDB, I'd be happy to type commands for you.
| 
| Here's the stack trace, this time from a 1.6 GENERIC_DIAGNOSTIC kernel.
| 
| nfs_reclaim(e6200c54,8,0,c02a6953,e47dcc9c) at nfs_reclaim+0x54
| VOP_RECLAIM(e4cd70f4,e3c42740,200000,0) at VOP_RECLAIM+0x2e
| vclean(e4cd70f4,8,e3c42740,c025eb3c) at vclean+0x107
| vgonel(e4cd70f4,e3c42740,0,c026034e) at vgonel+0x46
| getnewvnode(1,c10a4200,c0f7ef00,e6200d4c,0) at getnewvnode+0x210
| ffs_vget(c10a4200,56b198,e6200dd8,e3c42740,e58bfcb4) at ffs_vget+0x4f
| ufs_lookup(e6200e10,30002,e6200e20,c02b14f9,e6200ef8) at ufs_lookup+0x74a
| VOP_LOOKUP(e58bfcb4,e6200f08,e6200f1c,c02aac3a,e58bfcb4) at VOP_LOOKUP+0x35
| lookup(e6200ef8,e758d000,400,e6200f10,e6200f80) at lookup+0x2a4
| namei(e6200ef8,e57fd77c,e6200f1c,2) at namei+0x2f1
| sys_unlink(e3c42740,e6200f80,e6200f78,c0375e0f) at sys_unlink+0x3f
| syscall_plain(1f,1f,1f,1f,0) at syscall_plain+0xa7
| 
| 
| I've checked the VNODE and v_data and v_mount pointers:
| 
| db> show vnode e4cd70f4
| OBJECT 0xe4cd70f4: locked=0, pgops=0xc0663f64, npages=0, refs=0
| 
| VNODE flags 100<XLOCK>
| mp 0xc1882200 numoutput 0 size 0xffffffffffffffff
| data 0xe6e3fb98 usecount 0 writecount 0 holdcnt 0 numoutput 0
| type VNON(0) tag VT_NFS(2) id 0xc3c7ed mount 0xc1882200 typedata 0x0
| 
| db> show object 0xe4cd70f4
| OBJECT 0xe4cd70f4: locked=0, pgops=0xc0663f64, npages=0, refs=0
| 
| db> x 0xc0663f64
| uvm_vnodeops:   0
| 
| v->v_data (nfsnode) seems to be OK. At least it points back to the vnode:
| v->v_data->n_vnode == 0xe4cd70f4
| 
| db> x/m 0xe6e3fb98,40
| 0xe6e3fb98:     50dd65c0 00000000 00000000 00000000     P.e.............
| 0xe6e3fba8:     00000000 00000000 552e50c0 ffffffff     ........U.P.....
| 0xe6e3fbb8:     08000000 00000000 00000000 00000000     ................
| 0xe6e3fbc8:     00000000 00000000 00000000 00000000     ................
| 0xe6e3fbd8:     00000000 00000000 00000000 00000000     ................
| 0xe6e3fbe8:     00000000 00000000 e0e928c1 00000000     ..........(.....
| 0xe6e3fbf8:     00000000 00000000 00000000 3cfce3e6     ............<...
| 0xe6e3fc08:     804d9be6 f470cde4 00000000 00000000     .M...p..........
| 0xe6e3fc18:     00000000 00000000 00000000 00000000     ................
| 0xe6e3fc28:     00000000 00000000 00000000 00000000     ................
| 0xe6e3fc38:     20000000 346e3700 321d8700 20000000      ...4n7.2... ...
| 0xe6e3fc48:     00376e34 321d8700 3b7c0000 21411a00     .7n42...;|..!A..
| 0xe6e3fc58:     6f4f0400 00000000 00000000 00000000     oO..............
| 0xe6e3fc68:     00000000 00000000 00000000 00000000     ................
| 0xe6e3fc78:     00000000 ffffffff 00000000 00000000     ................
| 0xe6e3fc88:     00000000 00000000 00000000 00000000     ................
| 
| Here comes the v->v_mount pointer, and the data doesn't look good to me.
| The mount point has been freed and had type M_UVMAMAP (0x52 == 82).
| 
| db> x/m 0xc1882200,40
| 0xc1882200:     efbeadde 5200adde 00c688c1 efbeadde     ....R...........
| 0xc1882210:     efbeadde efbeadde efbeadde efbeadde     ................
| 0xc1882220:     08000000 09000000 0a000000 0b000000     ................
| 0xc1882230:     0c000000 0d000000 0e000000 0f000000     ................
| 0xc1882240:     10000000 11000000 12000000 13000000     ................
| 0xc1882250:     14000000 15000000 16000000 17000000     ................
| 0xc1882260:     18000000 19000000 1a000000 1b000000     ................
| 0xc1882270:     1c000000 1d000000 1e000000 1f000000     ................
| 0xc1882280:     20000000 21000000 22000000 23000000      ...!..."...#...
| 0xc1882290:     24000000 25000000 26000000 27000000     $...%...&...'...
| 0xc18822a0:     28000000 29000000 2a000000 2b000000     (...)...*...+...
| 0xc18822b0:     2c000000 2d000000 2e000000 2f000000     ,...-......./...
| 0xc18822c0:     30000000 31000000 32000000 33000000     0...1...2...3...
| 0xc18822d0:     34000000 35000000 36000000 37000000     4...5...6...7...
| 0xc18822e0:     38000000 39000000 3a000000 3b000000     8...9...:...;...
| 0xc18822f0:     3c000000 3d000000 3e000000 3f000000     <...=...>...?...
| 
| --Artem
| 
| On Mon, Sep 30, 2002 at 07:59:10PM -0400, Christos Zoulas <christos@zoulas.com> wrote:
| > On Sep 30,  3:55pm, art@riverstonenet.com (Artem Belevich) wrote:
| > -- Subject: Re: kernel panic in nfs_reclaim (kern/17107)
| > 
| > Is the rest of the vnode valid?
| > 
| > christos
| > 
| > | This was the first thing I tried. The kernel survived for a bit longer
| > | - something like 3-4 days instead of the usual nightly panic attack - but
| > | finally it crashed in the same place with nmp=0xc. This suggests
| > | that the vnode's vp->v_mount has already been reused for something else.
| > | 
| > | This crash confuses me a little - if the filesystem is unmounted,
| > | shouldn't all vnodes associated with it be gone? If so, then how come
| > | this particular rogue vnode was still around?
| > | 
| > | --Artem
| > | 
| > | 
| > -- End of excerpt from Artem Belevich
| > 
| > 
-- End of excerpt from Artem Belevich