tech-kern archive
need help ! stale vnode pointer in ufs_fhtovp
Hello,
I'm tracking down an issue where ufs_fhtovp() gets back a clean vnode from
ufs_ihashget(), so VTOI(nvp) is NULL and bad things happen.
This is easily reproduced on an NFS server by running bonnie++ on an NFS
client and rm'ing bonnie++'s files directly on the server. I think you can
get the same result by removing the files from a second NFS client.
I think the problem is that vget() can sleep, and vclean() can have run
VOP_INACTIVE()/VOP_RECLAIM() on the vnode before vget() returns.
I can reproduce this on a Xen domU acting as the NFS server, so it's not
related to SMP or preemption. I don't know whether this race can affect
users of vget() other than ufs_ihashget().
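To make the suspected window concrete, here is a minimal userspace sketch of it. This is not kernel code: every model_* name is made up for illustration, the "sleep" is simulated by a callback that runs inside the modelled vget(), and the real vnode locking is left out entirely. It only shows the idea of snapshotting the inode under the hash lock and re-checking the vnode's identity once the reference has been taken.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel's struct inode / struct vnode. */
struct model_inode {
	unsigned long i_dev;
	unsigned long i_number;
};

struct model_vnode {
	struct model_inode *v_data;	/* NULL once the vnode is clean */
	int v_usecount;
};

/* Callback that runs while the modelled vget() is "asleep". */
typedef void (*sleep_hook_t)(struct model_vnode *);

/* Stands in for vclean(): detach the inode, leaving a clean vnode. */
void
model_reclaim(struct model_vnode *vp)
{
	vp->v_data = NULL;
}

/* Nothing happens during the sleep. */
void
model_no_race(struct model_vnode *vp)
{
	(void)vp;
}

/* Models a vget() that may sleep and still report success (0). */
int
model_vget(struct model_vnode *vp, sleep_hook_t during_sleep)
{
	during_sleep(vp);
	vp->v_usecount++;
	return 0;
}

/* Drop the reference taken by model_vget(). */
void
model_vrele(struct model_vnode *vp)
{
	assert(vp->v_usecount > 0);
	vp->v_usecount--;
}

/*
 * Returns vp if it still names (dev, inum) after the reference was
 * taken, or NULL if the race was lost. Real code would retry the
 * hash lookup instead of returning NULL.
 */
struct model_vnode *
model_ihashget(struct model_vnode *vp, unsigned long dev,
    unsigned long inum, sleep_hook_t during_sleep)
{
	struct model_inode *ip = vp->v_data;	/* snapshot "under the hash lock" */

	if (ip == NULL)
		return NULL;
	if (model_vget(vp, during_sleep) != 0)
		return NULL;
	if (vp->v_data != ip || ip->i_number != inum || ip->i_dev != dev) {
		model_vrele(vp);	/* lost race against the reclaim */
		return NULL;
	}
	return vp;
}
```

Calling model_ihashget() with model_no_race as the hook hands back the vnode with one reference held; calling it with model_reclaim makes the re-check fail, so the reference is dropped again and NULL is returned.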
I tried this patch:
Index: ufs_ihash.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_ihash.c,v
retrieving revision 1.26
diff -u -p -u -r1.26 ufs_ihash.c
--- ufs_ihash.c 5 May 2008 17:11:17 -0000 1.26
+++ ufs_ihash.c 8 Sep 2009 13:03:50 -0000
@@ -152,6 +152,14 @@ ufs_ihashget(dev_t dev, ino_t inum, int
mutex_exit(&ufs_ihash_lock);
if (vget(vp, flags | LK_INTERLOCK))
goto loop;
+ if (VTOI(vp) != ip ||
+ ip->i_number != inum || ip->i_dev != dev) {
+ /* lost race against vclean() */
+ printf("ufs_ihashget lost race vp %p\n", vp);
+ vfs_vnode_print(vp, 1, printf);
+ vput(vp);
+ goto loop;
+ }
}
return (vp);
}
But it doesn't work. I get:
ufs_ihashget lost race vp 0xffffa0000439cc18
OBJECT 0xffffa0000439cc18: locked=0, pgops=0xffffffff803ea080, npages=0, refs=1
PAGES <pg,offset>:
VNODE flags 80010<MPSAFE,CLEAN>
mp 0xffffa00003867008 numoutput 0 size 0x0 writesize 0x0
data 0x0 writecount 0 holdcnt 0
tag UNKNOWN(0) type VREG(1) mount 0xffffa00003867008 typedata 0x0
v_lock 0xffffa0000439cd20 v_vnlock 0xffffa0000439cd20
clean bufs:
dirty bufs:
Reader / writer lock error: rw_destroy: assertion failed: (rw->rw_owner & ~RW_DEBUG) == 0
lock address : 0xffffa0000439cd20 type : sleep/adaptive
initialized : 0xffffffff8032bb26
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa000045ce7e0 last held: 0xffffa000045ce7e0
last locked : 0xffffffff803299a6 unlocked : 0xffffffff8032998a
owner/count : 0xffffa000045ce7e0 flags : 0x000000000000000c
Turnstile chain at 0xffffffff80612600.
=> No active turnstile for this lock.
panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff8037bca5 cs e030 rflags 246 cr2 7f7ffdb9f950
cpl 0 rsp ffffa00004611460
Stopped in pid 205.3 (nfsd) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x242
lockdebug_abort1() at netbsd:lockdebug_abort1+0xd3
rw_destroy() at netbsd:rw_destroy+0x46
vnfree() at netbsd:vnfree+0x6e
vrelel() at netbsd:vrelel+0x388
ufs_ihashget() at netbsd:ufs_ihashget+0xd2
ffs_vget() at netbsd:ffs_vget+0xc1
ufs_fhtovp() at netbsd:ufs_fhtovp+0x1f
ffs_fhtovp() at netbsd:ffs_fhtovp+0x58
nfsrv_fhtovp() at netbsd:nfsrv_fhtovp+0xa8
nfsrv_write() at netbsd:nfsrv_write+0x495
nfssvc_nfsd() at netbsd:nfssvc_nfsd+0x41b
sys_nfssvc() at netbsd:sys_nfssvc+0x28f
syscall() at netbsd:syscall+0xb4
When this doesn't panic, the nfsd server hangs on tstile, waiting for a lock
whose exclusive owner no longer cares about it.
It looks like vput() didn't release the lock?
Replacing the vput() with vlockmgr(vp->v_vnlock, LK_RELEASE); vrele(vp);
works (the server doesn't panic, and the client gets ESTALE),
but I don't understand why vput() doesn't work.
Also, I have no idea whether this is the correct fix, or whether something
needs to be done in vget(). Can anyone help me with this? Having such a bug
affecting the NFS server is bad ...
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference