tech-kern archive


Re: NFS panic



On Oct 24, 2012, at 5:44 PM, Manuel Bouyer wrote:

On Wed, Oct 24, 2012 at 04:07:34PM +0200, Manuel Bouyer wrote:
Hello,
I just got this panic on a NFS server:
uvm_fault(0xfffffe9069ecf468, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff804bd391 cs 8 rflags 10246 cr2  c8 cpl 0 rsp fffffe817503b660
panic: trap
cpu20: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x9e
ffs_fhtovp() at netbsd:ffs_fhtovp+0x55
VFS_FHTOVP() at netbsd:VFS_FHTOVP+0x1c
nfsrv_fhtovp() at netbsd:nfsrv_fhtovp+0x9a
nfsrv_write() at netbsd:nfsrv_write+0x502
nfssvc_nfsd() at netbsd:nfssvc_nfsd+0x1ce
sys_nfssvc() at netbsd:sys_nfssvc+0x22d
syscall() at netbsd:syscall+0xc4
cpu20: End traceback...

Does this ring a bell for anyone?

I forgot to add: it does ring a bell for me. I think I debugged (and fixed)
something similar in netbsd-5:
http://mail-index.netbsd.org/tech-kern/2009/09/04/msg006026.html
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=41147

I think this analysis still holds: I can't see what would prevent
vget() from returning a cleaned vnode.
If the nfsd thread, in vn_lock(LK_EXCLUSIVE), gets preempted between
mutex_exit(vp->v_interlock) and VOP_LOCK(vp, (flags & ~LK_RETRY)),
there is time for the vcleaner thread to start cleaning the
vnode. Then nfsd sleeps in VOP_LOCK(), gets woken up when the cleaner
releases the exclusive lock, and wins the race with the cleaner for
the interlock. At this point VI_CLEAN is not set yet but VI_XLOCK still
is, and we check only for VI_CLEAN. When nfsd releases the interlock,
the cleaner finishes cleaning the vnode, and nfsd hits a NULL v_data in
ffs_fhtovp.

I think we should check for VI_XLOCK in addition to VI_CLEAN at the end of
vn_lock(). What do you think?

To make it short: Manuel's fix was right, and my removing it and trying to
get it done in vn_lock() was wrong.

While vget() vs. cleanvnode() (the cleaner) was always free of races,
vget() vs. vrelel() is not.  Manuel's fix, which checks the vnode state after
vrelel() has acquired its vnode lock and before it starts inactivating and
cleaning, is right.

Please try the attached diff that brings back the fix.

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)

Attachment: vget.diff
Description: Binary data


