tech-kern archive
VOP_PUTPAGE ignores mount_nfs -o soft,intr
Hi
I have encountered a bug in the NetBSD NFS client. Despite a mount with
-o intr,soft, we can hit a situation where a process remains hung in the
kernel because the NFS server is gone.
This happens when ioflush does its duty, with the following code path:
sync_fsync / nfs_sync / VOP_FSYNC / nfs_fsync / nfs_flush / VOP_PUTPAGES
VOP_PUTPAGES is called with flags = PGO_ALLPAGES|PGO_FREE. It then goes
through genfs_putpages and genfs_do_putpages, and gets stuck in:
	/* Wait for output to complete. */
	if (!wasclean && !async && vp->v_numoutput != 0) {
		while (vp->v_numoutput != 0)
			cv_wait(&vp->v_cv, slock);
	}
This cv_wait() has no timeout and is uninterruptible. ioflush will
sleep there forever, holding the vnode lock. Any other process doing
I/O on the filesystem will sleep in tstile waiting for the vnode
lock, with this path:
sys_write / dofilewrite / vn_write / vn_lock / VOP_LOCK / rw_enter
This is another timeout-less, uninterruptible wait, this time for the
vnode lock, which means -o intr,soft is not honoured. If the NFS
server does not come back, the only way out is reboot -n. Even
umount -f -R will hang in tstile.
How can we fix it?
1) ioflush should not sleep forever awaiting I/O completion on
an NFS mount if it was mounted with -o soft. A PGO_SOFT
flag could be added to VOP_PUTPAGES so that cv_timedwait() is used
instead of cv_wait(), but how can we get the timeout? Should we introduce
a VOP_PUTPAGES2 with an additional argument? Use a sane default? Get it
from the filesystem using a new VFS_GETTIMEOUT method (or a more general
VFS_GETMNTINFO, which would be able to query various pieces of
information)?
2) Honouring -o intr seems to require either the introduction of a
real nfs_lock (currently it is genfs_lock), or a change to genfs_lock.
The goal is to make the sleep on vp->v_lock interruptible. How can
this be achieved? We have no rw_(try)enter_sig; should we introduce it?
Or should we loop, sleeping interruptibly and retrying at
regular intervals? And how could the -o soft timeout be honoured here?
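To make the two ideas concrete, here is a rough userland analogue, not
kernel code: POSIX threads stand in for the kernel condvar and rwlock
primitives. Everything in it (struct fake_vnode, wait_output_soft(),
lock_intr(), the millisecond timeout and the retry count) is a
hypothetical name or parameter chosen for illustration; none of it
exists in the tree. The first function bounds the wait for pending
output as in 1), the second polls for the lock with an interruptible
sleep between attempts as in 2).

```c
/* Userland analogue only: pthreads stand in for kernel condvars/rwlocks. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Hypothetical stand-in for the vnode fields involved. */
struct fake_vnode {
	pthread_mutex_t  lock;      /* plays the role of slock */
	pthread_cond_t   cv;        /* plays the role of vp->v_cv */
	int              numoutput; /* plays the role of vp->v_numoutput */
	pthread_rwlock_t vlock;     /* plays the role of the vnode lock */
};

/* Build an absolute CLOCK_REALTIME deadline timeout_ms from now. */
static struct timespec
deadline_after(long timeout_ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/*
 * Idea 1: wait for pending output, but give up after timeout_ms.
 * Returns 0 once numoutput has reached 0, ETIMEDOUT otherwise.
 */
int
wait_output_soft(struct fake_vnode *vp, long timeout_ms)
{
	struct timespec dl;
	int error = 0;

	assert(timeout_ms >= 0);
	dl = deadline_after(timeout_ms);
	pthread_mutex_lock(&vp->lock);
	while (vp->numoutput != 0 && error == 0)
		error = pthread_cond_timedwait(&vp->cv, &vp->lock, &dl);
	if (vp->numoutput == 0)
		error = 0;	/* output drained just as we timed out */
	pthread_mutex_unlock(&vp->lock);
	return error;
}

/*
 * Idea 2: poll for the lock, sleeping interruptibly between attempts.
 * Returns 0 with the write lock held, EINTR if *intr became nonzero
 * (for example set by a signal handler), ETIMEDOUT after max_tries.
 */
int
lock_intr(struct fake_vnode *vp, const volatile sig_atomic_t *intr,
    long interval_ms, int max_tries)
{
	struct timespec nap = { interval_ms / 1000,
	    (interval_ms % 1000) * 1000000L };

	for (int i = 0; i < max_tries; i++) {
		if (pthread_rwlock_trywrlock(&vp->vlock) == 0)
			return 0;
		if (*intr)
			return EINTR;
		/* nanosleep() returns early when a signal is delivered. */
		nanosleep(&nap, NULL);
		if (*intr)
			return EINTR;
	}
	return ETIMEDOUT;
}
```

In this sketch the soft timeout for the lock is simply interval_ms
times max_tries, which sidesteps rather than answers the question of
where the kernel would get the mount's timeout from. A caller that gets
ETIMEDOUT or EINTR back would still have to unwind cleanly, which is
the hard part in the real VFS code.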
Last question: is there any hope of getting this fixed in netbsd-7, or has
the VFS interface changed too much?
--
Emmanuel Dreyfus
manu%netbsd.org@localhost