tech-kern archive


[PATCH] Fixing soft NFS umount -f, round 1



Hi

It took me some time to get it working, but here is a patch that fixes
umount -f on soft NFS mounts in NetBSD-current:
http://ftp.espci.fr/shadow/manu/umount_f1.patch

The problem to fix is that operations on a soft mount are supposed to
be allowed to fail, but when the server is gone, even umount -f cannot
get rid of the mount. The only way out is reboot -n.

This happened to be caused by several issues, all fixed by the
patch:

1) umount(8) called sync(2) before attempting a forced unmount(2),
   but sync(2) does not return until data has been sent to storage,
   so with -f we never got the opportunity to attempt the forced
   unmount.

2) When unmounting, we first call vinvalbuf() with V_SAVE in order
   to push data to storage; when that fails, we call it again without
   V_SAVE to get rid of the vnode's buffers. In that case, we need
   PGO_BUSYFAIL to avoid getting stuck in genfs_gop_write() on
   UVM_UNLOCK_AND_WAIT(pg, slock, 0, "genput", 0).

3) In the NFS code, the three calls to cv_timedwait() must always use
   a timeout on soft mounts, so that we have an opportunity to detect
   and report a failure. I chose to report EIO. ENOTCONN would be
   more explicit, but it is not an errno that POSIX write(2) is
   supposed to return.

4) While an unmount is in progress, prevent nfs_connect() from
   starting a new connection; otherwise a thread loops in
   nfs_reconnect() while holding nmp->nm_writeverflock as RW_READER,
   and the unmount cannot proceed.

5) In the genfs code, report VOP_STRATEGY errors to higher layers
   instead of hiding them, so that the failure can be detected, and
   display a message for the administrator.

6) In the genfs code, make sure genfs_do_putpages() does not wait
   forever for I/O completion when GOP_WRITE hits an error: since the
   write is partial, it will never complete, so the error must be
   reported right away.
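
The idea behind item 3 can be illustrated with a userland analog. This is
a minimal sketch, not the patch's code. It uses POSIX
pthread_cond_timedwait() in place of the kernel's cv_timedwait(), and the
names struct req, req_init(), req_complete() and req_wait() are
hypothetical. A hard mount waits without a timeout. A soft mount waits
until a deadline and then gives up with EIO, which is the errno the patch
chose.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a pending NFS request (not NetBSD's struct). */
struct req {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            done;     /* set by the reply side */
    bool            soft;     /* soft mount: operations may fail */
    int             timeo_ms; /* how long a soft mount waits for a reply */
};

static void req_init(struct req *r, bool soft, int timeo_ms)
{
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cv, NULL);
    r->done = false;
    r->soft = soft;
    r->timeo_ms = timeo_ms;
}

/* Reply side: mark the request complete and wake the waiter. */
static void req_complete(struct req *r)
{
    pthread_mutex_lock(&r->lock);
    r->done = true;
    pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->lock);
}

/*
 * Wait for the reply. A hard mount waits with no timeout; a soft mount
 * uses a timed wait so that a dead server surfaces as EIO instead of
 * blocking the caller (and umount -f) forever.
 */
static int req_wait(struct req *r)
{
    struct timespec deadline;
    int error = 0;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += r->timeo_ms / 1000;
    deadline.tv_nsec += (long)(r->timeo_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&r->lock);
    while (!r->done) {
        if (!r->soft) {
            pthread_cond_wait(&r->cv, &r->lock);
            continue;
        }
        if (pthread_cond_timedwait(&r->cv, &r->lock, &deadline) == ETIMEDOUT &&
            !r->done) {
            error = EIO; /* same choice as the patch: EIO, not ENOTCONN */
            break;
        }
    }
    pthread_mutex_unlock(&r->lock);
    return error;
}
```

The loop around the wait also absorbs spurious wakeups, so only a real
timeout with no reply turns into EIO.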
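
Item 4 can be sketched the same way, again in userland and with
hypothetical names (struct mnt, mnt_connect(), mnt_reconnect(),
mnt_begin_unmount()) standing in for the NFS code. Once an unmount has
started, new connection attempts are refused and the reconnect loop
exits, so the thread stuck in it can release the lock it holds. The EIO
returned on refusal is an illustrative assumption. The message does not
say which error the patch uses there.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mount state; names are illustrative, not NetBSD's. */
struct mnt {
    atomic_bool unmounting;             /* set once umount -f begins */
    int (*try_connect)(struct mnt *m);  /* returns 0 or an errno value */
};

static void mnt_init(struct mnt *m, int (*try_connect)(struct mnt *))
{
    atomic_init(&m->unmounting, false);
    m->try_connect = try_connect;
}

/* The unmount path announces itself before tearing the mount down. */
static void mnt_begin_unmount(struct mnt *m)
{
    atomic_store(&m->unmounting, true);
}

/* Refuse to open a new connection once an unmount has started. */
static int mnt_connect(struct mnt *m)
{
    if (atomic_load(&m->unmounting))
        return EIO; /* assumption: error value chosen for illustration */
    return m->try_connect(m);
}

/*
 * Retry until connected, but stop as soon as an unmount is in progress,
 * so that the caller can drop the lock it holds (nm_writeverflock in the
 * description above) and let the unmount proceed.
 */
static int mnt_reconnect(struct mnt *m)
{
    int error;

    while ((error = mnt_connect(m)) != 0) {
        if (atomic_load(&m->unmounting))
            return error;
        /* A real implementation would sleep or back off here. */
    }
    return 0;
}
```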

This patch lets netbsd-current (and netbsd-7, with just the variable
catch_p renamed to catch) pass this test:

--- cut here ---
#!/bin/sh -ex

mkdir -p /nfstest/tmp
chmod 1777 /nfstest/tmp
grep '^/nfstest' /etc/exports || 
        echo "/nfstest localhost" >> /etc/exports

/etc/rc.d/rpcbind forcestart || true
/etc/rc.d/mountd forcestart || true
/etc/rc.d/nfsd forcestart || true

mount -t nfs -o rw,soft,intr,tcp,-R=2 localhost:/nfstest /mnt
dd if=/dev/zero of=/mnt/tmp/test bs=1024k &
ddpid=$!
sleep 1
/etc/rc.d/nfsd onestop || true
umount -f -R /mnt & 
umountpid=$!
ps -alp $ddpid
ps -alp $umountpid
--- cut here ---


-- 
Emmanuel Dreyfus
manu%netbsd.org@localhost

