Re: [PATCH] Fixing soft NFS umount -f, round 3

To: Emmanuel Dreyfus <manu%netbsd.org@localhost>
Subject: Re: [PATCH] Fixing soft NFS umount -f, round 3
From: Chuck Silvers <chuq%chuq.com@localhost>
Date: Fri, 3 Jul 2015 09:59:40 -0700

On Sun, Jun 28, 2015 at 07:14:59PM +0200, Emmanuel Dreyfus wrote:
> Third attempt at fixing forced unmount for soft NFS mounts:
> http://ftp.espci.fr/shadow/manu/umount_f3.patch

what's the reason for hardcoding the new timeouts to 2 seconds?
there's a "-t" mount option to specify a timeout duration.

I'm concerned that you're making soft mounts give up too easily.
the NFS client is supposed to retry some number of times before giving up,
but all of your new timeouts give up after a single timeout interval elapses.
the manpage says:

     -s      A soft mount, which implies that file system calls will fail
             after _retrans_ round trip timeout intervals.
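
as a rough illustration only (this is not the actual NFS client code; the
struct, the callback type and the fake server below are made-up names for
the example), the behaviour the manpage describes amounts to waiting up to
"retrans" intervals of the "-t" timeout rather than a single fixed one:

```c
#include <errno.h>
#include <stdbool.h>

/* hypothetical stand-in for the relevant mount parameters */
struct soft_mount {
	int timeo_ticks;	/* per-attempt timeout, from the "-t" option */
	int retrans;		/* timeout intervals to wait before giving up */
};

/* hypothetical: wait up to 'ticks' for a reply, true if one arrived */
typedef bool (*wait_reply_fn)(void *arg, int ticks);

/* returns 0 on a reply, ETIMEDOUT after 'retrans' intervals with none */
int
soft_wait(const struct soft_mount *m, wait_reply_fn wait_reply, void *arg)
{
	for (int i = 0; i < m->retrans; i++) {
		if (wait_reply(arg, m->timeo_ticks))
			return 0;
	}
	return ETIMEDOUT;
}

/* fake server for trying the sketch: answers on attempt 'answer_on' (0 = never) */
struct fake_server {
	int attempts;
	int answer_on;
	int last_ticks;
};

bool
fake_wait(void *arg, int ticks)
{
	struct fake_server *s = arg;

	s->attempts++;
	s->last_ticks = ticks;
	return s->answer_on != 0 && s->attempts >= s->answer_on;
}
```

with retrans = 4, a server that answers on the third attempt succeeds, and
a dead server fails only after four intervals, each using the configured
timeout instead of a hardcoded 2 seconds.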

> That patch fixes force unmount of NFS soft mounts, but I still have a
> few areas of concern
> 
> 1) When mounting with -o intr, we have a situation where force unmount
> will not work: if ioflush is waiting on an unresponsive server, we
> cannot interrupt it as it is a kernel thread
> 
> A possible workaround would be to set a non null slptimeo when
> nmp->nm_flag & NFSMNT_INTR just like we do if nmp->nm_flag & NFSMNT_SOFT
> in this patch. That would ensure ioflush would not wait forever.
> Opinions?

for a hard mount, the client is supposed to wait indefinitely,
so having operations fail after a timeout interval isn't really right.

there's not really a good way to fully support "intr" mode anymore without
making a mess of UVM and genfs.  we can probably add support for
forced unmount of an NFS hard mount without too much fuss:
if the NFS retry code sees that we've started trying to do a forced umount
then it can just stop retrying and fail all the pending operations.
it looks like there would be some changes needed in sys_unmount()
to avoid hanging in namei() while finding the mount structure,
but otherwise this should be simple changes in the NFS code.
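
a minimal sketch of the retry-loop idea, under the assumption that the mount
carries some flag that is set once a forced unmount begins (the struct, the
flag name, the callback and the choice of EIO here are all hypothetical, not
existing kernel code):

```c
#include <errno.h>
#include <stdbool.h>

/* hypothetical stand-in for the per-mount state */
struct hard_mount {
	bool force_unmounting;	/* set once a forced unmount has started */
};

/* hypothetical: one attempt at the RPC, true if the server replied */
typedef bool (*try_rpc_fn)(struct hard_mount *m, void *arg);

/*
 * hard-mount semantics: retry indefinitely, except that a forced
 * unmount in progress makes the pending operation fail instead.
 */
int
hard_retry(struct hard_mount *m, try_rpc_fn try_rpc, void *arg)
{
	for (;;) {
		if (m->force_unmounting)
			return EIO;
		if (try_rpc(m, arg))
			return 0;
	}
}

/* fake transport for trying the sketch (0 in either field = never) */
struct fake_state {
	int calls;
	int succeed_on;		/* attempt number on which the server replies */
	int unmount_on;		/* attempt number during which umount -f starts */
};

bool
fake_try(struct hard_mount *m, void *arg)
{
	struct fake_state *s = arg;

	s->calls++;
	if (s->unmount_on != 0 && s->calls == s->unmount_on)
		m->force_unmounting = true;	/* simulates a concurrent umount -f */
	return s->succeed_on != 0 && s->calls >= s->succeed_on;
}
```

without a forced unmount the loop keeps the usual wait-forever behaviour;
the only new exit is the flag check at the top of each iteration.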

anyway, I don't want to derail the fixes for soft mounts with debate
about intr mounts.  let's finish fixing soft mounts first.

> 2) When umounting a soft mount, running the mount(8) command to see what
> is going on will hang until unmount completes. This is because we go
> through sys_getvfsstat/do_sys_getvfsstat/vfs_busy and wait for
> mp->mnt_unmounting. It can be long until we get a timeout. We come there
> with ST_NOWAIT, it would be nice to avoid waiting. 
> 
> This could be fixed somehow like this. It is probably broken, and fails
> to print anything about the unmounting filesystem, but perhaps we can
> improve on it?
> 
> Index: vfs_syscalls.c
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v
> retrieving revision 1.499
> diff -u -4 -r1.499 vfs_syscalls.c
> --- vfs_syscalls.c      12 Jun 2015 19:06:32 -0000      1.499
> +++ vfs_syscalls.c      28 Jun 2015 17:06:31 -0000
> @@ -1246,8 +1246,12 @@
>         maxcount = bufsize / entry_sz;
>         mutex_enter(&mountlist_lock);
>         count = 0;
>         for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) {
> +               if (flags & ST_NOWAIT && mp->mnt_iflag & IMNT_UNMOUNT) {
> +                       nmp = TAILQ_NEXT(mp, mnt_list);
> +                       continue;
> +               }
>                 if (vfs_busy(mp, &nmp)) {
>                         continue;
>                 }
>                 if (sfsp && count < maxcount) {

that seems like it's trading one annoyance for confusion and a different annoyance.
for NFS, the mountpoint would still be tied up, so an attempt to mount
the fs again on the same mount point (which would be a natural thing
if one is attempting to fix the client when the server is misbehaving)
would still hang for a long time.  that will be very confusing if
we report that nothing is mounted there.  for disk-based file systems,
the same would be true, plus mounting the same fs on a different mount point
would fail because we would still reject that until the unmount really
completes.  I don't think this is a good approach.

a better alternative would be to rearrange the getvfsstat() path to return
the cached statvfs data when the fs is unmounting, without waiting for
the mnt_unmounting lock like vfs_busy() does.  I haven't thought through
all the details of how that would work but it seems doable.
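
to make the idea concrete, here is a userland sketch with made-up types
(fake_mount, fake_statvfs and list_mounts are illustrative names, not the
kernel's structures or the real getvfsstat() code, and locking is left out
entirely): mounts that are being unmounted are still listed, but from their
cached stats and without waiting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* hypothetical, cut-down stand-in for the per-mount statistics */
struct fake_statvfs {
	char mntonname[64];
	unsigned long bfree;
};

/* hypothetical stand-in for a mount list entry */
struct fake_mount {
	bool unmounting;		/* an unmount is in progress */
	struct fake_statvfs cached;	/* stats from the last refresh */
	unsigned long live_bfree;	/* what a fresh query would report */
	struct fake_mount *next;
};

/*
 * copy up to 'max' entries into 'out'.  a mount that is unmounting is
 * reported from its cached stats instead of being skipped or waited on.
 */
size_t
list_mounts(struct fake_mount *head, struct fake_statvfs *out, size_t max)
{
	size_t n = 0;

	for (struct fake_mount *mp = head; mp != NULL && n < max; mp = mp->next) {
		if (!mp->unmounting) {
			/* normal path: this is where the fs would be
			 * busied and its stats refreshed */
			mp->cached.bfree = mp->live_bfree;
		}
		out[n++] = mp->cached;
	}
	return n;
}
```

the mount point stays visible in the listing for as long as it is really
still tied up, which avoids reporting that nothing is mounted there.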

-Chuck

Follow-Ups:
- Re: [PATCH] Fixing soft NFS umount -f, round 3
  - From: Emmanuel Dreyfus
- Re: [PATCH] Fixing soft NFS umount -f, round 3
  - From: David Holland

References:
- [PATCH] Fixing soft NFS umount -f, round 2
  - From: Emmanuel Dreyfus
- [PATCH] Fixing soft NFS umount -f, round 3
  - From: Emmanuel Dreyfus
