tech-kern archive


Re: Making forced unmounts work

On Tue, Dec 04, 2012 at 05:31:59PM +0100, J. Hannken-Illjes wrote:
 > >> A suspended fs has the guarantee that no other thread will be inside
 > >> fstrans_suspend / fstrans_done of any vnode operation.
 > >> 
 > >> Threads stuck permanently as in (c) are impossible to catch.
 > > 
 > > ...doesn't that mean the (c) threads are going to be holding read
 > > locks, so the suspend will hang forever?
 > Yes.
 > > We have to assume there will be at least one such thread, as people
 > > don't generally attempt umount -f unless something's wedged.
 > If forced unmounts have to work with vnode operations that will not
 > leave a fs the only solution will be to disable forced unmounts and
 > leave reboot as the last option.  How could we safely determine if
 > a vnode operation just takes some time or is stuck forever?  What should
 > we do with threads that are stuck forever?

You can't. That's why umount -f is currently unsafe. We can probably
make it safer (and, as just pointed out elsewhere, Linux-style lazy
unmounts are safe and can often be used as a substitute), but making
it fully safe will be difficult.

The basic use case for umount -f is when you have processes hanging
uninterruptibly waiting for a hard-mounted NFS server that's gone away
and won't be coming back anytime soon.

The other common case is when you accidentally ejected a removable
device without unmounting it, and didn't notice until reinserting it
was no longer an option. Usually what makes you notice is an access
to the device failing, and that often leaves a stuck process.

If there isn't a process hanging, generally you can clear away any
processes holding the volume busy and do an ordinary umount.

The third case, I guess, is when vnode refcount bugs have left you
with a "busy" fs that nothing's actually using. In this case there
won't be processes hanging; but by the time you've confirmed that,
there also won't be processes using the fs, so the current scheme is
more or less safe enough...

 > Looks like this thread is dead.  No one beside David Holland is
 > interested and David objects.  I take back my proposal.

I think fstrans may not be the mechanism we want, but I don't think we
should give up.

 > >> - Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
 > >>  as they run part or all of the operation with vnode interlock held.
 > > 
 > > Ugh. But this doesn't, for example, preclude atomically incrementing a
 > > per-cpu counter or setting a flag in the lwp structure.
 > Sure, but we need a point in time where it is safe to mark a vnode dead
 > and that means to reclaim its inode or other fs-private state.  This is
 > not just accounting -- we must be sure no thread is using fs-private data
 > and I don't see a way without suspending vnode operations.

The way to do this is to divide the "dead" operation into two parts:
first change the ops vector (or set a dead flag) so no new threads can
enter, then when the threads already in the fs are gone or otherwise
neutralized, delete the private data.

Or we could just leave the private data. One of the advantages of
setting a dead flag (and testing it in vnode_if.c) instead of changing
the ops vector is that the vnode doesn't lose its attachment to the fs
and can be cleaned up properly later if it ever gets released.

As for "neutralized", one possible scheme is to mark the vnodes dead,
wait for a few seconds, and then do something evil to the blocked
threads still using the fs; that requires knowing which threads they
are, which is the hard part. Any simple scheme will end up being a
hotspot for cache contention.

See the "kicking everybody out of the softc" thread from a couple
years ago for one possible approach to that.

 > >> - Accessing devices and fifos through a file system cannot be tracked
 > >>  at the vnode_if.c level.  Take ufs_spec_read() for example:
 > >> 
 > >>   fstrans_start(...);
 > >>   VTOI(vp)->i_flag |= IN_ACCESS;
 > >>   fstrans_done(...);
 > >>   return VOCALL(spec_vnodeop_p, ...)
 > >> 
 > >>  Here the VOCALL may sleep forever if the device has no data.
 > > 
 > > This I don't understand. To the extent it's "in" the fs while it's
 > > doing this, vnode_if.c can know that, because it called ufs_spec_read
 > > and ufs_spec_read hasn't returned yet. To the extent that it's in
 > > specfs, specfs won't itself ever be umount -f'd, since it isn't
 > > mounted, so there's no danger.
 > Ok, we had to use VOCALL rather than vnode_if.c and add the accounting
 > here.

No, I don't see why. vnode_if.c calls ufs_spec_read(); until
ufs_spec_read() returns, as far as tracking done in vnode_if.c is
concerned, the thread doing it is still in the fs.

 > > If umount -f currently blats over the data that specfs uses, then it's
 > > currently not safe, but I don't see how it's different from any other
 > > similar case with ufs data.
 > > 
 > > I expect I'm missing something.
 > So another approach is to serialize vclean(), at least when cleaning
 > active vnodes, and have VOCALL() take care of the vnode currently being
 > cleaned.  We would need just per-thread state then.  This would still
 > not work for vnode operations stuck in the fs.
 > Is this the direction you have in mind?

No, because I don't understand. :(

David A. Holland
