tech-kern archive


Re: Making forced unmounts work



On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote:
 > >> In short the attached diff:
 > >> 
 > >> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
 > >>  restart a vnode operation once it returns ERESTARTVOP.
 > >> 
 > >> - Changes fstrans_start() to take an optional `hint vnode' and return
 > >>  ERESTARTVOP if the vnode becomes dead.
 > > 
 > > Is there any major reason we can't just use ERESTART and rerun the
 > > whole syscall?
 > 
 > Not all vnode operations come from a syscall, and to me it looks
 > cleaner to use one private errno for exactly this purpose.
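
For concreteness, a minimal sketch of the restart loop being proposed.
The vnode layout, the operation-table indexing, and the value of
ERESTARTVOP below are simplified stand-ins inferred from the summary
above, not taken from the actual patch:

	#define ERESTARTVOP	(-3)	/* hypothetical value; kernel-internal only */

	typedef int (*vop_fn)(void *);

	struct vnode {
		vop_fn *v_op;		/* per-fs operations table */
	};

	/*
	 * Retry the operation as long as it reports ERESTARTVOP.  If the
	 * vnode was revoked in the meantime, v_op now points at deadfs's
	 * table, so the retry is serviced by the dead_*() operations.
	 */
	static inline int
	vcall_sketch(struct vnode *vp, int opoff, void *ap)
	{
		int error;

		do {
			error = (*vp->v_op[opoff])(ap);
		} while (error == ERESTARTVOP);

		return error;
	}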

Could be. All those places are supposed to be capable of coping with
ERESTART, though (otherwise they break when it happens), so it shouldn't
make much difference. And if it does make a difference somewhere, that
should be fixed: regardless of ERESTART for signals, we want FS
operations to be able to bail out and restart on transaction abort.
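
As a sketch of the other half of the proposal, a file system operation
would call the extended fstrans_start() with the vnode as hint and hand
ERESTARTVOP back up to VCALL().  The prototype and the FSTRANS_SHARED
stand-in below follow the summary above; they are assumptions, not the
current kernel API:

	struct mount;
	struct vnode { struct mount *v_mount; };

	#define FSTRANS_SHARED	1	/* stand-in for the real constant */
	#define ERESTARTVOP	(-3)	/* hypothetical value */

	/* Hypothetical prototypes matching the proposal. */
	int	fstrans_start(struct mount *, int, struct vnode *);
	void	fstrans_done(struct mount *);

	int
	ufs_read_sketch(struct vnode *vp, void *ap)
	{
		int error;

		/* Returns ERESTARTVOP if vp has become a dead vnode. */
		error = fstrans_start(vp->v_mount, FSTRANS_SHARED, vp);
		if (error != 0)
			return error;	/* VCALL() restarts into deadfs */

		/* ... the actual read would go here ... */

		fstrans_done(vp->v_mount);
		return 0;
	}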

 > > I see there are two references to ERESTARTVOP in genfs_io.c, and I
 > > don't see what they're for without digging deeper, but given that they
 > > appear to make locking behavior depend on the error condition maybe it
 > > would be better not to do that too. :-/
 > 
 > This is the wonderful world of VOP_GETPAGES() and VOP_PUTPAGES().  Both
 > are called with the vnode interlock held, and by the time it is needed
 > and possible to check the vnode, the interlock has already been released.
 > When these operations return ERESTARTVOP we have to take the interlock
 > again, because dead_getpages() and dead_putpages() need it held on entry
 > (just to release it).
 > 
 > It would be possible to return the error directly from genfs_XXXpages(),
 > though.  To me it looks clearer to always go the ERESTARTVOP route.
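
A user-space model of the locking pattern being described, with a
pthread mutex standing in for the vnode interlock; all names here are
illustrative only:

	#include <pthread.h>
	#include <stdbool.h>

	#define ERESTARTVOP	(-3)	/* hypothetical value */

	struct vnode {
		pthread_mutex_t	v_interlock;
		bool		v_dead;	/* "revoked while we worked" */
	};

	/*
	 * Entered with v_interlock held, like VOP_GETPAGES()/VOP_PUTPAGES().
	 * The interlock is dropped for the real work; if the vnode has gone
	 * dead in the meantime it must be re-taken before returning
	 * ERESTARTVOP, because dead_getpages()/dead_putpages() expect it
	 * held on entry (just to release it).
	 */
	int
	getpages_model(struct vnode *vp)
	{
		pthread_mutex_unlock(&vp->v_interlock);

		/* ... page I/O happens here, unlocked ... */

		if (vp->v_dead) {
			pthread_mutex_lock(&vp->v_interlock);
			return ERESTARTVOP;
		}
		return 0;
	}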

Ugh.

I don't like having the locking behavior be different for different
error cases; it's asking for trouble in the long run. I think this
ends up being a reason to use ERESTART instead.

 > > Also I wonder if there's any way to accomplish this that doesn't
 > > require adding fstrans calls to every operation in every fs.
 > 
 > Not in a clean way.  We would need some kind of reference counting for
 > vnode operations, and that is quite impossible, as vnode operations on
 > devices or fifos sometimes wait forever and are called from other file
 > systems, via ufsspec_read() for example.  How could we protect UFS's
 > access-time updates here?
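
The concern, roughly sketched (modeled loosely on ufsspec_read(); the
names and details below are illustrative, not the real NetBSD code):
the UFS wrapper records a pending access-time update and then forwards
to the device/fifo layer, which may sleep indefinitely, so a
per-operation reference or fstrans bracket around the whole call could
be pinned forever:

	struct vnode;			/* opaque here */
	struct inode { int i_flag; };
	#define IN_ACCESS	0x0001

	struct inode *VTOI(struct vnode *);	/* hypothetical accessor */
	int spec_read(void *);		/* may block forever on a fifo */

	int
	ufsspec_read_sketch(struct vnode *vp, void *ap)
	{
		VTOI(vp)->i_flag |= IN_ACCESS;	/* UFS atime update pending */
		return spec_read(ap);		/* unbounded wait possible */
	}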

I'm not entirely convinced of that. There are basically three
problems: (a) new incoming threads, (b) threads that are already in
the fs and running, and (c) threads that are already in the fs and
that are stuck more or less permanently because something broke.

Admittedly I don't really understand how fstrans suspending works.
Does it keep track of all the threads that are in the fs, so that the
(b) ones can be interrupted somehow, or so that we can at least wait
until all of them either leave the fs or enter fstrans somewhere and
stall?

If we're going to track that information we should really do it from
vnode_if.c, both to avoid having to modify every fs and to make sure
all fses support it correctly. (We also need to be careful about how
it's done to avoid causing massive lock contention; that's why such
logic doesn't already exist.)
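
One hypothetical shape for such central tracking, sketched with a
single counter per mount for brevity; a real version would need
per-CPU counters or similar precisely because of the contention worry
just mentioned:

	#include <stdatomic.h>

	struct mount_sketch {
		atomic_int	mnt_inflight;	/* threads inside the fs */
		atomic_bool	mnt_suspending;	/* new entries must stall */
	};

	/* Bracket every operation dispatched through vnode_if.c. */
	static int
	vop_enter(struct mount_sketch *mp)
	{
		if (atomic_load(&mp->mnt_suspending))
			return -1;		/* caller stalls or restarts */
		atomic_fetch_add(&mp->mnt_inflight, 1);
		/* re-check to close the race with a concurrent suspend */
		if (atomic_load(&mp->mnt_suspending)) {
			atomic_fetch_sub(&mp->mnt_inflight, 1);
			return -1;
		}
		return 0;
	}

	static void
	vop_exit(struct mount_sketch *mp)
	{
		atomic_fetch_sub(&mp->mnt_inflight, 1);
	}

	/* A suspend would set mnt_suspending and then wait for
	 * mnt_inflight to drain to zero before proceeding. */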

If, however, fstrans isn't tracking that information, I don't see how
suspending the fs helps deal with the (b) threads, because if they're
currently running they can continue to chew on fs-specific data for
arbitrarily long before they run into anything that stalls them, and
there's no way to know when they're done or how many of them there
are.

I don't really see what the issue with ufsspec_read() is, however, so
we may be talking past each other.

-- 
David A. Holland
dholland@netbsd.org

