Re: Making forced unmounts work

To: David Holland <dholland-tech%netbsd.org@localhost>
Subject: Re: Making forced unmounts work
From: J. Hannken-Illjes <hannken%eis.cs.tu-bs.de@localhost>
Date: Sun, 2 Dec 2012 17:29:01 +0100

On Dec 2, 2012, at 5:00 AM, David Holland <dholland-tech%netbsd.org@localhost> 
wrote:

> On Thu, Nov 29, 2012 at 06:19:37PM +0100, J. Hannken-Illjes wrote:
>>>> In short the attached diff:
>>>> 
>>>> - Adds a new kernel-internal errno ERESTARTVOP and changes VCALL() to
>>>> restart a vnode operation once it returns ERESTARTVOP.
>>>> 
>>>> - Changes fstrans_start() to take an optional `hint vnode' and return
>>>> ERESTARTVOP if the vnode becomes dead.
>>> 
>>> Is there any major reason we can't just use ERESTART and rerun the
>>> whole syscall?
>> 
>> Not all vnode operations come from a syscall and to me it looks cleaner
>> to use one private errno for exactly this purpose.
> 
> Could be. All those places are supposed to be capable of coping with
> ERESTART though (otherwise, they break if it happens) so it shouldn't
> make much difference. And if it does make a difference somewhere, that
> should be fixed... regardless of ERESTART for signals, we want FS
> operations to be able to bail and restart for transaction abort.

I'm convinced -- having fstrans_start() return ERESTART is the way to go.

<snip>

>>> Also I wonder if there's any way to accomplish this that doesn't
>>> require adding fstrans calls to every operation in every fs.
>> 
>> Not in a clean way. We would need some kind of reference counting for
>> vnode operations and that is quite impossible as vnode operations on
>> devices or fifos sometimes wait forever and are called from other fs
>> like ufsspec_read() for example.  How could we protect UFS updating
>> access times here?
> 
> I'm not entirely convinced of that. There are basically three
> problems: (a) new incoming threads, (b) threads that are already in
> the fs and running, and (c) threads that are already in the fs and
> that are stuck more or less permanently because something broke.
> 
> Admittedly I don't really understand how fstrans suspending works.
> Does it keep track of all the threads that are in the fs, so the (b)
> ones can be interrupted somehow, or so we at least can wait until all
> of them either leave the fs or enter fstrans somewhere and stall?

Fstrans is a recursive rw-lock. Vnode operations take the reader lock
on entry.  To suspend a fs we take the writer lock and therefore
it will catch both (a) and (b).  New threads will block on first entry,
threads already inside the fs will continue until they leave the fs
and block on next entry.

A suspended fs has the guarantee that no other thread will be inside
fstrans_suspend / fstrans_done of any vnode operation.

Threads stuck permanently as in (c) are impossible to catch.
 
> If we're going to track that information we should really do it from
> vnode_if.c, both to avoid having to modify every fs and to make sure
> all fses support it correctly. (We also need to be careful about how
> it's done to avoid causing massive lock contention; that's why such
> logic doesn't already exist.)

Some cases are nearly impossible to track at the vnode_if.c level:

- Both VOP_GETPAGES() and VOP_PUTPAGES() cannot block or sleep here
  as they run part or all of the operation with vnode interlock held.

- Accessing devices and fifos through a file system cannot be tracked
  at the vnode_if.c level.  Take ufs_spec_read() for example:

        fstrans_start(...);
        VTOI(vp)->i_flag |= IN_ACCESS;
        fstrans_done(...);
        return VOCALL(spec_vnodeop_p, ...)

  Here the VOCALL may sleep forever if the device has no data.

> If, however, fstrans isn't tracking that information, I don't see how
> suspending the fs helps deal with the (b) threads, because if they're
> currently running they can continue to chew on fs-specific data for
> arbitrarily long before they run into anything that stalls them, and
> there's no way to know when they're done or how many of them there
> are.
> 
> I don't really see what the issue with ufsspec_read() is, however, so
> we may be talking past each other.


--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig 
(Germany)

Follow-Ups:
- Re: Making forced unmounts work
  - From: David Holland

References:
- Making forced unmounts work
  - From: J. Hannken-Illjes
- Re: Making forced unmounts work
  - From: David Holland
- Re: Making forced unmounts work
  - From: J. Hannken-Illjes
- Re: Making forced unmounts work
  - From: David Holland

Prev by Date: Disc block size (was: Problem identified: WAPL/RAIDframe performance problems)
Next by Date: Re: Broadcast traffic on vlans leaks into the parent interface on NetBSD-5.1
Previous by Thread: Re: Making forced unmounts work
Next by Thread: Re: Making forced unmounts work
Indexes:

Home | Main Index | Thread Index | Old Index