Subject: Re: Redoing file system suspension API (update)
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 06/18/2006 20:48:07
On Sun, Jun 18, 2006 at 10:59:33PM +0900, YAMAMOTO Takashi wrote:
> thanks for taking a look of this.
> although i have many negative comments, i think it's a step forward. :-)
>
>
> first of all, i tend to think filesystem snapshot thing should be done
> entirely in filesystem-dependent code.
Depends on what to expect from suspension. I expect a file system state
where system calls are the atomic operations.
> > - Instead of tracking possible long sleeps I put the suspend/resume into
> > ltsleep().
>
> why?
I want to suspend the gates before a thread goes to long sleep. Ltsleep
is the point where it happens.
> i don't think it's desirable for each subsystems to put their own
> random hooks in these places.
It is possible to put the suspend/resume around calls to device
functions (d_open, d_read etc) in spec_vnops, device functions (so_receive,
so_send etc) in fifo_vnops.c, around ttywait(), selcommon() and pollcommon().
That is what I did in my first proposal.
> what happens if a filesystem itself sleeps with PCATCH?
> (maybe you can call it a bug, but we currently have such a code.)
Yes, it is a bug. Which file system btw?
> besides, vngate_resume itself seems to sleep.
> is it safe to call it from ltsleep?
Yes, as long as we only sleep with timeout 0 and no PCATCH.
> > To solve the rest of 3) it adds a throttling on the first gate not involved
> > in a suspending file system.
>
> - isn't it normal that an operation become slow when the system has
> other activities?
Slow, yes. But in case of suspension the sync-to-disk becomes very slow.
Throttling other i/o reduces the time to suspension from > 5 minutes
to < 30 seconds on my test machine.
> - why you check P_SYSTEM?
I don't see the above problem (high i/o load) for any system process yet.
> > ** The new API is:
> >
> > Using explicit enter()/leave() pairs adds much complexity so I took another
> > approach. I use two types of gates. Normal gates need a "leave" operation.
> > Permanent gates are valid until the thread returns to user mode.
>
> while it can make your patch smaller, i think it's actually more complex
> and harder to understand and maintain.
Where is the complexity and maintenance? Threads either return to user through
userret() or we explicitly say vngate_leave_all(). It reduces complexity
because we dont need to track every gate. Using only normal gates lookup/namei
had to return two gates we have to take care of. Doing exec_XXX makes it even
more complex to handle. But of course it is doable.
> > void vngate_initialize(struct lwp *l)
> >
> > Initialize the per-thread state. Will be freed on vngate_leave_all(..., 1).
>
> please try to avoid putting subsystem-specific data to struct lwp.
If we use permanent gates we have per-thread state. Where should this state go
if not into struct lwp?
> > V_NOERROR Panic on error. No need for the caller to check the result.
>
> what's the point of this?
I like style where results are not silently ignored. Any usage of vngate_enter
without V_NOERROR and ignoring the result is a coding error.
> > Adding permanent gates to lookup(), (new) FILE_USE_GATED() and nfsrv_fhtovp()
> > covers most system calls. The various compat parts are not ready. A scan over
> > all "FILE_USE()" should be sufficient here. The implementation renames
> > "vfs_write_XXX" to "vfs_XXX" and adds the state pointer to "struct lwp".
> > All other changes depend on "option NEWVNGATE" to make it easy to test the
> > new API.
>
> why you put vngate_enter into FILE_USE, rather than VOPs?
I hope to understand right. You meant
FILE_USE(...);
if ((fp)->f_type == DTYPE_VNODE)
vngate_enter(...)
instead of
FILE_USE_GATED(...)
It is easier to read. We could add a flag to FILE_USE() instead.
If you meant putting the gates inside the VOP_XXX functions, this cannot work.
Some VOPs need to be called with simple locks so we cannot sleep here.
> YAMAMOTO Takashi
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)