Subject: Re: Redoing file system suspension API (update)
To: YAMAMOTO Takashi <firstname.lastname@example.org>
From: Juergen Hannken-Illjes <email@example.com>
Date: 06/21/2006 13:34:38
On Wed, Jun 21, 2006 at 08:02:56PM +0900, YAMAMOTO Takashi wrote:
> > > > > first of all, i tend to think filesystem snapshot thing should be done
> > > > > entirely in filesystem-dependent code.
> > > >
> > > > Depends on what to expect from suspension. I expect a file system state
> > > > where system calls are the atomic operations.
> > >
> > > isn't it almost the same as VOPs? (with some exceptions, of course)
> > And how would you explain this to a programmer/user?
> > "A suspended file system is in a state where VOPs are the atomic operations.
> > Look at the kernel source to see what this might mean for your application."
> > I think it is much cleaner to use system calls as the atomic operations.
> > Doing it inside the file systems, you may also lose the "no locked vnodes" property.
> we should make (most of) the vnode lock internal to the filesystem as well. :)
> well, i think neither syscalls nor individual VOPs are appropriate
> for your purpose. what you need is something in between, i.e. a set of VOPs.
> for example,
> vn_remove(const char *path)
> 	lookup_parent(..., &dvp, ...);
> 	lookup_lastcomponent(dvp, &vp, ..);
> 	VOP_REMOVE(dvp, vp, ...);
Why do you think "lookup_parent()" does not change file system data/metadata?
What if we make lookup() gate-aware?
- add struct mount *ni_gate, *ni_dgate to struct nameidata
- add an option KEEPGATES to namei(), so that namei() either leaves
  the gates before returning or keeps them held if KEEPGATES is given.
and this becomes
NDINIT(..., KEEPGATES, ...)
namei(&nd);
VOP_REMOVE(nd.ni_dvp, nd.ni_vp, ...);
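To make the KEEPGATES idea concrete, here is a minimal userland sketch of the gate bookkeeping it implies. ni_gate, ni_dgate and KEEPGATES come from the proposal above; struct mount and struct vnode are reduced stand-ins, vngate_enter()/vngate_leave() are plain counters that never block (real gates would sleep while the file system is suspending), and namei_sim() and namei_leavegates() are hypothetical names, not NetBSD functions.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (hypothetical). */
struct mount {
	int gate_count;		/* LWPs currently inside this mount's gate */
};

struct vnode {
	struct mount *v_mount;
};

#define KEEPGATES 0x1		/* namei() flag from the proposal */

struct nameidata {
	struct vnode *ni_dvp;	/* parent directory */
	struct vnode *ni_vp;	/* last component */
	struct mount *ni_dgate;	/* gate held for ni_dvp, or NULL */
	struct mount *ni_gate;	/* gate held for ni_vp, or NULL */
	int ni_flags;
};

/* Counters only: the real gates would block during a suspension. */
static void vngate_enter(struct mount *mp) { mp->gate_count++; }
static void vngate_leave(struct mount *mp) { assert(mp->gate_count > 0); mp->gate_count--; }

/* Release whatever gates the lookup left held. */
static void
namei_leavegates(struct nameidata *ndp)
{
	if (ndp->ni_gate != NULL)
		vngate_leave(ndp->ni_gate);
	if (ndp->ni_dgate != NULL)
		vngate_leave(ndp->ni_dgate);
	ndp->ni_gate = ndp->ni_dgate = NULL;
}

/*
 * Simulated lookup: enter the gates of the parent and the target
 * (only once if both live on the same mount), then either leave them
 * before returning or keep them held when KEEPGATES is set.
 */
static int
namei_sim(struct nameidata *ndp, struct vnode *dvp, struct vnode *vp)
{
	ndp->ni_dvp = dvp;
	ndp->ni_vp = vp;
	ndp->ni_dgate = dvp->v_mount;
	vngate_enter(ndp->ni_dgate);
	if (vp->v_mount != dvp->v_mount) {
		ndp->ni_gate = vp->v_mount;
		vngate_enter(ndp->ni_gate);
	} else {
		ndp->ni_gate = NULL;
	}
	if ((ndp->ni_flags & KEEPGATES) == 0)
		namei_leavegates(ndp);
	return 0;
}
```

With KEEPGATES set, VOP_REMOVE(nd.ni_dvp, nd.ni_vp, ...) would run between namei_sim() and namei_leavegates(), so the lookup and the removal sit inside one gated section.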
> > > > > i don't think it's desirable for each subsystem to put their own
> > > > > random hooks in these places.
> > > >
> > > > It is possible to put the suspend/resume around calls to device
> > > > functions (d_open, d_read etc.) in spec_vnops.c, socket functions (so_receive,
> > > > so_send etc.) in fifo_vnops.c, and around ttywait(), selcommon() and pollcommon().
> > > > That is what I did in my first proposal.
> > >
> > > i don't think this suspend/resume is a good idea at all.
> > We will need it for an implementation outside the file systems. We cannot ignore
> > gating for VCHR/VBLK vnodes, as they may change metadata; ffs_specop already
> > does this. And they might go into a long sleep while holding a suspension off
> > for a possibly infinite time.
> i think you can call vngate_leave in e.g. ufsspec_read.
> yes, in this case, the caller needs to ensure that it "holds"
> exactly one vngate_enter. i don't think it's so bad.
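A sketch of the ufsspec_read() suggestion, assuming the gate depth is tracked per LWP. The l_gatedepth field, ufsspec_read_sim(), sample_d_read() and the function-pointer stand-in for d_read are hypothetical; only vngate_enter/vngate_leave and the "exactly one entry" rule come from the thread. The point is the ordering: check that exactly one entry is held, leave the gate, do the possibly unbounded device read, then re-enter.

```c
#include <assert.h>

struct mount {
	int gate_count;		/* LWPs currently inside this mount's gate */
};

/* Hypothetical per-LWP record of how many gate entries this thread holds. */
struct lwp {
	int l_gatedepth;
};

static void
vngate_enter(struct lwp *l, struct mount *mp)
{
	mp->gate_count++;
	l->l_gatedepth++;
}

static void
vngate_leave(struct lwp *l, struct mount *mp)
{
	assert(mp->gate_count > 0 && l->l_gatedepth > 0);
	mp->gate_count--;
	l->l_gatedepth--;
}

/* Stand-in for d_read(): may sleep indefinitely on a tty or tape. */
typedef int (*dev_read_fn)(struct mount *mp);

/* Example device read that records the gate state it observed. */
static int observed_gate_count = -1;

static int
sample_d_read(struct mount *mp)
{
	observed_gate_count = mp->gate_count;
	return 0;
}

/*
 * The caller must hold exactly one gate entry. It is dropped for the
 * duration of the device read, so a sleeping reader cannot hold off a
 * suspension, and re-entered before returning to the caller.
 */
static int
ufsspec_read_sim(struct lwp *l, struct mount *mp, dev_read_fn d_read)
{
	int error;

	assert(l->l_gatedepth == 1);	/* "holds" exactly one vngate_enter */
	vngate_leave(l, mp);
	error = d_read(mp);		/* the long sleep happens outside the gate */
	vngate_enter(l, mp);
	return error;
}
```

In a real kernel the re-entry could block if a suspension started while the LWP was asleep in the driver; this sketch does not model that.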
> > > > > > To solve the rest of 3), it adds throttling at the first gate not involved
> > > > > > in a suspending file system.
> > > > >
> > > > > - isn't it normal that an operation becomes slow when the system has
> > > > > other activities?
> > > >
> > > > Slow, yes. But in the case of suspension, the sync-to-disk becomes very slow.
> > > > Throttling other i/o reduces the time to suspension from > 5 minutes
> > > > to < 30 seconds on my test machine.
> > >
> > > - is it true even if filesystems are backed by different disks?
> > Yes. My test machine has root on sd0 and test1..4 on sd1. It is true for
> > the case where the load is on root and the suspension is on test1, with
> > softdep of course. The main problem is that the softdep code is not per-mount.
> > > - why does it need the special care?
> > It solves a real problem that exists now and may go away with updates to the
> > softdep code or the introduction of a real i/o scheduler.
> it isn't clear to me why the suspension on filesystem A has priority over
> activities on unrelated filesystem B.
Try it for yourself (on a single disk if you want real problems)...
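One possible reading of the throttling rule, as a sketch: the delay is applied only when an LWP enters its first gate (gate depth 0) on a mount that is not itself suspending, while some other mount is suspending. THROTTLE_MS, the fixed mount table, any_other_suspending() and gate_enter_throttle() are all hypothetical, and the interpretation of "first gate" is an assumption. The function returns a delay rather than sleeping so the policy is easy to check.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MOUNTS 4
#define THROTTLE_MS 10		/* hypothetical delay per throttled entry */

struct mount {
	bool suspending;	/* a suspension is in progress on this mount */
};

static struct mount mounts[MAX_MOUNTS];

/* True if any mount other than self is currently suspending. */
static bool
any_other_suspending(const struct mount *self)
{
	for (int i = 0; i < MAX_MOUNTS; i++)
		if (&mounts[i] != self && mounts[i].suspending)
			return true;
	return false;
}

/*
 * Throttle policy at gate entry: an LWP entering its first gate on a
 * mount that is not suspending is delayed while another mount is
 * suspending, so its I/O competes less with that mount's sync-to-disk.
 * Returns the delay in milliseconds the caller should sleep.
 */
static int
gate_enter_throttle(const struct mount *mp, int lwp_gatedepth)
{
	if (lwp_gatedepth > 0)
		return 0;	/* already inside a gate: never throttle nested entries */
	if (mp->suspending)
		return 0;	/* the suspending mount's own gate does the blocking */
	return any_other_suspending(mp) ? THROTTLE_MS : 0;
}
```

Nested entries are never throttled because an LWP already inside some gate could be inside the suspending mount's gate, and delaying it there would hold up the very suspension the throttle is meant to speed up.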
> > > > > please try to avoid putting subsystem-specific data into struct lwp.
> > > >
> > > > If we use permanent gates we have per-thread state. Where should this state go
> > > > if not into struct lwp?
> > >
> > > i meant that permanent gates are a bad idea.
> > Non-permanent gates have the same problem. We must take care of long sleeps.
> can you explain?
> i thought
> long_sleep(); /* with suspend/resume */
> could be
> long_sleep(); /* without suspend/resume */
At least for specfs/fifofs this looks ok.
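The point about long_sleep() with and without suspend/resume can be illustrated with a non-permanent gate that covers only the short metadata update of a fifo read, so the wait for data happens outside any gate and needs no suspend/resume around it. fifo_read_sim(), long_sleep(), gate_count_at_sleep and the atime counter are hypothetical stand-ins; in the kernel the sleep would be a tsleep()-style wait.

```c
#include <assert.h>

struct mount {
	int gate_count;		/* LWPs currently inside this mount's gate */
};

static void vngate_enter(struct mount *mp) { mp->gate_count++; }
static void vngate_leave(struct mount *mp) { assert(mp->gate_count > 0); mp->gate_count--; }

/* Records the gate count seen at the moment the LWP would block. */
static int gate_count_at_sleep = -1;

static void
long_sleep(struct mount *mp)
{
	gate_count_at_sleep = mp->gate_count;	/* kernel: wait for fifo data */
}

static int atime_updates;

/*
 * Non-permanent gate: only the short metadata update is gated, so the
 * potentially unbounded sleep runs with no gate held at all.
 */
static int
fifo_read_sim(struct mount *mp)
{
	long_sleep(mp);			/* wait for data, outside any gate */
	vngate_enter(mp);
	atime_updates++;		/* e.g. mark the access time */
	vngate_leave(mp);
	return 0;
}
```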
> YAMAMOTO Takashi
Juergen Hannken-Illjes - firstname.lastname@example.org - TU Braunschweig (Germany)