Subject: Re: Redoing file system suspension API (update)
To: None <hannken@eis.cs.tu-bs.de>
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
List: tech-kern
Date: 06/21/2006 20:02:56
> > > > first of all, i tend to think filesystem snapshot thing should be done
> > > > entirely in filesystem-dependent code.
> > >
> > > Depends on what to expect from suspension. I expect a file system state
> > > where system calls are the atomic operations.
> >
> > isn't it almost the same as VOPs? (with some exceptions, of course)
>
> And how would you explain this to a programmer/user?
> A suspended file system is in a state where VOPs are the atomic operations.
> Look at the kernel source what this might mean for your application.
>
> I think it is a much cleaner way to use system calls as atomic operations.
>
> Doing it inside file systems you may also lose the "no locked vnodes" property.

we should make (most of) the vnode locking filesystem-internal as well. :)

well, i think neither syscalls nor individual VOPs are appropriate
for your purpose. what you need is something in between, i.e. a set of VOPs.
for example,

	int
	vn_remove(const char *path)
	{
		int error;

		lookup_parent(..., &dvp, ...);
		vngate_enter(dvp->v_mount);	/* gate covers the whole set of VOPs */
		lock(dvp);
		lookup_lastcomponent(dvp, &vp, ..);
		error = VOP_REMOVE(dvp, vp, ...);
		vngate_leave(dvp->v_mount);
		return error;
	}

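To make the enter/leave bracket above concrete, here is a minimal userland sketch of such a gate, built on pthreads. It is only an illustration under stated assumptions: `struct gate`, `gate_enter`, `gate_leave`, `gate_suspend` and `gate_resume` are hypothetical stand-ins for the vngate_enter/vngate_leave calls discussed in this thread, they take a gate pointer rather than a mount, and none of this is the actual kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-mount gate; a userland model, not the NetBSD code. */
struct gate {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	unsigned int active;	/* operations currently inside the gate */
	bool suspending;	/* a suspension is requested or in effect */
};

#define GATE_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Enter the gate; blocks while the file system is (being) suspended. */
static void
gate_enter(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->suspending)
		pthread_cond_wait(&g->cv, &g->lock);
	g->active++;
	pthread_mutex_unlock(&g->lock);
}

/* Leave the gate; the last one out wakes a waiting suspender. */
static void
gate_leave(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->active > 0);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->lock);
}

/* Stop new entries, then wait until every operation has left. */
static void
gate_suspend(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->suspending)		/* one suspender at a time */
		pthread_cond_wait(&g->cv, &g->lock);
	g->suspending = true;
	while (g->active > 0)
		pthread_cond_wait(&g->cv, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

/* Reopen the gate and wake everyone blocked in gate_enter(). */
static void
gate_resume(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->suspending = false;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->lock);
}
```

In this model the atomic unit seen by a suspension is whatever sits between one gate_enter() and the matching gate_leave(), which is the "set of VOPs" granularity argued for above.
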
> > > > i don't think it's desirable for each subsystems to put their own
> > > > random hooks in these places.
> > >
> > > It is possible to put the suspend/resume around calls to device
> > > functions (d_open, d_read etc) in spec_vnops, device functions (so_receive,
> > > so_send etc) in fifo_vnops.c, around ttywait(), selcommon() and pollcommon().
> > > That is what I did in my first proposal.
> >
> > i don't think this suspend/resume is a good idea at all.
>
> We will need it for a file system external implementation. We cannot ignore
> gating for VCHR/VBLK vnodes as they may change meta data. ffs_specop already
> does this. And they might go to long sleep holding a suspension for possibly
> infinite time.

i think you can call vngate_leave in e.g. ufsspec_read.
yes, in this case the caller needs to ensure that it "holds"
exactly one vngate_enter. i don't think that's so bad.

> > > > > To solve the rest of 3) it adds a throttling on the first gate not involved
> > > > > in a suspending file system.
> > > >
> > > > - isn't it normal that an operation become slow when the system has
> > > > other activities?
> > >
> > > Slow, yes. But in case of suspension the sync-to-disk becomes very slow.
> > > Throttling other i/o reduces the time to suspension from > 5 minutes
> > > to < 30 seconds on my test machine.
> >
> > - is it true even if filesystems are backed by different disks?
>
> Yes. My test machine has root on sd0 test1..4 on sd1. It is true for
> the case where the load is on root and the suspension is on test1. With
> softdep of course. Main problem is the softdep code is not per-mount.
>
> > - why does it need the special care?
>
> It solves a real problem now that may go away with updates to the softdep code
> or the introduction of a real i/o scheduler.

it isn't clear to me why a suspension of filesystem A should have priority
over activity on an unrelated filesystem B.

> > > > please try to avoid putting subsystem-specific data to struct lwp.
> > >
> > > If we use permanent gates we have per-thread state. Where should this state go
> > > if not into struct lwp?
> >
> > i meant permanent gate is a bad idea.
>
> Non-permanent gates have the same problem. We must take care of long sleeps.

can you explain?
i thought

	vngate_enter(PERMANENT);
	some_operations();
	long_sleep(); /* with suspend/resume */
	other_operations();
	vngate_leave_all();

could be

	vngate_enter();
	some_operations();
	vngate_leave();
	long_sleep(); /* without suspend/resume */
	vngate_enter();
	other_operations();
	vngate_leave();

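The difference between the two shapes can be checked with a toy single-threaded model. This is a sketch only: `gate_holders`, `gate_enter`, `gate_leave`, `can_suspend`, `op_split` and `op_held` are hypothetical names, the gate is reduced to a plain counter, and long_sleep() merely records whether a suspension could proceed while the caller is asleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: number of operations currently inside the gate. */
static int gate_holders;

static void gate_enter(void) { gate_holders++; }
static void gate_leave(void) { assert(gate_holders > 0); gate_holders--; }

/* A suspension can only complete once nobody is inside the gate. */
static bool can_suspend(void) { return gate_holders == 0; }

/* Records whether a suspender could make progress during the sleep. */
static bool suspendable_during_sleep;
static void long_sleep(void) { suspendable_during_sleep = can_suspend(); }

/* Non-permanent shape: the gate is dropped around the long sleep. */
static void
op_split(void)
{
	gate_enter();
	/* some_operations() would run here */
	gate_leave();
	long_sleep();		/* no suspend/resume hook needed */
	gate_enter();
	/* other_operations() would run here */
	gate_leave();
}

/* Gate held across the sleep: a suspension has to wait for the sleeper. */
static void
op_held(void)
{
	gate_enter();
	long_sleep();
	gate_leave();
}
```

With the split shape the sleeper holds nothing, so it needs no per-thread gate state and no special suspend/resume call around the sleep.
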
YAMAMOTO Takashi