Subject: Re: Redoing file system suspension API (update)
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 06/21/2006 14:24:53
On Wed, Jun 21, 2006 at 01:34:38PM +0200, Juergen Hannken-Illjes wrote:
> On Wed, Jun 21, 2006 at 08:02:56PM +0900, YAMAMOTO Takashi wrote:
> > > > isn't it almost the same as VOPs? (with some exceptions, of course)
> > >
> > > And how would you explain this to a programmer/user?
> > > A suspended file system is in a state where VOPs are the atomic operations.
> > > Look at the kernel source what this might mean for your application.
For the case where there's one real VOP call per syscall, there is no
difference. As noted below, the other cases would get special handling.
> > > I think it is a much cleaner way to use system calls as atomic operations.
> > >
> > > Doing it inside file systems you may also lose the "no locked vnodes" property.
> >
> > we should turn (most part of) vnode lock into filesystem internal as well. :)
> >
> > well, i think neither syscalls or individual VOPs are appropriate
> > for your purpose. what you need is the intermediate. ie. a set of VOPs.
Yeah, this is what I'm thinking we should do.
> > for example,
> >
> > int
> > vn_remove(const char *path)
> > {
> >
> >     lookup_parent(..., &dvp, ...);
> >
> >     vngate_enter(dvp->v_mount);
> >     lock(dvp);
> >     lookup_lastcomponent(dvp, &vp, ..);
> >     VOP_REMOVE(dvp, vp, ...);
> >     vngate_leave(dvp->v_mount);
> > }
>
> Why do you think "lookup_parent()" does not change file system data/metadata?
It might. If it does, then the fs has to make sure there isn't a
snapshot in progress while it's changing data.
The point is that it doesn't matter if it has to wait for a snapshot. You
could take 20 snapshots during the course of one lookup_parent() call.
Yeah, that's unlikely and a bit crazy, but snapshots there don't matter.
The important point is that a snapshot doesn't see us half-way through the
lookup_lastcomponent() call and the VOP_REMOVE().
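To make that invariant concrete, here is a minimal user-space sketch of
such a gate, assuming a pthread mutex and condition variable stand in for
whatever primitive the kernel would really use. The struct vngate type and
the vngate_init(), vngate_suspend() and vngate_resume() names are
hypothetical; only vngate_enter() and vngate_leave() appear in this thread,
and there they take a struct mount rather than a gate object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of a per-mount gate; not NetBSD code. */
struct vngate {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int  holders;    /* operations currently inside the gate */
    bool suspended;  /* true while a suspension/snapshot is active */
};

void vngate_init(struct vngate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->holders = 0;
    g->suspended = false;
}

/* Enter the gate; blocks while the file system is suspended. */
void vngate_enter(struct vngate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->suspended)
        pthread_cond_wait(&g->cv, &g->lock);
    g->holders++;
    pthread_mutex_unlock(&g->lock);
}

/* Leave the gate; wakes a waiting suspender once the last holder is gone. */
void vngate_leave(struct vngate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->holders > 0);
    if (--g->holders == 0)
        pthread_cond_broadcast(&g->cv);
    pthread_mutex_unlock(&g->lock);
}

/* Suspend: refuse new entries, then wait for current holders to drain. */
void vngate_suspend(struct vngate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->suspended)            /* one suspension at a time */
        pthread_cond_wait(&g->cv, &g->lock);
    g->suspended = true;
    while (g->holders > 0)
        pthread_cond_wait(&g->cv, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Resume: let blocked vngate_enter() callers proceed. */
void vngate_resume(struct vngate *g)
{
    pthread_mutex_lock(&g->lock);
    g->suspended = false;
    pthread_cond_broadcast(&g->cv);
    pthread_mutex_unlock(&g->lock);
}
```

In terms of the vn_remove() example, lookup_parent() would run before
vngate_enter(), so any number of suspend/resume cycles can complete while
it runs. lookup_lastcomponent() and VOP_REMOVE() sit between enter and
leave, so a suspension requested in that window waits until both are done.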
> What if we make lookup() gate-aware?
>
> - add struct mount *ni_gate, *ni_dgate to struct nameidata
> - add an option KEEPGATES to namei() so namei() either leaves
>   the gates on return or keeps them if KEEPGATES is given.
>
> and this becomes
>
> NDINIT(..., KEEPGATES, ...)
> namei(&nd);
> VOP_LEASE(...)
> ...
> VOP_REMOVE(nd.ni_dvp, nd.ni_vp, ...);
> vngate_leave(nd.ni_dvp->v_mount);
> vngate_leave(nd.ni_vp->v_mount);
I don't see what this gains us. It's more complex, and feels more awkward.
> > > > - is it true even if filesystems are backed by different disks?
> > >
> > > Yes. My test machine has root on sd0 test1..4 on sd1. It is true for
> > > the case where the load is on root and the suspension is on test1. With
> > > softdep of course. Main problem is the softdep code is not per-mount.
> > >
> > > > - why does it need the special care?
> > >
> > > It solves a real problem now that may go away with updates to the softdep code
> > > or the introduction of a real i/o scheduler.
> >
> > it isn't clear to me why the suspension on filesystem A has a priority over
> > activities on unrelated filesystem B.
>
> Try it for yourself (on one disk if you need real problems)....
Yeah, that sounds like a mess, and we should do something about it.
> > i thought
> >
> >     vngate_enter(PERMANENT)
> >     some_operations();
> >
> >     long_sleep(); /* with suspend/resume */
> >
> >     other_operations();
> >     vngate_leave_all();
> >
> > could be
> >
> >     vngate_enter()
> >     some_operations();
> >     vngate_leave()
> >
> >     long_sleep(); /* without suspend/resume */
> >
> >     vngate_enter()
> >     other_operations();
> >     vngate_leave()
>
> At least for specfs/fifofs this looks ok.
I like that too.
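The difference between the two variants can be shown with a small
single-threaded sketch. It assumes the gate is nothing more than a holder
count and that a suspension can only start when that count is zero. All
names here (struct gate, gate_enter(), gate_leave(), gate_can_suspend() and
the two variant functions) are hypothetical, and the point where
long_sleep() would happen is only marked by a check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model: the gate only counts its holders. */
struct gate {
    int holders;
};

void gate_enter(struct gate *g)
{
    g->holders++;
}

void gate_leave(struct gate *g)
{
    assert(g->holders > 0);
    g->holders--;
}

/* Assumption: a suspension may start only when nobody is inside. */
bool gate_can_suspend(const struct gate *g)
{
    return g->holders == 0;
}

/*
 * First variant: keep the gate across the sleep.
 * Returns whether a suspension could have started during the sleep.
 */
bool hold_across_sleep(struct gate *g)
{
    bool ok;

    gate_enter(g);
    /* some_operations() would run here */
    ok = gate_can_suspend(g);   /* stands in for long_sleep() */
    /* other_operations() would run here */
    gate_leave(g);
    return ok;
}

/*
 * Second variant: leave the gate before sleeping and re-enter afterwards.
 * Returns whether a suspension could have started during the sleep.
 */
bool leave_before_sleep(struct gate *g)
{
    bool ok;

    gate_enter(g);
    /* some_operations() would run here */
    gate_leave(g);

    ok = gate_can_suspend(g);   /* stands in for long_sleep() */

    gate_enter(g);
    /* other_operations() would run here */
    gate_leave(g);
    return ok;
}
```

With the first variant the sleeping thread blocks any suspension unless
the sleep itself does an explicit suspend/resume dance. With the second,
the gate is simply free for the whole sleep, at the cost that
some_operations() and other_operations() are no longer one atomic unit
with respect to a snapshot.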
Take care,
Bill