Subject: Re: Redoing file system suspension API (update)
To: Bill Studenmund <wrstuden@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 06/22/2006 17:55:52
On Wed, Jun 21, 2006 at 02:24:53PM -0700, Bill Studenmund wrote:
> On Wed, Jun 21, 2006 at 01:34:38PM +0200, Juergen Hannken-Illjes wrote:
> > On Wed, Jun 21, 2006 at 08:02:56PM +0900, YAMAMOTO Takashi wrote:
[snip]
> > > well, i think neither syscalls or individual VOPs are appropriate
> > > for your purpose.  what you need is the intermediate.  ie. a set of VOPs.
> 
> Yeah, this is what I'm thinking we should do.

What if we add "critical regions" to vnodes?  This could look like

    vn_hold(vp, V_WAIT);
    VOP_XXX(...);
    VOP_XXX(...);
    ...
    vn_release(vp);

or

    NDINIT(..., ... | HOLDLEAF, ...);
    if (namei(...) == 0) {
        VOP_XXX(nd->ni_vp, ...);
        VOP_XXX(nd->ni_vp, ...);
        ...
        vn_release(nd->ni_vp);
    }

vn_hold() would wait until any suspension is over and the vnode is no
longer held, then set a VHOLD flag on the vnode and increment a hold
counter in vp->v_mount.

vfs_suspend() would block further vn_hold() calls, wait until the hold
counter drops to zero, suspend the file system, and then allow vn_hold()
again.
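
To make the intended semantics concrete, here is a rough user-space model
using pthreads. It is only a sketch: the *_model names, the struct layouts
and the split into a separate vfs_resume_model() are assumptions made for
illustration, not existing kernel interfaces, and the V_WAIT argument is
left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for struct mount and struct vnode. */
struct mount_model {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int  hold_count;   /* number of vnodes currently held on this mount */
    bool suspending;   /* true while a suspension is draining or active */
};

struct vnode_model {
    struct mount_model *v_mount;
    bool held;         /* plays the role of the proposed VHOLD flag */
};

void
mount_model_init(struct mount_model *mp)
{
    pthread_mutex_init(&mp->lock, NULL);
    pthread_cond_init(&mp->cv, NULL);
    mp->hold_count = 0;
    mp->suspending = false;
}

/* Wait until no suspension is in progress and the vnode is not held. */
void
vn_hold_model(struct vnode_model *vp)
{
    struct mount_model *mp = vp->v_mount;

    pthread_mutex_lock(&mp->lock);
    while (mp->suspending || vp->held)
        pthread_cond_wait(&mp->cv, &mp->lock);
    vp->held = true;
    mp->hold_count++;
    pthread_mutex_unlock(&mp->lock);
}

/* Clear the hold and wake anyone waiting on the counter or the vnode. */
void
vn_release_model(struct vnode_model *vp)
{
    struct mount_model *mp = vp->v_mount;

    pthread_mutex_lock(&mp->lock);
    assert(vp->held && mp->hold_count > 0);
    vp->held = false;
    mp->hold_count--;
    pthread_cond_broadcast(&mp->cv);
    pthread_mutex_unlock(&mp->lock);
}

/* Stop new holds, then wait for the existing ones to drain. */
void
vfs_suspend_model(struct mount_model *mp)
{
    pthread_mutex_lock(&mp->lock);
    while (mp->suspending)          /* one suspension at a time */
        pthread_cond_wait(&mp->cv, &mp->lock);
    mp->suspending = true;
    while (mp->hold_count > 0)
        pthread_cond_wait(&mp->cv, &mp->lock);
    pthread_mutex_unlock(&mp->lock);
}

/* Allow vn_hold_model() again once the suspended work is done. */
void
vfs_resume_model(struct mount_model *mp)
{
    pthread_mutex_lock(&mp->lock);
    mp->suspending = false;
    pthread_cond_broadcast(&mp->cv);
    pthread_mutex_unlock(&mp->lock);
}
```

In this model the snapshot work would run between vfs_suspend_model() and
vfs_resume_model(); in the proposal above both steps live inside
vfs_suspend() itself.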

For specfs/fifofs we could drop the hold around a long sleep:

    need_hold = (vp->v_flags & VHOLD);
    if (need_hold)
	vn_release(vp);
    ... device operation that may sleep long ...
    if (need_hold)
	vn_hold(vp, V_WAIT);
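
The same drop-and-reacquire pattern, reduced to a single-threaded model
that only tracks the counter, might look like the sketch below. All names
here (struct mount_counts, struct vnode_flag, dev_op_unheld() and so on)
are hypothetical and exist only for this example; the point is that the
mount's hold count is zero while the device operation runs, so a
suspension could proceed during the sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded stand-ins: no locking, no waiting. */
struct mount_counts { int hold_count; };
struct vnode_flag   { struct mount_counts *v_mount; bool held; };

void
hold_model(struct vnode_flag *vp)
{
    assert(!vp->held);
    vp->held = true;
    vp->v_mount->hold_count++;
}

void
release_model(struct vnode_flag *vp)
{
    assert(vp->held);
    vp->held = false;
    vp->v_mount->hold_count--;
}

/* Example "device operation": records the hold count it observes. */
int
observe_holds(struct vnode_flag *vp, void *arg)
{
    *(int *)arg = vp->v_mount->hold_count;
    return 0;
}

/*
 * Run a device operation that may sleep for a long time with the hold
 * dropped, and take it again afterwards only if the caller had it.
 */
int
dev_op_unheld(struct vnode_flag *vp,
    int (*op)(struct vnode_flag *, void *), void *arg)
{
    bool need_hold = vp->held;
    int error;

    if (need_hold)
        release_model(vp);
    error = op(vp, arg);
    if (need_hold)
        hold_model(vp);
    return error;
}
```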

> > > for example,
> > > 
> > > int
> > > vn_remove(const char *path)
> > > {
> > > 
> > > 	lookup_parent(..., &dvp, ...);
> > > 
> > > 	vngate_enter(dvp->v_mount);
> > > 	lock(dvp);
> > > 	lookup_lastcomponent(dvp, &vp, ..);
> > > 	VOP_REMOVE(dvp, vp, ...);
> > > 	vngate_leave(dvp->v_mount);
> > > }
> > 
> > Why do you think "lookup_parent()" does not change file system data/metadata?
> 
> It might. If it does, then the fs has to make sure there isn't a 
> snapshotting going on while it's changing data.
> 
> The point is that it doesn't matter if it has to wait for a snapshot. You 
> could take 20 snapshots during the course of one lookup_parent() call. 
> Yeah, that's unlikely and a bit crazy, but snapshots there don't matter.
> 
> The important point is that a snapshot doesn't see us half-way through the 
> lookup_lastcomponent() call and the VOP_REMOVE().
[snip] 
> > > i thought
> > > 
> > > 	vngate_enter(PERMANENT)
> > > 	some_operations();
> > > 
> > > 	long_sleep(); /* with suspend/resume */
> > > 
> > > 	other_operations();
> > > 	vngate_leave_all();
> > > 
> > > could be
> > > 
> > > 	vngate_enter()
> > > 	some_operations();
> > > 	vngate_leave()
> > > 
> > > 	long_sleep(); /* without suspend/resume */
> > > 
> > > 	vngate_enter()
> > > 	other_operations();
> > > 	vngate_leave()
> > 
> > At least for specfs/fifofs this looks ok.
> 
> I like that too.

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)