Subject: Re: Redoing file system suspension API (update)
To: Jason Thorpe <thorpej@shagadelic.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 07/04/2006 16:23:55
On Mon, Jul 03, 2006 at 03:51:35PM -0700, Jason Thorpe wrote:
> 
> On Jul 2, 2006, at 2:22 AM, Juergen Hannken-Illjes wrote:
> 
> >Could we agree on the attached diff?  Compiled but not tested so far.
> 
> 
> Index: sys/kern/kern_exit.c
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/kern_exit.c,v
> retrieving revision 1.156
> diff -p -u -4 -r1.156 kern_exit.c
> --- sys/kern/kern_exit.c	14 May 2006 21:15:11 -0000	1.156
> +++ sys/kern/kern_exit.c	2 Jul 2006 09:05:12 -0000
> @@ -350,8 +350,10 @@ exit1(struct lwp *l, int rv)
> 	 */
> #ifndef __NO_CPU_LWP_FREE
> 	cpu_lwp_free(l, 1);
> #endif
> +	if (l->l_vntrans)
> +		vn_trans_destroy(l);
> 	pmap_deactivate(l);
> 	/*
> 
> I'd say just always call vn_trans_destroy() and let it handle the  
> NULL case.

Ok.

> +int
> +vn_trans_lock(struct mount *mp, int flags)
> +{
> +	int error, pflags;
> +	struct vn_trans_entry *vte, *new_vte;
> +
> +	KASSERT((flags & ~(LK_SHARED|LK_EXCLUSIVE|LK_NOWAIT)) == 0);
> 
> 
> I'd say make new vn_trans-specific flags and internally translate to  
> lockmgr().  Let's not propagate the lockmgr() interface any further,  
> please :-)

Ok. -> FSTRANS_SHARED, FSTRANS_EXCL and FSTRANS_NOWAIT.

> +void
> +vn_trans_unlock(struct mount *mp)
> +{
> +	struct vn_trans_entry *vte;
> +
> +	if (mp == NULL)
> +		return;
> +	if ((mp->mnt_iflag & IMNT_HAS_TRANS) == 0)
> +		return;
> +
> +	for (vte = curlwp->l_vntrans; vte; vte = vte->vte_succ) {
> +		if (vte->vte_mount == mp) {
> +			vte->vte_count -= 1;
> +			if (vte->vte_count > 0)
> +				return;
> +			break;
> +		}
> +	}
> 
> 
> Any reason not to KASSERT() the desired desired transaction is the  
> first on the list?  Nested transactions should always be released in  
> LIFO order, yes?

They must be released in LIFO order but the list order does not depend on
the transaction order.  New items go to the first free list entry.  If all
entries are occupied, a new one gets allocated and linked as the first entry.
Also the sequence "fs_a -> fs_b -> fs_a" is not represented in the list order
because we have a counter per file system.

> I also would say the names of the functions should be changed as so:
> 
> vn_trans_destroy() -> fstrans_exit()
> 
> vn_trans_lock() -> fstrans_start()
> 
> vn_trans_unlock() -> fstrans_done()
> 
> it's really file systems you're operating on, not vnodes.
> 
> Also, I'd say put the new routines in kern/vfs_trans.c

Ok.


Next step: How will we place the initial set of gates.

1) Change the VOP_XXX operations as

    int
    VOP_XXX(args)
    {
	struct vop_xxx_args a;
   +	struct mount *mp;
   +	int error;

   +	if ((error = fstrans_start((mp = vp->v_mount), FSTRANS_SHARED)) < 0)
   +		return error;

	<setup a from args>;

   -	return ((VCALL(vp, VOFFSET(vop_xxx), &a));
   +	error = VCALL(vp, VOFFSET(vop_xxx), &a);
   +	fstrans_done(mp);
   +	return error;
    }

   Should be possible for all operations but VOP_LOCK, VOP_UNLOCK,
   VOP_GETPAGES and VOP_PUTPAGES.  These operations (may) take the
   vnode interlock on entry and therefore cannot sleep.  We have to
   place the gates for these operations inside the file systems.

2) Put all gates inside the file systems as

    int
    fsy_xxx(...)
    {
	...

	if ((error = fstrans_start(mp, FSTRANS_SHARED)) < 0)
		return error;
	...
	...
	fstrans_done(mp);
	...
    }

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)