Subject: Re: Now: Fs suspension take 2
To: Bill Studenmund <wrstuden@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 08/19/2006 17:31:58
On Fri, Aug 18, 2006 at 03:02:13PM -0700, Bill Studenmund wrote:
> On Thu, Aug 03, 2006 at 04:11:29PM +0200, Juergen Hannken-Illjes wrote:
> > On Wed, Jul 05, 2006 at 10:30:32AM -0700, Bill Studenmund wrote:
> > > On Tue, Jul 04, 2006 at 04:23:55PM +0200, Juergen Hannken-Illjes wrote:
> > > > On Mon, Jul 03, 2006 at 03:51:35PM -0700, Jason Thorpe wrote:
> > 
> > [snip]
> > 
> > > > 2) Put all gates inside the file systems as
> > > 
> > > My preference.
> > 
> > Ok, the attached diff puts transaction locks so we can suspend ffs file systems.
> 
> Thank you. This is a lot of work!
> 
> > - The fstrans implementation goes to files sys/fstrans.h and kern/vfs_trans.c.
> > 
> > - Add new VFS operation VFS_SUSPENDCTL(MP, C) to request a file system to
> >   suspend/resume.
> > 
> > - Transaction locks are used if mnt_iflag has IMNT_HAS_TRANS.  Otherwise the
> >   current (vn_start_write) are used.  With this diff IMNT_HAS_TRANS gets
> >   set for ffs type file systems if kernel option NEWVNGATE is set.
> 
> How many file systems really supported the old suspension method other 
> than ffs? Or is this coexistance more of a smoothing of switching to the 
> new method for ffs?
> 
> i.e. do we need this?

All file systems on a block device support file system external snapshots.  

We add some transaction locks into genfs_XXX and ufs_XXX operations.  They
should only lock for file systems that need it.

> > - The ufs_lock/ufs_unlock part may look strange but it is ok to ignore
> >   vnode lock requests for the thread currently suspending.
> 
> I think it's probably fine to ignore lock requests while the system is 
> suspending. However we need to make sure that locking and unlocking stay 
> on the same side of the is-suspending state.
> 
> i.e. any locks we grab while suspending MUST be released before we 
> finish suspending. And any locks grabbed before suspending MUST be 
> released after we finish. Otherwise we can leak locks.
> 
> It could be that all we have to do is document this well.

Yes.  All we do now is to run ffs_sync().  And it doesn't leak locks.

> > - I left the throttling part of my first attempt because it is only needed
> >   for multiple softdep enabled file systems on the same disk.  This problem
> >   should be solved by a congestion control inside softdep.
> 
> 
> > @@ -762,17 +817,21 @@ ufs_whiteout(void *v)
> >  	int			error;
> >  	struct ufsmount		*ump = VFSTOUFS(dvp->v_mount);
> >  
> >  	error = 0;
> > +
> >  	switch (ap->a_flags) {
> >  	case LOOKUP:
> >  		/* 4.4 format directories support whiteout operations */
> > +		fstrans_done(dvp->v_mount);
> 
> Why do we call fstrans_done() here? CREATE and DELETE _start_ 
> transactions, so I don't see where the transaction we're _done'ing starts.

It doesn't start ... Will be removed.

> >  		if (ump->um_maxsymlinklen > 0)
> >  			return (0);
> >  		return (EOPNOTSUPP);
> >  
> >  	case CREATE:
> >  		/* create a new directory whiteout */
> > +		if ((error = fstrans_start(dvp->v_mount, FSTRANS_SHARED)) != 0)
> > +			return error;
> >  #ifdef DIAGNOSTIC
> >  		if ((cnp->cn_flags & SAVENAME) == 0)
> >  			panic("ufs_whiteout: missing name");
> >  		if (ump->um_maxsymlinklen <= 0)
> > @@ -791,8 +850,10 @@ ufs_whiteout(void *v)
> >  		break;
> >  
> >  	case DELETE:
> >  		/* remove an existing directory whiteout */
> > +		if ((error = fstrans_start(dvp->v_mount, FSTRANS_SHARED)) != 0)
> > +			return error;
> >  #ifdef DIAGNOSTIC
> >  		if (ump->um_maxsymlinklen <= 0)
> >  			panic("ufs_whiteout: old format filesystem");
> >  #endif
> > @@ -807,8 +868,9 @@ ufs_whiteout(void *v)
> >  	if (cnp->cn_flags & HASBUF) {
> >  		PNBUF_PUT(cnp->cn_pnbuf);
> >  		cnp->cn_flags &= ~HASBUF;
> >  	}
> > +	fstrans_done(dvp->v_mount);
> >  	return (error);
> >  }
> 
> 
> > +/*
> > + * Return whether or not the node is locked.
> > + */
> > +int
> > +ufs_islocked(void *v)
> > +{
> > +	struct vop_islocked_args /* {
> > +		struct vnode *a_vp;
> > +	} */ *ap = v;
> > +	struct vnode *vp = ap->a_vp;
> > +
> > +	return (lockstatus(vp->v_vnlock));
> > +}
> 
> This routine looks like the genfs routine. Do we really need a separate 
> one?

I think it is cleaner to move the complete lock family (lock, unlock, islocked).

> > Index: sys/ufs/ffs/ffs_snapshot.c
> > ===================================================================
> > RCS file: /cvsroot/src/sys/ufs/ffs/ffs_snapshot.c,v
> > retrieving revision 1.30
> > diff -p -u -4 -r1.30 ffs_snapshot.c
> > --- sys/ufs/ffs/ffs_snapshot.c	7 Jun 2006 22:34:19 -0000	1.30
> > +++ sys/ufs/ffs/ffs_snapshot.c	30 Jul 2006 16:54:09 -0000
> > @@ -58,8 +58,9 @@ __KERNEL_RCSID(0, "$NetBSD: ffs_snapshot
> >  #include <sys/resource.h>
> >  #include <sys/resourcevar.h>
> >  #include <sys/vnode.h>
> >  #include <sys/kauth.h>
> > +#include <sys/fstrans.h>
> >  
> >  #include <miscfs/specfs/specdev.h>
> >  
> >  #include <ufs/ufs/quota.h>
> > @@ -284,9 +285,9 @@ ffs_snapshot(struct mount *mp, struct vn
> >  	 * All allocations are done, so we can now snapshot the system.
> >  	 *
> >  	 * Suspend operation on filesystem.
> >  	 */
> > -	if ((error = vfs_write_suspend(vp->v_mount, PUSER|PCATCH, 0)) != 0) {
> > +	if ((error = vfs_suspend(vp->v_mount, 0)) != 0) {
> >  		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
> >  		goto out;
> >  	}
> >  	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
> > @@ -378,25 +379,33 @@ loop:
> >  			VI_UNLOCK(xvp);
> >  			MNT_ILOCK(mp);
> >  			continue;
> >  		}
> > +#ifndef NEWVNGATE
> >  		if (vn_lock(xvp, LK_EXCLUSIVE | LK_INTERLOCK) != 0) {
> >  			MNT_ILOCK(mp);
> >  			goto loop;
> >  		}
> > +#else /* NEWVNGATE */
> > +		VI_UNLOCK(xvp);
> > +#endif /* NEWVNGATE */
> 
> VI_UNLOCK()?

ffs_snapshot.c is still near to the FreeBSD one.  On top it has:

    #define VI_UNLOCK(v)    simple_unlock(&(v)->v_interlock)

> > Index: sys/kern/vfs_subr.c
> > ===================================================================
> > RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.267
> > diff -p -u -4 -r1.267 vfs_subr.c
> > --- sys/kern/vfs_subr.c	23 Jun 2006 14:13:02 -0000	1.267
> > +++ sys/kern/vfs_subr.c	30 Jul 2006 16:54:06 -0000
> 
> > @@ -1203,9 +1204,16 @@ vget(struct vnode *vp, int flags)
> >  	}
> >  #endif
> >  	if (flags & LK_TYPE_MASK) {
> >  		if ((error = vn_lock(vp, flags | LK_INTERLOCK))) {
> > -			vrele(vp);
> > +			simple_lock(&vp->v_interlock);
> > +			if (vp->v_usecount > 1) {
> > +				vp->v_usecount--;
> > +				simple_unlock(&vp->v_interlock);
> > +			} else {
> > +				simple_unlock(&vp->v_interlock);
> > +				vrele(vp);
> > +			}
> >  		}
> >  		return (error);
> >  	}
> >  	simple_unlock(&vp->v_interlock);
> 
> Why doe we need the above?

It cured a deadlock.  I'm not sure it is needed anymore.  Will test again.

[snip]
> > Index: sys/uvm/uvm_pdaemon.c
> > Index: common/lib/libc/gmon/mcount.c
> > Index: sys/arch/sparc64/dev/psm.c

These diffs are not part of the fs transactions.  I forgot to remove them.

> Take care,
> 
> Bill

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)