Subject: Re: Proposal: File system suspension - prerequisite for snapshots
To: None <tech-kern@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 08/13/2003 23:30:19
On Wed, Aug 13, 2003 at 12:02:01PM -0700, Bill Studenmund wrote:
> On Tue, 12 Aug 2003, Juergen Hannken-Illjes wrote:
> 
> > I propose adopting FreeBSD's support for file system suspension.
> >
> > The (quite simple) API:
> >
> > 	int
> > 	vfs_write_suspend(struct mount *mp)
> >
> > 	Request a mounted file system to suspend write operations
> > 	and leave it in a clean on-disk state. All operations are
> > 	complete on exit.
> >
> > 	void
> > 	vfs_write_resume(struct mount *mp)
> >
> > 	Request a suspended file system to resume write operations.
> >
> > This is a needed prerequisite for file system snapshots.  It may also
> > help in system suspension.  File system snapshots would give us at least
> > safe dumps from running systems and background fsck (with softdep-enabled
> > file systems).
> >
> > The implementation would gate most file system syscalls like this:
> >
> > 	if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
> > 		return (error);
> > 	do_the_write_operation
> > 	vn_finished_write(mp);
> >
> > or
> >
> > restart:
> > 	prepare_a_write_operation
> > 	if (vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) {
> > 		abort_current_preparation
> > 		if ((error = vn_start_write(NULL, &mp, V_XSLEEP | PCATCH)) != 0)
> > 			return (error);
> > 		goto restart;
> > 	}
> > 	do_the_write_operation
> > 	vn_finished_write(mp);
> >
> > Doing it this way guarantees that no operation sleeps with locked vnodes.
> 
> Note: don't you end up calling vn_start_write() _after_ you've been told
> it's ok to start? :-) If you _want_ that to be part of the interface, we
> need to document it. For one, it prevents a simple reference counting
> mechanism to determine if writes are in progress.

vn_start_write(..., V_NOWAIT) returns zero on success.  A non-zero result
means it is NOT ok to start.  So we have to abort the preparation, wait
until it is ok (the second vn_start_write(..., V_XSLEEP | PCATCH)), and
then restart the syscall.
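
To make the control flow concrete, here is a minimal user-space sketch of
the try / abort / wait / restart pattern.  Everything prefixed toy_ or TOY_
is hypothetical and only models the behaviour; it is not the kernel code.
It assumes that the V_XSLEEP call only waits for the suspension to end and
does not count a write in progress, which is my reading of the FreeBSD
interface and would have to be documented.  Sleeping is faked: the toy
simply pretends the resume happened while we slept.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mount; not the kernel structure. */
struct toy_mount {
	bool suspended;     /* set while a suspend request is in effect */
	int  writes_active; /* operations between start and finished */
};

#define TOY_NOWAIT 0x1  /* models V_NOWAIT: fail instead of sleeping */
#define TOY_XSLEEP 0x2  /* models V_XSLEEP: wait only, count nothing */

/* Models vn_start_write(): zero means it is ok to start writing. */
static int
toy_start_write(struct toy_mount *mp, int flags)
{
	if (mp->suspended) {
		if (flags & TOY_NOWAIT)
			return EWOULDBLOCK;
		/*
		 * The kernel would sleep here until resume.  The toy
		 * pretends the resume happened while we slept.
		 */
		mp->suspended = false;
	}
	if (flags & TOY_XSLEEP)
		return 0;	/* assumption: waited only, no write counted */
	mp->writes_active++;
	return 0;
}

/* Models vn_finished_write(): drops the count taken by a start. */
static void
toy_finished_write(struct toy_mount *mp)
{
	mp->writes_active--;
}

/* One gated operation; *restarts reports how often it had to retry. */
static int
toy_gated_op(struct toy_mount *mp, int *restarts)
{
	int error;

	*restarts = 0;
restart:
	/* prepare_a_write_operation: vnodes would get locked here */
	if (toy_start_write(mp, TOY_NOWAIT) != 0) {
		/* abort_current_preparation: vnodes get unlocked again */
		if ((error = toy_start_write(mp, TOY_XSLEEP)) != 0)
			return error;
		(*restarts)++;
		goto restart;
	}
	/* do_the_write_operation */
	toy_finished_write(mp);
	return 0;
}
```

Under that assumption only a successful non-XSLEEP start takes a count, so
every count is matched by exactly one toy_finished_write() and a simple
counter of writes in progress stays balanced across restarts.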

> > It is not possible to put this gating into the VFS_ calls as they are often
> > called with locked vnodes and the suspend request may deadlock.
> > For the same reason this gating cannot reside below the VFS_ level.
> 
> I think I don't like it, but I believe you're right that there are issues
> with doing it at the VOP level with the locked vnodes. i.e. this way may
> well be the best in the long run, even if I don't like it. :-)
> 
> Note: you _can_ do it at the VOP level, it would just mean having a
> routine or routines that would unlock the node, sleep, then re-lock and
> move on. But it's probably cleaner to do as you suggest and just make sure
> it's ok to do the write before starting it.
> 
> Take care,
> 
> Bill

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)