Subject: Proposal: File system suspension - prerequisite for snapshots
To: None <tech-kern@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 08/12/2003 22:45:13
I propose adopting FreeBSD's support for file system suspension.

The (quite simple) API:

	int
	vfs_write_suspend(struct mount *mp)

	Request a mounted file system to suspend write operations
	and leave it in a clean on-disk state. All write operations
	in progress have completed when the call returns.

	void
	vfs_write_resume(struct mount *mp)

	Request a suspended file system to resume write operations.

This is a prerequisite for file system snapshots.  It may also
help with system suspension.  File system snapshots would give us at least
safe dumps from running systems and background fsck (with softdep-enabled
file systems).

The implementation would gate most file system syscalls like this:

	if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
		return (error);
	do_the_write_operation
	vn_finished_write(mp);

or

restart:
	prepare_a_write_operation
	if (vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) {
		abort_current_preparation
		if ((error = vn_start_write(NULL, &mp, V_XSLEEP | PCATCH)) != 0)
			return (error);
		goto restart;
	}
	do_the_write_operation
	vn_finished_write(mp);

Doing it this way guarantees that no operation sleeps with locked vnodes.
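
To make the gating semantics above concrete, here is a minimal user-space
sketch using POSIX threads.  It is only a model, not the FreeBSD or proposed
NetBSD code: struct gate, the gate_* functions, the GATE_* modes and the
EBUSY return for a second suspend request are all hypothetical names and
choices made for illustration.  It leaves out PCATCH (signal interruption)
and the flush that puts the on-disk state in a clean condition, and it only
shows the counting: writers register before they start, the suspender waits
for the count to drain, and new writers are held off until resume.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mount suspension state. */
struct gate {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	int writers;		/* write operations currently in progress */
	bool suspended;		/* suspension requested or in effect */
};

#define GATE_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Assumed to mirror V_NOWAIT, V_WAIT and V_XSLEEP respectively. */
enum gate_mode { GATE_NOWAIT, GATE_WAIT, GATE_XSLEEP };

/* Model of vn_start_write(): register a writer, or wait for resume. */
int
gate_start_write(struct gate *g, enum gate_mode mode)
{
	int error = 0;

	pthread_mutex_lock(&g->lock);
	if (g->suspended && mode == GATE_NOWAIT) {
		/* Caller holds locks and must back out before sleeping. */
		error = EWOULDBLOCK;
	} else {
		while (g->suspended)
			pthread_cond_wait(&g->cv, &g->lock);
		/* GATE_XSLEEP only waits; it does not register a writer. */
		if (mode != GATE_XSLEEP)
			g->writers++;
	}
	pthread_mutex_unlock(&g->lock);
	return error;
}

/* Model of vn_finished_write(): drop the writer registration. */
void
gate_finished_write(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->writers > 0);
	if (--g->writers == 0)
		pthread_cond_broadcast(&g->cv);	/* wake a waiting suspender */
	pthread_mutex_unlock(&g->lock);
}

/* Model of vfs_write_suspend(): block new writers, drain active ones. */
int
gate_suspend(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (g->suspended) {
		pthread_mutex_unlock(&g->lock);
		return EBUSY;	/* assumption: only one suspender at a time */
	}
	g->suspended = true;
	while (g->writers > 0)
		pthread_cond_wait(&g->cv, &g->lock);
	pthread_mutex_unlock(&g->lock);
	return 0;
}

/* Model of vfs_write_resume(): let waiting writers proceed. */
void
gate_resume(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->suspended = false;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->lock);
}
```

In this model the second pattern corresponds to calling gate_start_write()
with GATE_NOWAIT while resources are held, and on EWOULDBLOCK releasing them,
calling it again with GATE_XSLEEP, and restarting.  The sleep therefore
always happens with nothing locked.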

It is not possible to put this gating into the VFS_ calls because they are
often called with vnodes locked, so a suspend request could deadlock.
For the same reason this gating cannot reside below the VFS_ level.

Besides most sys_ calls from kern/vfs_syscalls.c, some functions like ktrace,
coredump or unp_bind need this gating.

Comments?
-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)