tech-kern archive


Re: Respawn crashed PUFFS filesystems?



On Sun, Feb 12, 2012 at 07:12:25AM +0100, Emmanuel Dreyfus wrote:
 > One of the benefits of userland filesystems is that a bug in a
 > filesystem will just crash the filesystem, not the whole kernel. But a
 > crashed filesystem causes an unmount, and leaves the system not fully
 > functional.
 > 
 > I thought that we could respawn a crashed userland filesystem, lookup
 > all active vnodes again, and redo all operations failed at crash time.
 > That way a crashed filesystem would just cause a delay in ongoing
 > operations, but it would not even cause a failure. Does it make sense?

Sure it makes sense. Like the notion that microkernel-based systems
can respawn broken server processes, it's a great idea in theory and
easy to talk about in the abstract, but difficult-to-impossible to
make actually *work*.

If you're serious about this, a good place to start reading is the UW
paper on recovering/restarting device drivers, which won best paper at
the 2008 (?) OSDI. (Implementing their shadow driver scheme for NetBSD
would also be worthwhile.)

At least one of the things you've forgotten right up front is that if
a filesystem server process tips over, the first thing you need to do
is run fsck... and be prepared to cope with fsck failing.

Also note that in the context of glusterfs and similar tools it's
probably more desirable to have a node be reliably fail-stop than to
have it attempt to restart itself and end up half-working.
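
The ordering described above (check the backing store before any
restart, and stop for good rather than limp along) can be sketched
roughly as follows. This is an illustration only, not PUFFS code:
`supervise`, `serve`, `check` and `max_restarts` are hypothetical
names, and the sketch deliberately ignores the hard part, which is
re-looking-up active vnodes and replaying operations that were in
flight at crash time.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> int:
    """Run an external command and return its exit status."""
    return subprocess.call(list(cmd))


def supervise(serve: Callable[[], int],
              check: Callable[[], int],
              max_restarts: int = 3) -> str:
    """Restart a crashed filesystem server, checking the store first.

    serve() is a hypothetical hook that blocks until the server exits
    and returns its exit status (0 means a clean unmount).
    check() is a hypothetical hook that runs a consistency check
    (for example fsck) and returns 0 if the store is usable.
    """
    restarts = 0
    while True:
        status = serve()
        if status == 0:
            # Clean shutdown: nothing to recover.
            return "clean-exit"
        if restarts >= max_restarts:
            # Repeated crashes: stop instead of looping forever.
            return "fail-stop: too many crashes"
        if check() != 0:
            # The check itself failed: do not restart on a store
            # that may be damaged.
            return "fail-stop: check failed"
        restarts += 1
```

In practice `serve` and `check` would wrap `run()` with the real
server and fsck command lines, which are not shown here. Returning a
fail-stop result instead of retrying indefinitely keeps the node's
behaviour predictable for tools layered on top of it.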

-- 
David A. Holland
dholland%netbsd.org@localhost

