tech-kern archive


Re: Lost file-system story

I just did a little experiment. I installed OpenBSD 5.0 on the same
machine where I had my adventure with NetBSD. This time, I broke up
the world into separate filesystems, which OpenBSD facilitates,
mounting only /home and /tmp async,noatime. All the others were
mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
into my home directory (I had previously un-tarred it into /usr). I
then did

rm -rf ports

which takes a while. While that was going, I hit the power button (I
can afford to lose a filesystem containing only my home directory;
it's backed up thoroughly, because I rsync it from one machine to
another; there are current copies on several other machines). The
system did a rapid shutdown without sync'ing the filesystems.
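
As a rough illustration only, the mount layout described above might
correspond to /etc/fstab entries along these lines. The device names
(wd0a, wd0d, and so on) and the exact set of partitions are
hypothetical placeholders, not taken from the actual machine; only the
mount options come from the description above.

```
# Hypothetical /etc/fstab sketch; device names and partition set are assumed.
# Fields: device  mount-point  fstype  options  dump-freq  fsck-pass
/dev/wd0a  /      ffs  rw,softdep,noatime  1 1
/dev/wd0d  /usr   ffs  rw,softdep,noatime  1 2
/dev/wd0e  /var   ffs  rw,softdep,noatime  1 2
# Only these two are async, so only they are put at risk by a crash.
/dev/wd0f  /tmp   ffs  rw,async,noatime    1 2
/dev/wd0g  /home  ffs  rw,async,noatime    1 2
```

The point of the split is that a crash can damage only the filesystems
mounted async, and those hold data that is either disposable (/tmp) or
backed up elsewhere (/home).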

On restart, all the softdep-mounted filesystems had no errors in fsck,
as one might expect (especially since there was no intensive
write-activity going on when I improperly shut the system down, as
there was in /home), but I got an "Unexpected inconsistency" error for
my home directory, along with a request to run fsck manually; the
system dropped into single-user mode after the automatic fscks
finished. I ran fsck on the filesystem that gets mounted as /home. It
found a number of files and directories that were apparently
half-deleted and asked me, one by one, whether I wanted to delete
them. I did so for a few, but then used the 'F' option to answer the
rest without further interaction (I don't
believe the NetBSD fsck gave me that option; it is not documented in
the NetBSD fsck man page, while it *is* documented in the OpenBSD fsck
man page). The fsck completed and marked the filesystem clean. I
rebooted, everything mounted normally, and a check of my home
directory shows everything intact, even most of the ports directory,
whose deletion I deliberately interrupted.

The async warning in the OpenBSD mount man page reads as follows:

            async   Metadata I/O to the file system should be done
                     asynchronously.  By default, only regular data is
                     read/written asynchronously.

                     This is a dangerous flag to set since it does not
                     guarantee to keep a consistent file system structure on
                     the disk.  You should not use this flag unless you are
                     prepared to recreate the file system should your system
                     crash.  The most common use of this flag is to speed up
                     restore(8) where it can give a factor of two speed
                     increase.

"does not guarantee to keep a consistent file system structure on the
disk" is what I expected from NetBSD. From what I've been told in this
discussion, NetBSD pretty much guarantees that if you use async and
the system crashes, you *will* lose the filesystem if there's been any
writing to it for an arbitrarily long period of time, since apparently
meta-data for async filesystems doesn't get written as a matter of
course. And then there's the matter of NetBSD fsck apparently not
really being designed to cope with the mess left on the disk after
such a crash. Please correct me if I've misinterpreted what's been
said here (there have been a few different stories told, so I'm trying
to compute the mean).

I am not telling the OpenBSD story to rub NetBSD people's noses in it.
I'm simply pointing out that that system appears to be an example of
ffs doing what I thought it did and what I know ext2 and journal-less
ext4 do -- do a very good job of putting the world into operating
order (without offering an impossible guarantee to do so) after a
crash when async is used, after having been told that ffs and its fsck
were not designed to do this. The reason I'm beating on this is that I
would have liked to use NetBSD for the application I have in mind, but
I need the performance improvement that async provides (my tests show
this; the tests also show that NetBSD async is about as fast as Linux,
much faster than OpenBSD async, at least for doing a lot of writing,
such as un-tarring a large tar file). This is practical if the joint
probability of the system crashing *and* losing the async filesystem
is low. My one little data point was discouraging: using a wireless
card with a common chipset (Atheros) resulted in losing my network
connection, and then the system crashed when I attempted to restart
networking (I had to use the Atheros card because the system didn't
pick up the built-in Cisco wireless device, which I think is supposed
to be served by the an driver). The crash took out the filesystem, as
we've been discussing.

So I'd love it if my experience encourages someone to improve NetBSD
ffs and fsck to make use of async practical, perhaps by drawing on
what OpenBSD has done. I also realize that my situation is unusual,
and with resources being scarce, there are more important things
to work on that will affect a lot more people. But I'd at least like
to get it in the queue.

/Don Allen
