tech-kern archive


Re: Lost file-system story



On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow
<buhrow%lothlorien.nfbcal.org@localhost> wrote:
>        Hello.  Just for your edification, it is possible to break out of fsck
> mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
> own.
>
>        With regard to your notes on speed with NetBSD versus OpenBSD, I
> suspect the speed trade-off is where the difference lies.  OpenBSD is
> flushing buffers to disk more frequently than NetBSD is, and thus the
> on-disk copy of the filesystem is more complete.

I suspect that is due to OpenBSD's lack of a unified buffer cache,
which NetBSD has. So OpenBSD runs out of space in its buffer cache
(and has to write buffers out), even though memory devoted to (empty)
page frames is available.

>        Since you readily admit that you are a rare case, might I suggest
> that there may be an easy way for you to have your cake and eat it too:
> that is, get the speed and performance of NetBSD together with the
> relative reliability you saw with OpenBSD (which may have been luck; I'm
> not sure).  You could write yourself a little program, or find an old
> version of update(8) in old source trees, which runs as a daemon and
> calls sync(2) every n seconds, where n is whatever comfort level you deem
> appropriate.  I believe that when you call sync(2), even data on
> async-mounted filesystems is flushed.  With that program running, I'd be
> interested in having you retry your experiment with NetBSD and see if
> your results differ.

If I can find the time, I'll do that.

>
> -Brian
> On Dec 9,  3:50pm, Donald Allen wrote:
> } Subject: Re: Lost file-system story
> } I just did a little experiment. I installed OpenBSD 5.0 on the same
> } machine where I had my adventure with NetBSD. This time, I broke up
> } the world into separate filesystems, which OpenBSD facilitates,
> } mounting only /home and /tmp async, noatime. All the others were
> } mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
> } into my home directory (I had previously un-tarred it into /usr). I
> } then did
> }
> } rm -rf ports
> }
> } which takes a while. While that was going, I hit the power button (I
> } can afford to lose a filesystem containing only my home directory;
> } it's backed up thoroughly, because I rsync it from one machine to
> } another; there are current copies on several other machines). The
> } system did a rapid shutdown without sync'ing the filesystems.
> }
> } On restart, fsck found no errors in any of the softdep-mounted
> } filesystems, as one might expect (especially since, unlike /home,
> } they had no intensive write activity going on when I improperly shut
> } the system down), but the filesystem holding my home directory got an
> } "Unexpected inconsistency" error and the system requested a manual
> } fsck; it dropped into single-user mode after the automatic fscks
> } finished. I ran fsck on the filesystem that gets mounted as /home; a
> } number of files and directories had apparently been half-deleted, and
> } it asked me one by one whether I wanted to delete them. I did so for a
> } few, but then used the 'F' option to do so without further
> } interaction (I don't believe the NetBSD fsck gave me that option; it
> } is not documented in the NetBSD fsck man page, while it *is*
> } documented in the OpenBSD fsck man page). The fsck completed and
> } marked the filesystem clean. I rebooted, everything mounted normally,
> } and a check of my home directory shows everything intact, even most
> } of the ports directory, whose deletion I deliberately interrupted.
> }
> } The async warning in the OpenBSD mount(8) man page reads as follows:
> }
> }             async   Metadata I/O to the file system should be done
> }                      asynchronously.  By default, only regular data is
> }                      read/written asynchronously.
> }
> }                      This is a dangerous flag to set since it does not
> }                      guarantee to keep a consistent file system structure on
> }                      the disk.  You should not use this flag unless you are
> }                      prepared to recreate the file system should your system
> }                      crash.  The most common use of this flag is to speed up
> }                      restore(8) where it can give a factor of two speed
> }                      increase.
> }
> } "does not guarantee to keep a consistent file system structure on the
> } disk" is what I expected from NetBSD. From what I've been told in this
> } discussion, NetBSD pretty much guarantees that if you use async and
> } the system crashes, you *will* lose the filesystem if there has been
> } any writing to it during an arbitrarily long period before the crash,
> } since apparently metadata for async filesystems doesn't get written
> } out as a matter of course. And then there's the matter of NetBSD fsck
> } apparently not having been designed to cope with the mess left on the
> } disk after such a crash. Please correct me if I've misinterpreted
> } what's been said here (there have been a few different stories told,
> } so I'm trying to compute the mean).
> }
> } I am not telling the OpenBSD story to rub NetBSD people's noses in it.
> } I'm simply pointing out that that system appears to be an example of
> } ffs doing what I thought it did, and what I know ext2 and journal-less
> } ext4 do: a very good job of putting the world back into operating
> } order after a crash when async is used (without offering an impossible
> } guarantee to do so), after I had been told that ffs and its fsck
> } were not designed to do this. The reason I'm beating on this is that I
> } would have liked to use NetBSD for the application I have in mind, but
> } I need the performance improvement that async provides (my tests show
> } this; they also show that NetBSD async is about as fast as Linux and
> } much faster than OpenBSD async, at least for heavy writing such as
> } un-tarring a large tar file). This is practical if the joint
> } probability of the system crashing *and* losing the async filesystem
> } is low. My one little data point was discouraging: using a wireless
> } card with a common chipset (Atheros) resulted in losing my network
> } connection, and then the system crashed when I attempted to restart
> } networking. (I had to use the Atheros card because the system didn't
> } pick up the built-in Cisco wireless device, which I think is supposed
> } to be served by the an driver.) The crash took out the filesystem, as
> } we've been discussing.
> }
> } So I'd love it if my experience encouraged someone to improve NetBSD
> } ffs and fsck to make async practical, perhaps by drawing on what
> } OpenBSD has done. I also realize that my situation is unusual, and
> } with resources being scarce, there are a lot of more important things
> } to work on that will affect a lot more people. But I'd at least like
> } to get it in the queue.
> }
> } /Don Allen
>>-- End of excerpt from Donald Allen
>
>

