Subject: Re: FFS reliability problems
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Greywolf <greywolf@starwolf.com>
List: tech-kern
Date: 06/10/2002 10:38:36
Folks, at this point, now knowing about fsdb, I don't think I really
care any more as to whether or not the option ever makes it in.  I can
always do what a Real Systems Administrator would do and run "fsck -n"
on the fs in question, go in with fsdb, up the reference count, and go
back and fsck the filesystem.

I was only thinking that having a somewhat less troublesome method
by which this could be accomplished might have been seen as worthwhile
to more than two other people.

I never advocated it being a default option; in fact, I don't advocate
that it be usable in conjunction with -p (since -p doesn't fix link
counts UPward anyway -- you have to check it manually to be sure that
this is, in fact, what you want to do).

And what has been said about applications is quite true, on both sides,
as there are differing points of view.  It's up to the designer(s) of
the application in question how they want to handle garbage collection.
Is it more important that the program clean up after itself, or that it
preserve the data?  In most cases both are possible, but the designers
have to pick which one applies in the case of an unfielded error (such
as a system crash or a SIGKILL -- everything else can be negotiated with
a signal handler, AFAICT (I don't count SIGSTOP here, since a stopped
process just sits there until it gets a SIGCONT, and can then handle
whatever is pending)).
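
To make the "negotiated with a signal handler" point concrete, here is
a rough sketch in Python (chosen only for brevity; nothing here comes
from a real application).  The names can_catch and install_cleanup are
made up for illustration.  It assumes a POSIX system and that it runs
in the main thread, since that is where Python allows handlers to be
installed.

```python
import os
import signal

def can_catch(sig):
    """Return True if a handler may be installed for `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGKILL and SIGSTOP.
        return False
    # Put back whatever was there (None means "not set from Python").
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True

def install_cleanup(path):
    """Hypothetical helper: unlink `path` on a catchable termination signal."""
    def handler(signum, frame):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up
        raise SystemExit(128 + signum)
    for sig in (signal.SIGHUP, signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)

if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
        print(sig.name, "catchable" if can_catch(sig) else "NOT catchable")
```

A program that relies on install_cleanup alone still leaves its file
behind on a SIGKILL or a crash, which is exactly the case where the
designer has to pick a side.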

In the case of a SIGKILL, it's nice not to have the program leaving
(potentially large) droppings in various filesystems.  In the case of
a system crash, though, following that design, all I was thinking was
that it would be nice to be able to recover the data.
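
For anyone who hasn't met the design I mean, it is roughly the
following: create the scratch file and unlink it straight away, so the
data lives only as long as the descriptor stays open.  This is a
sketch in Python, assuming a local POSIX filesystem; open_unlinked is a
made-up name, not anybody's actual code.

```python
import os
import tempfile

def open_unlinked(directory=None):
    """Create a scratch file, unlink it at once, return (fd, old path)."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)  # the name is gone; the data lives while fd is open
    return fd, path

if __name__ == "__main__":
    fd, path = open_unlinked()
    os.write(fd, b"valuable data\n")
    st = os.fstat(fd)
    # On a local filesystem the link count reported here is normally 0.
    print("inode", st.st_ino, "link count", st.st_nlink, "size", st.st_size)
    os.close(fd)  # on close (or SIGKILL) the kernel frees the blocks
```

After a SIGKILL the kernel closes the descriptor and the space comes
back by itself.  After a crash nothing gets to close it, so the inode
is left allocated with no name pointing at it, and that is the sort of
thing fsck then reports as unreferenced.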

There were times in the past when a system crashed in the middle
of a huge development effort (when I was at a company), and my job,
naturally, was to make sure the system came back.  So I did my share of
console-sitting, watching fsck -p to make sure everything was
hunky-dory.  It was unsettling then, too, to watch those UNREF I=NNNN...
(CLEARED) messages go by.  It was obviously nothing to be too concerned
about, as nobody complained about much except the downtime of the
system, but up until recently I wasn't clear as to why some files got
cleared while others got reconnected.

So I apologise if I've hit some nerves with my foolish notion of
an option to recover potentially valuable data.  In these days of large
sites, it's probably not even worth considering.  Smaller shops,
if they can afford to keep a decent sysadmin around, might benefit.


				--*greywolf;
--
NetBSD... Bitchin'!