Subject: Re: FFS reliability problems
To: NetBSD Kernel Technical Discussion List <tech-kern@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 06/07/2002 13:19:22
[ On Friday, June 7, 2002 at 21:37:16 (+0700), Robert Elz wrote: ]
> Subject: Re: FFS reliability problems 
>
>     Date:        Thu,  6 Jun 2002 12:46:03 -0400 (EDT)
>     From:        woods@weird.com (Greg A. Woods)
>     Message-ID:  <20020606164603.087EAAC@proven.weird.com>
> 
>   | While
>   | obviously a file with no directory references is probably intended only
>   | for temporary data, I'm not sure fsck should make such an assumption,
> 
> Of course it can, because that's what the application is telling it.

No, that's not true _at_ _all_.

The application is assuming the system will continue running smoothly
until it does what it does with the data and closes the file itself
(perhaps by exiting, cleanly or otherwise).  Applications do this in
order to implement a trivial garbage collection algorithm -- but that
doesn't mean the data they write is garbage right from the start.

That data is recoverable.  Fsck has no business deleting it -- none
whatsoever.  Even the most junior sysadmin can trivially clean it up
after the crash, but only if given the chance.

> If an application unlinks its temp file, the application (the app's
> author) is indicating "this temp file is trash, there's no use at all
> recovering it if the application crashes or the system does").

No, sorry, but that's flat out wrong.  That might be what you'd like
application developers to do, but that's not what happens in the real
world.  Many _many_ applications create and then _immediately_ unlink
temporary files that they will later use to shuffle data around.  They
do so to make cleanup easy, not to say "the data I write here is trash".
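
For concreteness, the idiom looks roughly like the following in C.  This
is only a sketch: open_scratch() and scratch_roundtrip() are names made
up for illustration, the /tmp template is an assumption, and it presumes
a local filesystem where fstat() reports a link count of zero once the
name has been removed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file and unlink it at once.
 * The inode stays allocated for as long as the descriptor is open,
 * so whatever is written through it is real data, not garbage.
 */
static int
open_scratch(void)
{
	char path[] = "/tmp/scratch.XXXXXX";	/* illustrative template */
	int fd = mkstemp(path);

	if (fd == -1)
		return -1;
	/* Drop the name now; the kernel frees the inode on the last close. */
	if (unlink(path) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical demo: write len bytes to a scratch file and read them
 * back through the same descriptor.  Returns 0 on success, -1 on error.
 */
static int
scratch_roundtrip(const char *data, char *out, size_t len)
{
	struct stat st;
	int fd = open_scratch();

	if (fd == -1)
		return -1;
	if (write(fd, data, len) != (ssize_t)len ||
	    fstat(fd, &st) == -1 ||
	    lseek(fd, 0, SEEK_SET) == -1 ||
	    read(fd, out, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	/* No directory entry is left, yet every byte is still there. */
	assert(st.st_nlink == 0 && (size_t)st.st_size == len);
	close(fd);
	return 0;
}
```

The unlink() is there purely so that nothing is left behind when the
process exits, however it exits.  If the system goes down while the
descriptor is still open, the inode is still allocated on disk with its
data blocks intact and no directory entry pointing at it.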

<sarcasm weight=super-heavy>
If the application were doing what you claim it is doing then it
might as well just open /dev/null and write its temporary data there
instead.  Get real.
</sarcasm>

> If the application is unlinking temporary files which could be usefully
> recovered, then the application is broken, and that's where the fix
> should be applied, making fsck do dumb things to compensate is just
> plain wrong.

Nope -- I don't buy it, not for even a nanosecond.  The data is
trivially recoverable.  Fsck has no business putting it in the bit
bucket.  NONE whatsoever.  The system has no business crashing, and yet
it might.  The point of fsck is to put the filesystem into a consistent
state following a crash, _without_ loss of data.

Yes, any application using un-referenced inodes as a garbage collection
technique might be "broken", but so's the system that throws away data
in such inodes following a crash.  The system has no business whatsoever
making such policy decisions, especially since it doesn't have to.

Playing safe with user data is _never_ "dumb".

> ps: the other argument made for this "I just removed a valuable file that I
> know is open in this application, but which has no way to save the file, so
> I'm going to push RESET and then fsck will make the file come back" is just
> so ludicrous as to not be worthy of any comment at all.

Perhaps, but it is a very real-world argument too.  I've heard people
say they've done this (successfully, I might add) more times than I can
count.  I've even seen people do it right in front of me.  It's pretty
damn hard to argue with someone who's just made what would otherwise
have been the mistake of a lifetime -- if they can recover their work
then more power to them!

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>