Subject: Re: FFS reliability problems
To: NetBSD Kernel Technical Discussion List <tech-kern@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 06/14/2002 14:15:27
[ On Wednesday, June 12, 2002 at 23:44:56 (-0700), Greywolf wrote: ]
> Subject: Re:  FFS reliability problems
>
> I've never had unlink()ing problems, although I was rather fascinated
> when I compiled an a.out that was running, and got the message:
> 
> a.out:  Text file busy

Indeed -- it would be a bad thing to modify the pages of a program image
file while it is running and is thus a candidate for text section
paging.  (paging the text section of a running program from its original
binary file image is more efficient than writing (and then reading) the
same pages to the swap disks)

If I'm not mistaken there was a bug once upon a time which caused this
protection to fail, and all sorts of corruption and general havoc
resulted when people did "make install" in /usr/src (or some
sub-directory) or untar'ed new sets on a running system without using
pax or GNU Tar's --unlink.


> I'd thought that maybe unlink() should fail on a running program, but
> why?

With a running program it's currently possible to treat the [final,
i.e. refcount==1] unlink() of its binary image file in exactly the same
way as an unlink() of an open file is treated -- i.e. treat the running
program's on-disk image as if it is an open file and keep the inode and
data blocks allocated but reduce the directory reference count to zero.
That way the inode for the program image file remains and all the
storage remains allocated and the text pager can continue using the
image file as backing store for the in-core process, and so it's
(currently) not necessary for unlink() to fail on a running program.
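
The open-file half of that comparison can be sketched in a few lines of
C.  This is only an illustration under assumptions: the function name
unlinked_file_link_count() and the path passed to it are made up for
the example, and it shows the behaviour for an ordinary open file, not
for a running program image.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, unlink it while still holding the
 * descriptor, then check that the data is still readable through that
 * descriptor.  Returns the link count reported by fstat() after the
 * unlink (expected to be 0), or -1 on any error.
 */
int unlinked_file_link_count(const char *path)
{
    static const char msg[] = "still here";
    char buf[sizeof msg];
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg ||
        unlink(path) == -1 ||            /* last directory entry goes away */
        fstat(fd, &st) == -1 ||          /* the inode is still there */
        pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf ||
        memcmp(buf, msg, sizeof msg) != 0) {
        unlink(path);                    /* harmless if already removed */
        close(fd);
        return -1;
    }

    close(fd);                           /* storage is released only now */
    return (int)st.st_nlink;
}
```

The inode and its blocks stay allocated until the final close(), which
is the same lifetime the text pager needs for a running binary.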

However as I've already said there's no reason why these semantics can't
be retained (for binary program image files) even though unlink() of an
open file might be changed so that it fails.  Since atexit() became
commonly available I've never consciously used unlink() for temporary
files.  While some lazy and/or ignorant (and I mean to use the
dictionary definition of that word, and not to imply any other negative
connotations) sys-admins might like to have temporary files removed even
when the program using them is sent SIGKILL or some other core-causing
signal (either by the sysadmin or by the kernel), some/many/most systems
programmers would probably rather have them left behind so they can be
used to help diagnose whatever reason there was for so drastically
terminating the running process without giving it a chance to call
exit() cleanly.  I for one certainly always want to err on the side of
keeping all the evidence and manually cleaning it up after it is no
longer needed -- after all that's why I would hire a sys-admin in the
first place:  to do the manual cleanups!  :-)
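
A rough sketch of the atexit() approach, and of the trade-off just
described, might look like the following.  The names scratch_path,
remove_scratch() and scratch_survives() are invented for the example;
the fork() is there only so the two ways of dying can be compared from
the outside.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *scratch_path;    /* set before atexit() registration */

/* atexit() handler: remove the scratch file on a clean exit. */
static void remove_scratch(void)
{
    if (scratch_path != NULL)
        unlink(scratch_path);
}

/*
 * Hypothetical helper: fork a child that creates `path` as a scratch
 * file and registers remove_scratch().  The child then either calls
 * exit() (killed == 0) or sends itself SIGKILL (killed != 0).
 * Returns 1 if the file is still there afterwards, 0 if it is gone,
 * -1 on error.
 */
int scratch_survives(const char *path, int killed)
{
    pid_t pid;
    int status;

    assert(path != NULL);
    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd == -1)
            _exit(2);
        close(fd);
        scratch_path = path;
        if (atexit(remove_scratch) != 0)
            _exit(2);
        if (killed)
            kill(getpid(), SIGKILL);    /* uncatchable: no handlers run */
        exit(0);                        /* clean exit: handler runs */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;                      /* child could not set up */
    return access(path, F_OK) == 0;
}
```

With a clean exit() the file is removed; after SIGKILL it is left
behind as evidence, which is exactly the behaviour argued for above.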


>  If unlink() were to fail, then by UNIX definition, rename() would
> have to fail (since it consists of a link() followed by an unlink(),
> even though it's now an atomic operation).

Where'd (or more properly "when did") you get off the boat from?  :-)

rename(), by every definition since forever, not counting stupid wrapper
functions written (after rename() was first defined) by software porters
who cared more about building a complete runnable binary than about
building a correctly runnable binary, is an atomic operation.  A
rename() is never supposed to leave the filesystem in any state where
the file is not linked in any directory.  Now of course a crash in this
case would only be troublesome if there were a bug that caused the
inode's reference count to go to zero at a time when there were no
directory entries on-disk for that inode.  If such a bug existed then
'fsck -p' would (as currently implemented in NetBSD) silently (almost)
wipe that file from existence.
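
For what it's worth, the usual way an application leans on that
atomicity is to write a new copy under a temporary name and then
rename() it over the old one.  The sketch below assumes both names are
on the same filesystem; replace_file() and its parameters are made up
for illustration, and it makes no attempt to sync the directory itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of `path` with `data` by
 * writing `tmp_path` first and then renaming it into place.  Other
 * processes looking up `path` see either the old file or the new one,
 * never a missing name.  Returns 0 on success, -1 on error.
 */
int replace_file(const char *path, const char *tmp_path, const char *data)
{
    size_t len;
    int fd;

    assert(path != NULL && tmp_path != NULL && data != NULL);
    len = strlen(data);

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* Get the new contents onto disk before exposing them. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) == -1 || rename(tmp_path, path) == -1) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Note there is no link()/unlink() pair anywhere in the caller; the one
rename() call does the whole switch.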


> [I *would* like it very much if a shell could lock a script so that
> it couldn't get written while running -- or are there still people
> around who routinely write self-modifying code?  I missed that age
> by a couple of years.]

Yeah, me too!  Of course, with a modern VM-capable system there's no
reason why the shell can't just slurp up the whole script before it even
starts parsing it -- perhaps using mmap() and setting MAP_COPY.  Too bad
there's not a "MAP_TXTBSY".  (what does ld.so use for shared libraries?
wasn't there recent discussion about a better way to treat shared
libraries?)  Unfortunately "The MAP_COPY flag is not implemented."
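
Without MAP_COPY, a plain read() into private memory is one way to get
the "slurp it all first" effect.  The following is a minimal sketch
only, not what any particular shell does; slurp_file() is a made-up
name, and the fstat() size is treated as a hint because the file may
change between the fstat() and the last read().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: read all of `path` into a malloc()ed,
 * NUL-terminated buffer.  Stores the byte count in *len_out and
 * returns the buffer (caller frees it), or NULL on error.  Later
 * writes to the file cannot affect the returned copy.
 */
char *slurp_file(const char *path, size_t *len_out)
{
    struct stat st;
    char *buf;
    size_t used = 0, cap;
    ssize_t n = 0;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return NULL;
    }

    /* Room for the expected data, one probe byte, and the NUL. */
    cap = (st.st_size > 0 ? (size_t)st.st_size : 0) + 2;
    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    for (;;) {
        if (cap - used < 2) {           /* file grew: double the buffer */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                n = -1;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
        n = read(fd, buf + used, cap - used - 1);
        if (n <= 0)                     /* 0 is EOF, -1 is an error */
            break;
        used += (size_t)n;
    }
    close(fd);

    if (n == -1) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len_out = used;
    return buf;
}
```

The cost is one copy of the script in memory, which for typical shell
scripts is small next to the trouble a half-rewritten script can cause.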

(advisory application-level locks would be better than nothing (though
perhaps more expensive than MAP_COPY), but wouldn't be truly safe, and
mandatory application locks are just a bad idea from the get-go, at
least in any way dreamt up so far by un*x designers)

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>