Subject: Re: funlink() for fun!
To: Matthias Buelow <mkb@mukappabeta.de>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 07/12/2003 04:47:56
[ On Saturday, July 12, 2003 at 03:38:51 (+0200), Matthias Buelow wrote: ]
> Subject: Re: funlink() for fun!
>
> Greg A. Woods writes:
> 
> >But that's the whole point of funlink() (and perhaps even some of the
> >other f*() calls, such as fchdir()) -- turn an operation on a file into
> >an operation on a filename (i.e. a link to a file).  funlink() would of
> 
> I think it's not proper to regard [f]unlink() as operating on a file.

I think you're forgetting that a "unix file" includes its metadata
(especially if you're talking about kernel internals) and unlink() most
certainly always operates directly on the metadata of a file, even if
the link count is greater than one since the link count is always
decremented.

> Unlink only operates on directories (which can be simplified as
> ordinary files, although this is not quite right).

Unlink() _also_ operates on directories, but most importantly it
decrements the inode link count.

> In most systems you delete files (that's also what probably anybody,
> including those who are familiar with the internal workings of a
> typical Unix system, is thinking when he's working with the system.)
> However, that's not what's being done on Unix.  The kernel deletes
> files, opaquely and behind the scenes (if necessary with the help of
> fsck after an unexpected reboot.)

Ah, no, that's wrong.  The kernel most definitely always decrements the
link count of an inode when unlink() is called.  It _also_ zeros out the
inode number of a filename entry in the parent directory file.
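
Both halves can be seen from a process that keeps the file open.  Again
this is just a sketch under the same assumptions (POSIX, a local
filesystem under /tmp); read_after_unlink() is a hypothetical helper,
not anything from the tree.  After unlink() a lookup by name fails with
ENOENT, while the inode and its data are still reachable through the
descriptor:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes msg (including its NUL) to a temporary
 * file, unlinks the name, checks that the name no longer resolves, and
 * reads the data back into buf through the still-open descriptor.
 * Returns 0 if all of that holds, -1 otherwise.
 */
int
read_after_unlink(const char *msg, char *buf, size_t buflen)
{
	char path[] = "/tmp/unlinkdemoXXXXXX";
	struct stat st;
	size_t len;
	int fd, rc = -1;

	assert(msg != NULL && buf != NULL);
	len = strlen(msg) + 1;
	if (len > buflen || (fd = mkstemp(path)) == -1)
		return -1;
	if (write(fd, msg, len) != (ssize_t)len)
		goto out;
	if (unlink(path) == -1)
		goto out;

	/* The directory entry is gone: lookup by name must now fail. */
	if (stat(path, &st) != -1 || errno != ENOENT)
		goto out;

	/* The inode is not: the open descriptor still reaches the data. */
	if (lseek(fd, 0, SEEK_SET) == -1)
		goto out;
	if (read(fd, buf, len) == (ssize_t)len)
		rc = 0;
out:
	unlink(path);	/* harmless if the name is already gone */
	close(fd);
	return rc;
}
```

The storage is only reclaimed once the last descriptor is closed, which
is the kernel acting on the link count and the open count together.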

>  Not the user, who just removes
> reference entries from directory files.

The directory reference is only a part of the picture -- if you ignore
the link count in the inode then you have failed to understand the unix
filesystem sufficiently.

> That the "garbage collection" of files which are no longer referenced
> is more or less immediate by using a simple reference counter in the
> file's on-disk structure imho should be regarded as an implementation
> detail.

The fact that the unix filesystem is primarily a table of inodes is no
mere implementation detail.

>  For proper operation, one could also conceive to have a
> process which regularly scans the filesystem and collects files which
> are no longer referenced (like the memory management of certain
> programming languages does.)  It is irrelevant to the exported API how
> exactly this is implemented.  Unlink() should only work as an editing
> operation on directories.

No, one could not conceive of such a thing since it would have to be run
either in single user mode or while the filesystem was not mounted (or
while the whole filesystem is locked from modification).  Do you forget
that Unix systems are inherently multi-processing systems?  It is
absolutely fundamental and critical that unlink() modify the file's link
count (as well as of course freeing the directory entry).

(In fact the moral opposite is true -- it would be possible to not
immediately free the directory entry if inodes included two reference
counts instead of just one, since an attempt to reference a file with
no valid links, but with stale directory references remaining, could
be treated as if there were in fact no directory reference.)

>  Any contrived operation that tries to find
> a proper name for an open file through its descriptor is a rather
> unclean thing which probably cannot be done correctly for all cases
> and is way beside the design of the Unix filesystem.

No doubt -- but there it is nonetheless.  :-)

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>