Subject: Re: funlink() for fun!
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
From: Greywolf <greywolf@starwolf.com>
List: tech-kern
Date: 07/11/2003 18:35:36
Thus spake Greg A. Woods ("GAW> ") sometime Today...

GAW> Hmmm... no, I'd still say it's normally the other way around (where
GAW> "normal" is the case of one link).  The file (inode and the storage it
GAW> points to) is operated on (i.e. freed), and the zeroing out of the inode
GAW> number in the directory entry is only a side-effect.

assert(FWIW == (probable_t) NULL); I think you have it exactly backwards,
or sideways.  The purpose of the unlink is to:
	- clear the dir->ino
	- decrement ino->st_nlink

If you happen to decrement ino->st_nlink to zero (and nothing still has
the file open), then the data associated with the metadata gets freed, and
the data goes away.  THAT is the "side effect", and it's only logical.

GAW> At least that's
GAW> the way I see it if you want to talk about it in terms of degrees of
GAW> effect.  :-)

Yes, well...

GAW> The critical point is that the caller must assume the file and its
GAW> content is gone for good if the unlink() succeeds (unless the caller
GAW> created another link, or at least knows of one, _and_ knows that other
GAW> link is still safe and secure).

Okay, sure, but system calls run on a lot of semantic points, and,
semantically speaking, unlink() works on directory entries and metadata,
unlike, say, read() and write(), which work on file data.

GAW> I agree that the pathname is just serving to locate the file (thus my
GAW> other argument about unix filesystems being primarily just flat inode
GAW> tables :-).  The only logical difference between unlink() and funlink()
GAW> (or any other similar pair of filename/file-descriptor system calls) is
GAW> the time at which the filename is used to locate the file.  The only
GAW> trick with funlink() is that you either have to cache the filename in
GAW> the kernel when you first open the file (and then safely confirm it's
GAW> still the same file before you unlink it) or else you have to go hunting
GAW> again for the filename and at that point you can only safely unlink the
GAW> file if it has a link count of one as otherwise you can't tell if you've
GAW> found the right filename.  If you cache the filename and you still
GAW> implement the ftw() search should the filename prove invalid then you
GAW> can increase the likelihood that funlink() will "do the right thing",
GAW> but of course if the file or one of its parent directories is renamed
GAW> between the open() and the funlink() thus invalidating the cached
GAW> filename then you still can't unlink the file you find with the ftw()
GAW> unless its link count is only one, so funlink() is always going to be a
GAW> little less reliable, and potentially a lot more costly, than
GAW> fchdir(safe_open(dirname())); unlink(basename()).

...unless you arrange for link(), creat() [open(path,O_CREAT|O_EXCL,mode),
fine, whatever...], rename(), unlink(), mkdir(), rmdir() to perform
maintenance on a table of lists of vnodes...

GAW> >  Since file descriptors are on files, not
GAW> > on links to files, funlink() doesn't really make sense.
GAW>
GAW> Oh, of course it does -- file descriptors are handles to open files, and
GAW> files can only be opened if you know their name.

I really think you need to re-evaluate the above statement.

File descriptors work on files, not on namespace, which is what
makes funlink(2) such a PITA to even consider implementing.

NONE of the other f*(2) routines affect namespace.  _None_ of them.
They all operate on already established objects referenced through
a descriptor.  Granted, the objects were first located by name,
but once the object is attached to the fd, the name may as well
be forgotten.
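
A minimal sketch of that point: the only name is removed right after the
open, and the descriptor keeps working regardless.  The function name
and the /tmp scratch path are invented for the example:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open a scratch file, remove its only name, then write and read
 * through the descriptor anyway.  What was read back is copied into
 * buf, NUL-terminated.  Returns 0 on success, -1 on failure.
 * (Hypothetical helper.)
 */
int
fd_outlives_name(char *buf, size_t len)
{
	char path[] = "/tmp/fd_demo.XXXXXX";
	const char msg[] = "still here";
	struct stat st;
	ssize_t n;
	int fd, rv = -1;

	if (len == 0 || (fd = mkstemp(path)) == -1)
		return -1;
	if (unlink(path) == -1)		/* the name is gone from here on */
		goto out;
	if (stat(path, &st) == 0)	/* lookup by name must now fail */
		goto out;
	if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg))
		goto out;
	if (lseek(fd, 0, SEEK_SET) == -1)
		goto out;
	if ((n = read(fd, buf, len - 1)) == -1)
		goto out;
	buf[n] = '\0';
	rv = 0;
out:
	close(fd);
	return rv;
}
```

Nothing after the unlink() needs, or could even use, the original path.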

With the strategy in your statement above, I can just as easily prove
that there is life after death (anyone?).

GAW>  In the trivial case
GAW> where you might funlink(fileno(stdin)), for example, then the intent is
GAW> to unlink the file the parent process opened and connected to the
GAW> child's stdin.  Of course that begs the question as to why the parent
GAW> process wasn't programmed to just wait around for the child to exit and
GAW> then do the unlink() itself; or alternately why the parent process
GAW> didn't just hand the original pathname to the child process.

Ooh, yes, I can just see this now:  All my users saying, "Hey, what does
'no /dev/tty' mean?"

GAW> However if you're really worried about not being able to open(".") then
GAW> I'm all for your O_NOACCESS flag!  ;-)   [or openpwd(), see below]

I think openpwd() is a poor substitute for open(...,O_NOACCESS,...),
since it only allows one to open ".".  It would be much preferable to name
a path which permits access all the way down, but which cannot be
opened for reading.

GAW> > Which is why you can't do funlink(), because unlink doesn't operate on
GAW> > files; it operates on links to files.  The file is operated on only in
GAW> > that it's garbage-collected once it's no longer referenceable.  (Which
GAW> > may be when its refcount goes to zero, or it may be an indeterminate
GAW> > time later.)
GAW>
GAW> Again, when a file and its storage are garbage collected is irrelevant.

But when it is made available for GC is not, as far as filesystem state
goes.

GAW> The caller must assume it's gone for good once unlink() returns
GAW> successfully.  Just because there was another link doesn't mean even the
GAW> caller can find it in time to prevent the ultimate destruction of the
GAW> underlying file and its storage.

Such an implementation would be broken.  Think about it.  Any FS which
destroys an object that has a valid positive reference count to it is
doing something wrong.  I would consider a panic() at that point to be
the most friendly thing I could hope to see on my screen.

GAW> > How so?
GAW>
GAW> It is absolutely impossible for a privileged process to use access()
GAW> safely, especially if the target file is on any filesystem where
GAW> sensitive data lives.  (despite the fact access() was intended primarily
GAW> for the use of privileged processes)

Where do you get _this_ from?  It was not very well thought out if that's
the case.

GAW> Besides for non-privileged programs it's probably still more useful to
GAW> look at the actual mode bits and ownerships after an fstat() than it is
GAW> to use the very limited semantics of access().

...in which case access(2) may as well just go away, which I'm sure would
delight quite a few people.

GAW> You always have to lstat() what you think you're going to open, then
GAW> open() it to get a secure handle on it, and finally fstat() it once more
GAW> to make sure you did get what you think you got.  Only then is it safe
GAW> to examine what you actually got to see if it has the attributes you're
GAW> looking for.  At that point of course faccess() could be a library call
GAW> that accepts the struct stat from the fstat() call (or does its own
GAW> fstat() again), though paranoid programmers such as myself might still
GAW> prefer that the exact same code that implements the rules the kernel
GAW> would use to do the same check "in real life" also be used to do the
GAW> faccess() check as well.  :-)

Well, yes -- if the f* call is not as tightly coupled as the original
non f* call, it's a lose since the fd vs. path interfaces cannot be
considered equal at that point.

GAW> > No, it wouldn't.  You still couldn't save and restore your current
GAW> > directory by opening "." and fchdir()ing back there if your current
GAW> > directory is execute-only, even with O_MKDIR, without O_NOACCESS.
GAW>
GAW> If your process is running as root then it sure as heck can!  ;-)

But what if it *isn't*?  I think that's the point he's trying to make.

GAW> On the other hand wouldn't all need for O_NOACCESS be eliminated if
GAW> there were something like openpwd(2)?   Hmmm... maybe not because you
GAW> might want to be able to open a directory that you have no rights on
GAW> just to do an fstat() on it, but then again if you have no rights on it
GAW> an lstat() or stat() would suffice -- there's no need to use fstat().

The idea of O_NOACCESS is, I believe, in opening a directory that has
execute-only permission, for the purposes of returning a valid fd for
use with fchdir().

Think of a mode 0111 directory at some point along a path.  Never mind
that you have to know the node within said directory in order to accomplish
anything; that's kind of given.

Perhaps "O_CHDIR" would be more appropriate?  That could even imply a
check that the underlying object is, in fact, a directory, and moreover
one that can be chdir(2)d to.
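
For reference, the save-and-restore idiom under discussion, sketched
with only what exists today.  The function name is made up; O_NOACCESS
and O_CHDIR are proposals from this thread and are deliberately not
used:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Remember the current directory with a descriptor, visit `dir`, then
 * come back with fchdir().  Returns 0 on success, -1 on failure.
 * (Hypothetical helper.)
 */
int
visit_and_return(const char *dir)
{
	int saved, rv = -1;

	/* Needs read permission on "."; fails for an execute-only cwd. */
	if ((saved = open(".", O_RDONLY)) == -1)
		return -1;
	if (chdir(dir) == 0) {
		/* ... work relative to dir would go here ... */
		rv = 0;
	}
	if (fchdir(saved) == -1)	/* return to where we started */
		rv = -1;
	close(saved);
	return rv;
}
```

The open(".") is exactly the step that a non-root process cannot do in a
mode 0111 directory, which is the gap such a flag would be meant to fill.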

				--*greywolf;
--
NetBSD: Use the ENTIRE computer!