Subject: Re: funlink() for fun!
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
From: Greywolf <firstname.lastname@example.org>
Date: 07/11/2003 18:35:36
Thus spake Greg A. Woods ("GAW> ") sometime Today...
GAW> Hmmm... no, I'd still say it's normally the other way around (where
GAW> "normal" is the case of one link). The file (inode and the storage it
GAW> points to) is operated on (i.e. freed), and the zeroing out of the inode
GAW> number in the directory entry is only a side-effect.
assert(FWIW == (probable_t) NULL); I think you have it exactly backwards,
or sideways. The purpose of the unlink is to:
- clear the dir->ino
- decrement ino->st_nlink
If you happen to decrement ino->st_nlink to zero, then the data associated
with the metadata gets freed, and the data goes away. THAT is the "side
effect", and it's only logical.
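For anyone who wants to watch the two steps happen, here is a minimal sketch in C. The function names (nlink_of, unlink_demo) and the /tmp path are made up for illustration, and it assumes a filesystem that supports hard links and reports the count through fstat(). It gives one file two names, removes them one at a time, and records st_nlink after each step.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for mkdtemp() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count of the file behind fd, or -1 on error. */
static long nlink_of(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? (long)st.st_nlink : -1;
}

/*
 * Hypothetical demo: create a file with two names, remove each name in
 * turn, and store the link count seen after each step in counts[0..2].
 * Returns 0 on success, -1 on failure (error paths do not clean up).
 */
static int unlink_demo(long counts[3])
{
    char dir[] = "/tmp/unlink_demo.XXXXXX";
    char a[64], b[64];
    int fd;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    fd = open(a, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (link(a, b) != 0)            /* second directory entry, same inode */
        return -1;
    counts[0] = nlink_of(fd);
    if (unlink(a) != 0)             /* clears one entry, decrements count */
        return -1;
    counts[1] = nlink_of(fd);
    if (unlink(b) != 0)             /* last name gone */
        return -1;
    counts[2] = nlink_of(fd);       /* inode lives on while fd is open */

    close(fd);                      /* only now can the storage be freed */
    rmdir(dir);
    return 0;
}
```

On the systems I would expect this to run on, the counts come out as 2, 1, 0, and the storage is released only after the final close().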
GAW> At least that's
GAW> the way I see it if you want to talk about it in terms of degrees of
GAW> effect. :-)
GAW> The critical point is that the caller must assume the file and its
GAW> content is gone for good if the unlink() succeeds (unless the caller
GAW> created another link, or at least knows of one, _and_ knows that other
GAW> link is still safe and secure).
Okay, sure, but system calls run on a lot of semantic points, and,
semantically speaking, unlink() works on directory entries and metadata,
unlike, say, read() and write(), which work on file data.
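A small sketch of that split, assuming an ordinary POSIX system: the name is removed with unlink(), yet read() still returns the data through the descriptor. The function name read_after_unlink and the /tmp path are invented for this example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for mkstemp() under strict -std= modes */
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: write a short message to a temp file, unlink its
 * only name, then read the message back through the still-open fd.
 * Returns 0 if the whole message was read back into buf, -1 otherwise.
 */
static int read_after_unlink(char *buf, size_t len)
{
    char path[] = "/tmp/fd_demo.XXXXXX";
    static const char msg[] = "still here";
    ssize_t n = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        unlink(path) == 0 &&            /* namespace operation */
        access(path, F_OK) != 0 &&      /* the name really is gone */
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, len);         /* data operation, still works */
    close(fd);
    return n == (ssize_t)sizeof msg ? 0 : -1;
}
```

The caller passes a buffer of at least 11 bytes; the trailing NUL is written and read back along with the text.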
GAW> I agree that the pathname is just serving to locate the file (thus my
GAW> other argument about unix filesystems being primarily just flat inode
GAW> tables :-). The only logical difference between unlink() and funlink()
GAW> (or any other similar pair of filename/file-descriptor system calls) is
GAW> the time at which the filename is used to locate the file. The only
GAW> trick with funlink() is that you either have to cache the filename in
GAW> the kernel when you first open the file (and then safely confirm it's
GAW> still the same file before you unlink it) or else you have to go hunting
GAW> again for the filename and at that point you can only safely unlink the
GAW> file if it has a link count of one as otherwise you can't tell if you've
GAW> found the right filename. If you cache the filename and you still
GAW> implement the ftw() search should the filename prove invalid then you
GAW> can increase the likelihood that funlink() will "do the right thing",
GAW> but of course if the file or one of its parent directories is renamed
GAW> between the open() and the funlink() thus invalidating the cached
GAW> filename then you still can't unlink the file you find with the ftw()
GAW> unless its link count is only one, so funlink() is always going to be a
GAW> little less reliable, and potentially a lot more costly, than
GAW> fchdir(safe_open(dirname())); unlink(basename()).
...unless you arrange for link(), creat() [open(path,O_CREAT|O_EXCL,mode),
fine, whatever...], rename(), unlink(), mkdir(), rmdir() to perform
maintenance on a table of lists of vnodes...
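For reference, the fchdir(); unlink() idiom from the quoted text might look roughly like the sketch below. safe_open() is not a real call, so plain open() stands in for it here, and the function name unlink_via_dir is made up. The directory and the final component are passed separately to sidestep dirname()/basename(), and the caller's working directory is saved and restored.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for fchdir() under strict -std= modes */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: pin the directory with a descriptor, move into it,
 * unlink one entry by its final component, then return to where we were.
 * Returns 0 on success, -1 on failure.
 */
static int unlink_via_dir(const char *dir, const char *name)
{
    int rc = -1;
    int saved = open(".", O_RDONLY);    /* needs read permission on "." */
    int dfd;

    if (saved < 0)
        return -1;
    dfd = open(dir, O_RDONLY);          /* stand-in for safe_open() */
    if (dfd >= 0) {
        if (fchdir(dfd) == 0) {
            rc = unlink(name);          /* resolved relative to dfd's dir */
            if (fchdir(saved) != 0)     /* failing to get back is an error */
                rc = -1;
        }
        close(dfd);
    }
    close(saved);
    return rc;
}
```

Once dfd is open, renaming the directory or its parents does not change which directory the unlink() acts on, which is the reliability argument being made in the quoted text.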
GAW> > Since file descriptors are on files, not
GAW> > on links to files, funlink() doesn't really make sense.
GAW> Oh, of course it does -- file descriptors are handles to open files, and
GAW> files can only be opened if you know their name.
I really think you need to re-evaluate the above statement.
File descriptors work on files, not on namespace, which is what
makes funlink(2) such a PITA to even consider implementing.
NONE of the other f*(2) routines affect namespace. _None_ of them.
They all operate on already established objects referenced through
a descriptor. Granted the descriptors were accessed by their name,
but once the object is attached to the fd, the name may as well
With the strategy in your statement above, I can just as easily prove
that there is life after death (anyone?).
GAW> In the trivial case
GAW> where you might funlink(fileno(stdin)), for example, then the intent is
GAW> to unlink the file the parent process opened and connected to the
GAW> child's stdin. Of course that begs the question as to why the parent
GAW> process wasn't programmed to just wait around for the child to exit and
GAW> then do the unlink() itself; or alternately why the parent process
GAW> didn't just hand the original pathname to the child process.
Ooh, yes, I can just see this now: All my users saying, "Hey, what does
'no /dev/tty' mean?"
GAW> However if you're really worried about not being able to open(".") then
GAW> I'm all for your O_NOACCESS flag! ;-) [or openpwd(), see below]
I think openpwd() is a poor substitute for open(...,O_NOACCESS,...),
since it only allows one to open ".". It would be much preferable to name
a path which permits access all the way down, but which can not be
opened for reading.
GAW> > Which is why you can't do funlink(), because unlink doesn't operate on
GAW> > files; it operates on links to files. The file is operated on only in
GAW> > that it's garbage-collected once it's no longer referenceable. (Which
GAW> > may be when its refcount goes to zero, or it may be an indeterminate
GAW> > time later.)
GAW> Again, when a file and its storage is garbage collected is irrelevant.
But when it is made available for GC is not, as far as filesystem state
GAW> The caller must assume it's gone for good once unlink() returns
GAW> successfully. Just because there was another link doesn't mean even the
GAW> caller can find it in time to prevent the ultimate destruction of the
GAW> underlying file and its storage.
Such an implementation would be broken. Think about it. Any FS which
destroys an object that has a valid positive reference count to it is
doing something wrong. I would consider a panic() at that point to be
the most friendly thing I could hope to see on my screen.
GAW> > How so?
GAW> It is absolutely impossible for a privileged process to use access()
GAW> safely, especially if the target file is on any filesystem where
GAW> sensitive data lives. (despite the fact access() was intended primarily
GAW> for the use of privileged processes)
Where do you get _this_ from? It was not very well thought out if that's
GAW> Besides for non-privileged programs it's probably still more useful to
GAW> look at the actual mode bits and ownerships after an fstat() than it is
GAW> to use the very limited semantics of access().
...in which case access(2) may as well just go away, which I'm sure would
delight quite a few people.
GAW> You always have to lstat() what you think you're going to open, then
GAW> open() it to get a secure handle on it, and finally fstat() it once more
GAW> to make sure you did get what you think you got. Only then is it safe
GAW> to examine what you actually got to see if it has the attributes you're
GAW> looking for. At that point of course faccess() could be a library call
GAW> that accepts the struct stat from the fstat() call (or does its own
GAW> fstat() again), though paranoid programmers such as myself might still
GAW> prefer that the exact same code that implements the rules the kernel
GAW> would use to do the same check "in real life" also be used to do the
GAW> faccess() check as well. :-)
Well, yes -- if the f* call is not as tightly coupled as the original
non f* call, it's a lose since the fd vs. path interfaces cannot be
considered equal at that point.
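The lstat(), open(), fstat() sequence described in the quoted text can be sketched as below. This is only an illustration: the name open_checked is invented, the regular-file test is one possible policy, and the choice of EAGAIN for a mismatch is arbitrary.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for lstat() under strict -std= modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path only if it is a regular file and the
 * object we end up holding is the same one lstat() saw beforehand.
 * Returns a descriptor on success, -1 with errno set on failure.
 */
static int open_checked(const char *path, int flags)
{
    struct stat before, after;
    int fd;

    if (lstat(path, &before) != 0)          /* does not follow symlinks */
        return -1;
    if (!S_ISREG(before.st_mode)) {         /* rejects symlinks as well */
        errno = EINVAL;
        return -1;
    }
    fd = open(path, flags);
    if (fd < 0)
        return -1;
    if (fstat(fd, &after) != 0 ||
        before.st_dev != after.st_dev ||
        before.st_ino != after.st_ino) {    /* swapped under us */
        close(fd);
        errno = EAGAIN;
        return -1;
    }
    return fd;   /* mode and ownership checks belong on 'after', via the fd */
}
```

Any attribute checks after this point should use the struct stat from fstat(), since only that one is tied to the object actually opened.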
GAW> > No, it wouldn't. You still couldn't save and restore your current
GAW> > directory by opening "." and fchdir()ing back there if your current
GAW> > directory is execute-only, even with O_MKDIR, without O_NOACCESS.
GAW> If your process is running as root then it sure as heck can! ;-)
But what if it *isn't*? I think that's the point he's trying to make.
GAW> On the other hand wouldn't all need for O_NOACCESS be eliminated if
GAW> there were something like openpwd(2)? Hmmm... maybe not because you
GAW> might want to be able to open a directory that you have no rights on
GAW> just to do an fstat() on it, but then again if you have no rights on it
GAW> an lstat() or stat() would suffice -- there's no need to use fstat().
The idea of O_NOACCESS is, I believe, in opening a directory that has
execute-only permission, for the purposes of returning a valid fd for
use with fchdir().
Think of a mode 0111 directory at some point along a path. Never mind
that you have to know the node within said directory in order to accomplish
anything; that's kind of given.
Perhaps "O_CHDIR" would be more appropriate? That could even imply a
check that the underlying object is, in fact, a directory, let alone
one that can be chdir(2)d to.
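To make the problem concrete, here is a sketch of the save-and-restore idiom as it can be written with existing calls. O_NOACCESS and O_CHDIR are only proposals in this thread, so the code uses plain O_RDONLY; the function name with_saved_cwd is made up. The open(".") is exactly the step that fails with EACCES for a non-root process whose current directory is execute-only.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for fchdir() under strict -std= modes */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: remember the current directory by descriptor,
 * visit dir, then come back with fchdir().
 * Returns 0 on success, -1 on failure.
 */
static int with_saved_cwd(const char *dir)
{
    int rc;
    int saved = open(".", O_RDONLY);    /* requires read permission on "." */

    if (saved < 0)
        return -1;                      /* the execute-only case ends here */
    if (chdir(dir) != 0) {
        close(saved);
        return -1;
    }
    /* ... work relative to dir would go here ... */
    rc = fchdir(saved);                 /* needs only search permission */
    close(saved);
    return rc;
}
```

A flag along the lines being discussed would replace O_RDONLY in the first open() and leave the rest unchanged.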
NetBSD: Use the ENTIRE computer!