Subject: Re: funlink() for fun!
To: NetBSD Kernel Technical Discussion List <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 07/12/2003 01:48:29
[Omnibus reply here]

[Greg Woods, responding to me]

>> However, unlink() does not operate on a file, except as a side
>> effect; it operates on a link to a file.

> The critical point is that the caller must assume the file and its
> content is gone for good if the unlink() succeeds (unless the caller
> created another link, or at least knows of one, _and_ knows that
> other link is still safe and secure).

With that parenthetical note you're making my point almost more
effectively than I did: unlink() is not a destroy-this-file call, and
never has been.  It destroys a link, and often (but not always) does
other things in consequence - but it is fundamentally operating on the
link, not the file.
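
A minimal sketch of the link-versus-file distinction, assuming a POSIX system. The helper names link_count() and unlink_one_of_two() are hypothetical and exist only for this illustration. A file is given two links, one is removed with unlink(), and the file is still there under the other name.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: link count of the file that `path` locates,
 * or -1 if stat() fails. */
static long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Hypothetical helper: create a file with two links (a and b), unlink a,
 * and return how many links the file still has. Returns -1 on error. */
static long unlink_one_of_two(const char *a, const char *b)
{
    int fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (link(a, b) != 0) {
        unlink(a);
        return -1;
    }
    if (unlink(a) != 0) {          /* destroys the link a, not the file */
        unlink(a);
        unlink(b);
        return -1;
    }
    long n = link_count(b);        /* the file is still reachable via b */
    unlink(b);                     /* last link gone, so the file goes too */
    return n;
}
```

Only the second unlink() call has the side effect of destroying the file, and only because it removes the last remaining link.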

> I agree that the pathname is just serving to locate the file

Not at all; if this were true, then for a file with multiple hardlinks,
it wouldn't matter which link's pathname you passed.

> [...] or else you have to go hunting again for the filename and at
> that point you can only safely unlink the file if it has a link count
> of one as otherwise you can't tell if you've found the right
> filename.

Again, you're making my point for me: if you were truly operating on
the file itself, there wouldn't even _be_ a "right" filename, as the
filename would serve only to locate the file, something that's not
necessary when you have an open fd.
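
A small sketch of why an open fd has no "right" filename, assuming a POSIX system. The helper name write_after_unlink() is hypothetical. The only link is removed while the descriptor is open, and the descriptor keeps working even though no pathname locates the file any more.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, remove its only link, then write to
 * the file through the still-open descriptor and fstat() it into *st.
 * Returns 0 on success, -1 on error. */
static int write_after_unlink(const char *path, struct stat *st)
{
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    /* No link names the file now, but the descriptor is unaffected. */
    int ok = write(fd, "x", 1) == 1 && fstat(fd, st) == 0;
    close(fd);                     /* last reference dropped here */
    return ok ? 0 : -1;
}
```

On a typical local filesystem st->st_nlink is 0 after the call; some network filesystems may report otherwise, so the sketch does not rely on that value.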

>> Since file descriptors are on files, not on links to files,
>> funlink() doesn't really make sense.
> Oh, of course it does -- file descriptors are handles to open files,
> and files can only be opened if you know their name.

Files do not have names.  Links to files have names.

>> (It'd also be nice to be able to do one syscall (funlink of this
>> flavor) instead of three (fchdir, unlink, fchdir).)
> I think funlink() is always going to incur a lot more overhead,
> either globally to the whole system or individually to the caller, no
> matter which direction you choose to try to optimise it in,

My comment was specifically about a different flavor of funlink, one
which takes an fd on a directory and a (slashless) entry name.  That's
why the parenthetical "of this flavor".
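
A hedged sketch of that flavor, assuming a POSIX system. Both function names are hypothetical. The first version uses the fchdir, unlink, fchdir sequence, plus an open(".") to have somewhere to return to. The second assumes a system that provides unlinkat(), which POSIX standardized after this message was written and which takes exactly a directory fd and an entry name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: remove entry `name` from the directory open on dirfd by
 * changing directory there and back. Rejects names containing a slash. */
static int funlink_3calls(int dirfd, const char *name)
{
    if (strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }
    int saved = open(".", O_RDONLY);   /* needs read permission on "." */
    if (saved < 0)
        return -1;
    if (fchdir(dirfd) != 0) {
        close(saved);
        return -1;
    }
    int rc = unlink(name);
    int err = errno;
    if (fchdir(saved) != 0 && rc == 0) {   /* could not get back */
        err = errno;
        rc = -1;
    }
    close(saved);
    errno = err;
    return rc;
}

/* Hypothetical: the same operation as one call, where unlinkat() exists. */
static int funlink_1call(int dirfd, const char *name)
{
    if (strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }
    return unlinkat(dirfd, name, 0);
}
```

The single-call version also avoids disturbing the process-wide current directory, which matters if other threads are resolving relative pathnames at the same time.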

> If you get the wrong answer from faccess() then you just close() the
> file and no harm is done (unless

...the file is really a tape device and you've just unloaded the
evening's backup tape by opening it and closing it.

You _cannot_ safely do a privileged open of untrusted pathnames.
(Thinking that access() permits this is an example of the confusion
about what access() does and doesn't do.)  The only way to get it right
is to run as the appropriate user when doing the open (or, if
available, use an open() variant that does the equivalent internally).
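
A rough sketch of "run as the appropriate user when doing the open", assuming a POSIX system with saved set-user-IDs. The helper name open_as_user() is hypothetical, and the sketch is incomplete as a security measure: it switches only the effective user ID, whereas a real program would also have to switch the effective group ID and the supplementary group list.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with the effective uid temporarily set
 * to `uid`, so the kernel applies that user's permissions to the open
 * itself. Returns the descriptor, or -1 on error. */
static int open_as_user(const char *path, int flags, uid_t uid)
{
    uid_t saved = geteuid();
    if (seteuid(uid) != 0)
        return -1;
    int fd = open(path, flags);
    int err = errno;
    if (seteuid(saved) != 0) {
        /* Could not regain the original identity: treat as failure. */
        if (fd >= 0)
            close(fd);
        return -1;
    }
    errno = err;
    return fd;
}
```

Because the permission check and the open are the same operation, there is no window between a check and a use for an attacker to swap the pathname's target, and a device is never opened on behalf of a user who could not have opened it.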

>> You still couldn't save and restore your current directory by
>> opening "." and fchdir()ing back there if your current directory is
>> execute-only, even with O_MKDIR, without O_NOACCESS.
> If your process is running as root then it sure as heck can!  ;-)

Didn't you just finish talking about NFS servers with no root access?

Besides, just because root can blow past the protection is no reason
why O_NOACCESS is useless; all that means is it's unnecessary when
running as root.
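
For reference, a minimal sketch of the save-and-restore idiom under discussion, assuming a POSIX system. The helper name visit_and_return() is hypothetical. The open(".") is the step that fails for an unprivileged process when the current directory is not readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: remember the current directory, change to `dir`,
 * then come back. Returns 0 if the original directory was restored. */
static int visit_and_return(const char *dir)
{
    int saved = open(".", O_RDONLY);   /* needs read permission on "." */
    if (saved < 0)
        return -1;
    if (chdir(dir) != 0) {
        close(saved);
        return -1;
    }
    /* ... work relative to dir would go here ... */
    int rc = fchdir(saved);
    close(saved);
    return rc;
}
```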

> On the other hand wouldn't all need for O_NOACCESS be eliminated if
> there were something like openpwd(2)?

Which does the equivalent of open(".") but "cannot fail"?  Offhand I
suspect not, but I'd have to think about it.

[Greg Woods, responding to, I think, Greywolf]

>> Your logic is flawed.  Please show how unlink(2) operates on files.
>> - it doesn't write to them.
> Actually it does -- provided their link count is only one and soon to
> be zero.

Whereby you are once again making our point by your "provided".  If it
doesn't always do it (for successful calls), then that's not its core
functionality.

unlink() is not "destroy this file".  It's "destroy this link".
Destroying a link often destroys a file, but not always.

[Greywolf]

> Where does access(2) win over *stat(2)?  I don't understand that at
> all.

The original idea, near as I can tell, is hiding in this sentence from
the manpage for access(2):
     The real user ID is used in place of the effective user ID and the real
     group access list (including the real group ID) are used in place of the
     effective ID for verifying permission.

As Greg points out, this is not, in itself, enough in the face of an
active attacker.

> How about flink(2), for an open fd?

I've thought so, sometimes.

Basically, every call that takes a pathname is either using the
pathname as a way to get hold of a filesystem object, but nothing more
(eg, truncate(2), chdir(2), open(2), or the first arg to link(2)) or it
uses the pathname as a reference to a particular link, whether
potential or actual (eg, the second arg to link(2), or, to come back to
where we started, unlink(2)).

The former have obvious f* analogs, though the f* analog of open()
doesn't exist (no, it's not dup(2)).  The latter would need
directory-fd-and-string pairs to have real f* analogs.  (This implies
three flink() variants for full orthogonality: flink1(fd,path);
flink2(path,fd,entry); flink3(fd,fd,entry).  Perhaps it would be better
to invent some kind of pathname syntax that means "this entry in the
directory referred to by this fd"...though there isn't much room for
inventing new pathname syntax, even standards aside.)
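
As an illustration only, the flink2(path,fd,entry) variant can be approximated on systems that provide linkat(), which POSIX standardized after this message was written. The function below is a hypothetical sketch, not an existing call. The other two variants need a way to make a link from an open descriptor, which this sketch does not attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flink2(path, fd, entry): make `entry`, in the directory
 * open on dirfd, a new link to the file that `path` locates. `path` is
 * used only to get hold of the file; (dirfd, entry) names the new link. */
static int flink2(const char *path, int dirfd, const char *entry)
{
    if (strchr(entry, '/') != NULL) {   /* entry must be slashless */
        errno = EINVAL;
        return -1;
    }
    return linkat(AT_FDCWD, path, dirfd, entry, 0);
}
```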

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B