Subject: Re: Improving the Unix API
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Francois-Rene Rideau <fare@tunes.org>
List: tech-kern
Date: 06/28/1999 01:04:47
On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote:
> As I think someone already mentioned, BSD has chflags(), [...]
Yup.

>> Robert had to hand-remove the immutable flag
>> (I guess, by accessing the relevant block directly).
> (clri didn't work?)
Never heard of clri (this was under Linux). And I dunno what Robert did.
I will ask him, if it matters.

> funlink makes no sense [...] unlink() operates on names, not files [...]
Oops. Indeed. The thinko is purely mine.

> I've often wanted open-with-no-access in conjunction with fchdir().
> This is because you need only execute access to set your cwd to a
> directory, but there's no way to get an fd on a mode-111 directory.
Again and again, open-with-no-access definitely seems
to have lots of applications.
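
For illustration only, here is a minimal sketch of the fd-plus-fchdir() pattern
in Python. The O_NULL flag discussed in this thread is hypothetical; the sketch
assumes Linux's O_PATH (a later addition, not something proposed here) as the
nearest existing analogue, and falls back to a plain read-only open where
O_PATH is missing, in which case a mode-111 directory still cannot be opened.
The name enter_directory is made up for this example.

```python
import os

def enter_directory(path):
    """Make `path` the current directory by way of a file descriptor."""
    # Assumption: O_PATH (Linux-only) stands in for the hypothetical O_NULL.
    # Where it is unavailable we fall back to O_RDONLY, which needs read access.
    flags = getattr(os, "O_PATH", os.O_RDONLY) | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    try:
        # fchdir() acts on the object the descriptor names, so the path
        # cannot be swapped for something else between open and chdir.
        os.fchdir(fd)
    finally:
        os.close(fd)
```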

>> flink (make a new directory link for file given by descriptor),
flink() combined with the ability to create an unlinked file
in a given filesystem would allow for safe temporaries
without race conditions, that could be "published" when ready.
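
Since flink() does not exist, the closest one can get with standard calls is
roughly the following sketch: prepare the file under a private temporary name,
then publish it with link(). This is weaker than the proposal, because the file
does have a (hard to guess) name while it is being prepared. The function name
publish and the ".tmp-" prefix are assumptions made for the example.

```python
import os
import tempfile

def publish(directory, final_name, data):
    """Write `data` to a private temp file, then link it in as `final_name`."""
    # mkstemp() creates the file exclusively, mode 0600, under a random name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        final_path = os.path.join(directory, final_name)
        # link() refuses to replace an existing name (FileExistsError),
        # so a finished file is never silently overwritten.
        os.link(tmp_path, final_path)
    finally:
        # Drop the temporary name whether or not publication succeeded.
        os.unlink(tmp_path)
    return final_path
```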

>> freadlink (read link from a file descriptor opened with O_NULL),
>> fexec (execute the binary that we checked), etc.
> freadlink() implies that open() with O_NULL has the peculiar property
> that, unlike all other open()s, it doesn't follow terminal symlinks.
I suggested that there could be a flag O_DONTFOLLOWLINK in such cases;
I'm not fully sure about the feature, but it would allow setting flags on symlinks,
and other goodies.

> While I think there are ways symlinks could be improved, I don't think
> this is one of them.  I can't see any use for opening a symlink except
> use of write() to atomically make the link point somewhere different,
> and I'd prefer to do that by making symlink() do that when the link
> already exists and some appropriate condition is met.
Well, I can imagine opening them to lock them,
so as to prevent other people from making them point somewhere else,
as well as change some filesystem attributes on the right thing, etc.
Again, open() allows locking and prevents race conditions.
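
For comparison, the atomic retargeting mentioned in the quoted text can be
approximated with existing calls, without opening the symlink at all: create
the new link under a temporary name and rename() it over the old one. The
sketch below assumes that approach; it provides no locking, which is the part
an open()-based interface would add. The name retarget_symlink and the
temporary-name scheme are made up for the example.

```python
import os

def retarget_symlink(link_path, new_target):
    """Atomically make the symlink at `link_path` point to `new_target`."""
    # Hypothetical temp-name scheme; symlink() fails if the name is taken.
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(new_target, tmp_path)
    try:
        # rename() replaces the old link in one step and does not follow it,
        # so observers see either the old target or the new one.
        os.rename(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)
        raise
```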

>> Of course, you'll want to be able to fcntl(fd,F_SETFL,O_RDWR)
>> or something equivalent, to upgrade your access mode
>> on a file you opened with O_NULL.
> The security weenie in me is _really_ unsure that the ability to
> increase the access modes on an open fd is a good idea.
Well, there could be a flag O_NOINCREASEACCESS to prevent
further increasing of access modes (by e.g. children),
if that makes you feel safer.
And of course, increasing the access mode
is subject to the usual permission checking.

>> Another problem was the ability to change the mount status of a partition
>> from read-write to read-only or to unmounted,
> See NetBSD (and presumably other BSD) "mount -o update,rdonly" and/or
> "umount -f".  (Last I tried, the latter didn't work as it should, but
> that's a matter of fixing bugs rather than introducing new features.)
If you re-read the original message, the problem is what to do
about processes with open file descriptors on the partition:
stop them at once? stop them at first file access?
block them instead? kill them? Will you do it atomically?
How will you allow for such large table-walking to be compatible
with real-time kernel response? [Hint: either use incremental
data-structures, or don't be atomic and be interruptible instead.]

>> Finally, we discussed about saving _and restoring_ the state of a process,
>> another hack that he did once to preserve a long-winded calculation
>> from the service shutdown of a big unix computer.
> I did this once, long long ago, under (I think) 4.3.  I found that I
> couldn't just dump core, though I forget why.  As for the open file
> descriptor question, I punted - I made the relevant call fail unless
> the process had no fds open.
Again, the difficult part is precisely about fd handling;
and the suggested feature of whole-computer save&restore
(where external connections will still be a problem)
similarly requires that device drivers be able to dump restorable state.

>> By posting on all free unix kernel mailing-list I know,
>> I intend to put free unices in competition as to which
>> will implement these features first.
> Reasonable as this sounds, I think the last thing we need is yet
> another ground on which one free-unix can be doing the "nana nana boo
> boo" taunt at another.
Competition is _not_ about taunting each other for pride;
it's about striving to be the best we can in an atmosphere
of creative diversity whereby people copy each other's good ideas
and drop everyone's bad ideas. Diversity and free competition
increase the odds of good and bad ideas being recognized as what they are,
first by one, then by everyone,
which benefits everyone in the form of positive evolution.
But let's reserve such meta-technical discussions for another forum.

>> As for the opening with no permissions - well, it would make *big*
>> sense if we could narrow down the API and move chown(), chmod(), etc.
>> into libc leaving f-variants in the kernel.
> I really don't like that.  The reasons why are (1) this means you have
> to have an fd free to do them; (2) it triples the number of user/kernel
> crossings involved.
I think that (1) file descriptors are not an expensive resource,
and the kernel will basically have to maintain an internal temporary resource
that has most of the file descriptor complexity
when doing name-based operations, anyway;
(2) processes that have critical behavior will prefer
a locking open-based interface to do several things at once,
whereas those that do only name-based handling are mostly
non-critical user-interaction processes where syscalls are not
the performance bottleneck;
(3) if that's really really a problem,
you can still keep old syscalls as an optimization,
although it might be argued that keeping the kernel smaller
will help reduce page faults and make more memory available,
which will have a beneficial overall effect on performance.
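
To make the crossing count concrete, here is a rough sketch of what a
library-level chmod() built on the f-variant might look like. It is not how any
libc does it; chmod_via_fd is a hypothetical name. Lacking O_NULL, the sketch
opens with O_RDONLY, so it fails on files the caller cannot read, which is
exactly where open-with-no-access would be needed.

```python
import os

def chmod_via_fd(path, mode):
    """Name-based chmod expressed through open/fchmod/close."""
    fd = os.open(path, os.O_RDONLY)   # crossing 1: resolve the name once
    try:
        os.fchmod(fd, mode)           # crossing 2: act on the opened file
    finally:
        os.close(fd)                  # crossing 3: release the descriptor
```

A caller that wants to change several attributes can keep the descriptor open
and pay the name lookup only once.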

[ "Faré" | VN: Уng-Vû Bân | Join the TUNES project!   http://www.tunes.org/  ]
[ FR: François-René Rideau | TUNES is a Useful, Nevertheless Expedient System ]
[ Reflection&Cybernethics  | Project for  a Free Reflective  Computing System ]
Imagine algebra in XML: instead of (sin (+ x y)), sin(x+y) or x y + sin,
you get <sin><plus><var name=x><alsoargument><var name=y></plus></sin>.