Re: vfork() and posix_spawn() [was: Re: CVS commit: src/lib/libc/sys]

To: Joerg Sonnenberger <joerg%bec.de@localhost>
Subject: Re: vfork() and posix_spawn() [was: Re: CVS commit: src/lib/libc/sys]
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Mon, 14 Jun 2021 21:18:10 +0700
    Date:        Mon, 14 Jun 2021 03:56:48 +0200
    From:        Joerg Sonnenberger <joerg%bec.de@localhost>
    Message-ID:  <YMa3YDnqepgJ3JQL%bec.de@localhost>

  | This is even more true for multi-threaded applications
  | (where POSIX explicitly suggests that requirement).

Sure, anything with fork() and threads has issues, that's messy.
Even I know that, and I know very little about threads.

  | On the specific topic, I'm somewhat puzzled by the claim that fork is
  | async-signal-safe

That's not what I said, I said it isn't (though the way I phrased that
might not have been all that clear - I know I often type too much, and
sometimes overcompensate by typing too little).   But (from my message):

	If the "that said" relates to fork() or vfork() not being async
	signal safe, so a double fork() (when the first is vfork anyway)
	would not be condoned,

was meant to acknowledge that fork() (and vfork()) aren't async signal
safe, and so if a requirement that async signal safe functions are all
that is permitted after vfork() actually existed, then a
	if (vfork() == 0) fork();
sequence would not be permitted (it would not have defined behaviour).

But for the purposes of whether a double fork() option to posix_spawn
is useful or not, this restriction doesn't matter, as the above can be
(eventually) written
	if (vfork() == 0) _Fork();
instead, as _Fork() is (to be) async signal safe. Same effect as the
(perhaps) undefined version above for this purpose.

  | since most implementations will require synchronisation for pthread_atfork

Yes, which is what _Fork() does not do (_Fork() is to fork() as _exit() is
to exit() - in a sense).

  | It most likely should. The main reason is that much of the system can
  | and often do depend on things like mutexes to ensure correctness and all
  | that is essentially UB after vfork().

Actually, it isn't UB, or shouldn't be.   The behaviour of vfork() is
actually very precisely specified (which doesn't take much, as it is
quite simple).   There's very little room for UB.   What there is is
plenty of room for simple errors to screw things (and appear as if it
is UB, when it is actually quite well defined broken code).   There's no
problem using mutexes after a vfork() with two caveats - first anything locked
must be unlocked again before the eventual _exit() or exec() (the mutex
must be returned to its state at the time of the vfork()) and the
child must not need to lock anything which might be already locked
in the parent (as that's a guaranteed deadlock).   How one makes the
latter condition work is for someone else to solve...

That said, most of what would require a mutex isn't something that should
be being done after a vfork() anyway - most of those operations are likely
to be things which change state of the process, and that's what (at least
without careful preparation) cannot happen in a child of a vfork() (so no
use of stdio, no use of malloc, ...)

  | That's even ignoring the stronger issue of mutating state.

Yes, though if it is intentional, that can actually be OK (sometimes).
Our /bin/sh relies upon the ability to modify its parent's state after
a vfork() to communicate status back to the parent (more than is possible
reliably with exit status - as the parent cannot tell from that whether
the status it receives is from its own image running the child after the
vfork() or from some other process after an exec() succeeded).
A modified volatile variable (in the parent), however, can only have
been modified by the vfork() child (we assume here that the parent
isn't modifying the variable itself elsewhere, naturally).
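
A minimal sketch of that technique follows. It is not the actual /bin/sh
code: the name run_via_vfork() and its exec_errno out-parameter are made
up for illustration. It assumes a "real" vfork(), where the child runs in
the parent's address space, and it deliberately writes to a parent variable
from the child, which is exactly the kind of thing the standards do not
promise will work.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run path with argv via vfork()+execv() and wait.
 * Returns the wait status, or -1 if vfork() or waitpid() failed.
 * *exec_errno is 0 if the exec succeeded, else the errno from execv().
 */
int
run_via_vfork(const char *path, char *const argv[], int *exec_errno)
{
	/* volatile: the child writes it through the shared address space */
	volatile int child_errno = 0;
	int status;
	pid_t pid;

	pid = vfork();
	if (pid == -1)
		return -1;

	if (pid == 0) {
		/* Child, still running in the parent's memory. */
		execv(path, argv);
		/* Only reached if the exec failed: tell the parent why. */
		child_errno = errno;
		_exit(127);
	}

	/* The parent resumes only after the child has exec'd or exited. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	*exec_errno = child_errno;
	return status;
}
```

With this, an exit status of 127 with *exec_errno still 0 came from the
program that was exec'd, while 127 with *exec_errno set came from our own
image after the exec failed. With cc -Dvfork=fork the child's write lands
in its own copy and the parent always sees 0.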

And yes, that means that any system that "implements" vfork() using
cc -Dvfork=fork (or its equivalent) cannot use vfork() with our sh - we
need a "real" vfork, or must simply disable its use and only use fork
(which is what the -D implementation does anyway of course, just badly).

  | vfork use really should die...

That sentiment has been around for a long time - almost since vfork()
was first created (40+ years ago).   But it is still here.   Implementing
fork() using CoW was supposed to solve all the issues with fork().  It
didn't, vfork() is still lots faster.

posix_spawn() might allow some uses of vfork()+exec() to be retired, that
would be good, but it isn't going to get all of them.

  | No, it relates to one common pattern for used by or for daemon.

Yes, I understood that, but why do we care?   Daemons start how often?
What percentage of the fork()s (vfork is, in practice, never used for
this) on your systems are generated by daemons starting?   What kind of
saving would you expect to see from allowing posix_spawn() to replace
that fork();fork() sequence ?   Is it really worth even the tiniest
extra complexity in that already fairly complex interface to handle
this almost irrelevant issue?   Do you believe that in non-memory-managed
real time systems there are any daemon processes like that at all?
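
For reference, one common shape of the fork();fork() sequence under
discussion is sketched below. The name spawn_detached() is hypothetical,
and real daemons usually do more (chdir, closing fds, and so on); this only
shows the part a double-fork option to posix_spawn() would have to replace.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start path as a detached grandchild.
 * Returns 0 once the intermediate child has exited cleanly, else -1.
 */
int
spawn_detached(const char *path, char *const argv[])
{
	int status;
	pid_t pid;

	pid = fork();			/* first fork */
	if (pid == -1)
		return -1;

	if (pid == 0) {
		/* Intermediate child: new session, then fork again. */
		if (setsid() == -1)
			_exit(1);
		pid = fork();		/* second fork */
		if (pid == -1)
			_exit(1);
		if (pid != 0)
			_exit(0);	/* intermediate exits at once */

		/* Grandchild: no longer a child of the original caller. */
		execv(path, argv);
		_exit(127);
	}

	/* Parent: reap the intermediate child so no zombie is left. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Note that the caller only learns that the intermediate child exited; an
exec failure in the grandchild is invisible to it in this sketch.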

  | There are non-trivial uses of fork, yes. There are much less non-trivial
  | uses of vfork as the latter already has quite a few problems with spooky
  | actions otherwise.

Agreed.

  | Supporting something like a double fork flag has very
  | little impact on the complexity of the implementation and even less
  | impact on the efficiency.

Agreed, it would be trivial to implement (and specify), and would make that
double fork() sequence considerably more efficient (but really, no-one
cares about that).   It adds one more test into every posix_spawn()
execution (almost none of which would use the facility) but compared
to the rest of posix_spawn()'s costs, that's nothing.

Where it adds complexity is in the understanding of the interface - it is
one more option shown to everyone contemplating using posix_spawn for them
to decide whether they should be using it (the option) or not.   The more
complex we make that interface (and everything added adds complexity here,
however trivial implementing it is) the more problems it is going to cause.
Forever.

  | We certainly are at the point where we can
  | start analyzing the remaining blockers for (v)fork+exec users.

"We" could.   If I were you, I'd start with make.   Does make use posix_spawn?
If not, why not?   That's an application that does a lot of forking, but
doesn't need to deal with job control.   It's almost the perfect candidate.

Do we use posix_spawn() in our system() and popen() calls?  (to fork
then exec the shell to process the command - these are painfully trivial
cases, but if called from a BIG process can be expensive if using fork(),
even costly if using vfork()).
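
To show how little is involved, here is a rough sketch of a system()-like
function built on posix_spawn(). The name spawn_system() is invented for
the example, and this is not a claim about what any libc does. It also
leaves out the signal handling a conforming system() needs (ignoring SIGINT
and SIGQUIT, blocking SIGCHLD while waiting).

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: run "sh -c command" via posix_spawn() and wait.
 * Returns the wait status, or -1 with errno set on failure.
 */
int
spawn_system(const char *command)
{
	char *argv[] = { "sh", "-c", (char *)command, NULL };
	int error, status;
	pid_t pid;

	/* posix_spawn() returns an error number rather than setting errno. */
	error = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ);
	if (error != 0) {
		errno = error;
		return -1;
	}

	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return status;
}
```

For example, spawn_system("exit 7") should give a status for which
WEXITSTATUS() is 7, without the caller's address space ever being copied.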

Forget the shell, there are too many issues - aside from ones already
mentioned, sh needs to implement "set -C", which is impossible to achieve
with simple O_FLAGS to an open call (which is all posix_spawn allows).
This has been discussed in the austin group (in the context of the shell,
not posix_spawn as such -- there are idiots who believe that noclobber mode
should be a useful technique for making lock files, so want to force the
shell to make that possible - which means no race conditions in the
implementation.)   It has been suggested that a new O_IFORGETWHATTHEYCALLIT
open flag be added (to the kernel) to provide exactly the shell noclobber
semantic - but as there are no implementations of that thing right now,
it probably isn't likely to happen; it would make a noclobber open
race free, and hence suitable for making lock files, as odious an objective
as that is, and simultaneously allow its use for -C mode opens to
use posix_spawn().
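
The reason plain open flags are not enough: noclobber only forbids
overwriting an existing regular file, so "> /dev/null" must still work
under set -C, which O_CREAT|O_EXCL alone would reject. A shell therefore
ends up with something like the multi-step open below. This is only a
sketch with a made-up name, noclobber_open(), not the code from any
particular shell, and it makes no attempt to handle every case (opening a
FIFO this way can block, for instance).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path for writing with noclobber semantics.
 * Returns an fd, or -1 with errno set (EEXIST for an existing regular file).
 */
int
noclobber_open(const char *path)
{
	struct stat st;
	int fd, saved;

	/* Step 1: create the file, failing if anything already exists. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd != -1 || errno != EEXIST)
		return fd;

	/* Step 2: something exists; open it without creating or truncating. */
	fd = open(path, O_WRONLY);
	if (fd == -1)
		return -1;

	/* Step 3: only a non-regular file (e.g. /dev/null) may be used. */
	if (fstat(fd, &st) == -1) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	if (S_ISREG(st.st_mode)) {
		close(fd);
		errno = EEXIST;
		return -1;
	}
	return fd;
}
```

None of steps 2 and 3 can be expressed as a posix_spawn() open file action,
which takes just a path, flags and mode.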

  | I quite disagree here, actually. The design-level issue is that
  | POSIX_SPAWN_RESETIDS is a flag and not an action.

Of course.   But it is what it is, and that's because of the objectives
of the design - which were not to replace fork+exec - but to provide a
mechanism to allow that sequence in implementations where implementing
fork() is prohibitively expensive, and the actual requirements are
less general (most likely everything is running as one uid).

It would actually need to be several different actions, or an action with
a modifier parameter (either way, more complexity) as it isn't necessarily
just a "do everything now" issue, one might want to change the uid, then
perform other actions, and then alter the gid, for example.   One might.
But almost no-one does, the single flag that happens first is what is
desired (when it is needed at all) most of the time, and takes no particular
intellectual capacity to set and use it.   And it usually simply works.

Back to the reason for posix_spawn being created...

I'm not sure if you ever saw (or heard of) it or not, but "mini unix"
(6th edition clone for PDP 11/23's and similar, nb: not minix though
that, I believe, had similarities) was one like that - no memory management,
and fork() required swapping the process out, while simultaneously keeping
it in memory (I believe, I never personally used it, but used to hear lots
of talks from people working on it, well, one in particular).  The swapped
out process became the parent (or child, I forget) and the in memory copy
was the other. Scheduling was basically swapping out one process and
swapping in another.  Of course it was slow - but recall, pdp-11, processes
were small, by definition, there was never all that much to swap.

Avoiding all of that cost, when one of those two processes is going to
be discarded almost immediately anyway is what posix_spawn() was created
for.   (In a real time system, the new process might be linked to run at
different addresses than the original - each proc to simultaneously run
can have its own piece of the address space - so nothing need ever be
swapped to make things work using posix_spawn() ... which is impossible
with fork() as the two processes are necessarily identical there.)

  | This means it can't be sequenced and that is the reason for the limitation.

Yes.   It is also much simpler to understand, and for the majority of
uses, does exactly what the application wants to happen.  It just omits
the few outlier applications, which (in the posix model) just keep using
fork()+exec() instead.

  | There is an obvious parallel with the semantics of the chdir action here
  | --that needs to be that, an action and not just a flag.

Yes, and it is, which would be obviously needed anyway as the chdir action
also requires extra data.   The flags do not - they're all simply on
or off.

[Thinking of chdir also reminds me that posix_spawn() has no way to chroot()
either.    And nor should it.   It also has no way to fcntl() (aside from dup)
to alter the modes of already open fds, and lots, lots more - most of which
is never used between a fork and exec, but can be.]

  | The separate concern is of course
  | that we need more testing for posix_spawn, but that is hopefully also
  | going to become better as side effect of the non-GSoC project.

One might expect that to be unavoidable (and I tend to guess that Martin's
fix came as a direct result of his role in supervising that project).

But even better would be making the system depend upon it in some major way,
then it would be being used a lot, by everyone, and any bugs would be more
likely to reveal themselves (though that will only find bugs in the actual
operations used by whatever applications are converted - testing of the
exotic, rarely used options is also needed).

kre