Re: kern/49017: vfork does not suspend all threads
The following reply was made to PR kern/49017; it has been noted by GNATS.
From: Nico Williams <Nico.Williams%twosigma.com@localhost>
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 18:42:25 +0000
Please do NOT stop any threads in the vfork() parent other than the one that
called vfork(). Allow me to make the case for this.
First, I suppose I should look at the rationales for stopping all threads in
a vfork() parent. I can think of two (but if I'm missing some, let me know!):
(a) that the man page always said that the parent process is stopped, ergo it
must now mean "all threads in the parent process", and (b) that not stopping
all threads in the parent shrinks the set of functions that are safe to call
in the vfork() child unacceptably.
Before refuting my strawman rationales for stopping all threads, I'll explain
why stopping all threads is highly undesirable: it kills performance, the very
reason for vfork()'s existence.
There are several ways to use vfork() to spawn children in a high-performance
manner:
- First, obviously, in posix_spawn().
It would be terrible to have to stop all of a JVM's many threads just to
spawn a child, and would negate some of vfork()'s massive performance
advantage over fork().
Why should unrelated threads in the parent suffer? (This gets to the safety
issues which I posit might motivate stopping all parent threads, and which I
address below.) Even if there were a strong safety argument for this, we
should aim to address it some other way, since the performance rationale for
using vfork() is extremely important in real-life cases.
(I should point out that, for example, Linux's vfork() does not stop the
other threads in the parent. I can provide a test program that demonstrates
this.)
- Second, one can implement a very fast popen()-like API on top of a threaded
taskq whose threads pre-vfork(), enabling a program to spawn processes faster
than with posix_spawn(), because the requesting thread never blocks waiting
for the child to start up and then execve() or _exit(). The taskq threads
that pre-call vfork() do block that way, but the threads that dispatch
requests to the pre-vfork()ed children do not block at all: they only call
write(2) to write the job to a pipe read by the child.
See my gist about this, where I describe it in detail, propose a new
function with this signature:
pid_t avfork(int (*start_routine)(void *), void *arg);
and provide a partial implementation based on a pre-vforking threaded taskq:
There is such a very fast popen()-like implementation here:
https://github.com/famzah/popen-noshell (warning GPLv3)
that uses clone() on Linux to get something very much like the avfork() that
I argue for. Its author sometimes needs to spawn thousands of processes very
quickly (see
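To make the "only the calling thread is suspended" semantics concrete, here is a minimal sketch of an avfork()-like call built on Linux's clone() with CLONE_VM | CLONE_VFORK, similar in spirit to what popen-noshell does. It is Linux/glibc-specific, and spawn_vfork_like(), exit_with_code() and wait_exit_status() are hypothetical names. Unlike the proposed avfork(), it still blocks the calling thread until the child execs or exits, and a production version would also need to block signals around the clone() call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* clone(), CLONE_VM, CLONE_VFORK (Linux/glibc) */
#include <signal.h>     /* SIGCHLD */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Size of the private stack the child callback runs on (assumed ample). */
#define CHILD_STACK_SIZE (256 * 1024)

/* Hypothetical avfork()-like helper: run fn(arg) in a child process that
 * shares this process's memory.  CLONE_VFORK suspends only the calling
 * thread until the child calls execve() or exits; other threads keep
 * running.  Returns the child's pid, or -1 on failure. */
pid_t spawn_vfork_like(int (*fn)(void *), void *arg)
{
    assert(fn != NULL);

    void *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stacks grow downward on common architectures (x86-64, AArch64),
     * so clone() is given the highest address of the region. */
    pid_t pid = clone(fn, (char *)stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, arg);

    /* With CLONE_VFORK we resume only after the child has exec'd or
     * exited (or clone() failed), so the stack is no longer in use. */
    munmap(stack, CHILD_STACK_SIZE);
    return pid;
}

/* Demo callback: when fn returns, the child exits with its return value.
 * A real caller would set up file descriptors and call execve() here. */
int exit_with_code(void *arg)
{
    return *(const int *)arg;
}

/* Reap pid; return its exit status, or -1 if it did not exit normally. */
int wait_exit_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `int code = 5;`, calling `spawn_vfork_like(exit_with_code, &code)` and then `wait_exit_status()` on the result should yield 5. Any other threads in the parent keep running throughout, since CLONE_VFORK suspends only the thread that called clone().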
Now, to knock down my strawman rationales for stopping all threads in the
vfork() parent:
- Regarding (a), pre-threads vfork() man page text saying "stops the parent
process" should not be interpreted as meaning "all threads" now that we live
in a threaded world. The original authors could not have meant that, nor, for
that matter, could they have meant that only the thread that called vfork()
in the parent must be stopped: threads did not exist when they wrote it. We
must decide this matter de novo.
The thread that called vfork() clearly must be stopped until the child
execve()s or _exit()s. That much is beyond doubt, because two schedulable
entities simply cannot share a stack concurrently. So we only need to decide
whether other threads in the parent must also be stopped, and the original
man page text simply can't guide us there, as it predates threads.
- Regarding (b), it may already be the case that the set of functions that
may safely be called in the vfork() child is somewhat smaller than the set of
functions that may be called in a fork() child. Since POSIX has deprecated
vfork() (POSIX.1-2001 marked it obsolescent and POSIX.1-2008 removed it), we
don't know what that set is (though we can inspect earlier POSIX standards)
and may now define it to our liking.
In any case, the set of async-signal-safe functions defined by POSIX looks
like it should be safe to call in a vfork() child on any reasonable OS, since
all of them should be system calls that do not affect the shared address
space (or anything else that might still be shared between the child and the
parent).
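As an illustration of a vfork() child that stays within that async-signal-safe subset, here is a minimal popen()-style sketch: between vfork() and the exec, the child only calls close(), dup2(), execve() and _exit(). The helper name capture_output() and its buffer-based interface are assumptions for illustration; strictly speaking, a failing call in the child still writes errno, which lives in memory shared with the parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical popen()-style helper: run path with argv, capture up to
 * bufsz - 1 bytes of its standard output into buf (NUL-terminated), and
 * reap the child.  Returns the number of bytes captured, or -1 on error. */
ssize_t capture_output(const char *path, char *const argv[],
                       char *buf, size_t bufsz)
{
    assert(buf != NULL && bufsz > 0);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: only async-signal-safe calls until execve()/_exit(). */
        close(fds[0]);                      /* read end belongs to the parent */
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execve(path, argv, environ);
        _exit(127);                         /* never return into the parent's frame */
    }

    close(fds[1]);                          /* parent keeps only the read end */
    if (pid == -1) {
        close(fds[0]);
        return -1;
    }

    size_t total = 0;
    ssize_t n;
    while (total < bufsz - 1 &&
           (n = read(fds[0], buf + total, bufsz - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```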
As an aside, the child probably also should not change FD or FL flags
(F_SETFD/F_SETFL) with fcntl() on file descriptors shared with the parent. And
it should also not use the horrible POSIX file locking, though that's mostly
because nothing should use the horrible POSIX file locking! This aside
brought to you by intense feelings of disgust elicited by POSIX file locking.
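A small check of why changing FL flags in the child is risky: file status flags such as O_NONBLOCK belong to the open file description, which the parent and a vfork() child share, so a change made in the child is expected to be visible in the parent afterwards. The function name is hypothetical, and this sketch covers F_SETFL only, not F_SETFD.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: after a vfork() child sets O_NONBLOCK on an inherited
 * pipe descriptor, does the parent see the flag?  Returns 1 if yes, 0 if
 * no, -1 on error.  Status flags live in the shared open file description,
 * so 1 is the expected answer. */
int child_fl_change_visible_in_parent(void)
{
    int fds[2];
    int result = -1;

    if (pipe(fds) == -1)
        return -1;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags != -1) {
        assert((flags & O_NONBLOCK) == 0);  /* a fresh pipe starts blocking */

        pid_t pid = vfork();
        if (pid == 0) {
            /* Child: a single async-signal-safe fcntl(), then _exit(). */
            fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);
            _exit(0);
        }
        if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
            int now = fcntl(fds[0], F_GETFL);
            if (now != -1)
                result = (now & O_NONBLOCK) != 0;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```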
Note that there are a number of functions NOT INCLUDED in the standard list
of async-signal-safe functions:
- brk(), sbrk(), mmap(), munmap(), mprotect()
- the heap allocator (quite naturally, since it might need to call
brk()/sbrk() and/or mmap()/munmap(), or pthread_*() functions, none of
which are async-signal-safe)
which means that the scariest functions one might call on the child side of
vfork() are by definition (see, e.g., the old POSIX vfork() specification)
already unsafe to call there.
In any case, again, NetBSD is free to further narrow the set of functions
that are safe to call in the child-side of vfork() should that be necessary.
The biggest problem with vfork(), really, is that unsafe signal handlers in
the parent might run in the child before the child can block signals. This
could be bad even if all threads in the parent are stopped.
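One common mitigation, sketched here under the assumption that the caller wants a plain vfork() plus execve() spawn, is to block all signals in the calling thread before vfork(), have the child reset any caught signals to SIG_DFL while they are still blocked, and only then restore the mask and exec. spawn_signal_safe() is a hypothetical name. The child writes a couple of locals on the borrowed stack, which is outside what the old POSIX vfork() text allowed, so treat this as a practical sketch rather than strictly portable code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() and W* macros for callers */
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn path/argv so that no signal handler installed
 * by the parent can run in the vfork() child.  Returns the pid or -1. */
pid_t spawn_signal_safe(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    sigset_t all, old;
    sigfillset(&all);
    /* Block everything in this thread; the child inherits the mask. */
    if (pthread_sigmask(SIG_BLOCK, &all, &old) != 0)
        return -1;

    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: reset caught signals to SIG_DFL while still blocked.
         * Dispositions are per process, so the parent is unaffected. */
        for (int sig = 1; sig < NSIG; sig++) {
            struct sigaction sa;
            if (sigaction(sig, NULL, &sa) == 0 &&
                sa.sa_handler != SIG_DFL && sa.sa_handler != SIG_IGN) {
                sa.sa_handler = SIG_DFL;
                sa.sa_flags = 0;
                sigemptyset(&sa.sa_mask);
                sigaction(sig, &sa, NULL);
            }
        }
        /* Only now is it safe to restore the caller's original mask. */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execve(path, argv, environ);
        _exit(127);
    }

    /* Parent: restore the mask whether or not vfork() succeeded. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```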
Indeed, I would argue that the set of functions that are safe to call in an
asynchronous signal handler (as opposed to the child-side of fork() or vfork())
is smaller than what POSIX says. The only things I ever do in the signal
handlers I write are:
- write to volatile sig_atomic_t variables
- call write(2) to write a single byte into a pipe that is used in the
application's event loop
If the application does not have an event loop, I sometimes ensure that
there's a thread blocked in read(2) on the other side of that pipe.
- call write(2) to write to stderr
- call _exit(2)
If I had my way, those would be the only actions POSIX allowed in signal
handlers! (And then we'd have to give a new name to the async-signal-safe
function set, which we reuse to define the functions that are safe to call in
various other contexts, such as the child side of fork()!)
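The handler discipline listed above (set a flag, then write one byte to a pipe that an event loop or a dedicated thread reads) is the classic self-pipe trick. Here is a minimal sketch; the names got_signal, wake_pipe, on_signal() and install_self_pipe_handler() are hypothetical, and making the write end non-blocking is a design choice so that the handler can never block on a full pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Flag written by the handler; volatile sig_atomic_t is the type the C
 * standard sanctions for this (alongside lock-free atomics). */
static volatile sig_atomic_t got_signal = 0;
/* Pipe whose read end an event loop or a dedicated thread would watch. */
static int wake_pipe[2] = {-1, -1};

static void on_signal(int sig)
{
    (void)sig;
    int saved_errno = errno;            /* write(2) may clobber errno */
    got_signal = 1;
    /* One byte; if the pipe is full the write fails with EAGAIN, which is
     * fine because the reader already has bytes to wake up on. */
    ssize_t r = write(wake_pipe[1], "x", 1);
    (void)r;
    errno = saved_errno;
}

/* Create the wake-up pipe and install on_signal for sig.
 * Returns 0 on success, -1 on error. */
int install_self_pipe_handler(int sig)
{
    assert(sig > 0);
    if (pipe(wake_pipe) == -1)
        return -1;

    /* Non-blocking write end so the handler can never block. */
    int fl = fcntl(wake_pipe[1], F_GETFL);
    if (fl == -1 || fcntl(wake_pipe[1], F_SETFL, fl | O_NONBLOCK) == -1)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, NULL);
}
```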
Thanks for taking the time to read this -- it's probably too long, and I
apologize for that. If I'm wrong about something here, please let me know!