NetBSD-Bugs archive


Re: kern/49017: vfork does not suspend all threads

The following reply was made to PR kern/49017; it has been noted by GNATS.

From: Nico Williams <>
To: <>
Subject: Re: kern/49017: vfork does not suspend all threads
Date: Wed, 5 Apr 2017 18:42:25 +0000

 Please do NOT stop any threads in the vfork() parent other than the one that
 called vfork().  Also, allow me to make an argument for this.
 First, I suppose I should look at rationales for stopping all threads in a
 vfork() parent.  I can think of two (but if I'm missing some, let me know!): a)
 that the man page always said that the parent process is stopped, ergo it must
 now mean "all threads in the parent process", and b) that the set of safe
 functions to call in the vfork() child is made unacceptably smaller by not
 stopping all threads in the parent.
 Before refuting my strawman rationales for stopping all threads, I'll explain
 why stopping all threads is highly undesirable: it kills performance, the very
 reason for vfork()'s existence.
 There are several ways to use vfork() to spawn children in a high-performance
 manner:
  - First, obviously, in posix_spawn().
    It would be terrible to have to stop all of a JVM's many threads just to
    spawn a child, and would negate some of vfork()'s massive performance
    advantage over fork().
    Why should unrelated threads in the parent suffer?  (This gets to the safety
    issues which I posit might motivate stopping all parent threads, and which I
    address below.)  Even if there were a strong safety argument for this, we
    should aim to make it go away as the performance rationale for using vfork()
    is extremely important in real life cases.
    (I should point out that, for example, Linux's vfork() does not stop all
    other threads in the parent.  I can provide a test program that demonstrates
    this.)
  - Second, one can implement a very fast popen()-like API that uses a threaded
    taskq where threads pre-vfork(), enabling a program to spawn processes
    faster than with posix_spawn(): without blocking for the child to spin up
    then execve()-or-_exit() -- the threads that pre-call vfork() block that
    way, but the threads that dispatch the requests to the pre-vfork()ed
    children do not block at all, they only call write(2) to write the job to a
    pipe to the child.
    See my gist about this where I describe this in detail and propose a new
    function with this signature:
         pid_t avfork(int (*start_routine)(void *), void *arg);
    and provide a partial implementation based on a pre-vforking threaded taskq:
    There is such a very fast popen()-like implementation here (warning: GPLv3)
    that uses clone() on Linux to get something very much like the avfork() that
    I argue for.  Its author needs to be able to spawn thousands of processes
    very quickly sometimes (see
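
 The claim in the first item above, that other threads in the parent keep
 running while a vfork() child is alive, can be checked with a small program.
 The following is only a sketch under stated assumptions: the names
 ticks_during_vfork() and bystander() are made up for illustration, the child
 idles in poll() (which is on the POSIX async-signal-safe list), and the result
 depends entirely on the OS it runs on.

```c
#define _DEFAULT_SOURCE /* assumption: makes glibc declare vfork(); harmless elsewhere */
#include <poll.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_long ticks;     /* incremented by the bystander thread */
static atomic_int stop_flag;  /* set to 1 to make the bystander exit */

/* Hypothetical bystander: a thread unrelated to the one calling vfork(). */
static void *bystander(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_flag))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/*
 * Returns how many times the bystander incremented the counter while the
 * vfork() child was alive, or -1 on error.  A large value suggests that
 * only the calling thread was suspended; a value near zero suggests that
 * the whole process was stopped.
 */
long ticks_during_vfork(int child_ms)
{
    pthread_t tid;
    long before, after;
    pid_t pid;

    atomic_store(&ticks, 0);
    atomic_store(&stop_flag, 0);
    if (pthread_create(&tid, NULL, bystander, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0)
        sched_yield();            /* wait until the bystander is running */

    before = atomic_load(&ticks);
    pid = vfork();
    if (pid == 0) {
        poll(NULL, 0, child_ms);  /* child: idle for a while, then leave */
        _exit(0);
    }
    after = atomic_load(&ticks);  /* parent resumes only after the _exit() */

    atomic_store(&stop_flag, 1);
    pthread_join(tid, NULL);
    if (pid < 0)
        return -1;
    waitpid(pid, NULL, 0);
    return after - before;
}
```

 Calling ticks_during_vfork(50) and printing the result is enough to compare
 systems.  Older toolchains may need -pthread when linking.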
 Now, to knock down my strawman rationales for stopping all threads in the
 vfork() parent:
  - Regarding (a), pre-threads vfork() man page text saying "stops the parent
    process" should not be interpreted as meaning "all threads" now that we have
    a threaded world.  Clearly the original authors could not have meant that,
    nor for that matter would they have meant that only the thread that called
    vfork() in the parent must be stopped.  We must decide this matter de novo.
    Clearly the thread that called vfork() must be stopped until the child
    execve()s or _exit()s.  That much is utterly clear, because two schedulable
    threads/entities simply cannot share a stack concurrently.  So we only need
    to decide whether other threads in the parent must also be stopped, and the
    original man page text simply can't guide us on that, as it predates threads.
  - Regarding (b), it may already be the case that the set of functions that may
    safely be called in the vfork() child is somewhat smaller than the set of
    functions that may be called in a fork() child.  Since POSIX has deprecated
    vfork(), we don't know what that set is (though we can inspect earlier POSIX
    standards) and may now define it to our liking.
    In any case, the set of async-signal-safe functions defined by POSIX looks
    like it should be safe to call in a vfork() child on any reasonable OS since
    all of them should be system calls that do not affect the shared address
    space (or anything else that might still be shared between the child and the
    parent).
    As an aside, the child should probably also avoid changing FD or FL flags
    with fcntl() for file descriptors shared with the parent.  And
    it should also not use the horrible POSIX file locking, though that's mostly
    because nothing should use the horrible POSIX file locking!  This aside
    brought to you by intense feelings of disgust elicited by POSIX file locking.
    Note that there are a number of functions NOT INCLUDED in the standard list
    of async-signal-safe functions:
     - pthread_*()
     - brk(), sbrk(), mmap(), munmap(), mprotect()
     - the heap allocator (quite naturally, since it might need to call
       brk()/sbrk() and/or mmap()/munmap(), or pthread_*() functions, none of
       which are async-signal-safe)
    which means that the scariest functions one might call on the child-side of
    vfork() are by definition (e.g., the old POSIX vfork() specification)
    already not safe to call on the child-side of vfork().
    In any case, again, NetBSD is free to further narrow the set of functions
    that are safe to call in the child-side of vfork() should that be necessary.
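
 To make the point about the restricted function set concrete, here is a
 minimal sketch of a vfork() child that calls nothing but execve() and _exit(),
 both of which are on the async-signal-safe list.  The name run_sh() and the
 use of /bin/sh are assumptions made for the example; everything the child
 needs is prepared in the parent before vfork() is called.

```c
#define _DEFAULT_SOURCE /* assumption: makes glibc declare vfork(); harmless elsewhere */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: runs "sh -c cmd" and returns its exit status, or -1
 * if the child could not be created or did not exit normally.
 */
int run_sh(const char *cmd)
{
    /* Built before vfork() so the child has nothing to allocate or compute. */
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child side: only async-signal-safe calls, and never return. */
        execve("/bin/sh", argv, environ);
        _exit(127);               /* reached only if execve() failed */
    }

    /* Parent side: this thread resumes once the child has exec'd or exited. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 For example, run_sh("exit 7") returns 7 when /bin/sh exists.  Note that the
 child uses _exit() rather than exit(), so no atexit handlers or stdio
 buffers belonging to the parent are touched.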
 The biggest problem with vfork(), really, is that unsafe signal handlers in the
 parent might run in the child before the child can block them.  This could be bad even
 if all threads in the parent are stopped.
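
 One common way to narrow that window, sketched here as an assumption rather
 than as anything NetBSD does or should do, is to block all signals in the
 calling thread before vfork(), have the child reset caught signals to their
 defaults before restoring the mask, and restore the mask in the parent
 afterwards.  The names run_sh_masked() and reset_caught_signals() are
 hypothetical, NSIG is assumed to be provided by <signal.h>, and a threaded
 program would use pthread_sigmask() in the parent instead of sigprocmask().

```c
#define _DEFAULT_SOURCE /* assumption: makes glibc declare vfork() and NSIG */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Child-side helper: put every caught signal back to its default action. */
static void reset_caught_signals(void)
{
    struct sigaction sa;
    int sig;

    for (sig = 1; sig < NSIG; sig++) {
        if (sigaction(sig, NULL, &sa) != 0)
            continue;             /* not a usable signal number */
        if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
            continue;             /* nothing that could run in the child */
        sa.sa_handler = SIG_DFL;
        sa.sa_flags = 0;
        sigaction(sig, &sa, NULL);
    }
}

/*
 * Runs "sh -c cmd" with all signals blocked across vfork().  Returns the
 * exit status, 128 + signal number if the child was killed, or -1 on error.
 */
int run_sh_masked(const char *cmd)
{
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    sigset_t all, saved;
    int status;
    pid_t pid;

    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    pid = vfork();
    if (pid == 0) {
        /* No parent handler can run here: everything is still blocked. */
        reset_caught_signals();
        sigprocmask(SIG_SETMASK, &saved, NULL);
        execve("/bin/sh", argv, environ);
        _exit(127);
    }

    sigprocmask(SIG_SETMASK, &saved, NULL);  /* parent: restore its mask */
    if (pid < 0)
        return -1;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return WIFSIGNALED(status) ? 128 + WTERMSIG(status) : -1;
}
```

 The child restores the original mask before execve() because the signal mask
 is inherited across exec; leaving everything blocked would start the new
 program with all signals masked.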
 Indeed, I would argue that the set of functions that are safe to call in an
 asynchronous signal handler (as opposed to the child-side of fork() or vfork())
 is smaller than that which POSIX says.  The only things I ever do in the signal
 handlers I write are:
  - write to sig_atomic_t variables
  - call write(2) to write a single byte into a pipe that is used in the
    application's event loop
    If the application does not have an event loop I do sometimes ensure that
    there's a thread blocking on read(2) on the other side of that pipe.
  - call write(2) to write to stderr
  - call _exit(2)
 If I had my way those would be the only things I'd allow in signal handlers in
 POSIX!  (And then we'd have to give a new name to the async-signal-safe
 function set that we reuse to define the functions that are safe to call in
 various other contexts such as the child-side of fork()!)
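
 For readers who have not seen the pipe technique from the list above, here is
 a minimal sketch of such a handler.  The names on_signal(), wake_fd and
 self_pipe_demo() are made up for the example, and the demo raises SIGUSR1 at
 itself instead of running a real event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal; /* flag written by the handler */
static int wake_fd = -1;                 /* write end of the pipe */

/* The handler only sets a flag and writes a single byte to the pipe. */
static void on_signal(int sig)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)sig;

    got_signal = 1;
    if (write(wake_fd, &byte, 1) < 0) {
        /* nothing safe to do about a failed write in a handler */
    }
    errno = saved_errno;
}

/*
 * Installs the handler for SIGUSR1, raises the signal, then reads the byte
 * back the way an event loop would.  Returns the signal number read from
 * the pipe, or -1 on failure.
 */
int self_pipe_demo(void)
{
    int fds[2];
    struct sigaction sa, old;
    unsigned char byte = 0;
    int result = -1;

    if (pipe(fds) != 0)
        return -1;
    wake_fd = fds[1];
    got_signal = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    if (sigaction(SIGUSR1, &sa, &old) == 0) {
        raise(SIGUSR1);           /* handler runs before raise() returns */
        if (got_signal && read(fds[0], &byte, 1) == 1)
            result = byte;
        sigaction(SIGUSR1, &old, NULL);
    }

    close(fds[0]);
    close(fds[1]);
    wake_fd = -1;
    return result;
}
```

 In a real program the read end would be registered with the event loop (or
 read by a dedicated thread), and the write end would typically be made
 non-blocking so the handler can never stall on a full pipe.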
 Thanks for taking the time to read this -- it's probably too long, and I
 apologize about that.  If I'm wrong about something here, please let me know!
