tech-kern archive


RFC: import of posix_spawn GSoC results

Hi folks,

During this year's Summer of Code, Charles Zhang and I implemented the
posix_spawn syscall. We are now at a point where, after some further minor
cleanup and debugging, it is ready to commit. The main changes are fairly
local, but to avoid code duplication a few of the existing file operations
had to be modified, and this causes changes literally all over the tree.

Let's look at it from a userland perspective first: what use is posix_spawn?
Historically, the fork/exec model used in Unix has been quite efficient,
and later variants of it (vfork, which some call a hack) have made it even
more efficient. Nowadays, with many multithreaded applications, neither
fits well. So posix_spawn was invented, and it is really simple to use.
A minimal test program is:

#include <stdio.h>
#include <string.h>
#include <spawn.h>

int main(int argc, char **argv)
{
        pid_t child = 0;
        int err;
        extern char **environ;
        char * const cav[3] = { "ls", "-l", NULL };

        printf("trying to spawn /bin/ls\n");
        /* on success, the new process's pid is stored in child */
        err = posix_spawn(&child, "/bin/ls", NULL, NULL, cav, environ);
        printf("err: %d, child: %d\n", err, (int)child);

        return 0;
}

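Two things the minimal program glosses over: posix_spawn() reports failure through its return value (an errno-style number) rather than returning -1 and setting errno, and nobody reaps the child. A small sketch of a helper that spawns, waits and hands back the exit status; the name spawn_and_wait and its -1 error convention are made up for illustration:

```c
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: spawn path with argv, wait for it and return its
 * exit status, or -1 if spawning or waiting failed.
 */
static int
spawn_and_wait(const char *path, char *const argv[])
{
	pid_t child;
	int status;
	/* 0 on success, otherwise an error number; errno is not used. */
	int err = posix_spawn(&child, path, NULL, NULL, argv, environ);

	if (err != 0) {
		fprintf(stderr, "posix_spawn %s: %s\n", path, strerror(err));
		return -1;
	}
	/* Reap the child so it does not linger as a zombie. */
	while (waitpid(child, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, spawn_and_wait("/bin/sh", argv) with argv = { "sh", "-c", "exit 3", NULL } returns 3.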
If you don't want to hardcode /bin/ls, the posix_spawnp() variant is available,
which uses the PATH environment variable to locate the binary:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <spawn.h>

int main(int argc, char **argv)
{
        pid_t child = 0;
        int err;
        extern char **environ;
        char * const cav[3] = { "ls", "-l", NULL };

        printf("trying to spawn any ls\n");
        /* like posix_spawn(), but "ls" is looked up via $PATH */
        err = posix_spawnp(&child, "ls", NULL, NULL, cav, environ);
        printf("err: %d, child: %d\n", err, (int)child);

        return 0;
}

This is implemented as a simple userland wrapper in libc, while posix_spawn
itself is a real system call.
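To show what "a simple userland wrapper" can look like, here is a sketch of a PATH search on top of posix_spawn(). It is not the libc code that will be committed: the name my_spawnp, the fixed buffer size and the fallback PATH are assumptions, and details such as an ENOEXEC shell fallback are left out.

```c
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Sketch of a PATH-searching wrapper around posix_spawn(), in the spirit
 * of posix_spawnp().  Hypothetical; no ENOEXEC/shell fallback.
 */
static int
my_spawnp(pid_t *pid, const char *file,
    const posix_spawn_file_actions_t *fa, const posix_spawnattr_t *attr,
    char *const argv[], char *const envp[])
{
	char buf[4096];			/* assumed to be large enough */
	const char *path, *p, *end;
	int err = ENOENT;

	/* A name containing a slash is used as given, as execvp() does. */
	if (strchr(file, '/') != NULL)
		return posix_spawn(pid, file, fa, attr, argv, envp);

	if ((path = getenv("PATH")) == NULL)
		path = "/usr/bin:/bin";	/* assumed fallback */

	for (p = path; ; p = end + 1) {
		size_t len;
		int n;

		if ((end = strchr(p, ':')) == NULL)
			end = p + strlen(p);
		len = (size_t)(end - p);

		/* An empty PATH element means the current directory. */
		if (len == 0)
			n = snprintf(buf, sizeof buf, "./%s", file);
		else
			n = snprintf(buf, sizeof buf, "%.*s/%s", (int)len, p, file);

		if (n > 0 && (size_t)n < sizeof buf) {
			err = posix_spawn(pid, buf, fa, attr, argv, envp);
			/* Stop on success or on anything but "not found here". */
			if (err != ENOENT && err != ENOTDIR && err != EACCES)
				return err;
		}
		if (*end == '\0')
			break;
	}
	return err;	/* last error seen while searching */
}
```

The search itself is pure string handling, which is why it can live in libc while only the actual spawn needs the kernel.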

There are a few more advanced things you can do by passing the arguments
that are NULL in the simplistic examples above: they make the kernel close
or dup file handles, adjust scheduler parameters, etc. In short, everything
you would have done in the child process after a (v)fork. However, you do not
have to go through cloning the VM space and making manual userland adjustments
before the exec; the kernel does all that for you.
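As an illustration of those arguments, a sketch using the standard POSIX file-action and attribute calls: the child's stdout is redirected into a pipe and it is put into its own process group, which is what a fork() child would otherwise do by hand with dup2(), close() and setpgid(). The helper name run_capture and its return convention are hypothetical.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run path with its stdout fed into a pipe, read up
 * to bufsz-1 bytes into buf (NUL-terminated).  Returns bytes read or -1.
 */
static ssize_t
run_capture(const char *path, char *const argv[], char *buf, size_t bufsz)
{
	posix_spawn_file_actions_t fa;
	posix_spawnattr_t attr;
	int fds[2], err, status;
	pid_t child;
	ssize_t n, total = 0;

	if (bufsz == 0 || pipe(fds) == -1)
		return -1;

	posix_spawn_file_actions_init(&fa);
	/* In the child: the pipe's write end becomes stdout ... */
	posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
	/* ... and both original pipe descriptors are closed. */
	posix_spawn_file_actions_addclose(&fa, fds[0]);
	posix_spawn_file_actions_addclose(&fa, fds[1]);

	posix_spawnattr_init(&attr);
	/* Put the child in a new process group (0 = use the child's pid). */
	posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETPGROUP);
	posix_spawnattr_setpgroup(&attr, 0);

	err = posix_spawn(&child, path, &fa, &attr, argv, environ);
	posix_spawn_file_actions_destroy(&fa);
	posix_spawnattr_destroy(&attr);
	close(fds[1]);		/* the parent keeps only the read end */
	if (err != 0) {
		close(fds[0]);
		return -1;
	}

	while (total < (ssize_t)bufsz - 1 &&
	    (n = read(fds[0], buf + total, bufsz - 1 - (size_t)total)) > 0)
		total += n;
	buf[total] = '\0';
	close(fds[0]);		/* a child still writing gets SIGPIPE */
	waitpid(child, &status, 0);
	return total;
}
```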

Now let's look at the kernel changes: posix_spawn was implemented in a way
that tries to avoid code duplication. Basically it breaks down what used
to be done in execve1() in NetBSD into two parts: execve_loadvm(), which
handles all the VM space setup, and execve_runproc(), which deals with
the exec part. Therefore the patch turns execve1() into:

+execve1(struct lwp *l, const char *path, char * const *args,
+    char * const *envs, execve_fetch_element_t fetch_element)
+{
+       struct execve_data data;
+       int error;
+
+       error = execve_loadvm(l, path, args, envs, fetch_element, &data);
+       if (error)
+               return error;
+       error = execve_runproc(l, &data);
+       return error;
+}

The struct execve_data holds everything that used to be common local variables
in execve1(), but now needs to be stored explicitly in a shared structure, as
posix_spawn() does the execve_loadvm() part in the parent process, but the
execve_runproc() part inside the freshly created lwp.
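A userland toy model of why the locals had to move into a struct: once the two phases may run in different contexts, the only thing phase 2 can see is what phase 1 saved. Every name below is hypothetical, and fork() merely stands in for "the freshly created lwp"; this mirrors the shape of the split, not the kernel code.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for struct execve_data. */
struct model_exec_data {
	char path[256];		/* resolved by the load phase */
	int argc;		/* counted by the load phase */
};

/* Phase 1, cf. execve_loadvm(): gather everything the exec needs. */
static int
model_loadvm(const char *path, char *const args[], struct model_exec_data *d)
{
	if (strlen(path) >= sizeof d->path)
		return -1;
	strcpy(d->path, path);
	for (d->argc = 0; args[d->argc] != NULL; d->argc++)
		continue;
	return 0;
}

/* Phase 2, cf. execve_runproc(): only sees the saved struct. */
static int
model_runproc(const struct model_exec_data *d)
{
	return d->argc;		/* stand-in for starting the new image */
}

/* execve-style: both phases in the calling context. */
static int
model_execve(const char *path, char *const args[])
{
	struct model_exec_data d;

	if (model_loadvm(path, args, &d) != 0)
		return -1;
	return model_runproc(&d);
}

/* spawn-style: phase 1 in the parent, phase 2 in a new process. */
static int
model_spawn(const char *path, char *const args[])
{
	struct model_exec_data d;
	pid_t pid;
	int status;

	if (model_loadvm(path, args, &d) != 0)
		return -1;
	if ((pid = fork()) == -1)
		return -1;
	if (pid == 0)
		_exit(model_runproc(&d) & 0xff);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Both callers share model_loadvm() and model_runproc(), which is the point of the split: no code is duplicated between the execve and spawn paths.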

So far this was all easy. There were a few unexpected stragglers, since we
create a fresh new VM space instead of a full-grown clone of the old one;
basically a few more checks for NULL here and there. The only prominent one
is in elf_load_file(), where we need to decide between a top-down and a
bottom-up memory layout. This used to be cloned from the parent VM space, but
with posix_spawn we do not have a relevant old VM space around, so we just
go with the default:

+       if (p->p_vmspace)
+               use_topdown = p->p_vmspace->vm_map.flags & VM_MAP_TOPDOWN;
+       else
+#ifdef __USE_TOPDOWN_VM
+               use_topdown = true;
+#else
+               use_topdown = false;
+#endif

This could be considered a hack and I'm open to better suggestions.
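The decision above boils down to "inherit if there is something to inherit, otherwise use the platform default". A sketch of that rule as a standalone function, with hypothetical stand-ins for the vmspace type and the VM_MAP_TOPDOWN flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vmspace and map flag. */
#define MODEL_VM_MAP_TOPDOWN	0x1

struct model_vmspace {
	unsigned flags;
};

/*
 * Inherit the layout from an existing address space if there is one;
 * otherwise (the posix_spawn case) fall back to the platform default.
 */
static bool
choose_topdown(const struct model_vmspace *vm, bool platform_default)
{
	if (vm != NULL)
		return (vm->flags & MODEL_VM_MAP_TOPDOWN) != 0;
	return platform_default;
}
```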

So what made the patch so intrusive? Since posix_spawn, before doing the
exec part, needs to manipulate file handles on behalf of the user's request
for the new process, we need to pass an lwp argument to a few of the
existing file descriptor manipulation functions.

This is the change to the header:

Index: sys/sys/filedesc.h
RCS file: /cvsroot/src/sys/sys/filedesc.h,v
retrieving revision 1.61
diff -c -u -p -r1.61 filedesc.h
--- sys/sys/filedesc.h  26 Jun 2011 16:43:12 -0000      1.61
+++ sys/sys/filedesc.h  18 Dec 2011 23:41:37 -0000
@@ -181,10 +181,11 @@ struct proc;
  * Kernel global variables and routines.
  */
 void   fd_sys_init(void);
-int    fd_dupopen(int, int *, int, int);
+int    fd_open(lwp_t *, const char *, int, int, int *);
+int    fd_dupopen(lwp_t *, int, int *, int, int);
 int    fd_alloc(struct proc *, int, int *);
 void   fd_tryexpand(struct proc *);
-int    fd_allocfile(file_t **, int *);
+int    fd_allocfile(lwp_t *, file_t **, int *);
 void   fd_affix(struct proc *, file_t *, unsigned);
 void   fd_abort(struct proc *, file_t *, unsigned);
 filedesc_t *fd_copy(void);
@@ -192,19 +193,19 @@ filedesc_t *fd_init(filedesc_t *);
 void   fd_share(proc_t *);
 void   fd_hold(lwp_t *);
 void   fd_free(void);
-void   fd_closeexec(void);
+void   fd_closeexec(lwp_t *);
 void   fd_ktrexecfd(void);
 int    fd_checkstd(void);
-file_t *fd_getfile(unsigned);
+file_t *fd_getfile(lwp_t *, unsigned);
 file_t *fd_getfile2(proc_t *, unsigned);
-void   fd_putfile(unsigned);
+void   fd_putfile(lwp_t *, unsigned);
 int    fd_getvnode(unsigned, file_t **);
 int    fd_getsock(unsigned, struct socket **);
 void   fd_putvnode(unsigned);
 void   fd_putsock(unsigned);
-int    fd_close(unsigned);
-int    fd_dup(file_t *, int, int *, bool);
-int    fd_dup2(file_t *, unsigned, int);
+int    fd_close(lwp_t *, unsigned);
+int    fd_dup(lwp_t *, file_t *, int, int *, bool);
+int    fd_dup2(lwp_t *, file_t *, unsigned, int);
 int    fd_clone(file_t *, unsigned, int, const struct fileops *, void *);
 void   fd_set_exclose(struct lwp *, int, bool);
 int    pipe1(struct lwp *, register_t *, int);

and this made it all really nasty: every emulation, and everything touching
file handles in the kernel, had to be adjusted. The code changes were mostly
mechanical and of course trivial, but I am sure there will be fallout. We
will, of course, fix it ASAP after the commit.
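A toy model of what the extra argument buys: with the lwp passed explicitly, the caller chooses whose descriptor table an operation applies to, instead of it always being the current one. The types, the table size and the implicit wrapper below are hypothetical simplifications, not the real filedesc code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for proc, lwp and filedesc. */
#define MODEL_NFD 8

struct model_proc {
	bool open[MODEL_NFD];	/* which descriptors are in use */
};

struct model_lwp {
	struct model_proc *l_proc;
};

static struct model_lwp *model_curlwp;	/* the lwp making the call */

/* New style: the caller names the lwp whose table is modified. */
static int
model_fd_close(struct model_lwp *l, unsigned fd)
{
	struct model_proc *p = l->l_proc;

	if (fd >= MODEL_NFD || !p->open[fd])
		return EBADF;
	p->open[fd] = false;
	return 0;
}

/* Old style as a thin wrapper: always acts on the current lwp. */
static int
model_fd_close_cur(unsigned fd)
{
	return model_fd_close(model_curlwp, fd);
}
```

With this shape, the mechanical change at each call site is just passing the lwp the caller already has.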

There are a few test cases for the new syscalls, but they are not yet included
in the files below. Cleanup and conversion of the test cases to ATF is not
fully done yet.

Benchmark results are being prepared. Preliminary results show no noticeable
performance difference (any difference would have to be blamed on splitting
execve1() into two functions). I believe the final benchmark results will show
no effect here either, but we will check before committing.

You can find the new files needed for this at:

and the diff for the existing files at:

So, please have a look, all comments welcome.

