NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
> Date: Mon, 07 Apr 2025 02:00:06 +0700
> From: Robert Elz <kre%munnari.OZ.AU@localhost>
>
> Date: Sun, 6 Apr 2025 14:20:01 +0000 (UTC)
> From: campbell+netbsd%mumble.net@localhost
> Message-ID: <20250406142001.10B3B1A923E%mollari.NetBSD.org@localhost>
>
>
> | Various test cases in tests/lib/librumpclient/t_exec have
> | been intermittently failing for a while:
>
> Probably forever. I took a look at this one in particular a
> while ago, there looks to be a fairly obvious race condition in
> the test - it vforks, then the child and parent each set argv[0]
> and re-exec the test file again - the test case looks to see that
> both processes have a shared socket open - or something like that.
>
> But that test happens when the parent exits, simply assuming that by
> that time the child will have had time to establish its setup (since,
> being vfork(), it has to exec() or exit() before the parent gets
> control back from the vfork()), and often, that works, but not always.
> When it doesn't, when the test code (the script) looks to see the state
> of the sockets it isn't what it is expecting.
The sequence of events is something like this:
1. vfork()
2. vfork returns in child
3. child sets childran=1
4. child sends HANDSHAKE_FORK to rump_server
5. server sets up file descriptors
6. child receives HANDSHAKE_FORK reply from rump_server
7. child calls rumpclient_exec
8. child calls execve (no communication with rump_server)
At this point, the child and parent run in parallel:
9(a). child calls rumpclient_init -> lwproc_execnotify to tell
rump_server its p_comm (and whatever else, like closing
O_CLOEXEC descriptors)
9(b). vfork returns in parent, parent exits, test runs rump.sockstat
The test fails if 9(b) runs before 9(a) so rump.sockstat still shows
the old p_comm rather than the new p_comm.
We can ensure these are sequenced, preserving the non-rumpy vfork(2)
semantics, by creating a pipe shared between parent and child. The
attached patch implements this.
(I have not been able to reproduce the failure at all in a VM on my
laptop, though, after thousands of trials, so I can't confirm it
eliminates the symptom.)
That said, I'm not entirely sure that p_comm access is _guaranteed_ to
be ready by the time a vforked execve(2) wakes the parent. There is a
similar question about psstrings with posix_spawn:
https://gnats.netbsd.org/59175
But it's not really that costly to add this additional logic to
rumpclient to dispense with the question altogether; it's more for
testing and experiments than performance.
# HG changeset patch
# User Taylor R Campbell <riastradh%NetBSD.org@localhost>
# Date 1744029106 0
# Mon Apr 07 12:31:46 2025 +0000
# Branch trunk
# Node ID a51ca88c1069b743be56446e0cc8b6e102719bec
# Parent cce289282f347391206ac4392309e42775902465
# EXP-Topic riastradh-pr59255-rumpexectests
librumpclient: Make rumpclient_vfork wait for child to finish exec.
`Finish exec' here means making it into rumpclient_init past
lwproc_execnotify.
No change to rumpclient_fork, other than reserving thread-local
storage for one int object to pass a file descriptor from
rumpclient_vfork to rumpclient_exec.
PR misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
diff -r cce289282f34 -r a51ca88c1069 distrib/sets/lists/base/shl.mi
--- a/distrib/sets/lists/base/shl.mi Mon Apr 07 01:54:02 2025 +0000
+++ b/distrib/sets/lists/base/shl.mi Mon Apr 07 12:31:46 2025 +0000
@@ -90,7 +90,7 @@
./lib/libradius.so.5.0 base-sys-shlib dynamicroot
./lib/librumpclient.so base-sys-shlib dynamicroot,rump
./lib/librumpclient.so.0 base-sys-shlib dynamicroot,rump
-./lib/librumpclient.so.0.0 base-sys-shlib dynamicroot,rump
+./lib/librumpclient.so.0.1 base-sys-shlib dynamicroot,rump
./lib/librumpres.so base-sys-shlib dynamicroot,rump
./lib/librumpres.so.0 base-sys-shlib dynamicroot,rump
./lib/librumpres.so.0.0 base-sys-shlib dynamicroot,rump
diff -r cce289282f34 -r a51ca88c1069 distrib/sets/lists/debug/shl.mi
--- a/distrib/sets/lists/debug/shl.mi Mon Apr 07 01:54:02 2025 +0000
+++ b/distrib/sets/lists/debug/shl.mi Mon Apr 07 12:31:46 2025 +0000
@@ -29,7 +29,7 @@
./usr/libdata/debug/lib/libprop.so.1.2.debug comp-sys-debug debug,dynamicroot
./usr/libdata/debug/lib/libpthread.so.1.5.debug comp-sys-debug debug,dynamicroot
./usr/libdata/debug/lib/libradius.so.5.0.debug comp-sys-debug debug,dynamicroot
-./usr/libdata/debug/lib/librumpclient.so.0.0.debug comp-rump-debug debug,dynamicroot,rump
+./usr/libdata/debug/lib/librumpclient.so.0.1.debug comp-rump-debug debug,dynamicroot,rump
./usr/libdata/debug/lib/librumpres.so.0.0.debug comp-rump-debug debug,dynamicroot,rump
./usr/libdata/debug/lib/libterminfo.so.2.0.debug comp-sys-debug debug,dynamicroot
./usr/libdata/debug/lib/libumem.so.0.0.debug comp-zfs-debug debug,dynamicroot,zfs
diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/rumpclient.c
--- a/lib/librumpclient/rumpclient.c Mon Apr 07 01:54:02 2025 +0000
+++ b/lib/librumpclient/rumpclient.c Mon Apr 07 12:31:46 2025 +0000
@@ -83,6 +83,7 @@
#define HOSTOPS
int (*host_socket)(int, int, int);
+int (*host_socketpair)(int, int, int, int *);
int (*host_close)(int);
int (*host_connect)(int, const struct sockaddr *, socklen_t);
int (*host_fcntl)(int, int, ...);
@@ -121,9 +122,13 @@ static struct spclient clispc = {
static int holyfd = -1;
static sigset_t fullset;
+static __thread int waitforexecnotifyparentfd = -1;
+
static int doconnect(void);
static int handshake_req(struct spclient *, int, void *, int, bool);
+static void waitforexec_notify(int);
+
/*
* Default: don't retry. Most clients can't handle it
* (consider e.g. fds suddenly going missing).
@@ -864,6 +869,7 @@ rumpclient_init(void)
int error;
int rv = -1;
int hstype;
+ int notifyparentfd;
pid_t mypid;
/*
@@ -911,6 +917,7 @@ rumpclient_init(void)
FINDSYM(socket)
#endif
+ FINDSYM(socketpair)
FINDSYM(close)
FINDSYM(connect)
FINDSYM(fcntl)
@@ -960,7 +967,8 @@ rumpclient_init(void)
goto out;
if ((p = getenv("RUMPCLIENT__EXECFD")) != NULL) {
- sscanf(p, "%d,%d", &clispc.spc_fd, &holyfd);
+ sscanf(p, "%d,%d,%d", &clispc.spc_fd, &holyfd,
+ ¬ifyparentfd);
unsetenv("RUMPCLIENT__EXECFD");
hstype = HANDSHAKE_EXEC;
} else {
@@ -980,6 +988,12 @@ rumpclient_init(void)
}
rv = 0;
+ /*
+ * Notify the parent that exec has completed, in case it is
+ * waiting on vfork.
+ */
+ waitforexec_notify(notifyparentfd);
+
out:
if (rv == -1)
init_done = 0;
@@ -1023,6 +1037,129 @@ rumpclient_prefork(void)
return rpf;
}
+#define WAITFOREXECFD_PARENT 0
+#define WAITFOREXECFD_CHILD 1
+
+/*
+ * rumpclient_waitforexec_prepare(waitforexecfd)
+ *
+ * Called from the parent before forking when the parent wants to
+ * wait for exec to complete in the child, for vfork(2) semantics.
+ * Initialize execfd[0] and execfd[1] with file descriptors for
+ * use with rumpclient_waitforexec_parent/child (or
+ * rumpclient_waitforexec_cancel).
+ */
+int
+rumpclient_waitforexec_prepare(int waitforexecfd[2])
+{
+
+ return host_socketpair(PF_LOCAL, SOCK_STREAM, 0, waitforexecfd);
+}
+
+/*
+ * rumpclient_waitforexec_cancel(waitforexecfd)
+ *
+ * Called from the parent after rumpclient_waitforexec_prepare if
+ * fork/vfork fails.
+ */
+void
+rumpclient_waitforexec_cancel(int waitforexecfd[2])
+{
+
+ (void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
+ (void)host_close(waitforexecfd[WAITFOREXECFD_CHILD]);
+}
+
+/*
+ * rumpclient_waitforexec_child(waitforexecfd)
+ *
+ * Called by the child between rumpclient_vfork and
+ * rumpclient_exec with the fds created by
+ * rumpclient_waitforexec_prepare.
+ */
+void
+rumpclient_waitforexec_child(int waitforexecfd[2])
+{
+
+ /*
+ * Close the parent's fd -- we don't need it any more.
+ */
+ (void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
+
+ /*
+ * Record the fd we will use to notify the parent after exec,
+ * passed through the RUMPCLIENT__EXECFD environment variable.
+ *
+ * We are running as the single thread of a child process, but
+ * we still share address space with the parent, so it's really
+ * multithreaded. waitforexecnotifyparentfd is a thread-local
+ * variable to obviate any need for serialization here.
+ */
+ waitforexecnotifyparentfd = waitforexecfd[WAITFOREXECFD_CHILD];
+}
+
+/*
+ * rumpclient_waitforexec_parent(waitforexecfd)
+ *
+ * Called by the parent after rumpclient_vfork to wait for the
+ * child to call rumpclient_exec, or exit.
+ */
+void
+rumpclient_waitforexec_parent(int waitforexecfd[2])
+{
+ char c;
+
+ /*
+ * Close the child's end for writing to us so we won't hang
+ * forever if the child exits without writing anything. Next,
+ * read from our end to wait until the child has written or
+ * exited.
+ */
+ (void)host_close(waitforexecfd[WAITFOREXECFD_CHILD]);
+ (void)host_read(waitforexecfd[WAITFOREXECFD_PARENT], &c, 1);
+
+ /*
+ * Close our end now that we've read from it, and out of
+ * paranoia, clear it out of waitforexecnotifyparentfd.
+ *
+ * We must wait until _after_ the read to clear out
+ * waitforexecnotifyparentfd because the child shares address
+ * space until it execs.
+ */
+ (void)host_close(waitforexecfd[WAITFOREXECFD_PARENT]);
+ waitforexecnotifyparentfd = -1;
+}
+
+/*
+ * waitforexec_notify(notifyparentfd)
+ *
+ * Called by the child after exec. Wakes the parent waiting in
+ * rumpclient_waitforexec_parent, if any.
+ */
+static void
+waitforexec_notify(int notifyparentfd)
+{
+ char b = 0;
+ struct iovec iov = { .iov_base = &b, .iov_len = 1 };
+ struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
+
+ /*
+ * If there's no notifyparentfd, because the child was created
+ * with fork rather than vfork, nothing to do.
+ */
+ if (notifyparentfd == -1)
+ return;
+
+ /*
+ * Parent is waiting for exec to complete. Notify them that
+ * exec has completed -- but if the parent has died, don't
+ * raise SIGPIPE; just move on. After this we have no more
+ * need of the connection, so close the fd.
+ */
+ (void)host_sendmsg(notifyparentfd, &msg, MSG_NOSIGNAL);
+ (void)host_close(notifyparentfd);
+}
+
int
rumpclient_fork_init(struct rumpclient_fork *rpf)
{
@@ -1148,7 +1285,7 @@ pid_t
rumpclient_fork(void)
{
- return rumpclient__dofork(fork);
+ return rumpclient__dofork(fork, /*waitforexec*/0);
}
/*
@@ -1166,8 +1303,8 @@ rumpclient_exec(const char *path, char *
size_t nelem;
int rv, sverrno;
- snprintf(buf, sizeof(buf), "RUMPCLIENT__EXECFD=%d,%d",
- clispc.spc_fd, holyfd);
+ snprintf(buf, sizeof(buf), "RUMPCLIENT__EXECFD=%d,%d,%d",
+ clispc.spc_fd, holyfd, waitforexecnotifyparentfd);
envstr = malloc(strlen(buf)+1);
if (envstr == NULL) {
return ENOMEM;
diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/rumpclient.h
--- a/lib/librumpclient/rumpclient.h Mon Apr 07 01:54:02 2025 +0000
+++ b/lib/librumpclient/rumpclient.h Mon Apr 07 12:31:46 2025 +0000
@@ -48,7 +48,7 @@ typedef RUMP_REGISTER_T register_t;
struct rumpclient_fork;
-#define rumpclient_vfork() rumpclient__dofork(vfork)
+#define rumpclient_vfork() rumpclient__dofork(vfork, /*waitforexec*/1)
#ifdef __BEGIN_DECLS
__BEGIN_DECLS
@@ -63,6 +63,10 @@ struct rumpclient_fork *rumpclient_prefo
int rumpclient_fork_init(struct rumpclient_fork *);
void rumpclient_fork_cancel(struct rumpclient_fork *);
void rumpclient_fork_vparent(struct rumpclient_fork *);
+int rumpclient_waitforexec_prepare(int[2]);
+void rumpclient_waitforexec_cancel(int[2]);
+void rumpclient_waitforexec_child(int[2]);
+void rumpclient_waitforexec_parent(int[2]);
pid_t rumpclient_fork(void);
int rumpclient_exec(const char *, char *const [], char *const[]);
@@ -86,21 +90,31 @@ int rumpclient__closenotify(int *, enum
* run in the caller's stackframe.
*/
static __attribute__((__always_inline__)) __returns_twice inline pid_t
-rumpclient__dofork(pid_t (*forkfn)(void))
+rumpclient__dofork(pid_t (*forkfn)(void), int waitforexec)
{
struct rumpclient_fork *rf;
pid_t pid;
int childran = 0;
+ int waitforexecfd[2];
if (!(rf = rumpclient_prefork()))
return -1;
-
+ if (waitforexec) {
+ if (rumpclient_waitforexec_prepare(waitforexecfd) == -1) {
+ rumpclient_fork_cancel(rf);
+ return -1;
+ }
+ }
switch ((pid = forkfn())) {
case -1:
+ if (waitforexec)
+ rumpclient_waitforexec_cancel(waitforexecfd);
rumpclient_fork_cancel(rf);
break;
case 0:
childran = 1;
+ if (waitforexec)
+ rumpclient_waitforexec_child(waitforexecfd);
if (rumpclient_fork_init(rf) == -1)
pid = -1;
break;
@@ -108,6 +122,8 @@ rumpclient__dofork(pid_t (*forkfn)(void)
/* XXX: multithreaded vforker? do they exist? */
if (childran)
rumpclient_fork_vparent(rf);
+ if (waitforexec)
+ rumpclient_waitforexec_parent(waitforexecfd);
break;
}
diff -r cce289282f34 -r a51ca88c1069 lib/librumpclient/shlib_version
--- a/lib/librumpclient/shlib_version Mon Apr 07 01:54:02 2025 +0000
+++ b/lib/librumpclient/shlib_version Mon Apr 07 12:31:46 2025 +0000
@@ -1,4 +1,4 @@
# $NetBSD: shlib_version,v 1.1 2010/11/04 21:01:29 pooka Exp $
#
major=0
-minor=0
+minor=1
Home |
Main Index |
Thread Index |
Old Index