NetBSD-Bugs archive
Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
The following reply was made to PR misc/59255; it has been noted by GNATS.
From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc:
Subject: Re: misc/59255: tests/lib/librumpclient/t_exec: intermittent failures
Date: Mon, 07 Apr 2025 22:56:03 +0700
Date: Mon, 7 Apr 2025 12:55:01 +0000 (UTC)
From: "Taylor R Campbell via gnats" <gnats-admin%NetBSD.org@localhost>
Message-ID: <20250407125501.54C371A923C%mollari.NetBSD.org@localhost>
| The sequence of events is something like this:
| 9(b). vfork returns in parent, parent exits, test runs rump.sockstat
As I recall, it (9(b)) is a little more complicated, but those are just
incidental details; in essence, exactly that, and
| The test fails if 9(b) runs before 9(a) so rump.sockstat still shows
| the old p_comm rather than the new p_comm.
Yes, that was my conclusion. On most systems this is probably rare: the
child is already using the CPU, and would normally just keep on running,
while the parent has been sleeping and needs to get itself scheduled.
That's likely why you can't make it fail in local tests. b5 is something
of an unusual environment - I haven't attempted to look, but it could be
that the probability of failure is higher when b5 is simultaneously
doing several other parallel builds/test runs when the test is run, and
much less likely to fail when it is (for b5) relatively idle (or even
perhaps vice versa).
| We can ensure these are sequenced, preserving the non-rumpy vfork(2)
| semantics, by creating a pipe shared between parent and child. The
| attached patch implements this.
That's one way - what's needed is some way for the child to inform the
parent that it has completed its task, and is ready for the script to
test the results. A pipe can achieve that, so could sending a (caught)
signal from the child to the parent (which would not require any kind of
detour via rump). There are other, more heavyweight, possibilities.
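
For illustration only, here is a minimal sketch of the pipe variant of that
handshake. It is not the patch attached to the PR, and it uses plain fork()
with no rump involvement; the function name spawn_and_wait_ready() and the
one-byte 'r' token are hypothetical. The child writes one byte once its task
is done and then keeps running, and the parent does not return until it has
read that byte.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, let it run child_task() (may be NULL),
 * and return the child's pid only after the child has reported that the
 * task is complete.  The child stays alive afterwards, as a long-lived
 * test helper would.  Returns -1 on failure.
 */
pid_t
spawn_and_wait_ready(void (*child_task)(void))
{
	int fds[2];
	char token = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) == -1)
		return -1;

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: do the work, then report readiness with one byte. */
		close(fds[0]);
		if (child_task != NULL)
			child_task();
		token = 'r';
		if (write(fds[1], &token, 1) != 1)
			_exit(1);
		close(fds[1]);
		/* Stay around until the caller kills us. */
		for (;;)
			pause();
	}

	/* Parent: close the write end so read() sees EOF if the child dies. */
	close(fds[1]);
	do {
		n = read(fds[0], &token, 1);
	} while (n == -1 && errno == EINTR);
	close(fds[0]);

	if (n != 1 || token != 'r') {
		/* No readiness byte arrived: clean up the child and fail. */
		(void)kill(pid, SIGKILL);
		(void)waitpid(pid, NULL, 0);
		return -1;
	}
	return pid;
}
```

A caller would inspect whatever state the child set up only after this
returns, and then kill() and waitpid() the returned pid. The caught-signal
variant would replace the read() with a sigsuspend() loop in the parent and
the write() with a kill(getppid(), ...) in the child.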
But before doing any of that I think we really need to understand the
purpose of the test. If it is to test that sockstat can get owner info
from sockets, that can be done with a much simpler test. If it is to
test that vfork works, that can also be done with a much simpler test
(the one that is there now would be satisfied by fork() instead, I believe,
whereas we need vfork() to have vfork() properties and not just be
fork()). If it is to test that exec passes args that can be parsed, that
can also be done with a much simpler test.
I just cannot fathom what the test is actually testing. Without that
what ought be done to it remains mysterious, and is why I just gave up
on looking at it.
| That said, I'm not entirely sure that p_comm access is _guaranteed_ to
| be ready by the time a vforked execve(2) wakes the parent.
Aside from the race condition above, I didn't look further, so that may
indeed also be an issue.
| But it's not really that costly to add this additional logic to
| rumpclient to dispense with the question altogether; it's more for
| testing and experiments than performance.
Yes, and while keeping the test runtime on b5 down to something reasonable
(ie: not adding anything not really necessary for a test) is a good thing,
it's hard to see any changes here making any material difference.
kre