NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: bin/47524: make(1) loops in Job_CatchOutput()
The following reply was made to PR bin/47524; it has been noted by GNATS.
From: Andreas Gustafsson <gson%gson.org@localhost>
To: dsl%NetBSD.org@localhost
Cc: gnats-bugs%NetBSD.org@localhost,
christos%NetBSD.org@localhost
Subject: Re: bin/47524: make(1) loops in Job_CatchOutput()
Date: Tue, 5 Feb 2013 23:46:07 +0200
David Laight wrote:
> > But in the timeout case, poll() will return 0, not < 0, therefore
> > Job_CatchChildren() will in fact *not* be called when there is a
> > timeout. It will, however, be called if the poll() system call
> > fails. Should the "nready < 0" have been "nready == 0"?
>
> Quite possibly, I think I was just reducing the number of times
> Job_CatchChildren() would be called unnecessarily.
>
> The only time this all goes awry is when the number of jobs gets near
> to the number of bytes that can be written into a pipe.
> The 'job' pipe will just lose credit, you'd have to be very unluky to
> lose anywhere else.
I'm afraid I can't share your confidence in that being the only
remaining bug. I also think that bug is unrelated to the problem
at hand.
I think I now know how the infinite loop happened: poll() was
interrupted by a signal (probably a SIGCHILD) and returned nready= -1
with errno=EINTR. As a result of your addition of the "nready < 0" in
"if (nready < 0 || readyfd(&childExitJob))", a read() was then
attempted on the childExitJob pipe even though the select had not
indicated it was ready. Since the pipe was not ready, the read()
failed with EAGAIN, which used to be harmlessly ignored until Christos
turned it into an infinite loop.
I propose the following change, where the return values from both the
poll() and the read() are actually checked in the sense of reporting
unexpected errors instead of just looping on EAGAIN and silently
ignoring all other errors. I have successfully tested it with one
native -j4 build and one -j12 cross-build from a Linux host.
The other loops Christos added should probably also be replaced
by something similar.
Index: job.c
===================================================================
RCS file: /cvsroot/src/usr.bin/make/job.c,v
retrieving revision 1.168
diff -u -r1.168 job.c
--- job.c 2 Feb 2013 15:24:08 -0000 1.168
+++ job.c 5 Feb 2013 08:26:03 -0000
@@ -139,6 +139,7 @@
#include <sys/time.h>
#include <sys/wait.h>
+#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#ifndef USE_SELECT
@@ -2040,22 +2041,34 @@
{
int nready;
Job *job;
- int i, n;
+ int i;
(void)fflush(stdout);
/* The first fd in the list is the job token pipe */
- nready = poll(fds + 1 - wantToken, nfds - 1 + wantToken, POLL_MSEC);
-
- if (nready < 0 || readyfd(&childExitJob)) {
- char token = 0;
- nready -= 1;
- for (n = 0; n < 5 &&
- read(childExitJob.inPipe, &token, 1) == -1 && errno == EAGAIN; n++)
- continue;
- if (token == DO_JOB_RESUME[0])
- /* Complete relay requested from our SIGCONT handler */
- JobRestartJobs();
+ do {
+ nready = poll(fds + 1 - wantToken, nfds - 1 + wantToken, POLL_MSEC);
+ } while (nready < 0 && errno == EINTR);
+ if (nready < 0)
+ Punt("poll: %s", strerror(errno));
+
+ if (nready == 0 || readyfd(&childExitJob)) {
+ if (nready > 0) {
+ char token = 0;
+ int count;
+ nready -= 1;
+ count = read(childExitJob.inPipe, &token, 1);
+ if (count < 0) {
+ Punt("token pipe read: %s", strerror(errno));
+ } else if (count == 0) {
+ Punt("unexpected eof on token pipe");
+ } else { /* count > 0 */
+ assert(count == 1);
+ if (token == DO_JOB_RESUME[0])
+ /* Complete relay requested from our SIGCONT handler */
+ JobRestartJobs();
+ }
+ }
Job_CatchChildren();
}
--
Andreas Gustafsson, gson%gson.org@localhost
Home |
Main Index |
Thread Index |
Old Index