NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: bin/47524: make(1) loops in Job_CatchOutput()



The following reply was made to PR bin/47524; it has been noted by GNATS.

From: Andreas Gustafsson <gson%gson.org@localhost>
To: dsl%NetBSD.org@localhost
Cc: gnats-bugs%NetBSD.org@localhost,
    christos%NetBSD.org@localhost
Subject: Re: bin/47524: make(1) loops in Job_CatchOutput()
Date: Tue, 5 Feb 2013 23:46:07 +0200

 David Laight wrote:
 >  >  But in the timeout case, poll() will return 0, not < 0, therefore
 >  >  Job_CatchChildren() will in fact *not* be called when there is a
 >  >  timeout.  It will, however, be called if the poll() system call
 >  >  fails.  Should the "nready < 0" have been "nready == 0"?
 >  
 >  Quite possibly, I think I was just reducing the number of times
 >  Job_CatchChildren() would be called unnecessarily.
 >  
 >  The only time this all goes awry is when the number of jobs gets near
 >  to the number of bytes that can be written into a pipe.
 >  The 'job' pipe will just lose credit, you'd have to be very unluky to
 >  lose anywhere else.
 
 I'm afraid I can't share your confidence in that being the only
 remaining bug.  I also think that bug is unrelated to the problem
 at hand.
 
 I think I now know how the infinite loop happened: poll() was
 interrupted by a signal (probably a SIGCHILD) and returned nready= -1
 with errno=EINTR.  As a result of your addition of the "nready < 0" in
 "if (nready < 0 || readyfd(&childExitJob))", a read() was then
 attempted on the childExitJob pipe even though the select had not
 indicated it was ready.  Since the pipe was not ready, the read()
 failed with EAGAIN, which used to be harmlessly ignored until Christos
 turned it into an infinite loop.
 
 I propose the following change, where the return values from both the
 poll() and the read() are actually checked in the sense of reporting
 unexpected errors instead of just looping on EAGAIN and silently
 ignoring all other errors.  I have successfully tested it with one
 native -j4 build and one -j12 cross-build from a Linux host.
 
 The other loops Christos added should probably also be replaced
 by something similar.
 
 Index: job.c
 ===================================================================
 RCS file: /cvsroot/src/usr.bin/make/job.c,v
 retrieving revision 1.168
 diff -u -r1.168 job.c
 --- job.c      2 Feb 2013 15:24:08 -0000       1.168
 +++ job.c      5 Feb 2013 08:26:03 -0000
 @@ -139,6 +139,7 @@
  #include <sys/time.h>
  #include <sys/wait.h>
  
 +#include <assert.h>
  #include <errno.h>
  #include <fcntl.h>
  #ifndef USE_SELECT
 @@ -2040,22 +2041,34 @@
  {
      int nready;
      Job *job;
 -    int i, n;
 +    int i;
  
      (void)fflush(stdout);
  
      /* The first fd in the list is the job token pipe */
 -    nready = poll(fds + 1 - wantToken, nfds - 1 + wantToken, POLL_MSEC);
 -
 -    if (nready < 0 || readyfd(&childExitJob)) {
 -      char token = 0;
 -      nready -= 1;
 -      for (n = 0; n < 5 &&
 -          read(childExitJob.inPipe, &token, 1) == -1 && errno == EAGAIN; n++)
 -              continue;
 -      if (token == DO_JOB_RESUME[0])
 -          /* Complete relay requested from our SIGCONT handler */
 -          JobRestartJobs();
 +    do {
 +      nready = poll(fds + 1 - wantToken, nfds - 1 + wantToken, POLL_MSEC);
 +    } while (nready < 0 && errno == EINTR);
 +    if (nready < 0)
 +      Punt("poll: %s", strerror(errno));
 +
 +    if (nready == 0 || readyfd(&childExitJob)) {
 +      if (nready > 0) {
 +          char token = 0;
 +          int count;
 +          nready -= 1;
 +          count = read(childExitJob.inPipe, &token, 1);
 +          if (count < 0) {
 +              Punt("token pipe read: %s", strerror(errno));
 +          } else if (count == 0) {
 +              Punt("unexpected eof on token pipe");
 +          } else { /* count > 0 */
 +              assert(count == 1);
 +               if (token == DO_JOB_RESUME[0])
 +                   /* Complete relay requested from our SIGCONT handler */
 +                   JobRestartJobs();
 +          }
 +      }
        Job_CatchChildren();
      }
  
 -- 
 Andreas Gustafsson, gson%gson.org@localhost
 


Home | Main Index | Thread Index | Old Index