Subject: bin/18895: make -j pauses between jobs
To: None <gnats-bugs@gnats.netbsd.org>
From: None <gson@gson.org>
List: netbsd-bugs
Date: 11/02/2002 10:22:42
>Number:         18895
>Category:       bin
>Synopsis:       make -j pauses between jobs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 02 10:23:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Andreas Gustafsson
>Release:        NetBSD 1.6I
>Organization:
Speaking for myself
>Environment:
System: NetBSD guava.araneus.fi 1.6I NetBSD 1.6I (GUAVAMP) #1: Sun Oct 6 19:38:50 PDT 2002 gson@guava.araneus.fi:/usr/src/sys/arch/i386.sp/compile/GUAVAMP i386
Architecture: i386
Machine: i386
>Description:

When running build.sh -j N where N >= 2, the build sometimes pauses
for several seconds, leaving the CPU(s) idle.

I am seeing this behavior on a dual AMD Athlon 1800 with "-j 2", and
Julio Merino <jmmv@menta.net> reported seeing similar behavior on a
uniprocessor with "-j 4".  See the discussion under the subject
"Strange behavior with build.sh -j 4" on current-users.

When this happens, the make process is sleeping in the poll() call in
/usr/src/usr.bin/make/job.c, and the poll() only returns when its
timeout expires after five seconds.

My theory as to the cause of this problem is that there is a race
condition in make where it fails to detect a job exiting if it happens
so quickly that the SIGCHLD gets delivered before it enters poll().
It is also possible that kern/17517 could have something to do with
this.

>How-To-Repeat:

Run "build.sh -j 2" on a fast machine.  Observe the pauses in the make
output, or run "vmstat 1" in a different window and observe periods
of 100% idle.  Optionally, increase the value of POLL_MSEC in
/usr/src/usr.bin/make/job.h to 60000 or so first to make the pauses
even more noticeable.

>Fix:

Make the SIGCHLD signal handler write a message to a pipe which is
included in the poll(), and/or fix kern/17517.
>Release-Note:
>Audit-Trail:
>Unformatted:
 (-current as of Oct 20, 2002)