Subject: kern/14658: wait() hangs forever
To: None <gnats-bugs@gnats.netbsd.org>
From: Martin Husemann <martin@aprisoft.de>
List: netbsd-bugs
Date: 11/21/2001 09:15:27
>Number:         14658
>Category:       kern
>Synopsis:       wait() hangs forever
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 21 00:19:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     Martin Husemann
>Release:        NetBSD 1.5Y
>Organization:
	
>Environment:

System: NetBSD beasty.aprisoft.de 1.5Y NetBSD 1.5Y (BEASTY) #0: Tue Nov 20 17:25:21 CET 2001 martin@beasty.aprisoft.de:/usr/src/sys-mp/arch/i386/compile/BEASTY i386
Architecture: i386
Machine: i386
>Description:

A ./build.sh run left a process hung on exit (or its parent failed to
wake up):

UID   PID  PPID CPU PRI NI   VSZ RSS WCHAN STAT TT    TIME COMMAND
  0 18815 18814  30  10  0   116   4 wait  IW+  p1 0:00.00 /usr/tools/bin/nbgroff -Tascii -mtt
  0 18816 18815  36  64  0     0   0 -     Z+   p1 0:00.00 (troff)

It stayed like this overnight; the machine otherwise works OK.

Killing the zombie (obviously) does not work; killing the parent (18815)
works and wakes up its own parent in turn (making the make fail).

>How-To-Repeat:

Do a full build on an SMP machine. This happens often, but not every time.
I cannot seem to reproduce it with options LOCKDEBUG enabled, so I guess
either LOCKDEBUG hides some bug in the locking code or the problem is
timing critical. This is a dual PII@400MHz machine with 256 MB RAM; no
swapping is involved.
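One way one might try to provoke this outside of build.sh is a tight fork/exit/waitpid loop, run as several parallel copies on an SMP machine. The sketch below is hypothetical and not part of the original report: `run_fork_wait_stress` is an illustrative name, and the approach assumes the hang lies somewhere in the exit/wait path, which the report does not establish.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stress helper, not from the original report: fork
 * `iterations` children that exit immediately and reap each one with
 * waitpid().  Returns the number of children reaped with the expected
 * exit status, or -1 if fork() or waitpid() fails.
 */
static int run_fork_wait_stress(int iterations)
{
    int reaped = 0;

    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* Child: exit at once, leaving a zombie for the parent. */
            _exit(i & 0x7f);
        }

        /* Parent: block until this specific child has been reaped,
         * retrying if a signal interrupts the wait. */
        int status;
        pid_t got;
        do {
            got = waitpid(pid, &status, 0);
        } while (got == -1 && errno == EINTR);
        if (got == -1) {
            perror("waitpid");
            return -1;
        }

        /* Count the child only if it exited with the status it was given. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == (i & 0x7f))
            reaped++;
    }
    return reaped;
}
```

On a healthy kernel each call returns `iterations`; on an affected one a copy might instead sit in the `wait` channel next to a zombie child, as in the ps output above. Since the hang is intermittent and apparently timing dependent, several copies running concurrently (for example alongside a build) are presumably more likely to hit it than a single one.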

>Fix:
No clue, sorry.
>Release-Note:
>Audit-Trail:
>Unformatted:
 
 i386 kernel, cvs updated yesterday, arch/i386 from the sommerfeld SMP branch.