Subject: kern/2061: Uninterruptible processes.
To: None <gnats-bugs@NetBSD.ORG>
From: David Gilbert <dgilbert@jaywon.pci.on.ca>
List: netbsd-bugs
Date: 02/10/1996 19:57:44
>Number:         2061
>Category:       kern
>Synopsis:       processes get stuck in uninterruptible disk wait
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 10 20:20:08 1996
>Last-Modified:
>Originator:     David Gilbert
>Organization:
----------------------------------------------------------------------------
|David Gilbert, PCI, Richmond Hill, Ontario.  | Two things can only be     |
|Mail:      dgilbert@jaywon.pci.on.ca         |  equal if and only if they |
|http://www.pci.on.ca/~dgilbert               |   are precisely opposite.  |
---------------------------------------------------------GLO----------------
>Release:        1.1
>Environment:
	
System: NetBSD repeat 1.1A NetBSD 1.1A (REPEAT) #37: Fri Feb 9 23:31:14 EST 1996 root@:/u/dgilbert/src/sys/arch/sparc/compile/REPEAT sparc


>Description:
	It is possible that you might want to reassign this to
port-sparc, but I thought that I'd give it a more general audience
first.  It may also be related to the clock-stopping problem on the
sparc, but not exclusively so; I think the two are related in cause
only (a race condition).

	What happens is that processes (usually, but not always, very
busy ones) get stuck in uninterruptible disk wait (flag D in
ps -ax).  They never leave this state and are immune to kill -9,
even from root.

	Typical examples are various parts of cnews, but I have had
pppd get stuck there, too.  I have also had other ordinary stock
NetBSD executables (such as find) get stuck there.
>How-To-Repeat:
	Run a full newsfeed :).
>Fix:
	I suspect that this is some form of race condition, but I
could be wrong.


>Audit-Trail:
>Unformatted: