Subject: kern/529: processes hanging in D state forever
To: None <gnats-admin@sun-lamp.cs.berkeley.edu>
From: None <danielce@ee.mu.oz.au>
List: netbsd-bugs
Date: 10/20/1994 01:50:06
>Number:         529
>Category:       kern
>Synopsis:       processes hanging in D state forever
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    gnats-admin (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 20 01:50:04 1994
>Originator:     daniel carosone
>Organization:
bozo software foundation
>Release:        
>Environment:
	
System: NetBSD oink 1.0_BETA NetBSD 1.0_BETA (_oink3_) #16: Fri Sep 23 10:20:45 EST 1994 dan@oink:/home/c/l/NetBSD/src/sys/arch/sparc/compile/_oink3_ sparc


>Description:


I've mentioned this before, some time ago.  The problem seemed to go
away, so I figured it had been fixed, but it's come back again
recently.

Processes that do a lot of filesystem activity, such as a find or a
make of the world, will sometimes get stuck in the D state,
forever. They're obviously unkillable. It seems to happen only after
the machine has been up for some time, which may explain why I thought
the problem had gone away, since average uptime over the past months
has been lower than usual.

The rest of the processes on the machine are fine.

Here's the current list of processes that look to be stuck:


dan      19753  0.0  0.0   348  792 p0  D    Tue08PM    0:00.90 make all
root     21067  0.0  0.0   108  496 ??  D    Wed02AM    0:02.95 find /lfs/f -xdev ( ( -type f ( -perm -u+s -or -perm -g+s ) ) -or -
dan      23445  0.0  0.0   168  588 p0  D+    8:02PM    0:03.82 find / -name pgp -print
root     23941  0.0  0.0   108  496 ??  D     2:02AM    0:01.28 find /lfs/f -xdev ( ( -type f ( -perm -u+s -or -perm -g+s ) ) -or -
dan      12065  0.0  0.0   284  740 p1  D+    5:48PM    0:00.40 make
dan      12366  0.0  0.0   236  708 p1  D+    5:51PM    0:00.33 make all
dan      12379  0.0  0.0   640 1192 p3  Ds+   6:02PM    0:01.19 -tcsh (tcsh)

(/lfs/* are not LFS filesystems, just a bad choice of name that I think
I'll change soon)

The makes are attempts to rebuild the world that got stuck (also in
/lfs/f in this case, though that hasn't always been where it happens).
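
If wait-channel information would help, I can pull it while the machine
is still up; something like this (just the stock long ps listing, which
includes a WCHAN column) should show what each of the stuck processes is
sleeping on:

# WCHAN column shows what each process is sleeping on
ps -axlww | head -1
ps -axlww | egrep '19753|21067|23445|23941|12065|12366|12379'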

Once it sets in, it seems to get worse; witness this attempt to see
whether a specific inode was causing the trouble:

dan@oink [18:09][36]/lfs/f> find .
.
./lost+found
./a
./l
^C^C^C^C

^C^C
^\^\^\

Sources were current when the kernel was last built, the date shown in
the version string above. The machine has been up 20 days.

After a reboot, if it follows past patterns, it will be fine again for
a while. I'm leaving it up in case there's some useful information I
can get out of it for you, though processes are hanging left and right
by now. Sorry, there's no DDB in the running kernel.
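
In the meantime I can still gather kernel table and memory statistics
from userland if that's any use; for example (assuming vmstat and pstat
behave as in the stock tree):

vmstat -m	# kernel dynamic memory usage
pstat -T	# used/free slots in the file, inode and swap tables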

One other possibly relevant piece of information: I'm running amd on
the machine now, serving /home. /lfs is where local file systems get
mounted, and /home/? is a link entry on the same host. However, I
wasn't running amd when I first saw these problems (4 months ago?),
nor did I have any NFS mounts; there were some NFS exports, though.
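
For what it's worth, the /home amd map is just link-type entries
pointing back at the local disks, roughly like this (the paths here are
illustrative, not my exact map):

# /home/<key> is a symlink to a local directory on the same host; no NFS mount involved
*	type:=link;fs:=/lfs/f/${key}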

>How-To-Repeat:

I suppose this doesn't happen to anyone else. I don't do anything
special to provoke it; it just happens.

>Fix:
 
Cry for help! :)
>Audit-Trail:
>Unformatted: