Subject: uninterruptible sleep with NIS + cron jobs
To: None <current-users@netbsd.org>
From: Antti Kantee <pooka@iki.fi>
List: current-users
Date: 11/21/2000 14:29:11
We had a short break in NIS services yesterday (a few hours maybe, who
knows). Today I noticed that I had lots of cron processes stuck in
pairs, one waiting at ppwait and the other one at netio:

   0  3665     1   0  10   0   456     0 ppwait   DW   ?? 0:00.00 /USR/SBIN/CRO
   0  3667  3665   0   2   0   456     0 netio    IWVs ?? 0:00.00 /USR/SBIN/CRO
   0  4007     1   0  10   0   456     0 ppwait   DW   ?? 0:00.00 /USR/SBIN/CRO
   0  4008  4007   0   2   0   456     0 netio    IWVs ?? 0:00.00 /USR/SBIN/CRO
   0  4069     1   0  10   0   456     0 ppwait   DW   ?? 0:00.00 /USR/SBIN/CRO
   0  4070  4069   0   2   0   456     0 netio    IWVs ?? 0:00.00 /USR/SBIN/CRO

just to list a few (the PPID is 1 for some of them because I killed the
master cron process while trying to sort this out). Here's the fstat
output from one pair:

USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     cron        4007   wd /var      21504 drwxr-xr-x     512 r 
root     cron        4007    0 /            22 crw-rw-rw-    null rw
root     cron        4007    1 /            22 crw-rw-rw-    null rw
root     cron        4007    2 /            22 crw-rw-rw-    null rw
root     cron        4007    3* unix stream fffffe00000ea380 <-> fffffe00000a7e00
root     cron        4007    4* unix stream fffffe00000a7e00 <-> fffffe00000ea380
root     cron        4007    5* unix stream fffffe00000eaf80 <-> fffffe00000ea500
root     cron        4007    6* unix stream fffffe00000ea500 <-> fffffe00000eaf80
root     cron        4008   wd /var      21504 drwxr-xr-x     512 r 
root     cron        4008    0* unix stream fffffe00000ea380 <-> fffffe00000a7e00
root     cron        4008    1* unix stream fffffe00000ea500 <-> fffffe00000eaf80
root     cron        4008    2* unix stream fffffe00000ea500 <-> fffffe00000eaf80
root     cron        4008    3 /          8089 -rw-r--r--     315 r 

And here's a snippet from the netstat output, even if it only serves to
consume bandwidth:

Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
...
fffffc0001ddfd48 stream   4092      0        0 fffffe00000ea500        0 0

It looks like this shouldn't be happening, so does anyone have any
clever ideas (besides send-pr)? I can try to get a ddb trace of the
processes if that's required.

This is NetBSD 1.5J with the UBC patches, on Alpha hardware.
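
For anyone who wants to check their own machine for the same pattern,
here is a minimal sketch that pairs up such processes from a long-format
process listing. It assumes the column order seen in the listing above
(UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND); the
function names are made up for illustration and it only reports the
pairs, it does not explain them.

```python
def parse_ps_line(line):
    """Split one long-format ps line into (pid, ppid, wchan), or None.

    Assumed column order (not verified against every ps variant):
    UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
    """
    fields = line.split()
    # Skip header lines and anything too short to hold all columns.
    if len(fields) < 13 or not fields[1].isdigit() or not fields[2].isdigit():
        return None
    return int(fields[1]), int(fields[2]), fields[8]


def find_stuck_pairs(lines):
    """Return sorted (parent_pid, child_pid) pairs where the parent
    sleeps on ppwait and its child sleeps on netio."""
    procs = [p for p in map(parse_ps_line, lines) if p is not None]
    waiting_parents = {pid for pid, _, wchan in procs if wchan == "ppwait"}
    return sorted(
        (ppid, pid)
        for pid, ppid, wchan in procs
        if wchan == "netio" and ppid in waiting_parents
    )
```

Passing the six lines listed earlier to find_stuck_pairs() should give
one (parent, child) tuple per pair; lines that do not match the assumed
layout are silently skipped.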

-- 
Antti Kantee <pooka@iki.fi>          v          Of course he runs NetBSD
http://www.iki.fi/pooka/             i            http://www.NetBSD.org/