Subject: NFS accesses in NetBSD/i386 1.2 hanging
To: None <port-i386@NetBSD.ORG>
From: Greg Earle <earle@isolar.Tujunga.CA.US>
List: port-i386
Date: 05/26/1997 16:39:36

I've got 3 NetBSD boxes in my office at work: a SPARCstation 20/71 running
NetBSD 1.2.1, a Mac IIci running NetBSD 1.2, and a Pentium 133 server box
running NetBSD 1.2.

The SPARCstation and the Mac are doing fine, but on the PC, processes that do
NFS accesses have started getting wedged in "D" state.

Ignoring the obvious (swapper and pagedaemon), it looks like this:

pcnetbsd4me# ps -axlww | egrep D
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0    0 schedu DLs  ??    0:00.37 (swapper)
    0     2     0   0 -18  0     0    0 thrd_s DL   ??    0:00.00 (pagedaemon)
 1282   771   745   0   2  0   140  416 netio  D    ??    0:00.52 calendar -a 
    0   773    78   0   2  0   252  188 netio  D    ??    0:00.47 amd -l syslog -x error,noinfo,nostats -p -a /tmp_mnt /home /etc/amd/home /pkg /etc/amd/pkg /cd /etc/amd/cdrom /project /etc/amd/project 
    0  1502  1487   0  -1  0    52  288 nfsrcv D    ??    0:00.29 / (find)
    0  1896   133   0  -1  0   380  420 nfsrcv D    ??    0:00.03 rshd 
    0  1898  1897   1  10  0   328  272 ppwait Ds   p0    0:00.12 -csh (csh)
    0  1917  1898   0  28  0   328    0 -      RV   p0    0:00.00 egrep D (csh)
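
(The "egrep D" matches a D anywhere on the line, which is why the header and
the egrep itself show up.  A tighter filter would look only at the STAT
column.  This is a rough sketch, not what was run above: it assumes STAT is
the 10th whitespace-separated field, as in this listing, "dstate" is a made-up
helper name, and it is written for sh rather than csh.)

```shell
# Sketch only: keep the header line plus rows whose STAT column starts with "D".
# Assumes STAT is the 10th whitespace-separated field, as in the listing above.
# "dstate" is a hypothetical helper name; this is sh syntax, not csh.
dstate() {
    awk 'NR == 1 || $10 ~ /^D/'
}

# Usage (not run here): ps -axlww | dstate
```

Under csh the awk command can be used directly in the pipeline in place of
the function.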

A "ps -ax" shows the calendar process (yes, yes, I know the lecture about
running "calendar -a" with a nice fat NIS password map, thank you) running at
that moment as user "wjm", and sure enough, in /var/log/messages I find

May 25 02:20:00 pcnetbsd4me /netbsd: nfs server amd:78: not responding
May 25 02:20:00 pcnetbsd4me /netbsd: nfs server ss1000:/export/gllssi/wjm: not responding

"ss1000" (not its real name) is a SPARCserver 1000 running Solaris 2.4.  What
I don't understand is that my SPARCstation can mount things from it just fine.
Also, I'm using the same AMD map files on all 3 systems - with "resvport"
specified in the "opts" fields of all of them.  I'm also seeing the problem
when I try to log in as myself on the PC (i.e., NFS mounting my home directory
from the SPARCstation).

I've just rebooted the PC and the problem seems to have disappeared, at least
for the moment.  I'd rather not have to reboot it daily to keep the problem
at bay, if I can help it.

The only major difference between the PC and the SPARCstation (other than
1.2 vs. 1.2.1) is the network: the PC is attached to a Cisco/Crescendo
Catalyst Ethernet switch, which is on an FDDI/CDDI ring, while the
SPARCstation is on a hoary old thick Ethernet.

If this (or the WCHANs shown above) rings a bell with anyone, please let me
know.  Especially if it's a "fixed in 1.2.1" kind of thing ...

Thanks in advance,

	- Greg