Subject: kern/22406: -current kernels don't recover from NFS server crashes
To: None <gnats-bugs@gnats.netbsd.org>
From: None <dholland@eecs.harvard.edu>
List: netbsd-bugs
Date: 08/08/2003 12:54:13
>Number: 22406
>Category: kern
>Synopsis: NFS client hangs permanently after server crash
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 08 16:55:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: dholland@eecs.harvard.edu (David Holland)
>Release: NetBSD 1.6T (current of 20030530)
>Organization:
- David A. Holland / dholland@eecs.harvard.edu
>Environment:
System: NetBSD alicante 1.6T NetBSD 1.6T (ALICANTE) #5: Fri May 30 18:19:54 EDT 2003 root@alicante:/usr/src/sys/arch/i386/compile/ALICANTE i386
Architecture: i386
Machine: i386
>Description:
It seems that 1.6-current kernels (I've seen this before in
earlier ones, but I think the problem may have gotten worse)
don't recover from NFS server downtime.
[I know 1.6T is getting a little old, but I don't see anything
in cvs logs that looks likely to have changed the behavior
since I built this kernel.]
Our main file server here (a NetApp) died yesterday at about
5:30 pm and was down for perhaps an hour. This morning when I
came in, my machine thought the NFS server was still down:
processes that had been using the NFS volume at the time of
the crash were still hung in device wait, and new processes
that touched the volume also went into uninterruptible device
wait and stayed there. So did some processes that
theoretically shouldn't have touched the NFS volume, like
"ls /". Presumably these were queueing up behind other
processes hung in NFS.
The volume was mounted (under /home) with -o soft,intr... or
at least those options are in fstab; mount doesn't report
them, and it's not clear they're being honored. (If not,
though, I object to mount accepting them and silently
discarding them...)
I was able to log in as root on the console, but since lots of
random things were locking up in permanent device wait, I
couldn't get much further information out. The ps -alx
output is appended below, though. (When I tried to reboot, the
reboot process spawned by shutdown -r hung in device wait,
although sync did not and a second manually started reboot
worked ok.)
There seem to be several things going wrong here:
1. The kernel eventually ought to notice that the NFS server
is back.
2. ls / ought not to hang on an nfs volume not mounted in the
root directory. Or so I'd think - I suppose there are reasons
it might have touched, for instance, $HOME, in which case this
objection is moot. But if not, I think it's worth taking steps
to keep chains of stuck processes from clogging up the system.
3. Either the soft mount logic isn't working, or mount is
silently ignoring "-o soft". mount_nfs.c doesn't *look* like
it should be dropping it... but I might be missing something,
and "-o soft" isn't documented in the man pages.
4. Similarly, "-o intr" doesn't seem to be working either. All
the processes with a wchan looking remotely nfs-related were
sleeping uninterruptibly... although I suppose I might have
missed something.
For the record, the network is running over ex0.
ps -alx output, very slightly edited for confidentiality:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 0 0 0 -18 0 0 23388 schedule DKs ?? 0:00.36 [swapper]
0 1 0 0 10 0 56 188 wait Is ?? 0:00.04 init
0 2 0 0 -6 0 0 23388 sccomp DK ?? 0:00.00 [scsibus0]
0 3 0 0 10 0 0 23388 pmsreset DK ?? 0:00.00 [pms0]
0 4 0 0 -18 0 0 23388 pgdaemon DK ?? 0:02.00 [pagedaemon]
0 5 0 0 -18 0 0 23388 reaper DK ?? 0:16.16 [reaper]
0 6 0 0 -1 0 0 23388 nfsrcvlk DK ?? 3:19.97 [ioflush]
0 7 0 0 -18 0 0 23388 aiodoned DK ?? 0:02.97 [aiodoned]
0 72 628 0 2 0 920 1624 select S ?? 0:00.46 xterm -ut -g
0 137 1 0 2 0 192 712 select Ss ?? 0:00.23 /usr/sbin/sy
0 146 1 0 2 0 280 620 select Ss ?? 0:28.78 /usr/sbin/rp
0 153 1 0 2 0 192 648 select Ss ?? 0:06.65 /usr/sbin/yp
32170 162 7967 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 176 0 0 -1 0 0 23388 nfsrcvlk DK ?? 0:02.77 [nfsio]
0 184 0 0 10 0 0 23388 nfsidl IK ?? 0:00.01 [nfsio]
0 186 0 0 -1 0 0 23388 nfsrcvlk DK ?? 0:00.73 [nfsio]
0 188 0 0 -1 0 0 23388 nfsrcvlk DK ?? 0:00.01 [nfsio]
0 207 0 0 -22 0 0 23388 actwat DK ?? 0:00.23 [acctwatch]
32170 225 628 0 2 0 128 760 select S ?? 0:02.00 xbiff -bg #2
0 317 1 0 2 0 120 4 select Ss ?? 0:00.04 /usr/sbin/lp
0 364 1 0 2 0 392 756 select Ss ?? 0:03.30 /usr/sbin/ss
0 369 1 28 18 0 180 4 pause IWs ?? 0:00.02 /usr/X11R6/b
0 383 369 0 2 0 20092 26760 select Ss ?? 8:40.35 /usr/X11R6/b
0 398 369 7 10 0 452 4 wait IWs ?? 0:00.05 xdm: :0
32170 413 628 0 2 0 132 696 select S ?? 0:01.22 oclock -bg #
32170 521 1160 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
12 532 599 0 2 0 284 620 select S ?? 0:01.45 qmgr -l -t f
32235 544 364 0 -2 0 612 1796 vnlock Ds ?? 0:00.07 sshd: user [
32170 572 1973 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 599 1 0 2 0 172 440 select Ss ?? 0:02.01 /usr/libexec
0 611 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 628 398 0 2 0 272 788 select Ss ?? 0:06.47 fvwm
0 641 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 643 641 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 682 1 0 10 0 324 384 nanoslee Ss ?? 0:00.92 /usr/sbin/cr
0 685 1 0 2 0 76 292 kqread Is ?? 0:00.02 /usr/sbin/in
0 708 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 716 628 0 2 0 136 520 select S ?? 0:01.83 /usr/X11R6/l
0 759 1 0 2 0 568 1364 select S ?? 0:01.12 xterm -geome
0 771 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 775 771 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 785 3856 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 810 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 832 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 865 1316 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 900 19834 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
16 916 1043 0 2 0 428 1600 netio I ?? 0:00.01 sshd: user [
0 968 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1004 4971 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1041 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32235 1043 364 0 -2 0 612 1796 vnlock Ds ?? 0:00.07 sshd: user [
32170 1048 1083 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1064 13605 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1071 968 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1072 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1082 12215 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1083 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1103 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1110 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1160 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1178 5655 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1213 832 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1219 1356 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1268 1394 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1298 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1308 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1316 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1336 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1356 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1376 1757 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1394 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1397 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1419 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1479 708 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1481 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1486 22989 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1535 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1549 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1554 1041 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1589 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1616 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1639 2014 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1640 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1646 4706 4 -2 0 508 1036 vnlock D ?? 0:29.36 find / ( ! -
0 1660 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1664 25944 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32235 1715 364 0 -2 0 612 1796 vnlock Ds ?? 0:00.08 sshd: user [
0 1757 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 1785 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1814 2476 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1827 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1831 1072 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1839 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1840 1839 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 1925 1927 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1927 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 1937 1298 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 1973 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 2004 1110 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 2014 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 2091 810 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 2117 5806 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 2160 4693 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 2174 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 2263 2803 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 2278 611 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 2314 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 2354 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 2427 17641 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 2476 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 2709 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 2728 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 2794 3056 0 2 0 148 776 netio I ?? 0:00.03 postdrop pro
32170 2797 1640 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 2803 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 2841 6423 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 2865 17513 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 2993 1419 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 3056 16356 0 -6 0 168 796 piperd I ?? 0:00.02 sendmail -t
32170 3155 23141 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 3274 2728 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 3291 26458 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 3472 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 3539 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 3856 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 4409 18996 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 4587 5271 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 4693 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 4706 16356 5 10 0 180 672 wait I ?? 0:00.02 /bin/sh /etc
0 4971 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 5077 3539 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 5166 2314 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 5271 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 5346 2354 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 5655 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 5806 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
12 5934 599 0 2 0 224 876 select S ?? 0:00.05 pickup -l -t
0 6423 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 7081 10073 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
16 7201 544 0 2 0 428 1600 netio I ?? 0:00.01 sshd: user [
32170 7465 8998 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 7583 8614 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 7967 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 8223 20766 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 8614 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 8998 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 9153 19087 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 10073 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 10305 15039 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 10969 1616 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 11020 1549 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
16 11063 25030 0 2 0 420 1700 netio I ?? 0:00.16 sshd: user2
32170 11319 1589 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
16 11966 1715 0 2 0 428 1600 netio I ?? 0:00.01 sshd: user [
32170 12197 1 0 2 0 188 4 netcon IWs ?? 0:00.00 stunnel -P n
0 12215 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 12350 1827 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 12837 16356 0 -6 0 28 364 piperd I ?? 0:00.01 tee /var/log
0 13605 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 15039 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 15716 22861 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 16239 26844 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 16356 18590 0 10 0 132 624 wait Is ?? 0:00.05 /bin/sh -c /
0 17513 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 17641 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 18590 682 0 -6 0 324 816 piperd I ?? 0:00.00 cron: runnin
0 18996 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 19087 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 19727 2709 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 19834 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 19962 1785 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 20123 1308 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
5675 20404 1336 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 20766 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 21885 26330 0 2 0 324 844 netio DVs ?? 0:00.12 cron: runnin
32170 22781 1397 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 22861 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 22989 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 23141 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 23634 3472 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 24293 1103 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 25030 364 0 -2 0 608 1896 vnlock Ds ?? 0:00.07 sshd: user2
32170 25042 1660 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
32170 25414 1481 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 25944 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 26072 2174 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 26330 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
0 26458 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32170 26716 1535 0 -2 0 324 844 vnlock DVs ?? 0:00.04 cron: runnin
0 26844 682 0 10 0 324 844 ppwait D ?? 0:00.00 cron: runnin
32235 324 1 0 -2 0 240 1316 vnlock Ds p0 0:00.18 -zsh
32235 526 324 0 -2 0 72 860 vnlock D p0 0:00.01 talk user3
32170 323 726 0 18 0 880 4 pause IWs p1 0:00.13 -csh (tcsh)
0 419 1 0 2 0 588 2108 select S p1 0:02.28 xterm -fg wh
0 483 1 0 2 0 568 1844 select S p1 0:00.41 xterm -fg bl
0 687 1 0 2 0 568 1120 select S p1 0:00.29 xterm -fg ma
0 976 1 0 2 0 568 1184 select S p1 0:07.83 xterm -fg #f
0 1010 1 0 2 0 564 1116 select S p1 0:07.81 xterm -fg li
0 1088 1354 0 -2 0 3420 3900 vnlock D+ p1 0:00.04 emacs -nw fs
0 1099 1 0 2 0 576 2172 select S p1 0:03.35 xterm -fg wh
0 1117 1 0 2 0 644 1376 select S p1 0:09.96 xterm -fg bl
0 1354 323 0 18 0 1084 1108 pause S p1 0:00.28 -csh (tcsh)
0 1418 1 0 2 0 568 1308 select S p1 0:00.35 xterm -fg go
0 1531 1 0 2 0 576 2164 select S p1 0:05.12 xterm -fg la
32170 232 72 0 3 0 1232 1184 ttyin Ss+ p2 0:00.16 -csh (tcsh)
32170 420 419 0 18 0 1052 1052 pause Ss p3 0:00.89 -csh (tcsh)
32170 10806 420 0 -2 0 40 460 vnlock D p3 0:00.00 ls /home/lai
0 11073 420 0 -2 0 200 736 vnlock D+ p3 0:00.05 -csh
32170 192 759 0 3 0 1056 1036 ttyin Ss+ p4 0:00.29 -csh (tcsh)
0 668 1 0 2 0 568 1860 select S p4 0:01.92 xterm -fg gr
32170 1389 1 0 2 0 4020 1764 select S p4 0:00.74 emacs -fg cy
32170 1894 192 0 -1 0 40 460 nfsrcvlk D p4 0:00.00 ls
32170 1898 1 0 -2 0 4024 4148 vnlock D p4 0:00.89 emacs -fg #f
32170 579 977 0 -2 0 20 416 vnlock D p5 0:00.00 pwd
32170 977 1099 0 18 0 880 1152 pause Ss p5 0:00.14 -csh (tcsh)
32170 2882 977 0 -2 0 40 460 vnlock D p5 0:00.00 ls /
32170 3409 977 0 -2 0 128 528 vnlock D+ p5 0:00.06 less
32170 13631 977 0 -2 0 2600 1444 vnlock D p5 0:00.02 xlock -mode
32170 819 483 0 -2 0 916 1224 vnlock Ds+ p6 0:00.16 -csh (tcsh)
32170 999 1 0 2 0 4128 4240 select S p6 0:01.49 emacs -fg se
0 726 1 0 2 0 568 1188 select S p7- 0:01.18 xterm -fg al
32170 883 687 0 18 0 872 4 pause IWs p7 0:00.10 -csh (tcsh)
33068 946 1813 0 3 0 848 4 ttyin IW+ p7 0:00.07 -tcsh
0 1813 883 0 18 0 836 4 pause IW p7 0:00.08 -csh (tcsh)
32170 15343 10380 0 -2 0 1016 1320 vnlock Ds+ p8 0:00.60 -csh (tcsh)
32170 709 1418 0 3 0 956 960 ttyin Is+ p9 0:00.18 -csh (tcsh)
32170 1963 1 0 -2 0 4116 4468 vnlock D p9 0:01.08 emacs -fg or
32170 7073 709 0 -18 0 32492 56296 uvn_fp2 Da p9 4:56.54 /usr/pkg/lib
32170 1102 976 0 3 0 972 780 ttyin Is+ pa 0:00.28 -csh (tcsh)
32170 992 1117 0 18 0 952 884 pause Is pb 0:00.14 -csh (tcsh)
32170 29474 992 0 2 0 916 1488 select S+ pb 0:02.94 irc user2 t
32170 1333 1010 0 3 0 1092 4 ttyin IWs+ pc 0:01.23 -csh (tcsh)
32170 1364 668 0 3 0 916 1092 ttyin Ss+ pd 0:00.20 -csh (tcsh)
32170 1851 1364 0 -1 0 40 460 nfsrcvlk D pd 0:00.00 ls
0 10380 1 0 2 0 568 1616 select S pd 0:02.26 xterm -fg #f
32170 935 1531 0 18 0 1056 1348 pause Is pe 0:00.12 -csh (tcsh)
32170 1016 935 0 2 0 504 1132 select S+ pe 0:03.74 slogin bowse
0 502 1 0 18 0 876 1356 pause Ss E0 0:00.20 -csh (tcsh)
0 5467 502 0 29 0 220 680 - R+ E0 0:00.00 ps alx
0 699 1 22 3 0 48 4 ttyin IWs+ E1 0:00.01 /usr/libexec
0 757 1 22 3 0 48 4 ttyin IWs+ E2 0:00.01 /usr/libexec
0 523 1 22 3 0 48 4 ttyin IWs+ E3 0:00.01 /usr/libexec
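For a quick overview of the listing above, the processes in device
wait (STAT beginning with "D") can be tallied by wait channel. This is
only a minimal sketch, not something that was run for this report: the
function name summarize_wchan is hypothetical, and it assumes the
13-column "ps -alx" layout shown above has been saved to a text file.

```python
from collections import Counter


def summarize_wchan(ps_text):
    """Count processes in device wait (STAT starting with 'D') per WCHAN.

    Assumes the column order UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT
    TT TIME COMMAND, as in the listing above.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # COMMAND may contain spaces, so split into at most 13 fields.
        fields = line.split(None, 12)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 13 or not fields[1].isdigit():
            continue
        wchan, stat = fields[8], fields[9]
        if stat.startswith("D"):
            counts[wchan] += 1
    return counts
```

Calling summarize_wchan(open("ps.txt").read()).most_common() on a saved
copy of the listing (ps.txt is a placeholder name) would list the wait
channels, such as vnlock, ppwait and nfsrcvlk, ordered by how many
processes are stuck on each.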
>How-To-Repeat:
As far as I know, it's sufficient to crash the server and
leave it down for longer than just a few minutes. I'm not sure
it happens with short downtimes.
The problem might be some specific NetBSD vs. NetApp
interaction; I don't know if it happens with other server
types. (And, unfortunately, I'm not in a position to check for
the time being.) For what it's worth, other client OSes we
have here (FreeBSD, Linux, OS X) seem to recover ok.
However, I can't find any related PRs or mailing list traffic,
which suggests that it's not something everyone's seeing.
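When testing whether a client has recovered after the server comes
back, it helps to probe the mount without hanging the shell doing the
probing. The following is a rough sketch under several assumptions:
Python is available on the client, the function name probe_path and the
timeout values are made up for illustration, and the path to check
(/home in this report) is passed in by the caller. The stat is done in
a child process so that only the child can get stuck in device wait.

```python
import subprocess
import sys
import time


def probe_path(path, timeout=10.0, poll=0.1):
    """Return True if a child process finishes stat-ing `path` in time.

    The parent never touches `path` itself, so it cannot block on a
    hung NFS mount. A True result only means the stat call returned
    (successfully or with an error), not that the path exists.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True
        time.sleep(poll)
    # The child is probably stuck. kill() may have no effect on a
    # process in uninterruptible wait, so do not wait for it here.
    child.kill()
    return False
```

Running probe_path("/home") periodically after restarting the server
would show whether the client ever notices that the server is back. A
stuck child is deliberately left behind instead of being waited for.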
>Fix:
I haven't had a chance to look into it in more detail and I
probably won't have time to for a while, unfortunately.
(However, I can probably test things if someone wants me to.)
>Release-Note:
>Audit-Trail:
>Unformatted: