Subject: kern/5233: NFS panic on reboot after network has gone south
To: None <>
From: None <cgd@NetBSD.ORG>
List: netbsd-bugs
Date: 03/30/1998 09:21:21
>Number:         5233
>Category:       kern
>Synopsis:       on reboot, system panic in NFS code when network was hung
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 30 09:35:00 1998
>Originator:     Chris G. Demetriou
>Organization:
Kernel Hackers 'r' Us
>Release:        NetBSD 1.3.1
>Environment:
NetBSD/i386 1.3.1 kernel (built from 1.3 sources + 1.3.1 patch), with
1.3 user-land.  PPro 200, 64MB of RAM.

>Description:
	[ I searched for other PRs with 'NFS' in the text; this PR
	may be akin to 4115 or 2893, but it has a different panic,
	different trace, and is for a different version of the system.
	It does seem to be describing the same problem as 3072, but 
	the difference is 1.2B vs. 1.3.1. 8-]

	Rebooted one of my PCs after the Ethernet card driver got wedged
	(interrupt-related lossage: the card wasn't getting its interrupts,
	but the reason for the network lossage itself seems irrelevant).

	Several processes were hung, trying to do NFS operations to
	another system on the network (obviously, given that the ethernet
	driver was hung, those operations couldn't complete).

	On reboot, after the disks were synced, the system crashed
	(ten-finger copy of the traceback):

syncing disks... 4 4 2 done
vm_fault (0xf0967a00, 0, 1, 0) -> 1
kernel: page fault trap, code=0 
Stopped at      _nfs_reply+0x9e:        movl    0x8(%edx),%ecx
_nfs_reply(f095bac0,200,f093cd00,f48b3c2c,f090ca00) at _nfs_reply+0x9e
_nfs_request(f08c4400,f093cd00,1,f093e000,f092cf80) at _nfs_request+0x3ad
_nfs_getattr(f48b3cb8) at _nfs_getattr+0x336
_nfs_lookup(f48b3d68,f08c4400,f48b3f04,f48b3ee0,f48b3f04) at _nfs_lookup+0x1f5
_lookup(f48b3ee0) at _lookup+0x26e
_namei(f48b3ee0) at _namei+0x176
_vn_open(f48b3ee0,5,0,f01ee3d4,f093e00) at _vn_open+0x170
_sys_open(f48b3ee0,f48b3f88,f48b3f80,0,0) at _sys_open+0xaa
_syscall() at _syscall+0x238
--- syscall (number 5) ---

	Other possibly-useful info:

	_curproc = f093e000

	which is:

	PID   proc       addr       uid ppid pgrp  flag   stat em     comm
	21825 0xf093e000 0xf48b2000 0   1    21825 004006 2    netbsd vi

	other processes (runnable except where noted):
		tip (blocked on ttyout; IE+, wouldn't die; interrupt lossage
		    re: the serial port interrupt conflicting with the
		    enet card's interrupt)
		csh (ppwait)
		pagedaemon (paged)
		init (wait)
		swapper (scheduler)
		tip (zombie)

>How-To-Repeat:
	Hard-mount some NFS file systems, tweak your network so
	you can no longer talk to the server, and try to reboot?
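
	The steps above might be sketched as the script below.  This is
	only a guess at a reproduction, not a tested recipe: the server
	name, export path, mount point and interface name are all
	placeholders, and downing the interface with ifconfig is just
	one assumed way to "tweak the network" (the original failure
	was a wedged driver, which may behave differently).  It prints
	the commands unless DRYRUN=0 is set; run it for real only on a
	scratch machine, as root.

```shell
#!/bin/sh
# Hypothetical reproduction sketch.  All four values are placeholders.
SERVER=${SERVER:-nfsserver}   # assumed NFS server host name
EXPORT=${EXPORT:-/export}     # assumed exported path on the server
MNT=${MNT:-/mnt}              # assumed local mount point
IFACE=${IFACE:-ep0}           # assumed Ethernet interface name
DRYRUN=${DRYRUN:-1}           # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRYRUN" = 1 ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

repro() {
	# 1. NFS mounts are hard unless "soft" is requested.
	run mount -t nfs "$SERVER:$EXPORT" "$MNT"
	# 2. Cut off the server (assumption: downing the interface).
	run ifconfig "$IFACE" down
	# 3. Leave a process stuck in an NFS operation on the mount.
	run sh -c "ls -l $MNT >/dev/null 2>&1 &"
	# 4. Reboot while that process is still hung.
	run reboot
}

repro
```

	Unplugging the cable instead of step 2 would be another way to
	lose contact with the server.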