Subject: kern/8249: panic: nfs_disconnect: waiters left after drain?
To: None <gnats-bugs@gnats.netbsd.org>
From: Antti Kantee <pooka@iki.fi>
List: netbsd-bugs
Date: 08/21/1999 15:09:59
>Number:         8249
>Category:       kern
>Synopsis:       panic: nfs_disconnect: waiters left after drain?
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 21 14:20:01 1999
>Last-Modified:
>Originator:     Antti Kantee
>Organization:
>Release:        -current kernel as of 21st Aug
>Environment:
NetBSD/i386 ELF 1.4K, userland from ~10th

>Description:
Playing around with NFS gives me a panic. Here is the backtrace I got using
method 1 described below:

#0  0xf022a180 in rcsid ()
#1  0x30bd000 in ?? ()
#2  0xf01dbeab in cpu_reboot (howto=256, bootstr=0x0)
    at ../../../../arch/i386/i386/machdep.c:1212
#3  0xf013550d in panic () at ../../../../kern/subr_prf.c:217
#4  0xf0192735 in nfs_disconnect (nmp=0xf0510000)
    at ../../../../nfs/nfs_socket.c:349
#5  0xf0192651 in nfs_reconnect (rep=0xf0513cc0)
    at ../../../../nfs/nfs_socket.c:301
#6  0xf0192900 in nfs_receive (rep=0xf0513cc0, aname=0xf8fe6b04, mp=0xf8fe6b08)
    at ../../../../nfs/nfs_socket.c:499
#7  0xf0192d4a in nfs_reply (myrep=0xf0513cc0)
    at ../../../../nfs/nfs_socket.c:695
#8  0xf019346b in nfs_request (vp=0xf8fcf198, mrest=0xf0478180, procnum=1, 
    procp=0xf9000dc4, cred=0xf0515880, mrp=0xf8fe6bec, mdp=0xf8fe6bf0, 
    dposp=0xf8fe6bf4) at ../../../../nfs/nfs_socket.c:982
#9  0xf019b786 in nfs_getattr (v=0xf8fe6c24) at ../../../../nfs/nfs_vnops.c:562
#10 0xf019a87a in mountnfs (argp=0xf8fe6dd4, mp=0xf0506c00, nam=0xf0478100, 
    pth=0xf8fe6d78 "/usr/home", hst=0xf8fe6d1c "starfury:/usr/home", 
    vpp=0xf8fe6cd0, p=0xf9000dc4) at ../../../../sys/vnode_if.h:221
can not access 0xefbfda0c, invalid translation (invalid PDE)
can not access 0xefbfda0c, invalid translation (invalid PDE)
can not access 0xefbfda0c, invalid translation (invalid PDE)
can not access 0xefbfda0c, invalid translation (invalid PDE)
#11 0xf019a603 in nfs_mount (mp=0xf0506c00, 
    path=0xefbfda0c <Address 0xefbfda0c out of bounds>, data=0xefbfd8cc, 
    ndp=0xf8fe6e80, p=0xf9000dc4) at ../../../../nfs/nfs_vfsops.c:592
#12 0xf014eef2 in sys_mount (p=0xf9000dc4, v=0xf8fe6f88, retval=0xf8fe6f80)
    at ../../../../kern/vfs_syscalls.c:311
#13 0xf01e2472 in syscall (frame={tf_es = 31, tf_ds = 31, tf_edi = -272639476, 
      tf_esi = -272639495, tf_ebp = -272639724, tf_ebx = -272639796, 
      tf_edx = -272639796, tf_ecx = 134676744, tf_eax = 21, tf_trapno = 3, 
      tf_err = 2, tf_eip = 134525845, tf_cs = 23, tf_eflags = 514, 
      tf_esp = -272639876, tf_ss = 31, tf_vm86_es = 0, tf_vm86_ds = 0, 
      tf_vm86_fs = 0, tf_vm86_gs = 0}) at ../../../../arch/i386/i386/trap.c:753
#14 0xf0100d81 in syscall1 ()
can not access 0xefbfd914, invalid translation (invalid PDE)
can not access 0xefbfd914, invalid translation (invalid PDE)
Cannot access memory at address 0xefbfd914.

(I wonder what those "can not access" messages are about...)

Here's a trace from method 2 (without debugging symbols, but I can provide
them if someone needs them):
#0  0xf02278a0 in rcsid ()
#1  0x3da9000 in ?? ()
#2  0xf01da7c3 in cpu_reboot ()
#3  0xf0134ef5 in panic ()
#4  0xf01919e1 in nfs_disconnect ()
#5  0xf0199b89 in nfs_unmount ()
#6  0xf014e8e0 in dounmount ()
#7  0xf014dd8b in vfs_unmountall ()
#8  0xf014de9b in vfs_shutdown ()
#9  0xf01da79b in cpu_reboot ()
#10 0xf0131513 in sys_reboot ()
#11 0xf01e0c52 in syscall ()
#12 0xf0100d81 in syscall1 ()
can not access 0xefbfd948, invalid translation (invalid PDE)
can not access 0xefbfd948, invalid translation (invalid PDE)
Cannot access memory at address 0xefbfd948.

>How-To-Repeat:
I managed to trigger the panic in two ways (I assume both stem from the
same problem, since the panic message is the same).

1) mount_nfs -T anything
2) mount an NFS file system, disable the network interface, try to reboot
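
The two methods could be scripted roughly as follows. This is only a sketch:
the server and mount point are taken from the first backtrace, while the
interface name ne0 and the RUN switch are placeholders made up for
illustration. By default it only prints the commands; running them for real
is expected to panic the machine.

```sh
#!/bin/sh
# Dry-run sketch of the two reproduction methods.
# SERVER_PATH and MOUNT_POINT default to the values seen in the backtrace;
# IFACE is a hypothetical interface name, adjust it for your machine.
SERVER_PATH="${SERVER_PATH:-starfury:/usr/home}"
MOUNT_POINT="${MOUNT_POINT:-/usr/home}"
IFACE="${IFACE:-ne0}"
# RUN is a prefix for every command: "echo" prints instead of executing.
# Set RUN="" in the environment to really execute (will likely panic).
RUN="${RUN-echo}"

# Method 1: NFS mount over TCP (-T).
method1() {
    $RUN mount_nfs -T "$SERVER_PATH" "$MOUNT_POINT"
}

# Method 2: mount, take the interface down, then try to reboot.
method2() {
    $RUN mount_nfs "$SERVER_PATH" "$MOUNT_POINT"
    $RUN ifconfig "$IFACE" down
    $RUN reboot
}
```

Calling method1 or method2 after sourcing the script prints the commands
that would be run, so they can be checked before trying them on a test box.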

>Fix:
I guess the classic "don't do it" is not a good solution.

>Audit-Trail:
>Unformatted: