Subject: kern/10202: kernel loops forever in nfs code
To: None <gnats-bugs@gnats.netbsd.org>
From: Antti Kantee <pooka@iki.fi>
List: netbsd-bugs
Date: 05/26/2000 10:01:13
>Number:         10202
>Category:       kern
>Synopsis:       kernel loops forever in nfs code
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 26 10:02:01 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Antti Kantee
>Release:        -current from ~20th May 2000
>Organization:
>Environment:
System: NetBSD roboti.cs.hut.fi 1.4Y NetBSD 1.4Y (ROBOTI) #13: Tue May 23 18:27:04 EEST 2000 root@roboti.cs.hut.fi:/cvs/src/sys/arch/alpha/compile/ROBOTI alpha


>Description:

Sometimes a process enters an infinite loop in the kernel. Last time the
victim was cvs; this time it is csh. The nfsio threads have collected
suspiciously little processor time.

roboti# ps axl -M netbsd.5.core -N netbsd.5
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0    0 -      RLs  ??    0:00.03 (swapper)
    0     1     0   2  10  0   752    0 wait   Is   ??    0:00.06 (init)
    0     2     0   0 -18  0     0    0 daemon DL   ??    0:00.26 (pagedaemon)
    0     3     0   0 -18  0     0    0 reaper DL   ??    0:00.02 (reaper)
    0     4     0   0  18  0     0    0 -      RL   ??    0:00.18 (ioflush)
    0   122     0   0  10  0     0    0 nfsidl IL   ??    0:00.01 (nfsio)
    0   123     0   0  10  0     0    0 nfsidl IL   ??    0:00.00 (nfsio)
    0   124     0   0  10  0     0    0 nfsidl IL   ??    0:00.00 (nfsio)
    0   125     0   0  10  0     0    0 nfsidl IL   ??    0:00.00 (nfsio)
    0 14893     0  32  60  0   648    0 -      R    p3-   0:00.03 (reboot)
    0 14854     0   0   2  0  1048    0 -      R    p6-   1:33.35 (csh)
roboti# ps -ax -O paddr -M netbsd.5.core -N netbsd.5
  PID  PADDR TT  STAT      TIME COMMAND
    0 59d4d8 ??  RLs    0:00.03 (swapper)
    1 104a000 ??  Is     0:00.06 (init)
    2 104a258 ??  DL     0:00.26 (pagedaemon)
    3 104a4b0 ??  DL     0:00.02 (reaper)
    4 104a708 ??  RL     0:00.18 (ioflush)
  122 104bc20 ??  IL     0:00.01 (nfsio)
  123 31e8008 ??  IL     0:00.00 (nfsio)
  124 31e8260 ??  IL     0:00.00 (nfsio)
  125 31e84b8 ??  IL     0:00.00 (nfsio)
14893 3786978 p3- R      0:00.03 (reboot)
14854 31e92c8 p6- R      1:33.35 (csh)

(gdb) proc 0xfffffc00031e92c8
(gdb) bt
#0  0xfffffc000033a954 in mi_switch () at ../../../../kern/kern_synch.c:815
#1  0xfffffc0000339cf4 in tsleep (ident=0x0, priority=24, 
    wmesg=0xfffffc00005146b0 "netio", timo=5120)
    at ../../../../kern/kern_synch.c:432
#2  0xfffffc000035c52c in sbwait (sb=0x0)
    at ../../../../kern/uipc_socket2.c:274
#3  0xfffffc000035a688 in soreceive (so=0xfffffc0001172d80, 
    paddr=0xfffffe0006359698, uio=0xfffffe0006359608, mp0=0x0, controlp=0x0, 
    flagsp=0xfffffe000635963c) at ../../../../kern/uipc_socket.c:661
#4  0xfffffc00004355f4 in nfs_receive (rep=0xfffffe00001b0600, aname=0x0, 
    mp=0xfffffe00063596a0) at ../../../../nfs/nfs_socket.c:646
#5  0xfffffc00004356f8 in nfs_reply (myrep=0xfffffe00001b0600)
    at ../../../../nfs/nfs_socket.c:700
#6  0xfffffc00004360c8 in nfs_request (vp=0xfffffc0001a88038, 
    mrest=0xfffffc000293e680, procnum=16, procp=0xfffffc000293e880, 
    cred=0xfffffe00000f5700, mrp=0xfffffe0006359848, mdp=0xfffffe0006359850, 
    dposp=0xfffffe0006359858) at ../../../../nfs/nfs_socket.c:987
#7  0xfffffc000045cd70 in nfs_readdirrpc (vp=0xfffffc0001a88038, 
    uiop=0xfffffe00063598e8, cred=0xfffffe00000f5700)
    at ../../../../nfs/nfs_vnops.c:2082
#8  0xfffffc000041578c in nfs_doio (bp=0xfffffc0000235998, 
    cr=0xfffffe00000f5700, p=0xfffffc00031e92c8)
    at ../../../../nfs/nfs_bio.c:1054
#9  0xfffffc0000413f44 in nfs_bioread (vp=0xfffffc0001a88038, 
    uio=0xfffffe0006359b88, ioflag=0, cred=0xfffffe00000f5700, cflag=0)
    at ../../../../nfs/nfs_bio.c:346
#10 0xfffffc000045c0b0 in nfs_readdir (v=0x0)
    at ../../../../nfs/nfs_vnops.c:1961
#11 0xfffffc0000367024 in getcwd_scandir (lvpp=0xfffffe0006359db0, uvpp=0x200, 
    bpp=0xfffffe0006359dc0, bufp=0xfffffe00000b3000 "", p=0xfffffc00031e92c8)
    at ../../../../sys/vnode_if.h:657
#12 0xfffffc0000367560 in getcwd_common (lvp=0xfffffc000383ce28, 
    rvp=0xfffffc0001060158, bpp=0xfffffe0006359e38, 
    bufp=0xfffffe00000b3000 "", limit=512, flags=104177072, 
    p=0xfffffc00031e92c8) at ../../../../kern/vfs_getcwd.c:472
#13 0xfffffc00003677e8 in sys___getcwd (p=0xfffffc00031e92c8, 
    v=0xfffffe0006359e88, retval=0xfffffe0006359ed8)
    at ../../../../kern/vfs_getcwd.c:588
#14 0xfffffc00004e2bbc in syscall (code=296, framep=0xfffffe0006359ef8)
    at ../../../../arch/alpha/alpha/trap.c:698
#15 0xfffffc000030046c in XentSys ()
    at ../../../../arch/alpha/alpha/locore.s:589
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x12002d89c

>How-To-Repeat:

Unknown. There is no sure-fire way to reproduce this, but the problem shows
up after the system has been running for some time.

>Fix:

please...
>Release-Note:
>Audit-Trail:
>Unformatted: