Subject: kern/20646: deadlock on vnlock (?)
To: None <gnats-bugs@gnats.netbsd.org>
From: None <martin@duskware.de>
List: netbsd-bugs
Date: 03/10/2003 12:33:19
>Number:         20646
>Category:       kern
>Synopsis:       deadlock on vnlock(?)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 10 03:34:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Martin Husemann
>Release:        NetBSD 1.6.1_RC2
>Organization:
>Environment:
System: NetBSD burgvogt.aprisoft.de 1.6.1_RC2 NetBSD 1.6.1_RC2 (VOGT) #0: Wed Mar  5 11:49:47 CET 2003     martin@beasty.aprisoft.de:/usr/src-1-6/sys/arch/sparc/compile/VOGT sparc
Architecture: sparc
Machine: sparc
>Description:

From time to time a SPARCstation 2 used as a DSL router, running diskless from
an i386 NFS server built from the same sources, locks up. In this state the
kernel continues to route packets, but apparently the (NFS root) filesystem
is locked, so no new process can be forked.

Stopped at      cpu_Debugger+0x4:       jmpl            [%o7 + 0x8], %g0
db> ps
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
 1061             176       1061          0 3     0x4             sshd  vnlock
 1060             192        192          0 3 0x100014         ifwatchd  vnlock
 272                1        272          0 3 0x104006            getty  vnlock
 240                1        240          0 3     0x4             ntpd nfsrcvl
 200                1        200          0 3     0x4             cron   netio
 192                1        192          0 3     0x4         ifwatchd  ppwait
 189                1        189          0 3    0x84            inetd   pause
 176                1        176          0 3     0x4             sshd  vnlock
 78                 1         78          0 3    0x84          syslogd  select
 11                 0          0          0 3 0x20204         aiodoned aiodone
 10                 0          0          0 3 0x20204          ioflush  syncer
 9                  0          0          0 3 0x20204           reaper  reaper
 8                  0          0          0 3 0x20204       pagedaemon pgdaemo
 7                  0          0          0 3 0x20284            nfsio  nfsidl
 6                  0          0          0 3 0x20284            nfsio  nfsidl
 5                  0          0          0 3 0x20284            nfsio  nfsidl
 4                  0          0          0 3 0x20284            nfsio  nfsidl
 3                  0          0          0 3 0x20204         scsibus1  sccomp
 2                  0          0          0 3 0x20204         scsibus0  sccomp
 1                  0          1          0 3  0x4084             init    wait
 0                 -1          0          0 3 0x20204          swapper schedul

db> ps/w
 PID          COMMAND     EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
 1061            sshd   netbsd   20   0.0   0.0 vnlock      0xf2f8106c
 1060        ifwatchd   netbsd   20   0.0   0.0 vnlock      0xf2f8106c
 272            getty   netbsd   20   0.0   0.1 vnlock      0xf2f8106c
 240             ntpd   netbsd   21   7.6  23.8 nfsrcvlk    0xf03675c0
 200             cron   netbsd   24   1.1   7.1 netio       0xf0368044
 192         ifwatchd   netbsd   32   0.0   0.0 ppwait      0xf30b33c0
 189            inetd   netbsd   40   0.0   0.0 pause       0xf2f99610
 176             sshd   netbsd   20  39.3   0.3 vnlock      0xf2f8106c
 78           syslogd   netbsd   24   1.0   0.6 select      selwait
 11          aiodoned   netbsd    4   0.0   0.1 aiodoned    uvm+0x34
 10           ioflush   netbsd   40   0.0  30.9 syncer      rushjob
 9             reaper   netbsd    4   0.0   6.9 reaper      deadproc
 8         pagedaemon   netbsd    4   0.0   0.0 pgdaemon    uvm+0x28
 7              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0xc
 6              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0x8
 5              nfsio   netbsd   32   0.0   0.0 nfsidl      nfs_iodwant+0x4
 4              nfsio   netbsd   32   0.0   0.3 nfsidl      nfs_iodwant
 3           scsibus1   netbsd   16   0.0   0.0 sccomp      0xf02f2500
 2           scsibus0   netbsd   16   0.0   0.0 sccomp      0xf02f2900
 1               init   netbsd   32   0.0   0.2 wait        0xf276e000
 0            swapper   netbsd    4   0.0   1.2 scheduler   proc0

db> ps/a:
 1061            sshd         0xf2f88b18         0xf30e1000         0xf2f734b0
 1060        ifwatchd         0xf2f88cf0         0xf30d6000         0xf2f73640
 272            getty         0xf2f88590         0xf2f92000         0xf2f730c8
 240             ntpd         0xf30b3b20         0xf30d8000         0xf2f737d0
 200             cron         0xf30b3948         0xf30d1000         0xf2f73898
 192         ifwatchd         0xf30b33c0         0xf30c4000         0xf2f73640
 189            inetd         0xf30b31e8         0xf30bd000         0xf2f73190
 176             sshd         0xf30b3010         0xf30b6000         0xf2f73320
 78           syslogd         0xf2f88940         0xf3041000         0xf2f73258
 11          aiodoned         0xf2f883b8         0xf2f8b000         0xf0195e18
 10           ioflush         0xf2f881e0         0xf2f89000         0xf0195e18
 9             reaper         0xf2f88008         0xf2f86000         0xf0195e18
 8         pagedaemon         0xf276ece8         0xf2f84000         0xf0195e18
 7              nfsio         0xf276eb10         0xf2f7f000         0xf0195e18
 6              nfsio         0xf276e938         0xf2f7d000         0xf0195e18
 5              nfsio         0xf276e760         0xf2f7b000         0xf0195e18
 4              nfsio         0xf276e588         0xf2f79000         0xf0195e18
 3           scsibus1         0xf276e3b0         0xf2f77000         0xf0195e18
 2           scsibus0         0xf276e1d8         0xf2f75000         0xf0195e18
 1               init         0xf276e000         0xf276c000         0xf2f73000
 0            swapper         0xf0195ee8         0xf0170488         0xf0195e18
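
To make the pattern in the ps/w listing easier to see, here is a minimal
sketch that groups the blocked user processes by wait message and wait
channel. The rows are copied from the ps/w output above; the helper name
group_by_wait_channel is hypothetical and not part of ddb or NetBSD. It only
restates what the listing already shows: all four vnlock sleepers wait on the
same channel, 0xf2f8106c, while ntpd sleeps in nfsrcvlk.

```python
from collections import defaultdict

# Rows copied from the "ps/w" output above (a subset: the processes that
# are blocked in vnlock, nfsrcvlk, or netio).
PS_W = """\
 1061            sshd   netbsd   20   0.0   0.0 vnlock      0xf2f8106c
 1060        ifwatchd   netbsd   20   0.0   0.0 vnlock      0xf2f8106c
 272            getty   netbsd   20   0.0   0.1 vnlock      0xf2f8106c
 240             ntpd   netbsd   21   7.6  23.8 nfsrcvlk    0xf03675c0
 200             cron   netbsd   24   1.1   7.1 netio       0xf0368044
 176             sshd   netbsd   20  39.3   0.3 vnlock      0xf2f8106c
"""


def group_by_wait_channel(text):
    """Map (wait message, wait channel) -> list of (pid, command).

    Hypothetical helper for illustration; it assumes the eight
    whitespace-separated columns of the "ps/w" header shown above.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed rows
        pid, command, _emul, _pri, _utime, _stime, wmesg, wchan = fields
        groups[(wmesg, wchan)].append((int(pid), command))
    return dict(groups)


groups = group_by_wait_channel(PS_W)

# Print the largest groups first.
for (wmesg, wchan), procs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{wmesg:<10} {wchan:<12} {procs}")
```

The grouping is consistent with every stuck process waiting for one and the
same lock, but it does not show which process holds it.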

>How-To-Repeat:
Dunno - it happens regularly for me on this machine - the other machine (i386)
using the same source tree and facing higher network load is completely
stable.

>Fix:
n/a
>Release-Note:
>Audit-Trail:
>Unformatted: