Subject: kern/20325: processes stuck waiting on vnlock
To: None <gnats-bugs@gnats.netbsd.org>
From: Martin Husemann <martin@aprisoft.de>
List: netbsd-bugs
Date: 02/13/2003 09:07:15
>Number:         20325
>Category:       kern
>Synopsis:       processes stuck waiting on vnlock
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 13 00:08:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Martin Husemann
>Release:        NetBSD 1.6.1_RC1
>Organization:
>Environment:
System: NetBSD burgvogt.aprisoft.de 1.6.1_RC1 NetBSD 1.6.1_RC1 (VOGT) #0: Tue Feb 11 09:07:24 CET 2003     martin@beasty.aprisoft.de:/usr/src-1-6/sys/arch/sparc/compile/VOGT sparc
Architecture: sparc
Machine: sparc
>Description:

After upgrading my sparc/1.6 router machine to the latest version on
the 1.6 branch some time ago (NFS root on an i386 1.6.1_RC1 system), it
started "wedging" once or twice a week. It continues to route packets
for a while, but my ISP disconnects the line after 24 hours and we need
the ip-down/ip-up scripts to take care of routing changes (new IP), and
those do not seem to run, so it eventually stops working completely.

I have DEBUG, DIAGNOSTIC and LOCKDEBUG in the kernel now.
Breaking into ddb works fine.

It seems sshd and getty are all stuck waiting on vnlock:

db> tr
zstty_stint(0xf02f2c68, 0x0, 0xf0116878, 0xf1958000, 0xf0196000, 0x104050a) at zstty_stint+0x88
zsc_intr_hard(0x8, 0xf02efe80, 0xf0172c00, 0xfe000000, 0x8de, 0x100) at zsc_intr_hard+0x68
zshard(0x0, 0xf010f6f0, 0xf00, 0x0, 0x1, 0xf0197df8) at zshard+0x40
sparc_interrupt44c(0x0, 0x0, 0xf0137bec, 0x0, 0xffffffff, 0x2) at sparc_interrupt44c+0x170
mi_switch(0xf0195ee8, 0x3c85, 0xf01706a8, 0xf01981e4, 0x0, 0x70) at mi_switch+0x210
ltsleep(0x0, 0x4, 0xf014f690, 0x0, 0x0, 0xf017d7bc) at ltsleep+0x24c
uvm_scheduler(0xf0195ee0, 0x1, 0xf0195c00, 0xf0142f50, 0xf0196000, 0xf01961e8) at uvm_scheduler+0x114

db> ps                                                          
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
 1515             173       1515          0 3     0x4             sshd  vnlock
 1514            1513       1513          0 3 0x100004               sh   netio
 1513            1511       1513          0 3  0x4084               sh    wait 
 1512            1510       1512          0 3 0x100114             cron nfsrcvl
 1511             197        197          0 3    0x84             cron  piperd 
 1510             197        197          0 3     0x4             cron  ppwait
 346                1        346          0 3  0x4006            getty  vnlock
 197                1        197          0 3     0x4             cron nfsrcvl
 189                1        189          0 3    0x84         ifwatchd   netio
 186                1        186          0 3    0x84            inetd   pause
 173                1        173          0 3     0x4             sshd  vnlock
 155                1        155          0 3     0x4             ntpd nfsrcvl
 78                 1         78          0 3     0x4          syslogd nfsrcvl
 11                 0          0          0 3 0x20204         aiodoned aiodone
 10                 0          0          0 3 0x20204          ioflush nfsrcvl
 9                  0          0          0 3 0x20204           reaper  reaper
 8                  0          0          0 3 0x20204       pagedaemon pgdaemo
 7                  0          0          0 3 0x20284            nfsio  nfsidl
 6                  0          0          0 3 0x20284            nfsio  nfsidl
 5                  0          0          0 3 0x20284            nfsio  nfsidl
 4                  0          0          0 3 0x20284            nfsio  nfsidl
 3                  0          0          0 3 0x20204         scsibus1  sccomp
 2                  0          0          0 3 0x20204         scsibus0  sccomp
 1                  0          1          0 3  0x4084             init    wait
 0                 -1          0          0 3 0x20204          swapper schedul
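
To make the process listing above easier to scan, here is a minimal
sketch that tallies processes per wait channel. It is not part of the
original report or of any NetBSD tool: the function name
count_wait_channels and the variable PS_OUTPUT are made up for
illustration, and the parser assumes that every row has eight
whitespace-separated fields with the wait channel last (so command
names contain no spaces and the WAIT column is never empty).

```python
from collections import Counter

# Rows copied from the ddb "ps" output in this report (header omitted).
PS_OUTPUT = """\
 1515             173       1515          0 3     0x4             sshd  vnlock
 1514            1513       1513          0 3 0x100004               sh   netio
 1513            1511       1513          0 3  0x4084               sh    wait
 1512            1510       1512          0 3 0x100114             cron nfsrcvl
 1511             197        197          0 3    0x84             cron  piperd
 1510             197        197          0 3     0x4             cron  ppwait
 346                1        346          0 3  0x4006            getty  vnlock
 197                1        197          0 3     0x4             cron nfsrcvl
 189                1        189          0 3    0x84         ifwatchd   netio
 186                1        186          0 3    0x84            inetd   pause
 173                1        173          0 3     0x4             sshd  vnlock
 155                1        155          0 3     0x4             ntpd nfsrcvl
 78                 1         78          0 3     0x4          syslogd nfsrcvl
 11                 0          0          0 3 0x20204         aiodoned aiodone
 10                 0          0          0 3 0x20204          ioflush nfsrcvl
 9                  0          0          0 3 0x20204           reaper  reaper
 8                  0          0          0 3 0x20204       pagedaemon pgdaemo
 7                  0          0          0 3 0x20284            nfsio  nfsidl
 6                  0          0          0 3 0x20284            nfsio  nfsidl
 5                  0          0          0 3 0x20284            nfsio  nfsidl
 4                  0          0          0 3 0x20284            nfsio  nfsidl
 3                  0          0          0 3 0x20204         scsibus1  sccomp
 2                  0          0          0 3 0x20204         scsibus0  sccomp
 1                  0          1          0 3  0x4084             init    wait
 0                 -1          0          0 3 0x20204          swapper schedul
"""


def count_wait_channels(ps_text):
    """Count processes per wait channel (last column of the listing)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header, and rows that do not have all
        # eight columns or do not start with a numeric PID.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        counts[fields[-1]] += 1
    return counts


# Print the channels, most common first.
for channel, n in count_wait_channels(PS_OUTPUT).most_common():
    print(f"{channel:8} {n}")
```

On the listing above this counts three processes in vnlock (both sshd
and the getty) and five in nfsrcvl (two cron, ntpd, syslogd, ioflush).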


>How-To-Repeat:
Run 1.6.1_RC1 for a while? Not sure.
The NFS server (running a system compiled from the same sources) seems to be
fine, so maybe being an NFS client is important here?

>Fix:
wish I had one...
>Release-Note:
>Audit-Trail:
>Unformatted: