Subject: kern/30401: NFS/vnode lockup
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <kardel@Orcus.project.Acrys.COM>
List: netbsd-bugs
Date: 06/02/2005 10:06:01
>Number:         30401
>Category:       kern
>Synopsis:       FS lockup in NFS server
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 02 10:06:00 +0000 2005
>Originator:     Frank Kardel
>Release:        NetBSD 2.0.2
>Organization:
>Environment:
System: NetBSD Orcus 2.0.2 NetBSD 2.0.2 (ORCUS32) #1: Wed Jun 1 07:55:53 CEST 2005 kardel@Orcus:/usr/obj/sys/arch/i386/compile.i386/ORCUS32 i386
Architecture: i386
Machine: i386
>Description:
	About once a day to twice a week our central NFS server locks up
	with many processes waiting on vnlock and NFS no longer working.
	Rebooting only works with the "CPU on fire" options, as unmounting
	of the affected file system cannot proceed either.
	I do have a 2G core, but only from a kernel without debugging
	symbols.
	Information on related / used subsystems:
		Disk: std. IDE
		RAID 1 via raidframe
		UFS 1 FS
		export via loopback mount
		Some Linux clients

	Problem looks similar to PR#30077 but hits our
	production machine.

>How-To-Repeat:
	Use the setup above. Wait a bit (usually until you
	need to do something with time constraints).
	Lockup!

	ps output, after killing most unneeded programs in order to
	unmount as many file systems as possible, looks like this:

 UID   PID  PPID CPU PRI NI  VSZ    RSS WCHAN    STAT  TT     TIME COMMAND
   0     0     0   0 -18  0    0 377336 schedule DKs   ??  0:00.08 [swapper]
   0     1     0   3  10  0   88    696 wait     Is    ??  0:00.06 init 
   0     2     0   0  14  0    0 377336 crypto_w DK    ??  0:00.00 [cryptoret]
   0     3     0   0  10  0    0 377336 usbevt   DK    ??  0:00.00 [usb0]
   0     4     0   0  10  0    0 377336 usbtsk   DK    ??  0:00.00 [usbtask]
   0     5     0   0  10  0    0 377336 usbevt   DK    ??  0:00.00 [usb1]
   0     6     0   0  -6  0    0 377336 sccomp   DK    ??  0:00.00 [scsibus0]
   0     7     0   0  -6  0    0 377336 sccomp   DK    ??  0:00.01 [scsibus1]
   0     8     0   0  -6  0    0 377336 atath    DK    ??  0:00.00 [atabus0]
   0     9     0   0  -6  0    0 377336 atath    DK    ??  0:00.01 [atabus1]
   0    10     0   0  -6  0    0 377336 atath    DK    ??  0:00.01 [atabus2]
   0    11     0   0  -6  0    0 377336 atath    DK    ??  0:00.01 [atabus3]
   0    12     0   0  -6  0    0 377336 atath    DK    ??  0:00.00 [atabus4]
   0    13     0   0  -6  0    0 377336 atath    DK    ??  0:00.00 [atabus5]
   0    14     0   0  -6  0    0 377336 atath    DK    ??  0:00.00 [atabus6]
   0    15     0   0  -6  0    0 377336 atath    DK    ??  0:00.00 [atabus7]
   0    16     0   0  -6  0    0 377336 atath    DK    ??  0:00.01 [atabus8]
   0    17     0   0  -6  0    0 377336 atath    DK    ??  0:00.01 [atabus9]
   0    18     0   0  10  0    0 377336 pmsreset DK    ??  0:00.00 [pms0]
   0    19     0   0 -18  0    0 377336 lfswrite DK    ??  0:00.00 [lfs_writer]
   0    20     0   0 -18  0    0 377336 pgdaemon DK    ??  0:07.18 [pagedaemon]
   0    21     0   0  18  0    0 377336 syncer   DK    ??  3:05.06 [ioflush]
   0    22     0   0 -18  0    0 377336 aiodoned DK    ??  0:04.64 [aiodoned]
   0    30     0   0  -6  0    0 377336 rfwcond  DK    ??  0:04.23 [raid0]
   0    31     0   0  -6  0    0 377336 raidiow  DK    ??  0:01.16 [raidio0]
   0    57     0   0  -6  0    0 377336 rfwcond  DK    ??  0:06.77 [raid1]
   0    58     0   0  -6  0    0 377336 raidiow  DK    ??  0:01.51 [raidio1]
   0    61     0   0  -6  0    0 377336 rfwcond  DK    ??  0:04.06 [raid2]
   0    96     0   0  -6  0    0 377336 raidiow  DK    ??  0:01.41 [raidio2]
   0    98     0   0  -6  0    0 377336 rfwcond  DK    ??  0:00.00 [raid3]
   0    99     0   0  -6  0    0 377336 raidiow  DK    ??  0:00.00 [raidio3]
   0   156     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   218     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   219     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   225     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   244     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   252     0   0  10  0    0 377336 nfsidl   IK    ??  0:00.03 [nfsio]
   0   563     1   0  10  0  200    232 mfsidl   Is    ??  0:00.49 mount_mfs -s 2048000 /dev/wd0b /tmp 
   0   693     1   0  -2  0   44    520 vnlock   DL    ??  0:03.75 nfsd: server 
   0   797     1   0  -2  0   44    520 vnlock   DL    ??  0:04.15 nfsd: server 
   0   806     1   0  -2  0  476    700 vnlock   Ds    ??  0:00.09 /usr/sbin/mountd 
   0   823     1   0  -2  0   44    520 vnlock   DL    ??  0:03.99 nfsd: server 
   0   888     1   0 -18  0   44    520 uvn_fp2  DL    ??  0:03.07 nfsd: server 
   0  2023     1   0  -2  0 2624   1676 vnlock   DWEsa ?? 16:47.14 (bacula-fd)
1003  4920     1   0  -2  0  492      4 vnlock   DWs   ??  0:00.01 sshd: kardel [priv] 
1003  5579     1   0  -2  0  492      4 vnlock   DWs   ??  0:00.01 sshd: kardel [priv] 
  16  6866 23119   0 -22  0    0      0 -        ZW    ??  0:00.00 (sshd)
   0 10975     1   9  -2  0  168      4 vnlock   DW    ??  0:03.32 find / ( ! -fstype local -o -fstype rdonly -o -fstype fdesc -o -fstype null -o -fstype kernfs -o -fstype procfs ) -prune -o -name lost+found -prune -o ( -name *.core -o -name core ) -type f -print 
1003 14958     1   0  -2  0  492      4 vnlock   DWs   ??  0:00.01 sshd: kardel [priv] 
  16 19367  5579   0 -22  0    0      0 -        ZW    ??  0:00.00 (sshd)
  16 20254 14958   0 -22  0    0      0 -        ZW    ??  0:00.00 (sshd)
1003 23119     1   0  -2  0  492   2088 vnlock   Ds    ??  0:00.01 sshd: kardel [priv] 
  16 25303  4920   0 -22  0    0      0 -        ZW    ??  0:00.00 (sshd)
   0 23854     1   0  10  0  136    640 wait     Ss    E0  0:00.00 -sh 
   0 26458 23854   0  28  0  100    616 -        R+    E0  0:00.00 ps alxww 
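
To make the wait-channel picture in a listing like the one above easier to read, the WCHAN column can be tallied. The following is only a minimal sketch, not part of the original report: the helper name tally_wchan is hypothetical, and it assumes the column layout of the "ps alxww" output shown here (WCHAN as the ninth whitespace-separated field, COMMAND last).

```python
from collections import Counter

def tally_wchan(ps_text):
    """Count processes per wait channel in 'ps alxww'-style output.

    Assumes the columns UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT
    TIME COMMAND, so WCHAN is field index 8 (hypothetical helper).
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # Split into at most 13 fields so COMMAND keeps its spaces.
        fields = line.split(None, 12)
        if len(fields) < 13 or fields[0] == "UID":
            continue  # skip the header and blank or short lines
        counts[fields[8]] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Example: ps alxww | python tally_wchan.py
    for wchan, n in tally_wchan(sys.stdin.read()).most_common():
        print(f"{n:4d} {wchan}")
```

Fed the listing above, the vnlock entry would cover the processes stuck on vnode locks (the nfsd servers, mountd, sshd, find and bacula-fd).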
>Fix:
	Not known.
	I'll try to get rid of the loopback mounts in order to avoid
	possible trouble from there.