Subject: kern/30401: NFS/vnode lockup
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <kardel@Orcus.project.Acrys.COM>
List: netbsd-bugs
Date: 06/02/2005 10:06:01
>Number: 30401
>Category: kern
>Synopsis: FS lockup in NFS server
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jun 02 10:06:00 +0000 2005
>Originator: Frank Kardel
>Release: NetBSD 2.0.2
>Organization:
>Environment:
System: NetBSD Orcus 2.0.2 NetBSD 2.0.2 (ORCUS32) #1: Wed Jun 1 07:55:53 CEST 2005 kardel@Orcus:/usr/obj/sys/arch/i386/compile.i386/ORCUS32 i386
Architecture: i386
Machine: i386
>Description:
Between once a day and twice a week, our central NFS server locks up
with many processes waiting on vnlock and NFS no longer working.
Rebooting only works with the "CPU on fire" option, as unmounting
of the affected file system cannot proceed either.
I do have a 2G core, but only a kernel without debugging
symbols.
Information on related / used subsystems:
Disk: std. IDE
RAID 1 via raidframe
UFS 1 FS
export via loopback mount
Some Linux clients
Problem looks similar to PR#30077 but hits our
production machine.
>How-To-Repeat:
Use the setup above. Wait a bit (usually until you
need to do something with time constraints).
Lockup!
After killing most unneeded programs in order to unmount as
many file systems as possible, ps output looks like this:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 0 0 0 -18 0 0 377336 schedule DKs ?? 0:00.08 [swapper]
0 1 0 3 10 0 88 696 wait Is ?? 0:00.06 init
0 2 0 0 14 0 0 377336 crypto_w DK ?? 0:00.00 [cryptoret]
0 3 0 0 10 0 0 377336 usbevt DK ?? 0:00.00 [usb0]
0 4 0 0 10 0 0 377336 usbtsk DK ?? 0:00.00 [usbtask]
0 5 0 0 10 0 0 377336 usbevt DK ?? 0:00.00 [usb1]
0 6 0 0 -6 0 0 377336 sccomp DK ?? 0:00.00 [scsibus0]
0 7 0 0 -6 0 0 377336 sccomp DK ?? 0:00.01 [scsibus1]
0 8 0 0 -6 0 0 377336 atath DK ?? 0:00.00 [atabus0]
0 9 0 0 -6 0 0 377336 atath DK ?? 0:00.01 [atabus1]
0 10 0 0 -6 0 0 377336 atath DK ?? 0:00.01 [atabus2]
0 11 0 0 -6 0 0 377336 atath DK ?? 0:00.01 [atabus3]
0 12 0 0 -6 0 0 377336 atath DK ?? 0:00.00 [atabus4]
0 13 0 0 -6 0 0 377336 atath DK ?? 0:00.00 [atabus5]
0 14 0 0 -6 0 0 377336 atath DK ?? 0:00.00 [atabus6]
0 15 0 0 -6 0 0 377336 atath DK ?? 0:00.00 [atabus7]
0 16 0 0 -6 0 0 377336 atath DK ?? 0:00.01 [atabus8]
0 17 0 0 -6 0 0 377336 atath DK ?? 0:00.01 [atabus9]
0 18 0 0 10 0 0 377336 pmsreset DK ?? 0:00.00 [pms0]
0 19 0 0 -18 0 0 377336 lfswrite DK ?? 0:00.00 [lfs_writer]
0 20 0 0 -18 0 0 377336 pgdaemon DK ?? 0:07.18 [pagedaemon]
0 21 0 0 18 0 0 377336 syncer DK ?? 3:05.06 [ioflush]
0 22 0 0 -18 0 0 377336 aiodoned DK ?? 0:04.64 [aiodoned]
0 30 0 0 -6 0 0 377336 rfwcond DK ?? 0:04.23 [raid0]
0 31 0 0 -6 0 0 377336 raidiow DK ?? 0:01.16 [raidio0]
0 57 0 0 -6 0 0 377336 rfwcond DK ?? 0:06.77 [raid1]
0 58 0 0 -6 0 0 377336 raidiow DK ?? 0:01.51 [raidio1]
0 61 0 0 -6 0 0 377336 rfwcond DK ?? 0:04.06 [raid2]
0 96 0 0 -6 0 0 377336 raidiow DK ?? 0:01.41 [raidio2]
0 98 0 0 -6 0 0 377336 rfwcond DK ?? 0:00.00 [raid3]
0 99 0 0 -6 0 0 377336 raidiow DK ?? 0:00.00 [raidio3]
0 156 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 218 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 219 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 225 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 244 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 252 0 0 10 0 0 377336 nfsidl IK ?? 0:00.03 [nfsio]
0 563 1 0 10 0 200 232 mfsidl Is ?? 0:00.49 mount_mfs -s 2048000 /dev/wd0b /tmp
0 693 1 0 -2 0 44 520 vnlock DL ?? 0:03.75 nfsd: server
0 797 1 0 -2 0 44 520 vnlock DL ?? 0:04.15 nfsd: server
0 806 1 0 -2 0 476 700 vnlock Ds ?? 0:00.09 /usr/sbin/mountd
0 823 1 0 -2 0 44 520 vnlock DL ?? 0:03.99 nfsd: server
0 888 1 0 -18 0 44 520 uvn_fp2 DL ?? 0:03.07 nfsd: server
0 2023 1 0 -2 0 2624 1676 vnlock DWEsa ?? 16:47.14 (bacula-fd)
1003 4920 1 0 -2 0 492 4 vnlock DWs ?? 0:00.01 sshd: kardel [priv]
1003 5579 1 0 -2 0 492 4 vnlock DWs ?? 0:00.01 sshd: kardel [priv]
16 6866 23119 0 -22 0 0 0 - ZW ?? 0:00.00 (sshd)
0 10975 1 9 -2 0 168 4 vnlock DW ?? 0:03.32 find / ( ! -fstype local -o -fstype rdonly -o -fstype fdesc -o -fstype null -o -fstype kernfs -o -fstype procfs ) -prune -o -name lost+found -prune -o ( -name *.core -o -name core ) -type f -print
1003 14958 1 0 -2 0 492 4 vnlock DWs ?? 0:00.01 sshd: kardel [priv]
16 19367 5579 0 -22 0 0 0 - ZW ?? 0:00.00 (sshd)
16 20254 14958 0 -22 0 0 0 - ZW ?? 0:00.00 (sshd)
1003 23119 1 0 -2 0 492 2088 vnlock Ds ?? 0:00.01 sshd: kardel [priv]
16 25303 4920 0 -22 0 0 0 - ZW ?? 0:00.00 (sshd)
0 23854 1 0 10 0 136 640 wait Ss E0 0:00.00 -sh
0 26458 23854 0 28 0 100 616 - R+ E0 0:00.00 ps alxww
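To see at a glance which wait channels dominate a listing like the one above, the WCHAN column can be tallied. The following is only a rough sketch, not part of the original report: the helper name count_wait_channels is hypothetical, the sample is a four-line excerpt of the listing, and it assumes every row has a non-empty WCHAN field (ps prints "-" when there is none) so that whitespace splitting lines up with the header.

```python
from collections import Counter


def count_wait_channels(ps_text):
    """Tally the WCHAN column of `ps alxww`-style output.

    Assumes the first non-empty line is the header and that no field
    before COMMAND contains whitespace (hypothetical helper).
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    wchan_idx = header.index("WCHAN")
    counts = Counter()
    for line in lines[1:]:
        # Split into at most len(header) fields so COMMAND keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= wchan_idx:
            continue  # skip truncated or malformed rows
        counts[fields[wchan_idx]] += 1
    return counts


# Excerpt of the listing above, used as sample input.
sample = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND
0 693 1 0 -2 0 44 520 vnlock DL ?? 0:03.75 nfsd: server
0 806 1 0 -2 0 476 700 vnlock Ds ?? 0:00.09 /usr/sbin/mountd
0 888 1 0 -18 0 44 520 uvn_fp2 DL ?? 0:03.07 nfsd: server
0 23854 1 0 10 0 136 640 wait Ss E0 0:00.00 -sh
"""

# Print wait channels, most frequent first.
for wchan, n in count_wait_channels(sample).most_common():
    print(f"{n:3d} {wchan}")
```

Feeding it the complete listing instead of the excerpt would show how many processes are stuck on vnlock compared with the idle kernel threads.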
>Fix:
not known
I'll try to get rid of the loopback mounts in order to avoid
possible trouble from there.