Subject: kern/35542: NFS rename(?) panics (panic: lockmgr: release of unlocked lock!)
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <arto@selonen.org>
List: netbsd-bugs
Date: 02/02/2007 08:05:00
>Number:         35542
>Category:       kern
>Synopsis:       NFS rename(?) panics (panic: lockmgr: release of unlocked lock!)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 02 08:05:00 +0000 2007
>Originator:     Arto Selonen
>Release:        NetBSD-current 4.99.9 ~20070201
>Organization:
>Environment:
NetBSD blah 4.99.9 NetBSD 4.99.9 (BLAH) #4: Thu Feb  1 16:13:51 EET 2007  blah@blah:/obj/sys/arch/i386/compile/BLAH i386

>Description:
The system is an NFS server serving two 1 TB partitions from a twelve-disk RAID array (3ware Escalade).

The system was upgraded on January 25th (the previous upgrade was on November 28th) and ran without problems for roughly a week. Then, on February 1st, it panicked ("panic: lockmgr: release of unlocked lock!"). Repeated reboots resulted in similar panics pretty much as soon as the network interface came up. Booting to single-user mode and turning NFS services off made the system stable (and the NFS disks inaccessible).

The system was then upgraded again on February 1st with whatever sources anoncvs gave, and NFS services were turned back on. After a reboot, it panicked again once the network interface came up.

At the moment, I don't have any network traces of possible client traffic, but I have a crash dump of the latest panic (taken with "db> reboot 0x104") and the following function call trace (just to give an idea of what is going on):

panic: lockmgr: release of unlocked lock!
Stopped in pid 542.1 (nfsd) at netbsd:cpu_Debugger
db> tr
cpu_Debugger
panic
lockmgr
nfs_unlock
VOP_UNLOCK
ufs_inactive
VOP_INACTIVE
vput
nfsrv_rename
nfssvc_nfsd
sys_nfssvc
syscall_plain
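
For readers unfamiliar with this class of panic, the invariant that the message describes can be sketched as follows. This is only a toy illustration, not NetBSD's lockmgr: the names toy_lock, toy_lock_acquire and toy_lock_release are hypothetical, and the sketch reports the error instead of panicking. It says nothing about which caller in the trace above is actually at fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy lock; NOT NetBSD's struct lock or lockmgr(). */
struct toy_lock {
	int holders;	/* number of current holders; 0 means unlocked */
};

/* Acquire: record one more holder. */
static void
toy_lock_acquire(struct toy_lock *lk)
{
	assert(lk != NULL);
	lk->holders++;
}

/*
 * Release: returns true on success. Returns false when the lock is not
 * held at all, which is the condition the panic message describes
 * (a kernel would panic here instead of returning).
 */
static bool
toy_lock_release(struct toy_lock *lk)
{
	assert(lk != NULL);
	if (lk->holders == 0) {
		fprintf(stderr, "toy_lock: release of unlocked lock\n");
		return false;
	}
	lk->holders--;
	return true;
}
```

In this sketch, one acquire followed by two releases makes the second release fail, which is the general shape of an unbalanced unlock.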

This is purely a guess, based on the trace and on simple string matching against commit messages from December-January: the panic might have something to do with, e.g., the changes below. Of course, I could be way off here, as I have no idea of their functional relevance.

http://mail-index.netbsd.org/source-changes/2006/12/27/0030.html
http://mail-index.netbsd.org/source-changes/2007/01/01/0030.html
http://mail-index.netbsd.org/source-changes/2007/01/07/0045.html
http://mail-index.netbsd.org/source-changes/2007/01/07/0046.html

I have the following crash dump available:

-rw-------    1 root     wheel    10021802 Feb  2 09:13 netbsd.2.core.gz
-rw-------    1 root     wheel     1732192 Feb  2 09:13 netbsd.2.gz

Due to privacy issues, I cannot provide those files, but I'm willing to follow instructions on how to examine them, if needed.

Kernel config has not been touched in over a year:

include         "arch/i386/conf/std.i386"
options         INCLUDE_CONFIG_FILE
maxusers        32
options         I686_CPU
options         VM86
options         MTRR
options         INSECURE
options         RTC_OFFSET=0
options         NTP
options         KTRACE
options         SYSTRACE
options         SYSVMSG
options         SYSVSEM
options         SYSVSHM
options         P1003_1B_SEMAPHORE
options         NMBCLUSTERS=16384
options         LKM
options         USERCONF
options         BEEP_ONHALT
options         DIAGNOSTIC
options         DEBUG
options         KMEMSTATS
options         DDB
options         DDB_ONPANIC=1
options         DDB_HISTORY_SIZE=512
makeoptions     DEBUG="-g"
options         COMPAT_16
options         COMPAT_BSDPTY
file-system     FFS
file-system     EXT2FS
file-system     LFS
file-system     MFS
file-system     NFS
file-system     CD9660
file-system     MSDOSFS
file-system     KERNFS
file-system     NULLFS
file-system     OVERLAY
file-system     PORTAL
file-system     PROCFS
file-system     UMAPFS
file-system     UNION
options         QUOTA
options         SOFTDEP
options         NFSSERVER
options         GATEWAY
options         INET
options         IPSEC
options         IPSEC_ESP
options         PPP_BSDCOMP
options         PPP_DEFLATE
options         PPP_FILTER
options         PFIL_HOOKS
options         IPFILTER_LOG
options         IPFILTER_DEFAULT_BLOCK
options         MIIVERBOSE
options         PCIVERBOSE
options         USBVERBOSE
options         PNPBIOSVERBOSE
options         WSEMUL_VT100
options         WS_KERNEL_FG=WSCOL_GREEN
options         WSDISPLAY_COMPAT_PCVT
options         WSDISPLAY_COMPAT_SYSCONS
options         WSDISPLAY_COMPAT_USL
options         WSDISPLAY_COMPAT_RAWKBD
options         PCKBD_LAYOUT="(KB_SV | KB_NODEAD)"
options         PCDISPLAY_SOFTCURSOR
<skipped devices>
pseudo-device   crypto
pseudo-device   md              1
pseudo-device   vnd             4
pseudo-device   bpfilter        8
pseudo-device   ipfilter
pseudo-device   loop
pseudo-device   ppp             8
pseudo-device   tap
pseudo-device   tun             2
pseudo-device   gif             4
pseudo-device   vlan
pseudo-device   pty
pseudo-device   rnd
pseudo-device   clockctl
pseudo-device   wsmux
pseudo-device   wsfont
pseudo-device   ksyms

I can provide dmesg output if needed.

Anything else I could provide or test to help get this problem fixed?
>How-To-Repeat:
At the moment, this is very repeatable, as the system goes down as soon as I get it up. I have no idea of the cause, so I don't know whether it will stay repeatable (for example, if an NFS client is sending bad data, it may at some point stop sending it).
>Fix:
Turn off NFS services to keep the system up. No known fix for keeping NFS services going, though.