NetBSD-Bugs archive
kern/56170: NFS-related: panic: lock error: Mutex: mutex_vector_enter,543: locking against myself
>Number: 56170
>Category: kern
>Synopsis: NFS+gcc-ASAN-related: panic: lock error: Mutex: mutex_vector_enter,543: locking against myself
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 14 20:45:00 +0000 2021
>Originator: Greg A. Woods
>Release: NetBSD 9.99.81
>Organization:
Planix, Inc.; Kelowna, BC; Canada
>Environment:
System: NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #16: Thu May 6 13:40:07 PDT 2021 woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
I've been trying out the GCC sanitizers on one of my recent
favourite little projects, and I've found that I can reliably
crash NetBSD with one of its tests when it is compiled with
USE_ASAN=yes, at least when it is run with $PWD on an NFS mount.
Here is the console output from an example crash:
[ 663.0426878] Mutex error: mutex_vector_enter,543: locking against myself
[ 663.0426878] lock address : 0xffffc8800b962b00
[ 663.0426878] current cpu : 1
[ 663.0426878] current lwp : 0xffffc8800b9db1c0
[ 663.0426878] owner field : 0xffffc8800b9db1c0 wait/spin: 0/0
[ 663.0426878] panic: lock error: Mutex: mutex_vector_enter,543: locking against myself: lock 0xffffc8800b00b9db1c0
[ 663.0426878] cpu1: Begin traceback...
[ 663.0426878] vpanic() at netbsd:vpanic+0x14a
[ 663.0426878] snprintf() at netbsd:snprintf
[ 663.0426878] lockdebug_abort() at netbsd:lockdebug_abort+0xcd
[ 663.0426878] mutex_vector_enter() at netbsd:mutex_vector_enter+0x406
[ 663.0426878] sigpending1() at netbsd:sigpending1+0x24
[ 663.0527222] nfs_sigintr() at netbsd:nfs_sigintr+0x2c
[ 663.0527222] nfs_rcvlock() at netbsd:nfs_rcvlock+0xaf
[ 663.0527222] nfs_request() at netbsd:nfs_request+0x40d
[ 663.0527222] nfs_access() at netbsd:nfs_access+0x1d4
[ 663.0527222] VOP_ACCESS() at netbsd:VOP_ACCESS+0x55
[ 663.0527222] getcwd_common() at netbsd:getcwd_common+0x251
[ 663.0527222] vnode_to_path() at netbsd:vnode_to_path+0xbb
[ 663.0527222] sysctl_vmproc() at netbsd:sysctl_vmproc+0x6cd
[ 663.0527222] sysctl_dispatch() at netbsd:sysctl_dispatch+0xa5
[ 663.0527222] sys___sysctl() at netbsd:sys___sysctl+0xc5
[ 663.0527222] syscall() at netbsd:syscall+0x9c
[ 663.0527222] --- syscall (number 202) ---
[ 663.0527222] netbsd:syscall+0x9c:
[ 663.0527222] cpu1: End traceback...
[ 663.0527222] fatal breakpoint trap in supervisor mode
[ 663.0527222] trap type 1 code 0 rip 0xffffffff8023e93d cs 0xe030 rflags 0x202 cr2 0x7f7ff6892ce0 ilevel
[ 663.0527222] curlwp 0xffffc8800b9db1c0 pid 6987.6987 lowest kstack 0xffffc880ef49a2c0
Stopped in pid 6987.6987 (yajl_test) at netbsd:breakpoint+0x5: leave
ds e650
es e600
fs e640
gs 10
rdi 0
rsi 1
rbp ffffc880ef49e640
rbx ffffffff80ed2f50 mutex_adaptive_lockops
rdx 2
rcx 0
rax 0
r8 ffffffff80ed2f50 mutex_adaptive_lockops
r9 1
r10 0
r11 fffffffe
r12 104
r13 ffffffff80d43960 ostype+0xa6448
r14 ffffc880ef49e688
r15 ffffffff80d3c46b ostype+0x9ef53
rip ffffffff8023e93d breakpoint+0x5
cs e030
rflags 202
rsp ffffc880ef49e640
ss e02b
netbsd:breakpoint+0x5: leave
db{1}> (XEN) [2021-05-14 18:09:45.682] Watchdog timer fired for domain 0
(XEN) [2021-05-14 18:09:45.682] Hardware Dom0 shutdown: watchdog rebooting machine
(I guess ddb.onpanic=1 and the Xen watchdog aren't very useful
together!)
>How-To-Repeat:
I don't yet have an isolated test case, but running the
regression tests in my robohack/yajl project, and in particular
the "ap_eof_str" test, with USE_ASAN=yes and with the source and
build on an NFS mount has reliably reproduced this crash for me.
(That NFS matters is only a guess, based on the nfs_*() calls in
the kernel stack backtrace.)
$ cd /some/NFS/mountpoint
$ git clone https://github.com/robohack/yajl
$ cd yajl
$ mkdir build
$ MAKEOBJDIRPREFIX=$(/bin/pwd)/build make regress USE_ASAN=yes MKDOC=no
If I understand correctly, the system call involved here is
sysctl(2), and it has something to do with proc too (note
sysctl_vmproc() in the traceback), but I'm quite unfamiliar with
ASAN runtime internals, so I don't know what it's doing to cause
this, especially since a couple of other tests have already run
by the time this one crashes. I do know that ASAN checks that
ASLR is not enabled, and that it mmap()s something really high
up in the address space, which fails unless you do
"ulimit -v unlimited" first.
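For anyone trying to reproduce this, the shell preparation
described above might be sketched as follows. The function name
prepare_asan_shell is hypothetical (it is not part of yajl), and
the security.pax.aslr.* sysctl nodes are assumed to be where
NetBSD reports its PaX ASLR state; on a system without them that
step is skipped silently.

```shell
# Hypothetical helper, not part of the yajl project.
prepare_asan_shell() {
    # ASAN's very high mmap() fails unless the virtual memory
    # limit is lifted for this shell and its children.
    if ! ulimit -v unlimited 2>/dev/null; then
        echo "cannot raise the virtual memory limit" >&2
        return 1
    fi
    # Assumption: PaX ASLR state lives under security.pax.aslr.*;
    # this only reports the values, it does not change them.
    if command -v sysctl >/dev/null 2>&1; then
        sysctl security.pax.aslr.enabled security.pax.aslr.global \
            2>/dev/null || true
    fi
    return 0
}
```

It would be run in the same shell before the "make regress" step
above, e.g. "prepare_asan_shell && MAKEOBJDIRPREFIX=... make
regress USE_ASAN=yes MKDOC=no".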
If necessary I can try in a domU, or disable the Xen watchdog
for the dom0 (otherwise I only have 20 seconds before the
reboot!), then reproduce the crash and do more DDB digging if
someone can guide me along. I can also change what's in
ddb.commandonenter.
>Fix:
>Unformatted:
2021-03-10T23:08:13Z