https://www.netbsd.org/~christos/nfs.diff for a disgusting hack I am using to avoid this.

christos

> On May 14, 2021, at 4:45 PM, Greg A. Woods <woods%planix.ca@localhost> wrote:
>
>> Number: 56170
>> Category: kern
>> Synopsis: NFS+gcc-ASAN-related: panic: lock error: Mutex: mutex_vector_enter,543: locking against myself
>> Confidential: no
>> Severity: serious
>> Priority: medium
>> Responsible: kern-bug-people
>> State: open
>> Class: sw-bug
>> Submitter-Id: net
>> Arrival-Date: Fri May 14 20:45:00 +0000 2021
>> Originator: Greg A. Woods
>> Release: NetBSD 9.99.81
>> Organization:
> Planix, Inc.; Kelowna, BC; Canada
>> Environment:
> System: NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #16: Thu May 6 13:40:07 PDT 2021 woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
> Architecture: x86_64
> Machine: amd64
>> Description:
>
> I've been trying out the GCC sanitizers on one of my recently
> favourite little projects, and I've found I can reliably crash
> NetBSD with one of the tests, when it is compiled with
> USE_ASAN=yes, at least when it is run with $PWD on an NFS mount.
>
> Here is the console output from an example crash:
>
>
> [ 663.0426878] Mutex error: mutex_vector_enter,543: locking against myself
>
> [ 663.0426878] lock address : 0xffffc8800b962b00
> [ 663.0426878] current cpu : 1
> [ 663.0426878] current lwp : 0xffffc8800b9db1c0
> [ 663.0426878] owner field : 0xffffc8800b9db1c0 wait/spin: 0/0
>
> [ 663.0426878] panic: lock error: Mutex: mutex_vector_enter,543: locking against myself: lock 0xffffc8800b00b9db1c0
> [ 663.0426878] cpu1: Begin traceback...
> [ 663.0426878] vpanic() at netbsd:vpanic+0x14a
> [ 663.0426878] snprintf() at netbsd:snprintf
> [ 663.0426878] lockdebug_abort() at netbsd:lockdebug_abort+0xcd
> [ 663.0426878] mutex_vector_enter() at netbsd:mutex_vector_enter+0x406
> [ 663.0426878] sigpending1() at netbsd:sigpending1+0x24
> [ 663.0527222] nfs_sigintr() at netbsd:nfs_sigintr+0x2c
> [ 663.0527222] nfs_rcvlock() at netbsd:nfs_rcvlock+0xaf
> [ 663.0527222] nfs_request() at netbsd:nfs_request+0x40d
> [ 663.0527222] nfs_access() at netbsd:nfs_access+0x1d4
> [ 663.0527222] VOP_ACCESS() at netbsd:VOP_ACCESS+0x55
> [ 663.0527222] getcwd_common() at netbsd:getcwd_common+0x251
> [ 663.0527222] vnode_to_path() at netbsd:vnode_to_path+0xbb
> [ 663.0527222] sysctl_vmproc() at netbsd:sysctl_vmproc+0x6cd
> [ 663.0527222] sysctl_dispatch() at netbsd:sysctl_dispatch+0xa5
> [ 663.0527222] sys___sysctl() at netbsd:sys___sysctl+0xc5
> [ 663.0527222] syscall() at netbsd:syscall+0x9c
> [ 663.0527222] --- syscall (number 202) ---
> [ 663.0527222] netbsd:syscall+0x9c:
> [ 663.0527222] cpu1: End traceback...
> [ 663.0527222] fatal breakpoint trap in supervisor mode
> [ 663.0527222] trap type 1 code 0 rip 0xffffffff8023e93d cs 0xe030 rflags 0x202 cr2 0x7f7ff6892ce0 ilevel
>
> [ 663.0527222] curlwp 0xffffc8800b9db1c0 pid 6987.6987 lowest kstack 0xffffc880ef49a2c0
> Stopped in pid 6987.6987 (yajl_test) at netbsd:breakpoint+0x5: leave
> ds e650
> es e600
> fs e640
> gs 10
> rdi 0
> rsi 1
> rbp ffffc880ef49e640
> rbx ffffffff80ed2f50 mutex_adaptive_lockops
> rdx 2
> rcx 0
> rax 0
> r8 ffffffff80ed2f50 mutex_adaptive_lockops
> r9 1
> r10 0
> r11 fffffffe
> r12 104
> r13 ffffffff80d43960 ostype+0xa6448
> r14 ffffc880ef49e688
> r15 ffffffff80d3c46b ostype+0x9ef53
> rip ffffffff8023e93d breakpoint+0x5
> cs e030
> rflags 202
> rsp ffffc880ef49e640
> ss e02b
> netbsd:breakpoint+0x5: leave
> db{1}> (XEN) [2021-05-14 18:09:45.682] Watchdog timer fired for domain 0
> (XEN) [2021-05-14 18:09:45.682] Hardware Dom0 shutdown: watchdog rebooting machine
>
> (I guess ddb.onpanic=1 and the Xen watchdog aren't very useful
> together!)
>
>
>> How-To-Repeat:
>
> I don't yet have an isolated example test, but running the
> regression tests in my robohack/yajl project, and in particular
> the "ap_eof_str" test, with USE_ASAN=yes and with the source and
> build on an NFS mount (which I'm only guessing about because of
> the nfs_*() calls in the kernel stack backtrace), has reliably
> reproduced this crash for me:
>
> $ cd /some/NFS/mountpoint
> $ git clone https://github.com/robohack/yajl
> $ cd yajl
> $ mkdir build
> $ MAKEOBJDIRPREFIX=$(/bin/pwd)/build make regress USE_ASAN=yes MKDOC=no
>
> If I understand correctly the system call involved here is
> sysctl(2), and that there's something to do with proc too, but
> I'm quite unfamiliar with ASAN runtime internals so I don't know
> what it's doing to cause this, especially since a couple of
> other tests have already run when this one crashes.  I do know
> that ASAN will check to make sure ASLR is not enabled, and it
> will also mmap() something somewhere really high up and it fails
> unless you do "ulimit -v unlimited" first.
>
> If necessary I can try in a domU, or disable the Xen watchdog
> for the dom0 (as otherwise I only have 20 seconds before the
> reboot!), and try the crash again and do more DDB digging if
> someone can guide me along. And/Or I can change what's in
> ddb.commandonenter too...
>
>> Fix:
>
>> Unformatted:
> 2021-03-10T23:08:13Z