NetBSD-Bugs archive


Re: kern/56170: NFS-related: panic: lock error: Mutex: mutex_vector_enter,543: locking against myself



The following reply was made to PR kern/56170; it has been noted by GNATS.

From: Christos Zoulas <christos%zoulas.com@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/56170: NFS-related: panic: lock error: Mutex:
 mutex_vector_enter,543: locking against myself
Date: Fri, 14 May 2021 17:24:29 -0400

 See https://www.netbsd.org/~christos/nfs.diff for a disgusting hack I am
 using to avoid this.
 
 christos
 
 > On May 14, 2021, at 4:45 PM, Greg A. Woods <woods%planix.ca@localhost> wrote:
 >
 >> Number:         56170
 >> Category:       kern
 >> Synopsis:       NFS+gcc-ASAN-related: panic: lock error: Mutex: mutex_vector_enter,543: locking against myself
 >> Confidential:   no
 >> Severity:       serious
 >> Priority:       medium
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Fri May 14 20:45:00 +0000 2021
 >> Originator:     Greg A. Woods
 >> Release:        NetBSD 9.99.81
 >> Organization:
 > Planix, Inc.; Kelowna, BC; Canada
 >> Environment:
 > System: NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #16: Thu May 6 13:40:07 PDT 2021 woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
 > Architecture: x86_64
 > Machine: amd64
 >> Description:
 >
 > 	I've been trying out the GCC sanitizers on one of my recently
 > 	favourite little projects, and I've found I can reliably crash
 > 	NetBSD with one of the tests, when it is compiled with
 > 	USE_ASAN=yes, at least when it is run with $PWD on an NFS mount.
 >
 > 	Here is the console output from an example crash:
 >
 >
 > [ 663.0426878] Mutex error: mutex_vector_enter,543: locking against myself
 >
 > [ 663.0426878] lock address : 0xffffc8800b962b00
 > [ 663.0426878] current cpu  :                  1
 > [ 663.0426878] current lwp  : 0xffffc8800b9db1c0
 > [ 663.0426878] owner field  : 0xffffc8800b9db1c0 wait/spin:                0/0
 >
 > [ 663.0426878] panic: lock error: Mutex: mutex_vector_enter,543: locking against myself: lock 0xffffc8800b00b9db1c0
 > [ 663.0426878] cpu1: Begin traceback...
 > [ 663.0426878] vpanic() at netbsd:vpanic+0x14a
 > [ 663.0426878] snprintf() at netbsd:snprintf
 > [ 663.0426878] lockdebug_abort() at netbsd:lockdebug_abort+0xcd
 > [ 663.0426878] mutex_vector_enter() at netbsd:mutex_vector_enter+0x406
 > [ 663.0426878] sigpending1() at netbsd:sigpending1+0x24
 > [ 663.0527222] nfs_sigintr() at netbsd:nfs_sigintr+0x2c
 > [ 663.0527222] nfs_rcvlock() at netbsd:nfs_rcvlock+0xaf
 > [ 663.0527222] nfs_request() at netbsd:nfs_request+0x40d
 > [ 663.0527222] nfs_access() at netbsd:nfs_access+0x1d4
 > [ 663.0527222] VOP_ACCESS() at netbsd:VOP_ACCESS+0x55
 > [ 663.0527222] getcwd_common() at netbsd:getcwd_common+0x251
 > [ 663.0527222] vnode_to_path() at netbsd:vnode_to_path+0xbb
 > [ 663.0527222] sysctl_vmproc() at netbsd:sysctl_vmproc+0x6cd
 > [ 663.0527222] sysctl_dispatch() at netbsd:sysctl_dispatch+0xa5
 > [ 663.0527222] sys___sysctl() at netbsd:sys___sysctl+0xc5
 > [ 663.0527222] syscall() at netbsd:syscall+0x9c
 > [ 663.0527222] --- syscall (number 202) ---
 > [ 663.0527222] netbsd:syscall+0x9c:
 > [ 663.0527222] cpu1: End traceback...
 > [ 663.0527222] fatal breakpoint trap in supervisor mode
 > [ 663.0527222] trap type 1 code 0 rip 0xffffffff8023e93d cs 0xe030 rflags 0x202 cr2 0x7f7ff6892ce0 ilevel
 >
 > [ 663.0527222] curlwp 0xffffc8800b9db1c0 pid 6987.6987 lowest kstack 0xffffc880ef49a2c0
 > Stopped in pid 6987.6987 (yajl_test) at netbsd:breakpoint+0x5:  leave
 > ds          e650
 > es          e600
 > fs          e640
 > gs          10
 > rdi         0
 > rsi         1
 > rbp         ffffc880ef49e640
 > rbx         ffffffff80ed2f50    mutex_adaptive_lockops
 > rdx         2
 > rcx         0
 > rax         0
 > r8          ffffffff80ed2f50    mutex_adaptive_lockops
 > r9          1
 > r10         0
 > r11         fffffffe
 > r12         104
 > r13         ffffffff80d43960    ostype+0xa6448
 > r14         ffffc880ef49e688
 > r15         ffffffff80d3c46b    ostype+0x9ef53
 > rip         ffffffff8023e93d    breakpoint+0x5
 > cs          e030
 > rflags      202
 > rsp         ffffc880ef49e640
 > ss          e02b
 > netbsd:breakpoint+0x5:  leave
 > db{1}> (XEN) [2021-05-14 18:09:45.682] Watchdog timer fired for domain 0
 > (XEN) [2021-05-14 18:09:45.682] Hardware Dom0 shutdown: watchdog rebooting machine
 >
 > 	(I guess ddb.onpanic=1 and the Xen watchdog aren't very useful
 > 	together!)
 >
 >
 >> How-To-Repeat:
 >
 > 	I don't yet have an isolated example test, but running the
 > 	regression tests in my robohack/yajl project, and in particular
 > 	the "ap_eof_str" test, with USE_ASAN=yes and with the source and
 > 	build on an NFS mount (which I'm only guessing about because of
 > 	the nfs_*() calls in the kernel stack backtrace), has reliably
 > 	reproduced this crash for me:
 >
 > 	$ cd /some/NFS/mountpoint
 > 	$ git clone https://github.com/robohack/yajl
 > 	$ cd yajl
 > 	$ mkdir build
 > 	$ MAKEOBJDIRPREFIX=$(/bin/pwd)/build make regress USE_ASAN=yes MKDOC=no
 >
 > 	If I understand correctly the system call involved here is
 > 	sysctl(2), and that there's something to do with proc too, but
 > 	I'm quite unfamiliar with ASAN runtime internals so I don't know
 > 	what it's doing to cause this, especially since a couple of
 > 	other tests have already run when this one crashes.  I do know
 > 	that ASAN will check to make sure ASLR is not enabled, and it
 > 	will also mmap() something somewhere really high up and it fails
 > 	unless you do "ulimit -v unlimited" first.
 >
 > 	If necessary I can try in a domU, or disable the Xen watchdog
 > 	for the dom0 (as otherwise I only have 20 seconds before the
 > 	reboot!), and try the crash again and do more DDB digging if
 > 	someone can guide me along.  And/Or I can change what's in
 > 	ddb.commandonenter too...
 >
 >> Fix:
 >
 >> Unformatted:
 > 		2021-03-10T23:08:13Z
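
 For readers unfamiliar with the "locking against myself" diagnostic in
 the console output quoted above: the lock's owner field and the current
 lwp are the same address, i.e. the LWP that called mutex_enter() from
 sigpending1() already held that mutex. The following is only a toy
 user-space model of that check in Python, not kernel code. The class and
 function names (OwnerTrackingMutex, signal_check, outer_request) are
 hypothetical, and the trace does not say which frame took the lock
 first, so outer_request() merely stands in for "some caller further up
 the chain".

```python
import threading


class SelfLockError(RuntimeError):
    """Raised when a thread tries to re-acquire a mutex it already owns."""


class OwnerTrackingMutex:
    """Toy model of a non-recursive mutex that records its owner."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.owner = None  # ident of the owning thread, or None if free

    def enter(self):
        me = threading.get_ident()
        # Only this thread can ever store its own ident in self.owner,
        # so the comparison is safe without holding the lock.
        if self.owner == me:
            raise SelfLockError(f"{self.name}: locking against myself")
        self._lock.acquire()
        self.owner = me

    def exit(self):
        if self.owner != threading.get_ident():
            raise RuntimeError(f"{self.name}: exit by non-owner")
        self.owner = None
        self._lock.release()


def signal_check(lock):
    """Innermost step: takes the lock briefly to inspect some state."""
    lock.enter()
    try:
        return False  # pretend no signal is pending
    finally:
        lock.exit()


def outer_request(lock):
    """Hypothetical caller that holds the lock while calling further down."""
    lock.enter()
    try:
        return signal_check(lock)  # re-enters the same lock
    finally:
        lock.exit()


if __name__ == "__main__":
    p_lock = OwnerTrackingMutex("p_lock")
    print(signal_check(p_lock))      # fine when the lock is not already held
    try:
        outer_request(p_lock)
    except SelfLockError as err:
        print(err)
```

 Without the owner check, the second enter() would simply block forever
 on a lock the thread itself holds, which is why a debugging kernel
 prefers to panic at that point.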
 
 
 

