NetBSD-Bugs archive


port-powerpc/56941: process lock owned by another process?



>Number:         56941
>Category:       port-powerpc
>Synopsis:       process lock owned by another process?
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-powerpc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 26 13:45:00 +0000 2022
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.99
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gethsemane.aprisoft.de 9.99.99 NetBSD 9.99.99 (GETHSEMANE) #206: Tue Jul 26 10:53:25 CEST 2022 martin%seven-days-to-the-wolves.aprisoft.de@localhost:/work/src/sys/arch/macppc/compile/GETHSEMANE macppc
Architecture: powerpc
Machine: macppc
>Description:

I got a confusing panic while running ATF tests:

[ 13806.9272513] Mutex error: mutex_vector_exit,755: assertion failed: MUTEX_OWNER(mtx->mtx_owner) == curthread

[ 13806.9472647] lock address : 0x0000000020f8dbc0
[ 13806.9572689] current cpu  :                  1
[ 13806.9672774] current lwp  : 0x0000000014c11640
[ 13806.9772782] owner field  : 0x0000000042e38a00 wait/spin:                0/0

[ 13806.9972878] panic: lock error: Mutex: mutex_vector_exit,755: assertion failed: MUTEX_OWNER(mtx->mtx_owner) == curthread: lock 0x20f8dbc0 cpu 1 lwp 0x14c11640
[ 13807.0172997] cpu1: Begin traceback...
[ 13807.0273050] 0x1bb47d50: at vpanic+0x158
[ 13807.0373073] 0x1bb47d80: at panic+0x50
[ 13807.0473129] 0x1bb47dc0: at lockdebug_abort+0xe4
[ 13807.0573190] 0x1bb47de0: at mutex_spin_exit+0x104
[ 13807.0673248] 0x1bb47df0: at lwp_exit+0x2a0
[ 13807.0773295] 0x1bb47e50: at lwp_userret+0x17c
[ 13807.0873369] 0x1bb47eb0: at syscall+0x510
[ 13807.0973378] 0x1bb47f20: user SC trap #478 by 0xfd4b65f4: srr1=0xd032
[ 13807.1173521]             r1=0xfaee3e90 cr=0x54002482 xer=0x20000000 ctr=0xfd4b65ec
Stopped in pid 3821.15883 (t_io) at     netbsd:vpanic+0x15c:    or      r3, r26, r26
db{1}> ps 
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
3821 >15883 5   1    100100           14c11640           (zombie)
[..]
3821 >3821 7   0         0           42e38a00               t_io

and:

(gdb) list *(lwp_exit+0x2a0)
0x7442b0 is in lwp_exit (../../../../kern/kern_lwp.c:1215).
1210            lwp_unlock(l);
1211            p->p_nrlwps--;
1212            cv_broadcast(&p->p_lwpcv);
1213            if (l->l_lwpctl != NULL)
1214                    l->l_lwpctl->lc_curcpu = LWPCTL_CPU_EXITED;
1215            mutex_exit(p->p_lock);
1216    
1217            /*
1218             * We can no longer block.  At this point, lwp_free() may already
1219             * be gunning for us.  On a multi-CPU system, we may be off p_lwps.
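
To make the numbers above easier to compare, here is a small sketch that cross-checks the "owner field" from the lockdebug output against the two LWP rows shown by ddb's ps. It is only an illustration, not kernel code: the helper names (owner_lwp, find_lwp, owned_by_current) and the ps_rows table are hypothetical, and the 0xf flag mask is an assumption modelled on how MUTEX_OWNER() strips the low flag bits of an adaptive mutex's owner word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: the low four bits of the owner word carry flag bits and
 * the remaining bits are the owning LWP pointer.
 */
#define OWNER_FLAG_BITS ((uint64_t)0xf)

/* One row of the ddb "ps" output quoted in this report. */
struct lwp_row {
	int pid;
	int lid;
	uint64_t lwp_addr;	/* STRUCT LWP * column */
	const char *name;
};

/* The two rows for pid 3821, copied from the ps output above. */
const struct lwp_row ps_rows[] = {
	{ 3821, 15883, UINT64_C(0x14c11640), "(zombie)" },
	{ 3821, 3821, UINT64_C(0x42e38a00), "t_io" },
};

/* Strip the assumed flag bits to recover the owning LWP address. */
uint64_t
owner_lwp(uint64_t owner_field)
{
	return owner_field & ~OWNER_FLAG_BITS;
}

/* Look an LWP address up in the ps table; NULL if it is not listed. */
const struct lwp_row *
find_lwp(uint64_t lwp_addr)
{
	for (size_t i = 0; i < sizeof(ps_rows) / sizeof(ps_rows[0]); i++) {
		if (ps_rows[i].lwp_addr == lwp_addr)
			return &ps_rows[i];
	}
	return NULL;
}

/* The condition the failed assertion checks: owner == current LWP. */
bool
owned_by_current(uint64_t owner_field, uint64_t curlwp)
{
	return owner_lwp(owner_field) == curlwp;
}
```

With the values from the panic, owned_by_current(0x42e38a00, 0x14c11640) is false, which matches the failed assertion, and find_lwp(owner_lwp(0x42e38a00)) returns the row for pid 3821, LID 3821 (t_io). If that reading is right, the recorded owner would be the other LWP of the same process rather than a different process.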


>How-To-Repeat:
Seen intermittently when running the full ATF test suite.

>Fix:
n/a


