tech-kern archive
possible lwp_lock() issue
Hi,
I got several panics of this type on an amd64 XEN3_DOM0 with an HVM guest
running (so lots of context switches). The kernel is built with DEBUG+DIAGNOSTIC
(but not LOCKDEBUG). Here is the panic:
Mutex error: mutex_vector_exit: exiting unheld spin mutex
lock address : 0xffffa0002bca6f48
current cpu : 0
current lwp : 0xffffa0002bcaa000
owner field : 0x0000000000000700 wait/spin: 0/1
panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff804abbdd cs e030 rflags 246 cr2
ffffa0002fa70e00 cpl 7 rsp ffffa0002c62aa80
Stopped in pid 0.5 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x255
lockdebug_abort() at netbsd:lockdebug_abort+0x42
mutex_vector_exit() at netbsd:mutex_vector_exit+0xfd
callout_softclock() at netbsd:callout_softclock+0x1ef
softint_thread() at netbsd:softint_thread+0x88
ds 0xaa90
es 0xc95c
fs 0xaa90
gs 0xca37
rdi 0
rsi 0xdeadbeef
rbp 0xffffa0002c62aa80
rbx 0xffffa0002c62aa90
rdx 0
rcx 0
rax 0x1
r8 0xffffffff80ac6200 cpu_info_primary
r9 0x1
r10 0xffffa0002c62a9a0
r11 0xffffffff804e7ca0 xenconscn_putc
r12 0x100
r13 0xffffffff8084b3e3 copyright+0x19663
r14 0xffffffff80a7fd20 mutex_spin_lockops
r15 0xffffffff803fdcc0 sleepq_timeout
rip 0xffffffff804abbdd breakpoint+0x5
cs 0xe030
rflags 0x246
rsp 0xffffa0002c62aa80
ss 0xe02b
netbsd:breakpoint+0x5: leave
First, I think ddb missed a function call that got optimised away here,
and mutex_vector_exit() was really called from sleepq_timeout():
the assembly around callout_softclock+0x1ef is:
0xffffffff8040b094 <callout_softclock+484>: callq 0xffffffff804babd0
<mutex_spin_exit>
0xffffffff8040b099 <callout_softclock+489>: mov %r14,%rdi
0xffffffff8040b09c <callout_softclock+492>: callq *%r15
0xffffffff8040b09f <callout_softclock+495>: mov %r13,%rdi
0xffffffff8040b0a2 <callout_softclock+498>: callq 0xffffffff804bab80
<mutex_spin_enter>
%r15 points to sleepq_timeout, and I think it had not been clobbered when ddb
was entered (I disassembled the sleepq_timeout->mutex_vector_exit path to make
sure). I guess in sleepq_timeout() we're ending up in the
(l->l_wchan == NULL) case.
Now the question:
in lwp_lock(), we call lwp_lock_retry() if l->l_mutex got changed while taking
the lock. But we do that only if LOCKDEBUG || MULTIPROCESSOR; otherwise
(i.e. in a XEN3_DOM0 kernel) we do a simple mutex_spin_enter(). Are we sure
l->l_mutex can't be changed in this case? The panic I'm getting seems to
prove that it can ...
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference