Current-Users archive
Re: something's wrong
On Thu, 14 Nov 2013 23:41:33 +0900
Takahiro HAYASHI <t-hash%abox3.so-net.ne.jp@localhost> wrote:
> I happened unfortunately to meet this problem, but fortunately
> entered ddb.
> I was doing ./build release for amd64 on amd64 HEAD around Nov 9.
>
> Does this give any help?
Yes, thanks - this helps to narrow down the problem. I don't see the
real cause yet, but perhaps someone more familiar with synchronization
matters can make more sense of it. Below I just extract the interesting
data.
> db{0}> ps
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 2150 1 3 5 0 fffffe81c3a0f6e0 cc1 xchicv
This is the only LWP I see with PCU activity. It is waiting for another
CPU to save its FPU state to memory, so that the state can be loaded
and used on the current CPU:
> db{0}> bt/a fffffe81c3a0f6e0
> trace: pid 2150 lid 1 at 0xfffffe810ed35bc0
> sleepq_block() at netbsd:sleepq_block+0xa0
> cv_wait() at netbsd:cv_wait+0x9f
> xc_wait() at netbsd:xc_wait+0x4a
> pcu_load() at netbsd:pcu_load+0x79
We don't know which CPU the remote one is. The xcall is executed
in a softclk context. Three of the softclk handlers are blocked:
> 0 56 3 7 200 fffffe810e170ac0 softclk/7 tstile
> 0 38 3 4 200 fffffe810e105a00 softclk/4 tstile
> 0 26 3 2 200 fffffe810e0a1980 softclk/2 tstile
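For anyone who wants to pull such rows out of a longer ps listing, here
is a minimal Python sketch. The function name parse_ddb_ps and the dict
keys are made up for illustration; the column order is taken from the
header quoted above. In mail the WAIT column sometimes wraps onto its
own line, so the sketch accepts both layouts.

```python
def parse_ddb_ps(text):
    """Parse quoted ddb `ps` rows into dicts (hypothetical helper).

    Assumed row layout: PID LID S CPU FLAGS LWP-address NAME [WAIT].
    WAIT may follow on the same line or alone on the next line.
    """
    rows = []
    for raw in text.splitlines():
        fields = raw.lstrip("> ").split()   # drop the mail quote prefix
        if len(fields) >= 7 and fields[0].isdigit():
            pid, lid, _state, cpu, _flags, lwp, name = fields[:7]
            rows.append({
                "pid": int(pid), "lid": int(lid), "cpu": int(cpu),
                "lwp": lwp, "name": name,
                "wait": fields[7] if len(fields) > 7 else None,
            })
        elif len(fields) == 1 and rows and rows[-1]["wait"] is None:
            rows[-1]["wait"] = fields[0]    # wrapped WAIT column
    return rows


# Rows copied from the ddb output quoted in this mail.
SAMPLE = """\
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 2150 1 3 5 0 fffffe81c3a0f6e0 cc1
> xchicv
> 0 56 3 7 200 fffffe810e170ac0 softclk/7 tstile
> 0 38 3 4 200 fffffe810e105a00 softclk/4 tstile
> 0 26 3 2 200 fffffe810e0a1980 softclk/2 tstile
"""

# LWPs sleeping on a turnstile, i.e. blocked on an adaptive mutex or rwlock.
blocked = [r for r in parse_ddb_ps(SAMPLE) if r["wait"] == "tstile"]
for r in blocked:
    print(r["name"], "on cpu", r["cpu"], "lwp", r["lwp"])
```

The lwp value of each row is the address to pass to bt/a for a
backtrace.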
For the first one, we have a stack trace:
> db{0}> t/a fffffe810e170ac0
> trace: pid 0 lid 56 at 0xfffffe810e169be0
> sleepq_block() at netbsd:sleepq_block+0xa0
> turnstile_block() at netbsd:turnstile_block+0x2cc
> mutex_vector_enter() at netbsd:mutex_vector_enter+0x13d
> arptimer() at netbsd:arptimer+0x15
> callout_softclock() at netbsd:callout_softclock+0x174
> softint_dispatch() at netbsd:softint_dispatch+0x7b
Apparently it is waiting for softnet_lock.
Another softint handler is also waiting for softnet_lock:
> 0 3 3 0 200 fffffe821dd69440 softnet/0 tstile
> [...]
> db{0}> bt/a fffffe821dd69440
> trace: pid 0 lid 3 at 0xfffffe810e055c30
> sleepq_block() at netbsd:sleepq_block+0xa0
> turnstile_block() at netbsd:turnstile_block+0x2cc
> mutex_vector_enter() at netbsd:mutex_vector_enter+0x13d
> arpintr() at netbsd:arpintr+0x13
> softint_dispatch() at netbsd:softint_dispatch+0x7b
We don't know what the softclk handlers on cpu2 and cpu4 are
waiting for.
It doesn't look like two PCU operations deadlocking directly against
each other. Other xcalls don't seem to be involved either.
(The only other user of high-priority xcalls is the "pserialize"
framework, which I don't see any traces of.)
So it looks like a more complex lock order issue.
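To make the open questions explicit, the wait-for relations read from
the traces can be written down as a small graph. This is only a sketch
in Python: the function name wait_chain and the resource labels are made
up, and every owner is None because the ddb output so far does not show
who holds what.

```python
def wait_chain(waiter, waits_on, owner):
    """Follow waiter -> resource -> owner links until they end or loop.

    Returns the path walked; "?" marks a link that is not known yet,
    "(cycle)" marks a return to an LWP already on the path.
    """
    path, seen = [waiter], {waiter}
    while True:
        resource = waits_on.get(path[-1])
        if resource is None:
            return path + ["?"]
        path.append(resource)
        holder = owner.get(resource)
        if holder is None:
            return path + ["?"]
        path.append(holder)
        if holder in seen:
            return path + ["(cycle)"]
        seen.add(holder)


# What each LWP is blocked on, as far as the traces show.
WAITS_ON = {
    "cc1": "xcall on remote CPU",   # via pcu_load() -> xc_wait()
    "softclk/7": "softnet_lock",    # apparently, via arptimer()
    "softnet/0": "softnet_lock",    # via arpintr()
    "softclk/4": None,              # turnstile, lock not identified
    "softclk/2": None,              # turnstile, lock not identified
}
# Who holds each resource: nothing is known yet.
OWNER = {"xcall on remote CPU": None, "softnet_lock": None}

for lwp in WAITS_ON:
    print(" -> ".join(wait_chain(lwp, WAITS_ON, OWNER)))
```

Every chain currently ends in "?". Filling in the owner of
softnet_lock and the locks behind softclk/2 and softclk/4 would show
whether the chains close into a cycle.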
best regards
Matthias