NetBSD-Bugs archive
Re: kern/52462: aac driver (and possibly others) needs MPification
On Sat, Aug 05, 2017 at 01:50:58PM +0200, Havard Eidnes wrote:
>
> as you provided privately. What then happens is that the kernel
> goes into inactivity when the user-land startup tries to start
> BIND, and when I break into DDB I repeatedly get this as the
> traceback:
>
> db{0}> trace
> breakpoint() at netbsd:breakpoint+0x5
> comintr() at netbsd:comintr+0x59a
> Xintr_ioapic_edge8() at netbsd:Xintr_ioapic_edge8+0xee
> --- interrupt ---
> x86_pause() at netbsd:x86_pause
> lddone() at netbsd:lddone+0x1e
> aac_new_intr() at netbsd:aac_new_intr+0xed
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
> Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
> --- interrupt ---
> x86_stihlt() at netbsd:x86_stihlt+0x6
> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
> idle_loop() at netbsd:idle_loop+0x18c
> db{0}> c
The part between lddone and x86_pause is missing, thanks to
an optimizing compiler...
x86_pause is called while a CPU busy-waits for a spin mutex.
That could be anywhere, but the return address (lddone+0x1e)
points to the mutex_enter() call directly in lddone.
I.e. something is holding the ld driver mutex.
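For illustration, here is a minimal user-space sketch of such a busy-wait, written with C11 atomics. It is not the NetBSD mutex implementation: spin_mutex, spin_enter, spin_tryenter, spin_exit and cpu_pause are hypothetical names, and cpu_pause only stands in for x86_pause. A CPU stuck in the loop in spin_enter would show the pause routine at the top of its backtrace, as in the trace above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel spin mutex; not NetBSD's kmutex_t. */
typedef struct {
    atomic_bool held;
} spin_mutex;

/* Stand-in for x86_pause(): hint to the CPU that we are busy-waiting. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause");
#endif
}

/* Try once to take the mutex; returns true if we now hold it. */
static bool spin_tryenter(spin_mutex *m)
{
    return !atomic_exchange_explicit(&m->held, true, memory_order_acquire);
}

/* Busy-wait until the mutex is free. If the holder never releases it,
 * the caller spins here forever. */
static void spin_enter(spin_mutex *m)
{
    while (!spin_tryenter(m))
        cpu_pause();
}

/* Release the mutex; it must currently be held. */
static void spin_exit(spin_mutex *m)
{
    assert(atomic_load_explicit(&m->held, memory_order_relaxed));
    atomic_store_explicit(&m->held, false, memory_order_release);
}
```

While one context holds the mutex, spin_tryenter from any other context fails, and spin_enter does not return until spin_exit has been called.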
The most likely reason would be someone calling into ld_diskstart
which holds the mutex while calling into ld_aac_start.
So:
some thread calling into ld driver:
- get mutex (in ld_diskstart)
- get kernel lock (in ld_aac_start)
the interrupt:
- get kernel lock (due to non-MPSAFE interrupt)
- get mutex (in lddone).
The two paths take the same two locks in opposite order, which can deadlock.
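The inversion can be checked mechanically. The following is a small sketch in plain C, not kernel code: it models each path as the sequence of locks it takes and reports whether two paths take any pair of locks in opposite order. The names lock_id, lock_pos, order_inverted, thread_path and intr_path are made up for this illustration, and the model ignores details such as the kernel lock being recursive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two locks in the analysis. */
enum lock_id { LD_MUTEX, KERNEL_LOCK };

/* Position of `id` in an acquisition sequence, or -1 if it is not taken. */
static int lock_pos(const enum lock_id *seq, size_t n, enum lock_id id)
{
    for (size_t i = 0; i < n; i++)
        if (seq[i] == id)
            return (int)i;
    return -1;
}

/* True if some pair of locks is taken in opposite order by paths a and b. */
static bool order_inverted(const enum lock_id *a, size_t na,
                           const enum lock_id *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < na; i++) {
        for (size_t j = i + 1; j < na; j++) {
            int pi = lock_pos(b, nb, a[i]);
            int pj = lock_pos(b, nb, a[j]);
            /* a takes a[i] before a[j]; inverted if b takes a[j] first. */
            if (pi >= 0 && pj >= 0 && pj < pi)
                return true;
        }
    }
    return false;
}

/* Thread calling into ld: mutex first, then kernel lock. */
static const enum lock_id thread_path[] = { LD_MUTEX, KERNEL_LOCK };
/* Non-MPSAFE interrupt: kernel lock first, then mutex. */
static const enum lock_id intr_path[] = { KERNEL_LOCK, LD_MUTEX };
```

Calling order_inverted(thread_path, 2, intr_path, 2) returns true for the two sequences above. Comparing a path against itself returns false.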
Ok. For the next try: remove the patch and simply declare ld as non-mpsafe
by removing the D_MPSAFE flags.
Greetings,
--
Michael van Elst
Internet: mlelstv%serpens.de@localhost
"A potential Snark may lurk in every tree."