NetBSD-Bugs archive


Re: kern/52462: aac driver (and possibly others) needs MPification



The following reply was made to PR kern/52462; it has been noted by GNATS.

From: Michael van Elst <mlelstv%serpens.de@localhost>
To: Havard Eidnes <he%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/52462: aac driver (and possibly others) needs MPification
Date: Sat, 5 Aug 2017 15:24:20 +0200

 On Sat, Aug 05, 2017 at 01:50:58PM +0200, Havard Eidnes wrote:
 > 
 > as you provided privately.  What then happens is that the kernel
 > goes into inactivity when the user-land startup tries to start
 > BIND, and when I break into DDB I repeatedly get this as the
 > traceback:
 > 
 > db{0}> trace
 > breakpoint() at netbsd:breakpoint+0x5
 > comintr() at netbsd:comintr+0x59a
 > Xintr_ioapic_edge8() at netbsd:Xintr_ioapic_edge8+0xee
 > --- interrupt ---
 > x86_pause() at netbsd:x86_pause
 > lddone() at netbsd:lddone+0x1e
 > aac_new_intr() at netbsd:aac_new_intr+0xed
 > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 > Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
 > --- interrupt ---
 > x86_stihlt() at netbsd:x86_stihlt+0x6
 > acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
 > acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
 > idle_loop() at netbsd:idle_loop+0x18c
 > db{0}> c
 
 The part between lddone and x86_pause is missing, thanks to
 an optimizing compiler...
                                 
 x86_pause is called when a CPU busy-waits for a spin-mutex,
 which could be anywhere, but the return address (lddone+0x1e)
 points to the mutex_enter() call directly in lddone.
 I.e. something is holding the ld driver mutex.
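 
 To illustrate what such a busy-wait looks like, here is a minimal
 userland sketch using C11 atomics. It is not NetBSD's mutex code:
 struct spin, spin_enter() and cpu_pause_hint() are hypothetical names
 made up for this example, and the pause hint is a no-op stand-in for
 what x86_pause does on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU pause hint; a no-op in this sketch. */
static inline void cpu_pause_hint(void) { }

/* Hypothetical spin lock type, for illustration only. */
struct spin {
    atomic_flag held;
};

static void spin_init(struct spin *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->held);
}

/* Try once; returns true if the lock was free and is now held by us. */
static bool spin_tryenter(struct spin *s)
{
    return !atomic_flag_test_and_set_explicit(&s->held,
        memory_order_acquire);
}

/* Busy-wait until the lock is released by its current holder. */
static void spin_enter(struct spin *s)
{
    while (!spin_tryenter(s))
        cpu_pause_hint();   /* a waiting CPU keeps looping here */
}

static void spin_exit(struct spin *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

 If the holder never calls spin_exit(), the loop in spin_enter() never
 terminates, which is consistent with seeing the same traceback on
 every break into DDB.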
 
 The most likely reason would be someone calling into ld_diskstart
 which holds the mutex while calling into ld_aac_start.
 
 So:
 
 some thread calling into ld driver:
 - get mutex (in ld_diskstart)
 - get kernel lock (in ld_aac_start)
 
 the interrupt:
 - get kernel lock (due to non-MPSAFE interrupt)
 - get mutex (in lddone).
 
 The two paths take the same two locks in opposite order, which can
 deadlock: each side holds one lock and waits for the other.
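 
 The inversion can be sketched as a tiny order checker. This is only
 an illustration, not kernel code: LD_MUTEX, KERNEL_LOCK, order_reset()
 and order_record() are hypothetical names, and the checker only
 catches a direct two-lock inversion, not longer cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers, for this illustration only. */
enum { LD_MUTEX, KERNEL_LOCK, NLOCKS };

/* order[a][b] is true once some path has taken b while holding a. */
static bool order[NLOCKS][NLOCKS];

/* Forget all recorded orderings. */
static void order_reset(void)
{
    memset(order, 0, sizeof(order));
}

/*
 * Record that a path acquires `second` while already holding `first`.
 * Returns true if the opposite order (first taken while holding
 * second) has been recorded before, i.e. a potential deadlock.
 */
static bool order_record(int first, int second)
{
    assert(first >= 0 && first < NLOCKS);
    assert(second >= 0 && second < NLOCKS);
    order[first][second] = true;
    return order[second][first];
}
```

 Recording the thread path as order_record(LD_MUTEX, KERNEL_LOCK) and
 then the interrupt path as order_record(KERNEL_LOCK, LD_MUTEX) makes
 the second call report the inversion.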
 
 
 Ok. For the next try: remove the patch and simply declare ld as non-mpsafe
 by removing the D_MPSAFE flags.
 
 
 
 
 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv%serpens.de@localhost
                                 "A potential Snark may lurk in every tree."
 

