NetBSD-Bugs archive


Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK



On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
>  [...]
>  db{0}> bt/a fffffe822f73f420
>  trace: pid 0 lid 3 at 0xfffffe810e967760
>  ether_output() at netbsd:ether_output+0x2b6
>  ip_output() at netbsd:ip_output+0xa8f
>  tcp_output() at netbsd:tcp_output+0x1698
>  tcp_input() at netbsd:tcp_input+0x15d9
>  ip_input() at netbsd:ip_input+0x3ef
>  ipintr() at netbsd:ipintr+0x109
>  softint_dispatch() at netbsd:softint_dispatch+0xd9
>  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
>  Xsoftintr() at netbsd:Xsoftintr+0x4f
>  --- interrupt ---
>  0:
>  
>  That is the part that is going to send a packet. I see the printout in 
>  ip_output prior to calling 'ifp->if_output()' - not the one after it.
>  The location pointed to by the backtrace in ether_output() is the call 
>  to "return ifq_enqueue(...)". I also see the printout I've added in 
>  front of this call, but not the one after it.
>  In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the 
>  wm-driver - in this routine and the printout in front of the splx(s) at 
>  the end of the routine - not the printout after it.
>  This is the location where the deadlock happens while processing other 
>  interrupts in Xspllower.
>  This always looks the same ....

ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
already owns KERNEL_LOCK and won't spin trying to grab it again.
You can confirm this by printing curcpu()->ci_biglock_count.
Did you try a kernel with options LOCKDEBUG?
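
To illustrate why a CPU that already owns a recursive lock does not spin
when it takes the lock again, here is a minimal user-space sketch. It is a
toy model, not the NetBSD implementation: struct biglock, struct cpu and
both functions are hypothetical names, and the biglock_count field only
plays the role that ci_biglock_count plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy model of a recursive "big lock"; not the NetBSD implementation. */
struct biglock {
	int owner;		/* index of the CPU holding the lock, or NO_OWNER */
};

struct cpu {
	int index;
	int biglock_count;	/* stands in for ci_biglock_count */
};

/*
 * Try to take the lock once.  Returns true if the caller now holds it
 * (first acquisition or recursive re-entry), false if another CPU owns
 * it and the caller would have to spin.
 */
static bool
biglock_try_acquire(struct biglock *l, struct cpu *ci)
{
	if (ci->biglock_count > 0) {
		/* Already held by this CPU: bump the count, no spinning. */
		assert(l->owner == ci->index);
		ci->biglock_count++;
		return true;
	}
	if (l->owner != NO_OWNER)
		return false;
	l->owner = ci->index;
	ci->biglock_count = 1;
	return true;
}

/* Drop one level of ownership; the lock is freed when the count hits 0. */
static void
biglock_release(struct biglock *l, struct cpu *ci)
{
	assert(ci->biglock_count > 0 && l->owner == ci->index);
	if (--ci->biglock_count == 0)
		l->owner = NO_OWNER;
}
```

In this model a second acquire on the owning CPU only raises the count, so
a non-zero count at the point of the hang would mean the CPU is not waiting
for the lock itself.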

What's possible here is a loop trying to process the same interrupt
forever.

>  
>  
>  
>  db{0}> bt/a fffffe822f736440
>  trace: pid 0 lid 6 at 0xfffffe810e9739c8
>  breakpoint() at netbsd:breakpoint+0x5
>  comintr() at netbsd:comintr+0x518
>  Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
>  --- interrupt ---
>  bus_space_read_4() at netbsd:bus_space_read_4+0xa
>  intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
>  Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
>  --- interrupt ---
>  Xspllower() at netbsd:Xspllower+0xe
>  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
>  Xsoftintr() at netbsd:Xsoftintr+0x4f
>  --- interrupt ---
>  0:
>  
>  
>  Hmmm - not sure about it ...
>  It looks like, while one pending interrupt was being processed at the 
>  end of Xspllower, another interrupt came in that takes the 
>  KERNEL_LOCK in intr_biglock_wrapper() again and does what? Hangs in 
>  bus_space_read_4() ???? Busy-loops for whatever reason in that interrupt, 
>  so the location where the DDB entry occurs in bus_space_read_4() is 
>  just random ????
>  The comintr looks to me like the break interrupt on the serial console 
>  of the system, used to enter DDB.

It is.

>  Any idea how to find out which interrupt routine it is???

The dmesg output could point to the problem; the interrupt we're looking
for is level-triggered on pin 6 (so maybe "irq 6").
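
As a rough illustration of how a level-triggered interrupt can end up being
processed forever, here is a small user-space sketch. Everything in it is
hypothetical (struct irq_line, the two handlers, dispatch()); it only models
the idea that the interrupt is delivered again for as long as the line stays
asserted, and that a handler which never clears the cause in the device
keeps it asserted.

```c
#include <stdbool.h>

/* Toy model of one level-triggered interrupt line. */
struct irq_line {
	bool asserted;		/* the device is holding the line active */
};

typedef void (*handler_fn)(struct irq_line *);

/* A handler that services the device, which then drops the line. */
static void
handler_acks(struct irq_line *line)
{
	line->asserted = false;
}

/* A handler that returns without clearing the cause in the device. */
static void
handler_ignores(struct irq_line *line)
{
	(void)line;
}

/*
 * Deliver the interrupt for as long as the line stays asserted, giving
 * up after max_rounds so the simulation terminates.  Returns the number
 * of times the handler ran.
 */
static int
dispatch(struct irq_line *line, handler_fn h, int max_rounds)
{
	int runs = 0;

	while (line->asserted && runs < max_rounds) {
		h(line);
		runs++;
	}
	return runs;
}
```

With handler_acks the loop ends after one run; with handler_ignores it only
ends because of the max_rounds cap, which real hardware does not have.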

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference