NetBSD-Bugs archive


Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK



The following reply was made to PR kern/48733; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost, 
netbsd-bugs%NetBSD.org@localhost,
        Wolfgang.Stukenbrock%nagler-company.com@localhost
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Mon, 12 May 2014 14:23:46 +0200

 On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
 >  [...]
 >  db{0}> bt/a fffffe822f73f420
 >  trace: pid 0 lid 3 at 0xfffffe810e967760
 >  ether_output() at netbsd:ether_output+0x2b6
 >  ip_output() at netbsd:ip_output+0xa8f
 >  tcp_output() at netbsd:tcp_output+0x1698
 >  tcp_input() at netbsd:tcp_input+0x15d9
 >  ip_input() at netbsd:ip_input+0x3ef
 >  ipintr() at netbsd:ipintr+0x109
 >  softint_dispatch() at netbsd:softint_dispatch+0xd9
 >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
 >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  --- interrupt ---
 >  0:
 >  
 >  That is the part that is going to send a packet. I see the printout in 
 >  ip_output() before the call to 'ifp->if_output()', but not the one after it.
 >  The location pointed to by the backtrace in ether_output() is the call 
 >  to "return ifq_enqueue(...)". I also see the printout I've added in 
 >  front of this call, but not the one after it.
 >  In ifq_enqueue() I see the output of the call to 'ifp->if_start' (the 
 >  wm driver) and the printout in front of the splx(s) at the end of the 
 >  routine, but not the printout after it.
 >  This is the location where the deadlock happens, while other interrupts 
 >  are being processed in Xspllower.
 >  It always looks the same.
 
 ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
 already owns KERNEL_LOCK and won't spin trying to grab it again.
 You can confirm this by printing curcpu()->ci_biglock_count.
 Have you tried a kernel built with options LOCKDEBUG?
 
 What's possible here is a loop trying to process the same interrupt
 forever.
 
 >  
 >  
 >  
 >  db{0}> bt/a fffffe822f736440
 >  trace: pid 0 lid 6 at 0xfffffe810e9739c8
 >  breakpoint() at netbsd:breakpoint+0x5
 >  comintr() at netbsd:comintr+0x518
 >  Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 >  --- interrupt ---
 >  bus_space_read_4() at netbsd:bus_space_read_4+0xa
 >  intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
 >  Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
 >  --- interrupt ---
 >  Xspllower() at netbsd:Xspllower+0xe
 >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
 >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  --- interrupt ---
 >  0:
 >  
 >  
 >  Hmmm - not shure about it ...
 >  It looks like that during processing one pending interrupt in Xspllower 
 >  at the end of that routine an interrupt came im that takes the 
 >  KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in 
 >  bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt 
 >  and the location where the DDB-enter occures in bus_space_read_4() is 
 >  just random ????
 >  The comintr looks like the break-interrupt on the serial console of the 
 >  system to enter DDB to me.
 
 it is.
 
 >  Any idea how to find out which interrupt routine it is?
 
 dmesg could point to the problem: the interrupt we're looking for is
 level-triggered on pin 6 (so maybe "irq 6").
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

