NetBSD-Bugs archive


Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK



The following reply was made to PR kern/48733; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost, 
netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 11 Apr 2014 17:59:09 +0200

 On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock%nagler-company.com@localhost wrote:
 > >Description:
 >      Problem located in /src/sys/netinet/ip_output.c.
 >      Since file revision 1.208, the KERNEL_LOCK is taken prior to calling
 >      if_output on the interface.
 >      Now, at least the wm driver calls splnet() and splx() inside its
 >      output routine.
 >      If an interrupt occurs between splnet() and splx(), it is deferred
 >      and processed in splx() when the level is lowered again.
 >      If such an interrupt is not MP-safe, the stub intr_biglock_wrapper()
 >      is used to call the interrupt routine, and that stub takes the
 >      KERNEL_LOCK again.
 >      So we try to take the lock a second time here -> deadlock.
 > 
 >      Our system runs fine with four 8257x interfaces, but after adding two
 >      more 8254x interfaces, the system locks up after a short time. Don't
 >      ask me why the if_output call takes "too long" with these two
 >      additional interfaces, but it is reproducible.
 >      I've analysed this several times with DDB. Most of the time I've seen
 >      a USB interrupt that deadlocks the system.
 
 I think your analysis is wrong. The KERNEL_LOCK is special in the sense that
 it can be taken multiple times on the same CPU. So it's not a problem
 that splx() on the same CPU tries to get the KERNEL_LOCK again; it will just
 increase the lock count. A splx() on another CPU will wait for the
 KERNEL_LOCK to be released.
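To make the recursion argument concrete, here is a small Python model of the scenario. It is a sketch under stated assumptions, not NetBSD code: a recursive big lock with an owner CPU and a depth count, an IPL that defers masked interrupts until splx(), and a wrapper that takes the big lock around a non-MP-safe handler. The names BigLock, Cpu, biglock_wrapper and if_output_scenario are hypothetical, the single interrupt level is a simplification, and try_enter returns False where the real kernel lock would spin.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

IPL_NONE, IPL_NET = 0, 1  # simplification: a single blockable interrupt level


@dataclass
class BigLock:
    """Hypothetical recursive 'big lock': re-entry on the owning CPU bumps a count."""
    owner: Optional[int] = None
    count: int = 0
    max_depth: int = 0

    def try_enter(self, cpu: int) -> bool:
        if self.owner == cpu:        # same CPU already holds it: recurse
            self.count += 1
        elif self.owner is None:     # lock is free: take it
            self.owner, self.count = cpu, 1
        else:                        # another CPU holds it: a real CPU would spin
            return False
        self.max_depth = max(self.max_depth, self.count)
        return True

    def exit(self, cpu: int) -> None:
        if self.owner != cpu or self.count == 0:
            raise RuntimeError("releasing a lock this CPU does not hold")
        self.count -= 1
        if self.count == 0:
            self.owner = None


@dataclass
class Cpu:
    ident: int
    lock: BigLock
    ipl: int = IPL_NONE
    pending: List[Callable[[], None]] = field(default_factory=list)

    def splnet(self) -> int:
        old, self.ipl = self.ipl, IPL_NET
        return old

    def splx(self, s: int) -> None:
        self.ipl = s
        # Lowering the level runs interrupts that were deferred meanwhile.
        while self.ipl < IPL_NET and self.pending:
            self.pending.pop(0)()

    def interrupt(self, handler: Callable[[], None]) -> None:
        if self.ipl >= IPL_NET:
            self.pending.append(handler)  # masked: deferred until splx()
        else:
            handler()

    def biglock_wrapper(self, handler: Callable[[], None]) -> Callable[[], None]:
        """Hypothetical model: run a non-MP-safe handler with the big lock held."""
        def run() -> None:
            if not self.lock.try_enter(self.ident):
                raise RuntimeError("would spin until the other CPU releases it")
            try:
                handler()
            finally:
                self.lock.exit(self.ident)
        return run


def if_output_scenario() -> dict:
    lock = BigLock()
    cpu0 = Cpu(0, lock)
    usb_runs: List[str] = []

    lock.try_enter(0)      # ip_output takes the big lock before if_output
    s = cpu0.splnet()      # the driver raises the IPL in its output routine
    cpu0.interrupt(cpu0.biglock_wrapper(lambda: usb_runs.append("usb")))
    deferred = len(cpu0.pending)
    other_blocked = not lock.try_enter(1)  # CPU 1 cannot get in meanwhile
    cpu0.splx(s)           # deferred handler re-enters the lock: depth 2
    lock.exit(0)
    other_after = lock.try_enter(1)        # CPU 1 succeeds after release
    lock.exit(1)
    return {"deferred": deferred, "usb_runs": len(usb_runs),
            "max_depth": lock.max_depth, "other_blocked": other_blocked,
            "other_after": other_after}


print(if_output_scenario())
# prints {'deferred': 1, 'usb_runs': 1, 'max_depth': 2, 'other_blocked': True, 'other_after': True}
```

In this model the deferred handler runs inside splx() at lock depth 2 instead of deadlocking, while the second CPU only obtains the lock once CPU 0 has dropped it completely.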
 
 I think your problem is more likely in the USB stack.
 Maybe one of your new Ethernet interfaces shares an interrupt with the
 USB controller?
 
 
 > >How-To-Repeat:
 >      Run a lot of traffic over the wm interfaces and do something, e.g. on
 >      USB, at the same time. It is just a question of time until the system
 >      deadlocks.
 > >Fix:
 >      First guess: revert the change from 1.207 to 1.208.
 >      But I've no idea about side effects.
 
 That would be very bad: the output queues are protected by the KERNEL_LOCK
 and splnet(). If you revert ip_output.c 1.208, you'll also have to revert
 ip_input.c 1.286 and 1.285, so that the whole IP stack runs under the
 KERNEL_LOCK again.
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

