NetBSD-Bugs archive


Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK



On Fri, Apr 11, 2014 at 02:40:00PM +0000, 
Wolfgang.Stukenbrock%nagler-company.com@localhost wrote:
> >Description:
>       Problem located in /src/sys/netinet/ip_output.c.
>       Since file revision 1.208 the Kernel-Lock is locked prior to calling
>       if_output on the interface.
>       Now - at least the wm-driver - will call splnet() and splx() inside
>       the output routine.
>       If any interrupt occurs in between splnet() and splx(), the interrupt
>       is delayed and is processed in splx() when the level is released again.
>       If such an interrupt is e.g. not MP-SAFE, the call stub in
>       intr_biglock_wrapper() is used to call the interrupt routine and that
>       one will lock the KERNEL-LOCK again.
>       So we try to lock it again here -> dead-lock.
>
>       Our system runs fine with 4 8257x interfaces, but after adding 2
>       additional 8254x interfaces, the system locks up after a short time.
>       Don't ask me why the if_output call takes "too long" with these two
>       additional interfaces, but it is reproducible.
>       I've analysed this several times with DDB. Most times I've seen a
>       USB-interrupt that dead-locks the system.

I think your analysis is wrong. The KERNEL_LOCK is special in the sense that
it can be locked multiple times on the same CPU. So it's not a problem
that splx() on the same CPU tries to get the KERNEL_LOCK again; it will just
increase the lock count. A splx() on another CPU will wait for the
KERNEL_LOCK to be released.
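
To illustrate the point, here is a toy user-space model of a lock that
nests on the CPU that already holds it. This is only a sketch and not the
NetBSD implementation: the struct, the toy_biglock_* names and the integer
"cpu" argument are all made up for the example, and a real lock would spin
or sleep instead of returning 0.

```c
#include <assert.h>

#define TOY_NO_OWNER (-1)

/* Toy model of a recursive "big lock"; every name here is hypothetical. */
struct toy_biglock {
	int owner;	/* index of the CPU holding the lock, or TOY_NO_OWNER */
	int count;	/* nesting depth on the owning CPU */
};

static void
toy_biglock_init(struct toy_biglock *l)
{
	l->owner = TOY_NO_OWNER;
	l->count = 0;
}

/*
 * Returns 1 if the lock was taken (or nested one level deeper),
 * 0 if another CPU holds it and the caller would have to wait.
 */
static int
toy_biglock_try_acquire(struct toy_biglock *l, int cpu)
{
	if (l->owner == cpu) {
		/* Same CPU again, e.g. an interrupt handler run from splx(). */
		l->count++;
		return 1;
	}
	if (l->owner == TOY_NO_OWNER) {
		l->owner = cpu;
		l->count = 1;
		return 1;
	}
	return 0;
}

/* Drops one nesting level; the lock becomes free when the count hits 0. */
static void
toy_biglock_release(struct toy_biglock *l, int cpu)
{
	assert(l->owner == cpu && l->count > 0);
	if (--l->count == 0)
		l->owner = TOY_NO_OWNER;
}
```

In this model, CPU 0 taking the lock in the output path and then again from
an interrupt handler simply ends up with count == 2, while CPU 1 is refused
until CPU 0 has released it twice.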

I think your problem is more likely in the USB stack.
Maybe one of your new ethernet interfaces shares an interrupt with the
USB controller?


> >How-To-Repeat:
>       Run a lot of traffic over wm-interfaces and do something e.g. on USB
>       at the same time. It is just a question of time till system-dead-lock.
> >Fix:
>       First guess: revert change done from 1.207 to 1.208.
>       But I've no idea about side effects.

That would be very bad: the output queues are protected by the KERNEL_LOCK
and splnet(). If you revert ip_output.c 1.208, you'll also have to revert
ip_input.c 1.286 and 1.285, so that the whole IP stack runs under the
KERNEL_LOCK again.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--

