Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
The following reply was made to PR kern/48733; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 11 Apr 2014 17:59:09 +0200
On Fri, Apr 11, 2014 at 02:40:00PM +0000,
> Problem located in /src/sys/netinet/ip_output.c.
> Since file revision 1.208 the Kernel-Lock is locked prior calling
> on the interface.
> Now - at least the wm-driver - will call splnet() and splx() inside the
> If any interrupt occurs in between splnet() and splx(), the interrupt
> is delayed and
> is processed in splx() when the level is released again.
> If such an interrupt is e.g. not MP-SAFE, the call stub in
> intr_biglock_wrapper() is
> used to call the interrupt routine and that one will lock the
> KERNEL-LOCK again.
> So we try to lock it again here -> dead-lock.
> Our system runs fine with 4 8257x interfaces, but after adding 2
> additional 8254x
> interfaces, the system locks up after a short time. Don't ask me why
> the if_output
> call takes "too long" with these two additional interfaces, but it is
> I've analysed this several times with DDB. Most times I've seen an
> that dead-lock the system.
I think your analysis is wrong. The KERNEL_LOCK is special in the sense that
it can be locked multiple times on the same CPU. So it's not a problem
that splx() on the same CPU tries to get KERNEL_LOCK again; it will just
increase the lock count. A splx() on another CPU will wait for the
KERNEL_LOCK to be released.
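
To illustrate the point about re-locking on the same CPU, here is a minimal
user-space sketch of a recursive lock that tracks an owner and a nesting
count. It is only a model under stated assumptions, not the NetBSD
implementation: the names struct biglock, biglock_try_acquire() and
biglock_release() are hypothetical, CPUs are represented by plain integers,
and a failed attempt returns false where a real kernel would spin or wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a recursive big lock owned by one CPU at a time. */
struct biglock {
	int owner;		/* id of the CPU holding the lock, -1 if free */
	unsigned int count;	/* nesting depth on the owning CPU */
};

#define BIGLOCK_INIT { -1, 0 }

/*
 * Try to take the lock on behalf of `cpu`.
 * Returns true if the lock was free or already held by the same CPU
 * (in which case only the count grows), false if another CPU holds it.
 */
static bool
biglock_try_acquire(struct biglock *l, int cpu)
{
	if (l->owner == -1) {
		l->owner = cpu;
		l->count = 1;
		return true;
	}
	if (l->owner == cpu) {
		l->count++;	/* recursive acquisition: no deadlock */
		return true;
	}
	return false;		/* held elsewhere: caller would have to wait */
}

/* Drop one level of nesting; the lock becomes free when the count hits 0. */
static void
biglock_release(struct biglock *l, int cpu)
{
	assert(l->owner == cpu && l->count > 0);
	if (--l->count == 0)
		l->owner = -1;
}
```

In this model, a second acquisition by the owning CPU (as an interrupt
handler run from splx() would do) succeeds and raises the count to 2, while
an attempt from a different CPU fails until both levels have been released.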
I think your problem is more likely in the USB stack.
Maybe one of your new Ethernet interfaces shares an interrupt with the
USB controller?
> Run a lot of traffic over wm-interfaces and do something e.g. on USB at
> the same
> time. It is just a question of time till system-dead-lock.
> First guess: revert change done from 1.207 to 1.208.
> But I've no idea about side effects.
That would be very bad: the output queues are protected by the KERNEL_LOCK and splnet().
If you revert ip_output 1.208, you'll also have to revert ip_input.c
1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference