tech-net archive
Re: apparently missing locking in if_bnx.c
On Tue, Mar 06, 2012 at 12:25:23PM -0500, Greg Troxel wrote:
> Restating some off-list discussion for the record, now that we've
> figured it out:
>
> bnx_start can defer work to allocate tx data structures via a
> workqueue
>
> the workqueue registration is marked MPSAFE
>
> so when the workqueue calls the alloc routines, the kernel lock is not
> held
>
> the alloc routine calls bnx_start, and it protects that with splnet,
> but it hasn't taken the kernel lock
>
> so bnx_start (the second time on the first packet) is running at
> splnet, without the kernel lock. This triggers the assert.
>
> if the assert isn't there, then there's the possibility of another
> processor handling an interrupt and calling bnx_start. Both the
> workqueue-called copy and the intr-called copy will be at splnet, but
> on different processors.
>
> The above is typically rare, and it seems to take heavy load to
> trigger it sometimes. It's probably the combination of multiple TCPs
> opening up cwnd and the CPU utilization getting high that leads to the
> unintended concurrency.
>
> The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
> patch I sent earlier).
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
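
For the record, here is a rough userland model of why dropping the MPSAFE
marking closes the hole. It is only a sketch, not driver code: every
model_* name and MODEL_WQ_MPSAFE is made up for illustration, the kernel
lock is reduced to a boolean, and the assert in bnx_start is replaced by a
counter so both cases can run to completion. The assumption is that a
workqueue not marked MPSAFE has its callback run with the kernel lock held,
which is what the analysis above relies on.

```c
#include <stdbool.h>

/* Hypothetical flag, modeled on the MPSAFE marking of the workqueue. */
#define MODEL_WQ_MPSAFE	0x01

static bool model_kernel_lock_held;	/* stands in for the kernel lock */
static int model_unlocked_starts;	/* start calls made without the lock */

/* Stand-in for bnx_start: the driver asserts here, the model just counts. */
static void
model_start(void)
{
	if (!model_kernel_lock_held)
		model_unlocked_starts++;
}

/* Stand-in for the deferred alloc routine, which calls start again. */
static void
model_alloc_work(void)
{
	/*
	 * The real routine runs this at splnet, which does not exclude
	 * another processor; only the kernel lock would do that.
	 */
	model_start();
}

/* Stand-in for the workqueue thread dispatching one work item. */
static void
model_wq_dispatch(int flags, void (*func)(void))
{
	bool take_lock = (flags & MODEL_WQ_MPSAFE) == 0;

	if (take_lock)
		model_kernel_lock_held = true;	/* non-MPSAFE: lock taken for us */
	func();
	if (take_lock)
		model_kernel_lock_held = false;
}

/* Dispatch one work item and report how many start calls ran unlocked. */
static int
model_run(int flags)
{
	model_unlocked_starts = 0;
	model_wq_dispatch(flags, model_alloc_work);
	return model_unlocked_starts;
}
```

In this model, model_run(MODEL_WQ_MPSAFE) reports one unlocked call to the
start routine (the case that trips the assert), while model_run(0) reports
none, because the dispatcher takes the lock before calling the alloc
routine.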
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference