tech-net archive


Re: apparently missing locking in if_bnx.c



On Tue, Mar 06, 2012 at 12:25:23PM -0500, Greg Troxel wrote:
> Restating some off-list discussion for the record, now that we've
> figured it out:
> 
>   bnx_start can defer work to allocate tx data structures via a
>   workqueue
> 
>   the workqueue registration is marked MPSAFE
> 
>   so when the workqueue calls the alloc routines, the kernel lock is not
>   held
> 
>   the alloc routine calls bnx_start, and it protects that with splnet,
>   but it hasn't taken the kernel lock
> 
>   so bnx_start (the second time on the first packet) is running at
>   splnet, without the kernel lock.   This triggers the assert.
> 
>   if the assert isn't there, then there's the possibility of another
>   processor handling an interrupt and calling bnx_start.  Both the
>   workqueue-called copy and the intr-called copy will be at splnet, but
>   on different processors.
> 
>   The above is typically rare, and it seems to take heavy load to
>   trigger it.  It's probably the combination of multiple TCP
>   connections opening up cwnd and CPU utilization getting high that
>   leads to the unintended concurrency.
> 
>   The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
>   patch I sent earlier).

This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
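
For the archive, the effect of dropping the MPSAFE marking can be sketched
with a small user-space model. This is only an illustration under stated
assumptions, not the driver code: every name below (model_wq,
MODEL_WQ_MPSAFE, model_start, model_alloc_tasks, model_wq_run) is
hypothetical, and a plain flag stands in for the kernel lock. In the real
kernel the marking is the WQ_MPSAFE flag passed to workqueue_create(9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a flag stands in for "kernel lock is held". */
static bool big_lock_held = false;

#define MODEL_WQ_MPSAFE 0x1   /* assumed analogue of an MP-safe marking */

struct model_wq {
    int flags;                /* 0, or MODEL_WQ_MPSAFE */
    bool (*func)(void);       /* handler run by the worker */
};

/* Stands in for the start routine's locking check:
 * reports whether the lock is held instead of asserting. */
static bool model_start(void)
{
    return big_lock_held;
}

/* Stands in for the deferred alloc routine, which calls start again. */
static bool model_alloc_tasks(void)
{
    return model_start();
}

/* Worker dispatch: takes the lock around the handler unless the
 * queue was marked MP-safe, in which case the handler runs unlocked. */
static bool model_wq_run(struct model_wq *wq)
{
    bool ok;
    bool take;

    assert(wq != NULL && wq->func != NULL);
    take = !(wq->flags & MODEL_WQ_MPSAFE);
    if (take)
        big_lock_held = true;
    ok = wq->func();
    if (take)
        big_lock_held = false;
    return ok;
}
```

With flags set to MODEL_WQ_MPSAFE the handler reaches model_start without
the lock, which corresponds to the case that tripped the assert; with flags
set to 0 the worker holds the lock for the whole call, which is the
behaviour the committed change relies on.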

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--

