tech-net archive

Re: apparently missing locking in if_bnx.c

Got your patch.

This change is allowing for very heavy usage on the bnx drivers with no 

However, I tried the same test with wm drivers.  In that case, the kernel just 
freezes.  (I put your KASSERTs in wm_start.)  So there is a problem with the wm 
driver, it's just not the same thing.


On Mar 6, 2012, at 1:30 PM, Manuel Bouyer wrote:

> On Tue, Mar 06, 2012 at 12:25:23PM -0500, Greg Troxel wrote:
>> Restating some off-list discussion for the record, now that we've
>> figured it out:
>>  bnx_start can defer work to allocate tx data structures via a
>>  workqueue
>>  the workqueue registration is marked MPSAFE
>>  so when the workqueue calls the alloc routines, the kernel lock is not
>>  held
>>  the alloc routine calls bnx_start, and it protects that with splnet,
>>  but it hasn't taken the kernel lock
>>  so bnx_start (the second time on the first packet) is running at
>>  splnet, without the kernel lock.   This triggers the assert.
>>  if the assert isn't there, then there's the possibility of another
>>  processor handling an interrupt and calling bnx_start.  Both the
>>  workqueue-called copy and the intr-called copy will be at splnet, but
>>  on different processors.
>>  The above is typically rare, and it seems to take heavy load to
>>  trigger it sometimes.  It's probably the combination of multiple TCPs
>>  opening up cwnd and the CPU utilization getting high that leads to the
>>  unintended concurrency.
>>  The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
>>  patch I sent earlier).
> This is what I've just committed. This doesn't mean your patch isn't
> correct, but I prefer to go the easy way for netbsd-6 first.
> -- 
> Manuel Bouyer <>
>     NetBSD: 26 ans d'experience feront toujours la difference
> --
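
For readers following the thread, the sequence Greg describes above can be
modelled in a few lines of userland C. This is only a sketch under stated
assumptions, not the driver code: every name below (model_wq, model_start,
model_tx_alloc, model_wq_run, model_kernel_lock_held) is hypothetical, the
kernel lock is reduced to a boolean, and splnet is not modelled at all
because, as the thread notes, it gives no protection against another
processor.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only: none of these names are NetBSD kernel APIs. */
static bool model_kernel_lock_held = false;

struct model_wq {
    bool mpsafe;          /* models a workqueue registered as MPSAFE */
    bool (*func)(void);   /* work callback */
};

static void model_kernel_lock_acquire(void)
{
    assert(!model_kernel_lock_held);
    model_kernel_lock_held = true;
}

static void model_kernel_lock_release(void)
{
    assert(model_kernel_lock_held);
    model_kernel_lock_held = false;
}

/*
 * Stand-in for the start routine carrying the lock-held assertion.
 * Returns true when the assertion would pass, false when it would fire.
 */
static bool model_start(void)
{
    return model_kernel_lock_held;
}

/*
 * Stand-in for the deferred tx allocation routine. The real routine
 * raises to splnet before calling start; that step is omitted here
 * because it does not change whether the kernel lock is held.
 */
static bool model_tx_alloc(void)
{
    return model_start();
}

/*
 * Stand-in for the workqueue worker: it takes the kernel lock around
 * the callback unless the queue was registered as MPSAFE.
 */
static bool model_wq_run(const struct model_wq *wq)
{
    bool ok;

    if (!wq->mpsafe)
        model_kernel_lock_acquire();
    ok = wq->func();
    if (!wq->mpsafe)
        model_kernel_lock_release();
    return ok;
}
```

Running model_wq_run on a queue with mpsafe set to true returns false (the
assertion in the start routine would fire), while the same callback on a
queue with mpsafe set to false returns true. That is the effect of the
committed change in this model: with the MPSAFE marking dropped, the
callback runs with the kernel lock held.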
