Re: apparently missing locking in if_bnx.c
Got your patch.
This change allows very heavy usage of the bnx driver with no problems.
However, when I tried the same test with the wm driver, the kernel just
froze. (I put your KASSERTs in wm_start.) So there is a problem with the
wm driver too; it's just not the same one.
On Mar 6, 2012, at 1:30 PM, Manuel Bouyer wrote:
> On Tue, Mar 06, 2012 at 12:25:23PM -0500, Greg Troxel wrote:
>> Restating some off-list discussion for the record, now that we've
>> figured it out:
>> - bnx_start can defer work to allocate tx data structures via a
>>   workqueue
>> - the workqueue registration is marked MPSAFE
>> - so when the workqueue calls the alloc routines, the kernel lock is
>>   not held
>> - the alloc routine calls bnx_start, and it protects that with splnet,
>>   but it hasn't taken the kernel lock
>> - so bnx_start (the second time on the first packet) is running at
>>   splnet, without the kernel lock. This triggers the assert.
>> - if the assert isn't there, then there's the possibility of another
>>   processor handling an interrupt and calling bnx_start. Both the
>>   workqueue-called copy and the intr-called copy will be at splnet,
>>   but on different processors.
>> This is rare in practice; it seems to take heavy load to trigger it.
>> It's probably the combination of multiple TCP connections opening up
>> cwnd and the CPU utilization getting high that leads to the
>> unintended concurrency.
>> The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
>> patch I sent earlier).
> This is what I've just committed. This doesn't mean your patch isn't
> correct, but I prefer to go the easy way for netbsd-6 first.
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> NetBSD: 26 years of experience will always make the difference