Re: apparently missing locking in if_bnx.c

To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: apparently missing locking in if_bnx.c
From: Beverly Schwartz <bschwart%bbn.com@localhost>
Date: Tue, 6 Mar 2012 13:59:01 -0500

Got your patch.

With this change, the bnx driver handles very heavy usage with no 
problems.

However, I tried the same test with the wm driver.  In that case, the kernel just 
freezes.  (I put your KASSERTs in wm_start.)  So there is a problem with the wm 
driver too; it's just not the same one.

-Bev

On Mar 6, 2012, at 1:30 PM, Manuel Bouyer wrote:

> On Tue, Mar 06, 2012 at 12:25:23PM -0500, Greg Troxel wrote:
>> Restating some off-list discussion for the record, now that we've
>> figured it out:
>> 
>>  bnx_start can defer work to allocate tx data structures via a
>>  workqueue
>> 
>>  the workqueue registration is marked MPSAFE
>> 
>>  so when the workqueue calls the alloc routines, the kernel lock is not
>>  held
>> 
>>  the alloc routine calls bnx_start, and it protects that with splnet,
>>  but it hasn't taken the kernel lock
>> 
>>  so bnx_start (the second time on the first packet) is running at
>>  splnet, without the kernel lock.   This triggers the assert.
>> 
>>  if the assert isn't there, then there's the possibility of another
>>  processor handling an interrupt and calling bnx_start.  Both the
>>  workqueue-called copy and the intr-called copy will be at splnet, but
>>  on different processors.
>> 
>>  The above is typically rare, and it seems to take heavy load to
>>  trigger it sometimes.  It's probably the combination of multiple TCPs
>>  opening up cwnd and the CPU utilization getting high that leads to the
>>  unintended concurrency.
>> 
>>  The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
>>  patch I sent earlier).
> 
> This is what I've just committed. This doesn't mean your patch isn't
> correct, but I prefer to go the easy way for netbsd-6 first.
> 
> -- 
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>     NetBSD: 26 years of experience will always make the difference
> --
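
For the record, the sequence Greg describes in the quoted text can be modelled
in a few lines of userland C. This is only a sketch under stated assumptions,
not the driver code: every name below (model_state, model_start,
model_alloc_work, model_workqueue_run, MODEL_WQ_MPSAFE) is hypothetical, and a
plain boolean stands in for the kernel lock. It shows only that work marked
MPSAFE reaches the start routine without the lock, while work not marked
MPSAFE reaches it with the lock held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model; none of these names exist in the kernel. */
#define MODEL_WQ_MPSAFE 0x1

struct model_state {
    bool kernel_lock_held;  /* stands in for the kernel lock */
    int  locked_starts;     /* start routine entered with the lock held */
    int  unlocked_starts;   /* start routine entered without the lock */
};

/* Stands in for bnx_start, which expects the kernel lock to be held. */
static void model_start(struct model_state *st)
{
    if (st->kernel_lock_held)
        st->locked_starts++;
    else
        st->unlocked_starts++;  /* the case the assert in the thread catches */
}

/* Stands in for the deferred tx allocation routine, which calls start. */
static void model_alloc_work(struct model_state *st)
{
    /* The real routine raises to splnet here; that does not stop
     * another processor from entering the start routine. */
    model_start(st);
}

/* Stands in for workqueue dispatch: the lock is taken around the
 * work function only when the queue is NOT marked MPSAFE. */
static void model_workqueue_run(struct model_state *st, int flags)
{
    bool take_lock = (flags & MODEL_WQ_MPSAFE) == 0;

    if (take_lock)
        st->kernel_lock_held = true;
    model_alloc_work(st);
    if (take_lock)
        st->kernel_lock_held = false;
}
```

Calling model_workqueue_run(&st, MODEL_WQ_MPSAFE) bumps unlocked_starts, which
is the situation before the fix; calling it with flags 0 bumps locked_starts,
which corresponds to the committed change of not marking the workqueue MPSAFE.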
