Re: apparently missing locking in if_bnx.c

To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: apparently missing locking in if_bnx.c
From: Greg Troxel <gdt%ir.bbn.com@localhost>
Date: Tue, 06 Mar 2012 12:25:23 -0500

Manuel Bouyer <bouyer%antioche.eu.org@localhost> writes:

> On Tue, Mar 06, 2012 at 11:56:45AM -0500, Beverly Schwartz wrote:
>> ddb backtrace produces:
>> vpanic
>> kern_assert
>> bnx_start
>> bnx_alloc_pkts
>> workqueue_worker
>
> thanks, so the problem is really the workqueue that should not
> be marked MPSAFE ...

Restating some off-list discussion for the record, now that we've
figured it out:

  bnx_start can defer work to allocate tx data structures via a
  workqueue

  the workqueue registration is marked MPSAFE

  so when the workqueue calls the alloc routines, the kernel lock is not
  held

  the alloc routine calls bnx_start, and it protects that with splnet,
  but it hasn't taken the kernel lock

  so bnx_start (the second time on the first packet) is running at
  splnet, without the kernel lock.   This triggers the assert.

  if the assert isn't there, then there's the possibility of another
  processor handling an interrupt and calling bnx_start.  Both the
  workqueue-called copy and the intr-called copy will be at splnet, but
  on different processors.  splnet only blocks interrupts on the local
  CPU, so nothing serializes the two copies.

  This concurrency is typically rare, and it seems to take heavy load
  to trigger it.  It's probably the combination of multiple TCP
  connections opening up cwnd and the CPU utilization getting high that
  leads to the unintended concurrency.

  The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
  patch I sent earlier).
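
To make the sequence above concrete, here is a minimal userspace sketch
of it.  It is a model only, not the if_bnx.c code: every name in it
(model_wq, MODEL_WQ_MPSAFE, model_start, model_alloc_pkts, model_wq_run)
is hypothetical, the kernel lock is reduced to a boolean, and splnet is
left out because it does not affect whether the lock is held.  The
assumption being modelled is the one stated above: a workqueue that is
not marked MPSAFE runs its callback with the kernel lock held, and one
marked MPSAFE does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the workqueue's MPSAFE marking. */
#define MODEL_WQ_MPSAFE 0x01

/* Hypothetical stand-in for "this thread holds the kernel lock". */
static bool model_kernel_lock_held;

/* Number of times the start routine ran without the lock. */
static int model_unlocked_starts;

struct model_wq {
    int flags;              /* 0 or MODEL_WQ_MPSAFE */
    void (*func)(void *);   /* deferred work callback */
    void *arg;
};

/* Models bnx_start: the real driver asserts here; we only count. */
static void model_start(void *arg)
{
    (void)arg;
    if (!model_kernel_lock_held)
        model_unlocked_starts++;
}

/* Models the alloc routine: it calls start without taking the lock. */
static void model_alloc_pkts(void *arg)
{
    model_start(arg);
}

/*
 * Models the workqueue worker.  The lock is taken around the callback
 * only when the queue is NOT marked MPSAFE.  Returns how many unlocked
 * start calls happened during this run.
 */
static int model_wq_run(const struct model_wq *wq)
{
    assert(wq != NULL && wq->func != NULL);

    bool take_lock = (wq->flags & MODEL_WQ_MPSAFE) == 0;
    int before = model_unlocked_starts;

    if (take_lock)
        model_kernel_lock_held = true;
    wq->func(wq->arg);
    if (take_lock)
        model_kernel_lock_held = false;

    return model_unlocked_starts - before;
}
```

Running model_wq_run on a queue with MODEL_WQ_MPSAFE set reports one
unlocked call to model_start, which is the case the assert catches;
with flags of 0 it reports none, which corresponds to the proposed fix.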

