tech-net archive


Re: apparently missing locking in if_bnx.c

For others trying to repeat this kind of stress test:

Note that we've found that actually triggering a problem seems to be
very dependent on all sorts of things that shouldn't matter, e.g. i386
vs amd64, firmware revisions, etc.  But that may be about bugs in
private code; we don't have enough experience to make this statement
about the workqueue/MPSAFE bug.

We were reliably able to induce a lockup with

  netbsd-6 from yesterday
  2 machines
  3 bnx each, cabled back-to-back in pairs
  each machine runs a web server, with a ~10G+ file
  each machine runs 3 wget, one per interface, pulling from the other machine

so this is 6 tcp streams, one per direction on each of 3 pairs of
interfaces.  With the workqueue/remove-MPSAFE patch, the machines are
totally solid under this load.  With the mutex patch I posted earlier,
they were almost solid, but not quite (probably because access to the tx
dma setup hardware was not serialized).
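
For anyone setting up the same load, here is a rough sketch of the wget
side in Python.  It is not the script we used.  The addresses (one small
subnet per back-to-back cable) and the file name /big.bin are made up for
illustration, and binding each wget to the local address of its pair is
an assumption about how to keep each stream on its own interface.

```python
import subprocess

# Hypothetical addressing: one small subnet per back-to-back cable.
# Each entry is (local address on this machine, peer address on the other).
PAIRS = [
    ("10.0.1.1", "10.0.1.2"),
    ("10.0.2.1", "10.0.2.2"),
    ("10.0.3.1", "10.0.3.2"),
]
FILE_PATH = "/big.bin"  # hypothetical name for the ~10G+ file


def build_wget_commands(pairs, file_path):
    """Return one wget argv per interface pair, discarding the download."""
    commands = []
    for local_addr, peer_addr in pairs:
        commands.append([
            "wget",
            "--bind-address=" + local_addr,  # keep the stream on this pair
            "-O", "/dev/null",               # do not fill the local disk
            "http://" + peer_addr + file_path,
        ])
    return commands


def run_all(commands):
    """Start every download at once, then wait for all of them to exit."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    run_all(build_wget_commands(PAIRS, FILE_PATH))
```

Run it on both machines with the local and peer addresses swapped on the
second one; that gives the 6 streams described above.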

Further, with the patch and LOCKDEBUG, the systems run without
crashing/panicking, but about 40x slower.  Without the patch and with
LOCKDEBUG, there were mysterious hangs.

I would expect that on most machines, it wouldn't be possible to provoke
the bug with only one interface.

My understanding is that the above stress test with 3 pairs of wm
(yesterday or today netbsd-6) also leads to hangs.  (wm doesn't use
workqueues, so it must be something else.  But wm quad-port cards seem
to have funky bridge chips that netbsd-5 at least doesn't handle.)

