Subject: wi0: recap
To: None <current-users@netbsd.org, port-i386@netbsd.org>
From: Peter Seebach <seebs@plethora.net>
List: current-users
Date: 12/22/2000 23:56:59

jhawk very usefully pointed out that I have given a lot of little tidbits
about the current state of my PRISM II driver hacks, but no useful summaries
of where I am, and that it would be an immense amount of work to sort through
all the cruft and figure out what I'm doing, or how it's working.

So, a summary of what works, and what doesn't, and why I'm stumped.

The cards in question are Linksys and D-Link PRISM II cards.  These are very
similar to a number of other cards on the market; for instance, the Symbol
Technologies Spectrum 24 appears to use an identical set of registers, with
the same bit assignments and the same numeric codes for the various commands.

The NetBSD-current driver has some amount of support for these cards, but it
didn't work (when I started) with the D-Link or Linksys cards.  To fix this,
I added them to pcmciadevs, and added them to the table in if_wi.c, so the
driver can correctly identify them as prism2 devices.

That helps, but it doesn't make the cards probe.  I looked at the FreeBSD
driver (which is currently totally unsupported, so far as I can tell), and I
stole a couple of fixes.  After thinking long and hard about Lennart's point
that DELAY(100000) (a 100 ms busy-wait) is always a bad idea, I changed the
"command timeout" loop to DELAY(100) every iteration if the command is
"initialize".  This seems to get us through the loop successfully, generally
in around 500-900 iterations, or roughly 50-90 ms of waiting in total, which
seems to be good enough.  I guess it could be adjusted to DELAY(10), but 100
microseconds seems pretty small.  I also noticed that the FreeBSD PRISM II
patch always sets PARAM1 and PARAM2 to 0 when sending commands with only one
parameter, and this seems like a good idea.
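
For anyone who wants to see the shape of that loop without digging through
diffs, here is a minimal userland sketch of it.  It is not the actual patch.
The register offsets and bit values are as I remember them from if_wireg.h
and should be treated as assumptions; struct fake_card, the card_*() helpers,
sketch_cmd() and SKETCH_MAX_POLLS are hypothetical names made up for the
example, with the fake card standing in for the real bus accesses and
DELAY().

```c
#include <errno.h>
#include <stdint.h>

/* Offsets and bits as I remember them from if_wireg.h; assumptions. */
#define WI_COMMAND	0x00
#define WI_PARAM0	0x02
#define WI_PARAM1	0x04
#define WI_PARAM2	0x06
#define WI_EVENT_STAT	0x30
#define WI_EVENT_ACK	0x34
#define WI_CMD_INI	0x0000
#define WI_EV_CMD	0x0010

#define SKETCH_MAX_POLLS 5000	/* hypothetical upper bound on polls */

/* Hypothetical stand-in for the card, so the loop runs in userland. */
struct fake_card {
	uint16_t regs[0x40 / 2];
	int polls_until_done;		/* 0 means "never completes" */
	unsigned long waited_us;	/* total time spent in card_delay() */
};

static uint16_t
card_read(struct fake_card *c, int off)
{
	/* The fake card raises WI_EV_CMD on the Nth poll of EVENT_STAT. */
	if (off == WI_EVENT_STAT && c->polls_until_done > 0 &&
	    --c->polls_until_done == 0)
		c->regs[WI_EVENT_STAT / 2] |= WI_EV_CMD;
	return c->regs[off / 2];
}

static void
card_write(struct fake_card *c, int off, uint16_t val)
{
	if (off == WI_EVENT_ACK)	/* acking an event clears it */
		c->regs[WI_EVENT_STAT / 2] &= (uint16_t)~val;
	else
		c->regs[off / 2] = val;
}

static void
card_delay(struct fake_card *c, unsigned int us)
{
	c->waited_us += us;	/* the driver would call DELAY(us) here */
}

/* Issue a one-parameter command; 0 on completion, ETIMEDOUT otherwise. */
static int
sketch_cmd(struct fake_card *c, uint16_t cmd, uint16_t val)
{
	int i;

	card_write(c, WI_PARAM0, val);
	card_write(c, WI_PARAM1, 0);	/* unused parameters zeroed */
	card_write(c, WI_PARAM2, 0);
	card_write(c, WI_COMMAND, cmd);

	for (i = 0; i < SKETCH_MAX_POLLS; i++) {
		if (card_read(c, WI_EVENT_STAT) & WI_EV_CMD) {
			card_write(c, WI_EVENT_ACK, WI_EV_CMD);
			return 0;
		}
		if (cmd == WI_CMD_INI)
			card_delay(c, 100);	/* short wait, init only */
	}
	return ETIMEDOUT;
}
```

With polls_until_done set to 700, sketch_cmd(&card, WI_CMD_INI, 0) returns 0
after 699 delays of 100 microseconds each; with it left at 0 (a card that
never answers), the call gives up and returns ETIMEDOUT.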

At this point, the cards probe.  wiconfig works, ifconfig works.
Unfortunately, under certain circumstances, the cards hang.  Hanging looks
like:
	* The first command after a hang times out.
	* The second command also times out, and when it's done, the command
	  register has the busy bit set.
	* The third command, of course, times out because the command register
	  has the busy bit set.  (My patch tries to wait for the command
	  register, Just In Case.)
After this, nothing works.  No interrupts are received; the card is hosed
until it is reset.  "ifconfig wi0 down; ifconfig wi0 up" fixes it, though.
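
Since the symptom pattern is so regular, one possible workaround is to
recognize it and reset the card from the driver.  The sketch below only
detects the symptom (two timeouts in a row, with the busy bit stuck after
the second); it says nothing about the cause.  WI_CMD_BUSY is the busy bit
as I remember it from if_wireg.h and is an assumption; struct hang_tracker,
hang_tracker_update() and SKETCH_HANG_THRESHOLD are hypothetical names for
the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define WI_CMD_BUSY		0x8000	/* busy bit, command register; assumed */
#define SKETCH_HANG_THRESHOLD	2	/* hypothetical: timeouts in a row */

struct hang_tracker {
	unsigned int consecutive_timeouts;
};

/*
 * Feed in the outcome of each command, plus the command register as read
 * after the command finished or timed out.  Returns true once the card
 * matches the hang pattern and probably needs to be reinitialized.
 */
static bool
hang_tracker_update(struct hang_tracker *t, bool timed_out,
    uint16_t command_reg)
{
	if (!timed_out) {
		t->consecutive_timeouts = 0;	/* card is answering again */
		return false;
	}
	t->consecutive_timeouts++;
	return t->consecutive_timeouts >= SKETCH_HANG_THRESHOLD &&
	    (command_reg & WI_CMD_BUSY) != 0;
}
```

When this returns true, the driver could do internally whatever
"ifconfig wi0 down; ifconfig wi0 up" does; a single isolated timeout does
not trip it.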

The circumstances under which this happens:
	* It appears to *only* happen in ad-hoc mode.  I was unable to get
	  it to fail when it was in infrastructure mode.
	* It appears to happen only when other cards either send certain
	  packets, or leave the network.  The most reliable trigger is to
	  have a Mac with an AirPort card "turn AirPort off".  Boom.
	  Note that the Mac wasn't actually *using* the AirPort for
	  networking; it was just turned on.
	* It occasionally fails when tweaking parameters on the Mac.

I have not been able to reliably reproduce this just by having two NetBSD
machines on a network; it seems to require me to have the Mac involved.
I can't reproduce it at all if the card is in infrastructure mode, talking
to a base station.

I have had a number of theories about this.  None have panned out.  My current
patched driver watches for every interrupt except WI_EV_TICK.  During a
command, it blocks WI_EV_CMD, but the rest of the time, I watch for that, even
though it should never happen.
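
To make the masking concrete, here is a small sketch of how that interrupt
enable mask could be computed.  The WI_EV_* values are as I remember them
from if_wireg.h and should be checked against the real header;
SKETCH_ALL_EVENTS and sketch_int_mask() are hypothetical names for the
example, and the result is what would be written to the interrupt enable
register.

```c
#include <stdint.h>

/* Event bits as I remember them from if_wireg.h; assumptions. */
#define WI_EV_TICK	0x8000
#define WI_EV_RES	0x4000
#define WI_EV_INFO_DROP	0x2000
#define WI_EV_NO_CARD	0x0800
#define WI_EV_DUIF_RX	0x0400
#define WI_EV_INFO	0x0080
#define WI_EV_CMD	0x0010
#define WI_EV_ALLOC	0x0008
#define WI_EV_TX_EXC	0x0004
#define WI_EV_TX	0x0002
#define WI_EV_RX	0x0001

#define SKETCH_ALL_EVENTS						\
	(WI_EV_TICK | WI_EV_RES | WI_EV_INFO_DROP | WI_EV_NO_CARD |	\
	 WI_EV_DUIF_RX | WI_EV_INFO | WI_EV_CMD | WI_EV_ALLOC |	\
	 WI_EV_TX_EXC | WI_EV_TX | WI_EV_RX)

/*
 * Everything except the timer tick is enabled.  While a command is being
 * polled for, WI_EV_CMD is masked too, so the polling loop sees the
 * completion instead of the interrupt handler.
 */
static uint16_t
sketch_int_mask(int command_in_progress)
{
	uint16_t mask = (uint16_t)(SKETCH_ALL_EVENTS & ~WI_EV_TICK);

	if (command_in_progress)
		mask &= (uint16_t)~WI_EV_CMD;
	return mask;
}
```

A WI_EV_CMD interrupt arriving while sketch_int_mask(0) is in effect would
then be exactly the "should never happen" case worth logging.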

I have added tests to a couple of the other interrupt handlers.  It appears
that, in infrastructure mode, we occasionally get link status interrupts.
We currently ignore them; this seems to be harmless.  We don't get them in
ad-hoc mode.  We get no interrupts that the code isn't supposed to be
handling.  No interrupts are received when the card suddenly dies.

What's got me stumped is that the only thing I can find in the card's state
that looks "different" is that it starts timing out.  No bits set in
registers.  No interrupts.  No nothing.

Now, since it appears to work okay with a base station, I can probably work
around this by getting one, but it seems to me that we really *OUGHT* to be
able to run in ad-hoc mode.  Indeed, I'd think we could probably run in base
station mode, but I suspect we need to write a lot of additional code to
handle that.

Mostly, I'm looking for ideas on what could have gone wrong with the card
to make everything time out, or how I could detect it or block it.  Feedback
on experience people have had with any of the other PRISM II cards, especially
in ad-hoc mode, would be very useful.  Driver docs wouldn't hurt.  :)

-s