Subject: ex driver problem/bug/???
To: None <,>
From: Robert Elz <kre@munnari.OZ.AU>
List: current-users
Date: 10/03/2005 21:08:37
I have no idea which list is the right one for this, so I'm just
using everyone's favourite catch-all ... (and I am experiencing the
problem with a -current system, though nothing in the code in the
area I suspect is the problem looks to have been touched in ages).

[Turns out by the time I finished writing this message that that
is incorrect, just what changed wasn't something I had expected to
be directly relevant, so current-users is probably appropriate].

The problem: I have a PC (ASUS P4P800 type) system into which I've
added a couple of 3com 3c905C type cards.

When one of those gets moderate traffic loads, the system hangs.

"Moderate loads" means about as much as I can generate from a
real application transmitting to the ethernet connected to the
ex card via a bridge from an 802.11b net - that is, I'm sending
from a wireless (11Mbps nominal) via a wireless hub to the 3com card.

That's just to indicate that the traffic load isn't really all that
high, typically around 400-450 pps (I'd assume, but haven't checked
that the packets are 1500 bytes - they're from a

	tar cf - . | ssh victim-with-ex 'cd /somewhere && tar xpf -'

type pipe).   There's the typical about 1/2 rate TCP ACK traffic out.

The ethernet (wired) is a half duplex (100baseTX) (some cheap hub).

Aside from the fast that background noise traffic levels don't seem
to be enough to cause the problem, I doubt that any of that is very

For completeness:

ex0 at pci2 dev 9 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x
ex0: interrupting at irq 5
ex0: MAC address 00:01:03:40:8a:e5
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ex1 at pci2 dev 10 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x64)
ex1: interrupting at irq 5
ex1: MAC address 00:10:5a:76:fd:2f
bmtphy1 at ex1 phy 24: Broadcom 3c905B internal PHY, rev. 0
bmtphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

ex1 isn't connected anywhere I can generate enough traffic to test
to see if it has the same problem.   irq5 is shared with half the
known universe (but I don't think that's a problem either).

I can break into DDB when the system has hung.   When I stop it, the
system is always at one of three addresses (every time, and I have done
this *many* times - into the hundreds I think).   It can be at
ex_intr+0x6B ex_intr+0xB1 or at ex_txstat+0x1A called from ex_intr+0x4CD.
(Those addresses from a standard compile of a locally configured
kernel from -current in the past few days, before I started adding

I'm sure there's some explanation, though given that it is PC hardware,
I am pretty sure I don't want to know what it is, why the DDB halt only
ever occurs at those 3 precise locations.   As best I can tell, each of
them is immediately after a bus_space operation, that is, ex_intr+0x6B
is just after the bus_space_read_2() that reads the interrupt status
at the top of the for(;;) loop in ex_intr().   ex_intr+0xB1 is just at/after
the bus_space_Write_2() operation that ack's the interrupt, a little
lower down, and ex_txstat+0x1A is at the bus_space_read_1() in the while
loop test.   Of course, it may just be those 3 operations take the
vast majority of the elapsed time in the loop the system gets itself into,
and I just haven't had it happen enough to ever have it stop somewhere

Anyway, while diagnosing this, looking to see if I could see what was
happening, I managed to spend a long time in ddb gazing at the data
structs, etc, then typed "cont" (just for fun, previous times this just
sent the kernel back into its loop, which is what I'd expect to happen
given I did nothing to prevent that).

This time however, the kernel printed "ex0: uplistptr was 0", broke out of
its loop, and aside from having lost lots of clock interrupts (time way
wrong) everything went back to functioning normally (for another short
while anyway, until the loop happened again).

With this, and the comments in the code, I conclude here that what happened
is that while I had the kernel in ddb for ages, enough packets arrived
at the card, that it ran out of places to put them, which caused the DMA
engine in the card to stall, which causes UP_COMPLETE status, detected
by the driver, which resets the card.   Resetting the card clears the
condition that was causing the loop, and the world returns to normal.

First, this is great news, as it means I can debug without reboots, now
when the kernel hangs, I can break into ddb, do a ping -f from elsewhere
for a couple of seconds, flood the card with packets, then continue, and
everything keeps on working.   I got kind of good at doing all that
reaosnably quickly...

Next step was to determine the interrupt status when it is looping, so
I added code to print "stat" whenever ex_intr() looped 128 times (every
128 times), and given that I now know that an ex_init() will unglue
things, I also made it do that after 2048 loops.

With that, I can copy lots of data into the card, it constantly stalls
and loops in the interrupt routine, but fixes itself fairly nicely.

"stat" in the interrupt is 0x2005 - I have no idea what the 0x2000 bit
is, that one is not in any of the header files anywhere I can locate it.
I also suspect that it doesn't matter.

0x5 is INTR_LATCH (1) + TX_COMPLETE (4).    The TX_COMPLETE was kind of
obvious, since ex_txstat() was clearly being called in the loop.

Nothing very obvious from that.   Next step was to look in ex_txstat()
to find out what it is, or isn't, doing.

Here's where I am getting very confused, as I cannot see how the code in
that function (for ex type cards) can possibly do anything at all.

It starts (after the init noise) ...

        while ((i = bus_space_read_1(iot, ioh, ELINK_TXSTATUS)) & TXS_COMPLETE)
		bus_space_write_1(iot, ioh, ELINK_TXSTATUS, 0x0);

		/* followed by various "if (i & TXS_...)" type tests */

FIrst glance, that all looks normal enough.   OK, there's no barrier
operations, and I'm sure someone will say there should be, but I also
think I've heard people say that on a standard type PC while technically
wrong to omit them, nothing breaks.

So, I (we) obviously need to know "i", so I alter the structure of that
code a little, so I can get at 'i' between when it is read, and when it
is tested and print it in those instances when I am already printing a
diagnostic message in ex_intr().  The value there is (always, when I have
seen it, which is enough, but not as many as some of the others) 0x88.

So, first thing, does 0x88 have TXS_COMPLETE set - I know it doesn't
really, as the printf I added only goes off once, meaning that loop
loops just once (I knew it wasn't stuck in there from the places where
ddb would enter the code).

Here's where things get really fishy.   I think it is *impossible* for
i to have TXS_COMPLETE set.   That is, unless bus_space_read_1() is
doing something really wild, as TXS_COMPLETE is 0x8000 - one byte can
never have the 0x8000 bit set, can it?

How does/did this code ever work?   Is it doing something that matters?
Clearly the fact that it is looping for me means that something should
be being done, but isn't?

What makes this so strange, is that I have been using ex cards, including
this particular card, for years, and I've never seen anything like this

Oh, I see, this is a relatively recent change ...

In elink3reg.h (which defines TXS_COMPLETE) ...

revision 1.27
date: 2004/08/09 00:25:33;  author: mycroft;  state: Exp;  lines: +9 -9
branches:  1.27.4;  1.27.6;
Eliminate most 8-bit access.  Just the data in/out path remains.

And TXS_COMPLETE changed from 0x80 to 0x8000.   That change is
clearly incomplete, at least as it applies to the elinkxl variety
of the elink cards.

Can someone have a look, and either back out that change, or complete it?



ps: now I am convinced I know the cause, I will file a PR, as well as this