Subject: Re: kern/10707: "transmit aborted" errors from vr driver
To: None <current-users@NetBSD.ORG>
From: Martin Husemann <martin@rumolt.teuto.de>
List: current-users
Date: 08/01/2000 23:02:30
I need a little ethernet-driver-guru advice, please.

I filed a PR (kern/10707) a few days ago (which may be a dup of a slightly
different PR, kern/7948). There was speculation that this is all a
hardware bug. So I tried to find out, but now I'm not sure...

I augmented sys/dev/pci/if_vr.c with debug output on the following theory:

The driver logs the "transmit aborted" message when the interrupt status
register indicates a transmit abort error.

IMHO there should then be at least one TX descriptor that has the
corresponding VR_TXSTAT_ABRT bit set. So I added code to print another error
when this descriptor is found (which should happen right after the first
error is printed, in vr_txeof). There are already checks for other error
conditions at that place, guarded by the appropriate
if (txstat & VR_TXSTAT_ERRSUM), and I simply added a few lines there:

			if (txstat & VR_TXSTAT_LATECOLL)
				ifp->if_collisions++;
+			if (txstat & VR_TXSTAT_ABRT)
+				printf("%s: TSR1 has TXSTAT_ABRT\n",
+					sc->vr_dev.dv_xname);

With this additional code, I do still get the "vr0: transmit aborted" messages,
but I never got the "TSR1 has TXSTAT_ABRT" message.
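
Spelled out as standalone C, the cross-check I have in mind looks roughly
like the sketch below. It is not driver code: the DEMO_* bit values and the
function name are made up for illustration and are not the definitions from
the real register header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only (not the real chip layout). */
#define DEMO_ISR_TX_ABRT	0x0008u		/* interrupt status: TX abort */
#define DEMO_TXSTAT_ABRT	0x00000100u	/* descriptor status: aborted */
#define DEMO_TXSTAT_ERRSUM	0x00008000u	/* descriptor status: any error */

enum demo_abort_check {
	DEMO_NO_ABORT,		/* interrupt status reports no abort */
	DEMO_ABORT_CONFIRMED,	/* some descriptor carries the abort bit */
	DEMO_ABORT_UNCONFIRMED	/* abort interrupt, but no descriptor agrees */
};

/*
 * Compare the interrupt status word against the status words of the
 * n completed TX descriptors.
 */
static enum demo_abort_check
demo_check_abort(uint16_t isr, const uint32_t *txstat, size_t n)
{
	size_t i;

	if ((isr & DEMO_ISR_TX_ABRT) == 0)
		return DEMO_NO_ABORT;

	for (i = 0; i < n; i++) {
		if ((txstat[i] & DEMO_TXSTAT_ERRSUM) != 0 &&
		    (txstat[i] & DEMO_TXSTAT_ABRT) != 0)
			return DEMO_ABORT_CONFIRMED;
	}
	return DEMO_ABORT_UNCONFIRMED;
}
```

In these terms, what I observe is always the DEMO_ABORT_UNCONFIRMED case.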

This indicates either a fundamental misunderstanding of the chip on my part,
or a flaw in the chip (unsolicited setting of the error bit). In the latter
case we should move the diagnostic message to the place where I added my test
printf and ignore the interrupt status bit completely (or make vr_txeof return
a value indicating whether we need to retransmit that packet or not).
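
A rough sketch of the second option, where the descriptor walk reports
aborts back to its caller. This is not the real vr_txeof: the struct, the
function name and the TXD_* bit values are hypothetical, and it assumes an
"owned by chip" bit in the status word marks descriptors that are not yet
completed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only. */
#define TXD_ABRT	0x00000100u	/* transmit was aborted */
#define TXD_ERRSUM	0x00008000u	/* some error occurred */
#define TXD_OWN		0x80000000u	/* still owned by the chip */

struct txeof_result {
	size_t reclaimed;	/* descriptors handed back to the driver */
	size_t aborted;		/* of those, how many report an abort */
};

/*
 * Walk the status words of n TX descriptors in ring order, stopping at
 * the first one the chip still owns, and count real aborts on the way.
 */
static struct txeof_result
sketch_txeof(const uint32_t *txstat, size_t n)
{
	struct txeof_result r = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if ((txstat[i] & TXD_OWN) != 0)
			break;
		r.reclaimed++;
		if ((txstat[i] & TXD_ERRSUM) != 0 &&
		    (txstat[i] & TXD_ABRT) != 0)
			r.aborted++;
	}
	return r;
}
```

The caller would then log and consider a retransmit only when aborted is
non-zero, regardless of what the interrupt status register claims.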

Am I completely on the wrong path? If the error bit really is simply bogus,
should the other side see duplicate packets? (I'm only experiencing this
with heavy NFS traffic via UDP; would those dup packets show up somewhere?)


Confused...

Martin