Subject: Re: 2.0 fxp timeouts
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
From: Stephen Jones <smj@cirr.com>
List: port-alpha
Date: 01/21/2005 18:39:35
I've applied the patch to a non-multicpu kernel, but unfortunately the
issue persists.

fxp1: WARNING: SCB timed out!
fxp1: WARNING: SCB timed out!
fxp1: device timeout

What is interesting is that the fxp1 interface is a public network
interface, while the fxp0 interface (which handles much more traffic)
is a back-end NFS interface.

It's been up just over an hour with the patched kernel (I used the
Jan 15th source from the NetBSD-2-0-release directory).

Name  Mtu   Network       Address              Ipkts Ierrs    Opkts Oerrs Colls
fxp0  1500  <Link>        00:02:56:00:0f:ad   119647     0   132672     0     0
fxp0  1500  10/24         vinland1            119647     0   132672     0     0
fxp1  1500  <Link>        00:02:56:00:0f:ae    12295     0    10323     1     0
fxp1  1500  192.94.73/24  vinland.freeshell    12295     0    10323     1     0

On Jan 21, 2005, at 6:31 AM, Izumi Tsutsui wrote:

> In article <200501142046.j0EKkTdd021953@egsner.cirr.com>
> smj@cirr.com wrote:
>
>> fxp1: WARNING: SCB timed out!
>> fxp1: device timeout
>
> How about the attached patch?
> ---
> Izumi Tsutsui
> tsutsui@ceres.dti.ne.jp
>
> --- i82557.c.orig	2005-01-19 00:24:59.000000000 +0900
> +++ i82557.c	2005-01-19 00:25:10.000000000 +0900
> @@ -916,7 +916,7 @@
>  			break;
>  		m = NULL;
>
> -		if (sc->sc_txpending == FXP_NTXCB) {
> +		if (sc->sc_txpending == FXP_NTXCB - 1) {
>  			FXP_EVCNT_INCR(&sc->sc_ev_txstall);
>  			break;
>  		}
> @@ -1070,7 +1070,7 @@
>  #endif
>  	}
>
> -	if (sc->sc_txpending == FXP_NTXCB) {
> +	if (sc->sc_txpending == FXP_NTXCB - 1) {
>  		/* No more slots; notify upper layer. */
>  		ifp->if_flags |= IFF_OACTIVE;
>  	}
> @@ -1087,9 +1087,23 @@
>  		 * Cause the chip to interrupt and suspend command
>  		 * processing once the last packet we've enqueued
>  		 * has been transmitted.
> +		 *
> +		 * To avoid a race between updating status bits
> +		 * by the fxp chip and clearing command bits
> +		 * by this function on machines which don't have
> +		 * atomic methods to clear/set bits in memory
> +		 * smaller than 32bits (both cb_status and cb_command
> +		 * members are uint16_t and in the same 32bit word),
> +		 * we have to prepare a dummy TX descriptor which has
> +		 * NOP command and just causes a TX completion interrupt.
>  		 */
> -		FXP_CDTX(sc, sc->sc_txlast)->txd_txcb.cb_command |=
> -		    htole16(FXP_CB_COMMAND_I | FXP_CB_COMMAND_S);
> +		sc->sc_txpending++;
> +		sc->sc_txlast = FXP_NEXTTX(sc->sc_txlast);
> +		txd = FXP_CDTX(sc, sc->sc_txlast);
> +		/* BIG_ENDIAN: no need to swap to store 0 */
> +		txd->txd_txcb.cb_status = 0;
> +		txd->txd_txcb.cb_command = htole16(FXP_CB_COMMAND_NOP |
> +		    FXP_CB_COMMAND_I | FXP_CB_COMMAND_S);
>  		FXP_CDTXSYNC(sc, sc->sc_txlast,
>  		    BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE);
>
> @@ -1221,6 +1235,11 @@
>  		FXP_CDTXSYNC(sc, i,
>  		    BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
>
> +		/* skip dummy NOP TX descriptor */
> +		if ((le16toh(txd->txd_txcb.cb_command) & FXP_CB_COMMAND_CMD)
> +		    == FXP_CB_COMMAND_NOP)
> +			continue;
> +
>  		txstat = le16toh(txd->txd_txcb.cb_status);
>
>  		if ((txstat & FXP_CB_STATUS_C) == 0)
> --- i82557reg.h.orig	2005-01-19 00:25:22.000000000 +0900
> +++ i82557reg.h	2005-01-19 00:25:26.000000000 +0900
> @@ -368,6 +368,7 @@
>  #define FXP_CB_STATUS_C		0x8000
>
>  /* commands */
> +#define FXP_CB_COMMAND_CMD	0x0007	/* XXX how about FXPF_IPCB case? */
>  #define FXP_CB_COMMAND_NOP	0x0
>  #define FXP_CB_COMMAND_IAS	0x1
>  #define FXP_CB_COMMAND_CONFIG	0x2
>
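
For reference, here is a minimal sketch of the race that the comment in the
patch describes, as a standalone C model rather than driver code. It assumes
cb_status sits in the low 16 bits and cb_command in the high 16 bits of one
32-bit word, and that the CPU can only update cb_command with a 32-bit
load/modify/store. The function names and the I/S bit values are made up for
illustration; only FXP_CB_STATUS_C (0x8000) comes from the quoted header.

```c
#include <assert.h>
#include <stdint.h>

/* CB_STATUS_C matches FXP_CB_STATUS_C in the quoted i82557reg.h context.
 * The command bit values below are assumptions for this model only. */
#define CB_STATUS_C    0x8000u
#define CB_COMMAND_NOP 0x0000u
#define CB_COMMAND_I   0x2000u
#define CB_COMMAND_S   0x4000u

/* Hypothetical layout: status in the low half, command in the high half. */
static inline uint16_t cb_status(uint32_t w)  { return (uint16_t)(w & 0xffffu); }
static inline uint16_t cb_command(uint32_t w) { return (uint16_t)(w >> 16); }

/*
 * Model of "cb_command |= bits" on a CPU without 16-bit stores.
 * If the chip writes the completion bit between the CPU's load and
 * store, the CPU's store puts the stale status half back.
 */
static inline uint32_t
racy_or_command(uint32_t mem, uint16_t bits, int chip_completes_midway)
{
	uint32_t reg = mem;			/* CPU: 32-bit load */

	if (chip_completes_midway)
		mem |= CB_STATUS_C;		/* chip: sets status C bit */
	reg |= (uint32_t)bits << 16;		/* CPU: modify in register */
	mem = reg;				/* CPU: 32-bit store */
	return mem;
}

/*
 * Model of the patched approach: build a fresh NOP descriptor with
 * status 0 instead of touching a descriptor the chip may be updating.
 */
static inline uint32_t
make_nop_descriptor(void)
{
	uint16_t command =
	    (uint16_t)(CB_COMMAND_NOP | CB_COMMAND_I | CB_COMMAND_S);

	return (uint32_t)command << 16;
}

static inline void
demo(void)
{
	/* Chip completes mid-update: the completion bit is lost. */
	uint32_t w = racy_or_command(0, CB_COMMAND_I | CB_COMMAND_S, 1);
	assert((cb_status(w) & CB_STATUS_C) == 0);
	assert(cb_command(w) == (CB_COMMAND_I | CB_COMMAND_S));

	/* Patched approach: the previous descriptor is never rewritten,
	 * so whatever the chip stored in it stays intact. */
	uint32_t prev = CB_STATUS_C;
	uint32_t nop = make_nop_descriptor();
	assert((cb_status(prev) & CB_STATUS_C) != 0);
	assert(cb_status(nop) == 0);
}
```

In this model the lost completion bit would leave the descriptor looking
unfinished forever, which is at least consistent with a "device timeout"
symptom. Whether that is what actually happens on this machine is a separate
question, since the timeouts persist with the patch applied.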