Subject: tlp driver receive ring overruns
To: None <tech-net@netbsd.org>
From: Andreas Johansson <ajo@wopr.campus.luth.se>
List: tech-net
Date: 11/10/2000 12:24:32
For anyone suffering from this problem, I've forwarded my PR reply
below. For some reason the reply hasn't shown up in the PR database
yet. Have I done something wrong, or is the processing just slow?

/Andreas

---------- Forwarded message ----------
Date: Fri, 10 Nov 2000 11:12:12 +0100 (CET)
From: Andreas Johansson <ajo@wopr.campus.luth.se>
To: gnats-bugs@netbsd.org
Cc: wolfgang@wsrcc.com, dave@dtsp.co.nz
Subject: Re: kern/10764

I believe I've tracked down the source of this problem.

Here's what I think happens:

1. The tulip gets an overrun interrupt.
2. The overrun handling code takes the current polling position in the
   descriptor ring and writes it back into the chip using
   TULIP_WRITE(sc, CSR_RXLIST, TULIP_CDRXADDR(sc, sc->sc_rxptr));
3. CSR_RXLIST holds a pointer to the beginning of the receive
   descriptor ring, so this position becomes the new first entry in
   the list. From the tulip's point of view, the list is now
   sc->sc_rxptr entries shorter than before.
4. Goto 1

After repeating several times there will be only one descriptor left in
the list.
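
The steps above can be sketched as a small stand-alone model. This is
not driver code: NRXDESC, struct ring_model and the three functions are
hypothetical names made up for illustration, and the model assumes the
chip walks from the address in CSR_RXLIST up to the last descriptor in
the ring and then wraps back to that same address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring size, chosen for illustration only. */
#define NRXDESC 64

/* Toy model of the two views of the receive ring (not the real softc). */
struct ring_model {
	size_t chip_base; /* index the chip treats as list start (CSR_RXLIST) */
	size_t rxptr;     /* driver's polling position (sc_rxptr) */
};

/* Descriptors the chip can still reach: from its base to the ring end. */
static size_t
chip_visible(const struct ring_model *r)
{
	return NRXDESC - r->chip_base;
}

/* Old recovery: hand the chip the current polling position as list start. */
static void
overrun_old(struct ring_model *r)
{
	assert(r->rxptr < NRXDESC);
	r->chip_base = r->rxptr;
}

/*
 * Patched recovery: rewind the polling position, then write it.
 * (The real patch also calls tlp_rxintr() first to drain pending
 * packets; that step is left out of this model.)
 */
static void
overrun_patched(struct ring_model *r)
{
	r->rxptr = 0;
	r->chip_base = r->rxptr;
}
```

In this model, calling overrun_old() with the polling position at 10,
then 40, then 63 leaves the chip with 54, 24 and finally 1 reachable
descriptor, while overrun_patched() always restores all NRXDESC.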

I haven't verified that the tulip can take overruns after my bugfix,
but the old code clearly looks like a bug to me, and it has bitten me
several times in an environment with a somewhat loaded server that is
slower than its clients. This server produced about 3500 overruns per
10 minutes yesterday even with minimal load.

I'd upgrade the criticality of this bug - my machine is a pretty bad NFS
server after this has happened.

Here is a patch that should solve this problem:

-----------
--- tulip.c.orig	Fri Nov 10 03:05:36 2000
+++ tulip.c	Fri Nov 10 03:10:56 2000
@@ -1154,6 +1154,19 @@
 				/* Get the receive process going again. */
 				if (sc->sc_tdctl_er != TDCTL_ER) {
 					tlp_idle(sc, OPMODE_SR);
+#if 1
+					/* BUGFIX: Don't reset beginning of
+					   list to current position! The 
+					   descriptor ring will become smaller
+					   and smaller from the tulip's sense
+					   of view. /ajo */
+
+					/* Make sure all packets are received */
+					tlp_rxintr(sc);
+
+					/* Reset list pointer to beginning */
+					sc->sc_rxptr = 0;
+#endif
 					TULIP_WRITE(sc, CSR_RXLIST,
 					    TULIP_CDRXADDR(sc, sc->sc_rxptr));
 					TULIP_WRITE(sc, CSR_OPMODE,
-----------

/Andreas