Subject: Re: fxp on i82559 still has some timeout problems, even at 10baseT
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 11/10/2003 20:33:35
[ On Saturday, November 8, 2003 at 21:59:38 (+0900), Izumi Tsutsui wrote: ]
> Subject: Re: How can I help with hung vnlock()'ed clients?
>
> Does the attached patch (the idea taken from OpenBSD) fix
> your "fxp0: device timeout" problem?

I was just about to report that your changes seemed to have fixed the
problem for me with the on-board fxp0 on the Intel STL/2 motherboard
when it suddenly spewed the timeout error, and the NFS mount point I was
copying from hung again.  The pattern of error messages seems somewhat
different this time (though I didn't enable the link0 microcode feature
for this run, which may be why I have not yet seen the "dmasync
timeout" errors):

Nov 10 20:04:27 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:04:33 always last message repeated 5 times
Nov 10 20:04:38 always /netbsd: fxp0: device timeout
Nov 10 20:04:48 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:04:53 always /netbsd: fxp0: device timeout
Nov 10 20:05:20 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:22 always last message repeated 2 times
Nov 10 20:05:24 always /netbsd: nfs server proven.weird.com:/build: not responding
Nov 10 20:05:24 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:29 always /netbsd: fxp0: device timeout
Nov 10 20:05:36 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:39 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:44 always /netbsd: fxp0: device timeout
Nov 10 20:06:39 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:06:44 always /netbsd: fxp0: device timeout
Nov 10 20:07:16 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:08:37 always last message repeated 38 times
Nov 10 20:11:59 always last message repeated 40 times
Nov 10 20:12:04 always /netbsd: fxp0: device timeout
Nov 10 20:15:10 always /netbsd: fxp0: WARNING: SCB timed out!

It did _seem_ to last a lot longer and through much heavier traffic
conditions than before.  Of course the second fxp, an i82550, I have on
this same system can easily work an order of magnitude longer and has
never failed in any way.

At this point the interface still works sporadically, well enough to
pass some TCP traffic, and I can rlogin:

# netstat -in -I fxp0
Name  Mtu   Network       Address              Ipkts Ierrs    Opkts Oerrs Colls
fxp0  1500  <Link>        00:d0:b7:b6:ad:4b  1716551     0   760047     6 489472
fxp0  1500  fe80::/64     fe80::2d0:b7ff:fe  1716551     0   760047     6 489472
fxp0  1500  204.92.254    204.92.254.7       1716551     0   760047     6 489472

The number of "Oerrs" seems to correspond exactly, as expected, to the
number of "device timeout" messages from the kernel.
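
As a rough cross-check of that correspondence, the log excerpt above can
be tallied with a short Python sketch.  The names LOG, OERRS and
count_messages are illustrative only (not part of any NetBSD tool), and
the sketch assumes that a syslogd "last message repeated N times" line
always refers to the message logged immediately before it:

```python
import re

# Log excerpt copied from the message above.
LOG = """\
Nov 10 20:04:27 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:04:33 always last message repeated 5 times
Nov 10 20:04:38 always /netbsd: fxp0: device timeout
Nov 10 20:04:48 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:04:53 always /netbsd: fxp0: device timeout
Nov 10 20:05:20 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:22 always last message repeated 2 times
Nov 10 20:05:24 always /netbsd: nfs server proven.weird.com:/build: not responding
Nov 10 20:05:24 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:29 always /netbsd: fxp0: device timeout
Nov 10 20:05:36 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:39 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:05:44 always /netbsd: fxp0: device timeout
Nov 10 20:06:39 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:06:44 always /netbsd: fxp0: device timeout
Nov 10 20:07:16 always /netbsd: fxp0: WARNING: SCB timed out!
Nov 10 20:08:37 always last message repeated 38 times
Nov 10 20:11:59 always last message repeated 40 times
Nov 10 20:12:04 always /netbsd: fxp0: device timeout
Nov 10 20:15:10 always /netbsd: fxp0: WARNING: SCB timed out!
"""

OERRS = 6  # "Oerrs" column from the netstat output for fxp0

REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_messages(log_text):
    """Count each distinct message, expanding syslogd repeat lines."""
    counts = {}
    last = None
    for line in log_text.splitlines():
        # Fields: month, day, time, host, then the message itself.
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        msg = parts[4]
        repeat = REPEAT_RE.fullmatch(msg)
        if repeat:
            # Assumption: the repeat applies to the previous real message.
            if last is not None:
                counts[last] += int(repeat.group(1))
            continue
        counts[msg] = counts.get(msg, 0) + 1
        last = msg
    return counts


counts = count_messages(LOG)
timeouts = counts["/netbsd: fxp0: device timeout"]
print("device timeouts:", timeouts, "Oerrs:", OERRS)
```

This only counts what made it into the excerpt; any messages logged
before or after it are not covered.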

(the collision rate is high because I was doing 'ping -f' while copying
from NFS and I'm still running at only 10baseT to my switch)
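
To put a rough number on "high", the counters in the <Link> row of the
netstat output can be divided out.  This is only a sketch: the names
NETSTAT_ROW and parse_counters are made up for illustration, and it
assumes that Colls per Opkts is a reasonable rough measure of the
collision rate.  It works out to about 0.64 collisions per output
packet:

```python
# <Link> row copied from the netstat output above.
NETSTAT_ROW = (
    "fxp0  1500  <Link>        00:d0:b7:b6:ad:4b  "
    "1716551     0   760047     6 489472"
)


def parse_counters(row):
    """Return the five trailing counters of a 'netstat -in' row as ints."""
    names = ("Ipkts", "Ierrs", "Opkts", "Oerrs", "Colls")
    values = (int(field) for field in row.split()[-5:])
    return dict(zip(names, values))


counters = parse_counters(NETSTAT_ROW)
collisions_per_opkt = counters["Colls"] / counters["Opkts"]
print(f"{collisions_per_opkt:.2f} collisions per output packet")
```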

# ifconfig fxp0
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        address: 00:d0:b7:b6:ad:4b
        media: Ethernet autoselect (10baseT)
        status: active
        inet 204.92.254.7 netmask 0xffffff00 broadcast 204.92.254.255
        inet6 fe80::2d0:b7ff:feb6:ad4b%fxp0 prefixlen 64 scopeid 0x1
# uname -a  
NetBSD always 1.6.2_RC1 NetBSD 1.6.2_RC1 (GENERIC) #5: Mon Nov 10 17:41:56 EST 2003     woods@proven:/build/woods/proven/NetBSD-1.6.x-i386-i386-obj/work/woods/m-NetBSD-1.6/sys/arch/i386/compile/GENERIC i386

FYI this device probes as:

	fxp0 at pci0 dev 3 function 0: i82559 Ethernet, rev 8
	fxp0: interrupting at irq 10
	fxp0: Ethernet address 00:d0:b7:b6:ad:4b
	inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
	inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

Back to the PCI add-on fxp1 for this system.....

Note that doing so restores the NFS mount to working condition.

Of course NFS still shouldn't hang, at least not so permanently, given
the mount options include '-i':

proven.weird.com:/build /proven/build   nfs     -b,-i,rw,nodev,nosuid   0 0


I'll leave the tests I was running going in a loop overnight on the
i82550 (fxp1 on this box), just to make sure it really does still run
better than fxp0 has.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>