Subject: Re: implementation: NetBSD on AS1200s
To: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
From: Stephen M Jones <smj@cirr.com>
List: port-alpha
Date: 03/06/2002 16:23:59
Michael Hitch writes:

>   I don't think either the de or tlp driver properly set the DE500
> NICs into full-duplex.  I've got two systems with two DE500-BA each,
> and even with a -current tlp driver, it was still generating
> collisions.

Okay, so while this is an inconvenience, it's nothing critical.
 
>   Ah - this might be part of your problem.  I've got a couple of 3COM
> 3C980 server cards and a 3C905B card.  The elinkxl driver seems to get
> an interrupt once in a while which the driver doesn't think came from
> the card.  That generates the stray interrupts.  On my PC164, if I
> get more than 3 or 4 stray interrupts, the alpha disables that interrupt
> and the interface stops working.  Do you see one of the messages
> that says "...; stopped logging"?  If so, that's when the interrupt
> has been disabled.

Okay, this is a good trail!  While I'm not seeing ' ... stopped logging'
anywhere, I am getting a number of 'stray kn300 irq' messages that build
up over time, though probably fewer than 10.  Since BOTH of these machines
have 3COM 905 cards, that concerns me.  

Also, I've not mentioned it before, but there are occasional brief NFS
outages where one system isn't accessible for a few seconds.  It doesn't
happen a lot, just enough to get logged.

Typically something like this (actual dmesg output from today, uptime
1+ day):

stray kn300 irq 48
nfs server sdf1:/sys: not responding
nfs server sdf1:/sys: is alive again
nfs server sdf1:/sys: not responding
nfs server sdf1:/sys: is alive again
nfs server sdf1:/sys: not responding
nfs server sdf1:/sys: is alive again
uvn_attach: blocked at 0x0xfffffc002239c080 flags 0x4
uvn_attach: blocked at 0x0xfffffc000c0a0db8 flags 0x4
stray kn300 irq 16
uvn_attach: blocked at 0x0xfffffc004e4b2560 flags 0x4

And which devices are those?

ex0: interrupting at kn300 irq 48
ex1: interrupting at kn300 irq 16
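
To keep an eye on how fast the strays build up per card, the two kinds of
dmesg lines above can be matched against each other.  This is only a
minimal sketch: the function name tally_strays and the SAMPLE text are
hypothetical, and it assumes the message formats are exactly as shown in
this mail.

```python
import re
from collections import Counter

# Patterns assume the exact dmesg wording quoted in this message.
STRAY_RE = re.compile(r"^stray kn300 irq (\d+)")
ATTACH_RE = re.compile(r"^(\w+): interrupting at kn300 irq (\d+)")


def tally_strays(lines):
    """Count stray interrupts per device (or per bare irq if unmapped)."""
    irq_to_dev = {}
    strays = Counter()
    for line in lines:
        line = line.strip()
        attach = ATTACH_RE.match(line)
        if attach:
            # Remember which device claimed this irq at attach time.
            irq_to_dev[int(attach.group(2))] = attach.group(1)
            continue
        stray = STRAY_RE.match(line)
        if stray:
            strays[int(stray.group(1))] += 1
    # Label each count with the device name when the irq is known.
    return {irq_to_dev.get(irq, f"irq {irq}"): n for irq, n in strays.items()}


# Hypothetical sample assembled from the lines quoted in this message.
SAMPLE = """\
ex0: interrupting at kn300 irq 48
ex1: interrupting at kn300 irq 16
stray kn300 irq 48
nfs server sdf1:/sys: not responding
nfs server sdf1:/sys: is alive again
stray kn300 irq 16
""".splitlines()

if __name__ == "__main__":
    for dev, count in tally_strays(SAMPLE).items():
        print(dev, count)
```

Feeding it the full dmesg text instead of SAMPLE would show whether one
card is collecting strays faster than the other.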

Note again that NFS traffic is being routed over ex1 (to another ex device
on the other system) via a crossover cable (point to point).

Data flow looks like this (both sides):

(side A)
ex1   1500  <Link>        00:50:da:df:7a:00  8302159     0  7772794     0     0
ex1   1500  10/24         otaku1             8302159     0  7772794     0     0

(side B)
ex0   1500  <Link>        00:50:da:22:98:b8  7768047     0  8309646     0     0
ex0   1500  10/24         sdf1               7768047     0  8309646     0     0

Looks good, right?  No overruns or errors, just these stray interrupts
occurring.  Now, if that device stops working after, say, 8 stray
interrupts, that could explain my mysterious hangs.

Does anyone have any solutions?  I wouldn't even mind buying 4 new ethernet
cards if it's a problem that can't be corrected within the software
driver.