Subject: CS20 ethernet hang oddities
To: None <port-alpha@netbsd.org>
From: Stephen M. Jones <smj@cirr.com>
List: port-alpha
Date: 11/08/2002 12:21:04
Hi there.  I just had a long night.  This is mostly aimed at
the CS20 users who are using NetBSD 1.6 though I'm not sure
this is really a NetBSD issue at all.
A quick overview of the configuration.
I've got four AS1200/5305 machines and four CS20s (the latter
not quite in production).
Each machine has a public ethernet interface and a private one.
I've got an SMC EZ switch 1016DT for public traffic and an
SMC EZ switch 108DT for private traffic.  Private traffic is
strictly NFS at the moment.
So yesterday I go up on site and plug in two of the CS20s with
a fresh install of 1.6 all ready to go.  I put the primary 
ethernet interface on the private network (so that I can make
use of its I2C remote management semi-securely) and put the
secondary on the public side .. bring up the machines, they
bring up the remote file systems and start going at it.. 
everything is great .. for about 3 hours.
Then suddenly, without panic on the console one of the CS20s 
freezes up .. at the same time, the ethernet switches seem to
freeze as well .. all hosts are inaccessible .. the LEDs on
both ethernet interfaces as well as the switch ports for the
CS20 are strobing ..
Reset the switches, and other hosts are pingable for a few seconds ..
unplug the CS20 cables, and other hosts are pingable until you
plug the CS20 back in.
Thinking it was just this one CS20 causing the problem, I took
it home last night .. after about 2 hours of being home, the
other CS20 did precisely the same thing .. no message on
the console nor in syslog .. switch ports asserted and all
hosts unreachable .. until 6 hours later when I could go on
site :(  
When I got there, I noticed that the two LEDs on the ethernet
interfaces on the CS20 and the switches were strobing just
like with the other CS20.  I started up a ping and then unplugged
the ethernet cables to the CS20 .. all hosts were accessible.  
Anyone have any ideas what's going on?  Is what I'm doing (two
switches, one for public traffic, one for NFS) off the mark?
Seeing that I've done this with the 5305s for a long time, I
don't think that's the problem.
I know I would be better off using intelligent hubs, but before
I go out and get them .. what's going on with the CS20s?  They
are using the final firmware rev that API had.
Since I don't see any NetBSD complaints about the ethernet
interface, stray interrupts, or any panic message, I'm going to
assume that this is a hardware issue rather than a software one.