tech-kern: Re: Serious SCB Timeout problems in 2.0.2 on Alpha w/ fxp devices

Subject: Re: Serious SCB Timeout problems in 2.0.2 on Alpha w/ fxp devices
To: Christos Zoulas <christos@astron.com>
From: Stephen Jones <smj@cirr.com>
List: tech-kern
Date: 06/26/2005 10:34:09

On Jun 26, 2005, at 9:22 AM, Christos Zoulas wrote:
> Maybe it is the amount of traffic that is different? I.e. How much
> traffic do you push through the interface daily?

I'm not sure that really matters.  On the two production CS20s the
errors always occur on the fxp1 interface and almost never on the fxp0
interface.  This could be why people always reply with "I don't see
this problem", since they probably only have one interface active
anyway.

More interestingly, both machines carry significantly more traffic on
the fxp0 interface, because that one is used for back-end traffic (NIS,
NFS and so on), while fxp1 is on the public network.

Errors do get logged, and their effect can be seen in the Oerrs column
of netstat -i on both machines (after 11 to 12 days of uptime on each):

Name  Mtu   Network       Ipkts  Ierrs      Opkts  Oerrs  Colls
fxp0  1500  <Link>    162086694      0  165885486      0      0
fxp1  1500  <Link>     34167868      0   33341892     62      0

Name  Mtu   Network       Ipkts  Ierrs      Opkts  Oerrs  Colls
fxp0  1500  <Link>     80684076      0   88500884      0      0
fxp1  1500  <Link>     19507608      0   16336390     27      0

Both interfaces are configured with the ifconfig options "media
100baseTX mediaopt full-duplex", along with the FastFD setting in SRM.

I'm not sure there is a one-to-one correspondence between SCB timeout
incidents and output errors.  The recovery time is typically 15 to 20
seconds, although no message is logged saying that the interface has
been reset, only that the SCB timed out.