tech-kern: Re: Serious SCB Timeout problems in 2.0.2 on Alpha w/ fxp devices

Subject: Re: Serious SCB Timeout problems in 2.0.2 on Alpha w/ fxp devices
To: Stephen Jones <smj@cirr.com>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 06/26/2005 15:19:44

On Jun 26, 10:34am, smj@cirr.com (Stephen Jones) wrote:
-- Subject: Re: Serious SCB Timeout problems in 2.0.2 on Alpha w/ fxp devices

| On Jun 26, 2005, at 9:22 AM, Christos Zoulas wrote:
| > Maybe it is the amount of traffic that is different? I.e. How much
| > traffic do you push through the interface daily?
| 
| I'm not sure that really matters.  On the two production CS20s the
| errors always occur on the fxp1 interface and almost never on the
| fxp0 interface.  This could be why people always reply with "I don't
| see this problem" since they probably only have one interface active
| anyway.
| 
| More interestingly, both machines have significantly higher traffic
| on the fxp0 interface because that is used for back end (NIS, NFS and
| other traffic) while the fxp1 is on the public network.
| 
| Errors do get logged and a result of that can be seen in netstat -i on 
| both machines (this is after 11 to 12 days of uptime per machine):
| 
| Name  Mtu   Network  Ipkts      Ierrs  Opkts      Oerrs  Colls
| fxp0  1500  <Link>   162086694  0      165885486  0      0
| fxp1  1500  <Link>   34167868   0      33341892   62     0
| 
| 
| Name  Mtu   Network  Ipkts      Ierrs  Opkts      Oerrs  Colls
| fxp0  1500  <Link>   80684076   0      88500884   0      0
| fxp1  1500  <Link>   19507608   0      16336390   27     0
| 
| Both interfaces include ifconfig flags of: media 100baseTX mediaopt
| full-duplex, as well as FastFD settings in SRM.
| 
| I'm not sure there is a one to one SCB timeout incident per outbound
| error .. the recovery time is typically 15 or 20 seconds though no
| message is logged that the interface has reset .. just that the SCB
| timed out.

Are the two interfaces connected to the same type of switches? Could it
be that the switch is causing the problem to the internal interface?
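
For scale, the Oerrs and Opkts counters quoted above can be turned into
an error rate per interface. The following is only a rough sketch in
Python: the numbers are copied from the quoted netstat -i output, while
the host labels ("host A", "host B") and the helper name
oerr_per_million are made up for illustration, since the message does
not name the machines.

```python
# Counters copied from the quoted netstat -i output as (Opkts, Oerrs).
# The host labels are placeholders; the original message does not say
# which machine produced which table.
counters = {
    ("host A", "fxp0"): (165885486, 0),
    ("host A", "fxp1"): (33341892, 62),
    ("host B", "fxp0"): (88500884, 0),
    ("host B", "fxp1"): (16336390, 27),
}


def oerr_per_million(opkts, oerrs):
    """Return output errors per million output packets."""
    return oerrs / opkts * 1_000_000


for (host, ifname), (opkts, oerrs) in counters.items():
    rate = oerr_per_million(opkts, oerrs)
    print(f"{host} {ifname}: {rate:.2f} Oerrs per million Opkts")
```

On both machines this comes out to fewer than two output errors per
million packets on fxp1, and zero on fxp0 despite the higher traffic
there.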

christos