Subject: Re: got drivers?
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Dieter <netbsd@sopwith.solgatos.com>
List: netbsd-help
Date: 01/23/2005 12:41:39
In message <20050123142237.GA3535@antioche.eu.org>, Manuel Bouyer writes:
> On Sat, Jan 22, 2005 at 06:19:19PM +0000, Dieter wrote:
> > > > SCSI disks haven't been keeping up (small capacity and expensive) so I added
> > > > a 250 GB ATA disk.  I/O to the ATA disk causes problems with rs-232 and ethernet.
> > > > 
> > > > 	com1: 5 silo overflows, 0 ibuf floods
> > > > 	de1: abnormal interrupt: transmit underflow
> > > > 
> > > > I'm guessing a latency problem servicing interrupts, but that's just a guess.
> > > 
> > > At least the "transmit underflow" means that the tulip chip couldn't read data
> > > fast enough from host memory: to minimize latency, ethernet chips start
> > > transmitting on the wire before they have read the full packet from host
> > > memory. If they then can't read data fast enough, they have to stop
> > > transmitting, which means that the packet is lost.
> > 
> > I'm getting deja-vu here.  I'm sure I read about this problem 15-20 years ago.
> > 
> > A while back I tried reducing the MTU, which greatly reduced (perhaps eliminated)
> > the problem, but then it didn't like large incoming packets.
> > 
> > I'm planning on upgrading to gigabit Ethernet, do any of the giga chips/boards
> > have enough buffer space to avoid this stupid problem?
> 
> Even the tulip chip has enough buffer space; it just needs to be switched to
> store-and-forward mode.
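One plausible recovery policy, sketched with hypothetical names (this is not the de or tlp driver's actual code): on each transmit underflow, raise the FIFO threshold one step; once it is already at the maximum, switch to store-and-forward if the chip supports it. The threshold steps in bytes are assumptions to be checked against the chip datasheet.

```python
from dataclasses import dataclass

# Hypothetical transmit threshold steps in bytes (assumed, check the datasheet).
THRESHOLDS = (72, 96, 128, 160)

@dataclass
class TxState:
    level: int = 0                  # index into THRESHOLDS
    store_and_forward: bool = False
    has_storefwd: bool = True       # does this chip support store-and-forward?

def on_transmit_underflow(state: TxState) -> str:
    """Escalate after an underflow; return a description of the action taken."""
    if state.store_and_forward:
        return "already store-and-forward"
    if state.level < len(THRESHOLDS) - 1:
        state.level += 1
        return f"threshold raised to {THRESHOLDS[state.level]} bytes"
    if state.has_storefwd:
        state.store_and_forward = True
        return "switched to store-and-forward"
    return "no further remedy: chip lacks store-and-forward"

if __name__ == "__main__":
    # A chip without store-and-forward runs out of options after the last step.
    state = TxState(has_storefwd=False)
    for _ in range(5):
        print(on_transmit_underflow(state))
```

The design choice here is to prefer the cheaper fix (a higher threshold keeps latency low) and fall back to store-and-forward only when that is not enough.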

I added some debugging output, and it appears to me that the driver thinks my
boards don't have store and forward mode:

     de1: abnormal interrupt: transmit underflow csr=0x20 mask=0x20
     sc->tulip_flags=0x8140100 sc->tulip_cmdmode=0x2e022 sc->tulip_features=0x3200
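The cmdmode value in that debug line is the driver's copy of CSR6. A small decoder, assuming the bit layout from the tulip register definitions (receive start in bit 1, transmit start in bit 13, threshold control in bits 15:14, store-and-forward in bit 21 on chips that have it); verify these masks against dc21040reg.h or the datasheet before relying on them.

```python
# Assumed CSR6 bit masks (check against dc21040reg.h / the chip datasheet).
CMD_RXRUN = 0x00000002       # receive process running
CMD_TXRUN = 0x00002000       # transmit process running
CMD_THRESHOLD = 0x0000C000   # transmit threshold control, bits 15:14
CMD_STOREFWD = 0x00200000    # store-and-forward (not on every tulip variant)

def decode_cmdmode(csr6: int) -> dict:
    """Split a CSR6 value into the fields relevant to transmit underflow."""
    return {
        "rx_running": bool(csr6 & CMD_RXRUN),
        "tx_running": bool(csr6 & CMD_TXRUN),
        "threshold_code": (csr6 & CMD_THRESHOLD) >> 14,  # 0..3, 3 = highest
        "store_and_forward": bool(csr6 & CMD_STOREFWD),
    }

if __name__ == "__main__":
    # prints {'rx_running': True, 'tx_running': True, 'threshold_code': 3, 'store_and_forward': False}
    print(decode_cmdmode(0x2e022))
```

Under those assumptions the threshold is already at its highest setting and store-and-forward is off, which would be consistent with the driver having nowhere left to go on these 21040 boards.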

> BTW, if you're using the de driver, I guess you're
> running a rather old release. Maybe the new tlp driver would switch to
> store-and-forward mode automatically in such a situation.

I'm running 1.6.2.  The tlp driver was at least as bad, and seemed worse.
It seemed like more windows timed out and went away with tlp.
With de, windows freeze up for a while, but usually recover after the wd I/O stops.
Windows only rarely time out and go away with de.

2.0 panics within a minute or two of going multiuser.
[ The "2.0 panics "size > 0" (zero-sized mapping)" thread in netbsd-help and port-alpha ]

> > > The serial problem could be the same (if a device grabs the PCI bus for too
> > > long, then the serial chip's FIFO will overflow). On i386, serial interrupts
> > > are above splhigh, which means no other subsystems can block them (except
> > > IPIs on SMP systems).
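To put a number on "too long": an async character with one start bit, eight data bits and one stop bit is 10 bits on the wire, so the time a UART FIFO can absorb while nobody services it is the free space above the receive interrupt trigger level times the character time. The 16-byte FIFO and the trigger level of 8 below are assumptions typical of 16550-class UARTs, not the actual configuration of this machine's com port.

```python
def char_time_us(baud: int, bits_per_char: int = 10) -> float:
    """Time on the wire for one character (start + 8 data + stop = 10 bits)."""
    return bits_per_char * 1_000_000 / baud

def fifo_headroom_us(baud: int, fifo_size: int = 16, trigger: int = 8) -> float:
    """How long after the receive interrupt fires before the FIFO overflows."""
    return (fifo_size - trigger) * char_time_us(baud)

if __name__ == "__main__":
    for baud in (9600, 38400, 115200):
        print(f"{baud:6} baud: {fifo_headroom_us(baud):7.0f} us of headroom")
```

A higher trigger level means fewer interrupts but less headroom, so a bus hog that is harmless at 9600 baud can cause silo overflows at 115200.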
> > 
> > This is on alpha, single CPU.  Is there some knob in the wd driver I can
> > turn to get it to not hold the bus too long?
> 
> It's not in wd, it's a PCI setting issue, probably the same as reported in
> http://mail-index.netbsd.org/port-alpha/2005/01/13/0000.html
> 
> There's a proposed hack in that message; you can try to adapt it to pciide.
> Also use pcictl dump to check the latency timer setting of the devices.
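Each line of `pcictl pciN list` output starts with bus:device:function, and the device number is what `pcictl pciN dump -d` takes. A hypothetical helper for illustration that turns a listing into one dump command per device; it keeps the pciN name rather than the numeric bus, since the listing shows bus 002 for pci1, and it treats the device field as decimal, as the `000:11:0` / `-d 11` pair in this thread suggests.

```python
import re

# Matches "bus:device:function: description" lines from `pcictl pciN list`.
LIST_LINE = re.compile(r"^(\d{3}):(\d{2}):(\d):\s+(.*)$")

def dump_commands(bus_name: str, listing: str) -> list[str]:
    """Build one `pcictl <bus> dump -d N | grep Latency` command per device."""
    commands = []
    seen = set()
    for line in listing.splitlines():
        m = LIST_LINE.match(line.strip())
        if not m:
            continue
        device = int(m.group(2))  # treated as decimal (000:11:0 pairs with -d 11)
        if device not in seen:    # multi-function devices appear once per function
            seen.add(device)
            commands.append(f"pcictl {bus_name} dump -d {device} | grep Latency")
    return commands

if __name__ == "__main__":
    sample = (
        "003:08:0: NEC uPD72871 IEEE 1394 OHCI Host Controller\n"
        "003:09:0: NEC USB Host Controller\n"
        "003:09:1: NEC USB Host Controller\n"
    )
    for cmd in dump_commands("pci2", sample):
        print(cmd)
```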

pcictl pci0 list
000:05:0: Q Logic ISP1020 (SCSI mass storage, revision 0x05)
000:06:0: Digital Equipment DECchip 21040 ("Tulip") Ethernet (ethernet network, revision 0x23)
000:07:0: Digital Equipment DECchip 21040 ("Tulip") Ethernet (ethernet network, revision 0x24)
000:08:0: Intel 82378ZB System I/O (SIO) (miscellaneous prehistoric, revision 0x43)
000:09:0: Digital Equipment DECchip 21152 PCI-PCI Bridge (PCI bridge, revision 0x02)
000:11:0: CMD Technology PCI0646 (IDE mass storage, interface 0x80, revision 0x01)
pcictl pci1 list
002:05:0: Q Logic ISP1020 (SCSI mass storage, revision 0x05)
002:06:0: HiNT HB1 PCI-PCI Bridge (PCI bridge, revision 0x11)
pcictl pci2 list
003:08:0: NEC uPD72871 IEEE 1394 OHCI Host Controller (Firewire serial bus, interface 0x10, revision 0x01)
003:09:0: NEC USB Host Controller (USB serial bus, interface 0x10, revision 0x41)
003:09:1: NEC USB Host Controller (USB serial bus, interface 0x10, revision 0x41)
003:09:2: NEC USB Host Controller (USB serial bus, interface 0x20, revision 0x02)
pcictl pci0 dump -d 5 | grep Latency
    Latency Timer: 0x40
    Maximum Latency: 0x00
pcictl pci0 dump -d 6 | grep Latency
    Latency Timer: 0x00
    Maximum Latency: 0x00
pcictl pci0 dump -d 7 | grep Latency
    Latency Timer: 0x00
    Maximum Latency: 0x00
pcictl pci0 dump -d 8 | grep Latency
    Latency Timer: 0x00
    Maximum Latency: 0x00
pcictl pci0 dump -d 9 | grep Latency
    Latency Timer: 0xff
pcictl pci0 dump -d 11 | grep Latency
    Latency Timer: 0x00
    Maximum Latency: 0x04
pcictl pci1 dump -d 5 | grep Latency
    Latency Timer: 0x40
    Maximum Latency: 0x00
pcictl pci1 dump -d 6 | grep Latency
    Latency Timer: 0xf8
pcictl pci2 dump -d 8 | grep Latency
    Latency Timer: 0x00
    Maximum Latency: 0x00
pcictl pci2 dump -d 9 | grep Latency
    Latency Timer: 0x08
    Maximum Latency: 0x2a
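In these dumps, "Latency Timer" is counted in PCI bus clocks and limits how long a master may keep the bus once another master wants it, while "Maximum Latency" (MAX_LAT) is in units of 250 ns and states how often the device itself needs bus access (0 meaning no particular requirement). A sketch that converts the dumped values to microseconds, assuming a 33 MHz bus clock; the values are copied from the output above, with None where no Maximum Latency line was printed (the two bridges).

```python
PCI_CLOCK_MHZ = 33.0    # assumed bus clock; a 66 MHz bus would halve the times
MAX_LAT_UNIT_NS = 250   # MAX_LAT is specified in 1/4 microsecond units

def latency_timer_us(value: int, clock_mhz: float = PCI_CLOCK_MHZ) -> float:
    """Latency Timer register (in bus clocks) converted to microseconds."""
    return value / clock_mhz

def max_latency_us(value: int) -> float:
    """MAX_LAT register converted to microseconds (0 = no requirement)."""
    return value * MAX_LAT_UNIT_NS / 1000

# (bus, device, Latency Timer, Maximum Latency or None) from the dumps above.
DUMPS = [
    ("pci0", 5, 0x40, 0x00), ("pci0", 6, 0x00, 0x00), ("pci0", 7, 0x00, 0x00),
    ("pci0", 8, 0x00, 0x00), ("pci0", 9, 0xff, None), ("pci0", 11, 0x00, 0x04),
    ("pci1", 5, 0x40, 0x00), ("pci1", 6, 0xf8, None),
    ("pci2", 8, 0x00, 0x00), ("pci2", 9, 0x08, 0x2a),
]

if __name__ == "__main__":
    for bus, dev, lt, ml in DUMPS:
        ml_text = "n/a" if ml is None else f"{max_latency_us(ml):.2f} us"
        print(f"{bus} -d {dev:2}: timer {latency_timer_us(lt):5.2f} us, max latency {ml_text}")
```

This only translates the raw numbers; whether they explain the stalls depends on how the host bridge arbitrates between masters, which the dump does not show.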

Is there a "PCI latency for dummies" README/HOWTO/FAQ somewhere?