Subject: Re: got drivers?
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Dieter <netbsd@sopwith.solgatos.com>
List: netbsd-help
Date: 01/23/2005 21:09:12
In message <20050123142237.GA3535@antioche.eu.org>, Manuel Bouyer writes:
> On Sat, Jan 22, 2005 at 06:19:19PM +0000, Dieter wrote:
> > > > SCSI disks haven't been keeping up (small capacity and expensive) so I added
> > > > a 250 GB ATA disk.  I/O to the ATA disk causes problems with rs-232 and ethernet.
> > > > 
> > > > 	com1: 5 silo overflows, 0 ibuf floods
> > > > 	de1: abnormal interrupt: transmit underflow
> > > > 
> > > > I'm guessing a latency problem servicing interrupts, but that's just a guess.
> > > 
> > > At least the "transmit underflow" means that the tulip chip couldn't read data
> > > fast enough from the host memory: to minimize latency, ethernet chips start
> > > transmitting on the wire before they have read the full packet from host
> > > memory. Now, if they can't read data fast enough, they have to stop
> > > transmitting, which means that the packet is lost.
> > 
> > I'm getting deja-vu here.  I'm sure I read about this problem 15-20 years ago.
> > 
> > A while back I tried reducing the MTU, which greatly reduced, perhaps eliminated
> > the problem, but then it didn't like large incoming packets.
> > 
> > I'm planning on upgrading to gigabit Ethernet, do any of the giga chips/boards
> > have enough buffer space to avoid this stupid problem?
> 
> Even the tulip driver has enough buffer, it just needs to be switched to
> store and forward mode. BTW, if you're using the de driver, I guess you're
> running a quite old release. Maybe the new tlp driver would switch to
> store and forward mode automatically in such a situation.
> 
> > 
> > > The serial problem could be the same (if a device grabs the PCI bus for too
> > > long, then the serial chip's fifo will overflow). On i386, serial interrupts
> > > are above splhigh, which means no other subsystems can block them (except
> > > IPIs on SMP systems).
> > 
> > This is on alpha, single CPU.  Is there some knob in the wd driver I can
> > turn to get it to not hold the bus too long?
> 
> It's not in wd, it's a PCI setting issue, probably the same as reported in
> http://mail-index.netbsd.org/port-alpha/2005/01/13/0000.html
> 
> There's a proposed hack here, you can try to adapt it to pciide.
> Also use pcictl dump to check the latency timer setting of the devices.

The message from Chuck was helpful, thanks!  Then google found
http://www.reric.net/linux/pci_latency.html
which helped even more.

If the "Latency Timer" is set to zero, does that mean "no limit"?
I modified pciide.c (diffs below) and amazingly enough it works!
I can now dd a raw partition and not a peep from the de driver!

sopwith # while true
> do
> dd if=/dev/rwd0b of=/dev/null bs=1024k count=1024
> done
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 67.926 secs (15807523 bytes/sec)

The Latency Timer setting is getting stored somewhere: I booted my
old kernel to run the same dd and see whether I had lost any
throughput, but the Latency Timer is still 0x40.  Given this, perhaps
pcictl could be extended to allow changing these settings?
If the value is stored in one of those devices that can only be
written 'x' number of times, you wouldn't want to write it every
time you boot.

pcictl pci0 dump -d 11 | grep Latency
    Latency Timer: 0x40
    Maximum Latency: 0x04
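
As a rough guide to reading those two values: per the PCI spec, the
Latency Timer counts PCI clock cycles and Maximum Latency (Max_Lat) is
in units of 0.25 microseconds.  A small sketch of the conversion
follows.  The 33 MHz bus clock is an assumption (it has not been
checked for this machine), and the function names are made up for
illustration only.

```c
#include <stdint.h>

/* Assumption: conventional 33 MHz PCI; a 66 MHz bus would halve the result. */
#define PCI_CLOCK_HZ 33000000.0

/* Hypothetical helper: Latency Timer value (PCI clocks) -> microseconds. */
static double
latency_timer_usec(uint8_t lattimer)
{
	return (double)lattimer * 1e6 / PCI_CLOCK_HZ;
}

/* Hypothetical helper: Max_Lat value (0.25 us units) -> microseconds. */
static double
max_lat_usec(uint8_t max_lat)
{
	return (double)max_lat * 0.25;
}
```

Under that assumption, a Latency Timer of 0x40 (64 clocks) comes to a
little under 2 microseconds, and a Maximum Latency of 0x04 is
1 microsecond.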

Thanks gang!  One problem solved (I think).  Now I just need to figure
out why 2.0 panics, and get firewire and HD-3000 drivers.

sopwith rcsdiff -c -r1.1 pciide.c
===================================================================
RCS file: RCS/pciide.c,v
retrieving revision 1.1
diff -c -r1.1 pciide.c
*** pciide.c    2005/01/24 03:40:15     1.1
--- pciide.c    2005/01/24 04:13:29
***************
*** 713,718 ****
--- 713,731 ----
        char devinfo[256];
        const char *displaydev;
  
+ #if 1
+       /* Adjust PCI latency, 0 not working well.
+        * SCSI board has 0x40 and didn't cause problems, so try that.
+        */
+ #define PCI_LATENCY_OVERRIDE 0x40
+       uint32_t reg;
+       printf ("\n%s: Overriding default PCI latency, setting to 0x%x", self->dv_xname, PCI_LATENCY_OVERRIDE);
+       reg = pci_conf_read(pa->pa_pc, pa->pa_tag, PCI_BHLC_REG);
+       reg &= ~(PCI_LATTIMER_MASK << PCI_LATTIMER_SHIFT);
+       reg |= PCI_LATENCY_OVERRIDE << PCI_LATTIMER_SHIFT;
+       pci_conf_write(pa->pa_pc, pa->pa_tag, PCI_BHLC_REG, reg);
+ #endif
+ 
        sc->sc_pp = pciide_lookup_product(pa->pa_id);
        if (sc->sc_pp == NULL) {
                sc->sc_pp = &default_product_desc;
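
The read-modify-write in the patch above can be tried outside the
kernel on a plain 32-bit value.  This is only a sketch: the shift and
mask values are assumed to match the ones in NetBSD's pcireg.h (Latency
Timer in bits 8-15 of the BHLC register), and get_lattimer() and
set_lattimer() are hypothetical names, not existing kernel functions.

```c
#include <stdint.h>

/* Assumed to mirror sys/dev/pci/pcireg.h: Latency Timer is bits 8-15. */
#define PCI_LATTIMER_SHIFT 8
#define PCI_LATTIMER_MASK  0xffu

/* Hypothetical helper: extract the Latency Timer from a BHLC value. */
static uint8_t
get_lattimer(uint32_t bhlc)
{
	return (uint8_t)((bhlc >> PCI_LATTIMER_SHIFT) & PCI_LATTIMER_MASK);
}

/*
 * Hypothetical helper: return bhlc with only the Latency Timer field
 * replaced; cache line size, header type and BIST bits are preserved.
 */
static uint32_t
set_lattimer(uint32_t bhlc, uint8_t timer)
{
	bhlc &= ~((uint32_t)PCI_LATTIMER_MASK << PCI_LATTIMER_SHIFT);
	bhlc |= (uint32_t)timer << PCI_LATTIMER_SHIFT;
	return bhlc;
}
```

Clearing the field before OR-ing in the new value matters when the old
timer is non-zero; otherwise stale bits would be mixed into the result.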