Subject: Strange behavior under 1.3 with esp driver?
To: None <port-sparc@NetBSD.ORG>
From: Brian Buhrow <buhrow@cats.ucsc.edu>
List: port-sparc
Date: 02/27/1998 10:56:28
	Hello.  I'm running a news server on a Sparc 5 under NetBSD/sparc 1.3.
This machine has 3 SCSI buses: 1 internal and 2 on external esp100A SBus
cards. The root filesystem is on the internal SCSI chain, i.e. the one on
the motherboard, the overview is on one of the esp100A chips, esp1, and the
news spool is on the other esp100A chip, esp2.
	The machine seems to run fine for about 10 days, at which point the
esp2 card begins experiencing timeouts and strange read/write errors appear
on the disk.  Rebooting the machine solves the problem for another 10 days
or so.  (Sometimes it runs longer, but it never makes it to three weeks.)
	This last time, about three days after it was rebooted, vmstat -i
reported:
"event chain trashed: kvm_read"
or something like that.  This happened after it had printed the interrupt
rates for esp0, but before it had printed the interrupt rates for esp1 and
esp2.  Rebooting the machine restored normal output.
	The SCSI bus with the news spool on it, esp2, takes a lot of
interrupts.  During normal operation, the interrupt rate is between 600
and 700 per second, and it stays more or less constant throughout the day.
	Is there some bug where a certain number of disk transfers or
interrupts causes corruption, either in the driver's work area or somewhere
in the chip, which only manifests itself after a long period of heavy use?
Sometimes the driver resets the bus and things continue, but only for an
hour or so before the machine panics due to a disk error, either a read or
a write error.  A reboot always fixes the problem for another 10 days or
so.  We've tried swapping the esp cards with no luck.

Any thoughts and/or ideas would be greatly appreciated.
-thanks
-Brian