Subject: Re: asc_intr: ignoring strange interrupt
To: None <port-pmax@netbsd.org>
From: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
List: port-pmax
Date: 02/09/2000 10:06:47
On Wed, 9 Feb 2000, NetBSD Bob wrote:

> > Jan 29 15:28:33 spade /netbsd: asc_intr: ignoring strange interrupt tc 984 fifo residue 3 script 1
> 
> I get the same errors on DS5000/200s with 64M ram and PMAG-BA or PMAG-C.
> 
> > They start immediately after boot and the numbers in the message change.
> > I those errors down in times of 1.3G and vaguely remember them to cause data
> > loss, so this makes all -current kernels unusable for me - and I would love to
> > run IPv6 :-)
> > I have a DS5000/200 with a serial console.
> 
> I have not had any problems with data loss that I can tell.  Been running
> 1.4P on them for a month or so.  It seems to occur only on heavy disk I/O,
> and not any other I/O.   I don't get the error generally after boots, and
> only rarely on normal operation.  It seems to occur when the disks are doing
> disk intensive compiles like kernels or gcc or something like that, or large
> tarball unrolls.  It seems like a random occurrence.
> 
> Anyone else know anything.

  That error message comes from a strange condition seen by the NetBSD
driver as well as the Mach driver (from which the NetBSD driver was
derived).  The drive appears to have done a disconnect, has reconnected, and
the DMA transfer restarted.  Then a "bus service" interrupt occurs, but
the DMA transfer is still in progress and appears to be functioning
normally.  If I remember correctly, the 'tc' value is the remaining byte
count of the DMA transfer, and will vary depending on how far the transfer
has gotten by the time the interrupt occurs and what the original transfer
request size was.  The Mach driver just notes the interrupt and
'ignores' it.  The NetBSD driver also just notes that the interrupt occurred
and continues.
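
  The behaviour described above can be sketched roughly as follows.  This
is only an illustration and is not taken from the asc driver source: the
structure, the field names, and the function name are all hypothetical, and
the real driver reads these values from the controller registers rather
than from a struct.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of the transfer state at interrupt time.
 * None of these names come from the real asc driver. */
struct xfer_snapshot {
    bool bus_service_intr;   /* a "bus service" interrupt was raised */
    bool dma_active;         /* the DMA transfer is still in progress */
    unsigned tc;             /* remaining byte count of the DMA transfer */
    unsigned fifo_residue;   /* bytes left in the FIFO */
    int script;              /* index of the current script step */
};

enum intr_action { INTR_HANDLE, INTR_IGNORE };

/* A bus service interrupt that arrives while the restarted DMA transfer
 * is still running is only logged; the transfer is left to continue. */
enum intr_action
check_strange_intr(const struct xfer_snapshot *st)
{
    if (st->bus_service_intr && st->dma_active) {
        printf("asc_intr: ignoring strange interrupt "
               "tc %u fifo residue %u script %d\n",
               st->tc, st->fifo_residue, st->script);
        return INTR_IGNORE;
    }
    return INTR_HANDLE;      /* anything else goes through normal handling */
}
```

  Because 'tc' is sampled while the transfer is still moving, the value
printed differs from one message to the next, which matches the changing
numbers reported above.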

  As far as I have been able to tell, there has never been any data loss
from this interrupt.  Please note that there are a number of other
"asc_intr:" messages which do indicate potential problems.  In the past,
some of these have indeed caused data loss, but I have been able to work
around most of them.
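
  When going through the system log, it can help to separate the harmless
"ignoring strange interrupt" message from the other asc_intr messages.  A
small sketch of such a filter follows.  It assumes the message layout seen
in the log line quoted above and assumes the numbers are printed in
decimal; the type and function names are made up for this example.

```c
#include <stdio.h>
#include <string.h>

enum asc_msg_kind {
    ASC_MSG_NONE,           /* line has no asc_intr message */
    ASC_MSG_STRANGE_INTR,   /* the harmless "ignoring strange interrupt" */
    ASC_MSG_OTHER           /* some other asc_intr message: look closer */
};

struct asc_strange_fields {
    unsigned tc;
    unsigned fifo_residue;
    int script;
};

/* Classify one log line.  If 'out' is not NULL and the line is the
 * harmless message, the three numbers are stored in it (they stay zero
 * if the assumed layout does not match). */
enum asc_msg_kind
classify_asc_line(const char *line, struct asc_strange_fields *out)
{
    static const char benign[] = "asc_intr: ignoring strange interrupt";
    const char *p = strstr(line, benign);

    if (p != NULL) {
        if (out != NULL) {
            memset(out, 0, sizeof(*out));
            sscanf(p + sizeof(benign) - 1,
                   " tc %u fifo residue %u script %d",
                   &out->tc, &out->fifo_residue, &out->script);
        }
        return ASC_MSG_STRANGE_INTR;
    }
    if (strstr(line, "asc_intr:") != NULL)
        return ASC_MSG_OTHER;
    return ASC_MSG_NONE;
}
```

  Lines classified as ASC_MSG_OTHER are the ones worth reading by hand.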

  During my testing of the MI SCSI code using the ncr53c9x driver, I have
not seen any similar problems with the ncr53c9x driver, even on machines
on which I have been able to reproduce many of the problems seen with the
current ASC driver.

--
Michael L. Hitch			mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University	Bozeman, MT	USA