amiga-dev: Re: More info on the WarpEngine SCSI problem

Subject: Re: More info on the WarpEngine SCSI problem
To: Tom Hayko <tjhayko@io.org>
From: Michael L. Hitch <osymh@gemini.oscs.montana.edu>
List: amiga-dev
Date: 06/29/1994 09:27:22
On Jun 29,  1:59am, Tom Hayko wrote:
> --- start of output ---
> 
> SIOP: Parity Error
> siopchkintr: target 4
> scripts 80bcff0 ds 8211028 regs 40040000 dsp 80bd0f8 dcmd 1f00000
> 
> (the next line scrolled off the screen because the line below kept
>  repeating itself, but I did manage to see sstat0 1)
> 
> waiting: tgt 4 cmd 03 sbcl 21 dsp 8010d1b0 (+1c0) dcmd 19000020 ds 559028
> --- end of output ---

  Hmm, it doesn't look like the SCSI driver got things reset properly after
the error.

> and the last line kept scrolling on for a couple of minutes until I did a
> control-amiga-amiga and rebooted.  I'm pretty sure that before, the first
> message was not a Parity Error.

  I'll bet that a parity error had occurred before - the driver just
wasn't checking for that error and called panic.  The current driver now
prints out what the error status is, and is supposed to reset the SCSI
chip (and SCSI bus) and return an error condition on the I/O that was in
progress.  It looks like the drive hangs up waiting for the request sense
command to respond (and I don't think I currently have a limit check on
that wait - I'm not sure what I would do in this case, since the reset
doesn't seem to have cleared things.)
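
  For illustration only, a limit check on a polling wait could take roughly
the shape below.  This is a minimal sketch, not code from the driver:
wait_bounded() and countdown_done() are hypothetical names, the poll count
stands in for a real time limit, and it does not answer the harder question
of what to do once the limit expires.

```c
#include <stddef.h>

/*
 * Hypothetical bounded wait: poll done() at most max_polls times.
 * Returns 0 if done() reported completion, -1 if the limit expired.
 */
int wait_bounded(int (*done)(void *), void *arg, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if (done(arg))
            return 0;       /* operation completed within the limit */
    }
    return -1;              /* gave up; the caller must handle the failure */
}

/*
 * Stand-in for reading a completion status: reports "not done" until the
 * counter pointed to by arg reaches zero.
 */
int countdown_done(void *arg)
{
    int *left = arg;

    if (*left > 0) {
        (*left)--;
        return 0;
    }
    return 1;
}
```

  With a counter of 3 and a limit of 10 polls the wait returns 0; with a
counter of 100 and the same limit it returns -1 instead of looping forever.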

> Once I rebooted, I tried to do an fsck on the device that I was untarring
> to at the time and I got the following:
> 
> --- start of output ---
> 
> siopchkintr: istat a dstat 80 sstat0 1 dsps 8211412 dsa 8211028 sbcl 67 sts ff msg 3
> wesc0: siop id 7 reset

  The sstat0 value of 1 indicates a Parity error, which is what was reported
previously.
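
  As a rough sketch of how a raw sstat0 value maps to error names: the only
bit confirmed by the output above is 0x01 (parity error).  The other bit
names in the table are assumptions recalled from the 53c710 documentation and
should be checked against the data manual or the driver's register header;
sstat0_describe() is a hypothetical helper, not a driver routine.

```c
#include <stddef.h>
#include <stdio.h>

struct bitname {
    unsigned char mask;
    const char *name;
};

/* Only 0x01 = parity error is confirmed above; the rest are assumptions. */
static const struct bitname sstat0_bits[] = {
    { 0x80, "phase mismatch" },
    { 0x40, "function complete" },
    { 0x20, "selection timeout" },
    { 0x10, "selected/reselected" },
    { 0x08, "SCSI gross error" },
    { 0x04, "unexpected disconnect" },
    { 0x02, "SCSI reset received" },
    { 0x01, "parity error" },
};

/*
 * Write a comma-separated list of the set bits into buf.
 * Returns the number of bit names written (stops early if buf fills up).
 */
int sstat0_describe(unsigned char sstat0, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < sizeof(sstat0_bits) / sizeof(sstat0_bits[0]); i++) {
        int n;

        if (!(sstat0 & sstat0_bits[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? ", " : "", sstat0_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;          /* output truncated; stop here */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

  Passing the reported value 0x01 yields the single name "parity error".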

> the screen in the previous message.  I never did get dropped into the kernel
> debugger, so I wasn't able to get a symbolic stack trace.  

  A stack trace isn't going to tell me much, if anything.  The siopchkintr()
routine is being called to check the completion status of an I/O operation,
and that status is being reported as a parity error (from the 53c710 chip).
Not being able to recover and continue after that is a problem that I need
to try and fix.  That might not be so simple since I don't have any way
to duplicate the error.

> Is this possibly caused by an incompatibility between the drive I'm using
> and the WarpEngine/NetBSD combo?  Could it be that the synchronous
> negotiation is too quick for the drive (I'm really guessing here because
> it's about the only thing that I change on the WarpEngine).

  Well, you can suppress the sync negotiation if you want to give that a
try. There is a byte array named _siop_inhibit_sync that currently
contains 0. Just patch byte 4 of that array to a non-zero value and it
should run in asynchronous mode.
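
  To make the effect of that patch concrete, here is a small model of the
array in C.  It is a sketch under assumptions: that the array has one byte
per SCSI target ID (byte 4 matching the drive at target 4), that it holds 8
entries, and that the driver skips sync negotiation when the entry is
non-zero.  sync_allowed() is a hypothetical helper, and the leading
underscore in _siop_inhibit_sync is just the symbol-table form of the C name.

```c
#define NTARGETS 8      /* assumed: one entry per SCSI target ID */

/* Model of the kernel array; all zero, as described in the message. */
unsigned char siop_inhibit_sync[NTARGETS];

/*
 * Hypothetical check: non-zero if sync negotiation would be attempted
 * for the given target, zero if it is inhibited or the ID is out of range.
 */
int sync_allowed(int target)
{
    if (target < 0 || target >= NTARGETS)
        return 0;
    return siop_inhibit_sync[target] == 0;
}
```

  Setting siop_inhibit_sync[4] to 1 in this model makes sync_allowed(4)
return 0 while leaving the other targets unaffected.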

> I'm running Michael Hitch's kernel with AGA support dated 940620 from
> ftp.coe.montana.edu.

  The 940620 date just indicates which set of -current sources my kernel
is based on.  I've gone through a number of revisions locally - you would
need to look at the version message printed at bootup to see which local
revision of the kernel it is and the date & time it was compiled.

Michael

-- 
Michael L. Hitch			INTERNET:  osymh@montana.edu
Computer Consultant			BITNET:  OSYMH@MTSUNIX1.BITNET
Office of Systems and Computing Services
Montana State University	Bozeman, MT	USA
