Subject: Panic on scsi retry
To: None <port-sparc@sun-lamp.cs.berkeley.edu>
From: Rolf Grossmann <grossman@informatik.tu-muenchen.de>
List: port-sparc
Date: 08/02/1994 20:33:21
Hello,

I've hit a problem that I don't understand. First, here are
the kernel messages:

sd0: sdintr scsi status 0x2 resid 0
sd0: scsi sense class 7, code 0, key 1, blk 1079344
sd0: retry 1
panic: espintr sq

The panic happens every time I run grep <something> /usr/src/sys/kern/*,
i.e. whenever I read every file in that directory. It seems to be caused by
a marginal block on the disk that can't be read on the first try (the disk
is quite old, so that's likely).
But I don't understand why the system panics whenever this happens (I can
reproduce it!). I've had a look at the code, and it looks to me like
something goes wrong with the interrupt handling.

Could anybody clear this up for me or help me investigate the problem?

I'm using NetBSD-current/sparc 1.0_BETA. I've completely recompiled the
kernel (after a make clean).

Thanks in advance,
	Rolf

P.S.: Here is what I think is going on:
In scsi/sd.c:sdintr() the SCSI driver detects that there was an error with
the last transfer, sees that it should be retried and restarts the transfer,
i.e. it calls (via a table lookup) sbus/esp.c:espstart(), which enqueues the
transfer (sc->sc_hba.hba_busy must still be 1 at that point). Next, sdintr()
returns and the enqueued transfer is started. A moment later the disk is done
and espintr() is called. It has to decide whether the interrupt is handled
here and what to do about it, so it calls espact() to find out (and possibly
to handle the interrupt itself). espact() must be returning ACT_DONE, because
otherwise there would be another message before the panic. Then
sc->sc_hba.hba_busy is checked. It apparently is not set, so the machine
panics. But why is that? Any hints?
