Port-mips archive


Re: scsi timeouts on sgimips indy + zuluscsi



On Tue, Nov 18, 2025, 5:51 PM Adrian Chadd <adrian%freebsd.org@localhost> wrote:

> hi!
>
> I'm working on the error handling in the wdc driver when it's talking with
> this zuluscsi SD SCSI emulator thing. (In parallel I'm trying to figure out
> if I can configure the zuluscsi to hang less.)
>
> Anyway, I'm looking for someone who has any idea about ye olde scsi
> chipsets and the right way to handle this stuff.
>
> Here's an example of it hanging and not resetting:
>
> ...
>
>      Status: Command failed
>     Command: /sbin/mount -rt cd9660 /dev/cd0a /mnt2
>      Hit enter to continue[ 177.9141033] sd0(wdsc0:0:1:0): wdsc0: timed out; asr=0x20 [acb 0x97ec4fa8 (flags 0x1, dleft 20)], <state 5, nexus 0x97ec4fa8, resid 20, msg(q 0,o 0)>sd0(wdsc0:0:1:0): ABORT in timeout: csr=0xff, asr=0x20
> [ 178.1254838] sd0(wdsc0:0:1:0): sending ABORT command
> [ 178.1840412] sd0(wdsc0:0:1:0): Resetting bus
> [ 180.1885505] sd0(wdsc0:0:1:0): wdsc0: timed out; asr=0x00 [acb 0x97ec4fa8 (flags 0x41, dleft 20)], <state 8, nexus 0x97ec4fa8, resid 20, msg(q 0,o 0)>sd0(wdsc0:0:1:0): ABORT in timeout: csr=0x01, asr=0x00
> [ 180.4009330] sd0(wdsc0:0:1:0): sending ABORT command
> [ 180.4594968] sd0(wdsc0:0:1:0): sending DISCONNECT to target
> [ 183.4353722] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 186.4119961] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 189.3885164] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 192.3650359] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 195.3415581] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 198.3180772] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 201.2945987] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 204.2711138] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 207.2476389] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 210.2241586] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 213.2006735] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 216.1770920] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 219.1536130] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 222.1302349] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
> [ 225.1068634] wd33c93_wait: TIMEO @959 with asr=x0 csr=x1
>
> ...
>
> Does anyone remember how the ye olde controller works enough to go through
> the driver and figure out what could be missing with the error handling /
> recovery?
>
> eg - I have one diff already to handle a NULL pointer inside the timeout
> routine:
>
> @@ -2298,7 +2304,7 @@ wd33c93_timeout(void *arg)
>                 /* We need to service a missed IRQ */
>                 wd33c93_intr(sc);
>         } else {
> -               (void) wd33c93_abort(sc, sc->sc_nexus, "timeout");
> +               (void) wd33c93_abort(sc, acb, "timeout");
>         }
>         splx(s);
>  }
>
> sc->sc_nexus is NULL after a disconnect, before the timeout fires, so that
> would panic. Is using acb there instead "correct" ?
>
> Thanks!
>

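For anyone following along, the NULL-nexus case in the quoted diff can be pictured with a tiny standalone sketch. Everything here is hypothetical: `demo_softc`, `demo_acb` and both functions are made-up stand-ins, not the wd33c93 driver's real structures, and the sketch assumes the timeout callback is handed the acb it was armed for. It only shows why dereferencing the nexus after a disconnect is unsafe; it does not settle whether passing acb is the right recovery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins; not the real driver structures. */
struct demo_acb {
	int tag;
};

struct demo_softc {
	struct demo_acb *nexus;	/* connected command; NULL while disconnected */
};

/* True only while the timed-out command is the one currently on the bus. */
static bool
demo_acb_is_connected(const struct demo_softc *sc, const struct demo_acb *acb)
{
	return sc->nexus != NULL && sc->nexus == acb;
}

/*
 * Choose what to hand to the abort path. Assumption: the timeout was armed
 * for a specific acb, so that pointer is usable even when sc->nexus is NULL.
 */
static struct demo_acb *
demo_abort_target(const struct demo_softc *sc, struct demo_acb *timed_out)
{
	(void)sc;	/* deliberately not consulted: nexus may be NULL */
	return timed_out;
}
```

A caller could use `demo_acb_is_connected()` to decide whether an on-bus abort makes sense or whether the command only needs to be failed back to the upper layer.
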
What's the CDB of the failing command? Is it the first one sent, or does
this happen at random? Go ahead with the eye roll on this one: are both
ends of the bus properly terminated? And is it the right kind (I remember
hassles from active vs passive)? A timeout on mount always has me going
through all the basics, since they aren't as top of mind as they were 30
years ago.
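
If it helps with capturing the CDB, here is a minimal standalone sketch for dumping one in hex. The helper names `cdb_length` and `cdb_format` are made up for illustration and are not part of the driver; the length lookup uses the standard SCSI group code in the top three bits of the opcode and returns 0 for reserved and vendor-specific groups.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * CDB length implied by the opcode's group code (top three bits).
 * Returns 0 for reserved or vendor-specific groups.
 */
static size_t
cdb_length(unsigned char opcode)
{
	switch (opcode >> 5) {
	case 0:
		return 6;
	case 1:
	case 2:
		return 10;
	case 4:
		return 16;
	case 5:
		return 12;
	default:
		return 0;
	}
}

/* Format len CDB bytes as space-separated hex into buf; returns buf. */
static char *
cdb_format(const unsigned char *cdb, size_t len, char *buf, size_t buflen)
{
	size_t off = 0;

	if (buflen == 0)
		return buf;
	buf[0] = '\0';
	/* Each byte needs at most 3 characters plus the terminating NUL. */
	for (size_t i = 0; i < len && off + 3 < buflen; i++) {
		off += (size_t)snprintf(buf + off, buflen - off, "%s%02x",
		    i ? " " : "", cdb[i]);
	}
	return buf;
}
```

In a kernel the same loop would go through the kernel's own printf rather than `snprintf` into a buffer; the point is just to get the opcode and the full byte string into the timeout message.
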

Warner


