Subject: kern/4460: Some SCSI devices will not work (sleep endlessly)
To: None <gnats-bugs@gnats.netbsd.org>
From: Hiroshi HORIMOTO <horimoto@cs-yuugao.cs.sist.ac.jp>
List: netbsd-bugs
Date: 11/11/1997 01:05:36
>Number: 4460
>Category: kern
>Synopsis: Some SCSI devices will not work (sleep endlessly)
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Nov 10 08:20:04 1997
>Last-Modified:
>Originator: Hiroshi HORIMOTO
>Organization:
Shizuoka Institute of Science and Technology, Japan.
>Release: NetBSD-current Nov. 10, 1997
>Environment:
X68030 with Quantum XP32150, Quantum Lightning 730S, SONY CDU-76S
System:
NetBSD silpheed NetBSD 1.3_ALPHA (SILPHEED) #57: Fri Nov 7 22:17:56 JST 1997
root@silpheed:/usr/src/sys/arch/x68k/compile/SILPHEED x68k
>Description:
In /sys/dev/scsipi/scsipi_base.c, lines 382-394:
382:	switch (scsipi_command_direct(xs)) {
383:	case SUCCESSFULLY_QUEUED:
384:		if ((xs->flags & (SCSI_NOSLEEP | SCSI_POLL)) == SCSI_NOSLEEP)
385:			return (EJUSTRETURN);
386:#ifdef DIAGNOSTIC
387:		if (xs->flags & SCSI_NOSLEEP)
388:			panic("scsipi_execute_xs: NOSLEEP and POLL");
389:#endif
390:		s = splbio();
391:		while ((xs->flags & ITSDONE) == 0)
392:			tsleep(xs, PRIBIO + 1, "scsipi_cmd", 0);
393:		splx(s);
394:	case COMPLETE: /* Polling command completed ok */
The `ITSDONE' flag is set in the interrupt handler, but `xs->flags' is not
declared volatile.
Therefore, when compiling with -fforce-mem (which -O2 enables), `xs->flags'
is cached in a register at line 384, and the cached value is reused for the
first test of the while loop (line 391).
The sequence of events leading to the endless sleep is:
1. scsipi_command_direct(xs) is called. (line 382)
2. It returns SUCCESSFULLY_QUEUED, and the device access has started.
3. xs->flags is loaded into a register (CACHED) and tested. (line 384)
4. BEFORE splbio(), the device access completes and an interrupt is dispatched.
5. The interrupt handler sets ITSDONE in memory, but the cached copy is unchanged!
6. scsipi_done() calls wakeup(xs) from the interrupt.
7. splbio() is called and xs->flags is tested using the cached copy. (lines 390-391)
8. The first check does not see ITSDONE, so tsleep(xs, ...) is called. (line 392)
9. Nothing ever wakes up the `tsleep'...
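
The sequence above can be modelled in user space. The following is only a
rough sketch, not kernel code: `struct fake_xfer', fake_intr() and the value
of ITSDONE are placeholders made up for illustration, and the "interrupt" is
an ordinary function call placed where the real one would arrive. The first
function reuses a copy of the flags taken before the interrupt, as the
optimized code is believed to do. The second reloads the flags from memory
before testing them.

```c
#include <assert.h>

#define ITSDONE 0x01	/* placeholder bit value, not the kernel's definition */

/* Hypothetical stand-in for struct scsipi_xfer; only the flags word matters. */
struct fake_xfer {
	int flags;
};

/* Stand-in for the interrupt path (steps 4-6): mark the transfer done. */
void
fake_intr(struct fake_xfer *xs)
{
	xs->flags |= ITSDONE;
}

/* Models the optimized code: one load, reused after the interrupt. */
int
sleeps_with_cached_flags(struct fake_xfer *xs)
{
	int cached;

	assert(xs != 0);
	cached = xs->flags;		/* step 3: value held in a "register" */
	fake_intr(xs);			/* steps 4-6: interrupt arrives here */
	return (cached & ITSDONE) == 0;	/* steps 7-8: stale test, would sleep */
}

/* Models the fixed code: the loop test reloads flags from memory. */
int
sleeps_with_fresh_flags(struct fake_xfer *xs)
{
	int cached;

	assert(xs != 0);
	cached = xs->flags;		/* step 3 still happens */
	(void)cached;
	fake_intr(xs);			/* steps 4-6 */
	return (*(volatile int *)&xs->flags & ITSDONE) == 0;
}
```

With a zeroed `struct fake_xfer', the first function reports that the caller
would go to sleep even though ITSDONE is already set in memory, while the
second one does not.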
>How-To-Repeat:
Depends on the devices, the host machine, and the code generated by the compiler.
>Fix:
There are three choices:
1) Compile with `-fno-force-mem' for /sys/dev/scsipi/scsipi_base.c. :-)
2) Declare scsipi_xfer.flags to be `volatile'. (scsipiconf.h line 223)
3) Do a `volatile' access at that point. (apply the patch below)
--- scsipi_base.c.orig Sun Oct 19 09:24:35 1997
+++ scsipi_base.c Mon Nov 10 11:09:20 1997
@@ -388,7 +388,7 @@
 			panic("scsipi_execute_xs: NOSLEEP and POLL");
 #endif
 		s = splbio();
-		while ((xs->flags & ITSDONE) == 0)
+		while ((*(__volatile int *)(&(xs->flags)) & ITSDONE) == 0)
 			tsleep(xs, PRIBIO + 1, "scsipi_cmd", 0);
 		splx(s);
 	case COMPLETE: /* Polling command completed ok */
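
For comparison, choice 2 might look roughly like the sketch below. It is a
user-space illustration only: `struct fake_xfer_v', fake_done() and
fake_is_done() are hypothetical names, and the ITSDONE value is a placeholder.
The real structure in scsipiconf.h has many more members. With the member
itself declared volatile, every test of the flags is a fresh load, so no cast
is needed at the point of use.

```c
#include <assert.h>

#define ITSDONE 0x01	/* placeholder bit value, not the kernel's definition */

/* Hypothetical stand-in for struct scsipi_xfer with a volatile flags word. */
struct fake_xfer_v {
	volatile int flags;
};

/* Stand-in for what the interrupt path does when the command completes. */
void
fake_done(struct fake_xfer_v *xs)
{
	assert(xs != 0);
	xs->flags |= ITSDONE;
}

/* Each call reads flags from memory because the member is volatile. */
int
fake_is_done(struct fake_xfer_v *xs)
{
	assert(xs != 0);
	return (xs->flags & ITSDONE) != 0;
}
```

The trade-off is that every access to the member becomes a memory access, not
only the one in the sleep loop, which is presumably why the patch above
limits the volatile access to a single expression.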
>Audit-Trail:
>Unformatted: