Subject: kern/4460: Some SCSI devices will not work (sleep endlessly)
To: None <gnats-bugs@gnats.netbsd.org>
From: Hiroshi HORIMOTO <horimoto@cs-yuugao.cs.sist.ac.jp>
List: netbsd-bugs
Date: 11/11/1997 01:05:36
>Number:         4460
>Category:       kern
>Synopsis:       Some SCSI devices will not work (sleep endlessly)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 10 08:20:04 1997
>Last-Modified:
>Originator:     Hiroshi HORIMOTO
>Organization:
Shizuoka Institute of Science and Technology, Japan.
>Release:        NetBSD-current Nov. 10, 1997
>Environment:
  X68030 with Quantum XP32150, Quantum Lightning 730S, SONY CDU-76S
System:
  NetBSD silpheed NetBSD 1.3_ALPHA (SILPHEED) #57: Fri Nov 7 22:17:56 JST 1997
  root@silpheed:/usr/src/sys/arch/x68k/compile/SILPHEED x68k


>Description:
In /sys/dev/scsipi/scsipi_base.c, lines 382-394:

382:	switch (scsipi_command_direct(xs)) {
383:	case SUCCESSFULLY_QUEUED:
384:		if ((xs->flags & (SCSI_NOSLEEP | SCSI_POLL)) == SCSI_NOSLEEP)
385:			return (EJUSTRETURN);
386:#ifdef DIAGNOSTIC
387:		if (xs->flags & SCSI_NOSLEEP)
388:			panic("scsipi_execute_xs: NOSLEEP and POLL");
389:#endif
390:		s = splbio();
391:		while ((xs->flags & ITSDONE) == 0)
392:			tsleep(xs, PRIBIO + 1, "scsipi_cmd", 0);
393:		splx(s);
394:	case COMPLETE:		/* Polling command completed ok */

Although the ITSDONE flag is set by the interrupt handler, `xs->flags' is not
declared volatile.
Therefore, when compiling with -fforce-mem (which -O2 enables), `xs->flags'
is loaded into a register at line 384, and that cached value is then reused
for the first test of the while loop (line 391).

This is the sequence of events that leads to the endless sleep:

1. scsipi_command_direct(xs) is called (line 382).
2. It returns SUCCESSFULLY_QUEUED, and the device starts the transfer.
3. xs->flags is loaded into a register and tested (line 384).
4. BEFORE splbio() is called, the device finishes and an interrupt is dispatched.
5. The interrupt handler sets ITSDONE in memory (but the cached copy is not updated!).
6. scsipi_done() calls wakeup(xs) from the interrupt; nobody is sleeping yet.
7. splbio() is called and the loop tests the stale cached xs->flags (lines 390-391).
8. The first check fails, so tsleep(xs, ...) is called (line 392).
9. Nobody ever wakes up that tsleep...
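To make the race concrete, the nine steps can be replayed deterministically in a small user-space model. Everything here is hypothetical: `struct model_xfer', `model_interrupt', `model_hangs' and the ITSDONE value are invented for the sketch, and the `sleepers'/`woken' fields only mimic the fact that wakeup() has no memory, so a wakeup issued before anyone sleeps is lost.

```c
#include <assert.h>
#include <stdbool.h>

#define ITSDONE 0x01	/* hypothetical value, for this model only */

struct model_xfer {
	int flags;	/* stands in for xs->flags in memory */
	int sleepers;	/* how many callers are in tsleep(xs) */
	bool woken;	/* whether a wakeup reached a sleeper */
};

/* Steps 5-6: set ITSDONE, then wakeup(xs). With no sleeper the
 * wakeup is simply lost, as with tsleep()/wakeup(). */
static void model_interrupt(struct model_xfer *xs)
{
	xs->flags |= ITSDONE;
	if (xs->sleepers > 0)
		xs->woken = true;
}

/* Returns true if the caller would sleep and never be woken.
 * use_cached_flags selects the stale-register path of step 7. */
static bool model_hangs(bool use_cached_flags)
{
	struct model_xfer xs = { 0, 0, false };
	int cached = xs.flags;		/* step 3: loaded at line 384 */

	model_interrupt(&xs);		/* steps 4-6: before splbio() */

	int seen = use_cached_flags ? cached : xs.flags;	/* step 7 */
	if ((seen & ITSDONE) == 0) {	/* step 8: decide to sleep */
		xs.sleepers++;
		return !xs.woken;	/* step 9: no later wakeup comes */
	}
	return false;			/* loop exits without sleeping */
}
```

model_hangs(true) follows the stale-register path and reports a hang; model_hangs(false) rereads the flag, as a volatile access would, and never sleeps.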

>How-To-Repeat:
Depends on the devices, the host machine, and the code generated by the compiler.

>Fix:
There are three possible fixes:

1) Compile /sys/dev/scsipi/scsipi_base.c with `-fno-force-mem'. :-)

2) Declare scsipi_xfer.flags `volatile' (scsipiconf.h line 223).

3) Make the read of xs->flags in that loop a `volatile' access (patch below).

--- scsipi_base.c.orig	Sun Oct 19 09:24:35 1997
+++ scsipi_base.c	Mon Nov 10 11:09:20 1997
@@ -388,7 +388,7 @@
 			panic("scsipi_execute_xs: NOSLEEP and POLL");
 #endif
 		s = splbio();
-		while ((xs->flags & ITSDONE) == 0)
+		while ((*(__volatile int *)(&(xs->flags)) & ITSDONE) == 0)
 			tsleep(xs, PRIBIO + 1, "scsipi_cmd", 0);
 		splx(s);
 	case COMPLETE:		/* Polling command completed ok */
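Fixes 2 and 3 differ only in where the `volatile' qualifier is applied. A minimal user-space sketch, assuming a stand-in structure and an arbitrary ITSDONE value (`struct xfer_fix2', `read_flags_volatile', `fix2_done' and `fix3_done' are hypothetical names, not NetBSD code):

```c
#include <assert.h>
#include <stddef.h>

#define ITSDONE 0x01	/* hypothetical value for this sketch */

/* Fix 2 (sketch): qualify the member itself, so every access to
 * flags is a volatile access. Hypothetical stand-in for
 * struct scsipi_xfer. */
struct xfer_fix2 {
	volatile int flags;
};

/* Fix 3 (sketch): leave the declaration alone and make only the
 * polling read volatile, the same cast the patch above uses. */
static int read_flags_volatile(const int *flagsp)
{
	return *(const volatile int *)flagsp;
}

static int fix2_done(const struct xfer_fix2 *xs)
{
	assert(xs != NULL);
	return (xs->flags & ITSDONE) != 0;
}

static int fix3_done(const int *flagsp)
{
	assert(flagsp != NULL);
	return (read_flags_volatile(flagsp) & ITSDONE) != 0;
}
```

Fix 2 makes every access to the member volatile, including ones where a cached copy would do no harm, so it may cost some optimization elsewhere. Fix 3 confines the volatile access to the one polling read that races with the interrupt.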

>Audit-Trail:
>Unformatted: