Subject: kern/2231: ncr53c810 pci scsi driver hangs system frequently
To: None <gnats-bugs@NetBSD.ORG>
From: James E. Bernard <jbernard@geek.mines.edu>
List: netbsd-bugs
Date: 03/17/1996 12:07:51
>Number:         2231
>Category:       kern
>Synopsis:       ncr53c810 pci scsi driver gets into a failure loop and hangs system
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 17 14:20:01 1996
>Last-Modified:
>Originator:     Jim Bernard
>Organization:
	Speaking for myself
>Release:        1.1
>Environment:
System: NetBSD zoo 1.1 NetBSD 1.1 (ZOO) #0: Sun Dec 31 21:06:09 MST 1995 local@zoo:/home/local/netbsd-1.1/usr/src/sys/arch/i386/compile/ZOO i386
The cpu is a 100 MHz Pentium.
SCSI devices include: Quantum Atlas XP32150 and Toshiba XM-3601 CD-ROM drive:
/netbsd: ncr0 targ 0 lun 0: <Quantum, XP32150, 81HB> SCSI2 0/direct fixed
/netbsd: sd0 at scsibus0sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
/netbsd: : 2050MB, 3907 cyl, 10 head, 107 sec, 512 bytes/sec
/netbsd: ncr0 targ 2 lun 0: <TOSHIBA, CD-ROM XM-3601TA, 0725> SCSI2 5/cdrom removable
/netbsd: cd0 at scsibus0cd0(ncr0:2:0): asynchronous.
Root, swap, /usr, and /var are on the Atlas.  User files are nfs mounted from
another machine.  An ide disk is present and mounted, but not normally
accessed.


>Description:
	At irregular intervals (anywhere from about half an hour to about a
	week apart) the scsi driver goes into a loop, with the following
	error messages printed on the console:

	  assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 5577
	  sd0(ncr0:0:0): COMMAND FAILED (4 28) @f87d2800.

	These messages repeat indefinitely.  In this state, the disk
	(controller) activity light is on continuously, and no action involving
	the disk can be taken until the system is restarted.  Note that it is
	not clear which of the messages above comes first, since I've never
	been present and watching the console (i.e., not running X) when the
	problem starts.  The messages are also difficult to read, since they
	overwrite each other so quickly, but I believe I have transcribed them
	correctly.  (Nothing, of course, is written to the console log on disk.)

	I have not been able to associate this problem with any particular
	system activity.  Indeed, it has never happened while I was at the
	machine, and the machine is usually fairly inactive (except for uucp
	and POP mail transfers) when I am away from it.  I have copied as much
	as 1.5 GB of data to the disk in a fairly short time, with no problems,
	so heavy disk activity alone does not seem to trigger it.

	Perusal of the ncr code turned up the following: the two numbers (4 28)
	are the host status and the scsi status, respectively.  The former
	corresponds to the cpp symbol HS_COMPLETE and the latter to
	S_QUEUE_FULL (28 hex is the SCSI-2 QUEUE FULL status code).
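
	A minimal sketch of that decoding, for illustration only.  It assumes
	the driver prints both status bytes in hex, and that HS_COMPLETE and
	S_QUEUE_FULL have the values 4 and 0x28 implied by the console message;
	both should be checked against ncr.c.  The helpers parse_status() and
	describe() are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, inferred from the "(4 28)" console message and the
 * symbol names found in ncr.c; verify against the driver source. */
#define HS_COMPLETE   0x04  /* host status: command completed */
#define S_QUEUE_FULL  0x28  /* SCSI status byte: QUEUE FULL */

/* Hypothetical helper: extract the two hex fields from a
 * "COMMAND FAILED (hs ss)" console line.  Returns 0 on success,
 * -1 if the line does not contain the expected pattern. */
static int parse_status(const char *msg, unsigned *hs, unsigned *ss)
{
    static const char tag[] = "COMMAND FAILED (";
    const char *p;

    assert(msg != NULL && hs != NULL && ss != NULL);
    p = strstr(msg, tag);
    if (p == NULL)
        return -1;
    p += strlen(tag);
    return sscanf(p, "%x %x", hs, ss) == 2 ? 0 : -1;
}

/* Hypothetical helper: describe the status pair in words. */
static const char *describe(unsigned hs, unsigned ss)
{
    if (hs == HS_COMPLETE && ss == S_QUEUE_FULL)
        return "command completed, target reported QUEUE FULL";
    if (hs == HS_COMPLETE)
        return "command completed with a different SCSI status";
    return "host status other than HS_COMPLETE";
}
```

	Passing the console line quoted above to parse_status() and the result
	to describe() gives the reading suggested here: the command itself
	completed, but the target reported that its command queue was full.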
>How-To-Repeat:
	Prescription unknown; all I have to do is wait.
>Fix:
	Unknown.

>Audit-Trail:
>Unformatted: