port-mac68k: ...and still more annoying SCSI errors

Subject: ...and still more annoying SCSI errors
To: None <port-mac68k@NetBSD.ORG>
From: Dennis Eberl <drw@adelphia.net>
List: port-mac68k
Date: 01/20/1998 15:50:50

Hi everyone,

I'm new to Unix and have been lurking as I brought up a small Ethernet LAN
consisting of a 486 running Caldera's flavor of Linux and a Mac Quadra 800
running v1.3 NetBSD. The problem I will describe occurred in exactly the
same form when running v1.2 NetBSD. (I did a clean install of v1.3.)
Specifically, I am using a Mac Quadra 800 with 1GB HD, 40 MB RAM, the stock
CD-ROM drive, and a Zip drive.

In a nutshell, I am getting SCSI error messages that always result in a
SCSI bus reset. While all of this is happening (typically several minutes)
the console is frozen. The problem seems benign enough in that I have found
no file damage and, indeed, in the middle of editing a file with vi the
SCSI monster did its little dance and then politely returned me safe and
sound to where I left off in vi.

Some context might be helpful. This is the chunk of boot messages I get
regarding what NetBSD finds on my SCSI bus at boot time. I have a small Mac
OS partition (~100 MB) on the DEC DSP3107LS 1GB HD, with the rest split
into an 80 MB NetBSD Swap partition and a (properly formatted) NetBSD
Root & Usr partition (I think that's what it's called, if memory
serves...).

   ...
   esp0 at obio0 (quick): address 0x897000 : NCR53C96, 16MHz, SCSI ID 7
   scsibus0 at esp0: 8 targets
   sd0 at scsibus0 targ0 lun0: <DEC, DSP3107LS, 440C> SCSI2 0/direct fixed
   sd0: 1021MB, 3117 cyl, 8 head, 83 sec, 512 bytes/sect x 2091144 sectors
   cd0 at scsibus0 targ3 lun0: <SONY, CD-ROM CDU-8003A, 1.9a> SCSI2 5/cdrom removable
   sd1 at scsibus0 targ5 lun0: <IOMEGA, ZIP 100, D.08> SCSI2 0/direct removable
   sd1: 96MB, 96 cyl, 64 head, 32 sec, 512 bytes/sect x 196608 sectors
   ...

All this seems normal to me, but I include it in case it contains some
information you will need (e.g., NCR controller chip number, HD drive
manufacturer/model, etc.).
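
As a quick sanity check on the sizes in those boot lines, here is a minimal
Python sketch. The function names are made up for illustration and are not
anything from NetBSD; the numbers are copied from the sd0 and sd1 lines
above. It assumes 1 MB = 1024 * 1024 bytes.

```python
SECTOR_BYTES = 512  # bytes per sector, as reported for both drives

# Values copied from the sd0 and sd1 boot lines.
drives = {
    "sd0": {"cyl": 3117, "head": 8, "sec": 83, "sectors": 2091144},
    "sd1": {"cyl": 96, "head": 64, "sec": 32, "sectors": 196608},
}


def capacity_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in whole MB (1 MB = 1024 * 1024 bytes), rounded down."""
    return sectors * sector_bytes // (1024 * 1024)


def geometry_sectors(cyl, head, sec):
    """Sector count implied by the reported cylinder/head/sector geometry."""
    return cyl * head * sec


for name, d in drives.items():
    print(
        name,
        capacity_mb(d["sectors"]), "MB from the sector count;",
        geometry_sectors(d["cyl"], d["head"], d["sec"]),
        "sectors implied by the geometry",
    )
```

The sector counts reproduce the 1021MB and 96MB figures. For sd1 the
geometry also multiplies out to exactly 196608 sectors. For sd0 it comes out
a little below the reported total, which I assume just means the reported
geometry is nominal rather than exact.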

The disruption always begins with the following:

   # dmaintr: discarded 32 b (last transfer was 7664 b).
   esp0: !TC [intr 10, stat 83, step 4] prevphase 0, resid 1df0

After a short time it is followed by a sequence of messages culminating in
a reset of the SCSI bus as shown below.

   sd0(esp0:0:0): esp0: timed out [ecb 0x6b9209e (flags 0x3, dleft 1df0, stat0)],
       <stat 4, nexus 0xcb9209e, phase(c 3, p 3 ), resid 0, msg(q 0, o 0) >
   sd0(esp0:0:0): esp0: timed out [ecb 0x6b9209e (flags 0x43, dleft 1df0, stat0)],
       <stat 4, nexus 0xcb9209e, phase(c 3, p 3 ), resid 0, msg(q 20, o 0) > AGAIN
   esp0: SCSI bus reset
   Dec 29 14:45:08 /netbsd: dmaintr: discarded 32 b (last transfer was 7664 b).
   Dec 29 14:47:17 /netbsd: esp0: !TC [intr 10, stat 83, step 4] prevphase 0,
       resid 1df0
   Dec 29 14:47:17 /netbsd: sd0(esp0:0:0): esp0: timed out [ecb 0x6b9209e
       (flags 0x3, dleft 1df0, stat0)],
       <stat 4, nexus 0xcb9209e, phase(c 3, p 3 ), resid 0, msg(q 0, o 0) >
   Dec 29 14:47:17 /netbsd: sd0(esp0:0:0): esp0: timed out [ecb 0x6b9209e
       (flags 0x43, dleft 1df0, stat0)],
       <stat 4, nexus 0xcb9209e, phase(c 3, p 3 ), resid 0, msg(q 20, o 0) > AGAIN
   Dec 29 14:47:17 /netbsd: esp0: SCSI bus reset
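
Two things can be read straight out of those messages with a little
arithmetic. The minimal Python sketch below uses only values copied from the
log. Treating "resid" and "dleft" as hexadecimal byte counts is my
assumption, based on their being printed without a 0x prefix.

```python
from datetime import datetime

# Fields copied from the messages above; assumed to be hex byte counts.
resid = int("1df0", 16)
dleft = int("1df0", 16)
last_transfer = 7664  # decimal, from the dmaintr line

print("resid =", resid, "dleft =", dleft,
      "equal to last transfer:", resid == last_transfer)

# Time-of-day stamps of the first syslog message and of the bus reset.
fmt = "%H:%M:%S"
start = datetime.strptime("14:45:08", fmt)
end = datetime.strptime("14:47:17", fmt)
gap_seconds = int((end - start).total_seconds())

print("seconds between first message and bus reset:", gap_seconds)
```

Both resid and dleft decode to 7664, the same figure the dmaintr line gives
for the last transfer, and the gap between the first message and the reset
is a bit over two minutes, which matches how long the console stays frozen.
Whether the matching byte counts mean the transfer never got started is only
a guess on my part.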

Since the Quadra 800 is an '040 machine I am using the appropriate kernel.
(I tried the kernel designed for '030 machines with SCSI problems and found
no difference -- i.e., same problem.) Note, however, that the problem DOES
NOT occur when I load OpenBSD on the Quadra. (Unfortunately, hangman
doesn't run properly under OpenBSD -- perhaps it's afraid of Theo? -- which,
among other things, brought me back to NetBSD and your congenial little
group.)

Gee, I'm having a hell of a lot of fun, but this one I just don't know how
to solve. Help?

Gratefully,

Dennis Eberl

If anyone can explain this to me and offer a fix, I would be most grateful.
Frankly, I am having a ball learning Unix.