Subject: Re: -current on Ultra 5+ - now it's major siop0 lossage
To: Greg Earle <earle@isolar.DynDNS.ORG>
From: Jim Bernard <jbernard@mines.edu>
List: current-users
Date: 01/27/2001 13:51:23

On Sat, Jan 27, 2001 at 04:56:15AM -0800, Greg Earle wrote:
> I wrote:
> > During the /etc/daily run, with lots of "find"-ing going on, the same
> > DMA errors came back.  Back to the drawing board ... will try disabling
> > tagged queuing next.
> 
> (Last message everyone, promise)
> 
> Built a new kernel with the previous siop changes and tagged queuing disabled.
> While trying to make a whole new kernel (changed config file), it happened
> again:
> 
> DMA IRQ: bus fault dma fifo empty, DSP=0x3b0 DSA=0xc006c7e0: last msg_in=0x0 status=0xff
> siop0: scsi bus reset
> cmd 0x1a3e340 (target 0:0) in reset list
> cmd 0x1a3e340 (status 2) about to be processed
> siop0: target 0 not synchronous at 20.0Mhz, offset 16
> DMA IRQ: bus fault, DSP=0x3e0 DSA=0xc006c8dc: last msg_in=0x4 status=0xff
> 
> It then hung here for some time, finally followed by a
> 
> sd0(siop0:0:0): command timeout
> 
> And now it's hung completely.  I'm going back to my stable 1.5-based kernel ...
> 
> 	- Greg


  I saw something similar on an i386 using siop a couple of weeks ago (this was
the first time I had tried the siop driver).  I ran the system with it for a
couple of days and saw several instances of this problem in the logs.  On one
occasion I was working at the console when it happened; the console froze, but
it did eventually recover.  I haven't had time to investigate, so I just switched
back to the ncr driver.

Here are typical log messages from one such event:

Jan 16 12:28:39 zoo /netbsd: sd0(siop0:0:0): command timeout
Jan 16 12:28:39 zoo /netbsd: siop0: scsi bus reset
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04258c0 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425a80 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04259c0 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425900 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04258c0 (status 2) about to be processed
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425a80 (status 2) about to be processed
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04259c0 (status 2) about to be processed
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425900 (status 2) about to be processed
Jan 16 12:28:39 zoo /netbsd: siop0: target 0 now synchronous at 10.0Mhz, offset 8

This was a 1.5Q kernel built from sources updated Jan. 13.
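
For anyone who wants to count how often this shows up in their own logs, here
is a rough sketch of a summarizer.  The regular expressions are based only on
the message wording in the excerpt above; the function name summarize() and
the dictionary keys are made up for illustration, and other siop versions may
word their messages differently.

```python
import re
from collections import Counter

# Sample lines copied from the Jan 16 event above (one event, shortened).
LOG = """\
Jan 16 12:28:39 zoo /netbsd: sd0(siop0:0:0): command timeout
Jan 16 12:28:39 zoo /netbsd: siop0: scsi bus reset
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04258c0 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425a80 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc04259c0 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: cmd 0xc0425900 (target 0:0) in reset list
Jan 16 12:28:39 zoo /netbsd: siop0: target 0 now synchronous at 10.0Mhz, offset 8
"""

# Patterns matched against the message text as it appears in the excerpt.
TIMEOUT = re.compile(r"\w+\(siop\d+:\d+:\d+\): command timeout")
RESET = re.compile(r"siop\d+: scsi bus reset")
IN_RESET_LIST = re.compile(r"cmd 0x[0-9a-f]+ \(target \d+:\d+\) in reset list")


def summarize(text):
    """Count timeouts, bus resets, and commands caught by a reset."""
    counts = Counter()
    for line in text.splitlines():
        if TIMEOUT.search(line):
            counts["timeouts"] += 1
        elif RESET.search(line):
            counts["bus_resets"] += 1
        elif IN_RESET_LIST.search(line):
            counts["cmds_in_reset_list"] += 1
    return counts


if __name__ == "__main__":
    for key, value in sorted(summarize(LOG).items()):
        print(key, value)
```

In the event above, four commands for target 0:0 were outstanding when the bus
was reset, which is consistent with tagged queuing being in use on that target.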

Possibly relevant boot messages with that kernel:

Jan 13 11:41:26 zoo /netbsd: NetBSD 1.5Q (ZOO-$Revision: 1.57 $) #0: Sat Jan 13 11:26:38 MST 2001
Jan 13 11:41:26 zoo /netbsd:     jim@zoo:/home/tmp/compile/sys/arch/i386/compile/ZOO
Jan 13 11:41:26 zoo /netbsd: cpu0: Intel Pentium (P54C) (586-class), 99.48 MHz
Jan 13 11:41:26 zoo /netbsd: cpu0: features 1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
...
Jan 13 11:41:27 zoo /netbsd: siop0 at pci0 dev 11 function 0: Symbios Logic 53c810 (fast scsi)
Jan 13 11:41:27 zoo /netbsd: siop0: interrupting at irq 10
Jan 13 11:41:27 zoo /netbsd: scsibus0 at siop0: 8 targets, 8 luns per target
...
Jan 13 11:41:28 zoo /netbsd: scsibus0: waiting 2 seconds for devices to settle...
Jan 13 11:41:28 zoo /netbsd: siop0: target 0 using tagged queuing
Jan 13 11:41:28 zoo /netbsd: sd0 at scsibus0 target 0 lun 0: <Quantum, XP32150, 81HB> SCSI2 0/direct fixed
Jan 13 11:41:29 zoo /netbsd: siop0: target 0 now synchronous at 10.0Mhz, offset 8
Jan 13 11:41:29 zoo /netbsd: sd0: 2050 MB, 3907 cyl, 10 head, 107 sec, 512 bytes/sect x 4199760 sectors
Jan 13 11:41:29 zoo /netbsd: siop0: target 1 using tagged queuing
Jan 13 11:41:29 zoo /netbsd: sd1 at scsibus0 target 1 lun 0: <iomega, jaz 1GB, J^77> SCSI2 0/direct removable
Jan 13 11:41:29 zoo /netbsd: siop0: target 1 now synchronous at 10.0Mhz, offset 8
Jan 13 11:41:29 zoo /netbsd: sd1: drive offline
Jan 13 11:41:29 zoo /netbsd: cd0 at scsibus0 target 2 lun 0: <TOSHIBA, CD-ROM XM-3601TA, 0725> SCSI2 5/cdrom removable
Jan 13 11:41:29 zoo /netbsd: siop0: target 2 asynchronous
Jan 13 11:41:29 zoo /netbsd: boot device: wd0


Possibly relevant boot messages with the ncr driver (and a more recent kernel):

NetBSD 1.5Q (ZOO-$Revision: 1.59 $) #0: Sat Jan 20 21:56:29 MST 2001
cpu0: Intel Pentium (P54C) (586-class), 99.48 MHz
cpu0: features 1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
...
ncr0 at pci0 dev 11 function 0: ncr 53c810 fast10 scsi
ncr0: interrupting at irq 10
scsibus0 at ncr0: 8 targets, 8 luns per target
...
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <Quantum, XP32150, 81HB> SCSI2 0/direct fixed
sd0(ncr0:0:0): 10.0 MB/s (100 ns, offset 8)
sd0: 2050 MB, 3907 cyl, 10 head, 107 sec, 512 bytes/sect x 4199760 sectors
sd1 at scsibus0 target 1 lun 0: <iomega, jaz 1GB, J^77> SCSI2 0/direct removable
sd1(ncr0:1:0): 10.0 MB/s (100 ns, offset 8)
sd1: drive offline
cd0 at scsibus0 target 2 lun 0: <TOSHIBA, CD-ROM XM-3601TA, 0725> SCSI2 5/cdrom removable
probe(ncr0:2:1): 4.0 MB/s (250 ns, offset 8)
boot device: wd0
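
The two drivers report the negotiated synchronous rate differently (siop as a
clock rate in MHz, ncr as MB/s with the period in ns), so here is a small
sketch of the arithmetic for comparing them.  It assumes a narrow (8-bit) bus,
i.e. one byte per transfer period; the function name is made up for
illustration.

```python
def sync_rate_mb_per_s(period_ns, bus_width_bytes=1):
    """Convert a synchronous transfer period in ns to MB/s.

    One transfer happens per period, so the transfer rate in MHz is
    1000 / period_ns.  On a narrow bus (assumed here) each transfer
    moves one byte, so MHz and MB/s are numerically equal.
    """
    return 1000.0 / period_ns * bus_width_bytes


if __name__ == "__main__":
    # Periods taken from the ncr boot messages above.
    for period in (100, 250):
        print(period, "ns ->", sync_rate_mb_per_s(period), "MB/s")
```

By this arithmetic, the 100 ns period that ncr reports for sd0 and sd1 is the
same rate as the "10.0Mhz, offset 8" that siop reports for those targets.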