Subject: port-macppc/22314: Adaptec SCSI cards do not work under load on macppc
To: None <gnats-bugs@gnats.netbsd.org>
From: None <john@sixgirls.org>
List: netbsd-bugs
Date: 07/30/2003 23:51:36
>Number:         22314
>Category:       port-macppc
>Synopsis:       macppc systems crash when heavily using any Adaptec SCSI card.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-macppc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 31 03:52:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     John Klos
>Release:        NetBSD 1.6.1_STABLE
>Organization:
Sixgirls Computing Labs
	
>Environment:
	
	
System: NetBSD andromeda.ziaspace.com 1.6.1_STABLE NetBSD 1.6.1_STABLE (ANDROMEDA-$Revision: 1.625 $) #1: Fri Jul 11 11:27:07 EDT 2003     root@andromeda.sixgirls.org:/usr/src/sys/arch/macppc/compile/ANDROMEDA macppc
Architecture: powerpc
Machine: macppc
>Description:
	
When a macppc system uses any of several Adaptec SCSI cards
(2930U, 2940U, 2940UW, 2940U2W), it crashes under heavy SCSI traffic.
Console excerpts follow, separated by dashed lines.

SCSIRATE == 0x0
sd0(ahc0:0:0:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 16 SCBs aborted
sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing

andromeda: {17} sd0(ahc0:0:0:0): SCB 13 - timed out in Command phase, SEQADDR == 0x166
 SCSIRATE == 0x93
 sd0(ahc0:0:0:0): BDR message in message buffer
 ahc0:A:0: unknown scsi bus phase b6.  Attempting to continue
 sd0(ahc0:0:0:0): SCB 11 - timed out while idle, SEQADDR == 0x42
 SCSIRATE == 0x0
 sd0(ahc0:0:0:0): no longer in timeout, status = 0
 sd0: async, 8-bit transfers, tagged queueing
 ahc0: Issued Channel A Bus Reset. 16 SCBs aborted
 sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 sd0(ahc0:0:0:0): SCB f - timed out while idle, SEQADDR == 0x155
 SCSIRATE == 0x0
 sd0(ahc0:0:0:0): SCB f: Immediate reset.  Flags = 0x4040
 sd0(ahc0:0:0:0): no longer in timeout, status = 0
 sd0: async, 8-bit transfers, tagged queueing
 ahc0: Issued Channel A Bus Reset. 16 SCBs aborted
 sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 sd0(ahc0:0:0:0): SCB 16 - timed out in Command phase, SEQADDR == 0x165
 SCSIRATE == 0x93
 sd0(ahc0:0:0:0): BDR message in message buffer
 ahc0:A:0: unknown scsi bus phase b6.  Attempting to continue
 sd0(ahc0:0:0:0): SCB 13 - timed out while idle, SEQADDR == 0x43
 SCSIRATE == 0x0
 panic: Disconnected List inconsistency. SCB index == 255, yet numscbs == 30.
 Begin traceback...
 0x0038df18: at ahc_search_disc_list+11c
 0x0038df58: at ahc_abort_scbs+1a4
 0x0038df98: at ahc_reset_channel+274
 0x0038dfe8: at ahc_timeout+290
 0x0038e018: at softclock+1ec
 0x0038e048: at hardclock+6c4
 0x0038e068: at decr_intr+104

 -------------------------------------

andromeda# sd1(ahc0:0:1:0): SCB 17 - timed out in Message-out phase, SEQADDR == 0x165
SCSIRATE == 0x93
sd1(ahc0:0:1:0): BDR message in message buffer

 -------------------------------------

sd2(ahc0:0:2:0): SCB 18 - timed out in Data-out phase, SEQADDR == 0x10e
SCSIRATE == 0xf
sd2(ahc0:0:2:0): BDR message in message buffer
sd2(ahc0:0:2:0): no longer in timeout, status = 0
sd2(ahc0:0:2:0): Unexpected busfree in Message-out phase
SEQADDR == 0x151
sd2(ahc0:0:2:0): generic HBA error
/usr/web: got error 5 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
Begin traceback...
0xdd964ed0: at softdep_deallocate_dependencies+40
0xdd964ee0: at brelse+10c
0xdd964ef0: at biodone+c4
0xdd964f00: at scsipi_complete+428
0xdd964f30: at scsipi_completion_thread+80
0xdd964f50: at fork_trampoline+10
End traceback...
syncing disks...

-------------------------------------

sd1(ahc0:0:1:0): SCB 3a - timed out while idle, SEQADDR == 0x185
SCSIRATE == 0x0
sd1(ahc0:0:1:0): SCB 3a: Immediate reset.  Flags = 0x4040
sd1(ahc0:0:1:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
sd1: async, 8-bit transfers, tagged queueing
sd2: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 16 SCBs aborted
sd0: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
sd2: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
sd1: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
Jul 18 20:31:48 andromeda /netbsd: sd1(ahc0:0:1:0): SCB 3a - timed out while idle, SEQADDR == 0x185
Jul 18 20:31:48 andromeda /netbsd: SCSIRATE == 0x0
Jul 18 20:31:48 andromeda /netbsd: sd1(ahc0:0:1:0): SCB 3a: Immediate reset.  Flags = 0x4040
Jul 18 20:31:48 andromeda /netbsd: sd1(ahc0:0:1:0): no longer in timeout, status = 0
 Jul 18 20:31:48 andromeda /netbsd: sd0: async, 8-bit transfers, tagged queueing
 Jul 18 20:31:48 andromeda /netbsd: sd1: async, 8-bit transfers, tagged queueing
 Jul 18 20:31:48 andromeda /netbsd: sd2: async, 8-bit transfers, tagged queueing
 Jul 18 20:31:48 andromeda /netbsd: ahc0: Issued Channel A Bus Reset. 16 SCBs aborted
 Jul 18 20:31:48 andromeda /netbsd: sd0: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
 Jul 18 20:31:49 andromeda /netbsd: sd2: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
 Jul 18 20:31:49 andromeda /netbsd: sd1: sync (50.0ns offset 15), 8-bit (20.000MB/s) transfers, tagged queueing
 Jul 18 20:40:19 andromeda su: john to root on /dev/ttyp0
 sd1(ahc0:0:1:0): SCB 14 - timed out in Data-out phase, SEQADDR == 0x110
 SCSIRATE == 0xf
 sd1(ahc0:0:1:0): BDR message in message buffer
 sd1(ahc0:0:1:0): no longer in timeout, status = 0
 sd1(ahc0:0:1:0): Unexpected busfree in Message-out phase
 SEQADDR == 0x153
 sd1(ahc0:0:1:0): generic HBA error
 sd1(ahc0:0:1:0): generic HBA error
 /usr/sandbox: got error 5 while accessing filesystem
 panic: softdep_deallocate_dependencies: unrecovered I/O error
 Begin traceback...
 0xdd965ed0: at softdep_deallocate_dependencies+40
 0xdd965ee0: at brelse+10c
 0xdd965ef0: at biodone+c4
 0xdd965f00: at scsipi_complete+428
 0xdd965f30: at scsipi_completion_thread+80
 0xdd965f50: at fork_trampoline+10
 End traceback...
 syncing disks...
>How-To-Repeat:
	
To generate load, I started cvs fetches on two hard drives and a few
copies from several hard drives to others. The problem was repeatable with
all of the cards listed above and with two different sets of drives.
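
The copy portion of that load can be approximated with a small script. The
following is only a sketch, not the procedure actually used for this report:
it does not run cvs, and the names run_load, copy_loop and make_test_file, the
file names, and the default sizes are all hypothetical. It writes one source
file in each given directory, then copies every source into every other
directory in parallel, and returns any I/O errors it saw.

```python
import os
import shutil
import threading


def make_test_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def copy_loop(src, dst_dir, tag, passes, errors):
    """Copy src into dst_dir `passes` times, deleting each copy afterwards.

    Stops at the first OSError (for example EIO, error 5) and records it.
    """
    for n in range(passes):
        dst = os.path.join(dst_dir, "load-copy-%s-%d" % (tag, n))
        try:
            shutil.copyfile(src, dst)
            os.remove(dst)
        except OSError as e:
            errors.append(e)
            return


def run_load(dirs, size_bytes=64 << 20, passes=4):
    """Run one copy thread per ordered pair of distinct directories.

    Each directory is assumed to sit on a different disk (hypothetical
    layout); nothing here checks that.
    """
    errors = []
    sources = []
    for d in dirs:
        src = os.path.join(d, "load-src.bin")
        make_test_file(src, size_bytes)
        sources.append(src)

    threads = []
    for i, src in enumerate(sources):
        for j, d in enumerate(dirs):
            if i != j:
                tag = "%d-%d" % (i, j)
                threads.append(threading.Thread(
                    target=copy_loop, args=(src, d, tag, passes, errors)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Remove the source files so the directories are left as they were.
    for src in sources:
        os.remove(src)
    return errors
```

For example, run_load(["/mnt/a", "/mnt/b", "/mnt/c"]) would exercise three
disks mounted at those (made-up) paths; an empty list returned means no copy
failed.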
>Fix:
	
None at this time.
>Release-Note:
>Audit-Trail:
>Unformatted: