Subject: ahc & mpt scsi timeouts
To: None <tech-kern@NetBSD.org>
From: Tracy Di Marco White <netbsd@gendalia.org>
List: tech-kern
Date: 05/26/2006 23:36:43
I have a machine with 4 tape drives attached, each on their own scsi
chain, to do backups with.  I regularly get these timeouts, hanging
the process accessing a drive, and requiring me to restart the machine,
and causing problems with backups.  The tape drives are attached via
ahc(4) cards.  It also has two spool disks, attached via mpt(4).

I am running a not exactly new current at this point.  It is a
multiprocessor machine that I am running UP in hopes that it would
be more stable.  The only modification I have to the kernel is
that I doubled ST_IO_TIME in src/sys/dev/scsipi/stvar.h from
3 minutes to 6 minutes.

Is there something I can do to make these stop happening, and
allow backups to work more consistently?

The mpt timeouts only prevent me from booting, and if I reboot
it, possibly a few times, it'll eventually come up.  They look
like:

  probe(mpt0:0:0:0): command timeout
  mpt0: timeout on request index = 0xfe, seq = 0x00000068
  mpt0: Status 0x80000000, Mask 0x00000001, Doorbell 0x24000000
  mpt0: request state: On Chip
  probe(mpt0:0:1:0): command timeout

and are repeated over & over until I drop to the debugger
to reboot, or it finally drops to single user mode, unable
to mount the spool disks and I reboot it.  These don't happen
in any consistent fashion.

As for the adaptec timeouts, I don't see them in any consistent
fashion either.  I believe they're usually on commands where a
tape is being mounted for write.

ahc4:SCB 0xe - timed out
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
ahc4: Dumping Card State while idle, at SEQADDR 0x9
Card was paused
ACCUM = 0x4, SINDEX = 0x17, DINDEX = 0x22, ARG_2 = 0x3
HCNT = 0x0 SCBPTR = 0x0
SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]
LASTPHASE[0x1] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0]
SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x8]
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xa4]
SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89]
STACK: 0x0 0x16b 0x180 0x3
SCB count = 16
Kernel NEXTQSCB = 15
Card NEXTQSCB = 15
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries: 0:14
QOUTFIFO entries:
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Sequencer SCB Info:
  0 SCB_CONTROL[0x44]
SCB_SCSIID[0x17] SCB_LUN[0x0] SCB_TAG[0xe]
  1 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  2 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  3 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  4 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  5 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  6 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  7 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  8 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
  9 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 10 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 11 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 12 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 13 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 14 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 15 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 16 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 17 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 18 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 19 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 20 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 21 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 22 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 23 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 24 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 25 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 26 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 27 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 28 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 29 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 30 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
 31 SCB_CONTROL[0x0]
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Pending list:
 14 SCB_CONTROL[0x40]
SCB_SCSIID[0x17] SCB_LUN[0x0]
Kernel Free SCB list: 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Untagged Q(1): 14

<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
ahc4:Queuing a BDR SCB
ahc4:Bus Device Reset Message Sent
st3(ahc4:0:1:0): ahc4: no longer in timeout, status = 0
ahc4: Bus Device Reset on A:1. 1 SCBs aborted


# dmesg
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 3.99.9 (GENERIC) #18: Wed May 10 22:35:44 CDT 2006
        root@backup:/usr/obj/usr/src/sys/arch/i386/compile/GENERIC
total memory = 3583 MB
avail memory = 3511 MB
BIOS32 rev. 0 found at 0xffe90
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel (686-class), 3192.14 MHz, id 0xf34
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 441d<SSE3,MONITOR,DS-CPL,CID,xTPR>
cpu0: features3 20000000<EM64T>
cpu0: "Intel(R) Xeon(TM) CPU 3.20GHz"
cpu0: I-cache 12K uOp cache 8-way
cpu0: L2 cache 1 MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: running without thermal monitor!
cpu0: 32 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel E7525 Memory Controller Hub (rev. 0x09)
ppb0 at pci0 dev 2 function 0: Intel E7525 PCI Express Port A (rev. 0x09)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: Intel product 0x0330 (rev. 0x06)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
amr0 at pci2 dev 14 function 0: AMI RAID <PERC 4e/Di>
amr0: interrupting at irq 7
amr0: firmware 521X, BIOS H430, 256MB RAM
ld0 at amr0 unit 0: RAID 1, optimal
ld0: 136 GB, 17834 cyl, 255 head, 63 sec, 512 bytes/sect x 286515200 sectors
ld1 at amr0 unit 1: RAID 1, optimal
ld1: 136 GB, 17834 cyl, 255 head, 63 sec, 512 bytes/sect x 286515200 sectors
ppb2 at pci1 dev 0 function 2: Intel product 0x0332 (rev. 0x06)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
ppb3 at pci0 dev 3 function 0: Intel E7525 PCI Express Port A1 (rev. 0x09)
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
ppb4 at pci4 dev 0 function 0: Intel 6700PXH PCI Express-to-PCI Bridge #0 (rev. 0x09)
pci5 at ppb4 bus 5
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
mpt0 at pci5 dev 4 function 0: LSI Logic 53c1030 Ultra320 SCSI
mpt0: interrupting at irq 7
scsibus0 at mpt0: 16 targets, 8 luns per target
mpt1 at pci5 dev 4 function 1: LSI Logic 53c1030 Ultra320 SCSI
mpt1: interrupting at irq 10
scsibus1 at mpt1: 16 targets, 8 luns per target
ppb5 at pci4 dev 0 function 2: Intel 6700PXH PCI Express-to-PCI Bridge #1 (rev. 0x09)
pci6 at ppb5 bus 6
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
ahc1 at pci6 dev 2 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at irq 11
ahc1: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus2 at ahc1: 16 targets, 8 luns per target
ahc2 at pci6 dev 2 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc2: interrupting at irq 11
ahc2: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus3 at ahc2: 16 targets, 8 luns per target
ahc3 at pci6 dev 3 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc3: interrupting at irq 11
ahc3: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus4 at ahc3: 16 targets, 8 luns per target
ahc4 at pci6 dev 3 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc4: interrupting at irq 7
ahc4: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus5 at ahc4: 16 targets, 8 luns per target
ppb6 at pci0 dev 4 function 0: Intel E7525 PCI Express Port B (rev. 0x09)
pci7 at ppb6 bus 7
pci7: i/o space, memory space enabled, rd/line, wr/inv ok
ppb7 at pci0 dev 5 function 0: Intel product 0x3598 (rev. 0x09)
pci8 at ppb7 bus 10
pci8: i/o space, memory space enabled, rd/line, wr/inv ok
ppb8 at pci8 dev 0 function 0: Intel 6700PXH PCI Express-to-PCI Bridge #0 (rev. 0x09)
pci9 at ppb8 bus 11
pci9: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci9 dev 7 function 0: Intel i82541GI 1000BASE-T Ethernet, rev. 5
wm0: interrupting at irq 11
wm0: 32-bit 66MHz PCI bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:11:43:5a:87:9c
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb9 at pci8 dev 0 function 2: Intel 6700PXH PCI Express-to-PCI Bridge #1 (rev. 0x09)
pci10 at ppb9 bus 12
pci10: i/o space, memory space enabled, rd/line, wr/inv ok
wm1 at pci10 dev 8 function 0: Intel i82541GI 1000BASE-T Ethernet, rev. 5
wm1: interrupting at irq 11
wm1: 32-bit 66MHz PCI bus
wm1: 65536 word (16 address bits) SPI EEPROM
wm1: Ethernet address 00:11:43:5a:87:9d
igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb10 at pci0 dev 6 function 0: Intel product 0x3599 (rev. 0x09)
pci11 at ppb10 bus 13
pci11: i/o space, memory space enabled, rd/line, wr/inv ok
uhci0 at pci0 dev 29 function 0: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci0: interrupting at irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci1: interrupting at irq 10
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801EB/ER USB UHCI Controller (rev. 0x02)
uhci2: interrupting at irq 7
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0 at pci0 dev 29 function 7: Intel 82801EB/ER USB EHCI Controller (rev. 0x02)
ehci0: interrupting at irq 3
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
uhub3 at usb3
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
ppb11 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0xc2)
pci12 at ppb11 bus 16
pci12: i/o space, memory space enabled
vga1 at pci12 dev 13 function 0: ATI Technologies Radeon 7000/VE (rev. 0x00)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801EB LPC Interface Bridge (rev. 0x02)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801EB IDE Controller (ICH5) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at irq 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at irq 15
atabus1 at piixide0 channel 1
isa0 at pcib0
lpt1 at isa0 port 0x278-0x27b irq : polled
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 mux 1
wskbd0: connecting to wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pcppi0: attached to attimer0
isapnp0: no ISA Plug 'n Play devices found
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
scsibus2: waiting 2 seconds for devices to settle...
scsibus3: waiting 2 seconds for devices to settle...
scsibus4: waiting 2 seconds for devices to settle...
scsibus5: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <TEAC CD-ROM CD-224E, , K.9A> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
uhub4 at uhub3 port 3
uhub4: Dell product 0xa001, class 9/0, rev 2.00/0.00, addr 2
uhub4: multiple transaction translators
uhub4: 2 ports with 2 removable, self powered
sd0 at scsibus0 target 0 lun 0: <, , R0.0> disk fixed
sd0: 1394 GB, 178515 cyl, 128 head, 128 sec, 512 bytes/sect x 2924789760 sectors
sd0: sync (6.25ns offset 127), 16-bit (320.000MB/s) transfers, tagged queueing
uplcom0 at uhub2 port 2
uplcom0: Prolific Technology PL2303 Serial adapter (ATEN/IOGEAR UC232A), rev 1.10/2.02, addr 2
ucom0 at uplcom0
sd1 at scsibus1 target 0 lun 0: <, , R0.0> disk fixed
sd1: 1394 GB, 178515 cyl, 128 head, 128 sec, 512 bytes/sect x 2924789760 sectors
sd1: sync (6.25ns offset 127), 16-bit (320.000MB/s) transfers, tagged queueing
ch0 at scsibus2 target 0 lun 0: <QUALSTAR, TLS-412180, 2.24> changer removable
ch0: 150 slots, 4 drives, 1 picker, 30 portals
ch0: async, 8-bit transfers
st0 at scsibus2 target 1 lun 0: <SONY, SDX-900V, 0103> tape removable
st0: density code 51, variable blocks, write-enabled
st0: sync (12.50ns offset 100), 16-bit (160.000MB/s) transfers
st1 at scsibus3 target 1 lun 0: <SONY, SDX-900V, 0103> tape removable
st1: density code 51, variable blocks, write-enabled
st1: sync (12.50ns offset 100), 16-bit (160.000MB/s) transfers
st2 at scsibus4 target 1 lun 0: <SONY, SDX-900V, 0103> tape removable
st2: density code 51, variable blocks, write-enabled
st2: sync (12.50ns offset 100), 16-bit (160.000MB/s) transfers
st3 at scsibus5 target 1 lun 0: <SONY, SDX-900V, 0103> tape removable
st3: density code 51, variable blocks, write-enabled
st3: sync (12.50ns offset 100), 16-bit (160.000MB/s) transfers
boot device: ld0

-Tracy