Subject: kern/13659: two possibly related ahc panics on aic7899 based system
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 08/08/2001 16:05:39
>Number: 13659
>Category: kern
>Synopsis: two possibly related ahc panics on aic7899 based system
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Aug 08 14:12:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Greg A. Woods
>Release: 2001/04/24
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 1.5W
Architecture: i386
Machine: i386
>Description:
I've got a big server here with an Intel STL2 server motherboard in it.
It panics while running 'squid -z', both with softdep on the cache_dir
filesystems:
ahc0: WARNING no command for scb 39 (cmdcmplt)
QOUTPOS = 188
sd1(ahc0:0:1:0): Check Condition on CDB: 0x2a 00 00 58 a9 fc 00 00 08 00
SENSE KEY: Aborted Command
ASC/ASCQ: Overlapped Commands Attempted
FRU CODE: 0x1
sd0(ahc0:0:0:0): Check Condition on CDB: 0x2a 00 00 b1 7a 2c 00 00 04 00
SENSE KEY: Aborted Command
ASC/ASCQ: Overlapped Commands Attempted
FRU CODE: 0x1
ahc0: WARNING no command for scb 95 (cmdcmplt)
QOUTPOS = 47
sd3(ahc0:0:3:0): Check Condition on CDB: 0x2a 00 00 53 27 fc 00 00 08 00
SENSE KEY: Aborted Command
ASC/ASCQ: Overlapped Commands Attempted
FRU CODE: 0x1
sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x9
SCSIRATE == 0x0
sd1(ahc0:0:1:0): Queuing a BDR SCB
sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x9
SCSIRATE == 0x0
sd1(ahc0:0:1:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
sd1: async, 8-bit transfers, tagged queueing
sd2: async, 8-bit transfers, tagged queueing
sd3: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 45 SCBs aborted
sd3: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
kernel: page fault trap, code=0
Stopped at ahc_match_scb+0x17: movb 0x1(%eax),%dl
db> trace
ahc_match_scb(c0d34ed8,1,0,0,ff) at ahc_match_scb+0x17
ahc_search_qinfifo(c0d32e00,1,0,0,ff) at ahc_search_qinfifo+0xb0
ahc_freeze_devq(c0d32e00,c0d3d000) at ahc_freeze_devq+0x27
ahc_handle_seqint(c0d32e00,71) at ahc_handle_seqint+0x513
ahc_intr(c0d32e00) at ahc_intr+0x118
Xintr9() at Xintr9+0x7c
--- interrupt ---
idle(e88ed8e8) at idle+0x20
bpendtsleep(c0d5a400,20,c02c5b7a,72,0) at bpendtsleep
apm_thread(c0d5a400) at apm_thread+0x6c
db> reboot
syncing disks... sd3(ahc0:0:3:0): SCB 27 - timed out in Message-in phase, SEQADDR == 0xdd
SCSIRATE == 0x93
sd1(ahc0:0:1:0): BDR message in message buffer
ahc0:A:3: unknown scsi bus phase b6. Attempting to continue
sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x17c
SCSIRATE == 0x0
sd1(ahc0:0:1:0): no longer in timeout, status = 0
ahc0: WARNING no command for scb 48 (cmdcmplt)
QOUTPOS = 2
ahc0: WARNING no command for scb 208 (cmdcmplt)
QOUTPOS = 5
ahc0: WARNING no command for scb 84 (cmdcmplt)
QOUTPOS = 6
ahc0: WARNING no command for scb 175 (cmdcmplt)
QOUTPOS = 7
ahc0: WARNING no command for scb 1 (cmdcmplt)
QOUTPOS = 9
ahc0: WARNING no command for scb 144 (cmdcmplt)
QOUTPOS = 14
ahc0: WARNING no command for scb 70 (cmdcmplt)
QOUTPOS = 15
ahc0: WARNING no command for scb 8 (cmdcmplt)
QOUTPOS = 18
ahc0: WARNING no command for scb 224 (cmdcmplt)
QOUTPOS = 21
ahc0: WARNING no command for scb 10 (cmdcmplt)
QOUTPOS = 25
panic: handle_written_filepage: not started
Stopped at cpu_Debugger+0x4: leave
db> reboot
sd3: cache synchronization failed
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command
sd2: cache synchronization failed
sd1: cache synchronization failed
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command
sd0: cache synchronization failed
rebooting...
and with softdep turned off:
ahc0: WARNING no command for scb 63 (cmdcmplt)
QOUTPOS = 147
sd2(ahc0:0:2:0): SCB 3f - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd2(ahc0:0:2:0): Queuing a BDR SCB
sd2(ahc0:0:2:0): SCB 3f - timed out in Command phase, SEQADDR == 0x36
SCSIRATE == 0x0
sd2(ahc0:0:2:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
sd1: async, 8-bit transfers, tagged queueing
sd2: async, 8-bit transfers, tagged queueing
sd3: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 1 SCBs aborted
sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd0(ahc0:0:0:0): SCB 14 - timed out in Command phase, SEQADDR == 0x9a
SCSIRATE == 0x0
kernel: page fault trap, code=0
Stopped at ahc_timeout+0x443: movl 0x34(%edx),%edx
db> trace
ahc_timeout(c0d34320) at ahc_timeout+0x443
softclock(0,c0d33740,e88ed8e8,e88ed8e8,e88feebc) at softclock+0x122
hardclock(e88feec8,e88feec4,c0100dd8,e88feec8,0) at hardclock+0x528
clockintr(e88feec8) at clockintr+0xb
Xintr0() at Xintr0+0x78
--- interrupt ---
idle(e88ed8e8) at idle+0x20
bpendtsleep(c0d5a400,20,c02c5b7a,72,0) at bpendtsleep
apm_thread(c0d5a400) at apm_thread+0x6c
db> reboot
syncing disks...
This second time the bus seemed permanently hung, and I eventually gave
up waiting and just hit the reset switch.
Interestingly, 1.5T did not have this problem (with no softdep) on the
very same hardware.
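As an aside for anyone reading the Check Condition lines in the logs
above: opcode 0x2a is the SCSI WRITE(10) command, so each CDB carries a
32-bit big-endian logical block address in bytes 2-5 and a 16-bit
transfer length in bytes 7-8. The following is only a rough sketch for
decoding them by hand; the function name decode_write10 is made up for
illustration and is not part of the ahc driver or any NetBSD tool.

```python
def decode_write10(cdb_text):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for reading the log lines; returns
    (logical block address, transfer length in blocks).
    """
    raw = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(raw) != 10 or raw[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(raw[2:6], "big")      # bytes 2-5: LBA
    blocks = int.from_bytes(raw[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # The three CDBs reported with "Overlapped Commands Attempted".
    for cdb in ("0x2a 00 00 58 a9 fc 00 00 08 00",
                "0x2a 00 00 b1 7a 2c 00 00 04 00",
                "0x2a 00 00 53 27 fc 00 00 08 00"):
        lba, blocks = decode_write10(cdb)
        print("LBA %d, %d blocks" % (lba, blocks))
```

Decoded this way, all three failed commands are small writes (4 or 8
blocks of 512 bytes) to addresses well inside the 17921835 sectors each
disk reports in the boot log.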
Here's the boot log (kernel config available on request):
>How-To-Repeat:
Try running "squid -z" on similar hardware with four similar disks?
>Fix:
none known
>Release-Note:
>Audit-Trail:
>Unformatted:
>> NetBSD/i386 BIOS Boot, Revision 2.10
>> (woods@proven, Thu Jul 5 17:56:40 EDT 2001)
>> Memory: 637/523200 k
Press return to boot now, any other key for boot menu
booting hd0a:netbsd - starting in 0
1942850+51764+1145176 [65+142240+117769]=0x33f2a0
[ using 260532 bytes of netbsd ELF symbol table ]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 1.5W (ACI-SQUID) #0: Wed Aug 8 09:58:53 EDT 2001
woods@proven:/work/woods/NetBSD-src/sys/arch/i386/compile/ACI-SQUID
cpu0: Intel Pentium III (Coppermine) (686-class), 733.22 MHz
cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 8-way
cpu0: features 387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 387fbff<PGE,MCA,CMOV,FGPAT,PSE36,PN,MMX,FXSR,XMM>
cpu0: serial number 0000-0686-0003-DA75-1D14-60EC
total memory = 511 MB
avail memory = 469 MB
using 6573 buffers containing 26292 KB of memory
BIOS32 rev. 0 found at 0xfd85e
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CNB20LE Host (rev. 0x05)
pchb1 at pci0 dev 0 function 1
pchb1: ServerWorks CNB20LE Host (rev. 0x05)
pci1 at pchb1 bus 1
pci1: i/o space, memory space enabled
ahc0 at pci1 dev 4 function 0
ahc0: interrupting at irq 9
ahc0: aic7899 Wide Channel A, SCSI Id=7, 16/255 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci1 dev 4 function 1
ahc1: interrupting at irq 11
ahc1: aic7899 Wide Channel B, SCSI Id=7, 16/255 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target
vga1 at pci0 dev 2 function 0: ATI Technologies Mach64 GV (rev. 0x7a)
wsdisplay0 at vga1
fxp0 at pci0 dev 3 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at irq 10
fxp0: Ethernet address 00:d0:b7:b6:ad:4b
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1 at pci0 dev 7 function 0: i82550 Ethernet, rev 12
fxp1: interrupting at irq 9
fxp1: Ethernet address 00:02:b3:28:1e:ac
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 15 function 0
pcib0: ServerWorks ROSB4 SouthBridge (rev. 0x4f)
pciide0 at pci0 dev 15 function 1: ServerWorks IDE (rev. 0x00)
pciide0: bus-master DMA support present, but unused (no driver support)
pciide0: primary channel configured to compatibility mode
pciide0: primary channel interrupting at irq 14
atapibus0 at pciide0 channel 0: 2 targets
cd0 at atapibus0 drive 0: <MATSHITA CR-594, , YS0B> type 5 cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
pciide0: secondary channel configured to compatibility mode
pciide0: secondary channel interrupting at irq 15
ohci0 at pci0 dev 15 function 2: ServerWorks USB (rev. 0x04)
ohci0: interrupting at irq 9
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0
lpt0 at isa0 port 0x378-0x37b irq 7
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2
APM get capabilities: no APM present (0x8610)
apm0: A/C state: on
apm0: battery charge state: no battery
biomask fb65 netmask ff65 ttymask ffe7
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
sd0: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
sd1: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
sd1: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd2 at scsibus0 target 2 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
sd2: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
sd2: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd3 at scsibus0 target 3 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
sd3: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
sd3: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
scsibus1: waiting 2 seconds for devices to settle...
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs