Subject: kern/13659: two possibly related ahc panics on aic7899 based system
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 08/08/2001 16:05:39
>Number:         13659
>Category:       kern
>Synopsis:       two possibly related ahc panics on aic7899 based system
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 08 14:12:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Greg A. Woods
>Release:        2001/04/24
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:

System: NetBSD 1.5W
Architecture: i386
Machine: i386

>Description:

I've got a big server here with an Intel STL2 server motherboard in it.

It panics while running 'squid -z', both with and without softdep on
the cache_dir filesystems.  With softdep enabled:

ahc0: WARNING no command for scb 39 (cmdcmplt)
QOUTPOS = 188
sd1(ahc0:0:1:0):  Check Condition on CDB: 0x2a 00 00 58 a9 fc 00 00 08 00
    SENSE KEY:  Aborted Command
     ASC/ASCQ:  Overlapped Commands Attempted
     FRU CODE:  0x1

sd0(ahc0:0:0:0):  Check Condition on CDB: 0x2a 00 00 b1 7a 2c 00 00 04 00
    SENSE KEY:  Aborted Command
     ASC/ASCQ:  Overlapped Commands Attempted
     FRU CODE:  0x1

ahc0: WARNING no command for scb 95 (cmdcmplt)
QOUTPOS = 47
sd3(ahc0:0:3:0):  Check Condition on CDB: 0x2a 00 00 53 27 fc 00 00 08 00
    SENSE KEY:  Aborted Command
     ASC/ASCQ:  Overlapped Commands Attempted
     FRU CODE:  0x1

sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x9
SCSIRATE == 0x0
sd1(ahc0:0:1:0): Queuing a BDR SCB
sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x9
SCSIRATE == 0x0
sd1(ahc0:0:1:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
sd1: async, 8-bit transfers, tagged queueing
sd2: async, 8-bit transfers, tagged queueing
sd3: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 45 SCBs aborted
sd3: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
kernel: page fault trap, code=0
Stopped at      ahc_match_scb+0x17:     movb           0x1(%eax),%dl
db> trace
ahc_match_scb(c0d34ed8,1,0,0,ff) at ahc_match_scb+0x17
ahc_search_qinfifo(c0d32e00,1,0,0,ff) at ahc_search_qinfifo+0xb0
ahc_freeze_devq(c0d32e00,c0d3d000) at ahc_freeze_devq+0x27
ahc_handle_seqint(c0d32e00,71) at ahc_handle_seqint+0x513
ahc_intr(c0d32e00) at ahc_intr+0x118
Xintr9() at Xintr9+0x7c
--- interrupt ---
idle(e88ed8e8) at idle+0x20
bpendtsleep(c0d5a400,20,c02c5b7a,72,0) at bpendtsleep
apm_thread(c0d5a400) at apm_thread+0x6c
db> reboot
syncing disks... sd3(ahc0:0:3:0): SCB 27 - timed out in Message-in phase, SEQADDR == 0xdd
SCSIRATE == 0x93
sd1(ahc0:0:1:0): BDR message in message buffer
ahc0:A:3: unknown scsi bus phase b6.  Attempting to continue
sd1(ahc0:0:1:0): SCB 27 - timed out while idle, SEQADDR == 0x17c
SCSIRATE == 0x0
sd1(ahc0:0:1:0): no longer in timeout, status = 0
ahc0: WARNING no command for scb 48 (cmdcmplt)
QOUTPOS = 2
ahc0: WARNING no command for scb 208 (cmdcmplt)
QOUTPOS = 5
ahc0: WARNING no command for scb 84 (cmdcmplt)
QOUTPOS = 6
ahc0: WARNING no command for scb 175 (cmdcmplt)
QOUTPOS = 7
ahc0: WARNING no command for scb 1 (cmdcmplt)
QOUTPOS = 9
ahc0: WARNING no command for scb 144 (cmdcmplt)
QOUTPOS = 14
ahc0: WARNING no command for scb 70 (cmdcmplt)
QOUTPOS = 15
ahc0: WARNING no command for scb 8 (cmdcmplt)
QOUTPOS = 18
ahc0: WARNING no command for scb 224 (cmdcmplt)
QOUTPOS = 21
ahc0: WARNING no command for scb 10 (cmdcmplt)
QOUTPOS = 25
panic: handle_written_filepage: not started
Stopped at      cpu_Debugger+0x4:       leave
db> reboot
sd3: cache synchronization failed
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command, retrying...
sd2(ahc0:0:2:0): failed to enqueue polling command
sd2: cache synchronization failed
sd1: cache synchronization failed
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command, retrying...
sd0(ahc0:0:0:0): failed to enqueue polling command
sd0: cache synchronization failed
rebooting...



And with softdep turned off:


ahc0: WARNING no command for scb 63 (cmdcmplt)
QOUTPOS = 147
sd2(ahc0:0:2:0): SCB 3f - timed out while idle, SEQADDR == 0xa
SCSIRATE == 0x0
sd2(ahc0:0:2:0): Queuing a BDR SCB
sd2(ahc0:0:2:0): SCB 3f - timed out in Command phase, SEQADDR == 0x36
SCSIRATE == 0x0
sd2(ahc0:0:2:0): no longer in timeout, status = 0
sd0: async, 8-bit transfers, tagged queueing
sd1: async, 8-bit transfers, tagged queueing
sd2: async, 8-bit transfers, tagged queueing
sd3: async, 8-bit transfers, tagged queueing
ahc0: Issued Channel A Bus Reset. 1 SCBs aborted
sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd0(ahc0:0:0:0): SCB 14 - timed out in Command phase, SEQADDR == 0x9a
SCSIRATE == 0x0
kernel: page fault trap, code=0
Stopped at      ahc_timeout+0x443:      movl          0x34(%edx),%edx
db> trace
ahc_timeout(c0d34320) at ahc_timeout+0x443
softclock(0,c0d33740,e88ed8e8,e88ed8e8,e88feebc) at softclock+0x122
hardclock(e88feec8,e88feec4,c0100dd8,e88feec8,0) at hardclock+0x528
clockintr(e88feec8) at clockintr+0xb
Xintr0() at Xintr0+0x78
--- interrupt ---
idle(e88ed8e8) at idle+0x20
bpendtsleep(c0d5a400,20,c02c5b7a,72,0) at bpendtsleep
apm_thread(c0d5a400) at apm_thread+0x6c
db> reboot
syncing disks... 

This second time the bus seemed permanently hung, and I eventually gave
up waiting and hit the reset switch.
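
One detail in the two logs above may be worth noting.  Assuming the
"no command for scb N" warning prints the SCB tag in decimal and the
"SCB NN - timed out" message prints it in hex (the "SCB 3f" form
suggests hex, but this is an assumption and has not been confirmed
against the driver source), the SCB that first times out in each run is
one that was reported by an earlier cmdcmplt warning.  A small sketch of
that check, with the values copied from the logs above (the dictionary
layout and function name are made up for illustration):

```python
# Values copied from the two console logs above.
# Assumption: the cmdcmplt warning prints the SCB tag in decimal,
# while the timeout message prints it in hex.
runs = {
    "softdep on": {"warned_dec": [39, 95], "first_timeout_hex": "27"},
    "softdep off": {"warned_dec": [63], "first_timeout_hex": "3f"},
}


def timed_out_scb_was_warned(run):
    """True if the first SCB to time out was named in an earlier warning."""
    return int(run["first_timeout_hex"], 16) in run["warned_dec"]


for name, run in runs.items():
    print(name, timed_out_scb_was_warned(run))
```

Under that assumption both runs report True, which would be consistent
with the two panics being related.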



Interestingly, 1.5T did not have this problem (without softdep) on the
very same hardware.
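
For reference, the CDBs in the Check Condition messages above can be
decoded by hand.  Opcode 0x2a is WRITE(10): bytes 2-5 hold the
big-endian logical block address and bytes 7-8 the transfer length in
blocks.  A minimal sketch (the function name decode_write10 is made up
for illustration):

```python
def decode_write10(cdb_text):
    """Decode a 10-byte SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes(int(b, 16) for b in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    nblocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, nblocks


# CDBs copied from the first log above.
for dev, cdb in [
    ("sd1", "0x2a 00 00 58 a9 fc 00 00 08 00"),
    ("sd0", "0x2a 00 00 b1 7a 2c 00 00 04 00"),
    ("sd3", "0x2a 00 00 53 27 fc 00 00 08 00"),
]:
    lba, nblocks = decode_write10(cdb)
    print(f"{dev}: WRITE(10) lba={lba} blocks={nblocks}")
```

All three aborted commands are small writes of 4 or 8 blocks (2 KB or
4 KB with the 512-byte sectors reported in the boot log).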



The boot log follows in the Unformatted section below (kernel config
available on request).


>How-To-Repeat:

	try running "squid -z" on similar hardware with four similar disks?

>Fix:

	none known

>Release-Note:
>Audit-Trail:
>Unformatted:
 >> NetBSD/i386 BIOS Boot, Revision 2.10
 >> (woods@proven, Thu Jul  5 17:56:40 EDT 2001)
 >> Memory: 637/523200 k
 Press return to boot now, any other key for boot menu
 booting hd0a:netbsd - starting in 0
 1942850+51764+1145176 [65+142240+117769]=0x33f2a0
 [ using 260532 bytes of netbsd ELF symbol table ]
 
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001
     The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.
 
 NetBSD 1.5W (ACI-SQUID) #0: Wed Aug  8 09:58:53 EDT 2001
     woods@proven:/work/woods/NetBSD-src/sys/arch/i386/compile/ACI-SQUID
 cpu0: Intel Pentium III (Coppermine) (686-class), 733.22 MHz
 cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
 cpu0: L2 cache 256 KB 32b/line 8-way
 cpu0: features 387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
 cpu0: features 387fbff<PGE,MCA,CMOV,FGPAT,PSE36,PN,MMX,FXSR,XMM>
 cpu0: serial number 0000-0686-0003-DA75-1D14-60EC
 total memory = 511 MB
 avail memory = 469 MB
 using 6573 buffers containing 26292 KB of memory
 BIOS32 rev. 0 found at 0xfd85e
 mainbus0 (root)
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled
 pchb0 at pci0 dev 0 function 0
 pchb0: ServerWorks CNB20LE Host (rev. 0x05)
 pchb1 at pci0 dev 0 function 1
 pchb1: ServerWorks CNB20LE Host (rev. 0x05)
 pci1 at pchb1 bus 1
 pci1: i/o space, memory space enabled
 ahc0 at pci1 dev 4 function 0
 ahc0: interrupting at irq 9
 ahc0: aic7899 Wide Channel A, SCSI Id=7, 16/255 SCBs
 scsibus0 at ahc0: 16 targets, 8 luns per target
 ahc1 at pci1 dev 4 function 1
 ahc1: interrupting at irq 11
 ahc1: aic7899 Wide Channel B, SCSI Id=7, 16/255 SCBs
 scsibus1 at ahc1: 16 targets, 8 luns per target
 vga1 at pci0 dev 2 function 0: ATI Technologies Mach64 GV (rev. 0x7a)
 wsdisplay0 at vga1
 fxp0 at pci0 dev 3 function 0: i82559 Ethernet, rev 8
 fxp0: interrupting at irq 10
 fxp0: Ethernet address 00:d0:b7:b6:ad:4b
 inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 fxp1 at pci0 dev 7 function 0: i82550 Ethernet, rev 12
 fxp1: interrupting at irq 9
 fxp1: Ethernet address 00:02:b3:28:1e:ac
 inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
 inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 pcib0 at pci0 dev 15 function 0
 pcib0: ServerWorks ROSB4 SouthBridge (rev. 0x4f)
 pciide0 at pci0 dev 15 function 1: ServerWorks IDE (rev. 0x00)
 pciide0: bus-master DMA support present, but unused (no driver support)
 pciide0: primary channel configured to compatibility mode
 pciide0: primary channel interrupting at irq 14
 atapibus0 at pciide0 channel 0: 2 targets
 cd0 at atapibus0 drive 0: <MATSHITA CR-594, , YS0B> type 5 cdrom removable
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
 pciide0: secondary channel configured to compatibility mode
 pciide0: secondary channel interrupting at irq 15
 ohci0 at pci0 dev 15 function 2: ServerWorks USB (rev. 0x04)
 ohci0: interrupting at irq 9
 ohci0: OHCI version 1.0, legacy support
 usb0 at ohci0: USB revision 1.0
 uhub0 at usb0
 uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 4 ports with 4 removable, self powered
 isa0 at pcib0
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 com0: console
 com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
 pckbc0 at isa0 port 0x60-0x64
 pckbd0 at pckbc0 (kbd slot)
 pckbc0: using irq 1 for kbd slot
 wskbd0 at pckbd0
 lpt0 at isa0 port 0x378-0x37b irq 7
 pcppi0 at isa0 port 0x61
 sysbeep0 at pcppi0
 isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
 npx0 at isa0 port 0xf0-0xff: using exception 16
 fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
 fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
 isapnp0: no ISA Plug 'n Play devices found
 apm0 at mainbus0: Power Management spec V1.2
 APM get capabilities: no APM present (0x8610)
 apm0: A/C state: on
 apm0: battery charge state: no battery
 biomask fb65 netmask ff65 ttymask ffe7
 scsibus0: waiting 2 seconds for devices to settle...
 sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
 sd0: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
 sd0: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
 sd1: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
 sd1: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 sd2 at scsibus0 target 2 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
 sd2: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
 sd2: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 sd3 at scsibus0 target 3 lun 0: <SEAGATE, ST39205LW, 5063> SCSI3 0/direct fixed
 sd3: 8750 MB, 19036 cyl, 2 head, 470 sec, 512 bytes/sect x 17921835 sectors
 sd3: sync (25.0ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
 scsibus1: waiting 2 seconds for devices to settle...
 boot device: sd0
 root on sd0a dumps on sd0b
 root file system type: ffs