NetBSD-Bugs archive


port-i386/41706: disk subsystem unresponsive after (recovered) disk failure



>Number:         41706
>Category:       port-i386
>Synopsis:       after a failure of a component disk of raid0 the disk subsystem became unresponsive
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 12 15:05:00 +0000 2009
>Originator:     Christoph Badura
>Release:        NetBSD 5.0_STABLE as of 2009-07-02
>Organization:
netbsd bozotic software test labs
        
>Environment:
        
        
System: NetBSD sanctioned-parts-list 5.0_STABLE NetBSD 5.0_STABLE (GENERIC) #0: Thu Jul 2 18:47:45 UTC 2009 root@arbitrary:/m/obj/m/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
Dmesg:
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.0_STABLE (GENERIC) #0: Thu Jul  2 18:47:45 UTC 2009
        root@arbitrary:/m/obj/m/src/sys/arch/i386/compile/GENERIC
total memory = 2047 MB
avail memory = 2000 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Computer Corporation PowerEdge 1400              
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel 686-class, 860MHz, id 0x686
ioapic0 at mainbus0 apid 1: pa 0xfec00000, version 11, 16 pins
ioapic1 at mainbus0 apid 2: pa 0xfec01000, version 11, 16 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId <DELL  ,PE1400  ,00000002>, AslId <MSFT,0100000a>
LUSB: ACPI: Found matching pin for 0.15.INTA at func 2: 10
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Safe" frequency 3579545 Hz quality 900
ACPI-Safe 32-bit timer
npx1 at acpi0 (FPU, PNP0C04): io 0xf0-0xff irq 13
npx1: reported by CPUID; using exception 16
pcppi1 at acpi0 (SPK, PNP0800): io 0x61
midi0 at pcppi1: PC speaker (CPU-intensive output)
sysbeep0 at pcppi1
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
FDC (PNP0700) at acpi0 not configured
pckbc1 at acpi0 (KBD, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc2 at acpi0 (MOU, PNP0F13) (aux port): irq 12
COMA (PNP0501) at acpi0 not configured
COMB (PNP0501) at acpi0 not configured
PRT (PNP0401) at acpi0 not configured
apm0 at acpi0: Power Management spec V1.2
attimer1: attached to pcppi1
pckbd0 at pckbc1 (kbd slot)
pckbc1: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: vendor 0x1166 product 0x0009 (rev. 0x06)
pchb1 at pci0 dev 0 function 1
pchb1: vendor 0x1166 product 0x0009 (rev. 0x06)
pci1 at pchb1 bus 1
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ahc1 at pci1 dev 2 function 0: Adaptec aic7899 Ultra160 SCSI adapter
ahc1: interrupting at ioapic1 pin 14
ahc1: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc1: 16 targets, 8 luns per target
ahc2 at pci1 dev 2 function 1: Adaptec aic7899 Ultra160 SCSI adapter
ahc2: interrupting at ioapic1 pin 15
ahc2: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc2: 16 targets, 8 luns per target
fxp0 at pci0 dev 2 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at ioapic1 pin 0
fxp0: May need receiver lock-up workaround
fxp0: Ethernet address 00:b0:d0:aa:f3:3c
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga1 at pci0 dev 14 function 0: vendor 0x1002 product 0x4752 (rev. 0x27)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
drm at vga1 not configured
piixpm0 at pci0 dev 15 function 0
piixpm0: vendor 0x1166 product 0x0200 (rev. 0x50)
piixpm0: interrupting at SMIpiixpm0: polling
iic0 at piixpm0: I2C bus
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at ioapic0 pin 14
atabus0 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at rccide0 channel 1
ohci0 at pci0 dev 15 function 2: vendor 0x1166 product 0x0220 (rev. 0x04)
ohci0: interrupting at ioapic0 pin 10
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
isa0 at mainbus0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
uhub0 at usb0: vendor 0x1166 OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
umass0 at uhub0 port 2 configuration 1 interface 0
umass0: vendor 0x0d7d USB DISK 2.0, rev 2.00/1.00, addr 2
umass0: using SCSI over Bulk-Only
scsibus2 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <HP, 9.10GB C 68-D94N, D94N> disk fixed
sd0: 8678 MB, 15110 cyl, 3 head, 392 sec, 512 bytes/sect x 17773524 sectors
sd0: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <HP, 9.10GB C 68-D94N, D94N> disk fixed
sd1: 8678 MB, 15110 cyl, 3 head, 392 sec, 512 bytes/sect x 17773524 sectors
sd1: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <CRD-8482B, , 1.05> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33) (using DMA)
sd2 at scsibus1 target 2 lun 0: <SEAGATE, SX150176LC, BA0F> disk fixed
sd2: 47702 MB, 12024 cyl, 22 head, 369 sec, 512 bytes/sect x 97693755 sectors
sd2: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 3 lun 0: <SEAGATE, SX150176LC, BA11> disk fixed
sd3: 47702 MB, 12024 cyl, 22 head, 369 sec, 512 bytes/sect x 97693755 sectors
sd3: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd4 at scsibus2 target 0 lun 0: <, USB DISK 2.0, PMAP> disk removable
sd4: 240 MB, 962 cyl, 16 head, 32 sec, 512 bytes/sect x 492544 sectors
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex
raid0: RAID Level 1
raid0: Components: /dev/sd0a /dev/sd1a
raid0: Total Sectors: 17248256 (8422 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
raid0: Device already configured!
raid1: Component /dev/sd2a being configured at col: 0
         Column: 0 Num Columns: 2
         Version: 2 Serial Number: 76763 Mod Counter: 80
         Clean: Yes Status: 0
raid1: Component /dev/sd3a being configured at col: 1
         Column: 1 Num Columns: 2
         Version: 2 Serial Number: 76763 Mod Counter: 80
         Clean: Yes Status: 0
raid1: RAID Level 1
raid1: Components: /dev/sd2a /dev/sd3a
raid1: Total Sectors: 97693568 (47701 MB)
cgd0: error 22
tap0: Ethernet address f2:0b:a4:74:97:0d
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>Description:
        
sd1 failed on the above system a couple of days ago.  What I could see
on the console were the messages from ahc1 being reset.  sd1 became
unready and would no longer respond positively to a TEST UNIT READY command
(firmware diagnostic failure given as the reason).

The system sat there for 2 more days without further kernel messages.
Pressing return on the console would produce a new login prompt from getty.
The system was pingable and did accept TCP connections (e.g. to the SSH port).
But no disk IO would happen and no error messages were printed.
IOW, the block IO subsystem seems to have been deadlocked at a high level.

>How-To-Repeat:
        
Provoke a hardware failure in a component of a RAID set that induces the ahc
driver to perform an HBA reset.

>Fix:
        

>Unformatted:
        
        

