Subject: RAID kills the machine
To: None <current-users@netbsd.org>
From: Tobias Schuepp <netbsd@schuepp.net>
List: current-users
Date: 08/23/2002 16:24:31
Hello current-users,

I am running an i386 box with NetBSD 1.6F. I compiled RAIDframe
support into the kernel and installed two SCSI disks:

>Aug 23 13:24:31 netbsd /netbsd: sd0 at scsibus0 target 0 lun 0: <QUANTUM, ATLAS IV 9 WLS, 0808> SCSI3 0/direct fixed
>Aug 23 13:24:31 netbsd /netbsd: sd0: 8761 MB, 13816 cyl, 4 head, 324 sec, 512 bytes/sect x 17942584 sectors
>Aug 23 13:24:31 netbsd /netbsd: sd0: sync (50.0ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
>Aug 23 13:24:31 netbsd /netbsd: sd1 at scsibus0 target 4 lun 0: <QUANTUM, XP34550W, LXY4> SCSI2 0/direct fixed
>Aug 23 13:24:31 netbsd /netbsd: sd1: 4341 MB, 5899 cyl, 10 head, 150 sec, 512 bytes/sect x 8890760 sectors
>Aug 23 13:24:31 netbsd /netbsd: sd1: sync (50.0ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
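
The figures in the probe lines above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper names are made up here, and it assumes that "MB" in the size field means 2**20 bytes (truncated) while "MB/s" in the transfer field means 10**6 bytes per second, which is what the numbers are consistent with.

```python
SECTOR_BYTES = 512  # "512 bytes/sect" in the probe lines


def size_mb(sectors: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Capacity in units of 2**20 bytes, truncated (assumed reporting convention)."""
    return sectors * sector_bytes // (1024 * 1024)


def sync_rate_mb_s(period_ns: float, width_bits: int) -> float:
    """Peak synchronous rate in 10**6 bytes/s: clock in MHz times bus width in bytes."""
    clock_mhz = 1000.0 / period_ns
    return clock_mhz * (width_bits // 8)


print("sd0:", size_mb(17942584), "MB")
print("sd1:", size_mb(8890760), "MB")
print("bus:", sync_rate_mb_s(50.0, 16), "MB/s")
```

Under those assumptions the results agree with the 8761 MB, 4341 MB and 40.000MB/s reported by the kernel. sd1 is roughly half the size of sd0, so a mirror built from these two disks can be no larger than sd1.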

They are configured as RAID level 1. If I start playing an MPEG file
from this device, the machine hangs and produces output like this:

>Aug 23 13:18:06 netbsd /netbsd: ahc0: Data Parity Error Detected during address or write data phase
>Aug 23 13:19:06 netbsd /netbsd: sd1(ahc0:0:4:0): SCB 11 - timed out in Data-in phase, SEQADDR == 0x5d
>Aug 23 13:19:06 netbsd /netbsd: SCSIRATE == 0x95
>Aug 23 13:19:06 netbsd /netbsd: sd1(ahc0:0:4:0): Other SCB Timeout
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): SCB 10 - timed out in Data-in phase, SEQADDR == 0x5d
>Aug 23 13:19:06 netbsd /netbsd: SCSIRATE == 0x95
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): BDR message in message buffer
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): no longer in timeout, status = 0
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): Unexpected busfree in Message-out phase
>Aug 23 13:19:06 netbsd /netbsd: SEQADDR == 0x165
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): parity error detected in Data-in phase. SEQADDR(0x164) SCSIRATE(0x95)
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): parity error detected in Data-in phase. SEQADDR(0x166) SCSIRATE(0x95)
>Aug 23 13:19:06 netbsd last message repeated 16 times
>Aug 23 13:19:06 netbsd /netbsd: ahc0:A:0: unknown scsi bus phase 0.  Attempting to continue
>Aug 23 13:19:06 netbsd /netbsd: ahc0:A:0: Target did not send an IDENTIFY message. LASTPHASE = 0x0, SAVED_TCL == 0x0
>Aug 23 13:19:06 netbsd /netbsd: sd0: async, 8-bit transfers, tagged queueing
>Aug 23 13:19:06 netbsd /netbsd: sd1: async, 8-bit transfers, tagged queueing
>Aug 23 13:19:06 netbsd /netbsd: ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
>Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): generic HBA error
>Aug 23 13:19:06 netbsd /netbsd: raid0: IO Error.  Marking /dev/sd0a as failed.
>Aug 23 13:20:06 netbsd /netbsd: raid0: node (Wsd) returned fail, rolling forward
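
To see which device the errors cluster on, the log can be tallied per device and error type. The following is a rough sketch, not an existing tool: the keyword list, the regular expressions and the function name are assumptions chosen for this log excerpt, and syslog's "last message repeated N times" lines are credited to the immediately preceding message.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen to fit the log lines quoted above.
DEVICE_RE = re.compile(r"/netbsd: ([a-z]+\d+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")
KEYWORDS = ("parity error", "timed out", "bus reset", "io error")


def tally(lines):
    """Count error lines per (device, keyword), honouring syslog repeat lines."""
    counts = Counter()
    last_key = None
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            # Credit the repeats to the previous line, if it was counted.
            if last_key is not None:
                counts[last_key] += int(repeat.group(1))
            continue
        last_key = None
        device = DEVICE_RE.search(line)
        if not device:
            continue
        lowered = line.lower()
        for keyword in KEYWORDS:
            if keyword in lowered:
                last_key = (device.group(1), keyword)
                counts[last_key] += 1
                break
    return counts


SAMPLE = [
    ">Aug 23 13:18:06 netbsd /netbsd: ahc0: Data Parity Error Detected during address or write data phase",
    ">Aug 23 13:19:06 netbsd /netbsd: sd1(ahc0:0:4:0): SCB 11 - timed out in Data-in phase, SEQADDR == 0x5d",
    ">Aug 23 13:19:06 netbsd /netbsd: sd0(ahc0:0:0:0): parity error detected in Data-in phase. SEQADDR(0x166) SCSIRATE(0x95)",
    ">Aug 23 13:19:06 netbsd last message repeated 16 times",
]

counts = tally(SAMPLE)
for (device, keyword), n in sorted(counts.items()):
    print(device, keyword, n)
```

The four sample lines are copied from the output above. A tally like this only summarises what was logged; it cannot by itself tell a bad cable or terminator from a failing disk or controller.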

And then the machine panics and reboots:

Aug 23 13:24:30 netbsd savecore: reboot after panic: panic: raidframe error at line 460 file /usr/src/sys/arch/i386/compile/CUSTOM/../../../../dev/raidframe/rf_states.c


I can reproduce this. Is it caused by my disks, or is it a bug in RAIDframe?


Tobias


P.S.: I append the dmesg:

Aug 23 13:24:30 netbsd /netbsd: NetBSD 1.6F (Goldkettchen) #4: Sun Aug 18 19:58:15 CEST 2002
Aug 23 13:24:30 netbsd /netbsd:     root@netbsd:/usr/src/sys/arch/i386/compile/CUSTOM
Aug 23 13:24:30 netbsd /netbsd: cpu0: Intel Pentium III (Coppermine) (686-class), 797.38 MHz
Aug 23 13:24:30 netbsd /netbsd: cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
Aug 23 13:24:30 netbsd /netbsd: cpu0: L2 cache 256 KB 32b/line 8-way
Aug 23 13:24:30 netbsd /netbsd: cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
Aug 23 13:24:30 netbsd /netbsd: cpu0: features 383fbff<PGE,MCA,CMOV,FGPAT,PSE36,MMX>
Aug 23 13:24:30 netbsd /netbsd: cpu0: features 383fbff<FXSR,SSE>
Aug 23 13:24:30 netbsd /netbsd: total memory = 383 MB
Aug 23 13:24:30 netbsd /netbsd: avail memory = 351 MB
Aug 23 13:24:30 netbsd /netbsd: using 4934 buffers containing 19736 KB of memory
Aug 23 13:24:30 netbsd /netbsd: BIOS32 rev. 0 found at 0xe7300
Aug 23 13:24:30 netbsd /netbsd: mainbus0 (root)
Aug 23 13:24:30 netbsd /netbsd: pnpbios0 at mainbus0: nodes 18, max len 250
Aug 23 13:24:30 netbsd /netbsd: pnpbios0: node index mismatch (static): requested 0, got 1
Aug 23 13:24:30 netbsd /netbsd: pci0 at mainbus0 bus 0: configuration mode 1
Aug 23 13:24:30 netbsd /netbsd: pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
Aug 23 13:24:30 netbsd /netbsd: pchb0 at pci0 dev 0 function 0
Aug 23 13:24:30 netbsd /netbsd: pchb0: Intel 82815 Hub (rev. 0x02)
Aug 23 13:24:30 netbsd /netbsd: pchb0: random number generator enabled
Aug 23 13:24:30 netbsd /netbsd: agp0 at pchb0: can't find internal VGA device config space
Aug 23 13:24:30 netbsd /netbsd: ppb0 at pci0 dev 1 function 0: Intel 82815 AGP (rev. 0x02)
Aug 23 13:24:30 netbsd /netbsd: pci1 at ppb0 bus 1
Aug 23 13:24:30 netbsd /netbsd: pci1: i/o space, memory space enabled
Aug 23 13:24:30 netbsd /netbsd: vga1 at pci1 dev 0 function 0: Nvidia Corporation Vanta (rev. 0x15)
Aug 23 13:24:30 netbsd /netbsd: wsdisplay0 at vga1 kbdmux 1: console (80x24, vt100 emulation)
Aug 23 13:24:30 netbsd /netbsd: wsmux1: connecting to wsdisplay0
Aug 23 13:24:30 netbsd /netbsd: ppb1 at pci0 dev 30 function 0: Intel 82801AA Hub-to-PCI Bridge (rev. 0x02)
Aug 23 13:24:30 netbsd /netbsd: pci2 at ppb1 bus 2
Aug 23 13:24:30 netbsd /netbsd: pci2: i/o space, memory space enabled
Aug 23 13:24:30 netbsd /netbsd: eso0 at pci2 dev 8 function 0: ESS Solo-1 PCI AudioDrive ES1946 Revision E
Aug 23 13:24:30 netbsd /netbsd: eso0: interrupting at irq 5
Aug 23 13:24:30 netbsd /netbsd: eso0: mapping Audio 1 DMA using VC I/O space at 0x14d0
Aug 23 13:24:30 netbsd /netbsd: audio0 at eso0: full duplex, mmap, independent
Aug 23 13:24:30 netbsd /netbsd: opl at eso0 not configured
Aug 23 13:24:30 netbsd /netbsd: mpu at eso0 not configured
Aug 23 13:24:30 netbsd /netbsd: joy at eso0: not configured
Aug 23 13:24:30 netbsd /netbsd: ex0 at pci2 dev 9 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
Aug 23 13:24:30 netbsd /netbsd: ex0: interrupting at irq 9
Aug 23 13:24:30 netbsd /netbsd: ex0: MAC address 00:01:02:f7:57:36
Aug 23 13:24:30 netbsd /netbsd: bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
Aug 23 13:24:30 netbsd /netbsd: bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
Aug 23 13:24:30 netbsd /netbsd: ahc0 at pci2 dev 10 function 0
Aug 23 13:24:30 netbsd /netbsd: ahc0: interrupting at irq 10
Aug 23 13:24:30 netbsd /netbsd: ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
Aug 23 13:24:30 netbsd /netbsd: scsibus0 at ahc0: 16 targets, 8 luns per target
Aug 23 13:24:30 netbsd /netbsd: pcib0 at pci0 dev 31 function 0
Aug 23 13:24:30 netbsd /netbsd: pcib0: Intel 82801AA LPC Interface Bridge (rev. 0x02)
Aug 23 13:24:30 netbsd /netbsd: pciide0 at pci0 dev 31 function 1: Intel 82801AA IDE Controller (ICH) (rev. 0x02)
Aug 23 13:24:30 netbsd /netbsd: pciide0: bus-master DMA support present
Aug 23 13:24:30 netbsd /netbsd: pciide0: primary channel wired to compatibility mode
Aug 23 13:24:30 netbsd /netbsd: wd0 at pciide0 channel 0 drive 1: <IC35L040AVVA07-0>
Aug 23 13:24:30 netbsd /netbsd: wd0: drive supports 16-sector PIO transfers, LBA addressing
Aug 23 13:24:30 netbsd /netbsd: wd0: 39266 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 80418240 sectors
Aug 23 13:24:30 netbsd /netbsd: wd0: 32-bit data port
Aug 23 13:24:31 netbsd /netbsd: wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
Aug 23 13:24:31 netbsd /netbsd: pciide0: primary channel interrupting at irq 14
Aug 23 13:24:31 netbsd /netbsd: wd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
Aug 23 13:24:31 netbsd /netbsd: pciide0: secondary channel wired to compatibility mode
Aug 23 13:24:31 netbsd /netbsd: atapibus0 at pciide0 channel 1: 2 targets
Aug 23 13:24:31 netbsd /netbsd: cd0 at atapibus0 drive 0: <SONY    CD-RW  CRX155E, , 1.0c> type 5 cdrom removable
Aug 23 13:24:31 netbsd /netbsd: cd0: 32-bit data port
Aug 23 13:24:31 netbsd /netbsd: cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
Aug 23 13:24:31 netbsd /netbsd: pciide0: secondary channel interrupting at irq 15
Aug 23 13:24:31 netbsd /netbsd: cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
Aug 23 13:24:31 netbsd /netbsd: uhci0 at pci0 dev 31 function 2: Intel 82801AA USB Controller (rev. 0x02)
Aug 23 13:24:31 netbsd /netbsd: uhci0: interrupting at irq 11
Aug 23 13:24:31 netbsd /netbsd: usb0 at uhci0: USB revision 1.0
Aug 23 13:24:31 netbsd /netbsd: uhub0 at usb0
Aug 23 13:24:31 netbsd /netbsd: uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
Aug 23 13:24:31 netbsd /netbsd: uhub0: 2 ports with 2 removable, self powered
Aug 23 13:24:31 netbsd /netbsd: isa0 at pcib0
Aug 23 13:24:31 netbsd /netbsd: com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
Aug 23 13:24:31 netbsd /netbsd: com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
Aug 23 13:24:31 netbsd /netbsd: pckbc0 at isa0 port 0x60-0x64
Aug 23 13:24:31 netbsd /netbsd: pckbd0 at pckbc0 (kbd slot)
Aug 23 13:24:31 netbsd /netbsd: pckbc0: using irq 1 for kbd slot
Aug 23 13:24:31 netbsd /netbsd: wskbd0 at pckbd0: console keyboard, using wsdisplay0
Aug 23 13:24:31 netbsd /netbsd: pms0 at pckbc0 (aux slot)
Aug 23 13:24:31 netbsd /netbsd: pckbc0: using irq 12 for aux slot
Aug 23 13:24:31 netbsd /netbsd: wsmouse0 at pms0 mux 0
Aug 23 13:24:31 netbsd /netbsd: lpt0 at isa0 port 0x378-0x37b irq 7
Aug 23 13:24:31 netbsd /netbsd: pcppi0 at isa0 port 0x61
Aug 23 13:24:31 netbsd /netbsd: spkr0 at pcppi0
Aug 23 13:24:31 netbsd /netbsd: sysbeep0 at pcppi0
Aug 23 13:24:31 netbsd /netbsd: npx0 at isa0 port 0xf0-0xff: using exception 16
Aug 23 13:24:31 netbsd /netbsd: fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
Aug 23 13:24:31 netbsd /netbsd: biomask ed45 netmask ef45 ttymask ffc7
Aug 23 13:24:31 netbsd /netbsd: scsibus0: waiting 2 seconds for devices to settle...
Aug 23 13:24:31 netbsd /netbsd: sd0 at scsibus0 target 0 lun 0: <QUANTUM, ATLAS IV 9 WLS, 0808> SCSI3 0/direct fixed
Aug 23 13:24:31 netbsd /netbsd: sd0: 8761 MB, 13816 cyl, 4 head, 324 sec, 512 bytes/sect x 17942584 sectors
Aug 23 13:24:31 netbsd /netbsd: sd0: sync (50.0ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
Aug 23 13:24:31 netbsd /netbsd: st0 at scsibus0 target 1 lun 0: <SONY, SDT-9000, 0200> SCSI2 1/sequential removable
Aug 23 13:24:31 netbsd /netbsd: st0: drive empty
Aug 23 13:24:31 netbsd /netbsd: st0: async, 8-bit transfers
Aug 23 13:24:31 netbsd /netbsd: sd1 at scsibus0 target 4 lun 0: <QUANTUM, XP34550W, LXY4> SCSI2 0/direct fixed
Aug 23 13:24:31 netbsd /netbsd: sd1: 4341 MB, 5899 cyl, 10 head, 150 sec, 512 bytes/sect x 8890760 sectors
Aug 23 13:24:31 netbsd /netbsd: sd1: sync (50.0ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
Aug 23 13:24:31 netbsd /netbsd: Kernelized RAIDframe activated
Aug 23 13:24:31 netbsd /netbsd: IPsec: Initialized Security Association Processing.
Aug 23 13:24:31 netbsd /netbsd: uhub1 at uhub0 port 2
Aug 23 13:24:31 netbsd /netbsd: uhub1: Alps Electric Hub in Apple USB Keyboard, class 9/0, rev 1.10/2.10, addr 2
Aug 23 13:24:31 netbsd /netbsd: uhub1: 3 ports with 2 removable, bus powered
Aug 23 13:24:31 netbsd /netbsd: uhidev0 at uhub1 port 1 configuration 1 interface 0
Aug 23 13:24:31 netbsd /netbsd: uhidev0: Alps Electric Apple USB Keyboard, rev 1.10/1.02, addr 3, iclass 3/1
Aug 23 13:24:31 netbsd /netbsd: ukbd0 at uhidev0
Aug 23 13:24:31 netbsd /netbsd: wskbd1 at ukbd0 mux 1
Aug 23 13:24:31 netbsd /netbsd: wskbd1: connecting to wsdisplay0
Aug 23 13:24:31 netbsd /netbsd: uhidev1 at uhub1 port 2 configuration 1 interface 0
Aug 23 13:24:31 netbsd /netbsd: uhidev1: Logitech M4848, rev 1.00/5.02, addr 4, iclass 3/1
Aug 23 13:24:31 netbsd /netbsd: ums0 at uhidev1: 1 button
Aug 23 13:24:31 netbsd /netbsd: wsmouse1 at ums0 mux 0
Aug 23 13:24:31 netbsd /netbsd: boot device: wd0
Aug 23 13:24:31 netbsd /netbsd: root on wd0a dumps on wd0b
Aug 23 13:24:31 netbsd /netbsd: root file system type: ffs
Aug 23 13:24:31 netbsd /netbsd: Hosed component: /dev/sd0a
Aug 23 13:24:31 netbsd /netbsd: raid0: Ignoring /dev/sd0a
Aug 23 13:24:31 netbsd /netbsd: raid0: Component /dev/sd1a being configured at row: 0 col: 1
Aug 23 13:24:31 netbsd /netbsd:          Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
Aug 23 13:24:31 netbsd /netbsd:          Version: 2 Serial Number: 112341 Mod Counter: 158
Aug 23 13:24:31 netbsd /netbsd:          Clean: No Status: 0
Aug 23 13:24:31 netbsd /netbsd: /dev/sd1a is not clean!
Aug 23 13:24:31 netbsd /netbsd: raid0: RAID Level 1
Aug 23 13:24:31 netbsd /netbsd: raid0: Components: /dev/sd0a[**FAILED**] /dev/sd1a
Aug 23 13:24:31 netbsd /netbsd: raid0: Total Sectors: 8890624 (4341 MB)
Aug 23 13:24:31 netbsd /netbsd: raid0: Error re-writing parity!
Aug 23 13:24:31 netbsd /netbsd: IP Filter: v3.4.27 initialized.  Default = pass all, Logging = enabled
Aug 23 13:24:31 netbsd /netbsd: wsdisplay0: screen 1 added (80x25, vt100 emulation)
Aug 23 13:24:31 netbsd /netbsd: wsdisplay0: screen 2 added (80x25, vt100 emulation)
Aug 23 13:24:31 netbsd /netbsd: wsdisplay0: screen 3 added (80x50, vt100 emulation)
Aug 23 13:24:31 netbsd /netbsd: wsdisplay0: screen 4 added (80x25, vt100 emulation)
Aug 23 13:24:30 netbsd savecore: reboot after panic: panic: raidframe error at line 460 file /usr/src/sys/arch/i386/compile/CUSTOM/../../../../dev/raidframe/rf_states.c
Aug 23 13:24:31 netbsd savecore: writing compressed core to /var/crash/netbsd.1.core.gz
Aug 23 13:25:27 netbsd savecore: writing compressed kernel to /var/crash/netbsd.1.gz