NetBSD-Bugs archive

Re: kern/38856: Raidframe reconstruction locks my machine up



The following reply was made to PR kern/38856; it has been noted by GNATS.

From: Matthias Scheler <tron%zhadum.org.uk@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/38856: Raidframe reconstruction locks my machine up
Date: Thu, 5 Jun 2008 22:37:20 +0100

 On 4 Jun 2008, at 14:45, martin%duskware.de@localhost wrote:
 > Soon (sometimes immediately) after I start reconstruction, the machine
 > locks up completely, I can't even break into ddb on the console. This
 > happens within less than 10 minutes reliably on the affected raid set.
 > The other raid (consisting of slightly slower disks, wd0 and wd1, see
 > dmesg below) can be reconstructed. I saw the lockup there too once, but
 > can't reproduce this.
 
 
 I can reproduce this with NetBSD 3.x and 4.0 on this machine:
 
 NetBSD 4.0 (BEAVER) #0: Sun Dec 16 14:36:33 CET 2007
        tron%beaver.core.de@localhost:/usr/src/sys/arch/i386/compile/BEAVER
 total memory = 1023 MB
 avail memory = 1000 MB
 timecounter: Timecounters tick every 10.000 msec
 timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 BIOS32 rev. 0 found at 0xf0e90
 mainbus0 (root)
 cpu0 at mainbus0: apid 0 (boot processor)
 cpu0: Intel Pentium 4 (686-class), 2018.08 MHz, id 0xf24
 cpu0: features 3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
 cpu0: features 3febfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
 cpu0: features 3febfbff<FXSR,SSE,SSE2,SS,HTT,TM>
 cpu0: "Intel(R) Pentium(R) 4 CPU 2.00GHz"
 cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
 cpu0: L2 cache 512 KB 64B/line 8-way
 cpu0: ITLB 4K/4M: 64 entries
 cpu0: DTLB 4K/4M: 64 entries
 cpu0: enabling thermal monitor 1 ... enabled.
 cpu0: calibrating local timer
 cpu0: apic clock running at 100 MHz
 cpu0: 16 page colors
 ioapic0 at mainbus0 apid 2 (I/O APIC)
 ioapic0: pa 0xfec00000, version 20, 24 pins
 acpi0 at mainbus0: Advanced Configuration and Power Interface
 acpi0: using Intel ACPI CA subsystem version 20060217
 acpi0: X/RSDT: OemId <ASUS  ,P4B266  ,42302e31>, AslId <MSFT,31313031>
 acpi0: SCI interrupting at int 9
 acpi0: fixed-feature power button present
 timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
 ACPI-Fast 24-bit timer
 mpacpi: could not get bus number, assuming bus 0
 ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
 acpibut0 at acpi0 (PNP0C0C): ACPI Power Button
 PNP0C01 [System Board] at acpi0 not configured
 PNP0C0F [PCI interrupt link device] at acpi0 not configured
 PNP0C0F [PCI interrupt link device] at acpi0 not configured
 PNP0C0F [PCI interrupt link device] at acpi0 not configured
 PNP0C0F [PCI interrupt link device] at acpi0 not configured
 PNP0A03 [PCI/PCI-X Host Bridge] at acpi0 not configured
 PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
 PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
 PNP0000 [AT Interrupt Controller] at acpi0 not configured
 PNP0200 [AT DMA Controller] at acpi0 not configured
 attimer0 at acpi0 (PNP0100): AT Timer
 attimer0: io 0x40-0x43 irq 0
 PNP0B00 [AT Real-Time Clock] at acpi0 not configured
 pcppi0 at acpi0 (PNP0800)
 pcppi0: io 0x61
 sysbeep0 at pcppi0
 npx0 at acpi0 (PNP0C04)
 npx0: io 0xf0-0xff irq 13
 npx0: reported by CPUID; using exception 16
 fdc0 at acpi0 (PNP0700)
 fdc0: io 0x3f2-0x3f5,0x3f7 irq 6 drq 2
 lpt0 at acpi0 (PNP0401)
 lpt0: io 0x378-0x37f,0x778-0x77b irq 7 drq 3
 com0 at acpi0 (PNP0501-1)
 com0: io 0x3f8-0x3ff irq 4
 com0: ns16550a, working fifo
 com1 at acpi0 (PNP0501-2)
 com1: io 0x2f8-0x2ff irq 3
 com1: ns16550a, working fifo
 pckbc0 at acpi0 (PNP0303): kbd port
 pckbc0: io 0x60,0x64 irq 1
 PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
 pcppi0: attached to attimer0
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0
 pchb0: Intel 82845 Host (rev. 0x04)
 agp0 at pchb0: aperture at 0xfe000000, size 0x800000
 ppb0 at pci0 dev 1 function 0: Intel 82845 AGP (rev. 0x04)
 pci1 at ppb0 bus 1
 pci1: i/o space, memory space enabled
 ppb1 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x05)
 pci2 at ppb1 bus 2
 pci2: i/o space, memory space enabled
 pdcide0 at pci2 dev 9 function 0
 pdcide0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
 pdcide0: bus-master DMA support present
 pdcide0: primary channel configured to native-PCI mode
 pdcide0: using ioapic0 pin 21 (irq 12) for native-PCI interrupt
 atabus0 at pdcide0 channel 0
 pdcide0: secondary channel configured to native-PCI mode
 atabus1 at pdcide0 channel 1
 ex0 at pci2 dev 11 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
 ex0: interrupting at ioapic0 pin 23 (irq 5)
 ex0: MAC address 00:04:76:1a:33:b8
 bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
 bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 vga0 at pci2 dev 12 function 0: S3 Trio32/64 (rev. 0x54)
 wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
 wsmux1: connecting to wsdisplay0
 pcib0 at pci0 dev 31 function 0
 pcib0: Intel 82801BA LPC Interface Bridge (rev. 0x05)
 piixide0 at pci0 dev 31 function 1
 piixide0: Intel 82801BA IDE Controller (ICH2) (rev. 0x05)
 piixide0: bus-master DMA support present
 piixide0: primary channel wired to compatibility mode
 piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
 atabus2 at piixide0 channel 0
 piixide0: secondary channel wired to compatibility mode
 piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
 atabus3 at piixide0 channel 1
 uhci0 at pci0 dev 31 function 2: Intel 82801BA USB Controller (rev. 0x05)
 uhci0: interrupting at ioapic0 pin 19 (irq 11)
 usb0 at uhci0: USB revision 1.0
 uhub0 at usb0
 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uhci1 at pci0 dev 31 function 4: Intel 82801BA USB Controller (rev. 0x05)
 uhci1: interrupting at ioapic0 pin 23 (irq 5)
 usb1 at uhci1: USB revision 1.0
 uhub1 at usb1
 uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub1: 2 ports with 2 removable, self powered
 isa0 at pcib0
 ioapic0: enabling
 timecounter: Timecounter "TSC" frequency 2018112920 Hz quality 800
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 Kernelized RAIDframe activated
 wd0 at atabus0 drive 0: <WDC WD2500SB-01RFA0>
 wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 wd0: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
 wd0: 32-bit data port
 wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 wd0(pdcide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
 wd1 at atabus1 drive 0: <SAMSUNG SP1203N>
 wd1: drive supports 16-sector PIO transfers, LBA48 addressing
 wd1: 111 GB, 232632 cyl, 16 head, 63 sec, 512 bytes/sect x 234493056 sectors
 wd1: 32-bit data port
 wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 wd1(pdcide0:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
 wd2 at atabus2 drive 0: <WDC WD2500SB-01RFA0>
 wd2: drive supports 16-sector PIO transfers, LBA48 addressing
 wd2: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
 wd2: 32-bit data port
 wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 wd2(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
 raid0: RAID Level 1
 raid0: Components: component0[**FAILED**] /dev/wd2a
 raid0: Total Sectors: 490234624 (239372 MB)
 boot device: raid0
 root on raid0a dumps on raid0b
 root file system type: ffs
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)
 wsdisplay0: screen 3 added (80x25, vt100 emulation)
 wsdisplay0: screen 4 added (80x25, vt100 emulation)
 
 The symptoms are exactly the same:
 The machine locks up hard while rebuilding the RAID. No panic, no chance
 to get into the kernel debugger. The problem is very easily reproducible.
 The machine locks up in about 75% of the attempts. It occasionally manages
 to complete the parity rewrite but freezes within the next few days.
 
 The machine is currently running with a broken RAID 1 and has been up for
 82 days without any problems.
 
        Kind regards
 
 -- 
 Matthias Scheler                           http://zhadum.org.uk/
 
 

