kern/42904: RaidFrame panic after removal of RAID-1 member
>Number: 42904
>Category: kern
>Synopsis: after removal of a failing RaidFrame RAID-1 member, NetBSD panics
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 28 20:40:00 +0000 2010
>Originator: Louis Guillaume
>Release: NetBSD 5.0_STABLE
>Organization:
>Environment:
System: NetBSD xxx.xxx.xxx 5.0_STABLE NetBSD 5.0_STABLE (GENERIC) #13: Wed Dec 30 14:39:00 EST 2009 louis%xx.xx.xxx@localhost:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
First some background on our setup...
# raidctl -s raid0
Components:
/dev/sd0a: failed
/dev/sd1a: optimal
No spares.
/dev/sd0a status is: failed. Skipping label.
Component label for /dev/sd1a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 20071216, Mod Counter: 280
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 143638784
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
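(Aside: the DIRTY parity status above is expected while a component is failed;
once the set is whole again, the rewrite is just the stock raidctl(8)
invocation:)
# raidctl -P raid0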
# dmesg | grep sd0
sd0 at scsibus0 target 0 lun 0: <ModusLnk, , > disk fixed
sd0: 70136 MB, 78753 cyl, 2 head, 911 sec, 512 bytes/sect x 143638992 sectors
sd0: sync (12.50ns offset 62), 16-bit (160.000MB/s) transfers, tagged queueing
raid0: Components: /dev/sd0a[**FAILED**] /dev/sd1a
# grep smartd.*sd0d /var/log/messages |tail -3
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, opened
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, is SMART capable. Adding to "monitor" list.
Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
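(For context, smartd was watching the raw disk devices. A smartd.conf entry
along these lines -- illustrative, not our exact config -- is enough to
produce the messages above:)
# /etc/smartd.conf: monitor all SMART attributes, mail root on failure
/dev/sd0d -a -m root
/dev/sd1d -a -m root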
So we have a bad disk that needs to be swapped out. I did the following:
o failed the component with "raidctl -f /dev/sd0a raid0"
o shut down
o replaced the disk
o rebooted
o The system now panics right after RaidFrame initializes.
Screen shots can be found at...
ftp://zabrico.com/pub/RaidFrame-Panic-0.jpeg
ftp://zabrico.com/pub/RaidFrame-Panic-1.jpeg
In this case, I had removed the failing drive, so we have sd0 on
scsibus1. That drive normally shows up as sd1 on scsibus1, but that
shouldn't matter to RaidFrame. At any rate, the same thing happens
with a new, blank (identical) disk on scsibus0.
To recover:
o power off
o put the "bad" sd0 back in
o machine boots as normal (the rebuild I'd do next is sketched below)
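For reference, the rebuild I was expecting to do once a working sd0 is in
place is the standard raidctl(8) in-place reconstruction (device names as in
our setup; the disklabel step assumes the new disk takes the same label as
the surviving member):
(copy the survivor's disklabel onto the new disk)
# disklabel sd1 > /tmp/sd1.label
# disklabel -R -r sd0 /tmp/sd1.label
(fail and rebuild in place onto the replaced component)
# raidctl -R /dev/sd0a raid0
# raidctl -s raid0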
>How-To-Repeat:
Not sure if this will be repeatable on other RAIDframe machines, but here's
what causes it to happen:
o Set up a RAID-1 device (a minimal configuration sketch follows this list)
o Fail one component with "raidctl -f /dev/xx0a raid0"
o shut down
o remove the failed component
o start up
o system panics right after "Kernelized RaidFrame activated".
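For step one, here's a minimal RAID-1 configuration sketch (disk names and
config path are hypothetical, and the a-partitions are assumed to already be
labeled with fstype RAID; the layout and queue values mirror the component
label shown above):
# cat > /tmp/raid0.conf <<EOF
START array
1 2 0
START disks
/dev/wd0a
/dev/wd1a
START layout
128 1 1 1
START queue
fifo 100
EOF
# raidctl -C /tmp/raid0.conf raid0
# raidctl -I 20100228 raid0
# raidctl -iv raid0
# raidctl -A yes raid0
(-A yes marks the set for autoconfiguration, which matters here since the
panic happens during autoconfig at boot.)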
>Fix:
See Greg Oster's analysis in this thread...
http://mail-index.netbsd.org/netbsd-users/2010/02/26/msg005746.html
Not sure if the actual fix is there, but...