NetBSD-Users archive


Re: RaidFrame Raid-1 problem (can't ditch a failing disk)



On Fri, 26 Feb 2010 01:12:12 -0500
Louis Guillaume <louis%zabrico.com@localhost> wrote:

> Hi!
> 
> I have a strange problem replacing a drive in a RAID-1 RAIDframe set.
> Here's some info:
> 
> # uname -mrs
> NetBSD 5.0_STABLE i386
> 
> # raidctl -s raid0
> Components:
>             /dev/sd0a: failed
>             /dev/sd1a: optimal
> No spares.
> /dev/sd0a status is: failed.  Skipping label.
> Component label for /dev/sd1a:
>     Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
>     Version: 2, Serial Number: 20071216, Mod Counter: 280
>     Clean: No, Status: 0
>     sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
>     Queue size: 100, blocksize: 512, numBlocks: 143638784
>     RAID Level: 1
>     Autoconfig: Yes
>     Root partition: Yes
>     Last configured as: raid0
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> 
> # dmesg | grep sd0
> sd0 at scsibus0 target 0 lun 0: <ModusLnk, , > disk fixed
> sd0: 70136 MB, 78753 cyl, 2 head, 911 sec, 512 bytes/sect x 143638992 sectors
> sd0: sync (12.50ns offset 62), 16-bit (160.000MB/s) transfers, tagged queueing
> raid0: Components: /dev/sd0a[**FAILED**] /dev/sd1a
> 
> # grep smartd.*sd0d /var/log/messages |tail -3
> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, opened
> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, is SMART capable. Adding to "monitor" list.
> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
> 
> 
> 
> So we have a bad disk and I need to swap it out. I did the following:
> 
>    o failed the component with "raidctl -f /dev/sd0a raid0"
>    o shut down
>    o replaced the disk
>    o rebooted
>    o Now the system panics right after raidframe initializes. Sorry,
>      I don't have the exact messages, but it's all raidframe stuff.
>      Maybe I'll have to take a photo or something. "reboot 0x104"
>      didn't seem to work.
>    o power off
>    o put the original "bad" sd0 back in
>    o machine boots as normal
> 
> So what gives? I verified that I'm removing the correct disk. No 
> question; the hardware agrees, the LSI Logic bios display agrees and the 
> scsibus/devices all agree that I'm removing the correct drive.
> 
> I also tried removing the drive and not replacing it with a new one. 
> Still no luck there.
> 
> Any help would be great!
 
Since what you're doing seems to be correct, I think we're going to need
a photo or backtrace or whatever of the panic in order to figure out
what's gone wrong :(

Later...

Greg Oster

