NetBSD-Users archive
Re: RaidFrame Raid-1 problem (can't ditch a failing disk)
On Sun, 28 Feb 2010 01:49:53 -0500
Louis Guillaume <louis%zabrico.com@localhost> wrote:
> On 2/26/10 8:39 AM, Greg Oster wrote:
> > On Fri, 26 Feb 2010 01:12:12 -0500
> > Louis Guillaume<louis%zabrico.com@localhost> wrote:
> >
> >> Hi!
> >>
> >> I have a strange problem replacing a drive from a RAID-1 RaidFrame set.
> >> Here's some info:
> >>
> >> # uname -mrs
> >> NetBSD 5.0_STABLE i386
> >>
> >> # raidctl -s raid0
> >> Components:
> >> /dev/sd0a: failed
> >> /dev/sd1a: optimal
> >> No spares.
> >> /dev/sd0a status is: failed. Skipping label.
> >> Component label for /dev/sd1a:
> >> Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
> >> Version: 2, Serial Number: 20071216, Mod Counter: 280
> >> Clean: No, Status: 0
> >> sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
> >> Queue size: 100, blocksize: 512, numBlocks: 143638784
> >> RAID Level: 1
> >> Autoconfig: Yes
> >> Root partition: Yes
> >> Last configured as: raid0
> >> Parity status: DIRTY
> >> Reconstruction is 100% complete.
> >> Parity Re-write is 100% complete.
> >> Copyback is 100% complete.
> >>
> >> # dmesg | grep sd0
> >> sd0 at scsibus0 target 0 lun 0:<ModusLnk, ,> disk fixed
> >> sd0: 70136 MB, 78753 cyl, 2 head, 911 sec, 512 bytes/sect x 143638992 sectors
> >> sd0: sync (12.50ns offset 62), 16-bit (160.000MB/s) transfers, tagged queueing
> >> raid0: Components: /dev/sd0a[**FAILED**] /dev/sd1a
> >>
> >> # grep smartd.*sd0d /var/log/messages |tail -3
> >> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, opened
> >> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, is SMART capable. Adding to "monitor" list.
> >> Feb 26 00:43:04 thoth smartd[296]: Device: /dev/sd0d, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
> >>
> >>
> >>
> >> So we got a bad disk and I have to change it out. So I did the following:
>
> >> Any help would be great!
> >
> > Since what you're doing seems to be correct, I think we're going to need
> > a photo or backtrace or whatever of the panic in order to figure out
> > what's gone wrong :(
> >
> > Later...
> >
> > Greg Oster
>
>
> Ok - I was afraid of that. The problem is 100% reproducible, though, so
> it's easy to do. Here are the screenshots:
>
> ftp://zabrico.com/pub/RaidFrame-Panic-0.jpeg
> ftp://zabrico.com/pub/RaidFrame-Panic-1.jpeg
>
> In this case, I had removed the failing drive, so we have sd0 on
> scsibus1. This drive normally shows up as sd1 on scsibus1, but IIRC that
> doesn't matter to RaidFrame, right?
Nope.
> At any rate, the same thing happens
> with a new blank (identical) disk in scsibus0.
The issue is that the rf_parity_map.c bits aren't checking to see if a
component is valid before attempting to get (and then use!) a component
label. I'll see if I can whip up a patch for you to test, if Jed
doesn't beat me to it... (I missed this issue when I was looking over
the paritymap stuff... :( )
This will also be a critical patch to get pulled up for NetBSD 5.1...
Later...
Greg Oster