tech-kern archive
Re: Problems with raidframe under NetBSD-5.1/i386
On Thu, 20 Jan 2011 17:28:21 -0800
buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow) wrote:
> hello. I got sidetracked from this problem for a while, but
> I'm back to looking at it as I have time.
> I think I may have been barking up the wrong tree with
> respect to the problem I'm having reconstructing to raidframe disks
> with wedges on the raid sets. Adding a little extra info to the
> error messages yields:
>
> raid2: initiating in-place reconstruction on column 4
> raid2: Recon write failed (status 30 (0x1e))!
> raid2: reconstruction failed.
>
> If that status number, taken from the second argument of
> rf_ReconWriteDoneProc(), is an error from /usr/include/sys/errno.h,
> then I'm getting EROFS when I try to reconstruct the disk.
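For reference, status 30 (0x1e) does map to EROFS in
/usr/include/sys/errno.h; a trivial userland check of that mapping,
separate from any of the kernel code under discussion:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* EROFS is errno 30 on NetBSD; strerror() gives its text. */
	printf("errno 30 = %s (EROFS = %d)\n", strerror(30), EROFS);
	return 0;
}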
Hmmm... strange...
> Wouldn't
> that seem to imply that raidframe is trying to write over some
> protected portion of one of the components, probably the one I can't
> reconstruct to? Each of the components has a BSD disklabel on it, and
> I know that the raid set actually begins 64 sectors from the start of
> the partition in which the raid set resides. However, is a similar
> "set-back" reserved at the end of the raid? That is, does the raid set
> extend all the way to the end of its partition, or does it leave some
> space unused at the end as well?
No, it doesn't. The RAID set will use the remainder of the component,
but only up to a multiple of the stripe width (that is, the RAID set
will always end on a complete stripe).
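As a rough illustration of that rounding (the numbers are made up, and
the real calculation uses the layout parameters from the raid
configuration, so treat this as a sketch only):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
	uint64_t component_sectors = 976773104;	/* one component (made up) */
	uint64_t sect_per_su = 32;		/* sectors per stripe unit (made up) */
	uint64_t data_cols = 4;			/* data columns in the set (made up) */

	/* Each component contributes a whole number of stripe units... */
	uint64_t used = (component_sectors / sect_per_su) * sect_per_su;

	/* ...so the set as a whole always ends on a complete stripe. */
	printf("used per component: %" PRIu64 " of %" PRIu64 " sectors\n",
	    used, component_sectors);
	printf("left unused at the tail: %" PRIu64 " sectors\n",
	    component_sectors - used);
	printf("data sectors in the set: %" PRIu64 "\n", used * data_cols);
	return 0;
}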
> Here's the thought. I notice when
> I was reading through the wedge code, that there's a reference to
> searching for backup gpt tables and that one of the backups is stored
> at the end of the media passed to the wedge discovery code. Since
> the broken component is the last component in the raid set, I wonder
> if the wedge discovery code is marking the sectors containing the gpt
> table at the end of the raid set as protected, but for the disk
> itself, rather than the raid set? I want to say that this is only a
> theory at the moment, based on a quick diagnostic enhancement to the
> error messages, but I can't think of another reason why I'd be
> getting that error. I'm going to be in and out of the office over the
> next week, but I'll try to capture the block numbers being written
> when the error occurs. I think I can do that with a debug kernel I
> have built for the purpose. Again,
> this problem exists under 5.0, not just 5.1, so it predates Jed's
> changes. If anyone has any other thoughts as to why I'd be getting
> EROFS on a raid component when trying to reconstruct to it, but not
> when I create the raid, I'm all ears.
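For what it's worth, GPT keeps its backup header in the very last LBA of
whatever "media" the discovery code is handed, with the backup partition
entries just below it, so which device the end is measured against is
exactly what matters for this theory. A toy calculation (sizes made up):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
	/* Made-up sizes, in sectors: the whole disk vs. a raid set on it. */
	uint64_t disk_lbas = 1250263728;
	uint64_t raid_lbas = 1250263040;

	/* Per the GPT layout: primary header at LBA 1, backup at the last LBA. */
	printf("backup GPT header, seen from the disk:     LBA %" PRIu64 "\n",
	    disk_lbas - 1);
	printf("backup GPT header, seen from the raid set: LBA %" PRIu64 "\n",
	    raid_lbas - 1);
	return 0;
}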
So when one builds a regular filesystem on a wedge, does one end up with
the same problem with 'data' at the end of the wedge? If one does a dd
to the wedge, does it report write errors before the end of the wedge?
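Something along these lines would exercise the tail of a wedge directly
(the device name and size here are placeholders, and it overwrites data,
so only point it at a scratch wedge):

#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *dev = "/dev/rdk3";	/* placeholder scratch wedge */
	const off_t sectsize = 512;
	const off_t nsect = 104857600;	/* wedge size in sectors, from dkctl */
	char buf[512];
	off_t off;
	int fd;

	memset(buf, 0, sizeof(buf));
	if ((fd = open(dev, O_WRONLY)) == -1)
		err(1, "open %s", dev);

	/* Try the last 16 sectors; an EROFS or short write shows up here. */
	for (off = (nsect - 16) * sectsize; off < nsect * sectsize;
	    off += sectsize)
		if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
			warn("write at sector %lld failed",
			    (long long)(off / sectsize));

	close(fd);
	return 0;
}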
I really need to get my test box up-to-speed again, but that's going to
have to wait a few more weeks...
Later...
Greg Oster
> On Jan 7, 3:22pm, Brian Buhrow wrote:
> } Subject: Re: Problems with raidframe under NetBSD-5.1/i386
> } 	hello Greg. Regarding problem 1, the inability to reconstruct
> } disks in raid sets with wedges in them, I confess I don't understand
> } the vnode stuff entirely, but rf_getdisksize() in rf_netbsdkintf.c
> } looks suspicious to me. I'm a little unclear, but it looks like it
> } tries to get the disk size a number of ways, including by checking
> } for a possible wedge on the component. I wonder if that's what's
> } sending the reference count too high?
> } -thanks
> } -Brian
> }
> } On Jan 7, 2:17pm, Greg Oster wrote:
> } } Subject: Re: Problems with raidframe under NetBSD-5.1/i386
> } } On Fri, 7 Jan 2011 05:34:11 -0800
> } } buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow) wrote:
> } }
> } } > 	hello. OK. Still more info. There seem to be two bugs here:
> } } >
> } } > 1. Raid sets with gpt partition tables in the raid set are not
> } } > able to reconstruct failed components because, for some reason,
> } } > the failed component is still marked open by the system even
> } } > after the raidframe code has marked it dead. Still looking into
> } } > the fix for that one.
> } }
> } } Is this just with autoconfig sets, or with non-autoconfig sets too?
> } } When RF marks a disk as 'dead', it only does so internally, and
> } } doesn't write anything to the 'dead' disk. It also doesn't even try
> } } to close the disk (maybe it should?). Where it does try to close
> } } the disk is when you do a reconstruct-in-place -- there, it will
> } } close the disk before re-opening it...
> } }
> } } rf_netbsdkintf.c:rf_close_component() should take care of closing a
> } } component, but does something Special need to be done for wedges
> } } there?
> } }
> } } > 2. Raid sets with gpt partition tables on them cannot be
> } } > unconfigured and reconfigured without rebooting. This is because
> } } > dkwedge_delall() is not called during the raid shutdown process.
> } } > I have a patch for this issue which seems to work fine. See the
> } } > following output:
> } } [snip]
> } } >
> } } > Here's the patch. Note that this is against NetBSD-5.0 sources,
> } } > but it should be clean for 5.1, and, I'm guessing, -current as
> } } > well.
> } }
> } } Ah, good! Thanks for your help with this. I see Christos has
> } } already committed your changes too. (Thanks, Christos!)
> } }
> } } Later...
> } }
> } } Greg Oster
> } >-- End of excerpt from Greg Oster
> }
> }
> >-- End of excerpt from Brian Buhrow
>
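The dkwedge_delall() change described in the quoted excerpt amounts to
deleting any wedges sitting on the raid device before the set itself is
torn down, so a later reconfigure starts from a clean slate. A rough
sketch of the shape of such a call (an illustration only, not the patch
that was committed; the softc and member names are assumptions based on
rf_netbsdkintf.c of that era):

#include <sys/param.h>
#include <sys/disk.h>

/*
 * Illustration only: before unconfiguring the raid device, delete any
 * wedges that were discovered on top of it.  "struct raid_softc" and
 * "sc_dkdev" are assumed names for the RAIDframe softc and its
 * embedded struct disk.
 */
static void
raid_wedge_teardown(struct raid_softc *rs)
{
	dkwedge_delall(&rs->sc_dkdev);
}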
Later...
Greg Oster