NetBSD-Users archive


Re: RAID reconstruction hangs the whole system



On Thu, Dec 05, 2013 at 04:32:55PM -0600, Greg Oster wrote:
> On Thu, 5 Dec 2013 23:30:06 +0200
> Jarmo Jaakkola <netbsd-users%roskakori.fi@localhost> wrote:
> > On Thu, Dec 05, 2013 at 02:04:09PM -0600, Greg Oster wrote:
> > > On Thu, 5 Dec 2013 21:37:53 +0200
> > > Jarmo Jaakkola <netbsd-users%roskakori.fi@localhost> wrote:
> If you check your disklabels and such, I think what you'll see is the
> size of your RAID 1 set is actually about 2x what it should be.... 

Actually, it is the size I was expecting: the size of a single component
minus 2*64 sectors.  I hope I would have noticed if the size had been
larger than I expected ;D

    # for i in 1 3 4; do disklabel wd${i} | grep ' a:'; done
     a: 131072 2048 RAID
     a: 131072 2048 RAID
     a: 131072 2048 RAID
    # disklabel raid0 | grep 'total sectors'
    total sectors: 130944
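
As a quick sanity check of that arithmetic, here is a minimal shell
sketch.  The variable names are only illustrative, and the 2*64 figure
is simply the one quoted above, not something read back from RAIDframe:

```shell
# Size of one component partition, in sectors (from the disklabel output).
component_sectors=131072
# Sectors assumed to be lost per component (the 2*64 mentioned above).
reserved_sectors=$((2 * 64))
# Size expected for a RAID 1 set built from such components.
raid_sectors=$((component_sectors - reserved_sectors))
echo "expected total sectors: ${raid_sectors}"
```

The result can be compared by eye with the 'total sectors' line that
disklabel reports for raid0.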

> > The RAID set configured and seems to work just fine except for
> > the reconstruction problems.
> 
> I'm surprised it did, as it's technically missing the '4th component'
> that it would need to work properly....   Essentially your wd4a is not
> mirrored anywhere, and doesn't have anywhere to rebuild to -- and it
> might be this later fact that is causing things to hiccup when you try
> to rebuild.....

I wonder if this also caused the other problem I remember having: I was
not able to remove a hot spare once it was added.  I.e. "raidctl -r" did
nothing.  I'll try to confirm that one way or another when I start fixing
this muck up.

> Haven't looked at all the relevant code, but at least
> the RAID 1 config bits in the kernel don't seem to check to make sure
> there are an even number of components provided, and I'm betting
> raidctl doesn't do so either :(  (So ya it 'worked', but wasn't really
> providing the RAID 1 like you were expecting :(  At a minimum I should
> have added more error checking to enforce the even number of components
> requirement...

I'll change to using only two components then.  I was going to submit
a PR for this as a TODO for you (I've gotten the idea that you work on
RAIDframe).  I found that this issue has been reported before as #45162:
    http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=45162
I'd say the severity could be raised from "non-critical". :)

Thank you very much for your help!

-- 
Jarmo Jaakkola

