NetBSD-Users archive


Re: RAID reconstruction hangs the whole system



On Thu, 5 Dec 2013 21:37:53 +0200
Jarmo Jaakkola <netbsd-users%roskakori.fi@localhost> wrote:

> Recently I have twice had to try to reconstruct a RAID set.  Both
> times led to a system hang needing a hard reset.
> 
> I wrote a whole email about my problem.  Then it occurred to me that
> it might be just a stupid user error, so let's get that out of the way
> first.  From raidctl(8) manual page:
>     "Note as well that RAID 1 sets are currently limited to only 2
>     components."
> 
> Is this still valid (NetBSD 6.1.2)? 

Yes.

> What would this actually mean?

It means you should only use 2 components for a RAID 1 set, as using 3
or more is not currently supported.
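
For illustration, a two-component RAID 1 config file along the lines
of the example in raidctl(8) would look roughly like this.  The device
names are placeholders, not taken from your setup:

    START array
    # numRow numCol numSpare
    1 2 0

    START disks
    # placeholder component names; exactly two for RAID 1
    /dev/wd0e
    /dev/wd1e

    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
    128 1 1 1

    START queue
    fifo 100

The second number on the "START array" line is the component count,
and the "START disks" section should list that many components.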

> Should raidctl barf when trying to create a mirrored set with more
> than two components? 

It probably should, but I'm betting it doesn't.  What does your
RAID config file look like?  What does 'raidctl -s' say for the RAID
set?

> Or would you get problems like I just did?

Probably.... :(

Later...

Greg Oster


> Because the RAID 1 set whose reconstruction causes a system hang was
> created with three components.
> 
> Below is the original email I was going to send describing the problem
> in a bit more detail, in the case that this is not just a PEBKAC.
> 
> --8<--8<--8<--
> 
> The first time was when trying to add an initially missing component
> to a set.  I couldn't get it to work, so I worked around it by
> creating the full set from the beginning.  That worked fine.
> 
> After the set was created I managed to wiggle the connector loose from
> one of the disks and tried to boot with that, getting a failed
> component.  Trying to reconstruct the set I got a system hang again.
> 
> So I would do:
>     # raidctl -a comp dev
>     # raidctl -F absent
> or
>     # raidctl -R comp dev
> to start reconstructing the set.  Then a minute or two afterwards
> the system hangs.  If I look at the output of
>     # raidctl -S dev
> I see
>     1% |*      | ETA 00.00  /
> with the "I'm doing something" indicator scrolling right up until
> the system hangs.  The ETA never updates from 00.00.  Also
>     # iostat -x -w5
> shows almost non-existent disk activity and CPU usage is low too.
> 
> This is a two-core amd64 with NetBSD 6.1.2.  The problematic set is
> a small three-component RAID 1 set for booting.  The rest of those
> disks are used for cgds that form a RAID 5.  Then there are three
> other disks which form yet another cgd + RAID 5.
> 
> Now the question is, how do I go about getting some more information
> about this?  I'm not happy to send just this "it doesn't work"
> as a bug report.
> 
> --8<--8<--8<--
> 
> -- 
> Jarmo Jaakkola

