NetBSD-Users archive


RAID reconstruction hangs the whole system



Recently I have twice had to try to reconstruct a RAID set.  Both
attempts led to a system hang that needed a hard reset.

I wrote a whole email about my problem.  Then it occurred to me that it
might be just a stupid user error, so let's get that out of the way
first.  From the raidctl(8) manual page:
    "Note as well that RAID 1 sets are currently limited to only 2
    components."

Is this still valid (NetBSD 6.1.2)?  What would this actually mean?
Should raidctl barf when trying to create a mirrored set with more than
two components?  Or would you get problems like I just did?  Because
the RAID 1 set whose reconstruction causes a system hang was created
with three components.

Below is the original email I was going to send, describing the problem
in a bit more detail, in case this is not just a PEBKAC.

--8<--8<--8<--

The first time was when I tried to add an initially missing component
to a set.  I couldn't get it to work, so I worked around it by creating
the full set from the beginning.  That works fine.

After the set was created I managed to wiggle the connector loose from
one of the disks and booted in that state, which left me with a failed
component.  When I tried to reconstruct the set, the system hung again.

So I would do:
    # raidctl -a comp dev
    # raidctl -F absent dev
or
    # raidctl -R comp dev
to start reconstructing the set.  Then a minute or two afterwards
the system hangs.  If I look at the output of
    # raidctl -S dev
I see
    1% |*      | ETA 00.00  /
with the "I'm doing something" indicator spinning right up until
the system hangs.  The ETA never updates from 00.00.  Also
    # iostat -x -w5
shows almost non-existent disk activity, and CPU usage is low too.

This is a two-core amd64 machine running NetBSD 6.1.2.  The problematic
set is a small three-component RAID 1 set for booting.  The rest of
those disks is used for cgds that form a RAID 5.  Then there are three
other disks which form yet another cgd + RAID 5.

Now the question is, how do I go about getting some more information
about this?  I'm not happy to send just this "it doesn't work"
as a bug report.
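
One thing I could try, as a rough sketch only, is to log periodic
status samples to a file and sync after each one, so that the last few
samples survive the hard reset.  The log path, the log_sample helper
and the raid0 device name below are placeholders I made up for
illustration.  I use "raidctl -s" here rather than "-S" on the
assumption that it prints the status once and returns.

```shell
#!/bin/sh
# Sketch: append timestamped command output to a log file and flush it
# to disk, so the most recent samples are still there after a reset.

# Hypothetical log location; override by setting LOG in the environment.
LOG=${LOG:-/var/tmp/raid-recon.log}

# log_sample CMD [ARGS...]
# Run CMD, write a timestamp header plus its stdout and stderr to $LOG,
# then call sync so the data is pushed out before the next sample.
log_sample() {
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        "$@" 2>&1
    } >> "$LOG"
    sync
}

# Example loop, left commented out (raid0 is a placeholder device):
#   while :; do
#       log_sample raidctl -s raid0
#       log_sample iostat -x -c 2 -w 1
#       sleep 5
#   done
```

The log should probably live on a disk that is not part of the set
being reconstructed, otherwise the writes may get stuck as well.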

--8<--8<--8<--

-- 
Jarmo Jaakkola

