tech-kern archive


Re: Problems with raidframe under NetBSD-5.1/i386



        Hello.  OK, I have more information; perhaps this is a known issue.
If not, I can file a bug.

        The problem seems to be that if you partition a raid set with gpt
instead of disklabel and a component of that raid set then fails, the
underlying component is held open even after raidframe declares it dead.
Thus, when you ask raidframe to reconstruct onto the dead component, it
can't open the component because the component is busy.

Here's how to repeat:

1.  Set up two raid sets, one with a standard BSD disklabel on the raid set
and one with a gpt partition table.  (A rough sketch of such a setup is
below, after step 5.)  I don't know whether it matters, but all of the
underlying disk components are partitioned with BSD disklabels, not gpt
partition tables.

2.  Pick one of the components from each raid set and try the command:

dd if=/dev/<component> of=/dev/null count=5000

You should get a device busy error on each command.

3.  Soft fail one of the components in each of the raid sets.

raidctl -f /dev/<component> raidx

4.  Repeat step 2.  This time the dd will succeed on the raid set with a
disklabel, but you'll get the same busy error on the raid set with the gpt
partition table.

5.  Try reconstructing on each raid set:

raidctl -R /dev/<component> raidx

You'll have success on the raid set with BSD disklabels, but not on the one
with gpt partitions.
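
To make step 1 concrete, here is a rough sketch of the kind of setup I mean
(the component names, serial numbers, and file names below are made up, and
it uses a two-disk RAID 1 just to keep the sketch short; the real sets here
are RAID 5):

# raid0.conf -- the set that gets a BSD disklabel
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20a
/dev/sd21a

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100

raidctl -C raid0.conf raid0
raidctl -I 2010090101 raid0
raidctl -iv raid0
disklabel -e -I raid0

The second set is identical (raid1.conf, different components) except that
it gets a gpt partition table instead of a disklabel:

raidctl -C raid1.conf raid1
raidctl -I 2010090102 raid1
raidctl -iv raid1
gpt create raid1
gpt add -t ffs raid1

Once both sets are up, component states and reconstruction progress can be
checked with raidctl -s raidx and raidctl -S raidx.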

        This might be related, or it might not, but if I unconfigure the raid
set with the gpt partition table on it and then reconfigure it, I get the
following output.  It looks like the dk device on top of the raid set
didn't get detached when the raid set was unconfigured.  The initial
boot-time output looks like:

raid2: RAID Level 5
raid2: Components: /dev/sd8a /dev/sd3a /dev/sd9a /dev/sd4a 
/dev/sd10a[**FAILED**]
raid2: Total Sectors: 573495040 (280026 MB)
raid2: GPT GUID: f8b345ac-c74f-11df-806c-001e68041c80
dk1 at raid2: f8b345ca-c74f-11df-806c-001e68041c80
dk1: 573494973 blocks at 34, type: ffs

And the reconfigure looks like:

raid2: Summary of serial numbers:
2009081302 5
raid2: Summary of mod counters:
320 4
209 1
Hosed component: /dev/sd10a
raid2: Component /dev/sd8a being configured at col: 0
         Column: 0 Num Columns: 5
         Version: 2 Serial Number: 2009081302 Mod Counter: 320
         Clean: Yes Status: 0
raid2: Component /dev/sd3a being configured at col: 1
         Column: 1 Num Columns: 5
         Version: 2 Serial Number: 2009081302 Mod Counter: 320
         Clean: Yes Status: 0
raid2: Component /dev/sd9a being configured at col: 2
         Column: 2 Num Columns: 5
         Version: 2 Serial Number: 2009081302 Mod Counter: 320
         Clean: Yes Status: 0
raid2: Component /dev/sd4a being configured at col: 3
         Column: 3 Num Columns: 5
         Version: 2 Serial Number: 2009081302 Mod Counter: 320
         Clean: Yes Status: 0
raid2: Ignoring /dev/sd10a
raid2: allocating 50 buffers of 32768 bytes.
raid2: RAID Level 5
raid2: Components: /dev/sd8a /dev/sd3a /dev/sd9a /dev/sd4a 
/dev/sd10a[**FAILED**]
raid2: Total Sectors: 573495040 (280026 MB)
raid2: GPT GUID: f8b345ac-c74f-11df-806c-001e68041c80
raid2: wedge named 'f8b345ca-c74f-11df-806c-001e68041c80' already exists, 
trying 'f8b345ca-c74f-11df-806c-001e68041c80'
raid2: wedge named 'f8b345ca-c74f-11df-806c-001e68041c80' already exists, 
manual intervention required
raid2: initiating in-place reconstruction on column 4
raid2: Recon write failed!
raid2: reconstruction failed.
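
For reference, the unconfigure/reconfigure sequence I'm describing is just
the usual one (raid2.conf here is a stand-in for my actual config file),
plus a dkctl check afterwards to see what wedges are attached:

raidctl -u raid2
raidctl -c raid2.conf raid2
dkctl raid2 listwedges

Presumably the "manual intervention" the kernel is asking for would be
something along the lines of "dkctl raid2 delwedge dk1", but that wedge
should have been detached automatically when the set was unconfigured.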

