NetBSD-Users archive


wedges and raidframe recovery



I'm setting up three 2TB disks in a RAID-5 array, on amd64/5.1_rc2.  During 
testing, I ran into what I suspect is a serious problem.

Because of the size of the disks, as best I can tell I have to use wedges instead 
of a traditional disklabel.  After poking around and fighting it a bit, I managed 
to set up dk0, dk1, and dk2 as three wedges, each covering an entire disk.  I then 
successfully configured the raid1 device across those three dk devices, created a 
fourth wedge, dk3, covering all of raid1, and was able to build a file system on 
it (a rough sketch of the commands follows the status output below):

# raidctl -s raid1
Components:
            /dev/dk0: optimal
            /dev/dk1: optimal
            /dev/dk2: optimal
No spares.
Component label for /dev/dk0:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 3
   Version: 2, Serial Number: 2010061200, Mod Counter: 187
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3907028992
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid1
Component label for /dev/dk1:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 3
   Version: 2, Serial Number: 2010061200, Mod Counter: 187
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3907028992
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid1
Component label for /dev/dk2:
   Row: 0, Column: 2, Num Rows: 1, Num Columns: 3
   Version: 2, Serial Number: 2010061200, Mod Counter: 187
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3907028992
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid1
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
# df
Filesystem 1024-blocks       Used      Avail %Cap Mounted on
/dev/raid0a   302652118   29210740  258308774  10% /
kernfs                1          1          0 100% /kern
ptyfs                 1          1          0 100% /dev/pts
procfs                4          4          0 100% /proc
/dev/dk3     3787858122          2 3598465214   0% /shared

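For reference, the setup went roughly as follows.  The underlying disk names 
(wd0-wd2), the config file name, and the wedge names and sizes below are 
placeholders rather than an exact transcript, but the RAIDframe layout matches 
the component labels above.

One whole-disk wedge per drive (repeat for wd1 and wd2):

# dkctl wd0 addwedge raiddata0 0 <disk-size-in-sectors> raidframe

A RAIDframe configuration file, say /etc/raid1.conf:

START array
# numRow numCol numSpare
1 3 0

START disks
/dev/dk0
/dev/dk1
/dev/dk2

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
64 1 1 5

START queue
fifo 100

Configure and label the set, initialize parity, enable autoconfiguration, and 
then put a wedge and a file system on top of raid1:

# raidctl -C /etc/raid1.conf raid1
# raidctl -I 2010061200 raid1
# raidctl -iv raid1
# raidctl -A yes raid1
# dkctl raid1 addwedge shared 0 <raid-size-in-sectors> ffs
# newfs /dev/rdk3
# mount /dev/dk3 /shared
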
During some testing that involved removing cables, dk2 was marked "failed" by 
the RAIDframe code.  Ah -- a perfect opportunity to test recovery.  The problem 
is that I couldn't make it work: no matter what I tried, I could not induce the 
kernel to start reconstruction onto that wedge.  dmesg showed complaints about 
being unable to open the device, with error code 16 -- EBUSY.

First -- should this have worked?  Second -- has anyone ever tried this sort 
of configuration?  Third -- my suspicion is that I was getting EBUSY because of 
the multiple layers of wedges, which leave the underlying disks looking busy.  But 
if that's correct, there's no way to recover, which is not acceptable.
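For anyone who wants to poke at this, the wedge layering can at least be 
inspected with something like the following (again using my placeholder device 
names from the sketch above):

# dkctl wd2 listwedges      is the whole-disk wedge still configured on the disk?
# dkctl dk2 getwedgeinfo    what does the dk2 wedge currently map to?
# raidctl -s raid1          component status as RAIDframe sees it
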

                --Steve Bellovin, http://www.cs.columbia.edu/~smb
