tech-kern archive


Re: patch for raidframe and non 512 byte sector devices



        Hello.  Following up on the first question in this thread, see below
with comments and questions.

On Nov 8, 10:41am, Greg Oster wrote:
} > 1. Raidframe autoconfigure on raw disks.
} >     From what I can tell, raidframe can't autoconfigure on a disk
} > unless the disk has either a BSD disklabel, or a gpt wedge and the
} > raidframe lives inside the defined wedge or in a BSD partition.
} > However, it is possible to configure raidframe on a raw disk without
} > such a disklabel or gpt table. My thought was to teach raidframe a
} > third way of autoconfiguring its components.  Namely, using the same
} > trick the boot blocks use to boot off of raid1 partitions.  That is,
} > if there is no disklabel containing a raid partition, or no wedge
} > containing one, seek to the offset where the raid label would go on a
} > raw disk and see if it exists. If enough labels containing the right
} > data exist, and the raid is set to autoconfigure, then configure a
} > raid set. Is there a reason this hasn't been done already?  Are there
} > compelling reasons not to do this that I haven't thought of?  It
} > seems like a simple change, but I haven't actually done more than
} > glance at the code as yet, so I can't  be  sure it's as trivial as it
} > sounds.
} 
} I think the only reason it hasn't been done is that a) I never thought
} of it and b) no-one else has written the code :)
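
For illustration, the probe described in the quoted question can be sketched in userland C. This is a minimal sketch, not the RAIDframe code: the struct `mini_label`, the function `probe_raw_label`, and the constants `LABEL_OFFSET` and `LABEL_VERSION` are hypothetical stand-ins, and the real label layout and offset are whatever the RAIDframe headers define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch only; not taken from the RAIDframe headers. */
#define LABEL_OFFSET  16384u
#define LABEL_VERSION 2

/* Hypothetical, simplified label; the real component label has more fields. */
struct mini_label {
	int32_t version;
	int32_t serial_number;
	int32_t column;
	int32_t num_columns;
	int32_t autoconfigure;
};

/*
 * Look for a plausible label at LABEL_OFFSET in an in-memory disk image.
 * Returns 1 and fills *out on success, 0 if the image is too short or the
 * bytes at that offset do not look like a label.
 */
static int
probe_raw_label(const uint8_t *disk, size_t disk_len, struct mini_label *out)
{
	assert(disk != NULL && out != NULL);

	if (disk_len < LABEL_OFFSET + sizeof(*out))
		return 0;
	memcpy(out, disk + LABEL_OFFSET, sizeof(*out));

	if (out->version != LABEL_VERSION)
		return 0;
	if (out->num_columns <= 0 || out->column < 0 ||
	    out->column >= out->num_columns)
		return 0;
	return 1;
}
```

A caller would run this once per disk that has neither a raid partition nor a raid wedge, and then apply the usual "enough matching labels, autoconfigure set" test to the labels it collected.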

        Hello.  Below is a patch which implements this idea.  I've tested it
on systems with raids configured on raw disks, where autoconfigure didn't
work before this patch, on systems with existing raid sets inside BSD
disklabels, and on systems without any raid configured at all.  I have not
yet booted on a system with raid components configured inside gpt wedges.
        All works as expected, except that there is a side effect on the
systems with raid components configured on raw disks.  After the raid set
is autoconfigured, opendisk() fails with EBUSY (error 16 in the output
below), which is expected.  This doesn't seem to do any harm, except that
it creates a lot of noise in the dmesg output.  Also, one thought I have is
that in the event of a component failure, it might not be possible to
re-open the disk to replace it without rebooting.  This depends, I guess,
on whether raidframe closes a device when it fails, and whether the system
will notice that the disk has been closed and can be re-opened.
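
The number in parentheses in the opendisk messages is the errno value, and 16 is EBUSY on NetBSD. As one hedged idea for reducing the noise, a caller could treat EBUSY as an expected result and log only other failures. The helpers below, `open_failure_is_noise` and `format_open_failure`, are hypothetical and are not part of the patch or of the kernel:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* EBUSY just means something (here, raidframe) already holds the disk open. */
static int
open_failure_is_noise(int error)
{
	return error == EBUSY;
}

/*
 * Build the log line for a failed open.  Returns 1 if a message was
 * written to buf, 0 if the failure was judged not worth logging.
 */
static int
format_open_failure(char *buf, size_t len, const char *dev, int error)
{
	assert(buf != NULL && dev != NULL && len > 0);

	if (open_failure_is_noise(error)) {
		buf[0] = '\0';
		return 0;
	}
	snprintf(buf, len, "opendisk: can't open dev %s (%d)", dev, error);
	return 1;
}
```

This only hides the message; it does not answer the question of whether a failed component can be re-opened later.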
        Do you have any thoughts about this patch, and ways to mitigate the
noise it generates?  One thought I had was that if you could tell the
difference between a real label, i.e. one that came from the disk, versus
one that was faked up by the system when you asked for disk parameters, you
could fail the configuration test in the first case, and use the faked up
'a' partition in the second case.
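
The distinction suggested above can be written down as a small decision function. This is only a sketch of the idea: the enum names and `choose_raw_disk_action` are hypothetical, and it assumes the kernel could report whether a label was read from the disk or fabricated, which is exactly the open question.

```c
#include <assert.h>

/* Where the disklabel came from (assumed to be knowable; see above). */
enum label_origin { LABEL_FROM_DISK, LABEL_FABRICATED };

/* What to do with a disk when looking for a raw-disk raid component. */
enum raw_action { RAW_SKIP, RAW_USE_FAKE_A };

static enum raw_action
choose_raw_disk_action(int rf_part_found, enum label_origin origin)
{
	assert(origin == LABEL_FROM_DISK || origin == LABEL_FABRICATED);

	/* A component was already found in a wedge or a disklabel partition. */
	if (rf_part_found)
		return RAW_SKIP;
	/* Real label without a raid partition: fail the configuration test. */
	if (origin == LABEL_FROM_DISK)
		return RAW_SKIP;
	/* Fabricated label: try the faked-up 'a' partition. */
	return RAW_USE_FAKE_A;
}
```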
        Below is the partial output of dmesg with the patch applied, as well
as the output of raidctl -s raid0.  At the end of all this is the patch
itself.
        Any thoughts or suggestions you have about this patch, and what it
might take to make it tree-worthy would be greatly appreciated.
-thanks
-Brian


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.0_STABLE (RBL) #2: Mon Nov  8 14:53:08 PST 2010
        buhrow%arathorn.via.net@localhost:/usr/src/sys/arch/i386/compile/RBL
total memory = 3327 MB
avail memory = 3258 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Supermicro X6DH8-XB (0123456789)

...

3ware 9000 series: (rev. 0x00)
twa0 at pci3 dev 1 function 0: 3ware Apache
twa0: interrupting at ioapic2 pin 0
twa0: 8 ports, Firmware FE9X 2.04.00.003, BIOS BE9X 2.03.01.047
twa0: Monitor BL9X 2.02.00.001, PCB Rev 019 , Achip 3.20    , Pchip 1.50    
twa0: port 0: Hitachi HDS722020ALA330                  1907729 MB
twa0: port 1: Hitachi HDS722020ALA330                  1907729 MB
twa0: port 4: Hitachi HDS722020ALA330                  1907729 MB
twa0: port 5: Hitachi HDS722020ALA330                  1907729 MB
twa0: 3ware   Logical Disk 00 1.00
ld0 at twa0 unit 0
ld0: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa0: 3ware   Logical Disk 01 1.00
ld1 at twa0 unit 1
ld1: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa0: 3ware   Logical Disk 02 1.00
ld2 at twa0 unit 2
ld2: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa0: 3ware   Logical Disk 03 1.00
ld3 at twa0 unit 3
ld3: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors

...

3ware 9000 series: (rev. 0x00)
twa1 at pci9 dev 1 function 0: 3ware Apache
twa1: interrupting at ioapic4 pin 0
twa1: 8 ports, Firmware FE9X 2.04.00.003, BIOS BE9X 2.03.01.047
twa1: Monitor BL9X 2.02.00.001, PCB Rev 019 , Achip 3.20    , Pchip 1.50    
twa1: port 0: Hitachi HDS722020ALA330                  1907729 MB
twa1: port 1: Hitachi HDS722020ALA330                  1907729 MB
twa1: port 4: Hitachi HDS722020ALA330                  1907729 MB
twa1: port 5: Hitachi HDS722020ALA330                  1907729 MB
twa1: 3ware   Logical Disk 00 1.00
ld4 at twa1 unit 0
ld4: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa1: 3ware   Logical Disk 01 1.00
ld5 at twa1 unit 1
ld5: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa1: 3ware   Logical Disk 02 1.00
ld6 at twa1 unit 2
ld6: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa1: 3ware   Logical Disk 03 1.00
ld7 at twa1 unit 3
ld7: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors

...

Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex
raid0: RAID Level 5
raid0: Components: /dev/ld0d /dev/ld4d /dev/ld1d /dev/ld5d /dev/ld2d /dev/ld6d /dev/ld3d /dev/ld7d
raid0: Total Sectors: 27343597120 (13351365 MB)
raid0: GPT GUID: 24df0712-e6b6-11df-b548-003048785e28
dk0 at raid0: 24df0726-e6b6-11df-b548-003048785e28
dk0: 27343597053 blocks at 34, type: ffs
opendisk: can't open dev ld0 (16)
opendisk: can't open dev ld1 (16)
opendisk: can't open dev ld2 (16)
opendisk: can't open dev ld3 (16)
opendisk: can't open dev ld4 (16)
opendisk: can't open dev ld5 (16)
opendisk: can't open dev ld6 (16)
opendisk: can't open dev ld7 (16)
opendisk: can't open dev ld0 (16)
opendisk: can't open dev ld1 (16)
opendisk: can't open dev ld2 (16)
opendisk: can't open dev ld3 (16)
opendisk: can't open dev ld4 (16)
opendisk: can't open dev ld5 (16)
opendisk: can't open dev ld6 (16)
opendisk: can't open dev ld7 (16)
opendisk: can't open dev ld0 (16)
opendisk: can't open dev ld1 (16)
opendisk: can't open dev ld2 (16)
opendisk: can't open dev ld3 (16)
opendisk: can't open dev ld4 (16)
opendisk: can't open dev ld5 (16)
opendisk: can't open dev ld6 (16)
opendisk: can't open dev ld7 (16)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
raid0: Device already configured!
Accounting started

[Output of raidctl -s raid0]

Components:
           /dev/ld0d: optimal
           /dev/ld4d: optimal
           /dev/ld1d: optimal
           /dev/ld5d: optimal
           /dev/ld2d: optimal
           /dev/ld6d: optimal
           /dev/ld3d: optimal
           /dev/ld7d: optimal
No spares.
Component label for /dev/ld0d:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld4d:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld1d:
   Row: 0, Column: 2, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld5d:
   Row: 0, Column: 3, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld2d:
   Row: 0, Column: 4, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld6d:
   Row: 0, Column: 5, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld3d:
   Row: 0, Column: 6, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/ld7d:
   Row: 0, Column: 7, Num Rows: 1, Num Columns: 8
   Version: 2, Serial Number: 20101102, Mod Counter: 114
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 3906228160
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
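
As a cross-check of the numbers above, the array size reported in dmesg is consistent with the component labels: a RAID 5 set stores one component's worth of parity, so 8 columns of 3906228160 blocks leave 7 columns of data. The helper names below are made up for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Usable sectors of a RAID 5 set: all columns minus one column of parity. */
static uint64_t
raid5_data_sectors(uint64_t per_component, unsigned columns)
{
	assert(columns >= 2);
	return per_component * (columns - 1);
}

/* Convert a sector count to whole megabytes (2^20 bytes), rounding down. */
static uint64_t
sectors_to_mb(uint64_t sectors, unsigned sector_size)
{
	return sectors * sector_size / (1024u * 1024u);
}
```

With numBlocks 3906228160 and 8 columns this gives 27343597120 sectors, and at 512 bytes per sector 13351365 MB, matching the "raid0: Total Sectors" line.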


[The patch itself]

Index: rf_netbsdkintf.c
===================================================================
RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
retrieving revision 1.250.4.4
diff -u -r1.250.4.4 rf_netbsdkintf.c
--- rf_netbsdkintf.c    4 Apr 2009 17:15:14 -0000       1.250.4.4
+++ rf_netbsdkintf.c    8 Nov 2010 23:58:43 -0000
@@ -2849,7 +2849,7 @@
        struct disklabel label;
        struct device *dv;
        dev_t dev;
-       int bmajor, bminor, wedge;
+       int bmajor, bminor, wedge, rf_part_found;
        int error;
        int i;
        RF_AutoConfig_t *ac_list;
@@ -2895,6 +2895,8 @@
                /* need to find the device_name_to_block_device_major stuff */
                bmajor = devsw_name2blk(device_xname(dv), NULL, 0);
 
+               rf_part_found = 0; /*No raid partition as yet*/
+
                /* get a vnode for the raw partition of this disk */
 
                wedge = device_is_a(dv, "dk");
@@ -2935,6 +2937,7 @@
                                
                        ac_list = rf_get_component(ac_list, dev, vp,
                            device_xname(dv), dkw.dkw_size);
+                       rf_part_found = 1; /*There is a raid component on this disk*/
                        continue;
                }
 
@@ -2959,6 +2962,7 @@
                if (error)
                        continue;
 
+               rf_part_found = 0; /*No raid partitions yet*/
                for (i = 0; i < label.d_npartitions; i++) {
                        char cname[sizeof(ac_list->devname)];
 
@@ -2980,7 +2984,35 @@
                            device_xname(dv), 'a' + i);
                        ac_list = rf_get_component(ac_list, dev, vp, cname,
                                label.d_partitions[i].p_size);
+                               rf_part_found = 1; /*There is at least one raid partition on this disk*/
                }
+
+               /*
+                *If there is no raid component on this disk, either in a
+                *disklabel or inside a wedge, check the raw partition as well,
+                *as it is possible to configure raid components on raw disk
+                *devices.
+                */
+
+               if (!rf_part_found) {
+                       char cname[sizeof(ac_list->devname)];
+
+                       dev = MAKEDISKDEV(bmajor, device_unit(dv), RAW_PART);
+                       if (bdevvp(dev, &vp))
+                               panic("RAID can't alloc vnode");
+
+                       error = VOP_OPEN(vp, FREAD, NOCRED);
+                       if (error) {
+                               /* Whatever... */
+                               vput(vp);
+                               continue;
+                       }
+                       snprintf(cname, sizeof(cname), "%s%c",
+                           device_xname(dv), 'a' + RAW_PART);
+                       ac_list = rf_get_component(ac_list, dev, vp, cname,
+                               label.d_partitions[RAW_PART].p_size);
+               }
+
        }
        return ac_list;
 }
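
The component name built in the new block appends 'a' + RAW_PART to the device name. Assuming RAW_PART is 3, as on i386, that yields the 'd' partition, which matches the /dev/ld0d style names in the raidctl output above. A standalone sketch of that naming step, with `format_component_name` as a hypothetical helper and the raw partition index passed in as a parameter rather than taken from the kernel's RAW_PART:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "<xname><partition letter>" into buf, e.g. "ld0" and 3 -> "ld0d".
 * Returns 0 on success, -1 if the index is out of range or buf is too small.
 */
static int
format_component_name(char *buf, size_t len, const char *xname, int raw_part)
{
	int n;

	assert(buf != NULL && xname != NULL);

	if (raw_part < 0 || raw_part > 25)
		return -1;
	n = snprintf(buf, len, "%s%c", xname, 'a' + raw_part);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```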

