Current-Users archive


Re: RAIDframe performance (RAID-5)



Johan Ihren writes:
>> 
> With this disklabel fragment (and 26214528 is a multiple of the stripe
> size of 128), I get:
> 
> -bash-3.2# disklabel raid0 | grep g:
>   g: 301465216  26214528     4.2BSD   2048 16384     0  # (Cyl. 25600*- 319999*)
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> pear          300M 11804   5 11790   4  9011   2 91340  64 105476  18 548.5   2
> 
> Better. But not much. Now with a blocksize of 64K and a frag size of 8K:
> 
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> pear          300M 16012   7 15333   5 12643   4 104565  74 106378  15 443.4   2
> 
> Again, an improvement, but not really good.

Right... 
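
(For reference, that second run presumably corresponds to something like
the following newfs invocation; just a sketch, since the exact command
wasn't shown:

    newfs -b 65536 -f 8192 /dev/rraid0g

where -b sets the file system block size and -f the fragment size.)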

> Just to understand this better: you're arguing that the partition
> should be aligned on a stripe-size boundary relative to the beginning
> of the RAID set.

Yes... and, more importantly, that the blocks being written end up being 
stripe-aligned (which I think they will be in this case if you get the 
start of the partition lined up...).
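
(A quick sanity check, as a sketch: the raid0g offset from the disklabel
above, modulo the sectPerSU value of 128, should come out to zero if the
partition starts on a stripe-unit boundary relative to the RAID set:

    $ echo $((26214528 % 128))
    0

A non-zero remainder would mean the partition start is misaligned.)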

> But the RAID set itself is (in my case) at a 63-block offset from the
> 128-block alignment with the beginning of the physical disk:
[snip]
>   i: 123575856 189005952       RAID                     # (Cyl. 187505*- 310100)

That's fine... where the bits are on the underlying components is not 
an issue... 

> Isn't the right criterion that the partition (raid0g in my case) should
> be aligned with the stripe size relative to the physical beginning of
> the actual disk, rather than the RAID set?

No... it's relative to the RAID set... 

> I.e. in my case, with two 63-block offsets, the raid0g partition should
> be nudged 2 blocks towards the end of the disk to achieve "alignment".
> But this is just me guessing.
> 
> So I did that (actually I moved all the wdNg partitions that are the
> components of the RAID set), but as it will take about 80 minutes to
> recompute the parity for the new RAID set I will not get any numbers
> for that theory until tomorrow.

Aha...  In a previous post (when I wasn't paying attention ;) ) you said:
> RAID5 w/ following parameters:
> * START layout
> * # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
> * 128 1 1 5

Change that 128 to 64 or even 32.

You have 3 disks.  That means for a given stripe you have "2 disks
that will have data, 1 disk with parity".  For a given stripe then,
that's 128+128 blocks of data... which is 128K of data -- which is 
larger than the 64K MAXPHYS value (the largest amount RAIDframe will
ever be handed for one IO :( ).  So what's happening here is that 
you're only ever given a max of 64K, and so RAIDframe is *always* 
doing small-stripe-writes.  Change the 128 to 64, and now your 
stripes will have 64K of data, which at least will give it a fighting 
chance.  A stripe width of 32 *may* perform better (can't say for 
sure), so if you're wanting to play that's another thing you can 
try...
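
Something along these lines for the layout section, assuming the rest of
your raid0 config stays as posted (a sketch, not a drop-in file):

    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
    # With 3 disks each stripe is 2 data SUs + 1 parity SU, so
    # sectPerSU=64 gives 2 x 64 = 128 sectors (at 512 bytes each),
    # i.e. 64K of data per stripe, which fits within MAXPHYS.
    64 1 1 5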

Oh.. and don't worry about rebuilding parity before doing the tests 
-- parity only matters if you care about the data in the event of a 
failure ;) 
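
(Once you do care about the data, re-writing the parity is just:

    raidctl -iv raid0

which initializes the parity and shows a progress indicator while it
runs.)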

Later...

Greg Oster



