Re: RAIDframe performance (RAID-5)



On 28 Aug 2008, at 23:26, Greg Oster wrote:

>> Again, an improvement, but not really good.

> Right...

>> Just to understand this better: you're arguing that the partition
>> should be aligned on a stripe-size boundary with the beginning of the
>> RAID set.

> Yes.. and more importantly that the blocks being written end up being
> stripe-aligned.. (which I think they will do in this case if you get the
> start of the partition lined up...)

Right.

> That's fine... where the bits are on the underlying components is not
> an issue...

>> Isn't the right criterion that the partition (raid0g in my case) should
>> be aligned with the stripe size relative to the physical beginning of
>> the actual disk, rather than the RAID set?

> No... it's relative to the RAID set...

Ok, good to have a clear answer to that one ;-)
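So to make that concrete (hypothetical numbers, purely to illustrate the
rule): with sectPerSU=64 and two data disks a full stripe holds 128
sectors of data, so the partition offset in raid0's disklabel should be
rounded up to a multiple of that:

# round a hypothetical 63-sector partition offset up to the next
# 128-sector stripe boundary (sectPerSU=64 x 2 data disks); the offset
# is counted from the start of raid0, not from the physical disks
echo $(( ((63 + 127) / 128) * 128 ))    # -> 128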

Aha... In a previous post (when I wasn't paying attention ;) ) you said:

>> RAID5 w/ following parameters:
>> * START layout
>> * # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
>> * 128 1 1 5

> Change that 128 to 64 or even 32.
>
> You have 3 disks.  That means for a given stripe you have "2 disks
> that will have data, 1 disk with parity".  For a given stripe then,
> that's 128+128 blocks of data... which is 128K of data -- which is
> larger than the 64K MAXPHYS value (the largest amount RAIDframe will
> ever be handed for one IO :( ).  So what's happening here is that
> you're only ever given a max of 64K, and so RAIDframe is *always*
> doing small-stripe-writes.  Change the 128 to 64, and now your
> stripes will have 64K of data, which at least will give it a fighting
> chance.

Ok, I interpret that as:

($sectPerSU * 512) * ($disks - 1) <= MAXPHYS

I.e. the stripe unit size should go down as the number of disks in the
RAID set goes up. Previously I thought it was the other way around.
Things are finally getting more understandable.
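Plugging the numbers back in as a sanity check (plain sh arithmetic,
with MAXPHYS = 65536 bytes as stated above):

# bytes of data per full stripe = sectPerSU * 512 * (disks - 1)
echo $(( 128 * 512 * 2 ))   # 131072 = 128K  > MAXPHYS: always small-stripe writes
echo $((  64 * 512 * 2 ))   #  65536 =  64K  = MAXPHYS: full-stripe writes possible
echo $((  32 * 512 * 2 ))   #  32768 =  32K  < MAXPHYS: fits as well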

> A stripe width of 32 *may* perform better (can't say for
> sure), so if you're wanting to play that's another thing you can
> try...
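For reference, mapping my "64K / 32K stripe size" wording below back
onto the config (assuming I have the units right: data per full stripe,
i.e. sectPerSU * 512 * 2 for this 3-disk set), the two layout stanzas
under test are:

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

and

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5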

I've done six tests, for bsize/fsize of 16K/2K, 32K/4K and 64K/8K
respectively, with both 64K and 32K stripe sizes. Very interesting:
three winners and three losers, with absolutely nothing in between. The
64K stripe size with 64K bsize probably hits the absolute optimal
performance (it seems to be very close to 2x the raw single-disk
performance), but 32K/32K and 32K/64K are both also very good at
80+ MB/s, both read AND write.

** BAD: 64K stripe size, 16K block size, 2K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear          300M 12310   5 12189   4  6987   1 84635  63 91295  16 521.8   2


** BAD: 64K stripe size, 32K block size, 4K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear          300M 11648   5 11803   4  8717   2 89978  66 92012  16 516.1   2


** GOOD: 64K stripe size, 64K block size, 8K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear          300M 96252  46 97565  32 19439   6 91624  66 91608  14 310.9   1


** BAD: 32K stripe size, 16K block size, 2K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear          300M  8839   4  8770   3  6628   2 78156  58 91332  17 525.1   2


** GOOD: 32K stripe size, 32K block size, 4K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear.johani.or 300M 81546  38 90802  34 20711   7 81892  61 91937  16 364.5   1


** GOOD: 32K stripe size, 64K block size, 8K frag size:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pear          300M 84566  40 90798  33 20682   6 83047  58 92784  15 270.2   1


> Oh.. and don't worry about rebuilding parity before doing the tests
> -- parity only matters if you care about the data in the event of a failure ;)

Right, I actually knew that, but I was not thinking straight. Thanks for pointing it out. And many thanks for helping me to understand this much better than before.
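(For anyone finding this in the archives: once the experimenting is
over, checking and rewriting the parity before trusting the set with
real data is just

raidctl -P raid0

with raid0 being the RAID set discussed above.)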

Johan
