
RAIDframe performance vs. stripe size: Test Results



> Of course, this is all still theoretical -- the best thing is to
> experiment with different RAID settings and real workloads to see what
> works best for the particular applications...

So I've bitten the bullet and conducted experiments.
I hope the results are useful to others, too.

The setup is a SuperMicro H8SCM-F with 16G of ECC RAM, an Opteron 4226 and
five Hitachi HUS723030ALS640 behind an LSI 1068E.
I'm running 6.0_BETA/amd64.

For the test, I created 25G-sized partitions on each of the discs and a
Level 5 RAID across five such partitions.
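
For reference, the set was configured roughly along these lines (a sketch
only: the partition letters and the component-label serial are placeholders;
sectPerSU in the layout section is the stripe unit size varied below):

    # raid0.conf (illustrative)
    START array
    # numRow numCol numSpare
    1 5 0
    START disks
    /dev/sd0e
    /dev/sd1e
    /dev/sd2e
    /dev/sd3e
    /dev/sd4e
    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
    32 1 1 5
    START queue
    fifo 100

    raidctl -C raid0.conf raid0   # force initial configuration
    raidctl -I 123456789 raid0    # write component labels (serial is arbitrary)
    raidctl -iv raid0             # (re-)write parity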

The test data (in .tgz's on another RAID) consists of a subset of real
user data and real user mail. The home set is ~60G in ~500k files and
~115k dirs, the mail set is ~40G in ~750k files and ~22k dirs.

My main interest was to test two things presumably relevant for the backup:
stating and reading. For the former, I did a "find -x . -ls > /dev/null",
for the latter, "tar cf /dev/null .". I also somewhat tested parallel
performance by running two tars in parallel, acting on disjoint subsets that
together cover the data.
I also recorded the times to build the RAID (i.e. to re-write the parity)
and to extract (i.e. tar xpzf) the test data set as well as df -h output.
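
Concretely, the measured commands looked like this (a sketch; the mount
point, archive path and subset names are placeholders):

    cd /mnt/home                          # or /mnt/mail
    time find -x . -ls > /dev/null        # "stat" test
    time tar cf /dev/null .               # "read" test
    # parallel test: two tars on disjoint subsets that together cover the tree
    ( time tar cf /dev/null ./set1 ) &
    ( time tar cf /dev/null ./set2 ) &
    wait
    # extraction (the .tgz's live on another RAID)
    time tar xpzf /archive/home.tgz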

The two variables across the tests were file system block size and RAID
stripe unit size.
I also tested with -o noatime after I found out it matters.

As extracting the test data takes some three hours, I was limited in the
number of value combinations I could test. I spent quite a few days on this.

The values I tested were 16k fsbsize with 32, 8 and 128 SPSU (i.e. one block
per stripe unit, one block per stripe and four blocks per stripe unit) and
64k fsbsize with 128 and 32 SPSU (i.e., one block per stripe unit and one block
per stripe).
In case someone convinces me that other combinations may be interesting,
I could probably test an additional one or two of them.
[As it looks like larger blocks and larger stripes yield better
performance everywhere except at extracting the test data, I'll probably start
another run with 64k fsbsize and 512 SPSU.]
[[It looks like I'm having problems creating that RAID.]]
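
For reference, the mapping between SPSU and file system blocks above,
assuming 512-byte sectors and the four data columns of the five-component
level 5 set:

    16k fsblock = 32 sectors, 64k fsblock = 128 sectors
    stripe size = 4 data columns * SPSU * 512 B, i.e.:
      SPSU   8 -> stripe unit   4k, stripe  16k
      SPSU  32 -> stripe unit  16k, stripe  64k
      SPSU 128 -> stripe unit  64k, stripe 256k
      SPSU 512 -> stripe unit 256k, stripe   1M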

All tests (except one extraction where I forgot to enable quotas on newfs) were
done with WAPBL and new quotas (both user and group).
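
In rough outline, the file systems were set up like this (device and mount
point are placeholders; the quota-enabling step is omitted, and the fragment
size shown is simply bsize/8):

    newfs -b 65536 -f 8192 /dev/rraid0a     # fsbsize 64k; the 16k runs used -b 16384
    mount -o log,noatime /dev/raid0a /mnt   # log = WAPBL; drop noatime for the atime runs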

I re-mounted before the "find" test in order to invalidate the buffer cache.
I initially did the same for the "tar" test, noted that it made no significant
difference, and so refrained from doing it in the other tests.

Parity re-build:
SPSU    32      8       128
time    6min    ~15min  <5min

Extraction (minutes):
        16k/32  16k/8   16k/128 64k/128 64k/32
home    163      81*    151     116     79
mail    146     113*    137     109     94
*without quota (by mistake)

In the following, + means atime, - noatime.
Column titles are fsbsize/SPSU.

find (seconds):
        16k/32  16k/8   16k/128 64k/128 64k/32
home+   158     178     167      98     103
home-    79      86      76      73      76
mail+    82     ???      81      47      52
mail-    56      59      54      39      41
???: forgot to measure

tar (seconds):
        16k/32  16k/8   16k/128 64k/128 64k/32
home+   577     639     536     406     418
home-   422     466     369     362     361
mail+   600     690     587     371     395
mail-   411     484     375     318     337

parallel tars (seconds):
        16k/32  16k/8   16k/128 64k/128 64k/32
home+   430+506 497+574 389+466 269+327 332+358
home-   302+360 355+412 275+333 242+290 294+288
mail+   431+444 496+505 419+431 252+254 285+288
mail-   246+255 289+290 234+237 203+204 231+233


I hope I made no errors transcribing these values from my log. If something
looks suspicious to you, please ask.


Any comments on the results?


My questions:
Why does parity re-build take longer with smaller stripes? Is it really
done one stripe at a time?
Why does enabling quotas slow down extraction so much? The test data should
be ordered by uid in the tar, so the quota information should be easily cacheable.
Why does the negative impact of atime updates decrease at larger block/stripe
sizes?

