tech-kern archive


Re: RAIDframe performance vs. stripe size



On Thu, 10 May 2012 14:06:11 -0400
Thor Lancelot Simon <tls%panix.com@localhost> wrote:

> On Thu, May 10, 2012 at 11:47:36AM -0600, Greg Oster wrote:
> > On Thu, 10 May 2012 13:23:24 -0400
> > Thor Lancelot Simon <tls%panix.com@localhost> wrote:
> > 
> > > On Thu, May 10, 2012 at 11:15:09AM -0600, Greg Oster wrote:
> > > > 
> > > > What you're typically looking for in the parallelization is
> > > > that a given IO will span all of the components.  In that way,
> > > > if you have n
> > > 
> > > That's not what I'm typically looking for.  You're describing the
> > > desideratum for a maximum-throughput application.  Edgar is
> > > describing the desideratum for a minimum-latency application.  No?
> > 
> > I think what I describe still works for minimum-latency too...
> > where it doesn't work is when your IO is so small that the time to
> > actually transfer the data is totally dominated by the time to seek
> > to the data.
> 
> What if I have 8 simultaneous, unrelated streams of I/O, on a 9
> data-disk set? 

That's the "lots of simultaneous IOs happening" part of the bit
you cut out:

 In that case you're better off in just going to a single component
 instead of having n components all moving their heads around to grab
 those few bytes (especially true where there are lots of simultaneous
 IOs happening).

> Like, say, 8 CVS clients all at different points
> fetching a repository that is too big to fit in RAM?
> 
> If the I/Os are all smaller than a stripe size, the heads should be
> able to service them in parallel.

Doing reads, where somehow each of the 8 reads is magically on a
disk independent of the others, is a pretty specific use-case ;) 

If you were talking about writes here, then you still have the
read-modify-write penalty to contend with -- and now, instead of just
doing IO to a single disk, you're moving two disk heads to read the
old data and the old parity, and then moving them back again to do the
writes... 

Reads I agree - you can get those done in parallel with no interference
(assuming the reads are somehow aligned so that the load is spread
evenly across all the disks).  Write to anything other than a
full stripe, though, and it gets really expensive really fast....
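
To make that cost concrete, here's a minimal, purely illustrative
sketch (Python, not RAIDframe code -- the Disk class and block
numbering are made up for the example) of what a single sub-stripe
write has to do:

class Disk:
    """Toy stand-in for one component; each read/write models a seek + transfer."""
    def __init__(self, nblocks, blocksize=512):
        self.blocks = [bytes(blocksize) for _ in range(nblocks)]
    def read(self, blkno):
        return self.blocks[blkno]
    def write(self, blkno, data):
        self.blocks[blkno] = data

def small_write(data_disk, parity_disk, blkno, new_data):
    old_data = data_disk.read(blkno)        # IO 1: read old data
    old_parity = parity_disk.read(blkno)    # IO 2: read old parity
    # new parity = old parity XOR old data XOR new data
    new_parity = bytes(p ^ o ^ n for p, o, n in
                       zip(old_parity, old_data, new_data))
    data_disk.write(blkno, new_data)        # IO 3: write new data
    parity_disk.write(blkno, new_parity)    # IO 4: write new parity
    # 2 reads + 2 writes, and the writes can't start until the
    # reads finish -- versus one write per disk for a full stripe.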

> If they are stripe size or larger, they will have to be serviced in
> sequence -- it will take 8 times as long.

Reads, given the configuration and assumptions you have suggested,
would certainly take longer.

Writes would be a different story (how the drive does caching
might be the determining factor?).  

Writing full stripes, each disk would end up with one IO for each of
the 8 writes -- 8 IOs per disk, 64 writes in all.  

Writing "to a block on each of the 8 disks" would actually end up with:
 a) 8 reads to each of the disks to fetch the old blocks, 

 b) 8 reads to "one of the other disks" to get the old parity.  This
 will also mess up the head positions for the reads happening in a),
 even though both a) and b) will fire at the same time. 

 c) 8 writes to each of the disks to write out the new data, and
 
 d) 8 writes to each of the disks to write out the new parity.

That's 128 reads and 128 writes (and the writes have to wait on the
reads!) to do the work that the striped system could do with just
64 writes....
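
The same tally in runnable form, if it helps -- assuming the numbers
used above (8 writers, 8 blocks each, and not counting the full-stripe
parity writes separately, to match the 64 figure):

writers, blocks_each = 8, 8
total_blocks = writers * blocks_each        # 64 blocks of data written

# Full-stripe writes: parity is computed from the new data in hand,
# so there are no reads at all.
stripe_writes = total_blocks                # 64 writes, 0 reads

# Sub-stripe (read-modify-write) writes: each block costs 2 reads
# (old data + old parity) and 2 writes (new data + new parity).
rmw_reads = 2 * total_blocks                # 128 reads
rmw_writes = 2 * total_blocks               # 128 writes

print(stripe_writes, rmw_reads, rmw_writes) # -> 64 128 128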

> In practice, this is why I often layer a ccd with a huge (and prime)
> "stripe" size over RAIDframe.  It's also a good use case for LVMs.
> But it should be possible to do it entirely at the RAID layer through
> proper stripe size selection.  In this regard RAIDframe seems to be
> optimized for throughput alone.

See my point before about optimizing for one's own particular
workloads :)  (e.g. how you might optimize for a mostly read-only CVS
repository will be completely different from an application where the
mixture of reads and writes is more balanced...)

Later...

Greg Oster

