tech-kern archive


Re: Where is the component queue depth actually used in the raidframe system?

On Sat, 16 Mar 2013 21:56:43 -0700
Brian Buhrow <> wrote:

> On Mar 14,  8:47am, Greg Oster wrote:
> } Subject: Re: Where is the component queue depth actually used in the raidf
> } On Thu, 14 Mar 2013 10:32:26 -0400
> } Thor Lancelot Simon <> wrote:
> } 
> } > On Wed, Mar 13, 2013 at 09:36:07PM -0400, Thor Lancelot Simon wrote:
> } > > On Wed, Mar 13, 2013 at 03:32:02PM -0700, Brian Buhrow wrote:
> } > > >        hello.   What I'm seeing is that the underlying disks
> } > > > under both a raid1 set and a raid5 set are not seeing any more
> } > > > than 8 active requests at once across the entire bus of disks.
> } > > > This leaves a lot of disk bandwidth unused, not to mention less
> } > > > than stellar disk performance.  I see that RAIDOUTSTANDING is
> } > > > defined as 6 if not otherwise defined, and this suggests that
> } > > > this is the limiting factor, rather than the actual number of
> } > > > requests allowed to be sent to a component's queue.
> } > > 
> } > > It should be the sum of the number of openings on the underlying
> } > > components, divided by the number of data disks in the set.  Well,
> } > > roughly.  Getting it just right is a little harder than that, but
> } > > I think it's obvious how.
> } > 
> } > Actually, I think the simplest correct answer is that it should be
> } > the minimum number of openings presented by any individual
> } > underlying component.  I cannot see any good reason why it should
> } > be either more or less than that value.
> } 
> } Consider the case when a read spans two stripes...  Unfortunately,
> } each of those reads will be done independently, requiring two IOs for
> } a given disk, even though there is only one request.
> } 
> } The reason '6' was picked back in the day was that it seemed to offer
> } reasonable performance while not requiring a huge amount of memory to
> } be reserved for the kernel.  And part of the issue there was that
> } RAIDframe had no way to stop new requests from coming in and consuming
> } all kernel resources :(  '6' is probably a reasonable hack for older
> } machines, but if we can come up with something self-tuning I'm all for
> } it...  (Having this self-tuning is going to be even more critical when
> } MAXPHYS gets sent to the bitbucket and the amount of memory needed for
> } a given IO increases...)
> } 
> } Later...
> } 
> } Greg Oster
>       Hello.  If I understand Thor's formula right, then for a raid
> set I have (raid5) with 4 components, each on a wd(ata) disk, the
> correct number of outstanding requests should be limited to 4, because
> it looks like our ata drivers only present 1 opening per channel.
> However, increasing the outstanding requests on this box from 6, which
> is already too high according to the formula as I understand it, to 20
> increases the disk throughput on this machine by almost 50% for many
> of the work loads I put on it.

Yum! :) 

> I imagine there is a point of diminishing returns in terms of how much
> of a queue I should allow on the outstanding requests limit, but right
> now it's unclear to me how to figure out the optimal setting for this
> number based on any underlying capacity indicators there may be.  It
> seems like a better heuristic might be to specify a maximum amount of
> memory the raidframe driver would be allowed to use, and then have it
> set the outstanding request count accordingly.

I think that is the preferred approach.  At least, that is where the
'6' number came from back in the day...

> In the case of the machine I refer to above, I have 2 raid sets, the
> stripe size is set to 64 blocks (32K) with 4 stripes per raid set.
> With one of the raid sets running in degraded mode, the maximum amount
> of memory used by the raidframe subsystem is 10.4MB.  That's not an
> insignificant amount of memory, but it's certainly not a profligate
> amount.  Further thoughts?

10MB is reasonable today, but not so much on a 32MB or 64MB machine :)

I'm not sure what the magic number should be... perhaps we allow 5% of
kernel memory per RAID set, and then scale that by the size of the RAID
set to produce the number of openings (with the minimum remaining at 6?).

An alternative to self-tuning would be to introduce a sysctl to allow
setting the value on the fly...

According to my notes I was attempting to do memory calculations on
this back in 2003/2004, but it doesn't look like I came up with a firm
formula back then either...  According to those notes, the number of
nodes in the IO graph is bounded by:

 (2 * layoutPtr->numDataCol) + (1 * layoutPtr->numParityCol)
  + (1 * 2 * layoutPtr->numParityCol) + 3

Multiplying that by the stripe width we get a bound on the memory
requirements for the data -- I think it overestimates the requirement
per IO, but that's fine.  For a 5-disk RAID 5 set with a stripe
width of 32 (16K/component, 64K data for the entire stripe) what we end
up with is a memory requirement of:

 (2*4+1*1+1*2*1+3)*16K=224K per IO.

It's just a matter of scaling the number of openings to match some
reasonable use of kernel memory...  


Greg Oster
