Subject: Re: Why my life is sucking. Part 2.
To: None <tls@rek.tjls.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 01/18/2001 19:24:30
Thor Lancelot Simon writes:
> On Thu, Jan 18, 2001 at 03:00:30PM -0600, Greg Oster wrote:
> > Bill Sommerfeld writes:
> > > > Wouldn't it be better to first check the overall status of the array?
> > > > And once the array's parity has been correctly written, you can free
> > > > the memory used to hold this bitmap.  It means that you're doing two
> > > > checks, not just one, while you're actually doing the on-demand
> > > > re-writing of the parity; but when you're not fixing parity, it ought
> > > > to save you memory, and probably time, too, when you think about
> > > > keeping that whole bitmap in the CPU's cache...
> > > > 
> > > > if(array_is_dirty)
> > > >     if(this_block_is_dirty)
> > > >         rewrite_parity();
> > > 
> > > if there's already a function pointer at the right place in the I/O
> > > path, you can do the check with zero overhead -- you start off with it
> > > pointing to the "dirty, slow" path and once parity is cleaned up
> > > re-point it to the "clean, fast" path.
> > 
> > Yup... I haven't had time to look, but I suspect it can be found if one
> > looks hard enough :)
> > 
> > A few other things about "parity rewrite on demand".
> > 1) if a block is to be read, then the associated stripe must have its
> > parity updated before the block is returned.  (If it is not, and the
> > component that block lives on dies, then that block could be reconstructed
> > incorrectly.)
> > 2) if a block is to be written, then the associated stripe must have its 
> > parity updated before the block is written. (same reason as above)
> > 3) there could *still* be other stripes where the parity is incorrect, and
> > where a failed component would result in incorrect data being
> > reconstructed.
> > 
> > While 1) and 2) help in getting the parity correct, allowing other 'normal'
> > IO increases the amount of time through which 3) poses a major problem.
> 
> I strongly believe the above analysis to be incorrect. 

"Ok"... [after reading what you wrote, I remain confused as to what you see as 
'incorrect', but...]
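
Going back to Bill's function-pointer suggestion for a second: something like
the sketch below is roughly what I have in mind.  It's just a sketch with
made-up names (none of these are actual RAIDframe identifiers) -- the "dirty,
slow" entry point fixes a stripe's parity before doing the I/O, which is what
1) and 2) above require, and once the rebuild thread finishes, the pointer
gets flipped to the "clean, fast" path and the bitmap freed.

/* Sketch only -- none of these names are real RAIDframe code. */
#include <stdio.h>
#include <stdlib.h>

#define NSTRIPES 1024

struct array_state {
    int (*do_io)(struct array_state *, int);  /* I/O entry point */
    unsigned char *stripe_dirty;   /* one flag per stripe; freed when clean */
};

/* "clean, fast" path: no per-stripe check at all */
static int
io_clean_path(struct array_state *as, int stripe)
{
    printf("real I/O to stripe %d\n", stripe);
    return 0;
}

/* "dirty, slow" path: make the stripe's parity good before doing the I/O */
static int
io_dirty_path(struct array_state *as, int stripe)
{
    if (as->stripe_dirty[stripe]) {
        printf("rewriting parity for stripe %d\n", stripe);
        as->stripe_dirty[stripe] = 0;
    }
    return io_clean_path(as, stripe);
}

/* called by the rewrite thread once every stripe is known to be clean */
static void
parity_rewrite_done(struct array_state *as)
{
    free(as->stripe_dirty);
    as->stripe_dirty = NULL;
    as->do_io = io_clean_path;     /* zero overhead from here on */
}

int
main(void)
{
    struct array_state as;

    as.stripe_dirty = calloc(NSTRIPES, 1);
    as.stripe_dirty[7] = 1;        /* pretend this stripe's parity is suspect */
    as.do_io = io_dirty_path;

    as.do_io(&as, 7);              /* slow path: parity fixed on demand */
    parity_rewrite_done(&as);
    as.do_io(&as, 7);              /* fast path: no check, no bitmap */
    return 0;
}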

> 3) holds whether
> the machine is switched on doing a parity rebuild only or whether it's doing
> "real" I/O at the same time -- doing "real" I/O, so long as it's always
> preceded by a parity update if required, does not increase one's risk of
> encountering a fatal failure except inasmuch as it may lengthen the time
> window of exposure to it. 

Agreed.  

> *HOWEVER*,
> 
> * If you're not doing very much "real" I/O, you won't move the heads around
>   much and thereby interfere with the I/O being generated by the parity
>   rebuild thread.  So, if you aren't doing much "real" I/O, you won't
>   actually hurt the parity rebuild time much at all.

'much' -- but the time is there... and it's non-zero.

> * If you *are* doing a lot of "real" I/O, you're causing the parity for
>   every stripe you touch to be synchronously rebuilt, so though you change
>   the *ordering* of the rebuild, you don't make it much slower.

Again, 'much'... And it could be that such a very slight increase is at a
tolerable level... I'd like to see it *decrease*, though...

> And, *so long as you update parity every time you write a stripe for any
> reason*, you never actually increase the chance of having a fatal failure.

Other than by taking longer (even if it's, say, '1 second' longer).  If one does 
anything other than spend time getting the parity updated as fast as possible, 
one leaves Murphy more time to come along and nuke a drive.

> I repeat: this issue is well-understood; this is how every dedicated
> hardware RAID controller I've encountered does it; we should do it the
> same way.

I need to think about this more... 

> Another thing to consider is to be smarter about the parity updates: try to
> keep the number of entries in the queue at a constant length using parity
> update I/O, and use the last known head position (that is, the request
> that was last in the queue when the number of I/O requests in the queue
> fell below some threshold value) to decide where to start generating parity
> update I/Os for your online rebuild.  You don't *have* to rebuild the
> disk in order from one end to the other, and not trying to do so will likely
> yield radically better rebuild performance in the presence of other I/O
> sources.
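
If I'm reading that right, it amounts to something like the sketch below.
Again, nothing here is real RAIDframe code -- enqueue_parity_rewrite() is
just a stand-in for handing a rewrite request to the normal I/O path, and a
real version would clear the dirty flag when the rewrite completes, not when
it's queued.

#define QUEUE_TARGET 8              /* keep this many requests outstanding */

struct rebuild_state {
    unsigned char *stripe_dirty;    /* per-stripe "parity suspect" flags */
    int nstripes;
    int last_stripe;                /* stripe of the most recent "real" request */
};

/* stand-in for handing a parity-rewrite request to the normal I/O path */
static void
enqueue_parity_rewrite(int stripe)
{
    (void)stripe;
}

/* find the next dirty stripe at or after 'from', wrapping around */
static int
next_dirty_stripe(struct rebuild_state *rb, int from)
{
    int i, s;

    for (i = 0; i < rb->nstripes; i++) {
        s = (from + i) % rb->nstripes;
        if (rb->stripe_dirty[s])
            return s;
    }
    return -1;                      /* parity is clean everywhere */
}

/*
 * Called whenever the outstanding-request count drops below the target:
 * top the queue back up with parity rewrites, starting near wherever the
 * heads were last known to be instead of at stripe 0.
 */
static void
top_up_queue(struct rebuild_state *rb, int queued)
{
    int s;

    while (queued < QUEUE_TARGET) {
        s = next_dirty_stripe(rb, rb->last_stripe);
        if (s == -1)
            break;                  /* done: the whole array is clean */
        enqueue_parity_rewrite(s);
        rb->stripe_dirty[s] = 0;
        rb->last_stripe = s;
        queued++;
    }
}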

Later...

Greg Oster