Subject: Re: Why my life is sucking. Part 2.
To: None <current-users@netbsd.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 01/18/2001 15:00:30
Bill Sommerfeld writes:
> > Wouldn't it be better to first check the overall status of the array?
> > And once the array's parity has been correctly written, you can free
> > the memory used to hold this bitmap.  It means that you're doing two
> > checks, not just one, while you're actually doing the on-demand
> > re-writing of the parity; but when you're not fixing parity, it ought
> > to save you memory, and probably time, too, when you think about
> > keeping that whole bitmap in the CPU's cache...
> > 
> > if(array_is_dirty)
> >     if(this_block_is_dirty)
> >         rewrite_parity();
> 
> if there's already a function pointer at the right place in the I/O
> path, you can do the check with zero overhead -- you start off with it
> pointing to the "dirty, slow" path and once parity is cleaned up
> re-point it to the "clean, fast" path.

Yup... I haven't had time to look, but I suspect it can be found if one looks 
hard enough :)  
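
The idea would look something like this (all the names here are made up 
for illustration; this isn't what the RAIDframe code actually uses):

struct raid_softc;
struct buf;

static void raid_io_dirty(struct raid_softc *, struct buf *);
static void raid_io_clean(struct raid_softc *, struct buf *);

struct raid_softc {
	/* current I/O strategy; starts out pointing at the slow path */
	void (*sc_do_io)(struct raid_softc *, struct buf *);
};

static void
raid_io_dirty(struct raid_softc *sc, struct buf *bp)
{
	/* slow path: check/fix this stripe's parity first... */
	raid_io_clean(sc, bp);	/* ...then do the real I/O */
}

static void
raid_io_clean(struct raid_softc *sc, struct buf *bp)
{
	/* fast path: parity known good; just issue the I/O */
}

static void
parity_rewrite_done(struct raid_softc *sc)
{
	/* one store, and every later I/O skips the check entirely */
	sc->sc_do_io = raid_io_clean;
}

Once the background rewrite completes, that single assignment means every 
subsequent I/O takes the fast path, with zero per-I/O overhead, just as 
Bill describes.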

A few other things about "parity rewrite on demand":
1) if a block is to be read, then the associated stripe must have its parity 
updated before the block is returned.  (If it is not, and the component that 
block lives on dies, then that block could be reconstructed incorrectly.)
2) if a block is to be written, then the associated stripe must have its 
parity updated before the block is written. (Same reason as above; a sketch 
of both checks follows this list.)
3) there could *still* be other stripes where the parity is incorrect, and 
where a failed component would result in incorrect data being reconstructed.
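
In code, 1) and 2) boil down to the same check in front of the I/O.  A 
rough sketch, reusing the made-up names from above (the helper functions 
here are hypothetical too, not the real driver's):

static int  stripe_of(struct raid_softc *, struct buf *);
static int  stripe_parity_dirty(struct raid_softc *, int);
static void rewrite_parity(struct raid_softc *, int);
static void mark_parity_clean(struct raid_softc *, int);
static void issue_io(struct raid_softc *, struct buf *);

static void
raid_strategy_dirty(struct raid_softc *sc, struct buf *bp)
{
	int stripe = stripe_of(sc, bp);	/* map the block to its stripe */

	/*
	 * 1) and 2): read or write, the stripe's parity must be made
	 * correct *before* the I/O, so that a component failure can't
	 * later reconstruct bad data from it.
	 */
	if (stripe_parity_dirty(sc, stripe)) {
		rewrite_parity(sc, stripe);
		mark_parity_clean(sc, stripe);
	}

	/*
	 * 3) is untouched: every stripe not yet visited may still
	 * have bad parity.
	 */
	issue_io(sc, bp);
}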

While 1) and 2) help get the parity correct, allowing other 'normal' I/O
to proceed stretches out the window during which 3) remains a major problem.
The longer one delays getting the parity correct, the greater the risk...

If we step back a bit: people want their machines to come up fast, but the 
parity re-write is slowing that down.  One way to 'solve' that is to delay the 
re-write until after things like filesystem checks, but that increases the 
risk.  Another way is the 'rewrite on demand' scheme.  That helps ensure 
that the data being read/written is correct, but does nothing to shorten 
the re-write itself (it would actually take longer, which increases the 
risk).  Most people can probably live with the extra risk in either of 
these cases....  However:  I'm much more inclined to work on a scheme 
where it can be known that certain portions of the set are fine, so that 
the parity rewrite only needs to deal with a small portion of the set, 
rather than having to check the entire thing and hoping that we get the 
'wrong bits' fixed before a component dies...  By reducing the amount of 
parity rewriting that needs to be done, one reduces the amount of time it 
takes, which reduces the amount of risk... and as a lovely side effect, 
reduces the amount of time it takes to get to multi-user, which makes 
everyone happy :)
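
The bookkeeping for such a scheme could be as small as a coarse bitmap 
over regions of the set, kept on stable storage (the component labels, 
say).  A minimal sketch, with all names and sizes invented for 
illustration:

#include <stdint.h>

#define RF_NREGIONS	1024	/* coarse regions keep the map tiny */

struct parity_map {
	uint8_t	pm_dirty[RF_NREGIONS / 8];	/* 128 bytes total */
};

/*
 * Set a region's bit (and push the map to stable storage) before the
 * first write lands in it; clear it once the region's parity is known
 * good again, e.g. at clean shutdown.
 */
static void
pm_mark_dirty(struct parity_map *pm, int region)
{
	pm->pm_dirty[region / 8] |= (uint8_t)(1 << (region % 8));
}

/*
 * After an unclean shutdown, only regions with their bit set need the
 * parity rewrite; everything else is known to be fine.
 */
static int
pm_region_dirty(const struct parity_map *pm, int region)
{
	return pm->pm_dirty[region / 8] & (1 << (region % 8));
}

With something like that, the post-crash rewrite only has to walk the 
regions that were actually being written when the machine went down.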

Later...

Greg Oster