Subject: Re: kern/30674: RAIDframe should be able to create volumes without parity rewrite
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-bugs
Date: 07/06/2005 16:03:01
The following reply was made to PR kern/30674; it has been noted by GNATS.

From: Greg Oster <oster@cs.usask.ca>
To: Matthias Scheler <tron@zhadum.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/30674: RAIDframe should be able to create volumes without parity rewrite 
Date: Wed, 06 Jul 2005 10:02:35 -0600

 Matthias Scheler writes:
 > On Wed, Jul 06, 2005 at 09:19:19AM -0600, Greg Oster wrote:
 > > Let's address the RAID 1 case first:
 > > If you're just going to build a FFS on it, then one can get away with 
 > > marking the parity as "good" because data will never be read until 
 > > after it has been written.  Fine.
 > 
 > Exactly.
 > 
 > > If the machine crashes or otherwise goes down without marking the
 > > parity as "good", then you are back to square one -- you *HAVE* to
 > > do the parity rebuild at that point,
 > 
 > That is actually another disadvantage of RAIDframe. SVM doesn't manage
 > "parity good" by a single bit. It uses a database which manages it
 > on a per "SVM meta cluster" basis. The result is that Solaris only needs
 > to sync a few MBs after a crash and not the whole volume.
 
 Right.  Something like this is on my TODO list, but I haven't gotten 
 to it yet.
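
For illustration, the per-region bookkeeping described above might look roughly like the sketch below. This is a toy model in Python, not RAIDframe or SVM code: the class name `DirtyRegionMap`, its methods, and the 1 MiB region size are all hypothetical, chosen only to show why a crash would then require resyncing a few regions instead of the whole volume.

```python
class DirtyRegionMap:
    """Toy per-region dirty map (hypothetical; not RAIDframe or SVM code)."""

    def __init__(self, volume_bytes, region_bytes):
        self.region_bytes = region_bytes
        # Number of regions, rounded up so the tail of the volume is covered.
        self.num_regions = -(-volume_bytes // region_bytes)
        self.dirty = set()

    def mark_write(self, offset, length):
        """Record the regions a write touches, before the write is issued."""
        first = offset // self.region_bytes
        last = (offset + length - 1) // self.region_bytes
        self.dirty.update(range(first, last + 1))

    def mark_clean(self, region):
        """Clear a region once all copies are known to be consistent."""
        self.dirty.discard(region)

    def bytes_to_resync(self):
        """Upper bound on the data that must be resynced after a crash."""
        return len(self.dirty) * self.region_bytes


MiB = 1024 * 1024
dmap = DirtyRegionMap(volume_bytes=100 * 1024 * MiB, region_bytes=MiB)
dmap.mark_write(offset=5 * MiB + 10, length=2 * MiB)  # touches regions 5, 6, 7
```

With a single "parity good" bit the same crash would mean resyncing all 100 GiB; with the map, only the regions still marked dirty need attention.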
 
 > > There is, however, also a violation of the Principle of Least Astonishment.
 > 
 > I'm not asking for this to be turned on by default. The Solaris manual page
 > doesn't even recommend it. But it is nice to have that option if you know what
 > you are doing.
 > 
 > > If, for example, the components had random data on them before the 
 > > RAID 1 set was created, and one does two "dd if=/dev/rraid0d | md5" 
 > > with the parity marked as "good" (but not actually synced!) then the two
 > > runs might well yield different results.
 > 
 > That is a very artificial case.
 
 Somewhat artificial, yes :)
 
 > The *really* interesting information is
 > the checksum of the filesystem data on the RAID volume. And that will
 > always match even if the mirror was created without an initial
 > parity rewrite.
 
 Yes.
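
Both points can be seen in a small simulation. The sketch below is a toy model only, not how RAIDframe is implemented: the class `UnsyncedMirror` and its methods are hypothetical, the caller picks which component a read comes from (a real driver makes that choice itself), and SHA-256 stands in for the md5 used in the example above.

```python
import hashlib
import random


class UnsyncedMirror:
    """Toy two-way mirror whose components start with different contents."""

    def __init__(self, num_blocks, block_size, seed):
        rng = random.Random(seed)
        # Each component starts with its own random data (no parity rewrite).
        self.components = [
            [bytes(rng.getrandbits(8) for _ in range(block_size))
             for _ in range(num_blocks)]
            for _ in range(2)
        ]

    def write(self, block, data):
        # A mirror write goes to both components, so that block becomes consistent.
        for comp in self.components:
            comp[block] = data

    def read(self, block, component):
        # Hypothetical: the caller chooses the component to read from.
        return self.components[component][block]

    def checksum(self, blocks, component):
        h = hashlib.sha256()
        for b in blocks:
            h.update(self.read(b, component))
        return h.hexdigest()


m = UnsyncedMirror(num_blocks=8, block_size=16, seed=1)
m.write(2, b"A" * 16)
m.write(3, b"B" * 16)

# Whole-device checksums depend on which component served the reads.
whole_differs = m.checksum(range(8), 0) != m.checksum(range(8), 1)
# Blocks that were written before being read agree on both components.
written_matches = m.checksum([2, 3], 0) == m.checksum([2, 3], 1)
```

A checksum over the whole raw device differs between components, while a checksum restricted to blocks that have been written is the same whichever component is read.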
  
 > > Let's now look at the RAID 5 case:
 > 
 > I already guessed that it is different for RAID 5. So we can just leave
 > that case out of the discussion.
 > 
 > > I've heard the argument a couple of times, but I don't see it buying 
 > > anything other than removing one parity rebuild...
 > 
 > Which might save you hours of waiting and/or slow system performance.
 
 But only the first time... after the first crash, you're back to waiting
 again... And the longer the system is up (or the more rebuilds that are 
 done) the less expensive that one rebuild really is in the life of the 
 system...
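
As a rough back-of-the-envelope illustration of that amortization, assuming (purely for the sake of the arithmetic) that every parity rebuild takes about the same time, skipping only the initial rebuild avoids one rebuild out of all those the system ever performs. The function name below is hypothetical.

```python
def fraction_saved(rebuilds_after_crashes):
    """Share of lifetime rebuild time avoided by skipping only the initial
    rebuild, assuming every rebuild takes the same amount of time."""
    total_rebuilds = 1 + rebuilds_after_crashes
    return 1 / total_rebuilds


no_crashes = fraction_saved(0)    # the initial rebuild is the only one
four_crashes = fraction_saved(4)  # one rebuild avoided out of five
```

So the saving is complete only for a system that never crashes, and it shrinks with every later rebuild.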
 
 Later...
 
 Greg Oster