Subject: Re: RAIDframe: why are only "root" raidsets closed on reboot/halt?
To: Eric S. Hvozda <hvozda@ack.org>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 11/26/2002 09:21:27
"Eric S. Hvozda" writes:
> On Sun, 24 Nov 2002 10:27:14 -0600  Greg Oster wrote:
> > 
> > Hmmm... That shouldn't be the case.  Any RAID set (whether auto-configured
> > or not) that has all open partitions closed will have the parity status bits
> > updated properly...
> 
> Interesting.
> 
> Perhaps I should supply more detail. 

Please :)

> I have built a stripe of
> mirrors.  I have tried striping the mirrors with both RAIDframe
> and ccd (all mirrored components are RAIDframe based, of course).
> 
> Both methods yield the same behavior:  One or more of the mirrors
> of the stripe are "dirty" on restart.
> 
> > If the partitions on those sets get closed, then the corresponding RAID sets
> > will get properly taken care of.  If something keeps a partition open, then
> > the RAID set won't get properly shut down.
> 
> Hmmm, if I am in single user mode and have done a "umount -a"
> ensuring all file systems are silent before the reboot (the root
> file system is its own raidset and always closes as you mention)
> I still get dirty mirrors.
> 
> I note that if I "ccdconfig -U" or "raidctl -u" the stripe first,
> parity will stay clean. 

Right.  And this is the part that is missing for the shutdown/reboot.
Without doing the "ccdconfig -U" or "raidctl -u" on the stripe, the 
underlying components (raid sets themselves) will always remain "in use".
That means that their parity bits will never get updated properly on 
a shutdown.  Something like the following:

 umount -f /dev/ccd0e
 ccdconfig -U 

will need to get run during the shutdown process, preferably before 
/etc/rc.d/raidframe is run.
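
In case it helps, here's a rough (and untested) sketch of what a small 
rc.d-style script for that might look like.  The script name, the REQUIRE 
line, and the ccd0e bits are just guesses; adjust them to whatever 
/etc/rc.d/raidframe actually PROVIDEs on your system and to your devices:

 #!/bin/sh
 #
 # PROVIDE: unstripe
 # REQUIRE: raidframe
 # KEYWORD: shutdown

 . /etc/rc.subr

 name="unstripe"
 start_cmd=":"                  # nothing to do at boot time here
 stop_cmd="unstripe_stop"

 unstripe_stop()
 {
         # Close the file system on the stripe, then unconfigure the ccd.
         # That closes the partitions it holds open on the underlying RAID
         # sets, so their parity status can be written out at shutdown.
         umount -f /dev/ccd0e
         ccdconfig -U
 }

 load_rc_config $name
 run_rc_command "$1"

Since rc.shutdown runs the scripts in reverse rcorder(8) order, requiring 
raidframe should get this one stopped before /etc/rc.d/raidframe on the 
way down.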

> However I will note that in any event I see
> the root raidset closed (as per the console messages) as follows:
> 
> syncing disks... done
> Closing vnode for row: 0 col: 0
> Closing vnode for row: 0 col: 1
> rebooting...
> 
> I never see any of the other sets give the "Closing vnode for row"
> message.  Perhaps the closing of the vnode for the components has
> nothing to do with whether the parity stays clean or not?  I must
> admit I have not delved too deeply into the source yet.

I've taken out that message for -current, but in 1.6 (and before) you should 
see those messages for every RAID set.  If you don't, then that means they 
didn't get unconfigured, and that means that the parity status will not have 
been updated in the component labels.

> So it appears I may be wedging open the raidsets with the stripe
> (as you theorized).  However, I have got to believe there are others
> out there attempting to build complex I/O systems with RAIDframe as
> well, so I believe this is an issue worth looking at.

Hmm... I think every multi-level RAID setup that I've seen so far has had to 
add "custom shutdown/close" stuff to the shutdown procedure.  Basically, the 
"last configured" device needs to be unconfigured first.  Once it's 
unconfigured, then the underlying RAID sets will have had their "last 
partitions closed", and the parity bits will be updated correctly. 
If nothing else, that fact should be noted in the raidctl man-page.
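
For example (device names are made up here; say raid2 is the stripe built 
on top of the mirrors raid0 and raid1), the shutdown-time ordering would be 
roughly:

 umount /dev/raid2e    # close the file system living on the stripe
 raidctl -u raid2      # unconfigure the stripe first; this closes the
                       # partitions it holds open on raid0 and raid1
 raidctl -u raid0      # the mirrors now have no open partitions, so their
 raidctl -u raid1      # parity status gets written out when they go down

(The last two may well get handled by /etc/rc.d/raidframe or by the kernel 
at shutdown anyway; the important bit is that the stripe goes first.)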

> I suppose I could RAID 5 across the two controllers, but I really
> don't want the speed hit.

You could, but you shouldn't need to :)

> It seems that there is a similar kind of thing with umounting all
> filesystems at shutdown in that they have to be umount'd in a
> specific order or parent fs's won't umount (ie /var/mail must umount
> before /var).

Right.
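
Same ordering idea as above: innermost first.  With your example, that would be

 umount /var/mail    # the child file system has to go first
 umount /var         # then its parent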
 
> I'll have to take a closer look at the source...

Later...

Greg Oster