Subject: CVS commit: src/sys/dev/raidframe
To: None <source-changes@NetBSD.org>
From: Greg Oster <oster@netbsd.org>
List: source-changes
Date: 01/04/2004 06:37:16
Module Name:	src
Committed By:	oster
Date:		Sun Jan  4 06:37:16 UTC 2004

Modified Files:
	src/sys/dev/raidframe: raidframevar.h rf_layout.c

Log Message:
As noted by Thierry Deval in a posting to misc@openbsd.org,
rf_DecrAccessesCountState wasn't in the correct spot in
RF_AccessState_e.  Following up on that has resulted in one other
correction.  Changing the ordering of these states is tricky, and
shouldn't be attempted without some thorough analysis.  For the
changes committed, the following analysis is offered:

1) RAIDframe uses a little state machine to take care of building,
executing, and processing the DAGs used to direct IO.

2) The rf_DecrAccessesCountState state is handled by the function
rf_State_DecrAccessCount().  The purpose of this state is to
decrement the number of "accesses-in-flight".

3) rf_Cleanup_State is handled by rf_State_Cleanup().  Its job is to
do general cleanup of DAG arrays and any stripe locks.

4) DefaultStates[] in rf_layout.c indicates that the right spot
for rf_DecrAccessesCountState is just before rf_Cleanup_State.
Analysis of the code for both states indicates that the order doesn't
matter too much, although rf_State_DecrAccessCount() should probably
take place *after* rf_State_Cleanup() to be more correct.

5) Comments in rf_State_ProcessDAG() indicate that the next state
should be rf_Cleanup_State.  However, it attempts to get there by using

 desc->state++;

which actually takes it only to rf_DecrAccessesCountState!  This
turned out to be OK before, since rf_Cleanup_State would follow right
after, and all would be taken care of (albeit in arguably the "less
correct" order).

6) With the current ordering, if we head directly to rf_Cleanup_State
(as we do, for example, if multiple components fail in a RAID 5 set),
then we'll actually miss going through rf_DecrAccessesCountState, and
could end up never being able to reach quiescence!  Perhaps not too
big a deal, given that the RAID set is pretty much toast by the point
at which such a drastic state change happens, but we might as well
have this correct.  (The sketch after this list illustrates both the
table-driven stepping and the missed decrement.)
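
For illustration, here is a stripped-down sketch of the pattern
involved.  This is not the actual RAIDframe code: the enum values,
struct fields, and handler bodies below are simplified stand-ins for
RF_AccessState_e, DefaultStates[], and the rf_State_*() handlers, but
it shows why "desc->state++" advances to whatever entry happens to be
next in the table, and why jumping straight to the cleanup state under
the old ordering skips the access-count decrement:

    #include <stdio.h>

    /*
     * Illustrative state indices -- simplified stand-ins for the
     * members of RF_AccessState_e, in the *old* (pre-commit) order:
     * the decrement state sits just before the cleanup state.
     */
    enum access_state {
        STATE_PROCESS_DAG,         /* execute/process the DAG           */
        STATE_DECR_ACCESS_COUNT,   /* decrement "accesses-in-flight"    */
        STATE_CLEANUP,             /* free DAG arrays, release locks    */
        STATE_LAST
    };

    /* Simplified stand-in for an access descriptor. */
    struct access_desc {
        enum access_state state;   /* current index into the state table */
        int *accs_in_flight;       /* shared count quiescence waits on   */
    };

    /* One handler per state, as with rf_State_Cleanup() and friends. */
    typedef void (*state_fn)(struct access_desc *);

    static void state_process_dag(struct access_desc *d)
    {
        /*
         * The intent (per the comment in rf_State_ProcessDAG()) is
         * "go to the cleanup state", but the code only says "go to
         * whatever is physically next in the table":
         */
        d->state++;
    }

    static void state_decr_access_count(struct access_desc *d)
    {
        (*d->accs_in_flight)--;    /* quiescence needs this to reach 0 */
        d->state++;
    }

    static void state_cleanup(struct access_desc *d)
    {
        /* ... free DAG memory, release stripe locks ... */
        d->state++;
    }

    /*
     * Stand-in for DefaultStates[]: the order of this table *is* the
     * order the states run in when each handler just does state++.
     */
    static const state_fn state_table[] = {
        state_process_dag,
        state_decr_access_count,
        state_cleanup,
    };

    int main(void)
    {
        int in_flight = 1;
        struct access_desc d = { STATE_PROCESS_DAG, &in_flight };

        /*
         * Error-path analogue: jump straight to the cleanup state, as
         * the failed-DAG path jumps to rf_Cleanup_State.  With the old
         * ordering this skips the decrement state entirely, so
         * in_flight never gets back to zero and the set never quiesces.
         */
        d.state = STATE_CLEANUP;
        while (d.state < STATE_LAST)
            state_table[d.state](&d);

        printf("accesses still in flight: %d\n", in_flight); /* prints 1 */
        return 0;
    }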

The changes made are:
1) Since having rf_State_DecrAccessCount() come after
rf_State_Cleanup() is just fine, change rf_layout.c to reflect that
rf_DecrAccessesCountState comes after rf_Cleanup_State (i.e. they swap
positions in the state list).  This means that going to
rf_Cleanup_State after bailing on a failed DAG access will do all the
right things -- the state will get cleaned up, and then the access
counts will get decremented properly.  The comment in
rf_State_ProcessDAG() is now actually correct -- the next state *will*
be rf_Cleanup_State.

2) Move rf_DecrAccessesCountState in RF_AccessState_e to just after
rf_CleanupState.  This puts RF_AccessState_e in sync with
DefaultStates[].  Fortunately, these states are rarely referred to by
name, and so this change ends up being mostly cosmetic -- it really
only fixes cleanup behaviour for the recent "Failed to create a DAG"
changes.
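
In the same illustrative terms as the sketch above (again, these are
stand-ins, not the literal contents of raidframevar.h or rf_layout.c),
the net effect of the two changes is that the enum and the state table
stay in step, with cleanup first and the decrement after it; swapping
these two entries in the earlier sketch makes its error path finish
with zero accesses in flight:

    /* New order: cleanup first, then the access-count decrement ... */
    enum access_state {
        STATE_PROCESS_DAG,
        STATE_CLEANUP,             /* moved up                           */
        STATE_DECR_ACCESS_COUNT,   /* now runs once cleanup has finished */
        STATE_LAST
    };

    /*
     * ... and the table the handlers walk with state++ is reordered to
     * match, so jumping to the cleanup state on a failed DAG access
     * still falls through to the decrement state.
     */
    static const state_fn state_table[] = {
        state_process_dag,
        state_cleanup,
        state_decr_access_count,
    };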


To generate a diff of this commit:
cvs rdiff -r1.5 -r1.6 src/sys/dev/raidframe/raidframevar.h
cvs rdiff -r1.15 -r1.16 src/sys/dev/raidframe/rf_layout.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.