Subject: Re: anyone know if there's a fix for this "malloc with held simple_lock" in RAIDframe bug yet?
To: Greg Oster <oster@cs.usask.ca>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 03/15/2005 14:16:09
[ On Monday, March 14, 2005 at 09:37:39 (-0600), Greg Oster wrote: ]
> Subject: Re: anyone know if there's a fix for this "malloc with held simple_lock" in RAIDframe bug yet? 
>
> The change in rev 1.55 may fix this problem, but IIRC there were 
> quite a few more changes that had to be made before all (?) of the 
> locking issues were sorted out.  (You'll need at least 1.64 and 1.65 
> for this file, and probably a whole mess of other changes for other 
> files in RAIDframeland)
> 
> If you're looking for "the best RAIDframe", might I recommend you 
> use 2.0? :)  (The code in 2.0 is MUCH better than what shipped in 
> 1.6.x)

I'm not in any way prepared to upgrade to 2.0 yet, but using the
RAIDframe code from 2.0 or -current does seem to be a most excellent
idea.

I've done a very quick backport (just enough to get it to compile
cleanly) of yesterday's -current RAIDframe code, and it gets me a heck
of a lot further along:

[console]<@> # raidctl -v -C root/root-raid0.conf raid0
raidlookup on device: /dev/sd9e failed!
raid0: Component /dev/sd1e being configured at col: 0
         Column: 0 Num Columns: 0
         Version: 0 Serial Number: 0 Mod Counter: 0
         Clean: No Status: 0
Number of columns do not match for: /dev/sd1e
/dev/sd1e is not clean!
raid0: Component /dev/sd9e being configured at col: 1
         Column: 0 Num Columns: 0
         Version: 0 Serial Number: 0 Mod Counter: 0
         Clean: No Status: 0
Column out of alignment for: /dev/sd9e
Number of columns do not match for: /dev/sd9e
/dev/sd9e is not clean!
raid0: There were fatal errors
raid0: Fatal errors being ignored.
raid0: RAID Level 1
raid0: Components: /dev/sd1e /dev/sd9e[**FAILED**]
raid0: Total Sectors: 71129600 (34731 MB)
[console]<@> # raidctl -v -I 1412893 raid0  
raid0: no disk label
[console]<@> # raidctl -v -i raid0
raid0: no disk label
Initiating re-write of parity
raid0: Error re-writing parity!
Parity Re-write status:

[console]<@> # raidctl -v -s raid0
raid0: no disk label
Components:
           /dev/sd1e: optimal
           /dev/sd9e: failed
No spares.
Component label for /dev/sd1e:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 1412893, Mod Counter: 7
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 71129600
   RAID Level: 1
   Autoconfig: No
   Root partition: No
   Last configured as: raid0
/dev/sd9e status is: failed.  Skipping label.
Parity status: DIRTY
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
[console]<@> # 


In fact, that's probably exactly where it should be, since /dev/sd9e
does not exist yet: I'm still in the first steps of setting up the root
mirror. :-)
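For anyone following along: the contents of root/root-raid0.conf aren't
shown above. The fragment below is only a reconstruction inferred from
the component label printed by "raidctl -s". That label shows 1 row,
2 columns, no spares, sectPerSU 128, SUsPerPU 1, SUsPerRU 1, RAID level 1,
and a queue size of 100. The fragment is not a copy of the actual file.
It uses the older "numRow numCol numSpare" array syntax that matches the
"Num Rows: 1" field in the label.

```
# Hypothetical reconstruction of root/root-raid0.conf (RAID 1, 2 components)
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd1e
/dev/sd9e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100
```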

Once I get to the point of booting from the mirrored root, I'll send
you my diffs (and if I don't get that far, I'll be asking for help! :-)

-- 
						Greg A. Woods

H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>