Subject: Re: ffs crash after raid reconfiguration
To: Kazushi Marukawa (Jam) <jam@pobox.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 07/21/2001 11:54:33

Kazushi Marukawa writes:
>    On Jul 21,  2:47, Greg Oster wrote:
>    > Subject: Re: ffs crash after raid reconfiguration
>    > 
>    > Ummm...  The autoconfig is not supposed to care what components are on
>    > what drives, and what IDs those drives might be at.  Can you send me
>    > the 'dmesg' from this boot?
> 
> It worked the way you describe a long time ago.  However,
> it is not working that way now, so I was thinking
> something had changed in the code...  But you say it
> should work correctly.  Maybe I misconfigured it?
> 
> In dmesg:
>   biomask ef47 netmask ff67 ttymask ffe7
>   Kernelized RAIDframe activated
>   boot device: wd0
>   root on wd0a dumps on wd0b
>   root file system type: ffs
>   RAIDFRAME: protectedSectors is 64
>   raidlookup on device: /dev/wd4e failed!

Hmmmmm... what did your original raid0.conf file look like?  If you removed a
disk, plus the raidlookup failed on another, that might be why it didn't
autoconfig (i.e. RAIDframe thought it was a 2-component failure).

> RAIDframe seems to try to configure the RAID set
> automatically, so I thought the 'auto-configuration flag' was
> on.  However, I wasn't sure about it.  Probably I should
> run "raidctl -A yes" later...  Ouch, I looked at the status
> of my raid.  It clearly said "Autoconfig: No".  Sorry.
> This is the reason why the renamed drives are not recognized
> correctly.

If you had done the 'raidctl -A yes raid0' at some point, then the Autoconfig
flag shouldn't be getting cleared without you doing a 'raidctl -A no raid0'.
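
One way to check the flag without reading the whole status report is to
filter the 'raidctl -s' output.  This is only a sketch: autoconfig_state is
a made-up helper name, and it assumes the status output carries a line of
the form "Autoconfig: No" (as quoted above) or "Autoconfig: Yes".

```shell
# Hypothetical helper: reads 'raidctl -s' output on stdin and prints the
# value that follows "Autoconfig:".  Assumes the flag sits on its own line.
autoconfig_state() {
    sed -n 's/^[[:space:]]*Autoconfig:[[:space:]]*//p' | head -n 1
}

# Usage (as root, with raid0 configured):
#   raidctl -s raid0 | autoconfig_state
```

If the helper prints "No", the set will not be picked up by the autoconfig
code at boot until 'raidctl -A yes raid0' is run again.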

[snip]
>    > > I guess there may be a problem around the raid
>    > > reconfiguration.  Today and last time, I experienced panics
>    > > when I start the system without reboot after the
>    > > reconfiguration. 
>    > 
>    > Sorry... I'm not sure what you mean here.  Please explain with a bit
>    > more detail.  If the 'raidctl -I 1234' has been done, and
>    > autoconfiguration is turned on, then it shouldn't matter where the
>    > components are and what IDs the drives are at -- the autoconfig code
>    > is supposed to find them, sort out which ones belong to what sets,
>    > and glue the appropriate ones together.  Can you also send me the
>    > output of 'raidctl -s raid0'?  (for whatever state raid0 is in now.)
> 
> I'm not talking about why it is not auto-configured.  Sorry.
> I'm talking about the crash after manual re-configuration.
> The reason for no auto-configuration became clear, as above.
> Thank you for that.  The flag was probably turned off
> after the first crash, and I haven't turned it on since.

Did you turn it off?  It shouldn't get turned off unless you do the explicit
'raidctl -A no raid0'.

> Anyway, what I wanted to say was this.  Both times, RAIDframe
> failed to configure.  I changed the configuration file and
> manually reconfigured it.  I mean that I changed
> /etc/raid0.conf and executed "raidctl -c /etc/raid0.conf
> raid0".  I could have rebooted the machine at this point.
> However, both times I didn't reboot.  I just
> configured, fscked (if necessary), mounted, and
> kept the system running.  Then the system crashed.  So I was
> wondering whether it might work without a crash if I rebooted
> the system.

A reboot isn't necessary.  You should be able to configure/unconfigure 
a RAID set as many times as you'd like without a reboot.
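
A rough sketch of that cycle, using only the documented raidctl flags -u
(unconfigure), -c (configure) and -s (status).  The function name
reconfigure and the RAIDCTL variable are illustrative, not part of raidctl;
the unconfigure step is expected to fail while a file system on the set is
still mounted, so unmount first.

```shell
# RAIDCTL can be overridden (e.g. for a dry run); defaults to the real tool.
RAIDCTL=${RAIDCTL:-raidctl}

# Hypothetical wrapper: unconfigure a set, configure it again from a config
# file, then show its status.  Stops at the first step that fails.
reconfigure() {
    dev=$1
    conf=$2
    "$RAIDCTL" -u "$dev" || return 1
    "$RAIDCTL" -c "$conf" "$dev" || return 1
    "$RAIDCTL" -s "$dev"
}

# Usage (as root, with nothing on raid0 mounted):
#   reconfigure raid0 /etc/raid0.conf
```

Note that this uses '-c', which refuses to configure when the component
labels do not match, rather than '-C', which forces the configuration.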

> On the other hand, there is a possibility of real file
> system corruption, although there are no errors on the other
> drives this time and it crashed just after fsck last
> time.

I'm just wondering why it said that wd4e failed after you removed a drive from 
the system -- if the failed drive was removed, it shouldn't be finding another
failed drive!   You used '-c' (and not '-C') in the  
'raidctl -c /etc/raid0.conf raid0', right?  If the autoconfig didn't work, then
the 'raidctl -c' shouldn't have worked either... 

Also: what were the exact changes you made to /etc/raid0.conf, and can you
send me the full output of 'raidctl -s raid0'?

Later...

Greg Oster