netbsd-help: Re: newbie with RAIDFRAME troubles

Subject: Re: newbie with RAIDFRAME troubles
To: None <plonnie@home.nl>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 07/25/2001 09:42:52
"bjorn" writes:
> Hi,
> I started out thinking it would be nice to configure a raid level 0 device
> but I'm having difficulties in configuring a RAIDFRAME properly. I hope
> someone can help me. I don't know exactly what information to send so here
> is the output from dmesg, a piece of /var/log/messages and the output from
> raidctl -s raid0

For not knowing what info to send, you've done a reasonably good job :)

> ------------------------------------------
[snip]
> RAIDFRAME: protectedSectors is 64
> raid0: Too many different mod counters!
> raid0: Component /dev/sd0e being configured at row: 0 col: 0
>          Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
>          Version: 2 Serial Number: 123456 Mod Counter: 113
>          Clean: No Status: 0
> /dev/sd0e is not clean!
> raid0: Component /dev/sd1e being configured at row: 0 col: 1
>          Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
>          Version: 2 Serial Number: 123456 Mod Counter: 112
>          Clean: No Status: 0
> /dev/sd1e has a different modfication count: 113 112
> /dev/sd1e is not clean!
> raid0: Component /dev/sd2e being configured at row: 0 col: 2
>          Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
>          Version: 2 Serial Number: 123456 Mod Counter: 104
>          Clean: No Status: 0
> /dev/sd2e has a different modfication count: 113 104
> /dev/sd2e is not clean!
> raid0: There were fatal errors
> Closing vnode for row: 0 col: 0
> Closing vnode for row: 0 col: 1
> Closing vnode for row: 0 col: 2
> RAIDFRAME: failed rf_ConfigureDisks with 22

So the array is quite unhappy right now, since sd2e had failed at some point.


> ------------------------------------------
> #piece of the messages
> <<<<<<<
> Jul 25 17:21:42 gonzo /netbsd: raid0: Component /dev/sd2e being configured
> at row: 0 col: 2
> Jul 25 17:21:42 gonzo /netbsd:          Row: 0 Column: 2 Num Rows: 1 Num
> Columns: 3
> Jul 25 17:21:42 gonzo /netbsd:          Version: 2 Serial Number: 123456 Mod
> Counter: 104
> Jul 25 17:21:42 gonzo /netbsd:          Clean: No Status: 0
> Jul 25 17:21:42 gonzo /netbsd: /dev/sd2e has a different modfication count:
> 113 104
> Jul 25 17:21:42 gonzo /netbsd: /dev/sd2e is not clean!
> Jul 25 17:21:42 gonzo /netbsd: raid0: There were fatal errors
> Jul 25 17:21:42 gonzo /netbsd: raid0: Fatal errors being ignored.
> Jul 25 17:21:42 gonzo /netbsd: RAIDFRAME: Configure (RAID Level 0): total
> number of sectors is 53348736 (26049 MB)
> Jul 25 17:21:42 gonzo /netbsd: RAIDFRAME(RAID Level 0): Using 9 floating
> recon bufs with no head sep limit

At the time this configure is done, sd2e has a 'modification counter' of 104.
You don't show what sd0 and sd1 reported, but they should have the same
modification counter value.
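
For comparing the counters without reading them off by eye, here is a
minimal sketch in Python.  It assumes the kernel messages keep the layout
shown in the dmesg output quoted further up; the helper names
(mod_counters, stale_components) are made up for illustration, and the
"anything below the highest counter is stale" rule is only a rough
heuristic, not what RAIDframe itself does internally.

```python
import re

# Sample lines copied from the dmesg output quoted earlier in this message.
DMESG = """\
raid0: Component /dev/sd0e being configured at row: 0 col: 0
         Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
         Version: 2 Serial Number: 123456 Mod Counter: 113
         Clean: No Status: 0
raid0: Component /dev/sd1e being configured at row: 0 col: 1
         Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
         Version: 2 Serial Number: 123456 Mod Counter: 112
         Clean: No Status: 0
raid0: Component /dev/sd2e being configured at row: 0 col: 2
         Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
         Version: 2 Serial Number: 123456 Mod Counter: 104
         Clean: No Status: 0
"""

COMPONENT_RE = re.compile(r"Component (\S+) being configured")
COUNTER_RE = re.compile(r"Mod Counter: (\d+)")


def mod_counters(text):
    """Map each component to the Mod Counter printed after its header line."""
    counters = {}
    current = None
    for line in text.splitlines():
        m = COMPONENT_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = COUNTER_RE.search(line)
        if m and current is not None:
            counters[current] = int(m.group(1))
    return counters


def stale_components(counters):
    """Heuristic: components whose counter is below the highest one seen."""
    newest = max(counters.values())
    return sorted(dev for dev, n in counters.items() if n < newest)


counters = mod_counters(DMESG)
print(counters)
print(stale_components(counters))
```

The component with the lowest counter (sd2e here) is the one that dropped
out of the set earliest.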

> Jul 25 17:43:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
> Jul 25 17:43:32 gonzo /netbsd: siop0: scsi bus reset
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10180 (target 2:0) in reset list
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10000 (target 2:0) in reset list
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10180 (status 2) about to be
> processed
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10000 (status 2) about to be
> processed
> Jul 25 17:43:32 gonzo /netbsd: siop0: target 2 using 16bit transfers
> Jul 25 17:43:32 gonzo /netbsd: siop0: target 2 now synchronous at 20.0Mhz,
> offset 15
> Jul 25 17:44:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
> Jul 25 17:44:32 gonzo /netbsd: siop0: scsi bus reset
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0ea2240 (target 0:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e102c0 (target 2:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e10040 (target 2:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0ea2240 (status 2) about to be
> processed
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e102c0 (status 2) about to be
> processed
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e10040 (status 2) about to be
> processed

These sorts of errors can't be helping anything.  If you look in earlier logs,
you may find the error that caused sd2e to be marked as failed.
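
Digging through older logs by hand gets tedious, so here is a rough
sketch in Python of one way to count SCSI command timeouts per disk.  It
assumes the message format matches the lines quoted above; the function
name timeouts_by_disk is hypothetical, and the sample text stands in for
the real contents of /var/log/messages*.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt quoted above.
MESSAGES = """\
Jul 25 17:43:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
Jul 25 17:43:32 gonzo /netbsd: siop0: scsi bus reset
Jul 25 17:44:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
Jul 25 17:44:32 gonzo /netbsd: siop0: scsi bus reset
"""

# Matches e.g. "/netbsd: sd1(siop0:2:0): command timeout" and captures "sd1".
TIMEOUT_RE = re.compile(r"/netbsd: (\w+)\([^)]*\): command timeout")


def timeouts_by_disk(text):
    """Count 'command timeout' messages for each disk named in the log."""
    hits = Counter()
    for line in text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits


hits = timeouts_by_disk(MESSAGES)
print(dict(hits))
```

To use it on real logs, read each file's contents and pass the text to
timeouts_by_disk() in place of the sample.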

> ------------------------------------------
> #raidctl -s raid0
> Components:
>            /dev/sd0e: optimal
>            /dev/sd1e: failed
>            /dev/sd2e: optimal
> No spares.
> Component label for /dev/sd0e:
>    Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
>    Version: 2 Serial Number: 123456 Mod Counter: 123
>    Clean: No Status: 0
>    sectPerSU: 64 SUsPerPU: 1 SUsPerRU: 1
>    RAID Level: 0  blocksize: 512 numBlocks: 17782912
>    Autoconfig: No
>    Root partition: No
>    Last configured as: raid0
> /dev/sd1e status is: failed.  Skipping label.
> Component label for /dev/sd2e:
>    Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
>    Version: 2 Serial Number: 123456 Mod Counter: 123
>    Clean: No Status: 0
>    sectPerSU: 64 SUsPerPU: 1 SUsPerRU: 1
>    RAID Level: 0  blocksize: 512 numBlocks: 17782912
>    Autoconfig: No
>    Root partition: No
>    Last configured as: raid0
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> ------------------------------------------

This RAID set is quite unhappy too, since a RAID 0 cannot tolerate any failures.
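
To make that rule concrete, here is a small Python sketch that reads the
"Components:" part of the 'raidctl -s' output and checks whether every
component is optimal.  It assumes the output layout shown above; the
names component_states and raid0_usable are invented for this example.

```python
import re

# Sample copied from the 'raidctl -s raid0' output quoted above.
STATUS = """\
Components:
           /dev/sd0e: optimal
           /dev/sd1e: failed
           /dev/sd2e: optimal
No spares.
"""

# An indented "/dev/xxx: state" line and nothing else on the line.
COMPONENT_RE = re.compile(r"^\s*(/dev/\S+): (\w+)\s*$")


def component_states(text):
    """Map each component device to the state raidctl reported for it."""
    states = {}
    for line in text.splitlines():
        m = COMPONENT_RE.match(line)
        if m:
            states[m.group(1)] = m.group(2)
    return states


def raid0_usable(states):
    """RAID 0 has no redundancy, so every component must be optimal."""
    return all(state == "optimal" for state in states.values())


states = component_states(STATUS)
print(states)
print(raid0_usable(states))
```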


> however when I first did raidctl -C /etc/raid0.conf raid0 all /dev/sd?e
> were optimal

Right.  By the looks of things, you've done a 'raidctl -I 123456 raid0', right?
If that was done after the most recent 'raidctl -C', then you'll need to look
through /var/log/messages* to see what caused the IO to fail.  If you've been
playing with various configurations, and forgot to do the 'raidctl -I 123456
raid0' after doing the 'raidctl -C', then you'll need to rebuild the set and
remember the 'raidctl -I' :)
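
As a reminder of the order, here is a dry-run sketch in Python that only
prints the commands rather than running them.  The config path, serial
number and device name are the ones from this thread; the function name
setup_commands is hypothetical, and actually running the commands needs
root on the NetBSD box.

```python
def setup_commands(conf="/etc/raid0.conf", serial=123456, dev="raid0"):
    """Return the raidctl invocations in the order discussed above."""
    return [
        ["raidctl", "-C", conf, dev],         # force the configuration
        ["raidctl", "-I", str(serial), dev],  # initialize component labels
        ["raidctl", "-s", dev],               # check the resulting status
    ]


# Dry run: print each command instead of executing it.
for argv in setup_commands():
    print(" ".join(argv))
```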

Later...

Greg Oster