Subject: Re: newbie with RAIDFRAME troubles
To: None <plonnie@home.nl>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-help
Date: 07/25/2001 09:42:52
"bjorn" writes:
> Hi,
> I thought it would be nice to configure a RAID level 0 device, but I'm
> having difficulties configuring RAIDframe properly. I hope someone can
> help me. I don't know exactly what information to send, so here is the
> output from dmesg, a piece of /var/log/messages and the output from
> raidctl -s raid0
For not knowing what info to send, you've done a reasonably good job :)
> ------------------------------------------
[snip]
> RAIDFRAME: protectedSectors is 64
> raid0: Too many different mod counters!
> raid0: Component /dev/sd0e being configured at row: 0 col: 0
> Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
> Version: 2 Serial Number: 123456 Mod Counter: 113
> Clean: No Status: 0
> /dev/sd0e is not clean!
> raid0: Component /dev/sd1e being configured at row: 0 col: 1
> Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
> Version: 2 Serial Number: 123456 Mod Counter: 112
> Clean: No Status: 0
> /dev/sd1e has a different modfication count: 113 112
> /dev/sd1e is not clean!
> raid0: Component /dev/sd2e being configured at row: 0 col: 2
> Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
> Version: 2 Serial Number: 123456 Mod Counter: 104
> Clean: No Status: 0
> /dev/sd2e has a different modfication count: 113 104
> /dev/sd2e is not clean!
> raid0: There were fatal errors
> Closing vnode for row: 0 col: 0
> Closing vnode for row: 0 col: 1
> Closing vnode for row: 0 col: 2
> RAIDFRAME: failed rf_ConfigureDisks with 22
So the array is quite unhappy right now, since sd2e had failed at some
point.
> ------------------------------------------
> #piece of the messages
> <<<<<<<
> Jul 25 17:21:42 gonzo /netbsd: raid0: Component /dev/sd2e being configured at row: 0 col: 2
> Jul 25 17:21:42 gonzo /netbsd:    Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
> Jul 25 17:21:42 gonzo /netbsd:    Version: 2 Serial Number: 123456 Mod Counter: 104
> Jul 25 17:21:42 gonzo /netbsd:    Clean: No Status: 0
> Jul 25 17:21:42 gonzo /netbsd: /dev/sd2e has a different modfication count: 113 104
> Jul 25 17:21:42 gonzo /netbsd: /dev/sd2e is not clean!
> Jul 25 17:21:42 gonzo /netbsd: raid0: There were fatal errors
> Jul 25 17:21:42 gonzo /netbsd: raid0: Fatal errors being ignored.
> Jul 25 17:21:42 gonzo /netbsd: RAIDFRAME: Configure (RAID Level 0): total number of sectors is 53348736 (26049 MB)
> Jul 25 17:21:42 gonzo /netbsd: RAIDFRAME(RAID Level 0): Using 9 floating recon bufs with no head sep limit
At the time this configure is done, sd2e has a 'modification counter' of
104.  You don't show what sd0 and sd1 reported, but they should have the
same modification counter value.
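If your raidctl supports it, you can also compare the labels directly by
asking for the component label of each member (the device names below
are just the ones from your config; see raidctl(8) on your system):

    # print the component label stored on each member of raid0
    raidctl -g /dev/sd0e raid0
    raidctl -g /dev/sd1e raid0
    raidctl -g /dev/sd2e raid0

On a healthy set all three should show the same Serial Number and Mod
Counter.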
> Jul 25 17:43:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
> Jul 25 17:43:32 gonzo /netbsd: siop0: scsi bus reset
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10180 (target 2:0) in reset list
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10000 (target 2:0) in reset list
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10180 (status 2) about to be processed
> Jul 25 17:43:32 gonzo /netbsd: cmd 0xc0e10000 (status 2) about to be processed
> Jul 25 17:43:32 gonzo /netbsd: siop0: target 2 using 16bit transfers
> Jul 25 17:43:32 gonzo /netbsd: siop0: target 2 now synchronous at 20.0Mhz, offset 15
> Jul 25 17:44:32 gonzo /netbsd: sd1(siop0:2:0): command timeout
> Jul 25 17:44:32 gonzo /netbsd: siop0: scsi bus reset
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0ea2240 (target 0:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e102c0 (target 2:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e10040 (target 2:0) in reset list
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0ea2240 (status 2) about to be processed
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e102c0 (status 2) about to be processed
> Jul 25 17:44:32 gonzo /netbsd: cmd 0xc0e10040 (status 2) about to be processed
These sorts of errors can't be helping anything.  If you look in earlier
logs, you may find the error that caused sd2e to be marked as failed.
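One quick way to hunt for that (just a sketch -- adjust the paths to
however your logs get rotated and compressed):

    # look for earlier errors involving sd2 in current and rotated logs
    grep sd2 /var/log/messages
    gzcat /var/log/messages.*.gz | grep sd2

Any SCSI errors or timeouts against sd2 from around the time the set
went bad would be the interesting bit.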
> ------------------------------------------
> #raidctl -s raid0
> Components:
> /dev/sd0e: optimal
> /dev/sd1e: failed
> /dev/sd2e: optimal
> No spares.
> Component label for /dev/sd0e:
> Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
> Version: 2 Serial Number: 123456 Mod Counter: 123
> Clean: No Status: 0
> sectPerSU: 64 SUsPerPU: 1 SUsPerRU: 1
> RAID Level: 0 blocksize: 512 numBlocks: 17782912
> Autoconfig: No
> Root partition: No
> Last configured as: raid0
> /dev/sd1e status is: failed. Skipping label.
> Component label for /dev/sd2e:
> Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
> Version: 2 Serial Number: 123456 Mod Counter: 123
> Clean: No Status: 0
> sectPerSU: 64 SUsPerPU: 1 SUsPerRU: 1
> RAID Level: 0 blocksize: 512 numBlocks: 17782912
> Autoconfig: No
> Root partition: No
> Last configured as: raid0
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> ------------------------------------------
This RAID set is quite unhappy too, since a RAID 0 cannot tolerate any
failures.
> however when I first did raidctl -C /etc/raid0.conf raid0, all /dev/sd?e
> were optimal
Right.  By the looks of things, you've done a 'raidctl -I 123456 raid0',
right?  If that was done after the most recent 'raidctl -C', then you'll
need to look through /var/log/messages* to see what caused the IO to
fail.  If you've been playing with various configurations, and forgot to
do the 'raidctl -I 123456 raid0' after doing the 'raidctl -C', then
you'll need to rebuild the set and remember the 'raidctl -I' :)
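If it does come to rebuilding, the sequence is roughly the following
(only a sketch -- the config below just mirrors what your 'raidctl -s'
output shows: 1 row, 3 components, 64 sectors per stripe unit, RAID
level 0, and 123456 is the serial number you already used):

    # /etc/raid0.conf
    START array
    # numRow numCol numSpare
    1 3 0

    START disks
    /dev/sd0e
    /dev/sd1e
    /dev/sd2e

    START layout
    # sectPerSU SUsPerPU SUsPerRU RAID_level
    64 1 1 0

    START queue
    fifo 100

and then:

    raidctl -C /etc/raid0.conf raid0   # configure the set
    raidctl -I 123456 raid0            # write component labels; any serial number will do
    raidctl -s raid0                   # all components should now show as optimal

After that, disklabel and newfs the raid0 device as usual -- which, of
course, wipes whatever was on the set.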
Later...
Greg Oster