Subject: Re: raidframe confused about status of components
To: grant beattie <grant@NetBSD.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 12/26/2003 17:18:54
grant beattie writes:
> hi,
> 
> after my -current/i386 desktop hung for the 6th time in 3 days (serial
> console, break to ddb impossible, uptime < 2 minutes after the
> previous hard hang), raidframe was confused about the status of my
> mirror components:
> 
> Hosed component: /dev/wd6a

Do you still have the lines that came before this?  They may tell you
the story...

> raid0: Component /dev/wd4a being configured at row: 0 col: 0
>          Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
>          Version: 2 Serial Number: 444444 Mod Counter: 1026742437
>          Clean: No Status: 0
> /dev/wd4a is not clean!
> raid0: Ignoring /dev/wd6a
> raid0: RAID Level 1
> raid0: Components: /dev/wd4a /dev/wd6a[**FAILED**]
> raid0: Total Sectors: 156301312 (76319 MB)
> raid0: Error re-writing parity!
> 
> # /sbin/raidctl -p raid0
> /dev/raid0d: Parity status: DIRTY
> # /sbin/raidctl -P raid0
> /dev/raid0d: Parity status: DIRTY
> /dev/raid0d: Initiating re-write of parity
> raid0: Error re-writing parity!
> /dev/raid0d: Parity Re-write complete
> 
> I had to run 'raidctl -R /dev/wd6a raid0' to beat it into submission.
> 
> this shouldn't have been necessary, should it?

Yes, it was.  Looking at the 64K chunks you posted, the "modification
counters" were 0x3d32d8c2 and 0x3d32d8a3 for the two components.  If a
component's modification counter doesn't match that of the majority of
the other component labels (but all other items in its component label
indicate that it belongs to the set), then the component is brought
into the set, but as a failed component.  In the case of RAID 1, the
component with the highest modification counter is considered to be
the "most up-to-date" component.
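
To make the RAID 1 rule concrete, here is a rough sketch in Python.
It is only an illustration, not RAIDframe's actual code: the Component
class, the configure_mirror() function and the names "compA" and
"compB" are made up for the example, and I'm not saying which counter
value belonged to which of your disks.  Only the two counter values and
the serial number come from what you posted.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for the fields of a component label."""
    name: str
    serial: int
    mod_counter: int
    failed: bool = False


def configure_mirror(components):
    """Apply the RAID 1 rule sketched above: the component with the
    highest modification counter is treated as most up-to-date, and any
    component with a lower counter joins the set as failed."""
    newest = max(c.mod_counter for c in components)
    for c in components:
        c.failed = c.mod_counter != newest
    return components


# Placeholder names; the counters are the two values from the labels.
parts = configure_mirror([
    Component("compA", 444444, 0x3d32d8c2),
    Component("compB", 444444, 0x3d32d8a3),
])

for c in parts:
    state = "FAILED" if c.failed else "optimal"
    print(c.name, hex(c.mod_counter), state)
```

With those two values, the component carrying 0x3d32d8a3 is the one
that ends up marked as failed in this sketch, since its counter is the
lower of the two.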

Why the "mod counters" didn't match is anyone's guess -- if
you're seeing hard hangs, an inopportune hang while the mod counters
are being updated could easily cause this sort of problem.

Later...

Greg Oster