Strange problem with raidframe under NetBSD-5.1

To: Greg Oster <oster%cs.usask.ca@localhost>, Edgar Fuß <ef%math.uni-bonn.de@localhost>
Subject: Strange problem with raidframe under NetBSD-5.1
From: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
Date: Tue, 12 Jun 2012 14:44:55 -0700

        Hello.  I've just encountered a strange problem with raidframe under
NetBSD-5.1 that I can't immediately explain.

        This machine has been running a raid set since 2007.  The raid set was
originally constructed under NetBSD-3.  For the past year, it's been
running 5.0_stable with sources from July 2009 or so without a problem.
Last night, I installed NetBSD-5.1 with sources from May 23 2012 or so.
Now, the raid0 set fails the first component with an i/o error, with no
corresponding disk errors underneath.
Trying to reconstruct to the failed component also fails with an error of
22, invalid argument.  Looking at the dmesg output compared with the output
of raidctl -s reveals the problem.  The size of the raid in the dmesg
output is bogus, and, if the raid driver tries to write as many blocks as
is reported by the configuration output, it will surely fail as it does. 
However, raidctl -g /dev/wd0a looks ok and the underlying disk label on
/dev/wd0a looks ok as well.
        Where does the raid driver get the numbers it reports on bootup?
Also, there is a second raid set on this machine, the second half of the
same two drives, which was constructed at the same time.  It works fine
with the new code.

        Below is the output of the boot sequence before the upgrade, and then
the boot sequence after the upgrade.  Below that is the output of raidctl
-s raid0 and raidctl -g /dev/wd0a raid0.
        It looks to me like something is not zeroed out in the component label
that should be, and some change in the raid code is no longer ignoring the
noise in the component label.
Any ideas?

-thanks
-Brian

dmesg boot from NetBSD-5.0_stable from July 2009 sources.

raid0: RAID Level 1
raid0: Components: /dev/wd0a /dev/wd1a
raid0: Total Sectors: 268435264 (131071 MB)

Dmesg boot from NetBSD-5.1_stable from May 23 2012 sources

raid0: RAID Level 1
raid0: Components: /dev/wd0a[**FAILED**] /dev/wd1a
raid0: Total Sectors: 39213563641016896 (19147247871590 MB)
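
To make the mismatch between the two boot messages concrete, here is a
minimal sketch that redoes the sectors-to-MB arithmetic and compares each
reported total against the size of the physical disk.  It assumes the
512-byte blocksize shown in the component labels and the "total sectors"
figure from the disklabel output further down; the function and variable
names are only illustrative and do not come from the raid driver.

```python
# Sanity check of the "Total Sectors" values printed at boot.
# Assumption: 512-byte sectors, as reported in the component labels.
SECTOR_BYTES = 512

def sectors_to_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole megabytes (1 MB = 1024 * 1024 bytes)."""
    return sectors * sector_bytes // (1024 * 1024)

old_total = 268435264            # dmesg, 5.0_stable (July 2009 sources)
new_total = 39213563641016896    # dmesg, 5.1 (May 23 2012 sources)
disk_total = 586072368           # "total sectors" from disklabel for wd0

for name, total in (("5.0_stable", old_total), ("5.1", new_total)):
    verdict = "fits on the disk" if total <= disk_total else "exceeds the disk"
    print(f"{name}: {total} sectors = {sectors_to_mb(total)} MB, {verdict}")
```

Both MB figures in the dmesg output follow from the sector counts, so the
bad number is the sector count itself, not the way it is printed.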

Output from raidctl -s raid0 and raidctl -g /dev/wd0a raid0
(From NetBSD-5.1 sources of May 23, 2012)

raidctl -s raid0
Components:
           /dev/wd0a: failed
           /dev/wd1a: optimal
No spares.
/dev/wd0a status is: failed.  Skipping label.
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2007050100, Mod Counter: 1580164
   Clean: No, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 268435264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

raidctl -g /dev/wd0a raid0
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2007050100, Mod Counter: 1580148
   Clean: Yes, Status: 0
   sectPerSU: 64, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 268435264
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0

Output from disklabel /dev/rwd0d

# /dev/rwd0d:
type: unknown
disk: st3300631as_r
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 581421
total sectors: 586072368
rpm: 7200
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

5 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 268435329        63       RAID                     # (Cyl.      0*- 266304*)
 c: 586072305        63     unused      0     0        # (Cyl.      0*- 581420)
 d: 586072368         0     unused      0     0        # (Cyl.      0 - 581420)
 e: 317636975 268435393       RAID                     # (Cyl. 266304*- 581420)
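
As a cross-check that the disklabel is consistent with the component
labels, the following sketch uses only the numbers shown above.  It checks
that the two RAID partitions do not overlap, that they end within the
disk, and that the numBlocks value from the component label fits inside
partition a.  The names are illustrative, and nothing here models what the
raid driver itself does at configuration time.

```python
# Consistency check of the disklabel against the component label.
# All numbers are copied from the disklabel and raidctl output.
disk_total = 586072368           # total sectors on wd0
label_num_blocks = 268435264     # numBlocks from the raid0 component labels

# partition name -> (size in sectors, starting offset in sectors)
partitions = {
    "a": (268435329, 63),
    "e": (317636975, 268435393),
}

def end_sector(name):
    """Return the first sector past the end of the named partition."""
    size, offset = partitions[name]
    return offset + size

no_overlap = end_sector("a") <= partitions["e"][1]
within_disk = end_sector("e") <= disk_total
label_fits = label_num_blocks <= partitions["a"][0]
slack = partitions["a"][0] - label_num_blocks   # sectors of a not in numBlocks

print("a and e do not overlap:", no_overlap)
print("e ends within the disk:", within_disk)
print("numBlocks fits in a:", label_fits, f"({slack} sectors to spare)")
```

All three checks pass with these numbers, which matches the observation
that the disklabel and the raidctl -g output both look ok.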

Follow-Ups:
- Re: Strange problem with raidframe under NetBSD-5.1
  - From: Greg Oster
- Re: Strange problem with raidframe under NetBSD-5.1
  - From: Edgar Fuß

References:
- Re: RAIDframe performance vs. stripe size
  - From: Greg Oster
