tech-kern archive


re: Strange problem with raidframe under NetBSD-5.1



> buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow) wrote:
> 
> >     Hello.  I've just encountered a strange problem with
> > raidframe under NetBSD-5.1 that I can't immediately explain.
> > 
> >     This machine has been running a raid set since 2007.  The raid
> > set was originally constructed under NetBSD-3.  For the past year,
> > it's been running 5.0_stable with sources from 
> > July 2009 or so without a problem.  Last night, I installed
> > NetBSD-5.1 with sources from May 23 2012 or so.  Now, the raid0 set
> > fails the first component with an i/o error with no corresponding
> > disk errors underneath. Trying to reconstruct to the failed
> > component also fails with an error of 22, invalid argument.  Looking
> > at the dmesg output compared with the output of raidctl -s reveals
> > the problem.  The size of the raid in the dmesg output is bogus, and,
> > if the raid driver tries to write as many blocks as is reported by
> > the configuration output, it will surely fail as it does. However,
> > raidctl -g /dev/wd0a looks ok and the underlying disk label
> > on /dev/wd0a looks ok as well. Where does the raid driver get the
> > numbers it reports on bootup? Also, there is a second raid set on
> > this machine, the second half of the same two drives, which was
> > constructed at the same time.  It works fine with the new code.
> > 
> >     Below is the output of the boot sequence before the upgrade,
> > and then the boot sequence after the upgrade.  Below that is the
> > output of raidctl -s raid0 and raidctl -g /dev/wd0a raid0.
> >     It looks to me like something is not zero'd out in the
> > component label that should be, but some change in the raid code is
> > no longer ignoring the noise in the component label.
> 
> Correct.
> 
> > Any ideas?
> 
> There was some code added a while back to handle components whose sizes
> were larger than 32-bit.  But 5.1_stable should have the code to handle
> those 'bogus' values in the component label and do the appropriate
> thing (see rf_fix_old_label_size in rf_netbsdkintf.c version
> 1.250.4.11, for example).
> 
> What is your code rev for src/sys/dev/raidframe/rf_netbsdkintf.c ?

looks like netbsd-5 is missing this change:

revision 1.284
date: 2011/03/18 23:53:26;  author: mrg;  state: Exp;  lines: +27 -11
apply the fix_label hack to partitionSizeHi as well.  it's needed there.
to do so, move the call to fix the label inside of rf_reasonable_label()
itself, so we can fix the partition sizes before calling
rf_component_label_partitionsize() itself.

fixes the failure mode where i had garbage not in numBlocksHi but in
partitionSizeHi, and the check against rf_component_label_partitionsize()
would fail and my raid would not auto-configure.
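
to illustrate the failure mode described in the log message above, here is a
minimal sketch in C.  it is not the code from rf_netbsdkintf.c: the struct,
field and function names (demo_label, demo_partition_size, demo_fix_label)
are hypothetical, and the "size larger than the disk means the high word is
garbage" test is an assumption made for illustration only.  it shows how a
64-bit size assembled from two 32-bit label fields becomes bogus when the
high word was never zeroed by older code, and how clearing that word before
the size is checked gives a sane value again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a component label.  Only the two
 * halves of the partition size are modelled here. */
struct demo_label {
	uint32_t partition_size;	/* low 32 bits of the size in sectors */
	uint32_t partition_size_hi;	/* high 32 bits; old labels may hold noise */
};

/* Combine the two 32-bit halves into one 64-bit sector count. */
static uint64_t
demo_partition_size(const struct demo_label *l)
{
	return ((uint64_t)l->partition_size_hi << 32) | l->partition_size;
}

/* Illustrative fix-up (assumed heuristic): if the high word is set and the
 * combined size cannot fit on the underlying disk, treat the high word as
 * garbage and clear it.  Returns 1 if the label was changed, 0 otherwise. */
static int
demo_fix_label(struct demo_label *l, uint64_t disk_sectors)
{
	assert(l != 0);

	if (l->partition_size_hi != 0 &&
	    demo_partition_size(l) > disk_sectors) {
		l->partition_size_hi = 0;
		return 1;
	}
	return 0;
}
```

the point of the ordering in the commit is the same as in this sketch: the
fix-up has to run before anything compares the combined size, otherwise the
comparison sees the garbage and rejects the label.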



.mrg.

