Re: raidframe fun

To: Patrick Welche <prlw1%cam.ac.uk@localhost>, Greg Oster <oster%cs.usask.ca@localhost>
Subject: Re: raidframe fun
From: Brian Buhrow <buhrow%nfbcal.org@localhost>
Date: Sun, 18 Aug 2013 08:17:20 -0700

        Hello.  I believe I have reliably reproduced this behavior with the
following steps:

1.  Take a working raid1 set and deliberately fail one of the components.

2.  Take the broken component and create it as part of a new raid1 raid
set.

3.  Reboot.  

Notes:  Before step 1, you had a working raid1 raid set with autoconfigure
enabled.  

As part of step 1, you did not turn off autoconfigure.

You did not explicitly do anything with autoconfigure in step 2, so it
remained enabled on the new raid set.

You did not perform raidctl -i as part of step 2.
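
For anyone who wants to try this, the steps above might look roughly like
the sketch below.  The device names (raid0, raid1, /dev/wd1a) and the
config file path (/etc/raid1.conf) are placeholders I made up for
illustration, not the ones from my machine, and the post does not say
whether component labels were re-initialized in step 2, so that is left
out.  The script only prints the commands unless RUN=1 is set, because
running them for real will break a working raid set.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps; prints commands by default.
# Assumed (hypothetical) names: raid0, raid1, /dev/wd1a, /etc/raid1.conf.

OLD_SET=${OLD_SET:-raid0}            # existing raid1 set, autoconfigure on
NEW_SET=${NEW_SET:-raid1}            # new raid1 set to be created
COMPONENT=${COMPONENT:-/dev/wd1a}    # component that gets failed and reused
NEW_CONF=${NEW_CONF:-/etc/raid1.conf}

# Execute the command only when RUN=1; otherwise just show it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

repro() {
    # Step 1: fail one component.  Autoconfigure is NOT turned off
    # (no "raidctl -A no" on the old set).
    run raidctl -f "$COMPONENT" "$OLD_SET"

    # Step 2: force-configure a new set that includes the failed
    # component.  No "raidctl -A ..." and no "raidctl -i" are run here.
    run raidctl -C "$NEW_CONF" "$NEW_SET"

    # Step 3: reboot.
    run shutdown -r now
}

repro
```

The point of the sketch is what is missing: nothing touches autoconfigure
on either set, and parity is never initialized on the new one.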

        That causes the two separate raid components to come up as if they
were 2 identical raid sets that couldn't find each other.  In my case, the
second raid always comes up as raid7 and I don't know why that is.
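
One way to look at the resulting state is to dump the status of each set
and the component label of each disk, then compare the labels side by
side.  This is only a sketch: raid0, raid7, /dev/wd0a and /dev/wd1a are
assumed names, and the commands are printed rather than run unless RUN=1
is set.

```shell
#!/bin/sh
# Dry-run sketch: show the status and a component label for a raid set.
# Set and component names passed below are hypothetical examples.

# Execute the command only when RUN=1; otherwise just show it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# inspect_set SET COMPONENT
inspect_set() {
    run raidctl -s "$1"          # overall status of the set
    run raidctl -g "$2" "$1"     # component label as stored on that disk
}

inspect_set raid0 /dev/wd0a
inspect_set raid7 /dev/wd1a
```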

On Aug 16, 11:03pm, Patrick Welche wrote:
} Subject: Re: raidframe fun
} On Fri, Aug 16, 2013 at 03:57:23PM -0600, Greg Oster wrote:
} > On Fri, 16 Aug 2013 22:30:58 +0100
} > Patrick Welche <prlw1%cam.ac.uk@localhost> wrote:
} > 
} > > On Fri, Aug 16, 2013 at 03:22:28PM -0600, Greg Oster wrote:
} > > > > 
} > > > > They didn't quite find each other again...
} > > > 
} > > > Right.. so the question is why?  This is not supposed to happen (and
} > > > the only way I've ever seen it happen before is if a User gets
} > > > playing tricks with the raid devices and attempts to re-combine
} > > > components after they've been configured to different RAID sets...)
} > > 
} > > So strictly those 2 disks were the only 2 disks in a functioning
} > > NetBSD box, they were then added to another box which also had a
} > > raid array, at which point I had kern/38241 _kernel_lock: spinout
} > > fun, and then loads of reboots failing to get kgdb to work.  It is
} > > only just now after a boot -1 that I actually got past the kernel
} > > panic, and this is what I saw. I haven't been in a position to play
} > > tricks yet... There is an outside chance that they may have met
} > > seatools, but I think that those were other disks...
} > > 
} > > BTW I don't think the data on those disks is worth worrying about,
} > > but I'm happy to do a bit of debugging with you if you think there
} > > is something to be fixed...
} > 
} > I don't think seatools would have done anything... what likely happened
} > was that the 'other box' with a RAID array likely had a raid0... that
} > would mean that the 'newly introduced disks', which also likely had
} > been raid0 at last configuration, would have auto-configured to raid9
} > (or whatever the highest raid available might have been).  If only one
} > of them got configured, and somehow the label got written back out with
} > the 'new raid id', then the raid ids would get out-of-sync and that
} > would make raidframe think that they were from different raid sets.
} > You probably would have been ok had both of them gotten re-configured,
} > but it's my guess that only one got done before the panic, at which
} > point things are badly messed up.
} 
} It just occurred to me that both autoconfigured root raid sets had
} to have been correctly identified for raidframe to RB_ASKNAME to
} ask which should be booted from to get to the "boot -a" condition
} for the panic. So yes, first time around all must have been fine,
} then kern/38241 caused trouble...
} 
} Thanks!
} 
} Patrick
>-- End of excerpt from Patrick Welche
