Current-Users archive


Re: raidframe fun



On Sun, 18 Aug 2013 08:17:20 -0700
Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:

>       hello.  I believe I have reliably reproduced this behavior
> with the following steps:
> 
> 1.  Take a working raid1 set and deliberately fail one of the
> components.
> 
> 2.  Take the broken component and create it as part of a new raid1
> raid set.
> 
> 3.  Reboot.  
> 
> Notes:  Before step 1, you had a working raid1 raid set with
> autoconfigure enabled.  
> 
> As part of step 1, you did not turn off autoconfigure.
> 
> You did not explicitly do anything with autoconfigure in step 2, so it
> remained enabled on the new raid set.
> 
> You did not perform raidctl -i as part of step 2.
> 
>       That causes the two separate raid components to come up as if
> they were 2 identical raid sets that couldn't find each other.  In my
> case, the second raid always comes up as raid7 and I don't know why
> that is. 

Both of those RAID sets will be trying to be raid0 (or whatever), but
only one of them can be.  The one that doesn't become raid0 tries to
configure itself anyway, but at a likely-unused RAID id...
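The unit-selection behaviour described above can be sketched as a small Python model. This is a simplified illustration, not RAIDframe's kernel code. The number of raid units (`NUM_UNITS = 8`, i.e. raid0 through raid7) and the helpers `pick_unit` and `autoconfigure` are hypothetical. The fallback to the highest free unit follows the "raid9 (or whatever the highest raid available might have been)" remark further down in this thread.

```python
# Simplified, hypothetical model of RAIDframe autoconfiguration unit selection.
# Assumption: the kernel has NUM_UNITS raid pseudo-devices (raid0..raid7).
NUM_UNITS = 8


def pick_unit(last_unit, in_use):
    """Return the unit an autoconfigured set ends up on.

    A set first tries the unit recorded in its component labels
    (last_unit). If that unit is already taken, it falls back to the
    highest-numbered free unit. Returns None if no unit is free.
    """
    if 0 <= last_unit < NUM_UNITS and last_unit not in in_use:
        return last_unit
    for unit in range(NUM_UNITS - 1, -1, -1):
        if unit not in in_use:
            return unit
    return None


def autoconfigure(sets):
    """Configure each (name, last_unit) pair in order; map name -> 'raidN'."""
    in_use = set()
    result = {}
    for name, last_unit in sets:
        unit = pick_unit(last_unit, in_use)
        if unit is not None:
            in_use.add(unit)
            result[name] = f"raid{unit}"
    return result


# Two sets whose labels both say they were last configured as raid0.
print(autoconfigure([("set A", 0), ("set B", 0)]))
# {'set A': 'raid0', 'set B': 'raid7'}
```

With eight units the set that loses the race for raid0 lands on raid7, which is consistent with what Brian sees. With a different number of units the fallback unit changes accordingly.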

Later...

Greg Oster

> On Aug 16, 11:03pm, Patrick Welche wrote:
> } Subject: Re: raidframe fun
> } On Fri, Aug 16, 2013 at 03:57:23PM -0600, Greg Oster wrote:
> } > On Fri, 16 Aug 2013 22:30:58 +0100
> } > Patrick Welche <prlw1%cam.ac.uk@localhost> wrote:
> } > 
> } > > On Fri, Aug 16, 2013 at 03:22:28PM -0600, Greg Oster wrote:
> } > > > > 
> } > > > > They didn't quite find each other again...
> } > > > 
> } > > > Right.. so the question is why?  This is not supposed to
> } > > > happen (and the only way I've ever seen it happen before is
> } > > > if a User gets playing tricks with the raid devices and
> } > > > attempts to re-combine components after they've been
> } > > > configured to different RAID sets...)
> } > > So strictly those 2 disks were the only 2 disks in a functioning
> } > > NetBSD box, they were then added to another box which also had a
> } > > raid array, at which point I had kern/38241 _kernel_lock:
> } > > spinout fun, and then loads of reboots failing to get kgdb to
> } > > work.  It is only just now after a boot -1 that I actually got
> } > > past the kernel panic, and this is what I saw. I haven't been
> } > > in a position to play tricks yet... There is an outside chance
> } > > that they may have met seatools, but I think that those were
> } > > other disks...
> } > > 
> } > > BTW I don't think the data on those disks is worth worrying
> } > > about, but I'm happy to do a bit of debugging with you if you
> } > > think there is something to be fixed...
> } > 
> } > I don't think seatools would have done anything... what likely
> } > happened was that the 'other box' with a RAID array likely had a
> } > raid0... that would mean that the 'newly introduced disks', which
> } > also likely had been raid0 at last configuration, would have
> } > auto-configured to raid9 (or whatever the highest raid available
> } > might have been).  If only one of them got configured, and
> } > somehow the label got written back out with the 'new raid id',
> } > then the raid ids would get out-of-sync and that would make
> } > raidframe think that they were from different raid sets.  You
> } > probably would have been ok had both of them gotten
> } > re-configured, but it's my guess that only one got done before
> } > the panic, at which point things are badly messed up.
> } 
> } It just occurred to me that both autoconfigured root raid sets had
> } to have been correctly identified for raidframe to RB_ASKNAME to
> } ask which should be booted from to get to the "boot -a" condition
> } for the panic. So yes, first time around all must have been fine,
> } then kern/38241 caused trouble...
> } 
> } Thanks!
> } 
> } Patrick
> >-- End of excerpt from Patrick Welche
> 
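The way the two halves of a mirror stop recognising each other, as described in the quoted explanation above, can be modelled in the same spirit. In this sketch, components are grouped into sets by comparing their labels. The `Label` class, its fields (`serial`, `last_unit`), the serial value and the component names `wd2a`/`wd3a` are all hypothetical simplifications. Treating the recorded RAID id as a field that must match is an assumption taken from the explanation in this thread, not from the RAIDframe source.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Label:
    """Hypothetical, heavily simplified RAIDframe component label."""
    serial: int     # serial number shared by all members of a set
    last_unit: int  # raid unit (the "raid id") the set was last configured as


def group_components(components):
    """Group component names by label; only identical labels form one set."""
    sets = defaultdict(list)
    for name, label in components.items():
        sets[label].append(name)
    return sorted(sorted(names) for names in sets.values())


mirror = Label(serial=2013081601, last_unit=0)
disks = {"wd2a": mirror, "wd3a": mirror}
print(group_components(disks))  # [['wd2a', 'wd3a']]

# Only one component is configured as raid9 and has its label written
# back with the new raid id before the panic.
disks["wd2a"] = replace(mirror, last_unit=9)
print(group_components(disks))  # [['wd2a'], ['wd3a']]
```

If both components had been rewritten with the same new id, their labels would still match and the set would stay intact. This matches the remark that things would probably have been fine had both been re-configured.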



