NetBSD-Users archive


Re: raidframe panic



On Sun, 27 May 2012 07:13:37 -0400
Greg Troxel <gdt%ir.bbn.com@localhost> wrote:

> 
> I have a machine with 2 x 400G SATA drives, which are too small, so I
> bought 2 * 2T SATA drives as replacements.  I put one of the new ones
> in an Aluratek docking station (== external drive case electrically,
> with mechanicals for easy swapping), and then (netbsd-5, i386)
> 
>   created gpt label and RF partition
> 
>   created RAID1 set with this drive and a missing drive
> 
>   disklabeled the drive
> 
>   made filesystems, copied some data, etc.
> 
> Then, I
> 
>   unmounted the filesystems
> 
>   didn't do anything about the raid set
> 
>   powered off the drive
> 
>   waited about 10s
> 
>   did something like 'raidctl -s raid1', or 'disklabel raid1', and
> got a crash
> 
> It seems unplugging USB drives ought to be stable.  I realize raid is
> tricky, because there's deconfiguring and there's failed.  But drive
> going away from USB is pretty much failed, so this ought to be
> graceful. Am I confused, or have I found a bug?

I suspect you've found a bug...

> Separately, it seems like I should have done 'raidctl -u'.

That would have avoided this problem, yes....
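
A minimal sketch of that shutdown order, for anyone finding this in the
archive: unmount first, then unconfigure the set with 'raidctl -u', and only
then power off the drive. The device name raid1 is from the report above; the
mount point /mnt/new and the DRY_RUN switch are assumptions for illustration
only. By default the script just prints what it would do.

```shell
#!/bin/sh
# Sketch: detach a RAIDframe set before powering off its only component.
# RAID_DEV defaults to raid1 (from the report); MOUNTS is a hypothetical
# mount point, adjust it to wherever the filesystems are mounted.
RAID_DEV="${RAID_DEV:-raid1}"
MOUNTS="${MOUNTS:-/mnt/new}"
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

detach_raid() {
    # Unmount every filesystem that lives on the set first.
    for m in $MOUNTS; do
        run umount "$m" || return 1
    done
    # Then unconfigure the set so nothing can issue I/O to the
    # component after it has been powered off.
    run raidctl -u "$RAID_DEV"
}

detach_raid
```

Run it with DRY_RUN=0 as root to actually execute the commands.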

> Also, it would be nice if
> 
>   unmounted filesystems caused the raid set to be put in a state
> similar to unconfigured relative to clean/dirty status (it probably
> does)

Yes, it does that.

>   when a raid set's disks all go away, perhaps it should just vanish
> if it's autoconfigured, so plugging in two usb disks of a RAID1 set
> brings it back and it's just like a single disk.

RAIDframe isn't really designed to work this way...

> 
> But I think if the result was that raid1 showed as having the missing
> disk as failed/missing and no panic, things would be much better.  
> 
> 
> 
> #0  0xc05e55c2 in cpu_reboot ()
> #1  0xc0516890 in panic ()
> #2  0xc05e8467 in trap ()
> #3  0xc010ccb7 in calltrap ()
> #4  0xc05e06a1 in db_read_bytes ()
> #5  0xc01dabf7 in db_get_value ()
> #6  0xc05e107d in db_stack_trace_print ()
> #7  0xc0516865 in panic ()
> #8  0xc05e8467 in trap ()
> #9  0xc010ccb7 in calltrap ()
> #10 0xc04acd8a in dkstrategy ()

Can you tell me what line in dkstrategy() it's trapping on?  My guess is
that bp->b_dev is no longer pointing to a valid device, but that should
have been caught in bdev_strategy()...
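
One way to get that line number, as a rough sketch: ask gdb to resolve the
address from frame #10. This assumes a netbsd.gdb debug kernel built from the
same sources and config as the kernel that crashed; the path ./netbsd.gdb is
a placeholder. If the file is not there, the script only prints the command it
would have run.

```shell
#!/bin/sh
# Sketch: map a backtrace address to a source line with a debug kernel.
# The default path ./netbsd.gdb is an assumption; point it at the real one.

lookup_line() {
    addr="$1"
    kernel="${2:-./netbsd.gdb}"
    if [ -r "$kernel" ]; then
        # 'info line' reports the source line containing the address;
        # 'list' shows the surrounding source if it can be found.
        gdb -batch -ex "info line *$addr" -ex "list *$addr" "$kernel"
    else
        echo "no debug kernel at $kernel; would run:"
        echo "gdb -batch -ex 'info line *$addr' -ex 'list *$addr' $kernel"
    fi
}

# 0xc04acd8a is the dkstrategy frame (#10) from the trace above.
lookup_line 0xc04acd8a "${KERNEL_GDB:-./netbsd.gdb}"
```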

> #11 0xc050b289 in bdev_strategy ()
> #12 0xc0204ca9 in rf_DispatchKernelIO ()
> #13 0xc01fce91 in rf_DiskIOEnqueue ()
> #14 0xc01fb17f in rf_DiskReadFuncForThreads ()
> #15 0xc01ffdd9 in FireNode ()
> #16 0xc01ffeed in FireNodeList ()
> #17 0xc02003d0 in rf_FinishNode ()
> #18 0xc01fae3d in rf_NullNodeFunc ()
> #19 0xc01ffdd9 in FireNode ()
> #20 0xc0200065 in rf_DispatchDAG ()
> #21 0xc0214c27 in rf_State_ExecuteDAG ()
> #22 0xc02155aa in rf_ContinueRaidAccess ()
> #23 0xc01feffd in rf_DoAccess ()
> #24 0xc0204fbe in raidstart ()
> #25 0xc0200880 in rf_RaidIOThread ()
> #26 0xc01002e1 in lwp_trampoline ()


Later...

Greg Oster

